Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract parsed CSS property value definitions #494

Closed
foolip opened this issue Feb 3, 2021 · 16 comments
Closed

Extract parsed CSS property value definitions #494

foolip opened this issue Feb 3, 2021 · 16 comments

Comments

@foolip
Copy link
Member

foolip commented Feb 3, 2021

Back #113 (comment), @tidoust pointed out that reffy parses CSS value definitions and saves the result in a parsedValue in crawl.json, which for properties happens here:

Object.entries(result.css.properties || {}).forEach(([prop, dfn]) => {
if (dfn.value || dfn.newValues) {
try {
dfn.parsedValue = cssDfnParser.parsePropDefValue(
dfn.value || dfn.newValues);
} catch (e) {
dfn.valueParseError = e.message;
}
}
});

In order to test CSS property keyword values for BCD, it would be convenient to have this extracted into webref, or perhaps if css-grammar-parser.js was published as a standalone parser which one could use together with webref.

#110 has a burndown list of remaining syntax issues, but getting everything to 100% isn't necessary in order to provide value here.

@tidoust
Copy link
Member

tidoust commented Feb 3, 2021

I think that it makes more sense to only have unparsed CSS (and IDL) texts in webref. So I would favor publishing css-grammar-parser.js as a standalone parser.

For instance, in the case of IDL and as discussed in #489 (comment), the serialized version of the parsed structure as JSON is good for analyses but it cannot be used directly to re-serialize the structure as an IDL string or as HTML (other info, present in the in-memory AST when the original IDL text is parsed, is needed to be able to call the WebIDL writer). In other words, these parsed objects are only half-useful, depending on what you want to use them for.

Ping @dontcallmedom who developed the CSS grammar parser and who usually loves to make tools standalone :)

@dontcallmedom
Copy link
Member

I'm absolutely supportive of moving css-grammar-parser.js to its own package if there is a use for it; I'm more neutral on whether having parsed or unparsed content in webref, but I think it's a reasonable path forward to start by just having the unparsed fragments and the parser available separately.

@pyoor
Copy link
Contributor

pyoor commented Feb 3, 2021

Just an FYI but css-tree also includes a parser/generator for CSS value definition syntax:
https://github.com/csstree/csstree/tree/master/lib/definition-syntax

@dontcallmedom
Copy link
Member

@foolip I'm happy to look into moving the css grammar parser as a separate tool, but I wonder if the parser provided in CSS Tree would already fulfill that need - could you check if it fits the bill? if not, I'll be happy to package ours out.

@foolip
Copy link
Member Author

foolip commented Feb 4, 2021

@dontcallmedom would it make sense to try using css-tree in Reffy? I imagine it would reveal a bunch of cases which are handled differently by each parser, but if we're lucky css-tree might handle more things than what css-grammar-parser.js does?

It would be a nice outcome if CSS definitions and IDL definitions are handled similarly, both using an externally maintained library, but where Reffy depends on a specific version, so that consumers of webref can use that same version and be confident it will work.

@dontcallmedom
Copy link
Member

where Reffy depends on a specific version, so that consumers of webref can use that same version and be confident it will work.

That sounds like a good goal indeed; I'm happy to check if css-tree is able to parse as much as our own parser, but I'm not sure yet how to enforce the broader invariant - will have to think more about it.

@foolip
Copy link
Member Author

foolip commented Feb 4, 2021

I assume that it would be hopeless to ever get all specs to be valid at the same time, so just like for IDL, we'd need to take an approach of applying local fixes, see w3c/webref#63.

@dontcallmedom
Copy link
Member

so the invariant you suggest is that all the CSS fragments available in webref/*/css/*.json (i.e. if I'm correct given in properties[].value, properties[].newValue, descriptors[].value, and valuespaces[].value) be guaranteed to be parseable with the said library? that makes sense indeed, and would indeed need the same kind of postprocessing from w3c/webref#63.

@dontcallmedom
Copy link
Member

here are the errors that I get today trying to parse the said values with the csstree parser

Error parsing <\'background-color'> || <bg-image> || <bg-position> [ / <bg-size> ]? || <repeat-style> || <attachment> || <box> || <box> in ../webref/ed/css/css-backgrounds.json: Expect a keyword
  <\'background-color'> || <bg-image> || <bg-position> [ / <bg-size> ]? || <repeat-style> || <attachment> || <box> || <box>
---^
Error parsing <custom-arg>? : <extension-name> [ ( <custom-arg>+#? ) ]? ; in ../webref/ed/css/css-extensions.json: Expect `]`
  <custom-arg>? : <extension-name> [ ( <custom-arg>+#? ) ]? ;
----------------------------------------------------^
Error parsing ':' [ left | right | first | blank ] /* Margin rules */ in ../webref/ed/css/css-page.json: Unexpected input
  ':' [ left | right | first | blank ] /* Margin rules */
----------------------------------------^
Error parsing <​length​>{1,2} in ../webref/ed/css/css-tables.json: Expect a keyword
  <​length​>{1,2}
---^
Error parsing none | <length-percentage>+# in ../webref/ed/css/fill-stroke.json: Unexpected input
  none | <length-percentage>+#
-----------------------------^
Error parsing <number> ,? <number> ,? <number> ,? <number> in ../webref/ed/css/svg-animations.json: Unexpected input
  <number> ,? <number> ,? <number> ,? <number>
------------^
Error parsing <‘opacity’> in ../webref/ed/css/SVG.json: Expect a keyword
  <‘opacity’>
---^
Error parsing <‘opacity’> in ../webref/ed/css/SVG.json: Expect a keyword
  <‘opacity’>
---^
Error parsing [ none | <marker-ref> ]{1,4} [ / <‘marker-pattern’> ]? | <marker-ref>{0,4} [ <length> | <percentage> | <number> ] [ <length> | <percentage> | <number> | <marker-ref> ]* in ../webref/ed/css/svg-markers.json: Expect a keyword
  [ none | <marker-ref> ]{1,4} [ / <‘marker-pattern’> ]? | <marker-ref>{0,4} [ <length> | <percentage> | <number> ] [ <length> | <percentage> | <number> | <marker-ref> ]*
------------------------------------^
Error parsing [ <length> | <percentage> | <number> ]#* in ../webref/ed/css/svg-strokes.json: Unexpected input
  [ <length> | <percentage> | <number> ]#*
-----------------------------------------^

(haven't looked in the details yet)

@pyoor
Copy link
Contributor

pyoor commented Feb 4, 2021

One note about the above output. SVG specs use some custom notations not included in the CSS value definition syntax. See w3c/svgwg#642 (comment)

Also, css-tree recently implemented a service that can detect problematic syntax in https://github.com/w3c/csswg-drafts/. It would probably be better (easier) to adapt this service to use webref rather than parsing the repo directly.

css-tree problematic syntax

@foolip
Copy link
Member Author

foolip commented Feb 5, 2021

@dontcallmedom that is indeed the invariant I have in mind, with manual fixups to make it possible in practice. Can you tell at a glance if css-tree handles more or less of the scraped syntax than the existing parser?

@lahmatiy
Copy link

I can explain the output.

Error parsing <\'background-color'> || <bg-image> || <bg-position> [ / <bg-size> ]? || <repeat-style> || <attachment> || <box> || <box> in ../webref/ed/css/css-backgrounds.json: Expect a keyword
  <\'background-color'> || <bg-image> || <bg-position> [ / <bg-size> ]? || <repeat-style> || <attachment> || <box> || <box>
---^

Looks like missed escaping \' – backslash should be removed.


Error parsing <custom-arg>? : <extension-name> [ ( <custom-arg>+#? ) ]? ; in ../webref/ed/css/css-extensions.json: Expect `]`
  <custom-arg>? : <extension-name> [ ( <custom-arg>+#? ) ]? ;
Error parsing none | <length-percentage>+# in ../webref/ed/css/fill-stroke.json: Unexpected input
  none | <length-percentage>+#
-----------------------------^
Error parsing [ <length> | <percentage> | <number> ]#* in ../webref/ed/css/svg-strokes.json: Unexpected input
  [ <length> | <percentage> | <number> ]#*
-----------------------------------------^

Not pretty sure here. CSS Values & Units spec (where CSS Definition Syntax is defined) says nothing about sequence of combinators (is it legit or not). Tab Atkins commented that "stacked combinators" are legit. However, there are just 3 places across all the specs, where it used and, for sure, it can be avoided. I guess it should be discussed and a resolution added to the spec (CSS Values & Units).


Error parsing ':' [ left | right | first | blank ] /* Margin rules */ in ../webref/ed/css/css-page.json: Unexpected input
  ':' [ left | right | first | blank ] /* Margin rules */
----------------------------------------^

Definitions can't have comments, it should be removed.


Error parsing <​length​>{1,2} in ../webref/ed/css/css-tables.json: Expect a keyword
  <​length​>{1,2}
---^

Invisible zero-width symbol here. I had already fixed that w3c/csswg-drafts#4275, not sure why it appeared again.


Error parsing <number> ,? <number> ,? <number> ,? <number> in ../webref/ed/css/svg-animations.json: Unexpected input
  <number> ,? <number> ,? <number> ,? <number>
------------^

Combinators are never used with a comma, a parenthesis etc. I believe <number>#{4} | <number>{4} should be here.


Error parsing <‘opacity’> in ../webref/ed/css/SVG.json: Expect a keyword
  <‘opacity’>
---^
Error parsing <‘opacity’> in ../webref/ed/css/SVG.json: Expect a keyword
  <‘opacity’>
---^
Error parsing [ none | <marker-ref> ]{1,4} [ / <‘marker-pattern’> ]? | <marker-ref>{0,4} [ <length> | <percentage> | <number> ] [ <length> | <percentage> | <number> | <marker-ref> ]* in ../webref/ed/css/svg-markers.json: Expect a keyword
  [ none | <marker-ref> ]{1,4} [ / <‘marker-pattern’> ]? | <marker-ref>{0,4} [ <length> | <percentage> | <number> ] [ <length> | <percentage> | <number> | <marker-ref> ]*
------------------------------------^

should be replaced for ' (apostrophe) or removed.

@tidoust
Copy link
Member

tidoust commented Mar 4, 2021

Combinators are never used with a comma, a parenthesis etc. I believe <number>#{4} | <number>{4} should be here.

@lahmatiy Why couldn't combinators be used with a comma? According to the Value Definition Syntax, the literal , is a shortcut for ','. If I update the definition to <number> ','? <number> ','? <number> ','? <number>, parsing succeeds. I would expect it to succeed as well without the quotes.

Also, the <number>#{4} | <number>{4} construct does not quite achieve the same thing as far as I can tell. It would work with values such as 1 2 3 4 and 1, 2, 3, 4, but would not work with 1 2, 3 4 which seems valid from an SVG Animations perspective.

However, there are just 3 places across all the specs, where it used and, for sure, it can be avoided. I guess it should be discussed and a resolution added to the spec (CSS Values & Units).

Well, 3 places is already a good start ;) Based on the discussion you reference, there seems to be agreement in the CSS WG already that stacked operators are the right way to express certain constraints. Any plan to support that in the parser?

@tidoust
Copy link
Member

tidoust commented Mar 8, 2021

Definitions can't have comments, it should be removed.

I note that the Value Definition Syntax is clear that comments (and whitespaces) are allowed pretty much everywhere:
https://drafts.csswg.org/css-values/#component-whitespace

That said, in this particular case, Reffy's extraction logic should simply not assign the comment to this value, see #534.

@dontcallmedom
Copy link
Member

looking in the stacked modifiers discussion, I've filed an issue to try to get more clarity at least in the spec about which modifiers are allowed to be stacked w3c/csswg-drafts#6085

tidoust added a commit to w3c/webref that referenced this issue Mar 10, 2021
Code follows the same structure as for the `@webref/idl` package. Tests
make sure that all CSS fragments can be parsed with the Value Definition Syntax
parser of CSSTree.

A couple of CSS patches need to be applied to the extracts to fix value definitions.

Tests currently only guarantee that the value definitions can be parsed with CSSTree,
with the exception of a handful value definitions that are a priori valid but that
cannot be parsed with the CSSTree parser for the time being, see discussion in:
w3c/reffy#494 (comment)

Co-authored-by: Philip Jägenstedt <philip@foolip.org>
tidoust added a commit to w3c/webref that referenced this issue Apr 15, 2021
Code follows the same structure as for the `@webref/idl` package. Tests
make sure that all CSS fragments can be parsed with the Value Definition Syntax
parser of CSSTree.

A couple of CSS patches need to be applied to the extracts to fix value definitions.

Tests currently only guarantee that the value definitions can be parsed with CSSTree,
with the exception of a handful value definitions that are a priori valid but that
cannot be parsed with the CSSTree parser for the time being, see discussion in:
w3c/reffy#494 (comment)

Co-authored-by: Philip Jägenstedt <philip@foolip.org>
@tidoust
Copy link
Member

tidoust commented Sep 21, 2022

Closing the issue as css-tree added support for all missing syntaxes and successfully parses all CSS value definitions. Enforced through tests in webref.

@tidoust tidoust closed this as completed Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants