Professionalism
MOUNTAINS AND MOLEHILLS: FIXING TREE-SITTER-SCSS
In my last adventure, I complained mightily about how a tiny bug in tree-sitter-scss was all that stood between me and my next heroic work accomplishment. Although I didn't mention it at the time, I had little faith that my submitted issue would be addressed anytime soon, since when looking at the tree-sitter grammar repository I could see that the SCSS parser had last been updated in early April of 2024 and had been idle for eight months.
I took matters into my own hands.
As I said in the previous post, I am not, by any stretch of the imagination, an expert on parsers. Nor is my C especially sharp. It clearly was sharp once; in fact, if you are reading this, you likely acquired a copy by running C code I wrote... in 1997. Tree-sitter grammars are not written in C (usually), but there is a C layer that you can interact with to perform specialized scans for tokens from the source being parsed.
The first thing I did was come up with a test case. The bug is easy to manifest:
div {
  --#{$x}--left: var(--#{$y}--right);
}
When the parser is trying to figure out exactly what that line beginning with double-dashes is, it has two choices: it could be a simple declaration ("the thing on the left is the thing on the right"), or it could be a CSS pseudo-class selector starting a nested block. Pseudo-class selectors are things like :focus or :not(p). It's the second one, where you can say "Everything inside a div except the paragraphs...", that the scanner is trying to detect. The problem is that :var() looks like it might be a pseudo-class selector.
Upon seeing the opening double-dashes, my first thought was, "Why is it trying to interpret a CSS Custom Property identifier as a pseudo-class selector? They're semantically incompatible." But it turns out Sass parses identifiers twice. The first time, it just tries to find the whole selector, because selectors can contain these: #{...}. That is an SCSS Interpolation: a Sass expression (a variable, but also maths, maps, and lookups) that the Sass interpreter evaluates and replaces with its result. Sass then parses the result, which it now expects to be a valid CSS selector.
I still wondered if there was some distinction I could make, but no, even the W3C defines identifiers this way, in the belief that it might someday make sense to use a CSS Custom Property in the selector position.
I also realized, while taking a walk to ruminate on the problem, that that line of thinking was irrelevant anyway; the problem might not be the presence of CSS Custom Properties at all. A Sass Interpolation may resolve into a CSS Custom Property, bringing its own double-dashes within the expression. The double-dashes were a distraction, not part of the problem.
Looking at the bug and tracing my way through it, I landed on the bit of C where the scanner reaches a state where the : it saw might be part of a pseudo-class selector. Here's the code that runs after encountering a colon:
while (lexer->lookahead != ';' && lexer->lookahead != '}' &&
       !lexer->eof(lexer)) {
    advance(lexer);
    if (lexer->lookahead == '{') {
        lexer->result_symbol = PSEUDO_CLASS_SELECTOR_COLON;
        return true;
    }
}
return false;
It took me a while, staring at this code, to figure out where the bug was. Can you see it? Look at the test and it'll become obvious.
Answer: : var(--#{ matches the criteria. That #{...} is an SCSS Interpolation. It is not the start of a CSS Ruleset Block, which is what this code is looking for. But if you look at the while loop, it always assumes any { before a terminator must be the start of a CSS Ruleset Block and marks the current token as, yes, a pseudo-class selector. That's the bug.
The solution was to put a guard clause into the loop that records whenever a # was seen, telling the scanner, "Even if the next character is a {, ignore it." Once that check is done, the guard resets: "After that check, you can pay attention if the next character is a { again."
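For the curious, here is roughly what that guard looks like. This is a minimal sketch, not the actual patch: it walks a plain C string instead of a TSLexer so it can be compiled on its own, and the function name looks_like_pseudo_class and the flag after_hash are names I made up for illustration. The argument is assumed to be the text immediately following the colon.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scanner loop. `s` is assumed to point
 * just past the ':' that triggered the check. Returns true when the
 * text looks like a pseudo-class selector opening a nested block. */
static bool looks_like_pseudo_class(const char *s) {
    bool after_hash = false; /* was the character just consumed a '#'? */
    size_t i = 0;

    while (s[i] != ';' && s[i] != '}' && s[i] != '\0') {
        /* Record the guard before moving on, then clear it on the
         * next pass so ordinary braces are noticed again. */
        after_hash = (s[i] == '#');
        i++; /* plays the role of advance(lexer) */

        if (s[i] == '{' && !after_hash) {
            /* A bare '{' before any terminator: a ruleset block. */
            return true;
        }
        /* Otherwise this is '#{', an interpolation: ignore the brace. */
    }
    return false;
}
```

Fed the tail of the test case, " var(--#{$y}--right);", the sketch skips the brace after the # and returns false once it reaches a terminator, while "hover {" still returns true.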
Not only did this solve my bug, but four or five other tests in the test suite that were failing now pass with this patch.
I can only hope the Tree-Sitter people responsible for tree-sitter-scss pick this up and integrate it soon. In the meantime, I have a live copy, which means that I must now put my money where my mouth is, and start writing the Shatterfly transpiler.