Debugging a syntax error in js2-mode
I haven’t been writing much JavaScript code lately, so my personal motivation to work on js2-mode has waned over the years, but it’s a cool package with some dedicated user base. I’m sure there are people who have wanted to contribute, but didn’t know where to start.
Below is the story of debugging and fixing a certain bug which took a
bit more effort than expected. It can probably serve as a primer both
for hacking js2-mode
and Emacs Lisp in general. Some pre-existing
knowledge of Lisp is required, though.
The Problem
The bug related to the parsing or trailing commas in arrow functions, which is apparently encouraged in some coding styles. This worked in normal functions (the feature was added several years prior):
function foo(a, b,) {
console.log(a, b);
}
But not in arrow functions:
(a, b,) => { console.log(a, b) };
An educated guess from one of the reporters was to check
js2-parse-function-params
and see whether it’s doing something
wrong. This function parses parameters for all kinds of functions
(expression, statement or arrow).
Late Introduction
js2-mode
uses a recursive-descent parser written in Emacs Lisp,
originally ported from Mozilla
Rhino by Steve
Yegge.
So most of its code falls into one of 3 categories:
-
Type definitions for the values representing the syntax tree nodes, using
cl-defstruct
. -
The token stream and the tokenizer. Main entry point is
js2-get-token
. -
The parser code, which is a bunch of functions that read the tokens one by one and call each other in a recursive fashion, according to the JavaScript grammar, while employing a couple of tricks to resolve the ambiguities (when you can’t tell in advance what kind of node you are looking at). And it constructs the syntax tree based on the tokens encountered.
js2-parse-function-params
is one of such functions, it helps reach
the list of function parameters and organizes them into a list, which
is then attached to the function node.
Debugging
So it’s a good place to start1. Let’s switch to *scratch*
, M-x
js2-mode
in there and paste the problematic snippet of code. It
flashes in red: syntax errors.
We switch to another window and visit js2-mode.el
in it. Navigate to
js2-parse-function-params
’s definition and instrument it to be
debugged with C-u C-M-x
. The echo area briefly flashes:
Edebug: js2-parse-function-params
Switch to *scratch*
and type space somewhere at the end, forcing the
buffer to be re-parsed. Emacs stops at the beginning of the
instrumented function.
We step through it, to the end, by tapping SPC
, noting the return
values displayed in the echo area.
Things look wrong, starting from the return value of
(js2-peek-token)
at the beginning of the loop, followed by
js2-must-match-name
failing later, and (js2-create-name-node)
creating a node with some odd :name
attribute.
The first return value was the best hint, though. It returned 165
(token types are numbers), which, if you C-s
through the file, leads
to this line:
(defvar js2-ARROW 165) ; function arrow (=>)
That means the tokenizer says that the next token after the current is
the arrow (=>
) already. We can double check that by seeing the value
of js2-ts-cursor
(by typing e
while debugging and typing the
variable’s name in the prompt). Its value is 12, the position right
after the arrow token.
But when we are parsing some construct, we really expect to be near its beginning (either at the first token, or before it). How come we are not there?
Let’s look at the list of errors. Type M-x js2-display-error-list
,
and it brings up a bunch of warnings, as well as these three errors:
line 1: missing ) after formal parameters
line 1: missing formal parameter
line 1: missing ) in parenthetical
Searching for the first two brings us back to
js2-parse-function-params
. These errors are to be expected.
The last one is more interesting. The error message key to search for
is "msg.no.paren"
, and it appears in 3 functions.
Try instrumenting them all. Only one of them is entered by the code
parsing this example: js2-parse-paren-expr-or-generator-comp
.
What does it do? It checks whether the current token is followed by
for
, or a right paren right away, and when it doesn’t, it calls
js2-parse-expr
and then checks that the said expression is followed
by js2-RP
, which is the “right paren” token. That fails to happen
in our case.
Look at js2-parse-expr
. It calls js2-parse-assign-expr
in a
loop, as long as it finds new js2-COMMA
tokens.
We instrument js2-parse-expr
. Stepping through it, we can see that it
ends up calling (js2-parse-assign-expr)
twice. The first call
returns a js2-name-node
, totally expected. The second call, inside
(setq right (js2-parse-assign-expr)
, returns a js2-function-node
(the return value printed in the echo area looks complex, but we can
make sure by evaluating (aref right 0)
). That’s a surprise.
To see how this happens, let’s go down the rabbit hole. First of all,
reset all instrumentation by calling M-x eval-buffer
in js2-mode.el
.
Then let’s simplify the code so the parser does less work for us to follow:
(a,) => { };
Then instrument js2-parse-expr
with C-u C-M-x
, make an edit in the
code buffer and step through the source until the cursor is after
(setq right
. Press i
to jump in.
Step through and jump inside every successive call to functions named
js2-parse-*
. Meaning, js2-parse-assign-expr
, then to
js2-parse-cond-expr
, and so on. When we reach
js2-parse-primary-expr
, it finishes with reporting a syntax error
and creating a js2-error-node
. The thing to note here is it
“consumes” the current token. The closing paren has just been parsed
as an error node.
As we continue with stepping through the code with SPC
, we can see
the execution bubble up to js2-parse-assign-expr
. In there, we see
(= tt js2-ARROW)
condition match, token stream rewound back to the
position before the last node parsed, and then parsed again as
function.
Erase and Rewind
What’s that about rewinding?
The ability to save and restore the current stream state was added for the implementation of the arrow functions’ support.
When the parameter list is parsed first, there is no good way to know in advance that it belongs to an arrow function, and it’s not just a parenthesized expression. So the parens are first parsed as an expression, and then if it’s followed by the arrow token, the stream is rewound and the code is parsed as a function.
For all of this to work, any valid parameters list of an arrow function has to be parse-able as a parenthesized expession, too.
At some point you might have been wondering: if the parameter list was
not parsed correctly, and it was not followed by the arrow token, how
come js2-parse-function-params
was called at all? Hopefully the
previous section answered that question 2.
Solution
One possible direction would be to change js2-parse-primary-expr
not
to consume the current token. Then the inner js2-parse-assign-expr
call wouldn’t match js2-ARROW
as next token, js2-parse-expr
would
find the closing paren fine, and the call to js2-parse-assign-expr
up the stack which we never actually stepped through, would find the
arrow token in its proper place and parse the arrow function fine.
Unfortunately, that’s kind of dangerous, that approach can lead to infinite looping. Error nodes should consume tokens to ensure that the parser moves forward.
The solution I ended with is a simplified version of Mozilla’s code from Firefox. When a feature is already implemented in there, it’s often handy to refer to its code (even if it looks much busier than the elegant Lisp we are currently editing). The general structure is similar enough, the hierarchy of calls in particular.
The function in question is GeneralParser::expr. We can see that rather than allow just any kind of errors inside, it handles two special cases (trailing comma and triple dot operator), and only when within a context where it’s possible that the curent node is a function parameter list.
The change I checked
in
is much shorter because we don’t really need to worry about (not)
reporting syntax errors: whenever we rewind the token stream to a
previous position, we change js2-parsed-errors
to its previous value
as well (see the end of js2-parse-assign-expr
).
Performance impact3 from this change was very modest (if noticeable at all), so it looks like a success.
Thanks for reading.
-
If you want to follow the scenario step-by-step, you need to check out the version of code that was used at the time, Git commit
b891ede
. ↩ -
To recap: what was recognized as the “parameter list” expression wasn’t followed by the arrow token, but the “error node” inside it was. So the arrow function was parsed from that position, and
js2-parse-function-params
was called then. ↩ -
To measure it, you can visit some large-ish file and evaluate
(benchmark 10 '(js2-reparse t))
. Then change the implementation,M-x eval-buffer
and run the benchmark again. Do this a few times, to get the feel for the spread. ↩