Parser 2.0 (Major Delezer) #52
Reference: probablycorey/shrimp#52
I still don't know if this is a good idea or not. But here's a new parser, written by hand instead of using lezer.
Most of the old tests pass without modification, as I've tried to match the existing APIs. If you check `src/compiler/compiler.ts` you can see how closely we matched it.

The only part where AI helped directly was in the string interpolation parser, so I don't take any responsibility for what's going on in there. It's possible variables/dotget won't work inside string interpolation, I still need to test it.
When fully merged, the old parser can be removed - or we can keep it for our records. We can also right away remove some limitations that lezer placed on us, such as simplifying `DotGet` parsing.

The only tests I'm skipping from the old suite are related to error recovery, but I'm still planning on adding that functionality to this new parser.
Along the way I found (I think) a bug where `if something:` won't be a function call but `if something true:` will, so I corrected that in the tests. Same problem with `throw something` vs `throw something true`.

Also, currently `DotGet` does work with spaces, e.g. `obj . prop`, so we might have to explicitly disable that. I also don't think `something.4.2.something` works in either parser, but I believe we can make it work in the new one.

This PR also adds a new tokenizer and token-specific tests. That's where a lot of the magic happens.
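One possible way to disable the `obj . prop` form is to have the tokenizer record source offsets and have the parser require that each token starts exactly where the previous one ended. The sketch below only illustrates that idea: the `Token` shape, `tokenize`, and `parseDotGet` are hypothetical names and are not taken from parser2.

```typescript
// Hypothetical token shape; parser2's real tokens may look different.
type Token = { type: 'Word' | 'Dot'; value: string; from: number; to: number }

// Toy tokenizer: emits words and dots with their offsets.
// Anything else (including whitespace) is skipped, which leaves a gap in the offsets.
function tokenize(src: string): Token[] {
  const tokens: Token[] = []
  const re = /[A-Za-z_][A-Za-z0-9_]*|\./g
  let m: RegExpExecArray | null
  while ((m = re.exec(src)) !== null) {
    const text = m[0]
    tokens.push({
      type: text === '.' ? 'Dot' : 'Word',
      value: text,
      from: m.index,
      to: m.index + text.length,
    })
  }
  return tokens
}

// Returns the path segments for `a.b.c`, or null if the tokens are not a
// gap-free Word (Dot Word)+ sequence.
function parseDotGet(tokens: Token[]): string[] | null {
  const path: string[] = []
  let expected: Token['type'] = 'Word'
  let prevEnd = -1
  for (const tok of tokens) {
    if (tok.type !== expected) return null
    // Adjacency check: a token must start where the previous one ended
    if (prevEnd !== -1 && tok.from !== prevEnd) return null
    if (tok.type === 'Word') path.push(tok.value)
    expected = tok.type === 'Word' ? 'Dot' : 'Word'
    prevEnd = tok.to
  }
  // Need at least `a.b`, and the sequence must end on a word
  return path.length >= 2 && expected === 'Dot' ? path : null
}

console.log(parseDotGet(tokenize('obj.prop')))
console.log(parseDotGet(tokenize('obj . prop')))
```

With this check, `obj.prop` yields its two segments while `obj . prop` is rejected because of the gap between `obj` and the dot. Whether parser2 should reject it in the tokenizer or in the parser is still an open choice.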
On the one hand, this is nice because now we can do anything. On the other hand, it's nice to look at a grammar file to quickly get an overview of how everything is connected. In parser2, the relationships all live in code.
If you look at `parser2.ts`, I've written it so all the important nodes are functions you call like `if()` or `dotGetFunctionCall()`. I've also organized the parser into four sections: `statement()` and `expression()`

I've also updated `bin/shrimp` so you can do `bin/shrimp parse <file>` or `shrimp -p <file>` to examine the parse tree using the new parser.

Let me know what you think!
`do` allowed in arg/dict values 16cb47ddcc
`throw` takes an expression c6e5c44755
`./bin/shrimp parse` 3c539130b0

One example of why we need errors back:
(new parser)

(old parser)

3c539130b0 to 757a50e23e

test, should create the thread now
Okay, compiler errors work again!
This is slick. It is so explicit, so there is a bunch of code. But I traced through some examples and it all was pretty easy to figure out.
I can rip all the lezer and editor stuff out in another PR.
@@ -0,0 +88,4 @@
// Compound assignment operators
'??=': 'NullishEq',
'+=': 'PlusEq',

I've been wanting this
@@ -0,0 +378,4 @@
}
// Operator precedence (binding power) - higher = tighter binding
export const precedence: Record<string, number> = {

This seems easier to understand than in the lezer file!
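For anyone reading along who hasn't seen a binding-power table in use, here is a minimal sketch of how such a table can drive expression parsing. It is not the code from `parser2.ts`: the operators, the numbers, the `Expr` type, and the `parseExpression` and `show` helpers are all made up for illustration.

```typescript
// Hypothetical table: these operators and numbers are illustrative only,
// not the values used in parser2. Higher = tighter binding.
const precedence: Record<string, number> = {
  '+': 10,
  '-': 10,
  '*': 20,
  '/': 20,
}

type Expr = number | { op: string; left: Expr; right: Expr }

// Parses a pre-split token list such as ['1', '+', '2'].
// `minPower` is the binding power of the operator to our left.
function parseExpression(tokens: string[], minPower = 0): Expr {
  const first = tokens.shift()
  if (first === undefined) throw new Error('unexpected end of input')
  let left: Expr = Number(first)

  for (;;) {
    const op = tokens[0]
    if (op === undefined) break
    const power = precedence[op]
    // Stop if this is not an operator, or if it binds no tighter than the
    // operator to our left (equal power stops too, giving left associativity)
    if (power === undefined || power <= minPower) break
    tokens.shift()
    const right = parseExpression(tokens, power)
    left = { op, left, right }
  }
  return left
}

// Renders the tree with explicit parentheses so the grouping is visible.
function show(e: Expr): string {
  return typeof e === 'number' ? String(e) : `(${show(e.left)} ${e.op} ${show(e.right)})`
}

console.log(show(parseExpression('1 + 2 * 3'.split(' ')))) // (1 + (2 * 3))
console.log(show(parseExpression('1 - 2 - 3'.split(' ')))) // ((1 - 2) - 3)
```

Changing how two operators group is then a one-number edit in the table, which is likely part of why it reads more easily than the grammar file.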
@@ -0,0 +59,4 @@
if (stmt) node.add(stmt)
if (this.pos === prevPos && !this.isEOF())
throw "parser didn't advance - you need to call next()\n\n ${this.input}\n"

Needs backticks instead of "

Merged!