Parser 2.0 (Major Delezer) #52

Merged
defunkt merged 35 commits from parser2 into main 2025-12-08 16:35:34 +00:00
Owner

I still don't know if this is a good idea or not. But here's a new parser, written by hand instead of using lezer.

Most of the old tests pass without modification, as I've tried to match the existing APIs. If you check `src/compiler/compiler.ts` you can see how closely we matched it.

The only part where AI helped directly was the string interpolation parser, so I don't take any responsibility for what's going on in there. It's possible that variables and `DotGet` won't work inside string interpolation; I still need to test that.

When fully merged, the old parser can be removed, or we can keep it for our records. We can also immediately drop some limitations lezer placed on us; for example, `DotGet` parsing can be simplified.

The only tests I'm skipping from the old suite are related to error recovery, but I'm still planning on adding that functionality to this new parser.
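For reference, here is a minimal sketch of one common approach, panic-mode recovery: when a statement fails to parse, record an error node, skip ahead to the next newline, and keep going. The token and node shapes (`Token`, `AstNode`, `ParseError`) and the toy statement rule are hypothetical, not parser2's actual types.

```typescript
// Hypothetical token and node shapes for this sketch; parser2's real types differ.
// Assumes the token list always ends with an EOF token.
type Token = { type: 'Identifier' | 'Number' | 'Operator' | 'Newline' | 'EOF'; value: string }

type AstNode =
  | { kind: 'Statement'; tokens: Token[] }
  | { kind: 'Error'; message: string; skipped: Token[] }

// Plain class (not an Error subclass) so `instanceof` works on any compile target.
class ParseError {
  constructor(readonly message: string) {}
}

// Panic-mode recovery: if a statement fails to parse, record an Error node,
// skip to the next newline, and resume instead of aborting the whole file.
function parseStatements(tokens: Token[]): AstNode[] {
  const nodes: AstNode[] = []
  let pos = 0
  const atBoundary = () => tokens[pos].type === 'Newline' || tokens[pos].type === 'EOF'

  // Toy statement rule: an identifier followed by anything up to the newline.
  const statement = (): AstNode => {
    const start = pos
    if (tokens[pos].type !== 'Identifier') {
      throw new ParseError(`expected Identifier, got ${tokens[pos].type}`)
    }
    while (!atBoundary()) pos++
    return { kind: 'Statement', tokens: tokens.slice(start, pos) }
  }

  while (tokens[pos].type !== 'EOF') {
    if (tokens[pos].type === 'Newline') {
      pos++
      continue
    }
    const start = pos
    try {
      nodes.push(statement())
    } catch (e) {
      if (!(e instanceof ParseError)) throw e
      // Synchronize: discard tokens up to the next statement boundary.
      while (!atBoundary()) pos++
      nodes.push({ kind: 'Error', message: e.message, skipped: tokens.slice(start, pos) })
    }
  }
  return nodes
}
```

Given the lines `1 +` and `echo 2`, this yields an Error node covering `1 +` followed by a normal Statement for `echo 2`, so one bad line doesn't hide the rest of the tree.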

Along the way I found (I think) a bug where `if something:` won't be a function call but `if something true:` will, so I corrected that in the tests. Same problem with `throw something` vs. `throw something true`.

Also, `DotGet` currently *does* work with spaces, e.g. `obj . prop`, so we might have to explicitly disable that. I also don't think `something.4.2.something` works in either parser, but I believe we can make it work in the new one.
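One way a hand-written parser could handle both cases is to split a `DotGet` on `.` before any number lexing happens. This is a hypothetical sketch that assumes identifiers match `[A-Za-z_][A-Za-z0-9_]*` and that numeric segments mean index access; neither is confirmed by this PR. With that split, `4.2` becomes two index segments instead of a float, and `obj . prop` is rejected because its segments contain whitespace.

```typescript
type Segment =
  | { kind: 'name'; value: string }
  | { kind: 'index'; value: number }

// Hypothetical identifier rule; Shrimp's real rule may allow other characters.
const NAME = /^[A-Za-z_][A-Za-z0-9_]*$/
const INDEX = /^[0-9]+$/

// Splits a DotGet like `something.4.2.something` into segments, or returns
// null if the input isn't one. Splitting on `.` *before* number lexing means
// `4.2` is read as two index segments rather than a float. Whitespace around
// a dot (`obj . prop`) or an empty segment (`a..b`) matches neither rule.
function scanDotGet(src: string): Segment[] | null {
  const parts = src.split('.')
  if (parts.length < 2 || !NAME.test(parts[0])) return null
  const segments: Segment[] = []
  for (const part of parts) {
    if (NAME.test(part)) segments.push({ kind: 'name', value: part })
    else if (INDEX.test(part)) segments.push({ kind: 'index', value: Number(part) })
    else return null
  }
  return segments
}
```

The design choice here is ordering: once the scanner has committed to a `DotGet`, the number lexer never sees the text after the first dot.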

This PR also adds a new tokenizer and token-specific tests. That's where a lot of the magic happens.

On the one hand, this is nice because now we can do anything. On the other hand, it's nice to look at a grammar file to quickly get an overview of how everything is connected. In parser2, the relationships all live in code.

If you look at `parser2.ts`, I've written it so all the important nodes are functions you call, like `if()` or `dotGetFunctionCall()`. I've also organized the parser into four sections:

  • constructor
  • important nodes, like `statement()` and `expression()`
  • all the other nodes in alphabetical order
  • helper/utility functions
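To illustrate that layout (this is not code from `parser2.ts`), here is a toy recursive-descent parser split into the same four sections. The grammar (integers joined by `+`, with an optional `;`) and the names `MiniParser`, `AstNode`, `program()` and `number()` are made up for the sketch; only `statement()`, `expression()`, `next()` and `isEOF()` echo names used in this PR.

```typescript
// Hypothetical node shape for this sketch.
type AstNode = { type: string; value?: string; children: AstNode[] }

class MiniParser {
  private pos = 0

  // --- 1. constructor ---
  constructor(private readonly tokens: string[]) {}

  // --- 2. important nodes ---
  program(): AstNode {
    const node: AstNode = { type: 'Program', children: [] }
    while (!this.isEOF()) node.children.push(this.statement())
    return node
  }

  statement(): AstNode {
    const expr = this.expression()
    if (this.peek() === ';') this.next() // optional terminator
    return { type: 'Statement', children: [expr] }
  }

  expression(): AstNode {
    let left = this.number()
    while (this.peek() === '+') {
      this.next()
      left = { type: 'BinOp', value: '+', children: [left, this.number()] }
    }
    return left
  }

  // --- 3. all other nodes, alphabetical ---
  number(): AstNode {
    const tok = this.next()
    if (!/^[0-9]+$/.test(tok)) throw new Error(`expected number, got ${tok}`)
    return { type: 'Number', value: tok, children: [] }
  }

  // --- 4. helpers ---
  private isEOF(): boolean {
    return this.pos >= this.tokens.length
  }

  private peek(): string | undefined {
    return this.tokens[this.pos]
  }

  private next(): string {
    if (this.isEOF()) throw new Error('unexpected end of input')
    return this.tokens[this.pos++]
  }
}
```

Because every node is a method, reading the grammar means following calls from `program()` down, which is the tradeoff against a lezer grammar file mentioned above.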

I've also updated `bin/shrimp` so you can do `bin/shrimp parse <file>` or `shrimp -p <file>` to examine the parse tree using the new parser.

Let me know what you think!

defunkt added 24 commits 2025-12-03 00:18:58 +00:00
defunkt added 2 commits 2025-12-03 00:49:52 +00:00
defunkt added 1 commit 2025-12-03 00:54:33 +00:00
Author
Owner

One example of why we need errors back:

(new parser)
[screenshot]

(old parser)
[screenshot]

defunkt force-pushed parser2 from 3c539130b0 to 757a50e23e 2025-12-03 01:11:47 +00:00

test, should create the thread now

Author
Owner

Okay, compiler errors work again!

[screenshot]
defunkt added 1 commit 2025-12-03 21:40:09 +00:00
defunkt added 1 commit 2025-12-03 21:40:20 +00:00
probablycorey approved these changes 2025-12-03 21:47:30 +00:00
probablycorey left a comment
Owner

This is slick. It's very explicit, so there's a bunch of code, but I traced through some examples and it was all pretty easy to figure out.

I can rip all the lezer and editor stuff out in another PR.

@@ -0,0 +88,4 @@
// Compound assignment operators
'??=': 'NullishEq',
'+=': 'PlusEq',

I've been wanting this
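Compound operators like these only work if the tokenizer prefers the longest match, so `??=` isn't read as `??` followed by `=`. Below is a minimal sketch of longest-match lookup. Only the `'??='`/`'NullishEq'` and `'+='`/`'PlusEq'` entries come from the diff; the other entries and the `matchOperator()` helper are hypothetical.

```typescript
// Hypothetical subset of an operator table; only the first two entries
// come from the diff above.
const operators: Record<string, string> = {
  '??=': 'NullishEq',
  '+=': 'PlusEq',
  '??': 'NullishCoalesce',
  '+': 'Plus',
  '=': 'Eq',
}

// Try longer operators first, so '??=' wins over '??' and '+=' over '+'.
const byLength = Object.keys(operators).sort((a, b) => b.length - a.length)

// Returns the operator starting at `pos` in `src`, or null if there isn't one.
function matchOperator(src: string, pos: number): { text: string; name: string } | null {
  for (const op of byLength) {
    if (src.slice(pos, pos + op.length) === op) return { text: op, name: operators[op] }
  }
  return null
}
```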

@@ -0,0 +378,4 @@
}
// Operator precedence (binding power) - higher = tighter binding
export const precedence: Record<string, number> = {

This seems easier to understand than in the lezer file!
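To show how a binding-power table like this is typically consumed, here is a small precedence-climbing sketch. The numeric values and `parseExpr()` are illustrative assumptions, not the ones in the PR. Every operator is treated as left-associative, and the output is a fully parenthesized string so the grouping is visible.

```typescript
// Illustrative binding powers (higher = tighter); the PR's table has its own values.
const precedence: Record<string, number> = {
  '??': 10,
  '==': 20,
  '+': 30,
  '-': 30,
  '*': 40,
  '/': 40,
}

// Precedence climbing over a flat token list of operands and binary operators.
function parseExpr(tokens: string[]): string {
  let pos = 0

  const climb = (minPower: number): string => {
    let left = tokens[pos++] // operand: plain numbers/identifiers in this sketch
    while (pos < tokens.length) {
      const op = tokens[pos]
      const power = precedence[op]
      if (power === undefined || power < minPower) break
      pos++
      // Left-associative: the right-hand side must bind strictly tighter.
      const right = climb(power + 1)
      left = `(${left} ${op} ${right})`
    }
    return left
  }

  return climb(0)
}
```

For example, `1 + 2 * 3` groups as `(1 + (2 * 3))` and `1 - 2 - 3` as `((1 - 2) - 3)`.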

@@ -0,0 +59,4 @@
if (stmt) node.add(stmt)
if (this.pos === prevPos && !this.isEOF())
throw "parser didn't advance - you need to call next()\n\n ${this.input}\n"

Needs backticks instead of double quotes, since `${this.input}` is only interpolated inside a template literal.
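Why the quotes matter: only a template literal interpolates `${...}`, so the double-quoted version prints the text `${this.input}` verbatim instead of the input. Here is a minimal sketch of the fixed guard. `assertAdvanced()` is a hypothetical standalone version of the check in the diff, and it keeps the original's choice of throwing a plain string.

```typescript
// Hypothetical standalone version of the "didn't advance" guard from the diff.
function assertAdvanced(input: string, prevPos: number, pos: number, isEOF: boolean): void {
  // If a loop iteration consumed no tokens and we're not at EOF, the parser
  // would spin forever, so bail out loudly.
  if (pos === prevPos && !isEOF) {
    // Backticks make this a template literal, so ${input} is interpolated.
    throw `parser didn't advance - you need to call next()\n\n ${input}\n`
  }
}

// The difference in isolation:
const source = 'if x:'
const plain = "got: ${source}" // ordinary string: stays "got: ${source}"
const templated = `got: ${source}` // template literal: "got: if x:"
```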

defunkt added 1 commit 2025-12-03 23:42:13 +00:00
defunkt added 1 commit 2025-12-03 23:52:30 +00:00
defunkt added 1 commit 2025-12-05 23:25:39 +00:00
defunkt added 1 commit 2025-12-05 23:45:32 +00:00
defunkt added 1 commit 2025-12-07 05:22:46 +00:00
defunkt added 1 commit 2025-12-07 05:23:31 +00:00
defunkt merged commit 5994a2d8f4 into main 2025-12-08 16:35:34 +00:00
Author
Owner

Merged!

Reference: probablycorey/shrimp#52