wip
This commit is contained in:
parent
023bfb2caa
commit
dbe5e60d04
1
.gitignore
vendored
|
|
@@ -34,3 +34,4 @@ report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
|
|||
.DS_Store
|
||||
|
||||
/tmp
|
||||
/docs
|
||||
104
CLAUDE.md
|
|
@@ -286,6 +286,110 @@ The `toMatchTree` helper compares parser output with expected CST structure.
|
|||
|
||||
**Empty line parsing**: The grammar structure `(statement | newlineOrSemicolon)+ eof?` allows proper empty line and EOF handling.
|
||||
|
||||
## Lezer: Surprising Behaviors
|
||||
|
||||
These discoveries came from implementing string interpolation with external tokenizers. See `tmp/string-test4.grammar` for working examples.
|
||||
|
||||
### 1. Rule Capitalization Controls Tree Structure
|
||||
|
||||
**The most surprising discovery**: The capitalization of a rule's name determines whether it produces a node in the parse tree.
|
||||
|
||||
**Lowercase rules get inlined** (no tree nodes):
|
||||
```lezer
|
||||
statement { assign | expr } // ❌ No "statement" node
|
||||
assign { x "=" y } // ❌ No "assign" node
|
||||
expr { x | y } // ❌ No "expr" node
|
||||
```
|
||||
|
||||
**Capitalized rules create tree nodes**:
|
||||
```lezer
|
||||
Statement { Assign | Expr } // ✅ Creates Statement node
|
||||
Assign { x "=" y } // ✅ Creates Assign node
|
||||
Expr { x | y } // ✅ Creates Expr node
|
||||
```
|
||||
|
||||
**Why this matters**: When debugging a grammar that "doesn't match," check capitalization first. The rules may be matching correctly and simply being inlined, which leaves no node in the tree.
|
||||
|
||||
Example: `x = 42` was parsing as `Program(Identifier,"=",Number)` instead of `Program(Statement(Assign(...)))`. The grammar rules existed and were matching, but they were inlined because they were lowercase.
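The inlining effect can be illustrated with a small toy model. This is only a sketch, not Lezer's implementation: `Match`, `createsNode`, `toTree`, `show`, and `parseAssign` are hypothetical names, the model looks only at the first character of the rule name (ignoring `[@name=...]` overrides), and the `"="` literal is left out of the output for brevity.

```typescript
// Toy model of the naming rule described above; this is not Lezer's implementation.
type Match = { rule: string; children: Match[] }

// Assumption: only the first character decides, and `[@name=...]` overrides are ignored.
function createsNode(rule: string): boolean {
  return /^[A-Z]/.test(rule)
}

// Rules that do not create a node are replaced by their children.
function toTree(match: Match): Match[] {
  const children: Match[] = []
  for (const child of match.children) children.push(...toTree(child))
  return createsNode(match.rule) ? [{ rule: match.rule, children }] : children
}

// Render a node in the same Name(child,child) style used in the notes above.
function show(node: Match): string {
  return node.children.length > 0
    ? `${node.rule}(${node.children.map(show).join(',')})`
    : node.rule
}

// Hypothetical helper: builds the matches for `x = 42` with the given rule names.
function parseAssign(statementRule: string, assignRule: string): string {
  const leaf = (rule: string): Match => ({ rule, children: [] })
  const assign: Match = { rule: assignRule, children: [leaf('Identifier'), leaf('Number')] }
  const program: Match = { rule: 'Program', children: [{ rule: statementRule, children: [assign] }] }
  return toTree(program).map(show).join('')
}

console.log(parseAssign('statement', 'assign')) // Program(Identifier,Number)
console.log(parseAssign('Statement', 'Assign')) // Program(Statement(Assign(Identifier,Number)))
```

The same matches produce a flat or a nested tree depending only on the rule names, which is why a grammar can appear not to match when it is in fact matching.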
|
||||
|
||||
### 2. @skip {} Wrapper is Essential for Preserving Whitespace
|
||||
|
||||
**Initial assumption (wrong)**: Could exclude whitespace from token patterns to avoid needing `@skip {}`.
|
||||
|
||||
**Reality**: The `@skip {}` wrapper is absolutely required to preserve whitespace in strings:
|
||||
|
||||
```lezer
|
||||
@skip {} {
|
||||
String { "'" StringContent* "'" }
|
||||
}
|
||||
|
||||
@tokens {
|
||||
StringFragment { !['\\$]+ } // Matches any run of characters except ', \ and $ (spaces included)
|
||||
}
|
||||
```
|
||||
|
||||
**Without the wrapper**: All spaces get stripped by the global `@skip { space }`, even though `StringFragment` can match them.
|
||||
|
||||
**Test that proved it wrong**: `' spaces '` was being parsed as `"spaces"` (leading/trailing spaces removed) until we added `@skip {}`.
|
||||
|
||||
### 3. External Tokenizers Work Inside @skip {} Blocks
|
||||
|
||||
**Initial assumption (wrong)**: External tokenizers can't be used inside `@skip {}` blocks, so identifier patterns need to be duplicated as simple tokens.
|
||||
|
||||
**Reality**: External tokenizers work perfectly inside `@skip {}` blocks! The tokenizer gets called even when skip is disabled.
|
||||
|
||||
**Working pattern**:
|
||||
```lezer
|
||||
@external tokens tokenizer from "./tokenizer" { Identifier, Word }
|
||||
|
||||
@skip {} {
|
||||
String { "'" StringContent* "'" }
|
||||
}
|
||||
|
||||
Interpolation {
|
||||
"$" Identifier | // ← Uses external tokenizer!
|
||||
"$" "(" expr ")"
|
||||
}
|
||||
```
|
||||
|
||||
**Test that proved it**: `'hello $name'` correctly calls the external tokenizer for `name` inside the string, creating an `Identifier` token. No duplication needed!
|
||||
|
||||
### 4. Single-Character Tokens Can Be Literals
|
||||
|
||||
**Initial approach**: Define every single character as a token:
|
||||
```lezer
|
||||
@tokens {
|
||||
dollar[@name="$"] { "$" }
|
||||
backslash[@name="\\"] { "\\" }
|
||||
}
|
||||
```
|
||||
|
||||
**Simpler approach**: Just use literals in the grammar rules:
|
||||
```lezer
|
||||
Interpolation {
|
||||
"$" Identifier | // Literal "$"
|
||||
"$" "(" expr ")"
|
||||
}
|
||||
|
||||
StringEscape {
|
||||
"\\" ("$" | "n" | ...) // Literal "\\"
|
||||
}
|
||||
```
|
||||
|
||||
This works fine and reduces boilerplate in the `@tokens` section.
|
||||
|
||||
### 5. StringFragment as Simple Token, Not External
|
||||
|
||||
For string content, use a simple token pattern instead of handling it in the external tokenizer:
|
||||
|
||||
```lezer
|
||||
@tokens {
|
||||
StringFragment { !['\\$]+ } // Simple pattern: not quote, backslash, or dollar
|
||||
}
|
||||
```
|
||||
|
||||
The external tokenizer should focus on Identifier/Word distinction at the top level. String content is simpler and doesn't need the complexity of the external tokenizer.
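As a rough illustration of how simple the string-content patterns are, the sketch below splits the inside of a string into the three kinds of content using plain regular expressions. It is an approximation under stated assumptions, not the generated parser: `splitStringContent`, `Part`, and `patterns` are hypothetical names, the identifier pattern is a simplified stand-in for the external tokenizer, and the `$( ... )` form is not handled.

```typescript
type PartType = 'StringFragment' | 'StringEscape' | 'Interpolation'
type Part = { type: PartType; text: string }

// Sticky (`y`) regexes only match at `lastIndex`, which mimics a tokenizer position.
const patterns: Array<[PartType, RegExp]> = [
  ['StringFragment', /[^'\\$]+/y], // same character class as !['\\$]+
  ['StringEscape', /\\[$ntr\\']/y], // same escapes as the StringEscape rule
  ['Interpolation', /\$[a-z][a-z0-9-]*/y], // assumption: simplified identifier shape
]

// Split the text between the quotes into fragments, escapes, and interpolations.
function splitStringContent(body: string): Part[] {
  const parts: Part[] = []
  let pos = 0
  while (pos < body.length) {
    let matched = false
    for (const [type, pattern] of patterns) {
      pattern.lastIndex = pos
      const match = pattern.exec(body)
      if (match) {
        parts.push({ type, text: match[0] })
        pos = pattern.lastIndex // exec() advanced lastIndex past the match
        matched = true
        break
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`)
  }
  return parts
}

console.log(splitStringContent('hello $name'))
```

For `'hello $name'` this yields a `StringFragment` with the text `hello ` (trailing space kept) followed by an `Interpolation`, mirroring the tree shape expected by the string interpolation tests.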
|
||||
|
||||
### Why expressionWithoutIdentifier Exists
|
||||
|
||||
The grammar has an unusual pattern: `expressionWithoutIdentifier`. This exists to solve a GLR conflict:
|
||||
|
|
|
|||
|
|
@@ -76,9 +76,9 @@ describe('compiler', () => {
|
|||
|
||||
test('function call with named and positional args', () => {
|
||||
expect(`minus = fn a b: a - b end; minus b=2 9`).toEvaluateTo(7)
|
||||
expect(`minus = fn c d: a - b end; minus 90 b=20`).toEvaluateTo(70)
|
||||
expect(`minus = fn e f: a - b end; minus a=900 200`).toEvaluateTo(700)
|
||||
expect(`minus = fn g h: a - b end; minus 2000 a=9000`).toEvaluateTo(7000)
|
||||
expect(`minus = fn a b: a - b end; minus 90 b=20`).toEvaluateTo(70)
|
||||
expect(`minus = fn a b: a - b end; minus a=900 200`).toEvaluateTo(700)
|
||||
expect(`minus = fn a b: a - b end; minus 2000 a=9000`).toEvaluateTo(7000)
|
||||
})
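The four expectations above are consistent with a binding rule in which named arguments claim their parameters first and positional arguments then fill the remaining parameters in declaration order. The sketch below is inferred from those test cases only and is not the compiler's actual code; `Arg`, `bindArgs`, and `minus` are hypothetical names.

```typescript
type Arg = { name?: string; value: number }

// Bind named arguments first, then give positional arguments the leftover parameters.
function bindArgs(params: string[], args: Arg[]): Record<string, number> {
  const bound: Record<string, number> = {}
  for (const arg of args) {
    if (arg.name !== undefined) {
      if (params.indexOf(arg.name) === -1) throw new Error(`Unknown parameter: ${arg.name}`)
      bound[arg.name] = arg.value
    }
  }
  // Parameters not claimed by a named argument, in declaration order.
  const free = params.filter((param) => !(param in bound))
  const positional = args.filter((arg) => arg.name === undefined)
  positional.forEach((arg, index) => {
    if (index >= free.length) throw new Error('Too many positional arguments')
    bound[free[index]] = arg.value
  })
  return bound
}

// Stand-in for `minus = fn a b: a - b end`.
function minus(args: Arg[]): number {
  const { a, b } = bindArgs(['a', 'b'], args)
  return a - b
}

console.log(minus([{ name: 'b', value: 2 }, { value: 9 }])) // 7
console.log(minus([{ value: 2000 }, { name: 'a', value: 9000 }])) // 7000
```

Under this assumed rule the position of a named argument in the call does not matter, which is what lets `minus b=2 9` and `minus 2000 a=9000` both evaluate as `a - b`.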
|
||||
|
||||
test('function call with no args', () => {
|
||||
|
|
|
|||
|
|
@@ -16,8 +16,8 @@ import {
|
|||
getPipeExprParts,
|
||||
} from '#compiler/utils'
|
||||
|
||||
const DEBUG = false
|
||||
// const DEBUG = true
|
||||
// const DEBUG = false
|
||||
const DEBUG = true
|
||||
|
||||
type Label = `.${string}`
|
||||
export class Compiler {
|
||||
|
|
|
|||
|
|
@@ -98,7 +98,8 @@ describe('Parentheses', () => {
|
|||
|
||||
expect("('hello')").toMatchTree(`
|
||||
ParenExpr
|
||||
String hello`)
|
||||
String
|
||||
StringFragment hello`)
|
||||
|
||||
expect('(true)').toMatchTree(`
|
||||
ParenExpr
|
||||
|
|
@@ -413,7 +414,8 @@ describe('if/elsif/else', () => {
|
|||
Number 1
|
||||
colon :
|
||||
ThenBlock
|
||||
String cool
|
||||
String
|
||||
StringFragment cool
|
||||
`)
|
||||
|
||||
expect('a = if x: 2').toMatchTree(`
|
||||
|
|
@@ -624,8 +626,10 @@ describe('pipe expressions', () => {
|
|||
describe('multiline', () => {
|
||||
test('parses multiline strings', () => {
|
||||
expect(`'first'\n'second'`).toMatchTree(`
|
||||
String first
|
||||
String second`)
|
||||
String
|
||||
StringFragment first
|
||||
String
|
||||
StringFragment second`)
|
||||
})
|
||||
|
||||
test('parses multiline functions', () => {
|
||||
|
|
@@ -689,3 +693,26 @@ end
|
|||
`)
|
||||
})
|
||||
})
|
||||
|
||||
describe('string interpolation', () => {
|
||||
test('string with variable interpolation', () => {
|
||||
expect("'hello $name'").toMatchTree(`
|
||||
String
|
||||
StringFragment ${'hello '}
|
||||
Interpolation
|
||||
Identifier name
|
||||
`)
|
||||
})
|
||||
|
||||
test('string with expression interpolation', () => {
|
||||
expect("'sum is $(a + b)'").toMatchTree(`
|
||||
String
|
||||
StringFragment ${'sum is '}
|
||||
Interpolation
|
||||
BinOp
|
||||
Identifier a
|
||||
operator +
|
||||
Identifier b
|
||||
`)
|
||||
})
|
||||
})
|
||||
|
|
|
|||
|
|
@@ -7,10 +7,10 @@
|
|||
@tokens {
|
||||
@precedence { Number "-" }
|
||||
|
||||
StringFragment { !['\\$]+ }
|
||||
NamedArgPrefix { $[a-z]+ "=" }
|
||||
Number { "-"? $[0-9]+ ('.' $[0-9]+)? }
|
||||
Boolean { "true" | "false" }
|
||||
String { '\'' ![']* '\'' }
|
||||
newlineOrSemicolon { "\n" | ";" }
|
||||
eof { @eof }
|
||||
space { " " | "\t" }
|
||||
|
|
@@ -36,6 +36,7 @@
|
|||
"*"[@name=operator]
|
||||
"/"[@name=operator]
|
||||
"|"[@name=operator]
|
||||
|
||||
}
|
||||
|
||||
@external tokens tokenizer from "./tokenizer" { Identifier, Word }
|
||||
|
|
@@ -160,13 +161,36 @@ BinOp {
|
|||
}
|
||||
|
||||
ParenExpr {
|
||||
leftParen (ambiguousFunctionCall | BinOp | expressionWithoutIdentifier | ConditionalOp | PipeExpr) rightParen
|
||||
leftParen parenContent rightParen
|
||||
}
|
||||
|
||||
parenContent {
|
||||
(ambiguousFunctionCall | BinOp | expressionWithoutIdentifier | ConditionalOp | PipeExpr)
|
||||
}
|
||||
|
||||
expression {
|
||||
expressionWithoutIdentifier | Identifier
|
||||
}
|
||||
|
||||
@skip {} {
|
||||
String { "'" stringContent* "'" }
|
||||
}
|
||||
|
||||
stringContent {
|
||||
StringFragment |
|
||||
Interpolation |
|
||||
StringEscape
|
||||
}
|
||||
|
||||
Interpolation {
|
||||
"$" Identifier |
|
||||
"$" leftParen parenContent rightParen
|
||||
}
|
||||
|
||||
StringEscape {
|
||||
"\\" ("$" | "n" | "t" | "r" | "\\" | "'")
|
||||
}
|
||||
|
||||
// We need expressionWithoutIdentifier to avoid conflicts in consumeToTerminator.
|
||||
// Without this, when parsing "my-var" at statement level, the parser can't decide:
|
||||
// - ambiguousFunctionCall → FunctionCallOrIdentifier → Identifier
|
||||
|
|
|
|||
4
src/parser/shrimp.grammar.d.ts
vendored
Normal file
|
|
@@ -0,0 +1,4 @@
|
|||
declare module '*.grammar' {
|
||||
const content: string
|
||||
export default content
|
||||
}
|
||||
|
|
@@ -11,17 +11,20 @@ export const
|
|||
BinOp = 9,
|
||||
ConditionalOp = 14,
|
||||
String = 23,
|
||||
Number = 24,
|
||||
Boolean = 25,
|
||||
FunctionDef = 26,
|
||||
Params = 28,
|
||||
colon = 29,
|
||||
end = 30,
|
||||
Underscore = 31,
|
||||
NamedArg = 32,
|
||||
NamedArgPrefix = 33,
|
||||
IfExpr = 35,
|
||||
ThenBlock = 38,
|
||||
ElsifExpr = 39,
|
||||
ElseExpr = 41,
|
||||
Assign = 43
|
||||
StringFragment = 24,
|
||||
Interpolation = 25,
|
||||
StringEscape = 26,
|
||||
Number = 27,
|
||||
Boolean = 28,
|
||||
FunctionDef = 29,
|
||||
Params = 31,
|
||||
colon = 32,
|
||||
end = 33,
|
||||
Underscore = 34,
|
||||
NamedArg = 35,
|
||||
NamedArgPrefix = 36,
|
||||
IfExpr = 38,
|
||||
ThenBlock = 41,
|
||||
ElsifExpr = 42,
|
||||
ElseExpr = 44,
|
||||
Assign = 46
|
||||
|
|
|
|||
|
|
@@ -4,20 +4,20 @@ import {tokenizer} from "./tokenizer"
|
|||
import {highlighting} from "./highlight"
|
||||
export const parser = LRParser.deserialize({
|
||||
version: 14,
|
||||
states: ",rQVQTOOO!rQUO'#CdO#SQPO'#CeO#bQPO'#DdO$[QTO'#CcOOQS'#Dh'#DhO$cQPO'#DgO$zQTO'#DkOOQS'#Cv'#CvOOQO'#De'#DeO%SQPO'#DdO%bQTO'#DoOOQO'#DP'#DPOOQO'#Dd'#DdO%iQPO'#DcOOQS'#Dc'#DcOOQS'#DY'#DYQVQTOOOOQS'#Dg'#DgOOQS'#Cb'#CbO%qQTO'#C|OOQS'#Df'#DfOOQS'#DZ'#DZO&OQUO,58{O&oQTO,59sO%bQTO,59PO%bQTO,59PO&|QUO'#CdO(XQPO'#CeO(iQPO,58}O(zQPO,58}O(uQPO,58}O)uQPO,58}OOQS'#D['#D[O)}QTO'#CxO*VQPO,5:VO*[QTO'#D^O*aQPO,58zO*rQPO,5:ZO*yQPO,5:ZOOQS,59},59}OOQS-E7W-E7WOOQS,59h,59hOOQS-E7X-E7XOOQO1G/_1G/_OOQO1G.k1G.kO+OQPO1G.kO%bQTO,59UO%bQTO,59UOOQS1G.i1G.iOOQS-E7Y-E7YO+jQTO1G/qO+zQUO'#CdOOQO,59x,59xOOQO-E7[-E7[O,kQTO1G/uOOQO1G.p1G.pO,{QPO1G.pO-VQPO7+%]O-[QTO7+%^OOQO'#DR'#DROOQO7+%a7+%aO-lQTO7+%bOOQS<<Hw<<HwO.SQPO'#D]O.XQTO'#DnO.oQPO<<HxOOQO'#DS'#DSO.tQPO<<H|OOQS,59w,59wOOQS-E7Z-E7ZOOQSAN>dAN>dO%bQTO'#DTOOQO'#D_'#D_O/PQPOAN>hO/[QPO'#DVOOQOAN>hAN>hO/aQPOAN>hO/fQPO,59oO/mQPO,59oOOQO-E7]-E7]OOQOG24SG24SO/rQPOG24SO/wQPO,59qO/|QPO1G/ZOOQOLD)nLD)nO-[QTO1G/]O-lQTO7+$uOOQO7+$w7+$wOOQO<<Ha<<Ha",
|
||||
stateData: "0U~O!UOS~OPPOQTOgTOhTOiTOkVOtZO!]SO!a_O~OPbOQTOgTOhTOiTOkVOocOqdO!]SOY!ZXZ!ZX[!ZX]!ZXrWX~O_hO!aWX!eWXnWX~PtOYiOZiO[jO]jO~OYiOZiO[jO]jO!a!WX!e!WXn!WX~OQTOgTOhTOiTO!]SO~OPkO~P#yOY!ZXZ!ZX[!ZX]!ZX!a!WX!e!WXn!WX~OPqOmlP~OrtO!a!WX!e!WXn!WX~OPbO~P#yO!axO!exO~OPbOkVOozO~P#yOPbOkVOocOqdOrTa!aTa!eTa!^TanTa~P#yOPPOkVOtZO~P#yO_!ZX`!ZXa!ZXb!ZXc!ZXd!ZXe!ZXf!ZX!^WX~PtO_!PO`!POa!POb!POc!POd!POe!QOf!QO~OYiOZiO[jO]jO~P'mOYiOZiO[jO]jO!^!RO~O!^!ROY!ZXZ!ZX[!ZX]!ZX_!ZX`!ZXa!ZXb!ZXc!ZXd!ZXe!ZXf!ZX~OrtO!^!RO~OPqOmlX~Om!TO~OP!UO~OrtO!aSa!eSa!^SanSa~Om!XO~P'mOm!XO~OYiOZiO[Xi]Xi!aXi!eXi!^XinXi~OPPOkVOtZO!a!]O~P#yOPbOkVOocOqdOrWX!aWX!eWX!^WXnWX~P#yOPPOkVOtZO!a!`O~P#yO!^^im^i~P'mOn!aO~OPPOkVOtZOn!bP~P#yOPPOkVOtZOn!bPx!bPz!bP~P#yO!a!gO~OPPOkVOtZOn!bXx!bXz!bX~P#yOn!iO~On!nOx!jOz!mO~On!sOx!jOz!mO~Om!uO~On!sO~Om!vO~P'mOm!vO~On!wO~O!a!xO~O!a!yO~Oh]~",
|
||||
goto: "*P!ePPPP!f!u#T#Z!u#sPPPP$YPPPPPPPPPPP$fP${PPP#TPP%OP%[%_%hP%lP%O%r%x&Q&W&a&hPPP&n&r'W'j'p(lPP)Y)YP)k)s)sd]Oah!T!X!]!`!c!x!yRoSiXOSaht!T!X!]!`!c!x!yXePgk!U}TOPSZadghijk!P!Q!T!U!X!]!`!c!j!x!ydROah!T!X!]!`!c!x!yQmSQ}iR!OjQoSQwZQ!Y!QR!q!jd]Oah!T!X!]!`!c!x!yWcPgk!URzdRsVe]Oah!T!X!]!`!c!x!yR!_!XQ!f!`Q!z!xR!{!yT!k!f!lQ!o!fR!t!lQaORyaUgPk!UR{gQrVR!SrW!c!]!`!x!yR!h!cSuYpR!WuQ!l!fR!r!lT`OaS^OaQ|hQ![!TQ!^!XZ!b!]!`!c!x!ydYOah!T!X!]!`!c!x!yQpSR!VtXfPgk!UdQOah!T!X!]!`!c!x!yWcPgk!UQlSQvZQzdQ}iQ!OjQ!Y!PQ!Z!QR!p!jdUOah!T!X!]!`!c!x!yfbPZdgijk!P!Q!U!jRnSoWOPadghk!T!U!X!]!`!c!x!yQ!d!]V!e!`!x!ye[Oah!T!X!]!`!c!x!y",
|
||||
nodeNames: "⚠ Identifier Word Program PipeExpr FunctionCall PositionalArg ParenExpr FunctionCallOrIdentifier BinOp operator operator operator operator ConditionalOp operator operator operator operator operator operator operator operator String Number Boolean FunctionDef keyword Params colon end Underscore NamedArg NamedArgPrefix operator IfExpr keyword ThenBlock ThenBlock ElsifExpr keyword ElseExpr keyword Assign",
|
||||
maxTerm: 67,
|
||||
states: ".pQVQaOOO!rQbO'#CdO#SQPO'#CeO#bQPO'#DhO#yQaO'#CcO$_OSO'#CsOOQ`'#Dl'#DlO$mQPO'#DkO%UQaO'#DwOOQ`'#Cy'#CyOOQO'#Di'#DiO%^QPO'#DhO%lQaO'#D{OOQO'#DS'#DSOOQO'#Dh'#DhO&QQPO'#DgOOQ`'#Dg'#DgOOQ`'#D]'#D]QVQaOOOOQ`'#Dk'#DkOOQ`'#Cb'#CbO&YQaO'#DPOOQ`'#Dj'#DjOOQ`'#D^'#D^O&dQbO,58{O'QQaO,59vO%lQaO,59PO%lQaO,59PO'lQbO'#CdO(wQPO'#CeO)XQPO'#DnO)jQPO'#DnOOQO'#Dn'#DnO*eQPO,58}O*jQPO'#DnO*rQaO'#CuO*zQWO'#CvOOOO'#Dq'#DqOOOO'#D_'#D_O+`OSO,59_OOQ`,59_,59_OOQ`'#D`'#D`O+nQaO'#C{O+vQPO,5:cO+{QaO'#DbO,QQPO,58zO,cQPO,5:gO,jQPO,5:gOOQ`,5:R,5:ROOQ`-E7Z-E7ZOOQ`,59k,59kOOQ`-E7[-E7[OOQO1G/b1G/bOOQO1G.k1G.kO,oQPO1G.kO%lQaO,59UO%lQaO,59UOOQ`1G.i1G.iOOOO,59a,59aO#yQaO,59aOOOO,59b,59bOOOO-E7]-E7]OOQ`1G.y1G.yOOQ`-E7^-E7^O-ZQaO1G/}O-bQbO'#CdOOQO,59|,59|OOQO-E7`-E7`O.OQaO1G0ROOQO1G.p1G.pO.VQPO1G.pO.aQPO1G.{O.fQPO7+%iO.kQaO7+%jOOQO'#DU'#DUOOQO7+%m7+%mO.rQaO7+%nOOOO7+$g7+$gOOQ`<<IT<<ITO/PQPO'#DaO/UQaO'#DzO/cQPO<<IUOOQO'#DV'#DVO/hQPO<<IYOOQ`,59{,59{OOQ`-E7_-E7_OOQ`AN>pAN>pO%lQaO'#DWOOQO'#Dc'#DcO/sQPOAN>tO0OQPO'#DYOOQOAN>tAN>tO0TQPOAN>tO0YQPO,59rO0aQPO,59rOOQO-E7a-E7aOOQOG24`G24`O0fQPOG24`O0kQPO,59tO0pQPO1G/^OOQOLD)zLD)zO.kQaO1G/`O.rQaO7+$xOOQO7+$z7+$zOOQO<<Hd<<Hd",
|
||||
stateData: "0x~O!YOS~OPPOQUOkUOlUOnWOw[O!aSO!dTO!m`O~OPcOQUOkUOlUOnWOrdOteO!aSO!dTOY!_XZ!_X[!_X]!_XuWX~O_iO!mWX!qWXqWX~PtOYjOZjO[kO]kO~OYjOZjO[kO]kO!m![X!q![Xq![X~OPlOQUOkUOlUO!aSO!dTO~OhuO!dxO!fsO!gtO~OY!_XZ!_X[!_X]!_X!m![X!q![Xq![X~OPyOpoP~Ou|O!m![X!q![Xq![X~OPcOQUOkUOlUO!aSO!dTO~O!m!QO!q!QO~OnWOr!SO~P%lOnWOrdOteOuTa!mTa!qTa!cTaqTa~P%lOPPOQUOkUOlUOnWOw[O!aSO!dTO~O_!_X`!_Xa!_Xb!_Xc!_Xd!_Xe!_Xf!_X!cWX~PtO_!XO`!XOa!XOb!XOc!XOd!XOe!YOf!YO~OYjOZjO[kO]kO~P(]OYjOZjO[kO]kO!c!bX~OY!_XZ!_X[!_X]!_X_!_X`!_Xa!_Xb!_Xc!_Xd!_Xe!_Xf!_X!c!bX~O!c!ZO~Ou|O!c!bX~OP![O!a!]O~O!d!^O!f!^O!g!^O!h!^O!i!^O!j!^O~OhuO!d!`O!fsO!gtO~OPyOpoX~Op!bO~OP!cO~Ou|O!mSa!qSa!cSaqSa~Op!fO~P(]Op!fO~OYjOZjO[Xi]Xi!mXi!qXi!cXiqXi~O!m!kO~P'QOnWOrdOteOuWX!mWX!qWX!cWXqWX~P%lO!m!nO~P'QO!c^ip^i~P(]O!c!oO~Oq!pO~Oq!nP~P'QOq!nP{!nP}!nP~P'QO!m!vO~Oq!nX{!nX}!nX~P'QOq!xO~Oq!}O{!yO}!|O~Oq#SO{!yO}!|O~Op#UO~Oq#SO~Op#VO~P(]Op#VO~Oq#WO~O!m#XO~O!m#YO~Ok]~",
|
||||
goto: "*y!qPPPP!r#S#c#i#S$SPPPP$jPPPPPPPP#iP$w$wPP${P%bPPP#cPP%eP%q%t%}P&RP%e&X&_&g&m&s&|'TPPP'Z'_'s(W(^)ZP)xPP*OPPPPP*S*SP*e*m*md^Obi!b!f!k!n!r#X#YTpS!]kYOSbi|!]!b!f!k!n!r#X#YXfPhl!c!PUOPS[behijkl!X!Y!]!b!c!f!k!n!r!y#X#YdRObi!b!f!k!n!r#X#YSnS!]Q!VjR!WkSpS!]Q!P[Q!g!YR#Q!yTuTwd^Obi!b!f!k!n!r#X#YWdPhl!cR!SeR{We^Obi!b!f!k!n!r#X#YR!m!fQ!u!nQ#Z#XR#[#YT!z!u!{Q#O!uR#T!{QbOR!RbUhPl!cR!ThQwTR!_wQzWR!azW!r!k!n#X#YR!w!rS}ZrR!e}Q!{!uR#R!{TaObS_ObQ!UiQ!j!bQ!l!fZ!q!k!n!r#X#YdZObi!b!f!k!n!r#X#YSrS!]R!d|XgPhl!cdQObi!b!f!k!n!r#X#YWdPhl!cSmS!]Q!O[Q!SeQ!VjQ!WkQ!g!XQ!h!YR#P!ydVObi!b!f!k!n!r#X#YfcP[ehjkl!X!Y!c!yToS!]QqSR!i!]TvTwoXOPbehil!b!c!f!k!n!r#X#YQ!s!kV!t!n#X#Ye]Obi!b!f!k!n!r#X#Y",
|
||||
nodeNames: "⚠ Identifier Word Program PipeExpr FunctionCall PositionalArg ParenExpr FunctionCallOrIdentifier BinOp operator operator operator operator ConditionalOp operator operator operator operator operator operator operator operator String StringFragment Interpolation StringEscape Number Boolean FunctionDef keyword Params colon end Underscore NamedArg NamedArgPrefix operator IfExpr keyword ThenBlock ThenBlock ElsifExpr keyword ElseExpr keyword Assign",
|
||||
maxTerm: 79,
|
||||
nodeProps: [
|
||||
["closedBy", 29,"end"],
|
||||
["openedBy", 30,"colon"]
|
||||
["closedBy", 32,"end"],
|
||||
["openedBy", 33,"colon"]
|
||||
],
|
||||
propSources: [highlighting],
|
||||
skippedNodes: [0],
|
||||
repeatNodeCount: 6,
|
||||
tokenData: "-}~RoXY#SYZ#Xpq#Sqr#^wx#ixy$Wyz$]z{$b{|$g}!O$l!P!Q%_!Q![$t![!]%d!]!^#X!^!_%i!_!`%v!`!a%{#R#S&Y#T#U&_#U#X&s#X#Y'h#Y#Z*U#Z#]&s#]#^+}#^#c&s#c#d,i#d#h&s#h#i-T#i#o&s#p#q-s~~-x~#XO!U~~#^O!a~~#aP!_!`#d~#iO`~~#lTOw#iwx#{x;'S#i;'S;=`$Q<%lO#i~$QOg~~$TP;=`<%l#i~$]O!]~~$bO!^~~$gOY~~$lO[~~$qP]~!Q![$t~$yQh~!O!P%P!Q![$t~%SP!Q![%V~%[Ph~!Q![%V~%dOZ~~%iOm~~%nPa~!_!`%q~%vOb~~%{O_~~&QPc~!_!`&T~&YOd~~&_Oo~~&bS!_!`&n#T#b&s#b#c&|#c#o&sQ&sOqQQ&vQ!_!`&n#T#o&s~'PS!_!`&n#T#W&s#W#X']#X#o&s~'bQe~!_!`&n#T#o&s~'kU!_!`&n#T#`&s#`#a'}#a#b&s#b#c)j#c#o&sR(QS!_!`&n#T#g&s#g#h(^#h#o&sR(aU!_!`&n#T#X&s#X#Y(s#Y#]&s#]#^)O#^#o&sR(xQzP!_!`&n#T#o&sR)RS!_!`&n#T#Y&s#Y#Z)_#Z#o&sR)dQxP!_!`&n#T#o&s~)mS!_!`&n#T#W&s#W#X)y#X#o&s~*OQn~!_!`&n#T#o&s~*XT!_!`&n#T#U*h#U#b&s#b#c+r#c#o&s~*kS!_!`&n#T#`&s#`#a*w#a#o&s~*zS!_!`&n#T#g&s#g#h+W#h#o&s~+ZS!_!`&n#T#X&s#X#Y+g#Y#o&s~+lQi~!_!`&n#T#o&s~+wQk~!_!`&n#T#o&sR,QS!_!`&n#T#Y&s#Y#Z,^#Z#o&sR,cQtP!_!`&n#T#o&s~,lS!_!`&n#T#f&s#f#g,x#g#o&s~,}Qf~!_!`&n#T#o&s~-WS!_!`&n#T#f&s#f#g-d#g#o&s~-gS!_!`&n#T#i&s#i#j+W#j#o&s~-xOr~~-}O!e~",
|
||||
tokenizers: [0, 1, tokenizer],
|
||||
repeatNodeCount: 7,
|
||||
tokenData: "Hw~R!SOX$_XY$|YZ%gZp$_pq$|qr&Qrt$_tu'Yuw$_wx'_xy'dyz'}z{(h{|)R|}$_}!O)l!O!P$_!P!Q,b!Q![*]![!],{!]!^%g!^!_-f!_!`.p!`!a/Z!a#O$_#O#P0e#P#R$_#R#S0j#S#T$_#T#U1T#U#X2i#X#Y5O#Y#Z<U#Z#]2i#]#^Aa#^#b2i#b#cCR#c#dCx#d#f2i#f#gEj#g#h2i#h#iFa#i#o2i#o#p$_#p#qHX#q;'S$_;'S;=`$v<%l~$_~O$_~~HrS$dUhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_S$yP;=`<%l$__%TUhS!YZOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V%nUhS!mROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V&VWhSOt$_uw$_x!_$_!_!`&o!`#O$_#P;'S$_;'S;=`$v<%lO$_V&vU`RhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~'_O!f~~'dO!d~V'kUhS!aROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V(UUhS!cROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V(oUYRhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V)YU[RhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V)sWhS]ROt$_uw$_x!Q$_!Q![*]![#O$_#P;'S$_;'S;=`$v<%lO$_V*dYhSkROt$_uw$_x!O$_!O!P+S!P!Q$_!Q![*]![#O$_#P;'S$_;'S;=`$v<%lO$_V+XWhSOt$_uw$_x!Q$_!Q![+q![#O$_#P;'S$_;'S;=`$v<%lO$_V+xWhSkROt$_uw$_x!Q$_!Q![+q![#O$_#P;'S$_;'S;=`$v<%lO$_V,iUZRhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_T-SUhSpPOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V-mWaRhSOt$_uw$_x!_$_!_!`.V!`#O$_#P;'S$_;'S;=`$v<%lO$_V.^UbRhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V.wU_RhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V/bWcRhSOt$_uw$_x!_$_!_!`/z!`#O$_#P;'S$_;'S;=`$v<%lO$_V0RUdRhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~0jO!g~V0qUhSrROt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_V1Y[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#b2i#b#c3^#c#o2i#o;'S$_;'S;=`$v<%lO$_U2VUtQhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_U2nYhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_V3c[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#W2i#W#X4X#X#o2i#o;'S$_;'S;=`$v<%lO$_V4`YeRhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_V5T^hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#`2i#`#a6P#a#b2i#b#c:d#c#o2i#o;'S$_;'S;=`$v<%lO$_V6U[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#g2i#g#h6z#h#o2i#o;'S$_;'S;=`$v<%lO$_V7P^hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#X2i#X#Y7{#Y#]2i#]#^8r#^#o2i#o;'S$_;'S;=`$v<%lO$_V8SY}PhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_V8w[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#Y2i#Y#Z9m#Z#o2i#o;'S$_;
'S;=`$v<%lO$_V9tY{PhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_V:i[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#W2i#W#X;_#X#o2i#o;'S$_;'S;=`$v<%lO$_V;fYhSqROt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_V<Z]hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#U=S#U#b2i#b#c@j#c#o2i#o;'S$_;'S;=`$v<%lO$_V=X[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#`2i#`#a=}#a#o2i#o;'S$_;'S;=`$v<%lO$_V>S[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#g2i#g#h>x#h#o2i#o;'S$_;'S;=`$v<%lO$_V>}[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#X2i#X#Y?s#Y#o2i#o;'S$_;'S;=`$v<%lO$_V?zYlRhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_V@qYnRhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_VAf[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#Y2i#Y#ZB[#Z#o2i#o;'S$_;'S;=`$v<%lO$_VBcYwPhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_^CYY!hWhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_VC}[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#f2i#f#gDs#g#o2i#o;'S$_;'S;=`$v<%lO$_VDzYfRhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$_^EqY!jWhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#o2i#o;'S$_;'S;=`$v<%lO$__Fh[!iWhSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#f2i#f#gG^#g#o2i#o;'S$_;'S;=`$v<%lO$_VGc[hSOt$_uw$_x!_$_!_!`2O!`#O$_#P#T$_#T#i2i#i#j>x#j#o2i#o;'S$_;'S;=`$v<%lO$_VH`UuRhSOt$_uw$_x#O$_#P;'S$_;'S;=`$v<%lO$_~HwO!q~",
|
||||
tokenizers: [0, 1, 2, 3, tokenizer],
|
||||
topRules: {"Program":[0,3]},
|
||||
tokenPrec: 693
|
||||
tokenPrec: 727
|
||||
})
|
||||
|
|
|
|||
|
|
@@ -11,14 +11,16 @@ export const tokenizer = new ExternalTokenizer((input: InputStream, stack: Stack
|
|||
|
||||
while (true) {
|
||||
ch = getFullCodePoint(input, pos)
|
||||
if (isWhitespace(ch) || ch === -1) break
|
||||
|
||||
// Words and identifiers end at whitespace, single quotes, or end of input.
|
||||
if (isWhitespace(ch) || ch === 39 /* ' */ || ch === -1) break
|
||||
|
||||
// Certain characters might end a word or identifier if they are followed by whitespace.
|
||||
// This allows things like `a = hello; 2` or a = (basename ./file.txt)
|
||||
// to work as expected.
|
||||
if ((canBeWord && (ch === 59 /* ; */ || ch === 41)) /* ) */ || ch === 58 /* : */) {
|
||||
if (canBeWord && (ch === 59 /* ; */ || ch === 41 /* ) */ || ch === 58) /* : */) {
|
||||
const nextCh = getFullCodePoint(input, pos + 1)
|
||||
if (isWhitespace(nextCh) || nextCh === -1) {
|
||||
if (isWhitespace(nextCh) || nextCh === 39 /* ' */ || nextCh === -1) {
|
||||
break
|
||||
}
|
||||
}
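The boundary rule in this hunk can be exercised on plain strings with a small standalone sketch. It is a simplification under stated assumptions, not the real tokenizer: `scanWordEnd` is a hypothetical name, `isWhitespace` here is a guess at the helper's behavior, `canBeWord` is passed in rather than tracked per character, and the scan advances by UTF-16 code unit instead of by full code point.

```typescript
// Assumption: the real isWhitespace helper may accept a different set of characters.
function isWhitespace(ch: number): boolean {
  return ch === 32 /* space */ || ch === 9 /* \t */ || ch === 10 /* \n */ || ch === 13 /* \r */
}

// Returns the offset at which a word or identifier starting at `start` ends.
function scanWordEnd(text: string, start: number = 0, canBeWord: boolean = true): number {
  const at = (index: number): number => (index < text.length ? text.charCodeAt(index) : -1)
  let pos = start
  while (true) {
    const ch = at(pos)

    // Words and identifiers end at whitespace, single quotes, or end of input.
    if (isWhitespace(ch) || ch === 39 /* ' */ || ch === -1) break

    // `;`, `)` and `:` end a word only when followed by whitespace, a quote, or end of input.
    if (canBeWord && (ch === 59 /* ; */ || ch === 41 /* ) */ || ch === 58 /* : */)) {
      const nextCh = at(pos + 1)
      if (isWhitespace(nextCh) || nextCh === 39 /* ' */ || nextCh === -1) break
    }
    pos++
  }
  return pos
}

console.log(scanWordEnd("name' rest")) // 4: stops at the closing quote
console.log(scanWordEnd('hello; 2')) // 5: `;` followed by a space ends the word
console.log(scanWordEnd('a:b')) // 3: `:` followed by a letter stays inside the word
```

Stopping at the single quote is what lets `'hello $name'` end the identifier `name` at the closing quote instead of swallowing it.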
|
||||
|
|
|
|||
244
today.md
|
|
@@ -1,244 +0,0 @@
|
|||
# 🌟 Modern Language Inspiration & Implementation Plan
|
||||
|
||||
## Language Research Summary
|
||||
|
||||
### Pipe Operators Across Languages
|
||||
|
||||
| Language | Syntax | Placeholder | Notes |
|
||||
|----------|--------|-------------|-------|
|
||||
| **Gleam** | `\|>` | `_` | Placeholder can go anywhere, enables function capture |
|
||||
| **Elixir** | `\|>` | `&1`, `&2` | Always first arg by default, numbered placeholders |
|
||||
| **Nushell** | `\|` | structured data | Pipes structured data, not just text |
|
||||
| **F#** | `\|>` | none | Always first argument |
|
||||
| **Raku** | `==>` | `*` | Star placeholder for positioning |
|
||||
|
||||
### Conditional Syntax
|
||||
|
||||
| Language | Single-line | Multi-line | Returns Value |
|
||||
|----------|------------|------------|---------------|
|
||||
| **Lua** | `if x then y end` | `if..elseif..else..end` | No (statement) |
|
||||
| **Luau** | `if x then y else z` | Same | Yes (expression) |
|
||||
| **Ruby** | `x = y if condition` | `if..elsif..else..end` | Yes |
|
||||
| **Python** | `y if x else z` | `if..elif..else:` | Yes |
|
||||
| **Gleam** | N/A | `case` expressions | Yes |
|
||||
|
||||
## 🍤 Shrimp Design Decisions
|
||||
|
||||
### Pipe Operator with Placeholder (`|`)
|
||||
|
||||
**Syntax Choice: `|` with `_` placeholder**
|
||||
|
||||
```shrimp
|
||||
# Basic pipe with placeholder
|
||||
"hello world" | upcase _
|
||||
"log.txt" | tail _ lines=10
|
||||
|
||||
# Placeholder positioning flexibility
|
||||
"error.log" | grep "ERROR" _ | head _ 5
|
||||
data | process format="json" input=_
|
||||
|
||||
# Multiple placeholders (future consideration)
|
||||
value | combine _ _
|
||||
```
|
||||
|
||||
**Why this design:**
|
||||
- **`|` over `|>`**: Cleaner, more shell-like
|
||||
- **`_` placeholder**: Explicit, readable, flexible positioning
|
||||
- **Gleam-inspired**: Best of functional programming meets shell scripting
|
||||
|
||||
### Conditionals
|
||||
|
||||
**Multi-line syntax:**
|
||||
```shrimp
|
||||
if condition:
|
||||
expression
|
||||
elsif other-condition:
|
||||
expression
|
||||
else:
|
||||
expression
|
||||
end
|
||||
```
|
||||
|
||||
**Single-line syntax (expression form):**
|
||||
```shrimp
|
||||
result = if x = 5: "five"
|
||||
# Returns nil when false
|
||||
|
||||
result = if x > 0: "positive" else: "non-positive"
|
||||
# Explicit else for non-nil guarantee
|
||||
```
|
||||
|
||||
**Design choices:**
|
||||
- **`elsif` not `else if`**: Avoids nested parsing complexity (Ruby-style)
|
||||
- **`:` after conditions**: Consistent with function definitions
|
||||
- **`=` for equality**: Context-sensitive (assignment vs comparison)
|
||||
- **`nil` for no-value**: Short, clear, well-understood
|
||||
- **Expressions return values**: Everything is an expression philosophy
|
||||
|
||||
## 📝 Implementation Plan
|
||||
|
||||
### Phase 1: Grammar Foundation
|
||||
|
||||
**1.1 Add Tokens**
|
||||
```grammar
|
||||
@tokens {
|
||||
// Existing...
|
||||
"|" // Pipe operator
|
||||
"_" // Placeholder
|
||||
"if" // Conditionals
|
||||
"elsif"
|
||||
"else"
|
||||
"nil" // Null value
|
||||
}
|
||||
```
|
||||
|
||||
**1.2 Precedence Updates**
|
||||
```grammar
|
||||
@precedence {
|
||||
multiplicative @left,
|
||||
additive @left,
|
||||
pipe @left, // After arithmetic, before assignment
|
||||
assignment @right,
|
||||
call
|
||||
}
|
||||
```
|
||||
|
||||
### Phase 2: Grammar Rules
|
||||
|
||||
**2.1 Pipe Expression**
|
||||
```grammar
|
||||
PipeExpr {
|
||||
expression !pipe "|" PipeTarget
|
||||
}
|
||||
|
||||
PipeTarget {
|
||||
FunctionCallWithPlaceholder |
|
||||
FunctionCall // Error in compiler if no placeholder
|
||||
}
|
||||
|
||||
FunctionCallWithPlaceholder {
|
||||
Identifier PlaceholderArg+
|
||||
}
|
||||
|
||||
PlaceholderArg {
|
||||
PositionalArg | NamedArg | Placeholder
|
||||
}
|
||||
|
||||
Placeholder {
|
||||
"_"
|
||||
}
|
||||
```
|
||||
|
||||
**2.2 Conditional Expression**
|
||||
```grammar
|
||||
Conditional {
|
||||
SingleLineIf | MultiLineIf
|
||||
}
|
||||
|
||||
SingleLineIf {
|
||||
"if" Comparison ":" expression ElseClause?
|
||||
}
|
||||
|
||||
MultiLineIf {
|
||||
"if" Comparison ":" newlineOrSemicolon
|
||||
(line newlineOrSemicolon)*
|
||||
ElsifClause*
|
||||
ElseClause?
|
||||
"end"
|
||||
}
|
||||
|
||||
ElsifClause {
|
||||
"elsif" Comparison ":" newlineOrSemicolon
|
||||
(line newlineOrSemicolon)*
|
||||
}
|
||||
|
||||
ElseClause {
|
||||
"else" ":" (expression | (newlineOrSemicolon (line newlineOrSemicolon)*))
|
||||
}
|
||||
|
||||
Comparison {
|
||||
expression "=" expression // Context-sensitive in if/elsif
|
||||
}
|
||||
```
|
||||
|
||||
**2.3 Update line rule**
|
||||
```grammar
|
||||
line {
|
||||
PipeExpr |
|
||||
Conditional |
|
||||
FunctionCall |
|
||||
// ... existing rules
|
||||
}
|
||||
```
|
||||
|
||||
### Phase 3: Test Cases
|
||||
|
||||
**Pipe Tests:**
|
||||
```shrimp
|
||||
# Basic placeholder
|
||||
"hello" | upcase _
|
||||
|
||||
# Named arguments with placeholder
|
||||
"file.txt" | process _ format="json"
|
||||
|
||||
# Chained pipes
|
||||
data | filter _ "error" | count _
|
||||
|
||||
# Placeholder in different positions
|
||||
5 | subtract 10 _ # 10 - 5 = 5
|
||||
```
|
||||
|
||||
**Conditional Tests:**
|
||||
```shrimp
|
||||
# Single line
|
||||
x = if n = 0: "zero"
|
||||
|
||||
# Single line with else
|
||||
sign = if n > 0: "positive" else: "negative"
|
||||
|
||||
# Multi-line
|
||||
if score > 90:
|
||||
grade = "A"
|
||||
elsif score > 80:
|
||||
grade = "B"
|
||||
else:
|
||||
grade = "C"
|
||||
end
|
||||
|
||||
# Nested conditionals
|
||||
if x > 0:
|
||||
if y > 0:
|
||||
quadrant = 1
|
||||
end
|
||||
end
|
||||
```
|
||||
|
||||
### Phase 4: Compiler Implementation
|
||||
|
||||
**4.1 PipeExpr Handling**
|
||||
- Find placeholder position in right side
|
||||
- Insert left side value at placeholder
|
||||
- Error if no placeholder found
|
||||
|
||||
**4.2 Conditional Compilation**
|
||||
- Generate JUMP bytecode for branching
|
||||
- Handle nil returns for missing else
|
||||
- Context-aware `=` parsing
|
||||
|
||||
## 🎯 Key Decision Points
|
||||
|
||||
1. **Placeholder syntax**: `_` vs `$` vs `?` → **Choose `_` (Gleam-like)**
|
||||
2. **Pipe operator**: `|` vs `|>` vs `>>` → **Choose `|` (cleaner)**
|
||||
3. **Nil naming**: `nil` vs `null` vs `none` → **Choose `nil` (Ruby-like)**
|
||||
4. **Equality**: Keep `=` context-sensitive or add `==`? → **Keep `=` (simpler)**
|
||||
5. **Single-line if**: Require else or default nil? → **Default nil (flexible)**
|
||||
|
||||
## 🚀 Next Steps
|
||||
|
||||
1. Update grammar file with new tokens and rules
|
||||
2. Write comprehensive test cases
|
||||
3. Implement compiler support for pipes
|
||||
4. Implement conditional bytecode generation
|
||||
5. Test edge cases and error handling
|
||||
|
||||
This plan combines the best ideas from modern languages while maintaining Shrimp's shell-like simplicity and functional philosophy!
|
||||