Common parser engine
vibecode
{"vibecode": { "doc": "parser-engine", "role": "forward-direction note: v1 keeps parsers hand-rolled and small; if the count grows to ~4-5 distinct parsers, revisit options like bundling LPeg or building a shared engine", "key_concepts": ["hand_rolled_parsers", "lpeg_bundling_option", "common_engine_trigger", "schema_driven_alternative"], "status": "deferred" }}
Context
In v1, each parser is hand-rolled:
- JSON parser: ~100 lines, simple.
- CSS selector parser: ~200–400 lines.
- HTML parser (Uma): schema-driven, ~1500 lines, but schemas (like html5.json) are easy enough for developers to write.
- Caspian's own parser: lives in the Lua bootstrap, not in the standard library.
The hand-rolled approach keeps each parser purpose-built and small. Schemas (the Uma model) cover the "user-defined markup language" case without requiring grammar authoring.
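To make the schema idea concrete, here is a minimal sketch in Python (not Caspian). The schema shape (`void_tags`), the function name `parse_markup`, and the node layout are all hypothetical and chosen only for illustration; they are not Uma's actual format or API. The point it tries to show is that the parsing loop stays generic while a small piece of data decides how tags behave, so a new markup language needs a new schema rather than a new grammar.

```python
import re

# Hypothetical schema shape for illustration; not the real html5.json layout.
SCHEMA = {"void_tags": ["br", "hr"]}

# Matches an opening tag, a closing tag, or a run of text (attributes omitted).
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)")

def parse_markup(text, schema):
    """Build a tree of {'tag', 'children'} nodes, guided only by the schema."""
    void_tags = set(schema["void_tags"])
    root = {"tag": "#root", "children": []}
    stack = [root]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        closing, tag, chars = m.groups()
        if chars is not None:
            # Plain text becomes a child of the currently open element.
            stack[-1]["children"].append(chars)
        elif closing:
            if len(stack) == 1 or stack[-1]["tag"] != tag:
                raise ValueError(f"unexpected </{tag}> at offset {pos}")
            stack.pop()
        else:
            node = {"tag": tag, "children": []}
            stack[-1]["children"].append(node)
            # The schema, not the parser code, decides which tags never nest.
            if tag not in void_tags:
                stack.append(node)
        pos = m.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1]['tag']}>")
    return root

tree = parse_markup("<p>a<br>b</p>", SCHEMA)
```

Swapping in a different `void_tags` list changes how the same input nests, without touching the parsing loop.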
When to revisit
If we keep accumulating hand-rolled parsers (say, a few more for new file formats, query languages, or domain-specific syntaxes), the duplication cost adds up. At some point a common parser engine pays for itself.
Trigger to reconsider: 4 or 5 distinct hand-rolled parsers in the framework, with the prospect of more.
Options if we revisit
Bundle LPeg
- Roberto Ierusalimschy's PEG engine for Lua. Battle-tested, ~80–100 KB compiled, ~15 years of maintenance.
- Pros: don't write a parsing engine ourselves; well-understood PEG semantics; fast.
- Cons: native dependency (more binary surface); LPeg is C, so grammar errors might surface awkwardly across the Caspian/Lua boundary.
Roll our own PEG (or parser-combinator) engine in Caspian
- ~500–1500 lines of Caspian for a usable engine.
- Pros: stays in our ecosystem; no native dependency; we control the semantics (good error messages, debug hooks).
- Cons: a substantial chunk of code to write and maintain.
- Likely the right choice if we revisit — keeps the framework self-contained.
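As a rough illustration of what "we control the semantics" could mean, the sketch below is a tiny PEG-style recognizer in Python (not Caspian). The grammar encoding (tuples tagged `lit`, `seq`, `choice`, `star`, `ref`) and the name `peg_parse` are assumptions made for this example, not a proposed design. It only recognizes input (no captures), and it tracks the farthest failure position, one common way to produce readable error messages.

```python
def peg_parse(grammar, start, text):
    """Return (True, None) on a full match, else (False, error_message)."""
    # Farthest position at which any terminal failed, plus what was expected.
    farthest = {"pos": 0, "expected": set()}

    def fail(pos, what):
        if pos > farthest["pos"]:
            farthest["pos"] = pos
            farthest["expected"] = {what}
        elif pos == farthest["pos"]:
            farthest["expected"].add(what)
        return None

    def match(expr, pos):
        kind = expr[0]
        if kind == "lit":
            s = expr[1]
            if text.startswith(s, pos):
                return pos + len(s)
            return fail(pos, repr(s))
        if kind == "seq":
            for sub in expr[1:]:
                pos = match(sub, pos)
                if pos is None:
                    return None
            return pos
        if kind == "choice":
            # PEG ordered choice: the first alternative that matches wins.
            for sub in expr[1:]:
                end = match(sub, pos)
                if end is not None:
                    return end
            return None
        if kind == "star":
            # Greedy repetition; stop on failure or on a zero-width match.
            while True:
                end = match(expr[1], pos)
                if end is None or end == pos:
                    return pos
                pos = end
        if kind == "ref":
            return match(grammar[expr[1]], pos)
        raise ValueError(f"unknown expression kind: {kind!r}")

    end = match(grammar[start], 0)
    if end == len(text):
        return True, None
    if end is not None:
        fail(end, "end of input")
    expected = ", ".join(sorted(farthest["expected"]))
    return False, f"parse error at offset {farthest['pos']}: expected {expected}"

# Illustrative grammar: single digits joined by '+', e.g. "1+2+3".
GRAMMAR = {
    "sum": ("seq", ("ref", "digit"),
            ("star", ("seq", ("lit", "+"), ("ref", "digit")))),
    "digit": ("choice", *[("lit", d) for d in "0123456789"]),
}

ok, err = peg_parse(GRAMMAR, "sum", "1+2+3")
bad_ok, bad_err = peg_parse(GRAMMAR, "sum", "1+x")
```

A real engine would also need captures, memoization or left-recursion handling, and debug hooks, which is where most of the estimated line count would go.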
Parser combinators
- Same effective result as PEG with a different API style.
- Composable primitives (literal, alternation, sequence, etc.) combined into parsers as data structures.
- Pros: very ergonomic for users defining their own languages; easy to compose.
- Cons: parser-combinator engines can be slow for large inputs without aggressive optimization.
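The primitives listed above can be sketched in a few lines. This is a minimal Python illustration (not Caspian), and the names `literal`, `sequence`, `alternation`, and `many` are chosen for this example only. It represents each parser as a closure rather than as a data structure, which is the simpler of the two styles to show; a data-structure representation would expose the same primitives.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]

def literal(s: str) -> Parser:
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def sequence(*parsers: Parser) -> Parser:
    # All parsers must match, one after another; values are collected in a list.
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alternation(*parsers: Parser) -> Parser:
    # Ordered choice: the first parser that succeeds wins.
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p: Parser) -> Parser:
    # Zero or more repetitions; stops on failure or on a zero-width match.
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

# Composing the primitives into a parser for lists like "[true,false]".
boolean = alternation(literal("true"), literal("false"))
bool_list = sequence(
    literal("["), boolean, many(sequence(literal(","), boolean)), literal("]")
)

good = bool_list("[true,false]", 0)
bad = bool_list("[true,maybe]", 0)
```

The ergonomics come from composition: `bool_list` is built entirely from smaller parsers. The performance concern is also visible here, since every combinator adds a function call per input position.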
Constraint: don't displace the schema model
Whatever engine lands, Uma's schema-driven approach stays the user-facing way to define tag-based markup languages. Writing html5.json (or a sibling for another markup) must remain low-learning-curve. A parser engine is for the framework's own use, not a substitute for the schema config.
If a future engine could also be exposed as %utils.peg.parse(grammar, string) or similar, for use cases where a real grammar is appropriate (mini-languages, DSLs, custom query syntaxes), that's a bonus. But the schema-as-config story stays the headline.