Patterns
Overview
vibecode
{"vibecode": { "section": "overview", "role": "introduces Caspian pattern matching, with LPeg as the default engine and Lua patterns as a lightweight alternative", "key_concepts": ["Lua_patterns", "default_engine", "no_alternation", "no_named_captures", "no_quantifier_range", "restructure_in_caspian"] }}
Caspian uses LPeg as the default pattern engine. PEG (Parsing Expression Grammar) is strictly more powerful than traditional regex: it supports alternation (|), recursion, named captures, and arbitrary lookahead. LPeg adds about 50 KB to the install.
Lua's built-in patterns are available as a lightweight engine option for space-constrained deployments — they cover character classes, quantifiers, captures, and anchors, but lack alternation, named captures, lookahead, or {n,m} quantifiers. With Lua patterns active, restructure the logic in Caspian rather than reaching for a more powerful pattern language.
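As a rough illustration of what "restructure the logic" can mean when alternation is unavailable, the sketch below uses plain Lua (the host language of the built-in engine) rather than Caspian. The helper name `match_any` is hypothetical and not part of any Caspian or Lua API; it simply tries each alternative in turn where a regex would use `cat|dog`.

```lua
-- Hypothetical helper: emulate the regex `cat|dog` with Lua patterns,
-- which have no alternation operator, by trying each pattern in order.
local function match_any(subject, patterns)
  for _, pattern in ipairs(patterns) do
    local found = string.match(subject, pattern)
    if found then
      return found
    end
  end
  return nil
end

local pets = { "cat", "dog" }
print(match_any("hot dog stand", pets))  --> dog
print(match_any("no pets here", pets))   --> nil
```

Note that this returns the first pattern in the list that matches anywhere, not the leftmost match in the subject, so it is not a drop-in replacement for regex alternation in every case.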
Pattern Engine
vibecode
{"vibecode": { "section": "pattern_engine", "role": "documents the swappable central pattern engine object and available alternative engines", "v1_default": "lpeg", "v1_lightweight_alternative": "lua_patterns_built_in", "key_concepts": ["swappable_engine", "central_engine_object", "lpeg", "lua_patterns", "PCRE2", "RE2", "named_captures", "transparent_routing"] }}
The pattern engine is not hardwired. A central engine object tracks which pattern library is in use, and all pattern operations (match, match?, etc.) route through it. Code that calls $string.match(...) does not need to know which engine is active.
Design around this object from the start: when adding pattern-related features, direct them through the engine object rather than calling the active library's pattern functions directly. This keeps the engine swappable.
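One way to picture the routing described above is the following Lua sketch. Every name in it (`PatternEngine`, `use`, `matches`, the backend tables) is an assumption made for illustration; the source does not specify the engine object's actual interface. The second backend does literal substring search only and exists purely to show a swap.

```lua
-- Sketch of a central engine object. All names here are hypothetical.
local PatternEngine = {}
PatternEngine.__index = PatternEngine

-- Backend wrapping Lua's built-in patterns.
-- find returns matched text plus 1-based start/end positions, or nil.
local lua_patterns = {
  name = "lua_patterns",
  find = function(subject, pattern)
    local first, last = string.find(subject, pattern)
    if not first then return nil end
    return string.sub(subject, first, last), first, last
  end,
}

-- Stand-in for a second engine: treats the pattern as literal text.
local plain_text = {
  name = "plain_text",
  find = function(subject, pattern)
    local first, last = string.find(subject, pattern, 1, true)
    if not first then return nil end
    return string.sub(subject, first, last), first, last
  end,
}

function PatternEngine.new(backend)
  return setmetatable({ backend = backend }, PatternEngine)
end

-- Swap the active library without touching call sites.
function PatternEngine:use(backend)
  self.backend = backend
end

-- Every pattern operation goes through the active backend.
function PatternEngine:match(subject, pattern)
  return self.backend.find(subject, pattern)
end

-- Boolean analogue of match? in this sketch.
function PatternEngine:matches(subject, pattern)
  return self.backend.find(subject, pattern) ~= nil
end

local engine = PatternEngine.new(lua_patterns)
print(engine:match("build 42", "%d+"))    --> 42, 7, 8 (tab-separated)
engine:use(plain_text)
print(engine:matches("build 42", "%d+"))  --> false
```

Call sites only ever see `engine:match` and `engine:matches`, which is the property the design guidance above is protecting.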
Available engines
| Engine | Size | Alternation (`\|`) | Named captures | Lookahead | Notes |
|---|---|---|---|---|---|
| LPeg (default in V1) | ~50 KB | Yes | Yes | Yes | PEG library; strictly more powerful than regex. See lua-dependencies.md § LPeg. |
| Lua patterns (built-in) | 0 KB additional | No | No | No | Lua's own pattern library. Limited but free — already part of every Lua install. |
| PCRE2 | ~500 KB | Yes | Yes | Yes | Full Perl-compatible regex. Pluggable when full PCRE syntax is wanted. |
| RE2 | ~1–2 MB | Yes | Yes | No (by design) | Google's linear-time-guaranteed regex. Pluggable when DOS-resistance matters. |
V1 ships LPeg as the default engine. Lua patterns remain available as the lightweight fallback when the program or runtime is space-constrained and alternation (|) isn't needed. PCRE2 and RE2 are plugin options for environments with no install-size constraint.
Features available depend on which engine is active. With LPeg, you get alternation, recursion, and full PEG expressiveness. With Lua patterns, you get character classes and basic quantifiers — restructure logic in Caspian for anything that would need |. Plug in PCRE2 or RE2 for richer regex features.
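Because available features vary by engine, code that cares can check before relying on one. The Lua sketch below copies the flags from the Available engines table into a lookup; the table layout, the key names, and the `supports` function are all hypothetical and only illustrate the idea.

```lua
-- Capability flags copied from the Available engines table.
-- Key names and structure are assumptions for illustration.
local capabilities = {
  lpeg         = { alternation = true,  named_captures = true,  lookahead = true  },
  lua_patterns = { alternation = false, named_captures = false, lookahead = false },
  pcre2        = { alternation = true,  named_captures = true,  lookahead = true  },
  re2          = { alternation = true,  named_captures = true,  lookahead = false },
}

-- Returns true only when the engine is known and lists the feature.
local function supports(engine_name, feature)
  local flags = capabilities[engine_name]
  return flags ~= nil and flags[feature] == true
end

print(supports("lpeg", "alternation"))          --> true
print(supports("lua_patterns", "alternation"))  --> false
print(supports("re2", "lookahead"))             --> false
```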
Pattern syntax
Open: the canonical Caspian pattern syntax isn't pinned yet. With LPeg as the V1 default engine, there are two paths:
1. Caspian regex syntax: define a regex-style string syntax (resembling PCRE: `.`, `*`, `+`, `?`, `|`, `[...]`, `(...)`, `\d`/`\w`/`\s`, etc.) that we compile to LPeg patterns internally. Users get familiar regex; LPeg powers it under the hood. Requires a ~100–200 line regex-to-LPeg compiler.
2. Native LPeg syntax: expose LPeg's own pattern language directly. Very powerful but unfamiliar to users coming from regex; uses `/` for alternation, not `|`.
Path (1) is the more user-friendly default and aligns with "we want good regular expressions." Path (2) is the smallest implementation surface. Decide before V1 ships.
The syntax table below is the Lua-patterns engine reference (still available as the lightweight alternative — see Available engines):
| Pattern | Meaning |
|---|---|
| `%a` | Letters |
| `%d` | Digits |
| `%l` | Lowercase letters |
| `%u` | Uppercase letters |
| `%s` | Whitespace |
| `%w` | Alphanumeric |
| `%p` | Punctuation |
| `.` | Any character |
| `*` | 0 or more (greedy) |
| `+` | 1 or more (greedy) |
| `-` | 0 or more (lazy) |
| `?` | 0 or 1 |
| `^` | Anchor to start |
| `$` | Anchor to end |
| `[set]` | Character set |
| `()` | Capture |
| (no alternation) | Use multiple `.match?` calls or switch engines |
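To make the reference table concrete, here are a few of its entries exercised through Lua's standard `string.match`. This is plain Lua rather than Caspian, shown only to illustrate how the underlying pattern syntax behaves; the sample strings are made up.

```lua
-- Character class with a greedy quantifier.
print(string.match("order 42 shipped", "%d+"))   --> 42

-- Captures: only the parenthesised parts are returned.
print(string.match("key=value", "(%w+)=(%w+)"))  --> key, value (tab-separated)

-- Greedy * versus lazy -.
print(string.match("<a><b>", "<.*>"))            --> <a><b>
print(string.match("<a><b>", "<.->"))            --> <a>

-- Anchors: ^ and $ pin the match to both ends of the subject.
print(string.match("abc123", "^%a+$"))           --> nil
print(string.match("abc", "^%a+$"))              --> abc

-- Optional item and a character set (a literal - is escaped as %-).
print(string.match("colour", "colou?r"))         --> colour
print(string.match("x-ray", "[%a%-]+"))          --> x-ray
```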
Methods
vibecode
{"vibecode": { "section": "methods", "role": "documents match, match?, and chaining methods on strings", "key_concepts": ["match", "match_boolean", "chaining", "negation", "null_safe_navigation", "Match_object"] }}
$string.match(pattern)
Returns a Match object if the pattern matches, or null if it does not. Use safe navigation to handle the null case:
$string.match('pattern')&.text
$string.match?(pattern)
Returns a boolean. Does not allocate a match object — use this when you only need to know whether the pattern matched:
if $string.match?('%d+')
end
Chaining
.match() can be chained. Each call narrows the result, operating on the text of the previous match:
$string.match('blah blah').match('blue')
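The narrowing behaviour can be sketched in Lua as a match result that itself offers a `match` method operating on its own text. The `Match` table and the free function `match` below are hypothetical stand-ins, not Caspian's implementation, and the sample string is invented.

```lua
-- Hypothetical Match type whose match method searches only its own text.
local Match = {}
Match.__index = Match

-- Returns a Match for the first occurrence of pattern, or nil.
local function match(subject, pattern)
  local first, last = string.find(subject, pattern)
  if not first then return nil end
  return setmetatable({ text = string.sub(subject, first, last) }, Match)
end

-- Chained call: narrows to a match inside the previous match's text.
function Match:match(pattern)
  return match(self.text, pattern)
end

local m = match("sky: light blue, sea: dark blue", "sky: [%a ]+")
print(m.text)                --> sky: light blue
print(m:match("blue").text)  --> blue
```

In this sketch a failed first match returns nil, so a chained call on it would raise an error; in Caspian that is the case safe navigation (`&.`) is described as handling.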
Negation
$string.match('pattern', not:true)
Match Object
vibecode
{"vibecode": { "section": "match_object", "role": "documents all properties of the Match object returned by string.match", "key_concepts": ["Match.text", "Match.start", "Match.end", "Match.groups", "Match.matches", "Match.count", "string_conversion"] }}
| Property | Description |
|---|---|
| `$match.text` | The full matched string |
| `$match.start` | Start position in the original string |
| `$match.end` | End position in the original string |
| `$match.groups` | Array of captured groups, empty if none |
| `$match.groups[0]` | First captured group by position |
| `$match.matches` | All matches as an array of Match objects |
| `$match.count` | Number of matches |
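As a minimal sketch of how some of these properties could be populated on top of Lua's built-in patterns, the function below builds a table with `text`, `start`, `end`, and `groups`. It is an assumption-laden illustration, not Caspian's implementation: positions and group indices are 1-based here because that is what Lua reports, whereas the table above shows `groups[0]` for the first group, and `matches` and `count` are not covered.

```lua
-- Hypothetical Match constructor over Lua's built-in patterns.
local function match(subject, pattern)
  -- string.find returns start, end, then any captures.
  local result = { string.find(subject, pattern) }
  local first, last = result[1], result[2]
  if not first then
    return nil                       -- no match (null in Caspian terms)
  end
  local groups = {}
  for i = 3, #result do
    groups[#groups + 1] = result[i]  -- explicit captures, in order
  end
  return {
    text = string.sub(subject, first, last),
    start = first,                   -- 1-based, as Lua reports it
    ["end"] = last,                  -- "end" is a Lua keyword, hence brackets
    groups = groups,
  }
end

local m = match("id=abc123", "(%a+)(%d+)")
print(m.text, m.start, m["end"])   --> abc123, 4, 9 (tab-separated)
print(m.groups[1], m.groups[2])    --> abc, 123 (tab-separated)
print(match("id=abc123", "%s"))    --> nil
```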
String Conversion
A Match object converts to its matched text when used in a string context:
$string = 'abc123def'
$match = $string.match('%d+')
puts $match # "123"
$result = 'value: ' + $match
# "value: 123"
If $match is null, it behaves according to standard null string conversion rules.
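In a Lua-hosted runtime, one plausible way to get this behaviour is with the `__tostring` and `__concat` metamethods. The sketch below is an assumption about how it could be done, not a description of Caspian's internals; `Match` and `match` are hypothetical names, and Lua's `..` stands in for Caspian's `+` on strings.

```lua
-- Hypothetical Match type that converts to its matched text.
local Match = {}
Match.__index = Match

-- Used by print and tostring.
Match.__tostring = function(m)
  return m.text
end

-- Fires when either operand of .. is a Match.
Match.__concat = function(a, b)
  return tostring(a) .. tostring(b)
end

-- Returns a Match wrapping the matched text, or nil when nothing matches.
local function match(subject, pattern)
  local text = string.match(subject, pattern)
  if text == nil then return nil end
  return setmetatable({ text = text }, Match)
end

local m = match("abc123def", "%d+")
print(m)               --> 123
print("value: " .. m)  --> value: 123
```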