Logos Version Migration Guide
This page contains guidance for migrating between versions of logos that have major breaking changes.
Changes in 0.16.0
Logos 0.16.0 was a very large update. As of this writing, the PR has changed over 100 files and touched over 1000 lines of code. It fixed a number of long-standing issues related to backtracking and the soundness of the matching state machine.
The update also added some major new features and a handful of breaking changes.
New Features
- Dot repetitions such as `.*` and `.+` are now supported. Because of the related performance pitfalls, they are disallowed by default, but you can use them if you pass the attribute argument `allow_greedy = true` or if you make them non-greedy. For more information, see Common performance pitfalls.
- Logos now precisely follows regex match semantics. Before 0.16.0, repetitions were matched greedily and never gave characters back, which could produce no match where a match should have been possible. For example, in 0.15.1 it is impossible to match the pattern `a*a`, because every `a` byte is consumed by the repetition. This irregular behavior has been fixed in 0.16.0. The behavior should now be identical to the `regex` crate, with the following assumptions:
  - Every pattern behaves as if it has a start-of-input anchor (`^`) prepended to it.
  - Unicode word boundaries, some lookaround, and other advanced features not supported by the DFA regex engine cause a compile-time error, because they cannot be matched by the state machine that Logos generates.
- The error token semantics are now precisely defined. See Error semantics.
- A new `state_machine_codegen` feature was added. If you are experiencing stack overflows, enabling this feature will solve them. It is slower than the default tail-call codegen, but it never overflows the stack. See State machine codegen.
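To make the `a*a` example concrete, the sketch below contrasts the two behaviors on a few inputs. It is a std-only model, not Logos code: the function names are hypothetical, and the second function uses backtracking purely for clarity, whereas Logos compiles patterns into a state machine. Both functions treat the pattern as anchored at the start of input and return the length of the match, if any.

```rust
/// Hypothetical model of the pre-0.16.0 behavior for `a*a`: the repetition
/// consumes every leading `a` and never gives any back.
fn match_without_backtracking(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    // `a*` eats the whole run of `a` bytes.
    let run = bytes.iter().take_while(|&&b| b == b'a').count();
    // The trailing `a` must now match the next byte, which by construction
    // is not an `a`, so this branch never succeeds.
    if bytes.get(run) == Some(&b'a') {
        Some(run + 1)
    } else {
        None
    }
}

/// Regex semantics for `a*a`, anchored at the start of input: the repetition
/// may give characters back so that the rest of the pattern can match.
fn match_with_backtracking(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    let run = bytes.iter().take_while(|&&b| b == b'a').count();
    // Try the longest repetition first, then progressively shorter ones.
    for k in (0..=run).rev() {
        if bytes.get(k) == Some(&b'a') {
            return Some(k + 1);
        }
    }
    None
}

fn main() {
    for input in ["aaa", "ab", "b"] {
        println!(
            "{input:?}: old = {:?}, new = {:?}",
            match_without_backtracking(input),
            match_with_backtracking(input)
        );
    }
}
```

On `"aaa"` the old model finds no match at all, while regex semantics match all three bytes.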
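The difference between the two codegen styles can be illustrated with a hand-written lexer fragment. This is not the code Logos generates; it is a minimal std-only sketch with a single hypothetical state ("inside a run of digits"), and both function names are made up. In the first style every transition is a function call, and because Rust does not guarantee tail-call elimination, each call may add a stack frame. In the second style the state is an enum value and one loop drives all transitions, so stack usage stays constant regardless of input length.

```rust
/// Style 1 (tail-call style): each state is a function that calls the next
/// state. Without guaranteed tail-call elimination, a long enough input can
/// overflow the stack.
fn digits_recursive(input: &[u8], pos: usize) -> usize {
    match input.get(pos) {
        Some(b) if b.is_ascii_digit() => digits_recursive(input, pos + 1),
        _ => pos,
    }
}

/// The states of the loop-driven version.
enum State {
    InDigits,
    Done,
}

/// Style 2 (state machine style): the current state is a value, and a single
/// loop performs every transition, so the stack depth never grows.
fn digits_state_machine(input: &[u8]) -> usize {
    let mut state = State::InDigits;
    let mut pos = 0;
    loop {
        state = match state {
            State::InDigits => match input.get(pos) {
                Some(b) if b.is_ascii_digit() => {
                    pos += 1;
                    State::InDigits
                }
                _ => State::Done,
            },
            State::Done => return pos,
        };
    }
}

fn main() {
    // A long run of digits is handled in constant stack space.
    let long = vec![b'7'; 1_000_000];
    println!("{}", digits_state_machine(&long)); // prints 1000000
    println!("{}", digits_recursive(b"123abc", 0)); // prints 3
}
```

Passing the same million-digit input to `digits_recursive` may or may not overflow, depending on the build profile and on whether the optimizer happens to turn the call into a jump; the loop-driven version avoids that uncertainty at the cost of an explicit dispatch on every step.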
Breaking Changes
- The `ignore_ascii_case` attribute was removed. You can switch to the `ignore_case` attribute, which also works on non-Unicode patterns. If you explicitly want to ignore case for ASCII characters but not for others, you will have to do it manually using character classes. See `#[token]` and `#[regex]`.
- The `source` attribute has been removed. You can now use the `utf8` attribute to select either `&str` or `&[u8]` as the source type. Custom source types are no longer supported. If you need this feature, you can either stay on 0.15.1 or contribute an implementation to Logos! For more information on `utf8`, see `#[logos]`.
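Folding case for ASCII letters only amounts to replacing each letter with a two-member character class, for example `fn` becoming `[fF][nN]`. The sketch below generates such a pattern string from a literal. The helper `ascii_case_insensitive_pattern` is hypothetical and not part of Logos. Since attribute arguments must be string literals, you would copy the generated string into a `#[regex]` attribute by hand. The set of escaped metacharacters is an assumption limited to the common regex specials.

```rust
/// Hypothetical helper: builds a regex source string that matches `literal`
/// with case folding for ASCII letters only. Non-ASCII characters are left
/// as-is, so they stay case-sensitive.
fn ascii_case_insensitive_pattern(literal: &str) -> String {
    // Common regex metacharacters that must be escaped to match literally.
    const META: &str = "\\.+*?()|[]{}^$";
    let mut pattern = String::new();
    for c in literal.chars() {
        if c.is_ascii_alphabetic() {
            // Expand each ASCII letter into a class, e.g. `f` -> `[fF]`.
            pattern.push('[');
            pattern.push(c.to_ascii_lowercase());
            pattern.push(c.to_ascii_uppercase());
            pattern.push(']');
        } else {
            if META.contains(c) {
                pattern.push('\\');
            }
            pattern.push(c);
        }
    }
    pattern
}

fn main() {
    println!("{}", ascii_case_insensitive_pattern("fn")); // prints [fF][nN]
    // `é` is not an ASCII letter, so it is emitted unchanged and stays case-sensitive.
    println!("{}", ascii_case_insensitive_pattern("café")); // prints [cC][aA][fF]é
}
```

Unlike `ignore_case`, a pattern built this way will not match `CAFÉ`, because the `É` is never folded.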