State machine codegen
The method used to implement the DFA state machine in Rust code can be changed by
enabling or disabling the crate feature state_machine_codegen. The feature has no
behavioral differences when enabled or disabled; it only affects performance
and stack memory usage.
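For context, here is a minimal sketch of a lexer definition whose output is the same regardless of the feature. The token definitions are invented for illustration, and the snippet assumes a recent logos release where the lexer iterator yields Result:

use logos::Logos;

// Hypothetical token set, used only to illustrate that the feature does not
// change observable behavior: the same tokens come out either way, only the
// generated DFA code differs.
#[derive(Logos, Debug, PartialEq)]
enum Token {
    #[regex("[0-9]+")]
    Number,
    #[token("+")]
    Plus,
}

fn main() {
    // With or without state_machine_codegen, this produces the same tokens.
    let tokens: Vec<_> = Token::lexer("1+2").collect();
    assert_eq!(
        tokens,
        vec![Ok(Token::Number), Ok(Token::Plus), Ok(Token::Number)]
    );
}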
Feature enabled
The state machine codegen creates an enum variant for each state and puts the
state bodies in the arms of a match statement. The match statement is placed
inside a loop, and state transitions are implemented by assigning to the
current state variable and then continuing to the start of the loop.
let mut state = State::State0;
loop {
    match state {
        State::State0 => {
            match lexer.read() {
                'a' => state = State::State1,
                'b' => state = State::State2,
                _ => return Token::Error,
            }
        }
        // Etc...
    }
}
Feature disabled
The tail call code generation creates a function for each state, and state transitions are implemented by calling the next state’s function.
fn state0(lexer: Lexer, context: Context) -> Token {
    match lexer.read() {
        'a' => state1(lexer, context),
        'b' => state2(lexer, context),
        _ => Token::Error,
    }
}
// Etc...
Considerations
The tail call code generation generates significantly faster code and is
therefore the default. However, until Rust gets guaranteed tail calls with the
become keyword, it is possible to overflow the stack using it. This usually
happens when many “skip” tokens are matched in a row. This can usually be solved by
wrapping your skip pattern in a repetition (see the sketch below), though this does
not cover every case.
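As a sketch of that workaround (the token set and skip pattern here are hypothetical), wrapping the whitespace class in a + repetition makes an entire run of whitespace a single skip match instead of one match per character:

use logos::Logos;

// Hypothetical example. The `+` repetition in the skip pattern consumes a
// whole run of whitespace as one skip match, so long stretches of whitespace
// no longer produce many skip matches in a row.
#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")]
enum Token {
    #[token("a")]
    A,
    #[token("b")]
    B,
}

fn main() {
    let tokens: Vec<_> = Token::lexer("a   \n\t  b").collect();
    assert_eq!(tokens, vec![Ok(Token::A), Ok(Token::B)]);
}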
If you don’t want to worry about possible stack overflows, you can use the
state_machine_codegen feature.
Performance explanation
Tail call code generation is faster because LLVM does not currently optimize switches within loops well. The resulting machine code usually has each state jump back to the top of the loop, where the next state is looked up in a jump table. In contrast, tail call generation usually results in an unconditional jump at the end of the state to the next state. While the unconditional jump is slightly better in terms of instruction count, the real advantage lies in the tail call code generation being much nicer to the branch predictor.
Potential mitigations
There are a couple of things that would improve the quality of the code
generated by the state machine codegen. First, if you add the
-enable-dfa-jump-thread
LLVM pass to rustc, you end up with very similar machine code from both code
generators (but the state machine has the added benefit of no possibility of
stack overflows). This option can be added using a config.toml file
(example).
It is probably not a good idea to do this in production code, as adding new
LLVM passes to rustc increases the possibility of running into compiler bugs. There is
also the possibility of this optimization being added at the Rust level, which
is being explored by RFC 3720.
If that RFC is implemented, the logos state machine codegen could use the new
loop match construct to obtain a similar optimization.