State machine codegen
The method used to implement the DFA state machine in Rust code can be changed by
enabling or disabling the crate feature state_machine_codegen. The feature has no
behavioral differences when enabled or disabled; it only affects performance
and stack memory usage.
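For context, here is a minimal sketch of a lexer definition whose output is the same regardless of the feature. The token definitions are invented for illustration, and the snippet assumes a recent logos release where the lexer iterator yields Result:

use logos::Logos;

// Hypothetical token set, used only to illustrate that the feature does not
// change observable behavior: the same tokens come out either way, only the
// generated DFA code differs.
#[derive(Logos, Debug, PartialEq)]
enum Token {
    #[regex("[0-9]+")]
    Number,
    #[token("+")]
    Plus,
}

fn main() {
    // With or without state_machine_codegen, this produces the same tokens.
    let tokens: Vec<_> = Token::lexer("1+2").collect();
    assert_eq!(
        tokens,
        vec![Ok(Token::Number), Ok(Token::Plus), Ok(Token::Number)]
    );
}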
Feature enabled
The state machine codegen creates an enum variant for each state and puts the
state bodies in the arms of a match statement. The match statement is placed
inside a loop, and state transitions are implemented by assigning to the
current state variable and then continuing to the start of the loop.
let mut state = State::State0;
loop {
    match state {
        State::State0 => {
            match lexer.read() {
                'a' => state = State::State1,
                'b' => state = State::State2,
                _ => return Token::Error,
            }
        }
        // Etc...
    }
}
Feature disabled
The tail call code generation creates a function for each state, and state transitions are implemented by calling the next state’s function.
fn state0(lexer: Lexer, context: Context) -> Token {
    match lexer.read() {
        'a' => state1(lexer, context),
        'b' => state2(lexer, context),
        _ => Token::Error,
    }
}
// Etc...
Considerations
The tail call code generation generates significantly faster code and is
therefore the default. However, until Rust gets guaranteed tail calls with the
become keyword, it is possible to overflow the stack using it. This usually
happens when many “skip” tokens are matched in a row. This can usually be solved by
wrapping your skip pattern in a repetition (see the sketch below), though this does
not cover every case.
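As a sketch of that workaround (the token set and skip pattern here are hypothetical), wrapping the whitespace class in a + repetition makes an entire run of whitespace a single skip match instead of one match per character:

use logos::Logos;

// Hypothetical example. The `+` repetition in the skip pattern consumes a
// whole run of whitespace as one skip match, so long stretches of whitespace
// no longer produce many skip matches in a row.
#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")]
enum Token {
    #[token("a")]
    A,
    #[token("b")]
    B,
}

fn main() {
    let tokens: Vec<_> = Token::lexer("a   \n\t  b").collect();
    assert_eq!(tokens, vec![Ok(Token::A), Ok(Token::B)]);
}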
If you don’t want to worry about possible stack overflows, you can use the
state_machine_codegen feature.
Performance explanation
Tail call code generation is faster because LLVM does not currently optimize switches within loops well. The resulting machine code usually has each state jump back to the top of the loop, where the next state is looked up in a jump table. In contrast, tail call generation usually results in an unconditional jump at the end of the state to the next state. While the unconditional jump is slightly better in terms of instruction count, the real advantage lies in the tail call code generation being much nicer to the branch predictor.
Potential mitigations
There are a couple of things that would improve the quality of the code
generated by the state machine codegen. First, if you add the
-enable-dfa-jump-thread
LLVM pass to rustc, you end up with very similar machine code from both code
generators (but the state machine has the added benefit of no possibility of
stack overflows). This option can be added using a config.toml file
(example).
It is probably not a good idea to do this in production code, as adding new
LLVM passes to rustc increases the possibility of running into compiler bugs. There is
also the possibility of this optimization being added at the Rust level, which
is being explored by RFC 3720.
If that RFC is implemented, the logos state machine codegen could use the new
loop match construct to obtain a similar optimization.