Context-dependent lexing
Sometimes, a single lexer is insufficient to properly handle complex grammars. To address this, many lexer generators offer the ability to have separate lexers with their own set of patterns and tokens, allowing you to dynamically switch between them based on the context.
In Logos, context switching is handled using the morph
method of the logos::Lexer
struct.
This method takes ownership of the current lexer and transforms it into a lexer for a new token type.
It is important to note that:
- Both the original lexer and the new lexer must share the same
Source
type. - The
Extras
type from the original lexer must be convertible into theExtras
type of the new lexer.
Example
The following example demonstrates how to use morph
to handle a C-style language that also supports python blocks:
#![allow(unused)] fn main() { #[derive(Logos, Debug, PartialEq, Clone)] #[logos(skip r"\s+")] enum CToken { /* Tokens supporting C syntax */ // ... #[regex(r#"extern\s+"python"\s*\{"#, python_block_callback)] PythonBlock(Vec<PythonToken>), } #[derive(Logos, Debug, PartialEq, Clone)] #[logos(skip r"\s+")] enum PythonToken { #[token("}")] ExitPythonBlock, /* Tokens supporting Python syntax */ // ... } fn python_block_callback(lex: &mut Lexer<CToken>) -> Option<Vec<PythonToken>> { let mut python_lexer = lex.clone().morph::<PythonToken>(); let mut tokens = Vec::new(); while let Some(token) = python_lexer.next() { match token { Ok(PythonToken::ExitPythonBlock) => break, Err(_) => return None, Ok(tok) => tokens.push(tok), } } *lex = python_lexer.morph(); Some(tokens) } }
Note that if we want to use morph
inside a callback we need to be able to clone the original lexer, as morph
needs to take ownership but the callback receives only a reference to the lexer.
For a more in depth example check out String interpolation.