#[logos]
As previously said, the `#[logos]` attribute can be attached to the enum of your token definition to customize your lexer. Note that all of these attributes are optional.
The syntax is as follows:
```rust
#[derive(Logos)]
#[logos(skip "regex literal")]
#[logos(extras = ExtrasType)]
#[logos(error = ErrorType)]
#[logos(crate = path::to::logos)]
#[logos(source = SourceType)]
enum Token {
    /* ... */
}
```
where `"regex literal"` can be any regex supported by `#[regex]`, and `ExtrasType` can be of any type!
An example usage of `skip` is provided in the JSON parser example.
For more details about extras, read the section of the same name.
Custom error type
By default, Logos uses `()` as the error type, which means that it doesn't store any information about the error. This can be changed by using the `#[logos(error = ErrorType)]` attribute on the enum.
The type `ErrorType` can be any type that implements `Clone`, `PartialEq`, `Default` and `From<E>` for each callback's error type.
`ErrorType` must implement the `Default` trait because invalid tokens, i.e., literals that do not match any variant, will produce `Err(ErrorType::default())`.
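To make these two requirements concrete, here is a std-only model of where each error value comes from. It does not use Logos and is not its internal code: `unmatched` and `from_callback` are hypothetical helpers, and `LexingError` is a cut-down stand-in for the type used in this section. Input that matches no variant can only yield `E::default()`, while a callback's own error is converted with `E::from`.

```rust
use std::num::ParseIntError;

// Hypothetical error type; the derives mirror the bounds described above.
#[derive(Debug, Default, Clone, PartialEq)]
enum LexingError {
    InvalidInteger(String),
    #[default]
    Unrecognized,
}

// Conversion from a callback's error type (here, integer parsing).
impl From<ParseIntError> for LexingError {
    fn from(err: ParseIntError) -> Self {
        LexingError::InvalidInteger(err.to_string())
    }
}

/// Hypothetical model of a slice that matches no variant: there is no
/// callback error to convert, so the only available value is `E::default()`.
fn unmatched<T, E: Default>() -> Result<T, E> {
    Err(E::default())
}

/// Hypothetical model of a callback result: the callback's error type `C`
/// is converted into the lexer's error type `E` through `From`.
fn from_callback<T, C, E: From<C>>(result: Result<T, C>) -> Result<T, E> {
    result.map_err(E::from)
}
```

For instance, `from_callback::<u8, _, LexingError>("256".parse::<u8>())` yields an `InvalidInteger` error, while `unmatched::<u8, LexingError>()` yields `Err(LexingError::Unrecognized)`.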
Here is a complete example using a custom error type:
```rust
use logos::Logos;
use std::num::ParseIntError;

#[derive(Default, Debug, Clone, PartialEq)]
enum LexingError {
    InvalidInteger(String),
    #[default]
    NonAsciiCharacter,
}

/// Converts the error returned by `lex.slice().parse::<u8>()` into a `LexingError`.
impl From<ParseIntError> for LexingError {
    fn from(err: ParseIntError) -> Self {
        use std::num::IntErrorKind::*;
        match err.kind() {
            PosOverflow | NegOverflow => LexingError::InvalidInteger("overflow error".to_owned()),
            _ => LexingError::InvalidInteger("other error".to_owned()),
        }
    }
}

#[derive(Debug, Logos, PartialEq)]
#[logos(error = LexingError)]
#[logos(skip r"[ \t]+")]
enum Token {
    #[regex(r"[a-zA-Z]+")]
    Word,
    #[regex(r"[0-9]+", |lex| lex.slice().parse())]
    Integer(u8),
}

fn main() {
    // 256 overflows u8, since u8's max value is 255.
    // 'é' is not a valid ASCII letter.
    let mut lex = Token::lexer("Hello 256 Jérome");

    assert_eq!(lex.next(), Some(Ok(Token::Word)));
    assert_eq!(lex.slice(), "Hello");

    assert_eq!(
        lex.next(),
        Some(Err(LexingError::InvalidInteger(
            "overflow error".to_owned()
        )))
    );
    assert_eq!(lex.slice(), "256");

    assert_eq!(lex.next(), Some(Ok(Token::Word)));
    assert_eq!(lex.slice(), "J");

    assert_eq!(lex.next(), Some(Err(LexingError::NonAsciiCharacter)));
    assert_eq!(lex.slice(), "é");

    assert_eq!(lex.next(), Some(Ok(Token::Word)));
    assert_eq!(lex.slice(), "rome");

    assert_eq!(lex.next(), None);
}
```
You can add error variants to `LexingError` and implement `From<E>` for each error type `E` that could be returned by a callback. See the callbacks section.
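As a sketch of that pattern, suppose a hypothetical callback also has to parse a float. The std-only code below (no Logos involved) adds an `InvalidFloat` variant and a second `From` impl; the hypothetical `parse_pair` function then uses `?`, which converts each std error into `LexingError` through the matching impl.

```rust
use std::num::{ParseFloatError, ParseIntError};

#[derive(Debug, Default, Clone, PartialEq)]
enum LexingError {
    InvalidInteger(String),
    // Hypothetical new variant for a float-parsing callback.
    InvalidFloat(String),
    #[default]
    NonAsciiCharacter,
}

impl From<ParseIntError> for LexingError {
    fn from(err: ParseIntError) -> Self {
        LexingError::InvalidInteger(err.to_string())
    }
}

// One `From` impl per error type a callback may return.
impl From<ParseFloatError> for LexingError {
    fn from(err: ParseFloatError) -> Self {
        LexingError::InvalidFloat(err.to_string())
    }
}

/// Hypothetical callback body for a slice such as "3:0.5" (an integer, a colon,
/// then a float). Each `?` converts the std error via the impls above.
fn parse_pair(slice: &str) -> Result<(u8, f64), LexingError> {
    // A missing colon leaves an empty float part, which fails to parse.
    let (int_part, float_part) = slice.split_once(':').unwrap_or((slice, ""));
    let n: u8 = int_part.parse()?;
    let x: f64 = float_part.parse()?;
    Ok((n, x))
}
```

Inside a lexer, such a function would presumably be invoked from a callback like `|lex| parse_pair(lex.slice())`, in the same way the example above calls `lex.slice().parse()`.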
Specifying path to logos
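You can force the derive macro to use a different path to Logos's crate with `#[logos(crate = path::to::logos)]`. This is useful, for example, when your code depends on Logos through a re-export from another crate.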
Custom source type
By default, Logos's lexer will accept `&str` as input, unless any of the pattern literals match a non-UTF-8 byte sequence. In this case, it will fall back to `&[u8]`. You can override this behavior by forcing one of the two source types. You can also specify any custom type that implements the `Source` trait.
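The fallback follows from the fact that a `&str` is always valid UTF-8, so a pattern that matches a byte such as `0xFF` could never match anything in a `&str`. The std-only check below illustrates that distinction with `std::str::from_utf8`; the helper name is hypothetical and not part of Logos.

```rust
/// Hypothetical helper: returns `true` if `bytes` is valid UTF-8,
/// i.e. if the same data could be lexed from a `&str` source.
fn is_valid_str_input(bytes: &[u8]) -> bool {
    // `from_utf8` succeeds only for well-formed UTF-8 sequences.
    std::str::from_utf8(bytes).is_ok()
}
```

For example, `"é".as_bytes()` is the two-byte sequence `[0xC3, 0xA9]` and passes, whereas a lone `0xFF`, or the truncated sequence `[0xC3]`, does not.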