Simple array language
Programs written in array languages manipulate
arrays of values as their primary data. In this example, we create a simple one-dimensional[1]
array language. Programs are interpreted as a sequence of instructions on an initially empty array.
Writing a number or variable appends it to the array, while sum (+) and product (*)
combine all the numbers in the array into a single value, making the array a singleton.
This example demonstrates how explicit lifetime specification
can be used to create lexers which output tokens with (non-static) lifetimes that outlive the source.
These lexers all share the same (immutable) state across different threads, without any cloning or Arcs!
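As a concrete illustration (a hypothetical program, not necessarily the one shipped in the repository), the program `1 2 3 + 4 *` evaluates step by step as:

```
1 2 3   array becomes [1, 2, 3]
+       array becomes [6]
4       array becomes [6, 4]
*       array becomes [24]
```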
Example program
There is an example program you can run with[2]:
cargo run --example array_language examples/array_program.txt
Lexing
The variable environment maps variable names to values.
type Environment = HashMap<String, Vec<i128>>;
The token type is parameterized by the lifetime 'a, which is used in the lexer extras
as the lifetime of the borrow of the variable environment.
#[derive(Debug, Logos)]
#[logos(lifetime = none)]
#[logos(error(LexingError<'s>, |lex| LexingError::UnknownSymbol(lex.slice())))]
#[logos(extras = &'a Environment)]
#[logos(skip " +")]
enum Token<'a> {
    #[regex(r"\-?[0-9]+", number_callback)]
    Number(i128),
    #[regex(r"[[:alpha:]][[:alnum:]]*", var_callback)]
    Array(&'a [i128]),
    #[token("*")]
    Product,
    #[token("+")]
    Sum,
    #[token("~")]
    Reverse,
}
The lexer uses two callbacks:
/// Parse lexer slice as an i128
fn number_callback<'s>(lex: &mut Lexer<'s, Token>) -> Result<i128, LexingError<'s>> {
    let source = lex.slice();
    let res = source.parse();
    res.map_err(|err| LexingError::InvalidInteger { err, source })
}
/// Look up the lexer slice in the variable environment,
/// yielding a borrow of the variable's value.
fn var_callback<'s, 'a>(lex: &mut Lexer<'s, Token<'a>>) -> Result<&'a [i128], LexingError<'s>> {
    match lex.extras.get(lex.slice()) {
        Some(arr) => Ok(arr.as_slice()),
        None => Err(LexingError::UnknownVariable(lex.slice())),
    }
}
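The key point of this callback is that the returned slice borrows from the environment, not from the line being lexed. A standard-library-only sketch (`look_up` is a hypothetical stand-in for the callback) shows the same decoupling:

```rust
use std::collections::HashMap;

type Environment = HashMap<String, Vec<i128>>;

/// Stand-in for var_callback: the returned slice borrows from `env`,
/// not from the `name` that was looked up.
fn look_up<'a>(env: &'a Environment, name: &str) -> Option<&'a [i128]> {
    env.get(name).map(|v| v.as_slice())
}

fn main() {
    let env = Environment::from([("NAT".to_owned(), vec![1, 2, 3])]);
    let borrowed = {
        // The "source" string only lives inside this block...
        let source = String::from("NAT");
        look_up(&env, &source).unwrap()
    };
    // ...yet the slice is still valid: it borrows from `env`, not `source`.
    assert_eq!(borrowed, &[1, 2, 3][..]);
    println!("{borrowed:?}");
}
```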
The #[logos(lifetime = none)] attribute explicitly specifies that 'a is not the
source lifetime[3]. This means that the borrow of the environment (and thus the tokens
the lexer produces) is independent of the source. Logos instead creates a separate 's
lifetime for the source, which is used in the error type to store the slice that caused the error:
/// Token error type, tied to the lifetime of the source.
#[derive(Default, Debug, Clone, PartialEq)]
enum LexingError<'s> {
    UnknownSymbol(&'s str),
    InvalidInteger {
        err: ParseIntError,
        source: &'s str,
    },
    UnknownVariable(&'s str),
    #[default]
    Other,
}
A file is lexed by creating a separate lexer for each line of the file and combining the results.
/// Open the given file and lex each line, returning the results.
/// For each line, a lexer is created on a new thread.
/// The environment is shared between all lexers.
fn lex_file<'a>(path: &Path, env: &'a Environment) -> Vec<Result<Vec<Token<'a>>, String>> {
    let source = std::fs::read_to_string(path).expect("Failed to read file");
    std::thread::scope(|s| {
        let mut handles = Vec::new();
        for line in source.lines() {
            handles.push(s.spawn(|| {
                let lexer = Token::lexer_with_extras(line, env);
                // Convert the lexer errors to strings before returning,
                // because the source is scoped to this function.
                lexer.map(|res| res.map_err(|e| e.to_string())).collect()
            }));
        }
        handles.into_iter().flat_map(|h| h.join()).collect()
    })
}
Scoped threads allow non-static borrows of variables outside the thread. Here, we use
this ability to store a borrow of the variable environment in the extras of each lexer.
This allows us to lex each line in parallel, sharing the variable environment between
threads without needing to clone it or wrap it in an Arc. Since the token lifetime
is independent of the source, the created tokens can be returned as is.
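The sharing pattern can be demonstrated with only the standard library; `sum_each` is a hypothetical stand-in for lex_file that sums the array bound to each name on its own scoped thread:

```rust
use std::collections::HashMap;

type Environment = HashMap<String, Vec<i128>>;

/// Sum the array bound to each name, one scoped thread per name.
/// Each thread borrows `env` directly; no clone, no Arc.
fn sum_each(names: &[&str], env: &Environment) -> Vec<i128> {
    std::thread::scope(|s| {
        let handles: Vec<_> = names
            .iter()
            .map(|name| s.spawn(|| env.get(*name).map_or(0, |v| v.iter().sum())))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let env = Environment::from([
        ("NAT".to_owned(), (1..=10).collect()),
        ("EVEN".to_owned(), (2..=20).step_by(2).collect()),
    ]);
    assert_eq!(sum_each(&["NAT", "EVEN", "NAT"], &env), vec![55, 110, 55]);
    println!("ok");
}
```

The scope guarantees every thread has joined before `sum_each` returns, which is why the compiler accepts the non-static borrows of `env` and `names`.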
Note
Lexing each line of a file in parallel is done as an example and is probably a bad idea in a real program. It’s more likely that you would want to lex multiple files in parallel.
Evaluation
Each token is evaluated by updating an accumulator, which starts empty. A number or array token has its contents appended to the accumulator, whereas the sum and product operators combine all the numbers in the accumulator into a singleton. Once all tokens have been evaluated sequentially, the final accumulator is returned.
/// Evaluate a sequence of tokens to produce an array.
fn evaluate(tokens: &[Token]) -> Vec<i128> {
    let mut accumulator = Vec::new();
    for tok in tokens {
        match *tok {
            Token::Number(n) => accumulator.push(n),
            Token::Array(arr) => accumulator.extend(arr),
            Token::Product => {
                let n = accumulator.drain(..).product();
                accumulator.push(n);
            }
            Token::Sum => {
                let n = accumulator.drain(..).sum();
                accumulator.push(n);
            }
            Token::Reverse => accumulator.reverse(),
        }
    }
    accumulator
}
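The evaluation loop can be run without the lexer by substituting a plain enum; `Tok` below is a hypothetical stand-in for the Logos-derived Token, and the array value [2, 3] is illustrative:

```rust
/// Stand-in for the Logos-derived Token, so evaluation can run standalone.
enum Tok<'a> {
    Number(i128),
    Array(&'a [i128]),
    Product,
    Sum,
    Reverse,
}

/// Same accumulator loop as evaluate above, over the stand-in enum.
fn evaluate(tokens: &[Tok]) -> Vec<i128> {
    let mut acc = Vec::new();
    for tok in tokens {
        match *tok {
            Tok::Number(n) => acc.push(n),
            Tok::Array(arr) => acc.extend(arr),
            Tok::Product => {
                let n = acc.drain(..).product();
                acc.push(n);
            }
            Tok::Sum => {
                let n = acc.drain(..).sum();
                acc.push(n);
            }
            Tok::Reverse => acc.reverse(),
        }
    }
    acc
}

fn main() {
    // An array [2, 3], then the number 4, then product: [2, 3, 4] -> [24].
    let tokens = [Tok::Array(&[2, 3]), Tok::Number(4), Tok::Product];
    assert_eq!(evaluate(&tokens), vec![24]);
    println!("{:?}", evaluate(&tokens));
}
```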
The lines of the input file are evaluated sequentially, and the returned accumulator is printed for each.
Full code
use logos::{Lexer, Logos};
use std::collections::HashMap;
use std::fmt::Display;
use std::num::ParseIntError;
use std::path::Path;

/// Token error type, tied to the lifetime of the source.
#[derive(Default, Debug, Clone, PartialEq)]
enum LexingError<'s> {
    UnknownSymbol(&'s str),
    InvalidInteger {
        err: ParseIntError,
        source: &'s str,
    },
    UnknownVariable(&'s str),
    #[default]
    Other,
}

impl Display for LexingError<'_> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::UnknownSymbol(s) => write!(f, "unknown symbol `{s}`"),
            Self::InvalidInteger { err, source } => {
                write!(f, "int error in source `{source}`: {err}")
            }
            Self::UnknownVariable(s) => write!(f, "unknown variable `{s}`"),
            Self::Other => write!(f, "unknown error"),
        }
    }
}

type Environment = HashMap<String, Vec<i128>>;

/// Parse lexer slice as an i128
fn number_callback<'s>(lex: &mut Lexer<'s, Token>) -> Result<i128, LexingError<'s>> {
    let source = lex.slice();
    let res = source.parse();
    res.map_err(|err| LexingError::InvalidInteger { err, source })
}

/// Look up the lexer slice in the variable environment,
/// yielding a borrow of the variable's value.
fn var_callback<'s, 'a>(lex: &mut Lexer<'s, Token<'a>>) -> Result<&'a [i128], LexingError<'s>> {
    match lex.extras.get(lex.slice()) {
        Some(arr) => Ok(arr.as_slice()),
        None => Err(LexingError::UnknownVariable(lex.slice())),
    }
}

#[derive(Debug, Logos)]
#[logos(lifetime = none)]
#[logos(error(LexingError<'s>, |lex| LexingError::UnknownSymbol(lex.slice())))]
#[logos(extras = &'a Environment)]
#[logos(skip " +")]
enum Token<'a> {
    #[regex(r"\-?[0-9]+", number_callback)]
    Number(i128),
    #[regex(r"[[:alpha:]][[:alnum:]]*", var_callback)]
    Array(&'a [i128]),
    #[token("*")]
    Product,
    #[token("+")]
    Sum,
    #[token("~")]
    Reverse,
}

/// Evaluate a sequence of tokens to produce an array.
fn evaluate(tokens: &[Token]) -> Vec<i128> {
    let mut accumulator = Vec::new();
    for tok in tokens {
        match *tok {
            Token::Number(n) => accumulator.push(n),
            Token::Array(arr) => accumulator.extend(arr),
            Token::Product => {
                let n = accumulator.drain(..).product();
                accumulator.push(n);
            }
            Token::Sum => {
                let n = accumulator.drain(..).sum();
                accumulator.push(n);
            }
            Token::Reverse => accumulator.reverse(),
        }
    }
    accumulator
}

/// Open the given file and lex each line, returning the results.
/// For each line, a lexer is created on a new thread.
/// The environment is shared between all lexers.
fn lex_file<'a>(path: &Path, env: &'a Environment) -> Vec<Result<Vec<Token<'a>>, String>> {
    let source = std::fs::read_to_string(path).expect("Failed to read file");
    std::thread::scope(|s| {
        let mut handles = Vec::new();
        for line in source.lines() {
            handles.push(s.spawn(|| {
                let lexer = Token::lexer_with_extras(line, env);
                // Convert the lexer errors to strings before returning,
                // because the source is scoped to this function.
                lexer.map(|res| res.map_err(|e| e.to_string())).collect()
            }));
        }
        handles.into_iter().flat_map(|h| h.join()).collect()
    })
}

fn main() {
    let filename = std::env::args().nth(1).expect("Expected file argument");
    let env = Environment::from([
        ("NAT".to_owned(), (1..=10).collect()),
        ("EVEN".to_owned(), (2..=20).step_by(2).collect()),
        ("ODD".to_owned(), (1..=20).step_by(2).collect()),
        ("PRIME".to_owned(), vec![2, 3, 5, 7, 11, 13, 17, 19, 23, 29]),
        ("FIB".to_owned(), vec![0, 1, 1, 2, 3, 5, 8, 13, 21, 34]),
        ("POW2".to_owned(), (0..=10).map(|n| 2i128.pow(n)).collect()),
    ]);
    let results = lex_file(Path::new(&filename), &env);
    for res in &results {
        match res {
            Ok(tokens) => {
                let numbers = evaluate(tokens).into_iter().map(|n| n.to_string());
                println!("[{}]", numbers.collect::<Vec<_>>().join(", "));
            }
            Err(s) => eprintln!("{s}"),
        }
    }
}
[1]: Arrays can only contain numbers, not other arrays.
[2]: You first need to clone this repository.
[3]: Without the attribute, Logos will assume that 'a is the source lifetime.