
Simple array language

Programs written in array languages manipulate arrays of values as their primary data. In this example, we create a simple one-dimensional[1] array language. Programs are interpreted as a sequence of instructions on an initially empty array. Writing a number or variable appends it to the array, while sum (+) and product (*) combine all the numbers in the array into a single value, making the array a singleton.
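For illustration, here are a few hypothetical program lines (not necessarily the contents of examples/array_program.txt) together with the array each one evaluates to, following the rules above; NAT is one of the predefined array variables, bound to [1, ..., 10] in the example's main:

```text
1 2 3 +       => [6]
2 3 * 4 +     => [10]
NAT +         => [55]
```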

This example demonstrates how explicit lifetime specification can be used to create lexers which output tokens with (non-static) lifetimes that outlive the source. These lexers all share the same (immutable) state across different threads, without any cloning or Arcs!

Example program

There is an example program you can run with[2]:

cargo run --example array_language examples/array_program.txt

Lexing

The variable environment maps variable names to values.

type Environment = HashMap<String, Vec<i128>>;

The token type is parameterized by the lifetime 'a, which appears in the lexer extras as the lifetime of the borrow of the variable environment.

#[derive(Debug, Logos)]
#[logos(lifetime = none)]
#[logos(error(LexingError<'s>, |lex| LexingError::UnknownSymbol(lex.slice())))]
#[logos(extras = &'a Environment)]
#[logos(skip " +")]
enum Token<'a> {
    #[regex(r"\-?[0-9]+", number_callback)]
    Number(i128),
    #[regex(r"[[:alpha:]][[:alnum:]]*", var_callback)]
    Array(&'a [i128]),
    #[token("*")]
    Product,
    #[token("+")]
    Sum,
    #[token("~")]
    Reverse,
}

The lexer uses two callbacks:

/// Parse lexer slice as an i128
fn number_callback<'s>(lex: &mut Lexer<'s, Token>) -> Result<i128, LexingError<'s>> {
    let source = lex.slice();
    let res = source.parse();
    res.map_err(|err| LexingError::InvalidInteger { err, source })
}

/// Look up the lexer slice in the variable environment,
/// yielding a borrow of the variable's value.
fn var_callback<'s, 'a>(lex: &mut Lexer<'s, Token<'a>>) -> Result<&'a [i128], LexingError<'s>> {
    match lex.extras.get(lex.slice()) {
        Some(arr) => Ok(arr.as_slice()),
        None => Err(LexingError::UnknownVariable(lex.slice())),
    }
}

The #[logos(lifetime = none)] attribute explicitly specifies that 'a is not the source lifetime[3]. This means that the borrow of the environment (and thus the tokens the lexer produces) is independent of the source. Logos instead creates a separate 's lifetime for the source, which is used in the error type to store the slice that caused the error:

/// Token error type, tied to the lifetime of the source.
#[derive(Default, Debug, Clone, PartialEq)]
enum LexingError<'s> {
    UnknownSymbol(&'s str),
    InvalidInteger {
        err: ParseIntError,
        source: &'s str,
    },
    UnknownVariable(&'s str),
    #[default]
    Other,
}

A file is lexed by creating a separate lexer for each line of the file and combining the results.

/// Open the given file and lex each line, returning the results.
/// For each line, a lexer is created on a new thread.
/// The environment is shared between all lexers.
fn lex_file<'a>(path: &Path, env: &'a Environment) -> Vec<Result<Vec<Token<'a>>, String>> {
    let source = std::fs::read_to_string(path).expect("Failed to read file");
    std::thread::scope(|s| {
        let mut handles = Vec::new();
        for line in source.lines() {
            // `move` copies the (Copy) references `line` and `env` into the
            // closure, so it does not borrow the loop-local `line`.
            handles.push(s.spawn(move || {
                let lexer = Token::lexer_with_extras(line, env);
                // Convert the lexer errors to strings before returning,
                // because the source is scoped to this function.
                lexer.map(|res| res.map_err(|e| e.to_string())).collect()
            }));
        }
        handles.into_iter().flat_map(|h| h.join()).collect()
    })
}

Scoped threads allow non-static borrows of variables outside the thread. Here, we use this ability to store a borrow of the variable environment in the extras of each lexer. This allows us to lex each line in parallel, sharing the variable environment between threads without needing to clone it or wrap it in an Arc. Since the token lifetime is independent of the source, the created tokens can be returned as is.

Note

Lexing each line of a file in parallel is done as an example and is probably a bad idea in a real program. It’s more likely that you would want to lex multiple files in parallel.

Evaluation

Each token is evaluated by updating an accumulator, which starts empty. A number or array token has its contents appended to the accumulator, whereas the sum and product operators combine all the numbers in the accumulator into a singleton. Once all tokens have been evaluated sequentially, the final accumulator is returned.

/// Evaluate a sequence of tokens to produce an array.
fn evaluate(tokens: &[Token]) -> Vec<i128> {
    let mut accumulator = Vec::new();
    for tok in tokens {
        match *tok {
            Token::Number(n) => accumulator.push(n),
            Token::Array(arr) => accumulator.extend(arr),
            Token::Product => {
                let n = accumulator.drain(..).product();
                accumulator.push(n);
            }
            Token::Sum => {
                let n = accumulator.drain(..).sum();
                accumulator.push(n);
            }
            Token::Reverse => accumulator.reverse(),
        }
    }
    accumulator
}
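To see the fold in action outside the lexer, here is a self-contained sketch that re-declares a trimmed-down Token (without the Logos attributes) and runs evaluate on hand-built token sequences; the first sequence corresponds to the hypothetical program `1 2 3 + A *`, with A bound here to [1, 2, 3, 4]:

```rust
/// Trimmed-down copy of the token type, minus the Logos attributes,
/// so the evaluation rule can be exercised on its own.
#[derive(Debug)]
enum Token<'a> {
    Number(i128),
    Array(&'a [i128]),
    Product,
    Sum,
    Reverse,
}

/// Same fold as in the text: numbers and arrays extend the accumulator,
/// operators collapse or rearrange it.
fn evaluate(tokens: &[Token]) -> Vec<i128> {
    let mut accumulator = Vec::new();
    for tok in tokens {
        match *tok {
            Token::Number(n) => accumulator.push(n),
            Token::Array(arr) => accumulator.extend(arr),
            Token::Product => {
                let n = accumulator.drain(..).product();
                accumulator.push(n);
            }
            Token::Sum => {
                let n = accumulator.drain(..).sum();
                accumulator.push(n);
            }
            Token::Reverse => accumulator.reverse(),
        }
    }
    accumulator
}

fn main() {
    let a = [1, 2, 3, 4];
    // `1 2 3 +` collapses [1, 2, 3] to [6]; appending A and taking the
    // product collapses [6, 1, 2, 3, 4] to [144].
    let program = [
        Token::Number(1),
        Token::Number(2),
        Token::Number(3),
        Token::Sum,
        Token::Array(&a),
        Token::Product,
    ];
    assert_eq!(evaluate(&program), vec![144]);

    // `A ~` appends A and reverses the accumulator.
    assert_eq!(evaluate(&[Token::Array(&a), Token::Reverse]), vec![4, 3, 2, 1]);
}
```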

The token sequences lexed from the input file are then evaluated sequentially, and each resulting accumulator is printed.

Full code

use logos::{Lexer, Logos};

use std::collections::HashMap;
use std::fmt::Display;
use std::num::ParseIntError;
use std::path::Path;

/// Token error type, tied to the lifetime of the source.
#[derive(Default, Debug, Clone, PartialEq)]
enum LexingError<'s> {
    UnknownSymbol(&'s str),
    InvalidInteger {
        err: ParseIntError,
        source: &'s str,
    },
    UnknownVariable(&'s str),
    #[default]
    Other,
}

impl Display for LexingError<'_> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            Self::UnknownSymbol(s) => write!(f, "unknown symbol `{s}`"),
            Self::InvalidInteger { err, source } => {
                write!(f, "int error in source `{source}`: {err}")
            }
            Self::UnknownVariable(s) => write!(f, "unknown variable `{s}`"),
            Self::Other => write!(f, "unknown error"),
        }
    }
}

type Environment = HashMap<String, Vec<i128>>;

/// Parse lexer slice as an i128
fn number_callback<'s>(lex: &mut Lexer<'s, Token>) -> Result<i128, LexingError<'s>> {
    let source = lex.slice();
    let res = source.parse();
    res.map_err(|err| LexingError::InvalidInteger { err, source })
}

/// Look up the lexer slice in the variable environment,
/// yielding a borrow of the variable's value.
fn var_callback<'s, 'a>(lex: &mut Lexer<'s, Token<'a>>) -> Result<&'a [i128], LexingError<'s>> {
    match lex.extras.get(lex.slice()) {
        Some(arr) => Ok(arr.as_slice()),
        None => Err(LexingError::UnknownVariable(lex.slice())),
    }
}

#[derive(Debug, Logos)]
#[logos(lifetime = none)]
#[logos(error(LexingError<'s>, |lex| LexingError::UnknownSymbol(lex.slice())))]
#[logos(extras = &'a Environment)]
#[logos(skip " +")]
enum Token<'a> {
    #[regex(r"\-?[0-9]+", number_callback)]
    Number(i128),
    #[regex(r"[[:alpha:]][[:alnum:]]*", var_callback)]
    Array(&'a [i128]),
    #[token("*")]
    Product,
    #[token("+")]
    Sum,
    #[token("~")]
    Reverse,
}

/// Evaluate a sequence of tokens to produce an array.
fn evaluate(tokens: &[Token]) -> Vec<i128> {
    let mut accumulator = Vec::new();
    for tok in tokens {
        match *tok {
            Token::Number(n) => accumulator.push(n),
            Token::Array(arr) => accumulator.extend(arr),
            Token::Product => {
                let n = accumulator.drain(..).product();
                accumulator.push(n);
            }
            Token::Sum => {
                let n = accumulator.drain(..).sum();
                accumulator.push(n);
            }
            Token::Reverse => accumulator.reverse(),
        }
    }
    accumulator
}

/// Open the given file and lex each line, returning the results.
/// For each line, a lexer is created on a new thread.
/// The environment is shared between all lexers.
fn lex_file<'a>(path: &Path, env: &'a Environment) -> Vec<Result<Vec<Token<'a>>, String>> {
    let source = std::fs::read_to_string(path).expect("Failed to read file");
    std::thread::scope(|s| {
        let mut handles = Vec::new();
        for line in source.lines() {
            // `move` copies the (Copy) references `line` and `env` into the
            // closure, so it does not borrow the loop-local `line`.
            handles.push(s.spawn(move || {
                let lexer = Token::lexer_with_extras(line, env);
                // Convert the lexer errors to strings before returning,
                // because the source is scoped to this function.
                lexer.map(|res| res.map_err(|e| e.to_string())).collect()
            }));
        }
        handles.into_iter().flat_map(|h| h.join()).collect()
    })
}

fn main() {
    let filename = std::env::args().nth(1).expect("Expected file argument");
    let env = Environment::from([
        ("NAT".to_owned(), (1..=10).collect()),
        ("EVEN".to_owned(), (2..=20).step_by(2).collect()),
        ("ODD".to_owned(), (1..=20).step_by(2).collect()),
        ("PRIME".to_owned(), vec![2, 3, 5, 7, 11, 13, 17, 19, 23, 29]),
        ("FIB".to_owned(), vec![0, 1, 1, 2, 3, 5, 8, 13, 21, 34]),
        ("POW2".to_owned(), (0..=10).map(|n| 2i128.pow(n)).collect()),
    ]);

    let results = lex_file(Path::new(&filename), &env);
    for res in &results {
        match res {
            Ok(tokens) => {
                let numbers = evaluate(tokens).into_iter().map(|n| n.to_string());
                println!("[{}]", numbers.collect::<Vec<_>>().join(", "));
            }
            Err(s) => eprintln!("{s}"),
        }
    }
}

[1] Arrays can only contain numbers, not other arrays.

[2] You first need to clone this repository.

[3] Without the attribute, Logos will assume that 'a is the source lifetime.