Really Advanced – Custom Parsers

Sometimes it is desirable to have multiple custom syntaxes that start with the same symbol.

This is especially common for command-style syntax where the second symbol calls a particular command:

// The following simulates a command-style syntax, all starting with 'perform'.
perform hello world;        // A fixed sequence of symbols
perform action 42;          // Perform a system action with a parameter
perform update system;      // Update the system
perform check all;          // Check all system settings
perform cleanup;            // Clean up the system
perform add something;      // Add something to the system
perform remove something;   // Delete something from the system

Alternatively, a custom syntax may have variable length, with a termination symbol:

// The following is a variable-length list terminated by '>'  
tags < "foo", "bar", 123, ... , x+y, true >

To handle these advanced use cases, there is a low-level API for custom syntax that allows the registration of an entire mini-parser.

Use Engine::register_custom_syntax_with_state_raw to register a custom syntax parser together with an implementation function, both of which accept a custom user-defined state value.

How Custom Parsers Work

Leading Symbol

Under this API, the leading symbol for a custom parser is no longer restricted to being a valid identifier.

It can either be:

Parser Function Signature

The custom syntax parser has the following signature.

Fn(symbols: &[ImmutableString], look_ahead: &str, state: &mut Dynamic) -> Result<Option<ImmutableString>, ParseError>

where:

Parameter | Type | Description
symbols | &[ImmutableString] | a slice of symbols that have been parsed so far, possibly containing $expr$ and/or $block$; $ident$ and other literal markers are replaced by the actual text
look_ahead | &str | a string slice containing the next symbol that is about to be read
state | &mut Dynamic | mutable reference to a user-defined state

Most strings are ImmutableStrings, so it is usually more efficient to clone the appropriate one as the return value if any matches, or to keep an internal cache of commonly-used symbols.

Parameter #1 – Symbols Parsed So Far

The symbols parsed so far are provided as a slice of ImmutableStrings.

The custom parser can inspect this symbol stream to determine the next symbol to parse. Each entry in the stream holds the following value, depending on how it was parsed:

Argument type | Value
text string | text value
$ident$ | identifier name
$symbol$ | symbol literal
$expr$ | $expr$
$block$ | $block$
$bool$ | true or false
$int$ | value of number
$float$ | value of number
$string$ | string text

Parameter #2 – Look-Ahead Symbol

The look-ahead symbol is the symbol that will be parsed next.

If the look-ahead is an expected symbol, the custom parser just returns it to continue parsing, or it can return $ident$ to parse it as an identifier, or even $expr$ to start parsing an expression.

Tip: Strings vs identifiers

The look-ahead of an identifier (e.g. variable name) is its text name.

That of a string literal is its content wrapped in quotes ("), e.g. "this is a string".

If the look-ahead is {, then the custom parser may also return $block$ to start parsing a statements block.

If the look-ahead is unexpected, the custom parser should then return the symbol expected and Rhai will fail with a parse error containing information about the expected symbol.
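The distinctions above between string literals, identifiers and an opening brace can be sketched as a small helper. This is a dependency-free illustration, not part of Rhai: the `LookAhead` enum and the `classify` function are hypothetical names, and the helper relies only on the stated convention that a string literal's look-ahead is its content wrapped in double quotes.

```rust
/// Rough classification of a look-ahead symbol, based only on its text.
/// (Hypothetical helper type, not part of the Rhai API.)
#[derive(Debug, PartialEq)]
enum LookAhead<'a> {
    /// A string literal; holds the content without the surrounding quotes.
    StringLiteral(&'a str),
    /// An opening brace, so returning `$block$` is an option.
    BlockStart,
    /// Anything else: identifier, keyword, number, operator, ...
    Other(&'a str),
}

/// Classify a look-ahead symbol by inspecting its text.
fn classify(look_ahead: &str) -> LookAhead<'_> {
    if look_ahead == "{" {
        LookAhead::BlockStart
    } else if look_ahead.len() >= 2
        && look_ahead.starts_with('"')
        && look_ahead.ends_with('"')
    {
        // Strip the leading and trailing quote (each is one byte long).
        LookAhead::StringLiteral(&look_ahead[1..look_ahead.len() - 1])
    } else {
        LookAhead::Other(look_ahead)
    }
}
```

A custom parser could call such a helper on `look_ahead` before deciding whether to return `$string$`, `$block$`, or some other expected symbol.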

Parameter #3 – User-Defined Custom State

The state’s value starts off as ().

Its type is Dynamic, so it can hold any value.

Usually it is set to an object map that contains information on the state of parsing.

Return value

The return value is Result<Option<ImmutableString>, ParseError> where:

Value | Description
Ok(None) | parsing is complete and there is no more symbol to match
Ok(Some(symbol)) | the next symbol to match, which can also be $expr$, $ident$, $block$ etc.
Err(error) | error that is reflected back to the Engine – normally ParseError( ParseErrorType::BadInput( LexError::ImproperSymbol(message) ), Position::NONE) to indicate that there is a syntax error, but it can be any ParseError.

A custom parser always returns Some with the next symbol expected (which can be $ident$, $expr$, $block$ etc.) or None if parsing should terminate (without reading the look-ahead symbol).
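To illustrate how these return values drive a variable-length syntax such as the `tags < ... >` list shown at the start, here is a minimal, dependency-free sketch. It models only the decision logic: `tags_parser` and `parse_tags` are hypothetical names, plain `String` values stand in for `ImmutableString` and `ParseError`, and the toy driver treats every list item as a single token, which the real engine does not.

```rust
/// Decide the next symbol for `tags < item, item, ... >`.
/// Same shape as a custom parser, minus the state parameter.
fn tags_parser(symbols: &[String], look_ahead: &str) -> Result<Option<String>, String> {
    let last = symbols.last().ok_or("no leading symbol")?.as_str();

    let next = match last {
        // Only the leading symbol so far: the list must open with '<'.
        "tags" => "<",
        // Right after '<': allow an empty list.
        "<" if look_ahead == ">" => ">",
        // After '<' or ',': an item, parsed as an expression.
        "<" | "," => "$expr$",
        // After an item: continue with ',' if it is there...
        "$expr$" if look_ahead == "," => ",",
        // ...otherwise expect the terminator; a mismatch becomes a parse error.
        "$expr$" => ">",
        // Terminator matched: stop without consuming the look-ahead.
        ">" => return Ok(None),
        other => return Err(format!("unexpected symbol: {other}")),
    };
    Ok(Some(next.to_string()))
}

/// Toy driver standing in for the engine (assumption: one token per item).
/// Returns the tokens that were consumed as `$expr$` items.
fn parse_tags(tokens: &[&str]) -> Result<Vec<String>, String> {
    let mut symbols = vec![tokens.first().ok_or("empty input")?.to_string()];
    let mut items = Vec::new();
    let mut pos = 1;

    loop {
        // An empty string marks the end of input in this sketch.
        let look_ahead = tokens.get(pos).copied().unwrap_or("");

        match tags_parser(&symbols, look_ahead)? {
            None => return Ok(items),
            Some(sym) if sym == "$expr$" => {
                if look_ahead.is_empty() {
                    return Err("expected an item, found end of input".to_string());
                }
                items.push(look_ahead.to_string());
                symbols.push(sym);
                pos += 1;
            }
            Some(sym) if sym == look_ahead => {
                symbols.push(sym);
                pos += 1;
            }
            Some(sym) => return Err(format!("expected '{sym}', found '{look_ahead}'")),
        }
    }
}
```

Note that the symbol stream records `$expr$` rather than the item's text, so the parser only needs to look at the last symbol and the look-ahead to decide what comes next.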

The $$ return symbol short-cut

A return symbol starting with $$ is treated specially.

Like None, it also terminates parsing, but at the same time it adds this symbol as text into the inputs stream at the end.

This is typically used to inform the implementation function which custom syntax variant was actually parsed.

fn implementation_fn(context: &mut EvalContext, inputs: &[Expression], state: &Dynamic) -> Result<Dynamic, Box<EvalAltResult>>
{
    // Get the last symbol
    let key = inputs.last().unwrap().get_string_value().unwrap();

    // Make sure it starts with '$$'
    assert!(key.starts_with("$$"));

    // Execute the custom syntax expression
    match key {
        "$$hello" => { ... }
        "$$world" => { ... }
        "$$foo" => { ... }
        "$$bar" => { ... }
        _ => Err(...)
    }
}

$$ is a convenient short-cut. An alternative method is to pass such information in the user-defined custom state.
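The state-based alternative might look roughly like the following dependency-free sketch. It is an illustration under assumptions, not Rhai code: `Variant`, `parser` and `implementation` are hypothetical names, an `Option<Variant>` stands in for the `Dynamic` state (which would more typically hold an object map), inputs are modeled as plain strings, and the look-ahead parameter is omitted for brevity.

```rust
/// Which syntax variant was parsed (stand-in for data kept in the state).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Variant {
    Cleanup,
    Action,
}

/// Parser stand-in: records the variant in the state and returns `None`
/// to terminate, instead of returning a `$$...` marker.
fn parser(symbols: &[&str], state: &mut Option<Variant>) -> Result<Option<String>, String> {
    match symbols {
        // perform ...
        ["perform"] => Ok(Some("$ident$".to_string())),
        // perform cleanup
        ["perform", "cleanup"] => {
            *state = Some(Variant::Cleanup);
            Ok(None)
        }
        // perform action ...
        ["perform", "action"] => Ok(Some("$expr$".to_string())),
        // perform action <expr>
        ["perform", "action", _] => {
            *state = Some(Variant::Action);
            Ok(None)
        }
        _ => Err(format!("improper command: {}", symbols.join(" "))),
    }
}

/// Implementation stand-in: dispatches on the state rather than on the
/// last input. The action parameter is assumed to be the last input.
fn implementation(inputs: &[&str], state: &Option<Variant>) -> Result<String, String> {
    match state {
        Some(Variant::Cleanup) => Ok("cleaned up".to_string()),
        Some(Variant::Action) => {
            let arg = inputs.last().ok_or("missing action parameter")?;
            Ok(format!("action {arg}"))
        }
        None => Err("parser did not record a variant".to_string()),
    }
}
```

The trade-off is that the `$$` marker needs no extra bookkeeping, while the state can carry richer information than a single text symbol.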

Implementation Function Signature

The signature of an implementation function for Engine::register_custom_syntax_with_state_raw is as follows, which is slightly different from the function for Engine::register_custom_syntax.

Fn(context: &mut EvalContext, inputs: &[Expression], state: &Dynamic) -> Result<Dynamic, Box<EvalAltResult>>

where:

Parameter | Type | Description
context | &mut EvalContext | mutable reference to the current evaluation context
inputs | &[Expression] | a list of input expression trees
state | &Dynamic | reference to the user-defined state

Custom Parser Example

engine.register_custom_syntax_with_state_raw(
    // The leading symbol - which need not be an identifier.
    "perform",
    // The custom parser implementation - always returns the next symbol expected
    // 'look_ahead' is the next symbol about to be read
    //
    // Return symbols starting with '$$' also terminate parsing but allow us
    // to determine which syntax variant was actually parsed so we can perform the
    // appropriate action.  This is a convenient short-cut to keeping the value
    // inside the state.
    //
    // The return type is 'Option<ImmutableString>' to allow common text strings
    // to be interned and shared easily, reducing allocations during parsing.
    |symbols, look_ahead, state| match symbols.len() {
        // perform ...
        1 => Ok(Some("$ident$".into())),
        // perform command ...
        2 => match symbols[1].as_str() {
            "action" => Ok(Some("$expr$".into())),
            "hello" => Ok(Some("world".into())),
            "update" | "check" | "add" | "remove" => Ok(Some("$ident$".into())),
            "cleanup" => Ok(Some("$$cleanup".into())),
            cmd => Err(LexError::ImproperSymbol(format!("Improper command: {cmd}"))
                       .into_err(Position::NONE)),
        },
        // perform command arg ...
        3 => match (symbols[1].as_str(), symbols[2].as_str()) {
            ("action", _) => Ok(Some("$$action".into())),
            ("hello", "world") => Ok(Some("$$hello-world".into())),
            ("update", arg) => match arg {
                "system" => Ok(Some("$$update-system".into())),
                "client" => Ok(Some("$$update-client".into())),
                _ => Err(LexError::ImproperSymbol(format!("Cannot update {arg}"))
                         .into_err(Position::NONE))
            },
            ("check", arg) => Ok(Some("$$check".into())),
            ("add", arg) => Ok(Some("$$add".into())),
            ("remove", arg) => Ok(Some("$$remove".into())),
            (cmd, arg) => Err(LexError::ImproperSymbol(
                format!("Invalid argument for command {cmd}: {arg}")
            ).into_err(Position::NONE)),
        },
        _ => unreachable!(),
    },
    // No variables declared/removed by this custom syntax
    false,
    // Implementation function
    |context, inputs, state| {
        let cmd = inputs.last().unwrap().get_string_value().unwrap();

        match cmd {
            "$$cleanup" => { ... }
            "$$action" => { ... }
            "$$hello-world" => { ... }
            "$$update-system" => { ... }
            "$$update-client" => { ... }
            "$$check" => { ... }
            "$$add" => { ... }
            "$$remove" => { ... }
            // Convert the message into a boxed 'EvalAltResult'
            _ => Err(format!("Invalid command: {cmd}").into())
        }
    }
);