17Apr

The Third Simplest Bytecode Interpreter, In Rust

Posted by Elf Sternberg as Uncategorized

For the previous step of Implementing The World’s Simplest Bytecode Interpreter in Rust, I followed Vladimir Kazanov as he added larger opcodes to his bytecode. In the third variant, he adds a stack, pushing values on as needed and allowing the operations to pop values off the stack as needed.

My variant of this operation follows his, but is slightly different. He continues to use the stack somewhat arbitrarily, but as I’ve pointed out, Rust doesn’t alow for that outside of unsafe code.

We’re going to swap out the accumulator for a stack, and we’ll return the last value pushed onto the stack as the result. After including the result type, which we’ll use for error messaging and forwarding, our program now looks like this:

<lib.rs>=
use std::result;

pub enum Opcode {
    Push(i64),
    Add,
    Sub,
    Mul,
    Div,
    Done
}

pub struct Program {
    program: Vec<Opcode>,
    stack: Vec<i64>,
}

One thing I wanted to do was add proper error handling. This isn’t exactly "proper," as I’ve added no pretty printing, but it does the job. One thing I have done is redefine Result to always have the types used by my library, and nothing else:

<lib.rs>+=
#[derive(Debug)]
pub enum ProgramError {
    DivisionByZero,
    StackUnderflow,
}

type Result<T> = result::Result<T, ProgramError>;

Okay, now things are gonna get… a little advanced. But only a little. What this next piece is, is simple: I found myself writing the exact same seven lines of code for three of the four operations. So why not extract that out into a function? Well, the biggest problem is that the maths symbols, +, -, etc, aren’t first-class functions; they can’t be passed around. But they are Rust tokens, which means they can be handled by a macro. So I created a macro that does the operation, and manipulates the object to pop() and push() the values as needed:

<lib.rs>+=
macro_rules! make_op {
    ($code:expr, $op:tt) => {{
        if let Some(a) = $code.stack.pop() {
            if let Some(b) = $code.stack.pop() {
                $code.stack.push(b $op a);
                None
            } else { Some(ProgramError::StackUnderflow) }
        } else { Some(ProgramError::StackUnderflow) }
    }
}}

This is a very simple substitution macro; it takes two items separated by a comma and builds a block that either returns None or Some(Error). We’ll use this return value as a signal to our processing loop.

The first item passed into the macro is a Rust expression, something that returns a value (in this case, the value is just the program struct, so access is a no-op), and the second is a Rust token, which is the most primitive object a macro can handle, but it’s perfect for arithmetic symbols.

There is one big oddity in the macro rule; can you spot it? The operation’s values will be extracted into a and b, but the operation itself is b $op a. Multiplication and addition are fully commutative, at least for integers, but subtraction is not; we needed to do the operations in this order to ensure that they were done in an "unsurprising" order: 5 4 - is "5 – 4", not "4 – 5".

Initializing our program is unchanged:

<lib.rs>+=
pub fn interpret(program: Vec<Opcode>) -> Result<i64> {
    let mut code = Program {
        program: program,
        stack: Vec::new()
    };

And now that I have my macro handler, the actual opcode runner is fairly uncluttered, although the cyclomatic complexity of the error handling for division does make it ugly. This is one place where railway-oriented programming would be lovely, but that’s an advanced topic.

Basically, if any operation returns an error the program halts right there with the error value, otherwise it proceeds until it either hits Done or runs out of operations to perform, at which point it tries to extract the value from the stack and return it, or returns a StackUnderflow error if the stack is empty.

<lib.rs>+=
    for op in code.program {
        if let Some(err) = match op {
            Opcode::Push(i) => { code.stack.push(i); None }
            Opcode::Mul => make_op!(code, *),
            Opcode::Add => make_op!(code, +),
            Opcode::Sub => make_op!(code, -),
            Opcode::Div => {
                if let Some(a) = code.stack.pop() {
                    if a == 0 {
                        Some(ProgramError::DivisionByZero)
                    } else if let Some(b) = code.stack.pop() {
`                       code.stack.push(b / a);
                        None
                    } else { Some(ProgramError::StackUnderflow) }
                } else { Some(ProgramError::StackUnderflow) }
            }
            Opcode::Done => break,
        } {
            return Err(err);
        }
    }

    if let Some(v) = code.stack.pop() {
        Ok(v)
    } else {
        Err(ProgramError::StackUnderflow)
    }
}

And the unit tests:

<lib.rs>+=
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn basic() {
        use Opcode::*;
        assert_eq!(interpret(vec![Push(2), Done]).unwrap(), 2);
        assert_eq!(interpret(vec![Push(2), Push(3), Mul, Done]).unwrap(), 6);
        assert_eq!(interpret(vec![Push(5), Push(3), Sub, Done]).unwrap(), 2);
        assert!(interpret(vec![Push(2), Mul, Done]).is_err());
        assert!(interpret(vec![Push(2), Push(0), Div, Done]).is_err());
        assert_eq!(
            interpret(vec![ Push(2), Push(2), Mul, Push(3), Mul,
                            Push(4), Mul, Done]).unwrap(), 48);
        assert_eq!(
            interpret(vec![ Push(5), Push(2), Mul, Push(5), Div,
                            Push(2), Div, Done ]) .unwrap(), 1);
    }
}

And the results are the same:

running 1 tests
test tests::basic ... ok

One thing to note here is that we’ve now entered a realm where type errors are possible; if we could push more than one type of object onto the stack, we would need to differentiate among the types that our operations could handle. For a full-blown general-purpose interpreter, this sort of typechecking will get repetitive very quickly, but once you’ve identified the typechecking pattern in your code you can extract that repetition into a macro. Rust macros are already amazing, and I consider myself very much a beginner when it comes to them.

Comment Form

Subscribe to Feed

Categories

Calendar

April 2019
M T W T F S S
« Mar   May »
1234567
891011121314
15161718192021
22232425262728
2930