In my last post, Implementing The World's Simplest Bytecode Interpreter in Rust, I did exactly that, following Vladimir Kazanov's World's Simplest Bytecode Interpreter.

His first interpreter was basically dumb as a rock: starting from an integer value of zero, it allowed only increment and decrement. His second variant adds two new instructions that allow for the addition and subtraction of arbitrary integers. This is where the differences between C and Rust really begin to shine. I've added the two instructions to the Opcode enum:

pub enum Opcode {
    Inc,
    Dec,
    Add(i64),  // New
    Sub(i64),  // New
    Done,
}

pub struct Program {
    program: Vec<Opcode>,
    accumulate: i64,
}

The interpreter's setup is unchanged:

pub fn interpret(program: Vec<Opcode>) -> Option<i64> {
    let mut code = Program {
        program,
        accumulate: 0,
    };

And the actual opcode runner is fairly straightforward:

    for i in code.program {
        code.accumulate = match i {
            Opcode::Inc => code.accumulate + 1,
            Opcode::Dec => code.accumulate - 1,
            Opcode::Add(n) => code.accumulate + n,  // New: the operand rides along in the variant
            Opcode::Sub(n) => code.accumulate - n,  // New
            Opcode::Done => break,  // `break` has type `!`, so it unifies with the i64 arms
        };
    }

    Some(code.accumulate)
}
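
One behavior worth calling out: Done doesn't just mark the end of the program, it breaks out of the dispatch loop, so any instructions after it are silently ignored. A quick check (my example, not from the post):

#[test]
fn done_short_circuits() {
    use Opcode::*;
    // Add(100) never executes; Done breaks out of the loop first.
    assert_eq!(interpret(vec![Inc, Done, Add(100)]), Some(1));
}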

And that's the whole change. Kazanov is trying to show how the program's array of instructions can keep values associated with an operation in-line with the operation itself, but in Rust that's just... an enum. An enum in Rust is an authentic sum type; each variant can carry additional information. There's no need to coerce the vector into holding "weird" values and casting them appropriately; Rust guarantees at compile time that every operand is where it belongs. And again, verifying the program's correctness as it comes off storage is a separate concern; by the time the program has entered the interpreter, we've verified that it is syntactically correct at the bytecode level.
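
To see what the sum type buys us, here's a minimal sketch (mine, not Kazanov's): the operand travels with its variant, and pattern matching recovers it fully typed, with no casting.

if let Opcode::Add(n) = Opcode::Add(4) {
    assert_eq!(n, 4);  // the i64 payload comes straight back out
}

In the C version, the operand sits in the instruction stream as a bare integer the dispatch loop has to know to skip over; here, the compiler enforces that Add always carries its operand and Inc never does.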

There is a cost to this: Rust's Vec consists of uniformly sized elements, each as big as the largest variant plus the enum's discriminant, rounded up to the payload's alignment. For this Opcode that works out to 16 bytes on a typical 64-bit target, eight for the i64 operand and eight for the discriminant plus padding, even for the payload-free Inc and Dec operators. For a small DSL, that's not a terrible price to pay. As an interpreter grows, alternative ways of describing an opcode may be desired. We shall see.
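
If you want to see the cost on your own machine, std::mem::size_of will report it (the exact number is layout-dependent, so treat 16 as typical rather than guaranteed):

fn main() {
    // Typically prints 16 on a 64-bit target: 8 bytes of i64 payload,
    // plus 8 bytes of discriminant and alignment padding.
    println!("{}", std::mem::size_of::<Opcode>());
}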

A basic unit test:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn longer() {
        use Opcode::*;
        assert_eq!(interpret(vec![Inc, Inc, Inc, Add(4), Add(3), Sub(6), Done]), Some(4));
    }

}

And the results are the same:

running 1 test
test tests::longer ... ok