17Apr

Implementing “The World’s Simplest Bytecode Interpreter” in Rust

Posted by Elf Sternberg as Uncategorized

Because I feel woefully out of practice with parts of Rust that aren’t related to my current project, I decided to try a couple of on-line exercises, mostly written in other languages, and see what sort of effort it would take to do the exercises in rust.

Vladimir Kazanov’s World’s Simplest Bytecode Interpreter was my first exercise, and it turned out that a number of the things Kazanov used just aren’t possible in Rust. His bytecode interpreter really is the simplest possible: it has three bytecodes: increment, decrement, and done.

Kazanov defines three structures: a VM, a bytecode collection, and a result. He defines a constructor for the VM that nulls it out, and then a function that takes a pointer to the program and runs it through his bytecode interpreter, returning either a result or an error message.

So far, the Rust is going to look the same:

pub enum Opcode {
    Inc,
    Dec,
    Done
}
    
pub struct Program {
    program: Vec<Opcode>,
    accumulate: i64
}

Instead of a pointer to the program, I’m going to move the entire program into my VM. This is probably not what I want, in the long run; I’d either want a reference or the capacity to Clone the code as needed, but this is the world’s simplest bytecode interpreter, so for now, there’s not a lot to do.

The interpret function is equally straightforward. Here’s the first notable changes to Kazanov’s code: First, we don’t need a special return type, because Rust has Option. Second, we don’t need to zero out the memory, because Rust guarantees that any memory we allocate is going to be zeroed out already. Third, we’re moving the program into the VM, thus requiring the client to take on the responsibility of ensuring the program is correct.

More importantly, we’re constrained by the compiler to the opcodes provided; we literally cannot have a bad opcode! Kazanov can; he defines an enum, but in C that’s just a set of named intergers. In Rust, enums have a much more strict semantic meaning. (Now, turning an enum into a static, on-storage representation, and reading it back into memory, is a different task, but it’s a task with exceptionally clear guidance that you just don’t have in C, and it’s guidance that you get for free, guidance with no runtime cost whatsoever. This is part of why I love to work in Rust.)

pub fn interpret(program: Vec<Opcode>) -> Option<i64> {
    let mut code = Program {
        program: program,
        accumulate: 0
    };

And the actual opcode runner is fairly straightforward:

    for i in code.program {
        code.accumulate = match i {
            Opcode::Inc => code.accumulate + 1,
            Opcode::Dec => code.accumulate - 1,
            Opcode::Done => break
        }
    }

    return Some(code.accumulate)
}

A basic unit test:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn longer() {
        use Opcode::*;
        assert_eq!(interpret(vec![Inc, Inc, Inc, Dec, Dec, Done, Inc, Inc]), Some(1));
    }

}

And the results are promising:

running 2 tests
test tests::longer ... ok

In Kazanov’s code, he needs extra test to assert that the program is “correct” in its bytecode, but these tests are not needed in Rust: for embedded bytecode, the compiler literally will not let me write a bad bytecode representation. A byte-to-bytecode reader will be required by the compiler to associate correct data values with read data types.

Comment Form

Subscribe to Feed

Categories

Calendar

April 2019
M T W T F S S
« Mar    
1234567
891011121314
15161718192021
22232425262728
2930