
% vim: tw=100

\documentclass[twocolumn]{article}
\usepackage{blindtext}
\usepackage[hypcap]{caption}
\usepackage{fontspec}
\usepackage[colorlinks, urlcolor={blue!80!black}]{hyperref}
\usepackage[outputdir=out]{minted}
\usepackage{relsize}
\usepackage{xcolor}

\setmonofont{Source Code Pro}[
  BoldFont={* Medium},
  BoldItalicFont={* Medium Italic},
  Scale=MatchLowercase,
]

\newcommand{\rust}[1]{\mintinline{rust}{#1}}

\begin{document}

\title{Miri: \\ \smaller{An interpreter for Rust's mid-level intermediate representation}}
% \subtitle{test}
\author{Scott Olson\footnote{\href{mailto:scott@solson.me}{scott@solson.me}} \\
  \smaller{Supervised by Christopher Dutchyn}}
\date{April 8th, 2016}
\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Abstract}

The increasing need for safe low-level code in contexts like operating systems and browsers is
driving the development of Rust\footnote{\url{https://www.rust-lang.org}}, a programming language
backed by Mozilla promising blazing speed without the segfaults. To make programming more
convenient, it's often desirable to be able to generate code or perform some computation at
compile-time. The former is mostly covered by Rust's existing macro feature, but the latter is
currently restricted to a limited form of constant evaluation capable of little beyond simple math.

When the existing constant evaluator was built, it would have been difficult to make it more
powerful than it is. However, a new intermediate representation was recently
added\footnote{\href{https://github.com/rust-lang/rfcs/blob/master/text/1211-mir.md}{The MIR RFC}}
to the Rust compiler between the abstract syntax tree and the back-end LLVM IR, called mid-level
intermediate representation, or MIR for short. As it turns out, writing an interpreter for MIR is a
surprisingly effective approach for supporting a large proportion of Rust's features in compile-time
execution.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Background}

The Rust compiler generates an instance of \rust{Mir} for each function [\autoref{fig:mir}]. Each
\rust{Mir} structure represents a control-flow graph for a given function, and contains a list of
``basic blocks'' which in turn contain a list of statements followed by a single terminator. Each
statement is of the form \rust{lvalue = rvalue}. An \rust{Lvalue} is used for referencing variables
and calculating addresses such as when dereferencing pointers, accessing fields, or indexing arrays.
An \rust{Rvalue} represents the core set of operations possible in MIR, including reading a value
from an lvalue, performing math operations, creating new pointers, structs, and arrays, and so on.
Finally, a terminator decides where control will flow next, optionally based on a boolean or some
other condition.

\begin{figure}[ht]
  \begin{minted}[autogobble]{rust}
    struct Mir {
        basic_blocks: Vec<BasicBlockData>,
        // ...
    }

    struct BasicBlockData {
        statements: Vec<Statement>,
        terminator: Terminator,
        // ...
    }

    struct Statement {
        lvalue: Lvalue,
        rvalue: Rvalue
    }

    enum Terminator {
        Goto { target: BasicBlock },
        If {
            cond: Operand,
            targets: [BasicBlock; 2]
        },
        // ...
    }
  \end{minted}
  \caption{MIR (simplified)}
  \label{fig:mir}
\end{figure}
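To make this structure concrete, the following sketch hand-encodes the control-flow graph of a small
absolute-value function. It uses cut-down versions of the types in \autoref{fig:mir} and is an
illustration under stated assumptions, not compiler code. Statements are kept as plain strings, the
\rust{If} condition is a local index rather than an \rust{Operand}, and a \rust{Return} variant is
added. The helper names \rust{block}, \rust{successors}, and \rust{abs_mir} are hypothetical.

\begin{minted}[autogobble, breaklines]{rust}
  // Toy versions of the simplified MIR types; the
  // names are hypothetical, not rustc's own.
  type BasicBlock = usize;

  enum Terminator {
      Goto { target: BasicBlock },
      // `cond` is just a local index in this toy.
      If { cond: usize, targets: [BasicBlock; 2] },
      Return,
  }

  struct BasicBlockData {
      // Statements are kept as text; only control
      // flow matters for this example.
      statements: Vec<&'static str>,
      terminator: Terminator,
  }

  fn block(statements: Vec<&'static str>, terminator: Terminator) -> BasicBlockData {
      BasicBlockData { statements, terminator }
  }

  // The CFG edges leaving a block.
  fn successors(bb: &BasicBlockData) -> Vec<BasicBlock> {
      match bb.terminator {
          Terminator::Goto { target } => vec![target],
          Terminator::If { targets, .. } => targets.to_vec(),
          Terminator::Return => Vec::new(),
      }
  }

  // Roughly: fn abs(x: i64) -> i64 {
  //     if x < 0 { -x } else { x }
  // }
  fn abs_mir() -> Vec<BasicBlockData> {
      vec![
          // bb0: evaluate the condition, then branch.
          block(vec!["tmp0 = Lt(x, 0)"], Terminator::If { cond: 0, targets: [1, 2] }),
          // bb1: the `then` branch.
          block(vec!["ret = Neg(x)"], Terminator::Goto { target: 3 }),
          // bb2: the `else` branch.
          block(vec!["ret = x"], Terminator::Goto { target: 3 }),
          // bb3: both branches join here and return.
          block(vec![], Terminator::Return),
      ]
  }
\end{minted}

Here \rust{successors} recovers the graph's edges. Block 0 branches to blocks 1 and 2, both of which
jump to block 3, which returns.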

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{First implementation}

\subsection{Basic operation}

Initially, I wrote a simple version of Miri\footnote{\url{https://github.com/tsion/miri}} that was
quite capable despite its flaws. The structure of the interpreter closely mirrors the structure of
MIR itself. It starts executing a function by iterating over the statement list in the starting
basic block, matching over the lvalue to produce a pointer and matching over the rvalue to decide
what to write into that pointer. Evaluating the rvalue may involve reads (such as for the two sides
of a binary operation) or construction of new values. Upon reaching the terminator, Miri matches
over it in the same way to select the next basic block. Finally, Miri returns to the top of the
main interpreter loop and repeats the entire process, reading statements from the new block.
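The loop just described can be sketched in a few dozen lines. This is a minimal toy rather than
Miri's code, and it makes several hypothetical simplifications. Lvalues are plain local indices
instead of pointers, and every value is an \rust{i64} (booleans are 0 or 1). Local 0 holds the
return value, and only a handful of rvalues and terminators exist. The function \rust{sum_to}
hand-encodes a small \rust{while} loop as four basic blocks.

\begin{minted}[autogobble, breaklines]{rust}
  // A toy interpreter loop. Hypothetical
  // simplifications: lvalues are local indices,
  // every value is an i64, local 0 is the return
  // value, and booleans are 0 or 1.
  #[derive(Clone, Copy)]
  enum Operand { Const(i64), Local(usize) }

  enum Rvalue {
      Use(Operand),
      Add(Operand, Operand),
      Lt(Operand, Operand),
  }

  struct Statement { lvalue: usize, rvalue: Rvalue }

  enum Terminator {
      Goto { target: usize },
      If { cond: Operand, targets: [usize; 2] },
      Return,
  }

  struct BasicBlockData {
      statements: Vec<Statement>,
      terminator: Terminator,
  }

  fn read(locals: &[i64], op: Operand) -> i64 {
      match op {
          Operand::Const(c) => c,
          Operand::Local(i) => locals[i],
      }
  }

  fn eval(locals: &[i64], rvalue: &Rvalue) -> i64 {
      match *rvalue {
          Rvalue::Use(a) => read(locals, a),
          Rvalue::Add(a, b) => read(locals, a) + read(locals, b),
          Rvalue::Lt(a, b) => (read(locals, a) < read(locals, b)) as i64,
      }
  }

  fn run(mir: &[BasicBlockData], mut locals: Vec<i64>) -> i64 {
      let mut bb = 0; // start in the first block
      loop {
          let block = &mir[bb];
          // Evaluate each rvalue and store it.
          for stmt in &block.statements {
              let value = eval(&locals, &stmt.rvalue);
              locals[stmt.lvalue] = value;
          }
          // The terminator selects the next block.
          bb = match block.terminator {
              Terminator::Goto { target } => target,
              Terminator::If { cond, targets } => {
                  if read(&locals, cond) != 0 { targets[0] } else { targets[1] }
              }
              Terminator::Return => return locals[0],
          };
      }
  }

  // Encodes: let mut s = 0; let mut i = 1;
  //          while i <= n { s += i; i += 1; } s
  // Locals: 0 = s, 1 = n, 2 = i, 3 = temporary.
  fn sum_to(n: i64) -> i64 {
      use Operand::{Const, Local};
      let st = |lvalue: usize, rvalue: Rvalue| Statement { lvalue, rvalue };
      let mir = vec![
          // bb0: s = 0; i = 1; goto bb1
          BasicBlockData {
              statements: vec![st(0, Rvalue::Use(Const(0))), st(2, Rvalue::Use(Const(1)))],
              terminator: Terminator::Goto { target: 1 },
          },
          // bb1: tmp = n < i; if tmp { bb3 } else { bb2 }
          BasicBlockData {
              statements: vec![st(3, Rvalue::Lt(Local(1), Local(2)))],
              terminator: Terminator::If { cond: Local(3), targets: [3, 2] },
          },
          // bb2: s = s + i; i = i + 1; goto bb1
          BasicBlockData {
              statements: vec![
                  st(0, Rvalue::Add(Local(0), Local(2))),
                  st(2, Rvalue::Add(Local(2), Const(1))),
              ],
              terminator: Terminator::Goto { target: 1 },
          },
          // bb3: return s
          BasicBlockData { statements: vec![], terminator: Terminator::Return },
      ];
      run(&mir, vec![0, n, 0, 0])
  }
\end{minted}

For example, \rust{sum_to(10)} runs the encoded loop and returns 55.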

\subsection{Function calls}

To handle function call terminators\footnote{Calls occur only as terminators, never as rvalues.},
Miri must store some information in a virtual call stack so that it can pick up where it
left off when the callee returns. Each stack frame stores a reference to the \rust{Mir} for the
function being executed, its local variables, its return value location\footnote{Return value
pointers are passed in by callers.}, and the basic block where execution should resume. To
facilitate returning, there is a \rust{Return} terminator which causes Miri to pop a stack frame and
resume the previous function. The entire execution of a program completes when the first function
that Miri called returns, rendering the call stack empty.

It should be noted that Miri does not itself recurse when a function is called; it merely pushes a
virtual stack frame and jumps to the top of the interpreter loop. Consequently, Miri can interpret
deeply recursive programs without overflowing its own native stack. It could also enforce a stack
depth limit and report an error when a program exceeds it.
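A minimal sketch of such a virtual call stack follows, under hypothetical simplifications. Every
function has four \rust{i64} locals, takes a single argument in local 1, and returns local 0. The
\rust{Call} terminator names the caller local that receives the result and the block to resume.
Because a call pushes a \rust{Frame} onto a \rust{Vec} instead of recursing in the host, the nesting
depth is bounded by memory rather than by the native stack. The \rust{limit} parameter stands in for
the optional depth limit.

\begin{minted}[autogobble, breaklines]{rust}
  // Toy call stack; hypothetical simplifications:
  // each function has 4 i64 locals, takes its one
  // argument in local 1, and returns local 0.
  #[derive(Clone, Copy)]
  enum Op { Const(i64), Local(usize) }

  // dst = a + b
  struct Stmt { dst: usize, a: Op, b: Op }

  enum Term {
      If { cond: Op, targets: [usize; 2] },
      // Call `func` with `arg`, store its result in
      // caller local `dest`, then resume at `target`.
      Call { func: usize, arg: Op, dest: usize, target: usize },
      Return,
  }

  struct Block { stmts: Vec<Stmt>, term: Term }

  struct Frame {
      func: usize,     // which function is running
      locals: Vec<i64>,
      bb: usize,       // block to execute next
      ret_dest: usize, // caller local for the result
  }

  #[derive(Debug, PartialEq)]
  enum Error { StackOverflow }

  fn read(locals: &[i64], op: Op) -> i64 {
      match op {
          Op::Const(c) => c,
          Op::Local(i) => locals[i],
      }
  }

  fn run(funcs: &[Vec<Block>], entry: usize, arg: i64, limit: usize) -> Result<i64, Error> {
      let frame = |func: usize, arg: i64, ret_dest: usize| Frame {
          func, locals: vec![0, arg, 0, 0], bb: 0, ret_dest,
      };
      let mut stack = vec![frame(entry, arg, 0)];
      // One flat loop: calls never recurse in the host.
      loop {
          let top = stack.last_mut().unwrap();
          let block = &funcs[top.func][top.bb];
          for s in &block.stmts {
              let v = read(&top.locals, s.a) + read(&top.locals, s.b);
              top.locals[s.dst] = v;
          }
          match block.term {
              Term::If { cond, targets } => {
                  let c = read(&top.locals, cond);
                  top.bb = if c != 0 { targets[0] } else { targets[1] };
              }
              Term::Call { func, arg, dest, target } => {
                  let a = read(&top.locals, arg);
                  top.bb = target; // where to resume later
                  if stack.len() >= limit {
                      return Err(Error::StackOverflow);
                  }
                  stack.push(frame(func, a, dest));
              }
              Term::Return => {
                  let done = stack.pop().unwrap();
                  match stack.last_mut() {
                      // Write the result into the caller.
                      Some(caller) => caller.locals[done.ret_dest] = done.locals[0],
                      // Stack empty: the program finished.
                      None => return Ok(done.locals[0]),
                  }
              }
          }
      }
  }

  // Hypothetical recursive function:
  // depth(n) = if n != 0 { depth(n - 1) + 1 } else { 0 }
  // Locals: 0 = result, 1 = n, 2 = n - 1, 3 = callee result.
  fn depth_fn() -> Vec<Block> {
      use Op::{Const, Local};
      vec![
          Block { stmts: vec![], term: Term::If { cond: Local(1), targets: [2, 1] } },
          // n == 0: local 0 already holds 0.
          Block { stmts: vec![], term: Term::Return },
          Block {
              stmts: vec![Stmt { dst: 2, a: Local(1), b: Const(-1) }],
              term: Term::Call { func: 0, arg: Local(2), dest: 3, target: 3 },
          },
          Block {
              stmts: vec![Stmt { dst: 0, a: Local(3), b: Const(1) }],
              term: Term::Return,
          },
      ]
  }
\end{minted}

With a generous limit, \rust{run} evaluates 100\,000 nested calls of the recursive function built by
\rust{depth_fn}. With \rust{limit} set to 10, the same program reports \rust{StackOverflow} instead.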

\subsection{Flaws}

This version of Miri was surprisingly easy to write and already supported quite a bit of the Rust
language, including booleans, integers, if-conditions, while-loops, structs, enums, arrays, tuples,
pointers, and function calls, all in about 400 lines of Rust code. However, it had a particularly
naive value representation with a number of downsides. It resembled the data layout of a dynamic
language like Ruby or Python, where every value has the same size\footnote{A Rust \rust{enum} is a
discriminated union with a tag and data the size of the largest variant, regardless of which variant
it contains.} in the interpreter:

\begin{minted}[autogobble]{rust}
  enum Value {
      Uninitialized,
      Bool(bool),
      Int(i64),
      Pointer(Pointer), // index into stack
      Adt { variant: usize, data_ptr: Pointer },
      // ...
  }
\end{minted}

This representation did not work well for \rust{Adt}s\footnote{Algebraic data types: structs, enums,
arrays, and tuples.} and required strange hacks to support them. Their contained values were
allocated elsewhere on the stack and pointed to by the \rust{Adt} value. This complicated copying
\rust{Adt} values from place to place, because copying the \rust{Adt} value alone would leave both
copies sharing the same contained values.
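The copying problem can be reproduced with a small hypothetical model of this design. Each
\rust{Value} fills one stack slot, and an \rust{Adt} records where its fields start on the stack.
The \rust{len} field is an assumption that stands in for information the interpreter would otherwise
take from the type. Copying the \rust{Adt} value alone leaves both copies sharing the same field
slots. A correct copy therefore has to duplicate the fields as well, recursively for nested
\rust{Adt}s.

\begin{minted}[autogobble, breaklines]{rust}
  // Hypothetical model of the first design: each
  // Value fills one stack slot, and an Adt points at
  // `len` field slots stored elsewhere on the stack.
  #[derive(Clone, Copy, Debug, PartialEq)]
  enum Value {
      Int(i64),
      Adt { variant: usize, data_ptr: usize, len: usize },
  }

  struct Stack {
      slots: Vec<Value>,
  }

  impl Stack {
      // Push values; return the index of the first.
      fn alloc(&mut self, vals: &[Value]) -> usize {
          let start = self.slots.len();
          self.slots.extend_from_slice(vals);
          start
      }

      // Copying an Adt correctly means copying its
      // fields too, recursively for nested Adts.
      fn deep_copy(&mut self, v: Value) -> Value {
          match v {
              Value::Int(_) => v,
              Value::Adt { variant, data_ptr, len } => {
                  let fields = self.slots[data_ptr..data_ptr + len].to_vec();
                  let copies: Vec<Value> =
                      fields.into_iter().map(|f| self.deep_copy(f)).collect();
                  let data_ptr = self.alloc(&copies);
                  Value::Adt { variant, data_ptr, len }
              }
          }
      }
  }
\end{minted}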

Moreover, while the \rust{Adt} issues could be worked around, this value representation made common
\rust{unsafe} programming tricks (which make assumptions about the low-level value layout)
fundamentally impossible.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Current implementation}

Roughly halfway through my time working on Miri, Rust compiler team member Eduard
Burtescu\footnote{\href{https://www.rust-lang.org/team.html\#Compiler}{The Rust compiler team}} made
a post on the Rust internals
forum\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's
``Rust Abstract Machine'' forum post}} about a ``Rust Abstract Machine'' specification which could
be used to implement more powerful compile-time function execution, similar to what is supported by
C++14's \mintinline{cpp}{constexpr} feature. After clarifying some of the details of the abstract
machine's data layout with Burtescu via IRC, I started implementing it in Miri.

\subsection{Raw value representation}

The main change in the new design is that values are stored in ``abstract allocations'', which are
arrays of raw bytes whose sizes depend on the types of the values they hold. This closely mimics how
Rust values are represented when compiled for traditional machines.
In addition to the raw bytes, allocations carry information about pointers and undefined bytes.

\begin{minted}[autogobble]{rust}
  struct Memory {
      map: HashMap<AllocId, Allocation>,
      next_id: AllocId,
  }

  struct Allocation {
      bytes: Vec<u8>,
      relocations: BTreeMap<usize, AllocId>,
      undef_mask: UndefMask,
  }
\end{minted}
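A minimal sketch of how such a memory might allocate, write, and read bytes follows. It is
illustrative only. Relocations are left out, and definedness is a plain \rust{Vec<bool>} rather than
a compact \rust{UndefMask}. \rust{MemError} and the method names are hypothetical. Every access is
checked against the size of its own allocation, and reading a byte that was never written is an
error. \rust{read_u32} shows how reinterpreting raw bytes, the kind of layout assumption
\rust{unsafe} code makes, becomes a plain little-endian read.

\begin{minted}[autogobble, breaklines]{rust}
  use std::collections::HashMap;
  use std::ops::Range;

  // Hypothetical sketch; relocations are omitted and
  // definedness is a plain Vec<bool> per allocation.
  type AllocId = u64;

  struct Allocation {
      bytes: Vec<u8>,
      defined: Vec<bool>,
  }

  #[derive(Default)]
  struct Memory {
      map: HashMap<AllocId, Allocation>,
      next_id: AllocId,
  }

  #[derive(Debug, PartialEq)]
  enum MemError { Dangling, OutOfBounds, Undefined }

  // Bounds check against one allocation's size.
  fn checked_range(size: usize, offset: usize, len: usize) -> Result<Range<usize>, MemError> {
      match offset.checked_add(len) {
          Some(end) if end <= size => Ok(offset..end),
          _ => Err(MemError::OutOfBounds),
      }
  }

  impl Memory {
      // A fresh allocation of `size` undefined bytes.
      fn allocate(&mut self, size: usize) -> AllocId {
          let id = self.next_id;
          self.next_id += 1;
          let alloc = Allocation { bytes: vec![0; size], defined: vec![false; size] };
          self.map.insert(id, alloc);
          id
      }

      fn write_bytes(&mut self, id: AllocId, offset: usize, src: &[u8]) -> Result<(), MemError> {
          let a = self.map.get_mut(&id).ok_or(MemError::Dangling)?;
          let r = checked_range(a.bytes.len(), offset, src.len())?;
          a.bytes[r.clone()].copy_from_slice(src);
          for d in &mut a.defined[r] {
              *d = true;
          }
          Ok(())
      }

      fn read_bytes(&self, id: AllocId, offset: usize, len: usize) -> Result<&[u8], MemError> {
          let a = self.map.get(&id).ok_or(MemError::Dangling)?;
          let r = checked_range(a.bytes.len(), offset, len)?;
          if a.defined[r.clone()].contains(&false) {
              return Err(MemError::Undefined);
          }
          Ok(&a.bytes[r])
      }

      // Reinterpret 4 bytes as a little-endian u32,
      // much as compiled code would.
      fn read_u32(&self, id: AllocId, offset: usize) -> Result<u32, MemError> {
          let b = self.read_bytes(id, offset, 4)?;
          Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
      }
  }
\end{minted}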

\subsubsection{Relocations}

The abstract machine represents pointers through ``relocations'', which are analogous to relocations
in linkers\footnote{\href{https://en.wikipedia.org/wiki/Relocation_(computing)}{Relocation
(computing) - Wikipedia}}. Instead of storing a global memory address in the raw byte representation
like a traditional machine, we store an offset from the start of the target allocation and add an
entry to the relocation table. The entry maps the index of the start of the offset bytes to the
\rust{AllocId} of the target allocation.

\begin{figure}[ht]
  \begin{minted}[autogobble]{rust}
    let a: [i16; 3] = [2, 4, 6];
    let b = &a[1];
    // A: 02 00 04 00 06 00 (6 bytes)
    // B: 02 00 00 00 (4 bytes)
    //    └───(A)───┘
  \end{minted}
  \caption{Example relocation on 32-bit little-endian}
  \label{fig:reloc}
\end{figure}

In effect, the abstract machine treats each allocation as a separate address space and represents
pointers as \rust{(address_space, offset)} pairs. This makes it easy to detect when pointer accesses
go out of bounds.

See \autoref{fig:reloc} for an example of a relocation. Variable \rust{b} points to the second
16-bit integer in \rust{a}, so it contains a relocation with offset 2 and target allocation
\rust{A}.
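The scheme can be sketched as follows, assuming 4-byte little-endian pointers as in
\autoref{fig:reloc}. Writing a pointer stores only its offset in the raw bytes and records the
target \rust{AllocId} in the relocation table. Reading it back reassembles the pair. The
\rust{Pointer} type, the error variants, and the method names are hypothetical.

\begin{minted}[autogobble, breaklines]{rust}
  use std::collections::{BTreeMap, HashMap};

  // Hypothetical sketch assuming 32-bit little-endian
  // pointers, as in the relocation example.
  const PTR_SIZE: usize = 4;
  type AllocId = u64;

  // An (allocation, offset) pair: each allocation is
  // its own address space.
  #[derive(Clone, Copy, Debug, PartialEq)]
  struct Pointer { alloc: AllocId, offset: usize }

  #[derive(Debug, PartialEq)]
  enum Error { Dangling, OutOfBounds, NotAPointer }

  #[derive(Default)]
  struct Allocation {
      bytes: Vec<u8>,
      // index where pointer bytes start -> target
      relocations: BTreeMap<usize, AllocId>,
  }

  #[derive(Default)]
  struct Memory {
      map: HashMap<AllocId, Allocation>,
  }

  impl Memory {
      // The `size` bytes at `ptr`, if they stay inside
      // ptr's allocation.
      fn get(&self, ptr: Pointer, size: usize) -> Result<&[u8], Error> {
          let a = self.map.get(&ptr.alloc).ok_or(Error::Dangling)?;
          let end = ptr.offset.checked_add(size).ok_or(Error::OutOfBounds)?;
          a.bytes.get(ptr.offset..end).ok_or(Error::OutOfBounds)
      }

      fn write_ptr(&mut self, dest: Pointer, target: Pointer) -> Result<(), Error> {
          self.get(dest, PTR_SIZE)?; // bounds check
          let a = self.map.get_mut(&dest.alloc).unwrap();
          let end = dest.offset + PTR_SIZE;
          // Store only the offset in the raw bytes.
          let raw = (target.offset as u32).to_le_bytes();
          a.bytes[dest.offset..end].copy_from_slice(&raw);
          // Drop stale relocations overlapping these bytes.
          let lo = dest.offset.saturating_sub(PTR_SIZE - 1);
          let stale: Vec<usize> = a.relocations.range(lo..end).map(|(&k, _)| k).collect();
          for k in stale {
              a.relocations.remove(&k);
          }
          // Record which allocation the pointer targets.
          a.relocations.insert(dest.offset, target.alloc);
          Ok(())
      }

      fn read_ptr(&self, src: Pointer) -> Result<Pointer, Error> {
          let raw = self.get(src, PTR_SIZE)?;
          let offset = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
          let relocs = &self.map[&src.alloc].relocations;
          let alloc = *relocs.get(&src.offset).ok_or(Error::NotAPointer)?;
          Ok(Pointer { alloc, offset })
      }
  }
\end{minted}

Replaying \autoref{fig:reloc}, writing a pointer to \rust{a[1]} (allocation A, offset 2) into B
leaves B's bytes as \texttt{02 00 00 00}, with a relocation at index 0 naming A. A two-byte read at
offset 6 of A is rejected as out of bounds, and bytes without a relocation cannot be read as a
pointer.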

\subsubsection{Undefined byte mask}

The final piece of an abstract allocation is the undefined byte mask. Logically, we store a boolean
for the definedness of every byte in the allocation, but there are multiple ways to make the storage
more compact. I tried two implementations: one based on the endpoints of alternating ranges of
defined and undefined bytes and the other based on a simple bitmask. The former is more compact,
but I found it surprisingly difficult to update cleanly. I currently use the bitmask, which is
comparatively trivial.
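For comparison, the range-based idea might look like the sketch below. It is a hypothetical
reconstruction, not the original implementation. It keeps a sorted list of the byte positions where
definedness flips. Every byte starts undefined, so a byte is defined exactly when an odd number of
flips lie at or before it. Lookups are a binary search. \rust{set_range}, however, must compute the
state on both sides of the new range, discard the flips inside it, and insert new ones only where
the state actually changes. That bookkeeping is what makes clean updates tricky.

\begin{minted}[autogobble, breaklines]{rust}
  // Hypothetical reconstruction of the range-based
  // idea: a sorted list of the byte positions where
  // definedness flips. Every byte starts undefined,
  // so byte i is defined iff an odd number of flips
  // are <= i.
  #[derive(Default)]
  struct RangeMask {
      flips: Vec<usize>,
  }

  impl RangeMask {
      fn get(&self, i: usize) -> bool {
          // Binary search: the number of flips <= i.
          self.flips.partition_point(|&f| f <= i) % 2 == 1
      }

      // Mark bytes [start, end) as `defined`.
      fn set_range(&mut self, start: usize, end: usize, defined: bool) {
          if start >= end {
              return;
          }
          // State just before the range, and at `end`.
          let lo = self.flips.partition_point(|&f| f < start);
          let before = lo % 2 == 1;
          let after = self.get(end);
          // Flips inside [start, end] become obsolete...
          let hi = self.flips.partition_point(|&f| f <= end);
          // ...and new ones are needed wherever the
          // state changes at either edge.
          let mut new = Vec::new();
          if before != defined {
              new.push(start);
          }
          if defined != after {
              new.push(end);
          }
          self.flips.splice(lo..hi, new);
      }
  }
\end{minted}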

See \autoref{fig:undef} for an example of an undefined byte, represented by underscores. The byte
array still holds some value for the second byte, but we don't care what it is. Listing one bit per
byte in byte order, the bitmask would be \texttt{10}, i.e.\ \rust{[true, false]}.
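A bitmask along these lines can be sketched as below. It assumes one bit per byte packed into
\rust{u64} words, with byte $i$ stored in bit $i \bmod 64$ of word $\lfloor i/64 \rfloor$ and a set
bit meaning ``defined''. The type and method names are hypothetical. Under this bit order, the mask
for the two-byte array in \autoref{fig:undef} is the integer 1, since only the lowest bit (the first
byte) is set.

\begin{minted}[autogobble, breaklines]{rust}
  // Hypothetical sketch: one bit per byte, packed
  // into u64 words; byte i is bit i % 64 of word
  // i / 64, and a set bit means "defined".
  struct UndefMask {
      words: Vec<u64>,
      len: usize,
  }

  impl UndefMask {
      // All `len` bytes start out undefined.
      fn new(len: usize) -> Self {
          UndefMask { words: vec![0; (len + 63) / 64], len }
      }

      fn get(&self, i: usize) -> bool {
          assert!(i < self.len, "byte index out of bounds");
          (self.words[i / 64] & (1u64 << (i % 64))) != 0
      }

      fn set(&mut self, i: usize, defined: bool) {
          assert!(i < self.len, "byte index out of bounds");
          let bit = 1u64 << (i % 64);
          if defined {
              self.words[i / 64] |= bit;
          } else {
              self.words[i / 64] &= !bit;
          }
      }

      // Bit-at-a-time for clarity; a faster version
      // could fill whole words at once.
      fn set_range(&mut self, start: usize, end: usize, defined: bool) {
          for i in start..end {
              self.set(i, defined);
          }
      }

      fn is_range_defined(&self, start: usize, end: usize) -> bool {
          (start..end).all(|i| self.get(i))
      }
  }
\end{minted}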

\begin{figure}[hb]
  \begin{minted}[autogobble]{rust}
    let a: [u8; 2] = unsafe {
        [1, std::mem::uninitialized()]
    };
    // A: 01 __ (2 bytes)
  \end{minted}
  \caption{Example undefined byte}
  \label{fig:undef}
\end{figure}

% TODO(tsion): Find a place for this text.
% Making Miri work was primarily an implementation problem. Writing an interpreter which models values
% of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some
% unconventional techniques compared to many interpreters. Miri's execution remains safe even while
% simulating execution of unsafe code, which allows it to detect when unsafe code does something
% invalid.

\begin{figure}[t]
  \begin{minted}[autogobble]{rust}
    struct Vec<T> {
        data: *mut T,    // 4 byte pointer
        capacity: usize, // 4 byte integer
        length: usize,   // 4 byte integer
    }

    let mut v: Vec<u8> =
        Vec::with_capacity(2);
    // A: 00 00 00 00 02 00 00 00 00 00 00 00
    //    └───(B)───┘
    // B: __ __

    v.push(1);
    // A: 00 00 00 00 02 00 00 00 01 00 00 00
    //    └───(B)───┘
    // B: 01 __

    v.push(2);
    // A: 00 00 00 00 02 00 00 00 02 00 00 00
    //    └───(B)───┘
    // B: 01 02

    v.push(3);
    // A: 00 00 00 00 04 00 00 00 03 00 00 00
    //    └───(B)───┘
    // B: 01 02 03 __
  \end{minted}
  \caption{\rust{Vec} example on 32-bit little-endian}
\end{figure}
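The byte layouts in the \rust{Vec} example can be reproduced with a short sketch. It assumes the
simplified three-field struct shown in the figure (the standard library does not guarantee this
field order), 4-byte little-endian fields, and the doubling growth the figure depicts. \rust{None}
stands for an undefined byte of B. Because B is resized in place, the pointer's raw bytes always hold
offset 0. \rust{ToyVec} and \rust{vec_header} are hypothetical names.

\begin{minted}[autogobble, breaklines]{rust}
  // Hypothetical: the figure's simplified Vec<u8> on
  // a 32-bit little-endian target. None marks an
  // undefined byte ("__") in allocation B.
  struct ToyVec {
      buf: Vec<Option<u8>>, // allocation B
      len: usize,
  }

  // The 12 bytes of allocation A. The data pointer's
  // raw bytes hold only its offset into B (here 0);
  // the link to B itself would be a relocation.
  fn vec_header(ptr_offset: u32, capacity: u32, length: u32) -> [u8; 12] {
      let mut out = [0u8; 12];
      out[0..4].copy_from_slice(&ptr_offset.to_le_bytes());
      out[4..8].copy_from_slice(&capacity.to_le_bytes());
      out[8..12].copy_from_slice(&length.to_le_bytes());
      out
  }

  impl ToyVec {
      fn with_capacity(cap: usize) -> Self {
          ToyVec { buf: vec![None; cap], len: 0 }
      }

      fn push(&mut self, x: u8) {
          if self.len == self.buf.len() {
              // Assumed doubling (2 to 4 in the figure);
              // the new bytes start out undefined.
              let new_cap = (self.buf.len() * 2).max(1);
              self.buf.resize(new_cap, None);
          }
          self.buf[self.len] = Some(x);
          self.len += 1;
      }

      fn header(&self) -> [u8; 12] {
          vec_header(0, self.buf.len() as u32, self.len as u32)
      }
  }
\end{minted}

After \rust{with_capacity(2)} and three pushes, \rust{header()} returns
\texttt{00 00 00 00 04 00 00 00 03 00 00 00}, which matches the final state in the figure.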

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Future work}

Other possible uses for Miri include:

\begin{itemize}
  \item A graphical or text-mode debugger that steps through MIR execution one statement at a time,
    for figuring out why some compile-time execution is raising an error or simply learning how Rust
    works at a low level.
  \item A read-eval-print loop (REPL) for Rust may be easier to implement on top of Miri than on the
    usual LLVM back-end.
  \item An extended version of Miri, developed separately from the goal of compile-time execution,
    could run foreign functions from C/C++ and have full access to the operating system. Such a
    version of Miri could be used to prototype changes to the Rust language more quickly, without
    first implementing them in the LLVM back-end.
  \item Miri might be useful for unit-testing the compiler by comparing the results of Miri's
    execution against the results of executing LLVM-compiled machine code. This would help ensure
    that compile-time execution behaves the same as runtime execution.
\end{itemize}
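The last idea is a form of differential testing. The same program is run through the interpreter
and through compiled code, and any input on which they disagree is reported. The sketch below shows
the shape of such a check in miniature. It substitutes a hypothetical one-variable expression
language for MIR and a Rust closure for the LLVM-compiled code. Both sides use wrapping arithmetic,
so overflow cannot cause a panic.

\begin{minted}[autogobble, breaklines]{rust}
  // Hypothetical stand-in for MIR: expressions over
  // one variable x. Arithmetic wraps on overflow, so
  // neither side can panic.
  enum Expr {
      X,
      Const(i32),
      Add(Box<Expr>, Box<Expr>),
      Mul(Box<Expr>, Box<Expr>),
  }

  // The "interpreter" side.
  fn interpret(e: &Expr, x: i32) -> i32 {
      match e {
          Expr::X => x,
          Expr::Const(c) => *c,
          Expr::Add(a, b) => interpret(a, x).wrapping_add(interpret(b, x)),
          Expr::Mul(a, b) => interpret(a, x).wrapping_mul(interpret(b, x)),
      }
  }

  // Compare against natively compiled code; return
  // the first input on which the two disagree.
  fn first_mismatch(
      e: &Expr,
      native: impl Fn(i32) -> i32,
      inputs: impl IntoIterator<Item = i32>,
  ) -> Option<i32> {
      inputs.into_iter().find(|&x| interpret(e, x) != native(x))
  }
\end{minted}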

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Thanks}

Eduard Burtescu, Niko Matsakis, and Christopher Dutchyn.

\end{document}