% vim: tw=100

\documentclass[twocolumn]{article}
\usepackage{blindtext}
\usepackage[hypcap]{caption}
\usepackage{fontspec}
\usepackage[colorlinks, urlcolor={blue!80!black}]{hyperref}
\usepackage[outputdir=out]{minted}
\usepackage{relsize}
\usepackage{xcolor}

\setmonofont{Source Code Pro}[
  BoldFont={* Medium},
  BoldItalicFont={* Medium Italic},
  Scale=MatchLowercase,
]

\newcommand{\rust}[1]{\mintinline{rust}{#1}}

\begin{document}

\title{Miri: \\ \smaller{An interpreter for Rust's mid-level intermediate representation}}
% \subtitle{test}
\author{Scott Olson\footnote{\href{mailto:scott@solson.me}{scott@solson.me}} \\
  \smaller{Supervised by Christopher Dutchyn}}
\date{April 8th, 2016}
\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Abstract}

The increasing need for safe low-level code in contexts like operating systems and browsers is
driving the development of Rust\footnote{\url{https://www.rust-lang.org}}, a programming language
backed by Mozilla promising blazing speed without the segfaults. To make programming more
convenient, it is often desirable to generate code or perform some computation at compile time. The
former is mostly covered by Rust's existing macro feature, but the latter is currently restricted to
a limited form of constant evaluation capable of little beyond simple math.

When the existing constant evaluator was built, it would have been difficult to make it more
powerful than it is. However, a new intermediate representation called the mid-level intermediate
representation, or MIR for short, was recently
added\footnote{\href{https://github.com/rust-lang/rfcs/blob/master/text/1211-mir.md}{The MIR RFC}}
to the Rust compiler, between the abstract syntax tree and the back-end LLVM IR. As it turns out,
writing an interpreter for MIR is a surprisingly effective approach for supporting a large
proportion of Rust's features in compile-time execution.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
||
\section{Background}
|
||
|
||
The Rust compiler generates an instance of \rust{Mir} for each function [\autoref{fig:mir}]. Each
\rust{Mir} structure represents the control-flow graph of a function and contains a list of
``basic blocks'', each of which holds a list of statements followed by a single terminator. Each
statement has the form \rust{lvalue = rvalue}. An \rust{Lvalue} is used for referencing variables
and for calculating addresses, such as when dereferencing pointers, accessing fields, or indexing
arrays. An \rust{Rvalue} represents the core set of operations possible in MIR, including reading a
value from an lvalue, performing arithmetic, and creating new pointers, structs, arrays, and so on.
Finally, a terminator decides where control flows next, optionally based on a boolean or some other
condition.

\begin{figure}[ht]
\begin{minted}[autogobble]{rust}
struct Mir {
    basic_blocks: Vec<BasicBlockData>,
    // ...
}

struct BasicBlockData {
    statements: Vec<Statement>,
    terminator: Terminator,
    // ...
}

struct Statement {
    lvalue: Lvalue,
    rvalue: Rvalue,
}

enum Terminator {
    Goto { target: BasicBlock },
    If {
        cond: Operand,
        targets: [BasicBlock; 2],
    },
    // ...
}
\end{minted}
\caption{MIR (simplified)}
\label{fig:mir}
\end{figure}
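
The figure leaves \rust{Lvalue}, \rust{Rvalue}, \rust{Operand}, and \rust{BasicBlock} undefined.
Purely as an illustration, the following sketch fills them in with hypothetical, much-reduced
stand-ins (an lvalue is just a numbered local, and only three kinds of rvalue exist) and builds by
hand the kind of control-flow graph the compiler might produce for an absolute-value function. None
of these definitions or helper functions are the compiler's own.
\begin{minted}[autogobble, breaklines, fontsize=\footnotesize]{rust}
// Hypothetical stand-ins for the elided types: an
// lvalue is just the number of a local variable.
type Local = usize;
type BasicBlock = usize;

enum Operand {
    Var(Local),
    Const(i64),
}

enum Rvalue {
    Use(Operand),
    Neg(Operand),
    Lt(Operand, Operand),
}

struct Statement {
    lvalue: Local,
    rvalue: Rvalue,
}

enum Terminator {
    Goto { target: BasicBlock },
    If { cond: Operand, targets: [BasicBlock; 2] },
    Return,
}

struct BasicBlockData {
    statements: Vec<Statement>,
    terminator: Terminator,
}

struct Mir {
    basic_blocks: Vec<BasicBlockData>,
}

fn stmt(lvalue: Local, rvalue: Rvalue) -> Statement {
    Statement { lvalue, rvalue }
}

fn block(stmts: Vec<Statement>, term: Terminator) -> BasicBlockData {
    BasicBlockData { statements: stmts, terminator: term }
}

// A MIR-like CFG for:
//     fn abs(x: i64) -> i64 {
//         if x < 0 { -x } else { x }
//     }
// _0 is the return place, _1 is `x`, and _2 is a
// temporary holding the comparison `x < 0`.
fn abs_mir() -> Mir {
    use Operand::{Const, Var};
    use Terminator::{Goto, If, Return};
    let blocks = vec![
        // bb0: _2 = _1 < 0; branch on _2
        block(
            vec![stmt(2, Rvalue::Lt(Var(1), Const(0)))],
            If { cond: Var(2), targets: [1, 2] },
        ),
        // bb1: _0 = -_1; goto bb3
        block(vec![stmt(0, Rvalue::Neg(Var(1)))], Goto { target: 3 }),
        // bb2: _0 = _1; goto bb3
        block(vec![stmt(0, Rvalue::Use(Var(1)))], Goto { target: 3 }),
        // bb3: return to the caller
        block(vec![], Return),
    ];
    Mir { basic_blocks: blocks }
}
\end{minted}

Block 0 ends in an \rust{If} terminator that branches to block 1 or block 2, and both of those jump
to block 3, whose \rust{Return} terminator ends the function.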
|
||
|
||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||
|
||
\section{First implementation}
|
||
|
||
\subsection{Basic operation}
|
||
|
||
Initially, I wrote a simple version of Miri\footnote{\url{https://github.com/tsion/miri}} that was
quite capable despite its flaws. The structure of the interpreter closely mirrors the structure of
MIR itself. It starts executing a function by iterating over the statement list in the starting
basic block, matching over the lvalue to produce a pointer and matching over the rvalue to decide
what to write into that pointer. Evaluating the rvalue may involve reads (such as for the two sides
of a binary operation) or construction of new values. Upon reaching the terminator, Miri matches
over it in the same way to select the next basic block. Finally, Miri returns to the top of the main
interpreter loop, and the entire process repeats with the statements of the new block.
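
The loop just described might be sketched as follows. This is a hypothetical reconstruction rather
than Miri's code: all names are assumptions, the only lvalues are numbered locals (so the
``pointer'' produced by matching an lvalue is simply an index into the function's locals), and
values use a uniform, dynamically typed representation (\rust{Value::Int} or \rust{Value::Bool}).
\begin{minted}[autogobble, breaklines, fontsize=\footnotesize]{rust}
// Hypothetical sketch of the main interpreter loop,
// using a uniform, dynamically typed `Value`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
}

enum Lvalue {
    Local(usize),
}

enum Operand {
    Consume(Lvalue), // read the value at an lvalue
    Const(Value),
}

#[derive(Clone, Copy)]
enum BinOp {
    Add,
    Lt,
}

enum Rvalue {
    Use(Operand),
    BinaryOp(BinOp, Operand, Operand),
}

enum Terminator {
    Goto(usize),
    If(Operand, [usize; 2]),
    Return,
}

struct Block {
    statements: Vec<(Lvalue, Rvalue)>,
    terminator: Terminator,
}

/// Runs one function body to completion. Local 0 is
/// the return place; arguments are locals 1..=n.
fn run(blocks: &[Block], num_locals: usize, args: &[Value]) -> Value {
    let mut locals = vec![Value::Int(0); num_locals];
    locals[1..=args.len()].copy_from_slice(args);
    let mut current: usize = 0; // the starting block
    loop {
        let block = &blocks[current];
        for (lvalue, rvalue) in &block.statements {
            // Match the lvalue to get a "pointer",
            // here an index into `locals`.
            let ptr = match *lvalue {
                Lvalue::Local(i) => i,
            };
            // Match the rvalue to decide what to write.
            let value = match rvalue {
                Rvalue::Use(op) => eval(op, &locals),
                Rvalue::BinaryOp(op, l, r) => {
                    binop(*op, eval(l, &locals), eval(r, &locals))
                }
            };
            locals[ptr] = value;
        }
        // Match the terminator to select the next block.
        current = match &block.terminator {
            Terminator::Goto(target) => *target,
            Terminator::If(cond, [then, other]) => match eval(cond, &locals) {
                Value::Bool(true) => *then,
                Value::Bool(false) => *other,
                v => panic!("non-boolean condition: {:?}", v),
            },
            Terminator::Return => return locals[0],
        };
    }
}

fn eval(op: &Operand, locals: &[Value]) -> Value {
    match op {
        Operand::Consume(Lvalue::Local(i)) => locals[*i],
        Operand::Const(v) => *v,
    }
}

fn binop(op: BinOp, l: Value, r: Value) -> Value {
    use Value::{Bool, Int};
    match (op, l, r) {
        (BinOp::Add, Int(a), Int(b)) => Int(a + b),
        (BinOp::Lt, Int(a), Int(b)) => Bool(a < b),
        _ => panic!("type error in binary operation"),
    }
}
\end{minted}

Loops in the interpreted program need no special support: a back edge in the control-flow graph is
just a terminator whose target is an earlier block, so execution simply returns to the top of the
loop with that block.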

\subsection{Function calls}

To handle function call terminators\footnote{Calls occur only as terminators, never as rvalues.},
Miri must store some information in a virtual call stack so that it can pick up where it left off
when the callee returns. Each stack frame stores a reference to the \rust{Mir} of the function
being executed, its local variables, its return value location\footnote{Return value pointers are
passed in by callers.}, and the basic block where execution should resume. To facilitate returning,
there is a \rust{Return} terminator, which causes Miri to pop a stack frame and resume the previous
function. Execution of the whole program completes when the first function that Miri called
returns, leaving the call stack empty.
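
A minimal sketch of such a virtual call stack follows, under assumptions made only for
illustration: values are plain \rust{i64}s, the locals of every frame are stored in one growable
vector so that a ``pointer'' is an index into it, and local 0 of each frame is its return place,
which resolves to the pointer supplied by the caller. The \rust{Frame}, \rust{Call}, and
\rust{call} names are hypothetical.
\begin{minted}[autogobble, breaklines, fontsize=\footnotesize]{rust}
// Hypothetical sketch: i64 values; pointer = index.
enum Operand {
    Local(usize),
    Const(i64),
}

enum Rvalue {
    Use(Operand),
    Sub(Operand, Operand),
    Mul(Operand, Operand),
}

enum Terminator {
    If(Operand, [usize; 2]), // nonzero -> targets[0]
    // Call fns[func]; result to `dest`, resume at `target`.
    Call { func: usize, args: Vec<Operand>, dest: usize, target: usize },
    Return,
}

struct Block {
    statements: Vec<(usize, Rvalue)>,
    terminator: Terminator,
}

struct Mir {
    num_locals: usize, // return place, args, temps
    blocks: Vec<Block>,
}

#[derive(Clone, Copy)]
struct Frame<'a> {
    mir: &'a Mir,   // function being executed
    base: usize,    // its locals start here in memory
    ret_ptr: usize, // return location from the caller
    block: usize,   // block to run, or to resume at
}

impl Frame<'_> {
    // Local 0 aliases the caller-supplied pointer.
    fn ptr(&self, local: usize) -> usize {
        if local == 0 { self.ret_ptr } else { self.base + local }
    }
    fn eval(&self, memory: &[i64], op: &Operand) -> i64 {
        match op {
            Operand::Local(l) => memory[self.ptr(*l)],
            Operand::Const(c) => *c,
        }
    }
}

fn push<'a>(
    stack: &mut Vec<Frame<'a>>,
    memory: &mut Vec<i64>,
    mir: &'a Mir,
    ret_ptr: usize,
    args: &[i64],
) {
    let base = memory.len();
    memory.resize(base + mir.num_locals, 0);
    memory[base + 1..][..args.len()].copy_from_slice(args);
    stack.push(Frame { mir, base, ret_ptr, block: 0 });
}

/// Runs `fns[entry]` to completion without recursing.
fn call(fns: &[Mir], entry: usize, args: &[i64]) -> i64 {
    let mut memory = vec![0]; // memory[0]: the result
    let mut stack = Vec::new();
    push(&mut stack, &mut memory, &fns[entry], 0, args);
    while let Some(&frame) = stack.last() {
        let block = &frame.mir.blocks[frame.block];
        for (dest, rvalue) in &block.statements {
            let ev = |op: &Operand| frame.eval(&memory, op);
            let value = match rvalue {
                Rvalue::Use(a) => ev(a),
                Rvalue::Sub(a, b) => ev(a) - ev(b),
                Rvalue::Mul(a, b) => ev(a) * ev(b),
            };
            memory[frame.ptr(*dest)] = value;
        }
        let top = stack.len() - 1;
        match &block.terminator {
            Terminator::If(cond, [t, e]) => {
                let nonzero = frame.eval(&memory, cond) != 0;
                stack[top].block = if nonzero { *t } else { *e };
            }
            Terminator::Call { func, args, dest, target } => {
                stack[top].block = *target; // resume here
                let vals: Vec<i64> = args.iter().map(|a| frame.eval(&memory, a)).collect();
                push(&mut stack, &mut memory, &fns[*func], frame.ptr(*dest), &vals);
            }
            Terminator::Return => {
                stack.pop(); // result is already at ret_ptr
                memory.truncate(frame.base);
            }
        }
    }
    memory[0]
}
\end{minted}

Because the callee writes its result directly through the caller-supplied pointer, the
\rust{Return} case only has to pop the frame and release its locals; the caller then continues at
the block recorded in its own frame.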

It should be noted that Miri does not itself recurse when a function is called; it merely pushes a
virtual stack frame and jumps to the top of the interpreter loop. Consequently, the recursion depth
of the interpreted program is not limited by the size of Miri's own native stack, so Miri can
interpret deeply recursive programs without crashing. It could also impose a stack depth limit and
report an error when a program exceeds it.
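
The following deliberately tiny sketch applies the same structure to a single hypothetical function
\rust{f} that calls itself with \rust{n - 1} until \rust{n} reaches zero. Frames live in a
heap-allocated \rust{Vec}, so a very deep interpreted recursion costs only heap memory, and the
optional depth limit becomes a length check on that vector. The \rust{Error} type and the
\rust{limit} parameter are illustrative assumptions, not part of Miri.
\begin{minted}[autogobble, breaklines, fontsize=\footnotesize]{rust}
// Hypothetical sketch: interprets the recursive
//     fn f(n: u64) { if n > 0 { f(n - 1) } }
// with an explicit, heap-allocated stack of frames.
#[derive(Debug, PartialEq)]
enum Error {
    StackOverflow { limit: usize },
}

struct Frame {
    n: u64,    // the frame's only local variable
    block: u8, // 0 = entry, 1 = after the call
}

/// Runs `f(n)` with at most `limit` frames, returning
/// the deepest stack size reached.
fn run(n: u64, limit: usize) -> Result<usize, Error> {
    let mut stack = vec![Frame { n, block: 0 }];
    let mut max_depth: usize = 1;
    // A call pushes a frame and a return pops one; the
    // native stack never grows, however deep `f` goes.
    while let Some(frame) = stack.last_mut() {
        if frame.block == 0 && frame.n > 0 {
            frame.block = 1; // resume point after the call
            let arg = frame.n - 1;
            if stack.len() == limit {
                return Err(Error::StackOverflow { limit });
            }
            stack.push(Frame { n: arg, block: 0 });
            max_depth = max_depth.max(stack.len());
        } else {
            stack.pop(); // base case, or call finished
        }
    }
    Ok(max_depth)
}
\end{minted}

For example, interpreting \rust{f(100000)} with no practical limit reaches a depth of 100001 frames
without touching the native stack, while a limit of 5 turns \rust{f(10)} into an
\rust{Error::StackOverflow} result.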

\subsection{Flaws}

% TODO(tsion): Incorporate this text from the slides.
% At first I wrote a naive version with a number of downsides:
% * I represented values in a traditional dynamic language format,
%   where every value was the same size.
% * It didn't work well for aggregates (structs, enums, arrays, etc.).
% * It made unsafe programming tricks that make assumptions
%   about low-level value layout essentially impossible.

% TODO(tsion): Find a place for this text.
Making Miri work was primarily an implementation problem. Writing an interpreter that models values
of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some
techniques that are unconventional compared to many interpreters. Miri's execution remains safe
even while simulating the execution of unsafe code, which allows it to detect when unsafe code does
something invalid.
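
One way to achieve this, sketched here under illustrative assumptions, is never to give the
interpreted program a real host address: a pointer is an allocation identifier plus an offset, each
allocation is a plain byte vector, and every access is bounds-checked, so an out-of-bounds read or a
use after free becomes an error report rather than undefined behavior in the interpreter itself.
\rust{Memory}, \rust{Pointer}, \rust{MemError}, and the little-endian encoding are illustrative
choices, not a description of Miri's actual memory model.
\begin{minted}[autogobble, breaklines, fontsize=\footnotesize]{rust}
use std::collections::HashMap;
use std::ops::Range;

// Hypothetical sketch: a pointer is an allocation ID
// plus an offset, never a real host address, so every
// access can be checked before it happens.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pointer {
    alloc: u64,
    offset: usize,
}

impl Pointer {
    // Pointer arithmetic is unchecked; only accesses
    // through the resulting pointer are validated.
    fn offset_by(self, n: usize) -> Pointer {
        Pointer { offset: self.offset + n, ..self }
    }
}

#[derive(Debug, PartialEq)]
enum MemError {
    DanglingPointer,
    OutOfBounds { offset: usize, size: usize, len: usize },
}

#[derive(Default)]
struct Memory {
    allocs: HashMap<u64, Vec<u8>>, // ID -> raw bytes
    next_id: u64,
}

// The byte range of an access, if it is in bounds.
fn check(len: usize, ptr: Pointer, size: usize) -> Result<Range<usize>, MemError> {
    match ptr.offset.checked_add(size) {
        Some(end) if end <= len => Ok(ptr.offset..end),
        _ => Err(MemError::OutOfBounds { offset: ptr.offset, size, len }),
    }
}

impl Memory {
    fn allocate(&mut self, size: usize) -> Pointer {
        let id = self.next_id;
        self.next_id += 1;
        self.allocs.insert(id, vec![0; size]);
        Pointer { alloc: id, offset: 0 }
    }

    // Freeing a dangling pointer (e.g. a double free)
    // is reported as an error.
    fn deallocate(&mut self, ptr: Pointer) -> Result<(), MemError> {
        match self.allocs.remove(&ptr.alloc) {
            Some(_) => Ok(()),
            None => Err(MemError::DanglingPointer),
        }
    }

    // Little-endian byte order is assumed here.
    fn read_u32(&self, ptr: Pointer) -> Result<u32, MemError> {
        let bytes = self.allocs.get(&ptr.alloc).ok_or(MemError::DanglingPointer)?;
        let range = check(bytes.len(), ptr, 4)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&bytes[range]);
        Ok(u32::from_le_bytes(buf))
    }

    fn write_u32(&mut self, ptr: Pointer, value: u32) -> Result<(), MemError> {
        let bytes = self.allocs.get_mut(&ptr.alloc).ok_or(MemError::DanglingPointer)?;
        let range = check(bytes.len(), ptr, 4)?;
        bytes[range].copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}
\end{minted}

Reading four bytes at offset 6 of an 8-byte allocation, for instance, yields
\rust{MemError::OutOfBounds} rather than touching memory past the end of the vector.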

\blindtext

\section{Data layout}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Future work}

Other possible uses for Miri include:

\begin{itemize}
\item A graphical or text-mode debugger that steps through MIR execution one statement at a time,
  for figuring out why some compile-time execution is raising an error or simply learning how Rust
  works at a low level.
\item A read-eval-print loop (REPL) for Rust, which may be easier to implement on top of Miri than
  on the usual LLVM back-end.
\item An extended version of Miri, developed independently of compile-time execution, could run
  foreign functions from C/C++ and generally have full access to the operating system. Such a
  version of Miri could be used to prototype changes to the Rust language more quickly, since they
  would otherwise require changes to the LLVM back-end.
\item Miri might be useful for unit-testing the compiler by comparing the results of Miri's
  execution against the results of executing LLVM-compiled machine code. This would help ensure
  that compile-time execution behaves the same as runtime execution.
\end{itemize}
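
For the first item, an interpreter mainly needs to expose execution one step at a time. The sketch
below shows one hypothetical shape for such an interface: each call to \rust{step} executes exactly
one statement or terminator and reports where it was, so a debugger could pause and display the
locals between calls. The tiny instruction set, \rust{Machine}, and \rust{Step} are assumptions
made for this example.
\begin{minted}[autogobble, breaklines, fontsize=\footnotesize]{rust}
// Hypothetical sketch of a single-step interface
// that a debugger could drive. Locals are i64s.
enum Rvalue {
    Const(i64),
    Add(usize, usize), // sum of two locals
    Lt(usize, usize),  // 1 if less than, else 0
}

enum Terminator {
    Goto(usize),
    If(usize, [usize; 2]), // branch on a local != 0
    Return,                // result is in local 0
}

struct Block {
    statements: Vec<(usize, Rvalue)>,
    terminator: Terminator,
}

/// What a single call to `step` executed.
#[derive(Debug, PartialEq)]
enum Step {
    Statement { block: usize, index: usize },
    Terminator { block: usize },
    Done(i64),
}

struct Machine<'a> {
    blocks: &'a [Block],
    locals: Vec<i64>,
    block: usize,
    index: usize, // next statement in `block`
}

impl<'a> Machine<'a> {
    fn new(blocks: &'a [Block], locals: Vec<i64>) -> Self {
        Machine { blocks, locals, block: 0, index: 0 }
    }

    /// Executes exactly one statement or terminator.
    fn step(&mut self) -> Step {
        let data = &self.blocks[self.block];
        if let Some((dest, rvalue)) = data.statements.get(self.index) {
            let l = &self.locals;
            let value = match *rvalue {
                Rvalue::Const(c) => c,
                Rvalue::Add(a, b) => l[a] + l[b],
                Rvalue::Lt(a, b) => (l[a] < l[b]) as i64,
            };
            self.locals[*dest] = value;
            let index = self.index;
            self.index += 1;
            return Step::Statement { block: self.block, index };
        }
        let block = self.block;
        self.block = match data.terminator {
            Terminator::Goto(target) => target,
            Terminator::If(c, [t, e]) => {
                if self.locals[c] != 0 { t } else { e }
            }
            // Leave the position unchanged, so further
            // calls keep reporting the same result.
            Terminator::Return => return Step::Done(self.locals[0]),
        };
        self.index = 0;
        Step::Terminator { block }
    }
}

/// Runs to completion, counting steps; a debugger
/// would instead pause and show `locals` between them.
fn run_stepping(m: &mut Machine) -> (i64, usize) {
    let mut steps = 0;
    loop {
        match m.step() {
            Step::Done(result) => return (result, steps),
            _ => steps += 1,
        }
    }
}
\end{minted}

Once \rust{step} has returned \rust{Step::Done}, further calls report the same result, because the
\rust{Return} case leaves the machine's position unchanged. For the last item, a test harness could
run a function this way and compare the result with that of the same function compiled natively.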

\end{document}