175 lines
7.7 KiB
Markdown
175 lines
7.7 KiB
Markdown
An informal guide to reading and working on the rustc compiler.
|
||
==================================================================
|
||
|
||
If you wish to expand on this document, or have a more experienced
|
||
Rust contributor add anything else to it, please get in touch:
|
||
|
||
* https://internals.rust-lang.org/
|
||
* https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust
|
||
|
||
or file a bug:
|
||
|
||
https://github.com/rust-lang/rust/issues
|
||
|
||
Your concerns are probably the same as someone else's.
|
||
|
||
The crates of rustc
|
||
===================
|
||
|
||
Rustc consists of a number of crates, including `libsyntax`,
|
||
`librustc`, `librustc_back`, `librustc_trans`, and `librustc_driver`
|
||
(the names and divisions are not set in stone and may change;
|
||
in general, a finer-grained division of crates is preferable):
|
||
|
||
- [`libsyntax`][libsyntax] contains those things concerned purely with syntax –
|
||
that is, the AST, parser, pretty-printer, lexer, macro expander, and
|
||
utilities for traversing ASTs – are in a separate crate called
|
||
"syntax", whose files are in `./../libsyntax`, where `.` is the
|
||
current directory (that is, the parent directory of front/, middle/,
|
||
back/, and so on).
|
||
|
||
- `librustc` (the current directory) contains the high-level analysis
|
||
passes, such as the type checker, borrow checker, and so forth.
|
||
It is the heart of the compiler.
|
||
|
||
- [`librustc_back`][back] contains some very low-level details that are
|
||
specific to different LLVM targets and so forth.
|
||
|
||
- [`librustc_trans`][trans] contains the code to convert from Rust IR into LLVM
|
||
IR, and then from LLVM IR into machine code, as well as the main
|
||
driver that orchestrates all the other passes and various other bits
|
||
of miscellany. In general it contains code that runs towards the
|
||
end of the compilation process.
|
||
|
||
- [`librustc_driver`][driver] invokes the compiler from
|
||
[`libsyntax`][libsyntax], then the analysis phases from `librustc`, and
|
||
finally the lowering and codegen passes from [`librustc_trans`][trans].
|
||
|
||
Roughly speaking the "order" of the three crates is as follows:
|
||
|
||
librustc_driver
|
||
|
|
||
+-----------------+-------------------+
|
||
| |
|
||
libsyntax -> librustc -> librustc_trans
|
||
|
||
|
||
The compiler process:
|
||
=====================
|
||
|
||
The Rust compiler is comprised of six main compilation phases.
|
||
|
||
1. Parsing input
|
||
2. Configuration & expanding (cfg rules & syntax extension expansion)
|
||
3. Running analysis passes
|
||
4. Translation to LLVM
|
||
5. LLVM passes
|
||
6. Linking
|
||
|
||
Phase one is responsible for parsing & lexing the input to the compiler. The
|
||
output of this phase is an abstract syntax tree (AST). The AST at this point
|
||
includes all macro uses & attributes. This means code which will be later
|
||
expanded and/or removed due to `cfg` attributes is still present in this
|
||
version of the AST. Parsing abstracts away details about individual files which
|
||
have been read into the AST.
|
||
|
||
Phase two handles configuration and macro expansion. You can think of this
|
||
phase as a function acting on the AST from the previous phase. The input for
|
||
this phase is the unexpanded AST from phase one, and the output is an expanded
|
||
version of the same AST. This phase will expand all macros & syntax
|
||
extensions and will evaluate all `cfg` attributes, potentially removing some
|
||
code. The resulting AST will not contain any macros or `macro_use` statements.
|
||
|
||
The code for these first two phases is in [`libsyntax`][libsyntax].
|
||
|
||
After this phase, the compiler allocates IDs to each node in the AST
|
||
(technically not every node, but most of them). If we are writing out
|
||
dependencies, that happens now.
|
||
|
||
The third phase is analysis. This is the most complex phase in the compiler,
|
||
and makes up much of the code. This phase included name resolution, type
|
||
checking, borrow checking, type & lifetime inference, trait selection, method
|
||
selection, linting and so on. Most of the error detection in the compiler comes
|
||
from this phase (with the exception of parse errors which arise during
|
||
parsing). The "output" of this phase is a set of side tables containing
|
||
semantic information about the source program. The analysis code is in
|
||
[`librustc`][rustc] and some other crates with the `librustc_` prefix.
|
||
|
||
The fourth phase is translation. This phase translates the AST (and the side
|
||
tables from the previous phase) into LLVM IR (intermediate representation).
|
||
This is achieved by calling into the LLVM libraries. The code for this is in
|
||
[`librustc_trans`][trans].
|
||
|
||
Phase five runs the LLVM backend. This runs LLVM's optimization passes on the
|
||
generated IR and generates machine code resulting in object files. This phase
|
||
is not really part of the Rust compiler, as LLVM carries out all the work.
|
||
The interface between LLVM and Rust is in [`librustc_llvm`][llvm].
|
||
|
||
The final phase, phase six, links the object files into an executable. This is
|
||
again outsourced to other tools and not performed by the Rust compiler
|
||
directly. The interface is in [`librustc_back`][back] (which also contains some
|
||
things used primarily during translation).
|
||
|
||
A module called the driver coordinates all these phases. It handles all the
|
||
highest level coordination of compilation from parsing command line arguments
|
||
all the way to invoking the linker to produce an executable.
|
||
|
||
Modules in the librustc crate
|
||
=============================
|
||
|
||
The librustc crate itself consists of the following submodules
|
||
(mostly, but not entirely, in their own directories):
|
||
|
||
- session: options and data that pertain to the compilation session as
|
||
a whole
|
||
- middle: middle-end: name resolution, typechecking, LLVM code
|
||
generation
|
||
- metadata: encoder and decoder for data required by separate
|
||
compilation
|
||
- plugin: infrastructure for compiler plugins
|
||
- lint: infrastructure for compiler warnings
|
||
- util: ubiquitous types and helper functions
|
||
- lib: bindings to LLVM
|
||
|
||
The entry-point for the compiler is main() in the [`librustc_driver`][driver]
|
||
crate.
|
||
|
||
The 3 central data structures:
|
||
------------------------------
|
||
|
||
1. `./../libsyntax/ast.rs` defines the AST. The AST is treated as
|
||
immutable after parsing, but it depends on mutable context data
|
||
structures (mainly hash maps) to give it meaning.
|
||
|
||
- Many – though not all – nodes within this data structure are
|
||
wrapped in the type `spanned<T>`, meaning that the front-end has
|
||
marked the input coordinates of that node. The member `node` is
|
||
the data itself, the member `span` is the input location (file,
|
||
line, column; both low and high).
|
||
|
||
- Many other nodes within this data structure carry a
|
||
`def_id`. These nodes represent the 'target' of some name
|
||
reference elsewhere in the tree. When the AST is resolved, by
|
||
`middle/resolve.rs`, all names wind up acquiring a def that they
|
||
point to. So anything that can be pointed-to by a name winds
|
||
up with a `def_id`.
|
||
|
||
2. `middle/ty.rs` defines the datatype `sty`. This is the type that
|
||
represents types after they have been resolved and normalized by
|
||
the middle-end. The typeck phase converts every ast type to a
|
||
`ty::sty`, and the latter is used to drive later phases of
|
||
compilation. Most variants in the `ast::ty` tag have a
|
||
corresponding variant in the `ty::sty` tag.
|
||
|
||
3. `./../librustc_llvm/lib.rs` defines the exported types
|
||
`ValueRef`, `TypeRef`, `BasicBlockRef`, and several others.
|
||
Each of these is an opaque pointer to an LLVM type,
|
||
manipulated through the `lib::llvm` interface.
|
||
|
||
[libsyntax]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/
|
||
[trans]: https://github.com/rust-lang/rust/tree/master/src/librustc_trans/
|
||
[llvm]: https://github.com/rust-lang/rust/tree/master/src/librustc_llvm/
|
||
[back]: https://github.com/rust-lang/rust/tree/master/src/librustc_back/
|
||
[rustc]: https://github.com/rust-lang/rust/tree/master/src/librustc/
|
||
[driver]: https://github.com/rust-lang/rust/tree/master/src/librustc_driver
|