diff --git a/src/README b/src/README index 4d1b431ae8a..3618ee18fa5 100644 --- a/src/README +++ b/src/README @@ -1,4 +1,4 @@ -This is preliminary version of the Rust compiler. +This is preliminary version of the Rust compiler(s). Source layout: @@ -11,8 +11,8 @@ boot/util - Ubiquitous helpers boot/llvm - LLVM-based alternative back end boot/driver - Compiler driver -comp/ The self-hosted compiler (doesn't exist yet) -comp/* - Same structure as in boot/ +comp/ The self-hosted compiler ("rustc": incomplete) +comp/* - Similar structure as in boot/ rt/ The runtime system rt/rust_*.cpp - The majority of the runtime services diff --git a/src/boot/README b/src/boot/README index 891e1336c55..e17bfd791ac 100644 --- a/src/boot/README +++ b/src/boot/README @@ -1,15 +1,15 @@ An informal guide to reading and working on the rustboot compiler. ================================================================== -First off, my sincerest apologies for the lightly-commented nature of the -compiler, as well as the general immaturity of the codebase; rustboot is -intended to be discarded in the near future as we transition off it, to a -rust-based, LLVM-backed compiler. It has taken longer than expected for "the -near future" to arrive, and here we are published and attracting contributors -without a good place for them to start. It will be a priority for the next -little while to make new contributors feel welcome and oriented within the -project; best I can do at this point. We were in a tremendous rush even to get -everything organized to this minimal point. +First off, know that our current state of development is "bootstrapping"; +this means we've got two compilers on the go and one of them is being used +to develop the other. Rustboot is written in ocaml and rustc in rust. The +one you *probably* ought to be working on at present is rustc. Rustboot is +more for historical comparison and bug-fixing whenever necessary to un-block +development of rustc. + +There's a document similar to this next door, then, in comp/README. The +comp directory is where we do work on rustc. If you wish to expand on this document, or have one of the slightly-more-familiar authors add anything else to it, please get in touch or diff --git a/src/comp/README b/src/comp/README new file mode 100644 index 00000000000..fad13ba6f35 --- /dev/null +++ b/src/comp/README @@ -0,0 +1,118 @@ +An informal guide to reading and working on the rustc compiler. +================================================================== + +First off, know that our current state of development is "bootstrapping"; +this means we've got two compilers on the go and one of them is being used +to develop the other. Rustboot is written in ocaml and rustc in rust. The +one you *probably* ought to be working on at present is rustc. Rustboot is +more for historical comparison and bug-fixing whenever necessary to un-block +development of rustc. + +There's a document similar to this next door, then, in boot/README. The boot +directory is where we do work on rustboot. + +If you wish to expand on this document, or have one of the +slightly-more-familiar authors add anything else to it, please get in touch or +file a bug. Your concerns are probably the same as someone else's. + + +High-level concepts +=================== + +Rustc consists of the following subdirectories: + +front/ - front-end: lexer, parser, AST. +middle/ - middle-end: resolving, typechecking, translating +driver/ - command-line processing, main() entrypoint +util/ - ubiquitous types and helper functions +lib/ - bindings to LLVM + +The entry-point for the compiler is main() in driver/rustc.rs, and this file +sequences the various parts together. + + +The 3 central data structures: +------------------------------ + +#1: front/ast.rs defines the AST. The AST is treated as immutable after + parsing despite containing some mutable types (hashtables and such). + There are three interesting details to know about this structure: + + - Many -- though not all -- nodes within this data structure are wrapped + in the type spanned[T], meaning that the front-end has marked the + input coordinates of that node. The member .node is the data itself, + the member .span is the input location (file, line, column; both low + and high). + + - Many other nodes within this data structure carry a def_id. These + nodes represent the 'target' of some name reference elsewhere in the + tree. When the AST is resolved, by middle/resolve.rs, all names wind + up acquiring a def that they point to. So anything that can be + pointed-to by a name winds up with a def_id. + + - Many nodes carry an additional type 'ann', for annotations. These + nodes are those that later stages of the middle-end add information + to, augmenting the basic structure of the tree. Currently that + includes the calculated type of any node that has a type; it will also + likely include typestates, layers and effects, when such things are + calculated. + +#2: middle/ty.rs defines the datatype ty.t, with its central member ty.struct. + This is the type that represents types after they have been resolved and + normalized by the middle-end. The typeck phase converts every ast type to + a ty.t, and the latter is used to drive later phases of compilation. Most + variants in the ast.ty tag have a corresponding variant in the ty.struct + tag. + +#3: lib/llvm.rs defines the exported types ValueRef, TypeRef, BasicBlockRef, + and several others. Each of these is an opaque pointer to an LLVM type, + manipulated through the lib.llvm interface. + + +Control and information flow within the compiler: +------------------------------------------------- + +- main() in driver/rustc.rs assumes control on startup. Options are parsed, + platform is detected, etc. + +- front/parser.rs is driven over the input files. + +- Multiple middle-end passes (middle/resolve.rs, middle/typeck.rs) are run + over the resulting AST. Each pass produces a new AST with some number of + annotations or modifications. + +- Finally middle/trans.rs is applied to the AST, which performs a + type-directed translation to LLVM-ese. When it's finished synthesizing LLVM + values, rustc asks LLVM to write them out as a bitcode file, on which you + can run the normal LLVM pipeline (opt, llc, as) to get an executable. + + +Comparison with rustboot +======================== + +Rustc is written in a more "functional" style than rustboot; each rustc pass +tends to depend only on the AST it's given as input, which it does not mutate. +Calculations flow from one pase to another by repeatedly rebuilding the AST +with additional annotations. + +Rustboot normalizes to a statement-centric AST. Rustc uses an +expression-centric AST. + +Rustboot generates 3-address IL into imperative buffers of coded IL quads. +Rustc generates LLVM, an SSA-based expression IL. + +Rustc, being attached to LLVM, generates much better code. Factor of 5 +smaller, usually. Sometimes much more. + +Rustc preserves more of the parsed input structure. Rustboot "desugars" most +of the input, rendering round-trip pretty-printing impossible. Error reporting +is also better in rustc, as type names (as denoted by the user) are preserved +throughout typechecking. + +Rustc is not concerned with the PIC-ness of the resulting code, nor anything +to do with encoding DWARF or x86 instructions. All this superfluous +machine-level logic that seeped up to the translation layer in rustboot is +pushed past LLVM into later stages of the toolchain in rustc. + +Numerous "bad idea" idiosyncracies of the rustboot AST have been eliminated in +rustc. In general the code is much more obvious, minimal and straightforward.