bors f60bc3ac0c Auto merge of #44505 - nikomatsakis:lotsa-comments, r=steveklabnik

rework the README.md for rustc and add other readmes

OK, so, long ago I committed to the idea of trying to write some high-level documentation for rustc. This has proved to be much harder for me to get done than I thought it would! This PR is far from as complete as I had hoped, but I wanted to open it so that people can give me feedback on the conventions that it establishes. If this seems like a good way forward, we can land it and I will open an issue with a good check-list of things to write (and try to take down some of them myself).

Here are the conventions I established on which I would like feedback.

**Use README.md files**. First off, I'm aiming to keep most of the high-level docs in `README.md` files, rather than entries on forge. My thought is that such files are (a) more discoverable than forge and (b) closer to the code, and hence can be edited in a single PR. However, since they are not *in the code*, they will naturally get out of date, so the intention is to focus on the highest-level details, which are least likely to bitrot. I've included a few examples of common functions and so forth, but never tried to (e.g.) exhaustively list the names of functions and so forth.
- I would like to use the tidy scripts to try and check that these do not go out of date. Future work.

**librustc/README.md as the main entrypoint.** This seems like the most natural place people will look first. It lays out how the crates are structured and **is intended** to give pointers to the main data structures of the compiler (I didn't update that yet; the existing material is terribly dated).

**A glossary listing abbreviations and things.** It's much harder to read code if you don't know what some obscure set of letters like `infcx` stands for.

**Major modules each have their own README.md that documents the high-level idea.** For example, I wrote some stuff about `hir` and `ty`. Both of them have many missing topics, but I think that is roughly the level of depth that would be good. The idea is to give people a "feeling" for what the code does.

What is missing primarily here is lots of content. =) Here are some things I'd like to see:

- A description of what a QUERY is and how to define one
- Some comments for `librustc/ty/maps.rs`
- An overview of how compilation proceeds now (i.e., the hybrid demand-driven and forward model) and how we would like to see it going in the future (all demand-driven)
- Some coverage of how incremental will work under red-green
- An updated list of the major IRs in use of the compiler (AST, HIR, TypeckTables, MIR) and major bits of interesting code (typeck, borrowck, etc)
- More advice on how to use `x.py`, or at least pointers to that
- Good choice for `config.toml`
- How to use `RUST_LOG` and other debugging flags (e.g., `-Zverbose`, `-Ztreat-err-as-bug`)
- Helpful conventions for `debug!` statement formatting

cc @rust-lang/compiler @mgattozzi

2017-09-19 22:43:58 +00:00

map

rework the README.md for rustc and add other readmes

2017-09-19 09:00:59 -04:00

check_attr.rs

…

def_id.rs

Improve DefIndex formatting to be more semantic

2017-09-04 22:57:22 +02:00

def.rs

Auto merge of #44435 - alexcrichton:in-scope, r=michaelwoerister

2017-09-11 15:35:35 +00:00

intravisit.rs

Use NodeId/HirId instead of DefId for local variables.

2017-09-08 22:00:59 +03:00

itemlikevisit.rs

…

lowering.rs

rustc: Forbid interpolated tokens in the HIR

2017-09-18 17:20:12 -07:00

mod.rs

apply various nits

2017-09-19 09:00:59 -04:00

pat_util.rs

…

print.rs

pprust: increase precedence of block-like exprs

2017-09-07 10:28:31 -04:00

README.md

apply various nits

2017-09-19 09:00:59 -04:00

svh.rs

…

README.md

Introduction to the HIR

The HIR -- "High-level IR" -- is the primary IR used in most of rustc. It is a desugared version of the "abstract syntax tree" (AST) that is generated after parsing, macro expansion, and name resolution have completed. Many parts of HIR resemble Rust surface syntax quite closely, with the exception that some of Rust's expression forms have been desugared away (as an example, for loops are converted into a loop and do not appear in the HIR).

This README covers the main concepts of the HIR.

Out-of-band storage and the `Crate` type

The top-level data-structure in the HIR is the Crate, which stores the contents of the crate currently being compiled (we only ever construct HIR for the current crate). Whereas in the AST the crate data structure basically just contains the root module, the HIR Crate structure contains a number of maps and other things that serve to organize the content of the crate for easier access.

For example, the contents of individual items (e.g., modules, functions, traits, impls) in the HIR are not immediately accessible in the parents. So, for example, if we had a module item foo containing a function bar():

mod foo {
  fn bar() { }
}

Then in the HIR the representation of module foo (the Mod struct) would have only the ItemId I of bar(). To get the details of the function bar(), we would look up I in the items map.

One nice result from this representation is that one can iterate over all items in the crate by iterating over the key-value pairs in these maps (without the need to trawl through the IR in total). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).
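The layout described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `ItemId`, `Mod`, `Item`, and `Crate` types below are hypothetical stand-ins invented for this example (a bare `u32` id, a single `items` map), not the actual definitions in rustc, and the helper functions `build_example`, `all_item_names`, and `child_names` do not exist in the compiler.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified item id; not rustc's real `ItemId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ItemId(pub u32);

/// A module stores only the ids of its children, not their contents.
#[derive(Debug)]
pub struct Mod {
    pub item_ids: Vec<ItemId>,
}

#[derive(Debug)]
pub enum ItemKind {
    Mod(Mod),
    Fn,
}

#[derive(Debug)]
pub struct Item {
    pub name: &'static str,
    pub kind: ItemKind,
}

/// The crate owns every item in one flat map, keyed by id.
#[derive(Debug, Default)]
pub struct Crate {
    pub items: BTreeMap<ItemId, Item>,
}

/// Builds the `mod foo { fn bar() { } }` example from the text.
pub fn build_example() -> Crate {
    let mut krate = Crate::default();
    let foo = Mod { item_ids: vec![ItemId(1)] };
    krate.items.insert(ItemId(0), Item { name: "foo", kind: ItemKind::Mod(foo) });
    krate.items.insert(ItemId(1), Item { name: "bar", kind: ItemKind::Fn });
    krate
}

/// Names of all items, found by walking the map rather than the tree.
pub fn all_item_names(krate: &Crate) -> Vec<&'static str> {
    krate.items.values().map(|item| item.name).collect()
}

/// Names of the direct children of module `id`, resolved through the map.
pub fn child_names(krate: &Crate, id: ItemId) -> Vec<&'static str> {
    match krate.items.get(&id).map(|item| &item.kind) {
        Some(ItemKind::Mod(m)) => m
            .item_ids
            .iter()
            .filter_map(|child| krate.items.get(child))
            .map(|item| item.name)
            .collect(),
        _ => Vec::new(),
    }
}
```

In this model, `all_item_names(&build_example())` visits both `foo` and `bar` without descending through the module, while `child_names` has to go back to the map to learn anything about `bar` beyond its id.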

The other reason to set up the representation this way is for better integration with incremental compilation. This way, if you gain access to a &hir::Item (e.g. for the mod foo), you do not immediately gain access to the contents of the function bar(). Instead, you only gain access to the id for bar(), and you must invoke some function to look up the contents of bar() given its id; this gives us a chance to observe that you accessed the data for bar() and record the dependency.
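To make the dependency-recording idea concrete, here is a minimal sketch. It assumes a hypothetical `TrackedItems` wrapper that logs every lookup in a `RefCell`; the real compiler records dependencies through its own dependency-graph machinery, which this toy does not attempt to reproduce.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

/// Hypothetical, simplified item id; not rustc's real `ItemId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ItemId(pub u32);

/// Hypothetical stand-in for the item map plus a dependency log.
pub struct TrackedItems {
    items: BTreeMap<ItemId, &'static str>,
    /// Ids that were passed to `item`, in order of access.
    reads: RefCell<Vec<ItemId>>,
}

impl TrackedItems {
    pub fn new(items: BTreeMap<ItemId, &'static str>) -> Self {
        TrackedItems { items, reads: RefCell::new(Vec::new()) }
    }

    /// The only way to reach an item's contents; every call is logged.
    pub fn item(&self, id: ItemId) -> Option<&'static str> {
        self.reads.borrow_mut().push(id);
        self.items.get(&id).copied()
    }

    /// The dependencies observed so far.
    pub fn recorded_reads(&self) -> Vec<ItemId> {
        self.reads.borrow().clone()
    }
}

/// Holds the id of `bar` first, then looks up its contents.
pub fn demo() -> Vec<ItemId> {
    let mut items = BTreeMap::new();
    items.insert(ItemId(0), "mod foo");
    items.insert(ItemId(1), "fn bar");
    let tracked = TrackedItems::new(items);

    // Merely holding the id of `bar` records nothing.
    let bar = ItemId(1);
    // Looking up its contents goes through `item`, which logs the access.
    let _contents = tracked.item(bar);

    tracked.recorded_reads()
}
```

The point of the design is that the id is cheap to pass around, and the lookup function is the single choke point where an access can be observed.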

Identifiers in the HIR

Most of the code that has to deal with things in HIR tends not to carry around references into the HIR, but rather to carry around identifier numbers (or just "ids"). Right now, you will find four sorts of identifiers in active use:

- DefId, which primarily names "definitions" or top-level items.
  - You can think of a DefId as being shorthand for a very explicit and complete path, like std::collections::HashMap. However, these paths are able to name things that are not nameable in normal Rust (e.g., impls), and they also include extra information about the crate (such as its version number, as two versions of the same crate can co-exist).
  - A DefId really consists of two parts, a CrateNum (which identifies the crate) and a DefIndex (which indexes into a list of items that is maintained per crate).
- HirId, which combines the index of a particular item with an offset within that item.
  - The key point of a HirId is that it is relative to some item (which is named via a DefId).
- BodyId, which is an absolute identifier that refers to a specific body (definition of a function or constant) in the crate. It is currently effectively a "newtype'd" NodeId.
- NodeId, which is an absolute id that identifies a single node in the HIR tree.
  - While these are still in common use, they are being slowly phased out.
  - Since they are absolute within the crate, adding a new node anywhere in the tree causes the node-ids of all subsequent code in the crate to change. This is terrible for incremental compilation, as you can perhaps imagine.
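The difference between absolute and item-relative ids can be shown with a small model. The struct shapes below follow the description above (a DefId as a CrateNum plus a DefIndex, a HirId as an owning item plus a local offset), but the field names, the `u32` representations, and the two `assign_*` functions are assumptions made for this sketch rather than the compiler's actual code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CrateNum(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefIndex(pub u32);

/// Two parts: which crate, and which entry in that crate's item list.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

/// Relative id: an owning item plus an offset inside that item.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct HirId {
    pub owner: DefIndex,
    pub local_id: u32,
}

/// Absolute id: one counter for the whole crate.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Numbers nodes crate-wide; `items[i]` is the node count of item `i`.
pub fn assign_node_ids(items: &[u32]) -> Vec<Vec<NodeId>> {
    let mut next = 0u32;
    items
        .iter()
        .map(|&n| {
            let ids: Vec<NodeId> = (next..next + n).map(NodeId).collect();
            next += n;
            ids
        })
        .collect()
}

/// Numbers nodes per item; the counter restarts at zero for each owner.
pub fn assign_hir_ids(items: &[u32]) -> Vec<Vec<HirId>> {
    items
        .iter()
        .enumerate()
        .map(|(owner, &n)| {
            (0..n)
                .map(|local_id| HirId { owner: DefIndex(owner as u32), local_id })
                .collect::<Vec<HirId>>()
        })
        .collect()
}
```

Growing the first item from two nodes to three (`&[2, 3]` versus `&[3, 3]`) shifts every NodeId in the second item, whereas the HirIds of the second item come out identical in both runs.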

HIR Map

Most of the time when you are working with the HIR, you will do so via the HIR Map, accessible in the tcx via tcx.hir (and defined in the hir::map module). The HIR map contains a number of methods to convert between ids of various kinds and to look up data associated with a HIR node.

For example, if you have a DefId, and you would like to convert it to a NodeId, you can use tcx.hir.as_local_node_id(def_id). This returns an Option<NodeId> -- this will be None if the def-id refers to something outside of the current crate (since then it has no HIR node), but otherwise returns Some(n) where n is the node-id of the definition.

Similarly, you can use tcx.hir.find(n) to look up the node for a NodeId. This returns an Option<Node<'tcx>>, where Node is an enum defined in the map; by matching on this you can find out what sort of node the node-id referred to and also get a pointer to the data itself. Often, you know what sort of node n is -- e.g., if you know that n must be some HIR expression, you can do tcx.hir.expect_expr(n), which will extract and return the &hir::Expr, panicking if n is not in fact an expression.

Finally, you can use the HIR map to find the parents of nodes, via calls like tcx.hir.get_parent_node(n).
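The behaviour of these lookups can be imitated with a toy map. The method names below mirror the ones mentioned in this section, but everything else is an assumption for illustration: the `Map` here is three hash maps, `DefId` is a pair of plain integers, `Node` has only two variants, and `example_map` is a hypothetical helper. None of it uses the real `tcx`.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub u32);

/// Simplified: crate number and index as plain integers.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: u32,
    pub index: u32,
}

/// Assumed convention for this sketch: crate 0 is the current crate.
pub const LOCAL_CRATE: u32 = 0;

#[derive(Debug, PartialEq)]
pub struct Item(pub &'static str);

#[derive(Debug, PartialEq)]
pub struct Expr(pub &'static str);

/// What a node-id can refer to (only two kinds in this toy).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Node<'hir> {
    Item(&'hir Item),
    Expr(&'hir Expr),
}

pub struct Map<'hir> {
    nodes: HashMap<NodeId, Node<'hir>>,
    def_to_node: HashMap<u32, NodeId>,
    parents: HashMap<NodeId, NodeId>,
}

impl<'hir> Map<'hir> {
    /// `None` for definitions from other crates, which have no HIR node.
    pub fn as_local_node_id(&self, def_id: DefId) -> Option<NodeId> {
        if def_id.krate != LOCAL_CRATE {
            return None;
        }
        self.def_to_node.get(&def_id.index).copied()
    }

    /// Generic lookup; the caller matches on the returned `Node`.
    pub fn find(&self, id: NodeId) -> Option<Node<'hir>> {
        self.nodes.get(&id).copied()
    }

    /// Shortcut for when the caller knows `id` names an expression.
    pub fn expect_expr(&self, id: NodeId) -> &'hir Expr {
        match self.find(id) {
            Some(Node::Expr(expr)) => expr,
            other => panic!("expected expr, found {:?}", other),
        }
    }

    /// In this toy, a node without a recorded parent is its own parent.
    pub fn get_parent_node(&self, id: NodeId) -> NodeId {
        self.parents.get(&id).copied().unwrap_or(id)
    }
}

/// Hypothetical helper: one item (node 0) containing one expression (node 1).
pub fn example_map<'hir>(item: &'hir Item, expr: &'hir Expr) -> Map<'hir> {
    let mut map = Map {
        nodes: HashMap::new(),
        def_to_node: HashMap::new(),
        parents: HashMap::new(),
    };
    map.nodes.insert(NodeId(0), Node::Item(item));
    map.nodes.insert(NodeId(1), Node::Expr(expr));
    map.def_to_node.insert(7, NodeId(0));
    map.parents.insert(NodeId(1), NodeId(0));
    map
}
```

With a map built by `example_map`, a DefId whose `krate` is not `LOCAL_CRATE` yields `None` from `as_local_node_id`, and calling `expect_expr` on the item's node-id panics, matching the behaviour described above.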

HIR Bodies

A body represents some kind of executable code, such as the body of a function/closure or the definition of a constant. Bodies are associated with an owner, which is typically some kind of item (e.g., a fn() or const), but could also be a closure expression (e.g., |x, y| x + y). You can use the HIR map to find the body associated with a given def-id (maybe_body_owned_by()) or to find the owner of a body (body_owner_def_id()).
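The owner/body relationship amounts to a two-way association, which the following sketch models with a pair of hash maps. The `Bodies` type, its `new` and `insert` methods, and the integer-based `DefId` and `BodyId` are hypothetical; only the two query names are taken from the text, and their signatures here are assumed rather than copied from the compiler.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BodyId(pub u32);

/// Hypothetical two-way table between owners and their bodies.
pub struct Bodies {
    owner_to_body: HashMap<DefId, BodyId>,
    body_to_owner: HashMap<BodyId, DefId>,
}

impl Bodies {
    pub fn new() -> Self {
        Bodies { owner_to_body: HashMap::new(), body_to_owner: HashMap::new() }
    }

    /// Registers `body` as the executable code owned by `owner`.
    pub fn insert(&mut self, owner: DefId, body: BodyId) {
        self.owner_to_body.insert(owner, body);
        self.body_to_owner.insert(body, owner);
    }

    /// `None` when the definition has no body (e.g., a struct).
    pub fn maybe_body_owned_by(&self, owner: DefId) -> Option<BodyId> {
        self.owner_to_body.get(&owner).copied()
    }

    /// Every registered body has exactly one owner; panics on an unknown body.
    pub fn body_owner_def_id(&self, body: BodyId) -> DefId {
        self.body_to_owner[&body]
    }
}
```

The asymmetry in return types reflects the text: not every definition owns a body, so the forward query is optional, while a body always has an owner.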
