Let a portion of DefPathHash uniquely identify the DefPath's crate.
This allows to directly map from a `DefPathHash` to the crate it originates from, without constructing side tables to do that mapping -- something that is useful for incremental compilation where we deal with `DefPathHash` instead of `DefId` a lot.
It also allows to reliably and cheaply check for `DefPathHash` collisions which allows the compiler to gracefully abort compilation instead of running into a subsequent ICE at some random place in the code.
The following new piece of documentation describes the most interesting aspects of the changes:
```rust
/// A `DefPathHash` is a fixed-size representation of a `DefPath` that is
/// stable across crate and compilation session boundaries. It consists of two
/// separate 64-bit hashes. The first uniquely identifies the crate this
/// `DefPathHash` originates from (see [StableCrateId]), and the second
/// uniquely identifies the corresponding `DefPath` within that crate. Together
/// they form a unique identifier within an entire crate graph.
///
/// There is a very small chance of hash collisions, which would mean that two
/// different `DefPath`s map to the same `DefPathHash`. Proceeding compilation
/// with such a hash collision would very probably lead to an ICE and, in the
/// worst case, to a silent mis-compilation. The compiler therefore actively
/// and exhaustively checks for such hash collisions and aborts compilation if
/// it finds one.
///
/// `DefPathHash` uses 64-bit hashes for both the crate-id part and the
/// crate-internal part, even though it is likely that there are many more
/// `LocalDefId`s in a single crate than there are individual crates in a crate
/// graph. Since we use the same number of bits in both cases, the collision
/// probability for the crate-local part will be quite a bit higher (though
/// still very small).
///
/// This imbalance is not by accident: A hash collision in the
/// crate-local part of a `DefPathHash` will be detected and reported while
/// compiling the crate in question. Such a collision does not depend on
/// outside factors and can be easily fixed by the crate maintainer (e.g. by
/// renaming the item in question or by bumping the crate version in a harmless
/// way).
///
/// A collision between crate-id hashes on the other hand is harder to fix
/// because it depends on the set of crates in the entire crate graph of a
/// compilation session. Again, using the same crate with a different version
/// number would fix the issue with a high probability -- but that might be
/// easier said then done if the crates in questions are dependencies of
/// third-party crates.
///
/// That being said, given a high quality hash function, the collision
/// probabilities in question are very small. For example, for a big crate like
/// `rustc_middle` (with ~50000 `LocalDefId`s as of the time of writing) there
/// is a probability of roughly 1 in 14,750,000,000 of a crate-internal
/// collision occurring. For a big crate graph with 1000 crates in it, there is
/// a probability of 1 in 36,890,000,000,000 of a `StableCrateId` collision.
```
Given the probabilities involved I hope that no one will ever actually see the error messages. Nonetheless, I'd be glad about some feedback on how to improve them. Should we create a GH issue describing the problem and possible solutions to point to? Or a page in the rustc book?
r? `@pnkfelix` (feel free to re-assign)
Only store a LocalDefId in some HIR nodes
Some HIR nodes are guaranteed to be HIR owners: Item, TraitItem, ImplItem, ForeignItem and MacroDef.
As a consequence, we do not need to store the `HirId`'s `local_id`, and we can directly store a `LocalDefId`.
This allows to avoid a bit of the dance with `tcx.hir().local_def_id` and `tcx.hir().local_def_id_to_hir_id` mappings.
This allows a build system to indicate a location in its own dependency
specification files (eg Cargo's `Cargo.toml`) which can be reported
along side any unused crate dependency.
This supports several types of location:
- 'json' - provide some json-structured data, which is included in the json diagnostics
in a `tool_metadata` field
- 'raw' - emit the provided string into the output. This also appears as a json string in
`tool_metadata`.
If no `--extern-location` is explicitly provided then a default json entry of the form
`"tool_metadata":{"name":<cratename>,"path":<cratepath>}` is emitted.
Encode MIR metadata by iterating on DefId instead of traversing the HIR tree
Split out of https://github.com/rust-lang/rust/pull/80347.
This part only traverses `mir_keys` and encodes MIR according to the def kind.
r? `@oli-obk`
This allows to directly map from a DefPathHash to the crate it
originates from, without constructing side tables to do that mapping.
It also allows to reliably and cheaply check for DefPathHash collisions.
Remove DepKind::CrateMetadata and pre-allocation of DepNodes
Remove much of the special-case handling around crate metadata
dependency tracking by replacing `DepKind::CrateMetadata` and the
pre-allocation of corresponding `DepNodes` with on-demand invocation
of the `crate_hash` query.
Rework diagnostics for wrong number of generic args (fixes#66228 and #71924)
This PR reworks the `wrong number of {} arguments` message, so that it provides more details and contextual hints.
Consistently avoid constructing optimized MIR when not doing codegen
The optimized MIR for closures is being encoded unconditionally, while
being unnecessary for cargo check. This turns out to be especially
costly with MIR inlining enabled, since it triggers computation of
optimized MIR for all callees that are being examined for inlining
purposes https://github.com/rust-lang/rust/pull/77307#issuecomment-751915450.
Skip encoding of optimized MIR for closures, enum constructors, struct
constructors, and trait fns when not doing codegen, like it is already
done for other items since 49433.
Separate out a `hir::Impl` struct
This makes it possible to pass the `Impl` directly to functions, instead
of having to pass each of the many fields one at a time. It also
simplifies matches in many cases.
See `rustc_save_analysis::dump_visitor::process_impl` or `rustdoc::clean::clean_impl` for a good example of how this makes `impl`s easier to work with.
r? `@petrochenkov` maybe?
This makes it possible to pass the `Impl` directly to functions, instead
of having to pass each of the many fields one at a time. It also
simplifies matches in many cases.
The optimized MIR for closures is being encoded unconditionally, while
being unnecessary for cargo check. This turns out to be especially
costly with MIR inlining enabled, since it triggers computation of
optimized MIR for all callees that are being examined for inlining
purposes.
Skip encoding of optimized MIR for closures, enum constructors, struct
constructors, and trait fns when not doing codegen, like it is already
done for other items since 49433.