Some notes:
* This code attempts to present the breakdown of each variant for
every enum in the MIR. This is meant to guide decisions about how to
revise representations e.g. when to box payloads for rare variants
to shrink the size of the enum overall.
* I left out the "Total:" line that hir-stats presents, because this
implementation uses the MIR Visitor infrastructure, and the memory
usage of structures directly embedded in other structures (e.g. the
`func: Operand` in a `TerminatorKind:Call`) is not distinguished
from similar structures allocated in a `Vec` (e.g. the `args:
Vec<Operand>` in a `TerminatorKind::Call`). This means that a naive
summation of all the accumulated sizes is misleading, because it
will double-count the contribution of the `Operand` of the `func` as
well as the size of the whole `TerminatorKind`.
* I did consider abandoning the MIR Visitor and instead hand-coding
a traversal that distinguished embedded storage from indirect
storage. But such code would be fragile; better to just require
people to take care when interpreting the presented results.
* This traverses the `mir.promoted` rvalues to capture stats for MIR
stored there, even though the MIR visitor super_mir method does not
do so. (I did not observe any new mir being traversed when compiling
the rustc crate, however.)
* It might be nice to try to unify this code with hir-stats. Then
again, the reporting portion is the only common code (I think), and
it is small compared to the visitors in hir-stats and mir-stats.
Based on the discussion in https://github.com/rust-lang/rust/pull/37573,
it is likely better to keep this limited to std::io, instead of
modifying a function which users expect to be a memcpy.
Ultimately copy_from_slice is being a bottleneck, not io::Cursor::read.
It might be worthwhile to move the check here, so more places can
benefit from it.
During benchmarking, I found that one of my programs spent between 5 and
10 percent of the time doing memmoves. Ultimately I tracked these down
to single-byte slices being copied with a memcopy in io::Cursor::read().
Doing a manual copy if only one byte is requested can speed things up
significantly. For my program, this reduced the running time by 20%.
Why special-case only a single byte, and not a "small" slice in general?
I tried doing this for slices of at most 64 bytes and of at most 8
bytes. In both cases my test program was significantly slower.
rustdoc: link to cross-crate sources directly.
Fixes#37684 by implementing proper support for getting the `Span` of definitions across crates.
In rustdoc this is used to generate direct links to the original source instead of fragile redirects.
This functionality could be expanded further for making error reporting code more uniform and seamless across crates, although at the moment there is no actual source to print, only file/line/column information.
Closes#37870 which is also "fixes" #37684 by throwing away the builtin macro docs from libcore.
After this lands, #37727 could be reverted, although it doesn't matter much either way.
Refactor TraitObject to Slice<ExistentialPredicate>
For reference, the primary types changes in this PR are shown below. They may add in the understanding of what is discussed below, though they should not be required.
We change `TraitObject` into a list of `ExistentialPredicate`s to allow for a couple of things:
- Principal (ExistentialPredicate::Trait) is now optional.
- Region bounds are moved out of `TraitObject` into `TyDynamic`. This permits wrapping only the `ExistentialPredicate` list in `Binder`.
- `BuiltinBounds` and `BuiltinBound` are removed entirely from the codebase, to permit future non-constrained auto traits. These are replaced with `ExistentialPredicate::AutoTrait`, which only requires a `DefId`. For the time being, only `Send` and `Sync` are supported; this constraint can be lifted in a future pull request.
- Binder-related logic is extracted from `ExistentialPredicate` into the parent (`Binder<Slice<EP>>`), so `PolyX`s are inside `TraitObject` are replaced with `X`.
The code requires a sorting order for `ExistentialPredicate`s in the interned `Slice`. The sort order is asserted to be correct during interning, but the slices are not sorted at that point.
1. `ExistentialPredicate::Trait` are defined as always equal; **This may be wrong; should we be comparing them and sorting them in some way?**
1. `ExistentialPredicate::Projection`: Compared by `ExistentialProjection::sort_key`.
1. `ExistentialPredicate::AutoTrait`: Compared by `TraitDef.def_path_hash`.
Construction of `ExistentialPredicate`s is conducted through `TyCtxt::mk_existential_predicates`, which interns a passed iterator as a `Slice`. There are no convenience functions to construct from a set of separate iterators; callers must pass an iterator chain. The lack of convenience functions is primarily due to few uses and the relative difficulty in defining a nice API due to optional parts and difficulty in recognizing which argument goes where. It is also true that the current situation isn't significantly better than 4 arguments to a constructor function; but the extra work is deemed unnecessary as of this time.
```rust
// before this PR
struct TraitObject<'tcx> {
pub principal: PolyExistentialTraitRef<'tcx>,
pub region_bound: &'tcx ty::Region,
pub builtin_bounds: BuiltinBounds,
pub projection_bounds: Vec<PolyExistentialProjection<'tcx>>,
}
// after
pub enum ExistentialPredicate<'tcx> {
// e.g. Iterator
Trait(ExistentialTraitRef<'tcx>),
// e.g. Iterator::Item = T
Projection(ExistentialProjection<'tcx>),
// e.g. Send
AutoTrait(DefId),
}
```
This commit adds a new attribute that instructs the compiler to emit
target specific code for a single function. For example, the following
function is permitted to use instructions that are part of SSE 4.2:
#[target_feature = "+sse4.2"]
fn foo() { ... }
In particular, use of this attribute does not require setting the
-C target-feature or -C target-cpu options on rustc.
This attribute does not have any protections built into it. For example,
nothing stops one from calling the above `foo` function on hosts without
SSE 4.2 support. Doing so may result in a SIGILL.
This commit also expands the target feature whitelist to include lzcnt,
popcnt and sse4a. Namely, lzcnt and popcnt have their own CPUID bits,
but were introduced with SSE4.
The `src/rustc` path is intended for assembling a compiler (e.g. the bare bones)
not actually compiling the whole compiler itself. This path was accidentally
getting hijacked to represent the whole compiler being compiled, so let's
redirect that elsewhere for that particular cargo project.
Closes#38039
Show multiline spans in full if short enough
When dealing with multiline spans that span few lines, show the complete span instead of restricting to the first character of the first line.
For example, instead of:
```
% ./rustc file2.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> file2.rs:13:9
|
13 | foo(1 + bar(x,
| ^ trait `{integer}: std::ops::Add<()>` not satisfied
|
```
show
```
% ./rustc file2.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
--> file2.rs:13:9
|
13 | foo(1 + bar(x,
| ________^ starting here...
14 | | y),
| |_____________^ ...ending here: trait `{integer}: std::ops::Add<()>` not satisfied
|
```
The [proposal in internals](https://internals.rust-lang.org/t/proposal-for-multiline-span-comments/4242/6) outlines the reasoning behind this.
Separate function bodies from their signatures in HIR
Also give them their own dep map node.
I'm still unhappy with the handling of inlined items (1452edc1), but maybe you have a suggestion how to improve it.
Fixes#35078.
r? @nikomatsakis
They don't implement FnLikeNode anymore, instead are handled differently
further up in the call tree. Also, keep less information (just def ids
for the args).
Setup two tasks, one of which only processes the signatures, in order to
isolate the typeck entries for signatures from those for bodies.
Fixes#36078Fixes#37720
This used to work with the rustc_clean attribute, but doesn't anymore
since my rebase; but I don't know enough about the type checking to find
out what's wrong. The dep graph looks like this:
ItemSignature(xxxx) -> CollectItem(xxxx)
CollectItem(xxxx) -> ItemSignature(xxxx)
ItemSignature(xxxx) -> TypeckItemBody(yyyy)
HirBody(xxxx) -> CollectItem(xxxx)
The cycle between CollectItem and ItemSignature looks wrong, and my
guess is the CollectItem -> ItemSignature edge shouldn't be there, but
I'm not sure how to prevent it.