Fix spans and expected token lists, fix#33413 + other cosmetic improvements
Add test for #33413
Convert between `Arg` and `ExplicitSelf` precisely
Simplify pretty-printing for methods
trans-collector: Assorted fixes and refactorings needed for making trans collector-driven.
As the title says. The messages on the individual commits should do a good job of explaining what they are about.
r? @nikomatsakis
[MIR trans] Optimize trans for biased switches
Currently, all switches in MIR are exhausitive, meaning that we can have
a lot of arms that all go to the same basic block, the extreme case
being an if-let expression which results in just 2 possible cases, be
might end up with hundreds of arms for large enums.
To improve this situation and give LLVM less code to chew on, we can
detect whether there's a pre-dominant target basic block in a switch
and then promote this to be the default target, not translating the
corresponding arms at all.
In combination with #33544 this makes unoptimized MIR trans of
nickel.rs as fast as using old trans and greatly improves the times for
optimized builds, which are only 30-40% slower instead of ~300%.
cc #33111
Remove unification despite ambiguity in projection
Turns out that closures aren't explicitly considered in `project.rs`, so the ambiguity handling w.r.t. closures can just be removed as the change done in `select.rs` covers it.
r? @nikomatsakis
Don't use env::current_exe with libbacktrace
If the path we give to libbacktrace doesn't actually correspond to the
current process, libbacktrace will segfault *at best*.
cc #21889
r? @alexcrichton
cc @semarie
Only break critical edges where actually needed
Currently, to prepare for MIR trans, we break _all_ critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.
This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.
In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².
In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".
So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spent a little less than 0.2s in LLVM.
cc #33111
r? @Aatch
Clean up `hir::lowering`
Clean up `hir::lowering`:
- give lowering functions mutable access to the lowering context
- refactor the `lower_*` functions and other functions that take a lowering context into methods
- simplify the API that `hir::lowering` exposes to `driver`
- other miscellaneous cleanups
r? @nrc
trans: Always lower to `frem`
Long ago LLVM unfortunately didn't handle the 32-bit MSVC case of `frem` where
it can't be lowered to `fmodf` because that symbol doesn't exist. That was since
fixed in http://reviews.llvm.org/D12099 (landed as r246615) and was released in
what appears to be LLVM 3.8. Now that we're using that branch of LLVM let's
remove our own hacks and help LLVM optimize a little better by giving it
knowledge about what we're doing.
Copy more libraries from local Rust to stage0
When bootstrapping Rust using a previously built toolchain, I noticed
a number of libraries were not copied in. As a result the copied in
rustc fails to execute because it can't find all its dependences.
Add them into the local_stage0.sh script.
Explain the meaning of the fields of the control word and provide more
details about how the relevant one (Precision Control) is updated in
the fast path.
The fast path of the float parser relies on the rounding to happen
exactly and directly to the correct number of bits. On x87, instead,
double rounding would occour as the FPU stack defaults to 80 bits of
precision.
This can be fixed by setting the precision of the FPU stack before
performing the int to float conversion. This can be achieved by
changing the value of the x87 control word. This is a somewhat common
operation that is in fact performed whenever a float needs to be
truncated to an integer, but it is undesirable to add its overhead for
code that does not rely on x87 for computations (i.e. on non-x86
architectures, or x86 architectures which perform FPU computations on
using SSE).
Fixes `num::dec2flt::fast_path_correct` (on x87).
mir: don't attempt to promote Unpromotable constant temps.
Fixes#33537. This was a non-problem in regular functions, but we also promote in `const fn`s.
There we always qualify temps so you can't depend on `Unpromotable` temps being `NOT_CONST`.
re-introduce a cache for ast-ty-to-ty
It turns out that `ast_ty_to_ty` is supposed to be updating the `def`
after it finishes, but at some point in the past it stopped doing
so. This was never noticed because of the `ast_ty_to_ty_cache`, but that
cache was recently removed. This PR fixes the code to update the def
properly, but apparently that is not quite enough to make the operation
idempotent, so for now we reintroduce the cache too.
Fixes#33586.
r? @eddyb
Make --emit dep-info work correctly with -Z no-analysis again.
Previously, it would attempt to resolve some external crates that weren't necessary for dep-info output.
Fixes#33231.
In short, this PR changes the MIR printer so that it:
* places an empty line between the MIR for each item
* does *not* write an empty line before the first BB when there are no
var decls
* aligns the "// Scope" comments 50 chars in (makes the output more
readable)
* prints the scope comments as "// scope N at ..." instead of "//
Scope(N) at ..."
* prints a prettier scope tree:
* no more unbalanced delimiters!
* no more "Parent" entry (these convey no useful information)
* drop the "Scope()" and just print scope IDs
* no braces when the scope is empty
rustbuild: Add support for crate tests + doctests
This commit adds support to rustbuild to run crate unit tests (those defined by
`#[test]`) as well as documentation tests. All tests are powered by `cargo test`
under the hood.
Each step requires the `libtest` library is built for that corresponding stage.
Ideally the `test` crate would be a dev-dependency, but for now it's just easier
to ensure that we sequence everything in the right order.
Currently no filtering is implemented, so there's not actually a method of
testing *only* libstd or *only* libcore, but rather entire swaths of crates are
tested all at once.
A few points of note here are:
* The `coretest` and `collectionstest` crates are just listed as `[[test]]`
entires for `cargo test` to naturally pick up. This mean that `cargo test -p
core` actually runs all the tests for libcore.
* Libraries that aren't tested all mention `test = false` in their `Cargo.toml`
* Crates aren't currently allowed to have dev-dependencies due to
rust-lang/cargo#860, but we can likely alleviate this restriction once
workspaces are implemented.
cc #31590
A number of trait methods like PartialEq::eq or Hash::hash don't
actually need a distinct arm for each variant, because the code within
the arm only depends on the number and types of the fields in the
variants. We can easily exploit this fact to create less and better
code for enums with multiple variants that have no fields at all, the
extreme case being C-like enums.
For nickel.rs and its by now infamous 800 variant enum, this reduces
optimized compile times by 25% and non-optimized compile times by 40%.
Also peak memory usage is down by almost 40% (310MB down to 190MB).
To be fair, most other crates don't benefit nearly as much, because
they don't have as huge enums. The crates in the Rust distribution that
I measured saw basically no change in compile times (I only tried
optimized builds) and only 1-2% reduction in peak memory usage.
It turns out that `ast_ty_to_ty` is supposed to be updating the `def`
after it finishes, but at some point in the past it stopped doing
so. This was never noticed because of the `ast_ty_to_ty_cache`, but that
cache was recently removed. This PR fixes the code to update the def
properly, but apparently that is not quite enough to make the operation
idempotent, so for now we reintroduce the cache too.
Fixes#33425.