Some simple improvements to MIR pretty printing
In short, this PR changes the MIR printer so that it:
* places an empty line between the MIR for each item
* does *not* write an empty line before the first BB when there are no
var decls
* aligns the "// Scope" comments 50 chars in (makes the output more
readable)
* prints the scope comments as "// scope N at ..." instead of "//
Scope(N) at ..."
* prints a prettier scope tree:
* no more unbalanced delimiters!
* no more "Parent" entry (these convey no useful information)
* drop the "Scope()" and just print scope IDs
* no braces when the scope is empty
In action: https://gist.github.com/jonas-schievink/1c11226cbb112892a9470ce0f9870b65
Improve derived implementations for enums with lots of fieldless variants
A number of trait methods like PartialEq::eq or Hash::hash don't
actually need a distinct arm for each variant, because the code within
the arm only depends on the number and types of the fields in the
variants. We can easily exploit this fact to create less and better
code for enums with multiple variants that have no fields at all, the
extreme case being C-like enums.
For nickel.rs and its by now infamous 800 variant enum, this reduces
optimized compile times by 25% and non-optimized compile times by 40%.
Also peak memory usage is down by almost 40% (310MB down to 190MB).
To be fair, most other crates don't benefit nearly as much, because
they don't have as huge enums. The crates in the Rust distribution that
I measured saw basically no change in compile times (I only tried
optimized builds) and only 1-2% reduction in peak memory usage.
Make AtomicBool the same size as bool
Reopening #32365
This allows `AtomicBool` to be transmuted to a `bool`, which makes it more consistent with the other atomic types. Note that this now guarantees that the atomic type will always contain a valid `bool` value, which wasn't the case before (due to `fetch_nand`).
r? @alexcrichton
System limits may restrict the number of threads effectively spawned
by this test (eg. systemd recently introduced a 512 tasks per unit
maximum default).
This commit explicitly asserts on the expected number of threads,
making failures due to system limits easier to spot.
More details at https://bugs.debian.org/822325
Signed-off-by: Luca Bruno <lucab@debian.org>
Fix spans and expected token lists, fix#33413 + other cosmetic improvements
Add test for #33413
Convert between `Arg` and `ExplicitSelf` precisely
Simplify pretty-printing for methods
trans-collector: Assorted fixes and refactorings needed for making trans collector-driven.
As the title says. The messages on the individual commits should do a good job of explaining what they are about.
r? @nikomatsakis
[MIR trans] Optimize trans for biased switches
Currently, all switches in MIR are exhausitive, meaning that we can have
a lot of arms that all go to the same basic block, the extreme case
being an if-let expression which results in just 2 possible cases, be
might end up with hundreds of arms for large enums.
To improve this situation and give LLVM less code to chew on, we can
detect whether there's a pre-dominant target basic block in a switch
and then promote this to be the default target, not translating the
corresponding arms at all.
In combination with #33544 this makes unoptimized MIR trans of
nickel.rs as fast as using old trans and greatly improves the times for
optimized builds, which are only 30-40% slower instead of ~300%.
cc #33111
Remove unification despite ambiguity in projection
Turns out that closures aren't explicitly considered in `project.rs`, so the ambiguity handling w.r.t. closures can just be removed as the change done in `select.rs` covers it.
r? @nikomatsakis
Don't use env::current_exe with libbacktrace
If the path we give to libbacktrace doesn't actually correspond to the
current process, libbacktrace will segfault *at best*.
cc #21889
r? @alexcrichton
cc @semarie
Only break critical edges where actually needed
Currently, to prepare for MIR trans, we break _all_ critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.
This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.
In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².
In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".
So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spent a little less than 0.2s in LLVM.
cc #33111
r? @Aatch
Clean up `hir::lowering`
Clean up `hir::lowering`:
- give lowering functions mutable access to the lowering context
- refactor the `lower_*` functions and other functions that take a lowering context into methods
- simplify the API that `hir::lowering` exposes to `driver`
- other miscellaneous cleanups
r? @nrc
trans: Always lower to `frem`
Long ago LLVM unfortunately didn't handle the 32-bit MSVC case of `frem` where
it can't be lowered to `fmodf` because that symbol doesn't exist. That was since
fixed in http://reviews.llvm.org/D12099 (landed as r246615) and was released in
what appears to be LLVM 3.8. Now that we're using that branch of LLVM let's
remove our own hacks and help LLVM optimize a little better by giving it
knowledge about what we're doing.