Before this PR, type names could depend on the cratenum being used
for a given crate and also on the source location of closures.
Both are undesirable for incremental compilation, where we cache
LLVM IR and don't want it to depend on formatting or on the order
in which crates are loaded.
Replace syntax's SmallVector with AccumulateVec
This adds a new type to data_structures, `SmallVec`, which wraps `AccumulateVec` with support for re-allocating onto the heap (`SmallVec::reserve`). `SmallVec` is then used to replace the implementation of `SmallVector` in libsyntax.
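A rough sketch of what `SmallVec::reserve` does, using a hypothetical, greatly simplified stand-in for `AccumulateVec` (the real types live in data_structures and use unsafe, uninitialized storage):

```rust
// Hypothetical, greatly simplified stand-in for the real AccumulateVec.
enum AccumulateVec<T> {
    Array { buf: [Option<T>; 8], len: usize },
    Heap(Vec<T>),
}

// SmallVec wraps AccumulateVec and, unlike it, can move its contents onto
// the heap after the initial collection when more capacity is requested.
struct SmallVec<T>(AccumulateVec<T>);

impl<T> SmallVec<T> {
    fn reserve(&mut self, additional: usize) {
        let needs_spill = match &self.0 {
            AccumulateVec::Array { buf, len } => *len + additional > buf.len(),
            AccumulateVec::Heap(_) => false,
        };
        if needs_spill {
            // Spill the inline elements onto the heap, then reserve there.
            if let AccumulateVec::Array { buf, len } =
                std::mem::replace(&mut self.0, AccumulateVec::Heap(Vec::new()))
            {
                let mut spilled: Vec<T> =
                    buf.into_iter().take(len).map(|slot| slot.unwrap()).collect();
                spilled.reserve(additional);
                self.0 = AccumulateVec::Heap(spilled);
            }
        } else if let AccumulateVec::Heap(v) = &mut self.0 {
            v.reserve(additional);
        }
    }
}
```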
r? @eddyb
Fixes #37371. Using `SmallVec` instead of libsyntax's `SmallVector` will provide the `N = 2/4` case easily (just needs a few more `Array` impls).
cc @nnethercote, probably interested in this area
A way to remove otherwise unused locals from MIR
There is a certain amount of desire for a pass which cleans up provably unused variables (those with no assignments or reads). There has been an implementation of such a pass by @scottcarr, and another two implementations by me in my own dataflow efforts.
A PR like https://github.com/rust-lang/rust/pull/35916 proves that this pass is useful even on its own, which is why I cherry-picked it out of my dataflow effort.
@nikomatsakis previously expressed concern that this pass does not seem cheap to run and is therefore unsuitable for regular cleanup duties. It turns out that regular cleanup of local declarations is not necessary, at least for now, because the majority of passes simply do not (or should not) care about them. That's why it is viable to run this pass only once per function (perhaps a few more times in the future?), right before translation.
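Conceptually the pass is a mark-and-compact over local declarations. A minimal sketch, with made-up, simplified types standing in for rustc's MIR structures:

```rust
use std::collections::HashSet;

// Hypothetical, greatly simplified stand-in for a MIR body.
struct Body {
    local_decls: Vec<String>, // names stand in for the real local declarations
    uses: Vec<usize>,         // indices of locals referenced by any statement
}

fn remove_unused_locals(body: &mut Body) {
    // Mark: record every local that is read or written anywhere.
    let used: HashSet<usize> = body.uses.iter().copied().collect();

    // Compact: keep only the used declarations, building an old -> new index map.
    let mut map = vec![None; body.local_decls.len()];
    let mut kept = Vec::new();
    for (old, decl) in body.local_decls.drain(..).enumerate() {
        if used.contains(&old) {
            map[old] = Some(kept.len());
            kept.push(decl);
        }
    }
    body.local_decls = kept;

    // Rewrite every recorded use to point at the local's new index.
    for idx in &mut body.uses {
        *idx = map[*idx].expect("used local must have been kept");
    }
}
```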
r? @eddyb or @nikomatsakis
Use where clauses and only where clauses for bounds in the
iterators for Graph.
The rest of the code uses bounds on the generic declarations for
Debug, and we may want to change those too, for consistency. I did
not do that here because I don't know whether or not that's a good
idea. But for the iterators the bounds were inconsistent, causing
confusion, at least for me.
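For illustration, the two styles look roughly like this (hypothetical iterator types; the real ones live in the Graph module):

```rust
use std::fmt::Debug;

// Bounds written on the generic parameter list (the style the rest of the
// code uses for its Debug bounds):
struct NodesDecl<'g, N: 'g + Debug> {
    remaining: &'g [N],
}

// The same bounds written only as a where clause (the style adopted here
// for the Graph iterators):
struct NodesWhere<'g, N>
    where N: 'g + Debug,
{
    remaining: &'g [N],
}
```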
This commit partially inlines two functions, `find_cycles_from_node` and
`mark_as_waiting_from`, at two call sites in order to avoid unnecessary
function calls on hot paths.
It also fully inlines and removes `is_popped`.
These changes speed up rustc-benchmarks/inflate-0.1.0 by about 2% when
doing debug builds with a stage1 compiler.
Add ArrayVec and AccumulateVec to reduce heap allocations during interning of slices
Updates `mk_tup`, `mk_type_list`, and `mk_substs` to allow interning directly from iterators. The previous PR, #37220, changed some of the calls to pass a borrowed slice from `Vec` instead of directly passing the iterator, and these changes further optimize that to avoid the allocation entirely.
This change yields 50% fewer malloc calls in [some cases](https://pastebin.mozilla.org/8921686). It also yields decent, though not amazing, performance improvements:
```
futures-rs-test 4.091s vs 4.021s --> 1.017x faster (variance: 1.004x, 1.004x)
helloworld 0.219s vs 0.220s --> 0.993x faster (variance: 1.010x, 1.018x)
html5ever-2016- 3.805s vs 3.736s --> 1.018x faster (variance: 1.003x, 1.009x)
hyper.0.5.0 4.609s vs 4.571s --> 1.008x faster (variance: 1.015x, 1.017x)
inflate-0.1.0 3.864s vs 3.883s --> 0.995x faster (variance: 1.232x, 1.005x)
issue-32062-equ 0.309s vs 0.299s --> 1.033x faster (variance: 1.014x, 1.003x)
issue-32278-big 1.614s vs 1.594s --> 1.013x faster (variance: 1.007x, 1.004x)
jld-day15-parse 1.390s vs 1.326s --> 1.049x faster (variance: 1.006x, 1.009x)
piston-image-0. 10.930s vs 10.675s --> 1.024x faster (variance: 1.006x, 1.010x)
reddit-stress 2.302s vs 2.261s --> 1.019x faster (variance: 1.010x, 1.026x)
regex.0.1.30 2.250s vs 2.240s --> 1.005x faster (variance: 1.087x, 1.011x)
rust-encoding-0 1.895s vs 1.887s --> 1.005x faster (variance: 1.005x, 1.018x)
syntex-0.42.2 29.045s vs 28.663s --> 1.013x faster (variance: 1.004x, 1.006x)
syntex-0.42.2-i 13.925s vs 13.868s --> 1.004x faster (variance: 1.022x, 1.007x)
```
We implement a small-size-optimized vector, intended primarily for collecting iterators that are presumed to be short. This vector cannot be "upsized"/reallocated into a heap-allocated vector, since that would require (slow) branching logic, but heap allocation is possible during the initial collection from an iterator.
We make the new `AccumulateVec` and `ArrayVec` generic over implementors of the `Array` trait, of which there is currently one, `[T; 8]`. In the future, this is likely to expand to other values of N.
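A condensed sketch of the shape of these types (simplified; the in-tree versions use unsafe, uninitialized storage and carry more trait impls):

```rust
// Marker trait for the fixed-size arrays that can back an ArrayVec.
trait Array {
    type Element;
    const LEN: usize;
}

impl<T> Array for [T; 8] {
    type Element = T;
    const LEN: usize = 8;
}

// Fixed-capacity vector backed by an Array; it never spills to the heap.
struct ArrayVec<A: Array> {
    count: usize,
    values: A,
}

// Either the fixed-capacity ArrayVec or, when the iterator turns out to be
// longer than A::LEN during the initial collect, an ordinary Vec. Once
// collected, it is never reallocated from one representation to the other.
enum AccumulateVec<A: Array> {
    Array(ArrayVec<A>),
    Heap(Vec<A::Element>),
}
```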
Huge thanks to @nnethercote for collecting the performance and other statistics mentioned above.
remove keys w/ skolemized regions from proj cache when popping skolemized regions
This addresses #37154 (a regression). The projection cache was incorrectly caching the results for skolemized regions -- when we pop skolemized regions, we are supposed to drop cache keys for them (just as we remove those skolemized regions from the region inference graph). This is because those skolemized region numbers will be reused later with different meaning (and we have determined that the old ones don't leak out in any meaningful way).
I did a *somewhat* aggressive fix here of only removing keys that mention the skolemized regions. One could imagine just removing all keys added since we started the skolemization (as indeed I did in my initial commit). This more aggressive fix required fixing a latent bug in `TypeFlags`, as an aside.
I believe the more aggressive fix is correct; clearly there can be entries that are unrelated to the skolemized regions, and it's a shame to remove them. My one concern was that it *is* possible, I believe, to have some region variables that are created and related to skolemized regions, and maybe some of them could end up in the cache. However, that seems harmless enough to me: those relations will be removed, and couldn't have impacted how trait resolution proceeded anyway (in other words, the cache entry is not wrong, though it is rather useless).
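The more targeted strategy amounts to a filtered removal over the cache. A minimal sketch with placeholder types (the real cache is a snapshotted map inside the trait selection code):

```rust
use std::collections::HashMap;

// Placeholder key/value types; the real key is the projection being normalized.
#[derive(PartialEq, Eq, Hash)]
struct ProjectionKey(String);
struct CachedResult;

struct ProjectionCache {
    map: HashMap<ProjectionKey, CachedResult>,
}

impl ProjectionCache {
    // Called when popping the skolemized regions: drop only the entries whose
    // key mentions one of the region numbers that are about to be reused.
    fn remove_skolemized(&mut self, mentions_skolemized: impl Fn(&ProjectionKey) -> bool) {
        self.map.retain(|key, _| !mentions_skolemized(key));
    }
}
```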
r? @pnkfelix
cc @arielb1
ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints
This PR makes incr. comp. hashes 128 bits wide in order to push the collision probability below the level we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the [BLAKE2b hash function](https://blake2.net/), an improved version of the BLAKE SHA-3 finalist.
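The fingerprint itself just becomes two u64s taken from the leading bytes of the hash output; a simplified sketch of the idea (hypothetical constructor, not the in-tree API):

```rust
// 128-bit incr. comp. fingerprint, stored as two u64 halves.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct Fingerprint(u64, u64);

impl Fingerprint {
    // `digest` is assumed to be the leading 16 bytes of a BLAKE2b output
    // (or of any hash that is at least 128 bits wide).
    fn from_digest(digest: &[u8; 16]) -> Fingerprint {
        let mut lo = [0u8; 8];
        let mut hi = [0u8; 8];
        lo.copy_from_slice(&digest[0..8]);
        hi.copy_from_slice(&digest[8..16]);
        Fingerprint(u64::from_le_bytes(lo), u64::from_le_bytes(hi))
    }
}
```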
I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests I'm not worried anymore. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits):
```
SipHash
libcore: 0.199 seconds
libstd: 0.090 seconds
BLAKE2b
libcore: 0.162 seconds
libstd: 0.078 seconds
```
If someone can show that something like MetroHash128 provides a collision probability as low as BLAKE2's, I'm happy to switch. But for now we are at least not taking a performance hit.
I also suggest that we throw out the SHA-256 implementation in the compiler and replace it with BLAKE2, since our SHA-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson)
r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)
fail obligations that depend on erroring obligations
Fix a bug where an obligation that depends on an erroring obligation would
be regarded as successful, leading to global cache pollution and random
lossage.
Fixes #33723.
Fixes #34503.
r? @eddyb since @nikomatsakis is on vacation
beta-nominating because of the massive lossage potential (e.g. with `Copy` this could lead to random memory leaks), plus this is a regression.
generate fewer basic blocks for variant switches
CC #33567
Adds a new field to `TestKind::Switch` that tracks the variants that are actually matched against. The other candidates target a common "otherwise" block.
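Roughly, the change gives the switch test a record of which variants actually appear in the match; a simplified sketch with placeholder types (the real enum lives in rustc's MIR match lowering):

```rust
use std::collections::BTreeSet;

type VariantIdx = usize; // placeholder for the real variant index type
struct AdtDef;           // placeholder for the enum definition being matched

enum TestKind {
    Switch {
        adt_def: AdtDef,
        // New field: the variants with at least one candidate arm. Every
        // other variant is routed to a single shared "otherwise" block
        // instead of each variant getting its own basic block.
        variants: BTreeSet<VariantIdx>,
    },
    // ... other test kinds elided
}
```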