This commit is spawned out of a performance regression investigation in #50496.
In tracking down this regression it turned out that the `expand_statements`
function in the compiler was taking quite a long time. Further investigation
showed two key properties:
* The function was "fast" on glibc 2.24 and slow on glibc 2.23
* The hottest function was memmove from glibc
Combined together it looked like glibc gained an optimization to the memmove
function in 2.24. Ideally we don't want to rely on this optimization, so I
wanted to dig further to see what was happening.
The hottest part of `expand_statements` was `Drop for Drain` in the call to
`splice` where we insert new statements into the original vector. This *should*
be a cheap operation because we're draining and replacing iterators of the exact
same length, but under the hood memmove was being called a lot, causing a
slowdown on glibc 2.23.
It turns out that at least one of the optimizations in glibc 2.24 makes
`memmove` much faster when the src/dst are equal. [This program][prog]
executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, which
demonstrates that optimization.
And all that brings us to what this commit itself is doing. The change here is
purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being
copied doesn't actually need to be copied. For normal usage of just `Drain`
itself this check isn't really necessary, but because `Splice` internally
contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this
should fix the regression seen in #50496 on glibc 2.23 and also fix the
regression on Windows where `memmove` looks to not have this optimization.
Note that the way `splice` was called in `expand_statements` would cause a
quadratic number of elements to be copied via `memmove`, which is likely why the
tuple-stress benchmark showed such a severe regression.
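As a rough illustration of the check described above, here is a minimal sketch of the tail-restoring step. The `restore_tail` and `simulate_splice` helpers are hypothetical names invented for this example; this is not the actual `Drop for Drain` code from the standard library, only a simplified model of it.

```rust
use std::ptr;

/// Hypothetical stand-in for the tail-restoring step of `Drop for Drain`.
/// Returns true if a `ptr::copy` (memmove) was actually performed.
///
/// Safety: `vec.len() <= tail_start`, the slots
/// `tail_start..tail_start + tail_len` must be initialized, and
/// `tail_start + tail_len <= vec.capacity()`.
unsafe fn restore_tail<T>(vec: &mut Vec<T>, tail_start: usize, tail_len: usize) -> bool {
    let start = vec.len();
    let mut copied = false;
    if tail_len > 0 {
        // Only move the tail if it is not already where it needs to be.
        if tail_start != start {
            unsafe {
                let src = vec.as_ptr().add(tail_start);
                let dst = vec.as_mut_ptr().add(start);
                ptr::copy(src, dst, tail_len);
            }
            copied = true;
        }
        unsafe { vec.set_len(start + tail_len) };
    }
    copied
}

/// Simulates draining indices 1..3 of [1, 2, 3, 4, 5] and filling the gap
/// with `replacement` (at most 2 elements), as a same-length splice would.
fn simulate_splice(replacement: &[u32]) -> (bool, Vec<u32>) {
    assert!(replacement.len() <= 2);
    let mut v: Vec<u32> = vec![1, 2, 3, 4, 5];
    // Shrink to the head; the tail [4, 5] stays in place at index 3.
    unsafe { v.set_len(1) };
    for &x in replacement {
        v.push(x); // stays within the existing capacity, so no reallocation
    }
    let copied = unsafe { restore_tail(&mut v, 3, 2) };
    (copied, v)
}

fn main() {
    // Same-length replacement: the tail is already in position, no copy.
    let (copied, v) = simulate_splice(&[20, 30]);
    assert!(!copied);
    assert_eq!(v, vec![1, 20, 30, 4, 5]);

    // Shorter replacement: the tail has to move down by one slot.
    let (copied, v) = simulate_splice(&[20]);
    assert!(copied);
    assert_eq!(v, vec![1, 20, 4, 5]);
}
```

In the same-length case the source and destination of the copy would be identical, which is exactly the case that was slow on glibc 2.23.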
Closes #50496
[prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
Rollup of 11 pull requests
Successful merges:
- #49988 (Mention Result<!, E> in never docs.)
- #50148 (turn `ManuallyDrop::new` into a constant function)
- #50456 (Update the Cargo submodule)
- #50460 (Make `String::new()` const)
- #50464 (Remove some transmutes)
- #50505 (Added regression function match value test)
- #50511 (Add some explanations for #[must_use])
- #50525 (Optimize string handling in lit_token().)
- #50527 (Cleanup a `use` in a raw_vec test)
- #50539 (Add more logarithm constants)
- #49523 (Update RELEASES.md for 1.26.0)
Failed merges:
Add some explanations for #[must_use]
`#[must_use]` can be given a string argument, which is shown as part of the warning when the value goes unused.
We should add a string argument to most of the user-exposed ones.
I added these for everything but the operators, mostly because I'm not sure what to write there or if we need anything there.
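For illustration, a minimal sketch of the attribute with a message. The `doubled` function and its message text are made up for this example and are not taken from the standard library.

```rust
// Hypothetical function; the message is appended to the unused-result warning.
#[must_use = "this returns the doubled value and does not modify the input"]
fn doubled(x: i32) -> i32 {
    x * 2
}

fn main() {
    let n = 21;
    // Writing `doubled(n);` on its own line here would trigger the
    // `unused_must_use` lint and show the message above.
    let d = doubled(n);
    assert_eq!(d, 42);
}
```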
Add more logarithm constants
Right now, we have `ln(2)` and `ln(10)`, but only `log2(e)` and `log10(e)`. This also adds `log2(10)` and `log10(2)` for consistency.
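A quick sanity check of the two new values, assuming they are exposed as `std::f64::consts::LOG2_10` and `std::f64::consts::LOG10_2` (the names they carry in current stable Rust). The `approx_eq` helper and its tolerance are choices made for this sketch.

```rust
use std::f64::consts::{LN_10, LN_2, LOG10_2, LOG2_10};

/// Hypothetical helper: compares two floats with a small absolute tolerance.
fn approx_eq(a: f64, b: f64) -> bool {
    (a - b).abs() < 1e-12
}

fn main() {
    // log2(10) = ln(10) / ln(2) and log10(2) = ln(2) / ln(10).
    assert!(approx_eq(LOG2_10, LN_10 / LN_2));
    assert!(approx_eq(LOG10_2, LN_2 / LN_10));
    // The two constants are reciprocals of each other.
    assert!(approx_eq(LOG2_10 * LOG10_2, 1.0));
}
```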
Optimize string handling in lit_token().
In the common case, the string value in a string literal Token is the
same as the string value in a string literal LitKind. (The exception is
when escapes or \r are involved.) This patch takes advantage of that to
avoid calling str_lit() and re-interning the string in that case. This
speeds up incremental builds for a few of the rustc-benchmarks, the best
by 3%.
Benchmarks that got a speedup of 1% or more:
```
coercions
avg: -1.1% min: -3.5% max: 0.4%
regex-check
avg: -1.2% min: -1.5% max: -0.6%
futures-check
avg: -0.9% min: -1.4% max: -0.3%
futures
avg: -0.8% min: -1.3% max: -0.3%
futures-opt
avg: -0.7% min: -1.2% max: -0.1%
regex
avg: -0.5% min: -1.2% max: -0.1%
regex-opt
avg: -0.5% min: -1.1% max: -0.1%
hyper-check
avg: -0.7% min: -1.0% max: -0.3%
```
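The fast-path idea can be sketched outside the compiler as follows. This is a toy model, not rustc code: `needs_processing` and `literal_value` are hypothetical names, the unescaping handles only a few escapes, and borrowing the input stands in for reusing the already-interned symbol.

```rust
use std::borrow::Cow;

/// Only literals containing a backslash or a carriage return can differ
/// from their raw source text.
fn needs_processing(raw: &str) -> bool {
    raw.contains('\\') || raw.contains('\r')
}

/// Returns the literal's value, borrowing the raw text in the common case.
fn literal_value(raw: &str) -> Cow<'_, str> {
    if !needs_processing(raw) {
        // Fast path: no new string is built.
        return Cow::Borrowed(raw);
    }
    // Slow path: a deliberately simplified unescape pass.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some(other) => out.push(other), // covers \\ and \"
                None => out.push('\\'),
            },
            // Drop the '\r' of a "\r\n" pair; the '\n' is pushed next.
            '\r' if chars.peek() == Some(&'\n') => {}
            other => out.push(other),
        }
    }
    Cow::Owned(out)
}

fn main() {
    assert!(matches!(literal_value("hello"), Cow::Borrowed(_)));
    assert_eq!(literal_value("a\\nb"), "a\nb");
    assert_eq!(literal_value("a\r\nb"), "a\nb");
}
```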
Also remove some unnecessary debug_assert! when creating the shared
root, since the root should be stored in the rodata and thus be
impossible to accidentally modify.
lint: deny incoherent_fundamental_impls by default
Warn the ecosystem of the pending intent-to-disallow in #49799.
There are 4 ICEs on my machine, which look unrelated (they have happened before in https://github.com/rust-lang/rust/issues/49146#issuecomment-384473523)
```
thread 'main' panicked at 'assertion failed: position <= slice.len()', libserialize/leb128.rs:97:1
```
```
[run-pass] run-pass/allocator/xcrate-use2.rs
[run-pass] run-pass/issue-12133-3.rs
[run-pass] run-pass/issue-32518.rs
[run-pass] run-pass/trait-default-method-xc-2.rs
```
r? @nikomatsakis
This splits into_slices() into into_key_slice() and into_val_slice(). While the
extra calls would get optimized out, this is a useful semantic change since we
call keys() while iterating, and we don't want to construct an out-of-bounds
val() pointer in the process if we happen to be pointing to the shared static
root.
This also paves the way for doing the alignment handling conditional differently
for the keys and values.
This gives a pointer to that static empty node instead of allocating
a new node, and then whenever inserting makes sure that the root
isn't that empty node.
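A minimal sketch of the shared-empty-root pattern, heavily simplified. `TinyBag`, `CAPACITY`, and this `LeafNode` layout are hypothetical and do not reflect the real `BTreeMap` internals; the point is only that construction does not allocate and that every insert first checks for the static node.

```rust
use std::ptr;

const CAPACITY: usize = 4;

struct LeafNode {
    len: usize,
    keys: [u32; CAPACITY],
}

// One immutable empty node shared by every empty container.
static EMPTY_ROOT: LeafNode = LeafNode { len: 0, keys: [0; CAPACITY] };

struct TinyBag {
    root: *const LeafNode,
}

impl TinyBag {
    /// Creating an empty bag points at the static node and allocates nothing.
    fn new() -> Self {
        TinyBag { root: &EMPTY_ROOT }
    }

    fn is_shared_root(&self) -> bool {
        ptr::eq(self.root, &EMPTY_ROOT)
    }

    fn keys(&self) -> &[u32] {
        // Reading is fine for both the shared node and an owned node.
        let node = unsafe { &*self.root };
        &node.keys[..node.len]
    }

    /// Returns false if the single leaf is full.
    fn push(&mut self, key: u32) -> bool {
        // Never write through the shared root: replace it with an owned node.
        if self.is_shared_root() {
            let fresh = Box::new(LeafNode { len: 0, keys: [0; CAPACITY] });
            self.root = Box::into_raw(fresh);
        }
        // The root is now known to come from `Box::into_raw`, so it is writable.
        let node = unsafe { &mut *(self.root as *mut LeafNode) };
        if node.len == CAPACITY {
            return false;
        }
        node.keys[node.len] = key;
        node.len += 1;
        true
    }
}

impl Drop for TinyBag {
    fn drop(&mut self) {
        // Only free nodes that were actually allocated.
        if !self.is_shared_root() {
            unsafe { drop(Box::from_raw(self.root as *mut LeafNode)) };
        }
    }
}

fn main() {
    let mut bag = TinyBag::new();
    assert!(bag.is_shared_root());
    assert!(bag.keys().is_empty());

    assert!(bag.push(7));
    assert!(!bag.is_shared_root());
    assert_eq!(bag.keys(), &[7]);
}
```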
This way we can safely statically allocate a LeafNode to use as the
placeholder before allocating, and any type accessing it will be able to
access the metadata at the same offset.