Explain the types used in the `open64` call
Fixes https://github.com/rust-lang/rust/issues/71915, where I learned about this quirk. I don't actually know what I am talking about here. ;)
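For reference, this is roughly the shape of the call in question as I understand it (an illustrative sketch using the `libc` crate, not the actual std source):
```
use libc::{c_char, c_int};

extern "C" {
    // glibc declares `open64` as variadic, so the optional `mode_t`
    // argument undergoes C's default argument promotions.
    fn open64(path: *const c_char, oflag: c_int, ...) -> c_int;
}

fn open_with_mode(path: &std::ffi::CStr, flags: c_int, mode: libc::mode_t) -> c_int {
    // The cast mirrors the promotion: the kernel reads a `mode_t`,
    // but the vararg has to be passed as a (promoted) `c_int`.
    unsafe { open64(path.as_ptr(), flags, mode as c_int) }
}
```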
BTreeMap iter intertwined
3 commits:
1. Introduced benchmarks for `BTreeMap::iter()`. The existing benchmarks named `iter_20` etc. measured the whole iteration process, so I renamed them. Also, the benchmarks of `range` that I wrote earlier weren't very good. I included an (awkwardly named) one that compares `iter()` to `range(..)` on the same set (a sketch of its shape follows this list), because the contrast is surprising:
```
name ns/iter
btree::map::range_unbounded_unbounded 28,176
btree::map::range_unbounded_vs_iter 89,369
```
Both dig up the same pair of leaf edges. `range(..)` also checks that some keys are correctly ordered; the only extra thing `iter()` does is copy the map's length.
2. Slightly refactoring the code into a form I find more readable (not in chronological order of discovery) boosts performance:
```
>cargo-benchcmp.exe benchcmp a1 a2 --threshold 5
name a1 ns/iter a2 ns/iter diff ns/iter diff % speedup
btree::map::find_rand_100 18 17 -1 -5.56% x 1.06
btree::map::first_and_last_10k 64 71 7 10.94% x 0.90
btree::map::iter_0 2,939 2,209 -730 -24.84% x 1.33
btree::map::iter_1 6,845 2,696 -4,149 -60.61% x 2.54
btree::map::iter_100 8,556 3,672 -4,884 -57.08% x 2.33
btree::map::iter_10k 9,292 5,884 -3,408 -36.68% x 1.58
btree::map::iter_1m 10,268 6,510 -3,758 -36.60% x 1.58
btree::map::iteration_mut_100000 478,575 453,050 -25,525 -5.33% x 1.06
btree::map::range_unbounded_unbounded 28,176 36,169 7,993 28.37% x 0.78
btree::map::range_unbounded_vs_iter 89,369 38,290 -51,079 -57.16% x 2.33
btree::set::clone_100_and_remove_all 4,801 4,245 -556 -11.58% x 1.13
btree::set::clone_10k_and_remove_all 529,450 496,030 -33,420 -6.31% x 1.07
```
But you can tell from the `range_unbounded_*` lines that, despite an unwarranted, vengeful attack on the `range_unbounded_unbounded` benchmark, this change still doesn't allow `iter()` to catch up with `range(..)`.
3. I guess that `range(..)` copes so well because it intertwines the leftmost and rightmost descents towards the leaf edges, doing the two root node accesses close together, perhaps exploiting a CPU's internal pipelining (a toy sketch of the idea follows this list). So the third commit distils a version of `range_search` (which we can't use directly because of the `Ord` bound), and we get another boost:
```
cargo-benchcmp.exe benchcmp a2 a3 --threshold 5
name a2 ns/iter a3 ns/iter diff ns/iter diff % speedup
btree::map::first_and_last_100 40 43 3 7.50% x 0.93
btree::map::first_and_last_10k 71 64 -7 -9.86% x 1.11
btree::map::iter_0 2,209 1,719 -490 -22.18% x 1.29
btree::map::iter_1 2,696 2,205 -491 -18.21% x 1.22
btree::map::iter_100 3,672 2,943 -729 -19.85% x 1.25
btree::map::iter_10k 5,884 3,929 -1,955 -33.23% x 1.50
btree::map::iter_1m 6,510 5,532 -978 -15.02% x 1.18
btree::map::iteration_mut_100000 453,050 476,667 23,617 5.21% x 0.95
btree::map::range_included_excluded 405,075 371,297 -33,778 -8.34% x 1.09
btree::map::range_included_included 427,577 397,440 -30,137 -7.05% x 1.08
btree::map::range_unbounded_unbounded 36,169 28,175 -7,994 -22.10% x 1.28
btree::map::range_unbounded_vs_iter 38,290 30,838 -7,452 -19.46% x 1.24
```
But I think this is just fake news from the microbenchmarking media. `iter()` is still trying to catch up with `range(..)`. And we can sure do without another function. So I would skip this 3rd commit.
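For the record, the comparison benchmark from the first commit looks roughly like this (a hedged sketch; the names, sizes, and turbofish are illustrative, not the committed source):
```
#![feature(test)]
extern crate test;

use std::collections::BTreeMap;
use test::{black_box, Bencher};

#[bench]
fn range_unbounded_vs_iter(b: &mut Bencher) {
    let map: BTreeMap<usize, usize> = (0..100_000).map(|i| (i, i)).collect();
    // Only iterator construction is timed: both calls dig up the same
    // leftmost/rightmost pair of leaf edges.
    b.iter(|| black_box(map.iter()));
}

#[bench]
fn range_unbounded_unbounded(b: &mut Bencher) {
    let map: BTreeMap<usize, usize> = (0..100_000).map(|i| (i, i)).collect();
    // `range` needs the key type spelled out for a full-range bound.
    b.iter(|| black_box(map.range::<usize, _>(..)));
}
```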
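And here is a self-contained sketch of the intertwining idea from the third commit, on a toy binary tree rather than the real `BTreeMap` node API (illustrative only): advancing both descents in one loop gives the CPU two independent pointer chases to overlap, instead of two sequential `while` loops.
```
struct Node<T> {
    value: T,
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
}

// Find the leftmost and rightmost values with the descents intertwined.
fn first_and_last<T>(root: &Node<T>) -> (&T, &T) {
    let (mut min, mut max) = (root, root);
    loop {
        // Both loads are issued in the same iteration and don't depend
        // on each other, so they can be pipelined.
        let next_min = min.left.as_deref();
        let next_max = max.right.as_deref();
        if next_min.is_none() && next_max.is_none() {
            return (&min.value, &max.value);
        }
        if let Some(l) = next_min { min = l; }
        if let Some(r) = next_max { max = r; }
    }
}
```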
r? @Mark-Simulacrum
perf: Unify the undo log of all snapshot types
Extracted from #69218 and extended to all the current snapshot types.
Since snapshotting is such a frequent action in the compiler and many of the scopes execute so little work, the act of creating a snapshot and rolling back empty or small snapshots ends up showing in perf. By unifying all the logs into one, creating a snapshot becomes significantly cheaper, at the cost of some complexity when combining the log with the specific data structures that are being mutated.
Depends on https://github.com/rust-lang-nursery/ena/pull/29
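A minimal sketch of the shape of this change, under illustrative names (the compiler's real types differ): every snapshotted structure appends its change records to one shared log, so taking a snapshot is just reading the log's length.
```
// One variant per snapshotted structure; rollback dispatches on them.
enum UndoLog {
    PushedTypeVar,
    PushedRegionConstraint,
}

struct Snapshot {
    undo_len: usize,
}

struct InferState {
    type_vars: Vec<u32>,
    region_constraints: Vec<(u32, u32)>,
    undo_log: Vec<UndoLog>,
}

impl InferState {
    // Cheap even for the frequent tiny scopes: no per-structure work.
    fn snapshot(&self) -> Snapshot {
        Snapshot { undo_len: self.undo_log.len() }
    }

    // The cost moves here: each entry is dispatched back to the
    // structure it mutated.
    fn rollback_to(&mut self, snapshot: Snapshot) {
        while self.undo_log.len() > snapshot.undo_len {
            match self.undo_log.pop().unwrap() {
                UndoLog::PushedTypeVar => { self.type_vars.pop(); }
                UndoLog::PushedRegionConstraint => { self.region_constraints.pop(); }
            }
        }
    }
}
```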
Update RLS
In addition to fixing the toolstate, this also changes the default
compilation model to the out-of-process one, which should hopefully
reduce the considerable memory usage of long-running instances of the RLS.
Fixes #71753
r? @ghost
Rollup of 4 pull requests
Successful merges:
- #69984 (Add Option to Force Unwind Tables)
- #71830 (Remove clippy from some leftover lists of "possibly failing" tools)
- #71894 (Suggest removing semicolon in last expression only if its type is known)
- #71897 (Improve docs for embed-bitcode and linker-plugin-lto)
Failed merges:
r? @ghost
Suggest removing semicolon in last expression only if its type is known
Fixes #67971
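Roughly the situation being gated, as I understand it (an illustrative snippet, not a test from this PR):
```
fn foo() -> i32 {
    5; // error[E0308]: mismatched types; since the type of `5` is
       // known here, rustc can confidently suggest removing the `;`
}
```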
Is there a syntax for explicitly checking if a note doesn't exist in test output? Something like `//~ !NOTE ...`
I believe r? @estebank deals with diagnostics.
Remove clippy from some leftover lists of "possibly failing" tools
https://github.com/rust-lang/rust/pull/70655 successfully made clippy get built and tested on CI on every merge, but the lack of emitted toolstate info caused the toolstate to get updated to test-fail. We should remove clippy entirely from toolstate, as it is now always test-pass.
The changes made in this PR reflect what we do for `rustdoc`, which is our preexisting tool that is gated on CI.
r? @Mark-Simulacrum
Add Option to Force Unwind Tables
When the panic strategy is not `unwind`, `nounwind` is added to all
functions for a target. This can cause issues when a panic happens with
`RUST_BACKTRACE=1`, as there needs to be a way to reconstruct the
backtrace. There are three possible sources of this information: forcing
frame pointers (for which an option already exists), debug info (for
which an option also exists), or unwind tables.
Especially for embedded devices, forcing frame pointers can have code
size overheads (RISC-V sees ~10%, ARM sees ~2-3%). In production code,
debug info is often not kept, so it is useful to provide this third
option, unwind tables, which users can use to reconstruct the call
stack. Reconstructing this stack is harder than with frame pointers,
but it is still possible.
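A hedged usage sketch, assuming the flag spelling this PR adds (`-C force-unwind-tables=yes`):
```
// Build with, e.g.:
//   RUSTFLAGS='-C panic=abort -C force-unwind-tables=yes' \
//       RUST_BACKTRACE=1 cargo run --release
//
// With panic=abort, functions are normally `nounwind` and the tables
// are dropped; forcing them back in lets this backtrace be
// reconstructed without frame pointers or debug info.
fn main() {
    panic!("reconstruct me");
}
```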
---
This came up in discussion on #69890, and turned out to be a fairly simple addition.
r? @hanna-kruppe
By merging the `undo_log` of all structures that are part of the
snapshot, creating a snapshot becomes much cheaper. Since snapshots
with no or few changes are so frequent, this ends up mattering more
than the slight overhead of dispatching on the variants that map to
each field.
Update btree_map::VacantEntry::insert docs to actually call insert
It looks like they were copied from the `or_insert` docs. This change
makes the example more like the `hash_map::VacantEntry::insert` docs.
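For comparison, the shape the corrected example takes (a minimal sketch mirroring the `hash_map` docs): `VacantEntry::insert` itself gets called, rather than `Entry::or_insert`.
```
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

fn main() {
    let mut map: BTreeMap<&str, u32> = BTreeMap::new();
    // Only the vacant arm inserts, via `VacantEntry::insert`.
    if let Entry::Vacant(v) = map.entry("poneyland") {
        v.insert(37);
    }
    assert_eq!(map["poneyland"], 37);
}
```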
Correctly handle UEFI targets as Windows-like when emitting sections for LLVM bitcode
This handles UEFI targets as Windows-like when emitting the inline assembly for sections containing LLVM bitcode. See details in #71880. I have locally confirmed that this change fixes compilation of projects using the `x86_64-unknown-uefi` target compiled with `cargo-xbuild`, but I am not very familiar with LLVM bitcode, so this may not be the correct approach.
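A hedged sketch of the intent (an illustrative helper, not the compiler's actual API): UEFI images are PE/COFF, so they should take the same Windows-style section directives as Windows targets when embedding bitcode.
```
// Decide whether a target needs COFF-style `.section` directives for
// the `.llvmbc`/`.llvmcmd` bitcode sections.
fn uses_coff_section_syntax(target_os: &str) -> bool {
    matches!(target_os, "windows" | "uefi")
}

fn main() {
    assert!(uses_coff_section_syntax("uefi"));
}
```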
r? @alexcrichton as they wrote the initial LLVM bitcode emitting code?