Remove some redundant checks from BufReader
The implementation of BufReader contains a lot of redundant checks. While any one of these checks is not particularly expensive to execute, especially when taken together they dramatically inhibit LLVM's ability to make subsequent optimizations by confusing data flow increasing the code size of anything that uses BufReader.
In particular, these changes have a ~2x increase on the benchmark that this adds a `black_box` to. I'm adding that `black_box` here just in case LLVM gets clever enough to remove the reads entirely. Right now it can't, but these optimizations are really setting it up to do so.
We get this optimization by factoring all the actual buffer management and bounds-checking logic into a new module inside `bufreader` with a new `Buffer` type. This makes it much easier to ensure that we have correctly encapsulated the management of the region of the buffer that we have read bytes into, and it lets us provide a new faster way to do small reads. `Buffer::consume_with` lets a caller do a read from the buffer with a single bounds check, instead of the double-check that's required to use `buffer` + `consume`.
Unfortunately I'm not aware of a lot of open-source usage of `BufReader` in perf-critical environments. Some time ago I tweaked this code because I saw `BufReader` in a profile at work, and I contributed some benchmarks to the `bincode` crate which exercise `BufReader::buffer`. These changes appear to help those benchmarks at little, but all these sorts of benchmarks are kind of fragile so I'm wary of quoting anything specific.
Use `action/checkout@v3` in the book
As this type of document is often copied/pasted, using a newer version of `actions/checkout` would be better.
changelog: none
Signed-off-by: Yuki Okushi <jtitor@2k36.org>
Update cargo
5 commits in d8d30a75376f78bb0fabe3d28ee9d87aa8035309..85b500ccad8cd0b63995fd94a03ddd4b83f7905b
2022-07-19 13:59:17 +0000 to 2022-07-24 21:10:46 +0000
- Make the empty rustc-wrapper test more explicit. (rust-lang/cargo#10899)
- expand RUSTC_WRAPPER docs (rust-lang/cargo#10896)
- Stabilize Workspace Inheritance (rust-lang/cargo#10859)
- Fix typo in unstable docs: s/PROGJCT/PROJECT/ (rust-lang/cargo#10890)
- refactor(source): Open query API for adding more types of queries (rust-lang/cargo#10883)
Rollup of 8 pull requests
Successful merges:
- #98583 (Stabilize Windows `FileTypeExt` with `is_symlink_dir` and `is_symlink_file`)
- #99698 (Prefer visibility map parents that are not `doc(hidden)` first)
- #99700 (Add a clickable link to the layout section)
- #99712 (passes: port more of `check_attr` module)
- #99759 (Remove dead code from cg_llvm)
- #99765 (Don't build std for *-uefi targets)
- #99771 (Update pulldown-cmark version to 0.9.2 (fixes url encoding for some chars))
- #99775 (rustdoc: do not allocate String when writing path full name)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
rustdoc: do not allocate String when writing path full name
No idea if this makes any perf difference, but it just seems like premature pessimisation to use String when str will do.
passes: port more of `check_attr` module
Continues from #99213.
Port more diagnostics in `rustc_passes::check_attr` to using the diagnostic derive and translation machinery.
r? `@compiler-errors`
Add a clickable link to the layout section
The layout section (activated by `--show-type-layout`) is currently not linkable to (outside of chrome's link to text feature). This PR makes it linkable via `#layout`.
Prefer visibility map parents that are not `doc(hidden)` first
Far simpler approach to #98876.
This only fixes the case where the parent is `doc(hidden)`, not where the child is `doc(hidden)` since I don't know how to get the attrs on the import statement given a `ModChild`... I'll try to follow up with that, but this is a good first step.
Stabilize Windows `FileTypeExt` with `is_symlink_dir` and `is_symlink_file`
These calls allow detecting whether a symlink is a file or a directory,
a distinction Windows maintains, and one important to software that
wants to do further operations on the symlink (e.g. removing it).
Optimized vec::IntoIter::next_chunk impl
```
x86_64v1, default
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
On znver2 the default impl seems to be slow due to different inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deep call tree.
codegen: use new {re,de,}allocator annotations in llvm
This obviates the patch that teaches LLVM internals about
_rust_{re,de}alloc functions by putting annotations directly in the IR
for the optimizer.
The sole test change is required to anchor FileCheck to the body of the
`box_uninitialized` method, so it doesn't see the `allocalign` on
`__rust_alloc` and get mad about the string `alloca` showing up. Since I
was there anyway, I added some checks on the attributes to prove the
right attributes got set.
r? `@nikic`
```
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
The znver2 default impl seems to be slow due to inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deeper call tree.