On arm64 we have seen on several databases that ISB (instruction synchronization
barrier) is better to use than yield in a spin loop. The yield instruction is a
nop. The isb instruction puts the processor to sleep for some short time. isb
is a good equivalent to the pause instruction on x86.
Below is an experiment that shows the effects of yield and isb on Arm64 and the
time of a pause instruction on x86 Intel processors. The micro-benchmarks use
https://github.com/google/benchmark.git
$ cat a.cc
static void BM_scalar_increment(benchmark::State& state) {
int i = 0;
for (auto _ : state)
benchmark::DoNotOptimize(i++);
}
BENCHMARK(BM_scalar_increment);
static void BM_yield(benchmark::State& state) {
for (auto _ : state)
asm volatile("yield"::);
}
BENCHMARK(BM_yield);
static void BM_isb(benchmark::State& state) {
for (auto _ : state)
asm volatile("isb"::);
}
BENCHMARK(BM_isb);
BENCHMARK_MAIN();
$ g++ -o run a.cc -O2 -lbenchmark -lpthread
$ ./run
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
AWS Graviton2 (Neoverse-N1) processor:
BM_scalar_increment 0.485 ns 0.485 ns 1000000000
BM_yield 0.400 ns 0.400 ns 1000000000
BM_isb 13.2 ns 13.2 ns 52993304
AWS Graviton (A-72) processor:
BM_scalar_increment 0.897 ns 0.874 ns 801558633
BM_yield 0.877 ns 0.875 ns 800002377
BM_isb 13.0 ns 12.7 ns 55169412
Apple Arm64 M1 processor:
BM_scalar_increment 0.315 ns 0.315 ns 1000000000
BM_yield 0.313 ns 0.313 ns 1000000000
BM_isb 9.06 ns 9.06 ns 77259282
static void BM_pause(benchmark::State& state) {
for (auto _ : state)
asm volatile("pause"::);
}
BENCHMARK(BM_pause);
Intel Skylake processor:
BM_scalar_increment 0.295 ns 0.295 ns 1000000000
BM_pause 41.7 ns 41.7 ns 16780553
Tested on Graviton2 aarch64-linux with `./x.py test`.
Rollup of 8 pull requests
Successful merges:
- #81922 (Let `#[allow(unstable_name_collisions)]` work for things other than function)
- #82483 (Use FromStr trait for number option parsing)
- #82739 (Use the beta compiler for building bootstrap tools when `download-rustc` is set)
- #83650 (Update Source Serif to release 4.004)
- #83826 (List trait impls before deref methods in doc's sidebar)
- #83831 (Add `#[inline]` to IpAddr methods)
- #83863 (Render destructured struct function param names as underscore)
- #83865 (Don't report disambiguator error if link would have been ignored)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Don't report disambiguator error if link would have been ignored
Fixes#83859.
This prevents us from warning on links such as `<hello@example.com>`.
Note that we still warn on links such as `<hello@localhost>` because
they have no dots in them. However, the links will still work, even
though a warning is reported.
r? ````@jyn514````
List trait impls before deref methods in doc's sidebar
This PR is acting directly on a suggestion made by ```````@jyn514``````` in #83133. I've tested the changes locally, and can confirm that it does in fact properly achieve what he thought it would. This PR also in turn closes#83133.
Use the beta compiler for building bootstrap tools when `download-rustc` is set
## Motivation
This avoids having to rebuild bootstrap and tidy each time you rebase
over master. In particular, it makes rebasing and running `x.py fmt` on
each commit in a branch significantly faster. It also avoids having to
rebuild bootstrap after setting `download-rustc = true`.
## Implementation
Instead of extracting the CI artifacts directly to `stage0/`, extract
them to `ci-rustc/` instead. Continue to copy them to the proper
sysroots as necessary for all stages except stage 0.
This also requires `bootstrap.py` to download both stage0 and CI
artifacts and distinguish between the two when checking stamp files.
Note that since tools have to be built by the same compiler that built
`rustc-dev` and the standard library, the downloaded artifacts can't be
reused when building with the beta compiler. To make sure this is still
a good user experience, warn when building with the beta compiler, and
default to building with stage 2.
I tested this by rebasing this PR from edeee915b1 over 1c77a1fa3c and confirming that only the bootstrap library itself had to be rebuilt, not any dependencies and not `tidy`. I also tested that a clean build with `x.py build` builds rustdoc exactly once and does no other work, and that `touch src/librustdoc/lib.rs && x.py build` works. `x.py check` still behaves as before (checks using the beta compiler, even if there are changes to `compiler/`).
Helps with https://github.com/rust-lang/rust/issues/81930.
r? `@Mark-Simulacrum`
Use FromStr trait for number option parsing
Replace `parse_uint` with generic `parse_number` based on `FromStr`.
Use it for parsing inlining threshold to avoid casting later.
Let `#[allow(unstable_name_collisions)]` work for things other than function
Fixes#81522
In addition to the report in #81522, currently `#[allow(unstable_name_collisions)]` doesn't suppress the corresponding diagnostics even if this attribute is appended to an expression statement or a let statement. It seems like this is because the wrong `HirId` is passed to `struct_span_lint_hir`.
It's fixed in this PR, and a regression test for it is also added.
Use `#[inline(always)]` on trivial UnsafeCell methods
UnsafeCell is the standard building block for shared mutable data
structures. UnsafeCell should add zero overhead compared to using raw
pointers directly.
Some reports suggest that debug builds, or even builds at opt-level 1,
may not always be inlining its methods. Mark the methods as
`#[inline(always)]`, since once inlined the methods should result in no
actual code other than field accesses.
This prevents us from warning on links such as `<hello@example.com>`.
Note that we still warn on links such as `<hello@localhost>` because
they have no dots in them. However, the links will still work, even
though a warning is reported.
Bump bootstrap to 1.52 beta
This includes the standard bump, but also a workaround for new cargo behavior around clearing out the doc directory when the rustdoc version changes.
Allow clobbering unsupported registers in asm!
Previously registers could only be marked as clobbered if the target feature for that register was enabled. This restriction is now removed.
cc #81092
r? ``@nagisa``
Tests: Remove redundant `ignore-tidy-linelength` annotations
This is step 2 towards fixing #77548.
In the codegen and codegen-units test suites, the `//` comment markers
were kept in order not to affect any source locations. This is because
these tests cannot be automatically `--bless`ed.
core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.
SPIR-V primarily supports what it calls the "Logical addressing model" (and AFAIK for graphical shaders it's the only option), and what that implies is that there is no "memory" to uniformly address at some byte/word level, and that you can't really talk about values having a "raw representation" in terms of sequences of bytes. Therefore, the "block"-wise swapping optimization employed by `ptr::swap_nonoverlapping_one` (where a "block" is 32 bytes, currently), is fundamentally incompatible with SPIR-V "memory".
As such, [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/)'s `rustc_codegen_spirv` backend cannot currently allow the use of `ptr::swap_nonoverlapping_one` - but that comes at a great price, since it's the building block of `mem::{swap,replace}`, and those in turn are used by e.g. `Option::take` and `Range`'s `Iterator` implementation (the latter blocking the use of `for i in 0..n` loops).
There's 4 options I can see in terms of supporting `ptr::swap_nonoverlapping_one` in `rustc_codegen_spirv`:
* legalize the block-wise swap loop back into swapping whole values, for SPIR-V
* this is made borderline impossible by the fact that the size of the state "on the stack" is a block, and has to be expanded back to the appropriate size of the value being swapped, so in practice this would have to effectively pattern-match on the exact shape of the block-wise swapping algorithm, as a roundabout way of "patching `core::ptr` on the fly"
* (**this PR**) disable the block-wise swap optimization altogether when `#[cfg(target_arch = "spirv")`
* I've tested it and it does in fact allow compiling `for i in 0..n` loops, which was my primary motivation
* main downside IMO is the fact that `core` now acknowledges an out-of-tree backend
* as a counterpoint, any attempt to compile Rust to SPIR-V would run into this problem, one way or another
* only enable the block-wise swap optimization on targets where it's been empirically proven to be an improvement
* would avoid any surprises in terms of potentially-broken/inefficient codegen, in general
* however, it may be universally applicable (thanks to caches), even if the optimal block size could differ
* move low-level swapping into an intrinsic, where the backend can choose any optimization approach it wants
* this also has an impact on MIR optimizations (cc ``@rust-lang/wg-mir-opt)`` - which currently cannot hope to make sense of e.g. `Option::take` despite it being effectively `_0 = *_1;` `*_1 = None;` `return;`
* long-term this is my preferred approach, and I can start working on it if that's desired, but I wanted to confirm that this swapping optimization is the final blocker for [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/) supporting e.g. range `for` loops
r? ``@nagisa`` cc ``@rust-lang/libs``
resolve/expand: Cache intermediate results of `#[derive]` expansion
Expansion function for `#[derive]` (`rustc_builtin_macros::derive::Expander::expand`) may return an indeterminate result, and therefore can be called multiple times.
Previously we parsed the `#[derive(Foo, Bar)]`'s input and tried to resolve `Foo` and `Bar` on every such call.
Now we maintain a cache `Resolver::derive_data` and take all the necessary data from it if it was computed previously.
So `Foo, Bar` is now parsed at most once, and `Foo` and `Bar` are successfully resolved at most once.
## Motivation
This avoids having to rebuild bootstrap and tidy each time you rebase
over master. In particular, it makes rebasing and running `x.py fmt` on
each commit in a branch significantly faster. It also avoids having to
rebuild bootstrap after setting `download-rustc = true`.
## Implementation
Instead of extracting the CI artifacts directly to `stage0/`, extract
them to `ci-rustc/` instead. Continue to copy them to the proper
sysroots as necessary for all stages except stage 0.
This also requires `bootstrap.py` to download both stage0 and CI
artifacts and distinguish between the two when checking stamp files.
Note that since tools have to be built by the same compiler that built
`rustc-dev` and the standard library, the downloaded artifacts can't be
reused when building with the beta compiler. To make sure this is still
a good user experience, warn when building with the beta compiler, and
default to building with stage 2.
1.52 Cargo adds rust-lang/cargo#8640 which means that cargo will try to purge
the doc directory caches for us. In theory this may mean that we can jettison
the clear_if_dirty for rustdoc versioning entirely, but for now just workaround
the effects of this change in a less principled but more local way.
UnsafeCell is the standard building block for shared mutable data
structures. UnsafeCell should add zero overhead compared to using raw
pointers directly.
Some reports suggest that debug builds, or even builds at opt-level 1,
may not always be inlining its methods. Mark the methods as
`#[inline(always)]`, since once inlined the methods should result in no
actual code other than field accesses.
Rollup of 8 pull requests
Successful merges:
- #73945 (Add an unstable --json=unused-externs flag to print unused externs)
- #81619 (Implement `SourceIterator` and `InPlaceIterable` for `ResultShunt`)
- #82726 (BTree: move blocks around in node.rs)
- #83521 (2229: Fix diagnostic issue when using FakeReads in closures)
- #83532 (Fix compiletest on FreeBSD)
- #83793 (rustdoc: highlight macros more efficiently)
- #83809 (Remove unneeded INITIAL_IDS const)
- #83827 (cleanup leak after test to make miri happy)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Remove unneeded INITIAL_IDS const
Some IDs inside this map didn't exist anymore, some others were duplicates of what we have inside `IdMap`. So instead of keeping the two around and since `INITIAL_IDS` was only used by `IdMap`, no need to keep both of them.
rustdoc: highlight macros more efficiently
Instead of producing `<span class=macro>assert_eq</span><span class=macro>!</span>`,
just produce `<span class=macro>assert_eq!</span>`.
2229: Fix diagnostic issue when using FakeReads in closures
This PR fixes a diagnostic issue caused by https://github.com/rust-lang/rust/pull/82536. A temporary work around was used in this merged PR which involved feature gating the addition of FakeReads introduced as a result of pattern matching in closures.
The fix involves adding an optional closure DefId to ForLet and ForMatchedPlace FakeReadCauses. This DefId will only be added if a closure pattern matches a Place starting with an Upvar.
r? ```@nikomatsakis```
BTree: move blocks around in node.rs
Without changing any names or implementation, reorder some members:
- Move down the ones defined long ago on the demised `struct Root`, to below the definition of their current host `struct NodeRef`.
- Move up some defined on `struct NodeRef` that are interspersed with those defined on `struct Handle`.
- Move up the `correct_…` methods squeezed between the two flavours of `push`.
- Move the unchecked static downcasts (`cast_to_…`) after the upcasts (`forget_`) and the (weirdly named) dynamic downcasts (`force`).
r? ````@Mark-Simulacrum````
Add an unstable --json=unused-externs flag to print unused externs
This adds an unstable flag to print a list of the extern names not used by cargo.
This PR will enable cargo to collect unused dependencies from all units and provide warnings.
The companion PR to cargo is: https://github.com/rust-lang/cargo/pull/8437
The goal is eventual stabilization of this flag in rustc as well as in cargo.
Discussion of this feature is mostly contained inside these threads: #57274#72342#72603
The feature builds upon the internal datastructures added by #72342
Externs are uniquely identified by name and the information is sufficient for cargo.
If the mode is enabled, rustc will print json messages like:
```
{"unused_extern_names":["byteorder","openssl","webpki"]}
```
For a crate that got passed byteorder, openssl and webpki dependencies but needed none of them.
### Q: Why not pass -Wunused-crate-dependencies?
A: See [ehuss's comment here](https://github.com/rust-lang/rust/issues/57274#issuecomment-624839355)
TLDR: it's cleaner. Rust's warning system wasn't built to be filtered or edited by cargo.
Even a basic implementation of the feature would have to change the "n warnings emitted" line that rustc prints at the end.
Cargo ideally wants to synthesize its own warnings anyways. For example, it would be hard for rustc to emit warnings like
"dependency foo is only used by dev targets", suggesting to make it a dev-dependency instead.
### Q: Make rustc emit used or unused externs?
A: Emitting used externs has the advantage that it simplifies cargo's collection job.
However, emitting unused externs creates less data to be communicated between rustc and cargo.
Often you want to paste a cargo command obtained from `cargo build -vv` for doing something
completely unrelated. The message is emitted always, even if no warning or error is emitted.
At that point, even this tiny difference in "noise" matters. That's why I went with emitting unused externs.
### Q: One json msg per extern or a collective json msg?
A: Same as above, the data format should be concise. Having 30 lines for the 30 crates a crate uses would be disturbing to readers.
Also it helps the cargo implementation to know that there aren't more unused deps coming.
### Q: Why use names of externs instead of e.g. paths?
A: Names are both sufficient as well as neccessary to uniquely identify a passed `--extern` arg.
Names are sufficient because you *must* pass a name when passing an `--extern` arg.
Passing a path is optional on the other hand so rustc might also figure out a crate's location from the file system.
You can also put multiple paths for the same extern name, via e.g. `--extern hello=/usr/lib/hello.rmeta --extern hello=/usr/local/lib/hello.rmeta`,
but rustc will only ever use one of those paths.
Also, paths don't identify a dependency uniquely as it is possible to have multiple different extern names point to the same path.
So paths are ill-suited for identification.
### Q: What about 2015 edition crates?
A: They are fully supported.
Even on the 2015 edition, an explicit `--extern` flag is is required to enable `extern crate foo;` to work (outside of sysroot crates, which this flag doesn't warn about anyways).
So the lint would still fire on 2015 edition crates if you haven't included a dependency specified in Cargo.toml using `extern crate foo;` or similar.
The lint won't fire if your sole use in the crate is through a `extern crate foo;` statement, but that's not its job.
For detecting unused `extern crate foo` statements, there is the `unused_extern_crates` lint
which can be enabled by `#![warn(unused_extern_crates)]` or similar.
cc ```@jsgf``` ```@ehuss``` ```@petrochenkov``` ```@estebank```
Fix error codes check run and ensure it will not go unnoticed again
Fixes#83268.
The error codes explanations were not checked anymore. I fixed this issue and also added variables to ensure that this won't happen again (at least not silently).