codegen: use new {re,de,}allocator annotations in llvm
This obviates the patch that teaches LLVM internals about
_rust_{re,de}alloc functions by putting annotations directly in the IR
for the optimizer.
The sole test change is required to anchor FileCheck to the body of the
`box_uninitialized` method, so it doesn't see the `allocalign` on
`__rust_alloc` and get mad about the string `alloca` showing up. Since I
was there anyway, I added some checks on the attributes to prove the
right attributes got set.
r? `@nikic`
```
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
The znver2 default impl seems to be slow due to inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deeper call tree.
This obviates the patch that teaches LLVM internals about
_rust_{re,de}alloc functions by putting annotations directly in the IR
for the optimizer.
The sole test change is required to anchor FileCheck to the body of the
`box_uninitialized` method, so it doesn't see the `allocalign` on
`__rust_alloc` and get mad about the string `alloca` showing up. Since I
was there anyway, I added some checks on the attributes to prove the
right attributes got set.
While we're here, we also emit allocator attributes on
__rust_alloc_zeroed. This should allow LLVM to perform more
optimizations for zeroed blocks, and probably fixes#90032. [This
comment](https://github.com/rust-lang/rust/issues/24194#issuecomment-308791157)
mentions "weird UB-like behaviour with bitvec iterators in
rustc_data_structures" so we may need to back this change out if things
go wrong.
The new test cases require LLVM 15, so we copy them into LLVM
14-supporting versions, which we can delete when we drop LLVM 14.
This allows using most delay loaded functions before the init code initializes them. It also only preloads a select few functions, rather than all functions.
Co-Authored-By: Mark Rousskov <mark.simulacrum@gmail.com>
interpret, ptr_offset_from: refactor and test too-far-apart check
We didn't have any tests for the "too far apart" message, and indeed that check mostly relied on the in-bounds check and was otherwise probably not entirely correct... so I rewrote that check, and it is before the in-bounds check so we can test it separately.
Expose size_hint() for TokenStream's iterator
The iterator for `proc_macro::TokenStream` is a wrapper around a `Vec` iterator:
babff2211e/library/proc_macro/src/lib.rs (L363-L371)
so it can cheaply provide a perfectly precise size hint, with just a pointer subtraction:
babff2211e/library/alloc/src/vec/into_iter.rs (L170-L177)
I need the size hint in syn (https://github.com/dtolnay/syn/blob/1.0.98/src/buffer.rs) to reduce allocations when converting TokenStream into syn's internal TokenBuffer representation.
Aside from `size_hint`, the other non-default methods in `std::vec::IntoIter`'s `Iterator` impl are `advance_by`, `count`, and `__iterator_get_unchecked`. I've included `count` in this PR since it is trivial. I did not include `__iterator_get_unchecked` because it is spoopy and I did not feel like dealing with that. Lastly, I did not include `advance_by` because that requires `feature(iter_advance_by)` (https://github.com/rust-lang/rust/issues/77404) and I noticed this comment at the top of libproc_macro:
babff2211e/library/proc_macro/src/lib.rs (L20-L22)
correct the output of a `capacity` method example
The output of this example in std::alloc is different from which shown in the comment. I have tested it on both Linux and Windows.
Constify a few `(Partial)Ord` impls
Only a few `impl`s are constified for now, as #92257 has not landed in the bootstrap compiler yet and quite a few impls would need that fix.
This unblocks #92228, which unblocks marking iterator methods as `default_method_body_is_const`.
add miri-track-caller to more intrinsic-exposing methods
Follow-up to https://github.com/rust-lang/rust/pull/98674: I went through the Miri test suite to find more functions that would benefit from Miri backtrace pruning, and this is what I found.
Basically anything that just exposes a potentially-UB intrinsic to the user should get this treatment.
kmc-solid: Use `libc::abort` to abort a program
This PR updates the target-specific abort subroutine for the [`*-kmc-solid_*`](https://doc.rust-lang.org/nightly/rustc/platform-support/kmc-solid.html) Tier 3 targets.
The current implementation uses a `hlt` instruction, which is the most direct way to notify a connected debugger but is not the most flexible way. This PR changes it to call the `abort` libc function, making it possible for a system designer to override its behavior as they see fit.
Support vec zero-alloc optimization for tuples and byte arrays
* Implement IsZero trait for tuples up to 8 IsZero elements;
* Implement IsZero for u8/i8, leading to implementation of it for arrays of them too;
* Add more codegen tests for this optimization.
* Lower size of array for IsZero trait because it fails to inline checks
* Implement IsZero trait for tuples up to 8 IsZero elements;
* Implement IsZero for u8/i8, leading to implementation of it for arrays of them too;
* Add more codegen tests for this optimization.
* Lower size of array for IsZero trait because it fails to inline checks
The implementation of BufReader contains a lot of redundant checks.
While any one of these checks is not particularly expensive to execute,
especially when taken together they dramatically inhibit LLVM's ability
to make subsequent optimizations.
miri: prune some atomic operation and raw pointer details from stacktrace
Since Miri removes `track_caller` frames from the stacktrace, adding that attribute can help make backtraces more readable (similar to how it makes panic locations better). I made them only show up with `cfg(miri)` to make sure the extra arguments induced by `track_caller` do not cause any runtime performance trouble.
This is also testing the waters for whether the libs team is okay with having these attributes in their code, or whether you'd prefer if we find some other way to do this. If you are fine with this, we will probably want to add it to a lot more functions (all the other atomic operations, to start).
Before:
```
error: Undefined Behavior: Data race detected between Atomic Load on Thread(id = 2) and Write on Thread(id = 1) at alloc1727 (current vector clock = VClock([9, 0, 6]), conflicting timestamp = VClock([0, 6]))
--> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/sync/atomic.rs:2594:23
|
2594 | SeqCst => intrinsics::atomic_load_seqcst(dst),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between Atomic Load on Thread(id = 2) and Write on Thread(id = 1) at alloc1727 (current vector clock = VClock([9, 0, 6]), conflicting timestamp = VClock([0, 6]))
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `std::sync::atomic::atomic_load::<usize>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/sync/atomic.rs:2594:23
= note: inside `std::sync::atomic::AtomicUsize::load` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/sync/atomic.rs:1719:26
note: inside closure at ../miri/tests/fail/data_race/atomic_read_na_write_race1.rs:22:13
--> ../miri/tests/fail/data_race/atomic_read_na_write_race1.rs:22:13
|
22 | (&*c.0).load(Ordering::SeqCst)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
After:
```
error: Undefined Behavior: Data race detected between Atomic Load on Thread(id = 2) and Write on Thread(id = 1) at alloc1727 (current vector clock = VClock([9, 0, 6]), conflicting timestamp = VClock([0, 6]))
--> tests/fail/data_race/atomic_read_na_write_race1.rs:22:13
|
22 | (&*c.0).load(Ordering::SeqCst)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data race detected between Atomic Load on Thread(id = 2) and Write on Thread(id = 1) at alloc1727 (current vector clock = VClock([9, 0, 6]), conflicting timestamp = VClock([0, 6]))
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside closure at tests/fail/data_race/atomic_read_na_write_race1.rs:22:13
```
Lock stdout once when listing tests
This is a marginal optimization for normal operation, but for `cargo miri nextest list` (which is invoked by `cargo miri nextest run`) this knocks the startup time on `regex` down from 87 seconds to 17 seconds. Still slow, but a nice 5x improvement.
Add cgroupv1 support to available_parallelism
Fixes#97549
My dev machine uses cgroup v2 so I was only able to test that code path. So the v1 code path is written only based on documentation. I could use some help testing that it works on a machine with cgroups v1:
```
$ x.py build --stage 1
# quota.rs
fn main() {
println!("{:?}", std:🧵:available_parallelism());
}
# assuming stage1 is linked in rustup
$ rust +stage1 quota.rs
# spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS
# should print Ok(3)
$ ./quota
```
If it doesn't work as expected an strace, the contents of `/proc/self/cgroups` and the structure of `/sys/fs/cgroups` would help.
Add `[f32]::sort_floats` and `[f64]::sort_floats`
It's inconvenient to sort a slice or Vec of floats, compared to sorting integers. To simplify numeric code, add a convenience method to `[f32]` and `[f64]` to sort them using `sort_unstable_by` with `total_cmp`.
Rename `<*{mut,const} T>::as_{const,mut}` to `cast_`
This renames the methods to use the `cast_` prefix instead of `as_` to
make it more readable and avoid confusion with `<*mut T>::as_mut()`
which is `unsafe` and returns a reference.
Sorry, didn't notice ACP process exists, opened https://github.com/rust-lang/libs-team/issues/51
See #92675
make vtable pointers entirely opaque
This implements the scheme discussed in https://github.com/rust-lang/unsafe-code-guidelines/issues/338: vtable pointers should be considered entirely opaque and not even readable by Rust code, similar to function pointers.
- We have a new kind of `GlobalAlloc` that symbolically refers to a vtable.
- Miri uses that kind of allocation when generating a vtable.
- The codegen backends, upon encountering such an allocation, call `vtable_allocation` to obtain an actually dataful allocation for this vtable.
- We need new intrinsics to obtain the size and align from a vtable (for some `ptr::metadata` APIs), since direct accesses are UB now.
I had to touch quite a bit of code that I am not very familiar with, so some of this might not make much sense...
r? `@oli-obk`
Fix the stable version of `AsFd for Arc<T>` and `Box<T>`
These merged in #97437 for 1.64.0, apart from the main `io_safety`
feature that stabilized in 1.63.0.
std: use futex-based locks on Fuchsia
This switches `Condvar` and `RwLock` to the futex-based implementation currently used on Linux and some BSDs. Additionally, `Mutex` now has its own, priority-inheriting implementation based on the mutex in Fuchsia's `libsync`. It differs from the original in that it panics instead of aborting when reentrant locking is detected.
````@rustbot```` ping fuchsia
r? ````@m-ou-se````
This renames the methods to use the `cast_` prefix instead of `as_` to
make it more readable and avoid confusion with `<*mut T>::as_mut()`
which is `unsafe` and returns a reference.
See #92675
stdlib support for Apple WatchOS
This is a follow-up to https://github.com/rust-lang/rust/pull/95243 (Add Apple WatchOS compiler targets) that adds stdlib support for Apple WatchOS.
`@deg4uss3r`
`@nagisa`
Windows: Use `FindFirstFileW` for getting the metadata of locked system files
Fixes#96980
Usually opening a file handle with access set to metadata only will always succeed, even if the file is locked. However some special system files, such as `C:\hiberfil.sys`, are locked by the system in a way that denies even that. So as a fallback we try reading the cached metadata from the directory.
Note that the test is a bit iffy. I don't know if `hiberfil.sys` actually exists in the CI.
r? rust-lang/libs
Improve the function pointer docs
This is #97842 but for function pointers instead of tuples. The concept is basically the same.
* Reduce duplicate impls; show `fn (T₁, T₂, …, Tₙ)` and include a sentence saying that there exists up to twelve of them.
* Show `Copy` and `Clone`.
* Show auto traits like `Send` and `Sync`, and blanket impls like `Any`.
https://notriddle.com/notriddle-rustdoc-test/std/primitive.fn.html
* Reduce duplicate impls; show only the `fn (T)` and include a sentence
saying that there exists up to twelve of them.
* Show `Copy` and `Clone`.
* Show auto traits like `Send` and `Sync`, and blanket impls like `Any`.
core::any: replace some generic types with impl Trait
This gives a cleaner API since the caller only specifies the concrete type they usually want to.
r? `@yaahc`
Fix `Skip::next` for non-fused inner iterators
`iter.skip(n).next()` will currently call `nth` and `next` in succession on `iter`, without checking whether `nth` exhausts the iterator. Using `?` to propagate a `None` value returned by `nth` avoids this.
Use split_once in FromStr docs
Current implementation:
```rust
fn from_str(s: &str) -> Result<Self, Self::Err> {
let coords: Vec<&str> = s.trim_matches(|p| p == '(' || p == ')' )
.split(',')
.collect();
let x_fromstr = coords[0].parse::<i32>()?;
let y_fromstr = coords[1].parse::<i32>()?;
Ok(Point { x: x_fromstr, y: y_fromstr })
}
```
Creating the vector is not necessary, `split_once` does the job better.
Alternatively we could also remove `trim_matches` with `strip_prefix` and `strip_suffix`:
```rust
let (x, y) = s
.strip_prefix('(')
.and_then(|s| s.strip_suffix(')'))
.and_then(|s| s.split_once(','))
.unwrap();
```
The question is how much 'correctness' is too much and distracts from the example. In a real implementation you would also not unwrap (or originally access the vector without bounds checks), but implementing a custom Error and adding a `From<ParseIntError>` and implementing the `Error` trait adds a lot of code to the example which is not relevant to the `FromStr` trait.
proc_macro/bridge: stop using a remote object handle for proc_macro Ident and Literal
This is the fourth part of https://github.com/rust-lang/rust/pull/86822, split off as requested in https://github.com/rust-lang/rust/pull/86822#pullrequestreview-1008655452. This patch transforms the `Ident` and `Group` types into structs serialized over IPC rather than handles.
Symbol values are interned on both the client and server when deserializing, to avoid unnecessary string copies and keep the size of `TokenTree` down. To do the interning efficiently on the client, the proc-macro crate is given a vendored version of the fxhash hasher, as `SipHash` appeared to cause performance issues. This was done rather than depending on `rustc_hash` as it is unfortunately difficult to depend on crates from within `proc_macro` due to it being built at the same time as `std`.
In addition, a custom arena allocator and symbol store was also added, inspired by those in `rustc_arena` and `rustc_span`. To prevent symbol re-use across multiple invocations of a macro on the same thread, a new range of `Symbol` names are used for each invocation of the macro, and symbols from previous invocations are cleaned-up.
In order to keep `Ident` creation efficient, a special ASCII-only case was added to perform ident validation without using RPC for simple identifiers. Full identifier validation couldn't be easily added, as it would require depending on the `rustc_lexer` and `unicode-normalization` crates from within `proc_macro`. Unicode identifiers are validated and normalized using RPC.
See the individual commit messages for more details on trade-offs and design decisions behind these patches.
This method is still only used for Literal::subspan, however the
implementation only depends on the Span component, so it is simpler and
more efficient for now to pass down only the information that is needed.
In the future, if more information about the Literal is required in the
implementation (e.g. to validate that spans line up as expected with
source text), that extra information can be added back with extra
arguments.
This builds on the symbol infrastructure built for `Ident` to replicate
the `LitKind` and `Lit` structures in rustc within the `proc_macro`
client, allowing literals to be fully created and interacted with from
the client thread. Only parsing and subspan operations still require
sync RPC.
Doing this for all unicode identifiers would require a dependency on
`unicode-normalization` and `rustc_lexer`, which is currently not
possible for `proc_macro` due to it being built concurrently with `std`
and `core`. Instead, ASCII identifiers are validated locally, and an RPC
message is used to validate unicode identifiers when needed.
String values are interned on the both the server and client when
deserializing, to avoid unnecessary copies and keep Ident cheap to copy and
move. This appears to be important for performance.
The client-side interner is based roughly on the one from rustc_span, and uses
an arena inspired by rustc_arena.
RPC messages passing symbols always include the full value. This could
potentially be optimized in the future if it is revealed to be a
performance bottleneck.
Despite now having a relevant implementaion of Display for Ident, ToString is
still specialized, as it is a hot-path for this object.
The symbol infrastructure will also be used for literals in the next
part.
Unfortunately, as it is difficult to depend on crates from within proc_macro,
this is done by vendoring a copy of the hasher as a module rather than
depending on the rustc_hash crate.
This probably doesn't have a substantial impact up-front, however will be more
relevant once symbols are interned within the proc_macro client.
add missing null ptr check in alloc example
`alloc` can return null on OOM, if I understood correctly. So we should never just deref a pointer we get from `alloc`.
Add assertion that `transmute_copy`'s U is not larger than T
This is called out as a safety requirement in the docs, but because knowing this can be done at compile time and constant folded (just like the `align_of` branch is removed), we can just panic here.
I've looked at the asm (using `cargo-asm`) of a function that both is correct and incorrect, and the panic is completely removed, or is unconditional, without needing build-std.
I don't expect this to cause much breakage in the wild. I scanned through https://miri.saethlin.dev/ub for issues that would look like this (error: Undefined Behavior: memory access failed: alloc1768 has size 1, so pointer to 8 bytes starting at offset 0 is out-of-bounds), but couldn't find any.
That doesn't rule out it happening in crates tested that fail earlier for some other reason, though, but it indicates that doing this is rare, if it happens at all. A crater run for this would need to be build and test, since this is a runtime thing.
Also added a few more transmute_copy tests.