Generalized operand.rs#nontemporal_store and fixed tidy issues
Generalized operand.rs#nontemporal_store's implementation even more
With a BuilderMethod trait implemented by Builder for LLVM
Cleaned builder.rs : no more code duplication, no more ValueTrait
Full traitification of builder.rs
Generalized FunctionCx
Added ValueTrait and first change
Generalize CodegenCx
Generalized the Builder struct defined in librustc_codegen_llvm/builder.rs
DynamicLibrary uses libc's dlsym() function internally to find symbols.
Some implementations of dlsym(), like musl's, only look at dynamically-
exported symbols, as found in shared libraries. To also export symbols
from the main executable, we would need to pass --export-dynamic to the
linker. Since this flag isn't available everywhere, ignore the test for
now.
The requirements here are not "ELFv1" requirements, but big-endian
requirements, as the extension or non-extension of the argument is
necessary to put the argument in the correct half of the register.
Parameter passing in the ELFv2 ABI needs these same transformations.
Since this code makes no difference on little-endian machines, simplify
it to use the same code path everywhere.
Rollup of 17 pull requests
Successful merges:
- #55182 (Redox: Update to new changes)
- #55211 (Add BufWriter::buffer method)
- #55507 (Add link to std::mem::size_of to size_of intrinsic documentation)
- #55530 (Speed up String::from_utf16)
- #55556 (Use `Mmap` to open the rmeta file.)
- #55622 (NetBSD: link libstd with librt in addition to libpthread)
- #55750 (Make `NodeId` and `HirLocalId` `newtype_index`)
- #55778 (Wrap some query results in `Lrc`.)
- #55781 (More precise spans for temps and their drops)
- #55785 (Add mem::forget_unsized() for forgetting unsized values)
- #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint)
- #55865 (Unix RwLock: avoid racy access to write_locked)
- #55901 (fix various typos in doc comments)
- #55926 (Change sidebar selector to fix compatibility with docs.rs)
- #55930 (A handful of hir tweaks)
- #55932 (core/char: Speed up `to_digit()` for `radix <= 10`)
- #55956 (add tests for some fixed ICEs)
Failed merges:
r? @ghost
core/char: Speed up `to_digit()` for `radix <= 10`
I noticed that `char::to_digit()` seemed to do a bit of extra work for handling `[a-zA-Z]` characters. Since `to_digit(10)` seems to be the most common case (at least in the `rust` codebase) I thought it might be valuable to create a fast path for that case, and according to the benchmarks that I added in one of the commits it seems to pay off. I also created another fast path for the `radix < 10` case, which also seems to have a positive effect.
It is quite possible that I'm measuring something entirely unrelated though, so please verify these numbers and let me know if I missed something!
### Before
```
# Run 1
test char::methods::bench_to_digit_radix_10 ... bench: 16,265 ns/iter (+/- 1,774)
test char::methods::bench_to_digit_radix_16 ... bench: 13,938 ns/iter (+/- 2,479)
test char::methods::bench_to_digit_radix_2 ... bench: 13,090 ns/iter (+/- 524)
test char::methods::bench_to_digit_radix_36 ... bench: 14,236 ns/iter (+/- 1,949)
# Run 2
test char::methods::bench_to_digit_radix_10 ... bench: 16,176 ns/iter (+/- 1,589)
test char::methods::bench_to_digit_radix_16 ... bench: 13,896 ns/iter (+/- 3,140)
test char::methods::bench_to_digit_radix_2 ... bench: 13,158 ns/iter (+/- 1,112)
test char::methods::bench_to_digit_radix_36 ... bench: 14,206 ns/iter (+/- 1,312)
# Run 3
test char::methods::bench_to_digit_radix_10 ... bench: 16,221 ns/iter (+/- 2,423)
test char::methods::bench_to_digit_radix_16 ... bench: 14,361 ns/iter (+/- 3,926)
test char::methods::bench_to_digit_radix_2 ... bench: 13,097 ns/iter (+/- 671)
test char::methods::bench_to_digit_radix_36 ... bench: 14,388 ns/iter (+/- 1,068)
```
### After
```
# Run 1
test char::methods::bench_to_digit_radix_10 ... bench: 11,521 ns/iter (+/- 552)
test char::methods::bench_to_digit_radix_16 ... bench: 12,926 ns/iter (+/- 684)
test char::methods::bench_to_digit_radix_2 ... bench: 11,266 ns/iter (+/- 1,085)
test char::methods::bench_to_digit_radix_36 ... bench: 14,213 ns/iter (+/- 614)
# Run 2
test char::methods::bench_to_digit_radix_10 ... bench: 11,424 ns/iter (+/- 1,042)
test char::methods::bench_to_digit_radix_16 ... bench: 12,854 ns/iter (+/- 1,193)
test char::methods::bench_to_digit_radix_2 ... bench: 11,193 ns/iter (+/- 716)
test char::methods::bench_to_digit_radix_36 ... bench: 14,249 ns/iter (+/- 3,514)
# Run 3
test char::methods::bench_to_digit_radix_10 ... bench: 11,469 ns/iter (+/- 685)
test char::methods::bench_to_digit_radix_16 ... bench: 12,852 ns/iter (+/- 568)
test char::methods::bench_to_digit_radix_2 ... bench: 11,275 ns/iter (+/- 1,356)
test char::methods::bench_to_digit_radix_36 ... bench: 14,188 ns/iter (+/- 1,501)
```
I ran the benchmark using:
```sh
python x.py bench src/libcore --stage 1 --keep-stage 0 --test-args "bench_to_digit"
```
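As a rough illustration of the fast-path idea described above, here is a minimal sketch. It is not the actual libcore implementation: the function name `to_digit_sketch` is hypothetical, and the two fast paths mentioned in the description (`radix == 10` and `radix < 10`) are merged into a single `radix <= 10` branch for brevity.

```rust
/// Hypothetical stand-alone digit conversion with a fast path.
/// This is an illustrative sketch, not the code from the PR.
fn to_digit_sketch(c: char, radix: u32) -> Option<u32> {
    assert!(radix <= 36, "radix is too high (maximum 36)");

    if radix <= 10 {
        // Fast path: only '0'..='9' can ever be valid here, so the
        // letter handling is skipped entirely.
        return match c {
            '0'..='9' => {
                let digit = c as u32 - '0' as u32;
                if digit < radix { Some(digit) } else { None }
            }
            _ => None,
        };
    }

    // General path: letters are valid digits for radix > 10.
    let digit = match c {
        '0'..='9' => c as u32 - '0' as u32,
        'a'..='z' => c as u32 - 'a' as u32 + 10,
        'A'..='Z' => c as u32 - 'A' as u32 + 10,
        _ => return None,
    };
    if digit < radix { Some(digit) } else { None }
}
```

The sketch is meant to agree with `char::to_digit` for every radix from 2 to 36; whether a branch like this is actually faster depends on the compiler and the target, which is what the benchmarks above try to measure.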
A handful of hir tweaks
- remove an unused `hir_vec` macro pattern
- simplify `fmt::Debug` for `hir::Path` (take advantage of the `Display` implementation)
- remove an unused type alias (`CrateConfig`)
- simplify a `match` expression (join common patterns)
Add mem::forget_unsized() for forgetting unsized values
~~Allows passing values of `T: ?Sized` types to `mem::drop` and `mem::forget`.~~
Adds `mem::forget_unsized()` that accepts `T: ?Sized`.
I had to revert the PR that removed the `forget` intrinsic and replaced it with `ManuallyDrop`: https://github.com/rust-lang/rust/pull/40559
We can't use `ManuallyDrop::new()` here because it needs `T: Sized` and we don't have support for unsized return values yet (will we ever?).
r? @eddyb
More precise spans for temps and their drops
This PR has two main enhancements:
1. when possible during code generation for a statement (like `expr();`), pass along the span of a statement, and then attribute the drops of temporaries from that statement to the statement's end-point (which will be the semicolon if it is a statement that is terminated by a semicolon).
2. when evaluating a block expression into a MIR temp, use the span of the block's tail expression (rather than the span of whole block including its statements and curly-braces) for the span of the temp.
Each of these individually increases the precision of our diagnostic output; together they combine to make a much clearer picture about the control flow through the spans.
Fix #54382
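To make the first point concrete, here is a small sketch of the language behavior the spans are meant to reflect: a temporary created in an expression statement is dropped at the end of that statement, that is, at the semicolon. The `Noisy` type and the `drop_order` function are hypothetical names used only for this illustration and do not appear in the PR.

```rust
use std::cell::RefCell;

/// Hypothetical helper type that records when it is used and dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Noisy<'a> {
    fn touch(&self) {
        self.log.borrow_mut().push(format!("use {}", self.name));
    }
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the order of events observed around one expression statement.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());

    // The `Noisy` value is a temporary; it is dropped at the semicolon
    // that ends this statement, before the next statement runs.
    Noisy { name: "temp", log: &log }.touch();

    log.borrow_mut().push("next statement".to_string());
    log.into_inner()
}
```

Calling `drop_order()` records the use of the temporary, then its drop, and only then the next statement, which is why the semicolon is a natural place to attribute the drop in diagnostics.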
NetBSD: link libstd with librt in addition to libpthread
Some aio(3) and mq(3) functions in the libc crate actually come from NetBSD librt, not libc or libpthread.
Use `Mmap` to open the rmeta file.
Those files are quite large and contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.
r? @eddyb
Speed up String::from_utf16
Collecting into a `Result` is idiomatic, but not necessarily fast due to rustc not being able to preallocate for the resulting collection. This is fine in case of an error, but IMO we should optimize for the common case, i.e. a successful conversion.
This changes the behavior of `String::from_utf16` from collecting into a `Result` to pushing to a preallocated `String` in a loop.
According to [my simple benchmark](https://gist.github.com/ljedrz/953a3fb74058806519bd4d640d6f65ae) this change makes `String::from_utf16` around **twice** as fast.
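The two approaches can be sketched side by side as follows. This is a minimal illustration built on `std::char::decode_utf16`, not the code from the PR: the function names are hypothetical, and using `v.len()` as the initial capacity is an assumption (the resulting UTF-8 string may need more bytes than that, in which case the `String` grows as usual).

```rust
use std::char::{decode_utf16, DecodeUtf16Error};

/// Collects decoded chars into a `Result<String, _>`.
/// Idiomatic, but the `String` cannot be preallocated this way.
fn from_utf16_collect(v: &[u16]) -> Result<String, DecodeUtf16Error> {
    decode_utf16(v.iter().cloned()).collect()
}

/// Pushes decoded chars into a preallocated `String` in a loop,
/// returning early on the first invalid code unit.
fn from_utf16_prealloc(v: &[u16]) -> Result<String, DecodeUtf16Error> {
    // Assumed initial capacity: one byte per UTF-16 code unit.
    let mut ret = String::with_capacity(v.len());
    for c in decode_utf16(v.iter().cloned()) {
        ret.push(c?);
    }
    Ok(ret)
}
```

Both functions return the same result for valid and invalid input; the difference is only in how the output buffer is allocated.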
Add link to std::mem::size_of to size_of intrinsic documentation
The other intrinsics with safe/stable alternatives already have documentation to this effect.
Redox: Update to new changes
These are all cherry-picked from our fork:
- Remove the `env:` scheme
- Update `execve` system call to `fexec`
- Interpret shebangs: these are no longer handled by the kernel, which, as usual, tries to be as minimal as possible
Reattach all grandchildren when constructing specialization graph.
Specialization graphs are constructed by incrementally adding impls in the order of declaration. If the impl being added has its specializations in the graph already, they should be reattached under the impl. However, the current implementation only reattaches the one found first. For example, consider the following specialization graph:
```
Tr1
|
I3
/ \
I1 I2
```
If `I1`, `I2`, and `I3` are declared in this order, the compiler mistakenly constructs the following graph:
```
Tr1
/ \
I3 I2
|
I1
```
This patch fixes the reattach procedure to include all specializing grandchildren-to-be.
Fixes #50452.
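The reattach step can be sketched with a toy model. This is not the compiler's data structure: `Graph`, `insert_under`, `children_of`, `build_example`, and the `specializes` predicate are all hypothetical, and the caller names the parent directly instead of the graph being searched for it.

```rust
use std::collections::BTreeMap;

/// Toy specialization graph: maps each node to its direct children.
struct Graph {
    children: BTreeMap<&'static str, Vec<&'static str>>,
}

impl Graph {
    fn new() -> Self {
        Graph { children: BTreeMap::new() }
    }

    /// Inserts `new_impl` under `parent`, reattaching ALL existing
    /// children of `parent` that specialize `new_impl`, not just the
    /// first one found.
    fn insert_under(
        &mut self,
        parent: &'static str,
        new_impl: &'static str,
        specializes: &dyn Fn(&str, &str) -> bool,
    ) {
        let siblings = self.children.remove(parent).unwrap_or_default();
        let (grandchildren, mut remaining): (Vec<_>, Vec<_>) = siblings
            .into_iter()
            .partition(|&child| specializes(child, new_impl));
        remaining.push(new_impl);
        self.children.insert(parent, remaining);
        self.children.insert(new_impl, grandchildren);
    }

    fn children_of(&self, node: &str) -> &[&'static str] {
        self.children.get(node).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

/// Declares `I1`, `I2`, `I3` in that order, where `I1` and `I2`
/// both specialize `I3` (the scenario described above).
fn build_example() -> Graph {
    let specializes = |a: &str, b: &str| b == "I3" && (a == "I1" || a == "I2");
    let mut graph = Graph::new();
    graph.insert_under("Tr1", "I1", &specializes);
    graph.insert_under("Tr1", "I2", &specializes);
    graph.insert_under("Tr1", "I3", &specializes);
    graph
}
```

In this model, `build_example()` ends with `I3` as the only child of `Tr1` and both `I1` and `I2` under `I3`, matching the first graph above rather than the mistaken one.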
`concurrent_recv_timeout_and_upgrade` reproduces the problem 100% of
the time on my MacBook with the command:
```
./x.py test --stage 0 ./src/test/run-pass/mpsc_stress.rs
```
Thus it is commented out.
The other test cases were useful for catching other failures
which may arise during the fix.
This diff is a part of my previous rewrite attempt: #42883
CC #39364