Add more custom folding to `core::iter` adaptors
Many of the iterator adaptors will perform faster folds if they forward
to their inner iterator's folds, especially for inner types like `Chain`
which are optimized too. The following types are newly specialized:
| Type | `fold` | `rfold` |
| ----------- | ------ | ------- |
| `Enumerate` | ✓ | ✓ |
| `Filter` | ✓ | ✓ |
| `FilterMap` | ✓ | ✓ |
| `FlatMap` | exists | ✓ |
| `Fuse` | ✓ | ✓ |
| `Inspect` | ✓ | ✓ |
| `Peekable` | ✓ | N/A¹ |
| `Skip` | ✓ | N/A² |
| `SkipWhile` | ✓ | N/A¹ |
¹ not a `DoubleEndedIterator`
² `Skip::next_back` doesn't pull skipped items at all, but this couldn't
be avoided if `Skip::rfold` were to call its inner iterator's `rfold`.
Benchmarks
----------
In the following results, plain `_sum` computes the sum of a million
integers -- note that `sum()` is implemented with `fold()`. The
`_ref_sum` variants do the same on a `by_ref()` iterator, which is
limited to calling `next()` one by one, without specialized `fold`.
The `chain` variants perform the same tests on two iterators chained
together, to show a greater benefit of forwarding `fold` internally.
test iter::bench_enumerate_chain_ref_sum ... bench: 2,216,264 ns/iter (+/- 29,228)
test iter::bench_enumerate_chain_sum ... bench: 922,380 ns/iter (+/- 2,676)
test iter::bench_enumerate_ref_sum ... bench: 476,094 ns/iter (+/- 7,110)
test iter::bench_enumerate_sum ... bench: 476,438 ns/iter (+/- 3,334)
test iter::bench_filter_chain_ref_sum ... bench: 2,266,095 ns/iter (+/- 6,051)
test iter::bench_filter_chain_sum ... bench: 745,594 ns/iter (+/- 2,013)
test iter::bench_filter_ref_sum ... bench: 889,696 ns/iter (+/- 1,188)
test iter::bench_filter_sum ... bench: 667,325 ns/iter (+/- 1,894)
test iter::bench_filter_map_chain_ref_sum ... bench: 2,259,195 ns/iter (+/- 353,440)
test iter::bench_filter_map_chain_sum ... bench: 1,223,280 ns/iter (+/- 1,972)
test iter::bench_filter_map_ref_sum ... bench: 611,607 ns/iter (+/- 2,507)
test iter::bench_filter_map_sum ... bench: 611,610 ns/iter (+/- 472)
test iter::bench_fuse_chain_ref_sum ... bench: 2,246,106 ns/iter (+/- 22,395)
test iter::bench_fuse_chain_sum ... bench: 634,887 ns/iter (+/- 1,341)
test iter::bench_fuse_ref_sum ... bench: 444,816 ns/iter (+/- 1,748)
test iter::bench_fuse_sum ... bench: 316,954 ns/iter (+/- 2,616)
test iter::bench_inspect_chain_ref_sum ... bench: 2,245,431 ns/iter (+/- 21,371)
test iter::bench_inspect_chain_sum ... bench: 631,645 ns/iter (+/- 4,928)
test iter::bench_inspect_ref_sum ... bench: 317,437 ns/iter (+/- 702)
test iter::bench_inspect_sum ... bench: 315,942 ns/iter (+/- 4,320)
test iter::bench_peekable_chain_ref_sum ... bench: 2,243,585 ns/iter (+/- 12,186)
test iter::bench_peekable_chain_sum ... bench: 634,848 ns/iter (+/- 1,712)
test iter::bench_peekable_ref_sum ... bench: 444,808 ns/iter (+/- 480)
test iter::bench_peekable_sum ... bench: 317,133 ns/iter (+/- 3,309)
test iter::bench_skip_chain_ref_sum ... bench: 1,778,734 ns/iter (+/- 2,198)
test iter::bench_skip_chain_sum ... bench: 761,850 ns/iter (+/- 1,645)
test iter::bench_skip_ref_sum ... bench: 478,207 ns/iter (+/- 119,252)
test iter::bench_skip_sum ... bench: 315,614 ns/iter (+/- 3,054)
test iter::bench_skip_while_chain_ref_sum ... bench: 2,486,370 ns/iter (+/- 4,845)
test iter::bench_skip_while_chain_sum ... bench: 633,915 ns/iter (+/- 5,892)
test iter::bench_skip_while_ref_sum ... bench: 666,926 ns/iter (+/- 804)
test iter::bench_skip_while_sum ... bench: 444,405 ns/iter (+/- 571)
rustc: Default 32 codegen units at O0
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
Point at signature on unused lint
```
warning: struct is never used: `Struct`
--> $DIR/unused-warning-point-at-signature.rs:22:1
|
22 | struct Struct {
| ^^^^^^^^^^^^^
```
Fix#33961.
To match the C signature, main() should be generated with C int type
for the argc parameter and result, i.e. i32 instead of i64 on 64bit.
That way it no longer relies on the upper 32 bits being zero, which I'm
not sure is guaranteed by ABIs or startup code.
Allow replacing HashMap entries
This is an obvious API hole. At the moment the only way to retrieve an entry from a `HashMap` is to get an entry to it, remove it, and then insert a new entry. This PR allows entries to be replaced.
TrustedRandomAccess specialisation for Iterator::cloned when Item: Copy.
This should fix#44424. It also provides a potential fix for more iterators using `Iterator::cloned`.
Add aarch64-unknown-linux-musl target
This adds support for the aarch64-unknown-linux-musl target in the build and CI systems.
This addresses half of issue #42520.
The new file `aarch64_unknown_linux_musl.rs` is a copy of `aarch64_unknown_linux_gnu.rs` with "gnu" replaced by "musl", and the added logic in `build-arm-musl.sh` is similarly a near-copy of the arches around it, so overall the changes were straightforward.
Testing:
```
$ sudo ./src/ci/docker/run.sh cross
...
Dist std stage2 (x86_64-unknown-linux-gnu -> aarch64-unknown-linux-musl)
Building stage2 test artifacts (x86_64-unknown-linux-gnu -> aarch64-unknown-linux-musl)
Compiling getopts v0.2.14
Compiling term v0.0.0 (file:///checkout/src/libterm)
Compiling test v0.0.0 (file:///checkout/src/libtest)
Finished release [optimized] target(s) in 16.91 secs
Copying stage2 test from stage2 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / aarch64-unknown-linux-musl)
...
Build completed successfully in 0:55:22
```
```
$ rustup toolchain link local obj/build/x86_64-unknown-linux-gnu/stage2
$ rustup default local
```
After setting the local toolchain as default, and adding this in ~/.cargo/config:
```
[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"
```
...then the toolchain was able to build a working ripgrep as a test:
```
$ readelf -a target/aarch64-unknown-linux-musl/debug/rg | grep -i interpreter
$ readelf -a target/aarch64-unknown-linux-musl/debug/rg | grep NEEDED
$ file target/aarch64-unknown-linux-musl/debug/rg
target/aarch64-unknown-linux-musl/debug/rg: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=be11036b0988fac5dccc9f6487eb780b05186582, not stripped
```
On Windows with the NTFS filesystem, `fs::copy` would return the sum of the
lengths of all streams, which can be different from the length reported by
`metadata` and thus confusing for users unaware of this NTFS peculiarity.
This makes `fs::copy` return the same length `metadata` reports which is the
value it used to return before PR #26751. Note that alternate streams are still
copied; their length is just not included in the returned value.
This change relies on the assumption that the stream with index 1 is always the
main stream in the `CopyFileEx` callback. I could not find any official
document confirming this but empirical testing has shown this to be true,
regardless of whether the alternate stream is created before or after the main
stream.
Resolves#44532
Point at parameter type on E0301
On "the parameter type `T` may not live long enough" error, point to the
parameter type suggesting lifetime bindings:
```
error[E0310]: the parameter type `T` may not live long enough
--> $DIR/lifetime-doesnt-live-long-enough.rs:28:5
|
27 | struct Foo<T> {
| - help: consider adding an explicit lifetime bound `T: 'static`...
28 | foo: &'static T
| ^^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'static T` does not outlive the data it points at
--> $DIR/lifetime-doesnt-live-long-enough.rs:28:5
|
28 | foo: &'static T
| ^^^^^^^^^^^^^^^
```
Fix#36700.