This commit is a refactoring of the LTO backend in Rust to support compilations
with multiple codegen units. The immediate result of this PR is to remove the
artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer
term this is intended to lay the groundwork for LTO with incremental compilation
and ultimately be the underpinning of ThinLTO support.
The problem here that needed solving is that when rustc is producing multiple
codegen units in one compilation LTO needs to merge them all together.
Previously only upstream dependencies were merged and it was inherently relied
on that there was only one local codegen unit. Supporting this involved
refactoring the optimization backend architecture for rustc, namely splitting
the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM
module has been optimized it may be blocked and queued up for LTO, and only
after LTO are modules code generated.
Non-LTO compilations should look the same as they do today backend-wise, we'll
spin up a thread for each codegen unit and optimize/codegen in that thread. LTO
compilations will, however, send the LLVM module back to the coordinator thread
once optimizations have finished. When all LLVM modules have finished optimizing
the coordinator will invoke the LTO backend, producing a further list of LLVM
modules. Currently this is always a list of one LLVM module. The coordinator
then spawns further work to run LTO and code generation passes over each module.
In the course of this refactoring a number of other pieces were refactored:
* Management of the bytecode encoding in rlibs was centralized into one module
instead of being scattered across LTO and linking.
* Some internal refactorings on the link stage of the compiler was done to work
directly from `CompiledModule` structures instead of lists of paths.
* The trans time-graph output was tweaked a little to include a name on each
bar and inflate the size of the bars a little
Remove new and index methods already implement for Idx
These are the rest of the repeated implementations for new and index methods. Follow up of https://github.com/rust-lang/rust/pull/44889
Normalize spaces in lang attributes.
So, like, I grepped for all `lang` attributes for *reasons* and I noticed that they all share the same spacing of `#[lang = "item_name"]` except these five instances. So I decided to fix that. So enjoy this PR of exactly ten spaces.
Improve wording for StepBy
No other iterator makes the distinction between an iterator and an iterator adapter
in its summary line, so change it to be consistent with all other adapters.
docs improvement std::sync::{PoisonError, TryLockError}
Addresses the `PoisonError` and `TryLockError` parts of #29377.
Adds examples and links.
r? @steveklabnik
adding E0623 for return types - both parameters are anonymous
This is a fix for #44018
```
error[E0621]: explicit lifetime required in the type of `self`
--> $DIR/ex3-both-anon-regions-return-type-is-anon.rs:17:5
|
16 | fn foo<'a>(&self, x: &i32) -> &i32 {
| ---- ----
| |
| this parameter and the return type are
declared with different lifetimes...
17 | x
| ^ ...but data from `x` is returned here
error: aborting due to previous error
```
It also works for the below case where we have self as anonymous
```
error[E0623]: lifetime mismatch
--> src/test/ui/lifetime-errors/ex3-both-anon-regions-self-is-anon.rs:17:19
|
16 | fn foo<'a>(&self, x: &Foo) -> &Foo {
| ---- ----
| |
| this parameter and the return type are
declared with different lifetimes...
17 | if true { x } else { self }
| ^ ...but data from `x` is returned here
error: aborting due to previous error
```
r? @nikomatsakis
Currently, I have enabled E0621 where return type and self are anonymous, hence WIP.
Add blanket TryFrom impl when From is implemented.
Adds `impl<T, U> TryFrom<T> for U where U: From<T>`.
Removes `impl<'a, T> TryFrom<&'a str> for T where T: FromStr` (originally added in #40281) due to overlapping impls caused by the new blanket impl. This removal is to be discussed further on the tracking issue for TryFrom.
Refs #33417.
/cc @sfackler, @scottmcm (thank you for the help!), and @aturon
First step toward implementing impl Trait in argument position
First step implementing #44721.
Add a flag to hir and ty TypeParameterDef and raise an error when using
explicit type parameters when calling a function using impl Trait in
argument position.
I don't know if there is a procedure to add an error code so I just took an available code. Is that ok ?
r? @nikomatsakis
Add more custom folding to `core::iter` adaptors
Many of the iterator adaptors will perform faster folds if they forward
to their inner iterator's folds, especially for inner types like `Chain`
which are optimized too. The following types are newly specialized:
| Type | `fold` | `rfold` |
| ----------- | ------ | ------- |
| `Enumerate` | ✓ | ✓ |
| `Filter` | ✓ | ✓ |
| `FilterMap` | ✓ | ✓ |
| `FlatMap` | exists | ✓ |
| `Fuse` | ✓ | ✓ |
| `Inspect` | ✓ | ✓ |
| `Peekable` | ✓ | N/A¹ |
| `Skip` | ✓ | N/A² |
| `SkipWhile` | ✓ | N/A¹ |
¹ not a `DoubleEndedIterator`
² `Skip::next_back` doesn't pull skipped items at all, but this couldn't
be avoided if `Skip::rfold` were to call its inner iterator's `rfold`.
Benchmarks
----------
In the following results, plain `_sum` computes the sum of a million
integers -- note that `sum()` is implemented with `fold()`. The
`_ref_sum` variants do the same on a `by_ref()` iterator, which is
limited to calling `next()` one by one, without specialized `fold`.
The `chain` variants perform the same tests on two iterators chained
together, to show a greater benefit of forwarding `fold` internally.
test iter::bench_enumerate_chain_ref_sum ... bench: 2,216,264 ns/iter (+/- 29,228)
test iter::bench_enumerate_chain_sum ... bench: 922,380 ns/iter (+/- 2,676)
test iter::bench_enumerate_ref_sum ... bench: 476,094 ns/iter (+/- 7,110)
test iter::bench_enumerate_sum ... bench: 476,438 ns/iter (+/- 3,334)
test iter::bench_filter_chain_ref_sum ... bench: 2,266,095 ns/iter (+/- 6,051)
test iter::bench_filter_chain_sum ... bench: 745,594 ns/iter (+/- 2,013)
test iter::bench_filter_ref_sum ... bench: 889,696 ns/iter (+/- 1,188)
test iter::bench_filter_sum ... bench: 667,325 ns/iter (+/- 1,894)
test iter::bench_filter_map_chain_ref_sum ... bench: 2,259,195 ns/iter (+/- 353,440)
test iter::bench_filter_map_chain_sum ... bench: 1,223,280 ns/iter (+/- 1,972)
test iter::bench_filter_map_ref_sum ... bench: 611,607 ns/iter (+/- 2,507)
test iter::bench_filter_map_sum ... bench: 611,610 ns/iter (+/- 472)
test iter::bench_fuse_chain_ref_sum ... bench: 2,246,106 ns/iter (+/- 22,395)
test iter::bench_fuse_chain_sum ... bench: 634,887 ns/iter (+/- 1,341)
test iter::bench_fuse_ref_sum ... bench: 444,816 ns/iter (+/- 1,748)
test iter::bench_fuse_sum ... bench: 316,954 ns/iter (+/- 2,616)
test iter::bench_inspect_chain_ref_sum ... bench: 2,245,431 ns/iter (+/- 21,371)
test iter::bench_inspect_chain_sum ... bench: 631,645 ns/iter (+/- 4,928)
test iter::bench_inspect_ref_sum ... bench: 317,437 ns/iter (+/- 702)
test iter::bench_inspect_sum ... bench: 315,942 ns/iter (+/- 4,320)
test iter::bench_peekable_chain_ref_sum ... bench: 2,243,585 ns/iter (+/- 12,186)
test iter::bench_peekable_chain_sum ... bench: 634,848 ns/iter (+/- 1,712)
test iter::bench_peekable_ref_sum ... bench: 444,808 ns/iter (+/- 480)
test iter::bench_peekable_sum ... bench: 317,133 ns/iter (+/- 3,309)
test iter::bench_skip_chain_ref_sum ... bench: 1,778,734 ns/iter (+/- 2,198)
test iter::bench_skip_chain_sum ... bench: 761,850 ns/iter (+/- 1,645)
test iter::bench_skip_ref_sum ... bench: 478,207 ns/iter (+/- 119,252)
test iter::bench_skip_sum ... bench: 315,614 ns/iter (+/- 3,054)
test iter::bench_skip_while_chain_ref_sum ... bench: 2,486,370 ns/iter (+/- 4,845)
test iter::bench_skip_while_chain_sum ... bench: 633,915 ns/iter (+/- 5,892)
test iter::bench_skip_while_ref_sum ... bench: 666,926 ns/iter (+/- 804)
test iter::bench_skip_while_sum ... bench: 444,405 ns/iter (+/- 571)
rustc: Default 32 codegen units at O0
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
Point at signature on unused lint
```
warning: struct is never used: `Struct`
--> $DIR/unused-warning-point-at-signature.rs:22:1
|
22 | struct Struct {
| ^^^^^^^^^^^^^
```
Fix#33961.