save LTO import info and check it when trying to reuse build products
Fix #59535
Previous runs of LTO optimization on the previous incremental build can import larger portions of the dependence graph into a codegen unit than the current compilation run is choosing to import. We need to take that into account when we choose to reuse PostLTO-optimization object files from previous compiler invocations.
This PR accomplishes that by serializing the LTO import information on each incremental build. We load up the previous LTO import data as well as the current LTO import data. Then as we decide whether to reuse previous PostLTO objects or redo LTO optimization, we check whether the LTO import data matches. After we finish with this decision process for every object, we write the LTO import data back to disk.
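A minimal sketch of the persist/compare cycle described above follows. It assumes a hypothetical in-memory `ImportMap` (codegen unit name to imported item names) and a made-up line-oriented text format; rustc's real on-disk format and types are not shown in this PR text.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory form of the LTO import data: codegen unit name ->
/// names of the items LLVM imported into it. BTree* keeps the output ordered.
type ImportMap = BTreeMap<String, BTreeSet<String>>;

/// Serialize to an assumed text format: one line per module,
/// `module imported1 imported2 ...`.
fn serialize(map: &ImportMap) -> String {
    let mut out = String::new();
    for (module, imports) in map {
        out.push_str(module);
        for imp in imports {
            out.push(' ');
            out.push_str(imp);
        }
        out.push('\n');
    }
    out
}

/// Parse the format produced by `serialize` (the previous run's data).
fn deserialize(text: &str) -> ImportMap {
    let mut map = ImportMap::new();
    for line in text.lines() {
        let mut words = line.split_whitespace();
        if let Some(module) = words.next() {
            map.insert(module.to_string(), words.map(str::to_string).collect());
        }
    }
    map
}

/// Reuse check: a PostLTO object is only a reuse candidate if the module's
/// import set is identical in the previous and the current run.
fn imports_match(module: &str, prev: &ImportMap, curr: &ImportMap) -> bool {
    prev.get(module) == curr.get(module)
}
```

After every module has been checked, the sketch would write `serialize(&curr)` back to disk, so the next build compares against this run's imports rather than an accumulation of older ones.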
----
What is the scenario where comparing against past LTO import information is necessary?
I've tried to capture it in the comments in the regression test, but here's yet another attempt from me to summarize the situation:
1. Consider a call-graph like `[A] -> [B -> D] <- [C]` (where the letters are functions and the modules are enclosed in `[]`)
2. In our specific instance, the earlier compilations were inlining the call to `B` into `A`; thus `A` ended up with an external reference to the symbol `D` in its object code, to be resolved at subsequent link time. The LTO import information provided by LLVM for those runs reflected that: it explicitly says that during those runs, the `B` definition and the `D` declaration were imported into `[A]`.
3. The change between incremental builds was that the call `D <- C` was removed.
4. That change, coupled with other decisions within `rustc`, made the compiler decide to make `D` an internal symbol (since it was no longer accessed from other codegen units, this makes sense locally). And then the definition of `D` was inlined into `B` and `D` itself was eliminated entirely.
5. The current LTO import information reported that `B` alone is imported into `[A]` for the *current compilation*. So when the Rust compiler surveyed the dependence graph, it determined that nothing `[A]` imports changed since the last build (and `[A]` itself has not changed either), so it chooses to reuse the object code generated during the previous compilation.
6. But that previous object code has an unresolved reference to `D`, and that causes a link time failure!
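The scenario can be replayed in a toy model. This is a sketch under stated assumptions: `Imports`, `naive_reuse`, and `import_aware_reuse` are hypothetical names, each unit is named after its function, and "changed" is reduced to a set of edited item names. It shows the old rule wrongly allowing reuse of `[A]` and the import-set comparison rejecting it.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model: codegen unit -> items ThinLTO imported into it.
type Imports = HashMap<&'static str, HashSet<&'static str>>;

/// Simplified pre-fix rule: reuse a unit's PostLTO object if neither the unit
/// nor anything it imports *in the current run* changed.
fn naive_reuse(unit: &str, curr: &Imports, changed: &HashSet<&'static str>) -> bool {
    !changed.contains(unit)
        && curr
            .get(unit)
            .map_or(true, |imps| imps.iter().all(|i| !changed.contains(*i)))
}

/// Simplified post-fix rule: additionally require last run's import set to be
/// identical to this run's.
fn import_aware_reuse(
    unit: &str,
    prev: &Imports,
    curr: &Imports,
    changed: &HashSet<&'static str>,
) -> bool {
    naive_reuse(unit, curr, changed) && prev.get(unit) == curr.get(unit)
}
```

With `prev` mapping `A` to `{B, D}`, `curr` mapping `A` to `{B}`, and only `C` edited, `naive_reuse("A", ...)` returns `true` (the stale object with the dangling `D` reference would be reused), while `import_aware_reuse` returns `false`.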
----
The interesting thing is that it's quite hard to actually observe the above scenario arising, which is probably why no one has noticed this bug in the year or so since incremental LTO support landed (PR #53673).
I've literally spent days trying to observe the bug on my local machine, but haven't managed to find the magic combination of factors to get LLVM and `rustc` to do just the right set of the inlining and `internal`-reclassification choices that cause this particular problem to arise.
----
Also, I have tried to be careful not to inject new bugs with this PR. Specifically, I was/am worried that we could get into a scenario where overwriting the current LTO import data with past LTO import data would cause us to "forget" a current import. ~~To guard against this, the PR as currently written always asserts, at overwrite time, that the past LTO import-set is a *superset* of the current LTO import-set. This way, the overwriting process should always be safe to run.~~
* The previous note was written based on the first version of this PR. It has since been revised to use a simpler strategy, where we never attempt to merge the past LTO import information into the current one. We just *compare* them, and act accordingly.
* Also, as you can see from the comments on the PR itself, I was quite right to be worried about forgetting past imports; that scenario was observable via a trivial transformation of the regression test I had devised.
Rollup of 5 pull requests
Successful merges:
- #64588 (Add a raw "address of" operator)
- #67031 (Update tokio crates to latest versions)
- #67131 (Merge `TraitItem` & `ImplItem` into `AssocItem`)
- #67354 (Fix pointing at arg when cause is outside of call)
- #67363 (Fix handling of wasm import modules and names)
Failed merges:
r? @ghost
Fix handling of wasm import modules and names
The WebAssembly targets of rustc have weird issues around name mangling
and importing the same name from different modules. This all largely stems
from the fact that we're using literal symbol names in LLVM IR to
represent what a function is called when it's imported, and we're not
using the wasm-specific `wasm-import-name` attribute. This in turn leads
to two issues:
* If, in the same codegen unit, the same FFI symbol is referenced twice,
then rustc, when translating to LLVM IR, will only reference one
symbol from the first wasm module referenced.
* There's also a bug in LLD [1] where even if two codegen units
reference different modules, having the same symbol names means that
LLD coalesces the symbols and only refers to one wasm module.
Put another way, all our imported wasm symbols from the environment are
keyed off their LLVM IR symbol name, which has lots of collisions today.
This commit fixes the issue by implementing two changes:
1. All wasm symbols with `#[link(wasm_import_module = "...")]` are
mangled by default in LLVM IR. This means they're all given unique names.
2. Symbols then use the `wasm-import-name` attribute to ensure that the
WebAssembly file uses the correct import name.
When put together this should ensure we don't trip over the LLD bug [1]
and we also codegen IR correctly always referencing the right symbols
with the right import module/name pairs.
Closes #50021, closes #56309, closes #63562
[1]: https://bugs.llvm.org/show_bug.cgi?id=44316
Merge `TraitItem` & `ImplItem` into `AssocItem`
In this PR we:
- Merge `{Trait,Impl}Item{Kind?}` into `AssocItem{Kind?}` as discussed in https://github.com/rust-lang/rust/issues/65041#issuecomment-538105286.
  - This is done by using the cover grammar of both forms.
  - In particular, it requires that we syntactically allow (under `#[cfg(FALSE)]`):
    - `default`ness on `trait` items,
    - `impl` items without a body / definition (`const`, `type`, and `fn`),
    - and associated `type`s in `impl`s with bounds, e.g., `type Foo: Ord;`.
  - The syntactic restrictions are replaced by semantic ones in `ast_validation`.
- Move syntactic restrictions around C-variadic parameters from the parser into `ast_validation`:
  - `fn`s in all contexts now syntactically allow `...`,
  - `...` can occur anywhere in the list syntactically (`fn foo(..., x: usize) {}`),
  - and `...` can be the sole parameter (`fn foo(...) {}`).
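The "cover grammar, then semantic validation" approach can be sketched with a toy AST. This is a heavily simplified, hypothetical model (`AssocItem`, `AssocItemKind`, `Context`, and `validate` are illustrative names, and the error strings are not rustc's diagnostics): the parser produces one item shape for both traits and impls, and context-specific restrictions are enforced afterwards.

```rust
/// Hypothetical cover grammar: one associated-item shape that can hold
/// anything a trait *or* an impl might syntactically contain.
enum AssocItemKind {
    Const { has_value: bool },
    Fn { has_body: bool },
    TyAlias { has_bounds: bool, has_ty: bool },
}

struct AssocItem {
    is_default: bool,
    kind: AssocItemKind,
}

#[derive(Clone, Copy)]
enum Context {
    Trait,
    Impl,
}

/// Semantic checks run after parsing, in the spirit of `ast_validation`.
fn validate(item: &AssocItem, ctxt: Context) -> Result<(), &'static str> {
    match ctxt {
        Context::Trait if item.is_default => Err("`default` on a trait item"),
        Context::Trait => Ok(()),
        Context::Impl => match item.kind {
            AssocItemKind::Const { has_value: false } => Err("impl const without a value"),
            AssocItemKind::Fn { has_body: false } => Err("impl fn without a body"),
            AssocItemKind::TyAlias { has_bounds: true, .. } => Err("bounds on an impl type"),
            AssocItemKind::TyAlias { has_ty: false, .. } => Err("impl type without a definition"),
            _ => Ok(()),
        },
    }
}
```

The same bodiless `fn` item is accepted in a trait and rejected in an impl, which is exactly the kind of restriction that moves out of the parser.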
r? @petrochenkov
Update tokio crates to latest versions
Drops a few old crates from the workspace (they are only used during tests, not in Rust itself) and allows removing even more crates during the next `rustc-ap-*` update.
Add a raw "address of" operator
* Parse and feature gate `&raw [const | mut] expr` (feature gate name is `raw_address_of`)
* Add `mir::Rvalue::AddressOf`
* Use the new `Rvalue` for:
  * the new syntax
  * reference to pointer casts
  * drop shims for slices and arrays
* Stop using `mir::Rvalue::Cast` with a reference as the operand
* Correctly evaluate `mir::Rvalue::{Ref, AddressOf}` in constant propagation
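A typical use of the operator is taking the address of a field of a `#[repr(packed)]` struct, where creating an ordinary reference could be misaligned. The syntax added here is feature-gated; later toolchains stabilized it as `&raw const` / `&raw mut` (Rust 1.82), and `std::ptr::addr_of!` provides the same operation as a macro. The sketch below uses `addr_of!` so it builds on older stable compilers; `Packed` and `read_b` are illustrative names.

```rust
use std::ptr;

/// With `repr(packed)`, the `u32` field `b` may sit at an unaligned address,
/// so creating `&p.b` could yield a misaligned reference.
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

/// Create a raw pointer to the field directly, without an intermediate
/// reference, and read through it without assuming alignment.
fn read_b(p: &Packed) -> u32 {
    let raw: *const u32 = ptr::addr_of!(p.b);
    // SAFETY: `raw` points into a live `Packed`, and `read_unaligned`
    // has no alignment requirement.
    unsafe { raw.read_unaligned() }
}
```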
cc @Centril @RalfJung @oli-obk @eddyb
cc #64490
Rollup of 7 pull requests
Successful merges:
- #66755 (Remove a const-if-hack in RawVec)
- #67127 (Use structured suggestion for disambiguating method calls)
- #67219 (Fix up Command Debug output when arg0 is specified.)
- #67285 (Indicate origin of where type parameter for uninferred types )
- #67328 (Remove now-redundant range check on u128 -> f32 casts)
- #67367 (Move command line option definitions into a dedicated file)
- #67442 (Remove `SOCK_CLOEXEC` dummy variable on platforms that don't use it.)
Failed merges:
r? @ghost
Move command line option definitions into a dedicated file
config.rs has reached the 3000 line tidy limit, this commit moves command line option definitions into a new file - options.rs, and leaves the rest of configuration infrastructure in config.rs.
Remove now-redundant range check on u128 -> f32 casts
This code was added to avoid UB in LLVM 6 and earlier, but we no longer support those LLVM versions.
Since https://reviews.llvm.org/D47807 (released in LLVM 7), uitofp does exactly what we need.
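The behaviour the removed check used to guarantee, and which `uitofp` is described as providing directly, is Rust's cast rule for this case: `u128 as f32` rounds to nearest, and values at or above `f32::MAX` plus half an ULP become `f32::INFINITY` instead of hitting undefined behaviour. A small check of those semantics (`u128_to_f32` is just an illustrative wrapper):

```rust
/// Plain `as` cast; per the Rust reference this rounds to nearest
/// (ties to even) and overflows to infinity.
fn u128_to_f32(x: u128) -> f32 {
    x as f32
}
```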
Closes #51872
Indicate origin of where type parameter for uninferred types
Based on #65951 (which is not merged yet), fixes #67277.
This PR slightly improves the diagnostic for code like:
```
async fn foo() {
bar().await;
}
async fn bar<T>() -> () {}
```
by showing:
```
error[E0698]: type inside `async fn` body must be known in this context
--> unresolved_type_param.rs:9:5
|
9 | bar().await;
| ^^^ cannot infer type for type parameter `T` declared on the function `bar`
|
...
```
(The `` declared on the function `bar` `` part is new.)
A small side note: `Vec` and `slice` seem to resist this change, because querying `item_name()` panics, and `get_opt_name()` returns `None`.
r? @estebank
Fix up Command Debug output when arg0 is specified.
PR https://github.com/rust-lang/rust/pull/66512 added the ability to set argv[0] on
Command. As a side effect, it changed the Debug output to print both the program and
argv[0], which in practice results in stuttery output (`"echo" "echo" "foo"`).
This PR reverts the behaviour to the old one, so that the command is only printed
once - unless arg0 has been set. In that case it emits `"[command]" "arg0" "arg1" ...`.
Adopts a simple strategy devised with assistance from mw: instead of accumulating
(and acting upon) LTO import information over an unbounded number of prior
compilations, just see if the current import set matches the previous import set.
If they don't match, then you cannot reuse the PostLTO build product for that
module.
In either case (of a match or a non-match), we can (and must) unconditionally
emit the current import set as the recorded information in the incremental
compilation cache, ready to be loaded during the next compiler run for use in
the same check described above.
Resolves issue #59535.
Set release channel on non-dist builders
Toolstate publication only runs if the channel is "nightly" and
previously the toolstate builders did not know that the channel was
nightly (since they are not dist builders).
A look through bootstrap seems to indicate that nothing should directly
depend on the channel being set to `-dev` on the test builders, though
this may cause some problems with UI tests (if for some reason they're
dumping the channel into stderr), but we could not find evidence of such, so
hopefully this is fine.
r? @pietroalbini
Add more delegations to the fmt docs and add doctests
Hi,
this is a continuation of #67021.
I replaced the `Debug` example with one that uses the `Debug*` helpers so that padding etc. will work too.
I also added asserts for the doctests as @RalfJung asked :)
The only thing I left with the `write!` macro is the `Display` example, as I didn't know if there's a better way to do that.
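Why delegation matters can be seen in a small sketch (the type names are illustrative): forwarding to the inner value's `fmt` passes the caller's `Formatter` through, so width and alignment flags apply, whereas `write!` formats the inner value with fresh default options. The `debug_struct` helper likewise gets `{:#?}` pretty-printing for free.

```rust
use std::fmt;

/// Delegates to `i32`'s `Display`, reusing the caller's `Formatter`
/// so width, fill and alignment are honoured.
struct Delegating(i32);

impl fmt::Display for Delegating {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

/// Same newtype written with `write!`: the inner value is formatted with
/// default options, so the caller's width is ignored.
struct ViaWrite(i32);

impl fmt::Display for ViaWrite {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// `Debug` built with the `debug_struct` helper.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Point").field("x", &self.x).field("y", &self.y).finish()
    }
}
```

For example, `format!("{:>5}", Delegating(42))` yields `"   42"`, while `format!("{:>5}", ViaWrite(42))` yields `"42"`.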
r? @QuietMisdreavus
Rollup of 8 pull requests
Successful merges:
- #67189 (Unify binop wording)
- #67270 (std: Implement `LineWriter::write_vectored`)
- #67286 (Fix the configure.py TOML field for a couple LLVM options)
- #67321 (make htons const fn)
- #67382 (Remove some unnecessary `ATTR_*` constants.)
- #67389 (Remove `SO_NOSIGPIPE` dummy variable on platforms that don't use it.)
- #67394 (Remove outdated references to @T from comments)
- #67406 (Suggest associated type when the specified one cannot be found)
Failed merges:
r? @ghost
Suggest associated type when the specified one cannot be found
Fixes #67386, so code like this:
```
use std::ops::Deref;
fn homura<T: Deref<Trget = i32>>(_: T) {}
fn main() {}
```
results in:
```
error[E0220]: associated type `Trget` not found for `std::ops::Deref`
--> type-binding.rs:6:20
|
6 | fn homura<T: Deref<Trget = i32>>(_: T) {}
| ^^^^^^^^^^^ help: there is an associated type with a similar name: `Target`
error: aborting due to previous error
```
(The `help` is new)
I used an `all_candidates: impl Fn() -> Iterator<...>` instead of `collect`ing to avoid the cost of allocating the Vec when no errors are found, at the expense of a little added complexity.
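The closure-instead-of-`collect` design can be sketched as follows. This is not rustc's code: `resolve_assoc_type` and `suggest_similar` are hypothetical, and the edit-distance threshold (about one edit per three characters) is an assumption. The candidate list is walked once for the exact lookup and walked again only if that lookup fails, without ever allocating a `Vec`.

```rust
/// Classic Levenshtein edit distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur[j + 1] = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Pick the closest candidate within an assumed distance threshold.
fn suggest_similar<'a, F, I>(all_candidates: F, name: &str) -> Option<&'a str>
where
    F: Fn() -> I,
    I: Iterator<Item = &'a str>,
{
    let max_dist = std::cmp::max(name.len(), 3) / 3;
    all_candidates()
        .map(|c| (levenshtein(c, name), c))
        .filter(|&(d, _)| d <= max_dist)
        .min_by_key(|&(d, _)| d)
        .map(|(_, c)| c)
}

/// Exact lookup first; only on failure iterate again to build a suggestion.
fn resolve_assoc_type<'a, F, I>(all_candidates: F, name: &str) -> Result<&'a str, Option<&'a str>>
where
    F: Fn() -> I,
    I: Iterator<Item = &'a str>,
{
    if let Some(found) = all_candidates().find(|&c| c == name) {
        return Ok(found);
    }
    Err(suggest_similar(&all_candidates, name))
}
```

With candidates `["Target"]`, looking up `"Trget"` gives `Err(Some("Target"))`, and an unrelated name such as `"Foo"` gives `Err(None)`.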
r? @estebank
std: Implement `LineWriter::write_vectored`
This commit implements the `write_vectored` method of the `LineWriter`
type. First discovered in bytecodealliance/wasmtime#629, the
`write_vectored` method of `Stdout` bottoms out here but only ends up
writing the first buffer due to the default implementation of
`write_vectored`.
Like `BufWriter`, however, `LineWriter` can have a non-default
implementation of `write_vectored` which tries to preserve the
vectored-ness as much as possible. Namely, we can have a vectored write
for everything before the newline and everything after the newline if
all the stars align well.
Also like `BufWriter`, though, special care is taken to ensure that
whenever bytes are written we're sure to signal success since that
represents a "commit" of writing bytes.
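The "everything before the newline / everything after" split can be sketched as a pure function. This is a simplified, hypothetical helper (`split_for_line_write`), using plain byte slices instead of `IoSlice` and performing no I/O; it is not the std implementation. It finds the last newline across all buffers and returns the slices that would go out in one vectored write and the tail that would stay buffered.

```rust
/// Locate the last newline across all buffers:
/// (index of the buffer containing it, offset just past the newline).
fn last_newline(bufs: &[&[u8]]) -> Option<(usize, usize)> {
    for (i, buf) in bufs.iter().enumerate().rev() {
        if let Some(pos) = buf.iter().rposition(|&b| b == b'\n') {
            return Some((i, pos + 1));
        }
    }
    None
}

/// Split into (slices to write through, up to and including the last newline;
/// slices to keep buffered). With no newline, everything is buffered.
fn split_for_line_write<'a>(bufs: &[&'a [u8]]) -> (Vec<&'a [u8]>, Vec<&'a [u8]>) {
    match last_newline(bufs) {
        None => (Vec::new(), bufs.to_vec()),
        Some((i, off)) => {
            let b: &'a [u8] = bufs[i];
            let mut head: Vec<&'a [u8]> = bufs[..i].to_vec();
            head.push(&b[..off]);
            let mut tail: Vec<&'a [u8]> = Vec::new();
            if off < b.len() {
                tail.push(&b[off..]);
            }
            tail.extend_from_slice(&bufs[i + 1..]);
            (head, tail)
        }
    }
}
```

For `["ab", "c\nd", "e"]` the head is `["ab", "c\n"]` and the tail is `["d", "e"]`.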
Switch bootstrap to 1.41
This updates the version number for master to 1.42 and switches the bootstrap compiler to yesterday's beta. Fallout of cfg(bootstrap) changes is also dealt with.