BTreeMap/BTreeSet::from_iter: use bulk building to improve the performance
Bulk building is a common technique to increase the performance of building a fresh btree map. Instead of inserting items one-by-one, we sort all the items beforehand then create the BtreeMap in bulk.
Benchmark
```
./x.py bench library/alloc --test-args btree::map::from_iter
```
* Before
```
test btree::map::from_iter_rand_100 ... bench: 3,694 ns/iter (+/- 840)
test btree::map::from_iter_rand_10_000 ... bench: 1,033,446 ns/iter (+/- 192,950)
test btree::map::from_iter_seq_100 ... bench: 5,689 ns/iter (+/- 1,259)
test btree::map::from_iter_seq_10_000 ... bench: 861,033 ns/iter (+/- 118,815)
```
* After
```
test btree::map::from_iter_rand_100 ... bench: 3,033 ns/iter (+/- 707)
test btree::map::from_iter_rand_10_000 ... bench: 775,958 ns/iter (+/- 105,152)
test btree::map::from_iter_seq_100 ... bench: 2,969 ns/iter (+/- 336)
test btree::map::from_iter_seq_10_000 ... bench: 258,292 ns/iter (+/- 29,364)
```
Mmap the incremental data instead of reading it.
Instead of reading the full incremental state using `fs::read_file`, we memmap it using a private read-only file-backed map.
This allows the system to reclaim any memory we are not using, while ensuring we are not polluted by
outside modifications to the file.
Suggested in https://github.com/rust-lang/rust/pull/83036#issuecomment-800458082 by `@bjorn3`
Pin bootstrap checksums and add a tool to update it automatically
⚠️⚠️ This is just a proactive hardening we're performing on the build system, and it's not prompted by any known compromise. If you're aware of security issues being exploited please [check out our responsible disclosure page](https://www.rust-lang.org/policies/security). ⚠️⚠️
---
This PR aims to improve Rust's supply chain security by pinning the checksums of the bootstrap compiler downloaded by `x.py`, preventing a compromised `static.rust-lang.org` from affecting building the compiler. The checksums are stored in `src/stage0.json`, which replaces `src/stage0.txt`. This PR also adds a tool to automatically update the bootstrap compiler.
The changes in this PR were originally discussed in [Zulip](https://zulip-archive.rust-lang.org/stream/241545-t-release/topic/pinning.20stage0.20hashes.html).
## Potential attack
Before this PR, an attacker who wanted to compromise the bootstrap compiler would "just" need to:
1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries and corresponding `.sha256` files to `static.rust-lang.org`.
There is no signature verification in `x.py` as we don't want the build system to depend on GPG. Also, since the checksums were not pinned inside the repository, they were downloaded from `static.rust-lang.org` too: this only protected from accidental changes in `static.rust-lang.org` that didn't change the `*.sha256` files. The attack would allow the attacker to compromise past and future invocations of `x.py`.
## Mitigations introduced in this PR
This PR adds pinned checksums for all the bootstrap components in `src/stage0.json` instead of downloading the checksums from `static.rust-lang.org`. This changes the attack scenario to:
1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries to `static.rust-lang.org`.
3. Land a (reviewed) change in the `rust-lang/rust` repository changing the pinned hashes.
Even with a successful attack, existing clones of the Rust repository won't be affected, and once the attack is detected reverting the pinned hashes changes should be enough to be protected from the attack. This also enables further mitigations to be implemented in following PRs, such as verifying signatures when pinning new checksums (removing the trust on first use aspect of this PR) and adding a check in CI making sure a PR updating the checksum has not been tampered with (see the future improvements section).
## Additional changes
There are additional changes implemented in this PR to enable the mitigation:
* The `src/stage0.txt` file has been replaced with `src/stage0.json`. The reasoning for the change is that there is existing tooling to read and manipulate JSON files compared to the custom format we were using before, and the slight challenge of manually editing JSON files (no comments, no trailing commas) are not a problem thanks to the new `bump-stage0`.
* A new tool has been added to the repository, `bump-stage0`. When invoked, the tool automatically calculates which release should be used as the bootstrap compiler given the current version and channel, gathers all the relevant checksums and updates `src/stage0.json`. The tool can be invoked by running:
```
./x.py run src/tools/bump-stage0
```
* Support for downloading releases from `https://dev-static.rust-lang.org` has been removed, as it's not possible to verify checksums there (it's customary to replace existing artifacts there if a rebuild is warranted). This will require a change to the release process to avoid bumping the bootstrap compiler on beta before the stable release.
## Future improvements
* Add signature verification as part of `bump-stage0`, which would require the attacker to also obtain the release signing keys in order to successfully compromise the bootstrap compiler. This would be fine to add now, as the burden of installing the tool to verify signatures would only be placed on whoever updates the bootstrap compiler, instead of everyone compiling Rust.
* Add a check on CI that ensures the checksums in `src/stage0.json` are the expected ones. If a PR changes the stage0 file CI should also run the `bump-stage0` tool and fail if the output in CI doesn't match the committed file. This prevents the PR author from tweaking the output of the tool manually, which would otherwise be close to impossible for a human to detect.
* Automate creating the PRs bumping the bootstrap compiler, by setting up a scheduled job in GitHub Actions that runs the tool and opens a PR.
* Investigate whether a similar mitigation can be done for "download from CI" components like the prebuilt LLVM.
r? `@Mark-Simulacrum`
Change scope of temporaries in match guards
Each pattern in a match arm has its own copy of the match guard in MIR, with its own temporary, so it has to be dropped before the the guards are joined to the single copy of the arm. This PR changes `then_else_break` to allow it to put the temporary in the innermost scope possible. This change isn't done for `if` expressions because that affects a large number of mir-opt tests and could more significantly affect performance.
closes#88649
r? `@oli-obk`
Remove SmallVector mention
SmallVector is long gone, as it's been first replaced
by OneVector in commit e5e6375352,
which then has been removed entirely in favour of SmallVec in
commit 130a32fa72.
Document when to use Windows' `symlink_dir` vs. `symlink_file`
It was previously unclear why there are two functions and when they should be used.
Fixes: #88635
Add tests for some const generics issues
closes#82956closes#84659closes#86530closes#86535
there is also a random test in here about array repeat expressions that I already had on this branch but it seems to fit the theme of this PR so kept it...
r? `@lcnr`
Improve structured tuple struct suggestion
Previously, the span was just for the constructor name, which meant it
would result in syntactically-invalid code when applied. Now, the span
is for the entire expression.
I also changed it to use `span_suggestion_verbose`, for two reasons:
1. Now that the suggestion span has been corrected, the output is a bit
cluttered and hard to read. Putting the suggestion its own window
creates more space.
2. It's easier to see what's being suggested, since now the version
after the suggestion is applied is shown.
r? `@davidtwco`
Change return type for T::{log,log2,log10} to u32.
The value is at most 128, and this is consistent with using u32 for small values elsewhere (e.g. BITS, count_ones, leading_zeros).
Avoid invoking the hir_crate query to traverse the HIR
Walking the HIR tree is done using the `hir_crate` query. However, this is unnecessary, since `hir_owner(CRATE_DEF_ID)` provides the same information. Since depending on `hir_crate` forces dependents to always be executed, this leads to unnecessary work.
By splitting HIR and attributes visits, we can avoid an edge to `hir_crate` when trying to visit the HIR tree.
Stop allocating vtable entries for non-object-safe methods
Current a vtable entry is allocated for all associated fns, even if the method is not object-safe: https://godbolt.org/z/h7vx6f35T
As a result, each vtable for `Iterator`' currently consumes 74 `usize`s. This PR stops allocating vtable entries for those methods, reducing vtable size of each `Iterator` vtable to 7 `usize`s.
Note that this PR introduces will cause more invocations of `is_vtable_safe_method`. So a perf run might be needed. If result isn't favorable then we might need to query-ify `is_vtable_safe_method`.
Each pattern in a match arm has its own copy of the match guard in MIR,
with its own temporary, so it has to be dropped before the the guards
are joined to the single copy of the arm.
Provide `layout_of` automatically (given tcx + param_env + error handling).
After #88337, there's no longer any uses of `LayoutOf` within `rustc_target` itself, so I realized I could move the trait to `rustc_middle::ty::layout` and redesign it a bit.
This is similar to #88338 (and supersedes it), but at no ergonomic loss, since there's no funky `C: LayoutOf<Ty = Ty>` -> `Ty: TyAbiInterface<C>` generic `impl` chain, and each `LayoutOf` still corresponds to one `impl` (of `LayoutOfHelpers`) for the specific context.
After this PR, this is what's needed to get `trait LayoutOf` (with the `layout_of` method) implemented on some context type:
* `TyCtxt`, via `HasTyCtxt`
* `ParamEnv`, via `HasParamEnv`
* a way to transform `LayoutError`s into the desired error type
* an error type of `!` can be paired with having `cx.layout_of(...)` return `TyAndLayout` *without* `Result<...>` around it, such as used by codegen
* this is done through a new `LayoutOfHelpers` trait (and so is specifying the type of `cx.layout_of(...)`)
When going through this path (and not bypassing it with a manual `impl` of `LayoutOf`), the end result is that only the error case can be customized, the query itself and the success paths are guaranteed to be uniform.
(**EDIT**: just noticed that because of the supertrait relationship, you cannot actually implement `LayoutOf` yourself, the blanket `impl` fully covers all possible context types that could ever implement it)
Part of the motivation for this shape of API is that I've been working on querifying `FnAbi::of_*`, and what I want/need to introduce for that looks a lot like the setup in this PR - in particular, it's harder to express the `FnAbi` methods in `rustc_target`, since they're much more tied to `rustc` concepts.
r? `@nagisa` cc `@oli-obk` `@bjorn3`
rustdoc: Clean up handling of lifetime bounds
Previously, rustdoc recorded lifetime bounds by rendering them into the
name of the lifetime parameter. Now, it leaves the name as the actual
name and instead records lifetime bounds in an `outlives` list, similar
to how type parameter bounds are recorded.
Also, higher-ranked lifetimes cannot currently have bounds, so I simplified
the code to reflect that.
r? `@GuillaumeGomez`
Rollup of 4 pull requests
Successful merges:
- #88257 (Provide more context on incorrect inner attribute)
- #88432 (Fix a typo in raw_vec)
- #88511 (x.py clippy: don't run with --all-targets by default)
- #88657 (Fix 2021 `dyn` suggestion that used code as label)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Fix 2021 `dyn` suggestion that used code as label
The arguments to `span_suggestion` were in the wrong order, so the error
looked like this:
error[E0783]: trait objects without an explicit `dyn` are deprecated
--> src/test/ui/editions/dyn-trait-sugg-2021.rs:10:5
|
10 | Foo::hi(123);
| ^^^ help: <dyn Foo>: `use `dyn``
Now the error looks like this, as expected:
error[E0783]: trait objects without an explicit `dyn` are deprecated
--> src/test/ui/editions/dyn-trait-sugg-2021.rs:10:5
|
10 | Foo::hi(123);
| ^^^ help: use `dyn`: `<dyn Foo>`
This issue was only present in the 2021 error; the 2018 lint was
correct.
r? `@m-ou-se`
SmallVector is long gone, as it's been first replaced
by OneVector in commit e5e6375352,
which then has been removed entirely in favour of SmallVec in
commit 130a32fa72.
Add links in docs for some primitive types
This pull request adds additional links in existing documentation of some of the primitive types.
Where items are linked only once, I have used the `[link](destination)` format. For items in `std`, I have linked directly to the HTML, since although the primitives are in `core`, they are not displayed on `core` documentation. I was unsure of what length I should keep lines of documentation to, so I tried to keep them within reason.
Additionally, I have avoided excessively linking to keywords like `self` when they are not relevant to the documentation. I can add these links if it would be an improvement.
I hope this can improve Rust. Please let me know if there's anything I did wrong!
The arguments to `span_suggestion` were in the wrong order, so the error
looked like this:
error[E0783]: trait objects without an explicit `dyn` are deprecated
--> src/test/ui/editions/dyn-trait-sugg-2021.rs:10:5
|
10 | Foo::hi(123);
| ^^^ help: <dyn Foo>: `use `dyn``
Now the error looks like this, as expected:
error[E0783]: trait objects without an explicit `dyn` are deprecated
--> src/test/ui/editions/dyn-trait-sugg-2021.rs:10:5
|
10 | Foo::hi(123);
| ^^^ help: use `dyn`: `<dyn Foo>`
This issue was only present in the 2021 error; the 2018 lint was
correct.
Make sure FileCheck is copied in the LLVM output directory
The tool, which is needed by parts of our test suite, is built as part of LLVM but is *not* copied to the directory containing the output LLVM binaries. This adds a flag to ensure the binary is copied. This shouldn't add any extra built time, as the flag just installs extra binaries that were already compiled.
This is not strictly needed for the test suite to work (as it also checks `build/$target/llvm/build/bin` for the binary), but it allows deleting the `build/$TARGET/llvm/build` directory (which also contains the intermediary build artifacts) without affecting the test suite, saving disk space.
rustdoc: Box `GenericArg::Const` to reduce enum size
This should reduce the amount of memory allocated in the common cases
where the `GenericArg` is a lifetime or type.
Include debug info for the allocator shim
Issue Details:
In some cases it is necessary to generate an "allocator shim" to forward various Rust allocation functions (e.g., `__rust_alloc`) to an underlying function (e.g., `malloc`). However, since this allocator shim is a manually created LLVM module it is not processed via the normal module processing code and so no debug info is generated for it (if debugging info is enabled).
Fix Details:
* Modify the `debuginfo` code to allow creating debug info for a module without a `CodegenCx` (since it is difficult, and expensive, to create one just to emit some debug info).
* After creating the allocator shim add in basic debug info.