Stop keeping metadata in memory before writing it to disk
Fixes#96358
I created this PR according with the instruction given in the issue except for the following points:
- While the issue says "Write metadata into the temporary file in `encode_and_write_metadata` even if `!need_metadata_file`", I could not do that. That is because though I tried to do that and run `x.py test`, I got a lot of test failures as follows.
<details>
<summary>List of failed tests</summary>
<pre>
<code>
failures:
[ui] src/test/ui/json-multiple.rs
[ui] src/test/ui/json-options.rs
[ui] src/test/ui/rmeta/rmeta-rpass.rs
[ui] src/test/ui/save-analysis/emit-notifications.rs
[ui] src/test/ui/svh/changing-crates.rs
[ui] src/test/ui/svh/svh-change-lit.rs
[ui] src/test/ui/svh/svh-change-significant-cfg.rs
[ui] src/test/ui/svh/svh-change-trait-bound.rs
[ui] src/test/ui/svh/svh-change-type-arg.rs
[ui] src/test/ui/svh/svh-change-type-ret.rs
[ui] src/test/ui/svh/svh-change-type-static.rs
[ui] src/test/ui/svh/svh-use-trait.rs
test result: FAILED. 12915 passed; 12 failed; 100 ignored; 0 measured; 0 filtered out; finished in 71.41s
Some tests failed in compiletest suite=ui mode=ui host=x86_64-unknown-linux-gnu target=x86_64-unknown-linux-gnu
Build completed unsuccessfully in 0:01:58
</code>
</pre>
</details>
- I could not resolve the extra tasks about `create_rmeta_file` and `create_compressed_metadata_file` for my lack of ability.
Fix RSS reporting on macOS
> NOTE: This is a duplicate of #98164, which I closed because I borked my rustc fork
Currently, `rustc_data_structures::profiling::get_resident_set_size()` always returns `None` on macOS. This is because
macOS does not implement procfs used in the unix version of the function:
```rust
...
else if #[cfg(unix)] {
pub fn get_resident_set_size() -> Option<usize> {
let field = 1;
let contents = fs::read("/proc/self/statm").ok()?;
let contents = String::from_utf8(contents).ok()?;
let s = contents.split_whitespace().nth(field)?;
let npages = s.parse::<usize>().ok()?;
Some(npages * 4096)
}
...
```
The proposed solution uses libproc, and more specifically `proc_pidinfo`, which has been available on macOS since 10.5 if the function signature inside libproc.h is to be believed:
```c
int proc_pidinfo(int pid, int flavor, uint64_t arg, void *buffer, int buffersize) __OSX_AVAILABLE_STARTING(__MAC_10_5, __IPHONE_2_0);
```
Rename rustc_serialize::opaque::Encoder as MemEncoder.
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).
(This was previously merged as commit 5 in #94732 and then was reverted
in #97905 because of a perf regression caused by commit 4 in #94732.)
r? ```@bjorn3```
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).
(This was previously merged as commit 5 in #94732 and then was reverted
in #97905 because of a perf regression caused by commit 4 in #94732.)
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).
There are two impls of the `Encoder` trait: `opaque::Encoder` and
`opaque::FileEncoder`. The former encodes into memory and is infallible, the
latter writes to file and is fallible.
Currently, standard `Result`/`?`/`unwrap` error handling is used, but this is a
bit verbose and has non-trivial cost, which is annoying given how rare failures
are (especially in the infallible `opaque::Encoder` case).
This commit changes how `Encoder` fallibility is handled. All the `emit_*`
methods are now infallible. `opaque::Encoder` requires no great changes for
this. `opaque::FileEncoder` now implements a delayed error handling strategy.
If a failure occurs, it records this via the `res` field, and all subsequent
encoding operations are skipped if `res` indicates an error has occurred. Once
encoding is complete, the new `finish` method is called, which returns a
`Result`. In other words, there is now a single `Result`-producing method
instead of many of them.
This has very little effect on how any file errors are reported if
`opaque::FileEncoder` has any failures.
Much of this commit is boring mechanical changes, removing `Result` return
values and `?` or `unwrap` from expressions. The more interesting parts are as
follows.
- serialize.rs: The `Encoder` trait gains an `Ok` associated type. The
`into_inner` method is changed into `finish`, which returns
`Result<Vec<u8>, !>`.
- opaque.rs: The `FileEncoder` adopts the delayed error handling
strategy. Its `Ok` type is a `usize`, returning the number of bytes
written, replacing previous uses of `FileEncoder::position`.
- Various methods that take an encoder now consume it, rather than being
passed a mutable reference, e.g. `serialize_query_result_cache`.
Because it really has two halves:
- A read-only part that checks if further work is needed.
- The further work part, which is much less hot.
This makes things a bit clearer and nicer.
It creates the src pointer first, which is then invalidated by a
unique borrow of the destination pointer. Swap the borrows around
to fix this. Found with miri.
Some tests took too long and owning_ref is fundamentally flawed,
so don't run these tests or run them with a shorter N. This makes
miri with `-Zmiri-strict-provenance` usable to find UB.
Begin fixing all the broken doctests in `compiler/`
Begins to fix#95994.
All of them pass now but 24 of them I've marked with `ignore HELP (<explanation>)` (asking for help) as I'm unsure how to get them to work / if we should leave them as they are.
There are also a few that I marked `ignore` that could maybe be made to work but seem less important.
Each `ignore` has a rough "reason" for ignoring after it parentheses, with
- `(pseudo-rust)` meaning "mostly rust-like but contains foreign syntax"
- `(illustrative)` a somewhat catchall for either a fragment of rust that doesn't stand on its own (like a lone type), or abbreviated rust with ellipses and undeclared types that would get too cluttered if made compile-worthy.
- `(not-rust)` stuff that isn't rust but benefits from the syntax highlighting, like MIR.
- `(internal)` uses `rustc_*` code which would be difficult to make work with the testing setup.
Those reason notes are a bit inconsistently applied and messy though. If that's important I can go through them again and try a more principled approach. When I run `rg '```ignore \(' .` on the repo, there look to be lots of different conventions other people have used for this sort of thing. I could try unifying them all if that would be helpful.
I'm not sure if there was a better existing way to do this but I wrote my own script to help me run all the doctests and wade through the output. If that would be useful to anyone else, I put it here: https://github.com/Elliot-Roberts/rust_doctest_fixing_tool
Fixing #95444 by only displaying passes that take more than 5 millise…
As discussed in #95444, I have added the code to test and only display prints that are greater than 5 milliseconds.
r? `@jyn514`
Add a dedicated length-prefixing method to `Hasher`
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
Fixes#94026
r? rust-lang/libs
---
The core of this change is the following two new methods on `Hasher`:
```rust
pub trait Hasher {
/// Writes a length prefix into this hasher, as part of being prefix-free.
///
/// If you're implementing [`Hash`] for a custom collection, call this before
/// writing its contents to this `Hasher`. That way
/// `(collection![1, 2, 3], collection![4, 5])` and
/// `(collection![1, 2], collection![3, 4, 5])` will provide different
/// sequences of values to the `Hasher`
///
/// The `impl<T> Hash for [T]` includes a call to this method, so if you're
/// hashing a slice (or array or vector) via its `Hash::hash` method,
/// you should **not** call this yourself.
///
/// This method is only for providing domain separation. If you want to
/// hash a `usize` that represents part of the *data*, then it's important
/// that you pass it to [`Hasher::write_usize`] instead of to this method.
///
/// # Examples
///
/// ```
/// #![feature(hasher_prefixfree_extras)]
/// # // Stubs to make the `impl` below pass the compiler
/// # struct MyCollection<T>(Option<T>);
/// # impl<T> MyCollection<T> {
/// # fn len(&self) -> usize { todo!() }
/// # }
/// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
/// # type Item = T;
/// # type IntoIter = std::iter::Empty<T>;
/// # fn into_iter(self) -> Self::IntoIter { todo!() }
/// # }
///
/// use std:#️⃣:{Hash, Hasher};
/// impl<T: Hash> Hash for MyCollection<T> {
/// fn hash<H: Hasher>(&self, state: &mut H) {
/// state.write_length_prefix(self.len());
/// for elt in self {
/// elt.hash(state);
/// }
/// }
/// }
/// ```
///
/// # Note to Implementers
///
/// If you've decided that your `Hasher` is willing to be susceptible to
/// Hash-DoS attacks, then you might consider skipping hashing some or all
/// of the `len` provided in the name of increased performance.
#[inline]
#[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
fn write_length_prefix(&mut self, len: usize) {
self.write_usize(len);
}
/// Writes a single `str` into this hasher.
///
/// If you're implementing [`Hash`], you generally do not need to call this,
/// as the `impl Hash for str` does, so you can just use that.
///
/// This includes the domain separator for prefix-freedom, so you should
/// **not** call `Self::write_length_prefix` before calling this.
///
/// # Note to Implementers
///
/// The default implementation of this method includes a call to
/// [`Self::write_length_prefix`], so if your implementation of `Hasher`
/// doesn't care about prefix-freedom and you've thus overridden
/// that method to do nothing, there's no need to override this one.
///
/// This method is available to be overridden separately from the others
/// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
/// can be used to provide prefix-freedom cheaper than hashing a length.
///
/// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
/// them into a buffer), then you can hash the bytes of the `str` followed
/// by a single `0xFF` byte.
///
/// If your `Hasher` works in chunks, you can also do this by being careful
/// about how you pad partial chunks. If the chunks are padded with `0x00`
/// bytes then just hashing an extra `0xFF` byte doesn't necessarily
/// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
/// the same sequence of chunks. But if you pad with `0xFF` bytes instead,
/// ensuring at least one padding byte, then it can often provide
/// prefix-freedom cheaper than hashing the length would.
#[inline]
#[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
fn write_str(&mut self, s: &str) {
self.write_length_prefix(s.len());
self.write(s.as_bytes());
}
}
```
With updates to the `Hash` implementations for slices and containers to call `write_length_prefix` instead of `write_usize`.
`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing. But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available) it overrides `write_str` to continue to use the add-non-UTF-8-byte approach.
---
Compatibility:
Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
95444: Adding passes that include memory increase
Fix95444: Change the substraction with the abs_diff() method
Fix95444: Change the substraction with abs_diff() method
rustc_metadata: Do not encode unnecessary module children
This should remove the syntax context shift and the special case for `ExternCrate` in decoder in https://github.com/rust-lang/rust/pull/95880.
This PR also shifts some work from decoding to encoding, which is typically useful for performance (but probably not much in this case).
r? `@cjgillot`
Allow self-profiler to only record potentially costly arguments when argument recording is turned on
As discussed [on zulip](https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Identifying.20proc-macro.20slowdowns/near/277304909) with `@wesleywiser,` I'd like to record proc-macro expansions in the self-profiler, with some detailed data (per-expansion spans for example, to follow #95473).
At the same time, I'd also like to avoid doing expensive things when tracking a generic activity's arguments, if they were not specifically opted into the event filter mask, to allow the self-profiler to be used in hotter contexts.
This PR tries to offer:
- a way to ensure a closure to record arguments will only be called in that situation, so that potentially costly arguments can still be recorded when needed. With the additional requirement that, if possible, it would offer a way to record non-owned data without adding many `generic_activity_with_arg_{...}`-style methods. This lead to the `generic_activity_with_arg_recorder` single entry-point, and the closure parameter would offer the new methods, able to be executed in a context where costly argument could be created without disturbing the profiled piece of code.
- some facilities/patterns allowing to record more rustc specific data in this situation, without making `rustc_data_structures` where the self-profiler is defined, depend on other rustc crates (causing circular dependencies): in particular, spans. They are quite tricky to turn into strings (if the default `Debug` impl output does not match the context one needs them for), and since I'd also like to avoid the allocation there when arg recording is turned off today, that has turned into another flexibility requirement for the API in this PR (separating the span-specific recording into an extension trait). **edit**: I've removed this from the PR so that it's easier to review, and opened https://github.com/rust-lang/rust/pull/95739.
- allow for extensibility in the future: other ways to record arguments, or additional data attached to them could be added in the future (e.g. recording the argument's name as well as its data).
Some areas where I'd love feedback:
- the API and names: the `EventArgRecorder` and its method for example. As well as the verbosity that comes from the increased flexibility.
- if I should convert the existing `generic_activity_with_arg{s}` to just forward to `generic_activity_with_arg_recorder` + `recorder.record_arg` (or remove them altogether ? Probably not): I've used the new API in the simple case I could find of allocating for an arg that may not be recorded, and the rest don't seem costly.
- [x] whether this API should panic if no arguments were recorded by the user-provided closure (like this PR currently does: it seems like an error to use an API dedicated to record arguments but not call the methods to then do so) or if this should just record a generic activity without arguments ?
- whether the `record_arg` function should be `#[inline(always)]`, like the `generic_activity_*` functions ?
As mentioned, r? `@wesleywiser` following our recent discussion.
make unaligned_references lint deny-by-default
This lint has been warn-by-default for a year now (since https://github.com/rust-lang/rust/pull/82525), so I think it is time to crank it up a bit. Code that triggers the lint causes UB (without `unsafe`) when executed, so we really don't want people to write code like this.
Cached stable hash cleanups
r? `@nnethercote`
Add a sanity assertion in debug mode to check that the cached hashes are actually the ones we get if we compute the hash each time.
Add a new data structure that bundles all the hash-caching work to make it easier to re-use it for different interned data structures
Enforce well formedness for type alias impl trait's hidden type
fixes#84657
This was not an issue with return-position-impl-trait because the generic bounds of the function are the same as those of the opaque type, and the hidden type must already be well formed within the function.
With type-alias-impl-trait the hidden type could be defined in a function that has *more* lifetime bounds than the type alias. This is fine, but the hidden type must still be well formed without those additional bounds.
This allows profiling costly arguments to be recorded only when `-Zself-profile-events=args` is on: using a closure that takes an `EventArgRecorder` and call its `record_arg` or `record_args` methods.