convert NLL ops to caches
This is a extension of <https://github.com/rust-lang/rust/pull/51460>. It uses a lot more caching than we used to do. This caching is not yet as efficient as it could be, but I'm curious to see the current perf results.
This is the high-level idea: in the MIR type checker, use [canonicalized queries](https://rust-lang-nursery.github.io/rustc-guide/traits/canonical-queries.html) for all the major operations. This is helpful because the MIR type check is operating in a context where all types are fully known (mostly, anyway) but regions are completely renumbered. This means we often wind up with duplicate queries like `Foo<'1, '2> :Bar` and `Foo<'3, '4>: Bar`. Canonicalized queries let us re-use the results. By the final commit in this PR, we can essentially just "read off" the resulting region relations and add them to the NLL type check.
Rollup of 7 pull requests
Successful merges:
- #49987 (Add str::split_ascii_whitespace.)
- #50342 (Document round-off error in `.mod_euc()`-method, see issue #50179)
- #51658 (Only do sanity check with debug assertions on)
- #51799 (Lower case some feature gate error messages)
- #51800 (Add a compiletest header for edition)
- #51824 (Fix the error reference for LocalKey::try_with)
- #51842 (Document that Layout::from_size_align does not allow align=0)
Failed merges:
r? @ghost
Fix the error reference for LocalKey::try_with
There's no such thing as `ThreadLocalError` and the method obviously returns `AccessError`, so adjusting (probably only outdated docs).
Add a compiletest header for edition
r? @nikomatsakis
Are the `-Zunstable-options` options needed in these tests? It looks like they aren't. If not, I can remove them.
Only do sanity check with debug assertions on
r? @nnethercote
I'm slighty confused. These changes address code that the `unused-warnings` benchmark doesn't go through, yet I see a 5% improvement to nightly on the `check` run, and no improvement on the other runs.
Maybe this change allows unrelated code in the same function to be better optimized?
Document round-off error in `.mod_euc()`-method, see issue #50179
Due to a round-off error the method `.mod_euc()` of both `f32` and `f64` can produce mathematical invalid outputs. If `self` in magnitude is much small than the modulus `rhs` and negative, `self + rhs` in the first branch cannot be represented in the given precision and results into `rhs`. In the mathematical strict sense, this breaks the definition. But given the limitation of floating point arithmetic it can be thought of the closest representable value to the true result, although it is not strictly in the domain `[0.0, rhs)` of the function. It is rather the left side asymptotical limit. It would be desirable that it produces the mathematical more sound approximation of `0.0`, the right side asymptotical limit. But this breaks the property, that `self == self.div_euc(rhs) * rhs + a.mod_euc(rhs)`.
The discussion in issue #50179 did not find an satisfying conclusion to which property is deemed more important. But at least we can document the behaviour. Which this pull request does.
Add str::split_ascii_whitespace.
As mentioned in #48656.
While `str::split_whitespace` now works in `libcore`, it still makes sense to offer this method, considering how it is still more performant in cases where only ASCII is necessary.
[fuchsia] Update zx_cprng_draw to target semantics
This change is the final step in improving the semantics of
zx_cprng_draw. Now the syscall always generates the requested number of
bytes. If the syscall would have failed to generate the requested number
of bytes, the syscall either terminates the entire operating system or
terminates the calling process, depending on whether the error is a
result of the kernel misbehaving or the userspace program misbehaving.
This change is the final step in improving the semantics of
zx_cprng_draw. Now the syscall always generates the requested number of
bytes. If the syscall would have failed to generate the requested number
of bytes, the syscall either terminates the entire operating system or
terminates the calling process, depending on whether the error is a
result of the kernel misbehaving or the userspace program misbehaving.
Implement `#[macro_export(local_inner_macros)]` (a solution for the macro helper import problem)
Implement a solution for the macro helper issue discussed in https://github.com/rust-lang/rust/issues/35896 as described in https://github.com/rust-lang/rust/issues/35896#issuecomment-395977901.
Macros exported from libraries can be marked with `#[macro_export(local_inner_macros)]` and this annotation changes how nested macros in them are resolved.
If we have a fn-like macro call `ident!(...)` and `ident` comes from a `macro_rules!` macro marked with `#[macro_export(local_inner_macros)]` then it's replaced with `$crate::ident!(...)` and resolved as such (`$crate` gets the same context as `ident`).
This requires us to allocate a single entry vector we didn't use to
allocate. I doubt this makes a difference in practice, as this only
occurs for cache misses.
Optimize sum of Durations by using custom function
The current `impl Sum for Duration` uses `fold` to perform several `add`s (or really `checked_add`s) of durations. In doing so, it has to guarantee the number of nanoseconds is valid after every addition. If you squeese the current implementation into a single function it looks kind of like this:
````rust
fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
let mut sum = Duration::new(0, 0);
for rhs in iter {
if let Some(mut secs) = sum.secs.checked_add(rhs.secs) {
let mut nanos = sum.nanos + rhs.nanos;
if nanos >= NANOS_PER_SEC {
nanos -= NANOS_PER_SEC;
if let Some(new_secs) = secs.checked_add(1) {
secs = new_secs;
} else {
panic!("overflow when adding durations");
}
}
sum = Duration { secs, nanos }
} else {
panic!("overflow when adding durations");
}
}
sum
}
````
We only need to check if `nanos` is in the correct range when giving our final answer so we can have a more optimized version like so:
````rust
fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
let mut total_secs: u64 = 0;
let mut total_nanos: u64 = 0;
for entry in iter {
total_secs = total_secs
.checked_add(entry.secs)
.expect("overflow in iter::sum over durations");
total_nanos = match total_nanos.checked_add(entry.nanos as u64) {
Some(n) => n,
None => {
total_secs = total_secs
.checked_add(total_nanos / NANOS_PER_SEC as u64)
.expect("overflow in iter::sum over durations");
(total_nanos % NANOS_PER_SEC as u64) + entry.nanos as u64
}
};
}
total_secs = total_secs
.checked_add(total_nanos / NANOS_PER_SEC as u64)
.expect("overflow in iter::sum over durations");
total_nanos = total_nanos % NANOS_PER_SEC as u64;
Duration {
secs: total_secs,
nanos: total_nanos as u32,
}
}
````
We now only convert `total_nanos` to `total_secs` (1) if `total_nanos` overflows and (2) at the end of the function when we have to output a valid `Duration`. This gave a 5-22% performance improvement when I benchmarked it, depending on how big the `nano` value of the `Duration`s in `iter` were.