One can either use `-Z borrowck-mir` or add the `#[rustc_mir_borrowck]` attribute
to opt into MIR-based borrow checking.
Note that regardless of whether one opts in or not, AST-based borrow
check will still run as well. The errors emitted from AST-based
borrow check will include a "(Ast)" suffix in their error message,
while the errors emitted from MIR-based borrow check will include a
"(Mir)" suffix.
post-rebase: removed check for intra-statement mutual conflict;
replaced with assertion checking that at most one borrow is generated
per statement.
post-rebase: removed dead code: `IdxSet::pairs` and supporting stuff.
Use hir::ItemLocalId as keys in TypeckTables.
This PR makes `TypeckTables` use `ItemLocalId` instead of `NodeId` as key. This is needed for incremental compilation -- for stable hashing and for being able to persist and reload these tables. The PR implements the most important part of https://github.com/rust-lang/rust/issues/40303.
Some notes on the implementation:
* The PR adds the `HirId` to HIR nodes where needed (`Expr`, `Local`, `Block`, `Pat`) which obviates the need to store a `NodeId -> HirId` mapping in crate metadata. Thanks @eddyb for the suggestion! In the future the `HirId` should completely replace the `NodeId` in HIR nodes.
* Before something is read or stored in one of the various `TypeckTables` subtables, the entry's key is validated via the new `TypeckTables::validate_hir_id()` method. This makes sure that we are not mixing information from different items in a single table.
That last part could be made a bit nicer by either (a) new-typing the table key and making `validate_hir_id()` the only way to convert a `HirId` to the new-typed key, or (b) just encapsulating sub-table access a little better. This PR, however, contents itself with not making things significantly worse.
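To make the validation idea concrete, here is a minimal, self-contained sketch; the type definitions are simplified stand-ins, not the actual rustc internals:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the rustc types mentioned above.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct DefIndex(u32);
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct ItemLocalId(u32);
#[derive(Copy, Clone, Debug)]
struct HirId {
    owner: DefIndex,
    local_id: ItemLocalId,
}

struct TypeckTables {
    // The item this table belongs to.
    local_id_root: DefIndex,
    // Example sub-table, keyed by `ItemLocalId` instead of `NodeId`.
    node_types: HashMap<ItemLocalId, String>,
}

impl TypeckTables {
    // Check that the key belongs to this table's item before using it,
    // so information from different items never ends up in one table.
    fn validate_hir_id(&self, hir_id: HirId) -> ItemLocalId {
        assert_eq!(hir_id.owner, self.local_id_root);
        hir_id.local_id
    }

    fn node_type(&self, hir_id: HirId) -> Option<&String> {
        let key = self.validate_hir_id(hir_id);
        self.node_types.get(&key)
    }
}
```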
Also, there's quite a bit of switching around between `NodeId`, `HirId`, and `DefIndex`. These conversions are cheap except for `HirId -> NodeId`, so if the valued reviewer finds such an instance in a performance critical place, please let me know.
Ideally we convert more and more code from `NodeId` to `HirId` in the future so that there are no more `NodeId`s after HIR lowering anywhere. Then the amount of switching should be minimal again.
r? @eddyb, maybe?
Fixed mutable vars being marked used when they weren't
#### NB: bootstrapping is slow on my machine, even with `keep-stage` - fixes for occurrences in the current codebase are <s>in the pipeline</s> done. This PR is being put up for review of the fix of the issue.
Fixes #43526, Fixes #30280, Fixes #25049
### Issue
Whenever the compiler detected a mutable deref being used mutably, it marked an associated value as being used mutably as well. In the case of dereferencing local variables which were mutable references, this incorrectly marked the reference itself as being used mutably instead of its contents - with the consequence of making the following code emit no warnings:
```rust
fn do_thing<T>(mut arg: &mut T) {
    ... // don't touch arg - just deref it to access the T
}
```
### Fix
Make dereferences not count as a mutable use, but only when they are dereferences of borrows stored in local variables.
#### Why not on things other than local variables?
* Whenever you capture a variable in a closure, it gets turned into a hidden reference - when you use it in the closure, it gets dereferenced. If the closure uses the variable mutably, that is actually a mutable use of the thing being dereffed to, so it has to be counted.
* If you deref a mutable `Box` to access the contents mutably, you are using the `Box` mutably - so it has to be counted.
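A hedged illustration of the intended effect (the function name is made up): with the fix, `mut` on the binding below should be reported as unused, because only the `T` behind the reference is mutated, never the binding `arg` itself.

```rust
fn reset<T: Default>(mut arg: &mut T) {
    // Mutates the T through the reference; the binding `arg` is never
    // reassigned or mutably borrowed, so its `mut` should now warn.
    *arg = T::default();
}
```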
rustc::middle::dataflow - visit the CFG in RPO
We used to propagate bits in node-id order, which sometimes caused an
excessive number of iterations, especially when macros were present. As
everyone knows, visiting the CFG in RPO bounds the number of iterations
by 1 plus the depth of the most deeply nested loop (times the height of
the lattice, which is 1).
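A minimal sketch of that propagation strategy, with made-up names rather than the `rustc::middle::dataflow` internals: compute a reverse post-order once, then sweep the blocks in that order until the bits stop changing.

```rust
// Forward gen/kill propagation over a CFG, visiting blocks in RPO.
// `rpo` lists block indices in reverse post-order; `preds[b]` lists the
// predecessors of block `b`. One u64 per block stands in for the bitset.
fn propagate(rpo: &[usize], preds: &[Vec<usize>], gens: &[u64], kills: &[u64]) -> Vec<u64> {
    let mut on_entry = vec![0u64; gens.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for &bb in rpo {
            // Union over predecessors of their exit state (entry | gen, minus kill).
            let mut bits = 0u64;
            for &p in &preds[bb] {
                bits |= (on_entry[p] | gens[p]) & !kills[p];
            }
            if bits != on_entry[bb] {
                on_entry[bb] = bits;
                changed = true;
            }
        }
    }
    on_entry
}
```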
I have no idea how this affects borrowck perf in the non-worst-case, so it's probably a good idea to not roll this up so we can see the effects.
Fixes#43704.
r? @eddyb
These need to be inlined across crates to avoid showing up as one-instruction
functions in profiles! In the benchmark from #43578 this decreased the
translation item collection step from 30s to 23s, and it looks like it also
allowed the operations to be vectorized elsewhere!
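A hedged illustration of the kind of change involved (the type and method are made up, not the ones touched here): a one-instruction accessor only stops showing up in downstream crates' profiles if it carries `#[inline]`, since that is what lets it be inlined across the crate boundary (absent LTO).

```rust
pub struct ItemIndex(u32);

impl ItemIndex {
    // Without `#[inline]`, other crates call this as a real function;
    // with it, the load collapses into the caller.
    #[inline]
    pub fn as_u32(&self) -> u32 {
        self.0
    }
}
```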
Stabilize more APIs for the 1.20.0 release
In addition to the few stabilizations that have already landed, this cleans up the remaining APIs that are in `final-comment-period` right now to be stable by the 1.20.0 release
Constrain the layout of Blake2bCtx for proper SPARC compilation
On SPARC, optimization fuel ends up emitting incorrect load and store
instructions for the `transmute()` call in `blake2b_compress()`. If we
force `Blake2bCtx` to be `repr(C)`, the problem disappears.
Fixes #43346
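A hedged sketch of the shape of the fix (the field list is illustrative, not copied from the tree): pinning the context struct's layout with `#[repr(C)]` keeps the transmute between the byte buffer and 64-bit words well-defined regardless of how the compiler would otherwise order the fields.

```rust
// Illustrative only: with `#[repr(C)]` the field order and padding are
// fixed, so code that transmutes part of the context into `u64` words
// cannot be broken by the compiler choosing a different layout.
#[repr(C)]
struct Blake2bCtx {
    b: [u8; 128], // input byte buffer
    h: [u64; 8],  // chained hash state
    t: [u64; 2],  // byte counter
    c: usize,     // cursor into `b`
    outlen: usize,
}
```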
Make the "main" constructors of NonZero/Shared/Unique return Option
Per discussion in https://github.com/rust-lang/rust/issues/27730#issuecomment-303939441.
This is a breaking change to unstable APIs.
The old behavior is still available under the name `new_unchecked`. Note that only that one can be `const fn`, since `if` is currently not allowed in constant contexts.
In the case of `NonZero` this requires adding a new `is_zero` method to the `Zeroable` trait. I mildly dislike this, but it’s not much worse than having a `Zeroable` trait in the first place. `Zeroable` and `NonZero` are both unstable, this can be reworked later.
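A hedged, self-contained sketch of the new constructor shape, using a stand-in type rather than the real (unstable) `NonZero`:

```rust
pub struct NonZeroU32(u32);

impl NonZeroU32 {
    /// Checked constructor: the "main" entry point now returns `Option`.
    pub fn new(n: u32) -> Option<NonZeroU32> {
        if n != 0 { Some(NonZeroU32(n)) } else { None }
    }

    /// The old unchecked behavior, renamed; it can stay a `const fn`
    /// because it needs no `if`. The caller must guarantee `n != 0`.
    pub const unsafe fn new_unchecked(n: u32) -> NonZeroU32 {
        NonZeroU32(n)
    }
}
```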
Implement lazy loading of external crates' sources
Fixes #38875. This is a follow-up to #42507. When a (now correctly translated) span from an external crate is referenced in an error, warning or info message, we still don't have the source code being referenced.
Since stuffing the source in the serialized metadata of an rlib is extremely wasteful, the following scheme has been implemented:
* File maps now contain a source hash that gets serialized as well.
* When a span is rendered in a message, the source hash in the corresponding file map(s) is used to try and load the source from the corresponding file on disk. If the file is not found or the hashes don't match, the failed attempt is recorded (and not retried).
* The machinery fetching source lines from file maps is augmented to use the lazily loaded external source as a secondary fallback for file maps belonging to external crates.
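A hedged sketch of that fallback path, with simplified names standing in for the real file-map machinery: the external source is only attached if the file on disk still hashes to the value recorded at compilation time, and a failed attempt is remembered so it isn't retried.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};

enum ExternalSource {
    NotYetTried,
    Missing,         // file absent or hash mismatch; do not retry
    Present(String), // lazily loaded source text
}

fn try_load_external_source(path: &str, expected_hash: u64, slot: &mut ExternalSource) {
    if let ExternalSource::NotYetTried = *slot {
        *slot = match fs::read_to_string(path) {
            Ok(src) if source_hash(&src) == expected_hash => ExternalSource::Present(src),
            _ => ExternalSource::Missing,
        };
    }
}

// Stand-in for the source hash stored in the serialized file map.
fn source_hash(src: &str) -> u64 {
    let mut h = DefaultHasher::new();
    src.hash(&mut h);
    h.finish()
}
```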
This required a small change to the expected stderr of one UI test (it now renders a span where previously there was none).
Further work can be done based on this - some of the machinery previously used to hide external spans is possibly obsolete and the hashing code can be reused in different places as well.
r? @eddyb
Remove struct_field_attributes feature gate
Part of #41681. ~This PR only removes the feature gate; this *does not* update any documentations.~ This PR removes the feature gate and the corresponding chapter of the Unstable Book.
I'm not very sure about the changes I made though... Just followed the stabilization guideline.
r? @nikomatsakis
Refactor variance and remove last `[pub]` map
This PR refactors variance to work in a more red-green friendly way. Because red-green doesn't exist yet, it has to be a bit hacky. The basic idea is this:
- We compute a big map with the variance for all items in the crate; when you request variances for a particular item, we read it out of that crate-wide map
- We now hard-code that traits are invariant (which they are, for deep reasons, not gonna change)
- When building constraints, we compute (using `TransitiveRelation`) the transitive closure of which things within the crate depend on which others
- this lets us gin up the correct dependencies when requesting the variance of a single item
Ah damn, just remembered, one TODO:
- [x] Update the variance README -- ah, I guess the README updates I did are sufficient
r? @michaelwoerister
There are now two queries: crate and item. The crate one computes the
variance of all items in the crate; it is sort of an implementation
detail, and not meant to be used. The item one reads from the crate one,
synthesizing correct deps in lieu of the red-green algorithm.
At the same time, remove the `variance_computed` flag, which was a
horrible hack used to force invariance early on (e.g. when type-checking
constants). This is only needed because of trait applications, and
traits are always invariant anyway. Therefore, we now take advantage
of the query system:
- When asked to compute variances for a trait, just return a vector
saying 'all invariant'.
- Remove the corresponding "inferreds" from traits, and tweak the
constraint generation code to understand that traits are always
invariant.
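A hedged sketch of that shortcut with simplified signatures (not the real query plumbing): trait items get an all-invariant answer directly, everything else reads the crate-wide result.

```rust
#[derive(Copy, Clone, Debug)]
enum Variance { Covariant, Invariant, Contravariant, Bivariant }

// `crate_variances[i]` holds the inferred variances for non-trait item `i`.
fn item_variances(is_trait: bool, num_params: usize, item_index: usize,
                  crate_variances: &[Vec<Variance>]) -> Vec<Variance> {
    if is_trait {
        // Traits are hard-coded as invariant in every parameter.
        vec![Variance::Invariant; num_params]
    } else {
        crate_variances[item_index].clone()
    }
}
```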
Removal pass for anonymous parameters
Removes occurrences of anonymous parameters from the
rustc codebase, as they are to be deprecated.
See issue #41686 and RFC 1685.
r? @frewsxcv
Handle subtyping in inference through obligations
We currently store subtyping relations in the `TypeVariables` structure as a kind of special case. This branch uses normal obligations to propagate subtyping, thus converting our inference variables into normal fallback. It also does a few other things:
- Removes the (unstable, outdated) support for custom type inference fallback.
  - It's not clear how we want this to work, but we know that we don't want it to work the way it currently does.
  - The existing support was also just getting in my way.
- Fixes #30225, which was caused by the trait caching code pretending type variables were normal unification variables, when indeed they were not (but now are).
There is one fishy part of these changes: when computing the LUB/GLB of a "bivariant" type parameter, I currently return the `a` value. Bivariant type parameters are only allowed in a very particular situation, where the type parameter is only used as an associated type output, like this:
```rust
pub struct Foo<A, B>
where A: Fn() -> B
{
    data: A
}
```
In principle, if one had `T=Foo<A, &'a u32>` and `U=Foo<A, &'b u32>` and (e.g.) `A: for<'a> Fn() -> &'a u32`, then I think that computing the LUB of `T` and `U` might do the wrong thing. Probably the right behavior is just to create a fresh type variable. However, that particular example would not compile (because the where-clause is illegal; `'a` does not appear in any input type). I was not able to make an example that *would* compile and demonstrate this shortcoming, and handling the LUB/GLB was mildly inconvenient, so I left it as is. I am considering whether to revisit this or what.
I have started a crater run to test the impact of these changes.
Instead of collecting all potential inputs to some metadata entry and
hashing those, we directly hash the values we are storing in metadata.
This is more accurate and doesn't suffer from quadratic blow-up when
many entries have the same dependencies.
rustbuild: Update bootstrap compiler
Now that we've also updated cargo's release process this commit also changes the
download location of Cargo from Cargo's archives back to the static.r-l.o
archives. This should ensure that the Cargo download is the exact Cargo paired
with the rustc that we release.
This commit deletes the internal liblog in favor of the implementation that
lives on crates.io. Similarly it's also setting a convention for adding crates
to the compiler. The main restriction right now is that we want compiler
implementation details to be unreachable from normal Rust code (e.g. requires a
feature), and by default everything in the sysroot is reachable via `extern
crate`.
The proposal here is to require that crates pulled in have these lines in their
`src/lib.rs`:
    #![cfg_attr(rustbuild, feature(staged_api, rustc_private))]
    #![cfg_attr(rustbuild, unstable(feature = "rustc_private", issue = "27812"))]
This'll mean that by default they're not using these attributes but when
compiled as part of the compiler they do a few things:
* Mark themselves as entirely unstable via the `staged_api` feature and the
`#![unstable]` attribute.
* Allow usage of other unstable crates via `feature(rustc_private)` which is
required if the crate relies on any other crates to compile (other than std).
[MIR] SwitchInt Everywhere
Something I've been meaning to do for a very long while. This PR essentially gets rid of 3 kinds of conditional branching and only keeps the most general one - `SwitchInt`. Primary benefits are such that dealing with MIR now does not involve dealing with 3 different ways to do conditional control flow. On the other hand, constructing a `SwitchInt` currently requires more code than what previously was necessary to build an equivalent `If` terminator. Something trivially "fixable" with some constructor methods somewhere (MIR needs stuff like that badly in general).
Some timings (tl;dr: slightly faster^1 (unexpected), but also uses slightly more memory at peak (expected)):
^1: Not sure if the speed benefits are because of LLVM liking the generated code better or the compiler itself getting compiled better. Either way, it's a net benefit. The CORE and SYNTAX timings were done for compilation without optimisation.
```
AFTER:
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Finished release [optimized] target(s) in 31.50 secs
Finished release [optimized] target(s) in 31.42 secs
Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Finished release [optimized] target(s) in 439.56 secs
Finished release [optimized] target(s) in 435.15 secs
CORE: 99% (24.81 real, 0.13 kernel, 24.57 user); 358536k resident
CORE: 99% (24.56 real, 0.15 kernel, 24.36 user); 359168k resident
SYNTAX: 99% (49.98 real, 0.48 kernel, 49.42 user); 653416k resident
SYNTAX: 99% (50.07 real, 0.58 kernel, 49.43 user); 653604k resident
BEFORE:
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Finished release [optimized] target(s) in 31.84 secs
Building stage1 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Finished release [optimized] target(s) in 451.17 secs
CORE: 99% (24.66 real, 0.20 kernel, 24.38 user); 351096k resident
CORE: 99% (24.36 real, 0.17 kernel, 24.18 user); 352284k resident
SYNTAX: 99% (52.24 real, 0.56 kernel, 51.66 user); 645544k resident
SYNTAX: 99% (51.55 real, 0.48 kernel, 50.99 user); 646428k resident
```
cc @nikomatsakis @eddyb
This removes another special case of Switch by replacing it with the more general SwitchInt. While
this is more clunky currently, there’s no reason we can’t make it nice (and efficient) to use.
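A hedged sketch of what building the boolean case looks like under the general terminator (simplified stand-ins for the MIR types): a two-way `if` is just a switch on the discriminant with one tested value and an "otherwise" target.

```rust
type BasicBlock = usize;

// Simplified stand-in for the general terminator described above.
struct SwitchInt {
    values: Vec<u128>,        // constants the discriminant is compared against
    targets: Vec<BasicBlock>, // one target per value, plus a final "otherwise" block
}

// `if discr { then_blk } else { else_blk }`: jump to `else_blk` when the
// boolean is 0, otherwise take the "otherwise" edge to `then_blk`.
fn if_terminator(then_blk: BasicBlock, else_blk: BasicBlock) -> SwitchInt {
    SwitchInt {
        values: vec![0],
        targets: vec![else_blk, then_blk],
    }
}
```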
This commit updates the version number to 1.17.0 as we're now on that version of
the nightly compiler, and at the same time this updates src/stage0.txt to
bootstrap from a freshly minted beta compiler and beta Cargo.
Like many hash functions, the blake2 hash is mathematically defined on
a sequence of 64-bit words. As Rust's hash interface operates on
sequences of octets, some encoding must be used to bridge that
difference.
The Blake2 RFC (RFC 7693) specifies that:
> Byte (octet) streams are interpreted as words in little-endian order,
> with the least-significant byte first.
So use that encoding consistently.
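A minimal sketch of that bridging step (the function name is made up, not the in-tree code): each group of eight octets becomes one `u64`, least-significant byte first.

```rust
fn words_from_le_bytes(bytes: &[u8; 64]) -> [u64; 8] {
    let mut words = [0u64; 8];
    for (i, chunk) in bytes.chunks(8).enumerate() {
        let mut w = 0u64;
        for (j, &b) in chunk.iter().enumerate() {
            w |= (b as u64) << (8 * j); // byte 0 is the least significant
        }
        words[i] = w;
    }
    words
}
```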
Fixes #38891.
Remove not(stage0) from deny(warnings)
Historically this was done to accommodate bugs in lints, but there hasn't been a
bug in a lint that affected the warnings since this feature was added. Let's
completely purge warnings from all our stages by denying warnings in all stages.
This will also assist in tracking down `stage0` code to be removed whenever
we're updating the bootstrap compiler.
struct field reordering and optimization
This is work in progress. The goal is to divorce the order of fields in source code from the order of fields in the LLVM IR, then optimize structs (and tuples/enum variants) by always ordering fields from least to most aligned. It does not work yet. I intend to check compiler memory usage as a benchmark, and a crater run will probably be required.
I don't know enough of the compiler to complete this work unaided. If you see places that still need updating, please mention them. The only one I know of currently is debuginfo, which I'm putting off intentionally until a bit later.
r? @eddyb
The standard implementations of Hasher have architecture-dependent
results when hashing integers. This causes problems when the hashes are
stored within metadata - metadata written by one host architecture can't
be read by another.
To fix that, implement an architecture-independent StableHasher and use
it in all places an architecture-independent hasher is needed.
Fixes#38177.
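A hedged sketch of the normalization idea (not the actual `StableHasher` code): a wrapper that fixes the byte order and widens `usize` before the bytes reach the inner hash function, so 32-bit and 64-bit, little- and big-endian hosts produce the same fingerprint.

```rust
use std::hash::Hasher;

struct StableHasher<H: Hasher>(H);

impl<H: Hasher> Hasher for StableHasher<H> {
    fn finish(&self) -> u64 {
        self.0.finish()
    }
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes)
    }
    // Always feed the little-endian byte representation, regardless of host.
    fn write_u64(&mut self, i: u64) {
        self.0.write(&i.to_le_bytes());
    }
    // Widen to u64 so 32-bit and 64-bit hosts hash the same bytes.
    fn write_usize(&mut self, i: usize) {
        self.write_u64(i as u64);
    }
}
```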
Before this PR, type names could depend on the `CrateNum` being used
for a given crate and also on the source location of closures.
Both are undesirable for incremental compilation where we cache
LLVM IR and don't want it to depend on formatting or in which
order crates are loaded.
Replace syntax's SmallVector with AccumulateVec
This adds a new type to data_structures, `SmallVec`, which wraps `AccumulateVec` with support for re-allocating onto the heap (`SmallVec::reserve`). `SmallVec` is then used to replace the implementation of `SmallVector` in libsyntax.
r? @eddyb
Fixes #37371. Using `SmallVec` instead of libsyntax's `SmallVector` will provide the `N = 2/4` case easily (just needs a few more `Array` impls).
cc @nnethercote, probably interested in this area
A way to remove otherwise unused locals from MIR
There is a certain amount of desire for a pass which cleans up provably unused variables (no assignments or reads). There has been an implementation of such a pass by @scottcarr, and another (two!) implementations by me in my own dataflow efforts.
PRs like https://github.com/rust-lang/rust/pull/35916 prove that this pass is useful even on its own, which is why I cherry-picked it out from my dataflow effort.
@nikomatsakis previously expressed concerns over this pass not seeming to be very cheap to run and therefore unsuitable for regular cleanup duties. Turns out, regular cleanup of local declarations is not at all necessary, at least for now, because the majority of passes simply do not (or should not) care about them. That's why it is viable to only run this pass once (perhaps a few more times in the future?) per function, right before translation.
r? @eddyb or @nikomatsakis
Use where clauses and only where clauses for bounds in the
iterators for Graph.
The rest of the code uses bounds on the generic declarations for
Debug, and we may want to change those too, for consistency. I did
not do that here because I don't know whether or not that's a good
idea. But for the iterators, they were inconsistent, causing
confusion, at least for me.
This commit partially inlines two functions, `find_cycles_from_node` and
`mark_as_waiting_from`, at two call sites in order to avoid unnecessary
function calls on hot paths.
It also fully inlines and removes `is_popped`.
These changes speed up rustc-benchmarks/inflate-0.1.0 by about 2% when
doing debug builds with a stage1 compiler.
Add ArrayVec and AccumulateVec to reduce heap allocations during interning of slices
Updates `mk_tup`, `mk_type_list`, and `mk_substs` to allow interning directly from iterators. The previous PR, #37220, changed some of the calls to pass a borrowed slice from `Vec` instead of directly passing the iterator, and these changes further optimize that to avoid the allocation entirely.
This change yields 50% fewer malloc calls in [some cases](https://pastebin.mozilla.org/8921686). It also yields decent, though not amazing, performance improvements:
```
futures-rs-test 4.091s vs 4.021s --> 1.017x faster (variance: 1.004x, 1.004x)
helloworld 0.219s vs 0.220s --> 0.993x faster (variance: 1.010x, 1.018x)
html5ever-2016- 3.805s vs 3.736s --> 1.018x faster (variance: 1.003x, 1.009x)
hyper.0.5.0 4.609s vs 4.571s --> 1.008x faster (variance: 1.015x, 1.017x)
inflate-0.1.0 3.864s vs 3.883s --> 0.995x faster (variance: 1.232x, 1.005x)
issue-32062-equ 0.309s vs 0.299s --> 1.033x faster (variance: 1.014x, 1.003x)
issue-32278-big 1.614s vs 1.594s --> 1.013x faster (variance: 1.007x, 1.004x)
jld-day15-parse 1.390s vs 1.326s --> 1.049x faster (variance: 1.006x, 1.009x)
piston-image-0. 10.930s vs 10.675s --> 1.024x faster (variance: 1.006x, 1.010x)
reddit-stress 2.302s vs 2.261s --> 1.019x faster (variance: 1.010x, 1.026x)
regex.0.1.30 2.250s vs 2.240s --> 1.005x faster (variance: 1.087x, 1.011x)
rust-encoding-0 1.895s vs 1.887s --> 1.005x faster (variance: 1.005x, 1.018x)
syntex-0.42.2 29.045s vs 28.663s --> 1.013x faster (variance: 1.004x, 1.006x)
syntex-0.42.2-i 13.925s vs 13.868s --> 1.004x faster (variance: 1.022x, 1.007x)
```
We implement a small-size-optimized vector, intended primarily for collecting iterators that are presumed to be short. This vector cannot be "upsized/reallocated" into a heap-allocated vector, since that would require (slow) branching logic, but heap allocation is possible during the initial collection from an iterator.
We make the new `AccumulateVec` and `ArrayVec` generic over implementors of the `Array` trait, of which there is currently one, `[T; 8]`. In the future, this is likely to expand to other values of N.
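A hedged sketch of what being generic over implementors of the `Array` trait can look like (simplified; the real trait is an internal detail of this PR):

```rust
// The backing-store abstraction: a fixed-size array type that knows its
// element type and length (simplified; no unsafe invariants spelled out).
trait Array {
    type Element;
    const LEN: usize;
}

impl<T> Array for [T; 8] {
    type Element = T;
    const LEN: usize = 8;
}

// Supporting the `N = 2/4` cases mentioned above is just more impls:
impl<T> Array for [T; 4] {
    type Element = T;
    const LEN: usize = 4;
}
```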
Huge thanks to @nnethercote for collecting the performance and other statistics mentioned above.
remove keys w/ skolemized regions from proj cache when popping skolemized regions
This addresses #37154 (a regression). The projection cache was incorrectly caching the results for skolemized regions -- when we pop skolemized regions, we are supposed to drop cache keys for them (just as we remove those skolemized regions from the region inference graph). This is because those skolemized region numbers will be reused later with different meaning (and we have determined that the old ones don't leak out in any meaningful way).
I did a *somewhat* aggressive fix here of only removing keys that mention the skolemized regions. One could imagine just removing all keys added since we started the skolemization (as indeed I did in my initial commit). This more aggressive fix required fixing a latent bug in `TypeFlags`, as an aside.
I believe the more aggressive fix is correct; clearly there can be entries that are unrelated to the skolemized region, and it's a shame to remove them. My one concern was that it *is*, I believe, possible to have some region variables that are created and related to skolemized regions, and maybe some of them could end up in the cache. However, that seems harmless enough to me -- those relations will be removed, and couldn't have impacted how trait resolution proceeded anyway (iow, the cache entry is not wrong, though it is kind of useless).
r? @pnkfelix
cc @arielb1
ICH: Use 128-bit Blake2b hash instead of 64-bit SipHash for incr. comp. fingerprints
This PR makes incr. comp. hashes 128 bits wide in order to push collision probability below a threshold that we need to worry about. It also replaces SipHash, which has been mentioned multiple times as not being built for fingerprinting, with the [BLAKE2b hash function](https://blake2.net/), an improved version of the SHA-3 finalist BLAKE.
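As a hedged back-of-the-envelope illustration (these numbers are ours, not a measurement from this PR), the birthday bound for `n` random `b`-bit fingerprints gives:

```latex
P(\text{collision}) \approx \frac{n^2}{2^{b+1}}
% e.g. n = 2^{32} hashed values:
%   b = 64:   2^{64} / 2^{65}  = 2^{-1}
%   b = 128:  2^{64} / 2^{129} = 2^{-65}
```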
I was worried that using a cryptographic hash function would make ICH computation noticeably slower, but after doing some performance tests, I'm not any more. Most of the time BLAKE2b is actually faster than using two SipHashes (in order to get 128 bits):
```
SipHash
libcore: 0.199 seconds
libstd: 0.090 seconds
BLAKE2b
libcore: 0.162 seconds
libstd: 0.078 seconds
```
If someone can prove that something like MetroHash128 provides a comparably low collision probability as BLAKE2, I'm happy to switch. But for now we are at least not taking a performance hit.
I also suggest that we throw out the sha-256 implementation in the compiler and replace it with BLAKE2, since our sha-256 implementation is two to three times slower than the BLAKE2 implementation in this PR (cc @alexcrichton @eddyb @brson)
r? @nikomatsakis (although there's not much incr. comp. specific in here, so feel free to re-assign)
fail obligations that depend on erroring obligations
Fix a bug where an obligation that depends on an erroring obligation would
be regarded as successful, leading to global cache pollution and random
lossage.
Fixes #33723.
Fixes #34503.
r? @eddyb since @nikomatsakis is on vacation
beta-nominating because of the massive lossage potential (e.g. with `Copy` this could lead to random memory leaks), plus this is a regression.
generate fewer basic blocks for variant switches
CC #33567
Adds a new field to `TestKind::Switch` that tracks the variants that are actually matched against. The other candidates target a common "otherwise" block.
Replace the obligation forest with a graph
In the presence of caching, arbitrary nodes in the obligation forest can be merged, which makes it a general graph. Handle it as such, using cycle-detection algorithms in the processing.
I should do performance measurements sometime.
This was pretty much written as a proof-of-concept. Please help me write this in a less-ugly way. I should also add comments explaining what is going on.
r? @nikomatsakis