Fixes to mir dataflow
Fixes to mir dataflow
This collects a bunch of changes to `rustc_borrowck::borrowck::dataflow` (which others have pointed out should probably migrate to some crate that isn't tied to the borrow-checker -- but I have not attempted that here, especially since there are competing approaches to dataflow that we should also evaluate).
These changes:
1. Provide a family of related analyses: MovingOutStatements (which is what the old AST-based dataflo computed), as well as MaybeInitialized, MaybeUninitalized, and DefinitelyInitialized.
* (The last two are actually inverses of each other; we should pick one and drop the other.)
2. Fix bugs in the pre-existing analysis implementation, which was untested and thus some obvious bugs went unnoticed, which brings us to the third point:
3. Add a unit test infrastructure for the MIR dataflow analysis.
* The tests work by adding a new intrinsic that is able to query the analysis state for a particular expression (technically, a particular L-value).
* See the examples in compile-fail/mir-dataflow/inits-1.rs and compile-fail/mir-dataflow/uninits-1.rs
* These tests are only checking the results for MaybeInitialized, MaybeUninitalized, and DefinitelyInitialized; I am not sure if it will be feasible to generalize this testing strategy to the MovingOutStatements dataflow operator.
(The crucial thing these changes are working toward (but are not yet
in this commit) is a way to pretty-print MIR without having the
`NodeId` for that MIR in hand.)
Some simple improvements to MIR pretty printing
In short, this PR changes the MIR printer so that it:
* places an empty line between the MIR for each item
* does *not* write an empty line before the first BB when there are no
var decls
* aligns the "// Scope" comments 50 chars in (makes the output more
readable)
* prints the scope comments as "// scope N at ..." instead of "//
Scope(N) at ..."
* prints a prettier scope tree:
* no more unbalanced delimiters!
* no more "Parent" entry (these convey no useful information)
* drop the "Scope()" and just print scope IDs
* no braces when the scope is empty
In action: https://gist.github.com/jonas-schievink/1c11226cbb112892a9470ce0f9870b65
Only break critical edges where actually needed
Currently, to prepare for MIR trans, we break _all_ critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.
This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.
In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².
In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".
So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spent a little less than 0.2s in LLVM.
cc #33111
r? @Aatch
mir: don't attempt to promote Unpromotable constant temps.
Fixes#33537. This was a non-problem in regular functions, but we also promote in `const fn`s.
There we always qualify temps so you can't depend on `Unpromotable` temps being `NOT_CONST`.
In short, this PR changes the MIR printer so that it:
* places an empty line between the MIR for each item
* does *not* write an empty line before the first BB when there are no
var decls
* aligns the "// Scope" comments 50 chars in (makes the output more
readable)
* prints the scope comments as "// scope N at ..." instead of "//
Scope(N) at ..."
* prints a prettier scope tree:
* no more unbalanced delimiters!
* no more "Parent" entry (these convey no useful information)
* drop the "Scope()" and just print scope IDs
* no braces when the scope is empty
Currently, to prepare for MIR trans, we break _all_ critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.
This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.
In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².
In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".
So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spent a little less than 0.2s in LLVM.
cc #33111
mir: drop temps outside-in by scheduling the drops inside-out.
It was backwards all along, but only noticeable with multiple drops in one rvalue scope. Fixes#32433.
Coercion casts (`expr as T` where the type of `expr` can be coerced to
`T`) are essentially no-ops, as the actual work is done by a coercion.
Previously a check for type equality was used to avoid emitting the
redundant cast in the MIR, but this failed for coercion casts of
function items that had lifetime parameters. The MIR trans code doesn't
handle `FnPtr -> FnPtr` casts and produced an error.
Also fixes a bug with type ascription expressions not having any
adjustments applied.
Fixes#33295
Primarily affects the MIR construction, which indirectly improves LLVM
IR generation, but some LLVM IR changes have been made too.
* Handle "statement expressions" more intelligently. These are
expressions that always evaluate to `()`. Previously a temporary would
be generated as a destination to translate into, which is unnecessary.
This affects assignment, augmented assignment, `return`, `break` and
`continue`.
* Avoid inserting drops for non-drop types in more places. Scheduled
drops were already skipped for types that we knew wouldn't need
dropping at construction time. However manually-inserted drops like
those for `x` in `x = y;` were still generated. `build_drop` now takes
a type parameter like its `schedule_drop` counterpart and checks to
see if the type needs dropping.
* Avoid generating an extra temporary for an assignment where the types
involved don't need dropping. Previously an expression like
`a = b + 1;` would result in a temporary for `b + 1`. This is so the
RHS can be evaluated, then the LHS evaluated and dropped and have
everything work correctly. However, this isn't necessary if the `LHS`
doesn't need a drop, as we can just overwrite the existing value.
* Improves lvalue analysis to allow treating an `Rvalue::Use` as an
operand in certain conditions. The reason for it never being an
operand is so it can be zeroed/drop-filled, but this is only true for
types that need dropping.
The first two changes result in significantly fewer MIR blocks being
generated, as previously almost every statement would end up generating
a new block due to the drop of the `()` temporary being generated.
Paths are mostly parsed without taking whitespaces into account, e.g. `std :: vec :: Vec :: new ()` parses successfully, however, there are some special cases involving keywords `super`, `self` and `Self`. For example, `self::` is considered a path start only if there are no spaces between `self` and `::`. These restrictions probably made sense when `self` and friends weren't keywords, but now they are unnecessary.
The first two commits remove this special treatment of whitespaces by removing `token::IdentStyle` entirely and therefore fix https://github.com/rust-lang/rust/issues/14109.
This change also affects naked `self` and `super` (which are not tightly followed by `::`, obviously) they can now be parsed as paths, however they are still not resolved correctly in imports (cc @jseyfried, see `compile-fail/use-keyword.rs`), so https://github.com/rust-lang/rust/issues/29036 is not completely fixed.
The third commit also makes `super`, `self`, `Self` and `static` keywords nominally (before this they acted as keywords for all purposes) and removes most of remaining \"special idents\".
The last commit (before tests) contains some small improvements - some qualified paths with type parameters are parsed correctly, `parse_path` is not used for parsing single identifiers, imports are sanity checked for absence of type parameters - such type parameters can be generated by syntax extensions or by macros when https://github.com/rust-lang/rust/issues/10415 is fixed (~~soon!~~already!).
This patch changes some pretty basic things in `libsyntax`, like `token::Token` and the keyword list, so it's a plugin-[breaking-change].
r? @eddyb
MIR: Do not require END_BLOCK to always exist
Basically, all this does, is removing restriction for END_BLOCK to exist past the first invocation of RemoveDeadBlocks pass. This way for functions whose CFG does not reach the `END_BLOCK` end up not containing the block.
As far as the implementation goes, I’m not entirely satisfied with the `BasicBlock::end_block`. I had hoped to make `new` a `const fn` and then just have a `const END_BLOCK` private to mir::build, but it turns out that constant functions don’t yet support conditionals nor a way to assert.