This avoids all sorts of confusing issues with using both `dest_place`
and `self` in the `propagate_call_return` function in the
`BitDenotation` implementation for `Borrows`.
Currently we have two files implementing bitsets (and 2D bit matrices).
This commit combines them into one, taking the best features from each.
This involves renaming a lot of things. The high level changes are as
follows.
- bitvec.rs --> bit_set.rs
- indexed_set.rs --> (removed)
- BitArray + IdxSet --> BitSet (merged, see below)
- BitVector --> GrowableBitSet
- {,Sparse,Hybrid}IdxSet --> {,Sparse,Hybrid}BitSet
- BitMatrix --> BitMatrix
- SparseBitMatrix --> SparseBitMatrix
The changes within the bitset types themselves are as follows.
```
OLD OLD NEW
BitArray<C> IdxSet<T> BitSet<T>
-------- ------ ------
grow - grow
new - (remove)
new_empty new_empty new_empty
new_filled new_filled new_filled
- to_hybrid to_hybrid
clear clear clear
set_up_to set_up_to set_up_to
clear_above - clear_above
count - count
contains(T) contains(&T) contains(T)
contains_all - superset
is_empty - is_empty
insert(T) add(&T) insert(T)
insert_all - insert_all()
remove(T) remove(&T) remove(T)
words words words
words_mut words_mut words_mut
- overwrite overwrite
merge union union
- subtract subtract
- intersect intersect
iter iter iter
```
In general, when choosing names I went with:
- names that are more obvious (e.g. `BitSet` over `IdxSet`).
- names that are more like the Rust libraries (e.g. `T` over `C`,
`insert` over `add`);
- names that are more set-like (e.g. `union` over `merge`, `superset`
over `contains_all`, `domain_size` over `num_bits`).
Also, using `T` for index arguments seems more sensible than `&T` --
even though the latter is standard in Rust collection types -- because
indices are always copyable. It also results in fewer `&` and `*`
sigils in practice.
`HybridIdxSetBuf` is a sparse-when-small but dense-when-large index set
that is very efficient for sets that (a) have few elements, (b) have
large `universe_size` values, and (c) are cleared frequently. Which
makes it perfect for the `gen_set` and `kill_set` sets used by the new
borrow checker.
This patch reduces the execution time of the five slowest NLL benchmarks
by 55%, 21%, 16%, 10% and 9%. It also reduces the max-rss of three
benchmarks by 53%, 33%, and 9%.
The current situation is something of a mess.
- `IdxSetBuf` derefs to `IdxSet`.
- `IdxSetBuf` implements `Clone`, and therefore has a provided `clone_from`
method, which does allocation and so is expensive.
- `IdxSet` has a `clone_from` method that is non-allocating and therefore
cheap, but this method is not from the `Clone` trait.
As a result, if you have an `IdxSetBuf` called `b`, if you call
`b.clone_from(b2)` you'll get the expensive `IdxSetBuf` method, but if you call
`(*b).clone_from(b2)` you'll get the cheap `IdxSetBuf` method.
`liveness_of_locals()` does the former, presumably unintentionally, and
therefore does lots of unnecessary allocations.
Having a `clone_from` method that isn't from the `Clone` trait is a bad idea in
general, so this patch renames it as `overwrite`. This avoids the unnecessary
allocations in `liveness_of_locals()`, speeding up most NLL benchmarks, the
best by 1.5%. It also means that calls of the form `(*b).clone_from(b2)` can be
rewritten as `b.overwrite(b2)`.
`has_any_child_of` is hot. It allocates a `Vec` that almost always
doesn't exceed a length of 1.
This patch peels off the first iteration of the loop, avoiding the need
for the `Vec` creation in ~99% of cases.
High-level picture: The old `Borrows` analysis is now called
`Reservations` (implemented as a newtype wrapper around `Borrows`);
this continues to compute whether a `Rvalue::Ref` can reach a
statement without an intervening `EndRegion`. In addition, we also
track what `Place` each such `Rvalue::Ref` was immediately assigned
to in a given borrow (yay for MIR-structural properties!).
The new `ActiveBorrows` analysis then tracks the initial use of any of
those assigned `Places` for a given borrow. I.e. a borrow becomes
"active" immediately after it starts being "used" in some way. (This
is conservative in the sense that we will treat a copy `x = y;` as a
use of `y`; in principle one might further delay activation in such
cases.)
The new `ActiveBorrows` analysis needs to take the `Reservations`
results as an initial input, because the reservation state influences
the gen/kill sets for `ActiveBorrows`. In particular, a use of `a`
activates a borrow `a = &b` if and only if there exists a path (in the
control flow graph) from the borrow to that use. So we need to know if
the borrow reaches a given use to know if it really gets a gen-bit or
not.
* Incorporating the output from one dataflow analysis into the input
of another required more changes to the infrastructure than I had
expected, and even after those changes, the resulting code is still
a bit subtle.
* In particular, Since we need to know the intrablock reservation
state, we need to dynamically update a bitvector for the
reservations as we are also trying to compute the gen/kills
bitvector for the active borrows.
* The way I ended up deciding to do this (after also toying with at
least two other designs) is to put both the reservation state and
the active borrow state into a single bitvector. That is why we now
have separate (but related) `BorrowIndex` and
`ReserveOrActivateIndex`: each borrow index maps to a pair of
neighboring reservation and activation indexes.
As noted above, these changes are solely adding the active borrows
dataflow analysis (and updating the existing code to cope with the
switch from `Borrows` to `Reservations`). The code to process the
bitvector in the borrow checker currently just skips over all of the
active borrow bits.
But atop this commit, one *can* observe the analysis results by
looking at the graphviz output, e.g. via
```rust
#[rustc_mir(borrowck_graphviz_preflow="pre_two_phase.dot",
borrowck_graphviz_postflow="post_two_phase.dot")]
```
Includes doc for `FindPlaceUses`, as well as `Reservations` and
`ActiveBorrows` structs, which are wrappers are the `Borrows` struct
that dictate which flow analysis should be performed.