Avoid most allocations in `Canonicalizer`.
Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:
indices
.entry(kind)
.or_insert_with(|| {
let cvar1 = variables.push(info);
let cvar2 = var_values.push(kind);
assert_eq!(cvar1, cvar2);
cvar1
})
.clone()
`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.
In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)
The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.
All this speeds up a number of NLL "check" builds, the best by 2%.
r? @nikomatsakis
Extra allocations are a significant cost of NLL, and the most common
ones come from within `Canonicalizer`. In particular, `canonical_var()`
contains this code:
indices
.entry(kind)
.or_insert_with(|| {
let cvar1 = variables.push(info);
let cvar2 = var_values.push(kind);
assert_eq!(cvar1, cvar2);
cvar1
})
.clone()
`variables` and `var_values` are `Vec`s. `indices` is a `HashMap` used
to track what elements have been inserted into `var_values`. If `kind`
hasn't been seen before, `indices`, `variables` and `var_values` all get
a new element. (The number of elements in each container is always the
same.) This results in lots of allocations.
In practice, most of the time these containers only end up holding a few
elements. This PR changes them to avoid heap allocations in the common
case, by changing the `Vec`s to `SmallVec`s and only using `indices`
once enough elements are present. (When the number of elements is small,
a direct linear search of `var_values` is as good or better than a
hashmap lookup.)
The changes to `variables` are straightforward and contained within
`Canonicalizer`. The changes to `indices` are more complex but also
contained within `Canonicalizer`. The changes to `var_values` are more
intrusive because they require defining a new type
`SmallCanonicalVarValues` -- which is to `CanonicalVarValues` as
`SmallVec` is to `Vec -- and passing stack-allocated values of that type
in from outside.
All this speeds up a number of NLL "check" builds, the best by 2%.
This new query returns only the predicates *directly defined* on an
item (in contrast to the more common `predicates_of`, which returns
the predicates that must be proven to reference an item). These two
sets are almost always identical except for traits, where
`predicates_of` includes an artificial `Self: Trait<...>` predicate
(basically saying that you cannot use a trait item without proving
that the trait is implemented for the type parameters).
This new query is only used in chalk lowering, where this artificial
`Self: Trait` predicate is problematic. We encode it in metadata but
only where needed since it is kind of repetitive with existing
information.
Co-authored-by: Tyler Mandry <tmandry@gmail.com>
This requires us to allocate a single entry vector we didn't use to
allocate. I doubt this makes a difference in practice, as this only
occurs for cache misses.