Add slice::sort_by_cached_key as a memoised sort_by_key
At present, `slice::sort_by_key` calls its key function twice for each comparison that is made. When the key function is expensive (which can often be the case when `sort_by_key` is chosen over `sort_by`), this can lead to very suboptimal behaviour.
To address this, I've introduced a new slice method, `sort_by_cached_key`, which has identical semantic behaviour to `sort_by_key`, except that it guarantees the key function will only be called once per element.
Where there are `n` elements and the key function is `O(m)`:
- `slice::sort_by_cached_key` time complexity is `O(m n log m n)`, compared to `slice::sort_by_key`'s `O(m n + n log n)`.
- `slice::sort_by_cached_key` space complexity remains at `O(n + m)`. (Technically, it now reserves a slice of size `n`, whereas before it reserved a slice of size `n/2`.)
`slice::sort_unstable_by_key` has not been given an analogue, as it is important that unstable sorts are in-place, which is not a property that is guaranteed here. However, this also means that `slice::sort_unstable_by_key` is likely to be slower than `slice::sort_by_cached_key` when the key function does not have negligible complexity. We might want to explore this trade-off further in the future.
Benchmarks (for a vector of 100 `i32`s):
```
# Lexicographic: `|x| x.to_string()`
test bench_sort_by_key ... bench: 112,638 ns/iter (+/- 19,563)
test bench_sort_by_cached_key ... bench: 15,038 ns/iter (+/- 4,814)
# Identity: `|x| *x`
test bench_sort_by_key ... bench: 1,346 ns/iter (+/- 238)
test bench_sort_by_cached_key ... bench: 1,839 ns/iter (+/- 765)
# Power: `|x| x.pow(31)`
test bench_sort_by_key ... bench: 3,624 ns/iter (+/- 738)
test bench_sort_by_cached_key ... bench: 1,997 ns/iter (+/- 311)
# Abs: `|x| x.abs()`
test bench_sort_by_key ... bench: 1,546 ns/iter (+/- 174)
test bench_sort_by_cached_key ... bench: 1,668 ns/iter (+/- 790)
```
(So it seems functions that are single operations do perform slightly worse with this method, but for pretty much any more complex key, you're better off with this optimisation.)
I've definitely found myself using expensive keys in the past and wishing this optimisation was made (e.g. for https://github.com/rust-lang/rust/pull/47415). This feels like both desirable and expected behaviour, at the small cost of slightly more stack allocation and minute degradation in performance for extremely trivial keys.
Resolves#34447.