158f00a1c5
Optimize `VecDeque::drain` for (half-)open ranges
The most common uses of `VecDeque::drain` consume either the entire queue or elements from the front or back.[^1] This PR makes these operations faster by improving the code generated for the destructor of the `Drain` iterator:
* `.drain(..)` is now the same as `.clear()`.
* `.drain(n..)` is now (almost[^2]) the same as `.truncate(n)`.
* `.drain(..n)` is now an efficient "advance" function. There is no dedicated method for this operation, and optimizing it is my main motivation for this PR.
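For reference, here is a small sketch of the three special cases next to the operations they correspond to. The helper names (`drain_all`, `drain_back`, `advance_front`) are hypothetical, and the code only checks the observable contents of the queue, not the generated machine code or the internal head position.

```rust
use std::collections::VecDeque;

/// `drain(..)`: removes every element; the remaining contents match `clear()`.
fn drain_all<T>(q: &mut VecDeque<T>) -> Vec<T> {
    q.drain(..).collect()
}

/// `drain(n..)`: removes everything from index `n` onward; the remaining
/// contents match `truncate(n)`. Panics if `n > q.len()`.
fn drain_back<T>(q: &mut VecDeque<T>, n: usize) -> Vec<T> {
    q.drain(n..).collect()
}

/// `drain(..n)`: "advances" the front by `n` elements. The `Drain` iterator is
/// dropped unconsumed, which still removes the range. Panics if `n > q.len()`.
fn advance_front<T>(q: &mut VecDeque<T>, n: usize) {
    q.drain(..n);
}
```

Unlike `truncate(n)`, which is a no-op when `n` exceeds the length, both `drain` forms panic on an out-of-bounds range.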
Previously, all of these cases generated a function call to the destructor of the `DropGuard`, emitting a lot of unused machine code as well as unnecessary branches and loads/stores of stack variables.
There are no algorithmic changes in this PR, but it simplifies the code enough to allow LLVM to recognize the special cases and optimize accordingly. Most notably, it allows elimination of the rather large [`wrap_copy`] function.
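The special cases follow from the bookkeeping the destructor does once the drained elements are gone: the kept elements before the range and after it must be joined, and if either side is empty, no element has to move. Below is a minimal sketch of that decision in isolation, assuming a simplified ring-buffer model (`head` is the physical index of the first element, `cap` the buffer capacity). `Fixup` and `plan_fixup` are hypothetical names, and the tie-breaking rule for equal-length sides is an assumption, not taken from the standard library source.

```rust
/// What has to happen to close the gap left by the drained range.
/// Hypothetical model for illustration; not the `std` implementation.
#[derive(Debug, PartialEq)]
pub enum Fixup {
    /// Everything was drained: nothing moves, head is reset to 0 (like `clear`).
    Reset,
    /// A prefix was drained (`drain(..n)`): only the head index advances.
    AdvanceHead,
    /// A suffix was drained (`drain(n..)`): only the length shrinks (like `truncate`).
    Truncate,
    /// Middle range, front side not longer: move `count` front elements forward.
    MoveFront { count: usize },
    /// Middle range, back side shorter: move `count` back elements backward.
    MoveBack { count: usize },
}

/// Plans the fixup after draining logical range `start..end` from a ring buffer
/// holding `len` elements. Returns the fixup plus the new `(head, len)`.
pub fn plan_fixup(head: usize, cap: usize, len: usize, start: usize, end: usize) -> (Fixup, usize, usize) {
    assert!(head < cap && len <= cap, "invalid ring buffer state");
    assert!(start <= end && end <= len, "drain range out of bounds");

    let drain_len = end - start;
    let head_len = start; // kept elements before the drained range
    let tail_len = len - end; // kept elements after the drained range
    let new_len = head_len + tail_len;
    // Physical index reached by moving `by` slots forward, wrapping around.
    let advance = |by: usize| (head + by) % cap;

    match (head_len, tail_len) {
        (0, 0) => (Fixup::Reset, 0, 0),
        (0, _) => (Fixup::AdvanceHead, advance(drain_len), new_len),
        (_, 0) => (Fixup::Truncate, head, new_len),
        // Assumption: on a tie, this sketch moves the front side.
        (h, t) if h <= t => (Fixup::MoveFront { count: h }, advance(drain_len), new_len),
        (_, t) => (Fixup::MoveBack { count: t }, head, new_len),
    }
}
```

In this model, the first three arms (full range, prefix, suffix) never move an element; only the last two need a copy routine such as `wrap_copy`.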
Some [rudimentary microbenchmarks][benches] show a performance improvement of **~3x-4x** on my machine for the special cases and roughly equal performance for the general case.
Best reviewed commit by commit.
[^1]: Source: GitHub code search: [full range `drain(..)` = 7.5k results][full], [from front `drain(..n)` = 3.2k results][front], [from back `drain(n..)` = 1.6k results][back], [from middle `drain(n..m)` = <500 results][middle]
[^2]: `.drain(0..)` and `.clear()` reset the head to 0, but `.truncate(0)` does not.
[full]: https://github.com/search?type=code&q=%2FVecDeque%28.%7C%5Cn%29%2B%5C.drain%5C%280%3F%5C.%5C.%5C%29%2F+lang%3ARust
[front]: https://github.com/search?type=code&q=%2FVecDeque%28.%7C%5Cn%29%2B%5C.drain%5C%280%3F%5C.%5C.%5B%5E%29%5D.*%5C%29%2F+lang%3ARust
[back]: https://github.com/search?type=code&q=%2FVecDeque%28.%7C%5Cn%29%2B%5C.drain%5C%28%5B%5E0%5D.*%5C.%5C.%5C%29%2F+lang%3ARust
[middle]: https://github.com/search?type=code&q=%2FVecDeque%28.%7C%5Cn%29%2B%5C.drain%5C%28%5B%5E0%5D.*%5C.%5C.%5B%5E%29%5D.*%5C%29%2F+lang%3ARust
[`wrap_copy`]:
The files here use the LLVM FileCheck framework, documented at https://llvm.org/docs/CommandGuide/FileCheck.html.
One extension worth noting is the use of revisions as custom prefixes for FileCheck. If your codegen test has different behavior based on the chosen target or different compiler flags that you want to exercise, you can use a revisions annotation, like so:
```rust
// revisions: aaa bbb
// [bbb] compile-flags: --flags-for-bbb
```
After specifying those variations, you can write different expected, or explicitly unexpected, output by using `<prefix>-SAME:` and `<prefix>-NOT:`, like so:
```rust
// CHECK: expected code
// aaa-SAME: emitted-only-for-aaa
// aaa-NOT: emitted-only-for-bbb
// bbb-NOT: emitted-only-for-aaa
// bbb-SAME: emitted-only-for-bbb
```