Auto merge of #111296 - Sp00ph:const_gcd, r=nagisa,Mark-Simulacrum

Always const-evaluate the GCD in `slice::align_to_offsets`

Use an inline `const`-block to force the compiler to calculate the GCD at compile time, even in debug mode. This shouldn't affect the behavior of the program at all, but it drastically cuts down on the number of instructions emitted with optimizations disabled.
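
A minimal sketch of the difference, assuming a toolchain where inline `const` blocks are available (stable since Rust 1.79). The helper names `size_gcd_runtime` and `size_gcd_const` are hypothetical and are not part of `core`; they only contrast an ordinary call with one wrapped in a `const` block:

```rust
use std::mem::size_of;

// Recursive Euclidean GCD; `const fn` so it can run during compilation.
const fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

// Hypothetical helper: an ordinary call. Without optimizations the compiler
// is free to emit this as a real call that runs at run time.
fn size_gcd_runtime<T, U>() -> usize {
    gcd(size_of::<T>(), size_of::<U>())
}

// Hypothetical helper: the same call inside an inline `const` block, which
// must be evaluated at compile time for each instantiation of `T` and `U`.
fn size_gcd_const<T, U>() -> usize {
    const { gcd(size_of::<T>(), size_of::<U>()) }
}

fn main() {
    // Both forms compute the same value; only the time of evaluation differs.
    assert_eq!(size_gcd_runtime::<u8, u128>(), size_gcd_const::<u8, u128>());
    assert_eq!(size_gcd_const::<u32, u64>(), 4);
}
```

The result is identical either way, which matches the claim that program behavior should not change.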

With the current implementation, a single `slice::align_to` instantiation (specifically `<[u8]>::align_to::<u128>()`) generates 676 instructions on x86-64. Forcing the GCD computation to be const cuts that down to 327 instructions, a reduction of just over 50%. This is obviously not representative of actual runtime gains, but I still see it as a significant win as long as it doesn't degrade compile times.

No longer having to rely on LLVM to const-evaluate the GCD function also lets it use the textbook recursive Euclidean algorithm instead of a much more complicated iterative implementation with multiple `unsafe` blocks.
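
To illustrate that the two algorithms are interchangeable, here is a sketch comparing the recursive Euclidean version with an iterative binary (Stein's) GCD. The names `gcd_euclid` and `gcd_stein` are made up for this example, and the Stein variant uses the safe `usize::trailing_zeros` instead of the `cttz_nonzero` intrinsic, so it is not the code that was removed:

```rust
// Textbook recursive Euclidean algorithm, usable in const contexts.
const fn gcd_euclid(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd_euclid(b, a % b) }
}

// Iterative binary (Stein's) GCD, written with safe `trailing_zeros`.
fn gcd_stein(a: usize, b: usize) -> usize {
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // Largest power of two dividing both inputs.
    let k = a.trailing_zeros().min(b.trailing_zeros());
    let mut a = a >> a.trailing_zeros();
    let mut b = b;
    loop {
        // Remove all factors of 2 from b (b is non-zero here).
        b >>= b.trailing_zeros();
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        b -= a;
        if b == 0 {
            break;
        }
    }
    a << k
}

// Initializing a `const` item forces compile-time evaluation.
const G: usize = gcd_euclid(48, 18);

fn main() {
    assert_eq!(G, 6);
    // Both implementations agree on a small exhaustive range.
    for a in 0..64 {
        for b in 0..64 {
            assert_eq!(gcd_euclid(a, b), gcd_stein(a, b));
        }
    }
}
```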
bors 2023-05-08 23:47:39 +00:00
commit 90c02c1bc1

@@ -3478,44 +3478,13 @@ fn align_to_offsets<U>(&self) -> (usize, usize) {
         // Ts = size_of::<U> / gcd(size_of::<T>, size_of::<U>)
         //
         // Luckily since all this is constant-evaluated... performance here matters not!
-        #[inline]
-        fn gcd(a: usize, b: usize) -> usize {
-            use crate::intrinsics;
-            // iterative steins algorithm
-            // We should still make this `const fn` (and revert to recursive algorithm if we do)
-            // because relying on llvm to consteval all this is… well, it makes me uncomfortable.
-            // SAFETY: `a` and `b` are checked to be non-zero values.
-            let (ctz_a, mut ctz_b) = unsafe {
-                if a == 0 {
-                    return b;
-                }
-                if b == 0 {
-                    return a;
-                }
-                (intrinsics::cttz_nonzero(a), intrinsics::cttz_nonzero(b))
-            };
-            let k = ctz_a.min(ctz_b);
-            let mut a = a >> ctz_a;
-            let mut b = b;
-            loop {
-                // remove all factors of 2 from b
-                b >>= ctz_b;
-                if a > b {
-                    mem::swap(&mut a, &mut b);
-                }
-                b = b - a;
-                // SAFETY: `b` is checked to be non-zero.
-                unsafe {
-                    if b == 0 {
-                        break;
-                    }
-                    ctz_b = intrinsics::cttz_nonzero(b);
-                }
-            }
-            a << k
+        const fn gcd(a: usize, b: usize) -> usize {
+            if b == 0 { a } else { gcd(b, a % b) }
         }
-        let gcd: usize = gcd(mem::size_of::<T>(), mem::size_of::<U>());
+        // Explicitly wrap the function call in a const block so it gets
+        // constant-evaluated even in debug mode.
+        let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
         let ts: usize = mem::size_of::<U>() / gcd;
         let us: usize = mem::size_of::<T>() / gcd;
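
As a worked example of the `ts`/`us` arithmetic shown in the hunk, the sketch below recomputes the two ratios in a stand-alone function. The name `ratio` is hypothetical, and the sketch assumes both types have a non-zero size (with two zero-sized types the GCD is 0 and the division would panic):

```rust
use std::mem::size_of;

const fn gcd(a: usize, b: usize) -> usize {
    if b == 0 { a } else { gcd(b, a % b) }
}

// Hypothetical stand-alone version of the ratio computation:
// returns (ts, us) such that `ts` values of `T` span exactly as many
// bytes as `us` values of `U`. Assumes non-zero-sized `T` and `U`.
fn ratio<T, U>() -> (usize, usize) {
    let g = gcd(size_of::<T>(), size_of::<U>());
    let ts = size_of::<U>() / g;
    let us = size_of::<T>() / g;
    (ts, us)
}

fn main() {
    // size_of::<u32>() == 4, size_of::<[u16; 3]>() == 6, gcd == 2.
    let (ts, us) = ratio::<u32, [u16; 3]>();
    assert_eq!((ts, us), (3, 2));
    // 3 * 4 bytes == 2 * 6 bytes: both spans cover 12 bytes.
    assert_eq!(ts * size_of::<u32>(), us * size_of::<[u16; 3]>());

    // The instantiation from the commit message: u8 -> u128.
    assert_eq!(ratio::<u8, u128>(), (16, 1));
}
```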