The bodies of these benchmarks are nearly empty, but not literally
empty. As a result, the measured runtime of the benchmarks (which are
compiled without optimizations!) flickered between 9 ns and 10 ns (one
digit versus two), which changes the padding and breaks the test.
Recent changes to the standard library have pushed the unoptimized
runtime closer to 10 ns, which is why we haven't seen such failures in
CI before.

Before this PR, contributors can also induce such failures by running
the run-make tests while the system is under heavy load.
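
To illustrate why a 9 ns versus 10 ns flicker can change the padding,
here is a minimal sketch. It assumes the runtime is right-aligned in a
fixed-width column. The helper names, the column width of 11, and the
exact line layout are hypothetical and are not taken from libtest.

```rust
/// Hypothetical formatter (not libtest's): right-align the nanosecond
/// count in an assumed fixed-width column of 11 characters.
fn format_bench_line(name: &str, ns: u64) -> String {
    format!("test {name} ... bench: {ns:>11} ns/iter")
}

/// Count the padding spaces that precede the number in a formatted line.
fn padding_before_number(line: &str) -> usize {
    let after_label = line
        .split("bench: ")
        .nth(1)
        .expect("line should contain the `bench: ` label");
    after_label.chars().take_while(|c| *c == ' ').count()
}

fn main() {
    let one_digit = format_bench_line("empty_bench", 9);
    let two_digits = format_bench_line("empty_bench", 10);
    println!("{one_digit}");
    println!("{two_digits}");

    // The two lines differ in whitespace, not only in digits, so a test
    // that normalizes just the digits would still see a difference.
    assert_ne!(
        padding_before_number(&one_digit),
        padding_before_number(&two_digits)
    );
}
```

Under these assumptions, an expected-output file recorded with a
one-digit runtime no longer matches once the runtime needs two digits.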
Before this fix, we applied padding first and then manually did what
`convert_benchmarks_to_tests()` does. Instead, call
`convert_benchmarks_to_tests()` first, if applicable, and apply padding
afterwards so that the padding is correct. (Benches should only be
padded when run as benches to make it easy to compare the benchmark
numbers.)
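
The ordering described above can be sketched with a toy model. The
`Kind`, `Item`, `to_tests`, `pad_width`, and `plan` names are all
hypothetical stand-ins and do not reflect libtest's actual types or the
real signature of `convert_benchmarks_to_tests()`. The sketch also
assumes that the padding width is the length of the longest bench name.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Test,
    Bench,
}

#[derive(Debug, Clone)]
struct Item {
    name: String,
    kind: Kind,
}

/// Toy stand-in for the conversion step: every bench becomes a test.
fn to_tests(items: Vec<Item>) -> Vec<Item> {
    items
        .into_iter()
        .map(|mut item| {
            item.kind = Kind::Test;
            item
        })
        .collect()
}

/// Assumed padding rule: pad to the longest bench name, or use no
/// padding at all when there are no benches.
fn pad_width(items: &[Item]) -> usize {
    items
        .iter()
        .filter(|item| item.kind == Kind::Bench)
        .map(|item| item.name.len())
        .max()
        .unwrap_or(0)
}

/// Convert first (if benches run as tests), then compute the padding.
fn plan(items: Vec<Item>, run_as_benches: bool) -> (Vec<Item>, usize) {
    let items = if run_as_benches { items } else { to_tests(items) };
    let width = pad_width(&items);
    (items, width)
}

fn main() {
    let items = vec![
        Item { name: "short".to_string(), kind: Kind::Test },
        Item { name: "a_much_longer_bench".to_string(), kind: Kind::Bench },
    ];

    let (_, as_benches) = plan(items.clone(), true);
    let (_, as_tests) = plan(items, false);
    println!("padding when run as benches: {as_benches}");
    println!("padding when run as tests:   {as_tests}");
}
```

Computing the width after the conversion means that benches run as
tests contribute no padding, which matches the rule in the parenthesis
above. Computing it before the conversion would pad for bench names
that are no longer reported as benches.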
As you can see, the padding is wrong when benches are run as tests.
This will be fixed in the next commit. (Benches should only be padded
when run as benches to make it easy to compare the benchmark numbers.)