Previously, some parts of this optimization were impossible because the
alignment passed to the free function was not correct. That was fully
fixed by #17012.
Closes#17092
Previously, some parts of this optimization were impossible because the
alignment passed to the free function was not correct. That was fully
fixed by #17012.
Closes#17092
Based on an observation that strings and arguments are always interleaved, thanks to #15832. Additionally optimize invocations where formatting parameters are unspecified for all arguments, e.g. `"{} {:?} {:x}"`, by emptying the `__STATIC_FMTARGS` array. Next, `Arguments::new` replaces an empty slice with `None` so that passing empty `__STATIC_FMTARGS` generates slightly less machine code when `Arguments::new` is inlined. Furthermore, formatting itself treats these cases separately without making redundant copies of formatting parameters.
All in all, this adds a single mov instruction per `write!` in most cases. That's why code size has increased.
Format specs are ignored and not stored in case they're all default.
Restore default formatting parameters during iteration.
Pass `None` instead of empty slices of format specs to take advantage
of non-nullable pointer optimization.
Generate a call to one of two functions of `fmt::Argument`.
rand: inform the optimiser that indexing is never out-of-bounds.
This uses a bitwise mask to ensure that there's no bounds checking for
the array accesses when generating the next random number. This isn't
costless, but the single instruction is nothing compared to the branch.
A `debug_assert` for "bounds check" is preserved to ensure that
refactoring doesn't accidentally break it (i.e. create values of `cnt`
that are out of bounds with the masking causing it to silently wrap-
around).
Before:
test test::rand_isaac ... bench: 990 ns/iter (+/- 24) = 808 MB/s
test test::rand_isaac64 ... bench: 614 ns/iter (+/- 25) = 1302 MB/s
After:
test test::rand_isaac ... bench: 877 ns/iter (+/- 134) = 912 MB/s
test test::rand_isaac64 ... bench: 470 ns/iter (+/- 30) = 1702 MB/s
(It also removes the unsafe code in Isaac64Rng.next_u64, with a *gain*
in performance; today is a good day.)
The performance hit from these checks is significant, but unoptimized
builds are already incredibly slow. Enabling these checks results in
better test coverage since there are bots doing unoptimized builds, and
the cost is relatively small in the context of an unoptimized build.
This also allows using `JEMALLOC_FLAGS` to override the default
configure flags.
instead of prefix `..`.
This breaks code that looked like:
match foo {
[ first, ..middle, last ] => { ... }
}
Change this code to:
match foo {
[ first, middle.., last ] => { ... }
}
RFC #55.
Closes#16967.
[breaking-change]
I've found that 64k is still too much and continue to see the errors as reported
in #14940. I've locally found that 32k fails, and 24k succeeds, so I've trimmed
the size down to 10000 which the included links in the added comment end up
recommending.
It sounds like the limit can still be hit with many threads in play, but I have
yet to reproduce this, so I figure we can wait until that's hit (if it's
possible) and then take action.