Tracking issue: #30014
This implements the RFC and makes a few other changes.
I have added a few extra tests, and made the Windows and
Unix code as similar as possible.
Part of the RFC mentions the unstable OpenOptionsExt trait
on Windows (see #27720). I have added a few extra methods
to future-proof it for CreateFile2.
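For illustration, a minimal sketch of the kind of call this enables, assuming the RFC in question is the one that expanded `std::fs::OpenOptions` with exclusive creation. `OpenOptions::create_new` is a real std method today; the helper name `write_new_file` is hypothetical and not part of this PR.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: create `path` exclusively and write `contents` to it.
/// Fails with `ErrorKind::AlreadyExists` if the file is already there.
fn write_new_file(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // "create, but fail if it already exists"
        .open(path)?;
    file.write_all(contents)
}
```

The same call works on Unix and Windows, which is the point of keeping the two implementations as similar as possible.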
(Note that it might be a good idea to replace *all* calls of
`alloc_ty` with calls to `alloc_ty_init`, to encourage programmers to
consider the appropriate value for the `init` flag when creating
temporary values.)
Includes bugfixes pointed out during review:
* Only `call_lifetime_start` for an alloca if the function entry does
not itself initialize it to "dropped."
* Remove `schedule_lifetime_end` after writing an *element* into a
borrowed slice. (As explained by [dotdash][irc], "the lifetime end
that is being removed was for an element in the slice, which is not
an alloca of its own and has no lifetime start of its own")
[irc]: https://botbot.me/mozilla/rust-internals/2016-01-13/?msg=57844504&page=3
It appears this was left out of RFC rust-lang/rfcs#528 because it might be useful to
also generalize the second argument in some way. That doesn't seem to
prevent generalizing the first argument now, however.
This is a [breaking-change] because it could cause type-inference to
fail where it previously succeeded.
Also update docs for a few other methods that still referred to `&str` instead of patterns.
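As a sketch of what the generalized first argument allows: the message above does not name the method, so this assumes it is `str::replace`, which in current Rust accepts any pattern as its first argument. The helper names are hypothetical.

```rust
/// Replace every dash with an underscore, using a `char` pattern.
fn dashes_to_underscores(s: &str) -> String {
    s.replace('-', "_")
}

/// Replace every ASCII digit with `#`, using a closure pattern.
fn mask_digits(s: &str) -> String {
    s.replace(|c: char| c.is_ascii_digit(), "#")
}
```

Because the first parameter becomes generic, a call site that relied on the old `&str` parameter to drive type inference may need an explicit type annotation.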
Add tables of small powers of ten used in the fast path. The tables are redundant: We could also use the big, more accurate table and round the value to the correct type (in fact we did just that before this commit). However, the rounding is extra work and slows down the fast path.
Because only very small exponents enter the fast path, the table and thus the space overhead is negligible. Speed-wise, this is a clear win on a [benchmark] comparing the fast path to a naive, hand-optimized, inaccurate algorithm. Specifically, this change narrows the gap from a roughly 5x difference to a roughly 3.4x difference.
[benchmark]: https://gist.github.com/Veedrac/dbb0c07994bc7882098e
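The idea can be sketched as follows. This is not the code from this commit, only an illustration of the classic fast path for `f64`, assuming the usual limits (mantissa at most 2^53, decimal exponent magnitude at most 22). The names `POW10_F64` and `fast_path_f64` are hypothetical.

```rust
/// Powers of ten that are exactly representable as `f64` (10^0 through 10^22).
const POW10_F64: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

/// Compute `mantissa * 10^exp10` with a single correctly rounded operation,
/// or return `None` when the inputs are outside the fast-path range.
fn fast_path_f64(mantissa: u64, exp10: i32) -> Option<f64> {
    // The mantissa must convert to f64 without rounding.
    if mantissa > (1u64 << 53) {
        return None;
    }
    // Only small exponents have an exact table entry.
    let idx = exp10.unsigned_abs() as usize;
    if idx >= POW10_F64.len() {
        return None;
    }
    let m = mantissa as f64; // exact, see the check above
    Some(if exp10 >= 0 {
        m * POW10_F64[idx]
    } else {
        m / POW10_F64[idx]
    })
}
```

Both operands are exact, so the one multiplication or division rounds only once. A per-type table avoids the extra rounding step that a shared, higher-precision table would need.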
Michael Ellerman pointed out that the system call for getrandom()
on PowerPC Linux is incorrect. This bug was in the powerpc32 port,
and was carried over to the powerpc64 port too.
This speeds up the ASCII case (and long stretches of ASCII in otherwise
mixed UTF-8 data) when checking UTF-8 validity.
Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14!
On XML and mostly English-language input (an en.wikipedia XML dump),
throughput increases by a factor of 7.
On mostly non-ASCII input, performance increases slightly or stays the
same.
The UTF-8 validation is rewritten to use indexed access. Every access is
preceded by a length check (which validation requires anyway), so the
bounds checks are statically elided by LLVM and this formulation is in
fact the best for performance. A previous version had losses due to
slice-to-iterator conversions.
A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.
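A rough sketch of the ASCII shortcut, not the code from this patch: it assumes a word-at-a-time check (eight bytes per step) and only fast-forwards over a leading ASCII run before handing the rest to `std::str::from_utf8`. The function names are hypothetical.

```rust
/// Number of leading ASCII bytes, scanning eight bytes at a time.
fn ascii_prefix_len(bytes: &[u8]) -> usize {
    // A byte is non-ASCII exactly when its high bit is set.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
    let mut i = 0;
    // The length check precedes every indexed access.
    while i + 8 <= bytes.len() {
        let mut word = [0u8; 8];
        word.copy_from_slice(&bytes[i..i + 8]);
        if u64::from_ne_bytes(word) & HIGH_BITS != 0 {
            break; // some byte in this block is non-ASCII
        }
        i += 8;
    }
    // Finish byte by byte up to the first non-ASCII byte.
    while i < bytes.len() && bytes[i] < 0x80 {
        i += 1;
    }
    i
}

/// Validate UTF-8, skipping the ASCII prefix cheaply.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    let skip = ascii_prefix_len(bytes);
    // An ASCII prefix is valid UTF-8 and ends on a character boundary,
    // so only the remainder needs full validation.
    std::str::from_utf8(&bytes[skip..]).is_ok()
}
```

Unlike this sketch, the patch also speeds up ASCII stretches in the middle of mixed input.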
Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.
The old code is `regular`; this PR is `fast`.
Datasets:
- `ascii` is just ascii (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia xml dump
- `enwik8` is 100MB of an en.wikipedia xml dump
- `jawik10` is 10MB of a ja.wikipedia xml dump
```
test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular ... bench: 12,250 ns/iter (+/- 437) = 418 MB/s
test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```
Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
In 8d90d3f36871a00023cc1f313f91e351c287ca15 `BufStream`, the only
consumer of `InternalBufWriter`, was removed. As implied by the name,
this type is private, hence it is currently dead code.
In particular, bring back the `zero` flag for `lvalue_scratch_datum`,
which controls whether the allocas created immediately at function
start are uninitialized at that point or have their embedded
drop-flags initialized to "dropped".
Then make `to_lvalue_datum_in_scope` pass "dropped" as the `zero` flag.
I also re-enabled the use of `#[thread_local]` on AArch64. It was originally disabled in the PR that introduced AArch64 (#19790), but the reasons for this were not explained. `#[thread_local]` seems to work fine in my tests on AArch64, so I don't think this should be an issue.
cc @alexcrichton @akiss77
The `siginfo_si_addr()` function is used once, and the returned value is
cast to `usize`. So make the function return a `usize`.
This simplifies the OpenBSD case, where the return type wouldn't be a `*mut
libc::c_void` but a `*mut libc::c_char`.
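A minimal sketch of why returning `usize` helps. The two structs are hypothetical stand-ins for the platform-specific `siginfo_t` layouts, not the real `libc` definitions; only the field types mirror the ones named above.

```rust
use std::ffi::c_void;
use std::os::raw::c_char;

// Hypothetical stand-in for platforms where the field is `*mut c_void`.
struct SigInfoGeneric {
    si_addr: *mut c_void,
}

// Hypothetical stand-in for OpenBSD, where the field is `*mut c_char`.
struct SigInfoOpenBsd {
    si_addr: *mut c_char,
}

// Both accessors share the same return type, so callers need no
// per-platform pointer type and no cast at the call site.
fn si_addr_generic(info: &SigInfoGeneric) -> usize {
    info.si_addr as usize
}

fn si_addr_openbsd(info: &SigInfoOpenBsd) -> usize {
    info.si_addr as usize
}
```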