Make CharSplitIterator double-ended which is simple given that the operation is symmetric, once the split-N feature is factored out into its own adaptor.
`.rsplitn_iter()` allows splitting `N` times from the back of a string, so it is a completely new feature. With the double-ended impl, `.split_iter()`, `.line_iter()`, `.word_iter()` all allow picking off elements from either end.
`split_options_iter` is removed with the factoring of the split- and split-N- iterators, instead there is `split_terminator_iter`.
---
Add benchmarks using `#[bench]` and tune CharSplitIterator a bit after Huon Wilson's suggestions
Benchmarks 1-5 do the same split using different implementations of `CharEq`, all splitting an ascii string on ascii space. Benchmarks 6-7 split a unicode string on an ascii char.
Before this PR
test str::bench::split_iter_ascii ... bench: 166 ns/iter (+/- 2)
test str::bench::split_iter_closure ... bench: 113 ns/iter (+/- 1)
test str::bench::split_iter_extern_fn ... bench: 286 ns/iter (+/- 7)
test str::bench::split_iter_not_ascii ... bench: 114 ns/iter (+/- 4)
test str::bench::split_iter_slice ... bench: 220 ns/iter (+/- 12)
test str::bench::split_iter_unicode_ascii ... bench: 217 ns/iter (+/- 3)
test str::bench::split_iter_unicode_not_ascii ... bench: 248 ns/iter (+/- 3)
PR, first commit
test str::bench::split_iter_ascii ... bench: 331 ns/iter (+/- 9)
test str::bench::split_iter_closure ... bench: 114 ns/iter (+/- 2)
test str::bench::split_iter_extern_fn ... bench: 314 ns/iter (+/- 6)
test str::bench::split_iter_not_ascii ... bench: 132 ns/iter (+/- 1)
test str::bench::split_iter_slice ... bench: 157 ns/iter (+/- 3)
test str::bench::split_iter_unicode_ascii ... bench: 502 ns/iter (+/- 64)
test str::bench::split_iter_unicode_not_ascii ... bench: 250 ns/iter (+/- 3)
PR, final version
test str::bench::split_iter_ascii ... bench: 106 ns/iter (+/- 4)
test str::bench::split_iter_closure ... bench: 107 ns/iter (+/- 1)
test str::bench::split_iter_extern_fn ... bench: 267 ns/iter (+/- 6)
test str::bench::split_iter_not_ascii ... bench: 108 ns/iter (+/- 1)
test str::bench::split_iter_slice ... bench: 170 ns/iter (+/- 8)
test str::bench::split_iter_unicode_ascii ... bench: 128 ns/iter (+/- 5)
test str::bench::split_iter_unicode_not_ascii ... bench: 252 ns/iter (+/- 3)
---
There are several ways to deal with `CharEq::only_ascii`. It is a performance optimization, so with that in mind, we allow passing bogus char (outside ascii) as long as they don't match. We use a byte value check to make sure we don't split on these (would split substrings in the middle of encoded char). (A more principled way would be to only pass the ascii codepoints to the CharEq when it indicates only_ascii, but that undoes some of the performance optimization.)
Implement Huon Wilson's suggestions (since the benchmarks agree!).
Use `self.sep.matches(byte as char) && byte < 128u8` to match in the
only_ascii case so that mistaken matches outside the ascii range can't
create invalid substrings.
Put the conditional on only_ascii outside the loop.
Add new methods `.rsplit_iter()` and `.rsplitn_iter()` for &str.
Separate out CharSplitIterator and CharSplitNIterator,
CharSplitIterator (`split_iter` and `rsplit_iter`) is made double-ended
while `splitn_iter` and `rsplitn_iter` (limited to N splits) are not,
since these don't have the same symmetry.
With CharSplitIterator being double ended, derived iterators like
`line_iter` and `word_iter` are too.
Recent improvements to `&mut Trait` have made this work possible, and it solidifies that `ifmt` doesn't always have to return a string, but rather it's based around writers.
The method names in std::rt::io::extensions::WriterByteConversions are
the same as those in std::io::WriterUtils and a resolve error causes
rustc to fail after trying to find an impl of io::Writer instead of
trying to look for rt::io::Writer as well.
These aren't used for anything at the moment and cause some TLS hits
on some perf-critical code paths. Will need to put better thought into
it in the future.
This documents how to use trait bounds in a (hopefully) user-friendly way, in the containers tutorial, and also documents the task watching implementation for runtime developers in kill.rs.
r anybody
Naturally, and sadly, turning off sanity checks in the runtime is
a noticable performance win. The particular test I'm running goes from
~1.5 s to ~1.3s.
Sanity checks are turned *on* when not optimizing, or when cfg
includes `rtdebug` or `rtassert`.
Monomorphize's normalization results in a 2% decrease in non-optimized
code size for libstd, so there's a negligible cost to removing it. This
also fixes several visit glue bugs because normalize wasn't considering
the differences in visit glue between types.
Closes#8720
The method names in std::rt::io::extensions::WriterByteConversions are
the same as those in std::io::WriterUtils and a resolve error causes
rustc to fail after trying to find an impl of io::Writer instead of
trying to look for rt::io::Writer as well.
Same goes for ReaderByteConversions.
This resolves issue #908.
Notable changes:
- On Windows, LLVM integrated assembler emits bad stack unwind tables when segmented stacks are enabled. However, unwind info directives in the assembly output are correct, so we generate assembly first and then run it through an external assembler, just like it is already done for Android builds.
- Linker is invoked via "g++" command instead of "gcc": g++ passes the appropriate magic parameters to the linker, which ensure correct registration of stack unwind tables in dynamic libraries.