540d98e7fc
Make CharSplitIterator double-ended, which is simple given that the operation is symmetric once the split-N feature is factored out into its own adaptor.

`.rsplitn_iter()` allows splitting `N` times from the back of a string, so it is a completely new feature. With the double-ended impl, `.split_iter()`, `.line_iter()` and `.word_iter()` all allow picking off elements from either end.

`split_options_iter` is removed with the factoring of the split and split-N iterators; instead there is `split_terminator_iter`.

---

Add benchmarks using `#[bench]` and tune CharSplitIterator a bit after Huon Wilson's suggestions.

Benchmarks 1-5 do the same split using different implementations of `CharEq`, all splitting an ascii string on ascii space. Benchmarks 6-7 split a unicode string on an ascii char.

Before this PR:

    test str::bench::split_iter_ascii ... bench: 166 ns/iter (+/- 2)
    test str::bench::split_iter_closure ... bench: 113 ns/iter (+/- 1)
    test str::bench::split_iter_extern_fn ... bench: 286 ns/iter (+/- 7)
    test str::bench::split_iter_not_ascii ... bench: 114 ns/iter (+/- 4)
    test str::bench::split_iter_slice ... bench: 220 ns/iter (+/- 12)
    test str::bench::split_iter_unicode_ascii ... bench: 217 ns/iter (+/- 3)
    test str::bench::split_iter_unicode_not_ascii ... bench: 248 ns/iter (+/- 3)

PR, first commit:

    test str::bench::split_iter_ascii ... bench: 331 ns/iter (+/- 9)
    test str::bench::split_iter_closure ... bench: 114 ns/iter (+/- 2)
    test str::bench::split_iter_extern_fn ... bench: 314 ns/iter (+/- 6)
    test str::bench::split_iter_not_ascii ... bench: 132 ns/iter (+/- 1)
    test str::bench::split_iter_slice ... bench: 157 ns/iter (+/- 3)
    test str::bench::split_iter_unicode_ascii ... bench: 502 ns/iter (+/- 64)
    test str::bench::split_iter_unicode_not_ascii ... bench: 250 ns/iter (+/- 3)

PR, final version:

    test str::bench::split_iter_ascii ... bench: 106 ns/iter (+/- 4)
    test str::bench::split_iter_closure ... bench: 107 ns/iter (+/- 1)
    test str::bench::split_iter_extern_fn ... bench: 267 ns/iter (+/- 6)
    test str::bench::split_iter_not_ascii ... bench: 108 ns/iter (+/- 1)
    test str::bench::split_iter_slice ... bench: 170 ns/iter (+/- 8)
    test str::bench::split_iter_unicode_ascii ... bench: 128 ns/iter (+/- 5)
    test str::bench::split_iter_unicode_not_ascii ... bench: 252 ns/iter (+/- 3)

---

There are several ways to deal with `CharEq::only_ascii`. It is a performance optimization, so with that in mind, we allow passing bogus chars (outside ascii) to the `CharEq` as long as they don't match. We use a byte value check to make sure we never split on these, since doing so would split substrings in the middle of an encoded char. (A more principled way would be to only pass the ascii codepoints to the `CharEq` when it indicates `only_ascii`, but that undoes some of the performance optimization.)
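To illustrate why the split becomes symmetric once split-N lives in its own adaptor, here is a minimal sketch in current Rust. It is not the code from this commit. `CharSplit`, `SplitN`, `splitn` and `rsplitn` are hypothetical names, and modern std already provides `str::split`, `str::rsplitn` and `str::split_terminator`. Unlike today's `str::splitn`, whose argument is the maximum number of pieces, `count` here is the number of splits.

```rust
/// Hypothetical sketch: splits a `&str` on one `char`, from either end.
pub struct CharSplit<'a> {
    rest: &'a str,  // text not yet handed out by either end
    sep: char,
    finished: bool, // true once the final piece has been yielded
}

impl<'a> CharSplit<'a> {
    pub fn new(s: &'a str, sep: char) -> Self {
        CharSplit { rest: s, sep, finished: false }
    }

    /// Hands out whatever is left unsplit (also used by the split-N adaptor).
    fn remainder(&mut self) -> Option<&'a str> {
        if self.finished {
            None
        } else {
            self.finished = true;
            Some(self.rest)
        }
    }
}

impl<'a> Iterator for CharSplit<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.finished {
            return None;
        }
        let rest = self.rest;
        match rest.find(self.sep) {
            // Peel the piece before the first separator off the front.
            Some(i) => {
                self.rest = &rest[i + self.sep.len_utf8()..];
                Some(&rest[..i])
            }
            // No separator left: what remains is the last piece.
            None => self.remainder(),
        }
    }
}

impl<'a> DoubleEndedIterator for CharSplit<'a> {
    // Mirror image of `next`: `rfind` instead of `find`, keep the front part.
    fn next_back(&mut self) -> Option<&'a str> {
        if self.finished {
            return None;
        }
        let rest = self.rest;
        match rest.rfind(self.sep) {
            Some(i) => {
                self.rest = &rest[..i];
                Some(&rest[i + self.sep.len_utf8()..])
            }
            None => self.remainder(),
        }
    }
}

/// Hypothetical split-N adaptor: performs at most `count` splits, from the
/// front or (if `reverse`) from the back, then yields the unsplit remainder.
pub struct SplitN<'a> {
    iter: CharSplit<'a>,
    count: usize,
    reverse: bool,
}

impl<'a> Iterator for SplitN<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        if self.count == 0 {
            return self.iter.remainder();
        }
        self.count -= 1;
        if self.reverse {
            self.iter.next_back()
        } else {
            self.iter.next()
        }
    }
}

pub fn splitn(s: &str, sep: char, count: usize) -> SplitN<'_> {
    SplitN { iter: CharSplit::new(s, sep), count, reverse: false }
}

pub fn rsplitn(s: &str, sep: char, count: usize) -> SplitN<'_> {
    SplitN { iter: CharSplit::new(s, sep), count, reverse: true }
}
```

In this sketch, `rsplitn("a,b,c", ',', 1)` yields `"c"` and then `"a,b"`. The only difference from `splitn` is which end of the shared double-ended iterator the adaptor pulls from.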
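The `only_ascii` trade-off can be sketched as follows, assuming a hypothetical `CharMatcher` trait standing in for `CharEq` (the trait, `find_match` and `split_all` are illustrative names, not the commit's API). On the fast path every byte is handed to the matcher as a `char`, so bytes of multi-byte sequences arrive as bogus chars. The `b < 0x80` check only runs after the matcher reports a match, which keeps it off the common no-match path while still preventing a split inside an encoded char.

```rust
/// Hypothetical stand-in for the `CharEq` trait discussed above.
pub trait CharMatcher {
    fn matches(&self, c: char) -> bool;
    /// `true` if `matches` can only return `true` for ASCII chars.
    fn only_ascii(&self) -> bool;
}

impl CharMatcher for char {
    fn matches(&self, c: char) -> bool {
        *self == c
    }
    fn only_ascii(&self) -> bool {
        self.is_ascii()
    }
}

/// Returns (byte offset, byte length) of the first match in `s`.
pub fn find_match<M: CharMatcher>(s: &str, m: &M) -> Option<(usize, usize)> {
    if m.only_ascii() {
        // Fast path: walk raw bytes and pass each one to the matcher as a char.
        // Bytes >= 0x80 become bogus chars (U+0080..=U+00FF) that are not the
        // text's real code points. The byte check after a match guarantees we
        // never split inside a multi-byte UTF-8 sequence.
        s.bytes()
            .position(|b| m.matches(b as char) && b < 0x80)
            .map(|i| (i, 1))
    } else {
        // General path: decode UTF-8 and test the real chars.
        s.char_indices()
            .find(|&(_, c)| m.matches(c))
            .map(|(i, c)| (i, c.len_utf8()))
    }
}

/// Splits `s` on every match of `m`.
pub fn split_all<'a, M: CharMatcher>(mut s: &'a str, m: &M) -> Vec<&'a str> {
    let mut out = Vec::new();
    while let Some((i, len)) = find_match(s, m) {
        out.push(&s[..i]);
        s = &s[i + len..];
    }
    out.push(s);
    out
}
```

The "more principled" variant from the commit message would swap the two conditions to `b < 0x80 && m.matches(b as char)`, so the matcher never sees a bogus char, at the cost of an extra comparison on every byte.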