Commit Graph

119 Commits

Author SHA1 Message Date
Alex Crichton
ac83242ac9 test: Deny warnings in {core,collections}test
Help cleans up our build a bit and stays in line with the rest of our crates
denying warnings traditionally.
2016-01-30 16:19:37 -08:00
bors
b2f4c5c596 Auto merge of #31224 - bluss:deque-hashing, r=Gankro
Hash VecDeque in its slice parts

Use .as_slices() for a more efficient code path in VecDeque's Hash impl.

This still hashes the elements in the same order.

Before/after timing of VecDeque hashing 1024 elements of u8 and
u64 shows that the vecdeque now can match the Vec
(test_hashing_vec_of_u64 is the Vec run).

```
before

test test_hashing_u64        ... bench:  14,031 ns/iter (+/- 236) = 583 MB/s
test test_hashing_u8         ... bench:   7,887 ns/iter (+/- 65) = 129 MB/s
test test_hashing_vec_of_u64 ... bench:   6,578 ns/iter (+/- 76) = 1245 MB/s

after

running 5 tests
test test_hashing_u64        ... bench:   6,495 ns/iter (+/- 52) = 1261 MB/s
test test_hashing_u8         ... bench:     851 ns/iter (+/- 16) = 1203 MB/s
test test_hashing_vec_of_u64 ... bench:   6,499 ns/iter (+/- 59) = 1260 MB/s
```
2016-01-27 16:25:36 +00:00
Ulrik Sverdrup
d3174ce751 collections: Hash VecDeque in its slice parts
Use .as_slices() for a more efficient code path in VecDeque's Hash impl.

This still hashes the elements in the same order.

Before/after timing of VecDeque hashing 1024 elements of u8 and
u64 shows that the vecdeque now can match the Vec
(test_hashing_vec_of_u64 is the Vec run).

before

test test_hashing_u64        ... bench:  14,031 ns/iter (+/- 236) = 583 MB/s
test test_hashing_u8         ... bench:   7,887 ns/iter (+/- 65) = 129 MB/s
test test_hashing_vec_of_u64 ... bench:   6,578 ns/iter (+/- 76) = 1245 MB/s

after

running 5 tests
test test_hashing_u64        ... bench:   6,495 ns/iter (+/- 52) = 1261 MB/s
test test_hashing_u8         ... bench:     851 ns/iter (+/- 16) = 1203 MB/s
test test_hashing_vec_of_u64 ... bench:   6,499 ns/iter (+/- 59) = 1260 MB/s
2016-01-27 00:04:03 +01:00
Alex Crichton
cb343c33ac Fix warnings during tests
The deny(warnings) attribute is now enabled for tests so we need to weed out
these warnings as well.
2016-01-26 09:29:28 -08:00
Andrew Paseltiner
686be822ef Make btree_set::{IntoIter, Iter, Range} covariant
CC #30642
2016-01-18 07:53:12 -05:00
Jonathan S
fae75c93b3 Fix and test variance of BTreeMap and its companion structs. 2016-01-17 10:08:22 -06:00
Jonathan S
be4128d148 BTreeMap: Add a test for clone 2016-01-16 22:55:56 -06:00
bors
e7e4ecc522 Auto merge of #30740 - bluss:ascii-is-the-best, r=brson
Add fast path for ASCII in UTF-8 validation

This speeds up the ASCII case (and long stretches of ASCII in otherwise
mixed UTF-8 data) when checking UTF-8 validity.

Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14 (smallish input).
On XML and mostly English language input (en.wikipedia XML dump),
throughput improves by a factor 7 (large input).

On mostly non-ASCII input, performance increases slightly or is the
same.

The UTF-8 validation is rewritten to use indexed access; since all
access is preceded by a (mandatory for validation) length check, bounds
checks are statically elided by LLVM and this formulation is in fact the best
for performance. A previous version had losses due to slice to iterator
conversions.

A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.

Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.

Old code is `regular`, this PR is called `fast`.

Datasets:

- `ascii` is just ASCII (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia XML dump
- `enwik8` is 100MB of an en.wikipedia XML dump
- `jawik10` is 10MB of a ja.wikipedia XML dump

```
test from_utf8_ascii_fast        ... bench:         140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular     ... bench:       1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast          ... bench:      10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular       ... bench:      10,944 ns/iter (+/- 795) = 468 MB/s
test from_utf8_dewik10_fast      ... bench:   6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular   ... bench:  11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast       ... bench:  14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular    ... bench:  93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast      ... bench:  29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular   ... bench:  29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```

Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
2016-01-16 01:18:48 +00:00
bors
8796e012cb Auto merge of #29498 - wthrowe:replace-pattern, r=alexcrichton
It appears this was left out of RFC rust-lang/rfcs#528 because it might be useful to
also generalize the second argument in some way.  That doesn't seem to
prevent generalizing the first argument now, however.

This is a [breaking-change] because it could cause type-inference to
fail where it previously succeeded.

Also update docs for a few other methods that still referred to `&str` instead of patterns.
2016-01-13 08:15:45 +00:00
Ulrik Sverdrup
11e3de39d9 Add fast path for ASCII in UTF-8 validation
This speeds up the ascii case (and long stretches of ascii in otherwise
mixed UTF-8 data) when checking UTF-8 validity.

Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14!
On xml and mostly english language input (en.wikipedia xml dump),
throughput increases by a factor 7.

On mostly non-ASCII input, performance increases slightly or is the
same.

The UTF-8 validation is rewritten to use indexed access; since all
access is preceded by a (mandatory for validation) length check, they
are statically elided by llvm and this formulation is in fact the best
for performance. A previous version had losses due to slice to iterator
conversions.

A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.

Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.

Old code is `regular`, this PR is called `fast`.

Datasets:

- `ascii` is just ascii (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia xml dump
- `enwik10` is 100MB of an en.wikipedia xml dump
- `jawik10` is 10MB of a ja.wikipedia xml dump

```
test from_utf8_ascii_fast        ... bench:         140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular     ... bench:       1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast          ... bench:      10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular       ... bench:      12,250 ns/iter (+/- 437) = 418 MB/s
test from_utf8_dewik10_fast      ... bench:   6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular   ... bench:  11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast       ... bench:  14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular    ... bench:  93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast      ... bench:  29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular   ... bench:  29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```

Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
2016-01-12 21:57:04 +01:00
Tamir Duberstein
722905fda0 restore tests accidentally removed in #30182 2015-12-13 01:02:12 -05:00
Alex Crichton
da50f7c288 std: Remove deprecated functionality from 1.5
This is a standard "clean out libstd" commit which removes all 1.5-and-before
deprecated functionality as it's now all been deprecated for at least one entire
cycle.
2015-12-10 11:47:55 -08:00
William Throwe
e7f3d6eddd Let str::replace take a pattern
It appears this was left out of RFC #528 because it might be useful to
also generalize the second argument in some way.  That doesn't seem to
prevent generalizing the first argument now, however.

This is a [breaking-change] because it could cause type-inference to
fail where it previously succeeded.
2015-12-07 22:08:33 -05:00
Alex Crichton
464cdff102 std: Stabilize APIs for the 1.6 release
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below

Stabilized APIs

* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
  marked as `#[stable]` and the methods appearing via traits for primitives like
  `char` and `str` are now also marked as stable. Note that the extension traits
  themeselves are marked as unstable as they're imported via the prelude. The
  `try!` macro was also moved from the standard library into libcore to have the
  same interface. Otherwise the functions all have copied stability from the
  standard library now.
* The `#![no_std]` attribute
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)

Deprecated APIs

* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`

New APIs (still unstable)

* `<[T]>::sort_by_key` (added to mirror `min_by_key`)

Closes #27585
Closes #27704
Closes #27707
Closes #27710
Closes #27711
Closes #27727
Closes #27740
Closes #27744
Closes #27799
Closes #27801
cc #27801 (doesn't close as `Chars` is still unstable)
Closes #28968
2015-12-05 15:09:44 -08:00
Kevin Butler
83b308e585 Add assertions to test_total_ord for str 2015-10-24 19:53:42 +01:00
Kevin Butler
49c78789ce Remove unnecessary String allocations from str tests 2015-10-24 19:53:33 +01:00
bors
e7eb7d58f9 Auto merge of #27723 - mystor:vecdeque_drain_range, r=bluss
This is a WIP PR for my implementation of drain over the VecDeque data structure supporting ranges. It brings the VecDeque drain implementation in line with Vec's.

Tests haven't been written for the new function yet.
2015-10-20 11:55:17 +00:00
Michael Layzell
dec0ea08f7 Correct drain implementations in libcollectionstest 2015-10-19 11:53:35 -04:00
Cristi Cobzarenco
4b308b44e1 typos: fix a grabbag of typos all over the place 2015-10-08 19:49:31 +01:00
Scott Olson
55e48420db Minor code cleanup. 2015-09-28 19:21:18 -06:00
Manish Goregaokar
30b43c1b12 Rollup merge of #28682 - apasel422:features, r=steveklabnik 2015-09-27 15:05:18 +05:30
Andrew Paseltiner
589c82449a Remove unnecessary #![feature] attributes 2015-09-26 15:59:31 -04:00
Alex Crichton
d5f2d3b177 std: Update MatchIndices to return a subslice
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).

cc #27743
2015-09-25 09:29:23 -07:00
Lee Jeffery
140e2d3a09 Miscellaneous cleanup for old issues. 2015-09-20 11:37:08 +01:00
Andrew Paseltiner
9526813f5b Avoid zero-sized leaf allocations in BTreeMap
When both the key and value types were zero-sized, `BTreeMap` previously
called `heap::allocate` with `size == 0` for leaf nodes, which is
undefined behavior, and jemalloc would attempt to read invalid memory,
crashing the process.

This avoids undefined behavior by allocating enough space to store one
edge in leaf nodes that would otherwise have `size == 0`. Although this
uses extra memory, maps with zero-sized key types that have sensible
implementations of the ordering traits can only contain a single
key-value pair (and therefore only a single leaf node), and maps with
key and value types that are both zero-sized have few uses, if any.

Furthermore, this is a temporary fix that will likely be unnecessary
once the `BTreeMap` implementation is rewritten to use parent pointers.

Closes #28493.
2015-09-18 15:27:17 -04:00
Alex Crichton
48615a68fb std: Account for CRLF in {str, BufRead}::lines
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.

The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.

[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md

Closes #28032
2015-09-03 23:01:41 -07:00
bors
dfe9326941 Auto merge of #28148 - eefriedman:binary_heap, r=alexcrichton 2015-09-02 01:33:20 +00:00
Eli Friedman
b82c42c153 Add missing stability markings to BinaryHeap. 2015-09-01 01:22:57 -07:00
bors
b0f77ba26a Auto merge of #28101 - ijks:24214-str-bytes, r=alexcrichton
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-31 09:15:55 +00:00
Daan Rijks
dacf2725ec Add overrides to iterator methods for str::Bytes
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-30 17:32:50 +02:00
Andrew Paseltiner
f9b63d3973 implement RFC 1194 2015-08-28 12:41:54 -04:00
bors
de67d62c6b Auto merge of #27474 - bluss:twoway-reverse, r=brson
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-18 02:02:57 +00:00
bors
e2bebf32fa Auto merge of #27696 - bluss:into-boxed-str, r=alexcrichton
Rename String::into_boxed_slice -> into_boxed_str

This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
2015-08-14 01:06:37 +00:00
Ulrik Sverdrup
bec64090a7 Rename String::into_boxed_slice -> into_boxed_str
This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
2015-08-13 14:02:00 +02:00
Alex Crichton
8d90d3f368 Remove all unstable deprecated functionality
This commit removes all unstable and deprecated functions in the standard
library. A release was recently cut (1.3) which makes this a good time for some
spring cleaning of the deprecated functions.
2015-08-12 14:55:17 -07:00
Ulrik Sverdrup
c5a1d8c3db StrSearcher: Add tests for rfind(&str)
Add tests for .rfind(&str), using the reverse searcher case for
substring search.
2015-08-02 20:08:35 +02:00
Alexis Beingessner
3e954a8cb2 implement Clone for Box<str>, closes #27323
This is a minor [breaking-change], as it changes what
`boxed_str.to_owned()` does (previously it would deref to `&str` and
call `to_owned` on that to get a `String`). However `Box<str>` is such an
exceptionally rare type that this is not expected to be a serious
concern. Also a `Box<str>` can be freely converted to a `String` to
obtain the previous behaviour anyway.
2015-07-29 18:43:01 -07:00
Jonathan Reem
e24423091f Implement Clone for Box<[T]> where T: Clone
Closes #25097
2015-07-28 01:43:17 -07:00
Alexis Beingessner
bfa0e1f58a Add RawVec to unify raw Vecish code 2015-07-17 08:29:15 -07:00
bors
dd46cf8b22 Auto merge of #26241 - SimonSapin:derefmut-for-string, r=alexcrichton
See https://github.com/rust-lang/rfcs/issues/1157
2015-07-13 23:47:06 +00:00
Simon Sapin
3226858e50 Fix tests for changes in #26241. 2015-07-13 23:28:58 +02:00
Simon Sapin
7469914e96 Add str::split_at_mut 2015-07-13 16:21:43 +02:00
bors
05d8767289 Auto merge of #26957 - wesleywiser:rename_connect_to_join, r=alexcrichton
Fixes #26900
2015-07-12 22:05:59 +00:00
bors
50d305e498 Auto merge of #26966 - nagisa:tail-init, r=alexcrichton
Fixes #26906
2015-07-12 13:16:24 +00:00
Jonathan Reem
69521affbb Add String::into_boxed_slice and Box<str>::into_string
Implements merged RFC 1152.

Closes #26697.
2015-07-11 21:31:56 -07:00
Simonas Kazlauskas
7a90865db5 Implement RFC 1058 2015-07-12 00:47:56 +03:00
Wesley Wiser
93ddee6cee Change some instances of .connect() to .join() 2015-07-10 19:40:46 -04:00
Ulrik Sverdrup
836f32e769 Use vec![elt; n] where possible
The common pattern `iter::repeat(elt).take(n).collect::<Vec<_>>()` is
exactly equivalent to `vec![elt; n]`, do this replacement in the whole
tree.

(Actually, vec![] is smart enough to only call clone n - 1 times, while
the former solution would call clone n times, and this fact is
virtually irrelevant in practice.)
2015-07-09 11:05:32 +02:00
bors
7fc0675f35 Auto merge of #26327 - bluss:two-way, r=aturon
Update substring search to use the Two Way algorithm

To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Many thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
2015-06-30 18:09:51 +00:00
Johannes Oertel
239d9c2b09 Remove remaining use of bit_vec_append_splitoff feature gate. 2015-06-24 12:08:57 +02:00