Commit Graph

71 Commits

Author SHA1 Message Date
bors
7fc0675f35 Auto merge of #26327 - bluss:two-way, r=aturon
Update substring search to use the Two Way algorithm

To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Many thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
2015-06-30 18:09:51 +00:00
Johannes Oertel
239d9c2b09 Remove remaining use of bit_vec_append_splitoff feature gate. 2015-06-24 12:08:57 +02:00
Ulrik Sverdrup
b890b7bbc7 StrSearcher: Update substring search to use the Two Way algorithm
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Much thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
2015-06-21 19:58:50 +02:00
Alex Crichton
b4a2823cd6 More test fixes and fallout of stability changes 2015-06-17 09:07:17 -07:00
Alex Crichton
ce1a965cf5 Fallout in tests and docs from feature renamings 2015-06-17 09:07:16 -07:00
bors
b5b3a99f84 Auto merge of #26190 - Veedrac:no-iter, r=alexcrichton
Pull request for #26188.
2015-06-11 18:10:08 +00:00
bors
2fbbd54afe Auto merge of #26122 - bluss:borrow-box, r=alexcrichton
Implement Borrow<T> and BorrowMut<T> for Box<T: ?Sized>
2015-06-11 03:25:45 +00:00
bors
fbb13543fc Auto merge of #25839 - bluss:str-split-at-impl, r=alexcrichton
Implement RFC rust-lang/rfcs#1123

Add str method str::split_at(mid: usize) -> (&str, &str).

Also a minor cleanup in the collections::str module. Remove redundant slicing of self.
2015-06-11 00:22:27 +00:00
Joshua Landau
ca7418b846 Removed many pointless calls to *iter() and iter_mut() 2015-06-10 21:14:03 +01:00
Ulrik Sverdrup
d43bf53948 Add str::split_at
Implement RFC rust-lang/rfcs#1123

Add str method str::split_at(mid: usize) -> (&str, &str).
2015-06-10 09:15:07 +02:00
bors
f06e026578 Auto merge of #26039 - SimonSapin:case-mapping, r=alexcrichton
* Add “complex” mappings to `char::to_lowercase` and `char::to_uppercase`, making them yield sometimes more than on `char`: #25800. `str::to_lowercase` and `str::to_uppercase` are affected as well.
* Add `char::to_titlecase`, since it’s the same algorithm (just different data). However this does **not** add `str::to_titlecase`, as that would require UAX#29 Unicode Text Segmentation which we decided not to include in of `std`: https://github.com/rust-lang/rfcs/pull/1054 I made `char::to_titlecase` immediately `#[stable]`, since it’s so similar to `char::to_uppercase` that’s already stable. Let me know if it should be `#[unstable]` for a while.
* Add a special case for upper-case Sigma in word-final position in `str::to_lowercase`: #26035. This is the only language-independent conditional mapping currently in `SpecialCasing.txt`.
* Stabilize `str::to_lowercase` and `str::to_uppercase`. The `&self -> String` on `str` signature seems straightforward enough, and the only relevant issue I’ve found is #24536 about naming. But `char` already has stable methods with the same name, and deprecating them for a rename doesn’t seem worth it.

r? @alexcrichton
2015-06-09 20:00:32 +00:00
Ulrik Sverdrup
4fdb4cfa89 Implement Borrow<T> and BorrowMut<T> for Box<T: ?Sized> 2015-06-09 16:15:38 +02:00
Simon Sapin
6369dcbad8 Move collectionstest::char into coretest::char 2015-06-09 13:08:29 +02:00
bors
02c33b690b Auto merge of #26077 - SimonSapin:patch-6, r=alexcrichton
With the latter is provided by the `From` conversion trait, the former is now completely redundant. Their code is identical. Let’s deprecate now and plan to remove in the next cycle. (It’s `#[unstable]`.)

r? @alexcrichton 
CC @nagisa
2015-06-08 20:52:33 +00:00
Simon Sapin
c57a4124ff Address a review comment and fix a bootstrapping issue 2015-06-08 19:50:28 +02:00
Simon Sapin
c160192f5f Replace usage of String::from_str with String:from 2015-06-08 16:55:35 +02:00
Johannes Oertel
b36ed7d2ed Implement RFC 839
Closes #25976.
2015-06-08 12:05:33 +02:00
Simon Sapin
f901086b0d Correctly map upper-case Sigma to lower-case in word-final position. Fix #26035. 2015-06-06 12:37:11 +02:00
Simon Sapin
d316487ec1 Add char::to_titlecase
But not str::to_titlecase which would require UAX#29 Unicode Text Segmentation
which we decided not to include in of `std`:
https://github.com/rust-lang/rfcs/pull/1054
2015-06-06 12:37:11 +02:00
Simon Sapin
addaa5b1ff Add complex (but unconditional) Unicode case mapping. Fix #25800
As a result, the iterator returned by `char::to_uppercase` sometimes
yields two or three `char`s instead of just one.
2015-06-06 12:37:10 +02:00
Niko Matsakis
2c5e784d6f add const_fn features 2015-05-29 09:42:54 -04:00
Eduard Burtescu
377b0900ae Use const fn to abstract away the contents of UnsafeCell & friends. 2015-05-27 11:19:03 +03:00
Johannes Oertel
f95c812311 Implement append and split_off for BitSet (RFC 509) 2015-05-10 21:46:32 +02:00
bors
e8b4c84e39 Auto merge of #24890 - jooert:bitvec-append-split_off, r=alexcrichton
cc #19986 

r? @Gankro
2015-05-07 00:20:25 +00:00
Johannes Oertel
d55a7e8bc4 Implement append and split_off for BitVec (RFC 509) 2015-05-06 09:29:07 +02:00
Steven Allen
decf395221 Implement retain for vec_deque 2015-05-04 23:04:06 -04:00
bors
6517a0e90e Auto merge of #25047 - sinkuu:vec_intoiter_override, r=alexcrichton
Override methods `count`, `last`, and `nth` in vec::IntoIter.

#24214
2015-05-04 04:05:37 +00:00
sinkuu
5d8431c203 Override Iterator::count method in vec::IntoIter 2015-05-02 17:05:43 +09:00
Ulrik Sverdrup
ee48e6d192 collections: Implement String::drain(range) according to RFC 574
`.drain(range)` is unstable and under feature(collections_drain).

This adds a safe way to remove any range of a String as efficiently as
possible.

As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.

RFC tracking issue is #23055
2015-05-01 19:51:31 +02:00
Tamir Duberstein
69abc12b00 Register new snapshots 2015-04-28 17:23:45 -07:00
Ulrik Sverdrup
b475fc7d6a collections: Implement vec::drain(range) according to RFC 574
Old `.drain()` on vec is performed using `.drain(..)` now.

`.drain(range)` is unstable and under feature(collections_drain)

[breaking-change]
2015-04-28 11:38:33 +02:00
kwantam
c361e13d71 implement rfc 1054: split_whitespace() fn, deprecate words()
For now, words() is left in (but deprecated), and Words is a type alias for
struct SplitWhitespace.

Also cleaned up references to s.words() throughout codebase.

Closes #15628
2015-04-21 15:31:51 -04:00
Erick Tryzelaar
f055054eab collections: Move optimized String::from_str to String::from
This implementation is currently about 3-4 times faster than using
the `.to_string()` based approach.
2015-04-19 10:59:06 -07:00
kwantam
29d1252e4d deprecate Unicode functions that will be moved to crates.io
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
2015-04-16 17:03:05 -04:00
Alex Crichton
34603b0c19 rollup merge of #24310: alexcrichton/stabilize-utf8-error
The meaning of each variant of this enum was somewhat ambiguous and it's uncler
that we wouldn't even want to add more enumeration values in the future. As a
result this error has been altered to instead become an opaque structure.
Learning about the "first invalid byte index" is still an unstable feature, but
the type itself is now stable.
2015-04-14 10:55:41 -07:00
Alex Crichton
b8760afe47 More test fixes 2015-04-14 10:14:19 -07:00
Alex Crichton
700e627cf7 test: Fixup many library unit tests 2015-04-14 10:14:19 -07:00
Alex Crichton
f329030b09 std: Stabilize the Utf8Error type
The meaning of each variant of this enum was somewhat ambiguous and it's uncler
that we wouldn't even want to add more enumeration values in the future. As a
result this error has been altered to instead become an opaque structure.
Learning about the "first invalid byte index" is still an unstable feature, but
the type itself is now stable.
2015-04-10 16:07:46 -07:00
bors
dd6c4a8f15 Auto merge of #23293 - tbu-:pr_additive_multiplicative, r=alexcrichton
Previously it could not be implemented for types outside `libcore/iter.rs` due
to coherence issues.
2015-04-08 00:42:10 +00:00
Tobias Bucher
97f24a8596 Make sum and product inherent methods on Iterator
In addition to being nicer, this also allows you to use `sum` and `product` for
iterators yielding custom types aside from the standard integers.

Due to removing the `AdditiveIterator` and `MultiplicativeIterator` trait, this
is a breaking change.

[breaking-change]
2015-04-08 00:26:35 +02:00
bors
b2e65ee6e4 Auto merge of #23952 - Kimundi:more_string_pattern, r=alexcrichton
This adds the missing methods and turns `str::pattern` in a user facing module, as per RFC.

This also contains some big internal refactorings:
- string iterator pairs are implemented with a central macro to reduce redundancy 
- Moved all tests from `coretest::str` into `collectionstest::str` and left a note to prevent the two sets of tests drifting apart further.

See https://github.com/rust-lang/rust/issues/22477
2015-04-07 00:57:08 +00:00
Marvin Löbel
fbba28e246 Added smoke tests for new methods.
Fixed bug in existing StrSearcher impl
2015-04-05 18:52:58 +02:00
Marvin Löbel
c29559d28a Moved coretest::str tests into collectiontest::str 2015-04-05 18:52:58 +02:00
bors
fc98b19cf7 Auto merge of #23832 - petrochenkov:usize, r=aturon
These constants are small and can fit even in `u8`, but semantically they have type `usize` because they denote sizes and are almost always used in `usize` context. The change of their type to `u32` during the integer audit led only to the large amount of `as usize` noise (see the second commit, which removes this noise).

This is a minor [breaking-change] to an unstable interface.

r? @aturon
2015-04-03 04:29:52 +00:00
Alex Crichton
f92e7abefd rollup merge of #23860: nikomatsakis/copy-requires-clone
Conflicts:
	src/test/compile-fail/coherence-impls-copy.rs
2015-04-01 18:37:54 -07:00
Alex Crichton
9edbf42a34 rollup merge of #23945: pnkfelix/gate-u-negate
Feature-gate  unsigned unary negate.

Discussed in weekly meeting here: https://github.com/rust-lang/meeting-minutes/blob/master/weekly-meetings/2015-03-31.md#feature-gate--expr

and also in the internals thread here: http://internals.rust-lang.org/t/forbid-unsigned-integer/752
2015-04-01 18:36:21 -07:00
Alex Crichton
f86318d63c Test fixes and rebase conflicts, round 2
Conflicts:
	src/libcore/num/mod.rs
2015-04-02 02:07:51 +02:00
Alex Crichton
e98dce3e00 std: Changing the meaning of the count to splitn
This commit is an implementation of [RFC 979][rfc] which changes the meaning of
the count parameter to the `splitn` function on strings and slices. The
parameter now means the number of items that are returned from the iterator, not
the number of splits that are made.

[rfc]: https://github.com/rust-lang/rfcs/pull/979

Closes #23911
[breaking-change]
2015-04-01 13:29:42 -07:00
Niko Matsakis
c35c46821a Fallout in public-facing and semi-public-facing libs 2015-04-01 11:23:45 -04:00
Alex Crichton
3422be3666 rollup merge of #23288: alexcrichton/issue-19470
This is a deprecated attribute that is slated for removal, and it also affects
all implementors of the trait. This commit removes the attribute and fixes up
implementors accordingly. The primary implementation which was lost was the
ability to compare `&[T]` and `Vec<T>` (in that order).

This change also modifies the `assert_eq!` macro to not consider both directions
of equality, only the one given in the left/right forms to the macro. This
modification is motivated due to the fact that `&[T] == Vec<T>` no longer
compiles, causing hundreds of errors in unit tests in the standard library (and
likely throughout the community as well).

Closes #19470
[breaking-change]
2015-03-31 15:59:35 -07:00