Commit Graph

42 Commits

Author SHA1 Message Date
Kevin Butler
83b308e585 Add assertions to test_total_ord for str 2015-10-24 19:53:42 +01:00
Kevin Butler
49c78789ce Remove unnecessary String allocations from str tests 2015-10-24 19:53:33 +01:00
Cristi Cobzarenco
4b308b44e1 typos: fix a grabbag of typos all over the place 2015-10-08 19:49:31 +01:00
Alex Crichton
d5f2d3b177 std: Update MatchIndices to return a subslice
This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow
the same pattern as the `chars` and `char_indices` iterators. The `matches`
iterator currently yield `&str` elements, so the `MatchIndices` iterator now
yields the index of the match as well as the `&str` that matched (instead of
start/end indexes).

cc #27743
2015-09-25 09:29:23 -07:00
Alex Crichton
48615a68fb std: Account for CRLF in {str, BufRead}::lines
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.

The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.

[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md

Closes #28032
2015-09-03 23:01:41 -07:00
bors
b0f77ba26a Auto merge of #28101 - ijks:24214-str-bytes, r=alexcrichton
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-31 09:15:55 +00:00
Daan Rijks
dacf2725ec Add overrides to iterator methods for str::Bytes
Specifically, `count`, `last`, and `nth` are implemented to use the
methods of the underlying slice iterator.

Partially closes #24214.
2015-08-30 17:32:50 +02:00
bors
de67d62c6b Auto merge of #27474 - bluss:twoway-reverse, r=brson
StrSearcher: Implement the complete reverse case for the two way algorithm

Fix quadratic behavior in StrSearcher in reverse search with periodic
needles.

This commit adds the missing pieces for the "short period" case in
reverse search. The short case will show up when the needle is literally
periodic, for example "abababab".

Two way uses a "critical factorization" of the needle: x = u v.

Searching matches v first, if mismatch at character k, skip k forward.
Matching u, if mismatch, skip period(x) forward.

To avoid O(mn) behavior after mismatch in u, memorize the already
matched prefix.

The short period case requires that |u| < period(x).

For the reverse search we need to compute a different critical
factorization x = u' v' where |v'| < period(x), because we are searching
for the reversed needle. A short v' also benefits the algorithm in
general.

The reverse critical factorization is computed quickly by using the same
maximal suffix algorithm, but terminating as soon as we have a location
with local period equal to period(x).

This adds extra fields crit_pos_back and memory_back for the reverse
case. The new overhead for TwoWaySearcher::new is low, and additionally
I think the "short period" case is uncommon in many applications of
string search.

The maximal_suffix methods were updated in documentation and the
algorithms updated to not use !0 and wrapping add, variable left is now
1 larger, offset 1 smaller.

Use periodicity when computing byteset: in the periodic case, just
iterate over one period instead of the whole needle.

Example before (rfind) after (twoway_rfind) benchmark shows the removal
of quadratic behavior.

needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100

```
test periodic::rfind           ... bench:   1,926,595 ns/iter (+/- 11,390) = 10 MB/s
test periodic::twoway_rfind    ... bench:      51,740 ns/iter (+/- 66) = 386 MB/s
```
2015-08-18 02:02:57 +00:00
bors
e2bebf32fa Auto merge of #27696 - bluss:into-boxed-str, r=alexcrichton
Rename String::into_boxed_slice -> into_boxed_str

This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
2015-08-14 01:06:37 +00:00
Ulrik Sverdrup
bec64090a7 Rename String::into_boxed_slice -> into_boxed_str
This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
2015-08-13 14:02:00 +02:00
Alex Crichton
8d90d3f368 Remove all unstable deprecated functionality
This commit removes all unstable and deprecated functions in the standard
library. A release was recently cut (1.3) which makes this a good time for some
spring cleaning of the deprecated functions.
2015-08-12 14:55:17 -07:00
Ulrik Sverdrup
c5a1d8c3db StrSearcher: Add tests for rfind(&str)
Add tests for .rfind(&str), using the reverse searcher case for
substring search.
2015-08-02 20:08:35 +02:00
Alexis Beingessner
3e954a8cb2 implement Clone for Box<str>, closes #27323
This is a minor [breaking-change], as it changes what
`boxed_str.to_owned()` does (previously it would deref to `&str` and
call `to_owned` on that to get a `String`). However `Box<str>` is such an
exceptionally rare type that this is not expected to be a serious
concern. Also a `Box<str>` can be freely converted to a `String` to
obtain the previous behaviour anyway.
2015-07-29 18:43:01 -07:00
bors
dd46cf8b22 Auto merge of #26241 - SimonSapin:derefmut-for-string, r=alexcrichton
See https://github.com/rust-lang/rfcs/issues/1157
2015-07-13 23:47:06 +00:00
Simon Sapin
7469914e96 Add str::split_at_mut 2015-07-13 16:21:43 +02:00
bors
05d8767289 Auto merge of #26957 - wesleywiser:rename_connect_to_join, r=alexcrichton
Fixes #26900
2015-07-12 22:05:59 +00:00
Jonathan Reem
69521affbb Add String::into_boxed_slice and Box<str>::into_string
Implements merged RFC 1152.

Closes #26697.
2015-07-11 21:31:56 -07:00
Wesley Wiser
93ddee6cee Change some instances of .connect() to .join() 2015-07-10 19:40:46 -04:00
Ulrik Sverdrup
b890b7bbc7 StrSearcher: Update substring search to use the Two Way algorithm
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Much thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
2015-06-21 19:58:50 +02:00
Alex Crichton
ce1a965cf5 Fallout in tests and docs from feature renamings 2015-06-17 09:07:16 -07:00
bors
fbb13543fc Auto merge of #25839 - bluss:str-split-at-impl, r=alexcrichton
Implement RFC rust-lang/rfcs#1123

Add str method str::split_at(mid: usize) -> (&str, &str).

Also a minor cleanup in the collections::str module. Remove redundant slicing of self.
2015-06-11 00:22:27 +00:00
Ulrik Sverdrup
d43bf53948 Add str::split_at
Implement RFC rust-lang/rfcs#1123

Add str method str::split_at(mid: usize) -> (&str, &str).
2015-06-10 09:15:07 +02:00
bors
f06e026578 Auto merge of #26039 - SimonSapin:case-mapping, r=alexcrichton
* Add “complex” mappings to `char::to_lowercase` and `char::to_uppercase`, making them yield sometimes more than on `char`: #25800. `str::to_lowercase` and `str::to_uppercase` are affected as well.
* Add `char::to_titlecase`, since it’s the same algorithm (just different data). However this does **not** add `str::to_titlecase`, as that would require UAX#29 Unicode Text Segmentation which we decided not to include in of `std`: https://github.com/rust-lang/rfcs/pull/1054 I made `char::to_titlecase` immediately `#[stable]`, since it’s so similar to `char::to_uppercase` that’s already stable. Let me know if it should be `#[unstable]` for a while.
* Add a special case for upper-case Sigma in word-final position in `str::to_lowercase`: #26035. This is the only language-independent conditional mapping currently in `SpecialCasing.txt`.
* Stabilize `str::to_lowercase` and `str::to_uppercase`. The `&self -> String` on `str` signature seems straightforward enough, and the only relevant issue I’ve found is #24536 about naming. But `char` already has stable methods with the same name, and deprecating them for a rename doesn’t seem worth it.

r? @alexcrichton
2015-06-09 20:00:32 +00:00
Simon Sapin
c57a4124ff Address a review comment and fix a bootstrapping issue 2015-06-08 19:50:28 +02:00
Simon Sapin
c160192f5f Replace usage of String::from_str with String:from 2015-06-08 16:55:35 +02:00
Simon Sapin
f901086b0d Correctly map upper-case Sigma to lower-case in word-final position. Fix #26035. 2015-06-06 12:37:11 +02:00
kwantam
c361e13d71 implement rfc 1054: split_whitespace() fn, deprecate words()
For now, words() is left in (but deprecated), and Words is a type alias for
struct SplitWhitespace.

Also cleaned up references to s.words() throughout codebase.

Closes #15628
2015-04-21 15:31:51 -04:00
kwantam
29d1252e4d deprecate Unicode functions that will be moved to crates.io
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
2015-04-16 17:03:05 -04:00
Alex Crichton
f329030b09 std: Stabilize the Utf8Error type
The meaning of each variant of this enum was somewhat ambiguous and it's uncler
that we wouldn't even want to add more enumeration values in the future. As a
result this error has been altered to instead become an opaque structure.
Learning about the "first invalid byte index" is still an unstable feature, but
the type itself is now stable.
2015-04-10 16:07:46 -07:00
bors
dd6c4a8f15 Auto merge of #23293 - tbu-:pr_additive_multiplicative, r=alexcrichton
Previously it could not be implemented for types outside `libcore/iter.rs` due
to coherence issues.
2015-04-08 00:42:10 +00:00
Tobias Bucher
97f24a8596 Make sum and product inherent methods on Iterator
In addition to being nicer, this also allows you to use `sum` and `product` for
iterators yielding custom types aside from the standard integers.

Due to removing the `AdditiveIterator` and `MultiplicativeIterator` trait, this
is a breaking change.

[breaking-change]
2015-04-08 00:26:35 +02:00
Marvin Löbel
fbba28e246 Added smoke tests for new methods.
Fixed bug in existing StrSearcher impl
2015-04-05 18:52:58 +02:00
Marvin Löbel
c29559d28a Moved coretest::str tests into collectiontest::str 2015-04-05 18:52:58 +02:00
Alex Crichton
e98dce3e00 std: Changing the meaning of the count to splitn
This commit is an implementation of [RFC 979][rfc] which changes the meaning of
the count parameter to the `splitn` function on strings and slices. The
parameter now means the number of items that are returned from the iterator, not
the number of splits that are made.

[rfc]: https://github.com/rust-lang/rfcs/pull/979

Closes #23911
[breaking-change]
2015-04-01 13:29:42 -07:00
Alex Crichton
3422be3666 rollup merge of #23288: alexcrichton/issue-19470
This is a deprecated attribute that is slated for removal, and it also affects
all implementors of the trait. This commit removes the attribute and fixes up
implementors accordingly. The primary implementation which was lost was the
ability to compare `&[T]` and `Vec<T>` (in that order).

This change also modifies the `assert_eq!` macro to not consider both directions
of equality, only the one given in the left/right forms to the macro. This
modification is motivated due to the fact that `&[T] == Vec<T>` no longer
compiles, causing hundreds of errors in unit tests in the standard library (and
likely throughout the community as well).

Closes #19470
[breaking-change]
2015-03-31 15:59:35 -07:00
Alex Crichton
554946c81e rollup merge of #23873: alexcrichton/remove-deprecated
Conflicts:
	src/libcollectionstest/fmt.rs
	src/libcollectionstest/lib.rs
	src/libcollectionstest/str.rs
	src/libcore/error.rs
	src/libstd/fs.rs
	src/libstd/io/cursor.rs
	src/libstd/os.rs
	src/libstd/process.rs
	src/libtest/lib.rs
	src/test/run-pass-fulldeps/compiler-calls.rs
2015-03-31 15:54:44 -07:00
Alex Crichton
d4a2c94180 std: Clean out #[deprecated] APIs
This commit cleans out a large amount of deprecated APIs from the standard
library and some of the facade crates as well, updating all users in the
compiler and in tests as it goes along.
2015-03-31 15:49:57 -07:00
Alex Crichton
5cf126ae2f std: Remove #[old_orphan_check] from PartialEq
This is a deprecated attribute that is slated for removal, and it also affects
all implementors of the trait. This commit removes the attribute and fixes up
implementors accordingly. The primary implementation which was lost was the
ability to compare `&[T]` and `Vec<T>` (in that order).

This change also modifies the `assert_eq!` macro to not consider both directions
of equality, only the one given in the left/right forms to the macro. This
modification is motivated due to the fact that `&[T] == Vec<T>` no longer
compiles, causing hundreds of errors in unit tests in the standard library (and
likely throughout the community as well).

cc #19470
[breaking-change]
2015-03-31 13:39:14 -07:00
Emeliov Dmitrii
df65f59fe9 replace deprecated as_slice() 2015-03-31 01:03:13 +03:00
Jake Goulding
c6ca2205ea StrExt::splitn should not require a DoubleEndedSearcher
Closes #23262
2015-03-19 19:25:22 -04:00
Jake Goulding
6a5148bda1 Introduce rsplit 2015-03-19 19:25:22 -04:00
Jorge Aparicio
6453fcd4cc extract libcollections tests into libcollectionstest 2015-03-16 21:57:42 -05:00