mikros/rust - rust - Gitea.pterpstra.com

Author	SHA1	Message	Date
Tobias Bucher	68efea08fa	Restore `char::escape_default` and add `char::escape` instead	2016-07-26 15:15:00 +02:00
Tobias Bucher	e7d16580f5	Escape fewer Unicode codepoints in `Debug` impl of `str` Use the same procedure as Python to determine whether a character is printable, described in [PEP 3138]. In particular, this means that the following character classes are escaped: - Cc (Other, Control) - Cf (Other, Format) - Cs (Other, Surrogate), even though they can't appear in Rust strings - Co (Other, Private Use) - Cn (Other, Not Assigned) - Zl (Separator, Line) - Zp (Separator, Paragraph) - Zs (Separator, Space), except for the ASCII space `' '` (`0x20`) This allows for user-friendly inspection of strings that are not English (e.g. compare `"\u{e9}\u{e8}\u{ea}"` to `"éèê"`). Fixes #34318. [PEP 3138]: https://www.python.org/dev/peps/pep-3138/	2016-07-23 00:18:44 +02:00
Srinivas Reddy Thatiparthy	c605480521	clean up for test cases	2016-06-09 08:20:08 +05:30
Srinivas Reddy Thatiparthy	c6ed7adf7a	remove redundant assert statements	2016-06-09 08:12:31 +05:30
Alex Crichton	b64c9d5670	std: Clean out old unstable + deprecated APIs These should all have been deprecated for at least one cycle, so this commit cleans them all out.	2016-05-30 20:46:32 -07:00
Steve Klabnik	657cae03e9	Rollup merge of #32869 - bluss:char-boundary-test, r=brson Add test for is_char_boundary Add test for is_char_boundary Apparently there was no test for this method. This test is rather simple, not exhaustive.	2016-04-14 14:49:09 -04:00
Alex Crichton	552eda70d3	std: Stabilize APIs for the 1.9 release This commit applies all stabilizations, renamings, and deprecations that the library team has decided on for the upcoming 1.9 release. All tracking issues have gone through a cycle-long "final comment period" and the specific APIs stabilized/deprecated are: Stable * `std::panic` * `std::panic::catch_unwind` (renamed from `recover`) * `std::panic::resume_unwind` (renamed from `propagate`) * `std::panic::AssertUnwindSafe` (renamed from `AssertRecoverSafe`) * `std::panic::UnwindSafe` (renamed from `RecoverSafe`) * `str::is_char_boundary` * `<*const T>::as_ref` * `<*mut T>::as_ref` * `<*mut T>::as_mut` * `AsciiExt::make_ascii_uppercase` * `AsciiExt::make_ascii_lowercase` * `char::decode_utf16` * `char::DecodeUtf16` * `char::DecodeUtf16Error` * `char::DecodeUtf16Error::unpaired_surrogate` * `BTreeSet::take` * `BTreeSet::replace` * `BTreeSet::get` * `HashSet::take` * `HashSet::replace` * `HashSet::get` * `OsString::with_capacity` * `OsString::clear` * `OsString::capacity` * `OsString::reserve` * `OsString::reserve_exact` * `OsStr::is_empty` * `OsStr::len` * `std::os::unix::thread` * `RawPthread` * `JoinHandleExt` * `JoinHandleExt::as_pthread_t` * `JoinHandleExt::into_pthread_t` * `HashSet::hasher` * `HashMap::hasher` * `CommandExt::exec` * `File::try_clone` * `SocketAddr::set_ip` * `SocketAddr::set_port` * `SocketAddrV4::set_ip` * `SocketAddrV4::set_port` * `SocketAddrV6::set_ip` * `SocketAddrV6::set_port` * `SocketAddrV6::set_flowinfo` * `SocketAddrV6::set_scope_id` * `<[T]>::copy_from_slice` * `ptr::read_volatile` * `ptr::write_volatile` * The `#[deprecated]` attribute * `OpenOptions::create_new` Deprecated * `std::raw::Slice` - use raw parts of `slice` module instead * `std::raw::Repr` - use raw parts of `slice` module instead * `str::char_range_at` - use slicing plus `chars()` plus `len_utf8` * `str::char_range_at_reverse` - use slicing plus `chars().rev()` plus `len_utf8` * `str::char_at` - use slicing plus `chars()` * `str::char_at_reverse` - use slicing plus `chars().rev()` * `str::slice_shift_char` - use `chars()` plus `Chars::as_str` * `CommandExt::session_leader` - use `before_exec` instead. Closes #27719 cc #27751 (deprecating the `Slice` bits) Closes #27754 Closes #27780 Closes #27809 Closes #27811 Closes #27830 Closes #28050 Closes #29453 Closes #29791 Closes #29935 Closes #30014 Closes #30752 Closes #31262 cc #31398 (still need to deal with `before_exec`) Closes #31405 Closes #31572 Closes #31755 Closes #31756	2016-04-11 08:57:53 -07:00
Ulrik Sverdrup	f0a1ea27cc	Add test for is_char_boundary	2016-04-10 20:09:26 +02:00
Alex Crichton	48d5fe9ec5	std: Change `encode_utf{8,16}` to return iterators Currently these have non-traditional APIs which take a buffer and report how much was filled in, but they're not necessarily ergonomic to use. Returning an iterator which also exposes an underlying slice shouldn't result in any performance loss as it's just a lazy version of the same implementation, and it's also much more ergonomic! cc #27784	2016-03-22 10:25:30 -07:00
Alex Crichton	0d5cfd9117	mk: Distribute fewer TARGET_CRATES Right now everything in TARGET_CRATES is built by default for all non-fulldeps tests and is distributed by default for all target standard library packages. Currently this includes a number of unstable crates which are rarely used such as `graphviz` and `rbml`. This commit trims down the set of `TARGET_CRATES`, moves a number of tests to `*-fulldeps` as a result, and trims down the dependencies of libtest so we can distribute fewer crates in the `rust-std` packages.	2016-03-07 13:05:12 -08:00
Ulrik Sverdrup	4594f0f67a	Fix panic on string slicing error to truncate the string The string may be arbitrarily long, but we want to limit the panic message to a reasonable length. Truncate the string if it is too long (simply to char boundary). Also add details to the start <= end message. I think it's ok to flesh out the code here, since it's in a cold function.	2016-03-05 18:11:52 +01:00
Marvin Löbel	dd67e55c10	Changed `std::pattern::Pattern` impl on `&'a &'a str` to `&'a &'b str` in order to allow a bit more flexibility in how to use it.	2016-03-01 17:53:51 +01:00
Tobias Bucher	b27b8f63bc	Add tests for `Cow::from` for strings, vectors and slices	2016-02-03 20:45:30 +01:00
bors	e7e4ecc522	Auto merge of #30740 - bluss:ascii-is-the-best, r=brson Add fast path for ASCII in UTF-8 validation This speeds up the ASCII case (and long stretches of ASCII in otherwise mixed UTF-8 data) when checking UTF-8 validity. Benchmark results suggest that on purely ASCII input, we can improve throughput (megabytes verified / second) by a factor of 13 to 14 (smallish input). On XML and mostly English language input (en.wikipedia XML dump), throughput improves by a factor 7 (large input). On mostly non-ASCII input, performance increases slightly or is the same. The UTF-8 validation is rewritten to use indexed access; since all access is preceded by a (mandatory for validation) length check, bounds checks are statically elided by LLVM and this formulation is in fact the best for performance. A previous version had losses due to slice to iterator conversions. A large credit to Björn Steinbrink who improved this patch immensely, writing this second version. Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3. Old code is `regular`, this PR is called `fast`. Datasets: - `ascii` is just ASCII (2.5 kB) - `cyr` is cyrillic script with ascii spaces (5 kB) - `dewik10` is 10MB of a de.wikipedia XML dump - `enwik8` is 100MB of an en.wikipedia XML dump - `jawik10` is 10MB of a ja.wikipedia XML dump ``` test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s test from_utf8_cyr_regular ... bench: 10,944 ns/iter (+/- 795) = 468 MB/s test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s ``` Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>	2016-01-16 01:18:48 +00:00
Ulrik Sverdrup	11e3de39d9	Add fast path for ASCII in UTF-8 validation This speeds up the ascii case (and long stretches of ascii in otherwise mixed UTF-8 data) when checking UTF-8 validity. Benchmark results suggest that on purely ASCII input, we can improve throughput (megabytes verified / second) by a factor of 13 to 14! On xml and mostly english language input (en.wikipedia xml dump), throughput increases by a factor 7. On mostly non-ASCII input, performance increases slightly or is the same. The UTF-8 validation is rewritten to use indexed access; since all access is preceded by a (mandatory for validation) length check, they are statically elided by llvm and this formulation is in fact the best for performance. A previous version had losses due to slice to iterator conversions. A large credit to Björn Steinbrink who improved this patch immensely, writing this second version. Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3. Old code is `regular`, this PR is called `fast`. Datasets: - `ascii` is just ascii (2.5 kB) - `cyr` is cyrillic script with ascii spaces (5 kB) - `dewik10` is 10MB of a de.wikipedia xml dump - `enwik10` is 100MB of an en.wikipedia xml dump - `jawik10` is 10MB of a ja.wikipedia xml dump ``` test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s test from_utf8_cyr_regular ... bench: 12,250 ns/iter (+/- 437) = 418 MB/s test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s ``` Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>	2016-01-12 21:57:04 +01:00
William Throwe	e7f3d6eddd	Let str::replace take a pattern It appears this was left out of RFC #528 because it might be useful to also generalize the second argument in some way. That doesn't seem to prevent generalizing the first argument now, however. This is a [breaking-change] because it could cause type-inference to fail where it previously succeeded.	2015-12-07 22:08:33 -05:00
Kevin Butler	83b308e585	Add assertions to test_total_ord for str	2015-10-24 19:53:42 +01:00
Kevin Butler	49c78789ce	Remove unnecessary String allocations from str tests	2015-10-24 19:53:33 +01:00
Cristi Cobzarenco	4b308b44e1	typos: fix a grabbag of typos all over the place	2015-10-08 19:49:31 +01:00
Alex Crichton	d5f2d3b177	std: Update MatchIndices to return a subslice This commit updates the `MatchIndices` and `RMatchIndices` iterators to follow the same pattern as the `chars` and `char_indices` iterators. The `matches` iterator currently yields `&str` elements, so the `MatchIndices` iterator now yields the index of the match as well as the `&str` that matched (instead of start/end indexes). cc #27743	2015-09-25 09:29:23 -07:00
Alex Crichton	48615a68fb	std: Account for CRLF in {str, BufRead}::lines This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of the `str::lines` and `BufRead::lines` iterators. Both iterators now account for `\r\n` sequences in addition to `\n`, allowing for less surprising behavior across platforms (especially in the `BufRead` case). Splitting only on the `\n` character can still be achieved with `split('\n')` in both cases. The `str::lines_any` function is also now deprecated as `str::lines` is a drop-in replacement for it. [rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md Closes #28032	2015-09-03 23:01:41 -07:00
bors	b0f77ba26a	Auto merge of #28101 - ijks:24214-str-bytes, r=alexcrichton Specifically, `count`, `last`, and `nth` are implemented to use the methods of the underlying slice iterator. Partially closes #24214.	2015-08-31 09:15:55 +00:00
Daan Rijks	dacf2725ec	Add overrides to iterator methods for `str::Bytes` Specifically, `count`, `last`, and `nth` are implemented to use the methods of the underlying slice iterator. Partially closes #24214.	2015-08-30 17:32:50 +02:00
bors	de67d62c6b	Auto merge of #27474 - bluss:twoway-reverse, r=brson StrSearcher: Implement the complete reverse case for the two way algorithm Fix quadratic behavior in StrSearcher in reverse search with periodic needles. This commit adds the missing pieces for the "short period" case in reverse search. The short case will show up when the needle is literally periodic, for example "abababab". Two way uses a "critical factorization" of the needle: x = u v. Searching matches v first, if mismatch at character k, skip k forward. Matching u, if mismatch, skip period(x) forward. To avoid O(mn) behavior after mismatch in u, memorize the already matched prefix. The short period case requires that |u| < period(x). For the reverse search we need to compute a different critical factorization x = u' v' where |v'| < period(x), because we are searching for the reversed needle. A short v' also benefits the algorithm in general. The reverse critical factorization is computed quickly by using the same maximal suffix algorithm, but terminating as soon as we have a location with local period equal to period(x). This adds extra fields crit_pos_back and memory_back for the reverse case. The new overhead for TwoWaySearcher::new is low, and additionally I think the "short period" case is uncommon in many applications of string search. The maximal_suffix methods were updated in documentation and the algorithms updated to not use !0 and wrapping add, variable left is now 1 larger, offset 1 smaller. Use periodicity when computing byteset: in the periodic case, just iterate over one period instead of the whole needle. Example before (rfind) after (twoway_rfind) benchmark shows the removal of quadratic behavior. needle: "ab" * 100, haystack: ("bb" + "ab" * 100) * 100 ``` test periodic::rfind ... bench: 1,926,595 ns/iter (+/- 11,390) = 10 MB/s test periodic::twoway_rfind ... bench: 51,740 ns/iter (+/- 66) = 386 MB/s ```	2015-08-18 02:02:57 +00:00
bors	e2bebf32fa	Auto merge of #27696 - bluss:into-boxed-str, r=alexcrichton Rename String::into_boxed_slice -> into_boxed_str This is the name that was decided in rust-lang/rfcs#1152, and it's better if we say “boxed str” for `Box<str>`. The old name `String::into_boxed_slice` is deprecated.	2015-08-14 01:06:37 +00:00
Ulrik Sverdrup	bec64090a7	Rename String::into_boxed_slice -> into_boxed_str This is the name that was decided in rust-lang/rfcs#1152, and it's better if we say “boxed str” for `Box<str>`. The old name `String::into_boxed_slice` is deprecated.	2015-08-13 14:02:00 +02:00
Alex Crichton	8d90d3f368	Remove all unstable deprecated functionality This commit removes all unstable and deprecated functions in the standard library. A release was recently cut (1.3) which makes this a good time for some spring cleaning of the deprecated functions.	2015-08-12 14:55:17 -07:00
Ulrik Sverdrup	c5a1d8c3db	StrSearcher: Add tests for rfind(&str) Add tests for .rfind(&str), using the reverse searcher case for substring search.	2015-08-02 20:08:35 +02:00
Alexis Beingessner	3e954a8cb2	implement Clone for Box<str>, closes #27323 This is a minor [breaking-change], as it changes what `boxed_str.to_owned()` does (previously it would deref to `&str` and call `to_owned` on that to get a `String`). However `Box<str>` is such an exceptionally rare type that this is not expected to be a serious concern. Also a `Box<str>` can be freely converted to a `String` to obtain the previous behaviour anyway.	2015-07-29 18:43:01 -07:00
bors	dd46cf8b22	Auto merge of #26241 - SimonSapin:derefmut-for-string, r=alexcrichton See https://github.com/rust-lang/rfcs/issues/1157	2015-07-13 23:47:06 +00:00
Simon Sapin	7469914e96	Add str::split_at_mut	2015-07-13 16:21:43 +02:00
bors	05d8767289	Auto merge of #26957 - wesleywiser:rename_connect_to_join, r=alexcrichton Fixes #26900	2015-07-12 22:05:59 +00:00
Jonathan Reem	69521affbb	Add String::into_boxed_slice and Box<str>::into_string Implements merged RFC 1152. Closes #26697.	2015-07-11 21:31:56 -07:00
Wesley Wiser	93ddee6cee	Change some instances of .connect() to .join()	2015-07-10 19:40:46 -04:00
Ulrik Sverdrup	b890b7bbc7	StrSearcher: Update substring search to use the Two Way algorithm To improve our substring search performance, revive the two way searcher and adapt it to the Pattern API. Fixes #25483, a performance bug: that particular case now completes faster in optimized rust than in ruby (but they share the same order of magnitude). Much thanks to @gereeter who helped me understand the reverse case better and wrote the comment explaining `next_back` in the code. I had quickcheck to fuzz test forward and reverse searching thoroughly. The two way searcher implements both forward and reverse search, but not double ended search. The forward and reverse parts of the two way searcher are completely independent. The two way searcher algorithm has very small, constant space overhead, requiring no dynamic allocation. Our implementation is relatively fast, especially due to the `byteset` addition to the algorithm, which speeds up many no-match cases. A bad case for the two way algorithm is: ``` let haystack = (0..10_000).map(|_| "dac").collect::<String>(); let needle = (0..100).map(|_| "bac").collect::<String>(); ``` For this particular case, two way is not much faster than the naive implementation it replaces.	2015-06-21 19:58:50 +02:00
Alex Crichton	ce1a965cf5	Fallout in tests and docs from feature renamings	2015-06-17 09:07:16 -07:00
bors	fbb13543fc	Auto merge of #25839 - bluss:str-split-at-impl, r=alexcrichton Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str). Also a minor cleanup in the collections::str module. Remove redundant slicing of self.	2015-06-11 00:22:27 +00:00
Ulrik Sverdrup	d43bf53948	Add str::split_at Implement RFC rust-lang/rfcs#1123 Add str method str::split_at(mid: usize) -> (&str, &str).	2015-06-10 09:15:07 +02:00
bors	f06e026578	Auto merge of #26039 - SimonSapin:case-mapping, r=alexcrichton * Add “complex” mappings to `char::to_lowercase` and `char::to_uppercase`, making them sometimes yield more than one `char`: #25800. `str::to_lowercase` and `str::to_uppercase` are affected as well. * Add `char::to_titlecase`, since it’s the same algorithm (just different data). However this does not add `str::to_titlecase`, as that would require UAX#29 Unicode Text Segmentation which we decided not to include in `std`: https://github.com/rust-lang/rfcs/pull/1054 I made `char::to_titlecase` immediately `#[stable]`, since it’s so similar to `char::to_uppercase` that’s already stable. Let me know if it should be `#[unstable]` for a while. * Add a special case for upper-case Sigma in word-final position in `str::to_lowercase`: #26035. This is the only language-independent conditional mapping currently in `SpecialCasing.txt`. * Stabilize `str::to_lowercase` and `str::to_uppercase`. The `&self -> String` on `str` signature seems straightforward enough, and the only relevant issue I’ve found is #24536 about naming. But `char` already has stable methods with the same name, and deprecating them for a rename doesn’t seem worth it. r? @alexcrichton	2015-06-09 20:00:32 +00:00
Simon Sapin	c57a4124ff	Address a review comment and fix a bootstrapping issue	2015-06-08 19:50:28 +02:00
Simon Sapin	c160192f5f	Replace usage of String::from_str with String::from	2015-06-08 16:55:35 +02:00
Simon Sapin	f901086b0d	Correctly map upper-case Sigma to lower-case in word-final position. Fix #26035 .	2015-06-06 12:37:11 +02:00
kwantam	c361e13d71	implement rfc 1054: split_whitespace() fn, deprecate words() For now, words() is left in (but deprecated), and Words is a type alias for struct SplitWhitespace. Also cleaned up references to s.words() throughout codebase. Closes #15628	2015-04-21 15:31:51 -04:00
kwantam	29d1252e4d	deprecate Unicode functions that will be moved to crates.io This patch 1. renames libunicode to librustc_unicode, 2. deprecates several pieces of libunicode (see below), and 3. removes references to deprecated functions from librustc_driver and libsyntax. This may change pretty-printed output from these modules in cases involving wide or combining characters used in filenames, identifiers, etc. The following functions are marked deprecated: 1. char.width() and str.width(): --> use unicode-width crate 2. str.graphemes() and str.grapheme_indices(): --> use unicode-segmentation crate 3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(), char.compose(), char.decompose_canonical(), char.decompose_compatible(), char.canonical_combining_class(): --> use unicode-normalization crate	2015-04-16 17:03:05 -04:00
Alex Crichton	f329030b09	std: Stabilize the Utf8Error type The meaning of each variant of this enum was somewhat ambiguous and it's unclear that we wouldn't even want to add more enumeration values in the future. As a result this error has been altered to instead become an opaque structure. Learning about the "first invalid byte index" is still an unstable feature, but the type itself is now stable.	2015-04-10 16:07:46 -07:00
bors	dd6c4a8f15	Auto merge of #23293 - tbu-:pr_additive_multiplicative, r=alexcrichton Previously it could not be implemented for types outside `libcore/iter.rs` due to coherence issues.	2015-04-08 00:42:10 +00:00
Tobias Bucher	97f24a8596	Make `sum` and `product` inherent methods on `Iterator` In addition to being nicer, this also allows you to use `sum` and `product` for iterators yielding custom types aside from the standard integers. Due to removing the `AdditiveIterator` and `MultiplicativeIterator` trait, this is a breaking change. [breaking-change]	2015-04-08 00:26:35 +02:00
Marvin Löbel	fbba28e246	Added smoke tests for new methods. Fixed bug in existing StrSearcher impl	2015-04-05 18:52:58 +02:00
Marvin Löbel	c29559d28a	Moved coretest::str tests into collectiontest::str	2015-04-05 18:52:58 +02:00
Alex Crichton	e98dce3e00	std: Changing the meaning of the count to splitn This commit is an implementation of [RFC 979][rfc] which changes the meaning of the count parameter to the `splitn` function on strings and slices. The parameter now means the number of items that are returned from the iterator, not the number of splits that are made. [rfc]: https://github.com/rust-lang/rfcs/pull/979 Closes #23911 [breaking-change]	2015-04-01 13:29:42 -07:00


58 Commits
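The printability rule from commit e7d16580f5 can be observed from user code. This sketch assumes a current Rust toolchain, where the `Debug`-style escape is exposed as `str::escape_debug` alongside `str::escape_default`; the helper name `escapes` is hypothetical.

```rust
/// Hypothetical helper: escape `s` both ways for comparison.
/// `escape_default` escapes everything outside printable ASCII, while
/// `escape_debug` (the rule used by `Debug`) leaves printable non-ASCII
/// characters such as 'é' unescaped.
fn escapes(s: &str) -> (String, String) {
    (s.escape_default().to_string(), s.escape_debug().to_string())
}
```

For example, `escapes("é\t")` yields `\u{e9}\t` in the first form and `é\t` in the second; format characters such as U+200B (category Cf) are still escaped by `{:?}`.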
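`char::decode_utf16` and `DecodeUtf16Error::unpaired_surrogate`, both stabilized in commit 552eda70d3, can be combined into a lossy UTF-16 decoder. A minimal sketch; `decode_lossy` is a hypothetical name and substituting U+FFFD for unpaired surrogates is an assumed policy, not something the commit prescribes.

```rust
use std::char::decode_utf16;

/// Hypothetical helper: decode UTF-16 code units, replacing every unpaired
/// surrogate with U+FFFD and recording the offending surrogate values.
fn decode_lossy(units: &[u16]) -> (String, Vec<u16>) {
    let mut text = String::new();
    let mut unpaired = Vec::new();
    for result in decode_utf16(units.iter().cloned()) {
        match result {
            Ok(c) => text.push(c),
            Err(err) => {
                // The error reports which surrogate code unit had no partner.
                unpaired.push(err.unpaired_surrogate());
                text.push('\u{FFFD}');
            }
        }
    }
    (text, unpaired)
}
```

Here `0xD834, 0xDD1E` form a valid surrogate pair (U+1D11E), while a trailing lone `0xD834` is reported as unpaired.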
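`str::split_at` (commit d43bf53948) takes a byte offset and panics when that offset falls inside a multi-byte character; `str::is_char_boundary` (stabilized in 552eda70d3, tested in f0a1ea27cc) can check the offset first. A sketch using a hypothetical `split_at_boundary` helper:

```rust
/// Hypothetical helper: split `s` at byte offset `mid`, or return `None`
/// where `str::split_at` would panic.
fn split_at_boundary(s: &str, mid: usize) -> Option<(&str, &str)> {
    // is_char_boundary is true for 0 and s.len(), and false past the end
    // or in the middle of a UTF-8 sequence.
    if s.is_char_boundary(mid) {
        Some(s.split_at(mid))
    } else {
        None
    }
}
```

In `"héllo"` the `é` occupies bytes 1..3, so offsets 1 and 3 split cleanly while offset 2 is rejected. Newer Rust releases also offer `str::split_at_checked` with this behavior.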
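Commit 4594f0f67a limits panic messages by truncating long strings "simply to char boundary". One way to express that truncation is sketched below; `truncate_to_char_boundary` is a hypothetical helper, not the code from the commit.

```rust
/// Hypothetical helper: return the longest prefix of `s` that is at most
/// `max` bytes long and ends on a char boundary.
fn truncate_to_char_boundary(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    // Walk back until the cut no longer splits a UTF-8 sequence.
    // Terminates because offset 0 is always a boundary.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```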
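The ASCII fast path in commits e7e4ecc522 and 11e3de39d9 relies on one observation: a machine word contains only ASCII bytes exactly when none of its bytes has the high bit set. The sketch below illustrates that idea in simplified form. It is not the libcore implementation (which works on aligned words inside the validation loop), and `ascii_prefix_len` and `is_valid_utf8` are hypothetical names.

```rust
/// Mask with the high bit of each of the 8 bytes in a u64 set.
const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

/// Length of the leading all-ASCII run of `bytes`, checking 8 bytes per step.
fn ascii_prefix_len(bytes: &[u8]) -> usize {
    let mut i = 0;
    for chunk in bytes.chunks_exact(8) {
        let mut word_bytes = [0u8; 8];
        word_bytes.copy_from_slice(chunk);
        // Byte order is irrelevant: we only ask whether any high bit is set.
        let word = u64::from_ne_bytes(word_bytes);
        if word & HIGH_BITS != 0 {
            break;
        }
        i += 8;
    }
    // Finish byte by byte: the tail, or the word that contained a high bit.
    while i < bytes.len() && bytes[i] < 0x80 {
        i += 1;
    }
    i
}

/// Hypothetical validator: skip the ASCII prefix cheaply, then hand the
/// rest to the standard validator. Splitting after an ASCII byte never
/// cuts a multi-byte sequence, so the result equals validating the whole.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    let start = ascii_prefix_len(bytes);
    std::str::from_utf8(&bytes[start..]).is_ok()
}
```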
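Since commit e7f3d6eddd, the first argument of `str::replace` is a generic pattern, so a `char`, a `&str`, or a `char` predicate all work. `normalize_separators` is a hypothetical example.

```rust
/// Hypothetical helper: turn both '\\' and ':' into '/', using a closure
/// over `char` as the pattern passed to `str::replace`.
fn normalize_separators(path: &str) -> String {
    path.replace(|c: char| c == '\\' || c == ':', "/")
}
```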
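After commit d5f2d3b177, `match_indices` yields the start index together with the matched `&str`, so an end offset is derived from the match's length. A sketch with a hypothetical `match_spans` helper that recovers the older start/end pairs:

```rust
/// Hypothetical helper: byte ranges (start, end) of every non-overlapping
/// match of `needle` in `haystack`.
fn match_spans(haystack: &str, needle: &str) -> Vec<(usize, usize)> {
    haystack
        .match_indices(needle)
        .map(|(start, matched)| (start, start + matched.len()))
        .collect()
}
```

Matches do not overlap, so searching `"aaa"` for `"aa"` produces a single span.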
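Under RFC 1212 (commit 48615a68fb), `str::lines` treats both `\n` and `\r\n` as line terminators, while `split('\n')` splits only on `\n` and keeps any `\r`. A small comparison using a hypothetical helper name:

```rust
/// Hypothetical helper: collect `text` with both approaches.
/// `lines()` strips "\n" or "\r\n" and yields no empty item after a final
/// newline; `split('\n')` keeps '\r' and does yield that empty item.
fn lines_vs_split(text: &str) -> (Vec<&str>, Vec<&str>) {
    (text.lines().collect(), text.split('\n').collect())
}
```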
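The per-`char` case mappings from commit f06e026578 can yield more than one `char`, while `str::to_lowercase` also applies the context-dependent word-final Sigma rule from commit f901086b0d. A short sketch; the helper names are hypothetical and the expected strings follow Unicode's `SpecialCasing.txt`.

```rust
/// Hypothetical helpers: apply the context-free per-char mappings and
/// collect the possibly multi-char result into a String.
fn upper_char(c: char) -> String {
    c.to_uppercase().collect()
}

fn lower_char(c: char) -> String {
    c.to_lowercase().collect()
}
```

A lone `'Σ'` always lowercases to `σ`, but in `"ΣΑΣ"` the final Sigma follows a cased letter and ends the word, so `str::to_lowercase` produces `ς` there.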
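`split_whitespace` (RFC 1054, commit c361e13d71) splits on any run of Unicode whitespace and never yields empty items, unlike splitting on a single `' '`. A sketch with a hypothetical `words` helper:

```rust
/// Hypothetical helper: the whitespace-separated words of `s`.
fn words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}
```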
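Commit 97f24a8596 made `sum` and `product` usable with iterators over custom types. In current Rust this goes through the `std::iter::Sum` trait, which arrived in a later release than this commit; the `Money` type below is hypothetical.

```rust
use std::iter::Sum;
use std::ops::Add;

/// Hypothetical custom type, stored in cents to avoid floating point.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Money {
    cents: i64,
}

impl Add for Money {
    type Output = Money;
    fn add(self, other: Money) -> Money {
        Money { cents: self.cents + other.cents }
    }
}

/// Implementing `Sum` lets `Iterator::sum` produce a `Money`.
impl Sum for Money {
    fn sum<I: Iterator<Item = Money>>(iter: I) -> Money {
        iter.fold(Money { cents: 0 }, |acc, m| acc + m)
    }
}
```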
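Under RFC 979 (commit e98dce3e00), the count passed to `splitn` is the maximum number of items yielded, not the number of splits, and the final item holds the unsplit remainder. A sketch with a hypothetical `first_n_fields` helper:

```rust
/// Hypothetical helper: split `line` on ':' into at most `n` fields.
/// The last field keeps any remaining ':' characters; n == 0 yields nothing.
fn first_n_fields(line: &str, n: usize) -> Vec<&str> {
    line.splitn(n, ':').collect()
}
```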