This commit starts out by consolidating all `str` extension traits into one
`StrExt` trait to be included in the prelude. This means that
`UnicodeStrPrelude`, `StrPrelude`, and `StrAllocating` have all been merged into
one `StrExt` exported by the standard library. Some functionality is currently
duplicated with the `StrExt` present in libcore.
This commit also currently avoids any methods which require any form of pattern
to operate. These functions will be stabilized via a separate RFC.
Next, stability of methods and structures are as follows:
Stable
* from_utf8_unchecked
* CowString - after moving to std::string
* StrExt::as_bytes
* StrExt::as_ptr
* StrExt::bytes/Bytes - also made a struct instead of a typedef
* StrExt::char_indices/CharIndices - CharOffsets was renamed
* StrExt::chars/Chars
* StrExt::is_empty
* StrExt::len
* StrExt::lines/Lines
* StrExt::lines_any/LinesAny
* StrExt::slice_unchecked
* StrExt::trim
* StrExt::trim_left
* StrExt::trim_right
* StrExt::words/Words - also made a struct instead of a typedef
Unstable
* from_utf8 - the error type was changed to a `Result`, but the error type has
yet to prove itself
* from_c_str - this function will be handled by the c_str RFC
* FromStr - this trait will have an associated error type eventually
* StrExt::escape_default - needs iterators at least, unsure if it should make
the cut
* StrExt::escape_unicode - needs iterators at least, unsure if it should make
the cut
* StrExt::slice_chars - this function has yet to prove itself
* StrExt::slice_shift_char - awaiting conventions about slicing and shifting
* StrExt::graphemes/Graphemes - this functionality may only be in libunicode
* StrExt::grapheme_indices/GraphemeIndices - this functionality may only be in
libunicode
* StrExt::width - this functionality may only be in libunicode
* StrExt::utf16_units - this functionality may only be in libunicode
* StrExt::nfd_chars - this functionality may only be in libunicode
* StrExt::nfkd_chars - this functionality may only be in libunicode
* StrExt::nfc_chars - this functionality may only be in libunicode
* StrExt::nfkc_chars - this functionality may only be in libunicode
* StrExt::is_char_boundary - naming is uncertain with container conventions
* StrExt::char_range_at - naming is uncertain with container conventions
* StrExt::char_range_at_reverse - naming is uncertain with container conventions
* StrExt::char_at - naming is uncertain with container conventions
* StrExt::char_at_reverse - naming is uncertain with container conventions
* StrVector::concat - this functionality may be replaced with iterators, but
it's not certain at this time
* StrVector::connect - as with concat, may be deprecated in favor of iterators
Deprecated
* StrAllocating and UnicodeStrPrelude have been merged into StrExit
* eq_slice - compiler implementation detail
* from_str - use the inherent parse() method
* is_utf8 - call from_utf8 instead
* replace - call the method instead
* truncate_utf16_at_nul - this is an implementation detail of windows and does
not need to be exposed.
* utf8_char_width - moved to libunicode
* utf16_items - moved to libunicode
* is_utf16 - moved to libunicode
* Utf16Items - moved to libunicode
* Utf16Item - moved to libunicode
* Utf16Encoder - moved to libunicode
* AnyLines - renamed to LinesAny and made a struct
* SendStr - use CowString<'static> instead
* str::raw - all functionality is deprecated
* StrExt::into_string - call to_string() instead
* StrExt::repeat - use iterators instead
* StrExt::char_len - use .chars().count() instead
* StrExt::is_alphanumeric - use .chars().all(..)
* StrExt::is_whitespace - use .chars().all(..)
Pending deprecation -- while slicing syntax is being worked out, these methods
are all #[unstable]
* Str - while currently used for generic programming, this trait will be
replaced with one of [], deref coercions, or a generic conversion trait.
* StrExt::slice - use slicing syntax instead
* StrExt::slice_to - use slicing syntax instead
* StrExt::slice_from - use slicing syntax instead
* StrExt::lev_distance - deprecated with no replacement
Awaiting stabilization due to patterns and/or matching
* StrExt::contains
* StrExt::contains_char
* StrExt::split
* StrExt::splitn
* StrExt::split_terminator
* StrExt::rsplitn
* StrExt::match_indices
* StrExt::split_str
* StrExt::starts_with
* StrExt::ends_with
* StrExt::trim_chars
* StrExt::trim_left_chars
* StrExt::trim_right_chars
* StrExt::find
* StrExt::rfind
* StrExt::find_str
* StrExt::subslice_offset
`String::push(&mut self, ch: char)` currently has a single code path that calls `Char::encode_utf8`. This adds a fast path for ASCII `char`s, which are represented as a single byte in UTF-8.
Benchmarks of stage1 libcollections at the intermediate commit show that the fast path very significantly improves the performance of repeatedly pushing an ASCII `char`, but does not significantly affect the performance for a non-ASCII `char` (where the fast path is not taken).
```
bench_push_char_one_byte 59552 ns/iter (+/- 2132) = 167 MB/s
bench_push_char_one_byte_with_fast_path 6563 ns/iter (+/- 658) = 1523 MB/s
bench_push_char_two_bytes 71520 ns/iter (+/- 3541) = 279 MB/s
bench_push_char_two_bytes_with_slow_path 71452 ns/iter (+/- 4202) = 279 MB/s
bench_push_str_one_byte 38910 ns/iter (+/- 2477) = 257 MB/s
```
A benchmark of pushing a one-byte-long `&str` is added for comparison, but its performance [has varied a lot lately](https://github.com/rust-lang/rust/pull/19640#issuecomment-67741561). (When the input is fixed, `s.push_str("x")` could be used just as well as `s.push('x')`.)
TL;DR I wrongly implemented these two ops, namely `"prefix" + "suffix".to_string()` gives back `"suffixprefix"`. Let's remove them.
The correct implementation of these operations (`lhs.clone().push_str(rhs.as_slice())`) is really wasteful, because the lhs has to be cloned and the rhs gets moved/consumed just to be dropped (no buffer reuse). For this reason, I'd prefer to drop the implementation instead of fixing it. This leaves us with the fact that you'll be able to do `String + &str` but not `&str + String`, which may be unexpected.
r? @aturon
Closes#19952
`String::push(&mut self, ch: char)` currently has a single code path
that calls `Char::encode_utf8`.
Perhaps it could be faster for ASCII `char`s, which are represented as
a single byte in UTF-8.
This commit leaves the method unchanged,
adds a copy of it with the fast path,
and adds benchmarks to compare them.
Results show that the fast path very significantly improves the performance
of repeatedly pushing an ASCII `char`,
but does not significantly affect the performance for a non-ASCII `char`
(where the fast path is not taken).
Output of `make check-stage1-collections NO_REBUILD=1 PLEASE_BENCH=1 TESTNAME=string::tests::bench_push`
```
test string::tests::bench_push_char_one_byte ... bench: 59552 ns/iter (+/- 2132) = 167 MB/s
test string::tests::bench_push_char_one_byte_with_fast_path ... bench: 6563 ns/iter (+/- 658) = 1523 MB/s
test string::tests::bench_push_char_two_bytes ... bench: 71520 ns/iter (+/- 3541) = 279 MB/s
test string::tests::bench_push_char_two_bytes_with_slow_path ... bench: 71452 ns/iter (+/- 4202) = 279 MB/s
test string::tests::bench_push_str ... bench: 24 ns/iter (+/- 2)
test string::tests::bench_push_str_one_byte ... bench: 38910 ns/iter (+/- 2477) = 257 MB/s
```
A benchmark of pushing a one-byte-long `&str` is added for comparison,
but its performance [has varied a lot lately](
https://github.com/rust-lang/rust/pull/19640#issuecomment-67741561).
(When the input is fixed, `s.push_str("x")` could be used
instead of `s.push('x')`.)
followed by a semicolon.
This allows code like `vec![1i, 2, 3].len();` to work.
This breaks code that uses macros as statements without putting
semicolons after them, such as:
fn main() {
...
assert!(a == b)
assert!(c == d)
println(...);
}
It also breaks code that uses macros as items without semicolons:
local_data_key!(foo)
fn main() {
println("hello world")
}
Add semicolons to fix this code. Those two examples can be fixed as
follows:
fn main() {
...
assert!(a == b);
assert!(c == d);
println(...);
}
local_data_key!(foo);
fn main() {
println("hello world")
}
RFC #378.
Closes#18635.
[breaking-change]
This commit performs a second pass stabilization of the `std::default` module.
The module was already marked `#[stable]`, and the inheritance of `#[stable]`
was removed since this attribute was applied. This commit adds the `#[stable]`
attribute to the trait definition and one method name, along with all
implementations found in the standard distribution.
This commit collapses the various prelude traits for slices into just one trait:
* SlicePrelude/SliceAllocPrelude => SliceExt
* CloneSlicePrelude/CloneSliceAllocPrelude => CloneSliceExt
* OrdSlicePrelude/OrdSliceAllocPrelude => OrdSliceExt
* PartialEqSlicePrelude => PartialEqSliceExt
This commit collapses the various prelude traits for slices into just one trait:
* SlicePrelude/SliceAllocPrelude => SliceExt
* CloneSlicePrelude/CloneSliceAllocPrelude => CloneSliceExt
* OrdSlicePrelude/OrdSliceAllocPrelude => OrdSliceExt
* PartialEqSlicePrelude => PartialEqSliceExt
Strings iterate to both char and &str, so it is natural it can also be extended or collected from an iterator of &str.
Apart from the trait implementations, `Extend<char>` is updated to use the iterator size hint, and the test added tests both the char and the &str versions of Extend and FromIterator.
Change Example to Examples.
Add a doctest that better demonstrates the utility of as_string.
Update the doctest example to use String instead of &String.
This commit is an implementation of [RFC 240][rfc] when applied to the standard
library. It primarily deprecates the entirety of `string::raw`, `vec::raw`,
`slice::raw`, and `str::raw` in favor of associated functions, methods, and
other free functions. The detailed renaming is:
* slice::raw::buf_as_slice => slice::with_raw_buf
* slice::raw::mut_buf_as_slice => slice::with_raw_mut_buf
* slice::shift_ptr => deprecated with no replacement
* slice::pop_ptr => deprecated with no replacement
* str::raw::from_utf8 => str::from_utf8_unchecked
* str::raw::c_str_to_static_slice => str::from_c_str
* str::raw::slice_bytes => deprecated for slice_unchecked (slight semantic diff)
* str::raw::slice_unchecked => str.slice_unchecked
* string::raw::from_parts => String::from_raw_parts
* string::raw::from_buf_len => String::from_raw_buf_len
* string::raw::from_buf => String::from_raw_buf
* string::raw::from_utf8 => String::from_utf8_unchecked
* vec::raw::from_buf => Vec::from_raw_buf
All previous functions exist in their `#[deprecated]` form, and the deprecation
messages indicate how to migrate to the newer variants.
[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/0240-unsafe-api-location.md
[breaking-change]
Closes#17863
Throughout the docs, "failure" was replaced with "panics" if it means a
task panic. Otherwise, it remained as is, or changed to "errors" to
clearly differentiate it from a task panic.
* Renames/deprecates the simplest and most obvious methods
* Adds FIXME(conventions)s for outstanding work
* Marks "handled" methods as unstable
NOTE: the semantics of reserve and reserve_exact have changed!
Other methods have had their semantics changed as well, but in a
way that should obviously not typecheck if used incorrectly.
Lots of work and breakage to come, but this handles most of the core
APIs and most eggregious breakage. Future changes should *mostly* focus on
niche collections, APIs, or simply back-compat additions.
[breaking-change]
This commit renames a number of extension traits for slices and string
slices, now that they have been refactored for DST. In many cases,
multiple extension traits could now be consolidated. Further
consolidation will be possible with generalized where clauses.
The renamings are consistent with the [new `-Prelude`
suffix](https://github.com/rust-lang/rfcs/pull/344). There are probably
a few more candidates for being renamed this way, but that is left for
API stabilization of the relevant modules.
Because this renames traits, it is a:
[breaking-change]
However, I do not expect any code that currently uses the standard
library to actually break.
Closes#17917
As part of the collections reform RFC, this commit removes all collections
traits in favor of inherent methods on collections themselves. All methods
should continue to be available on all collections.
This is a breaking change with all of the collections traits being removed and
no longer being in the prelude. In order to update old code you should move the
trait implementations to inherent implementations directly on the type itself.
Note that some traits had default methods which will also need to be implemented
to maintain backwards compatibility.
[breaking-change]
cc #18424
This commit adds the following impls:
impl<T> Deref<[T]> for Vec<T>
impl<T> DerefMut<[T]> for Vec<T>
impl Deref<str> for String
This commit also removes all duplicated inherent methods from vectors and
strings as implementations will now silently call through to the slice
implementation. Some breakage occurred at std and beneath due to inherent
methods removed in favor of those in the slice traits and std doesn't use its
own prelude,
cc #18424
https://github.com/rust-lang/rfcs/pull/221
The current terminology of "task failure" often causes problems when
writing or speaking about code. You often want to talk about the
possibility of an operation that returns a Result "failing", but cannot
because of the ambiguity with task failure. Instead, you have to speak
of "the failing case" or "when the operation does not succeed" or other
circumlocutions.
Likewise, we use a "Failure" header in rustdoc to describe when
operations may fail the task, but it would often be helpful to separate
out a section describing the "Err-producing" case.
We have been steadily moving away from task failure and toward Result as
an error-handling mechanism, so we should optimize our terminology
accordingly: Result-producing functions should be easy to describe.
To update your code, rename any call to `fail!` to `panic!` instead.
Assuming you have not created your own macro named `panic!`, this
will work on UNIX based systems:
grep -lZR 'fail!' . | xargs -0 -l sed -i -e 's/fail!/panic!/g'
You can of course also do this by hand.
[breaking-change]
Spring cleaning is here! In the Fall! This commit removes quite a large amount
of deprecated functionality from the standard libraries. I tried to ensure that
only old deprecated functionality was removed.
This is removing lots and lots of deprecated features, so this is a breaking
change. Please consult the deprecation messages of the deleted code to see how
to migrate code forward if it still needs migration.
[breaking-change]
compiletest: compact "linux" "macos" etc.as "unix".
liballoc: remove a superfluous "use".
libcollections: remove invocations of deprecated methods in favor of
their suggested replacements and use "_" for a loop counter.
libcoretest: remove invocations of deprecated methods; also add
"allow(deprecated)" for testing a deprecated method itself.
libglob: use "cfg_attr".
libgraphviz: add a test for one of data constructors.
libgreen: remove a superfluous "use".
libnum: "allow(type_overflow)" for type cast into u8 in a test code.
librustc: names of static variables should be in upper case.
libserialize: v[i] instead of get().
libstd/ascii: to_lowercase() instead of to_lower().
libstd/bitflags: modify AnotherSetOfFlags to use i8 as its backend.
It will serve better for testing various aspects of bitflags!.
libstd/collections: "allow(deprecated)" for testing a deprecated
method itself.
libstd/io: remove invocations of deprecated methods and superfluous "use".
Also add #[test] where it was missing.
libstd/num: introduce a helper function to effectively remove
invocations of a deprecated method.
libstd/path and rand: remove invocations of deprecated methods and
superfluous "use".
libstd/task and libsync/comm: "allow(deprecated)" for testing
a deprecated method itself.
libsync/deque: remove superfluous "unsafe".
libsync/mutex and once: names of static variables should be in upper case.
libterm: introduce a helper function to effectively remove
invocations of a deprecated method.
We still see a few warnings about using obsoleted native::task::spawn()
in the test modules for libsync. I'm not sure how I should replace them
with std::task::TaksBuilder and native::task::NativeTaskBuilder
(dependency to libstd?)
Signed-off-by: NODA, Kai <nodakai@gmail.com>
This provides a way to pass `&[T]` to functions taking `&U` where `U` is
a `Vec<T>`. This is useful in many cases not covered by the Equiv trait
or methods like `find_with` on TreeMap.
This commit deprecates the String::shift_char() function in favor of the
addition of an insert()/remove() pair of functions. This aligns the API with Vec
in that characters can be inserted at arbitrary positions. Additionaly, there
is no `_char` suffix due to the rationaled laid out in the previous commit.
These functions are both introduced as unstable as their failure semantics,
while in line with slices/vectors, are uncertain about whether they should
remain the same.
# Rationale
When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name. There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).
The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:
> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
# Method stabilization
The following methods have been marked #[stable]
* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice
The following methods have been marked #[unstable]
* String::from_utf8 - The error type in the returned `Result` may change to
provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
stabilization
* String::from_utf16 - The return type may change to become a `Result` which
includes more contextual information like where the error
occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
been marked #[unstable] becuase it can be seen as a
duplicate of iterator-based functionality as well as
possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
less efficient because of decoding/encoding. Due to the
desire to minimize API surface this may be able to be
removed in the future for something possibly generic with
no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
indices isn't totally clear at this time, so the failure
semantics and return value of this method are subject to
change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]
[2]: https://github.com/rust-lang/rfcs/pull/240
The following method have been marked #[experimental]
* String::from_str - This function only exists as it's more efficient than
to_string(), but having a less ergonomic function for
performance reasons isn't the greatest reason to keep it
around. Like Vec::push_all, this has been marked
experimental for now.
The following methods have been #[deprecated]
* String::append - This method has been deprecated to remain consistent with the
deprecation of Vec::append. While convenient, it is one of
the only functional-style apis on String, and requires more
though as to whether it belongs as a first-class method or
now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
str::from_utf8 plus an assert plus a call to to_string().
Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
type which allow bypassing utf-8 checks. These have all
been deprecated in favor of calling `.as_mut_vec()` and
then operating directly on the vector returned. These
methods were deprecated because naming them with relation
to other methods was difficult to rationalize and it's
arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
# Reservation methods
This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]
[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
Change to resolve and update compiler and libs for uses.
[breaking-change]
Enum variants are now in both the value and type namespaces. This means that
if you have a variant with the same name as a type in scope in a module, you
will get a name clash and thus an error. The solution is to either rename the
type or the variant.
The first commit improves code generation through a few changes:
- The `#[inline]` attributes allow llvm to constant fold the encoding step away in certain situations. For example, code like this changes from a call to `encode_utf8` in a inner loop to the pushing of a byte constant:
```rust
let mut s = String::new();
for _ in range(0u, 21) {
s.push_char('a');
}
```
- Both methods changed their semantic from causing run time failure if the target buffer is not large enough to returning `None` instead. This makes llvm no longer emit code for causing failure for these methods.
- A few debug `assert!()` calls got removed because they affected code generation due to unwinding, and where basically unnecessary with today's sound handling of `char` as a Unicode scalar value.
~~The second commit is optional. It changes the methods from regular indexing with the `dst[i]` syntax to unsafe indexing with `dst.unsafe_mut_ref(i)`. This does not change code generation directly - in both cases llvm is smart enough to see that there can never be an out-of-bounds access. But it makes it emit a `nounwind` attribute for the function.
However, I'm not sure whether that is a real improvement, so if there is any objection to this I'll remove the commit.~~
This changes how the methods behave on a too small buffer, so this is a
[breaking-change]
declared with the same name in the same scope.
This breaks several common patterns. First are unused imports:
use foo::bar;
use baz::bar;
Change this code to the following:
use baz::bar;
Second, this patch breaks globs that import names that are shadowed by
subsequent imports. For example:
use foo::*; // including `bar`
use baz::bar;
Change this code to remove the glob:
use foo::{boo, quux};
use baz::bar;
Or qualify all uses of `bar`:
use foo::{boo, quux};
use baz;
... baz::bar ...
Finally, this patch breaks code that, at top level, explicitly imports
`std` and doesn't disable the prelude.
extern crate std;
Because the prelude imports `std` implicitly, there is no need to
explicitly import it; just remove such directives.
The old behavior can be opted into via the `import_shadowing` feature
gate. Use of this feature gate is discouraged.
This implements RFC #116.
Closes#16464.
[breaking-change]
- Both can now be inlined and constant folded away
- Both can no longer cause failure
- Both now return an `Option` instead
Removed debug `assert!()`s over the valid ranges of a `char`
- It affected optimizations due to unwinding
- Char handling is now sound enought that they became uneccessary
This required some contortions because importing both raw::Slice
and slice::Slice makes rustc crash.
Since `Slice` is in the prelude, this renaming is unlikely to
casue breakage.
[breaking-change]
Implement for Vec, DList, RingBuf. Add MutableSeq to the prelude.
Since the collections traits are in the prelude most consumers of
these methods will continue to work without change.
[breaking-change]