The first commit improves code generation through a few changes:
- The `#[inline]` attributes allow llvm to constant fold the encoding step away in certain situations. For example, code like this changes from a call to `encode_utf8` in a inner loop to the pushing of a byte constant:
```rust
let mut s = String::new();
for _ in range(0u, 21) {
s.push_char('a');
}
```
- Both methods changed their semantic from causing run time failure if the target buffer is not large enough to returning `None` instead. This makes llvm no longer emit code for causing failure for these methods.
- A few debug `assert!()` calls got removed because they affected code generation due to unwinding, and where basically unnecessary with today's sound handling of `char` as a Unicode scalar value.
~~The second commit is optional. It changes the methods from regular indexing with the `dst[i]` syntax to unsafe indexing with `dst.unsafe_mut_ref(i)`. This does not change code generation directly - in both cases llvm is smart enough to see that there can never be an out-of-bounds access. But it makes it emit a `nounwind` attribute for the function.
However, I'm not sure whether that is a real improvement, so if there is any objection to this I'll remove the commit.~~
This changes how the methods behave on a too small buffer, so this is a
[breaking-change]
declared with the same name in the same scope.
This breaks several common patterns. First are unused imports:
use foo::bar;
use baz::bar;
Change this code to the following:
use baz::bar;
Second, this patch breaks globs that import names that are shadowed by
subsequent imports. For example:
use foo::*; // including `bar`
use baz::bar;
Change this code to remove the glob:
use foo::{boo, quux};
use baz::bar;
Or qualify all uses of `bar`:
use foo::{boo, quux};
use baz;
... baz::bar ...
Finally, this patch breaks code that, at top level, explicitly imports
`std` and doesn't disable the prelude.
extern crate std;
Because the prelude imports `std` implicitly, there is no need to
explicitly import it; just remove such directives.
The old behavior can be opted into via the `import_shadowing` feature
gate. Use of this feature gate is discouraged.
This implements RFC #116.
Closes#16464.
[breaking-change]
- Both can now be inlined and constant folded away
- Both can no longer cause failure
- Both now return an `Option` instead
Removed debug `assert!()`s over the valid ranges of a `char`
- It affected optimizations due to unwinding
- Char handling is now sound enought that they became uneccessary
ImmutableVector -> ImmutableSlice
ImmutableEqVector -> ImmutableEqSlice
ImmutableOrdVector -> ImmutableOrdSlice
MutableVector -> MutableSlice
MutableVectorAllocating -> MutableSliceAllocating
MutableCloneableVector -> MutableCloneableSlice
MutableOrdVector -> MutableOrdSlice
These are all in the prelude so most code will not break.
[breaking-change]
This trait was only implemented by `String`. It provided the methods
`into_bytes` and `append`, both of which **are already implemented as normal
methods** of `String` (not as trait methods). This change improves the
consistency of strings.
This shouldn't break any code, except if somebody has implemented
`OwnedStr` for a user-defined type.
Implement for Vec, DList, RingBuf. Add MutableSeq to the prelude.
Since the collections traits are in the prelude most consumers of
these methods will continue to work without change.
[breaking-change]
Reimplement the string slice's `Iterator<char>` by wrapping the already efficient
slice iterator.
The iterator uses our guarantee that the string contains valid UTF-8, but its only unsafe
code is transmuting the decoded `u32` into `char`.
Benchmarks suggest that the runtime of `Chars` benchmarks are reduced by up to 30%,
runtime of `Chars` reversed reduced by up to 60%.
```
BEFORE
test str::bench::char_indicesator ... bench: 124 ns/iter (+/- 1)
test str::bench::char_indicesator_rev ... bench: 188 ns/iter (+/- 9)
test str::bench::char_iterator ... bench: 122 ns/iter (+/- 2)
test str::bench::char_iterator_ascii ... bench: 302 ns/iter (+/- 41)
test str::bench::char_iterator_for ... bench: 123 ns/iter (+/- 4)
test str::bench::char_iterator_rev ... bench: 189 ns/iter (+/- 14)
test str::bench::char_iterator_rev_for ... bench: 177 ns/iter (+/- 4)
AFTER
test str::bench::char_indicesator ... bench: 85 ns/iter (+/- 3)
test str::bench::char_indicesator_rev ... bench: 82 ns/iter (+/- 2)
test str::bench::char_iterator ... bench: 100 ns/iter (+/- 3)
test str::bench::char_iterator_ascii ... bench: 317 ns/iter (+/- 3)
test str::bench::char_iterator_for ... bench: 86 ns/iter (+/- 2)
test str::bench::char_iterator_rev ... bench: 80 ns/iter (+/- 6)
test str::bench::char_iterator_rev_for ... bench: 68 ns/iter (+/- 0)
```
Note: Branch name is no longer indicative of the implementation.
- `width()` computes the displayed width of a string, ignoring the width of control characters.
- arguably we might do *something* else for control characters, but the question is, what?
- users who want to do something else can iterate over chars()
- `graphemes()` returns a `Graphemes` struct, which implements an iterator over the grapheme clusters of a &str.
- fully compliant with [UAX#29](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)
- passes all [Unicode-supplied tests](http://www.unicode.org/reports/tr41/tr41-15.html#Tests29)
- added code to generate additionial categories in `unicode.py`
- `Cn` aka `Not_Assigned`
- categories necessary for grapheme cluster breaking
- tidied up the exports from libunicode
- all exports are exposed through a module rather than directly at crate root.
- std::prelude imports UnicodeChar and UnicodeStrSlice from std::char and std::str rather than directly from libunicode
closes#7043
- Graphemes and GraphemeIndices structs implement iterators over
grapheme clusters analogous to the Chars and CharOffsets for chars in
a string. Iterator and DoubleEndedIterator are available for both.
- tidied up the exports for libunicode. crate root exports are now moved
into more appropriate module locations:
- UnicodeStrSlice, Words, Graphemes, GraphemeIndices are in str module
- UnicodeChar exported from char instead of crate root
- canonical_combining_class is exported from str rather than crate root
Since libunicode's exports have changed, programs that previously relied
on the old export locations will need to change their `use` statements
to reflect the new ones. See above for more information on where the new
exports live.
closes#7043
[breaking-change]
Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
- Unicode-aware functions now live in libunicode
- `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
`is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
`to_uppercase`, `to_lowercase`
- added `width` method in UnicodeChar trait
- determines printed width of character in columns, or None if it is a non-NULL control character
- takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
- functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
- `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
- also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
- thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`
The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]
- unicode tests live in coretest crate
- libcollections str tests need UnicodeChar trait.
- libregex perlw tests were checking a char in the Alphabetic category,
\x2161. Confirmed perl 5.18 considers this a \w character. Changed to
\x2961, which is not \w as the test expects.
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
- Unicode-aware functions now live in libunicode
- is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
is_uppercase, is_whitespace, is_alphanumeric, is_control,
is_digit, to_uppercase, to_lowercase
- added width method in UnicodeChar trait
- determines printed width of character in columns, or None if it is
a non-NULL control character
- takes a boolean argument indicating whether the present context is
CJK or not (characters with 'A'mbiguous widths are double-wide in
CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
(libunicode)
- functionality formerly in StrSlice that relied upon Unicode
functionality from Char is now in UnicodeStrSlice
- words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
- also moved Words type alias into libunicode because words method is
in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
combining the libunicode functionality with the Char functionality
from libcore
- thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str
The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]