Thanks to comments from @huonw, clarify decoding details and use
statics for important constants for UTF-8 decoding. Convert some magic
numbers scattered in the same file to use the statics too.
Re-use the vector iterator to implement the chars iterator.
The iterator uses our guarantee that the string contains valid UTF-8,
but its only unsafe code is transmuting the decoded u32 into char.
Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
- Unicode-aware functions now live in libunicode
- `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
`is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
`to_uppercase`, `to_lowercase`
- added `width` method in UnicodeChar trait
- determines printed width of character in columns, or None if it is a non-NULL control character
- takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
- functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
- `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
- also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
- thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`
The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
- Unicode-aware functions now live in libunicode
- is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
is_uppercase, is_whitespace, is_alphanumeric, is_control,
is_digit, to_uppercase, to_lowercase
- added width method in UnicodeChar trait
- determines printed width of character in columns, or None if it is
a non-NULL control character
- takes a boolean argument indicating whether the present context is
CJK or not (characters with 'A'mbiguous widths are double-wide in
CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
(libunicode)
- functionality formerly in StrSlice that relied upon Unicode
functionality from Char is now in UnicodeStrSlice
- words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
- also moved Words type alias into libunicode because words method is
in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
combining the libunicode functionality with the Char functionality
from libcore
- thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str
The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]
Being able to index into the bytes of a string encourages
poor UTF-8 hygiene. To get a view of `&[u8]` from either
a `String` or `&str` slice, use the `as_bytes()` method.
Closes#12710.
[breaking-change]
Closes#14358.
~~The tests are not yet moved to `utf16_iter`, so this probably won't compile. I'm submitting this PR anyway so it can be reviewed and since it was mentioned in #14611.~~ EDIT: Tests now use `utf16_iter`.
This deprecates `.to_utf16`. `x.to_utf16()` should be replaced by either `x.utf16_iter().collect::<Vec<u16>>()` (the type annotation may be optional), or just `x.utf16_iter()` directly, if it can be used in an iterator context.
[breaking-change]
cc @huonw
This deprecates `.to_utf16`. `x.to_utf16()` should be replaced by either
`x.utf16_units().collect::<Vec<u16>>()` (the type annotation may be optional), or
just `x.utf16_units()` directly, if it can be used in an iterator context.
Closes#14358
[breaking-change]
I ended up altering the semantics of Json's PartialOrd implementation.
It used to be the case that Null < Null, but I can't think of any reason
for an ordering other than the default one so I just switched it over to
using the derived implementation.
This also fixes broken `PartialOrd` implementations for `Vec` and
`TreeMap`.
RFC: 0028-partial-cmp
Libcore's test infrastructure is complicated by the fact that many lang
items are defined in the crate. The current approach (realcore/realstd
imports) is hacky and hard to work with (tests inside of core::cmp
haven't been run for months!).
Moving tests to a separate crate does mean that they can only test the
public API of libcore, but I don't feel that that is too much of an
issue. The only tests that I had to get rid of were some checking the
various numeric formatters, but those are also exercised through normal
format! calls in other tests.
This breaks a fair amount of code. The typical patterns are:
* `for _ in range(0, 10)`: change to `for _ in range(0u, 10)`;
* `println!("{}", 3)`: change to `println!("{}", 3i)`;
* `[1, 2, 3].len()`: change to `[1i, 2, 3].len()`.
RFC #30. Closes#6023.
[breaking-change]
This commit carries out the request from issue #14678:
> The method `Iterator::len()` is surprising, as all the other uses of
> `len()` do not consume the value. `len()` would make more sense to be
> called `count()`, but that would collide with the current
> `Iterator::count(|T| -> bool) -> unit` method. That method, however, is
> a bit redundant, and can be easily replaced with
> `iter.filter(|x| x < 5).count()`.
> After this change, we could then define the `len()` method
> on `iter::ExactSize`.
Closes#14678.
[breaking-change]
This completes the last stage of the renaming of the comparison hierarchy of
traits. This change renames TotalEq to Eq and TotalOrd to Ord.
In the future the new Eq/Ord will be filled out with their appropriate methods,
but for now this change is purely a renaming change.
[breaking-change]
This commit adds support in rustdoc to recognize the `#[doc(primitive = "foo")]`
attribute. This attribute indicates that the current module is the "owner" of
the primitive type `foo`. For rustdoc, this means that the doc-comment for the
module is the doc-comment for the primitive type, plus a signal to all
downstream crates that hyperlinks for primitive types will be directed at the
crate containing the `#[doc]` directive.
Additionally, rustdoc will favor crates closest to the one being documented
which "implements the primitive type". For example, documentation of libcore
links to libcore for primitive types, but documentation for libstd and beyond
all links to libstd for primitive types.
This change involves no compiler modifications, it is purely a rustdoc change.
The landing pages for the primitive types primarily serve to show a list of
implemented traits for the primitive type itself.
The primitive types documented includes both strings and slices in a semi-ad-hoc
way, but in a way that should provide at least somewhat meaningful
documentation.
Closes#14474
This is part of the ongoing renaming of the equality traits. See #12517 for more
details. All code using Eq/Ord will temporarily need to move to Partial{Eq,Ord}
or the Total{Eq,Ord} traits. The Total traits will soon be renamed to {Eq,Ord}.
cc #12517
[breaking-change]
This changes the previously naive string searching algorithm to a two-way search like glibc, which should be faster on average while still maintaining worst case linear time complexity. This fixes#14107. Note that I don't think this should be merged yet, as this is the only approach to speeding up search I've tried - it's worth considering options like Boyer-Moore or adding a bad character shift table to this. However, the benchmarks look quite good so far:
test str::bench::bench_contains_bad_naive ... bench: 290 ns/iter (+/- 12) from 1309 ns/iter (+/- 36)
test str::bench::bench_contains_equal ... bench: 479 ns/iter (+/- 10) from 137 ns/iter (+/- 2)
test str::bench::bench_contains_short_long ... bench: 2844 ns/iter (+/- 105) from 5473 ns/iter (+/- 14)
test str::bench::bench_contains_short_short ... bench: 55 ns/iter (+/- 4) from 57 ns/iter (+/- 6)
Except for the case specifically designed to be optimal for the naive case (`bench_contains_equal`), this gets as good or better performance as the previous code.
This implements set_timeout() for std::io::Process which will affect wait()
operations on the process. This follows the same pattern as the rest of the
timeouts emerging in std::io::net.
The implementation was super easy for everything except libnative on unix
(backwards from usual!), which required a good bit of signal handling. There's a
doc comment explaining the strategy in libnative. Internally, this also required
refactoring the "helper thread" implementation used by libnative to allow for an
extra helper thread (not just the timer).
This is a breaking change in terms of the io::Process API. It is now possible
for wait() to fail, and subsequently wait_with_output(). These two functions now
return IoResult<T> due to the fact that they can time out.
Additionally, the wait_with_output() function has moved from taking `&mut self`
to taking `self`. If a timeout occurs while waiting with output, the semantics
are undesirable in almost all cases if attempting to re-wait on the process.
Equivalent functionality can still be achieved by dealing with the output
handles manually.
[breaking-change]
cc #13523
This commit revisits the `cast` module in libcore and libstd, and scrutinizes
all functions inside of it. The result was to remove the `cast` module entirely,
folding all functionality into the `mem` module. Specifically, this is the fate
of each function in the `cast` module.
* transmute - This function was moved to `mem`, but it is now marked as
#[unstable]. This is due to planned changes to the `transmute`
function and how it can be invoked (see the #[unstable] comment).
For more information, see RFC 5 and #12898
* transmute_copy - This function was moved to `mem`, with clarification that is
is not an error to invoke it with T/U that are different
sizes, but rather that it is strongly discouraged. This
function is now #[stable]
* forget - This function was moved to `mem` and marked #[stable]
* bump_box_refcount - This function was removed due to the deprecation of
managed boxes as well as its questionable utility.
* transmute_mut - This function was previously deprecated, and removed as part
of this commit.
* transmute_mut_unsafe - This function doesn't serve much of a purpose when it
can be achieved with an `as` in safe code, so it was
removed.
* transmute_lifetime - This function was removed because it is likely a strong
indication that code is incorrect in the first place.
* transmute_mut_lifetime - This function was removed for the same reasons as
`transmute_lifetime`
* copy_lifetime - This function was moved to `mem`, but it is marked
`#[unstable]` now due to the likelihood of being removed in
the future if it is found to not be very useful.
* copy_mut_lifetime - This function was also moved to `mem`, but had the same
treatment as `copy_lifetime`.
* copy_lifetime_vec - This function was removed because it is not used today,
and its existence is not necessary with DST
(copy_lifetime will suffice).
In summary, the cast module was stripped down to these functions, and then the
functions were moved to the `mem` module.
transmute - #[unstable]
transmute_copy - #[stable]
forget - #[stable]
copy_lifetime - #[unstable]
copy_mut_lifetime - #[unstable]
[breaking-change]
This moves as much allocation as possible from teh std::str module into
core::str. This includes essentially all non-allocating functionality, mostly
iterators and slicing and such.
This primarily splits the Str trait into only having the as_slice() method,
adding a new StrAllocating trait to std::str which contains the relevant new
allocation methods. This is a breaking change if any of the methods of "trait
Str" were overriden. The old functionality can be restored by implementing both
the Str and StrAllocating traits.
[breaking-change]
fail!() used to require owned strings but can handle static strings
now. Also, it can pass its arguments to fmt!() on its own, no need for
the caller to call fmt!() itself.