This commit uniforms the short title of modules provided by libstd,
in order to make their roles more explicit when glancing at the index.
Signed-off-by: Luca Bruno <lucab@debian.org>
`.as_mut_buf` was used exactly once, in `.push_char` which could be
written in a simpler way, using the `&mut ~[u8]` that it already
retrieved. In the rare situation when someone really needs
`.as_mut_buf`-like functionality (getting a `*mut u8`), they can go via
`str::raw::as_owned_vec`.
This function had type &[u8] -> ~str, i.e. it allocates a string
internally, even though the non-allocating version that take &[u8] ->
&str and ~[u8] -> ~str are all that is necessary in most circumstances.
This involved changing a fair amount of code, rooted in how we access the local
IoFactory instance. I added a helper method to the rtio module to access the
optional local IoFactory. This is different than before in which it was assumed
that a local IoFactory was *always* present. Now, a separate io_error is raised
when an IoFactory is not present, yet I/O is requested.
This removes the PathLike trait associated with this "support module". This is
yet another "container of bytes" trait, so I didn't want to duplicate what
already exists throughout libstd. In actuality, we're going to pass of C strings
to the libuv APIs, so instead the arguments are now bound with the 'ToCStr'
trait instead.
Additionally, a layer of complexity was removed by immediately converting these
type-generic parameters into CStrings to get handed off to libuv apis.
d4a32386f3 broke these since slice_to() and slice_from() must get character
boundaries, and arbitrary needle lengths don't necessarily map to character
boundaries of the haystack.
This also adds new tests that would have caught this bug.
Moved OwnedStr doc comments from impl to trait.
Added a few #[inline] hints.
The doc comment changes make the source a bit harder to read, as
documentation and implementation no longer live right next to each
other. But this way they at least appear in the docs.
Fix uint overflow bugs in std::{at_vec, vec, str}
Closes#8742
Fix issue #8742, which summarized is: unsafe code in vec and str did assume
that a reservation for `X + Y` elements always succeeded, and didn't overflow.
Introduce the method `Vec::reserve_additional(n)` to make it easy to check for
overflow in `Vec::push` and `Vec::push_all`.
In std::str, simplify and remove a lot of the unsafe code and use `push_str`
instead. With improvements to `.push_str` and the new function
`vec::bytes::push_bytes`, it looks like this change has either no or positive
impact on performance.
I believe there are many places still where `v.reserve(A + B)` still can overflow.
This by itself is not an issue unless followed by (unsafe) code that steps aside
boundary checks.
A SendStr is a string that can hold either a ~str or a &'static str.
This can be useful as an optimization when an allocation is sometimes needed but the common case is statically known.
Possible use cases include Maps with both static and owned keys, or propagating error messages across task boundaries.
SendStr implements most basic traits in a way that hides the fact that it is an enum; in particular things like order and equality are only determined by the content of the wrapped strings.
Replaced std::rt:logging::SendableString with SendStr
Added tests for using an SendStr as key in Hash- and Treemaps
The trait will keep the `Iterator` naming, but a more concise module
name makes using the free functions less verbose. The module will define
iterables in addition to iterators, as it deals with iteration in
general.
Reject codepoints \uD800 to \uDFFF which are the surrogates
(reserved/unused codepoints that are invalid to encode into UTF-8)
The surrogates is the only hole of invalid codepoints in the range from
\u0 to \u10FFFF.
The message of the first commit explains (edited for changed trait name):
The trait `ExactSize` is introduced to solve a few small niggles:
* We can't reverse (`.invert()`) an enumeration iterator
* for a vector, we have `v.iter().position(f)` but `v.rposition(f)`.
* We can't reverse `Zip` even if both iterators are from vectors
`ExactSize` is an empty trait that is intended to indicate that an
iterator, for example `VecIterator`, knows its exact finite size and
reports it correctly using `.size_hint()`. Only adaptors that preserve
this at all times, can expose this trait further. (Where here we say
finite for fitting in uint).
---
It may seem complicated just to solve these small "niggles",
(It's really the reversible enumerate case that's the most interesting)
but only a few core iterators need to implement this trait.
While we gain more capabilities generically for some iterators,
it becomes a tad more complicated to figure out if a type has
the right trait impls for it.
Address discussion with acrichto; inherit DoubleEndedIterator so that
`.rposition()` can be a default method, and that the nische of the trait
is clear. Use assertions when using `.size_hint()` in reverse enumerate
and `.rposition()`
Fix a bug in `s.slice_chars(a, b)` that did not accept `a == s.len()`.
Fix a bug in `!=` defined for DList.
Also simplify NormalizationIterator to use the CharIterator directly instead of mimicing the iteration itself.
These are very easy to replace with methods on string slices, basically
`.char_len()` and `.len()`.
These are the replacement implementations I did to clean these
functions up, but seeing this I propose removal:
/// ...
pub fn count_chars(s: &str, begin: uint, end: uint) -> uint {
// .slice() checks the char boundaries
s.slice(begin, end).char_len()
}
/// Counts the number of bytes taken by the first `n` chars in `s`
/// starting from byte index `begin`.
///
/// Fails if there are less than `n` chars past `begin`
pub fn count_bytes<'b>(s: &'b str, begin: uint, n: uint) -> uint {
s.slice_from(begin).slice_chars(0, n).len()
}
`s.slice_chars(a, b)` did not allow the case where `a == s.len()`, this
is a bug I introduced last time I touched the method; add a test for
this case.
These are very easy to replace with methods on string slices, basically
`.char_len()` and `.len()`.
These are the replacement implementations I did to clean these
functions up, but seeing this I propose removal:
/// ...
pub fn count_chars(s: &str, begin: uint, end: uint) -> uint {
// .slice() checks the char boundaries
s.slice(begin, end).char_len()
}
/// Counts the number of bytes taken by the first `n` chars in `s`
/// starting from byte index `begin`.
///
/// Fails if there are less than `n` chars past `begin`
pub fn count_bytes<'b>(s: &'b str, begin: uint, n: uint) -> uint {
s.slice_from(begin).slice_chars(0, n).len()
}
Make CharSplitIterator double-ended which is simple given that the operation is symmetric, once the split-N feature is factored out into its own adaptor.
`.rsplitn_iter()` allows splitting `N` times from the back of a string, so it is a completely new feature. With the double-ended impl, `.split_iter()`, `.line_iter()`, `.word_iter()` all allow picking off elements from either end.
`split_options_iter` is removed with the factoring of the split- and split-N- iterators, instead there is `split_terminator_iter`.
---
Add benchmarks using `#[bench]` and tune CharSplitIterator a bit after Huon Wilson's suggestions
Benchmarks 1-5 do the same split using different implementations of `CharEq`, all splitting an ascii string on ascii space. Benchmarks 6-7 split a unicode string on an ascii char.
Before this PR
test str::bench::split_iter_ascii ... bench: 166 ns/iter (+/- 2)
test str::bench::split_iter_closure ... bench: 113 ns/iter (+/- 1)
test str::bench::split_iter_extern_fn ... bench: 286 ns/iter (+/- 7)
test str::bench::split_iter_not_ascii ... bench: 114 ns/iter (+/- 4)
test str::bench::split_iter_slice ... bench: 220 ns/iter (+/- 12)
test str::bench::split_iter_unicode_ascii ... bench: 217 ns/iter (+/- 3)
test str::bench::split_iter_unicode_not_ascii ... bench: 248 ns/iter (+/- 3)
PR, first commit
test str::bench::split_iter_ascii ... bench: 331 ns/iter (+/- 9)
test str::bench::split_iter_closure ... bench: 114 ns/iter (+/- 2)
test str::bench::split_iter_extern_fn ... bench: 314 ns/iter (+/- 6)
test str::bench::split_iter_not_ascii ... bench: 132 ns/iter (+/- 1)
test str::bench::split_iter_slice ... bench: 157 ns/iter (+/- 3)
test str::bench::split_iter_unicode_ascii ... bench: 502 ns/iter (+/- 64)
test str::bench::split_iter_unicode_not_ascii ... bench: 250 ns/iter (+/- 3)
PR, final version
test str::bench::split_iter_ascii ... bench: 106 ns/iter (+/- 4)
test str::bench::split_iter_closure ... bench: 107 ns/iter (+/- 1)
test str::bench::split_iter_extern_fn ... bench: 267 ns/iter (+/- 6)
test str::bench::split_iter_not_ascii ... bench: 108 ns/iter (+/- 1)
test str::bench::split_iter_slice ... bench: 170 ns/iter (+/- 8)
test str::bench::split_iter_unicode_ascii ... bench: 128 ns/iter (+/- 5)
test str::bench::split_iter_unicode_not_ascii ... bench: 252 ns/iter (+/- 3)
---
There are several ways to deal with `CharEq::only_ascii`. It is a performance optimization, so with that in mind, we allow passing bogus char (outside ascii) as long as they don't match. We use a byte value check to make sure we don't split on these (would split substrings in the middle of encoded char). (A more principled way would be to only pass the ascii codepoints to the CharEq when it indicates only_ascii, but that undoes some of the performance optimization.)
Implement Huon Wilson's suggestions (since the benchmarks agree!).
Use `self.sep.matches(byte as char) && byte < 128u8` to match in the
only_ascii case so that mistaken matches outside the ascii range can't
create invalid substrings.
Put the conditional on only_ascii outside the loop.
Add new methods `.rsplit_iter()` and `.rsplitn_iter()` for &str.
Separate out CharSplitIterator and CharSplitNIterator,
CharSplitIterator (`split_iter` and `rsplit_iter`) is made double-ended
while `splitn_iter` and `rsplitn_iter` (limited to N splits) are not,
since these don't have the same symmetry.
With CharSplitIterator being double ended, derived iterators like
`line_iter` and `word_iter` are too.
Implement CharIterator as a separate struct, so that it can be .clone()'d. Fix `.char_range_at_reverse` so that it performs better, closer to the forwards version. This makes the reverse iterators and users like `.rfind()` perform better.
Before
test str::bench::char_iterator ... bench: 146 ns/iter (+/- 0)
test str::bench::char_iterator_ascii ... bench: 397 ns/iter (+/- 49)
test str::bench::char_iterator_rev ... bench: 576 ns/iter (+/- 8)
test str::bench::char_offset_iterator ... bench: 128 ns/iter (+/- 2)
test str::bench::char_offset_iterator_rev ... bench: 425 ns/iter (+/- 59)
After
test str::bench::char_iterator ... bench: 130 ns/iter (+/- 1)
test str::bench::char_iterator_ascii ... bench: 307 ns/iter (+/- 5)
test str::bench::char_iterator_rev ... bench: 185 ns/iter (+/- 8)
test str::bench::char_offset_iterator ... bench: 131 ns/iter (+/- 13)
test str::bench::char_offset_iterator_rev ... bench: 183 ns/iter (+/- 2)
To be able to use a string slice to represent the CharIterator, a function `slice_unchecked` is added, that does the same as `slice_bytes` but without any boundary checks.
It would be possible to implement CharIterator with pointer arithmetic to make it *much more efficient*, but since vec iterator is still improving, it's too early to attempt to re-implement it in other places. Hopefully CharIterator can be implemented on top of vec iterator without any unsafe code later.
Additional changes fix the documentation about null termination.
Let CharIterator be a separate type from CharOffsetIterator (so that
CharIterator can be cloned, for example).
Implement CharOffsetIterator by using the same technique as the method
subslice_offset.
Add a function like raw::slice_bytes, but it doesn't check slice
boundaries. For iterator use where we always know the begin, end indices
are in range.
I need `Clone` for `Tm` for my latest work on [rust-http](https://github.com/chris-morgan/rust-http) (static typing for headers, and headers like `Date` are a time), so here it is.
@huonw recommended deriving DeepClone while I was at it.
I also had to implement `DeepClone` for `~str` to get a derived implementation of `DeepClone` for `Tm`; I did `@str` while I was at it, for consistency.
If they are on the trait then it is extremely annoying to use them as
generic parameters to a function, e.g. with the iterator param on the trait
itself, if one was to pass an Extendable<int> to a function that filled it
either from a Range or a Map<VecIterator>, one needs to write something
like:
fn foo<E: Extendable<int, Range<int>> +
Extendable<int, Map<&'self int, int, VecIterator<int>>>
(e: &mut E, ...) { ... }
since using a generic, i.e. `foo<E: Extendable<int, I>, I: Iterator<int>>`
means that `foo` takes 2 type parameters, and the caller has to specify them
(which doesn't work anyway, as they'll mismatch with the iterators used in
`foo` itself).
This patch changes it to:
fn foo<E: Extendable<int>>(e: &mut E, ...) { ... }
If they are on the trait then it is extremely annoying to use them as
generic parameters to a function, e.g. with the iterator param on the trait
itself, if one was to pass an Extendable<int> to a function that filled it
either from a Range or a Map<VecIterator>, one needs to write something
like:
fn foo<E: Extendable<int, Range<int>> +
Extendable<int, Map<&'self int, int, VecIterator<int>>>
(e: &mut E, ...) { ... }
since using a generic, i.e. `foo<E: Extendable<int, I>, I: Iterator<int>>`
means that `foo` takes 2 type parameters, and the caller has to specify them
(which doesn't work anyway, as they'll mismatch with the iterators used in
`foo` itself).
This patch changes it to:
fn foo<E: Extendable<int>>(e: &mut E, ...) { ... }
This includes a number of improvements to `ifmt!`
* Implements formatting arguments -- `{:0.5x}` works now
* Formatting now works on all integer widths, not just `int` and `uint`
* Added a large doc block to `std::fmt` which should help explain what `ifmt!` is all about
* Added floating point formatters, although they have the same pitfalls from before (they're just proof-of-concept now)
Closed a couple of issues along the way, yay! Once this gets into a snapshot, I'll start looking into removing all of `fmt`
Basically, generic containers should not use the default methods since a
type of elements may not guarantees total order. str could use them
since u8's Ord guarantees total order. Floating point numbers are also
broken with the default methods because of NaN. Thanks for @thestinger.
Timespec also guarantees total order AIUI. I'm unsure whether
extra::semver::Identifier does so I left it alone. Proof needed.
Signed-off-by: OGINO Masanori <masanori.ogino@gmail.com>
According to #7887, we've decided to use the syntax of `fn map<U>(f: &fn(&T) -> U) -> U`, which passes a reference to the closure, and to `fn map_move<U>(f: &fn(T) -> U) -> U` which moves the value into the closure. This PR adds these `.map_move()` functions to `Option` and `Result`.
In addition, it has these other minor features:
* Replaces a couple uses of `option.get()`, `result.get()`, and `result.get_err()` with `option.unwrap()`, `result.unwrap()`, and `result.unwrap_err()`. (See #8268 and #8288 for a more thorough adaptation of this functionality.
* Removes `option.take_map()` and `option.take_map_default()`. These two functions can be easily written as `.take().map_move(...)`.
* Adds a better error message to `result.unwrap()` and `result.unwrap_err()`.
The two deletions are because the test cases are very old (still using `class` and modes!), and, as far as I can tell (since they are so old), the areas they test are well tested by other rpass tests.
`fn slice_bytes` is marked unsafe since it allows violating the valid
string encoding property; but the function did also allow extending the
lifetime of the slice by mistake, since it's returning `&str`.
Use the annotation `slice_bytes<'a>(&'a str, ...) -> &'a str` so
that all uses of `slice_bytes` are region checked correctly.
- Made naming schemes consistent between Option, Result and Either
- Changed Options Add implementation to work like the maybe monad (return None if any of the inputs is None)
- Removed duplicate Option::get and renamed all related functions to use the term `unwrap` instead
fn slice_bytes is marked unsafe since it allows violating the valid
string encoding property; but the function did also allow extending the
lifetime of the slice by mistake, since it's returning `&str`.
Use the annotation `slice_bytes<'a>(&'a str, ...) -> &'a str` so
that all uses of slice_bytes are region checked correctly.
Use unchecked vec indexing since the vector bounds are checked by the
loop. Iterators are not easy to use in this case since we skip 1-4 bytes
each lap. This part of the commit speeds up is_utf8 for ASCII input.
Check codepoint ranges by checking the byte ranges manually instead of
computing a full decoding for multibyte encodings. This is easy to read
and corresponds to the UTF-8 syntax in the RFC.
No changes to what we accept. A comment notes that surrogate halves are
accepted.
Before:
test str::bench::is_utf8_100_ascii ... bench: 165 ns/iter (+/- 3)
test str::bench::is_utf8_100_multibyte ... bench: 218 ns/iter (+/- 5)
After:
test str::bench::is_utf8_100_ascii ... bench: 130 ns/iter (+/- 1)
test str::bench::is_utf8_100_multibyte ... bench: 156 ns/iter (+/- 3)
An improvement upon the previous pull #8133
Use unchecked vec indexing since the vector bounds are checked by the
loop. Iterators are not easy to use in this case since we skip 1-4 bytes
each lap. This part of the commit speeds up is_utf8 for ASCII input.
Check codepoint ranges by checking the byte ranges manually instead of
computing a full decoding for multibyte encodings. This is easy to read
and corresponds to the UTF-8 syntax in the RFC.
No changes to what we accept. A comment notes that surrogate halves are
accepted.
Before:
test str::bench::is_utf8_100_ascii ... bench: 165 ns/iter (+/- 3)
test str::bench::is_utf8_100_multibyte ... bench: 218 ns/iter (+/- 5)
After:
test str::bench::is_utf8_100_ascii ... bench: 130 ns/iter (+/- 1)
test str::bench::is_utf8_100_multibyte ... bench: 156 ns/iter (+/- 3)