298 Commits

Author SHA1 Message Date
Jack Moffitt
090b2453a1 Fix starts_with() and ends_with().
d4a32386f3b6 broke these since slice_to() and slice_from() must get character
boundaries, and arbitrary needle lengths don't necessarily map to character
boundaries of the haystack.

This also adds new tests that would have caught this bug.
2013-10-17 23:36:43 -06:00
Kevin Ballard
87a2d032ff Rewrite str.starts_with()/ends_with() to be simpler 2013-10-16 23:17:34 -07:00
Alex Crichton
a84c2999c9 Require module documentation with missing_doc
Closes #9824
2013-10-15 22:27:10 -07:00
Daniel Micay
6a90e80b62 option: rewrite the API to use composition 2013-10-09 09:17:29 -04:00
bors
1506dac10f auto merge of #9727 : Valloric/rust/doc-fixes, r=catamorphism 2013-10-04 23:51:32 -07:00
Strahinja Val Markovic
03099e5678 Fixed another minor typo in std::str docs 2013-10-04 22:07:57 -07:00
Strahinja Val Markovic
d629aca81a Fix minor typo in std::str module docs 2013-10-04 21:24:29 -07:00
Florian Gilcher
4dc3ff9c80 Add an implementation of FromStr for ~str and @str
This fixes an issue for APIs that return FromStr. Before this change,
they would have to offer a seperate function for returning the source
string.
2013-10-02 15:37:59 +02:00
Alex Crichton
a8ba31dbf3 std: Remove usage of fmt! 2013-09-30 23:21:18 -07:00
bors
10e7f12daf auto merge of #9550 : alexcrichton/rust/remove-printf, r=thestinger
The 0.8 release was cut, down with printf!
2013-09-27 08:21:23 -07:00
Erick Tryzelaar
0bdc99d81f std: removed some warnings in tests. 2013-09-26 22:20:39 -07:00
Alex Crichton
409182de6d Update the compiler to not use printf/printfln 2013-09-26 17:05:59 -07:00
Marvin Löbel
e94b3fae39 Moved StrSlice doc comments from impl to trait.
Moved OwnedStr doc comments from impl to trait.
Added a few #[inline] hints.

The doc comment changes make the source a bit harder to read, as
documentation and implementation no longer live right next to each
other. But this way they at least appear in the docs.
2013-09-26 04:44:36 +02:00
Alex Crichton
3585c64d09 rustdoc: Change all code-blocks with a script
find src -name '*.rs' | xargs sed -i '' 's/~~~.*{\.rust}/```rust/g'
    find src -name '*.rs' | xargs sed -i '' 's/ ~~~$/ ```/g'
    find src -name '*.rs' | xargs sed -i '' 's/^~~~$/ ```/g'
2013-09-25 14:27:42 -07:00
Simon Sapin
4aee7b2b42 Do not imply that str is sometimes null-terminated. 2013-09-24 13:26:10 +01:00
bors
a95604fcaa auto merge of #9276 : alexcrichton/rust/dox, r=brson
Hopefull this will make our libstd docs appear a little more "full".
2013-09-20 14:11:08 -07:00
Brian Anderson
fd0fcba9f5 std: Fix an invalid read in from_c_multistring
When `count` is `Some` this function was reading a byte past the end
of the buffer.
2013-09-17 21:25:18 -07:00
Alex Crichton
88bc11e646 Document a few undocumented modules in libstd
Hopefull this will make our libstd docs appear a little more "full".
2013-09-17 20:50:23 -07:00
Jeff Olson
c0ec40f74b std: merge conflict cleanup from std::str 2013-09-16 23:39:33 -07:00
Jeff Olson
63182885d8 std: more work on from_c_multistring.. let it take an optional len param 2013-09-16 23:19:23 -07:00
Jeff Olson
daf4974628 std: win32 os::env() str parsing to str::raw::from_c_multistring + test 2013-09-16 23:17:46 -07:00
bors
d5e9033a0d auto merge of #9108 : blake2-ppc/rust/hazards-on-overflow, r=alexcrichton
Fix uint overflow bugs in std::{at_vec, vec, str}

Closes #8742

Fix issue #8742, which summarized is: unsafe code in vec and str did assume
that a reservation for `X + Y` elements always succeeded, and didn't overflow.

Introduce the method `Vec::reserve_additional(n)` to make it easy to check for
overflow in `Vec::push` and `Vec::push_all`.

In std::str, simplify and remove a lot of the unsafe code and use `push_str`
instead. With improvements to `.push_str` and the new function
`vec::bytes::push_bytes`, it looks like this change has either no or positive
impact on performance.

I believe there are many places still where `v.reserve(A + B)` still can overflow.
This by itself is not an issue unless followed by (unsafe) code that steps aside
boundary checks.
2013-09-16 19:35:50 -07:00
blake2-ppc
90dc9512ba std::str: Fix overflow problems in unsafe code
See issue #8742
2013-09-17 02:47:59 +02:00
blake2-ppc
e34e2032e8 std::str: Add bench tests for StrVector::connect() and for str::push_str 2013-09-16 19:13:41 +02:00
Marvin Löbel
76c3e8a38c Add an SendStr type
A SendStr is a string that can hold either a ~str or a &'static str.
This can be useful as an optimization when an allocation is sometimes needed but the common case is statically known.

Possible use cases include Maps with both static and owned keys, or propagating error messages across task boundaries.

SendStr implements most basic traits in a way that hides the fact that it is an enum; in particular things like order and equality are only determined by the content of the wrapped strings.

Replaced std::rt:logging::SendableString with SendStr
Added tests for using an SendStr as key in Hash- and Treemaps
2013-09-16 16:57:50 +02:00
Daniel Micay
6919cf5fe1 rename std::iterator to std::iter
The trait will keep the `Iterator` naming, but a more concise module
name makes using the free functions less verbose. The module will define
iterables in addition to iterators, as it deals with iteration in
general.
2013-09-09 03:21:46 -04:00
bors
6f9ce0948a auto merge of #8997 : fhahn/rust/issue_8985, r=catamorphism,brson
Patch for #8985
2013-09-05 15:00:49 -07:00
Florian Hahn
de39874801 Rename str::from_bytes to str::from_utf8, closes #8985 2013-09-05 14:17:24 +02:00
Daniel Micay
fcc7aff62b str: rm map_chars, replaced by iterators
mapping a function against the elements should not require allocating a
new container, but `collect` still provides the functionality as-needed
2013-09-05 02:02:27 -04:00
blake2-ppc
b153219556 std::str: Deny surrogates in is_utf8
Reject codepoints \uD800 to \uDFFF which are the surrogates
(reserved/unused codepoints that are invalid to encode into UTF-8)

The surrogates is the only hole of invalid codepoints in the range from
\u0 to \u10FFFF.
2013-09-04 23:09:51 -04:00
bors
b161e09e03 auto merge of #8977 : pnkfelix/rust/fsk-followup-on-6009-rebased, r=alexcrichton
Fix #6009.  Rebased version of #8970.  Inherits review from alexcrichton.
2013-09-04 16:20:46 -07:00
Daniel Micay
62a3434529 stop treating char as an integer type
Closes #7609
2013-09-04 08:07:56 -04:00
Felix S. Klock II
83e19d2ead Added explicit pub to several conditions. Enables completion of #6009. 2013-09-04 10:03:47 +02:00
bors
1ac8e8885b auto merge of #8884 : blake2-ppc/rust/exact-size-hint, r=huonw
The message of the first commit explains (edited for changed trait name):

The trait `ExactSize` is introduced to solve a few small niggles:

* We can't reverse (`.invert()`) an enumeration iterator
* for a vector, we have `v.iter().position(f)` but `v.rposition(f)`.
* We can't reverse `Zip` even if both iterators are from vectors

`ExactSize` is an empty trait that is intended to indicate that an
iterator, for example `VecIterator`, knows its exact finite size and
reports it correctly using `.size_hint()`. Only adaptors that preserve
this at all times, can expose this trait further. (Where here we say
finite for fitting in uint).

---

It may seem complicated just to solve these small "niggles",
(It's really the reversible enumerate case that's the most interesting)
but only a few core iterators need to implement this trait.

While we gain more capabilities generically for some iterators,
it becomes a tad more complicated to figure out if a type has
the right trait impls for it.
2013-09-03 06:56:05 -07:00
blake2-ppc
35040dfccc std::iterator: Use ExactSize, inheriting DoubleEndedIterator
Address discussion with acrichto; inherit DoubleEndedIterator so that
`.rposition()` can be a default method, and that the nische of the trait
is clear. Use assertions when using `.size_hint()` in reverse enumerate
and `.rposition()`
2013-09-01 18:17:26 +02:00
Eric Martin
babe20f018 remove several 'ne' methods 2013-08-30 21:53:25 -04:00
blake2-ppc
46a6dbc541 std::str: Use reverse enumerate and .rposition
Simplify code by using the reversibility of enumerate and use
.rposition().
2013-08-30 20:06:26 +02:00
bors
0553618e08 auto merge of #8858 : blake2-ppc/rust/small-bugs, r=alexcrichton
Fix a bug in `s.slice_chars(a, b)` that did not accept `a == s.len()`.

Fix a bug in `!=` defined for DList.

Also simplify NormalizationIterator to use the CharIterator directly instead of mimicing the iteration itself.
2013-08-30 11:00:43 -07:00
bors
1f9bd62fd6 auto merge of #8857 : blake2-ppc/rust/std-str-remove, r=thestinger
These are very easy to replace with methods on string slices, basically
`.char_len()` and `.len()`.

These are the replacement implementations I did to clean these
functions up, but seeing this I propose removal:

/// ...
pub fn count_chars(s: &str, begin: uint, end: uint) -> uint {
    // .slice() checks the char boundaries
    s.slice(begin, end).char_len()
}

/// Counts the number of bytes taken by the first `n` chars in `s`
/// starting from byte index `begin`.
///
/// Fails if there are less than `n` chars past `begin`
pub fn count_bytes<'b>(s: &'b str, begin: uint, n: uint) -> uint {
    s.slice_from(begin).slice_chars(0, n).len()
}
2013-08-30 04:40:47 -07:00
bors
a6835dd3cb auto merge of #8842 : jfager/rust/remove-iter-module, r=pnkfelix
Moves the Times trait to num while the question of whether it should
exist at all gets hashed out as a completely separate question.
2013-08-29 11:10:47 -07:00
blake2-ppc
479aefb670 std::str: Fix bug in .slice_chars()
`s.slice_chars(a, b)` did not allow the case where `a == s.len()`, this
is a bug I introduced last time I touched the method; add a test for
this case.
2013-08-29 17:11:11 +02:00
blake2-ppc
d8801ceabc std::str: Use CharIterator in NormalizationIterator
Just to simplify and not have the iteration logic repeated in multiple places.
2013-08-29 17:11:11 +02:00
blake2-ppc
b656bfaaa9 std::str: Remove functions count_chars, count_bytes
These are very easy to replace with methods on string slices, basically
`.char_len()` and `.len()`.

These are the replacement implementations I did to clean these
functions up, but seeing this I propose removal:

/// ...
pub fn count_chars(s: &str, begin: uint, end: uint) -> uint {
    // .slice() checks the char boundaries
    s.slice(begin, end).char_len()
}

/// Counts the number of bytes taken by the first `n` chars in `s`
/// starting from byte index `begin`.
///
/// Fails if there are less than `n` chars past `begin`
pub fn count_bytes<'b>(s: &'b str, begin: uint, n: uint) -> uint {
    s.slice_from(begin).slice_chars(0, n).len()
}
2013-08-29 15:51:39 +02:00
Jason Fager
dc30005ad8 Remove the iter module.
Moves the Times trait to num while the question of whether it should
exist at all gets hashed out as a completely separate question.
2013-08-29 01:27:24 -04:00
Alex Crichton
e3662b1880 Remove offset_inbounds for an unsafe offset function 2013-08-27 23:22:52 -07:00
Corey Richardson
87d9d37c07 Add a Default trait. 2013-08-26 19:25:53 -04:00
bors
540d98e7fc auto merge of #8737 : blake2-ppc/rust/std-str-rsplit, r=huonw
Make CharSplitIterator double-ended which is simple given that the operation is symmetric, once the split-N feature is factored out into its own adaptor.

`.rsplitn_iter()` allows splitting `N` times from the back of a string, so it is a completely new feature. With the double-ended impl, `.split_iter()`, `.line_iter()`, `.word_iter()` all allow picking off elements from either end.

`split_options_iter` is removed with the factoring of the split- and split-N- iterators, instead there is `split_terminator_iter`.

---

Add benchmarks using `#[bench]` and tune CharSplitIterator a bit after Huon Wilson's suggestions

Benchmarks 1-5 do the same split using different implementations of `CharEq`, all splitting an ascii string on ascii space. Benchmarks 6-7 split a unicode string on an ascii char.

Before this PR
test str::bench::split_iter_ascii ... bench: 166 ns/iter (+/- 2)
test str::bench::split_iter_closure ... bench: 113 ns/iter (+/- 1)
test str::bench::split_iter_extern_fn ... bench: 286 ns/iter (+/- 7)
test str::bench::split_iter_not_ascii ... bench: 114 ns/iter (+/- 4)
test str::bench::split_iter_slice ... bench: 220 ns/iter (+/- 12)
test str::bench::split_iter_unicode_ascii ... bench: 217 ns/iter (+/- 3)
test str::bench::split_iter_unicode_not_ascii ... bench: 248 ns/iter (+/- 3)

PR, first commit
test str::bench::split_iter_ascii ... bench: 331 ns/iter (+/- 9)
test str::bench::split_iter_closure ... bench: 114 ns/iter (+/- 2)
test str::bench::split_iter_extern_fn ... bench: 314 ns/iter (+/- 6)
test str::bench::split_iter_not_ascii ... bench: 132 ns/iter (+/- 1)
test str::bench::split_iter_slice ... bench: 157 ns/iter (+/- 3)
test str::bench::split_iter_unicode_ascii ... bench: 502 ns/iter (+/- 64)
test str::bench::split_iter_unicode_not_ascii ... bench: 250 ns/iter (+/- 3)

PR, final version
test str::bench::split_iter_ascii ... bench: 106 ns/iter (+/- 4)
test str::bench::split_iter_closure ... bench: 107 ns/iter (+/- 1)
test str::bench::split_iter_extern_fn ... bench: 267 ns/iter (+/- 6)
test str::bench::split_iter_not_ascii ... bench: 108 ns/iter (+/- 1)
test str::bench::split_iter_slice ... bench: 170 ns/iter (+/- 8)
test str::bench::split_iter_unicode_ascii ... bench: 128 ns/iter (+/- 5)
test str::bench::split_iter_unicode_not_ascii ... bench: 252 ns/iter (+/- 3)

---

There are several ways to deal with `CharEq::only_ascii`. It is a performance optimization, so with that in mind, we allow passing bogus char (outside ascii) as long as they don't match. We use a byte value check to make sure we don't split on these (would split substrings in the middle of encoded char).  (A more principled way would be to only pass the ascii codepoints to the CharEq when it indicates only_ascii, but that undoes some of the performance optimization.)
2013-08-26 05:06:16 -07:00
blake2-ppc
4de9bca4d8 std::str: Tune CharSplitIterator after benchmarks
Implement Huon Wilson's suggestions (since the benchmarks agree!).

Use `self.sep.matches(byte as char) && byte < 128u8` to match in the
only_ascii case so that mistaken matches outside the ascii range can't
create invalid substrings.

Put the conditional on only_ascii outside the loop.
2013-08-26 13:30:46 +02:00
blake2-ppc
413f868220 std::str: bench tests for .split_iter() 2013-08-26 11:48:48 +02:00
Kevin Ballard
6f9c68af2e Add _opt variants to str byte-conversion functions
Add _opt variants to from_bytes, from_bytes_owned, and from_bytes_slice.
These variants return an Option instead of raising a condition/failing.
2013-08-25 18:30:31 -07:00