Commit Graph

66 Commits

Author SHA1 Message Date
Aaron Turon
d2ea0315e0 Revert "Use slice syntax instead of slice_to, etc."
This reverts commit 40b9f5ded5.
2014-10-02 11:48:07 -07:00
Nick Cameron
40b9f5ded5 Use slice syntax instead of slice_to, etc. 2014-10-02 13:19:45 +13:00
Alex Crichton
50375139e2 Deal with the fallout of string stabilization 2014-09-23 18:31:52 -07:00
Alex Crichton
0169218047 Fix fallout from Vec stabilization 2014-09-21 22:15:51 -07:00
Joseph Crail
b7bfe04b2d Fix spelling errors and capitalization. 2014-09-03 23:10:38 -04:00
nham
7b31058873 libcollections: In tests, remove some uses of deprecated methods and
unused imports.
2014-08-26 16:11:40 -04:00
Nick Cameron
52ef46251e Rebasing changes 2014-08-26 16:07:32 +12:00
Nick Cameron
37a94b80f2 Use temp vars for implicit coercion to ^[T] 2014-08-26 12:37:45 +12:00
P1start
f2aa88ca06 A few minor documentation fixes 2014-08-19 17:22:18 +12:00
Steve Klabnik
2f8044418e Remove innapropriate string mutability section.
Fixes #14948
2014-08-18 14:00:35 -04:00
bors
cb9c1e0e70 auto merge of #16498 : Kimundi/rust/inline-utf-encoding, r=alexcrichton
The first commit improves code generation through a few changes:
- The `#[inline]` attributes allow llvm to constant fold the encoding step away in certain situations. For example, code like this changes from a call to `encode_utf8` in a inner loop to the pushing of a byte constant:

 ```rust
let mut s = String::new();
for _ in range(0u, 21) {
        s.push_char('a');
}
```
- Both methods changed their semantic from causing run time failure if the target buffer is not large enough to returning `None` instead. This makes llvm no longer emit code for causing failure for these methods.
- A few debug `assert!()` calls got removed because they affected code generation due to unwinding, and where basically unnecessary with today's sound handling of `char` as a Unicode scalar value.

~~The second commit is optional. It changes the methods from regular indexing with the `dst[i]` syntax to unsafe indexing with `dst.unsafe_mut_ref(i)`. This does not change code generation directly - in both cases llvm is smart enough to see that there can never be an out-of-bounds access. But it makes it emit a `nounwind` attribute for the function. 
However, I'm not sure whether that is a real improvement, so if there is any objection to this I'll remove the commit.~~

This changes how the methods behave on a too small buffer, so this is a 

[breaking-change]
2014-08-17 04:42:32 +00:00
Patrick Walton
7f928d150e librustc: Forbid external crates, imports, and/or items from being
declared with the same name in the same scope.

This breaks several common patterns. First are unused imports:

    use foo::bar;
    use baz::bar;

Change this code to the following:

    use baz::bar;

Second, this patch breaks globs that import names that are shadowed by
subsequent imports. For example:

    use foo::*; // including `bar`
    use baz::bar;

Change this code to remove the glob:

    use foo::{boo, quux};
    use baz::bar;

Or qualify all uses of `bar`:

    use foo::{boo, quux};
    use baz;

    ... baz::bar ...

Finally, this patch breaks code that, at top level, explicitly imports
`std` and doesn't disable the prelude.

    extern crate std;

Because the prelude imports `std` implicitly, there is no need to
explicitly import it; just remove such directives.

The old behavior can be opted into via the `import_shadowing` feature
gate. Use of this feature gate is discouraged.

This implements RFC #116.

Closes #16464.

[breaking-change]
2014-08-16 19:32:25 -07:00
Marvin Löbel
13079c1a85 Optimized IR generation for UTF-8 and UTF-16 encoding
- Both can now be inlined and constant folded away
- Both can no longer cause failure
- Both now return an `Option` instead

Removed debug `assert!()`s over the valid ranges of a `char`
- It affected optimizations due to unwinding
- Char handling is now sound enought that they became uneccessary
2014-08-16 21:13:39 +02:00
Brian Anderson
fce442e75c Fix test fallout 2014-08-13 17:11:21 -07:00
Brian Anderson
4f5b6927e8 std: Rename various slice traits for consistency
ImmutableVector -> ImmutableSlice
ImmutableEqVector -> ImmutableEqSlice
ImmutableOrdVector -> ImmutableOrdSlice
MutableVector -> MutableSlice
MutableVectorAllocating -> MutableSliceAllocating
MutableCloneableVector -> MutableCloneableSlice
MutableOrdVector -> MutableOrdSlice

These are all in the prelude so most code will not break.

[breaking-change]
2014-08-13 11:30:14 -07:00
Aaron Turon
f77cabecbb Deprecation fallout in libcollections 2014-08-12 13:35:56 -07:00
nham
f36ddf1d0e Use byte literals in libcollections tests 2014-08-06 00:57:49 -04:00
Florian Zeitz
7ece0abe64 collections, unicode: Add support for NFC and NFKC 2014-07-28 18:47:38 +02:00
Adolfo Ochagavía
75a0062d88 Add string::raw::from_buf 2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
0fe894e49b Deprecated String::from_raw_parts
Replaced by `string::raw::from_parts`

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
6e509d3462 Deprecated str::raw::from_buf_len
Replaced by `string::raw::from_buf_len`

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
feeae27a56 Deprecated str::raw::from_byte
Use `string:raw::from_utf8` instead

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
9ec19373af Deprecated str::raw::from_utf8_owned
Replaced by `string::raw::from_utf8`

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
eacc5d779f Deprecated str::raw::from_c_str
Use `string::raw::from_buf` instead

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
ba707fb3a0 Remove OwnedStr trait
This trait was only implemented by `String`. It provided the methods
`into_bytes` and `append`, both of which **are already implemented as normal
methods** of `String` (not as trait methods). This change improves the
consistency of strings.

This shouldn't break any code, except if somebody has implemented
`OwnedStr` for a user-defined type.
2014-07-24 07:25:43 -07:00
Brian Anderson
054b1ff989 Remove kludgy imports from vec! macro 2014-07-23 13:20:17 -07:00
Brian Anderson
d36a8f3f9c collections: Move push/pop to MutableSeq
Implement for Vec, DList, RingBuf. Add MutableSeq to the prelude.

Since the collections traits are in the prelude most consumers of
these methods will continue to work without change.

[breaking-change]
2014-07-23 13:20:10 -07:00
bors
8d43e4474a auto merge of #15867 : cmr/rust/rewrite-lexer4, r=alexcrichton 2014-07-22 07:16:17 +00:00
Corey Richardson
188d889aaf ignore-lexer-test to broken files and remove some tray hyphens
I blame @ChrisMorgan for the hyphens.
2014-07-21 10:59:58 -07:00
bors
ca38434829 auto merge of #15638 : blake2-ppc/rust/ptr-arithmetic-chars, r=huonw
Reimplement the string slice's `Iterator<char>` by wrapping the already efficient
slice iterator.

The iterator uses our guarantee that the string contains valid UTF-8, but its only unsafe
code is transmuting the decoded `u32` into `char`.

Benchmarks suggest that the runtime of `Chars` benchmarks are reduced by up to 30%,
runtime of `Chars` reversed reduced by up to 60%.

```
BEFORE
test str::bench::char_indicesator                          ... bench:       124 ns/iter (+/- 1)
test str::bench::char_indicesator_rev                      ... bench:       188 ns/iter (+/- 9)
test str::bench::char_iterator                             ... bench:       122 ns/iter (+/- 2)
test str::bench::char_iterator_ascii                       ... bench:       302 ns/iter (+/- 41)
test str::bench::char_iterator_for                         ... bench:       123 ns/iter (+/- 4)
test str::bench::char_iterator_rev                         ... bench:       189 ns/iter (+/- 14)
test str::bench::char_iterator_rev_for                     ... bench:       177 ns/iter (+/- 4)

AFTER
test str::bench::char_indicesator                          ... bench:        85 ns/iter (+/- 3)
test str::bench::char_indicesator_rev                      ... bench:        82 ns/iter (+/- 2)
test str::bench::char_iterator                             ... bench:       100 ns/iter (+/- 3)
test str::bench::char_iterator_ascii                       ... bench:       317 ns/iter (+/- 3)
test str::bench::char_iterator_for                         ... bench:        86 ns/iter (+/- 2)
test str::bench::char_iterator_rev                         ... bench:        80 ns/iter (+/- 6)
test str::bench::char_iterator_rev_for                     ... bench:        68 ns/iter (+/- 0)
```

Note: Branch name is no longer indicative of the implementation.
2014-07-19 14:06:39 +00:00
Steve Klabnik
226b7d1b72 Guide: strings 2014-07-17 20:50:14 -04:00
root
9e59c76263 Add more tests for str Chars iterator
Test iterating (decoding) every codepoint.
2014-07-18 00:59:09 +02:00
root
d6b42c2463 str: Add better tests for string slice's Chars iterator
Test using for ch s.chars() { black_box(ch) } to have a test that should
force the iterator to run its full decoding computations.
2014-07-17 20:21:53 +02:00
bors
2692ae1ddd auto merge of #15619 : kwantam/rust/master, r=huonw
- `width()` computes the displayed width of a string, ignoring the width of control characters.
    - arguably we might do *something* else for control characters, but the question is, what?
    - users who want to do something else can iterate over chars()

- `graphemes()` returns a `Graphemes` struct, which implements an iterator over the grapheme clusters of a &str.
    - fully compliant with [UAX#29](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)
    - passes all [Unicode-supplied tests](http://www.unicode.org/reports/tr41/tr41-15.html#Tests29)

- added code to generate additionial categories in `unicode.py`
    - `Cn` aka `Not_Assigned`
    - categories necessary for grapheme cluster breaking

- tidied up the exports from libunicode
  - all exports are exposed through a module rather than directly at crate root.
  - std::prelude imports UnicodeChar and UnicodeStrSlice from std::char and std::str rather than directly from libunicode

closes #7043
2014-07-15 22:51:17 +00:00
Adolfo Ochagavía
584fbde5d1 Fix errors 2014-07-15 20:34:16 +02:00
Adolfo Ochagavía
c6b82c7566 Deprecate str::from_utf8_lossy
Use `String::from_utf8_lossy` instead

[breaking-change]
2014-07-15 19:55:21 +02:00
Adolfo Ochagavía
1900abdd9b Deprecate str::from_utf16_lossy
Use `String::from_utf16_lossy` instead.

[breaking-change]
2014-07-15 19:55:20 +02:00
Adolfo Ochagavía
6ac4fc7fc2 Deprecate str::from_utf16
Use `String::from_utf16` instead

[breaking-change]
2014-07-15 19:55:19 +02:00
Adolfo Ochagavía
05baf9b10c Deprecate str::from_char
Use `String::from_char` or `.to_str` instead

[breaking-change]
2014-07-15 19:55:18 +02:00
Adolfo Ochagavía
20a6894830 Deprecate str::from_chars
Use `String::from_chars` instead

[breaking-change]
2014-07-15 19:55:18 +02:00
Adolfo Ochagavía
211f1caa29 Deprecate str::from_utf8_owned
Use `String::from_utf8` instead

[breaking-change]
2014-07-15 19:55:17 +02:00
kwantam
cf432b8f8f add Graphemes iterator; tidy unicode exports
- Graphemes and GraphemeIndices structs implement iterators over
  grapheme clusters analogous to the Chars and CharOffsets for chars in
  a string. Iterator and DoubleEndedIterator are available for both.

- tidied up the exports for libunicode. crate root exports are now moved
  into more appropriate module locations:
  - UnicodeStrSlice, Words, Graphemes, GraphemeIndices are in str module
  - UnicodeChar exported from char instead of crate root
  - canonical_combining_class is exported from str rather than crate root

Since libunicode's exports have changed, programs that previously relied
on the old export locations will need to change their `use` statements
to reflect the new ones. See above for more information on where the new
exports live.

closes #7043
[breaking-change]
2014-07-14 19:53:46 -04:00
kwantam
c066a1ee9f add UnicodeStrSlice width() function 2014-07-14 19:53:46 -04:00
bors
0a1e251e81 auto merge of #15497 : jasonthompson/rust/docs/str3, r=cmr
- for 3 implementations of into_maybe_owned()
  - is_slice()
  - is_owned()
2014-07-14 02:16:28 +00:00
bors
fa7cbb5a46 auto merge of #15283 : kwantam/rust/master, r=alexcrichton
Add libunicode; move unicode functions from core

- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
  - Unicode-aware functions now live in libunicode
    - `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
      `is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
      `to_uppercase`, `to_lowercase`
  - added `width` method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is a non-NULL control character
    - takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
  - functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
    - `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
  - also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
  - thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`

The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-09 18:36:30 +00:00
kwantam
85e2bee4a2 fix test failures
- unicode tests live in coretest crate
- libcollections str tests need UnicodeChar trait.
- libregex perlw tests were checking a char in the Alphabetic category,
  \x2161. Confirmed perl 5.18 considers this a \w character. Changed to
  \x2961, which is not \w as the test expects.
2014-07-09 10:14:46 -04:00
Richo Healey
12c334a77b std: Rename the ToStr trait to ToString, and to_str to to_string.
[breaking-change]
2014-07-08 13:01:43 -07:00
kwantam
5d4238b6fc Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-07 14:52:24 -04:00
Jason Thompson
7158e8a1b7 Add example for str replace() and MaybeOwned
- for 3 implementations of into_maybe_owned()
  - is_slice()
  - is_owned()
2014-07-07 06:26:52 -04:00
Jason Thompson
7db691e010 Add examples for StrVector methods
- examples for connect and concat
- also fixed extra word in existing docs
2014-07-03 12:54:52 -07:00