Commit Graph

149 Commits

Author SHA1 Message Date
Nick Cameron
6e0611a487 Review and rebasing changes 2014-10-02 14:50:22 +13:00
Nick Cameron
95cfc35607 Put slicing syntax behind a feature gate.
[breaking-change]

If you are using slicing syntax you will need to add #![feature(slicing_syntax)] to your crate.
2014-10-02 13:23:36 +13:00
Nick Cameron
40b9f5ded5 Use slice syntax instead of slice_to, etc. 2014-10-02 13:19:45 +13:00
Patrick Walton
416144b827 librustc: Forbid .. in range patterns.
This breaks code that looks like:

    match foo {
        1..3 => { ... }
    }

Instead, write:

    match foo {
        1...3 => { ... }
    }

Closes #17295.

[breaking-change]
2014-09-30 09:11:26 -07:00
Alex Crichton
735d16b1b0 rollup merge of #17585 : sfackler/string-slice 2014-09-29 08:14:16 -07:00
Steven Fackler
aa2814fd4e Implement Slice for String and str
Closes #17502
2014-09-26 21:48:49 -07:00
Squeaky
070ba14a71 Correct stability marker in string.rs 2014-09-27 02:37:28 +02:00
Alex Crichton
50375139e2 Deal with the fallout of string stabilization 2014-09-23 18:31:52 -07:00
Alex Crichton
31be3319bf collections: Deprecate shift_char for insert/remove
This commit deprecates the String::shift_char() function in favor of the
addition of an insert()/remove() pair of functions. This aligns the API with Vec
in that characters can be inserted at arbitrary positions.  Additionaly, there
is no `_char` suffix due to the rationaled laid out in the previous commit.

These functions are both introduced as unstable as their failure semantics,
while in line with slices/vectors, are uncertain about whether they should
remain the same.
2014-09-22 08:24:14 -07:00
Alex Crichton
79b4ce06ae collections: Stabilize String
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] becuase it can be seen as a
                      duplicate of iterator-based functionality as well as
                      possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices isn't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: https://github.com/rust-lang/rfcs/pull/240

The following method have been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   though as to whether it belongs as a first-class method or
                   now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
2014-09-22 07:46:40 -07:00
Alex Crichton
0169218047 Fix fallout from Vec stabilization 2014-09-21 22:15:51 -07:00
Nick Cameron
ce0907e46e Add enum variants to the type namespace
Change to resolve and update compiler and libs for uses.

[breaking-change]

Enum variants are now in both the value and type namespaces. This means that
if you have a variant with the same name as a type in scope in a module, you
will get a name clash and thus an error. The solution is to either rename the
type or the variant.
2014-09-19 15:11:00 +12:00
Nick Cameron
52ef46251e Rebasing changes 2014-08-26 16:07:32 +12:00
P1start
f2aa88ca06 A few minor documentation fixes 2014-08-19 17:22:18 +12:00
Patrick Walton
67deb2e65e libsyntax: Remove the use foo = bar syntax from the language in favor
of `use bar as foo`.

Change all uses of `use foo = bar` to `use bar as foo`.

Implements RFC #47.

Closes #16461.

[breaking-change]
2014-08-18 09:19:10 -07:00
bors
cb9c1e0e70 auto merge of #16498 : Kimundi/rust/inline-utf-encoding, r=alexcrichton
The first commit improves code generation through a few changes:
- The `#[inline]` attributes allow llvm to constant fold the encoding step away in certain situations. For example, code like this changes from a call to `encode_utf8` in a inner loop to the pushing of a byte constant:

 ```rust
let mut s = String::new();
for _ in range(0u, 21) {
        s.push_char('a');
}
```
- Both methods changed their semantic from causing run time failure if the target buffer is not large enough to returning `None` instead. This makes llvm no longer emit code for causing failure for these methods.
- A few debug `assert!()` calls got removed because they affected code generation due to unwinding, and where basically unnecessary with today's sound handling of `char` as a Unicode scalar value.

~~The second commit is optional. It changes the methods from regular indexing with the `dst[i]` syntax to unsafe indexing with `dst.unsafe_mut_ref(i)`. This does not change code generation directly - in both cases llvm is smart enough to see that there can never be an out-of-bounds access. But it makes it emit a `nounwind` attribute for the function. 
However, I'm not sure whether that is a real improvement, so if there is any objection to this I'll remove the commit.~~

This changes how the methods behave on a too small buffer, so this is a 

[breaking-change]
2014-08-17 04:42:32 +00:00
Patrick Walton
7f928d150e librustc: Forbid external crates, imports, and/or items from being
declared with the same name in the same scope.

This breaks several common patterns. First are unused imports:

    use foo::bar;
    use baz::bar;

Change this code to the following:

    use baz::bar;

Second, this patch breaks globs that import names that are shadowed by
subsequent imports. For example:

    use foo::*; // including `bar`
    use baz::bar;

Change this code to remove the glob:

    use foo::{boo, quux};
    use baz::bar;

Or qualify all uses of `bar`:

    use foo::{boo, quux};
    use baz;

    ... baz::bar ...

Finally, this patch breaks code that, at top level, explicitly imports
`std` and doesn't disable the prelude.

    extern crate std;

Because the prelude imports `std` implicitly, there is no need to
explicitly import it; just remove such directives.

The old behavior can be opted into via the `import_shadowing` feature
gate. Use of this feature gate is discouraged.

This implements RFC #116.

Closes #16464.

[breaking-change]
2014-08-16 19:32:25 -07:00
Marvin Löbel
13079c1a85 Optimized IR generation for UTF-8 and UTF-16 encoding
- Both can now be inlined and constant folded away
- Both can no longer cause failure
- Both now return an `Option` instead

Removed debug `assert!()`s over the valid ranges of a `char`
- It affected optimizations due to unwinding
- Char handling is now sound enought that they became uneccessary
2014-08-16 21:13:39 +02:00
Brian Anderson
033f28d436 core: Rename ImmutableSlice::unsafe_ref to unsafe_get
Deprecate the previous.
2014-08-13 11:30:14 -07:00
Brian Anderson
fbc93082ec std: Rename slice::Vector to Slice
This required some contortions because importing both raw::Slice
and slice::Slice makes rustc crash.

Since `Slice` is in the prelude, this renaming is unlikely to
casue breakage.

[breaking-change]
2014-08-13 11:30:14 -07:00
Aaron Turon
f77cabecbb Deprecation fallout in libcollections 2014-08-12 13:35:56 -07:00
nham
f36ddf1d0e Use byte literals in libcollections tests 2014-08-06 00:57:49 -04:00
Joseph Crail
ad06dfe496 Fix misspelled comments. 2014-08-01 19:42:52 -04:00
Erick Tryzelaar
a011b2273e Fix a whitespace typo 2014-07-29 15:50:44 -07:00
Jonas Hietala
3f56846460 doc: Method examples for String
Reword comments on unsafe methods regarding UTF-8.
2014-07-28 17:03:12 +02:00
Adolfo Ochagavía
75a0062d88 Add string::raw::from_buf 2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
0fe894e49b Deprecated String::from_raw_parts
Replaced by `string::raw::from_parts`

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
6e509d3462 Deprecated str::raw::from_buf_len
Replaced by `string::raw::from_buf_len`

[breaking-change]
2014-07-24 07:25:43 -07:00
Adolfo Ochagavía
9ec19373af Deprecated str::raw::from_utf8_owned
Replaced by `string::raw::from_utf8`

[breaking-change]
2014-07-24 07:25:43 -07:00
Brian Anderson
71a75cc2ce Just land already 2014-07-23 13:20:17 -07:00
Brian Anderson
d36a8f3f9c collections: Move push/pop to MutableSeq
Implement for Vec, DList, RingBuf. Add MutableSeq to the prelude.

Since the collections traits are in the prelude most consumers of
these methods will continue to work without change.

[breaking-change]
2014-07-23 13:20:10 -07:00
bors
8d43e4474a auto merge of #15867 : cmr/rust/rewrite-lexer4, r=alexcrichton 2014-07-22 07:16:17 +00:00
Corey Richardson
188d889aaf ignore-lexer-test to broken files and remove some tray hyphens
I blame @ChrisMorgan for the hyphens.
2014-07-21 10:59:58 -07:00
Ted Horst
dfacef532d fix string in from_utf8_lossy_100_multibyte benchmark 2014-07-21 09:55:02 -07:00
Adolfo Ochagavía
584fbde5d1 Fix errors 2014-07-15 20:34:16 +02:00
Adolfo Ochagavía
c6b82c7566 Deprecate str::from_utf8_lossy
Use `String::from_utf8_lossy` instead

[breaking-change]
2014-07-15 19:55:21 +02:00
Adolfo Ochagavía
1900abdd9b Deprecate str::from_utf16_lossy
Use `String::from_utf16_lossy` instead.

[breaking-change]
2014-07-15 19:55:20 +02:00
Adolfo Ochagavía
6ac4fc7fc2 Deprecate str::from_utf16
Use `String::from_utf16` instead

[breaking-change]
2014-07-15 19:55:19 +02:00
Adolfo Ochagavía
173baac495 Deprecate str::from_byte
Replaced by `String::from_byte`

[breaking-change]
2014-07-15 19:55:19 +02:00
Adolfo Ochagavía
20a6894830 Deprecate str::from_chars
Use `String::from_chars` instead

[breaking-change]
2014-07-15 19:55:18 +02:00
Adolfo Ochagavía
211f1caa29 Deprecate str::from_utf8_owned
Use `String::from_utf8` instead

[breaking-change]
2014-07-15 19:55:17 +02:00
Richo Healey
12c334a77b std: Rename the ToStr trait to ToString, and to_str to to_string.
[breaking-change]
2014-07-08 13:01:43 -07:00
Simon Sapin
ed3eee2e2a Optimize String::push_byte()
```
test new_push_byte ... bench:      6985 ns/iter (+/- 487) = 17 MB/s
test old_push_byte ... bench:     19335 ns/iter (+/- 1368) = 6 MB/s
```

```rust
extern crate test;
use test::Bencher;

static TEXT: &'static str = "\
    Unicode est un standard informatique qui permet des échanges \
    de textes dans différentes langues, à un niveau mondial.";

#[bench]
fn old_push_byte(bencher: &mut Bencher) {
    bencher.bytes = TEXT.len() as u64;
    bencher.iter(|| {
        let mut new = String::new();
        for b in TEXT.bytes() {
            unsafe { new.as_mut_vec().push_all([b]) }
        }
    })
}

#[bench]
fn new_push_byte(bencher: &mut Bencher) {
    bencher.bytes = TEXT.len() as u64;
    bencher.iter(|| {
        let mut new = String::new();
        for b in TEXT.bytes() {
            unsafe { new.as_mut_vec().push(b) }
        }
    })
}
```
2014-07-06 01:11:13 +01:00
Brian Anderson
d21336ee0a rustc: Remove &str indexing from the language.
Being able to index into the bytes of a string encourages
poor UTF-8 hygiene. To get a view of `&[u8]` from either
a `String` or `&str` slice, use the `as_bytes()` method.

Closes #12710.

[breaking-change]
2014-07-01 19:12:29 -07:00
Alex Crichton
f7f95c8f5a std: Bring back half of Add on String
This adds an implementation of Add for String where the rhs is <S: Str>. The
other half of adding strings is where the lhs is <S: Str>, but coherence and
the libcore separation currently prevent that.
2014-06-24 17:17:09 -07:00
Alex Crichton
da0703973a core: Move the collections traits to libcollections
This commit moves Mutable, Map, MutableMap, Set, and MutableSet from
`core::collections` to the `collections` crate at the top-level. Additionally,
this removes the `deque` module and moves the `Deque` trait to only being
available at the top-level of the collections crate.

All functionality continues to be reexported through `std::collections`.

[breaking-change]
2014-06-09 00:38:46 -07:00
Brian Anderson
50942c7695 core: Rename container mod to collections. Closes #12543
Also renames the `Container` trait to `Collection`.

[breaking-change]
2014-06-08 21:29:57 -07:00
Alex Crichton
760b93adc0 Fallout from the libcollections movement 2014-06-05 13:55:11 -07:00
Alex Crichton
6a585375a0 std: Recreate a collections module
As with the previous commit with `librand`, this commit shuffles around some
`collections` code. The new state of the world is similar to that of librand:

* The libcollections crate now only depends on libcore and liballoc.
* The standard library has a new module, `std::collections`. All functionality
  of libcollections is reexported through this module.

I would like to stress that this change is purely cosmetic. There are very few
alterations to these primitives.

There are a number of notable points about the new organization:

* std::{str, slice, string, vec} all moved to libcollections. There is no reason
  that these primitives shouldn't be necessarily usable in a freestanding
  context that has allocation. These are all reexported in their usual places in
  the standard library.

* The `hashmap`, and transitively the `lru_cache`, modules no longer reside in
  `libcollections`, but rather in libstd. The reason for this is because the
  `HashMap::new` contructor requires access to the OSRng for initially seeding
  the hash map. Beyond this requirement, there is no reason that the hashmap
  could not move to libcollections.

  I do, however, have a plan to move the hash map to the collections module. The
  `HashMap::new` function could be altered to require that the `H` hasher
  parameter ascribe to the `Default` trait, allowing the entire `hashmap` module
  to live in libcollections. The key idea would be that the default hasher would
  be different in libstd. Something along the lines of:

      // src/libstd/collections/mod.rs

      pub type HashMap<K, V, H = RandomizedSipHasher> =
            core_collections::HashMap<K, V, H>;

  This is not possible today because you cannot invoke static methods through
  type aliases. If we modified the compiler, however, to allow invocation of
  static methods through type aliases, then this type definition would
  essentially be switching the default hasher from `SipHasher` in libcollections
  to a libstd-defined `RandomizedSipHasher` type. This type's `Default`
  implementation would randomly seed the `SipHasher` instance, and otherwise
  perform the same as `SipHasher`.

  This future state doesn't seem incredibly far off, but until that time comes,
  the hashmap module will live in libstd to not compromise on functionality.

* In preparation for the hashmap moving to libcollections, the `hash` module has
  moved from libstd to libcollections. A previously snapshotted commit enables a
  distinct `Writer` trait to live in the `hash` module which `Hash`
  implementations are now parameterized over.

  Due to using a custom trait, the `SipHasher` implementation has lost its
  specialized methods for writing integers. These can be re-added
  backwards-compatibly in the future via default methods if necessary, but the
  FNV hashing should satisfy much of the need for speedier hashing.

A list of breaking changes:

* HashMap::{get, get_mut} no longer fails with the key formatted into the error
  message with `{:?}`, instead, a generic message is printed. With backtraces,
  it should still be not-too-hard to track down errors.

* The HashMap, HashSet, and LruCache types are now available through
  std::collections instead of the collections crate.

* Manual implementations of hash should be parameterized over `hash::Writer`
  instead of just `Writer`.

[breaking-change]
2014-06-05 13:55:10 -07:00