Commit Graph

70 Commits

Author SHA1 Message Date
bors
5366cfecf3 auto merge of #17438 : alexcrichton/rust/string-stable, r=aturon
# Rationale

When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name.  There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).

The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to
                      provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
                            stabilization
* String::from_utf16 - The return type may change to become a `Result` which
                       includes more contextual information like where the error
                       occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
                       as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
                      been marked #[unstable] becuase it can be seen as a
                      duplicate of iterator-based functionality as well as
                      possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
                     less efficient because of decoding/encoding. Due to the
                     desire to minimize API surface this may be able to be
                     removed in the future for something possibly generic with
                     no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
                 become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
                     indices isn't totally clear at this time, so the failure
                     semantics and return value of this method are subject to
                     change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: rust-lang/rfcs#240

The following method have been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than
                     to_string(), but having a less ergonomic function for
                     performance reasons isn't the greatest reason to keep it
                     around. Like Vec::push_all, this has been marked
                     experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the
                   deprecation of Vec::append. While convenient, it is one of
                   the only functional-style apis on String, and requires more
                   though as to whether it belongs as a first-class method or
                   now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
                      str::from_utf8 plus an assert plus a call to to_string().
                      Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
                          above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
                       type which allow bypassing utf-8 checks. These have all
                       been deprecated in favor of calling `.as_mut_vec()` and
                       then operating directly on the vector returned. These
                       methods were deprecated because naming them with relation
                       to other methods was difficult to rationalize and it's
                       arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
2014-09-24 14:00:57 +00:00
Alex Crichton
50375139e2 Deal with the fallout of string stabilization 2014-09-23 18:31:52 -07:00
Patrick Walton
e9ad12c0ca librustc: Forbid private types in public APIs.
This breaks code like:

    struct Foo {
        ...
    }

    pub fn make_foo() -> Foo {
        ...
    }

Change this code to:

    pub struct Foo {    // note `pub`
        ...
    }

    pub fn make_foo() -> Foo {
        ...
    }

The `visible_private_types` lint has been removed, since it is now an
error to attempt to expose a private type in a public API. In its place
a `#[feature(visible_private_types)]` gate has been added.

Closes #16463.

RFC #48.

[breaking-change]
2014-09-22 20:05:45 -07:00
Alex Crichton
0169218047 Fix fallout from Vec stabilization 2014-09-21 22:15:51 -07:00
Nick Cameron
ce0907e46e Add enum variants to the type namespace
Change to resolve and update compiler and libs for uses.

[breaking-change]

Enum variants are now in both the value and type namespaces. This means that
if you have a variant with the same name as a type in scope in a module, you
will get a name clash and thus an error. The solution is to either rename the
type or the variant.
2014-09-19 15:11:00 +12:00
Aaron Turon
fc525eeb4e Fallout from renaming 2014-09-16 14:37:48 -07:00
bors
d3e7922ddd auto merge of #16982 : jbcrail/rust/comment-and-string-corrections, r=alexcrichton
I corrected spelling and capitalization errors in comments and strings.
2014-09-04 18:30:59 +00:00
Joseph Crail
b7bfe04b2d Fix spelling errors and capitalization. 2014-09-03 23:10:38 -04:00
wickerwaka
2cb210d2c6 Updated to new extern crate syntax.
Added warning for old deprecated syntax
2014-09-01 09:02:00 -07:00
P1start
de7abd8824 Unify non-snake-case lints and non-uppercase statics lints
This unifies the `non_snake_case_functions` and `uppercase_variables` lints
into one lint, `non_snake_case`. It also now checks for non-snake-case modules.
This also extends the non-camel-case types lint to check type parameters, and
merges the `non_uppercase_pattern_statics` lint into the
`non_uppercase_statics` lint.

Because the `uppercase_variables` lint is now part of the `non_snake_case`
lint, all non-snake-case variables that start with lowercase characters (such
as `fooBar`) will now trigger the `non_snake_case` lint.

New code should be updated to use the new `non_snake_case` lint instead of the
previous `non_snake_case_functions` and `uppercase_variables` lints. All use of
the `non_uppercase_pattern_statics` should be replaced with the
`non_uppercase_statics` lint. Any code that previously contained non-snake-case
module or variable names should be updated to use snake case names or disable
the `non_snake_case` lint. Any code with non-camel-case type parameters should
be changed to use camel case or disable the `non_camel_case_types` lint.

[breaking-change]
2014-08-30 09:10:05 +12:00
klutzy
d7916f8d44 regex: Enable test on Windows
Fixes #13725
2014-08-18 13:44:29 +09:00
Brian Anderson
bc450b17e3 core: Change the argument order on splitn and rsplitn for strs.
This makes it consistent with the same functions for slices,
and allows the search closure to be specified last.

[breaking-change]
2014-08-13 15:27:37 -07:00
Brian Anderson
a4b354ca02 core: Add binary_search and binary_search_elem methods to slices.
These are like the existing bsearch methods but if the search fails,
it returns the next insertion point.

The new `binary_search` returns a `BinarySearchResult` that is either
`Found` or `NotFound`. For convenience, the `found` and `not_found`
methods convert to `Option`, ala `Result`.

Deprecate bsearch and bsearch_elem.
2014-08-13 11:30:15 -07:00
Brian Anderson
4f5b6927e8 std: Rename various slice traits for consistency
ImmutableVector -> ImmutableSlice
ImmutableEqVector -> ImmutableEqSlice
ImmutableOrdVector -> ImmutableOrdSlice
MutableVector -> MutableSlice
MutableVectorAllocating -> MutableSliceAllocating
MutableCloneableVector -> MutableCloneableSlice
MutableOrdVector -> MutableOrdSlice

These are all in the prelude so most code will not break.

[breaking-change]
2014-08-13 11:30:14 -07:00
nham
30cfce3d92 Use a byte literal in libregex 2014-08-06 01:18:19 -04:00
Brian Anderson
9db1d35687 collections: Deprecate shift/unshift
Use insert/remove instead.
2014-07-23 13:20:16 -07:00
bors
8d43e4474a auto merge of #15867 : cmr/rust/rewrite-lexer4, r=alexcrichton 2014-07-22 07:16:17 +00:00
Corey Richardson
35c0bf3292 Add a ton of ignore-lexer-test 2014-07-21 18:38:40 -07:00
Corey Richardson
188d889aaf ignore-lexer-test to broken files and remove some tray hyphens
I blame @ChrisMorgan for the hyphens.
2014-07-21 10:59:58 -07:00
Nick Cameron
aa760a849e deprecate Vec::get 2014-07-17 12:08:31 +12:00
Adolfo Ochagavía
584fbde5d1 Fix errors 2014-07-15 20:34:16 +02:00
Adolfo Ochagavía
20a6894830 Deprecate str::from_chars
Use `String::from_chars` instead

[breaking-change]
2014-07-15 19:55:18 +02:00
Adolfo Ochagavía
211f1caa29 Deprecate str::from_utf8_owned
Use `String::from_utf8` instead

[breaking-change]
2014-07-15 19:55:17 +02:00
Brian Anderson
fa2d220567 Update doc URLs for version bump 2014-07-11 11:21:57 -07:00
bors
898701cb35 auto merge of #15556 : alexcrichton/rust/snapshots, r=brson
Closes #15544
2014-07-10 03:21:30 +00:00
Alex Crichton
0c71e0c596 Register new snapshots
Closes #15544
2014-07-09 10:57:58 -07:00
kwantam
85e2bee4a2 fix test failures
- unicode tests live in coretest crate
- libcollections str tests need UnicodeChar trait.
- libregex perlw tests were checking a char in the Alphabetic category,
  \x2161. Confirmed perl 5.18 considers this a \w character. Changed to
  \x2961, which is not \w as the test expects.
2014-07-09 10:14:46 -04:00
kwantam
5d4238b6fc Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-07 14:52:24 -04:00
Alex Crichton
e44c2b9bbc Add #[crate_name] attributes as necessary 2014-07-05 12:45:42 -07:00
Alex Crichton
ff1dd44b40 Merge remote-tracking branch 'origin/master' into 0.11.0-release
Conflicts:
	src/libstd/lib.rs
2014-07-02 11:08:21 -07:00
Patrick Walton
a5bb0a3a45 librustc: Remove the fallback to int for integers and f64 for
floating point numbers for real.

This will break code that looks like:

    let mut x = 0;
    while ... {
        x += 1;
    }
    println!("{}", x);

Change that code to:

    let mut x = 0i;
    while ... {
        x += 1;
    }
    println!("{}", x);

Closes #15201.

[breaking-change]
2014-06-29 11:47:58 -07:00
Alex Crichton
aa1163b92d Update to 0.11.0 2014-06-27 12:50:16 -07:00
Alex Crichton
89b0e6e12b Register new snapshots 2014-06-15 23:30:24 -07:00
Cameron Zwarich
159e27aebb Fix all violations of stronger guarantees for mutable borrows
Fix all violations in the Rust source tree of the stronger guarantee
of a unique access path for mutable borrows as described in #12624.
2014-06-13 20:48:09 -07:00
bors
0422934e24 auto merge of #14831 : alexcrichton/rust/format-intl, r=brson
* The select/plural methods from format strings are removed
* The # character no longer needs to be escaped
* The \-based escapes have been removed
* '{{' is now an escape for '{'
* '}}' is now an escape for '}'

Closes #14810
[breaking-change]
2014-06-13 14:42:03 +00:00
Alex Crichton
cac7a2053a std: Remove i18n/l10n from format!
* The select/plural methods from format strings are removed
* The # character no longer needs to be escaped
* The \-based escapes have been removed
* '{{' is now an escape for '{'
* '}}' is now an escape for '}'

Closes #14810
[breaking-change]
2014-06-11 16:04:24 -07:00
Alex Crichton
3316b1eb7c rustc: Remove ~[T] from the language
The following features have been removed

* box [a, b, c]
* ~[a, b, c]
* box [a, ..N]
* ~[a, ..N]
* ~[T] (as a type)
* deprecated_owned_vector lint

All users of ~[T] should move to using Vec<T> instead.
2014-06-11 15:02:17 -07:00
Joseph Crail
c2c9946372 Fix more misspelled comments and strings. 2014-06-10 11:24:17 -04:00
Keegan McAllister
84243ed6e1 Use phase(plugin) in other crates 2014-06-09 14:29:30 -07:00
Brian Anderson
50942c7695 core: Rename container mod to collections. Closes #12543
Also renames the `Container` trait to `Collection`.

[breaking-change]
2014-06-08 21:29:57 -07:00
Alex Crichton
e5bbbca33e rustdoc: Submit examples to play.rust-lang.org
This grows a new option inside of rustdoc to add the ability to submit examples
to an external website. If the `--markdown-playground-url` command line option
or crate doc attribute `html_playground_url` is present, then examples will have
a button on hover to submit the code to the playground specified.

This commit enables submission of example code to play.rust-lang.org. The code
submitted is that which is tested by rustdoc, not necessarily the exact code
shown in the example.

Closes #14654
2014-06-06 20:00:16 -07:00
Alex Crichton
760b93adc0 Fallout from the libcollections movement 2014-06-05 13:55:11 -07:00
bors
d130acc0d0 auto merge of #14635 : BurntSushi/rust/regex-doco-touchups, r=alexcrichton 2014-06-03 23:51:41 -07:00
Andrew Gallant
179fc6dbfd Some minor documentation touchups for libregex. Fixes #13800. 2014-06-03 23:45:54 -04:00
Andrew Gallant
9d39178f2f Fixes #13843.
An empty regex is a valid regex that always matches. This behavior
is consistent with at least Go and Python.

A couple regression tests are included.
2014-06-03 23:04:59 -04:00
Alex Crichton
748bc3ca49 std: Rename {Eq,Ord} to Partial{Eq,Ord}
This is part of the ongoing renaming of the equality traits. See #12517 for more
details. All code using Eq/Ord will temporarily need to move to Partial{Eq,Ord}
or the Total{Eq,Ord} traits. The Total traits will soon be renamed to {Eq,Ord}.

cc #12517

[breaking-change]
2014-05-30 15:52:24 -07:00
Kevin Butler
3faa6762c1 lib{std,core,debug,rustuv,collections,native,regex}: Fix snake_case errors.
A number of functions/methods have been moved or renamed to align
better with rust standard conventions.

std::reflect::MovePtrAdaptor => MovePtrAdaptor::new
debug::reflect::MovePtrAdaptor => MovePtrAdaptor::new
std::repr::ReprVisitor => ReprVisitor::new
debug::repr::ReprVisitor => ReprVisitor::new
rustuv::homing::HomingIO.go_to_IO_home => go_to_io_home

[breaking-change]
2014-05-30 17:55:41 +01:00
bors
6510527e15 auto merge of #14510 : kballard/rust/rename_strallocating_into_owned, r=alexcrichton
We already have into_string(), but it was implemented in terms of
into_owned(). Flip it around and deprecate into_owned().

Remove a few spurious calls to .into_owned() that existed in libregex
and librustdoc.
2014-05-29 19:31:42 -07:00
Alex Crichton
925ff65118 std: Recreate a rand module
This commit shuffles around some of the `rand` code, along with some
reorganization. The new state of the world is as follows:

* The librand crate now only depends on libcore. This interface is experimental.
* The standard library has a new module, `std::rand`. This interface will
  eventually become stable.

Unfortunately, this entailed more of a breaking change than just shuffling some
names around. The following breaking changes were made to the rand library:

* Rng::gen_vec() was removed. This has been replaced with Rng::gen_iter() which
  will return an infinite stream of random values. Previous behavior can be
  regained with `rng.gen_iter().take(n).collect()`

* Rng::gen_ascii_str() was removed. This has been replaced with
  Rng::gen_ascii_chars() which will return an infinite stream of random ascii
  characters. Similarly to gen_iter(), previous behavior can be emulated with
  `rng.gen_ascii_chars().take(n).collect()`

* {IsaacRng, Isaac64Rng, XorShiftRng}::new() have all been removed. These all
  relied on being able to use an OSRng for seeding, but this is no longer
  available in librand (where these types are defined). To retain the same
  functionality, these types now implement the `Rand` trait so they can be
  generated with a random seed from another random number generator. This allows
  the stdlib to use an OSRng to create seeded instances of these RNGs.

* Rand implementations for `Box<T>` and `@T` were removed. These seemed to be
  pretty rare in the codebase, and it allows for librand to not depend on
  liballoc.  Additionally, other pointer types like Rc<T> and Arc<T> were not
  supported.  If this is undesirable, librand can depend on liballoc and regain
  these implementations.

* The WeightedChoice structure is no longer built with a `Vec<Weighted<T>>`,
  but rather a `&mut [Weighted<T>]`. This means that the WeightedChoice
  structure now has a lifetime associated with it.

* The `sample` method on `Rng` has been moved to a top-level function in the
  `rand` module due to its dependence on `Vec`.

cc #13851

[breaking-change]
2014-05-29 16:18:26 -07:00
Kevin Ballard
eb98c9eeaa Replace StrAllocating.into_owned() with .into_string()
We already have into_string(), but it was implemented in terms of
into_owned(). Flip it around and deprecate into_owned().

Remove a few spurious calls to .into_owned() that existed in libregex
and librustdoc.
2014-05-28 21:31:19 -07:00