Commit Graph

13 Commits

Author SHA1 Message Date
Jorge Aparicio
8c59ec0488 unicode: fix fallout 2015-01-03 09:34:04 -05:00
Alex Crichton
4908017d59 std: Stabilize the std::str module
This commit starts out by consolidating all `str` extension traits into one
`StrExt` trait to be included in the prelude. This means that
`UnicodeStrPrelude`, `StrPrelude`, and `StrAllocating` have all been merged into
one `StrExt` exported by the standard library. Some functionality is currently
duplicated with the `StrExt` present in libcore.

This commit also currently avoids any methods which require any form of pattern
to operate. These functions will be stabilized via a separate RFC.

Next, stability of methods and structures are as follows:

Stable

* from_utf8_unchecked
* CowString - after moving to std::string
* StrExt::as_bytes
* StrExt::as_ptr
* StrExt::bytes/Bytes - also made a struct instead of a typedef
* StrExt::char_indices/CharIndices - CharOffsets was renamed
* StrExt::chars/Chars
* StrExt::is_empty
* StrExt::len
* StrExt::lines/Lines
* StrExt::lines_any/LinesAny
* StrExt::slice_unchecked
* StrExt::trim
* StrExt::trim_left
* StrExt::trim_right
* StrExt::words/Words - also made a struct instead of a typedef

Unstable

* from_utf8 - the error type was changed to a `Result`, but the error type has
              yet to prove itself
* from_c_str - this function will be handled by the c_str RFC
* FromStr - this trait will have an associated error type eventually
* StrExt::escape_default - needs iterators at least, unsure if it should make
                           the cut
* StrExt::escape_unicode - needs iterators at least, unsure if it should make
                           the cut
* StrExt::slice_chars - this function has yet to prove itself
* StrExt::slice_shift_char - awaiting conventions about slicing and shifting
* StrExt::graphemes/Graphemes - this functionality may only be in libunicode
* StrExt::grapheme_indices/GraphemeIndices - this functionality may only be in
                                             libunicode
* StrExt::width - this functionality may only be in libunicode
* StrExt::utf16_units - this functionality may only be in libunicode
* StrExt::nfd_chars - this functionality may only be in libunicode
* StrExt::nfkd_chars - this functionality may only be in libunicode
* StrExt::nfc_chars - this functionality may only be in libunicode
* StrExt::nfkc_chars - this functionality may only be in libunicode
* StrExt::is_char_boundary - naming is uncertain with container conventions
* StrExt::char_range_at - naming is uncertain with container conventions
* StrExt::char_range_at_reverse - naming is uncertain with container conventions
* StrExt::char_at - naming is uncertain with container conventions
* StrExt::char_at_reverse - naming is uncertain with container conventions
* StrVector::concat - this functionality may be replaced with iterators, but
                      it's not certain at this time
* StrVector::connect - as with concat, may be deprecated in favor of iterators

Deprecated

* StrAllocating and UnicodeStrPrelude have been merged into StrExit
* eq_slice - compiler implementation detail
* from_str - use the inherent parse() method
* is_utf8 - call from_utf8 instead
* replace - call the method instead
* truncate_utf16_at_nul - this is an implementation detail of windows and does
                          not need to be exposed.
* utf8_char_width - moved to libunicode
* utf16_items - moved to libunicode
* is_utf16 - moved to libunicode
* Utf16Items - moved to libunicode
* Utf16Item - moved to libunicode
* Utf16Encoder - moved to libunicode
* AnyLines - renamed to LinesAny and made a struct
* SendStr - use CowString<'static> instead
* str::raw - all functionality is deprecated
* StrExt::into_string - call to_string() instead
* StrExt::repeat - use iterators instead
* StrExt::char_len - use .chars().count() instead
* StrExt::is_alphanumeric - use .chars().all(..)
* StrExt::is_whitespace - use .chars().all(..)

Pending deprecation -- while slicing syntax is being worked out, these methods
are all #[unstable]

* Str - while currently used for generic programming, this trait will be
        replaced with one of [], deref coercions, or a generic conversion trait.
* StrExt::slice - use slicing syntax instead
* StrExt::slice_to - use slicing syntax instead
* StrExt::slice_from - use slicing syntax instead
* StrExt::lev_distance - deprecated with no replacement

Awaiting stabilization due to patterns and/or matching

* StrExt::contains
* StrExt::contains_char
* StrExt::split
* StrExt::splitn
* StrExt::split_terminator
* StrExt::rsplitn
* StrExt::match_indices
* StrExt::split_str
* StrExt::starts_with
* StrExt::ends_with
* StrExt::trim_chars
* StrExt::trim_left_chars
* StrExt::trim_right_chars
* StrExt::find
* StrExt::rfind
* StrExt::find_str
* StrExt::subslice_offset
2014-12-21 19:09:55 -08:00
Jorge Aparicio
50ef207253 libunicode: fix fallout 2014-12-13 17:03:45 -05:00
Steven Fackler
348cc9418a Remove special casing for some meta attributes
Descriptions and licenses are handled by Cargo now, so there's no reason
to keep these attributes around.
2014-11-26 11:44:45 -08:00
Steven Fackler
3dcd215740 Switch to purely namespaced enums
This breaks code that referred to variant names in the same namespace as
their enum. Reexport the variants in the old location or alter code to
refer to the new locations:

```
pub enum Foo {
    A,
    B
}

fn main() {
    let a = A;
}
```
=>
```
pub use self::Foo::{A, B};

pub enum Foo {
    A,
    B
}

fn main() {
    let a = A;
}
```
or
```
pub enum Foo {
    A,
    B
}

fn main() {
    let a = Foo::A;
}
```

[breaking-change]
2014-11-17 07:35:51 -08:00
Aaron Turon
cfafc1b737 Prelude: rename and consolidate extension traits
This commit renames a number of extension traits for slices and string
slices, now that they have been refactored for DST. In many cases,
multiple extension traits could now be consolidated. Further
consolidation will be possible with generalized where clauses.

The renamings are consistent with the [new `-Prelude`
suffix](https://github.com/rust-lang/rfcs/pull/344). There are probably
a few more candidates for being renamed this way, but that is left for
API stabilization of the relevant modules.

Because this renames traits, it is a:

[breaking-change]

However, I do not expect any code that currently uses the standard
library to actually break.

Closes #17917
2014-11-06 08:03:18 -08:00
Simon Sapin
61a8a28f9f Include the Unicode version used to generate src/libunicode/tables.rs. 2014-10-13 14:07:12 +01:00
Brian Anderson
5c92a8e054 Use the same html_root_url for all docs 2014-10-09 10:50:13 -07:00
Florian Zeitz
7ece0abe64 collections, unicode: Add support for NFC and NFKC 2014-07-28 18:47:38 +02:00
Alex Crichton
707cf47ac8 Register new snapshots 2014-07-19 20:38:00 -07:00
kwantam
cf432b8f8f add Graphemes iterator; tidy unicode exports
- Graphemes and GraphemeIndices structs implement iterators over
  grapheme clusters analogous to the Chars and CharOffsets for chars in
  a string. Iterator and DoubleEndedIterator are available for both.

- tidied up the exports for libunicode. crate root exports are now moved
  into more appropriate module locations:
  - UnicodeStrSlice, Words, Graphemes, GraphemeIndices are in str module
  - UnicodeChar exported from char instead of crate root
  - canonical_combining_class is exported from str rather than crate root

Since libunicode's exports have changed, programs that previously relied
on the old export locations will need to change their `use` statements
to reflect the new ones. See above for more information on where the new
exports live.

closes #7043
[breaking-change]
2014-07-14 19:53:46 -04:00
Brian Anderson
207b83ae2f unicode: Remove crate_id attr 2014-07-11 11:27:00 -07:00
kwantam
5d4238b6fc Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-07 14:52:24 -04:00