61 Commits

Author SHA1 Message Date
Andrea Canciani
cf3fcf7758 Reuse standard methods
Do not hand-code `Result::ok` or `cmp` in tables.rs.
2016-01-04 17:51:12 +01:00
Andrea Canciani
b081436ca4 Improve formatting of tables.rs
Make unicode.py generate a tables.rs which is more conformant to usual
Rust formatting (as per `rustfmt`).
2016-01-04 17:51:05 +01:00
Andrea Canciani
eab351ef3e Cleanup unicode.py
The methods related to char width are dead code since
464cdff102993ff1900eebbf65209e0a3c0be0d5; remove them.
2016-01-04 17:31:41 +01:00
Alex Crichton
464cdff102 std: Stabilize APIs for the 1.6 release
This commit is the standard API stabilization commit for the 1.6 release cycle.
The list of issues and APIs below have all been through their cycle-long FCP and
the libs team decisions are listed below

Stabilized APIs

* `Read::read_exact`
* `ErrorKind::UnexpectedEof` (renamed from `UnexpectedEOF`)
* libcore -- this was a bit of a nuanced stabilization, the crate itself is now
  marked as `#[stable]` and the methods appearing via traits for primitives like
  `char` and `str` are now also marked as stable. Note that the extension traits
  themeselves are marked as unstable as they're imported via the prelude. The
  `try!` macro was also moved from the standard library into libcore to have the
  same interface. Otherwise the functions all have copied stability from the
  standard library now.
* The `#![no_std]` attribute
* `fs::DirBuilder`
* `fs::DirBuilder::new`
* `fs::DirBuilder::recursive`
* `fs::DirBuilder::create`
* `os::unix::fs::DirBuilderExt`
* `os::unix::fs::DirBuilderExt::mode`
* `vec::Drain`
* `vec::Vec::drain`
* `string::Drain`
* `string::String::drain`
* `vec_deque::Drain`
* `vec_deque::VecDeque::drain`
* `collections::hash_map::Drain`
* `collections::hash_map::HashMap::drain`
* `collections::hash_set::Drain`
* `collections::hash_set::HashSet::drain`
* `collections::binary_heap::Drain`
* `collections::binary_heap::BinaryHeap::drain`
* `Vec::extend_from_slice` (renamed from `push_all`)
* `Mutex::get_mut`
* `Mutex::into_inner`
* `RwLock::get_mut`
* `RwLock::into_inner`
* `Iterator::min_by_key` (renamed from `min_by`)
* `Iterator::max_by_key` (renamed from `max_by`)

Deprecated APIs

* `ErrorKind::UnexpectedEOF` (renamed to `UnexpectedEof`)
* `OsString::from_bytes`
* `OsStr::to_cstring`
* `OsStr::to_bytes`
* `fs::walk_dir` and `fs::WalkDir`
* `path::Components::peek`
* `slice::bytes::MutableByteVector`
* `slice::bytes::copy_memory`
* `Vec::push_all` (renamed to `extend_from_slice`)
* `Duration::span`
* `IpAddr`
* `SocketAddr::ip`
* `Read::tee`
* `io::Tee`
* `Write::broadcast`
* `io::Broadcast`
* `Iterator::min_by` (renamed to `min_by_key`)
* `Iterator::max_by` (renamed to `max_by_key`)
* `net::lookup_addr`

New APIs (still unstable)

* `<[T]>::sort_by_key` (added to mirror `min_by_key`)

Closes #27585
Closes #27704
Closes #27707
Closes #27710
Closes #27711
Closes #27727
Closes #27740
Closes #27744
Closes #27799
Closes #27801
cc #27801 (doesn't close as `Chars` is still unstable)
Closes #28968
2015-12-05 15:09:44 -08:00
Corentin Henry
1bb7205082 rustfmt librustc_unicode 2015-10-26 17:57:53 +01:00
Alex Crichton
8d90d3f368 Remove all unstable deprecated functionality
This commit removes all unstable and deprecated functions in the standard
library. A release was recently cut (1.3) which makes this a good time for some
spring cleaning of the deprecated functions.
2015-08-12 14:55:17 -07:00
Simon Sapin
32b7b50baf Remove char::to_titlecase. Fix #26555
I added it because it was easy (same a `char::to_lowercase`,
just a different table), but it doesn’t make sense to have this
in std but not str::to_titlecase, which would require
https://github.com/unicode-rs/unicode-segmentation

At some point in the future this feature will be available
(both on char and str) in a crates.io crate.
2015-06-24 22:16:25 -07:00
Simon Sapin
f901086b0d Correctly map upper-case Sigma to lower-case in word-final position. Fix #26035. 2015-06-06 12:37:11 +02:00
Simon Sapin
d316487ec1 Add char::to_titlecase
But not str::to_titlecase which would require UAX#29 Unicode Text Segmentation
which we decided not to include in of `std`:
https://github.com/rust-lang/rfcs/pull/1054
2015-06-06 12:37:11 +02:00
Simon Sapin
addaa5b1ff Add complex (but unconditional) Unicode case mapping. Fix #25800
As a result, the iterator returned by `char::to_uppercase` sometimes
yields two or three `char`s instead of just one.
2015-06-06 12:37:10 +02:00
Simon Sapin
66af12721a to_lowercase/to_uppercase: also map chars not in Lu/Ll categories.
This adds 120 mappings:

Dž dž
Dž DŽ
Lj lj
Lj LJ
Nj nj
Nj NJ
Dz dz
Dz DZ
 Ι
ᾈ ᾀ
ᾉ ᾁ
ᾊ ᾂ
ᾋ ᾃ
ᾌ ᾄ
ᾍ ᾅ
ᾎ ᾆ
ᾏ ᾇ
ᾘ ᾐ
ᾙ ᾑ
ᾚ ᾒ
ᾛ ᾓ
ᾜ ᾔ
ᾝ ᾕ
ᾞ ᾖ
ᾟ ᾗ
ᾨ ᾠ
ᾩ ᾡ
ᾪ ᾢ
ᾫ ᾣ
ᾬ ᾤ
ᾭ ᾥ
ᾮ ᾦ
ᾯ ᾧ
ᾼ ᾳ
ῌ ῃ
ῼ ῳ
Ⅰ ⅰ
Ⅱ ⅱ
Ⅲ ⅲ
Ⅳ ⅳ
Ⅴ ⅴ
Ⅵ ⅵ
Ⅶ ⅶ
Ⅷ ⅷ
Ⅸ ⅸ
Ⅹ ⅹ
Ⅺ ⅺ
Ⅻ ⅻ
Ⅼ ⅼ
Ⅽ ⅽ
Ⅾ ⅾ
Ⅿ ⅿ
ⅰ Ⅰ
ⅱ Ⅱ
ⅲ Ⅲ
ⅳ Ⅳ
ⅴ Ⅴ
ⅵ Ⅵ
ⅶ Ⅶ
ⅷ Ⅷ
ⅸ Ⅸ
ⅹ Ⅹ
ⅺ Ⅺ
ⅻ Ⅻ
ⅼ Ⅼ
ⅽ Ⅽ
ⅾ Ⅾ
ⅿ Ⅿ
Ⓐ ⓐ
Ⓑ ⓑ
Ⓒ ⓒ
Ⓓ ⓓ
Ⓔ ⓔ
Ⓕ ⓕ
Ⓖ ⓖ
Ⓗ ⓗ
Ⓘ ⓘ
Ⓙ ⓙ
Ⓚ ⓚ
Ⓛ ⓛ
Ⓜ ⓜ
Ⓝ ⓝ
Ⓞ ⓞ
Ⓟ ⓟ
Ⓠ ⓠ
Ⓡ ⓡ
Ⓢ ⓢ
Ⓣ ⓣ
Ⓤ ⓤ
Ⓥ ⓥ
Ⓦ ⓦ
Ⓧ ⓧ
Ⓨ ⓨ
Ⓩ ⓩ
ⓐ Ⓐ
ⓑ Ⓑ
ⓒ Ⓒ
ⓓ Ⓓ
ⓔ Ⓔ
ⓕ Ⓕ
ⓖ Ⓖ
ⓗ Ⓗ
ⓘ Ⓘ
ⓙ Ⓙ
ⓚ Ⓚ
ⓛ Ⓛ
ⓜ Ⓜ
ⓝ Ⓝ
ⓞ Ⓞ
ⓟ Ⓟ
ⓠ Ⓠ
ⓡ Ⓡ
ⓢ Ⓢ
ⓣ Ⓣ
ⓤ Ⓤ
ⓥ Ⓥ
ⓦ Ⓦ
ⓧ Ⓧ
ⓨ Ⓨ
ⓩ Ⓩ
2015-06-06 12:37:10 +02:00
kwantam
f14d289d71 optimize Unicode tables
Apply optimization described in
https://github.com/rust-lang/regex/pull/73#issuecomment-93777126
to rust's copy of `unicode.py`.

This shrinks librustc_unicode's tables.rs from 479kB to 456kB,
and should improve performance slightly for related operations
(e.g., is_alphabetic(), is_xid_start(), etc).

In addition, pull in fix from @dscorbett's commit
d25c39f86568a147f9b7080c25711fb1f98f056a in regex, which
makes `load_properties()` more tolerant of whitespace
in the Unicode tables. (This fix does not result in any
changes to tables.rs, but could if the Unicode tables
change in the future.)
2015-04-18 13:20:57 -04:00
kwantam
29d1252e4d deprecate Unicode functions that will be moved to crates.io
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
2015-04-16 17:03:05 -04:00
Chris Wong
5308ac939a Remove regex module from libunicode
The regex crate keeps its own tables now (rust-lang/regex#41) so we
don't need them here.

[breaking-change]
2015-04-13 10:30:10 +12:00
kwantam
bef00ab2b8 use normative source for Grapheme class data
@mahkoh points out in #15628 that unicode.py does not use
normative data for Grapheme classes. This pr fixes that issue.

In addition, GC_RegionalIndicator is renamed GC_Regional_Indicator
in order to stay in line with the Unicode class name definitions.
I have updated refs in u_str.rs, and verified that there are no
refs elsewhere in the codebase. However, in principle someone
using the unicode tables for their own purposes might see breakage
from this.
2015-04-06 19:46:48 -04:00
Florian Zeitz
c9e2de42b5 unicode: Properly parse ranges in UnicodeData.txt
This handles the ranges contained in UnicodeData.txt.
Counterintuitively this actually makes the tables shorter.
2015-03-03 20:04:55 +01:00
Florian Zeitz
f35f973cb7 Use consts instead of statics where appropriate
This changes the type of some public constants/statics in libunicode.
Notably some `&'static &'static [(char, char)]` have changed
to `&'static [(char, char)]`. The regexp crate seems to be the
sole user of these, yet this is technically a [breaking-change]
2015-03-02 17:11:51 +01:00
Vadim Petrochenkov
09f53fd45c Audit integer types in libunicode, libcore/(char, str) and libstd/ascii 2015-02-15 00:09:40 +03:00
Jorge Aparicio
bff462302b cleanup: s/impl Copy/#[derive(Copy)]/g 2015-01-25 11:20:38 -05:00
Earl St Sauver
6ab95bdd62 s/deriving/derives in Comments/Docs
There are a large number of places that incorrectly refer
to deriving in comments, instead of derives.

Fixes #20984
2015-01-17 11:08:02 -08:00
Alex Crichton
7741516a8b std: Collapse SlicePrelude traits
This commit collapses the various prelude traits for slices into just one trait:

* SlicePrelude/SliceAllocPrelude => SliceExt
* CloneSlicePrelude/CloneSliceAllocPrelude => CloneSliceExt
* OrdSlicePrelude/OrdSliceAllocPrelude => OrdSliceExt
* PartialEqSlicePrelude => PartialEqSliceExt
2014-12-14 19:03:56 -08:00
Jorge Aparicio
029789b98c Get rid of all the remaining uses of refN/valN/mutN/TupleN 2014-12-13 20:04:41 -05:00
Alex Crichton
52edb2ecc9 Register new snapshots 2014-12-11 11:30:38 -08:00
Corey Farwell
4ef16741e3 Utilize fewer reexports
In regards to:

https://github.com/rust-lang/rust/issues/19253#issuecomment-64836729

This commit:

* Changes the #deriving code so that it generates code that utilizes fewer
  reexports (in particur Option::* and Result::*), which is necessary to
  remove those reexports in the future
* Changes other areas of the codebase so that fewer reexports are utilized
2014-12-05 18:13:04 -05:00
Steven Fackler
3dcd215740 Switch to purely namespaced enums
This breaks code that referred to variant names in the same namespace as
their enum. Reexport the variants in the old location or alter code to
refer to the new locations:

```
pub enum Foo {
    A,
    B
}

fn main() {
    let a = A;
}
```
=>
```
pub use self::Foo::{A, B};

pub enum Foo {
    A,
    B
}

fn main() {
    let a = A;
}
```
or
```
pub enum Foo {
    A,
    B
}

fn main() {
    let a = Foo::A;
}
```

[breaking-change]
2014-11-17 07:35:51 -08:00
Alex Crichton
fa530fff51 rollup merge of #18656 : thiagopnts/rename-deprecated-non_uppercase_statics 2014-11-06 13:31:54 -08:00
Aaron Turon
cfafc1b737 Prelude: rename and consolidate extension traits
This commit renames a number of extension traits for slices and string
slices, now that they have been refactored for DST. In many cases,
multiple extension traits could now be consolidated. Further
consolidation will be possible with generalized where clauses.

The renamings are consistent with the [new `-Prelude`
suffix](https://github.com/rust-lang/rfcs/pull/344). There are probably
a few more candidates for being renamed this way, but that is left for
API stabilization of the relevant modules.

Because this renames traits, it is a:

[breaking-change]

However, I do not expect any code that currently uses the standard
library to actually break.

Closes #17917
2014-11-06 08:03:18 -08:00
thiagopnts
23913ec713 rename deprecated non_uppercase_statics to non_upper_case_globals 2014-11-05 12:04:26 -02:00
Patrick Walton
e8d6031c71 libsyntax: Forbid escapes in the inclusive range \x80-\xff in
Unicode characters and strings.

Use `\u0080`-`\u00ff` instead. ASCII/byte literals are unaffected.

This PR introduces a new function, `escape_default`, into the ASCII
module. This was necessary for the pretty printer to continue to
function.

RFC #326.

Closes #18062.

[breaking-change]
2014-11-04 14:58:11 -08:00
Joseph Crail
835b92efb8 Replace deprecated missing_doc attribute. 2014-11-01 21:12:13 -04:00
Simon Sapin
61a8a28f9f Include the Unicode version used to generate src/libunicode/tables.rs. 2014-10-13 14:07:12 +01:00
Alex Crichton
34d66de52a unicode: Make statics legal
The tables in libunicode are far too large to want to be inlined into any other
program, so these tables are all going to remain `static`. For them to be legal,
they cannot reference one another by value, but instead use references now.

This commit also modifies the src/etc/unicode.py script to generate the right
tables.
2014-10-09 09:44:51 -07:00
P1start
de7abd8824 Unify non-snake-case lints and non-uppercase statics lints
This unifies the `non_snake_case_functions` and `uppercase_variables` lints
into one lint, `non_snake_case`. It also now checks for non-snake-case modules.
This also extends the non-camel-case types lint to check type parameters, and
merges the `non_uppercase_pattern_statics` lint into the
`non_uppercase_statics` lint.

Because the `uppercase_variables` lint is now part of the `non_snake_case`
lint, all non-snake-case variables that start with lowercase characters (such
as `fooBar`) will now trigger the `non_snake_case` lint.

New code should be updated to use the new `non_snake_case` lint instead of the
previous `non_snake_case_functions` and `uppercase_variables` lints. All use of
the `non_uppercase_pattern_statics` should be replaced with the
`non_uppercase_statics` lint. Any code that previously contained non-snake-case
module or variable names should be updated to use snake case names or disable
the `non_snake_case` lint. Any code with non-camel-case type parameters should
be changed to use camel case or disable the `non_camel_case_types` lint.

[breaking-change]
2014-08-30 09:10:05 +12:00
Brian Anderson
a4b354ca02 core: Add binary_search and binary_search_elem methods to slices.
These are like the existing bsearch methods but if the search fails,
it returns the next insertion point.

The new `binary_search` returns a `BinarySearchResult` that is either
`Found` or `NotFound`. For convenience, the `found` and `not_found`
methods convert to `Option`, ala `Result`.

Deprecate bsearch and bsearch_elem.
2014-08-13 11:30:15 -07:00
Florian Zeitz
7ece0abe64 collections, unicode: Add support for NFC and NFKC 2014-07-28 18:47:38 +02:00
kwantam
cf432b8f8f add Graphemes iterator; tidy unicode exports
- Graphemes and GraphemeIndices structs implement iterators over
  grapheme clusters analogous to the Chars and CharOffsets for chars in
  a string. Iterator and DoubleEndedIterator are available for both.

- tidied up the exports for libunicode. crate root exports are now moved
  into more appropriate module locations:
  - UnicodeStrSlice, Words, Graphemes, GraphemeIndices are in str module
  - UnicodeChar exported from char instead of crate root
  - canonical_combining_class is exported from str rather than crate root

Since libunicode's exports have changed, programs that previously relied
on the old export locations will need to change their `use` statements
to reflect the new ones. See above for more information on where the new
exports live.

closes #7043
[breaking-change]
2014-07-14 19:53:46 -04:00
kwantam
5d4238b6fc Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-07 14:52:24 -04:00
Florian Zeitz
df802a2754 std: Rename str::Normalizations to str::Decompositions
The Normalizations iterator has been renamed to Decompositions.
It does not currently include all forms of Unicode normalization,
but only encompasses decompositions.
If implemented recomposition would likely be a separate iterator
which works on the result of this one.

[breaking-change]
2014-05-13 17:24:07 -07:00
Florian Zeitz
8c54d5bf40 core: Move Hangul decomposition into unicode.rs 2014-05-13 17:24:07 -07:00
Florian Zeitz
74ad023674 std, core: Generate unicode.rs using unicode.py 2014-05-13 17:24:07 -07:00
Manish Goregaokar
713e87526e Use new attribute syntax in python files in src/etc too (#13478) 2014-04-14 21:00:31 +05:30
Daniel Micay
ce620320a2 rename std::vec -> std::slice
Closes #12702
2014-03-20 01:30:27 -04:00
Piotr Zolnierek
dba5625cb8 Remove code duplication
Remove whitespace

Update documentation for to_uppercase, to_lowercase
2014-03-13 12:23:24 +01:00
Piotr Zolnierek
04170b0a41 Implement lower, upper case conversion for char 2014-03-13 09:32:05 +01:00
Piotr Zolnierek
4a00211916 std::unicode: remove unused category tables 2014-03-13 09:32:05 +01:00
Adrien Tétar
0ebe112b3b etc: add missing license boilerplates 2014-02-05 19:53:53 +01:00
Florian Zeitz
dfe38dbca4 Fix handling of upper/lowercase, and whitespace 2013-11-27 23:36:20 +01:00
Florian Zeitz
e9ab9bf01a Update unicode.py to reflect language changes 2013-11-27 23:21:22 +01:00
Daniel Micay
6919cf5fe1 rename std::iterator to std::iter
The trait will keep the `Iterator` naming, but a more concise module
name makes using the free functions less verbose. The module will define
iterables in addition to iterators, as it deals with iteration in
general.
2013-09-09 03:21:46 -04:00
Daniel Micay
62a3434529 stop treating char as an integer type
Closes #7609
2013-09-04 08:07:56 -04:00