Commit Graph

45 Commits

Author SHA1 Message Date
Simon Sapin
46226a7a6e Yield Err in char::decode_utf8 per Unicode, like String::from_utf8_lossy 2016-08-23 22:09:59 +02:00
Simon Sapin
892bf3d41d Use a macro in test_decode_utf8 to preserve line numbers in panic messages. 2016-08-23 22:07:48 +02:00
Tobias Bucher
3d09b4a0d5 Rename char::escape to char::escape_debug and add tracking issue 2016-07-28 02:20:49 +02:00
Tobias Bucher
68efea08fa Restore char::escape_default and add char::escape instead 2016-07-26 15:15:00 +02:00
Tobias Bucher
e7d16580f5 Escape fewer Unicode codepoints in Debug impl of str
Use the same procedure as Python to determine whether a character is
printable, described in [PEP 3138]. In particular, this means that the
following character classes are escaped:

- Cc (Other, Control)
- Cf (Other, Format)
- Cs (Other, Surrogate), even though they can't appear in Rust strings
- Co (Other, Private Use)
- Cn (Other, Not Assigned)
- Zl (Separator, Line)
- Zp (Separator, Paragraph)
- Zs (Separator, Space), except for the ASCII space `' '` (`0x20`)

This allows for user-friendly inspection of strings that are not
English (e.g. compare `"\u{e9}\u{e8}\u{ea}"` to `"éèê"`).

Fixes #34318.

[PEP 3138]: https://www.python.org/dev/peps/pep-3138/
2016-07-23 00:18:44 +02:00
M Farkas-Dyck
837029fec1 add core::char::DecodeUtf8 2016-07-13 17:40:16 -08:00
Andrea Canciani
6b5e86b0ce Extend the test for EscapeUnicode
to also check that it is legitimately an `ExactSizeIterator`.
2016-05-26 10:54:58 +02:00
Andrea Canciani
8169fa2fe8 Add test for EscapeUnicode specializations 2016-05-04 12:23:10 +02:00
Alex Crichton
552eda70d3 std: Stabilize APIs for the 1.9 release
This commit applies all stabilizations, renamings, and deprecations that the
library team has decided on for the upcoming 1.9 release. All tracking issues
have gone through a cycle-long "final comment period" and the specific APIs
stabilized/deprecated are:

Stable

* `std::panic`
* `std::panic::catch_unwind` (renamed from `recover`)
* `std::panic::resume_unwind` (renamed from `propagate`)
* `std::panic::AssertUnwindSafe` (renamed from `AssertRecoverSafe`)
* `std::panic::UnwindSafe` (renamed from `RecoverSafe`)
* `str::is_char_boundary`
* `<*const T>::as_ref`
* `<*mut T>::as_ref`
* `<*mut T>::as_mut`
* `AsciiExt::make_ascii_uppercase`
* `AsciiExt::make_ascii_lowercase`
* `char::decode_utf16`
* `char::DecodeUtf16`
* `char::DecodeUtf16Error`
* `char::DecodeUtf16Error::unpaired_surrogate`
* `BTreeSet::take`
* `BTreeSet::replace`
* `BTreeSet::get`
* `HashSet::take`
* `HashSet::replace`
* `HashSet::get`
* `OsString::with_capacity`
* `OsString::clear`
* `OsString::capacity`
* `OsString::reserve`
* `OsString::reserve_exact`
* `OsStr::is_empty`
* `OsStr::len`
* `std::os::unix::thread`
* `RawPthread`
* `JoinHandleExt`
* `JoinHandleExt::as_pthread_t`
* `JoinHandleExt::into_pthread_t`
* `HashSet::hasher`
* `HashMap::hasher`
* `CommandExt::exec`
* `File::try_clone`
* `SocketAddr::set_ip`
* `SocketAddr::set_port`
* `SocketAddrV4::set_ip`
* `SocketAddrV4::set_port`
* `SocketAddrV6::set_ip`
* `SocketAddrV6::set_port`
* `SocketAddrV6::set_flowinfo`
* `SocketAddrV6::set_scope_id`
* `<[T]>::copy_from_slice`
* `ptr::read_volatile`
* `ptr::write_volatile`
* The `#[deprecated]` attribute
* `OpenOptions::create_new`

Deprecated

* `std::raw::Slice` - use raw parts of `slice` module instead
* `std::raw::Repr` - use raw parts of `slice` module instead
* `str::char_range_at` - use slicing plus `chars()` plus `len_utf8`
* `str::char_range_at_reverse` - use slicing plus `chars().rev()` plus `len_utf8`
* `str::char_at` - use slicing plus `chars()`
* `str::char_at_reverse` - use slicing plus `chars().rev()`
* `str::slice_shift_char` - use `chars()` plus `Chars::as_str`
* `CommandExt::session_leader` - use `before_exec` instead.

Closes #27719
cc #27751 (deprecating the `Slice` bits)
Closes #27754
Closes #27780
Closes #27809
Closes #27811
Closes #27830
Closes #28050
Closes #29453
Closes #29791
Closes #29935
Closes #30014
Closes #30752
Closes #31262
cc #31398 (still need to deal with `before_exec`)
Closes #31405
Closes #31572
Closes #31755
Closes #31756
2016-04-11 08:57:53 -07:00
Alex Crichton
48d5fe9ec5 std: Change encode_utf{8,16} to return iterators
Currently these have non-traditional APIs which take a buffer and report how
much was filled in, but they're not necessarily ergonomic to use. Returning an
iterator which *also* exposes an underlying slice shouldn't result in any
performance loss as it's just a lazy version of the same implementation, and
it's also much more ergonomic!

cc #27784
2016-03-22 10:25:30 -07:00
Ticki
d026977f25 Make style more uniform, add tests for specialization of .last(), move tests to libcoretest
Remove unused import

Fold nth() method into the match expr
2016-01-16 09:12:09 +01:00
bors
fd302a95e1 Auto merge of #27808 - SimonSapin:utf16decoder, r=alexcrichton
* Rename `Utf16Items` to `Utf16Decoder`. "Items" is meaningless.
* Generalize it to any `u16` iterator, not just `[u16].iter()`
* Make it yield `Result` instead of a custom `Utf16Item` enum that was isomorphic to `Result`. This enable using the `FromIterator for Result` impl.
* Replace `Utf16Item::to_char_lossy` with a `Utf16Decoder::lossy` iterator adaptor.

This is a [breaking change], but only for users of the unstable `rustc_unicode` crate.

I’d like this functionality to be stabilized and re-exported in `std` eventually, as the "low-level equivalent" of `String::from_utf16` and `String::from_utf16_lossy` like #27784 is the low-level equivalent of #27714.

CC @aturon, @alexcrichton
2015-08-27 00:41:13 +00:00
Simon Sapin
6174b8d726 Refactor low-level UTF-16 decoding.
* Rename `utf16_items` to `decode_utf16`. "Items" is meaningless.
* Move it to `rustc_unicode::char`, exposed in `std::char`.
* Generalize it to any `u16` iterable, not just `&[u16]`.
* Make it yield `Result` instead of a custom `Utf16Item` enum that was isomorphic to `Result`. This enable using the `FromIterator for Result` impl.
* Add a `REPLACEMENT_CHARACTER` constant.
* Document how `result.unwrap_or(REPLACEMENT_CHARACTER)` replaces `Utf16Item::to_char_lossy`.
2015-08-23 00:28:56 +02:00
Simon Sapin
961012e983 Add a test for char::to_lowercase mapping to more than one char.
I was wrong about Unicode not having such language-independent mapping.
2015-08-20 14:38:46 +02:00
Alex Crichton
8d90d3f368 Remove all unstable deprecated functionality
This commit removes all unstable and deprecated functions in the standard
library. A release was recently cut (1.3) which makes this a good time for some
spring cleaning of the deprecated functions.
2015-08-12 14:55:17 -07:00
Simon Sapin
32b7b50baf Remove char::to_titlecase. Fix #26555
I added it because it was easy (same a `char::to_lowercase`,
just a different table), but it doesn’t make sense to have this
in std but not str::to_titlecase, which would require
https://github.com/unicode-rs/unicode-segmentation

At some point in the future this feature will be available
(both on char and str) in a crates.io crate.
2015-06-24 22:16:25 -07:00
Simon Sapin
6369dcbad8 Move collectionstest::char into coretest::char 2015-06-09 13:08:29 +02:00
Simon Sapin
c6a8d5e733 Fix coretest::char::test_to_uppercase for complex mapping 2015-06-09 13:08:22 +02:00
Piotr Czarnecki
13bc8afa4b Model lexer: Fix remaining issues 2015-04-21 12:02:12 +02:00
kwantam
29d1252e4d deprecate Unicode functions that will be moved to crates.io
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
2015-04-16 17:03:05 -04:00
Alex Crichton
0f6a0b58f9 std: Stabilize more of the char module
This commit performs another pass over the `std::char` module for stabilization.
Some minor cleanup is performed such as migrating documentation from libcore to
libunicode (where the `std`-facing trait resides) as well as a slight
reorganiation in libunicode itself. Otherwise, the stability modifications made
are:

* `char::from_digit` is now stable
* `CharExt::is_digit` is now stable
* `CharExt::to_digit` is now stable
* `CharExt::to_{lower,upper}case` are now stable after being modified to return
  an iterator over characters. While the implementation today has not changed
  this should allow us to implement the full set of case conversions in unicode
  where some characters can map to multiple when doing an upper or lower case
  mapping.
* `StrExt::to_{lower,upper}case` was added as unstable for a convenience of not
  having to worry about characters expanding to more characters when you just
  want the whole string to get into upper or lower case.

This is a breaking change due to the change in the signatures of the
`CharExt::to_{upper,lower}case` methods. Code can be updated to use functions
like `flat_map` or `collect` to handle the difference.

[breaking-change]
2015-03-10 15:08:31 -07:00
Eduard Burtescu
e64670888a Remove integer suffixes where the types in compiled code are identical. 2015-03-05 12:38:33 +05:30
Alfie John
bffbcb5729 Deprecating i/u suffixes in libcoretest 2015-02-10 22:56:31 +00:00
Jorge Aparicio
17bc7d8d5b cleanup: replace as[_mut]_slice() calls with deref coercions 2015-02-05 13:45:01 -05:00
Jorge Aparicio
c1d48a8508 cleanup: &foo[0..a] -> &foo[..a] 2015-01-12 17:59:37 -05:00
Jorge Aparicio
517f1cc63c use slicing sugar 2015-01-07 17:35:56 -05:00
Nick Cameron
f7ff37e4c5 Replace full slice notation with index calls 2015-01-07 10:46:33 +13:00
Alex Crichton
7d8d06f86b Remove deprecated functionality
This removes a large array of deprecated functionality, regardless of how
recently it was deprecated. The purpose of this commit is to clean out the
standard libraries and compiler for the upcoming alpha release.

Some notable compiler changes were to enable warnings for all now-deprecated
command line arguments (previously the deprecated versions were silently
accepted) as well as removing deriving(Zero) entirely (the trait was removed).

The distribution no longer contains the libtime or libregex_macros crates. Both
of these have been deprecated for some time and are available externally.
2015-01-03 23:43:57 -08:00
Nick Cameron
7e2b9ea235 Fallout - change array syntax to use ; 2015-01-02 10:28:19 +13:00
Alex Crichton
df5404cfa8 std: Change escape_unicode to use new escapes
This changes the `escape_unicode` method on a `char` to use the new style of
unicode escapes in the language.

Closes #19811
Closes #19879
2014-12-16 08:09:37 -08:00
Jorge Aparicio
5257a5b284 libcoretest: remove unnecessary as_slice() calls 2014-12-06 19:05:58 -05:00
Brian Anderson
f6607a20c4 core: Add Char::len_utf16
Missing method to pair with len_utf8.
2014-11-21 13:17:09 -08:00
Brian Anderson
c2aff692fa unicode: Rename UnicodeChar::is_digit to is_numeric
'Numeric' is the proper name of the unicode character class,
and this frees up the word 'digit' for ascii use in libcore.

Since I'm going to rename `Char::is_digit_radix` to
`is_digit`, I am not leaving a deprecated method in place,
because that would just cause name clashes, as both
`Char` and `UnicodeChar` are in the prelude.

[breaking-change]
2014-11-21 13:17:04 -08:00
Nick Cameron
ca08540a00 Fix fallout from coercion removal 2014-11-17 22:41:33 +13:00
Patrick Walton
e8d6031c71 libsyntax: Forbid escapes in the inclusive range \x80-\xff in
Unicode characters and strings.

Use `\u0080`-`\u00ff` instead. ASCII/byte literals are unaffected.

This PR introduces a new function, `escape_default`, into the ASCII
module. This was necessary for the pretty printer to continue to
function.

RFC #326.

Closes #18062.

[breaking-change]
2014-11-04 14:58:11 -08:00
NODA, Kai
f27ad3d3e9 Clean up rustc warnings.
compiletest: compact "linux" "macos" etc.as "unix".
liballoc: remove a superfluous "use".
libcollections: remove invocations of deprecated methods in favor of
    their suggested replacements and use "_" for a loop counter.
libcoretest: remove invocations of deprecated methods;  also add
    "allow(deprecated)" for testing a deprecated method itself.
libglob: use "cfg_attr".
libgraphviz: add a test for one of data constructors.
libgreen: remove a superfluous "use".
libnum: "allow(type_overflow)" for type cast into u8 in a test code.
librustc: names of static variables should be in upper case.
libserialize: v[i] instead of get().
libstd/ascii: to_lowercase() instead of to_lower().
libstd/bitflags: modify AnotherSetOfFlags to use i8 as its backend.
    It will serve better for testing various aspects of bitflags!.
libstd/collections: "allow(deprecated)" for testing a deprecated
    method itself.
libstd/io: remove invocations of deprecated methods and superfluous "use".
    Also add #[test] where it was missing.
libstd/num: introduce a helper function to effectively remove
    invocations of a deprecated method.
libstd/path and rand: remove invocations of deprecated methods and
    superfluous "use".
libstd/task and libsync/comm: "allow(deprecated)" for testing
    a deprecated method itself.
libsync/deque: remove superfluous "unsafe".
libsync/mutex and once: names of static variables should be in upper case.
libterm: introduce a helper function to effectively remove
    invocations of a deprecated method.

We still see a few warnings about using obsoleted native::task::spawn()
in the test modules for libsync.  I'm not sure how I should replace them
with std::task::TaksBuilder and native::task::NativeTaskBuilder
(dependency to libstd?)

Signed-off-by: NODA, Kai <nodakai@gmail.com>
2014-10-13 14:16:22 +08:00
Nick Cameron
59976942ea Use slice syntax instead of slice_to, etc. 2014-10-07 15:49:53 +13:00
Aaron Turon
d2ea0315e0 Revert "Use slice syntax instead of slice_to, etc."
This reverts commit 40b9f5ded5.
2014-10-02 11:48:07 -07:00
Nick Cameron
40b9f5ded5 Use slice syntax instead of slice_to, etc. 2014-10-02 13:19:45 +13:00
Marvin Löbel
13079c1a85 Optimized IR generation for UTF-8 and UTF-16 encoding
- Both can now be inlined and constant folded away
- Both can no longer cause failure
- Both now return an `Option` instead

Removed debug `assert!()`s over the valid ranges of a `char`
- It affected optimizations due to unwinding
- Char handling is now sound enought that they became uneccessary
2014-08-16 21:13:39 +02:00
Corey Richardson
188d889aaf ignore-lexer-test to broken files and remove some tray hyphens
I blame @ChrisMorgan for the hyphens.
2014-07-21 10:59:58 -07:00
bors
fa7cbb5a46 auto merge of #15283 : kwantam/rust/master, r=alexcrichton
Add libunicode; move unicode functions from core

- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
  - Unicode-aware functions now live in libunicode
    - `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
      `is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
      `to_uppercase`, `to_lowercase`
  - added `width` method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is a non-NULL control character
    - takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
  - functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
    - `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
  - also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
  - thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`

The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-09 18:36:30 +00:00
Richo Healey
12c334a77b std: Rename the ToStr trait to ToString, and to_str to to_string.
[breaking-change]
2014-07-08 13:01:43 -07:00
kwantam
5d4238b6fc Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-07 14:52:24 -04:00
Steven Fackler
1ed646eaf7 Extract tests from libcore to a separate crate
Libcore's test infrastructure is complicated by the fact that many lang
items are defined in the crate. The current approach (realcore/realstd
imports) is hacky and hard to work with (tests inside of core::cmp
haven't been run for months!).

Moving tests to a separate crate does mean that they can only test the
public API of libcore, but I don't feel that that is too much of an
issue. The only tests that I had to get rid of were some checking the
various numeric formatters, but those are also exercised through normal
format! calls in other tests.
2014-06-29 15:57:21 -07:00