rust/src/libcollections/str.rs

// Copyright 2012-2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
//
// ignore-lexer-test FIXME #15679
//! Unicode string manipulation (`str` type)
//!
//! # Basic Usage
//!
//! Rust's string type is one of the core primitive types of the language. While
//! it is written `str`, a bare `str` cannot be used on its own: every string
//! value is reached through some kind of pointer. `String` is used
//! for an owned string, so there is only one commonly-used `str` type in Rust:
//! `&str`.
//!
//! `&str` is the borrowed string type. This type of string can only be created
//! from other strings, unless it is a static string (see below). As the word
//! "borrowed" implies, this type of string is owned elsewhere, and this string
//! cannot be moved out of.
//!
//! As an example, here's some code that uses a string.
//!
//! ```rust
//! fn main() {
//! let borrowed_string = "This string is borrowed with the 'static lifetime";
//! }
//! ```
//!
//! From the example above, you can see that Rust's string literals have the
//! `'static` lifetime. This is akin to C's concept of a static string.
//!
//! String literals are allocated statically in the read-only data segment (rodata) of the
//! executable/library. The string then has the type `&'static str` meaning that
//! the string is valid for the `'static` lifetime, otherwise known as the
//! lifetime of the entire program. As can be inferred from the type, these static
//! strings are not mutable.
//!
//! # Representation
//!
//! Rust's string type, `str`, is a sequence of Unicode scalar values encoded as a
//! stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8
//! sequences. Additionally, strings are not null-terminated and can contain null
//! bytes.
//!
//! The actual representation of strings has a direct mapping to slices: `&str`
//! is the same as `&[u8]`.
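//!
//! For example (a small illustrative snippet; `char_len` is the character-counting
//! method exercised in this module's tests):
//!
//! ```rust
//! let s = "déjà vu";
//! // `len` counts UTF-8 bytes, while `char_len` counts Unicode scalar values.
//! assert_eq!(s.len(), 9);
//! assert_eq!(s.char_len(), 7);
//! ```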
#![doc(primitive = "str")]
use core::default::Default;
use core::fmt;
use core::cmp;
use core::iter::AdditiveIterator;
use core::mem;
use core::prelude::{Char, Clone, Collection, Eq, Equiv, ImmutableSlice};
use core::prelude::{Iterator, MutableSlice, None, Option, Ord, Ordering};
use core::prelude::{PartialEq, PartialOrd, Result, Slice, Some, Tuple2};
use core::prelude::{range};
use {Deque, MutableSeq};
use hash;
use ringbuf::RingBuf;
use slice::CloneableVector;
use string::String;
use unicode;
use vec::Vec;
pub use core::str::{from_utf8, CharEq, Chars, CharOffsets};
pub use core::str::{Bytes, CharSplits};
pub use core::str::{CharSplitsN, AnyLines, MatchIndices, StrSplits};
pub use core::str::{eq_slice, is_utf8, is_utf16, Utf16Items};
pub use core::str::{Utf16Item, ScalarValue, LoneSurrogate, utf16_items};
pub use core::str::{truncate_utf16_at_nul, utf8_char_width, CharRange};
pub use core::str::{Str, StrSlice};
pub use unicode::str::{UnicodeStrSlice, Words, Graphemes, GraphemeIndices};
/*
Section: Creating a string
*/
/// Deprecated. Replaced by `String::from_utf8`.
#[deprecated = "Replaced by `String::from_utf8`"]
pub fn from_utf8_owned(vv: Vec<u8>) -> Result<String, Vec<u8>> {
String::from_utf8(vv)
}
/// Deprecated. Replaced by `String::from_byte`.
#[deprecated = "Replaced by String::from_byte"]
pub fn from_byte(b: u8) -> String {
assert!(b < 128u8);
String::from_char(1, b as char)
}
/// Deprecated. Use `String::from_char` or `char::to_string()` instead.
#[deprecated = "use String::from_char or char.to_string()"]
pub fn from_char(ch: char) -> String {
String::from_char(1, ch)
}
/// Deprecated. Replaced by `String::from_chars`.
#[deprecated = "use String::from_chars instead"]
pub fn from_chars(chs: &[char]) -> String {
chs.iter().map(|c| *c).collect()
}
/// Methods for vectors of strings.
pub trait StrVector {
/// Concatenates a vector of strings.
///
/// # Example
///
/// ```rust
/// let first = "Restaurant at the End of the".to_string();
/// let second = " Universe".to_string();
/// let string_vec = vec![first, second];
/// assert_eq!(string_vec.concat(), "Restaurant at the End of the Universe".to_string());
/// ```
fn concat(&self) -> String;
/// Concatenates a vector of strings, placing a given separator between each.
///
/// # Example
///
/// ```rust
/// let first = "Roast".to_string();
/// let second = "Sirloin Steak".to_string();
/// let string_vec = vec![first, second];
/// assert_eq!(string_vec.connect(", "), "Roast, Sirloin Steak".to_string());
/// ```
fn connect(&self, sep: &str) -> String;
}
impl<'a, S: Str> StrVector for &'a [S] {
fn concat(&self) -> String {
if self.is_empty() {
return String::new();
}
// `len` calculation may overflow but push_str will check boundaries
let len = self.iter().map(|s| s.as_slice().len()).sum();
let mut result = String::with_capacity(len);
for s in self.iter() {
result.push_str(s.as_slice())
}
result
}
fn connect(&self, sep: &str) -> String {
if self.is_empty() {
return String::new();
}
// concat is faster
if sep.is_empty() {
return self.concat();
}
// this is wrong without the guarantee that `self` is non-empty
// `len` calculation may overflow but push_str will check boundaries
let len = sep.len() * (self.len() - 1)
+ self.iter().map(|s| s.as_slice().len()).sum();
let mut result = String::with_capacity(len);
let mut first = true;
for s in self.iter() {
if first {
first = false;
} else {
result.push_str(sep);
}
result.push_str(s.as_slice());
}
result
}
}
impl<'a, S: Str> StrVector for Vec<S> {
#[inline]
fn concat(&self) -> String {
self.as_slice().concat()
}
#[inline]
fn connect(&self, sep: &str) -> String {
self.as_slice().connect(sep)
}
}
/*
Section: Iterators
*/
// Helper functions used for Unicode normalization
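// `canonical_sort` below is a simple bubble sort of (character, combining class)
// pairs by canonical combining class. Pairs are only swapped when both classes
// are non-zero, so starters (class 0) stay in place and act as barriers, as
// canonical ordering requires.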
fn canonical_sort(comb: &mut [(char, u8)]) {
let len = comb.len();
for i in range(0, len) {
let mut swapped = false;
for j in range(1, len-i) {
let class_a = *comb[j-1].ref1();
let class_b = *comb[j].ref1();
if class_a != 0 && class_b != 0 && class_a > class_b {
comb.swap(j-1, j);
swapped = true;
}
}
if !swapped { break; }
}
}
#[deriving(Clone)]
enum DecompositionType {
Canonical,
Compatible
}
/// External iterator over the characters of a string's decomposition.
/// Use with the `std::iter` module.
#[deriving(Clone)]
pub struct Decompositions<'a> {
kind: DecompositionType,
iter: Chars<'a>,
buffer: Vec<(char, u8)>,
sorted: bool
}
impl<'a> Iterator<char> for Decompositions<'a> {
#[inline]
fn next(&mut self) -> Option<char> {
match self.buffer.as_slice().head() {
Some(&(c, 0)) => {
self.sorted = false;
self.buffer.remove(0);
return Some(c);
}
Some(&(c, _)) if self.sorted => {
self.buffer.remove(0);
return Some(c);
}
_ => self.sorted = false
}
let decomposer = match self.kind {
Canonical => unicode::char::decompose_canonical,
Compatible => unicode::char::decompose_compatible
};
if !self.sorted {
for ch in self.iter {
let buffer = &mut self.buffer;
let sorted = &mut self.sorted;
decomposer(ch, |d| {
let class = unicode::char::canonical_combining_class(d);
if class == 0 && !*sorted {
canonical_sort(buffer.as_mut_slice());
*sorted = true;
}
buffer.push((d, class));
});
if *sorted { break }
}
}
if !self.sorted {
canonical_sort(self.buffer.as_mut_slice());
self.sorted = true;
}
match self.buffer.remove(0) {
Some((c, 0)) => {
self.sorted = false;
Some(c)
}
Some((c, _)) => Some(c),
None => None
}
}
fn size_hint(&self) -> (uint, Option<uint>) {
let (lower, _) = self.iter.size_hint();
(lower, None)
}
}
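// States for the recomposition (NFC/NFKC) iterator below: `Composing` pulls
// characters from the decomposed stream and combines them where possible,
// `Purging` drains combining characters that could not be composed, and
// `Finished` flushes whatever remains once the underlying iterator is done.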
#[deriving(Clone)]
enum RecompositionState {
Composing,
Purging,
Finished
}
/// External iterator over the characters of a string's recomposition.
/// Use with the `std::iter` module.
#[deriving(Clone)]
pub struct Recompositions<'a> {
iter: Decompositions<'a>,
state: RecompositionState,
buffer: RingBuf<char>,
composee: Option<char>,
last_ccc: Option<u8>
}
impl<'a> Iterator<char> for Recompositions<'a> {
#[inline]
fn next(&mut self) -> Option<char> {
loop {
match self.state {
Composing => {
for ch in self.iter {
let ch_class = unicode::char::canonical_combining_class(ch);
if self.composee.is_none() {
if ch_class != 0 {
return Some(ch);
}
self.composee = Some(ch);
continue;
}
let k = self.composee.clone().unwrap();
match self.last_ccc {
None => {
match unicode::char::compose(k, ch) {
Some(r) => {
self.composee = Some(r);
continue;
}
None => {
if ch_class == 0 {
self.composee = Some(ch);
return Some(k);
}
self.buffer.push(ch);
self.last_ccc = Some(ch_class);
}
}
}
Some(l_class) => {
if l_class >= ch_class {
// `ch` is blocked from `composee`
if ch_class == 0 {
self.composee = Some(ch);
self.last_ccc = None;
self.state = Purging;
return Some(k);
}
self.buffer.push(ch);
self.last_ccc = Some(ch_class);
continue;
}
match unicode::char::compose(k, ch) {
Some(r) => {
self.composee = Some(r);
continue;
}
None => {
self.buffer.push(ch);
self.last_ccc = Some(ch_class);
}
}
}
}
}
self.state = Finished;
if self.composee.is_some() {
return self.composee.take();
}
}
Purging => {
match self.buffer.pop_front() {
None => self.state = Composing,
s => return s
}
}
Finished => {
match self.buffer.pop_front() {
None => return self.composee.take(),
s => return s
}
}
}
}
}
}
/// Replaces all occurrences of one string with another.
///
/// # Arguments
///
/// * s - The string containing substrings to replace
/// * from - The string to replace
/// * to - The replacement string
///
/// # Return value
///
/// The original string with all occurrences of `from` replaced with `to`.
///
/// # Example
///
/// ```rust
/// use std::str;
/// let string = "orange";
/// let new_string = str::replace(string, "or", "str");
/// assert_eq!(new_string.as_slice(), "strange");
/// ```
pub fn replace(s: &str, from: &str, to: &str) -> String {
let mut result = String::new();
let mut last_end = 0;
for (start, end) in s.match_indices(from) {
result.push_str(unsafe{raw::slice_bytes(s, last_end, start)});
result.push_str(to);
last_end = end;
}
result.push_str(unsafe{raw::slice_bytes(s, last_end, s.len())});
result
}
/*
Section: Misc
*/
/// Deprecated. Use `String::from_utf16`.
#[deprecated = "Replaced by String::from_utf16"]
pub fn from_utf16(v: &[u16]) -> Option<String> {
String::from_utf16(v)
}
/// Deprecated. Use `String::from_utf16_lossy`.
#[deprecated = "Replaced by String::from_utf16_lossy"]
pub fn from_utf16_lossy(v: &[u16]) -> String {
String::from_utf16_lossy(v)
}
// Return the initial codepoint accumulator for the first byte.
// The first byte is special, only want bottom 5 bits for width 2, 4 bits
// for width 3, and 3 bits for width 4
macro_rules! utf8_first_byte(
($byte:expr, $width:expr) => (($byte & (0x7F >> $width)) as u32)
)
// return the value of $ch updated with continuation byte $byte
macro_rules! utf8_acc_cont_byte(
($ch:expr, $byte:expr) => (($ch << 6) | ($byte & 63u8) as u32)
)
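// Worked example (illustrative only, not used by the code): decoding the
// two-byte UTF-8 sequence 0xC3 0xA9 (U+00E9, "é") with these two macros:
//
//     let mut ch = utf8_first_byte!(0xC3u8, 2);  // 0xC3 & 0x1F == 0x03
//     ch = utf8_acc_cont_byte!(ch, 0xA9u8);      // (0x03 << 6) | 0x29 == 0xE9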
/// Deprecated. Use `String::from_utf8_lossy`.
#[deprecated = "Replaced by String::from_utf8_lossy"]
pub fn from_utf8_lossy<'a>(v: &'a [u8]) -> MaybeOwned<'a> {
String::from_utf8_lossy(v)
}
/*
Section: MaybeOwned
*/
/// A string type that can hold either a `String` or a `&str`.
/// This can be useful as an optimization when an allocation is sometimes
/// needed but not always.
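///
/// # Example
///
/// A sketch of the usual pattern (the `trim_or_borrow` helper is illustrative,
/// not part of this module): allocate only when the input actually changes.
///
/// ```rust
/// use std::str::MaybeOwned;
///
/// fn trim_or_borrow<'a>(s: &'a str) -> MaybeOwned<'a> {
///     if s.starts_with(" ") {
///         // Only this branch allocates a new `String`.
///         s.trim_left().to_string().into_maybe_owned()
///     } else {
///         s.into_maybe_owned()
///     }
/// }
///
/// assert!(trim_or_borrow("hello").is_slice());
/// assert!(trim_or_borrow(" hello").is_owned());
/// ```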
pub enum MaybeOwned<'a> {
/// A borrowed string.
Slice(&'a str),
/// An owned string.
Owned(String)
}
/// A specialization of `MaybeOwned` to be sendable.
pub type SendStr = MaybeOwned<'static>;
impl<'a> MaybeOwned<'a> {
/// Returns `true` if this `MaybeOwned` wraps an owned string.
///
/// # Example
///
/// ```rust
/// let string = String::from_str("orange");
/// let maybe_owned_string = string.into_maybe_owned();
/// assert_eq!(true, maybe_owned_string.is_owned());
/// ```
#[inline]
pub fn is_owned(&self) -> bool {
match *self {
Slice(_) => false,
Owned(_) => true
}
}
/// Returns `true` if this `MaybeOwned` wraps a borrowed string.
///
/// # Example
///
/// ```rust
/// let string = "orange";
/// let maybe_owned_string = string.as_slice().into_maybe_owned();
/// assert_eq!(true, maybe_owned_string.is_slice());
/// ```
#[inline]
pub fn is_slice(&self) -> bool {
match *self {
Slice(_) => true,
Owned(_) => false
}
}
}
/// Trait for moving into a `MaybeOwned`.
pub trait IntoMaybeOwned<'a> {
/// Moves `self` into a `MaybeOwned`.
fn into_maybe_owned(self) -> MaybeOwned<'a>;
}
impl<'a> IntoMaybeOwned<'a> for String {
/// # Example
///
/// ```rust
/// let owned_string = String::from_str("orange");
/// let maybe_owned_string = owned_string.into_maybe_owned();
/// assert_eq!(true, maybe_owned_string.is_owned());
/// ```
#[inline]
fn into_maybe_owned(self) -> MaybeOwned<'a> {
Owned(self)
}
}
impl<'a> IntoMaybeOwned<'a> for &'a str {
/// # Example
///
/// ```rust
/// let string = "orange";
/// let maybe_owned_str = string.as_slice().into_maybe_owned();
/// assert_eq!(false, maybe_owned_str.is_owned());
/// ```
#[inline]
fn into_maybe_owned(self) -> MaybeOwned<'a> { Slice(self) }
}
impl<'a> IntoMaybeOwned<'a> for MaybeOwned<'a> {
/// # Example
///
/// ```rust
/// let str = "orange";
/// let maybe_owned_str = str.as_slice().into_maybe_owned();
/// let maybe_maybe_owned_str = maybe_owned_str.into_maybe_owned();
/// assert_eq!(false, maybe_maybe_owned_str.is_owned());
/// ```
#[inline]
fn into_maybe_owned(self) -> MaybeOwned<'a> { self }
}
impl<'a> PartialEq for MaybeOwned<'a> {
#[inline]
fn eq(&self, other: &MaybeOwned) -> bool {
self.as_slice() == other.as_slice()
}
}
impl<'a> Eq for MaybeOwned<'a> {}
impl<'a> PartialOrd for MaybeOwned<'a> {
#[inline]
fn partial_cmp(&self, other: &MaybeOwned) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl<'a> Ord for MaybeOwned<'a> {
#[inline]
fn cmp(&self, other: &MaybeOwned) -> Ordering {
self.as_slice().cmp(&other.as_slice())
}
}
impl<'a, S: Str> Equiv<S> for MaybeOwned<'a> {
#[inline]
fn equiv(&self, other: &S) -> bool {
self.as_slice() == other.as_slice()
}
}
impl<'a> Str for MaybeOwned<'a> {
#[inline]
fn as_slice<'b>(&'b self) -> &'b str {
match *self {
Slice(s) => s,
Owned(ref s) => s.as_slice()
}
}
}
impl<'a> StrAllocating for MaybeOwned<'a> {
#[inline]
fn into_string(self) -> String {
match self {
Slice(s) => String::from_str(s),
Owned(s) => s
}
}
}
impl<'a> Collection for MaybeOwned<'a> {
#[inline]
fn len(&self) -> uint { self.as_slice().len() }
}
impl<'a> Clone for MaybeOwned<'a> {
#[inline]
fn clone(&self) -> MaybeOwned<'a> {
match *self {
Slice(s) => Slice(s),
Owned(ref s) => Owned(String::from_str(s.as_slice()))
}
}
}
impl<'a> Default for MaybeOwned<'a> {
#[inline]
fn default() -> MaybeOwned<'a> { Slice("") }
}
impl<'a, H: hash::Writer> hash::Hash<H> for MaybeOwned<'a> {
#[inline]
fn hash(&self, hasher: &mut H) {
self.as_slice().hash(hasher)
}
}
impl<'a> fmt::Show for MaybeOwned<'a> {
#[inline]
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
Slice(ref s) => s.fmt(f),
Owned(ref s) => s.fmt(f)
}
}
}
/// Unsafe string operations.
pub mod raw {
use string;
use string::String;
use vec::Vec;
use MutableSeq;
pub use core::str::raw::{from_utf8, c_str_to_static_slice, slice_bytes};
pub use core::str::raw::{slice_unchecked};
/// Deprecated. Replaced by `string::raw::from_buf_len`
#[deprecated = "Use string::raw::from_buf_len"]
pub unsafe fn from_buf_len(buf: *const u8, len: uint) -> String {
string::raw::from_buf_len(buf, len)
}
/// Deprecated. Use `string::raw::from_buf`
#[deprecated = "Use string::raw::from_buf"]
pub unsafe fn from_c_str(c_string: *const i8) -> String {
string::raw::from_buf(c_string as *const u8)
}
/// Deprecated. Replaced by `string::raw::from_utf8`
#[deprecated = "Use string::raw::from_utf8"]
pub unsafe fn from_utf8_owned(v: Vec<u8>) -> String {
string::raw::from_utf8(v)
}
/// Deprecated. Use `string::raw::from_utf8`
#[deprecated = "Use string::raw::from_utf8"]
pub unsafe fn from_byte(u: u8) -> String {
string::raw::from_utf8(vec![u])
}
}
/*
Section: Trait implementations
*/
/// Any string that can be represented as a slice.
pub trait StrAllocating: Str {
/// Converts `self` into a `String`, not making a copy if possible.
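///
/// # Example
///
/// A short example:
///
/// ```rust
/// let s = "hello".into_string();
/// assert_eq!(s.as_slice(), "hello");
/// ```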
fn into_string(self) -> String;
#[allow(missing_doc)]
#[deprecated = "replaced by .into_string()"]
fn into_owned(self) -> String {
self.into_string()
}
/// Escapes each char in `self` with `char::escape_default`.
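///
/// # Example
///
/// A small sketch of the escaping behaviour:
///
/// ```rust
/// assert_eq!("\t\n".escape_default().as_slice(), "\\t\\n");
/// ```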
fn escape_default(&self) -> String {
let me = self.as_slice();
let mut out = String::with_capacity(me.len());
for c in me.chars() {
c.escape_default(|c| out.push(c));
}
out
}
/// Escapes each char in `self` with `char::escape_unicode`.
fn escape_unicode(&self) -> String {
let me = self.as_slice();
let mut out = String::with_capacity(me.len());
for c in me.chars() {
c.escape_unicode(|c| out.push(c));
}
out
}
/// Replaces all occurrences of one string with another.
///
/// # Arguments
///
/// * `from` - The string to replace
/// * `to` - The replacement string
///
/// # Return value
///
/// The original string with all occurrences of `from` replaced with `to`.
///
/// # Example
///
/// ```rust
/// let s = "Do you know the muffin man,
/// The muffin man, the muffin man, ...".to_string();
///
/// assert_eq!(s.replace("muffin man", "little lamb"),
/// "Do you know the little lamb,
/// The little lamb, the little lamb, ...".to_string());
///
/// // not found, so no change.
/// assert_eq!(s.replace("cookie monster", "little lamb"), s);
/// ```
fn replace(&self, from: &str, to: &str) -> String {
let me = self.as_slice();
let mut result = String::new();
let mut last_end = 0;
for (start, end) in me.match_indices(from) {
result.push_str(unsafe{raw::slice_bytes(me, last_end, start)});
result.push_str(to);
last_end = end;
}
result.push_str(unsafe{raw::slice_bytes(me, last_end, me.len())});
result
2013-06-11 06:46:40 -05:00
}
#[allow(missing_doc)]
#[deprecated = "obsolete, use `to_string`"]
#[inline]
fn to_owned(&self) -> String {
unsafe {
mem::transmute(self.as_slice().as_bytes().to_vec())
}
}
/// Converts to a vector of `u16` encoded as UTF-16.
#[deprecated = "use `utf16_units` instead"]
fn to_utf16(&self) -> Vec<u16> {
self.as_slice().utf16_units().collect::<Vec<u16>>()
}
/// Returns a new string containing `nn` repeated copies of `self`.
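///
/// # Example
///
/// A short illustration:
///
/// ```rust
/// assert_eq!("ab".repeat(3), "ababab".to_string());
/// ```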
fn repeat(&self, nn: uint) -> String {
let me = self.as_slice();
let mut ret = String::with_capacity(nn * me.len());
for _ in range(0, nn) {
ret.push_str(me);
}
ret
}
/// Returns the Levenshtein distance between `self` and `t`.
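///
/// # Example
///
/// A small sketch ("kitten" becomes "sitting" in three single-character edits):
///
/// ```rust
/// assert_eq!("kitten".lev_distance("sitting"), 3);
/// ```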
fn lev_distance(&self, t: &str) -> uint {
let me = self.as_slice();
let slen = me.len();
let tlen = t.len();
if slen == 0 { return tlen; }
if tlen == 0 { return slen; }
let mut dcol = Vec::from_fn(tlen + 1, |x| x);
for (i, sc) in me.chars().enumerate() {
let mut current = i;
*dcol.get_mut(0) = current + 1;
for (j, tc) in t.chars().enumerate() {
let next = dcol[j + 1];
if sc == tc {
*dcol.get_mut(j + 1) = current;
} else {
*dcol.get_mut(j + 1) = cmp::min(current, next);
*dcol.get_mut(j + 1) = cmp::min(dcol[j + 1],
dcol[j]) + 1;
}
current = next;
}
}
return dcol[tlen];
}
/// Returns an iterator over the string in Unicode Normalization Form D
/// (canonical decomposition).
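///
/// # Example
///
/// A small sketch of canonical decomposition (U+00E9, "é", decomposes into
/// "e" followed by U+0301, the combining acute accent):
///
/// ```rust
/// let nfd: String = "\u00e9".nfd_chars().collect();
/// assert_eq!(nfd.as_slice(), "e\u0301");
/// ```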
#[inline]
fn nfd_chars<'a>(&'a self) -> Decompositions<'a> {
Decompositions {
iter: self.as_slice().chars(),
buffer: Vec::new(),
sorted: false,
kind: Canonical
}
}
/// Returns an iterator over the string in Unicode Normalization Form KD
/// (compatibility decomposition).
#[inline]
fn nfkd_chars<'a>(&'a self) -> Decompositions<'a> {
Decompositions {
iter: self.as_slice().chars(),
buffer: Vec::new(),
sorted: false,
kind: Compatible
}
}
/// Returns an iterator over the string in Unicode Normalization Form C
/// (canonical decomposition followed by canonical composition).
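///
/// # Example
///
/// A small sketch of canonical composition ("e" followed by U+0301, the
/// combining acute accent, composes back into U+00E9, "é"):
///
/// ```rust
/// let nfc: String = "e\u0301".nfc_chars().collect();
/// assert_eq!(nfc.as_slice(), "\u00e9");
/// ```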
#[inline]
fn nfc_chars<'a>(&'a self) -> Recompositions<'a> {
Recompositions {
iter: self.nfd_chars(),
state: Composing,
buffer: RingBuf::new(),
composee: None,
last_ccc: None
}
}
/// Returns an iterator over the string in Unicode Normalization Form KC
/// (compatibility decomposition followed by canonical composition).
#[inline]
fn nfkc_chars<'a>(&'a self) -> Recompositions<'a> {
Recompositions {
iter: self.nfkd_chars(),
state: Composing,
buffer: RingBuf::new(),
composee: None,
last_ccc: None
}
}
}
impl<'a> StrAllocating for &'a str {
#[inline]
fn into_string(self) -> String {
String::from_str(self)
}
}
#[cfg(test)]
mod tests {
use std::iter::AdditiveIterator;
use std::iter::range;
use std::default::Default;
use std::char::Char;
use std::clone::Clone;
use std::cmp::{Equal, Greater, Less, Ord, PartialOrd, Equiv};
use std::option::{Some, None};
use std::ptr::RawPtr;
use std::iter::{Iterator, DoubleEndedIterator};
use {Collection, MutableSeq};
use super::*;
use std::slice::{Slice, ImmutableSlice};
use string::String;
use vec::Vec;
use unicode::char::UnicodeChar;
#[test]
fn test_eq_slice() {
assert!((eq_slice("foobar".slice(0, 3), "foo")));
assert!((eq_slice("barfoo".slice(3, 6), "foo")));
assert!((!eq_slice("foo1", "foo2")));
}
#[test]
fn test_le() {
assert!("" <= "");
assert!("" <= "foo");
assert!("foo" <= "foo");
assert!("foo" != "bar");
}
#[test]
fn test_len() {
assert_eq!("".len(), 0u);
assert_eq!("hello world".len(), 11u);
assert_eq!("\x63".len(), 1u);
assert_eq!("\xa2".len(), 2u);
assert_eq!("\u03c0".len(), 2u);
assert_eq!("\u2620".len(), 3u);
assert_eq!("\U0001d11e".len(), 4u);
assert_eq!("".char_len(), 0u);
assert_eq!("hello world".char_len(), 11u);
assert_eq!("\x63".char_len(), 1u);
assert_eq!("\xa2".char_len(), 1u);
assert_eq!("\u03c0".char_len(), 1u);
assert_eq!("\u2620".char_len(), 1u);
assert_eq!("\U0001d11e".char_len(), 1u);
assert_eq!("ประเทศไทย中华Việt Nam".char_len(), 19u);
assert_eq!("".width(false), 10u);
assert_eq!("".width(true), 10u);
assert_eq!("\0\0\0\0\0".width(false), 0u);
assert_eq!("\0\0\0\0\0".width(true), 0u);
assert_eq!("".width(false), 0u);
assert_eq!("".width(true), 0u);
assert_eq!("\u2081\u2082\u2083\u2084".width(false), 4u);
assert_eq!("\u2081\u2082\u2083\u2084".width(true), 8u);
}
#[test]
fn test_find() {
assert_eq!("hello".find('l'), Some(2u));
assert_eq!("hello".find(|c:char| c == 'o'), Some(4u));
assert!("hello".find('x').is_none());
assert!("hello".find(|c:char| c == 'x').is_none());
assert_eq!("ประเทศไทย中华Việt Nam".find('华'), Some(30u));
assert_eq!("ประเทศไทย中华Việt Nam".find(|c: char| c == '华'), Some(30u));
}
#[test]
fn test_rfind() {
assert_eq!("hello".rfind('l'), Some(3u));
assert_eq!("hello".rfind(|c:char| c == 'o'), Some(4u));
assert!("hello".rfind('x').is_none());
assert!("hello".rfind(|c:char| c == 'x').is_none());
assert_eq!("ประเทศไทย中华Việt Nam".rfind('华'), Some(30u));
assert_eq!("ประเทศไทย中华Việt Nam".rfind(|c: char| c == '华'), Some(30u));
}
#[test]
fn test_collect() {
let empty = String::from_str("");
let s: String = empty.as_slice().chars().collect();
assert_eq!(empty, s);
let data = String::from_str("ประเทศไทย中");
let s: String = data.as_slice().chars().collect();
assert_eq!(data, s);
}
#[test]
fn test_into_bytes() {
let data = String::from_str("asdf");
let buf = data.into_bytes();
assert_eq!(b"asdf", buf.as_slice());
}
#[test]
fn test_find_str() {
// byte positions
assert_eq!("".find_str(""), Some(0u));
assert!("banana".find_str("apple pie").is_none());
let data = "abcabc";
assert_eq!(data.slice(0u, 6u).find_str("ab"), Some(0u));
assert_eq!(data.slice(2u, 6u).find_str("ab"), Some(3u - 2u));
assert!(data.slice(2u, 4u).find_str("ab").is_none());
let string = "ประเทศไทย中华Việt Nam";
let mut data = String::from_str(string);
data.push_str(string);
assert!(data.as_slice().find_str("ไท华").is_none());
assert_eq!(data.as_slice().slice(0u, 43u).find_str(""), Some(0u));
assert_eq!(data.as_slice().slice(6u, 43u).find_str(""), Some(6u - 6u));
assert_eq!(data.as_slice().slice(0u, 43u).find_str("ประ"), Some( 0u));
assert_eq!(data.as_slice().slice(0u, 43u).find_str("ทศไ"), Some(12u));
assert_eq!(data.as_slice().slice(0u, 43u).find_str("ย中"), Some(24u));
assert_eq!(data.as_slice().slice(0u, 43u).find_str("iệt"), Some(34u));
assert_eq!(data.as_slice().slice(0u, 43u).find_str("Nam"), Some(40u));
assert_eq!(data.as_slice().slice(43u, 86u).find_str("ประ"), Some(43u - 43u));
assert_eq!(data.as_slice().slice(43u, 86u).find_str("ทศไ"), Some(55u - 43u));
assert_eq!(data.as_slice().slice(43u, 86u).find_str("ย中"), Some(67u - 43u));
assert_eq!(data.as_slice().slice(43u, 86u).find_str("iệt"), Some(77u - 43u));
assert_eq!(data.as_slice().slice(43u, 86u).find_str("Nam"), Some(83u - 43u));
}
#[test]
fn test_slice_chars() {
fn t(a: &str, b: &str, start: uint) {
assert_eq!(a.slice_chars(start, start + b.char_len()), b);
}
t("", "", 0);
t("hello", "llo", 2);
t("hello", "el", 1);
t("αβλ", "β", 1);
t("αβλ", "", 3);
assert_eq!("ะเทศไท", "ประเทศไทย中华Việt Nam".slice_chars(2, 8));
}
#[test]
fn test_concat() {
fn t(v: &[String], s: &str) {
assert_eq!(v.concat().as_slice(), s);
}
t([String::from_str("you"), String::from_str("know"),
String::from_str("I'm"),
String::from_str("no"), String::from_str("good")],
"youknowI'mnogood");
let v: &[String] = [];
t(v, "");
t([String::from_str("hi")], "hi");
}
#[test]
fn test_connect() {
fn t(v: &[String], sep: &str, s: &str) {
assert_eq!(v.connect(sep).as_slice(), s);
}
t([String::from_str("you"), String::from_str("know"),
String::from_str("I'm"),
String::from_str("no"), String::from_str("good")],
" ", "you know I'm no good");
let v: &[String] = [];
t(v, " ", "");
t([String::from_str("hi")], " ", "hi");
}
#[test]
fn test_concat_slices() {
fn t(v: &[&str], s: &str) {
assert_eq!(v.concat().as_slice(), s);
}
t(["you", "know", "I'm", "no", "good"], "youknowI'mnogood");
let v: &[&str] = [];
t(v, "");
t(["hi"], "hi");
}
#[test]
fn test_connect_slices() {
fn t(v: &[&str], sep: &str, s: &str) {
assert_eq!(v.connect(sep).as_slice(), s);
}
t(["you", "know", "I'm", "no", "good"],
" ", "you know I'm no good");
t([], " ", "");
t(["hi"], " ", "hi");
}
#[test]
fn test_repeat() {
assert_eq!("x".repeat(4), String::from_str("xxxx"));
assert_eq!("hi".repeat(4), String::from_str("hihihihi"));
assert_eq!("ไท华".repeat(3), String::from_str("ไท华ไท华ไท华"));
assert_eq!("".repeat(4), String::from_str(""));
assert_eq!("hi".repeat(0), String::from_str(""));
}
#[test]
fn test_unsafe_slice() {
assert_eq!("ab", unsafe {raw::slice_bytes("abc", 0, 2)});
assert_eq!("bc", unsafe {raw::slice_bytes("abc", 1, 3)});
assert_eq!("", unsafe {raw::slice_bytes("abc", 1, 1)});
fn a_million_letter_a() -> String {
let mut i = 0u;
let mut rs = String::new();
while i < 100000 {
rs.push_str("aaaaaaaaaa");
i += 1;
}
rs
}
fn half_a_million_letter_a() -> String {
let mut i = 0u;
let mut rs = String::new();
while i < 100000 {
rs.push_str("aaaaa");
i += 1;
}
rs
}
let letters = a_million_letter_a();
assert!(half_a_million_letter_a() ==
unsafe {String::from_str(raw::slice_bytes(letters.as_slice(),
0u,
500000))});
}
#[test]
fn test_starts_with() {
assert!(("".starts_with("")));
assert!(("abc".starts_with("")));
assert!(("abc".starts_with("a")));
assert!((!"a".starts_with("abc")));
assert!((!"".starts_with("abc")));
assert!((!"ödd".starts_with("-")));
assert!(("ödd".starts_with("öd")));
}
#[test]
fn test_ends_with() {
assert!(("".ends_with("")));
assert!(("abc".ends_with("")));
assert!(("abc".ends_with("c")));
assert!((!"a".ends_with("abc")));
assert!((!"".ends_with("abc")));
assert!((!"ddö".ends_with("-")));
assert!(("ddö".ends_with("dö")));
}
#[test]
fn test_is_empty() {
assert!("".is_empty());
assert!(!"a".is_empty());
}
#[test]
fn test_replace() {
let a = "a";
assert_eq!("".replace(a, "b"), String::from_str(""));
assert_eq!("a".replace(a, "b"), String::from_str("b"));
assert_eq!("ab".replace(a, "b"), String::from_str("bb"));
let test = "test";
assert!(" test test ".replace(test, "toast") ==
String::from_str(" toast toast "));
assert_eq!(" test test ".replace(test, ""), String::from_str(" "));
}
#[test]
fn test_replace_2a() {
let data = "ประเทศไทย中华";
let repl = "دولة الكويت";
let a = "ประเ";
let a2 = "دولة الكويتทศไทย中华";
assert_eq!(data.replace(a, repl).as_slice(), a2);
}
#[test]
fn test_replace_2b() {
let data = "ประเทศไทย中华";
let repl = "دولة الكويت";
let b = "ะเ";
let b2 = "ปรدولة الكويتทศไทย中华";
assert_eq!(data.replace(b, repl).as_slice(), b2);
}
#[test]
fn test_replace_2c() {
let data = "ประเทศไทย中华";
let repl = "دولة الكويت";
let c = "中华";
let c2 = "ประเทศไทยدولة الكويت";
assert_eq!(data.replace(c, repl).as_slice(), c2);
}
#[test]
fn test_replace_2d() {
let data = "ประเทศไทย中华";
let repl = "دولة الكويت";
let d = "ไท华";
assert_eq!(data.replace(d, repl).as_slice(), data);
}
#[test]
fn test_slice() {
assert_eq!("ab", "abc".slice(0, 2));
assert_eq!("bc", "abc".slice(1, 3));
assert_eq!("", "abc".slice(1, 1));
assert_eq!("\u65e5", "\u65e5\u672c".slice(0, 3));
let data = "ประเทศไทย中华";
assert_eq!("ป", data.slice(0, 3));
assert_eq!("ร", data.slice(3, 6));
assert_eq!("", data.slice(3, 3));
assert_eq!("华", data.slice(30, 33));
fn a_million_letter_x() -> String {
let mut i = 0u;
let mut rs = String::new();
while i < 100000 {
rs.push_str("华华华华华华华华华华");
i += 1;
}
rs
}
fn half_a_million_letter_x() -> String {
let mut i = 0u;
let mut rs = String::new();
while i < 100000 {
rs.push_str("华华华华华");
i += 1;
}
rs
}
let letters = a_million_letter_x();
assert!(half_a_million_letter_x() ==
String::from_str(letters.as_slice().slice(0u, 3u * 500000u)));
}
#[test]
fn test_slice_2() {
let ss = "中华Việt Nam";
assert_eq!("华", ss.slice(3u, 6u));
assert_eq!("Việt Nam", ss.slice(6u, 16u));
assert_eq!("ab", "abc".slice(0u, 2u));
assert_eq!("bc", "abc".slice(1u, 3u));
assert_eq!("", "abc".slice(1u, 1u));
assert_eq!("中", ss.slice(0u, 3u));
assert_eq!("华V", ss.slice(3u, 7u));
assert_eq!("", ss.slice(3u, 3u));
/* byte offsets in ss:
 0: 中
 3: 华
 6: V
 7: i
 8: ệ
11: t
12: (space)
13: N
14: a
15: m */
}
#[test]
#[should_fail]
fn test_slice_fail() {
"中华Việt Nam".slice(0u, 2u);
}
#[test]
fn test_slice_from() {
assert_eq!("abcd".slice_from(0), "abcd");
assert_eq!("abcd".slice_from(2), "cd");
assert_eq!("abcd".slice_from(4), "");
}
#[test]
fn test_slice_to() {
assert_eq!("abcd".slice_to(0), "");
assert_eq!("abcd".slice_to(2), "ab");
assert_eq!("abcd".slice_to(4), "abcd");
}
#[test]
fn test_trim_left_chars() {
let v: &[char] = &[];
assert_eq!(" *** foo *** ".trim_left_chars(v), " *** foo *** ");
let chars: &[char] = &['*', ' '];
assert_eq!(" *** foo *** ".trim_left_chars(chars), "foo *** ");
assert_eq!(" *** *** ".trim_left_chars(chars), "");
assert_eq!("foo *** ".trim_left_chars(chars), "foo *** ");
assert_eq!("11foo1bar11".trim_left_chars('1'), "foo1bar11");
let chars: &[char] = &['1', '2'];
assert_eq!("12foo1bar12".trim_left_chars(chars), "foo1bar12");
assert_eq!("123foo1bar123".trim_left_chars(|c: char| c.is_digit()), "foo1bar123");
}
#[test]
fn test_trim_right_chars() {
let v: &[char] = &[];
assert_eq!(" *** foo *** ".trim_right_chars(v), " *** foo *** ");
let chars: &[char] = &['*', ' '];
assert_eq!(" *** foo *** ".trim_right_chars(chars), " *** foo");
assert_eq!(" *** *** ".trim_right_chars(chars), "");
assert_eq!(" *** foo".trim_right_chars(chars), " *** foo");
assert_eq!("11foo1bar11".trim_right_chars('1'), "11foo1bar");
let chars: &[char] = &['1', '2'];
assert_eq!("12foo1bar12".trim_right_chars(chars), "12foo1bar");
assert_eq!("123foo1bar123".trim_right_chars(|c: char| c.is_digit()), "123foo1bar");
}
#[test]
fn test_trim_chars() {
let v: &[char] = &[];
assert_eq!(" *** foo *** ".trim_chars(v), " *** foo *** ");
let chars: &[char] = &['*', ' '];
assert_eq!(" *** foo *** ".trim_chars(chars), "foo");
assert_eq!(" *** *** ".trim_chars(chars), "");
assert_eq!("foo".trim_chars(chars), "foo");
assert_eq!("11foo1bar11".trim_chars('1'), "foo1bar");
let chars: &[char] = &['1', '2'];
assert_eq!("12foo1bar12".trim_chars(chars), "foo1bar");
assert_eq!("123foo1bar123".trim_chars(|c: char| c.is_digit()), "foo1bar");
}
#[test]
fn test_trim_left() {
assert_eq!("".trim_left(), "");
assert_eq!("a".trim_left(), "a");
assert_eq!(" ".trim_left(), "");
assert_eq!(" blah".trim_left(), "blah");
assert_eq!(" \u3000 wut".trim_left(), "wut");
assert_eq!("hey ".trim_left(), "hey ");
}
#[test]
fn test_trim_right() {
assert_eq!("".trim_right(), "");
assert_eq!("a".trim_right(), "a");
assert_eq!(" ".trim_right(), "");
assert_eq!("blah ".trim_right(), "blah");
assert_eq!("wut \u3000 ".trim_right(), "wut");
assert_eq!(" hey".trim_right(), " hey");
}
#[test]
fn test_trim() {
assert_eq!("".trim(), "");
assert_eq!("a".trim(), "a");
assert_eq!(" ".trim(), "");
assert_eq!(" blah ".trim(), "blah");
assert_eq!("\nwut \u3000 ".trim(), "wut");
assert_eq!(" hey dude ".trim(), "hey dude");
}
#[test]
fn test_is_whitespace() {
assert!("".is_whitespace());
assert!(" ".is_whitespace());
assert!("\u2009".is_whitespace()); // Thin space
assert!(" \n\t ".is_whitespace());
assert!(!" _ ".is_whitespace());
}
#[test]
fn test_slice_shift_char() {
let data = "ประเทศไทย中";
assert_eq!(data.slice_shift_char(), (Some('ป'), "ระเทศไทย中"));
}
#[test]
fn test_slice_shift_char_2() {
let empty = "";
assert_eq!(empty.slice_shift_char(), (None, ""));
}
#[test]
fn test_is_utf8() {
// deny overlong encodings
assert!(!is_utf8([0xc0, 0x80]));
assert!(!is_utf8([0xc0, 0xae]));
assert!(!is_utf8([0xe0, 0x80, 0x80]));
assert!(!is_utf8([0xe0, 0x80, 0xaf]));
assert!(!is_utf8([0xe0, 0x81, 0x81]));
assert!(!is_utf8([0xf0, 0x82, 0x82, 0xac]));
assert!(!is_utf8([0xf4, 0x90, 0x80, 0x80]));
// deny surrogates
assert!(!is_utf8([0xED, 0xA0, 0x80]));
assert!(!is_utf8([0xED, 0xBF, 0xBF]));
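// accept well-formed boundary sequences: U+0080/U+07FF (2 bytes), U+0800..U+FFFF around the surrogate gap (3 bytes), U+10000/U+10FFFF (4 bytes)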
assert!(is_utf8([0xC2, 0x80]));
assert!(is_utf8([0xDF, 0xBF]));
assert!(is_utf8([0xE0, 0xA0, 0x80]));
assert!(is_utf8([0xED, 0x9F, 0xBF]));
assert!(is_utf8([0xEE, 0x80, 0x80]));
assert!(is_utf8([0xEF, 0xBF, 0xBF]));
assert!(is_utf8([0xF0, 0x90, 0x80, 0x80]));
assert!(is_utf8([0xF4, 0x8F, 0xBF, 0xBF]));
}
#[test]
fn test_is_utf16() {
macro_rules! pos ( ($($e:expr),*) => { { $(assert!(is_utf16($e));)* } });
// non-surrogates
pos!([0x0000],
[0x0001, 0x0002],
[0xD7FF],
[0xE000]);
// surrogate pairs (randomly generated with Python 3's
// .encode('utf-16be'))
pos!([0xdb54, 0xdf16, 0xd880, 0xdee0, 0xdb6a, 0xdd45],
[0xd91f, 0xdeb1, 0xdb31, 0xdd84, 0xd8e2, 0xde14],
[0xdb9f, 0xdc26, 0xdb6f, 0xde58, 0xd850, 0xdfae]);
// mixtures (also random)
pos!([0xd921, 0xdcc2, 0x002d, 0x004d, 0xdb32, 0xdf65],
[0xdb45, 0xdd2d, 0x006a, 0xdacd, 0xddfe, 0x0006],
[0x0067, 0xd8ff, 0xddb7, 0x000f, 0xd900, 0xdc80]);
// negative tests
macro_rules! neg ( ($($e:expr),*) => { { $(assert!(!is_utf16($e));)* } });
neg!(
// surrogate + regular unit
[0xdb45, 0x0000],
// surrogate + lead surrogate
[0xd900, 0xd900],
// unterminated surrogate
[0xd8ff],
// trail surrogate without a lead
[0xddb7]);
// random byte sequences that Python 3's .decode('utf-16be')
// failed on
neg!([0x5b3d, 0x0141, 0xde9e, 0x8fdc, 0xc6e7],
[0xdf5a, 0x82a5, 0x62b9, 0xb447, 0x92f3],
[0xda4e, 0x42bc, 0x4462, 0xee98, 0xc2ca],
[0xbe00, 0xb04a, 0x6ecb, 0xdd89, 0xe278],
[0x0465, 0xab56, 0xdbb6, 0xa893, 0x665e],
[0x6b7f, 0x0a19, 0x40f4, 0xa657, 0xdcc5],
[0x9b50, 0xda5e, 0x24ec, 0x03ad, 0x6dee],
[0x8d17, 0xcaa7, 0xf4ae, 0xdf6e, 0xbed7],
[0xdaee, 0x2584, 0x7d30, 0xa626, 0x121a],
[0xd956, 0x4b43, 0x7570, 0xccd6, 0x4f4a],
[0x9dcf, 0x1b49, 0x4ba5, 0xfce9, 0xdffe],
[0x6572, 0xce53, 0xb05a, 0xf6af, 0xdacf],
[0x1b90, 0x728c, 0x9906, 0xdb68, 0xf46e],
[0x1606, 0xbeca, 0xbe76, 0x860f, 0xdfa5],
[0x8b4f, 0xde7a, 0xd220, 0x9fac, 0x2b6f],
[0xb8fe, 0xebbe, 0xda32, 0x1a5f, 0x8b8b],
[0x934b, 0x8956, 0xc434, 0x1881, 0xddf7],
[0x5a95, 0x13fc, 0xf116, 0xd89b, 0x93f9],
[0xd640, 0x71f1, 0xdd7d, 0x77eb, 0x1cd8],
[0x348b, 0xaef0, 0xdb2c, 0xebf1, 0x1282],
[0x50d7, 0xd824, 0x5010, 0xb369, 0x22ea]);
}
#[test]
fn test_as_bytes() {
// no null
let v = [
224, 184, 168, 224, 185, 132, 224, 184, 151, 224, 184, 162, 228,
184, 173, 229, 141, 142, 86, 105, 225, 187, 135, 116, 32, 78, 97,
109
];
let b: &[u8] = &[];
assert_eq!("".as_bytes(), b);
assert_eq!("abc".as_bytes(), b"abc");
assert_eq!("ศไทย中华Việt Nam".as_bytes(), v.as_slice());
}
#[test]
#[should_fail]
fn test_as_bytes_fail() {
// Don't double free. (I'm not sure if this exercises the
// original problem code path anymore.)
let s = String::from_str("");
let _bytes = s.as_bytes();
fail!();
}
#[test]
fn test_as_ptr() {
let buf = "hello".as_ptr();
unsafe {
assert_eq!(*buf.offset(0), b'h');
assert_eq!(*buf.offset(1), b'e');
assert_eq!(*buf.offset(2), b'l');
assert_eq!(*buf.offset(3), b'l');
assert_eq!(*buf.offset(4), b'o');
}
}
#[test]
fn test_subslice_offset() {
let a = "kernelsprite";
let b = a.slice(7, a.len());
let c = a.slice(0, a.len() - 6);
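// b is the tail of a starting at byte 7 ("prite"); c is the 6-byte "kernel" prefix at offset 0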
assert_eq!(a.subslice_offset(b), 7);
assert_eq!(a.subslice_offset(c), 0);
let string = "a\nb\nc";
let lines: Vec<&str> = string.lines().collect();
let lines = lines.as_slice();
assert_eq!(string.subslice_offset(lines[0]), 0);
assert_eq!(string.subslice_offset(lines[1]), 2);
assert_eq!(string.subslice_offset(lines[2]), 4);
}
#[test]
#[should_fail]
fn test_subslice_offset_2() {
let a = "alchemiter";
let b = "cruxtruder";
a.subslice_offset(b);
}
#[test]
fn vec_str_conversions() {
let s1: String = String::from_str("All mimsy were the borogoves");
let v: Vec<u8> = Vec::from_slice(s1.as_bytes());
let s2: String = String::from_str(from_utf8(v.as_slice()).unwrap());
let mut i: uint = 0u;
let n1: uint = s1.len();
let n2: uint = v.len();
assert_eq!(n1, n2);
while i < n1 {
let a: u8 = s1.as_bytes()[i];
let b: u8 = s2.as_bytes()[i];
debug!("{}", a);
debug!("{}", b);
assert_eq!(a, b);
i += 1u;
}
}
#[test]
fn test_contains() {
assert!("abcde".contains("bcd"));
assert!("abcde".contains("abcd"));
assert!("abcde".contains("bcde"));
assert!("abcde".contains(""));
assert!("".contains(""));
assert!(!"abcde".contains("def"));
assert!(!"".contains("a"));
let data = "ประเทศไทย中华Việt Nam";
assert!(data.contains("ประเ"));
assert!(data.contains("ะเ"));
assert!(data.contains("中华"));
assert!(!data.contains("ไท华"));
}
#[test]
fn test_contains_char() {
assert!("abc".contains_char('b'));
assert!("a".contains_char('a'));
assert!(!"abc".contains_char('d'));
assert!(!"".contains_char('a'));
}
#[test]
fn test_truncate_utf16_at_nul() {
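// truncate_utf16_at_nul drops everything from the first 0 code unit onwards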
let v = [];
let b: &[u16] = &[];
assert_eq!(truncate_utf16_at_nul(v), b);
let v = [0, 2, 3];
assert_eq!(truncate_utf16_at_nul(v), b);
let v = [1, 0, 3];
let b: &[u16] = &[1];
assert_eq!(truncate_utf16_at_nul(v), b);
let v = [1, 2, 0];
let b: &[u16] = &[1, 2];
assert_eq!(truncate_utf16_at_nul(v), b);
let v = [1, 2, 3];
let b: &[u16] = &[1, 2, 3];
assert_eq!(truncate_utf16_at_nul(v), b);
}
#[test]
fn test_char_at() {
let s = "ศไทย中华Việt Nam";
let v = vec!['ศ','ไ','ท','ย','中','华','V','i','ệ','t',' ','N','a','m'];
let mut pos = 0;
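// step pos forward by each char's UTF-8 width so char_at is always queried at a char boundary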
for ch in v.iter() {
assert!(s.char_at(pos) == *ch);
pos += String::from_char(1, *ch).len();
}
}
#[test]
fn test_char_at_reverse() {
let s = "ศไทย中华Việt Nam";
let v = vec!['ศ','ไ','ท','ย','中','华','V','i','ệ','t',' ','N','a','m'];
let mut pos = s.len();
for ch in v.iter().rev() {
assert!(s.char_at_reverse(pos) == *ch);
pos -= String::from_char(1, *ch).len();
}
}
#[test]
fn test_escape_unicode() {
assert_eq!("abc".escape_unicode(), String::from_str("\\x61\\x62\\x63"));
assert_eq!("a c".escape_unicode(), String::from_str("\\x61\\x20\\x63"));
assert_eq!("\r\n\t".escape_unicode(), String::from_str("\\x0d\\x0a\\x09"));
assert_eq!("'\"\\".escape_unicode(), String::from_str("\\x27\\x22\\x5c"));
assert_eq!("\x00\x01\xfe\xff".escape_unicode(), String::from_str("\\x00\\x01\\xfe\\xff"));
assert_eq!("\u0100\uffff".escape_unicode(), String::from_str("\\u0100\\uffff"));
assert_eq!("\U00010000\U0010ffff".escape_unicode(),
String::from_str("\\U00010000\\U0010ffff"));
assert_eq!("ab\ufb00".escape_unicode(), String::from_str("\\x61\\x62\\ufb00"));
assert_eq!("\U0001d4ea\r".escape_unicode(), String::from_str("\\U0001d4ea\\x0d"));
}
#[test]
fn test_escape_default() {
assert_eq!("abc".escape_default(), String::from_str("abc"));
assert_eq!("a c".escape_default(), String::from_str("a c"));
assert_eq!("\r\n\t".escape_default(), String::from_str("\\r\\n\\t"));
assert_eq!("'\"\\".escape_default(), String::from_str("\\'\\\"\\\\"));
assert_eq!("\u0100\uffff".escape_default(), String::from_str("\\u0100\\uffff"));
assert_eq!("\U00010000\U0010ffff".escape_default(),
String::from_str("\\U00010000\\U0010ffff"));
assert_eq!("ab\ufb00".escape_default(), String::from_str("ab\\ufb00"));
assert_eq!("\U0001d4ea\r".escape_default(), String::from_str("\\U0001d4ea\\r"));
}
#[test]
fn test_total_ord() {
assert!("1234".cmp(&("123")) == Greater);
assert!("123".cmp(&("1234")) == Less);
assert!("1234".cmp(&("1234")) == Equal);
assert!("12345555".cmp(&("123456")) == Less);
assert!("22".cmp(&("1234")) == Greater);
}
#[test]
fn test_char_range_at() {
let data = "b¢€𤭢𤭢€¢b";
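// UTF-8 widths here: 'b' = 1 byte, '¢' = 2, '€' = 3, '𤭢' = 4, which yields the byte offsets asserted below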
assert_eq!('b', data.char_range_at(0).ch);
assert_eq!('¢', data.char_range_at(1).ch);
assert_eq!('€', data.char_range_at(3).ch);
assert_eq!('𤭢', data.char_range_at(6).ch);
assert_eq!('𤭢', data.char_range_at(10).ch);
assert_eq!('€', data.char_range_at(14).ch);
assert_eq!('¢', data.char_range_at(17).ch);
assert_eq!('b', data.char_range_at(19).ch);
}
#[test]
fn test_char_range_at_reverse_underflow() {
assert_eq!("abc".char_range_at_reverse(0).next, 0);
}
#[test]
fn test_iterator() {
let s = "ศไทย中华Việt Nam";
let v = ['ศ','ไ','ท','ย','中','华','V','i','ệ','t',' ','N','a','m'];
let mut pos = 0;
let mut it = s.chars();
for c in it {
assert_eq!(c, v[pos]);
pos += 1;
}
assert_eq!(pos, v.len());
}
#[test]
fn test_rev_iterator() {
let s = "ศไทย中华Việt Nam";
let v = ['m', 'a', 'N', ' ', 't', 'ệ','i','V','华','中','ย','ท','ไ','ศ'];
let mut pos = 0;
let mut it = s.chars().rev();
for c in it {
assert_eq!(c, v[pos]);
pos += 1;
}
assert_eq!(pos, v.len());
}
#[test]
fn test_chars_decoding() {
let mut bytes = [0u8, ..4];
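// round-trip every Unicode scalar value through UTF-8; from_u32 filters out the surrogate range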
for c in range(0u32, 0x110000).filter_map(|c| ::core::char::from_u32(c)) {
let len = c.encode_utf8(bytes).unwrap_or(0);
let s = ::core::str::from_utf8(bytes.slice_to(len)).unwrap();
if Some(c) != s.chars().next() {
fail!("character {:x}={} does not decode correctly", c as u32, c);
}
}
}
#[test]
fn test_chars_rev_decoding() {
let mut bytes = [0u8, ..4];
for c in range(0u32, 0x110000).filter_map(|c| ::core::char::from_u32(c)) {
let len = c.encode_utf8(bytes).unwrap_or(0);
let s = ::core::str::from_utf8(bytes.slice_to(len)).unwrap();
if Some(c) != s.chars().rev().next() {
fail!("character {:x}={} does not decode correctly", c as u32, c);
}
}
}
#[test]
fn test_iterator_clone() {
let s = "ศไทย中华Việt Nam";
let mut it = s.chars();
it.next();
assert!(it.zip(it.clone()).all(|(x,y)| x == y));
}
#[test]
fn test_bytesator() {
let s = "ศไทย中华Việt Nam";
let v = [
224, 184, 168, 224, 185, 132, 224, 184, 151, 224, 184, 162, 228,
184, 173, 229, 141, 142, 86, 105, 225, 187, 135, 116, 32, 78, 97,
109
];
let mut pos = 0;
for b in s.bytes() {
assert_eq!(b, v[pos]);
pos += 1;
}
}
#[test]
fn test_bytes_revator() {
let s = "ศไทย中华Việt Nam";
let v = [
224, 184, 168, 224, 185, 132, 224, 184, 151, 224, 184, 162, 228,
184, 173, 229, 141, 142, 86, 105, 225, 187, 135, 116, 32, 78, 97,
109
];
let mut pos = v.len();
for b in s.bytes().rev() {
pos -= 1;
assert_eq!(b, v[pos]);
}
}
#[test]
fn test_char_indicesator() {
let s = "ศไทย中华Việt Nam";
let p = [0, 3, 6, 9, 12, 15, 18, 19, 20, 23, 24, 25, 26, 27];
let v = ['ศ','ไ','ท','ย','中','华','V','i','ệ','t',' ','N','a','m'];
let mut pos = 0;
let mut it = s.char_indices();
for c in it {
assert_eq!(c, (p[pos], v[pos]));
pos += 1;
}
assert_eq!(pos, v.len());
assert_eq!(pos, p.len());
}
#[test]
fn test_char_indices_revator() {
let s = "ศไทย中华Việt Nam";
let p = [27, 26, 25, 24, 23, 20, 19, 18, 15, 12, 9, 6, 3, 0];
let v = ['m', 'a', 'N', ' ', 't', 'ệ','i','V','华','中','ย','ท','ไ','ศ'];
let mut pos = 0;
let mut it = s.char_indices().rev();
for c in it {
assert_eq!(c, (p[pos], v[pos]));
pos += 1;
}
assert_eq!(pos, v.len());
assert_eq!(pos, p.len());
}
#[test]
fn test_split_char_iterator() {
let data = "\nMäry häd ä little lämb\nLittle lämb\n";
let split: Vec<&str> = data.split(' ').collect();
assert_eq!( split, vec!["\nMäry", "häd", "ä", "little", "lämb\nLittle", "lämb\n"]);
let mut rsplit: Vec<&str> = data.split(' ').rev().collect();
rsplit.reverse();
assert_eq!(rsplit, vec!["\nMäry", "häd", "ä", "little", "lämb\nLittle", "lämb\n"]);
let split: Vec<&str> = data.split(|c: char| c == ' ').collect();
assert_eq!( split, vec!["\nMäry", "häd", "ä", "little", "lämb\nLittle", "lämb\n"]);
let mut rsplit: Vec<&str> = data.split(|c: char| c == ' ').rev().collect();
rsplit.reverse();
assert_eq!(rsplit, vec!["\nMäry", "häd", "ä", "little", "lämb\nLittle", "lämb\n"]);
// Unicode
let split: Vec<&str> = data.split('ä').collect();
assert_eq!( split, vec!["\nM", "ry h", "d ", " little l", "mb\nLittle l", "mb\n"]);
let mut rsplit: Vec<&str> = data.split('ä').rev().collect();
rsplit.reverse();
assert_eq!(rsplit, vec!["\nM", "ry h", "d ", " little l", "mb\nLittle l", "mb\n"]);
let split: Vec<&str> = data.split(|c: char| c == 'ä').collect();
assert_eq!( split, vec!["\nM", "ry h", "d ", " little l", "mb\nLittle l", "mb\n"]);
let mut rsplit: Vec<&str> = data.split(|c: char| c == 'ä').rev().collect();
rsplit.reverse();
assert_eq!(rsplit, vec!["\nM", "ry h", "d ", " little l", "mb\nLittle l", "mb\n"]);
}
#[test]
fn test_splitn_char_iterator() {
let data = "\nMäry häd ä little lämb\nLittle lämb\n";
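// splitn(3, sep) splits at most three times, so each call below yields at most four pieces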
let split: Vec<&str> = data.splitn(3, ' ').collect();
assert_eq!(split, vec!["\nMäry", "häd", "ä", "little lämb\nLittle lämb\n"]);
let split: Vec<&str> = data.splitn(3, |c: char| c == ' ').collect();
assert_eq!(split, vec!["\nMäry", "häd", "ä", "little lämb\nLittle lämb\n"]);
// Unicode
let split: Vec<&str> = data.splitn(3, 'ä').collect();
assert_eq!(split, vec!["\nM", "ry h", "d ", " little lämb\nLittle lämb\n"]);
let split: Vec<&str> = data.splitn(3, |c: char| c == 'ä').collect();
assert_eq!(split, vec!["\nM", "ry h", "d ", " little lämb\nLittle lämb\n"]);
}
#[test]
fn test_rsplitn_char_iterator() {
let data = "\nMäry häd ä little lämb\nLittle lämb\n";
let mut split: Vec<&str> = data.rsplitn(3, ' ').collect();
split.reverse();
assert_eq!(split, vec!["\nMäry häd ä", "little", "lämb\nLittle", "lämb\n"]);
let mut split: Vec<&str> = data.rsplitn(3, |c: char| c == ' ').collect();
split.reverse();
assert_eq!(split, vec!["\nMäry häd ä", "little", "lämb\nLittle", "lämb\n"]);
// Unicode
let mut split: Vec<&str> = data.rsplitn(3, 'ä').collect();
split.reverse();
assert_eq!(split, vec!["\nMäry häd ", " little l", "mb\nLittle l", "mb\n"]);
let mut split: Vec<&str> = data.rsplitn(3, |c: char| c == 'ä').collect();
split.reverse();
assert_eq!(split, vec!["\nMäry häd ", " little l", "mb\nLittle l", "mb\n"]);
}
#[test]
fn test_split_char_iterator_no_trailing() {
let data = "\nMäry häd ä little lämb\nLittle lämb\n";
let split: Vec<&str> = data.split('\n').collect();
assert_eq!(split, vec!["", "Märy häd ä little lämb", "Little lämb", ""]);
let split: Vec<&str> = data.split_terminator('\n').collect();
assert_eq!(split, vec!["", "Märy häd ä little lämb", "Little lämb"]);
}
#[test]
fn test_rev_split_char_iterator_no_trailing() {
let data = "\nMäry häd ä little lämb\nLittle lämb\n";
let mut split: Vec<&str> = data.split('\n').rev().collect();
split.reverse();
assert_eq!(split, vec!["", "Märy häd ä little lämb", "Little lämb", ""]);
let mut split: Vec<&str> = data.split_terminator('\n').rev().collect();
split.reverse();
assert_eq!(split, vec!["", "Märy häd ä little lämb", "Little lämb"]);
}
#[test]
fn test_words() {
let data = "\n \tMäry häd\tä little lämb\nLittle lämb\n";
let words: Vec<&str> = data.words().collect();
assert_eq!(words, vec!["Märy", "häd", "ä", "little", "lämb", "Little", "lämb"])
}
#[test]
fn test_nfd_chars() {
macro_rules! t {
($input: expr, $expected: expr) => {
assert_eq!($input.nfd_chars().collect::<String>(), $expected.into_string());
}
}
t!("abc", "abc");
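// \u1e0b (ḋ) canonically decomposes to d + \u0307 (combining dot above); \u01c4 (DŽ) has only a compatibility decomposition, so NFD leaves it intact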
t!("\u1e0b\u01c4", "d\u0307\u01c4");
t!("\u2026", "\u2026");
t!("\u2126", "\u03a9");
t!("\u1e0b\u0323", "d\u0323\u0307");
t!("\u1e0d\u0307", "d\u0323\u0307");
t!("a\u0301", "a\u0301");
t!("\u0301a", "\u0301a");
t!("\ud4db", "\u1111\u1171\u11b6");
t!("\uac1c", "\u1100\u1162");
}
#[test]
fn test_nfkd_chars() {
macro_rules! t {
($input: expr, $expected: expr) => {
assert_eq!($input.nfkd_chars().collect::<String>(), $expected.into_string());
}
}
t!("abc", "abc");
t!("\u1e0b\u01c4", "d\u0307DZ\u030c");
t!("\u2026", "...");
t!("\u2126", "\u03a9");
t!("\u1e0b\u0323", "d\u0323\u0307");
t!("\u1e0d\u0307", "d\u0323\u0307");
t!("a\u0301", "a\u0301");
t!("\u0301a", "\u0301a");
t!("\ud4db", "\u1111\u1171\u11b6");
t!("\uac1c", "\u1100\u1162");
}
#[test]
fn test_nfc_chars() {
macro_rules! t {
($input: expr, $expected: expr) => {
assert_eq!($input.nfc_chars().collect::<String>(), $expected.into_string());
}
}
t!("abc", "abc");
t!("\u1e0b\u01c4", "\u1e0b\u01c4");
t!("\u2026", "\u2026");
t!("\u2126", "\u03a9");
t!("\u1e0b\u0323", "\u1e0d\u0307");
t!("\u1e0d\u0307", "\u1e0d\u0307");
t!("a\u0301", "\xe1");
t!("\u0301a", "\u0301a");
t!("\ud4db", "\ud4db");
t!("\uac1c", "\uac1c");
t!("a\u0300\u0305\u0315\u05aeb", "\xe0\u05ae\u0305\u0315b");
}
#[test]
fn test_nfkc_chars() {
macro_rules! t {
($input: expr, $expected: expr) => {
assert_eq!($input.nfkc_chars().collect::<String>(), $expected.into_string());
}
}
t!("abc", "abc");
t!("\u1e0b\u01c4", "\u1e0bD\u017d");
t!("\u2026", "...");
t!("\u2126", "\u03a9");
t!("\u1e0b\u0323", "\u1e0d\u0307");
t!("\u1e0d\u0307", "\u1e0d\u0307");
t!("a\u0301", "\xe1");
t!("\u0301a", "\u0301a");
t!("\ud4db", "\ud4db");
t!("\uac1c", "\uac1c");
t!("a\u0300\u0305\u0315\u05aeb", "\xe0\u05ae\u0305\u0315b");
}
#[test]
fn test_lines() {
let data = "\nMäry häd ä little lämb\n\nLittle lämb\n";
let lines: Vec<&str> = data.lines().collect();
assert_eq!(lines, vec!["", "Märy häd ä little lämb", "", "Little lämb"]);
let data = "\nMäry häd ä little lämb\n\nLittle lämb"; // no trailing \n
let lines: Vec<&str> = data.lines().collect();
assert_eq!(lines, vec!["", "Märy häd ä little lämb", "", "Little lämb"]);
}
#[test]
fn test_graphemes() {
use std::iter::order;
// official Unicode test data
// from http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/GraphemeBreakTest.txt
let test_same: [(_, &[_]), .. 325] = [
("\u0020\u0020", &["\u0020", "\u0020"]), ("\u0020\u0308\u0020", &["\u0020\u0308",
"\u0020"]), ("\u0020\u000D", &["\u0020", "\u000D"]), ("\u0020\u0308\u000D",
&["\u0020\u0308", "\u000D"]), ("\u0020\u000A", &["\u0020", "\u000A"]),
("\u0020\u0308\u000A", &["\u0020\u0308", "\u000A"]), ("\u0020\u0001", &["\u0020",
"\u0001"]), ("\u0020\u0308\u0001", &["\u0020\u0308", "\u0001"]), ("\u0020\u0300",
&["\u0020\u0300"]), ("\u0020\u0308\u0300", &["\u0020\u0308\u0300"]), ("\u0020\u1100",
&["\u0020", "\u1100"]), ("\u0020\u0308\u1100", &["\u0020\u0308", "\u1100"]),
("\u0020\u1160", &["\u0020", "\u1160"]), ("\u0020\u0308\u1160", &["\u0020\u0308",
"\u1160"]), ("\u0020\u11A8", &["\u0020", "\u11A8"]), ("\u0020\u0308\u11A8",
&["\u0020\u0308", "\u11A8"]), ("\u0020\uAC00", &["\u0020", "\uAC00"]),
("\u0020\u0308\uAC00", &["\u0020\u0308", "\uAC00"]), ("\u0020\uAC01", &["\u0020",
"\uAC01"]), ("\u0020\u0308\uAC01", &["\u0020\u0308", "\uAC01"]), ("\u0020\U0001F1E6",
&["\u0020", "\U0001F1E6"]), ("\u0020\u0308\U0001F1E6", &["\u0020\u0308",
"\U0001F1E6"]), ("\u0020\u0378", &["\u0020", "\u0378"]), ("\u0020\u0308\u0378",
&["\u0020\u0308", "\u0378"]), ("\u000D\u0020", &["\u000D", "\u0020"]),
("\u000D\u0308\u0020", &["\u000D", "\u0308", "\u0020"]), ("\u000D\u000D", &["\u000D",
"\u000D"]), ("\u000D\u0308\u000D", &["\u000D", "\u0308", "\u000D"]), ("\u000D\u000A",
&["\u000D\u000A"]), ("\u000D\u0308\u000A", &["\u000D", "\u0308", "\u000A"]),
("\u000D\u0001", &["\u000D", "\u0001"]), ("\u000D\u0308\u0001", &["\u000D", "\u0308",
"\u0001"]), ("\u000D\u0300", &["\u000D", "\u0300"]), ("\u000D\u0308\u0300",
&["\u000D", "\u0308\u0300"]), ("\u000D\u0903", &["\u000D", "\u0903"]),
("\u000D\u1100", &["\u000D", "\u1100"]), ("\u000D\u0308\u1100", &["\u000D", "\u0308",
"\u1100"]), ("\u000D\u1160", &["\u000D", "\u1160"]), ("\u000D\u0308\u1160",
&["\u000D", "\u0308", "\u1160"]), ("\u000D\u11A8", &["\u000D", "\u11A8"]),
("\u000D\u0308\u11A8", &["\u000D", "\u0308", "\u11A8"]), ("\u000D\uAC00", &["\u000D",
"\uAC00"]), ("\u000D\u0308\uAC00", &["\u000D", "\u0308", "\uAC00"]), ("\u000D\uAC01",
&["\u000D", "\uAC01"]), ("\u000D\u0308\uAC01", &["\u000D", "\u0308", "\uAC01"]),
("\u000D\U0001F1E6", &["\u000D", "\U0001F1E6"]), ("\u000D\u0308\U0001F1E6",
&["\u000D", "\u0308", "\U0001F1E6"]), ("\u000D\u0378", &["\u000D", "\u0378"]),
("\u000D\u0308\u0378", &["\u000D", "\u0308", "\u0378"]), ("\u000A\u0020", &["\u000A",
"\u0020"]), ("\u000A\u0308\u0020", &["\u000A", "\u0308", "\u0020"]), ("\u000A\u000D",
&["\u000A", "\u000D"]), ("\u000A\u0308\u000D", &["\u000A", "\u0308", "\u000D"]),
("\u000A\u000A", &["\u000A", "\u000A"]), ("\u000A\u0308\u000A", &["\u000A", "\u0308",
"\u000A"]), ("\u000A\u0001", &["\u000A", "\u0001"]), ("\u000A\u0308\u0001",
&["\u000A", "\u0308", "\u0001"]), ("\u000A\u0300", &["\u000A", "\u0300"]),
("\u000A\u0308\u0300", &["\u000A", "\u0308\u0300"]), ("\u000A\u0903", &["\u000A",
"\u0903"]), ("\u000A\u1100", &["\u000A", "\u1100"]), ("\u000A\u0308\u1100",
&["\u000A", "\u0308", "\u1100"]), ("\u000A\u1160", &["\u000A", "\u1160"]),
("\u000A\u0308\u1160", &["\u000A", "\u0308", "\u1160"]), ("\u000A\u11A8", &["\u000A",
"\u11A8"]), ("\u000A\u0308\u11A8", &["\u000A", "\u0308", "\u11A8"]), ("\u000A\uAC00",
&["\u000A", "\uAC00"]), ("\u000A\u0308\uAC00", &["\u000A", "\u0308", "\uAC00"]),
("\u000A\uAC01", &["\u000A", "\uAC01"]), ("\u000A\u0308\uAC01", &["\u000A", "\u0308",
"\uAC01"]), ("\u000A\U0001F1E6", &["\u000A", "\U0001F1E6"]),
("\u000A\u0308\U0001F1E6", &["\u000A", "\u0308", "\U0001F1E6"]), ("\u000A\u0378",
&["\u000A", "\u0378"]), ("\u000A\u0308\u0378", &["\u000A", "\u0308", "\u0378"]),
("\u0001\u0020", &["\u0001", "\u0020"]), ("\u0001\u0308\u0020", &["\u0001", "\u0308",
"\u0020"]), ("\u0001\u000D", &["\u0001", "\u000D"]), ("\u0001\u0308\u000D",
&["\u0001", "\u0308", "\u000D"]), ("\u0001\u000A", &["\u0001", "\u000A"]),
("\u0001\u0308\u000A", &["\u0001", "\u0308", "\u000A"]), ("\u0001\u0001", &["\u0001",
"\u0001"]), ("\u0001\u0308\u0001", &["\u0001", "\u0308", "\u0001"]), ("\u0001\u0300",
&["\u0001", "\u0300"]), ("\u0001\u0308\u0300", &["\u0001", "\u0308\u0300"]),
("\u0001\u0903", &["\u0001", "\u0903"]), ("\u0001\u1100", &["\u0001", "\u1100"]),
("\u0001\u0308\u1100", &["\u0001", "\u0308", "\u1100"]), ("\u0001\u1160", &["\u0001",
"\u1160"]), ("\u0001\u0308\u1160", &["\u0001", "\u0308", "\u1160"]), ("\u0001\u11A8",
&["\u0001", "\u11A8"]), ("\u0001\u0308\u11A8", &["\u0001", "\u0308", "\u11A8"]),
("\u0001\uAC00", &["\u0001", "\uAC00"]), ("\u0001\u0308\uAC00", &["\u0001", "\u0308",
"\uAC00"]), ("\u0001\uAC01", &["\u0001", "\uAC01"]), ("\u0001\u0308\uAC01",
&["\u0001", "\u0308", "\uAC01"]), ("\u0001\U0001F1E6", &["\u0001", "\U0001F1E6"]),
("\u0001\u0308\U0001F1E6", &["\u0001", "\u0308", "\U0001F1E6"]), ("\u0001\u0378",
&["\u0001", "\u0378"]), ("\u0001\u0308\u0378", &["\u0001", "\u0308", "\u0378"]),
("\u0300\u0020", &["\u0300", "\u0020"]), ("\u0300\u0308\u0020", &["\u0300\u0308",
"\u0020"]), ("\u0300\u000D", &["\u0300", "\u000D"]), ("\u0300\u0308\u000D",
&["\u0300\u0308", "\u000D"]), ("\u0300\u000A", &["\u0300", "\u000A"]),
("\u0300\u0308\u000A", &["\u0300\u0308", "\u000A"]), ("\u0300\u0001", &["\u0300",
"\u0001"]), ("\u0300\u0308\u0001", &["\u0300\u0308", "\u0001"]), ("\u0300\u0300",
&["\u0300\u0300"]), ("\u0300\u0308\u0300", &["\u0300\u0308\u0300"]), ("\u0300\u1100",
&["\u0300", "\u1100"]), ("\u0300\u0308\u1100", &["\u0300\u0308", "\u1100"]),
("\u0300\u1160", &["\u0300", "\u1160"]), ("\u0300\u0308\u1160", &["\u0300\u0308",
"\u1160"]), ("\u0300\u11A8", &["\u0300", "\u11A8"]), ("\u0300\u0308\u11A8",
&["\u0300\u0308", "\u11A8"]), ("\u0300\uAC00", &["\u0300", "\uAC00"]),
("\u0300\u0308\uAC00", &["\u0300\u0308", "\uAC00"]), ("\u0300\uAC01", &["\u0300",
"\uAC01"]), ("\u0300\u0308\uAC01", &["\u0300\u0308", "\uAC01"]), ("\u0300\U0001F1E6",
&["\u0300", "\U0001F1E6"]), ("\u0300\u0308\U0001F1E6", &["\u0300\u0308",
"\U0001F1E6"]), ("\u0300\u0378", &["\u0300", "\u0378"]), ("\u0300\u0308\u0378",
&["\u0300\u0308", "\u0378"]), ("\u0903\u0020", &["\u0903", "\u0020"]),
("\u0903\u0308\u0020", &["\u0903\u0308", "\u0020"]), ("\u0903\u000D", &["\u0903",
"\u000D"]), ("\u0903\u0308\u000D", &["\u0903\u0308", "\u000D"]), ("\u0903\u000A",
&["\u0903", "\u000A"]), ("\u0903\u0308\u000A", &["\u0903\u0308", "\u000A"]),
("\u0903\u0001", &["\u0903", "\u0001"]), ("\u0903\u0308\u0001", &["\u0903\u0308",
"\u0001"]), ("\u0903\u0300", &["\u0903\u0300"]), ("\u0903\u0308\u0300",
&["\u0903\u0308\u0300"]), ("\u0903\u1100", &["\u0903", "\u1100"]),
("\u0903\u0308\u1100", &["\u0903\u0308", "\u1100"]), ("\u0903\u1160", &["\u0903",
"\u1160"]), ("\u0903\u0308\u1160", &["\u0903\u0308", "\u1160"]), ("\u0903\u11A8",
&["\u0903", "\u11A8"]), ("\u0903\u0308\u11A8", &["\u0903\u0308", "\u11A8"]),
("\u0903\uAC00", &["\u0903", "\uAC00"]), ("\u0903\u0308\uAC00", &["\u0903\u0308",
"\uAC00"]), ("\u0903\uAC01", &["\u0903", "\uAC01"]), ("\u0903\u0308\uAC01",
&["\u0903\u0308", "\uAC01"]), ("\u0903\U0001F1E6", &["\u0903", "\U0001F1E6"]),
("\u0903\u0308\U0001F1E6", &["\u0903\u0308", "\U0001F1E6"]), ("\u0903\u0378",
&["\u0903", "\u0378"]), ("\u0903\u0308\u0378", &["\u0903\u0308", "\u0378"]),
("\u1100\u0020", &["\u1100", "\u0020"]), ("\u1100\u0308\u0020", &["\u1100\u0308",
"\u0020"]), ("\u1100\u000D", &["\u1100", "\u000D"]), ("\u1100\u0308\u000D",
&["\u1100\u0308", "\u000D"]), ("\u1100\u000A", &["\u1100", "\u000A"]),
("\u1100\u0308\u000A", &["\u1100\u0308", "\u000A"]), ("\u1100\u0001", &["\u1100",
"\u0001"]), ("\u1100\u0308\u0001", &["\u1100\u0308", "\u0001"]), ("\u1100\u0300",
&["\u1100\u0300"]), ("\u1100\u0308\u0300", &["\u1100\u0308\u0300"]), ("\u1100\u1100",
&["\u1100\u1100"]), ("\u1100\u0308\u1100", &["\u1100\u0308", "\u1100"]),
("\u1100\u1160", &["\u1100\u1160"]), ("\u1100\u0308\u1160", &["\u1100\u0308",
"\u1160"]), ("\u1100\u11A8", &["\u1100", "\u11A8"]), ("\u1100\u0308\u11A8",
&["\u1100\u0308", "\u11A8"]), ("\u1100\uAC00", &["\u1100\uAC00"]),
("\u1100\u0308\uAC00", &["\u1100\u0308", "\uAC00"]), ("\u1100\uAC01",
&["\u1100\uAC01"]), ("\u1100\u0308\uAC01", &["\u1100\u0308", "\uAC01"]),
("\u1100\U0001F1E6", &["\u1100", "\U0001F1E6"]), ("\u1100\u0308\U0001F1E6",
&["\u1100\u0308", "\U0001F1E6"]), ("\u1100\u0378", &["\u1100", "\u0378"]),
("\u1100\u0308\u0378", &["\u1100\u0308", "\u0378"]), ("\u1160\u0020", &["\u1160",
"\u0020"]), ("\u1160\u0308\u0020", &["\u1160\u0308", "\u0020"]), ("\u1160\u000D",
&["\u1160", "\u000D"]), ("\u1160\u0308\u000D", &["\u1160\u0308", "\u000D"]),
("\u1160\u000A", &["\u1160", "\u000A"]), ("\u1160\u0308\u000A", &["\u1160\u0308",
"\u000A"]), ("\u1160\u0001", &["\u1160", "\u0001"]), ("\u1160\u0308\u0001",
&["\u1160\u0308", "\u0001"]), ("\u1160\u0300", &["\u1160\u0300"]),
("\u1160\u0308\u0300", &["\u1160\u0308\u0300"]), ("\u1160\u1100", &["\u1160",
"\u1100"]), ("\u1160\u0308\u1100", &["\u1160\u0308", "\u1100"]), ("\u1160\u1160",
&["\u1160\u1160"]), ("\u1160\u0308\u1160", &["\u1160\u0308", "\u1160"]),
("\u1160\u11A8", &["\u1160\u11A8"]), ("\u1160\u0308\u11A8", &["\u1160\u0308",
"\u11A8"]), ("\u1160\uAC00", &["\u1160", "\uAC00"]), ("\u1160\u0308\uAC00",
&["\u1160\u0308", "\uAC00"]), ("\u1160\uAC01", &["\u1160", "\uAC01"]),
("\u1160\u0308\uAC01", &["\u1160\u0308", "\uAC01"]), ("\u1160\U0001F1E6", &["\u1160",
"\U0001F1E6"]), ("\u1160\u0308\U0001F1E6", &["\u1160\u0308", "\U0001F1E6"]),
("\u1160\u0378", &["\u1160", "\u0378"]), ("\u1160\u0308\u0378", &["\u1160\u0308",
"\u0378"]), ("\u11A8\u0020", &["\u11A8", "\u0020"]), ("\u11A8\u0308\u0020",
&["\u11A8\u0308", "\u0020"]), ("\u11A8\u000D", &["\u11A8", "\u000D"]),
("\u11A8\u0308\u000D", &["\u11A8\u0308", "\u000D"]), ("\u11A8\u000A", &["\u11A8",
"\u000A"]), ("\u11A8\u0308\u000A", &["\u11A8\u0308", "\u000A"]), ("\u11A8\u0001",
&["\u11A8", "\u0001"]), ("\u11A8\u0308\u0001", &["\u11A8\u0308", "\u0001"]),
("\u11A8\u0300", &["\u11A8\u0300"]), ("\u11A8\u0308\u0300", &["\u11A8\u0308\u0300"]),
("\u11A8\u1100", &["\u11A8", "\u1100"]), ("\u11A8\u0308\u1100", &["\u11A8\u0308",
"\u1100"]), ("\u11A8\u1160", &["\u11A8", "\u1160"]), ("\u11A8\u0308\u1160",
&["\u11A8\u0308", "\u1160"]), ("\u11A8\u11A8", &["\u11A8\u11A8"]),
("\u11A8\u0308\u11A8", &["\u11A8\u0308", "\u11A8"]), ("\u11A8\uAC00", &["\u11A8",
"\uAC00"]), ("\u11A8\u0308\uAC00", &["\u11A8\u0308", "\uAC00"]), ("\u11A8\uAC01",
&["\u11A8", "\uAC01"]), ("\u11A8\u0308\uAC01", &["\u11A8\u0308", "\uAC01"]),
("\u11A8\U0001F1E6", &["\u11A8", "\U0001F1E6"]), ("\u11A8\u0308\U0001F1E6",
&["\u11A8\u0308", "\U0001F1E6"]), ("\u11A8\u0378", &["\u11A8", "\u0378"]),
("\u11A8\u0308\u0378", &["\u11A8\u0308", "\u0378"]), ("\uAC00\u0020", &["\uAC00",
"\u0020"]), ("\uAC00\u0308\u0020", &["\uAC00\u0308", "\u0020"]), ("\uAC00\u000D",
&["\uAC00", "\u000D"]), ("\uAC00\u0308\u000D", &["\uAC00\u0308", "\u000D"]),
("\uAC00\u000A", &["\uAC00", "\u000A"]), ("\uAC00\u0308\u000A", &["\uAC00\u0308",
"\u000A"]), ("\uAC00\u0001", &["\uAC00", "\u0001"]), ("\uAC00\u0308\u0001",
&["\uAC00\u0308", "\u0001"]), ("\uAC00\u0300", &["\uAC00\u0300"]),
("\uAC00\u0308\u0300", &["\uAC00\u0308\u0300"]), ("\uAC00\u1100", &["\uAC00",
"\u1100"]), ("\uAC00\u0308\u1100", &["\uAC00\u0308", "\u1100"]), ("\uAC00\u1160",
&["\uAC00\u1160"]), ("\uAC00\u0308\u1160", &["\uAC00\u0308", "\u1160"]),
("\uAC00\u11A8", &["\uAC00\u11A8"]), ("\uAC00\u0308\u11A8", &["\uAC00\u0308",
"\u11A8"]), ("\uAC00\uAC00", &["\uAC00", "\uAC00"]), ("\uAC00\u0308\uAC00",
&["\uAC00\u0308", "\uAC00"]), ("\uAC00\uAC01", &["\uAC00", "\uAC01"]),
("\uAC00\u0308\uAC01", &["\uAC00\u0308", "\uAC01"]), ("\uAC00\U0001F1E6", &["\uAC00",
"\U0001F1E6"]), ("\uAC00\u0308\U0001F1E6", &["\uAC00\u0308", "\U0001F1E6"]),
("\uAC00\u0378", &["\uAC00", "\u0378"]), ("\uAC00\u0308\u0378", &["\uAC00\u0308",
"\u0378"]), ("\uAC01\u0020", &["\uAC01", "\u0020"]), ("\uAC01\u0308\u0020",
&["\uAC01\u0308", "\u0020"]), ("\uAC01\u000D", &["\uAC01", "\u000D"]),
("\uAC01\u0308\u000D", &["\uAC01\u0308", "\u000D"]), ("\uAC01\u000A", &["\uAC01",
"\u000A"]), ("\uAC01\u0308\u000A", &["\uAC01\u0308", "\u000A"]), ("\uAC01\u0001",
&["\uAC01", "\u0001"]), ("\uAC01\u0308\u0001", &["\uAC01\u0308", "\u0001"]),
("\uAC01\u0300", &["\uAC01\u0300"]), ("\uAC01\u0308\u0300", &["\uAC01\u0308\u0300"]),
("\uAC01\u1100", &["\uAC01", "\u1100"]), ("\uAC01\u0308\u1100", &["\uAC01\u0308",
"\u1100"]), ("\uAC01\u1160", &["\uAC01", "\u1160"]), ("\uAC01\u0308\u1160",
&["\uAC01\u0308", "\u1160"]), ("\uAC01\u11A8", &["\uAC01\u11A8"]),
("\uAC01\u0308\u11A8", &["\uAC01\u0308", "\u11A8"]), ("\uAC01\uAC00", &["\uAC01",
"\uAC00"]), ("\uAC01\u0308\uAC00", &["\uAC01\u0308", "\uAC00"]), ("\uAC01\uAC01",
&["\uAC01", "\uAC01"]), ("\uAC01\u0308\uAC01", &["\uAC01\u0308", "\uAC01"]),
("\uAC01\U0001F1E6", &["\uAC01", "\U0001F1E6"]), ("\uAC01\u0308\U0001F1E6",
&["\uAC01\u0308", "\U0001F1E6"]), ("\uAC01\u0378", &["\uAC01", "\u0378"]),
("\uAC01\u0308\u0378", &["\uAC01\u0308", "\u0378"]), ("\U0001F1E6\u0020",
&["\U0001F1E6", "\u0020"]), ("\U0001F1E6\u0308\u0020", &["\U0001F1E6\u0308",
"\u0020"]), ("\U0001F1E6\u000D", &["\U0001F1E6", "\u000D"]),
("\U0001F1E6\u0308\u000D", &["\U0001F1E6\u0308", "\u000D"]), ("\U0001F1E6\u000A",
&["\U0001F1E6", "\u000A"]), ("\U0001F1E6\u0308\u000A", &["\U0001F1E6\u0308",
"\u000A"]), ("\U0001F1E6\u0001", &["\U0001F1E6", "\u0001"]),
("\U0001F1E6\u0308\u0001", &["\U0001F1E6\u0308", "\u0001"]), ("\U0001F1E6\u0300",
&["\U0001F1E6\u0300"]), ("\U0001F1E6\u0308\u0300", &["\U0001F1E6\u0308\u0300"]),
("\U0001F1E6\u1100", &["\U0001F1E6", "\u1100"]), ("\U0001F1E6\u0308\u1100",
&["\U0001F1E6\u0308", "\u1100"]), ("\U0001F1E6\u1160", &["\U0001F1E6", "\u1160"]),
("\U0001F1E6\u0308\u1160", &["\U0001F1E6\u0308", "\u1160"]), ("\U0001F1E6\u11A8",
&["\U0001F1E6", "\u11A8"]), ("\U0001F1E6\u0308\u11A8", &["\U0001F1E6\u0308",
"\u11A8"]), ("\U0001F1E6\uAC00", &["\U0001F1E6", "\uAC00"]),
("\U0001F1E6\u0308\uAC00", &["\U0001F1E6\u0308", "\uAC00"]), ("\U0001F1E6\uAC01",
&["\U0001F1E6", "\uAC01"]), ("\U0001F1E6\u0308\uAC01", &["\U0001F1E6\u0308",
"\uAC01"]), ("\U0001F1E6\U0001F1E6", &["\U0001F1E6\U0001F1E6"]),
("\U0001F1E6\u0308\U0001F1E6", &["\U0001F1E6\u0308", "\U0001F1E6"]),
("\U0001F1E6\u0378", &["\U0001F1E6", "\u0378"]), ("\U0001F1E6\u0308\u0378",
&["\U0001F1E6\u0308", "\u0378"]), ("\u0378\u0020", &["\u0378", "\u0020"]),
("\u0378\u0308\u0020", &["\u0378\u0308", "\u0020"]), ("\u0378\u000D", &["\u0378",
"\u000D"]), ("\u0378\u0308\u000D", &["\u0378\u0308", "\u000D"]), ("\u0378\u000A",
&["\u0378", "\u000A"]), ("\u0378\u0308\u000A", &["\u0378\u0308", "\u000A"]),
("\u0378\u0001", &["\u0378", "\u0001"]), ("\u0378\u0308\u0001", &["\u0378\u0308",
"\u0001"]), ("\u0378\u0300", &["\u0378\u0300"]), ("\u0378\u0308\u0300",
&["\u0378\u0308\u0300"]), ("\u0378\u1100", &["\u0378", "\u1100"]),
("\u0378\u0308\u1100", &["\u0378\u0308", "\u1100"]), ("\u0378\u1160", &["\u0378",
"\u1160"]), ("\u0378\u0308\u1160", &["\u0378\u0308", "\u1160"]), ("\u0378\u11A8",
&["\u0378", "\u11A8"]), ("\u0378\u0308\u11A8", &["\u0378\u0308", "\u11A8"]),
("\u0378\uAC00", &["\u0378", "\uAC00"]), ("\u0378\u0308\uAC00", &["\u0378\u0308",
"\uAC00"]), ("\u0378\uAC01", &["\u0378", "\uAC01"]), ("\u0378\u0308\uAC01",
&["\u0378\u0308", "\uAC01"]), ("\u0378\U0001F1E6", &["\u0378", "\U0001F1E6"]),
("\u0378\u0308\U0001F1E6", &["\u0378\u0308", "\U0001F1E6"]), ("\u0378\u0378",
&["\u0378", "\u0378"]), ("\u0378\u0308\u0378", &["\u0378\u0308", "\u0378"]),
("\u0061\U0001F1E6\u0062", &["\u0061", "\U0001F1E6", "\u0062"]),
("\U0001F1F7\U0001F1FA", &["\U0001F1F7\U0001F1FA"]),
("\U0001F1F7\U0001F1FA\U0001F1F8", &["\U0001F1F7\U0001F1FA\U0001F1F8"]),
("\U0001F1F7\U0001F1FA\U0001F1F8\U0001F1EA",
&["\U0001F1F7\U0001F1FA\U0001F1F8\U0001F1EA"]),
("\U0001F1F7\U0001F1FA\u200B\U0001F1F8\U0001F1EA", &["\U0001F1F7\U0001F1FA", "\u200B",
"\U0001F1F8\U0001F1EA"]), ("\U0001F1E6\U0001F1E7\U0001F1E8",
&["\U0001F1E6\U0001F1E7\U0001F1E8"]), ("\U0001F1E6\u200D\U0001F1E7\U0001F1E8",
&["\U0001F1E6\u200D", "\U0001F1E7\U0001F1E8"]),
("\U0001F1E6\U0001F1E7\u200D\U0001F1E8", &["\U0001F1E6\U0001F1E7\u200D",
"\U0001F1E8"]), ("\u0020\u200D\u0646", &["\u0020\u200D", "\u0646"]),
("\u0646\u200D\u0020", &["\u0646\u200D", "\u0020"]),
];
let test_diff: [(_, &[_], &[_]), .. 23] = [
("\u0020\u0903", &["\u0020\u0903"], &["\u0020", "\u0903"]), ("\u0020\u0308\u0903",
&["\u0020\u0308\u0903"], &["\u0020\u0308", "\u0903"]), ("\u000D\u0308\u0903",
&["\u000D", "\u0308\u0903"], &["\u000D", "\u0308", "\u0903"]), ("\u000A\u0308\u0903",
&["\u000A", "\u0308\u0903"], &["\u000A", "\u0308", "\u0903"]), ("\u0001\u0308\u0903",
&["\u0001", "\u0308\u0903"], &["\u0001", "\u0308", "\u0903"]), ("\u0300\u0903",
&["\u0300\u0903"], &["\u0300", "\u0903"]), ("\u0300\u0308\u0903",
&["\u0300\u0308\u0903"], &["\u0300\u0308", "\u0903"]), ("\u0903\u0903",
&["\u0903\u0903"], &["\u0903", "\u0903"]), ("\u0903\u0308\u0903",
&["\u0903\u0308\u0903"], &["\u0903\u0308", "\u0903"]), ("\u1100\u0903",
&["\u1100\u0903"], &["\u1100", "\u0903"]), ("\u1100\u0308\u0903",
&["\u1100\u0308\u0903"], &["\u1100\u0308", "\u0903"]), ("\u1160\u0903",
&["\u1160\u0903"], &["\u1160", "\u0903"]), ("\u1160\u0308\u0903",
&["\u1160\u0308\u0903"], &["\u1160\u0308", "\u0903"]), ("\u11A8\u0903",
&["\u11A8\u0903"], &["\u11A8", "\u0903"]), ("\u11A8\u0308\u0903",
&["\u11A8\u0308\u0903"], &["\u11A8\u0308", "\u0903"]), ("\uAC00\u0903",
&["\uAC00\u0903"], &["\uAC00", "\u0903"]), ("\uAC00\u0308\u0903",
&["\uAC00\u0308\u0903"], &["\uAC00\u0308", "\u0903"]), ("\uAC01\u0903",
&["\uAC01\u0903"], &["\uAC01", "\u0903"]), ("\uAC01\u0308\u0903",
&["\uAC01\u0308\u0903"], &["\uAC01\u0308", "\u0903"]), ("\U0001F1E6\u0903",
&["\U0001F1E6\u0903"], &["\U0001F1E6", "\u0903"]), ("\U0001F1E6\u0308\u0903",
&["\U0001F1E6\u0308\u0903"], &["\U0001F1E6\u0308", "\u0903"]), ("\u0378\u0903",
&["\u0378\u0903"], &["\u0378", "\u0903"]), ("\u0378\u0308\u0903",
&["\u0378\u0308\u0903"], &["\u0378\u0308", "\u0903"]),
];
for &(s, g) in test_same.iter() {
// test forward iterator
assert!(order::equals(s.graphemes(true), g.iter().map(|&x| x)));
assert!(order::equals(s.graphemes(false), g.iter().map(|&x| x)));
// test reverse iterator
assert!(order::equals(s.graphemes(true).rev(), g.iter().rev().map(|&x| x)));
assert!(order::equals(s.graphemes(false).rev(), g.iter().rev().map(|&x| x)));
}
for &(s, gt, gf) in test_diff.iter() {
// test forward iterator
assert!(order::equals(s.graphemes(true), gt.iter().map(|&x| x)));
assert!(order::equals(s.graphemes(false), gf.iter().map(|&x| x)));
// test reverse iterator
assert!(order::equals(s.graphemes(true).rev(), gt.iter().rev().map(|&x| x)));
assert!(order::equals(s.graphemes(false).rev(), gf.iter().rev().map(|&x| x)));
}
// test the indices iterators
let s = "a̐éö̲\r\n";
let gr_inds = s.grapheme_indices(true).collect::<Vec<(uint, &str)>>();
let b: &[_] = &[(0u, "a̐"), (3, "é"), (6, "ö̲"), (11, "\r\n")];
assert_eq!(gr_inds.as_slice(), b);
let gr_inds = s.grapheme_indices(true).rev().collect::<Vec<(uint, &str)>>();
let b: &[_] = &[(11, "\r\n"), (6, "ö̲"), (3, "é"), (0u, "a̐")];
assert_eq!(gr_inds.as_slice(), b);
let mut gr_inds = s.grapheme_indices(true);
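// size_hint before iterating: at least one grapheme remains, at most one per remaining byte (13 bytes here)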
let e1 = gr_inds.size_hint();
assert_eq!(e1, (1, Some(13)));
let c = gr_inds.count();
assert_eq!(c, 4);
let e2 = gr_inds.size_hint();
assert_eq!(e2, (0, Some(0)));
// make sure the reverse iterator does the right thing with "\n" at beginning of string
let s = "\n\r\n\r";
let gr = s.graphemes(true).rev().collect::<Vec<&str>>();
let b: &[_] = &["\r", "\r\n", "\n"];
assert_eq!(gr.as_slice(), b);
}
#[test]
fn test_split_strator() {
fn t(s: &str, sep: &str, u: &[&str]) {
let v: Vec<&str> = s.split_str(sep).collect();
assert_eq!(v.as_slice(), u.as_slice());
}
t("--1233345--", "12345", ["--1233345--"]);
t("abc::hello::there", "::", ["abc", "hello", "there"]);
t("::hello::there", "::", ["", "hello", "there"]);
t("hello::there::", "::", ["hello", "there", ""]);
t("::hello::there::", "::", ["", "hello", "there", ""]);
t("ประเทศไทย中华Việt Nam", "中华", ["ประเทศไทย", "Việt Nam"]);
t("zzXXXzzYYYzz", "zz", ["", "XXX", "YYY", ""]);
t("zzXXXzYYYz", "XXX", ["zz", "zYYYz"]);
t(".XXX.YYY.", ".", ["", "XXX", "YYY", ""]);
t("", ".", [""]);
t("zz", "zz", ["",""]);
t("ok", "z", ["ok"]);
t("zzz", "zz", ["","z"]);
t("zzzzz", "zz", ["","","z"]);
}
#[test]
fn test_str_default() {
use std::default::Default;
fn t<S: Default + Str>() {
let s: S = Default::default();
assert_eq!(s.as_slice(), "");
}
t::<&str>();
t::<String>();
}
#[test]
fn test_str_container() {
fn sum_len<S: Collection>(v: &[S]) -> uint {
v.iter().map(|x| x.len()).sum()
}
let s = String::from_str("01234");
assert_eq!(5, sum_len(["012", "", "34"]));
assert_eq!(5, sum_len([String::from_str("01"), String::from_str("2"),
String::from_str("34"), String::from_str("")]));
assert_eq!(5, sum_len([s.as_slice()]));
}
#[test]
fn test_str_from_utf8() {
let xs = b"hello";
assert_eq!(from_utf8(xs), Some("hello"));
let xs = "ศไทย中华Việt Nam".as_bytes();
assert_eq!(from_utf8(xs), Some("ศไทย中华Việt Nam"));
let xs = b"hello\xFF";
assert_eq!(from_utf8(xs), None);
}
#[test]
fn test_maybe_owned_traits() {
let s = Slice("abcde");
assert_eq!(s.len(), 5);
assert_eq!(s.as_slice(), "abcde");
assert_eq!(String::from_str(s.as_slice()).as_slice(), "abcde");
assert_eq!(format!("{}", s).as_slice(), "abcde");
assert!(s.lt(&Owned(String::from_str("bcdef"))));
assert_eq!(Slice(""), Default::default());
let o = Owned(String::from_str("abcde"));
assert_eq!(o.len(), 5);
assert_eq!(o.as_slice(), "abcde");
assert_eq!(String::from_str(o.as_slice()).as_slice(), "abcde");
assert_eq!(format!("{}", o).as_slice(), "abcde");
assert!(o.lt(&Slice("bcdef")));
assert_eq!(Owned(String::from_str("")), Default::default());
assert!(s.cmp(&o) == Equal);
assert!(s.equiv(&o));
assert!(o.cmp(&s) == Equal);
assert!(o.equiv(&s));
}
#[test]
fn test_maybe_owned_methods() {
let s = Slice("abcde");
assert!(s.is_slice());
assert!(!s.is_owned());
let o = Owned(String::from_str("abcde"));
assert!(!o.is_slice());
assert!(o.is_owned());
}
#[test]
fn test_maybe_owned_clone() {
assert_eq!(Owned(String::from_str("abcde")), Slice("abcde").clone());
assert_eq!(Owned(String::from_str("abcde")), Owned(String::from_str("abcde")).clone());
assert_eq!(Slice("abcde"), Slice("abcde").clone());
assert_eq!(Slice("abcde"), Owned(String::from_str("abcde")).clone());
}
#[test]
fn test_maybe_owned_into_string() {
assert_eq!(Slice("abcde").into_string(), String::from_str("abcde"));
assert_eq!(Owned(String::from_str("abcde")).into_string(),
String::from_str("abcde"));
}
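    // Since `MaybeOwned` equality ignores the variant, each `into_maybe_owned`
    // result compares equal to both the `Slice` and the `Owned` form.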
#[test]
fn test_into_maybe_owned() {
assert_eq!("abcde".into_maybe_owned(), Slice("abcde"));
assert_eq!((String::from_str("abcde")).into_maybe_owned(), Slice("abcde"));
assert_eq!("abcde".into_maybe_owned(), Owned(String::from_str("abcde")));
assert_eq!((String::from_str("abcde")).into_maybe_owned(),
Owned(String::from_str("abcde")));
}
}
#[cfg(test)]
mod bench {
use test::Bencher;
use test::black_box;
use super::*;
use std::iter::{Iterator, DoubleEndedIterator};
use std::collections::Collection;
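    // Char-decoding benchmarks: count the chars of mixed ASCII/multi-byte text
    // forwards and in reverse, via `count()` and via an explicit `for` loop,
    // with a pure-ASCII baseline and `char_indices` variants for comparison.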
#[bench]
fn char_iterator(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
b.iter(|| s.chars().count());
}
#[bench]
fn char_iterator_for(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
b.iter(|| {
for ch in s.chars() { black_box(ch) }
});
}
#[bench]
fn char_iterator_ascii(b: &mut Bencher) {
let s = "Mary had a little lamb, Little lamb
Mary had a little lamb, Little lamb
Mary had a little lamb, Little lamb
Mary had a little lamb, Little lamb
Mary had a little lamb, Little lamb
Mary had a little lamb, Little lamb";
b.iter(|| s.chars().count());
}
#[bench]
fn char_iterator_rev(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
b.iter(|| s.chars().rev().count());
}
#[bench]
fn char_iterator_rev_for(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
b.iter(|| {
for ch in s.chars().rev() { black_box(ch) }
});
}
#[bench]
fn char_indicesator(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.char_len();
b.iter(|| assert_eq!(s.char_indices().count(), len));
}
#[bench]
fn char_indicesator_rev(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let len = s.char_len();
b.iter(|| assert_eq!(s.char_indices().rev().count(), len));
}
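    // Split benchmarks: a plain `char` separator can use the ASCII fast path,
    // while the `NotAscii` wrapper reports `only_ascii() == false` to force the
    // generic `CharEq` path; extern-fn, closure, and `&[char]` matchers follow.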
#[bench]
fn split_unicode_ascii(b: &mut Bencher) {
let s = "ประเทศไทย中华Việt Namประเทศไทย中华Việt Nam";
b.iter(|| assert_eq!(s.split('V').count(), 3));
}
#[bench]
fn split_unicode_not_ascii(b: &mut Bencher) {
struct NotAscii(char);
impl CharEq for NotAscii {
fn matches(&mut self, c: char) -> bool {
let NotAscii(cc) = *self;
cc == c
}
fn only_ascii(&self) -> bool { false }
}
let s = "ประเทศไทย中华Việt Namประเทศไทย中华Việt Nam";
b.iter(|| assert_eq!(s.split(NotAscii('V')).count(), 3));
}
#[bench]
fn split_ascii(b: &mut Bencher) {
let s = "Mary had a little lamb, Little lamb, little-lamb.";
let len = s.split(' ').count();
b.iter(|| assert_eq!(s.split(' ').count(), len));
}
#[bench]
fn split_not_ascii(b: &mut Bencher) {
struct NotAscii(char);
impl CharEq for NotAscii {
#[inline]
fn matches(&mut self, c: char) -> bool {
let NotAscii(cc) = *self;
cc == c
}
fn only_ascii(&self) -> bool { false }
}
let s = "Mary had a little lamb, Little lamb, little-lamb.";
let len = s.split(' ').count();
b.iter(|| assert_eq!(s.split(NotAscii(' ')).count(), len));
}
#[bench]
fn split_extern_fn(b: &mut Bencher) {
let s = "Mary had a little lamb, Little lamb, little-lamb.";
let len = s.split(' ').count();
fn pred(c: char) -> bool { c == ' ' }
b.iter(|| assert_eq!(s.split(pred).count(), len));
}
#[bench]
fn split_closure(b: &mut Bencher) {
let s = "Mary had a little lamb, Little lamb, little-lamb.";
let len = s.split(' ').count();
b.iter(|| assert_eq!(s.split(|c: char| c == ' ').count(), len));
}
#[bench]
fn split_slice(b: &mut Bencher) {
let s = "Mary had a little lamb, Little lamb, little-lamb.";
let len = s.split(' ').count();
let c: &[char] = &[' '];
b.iter(|| assert_eq!(s.split(c).count(), len));
}
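    // UTF-8 validation benchmarks over 100-byte inputs: one pure ASCII, one
    // dominated by multi-byte sequences.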
#[bench]
fn is_utf8_100_ascii(b: &mut Bencher) {
let s = b"Hello there, the quick brown fox jumped over the lazy dog! \
Lorem ipsum dolor sit amet, consectetur. ";
assert_eq!(100, s.len());
b.iter(|| {
is_utf8(s)
});
}
#[bench]
fn is_utf8_100_multibyte(b: &mut Bencher) {
let s = "𐌀𐌖𐌋𐌄𐌑𐌉ปรدولة الكويتทศไทย中华𐍅𐌿𐌻𐍆𐌹𐌻𐌰".as_bytes();
assert_eq!(100, s.len());
b.iter(|| {
is_utf8(s)
});
}
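    // `connect` joins ten copies of the string with an empty separator; the
    // assert pins the expected byte length.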
#[bench]
fn bench_connect(b: &mut Bencher) {
let s = "ศไทย中华Việt Nam; Mary had a little lamb, Little lamb";
let sep = "";
let v = [s, s, s, s, s, s, s, s, s, s];
b.iter(|| {
assert_eq!(v.connect(sep).len(), s.len() * 10 + sep.len() * 9);
})
}
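    // Substring-search benchmarks: a short needle found in a short haystack, a
    // guaranteed miss in a long haystack, a pathological input for naive
    // search, and a needle equal to the whole haystack.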
#[bench]
fn bench_contains_short_short(b: &mut Bencher) {
let haystack = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
let needle = "sit";
b.iter(|| {
assert!(haystack.contains(needle));
})
}
#[bench]
fn bench_contains_short_long(b: &mut Bencher) {
let haystack = "\
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse quis lorem sit amet dolor \
ultricies condimentum. Praesent iaculis purus elit, ac malesuada quam malesuada in. Duis sed orci \
eros. Suspendisse sit amet magna mollis, mollis nunc luctus, imperdiet mi. Integer fringilla non \
sem ut lacinia. Fusce varius tortor a risus porttitor hendrerit. Morbi mauris dui, ultricies nec \
tempus vel, gravida nec quam.
In est dui, tincidunt sed tempus interdum, adipiscing laoreet ante. Etiam tempor, tellus quis \
sagittis interdum, nulla purus mattis sem, quis auctor erat odio ac tellus. In nec nunc sit amet \
diam volutpat molestie at sed ipsum. Vestibulum laoreet consequat vulputate. Integer accumsan \
lorem ac dignissim placerat. Suspendisse convallis faucibus lorem. Aliquam erat volutpat. In vel \
eleifend felis. Sed suscipit nulla lorem, sed mollis est sollicitudin et. Nam fermentum egestas \
interdum. Curabitur ut nisi justo.
Sed sollicitudin ipsum tellus, ut condimentum leo eleifend nec. Cras ut velit ante. Phasellus nec \
mollis odio. Mauris molestie erat in arcu mattis, at aliquet dolor vehicula. Quisque malesuada \
lectus sit amet nisi pretium, a condimentum ipsum porta. Morbi at dapibus diam. Praesent egestas \
est sed risus elementum, eu rutrum metus ultrices. Etiam fermentum consectetur magna, id rutrum \
felis accumsan a. Aliquam ut pellentesque libero. Sed mi nulla, lobortis eu tortor id, suscipit \
ultricies neque. Morbi iaculis sit amet risus at iaculis. Praesent eget ligula quis turpis \
feugiat suscipit vel non arcu. Interdum et malesuada fames ac ante ipsum primis in faucibus. \
Aliquam sit amet placerat lorem.
Cras a lacus vel ante posuere elementum. Nunc est leo, bibendum ut facilisis vel, bibendum at \
mauris. Nullam adipiscing diam vel odio ornare, luctus adipiscing mi luctus. Nulla facilisi. \
Mauris adipiscing bibendum neque, quis adipiscing lectus tempus et. Sed feugiat erat et nisl \
lobortis pharetra. Donec vitae erat enim. Nullam sit amet felis et quam lacinia tincidunt. Aliquam \
suscipit dapibus urna. Sed volutpat urna in magna pulvinar volutpat. Phasellus nec tellus ac diam \
cursus accumsan.
Nam lectus enim, dapibus non nisi tempor, consectetur convallis massa. Maecenas eleifend dictum \
feugiat. Etiam quis mauris vel risus luctus mattis a a nunc. Nullam orci quam, imperdiet id \
vehicula in, porttitor ut nibh. Duis sagittis adipiscing nisl vitae congue. Donec mollis risus eu \
leo suscipit, varius porttitor nulla porta. Pellentesque ut sem nec nisi euismod vehicula. Nulla \
malesuada sollicitudin quam eu fermentum.";
let needle = "english";
b.iter(|| {
assert!(!haystack.contains(needle));
})
}
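    // Worst case for a naive search: every window of the haystack shares a long
    // run of 'a's with the needle but differs in the final byte.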
#[bench]
fn bench_contains_bad_naive(b: &mut Bencher) {
let haystack = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
let needle = "aaaaaaaab";
b.iter(|| {
assert!(!haystack.contains(needle));
})
}
#[bench]
fn bench_contains_equal(b: &mut Bencher) {
let haystack = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
let needle = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
b.iter(|| {
assert!(haystack.contains(needle));
})
}
}