diff --git a/src/doc/trpl/strings.md b/src/doc/trpl/strings.md index 2c2e6a8c7c5..6ed4c7cb1b3 100644 --- a/src/doc/trpl/strings.md +++ b/src/doc/trpl/strings.md @@ -1,36 +1,34 @@ % Strings -Strings are an important concept for any programmer to master. Rust's string +Strings are an important concept for any programmer to master. Rust’s string handling system is a bit different from other languages, due to its systems focus. Any time you have a data structure of variable size, things can get -tricky, and strings are a re-sizable data structure. That being said, Rust's +tricky, and strings are a re-sizable data structure. That being said, Rust’s strings also work differently than in some other systems languages, such as C. -Let's dig into the details. A *string* is a sequence of Unicode scalar values -encoded as a stream of UTF-8 bytes. All strings are guaranteed to be -validly encoded UTF-8 sequences. Additionally, strings are not null-terminated -and can contain null bytes. +Let’s dig into the details. A ‘string’ is a sequence of Unicode scalar values +encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid +encoding of UTF-8 sequences. Additionally, unlike some systems languages, +strings are not null-terminated and can contain null bytes. -Rust has two main types of strings: `&str` and `String`. +Rust has two main types of strings: `&str` and `String`. Let’s talk about +`&str` first. These are called ‘string slices’. String literals are of the type +`&'static str`: -The first kind is a `&str`. These are called *string slices*. String literals -are of the type `&str`: - -```{rust} -let string = "Hello there."; // string: &str +```rust +let string = "Hello there."; // string: &'static str ``` -This string is statically allocated, meaning that it's saved inside our +This string is statically allocated, meaning that it’s saved inside our compiled program, and exists for the entire duration it runs. The `string` binding is a reference to this statically allocated string. String slices have a fixed size, and cannot be mutated. -A `String`, on the other hand, is a heap-allocated string. This string -is growable, and is also guaranteed to be UTF-8. `String`s are -commonly created by converting from a string slice using the -`to_string` method. +A `String`, on the other hand, is a heap-allocated string. This string is +growable, and is also guaranteed to be UTF-8. `String`s are commonly created by +converting from a string slice using the `to_string` method. -```{rust} +```rust let mut s = "Hello".to_string(); // mut s: String println!("{}", s); @@ -54,8 +52,78 @@ fn main() { Viewing a `String` as a `&str` is cheap, but converting the `&str` to a `String` involves allocating memory. No reason to do that unless you have to! -That's the basics of strings in Rust! They're probably a bit more complicated -than you are used to, if you come from a scripting language, but when the -low-level details matter, they really matter. Just remember that `String`s -allocate memory and control their data, while `&str`s are a reference to -another string, and you'll be all set. +## Indexing + +Because strings are valid UTF-8, strings do not support indexing: + +```rust,ignore +let s = "hello"; + +println!("The first letter of s is {}", s[0]); // ERROR!!! +``` + +Usually, access to a vector with `[]` is very fast. But, because each character +in a UTF-8 encoded string can be multiple bytes, you have to walk over the +string to find the nᵗʰ letter of a string. This is a significantly more +expensive operation, and we don’t want to be misleading. Furthermore, ‘letter’ +isn’t something defined in Unicode, exactly. We can choose to look at a string as +individual bytes, or as codepoints: + +```rust +let hachiko = "忠犬ハチ公"; + +for b in hachiko.as_bytes() { +print!("{}, ", b); +} + +println!(""); + +for c in hachiko.chars() { +print!("{}, ", c); +} + +println!(""); +``` + +This prints: + +```text +229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172, +忠, 犬, ハ, チ, 公, +``` + +As you can see, there are more bytes than `char`s. + +You can get something similar to an index like this: + +```rust +# let hachiko = "忠犬ハチ公"; +let dog = hachiko.chars().nth(1); // kinda like hachiko[1] +``` + +This emphasizes that we have to go through the whole list of `chars`. + +## Concatenation + +If you have a `String`, you can concatenate a `&str` to the end of it: + +```rust +let hello = "Hello ".to_string(); +let world = "world!"; + +let hello_world = hello + world; +``` + +But if you have two `String`s, you need an `&`: + +```rust +let hello = "Hello ".to_string(); +let world = "world!".to_string(); + +let hello_world = hello + &world; +``` + +This is because `&String` can automatically coerece to a `&str`. This is a +feature called ‘[`Deref` coercions][dc]’. + +[dc]: deref-coercions.html