rust/src/doc/book/strings.md
Steve Klabnik 024aa9a345 src/doc/trpl -> src/doc/book
The book was located under 'src/doc/trpl' because originally, it was
going to be hosted under that URL. Late in the game, before 1.0, we
decided that /book was a better one, so we changed the output, but
not the input. This causes confusion for no good reason. So we'll change
the source directory to look like the output directory, like for every
other thing in src/doc.
2015-11-19 11:30:18 -05:00

190 lines
5.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

% Strings
Strings are an important concept for any programmer to master. Rusts string
handling system is a bit different from other languages, due to its systems
focus. Any time you have a data structure of variable size, things can get
tricky, and strings are a re-sizable data structure. That being said, Rusts
strings also work differently than in some other systems languages, such as C.
Lets dig into the details. A string is a sequence of Unicode scalar values
encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid
encoding of UTF-8 sequences. Additionally, unlike some systems languages,
strings are not null-terminated and can contain null bytes.
Rust has two main types of strings: `&str` and `String`. Lets talk about
`&str` first. These are called string slices. A string slice has a fixed
size, and cannot be mutated. It is a reference to a sequence of UTF-8 bytes.
```rust
let greeting = "Hello there."; // greeting: &'static str
```
`"Hello there."` is a string literal and its type is `&'static str`. A string
literal is a string slice that is statically allocated, meaning that its saved
inside our compiled program, and exists for the entire duration it runs. The
`greeting` binding is a reference to this statically allocated string. Any
function expecting a string slice will also accept a string literal.
String literals can span multiple lines. There are two forms. The first will
include the newline and the leading spaces:
```rust
let s = "foo
bar";
assert_eq!("foo\n bar", s);
```
The second, with a `\`, trims the spaces and the newline:
```rust
let s = "foo\
bar";
assert_eq!("foobar", s);
```
Rust has more than just `&str`s though. A `String`, is a heap-allocated string.
This string is growable, and is also guaranteed to be UTF-8. `String`s are
commonly created by converting from a string slice using the `to_string`
method.
```rust
let mut s = "Hello".to_string(); // mut s: String
println!("{}", s);
s.push_str(", world.");
println!("{}", s);
```
`String`s will coerce into `&str` with an `&`:
```rust
fn takes_slice(slice: &str) {
println!("Got: {}", slice);
}
fn main() {
let s = "Hello".to_string();
takes_slice(&s);
}
```
This coercion does not happen for functions that accept one of `&str`s traits
instead of `&str`. For example, [`TcpStream::connect`][connect] has a parameter
of type `ToSocketAddrs`. A `&str` is okay but a `String` must be explicitly
converted using `&*`.
```rust,no_run
use std::net::TcpStream;
TcpStream::connect("192.168.0.1:3000"); // &str parameter
let addr_string = "192.168.0.1:3000".to_string();
TcpStream::connect(&*addr_string); // convert addr_string to &str
```
Viewing a `String` as a `&str` is cheap, but converting the `&str` to a
`String` involves allocating memory. No reason to do that unless you have to!
## Indexing
Because strings are valid UTF-8, strings do not support indexing:
```rust,ignore
let s = "hello";
println!("The first letter of s is {}", s[0]); // ERROR!!!
```
Usually, access to a vector with `[]` is very fast. But, because each character
in a UTF-8 encoded string can be multiple bytes, you have to walk over the
string to find the nᵗʰ letter of a string. This is a significantly more
expensive operation, and we dont want to be misleading. Furthermore, letter
isnt something defined in Unicode, exactly. We can choose to look at a string as
individual bytes, or as codepoints:
```rust
let hachiko = "忠犬ハチ公";
for b in hachiko.as_bytes() {
print!("{}, ", b);
}
println!("");
for c in hachiko.chars() {
print!("{}, ", c);
}
println!("");
```
This prints:
```text
229, 191, 160, 231, 138, 172, 227, 131, 143, 227, 131, 129, 229, 133, 172,
忠, 犬, ハ, チ, 公,
```
As you can see, there are more bytes than `char`s.
You can get something similar to an index like this:
```rust
# let hachiko = "忠犬ハチ公";
let dog = hachiko.chars().nth(1); // kinda like hachiko[1]
```
This emphasizes that we have to walk from the beginning of the list of `chars`.
## Slicing
You can get a slice of a string with slicing syntax:
```rust
let dog = "hachiko";
let hachi = &dog[0..5];
```
But note that these are _byte_ offsets, not _character_ offsets. So
this will fail at runtime:
```rust,should_panic
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
```
with this error:
```text
thread '<main>' panicked at 'index 0 and/or 2 in `忠犬ハチ公` do not lie on
character boundary'
```
## Concatenation
If you have a `String`, you can concatenate a `&str` to the end of it:
```rust
let hello = "Hello ".to_string();
let world = "world!";
let hello_world = hello + world;
```
But if you have two `String`s, you need an `&`:
```rust
let hello = "Hello ".to_string();
let world = "world!".to_string();
let hello_world = hello + &world;
```
This is because `&String` can automatically coerce to a `&str`. This is a
feature called [`Deref` coercions][dc].
[dc]: deref-coercions.html
[connect]: ../std/net/struct.TcpStream.html#method.connect