diff --git a/src/doc/guide-strings.md b/src/doc/guide-strings.md new file mode 100644 index 00000000000..6f301afc849 --- /dev/null +++ b/src/doc/guide-strings.md @@ -0,0 +1,129 @@ +% The Strings Guide + +# Strings + +Strings are an important concept to master in any programming language. If you +come from a managed language background, you may be surprised at the complexity +of string handling in a systems programming language. Efficient access and +allocation of memory for a dynamically sized structure involves a lot of +details. Luckily, Rust has lots of tools to help us here. + +A **string** is a sequence of unicode scalar values encoded as a stream of +UTF-8 bytes. All strings are guaranteed to be validly-encoded UTF-8 sequences. +Additionally, strings are not null-terminated and can contain null bytes. + +Rust has two main types of strings: `&str` and `String`. + +## &str + +The first kind is a `&str`. This is pronounced a 'string slice.' String literals +are of the type `&str`: + +```{rust} +let string = "Hello there."; +``` + +Like any Rust type, string slices have an associated lifetime. A string literal +is a `&'static str`. A string slice can be written without an explicit +lifetime in many cases, such as in function arguments. In these cases the +lifetime will be inferred: + +```{rust} +fn takes_slice(slice: &str) { + println!("Got: {}", slice); +} +``` + +Like vector slices, string slices are simply a pointer plus a length. This +means that they're a 'view' into an already-allocated string, such as a +`&'static str` or a `String`. + +## String + +A `String` is a heap-allocated string. This string is growable, and is also +guaranteed to be UTF-8. + +```{rust} +let mut s = "Hello".to_string(); +println!("{}", s); + +s.push_str(", world."); +println!("{}", s); +``` + +You can coerce a `String` into a `&str` with the `as_slice()` method: + +```{rust} +fn takes_slice(slice: &str) { + println!("Got: {}", slice); +} + +fn main() { + let s = "Hello".to_string(); + takes_slice(s.as_slice()); +} +``` + +You can also get a `&str` from a stack-allocated array of bytes: + +```{rust} +use std::str; + +let x: &[u8] = &[b'a', b'b']; +let stack_str: &str = str::from_utf8(x).unwrap(); +``` + +## Best Practices + +### `String` vs. `&str` + +In general, you should prefer `String` when you need ownership, and `&str` when +you just need to borrow a string. This is very similar to using `Vec` vs. `&[T]`, +and `T` vs `&T` in general. + +This means starting off with this: + +```{rust,ignore} +fn foo(s: &str) { +``` + +and only moving to this: + +```{rust,ignore} +fn foo(s: String) { +``` + +If you have good reason. It's not polite to hold on to ownership you don't +need, and it can make your lifetimes more complex. Furthermore, you can pass +either kind of string into `foo` by using `.as_slice()` on any `String` you +need to pass in, so the `&str` version is more flexible. + +### Comparisons + +To compare a String to a constant string, prefer `as_slice()`... + +```{rust} +fn compare(string: String) { + if string.as_slice() == "Hello" { + println!("yes"); + } +} +``` + +... over `to_string()`: + +```{rust} +fn compare(string: String) { + if string == "Hello".to_string() { + println!("yes"); + } +} +``` + +Converting a `String` to a `&str` is cheap, but converting the `&str` to a +`String` involves an allocation. + +## Other Documentation + +* [the `&str` API documentation](/std/str/index.html) +* [the `String` API documentation](std/string/index.html) diff --git a/src/libcollections/str.rs b/src/libcollections/str.rs index 85921f1176a..bd6a13ae2e3 100644 --- a/src/libcollections/str.rs +++ b/src/libcollections/str.rs @@ -55,10 +55,10 @@ fn main() { # Representation -Rust's string type, `str`, is a sequence of unicode codepoints encoded as a -stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly -encoded UTF-8 sequences. Additionally, strings are not null-terminated -and can contain null codepoints. +Rust's string type, `str`, is a sequence of unicode scalar values encoded as a +stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8 +sequences. Additionally, strings are not null-terminated and can contain null +bytes. The actual representation of strings have direct mappings to vectors: `&str` is the same as `&[u8]`.