auto merge of #15593 : steveklabnik/rust/string_guide, r=kballard
I decided to change it up a little today and hack out the beginning of the String guide. Strings are different enough in Rust that I think they deserve a specific guide, especially for those who are used to managed languages.

I decided to start with Strings because they get asked about a lot in IRC, and also based on discussions like this one on reddit: http://www.reddit.com/r/rust/comments/2ac390/generic_string_literals/

I blatantly stole bits from our other documentation on Strings. It's a little sparse at the moment, but I wanted to start somewhere.

I am not exactly sure what should go in "Best Practices," and would like feedback from the team on this, specifically because of comments like this one: http://www.reddit.com/r/rust/comments/2ac390/generic_string_literals/citmxb5
commit cebed8ab92
src/doc/guide-strings.md (new file, 129 lines)
@@ -0,0 +1,129 @@
% The Strings Guide

# Strings

Strings are an important concept to master in any programming language. If you
come from a managed language background, you may be surprised at the complexity
of string handling in a systems programming language. Efficient access and
allocation of memory for a dynamically sized structure involves a lot of
details. Luckily, Rust has lots of tools to help us here.

A **string** is a sequence of Unicode scalar values encoded as a stream of
UTF-8 bytes. All strings are guaranteed to be validly-encoded UTF-8 sequences.
Additionally, strings are not null-terminated and can contain null bytes.

Rust has two main types of strings: `&str` and `String`.
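To make the definition above concrete, here is a minimal sketch, assuming a current stable Rust toolchain rather than the 2014 compiler this guide was written against. The helper names `byte_and_char_counts` and `has_nul` are hypothetical and exist only for illustration. It shows that the byte length and the number of scalar values can differ, and that a null byte is just another byte inside a string.

```rust
/// Returns (length in UTF-8 bytes, number of Unicode scalar values).
/// Hypothetical helper for illustration only.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Returns true if `s` contains a null byte anywhere inside it.
/// Rust strings carry their length, so a null byte does not end them.
fn has_nul(s: &str) -> bool {
    s.as_bytes().contains(&0)
}
```

For example, `byte_and_char_counts("h\u{e9}llo")` counts more bytes than scalar values, because U+00E9 takes two bytes in UTF-8.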

## &str

The first kind is a `&str`. This is called a 'string slice.' String literals
are of the type `&str`:

```{rust}
let string = "Hello there.";
```

Like any reference in Rust, string slices have an associated lifetime. A string literal
is a `&'static str`. A string slice can be written without an explicit
lifetime in many cases, such as in function arguments. In these cases the
lifetime will be inferred:

```{rust}
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}
```

Like vector slices, string slices are simply a pointer plus a length. This
means that they're a 'view' into an already-allocated string, such as a
`&'static str` or a `String`.
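The "pointer plus a length" description can be checked directly. The following is a sketch for current stable Rust; `prefix` and `slice_is_two_words` are hypothetical names introduced here, not part of the guide.

```rust
use std::mem::size_of;

/// Borrows the first `n` bytes of `s` as a slice; no string data is copied.
/// Panics if `n` is out of range or not on a character boundary.
fn prefix(s: &str, n: usize) -> &str {
    &s[..n]
}

/// True when a `&str` occupies exactly two machine words:
/// one for the pointer and one for the length.
fn slice_is_two_words() -> bool {
    size_of::<&str>() == 2 * size_of::<usize>()
}
```

Calling `prefix(&owned, 5)` on an owned `String` returns a slice whose pointer is the same as the start of the `String`'s buffer, which is what being a 'view' means in practice.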

## String

A `String` is a heap-allocated string. This string is growable, and is also
guaranteed to be UTF-8.

```{rust}
let mut s = "Hello".to_string();
println!("{}", s);

s.push_str(", world.");
println!("{}", s);
```
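Because a `String` grows on the heap, appending may reallocate. One possible way to avoid repeated reallocation, sketched here for current stable Rust, is to reserve space up front with `String::with_capacity`. The function name `join_parts` is hypothetical.

```rust
/// Concatenates `parts` into one `String`, reserving the total size first
/// so that the `push_str` calls do not need to grow the buffer.
fn join_parts(parts: &[&str]) -> String {
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = String::with_capacity(total);
    for p in parts {
        out.push_str(p);
    }
    out
}
```

This is an optimization rather than a requirement: `push_str` alone is always correct, as the example above shows.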

You can borrow a `String` as a `&str` with the `as_slice()` method:

```{rust}
fn takes_slice(slice: &str) {
    println!("Got: {}", slice);
}

fn main() {
    let s = "Hello".to_string();
    takes_slice(s.as_slice());
}
```
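The `as_slice()` call above reflects the compiler of the time. As far as I know, current stable Rust no longer offers `as_slice()` on `String`; the sketch below shows three spellings that should be equivalent today. `describe` and `describe_all` are hypothetical names used only for this illustration.

```rust
/// Same shape as `takes_slice`, but returns the text so it can be checked.
fn describe(slice: &str) -> String {
    format!("Got: {}", slice)
}

/// Lends one `String` to a `&str` parameter in three equivalent ways.
fn describe_all(s: String) -> [String; 3] {
    [
        describe(s.as_str()), // explicit conversion method
        describe(&s[..]),     // slice over the full range
        describe(&s),         // deref coercion from &String to &str
    ]
}
```

All three borrow the existing buffer; none of them allocates a new string for the argument.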

You can also get a `&str` from a stack-allocated array of bytes:

```{rust}
use std::str;

let x: &[u8] = &[b'a', b'b'];
let stack_str: &str = str::from_utf8(x).unwrap();
```
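The `unwrap()` above panics if the bytes are not valid UTF-8. In current stable Rust `str::from_utf8` returns a `Result`, so a caller can handle bad input instead. The following is a sketch under that assumption; `text_or_fallback` is a hypothetical helper.

```rust
use std::str;

/// Returns `bytes` viewed as text, or `fallback` when the bytes are not
/// valid UTF-8. Nothing is copied in either case.
fn text_or_fallback<'a>(bytes: &'a [u8], fallback: &'a str) -> &'a str {
    match str::from_utf8(bytes) {
        Ok(text) => text,
        Err(_) => fallback,
    }
}
```

This keeps the guarantee stated earlier: a `&str` is only ever produced from bytes that have been checked.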

## Best Practices

### `String` vs. `&str`

In general, you should prefer `String` when you need ownership, and `&str` when
you just need to borrow a string. This is very similar to using `Vec<T>` vs. `&[T]`,
and `T` vs. `&T` in general.

This means starting off with this:

```{rust,ignore}
fn foo(s: &str) {
```

and only moving to this:

```{rust,ignore}
fn foo(s: String) {
```

if you have good reason. It's not polite to hold on to ownership you don't
need, and it can make your lifetimes more complex. Furthermore, you can pass
either kind of string into `foo` by using `.as_slice()` on any `String` you
need to pass in, so the `&str` version is more flexible.
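As an illustration of this guideline, here is a sketch in current stable Rust (where `&owned` stands in for the older `.as_slice()` call). `word_count` and `Greeting` are hypothetical names: the first only reads the text and so borrows it, the second stores the text and so takes ownership.

```rust
/// Only reads the text, so a borrowed `&str` is enough.
fn word_count(s: &str) -> usize {
    s.split_whitespace().count()
}

/// Keeps the text around, so it has good reason to own a `String`.
struct Greeting {
    text: String,
}

impl Greeting {
    fn new(text: String) -> Greeting {
        Greeting { text }
    }
}
```

`word_count` accepts both a literal and a borrowed `String`, and the caller still owns the `String` afterwards; `Greeting::new` consumes its argument.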

### Comparisons

To compare a `String` to a constant string, prefer `as_slice()`...

```{rust}
fn compare(string: String) {
    if string.as_slice() == "Hello" {
        println!("yes");
    }
}
```

... over `to_string()`:

```{rust}
fn compare(string: String) {
    if string == "Hello".to_string() {
        println!("yes");
    }
}
```

Converting a `String` to a `&str` is cheap, but converting the `&str` to a
`String` involves an allocation.
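In current stable Rust the same advice can be followed with even less ceremony, because `String` can be compared with a `&str` directly. This is a sketch under that assumption; the function names are hypothetical.

```rust
/// Compares without allocating: the literal stays a `&str`.
fn is_hello(string: String) -> bool {
    string == "Hello"
}

/// Gives the same answer, but allocates a temporary `String` for the literal.
fn is_hello_allocating(string: String) -> bool {
    string == "Hello".to_string()
}
```

Both functions return the same result for every input; the first simply avoids the extra heap allocation described above.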

## Other Documentation

* [the `&str` API documentation](/std/str/index.html)
* [the `String` API documentation](std/string/index.html)
@@ -55,10 +55,10 @@ fn main() {

 # Representation

-Rust's string type, `str`, is a sequence of unicode codepoints encoded as a
-stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly
-encoded UTF-8 sequences. Additionally, strings are not null-terminated
-and can contain null codepoints.
+Rust's string type, `str`, is a sequence of unicode scalar values encoded as a
+stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8
+sequences. Additionally, strings are not null-terminated and can contain null
+bytes.

 The actual representation of strings have direct mappings to vectors: `&str`
 is the same as `&[u8]`.