81801f2171
Fixes #23571
591 lines
20 KiB
Markdown
591 lines
20 KiB
Markdown
% A 30-minute Introduction to Rust
|
||
|
||
Rust is a modern systems programming language focusing on safety and speed. It
|
||
accomplishes these goals by being memory safe without using garbage collection.
|
||
|
||
This introduction will give you a rough idea of what Rust is like, eliding many
|
||
details. It does not require prior experience with systems programming, but you
|
||
may find the syntax easier if you've used a "curly brace" programming language
|
||
before, like C or JavaScript. The concepts are more important than the syntax,
|
||
so don't worry if you don't get every last detail: you can read [The
|
||
Rust Programming Language](book/index.html) to get a more complete explanation.
|
||
|
||
Because this is about high-level concepts, you don't need to actually install
|
||
Rust to follow along. If you'd like to anyway, check out [the
|
||
homepage](http://rust-lang.org) for explanation.
|
||
|
||
To show off Rust, let's talk about how easy it is to get started with Rust.
|
||
Then, we'll talk about Rust's most interesting feature, *ownership*, and
|
||
then discuss how it makes concurrency easier to reason about. Finally,
|
||
we'll talk about how Rust breaks down the perceived dichotomy between speed
|
||
and safety.
|
||
|
||
# Tools
|
||
|
||
Getting started on a new Rust project is incredibly easy, thanks to Rust's
|
||
package manager, [Cargo](http://crates.io).
|
||
|
||
To start a new project with Cargo, use `cargo new`:
|
||
|
||
```{bash}
|
||
$ cargo new hello_world --bin
|
||
```
|
||
|
||
We're passing `--bin` because we're making a binary program: if we
|
||
were making a library, we'd leave it off.
|
||
|
||
Let's check out what Cargo has generated for us:
|
||
|
||
```{bash}
|
||
$ cd hello_world
|
||
$ tree .
|
||
.
|
||
├── Cargo.toml
|
||
└── src
|
||
└── main.rs
|
||
|
||
1 directory, 2 files
|
||
```
|
||
|
||
This is all we need to get started. First, let's check out `Cargo.toml`:
|
||
|
||
```{toml}
|
||
[package]
|
||
|
||
name = "hello_world"
|
||
version = "0.0.1"
|
||
authors = ["Your Name <you@example.com>"]
|
||
```
|
||
|
||
This is called a *manifest*, and it contains all of the metadata that Cargo
|
||
needs to compile your project.
|
||
|
||
Here's what's in `src/main.rs`:
|
||
|
||
```{rust}
|
||
fn main() {
|
||
println!("Hello, world!");
|
||
}
|
||
```
|
||
|
||
Cargo generated a "Hello World" for us. We'll talk more about the syntax here
|
||
later, but that's what Rust code looks like! Let's compile and run it:
|
||
|
||
```{bash}
|
||
$ cargo run
|
||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||
Running `target/hello_world`
|
||
Hello, world!
|
||
```
|
||
|
||
Using an external dependency in Rust is incredibly easy. You add a line to
|
||
your `Cargo.toml`:
|
||
|
||
```{toml}
|
||
[package]
|
||
|
||
name = "hello_world"
|
||
version = "0.0.1"
|
||
authors = ["Your Name <someone@example.com>"]
|
||
|
||
[dependencies.semver]
|
||
|
||
git = "https://github.com/rust-lang/semver.git"
|
||
```
|
||
|
||
You added the `semver` library, which parses version numbers and compares them
|
||
according to the [SemVer specification](http://semver.org/).
|
||
|
||
Now, you can pull in that library using `extern crate` in
|
||
`main.rs`.
|
||
|
||
```{rust,ignore}
|
||
extern crate semver;
|
||
|
||
use semver::Version;
|
||
|
||
fn main() {
|
||
assert!(Version::parse("1.2.3") == Ok(Version {
|
||
major: 1u64,
|
||
minor: 2u64,
|
||
patch: 3u64,
|
||
pre: vec!(),
|
||
build: vec!(),
|
||
}));
|
||
|
||
println!("Versions compared successfully!");
|
||
}
|
||
```
|
||
|
||
Again, we'll discuss the exact details of all of this syntax soon. For now,
|
||
let's compile and run it:
|
||
|
||
```{bash}
|
||
$ cargo run
|
||
Updating git repository `https://github.com/rust-lang/semver.git`
|
||
Compiling semver v0.0.1 (https://github.com/rust-lang/semver.git#bf739419)
|
||
Compiling hello_world v0.0.1 (file:///home/you/projects/hello_world)
|
||
Running `target/hello_world`
|
||
Versions compared successfully!
|
||
```
|
||
|
||
Because we only specified a repository without a version, if someone else were
|
||
to try out our project at a later date, when `semver` was updated, they would
|
||
get a different, possibly incompatible version. To solve this problem, Cargo
|
||
produces a file, `Cargo.lock`, which records the versions of any dependencies.
|
||
This gives us repeatable builds.
|
||
|
||
There is a lot more here, and this is a whirlwind tour, but you should feel
|
||
right at home if you've used tools like [Bundler](http://bundler.io/),
|
||
[npm](https://www.npmjs.org/), or [pip](https://pip.pypa.io/en/latest/).
|
||
There's no `Makefile`s or endless `autotools` output here. (Rust's tooling does
|
||
[play nice with external libraries written in those
|
||
tools](http://doc.crates.io/build-script.html), if you need to.)
|
||
|
||
Enough about tools, let's talk code!
|
||
|
||
# Ownership
|
||
|
||
Rust's defining feature is "memory safety without garbage collection". Let's
|
||
take a moment to talk about what that means. *Memory safety* means that the
|
||
programming language eliminates certain kinds of bugs, such as [buffer
|
||
overflows](http://en.wikipedia.org/wiki/Buffer_overflow) and [dangling
|
||
pointers](http://en.wikipedia.org/wiki/Dangling_pointer). These problems occur
|
||
when you have unrestricted access to memory. As an example, here's some Ruby
|
||
code:
|
||
|
||
```{ruby}
|
||
v = []
|
||
|
||
v.push("Hello")
|
||
|
||
x = v[0]
|
||
|
||
v.push("world")
|
||
|
||
puts x
|
||
```
|
||
|
||
We make an array, `v`, and then call `push` on it. `push` is a method which
|
||
adds an element to the end of an array.
|
||
|
||
Next, we make a new variable, `x`, that's equal to the first element of
|
||
the array. Simple, but this is where the "bug" will appear.
|
||
|
||
Let's keep going. We then call `push` again, pushing "world" onto the
|
||
end of the array. `v` now is `["Hello", "world"]`.
|
||
|
||
Finally, we print `x` with the `puts` method. This prints "Hello."
|
||
|
||
All good? Let's go over a similar, but subtly different example, in C++:
|
||
|
||
```{cpp}
|
||
#include<iostream>
|
||
#include<vector>
|
||
#include<string>
|
||
|
||
int main() {
|
||
std::vector<std::string> v;
|
||
|
||
v.push_back("Hello");
|
||
|
||
std::string& x = v[0];
|
||
|
||
v.push_back("world");
|
||
|
||
std::cout << x;
|
||
}
|
||
```
|
||
|
||
It's a little more verbose due to the static typing, but it's almost the same
|
||
thing. We make a `std::vector` of `std::string`s, we call `push_back` (same as
|
||
`push`) on it, take a reference to the first element of the vector, call
|
||
`push_back` again, and then print out the reference.
|
||
|
||
There's two big differences here: one, they're not _exactly_ the same thing,
|
||
and two...
|
||
|
||
```{bash}
|
||
$ g++ hello.cpp -Wall -Werror
|
||
$ ./a.out
|
||
Segmentation fault (core dumped)
|
||
```
|
||
|
||
A crash! (Note that this is actually system-dependent. Because referring to an
|
||
invalid reference is undefined behavior, the compiler can do anything,
|
||
including the right thing!) Even though we compiled with flags to give us as
|
||
many warnings as possible, and to treat those warnings as errors, we got no
|
||
errors. When we ran the program, it crashed.
|
||
|
||
Why does this happen? When we append to an array, its length changes. Since
|
||
its length changes, we may need to allocate more memory. In Ruby, this happens
|
||
as well, we just don't think about it very often. So why does the C++ version
|
||
segfault when we allocate more memory?
|
||
|
||
The answer is that in the C++ version, `x` is a *reference* to the memory
|
||
location where the first element of the array is stored. But in Ruby, `x` is a
|
||
standalone value, not connected to the underlying array at all. Let's dig into
|
||
the details for a moment. Your program has access to memory, provided to it by
|
||
the operating system. Each location in memory has an address. So when we make
|
||
our vector, `v`, it's stored in a memory location somewhere:
|
||
|
||
| location | name | value |
|
||
|----------|------|-------|
|
||
| 0x30 | v | |
|
||
|
||
(Address numbers made up, and in hexadecimal. Those of you with deep C++
|
||
knowledge, there are some simplifications going on here, like the lack of an
|
||
allocated length for the vector. This is an introduction.)
|
||
|
||
When we push our first string onto the array, we allocate some memory,
|
||
and `v` refers to it:
|
||
|
||
| location | name | value |
|
||
|----------|------|----------|
|
||
| 0x30 | v | 0x18 |
|
||
| 0x18 | | "Hello" |
|
||
|
||
We then make a reference to that first element. A reference is a variable
|
||
that points to a memory location, so its value is the memory location of
|
||
the `"Hello"` string:
|
||
|
||
| location | name | value |
|
||
|----------|------|----------|
|
||
| 0x30 | v | 0x18 |
|
||
| 0x18 | | "Hello" |
|
||
| 0x14 | x | 0x18 |
|
||
|
||
When we push `"world"` onto the vector with `push_back`, there's no room:
|
||
we only allocated one element. So, we need to allocate two elements,
|
||
copy the `"Hello"` string over, and update the reference. Like this:
|
||
|
||
| location | name | value |
|
||
|----------|------|----------|
|
||
| 0x30 | v | 0x08 |
|
||
| 0x18 | | GARBAGE |
|
||
| 0x14 | x | 0x18 |
|
||
| 0x08 | | "Hello" |
|
||
| 0x04 | | "world" |
|
||
|
||
Note that `v` now refers to the new list, which has two elements. It's all
|
||
good. But our `x` didn't get updated! It still points at the old location,
|
||
which isn't valid anymore. In fact, [the documentation for `push_back` mentions
|
||
this](http://en.cppreference.com/w/cpp/container/vector/push_back):
|
||
|
||
> If the new `size()` is greater than `capacity()` then all iterators and
|
||
> references (including the past-the-end iterator) are invalidated.
|
||
|
||
Finding where these iterators and references are is a difficult problem, and
|
||
even in this simple case, `g++` can't help us here. While the bug is obvious in
|
||
this case, in real code, it can be difficult to track down the source of the
|
||
error.
|
||
|
||
Before we talk about this solution, why didn't our Ruby code have this problem?
|
||
The semantics are a little more complicated, and explaining Ruby's internals is
|
||
out of the scope of a guide to Rust. But in a nutshell, Ruby's garbage
|
||
collector keeps track of references, and makes sure that everything works as
|
||
you might expect. This comes at an efficiency cost, and the internals are more
|
||
complex. If you'd really like to dig into the details, [this
|
||
article](http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values)
|
||
can give you more information.
|
||
|
||
Garbage collection is a valid approach to memory safety, but Rust chooses a
|
||
different path. Let's examine what the Rust version of this looks like:
|
||
|
||
```{rust,ignore}
|
||
fn main() {
|
||
let mut v = vec![];
|
||
|
||
v.push("Hello");
|
||
|
||
let x = &v[0];
|
||
|
||
v.push("world");
|
||
|
||
println!("{}", x);
|
||
}
|
||
```
|
||
|
||
This looks like a bit of both: fewer type annotations, but we do create new
|
||
variables with `let`. The method name is `push`, some other stuff is different,
|
||
but it's pretty close. So what happens when we compile this code? Does Rust
|
||
print `"Hello"`, or does Rust crash?
|
||
|
||
Neither. It refuses to compile:
|
||
|
||
```bash
|
||
$ cargo run
|
||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||
main.rs:8:5: 8:6 error: cannot borrow `v` as mutable because it is also borrowed as immutable
|
||
main.rs:8 v.push("world");
|
||
^
|
||
main.rs:6:14: 6:15 note: previous borrow of `v` occurs here; the immutable borrow prevents subsequent moves or mutable borrows of `v` until the borrow ends
|
||
main.rs:6 let x = &v[0];
|
||
^
|
||
main.rs:11:2: 11:2 note: previous borrow ends here
|
||
main.rs:1 fn main() {
|
||
...
|
||
main.rs:11 }
|
||
^
|
||
error: aborting due to previous error
|
||
```
|
||
|
||
When we try to mutate the array by `push`ing it the second time, Rust throws
|
||
an error. It says that we "cannot borrow v as mutable because it is also
|
||
borrowed as immutable." What does it mean by "borrowed"?
|
||
|
||
In Rust, the type system encodes the notion of *ownership*. The variable `v`
|
||
is an *owner* of the vector. When we make a reference to `v`, we let that
|
||
variable (in this case, `x`) *borrow* it for a while. Just like if you own a
|
||
book, and you lend it to me, I'm borrowing the book.
|
||
|
||
So, when I try to modify the vector with the second call to `push`, I need
|
||
to be owning it. But `x` is borrowing it. You can't modify something that
|
||
you've lent to someone. And so Rust throws an error.
|
||
|
||
So how do we fix this problem? Well, we can make a copy of the element:
|
||
|
||
|
||
```{rust}
|
||
fn main() {
|
||
let mut v = vec![];
|
||
|
||
v.push("Hello");
|
||
|
||
let x = v[0].clone();
|
||
|
||
v.push("world");
|
||
|
||
println!("{}", x);
|
||
}
|
||
```
|
||
|
||
Note the addition of `clone()`. This creates a copy of the element, leaving
|
||
the original untouched. Now, we no longer have two references to the same
|
||
memory, and so the compiler is happy. Let's give that a try:
|
||
|
||
```{bash}
|
||
$ cargo run
|
||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||
Running `target/hello_world`
|
||
Hello
|
||
```
|
||
|
||
Same result. Now, making a copy can be inefficient, so this solution may not be
|
||
acceptable. There are other ways to get around this problem, but this is a toy
|
||
example, and because we're in an introduction, we'll leave that for later.
|
||
|
||
The point is, the Rust compiler and its notion of ownership has saved us from a
|
||
bug that would crash the program. We've achieved safety, at compile time,
|
||
without needing to rely on a garbage collector to handle our memory.
|
||
|
||
# Concurrency
|
||
|
||
Rust's ownership model can help in other ways, as well. For example, take
|
||
concurrency. Concurrency is a big topic, and an important one for any modern
|
||
programming language. Let's take a look at how ownership can help you write
|
||
safe concurrent programs.
|
||
|
||
Here's an example of a concurrent Rust program:
|
||
|
||
```{rust}
|
||
use std::thread;
|
||
|
||
fn main() {
|
||
let guards: Vec<_> = (0..10).map(|_| {
|
||
thread::scoped(|| {
|
||
println!("Hello, world!");
|
||
})
|
||
}).collect();
|
||
}
|
||
```
|
||
|
||
This program creates ten threads, which all print `Hello, world!`. The `scoped`
|
||
function takes one argument, a closure, indicated by the double bars `||`. This
|
||
closure is executed in a new thread created by `scoped`. The method is called
|
||
`scoped` because it returns a 'join guard', which will automatically join the
|
||
child thread when it goes out of scope. Because we `collect` these guards into
|
||
a `Vec<T>`, and that vector goes out of scope at the end of our program, our
|
||
program will wait for every thread to finish before finishing.
|
||
|
||
One common form of problem in concurrent programs is a *data race*.
|
||
This occurs when two different threads attempt to access the same
|
||
location in memory in a non-synchronized way, where at least one of
|
||
them is a write. If one thread is attempting to read, and one thread
|
||
is attempting to write, you cannot be sure that your data will not be
|
||
corrupted. Note the first half of that requirement: two threads that
|
||
attempt to access the same location in memory. Rust's ownership model
|
||
can track which pointers own which memory locations, which solves this
|
||
problem.
|
||
|
||
Let's see an example. This Rust code will not compile:
|
||
|
||
```{rust,ignore}
|
||
use std::thread;
|
||
|
||
fn main() {
|
||
let mut numbers = vec![1, 2, 3];
|
||
|
||
let guards: Vec<_> = (0..3).map(|i| {
|
||
thread::scoped(move || {
|
||
numbers[i] += 1;
|
||
println!("numbers[{}] is {}", i, numbers[i]);
|
||
})
|
||
}).collect();
|
||
}
|
||
```
|
||
|
||
It gives us this error:
|
||
|
||
```text
|
||
7:25: 10:6 error: cannot move out of captured outer variable in an `FnMut` closure
|
||
7 thread::scoped(move || {
|
||
8 numbers[i] += 1;
|
||
9 println!("numbers[{}] is {}", i, numbers[i]);
|
||
10 })
|
||
error: aborting due to previous error
|
||
```
|
||
|
||
This is a little confusing because there are two closures here: the one passed
|
||
to `map`, and the one passed to `thread::scoped`. In this case, the closure for
|
||
`thread::scoped` is attempting to reference `numbers`, a `Vec<i32>`. This
|
||
closure is a `FnOnce` closure, as that’s what `thread::scoped` takes as an
|
||
argument. `FnOnce` closures take ownership of their environment. That’s fine,
|
||
but there’s one detail: because of `map`, we’re going to make three of these
|
||
closures. And since all three try to take ownership of `numbers`, that would be
|
||
a problem. That’s what it means by ‘cannot move out of captured outer
|
||
variable’: our `thread::scoped` closure wants to take ownership, and it can’t,
|
||
because the closure for `map` won’t let it.
|
||
|
||
What to do here? Rust has two types that helps us: `Arc<T>` and `Mutex<T>`.
|
||
*Arc* stands for "atomically reference counted". In other words, an Arc will
|
||
keep track of the number of references to something, and not free the
|
||
associated resource until the count is zero. The *atomic* portion refers to an
|
||
Arc's usage of concurrency primitives to atomically update the count, making it
|
||
safe across threads. If we use an Arc, we can have our three references. But,
|
||
an Arc does not allow mutable borrows of the data it holds, and we want to
|
||
modify what we're sharing. In this case, we can use a `Mutex<T>` inside of our
|
||
Arc. A Mutex will synchronize our accesses, so that we can ensure that our
|
||
mutation doesn't cause a data race.
|
||
|
||
Here's what using an Arc with a Mutex looks like:
|
||
|
||
```{rust}
|
||
use std::thread;
|
||
use std::sync::{Arc,Mutex};
|
||
|
||
fn main() {
|
||
let numbers = Arc::new(Mutex::new(vec![1, 2, 3]));
|
||
|
||
let guards: Vec<_> = (0..3).map(|i| {
|
||
let number = numbers.clone();
|
||
thread::scoped(move || {
|
||
let mut array = number.lock().unwrap();
|
||
array[i] += 1;
|
||
println!("numbers[{}] is {}", i, array[i]);
|
||
})
|
||
}).collect();
|
||
}
|
||
```
|
||
|
||
We first have to `use` the appropriate library, and then we wrap our vector in
|
||
an Arc with the call to `Arc::new()`. Inside of the loop, we make a new
|
||
reference to the Arc with the `clone()` method. This will increment the
|
||
reference count. When each new `numbers` variable binding goes out of scope, it
|
||
will decrement the count. The `lock()` call will return us a reference to the
|
||
value inside the Mutex, and block any other calls to `lock()` until said
|
||
reference goes out of scope.
|
||
|
||
We can compile and run this program without error, and in fact, see the
|
||
non-deterministic aspect:
|
||
|
||
```{shell}
|
||
$ cargo run
|
||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||
Running `target/hello_world`
|
||
numbers[1] is 3
|
||
numbers[0] is 2
|
||
numbers[2] is 4
|
||
$ cargo run
|
||
Running `target/hello_world`
|
||
numbers[2] is 4
|
||
numbers[1] is 3
|
||
numbers[0] is 2
|
||
```
|
||
|
||
Each time, we can get a slightly different output because the threads are not
|
||
guaranteed to run in any set order. If you get the same order every time it is
|
||
because each of these threads are very small and complete too fast for their
|
||
indeterminate behavior to surface.
|
||
|
||
The important part here is that the Rust compiler was able to use ownership to
|
||
give us assurance _at compile time_ that we weren't doing something incorrect
|
||
with regards to concurrency. In order to share ownership, we were forced to be
|
||
explicit and use a mechanism to ensure that it would be properly handled.
|
||
|
||
# Safety _and_ Speed
|
||
|
||
Safety and speed are always presented as a continuum. At one end of the spectrum,
|
||
you have maximum speed, but no safety. On the other end, you have absolute safety
|
||
with no speed. Rust seeks to break out of this paradigm by introducing safety at
|
||
compile time, ensuring that you haven't done anything wrong, while compiling to
|
||
the same low-level code you'd expect without the safety.
|
||
|
||
As an example, Rust's ownership system is _entirely_ at compile time. The
|
||
safety check that makes this an error about moved values:
|
||
|
||
```{rust,ignore}
|
||
use std::thread;
|
||
|
||
fn main() {
|
||
let numbers = vec![1, 2, 3];
|
||
|
||
let guards: Vec<_> = (0..3).map(|i| {
|
||
thread::scoped(move || {
|
||
println!("{}", numbers[i]);
|
||
})
|
||
}).collect();
|
||
}
|
||
```
|
||
|
||
carries no runtime penalty. And while some of Rust's safety features do have
|
||
a run-time cost, there's often a way to write your code in such a way that
|
||
you can remove it. As an example, this is a poor way to iterate through
|
||
a vector:
|
||
|
||
```{rust}
|
||
let vec = vec![1, 2, 3];
|
||
|
||
for i in 0..vec.len() {
|
||
println!("{}", vec[i]);
|
||
}
|
||
```
|
||
|
||
The reason is that the access of `vec[i]` does bounds checking, to ensure
|
||
that we don't try to access an invalid index. However, we can remove this
|
||
while retaining safety. The answer is iterators:
|
||
|
||
```{rust}
|
||
let vec = vec![1, 2, 3];
|
||
|
||
for x in &vec {
|
||
println!("{}", x);
|
||
}
|
||
```
|
||
|
||
This version uses an iterator that yields each element of the vector in turn.
|
||
Because we have a reference to the element, rather than the whole vector itself,
|
||
there's no array access bounds to check.
|
||
|
||
# Learning More
|
||
|
||
I hope that this taste of Rust has given you an idea if Rust is the right
|
||
language for you. We talked about Rust's tooling, how encoding ownership into
|
||
the type system helps you find bugs, how Rust can help you write correct
|
||
concurrent code, and how you don't have to pay a speed cost for much of this
|
||
safety.
|
||
|
||
To continue your Rustic education, read [The Rust Programming
|
||
Language](book/index.html) for a more in-depth exploration of Rust's syntax and
|
||
concepts.
|