rust/concurrency.md
Alexis Beingessner f8be0737ea progress
2015-06-18 21:04:48 -07:00

187 lines
8.3 KiB
Markdown

% Concurrency and Paralellism
```Not sure if I want this
Safe Rust features *a ton* of tooling to make concurrency and parallelism totally
safe, easy, and fearless. This is a case where we'll really just
[defer to TRPL][trpl-conc] for the basics.
TL;DR: The `Send` and `Sync` traits in conjunction with Rust's ownership model and
normal generic bounds make using concurrent APIs really easy and painless for
a user of Safe Rust.
```
## Data Races and Race Conditions
Safe Rust guarantees an absence of data races, which are defined as:
* two or more threads concurrently accessing a location of memory
* one of them is a write
* one of them is unsynchronized
A data race has Undefined Behaviour, and is therefore impossible to perform
in Safe Rust. Data races are *mostly* prevented through rust's ownership system:
it's impossible to alias a mutable reference, so it's impossible to perform a
data race. Interior mutability makes this more complicated, which is largely why
we have the Send and Sync traits (see below).
However Rust *does not* prevent general race conditions. This is
pretty fundamentally impossible, and probably honestly undesirable. Your hardware
is racy, your OS is racy, the other programs on your computer are racy, and the
world this all runs in is racy. Any system that could genuinely claim to prevent
*all* race conditions would be pretty awful to use, if not just incorrect.
So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
something incredibly stupid with incorrect synchronization. Obviously such a
program isn't very good, but Rust can only hold your hand so far. Still, a
race condition can't violate memory safety in a Rust program on
its own. Only in conjunction with some other unsafe code can a race condition
actually violate memory safety. For instance:
```rust
use std::thread;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
let data = vec![1, 2, 3, 4];
// Arc so that the memory the AtomicUsize is stored in still exists for
// the other thread to increment, even if we completely finish executing
// before it. Rust won't compile the program without it, because of the
// lifetime requirements of thread::spawn!
let idx = Arc::new(AtomicUsize::new(0));
let other_idx = idx.clone();
// `move` captures other_idx by-value, moving it into this thread
thread::spawn(move || {
// It's ok to mutate idx because this value
// is an atomic, so it can't cause a Data Race.
other_idx.fetch_add(10, Ordering::SeqCst);
});
// Index with the value loaded from the atomic. This is safe because we
// read the atomic memory only once, and then pass a *copy* of that value
// to the Vec's indexing implementation. This indexing will be correctly
// bounds checked, and there's no chance of the value getting changed
// in the middle. However our program may panic if the thread we spawned
// managed to increment before this ran. A race condition because correct
// program execution (panicing is rarely correct) depends on order of
// thread execution.
println!("{}", data[idx.load(Ordering::SeqCst)]);
if idx.load(Ordering::SeqCst) < data.len() {
unsafe {
// Incorrectly loading the idx *after* we did the bounds check.
// It could have changed. This is a race condition, *and dangerous*
// because we decided to do `get_unchecked`, which is `unsafe`.
println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
}
}
```
## Send and Sync
Not everything obeys inherited mutability, though. Some types allow you to multiply
alias a location in memory while mutating it. Unless these types use synchronization
to manage this access, they are absolutely not thread safe. Rust captures this with
through the `Send` and `Sync` traits.
* A type is Send if it is safe to send it to another thread.
* A type is Sync if it is safe to share between threads (`&T` is Send).
Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
substantial amount of special tooling exists to make them work right. First and
foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
and other unsafe code can *trust* that they are correctly implemented. Since
they're *marker traits* (they have no associated items like methods), correctly
implemented simply means that they have the intrinsic properties an implementor
should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.
Send and Sync are also what Rust calls *opt-in builtin traits*.
This means that, unlike every other trait, they are *automatically* derived:
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
Almost all primitives are Send and Sync, and as a consequence pretty much
all types you'll ever interact with are Send and Sync.
Major exceptions include:
* raw pointers are neither Send nor Sync (because they have no safety guards)
* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)
`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
unsynchronized shared mutable state. However raw pointers are, strictly speaking,
marked as thread-unsafe as more of a *lint*. Doing anything useful
with a raw pointer requires dereferencing it, which is already unsafe. In that
sense, one could argue that it would be "fine" for them to be marked as thread safe.
However it's important that they aren't thread safe to prevent types that
*contain them* from being automatically marked as thread safe. These types have
non-trivial untracked ownership, and it's unlikely that their author was
necessarily thinking hard about thread safety. In the case of Rc, we have a nice
example of a type that contains a `*mut` that is *definitely* not thread safe.
Types that aren't automatically derived can *opt-in* to Send and Sync by simply
implementing them:
```rust
struct MyBox(*mut u8);
unsafe impl Send for MyBox {}
unsafe impl Sync for MyBox {}
```
In the *incredibly rare* case that a type is *inappropriately* automatically
derived to be Send or Sync, then one can also *unimplement* Send and Sync:
```rust
struct SpecialThreadToken(u8);
impl !Send for SpecialThreadToken {}
impl !Sync for SpecialThreadToken {}
```
Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
Only types that are ascribed special meaning by other unsafe code can possible cause
trouble by being incorrectly Send or Sync.
Most uses of raw pointers should be encapsulated behind a sufficient abstraction
that Send and Sync can be derived. For instance all of Rust's standard
collections are Send and Sync (when they contain Send and Sync types)
in spite of their pervasive use raw pointers to
manage allocations and complex ownership. Similarly, most iterators into these
collections are Send and Sync because they largely behave like an `&` or `&mut`
into the collection.
TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
only to data races?
## Atomics
Rust pretty blatantly just inherits LLVM's model for atomics, which in turn is
largely based off of the C11 model for atomics. This is not due these models
being particularly excellent or easy to understand. Indeed, these models are
quite complex and are known to have several flaws. Rather, it is a pragmatic
concession to the fact that *everyone* is pretty bad at modeling atomics. At very
least, we can benefit from existing tooling and research around C's model.
Trying to fully explain these models is fairly hopeless, so we're just going to
drop that problem in LLVM's lap.
## Actually Doing Things Concurrently
Rust as a language doesn't *really* have an opinion on how to do concurrency or
parallelism. The standard library exposes OS threads and blocking sys-calls
because *everyone* has those and they're uniform enough that you can provide
an abstraction over them in a relatively uncontroversial way. Message passing,
green threads, and async APIs are all diverse enough that any abstraction over
them tends to involve trade-offs that we weren't willing to commit to for 1.0.
However Rust's current design is setup so that you can set up your own
concurrent paradigm or library as you see fit. Just require the right
lifetimes and Send and Sync where appropriate and everything should Just Work
with everyone else's stuff.
[llvm-conc]: http://llvm.org/docs/Atomics.html
[trpl-conc]: https://doc.rust-lang.org/book/concurrency.html