% Concurrency and Parallelism

# Data Races and Race Conditions

Safe Rust guarantees an absence of data races, which are defined as:

* two or more threads concurrently accessing a location of memory
* at least one of them is a write
* at least one of them is unsynchronized

A data race has Undefined Behaviour, and is therefore impossible to perform
in Safe Rust. Data races are *mostly* prevented through Rust's ownership system:
it's impossible to alias a mutable reference, so it's impossible to perform a
data race. Interior mutability makes this more complicated, which is largely why
we have the Send and Sync traits (see below).

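To make the ownership argument concrete, here's a small sketch (not from the original text) in which several threads mutate one buffer at once without any locks. It leans on `std::thread::scope`, which only arrived in Rust 1.63, and the function name `double_in_parallel` is made up for illustration. Each thread receives its own disjoint `&mut` chunk from `chunks_mut`, so there is simply no way for two threads to reach the same element; trying to hand the *same* `&mut` to two threads would be rejected at compile time.

```rust
use std::thread;

/// Hypothetical helper: doubles every element of `data` in parallel.
/// Each spawned thread gets its own disjoint `&mut` chunk, so no two
/// threads ever alias the same element.
fn double_in_parallel(data: &mut [i32], chunk_size: usize) {
    // `chunks_mut` panics if `chunk_size` is 0, so check it up front.
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_size) {
            // `chunk` is a unique `&mut [i32]`; moving it into the thread
            // is fine because nothing else can reach those elements.
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 2;
                }
            });
        }
    }); // `scope` joins every spawned thread before returning.
}
```

Calling `double_in_parallel(&mut v, 2)` on `vec![1, 2, 3, 4, 5]` leaves `v` as `[2, 4, 6, 8, 10]`.
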
However, Rust *does not* prevent general race conditions. This is
pretty fundamentally impossible, and probably honestly undesirable. Your hardware
is racy, your OS is racy, the other programs on your computer are racy, and the
world this all runs in is racy. Any system that could genuinely claim to prevent
*all* race conditions would be pretty awful to use, if not just incorrect.

So it's perfectly "fine" for a Safe Rust program to get deadlocked or do
something incredibly stupid with incorrect synchronization. Obviously such a
program isn't very good, but Rust can only hold your hand so far. Still, a
race condition can't violate memory safety in a Rust program on
its own. Only in conjunction with some other unsafe code can a race condition
actually violate memory safety. For instance:

```rust
use std::thread;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

let data = vec![1, 2, 3, 4];
// Arc so that the memory the AtomicUsize is stored in still exists for
// the other thread to increment, even if we completely finish executing
// before it. Rust won't compile the program without it, because of the
// lifetime requirements of thread::spawn!
let idx = Arc::new(AtomicUsize::new(0));
let other_idx = idx.clone();

// `move` captures other_idx by-value, moving it into this thread
thread::spawn(move || {
    // It's ok to mutate idx because this value
    // is an atomic, so it can't cause a data race.
    other_idx.fetch_add(10, Ordering::SeqCst);
});

// Index with the value loaded from the atomic. This is safe because we
// read the atomic memory only once, and then pass a *copy* of that value
// to the Vec's indexing implementation. This indexing will be correctly
// bounds checked, and there's no chance of the value getting changed
// in the middle. However, our program may panic if the thread we spawned
// managed to increment before this ran. This is a race condition because
// correct program execution (panicking is rarely correct) depends on the
// order of thread execution.
println!("{}", data[idx.load(Ordering::SeqCst)]);

if idx.load(Ordering::SeqCst) < data.len() {
    unsafe {
        // Incorrectly loading the idx *after* we did the bounds check.
        // It could have changed. This is a race condition, *and dangerous*
        // because we decided to do `get_unchecked`, which is `unsafe`.
        println!("{}", data.get_unchecked(idx.load(Ordering::SeqCst)));
    }
}
```

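The unsafe half of the example above goes wrong because the bounds check and the access each load the atomic separately. A minimal sketch of the repair (the name `read_at` is hypothetical) takes a single snapshot of the index into a local and uses that same value for both the check and `get_unchecked`. The program is still racy, since *which* element you get depends on timing, but the race can no longer break memory safety.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: reads `data[idx]` without a second bounds check,
/// loading the shared index exactly once so the check and the access agree.
fn read_at(data: &[i32], idx: &AtomicUsize) -> Option<i32> {
    // Snapshot the index. Another thread may change the atomic afterwards,
    // but our local copy `i` cannot change.
    let i = idx.load(Ordering::SeqCst);
    if i < data.len() {
        // SAFETY: `i` is exactly the value we just bounds-checked.
        Some(unsafe { *data.get_unchecked(i) })
    } else {
        None
    }
}
```

With `data = [1, 2, 3, 4]`, an index of 2 yields `Some(3)`, and an index of 10 yields `None` instead of reading out of bounds.
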
# Send and Sync

Not everything obeys inherited mutability, though. Some types allow you to have
multiple aliases to a location in memory while mutating it. Unless these types
use synchronization to manage this access, they are absolutely not thread safe.
Rust captures this through the `Send` and `Sync` traits.

* A type is Send if it is safe to send it to another thread.
* A type is Sync if it is safe to share between threads (that is, `T` is Sync
  if and only if `&T` is Send).

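The "`&T` is Send" half of the Sync definition is what lets many threads hold the same reference. In this sketch (function name hypothetical; it uses `std::thread::scope`, available since Rust 1.63) every thread gets a copy of `&counter`. That only compiles because `AtomicUsize` is Sync; swapping in a `Cell<usize>` would be rejected, since `Cell` isn't Sync.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: `threads` threads each bump one shared counter
/// `per_thread` times through a plain shared reference.
fn count_with_shared_ref(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            // Copying `&counter` into another thread requires
            // `&AtomicUsize: Send`, i.e. `AtomicUsize: Sync`.
            let counter = &counter;
            s.spawn(move || {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // all threads are joined here, so every increment is visible below
    counter.load(Ordering::Relaxed)
}
```

For example, `count_with_shared_ref(4, 1000)` returns `4000`.
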
Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
substantial amount of special tooling exists to make them work right. First and
foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
and other unsafe code can *trust* that they are correctly implemented. Since
they're *marker traits* (they have no associated items like methods), "correctly
implemented" simply means that the implementing type really has the intrinsic
properties the trait promises. Incorrectly implementing Send or Sync can cause
Undefined Behaviour.

Send and Sync are also what Rust calls *opt-in builtin traits*.
This means that, unlike every other trait, they are *automatically* derived:
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
Almost all primitives are Send and Sync, and as a consequence pretty much
all types you'll ever interact with are Send and Sync.

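A common way to watch the automatic derivation happen is a pair of empty generic "probe" functions whose only job is to fail to type-check when a bound doesn't hold. The sketch below uses two hypothetical structs; neither has a hand-written impl, yet the compiler works out their Send and Sync status purely from their fields.

```rust
use std::cell::Cell;
use std::sync::Arc;

// Hypothetical types; neither has a hand-written Send or Sync impl.
struct Config {
    name: String,
    retries: u32,
    shared: Arc<Vec<u8>>,
}

struct HitCounter {
    hits: Cell<u32>, // Cell<u32> is Send, but not Sync
}

// Compile-time probes: a call only type-checks if `T` satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn check_auto_derivation() -> bool {
    assert_send::<Config>(); // String, u32 and Arc<Vec<u8>> are all Send...
    assert_sync::<Config>(); // ...and all Sync, so Config is both.
    assert_send::<HitCounter>(); // every field is Send
    // assert_sync::<HitCounter>(); // rejected: Cell<u32> is not Sync
    true
}
```
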
Major exceptions include:

* raw pointers are neither Send nor Sync (because they have no safety guards)
* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)

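Note that not being Sync says nothing about Send. `Cell<T>` is Send whenever `T` is, so it can be *moved* to another thread even though it can't be *shared* with one. A quick sketch (the function name is hypothetical):

```rust
use std::cell::Cell;
use std::thread;

/// Hypothetical helper: moves a `Cell` into another thread, mutates it
/// there, and returns the final value.
fn bump_on_other_thread(start: u32) -> u32 {
    let cell = Cell::new(start);
    // Allowed: `Cell<u32>: Send`, so ownership may move to another thread.
    // Handing that thread `&cell` instead would be rejected, because
    // `Cell<u32>` is not `Sync`.
    let handle = thread::spawn(move || {
        cell.set(cell.get() + 1);
        cell.get()
    });
    handle.join().unwrap()
}
```

Here `bump_on_other_thread(41)` returns `42`.
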
`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
unsynchronized shared mutable state. However, raw pointers are, strictly speaking,
marked as thread-unsafe as more of a *lint*. Doing anything useful
with a raw pointer requires dereferencing it, which is already unsafe. In that
sense, one could argue that it would be "fine" for them to be marked as thread safe.

However, it's important that they aren't thread safe to prevent types that
*contain them* from being automatically marked as thread safe. These types have
non-trivial untracked ownership, and their author may well not have been
thinking hard about thread safety. In the case of Rc, we have a nice
example of a type that contains a `*mut` that is *definitely* not thread safe.

Types that aren't automatically derived can *opt in* to Send and Sync by simply
implementing them:

```rust
struct MyBox(*mut u8);

unsafe impl Send for MyBox {}
unsafe impl Sync for MyBox {}
```

In the *incredibly rare* case that a type is *inappropriately* automatically
derived to be Send or Sync, one can also *unimplement* Send and Sync:

```rust
// Negative impls are an unstable feature, so this requires a nightly compiler.
#![feature(negative_impls)]

struct SpecialThreadToken(u8);

impl !Send for SpecialThreadToken {}
impl !Sync for SpecialThreadToken {}
```

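On a stable compiler, a common workaround (not something the original text describes) is to lean on the very rule from earlier: raw pointers are neither Send nor Sync, so a zero-sized `PhantomData<*const ()>` field makes the auto-derivation conclude the same about the containing type. This sketch reuses the `SpecialThreadToken` name but is a different, hypothetical definition; the marker costs no space at runtime.

```rust
use std::marker::PhantomData;

// Hypothetical stable-Rust version of the token. The zero-sized marker
// field mentions a raw pointer type, so `SpecialThreadToken` is
// automatically neither Send nor Sync.
struct SpecialThreadToken {
    id: u8,
    _not_send_sync: PhantomData<*const ()>,
}

impl SpecialThreadToken {
    fn new(id: u8) -> Self {
        SpecialThreadToken { id, _not_send_sync: PhantomData }
    }

    fn id(&self) -> u8 {
        self.id
    }
}
```

Attempting `std::thread::spawn(move || token.id())` with such a token fails to compile, because the closure captures a non-Send value.
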
Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
Only types that are ascribed special meaning by other unsafe code can possibly cause
trouble by being incorrectly Send or Sync.

Most uses of raw pointers should be encapsulated behind a sufficient abstraction
that Send and Sync can be derived. For instance, all of Rust's standard
collections are Send and Sync (when they contain Send and Sync types)
in spite of their pervasive use of raw pointers to
manage allocations and complex ownership. Similarly, most iterators into these
collections are Send and Sync because they largely behave like an `&` or `&mut`
into the collection.

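Here's a rough sketch of what "a sufficient abstraction" can look like, under the assumption that the raw pointer is uniquely owned like a `Box`. All names are hypothetical. The two `unsafe impl`s live in one small type, `OwnedPtr<T>`, and anything built on top of it (here, `Named`) gets Send and Sync derived automatically again.

```rust
use std::ptr::NonNull;
use std::thread;

/// Hypothetical owning pointer, in the spirit of `Box<T>`. It stores a raw
/// `NonNull<T>`, so it is neither Send nor Sync until we say otherwise.
struct OwnedPtr<T> {
    ptr: NonNull<T>,
}

impl<T> OwnedPtr<T> {
    fn new(value: T) -> Self {
        // `Box::into_raw` hands us ownership of a non-null heap allocation.
        let raw = Box::into_raw(Box::new(value));
        OwnedPtr { ptr: NonNull::new(raw).expect("Box::into_raw is never null") }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` came from a live Box and is only freed in `drop`.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for OwnedPtr<T> {
    fn drop(&mut self) {
        // SAFETY: we own the allocation and free it exactly once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) }
    }
}

// SAFETY: OwnedPtr<T> uniquely owns its T, like Box<T>, so it is
// Send/Sync under the same conditions as T itself.
unsafe impl<T: Send> Send for OwnedPtr<T> {}
unsafe impl<T: Sync> Sync for OwnedPtr<T> {}

/// A type that merely *contains* an OwnedPtr gets Send and Sync derived.
struct Named {
    name: OwnedPtr<String>,
}

fn length_on_other_thread(s: &str) -> usize {
    let named = Named { name: OwnedPtr::new(s.to_string()) };
    // Compiles only because `Named: Send`, derived from `OwnedPtr<String>: Send`.
    thread::spawn(move || named.name.get().len()).join().unwrap()
}
```

The design choice is the same one the standard collections make: keep the unsafe reasoning in one tiny type so every layer above it can rely on ordinary auto-derivation.
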
TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
only to data races?

# Atomics

Rust pretty blatantly just inherits C11's memory model for atomics. This is not
due to this model being particularly excellent or easy to understand. Indeed, this
model is quite complex and known to have [several flaws][C11-busted]. Rather,
it is a pragmatic concession to the fact that *everyone* is pretty bad at modeling
atomics. At the very least, we can benefit from existing tooling and research around
C.

Trying to fully explain the model is fairly hopeless. If you want all the
nitty-gritty details, you should check out [C's specification][C11-model].
Still, we'll try to cover the basics and some of the problems Rust developers
face.

The C11 memory model is fundamentally about trying to bridge the gap between C's
single-threaded semantics, common compiler optimizations, and hardware peculiarities
in the face of a multi-threaded environment. It does this by splitting memory
accesses into two worlds: data accesses, and atomic accesses.

Data accesses are the bread-and-butter of the programming world. They are
fundamentally unsynchronized and compilers are free to aggressively optimize
them. In particular, data accesses are free to be reordered by the compiler
on the assumption that the program is single-threaded. The hardware is also free
to propagate the changes made in data accesses as lazily and inconsistently as
it wants to other threads. Most critically, data accesses are where we get data
races. These are pretty clearly awful semantics to try to write a multi-threaded
program with.

Atomic accesses are the answer to this. Each atomic access can be marked with
an *ordering*. The set of orderings Rust exposes is:

* Sequentially Consistent (SeqCst)
* Release
* Acquire
* Acquire-Release (AcqRel)
* Relaxed

(Note: We explicitly do not expose the C11 *consume* ordering)

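As a taste of how Release and Acquire are typically paired, here's a minimal message-passing sketch (function name hypothetical, and only a sketch, not a full explanation of the model). The producer writes a payload with a Relaxed store and then sets a flag with Release; the consumer spins until an Acquire load sees the flag. The Release/Acquire pair is what guarantees the consumer then sees the payload, even though the payload itself was written with Relaxed.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: publishes `value` from this thread and reads it
/// back on another thread using a Release/Acquire flag.
fn publish_and_read(value: usize) -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let consumer = thread::spawn(move || {
        // Acquire: once we observe `true`, every write the producer made
        // before its Release store is visible to this thread.
        while !r.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        // Relaxed suffices here; the Acquire above already ordered it.
        d.load(Ordering::Relaxed)
    });

    // Producer: write the payload first, then publish it with Release.
    data.store(value, Ordering::Relaxed);
    ready.store(true, Ordering::Release);

    consumer.join().unwrap()
}
```

`publish_and_read(42)` always returns `42`; downgrading both flag operations to Relaxed would leave the payload load with no ordering guarantee at all.
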
TODO: give simple "basic" explanation of these
TODO: implementing Arc example (why does Drop need the trailing barrier?)

# Actually Doing Things Concurrently

Rust as a language doesn't *really* have an opinion on how to do concurrency or
parallelism. The standard library exposes OS threads and blocking syscalls
because *everyone* has those and they're uniform enough that you can provide
an abstraction over them in a relatively uncontroversial way. Message passing,
green threads, and async APIs are all diverse enough that any abstraction over
them tends to involve trade-offs that we weren't willing to commit to for 1.0.

However, Rust's current design is set up so that you can build your own
concurrent paradigm or library as you see fit. Just require the right
lifetimes and Send and Sync where appropriate and everything should Just Work
with everyone else's stuff.

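As an illustration of "just require the right lifetimes and Send and Sync", here's a toy parallel map written as a library function might be. The name `par_map` is hypothetical, it relies on `std::thread::scope` (Rust 1.63+), and it naively spawns one OS thread per item, so treat it as a sketch of the *bounds* rather than of a good scheduler. Each bound corresponds to one way data crosses a thread boundary: items and the closure are shared by reference (Sync), and results are sent back (Send). Because the threads are scoped, borrowed (non-`'static`) input is fine.

```rust
use std::thread;

/// Hypothetical helper: applies `f` to every item on its own scoped thread
/// and collects the results in order.
fn par_map<T, U, F>(items: &[T], f: F) -> Vec<U>
where
    T: Sync,               // each thread receives a `&T`
    U: Send,               // each result is sent back to this thread
    F: Fn(&T) -> U + Sync, // every thread shares the same `&F`
{
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .iter()
            .map(|item| s.spawn(move || f(item)))
            .collect();
        // Join in spawn order so the output matches the input order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

For example, `par_map(&v[..], |x| x * 2)` with `v = vec![1u32, 2, 3]` returns `vec![2, 4, 6]`. A caller passing a closure that captures an `Rc` would be rejected at compile time, which is exactly the kind of guarantee such a library gets for free from the bounds.
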
[C11-busted]: http://plv.mpi-sws.org/c11comp/popl15.pdf
[C11-model]: http://en.cppreference.com/w/c/atomic/memory_order