many many pnkfelix fixes
parent 05bb1dbc43, commit 5789106737
@ -9,19 +9,24 @@ is not always the case, however.
|
||||
|
||||
# Dynamically Sized Types (DSTs)
|
||||
|
||||
Rust also supports types without a statically known size. On the surface, this
|
||||
is a bit nonsensical: Rust *must* know the size of something in order to work
|
||||
with it! DSTs are generally produced as views, or through type-erasure of types
|
||||
that *do* have a known size. Due to their lack of a statically known size, these
|
||||
types can only exist *behind* some kind of pointer. They consequently produce a
|
||||
*fat* pointer consisting of the pointer and the information that *completes*
|
||||
them.
|
||||
Rust in fact supports Dynamically Sized Types (DSTs): types without a statically
|
||||
known size or alignment. On the surface, this is a bit nonsensical: Rust *must*
|
||||
know the size and alignment of something in order to correctly work with it! In
|
||||
this regard, DSTs are not normal types. Due to their lack of a statically known
|
||||
size, these types can only exist behind some kind of pointer. Any pointer to a
|
||||
DST consequently becomes a *fat* pointer consisting of the pointer and the
|
||||
information that "completes" them (more on this below).
|
||||
|
||||
For instance, the slice type, `[T]`, is some statically unknown number of
|
||||
elements stored contiguously. `&[T]` consequently consists of a `(&T, usize)`
|
||||
pair that specifies where the slice starts, and how many elements it contains.
|
||||
Similarly, Trait Objects support interface-oriented type erasure through a
|
||||
`(data_ptr, vtable_ptr)` pair.
|
||||
There are two major DSTs exposed by the language: trait objects, and slices.
|
||||
|
||||
A trait object represents some type that implements the traits it specifies.
|
||||
The exact original type is *erased* in favour of runtime reflection
|
||||
with a vtable containing all the information necessary to use the type.
|
||||
This is the information that completes a trait object: a pointer to its vtable.
|
||||
|
||||
A slice is simply a view into some contiguous storage -- typically an array or
|
||||
`Vec`. The information that completes a slice is just the number of elements
|
||||
it points to.
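
As a rough illustration of the "completing" information described above, the sketch below measures pointer widths in machine words. The helper names (`words`, `pointer_widths`, `completed_len`) are invented for this example, and trait objects are written with the current `dyn` syntax. The two-word width is what compilers produce in practice, not a layout promise to build on.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Number of machine words occupied by a pointer-like type `P`.
fn words<P>() -> usize {
    size_of::<P>() / size_of::<usize>()
}

/// Returns the widths, in words, of a thin pointer, a slice pointer,
/// and a trait object pointer.
fn pointer_widths() -> (usize, usize, usize) {
    (
        words::<&u8>(),        // thin: just an address
        words::<&[u8]>(),      // fat: address + element count
        words::<&dyn Debug>(), // fat: address + vtable pointer
    )
}

/// The element count stored in the fat pointer is what `len()` reads back.
fn completed_len(data: &[u8]) -> usize {
    data.len()
}
```

Re-slicing the same storage only changes the length half of the pointer: `completed_len(&buf[..2])` reports 2 regardless of how large `buf` is.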
|
||||
|
||||
Structs can actually store a single DST directly as their last field, but this
|
||||
makes them a DST as well:
|
||||
@ -34,8 +39,8 @@ struct Foo {
|
||||
}
|
||||
```
|
||||
|
||||
**NOTE: As of Rust 1.0 struct DSTs are broken if the last field has
|
||||
a variable position based on its alignment.**
|
||||
**NOTE: [As of Rust 1.0 struct DSTs are broken if the last field has
|
||||
a variable position based on its alignment][dst-issue].**
|
||||
|
||||
|
||||
|
||||
@ -56,22 +61,32 @@ struct Baz {
|
||||
}
|
||||
```
|
||||
|
||||
On their own, ZSTs are, for obvious reasons, pretty useless. However as with
|
||||
many curious layout choices in Rust, their potential is realized in a generic
|
||||
context.
|
||||
On their own, Zero Sized Types (ZSTs) are, for obvious reasons, pretty useless.
|
||||
However as with many curious layout choices in Rust, their potential is realized
|
||||
in a generic context: Rust largely understands that any operation that produces
|
||||
or stores a ZST can be reduced to a no-op. First off, storing it doesn't even
|
||||
make sense -- it doesn't occupy any space. Also there's only one value of that
|
||||
type, so anything that loads it can just produce it from the aether -- which is
|
||||
also a no-op since it doesn't occupy any space.
|
||||
|
||||
Rust largely understands that any operation that produces or stores a ZST can be
|
||||
reduced to a no-op. For instance, a `HashSet<T>` can be efficiently implemented
|
||||
as a thin wrapper around `HashMap<T, ()>` because all the operations `HashMap`
|
||||
normally does to store and retrieve values will be completely stripped in
|
||||
monomorphization.
|
||||
One of the most extreme examples of this is Sets and Maps. Given a
|
||||
`Map<Key, Value>`, it is common to implement a `Set<Key>` as just a thin wrapper
|
||||
around `Map<Key, UselessJunk>`. In many languages, this would necessitate
|
||||
allocating space for UselessJunk and doing work to store and load UselessJunk
|
||||
only to discard it. Proving this unnecessary would be a difficult analysis for
|
||||
the compiler.
|
||||
|
||||
Similarly `Result<(), ()>` and `Option<()>` are effectively just fancy `bool`s.
|
||||
However in Rust, we can just say that `Set<Key> = Map<Key, ()>`. Now Rust
|
||||
statically knows that every load and store is useless, and no allocation has any
|
||||
size. The result is that the monomorphized code is basically a custom
|
||||
implementation of a HashSet with none of the overhead that HashMap would have to
|
||||
support values.
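
A minimal sketch of the `Set<Key> = Map<Key, ()>` idea follows. `UnitSet` is a hypothetical wrapper written for this example and is not how the standard library spells its `HashSet`. The two size helpers only confirm that `()` occupies no space on its own or next to another field.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::mem::size_of;

/// A hypothetical set: a thin wrapper around a map whose values are `()`.
struct UnitSet<K> {
    map: HashMap<K, ()>,
}

impl<K: Hash + Eq> UnitSet<K> {
    fn new() -> Self {
        UnitSet { map: HashMap::new() }
    }

    /// Returns `true` if the key was not already present.
    fn insert(&mut self, key: K) -> bool {
        // Storing `()` needs no bytes, so only the key is really written.
        self.map.insert(key, ()).is_none()
    }

    fn contains(&self, key: &K) -> bool {
        self.map.contains_key(key)
    }

    fn len(&self) -> usize {
        self.map.len()
    }
}

/// Size of the unit type in bytes.
fn unit_size() -> usize {
    size_of::<()>()
}

/// Whether pairing a `u32` with `()` costs any extra space.
fn unit_field_is_free() -> bool {
    size_of::<(u32, ())>() == size_of::<u32>()
}
```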
|
||||
|
||||
Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
|
||||
consequence of types with no size. In particular, pointer offsets are no-ops,
|
||||
and standard allocators (including jemalloc, the one used by Rust) generally
|
||||
consider passing in `0` as Undefined Behaviour.
|
||||
and standard allocators (including jemalloc, the one used by default in Rust)
|
||||
generally consider passing in `0` for the size of an allocation as Undefined
|
||||
Behaviour.
|
||||
|
||||
|
||||
|
||||
@ -93,11 +108,12 @@ return a Result in general, but a specific case actually is infallible. It's
|
||||
actually possible to communicate this at the type level by returning a
|
||||
`Result<T, Void>`. Consumers of the API can confidently unwrap such a Result
|
||||
knowing that it's *statically impossible* for this value to be an `Err`, as
|
||||
this would require providing a value of type Void.
|
||||
this would require providing a value of type `Void`.
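
The claim above can be checked with a small sketch. `parse_always` and `unwrap_infallible` are names made up for this example. The empty `match` compiles because `Void` has no variants, so there is nothing to handle in the `Err` arm.

```rust
/// An empty type: no value of `Void` can ever be constructed.
enum Void {}

/// A hypothetical operation that cannot fail, yet returns `Result`
/// to fit a fallible interface.
fn parse_always(input: &str) -> Result<usize, Void> {
    Ok(input.len())
}

/// Unwraps without any possibility of panicking.
fn unwrap_infallible<T>(result: Result<T, Void>) -> T {
    match result {
        Ok(value) => value,
        // `never` has no possible values, so an empty match covers it.
        Err(never) => match never {},
    }
}
```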
|
||||
|
||||
In principle, Rust can do some interesting analyses and optimizations based
|
||||
on this fact. For instance, `Result<T, Void>` could be represented as just `T`,
|
||||
because the Err case doesn't actually exist. The following *could* also compile:
|
||||
because the `Err` case doesn't actually exist. The following *could* also
|
||||
compile:
|
||||
|
||||
```rust,ignore
|
||||
enum Void {}
|
||||
@ -116,3 +132,6 @@ actually valid to construct, but dereferencing them is Undefined Behaviour
|
||||
because that doesn't actually make sense. That is, you could model C's `void *`
|
||||
type with `*const Void`, but this doesn't necessarily gain anything over using
|
||||
e.g. `*const ()`, which *is* safe to randomly dereference.
|
||||
|
||||
|
||||
[dst-issue]: https://github.com/rust-lang/rust/issues/26403
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
Programmers in safe "high-level" languages face a fundamental dilemma. On one
|
||||
hand, it would be *really* great to just say what you want and not worry about
|
||||
how it's done. On the other hand, that can lead to some *really* poor
|
||||
how it's done. On the other hand, that can lead to unacceptably poor
|
||||
performance. It may be necessary to drop down to less clear or idiomatic
|
||||
practices to get the performance characteristics you want. Or maybe you just
|
||||
throw up your hands in disgust and decide to shell out to an implementation in
|
||||
@ -12,21 +12,22 @@ Worse, when you want to talk directly to the operating system, you *have* to
|
||||
talk to an unsafe language: *C*. C is ever-present and unavoidable. It's the
|
||||
lingua-franca of the programming world.
|
||||
Even other safe languages generally expose C interfaces for the world at large!
|
||||
Regardless of *why* you're doing it, as soon as your program starts talking to
|
||||
Regardless of why you're doing it, as soon as your program starts talking to
|
||||
C it stops being safe.
|
||||
|
||||
With that said, Rust is *totally* a safe programming language.
|
||||
|
||||
Well, Rust *has* a safe programming language. Let's step back a bit.
|
||||
|
||||
Rust can be thought of as being composed of two
|
||||
programming languages: *Safe* and *Unsafe*. Safe is For Reals Totally Safe.
|
||||
Unsafe, unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe lets
|
||||
you do some really crazy unsafe things.
|
||||
Rust can be thought of as being composed of two programming languages: *Safe
|
||||
Rust* and *Unsafe Rust*. Safe Rust is For Reals Totally Safe. Unsafe Rust,
|
||||
unsurprisingly, is *not* For Reals Totally Safe. In fact, Unsafe Rust lets you
|
||||
do some really crazy unsafe things.
|
||||
|
||||
Safe is *the* Rust programming language. If all you do is write Safe Rust,
|
||||
you will never have to worry about type-safety or memory-safety. You will never
|
||||
endure a null or dangling pointer, or any of that Undefined Behaviour nonsense.
|
||||
Safe Rust is the *true* Rust programming language. If all you do is write Safe
|
||||
Rust, you will never have to worry about type-safety or memory-safety. You will
|
||||
never endure a null or dangling pointer, or any of that Undefined Behaviour
|
||||
nonsense.
|
||||
|
||||
*That's totally awesome*.
|
||||
|
||||
@ -69,17 +70,16 @@ language cares about is preventing the following things:
|
||||
* A non-utf8 `str`
|
||||
* Unwinding into another language
|
||||
* Causing a [data race][race]
|
||||
* Double-dropping a value
|
||||
|
||||
That's it. That's all the Undefined Behaviour baked into Rust. Of course, unsafe
|
||||
functions and traits are free to declare arbitrary other constraints that a
|
||||
program must maintain to avoid Undefined Behaviour. However these are generally
|
||||
just things that will transitively lead to one of the above problems. Some
|
||||
additional constraints may also derive from compiler intrinsics that make special
|
||||
assumptions about how code can be optimized.
|
||||
That's it. That's all the causes of Undefined Behaviour baked into Rust. Of
|
||||
course, unsafe functions and traits are free to declare arbitrary other
|
||||
constraints that a program must maintain to avoid Undefined Behaviour. However,
violations of these constraints generally just lead transitively to one of
the above problems. Some additional constraints may also derive from compiler
|
||||
intrinsics that make special assumptions about how code can be optimized.
|
||||
|
||||
Rust is otherwise quite permissive with respect to other dubious operations. Rust
|
||||
considers it "safe" to:
|
||||
Rust is otherwise quite permissive with respect to other dubious operations.
|
||||
Rust considers it "safe" to:
|
||||
|
||||
* Deadlock
|
||||
* Have a [race condition][race]
|
||||
|
@ -12,21 +12,21 @@ The order, size, and alignment of fields is exactly what you would expect from C
|
||||
or C++. Any type you expect to pass through an FFI boundary should have
|
||||
`repr(C)`, as C is the lingua-franca of the programming world. This is also
|
||||
necessary to soundly do more elaborate tricks with data layout such as
|
||||
reintepretting values as a different type.
|
||||
reinterpreting values as a different type.
|
||||
|
||||
However, the interaction with Rust's more exotic data layout features must be
|
||||
kept in mind. Due to its dual purpose as "for FFI" and "for layout control",
|
||||
`repr(C)` can be applied to types that will be nonsensical or problematic if
|
||||
passed through the FFI boundary.
|
||||
|
||||
* ZSTs are still zero-sized, even though this is not a standard behaviour in
|
||||
* ZSTs are still zero-sized, even though this is not a standard behaviour in
|
||||
C, and is explicitly contrary to the behaviour of an empty type in C++, which
|
||||
still consumes a byte of space.
|
||||
|
||||
* DSTs, tuples, and tagged unions are not a concept in C and as such are never
|
||||
FFI safe.
|
||||
|
||||
* **The [drop flag][] will still be added**
|
||||
* **If the type would have any [drop flags][], they will still be added**
|
||||
|
||||
* This is equivalent to one of `repr(u*)` (see the next section) for enums. The
|
||||
chosen size is the default enum size for the target platform's C ABI. Note that
|
||||
@ -39,10 +39,10 @@ compiled with certain flags.
|
||||
# repr(u8), repr(u16), repr(u32), repr(u64)
|
||||
|
||||
These specify the size to make a C-like enum. If the discriminant overflows the
|
||||
integer it has to fit in, it will be an error. You can manually ask Rust to
|
||||
allow this by setting the overflowing element to explicitly be 0. However Rust
|
||||
will not allow you to create an enum where two variants have the same
|
||||
discriminant.
|
||||
integer it has to fit in, it will produce a compile-time error. You can manually
|
||||
ask Rust to allow this by setting the overflowing element to explicitly be 0.
|
||||
However Rust will not allow you to create an enum where two variants have the
|
||||
same discriminant.
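
For illustration, here is a C-like enum pinned to one byte. `Opcode` and its variants are hypothetical names chosen for this sketch. Asking for a discriminant that does not fit in a `u8` (say, `Store = 256`) would be rejected at compile time.

```rust
use std::mem::size_of;

/// A C-like enum whose discriminant is stored as a `u8`.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opcode {
    Nop = 0,
    Load = 1,
    Store = 200, // still fits in a u8
}

/// Size of the enum in bytes.
fn opcode_size() -> usize {
    size_of::<Opcode>()
}

/// A C-like enum can be cast to its underlying integer type.
fn as_byte(op: Opcode) -> u8 {
    op as u8
}
```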
|
||||
|
||||
On non-C-like enums, this will inhibit certain optimizations like the null-
|
||||
pointer optimization.
|
||||
@ -65,9 +65,12 @@ compiler might be able to paper over alignment issues with shifts and masks.
|
||||
However if you take a reference to a packed field, it's unlikely that the
|
||||
compiler will be able to emit code to avoid an unaligned load.
|
||||
|
||||
**[As of Rust 1.0 this can cause undefined behaviour.][ub loads]**
|
||||
|
||||
`repr(packed)` is not to be used lightly. Unless you have extreme requirements,
|
||||
this should not be used.
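
To make the trade-off concrete, the sketch below compares a padded struct with its packed twin. `Natural`, `Packed`, and `read_value` are hypothetical names, and the sizes in the comments assume a platform where `u32` is 4-byte aligned. Note that compilers newer than the one this text describes refuse to take a reference to a packed field at all, so the field is copied out by value instead.

```rust
use std::mem::{align_of, size_of};

/// Normal C layout: 3 bytes of padding follow `tag` (8 bytes total
/// where `u32` is 4-byte aligned).
#[repr(C)]
struct Natural {
    tag: u8,
    value: u32,
}

/// Packed layout: no padding, so `value` sits at offset 1 (5 bytes total).
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

/// Copies the field out by value; the compiler emits an unaligned load.
/// Writing `&p.value` instead would create a misaligned reference.
fn read_value(p: &Packed) -> u32 {
    p.value
}
```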
|
||||
|
||||
This repr is a modifier on `repr(C)` and `repr(Rust)`.
|
||||
|
||||
[drop flag]: drop-flags.html
|
||||
[drop flags]: drop-flags.html
|
||||
[ub loads]: https://github.com/rust-lang/rust/issues/27060
|
||||
|
@ -5,16 +5,17 @@ memory-safe and efficient, while avoiding garbage collection. Before getting
|
||||
into the ownership system in detail, we will consider the motivation of this
|
||||
design.
|
||||
|
||||
We will assume that you accept that garbage collection is not always an optimal
|
||||
solution, and that it is desirable to manually manage memory to some extent.
|
||||
If you do not accept this, might I interest you in a different language?
|
||||
We will assume that you accept that garbage collection (GC) is not always an
|
||||
optimal solution, and that it is desirable to manually manage memory in some
|
||||
contexts. If you do not accept this, might I interest you in a different
|
||||
language?
|
||||
|
||||
Regardless of your feelings on GC, it is pretty clearly a *massive* boon to
|
||||
making code safe. You never have to worry about things going away *too soon*
|
||||
(although whether you still *wanted* to be pointing at that thing is a different
|
||||
issue...). This is a pervasive problem that C and C++ need to deal with.
|
||||
Consider this simple mistake that all of us who have used a non-GC'd language
|
||||
have made at one point:
|
||||
issue...). This is a pervasive problem that C and C++ programs need to deal
|
||||
with. Consider this simple mistake that all of us who have used a non-GC'd
|
||||
language have made at one point:
|
||||
|
||||
```rust,ignore
|
||||
fn as_str(data: &u32) -> &str {
|
||||
@ -40,7 +41,7 @@ be forced to accept your program on the assumption that it is correct.
|
||||
This will never happen to Rust. It's up to the programmer to prove to the
|
||||
compiler that everything is sound.
|
||||
|
||||
Of course, rust's story around ownership is much more complicated than just
|
||||
Of course, Rust's story around ownership is much more complicated than just
|
||||
verifying that references don't escape the scope of their referent. That's
|
||||
because ensuring pointers are always valid is much more complicated than this.
|
||||
For instance in this code,
|
||||
|
@ -1,5 +1,19 @@
|
||||
% repr(Rust)
|
||||
|
||||
First and foremost, all types have an alignment specified in bytes. The
|
||||
alignment of a type specifies what addresses are valid to store the value at. A
|
||||
value of alignment `n` must only be stored at an address that is a multiple of
|
||||
`n`. So alignment 2 means you must be stored at an even address, and 1 means
|
||||
that you can be stored anywhere. Alignment is at least 1, and always a power of
|
||||
2. Most primitives are generally aligned to their size, although this is
|
||||
platform-specific behaviour. In particular, on x86 `u64` and `f64` may be only
|
||||
aligned to 32 bits.
|
||||
|
||||
A type's size must always be a multiple of its alignment. This ensures that an
|
||||
array of that type may always be indexed by offsetting by a multiple of its
|
||||
size. Note that the size and alignment of a type may not be known
|
||||
statically in the case of [dynamically sized types][dst].
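
The rules above can be spot-checked for any statically sized type with `std::mem::align_of` and `std::mem::size_of`. The helper functions here are hypothetical and exist only for this sketch.

```rust
use std::mem::{align_of, size_of};

/// Alignment is at least 1 and always a power of 2.
fn is_valid_alignment<T>() -> bool {
    let align = align_of::<T>();
    align >= 1 && align.is_power_of_two()
}

/// A type's size is always a multiple of its alignment.
fn size_is_multiple_of_align<T>() -> bool {
    size_of::<T>() % align_of::<T>() == 0
}

/// Whether `addr` is a valid place to store a value of type `T`.
fn is_aligned_for<T>(addr: usize) -> bool {
    addr % align_of::<T>() == 0
}
```

Because alignment 1 accepts every address, `is_aligned_for::<u8>(addr)` holds for any `addr`.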
|
||||
|
||||
Rust gives you the following ways to lay out composite data:
|
||||
|
||||
* structs (named product types)
|
||||
@ -9,17 +23,10 @@ Rust gives you the following ways to lay out composite data:
|
||||
|
||||
An enum is said to be *C-like* if none of its variants have associated data.
|
||||
|
||||
For all these, individual fields are aligned to their preferred alignment. For
|
||||
primitives this is usually equal to their size. For instance, a u32 will be
|
||||
aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16
|
||||
bits. Note that some primitives may be emulated on different platforms, and as
|
||||
such may have strange alignment. For instance, a u64 on x86 may actually be
|
||||
emulated as a pair of u32s, and thus only have 32-bit alignment.
|
||||
|
||||
Composite structures will have a preferred alignment equal to the maximum
|
||||
of their fields' preferred alignment, and a size equal to a multiple of their
|
||||
preferred alignment. This ensures that arrays of T can be correctly iterated
|
||||
by offsetting by their size. So for instance,
|
||||
Composite structures will have an alignment equal to the maximum
|
||||
of their fields' alignment. Rust will consequently insert padding where
|
||||
necessary to ensure that all fields are properly aligned and that the overall
|
||||
type's size is a multiple of its alignment. For instance:
|
||||
|
||||
```rust
|
||||
struct A {
|
||||
@ -29,12 +36,24 @@ struct A {
|
||||
}
|
||||
```
|
||||
|
||||
will have a size that is a multiple of 32-bits, and 32-bit alignment.
|
||||
will be 32-bit aligned, assuming these primitives are aligned to their size.
It will therefore have a size that is a multiple of 32 bits. It will potentially
*really* become:
|
||||
|
||||
There is *no indirection* for these types; all data is stored contiguously as you would
|
||||
expect in C. However with the exception of arrays (which are densely packed and
|
||||
in-order), the layout of data is not by default specified in Rust. Given the two
|
||||
following struct definitions:
|
||||
```rust
|
||||
struct A {
|
||||
a: u8,
|
||||
_pad1: [u8; 3], // to align `b`
|
||||
b: u32,
|
||||
c: u16,
|
||||
_pad2: [u8; 2], // to make overall size multiple of 4
|
||||
}
|
||||
```
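
One way to observe this padding is to pin the field order with `repr(C)` and compare the plain struct against a copy with the padding spelled out. `InOrder` and `Padded` are hypothetical names, and the 12-byte figure assumes `u32` is 4-byte aligned. Without `repr(C)`, the compiler is free to pick a different, possibly smaller, layout.

```rust
use std::mem::{align_of, size_of};

/// Fields in declaration order; the compiler inserts the padding.
#[repr(C)]
struct InOrder {
    a: u8,
    b: u32,
    c: u16,
}

/// The same layout with the padding written out by hand.
#[repr(C)]
struct Padded {
    a: u8,
    _pad1: [u8; 3], // to align `b`
    b: u32,
    c: u16,
    _pad2: [u8; 2], // to make overall size a multiple of 4
}

/// Whether both structs end up with the same size and alignment.
fn layouts_match() -> bool {
    size_of::<InOrder>() == size_of::<Padded>()
        && align_of::<InOrder>() == align_of::<Padded>()
}
```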
|
||||
|
||||
There is *no indirection* for these types; all data is stored contiguously as
|
||||
you would expect in C. However with the exception of arrays (which are densely
|
||||
packed and in-order), the layout of data is not by default specified in Rust.
|
||||
Given the two following struct definitions:
|
||||
|
||||
```rust
|
||||
struct A {
|
||||
@ -48,13 +67,15 @@ struct B {
|
||||
}
|
||||
```
|
||||
|
||||
Rust *does* guarantee that two instances of A have their data laid out in exactly
|
||||
the same way. However Rust *does not* guarantee that an instance of A has the same
|
||||
field ordering or padding as an instance of B (in practice there's no *particular*
|
||||
reason why they wouldn't, other than that it's not currently guaranteed).
|
||||
Rust *does* guarantee that two instances of A have their data laid out in
|
||||
exactly the same way. However Rust *does not* guarantee that an instance of A
|
||||
has the same field ordering or padding as an instance of B (in practice there's
|
||||
no *particular* reason why they wouldn't, other than that it's not currently
|
||||
guaranteed).
|
||||
|
||||
With A and B as written, this is basically nonsensical, but several other features
|
||||
of Rust make it desirable for the language to play with data layout in complex ways.
|
||||
With A and B as written, this is basically nonsensical, but several other
|
||||
features of Rust make it desirable for the language to play with data layout in
|
||||
complex ways.
|
||||
|
||||
For instance, consider this struct:
|
||||
|
||||
@ -66,10 +87,10 @@ struct Foo<T, U> {
|
||||
}
|
||||
```
|
||||
|
||||
Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If Rust lays out the
|
||||
fields in the order specified, we expect it to *pad* the values in the struct to satisfy
|
||||
their *alignment* requirements. So if Rust didn't reorder fields, we would expect Rust to
|
||||
produce the following:
|
||||
Now consider the monomorphizations of `Foo<u32, u16>` and `Foo<u16, u32>`. If
|
||||
Rust lays out the fields in the order specified, we expect it to *pad* the
|
||||
values in the struct to satisfy their *alignment* requirements. So if Rust
|
||||
didn't reorder fields, we would expect Rust to produce the following:
|
||||
|
||||
```rust,ignore
|
||||
struct Foo<u16, u32> {
|
||||
@ -87,10 +108,11 @@ struct Foo<u32, u16> {
|
||||
}
|
||||
```
|
||||
|
||||
The latter case quite simply wastes space. An optimal use of space therefore requires
|
||||
different monomorphizations to have *different field orderings*.
|
||||
The latter case quite simply wastes space. An optimal use of space therefore
|
||||
requires different monomorphizations to have *different field orderings*.
|
||||
|
||||
**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0**
|
||||
**Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0**
|
||||
|
||||
Enums make this consideration even more complicated. Naively, an enum such as:
|
||||
|
||||
@ -121,8 +143,10 @@ by using null as a special value. The net result is that
|
||||
|
||||
There are many types in Rust that are, or contain, "not null" pointers such as
|
||||
`Box<T>`, `Vec<T>`, `String`, `&T`, and `&mut T`. Similarly, one can imagine
|
||||
nested enums pooling their tags into a single descriminant, as they are by
|
||||
nested enums pooling their tags into a single discriminant, as they are by
|
||||
definition known to have a limited range of valid values. In principle enums can
|
||||
use fairly elaborate algorithms to cache bits throughout nested types with
|
||||
special constrained representations. As such it is *especially* desirable that
|
||||
we leave enum layout unspecified today.
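
The null pointer optimization is easy to observe by comparing sizes. `same_size` is a hypothetical helper for this sketch. `Box<T>` and `&T` can never be null, so `Option` reuses the null value as its tag, whereas `usize` has no spare value to borrow.

```rust
use std::mem::size_of;

/// Whether two types occupy the same number of bytes.
fn same_size<A, B>() -> bool {
    size_of::<A>() == size_of::<B>()
}
```

With this helper, `same_size::<Option<Box<u32>>, Box<u32>>()` holds, while `same_size::<Option<usize>, usize>()` does not.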
|
||||
|
||||
[dst]: exotic-sizes.html#dynamically-sized-types-(dsts)
|
||||
|
@ -1,29 +1,30 @@
|
||||
% How Safe and Unsafe Interact
|
||||
|
||||
So what's the relationship between Safe and Unsafe? How do they interact?
|
||||
So what's the relationship between Safe and Unsafe Rust? How do they interact?
|
||||
|
||||
Rust models the seperation between Safe and Unsafe with the `unsafe` keyword, which
|
||||
can be thought of as a sort of *foreign function interface* (FFI) between Safe and Unsafe.
|
||||
This is the magic behind why we can say Safe is a safe language: all the scary unsafe
|
||||
bits are relagated *exclusively* to FFI *just like every other safe language*.
|
||||
Rust models the separation between Safe and Unsafe Rust with the `unsafe`
|
||||
keyword, which can be thought of as a sort of *foreign function interface* (FFI)
|
||||
between Safe and Unsafe Rust. This is the magic behind why we can say Safe Rust
|
||||
is a safe language: all the scary unsafe bits are relegated *exclusively* to FFI
|
||||
*just like every other safe language*.
|
||||
|
||||
However because one language is a subset of the other, the two can be cleanly
|
||||
intermixed as long as the boundary between Safe and Unsafe is denoted with the
|
||||
`unsafe` keyword. No need to write headers, initialize runtimes, or any of that
|
||||
other FFI boiler-plate.
|
||||
intermixed as long as the boundary between Safe and Unsafe Rust is denoted with
|
||||
the `unsafe` keyword. No need to write headers, initialize runtimes, or any of
|
||||
that other FFI boiler-plate.
|
||||
|
||||
There are several places `unsafe` can appear in Rust today, which can largely be
|
||||
grouped into two categories:
|
||||
|
||||
* There are unchecked contracts here. To declare you understand this, I require
|
||||
you to write `unsafe` elsewhere:
|
||||
* On functions, `unsafe` is declaring the function to be unsafe to call. Users
|
||||
of the function must check the documentation to determine what this means,
|
||||
and then have to write `unsafe` somewhere to identify that they're aware of
|
||||
the danger.
|
||||
* On functions, `unsafe` is declaring the function to be unsafe to call.
|
||||
Users of the function must check the documentation to determine what this
|
||||
means, and then have to write `unsafe` somewhere to identify that they're
|
||||
aware of the danger.
|
||||
* On trait declarations, `unsafe` is declaring that *implementing* the trait
|
||||
is an unsafe operation, as it has contracts that other unsafe code is free to
|
||||
trust blindly. (More on this below.)
|
||||
is an unsafe operation, as it has contracts that other unsafe code is free
|
||||
to trust blindly. (More on this below.)
|
||||
|
||||
* I am declaring that I have, to the best of my knowledge, adhered to the
|
||||
unchecked contracts:
|
||||
@ -64,9 +65,9 @@ This means that Unsafe, **the royal vanguard of Undefined Behaviour**, has to be
|
||||
*super paranoid* about generic safe code. Unsafe is free to trust *specific* safe
|
||||
code (or else you would degenerate into infinite spirals of paranoid despair).
|
||||
It is generally regarded as ok to trust the standard library to be correct, as
|
||||
std is effectively an extension of the language (and you *really* just have to trust
|
||||
the language). If `std` fails to uphold the guarantees it declares, then it's
|
||||
basically a language bug.
|
||||
`std` is effectively an extension of the language (and you *really* just have
|
||||
to trust the language). If `std` fails to uphold the guarantees it declares,
|
||||
then it's basically a language bug.
|
||||
|
||||
That said, it would be best to minimize *needlessly* relying on properties of
|
||||
concrete safe code. Bugs happen! Of course, I must reinforce that this is only
|
||||
@ -89,7 +90,7 @@ Ord for a type, but don't actually provide a proper total ordering, BTreeMap wil
|
||||
get *really confused* and start making a total mess of itself. Data that is
|
||||
inserted may be impossible to find!
|
||||
|
||||
But that's ok. BTreeMap is safe, so it guarantees that even if you give it a
|
||||
But that's okay. BTreeMap is safe, so it guarantees that even if you give it a
|
||||
*completely* garbage Ord implementation, it will still do something *safe*. You
|
||||
won't start reading uninitialized memory or unallocated memory. In fact, BTreeMap
|
||||
manages to not actually lose any of your data. When the map is dropped, all the
|
||||
@ -104,7 +105,24 @@ Safe's responsibility to uphold.
|
||||
But wouldn't it be grand if there was some way for Unsafe to trust *some* trait
|
||||
contracts *somewhere*? This is the problem that unsafe traits tackle: by marking
|
||||
*the trait itself* as unsafe *to implement*, Unsafe can trust the implementation
|
||||
to be correct.
|
||||
to uphold the trait's contract, although the trait implementation may still be
incorrect in arbitrary other ways.
|
||||
|
||||
For instance, given a hypothetical UnsafeOrd trait, this is technically a valid
|
||||
implementation:
|
||||
|
||||
```rust
|
||||
# use std::cmp::Ordering;
|
||||
# struct MyType;
|
||||
# pub unsafe trait UnsafeOrd { fn cmp(&self, other: &Self) -> Ordering; }
|
||||
unsafe impl UnsafeOrd for MyType {
|
||||
fn cmp(&self, other: &Self) -> Ordering {
|
||||
Ordering::Equal
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
But it's probably not the implementation you want.
|
||||
|
||||
Rust has traditionally avoided making traits unsafe because it makes Unsafe
|
||||
pervasive, which is not desirable. Send and Sync are unsafe because
|
||||
|
@ -1,8 +1,8 @@
|
||||
% Working with Unsafe
|
||||
|
||||
Rust generally only gives us the tools to talk about Unsafe in a scoped and
|
||||
binary manner. Unfortunately, reality is significantly more complicated than that.
|
||||
For instance, consider the following toy function:
|
||||
Rust generally only gives us the tools to talk about Unsafe Rust in a scoped and
|
||||
binary manner. Unfortunately, reality is significantly more complicated than
|
||||
that. For instance, consider the following toy function:
|
||||
|
||||
```rust
|
||||
fn index(idx: usize, arr: &[u8]) -> Option<u8> {
|
||||
@ -35,10 +35,15 @@ fn index(idx: usize, arr: &[u8]) -> Option<u8> {
|
||||
|
||||
This program is now unsound, and yet *we only modified safe code*. This is the
|
||||
fundamental problem of safety: it's non-local. The soundness of our unsafe
|
||||
operations necessarily depends on the state established by "safe" operations.
|
||||
Although safety *is* modular (we *still* don't need to worry about
|
||||
unrelated safety issues like uninitialized memory), it quickly contaminates the
|
||||
surrounding code.
|
||||
operations necessarily depends on the state established by otherwise
|
||||
"safe" operations.
|
||||
|
||||
Safety is modular in the sense that opting into unsafety doesn't require you
|
||||
to consider arbitrary other kinds of badness. For instance, doing an unchecked
|
||||
index into a slice doesn't mean you suddenly need to worry about the slice being
|
||||
null or containing uninitialized memory. Nothing fundamentally changes. However
|
||||
safety *isn't* modular in the sense that programs are inherently stateful and
|
||||
your unsafe operations may depend on arbitrary other state.
|
||||
|
||||
Trickier than that is when we get into actual statefulness. Consider a simple
|
||||
implementation of `Vec`:
|
||||
@ -84,10 +89,10 @@ fn make_room(&mut self) {
|
||||
}
|
||||
```
|
||||
|
||||
This code is safe, but it is also completely unsound. Changing the capacity
|
||||
violates the invariants of Vec (that `cap` reflects the allocated space in the
|
||||
Vec). This is not something the rest of Vec can guard against. It *has* to
|
||||
trust the capacity field because there's no way to verify it.
|
||||
This code is 100% Safe Rust but it is also completely unsound. Changing the
|
||||
capacity violates the invariants of Vec (that `cap` reflects the allocated space
|
||||
in the Vec). This is not something the rest of Vec can guard against. It *has*
|
||||
to trust the capacity field because there's no way to verify it.
|
||||
|
||||
`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
|
||||
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
|
||||
@ -102,9 +107,13 @@ as Vec.
|
||||
It is therefore possible for us to write a completely safe abstraction that
|
||||
relies on complex invariants. This is *critical* to the relationship between
|
||||
Safe Rust and Unsafe Rust. We have already seen that Unsafe code must trust
|
||||
*some* Safe code, but can't trust *arbitrary* Safe code. However if Unsafe
|
||||
couldn't prevent client Safe code from messing with its state in arbitrary ways,
|
||||
safety would be a lost cause.
|
||||
*some* Safe code, but can't trust *generic* Safe code. It can't trust an
|
||||
arbitrary implementor of a trait or any function that was passed to it to be
|
||||
well-behaved in a way that safe code doesn't care about.
|
||||
|
||||
However if unsafe code couldn't prevent client safe code from messing with its
|
||||
state in arbitrary ways, safety would be a lost cause. Thankfully, it *can*
|
||||
prevent arbitrary code from messing with critical state due to privacy.
|
||||
|
||||
Safety lives!
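
A minimal sketch of privacy protecting an invariant, under the assumption of a hypothetical `checked` module and `Prefix` type invented for this example. The unchecked read is sound only because no code outside the module can touch `data` or `len`.

```rust
mod checked {
    /// Invariant: `len <= data.len()`. Both fields are private, so only
    /// code inside this module can break it.
    pub struct Prefix {
        data: Vec<u8>,
        len: usize,
    }

    impl Prefix {
        /// The only way to build a `Prefix`; establishes the invariant.
        pub fn new(data: Vec<u8>, len: usize) -> Option<Prefix> {
            if len <= data.len() {
                Some(Prefix { data, len })
            } else {
                None
            }
        }

        /// Reads element `idx` of the prefix, if it exists.
        pub fn get(&self, idx: usize) -> Option<u8> {
            if idx < self.len {
                // Sound because `idx < len <= data.len()` by the invariant.
                Some(unsafe { *self.data.get_unchecked(idx) })
            } else {
                None
            }
        }
    }
}
```

Client code can call `checked::Prefix::new` and `get`, but it cannot write to `len` directly, so it has no safe way to make the `unsafe` block misbehave.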
|
||||
|
||||
|