24 KiB
% Example: Implementing Vec
TODO: audit for non-ZST offsets from heap::empty
To bring everything together, we're going to write std::Vec
from scratch.
Because the all the best tools for writing unsafe code are unstable, this
project will only work on nightly (as of Rust 1.2.0).
Layout
First off, we need to come up with the struct layout. Naively we want this design:
struct Vec<T> {
ptr: *mut T,
cap: usize,
len: usize,
}
And indeed this would compile. Unfortunately, it would be incorrect. The compiler
will give us too strict variance, so e.g. an &Vec<&'static str>
couldn't be used
where an &Vec<&'a str>
was expected. More importantly, it will give incorrect
ownership information to dropck, as it will conservatively assume we don't own
any values of type T
. See [the chapter on ownership and lifetimes]
(lifetimes.html) for details.
As we saw in the lifetimes chapter, we should use Unique<T>
in place of *mut T
when we have a raw pointer to an allocation we own:
#![feature(unique)]
use std::ptr::{Unique, self};
pub struct Vec<T> {
ptr: Unique<T>,
cap: usize,
len: usize,
}
As a recap, Unique is a wrapper around a raw pointer that declares that:
- We own at least one value of type
T
- We are Send/Sync iff
T
is Send/Sync - Our pointer is never null (and therefore
Option<Vec>
is null-pointer-optimized)
That last point is subtle. First, it makes Unique::new
unsafe to call, because
putting null
inside of it is Undefined Behaviour. It also throws a
wrench in an important feature of Vec (and indeed all of the std collections):
an empty Vec doesn't actually allocate at all. So if we can't allocate,
but also can't put a null pointer in ptr
, what do we do in
Vec::new
? Well, we just put some other garbage in there!
This is perfectly fine because we already have cap == 0
as our sentinel for no
allocation. We don't even need to handle it specially in almost any code because
we usually need to check if cap > len
or len > 0
anyway. The traditional
Rust value to put here is 0x01
. The standard library actually exposes this
as std::rt::heap::EMPTY
. There are quite a few places where we'll want to use
heap::EMPTY
because there's no real allocation to talk about but null
would
make the compiler angry.
All of the heap
API is totally unstable under the alloc
feature, though.
We could trivially define heap::EMPTY
ourselves, but we'll want the rest of
the heap
API anyway, so let's just get that dependency over with.
Allocating Memory
So:
#![feature(alloc)]
use std::rt::heap::EMPTY;
use std::mem;
impl<T> Vec<T> {
fn new() -> Self {
assert!(mem::size_of::<T>() != 0, "We're not ready to handle ZSTs");
unsafe {
// need to cast EMPTY to the actual ptr type we want, let
// inference handle it.
Vec { ptr: Unique::new(heap::EMPTY as *mut _), len: 0, cap: 0 }
}
}
}
I slipped in that assert there because zero-sized types will require some special handling throughout our code, and I want to defer the issue for now. Without this assert, some of our early drafts will do some Very Bad Things.
Next we need to figure out what to actually do when we do want space. For that, we'll need to use the rest of the heap APIs. These basically allow us to talk directly to Rust's instance of jemalloc.
We'll also need a way to handle out-of-memory conditions. The standard library
calls the abort
intrinsic, but calling intrinsics from normal Rust code is a
pretty bad idea. Unfortunately, the abort
exposed by the standard library
allocates. Not something we want to do during oom
! Instead, we'll call
std::process::exit
.
fn oom() {
::std::process::exit(-9999);
}
Okay, now we can write growing. Roughly, we want to have this logic:
if cap == 0:
allocate()
cap = 1
else
reallocate
cap *= 2
But Rust's only supported allocator API is so low level that we'll need to
do a fair bit of extra work, though. We also need to guard against some special
conditions that can occur with really large allocations. In particular, we index
into arrays using unsigned integers, but ptr::offset
takes signed integers. This
means Bad Things will happen if we ever manage to grow to contain more than
isize::MAX
elements. Thankfully, this isn't something we need to worry about
in most cases.
On 64-bit targets we're artifically limited to only 48-bits, so we'll run out
of memory far before we reach that point. However on 32-bit targets, particularly
those with extensions to use more of the address space, it's theoretically possible
to successfully allocate more than isize::MAX
bytes of memory. Still, we only
really need to worry about that if we're allocating elements that are a byte large.
Anything else will use up too much space.
However since this is a tutorial, we're not going to be particularly optimal here,
and just unconditionally check, rather than use clever platform-specific cfg
s.
fn grow(&mut self) {
// this is all pretty delicate, so let's say it's all unsafe
unsafe {
let align = mem::min_align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
// as an invariant, we can assume that `self.cap < isize::MAX`,
// so this doesn't need to be checked.
let new_cap = self.cap * 2;
// Similarly this can't overflow due to previously allocating this
let old_num_bytes = self.cap * elem_size;
// check that the new allocation doesn't exceed `isize::MAX` at all
// regardless of the actual size of the capacity. This combines the
// `new_cap <= isize::MAX` and `new_num_bytes <= usize::MAX` checks
// we need to make. We lose the ability to allocate e.g. 2/3rds of
// the address space with a single Vec of i16's on 32-bit though.
// Alas, poor Yorick -- I knew him, Horatio.
assert!(old_num_bytes <= (::std::isize::MAX as usize) / 2,
"capacity overflow");
let new_num_bytes = old_num_bytes * 2;
let ptr = heap::reallocate(*self.ptr as *mut _,
old_num_bytes,
new_num_bytes,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom(); }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
Nothing particularly tricky here. Just computing sizes and alignments and doing some careful multiplication checks.
Push and Pop
Alright. We can initialize. We can allocate. Let's actually implement some
functionality! Let's start with push
. All it needs to do is check if we're
full to grow, unconditionally write to the next index, and then increment our
length.
To do the write we have to be careful not to evaluate the memory we want to write
to. At worst, it's truly uninitialized memory from the allocator. At best it's the
bits of some old value we popped off. Either way, we can't just index to the memory
and dereference it, because that will evaluate the memory as a valid instance of
T. Worse, foo[idx] = x
will try to call drop
on the old value of foo[idx]
!
The correct way to do this is with ptr::write
, which just blindly overwrites the
target address with the bits of the value we provide. No evaluation involved.
For push
, if the old len (before push was called) is 0, then we want to write
to the 0th index. So we should offset by the old len.
pub fn push(&mut self, elem: T) {
if self.len == self.cap { self.grow(); }
unsafe {
ptr::write(self.ptr.offset(self.len as isize), elem);
}
// Can't fail, we'll OOM first.
self.len += 1;
}
Easy! How about pop
? Although this time the index we want to access is
initialized, Rust won't just let us dereference the location of memory to move
the value out, because that would leave the memory uninitialized! For this we
need ptr::read
, which just copies out the bits from the target address and
intrprets it as a value of type T. This will leave the memory at this address
logically uninitialized, even though there is in fact a perfectly good instance
of T there.
For pop
, if the old len is 1, we want to read out of the 0th index. So we
should offset by the new len.
pub fn pop(&mut self) -> Option<T> {
if self.len == 0 {
None
} else {
self.len -= 1;
unsafe {
Some(ptr::read(self.ptr.offset(self.len as isize)))
}
}
}
Deallocating
Next we should implement Drop so that we don't massively leaks tons of resources.
The easiest way is to just call pop
until it yields None, and then deallocate
our buffer. Note that calling pop
is uneeded if T: !Drop
. In theory we can
ask Rust if T needs_drop and omit the calls to pop
. However in practice LLVM
is really good at removing simple side-effect free code like this, so I wouldn't
bother unless you notice it's not being stripped (in this case it is).
We must not call heap::deallocate
when self.cap == 0
, as in this case we haven't
actually allocated any memory.
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
if self.cap != 0 {
while let Some(_) = self.pop() { }
let align = mem::min_align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr, num_bytes, align);
}
}
}
}
Deref
Alright! We've got a decent minimal ArrayStack implemented. We can push, we can
pop, and we can clean up after ourselves. However there's a whole mess of functionality
we'd reasonably want. In particular, we have a proper array, but none of the slice
functionality. That's actually pretty easy to solve: we can implement Deref<Target=[T]>
.
This will magically make our Vec coerce to and behave like a slice in all sorts of
conditions.
All we need is slice::from_raw_parts
.
use std::ops::Deref;
impl<T> Deref for Vec<T> {
type Target = [T];
fn deref(&self) -> &[T] {
unsafe {
::std::slice::from_raw_parts(*self.ptr, self.len)
}
}
}
And let's do DerefMut too:
use std::ops::DerefMut;
impl<T> DerefMut for Vec<T> {
fn deref_mut(&mut self) -> &mut [T] {
unsafe {
::std::slice::from_raw_parts_mut(*self.ptr, self.len)
}
}
}
Now we have len
, first
, last
, indexing, slicing, sorting, iter
, iter_mut
,
and all other sorts of bells and whistles provided by slice. Sweet!
Insert and Remove
Something not provided but slice is insert
and remove
, so let's do those next.
Insert needs to shift all the elements at the target index to the right by one.
To do this we need to use ptr::copy
, which is our version of C's memmove
.
This copies some chunk of memory from one location to another, correctly handling
the case where the source and destination overlap (which will definitely happen
here).
If we insert at index i
, we want to shift the [i .. len]
to [i+1 .. len+1]
using the old len.
pub fn insert(&mut self, index: usize, elem: T) {
// Note: `<=` because it's valid to insert after everything
// which would be equivalent to push.
assert!(index <= self.len, "index out of bounds");
if self.cap == self.len { self.grow(); }
unsafe {
if index < self.len {
// ptr::copy(src, dest, len): "copy from source to dest len elems"
ptr::copy(self.ptr.offset(index as isize),
self.ptr.offset(index as isize + 1),
len - index);
}
ptr::write(self.ptr.offset(index as isize), elem);
self.len += 1;
}
}
Remove behaves in the opposite manner. We need to shift all the elements from
[i+1 .. len + 1]
to [i .. len]
using the new len.
pub fn remove(&mut self, index: usize) -> T {
// Note: `<` because it's *not* valid to remove after everything
assert!(index < self.len, "index out of bounds");
unsafe {
self.len -= 1;
let result = ptr::read(self.ptr.offset(index as isize));
ptr::copy(self.ptr.offset(index as isize + 1),
self.ptr.offset(index as isize),
len - index);
result
}
}
IntoIter
Let's move on to writing iterators. iter
and iter_mut
have already been
written for us thanks to The Magic of Deref. However there's two interesting
iterators that Vec provides that slices can't: into_iter
and drain
.
IntoIter consumes the Vec by-value, and can consequently yield its elements by-value. In order to enable this, IntoIter needs to take control of Vec's allocation.
IntoIter needs to be DoubleEnded as well, to enable reading from both ends.
Reading from the back could just be implemented as calling pop
, but reading
from the front is harder. We could call remove(0)
but that would be insanely
expensive. Instead we're going to just use ptr::read to copy values out of either
end of the Vec without mutating the buffer at all.
To do this we're going to use a very common C idiom for array iteration. We'll make two pointers; one that points to the start of the array, and one that points to one-element past the end. When we want an element from one end, we'll read out the value pointed to at that end and move the pointer over by one. When the two pointers are equal, we know we're done.
Note that the order of read and offset are reversed for next
and next_back
For next_back
the pointer is always after the element it wants to read next,
while for next
the pointer is always at the element it wants to read next.
To see why this is, consider the case where every element but one has been yielded.
The array looks like this:
S E
[X, X, X, O, X, X, X]
If E pointed directly at the element it wanted to yield next, it would be indistinguishable from the case where there are no more elements to yield.
So we're going to use the following struct:
struct IntoIter<T> {
buf: Unique<T>,
cap: usize,
start: *const T,
end: *const T,
}
And initialize it like this:
impl<T> Vec<T> {
fn into_iter(self) -> IntoIter<T> {
// Can't destructure Vec since it's Drop
let ptr = self.ptr;
let cap = self.cap;
let len = self.len;
// Make sure not to drop Vec since that will free the buffer
mem::forget(self);
unsafe {
IntoIter {
buf: ptr,
cap: cap,
start: *ptr,
end: ptr.offset(len as isize),
}
}
}
}
Here's iterating forward:
impl<T> Iterator for IntoIter<T> {
type Item = T;
fn next(&mut self) -> Option<T> {
if self.start == self.end {
None
} else {
unsafe {
let result = ptr::read(self.start);
self.start = self.start.offset(1);
Some(result)
}
}
}
fn size_hint(&self) -> (usize, Option<usize>) {
let len = self.end as usize - self.start as usize;
(len, Some(len))
}
}
And here's iterating backwards.
impl<T> DoubleEndedIterator for IntoIter<T> {
fn next_back(&mut self) -> Option<T> {
if self.start == self.end {
None
} else {
unsafe {
self.end = self.end.offset(-1);
Some(ptr::read(self.end))
}
}
}
}
Because IntoIter takes ownership of its allocation, it needs to implement Drop to free it. However it also wants to implement Drop to drop any elements it contains that weren't yielded.
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
if self.cap != 0 {
// drop any remaining elements
for _ in &mut *self {}
let align = mem::min_align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.buf as *mut _, num_bytes, align);
}
}
}
}
We've actually reached an interesting situation here: we've duplicated the logic for specifying a buffer and freeing its memory. Now that we've implemented it and identified actual logic duplication, this is a good time to perform some logic compression.
We're going to abstract out the (ptr, cap)
pair and give them the logic for
allocating, growing, and freeing:
struct RawVec<T> {
ptr: Unique<T>,
cap: usize,
}
impl<T> RawVec<T> {
fn new() -> Self {
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
unsafe {
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
}
}
// unchanged from Vec
fn grow(&mut self) {
unsafe {
let align = mem::min_align_of::<T>();
let elem_size = mem::size_of::<T>();
let (new_cap, ptr) = if self.cap == 0 {
let ptr = heap::allocate(elem_size, align);
(1, ptr)
} else {
let new_cap = 2 * self.cap;
let ptr = heap::reallocate(*self.ptr as *mut _,
self.cap * elem_size,
new_cap * elem_size,
align);
(new_cap, ptr)
};
// If allocate or reallocate fail, we'll get `null` back
if ptr.is_null() { oom() }
self.ptr = Unique::new(ptr as *mut _);
self.cap = new_cap;
}
}
}
impl<T> Drop for RawVec<T> {
fn drop(&mut self) {
if self.cap != 0 {
let align = mem::min_align_of::<T>();
let elem_size = mem::size_of::<T>();
let num_bytes = elem_size * self.cap;
unsafe {
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
}
}
}
}
And change vec as follows:
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
fn ptr(&self) -> *mut T { *self.buf.ptr }
fn cap(&self) -> usize { self.buf.cap }
pub fn new() -> Self {
Vec { buf: RawVec::new(), len: 0 }
}
// push/pop/insert/remove largely unchanged:
// * `self.ptr -> self.ptr()`
// * `self.cap -> self.cap()`
// * `self.grow -> self.buf.grow()`
}
impl<T> Drop for Vec<T> {
fn drop(&mut self) {
while let Some(_) = self.pop() {}
// deallocation is handled by RawVec
}
}
And finally we can really simplify IntoIter:
struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
start: *const T,
end: *const T,
}
// next and next_back litterally unchanged since they never referred to the buf
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
// only need to ensure all our elements are read;
// buffer will clean itself up afterwards.
for _ in &mut *self {}
}
}
impl<T> Vec<T> {
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
// need to use ptr::read to unsafely move the buf out since it's
// not Copy.
let buf = ptr::read(&self.buf);
let len = self.len;
mem::forget(self);
IntoIter {
start: *buf.ptr,
end: buf.ptr.offset(len as isize),
_buf: buf,
}
}
}
}
Much better.
Drain
Let's move on to Drain. Drain is largely the same as IntoIter, except that instead of consuming the Vec, it borrows the Vec and leaves its allocation free. For now we'll only implement the "basic" full-range version.
use std::marker::PhantomData;
struct Drain<'a, T: 'a> {
vec: PhantomData<&'a mut Vec<T>>
start: *const T,
end: *const T,
}
impl<'a, T> Iterator for Drain<'a, T> {
type Item = T;
fn next(&mut self) -> Option<T> {
if self.start == self.end {
None
-- wait, this is seeming familiar. Let's do some more compression. Both IntoIter and Drain have the exact same structure, let's just factor it out.
struct RawValIter<T> {
start: *const T,
end: *const T,
}
impl<T> RawValIter<T> {
// unsafe to construct because it has no associated lifetimes.
// This is necessary to store a RawValIter in the same struct as
// its actual allocation. OK since it's a private implementation
// detail.
unsafe fn new(slice: &[T]) -> Self {
RawValIter {
start: slice.as_ptr(),
end: slice.as_ptr().offset(slice.len() as isize),
}
}
}
// Iterator and DoubleEndedIterator impls identical to IntoIter.
And IntoIter becomes the following:
pub struct IntoIter<T> {
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
iter: RawValIter<T>,
}
impl<T> Iterator for IntoIter<T> {
type Item = T;
fn next(&mut self) -> Option<T> { self.iter.next() }
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
}
impl<T> DoubleEndedIterator for IntoIter<T> {
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
}
impl<T> Drop for IntoIter<T> {
fn drop(&mut self) {
for _ in &mut self.iter {}
}
}
impl<T> Vec<T> {
pub fn into_iter(self) -> IntoIter<T> {
unsafe {
let iter = RawValIter::new(&self);
let buf = ptr::read(&self.buf);
mem::forget(self);
IntoIter {
iter: iter,
_buf: buf,
}
}
}
}
Note that I've left a few quirks in this design to make upgrading Drain to work with arbitrary subranges a bit easier. In particular we could have RawValIter drain itself on drop, but that won't work right for a more complex Drain. We also take a slice to simplify Drain initialization.
Alright, now Drain is really easy:
use std::marker::PhantomData;
pub struct Drain<'a, T: 'a> {
vec: PhantomData<&'a mut Vec<T>>,
iter: RawValIter<T>,
}
impl<'a, T> Iterator for Drain<'a, T> {
type Item = T;
fn next(&mut self) -> Option<T> { self.iter.next_back() }
fn size_hint(&self) -> (usize, Option<usize>) { self.iter.size_hint() }
}
impl<'a, T> DoubleEndedIterator for Drain<'a, T> {
fn next_back(&mut self) -> Option<T> { self.iter.next_back() }
}
impl<'a, T> Drop for Drain<'a, T> {
fn drop(&mut self) {
for _ in &mut self.iter {}
}
}
impl<T> Vec<T> {
pub fn drain(&mut self) -> Drain<T> {
// this is a mem::forget safety thing. If Drain is forgotten, we just
// leak the whole Vec's contents. Also we need to do this *eventually*
// anyway, so why not do it now?
self.len = 0;
unsafe {
Drain {
iter: RawValIter::new(&self),
vec: PhantomData,
}
}
}
}
Handling Zero-Sized Types
It's time. We're going to fight the spectre that is zero-sized types. Safe Rust never needs to care about this, but Vec is very intensive on raw pointers and raw allocations, which are exactly the only two things that care about zero-sized types. We need to be careful of two things:
- The raw allocator API has undefined behaviour if you pass in 0 for an allocation size.
- raw pointer offsets are no-ops for zero-sized types, which will break our C-style pointer iterator
Thankfully we abstracted out pointer-iterators and allocating handling into RawValIter and RawVec respectively. How mysteriously convenient.
Allocating Zero-Sized Types
So if the allocator API doesn't support zero-sized allocations, what on earth
do we store as our allocation? Why, heap::EMPTY
of course! Almost every operation
with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs
to be considered to store or load them. This actually extends to ptr::read
and
ptr::write
: they won't actually look at the pointer at all. As such we never need
to change the pointer.
TODO
Iterating Zero-Sized Types
TODO
Advanced Drain
TODO? Not clear if informative