clean up vec chapter of tarpl
This commit is contained in:
parent
42c2f107c1
commit
5f6e0abe27
@ -46,6 +46,7 @@
|
||||
* [Deref](vec-deref.md)
|
||||
* [Insert and Remove](vec-insert-remove.md)
|
||||
* [IntoIter](vec-into-iter.md)
|
||||
* [RawVec](vec-raw.md)
|
||||
* [Drain](vec-drain.md)
|
||||
* [Handling Zero-Sized Types](vec-zsts.md)
|
||||
* [Final Code](vec-final.md)
|
||||
|
@ -46,7 +46,7 @@ Okay, now we can write growing. Roughly, we want to have this logic:
|
||||
if cap == 0:
|
||||
allocate()
|
||||
cap = 1
|
||||
else
|
||||
else:
|
||||
reallocate
|
||||
cap *= 2
|
||||
```
|
||||
|
@ -2,13 +2,13 @@
|
||||
|
||||
Next we should implement Drop so that we don't massively leak tons of resources.
|
||||
The easiest way is to just call `pop` until it yields None, and then deallocate
|
||||
our buffer. Note that calling `pop` is uneeded if `T: !Drop`. In theory we can
|
||||
ask Rust if T needs_drop and omit the calls to `pop`. However in practice LLVM
|
||||
is *really* good at removing simple side-effect free code like this, so I wouldn't
|
||||
bother unless you notice it's not being stripped (in this case it is).
|
||||
our buffer. Note that calling `pop` is unneeded if `T: !Drop`. In theory we can
|
||||
ask Rust if `T` `needs_drop` and omit the calls to `pop`. However in practice
|
||||
LLVM is *really* good at removing simple side-effect free code like this, so I
|
||||
wouldn't bother unless you notice it's not being stripped (in this case it is).
|
||||
|
||||
We must not call `heap::deallocate` when `self.cap == 0`, as in this case we haven't
|
||||
actually allocated any memory.
|
||||
We must not call `heap::deallocate` when `self.cap == 0`, as in this case we
|
||||
haven't actually allocated any memory.
|
||||
|
||||
|
||||
```rust,ignore
|
||||
|
@ -1,13 +1,15 @@
|
||||
% Deref
|
||||
|
||||
Alright! We've got a decent minimal ArrayStack implemented. We can push, we can
|
||||
pop, and we can clean up after ourselves. However there's a whole mess of functionality
|
||||
we'd reasonably want. In particular, we have a proper array, but none of the slice
|
||||
functionality. That's actually pretty easy to solve: we can implement `Deref<Target=[T]>`.
|
||||
This will magically make our Vec coerce to and behave like a slice in all sorts of
|
||||
conditions.
|
||||
Alright! We've got a decent minimal stack implemented. We can push, we can
|
||||
pop, and we can clean up after ourselves. However there's a whole mess of
|
||||
functionality we'd reasonably want. In particular, we have a proper array, but
|
||||
none of the slice functionality. That's actually pretty easy to solve: we can
|
||||
implement `Deref<Target=[T]>`. This will magically make our Vec coerce to, and
|
||||
behave like, a slice in all sorts of conditions.
|
||||
|
||||
All we need is `slice::from_raw_parts`.
|
||||
All we need is `slice::from_raw_parts`. It will correctly handle empty slices
|
||||
for us. Later once we set up zero-sized type support it will also Just Work
|
||||
for those too.
|
||||
|
||||
```rust,ignore
|
||||
use std::ops::Deref;
|
||||
@ -36,5 +38,5 @@ impl<T> DerefMut for Vec<T> {
|
||||
}
|
||||
```
|
||||
|
||||
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`, `iter_mut`,
|
||||
and all other sorts of bells and whistles provided by slice. Sweet!
|
||||
Now we have `len`, `first`, `last`, indexing, slicing, sorting, `iter`,
|
||||
`iter_mut`, and all other sorts of bells and whistles provided by slice. Sweet!
|
||||
|
@ -2,7 +2,7 @@
|
||||
|
||||
Let's move on to Drain. Drain is largely the same as IntoIter, except that
|
||||
instead of consuming the Vec, it borrows the Vec and leaves its allocation
|
||||
free. For now we'll only implement the "basic" full-range version.
|
||||
untouched. For now we'll only implement the "basic" full-range version.
|
||||
|
||||
```rust,ignore
|
||||
use std::marker::PhantomData;
|
||||
@ -38,6 +38,9 @@ impl<T> RawValIter<T> {
|
||||
RawValIter {
|
||||
start: slice.as_ptr(),
|
||||
end: if slice.len() == 0 {
|
||||
// if `len = 0`, then this is not actually allocated memory.
|
||||
// Need to avoid offsetting because that will give wrong
|
||||
// information to LLVM via GEP.
|
||||
slice.as_ptr()
|
||||
} else {
|
||||
slice.as_ptr().offset(slice.len() as isize)
|
||||
@ -137,5 +140,7 @@ impl<T> Vec<T> {
|
||||
}
|
||||
```
|
||||
|
||||
For more details on the `mem::forget` problem, see the
|
||||
[section on leaks][leaks].
|
||||
|
||||
|
||||
[leaks]: leaking.html
|
||||
|
@ -1,12 +1,13 @@
|
||||
% Insert and Remove
|
||||
|
||||
Something *not* provided but slice is `insert` and `remove`, so let's do those next.
|
||||
Something *not* provided by slice is `insert` and `remove`, so let's do those
|
||||
next.
|
||||
|
||||
Insert needs to shift all the elements at the target index to the right by one.
|
||||
To do this we need to use `ptr::copy`, which is our version of C's `memmove`.
|
||||
This copies some chunk of memory from one location to another, correctly handling
|
||||
the case where the source and destination overlap (which will definitely happen
|
||||
here).
|
||||
This copies some chunk of memory from one location to another, correctly
|
||||
handling the case where the source and destination overlap (which will
|
||||
definitely happen here).
|
||||
|
||||
If we insert at index `i`, we want to shift the `[i .. len]` to `[i+1 .. len+1]`
|
||||
using the *old* len.
|
||||
|
@ -11,19 +11,20 @@ allocation.
|
||||
IntoIter needs to be DoubleEnded as well, to enable reading from both ends.
|
||||
Reading from the back could just be implemented as calling `pop`, but reading
|
||||
from the front is harder. We could call `remove(0)` but that would be insanely
|
||||
expensive. Instead we're going to just use ptr::read to copy values out of either
|
||||
end of the Vec without mutating the buffer at all.
|
||||
expensive. Instead we're going to just use ptr::read to copy values out of
|
||||
either end of the Vec without mutating the buffer at all.
|
||||
|
||||
To do this we're going to use a very common C idiom for array iteration. We'll
|
||||
make two pointers; one that points to the start of the array, and one that points
|
||||
to one-element past the end. When we want an element from one end, we'll read out
|
||||
the value pointed to at that end and move the pointer over by one. When the two
|
||||
pointers are equal, we know we're done.
|
||||
make two pointers; one that points to the start of the array, and one that
|
||||
points to one-element past the end. When we want an element from one end, we'll
|
||||
read out the value pointed to at that end and move the pointer over by one. When
|
||||
the two pointers are equal, we know we're done.
|
||||
|
||||
Note that the order of read and offset are reversed for `next` and `next_back`
|
||||
For `next_back` the pointer is always *after* the element it wants to read next,
|
||||
while for `next` the pointer is always *at* the element it wants to read next.
|
||||
To see why this is, consider the case where every element but one has been yielded.
|
||||
To see why this is, consider the case where every element but one has been
|
||||
yielded.
|
||||
|
||||
The array looks like this:
|
||||
|
||||
@ -35,6 +36,10 @@ The array looks like this:
|
||||
If E pointed directly at the element it wanted to yield next, it would be
|
||||
indistinguishable from the case where there are no more elements to yield.
|
||||
|
||||
Although we don't actually care about it during iteration, we also need to hold
|
||||
onto the Vec's allocation information in order to free it once IntoIter is
|
||||
dropped.
|
||||
|
||||
So we're going to use the following struct:
|
||||
|
||||
```rust,ignore
|
||||
@ -46,8 +51,8 @@ struct IntoIter<T> {
|
||||
}
|
||||
```
|
||||
|
||||
One last subtle detail: if our Vec is empty, we want to produce an empty iterator.
|
||||
This will actually technically fall out doing the naive thing of:
|
||||
One last subtle detail: if our Vec is empty, we want to produce an empty
|
||||
iterator. This will actually technically fall out doing the naive thing of:
|
||||
|
||||
```text
|
||||
start = ptr
|
||||
@ -155,139 +160,3 @@ impl<T> Drop for IntoIter<T> {
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
We've actually reached an interesting situation here: we've duplicated the logic
|
||||
for specifying a buffer and freeing its memory. Now that we've implemented it and
|
||||
identified *actual* logic duplication, this is a good time to perform some logic
|
||||
compression.
|
||||
|
||||
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
|
||||
allocating, growing, and freeing:
|
||||
|
||||
```rust,ignore
|
||||
|
||||
struct RawVec<T> {
|
||||
ptr: Unique<T>,
|
||||
cap: usize,
|
||||
}
|
||||
|
||||
impl<T> RawVec<T> {
|
||||
fn new() -> Self {
|
||||
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
|
||||
unsafe {
|
||||
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
|
||||
}
|
||||
}
|
||||
|
||||
// unchanged from Vec
|
||||
fn grow(&mut self) {
|
||||
unsafe {
|
||||
let align = mem::align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
let (new_cap, ptr) = if self.cap == 0 {
|
||||
let ptr = heap::allocate(elem_size, align);
|
||||
(1, ptr)
|
||||
} else {
|
||||
let new_cap = 2 * self.cap;
|
||||
let ptr = heap::reallocate(*self.ptr as *mut _,
|
||||
self.cap * elem_size,
|
||||
new_cap * elem_size,
|
||||
align);
|
||||
(new_cap, ptr)
|
||||
};
|
||||
|
||||
// If allocate or reallocate fail, we'll get `null` back
|
||||
if ptr.is_null() { oom() }
|
||||
|
||||
self.ptr = Unique::new(ptr as *mut _);
|
||||
self.cap = new_cap;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
impl<T> Drop for RawVec<T> {
|
||||
fn drop(&mut self) {
|
||||
if self.cap != 0 {
|
||||
let align = mem::align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And change vec as follows:
|
||||
|
||||
```rust,ignore
|
||||
pub struct Vec<T> {
|
||||
buf: RawVec<T>,
|
||||
len: usize,
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
fn ptr(&self) -> *mut T { *self.buf.ptr }
|
||||
|
||||
fn cap(&self) -> usize { self.buf.cap }
|
||||
|
||||
pub fn new() -> Self {
|
||||
Vec { buf: RawVec::new(), len: 0 }
|
||||
}
|
||||
|
||||
// push/pop/insert/remove largely unchanged:
|
||||
// * `self.ptr -> self.ptr()`
|
||||
// * `self.cap -> self.cap()`
|
||||
// * `self.grow -> self.buf.grow()`
|
||||
}
|
||||
|
||||
impl<T> Drop for Vec<T> {
|
||||
fn drop(&mut self) {
|
||||
while let Some(_) = self.pop() {}
|
||||
// deallocation is handled by RawVec
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And finally we can really simplify IntoIter:
|
||||
|
||||
```rust,ignore
|
||||
struct IntoIter<T> {
|
||||
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
|
||||
// next and next_back literally unchanged since they never referred to the buf
|
||||
|
||||
impl<T> Drop for IntoIter<T> {
|
||||
fn drop(&mut self) {
|
||||
// only need to ensure all our elements are read;
|
||||
// buffer will clean itself up afterwards.
|
||||
for _ in &mut *self {}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
pub fn into_iter(self) -> IntoIter<T> {
|
||||
unsafe {
|
||||
// need to use ptr::read to unsafely move the buf out since it's
|
||||
// not Copy.
|
||||
let buf = ptr::read(&self.buf);
|
||||
let len = self.len;
|
||||
mem::forget(self);
|
||||
|
||||
IntoIter {
|
||||
start: *buf.ptr,
|
||||
end: buf.ptr.offset(len as isize),
|
||||
_buf: buf,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Much better.
|
||||
|
@ -13,15 +13,15 @@ pub struct Vec<T> {
|
||||
# fn main() {}
|
||||
```
|
||||
|
||||
And indeed this would compile. Unfortunately, it would be incorrect. The compiler
|
||||
will give us too strict variance, so e.g. an `&Vec<&'static str>` couldn't be used
|
||||
where an `&Vec<&'a str>` was expected. More importantly, it will give incorrect
|
||||
ownership information to dropck, as it will conservatively assume we don't own
|
||||
any values of type `T`. See [the chapter on ownership and lifetimes]
|
||||
(lifetimes.html) for details.
|
||||
And indeed this would compile. Unfortunately, it would be incorrect. The
|
||||
compiler will give us too strict variance, so e.g. an `&Vec<&'static str>`
|
||||
couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it
|
||||
will give incorrect ownership information to dropck, as it will conservatively
|
||||
assume we don't own any values of type `T`. See [the chapter on ownership and
|
||||
lifetimes] (lifetimes.html) for details.
|
||||
|
||||
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of `*mut T`
|
||||
when we have a raw pointer to an allocation we own:
|
||||
As we saw in the lifetimes chapter, we should use `Unique<T>` in place of
|
||||
`*mut T` when we have a raw pointer to an allocation we own:
|
||||
|
||||
|
||||
```rust
|
||||
@ -40,9 +40,10 @@ pub struct Vec<T> {
|
||||
|
||||
As a recap, Unique is a wrapper around a raw pointer that declares that:
|
||||
|
||||
* We own at least one value of type `T`
|
||||
* We may own a value of type `T`
|
||||
* We are Send/Sync iff `T` is Send/Sync
|
||||
* Our pointer is never null (and therefore `Option<Vec>` is null-pointer-optimized)
|
||||
* Our pointer is never null (and therefore `Option<Vec>` is
|
||||
null-pointer-optimized)
|
||||
|
||||
That last point is subtle. First, it makes `Unique::new` unsafe to call, because
|
||||
putting `null` inside of it is Undefined Behaviour. It also throws a
|
||||
|
136
src/doc/tarpl/vec-raw.md
Normal file
136
src/doc/tarpl/vec-raw.md
Normal file
@ -0,0 +1,136 @@
|
||||
% RawVec
|
||||
|
||||
We've actually reached an interesting situation here: we've duplicated the logic
|
||||
for specifying a buffer and freeing its memory. Now that we've implemented it
|
||||
and identified *actual* logic duplication, this is a good time to perform some
|
||||
logic compression.
|
||||
|
||||
We're going to abstract out the `(ptr, cap)` pair and give them the logic for
|
||||
allocating, growing, and freeing:
|
||||
|
||||
```rust,ignore
|
||||
struct RawVec<T> {
|
||||
ptr: Unique<T>,
|
||||
cap: usize,
|
||||
}
|
||||
|
||||
impl<T> RawVec<T> {
|
||||
fn new() -> Self {
|
||||
assert!(mem::size_of::<T>() != 0, "TODO: implement ZST support");
|
||||
unsafe {
|
||||
RawVec { ptr: Unique::new(heap::EMPTY as *mut T), cap: 0 }
|
||||
}
|
||||
}
|
||||
|
||||
// unchanged from Vec
|
||||
fn grow(&mut self) {
|
||||
unsafe {
|
||||
let align = mem::align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
|
||||
let (new_cap, ptr) = if self.cap == 0 {
|
||||
let ptr = heap::allocate(elem_size, align);
|
||||
(1, ptr)
|
||||
} else {
|
||||
let new_cap = 2 * self.cap;
|
||||
let ptr = heap::reallocate(*self.ptr as *mut _,
|
||||
self.cap * elem_size,
|
||||
new_cap * elem_size,
|
||||
align);
|
||||
(new_cap, ptr)
|
||||
};
|
||||
|
||||
// If allocate or reallocate fail, we'll get `null` back
|
||||
if ptr.is_null() { oom() }
|
||||
|
||||
self.ptr = Unique::new(ptr as *mut _);
|
||||
self.cap = new_cap;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
impl<T> Drop for RawVec<T> {
|
||||
fn drop(&mut self) {
|
||||
if self.cap != 0 {
|
||||
let align = mem::align_of::<T>();
|
||||
let elem_size = mem::size_of::<T>();
|
||||
let num_bytes = elem_size * self.cap;
|
||||
unsafe {
|
||||
heap::deallocate(*self.ptr as *mut _, num_bytes, align);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And change vec as follows:
|
||||
|
||||
```rust,ignore
|
||||
pub struct Vec<T> {
|
||||
buf: RawVec<T>,
|
||||
len: usize,
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
fn ptr(&self) -> *mut T { *self.buf.ptr }
|
||||
|
||||
fn cap(&self) -> usize { self.buf.cap }
|
||||
|
||||
pub fn new() -> Self {
|
||||
Vec { buf: RawVec::new(), len: 0 }
|
||||
}
|
||||
|
||||
// push/pop/insert/remove largely unchanged:
|
||||
// * `self.ptr -> self.ptr()`
|
||||
// * `self.cap -> self.cap()`
|
||||
// * `self.grow -> self.buf.grow()`
|
||||
}
|
||||
|
||||
impl<T> Drop for Vec<T> {
|
||||
fn drop(&mut self) {
|
||||
while let Some(_) = self.pop() {}
|
||||
// deallocation is handled by RawVec
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
And finally we can really simplify IntoIter:
|
||||
|
||||
```rust,ignore
|
||||
struct IntoIter<T> {
|
||||
_buf: RawVec<T>, // we don't actually care about this. Just need it to live.
|
||||
start: *const T,
|
||||
end: *const T,
|
||||
}
|
||||
|
||||
// next and next_back literally unchanged since they never referred to the buf
|
||||
|
||||
impl<T> Drop for IntoIter<T> {
|
||||
fn drop(&mut self) {
|
||||
// only need to ensure all our elements are read;
|
||||
// buffer will clean itself up afterwards.
|
||||
for _ in &mut *self {}
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Vec<T> {
|
||||
pub fn into_iter(self) -> IntoIter<T> {
|
||||
unsafe {
|
||||
// need to use ptr::read to unsafely move the buf out since it's
|
||||
// not Copy, and Vec implements Drop (so we can't destructure it).
|
||||
let buf = ptr::read(&self.buf);
|
||||
let len = self.len;
|
||||
mem::forget(self);
|
||||
|
||||
IntoIter {
|
||||
start: *buf.ptr,
|
||||
end: buf.ptr.offset(len as isize),
|
||||
_buf: buf,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Much better.
|
Loading…
x
Reference in New Issue
Block a user