add a tutorial on containers and iterators

2013-06-27 18:48:12 -04:00 · 2013-06-27 18:48:12 -04:00 · 659cd55e75
commit 659cd55e75
parent c45af01351
3 changed files with 218 additions and 126 deletions
--- a/doc/tutorial-container.md
+++ b/doc/tutorial-container.md
@@ -0,0 +1,207 @@
+% Containers and iterators
+
+# Containers
+
+The container traits are defined in the `std::container` module.
+
+## Unique and managed vectors
+
+Vectors have `O(1)` indexing and removal from the end, along with amortized
+`O(1)` insertion at the end. Vectors are the most common container in Rust,
+and are flexible enough to fit many use cases.
+
+Vectors can also be sorted and used as efficient lookup tables with the
+`std::vec::bsearch` function, if all the elements are inserted at one time and
+deletions are unnecessary.
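
As a rough illustration of the sorted-vector lookup table described above, here is a minimal sketch in present-day Rust rather than the 2013 dialect of this tutorial. It assumes the modern slice method `binary_search_by_key` as a stand-in for `std::vec::bsearch`; the helper names `build_table` and `lookup` are hypothetical.

```rust
/// Build the table once, then sort it by key so it can be binary-searched.
/// (Hypothetical helper, not part of the original tutorial.)
fn build_table() -> Vec<(u32, &'static str)> {
    let mut table = vec![(30, "thirty"), (10, "ten"), (20, "twenty")];
    table.sort_by_key(|&(k, _)| k);
    table
}

/// Look up `key` in a slice of (key, value) pairs sorted by key.
/// Each lookup is O(log n).
fn lookup<'a>(table: &'a [(u32, &'a str)], key: u32) -> Option<&'a str> {
    table
        .binary_search_by_key(&key, |&(k, _)| k)
        .ok()
        .map(|i| table[i].1)
}

fn main() {
    let table = build_table();
    assert_eq!(lookup(&table, 20), Some("twenty"));
    assert_eq!(lookup(&table, 25), None);
}
```

This only pays off under the conditions the paragraph names: the elements are inserted up front and sorted once, and nothing is deleted afterwards.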
+
+## Maps and sets
+
+Maps are collections of unique keys with corresponding values, and sets are
+just unique keys without a corresponding value. The `Map` and `Set` traits in
+`std::container` define the basic interface.
+
+The standard library provides three owned map/set types:
+
+* `std::hashmap::HashMap` and `std::hashmap::HashSet`, requiring the keys to
+  implement `Eq` and `Hash`
+* `std::trie::TrieMap` and `std::trie::TrieSet`, requiring the keys to be `uint`
+* `extra::treemap::TreeMap` and `extra::treemap::TreeSet`, requiring the keys
+  to implement `TotalOrd`
+
+These maps do not use managed pointers, so they can be sent between tasks as
+long as the key and value types are sendable. Neither the key nor the value
+type has to be copyable.
+
+The `TrieMap` and `TreeMap` maps are ordered, while `HashMap` uses an arbitrary
+order.
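
The difference between an ordered map and a hash map can be sketched in present-day Rust. This assumes `std::collections::BTreeMap` as a modern stand-in for the ordered `TreeMap`, and the modern `HashMap`; neither is the 2013 type named above, and the function names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Keys of an ordered map come back sorted, whatever the insertion order.
fn ordered_keys(pairs: &[(u32, &str)]) -> Vec<u32> {
    let map: BTreeMap<u32, &str> = pairs.iter().cloned().collect();
    map.keys().cloned().collect()
}

/// Keys of a hash map come back in an arbitrary order, so they are sorted
/// explicitly here before being returned.
fn hashed_keys_sorted(pairs: &[(u32, &str)]) -> Vec<u32> {
    let map: HashMap<u32, &str> = pairs.iter().cloned().collect();
    let mut keys: Vec<u32> = map.keys().cloned().collect();
    keys.sort();
    keys
}

fn main() {
    let pairs = [(3, "c"), (1, "a"), (2, "b")];
    assert_eq!(ordered_keys(&pairs), vec![1, 2, 3]);
    assert_eq!(hashed_keys_sorted(&pairs), vec![1, 2, 3]);
}
```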
+
+Each `HashMap` instance has a random 128-bit key to use with a keyed hash,
+making the order of a set of keys in a given hash table randomized. Rust
+provides a [SipHash](https://131002.net/siphash/) implementation for any type
+implementing the `IterBytes` trait.
+
+## Double-ended queues
+
+The `extra::deque` module implements a double-ended queue with `O(1)` amortized
+inserts and removals from both ends of the container. It also has `O(1)`
+indexing like a vector. The contained elements are not required to be copyable,
+and the queue will be sendable if the contained type is sendable.
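
A minimal sketch of the operations listed above, written in present-day Rust and assuming `std::collections::VecDeque` as a modern stand-in for the `extra::deque` type; `build_deque` is a hypothetical helper.

```rust
use std::collections::VecDeque;

/// Build the deque [1, 2, 3] by pushing at both ends.
fn build_deque() -> VecDeque<i32> {
    let mut dq = VecDeque::new();
    dq.push_back(2);
    dq.push_back(3);
    dq.push_front(1);
    dq
}

fn main() {
    let mut dq = build_deque();
    // Indexing works as it does for a vector.
    assert_eq!(dq[1], 2);
    // Elements can be removed from either end.
    assert_eq!(dq.pop_front(), Some(1));
    assert_eq!(dq.pop_back(), Some(3));
    assert_eq!(dq.len(), 1);
}
```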
+
+## Priority queues
+
+The `extra::priority_queue` module implements a queue ordered by a key. The
+contained elements are not required to be copyable, and the queue will be
+sendable if the contained type is sendable.
+
+Insertions and popping the largest element have `O(log n)` time complexity,
+and checking the largest element is `O(1)`. Converting a vector to a priority
+queue can be done in-place, and has `O(n)` complexity. A priority queue can
+also be converted to a sorted vector in-place, allowing it to be used for an
+`O(n log n)` in-place heapsort.
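
The vector-to-heap-to-sorted-vector round trip can be sketched in present-day Rust. This assumes `std::collections::BinaryHeap` as a modern stand-in for the `extra::priority_queue` type; `heapsort` is a hypothetical helper name.

```rust
use std::collections::BinaryHeap;

/// Sort by building a max-heap from the vector and draining it back into a
/// sorted vector. Both conversions reuse the vector's storage.
fn heapsort(values: Vec<i32>) -> Vec<i32> {
    let heap = BinaryHeap::from(values); // O(n) heap construction
    heap.into_sorted_vec() // ascending order
}

fn main() {
    let mut heap = BinaryHeap::from(vec![3, 1, 4, 1, 5]);
    // Checking the largest element does not modify the heap.
    assert_eq!(heap.peek(), Some(&5));
    heap.push(9);
    assert_eq!(heap.pop(), Some(9));
    assert_eq!(heapsort(vec![3, 1, 4, 1, 5]), vec![1, 1, 3, 4, 5]);
}
```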
+
+# Iterators
+
+## Iteration protocol
+
+The iteration protocol is defined by the `Iterator` trait in the
+`std::iterator` module. The minimal implementation of the trait is a `next`
+method, yielding the next element from an iterator object:
+
+~~~
+/// An infinite stream of zeroes
+struct ZeroStream;
+
+impl Iterator<int> for ZeroStream {
+    fn next(&mut self) -> Option<int> {
+        Some(0)
+    }
+}
+~~~
+
+Reaching the end of the iterator is signalled by returning `None` instead of
+`Some(item)`:
+
+~~~
+/// A stream of N zeroes
+struct ZeroStream {
+    priv remaining: uint
+}
+
+impl ZeroStream {
+    fn new(n: uint) -> ZeroStream {
+        ZeroStream { remaining: n }
+    }
+}
+
+impl Iterator<int> for ZeroStream {
+    fn next(&mut self) -> Option<int> {
+        if self.remaining == 0 {
+            None
+        } else {
+            self.remaining -= 1;
+            Some(0)
+        }
+    }
+}
+~~~
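
For comparison, the same protocol can be sketched in present-day Rust, where the element type is an associated type (`type Item`) rather than a type parameter, and `int`/`uint` are replaced by `i32`/`usize`. The `zeroes` helper is hypothetical and only there to exercise the iterator.

```rust
/// A stream of N zeroes.
struct ZeroStream {
    remaining: usize,
}

impl Iterator for ZeroStream {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.remaining == 0 {
            // Signal the end of the iteration.
            None
        } else {
            self.remaining -= 1;
            Some(0)
        }
    }
}

/// Collect a `ZeroStream` of length `n` into a vector.
fn zeroes(n: usize) -> Vec<i32> {
    ZeroStream { remaining: n }.collect()
}

fn main() {
    assert_eq!(zeroes(3), vec![0, 0, 0]);
    assert!(zeroes(0).is_empty());
}
```

Defining `next` is still all that is required; `collect` and the other adaptors are built on top of it.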
+
+## Container iterators
+
+Containers implement iteration over the contained elements by returning an
+iterator object. For example, vectors have four iterators available:
+
+* `vector.iter()`, for immutable references to the elements
+* `vector.mut_iter()`, for mutable references to the elements
+* `vector.rev_iter()`, for immutable references to the elements in reverse order
+* `vector.mut_rev_iter()`, for mutable references to the elements in reverse order
+
+### Freezing
+
+Unlike most other languages with external iterators, Rust has no *iterator
+invalidation*. As long as an iterator is still in scope, the compiler will
+prevent modification of the container through another handle.
+
+~~~
+let mut xs = [1, 2, 3];
+{
+    let _it = xs.iter();
+
+    // the vector is frozen for this scope; the compiler will statically
+    // prevent modification
+}
+// the vector becomes unfrozen again at the end of the scope
+~~~
+
+These semantics are due to most container iterators being implemented with `&`
+and `&mut`.
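
The freezing rule can be sketched in present-day Rust. One caveat, stated as an assumption about current compilers rather than the 2013 one: the borrow now ends at the iterator's last use, not necessarily at the end of its scope. The function name `sum_then_push` is hypothetical.

```rust
/// Sum the vector through an iterator, then append the total.
fn sum_then_push(mut xs: Vec<i32>) -> Vec<i32> {
    let total: i32 = {
        let it = xs.iter();
        // xs.push(0); // rejected: `xs` is already borrowed by `it`, which is
        //             // still used below
        it.sum()
    };
    // The iterator is gone, so the vector can be modified again.
    xs.push(total);
    xs
}

fn main() {
    assert_eq!(sum_then_push(vec![1, 2, 3]), vec![1, 2, 3, 6]);
}
```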
+
+## Iterator adaptors
+
+The `IteratorUtil` trait implements common algorithms as methods extending
+every `Iterator` implementation. For example, the `fold` method will accumulate
+the items yielded by an `Iterator` into a single value:
+
+~~~
+let xs = [1, 9, 2, 3, 14, 12];
+let result = xs.iter().fold(0, |accumulator, item| accumulator - *item);
+assert_eq!(result, -41);
+~~~
+
+Some adaptors return an adaptor object implementing the `Iterator` trait itself:
+
+~~~
+let xs = [1, 9, 2, 3, 14, 12];
+let ys = [5, 2, 1, 8];
+let sum = xs.iter().chain_(ys.iter()).fold(0, |a, b| a + *b);
+assert_eq!(sum, 57);
+~~~
+
+Note that some adaptors like the `chain_` method above use a trailing
+underscore to work around an issue with method resolution. The underscores
+will be dropped when they become unnecessary.
+
+## For loops
+
+The `for` loop syntax is currently in transition, and will switch from the old
+closure-based iteration protocol to iterator objects. For now, the `advance`
+adaptor is required as a compatibility shim to use iterators with for loops.
+
+~~~
+let xs = [2, 3, 5, 7, 11, 13, 17];
+
+// print out all the elements in the vector
+for xs.iter().advance |x| {
+    println(x.to_str())
+}
+
+// print out all but the first 3 elements in the vector
+for xs.iter().skip(3).advance |x| {
+    println(x.to_str())
+}
+~~~
+
+For loops are *often* used with a temporary iterator object, as above. They can
+also advance the state of an iterator in a mutable location:
+
+~~~
+let xs = [1, 2, 3, 4, 5];
+let ys = ["foo", "bar", "baz", "foobar"];
+
+// create an iterator yielding tuples of elements from both vectors
+let mut it = xs.iter().zip(ys.iter());
+
+// print out the pairs of elements up to (&3, &"baz")
+for it.advance |(x, y)| {
+    println(fmt!("%d %s", *x, *y));
+
+    if *x == 3 {
+        break;
+    }
+}
+
+// yield and print the last pair from the iterator
+println(fmt!("last: %?", it.next()));
+
+// the iterator is now fully consumed
+assert!(it.next().is_none());
+~~~
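
The last example above relies on the transitional `advance` shim. A rough present-day equivalent, assuming the modern `for ... in` syntax and the standard `Iterator::by_ref` adaptor (neither appears in this commit), might look like the sketch below; `split_iteration` is a hypothetical name.

```rust
/// Returns the pairs seen up to x == 3, the next pair after the loop, and
/// whether the iterator was exhausted after that.
fn split_iteration() -> (Vec<(i32, &'static str)>, Option<(i32, &'static str)>, bool) {
    let xs = [1, 2, 3, 4, 5];
    let ys = ["foo", "bar", "baz", "foobar"];

    // create an iterator yielding tuples of elements from both arrays
    let mut it = xs.iter().zip(ys.iter());

    let mut seen = Vec::new();
    // `by_ref` lends the iterator to the loop instead of moving it in,
    // so `it` can still be used after the `break`
    for (x, y) in it.by_ref() {
        seen.push((*x, *y));
        if *x == 3 {
            break;
        }
    }

    let last = it.next().map(|(x, y)| (*x, *y));
    let exhausted = it.next().is_none();
    (seen, last, exhausted)
}

fn main() {
    let (seen, last, exhausted) = split_iteration();
    assert_eq!(seen, vec![(1, "foo"), (2, "bar"), (3, "baz")]);
    assert_eq!(last, Some((4, "foobar")));
    assert!(exhausted);
}
```

Without `by_ref`, the `for` loop would take ownership of `it`, and the later calls to `next` would not compile.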
--- a/doc/tutorial.md
+++ b/doc/tutorial.md
@@ -1607,132 +1607,6 @@ do spawn {
 If you want to see the output of `debug!` statements, you will need to turn on `debug!` logging.
 To enable `debug!` logging, set the RUST_LOG environment variable to the name of your crate, which, for a file named `foo.rs`, will be `foo` (e.g., with bash, `export RUST_LOG=foo`).

-## For loops
-
-> ***Note:*** The closure-based protocol used `for` loop is on the way out. The `for` loop will
-> use iterator objects in the future instead.
-
-The most common way to express iteration in Rust is with a `for`
-loop. Like `do`, `for` is a nice syntax for describing control flow
-with closures.  Additionally, within a `for` loop, `break`, `loop`,
-and `return` work just as they do with `while` and `loop`.
-
-Consider again our `each` function, this time improved to return
-immediately when the iteratee returns `false`:
-
-~~~~
-fn each(v: &[int], op: &fn(v: &int) -> bool) -> bool {
-   let mut n = 0;
-   while n < v.len() {
-       if !op(&v[n]) {
-           return false;
-       }
-       n += 1;
-   }
-   return true;
-}
-~~~~
-
-And using this function to iterate over a vector:
-
-~~~~
-# fn each(v: &[int], op: &fn(v: &int) -> bool) -> bool {
-#    let mut n = 0;
-#    while n < v.len() {
-#        if !op(&v[n]) {
-#            return false;
-#        }
-#        n += 1;
-#    }
-#    return true;
-# }
-each([2, 4, 8, 5, 16], |n| {
-    if *n % 2 != 0 {
-        println("found odd number!");
-        false
-    } else { true }
-});
-~~~~
-
-With `for`, functions like `each` can be treated more
-like built-in looping structures. When calling `each`
-in a `for` loop, instead of returning `false` to break
-out of the loop, you just write `break`. To skip ahead
-to the next iteration, write `loop`.
-
-~~~~
-# fn each(v: &[int], op: &fn(v: &int) -> bool) -> bool {
-#    let mut n = 0;
-#    while n < v.len() {
-#        if !op(&v[n]) {
-#            return false;
-#        }
-#        n += 1;
-#    }
-#    return true;
-# }
-for each([2, 4, 8, 5, 16]) |n| {
-    if *n % 2 != 0 {
-        println("found odd number!");
-        break;
-    }
-}
-~~~~
-
-As an added bonus, you can use the `return` keyword, which is not
-normally allowed in closures, in a block that appears as the body of a
-`for` loop: the meaning of `return` in such a block is to return from
-the enclosing function, not just the loop body.
-
-~~~~
-# fn each(v: &[int], op: &fn(v: &int) -> bool) -> bool {
-#    let mut n = 0;
-#    while n < v.len() {
-#        if !op(&v[n]) {
-#            return false;
-#        }
-#        n += 1;
-#    }
-#    return true;
-# }
-fn contains(v: &[int], elt: int) -> bool {
-    for each(v) |x| {
-        if (*x == elt) { return true; }
-    }
-    false
-}
-~~~~
-
-Notice that, because `each` passes each value by borrowed pointer,
-the iteratee needs to dereference it before using it.
-In these situations it can be convenient to lean on Rust's
-argument patterns to bind `x` to the actual value, not the pointer.
-
-~~~~
-# fn each(v: &[int], op: &fn(v: &int) -> bool) -> bool {
-#    let mut n = 0;
-#    while n < v.len() {
-#        if !op(&v[n]) {
-#            return false;
-#        }
-#        n += 1;
-#    }
-#    return true;
-# }
-# fn contains(v: &[int], elt: int) -> bool {
-    for each(v) |&x| {
-        if (x == elt) { return true; }
-    }
-#    false
-# }
-~~~~
-
-`for` syntax only works with stack closures.
-
-> ***Note:*** This is, essentially, a special loop protocol:
-> the keywords `break`, `loop`, and `return` work, in varying degree,
-> with `while`, `loop`, `do`, and `for` constructs.
-
 # Methods

 Methods are like functions except that they always begin with a special argument,
@@ -2653,6 +2527,7 @@ tutorials on individual topics.
 * [Tasks and communication][tasks]
 * [Macros][macros]
 * [The foreign function interface][ffi]
+* [Containers and iterators](tutorial-container.html)

 There is further documentation on the [wiki].

--- a/mk/docs.mk
+++ b/mk/docs.mk
@@ -99,6 +99,16 @@ doc/tutorial-macros.html: tutorial-macros.md doc/version_info.html \
 	   --include-before-body=doc/version_info.html \
           --output=$@

+DOCS += doc/tutorial-container.html
+doc/tutorial-container.html: tutorial-container.md doc/version_info.html doc/rust.css
+	@$(call E, pandoc: $@)
+	$(Q)$(CFG_NODE) $(S)doc/prep.js --highlight $< | \
+          $(CFG_PANDOC) --standalone --toc \
+           --section-divs --number-sections \
+           --from=markdown --to=html --css=rust.css \
+	   --include-before-body=doc/version_info.html \
+           --output=$@
+
 DOCS += doc/tutorial-ffi.html
 doc/tutorial-ffi.html: tutorial-ffi.md doc/version_info.html doc/rust.css
 	@$(call E, pandoc: $@)