Fix is_utf8 and UTF-8 char width functions to deny non-canonical 'overlong encodings' in UTF-8.
We address the function is_utf8 to make it more strict and correct, but no changes are made to the handling of invalid UTF-8.
Fixes issue #3787
It makes things ugly when the last thing you print is the usage() output, resulting in something like:
```
$ rust run foo.rs -h
Lala
Options:
-h --help display this help and exit
-V --version output version information and exit
$ prompt
```
An 'overlong encoding' is a codepoint encoded non-minimally using the
utf-8 format. Denying these enforce each codepoint to have only one
valid representation in utf-8.
An example is byte sequence 0xE0 0x80 0x80 which could be interpreted as
U+0, but it's an overlong encoding since the canonical form is just
0x00.
Another example is 0xE0 0x80 0xAF which was previously accepted and is
an overlong encoding of the solidus "/". Directory traversal characters
like / and . form the most compelling argument for why this commit is
security critical.
Factor out common UTF-8 decoding expressions as macros. This commit will
partly duplicate UTF-8 decoding, so it is now present in both
fn is_utf8() and .char_range_at(); the latter using an assumption of
a valid str.
Bytes 0xC0, 0xC1 can only be used to start 2-byte codepoint encodings,
that are 'overlong encodings' of codepoints below 128.
The reference given in a comment -- https://tools.ietf.org/html/rfc3629
-- does in fact already exclude these bytes, so no additional comment
should be needed in the code.
Renamed bytes_iter to byte_iter to match other iterators
Refactored str Iterators to use DoubleEnded Iterators and typedefs instead of wrapper structs
Reordered the Iterator section
Whitespace fixup
Moved clunky `each_split_within` function to the one place in the tree where it's actually needed
Replaced all block doccomments in str with line doccomments
Contiunation of naming cleanup in `libsyntax::ast`:
```rust
ast::node_id => ast::NodeId
ast::local_crate => ast::LOCAL_CRATE
ast::crate_node_id => ast::CRATE_NODE_ID
ast::blk_check_mode => ast::BlockCheckMode
ast::ty_field => ast::TypeField
ast::ty_method => ast::TypeMethod
```
Also moved span field directly into `TypeField` struct and cleaned up overlooked `ast::CrateConfig` renamings from last pull request.
Cheers,
Michael
Implement RandomAccessIterator (RAI) where possible, for Iterator adaptors such as Map, Enumerate, Peek, Skip, Take, Cycle, where the adapted iterator is already RAI, and for collections where it is relevant (ringbuf and bitv).
After discussion with thestinger, remove the RAI impl for VecMutIterator, we cannot soundly provide mutable access with this trait.
Implement Extendable everywhere FromIterator is already implemented.
Fixes issue #8108.
Implement RAI where possible for iterator adaptors such as Map,
Enumerate, Skip, Take, Zip, Cycle (all of the requiring that the adapted
iterator also implements RAI).
Drop the "Iterator" suffix for the the structs in std::iterator.
Filter, Zip, Chain etc. are shorter type names for when iterator
pipelines need their types written out in full in return value types, so
it's easier to read and write. the iterator module already forms enough
namespace.
Implement Clone and DeepClone for functions with 0 to 8 arguments. `extern fn()` is implicitly copyable so it's simple, except there is no way to implement it generically over #n function arguments.
Allows deriving of Clone on structs containing `extern "Rust" fn`.
r? @graydon Package IDs can now be of the form a/b/c#FOO, where (if a/b/c is
a git repository) FOO is any tag in the repository. Non-numeric
tags only match against package IDs with the same tag, and aren't
compared linearly like numeric versions.
While I was at it, refactored the code that calls `git clone`, and segregated build output properly for different packages.