This is a preliminary implementation of `for ... in ... { ...}` using a transitionary keyword `foreach`. Codesize seems to be a little bit down (10% or less non-opt) and otherwise it seems quite trivial to rewrite lambda-based loops to use it. Once we've rewritten the codebase away from lambda-based `for` we can retarget that word at the same production, snapshot, rewrite the keywords in one go, and expire `foreach`.
Feedback welcome. It's a desugaring-based approach which is arguably something we should have been doing for other constructs before. I apologize both for the laziness associated with doing it this way and with any sense that I'm bending rules I put in place previously concerning "never doing desugarings". I put the expansion in `expand.rs` and would be amenable to the argument that the code there needs better factoring / more helpers / to move to a submodule or helper function. It does seem to work at this point, though, and I gather we'd like to get the shift done relatively quickly.
This removes a bunch of options from the task builder interface that are irrelevant to the new scheduler and were generally unused anyway. It also bumps the stack size of new scheduler tasks so that there's enough room to run rustc and changes the interface to `Thread` to not implicitly join threads on destruction, but instead require an explicit, and mandatory, call to `join`.
Main logic in ```Implement select() for new runtime pipes.```. The guts of the ```PortOne::try_recv()``` implementation are now split up across several functions, ```optimistic_check```, ```block_on```, and ```recv_ready```.
There is one weird FIXME I left open here, in the "implement select" commit -- an assertion I couldn't get to work in the receive path, on an invariant that for some reason doesn't hold with ```SharedPort```. Still investigating this.
Fix is_utf8 and UTF-8 char width functions to deny non-canonical 'overlong encodings' in UTF-8.
We address the function is_utf8 to make it more strict and correct, but no changes are made to the handling of invalid UTF-8.
Fixes issue #3787
It makes things ugly when the last thing you print is the usage() output, resulting in something like:
```
$ rust run foo.rs -h
Lala
Options:
-h --help display this help and exit
-V --version output version information and exit
$ prompt
```
An 'overlong encoding' is a codepoint encoded non-minimally using the
utf-8 format. Denying these enforce each codepoint to have only one
valid representation in utf-8.
An example is byte sequence 0xE0 0x80 0x80 which could be interpreted as
U+0, but it's an overlong encoding since the canonical form is just
0x00.
Another example is 0xE0 0x80 0xAF which was previously accepted and is
an overlong encoding of the solidus "/". Directory traversal characters
like / and . form the most compelling argument for why this commit is
security critical.
Factor out common UTF-8 decoding expressions as macros. This commit will
partly duplicate UTF-8 decoding, so it is now present in both
fn is_utf8() and .char_range_at(); the latter using an assumption of
a valid str.
Bytes 0xC0, 0xC1 can only be used to start 2-byte codepoint encodings,
that are 'overlong encodings' of codepoints below 128.
The reference given in a comment -- https://tools.ietf.org/html/rfc3629
-- does in fact already exclude these bytes, so no additional comment
should be needed in the code.