Shebang handling was too agressive in stripping out the first line in cases where it is actually _not_ a shebang, but instead, valid rust (#70528). This is a second attempt at resolving this issue (the first attempt was flawed, for, among other reasons, causing an ICE in certain cases (#71372, #71471).
The behavior is now codified by a number of UI tests, but simply:
For the first line to be a shebang, the following must all be true:
1. The line must start with `#!`
2. The line must contain a non whitespace character after `#!`
3. The next character in the file, ignoring comments & whitespace must not be `[`
I believe this is a strict superset of what we used to allow, so perhaps a crater run is unnecessary, but probably not a terrible idea.
The modified code to handle parsing raw strings didn't properly account for the case where there was no "#" on either end and erroneously reported this strings as complete. This lead to a panic trying to read off the end of the file.
This diff improves error messages around raw strings in a few ways:
- Catch extra trailing `#` in the parser. This can't be handled in the lexer because we could be in a macro that actually expects another # (see test)
- Refactor & unify error handling in the lexer between ByteStrings and RawByteStrings
- Detect potentially intended terminators (longest sequence of "#*" is suggested)
They are only used by rustc_lexer, and are not needed elsewhere.
So we move the relevant definitions into rustc_lexer (while the actual
unicode data comes from the unicode-xid crate) and make the rest of
the compiler use it.
add rustc_private as a proper language feature gate
At the moment, `rustc_private` as a (library) feature exists by
accident: `char::is_xid_start`, `char::is_xid_continue` methods in
libcore define it.
cc https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/How.20to.20declare.20new.20langauge.20feature.3F
I don't know if this is at all reasonable, but at least tests seem to pass locally. That probably means that we can remove/rename to something more resonable the feature in libcore in the next release?
The idea here is to make a reusable library out of the existing
rust-lexer, by separating out pure lexing and rustc-specific concerns,
like spans, error reporting an interning.
So, rustc_lexer operates directly on `&str`, produces simple tokens
which are a pair of type-tag and a bit of original text, and does not
report errors, instead storing them as flags on the token.