Add fast path for ASCII in UTF-8 validation
This speeds up the ASCII case (and long stretches of ASCII in otherwise
mixed UTF-8 data) when checking UTF-8 validity.
Benchmark results suggest that on purely ASCII input, we can improve
throughput (megabytes verified / second) by a factor of 13 to 14 (smallish input).
On XML and mostly English language input (en.wikipedia XML dump),
throughput improves by a factor 7 (large input).
On mostly non-ASCII input, performance increases slightly or is the
same.
The UTF-8 validation is rewritten to use indexed access; since all
access is preceded by a (mandatory for validation) length check, bounds
checks are statically elided by LLVM and this formulation is in fact the best
for performance. A previous version had losses due to slice to iterator
conversions.
A large credit to Björn Steinbrink who improved this patch immensely,
writing this second version.
Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3.
Old code is `regular`, this PR is called `fast`.
Datasets:
- `ascii` is just ASCII (2.5 kB)
- `cyr` is cyrillic script with ascii spaces (5 kB)
- `dewik10` is 10MB of a de.wikipedia XML dump
- `enwik8` is 100MB of an en.wikipedia XML dump
- `jawik10` is 10MB of a ja.wikipedia XML dump
```
test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s
test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s
test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s
test from_utf8_cyr_regular ... bench: 10,944 ns/iter (+/- 795) = 468 MB/s
test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s
test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s
test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s
test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s
test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s
test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s
```
Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com>
This adds back the raw_pointer_derive lint as a 'removed' lint, so that its removal does not cause errors (#30346) but warnings.
In the process I discovered regressions in the code for renamed and removed lints, which didn't appear to have any tests. The addition of a second lint pass (ast vs. hir) meant that attributes were being inspected twice, renamed and removed warnings printed twice. I restructured the code so these tests are only done once and added tests. Unfortunately it makes the patch more complicated for the needed beta backport.
r? @nikomatsakis
This provides limited support for using associated consts on type parameters. It generally works on things that can be figured out at trans time. This doesn't work for array lengths or match arms. I have another patch to make it work in const expressions.
CC @eddyb @nikomatsakis
This PR changes translation of tuple-like ADTs from being calls to being proper aggregates. This change is done in hope to make code generation better. Namely, now we can avoid:
1. Call overhead;
2. Generating landingpads in presence of cleanups (we know for sure constructing ADTs can’t panic);
3. And probably much more, gaining better MIR introspectablilty.
Along with that a few serious deficiencies with translation of ADTs and switches have been fixed as well (commits 2 and 3).
r? @nikomatsakis
cc @tsion
In my PR for #21659 I accidentally used `// | help` as test annotation. This PR updates it to `//~| help`. I also found and updated 2 other tests with the same issue.
Previously we would generate regular calls for these, which is likely to result in worse LLVM code,
especially in presence of cleanups – we needn’t unecessarilly generate landing pads to construct an
ADT!
Ref issue [30825](https://github.com/rust-lang/rust/issues/30825)
This commit should suffice to add a concise introduction to the concept of crates.
My only worry, is that it is maybe too concise; but, the book seems to be written with the understanding that the new Rust user is coming from another language, and so will understand what a Library or Code Package is.
There is now more structure to the report, so that you can specify e.g. an RFC/PR/issue number and other explanatory details.
Example message:
```
type-parameter-invalid-lint.rs:14:8: 14:9 error: defaults for type parameters are only allowed on type definitions, like `struct` or `enum`
type-parameter-invalid-lint.rs:14 fn avg<T=i32>(_: T) {}
^
type-parameter-invalid-lint.rs:14:8: 14:9 warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
type-parameter-invalid-lint.rs:14:8: 14:9 note: for more information, see PR 30742 <https://github.com/rust-lang/rust/pull/30724>
type-parameter-invalid-lint.rs:11:9: 11:28 note: lint level defined here
type-parameter-invalid-lint.rs:11 #![deny(future_incompatible)]
^~~~~~~~~~~~~~~~~~~
error: aborting due to previous error
```
r? @brson
I would really like feedback also on the specific messages!
Fixes#30746
The first line (paragraph?) of a doc-comment is what rustdoc shows when listing items of a module.
What makes `Instant` and `SystemTime` different is important enough to be there. (Though feel free to bikeshed the wording.)
This is achieved by adding the scan_back method. This method looks back
through the source_text of the StringReader until it finds the target
char, returning it's offset in the source. We use this method to find
the offset of the opening single quote, and use that offset as the start
of the error.
Given this code:
```rust
fn main() {
let _ = 'abcd';
}
```
The compiler would give a message like:
```
error: character literal may only contain one codepoint: ';
let _ = 'abcd';
^~
```
With this change, the message now displays:
```
error: character literal may only contain one codepoint: 'abcd';
let _ = 'abcd';
^~~~~~~
```
Fixes#30033
The compiler can emit errors and warning in JSON format. This is a more easily machine readable form then the usual error output.
Closes#10492, closes#14863.
[breaking-change]
`OptLevel` variants are no longer `pub use`ed by rust::session::config. If you are using these variants, you must change your code to prefix the variant name with `OptLevel`.
[breaking-change]
syntax::errors::Handler::new has been renamed to with_tty_emitter
Many functions which used to take a syntax::errors::ColorConfig, now take a rustc::session::config::ErrorOutputType. If you previously used ColorConfig::Auto as a default, you should now use ErrorOutputType::default().
This feature is partially stabilized, so describe each part in the appropriate place.
r? @alexcrichton @brson
It would be nice to backport this to beta, since this is the first release where this is true. I try really hard to not do doc backports, but this isn't very large, and might be worth making an exception, I dunno.
Given this code:
fn main() {
let _ = 'abcd';
}
The compiler would give a message like:
error: character literal may only contain one codepoint: ';
let _ = 'abcd';
^~
With this change, the message now displays:
error: character literal may only contain one codepoint: 'abcd'
let _ = 'abcd'
^~~~~~
Fixes#30033