Optimize size/speed of Unicode datasets
The overall implementation has the same general idea as the prior approach,
which was based on a compressed trie structure, but modified to use less space
(and, coincidentally, be an overall performance improvement).
Sizes | Old | New | New/current
-- | -- | -- | --
Alphabetic | 4616 | 2982 | 64.60%
Case_Ignorable | 3144 | 2112 | 67.18%
Cased | 2376 | 934 | 39.31%
Cc | 19 | 43 | 226.32%
Grapheme_Extend | 3072 | 1734 | 56.45%
Lowercase | 2328 | 985 | 42.31%
N | 2648 | 1239 | 46.79%
Uppercase | 1978 | 934 | 47.22%
White_Space | 241 | 140 | 58.09%
| | |
Total | 20422 | 11103 | 54.37%
This table shows the size of the old and new tables in bytes. The most important
of these tables is "Grapheme_Extend", as it is present in essentially all Rust
programs due to being called from `str`'s Debug impl (`char::escape_debug`). In
a representative case given by this [blog post] for the embedded world, the
shrinking in this PR shrinks the final binary by 1,604 bytes, from 14,440 to
12,836.
The performance of these new tables, based on the (rough) benchmark of linearly
scanning the entire valid set of chars, querying for each `is_*`, is roughly
~50% better, though in some cases is either on par or slightly (3-5%) worse. In
practice, I believe the size benefits of this PR are the main concern. The new
implementation has been tested to be equivalent to the current nightly in terms
of returned values on the set of valid chars.
A (relatively) high-level explanation of the specific compression scheme used
can be found [in the generator].
This is split into three commits -- the first adds the generator which produces
the Rust code for the tables, the second adds support code for the lookup, and
the third actually swaps the current implementation out for the new one.
[blog post]: https://jamesmunns.com/blog/fmt-unreasonably-expensive/
[in the generator]: https://github.com/Mark-Simulacrum/rust/blob/unicode-tables/src/tools/unicode-table-generator/src/raw_emitter.rs
Rollup of 12 pull requests
Successful merges:
- #67784 (Reset Formatter flags on exit from pad_integral)
- #67914 (Don't run const propagation on items with inconsistent bounds)
- #68141 (use winapi for non-stdlib Windows bindings)
- #68211 (Add failing example for E0170 explanation)
- #68219 (Untangle ZST validation from integer validation and generalize it to all zsts)
- #68222 (Update the wasi-libc bundled with libstd)
- #68226 (Avoid calling tcx.hir().get() on CRATE_HIR_ID)
- #68227 (Update to a version of cmake with windows arm64 support)
- #68229 (Update iovec to a version with no winapi dependency)
- #68230 (Update libssh2-sys to a version that can build for aarch64-pc-windows…)
- #68231 (Better support for cross compilation on Windows.)
- #68233 (Update compiler_builtins with changes to fix 128 bit integer remainder for aarch64 windows.)
Failed merges:
r? @ghost
Update compiler_builtins with changes to fix 128 bit integer remainder for aarch64 windows.
I have been investigating enabling panic=unwind for aarch64-pc-windows-msvc (see #65313) and building rustc and cargo hosted on aarch64-pc-windows-msvc.
Update libssh2-sys to a version that can build for aarch64-pc-windows…
I have been investigating enabling panic=unwind for aarch64-pc-windows-msvc (see #65313) and building rustc and cargo hosted on aarch64-pc-windows-msvc.
Update iovec to a version with no winapi dependency
I have been investigating enabling panic=unwind for aarch64-pc-windows-msvc (see #65313) and building rustc and cargo hosted on aarch64-pc-windows-msvc.
Update to a version of cmake with windows arm64 support
I have been investigating enabling panic=unwind for aarch64-pc-windows-msvc (see #65313) and building rustc and cargo hosted on aarch64-pc-windows-msvc.
Split MIR building into its own crate
This moves `rustc_mir::{build, hair, lints}` to `rustc_mir_build`.
The new crate only has a `provide` function as it's public API.
Based on #67898
cc @Centril @rust-lang/compiler
r? @oli-obk
Extract `rustc_ast_passes`, move gating, & refactor linting
Based on https://github.com/rust-lang/rust/pull/67770.
This PR extracts a crate `rustc_ast_passes`:
- `ast_validation.rs`, which is contributed by `rustc_passes` (now only has HIR based passes) -- the goal here is to improve recompilation of the parser,
- `feature_gate.rs`, which is contributed by `syntax` and performs post-expansion-gating & final erroring for pre-expansion gating,
- `show_span`, which is contributed by `syntax`.
To facilitate this, we first have to also:
- Move `{leveled_}feature_err{_err}` from `syntax::feature_gate::check` into `rustc_session::parse`.
- Move `get_features` into `rustc_parse::config`, the only place it is used.
- Move some some lint datatypes and traits, e.g. `LintBuffer`, `BufferedEarlyLint`, `BuiltinLintDiagnostics`, `LintPass`, and `LintArray` into `rustc_session::lint`.
- Move all the hard-wired lint `static`s into `rustc_session::lint::builtin`.
build-std compatible sanitizer support
### Motivation
When using `-Z sanitizer=*` feature it is essential that both user code and
standard library is instrumented. Otherwise the utility of sanitizer will be
limited, or its use will be impractical like in the case of memory sanitizer.
The recently introduced cargo feature build-std makes it possible to rebuild
standard library with arbitrary rustc flags. Unfortunately, those changes alone
do not make it easy to rebuild standard library with sanitizers, since runtimes
are dependencies of std that have to be build in specific environment,
generally not available outside rustbuild process. Additionally rebuilding them
requires presence of llvm-config and compiler-rt sources.
The goal of changes proposed here is to make it possible to avoid rebuilding
sanitizer runtimes when rebuilding the std, thus making it possible to
instrument standard library for use with sanitizer with simple, although
verbose command:
```
env CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS=-Zsanitizer=thread cargo test -Zbuild-std --target x86_64-unknown-linux-gnu
```
### Implementation
* Sanitizer runtimes are no long packed into crates. Instead, libraries build
from compiler-rt are used as is, after renaming them into `librusc_rt.*`.
* rustc obtains runtimes from target libdir for default sysroot, so that
they are not required in custom build sysroots created with build-std.
* The runtimes are only linked-in into executables to address issue #64629.
(in previous design it was hard to avoid linking runtimes into static
libraries produced by rustc as demonstrated by sanitizer-staticlib-link
test, which still passes despite changes made in #64780).
cc @kennytm, @japaric, @firstyear, @choller
Standard library support for riscv64gc-unknown-linux-gnu
Add std support for RISC-V 64-bit GNU/Linux and update libc for RISC-V support.
r? @alexcrichton