Refactor unicode.py script
Hi, I noticed that the `unicode.py` script used some deprecated escapes in regular expressions. E.g. `\d`, `\w`, `\.` will be illegal in the future without "raw strings". This is now fixed. I have also cleaned up the script quite a bit.
## Escape deprecation
OK (note the `r`):
`re.compile(r"\d")`
Deprecated (from Python 3.6 onwards, see [here][link1] and [here][link2]):
`re.compile("\d")`.
[link1]: https://docs.python.org/3.6/whatsnew/3.6.html#deprecated-python-behavior
[link2]: https://bugs.python.org/issue27364
This was evident when running the script with Python 3.7 like so:
```
$ python3 -Wall unicode.py
unicode.py:227: DeprecationWarning: invalid escape sequence \w
re1 = re.compile("^ *([0-9A-F]+) *; *(\w+)")
unicode.py:228: DeprecationWarning: invalid escape sequence \.
re2 = re.compile("^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
unicode.py:453: DeprecationWarning: invalid escape sequence \d
pattern = "for Version (\d+)\.(\d+)\.(\d+) of the Unicode"
```
The documentation states that
> A backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases.
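To illustrate the difference, here is a minimal sketch (not code from `unicode.py`; the helper name `invalid_escape_warnings` and the sample input line are made up for this example). It compiles a tiny piece of source text containing a non-raw `"\d"` and records whichever warning the interpreter raises, then checks that the raw-string form of the range pattern from the warnings above still matches. The category is `DeprecationWarning` on the Python versions discussed here; other interpreter versions may report a different category, so the sketch only records what it sees.

```python
import re
import warnings


def invalid_escape_warnings(source):
    """Compile `source` and return the names of the warning categories raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [w.category.__name__ for w in caught]


# Source text of a non-raw literal: quote, backslash, d, quote.
non_raw = invalid_escape_warnings('"\\d"')
# Source text of the raw-string equivalent.
raw = invalid_escape_warnings('r"\\d"')

# Raw-string version of the range pattern shown in the warnings above.
RANGE_RE = re.compile(r"^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
match = RANGE_RE.match("0041..005A ; Lu")  # made-up sample line
```

Here `non_raw` ends up non-empty while `raw` stays empty, and `match.groups()` yields the two code points and the category name.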
## Testing
To test my changes, I had to add support for choosing which Unicode version to use. The script defaults to the latest release (12.0.0 at the moment; the repo has 11.0.0 checked in).
The script generates the exact same output for version 11.0.0 with Python 2.7 and 3.7 and no longer generates any deprecation warnings:
```
$ python3 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 --version
Python 2.7.16
$ python3 --version
Python 3.7.3
```
## Extra functionality
Furthermore, the script will check for and download the latest Unicode version by default (without the `-v` argument). The `--help` output is below:
```
$ ./unicode.py --help
usage: unicode.py [-h] [-v VERSION]

Regenerate Unicode tables (tables.rs).

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        Unicode version to use (if not specified, defaults to
                        latest available final release).
```
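For readers unfamiliar with `argparse`, an option like this could be declared roughly as follows. This is a sketch under assumptions, not the code in this PR: the function names `parse_unicode_version` and `parse_args`, the strict `MAJOR.MINOR.PATCH` validation, and the use of `None` to mean "latest release" are all illustrative choices.

```python
import argparse
import re


def parse_unicode_version(text):
    """Turn "11.0.0" into (11, 0, 0); reject anything else."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)$", text)
    if match is None:
        raise argparse.ArgumentTypeError(
            "expected MAJOR.MINOR.PATCH, got %r" % text)
    return tuple(int(part) for part in match.groups())


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        prog="unicode.py",
        description="Regenerate Unicode tables (tables.rs).",
    )
    parser.add_argument(
        "-v", "--version",
        type=parse_unicode_version,
        default=None,  # assumption: None means "look up the latest release"
        help="Unicode version to use (if not specified, defaults to "
             "latest available final release).",
    )
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["-v", "11.0.0"]).version` is the tuple `(11, 0, 0)`, and omitting `-v` leaves `version` as `None` so the caller can decide how to find the latest release.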
## Cleanups
I have cleaned up the code quite a bit, with Python best practices and code style in mind. I'm happy to provide more details and rationale for all my changes if the reviewers so desire.
One externally visible change is that the Unicode data is now downloaded into the `src/libcore/unicode/downloaded` directory, in a subdirectory named after the Unicode version:
```
$ pwd
.../rust/src/libcore/unicode
$ exa -T downloaded/
downloaded
├── 11.0.0
│  ├── DerivedCoreProperties.txt
│  ├── DerivedNormalizationProps.txt
│  ├── PropList.txt
│  ├── ReadMe.txt
│  ├── Scripts.txt
│  ├── SpecialCasing.txt
│  └── UnicodeData.txt
└── 12.0.0
   ├── DerivedCoreProperties.txt
   ├── DerivedNormalizationProps.txt
   ├── PropList.txt
   ├── ReadMe.txt
   ├── Scripts.txt
   ├── SpecialCasing.txt
   └── UnicodeData.txt
```
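The per-version layout shown above can be sketched as a small path helper. The names `FETCH_DIR`, `UCD_FILES`, `local_path` and `planned_files` are hypothetical and chosen only for this illustration; the script itself may organise this differently. The sketch only builds paths and performs no download.

```python
import posixpath

# Hypothetical constants mirroring the directory listing above.
FETCH_DIR = "downloaded"
UCD_FILES = (
    "DerivedCoreProperties.txt",
    "DerivedNormalizationProps.txt",
    "PropList.txt",
    "ReadMe.txt",
    "Scripts.txt",
    "SpecialCasing.txt",
    "UnicodeData.txt",
)


def local_path(version, filename):
    """Return where `filename` for a given Unicode version would be stored."""
    return posixpath.join(FETCH_DIR, version, filename)


def planned_files(version):
    """List every per-version path from the listing above."""
    return [local_path(version, name) for name in UCD_FILES]
```

Keeping each version in its own subdirectory means data for 11.0.0 and 12.0.0 can sit side by side without one download overwriting the other.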
Rollup of 7 pull requests
Successful merges:
- #62151 (Update linked OpenSSL version)
- #62245 (Miri engine: support extra function (pointer) values)
- #62257 (forward read_c_str method from Memory to Alloc)
- #62264 (Fix perf regression from Miri Machine trait changes)
- #62296 (request at least ptr-size alignment from posix_memalign)
- #62329 (Remove support for 1-token lookahead from the lexer)
- #62377 (Add test for ICE #62375)
Failed merges:
r? @ghost
Remove support for 1-token lookahead from the lexer
`StringReader` maintained `peek_token` and `peek_span_src_raw` for look ahead.
`peek_token` was used only by rustdoc syntax coloring. After moving peeking logic into highlighter, I was able to remove `peek_token` from the lexer. I tried to use `iter::Peekable`, but that wasn't as pretty as I hoped, due to buffered fatal errors. So I went with hand-rolled peeking.
After that I noticed that the only peeking behavior left was for raw tokens, to test tt jointness. I've rewritten it in terms of trivia tokens, and not just spans.
After that it became possible to simplify the awkward constructor of the lexer, which could return `Err` if the first peeked token contained an error.
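The "hand-rolled peeking" idea, a consumer that buffers at most one token itself instead of asking the lexer to do it, can be sketched in a language-agnostic way. The Python below is only an illustration of the technique; the class name `PeekingLexer` and its methods are made up and do not correspond to the Rust code in this PR.

```python
class PeekingLexer(object):
    """Wrap a token iterator and buffer at most one token ahead."""

    _EMPTY = object()  # sentinel: nothing is buffered right now

    def __init__(self, tokens):
        self._tokens = iter(tokens)
        self._peeked = self._EMPTY

    def peek(self):
        """Return the next token without consuming it (None at the end)."""
        if self._peeked is self._EMPTY:
            self._peeked = next(self._tokens, None)
        return self._peeked

    def next_token(self):
        """Consume and return the next token (None at the end)."""
        token = self.peek()
        self._peeked = self._EMPTY
        return token
```

Because the buffer lives in the wrapper, the underlying token source needs no lookahead state of its own, which is the simplification described above.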
Fix perf regression from Miri Machine trait changes
Maybe this fixes the perf regression that https://github.com/rust-lang/rust/pull/62003 seemingly introduced?
Cc @nnethercote
forward read_c_str method from Memory to Alloc
This is more convenient to call when one starts with a `Scalar` (which is the common case).
`read_c_str` is only used in Miri.
Miri engine: support extra function (pointer) values
We want to add basic support for `dlsym` in Miri (needed to run the latest version of `getrandom`). For that to work, `dlsym` needs to return *something* that can be stored in a function pointer and later called.
So we add a new `ExtraFnVal` type to the `Machine` trait, and enable Miri's memory to associate allocation IDs with such values, so that `create_fn_alloc` and `get_fn` can work on *both* `Instance` (this is used for "normal" function pointers) and `ExtraFnVal`.
Cc @oli-obk
Update linked OpenSSL version
This bumps our linked OpenSSL version from 1.1.1a to 1.1.1c, picking up
various bug fixes and minor security fixes.
Lint on invalid values passed to x.py --warnings
This also introduces support for `--warnings allow` and fixes `--warnings`
being overridden by the configuration file, `config.toml`.
Fixes #62402
r? @RalfJung
remove Scalar::is_null_ptr
Comparing pointers should be done more carefully than that. With https://github.com/rust-lang/miri/pull/825, Miri does not need it any more and it is otherwise unused.
rustc_target: avoid negative register counts in the SysV x86_64 ABI.
Because `needed_{int,sse}` and `{int,sse}_regs` were only used with integer literals, they were inferred to `i32` and `{int,sse}_regs` could therefore be negative.
There was a check which prevented that, but *only* for aggregate arguments, not scalars.
Fixes #62350.
r? @nagisa or @rkruppe
Remove `compile-pass` from compiletest
This is a part of #62277.
Removes `compile-pass` from compiletest (and modifies some tests' annotations).
r? @Centril
Create async version of the dynamic-drop test
Some of the tests in dynamic-drop have been cut:
* The tests that are just simpler versions of other tests - these tests are already fairly slow due to all of the unwinding, and async functions have more control flow paths than normal functions.
* The union test - it's for an unstable feature that has an RFC to remove it.
* The generator test - there aren't async generators yet.
* The tests that show values being leaked - these can be added once the issue is fixed.
r? @Centril
cc #62121 @cramertj
The (almost) culmination of HirIdification
It's finally over.
This PR removes old `FIXME`s and renames some functions so that the `HirId` variant has the shorter name.
All that remains (and rightfully so) is stuff in `resolve`, `save_analysis` and (as far as I can tell) in a few places where we can't replace `NodeId` with `HirId`.
Add MemoryExtra in InterpretCx constructor params
This is to avoid modifying `MemoryExtra` inside `InterpretCx` after initialization. Related miri PR: https://github.com/rust-lang/miri/pull/792
r? @RalfJung
Implement another internal lint
cc #49509
This adds ~~two~~ one internal lint~~s~~:
1. LINT_PASS_IMPL_WITHOUT_MACRO: Make sure that the `{declare,impl}_lint_pass` macro is used to implement lint passes. cc #59669
2. ~~USAGE_OF_TYCTXT_AND_SPAN_ARGS: item 2 on the list in #49509~~
~~With 2. I wasn't sure, if this lint should be applied everywhere. That means a careful review of 0955835 would be great. Also 73fb9b4 allows this lint on some functions. Should I also apply this lint there?~~
TODO (not directly relevant for review):
- [ ] https://github.com/rust-lang/rust/pull/59316#discussion_r280186517 (not sure yet, if this works or how to query for `rustc_private`, since it's not in [`Features`](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/feature_gate/struct.Features.html) 🤔 cc @eddyb)
- [x] https://github.com/rust-lang/rust/pull/61735#discussion_r292389870
- [x] Check explicitly for the `{declare,impl}_lint_pass!` macros
r? @oli-obk
Rollup of 10 pull requests
Successful merges:
- #62123 (Remove needless lifetimes (std))
- #62150 (Implement mem::{zeroed,uninitialized} in terms of MaybeUninit.)
- #62169 (Derive which queries to save using the proc macro)
- #62238 (Fix code block information icon position)
- #62292 (Move `async || ...` closures into `#![feature(async_closure)]`)
- #62323 (Clarify unaligned fields in ptr::{read,write}_unaligned)
- #62324 (Reduce reliance on `await!(...)` macro)
- #62371 (Add tracking issue for Box::into_pin)
- #62383 (Improve error span for async type inference error)
- #62388 (Break out of the correct number of scopes in loops)
Failed merges:
r? @ghost
Break out of the correct number of scopes in loops
We were incorrectly breaking out of one too many drop scopes when
generating MIR for loops and breakable blocks, resulting in use after
free and associated borrow checker warnings.
This wasn't noticed because the scope that we're breaking out of twice
is only used for temporaries that are created for adjustments applied to
the loop. Since loops generally propagate coercions to the `break`
expressions, the only case we see this is when the type of the loop is a
smart pointer to a trait object.
Closes #62312
Improve error span for async type inference error
Fixes #62382
Previously, we would point at the span of the 'await' expression,
instead of the actual expression with an unknown type.