In <https://github.com/rust-lang/rust/pull/42998>, we added an
uninstantiable type, `UnicodeVersion`, for the internal `UNICODE_VERSION`
value, but the type was not exported from the crate, which made the
value less useful. Here we make the type accessible from outside the
crate.
Also add a run-pass test to make sure the type and value can be accessed
as intended.
std: Respect formatting flags for str-like OsStr
Historically many `Display` and `Debug` implementations for `OsStr`-like
abstractions have gone through `String::from_utf8_lossy`, but this was updated
in #42613 to use an internal `Utf8Lossy` abstraction instead. This had the
unfortunate side effect of causing a regression (#43765) in code which relied on
these `fmt` trait implementations respecting the various formatting flags
specified.
This commit opportunistically adds back interpretation of formatting trait flags
in the "common case" where the `OsStr`-like "thing" is all valid UTF-8 and can
delegate to the formatting implementation for `str`. This doesn't entirely solve
the regression, as non-UTF-8 paths will still format differently than they did
before (in that they will not respect formatting flags), but it should solve
the regression for all "real world" use cases of paths and such. The door's also
still open for handling these flags in the future!
Closes #43765
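As a rough illustration of the behavior described above, the sketch below pads a valid UTF-8 path through its `Display` adapter. The helper name `padded` is made up for this example, and the snippet only exercises the "common case" of valid UTF-8 input.

```rust
use std::path::Path;

/// Hypothetical helper: right-align a path in a field of `width` columns.
/// This relies on the `Display` adapter delegating to `str` formatting
/// when the path is entirely valid UTF-8.
fn padded(path: &Path, width: usize) -> String {
    format!("{:>width$}", path.display(), width = width)
}

fn main() {
    // The width flag is applied because "foo" is valid UTF-8.
    println!("[{}]", padded(Path::new("foo"), 8));
}
```

Paths containing invalid UTF-8 are outside what this sketch covers; per the description above, they may not respect the flags.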
[libstd_unicode] Change UNICODE_VERSION to use u32
Looks like there's no strong reason to keep these values at `u64`.
With the current plans for the Unicode Standard, `u8` should be enough for the next 200 years. To stay on the safe side, I'm using `u16` here. I don't see a reason to go with anything machine-dependent/more-efficient.
Create a named struct, `UnicodeVersion`, to use instead of a tuple type for
the `UNICODE_VERSION` value. This allows users to access the fields with
meaningful field names: `major`, `minor`, and `micro`.
Per request, an empty private field is added to the struct, so it can be
extended in the future without API breakage.
Use `u32` for version components, as `u64` is overkill, and
`u32` is the default type for integers and the default type used for
regular internal numbers.
There's no expectation for Unicode versions to even reach one thousand
in the next hundred years. This is different from *package versions*,
which may become something auto-generated and exceed the human-friendly
range of integer values.
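The struct layout described above might look roughly like the following sketch. The module name `unicode`, the helper `version_string`, and the version value `10.0.0` are placeholders chosen for illustration; this is not the actual library source.

```rust
mod unicode {
    /// Sketch of a version type with named `u32` components.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct UnicodeVersion {
        pub major: u32,
        pub minor: u32,
        pub micro: u32,
        // Empty private field: code outside this module cannot build the
        // struct literally, so fields can be added later without breakage.
        _priv: (),
    }

    /// Placeholder value for the example (an assumption, not the real constant).
    pub const UNICODE_VERSION: UnicodeVersion = UnicodeVersion {
        major: 10,
        minor: 0,
        micro: 0,
        _priv: (),
    };
}

/// Hypothetical helper that reads the named fields.
fn version_string(v: unicode::UnicodeVersion) -> String {
    format!("{}.{}.{}", v.major, v.minor, v.micro)
}

fn main() {
    println!("{}", version_string(unicode::UNICODE_VERSION));
}
```

Because `_priv` is private, callers can read `major`, `minor`, and `micro` but cannot write a `UnicodeVersion { .. }` literal themselves.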
Add `FromStr` impl for `char`
Fixes #24939.
Is it possible to use `pub(restricted)` instead of a stability attribute for the internal error representation? Is it needed at all?
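A short sketch of how the `FromStr` impl can be used through `str::parse`. The wrapper `parse_char` is a hypothetical helper that discards the error value, so it says nothing about the internal error representation discussed above.

```rust
/// Hypothetical wrapper: parse a string that must contain exactly one `char`.
fn parse_char(s: &str) -> Option<char> {
    // `str::parse` dispatches to the `FromStr` impl for `char`.
    s.parse::<char>().ok()
}

fn main() {
    println!("{:?}", parse_char("a"));
    println!("{:?}", parse_char("ab"));
    println!("{:?}", parse_char(""));
}
```

Strings that are empty or hold more than one `char` fail to parse.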
Introduce tidy lint to check for inconsistent tracking issues
This PR
* Refactors the `collect_lib_features` function to work in a
non-checking mode (no bad pointer needed, and list of
lang features).
* Introduces checking whether unstable/stable tags for a
given feature have inconsistent tracking issues, that is,
multiple tracking issues per feature.
* Fixes such inconsistencies throughout the codebase.
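The consistency check in the second bullet could be sketched as below. The `Tag` type, the function `inconsistent_features`, and the feature names are all invented for this example; the real tidy code is organized differently and parses attributes from source files.

```rust
use std::collections::HashMap;

/// Hypothetical record of one stability attribute found in the source.
struct Tag<'a> {
    feature: &'a str,
    /// `None` stands for a tag without a tracking issue.
    tracking_issue: Option<u32>,
}

/// Return the features whose tags disagree on the tracking issue,
/// in order of first detection.
fn inconsistent_features<'a>(tags: &[Tag<'a>]) -> Vec<&'a str> {
    let mut seen: HashMap<&'a str, Option<u32>> = HashMap::new();
    let mut bad: Vec<&'a str> = Vec::new();
    for tag in tags {
        match seen.get(tag.feature).copied() {
            // First time we see this feature: remember its issue.
            None => {
                seen.insert(tag.feature, tag.tracking_issue);
            }
            // Seen before with a different issue: report it once.
            Some(prev) if prev != tag.tracking_issue => {
                if !bad.contains(&tag.feature) {
                    bad.push(tag.feature);
                }
            }
            // Seen before with the same issue: consistent.
            Some(_) => {}
        }
    }
    bad
}

fn main() {
    let tags = [
        Tag { feature: "foo", tracking_issue: Some(1) },
        Tag { feature: "foo", tracking_issue: Some(2) },
        Tag { feature: "bar", tracking_issue: Some(3) },
    ];
    println!("{:?}", inconsistent_features(&tags));
}
```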
impl Clone for .split_whitespace()
Use custom closure structs for the predicates so that the iterator's
clone can simply be derived. This should also reduce virtual call
overhead by not using function pointers.
Fixes #41655
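From the caller's side, a cloneable iterator allows two passes over the same words without re-splitting the text. The function `count_then_collect` below is a made-up example of that usage, not part of the change itself.

```rust
/// Hypothetical helper: count the words, then collect them,
/// reusing one `split_whitespace()` iterator via `clone()`.
fn count_then_collect(text: &str) -> (usize, Vec<&str>) {
    let words = text.split_whitespace();
    // The clone is consumed by `count`; `words` itself is untouched.
    let count = words.clone().count();
    let collected: Vec<&str> = words.collect();
    (count, collected)
}

fn main() {
    let (count, words) = count_then_collect("  a b\tc\n");
    println!("{} words: {:?}", count, words);
}
```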
Corrected very minor documentation detail about Unicode and Japanese
Japanese half-width and full-width romaji characters do have upper and lowercase according to Unicode (but other Japanese characters do not). For example,
`assert_eq!('\u{FF21}'.to_lowercase().collect::<String>(), "\u{FF41}");`
r? @steveklabnik
It was only accessible through the `#[unstable]` crate std_unicode.
It has never been used in the compiler or standard library
since 47e7a05a28 added it in 2012
“for OS API interop”.
It can be replaced with a one-liner:
```rust
fn is_utf16(slice: &[u16]) -> bool {
    // Well-formed UTF-16 decodes without any unpaired-surrogate errors.
    std::char::decode_utf16(slice.iter().cloned()).all(|r| r.is_ok())
}
```
Remove not(stage0) from deny(warnings)
Historically this was done to accommodate bugs in lints, but no lint bug has
affected these warnings since this feature was added. Let's
completely purge warnings from all our stages by denying warnings in all stages.
This will also assist in tracking down `stage0` code to be removed whenever
we're updating the bootstrap compiler.
`BoolTrie` works well for sets of code points spread out through
most of Unicode’s range, but it uses a lot of space for sets
with few, mostly low, code points.
This switches a few of its instances to a similar but simpler trie
data structure.
## Before
`size_of::<BoolTrie>()` is 1552, which is added to
`t.r3.len() * 8 + t.r5.len() + t.r6.len() * 8`:
* `Cc_table`: 1632
* `White_Space_table`: 1656
* `Pattern_White_Space_table`: 1640
* Total: 4928 bytes
## After
`size_of::<SmallBoolTrie>()` is 32, which is added to
`t.r1.len() + t.r2.len() * 8`:
* `Cc_table`: 51
* `White_Space_table`: 273
* `Pattern_White_Space_table`: 193
* Total: 517 bytes
## Difference
Every Rust program with `std` statically linked should be about 4 KB smaller.
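A minimal sketch of a two-level trie with the `r1`/`r2` shape used in the size formula above. It assumes `r1` maps each 64-code-point chunk to an index into `r2`, and that `r2` holds 64-bit bitmap leaves. The table `ASCII_SPACE` is a toy example covering only U+0009..U+000D and U+0020, not one of the generated tables.

```rust
/// Sketch of a small two-level bitset over code points.
struct SmallBoolTrie {
    /// First level: leaf index for each chunk of 64 code points.
    r1: &'static [u8],
    /// Leaves: one bit per code point in the chunk.
    r2: &'static [u64],
}

impl SmallBoolTrie {
    fn lookup(&self, c: char) -> bool {
        let c = c as u32;
        match self.r1.get((c >> 6) as usize) {
            // Test the bit for this code point within its 64-bit leaf.
            Some(&leaf) => ((self.r2[leaf as usize] >> (c & 63)) & 1) != 0,
            // Code points past the end of `r1` are not in the set.
            None => false,
        }
    }
}

/// Toy table (an assumption for this example): tab through carriage
/// return (bits 9..=13) plus space (bit 32), all in chunk 0.
static ASCII_SPACE: SmallBoolTrie = SmallBoolTrie {
    r1: &[0],
    r2: &[(0b1_1111u64 << 9) | (1u64 << 32)],
};

fn main() {
    for c in [' ', '\n', 'a', '\u{3000}'] {
        println!("{:?}: {}", c, ASCII_SPACE.lookup(c));
    }
}
```

Because lookups beyond the end of `r1` return `false` without touching any data, a set whose members are all low code points needs only a short `r1`, which is where the space saving comes from.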