Ali MJ Al-Nasrawy 38654ad741
Rollup merge of #95967 - CAD97:from-utf16, r=dtolnay
Add explicit-endian String::from_utf16 variants

This adds the following APIs under `feature(str_from_utf16_endian)`:

```rust
impl String {
    pub fn from_utf16le(v: &[u8]) -> Result<String, FromUtf16Error>;
    pub fn from_utf16le_lossy(v: &[u8]) -> String;
    pub fn from_utf16be(v: &[u8]) -> Result<String, FromUtf16Error>;
    pub fn from_utf16be_lossy(v: &[u8]) -> String;
}
```

These are versions of `String::from_utf16` that explicitly take [UTF-16LE and UTF-16BE](https://unicode.org/faq/utf_bom.html#gen7). Notably, we can do better than just the obvious `decode_utf16(v.array_chunks::<2>().copied().map(u16::from_le_bytes)).collect()` in that:

- We handle the case where the byte slice is not an even number of bytes, and
- In the case that the UTF-16 is native endian and the slice is aligned, we can forward to `String::from_utf16`.

If the Unicode Consortium actively defines how to handle character replacement when decoding a UTF-16 bytestream with a trailing odd byte, I was unable to find reference. However, the behavior implemented here is fairly self-evidently correct: replace the single errant byte with the replacement character.
2023-10-11 03:53:16 +03:00
..