Documentation for the following methods
with_capacity
with_capacity_in
with_capacity_and_hasher
reserve
reserve_exact
try_reserve
try_reserve_exact
was inconsistent and often not entirely correct where they existed on the following types
Vec
VecDeque
String
OsString
PathBuf
BinaryHeap
HashSet
HashMap
BufWriter
LineWriter
since the allocator is allowed to allocate more than the requested capacity in all such cases, and will frequently "allocate" much more in the case of zero-sized types (I also checked BufReader, but there the docs appear to be accurate as it appears to actually allocate the exact capacity).
Some effort was made to make the documentation more consistent between types as well.
Fix with_capacity* methods for Vec
Fix *reserve* methods for Vec
Fix docs for *reserve* methods of VecDeque
Fix docs for String::with_capacity
Fix docs for *reserve* methods of String
Fix docs for OsString::with_capacity
Fix docs for *reserve* methods on OsString
Fix docs for with_capacity* methods on HashSet
Fix docs for *reserve methods of HashSet
Fix docs for with_capacity* methods of HashMap
Fix docs for *reserve methods on HashMap
Fix expect messages about OOM in doctests
Fix docs for BinaryHeap::with_capacity
Fix docs for *reserve* methods of BinaryHeap
Fix typos
Fix docs for with_capacity on BufWriter and LineWriter
Fix consistent use of `hasher` between `HashMap` and `HashSet`
Fix warning in doc test
Add test for capacity of vec with ZST
Fix doc test error
os str capacity documentation
This is based on https://github.com/rust-lang/rust/pull/95394 , with expansion and consolidation
to address comments from `@dtolnay` and other `@rust-lang/libs-api` team members.
Implement [OsStr]::join
Implements join for `OsStr` and `OsString` slices:
```Rust
let strings = [OsStr::new("hello"), OsStr::new("dear"), OsStr::new("world")];
assert_eq!("hello dear world", strings.join(OsStr::new(" ")));
````
This saves one from converting to strings and back, or from implementing it manually.
This PR has been re-filed after #96744 was first accidentally merged and then reverted.
The change is instantly stable and thus:
r? rust-lang/libs-api `@rustbot` label +T-libs-api -T-libs
cc `@thomcc` `@m-ou-se` `@faptc`
This reverts commit 9aed829fe6cdf5eaf278c6c3972f7acd0830887d.
Fixes https://github.com/rust-lang/rust/issues/96435 , a regression
in crates doing `use std::ffi::*;` and `use std::os::raw::*;`.
We can re-add this re-export once the `core::ffi` types
are stable, and thus the `std::os::raw` types can become re-exports as
well, which will avoid the conflict. (Type aliases to the same type
still conflict, but re-exports of the same type don't.)
This adds a member fn that converts a slice into a CStr; it is intended
to be safer than from_ptr (which is unsafe and may read out of bounds),
and more useful than from_bytes_with_nul (which requires that the caller
already know where the nul byte is).
feature gate: cstr_from_bytes_until_nul
Also add an error type FromBytesUntilNulError for this fn.
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.
A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
Add debug assertions to validate NUL terminator in c strings
The `unchecked` variants from the stdlib usually perform the check anyway if debug assertions are on (for example, `unwrap_unchecked`). This PR does the same thing for `CStr` and `CString`, validating the correctness for the NUL byte in debug mode.
Add documentation to more `From::from` implementations.
For users looking at documentation through IDE popups, this gives them relevant information rather than the generic trait documentation wording “Performs the conversion”. For users reading the documentation for a specific type for any reason, this informs them when the conversion may allocate or copy significant memory versus when it is always a move or cheap copy.
Notes on specific cases:
* The new documentation for `From<T> for T` explains that it is not a conversion at all.
* Also documented `impl<T, U> Into<U> for T where U: From<T>`, the other central blanket implementation of conversion.
* The new documentation for construction of maps and sets from arrays of keys mentions the handling of duplicates. Future work could be to do this for *all* code paths that convert an iterable to a map or set.
* I did not add documentation to conversions of a specific error type to a more general error type.
* I did not add documentation to unstable code.
This change was prepared by searching for the text "From<... for" and so may have missed some cases that for whatever reason did not match. I also looked for `Into` impls but did not find any worth documenting by the above criteria.
Make io::Error use 64 bits on targets with 64 bit pointers.
I've wanted this for a long time, but didn't see a good way to do it without having extra allocation. When looking at it yesterday, it was more clear what to do for some reason.
This approach avoids any additional allocations, and reduces the size by half (8 bytes, down from 16). AFAICT it doesn't come additional runtime cost, and the compiler seems to do a better job with code using it.
Additionally, this `io::Error` has a niche (still), so `io::Result<()>` is *also* 64 bits (8 bytes, down from 16), and `io::Result<usize>` (used for lots of io trait functions) is 2x64 bits (16 bytes, down from 24 — this means on x86_64 it can use the nice rax/rdx 2-reg struct return). More generally, it shaves a whole 64 bit integer register off of the size of basically any `io::Result<()>`.
(For clarity: Improving `io::Result` (rather than io::Error) was most of the motivation for this)
On 32 bit (or other non-64bit) targets we still use something equivalent the old repr — I don't think think there's improving it, since one of the fields it stores is a `i32`, so we can't get below that, and it's already about as close as we can get to it.
---
### Isn't Pointer Tagging Dodgy?
The details of the layout, and why its implemented the way it is, are explained in the header comment of library/std/src/io/error/repr_bitpacked.rs. There's probably more details than there need to be, but I didn't trim it down that much, since there's a lot of stuff I did deliberately, that might have not seemed that way.
There's actually only one variant holding a pointer which gets tagged. This one is the (holder for the) user-provided error.
I believe the scheme used to tag it is not UB, and that it preserves pointer provenance (even though often pointer tagging does not) because the tagging operation is just `core::ptr::add`, and untagging is `core::ptr::sub`. The result of both operations lands inside the original allocation, so it would follow the safety contract of `core::ptr::{add,sub}`.
The other pointer this had to encode is not tagged — or rather, the tagged repr is equivalent to untagged (it's tagged with 0b00, and has >=4b alignment, so we can reuse the bottom bits). And the other variants we encode are just integers, which (which can be untagged using bitwise operations without worry — they're integers).
CC `@RalfJung` for the stuff in repr_bitpacked.rs, as my comments are informed by a lot of the UCG work, but it's possible I missed something or got it wrong (even if the implementation is okay, there are parts of the header comment that says things like "We can't do $x" which could be false).
---
### Why So Many Changes?
The repr change was mostly internal, but changed one widely used API: I had to switch how `io::Error::new_const` works.
This required switching `io::Error::new_const` to take the full message data (including the kind) as a `&'static`, rather than just the string. This would have been really tedious, but I made a macro that made it much simpler, but it was a wide change since `io::Error::new_const` is used everywhere.
This included changing files for a lot of targets I don't have easy access to (SGX? Haiku? Windows? Who has heard of these things), so I expect there to be spottiness in CI initially, unless luck is on my side.
Anyway this large only tangentially-related change is all in the first commit (although that commit also pulls the previous repr out into its own file), whereas the packing stuff is all in commit 2.
---
P.S. I haven't looked at all of this since writing it, and will do a pass over it again later, sorry for any obvious typos or w/e. I also definitely repeat myself in comments and such.
(It probably could use more tests too. I did some basic testing, and made it so we `debug_assert!` in cases the decode isn't what we encoded, but I don't know the degree which I can assume libstd's testing of IO would exercise this. That is: it wouldn't be surprising to me if libstds IO testing were minimal, especially around error cases, although I have no idea).
Little improves in CString `new` when creating from slice
Old code already contain optimization for cases with `&str` and `&[u8]` args. This commit adds a specialization for `&mut[u8]` too.
Also, I added usage of old slice in search for zero bytes instead of new buffer because it produce better code for constant inputs on Windows LTO builds. For other platforms, this wouldn't cause any difference because it calls `libc` anyway.
Inlined `_new` method into spec trait to reduce amount of code generated to `CString::new` callers.
It appears `find_max_slow` comes from the BinaryHeap docs, where the
try_reserve example is a slow implementation of find_max. It has no
relevance to this code in OsString though.
Old code already contain optimization for cases with `&str` and `&[u8]` args. This commit adds a specialization for `&mut[u8]` too.
Also, I added usage of old slice in search for zero bytes instead of new buffer because it produce better code for Windows on LTO builds. For other platforms, this wouldn't cause any difference because it calls `libc` anyway.
Inlined `_new` method into spec trait to reduce amount of code generated to `CString::new` callers.
For users looking at documentation through IDE popups, this gives them
relevant information rather than the generic trait documentation wording
“Performs the conversion”. For users reading the documentation for a
specific type for any reason, this informs them when the conversion may
allocate or copy significant memory versus when it is always a move or
cheap copy.
Notes on specific cases:
* The new documentation for `From<T> for T` explains that it is not a
conversion at all.
* Also documented `impl<T, U> Into<U> for T where U: From<T>`, the other
central blanket implementation of conversion.
* I did not add documentation to conversions of a specific error type to
a more general error type.
* I did not add documentation to unstable code.
This change was prepared by searching for the text "From<... for" and so
may have missed some cases that for whatever reason did not match. I
also looked for `Into` impls but did not find any worth documenting by
the above criteria.
Add #[must_use] to expensive computations
The unifying theme for this commit is weak, admittedly. I put together a list of "expensive" functions when I originally proposed this whole effort, but nobody's cared about that criterion. Still, it's a decent way to bite off a not-too-big chunk of work.
Given the grab bag nature of this commit, the messages I used vary quite a bit. I'm open to wording changes.
For some reason clippy flagged four `BTreeSet` methods but didn't say boo about equivalent ones on `HashSet`. I stared at them for a while but I can't figure out the difference so I added the `HashSet` ones in.
```rust
// Flagged by clippy.
alloc::collections::btree_set::BTreeSet<T> fn difference<'a>(&'a self, other: &'a BTreeSet<T>) -> Difference<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn symmetric_difference<'a>(&'a self, other: &'a BTreeSet<T>) -> SymmetricDifference<'a, T>
alloc::collections::btree_set::BTreeSet<T> fn intersection<'a>(&'a self, other: &'a BTreeSet<T>) -> Intersection<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn union<'a>(&'a self, other: &'a BTreeSet<T>) -> Union<'a, T>;
// Ignored by clippy, but not by me.
std::collections::HashSet<T, S> fn difference<'a>(&'a self, other: &'a HashSet<T, S>) -> Difference<'a, T, S>;
std::collections::HashSet<T, S> fn symmetric_difference<'a>(&'a self, other: &'a HashSet<T, S>) -> SymmetricDifference<'a, T, S>
std::collections::HashSet<T, S> fn intersection<'a>(&'a self, other: &'a HashSet<T, S>) -> Intersection<'a, T, S>;
std::collections::HashSet<T, S> fn union<'a>(&'a self, other: &'a HashSet<T, S>) -> Union<'a, T, S>;
```
Parent issue: #89692
r? ```@joshtriplett```
Inline CStr::from_ptr
Inlining this function is valuable, as it allows LLVM to apply `strlen`-specific optimizations without having to enable LTO.
For instance, the following function:
```rust
pub fn f(p: *const c_char) -> Option<u8> {
unsafe { CStr::from_ptr(p) }.to_bytes().get(0).copied()
}
```
Looks like this if `CStr::from_ptr` is allowed to be inlined.
```asm
before:
push rax
call qword ptr [rip + std::ffi::c_str::CStr::from_ptr@GOTPCREL]
mov rcx, rax
cmp rdx, 1
sete dl
test rax, rax
sete al
or al, dl
jne .LBB1_2
mov dl, byte ptr [rcx]
.LBB1_2:
xor al, 1
pop rcx
ret
after:
mov dl, byte ptr [rdi]
test dl, dl
setne al
ret
```
Note that optimization turned this from O(N) to O(1) in terms of performance, as LLVM knows that it doesn't really need to call `strlen` to determine whether a string is empty or not.
The unifying theme for this commit is weak, admittedly. I put together a
list of "expensive" functions when I originally proposed this whole
effort, but nobody's cared about that criterion. Still, it's a decent
way to bite off a not-too-big chunk of work.
Given the grab bag nature of this commit, the messages I used vary quite
a bit.