BTreeSet: avoid intermediate sorting when collecting sorted iterators
As [pointed out by droundy](https://users.rust-lang.org/t/question-about-btreeset-implementation/76427), an obvious optimization is to skip the first step introduced by #88448 (creation of a vector and sorting) and it's easy to do so for btree's own iterators. Also, exploit `from` in the examples.
Remove migrate borrowck mode
Closes#58781Closes#43234
# Stabilization proposal
This PR proposes the stabilization of `#![feature(nll)]` and the removal of `-Z borrowck`. Current borrow checking behavior of item bodies is currently done by first infering regions *lexically* and reporting any errors during HIR type checking. If there *are* any errors, then MIR borrowck (NLL) never occurs. If there *aren't* any errors, then MIR borrowck happens and any errors there would be reported. This PR removes the lexical region check of item bodies entirely and only uses MIR borrowck. Because MIR borrowck could never *not* be run for a compiled program, this should not break any programs. It does, however, change diagnostics significantly and allows a slightly larger set of programs to compile.
Tracking issue: #43234
RFC: https://github.com/rust-lang/rfcs/blob/master/text/2094-nll.md
Version: 1.63 (2022-06-30 => beta, 2022-08-11 => stable).
## Motivation
Over time, the Rust borrow checker has become "smarter" and thus allowed more programs to compile. There have been three different implementations: AST borrowck, MIR borrowck, and polonius (well, in progress). Additionally, there is the "lexical region resolver", which (roughly) solves the constraints generated through HIR typeck. It is not a full borrow checker, but does emit some errors.
The AST borrowck was the original implementation of the borrow checker and was part of the initially stabilized Rust 1.0. In mid 2017, work began to implement the current MIR borrow checker and that effort ompleted by the end of 2017, for the most part. During 2018, efforts were made to migrate away from the AST borrow checker to the MIR borrow checker - eventually culminating into "migrate" mode - where HIR typeck with lexical region resolving following by MIR borrow checking - being active by default in the 2018 edition.
In early 2019, migrate mode was turned on by default in the 2015 edition as well, but with MIR borrowck errors emitted as warnings. By late 2019, these warnings were upgraded to full errors. This was followed by the complete removal of the AST borrow checker.
In the period since, various errors emitted by the MIR borrow checker have been improved to the point that they are mostly the same or better than those emitted by the lexical region resolver.
While there do remain some degradations in errors (tracked under the [NLL-diagnostics tag](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3ANLL-diagnostics), those are sufficiently small and rare enough that increased flexibility of MIR borrow check-only is now a worthwhile tradeoff.
## What is stabilized
As said previously, this does not fundamentally change the landscape of accepted programs. However, there are a [few](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3ANLL-fixed-by-NLL) cases where programs can compile under `feature(nll)`, but not otherwise.
There are two notable patterns that are "fixed" by this stabilization. First, the `scoped_threads` feature, which is a continutation of a pre-1.0 API, can sometimes emit a [weird lifetime error](https://github.com/rust-lang/rust/issues/95527) without NLL. Second, actually seen in the standard library. In the `Extend` impl for `HashMap`, there is an implied bound of `K: 'a` that is available with NLL on but not without - this is utilized in the impl.
As mentioned before, there are a large number of diagnostic differences. Most of them are better, but some are worse. None are serious or happen often enough to need to block this PR. The biggest change is the loss of error code for a number of lifetime errors in favor of more general "lifetime may not live long enough" error. While this may *seem* bad, the former error codes were just attempts to somewhat-arbitrarily bin together lifetime errors of the same type; however, on paper, they end up being roughly the same with roughly the same kinds of solutions.
## What isn't stabilized
This PR does not completely remove the lexical region resolver. In the future, it may be possible to remove that (while still keeping HIR typeck) or to remove it together with HIR typeck.
## Tests
Many test outputs get updated by this PR. However, there are number of tests specifically geared towards NLL under `src/test/ui/nll`
## History
* On 2017-07-14, [tracking issue opened](https://github.com/rust-lang/rust/issues/43234)
* On 2017-07-20, [initial empty MIR pass added](https://github.com/rust-lang/rust/pull/43271)
* On 2017-08-29, [RFC opened](https://github.com/rust-lang/rfcs/pull/2094)
* On 2017-11-16, [Integrate MIR type-checker with NLL](https://github.com/rust-lang/rust/pull/45825)
* On 2017-12-20, [NLL feature complete](https://github.com/rust-lang/rust/pull/46862)
* On 2018-07-07, [Don't run AST borrowck on mir mode](https://github.com/rust-lang/rust/pull/52083)
* On 2018-07-27, [Add migrate mode](https://github.com/rust-lang/rust/pull/52681)
* On 2019-04-22, [Enable migrate mode on 2015 edition](https://github.com/rust-lang/rust/pull/59114)
* On 2019-08-26, [Don't downgrade errors on 2015 edition](https://github.com/rust-lang/rust/pull/64221)
* On 2019-08-27, [Remove AST borrowck](https://github.com/rust-lang/rust/pull/64790)
Avoid zero-sized allocs in ThinBox if T and H are both ZSTs.
This was surprisingly tricky, and took longer to get right than expected. `ThinBox` is a surprisingly subtle piece of code. That said, in the end, a lot of this was due to overthinking[^overthink] -- ultimately the fix ended up fairly clean and simple.
[^overthink]: Honestly, for a while I was convinced this couldn't be done without allocations or runtime branches in these cases, but that's obviously untrue.
Anyway, as a result of spending all that time debugging, I've extended the tests quite a bit, and also added more debug assertions. Many of these helped for subtle bugs I made in the middle (for example, the alloc/drop tracking is because I ended up double-dropping the value in the case where both were ZSTs), they're arguably a bit of overkill at this point, although I imagine they could help in the future too.
Anyway, these tests cover a wide range of size/align cases, nd fully pass under miri[^1]. They also do some smoke-check asserting that the value has the correct alignment, although in practice it's totally within the compiler's rights to delete these assertions since we'd have already done UB if they get hit. They have more boilerplate than they really need, but it's not *too* bad on a per-test basis.
A notable absence from testing is atypical header types, but at the moment it's impossible to manually implement `Pointee`. It would be really nice to have testing here, since it's not 100% obvious to me that the aligned read/write we use for `H` are correct in the face of arbitrary combinations of `size_of::<H>()`, `align_of::<H>()`, and `align_of::<T>()`. (That said, I spent a while thinking through it and am *pretty* sure it's fine -- I'd just feel... better if we could test some cases for non-ZST headers which have unequal and align).
[^1]: Or at least, they pass under miri if I copy the code and tests into a new crate and run miri on it (after making it less stdlibified).
Fixes#96485.
I'd request review ``@yaahc,`` but I believe you're taking some time away from reviews, so I'll request from the previous PR's reviewer (I think that the context helps, even if the actual change didn't end up being bad here).
r? ``@joshtriplett``
Improve documentation for constructors of pinned `Box`es
Adds a cross-references between `Box::pin` and `Box::into_pin` (and other related methods, i.e. the equivalent `From` implementation, and the unstable `pin_in` method), in particular now that `into_pin` [was stabilized](https://github.com/rust-lang/rust/pull/97397). The main goal is to further improve visibility of the fact that `Box<T> -> Pin<Box<T>>` conversion exits in the first place, and that `Box::pin(x)` is – essentially – just a convenience function for `Box::into_pin(Box::new(x))`
The motivating context why I think this is important is even experienced Rust users overlooking the existence this kind of conversion, [e.g. in this thread on IRLO](https://internals.rust-lang.org/t/pre-rfc-function-variants/16732/7?u=steffahn); and also the fact that that discussion brought up that there would be a bunch of Box-construction methods "missing" such as e.g. methods with fallible allocation a la "`Box::try_pin`", and similar; while those are in fact *not* necessary, because you can use `Box::into_pin(Box::try_new(x)?)` instead.
I have *not* included explicit mention of methods (e.g. `try_new`) in the docs of stable methods (e.g. `into_pin`). (Referring to unstable API in stable API docs would be bad style IMO.) Stable examples I have in mind with the statement "constructing a (pinned) Box in a different way than with `Box::new`" are things like cloning a `Box`, or `Box::from_raw`. If/when `try_new` would get stabilized, it would become a very good concrete example use-case of `Box::into_pin` IMO.
Add #[rustc_box] and use it inside alloc
This commit adds an alternative content boxing syntax, and uses it inside alloc.
```Rust
#![feature(box_syntax)]
fn foo() {
let foo = box bar;
}
```
is equivalent to
```Rust
#![feature(rustc_attrs)]
fn foo() {
let foo = #[rustc_box] Box::new(bar);
}
```
The usage inside the very performance relevant code in
liballoc is the only remaining relevant usage of box syntax
in the compiler (outside of tests, which are comparatively easy to port).
box syntax was originally designed to be used by all Rust
developers. This introduces a replacement syntax more tailored
to only being used inside the Rust compiler, and with it,
lays the groundwork for eventually removing box syntax.
[Earlier work](https://github.com/rust-lang/rust/pull/87781#issuecomment-894714878) by `@nbdd0121` to lower `Box::new` to `box` during THIR -> MIR building ran into borrow checker problems, requiring the lowering to be adjusted in a way that led to [performance regressions](https://github.com/rust-lang/rust/pull/87781#issuecomment-894872367). The proposed change in this PR lowers `#[rustc_box] Box::new` -> `box` in the AST -> HIR lowering step, which is way earlier in the compiler, and thus should cause less issues both performance wise as well as regarding type inference/borrow checking/etc. Hopefully, future work can move the lowering further back in the compiler, as long as there are no performance regressions.
Tweak insert docs
For `{Hash, BTree}Map::insert`, I always have to take a few extra seconds to think about the slight weirdness about the fact that if we "did not" insert (which "sounds" false), we return true, and if we "did" insert, (which "sounds" true), we return false.
This tweaks the doc comments for the `insert` methods of those types (as well as what looks like a rustc internal data structure that I found just by searching the codebase for "If the set did") to first use the "Returns whether _something_" pattern used in e.g. `remove`, where we say that `remove` "returns whether the value was present".
alloc: remove repeated word in comment
Linux's `checkpatch.pl` reports:
```txt
#42544: FILE: rust/alloc/vec/mod.rs:2692:
WARNING: Possible repeated word: 'to'
+ // - Elements are :Copy so it's OK to to copy them, without doing
```
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Put a bound on collection misbehavior
As currently written, when a logic error occurs in a collection's trait parameters, this allows *completely arbitrary* misbehavior, so long as it does not cause undefined behavior in std. However, because the extent of misbehavior is not specified, it is allowed for *any* code in std to start misbehaving in arbitrary ways which are not formally UB; consider the theoretical example of a global which gets set on an observed logic error. Because the misbehavior is only bound by not resulting in UB from safe APIs and the crate-level encapsulation boundary of all of std, this makes writing user unsafe code that utilizes std theoretically impossible, as it now relies on undocumented QOI (quality of implementation) that unrelated parts of std cannot be caused to misbehave by a misuse of std::collections APIs.
In practice, this is a nonconcern, because std has reasonable QOI and an implementation that takes advantage of this freedom is essentially a malicious implementation and only compliant by the most langauage-lawyer reading of the documentation.
To close this hole, we just add a small clause to the existing logic error paragraph that ensures that any misbehavior is limited to the collection which observed the logic error, making it more plausible to prove the soundness of user unsafe code.
This is not meant to be formal; a formal refinement would likely need to mention that values derived from the collection can also misbehave after a logic error is observed, as well as define what it means to "observe" a logic error in the first place. This fix errs on the side of informality in order to close the hole without complicating a normal reading which can assume a reasonable nonmalicious QOI.
See also [discussion on IRLO][1].
[1]: https://internals.rust-lang.org/t/using-std-collections-and-unsafe-anything-can-happen/16640
r? rust-lang/libs-api ```@rustbot``` label +T-libs-api -T-libs
This technically adds a new guarantee to the documentation, though I argue as written it's one already implicitly provided.
Clarify the guarantees of Vec::as_ptr and Vec::as_mut_ptr when there's no allocation
Currently the documentation says they return a pointer to the vector's buffer, which has the implied precondition that the vector allocated some memory. However `Vec`'s documentation also specifies that it won't always allocate, so it's unclear whether the pointer returned is valid in that case. Of course you won't be able to read/write actual bytes to/from it since the capacity is 0, but there's an exception: zero sized read/writes. They are still valid as long as the pointer is not null and the memory it points to wasn't deallocated, but `Vec::as_ptr` and `Vec::as_mut_ptr` don't specify that's not the case. This PR thus specifies they are actually valid for zero sized reads since `Vec` is implemented to hold a dangling pointer in those cases, which is neither null nor was deallocated.
Linux's `checkpatch.pl` reports:
```txt
#42544: FILE: rust/alloc/vec/mod.rs:2692:
WARNING: Possible repeated word: 'to'
+ // - Elements are :Copy so it's OK to to copy them, without doing
```
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Document the current aliasing rules for `Box<T>`.
Currently, `Box<T>` gets `noalias`, meaning it has the same rules as `&mut T`. This is sparsely documented, even though it can have quite a big impact on unsafe code using box. Therefore, these rules are documented here, with a big warning that they are not normative and subject to change, since we have not yet committed to an aliasing model and the state of `Box<T>` is especially uncertain.
If you have any suggestions and improvements, make sure to leave them here. This is mostly intended to inform people about what is currently going on (to prevent misunderstandings such as [Jon Gjengset's Box aliasing](https://www.youtube.com/watch?v=EY7Wi9fV5bk)).
This is supposed to _only document current UB_ and not add any new guarantees or rules.
refactor: VecDeques Iter fields to private
Made the fields of VecDeque's Iter private by creating a Iter::new(...) function to create a new instance of Iter and migrating usage to use Iter::new(...).
improve format impl for literals
The basic idea of this change can be seen here https://godbolt.org/z/MT37cWoe1.
Updates the format impl to have a fast path for string literals and the default path for regular format args.
This change will allow `format!("string literal")` to be used interchangably with `"string literal".to_owned()`.
This would be relevant in the case of `f!"string literal"` being legal (https://github.com/rust-lang/rfcs/pull/3267) in which case it would be the easiest way to create owned strings from literals, while also being just as efficient as any other impl
Use Box::new() instead of box syntax in library tests
The tests inside `library/*` have no reason to use `box` syntax as they have 0 performance relevance. Therefore, we can safely remove them (instead of having to use alternatives like the one in #97293).
Finish bumping stage0
It looks like the last time had left some remaining cfg's -- which made me think
that the stage0 bump was actually successful. This brings us to a released 1.62
beta though.
This now brings us to cfg-clean, with the exception of check-cfg-features in bootstrap;
I'd prefer to leave that for a separate PR at this time since it's likely to be more tricky.
cc https://github.com/rust-lang/rust/pull/97147#issuecomment-1132845061
r? `@pietroalbini`
Remove impossible panic note from `Vec::append`
Neither the number of elements in a vector can overflow a `usize`, nor
can the amount of elements in two vectors.
It looks like the last time had left some remaining cfg's -- which made me think
that the stage0 bump was actually successful. This brings us to a released 1.62
beta though.
Currently, `Box<T>` gets `noalias`, meaning it has
the same rules as `&mut T`. This is
sparsely documented, even though it can have quite
a big impact on unsafe code using box. Therefore,
these rules are documented here, with a big warning
that they are not normative and subject to change,
since we have not yet committed to an aliasing model
and the state of `Box<T>` is especially uncertain.
improve case conversion happy path
Someone shared the source code for [Go's string case conversion](19156a5474/src/strings/strings.go (L558-L616)).
It features a hot path for ascii-only strings (although I assume for reasons specific to go, they've opted for a read safe hot loop).
I've borrowed these ideas and also kept our existing code to provide a fast path + seamless utf-8 correct path fallback.
(Naive) Benchmarks can be found here https://github.com/conradludgate/case-conv
For the cases where non-ascii is found near the start, the performance of this algorithm does fall back to original speeds and has not had any measurable speed loss
As currently written, when a logic error occurs in a collection's trait
parameters, this allows *completely arbitrary* misbehavior, so long as
it does not cause undefined behavior in std. However, because the extent
of misbehavior is not specified, it is allowed for *any* code in std to
start misbehaving in arbitrary ways which are not formally UB; consider
the theoretical example of a global which gets set on an observed logic
error. Because the misbehavior is only bound by not resulting in UB from
safe APIs and the crate-level encapsulation boundary of all of std, this
makes writing user unsafe code that utilizes std theoretically
impossible, as it now relies on undocumented QOI that unrelated parts of
std cannot be caused to misbehave by a misuse of std::collections APIs.
In practice, this is a nonconcern, because std has reasonable QOI and an
implementation that takes advantage of this freedom is essentially a
malicious implementation and only compliant by the most langauage-lawyer
reading of the documentation.
To close this hole, we just add a small clause to the existing logic
error paragraph that ensures that any misbehavior is limited to the
collection which observed the logic error, making it more plausible to
prove the soundness of user unsafe code.
This is not meant to be formal; a formal refinement would likely need to
mention that values derived from the collection can also misbehave after a
logic error is observed, as well as define what it means to "observe" a
logic error in the first place. This fix errs on the side of informality
in order to close the hole without complicating a normal reading which
can assume a reasonable nonmalicious QOI.
See also [discussion on IRLO][1].
[1]: https://internals.rust-lang.org/t/using-std-collections-and-unsafe-anything-can-happen/16640
Clarify slice and Vec iteration order
While already being inferable from the doc examples, it wasn't fully specified. This is the only logical way to do a slice iterator, so I think this should be uncontroversial. It also improves the `Vec::into_iter` example to better show the order and that the iterator returns owned values.
Improve codegen of String::retain method
This pull-request improve the codegen of the `String::retain` method.
Using `unwrap_unchecked` helps the optimizer to not generate a panicking path that will never be taken for valid UTF-8 like string.
Using `encode_utf8` saves us from an expensive call to `memcpy`, as the optimizer is unable to realize that `ch_len <= 4` and so can generate much better assembly code.
https://rust.godbolt.org/z/z73ohenfc
Remove libstd's calls to `C-unwind` foreign functions
Remove all libstd and its dependencies' usage of `extern "C-unwind"`.
This is a prerequiste of a WIP PR which will forbid libraries calling `extern "C-unwind"` functions to be compiled in `-Cpanic=unwind` and linked against `panic_abort` (this restriction is necessary to address soundness bug #96926).
Cargo will ensure all crates are compiled with the same `-Cpanic` but the std is only compiled `-Cpanic=unwind` but needs the ability to be linked into `-Cpanic=abort`.
Currently there are two places where `C-unwind` is used in libstd:
* `__rust_start_panic` is used for interfacing to the panic runtime. This could be `extern "Rust"`
* `_{rdl,rg}_oom`: a shim `__rust_alloc_error_handler` will be generated by codegen to call into one of these; they can also be `extern "Rust"` (in fact, the generated shim is used as `extern "Rust"`, so I am not even sure why these are not, probably because they used to `extern "C"` and was changed to `extern "C-unwind"` when we allow alloc error hooks to unwind, but they really should just be using Rust ABI).
For dependencies, there is only one `extern "C-unwind"` function call, in `unwind` crate. This can be expressed as a re-export.
More dicussions can be seen in the Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/210922-project-ffi-unwind/topic/soundness.20in.20mixed.20panic.20mode
`@rustbot` label: T-libs F-c_unwind
Use default alloc_error_handler for hermit
Hermit now properly separates kernel from userspace.
Applications for hermit can now use Rust's default `alloc_error_handler` instead of calling the kernel's `__rg_oom`.
CC: ``@stlankes``
Add `sub_ptr` on pointers (the `usize` version of `offset_from`)
We have `add`/`sub` which are the `usize` versions of `offset`, this adds the `usize` equivalent of `offset_from`. Like how `.add(d)` replaced a whole bunch of `.offset(d as isize)`, you can see from the changes here that it's fairly common that code actually knows the order between the pointers and *wants* a `usize`, not an `isize`.
As a bonus, this can do `sub nuw`+`udiv exact`, rather than `sub`+`sdiv exact`, which can be optimized slightly better because it doesn't have to worry about negatives. That's why the slice iterators weren't using `offset_from`, though I haven't updated that code in this PR because slices are so perf-critical that I'll do it as its own change.
This is an intrinsic, like `offset_from`, so that it can eventually be allowed in CTFE. It also allows checking the extra safety condition -- see the test confirming that CTFE catches it if you pass the pointers in the wrong order.
Like we have `add`/`sub` which are the `usize` version of `offset`, this adds the `usize` equivalent of `offset_from`. Like how `.add(d)` replaced a whole bunch of `.offset(d as isize)`, you can see from the changes here that it's fairly common that code actually knows the order between the pointers and *wants* a `usize`, not an `isize`.
As a bonus, this can do `sub nuw`+`udiv exact`, rather than `sub`+`sdiv exact`, which can be optimized slightly better because it doesn't have to worry about negatives. That's why the slice iterators weren't using `offset_from`, though I haven't updated that code in this PR because slices are so perf-critical that I'll do it as its own change.
This is an intrinsic, like `offset_from`, so that it can eventually be allowed in CTFE. It also allows checking the extra safety condition -- see the test confirming that CTFE catches it if you pass the pointers in the wrong order.
Implement a lint to warn about unused macro rules
This implements a new lint to warn about unused macro rules (arms/matchers), similar to the `unused_macros` lint added by #41907 that warns about entire macros.
```rust
macro_rules! unused_empty {
(hello) => { println!("Hello, world!") };
() => { println!("empty") }; //~ ERROR: 1st rule of macro `unused_empty` is never used
}
fn main() {
unused_empty!(hello);
}
```
Builds upon #96149 and #96156.
Fixes#73576
Warn on unused `#[doc(hidden)]` attributes on trait impl items
[Zulip conversation](https://rust-lang.zulipchat.com/#narrow/stream/266220-rustdoc/topic/.E2.9C.94.20Validy.20checks.20for.20.60.23.5Bdoc.28hidden.29.5D.60).
Whether an associated item in a trait impl is shown or hidden in the documentation entirely depends on the corresponding item in the trait declaration. Rustdoc completely ignores `#[doc(hidden)]` attributes on impl items. No error or warning is emitted:
```rust
pub trait Tr { fn f(); }
pub struct Ty;
impl Tr for Ty { #[doc(hidden)] fn f() {} }
// ^^^^^^^^^^^^^^ ignored by rustdoc and currently
// no error or warning issued
```
This may lead users to the wrong belief that the attribute has an effect. In fact, several such cases are found in the standard library (I've removed all of them in this PR).
There does not seem to exist any incentive to allow this in the future either: Impl'ing a trait for a type means the type *fully* conforms to its API. Users can add `#[doc(hidden)]` to the whole impl if they want to hide the implementation or add the attribute to the corresponding associated item in the trait declaration to hide the specific item. Hiding an implementation of an associated item does not make much sense: The associated item can still be found on the trait page.
This PR emits the warn-by-default lint `unused_attribute` for this case with a future-incompat warning.
`@rustbot` label T-compiler T-rustdoc A-lint
Remove `#[rustc_deprecated]`
This removes `#[rustc_deprecated]` and introduces diagnostics to help users to the right direction (that being `#[deprecated]`). All uses of `#[rustc_deprecated]` have been converted. CI is expected to fail initially; this requires #95958, which includes converting `stdarch`.
I plan on following up in a short while (maybe a bootstrap cycle?) removing the diagnostics, as they're only intended to be short-term.
Add a dedicated length-prefixing method to `Hasher`
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
Fixes#94026
r? rust-lang/libs
---
The core of this change is the following two new methods on `Hasher`:
```rust
pub trait Hasher {
/// Writes a length prefix into this hasher, as part of being prefix-free.
///
/// If you're implementing [`Hash`] for a custom collection, call this before
/// writing its contents to this `Hasher`. That way
/// `(collection![1, 2, 3], collection![4, 5])` and
/// `(collection![1, 2], collection![3, 4, 5])` will provide different
/// sequences of values to the `Hasher`
///
/// The `impl<T> Hash for [T]` includes a call to this method, so if you're
/// hashing a slice (or array or vector) via its `Hash::hash` method,
/// you should **not** call this yourself.
///
/// This method is only for providing domain separation. If you want to
/// hash a `usize` that represents part of the *data*, then it's important
/// that you pass it to [`Hasher::write_usize`] instead of to this method.
///
/// # Examples
///
/// ```
/// #![feature(hasher_prefixfree_extras)]
/// # // Stubs to make the `impl` below pass the compiler
/// # struct MyCollection<T>(Option<T>);
/// # impl<T> MyCollection<T> {
/// # fn len(&self) -> usize { todo!() }
/// # }
/// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
/// # type Item = T;
/// # type IntoIter = std::iter::Empty<T>;
/// # fn into_iter(self) -> Self::IntoIter { todo!() }
/// # }
///
/// use std:#️⃣:{Hash, Hasher};
/// impl<T: Hash> Hash for MyCollection<T> {
/// fn hash<H: Hasher>(&self, state: &mut H) {
/// state.write_length_prefix(self.len());
/// for elt in self {
/// elt.hash(state);
/// }
/// }
/// }
/// ```
///
/// # Note to Implementers
///
/// If you've decided that your `Hasher` is willing to be susceptible to
/// Hash-DoS attacks, then you might consider skipping hashing some or all
/// of the `len` provided in the name of increased performance.
#[inline]
#[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
fn write_length_prefix(&mut self, len: usize) {
self.write_usize(len);
}
/// Writes a single `str` into this hasher.
///
/// If you're implementing [`Hash`], you generally do not need to call this,
/// as the `impl Hash for str` does, so you can just use that.
///
/// This includes the domain separator for prefix-freedom, so you should
/// **not** call `Self::write_length_prefix` before calling this.
///
/// # Note to Implementers
///
/// The default implementation of this method includes a call to
/// [`Self::write_length_prefix`], so if your implementation of `Hasher`
/// doesn't care about prefix-freedom and you've thus overridden
/// that method to do nothing, there's no need to override this one.
///
/// This method is available to be overridden separately from the others
/// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
/// can be used to provide prefix-freedom cheaper than hashing a length.
///
/// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
/// them into a buffer), then you can hash the bytes of the `str` followed
/// by a single `0xFF` byte.
///
/// If your `Hasher` works in chunks, you can also do this by being careful
/// about how you pad partial chunks. If the chunks are padded with `0x00`
/// bytes then just hashing an extra `0xFF` byte doesn't necessarily
/// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
/// the same sequence of chunks. But if you pad with `0xFF` bytes instead,
/// ensuring at least one padding byte, then it can often provide
/// prefix-freedom cheaper than hashing the length would.
#[inline]
#[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
fn write_str(&mut self, s: &str) {
self.write_length_prefix(s.len());
self.write(s.as_bytes());
}
}
```
With updates to the `Hash` implementations for slices and containers to call `write_length_prefix` instead of `write_usize`.
`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing. But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available) it overrides `write_str` to continue to use the add-non-UTF-8-byte approach.
---
Compatibility:
Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
Avoid using `rand::thread_rng` in the stdlib benchmarks.
This is kind of an anti-pattern because it introduces extra nondeterminism for no real reason. In thread_rng's case this comes both from the random seed and also from the reseeding operations it does, which occasionally does syscalls (which adds additional nondeterminism). The impact of this would be pretty small in most cases, but it's a good practice to avoid (particularly because avoiding it was not hard).
Anyway, several of our benchmarks already did the right thing here anyway, so the change was pretty easy and mostly just applying it more universally. That said, the stdlib benchmarks aren't particularly stable (nor is our benchmark framework particularly great), so arguably this doesn't matter that much in practice.
~~Anyway, this also bumps the `rand` dev-dependency to 0.8, since it had fallen somewhat out of date.~~ Nevermind, too much of a headache.
Tweak the vec-calloc runtime check to only apply to shortish-arrays
r? `@Mark-Simulacrum`
`@nbdd0121` pointed out in https://github.com/rust-lang/rust/pull/95362#issuecomment-1114085395 that LLVM currently doesn't constant-fold the `IsZero` check for long arrays, so that seems like a reasonable justification for limiting it.
It appears that it's based on length, not byte size, (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR. Maybe it's a ["the number of inlining shall be three"](https://youtu.be/s4wnuiCwTGU?t=320) sort of situation.
Certainly there's more that could be done here -- that generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.
std::fmt: Various fixes and improvements to documentation
This PR contains the following changes:
- **Added argument index comments to examples for specifying precision**
The examples for specifying the precision have comments explaining which
argument the specifier is referring to. However, for implicit positional
arguments, the examples simply refer to "next arg". To simplify following the
comments, "next arg" was supplemented with the actual resulting argument index.
- **Fixed documentation for specifying precision via `.*`**
The documentation stated that in case of the syntax `{<arg>:<spec>.*}`, "the
`<arg>` part refers to the value to print, and the precision must come in the
input preceding `<arg>`". This is not correct: the <arg> part does indeed refer
to the value to print, but the precision does not come in the input preciding
arg, but in the next implicit input (as if specified with {}).
Fixes#96413.
- **Fix the grammar documentation**
According to the grammar documented, the format specifier `{: }` should not be
legal because of the whitespace it contains. However, in reality, this is
perfectly fine because the actual implementation allows spaces before the
closing brace. Fixes#71088.
Also, the exact meaning of most of the terminal symbols was not specified, for
example the meaning of `identifier`.
- **Removed reference to Formatter::buf and other private fields**
Formatter::buf is not a public field and therefore isn't very helpful in user-
facing documentation. Also, the other public fields of Formatter were removed
during stabilization of std::fmt (4af3494bb0) and can only be accessed via
getters.
- **Improved list of formatting macros**
Two improvements:
1. write! can not only receive a `io::Write`, but also a `fmt::Write` as first argument.
2. The description texts now contain links to the actual macros for easier
navigation.
Clarify docs for `from_raw_parts` on `Vec` and `String`
Closes#95427
Original safety explanation for `from_raw_parts` was unclear on safety for consuming a C string. This clarifies when doing so is safe.
Classify BinaryHeap & LinkedList unit tests as such
All but one of these so-called integration test case are unit tests, just like btree's were (#75531). In addition, reunite the unit tests of linked_list that were split off during #23104 because they needed to remain unit tests (they were later moved to the separate file they are in during #63207). The two sets could remain separate files, but I opted to merge them back together, more or less in the order they used to be, apart from one duplicate name `test_split_off` and one duplicate tiny function `list_from`.
Using unwrap_unchecked helps the optimizer to not generate panicking
path, that will never be taken for valid UTF-8 like string.
Using encode_utf8 saves us a call to a memcpy, as the optimizer is
unable to realize that ch_len <= 4 and so can generate much better
assembly code.
https://rust.godbolt.org/z/z73ohenfc
Two improvements:
1. write! can not only receive a `io::Write`, but also a `fmt::Write` as first argument.
2. The description texts now contain links to the actual macros for easier
navigation.
Formatter::buf is not a public field and therefore isn't very helpful in user-
facing documentation. Also, the other public fields of Formatter were made
private during stabilization of std::fmt (4af3494bb0) and can now only be read
via accessor methods.
According to the grammar documented, the format specifier `{: }` should not be
legal because of the whitespace it contains. However, in reality, this is
perfectly fine because the actual implementation allows spaces before the
closing brace. Fixes#71088.
Also, the exact meaning of most of the terminal symbols was not specified, for
example the meaning of `identifier`.
The examples for specifying the precision have comments explaining which
argument the specifier is referring to. However, for implicit positional
arguments, the examples simply talk about "next arg". To make it easier for
readers to follow the comments, "next arg" was supplemented with the actual
resulting argument index.
The documentation stated that in case of the syntax `{<arg>:<spec>.*}`, "the
`<arg>` part refers to the value to print, and the precision must come in the
input preceding `<arg>`". This is not correct: the <arg> part does indeed refer
to the value to print, but the precision does not come in the input preciding
arg, but in the next implicit input (as if specified with {}).
Fixes#96413.
Implement str to [u8] conversion for refcounted containers
This seems motivated to complete the APIs for shared containers since we already have similar allocation-free conversions for strings like `From<Box<[u8]>> for Box<str>`.
Insta-stable since it's a new trait impl?
Revert "Re-export core::ffi types from std::ffi"
This reverts commit 9aed829fe6.
Fixes https://github.com/rust-lang/rust/issues/96435 , a regression
in crates doing `use std::ffi::*;` and `use std::os::raw::*;`.
We can re-add this re-export once the `core::ffi` types
are stable, and thus the `std::os::raw` types can become re-exports as
well, which will avoid the conflict. (Type aliases to the same type
still conflict, but re-exports of the same type don't.)
Clarify that `Cow::into_owned` returns owned data
Two sections of the `Cow::into_owned` docs imply that `into_owned` returns a `Cow`. Clarify that it returns the underlying owned object, either cloned or extracted from the `Cow`.
Fix some confusing wording and improve slice-search-related docs
This adds more links between `contains` and `binary_search` because I do think they have some relevant connections. If your (big) slice happens to be sorted and you know it, surely you should be using `[3; 100].binary_search(&5).is_ok()` over `[3; 100].contains(&5)`?
This also fixes the confusing "searches this sorted X" wording which just sounds really weird because it doesn't know whether it's actually sorted. It should be but it may not be. The new wording should make it clearer that you will probably want to sort it and in the same sentence it also mentions the related function `contains`.
Similarly, this mentions `binary_search` on `contains`' docs.
This also fixes some other minor stuff and inconsistencies.
[test] Add test cases for untested functions for VecDeque
Added test cases of the following functions
- get
- get_mut
- swap
- reserve_exact
- try_reserve_exact
- try_reserve
- contains
- rotate_left
- rotate_right
- binary_search
- binary_search_by
- binary_search_by_key
`alloc`: make `vec!` unavailable under `no_global_oom_handling`
`alloc`: make `vec!` unavailable under `no_global_oom_handling`
The `vec!` macro has 3 rules, but two are not usable under
`no_global_oom_handling` builds of the standard library
(even with a zero size):
```rust
let _ = vec![42]; // Error: requires `exchange_malloc` lang_item.
let _ = vec![42; 0]; // Error: cannot find function `from_elem`.
```
Thus those two rules should not be available to begin with.
The remaining one, with an empty matcher, is just a shorthand for
`new()` and may not make as much sense to have alone, since the
idea behind `vec!` is to enable `Vec`s to be defined with the same
syntax as array expressions. Furthermore, the documentation can be
confusing since it shows the other rules.
Thus perhaps it is better and simpler to disable `vec!` entirely
under `no_global_oom_handling` environments, and let users call
`new()` instead:
```rust
let _: Vec<i32> = vec![];
let _: Vec<i32> = Vec::new();
```
Notwithstanding this, a `try_vec!` macro would be useful, such as
the one introduced in https://github.com/rust-lang/rust/pull/95051.
If the shorthand for `new()` is deemed worth keeping on its own,
then it may be interesting to have a separate `vec!` macro with
a single rule and different, simpler documentation.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Speed up Vec::clear().
Currently it just calls `truncate(0)`. `truncate()` is (a) not marked as
`#[inline]`, and (b) more general than needed for `clear()`.
This commit changes `clear()` to do the work itself. This modest change
was first proposed in rust-lang#74172, where the reviewer rejected it because
there was insufficient evidence that `Vec::clear()`'s performance
mattered enough to justify the change. Recent changes within rustc have
made `Vec::clear()` hot within `macro_parser.rs`, so the change is now
clearly worthwhile.
Although it doesn't show wins on CI perf runs, this seems to be because they
use PGO. But not all platforms currently use PGO. Also, local builds don't use
PGO, and `truncate` sometimes shows up in an over-represented fashion in local
profiles. So local profiling will be made easier by this change.
Note that this will also benefit `String::clear()`, because it just
calls `Vec::clear()`.
Finally, the commit removes the `vec-clear.rs` codegen test. It was
added in #52908. From before then until now, `Vec::clear()` just called
`Vec::truncate()` with a zero length. The body of Vec::truncate() has
changed a lot since then. Now that `Vec::clear()` is doing actual work
itself, and not just calling `Vec::truncate()`, it's not surprising that
its generated code includes a load and an icmp. I think it's reasonable
to remove this test.
r? `@m-ou-se`