Some modest running-time improvements to `std::collections::BitSet` on bit-sets of varying set-membership densities. This is work originally from [here](https://github.com/rayglover/alt_collections). (Benchmarks copied below)
```
std::collections::BitSet / alt_collections::BitSet
copy_dense ... 3.08x
copy_sparse ... 4.22x
count_dense ... 11.01x
count_sparse ... 8.11x
from_bytes ... 1.47x
intersect_dense ... 6.54x
intersect_sparse ... 4.37x
union_dense ... 5.53x
union_sparse ... 5.60x
```
The exception is `from_bytes`, which I've left unaltered since the optimization is rather obscure.
Compiling with the cpu feature `popcnt` gave a further ~10% improvement on my machine, but this wasn't factored in to the benchmarks above.
Similar improvements could be made to `BitVec`, although that would probably require more substantial changes.
criticism welcome!
Using regular pointer arithmetic to iterate collections of zero-sized types
doesn't work, because we'd get the same pointer all the time. Our
current solution is to convert the pointer to an integer, add an offset
and then convert back, but this inhibits certain optimizations.
What we should do instead is to convert the pointer to one that points
to an i8\*, and then use a LLVM GEP instructions without the inbounds
flag to perform the pointer arithmetic. This allows to generate pointers
that point outside allocated objects without causing UB (as long as you
don't dereference them), and it wraps around using two's complement,
i.e. it behaves exactly like the wrapping_* operations we're currently
using, with the added benefit of LLVM being able to better optimize the
resulting IR.
Using regular pointer arithmetic to iterate collections of zero-sized types
doesn't work, because we'd get the same pointer all the time. Our
current solution is to convert the pointer to an integer, add an offset
and then convert back, but this inhibits certain optimizations.
What we should do instead is to convert the pointer to one that points
to an i8*, and then use a LLVM GEP instructions without the inbounds
flag to perform the pointer arithmetic. This allows to generate pointers
that point outside allocated objects without causing UB (as long as you
don't dereference them), and it wraps around using two's complement,
i.e. it behaves exactly like the wrapping_* operations we're currently
using, with the added benefit of LLVM being able to better optimize the
resulting IR.
The functions BitSet::{iter,union,symmetric_difference} each had docs that claimed u32s were output when their actual output each end up being usizes.
r? @steveklabnik
We don't need to copy any elements if `at` is behind the last element
in the map. The last element is at index `self.v.len() - 1`, so we
should not copy if `at` is greater **or equals** `self.v.len()`.
r? @Gankro
Several Minor API / Reference Documentation Fixes
- Fix a few small errors in the reference.
- Fix paper cuts in the API docs.
Fixes#24882Fixes#25233Fixes#25250
We don't need to copy any elements if `at` is behind the last element
in the map. The last element is at index `self.v.len() - 1`, so we
should not copy if `at` is greater or equals `self.v.len()`.
I was profiling my code again and this time AsRef<str> for String
was eating up a considerable chunk of my runtime; adding the inline
annotation made the program run almost twice as fast!
While I was at it I also added the annotation to other implementations
of AsRef as well as AsMut.
Ideally this trait implementation would be unstable, requiring crates to opt-in
if they would like the functionality, but that's not currently how stability
works so the implementation needs to be removed entirely.
This may come back at a future date, but for now the conservative option is to
remove it.
[breaking-change]
Ideally this trait implementation would be unstable, requiring crates to opt-in
if they would like the functionality, but that's not currently how stability
works so the implementation needs to be removed entirely.
This may come back at a future date, but for now the conservative option is to
remove it.
[breaking-change]
collections: Convert SliceConcatExt to use associated types
Coherence now allows this, we have `SliceConcatExt<T> for [V] where T: Sized + Clone` and` SliceConcatExt<str> for [S]`, these don't conflict because
str is never Sized.
According to rust-lang/rfcs#235, `VecDeque` should have this method (`VecDeque` was called `RingBuf` at the time) but it was never implemented.
I marked this stable since "1.0.0" because it's stable in `Vec`.
Coherence now allows this, we have SliceConcatExt<T> for [V] where T: Sized
+ Clone and SliceConcatExt<str> for [S], these don't conflict because
str is never Sized.
This commit does two things: it adds an example for indexing vectors, and it changes the \"Examples\" section to use full sentences.
This change was spurred by someone in the #rust IRC channel asking if there was a `.set()` method for changing the `i`-th value of a vector (they had missed that `Vec` implements `IndexMut`, which is easy to do if you're not aware of that trait).
This makes the `bit::vec::bench::bench_bit_vec_big_union` benchmark go
from `774 ns/iter (+/- 190)` to `602 ns/iter (+/- 5)`.
(There's room for more work here too: if one can guarantee 128-bit
alignment for the vector, the compiler actually optimises `union`,
`intersection` etc. to SIMD instructions, which end up being ~5x faster
that the original version, and 4x faster than the optimised version in
this patch.)
This makes the `bit::vec::bench::bench_bit_vec_big_union` benchmark go
from `774 ns/iter (+/- 190)` to `602 ns/iter (+/- 5)`.
(There's room for more work here too: if one can guarantee 128-bit
alignment for the vector, the compiler actually optimises `union`,
`intersection` etc. to SIMD instructions, which end up being ~5x faster
that the original version, and 4x faster than the optimised version in
this patch.)
collections: Implement String::drain(range) according to RFC 574
`.drain(range)` is unstable and under feature(collections_drain).
This adds a safe way to remove any range of a String as efficiently as
possible.
As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.
RFC tracking issue is #23055
These implementations were intended to be unstable, but currently the stability
attributes cannot handle a stable trait with an unstable `impl` block. This
commit also audits the rest of the standard library for explicitly-`#[unstable]`
impl blocks. No others were removed but some annotations were changed to
`#[stable]` as they're defacto stable anyway.
One particularly interesting `impl` marked `#[stable]` as part of this commit
is the `Add<&[T]>` impl for `Vec<T>`, which uses `push_all` and implicitly
clones all elements of the vector provided.
Closes#24791
[breaking-change]
`.drain(range)` is unstable and under feature(collections_drain).
This adds a safe way to remove any range of a String as efficiently as
possible.
As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.
RFC tracking issue is #23055
These implementations were intended to be unstable, but currently the stability
attributes cannot handle a stable trait with an unstable `impl` block. This
commit also audits the rest of the standard library for explicitly-`#[unstable]`
impl blocks. No others were removed but some annotations were changed to
`#[stable]` as they're defacto stable anyway.
One particularly interesting `impl` marked `#[stable]` as part of this commit
is the `Add<&[T]>` impl for `Vec<T>`, which uses `push_all` and implicitly
clones all elements of the vector provided.
Closes#24791
Implement Vec::drain(\<range type\>) from rust-lang/rfcs#574, tracking issue #23055.
This is a big step forward for vector usability. This is an introduction of an API for removing a range of *m* consecutive elements from a vector, as efficently as possible.
New features:
- Introduce trait `std::collections::range::RangeArgument` implemented by all four built-in range types.
- Change `Vec::drain()` to use `Vec::drain<R: RangeArgument>(R)`
Implementation notes:
- Use @Gankro's idea for memory safety: Use `set_len` on the source vector when creating the iterator, to make sure that the part of the vector that will be modified is unreachable. Fix up things in Drain's destructor — but even if it doesn't run, we don't expose any moved-out-from slots of the vector.
- This `.drain<R>(R)` very close to how it is specified in the RFC.
- Introduced as unstable
- Drain reuses the slice iterator — copying and pasting the same iterator pointer arithmetic again felt very bad
- The `usize` index as a range argument in the RFC is not included. The ranges trait would have to change to accomodate it.
Please help me with:
- Name and location of the new ranges trait.
- Design of the ranges trait
- Understanding Niko's comments about variance (Note: for a long time I was using a straight up &mut Vec in the iterator, but I changed this to permit reusing the slice iterator).
Previous PR and discussion: #23071
Improve example for as_string and add example for as_vec
Provide a better example of `as_string` / `DerefString`'s unique capabilities.
Use an example where (for an unspecified reason) you need a &String, and
show how `as_string` solves the problem without needing an allocation.
Changes the style guidelines regarding unit tests to recommend using a
sub-module named "tests" instead of "test" for unit tests as "test"
might clash with imports of libtest.
This is an implementation of [RFC 1030][rfc] which adds these traits to the
prelude and additionally removes all inherent `into_iter` methods on collections
in favor of the trait implementation (which is now accessible by default).
[rfc]: https://github.com/rust-lang/rfcs/pull/1030
This is technically a breaking change due to the prelude additions and removal
of inherent methods, but it is expected that essentially no code breaks in
practice.
[breaking-change]
Closes#24538
as accepted in [RFC 526](https://github.com/rust-lang/rfcs/blob/master/text/0526-fmt-text-writer.md).
Note that this brand new method is marked as **stable**. I judged this safe enough: it’s simple enough that it’s very unlikely to change. Still, I can mark it unstable instead if you prefer.
r? @alexcrichton
For now, words() is left in (but deprecated), and Words is a type alias for
struct SplitWhitespace.
Also cleaned up references to s.words() throughout codebase.
Closes#15628
This commit removes all the old casting/generic traits from `std::num` that are
no longer in use by the standard library. This additionally removes the old
`strconv` module which has not seen much use in quite a long time. All generic
functionality has been supplanted with traits in the `num` crate and the
`strconv` module is supplanted with the [rust-strconv crate][rust-strconv].
[rust-strconv]: https://github.com/lifthrasiir/rust-strconv
This is a breaking change due to the removal of these deprecated crates, and the
alternative crates are listed above.
[breaking-change]
This implementation is currently about 3-4 times faster than using the `.to_string()` based approach.
I would also suggest we deprecate `String::from_str` since it's redundant with the stable `String::from` method, but I'll leave that for a future PR.
It's just as convenient, but it's much faster. Using write! requires an
extra call to fmt::write and a extra dynamically dispatched call to
Arguments' Display format function.
This is an implementation of [RFC 1030][rfc] which adds these traits to the
prelude and additionally removes all inherent `into_iter` methods on collections
in favor of the trait implementation (which is now accessible by default).
[rfc]: https://github.com/rust-lang/rfcs/pull/1030
This is technically a breaking change due to the prelude additions and removal
of inherent methods, but it is expected that essentially no code breaks in
practice.
[breaking-change]
Closes#24538
It's just as convenient, but it's much faster. Using write! requires an
extra call to fmt::write and a extra dynamically dispatched call to
Arguments' Display format function.
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
librustc_driver and libsyntax. This may change pretty-printed
output from these modules in cases involving wide or combining
characters used in filenames, identifiers, etc.
The following functions are marked deprecated:
1. char.width() and str.width():
--> use unicode-width crate
2. str.graphemes() and str.grapheme_indices():
--> use unicode-segmentation crate
3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
char.compose(), char.decompose_canonical(), char.decompose_compatible(),
char.canonical_combining_class():
--> use unicode-normalization crate
The meaning of each variant of this enum was somewhat ambiguous and it's uncler
that we wouldn't even want to add more enumeration values in the future. As a
result this error has been altered to instead become an opaque structure.
Learning about the "first invalid byte index" is still an unstable feature, but
the type itself is now stable.
Right now, if the user requests to increase the vector size via reserve() or push_back() and the request brings the attempted memory above usize::MAX, we panic.
With this change there is only a panic if the minimum requested memory that could meet the requirement is above usize::MAX- otherwise it simply requests its largest capacity possible, usize::MAX.
...to be less confusing. Since 0 is the smallest number possible for usize, it doesn't make sense to mention it if it's already included, and it should be more clear that the length of the vector is a valid index with the new wording.
r? @steveklabnik
The meaning of each variant of this enum was somewhat ambiguous and it's uncler
that we wouldn't even want to add more enumeration values in the future. As a
result this error has been altered to instead become an opaque structure.
Learning about the "first invalid byte index" is still an unstable feature, but
the type itself is now stable.
In addition to being nicer, this also allows you to use `sum` and `product` for
iterators yielding custom types aside from the standard integers.
Due to removing the `AdditiveIterator` and `MultiplicativeIterator` trait, this
is a breaking change.
[breaking-change]
This adds the missing methods and turns `str::pattern` in a user facing module, as per RFC.
This also contains some big internal refactorings:
- string iterator pairs are implemented with a central macro to reduce redundancy
- Moved all tests from `coretest::str` into `collectionstest::str` and left a note to prevent the two sets of tests drifting apart further.
See https://github.com/rust-lang/rust/issues/22477
- Added missing reverse versions of methods
- Added [r]matches()
- Generated the string pattern iterators with a macro
- Added where bounds to the methods returning reverse iterators
for better error messages.
Right now comparing a `&String` (or a `&Cow`) to a `&str` requires redundant borrowing of the latter. Implementing `PartialEq<str>` tries to avoid this limitation.
```rust
struct Foo (String);
fn main () {
let s = Foo("foo".to_string());
match s {
Foo(ref x) if x == &"foo" => println!("foo!"),
// avoid this -----^
_ => {}
}
}
```
I was hoping that #23521 would solve this but it didn't work out.
These constants are small and can fit even in `u8`, but semantically they have type `usize` because they denote sizes and are almost always used in `usize` context. The change of their type to `u32` during the integer audit led only to the large amount of `as usize` noise (see the second commit, which removes this noise).
This is a minor [breaking-change] to an unstable interface.
r? @aturon
This commit is an implementation of [RFC 979][rfc] which changes the meaning of
the count parameter to the `splitn` function on strings and slices. The
parameter now means the number of items that are returned from the iterator, not
the number of splits that are made.
[rfc]: https://github.com/rust-lang/rfcs/pull/979Closes#23911
[breaking-change]
This commit is an implementation of [RFC 979][rfc] which changes the meaning of
the count parameter to the `splitn` function on strings and slices. The
parameter now means the number of items that are returned from the iterator, not
the number of splits that are made.
[rfc]: https://github.com/rust-lang/rfcs/pull/979Closes#23911
[breaking-change]
This is a deprecated attribute that is slated for removal, and it also affects
all implementors of the trait. This commit removes the attribute and fixes up
implementors accordingly. The primary implementation which was lost was the
ability to compare `&[T]` and `Vec<T>` (in that order).
This change also modifies the `assert_eq!` macro to not consider both directions
of equality, only the one given in the left/right forms to the macro. This
modification is motivated due to the fact that `&[T] == Vec<T>` no longer
compiles, causing hundreds of errors in unit tests in the standard library (and
likely throughout the community as well).
Closes#19470
[breaking-change]
* The `io::Seek` trait.
* The `Iterator::{partition, unsip}` methods.
* The `Vec::into_boxed_slice` method.
* The `LinkedList::append` method.
* The `{or_insert, or_insert_with` methods in the `Entry` APIs.
r? @alexcrichton
* Marks `#[stable]` the contents of the `std::convert` module.
* Added methods `PathBuf::as_path`, `OsString::as_os_str`,
`String::as_str`, `Vec::{as_slice, as_mut_slice}`.
* Deprecates `OsStr::from_str` in favor of a new, stable, and more
general `OsStr::new`.
* Adds unstable methods `OsString::from_bytes` and `OsStr::{to_bytes,
to_cstring}` for ergonomic FFI usage.
[breaking-change]
r? @alexcrichton
This commit cleans out a large amount of deprecated APIs from the standard
library and some of the facade crates as well, updating all users in the
compiler and in tests as it goes along.
* The `io::Seek` trait, and `SeekFrom` enum.
* The `Iterator::{partition, unsip}` methods.
* The `Vec::into_boxed_slice` method.
* The `LinkedList::append` method.
* The `{or_insert, or_insert_with` methods in the `Entry` APIs.
This is a deprecated attribute that is slated for removal, and it also affects
all implementors of the trait. This commit removes the attribute and fixes up
implementors accordingly. The primary implementation which was lost was the
ability to compare `&[T]` and `Vec<T>` (in that order).
This change also modifies the `assert_eq!` macro to not consider both directions
of equality, only the one given in the left/right forms to the macro. This
modification is motivated due to the fact that `&[T] == Vec<T>` no longer
compiles, causing hundreds of errors in unit tests in the standard library (and
likely throughout the community as well).
cc #19470
[breaking-change]
* Marks `#[stable]` the contents of the `std::convert` module.
* Added methods `PathBuf::as_path`, `OsString::as_os_str`,
`String::as_str`, `Vec::{as_slice, as_mut_slice}`.
* Deprecates `OsStr::from_str` in favor of a new, stable, and more
general `OsStr::new`.
* Adds unstable methods `OsString::from_bytes` and `OsStr::{to_bytes,
to_cstring}` for ergonomic FFI usage.
[breaking-change]
This commit stabilizes the `std::num` module:
* The `Int` and `Float` traits are deprecated in favor of (1) the
newly-added inherent methods and (2) the generic traits available in
rust-lang/num.
* The `Zero` and `One` traits are reintroduced in `std::num`, which
together with various other traits allow you to recover the most
common forms of generic programming.
* The `FromStrRadix` trait, and associated free function, is deprecated
in favor of inherent implementations.
* A wide range of methods and constants for both integers and floating
point numbers are now `#[stable]`, having been adjusted for integer
guidelines.
* `is_positive` and `is_negative` are renamed to `is_sign_positive` and
`is_sign_negative`, in order to address #22985
* The `Wrapping` type is moved to `std::num` and stabilized;
`WrappingOps` is deprecated in favor of inherent methods on the
integer types, and direct implementation of operations on
`Wrapping<X>` for each concrete integer type `X`.
Closes#22985Closes#21069
[breaking-change]
r? @alexcrichton
This commit stabilizes the `std::num` module:
* The `Int` and `Float` traits are deprecated in favor of (1) the
newly-added inherent methods and (2) the generic traits available in
rust-lang/num.
* The `Zero` and `One` traits are reintroduced in `std::num`, which
together with various other traits allow you to recover the most
common forms of generic programming.
* The `FromStrRadix` trait, and associated free function, is deprecated
in favor of inherent implementations.
* A wide range of methods and constants for both integers and floating
point numbers are now `#[stable]`, having been adjusted for integer
guidelines.
* `is_positive` and `is_negative` are renamed to `is_sign_positive` and
`is_sign_negative`, in order to address #22985
* The `Wrapping` type is moved to `std::num` and stabilized;
`WrappingOps` is deprecated in favor of inherent methods on the
integer types, and direct implementation of operations on
`Wrapping<X>` for each concrete integer type `X`.
Closes#22985Closes#21069
[breaking-change]
r? @alexcrichton
This commit stabilizes the `std::num` module:
* The `Int` and `Float` traits are deprecated in favor of (1) the
newly-added inherent methods and (2) the generic traits available in
rust-lang/num.
* The `Zero` and `One` traits are reintroduced in `std::num`, which
together with various other traits allow you to recover the most
common forms of generic programming.
* The `FromStrRadix` trait, and associated free function, is deprecated
in favor of inherent implementations.
* A wide range of methods and constants for both integers and floating
point numbers are now `#[stable]`, having been adjusted for integer
guidelines.
* `is_positive` and `is_negative` are renamed to `is_sign_positive` and
`is_sign_negative`, in order to address #22985
* The `Wrapping` type is moved to `std::num` and stabilized;
`WrappingOps` is deprecated in favor of inherent methods on the
integer types, and direct implementation of operations on
`Wrapping<X>` for each concrete integer type `X`.
Closes#22985Closes#21069
[breaking-change]
This functions swaps the order of arguments to a few functions that previously
took (output, input) parameters, but now take (input, output) parameters (in
that order).
The affected functions are:
* ptr::copy
* ptr::copy_nonoverlapping
* slice::bytes::copy_memory
* intrinsics::copy
* intrinsics::copy_nonoverlapping
Closes#22890
[breaking-change]
This functions swaps the order of arguments to a few functions that previously
took (output, input) parameters, but now take (input, output) parameters (in
that order).
The affected functions are:
* ptr::copy
* ptr::copy_nonoverlapping
* slice::bytes::copy_memory
* intrinsics::copy
* intrinsics::copy_nonoverlapping
Closes#22890
[breaking-change]