There is an issue with lev_distance, where
```
fn main() {
println!("{}", "\x80".lev_distance("\x80"))
}
```
prints `2`.
This is due to using the byte length instead of the char length.
Additionally, support zero-sized types.
Now there isn't a safe interface of `PartialVec` anymore, it's just a bare data structure with destructor that assumes you handled everything correctly before.
Replaces BTree with BTreeMap and BTreeSet, which are completely new implementations.
BTreeMap's internal Node representation is particularly inefficient at the moment to
make this first implementation easy to reason about and fairly safe. Both collections
are also currently missing some of the tooling specific to sorted collections, which
is planned as future work pending reform of these APIs. General implementation issues
are discussed with TODOs internally
[breaking-change]
Still waiting on compilation/test/bench stuff locally, but the edit-distance on any errors should be very small at this point. This is ready to be reviewed.
Replaces BTree with BTreeMap and BTreeSet, which are completely new implementations.
BTreeMap's internal Node representation is particularly inefficient at the moment to
make this first implementation easy to reason about and fairly safe. Both collections
are also currently missing some of the tooling specific to sorted collections, which
is planned as future work pending reform of these APIs. General implementation issues
are discussed with TODOs internally
Perf results on x86_64 Linux:
test treemap::bench::find_rand_100 ... bench: 76 ns/iter (+/- 4)
test treemap::bench::find_rand_10_000 ... bench: 163 ns/iter (+/- 6)
test treemap::bench::find_seq_100 ... bench: 77 ns/iter (+/- 3)
test treemap::bench::find_seq_10_000 ... bench: 115 ns/iter (+/- 1)
test treemap::bench::insert_rand_100 ... bench: 111 ns/iter (+/- 1)
test treemap::bench::insert_rand_10_000 ... bench: 996 ns/iter (+/- 18)
test treemap::bench::insert_seq_100 ... bench: 486 ns/iter (+/- 20)
test treemap::bench::insert_seq_10_000 ... bench: 800 ns/iter (+/- 15)
test btree::map::bench::find_rand_100 ... bench: 74 ns/iter (+/- 4)
test btree::map::bench::find_rand_10_000 ... bench: 153 ns/iter (+/- 5)
test btree::map::bench::find_seq_100 ... bench: 82 ns/iter (+/- 1)
test btree::map::bench::find_seq_10_000 ... bench: 108 ns/iter (+/- 0)
test btree::map::bench::insert_rand_100 ... bench: 220 ns/iter (+/- 1)
test btree::map::bench::insert_rand_10_000 ... bench: 620 ns/iter (+/- 16)
test btree::map::bench::insert_seq_100 ... bench: 411 ns/iter (+/- 12)
test btree::map::bench::insert_seq_10_000 ... bench: 534 ns/iter (+/- 14)
BTreeMap still has a lot of room for optimization, but it's already beating out TreeMap on most access patterns.
[breaking-change]
# Rationale
When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name. There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).
The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:
> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
# Method stabilization
The following methods have been marked #[stable]
* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice
The following methods have been marked #[unstable]
* String::from_utf8 - The error type in the returned `Result` may change to
provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
stabilization
* String::from_utf16 - The return type may change to become a `Result` which
includes more contextual information like where the error
occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
been marked #[unstable] becuase it can be seen as a
duplicate of iterator-based functionality as well as
possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
less efficient because of decoding/encoding. Due to the
desire to minimize API surface this may be able to be
removed in the future for something possibly generic with
no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
indices isn't totally clear at this time, so the failure
semantics and return value of this method are subject to
change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]
[2]: rust-lang/rfcs#240
The following method have been marked #[experimental]
* String::from_str - This function only exists as it's more efficient than
to_string(), but having a less ergonomic function for
performance reasons isn't the greatest reason to keep it
around. Like Vec::push_all, this has been marked
experimental for now.
The following methods have been #[deprecated]
* String::append - This method has been deprecated to remain consistent with the
deprecation of Vec::append. While convenient, it is one of
the only functional-style apis on String, and requires more
though as to whether it belongs as a first-class method or
now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
str::from_utf8 plus an assert plus a call to to_string().
Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
type which allow bypassing utf-8 checks. These have all
been deprecated in favor of calling `.as_mut_vec()` and
then operating directly on the vector returned. These
methods were deprecated because naming them with relation
to other methods was difficult to rationalize and it's
arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
# Reservation methods
This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]
[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
This commit deprecates the String::shift_char() function in favor of the
addition of an insert()/remove() pair of functions. This aligns the API with Vec
in that characters can be inserted at arbitrary positions. Additionaly, there
is no `_char` suffix due to the rationaled laid out in the previous commit.
These functions are both introduced as unstable as their failure semantics,
while in line with slices/vectors, are uncertain about whether they should
remain the same.
# Rationale
When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name. There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).
The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:
> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
# Method stabilization
The following methods have been marked #[stable]
* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice
The following methods have been marked #[unstable]
* String::from_utf8 - The error type in the returned `Result` may change to
provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
stabilization
* String::from_utf16 - The return type may change to become a `Result` which
includes more contextual information like where the error
occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
been marked #[unstable] becuase it can be seen as a
duplicate of iterator-based functionality as well as
possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
less efficient because of decoding/encoding. Due to the
desire to minimize API surface this may be able to be
removed in the future for something possibly generic with
no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
indices isn't totally clear at this time, so the failure
semantics and return value of this method are subject to
change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]
[2]: https://github.com/rust-lang/rfcs/pull/240
The following method have been marked #[experimental]
* String::from_str - This function only exists as it's more efficient than
to_string(), but having a less ergonomic function for
performance reasons isn't the greatest reason to keep it
around. Like Vec::push_all, this has been marked
experimental for now.
The following methods have been #[deprecated]
* String::append - This method has been deprecated to remain consistent with the
deprecation of Vec::append. While convenient, it is one of
the only functional-style apis on String, and requires more
though as to whether it belongs as a first-class method or
now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
str::from_utf8 plus an assert plus a call to to_string().
Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
type which allow bypassing utf-8 checks. These have all
been deprecated in favor of calling `.as_mut_vec()` and
then operating directly on the vector returned. These
methods were deprecated because naming them with relation
to other methods was difficult to rationalize and it's
arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
# Reservation methods
This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]
[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
The following methods, types, and names have become stable:
* Vec
* Vec::as_mut_slice
* Vec::as_slice
* Vec::capacity
* Vec::clear
* Vec::default
* Vec::grow
* Vec::insert
* Vec::len
* Vec::new
* Vec::pop
* Vec::push
* Vec::remove
* Vec::set_len
* Vec::shrink_to_fit
* Vec::truncate
* Vec::with_capacity
The following have become unstable:
* Vec::dedup // naming
* Vec::from_fn // naming and unboxed closures
* Vec::get_mut // will be removed for IndexMut
* Vec::grow_fn // unboxed closures and naming
* Vec::retain // unboxed closures
* Vec::swap_remove // uncertain naming
* Vec::from_elem // uncertain semantics
* vec::unzip // should be generic for all collections
The following have been deprecated
* Vec::append - call .extend()
* Vec::append_one - call .push()
* Vec::from_slice - call .to_vec()
* Vec::grow_set - call .grow() and then .push()
* Vec::into_vec - move the vector instead
* Vec::move_iter - renamed to iter_move()
* Vec::to_vec - call .clone()
The following methods remain experimental pending conventions
* vec::raw
* vec::raw::from_buf
* Vec:from_raw_parts
* Vec::push_all
This is a breaking change in terms of the signature of the `Vec::grow` function.
The argument used to be taken by reference, but it is now taken by value. Code
must update by removing a leading `&` sigil or by calling `.clone()` to create a
value.
[breaking-change]
Change to resolve and update compiler and libs for uses.
[breaking-change]
Enum variants are now in both the value and type namespaces. This means that
if you have a variant with the same name as a type in scope in a module, you
will get a name clash and thus an error. The solution is to either rename the
type or the variant.
Sized deallocation makes it pointless to provide an address that never
overlaps with pointers returned by an allocator. Code can branch on the
capacity of the allocation instead of a comparison with this sentinel.
This improves the situation in #8859, and the remaining issues are only
from the logging API, which should be disabled by default in optimized
release builds anyway along with debug assertions. The remaining issues
are part of #17081.
Closes#8859
This isn't ready to merge yet.
The 'containers and iterators' guide is basically just a collection of stuff that should be in the module definitions. So I'm moving the guide to just an 'iterators' guide, and moved the info that was there into the right places.
So, is this a good path forward, and is all of the information still correct?
This is important because the underlying allocator of the `Vec` passes that
information to the deallocator which needs the guarantee that it is the same
parameters that were also passed to the allocation function.
bitv: add larger tests, better benchmarks & remove dead code.
There were no tests for iteration etc. with more than 5 elements,
i.e. not even going beyond a single word. This situation is rectified.
Also, the only benchmarks for `set` were with a constant bit value,
which was not indicative of every situation, due to inlining & branch
removal. This adds a benchmark at the other end of the spectrum: random
input.
There were no tests for iteration etc. with more than 5 elements,
i.e. not even going beyond a single word. This situation is rectified.
Also, the only benchmarks for `set` were with a constant bit value,
which was not indicative of every situation, due to inlining & branch
removal. This adds a benchmark at the other end of the spectrum: random
input.
As outlined in
https://aturon.github.io/style/naming/conversions.html
`to_` functions names should only be used for expensive operations.
Thus `to_option` is better named `as_option`. Also, putting type
names into method names is considered bad style; what the user is
really trying to get is a reference. This `as_ref` is even better.
Also, we are missing a mutable version of this method. So add a
new trait `RawMutPtr` with a corresponding `as_mut` methode.
Finally, there is a bug in the signature of `to_option` which has
been around since lifetime elision: originally the returned reference
had 'static lifetime, but since the elision changes this become
the lifetime of the raw pointer (which does not make sense, since
the pointer lifetime and referent lifetime are unrelated). Fix
the bug to return a reference with a fresh lifetime (which will
be inferred from the calling context).
[breaking-change]
This adds support for lint groups to the compiler. Lint groups are a way of
grouping a number of lints together under one name. For example, this also
defines a default lint for naming conventions, named `bad_style`. Writing
`#[allow(bad_style)]` is equivalent to writing
`#[allow(non_camel_case_types, non_snake_case, non_uppercase_statics)]`. These
lint groups can also be defined as a compiler plugin using the new
`Registry::register_lint_group` method.
This also adds two built-in lint groups, `bad_style` and `unused`. The contents
of these groups can be seen by running `rustc -W help`.
This unifies the `non_snake_case_functions` and `uppercase_variables` lints
into one lint, `non_snake_case`. It also now checks for non-snake-case modules.
This also extends the non-camel-case types lint to check type parameters, and
merges the `non_uppercase_pattern_statics` lint into the
`non_uppercase_statics` lint.
Because the `uppercase_variables` lint is now part of the `non_snake_case`
lint, all non-snake-case variables that start with lowercase characters (such
as `fooBar`) will now trigger the `non_snake_case` lint.
New code should be updated to use the new `non_snake_case` lint instead of the
previous `non_snake_case_functions` and `uppercase_variables` lints. All use of
the `non_uppercase_pattern_statics` should be replaced with the
`non_uppercase_statics` lint. Any code that previously contained non-snake-case
module or variable names should be updated to use snake case names or disable
the `non_snake_case` lint. Any code with non-camel-case type parameters should
be changed to use camel case or disable the `non_camel_case_types` lint.
[breaking-change]
Per API meeting
https://github.com/rust-lang/meeting-minutes/blob/master/Meeting-API-review-2014-08-13.md
# Changes to `core::option`
Most of the module is marked as stable or unstable; most of the unstable items are awaiting resolution of conventions issues.
However, a few methods have been deprecated, either due to lack of use or redundancy:
* `take_unwrap`, `get_ref` and `get_mut_ref` (redundant, and we prefer for this functionality to go through an explicit .unwrap)
* `filtered` and `while`
* `mutate` and `mutate_or_set`
* `collect`: this functionality is being moved to a new `FromIterator` impl.
# Changes to `core::result`
Most of the module is marked as stable or unstable; most of the unstable items are awaiting resolution of conventions issues.
* `collect`: this functionality is being moved to a new `FromIterator` impl.
* `fold_` is deprecated due to lack of use
* Several methods found in `core::option` are added here, including `iter`, `as_slice`, and variants.
Due to deprecations, this is a:
[breaking-change]
[breaking-change]
1. The internal layout for traits has changed from (vtable, data) to (data, vtable). If you were relying on this in unsafe transmutes, you might get some very weird and apparently unrelated errors. You should not be doing this! Prefer not to do this at all, but if you must, you should use raw::TraitObject rather than hardcoding rustc's internal representation into your code.
2. The minimal type of reference-to-vec-literals (e.g., `&[1, 2, 3]`) is now a fixed size vec (e.g., `&[int, ..3]`) where it used to be an unsized vec (e.g., `&[int]`). If you want the unszied type, you must explicitly give the type (e.g., `let x: &[_] = &[1, 2, 3]`). Note in particular where multiple blocks must have the same type (e.g., if and else clauses, vec elements), the compiler will not coerce to the unsized type without a hint. E.g., `[&[1], &[1, 2]]` used to be a valid expression of type '[&[int]]'. It no longer type checks since the first element now has type `&[int, ..1]` and the second has type &[int, ..2]` which are incompatible.
3. The type of blocks (including functions) must be coercible to the expected type (used to be a subtype). Mostly this makes things more flexible and not less (in particular, in the case of coercing function bodies to the return type). However, in some rare cases, this is less flexible. TBH, I'm not exactly sure of the exact effects. I think the change causes us to resolve inferred type variables slightly earlier which might make us slightly more restrictive. Possibly it only affects blocks with unreachable code. E.g., `if ... { fail!(); "Hello" }` used to type check, it no longer does. The fix is to add a semicolon after the string.
Heapify is O(n), extend as currently implemented is O(nlogn). No brainer.
Currently investigating whether extend can just be implemented as a local heapify.
For crates `alloc`–`collections`. This is mostly just updating a few function/method descriptions to use the indicative style.
cc #4361; I’ve sort of assumed that the third-person indicative style has been decided on, but I could update this to use the imperative style if that’s preferred, or even update this to remove all function-style-related changes. (I think that standardising on one thing, even if it’s not the ‘best’ option, is still better than having no standard at all.) The indicative style seems to be more common in the Rust standard library at the moment, especially in the newer modules (e.g. `collections::vec`), more popular in the discussion about it, and also more popular amongst other languages (see https://github.com/rust-lang/rust/issues/4361#issuecomment-33470215).
This was bothering me (and some other people). The macro was necessary in a transient step of my development, but I converged on a design where it was unnecessary, but it didn't really click that that had happened.
This fixes it up.
This is very half-baked at the moment and very inefficient, e.g.
inappropriate use of by-value `self` (and thus being forced into an
overuse of `clone`). People get the wrong impression about Rust when
using it, e.g. that Rust cannot express what other languages can because
the implementation is inefficient: https://news.ycombinator.com/item?id=8187831 .