- `width()` computes the displayed width of a string, ignoring the width of control characters.
- arguably we might do *something* else for control characters, but the question is, what?
- users who want to do something else can iterate over chars()
- `graphemes()` returns a `Graphemes` struct, which implements an iterator over the grapheme clusters of a &str.
- fully compliant with [UAX#29](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries)
- passes all [Unicode-supplied tests](http://www.unicode.org/reports/tr41/tr41-15.html#Tests29)
- added code to generate additionial categories in `unicode.py`
- `Cn` aka `Not_Assigned`
- categories necessary for grapheme cluster breaking
- tidied up the exports from libunicode
- all exports are exposed through a module rather than directly at crate root.
- std::prelude imports UnicodeChar and UnicodeStrSlice from std::char and std::str rather than directly from libunicode
closes#7043
- Graphemes and GraphemeIndices structs implement iterators over
grapheme clusters analogous to the Chars and CharOffsets for chars in
a string. Iterator and DoubleEndedIterator are available for both.
- tidied up the exports for libunicode. crate root exports are now moved
into more appropriate module locations:
- UnicodeStrSlice, Words, Graphemes, GraphemeIndices are in str module
- UnicodeChar exported from char instead of crate root
- canonical_combining_class is exported from str rather than crate root
Since libunicode's exports have changed, programs that previously relied
on the old export locations will need to change their `use` statements
to reflect the new ones. See above for more information on where the new
exports live.
closes#7043
[breaking-change]
This PR is the outcome of the library stabilization meeting for the
`liballoc::owned` and `libcore::cell` modules.
Aside from the stability attributes, there are a few breaking changes:
* The `owned` modules is now named `boxed`, to better represent its
contents. (`box` was unavailable, since it's a keyword.) This will
help avoid the misconception that `Box` plays a special role wrt
ownership.
* The `AnyOwnExt` extension trait is renamed to `BoxAny`, and its `move`
method is renamed to `downcast`, in both cases to improve clarity.
* The recently-added `AnySendOwnExt` extension trait is removed; it was
not being used and is unnecessary.
[breaking-change]
Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
- Unicode-aware functions now live in libunicode
- `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
`is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
`to_uppercase`, `to_lowercase`
- added `width` method in UnicodeChar trait
- determines printed width of character in columns, or None if it is a non-NULL control character
- takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
- functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
- `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
- also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
- thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`
The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
- Unicode-aware functions now live in libunicode
- is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
is_uppercase, is_whitespace, is_alphanumeric, is_control,
is_digit, to_uppercase, to_lowercase
- added width method in UnicodeChar trait
- determines printed width of character in columns, or None if it is
a non-NULL control character
- takes a boolean argument indicating whether the present context is
CJK or not (characters with 'A'mbiguous widths are double-wide in
CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
(libunicode)
- functionality formerly in StrSlice that relied upon Unicode
functionality from Char is now in UnicodeStrSlice
- words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
- also moved Words type alias into libunicode because words method is
in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
combining the libunicode functionality with the Char functionality
from libcore
- thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str
The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]
Earlier commits have established a baseline of `experimental` stability
for all crates under the facade (so their contents are considered
experimental within libstd). Since `experimental` is `allow` by
default, we should use the same baseline stability for libstd itself.
This commit adds `experimental` tags to all of the modules defined in
`std`, and `unstable` to `std` itself.
We use re-exported pathes (e.g. std::io::Command) and original ones
(e.g. std::io::process::Command) together in examples now. Using
re-exported ones consistently avoids confusion.
Signed-off-by: OGINO Masanori <masanori.ogino@gmail.com>
This allows llvm to optimize away much of the overhead from using
the MemReader/MemWriters. My benchmarks showed it to shave 15% off
of my in progress serialization/json encoding.
Replace its usage with byte string literals, except in `bytes!()` tests.
Also add a new snapshot, to be able to use the new b"foo" syntax.
The src/etc/2014-06-rewrite-bytes-macros.py script automatically
rewrites `bytes!()` invocations into byte string literals.
Pass it filenames as arguments to generate a diff that you can inspect,
or `--apply` followed by filenames to apply the changes in place.
Diffs can be piped into `tip` or `pygmentize -l diff` for coloring.
This is part of the ongoing renaming of the equality traits. See #12517 for more
details. All code using Eq/Ord will temporarily need to move to Partial{Eq,Ord}
or the Total{Eq,Ord} traits. The Total traits will soon be renamed to {Eq,Ord}.
cc #12517
[breaking-change]
The span on a inner doc-comment would point to the next token, e.g. the span for the `a` line points to the `b` line, and the span of `b` points to the `fn`.
```rust
//! a
//! b
fn bar() {}
```
1. Wherever the `buf` field of a `Formatter` was used, the `Formatter` is used
instead.
2. The usage of `write_fmt` is minimized as much as possible, the `write!` macro
is preferred wherever possible.
3. Usage of `fmt::write` is minimized, favoring the `write!` macro instead.
This new method, write_fmt(), is the one way to write a formatted list of
arguments into a Writer stream. This has a special adaptor to preserve errors
which occur on the writer.
All macros will be updated to use this method explicitly.
The existing APIs for spawning processes took strings for the command
and arguments, but the underlying system may not impose utf8 encoding,
so this is overly limiting.
The assumption we actually want to make is just that the command and
arguments are viewable as [u8] slices with no interior NULLs, i.e., as
CStrings. The ToCStr trait is a handy bound for types that meet this
requirement (such as &str and Path).
However, since the commands and arguments are often a mixture of
strings and paths, it would be inconvenient to take a slice with a
single T: ToCStr bound. So this patch revamps the process creation API
to instead use a builder-style interface, called `Command`, allowing
arguments to be added one at a time with differing ToCStr
implementations for each.
The initial cut of the builder API has some drawbacks that can be
addressed once issue #13851 (libstd as a facade) is closed. These are
detailed as FIXMEs.
Closes#11650.
[breaking-change]
I feel that this is a very vital, missing piece of functionality. This adds on to #13072.
Only bits used in the definition of the bitflag are considered for the universe set. This is a bit safer than simply inverting all of the bits in the wrapped value.
```rust
bitflags!(flags Flags: u32 {
FlagA = 0x00000001,
FlagB = 0x00000010,
FlagC = 0x00000100,
FlagABC = FlagA.bits
| FlagB.bits
| FlagC.bits
})
...
// `Not` implements set complement
assert!(!(FlagB | FlagC) == FlagA);
// `all` and `is_all` are the inverses of `empty` and `is_empty`
assert!(Flags::all() - FlagA == !FlagA);
assert!(FlagABC.is_all());
```
Reader.read_at_least() ensures that at least a given number of bytes
have been read. The most common use-case for this is ensuring at least 1
byte has been read. If the reader returns 0 enough times in a row, a new
error kind NoProgress will be returned instead of looping infinitely.
This change is necessary in order to properly support Readers that
repeatedly return 0, either because they're broken, or because they're
attempting to do a non-blocking read on some resource that never becomes
available.
Also add .push() and .push_at_least() methods. push() is like read() but
the results are appended to the passed Vec.
Remove Reader.fill() and Reader.push_exact() as they end up being thin
wrappers around read_at_least() and push_at_least().
[breaking-change]
Been meaning to try my hand at something like this for a while, and noticed something similar mentioned as part of #13537. The suggestion on the original ticket is to use `TcpStream::open(&str)` to pass in a host + port string, but seems a little cleaner to pass in host and port separately -- so a signature like `TcpStream::open(&str, u16)`.
Also means we can use std::io::net::addrinfo directly instead of using e.g. liburl to parse the host+port pair from a string.
One outstanding issue in this PR that I'm not entirely sure how to address: in open_timeout, the timeout_ms will apply for every A record we find associated with a hostname -- probably not the intended behavior, but I didn't want to waste my time on elaborate alternatives until the general idea was a-OKed. :)
Anyway, perhaps there are other reasons for us to prefer the original proposed syntax, but thought I'd get some thoughts on this. Maybe there are some solid reasons to prefer using liburl to do this stuff.
Prior to this commit, TcpStream::connect and TcpListener::bind took a
single SocketAddr argument. This worked well enough, but the API felt a
little too "low level" for most simple use cases.
A great example is connecting to rust-lang.org on port 80. Rust users would
need to:
1. resolve the IP address of rust-lang.org using
io::net::addrinfo::get_host_addresses.
2. check for errors
3. if all went well, use the returned IP address and the port number
to construct a SocketAddr
4. pass this SocketAddr to TcpStream::connect.
I'm modifying the type signature of TcpStream::connect and
TcpListener::bind so that the API is a little easier to use.
TcpStream::connect now accepts two arguments: a string describing the
host/IP of the host we wish to connect to, and a u16 representing the
remote port number.
Similarly, TcpListener::bind has been modified to take two arguments:
a string describing the local interface address (e.g. "0.0.0.0" or
"127.0.0.1") and a u16 port number.
Here's how to port your Rust code to use the new TcpStream::connect API:
// old ::connect API
let addr = SocketAddr{ip: Ipv4Addr{127, 0, 0, 1}, port: 8080};
let stream = TcpStream::connect(addr).unwrap()
// new ::connect API (minimal change)
let addr = SocketAddr{ip: Ipv4Addr{127, 0, 0, 1}, port: 8080};
let stream = TcpStream::connect(addr.ip.to_str(), addr.port()).unwrap()
// new ::connect API (more compact)
let stream = TcpStream::connect("127.0.0.1", 8080).unwrap()
// new ::connect API (hostname)
let stream = TcpStream::connect("rust-lang.org", 80)
Similarly, for TcpListener::bind:
// old ::bind API
let addr = SocketAddr{ip: Ipv4Addr{0, 0, 0, 0}, port: 8080};
let mut acceptor = TcpListener::bind(addr).listen();
// new ::bind API (minimal change)
let addr = SocketAddr{ip: Ipv4Addr{0, 0, 0, 0}, port: 8080};
let mut acceptor = TcpListener::bind(addr.ip.to_str(), addr.port()).listen()
// new ::bind API (more compact)
let mut acceptor = TcpListener::bind("0.0.0.0", 8080).listen()
[breaking-change]
Closes#14163 (Fix typos in rustc manpage)
Closes#14161 (Add the patch number to version strings. Closes#13289)
Closes#14156 (rustdoc: Fix hiding implementations of traits)
Closes#14152 (add shebang to scripts that have execute bit set)
Closes#14150 (libcore: remove fails from slice.rs and remove duplicated length checking)
Closes#14147 (Make ProcessOutput Eq, TotalEq, Clone)
Closes#14142 (doc: updates rust manual (loop to continue))
Closes#14141 (doc: Update the linkage documentation)
Closes#14139 (Remove an unnecessary .move_iter().collect())
Closes#14136 (Two minor fixes in parser.rs)
Closes#14130 (Fixed typo in comments of driver.rs)
Closes#14128 (Add `stat` method to `std::io::fs::File` to stat without a Path.)
Closes#14114 (rustdoc: List macros in the sidebar)
Closes#14113 (shootout-nbody improvement)
Closes#14112 (Improved example code in Option)
Closes#14104 (Remove reference to MutexArc)
Closes#14087 (emacs: highlight `macro_name!` in macro invocations using [] delimiters)
The `FileStat` struct contained a `path` field, which was filled by the
`stat` and `lstat` function. Since this field isn't in fact returned by
the operating system (it was copied from the paths passed to the
functions) it was removed, as in the `fstat` case we aren't working with
a `Path`, but directly with a fd.
If your code used the `path` field of `FileStat` you will now have to
manually store the path passed to `stat` along with the returned struct.
[breaking-change]
This commit revisits the `cast` module in libcore and libstd, and scrutinizes
all functions inside of it. The result was to remove the `cast` module entirely,
folding all functionality into the `mem` module. Specifically, this is the fate
of each function in the `cast` module.
* transmute - This function was moved to `mem`, but it is now marked as
#[unstable]. This is due to planned changes to the `transmute`
function and how it can be invoked (see the #[unstable] comment).
For more information, see RFC 5 and #12898
* transmute_copy - This function was moved to `mem`, with clarification that is
is not an error to invoke it with T/U that are different
sizes, but rather that it is strongly discouraged. This
function is now #[stable]
* forget - This function was moved to `mem` and marked #[stable]
* bump_box_refcount - This function was removed due to the deprecation of
managed boxes as well as its questionable utility.
* transmute_mut - This function was previously deprecated, and removed as part
of this commit.
* transmute_mut_unsafe - This function doesn't serve much of a purpose when it
can be achieved with an `as` in safe code, so it was
removed.
* transmute_lifetime - This function was removed because it is likely a strong
indication that code is incorrect in the first place.
* transmute_mut_lifetime - This function was removed for the same reasons as
`transmute_lifetime`
* copy_lifetime - This function was moved to `mem`, but it is marked
`#[unstable]` now due to the likelihood of being removed in
the future if it is found to not be very useful.
* copy_mut_lifetime - This function was also moved to `mem`, but had the same
treatment as `copy_lifetime`.
* copy_lifetime_vec - This function was removed because it is not used today,
and its existence is not necessary with DST
(copy_lifetime will suffice).
In summary, the cast module was stripped down to these functions, and then the
functions were moved to the `mem` module.
transmute - #[unstable]
transmute_copy - #[stable]
forget - #[stable]
copy_lifetime - #[unstable]
copy_mut_lifetime - #[unstable]
[breaking-change]
This is the last remaining networkig object to implement timeouts for. This
takes advantage of the CancelIo function and the already existing asynchronous
I/O functionality of pipes.
These timeouts all follow the same pattern as established by the timeouts on
acceptors. There are three methods: set_timeout, set_read_timeout, and
set_write_timeout. Each of these sets a point in the future after which
operations will time out.
Timeouts with cloned objects are a little trickier. Each object is viewed as
having its own timeout, unaffected by other objects' timeouts. Additionally,
timeouts do not propagate when a stream is cloned or when a cloned stream has
its timeouts modified.
This commit is just the public interface which will be exposed for timeouts, the
implementation will come in later commits.
This moves as much allocation as possible from teh std::str module into
core::str. This includes essentially all non-allocating functionality, mostly
iterators and slicing and such.
This primarily splits the Str trait into only having the as_slice() method,
adding a new StrAllocating trait to std::str which contains the relevant new
allocation methods. This is a breaking change if any of the methods of "trait
Str" were overriden. The old functionality can be restored by implementing both
the Str and StrAllocating traits.
[breaking-change]
for `~str`/`~[]`.
Note that `~self` still remains, since I forgot to add support for
`Box<self>` before the snapshot.
How to update your code:
* Instead of `~EXPR`, you should write `box EXPR`.
* Instead of `~TYPE`, you should write `Box<Type>`.
* Instead of `~PATTERN`, you should write `box PATTERN`.
[breaking-change]
This patch changes `std::io::FilePermissions` from an exposed `u32`
representation to a typesafe representation (that only allows valid
flag combinations) using the `std::bitflags`, thus ensuring a greater
degree of safety on the Rust side.
Despite the change to the type, most code should continue to work
as-is, sincde the new type provides bit operations in the style of C
flags. To get at the underlying integer representation, use the `bits`
method; to (unsafely) convert to `FilePermissions`, use
`FilePermissions::from_bits`.
Closes#6085.
[breaking-change]
This adds a `TcpStream::connect_timeout` function in order to assist opening
connections with a timeout (cc #13523). There isn't really much design space for
this specific operation (unlike timing out normal blocking reads/writes), so I
am fairly confident that this is the correct interface for this function.
The function is marked #[experimental] because it takes a u64 timeout argument,
and the u64 type is likely to change in the future.
This removes all resizability support for ~[T] vectors in preparation of DST.
The only growable vector remaining is Vec<T>. In summary, the following methods
from ~[T] and various functions were removed. Each method/function has an
equivalent on the Vec type in std::vec unless otherwise stated.
* slice::OwnedCloneableVector
* slice::OwnedEqVector
* slice::append
* slice::append_one
* slice::build (no replacement)
* slice::bytes::push_bytes
* slice::from_elem
* slice::from_fn
* slice::with_capacity
* ~[T].capacity()
* ~[T].clear()
* ~[T].dedup()
* ~[T].extend()
* ~[T].grow()
* ~[T].grow_fn()
* ~[T].grow_set()
* ~[T].insert()
* ~[T].pop()
* ~[T].push()
* ~[T].push_all()
* ~[T].push_all_move()
* ~[T].remove()
* ~[T].reserve()
* ~[T].reserve_additional()
* ~[T].reserve_exect()
* ~[T].retain()
* ~[T].set_len()
* ~[T].shift()
* ~[T].shrink_to_fit()
* ~[T].swap_remove()
* ~[T].truncate()
* ~[T].unshift()
* ~str.clear()
* ~str.set_len()
* ~str.truncate()
Note that no other API changes were made. Existing apis that took or returned
~[T] continue to do so.
[breaking-change]