- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
  - updated unicode.py in src/etc to generate aforementioned tables
  - generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str
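
The width rules listed above could be sketched roughly as follows. This is only an illustration in present-day Rust: the free function `width` is a hypothetical stand-in for the trait method, the handful of code points is a hand-picked sample rather than the generated tables, and returning `Some(0)` for NULL is an assumption of the sketch.

```rust
/// Hypothetical stand-in for the `width` method described above.
/// The code points below are a tiny sample, NOT the generated tables.
fn width(c: char, is_cjk: bool) -> Option<usize> {
    match c {
        // Assumption: NULL is the one control character that gets a width.
        '\0' => Some(0),
        // Every other control character has no printed width.
        c if c.is_control() => None,
        // Two characters with East Asian 'A'mbiguous width (sample only):
        // double-wide in a CJK context, single-wide otherwise.
        '\u{00A1}' | '\u{00A7}' => Some(if is_cjk { 2 } else { 1 }),
        // CJK Unified Ideographs are double-wide in either context.
        '\u{4E00}'..='\u{9FFF}' => Some(2),
        // Simplification: treat everything else as single-wide.
        _ => Some(1),
    }
}
```

Under these assumptions, `width('\u{00A1}', true)` gives `Some(2)`, `width('\u{00A1}', false)` gives `Some(1)`, and `width('\n', false)` gives `None`.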

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

Closes #15224
[breaking-change]
I ran `make check` and everything went smoothly. I also tested `#[deriving(Decodable, Encodable)]` on a struct containing both Cell<T> and RefCell<T> and everything now seems to work fine.
LLVM doesn't handle i1 values in allocas/memory very well and skips a number of optimizations when it encounters them. So we do the same thing that Clang does: use i1 for SSA values, but store i8 in memory.
Fixes #15203.
LLVM doesn't really like types with a bit-width that isn't a multiple of
8 and disables various optimizations if it encounters such types used
with loads/stores. OTOH, booleans must be represented as i1 when used as
SSA values. To get the best results, we must use i1 for SSA values, and
i8 when storing the value to memory.
By using range asserts on loads, LLVM can eliminate the required
zero-extend and truncate operations.
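
As a source-level analogy only (not the compiler change itself), the store/load convention can be pictured like this. The helper names `store_bool`, `load_bool`, and `bool_size_in_bytes` are hypothetical; the `debug_assert!` merely stands in for the 0..2 range information that the range assert gives LLVM.

```rust
/// Store a boolean to memory as a full byte (the "zero-extend" step).
fn store_bool(slot: &mut u8, value: bool) {
    *slot = value as u8;
}

/// Load it back. The byte is known to be 0 or 1, so narrowing it to a
/// boolean (the "truncate" step) loses no information.
fn load_bool(slot: &u8) -> bool {
    debug_assert!(*slot < 2, "stored booleans are always 0 or 1");
    *slot != 0
}

/// In memory, a Rust `bool` occupies exactly one byte.
fn bool_size_in_bytes() -> usize {
    std::mem::size_of::<bool>()
}
```

Because the only values ever stored are 0 and 1, the widening and narrowing steps carry no information, which is what makes them removable once the value range is known.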
Fixes #15203
`Vec::push_all` with a length 1 slice seems to have significant overhead compared to `Vec::push`.
```
test new_push_byte ... bench: 6985 ns/iter (+/- 487) = 17 MB/s
test old_push_byte ... bench: 19335 ns/iter (+/- 1368) = 6 MB/s
```
```rust
extern crate test;
use test::Bencher;
static TEXT: &'static str = "\
Unicode est un standard informatique qui permet des échanges \
de textes dans différentes langues, à un niveau mondial.";
#[bench]
fn old_push_byte(bencher: &mut Bencher) {
bencher.bytes = TEXT.len() as u64;
bencher.iter(|| {
let mut new = String::new();
for b in TEXT.bytes() {
unsafe { new.as_mut_vec().push_all([b]) }
}
})
}
#[bench]
fn new_push_byte(bencher: &mut Bencher) {
bencher.bytes = TEXT.len() as u64;
bencher.iter(|| {
let mut new = String::new();
for b in TEXT.bytes() {
unsafe { new.as_mut_vec().push(b) }
}
})
}
```
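
For readers on present-day Rust, where `push_all` no longer exists, the two call shapes being compared might look like the sketch below. It assumes `extend_from_slice` as the closest modern analogue of `push_all`; the function names are hypothetical, and no claim is made about current timings.

```rust
/// Build a byte vector one `push` at a time.
fn copy_with_push(text: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for b in text.bytes() {
        out.push(b);
    }
    out
}

/// Build the same vector by appending a length-1 slice per byte.
/// `extend_from_slice` is assumed here as the stand-in for `push_all`.
fn copy_with_slice(text: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for b in text.bytes() {
        out.extend_from_slice(&[b]);
    }
    out
}
```

Both functions produce identical contents; they differ only in which `Vec` method does the appending.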
This is an implementation of [RFC 35](https://github.com/rust-lang/rfcs/blob/master/active/0035-remove-crate-id.md).
The summary for this PR is the same as that of the RFC, with one addendum:
* Removes the `#[crate_id]` attribute and knowledge of versions from rustc.
* Added a `#[crate_name]` attribute similar to the old `#[crate_id]` attribute
* Output filenames no longer have versions or hashes
* Symbols no longer have versions (they still have hashes)
* A new flag, `--extern`, is used to override searching for external crates
* A new flag, `-C metadata=foo`, used when hashing symbols
* [added] An old flag, `--crate-name`, was repurposed to specify the crate name from the command line.
I tried to maintain backwards compatibility wherever possible (with warnings being printed). If I missed anywhere, however, please let me know!
[breaking-change]
Closes #14468
Closes #14469
Closes #14470
Closes #14471
See the commits for details; a number of these are breaking changes, although liburl is marked experimental, so I'm not sure that matters much.
First two commits will be impacted if #15138 is adopted, but it's a simple rename.
In a cargo-driven world the primary location for the name of a crate will be in
its manifest, not in the source file itself. The purpose of this flag is to
reduce required duplication for new cargo projects.
This is a breaking change because the existing --crate-name flag actually
printed the crate name. This flag was renamed to --print-crate-name, and to
maintain consistency, the --crate-file-name flag was renamed to
--print-file-name.
To maintain backwards compatibility, the --crate-file-name flag is still
recognized, but it is deprecated.
[breaking-change]
This commit implements a new flag, --extern, which is used to specify where a
crate is located. The purpose of this flag is to bypass the normal crate
loading/matching of the compiler to point it directly at the right file.
This flag takes the form `--extern foo=bar` where `foo` is the name of a crate
and `bar` is the location at which to find the crate. Multiple `--extern`
directives are allowed with the same crate name to specify the rlib/dylib pair
for a crate. It is invalid to specify more than one rlib or more than one dylib,
and it's required that the crates are valid Rust crates.
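
The shape of these rules can be sketched as a small validator. This is a hypothetical model, not the compiler's code: `parse_externs` and `ExternLocations` are invented names, classifying a location as rlib or dylib by file extension is an assumption of the sketch, and the check that the file is a valid crate is not modeled.

```rust
use std::collections::HashMap;

/// The rlib/dylib pair that may be given for one crate name.
#[derive(Debug, Default, PartialEq)]
struct ExternLocations {
    rlib: Option<String>,
    dylib: Option<String>,
}

/// Hypothetical parser for a list of `name=path` values.
/// Classifying by file extension is an assumption of this sketch.
fn parse_externs(args: &[&str]) -> Result<HashMap<String, ExternLocations>, String> {
    let mut map: HashMap<String, ExternLocations> = HashMap::new();
    for arg in args {
        let (name, path) = arg
            .split_once('=')
            .ok_or_else(|| format!("expected `name=path`, got `{}`", arg))?;
        let entry = map.entry(name.to_string()).or_default();
        // Pick the slot this path belongs to.
        let slot = if path.ends_with(".rlib") {
            &mut entry.rlib
        } else if path.ends_with(".so") || path.ends_with(".dylib") || path.ends_with(".dll") {
            &mut entry.dylib
        } else {
            return Err(format!("`{}` is neither an rlib nor a dylib", path));
        };
        // A second rlib (or a second dylib) for the same crate is rejected.
        if slot.is_some() {
            return Err(format!("duplicate library kind for crate `{}`", name));
        }
        *slot = Some(path.to_string());
    }
    Ok(map)
}
```

With this model, `["foo=libfoo.rlib", "foo=libfoo.so"]` is accepted as an rlib/dylib pair, while `["foo=a.rlib", "foo=b.rlib"]` is rejected.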
I have also added some extensive documentation to metadata::loader about how
crate loading should work.
RFC: 0035-remove-crate-id
The compiler will no longer insert a hash or version into a filename by default.
Instead, all output is simply based off the crate name being compiled. For
example, a crate name of `foo` would produce the following outputs:
* bin => foo
* rlib => libfoo.rlib
* dylib => libfoo.{so,dylib} or foo.dll
* staticlib => libfoo.a
The old behavior has been moved behind a new codegen flag,
`-C extra-filename=<hash>`. For example, with the "extra filename" of `bar` and
a crate name of `foo`, the following outputs would be generated:
* bin => foo (same old behavior)
* rlib => libfoobar.rlib
* dylib => libfoobar.{so,dylib} or foobar.dll
* staticlib => libfoobar.a
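
The two lists above follow one rule, which can be modeled as below. The function `output_filename` and both enums are hypothetical names for this sketch; `extra` plays the role of the `-C extra-filename` value and is empty when the flag is not passed. Mapping `.so` to Linux and `.dylib` to macOS is how the sketch reads the `{so,dylib}` shorthand.

```rust
/// Crate types listed in the text.
#[derive(Clone, Copy)]
enum CrateType { Bin, Rlib, Dylib, Staticlib }

/// Platforms distinguished by the dylib naming in the text.
#[derive(Clone, Copy)]
enum Platform { Linux, MacOs, Windows }

/// Hypothetical model of the output filename rules described above.
/// `extra` is the "extra filename" ("" when none is given).
fn output_filename(name: &str, extra: &str, kind: CrateType, platform: Platform) -> String {
    match (kind, platform) {
        // Binaries never receive the extra filename.
        (CrateType::Bin, _) => name.to_string(),
        (CrateType::Rlib, _) => format!("lib{}{}.rlib", name, extra),
        (CrateType::Staticlib, _) => format!("lib{}{}.a", name, extra),
        (CrateType::Dylib, Platform::Linux) => format!("lib{}{}.so", name, extra),
        (CrateType::Dylib, Platform::MacOs) => format!("lib{}{}.dylib", name, extra),
        // Windows dylibs carry no `lib` prefix.
        (CrateType::Dylib, Platform::Windows) => format!("{}{}.dll", name, extra),
    }
}
```

For example, `output_filename("foo", "bar", CrateType::Dylib, Platform::Windows)` yields `foobar.dll`, matching the second list.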
The makefiles have been altered to pass a hash by default to invocations of
`rustc` so all installed rust libraries will have a hash in their filename. This
is done because the standard libraries are intended to be installed into
privileged directories such as /usr/local. Additionally, it involves very few
build system changes!
RFC: 0035-remove-crate-id
[breaking-change]
This commit modifies crate loading to purely work off a `crate_name` and nothing
else. This commit also changes the patterns recognized from `lib<foo>-*` to
`lib<foo>*` to accommodate the future renamings of output files.
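
The difference between the two patterns can be shown with a pair of prefix checks. Both function names are hypothetical, and the sketch covers only the filename prefix, not how the loader goes on to choose among candidates.

```rust
/// Old rule: a candidate must look like `lib<name>-*`.
fn matches_old_pattern(file: &str, name: &str) -> bool {
    file.starts_with(&format!("lib{}-", name))
}

/// New rule: a candidate only needs to look like `lib<name>*`.
fn matches_new_pattern(file: &str, name: &str) -> bool {
    file.starts_with(&format!("lib{}", name))
}
```

A file named `libfoo.rlib` fails the old check for crate `foo` but passes the new one, and files with a dash-separated suffix pass both.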
RFC: 0035-remove-crate-id
This commit removes all support in the compiler for the #[crate_id] attribute
and all of its derivative infrastructure. A list of the functionality removed is:
* The #[crate_id] attribute no longer exists
* There is no longer the concept of a version of a crate
* Version numbers are no longer appended to symbol names
* The --crate-id command line option has been removed
To migrate forward, rename #[crate_id] to #[crate_name] and mention only the
name of the crate itself. The version/path of the old crate id should be
removed.
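
As a rough aid for that migration, the bare name could be extracted from an old id along these lines. The helper `crate_name_from_id` is hypothetical, and the sketch assumes old ids shaped like `name`, `name#version`, `path/name#version`, or `path#name:version`; ids outside those shapes are not handled.

```rust
/// Hypothetical helper: recover the bare crate name from an old crate id.
/// Assumes ids shaped like `name`, `name#version`, `path/name#version`,
/// or `path#name:version`.
fn crate_name_from_id(id: &str) -> &str {
    let (path, fragment) = match id.split_once('#') {
        Some((path, fragment)) => (path, Some(fragment)),
        None => (id, None),
    };
    // An explicit `name:version` fragment overrides the path.
    if let Some((name, _version)) = fragment.and_then(|f| f.split_once(':')) {
        if !name.is_empty() {
            return name;
        }
    }
    // Otherwise the name is the last component of the path.
    path.rsplit('/').next().unwrap_or(path)
}
```

Under these assumptions, an id of `github.com/user/foo#0.1` would become `#[crate_name = "foo"]`.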
As a transitional measure, the #[crate_id] attribute is still accepted if
#[crate_name] is not present, but a warning is emitted when it is the only
identifier present.
RFC: 0035-remove-crate-id
[breaking-change]
In my informal measurements, this brings the peak memory usage when
building librustc from 1662M down to 1502M. Since 1662 - 1502 = 160,
this may not recover the entirety of the observed memory regression
(250M) from PR #14604. (However, according to my local measurements,
the regression when building librustc was more like 209M, so perhaps
this will still recover the lion's share of the lost memory.)
The types `Bitv` and `BitvSet` are badly out of date. This PR:
- cleans up the code (primarily, simplifies `Bitv` and implements `BitvSet` in terms of `Bitv`)
- implements several new traits for `Bitv`
- adds new functionality to `Bitv` and `BitvSet`
- replaces internal iterators with external ones
- updates documentation
- minor bug fixes
This is a significantly souped-up version of PR #15139 and is the result of the discussion there.
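
To illustrate what replacing internal iterators with external ones means, here is a miniature example in present-day Rust. `TinyBitv` and `Bits` are hypothetical types written for this sketch and are not the real `Bitv` API; the point is only that the caller drives iteration through `next`, so standard iterator adaptors apply.

```rust
/// Hypothetical miniature bit vector, NOT the real `Bitv`.
struct TinyBitv {
    storage: Vec<u32>,
    nbits: usize,
}

impl TinyBitv {
    /// Pack a slice of booleans into 32-bit words.
    fn from_bools(bits: &[bool]) -> TinyBitv {
        let mut storage = vec![0u32; (bits.len() + 31) / 32];
        for (i, &bit) in bits.iter().enumerate() {
            if bit {
                storage[i / 32] |= 1u32 << (i % 32);
            }
        }
        TinyBitv { storage, nbits: bits.len() }
    }

    /// Read bit `i`; panics if `i` is out of range.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.nbits);
        (self.storage[i / 32] & (1u32 << (i % 32))) != 0
    }

    /// External iterator: the caller drives it with `next`.
    fn iter(&self) -> Bits<'_> {
        Bits { bitv: self, pos: 0 }
    }
}

/// Iterator state: a borrowed bit vector plus the next index to yield.
struct Bits<'a> {
    bitv: &'a TinyBitv,
    pos: usize,
}

impl<'a> Iterator for Bits<'a> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        if self.pos == self.bitv.nbits {
            return None;
        }
        let bit = self.bitv.get(self.pos);
        self.pos += 1;
        Some(bit)
    }
}
```

Because `Bits` implements `Iterator`, a call such as `bv.iter().filter(|&b| b).count()` counts the set bits without any dedicated method on the vector type.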