This python script will consume an appropriately formatted JSON file and
output either a Rust file for use in librustc_platform_intrinsics, or an
extern block for importing the intrinsics in an external library.
The --help flag has details.
This commit removes all unstable and deprecated functions in the standard
library. A release was recently cut (1.3) which makes this a good time for some
spring cleaning of the deprecated functions.
Completely rewrite the conversion of decimal strings to `f64` and `f32`. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding.
This code implements the algorithms from the paper [How to Read Floating Point Numbers Accurately][paper] by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations.
# Correctness
With such a large amount of tricky code, many bugs are to be expected. Indeed tracking down the obscure causes of various rounding errors accounts for the bulk of the development time. Extensive tests (taking in the order of hours to run through to completion) are included in `src/etc/test-float-parse`: Though exhaustively testing all possible inputs is impossible, I've had good success with generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors so contributors who touch this code need the discipline to run them. There are `#[test]`s, but they don't even cover every stupid mistake I made in course of writing this.
Another aspect is *integer* overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) as well as the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will *accidentally* cause this code to go out of range. Still, no guarantees.
# Limitations
Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., `digits * 10^abs(exp)` can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is `number_of_digits + abs(exp) > 375` but could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely.
This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions that the old code (mostly) handled, such as `0.333...333` with 400+ `3`s. Thus this might qualify as [breaking-change].
# Performance
Benchmarks results are... tolerable. Short numbers that hit the fast paths (`f64` multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code tens of nanoseconds. Numbers that are delegated to Algorithm Bellerophon (using floats with 64 bit significand, implemented in software) are slower, but not drastically so (couple hundred nanoseconds).
Numbers that need the AlgorithmM fallback (for `f64`, roughly everything below 1e-305 and above 1e305) take far, far longer, hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four division instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow.
All benchmarks run on a mediocre laptop with a i5-4200U CPU under light load.
# Binary size
Unfortunately the implementation needs to duplicate almost all code: Once for `f32` and once for `f64`. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes.
Running a stage1 `rustc` over a stand-alone program that simply parses pi to `f32` and `f64` and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised.
| rustc options | old | new | delta |
|--------------------------- |--------- |--------- |----------- |
| [nothing] | 2588375 | 2633828 | 44.39 KiB |
| -O | 2585211 | 2630688 | 44.41 KiB |
| -O -C lto | 1026353 | 1054981 | 27.96 KiB |
| -O -C lto -C link-args=-s | 414208 | 442368 | 27.5 KiB |
# Future Work
## Directory layout
The `dec2flt` code uses some types embedded deeply in the `flt2dec` module hierarchy, even though nothing about them it formatting-specific. They should be moved to a more conversion-direction-agnostic location at some point.
## Performance
It could be much better, especially for large inputs. Some low-hanging fruit has been picked but much more work could be done. Some specific ideas are jotted down in `FIXME`s all over the code.
## Binary size
One could try to compress the table further, though I am skeptical. Another avenue would be reducing the code duplication from basically everything being generic over `T: RawFloat`. Perhaps one can reduce the magnitude of the duplication by pushing the parts that don't need to know the target type into separate functions, but this is finicky and probably makes some code read less naturally.
## Other bases
This PR leaves `f{32,64}::from_str_radix` alone. It only replaces `FromStr` (and thus `.parse()`). I am convinced that `from_str_radix` should not exist, and have proposed its [deprecation and speedy removal][deprecate-radix]. Whatever the outcome of that discussion, it is independent from, and out of scope for, this PR.
Fixes#24557Fixes#14353
r? @pnkfelix
cc @lifthrasiir @huonw
[paper]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152
[deprecate-radix]: https://internals.rust-lang.org/t/deprecate-f-32-64-from-str-radix/2405
This commit removes all unstable and deprecated functions in the standard
library. A release was recently cut (1.3) which makes this a good time for some
spring cleaning of the deprecated functions.
This commit leverages the runtime support for DWARF exception info added
in #27210 to enable unwinding by default on 64-bit MSVC. This also additionally
adds a few minor fixes here and there in the test harness and such to get
`make check` entirely passing on 64-bit MSVC:
* The invocation of `maketest.py` now works with spaces/quotes in CC
* debuginfo tests are disabled on MSVC
* A link error for librustc was hacked around (see #27438)
As there’s no C++ runtime any more there’s really no point in having
anything but Rust tags being made.
I’ve also taken the liberty of excluding the compiler parts of this in
the `librust%,,` pattern substitution. Whether or not this is “correct”
will depend on whether you want tags for the compiler or for general
use. For myself, I want it for general use.
I’m not sure how much people use the tags files anyway. I definitely do,
but with Racer existing the tags files aren’t quite so necessary.
I added it because it was easy (same a `char::to_lowercase`,
just a different table), but it doesn’t make sense to have this
in std but not str::to_titlecase, which would require
https://github.com/unicode-rs/unicode-segmentation
At some point in the future this feature will be available
(both on char and str) in a crates.io crate.
GDB and LLDB pretty printers have some common functionality and also access some common information, such as the layout of standard library types. So far, this information has been duplicated in the two pretty printing python modules. This PR introduces a common module used by both debuggers.
This PR also implements proper rendering of `String` and `&str` values in LLDB.
GDB and LLDB pretty printers have some common functionality
and also access some common information, such as the layout of
standard library types. So far, this information has been
duplicated in the two pretty printing python modules. This
commit introduces a common module used by both debuggers.
Windows needs explicit exports of functions from DLLs but LLVM does not mention
any of its symbols as being export-able from a DLL. The compiler, however,
relies on being able to use LLVM symbols across DLL boundaries so we need to
force many of LLVM's symbols to be exported from `rustc_llvm.dll`. This commit
adds support for generation of a `rustc_llvm.def` file which is passed along to
the linker when generating `rustc_llvm.dll` which should keep all these symbols
exportable and usable.
These libraries don't exist! The linker for MSVC is highly likely to not pass
`/NODEFAULTLIB` in which case the right standard library will automatically be
selected.
This script used to be used to extract the grammar sections from the
reference, but there is now a separate src/doc/grammar.md where the
grammar sections that used to be in the reference live, so there is
no longer a need to extract the grammar from the reference.
Apply optimization described in
https://github.com/rust-lang/regex/pull/73#issuecomment-93777126
to rust's copy of `unicode.py`.
This shrinks librustc_unicode's tables.rs from 479kB to 456kB,
and should improve performance slightly for related operations
(e.g., is_alphabetic(), is_xid_start(), etc).
In addition, pull in fix from @dscorbett's commit
d25c39f86568a147f9b7080c25711fb1f98f056a in regex, which
makes `load_properties()` more tolerant of whitespace
in the Unicode tables. (This fix does not result in any
changes to tables.rs, but could if the Unicode tables
change in the future.)
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
librustc_driver and libsyntax. This may change pretty-printed
output from these modules in cases involving wide or combining
characters used in filenames, identifiers, etc.
The following functions are marked deprecated:
1. char.width() and str.width():
--> use unicode-width crate
2. str.graphemes() and str.grapheme_indices():
--> use unicode-segmentation crate
3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
char.compose(), char.decompose_canonical(), char.decompose_compatible(),
char.canonical_combining_class():
--> use unicode-normalization crate
This PR makes `rustc` emit field names for tuple fields in DWARF. Formerly there was no way of directly accessing the fields of a tuple in GDB and LLDB since there is no C/C++ equivalent to this. Now, the debugger sees the name `__{field-index}` for tuple fields. So you can type for example `some_tuple_val.__2` to get the third tuple component.
When pretty printers are used (e.g. via `rust-gdb` or `rust-lldb`) these artificial field names will not clutter tuple rendering (which was the main motivation for not doing this in the past).
Solves #21948.
This commit series starts out with more official test harness support for rustdoc tests, and then each commit afterwards adds a test (where appropriate). Each commit should also test and finish independently of all others (they're all pretty separable).
I've uploaded a [copy of the documentation](http://people.mozilla.org/~acrichton/doc/std/) generated after all these commits were applied, and a double check on issues being closed would be greatly appreciated! I'll also browse the docs a bit and make sure nothing regressed too horribly.
The idea here is if you don't want rust in /usr/local
you can put something like this is your .profile:
```
export RUSTUP_PREFIX=$HOME/.local/rust
export PATH=$PATH:${RUSTUP_PREFIX}/bin
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:${RUSTUP_PREFIX}/lib
```
Then when you run rustup, it will update the install
in ${RUSTUP_PREFIX} without having to remember to pass
an explicit --prefix argument every time.
The idea here is if you don't want rust in /usr/local
you can put something like this is your .profile:
export RUSTUP_PREFIX=$HOME/.local/rust
export PATH=$PATH:${RUSTUP_PREFIX}/bin
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:${RUSTUP_PREFIX}/lib
Then when you run rustup, it will update the install
in ${RUSTUP_PREFIX} without having to remember to pass
an explicit --prefix argument every time.
@mahkoh points out in #15628 that unicode.py does not use
normative data for Grapheme classes. This pr fixes that issue.
In addition, GC_RegionalIndicator is renamed GC_Regional_Indicator
in order to stay in line with the Unicode class name definitions.
I have updated refs in u_str.rs, and verified that there are no
refs elsewhere in the codebase. However, in principle someone
using the unicode tables for their own purposes might see breakage
from this.
Rationale for this, is that I lurked `ulimit -c unlimited` into my .profile to debug an unrelated crash, that I kept forgetting to set before hand. I then ran the test suite and discovered that I had 150 gigs of core dumps in `/cores`.
Very open to another approach, or to setting the limit to something higher than 0, but I think it would be nice if the build system tried to save you from yourself here.
Currently, the list of files linted in `tidy.py` is unordered. It seems more appropriate for more frequently appearing files (like `.rs`) to appear at the top of the list and for \"other files\" to appear at the very end. This PR also changes the wildcard import of `check_license()` into an explicit one.
```
Before: After:
* linted 4 .sh files * linted 5034 .rs files
* linted 4 .h files * linted 29 .c files
* linted 29 .c files * linted 28 .py files
* linted 2 .js files * linted 4 .sh files
* linted 0 other files * linted 4 .h files
* linted 28 .py files * linted 2 .js files
* linted 5034 .rs files * linted 0 other files
```
r? @brson
This changes the type of some public constants/statics in libunicode.
Notably some `&'static &'static [(char, char)]` have changed
to `&'static [(char, char)]`. The regexp crate seems to be the
sole user of these, yet this is technically a [breaking-change]
- Replace wildcard import with explicit import of `check_license`
- Move more logic outside of the `try` block.
- Group all helper functions together.
- Define `interesting_exts` and `uninteresting_files` at start of file
(with the rest of the constant declarations).
Since it makes more sense for .rs files to appear at the top of the
list of linted files and "other" files to appear at the end, this
commit moves the "other" count outside of the `file_counts` dictionary
and sorts the remaining "interesting" files by decreasing frequency.
- Now "make check-stage2-T-aarch64-linux-android-H-x86_64-unknown-linux-gnu" works (#21773)
- Fix & enable debuginfo tests for android (#10381)
- Fix & enable more tests for android (both for arm/aarch64)
- Enable many already-pass tests on android (both for arm/aarch64)
Since `tr` converts lowercase to uppercase according to system locale using `LC_CTYPE` environment variable; on some locales, rustup.sh fails to use correct variables names, thus deletes temporarily downloaded files and gives a meaningless error as shown below. This a simple fix which explictly sets `LC_CTYPE` as `C`.
Here is what happens without the fix:
```
➜ projects curl -s https://static.rust-lang.org/rustup.sh | sudo sh
rustup: CFG_CURL := /usr/bin/curl (7.22.0)
rustup: CFG_TAR := /bin/tar (1.26)
rustup: CFG_FILE := /usr/bin/file (5.09)
rustup: CFG_SHA256SUM := /usr/bin/sha256sum (256sum)
rustup: CFG_SHASUM := /usr/bin/shasum (5.61)
rustup:
rustup: processing sh args
rustup:
rustup: CFG_PREFiX :=
rustup: CFG_DATE :=
rustup:
rustup: validating sh args
rustup:
rustup: host triple: i686-unknown-linux-gnu
rustup: Downloading https://static.rust-lang.org/dist/rust-nightly-i686-unknown-linux-gnu.tar.gz to /tmp/tmp.Wz6F1ToG5z/rust-nightly-i686-unknown-linux-gnu.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 132M 100 132M 0 0 59947 0 0:38:31 0:38:31 --:--:-- 71204
rustup: Downloading https://static.rust-lang.org/dist/rust-nightly-i686-unknown-linux-gnu.tar.gz.sha256
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 109 100 109 0 0 107 0 0:00:01 0:00:01 --:--:-- 169
rustup: Verifying hash
rustup: Extracting /tmp/tmp.Wz6F1ToG5z/rust-nightly-i686-unknown-linux-gnu.tar.gz
install: looking for install programs
install:
install: found mkdir
install: found printf
install: found cut
install: found grep
install: found uname
install: found tr
install: found sed
install: found chmod
install:
install: processing /tmp/tmp.Wz6F1ToG5z/rust-nightly-i686-unknown-linux-gnu/install.sh args
install:
install: CFG_DESTDiR :=
install: CFG_PREFiX := /usr/local
install: CFG_LiBDiR := /lib
install: CFG_MANDiR := /share/man
install:
install: validating /tmp/tmp.Wz6F1ToG5z/rust-nightly-i686-unknown-linux-gnu/install.sh args
install:
install: verifying platform can run binaries
install: verifying destination is writable
mkdir: cannot create directory `': No such file or directory
install: error: can't write to destination. consider `sudo`.
rustup: error: failed to install Rust
```
Notice how `i` wasn't replaced with `I`.
Rust is installed as usual after the fix. Tested on Ubuntu x86 12.04 LTS.
I'm not exactly sure if setting LC_CTYPE is the best solution, but there's that.
The book in "hello-world" tells that there are configs for some programs and gives a link to main repo's src/etc. Actually, these configs moved to separate repos some days ago. This PR adds a markdown file with links and moves "hello-world" link about editors to point directly to this new file.
Since `tr` converts lowercase to uppercase according to system locale using `LC_CTYPE` environment variable; on some locales, rustup.sh fails to use correct variables names, thus deletes temporarily downloaded files and gives a meaningless error as shown below. This a simple fix which explictly sets `LC_CTYPE` as `C`.
This restructures tidy.py to walk the tree itself,
and improves performance considerably by not loading entire
files into buffers for licenseck.
Splits build rules into 'tidy', 'tidy-basic', 'tidy-binaries',
'tidy-errors', 'tidy-features'.
Use the crates.io crate `rand` (version 0.1 should be a drop in
replacement for `std::rand`) and `rand_macros` (`#[derive_Rand]` should
be a drop-in replacement).
[breaking-change]
This is an implementation of [RFC 578][rfc] which adds a new `std::env` module
to replace most of the functionality in the current `std::os` module. More
details can be found in the RFC itself, but as a summary the following methods
have all been deprecated:
[rfc]: https://github.com/rust-lang/rfcs/pull/578
* `os::args_as_bytes` => `env::args`
* `os::args` => `env::args`
* `os::consts` => `env::consts`
* `os::dll_filename` => no replacement, use `env::consts` directly
* `os::page_size` => `env::page_size`
* `os::make_absolute` => use `env::current_dir` + `join` instead
* `os::getcwd` => `env::current_dir`
* `os::change_dir` => `env::set_current_dir`
* `os::homedir` => `env::home_dir`
* `os::tmpdir` => `env::temp_dir`
* `os::join_paths` => `env::join_paths`
* `os::split_paths` => `env::split_paths`
* `os::self_exe_name` => `env::current_exe`
* `os::self_exe_path` => use `env::current_exe` + `pop`
* `os::set_exit_status` => `env::set_exit_status`
* `os::get_exit_status` => `env::get_exit_status`
* `os::env` => `env::vars`
* `os::env_as_bytes` => `env::vars`
* `os::getenv` => `env::var` or `env::var_string`
* `os::getenv_as_bytes` => `env::var`
* `os::setenv` => `env::set_var`
* `os::unsetenv` => `env::remove_var`
Many function signatures have also been tweaked for various purposes, but the
main changes were:
* `Vec`-returning APIs now all return iterators instead
* All APIs are now centered around `OsString` instead of `Vec<u8>` or `String`.
There is currently on convenience API, `env::var_string`, which can be used to
get the value of an environment variable as a unicode `String`.
All old APIs are `#[deprecated]` in-place and will remain for some time to allow
for migrations. The semantics of the APIs have been tweaked slightly with regard
to dealing with invalid unicode (panic instead of replacement).
The new `std::env` module is all contained within the `env` feature, so crates
must add the following to access the new APIs:
#![feature(env)]
[breaking-change]
While waiting on some builds I started cleaning up the various python bits and pieces.
I'm going to keep poking, want to ping me before the next rollup?
when saving .rs files under vim
do not fail to run the syntax checker
error: Unrecognized option: 'parse-only'.
due to this commit
953f294ea3
which removed the deprecated flag --parse-only
the syntax of 'case' is:
`case word in [[(] pattern [| pattern] ...) list ;; ] ... esac`
`list` don't have to issue `break`. `break` is normally used to exit a
`for`, `until` or `while` loop.
when saving .rs files under vim
do not fail to run the syntax checker
error: Unrecognized option: 'parse-only'.
due to this commit
953f294ea3
which removed the deprecated flag --parse-only
This does the bare minimum to make registration of error codes work again. After this patch, every call to `span_err!` with an error code gets that error code validated against a list in that crate and a new tidy script `errorck.py` validates that no error codes are duplicated globally.
There are further improvements to be made yet, detailed in #19624.
r? @nikomatsakis
The timezone of the server that builds rustc nightlies
appears to be in UTC. From what I can tell, it builds
the nightlies at 0300 UTC, which takes about ~15 minutes.
So, there were a couple options here for which offset to use.
UTC+3 would ensure that you always got the latest rust, but it
would mean the script is broken ~15/20 minutes a day. UTC+4
means users get a stale nightly for 45 minutes, but that feels
okay to me.
Ideally, buildbot would publish the "current" nightly, but
that seems unneeded relative to fixing it for all Timezones
quickly.
Fixes#21270
I added an option to auto-indent method chains to line up along their '.' operators. Like so:
```
let input = io::stdin().readline()
.ok()
.expect("Failed to read line");
```
The old default would indent like so:
```
let input = io::stdin().readme()
.ok()
.expect("Failed to read line");
```
The Rust guide explicitly condones the former, so I thought it would be nice for the emacs mode to support it. It's off by default, you have to set ```rust-indent-method-chain``` to ```t``` via your .emacs or the customize menu
There's only one build-critical path in which perl is used, and it was to do a text replacement trivially achievable with sed(1).
I ported the indenter script because it [appears to be used][indenter], but removed check links because it appears to be entirely out of date.
[indenter]: https://github.com/rust-lang/rust/blob/master/src/librustc/util/common.rs#L60-70
`x in y` is more Pythonic than `y.find(x) != -1`. I believe it runs quite a bit faster as well (though it's probably not a bottleneck of the Travis builds):
```bash
$ python -m timeit '"abc".find("a") != -1'
1000000 loops, best of 3: 0.218 usec per loop
$ python -m timeit '"a" in "abc"'
10000000 loops, best of 3: 0.0343 usec per loop
```
rust.nanorc provides syntax highlighting for Rust. An attempt has been made to make the syntax highlighting look good on both dark and light terminals. Issue #21286.
This PR is dedicated to @substars and nano-lovers everywhere.