e7e4ecc522
Add fast path for ASCII in UTF-8 validation This speeds up the ASCII case (and long stretches of ASCII in otherwise mixed UTF-8 data) when checking UTF-8 validity. Benchmark results suggest that on purely ASCII input, we can improve throughput (megabytes verified / second) by a factor of 13 to 14 (smallish input). On XML and mostly English language input (en.wikipedia XML dump), throughput improves by a factor 7 (large input). On mostly non-ASCII input, performance increases slightly or is the same. The UTF-8 validation is rewritten to use indexed access; since all access is preceded by a (mandatory for validation) length check, bounds checks are statically elided by LLVM and this formulation is in fact the best for performance. A previous version had losses due to slice to iterator conversions. A large credit to Björn Steinbrink who improved this patch immensely, writing this second version. Benchmark results on x86-64 (Sandy Bridge) compiled with -C opt-level=3. Old code is `regular`, this PR is called `fast`. Datasets: - `ascii` is just ASCII (2.5 kB) - `cyr` is cyrillic script with ascii spaces (5 kB) - `dewik10` is 10MB of a de.wikipedia XML dump - `enwik8` is 100MB of an en.wikipedia XML dump - `jawik10` is 10MB of a ja.wikipedia XML dump ``` test from_utf8_ascii_fast ... bench: 140 ns/iter (+/- 4) = 18221 MB/s test from_utf8_ascii_regular ... bench: 1,932 ns/iter (+/- 19) = 1320 MB/s test from_utf8_cyr_fast ... bench: 10,025 ns/iter (+/- 245) = 511 MB/s test from_utf8_cyr_regular ... bench: 10,944 ns/iter (+/- 795) = 468 MB/s test from_utf8_dewik10_fast ... bench: 6,017,909 ns/iter (+/- 105,755) = 1740 MB/s test from_utf8_dewik10_regular ... bench: 11,669,493 ns/iter (+/- 264,045) = 891 MB/s test from_utf8_enwik8_fast ... bench: 14,085,692 ns/iter (+/- 1,643,316) = 7000 MB/s test from_utf8_enwik8_regular ... bench: 93,657,410 ns/iter (+/- 5,353,353) = 1000 MB/s test from_utf8_jawik10_fast ... bench: 29,154,073 ns/iter (+/- 4,659,534) = 340 MB/s test from_utf8_jawik10_regular ... bench: 29,112,917 ns/iter (+/- 2,475,123) = 340 MB/s ``` Co-authored-by: Björn Steinbrink <bsteinbr@gmail.com> |
||
---|---|---|
man | ||
mk | ||
src | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.mailmap | ||
.travis.yml | ||
COMPILER_TESTS.md | ||
configure | ||
CONTRIBUTING.md | ||
COPYRIGHT | ||
LICENSE-APACHE | ||
LICENSE-MIT | ||
Makefile.in | ||
README.md | ||
RELEASES.md |
The Rust Programming Language
This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.
Quick Start
Read "Installing Rust" from The Book.
Building from Source
-
Make sure you have installed the dependencies:
g++
4.7 orclang++
3.xpython
2.7 or later (but not 3.x)- GNU
make
3.81 or later curl
git
-
Clone the source with
git
:$ git clone https://github.com/rust-lang/rust.git $ cd rust
-
Build and install:
$ ./configure $ make && make install
Note: You may need to use
sudo make install
if you do not normally have permission to modify the destination directory. The install locations can be adjusted by passing a--prefix
argument toconfigure
. Various other options are also supported – pass--help
for more information on them.When complete,
make install
will place several programs into/usr/local/bin
:rustc
, the Rust compiler, andrustdoc
, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.
Building on Windows
There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.
MinGW
MSYS2 can be used to easily build Rust on Windows:
-
Grab the latest MSYS2 installer and go through the installer.
-
From the MSYS2 terminal, install the
mingw64
toolchain and other required tools.# Update package mirrors (may be needed if you have a fresh install of MSYS2) $ pacman -Sy pacman-mirrors
Download MinGW from
here, and choose the
threads=win32,exceptions=dwarf/seh
flavor when installing. After installing,
add its bin
directory to your PATH
. This is due to #28260, in the future,
installing from pacman should be just fine.
# Make git available in MSYS2 (if not already available on path)
$ pacman -S git
$ pacman -S base-devel
-
Run
mingw32_shell.bat
ormingw64_shell.bat
from wherever you installed MSYS2 (i.e.C:\msys
), depending on whether you want 32-bit or 64-bit Rust. -
Navigate to Rust's source code, configure and build it:
$ ./configure $ make && make install
MSVC
MSVC builds of Rust additionally require an installation of Visual Studio 2013
(or later) so rustc
can use its linker. Make sure to check the “C++ tools”
option. In addition, cmake
needs to be installed to build LLVM.
With these dependencies installed, the build takes two steps:
$ ./configure
$ make && make install
Building Documentation
If you’d like to build the documentation, it’s almost the same:
./configure
$ make docs
Building the documentation requires building the compiler, so the above details will apply. Once you have the compiler built, you can
$ make docs NO_REBUILD=1
To make sure you don’t re-build the compiler because you made a change to some documentation.
The generated documentation will appear in a top-level doc
directory,
created by the make
rule.
Notes
Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.
Snapshot binaries are currently built and tested on several platforms:
Platform \ Architecture | x86 | x86_64 |
---|---|---|
Windows (7, 8, Server 2008 R2) | ✓ | ✓ |
Linux (2.6.18 or later) | ✓ | ✓ |
OSX (10.7 Lion or later) | ✓ | ✓ |
You may find that other platforms work, but these are our officially supported build environments that are most likely to work.
Rust currently needs between 600MiB and 1.5GiB to build, depending on platform. If it hits swap, it will take a very long time to build.
There is more advice about hacking on Rust in CONTRIBUTING.md.
Getting Help
The Rust community congregates in a few places:
- Stack Overflow - Direct questions about using the language.
- users.rust-lang.org - General discussion and broader questions.
- /r/rust - News and general discussion.
Contributing
To contribute to Rust, please see CONTRIBUTING.
Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust, and a good place to ask for help.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.