Add small-copy optimization for copy_from_slice ## Summary During benchmarking, I found that one of my programs spent between 5 and 10 percent of the time doing memmoves. Ultimately I tracked these down to single-byte slices being copied with a memcopy. Doing a manual copy if the slice contains only one element can speed things up significantly. For my program, this reduced the running time by 20%. ## Background I am optimizing a program that relies heavily on reading a single byte at a time. To avoid IO overhead, I read all data into a vector once, and then I use a `Cursor` around that vector to read from. During profiling, I noticed that `__memmove_avx_unaligned_erms` was hot, taking up 7.3% of the running time. It turns out that these were caused by calls to `Cursor::read()`, which calls `<&[u8] as Read>::read()`, which calls `&[T]::copy_from_slice()`, which calls `ptr::copy_nonoverlapping()`. This one is implemented as a memcopy. Copying a single byte with a memcopy is very wasteful, because (at least on my platform) it involves calling `memcpy` in libc. This is an indirect call when libc is linked dynamically, and furthermore `memcpy` is optimized for copying large amounts of data at the cost of a bit of overhead for small copies. ## Benchmarks Before I made this change, `perf` reported the following for my program. I only included the relevant functions, and how they rank. (This is on a different machine than where I ran the original benchmarks. It has an older CPU, so `__memmove_sse2_unaligned_erms` is called instead of `__memmove_avx_unaligned_erms`.) ``` #3 5.47% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms #5 1.67% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5 #6 1.51% bench_decode bench_decode [.] memcpy@plt ``` `memcpy` is eating up 8.65% of the total running time, and the overhead of dispatching to a specialized fast copy function (`memcpy@GLIBC` showing up) is clearly visible. The price of dynamic linking (`memcpy@plt` showing up) is visible too. After this change, this is what `perf` reports: ``` #5 0.33% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms #14 0.01% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5 ``` Now only 0.34% of the running time is spent on memcopies. The dynamic linking overhead is not significant at all any more. To add some more data, my program generates timing results for the operation in its main loop. These are the timings before and after the change: | Time before | Time after | After/Before | |---------------|---------------|--------------| | 29.8 ± 0.8 ns | 23.6 ± 0.5 ns | 0.79 ± 0.03 | The time is basically the total running time divided by a constant; the actual numbers are not important. This change reduced the total running time by 21% (much more than the original 9% spent on memmoves, likely because the CPU is stalling a lot less because data dependencies are more transparent). Of course YMMV and for most programs this will not matter at all. But when it does, the gains can be significant! ## Alternatives * At first I implemented this in `io::Cursor`. I moved it to `&[T]::copy_from_slice()` instead, but this might be too intrusive, especially because it applies to all `T`, not just `u8`. To restrict this to `io::Read`, `<&[u8] as Read>::read()` is probably the best place. * I tried copying bytes in a loop up to 64 or 8 bytes before calling `Read::read`, but both resulted in about a 20% slowdown instead of speedup.
The Rust Programming Language
This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.
Quick Start
Read "Installing Rust" from The Book.
Building from Source
-
Make sure you have installed the dependencies:
g++
4.7 or later orclang++
3.xpython
2.7 (but not 3.x)- GNU
make
3.81 or later cmake
3.4.3 or latercurl
git
-
Clone the source with
git
:$ git clone https://github.com/rust-lang/rust.git $ cd rust
-
Build and install:
$ ./configure $ make && make install
Note: You may need to use
sudo make install
if you do not normally have permission to modify the destination directory. The install locations can be adjusted by passing a--prefix
argument toconfigure
. Various other options are also supported – pass--help
for more information on them.When complete,
make install
will place several programs into/usr/local/bin
:rustc
, the Rust compiler, andrustdoc
, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.
Building on Windows
There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.
MinGW
MSYS2 can be used to easily build Rust on Windows:
-
Grab the latest MSYS2 installer and go through the installer.
-
Run
mingw32_shell.bat
ormingw64_shell.bat
from wherever you installed MSYS2 (i.e.C:\msys64
), depending on whether you want 32-bit or 64-bit Rust. (As of the latest version of MSYS2 you have to runmsys2_shell.cmd -mingw32
ormsys2_shell.cmd -mingw64
from the command line instead) -
From this terminal, install the required tools:
# Update package mirrors (may be needed if you have a fresh install of MSYS2) $ pacman -Sy pacman-mirrors # Install build tools needed for Rust. If you're building a 32-bit compiler, # then replace "x86_64" below with "i686". If you've already got git, python, # or CMake installed and in PATH you can remove them from this list. Note # that it is important that the `python2` and `cmake` packages **not** used. # The build has historically been known to fail with these packages. $ pacman -S git \ make \ diffutils \ tar \ mingw-w64-x86_64-python2 \ mingw-w64-x86_64-cmake \ mingw-w64-x86_64-gcc
-
Navigate to Rust's source code (or clone it), then configure and build it:
$ ./configure $ make && make install
MSVC
MSVC builds of Rust additionally require an installation of Visual Studio 2013
(or later) so rustc
can use its linker. Make sure to check the “C++ tools”
option.
With these dependencies installed, the build takes two steps:
$ ./configure
$ make && make install
MSVC with rustbuild
The old build system, based on makefiles, is currently being rewritten into a Rust-based build system called rustbuild. This can be used to bootstrap the compiler on MSVC without needing to install MSYS or MinGW. All you need are Python 2, CMake, and Git in your PATH (make sure you do not use the ones from MSYS if you have it installed). You'll also need Visual Studio 2013 or newer with the C++ tools. Then all you need to do is to kick off rustbuild.
python x.py build
Currently rustbuild only works with some known versions of Visual Studio. If you have a more recent version installed that a part of rustbuild doesn't understand then you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap.
CALL "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64\vcvars64.bat"
python x.py build
Building Documentation
If you’d like to build the documentation, it’s almost the same:
$ ./configure
$ make docs
Building the documentation requires building the compiler, so the above details will apply. Once you have the compiler built, you can
$ make docs NO_REBUILD=1
To make sure you don’t re-build the compiler because you made a change to some documentation.
The generated documentation will appear in a top-level doc
directory,
created by the make
rule.
Notes
Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.
Snapshot binaries are currently built and tested on several platforms:
Platform / Architecture | x86 | x86_64 |
---|---|---|
Windows (7, 8, Server 2008 R2) | ✓ | ✓ |
Linux (2.6.18 or later) | ✓ | ✓ |
OSX (10.7 Lion or later) | ✓ | ✓ |
You may find that other platforms work, but these are our officially supported build environments that are most likely to work.
Rust currently needs between 600MiB and 1.5GiB to build, depending on platform. If it hits swap, it will take a very long time to build.
There is more advice about hacking on Rust in CONTRIBUTING.md.
Getting Help
The Rust community congregates in a few places:
- Stack Overflow - Direct questions about using the language.
- users.rust-lang.org - General discussion and broader questions.
- /r/rust - News and general discussion.
Contributing
To contribute to Rust, please see CONTRIBUTING.
Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust. And a good place to ask for help would be #rust-beginners.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.