dedd985084
Implement a faster sort algorithm
Hi everyone, this is my first PR.
I've made some changes to the standard sort algorithm, starting out with a few tweaks here and there, but in the end this endeavour became a complete rewrite of it.
#### Summary
Changes:
* Improved performance, especially on partially sorted inputs.
* Performs less comparisons on both random and partially sorted inputs.
* Decreased the size of temporary memory: the new sort allocates 4x less.
Benchmark:
```
name out1 ns/iter out2 ns/iter diff ns/iter diff %
slice::bench::sort_large_ascending 85,323 (937 MB/s) 8,970 (8918 MB/s) -76,353 -89.49%
slice::bench::sort_large_big_ascending 2,135,297 (599 MB/s) 355,955 (3595 MB/s) -1,779,342 -83.33%
slice::bench::sort_large_big_descending 2,266,402 (564 MB/s) 416,479 (3073 MB/s) -1,849,923 -81.62%
slice::bench::sort_large_big_random 3,053,031 (419 MB/s) 1,921,389 (666 MB/s) -1,131,642 -37.07%
slice::bench::sort_large_descending 313,181 (255 MB/s) 14,725 (5432 MB/s) -298,456 -95.30%
slice::bench::sort_large_mostly_ascending 287,706 (278 MB/s) 243,204 (328 MB/s) -44,502 -15.47%
slice::bench::sort_large_mostly_descending 415,078 (192 MB/s) 271,028 (295 MB/s) -144,050 -34.70%
slice::bench::sort_large_random 545,872 (146 MB/s) 521,559 (153 MB/s) -24,313 -4.45%
slice::bench::sort_large_random_expensive 30,321,770 (2 MB/s) 23,533,735 (3 MB/s) -6,788,035 -22.39%
slice::bench::sort_medium_ascending 616 (1298 MB/s) 155 (5161 MB/s) -461 -74.84%
slice::bench::sort_medium_descending 1,952 (409 MB/s) 202 (3960 MB/s) -1,750 -89.65%
slice::bench::sort_medium_random 3,646 (219 MB/s) 3,421 (233 MB/s) -225 -6.17%
slice::bench::sort_small_ascending 39 (2051 MB/s) 34 (2352 MB/s) -5 -12.82%
slice::bench::sort_small_big_ascending 96 (13333 MB/s) 96 (13333 MB/s) 0 0.00%
slice::bench::sort_small_big_descending 248 (5161 MB/s) 243 (5267 MB/s) -5 -2.02%
slice::bench::sort_small_big_random 501 (2554 MB/s) 490 (2612 MB/s) -11 -2.20%
slice::bench::sort_small_descending 95 (842 MB/s) 63 (1269 MB/s) -32 -33.68%
slice::bench::sort_small_random 372 (215 MB/s) 354 (225 MB/s) -18 -4.84%
```
#### Background
First, let me just do a quick brain dump to discuss what I learned along the way.
The official documentation says that the standard sort in Rust is a stable sort. This constraint is thus set in stone and immediately rules out many popular sorting algorithms. Essentially, the only algorithms we might even take into consideration are:
1. [Merge sort](https://en.wikipedia.org/wiki/Merge_sort)
2. [Block sort](https://en.wikipedia.org/wiki/Block_sort) (famous implementations are [WikiSort](https://github.com/BonzaiThePenguin/WikiSort) and [GrailSort](https://github.com/Mrrl/GrailSort))
3. [TimSort](https://en.wikipedia.org/wiki/Timsort)
Actually, all of those are just merge sort flavors. :) The current standard sort in Rust is a simple iterative merge sort. It has three problems. First, it's slow on partially sorted inputs (even though #29675 helped quite a bit). Second, it always makes around `log(n)` iterations copying the entire array between buffers, no matter what. Third, it allocates huge amounts of temporary memory (a buffer of size `2*n`, where `n` is the size of input).
The problem of auxilliary memory allocation is a tough one. Ideally, it would be best for our sort to allocate `O(1)` additional memory. This is what block sort (and it's variants) does. However, it's often very complicated (look at [this](https://github.com/BonzaiThePenguin/WikiSort/blob/master/WikiSort.cpp)) and even then performs rather poorly. The author of WikiSort claims good performance, but that must be taken with a grain of salt. It performs well in comparison to `std::stable_sort` in C++. It can even beat `std::sort` on partially sorted inputs, but on random inputs it's always far worse. My rule of thumb is: high performance, low memory overhead, stability - choose two.
TimSort is another option. It allocates a buffer of size `n/2`, which is not great, but acceptable. Performs extremelly well on partially sorted inputs. However, it seems pretty much all implementations suck on random inputs. I benchmarked implementations in [Rust](https://github.com/notriddle/rust-timsort), [C++](https://github.com/gfx/cpp-TimSort), and [D](
|
||
---|---|---|
man | ||
mk | ||
src | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.mailmap | ||
.travis.yml | ||
appveyor.yml | ||
COMPILER_TESTS.md | ||
configure | ||
CONTRIBUTING.md | ||
COPYRIGHT | ||
LICENSE-APACHE | ||
LICENSE-MIT | ||
Makefile.in | ||
README.md | ||
RELEASES.md | ||
x.py |
The Rust Programming Language
This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.
Quick Start
Read "Installing Rust" from The Book.
Building from Source
-
Make sure you have installed the dependencies:
g++
4.7 or later orclang++
3.xpython
2.7 (but not 3.x)- GNU
make
3.81 or later cmake
3.4.3 or latercurl
git
-
Clone the source with
git
:$ git clone https://github.com/rust-lang/rust.git $ cd rust
-
Build and install:
$ ./configure $ make && sudo make install
Note: Install locations can be adjusted by passing a
--prefix
argument toconfigure
. Various other options are also supported – pass--help
for more information on them.When complete,
sudo make install
will place several programs into/usr/local/bin
:rustc
, the Rust compiler, andrustdoc
, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.
Building on Windows
There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.
MinGW
MSYS2 can be used to easily build Rust on Windows:
-
Grab the latest MSYS2 installer and go through the installer.
-
Run
mingw32_shell.bat
ormingw64_shell.bat
from wherever you installed MSYS2 (i.e.C:\msys64
), depending on whether you want 32-bit or 64-bit Rust. (As of the latest version of MSYS2 you have to runmsys2_shell.cmd -mingw32
ormsys2_shell.cmd -mingw64
from the command line instead) -
From this terminal, install the required tools:
# Update package mirrors (may be needed if you have a fresh install of MSYS2) $ pacman -Sy pacman-mirrors # Install build tools needed for Rust. If you're building a 32-bit compiler, # then replace "x86_64" below with "i686". If you've already got git, python, # or CMake installed and in PATH you can remove them from this list. Note # that it is important that the `python2` and `cmake` packages **not** used. # The build has historically been known to fail with these packages. $ pacman -S git \ make \ diffutils \ tar \ mingw-w64-x86_64-python2 \ mingw-w64-x86_64-cmake \ mingw-w64-x86_64-gcc
-
Navigate to Rust's source code (or clone it), then configure and build it:
$ ./configure $ make && make install
MSVC
MSVC builds of Rust additionally require an installation of Visual Studio 2013
(or later) so rustc
can use its linker. Make sure to check the “C++ tools”
option.
With these dependencies installed, you can build the compiler in a cmd.exe
shell with:
> python x.py build
If you're running inside of an msys shell, however, you can run:
$ ./configure --build=x86_64-pc-windows-msvc
$ make && make install
Currently building Rust only works with some known versions of Visual Studio. If you have a more recent version installed the build system doesn't understand then you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap.
CALL "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\amd64\vcvars64.bat"
python x.py build
Building Documentation
If you’d like to build the documentation, it’s almost the same:
$ ./configure
$ make docs
The generated documentation will appear in a top-level doc
directory,
created by the make
rule.
Notes
Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.
Snapshot binaries are currently built and tested on several platforms:
Platform / Architecture | x86 | x86_64 |
---|---|---|
Windows (7, 8, Server 2008 R2) | ✓ | ✓ |
Linux (2.6.18 or later) | ✓ | ✓ |
OSX (10.7 Lion or later) | ✓ | ✓ |
You may find that other platforms work, but these are our officially supported build environments that are most likely to work.
Rust currently needs between 600MiB and 1.5GiB to build, depending on platform. If it hits swap, it will take a very long time to build.
There is more advice about hacking on Rust in CONTRIBUTING.md.
Getting Help
The Rust community congregates in a few places:
- Stack Overflow - Direct questions about using the language.
- users.rust-lang.org - General discussion and broader questions.
- /r/rust - News and general discussion.
Contributing
To contribute to Rust, please see CONTRIBUTING.
Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust. And a good place to ask for help would be #rust-beginners.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.