mikros/rust - rust - Gitea.pterpstra.com

bors f6fa358a18 Auto merge of #127226 - mat-1:optimize-siphash-round, r=nnethercote

Optimize SipHash by reordering compress instructions

This PR optimizes hashing by changing the order of instructions in the sip.rs `compress` macro so the CPU can parallelize it better. The new order is taken directly from Fig 2.1 in [the SipHash paper](https://eprint.iacr.org/2012/351.pdf), but with the xors moved, which makes it a little faster. I attempted to optimize it some more after this, but I think this might be the optimal instruction order. Note that this shouldn't change the behavior of hashing at all: only statements that don't depend on each other were reordered.
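To illustrate why such a reordering is safe, here is a minimal sketch of one SipRound written two ways. It is not the code from sip.rs: the `State` struct and both function names are hypothetical, and the two statement orders are only examples of a "half by half" order and an interleaved order. The rotation constants (13, 16, 21, 17, 32) are the ones given in the SipHash paper. Every statement that was moved touches different words than the statements it was moved past, so both functions compute the same result.

```rust
/// The four 64-bit words of SipHash internal state (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct State {
    pub v0: u64,
    pub v1: u64,
    pub v2: u64,
    pub v3: u64,
}

/// One SipRound with each (v0, v1) / (v2, v3) step finished before the next starts.
pub fn sip_round_sequential(mut s: State) -> State {
    s.v0 = s.v0.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(13);
    s.v1 ^= s.v0;
    s.v0 = s.v0.rotate_left(32);

    s.v2 = s.v2.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(16);
    s.v3 ^= s.v2;

    s.v0 = s.v0.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(21);
    s.v3 ^= s.v0;

    s.v2 = s.v2.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(17);
    s.v1 ^= s.v2;
    s.v2 = s.v2.rotate_left(32);
    s
}

/// The same round with independent statements interleaved, so that neighbouring
/// operations do not wait on each other's results.
pub fn sip_round_interleaved(mut s: State) -> State {
    // These two additions use disjoint words and can execute in parallel.
    s.v0 = s.v0.wrapping_add(s.v1);
    s.v2 = s.v2.wrapping_add(s.v3);
    s.v1 = s.v1.rotate_left(13);
    s.v1 ^= s.v0;
    s.v3 = s.v3.rotate_left(16);
    s.v3 ^= s.v2;
    s.v0 = s.v0.rotate_left(32);

    // Second half: again two independent additions first.
    s.v2 = s.v2.wrapping_add(s.v1);
    s.v0 = s.v0.wrapping_add(s.v3);
    s.v1 = s.v1.rotate_left(17);
    s.v1 ^= s.v2;
    s.v3 = s.v3.rotate_left(21);
    s.v3 ^= s.v0;
    s.v2 = s.v2.rotate_left(32);
    s
}
```

Calling both functions on the same `State` and comparing the results with `assert_eq!` is a quick way to check that a reordering like this leaves the output unchanged.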

It appears that the current order hasn't changed since its original implementation from 2012 (commit fada46c421), which doesn't look like it was written with data dependencies in mind.

Running `./x bench library/core --stage 0 --test-args hash` before and after this change shows the following results:

Before:
```
benchmarks:
    hash::sip::bench_bytes_4             7.20/iter +/- 0.70
    hash::sip::bench_bytes_7             9.01/iter +/- 0.35
    hash::sip::bench_bytes_8             8.12/iter +/- 0.10
    hash::sip::bench_bytes_a_16         10.07/iter +/- 0.44
    hash::sip::bench_bytes_b_32         13.46/iter +/- 0.71
    hash::sip::bench_bytes_c_128        37.75/iter +/- 0.48
    hash::sip::bench_long_str          121.18/iter +/- 3.01
    hash::sip::bench_str_of_8_bytes     11.20/iter +/- 0.25
    hash::sip::bench_str_over_8_bytes   11.20/iter +/- 0.26
    hash::sip::bench_str_under_8_bytes   9.89/iter +/- 0.59
    hash::sip::bench_u32                 9.57/iter +/- 0.44
    hash::sip::bench_u32_keyed           6.97/iter +/- 0.10
    hash::sip::bench_u64                 8.63/iter +/- 0.07
```
After:
```
benchmarks:
    hash::sip::bench_bytes_4             6.64/iter +/- 0.14
    hash::sip::bench_bytes_7             8.19/iter +/- 0.07
    hash::sip::bench_bytes_8             8.59/iter +/- 0.68
    hash::sip::bench_bytes_a_16          9.73/iter +/- 0.49
    hash::sip::bench_bytes_b_32         12.70/iter +/- 0.06
    hash::sip::bench_bytes_c_128        32.38/iter +/- 0.20
    hash::sip::bench_long_str          102.99/iter +/- 0.82
    hash::sip::bench_str_of_8_bytes     10.71/iter +/- 0.21
    hash::sip::bench_str_over_8_bytes   11.73/iter +/- 0.17
    hash::sip::bench_str_under_8_bytes  10.33/iter +/- 0.41
    hash::sip::bench_u32                10.41/iter +/- 0.29
    hash::sip::bench_u32_keyed           9.50/iter +/- 0.30
    hash::sip::bench_u64                 8.44/iter +/- 1.09
```
I ran this on my computer so there's some noise, but you can tell at least `bench_long_str` is significantly faster (~18%).
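As a rough check of that figure, the sketch below computes the change for a benchmark from its before and after per-iteration times. The helper names `speedup` and `time_saved` are hypothetical, and the error margins reported by the benchmark harness are ignored. For `bench_long_str` (121.18 before, 102.99 after), the ratio of old to new time is about 1.18, which matches the "~18%" above when read as a throughput gain; the time per iteration itself drops by about 15%.

```rust
/// Ratio of the old time to the new time per iteration.
/// Values above 1.0 mean the new code is faster.
pub fn speedup(before: f64, after: f64) -> f64 {
    before / after
}

/// Fraction of the old per-iteration time that the new code saves.
/// Negative values mean the new code is slower.
pub fn time_saved(before: f64, after: f64) -> f64 {
    (before - after) / before
}
```

The same helpers can be applied to every row of the two tables to see which benchmarks improved and which moved the other way within the noise.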

Also, I noticed that the same compress function from the library is used in the compiler, so I took the liberty of copy-pasting this change there as well.

Thanks `@semisol` for porting SipHash for another project which led me to notice this issue in Rust, and for helping investigate. <3

2024-07-04 04:03:45 +00:00

.github

Stop pinning XCode 14

2024-06-29 15:08:04 +00:00

.reuse

Rollup merge of #126876 - WaffleLapkin:unignoreconfigtoml, r=Mark-Simulacrum

2024-06-30 10:39:47 +02:00

compiler

Auto merge of #127226 - mat-1:optimize-siphash-round, r=nnethercote

2024-07-04 04:03:45 +00:00

library

Auto merge of #127226 - mat-1:optimize-siphash-round, r=nnethercote

2024-07-04 04:03:45 +00:00

LICENSES

…

src

Auto merge of #127127 - notriddle:notriddle/pulldown-cmark-0.11, r=GuillaumeGomez

2024-07-04 01:50:31 +00:00

tests

Auto merge of #127127 - notriddle:notriddle/pulldown-cmark-0.11, r=GuillaumeGomez

2024-07-04 01:50:31 +00:00

.clang-format

Add .clang-format

2024-06-26 05:56:00 +08:00

.editorconfig

…

.git-blame-ignore-revs

…

.gitattributes

…

.gitignore

…

.gitmodules

…

.ignore

Add .ignore file to make config.toml searchable in vscode

2024-06-24 10:15:16 +02:00

.mailmap

…

Cargo.lock

rustdoc: add usable lint for pulldown-cmark-0.11 parsing changes

2024-07-01 07:21:02 -07:00

Cargo.toml

Implement x perf as a separate tool

2024-06-27 10:22:03 +02:00

CODE_OF_CONDUCT.md

…

config.example.toml

…

configure

…

CONTRIBUTING.md

…

INSTALL.md

…

LICENSE-APACHE

…

LICENSE-MIT

…

README.md

…

RELEASES.md

…

rust-bors.toml

…

rustfmt.toml

…

triagebot.toml

Autolabel rustc-perf-wrapper changes with t-bootstrap label

2024-06-29 16:07:39 +02:00

…

x.ps1

…

x.py

…

README.md

Website | Getting started | Learn | Documentation | Contributing

This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.

Why Rust?

Performance: Fast and memory-efficient, suitable for critical services, embedded devices, and easily integrated with other languages.
Reliability: Our rich type system and ownership model ensure memory and thread safety, reducing bugs at compile-time.
Productivity: Comprehensive documentation, a compiler committed to providing great diagnostics, and advanced tooling including package manager and build tool (Cargo), auto-formatter (rustfmt), linter (Clippy) and editor support (rust-analyzer).

Quick Start

Read "Installation" from The Book.

Installing from Source

If you really want to install from source (though this is not recommended), see INSTALL.md.

Getting Help

See https://www.rust-lang.org/community for a list of chat platforms and forums.

Contributing

See CONTRIBUTING.md.

License

Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.

See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.

Trademark

The Rust Foundation owns and protects the Rust and Cargo trademarks and logos (the "Rust Trademarks").

If you want to use these names or brands, please read the media guide.

Third-party logos may be subject to third-party copyrights and trademarks. See Licenses for details.

Languages

Rust 96.2%

RenderScript 0.7%

JavaScript 0.6%

Shell 0.6%

Fluent 0.4%

Other 1.3%