This ended up being a bigger refactoring than I expected, as I also cleaned up a few ugly spots in rustc. There are still a few areas that need improvement.
Performance numbers:
```
Before:
572.70user 5.52system 7:33.21elapsed 127%CPU (0avgtext+0avgdata 1173368maxresident)k
llvm-time: 385.858
After:
545.27user 5.49system 7:10.22elapsed 128%CPU (0avgtext+0avgdata 1145348maxresident)k
llvm-time: 387.119
```
A good ~5% perf improvement (user time down 4.8%, wall-clock time down 5.1%). Note that after this patch >70% of the time is spent in LLVM, so Amdahl's law is in full effect.
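For reference, the arithmetic behind those figures can be sketched as below. This assumes `llvm-time` is measured on the same scale as the `user` column, which the timing output does not state; the helper names are made up for illustration.

```rust
/// Figures copied from the timing output above (seconds).
const USER_BEFORE: f64 = 572.70;
const USER_AFTER: f64 = 545.27;
const LLVM_AFTER: f64 = 387.119;

/// Percentage by which `after` is smaller than `before`.
fn percent_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Amdahl's law, limiting case: if a fraction `untouched` of the run time
/// cannot be improved at all, the overall speedup can never exceed this.
fn amdahl_limit(untouched: f64) -> f64 {
    1.0 / untouched
}

/// Share of the post-patch user time spent in LLVM
/// (assumption: the two numbers are comparable).
fn llvm_share_after() -> f64 {
    LLVM_AFTER / USER_AFTER
}
```

With these inputs, user time drops by about 4.8%, LLVM accounts for roughly 71% of what remains, and further speedups outside LLVM are bounded at about 1.4x overall.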
Passes make check locally.
r? @nikomatsakis
The innermost loop of TwoWaySearcher checks the boundary of the haystack
vs position + needle.len(), and it checks the haystack byte aligned with
the last byte of the needle against the byteset.
If these two steps are combined, so that indexing the position of the
last needle byte doubles as the bounds check, the algorithm improves its
throughput. We improve the innermost loop by reducing the number of
instructions used, and by eliminating the panic case for the checked
indexing that was previously used.
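A minimal sketch of the idea, not the actual `TwoWaySearcher` code: the function names are hypothetical, and verification of a candidate is done with a plain slice comparison instead of the two-way algorithm. The point is only that `haystack.get(..)` on the last needle byte's position serves as both the end-of-haystack test and the load for the byteset filter.

```rust
/// Build a 64-bit byteset: bit `b & 0x3f` is set for every byte `b` in the needle.
fn byteset(needle: &[u8]) -> u64 {
    needle.iter().fold(0, |set, &b| set | (1u64 << (b & 0x3f)))
}

/// Starting at `pos`, return the first window start whose last byte passes
/// the byteset filter, or `None` once the window runs off the haystack.
/// Requires `needle_len >= 1`.
fn next_candidate(haystack: &[u8], needle_len: usize, set: u64, mut pos: usize) -> Option<usize> {
    loop {
        // One lookup: `get` returning `None` is the bounds check, so there is
        // no separate `pos + needle_len > haystack.len()` test and no panic path.
        let tail = *haystack.get(pos + needle_len - 1)?;
        if (set >> (tail & 0x3f)) & 1 == 0 {
            // The byte cannot occur in the needle, so no window that
            // overlaps it can match: skip a whole needle length.
            pos += needle_len;
            continue;
        }
        return Some(pos);
    }
}

/// Naive driver: filter with `next_candidate`, verify by direct comparison.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    let set = byteset(needle);
    let mut pos = 0;
    while let Some(cand) = next_candidate(haystack, needle.len(), set, pos) {
        // In bounds: `cand + needle.len() - 1` was just indexed successfully.
        if &haystack[cand..cand + needle.len()] == needle {
            return Some(cand);
        }
        pos = cand + 1;
    }
    None
}
```

For example, `find(b"aaaaaaaabb", b"bb")` skips two bytes at a time through the run of `a`s and only does a full comparison at offset 8.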
Selected benchmarks from the external/workspace testsuite. Benchmarks
improve across the board.
```
before:
test bb_in_aa::twoway_find ... bench: 4,229 ns/iter (+/- 1,305) = 23646 MB/s
test bb_in_aa::twoway_rfind ... bench: 3,873 ns/iter (+/- 101) = 25819 MB/s
test short_1let_long::twoway_find ... bench: 7,075 ns/iter (+/- 29) = 360 MB/s
test short_1let_long::twoway_rfind ... bench: 6,640 ns/iter (+/- 79) = 384 MB/s
test short_2let_long::twoway_find ... bench: 3,823 ns/iter (+/- 16) = 667 MB/s
test short_2let_long::twoway_rfind ... bench: 3,774 ns/iter (+/- 44) = 675 MB/s
test short_3let_long::twoway_find ... bench: 3,582 ns/iter (+/- 47) = 712 MB/s
test short_3let_long::twoway_rfind ... bench: 3,616 ns/iter (+/- 34) = 705 MB/s
with this commit:
test bb_in_aa::twoway_find ... bench: 2,952 ns/iter (+/- 20) = 33875 MB/s
test bb_in_aa::twoway_rfind ... bench: 2,939 ns/iter (+/- 99) = 34025 MB/s
test short_1let_long::twoway_find ... bench: 4,593 ns/iter (+/- 4) = 555 MB/s
test short_1let_long::twoway_rfind ... bench: 4,592 ns/iter (+/- 76) = 555 MB/s
test short_2let_long::twoway_find ... bench: 2,804 ns/iter (+/- 3) = 909 MB/s
test short_2let_long::twoway_rfind ... bench: 2,807 ns/iter (+/- 40) = 908 MB/s
test short_3let_long::twoway_find ... bench: 3,105 ns/iter (+/- 120) = 821 MB/s
test short_3let_long::twoway_rfind ... bench: 3,019 ns/iter (+/- 50) = 844 MB/s
```
- `bb_in_aa`: the fast skip in the byteset filter loop improves.
- 1/2/3let: searches for 1, 2, or 3 ASCII bytes improve.
This search happens a lot! Locally, compiling hyper sees the following improvements:
before
real 0m30.843s
user 0m51.644s
sys 0m2.128s
real 0m30.164s
user 0m53.320s
sys 0m2.208s
after
real 0m28.438s
user 0m51.076s
sys 0m2.276s
real 0m28.612s
user 0m51.560s
sys 0m2.192s
I got a bit confused reading the guide about why there was suddenly an asterisk in the code. Someone on IRC explained what it was there for, and I think that explanation should be added to the docs to prevent any further confusion!
This pull request implements the functionality for [RFC 873](https://github.com/rust-lang/rfcs/blob/master/text/0873-type-macros.md). This is currently just an update of @freebroccolo's branch from January; the corresponding commits are linked in each commit message.
@nikomatsakis and I had talked about updating the macro language to support a lifetime fragment specifier, and it is possible to do that work on this branch as well. If so we can (collectively) talk about it next week during the pre-RustCamp work week.
In Section 3.2, TARPL says that "standard allocators (including jemalloc, the one used by default in Rust) generally consider passing in 0 for the size of an allocation as Undefined Behaviour."
However, the C standard and the jemalloc manual say allocating zero bytes
should succeed:
- C11 7.22.3 paragraph 1: "If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object."
- [jemalloc manual](http://www.freebsd.org/cgi/man.cgi?query=jemalloc&sektion=3): "The malloc and calloc functions return a pointer to the allocated memory if successful; otherwise a NULL pointer is returned and errno is set to ENOMEM."
+ Note that the description for `allocm` says "Behavior is undefined if size is 0," but it is an experimental API.
r? @Gankro