debuginfo: Handle spread_arg case in MIR-trans in a more stable way.
Use `VariableAccess::DirectVariable` instead of `VariableAccess::IndirectVariable` in order not to make LLVM's SROA angry. This is a step towards fixing #36774 and #35547. At least, I can build Cargo with optimizations + debuginfo again.
r? @eddyb
normalize types every time HR regions are erased
Associated type normalization is inhibited by higher-ranked regions.
Therefore, every time we erase them, we must re-normalize.
I was meaning to introduce this change some time ago, but we used
to erase regions in generic context, which broke this terribly (because
you can't always normalize in a generic context). That seems to be gone
now.
Ensure this by having a `erase_late_bound_regions_and_normalize`
function.
Fixes#37109 (the missing call was in mir::block).
r? @eddyb
rustdoc: Improve playground run buttons
The main change is to stop using javascript to generate the URLs and use
rustdoc instead.
This also adds run buttons to the error index examples.
You can test the changes at https://ollie27.github.io/rust_doc_test/.
Fixes#36621Fixes#36910
add a per-param-env cache to `impls_bound`
There used to be only a global cache, which led to uncached calls to
trait selection when there were type parameters.
This causes a 20% decrease in borrow-checking time and an overall 0.5% performance increase during bootstrapping (as borrow-checking tends to be a tiny part of compilation time).
Fixes#37106 (drop elaboration times are now ~half of borrow checking,
so might still be worthy of optimization, but not critical).
r? @pnkfelix
Change Substs to type alias for Slice<Kind> for interning
This changes the definition of `librustc::ty::subst::Substs` to be a type alias to `Slice<Kind>`. `Substs` was already interned, but can now make use of the efficient `PartialEq` and `Hash` impls on `librustc::ty::Slice`.
I'm working on collecting some timing data for this, will update when it's done.
I chose to leave the impls on `Substs<'tcx>` even though it's now just a type alias to `Slice<Kind<'tcx>>` because it has the smallest footprint on other portions of the compiler which depend on its API. It turns out to be a pretty huge diff if you change where Substs's methods live 😄. That said, I'm not necessarily sure it's the *best* implementation but it's probably the easiest/smallest to review.
Many thanks to @eddyb for both suggesting this as a project for learning more about the compiler, and the tireless ~~handholding~~ mentorship he provided.
Specialize Vec::extend to Vec::extend_from_slice
I tried using the existing `SpecExtend` as a helper trait for this, but the instances would always conflict with the instances higher up in the file, so I created a new helper trait.
Benchmarking `extend` vs `extend_from_slice` with an slice of 1000 `u64`s gives the following results:
```
before:
running 2 tests
test tests::bench_extend_from_slice ... bench: 166 ns/iter (+/- 78)
test tests::bench_extend_trait ... bench: 1,187 ns/iter (+/- 697)
after:
running 2 tests
test tests::bench_extend_from_slice ... bench: 149 ns/iter (+/- 87)
test tests::bench_extend_trait ... bench: 138 ns/iter (+/- 70)
```
Implement `read_offset` and `write_offset`
These functions allow to read from and write to a file from multiple
threads without changing the per-file cursor, avoiding the race between
the seek and the read.
add (missing) tar to list of packages to get under mingw
The distribution targets use tar, but the readme pacman invocation doesn't include the tar package.
Explain motivation behind lifetimes
Start the lifetime section with an explanation of the issues that lack of explicit lifetimes cause and how the explicit lifetimes solve these.
----------------
I had really hard time figuring out why I would need to care about the explicit reference lifetimes when going through the book at first. With strong background in C++, I'm familiar with the dangling reference problem - but given the section seems to focus more on the lifetime syntax and various ways to define lifetimes on functions and structs, I was unable to understand how they are used to solve the reference problem.
This PR is an attempt at getting the reader to understand what the explicit lifetimes are used for and why they are an awesome thing instead of a bit of syntax that just has to be written.
It's been less than a week that I've been diving into Rust so I'm far from certain about the terminology and technical correctness. I tried mimicking the existing terminology from the lifetimes section, but still no promises on getting it right.
Changed error message E0408 to new format
Followed your text and was able to change the ouput to the new format.
I did not encounter any broken test therefore this is a really small commit.
Thanks for letting me hack on the compiler :)
r? @jonathandturner
Cache conscious hashmap table
Right now the internal HashMap representation is 3 unziped arrays hhhkkkvvv, I propose to change it to hhhkvkvkv (in further iterations kvkvkvhhh may allow inplace grow). A previous attempt is at #21973.
This layout is generally more cache conscious as it makes the value immediately accessible after a key matches. The separated hash arrays is a _no-brainer_ because of how the RH algorithm works and that's unchanged.
**Lookups**: Upon a successful match in the hash array the code can check the key and immediately have access to the value in the same or next cache line (effectively saving a L[1,2,3] miss compared to the current layout).
**Inserts/Deletes/Resize**: Moving values in the table (robin hooding it) is faster because it touches consecutive cache lines and uses less instructions.
Some backing benchmarks (besides the ones bellow) for the benefits of this layout can be seen here as well http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/
The obvious drawbacks is: padding can be wasted between the key and value. Because of that keys(), values() and contains() can consume more cache and be slower.
Total wasted padding between items (C being the capacity of the table).
* Old layout: C * (K-K padding) + C * (V-V padding)
* Proposed: C * (K-V padding) + C * (V-K padding)
In practice padding between K-K and V-V *can* be smaller than K-V and V-K. The overhead is capped(ish) at sizeof u64 - 1 so we can actually measure the worst case (u8 at the end of key type and value with aliment of 1, _hardly the average case in practice_).
Starting from the worst case the memory overhead is:
* `HashMap<u64, u8>` 46% memory overhead. (aka *worst case*)
* `HashMap<u64, u16>` 33% memory overhead.
* `HashMap<u64, u32>` 20% memory overhead.
* `HashMap<T, T>` 0% memory overhead
* Worst case based on sizeof K + sizeof V:
| x | 16 | 24 | 32 | 64 | 128 |
|----------------|--------|--------|--------|-------|-------|
| (8+x+7)/(8+x) | 1.29 | 1.22 | 1.18 | 1.1 | 1.05 |
I've a test repo here to run benchmarks https://github.com/arthurprs/hashmap2/tree/layout
```
➜ hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
name hhkkvv:: ns/iter hhkvkv:: ns/iter diff ns/iter diff %
grow_10_000 922,064 783,933 -138,131 -14.98%
grow_big_value_10_000 1,901,909 1,171,862 -730,047 -38.38%
grow_fnv_10_000 443,544 418,674 -24,870 -5.61%
insert_100 2,469 2,342 -127 -5.14%
insert_1000 23,331 21,536 -1,795 -7.69%
insert_100_000 4,748,048 3,764,305 -983,743 -20.72%
insert_10_000 321,744 290,126 -31,618 -9.83%
insert_int_bigvalue_10_000 749,764 407,547 -342,217 -45.64%
insert_str_10_000 337,425 334,009 -3,416 -1.01%
insert_string_10_000 788,667 788,262 -405 -0.05%
iter_keys_100_000 394,484 374,161 -20,323 -5.15%
iter_keys_big_value_100_000 402,071 620,810 218,739 54.40%
iter_values_100_000 424,794 373,004 -51,790 -12.19%
iterate_100_000 424,297 389,950 -34,347 -8.10%
lookup_100_000 189,997 186,554 -3,443 -1.81%
lookup_100_000_bigvalue 192,509 189,695 -2,814 -1.46%
lookup_10_000 154,251 145,731 -8,520 -5.52%
lookup_10_000_bigvalue 162,315 146,527 -15,788 -9.73%
lookup_10_000_exist 132,769 128,922 -3,847 -2.90%
lookup_10_000_noexist 146,880 144,504 -2,376 -1.62%
lookup_1_000_000 137,167 132,260 -4,907 -3.58%
lookup_1_000_000_bigvalue 141,130 134,371 -6,759 -4.79%
lookup_1_000_000_bigvalue_unif 567,235 481,272 -85,963 -15.15%
lookup_1_000_000_unif 589,391 453,576 -135,815 -23.04%
merge_shuffle 1,253,357 1,207,387 -45,970 -3.67%
merge_simple 40,264,690 37,996,903 -2,267,787 -5.63%
new 6 5 -1 -16.67%
with_capacity_10e5 3,214 3,256 42 1.31%
```
```
➜ hashmap2 git:(layout) ✗ cargo benchcmp hhkkvv:: hhkvkv:: bench.txt
name hhkkvv:: ns/iter hhkvkv:: ns/iter diff ns/iter diff %
iter_keys_100_000 391,677 382,839 -8,838 -2.26%
iter_keys_1_000_000 10,797,360 10,209,898 -587,462 -5.44%
iter_keys_big_value_100_000 414,736 662,255 247,519 59.68%
iter_keys_big_value_1_000_000 10,147,837 12,067,938 1,920,101 18.92%
iter_values_100_000 440,445 377,080 -63,365 -14.39%
iter_values_1_000_000 10,931,844 9,979,173 -952,671 -8.71%
iterate_100_000 428,644 388,509 -40,135 -9.36%
iterate_1_000_000 11,065,419 10,042,427 -1,022,992 -9.24%
```
There used to be only a global cache, which led to uncached calls to
trait selection when there were type parameters.
I'm running a check that there are no adverse performance effects.
Fixes#37106 (drop elaboration times are now ~half of borrow checking,
so might still be worthy of optimization, but not critical).
Associated type normalization is inhibited by higher-ranked regions.
Therefore, every time we erase them, we must re-normalize.
I was meaning to introduce this change some time ago, but we used
to erase regions in generic context, which broke this terribly (because
you can't always normalize in a generic context). That seems to be gone
now.
Ensure this by having a `erase_late_bound_regions_and_normalize`
function.
Fixes#37109 (the missing call was in mir::block).
macros: clean up scopes of expanded `#[macro_use]` imports
This PR changes the scope of macro-expanded `#[macro_use]` imports to match that of unexpanded `#[macro_use]` imports. For example, this would be allowed:
```rust
example!();
macro_rules! m { () => { #[macro_use(example)] extern crate example_crate; } }
m!();
```
This PR also enforces the full shadowing restrictions from RFC 1560 on `#[macro_use]` imports (currently, we only enforce the weakened restrictions from #36767).
This is a [breaking-change], but I believe it is highly unlikely to cause breakage in practice.
r? @nrc
Error monitor should emit error to stderr instead of stdout
We are pretty consistent about emitting to stderr, except for when there is actually an error, in which case we emit to stdout. This seems a bit backwards. This PR just changes that exception to emit to stderr. This is useful for the RLS since the LS protocol uses stdout (grrr).
r? @alexcrichton
Avoid allocations in `Decoder::read_str`.
`opaque::Decoder::read_str` is very hot within `rustc` due to its use in
the reading of crate metadata, and it currently returns a `String`. This
commit changes it to instead return a `Cow<str>`, which avoids a heap
allocation.
This change reduces the number of calls to `malloc` by almost 10% in
some benchmarks.
This is a [breaking-change] to libserialize.
Add comparison operators to boolean const eval.
I think it might be worth adding tests here, but since I don't know how or where to do that, I have not done so yet. Willing to do so if asked and given an explanation as to how.
Fixes#37047.