Add basic PGO support.
This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses), `-C pgo-use` and `-C pgo-gen`.
See each commit for details.
appveyor: Move run-pass-fulldeps to extra builders
We've made headway towards splitting the test suite across two appveyor builders
and this moves one more tests suite between builders. The last [failed
build][fail] had its longest running test suite and I've moved that to the
secondary builder.
cc #48844
[fail]: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6782
Introduce unsafe offset_from on pointers
Adds intrinsics::exact_div to take advantage of the unsafe, which reduces the implementation from
```asm
sub rcx, rdx
mov rax, rcx
sar rax, 63
shr rax, 62
lea rax, [rax + rcx]
sar rax, 2
ret
```
down to
```asm
sub rcx, rdx
sar rcx, 2
mov rax, rcx
ret
```
(for `*const i32`)
See discussion on the `offset_to` tracking issue https://github.com/rust-lang/rust/issues/41079
Some open questions
- Would you rather I split the intrinsic PR from the library PR?
- Do we even want the safe version of the API? https://github.com/rust-lang/rust/issues/41079#issuecomment-374426786 I've added some text to its documentation that even if it's not UB, it's useless to use it between pointers into different objects.
and todos
- [x] ~~I need to make a codegen test~~ Done
- [x] ~~Can the subtraction use nsw/nuw?~~ No, it can't https://github.com/rust-lang/rust/pull/49297#discussion_r176697574
- [x] ~~Should there be `usize` variants of this, like there are now `add` and `sub` that you almost always want over `offset`? For example, I imagine `sub_ptr` that returns `usize` and where it's UB if the distance is negative.~~ Can wait for later; C gives a signed result https://github.com/rust-lang/rust/issues/41079#issuecomment-375842235, so we might as well, and this existing to go with `offset` makes sense.
Pass --strip-debug to GccLinker when building without debuginfo
C.f. #46034
---
This brings a hello-world built by passing rustc no command line options from 2.9M to 592K on Linux.
(This might need to special case MacOS or Windows, not sure if the linkers there support `--strip-debug`, and there is an annoying lack of dependable docs for the linkers there.)
Reduce scope of unsafe block in sun_path_offset
I reduced the scope of the unsafe block to the `uninitialized` call which is the only actual unsafe bit.
ci: Don't use Travis caches for docker images
This commit moves away from caching on Travis to our own caching on S3 for
caching docker layers between builds. Unfortunately the Travis caches have over
time had a few critical pain points:
* Caches are only updated for successful builds, meaning that if a build times
out or fails in a different location the sucessfully-created docker images
isn't always cached. While this makes sense as a general rule of caches it
hurts our use cases.
* Caches are per-branch and builder which means that we don't have a separate
cache on each release channel. All our merges go through the `auto` branch
which means that they're all sharing the same cache, even those for merging to
master/beta. This means that PRs which switch between master/beta will keep
rebuilting and having cache misses.
* Caches have historically been invaliated somewhat regularly a little more
aggressively than we'd want (I think).
* We don't always need to update the contents of the cache if the Docker image
didn't change at all, and saving off the docker layers can sometimes be quite
expensive.
For all these reasons this commit drops the usage of Travis's built-in caching
support. Instead our own caching is used by storing blobs to S3. Normally this
would be a very risky endeavour but we're basically priming a cache for a cache
(docker) so if we get this wrong the failure mode is longer builds, not stale
caches. We'll notice that pretty quickly and hopefully fix it!
The logic here is inserted directly into the `src/ci/docker/run.sh` script to
download an image based on a shasum of the `Dockerfile` and other assorted files.
This blob, if found, is loaded into docker and we record what layers were
inserted. After docker finishes the build (hopefully quickly with lots of cache
hits) we then see the sha of the final image. If it's one of the layers we
loaded then there's no need to update the cache. Otherwise we upload our layers
to the global cache, possibly overwriting what we previously just downloaded.
This is hopefully a step towards mitigating #49278 although it doesn't
completely fix it as it means we'll still probably have to retry builds that
bust the cache.
They should at least be that, they usually warn about control flow mismatches,
and or the profile being useless, which looks like at least a warning to me.
Signed-off-by: Emilio Cobos Álvarez <emilio@crisal.io>
Executables crash in the probestack function otherwise... I haven't debugged
much further than that.
Signed-off-by: Emilio Cobos Álvarez <emilio@crisal.io>
Otherwise lprofGetHostName, used by the PGO generator, won't be available.
This means that PGO and coverage profiling would be restricted to systems with
uname, but that seems acceptable.
Signed-off-by: Emilio Cobos Álvarez <emilio@crisal.io>
adds simd_select intrinsic
The select SIMD intrinsic is used to select elements from two SIMD vectors using a mask:
```rust
let mask = b8x4::new(true, false, false, true);
let a = f32x4::new(1., 2., 3., 4.);
let b = f32x4::new(5., 6., 7., 8.);
assert_eq!(simd_select(mask, a, b), f32x4::new(1., 6., 7., 4.));
```
The number of lanes between the mask and the vectors must match, but the vector width of the mask does not need to match that of the vectors. The mask is required to be a vector of signed integers.
Note: this intrinsic will be exposed via `std::simd`'s vector masks - users are not expected to use it directly.
We've made headway towards splitting the test suite across two appveyor builders
and this moves one more tests suite between builders. The last [failed
build][fail] had its longest running test suite and I've moved that to the
secondary builder.
cc #48844
[fail]: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6782
Stabilize the copy_closures and clone_closures features
In addition to the `Fn*` family of traits, closures now implement `Copy` (and similarly `Clone`) if all of the captures do.
Tracking issue: https://github.com/rust-lang/rust/issues/44490
Allow installing rustfmt without config.extended
This assertion was preventing `./x.py install rustfmt` if attempted
without an "extended" build configuration, but it actually builds and
installs just fine.
Remove slow HashSet during miri stack frame creation
fixes#49237
probably has a major impact on #48846
r? @michaelwoerister
cc @eddyb I know you kept telling me to use vectors instead of hash containers... Now I know why.
Fix DefKey lookup for proc-macro crates.
Add a special case for proc-macro crates for `def_key()` in the metadata decoder (like we already have for many other methods in there). In the long run, it would be preferable to get rid of the need for special casing proc-macro crates (see #49271).
Fixes https://github.com/rust-lang/rust/issues/48739 (though I wasn't able to come up with a regression test, unfortunately)
r? @eddyb