Add slice::sort_by_cached_key as a memoised sort_by_key
At present, `slice::sort_by_key` calls its key function twice for each comparison that is made. When the key function is expensive (which can often be the case when `sort_by_key` is chosen over `sort_by`), this can lead to very suboptimal behaviour.
To address this, I've introduced a new slice method, `sort_by_cached_key`, which has identical semantic behaviour to `sort_by_key`, except that it guarantees the key function will only be called once per element.
Where there are `n` elements and the key function is `O(m)`:
- `slice::sort_by_cached_key` time complexity is `O(m n log m n)`, compared to `slice::sort_by_key`'s `O(m n + n log n)`.
- `slice::sort_by_cached_key` space complexity remains at `O(n + m)`. (Technically, it now reserves a slice of size `n`, whereas before it reserved a slice of size `n/2`.)
`slice::sort_unstable_by_key` has not been given an analogue, as it is important that unstable sorts are in-place, which is not a property that is guaranteed here. However, this also means that `slice::sort_unstable_by_key` is likely to be slower than `slice::sort_by_cached_key` when the key function does not have negligible complexity. We might want to explore this trade-off further in the future.
Benchmarks (for a vector of 100 `i32`s):
```
# Lexicographic: `|x| x.to_string()`
test bench_sort_by_key ... bench: 112,638 ns/iter (+/- 19,563)
test bench_sort_by_cached_key ... bench: 15,038 ns/iter (+/- 4,814)
# Identity: `|x| *x`
test bench_sort_by_key ... bench: 1,346 ns/iter (+/- 238)
test bench_sort_by_cached_key ... bench: 1,839 ns/iter (+/- 765)
# Power: `|x| x.pow(31)`
test bench_sort_by_key ... bench: 3,624 ns/iter (+/- 738)
test bench_sort_by_cached_key ... bench: 1,997 ns/iter (+/- 311)
# Abs: `|x| x.abs()`
test bench_sort_by_key ... bench: 1,546 ns/iter (+/- 174)
test bench_sort_by_cached_key ... bench: 1,668 ns/iter (+/- 790)
```
(So it seems functions that are single operations do perform slightly worse with this method, but for pretty much any more complex key, you're better off with this optimisation.)
I've definitely found myself using expensive keys in the past and wishing this optimisation was made (e.g. for https://github.com/rust-lang/rust/pull/47415). This feels like both desirable and expected behaviour, at the small cost of slightly more stack allocation and minute degradation in performance for extremely trivial keys.
Resolves#34447.
implement minmax intrinsics
This adds the `simd_{fmin,fmax}` intrinsics, which do a vertical (lane-wise) `min`/`max` for floating point vectors that's equivalent to Rust's `min`/`max` for `f32`/`f64`.
It might make sense to make `{f32,f64}::{min,max}` use the `minnum` and `minmax` intrinsics as well.
---
~~HELP: I need some help with these. Either I should go to sleep or there must be something that I must be missing. AFAICT I am calling the `maxnum` builder correctly, yet rustc/LLVM seem to insert a call to `llvm.minnum` there instead...~~ EDIT: Rust's LLVM version is too old :/
rustbuild: Fail the build if we build Cargo twice
This commit updates the `ToolBuild` step to stream Cargo's JSON messages, parse
them, and record all libraries built. If we build anything twice (aka Cargo)
it'll most likely happen due to dependencies being recompiled which is caught by
this check.
This commit updates the `ToolBuild` step to stream Cargo's JSON messages, parse
them, and record all libraries built. If we build anything twice (aka Cargo)
it'll most likely happen due to dependencies being recompiled which is caught by
this check.
Fix confusing doc for `scan`
The comment "the value passed on to the next iteration" confused me since it sounded more like what Haskell's [scanl](http://hackage.haskell.org/package/base-4.11.0.0/docs/Prelude.html#v:scanl) does where the closure's return value serves as both the "yielded value" *and* the new value of the "state".
I tried changing the example to make it clear that the closure's return value is decoupled from the state argument.
rustbuild: Disable docs on cross-compiled builds
This commit disables building documentation on cross-compiled compilers, for
example ARM/MIPS/PowerPC/etc. Currently I believe we're not getting much use out
of these documentation artifacts and they often take 10-15 minutes total to
build as it requires building rustdoc/rustbook and then also generating all the
documentation, especially for the reference and the book itself.
In an effort to cut down on the amount of work that we're doing on dist CI
builders in light of recent timeouts this was some relatively low hanging fruit
to cut which in theory won't have much impact on the ecosystem in the hopes that
the documentation isn't used too heavily anyway.
While initial analysis in #48827 showed only shaving 5 minutes off local builds
the same 5 minute conclusion was drawn from #48826 which ended up having nearly
a half-hour impact on the bots. In that sense I'm hoping that we can land this
and test out what happens on CI to see how it affects timing.
Note that all tier 1 platforms, Windows, Mac, and Linux, will continue to
generate documentation.
Use an uninitialized buffer in GenericRadix::fmt_int, like in Display::fmt for numeric types
The code using a slice of that buffer is only ever going to use
bytes that are subsequently initialized.
Document when types have OS-dependent sizes
As per issue #43601, types that can change size depending on the
target operating system should say so in their documentation.
I used this template when adding doc comments:
```
The size of a(n) <name> struct may vary depending on the target
operating system, and may change between Rust releases.
```
For enums, I used "instance" instead of "struct".
I added documentation to these types:
```
- std::time::Instant (contains sys::time::Instant)
- std::time::SystemTime (contains sys::time::SystemTime)
- std::io::StdinRaw (contains sys::stdio::Stdin)
- std::io::StdoutRaw (contains sys::stdio::Stdout)
- std::io::Stderr (contains sys::stdio::Stderr)
- std::net::addr::SocketAddrV4 (contains sys::net::netc::sockaddr_in)
- std::net::addr::SocketAddrV6 (contains sys::net::netc::sockaddr_in6)
- std::net::addr::SocketAddr (contains std::net::addr::SocketAddrV4 and SocketAddrV6)
- std::net::ip::Ipv4Addr (contains sys::net::netc::in_addr)
- std::net::ip::Ipv6Addr (contains sys::net::netc::in6_addr)
- std::net::ip::IpAddr (contains std::net::ip::Ipv4Addr and Ipv6Addr)
```
I also found that these types varied in size; however, I don't think they need documentation, as it's already fairly obvious that they change based on different OS's:
```
- std::fs::DirBuilder (contains sys::fs::DirBuilder)
- std::fs::FileType (contains sys::fs::FileType)
- std::fs::Permissions (contains sys::fs::FilePermissions)
- std::fs::OpenOptions (contains sys::fs::OpenOptions)
- std::fs::DirEntry (contains sys::fs::DirEntry)
- std::fs::ReadDir (contains sys::fs::ReadDir)
- std::fs::Metadata (contains sys::fs::FileAttr)
- std::fs::File (contains sys::fs::File)
- std::process::Child (contains sys::process::Process)
- std::process::ChildStdin (contains sys::process::AnonPipe)
- std::process::ChildStdout (contains sys::process::AnonPipe)
- std::process::ChildStderr (contains sys::process::AnonPipe)
- std::process::Command (contains sys::process::Command)
- std::process::Stdio (contains sys::process::Stdio)
- std::process::ExitStatus (contains sys::process::ExitStatus)
- std::process::Output (contains std::process::ExitStatus)
- std::sys_common::condvar::Condvar (contains sys::condvar::Condvar)
- std::sys_common::mutex::Mutex (contains sys::mutex::Mutex)
- std::sys_common::net::LookupHost (contains sys::net::netc::addrinfo)
- std::sys_common::net::TcpStream (contains sys::net::Socket)
- std::sys_common::net::TcpListener (contains sys::net::Socket)
- std::sys_common::net::UdpSocket (contains sys::net::Socket)
- std::sys_common::remutex::ReentrantMutex (contains sys::mutex::ReentrantMutex)
- std::sys_common::rwlock::RWLock (contains sys::rwlock::RWLock)
- std::sys_common::thread_local::Key (contains sys::thread_local::Key)
```
Maybe we should just put a comment about the size of structs in the module-level docs for `fs`, `process`, and `sys_common`?
If anyone can think of other types that can change size, comment below. I'm also open to changing the wording.
closes#43601.
Some comments and documentation for name resolution crate
Hello
I'm trying to get a grasp of how the name resolution crate works, as part of helping with https://github.com/rust-lang-nursery/rustc-guide/issues/16. Not that I'd be succeeding much, but as I was reading the code, I started to put some notes into it, to help me understand.
I guess I didn't get very far yet, but I'd like to share what I have, in case it might be useful for someone else. I hope these are correct (even if incomplete), but I'll be glad for a fast check in case I put something misleading there.
Add basic PGO support.
This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses), `-C pgo-use` and `-C pgo-gen`.
See each commit for details.
This commit disables building documentation on cross-compiled compilers, for
example ARM/MIPS/PowerPC/etc. Currently I believe we're not getting much use out
of these documentation artifacts and they often take 10-15 minutes total to
build as it requires building rustdoc/rustbook and then also generating all the
documentation, especially for the reference and the book itself.
In an effort to cut down on the amount of work that we're doing on dist CI
builders in light of recent timeouts this was some relatively low hanging fruit
to cut which in theory won't have much impact on the ecosystem in the hopes that
the documentation isn't used too heavily anyway.
While initial analysis in #48827 showed only shaving 5 minutes off local builds
the same 5 minute conclusion was drawn from #48826 which ended up having nearly
a half-hour impact on the bots. In that sense I'm hoping that we can land this
and test out what happens on CI to see how it affects timing.
Note that all tier 1 platforms, Windows, Mac, and Linux, will continue to
generate documentation.
appveyor: Move run-pass-fulldeps to extra builders
We've made headway towards splitting the test suite across two appveyor builders
and this moves one more tests suite between builders. The last [failed
build][fail] had its longest running test suite and I've moved that to the
secondary builder.
cc #48844
[fail]: https://ci.appveyor.com/project/rust-lang/rust/build/1.0.6782
Introduce unsafe offset_from on pointers
Adds intrinsics::exact_div to take advantage of the unsafe, which reduces the implementation from
```asm
sub rcx, rdx
mov rax, rcx
sar rax, 63
shr rax, 62
lea rax, [rax + rcx]
sar rax, 2
ret
```
down to
```asm
sub rcx, rdx
sar rcx, 2
mov rax, rcx
ret
```
(for `*const i32`)
See discussion on the `offset_to` tracking issue https://github.com/rust-lang/rust/issues/41079
Some open questions
- Would you rather I split the intrinsic PR from the library PR?
- Do we even want the safe version of the API? https://github.com/rust-lang/rust/issues/41079#issuecomment-374426786 I've added some text to its documentation that even if it's not UB, it's useless to use it between pointers into different objects.
and todos
- [x] ~~I need to make a codegen test~~ Done
- [x] ~~Can the subtraction use nsw/nuw?~~ No, it can't https://github.com/rust-lang/rust/pull/49297#discussion_r176697574
- [x] ~~Should there be `usize` variants of this, like there are now `add` and `sub` that you almost always want over `offset`? For example, I imagine `sub_ptr` that returns `usize` and where it's UB if the distance is negative.~~ Can wait for later; C gives a signed result https://github.com/rust-lang/rust/issues/41079#issuecomment-375842235, so we might as well, and this existing to go with `offset` makes sense.
Pass --strip-debug to GccLinker when building without debuginfo
C.f. #46034
---
This brings a hello-world built by passing rustc no command line options from 2.9M to 592K on Linux.
(This might need to special case MacOS or Windows, not sure if the linkers there support `--strip-debug`, and there is an annoying lack of dependable docs for the linkers there.)