These are some samples that I have been focusing on improving over
time. In this PR, I mainly want to stem the bleeding where, in some
cases, we show an error that gives you no possible way to divine the
problem.
Fix for bootstrapping on NixOS
NixOS puts Linux's dynamic loader in a weird place. Detect when we're on NixOS and patch the downloaded bootstrap executables appropriately.
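As a rough illustration of the detection step only: the sketch below assumes that NixOS can be recognised by an `ID=nixos` entry in `/etc/os-release`. The helper name `is_nixos` is hypothetical, and this is not the patch's actual logic (the bootstrap script itself is not written in Rust).

```rust
use std::fs;

/// Hypothetical helper: returns true if the given os-release contents
/// identify the distribution as NixOS (assumption: an `ID=nixos` line).
fn is_nixos(os_release: &str) -> bool {
    os_release.lines().any(|line| {
        let line = line.trim();
        line == "ID=nixos" || line == "ID=\"nixos\"" || line == "ID='nixos'"
    })
}

fn main() {
    // A missing file simply means "not NixOS" for this sketch.
    let contents = fs::read_to_string("/etc/os-release").unwrap_or_default();
    if is_nixos(&contents) {
        println!("NixOS detected: downloaded executables would need patching");
    } else {
        println!("not NixOS: downloaded executables can be used as-is");
    }
}
```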
appveyor: Move MSVC dist builds to their own builder
In the long run we want to separate out the dist builders from the test
builders. This provides us leeway to expand the dist builders with more tools
(e.g. Cargo and the RLS) without impacting cycle times.
Currently the Travis dist builders double-up the platforms they provide builds
for, so I figured we could try that out for MSVC as well. This commit adds a new
AppVeyor builder which runs a dist for all the MSVC targets:
* x86_64-pc-windows-msvc
* i686-pc-windows-msvc
* i586-pc-windows-msvc
If this takes too long and/or times out we'll need to split this up. In any case
we're going to need more capacity from AppVeyor because the two
pc-windows-gnu targets can't cross compile, so we need at least 2 more builders
no matter what.
1. Clarify that `String::split_off` returns one string and modifies self
in-place. The documentation implied that it returns two new strings.
2. Make the documentation mirror `Vec::split_off`.
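A small example of the behaviour the clarified documentation describes: `split_off` shortens `self` in place and returns only the tail as a new `String`. The function name `split_demo` is just for illustration.

```rust
/// Splits "Hello, World!" at byte index 7 and returns (what is left in
/// the original string, the newly returned string).
fn split_demo() -> (String, String) {
    let mut hello = String::from("Hello, World!");
    // `hello` keeps bytes [0, 7); the returned string holds bytes [7, len).
    let world = hello.split_off(7);
    (hello, world)
}

fn main() {
    let (head, tail) = split_demo();
    println!("{:?} / {:?}", head, tail);
}
```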
* Update bootstrap to recognize the cputype 'sparcv9' (used on Solaris)
* Change to never use -fomit-frame-pointer on Solaris or for sparc
* Add rust target sparcv9-sun-solaris
Fixes #39901
Adaptive hashmap implementation
All credits to @pczarn who wrote https://github.com/rust-lang/rfcs/pull/1796 and https://github.com/contain-rs/hashmap2/pull/5
**Background**
The Rust std lib hashmap puts a strong emphasis on security. We made some improvements in https://github.com/rust-lang/rust/pull/37470, but in some very specific cases and for non-default hashers it's still vulnerable (see #36481).
This is a simplified version of https://github.com/rust-lang/rfcs/pull/1796 proposal sans switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself.
**Proposal**
This PR adds code that checks for extra-long probe and shift lengths (see code comments and https://github.com/rust-lang/rfcs/pull/1796 for details). When those are encountered, the hashmap will grow (even if the capacity limit is not reached yet), _greatly_ attenuating the degenerate performance case.
We need a lower bound on the minimum occupancy that may trigger the early resize, otherwise in extreme cases it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (it could potentially be exploited between 0-50% occupancy) but makes it completely safe.
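The decision described above can be sketched roughly as follows. This is a simplified model, not the PR's code: the type `EarlyResize`, its methods, and the threshold value `DISPLACEMENT_LIMIT` are all hypothetical. Only the "half of the max occupancy" lower bound comes from the text.

```rust
/// Hypothetical threshold: a probe or shift longer than this is treated
/// as a sign of degenerate behaviour.
const DISPLACEMENT_LIMIT: usize = 128;

/// Hypothetical bookkeeping for the early-resize decision.
struct EarlyResize {
    long_displacement_seen: bool,
}

impl EarlyResize {
    fn new() -> Self {
        EarlyResize { long_displacement_seen: false }
    }

    /// Called after an insert with the probe and shift lengths it needed.
    fn record_insert(&mut self, probe_len: usize, shift_len: usize) {
        if probe_len > DISPLACEMENT_LIMIT || shift_len > DISPLACEMENT_LIMIT {
            self.long_displacement_seen = true;
        }
    }

    /// Grow early only if a long displacement was seen AND the map is at
    /// least half as full as the occupancy that triggers a normal resize.
    /// The lower bound keeps a CPU attack from becoming a memory attack.
    fn should_grow(&self, len: usize, max_occupancy: usize) -> bool {
        self.long_displacement_seen && len >= max_occupancy / 2
    }
}

fn main() {
    let mut state = EarlyResize::new();
    state.record_insert(3, 200);
    println!("grow at len 10 of 100: {}", state.should_grow(10, 100));
    println!("grow at len 60 of 100: {}", state.should_grow(60, 100));
}
```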
**Drawbacks**
* May interact badly with poor hashers. Maps using those may not use the desired capacity.
* It adds 2-3 branches to the common insert path, luckily those are highly predictable and there's room to shave some in future patches.
* May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor.
**Example**
Example code that exploits the exposure of iteration order and a weak hasher.
```
const MERGE: usize = 10_000usize;
#[bench]
fn merge_dos(b: &mut Bencher) {
    let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect();
    let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect();
    b.iter(|| {
        let mut merged = first_map.clone();
        for (&k, &v) in &second_map {
            merged.insert(k, v);
        }
        ::test::black_box(merged);
    });
}
```
`_91` is stdlib and `_ad` is patched (the end capacity is the same in both cases):
```
running 2 tests
test _91::merge_dos ... bench: 47,311,843 ns/iter (+/- 2,040,302)
test _ad::merge_dos ... bench: 599,099 ns/iter (+/- 83,270)
```
This is a simple way to work around the debugging issues caused by the rustc
wrapper used in the bootstrap process. Namely, it uses some obscure environment
variables, so you can’t just copy the failed command and run it in the shell or
debugger to examine the failure more closely.
With `--on-fail` it’s possible to run an arbitrary command within exactly the
same environment under which rustc failed. There are multiple ways to use this
new flag:
    $ python x.py build --stage=1 --on-fail=env
prints a list of environment variables and the failed command, so after a
few copy-pastes you can run the same rustc invocation in your shell outside the
bootstrap system.
    $ python x.py build --stage=1 --on-fail=bash
is a more useful variation of the command above in that it launches a whole
shell with the environment already in place! All that’s left to do is copy-paste
the command just above the shell prompt!
Fixes #38686
Fixes #38221
Higher-ranked object types can otherwise cause late-bound regions to
sneak into the substs, leading to the false conclusion that some method
is unreachable. The heart of this patch is from @arielb1.
travis: Disable source tarballs on most builders
Currently we create a source tarball on almost all of the `DEPLOY=1` builders,
but this has the adverse side effect of the source tarballs overwriting
each other in the S3 bucket. Normally this is OK, but unfortunately a source
tarball created on Windows is not buildable on Unix.
On Windows the vendored sources contain paths with `\` characters in them, which
when interpreted on Unix end up in "file not found" errors.
Instead of this overwriting behavior, whitelist just one Linux builder for
producing tarballs and avoid producing tarballs on all other hosts.
use bash when invoking dist shell scripts on solaris
Partially fixes #25845
A separate, trivial fix is needed to the rust-installer scripts to completely resolve this issue.
sys/mod doc update and mod import order adjust
* Some doc updates.
* Racer currently uses the first mod it finds regardless of cfg attrs. Moving `#[cfg(unix)]` up should be a temporary tweak that works as expected for more people.
Vec, LinkedList, VecDeque, String, and Option NatVis visualizations
I've added some basic [NatVis](https://msdn.microsoft.com/en-us/library/jj620914.aspx) visualizations for core Rust collections and types. This helps address a need filed in issue #36503. NatVis visualizations are similar to gdb/lldb pretty printers, but for windbg and the Visual Studio debugger on Windows.
For example, Vec without the supplied NatVis looks like this in windbg using the "dx" command:
```
0:000> dx some_64_bit_vec
some_64_bit_vec [Type: collections::vec::Vec<u64>]
[+0x000] buf [Type: alloc::raw_vec::RawVec<u64>]
[+0x010] len : 0x4 [Type: unsigned __int64]
```
With the NatVis, the elements of the Vec are displayed:
```
0:000> dx some_64_bit_vec
some_64_bit_vec : { size=0x4 } [Type: collections::vec::Vec<u64>]
[<Raw View>] [Type: collections::vec::Vec<u64>]
[size] : 0x4 [Type: unsigned __int64]
[capacity] : 0x4 [Type: unsigned __int64]
[0] : 0x4 [Type: unsigned __int64]
[1] : 0x4f [Type: unsigned __int64]
[2] : 0x1a [Type: unsigned __int64]
[3] : 0x184 [Type: unsigned __int64]
```
In fact, the vector can be treated as an array by the NatVis expression evaluator:
```
0:000> dx some_64_bit_vec[2]
some_64_bit_vec[2] : 0x1a [Type: unsigned __int64]
```
In general, it works with any NatVis command that understands collections, such as NatVis LINQ expressions:
```
0:000> dx some_64_bit_vec.Select(x => x * 2)
some_64_bit_vec.Select(x => x * 2)
[0] : 0x8
[1] : 0x9e
[2] : 0x34
[3] : 0x308
```
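For comparison, the Rust-side equivalent of that LINQ query is an ordinary iterator `map`. The snippet below is only a sketch: the variable contents are taken from the debugger output above, and the helper name `doubled` is made up for illustration.

```rust
/// Doubles every element, mirroring `Select(x => x * 2)` in the debugger.
fn doubled(values: &[u64]) -> Vec<u64> {
    values.iter().map(|x| x * 2).collect()
}

fn main() {
    // Same contents as the vector shown in the windbg session.
    let some_64_bit_vec: Vec<u64> = vec![0x4, 0x4f, 0x1a, 0x184];
    for (i, v) in doubled(&some_64_bit_vec).iter().enumerate() {
        println!("[{}] : {:#x}", i, v);
    }
}
```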
std::string::String is implemented, as well:
```
0:000> dv
hello_world = "Hello, world!"
empty = ""
new = ""
0:000> dx hello_world
hello_world : "Hello, world!" [Type: collections::string::String]
[<Raw View>] [Type: collections::string::String]
[size] : 0xd [Type: unsigned __int64]
[capacity] : 0xd [Type: unsigned __int64]
[0] : 72 'H' [Type: char]
[1] : 101 'e' [Type: char]
...
[12] : 33 '!' [Type: char]
0:000> dx empty
empty : "" [Type: collections::string::String]
[<Raw View>] [Type: collections::string::String]
[size] : 0x0 [Type: unsigned __int64]
[capacity] : 0x0 [Type: unsigned __int64]
```
VecDeque and LinkedList are also implemented.
My biggest concern is the implementation for Option due to the different layouts it can receive based on whether the sentinel value can be embedded within the Some value or must be stored separately.
It seems to work, but my testing isn't exhaustive:
```
0:000> dv
three = { Some 3 }
none = { None }
no_str = { None }
some_str = { Some "Hello!" }
0:000> dx three
three : { Some 3 } [Type: core::option::Option<i32>]
[<Raw View>] [Type: core::option::Option<i32>]
[size] : 0x1 [Type: ULONG]
[value] : 3 [Type: int]
[0] : 3 [Type: int]
0:000> dx none
none : { None } [Type: core::option::Option<i32>]
[<Raw View>] [Type: core::option::Option<i32>]
[size] : 0x0 [Type: ULONG]
[value] : 4 [Type: int]
0:000> dx no_str
no_str : { None } [Type: core::option::Option<collections::string::String>]
[<Raw View>] [Type: core::option::Option<collections::string::String>]
[size] : 0x0 [Type: ULONG]
0:000> dx some_str
some_str : { Some "Hello!" } [Type: core::option::Option<collections::string::String>]
[<Raw View>] [Type: core::option::Option<collections::string::String>]
[size] : 0x1 [Type: ULONG]
[value] : 0x4673df710 : "Hello!" [Type: collections::string::String *]
[0] : "Hello!" [Type: collections::string::String]
```
For now all of these visualizations work in windbg, but I've only gotten the visualizations in libcore.natvis working in the VS debugger. My priority is windbg, but somebody else may be interested in investigating the issues related to VS.
You can load these visualizations into a windbg session using the .nvload command:
```
0:000> .nvload ..\rust\src\etc\natvis\libcollections.natvis; .nvload ..\rust\src\etc\natvis\libcore.natvis
Successfully loaded visualizers in "..\rust\src\etc\natvis\libcollections.natvis"
Successfully loaded visualizers in "..\rust\src\etc\natvis\libcore.natvis"
```
There are some issues with the symbols that Rust and LLVM conspire to emit into the PDB that inhibit debugging in windbg generally, and by extension make writing visualizations more difficult. Additionally, there are some bugs in windbg itself that complicate or disable some use of the NatVis visualizations for Rust. Significantly, due to NatVis limitations in windbg around allowable type names, you cannot write a visualization for [T] or str. I'll report separate issues as I isolate them.
In the near term, I hope to fill out these NatVis files with more of Rust's core collections and types. In the long run, I hope that we can ship NatVis files with crates and streamline their deployment when debugging Rust programs on Windows.
Allow more Cell methods for non-Copy types
Clearly, `get_mut` is safe for any `T`. The other two only provide unsafe pointers anyway.
The only remaining inherent method with `Copy` bound is `get`, which sounds about right to me.
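A short sketch of what this enables, using a `Cell<String>` (a non-`Copy` payload). `as_ptr` appears here only as an example of a method that hands out a raw pointer; which two methods "the other two" refers to is not spelled out above, so treat that choice as an assumption. The helper `append_world` is hypothetical.

```rust
use std::cell::Cell;

/// `get_mut` needs `&mut Cell<T>`, so the borrow checker already
/// guarantees exclusive access; no `Copy` bound is required.
fn append_world(cell: &mut Cell<String>) {
    cell.get_mut().push_str(", world");
}

fn main() {
    let mut greeting = Cell::new(String::from("Hello"));
    append_world(&mut greeting);

    // `as_ptr` only returns a raw pointer; dereferencing it is `unsafe`
    // and remains the caller's responsibility.
    let ptr: *mut String = greeting.as_ptr();
    let len = unsafe { (*ptr).len() };
    println!("{} bytes", len);

    println!("{}", greeting.into_inner());
}
```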
I found the order of `impl` blocks in the file a little weird (first inherent impl, then some trait impls, then another inherent impl), but didn't change it to keep the diff small.
Contributes to #39264
book: don’t use GNU extensions in the example unnecessarily
The use of a GNU C extension for block expressions is immaterial to the
actual problem with C macros that the section tries to show, so don’t
use it and instead use a plain C way of writing the macro, which has
the added benefit of being better C code (since the macro now behaves like
a function, syntax-wise).
This commit changes all MSVC rustc binaries to be compiled with
`-C target-feature=+crt-static` to link statically against the MSVCRT instead of
dynamically (as they do today). This also necessitates compiling LLVM in a
different fashion, ensuring it's compiled with `/MT` instead of `/MD`.
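As a side note on how the flag surfaces in code: a crate can check at compile time whether `crt-static` is in effect through the `target_feature` cfg. The sketch below is only illustrative and is not part of this commit. The function name `crt_linkage` is hypothetical.

```rust
/// Reports how the C runtime is linked for the current compilation,
/// based on whether the `crt-static` target feature is enabled.
fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}

fn main() {
    println!("C runtime linkage: {}", crt_linkage());
}
```

Building with `-C target-feature=+crt-static` should flip the reported value on targets that support the feature.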
cc #37406
Stabilize field init shorthand
Closes #37340.
~Still blocked by the documentation issue #38830.~ EDIT: seems that all parts required for stabilisation are fixed, so it's not blocked.
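For reference, a minimal example of the shorthand being stabilized: when a local variable has the same name as the field, `Point { x, y }` is equivalent to `Point { x: x, y: y }`. The `Point` type and function names are illustrative only.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// The pre-existing, fully spelled-out form.
fn longhand(x: i32, y: i32) -> Point {
    Point { x: x, y: y }
}

/// The field init shorthand: the field name doubles as the variable name.
fn shorthand(x: i32, y: i32) -> Point {
    Point { x, y }
}

fn main() {
    println!("{:?}", shorthand(1, 2));
    println!("{:?}", longhand(1, 2));
}
```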
Don't segfault if btree range is not in order
This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.
Since a user-defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented @gereeter's idea: walk the tree while searching for both keys simultaneously, to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case.
Open questions:
1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.
The same question for other weird cases:
2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5.
4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?
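To make the "return an empty iterator" alternative from the open questions concrete, here is a caller-side sketch. It assumes the behaviour documented for `BTreeMap::range` in current std (panic when start > end, or when start == end and both bounds are `Excluded`). The helpers `is_backwards` and `keys_in_range` are hypothetical and not part of this patch.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{self, Excluded, Included};

/// Hypothetical check for the bound pairs that `range` would reject.
fn is_backwards(lo: &Bound<i32>, hi: &Bound<i32>) -> bool {
    match (lo, hi) {
        // Equal keys are only a problem when both ends are excluded.
        (Excluded(a), Excluded(b)) => a >= b,
        (Included(a), Included(b))
        | (Included(a), Excluded(b))
        | (Excluded(a), Included(b)) => a > b,
        // Any unbounded end cannot be backwards.
        _ => false,
    }
}

/// Returns the keys in the range, or nothing for a backwards range,
/// instead of letting `range` panic.
fn keys_in_range(map: &BTreeMap<i32, i32>, lo: Bound<i32>, hi: Bound<i32>) -> Vec<i32> {
    if is_backwards(&lo, &hi) {
        return Vec::new();
    }
    map.range::<i32, _>((lo, hi)).map(|(k, _)| *k).collect()
}

fn main() {
    let map: BTreeMap<i32, i32> = [1, 2, 3, 5].iter().map(|&k| (k, k * 10)).collect();
    println!("{:?}", keys_in_range(&map, Included(2), Included(3)));
    println!("{:?}", keys_in_range(&map, Included(101), Excluded(100)));
    println!("{:?}", keys_in_range(&map, Excluded(5), Excluded(5)));
    println!("{:?}", keys_in_range(&map, Included(5), Excluded(5)));
}
```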