speed up mem::swap
I would have thought that the mem::swap code didn't need an intermediate variable precisely because the pointers are guaranteed never to alias. And.. it doesn't! It seems that llvm will also auto-vectorize this case for large structs, but alas it doesn't seem to have all the aliasing info it needs and so will add redundant checks (and even not bother with autovectorizing for small types). Looks like a lot of performance could still be gained here, so this might be a good test case for future optimizer improvements.
Here are the current benchmarks for the simd version of mem::swap; the timings are in cycles (code below) measured with 10 iterations. The timings for sizes > 32 which are not a multiple of 8 tend to be ever so slightly faster in the old code, but not always. For large struct sizes (> 1024) the new code shows a marked improvement.
\* = latest commit
† = subtracted from other measurements
| arr_length | noop<sup>†</sup> | rust_stdlib | simd_u64x4\* | simd_u64x8
|------------------|------------|-------------------|-------------------|-------------------
8|80|90|90|90
16|72|177|177|177
24|32|76|76|76
32|68|188|112|188
40|32|80|60|80
48|32|84|56|84
56|32|108|72|108
64|32|108|72|76
72|80|350|220|230
80|80|350|220|230
88|80|420|270|270
96|80|420|270|270
104|80|500|320|320
112|80|490|320|320
120|72|528|342|342
128|48|360|234|234
136|72|987|387|387
144|80|1070|420|420
152|64|856|376|376
160|68|804|400|400
168|80|1060|520|520
176|80|1070|520|520
184|32|464|228|228
192|32|504|228|228
200|32|440|248|248
208|72|987|573|573
216|80|1464|220|220
224|48|852|450|450
232|72|1182|666|666
240|32|428|288|288
248|32|428|308|308
256|80|860|770|770
264|80|1130|820|820
272|80|1340|820|820
280|80|1220|870|870
288|72|1227|804|804
296|72|1356|849|849
We now fetch source lines from the `external_src` member as a secondary
fallback if no regular source is present, that is, if the file map
belongs to an external crate and the source has been fetched from disk.
Only emit one error for `use foo::self;`
Currently `use foo::self;` would emit both E0429 and E0432. This commit silence the latter one (assuming `foo` is a valid module).
Fixes#42559
Disentangle InferCtxt, MemCategorizationContext and ExprUseVisitor.
At some point in the past, `InferCtxt` started being used to replace an old "`Typer`" abstraction, which provided access to `TypeckTables` and had optionally type inference to account for.
That didn't play so nicely with the `'gcx`/`'tcx` split and I had to introduce `borrowck_fake_infer_ctxt`.
The situation wasn't great but it wasn't too painful inside `rustc` itself.
Recently I've found that method being used in clippy, which does need EUV (before we make it plausible to run lints on HAIR or MIR), and set out to separate inference from tables, for the sake of lint authors.
Also fixes#42435 to make it trivial to compute type layout or use EUV from lints.
The remaining uses of `TypeckTables` in `InferCtxt` are for closure kinds and signatures, used in trait selection and projection normalization. The solution there is likely to add them as bounds to `ParamEnv`.
r? @nikomatsakis
cc @mcarton @llogiq @Manishearth
They are now handled in their own member to prevent mutating access to
the `src` member. This way, we can safely load external sources, while
keeping the mutation of local source strings off-limits.
The rationale is that BOM stripping is needed for lazy source loading
for external crates, and duplication can be avoided by moving the
corresponding functionality to libsyntax_pos.
We can use these to perform lazy loading of source files belonging to
external crates. That way we will be able to show the source code of
external spans that have been translated.
Get LLVM to stop generating dead assembly in next_power_of_two
It turns out that LLVM can turn `@llvm.ctlz.i64(_, true)` into `@llvm.ctlz.i64(_, false)` ([`ctlz`](http://llvm.org/docs/LangRef.html#llvm-ctlz-intrinsic)) where valuable, but never does the opposite. That leads to some silly assembly getting generated in certain cases.
A contrived-but-clear example https://is.gd/VAIKuC:
```rust
fn foo(x:u64) -> u32 {
if x == 0 { return !0; }
x.leading_zeros()
}
```
Generates
```asm
testq %rdi, %rdi
je .LBB0_1
je .LBB0_3 ; <-- wha?
bsrq %rdi, %rax
xorq $63, %rax
retq
.LBB0_1:
movl $-1, %eax
retq
.LBB0_3:
movl $64, %eax ; <-- dead
retq
```
I noticed this in `next_power_of_two`, which without this PR generates the following:
```asm
cmpq $2, %rcx
jae .LBB1_2
movl $1, %eax
retq
.LBB1_2:
decq %rcx
je .LBB1_3
bsrq %rcx, %rcx
xorq $63, %rcx
jmp .LBB1_5
.LBB1_3:
movl $64, %ecx ; <-- dead
.LBB1_5:
movq $-1, %rax
shrq %cl, %rax
incq %rax
retq
```
And with this PR becomes
```asm
cmpq $2, %rcx
jae .LBB0_2
movl $1, %eax
retq
.LBB0_2:
decq %rcx
bsrq %rcx, %rcx
xorl $63, %ecx
movq $-1, %rax
shrq %cl, %rax
incq %rax
retq
```
Speed up expansion
This reduces duplication, thereby increasing expansion speed. Based on tests with rust-uinput, this produces a 29x performance win (440 seconds to 15 seconds). I want to land this first, since it's a minimal patch, but with more changes to the macro parsing I can get down to 12 seconds locally.
There is one FIXME added to the code that I'll keep for now since changing it will spread outward and increase the patch size, I think.
Fixes#37074.
r? @jseyfried
cc @oberien
Ignore variadic FFI test on AArch64
I've cross compiled Rust to `aarch64-linux-gnu`, and tried to run the compile-fail tests, but `variadic-ffi.rs` fails with the following error:
```
The ABI `"stdcall"` is not supported for the current target [E0570]
```
The test seems to be ignored on (32-bit) ARM, so I turned it off for AArch64 too.
Vec<T> is pronounced 'vec'
I've never heard it pronounced "vector". Is this an outdated recommendation?
(or have I been doing it wrong all this time)
r? @steveklabnik
Currently rustdoc will fail if passed `-o foo/doc` if the `foo`
directory doesn't exist.
Also remove unneeded `mkdir` as `create_dir_all` can now handle
concurrent invocations.
Fix GDB pretty-printer for tuples and pointers
Names of children should not be the same, because GDB uses them to distinguish the children.
|Before|After|
|---|---|
|![tuples_before](https://cloud.githubusercontent.com/assets/1297574/26527639/5d6cf10e-43a0-11e7-9498-abfcddb08055.png)|![tuples_after](https://cloud.githubusercontent.com/assets/1297574/26527655/9699233a-43a0-11e7-83c6-f58f713b51a0.png)|
`main.rs`
```rust
enum Test {
Zero,
One(i32),
Two(i32, String),
Three(i32, String, Vec<String>),
}
fn main() {
let tuple = (1, 2, "Asdfgh");
let zero = Test::Zero;
let one = Test::One(10);
let two = Test::Two(42, "Qwerty".to_owned());
let three = Test::Three(9000,
"Zxcvbn".to_owned(),
vec!["lorem".to_owned(), "ipsum".to_owned(), "dolor".to_owned()]);
println!(""); // breakpoint here
}
```
`launch.json`
```json
{
"version": "0.2.0",
"configurations": [
{
"type": "gdb",
"request": "launch",
"gdbpath": "rust-gdb",
"name": "Launch Program",
"valuesFormatting": "prettyPrinters", //this requires plugin Native Debug >= 0.20.0
"target": "./target/debug/test_pretty_printers",
"cwd": "${workspaceRoot}"
}
]
}
```
Fix translation of external spans
Previously, I noticed that spans from external crates don't generate any output. This limitation is problematic if analysis is performed on one or more external crates, as is the case with [rust-semverver](https://github.com/ibabushkin/rust-semverver). This change should address this behaviour, with the potential drawback that a minor performance hit is to be expected, as spans from potentially large crates have to be translated now.