Optimize `IntRange::from_pat`, then shrink `ParamEnv`
Resolves #77058.
r? `@Mark-Simulacrum`
cc `@vandenheuvel`
Looking at the output of `perf report` for #76244, the hot instructions seemed to be around the call to `pat_constructor` in `IntRange::from_pat`. I carried out an obvious optimization, but it actually made the instruction count higher (see #77075). However, it seems to have mitigated whatever was causing the pipeline stalls, so when combined with #76244, it's a net win.
As you can see below, the regression in #76244 seems to have originated from something measured by `stalled-cycles-backend`. I'll try to collect some finer-grained stats to see if I can isolate it. I wish I had a better idea of what was going on here. I'd like to prevent the regression from reappearing in the future due to small changes in unrelated code.
<details>
<summary>Current `master`:</summary>
```
Performance counter stats for 'cargo +baseline-stage1 check':

          2,275.67 msec task-clock:u              #   0.998 CPUs utilized
                 0      context-switches:u        #   0.000 K/sec
                 0      cpu-migrations:u          #   0.000 K/sec
            49,826      page-faults:u             #   0.022 M/sec
     5,117,221,678      cycles:u                  #   2.249 GHz
       299,655,943      stalled-cycles-frontend:u #   5.86% frontend cycles idle
     2,284,213,395      stalled-cycles-backend:u  #  44.64% backend cycles idle
     8,051,871,959      instructions:u            #    1.57 insn per cycle
                                                  #    0.28 stalled cycles per insn
     1,359,589,402      branches:u                # 597.447 M/sec
         7,359,347      branch-misses:u           #   0.54% of all branches

       2.281030026 seconds time elapsed

       2.108197000 seconds user
       0.164183000 seconds sys
```
</details>
<details>
<summary>Shrink `ParamEnv` without changing `IntRange::from_pat`:</summary>
```
Performance counter stats for 'cargo +perf-stage1 check':

          2,751.79 msec task-clock:u              #   0.996 CPUs utilized
                 0      context-switches:u        #   0.000 K/sec
                 0      cpu-migrations:u          #   0.000 K/sec
            50,103      page-faults:u             #   0.018 M/sec
     6,260,590,019      cycles:u                  #   2.275 GHz
       317,355,920      stalled-cycles-frontend:u #   5.07% frontend cycles idle
     3,397,743,582      stalled-cycles-backend:u  #  54.27% backend cycles idle
     8,276,224,367      instructions:u            #    1.32 insn per cycle
                                                  #    0.41 stalled cycles per insn
     1,370,453,386      branches:u                # 498.023 M/sec
         7,281,031      branch-misses:u           #   0.53% of all branches

       2.763265838 seconds time elapsed

       2.544578000 seconds user
       0.204548000 seconds sys
```
</details>
<details>
<summary>Shrink `ParamEnv` and change `IntRange::from_pat`: </summary>
```
Performance counter stats for 'cargo +perf-stage1 check':

          2,295.57 msec task-clock:u              #   0.996 CPUs utilized
                 0      context-switches:u        #   0.000 K/sec
                 0      cpu-migrations:u          #   0.000 K/sec
            49,959      page-faults:u             #   0.022 M/sec
     5,151,407,066      cycles:u                  #   2.244 GHz
       324,517,829      stalled-cycles-frontend:u #   6.30% frontend cycles idle
     2,301,671,001      stalled-cycles-backend:u  #  44.68% backend cycles idle
     8,130,868,329      instructions:u            #    1.58 insn per cycle
                                                  #    0.28 stalled cycles per insn
     1,356,618,512      branches:u                # 590.972 M/sec
         7,323,800      branch-misses:u           #   0.54% of all branches

       2.304509653 seconds time elapsed

       2.128090000 seconds user
       0.163909000 seconds sys
```
</details>
Remove `#[rustc_allow_const_fn_ptr]` and add `#![feature(const_fn_fn_ptr_basics)]`
`rustc_allow_const_fn_ptr` was a hack to work around the lack of an escape hatch for the "min `const fn`" checks in const-stable functions. Now that we have co-opted `allow_internal_unstable` for this purpose, we no longer need a bespoke attribute.
Now this functionality is gated under `const_fn_fn_ptr_basics` (how concise!), and `#[allow_internal_unstable(const_fn_fn_ptr_basics)]` replaces `#[rustc_allow_const_fn_ptr]`. `const_fn_fn_ptr_basics` allows function pointer types to appear in the arguments and locals of a `const fn` as well as function pointer casts to be performed inside a `const fn`. Both of these were allowed in constants and statics already. Notably, this does **not** allow users to invoke function pointers in a const context. Presumably, we will use a nicer name for that (`const_fn_ptr`?).
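To make the scope of the gate concrete, here is a minimal sketch of what it permits. The names `double`, `triple`, `pick`, `as_fn_ptr`, `PICKED`, and `apply_picked` are hypothetical and not taken from the compiler sources. With the toolchain this PR targets, the crate would also need `#![feature(const_fn_fn_ptr_basics)]`; I believe later stable releases accept the same code without the gate, but treat that as an assumption.

```rust
fn double(x: u32) -> u32 { x * 2 }
fn triple(x: u32) -> u32 { x * 3 }

// Function pointer types appear in the arguments and in a local.
// The pointers are only moved around here, never invoked.
const fn pick(a: fn(u32) -> u32, b: fn(u32) -> u32, first: bool) -> fn(u32) -> u32 {
    let chosen: fn(u32) -> u32 = if first { a } else { b };
    chosen
}

// A function-item-to-pointer cast performed inside a `const fn`.
const fn as_fn_ptr() -> fn(u32) -> u32 {
    double as fn(u32) -> u32
}

// Both `const fn`s are evaluated at compile time; `first` is false, so this is `triple`.
const PICKED: fn(u32) -> u32 = pick(as_fn_ptr(), triple, false);

// Invoking the pointer happens only at runtime, outside any const context.
pub fn apply_picked(x: u32) -> u32 {
    PICKED(x)
}
```

Calling `a(1)` inside `pick` would still be rejected, which is the part this gate deliberately leaves out.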
r? @oli-obk
diag: improve closure/generic parameter mismatch
Fixes #51154.
This PR improves the diagnostic when a type parameter is expected and a closure is found, noting that each closure has a distinct type and therefore could not always match the caller-chosen type of the parameter.
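For illustration, here is a hypothetical example of the kind of mismatch this diagnostic covers; the function names are mine and are not taken from the issue. The rejected form is kept in a comment so the block compiles, and the two functions below it sketch common ways to avoid the mismatch.

```rust
// Rejected: `F` is chosen by the caller, but the closure has its own unique
// type, so the compiler cannot assume the two match (a mismatched-types error).
//
// fn make_generic<F: Fn() -> u32>() -> Box<F> {
//     Box::new(|| 42)
// }

// Option 1: let the function itself pick the type and return it opaquely.
fn make_opaque() -> impl Fn() -> u32 {
    || 42
}

// Option 2: erase the closure's type behind a trait object.
fn make_boxed() -> Box<dyn Fn() -> u32> {
    Box::new(|| 42)
}
```

Both alternatives work because the callee, not the caller, decides the concrete closure type.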
r? @estebank
Return values up to 128 bits in registers
This fixes https://github.com/rust-lang/rust/issues/26494#issuecomment-619506345 by making Rust's default ABI pass return values up to 128 bits in size in registers, just like the System V ABI.
As a result, the two functions from the comment linked above now generate the same code, making the Rust ABI as efficient as the `"C"` ABI here:
```rust
pub struct Stats { x: u32, y: u32, z: u32, }
pub extern "C" fn sum_c(a: &Stats, b: &Stats) -> Stats {
return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
}
pub fn sum_rust(a: &Stats, b: &Stats) -> Stats {
return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z };
}
```
```asm
sum_rust:
movl (%rsi), %eax
addl (%rdi), %eax
movl 4(%rsi), %ecx
addl 4(%rdi), %ecx
movl 8(%rsi), %edx
addl 8(%rdi), %edx
shlq $32, %rcx
orq %rcx, %rax
retq
```
Ignore ZST offsets when deciding whether to use Scalar/ScalarPair layout
This is important because Scalar/ScalarPair layout previously would not be used if any ZST had nonzero offset.
For example, before this change, only `((), u128)` would be laid out like `u128`, not `(u128, ())`.
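As a rough illustration of what "a ZST with nonzero offset" means, the sketch below measures where a `()` field lands. The struct and function names are hypothetical, and `#[repr(C)]` is used only to pin the field order so the offsets are deterministic. It does not observe the Scalar/ScalarPair classification itself, which is internal to the compiler, and it makes no claim about how this change treats `#[repr(C)]` types.

```rust
use std::mem::size_of;

#[repr(C)]
struct ZstFirst { marker: (), value: u128 }

#[repr(C)]
struct ZstLast { value: u128, marker: () }

// Byte offset of the zero-sized field when it is declared first.
fn marker_offset_first() -> usize {
    let s = ZstFirst { marker: (), value: 0 };
    let base = &s as *const ZstFirst as usize;
    let field = &s.marker as *const () as usize;
    field - base
}

// Byte offset of the zero-sized field when it is declared last.
fn marker_offset_last() -> usize {
    let s = ZstLast { value: 0, marker: () };
    let base = &s as *const ZstLast as usize;
    let field = &s.marker as *const () as usize;
    field - base
}

// Sizes of both structs next to the size of a bare `u128`.
fn sizes() -> (usize, usize, usize) {
    (size_of::<ZstFirst>(), size_of::<ZstLast>(), size_of::<u128>())
}
```

The zero-sized field adds no bytes in either order; only its offset differs, which is the property the old check tripped over.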
Fixes #63244