treat host effect params as erased in codegen
This fixes the changes brought to codegen tests when effect params are added to libcore, by not attempting to monomorphize functions that get the host param by being `const fn`.
r? `@oli-obk`
This fixes the changes brought to codegen tests when effect params are
added to libcore, by not attempting to monomorphize functions that get
the host param by being `const fn`.
Rework `no_coverage` to `coverage(off)`
As discussed at the tail of https://github.com/rust-lang/rust/issues/84605 this replaces the `no_coverage` attribute with a `coverage` attribute that takes sub-parameters (currently `off` and `on`) to control the coverage instrumentation.
Allows future-proofing for things like `coverage(off, reason="Tested live", issue="#12345")` or similar.
Remove `verbose_generic_activity_with_arg`
This removes `verbose_generic_activity_with_arg` and changes users to `generic_activity_with_arg`. This keeps the output of `-Z time` readable while these repeated events are still available with the self profiling mechanism.
Instead of writing coverage mappings into a supplied `&RustString`, this
function can just create the buffer itself and return the resulting vector of
bytes.
If two or more mappings cover exactly the same region, their relative order
will now be preserved from `get_expressions_and_counter_regions`, rather than
being disturbed by implementation details of an unstable sort.
The current order is: counter mappings, expression mappings, zero mappings.
(LLVM will also perform its own stable sort on these mappings, but that sort
only compares file ID, start location, and `RegionKind`.)
This struct was only being used to hold the global file table, and one of its
methods didn't even use the table. Changing its methods to ordinary functions
makes it easier to see where the table is mutated.
debuginfo: add compiler option to allow compressed debuginfo sections
LLVM already supports emitting compressed debuginfo. In debuginfo=full builds, the debug section is often a large amount of data, and it typically compresses very well (3x is not unreasonable.) We add a new knob to allow debuginfo to be compressed when the matching LLVM functionality is present. Like clang, if a known-but-disabled compression mechanism is requested, we disable compression and emit uncompressed debuginfo sections.
The API is different enough on older LLVMs we just pretend the support
is missing on LLVM older than 16.
Use the same DISubprogram for each instance of the same inlined function within a caller
# Issue Details:
The call to `panic` within a function like `Option::unwrap` is translated to LLVM as a `tail call` (as it will never return), when multiple calls to the same function like this are inlined LLVM will notice the common `tail call` block (i.e., loading the same panic string + location info and then calling `panic`) and merge them together.
When merging these instructions together, LLVM will also attempt to merge the debug locations as well, but this fails (i.e., debug info is dropped) as Rust emits a new `DISubprogram` at each inline site thus LLVM doesn't recognize that these are actually the same function and so thinks that there isn't a common debug location.
As an example of this, consider the following program:
```rust
#[no_mangle]
fn add_numbers(x: &Option<i32>, y: &Option<i32>) -> i32 {
let x1 = x.unwrap();
let y1 = y.unwrap();
x1 + y1
}
```
When building for x86_64 Windows using 1.72 it generates (note the lack of `.cv_loc` before the call to `panic`, thus it will be attributed to the same line at the `addq` instruction):
```llvm
.cv_loc 0 1 3 0 # src\lib.rs:3:0
addq $40, %rsp
retq
leaq .Lalloc_f570dea0a53168780ce9a91e67646421(%rip), %rcx
leaq .Lalloc_629ace53b7e5b76aaa810d549cc84ea3(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17h12e60b9063f6dee8E
int3
```
# Fix Details:
Cache the `DISubprogram` emitted for each inlined function instance within a caller so that this can be reused if that instance is encountered again.
Ideally, we would also deduplicate child scopes and variables, however my attempt to do that with #114643 resulted in asserts when building for Linux (#115156) which would require some deep changes to Rust to fix (#115455).
Instead, when using an inlined function as a debug scope, we will also create a new child scope such that subsequent child scopes and variables do not collide (from LLVM's perspective).
After this change the above assembly now (with <https://reviews.llvm.org/D159226> as well) shows the `panic!` was inlined from `unwrap` in `option.rs` at line 935 into the current function in `lib.rs` at line 0 (line 0 is emitted since it is ambiguous which line to use as there were two inline sites that lead to this same code):
```llvm
.cv_loc 0 1 3 0 # src\lib.rs:3:0
addq $40, %rsp
retq
.cv_inline_site_id 6 within 0 inlined_at 1 0 0
.cv_loc 6 2 935 0 # library\core\src\option.rs:935:0
leaq .Lalloc_5f55955de67e57c79064b537689facea(%rip), %rcx
leaq .Lalloc_e741d4de8cb5801e1fd7a6c6795c1559(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17hde1558f32d5b1c04E
int3
```
lto: load bitcode sections by name
Upstream change
llvm/llvm-project@6b539f5eb8 changed `isSectionBitcode` works and it now only respects `.llvm.lto` sections instead of also `.llvmbc`, which it says was never intended to be used for LTO. We instead load sections by name, and sniff for raw bitcode by hand.
This is an alternative approach to #115136, where we tried the same thing using the `object` crate, but it got too fraught to continue.
r? `@nikic`
`@rustbot` label: +llvm-main
Use `Freeze` for `SourceFile`
This uses the `Freeze` type in `SourceFile` to let accessing `external_src` and `lines` be lock-free.
Behavior of `add_external_src` is changed to set `ExternalSourceKind::AbsentErr` on a hash mismatch which matches the documentation. `ExternalSourceKind::Unneeded` was removed as it's unused.
Based on https://github.com/rust-lang/rust/pull/115401.
LLVM already supports emitting compressed debuginfo. In debuginfo=full
builds, the debug section is often a large amount of data, and it
typically compresses very well (3x is not unreasonable.) We add a new
knob to allow debuginfo to be compressed when the matching LLVM
functionality is present. Like clang, if a known-but-disabled
compression mechanism is requested, we disable compression and emit
uncompressed debuginfo sections.
The API is different enough on older LLVMs we just pretend the support
is missing on LLVM older than 16.
Upstream change
llvm/llvm-project@6b539f5eb8 changed
`isSectionBitcode` works and it now only respects `.llvm.lto` sections
instead of also `.llvmbc`, which it says was never intended to be used
for LTO. We instead load sections by name, and sniff for raw bitcode by
hand.
r? @nikic
@rustbot label: +llvm-main
Add CL and CMD into to pdb debug info
Partial fix for https://github.com/rust-lang/rust/issues/96475
The Arg0 and CommandLineArgs of the MCTargetOptions cpp class are not set within bb548f9645/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L378)
This causes LLVM to not neither output any compiler path (cl) nor the arguments that were used when invoking it (cmd) in the PDB file.
This fix adds the missing information to the target machine so LLVM can use it.
Upstream change
llvm/llvm-project@6b539f5eb8 changed
`isSectionBitcode` works and it now only respects `.llvm.lto` sections
instead of also `.llvmbc`, which it says was never intended to be used
for LTO. We instead load sections by name, and sniff for raw bitcode by
hand.
r? @nikic
@rustbot label: +llvm-main
Inline functions called from `add_coverage`
This removes quite a bit of indirection and duplicated code related to getting the `FunctionCoverage`.
CC `@Zalathar`
Use `preserve_mostcc` for `extern "rust-cold"`
As experimentation in #115242 has shown looks better than `coldcc`. Notably, clang exposes `preserve_most` (https://clang.llvm.org/docs/AttributeReference.html#preserve-most) but not `cold`, so this change should put us on a better-supported path.
And *don't* use a different convention for cold on Windows, because that actually ends up making things worse. (See comment in the code.)
cc tracking issue #97544
codegen_llvm/llvm_type: avoid matching on the Rust type
This `match` is highly suspicious. Looking at `scalar_llvm_type_at` I think it makes no difference. But if it were to make a difference that would be a huge problem, since it doesn't look through `repr(transparent)`!
Cc `@eddyb` `@bjorn3`
As experimentation in 115242 has shown looks better than `coldcc`.
And *don't* use a different convention for cold on Windows, because that actually ends up making things worse.
cc tracking issue 97544
Revert "Use the same DISubprogram for each instance of the same inline function within the caller"
This reverts commit 687bffa493.
Reverting to resolve ICEs reported on nightly.
cc `@dpaoliello`
Fixes#115156
Use the same DISubprogram for each instance of the same inlined function within a caller
# Issue Details:
The call to `panic` within a function like `Option::unwrap` is translated to LLVM as a `tail call` (as it will never return), when multiple calls to the same function like this is inlined LLVM will notice the common `tail call` block (i.e., loading the same panic string + location info and then calling `panic`) and merge them together.
When merging these instructions together, LLVM will also attempt to merge the debug locations as well, but this fails (i.e., debug info is dropped) as Rust emits a new `DISubprogram` at each inline site thus LLVM doesn't recognize that these are actually the same function and so thinks that there isn't a common debug location.
As an example of this when building for x86_64 Windows (note the lack of `.cv_loc` before the call to `panic`, thus it will be attributed to the same line at the `addq` instruction):
```
.cv_loc 0 1 23 0 # src\lib.rs:23:0
addq $40, %rsp
retq
leaq .Lalloc_f570dea0a53168780ce9a91e67646421(%rip), %rcx
leaq .Lalloc_629ace53b7e5b76aaa810d549cc84ea3(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17h12e60b9063f6dee8E
int3
```
# Fix Details:
Cache the `DISubprogram` emitted for each inlined function instance within a caller so that this can be reused if that instance is encountered again, this also requires caching the `DILexicalBlock` and `DIVariable` objects to avoid creating duplicates.
After this change the above assembly now looks like:
```
.cv_loc 0 1 23 0 # src\lib.rs:23:0
addq $40, %rsp
retq
.cv_inline_site_id 5 within 0 inlined_at 1 0 0
.cv_inline_site_id 6 within 5 inlined_at 1 12 0
.cv_loc 6 2 935 0 # library\core\src\option.rs:935:0
leaq .Lalloc_5f55955de67e57c79064b537689facea(%rip), %rcx
leaq .Lalloc_e741d4de8cb5801e1fd7a6c6795c1559(%rip), %r8
movl $43, %edx
callq _ZN4core9panicking5panic17hde1558f32d5b1c04E
int3
```
Replace the \01__gnu_mcount_nc to LLVM intrinsic for ARM
Current `-Zinstrument-mcount` for ARM32 use the `\01__gnu_mcount_nc` directly for its instrumentation function.
However, the LLVM does not use this mcount function directly, but it wraps it to intrinsic, `llvm.arm.gnu.eabi.mcount` and the transform pass also only handle the intrinsic.
As a result, current `-Zinstrument-mcount` not work on ARM32. Refer: https://github.com/namhyung/uftrace/issues/1764
This commit replaces the mcount name from native function to the LLVM intrinsic so that the transform pass can handle it.
Current `-Zinstrument-mcount` for ARM32 use the `\01__gnu_mcount_nc`
directly for its instrumentation function.
However, the LLVM does not use this mcount function directly, but it wraps
it to intrinsic, `llvm.arm.gnu.eabi.mcount` and the transform pass also
only handle the intrinsic.
As a result, current `-Zinstrument-mcount` not work on ARM32.
Refer: https://github.com/namhyung/uftrace/issues/1764
This commit replaces the mcount name from native function to the
LLVM intrinsic so that the transform pass can handle it.
Signed-off-by: ChoKyuWon <kyuwoncho18@gmail.com>
Use `unstable_target_features` when checking inline assembly
This is necessary to properly validate register classes even when the relevant target feature name is still unstable.
Extract a create_wrapper_function for use in allocator shim writing
This deduplicates some logic and makes it easier to follow what wrappers are produced. In the future it may allow moving the code to determine which wrappers to create to cg_ssa.
coverage: Don't convert filename/symbol strings to `CString` for FFI
LLVM APIs are usually perfectly happy to accept pointer/length strings, as long as we supply a suitable length value when creating a `StringRef` or `std::string`.
This lets us avoid quite a few intermediate `CString` copies during coverage codegen. It also lets us use an `IndexSet<Symbol>` (instead of an `IndexSet<CString>`) when building the deduplicated filename table.
feat: `riscv-interrupt-{m,s}` calling conventions
Similar to prior support added for the mips430, avr, and x86 targets this change implements the rough equivalent of clang's [`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking visibility into any non-assembly body of the interrupt handler, have to be very conservative and save the [entire CPU state to the stack frame][full-frame-save]. By instead asking LLVM to only save the registers that it uses, we defer the decision to the tool with the best context: it can more accurately account for the cost of spills if it knows that every additional register used is already at the cost of an implicit spill.
At the LLVM level, this is apparently [implemented by] marking every register as "[callee-save]," matching the semantics of an interrupt handler nicely (it has to leave the CPU state just as it found it after its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes no attempt to e.g. save the state in a user-accessible stack frame. For a full discussion of those challenges and tradeoffs, please refer to [the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM does not expose the "all-saved" function flavor as a calling convention directly, instead preferring to use an attribute that allows for differentiating between "machine-mode" and "superivsor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be aware of the differences between machine-mode and supervisor-mode interrupts as to why no `riscv-interrupt` calling convention is exposed through rustc, and similarly for why `riscv-interrupt-u` makes no appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
Similar to prior support added for the mips430, avr, and x86 targets
this change implements the rough equivalent of clang's
[`__attribute__((interrupt))`][clang-attr] for riscv targets, enabling
e.g.
```rust
static mut CNT: usize = 0;
pub extern "riscv-interrupt-m" fn isr_m() {
unsafe {
CNT += 1;
}
}
```
to produce highly effective assembly like:
```asm
pub extern "riscv-interrupt-m" fn isr_m() {
420003a0: 1141 addi sp,sp,-16
unsafe {
CNT += 1;
420003a2: c62a sw a0,12(sp)
420003a4: c42e sw a1,8(sp)
420003a6: 3fc80537 lui a0,0x3fc80
420003aa: 63c52583 lw a1,1596(a0) # 3fc8063c <_ZN12esp_riscv_rt3CNT17hcec3e3a214887d53E.0>
420003ae: 0585 addi a1,a1,1
420003b0: 62b52e23 sw a1,1596(a0)
}
}
420003b4: 4532 lw a0,12(sp)
420003b6: 45a2 lw a1,8(sp)
420003b8: 0141 addi sp,sp,16
420003ba: 30200073 mret
```
(disassembly via `riscv64-unknown-elf-objdump -C -S --disassemble ./esp32c3-hal/target/riscv32imc-unknown-none-elf/release/examples/gpio_interrupt`)
This outcome is superior to hand-coded interrupt routines which, lacking
visibility into any non-assembly body of the interrupt handler, have to
be very conservative and save the [entire CPU state to the stack
frame][full-frame-save]. By instead asking LLVM to only save the
registers that it uses, we defer the decision to the tool with the best
context: it can more accurately account for the cost of spills if it
knows that every additional register used is already at the cost of an
implicit spill.
At the LLVM level, this is apparently [implemented by] marking every
register as "[callee-save]," matching the semantics of an interrupt
handler nicely (it has to leave the CPU state just as it found it after
its `{m|s}ret`).
This approach is not suitable for every interrupt handler, as it makes
no attempt to e.g. save the state in a user-accessible stack frame. For
a full discussion of those challenges and tradeoffs, please refer to
[the interrupt calling conventions RFC][rfc].
Inside rustc, this implementation differs from prior art because LLVM
does not expose the "all-saved" function flavor as a calling convention
directly, instead preferring to use an attribute that allows for
differentiating between "machine-mode" and "superivsor-mode" interrupts.
Finally, some effort has been made to guide those who may not yet be
aware of the differences between machine-mode and supervisor-mode
interrupts as to why no `riscv-interrupt` calling convention is exposed
through rustc, and similarly for why `riscv-interrupt-u` makes no
appearance (as it would complicate future LLVM upgrades).
[clang-attr]: https://clang.llvm.org/docs/AttributeReference.html#interrupt-risc-v
[full-frame-save]: 9281af2ecf/src/lib.rs (L440-L469)
[implemented by]: b7fb2a3fec/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (L61-L67)
[callee-save]: 973f1fe7a8/llvm/lib/Target/RISCV/RISCVCallingConv.td (L30-L37)
[rfc]: https://github.com/rust-lang/rfcs/pull/3246
CFI: Fix error compiling core with LLVM CFI enabled
Fix#90546 by filtering out global value function pointer types from the type tests, and adding the LowerTypeTests pass to the rustc LTO optimization pipelines.
Add hotness data to LLVM remarks
Slight improvement of https://github.com/rust-lang/rust/pull/113040. This makes sure that if PGO is used, remarks generated using `-Zremark-dir` will include the `Hotness` attribute.
r? `@tmiasko`
Fix#90546 by filtering out global value function pointer types from the
type tests, and adding the LowerTypeTests pass to the rustc LTO
optimization pipelines.
Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly
As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target. (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)
Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.
cc `@RalfJung` `@Amanieu`
This deduplicates some logic and makes it easier to follow what wrappers
are produced. In the future it may allow moving the code to determine
which wrappers to create to cg_ssa.
cg_llvm: stop identifying ADTs in LLVM IR
This is an extension of https://github.com/rust-lang/rust/pull/94107. It may be a minor perf win.
Fixes#96242.
Now that we use opaque pointers, ADTs can no longer be recursive, so we
do not need to name them. Previously, this would be necessary if you had
a struct like
```rs
struct Foo(Box<Foo>, u64, u64);
```
which would be represented with something like
```ll
%Foo = type { %Foo*, i64, i64 }
```
which is now just
```ll
{ ptr, i64, i64 }
```
r? `@tmiasko`
Coverage FFI types were historically split across two modules, because some of
them were needed by code in `rustc_codegen_ssa`.
Now that all of the coverage codegen code has been moved into
`rustc_codegen_llvm` (#113355), it's possible to move all of the FFI types into
a single module, making it easier to see all of them at once.
Filter out short-lived LLVM diagnostics before they reach the rustc handler
During profiling I saw remark passes being unconditionally enabled: for example `Machine Optimization Remark Emitter`.
The diagnostic remarks enabled by default are [from missed optimizations and opt analyses](https://github.com/rust-lang/rust/pull/113339#discussion_r1259480303). They are created by LLVM, passed to the diagnostic handler on the C++ side, emitted to rust, where they are unpacked, C++ strings are converted to rust, etc.
Then they are discarded in the vast majority of the time (i.e. unless some kind of `-Cremark` has enabled some of these passes' output to be printed).
These unneeded allocations are very short-lived, basically only lasting between the LLVM pass emitting them and the rust handler where they are discarded. So it doesn't hugely impact max-rss, and is only a slight reduction in instruction count (cachegrind reports a reduction between 0.3% and 0.5%) _on linux_. It's possible that targets without `jemalloc` or with a worse allocator, may optimize these less.
It is however significant in the aggregate, looking at the total number of allocated bytes:
- it's the biggest source of allocations according to dhat, on the benchmarks I've tried e.g. `syn` or `cargo`
- allocations on `syn` are reduced by 440MB, 17% (from 2440722647 bytes total, to 2030461328 bytes)
- allocations on `cargo` are reduced by 6.6GB, 19% (from 35371886402 bytes total, to 28723987743 bytes)
Some of these diagnostics objects [are allocated in LLVM](https://github.com/rust-lang/rust/pull/113339#discussion_r1252387484) *before* they're emitted to our diagnostic handler, where they'll be filtered out. So we could remove those in the future, but that will require changing a few LLVM call-sites upstream, so I left a FIXME.
now that remarks are filtered before cg_llvm's diagnostic handler callback
is called, we don't need to do the filtering post c++-to-rust conversion
of the diagnostic.
cleanup: remove pointee types
This can't be merged until the oldest LLVM version we support uses opaque pointers, which will be the case after #114148. (Also note `-Cllvm-args="-opaque-pointers=0"` can technically be used in LLVM 15, though I don't think we should support that configuration.)
I initially hoped this would provide some minor perf win, but in https://github.com/rust-lang/rust/pull/105412#issuecomment-1341224450 it had very little impact, so this is only valuable as a cleanup.
As a followup, this will enable #96242 to be resolved.
r? `@ghost`
`@rustbot` label S-blocked
Operand types are now tracked explicitly, so there is no need to reserve ID 0
for the special always-zero counter.
As part of the renumbering, this change fixes an off-by-one error in the way
counters were counted by the `coverageinfo` query. As a result, functions
should now have exactly the number of counters they actually need, instead of
always having an extra counter that is never used.
Operand types are now tracked explicitly, so there is no need for expression
IDs to avoid counter IDs by descending from `u32::MAX`. Instead they can just
count up from 0, and can be used directly as indices when necessary.
Because the three kinds of operand are now distinguished explicitly, we no
longer need fiddly code to disambiguate counter IDs and expression IDs based on
the total number of counters/expressions in a function.
This does increase the size of operands from 4 bytes to 8 bytes, but that
shouldn't be a big deal since they are mostly stored inside boxed structures,
and the current coverage code is not particularly size-optimized anyway.
Now that we use opaque pointers, ADTs can no longer be recursive, so we
do not need to name them. Previously, this would be necessary if you had
a struct like
```rs
struct Foo(Box<Foo>, u64, u64);
```
which would be represented with something like
```ll
%Foo = type { %Foo*, i64, i64 }
```
which is now just
```ll
{ ptr, i64, i64 }
```