Commit Graph

1422 Commits

Author SHA1 Message Date
bors
f525bb4e2a Auto merge of #114439 - Kobzol:remark-pgo-hotness, r=tmiasko
Add hotness data to LLVM remarks

Slight improvement of https://github.com/rust-lang/rust/pull/113040. This makes sure that if PGO is used, remarks generated using `-Zremark-dir` will include the `Hotness` attribute.

r? `@tmiasko`
2023-08-08 15:41:44 +00:00
Jakub Beránek
9d417d7c86
Only enable hotness information when PGO is available 2023-08-08 15:36:55 +02:00
Nikita Popov
ad7ea8b7e6 Update powerpc data layouts
Function pointer alignment is specified since https://reviews.llvm.org/D147016.
2023-08-07 20:35:55 +02:00
Matthias Krüger
cbe2522652
Rollup merge of #114382 - scottmcm:compare-bytes-intrinsic, r=cjgillot
Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly

As discussed in #113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target.  (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.

cc `@RalfJung` `@Amanieu`
2023-08-07 05:29:12 +02:00
scottmcm
75277a6606 Apply suggestions from code review
Co-authored-by: Ralf Jung <post@ralfj.de>
2023-08-06 15:47:40 -07:00
Scott McMurray
502af03445 Add a new compare_bytes intrinsic instead of calling memcmp directly 2023-08-06 15:47:40 -07:00
David Tolnay
704aa56ba0
Generate better function argument names in global_allocator expansion 2023-08-06 07:36:05 -07:00
Matthias Krüger
a0fd747e38
Rollup merge of #114450 - chenyukang:yukang-fix-114435, r=compiler-errors
Fix ICE failed to get layout for ReferencesError

Fixes #114435

r? `@compiler-errors`
2023-08-04 21:31:57 +02:00
yukang
3d25b5c7e8 Fix ICE failed to get layout for ReferencesError 2023-08-05 01:38:14 +08:00
bors
73dc6f03a2 Auto merge of #114350 - erikdesjardins:ident, r=tmiasko
cg_llvm: stop identifying ADTs in LLVM IR

This is an extension of https://github.com/rust-lang/rust/pull/94107. It may be a minor perf win.

Fixes #96242.

Now that we use opaque pointers, ADTs can no longer be recursive, so we
do not need to name them. Previously, this would be necessary if you had
a struct like

```rs
struct Foo(Box<Foo>, u64, u64);
```

which would be represented with something like

```ll
%Foo = type { %Foo*, i64, i64 }
```

which is now just

```ll
{ ptr, i64, i64 }
```

r? `@tmiasko`
2023-08-04 07:17:02 +00:00
Oli Scherer
4457ef2c6d Forbid old-style simd_shuffleN intrinsics 2023-08-03 09:29:00 +00:00
Nilstrieb
46f6b05eb7
Rollup merge of #114079 - compiler-errors:closure-upvars, r=oli-obk
Use `upvar_tys` in more places, make it return a list

Just a cleanup that fell out of a PR that I was gonna write, but that PR kinda got stuck.
2023-08-02 13:46:54 +02:00
Zalathar
d6ed6e3904 coverage: Consolidate FFI types into one module
Coverage FFI types were historically split across two modules, because some of
them were needed by code in `rustc_codegen_ssa`.

Now that all of the coverage codegen code has been moved into
`rustc_codegen_llvm` (#113355), it's possible to move all of the FFI types into
a single module, making it easier to see all of them at once.
2023-08-02 15:26:47 +10:00
Michael Goulet
99969d282b Use upvar_tys in more places, make it a list 2023-08-01 23:19:31 +00:00
bors
f77c624c03 Auto merge of #113339 - lqd:respect-filters, r=tmiasko
Filter out short-lived LLVM diagnostics before they reach the rustc handler

During profiling I saw remark passes being unconditionally enabled: for example `Machine Optimization Remark Emitter`.

The diagnostic remarks enabled by default are [from missed optimizations and opt analyses](https://github.com/rust-lang/rust/pull/113339#discussion_r1259480303). They are created by LLVM, passed to the diagnostic handler on the C++ side, emitted to rust, where they are unpacked, C++ strings are converted to rust, etc.

Then they are discarded in the vast majority of the time (i.e. unless some kind of `-Cremark` has enabled some of these passes' output to be printed).

These unneeded allocations are very short-lived, basically only lasting between the LLVM pass emitting them and the rust handler where they are discarded. So it doesn't hugely impact max-rss, and is only a slight reduction in instruction count (cachegrind reports a reduction between 0.3% and 0.5%) _on linux_. It's possible that targets without `jemalloc` or with a worse allocator, may optimize these less.

It is however significant in the aggregate, looking at the total number of allocated bytes:
- it's the biggest source of allocations according to dhat, on the benchmarks I've tried e.g. `syn` or `cargo`
- allocations on `syn` are reduced by 440MB, 17% (from 2440722647 bytes total, to 2030461328 bytes)
- allocations on `cargo` are reduced by 6.6GB, 19% (from 35371886402 bytes total, to 28723987743 bytes)

Some of these diagnostics objects [are allocated in LLVM](https://github.com/rust-lang/rust/pull/113339#discussion_r1252387484) *before* they're emitted to our diagnostic handler, where they'll be filtered out. So we could remove those in the future, but that will require changing a few LLVM call-sites upstream, so I left a FIXME.
2023-08-01 23:15:20 +00:00
Rémy Rakic
ca5a383fb6 remove remark filtering on the rust side
now that remarks are filtered before cg_llvm's diagnostic handler callback
is called, we don't need to do the filtering post c++-to-rust conversion
of the diagnostic.
2023-08-01 21:01:20 +00:00
bors
abd3637e42 Auto merge of #105545 - erikdesjardins:ptrclean, r=bjorn3
cleanup: remove pointee types

This can't be merged until the oldest LLVM version we support uses opaque pointers, which will be the case after #114148. (Also note `-Cllvm-args="-opaque-pointers=0"` can technically be used in LLVM 15, though I don't think we should support that configuration.)

I initially hoped this would provide some minor perf win, but in https://github.com/rust-lang/rust/pull/105412#issuecomment-1341224450 it had very little impact, so this is only valuable as a cleanup.

As a followup, this will enable #96242 to be resolved.

r? `@ghost`

`@rustbot` label S-blocked
2023-08-01 19:44:17 +00:00
Zalathar
3920e07f0b Make coverage counter IDs count up from 0, not 1
Operand types are now tracked explicitly, so there is no need to reserve ID 0
for the special always-zero counter.

As part of the renumbering, this change fixes an off-by-one error in the way
counters were counted by the `coverageinfo` query. As a result, functions
should now have exactly the number of counters they actually need, instead of
always having an extra counter that is never used.
2023-08-01 11:29:55 +10:00
Zalathar
f103db894f Make coverage expression IDs count up from 0, not down from u32::MAX
Operand types are now tracked explicitly, so there is no need for expression
IDs to avoid counter IDs by descending from `u32::MAX`. Instead they can just
count up from 0, and can be used directly as indices when necessary.
2023-08-01 11:29:55 +10:00
Zalathar
1a014d42f4 Replace ExpressionOperandId with enum Operand
Because the three kinds of operand are now distinguished explicitly, we no
longer need fiddly code to disambiguate counter IDs and expression IDs based on
the total number of counters/expressions in a function.

This does increase the size of operands from 4 bytes to 8 bytes, but that
shouldn't be a big deal since they are mostly stored inside boxed structures,
and the current coverage code is not particularly size-optimized anyway.
2023-08-01 11:29:55 +10:00
bors
5082281609 Auto merge of #113879 - nnethercote:codegen_ssa-cleanups, r=bjorn3
`codegen_ssa` cleanups

Some clarifications I made when reading this code closely.

r? `@tmiasko`
2023-07-31 08:18:19 +00:00
Nicholas Nethercote
3b44f5b0eb Use standard Rust capitalization rules for names containing "LTO". 2023-07-31 16:21:02 +10:00
Nicholas Nethercote
4a120f33f7 Remove ExtraBackendMethods::spawn_thread.
It's no longer used, and `spawn_named_thread` is preferable, because
naming threads is helpful when profiling.
2023-07-31 16:21:02 +10:00
bors
3be07c1161 Auto merge of #114266 - calebzulawski:simd-bswap, r=compiler-errors
Fix simd_bswap for i8/u8

#114156 missed this test case ☹️
cc `@workingjubilee`
2023-07-31 04:43:48 +00:00
Caleb Zulawski
77ed437de8 Fix simd_bswap for i8/u8 2023-07-30 15:40:32 -04:00
Matthias Krüger
3ce90b1649 inline format!() args up to and including rustc_codegen_llvm 2023-07-30 14:22:50 +02:00
Erik Desjardins
55800123b7 cg_llvm: simplify llvm.masked.gather/scatter naming with opaque pointers
With opaque pointers, there's no longer a need to generate a chain
of pointer types in the intrinsic name when arguments are pointers to
pointers.
2023-07-29 16:56:27 -04:00
Erik Desjardins
cf7788d54b cg_llvm: clean up match 2023-07-29 16:32:03 -04:00
Erik Desjardins
def44c5669 cg_llvm: inline check_store 2023-07-29 16:31:53 -04:00
Erik Desjardins
1d7f728901 cg_llvm: stop identifying ADTs in LLVM IR
Now that we use opaque pointers, ADTs can no longer be recursive, so we
do not need to name them. Previously, this would be necessary if you had
a struct like

```rs
struct Foo(Box<Foo>, u64, u64);
```

which would be represented with something like

```ll
%Foo = type { %Foo*, i64, i64 }
```

which is now just

```ll
{ ptr, i64, i64 }
```
2023-07-29 16:12:27 -04:00
bors
03a57254b5 Auto merge of #114156 - calebzulawski:simd-bswap, r=compiler-errors
Add simd_bswap, simd_bitreverse, simd_ctlz, and simd_cttz intrinsics

cc `@workingjubilee`
2023-07-29 18:51:45 +00:00
Erik Desjardins
04303cfb3a cg_ssa: remove pointee types and pointercast/bitcast-of-ptr 2023-07-29 13:18:20 -04:00
Erik Desjardins
b6540777fe cg_llvm: remove pointee types and pointercast/bitcast-of-ptr 2023-07-29 13:18:17 -04:00
Caleb Zulawski
ce4a48f41f Use i1 instead of bool 2023-07-28 09:46:16 -04:00
Caleb Zulawski
4c02b4cf4c Add SIMD bitreverse, ctlz, cttz intrinsics 2023-07-27 23:53:45 -04:00
Caleb Zulawski
3ea0e6e3fb Add simd_bswap intrinsic 2023-07-27 23:04:14 -04:00
Josh Stone
190ded8443 Update the minimum external LLVM to 15 2023-07-27 14:07:08 -07:00
Zalathar
01f3cc1272 coverage: Obtain the __llvm_covfun section name outside a per-function loop
This section name is always constant for a given target, but obtaining it from
LLVM requires a few intermediate allocations. There's no need to do so
repeatedly from inside a per-function loop.
2023-07-24 21:58:00 +10:00
David Tolnay
5bbf0a8306
Revert "Auto merge of #113166 - moulins:ref-niches-initial, r=oli-obk"
This reverts commit 557359f925, reversing
changes made to 1e6c09a803.
2023-07-21 22:35:57 -07:00
Miguel Ojeda
74b8d324eb Support .comment section like GCC/Clang (!llvm.ident)
Both GCC and Clang write by default a `.comment` section with compiler
information:

```txt
$ gcc -c -xc /dev/null && readelf -p '.comment' null.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 11.2.0

$ clang -c -xc /dev/null && readelf -p '.comment' null.o

String dump of section '.comment':
  [     1]  clang version 14.0.1 (https://github.com/llvm/llvm-project.git c62053979489ccb002efe411c3af059addcb5d7d)
```

They also implement the `-Qn` flag to avoid doing so:

```txt
$ gcc -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!

$ clang -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
```

So far, `rustc` only does it for WebAssembly targets and only
when debug info is enabled:

```txt
$ echo 'fn main(){}' | rustc --target=wasm32-unknown-unknown --emit=llvm-ir -Cdebuginfo=2 - && grep llvm.ident rust_out.ll
!llvm.ident = !{!27}
```

In the RFC part of this PR it was decided to always add
the information, which gets us closer to other popular compilers.
An opt-out flag like GCC and Clang may be added later on if deemed
necessary.

Implementation-wise, this covers both `ModuleLlvm::new()` and
`ModuleLlvm::new_metadata()` cases by moving the addition to
`context::create_module` and adds a few test cases.

ThinLTO also sees the `llvm.ident` named metadata duplicated (in
temporary outputs), so this deduplicates it like it is done for
`wasm.custom_sections`. The tests also check this duplication does
not take place.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2023-07-21 22:01:50 +02:00
bors
557359f925 Auto merge of #113166 - moulins:ref-niches-initial, r=oli-obk
Prototype: Add unstable `-Z reference-niches` option

MCP: rust-lang/compiler-team#641
Relevant RFC: rust-lang/rfcs#3204

This prototype adds a new `-Z reference-niches` option, controlling the range of valid bit-patterns for reference types (`&T` and `&mut T`), thereby enabling new enum niching opportunities. Like `-Z randomize-layout`, this setting is crate-local; as such, references to built-in types (primitives, tuples, ...) are not affected.

The possible settings are (here, `MAX` denotes the all-1 bit-pattern):
| `-Z reference-niches=` | Valid range |
|:---:|:---:|
| `null` (the default) | `1..=MAX` |
| `size` | `1..=(MAX- size)` |
| `align` | `align..=MAX.align_down_to(align)` |
| `size,align` | `align..=(MAX-size).align_down_to(align)` |

------

This is very WIP, and I'm not sure the approach I've taken here is the best one, but stage 1 tests pass locally; I believe this is in a good enough state to unleash this upon unsuspecting 3rd-party code, and see what breaks.
2023-07-21 15:00:36 +00:00
Matthias Krüger
b1d1e99c22
Rollup merge of #113780 - dtolnay:printkindpath, r=b-naber
Support `--print KIND=PATH` command line syntax

As is already done for `--emit KIND=PATH` and `-L KIND=PATH`.

In the discussion of #110785, it was pointed out that `--print KIND=PATH` is nicer than trying to apply the single global `-o` path to `--print`'s output, because in general there can be multiple print requests within a single rustc invocation, and anyway `-o` would already be used for a different meaning in the case of `link-args` and `native-static-libs`.

I am interested in using `--print cfg=PATH` in Buck2. Currently Buck2 works around the lack of support for `--print KIND=PATH` by [indirecting through a Python wrapper script](d43cf3a51a/prelude/rust/tools/get_rustc_cfg.py) to redirect rustc's stdout into the location dictated by the build system.

From skimming Cargo's usages of `--print`, it definitely seems like it would benefit from `--print KIND=PATH` too. Currently it is working around the lack of this by inserting `--crate-name=___ --print=crate-name` so that it can look for a line containing `___` as a delimiter between the 2 other `--print` informations it actually cares about. This is commented as a "HACK" and "abuse". 31eda6f7c3/src/cargo/core/compiler/build_context/target_info.rs (L242) (FYI `@weihanglo` as you dealt with this recently in https://github.com/rust-lang/cargo/pull/11633.)

Mentioning reviewers active in #110785: `@fee1-dead` `@jyn514` `@bjorn3`
2023-07-21 06:52:28 +02:00
Matthias Krüger
2734b5ada9
Rollup merge of #113723 - khei4:khei4/llvm-stats, r=oli-obk,nikic
Resurrect: rustc_llvm: Add a -Z `print-codegen-stats` option to expose LLVM statistics.

This resurrects PR https://github.com/rust-lang/rust/pull/104000, which has sat idle for a while. And I want to see the effect of stack-move optimizations on LLVM (like https://reviews.llvm.org/D153453) :).

I have applied the changes requested by `@oli-obk` and `@nagisa`  https://github.com/rust-lang/rust/pull/104000#discussion_r1014625377 and https://github.com/rust-lang/rust/pull/104000#discussion_r1014642482 in the latest commits.

r? `@oli-obk`

-----

LLVM has a neat [statistics](https://llvm.org/docs/ProgrammersManual.html#the-statistic-class-stats-option) feature that tracks how often optimizations kick in. It's very handy for optimization work. Since we expose the LLVM pass timings, I thought it made sense to expose the LLVM statistics too.

-----
(Edit: fix broken link
(Edit2: fix segmentation fault and use malloc

If `rustc` is built with
```toml
[llvm]
assertions = true
```
Then you can see like
```
rustc +stage1 -Z print-codegen-stats -C opt-level=3  tmp.rs
===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===
         3 aa                           - Number of MayAlias results
       193 aa                           - Number of MustAlias results
       531 aa                           - Number of NoAlias results
...
```

And the current default build emits only
```
$ rustc +stage1 -Z print-codegen-stats -C opt-level=3  tmp.rs
===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===
$
```
This might be better to emit the message to tell assertion flag necessity, but now I can't find how to do that...
2023-07-21 06:52:27 +02:00
Moulins
403f34b599 Don't treat ref. fields with non-null niches as dereferenceable_or_null 2023-07-21 03:31:46 +02:00
David Tolnay
815a114974
Implement printing to file in PassWrapper 2023-07-20 11:04:31 -07:00
David Tolnay
6e734fce63
Implement printing to file in llvm_util 2023-07-20 11:04:31 -07:00
David Tolnay
c80cbe4bae
Implement printing to file in codegen_backend.print 2023-07-20 11:04:31 -07:00
David Tolnay
c0dc0c6875
Store individual output file name with every PrintRequest 2023-07-20 11:04:30 -07:00
khei4
c7bf20dfdc address feedback from nikic and oli-obk https://github.com/rust-lang/rust/pull/113723/files
use slice memcpy rather than strcpy and write it on stdout

use println on failure

Co-authored-by: Oli Scherer <github35764891676564198441@oli-obk.de>
2023-07-20 16:53:06 +09:00
Dylan DPC
c1d6d322f4
Rollup merge of #113716 - DianQK:add-no_builtins-to-function, r=pnkfelix
Add the `no-builtins` attribute to functions when `no_builtins` is applied at the crate level.

**When `no_builtins` is applied at the crate level, we should add the `no-builtins` attribute to each function to ensure it takes effect in LTO.**

This is also the reason why no_builtins does not take effect in LTO as mentioned in #35540.

Now, `#![no_builtins]` should be similar to `-fno-builtin` in clang/gcc, see https://clang.godbolt.org/z/z4j6Wsod5.

Next, we should make `#![no_builtins]` participate in LTO again. That makes sense, as LTO also takes into consideration function-level instruction optimizations, such as the MachineOutliner. More importantly, when a user writes a large `#![no_builtins]` crate, they would like this crate to participate in LTO as well.

We should also add a function-level no_builtins attribute to allow users to have more control over it. This is similar to Clang's `__attribute__((no_builtin))` feature, see https://clang.godbolt.org/z/Wod6KK6eq. Before implementing this feature, maybe we should discuss whether to support more fine-grained control, such as `__attribute__((no_builtin("memcpy")))`.

Related discussions:
- #109821
- #35540

Next (a separate pull request?):
- [ ] Revert #35637
- [ ] Add a function-level `no_builtin` attribute?
2023-07-19 22:37:06 +05:30