Commit Graph

170921 Commits

Author SHA1 Message Date
b-naber
e14b34c386 account for endianness in debuginfo for const args 2022-06-14 16:12:34 +02:00
b-naber
060acc97db rebase 2022-06-14 16:12:28 +02:00
b-naber
8093db6e2b correctly create Scalar for meta info 2022-06-14 16:11:36 +02:00
b-naber
196f3c0e71 fix wrong evaluation in clippy 2022-06-14 16:11:35 +02:00
b-naber
90c4b947aa fix wrong evaluation in clippy 2022-06-14 16:11:35 +02:00
b-naber
6d94f95a20 address review 2022-06-14 16:11:27 +02:00
b-naber
773d8b2e15 address review 2022-06-14 16:11:27 +02:00
b-naber
0a6815a924 bless 32-bit ui tests 2022-06-14 16:09:10 +02:00
b-naber
17323e05ce manually bless 32-bit mir-opt tests 2022-06-14 16:09:06 +02:00
b-naber
dbef6e4507 address review 2022-06-14 16:08:18 +02:00
b-naber
3f4ad95826 fix clippy test failures 2022-06-14 16:08:11 +02:00
b-naber
5c95a3db2a fix clippy test failures 2022-06-14 16:08:11 +02:00
b-naber
90a41050ba implement valtrees as the type-system representation for constant values 2022-06-14 16:07:11 +02:00
b-naber
705d818bd5 implement valtrees as the type-system representation for constant values 2022-06-14 16:07:11 +02:00
bors
872503d918 Auto merge of #78781 - eddyb:measureme-rdpmc, r=oli-obk
Integrate measureme's hardware performance counter support.

*Note: this is a companion to https://github.com/rust-lang/measureme/pull/143, and duplicates some information with it for convenience*

**(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open.

## Credits

I'd like to start by thanking `@alyssais,` `@cuviper,` `@edef1c,` `@glandium,` `@jix,` `@Mark-Simulacrum,` `@m-ou-se,` `@mystor,` `@nagisa,` `@puckipedia,` and `@yorickvP,` for all of their help with testing, and valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs)

## Summary

This PR adds support to `-Z self-profile` for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the `rdpmc` instruction on `x86_64` Linux.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware.
That said, 32-bit `x86` (aka `i686`) would be almost trivial to add and test once we land the initial `x86_64` version (as all the CPU detection code can be reused).

A new flag `-Z self-profile-counter` was added, to control which of the named `measureme` counters is used, and which defaults to `wall-time`, in order to keep `-Z self-profile`'s current functionality unchanged (at least for now).

The named counters so far are:
* `wall-time`: the existing time measurement
    * name chosen for consistency with `perf.rust-lang.org`
    * continues to use `std::time::Instant` for a nanosecond-precision "monotonic clock"
* `instructions:u`: the hardware performance counter usually referred to as "Instructions retired"
    * here "retired" (roughly) means "fully executed"
    * the `:u` suffix is from the Linux `perf` tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
        * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this isn't entirely true and why `instructions-minus-irqs:u` should be preferred instead*
* `instructions-minus-irqs:u`: same as `instructions:u`, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
    * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this should be preferred over `instructions:u`*
* `instructions-minus-r0420:u`: experimental counter, same as `instructions-minus-irqs:u` but subtracting an undocumented counter (`r0420:u`) instead of IRQs
    * the `rXXXX` notation is again from Linux `perf`, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't *really* know what it is
    * this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
    * *see [Challenges/Zen's undocumented 420 counter](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter) for details on how this counter was found and what it does*

---

There are also some additional commits:
* ~~see [Challenges/Rebasing *shouldn't* affect the results, right?](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right) for details on the changes to `rustc_parse` and `rustc_trait_section` (the latter far more dubious, and probably shouldn't be merged, or not as-is)~~
  *  **EDIT**: the effects of these are no long quantifiable, the PR includes reverts for them
* ~~see [Challenges/`jemalloc`: purging will commence in ten seconds](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds) for details on the `jemalloc` change~~
  * this is also separately found in #77162, and we probably want to avoid doing it by default, ideally we'd use the runtime control API `jemalloc` offers (assuming that can stop the timer that's already running, which I'm not sure about)
  * **EDIT**: until we can do this based on `-Z` flags, this commit has also been reverted
* the `proc_macro` change was to avoid randomized hashing and therefore ASLR-like effects

---

**(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open.

#### Write-up / report

Because of how extensive the full report ended up being, I've kept most of it [on `hackmd.io`](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view), but for convenient access, here are all the sections (with individual links):
<sup>(someone suggested I'd make a backup, so [here it is on the wayback machine](http://web.archive.org/web/20201127164748/https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view) - I'll need to remember to update that if I have to edit the write-up)</sup>

* [**Motivation**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Motivation)

* [**Results**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Results)
    * [**Overhead**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Overhead)
    *Preview (see the report itself for more details):*

    |Counter|Total<br>`instructions-minus-irqs:u`|Overhead from "Baseline"<br>(for all 1903881<br>counter reads)|Overhead from "Baseline"<br>(per each counter read)|
    |-|-|-|-|
    |Baseline|63637621286 ±6||
    |`instructions:u`|63658815885 ±2|&nbsp;&nbsp;+21194599 ±8|&nbsp;&nbsp;+11|
    |`instructions-minus-irqs:u`|63680307361 ±13|&nbsp;&nbsp;+42686075 ±19|&nbsp;&nbsp;+22|
    |`wall-time`|63951958376 ±10275|+314337090 ±10281|+165|

    * [**"Macro" noise (self time)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Macro”-noise-(self-time))
    *Preview (see the report itself for more details):*

    || `wall-time` (ns) | `instructions:u` | `instructions-minus-irqs:u`
    -: | -: | -: | -:
    `typeck` | 5478261360 ±283933373 (±~5.2%) | 17350144522 ±6392 (±~0.00004%) | 17351035832.5 ±4.5 (±~0.00000003%)
    `expand_crate` | 2342096719 ±110465856 (±~4.7%) | 8263777916 ±2937 (±~0.00004%) | 8263708389 ±0 (±~0%)
    `mir_borrowck` | 2216149671 ±119458444 (±~5.4%) | 8340920100 ±2794 (±~0.00003%) | 8341613983.5 ±2.5 (±~0.00000003%)
    `mir_built` | 1269059734 ±91514604 (±~7.2%) | 4454959122 ±1618 (±~0.00004%) | 4455303811 ±1 (±~0.00000002%)
    `resolve_crate` | 942154987.5 ±53068423.5 (±~5.6%) | 3951197709 ±39 (±~0.000001%) | 3951196865 ±0 (±~0%)

    * [**"Micro" noise (individual sampling intervals)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Micro”-noise-(individual-sampling-intervals))

* [**Caveats**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Caveats)
    * [**Disabling ASLR**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Disabling-ASLR)
    * [**Non-deterministic proc macros**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Non-deterministic-proc-macros)
    * [**Subtracting IRQs**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs)
    * [**Lack of support for multiple threads**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Lack-of-support-for-multiple-threads)

* [**Challenges**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Challenges)
    * [**How do we even read hardware performance counters?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#How-do-we-even-read-hardware-performance-counters)
    * [**ASLR: it's free entropy**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#ASLR-it’s-free-entropy)
    * [**The serializing instruction**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#The-serializing-instruction)
    * [**Getting constantly interrupted**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Getting-constantly-interrupted)
    * [**AMD patented time-travel and dubbed it `SpecLockMap`<br><sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;or: "how we accidentally unlocked `rr` on AMD Zen"</sup>**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#AMD-patented-time-travel-and-dubbed-it-SpecLockMapnbspnbspnbspnbspnbspnbspnbspnbspor-“how-we-accidentally-unlocked-rr-on-AMD-Zen”)
    * [**`jemalloc`: purging will commence in ten seconds**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds)
    * [**Rebasing *shouldn't* affect the results, right?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right)
    * [**Epilogue: Zen's undocumented 420 counter**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter)
2022-06-14 13:37:39 +00:00
flip1995
195f208200
Add VFE test for 32 bit
The offset in the llvm.type.checked.load intrinsic differs on 32 bit platforms
2022-06-14 14:50:53 +02:00
flip1995
a93ea7ebc8
Add user documentation for -Zvirtual-function-elimination 2022-06-14 14:50:53 +02:00
flip1995
996c6b7964
Add test for VFE optimization 2022-06-14 14:50:52 +02:00
flip1995
e96e6e2c89
Add metadata generation for vtables when using VFE
This adds the typeid and `vcall_visibility` metadata to vtables when the
-Cvirtual-function-elimination flag is set.

The typeid is generated in the same way as for the
`llvm.type.checked.load` intrinsic from the trait_ref.

The offset that is added to the typeid is always 0. This is because LLVM
assumes that vtables are constructed according to the definition in the
Itanium ABI. This includes an "address point" of the vtable. In C++ this
is the offset in the vtable where information for RTTI is placed. Since
there is no RTTI information in Rust's vtables, this "address point" is
always 0. This "address point" in combination with the offset passed to
the `llvm.type.checked.load` intrinsic determines the final function
that should be loaded from the vtable in the
`WholeProgramDevirtualization` pass in LLVM. That's why the
`llvm.type.checked.load` intrinsics are generated with the typeid of the
trait, rather than with that of the function that is called. This
matches what `clang` does for C++.

The vcall_visibility metadata depends on three factors:

1. LTO level: Currently this is always fat LTO, because LLVM only
   supports this optimization with fat LTO.
2. Visibility of the trait: If the trait is publicly visible, VFE
   can only act on its vtables after linking.
3. Number of CGUs: if there is more than one CGU, also vtables with
   restricted visibility could be seen outside of the CGU, so VFE can
   only act on them after linking.

To reflect this, there are three visibility levels: Public, LinkageUnit,
and TranslationUnit.
2022-06-14 14:50:52 +02:00
flip1995
e1c1d0f8c2
Add llvm.type.checked.load intrinsic
Add the intrinsic

declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type)

This is used in the VFE optimization when lowering loading functions
from vtables to LLVM IR. The `metadata` is used to map the function to
all vtables this function could belong to. This ensures that functions
from vtables that might be used somewhere won't get removed.
2022-06-14 14:50:52 +02:00
flip1995
d55787a155
Add typeid_for_trait_ref function
This function computes a Itanium-like typeid for a trait_ref. This is
required for the VFE optimization in LLVM. It is used to map
`llvm.type.checked.load` invocations, that is loading the function from
a vtable, to the vtables this function could be from.

It is important to note that `typeid`s are not unique. So multiple
vtables of the same trait can share `typeid`s.
2022-06-14 14:50:52 +02:00
flip1995
20f597ffcd
Add LLVM module flags required for the VFE opt
To apply the optimization the `Virtual Function Elim` module flag has to
be set. To apply this optimization post-link the `LTOPostLink` module
flag has to be set.
2022-06-14 14:50:52 +02:00
flip1995
def3fd8e92
Add -Zvirtual-function-elimination flag
Adds the virtual-function-elimination unstable compiler flag and a check
that this flag is only used in combination with -Clto. LLVM can only
apply this optimization with fat LTO.
2022-06-14 14:50:51 +02:00
bors
edab34ab2a Auto merge of #98091 - Dylan-DPC:rollup-ueb6b5x, r=Dylan-DPC
Rollup of 5 pull requests

Successful merges:

 - #97869 (BTree: tweak internal comments)
 - #97935 (Rename the `ConstS::val` field as `kind`.)
 - #97948 (lint: add diagnostic translation migration lints)
 - #98042 (Fix compat_fn option method on miri)
 - #98069 (rustdoc:  remove link on slice brackets)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
2022-06-14 10:51:16 +00:00
bors
5a45805db5 Auto merge of #8947 - Serial-ATA:lint-produces-output, r=xFrednet
Add lint output to lint list

changelog: Add the ability to show the lint output in the lint list

This just adds the logic to produce the output, it hasn't been added to any lints yet. It did help find some mistakes in some docs though 😄.

### Screenshots

<details>
<summary>A single code block</summary>

![single-code-block](https://user-images.githubusercontent.com/69764315/172013766-145b22b1-1d91-4fb8-9cd0-b967a52d6330.png)
</details>

<details>
<summary>A single code block with a "Use instead" section</summary>

![with-usage](https://user-images.githubusercontent.com/69764315/172013792-d2dd6c9c-defa-41e0-8c27-8e8e311adb63.png)
</details>

<details>
<summary>Multiple code blocks</summary>

![multi-code-block](https://user-images.githubusercontent.com/69764315/172013808-5328f59b-e7c5-4914-a396-253822a6d350.png)
</details>

This is the last task in #7172 🎉.
r? `@xFrednet` (?)
2022-06-14 10:42:09 +00:00
bors
c07cbb9ea6 Auto merge of #8901 - Jarcho:sharing_code, r=dswij
Rework `branches_sharing_code`

fixes #7378

This changes the lint from checking pairs of blocks, to checking all the blocks at the same time. As such there's almost none of the original code left.

changelog: Don't lint `branches_sharing_code` when using different binding names
2022-06-14 08:59:40 +00:00
Guillaume Gomez
df9fea24a5 Fix expand/collapse on source viewer sidebar folders 2022-06-14 10:59:01 +02:00
Dylan DPC
27f78051ad
Rollup merge of #98069 - notriddle:notriddle/square-brackets, r=jsha
rustdoc:  remove link on slice brackets

This is #91778, take two.

Fixes #91173

The reason I'm reevaluating this change is #97668, which makes fully-generic slices link to the slice docs page. This fixes some downsides in the original PR, where `Box<[T]>`, for example, was not linked to the primitive.slice.html page. In this PR, the `[T]` inside is still a link.

The other major reason for wanting to reevaluate this is the changed color scheme. When this feature was first introduced in rustdoc, primitives were a different color from structs and enums. This way, eagle-eyed users could figure out that the square brackets were separate links from the structs inside. Now, all types have the same color, so a significant fraction of users won't even know the links are there unless they pay close attention to the status bar or use an accessibility tool that lists all links on the page.
2022-06-14 10:35:33 +02:00
Dylan DPC
e565541824
Rollup merge of #98042 - DrMeepster:winfred_std_changes, r=ChrisDenton
Fix compat_fn option method on miri

This change is required to make `WaitOnAddress` work with rust-lang/miri#2231
2022-06-14 10:35:32 +02:00
Dylan DPC
d8333a7b59
Rollup merge of #97948 - davidtwco:diagnostic-translation-lints, r=oli-obk
lint: add diagnostic translation migration lints

Introduce allow-by-default lints for checking whether diagnostics are written in
`SessionDiagnostic` or `AddSubdiagnostic` impls and whether diagnostics are translatable. These lints can be denied for modules once they are fully migrated to impls and translation.

These lints are intended to be temporary - once all diagnostics have been changed then we can just change the APIs we have and that will enforce these constraints thereafter.

r? `````@oli-obk`````
2022-06-14 10:35:31 +02:00
Dylan DPC
9e5c5c57e9
Rollup merge of #97935 - nnethercote:rename-ConstS-val-as-kind, r=lcnr
Rename the `ConstS::val` field as `kind`.

And likewise for the `Const::val` method.

Because its type is called `ConstKind`. Also `val` is a confusing name
because `ConstKind` is an enum with seven variants, one of which is
called `Value`. Also, this gives consistency with `TyS` and `PredicateS`
which have `kind` fields.

The commit also renames a few `Const` variables from `val` to `c`, to
avoid confusion with the `ConstKind::Value` variant.

r? `@BoxyUwU`
2022-06-14 10:35:29 +02:00
Dylan DPC
4b1d510e0c
Rollup merge of #97869 - ssomers:btree_comments, r=Dylan-DPC
BTree: tweak internal comments
2022-06-14 10:35:29 +02:00
bors
c80ca2c1d6 Auto merge of #8997 - Jarcho:clap_deprecate, r=flip1995
Fix `clap` deprecation warnings

Clap `3.2.0` deprecated a few functions used by lintcheck.

changelog: None
2022-06-14 08:33:52 +00:00
bors
da895e7938 Auto merge of #98082 - lnicola:rust-analyzer-2022-06-14, r=lnicola
⬆️ rust-analyzer

r? `@ghost`
2022-06-14 08:08:36 +00:00
Takayuki Maeda
60a50d02ab change edition to 2021 2022-06-14 15:46:11 +09:00
Takayuki Maeda
801725a77b suggest adding a #[macro_export] to a private macro 2022-06-14 14:58:46 +09:00
Nicholas Nethercote
a2801625cd Remove thread-local IGNORED_ATTRIBUTES.
It's just a copy of the read-only global `ich::IGNORED_ATTRIBUTES`, and
can be removed without any effect.
2022-06-14 15:20:54 +10:00
bors
4e02a9281d Auto merge of #98041 - jackh726:remove-regionckmode, r=oli-obk
Remove RegionckMode in favor of calling new skip_region_resolution

Simple cleanup. We can skip a bunch of stuff for places where NLL does the region checking, so skip earlier.

r? rust-lang/types
2022-06-14 05:07:11 +00:00
Nicholas Nethercote
abe45a9ffa Rename rustc_serialize::opaque::Encoder as MemEncoder.
This avoids the name clash with `rustc_serialize::Encoder` (a trait),
and allows lots qualifiers to be removed and imports to be simplified
(e.g. fewer `as` imports).

(This was previously merged as commit 5 in #94732 and then was reverted
in #97905 because of a perf regression caused by commit 4 in #94732.)
2022-06-14 14:52:01 +10:00
Laurențiu Nicola
15f63553d0 ⬆️ rust-analyzer 2022-06-14 07:43:32 +03:00
gco
6f0d614936 Do not try to statically link libstd++ on Solaris
Fixes: Error on bootstrapping : Empty search path given via '-L' (solaris) #97260
2022-06-13 21:29:09 -07:00
Michael Goulet
d1ba2d25d4 Improve parsing errors and suggestions for bad if statements 2022-06-13 20:53:48 -07:00
Mark Drobnak
c814f842e4
Use a private type definition to reduce cfg noise
I checked with t-libs to make sure this is OK to do on stable functions:
https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Replacing.20std.20function.20arg.20type.20with.20private.20type.20def.3F
2022-06-13 20:45:26 -07:00
Mark Drobnak
5d5039e1b8
Disable has_thread_local due to weird issues in some programs
For example, in the following issue the `thread_info` thread-local is
not correctly initialized in debug builds:
https://github.com/Meziu/ctru-rs/issues/60
2022-06-13 20:45:25 -07:00
Ian Chamberlain
bc63d5a26a
Enable thread_local_dtor on horizon OS
Always use fallback thread_local destructor, since __cxa_thread_atexit_impl
is never defined on the target.

See https://github.com/AzureMarker/rust-horizon/pull/2
2022-06-13 20:45:24 -07:00
Ian Chamberlain
82e8cd4834
Add platform-support page for armv6k-nintendo-3ds
Co-authored-by: Mark Drobnak <mark.drobnak@gmail.com>
2022-06-13 20:45:22 -07:00
Ian Chamberlain
a49d14f089
Update libc::stat field names
See https://github.com/Meziu/rust-horizon/pull/14
2022-06-13 20:44:58 -07:00
Ian Chamberlain
19f68a2729
Enable argv support for horizon OS
See https://github.com/Meziu/rust-horizon/pull/9
2022-06-13 20:44:57 -07:00
AzureMarker
06eae30034
Use the right wait_timeout implementation
Our condvar doesn't support setting attributes, like
pthread_condattr_setclock, which the current wait_timeout expects to
have configured.

Switch to a different implementation, following espidf.
2022-06-13 20:44:57 -07:00
AzureMarker
be8b88f2b6
Lower listen backlog to fix accept crashes
See https://github.com/Meziu/rust-horizon/pull/1
2022-06-13 20:44:56 -07:00