Commit Graph

170512 Commits

Author SHA1 Message Date
Yuki Okushi
b770012202
Rollup merge of #98026 - c410-f3r:z-errors, r=petrochenkov
Move some tests to more reasonable directories

r? ```@petrochenkov```
2022-06-15 12:02:02 +09:00
Yuki Okushi
0ee15040d5
Rollup merge of #97822 - compiler-errors:hesitate-to-suggest-intrinsics, r=oli-obk
Filter out intrinsics if we have other import candidates to suggest

Fixes #97618

Also open to just sorting these candidates to be last. Pretty easy to modify the code to do that, too.
2022-06-15 12:02:01 +09:00
bors
ddb6cc8524 Auto merge of #97474 - compiler-errors:if-cond-and-block, r=oli-obk
Improve parsing errors and suggestions for bad `if` statements

1. Parses `if {}` as `if <err> {}` (block-like conditions that are missing a "then" block), and `if true && {}` as `if true && <err> {}` (unfinished binary operation), which is a more faithful recovery and leads to better typeck errors later on.
1. Points out the span of the condition if we don't see a "then" block after it, to help the user understand what is being parsed as a condition (and by elimination, what isn't).
1. Allow `if cond token else { }` to be fixed properly to `if cond { token } else { }`.
1. Fudge with the error messages a bit. This is somewhat arbitrary and I can revert my rewordings if they're useless.

----

Also this PR addresses a strange parsing regression (1.20 -> 1.21) where we chose to reject this piece of code somewhat arbitrarily, even though we should parse it fine:

```rust
fn main() {
    if { if true { return } else { return }; } {}
}
```

For context, all of these other expressions parse correctly:

```rust
fn main() {
    if { if true { return } else { return } } {}
    if { return; } {}
    if { return } {}
    if { return if true { } else { }; } {}
}
```

The parser used a heuristic to determine if the "the parsed `if` condition makes sense as a condition" that did like a one-expr-deep reachability analysis. This should not be handled by the parser though.
2022-06-15 02:58:44 +00:00
Jakob Degen
bc7cd2f351 BitSet perf improvements
This commit makes two changes:
 1. Changes `MaybeLiveLocals` to use `ChunkedBitSet`
 2. Overrides the `fold` method for the iterator for `ChunkedBitSet`
2022-06-14 19:41:58 -07:00
Nika Layzell
1793ee0658 proc_macro: support encoding/decoding Vec<T> 2022-06-14 22:12:46 -04:00
Nika Layzell
7678e6ad85 proc_macro: support encoding/decoding structs with type parameters 2022-06-14 22:12:46 -04:00
EdwinRy
71a98e1a4e Refactor path segment parameter error 2022-06-15 02:50:34 +01:00
EdwinRy
c8b411ebf1 rename function and remove return type 2022-06-15 01:06:40 +01:00
Jacob Pratt
fb05b53745
Remove rustc_deprecated diagnostics 2022-06-14 19:46:13 -04:00
Dan Gohman
1237232aba Add a stability attribute to WASI's try_clone(). 2022-06-14 14:46:22 -07:00
Dan Gohman
67ed99e6d2 Implement stabilization of #[feature(io_safety)].
Implement stabilization of [I/O safety], aka `#[feature(io_safety)]`.

Fixes #87074.

[I/O safety]: https://github.com/rust-lang/rfcs/blob/master/text/3128-io-safety.md
2022-06-14 14:46:22 -07:00
bors
2d1e075079 Auto merge of #96285 - flip1995:pk-vfe, r=nagisa
Introduce `-Zvirtual-function-elimination` codegen flag

Fixes #68262

This PR adds a codegen flag `-Zvirtual-function-elimination` to enable the VFE optimization in LLVM. To make this work, additonal  information has to be added to vtables ([`!vcall_visibility` metadata](https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata) and a `typeid` of the trait). Furthermore, instead of just `load`ing functions, the [`llvm.type.checked.load` intrinsic](https://llvm.org/docs/LangRef.html#llvm-type-checked-load-intrinsic) has to be used to map functions to vtables.

For technical details of the changes, see the commit messages.

I also tested this flag on https://github.com/tock/tock on different boards to verify that this fixes the issue https://github.com/tock/tock/issues/2594. This flag is able to improve the size of the resulting binary by about 8k-9k bytes by removing the unused debug print functions.

[Rendered documentation update](https://github.com/flip1995/rust/blob/pk-vfe/src/doc/rustc/src/codegen-options/index.md#virtual-function-elimination)
2022-06-14 21:37:11 +00:00
Michael Howell
f1d24beb68 rustdoc: add test case for "variadic tuple" search notation 2022-06-14 14:17:20 -07:00
Erik Desjardins
50f6a9ed87 use unchecked mul to compute slice sizes
...since slice sizes can't signed wrap

see https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html

> The total size len * mem::size_of::<T>() of the slice must be no larger than isize::MAX.
2022-06-14 17:09:07 -04:00
Camille GILLOT
34e4d72929 Separate source_span and expn_that_defined from Definitions. 2022-06-14 22:45:51 +02:00
Camille GILLOT
b676edd641 Do not modify the resolver outputs. 2022-06-14 22:44:27 +02:00
Camille GILLOT
603746a35e Make ResolverAstLowering a struct. 2022-06-14 22:44:27 +02:00
Camille GILLOT
47799de35a Separate Definitions and CrateStore from ResolverOutputs. 2022-06-14 22:44:27 +02:00
Michael Howell
2bbf44f655 rustdoc: change "variadic tuple" notation to look less like real syntax 2022-06-14 12:21:38 -07:00
Guillaume Gomez
a70c14aecc Add GUI test for sidebar items expand/collapse 2022-06-14 20:09:09 +02:00
Jacob Hughes
1f7023a519 btreemap-alloc: adjust ui test 2022-06-14 13:54:10 -04:00
Jacob Hughes
417b20835d btreemap-alloc: fix clear impl 2022-06-14 13:54:10 -04:00
Jacob Hughes
dc5951a6e5 BTreeMap: Add alloc param 2022-06-14 13:54:03 -04:00
bors
1f34da9ec8 Auto merge of #96591 - b-naber:transition-to-valtrees-in-type-system, r=lcnr
Use valtrees as the type-system representation for constant values

This is not quite ready yet, there are still some problems with pretty printing and symbol mangling and `deref_const` seems to not work correctly in all cases.

Mainly opening now for a perf-run (which should be good to go, despite the still existing problems).

r? `@oli-obk`

cc `@lcnr` `@RalfJung`
2022-06-14 17:19:38 +00:00
bors
32a86c086e Auto merge of #8999 - Alexendoo:error-pattern, r=xFrednet
Remove error-pattern comments

The `clippy_lints` one [is unused](https://rust-lang.zulipchat.com/#narrow/stream/257328-clippy/topic/.60error-pattern.60), the others in `ui-toml` also appear not to have an effect

changelog: none
2022-06-14 16:54:13 +00:00
Alex Macleod
08cfb8ddc3 Remove error-pattern comments 2022-06-14 16:28:34 +00:00
b-naber
15c1c06522 rebase 2022-06-14 17:57:51 +02:00
Takayuki Maeda
d29915af79 add a test case for decl_macro 2022-06-15 00:42:10 +09:00
Takayuki Maeda
0d24405211 implement MacroData 2022-06-15 00:31:21 +09:00
tamaron
14478bb94b add lint 2022-06-14 23:30:43 +09:00
b-naber
e14b34c386 account for endianness in debuginfo for const args 2022-06-14 16:12:34 +02:00
b-naber
060acc97db rebase 2022-06-14 16:12:28 +02:00
b-naber
8093db6e2b correctly create Scalar for meta info 2022-06-14 16:11:36 +02:00
b-naber
196f3c0e71 fix wrong evaluation in clippy 2022-06-14 16:11:35 +02:00
b-naber
90c4b947aa fix wrong evaluation in clippy 2022-06-14 16:11:35 +02:00
b-naber
6d94f95a20 address review 2022-06-14 16:11:27 +02:00
b-naber
773d8b2e15 address review 2022-06-14 16:11:27 +02:00
b-naber
0a6815a924 bless 32-bit ui tests 2022-06-14 16:09:10 +02:00
b-naber
17323e05ce manually bless 32-bit mir-opt tests 2022-06-14 16:09:06 +02:00
b-naber
dbef6e4507 address review 2022-06-14 16:08:18 +02:00
b-naber
3f4ad95826 fix clippy test failures 2022-06-14 16:08:11 +02:00
b-naber
5c95a3db2a fix clippy test failures 2022-06-14 16:08:11 +02:00
b-naber
90a41050ba implement valtrees as the type-system representation for constant values 2022-06-14 16:07:11 +02:00
b-naber
705d818bd5 implement valtrees as the type-system representation for constant values 2022-06-14 16:07:11 +02:00
bors
872503d918 Auto merge of #78781 - eddyb:measureme-rdpmc, r=oli-obk
Integrate measureme's hardware performance counter support.

*Note: this is a companion to https://github.com/rust-lang/measureme/pull/143, and duplicates some information with it for convenience*

**(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open.

## Credits

I'd like to start by thanking `@alyssais,` `@cuviper,` `@edef1c,` `@glandium,` `@jix,` `@Mark-Simulacrum,` `@m-ou-se,` `@mystor,` `@nagisa,` `@puckipedia,` and `@yorickvP,` for all of their help with testing, and valuable insight and suggestions.
Getting here wouldn't have been possible without you!

(If I've forgotten anyone please let me know, I'm going off memory here, plus some discussion logs)

## Summary

This PR adds support to `-Z self-profile` for counting hardware events such as "instructions retired" (as opposed to being limited to time measurements), using the `rdpmc` instruction on `x86_64` Linux.

While other OSes may eventually be supported, preliminary research suggests some kind of kernel extension/driver is required to enable this, whereas on Linux any user can profile (at least) their own threads.

Supporting Linux on architectures other than x86_64 should be much easier (provided the hardware supports such performance counters), and was mostly not done due to a lack of readily available test hardware.
That said, 32-bit `x86` (aka `i686`) would be almost trivial to add and test once we land the initial `x86_64` version (as all the CPU detection code can be reused).

A new flag `-Z self-profile-counter` was added, to control which of the named `measureme` counters is used, and which defaults to `wall-time`, in order to keep `-Z self-profile`'s current functionality unchanged (at least for now).

The named counters so far are:
* `wall-time`: the existing time measurement
    * name chosen for consistency with `perf.rust-lang.org`
    * continues to use `std::time::Instant` for a nanosecond-precision "monotonic clock"
* `instructions:u`: the hardware performance counter usually referred to as "Instructions retired"
    * here "retired" (roughly) means "fully executed"
    * the `:u` suffix is from the Linux `perf` tool and indicates the counter only runs while userspace code is executing, and therefore counts no kernel instructions
        * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this isn't entirely true and why `instructions-minus-irqs:u` should be preferred instead*
* `instructions-minus-irqs:u`: same as `instructions:u`, except the count of hardware interrupts ("IRQs" here for brevity) is subtracted
    * *see [Caveats/Subtracting IRQs](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs) for why this should be preferred over `instructions:u`*
* `instructions-minus-r0420:u`: experimental counter, same as `instructions-minus-irqs:u` but subtracting an undocumented counter (`r0420:u`) instead of IRQs
    * the `rXXXX` notation is again from Linux `perf`, and indicates a "raw" counter, with a hex representation of the low-level counter configuration - this was picked because we still don't *really* know what it is
    * this only exists for (future) testing and isn't included/used in any comparisons/data we've put together so far
    * *see [Challenges/Zen's undocumented 420 counter](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter) for details on how this counter was found and what it does*

---

There are also some additional commits:
* ~~see [Challenges/Rebasing *shouldn't* affect the results, right?](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right) for details on the changes to `rustc_parse` and `rustc_trait_section` (the latter far more dubious, and probably shouldn't be merged, or not as-is)~~
  *  **EDIT**: the effects of these are no long quantifiable, the PR includes reverts for them
* ~~see [Challenges/`jemalloc`: purging will commence in ten seconds](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds) for details on the `jemalloc` change~~
  * this is also separately found in #77162, and we probably want to avoid doing it by default, ideally we'd use the runtime control API `jemalloc` offers (assuming that can stop the timer that's already running, which I'm not sure about)
  * **EDIT**: until we can do this based on `-Z` flags, this commit has also been reverted
* the `proc_macro` change was to avoid randomized hashing and therefore ASLR-like effects

---

**(much later) EDIT**: take any numbers with a grain of salt, they may have changed since initial PR open.

#### Write-up / report

Because of how extensive the full report ended up being, I've kept most of it [on `hackmd.io`](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view), but for convenient access, here are all the sections (with individual links):
<sup>(someone suggested I'd make a backup, so [here it is on the wayback machine](http://web.archive.org/web/20201127164748/https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view) - I'll need to remember to update that if I have to edit the write-up)</sup>

* [**Motivation**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Motivation)

* [**Results**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Results)
    * [**Overhead**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Overhead)
    *Preview (see the report itself for more details):*

    |Counter|Total<br>`instructions-minus-irqs:u`|Overhead from "Baseline"<br>(for all 1903881<br>counter reads)|Overhead from "Baseline"<br>(per each counter read)|
    |-|-|-|-|
    |Baseline|63637621286 ±6||
    |`instructions:u`|63658815885 ±2|&nbsp;&nbsp;+21194599 ±8|&nbsp;&nbsp;+11|
    |`instructions-minus-irqs:u`|63680307361 ±13|&nbsp;&nbsp;+42686075 ±19|&nbsp;&nbsp;+22|
    |`wall-time`|63951958376 ±10275|+314337090 ±10281|+165|

    * [**"Macro" noise (self time)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Macro”-noise-(self-time))
    *Preview (see the report itself for more details):*

    || `wall-time` (ns) | `instructions:u` | `instructions-minus-irqs:u`
    -: | -: | -: | -:
    `typeck` | 5478261360 ±283933373 (±~5.2%) | 17350144522 ±6392 (±~0.00004%) | 17351035832.5 ±4.5 (±~0.00000003%)
    `expand_crate` | 2342096719 ±110465856 (±~4.7%) | 8263777916 ±2937 (±~0.00004%) | 8263708389 ±0 (±~0%)
    `mir_borrowck` | 2216149671 ±119458444 (±~5.4%) | 8340920100 ±2794 (±~0.00003%) | 8341613983.5 ±2.5 (±~0.00000003%)
    `mir_built` | 1269059734 ±91514604 (±~7.2%) | 4454959122 ±1618 (±~0.00004%) | 4455303811 ±1 (±~0.00000002%)
    `resolve_crate` | 942154987.5 ±53068423.5 (±~5.6%) | 3951197709 ±39 (±~0.000001%) | 3951196865 ±0 (±~0%)

    * [**"Micro" noise (individual sampling intervals)**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#“Micro”-noise-(individual-sampling-intervals))

* [**Caveats**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Caveats)
    * [**Disabling ASLR**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Disabling-ASLR)
    * [**Non-deterministic proc macros**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Non-deterministic-proc-macros)
    * [**Subtracting IRQs**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Subtracting-IRQs)
    * [**Lack of support for multiple threads**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Lack-of-support-for-multiple-threads)

* [**Challenges**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Challenges)
    * [**How do we even read hardware performance counters?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#How-do-we-even-read-hardware-performance-counters)
    * [**ASLR: it's free entropy**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#ASLR-it’s-free-entropy)
    * [**The serializing instruction**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#The-serializing-instruction)
    * [**Getting constantly interrupted**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Getting-constantly-interrupted)
    * [**AMD patented time-travel and dubbed it `SpecLockMap`<br><sup>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;or: "how we accidentally unlocked `rr` on AMD Zen"</sup>**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#AMD-patented-time-travel-and-dubbed-it-SpecLockMapnbspnbspnbspnbspnbspnbspnbspnbspor-“how-we-accidentally-unlocked-rr-on-AMD-Zen”)
    * [**`jemalloc`: purging will commence in ten seconds**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#jemalloc-purging-will-commence-in-ten-seconds)
    * [**Rebasing *shouldn't* affect the results, right?**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Rebasing-*shouldn’t*-affect-the-results,-right)
    * [**Epilogue: Zen's undocumented 420 counter**](https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view#Epilogue-Zen’s-undocumented-420-counter)
2022-06-14 13:37:39 +00:00
flip1995
195f208200
Add VFE test for 32 bit
The offset in the llvm.type.checked.load intrinsic differs on 32 bit platforms
2022-06-14 14:50:53 +02:00
flip1995
a93ea7ebc8
Add user documentation for -Zvirtual-function-elimination 2022-06-14 14:50:53 +02:00
flip1995
996c6b7964
Add test for VFE optimization 2022-06-14 14:50:52 +02:00
flip1995
e96e6e2c89
Add metadata generation for vtables when using VFE
This adds the typeid and `vcall_visibility` metadata to vtables when the
-Cvirtual-function-elimination flag is set.

The typeid is generated in the same way as for the
`llvm.type.checked.load` intrinsic from the trait_ref.

The offset that is added to the typeid is always 0. This is because LLVM
assumes that vtables are constructed according to the definition in the
Itanium ABI. This includes an "address point" of the vtable. In C++ this
is the offset in the vtable where information for RTTI is placed. Since
there is no RTTI information in Rust's vtables, this "address point" is
always 0. This "address point" in combination with the offset passed to
the `llvm.type.checked.load` intrinsic determines the final function
that should be loaded from the vtable in the
`WholeProgramDevirtualization` pass in LLVM. That's why the
`llvm.type.checked.load` intrinsics are generated with the typeid of the
trait, rather than with that of the function that is called. This
matches what `clang` does for C++.

The vcall_visibility metadata depends on three factors:

1. LTO level: Currently this is always fat LTO, because LLVM only
   supports this optimization with fat LTO.
2. Visibility of the trait: If the trait is publicly visible, VFE
   can only act on its vtables after linking.
3. Number of CGUs: if there is more than one CGU, also vtables with
   restricted visibility could be seen outside of the CGU, so VFE can
   only act on them after linking.

To reflect this, there are three visibility levels: Public, LinkageUnit,
and TranslationUnit.
2022-06-14 14:50:52 +02:00
flip1995
e1c1d0f8c2
Add llvm.type.checked.load intrinsic
Add the intrinsic

declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type)

This is used in the VFE optimization when lowering loading functions
from vtables to LLVM IR. The `metadata` is used to map the function to
all vtables this function could belong to. This ensures that functions
from vtables that might be used somewhere won't get removed.
2022-06-14 14:50:52 +02:00