Record more artifact sizes during self-profiling.
This PR adds artifact size recording for
- "linked artifacts" (executables, RLIBs, dylibs, static libs)
- object files
- dwo files
- assembly files
- crate metadata
- LLVM bitcode files
- LLVM IR files
- codegen unit size estimates
Currently the identifiers emitted for these are hard-coded as string literals. Is it worth adding constants to https://github.com/rust-lang/measureme/blob/master/measureme/src/rustc.rs instead? We don't do that for query names and the like -- but artifact kinds might be more stable than query names.
Type inference for inline consts
Fixes#78132Fixes#78174Fixes#81857Fixes#89964
Perform type checking/inference of inline consts in the same context as the outer def, similar to what is currently done to closure.
Doing so would require `closure_base_def_id` of the inline const to return the outer def, and since `closure_base_def_id` can be called on non-local crate (and thus have no HIR available), a new `DefKind` is created for inline consts.
The type of the generated anon const can capture lifetime of outer def, so we couldn't just use the typeck result as the type of the inline const's def. Closure has a similar issue, and it uses extra type params `CK, CS, U` to capture closure kind, input/output signature and upvars. I use a similar approach for inline consts, letting it have an extra type param `R`, and then `typeof(InlineConst<[paremt generics], R>)` would just be `R`. In borrowck region requirements are also propagated to the outer MIR body just like it's currently done for closure.
With this PR, inline consts in expression position are quitely usable now; however the usage in pattern position is still incomplete -- since those does not remain in the MIR borrowck couldn't verify the lifetime there. I have left an ignored test as a FIXME.
Some disucssions can be found on [this Zulip thread](https://rust-lang.zulipchat.com/#narrow/stream/260443-project-const-generics/topic/inline.20consts.20typeck).
cc `````@spastorino````` `````@lcnr`````
r? `````@nikomatsakis`````
`````@rustbot````` label A-inference F-inline_const T-compiler
The only reason to use `abort_if_errors` is when the program is so broken that either:
1. later passes get confused and ICE
2. any diagnostics from later passes would be noise
This is never the case for lints, because the compiler has to be able to deal with `allow`-ed lints.
So it can continue to lint and compile even if there are lint errors.
Initialize LLVM time trace profiler on each code generation thread
In https://reviews.llvm.org/D71059 LLVM 11, the time trace profiler was
extended to support multiple threads.
`timeTraceProfilerInitialize` creates a thread local profiler instance.
When a thread finishes `timeTraceProfilerFinishThread` moves a thread
local instance into a global collection of instances. Finally when all
codegen work is complete `timeTraceProfilerWrite` writes data from the
current thread local instance and the instances in global collection
of instances.
Previously, the profiler was intialized on a single thread only. Since
this thread performs no code generation on its own, the resulting
profile was empty.
Update LLVM codegen to initialize & finish time trace profiler on each
code generation thread.
cc `@tmandry`
r? `@wesleywiser`
In https://reviews.llvm.org/D71059 LLVM 11, the time trace profiler was
extended to support multiple threads.
`timeTraceProfilerInitialize` creates a thread local profiler instance.
When a thread finishes `timeTraceProfilerFinishThread` moves a thread
local instance into a global collection of instances. Finally when all
codegen work is complete `timeTraceProfilerWrite` writes data from the
current thread local instance and the instances in global collection
of instances.
Previously, the profiler was intialized on a single thread only. Since
this thread performs no code generation on its own, the resulting
profile was empty.
Update LLVM codegen to initialize & finish time trace profiler on each
code generation thread.
Add LLVM CFI support to the Rust compiler
This PR adds LLVM Control Flow Integrity (CFI) support to the Rust compiler. It initially provides forward-edge control flow protection for Rust-compiled code only by aggregating function pointers in groups identified by their number of arguments.
Forward-edge control flow protection for C or C++ and Rust -compiled code "mixed binaries" (i.e., for when C or C++ and Rust -compiled code share the same virtual address space) will be provided in later work as part of this project by defining and using compatible type identifiers (see Type metadata in the design document in the tracking issue #89653).
LLVM CFI can be enabled with -Zsanitizer=cfi and requires LTO (i.e., -Clto).
Thank you, `@eddyb` and `@pcc,` for all the help!
This commit adds LLVM Control Flow Integrity (CFI) support to the Rust
compiler. It initially provides forward-edge control flow protection for
Rust-compiled code only by aggregating function pointers in groups
identified by their number of arguments.
Forward-edge control flow protection for C or C++ and Rust -compiled
code "mixed binaries" (i.e., for when C or C++ and Rust -compiled code
share the same virtual address space) will be provided in later work as
part of this project by defining and using compatible type identifiers
(see Type metadata in the design document in the tracking issue #89653).
LLVM CFI can be enabled with -Zsanitizer=cfi and requires LTO (i.e.,
-Clto).
Add -Z no-unique-section-names to reduce ELF header bloat.
This change adds a new compiler flag that can help reduce the size of ELF binaries that contain many functions.
By default, when enabling function sections (which is the default for most targets), the LLVM backend will generate different section names for each function. For example, a function `func` would generate a section called `.text.func`. Normally this is fine because the linker will merge all those sections into a single one in the binary. However, starting with [LLVM 12](https://github.com/llvm/llvm-project/commit/ee5d1a04), the backend will also generate unique section names for exception handling, resulting in thousands of `.gcc_except_table.*` sections ending up in the final binary because some linkers like LLD don't currently merge or strip these EH sections (see discussion [here](https://reviews.llvm.org/D83655)). This can bloat the ELF headers and string table significantly in binaries that contain many functions.
The new option is analogous to Clang's `-fno-unique-section-names`, and instructs LLVM to generate the same `.text` and `.gcc_except_table` section for each function, resulting in a smaller final binary.
The motivation to add this new option was because we have a binary that ended up with so many ELF sections (over 65,000) that it broke some existing ELF tools, which couldn't handle so many sections.
Here's our old binary:
```
$ readelf --sections old.elf | head -1
There are 71746 section headers, starting at offset 0x2a246508:
$ readelf --sections old.elf | grep shstrtab
[71742] .shstrtab STRTAB 0000000000000000 2977204c ad44bb 00 0 0 1
```
That's an 11MB+ string table. Here's the new binary using this option:
```
$ readelf --sections new.elf | head -1
There are 43 section headers, starting at offset 0x29143ca8:
$ readelf --sections new.elf | grep shstrtab
[40] .shstrtab STRTAB 0000000000000000 29143acc 0001db 00 0 0 1
```
The whole binary size went down by over 20MB, which is quite significant.
Cleanup LLVM multi-threading checks
The support for runtime multi-threading was removed from LLVM. Calls to
`LLVMStartMultithreaded` became no-ops equivalent to checking if LLVM
was compiled with support for threads http://reviews.llvm.org/D4216.
Add support for artifact size profiling
This adds support for profiling artifact file sizes (incremental compilation artifacts and query cache to begin with).
Eventually we want to track this in perf.rlo so we can ensure that file sizes do not change dramatically on each pull request.
This relies on support in measureme: https://github.com/rust-lang/measureme/pull/169. Once that lands we can update this PR to not point to a git dependency.
This was worked on together with `@michaelwoerister.`
r? `@wesleywiser`
The support for runtime multi-threading was removed from LLVM. Calls to
`LLVMStartMultithreaded` became no-ops equivalent to checking if LLVM
was compiled with support for threads http://reviews.llvm.org/D4216.
This change adds a new compiler flag that can help reduce the size of
ELF binaries that contain many functions.
By default, when enabling function sections (which is the default for most
targets), the LLVM backend will generate different section names for each
function. For example, a function "func" would generate a section called
".text.func". Normally this is fine because the linker will merge all those
sections into a single one in the binary. However, starting with LLVM 12
(llvm/llvm-project@ee5d1a0), the backend will
also generate unique section names for exception handling, resulting in
thousands of ".gcc_except_table.*" sections ending up in the final binary
because some linkers don't currently merge or strip these EH sections.
This can bloat the ELF headers and string table significantly in
binaries that contain many functions.
The new option is analogous to Clang's -fno-unique-section-names, and
instructs LLVM to generate the same ".text" and ".gcc_except_table"
section for each function, resulting in smaller object files and
potentially a smaller final binary.
Create more accurate debuginfo for vtables.
Before this PR all vtables would have the same name (`"vtable"`) in debuginfo. Now they get an unambiguous name that identifies the implementing type and the trait that is being implemented.
This is only one of several possible improvements:
- This PR describes vtables as arrays of `*const u8` pointers. It would nice to describe them as structs where function pointer is represented by a field with a name indicative of the method it maps to. However, this requires coming up with a naming scheme that avoids clashes between methods with the same name (which is possible if the vtable contains multiple traits).
- The PR does not update the debuginfo we generate for the vtable-pointer field in a fat `dyn` pointer. Right now there does not seem to be an easy way of getting ahold of a vtable-layout without also knowing the concrete self-type of a trait object.
r? `@wesleywiser`
Add new tier-3 target: armv7-unknown-linux-uclibceabihf
This change adds a new tier-3 target: armv7-unknown-linux-uclibceabihf
This target is primarily used in embedded linux devices where system resources are slim and glibc is deemed too heavyweight. Cross compilation C toolchains are available [here](https://toolchains.bootlin.com/) or via [buildroot](https://buildroot.org).
The change is based largely on a previous PR #79380 with a few minor modifications. The author of that PR was unable to push the PR forward, and graciously allowed me to take it over.
Per the [target tier 3 policy](https://github.com/rust-lang/rfcs/blob/master/text/2803-target-tier-policy.md), I volunteer to be the "target maintainer".
This is my first PR to Rust itself, so I apologize if I've missed things!
Before this commit all vtables would have the same name "vtable" in
debuginfo. Now they get a name that identifies the implementing type
and the trait that is being implemented.
Implement `#[link_ordinal(n)]`
Allows the use of `#[link_ordinal(n)]` with `#[link(kind = "raw-dylib")]`, allowing Rust to link against DLLs that export symbols by ordinal rather than by name. As long as the ordinal matches, the name of the function in Rust is not required to match the name of the corresponding function in the exporting DLL.
Part of #58713.
Enable AutoFDO.
This largely involves implementing the options debug-info-for-profiling
and profile-sample-use and forwarding them on to LLVM.
AutoFDO can be used on x86-64 Linux like this:
rustc -O -Clink-arg='Wl,--no-rosegment' -Cdebug-info-for-profiling main.rs -o main
perf record -b ./main
create_llvm_prof --binary=main --out=code.prof
rustc -O -Cprofile-sample-use=code.prof main.rs -o main2
Now `main2` will have feedback directed optimization applied to it.
The create_llvm_prof tool can be obtained from this github repository:
https://github.com/google/autofdo
The option -Clink-arg='Wl,--no-rosegment' is necessary to avoid lld
putting an extra RO segment before the executable code, which would make
the binary silently incompatible with create_llvm_prof.
This largely involves implementing the options debug-info-for-profiling
and profile-sample-use and forwarding them on to LLVM.
AutoFDO can be used on x86-64 Linux like this:
rustc -O -Cdebug-info-for-profiling main.rs -o main
perf record -b ./main
create_llvm_prof --binary=main --out=code.prof
rustc -O -Cprofile-sample-use=code.prof main.rs -o main2
Now `main2` will have feedback directed optimization applied to it.
The create_llvm_prof tool can be obtained from this github repository:
https://github.com/google/autofdoFixes#64892.
[aarch64] add target feature outline-atomics
Enable outline-atomics by default as enabled in clang by the following commit
https://reviews.llvm.org/rGc5e7e649d537067dec7111f3de1430d0fc8a4d11
Performance improves by several orders of magnitude when using the LSE instructions
instead of the ARMv8.0 compatible load/store exclusive instructions.
Tested on Graviton2 aarch64-linux with
x.py build && x.py install && x.py test
Fix clippy lints
I'm currently working on allowing clippy to run on librustdoc after a discussion I had with `@Mark-Simulacrum.` So in the meantime, I fixed a few lints on the compiler crates.
Fix use after drop in self-profile with llvm events
self-profile with `-Z self-profile-events=llvm` have failed with a segmentation fault due to this use after drop.
this type of events can be more useful now that the new passmanager is the default.
Enable outline-atomics by default as enabled in clang by the following commit
https://reviews.llvm.org/rGc5e7e649d537067dec7111f3de1430d0fc8a4d11
Performance improves by several orders of magnitude when using the LSE instructions
instead of the ARMv8.0 compatible load/store exclusive instructions.
Tested on Graviton2 aarch64-linux with
x.py build && x.py install && x.py test
The new pass manager is enabled by default in clang since
Clang/LLVM 13. While the discussion about this is still ongoing
(https://lists.llvm.org/pipermail/llvm-dev/2021-August/152305.html)
it's expected that support for the legacy pass manager will be
dropped either in LLVM 14 or 15.
This switches us to use the new pass manager if LLVM >= 13 is used.
Avoid a couple of Symbol::as_str calls in cg_llvm
This should improve performance a tiny bit. Also remove `Symbol::len` and make `SymbolIndex` private.
Migrate in-tree crates to 2021
This replaces #89075 (cherry picking some of the commits from there), and closes#88637 and fixes#89074.
It excludes a migration of the library crates for now (see tidy diff) because we have some pending bugs around macro spans to fix there.
I instrumented bootstrap during the migration to make sure all crates moved from 2018 to 2021 had the compatibility warnings applied first.
Originally, the intent was to support cargo fix --edition within bootstrap, but this proved fairly difficult to pull off. We'd need to architect the check functionality to support running cargo check and cargo fix within the same x.py invocation, and only resetting sysroots on check. Further, it was found that cargo fix doesn't behave too well with "not quite workspaces", such as Clippy which has several crates. Bootstrap runs with --manifest-path ... for all the tools, and this makes cargo fix only attempt migration for that crate. We can't use e.g. --workspace due to needing to maintain sysroots for different phases of compilation appropriately.
It is recommended to skip the mass migration of Cargo.toml's to 2021 for review purposes; you can also use `git diff d6cd2c6c87 -I'^edition = .20...$'` to ignore the edition = 2018/21 lines in the diff.
This fixes compiling things like the `snap` crate after
https://reviews.llvm.org/D105462. I added a test that verifies the
additional attribute gets specified, and confirmed that I can build
cargo with both LLVM 13 and 14 with this change applied.
This just applies the suggested fixes from the compatibility warnings,
leaving any that are in practice spurious in. This is primarily intended to
provide a starting point to identify possible fixes to the migrations (e.g., by
avoiding spurious warnings).
A secondary commit cleans these up where they are false positives (as is true in
many of the cases).
Querify `FnAbi::of_{fn_ptr,instance}` as `fn_abi_of_{fn_ptr,instance}`.
*Note: opening this PR as draft because it's based on #88499*
This more or less replicates the `LayoutOf::layout_of` setup from #88499, to replace `FnAbi::of_{fn_ptr,instance}` with `FnAbiOf::fn_abi_of_{fn_ptr,instance}`, and also route them through queries (which `layout_of` has used for a while).
The two changes at the use sites (other than the names) are:
* return type is now wrapped in `&'tcx`
* the value *is* interned, which may affect performance
* the `extra_args` list is now an interned `&'tcx ty::List<Ty<'tcx>>`
* should be cheap (it's empty for anything other than C variadics)
Theoretically, a `FnAbiOfHelpers` implementer could choose to keep the `Result<...>` instead of eagerly erroring, but the only existing users of these APIs are codegen backends, so they don't (want to) take advantage of this.
At least miri could make use of this, since it prefers propagating errors (it "just" doesn't use `FnAbi` yet - cc `@RalfJung).`
The way this is done is probably less efficient than what is possible, because the queries handle the correctness-oriented API (i.e. the split into `fn` pointers vs instances), whereas a lower-level query could end up with more reuse between different instances with identical signatures.
r? `@nagisa` cc `@oli-obk` `@bjorn3`
Use a separate interner type for UniqueTypeId
Using symbol::Interner makes it very easy to mixup UniqueTypeId symbols
with the global interner. In fact the Debug implementation of
UniqueTypeId did exactly this.
Using a separate interner type also avoids prefilling the interner with
unused symbols and allow for optimizing the symbol interner for parallel
access without negatively affecting the single threaded module codegen.
Using symbol::Interner makes it very easy to mixup UniqueTypeId symbols
with the global interner. In fact the Debug implementation of
UniqueTypeId did exactly this.
Using a separate interner type also avoids prefilling the interner with
unused symbols and allow for optimizing the symbol interner for parallel
access without negatively affecting the single threaded module codegen.
Add -Z panic-in-drop={unwind,abort} command-line option
This PR changes `Drop` to abort if an unwinding panic attempts to escape it, making the process abort instead. This has several benefits:
- The current behavior when unwinding out of `Drop` is very unintuitive and easy to miss: unwinding continues, but the remaining drops in scope are simply leaked.
- A lot of unsafe code doesn't expect drops to unwind, which can lead to unsoundness:
- https://github.com/servo/rust-smallvec/issues/14
- https://github.com/bluss/arrayvec/issues/3
- There is a code size and compilation time cost to this: LLVM needs to generate extra landing pads out of all calls in a drop implementation. This can compound when functions are inlined since unwinding will then continue on to process drops in the callee, which can itself unwind, etc.
- Initial measurements show a 3% size reduction and up to 10% compilation time reduction on some crates (`syn`).
One thing to note about `-Z panic-in-drop=abort` is that *all* crates must be built with this option for it to be sound since it makes the compiler assume that dropping `Box<dyn Any>` will never unwind.
cc https://github.com/rust-lang/lang-team/issues/97
Move *_max methods back to util
change to inline instead of inline(always)
Remove valid_range_exclusive from scalar
Use WrappingRange instead
implement always_valid_for in a safer way
Fix accidental edit
Provide `layout_of` automatically (given tcx + param_env + error handling).
After #88337, there's no longer any uses of `LayoutOf` within `rustc_target` itself, so I realized I could move the trait to `rustc_middle::ty::layout` and redesign it a bit.
This is similar to #88338 (and supersedes it), but at no ergonomic loss, since there's no funky `C: LayoutOf<Ty = Ty>` -> `Ty: TyAbiInterface<C>` generic `impl` chain, and each `LayoutOf` still corresponds to one `impl` (of `LayoutOfHelpers`) for the specific context.
After this PR, this is what's needed to get `trait LayoutOf` (with the `layout_of` method) implemented on some context type:
* `TyCtxt`, via `HasTyCtxt`
* `ParamEnv`, via `HasParamEnv`
* a way to transform `LayoutError`s into the desired error type
* an error type of `!` can be paired with having `cx.layout_of(...)` return `TyAndLayout` *without* `Result<...>` around it, such as used by codegen
* this is done through a new `LayoutOfHelpers` trait (and so is specifying the type of `cx.layout_of(...)`)
When going through this path (and not bypassing it with a manual `impl` of `LayoutOf`), the end result is that only the error case can be customized, the query itself and the success paths are guaranteed to be uniform.
(**EDIT**: just noticed that because of the supertrait relationship, you cannot actually implement `LayoutOf` yourself, the blanket `impl` fully covers all possible context types that could ever implement it)
Part of the motivation for this shape of API is that I've been working on querifying `FnAbi::of_*`, and what I want/need to introduce for that looks a lot like the setup in this PR - in particular, it's harder to express the `FnAbi` methods in `rustc_target`, since they're much more tied to `rustc` concepts.
r? `@nagisa` cc `@oli-obk` `@bjorn3`
Include debug info for the allocator shim
Issue Details:
In some cases it is necessary to generate an "allocator shim" to forward various Rust allocation functions (e.g., `__rust_alloc`) to an underlying function (e.g., `malloc`). However, since this allocator shim is a manually created LLVM module it is not processed via the normal module processing code and so no debug info is generated for it (if debugging info is enabled).
Fix Details:
* Modify the `debuginfo` code to allow creating debug info for a module without a `CodegenCx` (since it is difficult, and expensive, to create one just to emit some debug info).
* After creating the allocator shim add in basic debug info.
Path remapping: Make behavior of diagnostics output dependent on presence of --remap-path-prefix.
This PR fixes a regression (#87745) with `--remap-path-prefix` where the flag stopped causing diagnostic messages to be remapped as well. The regression was introduced in https://github.com/rust-lang/rust/pull/83813 where we erroneously assumed that remapping of diagnostic messages was not desired anymore (because #70642 partially undid that functionality with nobody objecting).
The issue is fixed by making `--remap-path-prefix` remap diagnostic messages again, including for paths that have been remapped in upstream crates (e.g. the standard library). This means that "sysroot-localization" (implemented in #70642) is also disabled if `rustc` is invoked with `--remap-path-prefix`. The assumption is that once someone starts explicitly remapping paths they also don't want paths to their local Rust installation in their build output.
In the future we might want to give more fine-grained control over this behavior via compiler flags (see https://github.com/rust-lang/rfcs/pull/3127 for a related RFC). For now this PR is intended as a regression fix.
This PR is an alternative to https://github.com/rust-lang/rust/pull/88191, which makes diagnostic messages be remapped unconditionally. That approach, however, would effectively revert #70642.
Fixes https://github.com/rust-lang/rust/issues/87745.
cc `@cbeuw`
r? `@ghost`
Issue Details:
In some cases it is necessary to generate an "allocator shim" to forward various Rust allocation functions (e.g., `__rust_alloc`) to an underlying function (e.g., `malloc`). However, since this allocator shim is a manually created LLVM module it is not processed via the normal module processing code and so no debug info is generated for it (if debugging info is enabled).
Fix Details:
* Modify the `debuginfo` code to allow creating debug info for a module without a `CodegenCx` (since it is difficult, and expensive, to create one just to emit some debug info).
* After creating the allocator shim add in basic debug info.
rustc_target: `TyAndLayout::field` should never error.
This refactor (making `TyAndLayout::field` return `TyAndLayout` without any `Result` around it) is based on a simple observation, regarding `TyAndLayout::field`:
If `cx.layout_of(ty)` succeeds (for some `cx` and `ty`), then `.field(cx, i)` on the resulting `TyAndLayout` should *always* succeed in computing `cx.layout_of(field_ty)` (where `field_ty` is the type of the `i`th field of `ty`).
The reason for this is that no matter which field is chosen, `cx.layout_of(field_ty)` *will have already been computed*, as part of computing `cx.layout_of(ty)`, as we cannot determine the layout of *any* type without considering the layouts of *all* of its fields.
And so it should be fine to turn any errors into ICEs, since they likely indicate a `cx` mismatch, or some other edge case that is due to a compiler bug (as opposed to ever being an user-facing error).
<hr/>
Each commit should probably be reviewed separately, though note that there's some `where` clauses (in `rustc_target::abi::call::*`) that change in most commits.
cc `@nagisa` `@oli-obk`
S390x inline asm
This adds register definitions and constraint codes for the s390x general and floating point registers necessary for fixing #85931; as well as a few tests.
Further testing is needed, but I am a little unsure of what specific tests should be added to `src/test/assembly/asm/s390x.rs` to address this.
lazily "compute" anon const default substs
Continuing the work of #83086, this implements the discussed solution for the [unused substs problem](https://github.com/rust-lang/project-const-generics/blob/master/design-docs/anon-const-substs.md#unused-substs). As of now, anonymous constants inherit all of their parents generics, even if they do not use them, e.g. in `fn foo<T, const N: usize>() -> [T; N + 1]`, the array length has `T` as a generic parameter even though it doesn't use it. These *unused substs* cause some backwards incompatible, and imo incorrect behavior, e.g. #78369.
---
We do not actually filter any generic parameters here and the `default_anon_const_substs` query still a dummy which only checks that
- we now prevent the previously existing query cycles and are able to call `predicates_of(parent)` when computing the substs of anonymous constants
- the default anon consts substs only include the typeflags we assume it does.
Implementing that filtering will be left as future work.
---
The idea of this PR is to delay the creation of the anon const substs until after we've computed `predicates_of` for the parent of the anon const. As the predicates of the parent can however contain the anon const we still have to create a `ty::Const` for it.
We do this by changing the substs field of `ty::Unevaluated` to an option and modifying accesses to instead call the method `unevaluated.substs(tcx)` which returns the substs as before. If the substs - now `substs_` - of `ty::Unevaluated` are `None`, it means that the anon const currently has its default substs, i.e. the substs it has when first constructed, which are the generic parameters it has available. To be able to call `unevaluated.substs(tcx)` in a `TypeVisitor`, we add the non-defaulted method `fn tcx_for_anon_const_substs(&self) -> Option<TyCtxt<'tcx>>`. In case `tcx_for_anon_const_substs` returns `None`, unknown anon const default substs are skipped entirely.
Even when `substs_` is `None` we still have to treat the constant as if it has its default substs. To do this, `TypeFlags` are modified so that it is clear whether they can still change when *exposing* any anon const default substs. A new flag, `HAS_UNKNOWN_DEFAULT_CONST_SUBSTS`, is added in case some default flags are missing.
The rest of this PR are some smaller changes to either not cause cycles by trying to access the default anon const substs too early or to be able to access the `tcx` in previously unused locations.
cc `@rust-lang/project-const-generics`
r? `@nikomatsakis`
Use custom wrap-around type instead of RangeInclusive
Two reasons:
1. More memory is allocated than necessary for `valid_range` in `Scalar`. The range is not used as an iterator and `exhausted` is never used.
2. `contains`, `count` etc. methods in `RangeInclusive` are doing very unhelpful(and dangerous!) things when used as a wrap-around range. - In general this PR wants to limit potentially confusing methods, that have a low probability of working.
Doing a local perf run, every metric shows improvement except for instructions.
Max-rss seem to have a very consistent improvement.
Sorry - newbie here, probably doing something wrong.
Stop emitting the `dso_local` LLVM attribute for external symbols under the static relocation model on macOS.
This matches Clang's behavior:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1038-L1040)
Even if `dso_local` were properly supported in this way on macOS, it seems
incorrect to add this annotation as liberally as we did. The `dso_local`
annotation is for symbols that ultimately end up in the same linkage unit, but
we were adding this annotation even for `static` values inside `extern` blocks
marked with `#[link(type="framework")]`, which should be considered dynamically
linked. Note that Clang likewise avoids emitting `dso_local` for `dllimport`
symbols:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1005-L1007)
This issue caused breakage in the `ring` crate, which links to a symbol defined
in `Security.framework` that ultimately resolves to address `0x0`:
b94d61e044/src/rand.rs (L390)
For this symbol, the use of `dso_local` causes LLVM to emit a relocation of
type `X86_64_RELOC_SIGNED`, which is a 32-bit signed PC-relative offset. If the
binary is large enough, `0x0` might be out of range, and the link will fail.
Avoiding `dso_local` causes LLVM to use the GOT instead, emitting a relocation
of type `X86_64_RELOC_GOT_LOAD`, which will properly handle the large offset
and cause the link to succeed.
As a side note, the static relocation model is effectively deprecated for
security reasons on macOS, as it prohibits PIE. It's also completely
unsupported on Apple Silicon, so I don't think it's worth going to the effort
of properly supporting this model on that platform.
Upgrade to LLVM 13
Work in progress update to LLVM 13. Main changes:
* InlineAsm diagnostics reported using SrcMgr diagnostic kind are now handled. Previously these used a separate diag handler.
* Codegen tests are updated for additional attributes.
* Some data layouts have changed.
* Switch `#[used]` attribute from `llvm.used` to `llvm.compiler.used` to avoid SHF_GNU_RETAIN flag introduced in https://reviews.llvm.org/D97448, which appears to trigger a bug in older versions of gold.
* Set `LLVM_INCLUDE_TESTS=OFF` to avoid Python 3.6 requirement.
Upstream issues:
* ~~https://bugs.llvm.org/show_bug.cgi?id=51210 (InlineAsm diagnostic reporting for module asm)~~ Fixed by 1558bb80c0.
* ~~https://bugs.llvm.org/show_bug.cgi?id=51476 (Miscompile on AArch64 due to incorrect comparison elimination)~~ Fixed by 81b106584f.
* https://bugs.llvm.org/show_bug.cgi?id=51207 (Can't set custom section flags anymore). Problematic change reverted in our fork, https://reviews.llvm.org/D107216 posted for upstream revert.
* https://bugs.llvm.org/show_bug.cgi?id=51211 (Regression in codegen for #83623). This is an optimization regression that we may likely have to eat for this release. The fix for #83623 was based on an incorrect premise, and this needs to be properly addressed in the MergeICmps pass.
The [compile-time impact](https://perf.rust-lang.org/compare.html?start=ef9549b6c0efb7525c9b012148689c8d070f9bc0&end=0983094463497eec22d550dad25576a894687002) is mixed, but quite positive as LLVM upgrades go.
The LLVM 13 final release is scheduled for Sep 21st. The current nightly is scheduled for stable release on Oct 21st.
r? `@ghost`
This matches Clang's behavior:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1038-L1040)
Even if `dso_local` were properly supported in this way on macOS, it seems
incorrect to add this annotation as liberally as we did. The `dso_local`
annotation is for symbols that ultimately end up in the same linkage unit, but
we were adding this annotation even for `static` values inside `extern` blocks
marked with `#[link(type="framework")]`, which should be considered dynamically
linked. Note that Clang likewise avoids emitting `dso_local` for `dllimport`
symbols:
973cb2c326/clang/lib/CodeGen/CodeGenModule.cpp (L1005-L1007)
This issue caused breakage in the `ring` crate, which links to a symbol defined
in `Security.framework` that ultimately resolves to address `0x0`:
b94d61e044/src/rand.rs (L390)
For this symbol, the use of `dso_local` causes LLVM to emit a relocation of
type `X86_64_RELOC_SIGNED`, which is a 32-bit signed PC-relative offset. If the
binary is large enough, `0x0` might be out of range, and the link will fail.
Avoiding `dso_local` causes LLVM to use the GOT instead, emitting a relocation
of type `X86_64_RELOC_GOT_LOAD`, which will properly handle the large offset
and cause the link to succeed.
As a side note, the static relocation model is effectively deprecated for
security reasons on macOS, as it prohibits PIE. It's also completely
unsupported on Apple Silicon, so I don't think it's worth going to the effort
of properly supporting this model on that platform.
The TargetMachine may be referencing data in the context. In
particular, at least the GlobalISel instruction selector stored
in the TM may reference a TrackedMDNode DebugLoc that destruction
of the TargetMachine will try to untrack.
The #[used] attribute explicitly only requires symbols to be
retained in object files, but allows the linker to drop them
if dead. This corresponds to llvm.compiler.used semantics.
The motivation to change this *now* is that https://reviews.llvm.org/D97448
starts emitting #[used] symbols into unique sections with
SHF_GNU_RETAIN flag. This triggers a bug in some version of gold,
resulting in the ARGV_INIT_ARRAY symbol part of the .init_array
section to be incorrectly placed.
Fixes#85019
A `SourceFile` created during compilation may have a relative
path (e.g. if rustc itself is invoked with a relative path).
When we write out crate metadata, we convert all relative paths
to absolute paths using the current working direction.
However, the working directory is not included in the crate hash.
This means that the crate metadata can change while the crate
hash remains the same. Among other problems, this can cause a
fingerprint mismatch ICE, since incremental compilation uses
the crate metadata hash to determine if a foreign query is green.
This commit moves the field holding the working directory from
`Session` to `Options`, including it as part of the crate hash.
Add support for clobber_abi to asm!
This PR adds the `clobber_abi` feature that was proposed in #81092.
Fixes#81092
cc `@rust-lang/wg-inline-asm`
r? `@nagisa`
Name the captured upvars for closures/generators in debuginfo
Previously, debuggers print closures as something like
```
y::main::closure-0 (0x7fffffffdd34)
```
The pointer actually references to an upvar. It is not very obvious, especially for beginners.
It's because upvars don't have names before, as they are packed into a tuple. This PR names the upvars, so we can expect to see something like
```
y::main::closure-0 {_captured_ref__b: 0x[...]}
```
r? `@tmandry`
Discussed at https://github.com/rust-lang/rust/pull/84752#issuecomment-831639489 .
Implement `black_box` using intrinsic
Introduce `black_box` intrinsic, as suggested in https://github.com/rust-lang/rust/pull/87590#discussion_r680468700.
This is still codegenned as empty inline assembly for LLVM. For MIR interpretation and cranelift it's treated as identity.
cc `@Amanieu` as this is related to inline assembly
cc `@bjorn3` for rustc_codegen_cranelift changes
cc `@RalfJung` as this affects MIRI
r? `@nagisa` I suppose
The new implementation allows some `memcpy`s to be optimized away,
so the uninit value in ui/sanitize/memory.rs is constructed directly
onto the return place. Therefore the sanitizer now says that the
value is allocated by `main` rather than `random`.
LLVM codegen: Don't emit zero-sized padding for fields
Currently padding is emitted before fields of a struct and at the end of the struct regardless of the ABI. Even if no padding is required zero-sized padding fields are emitted. This is not useful and - more importantly - it make it impossible to generate the exact vector types that LLVM expects for certain ARM SIMD intrinsics. This change should unblock the implementation of many ARM intrinsics using the `unadjusted` ABI, see https://github.com/rust-lang/stdarch/issues/1143#issuecomment-827404092.
This is a proof of concept only because the field lookup now takes O(number of fields) time compared to O(1) before since it recalculates the mapping at every lookup. I would like to find out how big the performance impact actually is before implementing caching or restricting this behavior to the `unadjusted` ABI.
cc `@SparrowLii` `@bjorn3`
([Discussion on internals](https://internals.rust-lang.org/t/feature-request-add-a-way-in-rustc-for-generating-struct-type-llvm-ir-without-paddings/15007))
The indexes into the VaListImpl struct used on aarch64 ABI (not macos/ios) are hard-coded which is brittle so we replace them with the usual lookup.
The varargs ffi is tested in ui/abi/variadic-ffi.rs on aarch64 Linux.
Rather than relying on `getPointerElementType()` from LLVM function
pointers, we now pass the function type explicitly when building `call`
or `invoke` instructions.
emit_aapcs_va_arg() emits hardcoded field indexes to access the
aarch64-specific `VaListImpl` struct. Due to the removed padding
those indexes have changed.
LLVM codegen: Don't emit zero-sized padding for whiles because that has no use and makes it impossible to generate the return types that LLVM expects for certain ARM SIMD intrinsics.
rfc3052 followup: Remove authors field from Cargo manifests
Since RFC 3052 soft deprecated the authors field, hiding it from
crates.io, docs.rs, and making Cargo not add it by default, and it is
not generally up to date/useful information for contributors, we may as well
remove it from crates in this repo.
Since RFC 3052 soft deprecated the authors field anyway, hiding it from
crates.io, docs.rs, and making Cargo not add it by default, and it is
not generally up to date/useful information, we should remove it from
crates in this repo.
Use existing declaration of rust_eh_personality
If crate declares `rust_eh_personality`, re-use existing declaration
as otherwise attempts to set function attributes that follow the
declaration will fail (unless it happens to have exactly the same
type signature as the one predefined in the compiler).
Fixes#70117.
Fixes https://github.com/rust-lang/rust/pull/81469#issuecomment-809428126; probably.
Remove nondeterminism in multiple-definitions test
Compare all fields in `DllImport` when sorting to avoid nondeterminism in the error for multiple inconsistent definitions of an extern function. Restore the multiple-definitions test.
Resolves#87084.
CTFE/Miri engine Pointer type overhaul
This fixes the long-standing problem that we are using `Scalar` as a type to represent pointers that might be integer values (since they point to a ZST). The main problem is that with int-to-ptr casts, there are multiple ways to represent the same pointer as a `Scalar` and it is unclear if "normalization" (i.e., the cast) already happened or not. This leads to ugly methods like `force_mplace_ptr` and `force_op_ptr`.
Another problem this solves is that in Miri, it would make a lot more sense to have the `Pointer::offset` field represent the full absolute address (instead of being relative to the `AllocId`). This means we can do ptr-to-int casts without access to any machine state, and it means that the overflow checks on pointer arithmetic are (finally!) accurate.
To solve this, the `Pointer` type is made entirely parametric over the provenance, so that we can use `Pointer<AllocId>` inside `Scalar` but use `Pointer<Option<AllocId>>` when accessing memory (where `None` represents the case that we could not figure out an `AllocId`; in that case the `offset` is an absolute address). Moreover, the `Provenance` trait determines if a pointer with a given provenance can be cast to an integer by simply dropping the provenance.
I hope this can be read commit-by-commit, but the first commit does the bulk of the work. It introduces some FIXMEs that are resolved later.
Fixes https://github.com/rust-lang/miri/issues/841
Miri PR: https://github.com/rust-lang/miri/pull/1851
r? `@oli-obk`
Do not allow JSON targets to set is-builtin: true
Note that this will affect (and make builds fail for) all of the projects out there that have target files invalid in this way. Crater, however, does not really cover these kinds of the codebases, so it is quite difficult to measure the impact. That said, the target files invalid in this way can start causing build failures each time LLVM is upgraded, anyway, so it is probably a good opportunity to disallow this property, entirely.
Another approach considered was to simply not parse this field anymore, which would avoid making the builds explicitly fail, but it wasn't clear to me if `is-builtin` was always set unintentionally… In case this was the case, I'd expect people to file a feature request stating specifically for what purpose they were using `is-builtin`.
Fixes#86017
This resolves all the problems we had around "normalizing" the representation of a Scalar in case it carries a Pointer value: we can just use Pointer if we want to have a value taht we are sure is already normalized.
Add clobber-only register classes for asm!
These are needed to properly express a function call ABI using a clobber
list, even though we don't support passing actual values into/out of
these registers.
Improve opaque pointers support
Opaque pointers are coming, and rustc is not ready.
This adds partial support by passing an explicit load type to LLVM. Two issues I've encountered:
* The necessary type was not available at the point where non-temporal copies were generated. I've pushed the code for that upwards out of the memcpy implementation and moved the position of a cast to make do with the types we have available. (I'm not sure that cast is needed at all, but have retained it in the interest of conservativeness.)
* The `PlaceRef::project_deref()` function used during debuginfo generation seems to be buggy in some way -- though I haven't figured out specifically what it does wrong. Replacing it with `load_operand().deref()` did the trick, but I don't really know what I'm doing here.
These are needed to properly express a function call ABI using a clobber
list, even though we don't support passing actual values into/out of
these registers.
If crate declares `rust_eh_personality`, re-use existing declaration
as otherwise attempts to set function attributes that follow the
declaration will fail (unless it happens to have exactly the same
type signature as the one predefined in the compiler).
Add support for raw-dylib with stdcall, fastcall functions
Next stage of work for #58713: allow `extern "stdcall"` and `extern "fastcall"` with `#[link(kind = "raw-dylib")]`.
I've deliberately omitted support for vectorcall, as that doesn't currently work, and I wanted to get this out for review. (I haven't really investigated the vectorcall failure much yet, but at first (very cursory) glance it appears that the problem is elsewhere.)
This makes load generation compatible with opaque pointers.
The generation of nontemporal copies still accesses the pointer
element type, as fixing this requires more movement.
- Closures in external crates may get compiled in because of
monomorphization. We should store names of captured variables
in `optimized_mir`, so that they are written into the metadata
file and we can use them to generate debuginfo.
- If there are breakpoints inside closures, the names of captured
variables stored in `optimized_mir` can be used to print them.
Now the name is more precise when disjoint fields are captured.
Previously, debuggers print closures as something like
```
y::main::closure-0 (0x7fffffffdd34)
```
The pointer actually references to an upvar. It is not
very obvious, especially for beginners.
It's because upvars don't have names before, as they
are packed into a tuple. This commit names the upvars,
so we can expect to see something like
```
y::main::closure-0 {_captured_ref__b: 0x[...]}
```
LLVM target name does not necessarily match the Rust target name and it
can be confusing when the ICE message is describing a target other than
has been specified on the command line.
Refactor linker code
This merges `LinkerInfo` into `CrateInfo` as there is no reason to keep them separate. `LinkerInfo::to_linker` is merged into `get_linker` as both have different logic for each linker type and `to_linker` is directly called after `get_linker`. Also contains a couple of small cleanups.
See the individual commits for all changes.
Previously, only the fields would be displayed with no indication of the
variant name. If you already knew the enum was univariant, this was ok
but if the enum was univariant because of layout, for example, a
`Result<T, !>` then it could be very confusing which variant was the
active one.
Previously, directly tagged enums had a `variant$` field which would
show the name of the active variant. We now show the variant using a
`[variant]` synthetic item just like we do for niche-layout enums.
Improve debug symbol names to avoid ambiguity and work better with MSVC's debugger
There are several cases where names of types and functions in the debug info are either ambiguous, or not helpful, such as including ambiguous placeholders (e.g., `{{impl}}`, `{{closure}}` or `dyn _'`) or dropping qualifications (e.g., for dynamic types).
Instead, each debug symbol name should be unique and useful:
* Include disambiguators for anonymous `DefPathDataName` (closures and generators), and unify their formatting when used as a path-qualifier vs item being qualified.
* Qualify the principal trait for dynamic types.
* If there is no principal trait for a dynamic type, emit all other traits instead.
* Respect the `qualified` argument when emitting ref and pointer types.
* For implementations, emit the disambiguator.
* Print const generics when emitting generic parameters or arguments.
Additionally, when targeting MSVC, its debugger treats many command arguments as C++ expressions, even when the argument is defined to be a symbol name. As such names in the debug info need to be more C++-like to be parsed correctly:
* Avoid characters with special meaning (`#`, `[`, `"`, `+`).
* Never start a name with `<` or `{` as this is treated as an operator.
* `>>` is always treated as a right-shift, even when parsing generic arguments (so add a space to avoid this).
* Emit function declarations using C/C++ style syntax (e.g., leading return type).
* Emit arrays as a synthetic `array$<type, size>` type.
* Include a `$` in all synthetic types as this is a legal character for C++, but not Rust (thus we avoid collisions with user types).
Add support for leaf function frame pointer elimination
This PR adds ability for the target specifications to specify frame
pointer emission type that's not just “always” or “whatever cg decides”.
In particular there's a new mode that allows omission of the frame
pointer for leaf functions (those that don't call any other functions).
We then set this new mode for Aarch64-based Apple targets.
Fixes#86196
Remove unused dependencies from compiler crates
Various compiler crates have dependencies that they don't appear to use. I used some scripting to detect such dependencies, filtered them based on some manual review, and removed those that do indeed appear to be entirely unused.
There are several cases where names of types and functions in the debug info are either ambiguous, or not helpful, such as including ambiguous placeholders (e.g., `{{impl}}`, `{{closure}}` or `dyn _'`) or dropping qualifications (e.g., for dynamic types).
Instead, each debug symbol name should be unique and useful:
* Include disambiguators for anonymous `DefPathDataName` (closures and generators), and unify their formatting when used as a path-qualifier vs item being qualified.
* Qualify the principal trait for dynamic types.
* If there is no principal trait for a dynamic type, emit all other traits instead.
* Respect the `qualified` argument when emitting ref and pointer types.
* For implementations, emit the disambiguator.
* Print const generics when emitting generic parameters or arguments.
Additionally, when targeting MSVC, its debugger treats many command arguments as C++ expressions, even when the argument is defined to be a symbol name. As such names in the debug info need to be more C++-like to be parsed correctly:
* Avoid characters with special meaning (`#`, `[`, `"`, `+`).
* Never start a name with `<` or `{` as this is treated as an operator.
* `>>` is always treated as a right-shift, even when parsing generic arguments (so add a space to avoid this).
* Emit function declarations using C/C++ style syntax (e.g., leading return type).
* Emit arrays as a synthetic `array$<type, size>` type.
* Include a `$` in all synthetic types as this is a legal character for C++, but not Rust (thus we avoid collisions with user types).
This PR adds ability for the target specifications to specify frame
pointer emission type that's not just “always” or “whatever cg decides”.
In particular there's a new mode that allows omission of the frame
pointer for leaf functions (those that don't call any other functions).
We then set this new mode for Aarch64-based Apple targets.
Fixes#86196
Allow loading of llvm plugins on nightly
Based on a discussion in #82734 / with `@wsmoses.`
Mainly moves [this](0149bc4e7e) behind a -Z flag, so it can only be used on nightly,
as requested by `@nagisa` in https://github.com/rust-lang/rust/issues/82734#issuecomment-835863940
This change allows loading of llvm plugins like Enzyme.
Right now it also requires a shared library LLVM build of rustc for symbol resolution.
```rust
// test.rs
extern { fn __enzyme_autodiff(_: usize, ...) -> f64; }
fn square(x : f64) -> f64 {
return x * x;
}
fn main() {
unsafe {
println!("Hello, world {} {}!", square(3.0), __enzyme_autodiff(square as usize, 3.0));
}
}
```
```
./rustc test.rs -Z llvm-plugins="./LLVMEnzyme-12.so" -C passes="enzyme"
./test
Hello, world 9 6!
```
I will try to figure out how to simplify the usage and get this into stable in a later iteration,
but having this on nightly will already help testing further steps.
Disable the machine outliner by default
This addresses a codegen-issue that needs to be fixed upstream in LLVM.
While we wait for the fix, we can disable it.
Verified manually that the outliner is no longer run when
`-Copt-level=z` is specified, and also that you can override this with
`-Cllvm-args=-enable-machine-outliner` if you need it anyway.
A regression test is not really feasible in this instance, given that we
do not have any minimal reproducers.
Fixes#85351
cc `@pnkfelix`
Driver improvements
This PR contains a couple of cleanups for the driver and a few small improvements for the custom codegen backend interface. It also implements `--version` and `-Cpasses=list` support for custom codegen backends.
Partial support for raw-dylib linkage
First cut of functionality for issue #58713: add support for `#[link(kind = "raw-dylib")]` on `extern` blocks in lib crates compiled to .rlib files. Does not yet support `#[link_name]` attributes on functions, or the `#[link_ordinal]` attribute, or `#[link(kind = "raw-dylib")]` on `extern` blocks in bin crates; I intend to publish subsequent PRs to fill those gaps. It's also not yet clear whether this works for functions in `extern "stdcall"` blocks; I also intend to investigate that shortly and make any necessary changes as a follow-on PR.
This implementation calls out to an LLVM function to construct the actual `.idata` sections as temporary `.lib` files on disk and then links those into the generated .rlib.
BPF target support
This adds `bpfel-unknown-none` and `bpfeb-unknown-none`, two new no_std targets that generate little and big endian BPF. The approach taken is very similar to the cuda target, where `TargetOptions::obj_is_bitcode` is enabled and code generation is done by the linker.
I added the targets to `dist-various-2`. There are [some tests](https://github.com/alessandrod/bpf-linker/tree/main/tests/assembly) in bpf-linker and I'm planning to add more. Those are currently not ran as part of rust CI.
This addresses a codegen-issue that needs to be fixed upstream in LLVM.
While we wait for the fix, we can disable it.
Verified manually that the outliner is no longer run when
`-Copt-level=z` is specified, and also that you can override this with
`-Cllvm-args=-enable-machine-outliner` if you need it anyway.
A regression test is not really feasible in this instance, given that we
do not have any minimal reproducers.
Fixes#85351
This does not yet support #[link_name] attributes on functions, the #[link_ordinal]
attribute, #[link(kind = "raw-dylib")] on extern blocks in bin crates, or
stdcall functions on 32-bit x86.
rustc: Store metadata-in-rlibs in object files
This commit updates how rustc compiler metadata is stored in rlibs.
Previously metadata was stored as a raw file that has the same format as
`--emit metadata`. After this commit, however, the metadata is encoded
into a small object file which has one section which is the contents of
the metadata.
The motivation for this commit is to fix a common case where #83730
arises. The problem is that when rustc crates a `dylib` crate type it
needs to include entire rlib files into the dylib, so it passes
`--whole-archive` (or the equivalent) to the linker. The problem with
this, though, is that the linker will attempt to read all files in the
archive. If the metadata file were left as-is (today) then the linker
would generate an error saying it can't read the file. The previous
solution was to alter the rlib just before linking, creating a new
archive in a temporary directory which has the metadata file removed.
This problem from before this commit is now removed if the metadata file
is stored in an object file that the linker can read. The only caveat we
have to take care of is to ensure that the linker never actually
includes the contents of the object file into the final output. We apply
similar tricks as the `.llvmbc` bytecode sections to do this.
This involved changing the metadata loading code a bit, namely updating
some of the LLVM C APIs used to use non-deprecated ones and fiddling
with the lifetimes a bit to get everything to work out. Otherwise though
this isn't intended to be a functional change really, only that metadata
is stored differently in archives now.
This should end up fixing #83730 because by default dylibs will no
longer have their rlib dependencies "altered" meaning that
split-debuginfo will continue to have valid paths pointing at the
original rlibs. (note that we still "alter" rlibs if LTO is enabled to
remove Rust object files and we also "alter" for the #[link(cfg)]
feature, but that's rarely used).
Closes#83730
This commit updates how rustc compiler metadata is stored in rlibs.
Previously metadata was stored as a raw file that has the same format as
`--emit metadata`. After this commit, however, the metadata is encoded
into a small object file which has one section which is the contents of
the metadata.
The motivation for this commit is to fix a common case where #83730
arises. The problem is that when rustc crates a `dylib` crate type it
needs to include entire rlib files into the dylib, so it passes
`--whole-archive` (or the equivalent) to the linker. The problem with
this, though, is that the linker will attempt to read all files in the
archive. If the metadata file were left as-is (today) then the linker
would generate an error saying it can't read the file. The previous
solution was to alter the rlib just before linking, creating a new
archive in a temporary directory which has the metadata file removed.
This problem from before this commit is now removed if the metadata file
is stored in an object file that the linker can read. The only caveat we
have to take care of is to ensure that the linker never actually
includes the contents of the object file into the final output. We apply
similar tricks as the `.llvmbc` bytecode sections to do this.
This involved changing the metadata loading code a bit, namely updating
some of the LLVM C APIs used to use non-deprecated ones and fiddling
with the lifetimes a bit to get everything to work out. Otherwise though
this isn't intended to be a functional change really, only that metadata
is stored differently in archives now.
This should end up fixing #83730 because by default dylibs will no
longer have their rlib dependencies "altered" meaning that
split-debuginfo will continue to have valid paths pointing at the
original rlibs. (note that we still "alter" rlibs if LTO is enabled to
remove Rust object files and we also "alter" for the #[link(cfg)]
feature, but that's rarely used).
Closes#83730
Previously, we would generate a single struct with the layout of the
dataful variant plus an extra field whose name contained the value of
the niche (this would only really work for things like `Option<&_>`
where we can determine that the `None` case maps to `0` but for enums
that have multiple tag only variants, this doesn't work).
Now, we generate a union of two structs, one which is the layout of the
dataful variant and one which just has a way of reading the
discriminant. We also generate an enum which maps the discriminant value
to the tag only variants.
We also encode information about the range of values which correspond to
the dataful variant in the type name and then use natvis to determine
which union field we should display to the user.
As a result of this change, all niche-layout enums render correctly in
WinDbg and Visual Studio!
This wasn't necessary for msvc and caused issues where different types
with the same name such as different instantiations of `Option<T>` would
have colliding debuginfo. This confused the debugger which would pick
one of the type definitions and use for all types with that name even
though they had different layout.
Update list of allowed aarch64 features
I recently added these features to std_detect for aarch64 linux, pending [review](https://github.com/rust-lang/stdarch/pull/1146).
I have commented any features not supported by LLVM 9, the current minimum version for Rust. Some (PAuth at least) were renamed between 9 & 12 and I've left them disabled. TME, however, is not in LLVM 9 but I've left it enabled.
See https://github.com/rust-lang/stdarch/issues/993
Set dso_local for more items
Related to https://github.com/rust-lang/rust/pull/83592. (cc `@nagisa)`
Noticed that on x86_64 with `relocation-model: static` `R_X86_64_GOTPCREL` relocations were still generated in some cases. (related: https://github.com/Rust-for-Linux/linux/issues/135; Rust-for-Linux needs these fixes to successfully build)
First time doing anything with LLVM so not sure whether this is correct but the following are some of the things I've tried to convince myself.
## C equivalent
Example from clang which also sets `dso_local` in these cases:
`clang-12 -fno-PIC -S -emit-llvm test.c`
```C
extern int A;
int* a() {
return &A;
}
int B;
int* b() {
return &B;
}
```
```
; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
`@A` = external dso_local global i32, align 4
`@B` = dso_local global i32 0, align 4
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32* `@a()` #0 {
ret i32* `@A`
}
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32* `@b()` #0 {
ret i32* `@B`
}
attributes #0 = { noinline nounwind optnone uwtable "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 12.0.0 (https://github.com/llvm/llvm-project/ b978a93635b584db380274d7c8963c73989944a1)"}
```
`clang-12 -fno-PIC -c test.c`
`objdump test.o -r`:
```
test.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000006 R_X86_64_64 A
0000000000000016 R_X86_64_64 B
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text
0000000000000040 R_X86_64_PC32 .text+0x0000000000000010
```
## Comparison to pre-LLVM 12 output
`rustc --emit=obj,llvm-ir --target=x86_64-unknown-none-linuxkernel --crate-type rlib test.rs`
```Rust
#![feature(no_core, lang_items)]
#![no_core]
#[lang="sized"]
trait Sized {}
#[lang="sync"]
trait Sync {}
#[lang = "drop_in_place"]
pub unsafe fn drop_in_place<T: ?Sized>(_: *mut T) {}
impl Sync for i32 {}
pub static STATIC: i32 = 32;
extern {
pub static EXT_STATIC: i32;
}
pub fn a() -> &'static i32 {
&STATIC
}
pub fn b() -> &'static i32 {
unsafe {&EXT_STATIC}
}
```
`objdump test.o -r`
nightly-2021-02-20 (rustc target is `x86_64-linux-kernel`):
```
RELOCATION RECORDS FOR [.text._ZN4test1a17h1024ba65f3424175E]:
OFFSET TYPE VALUE
0000000000000007 R_X86_64_32S _ZN4test6STATIC17h3adc41a83746c9ffE
RELOCATION RECORDS FOR [.text._ZN4test1b17h86a6a80c1190ac8dE]:
OFFSET TYPE VALUE
0000000000000007 R_X86_64_32S EXT_STATIC
```
nightly-2021-05-10:
```
RELOCATION RECORDS FOR [.text._ZN4test1a17he846f03bf37b2d20E]:
OFFSET TYPE VALUE
0000000000000007 R_X86_64_GOTPCREL _ZN4test6STATIC17h5a059515bf3d4968E-0x0000000000000004
RELOCATION RECORDS FOR [.text._ZN4test1b17h7e0f7f80fbd91125E]:
OFFSET TYPE VALUE
0000000000000007 R_X86_64_GOTPCREL EXT_STATIC-0x0000000000000004
```
This PR:
```
RELOCATION RECORDS FOR [.text._ZN4test1a17he846f03bf37b2d20E]:
OFFSET TYPE VALUE
0000000000000007 R_X86_64_32S _ZN4test6STATIC17h5a059515bf3d4968E
RELOCATION RECORDS FOR [.text._ZN4test1b17h7e0f7f80fbd91125E]:
OFFSET TYPE VALUE
0000000000000007 R_X86_64_32S EXT_STATIC
```
# Stabilization report
## Summary
This stabilizes using macro expansion in key-value attributes, like so:
```rust
#[doc = include_str!("my_doc.md")]
struct S;
#[path = concat!(env!("OUT_DIR"), "/generated.rs")]
mod m;
```
See the changes to the reference for details on what macros are allowed;
see Petrochenkov's excellent blog post [on internals](https://internals.rust-lang.org/t/macro-expansion-points-in-attributes/11455)
for alternatives that were considered and rejected ("why accept no more
and no less?")
This has been available on nightly since 1.50 with no major issues.
## Notes
### Accepted syntax
The parser accepts arbitrary Rust expressions in this position, but any expression other than a macro invocation will ultimately lead to an error because it is not expected by the built-in expression forms (e.g., `#[doc]`). Note that decorators and the like may be able to observe other expression forms.
### Expansion ordering
Expansion of macro expressions in "inert" attributes occurs after decorators have executed, analogously to macro expressions appearing in the function body or other parts of decorator input.
There is currently no way for decorators to accept macros in key-value position if macro expansion must be performed before the decorator executes (if the macro can simply be copied into the output for later expansion, that can work).
## Test cases
- https://github.com/rust-lang/rust/blob/master/src/test/ui/attributes/key-value-expansion-on-mac.rs
- https://github.com/rust-lang/rust/blob/master/src/test/rustdoc/external-doc.rs
The feature has also been dogfooded extensively in the compiler and
standard library:
- https://github.com/rust-lang/rust/pull/83329
- https://github.com/rust-lang/rust/pull/83230
- https://github.com/rust-lang/rust/pull/82641
- https://github.com/rust-lang/rust/pull/80534
## Implementation history
- Initial proposal: https://github.com/rust-lang/rust/issues/55414#issuecomment-554005412
- Experiment to see how much code it would break: https://github.com/rust-lang/rust/pull/67121
- Preliminary work to restrict expansion that would conflict with this
feature: https://github.com/rust-lang/rust/pull/77271
- Initial implementation: https://github.com/rust-lang/rust/pull/78837
- Fix for an ICE: https://github.com/rust-lang/rust/pull/80563
## Unresolved Questions
~~https://github.com/rust-lang/rust/pull/83366#issuecomment-805180738 listed some concerns, but they have been resolved as of this final report.~~
## Additional Information
There are two workarounds that have a similar effect for `#[doc]`
attributes on nightly. One is to emulate this behavior by using a limited version of this feature that was stabilized for historical reasons:
```rust
macro_rules! forward_inner_docs {
($e:expr => $i:item) => {
#[doc = $e]
$i
};
}
forward_inner_docs!(include_str!("lib.rs") => struct S {});
```
This also works for other attributes (like `#[path = concat!(...)]`).
The other is to use `doc(include)`:
```rust
#![feature(external_doc)]
#[doc(include = "lib.rs")]
struct S {}
```
The first works, but is non-trivial for people to discover, and
difficult to read and maintain. The second is a strange special-case for
a particular use of the macro. This generalizes it to work for any use
case, not just including files.
I plan to remove `doc(include)` when this is stabilized. The
`forward_inner_docs` workaround will still compile without warnings, but
I expect it to be used less once it's no longer necessary.
Remove CrateNum parameter for queries that only work on local crate
The pervasive `CrateNum` parameter is a remnant of the multi-crate rustc idea.
Using `()` as query key in those cases avoids having to worry about the validity of the query key.
Use the object crate for metadata reading
This allows sharing the metadata reader between cg_llvm, cg_clif and other codegen backends.
This is not currently useful for rlib reading with cg_spirv ([rust-gpu](https://github.com/EmbarkStudios/rust-gpu/)) as it uses tar rather than ar as .rlib format, but it is useful for dylib reading required for loading proc macros. (cc `@eddyb)`
The object crate is already trusted as dependency of libstd through backtrace. As far as I know it supports reading all object file formats used by targets for which we support rust dylibs with crate metadata, but I am not certain. If this happens to not be the case, I could keep using LLVM for reading dylib metadata.
Marked as WIP for a perf run and as it is based on #83637.
Add asm!() support for PowerPC
This includes GPRs and FPRs only.
Note that this does not include PowerPC64.
For my reference, this was mostly duplicated from PR #73214.
Fix `--remap-path-prefix` not correctly remapping `rust-src` component paths and unify handling of path mapping with virtualized paths
This PR fixes#73167 ("Binaries end up containing path to the rust-src component despite `--remap-path-prefix`") by preventing real local filesystem paths from reaching compilation output if the path is supposed to be remapped.
`RealFileName::Named` introduced in #72767 is now renamed as `LocalPath`, because this variant wraps a (most likely) valid local filesystem path.
`RealFileName::Devirtualized` is renamed as `Remapped` to be used for remapped path from a real path via `--remap-path-prefix` argument, as well as real path inferred from a virtualized (during compiler bootstrapping) `/rustc/...` path. The `local_path` field is now an `Option<PathBuf>`, as it will be set to `None` before serialisation, so it never reaches any build output. Attempting to serialise a non-`None` `local_path` will cause an assertion faliure.
When a path is remapped, a `RealFileName::Remapped` variant is created. The original path is preserved in `local_path` field and the remapped path is saved in `virtual_name` field. Previously, the `local_path` is directly modified which goes against its purpose of "suitable for reading from the file system on the local host".
`rustc_span::SourceFile`'s fields `unmapped_path` (introduced by #44940) and `name_was_remapped` (introduced by #41508 when `--remap-path-prefix` feature originally added) are removed, as these two pieces of information can be inferred from the `name` field: if it's anything other than a `FileName::Real(_)`, or if it is a `FileName::Real(RealFileName::LocalPath(_))`, then clearly `name_was_remapped` would've been false and `unmapped_path` would've been `None`. If it is a `FileName::Real(RealFileName::Remapped{local_path, virtual_name})`, then `name_was_remapped` would've been true and `unmapped_path` would've been `Some(local_path)`.
cc `@eddyb` who implemented `/rustc/...` path devirtualisation
This doesn't seem to be necessary anymore, although I don't know
at which point or why that changed.
Forcing -O1 makes some tests fail under NewPM, because NewPM also
performs inlining at -O1, so it ends up performing much more
optimization in practice than before.