std: Synchronize global allocator on wasm32
We originally didn't have threads, and now we're starting to add them!
Make sure we properly synchronize access to dlmalloc when the `atomics`
feature is enabled for `wasm32-unknown-unknown`.
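A minimal sketch of the idea, assuming a simple spin lock wrapped around an inner allocator. The `Locked` and `Guard` types are hypothetical names for illustration, `System` stands in for dlmalloc, and this is not the actual libstd code:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical wrapper that serializes every call into the inner allocator.
pub struct Locked<A> {
    inner: A,
    busy: AtomicBool,
}

/// Releases the lock when dropped.
struct Guard<'a>(&'a AtomicBool);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

impl<A> Locked<A> {
    pub const fn new(inner: A) -> Self {
        Locked { inner, busy: AtomicBool::new(false) }
    }

    /// Spin until this thread owns the lock.
    fn lock(&self) -> Guard<'_> {
        while self
            .busy
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        Guard(&self.busy)
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Locked<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _guard = self.lock();
        unsafe { self.inner.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        let _guard = self.lock();
        unsafe { self.inner.dealloc(ptr, layout) }
    }
}
```

Without threads the lock is never contended, which is presumably why the synchronization only matters once `atomics` is enabled.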
doc fix: it's auto traits that make for automatic implementations
Being a marker trait is not good enough (that just means "no items in the trait").
r? @alexcrichton who [originally wrote these docs](0a13f1abaf).
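To illustrate the distinction, a small sketch (the `Marker`, `Plain`, and `Tagged` names are made up for this example): a marker trait still needs an explicit `impl`, while an auto trait such as `Send` is implemented automatically.

```rust
/// A marker trait: it has no items, but it is NOT implemented automatically.
pub trait Marker {}

pub struct Plain(pub u32);
pub struct Tagged(pub u32);

// `Tagged` opts in explicitly; `Plain` does not.
impl Marker for Tagged {}

/// Compiles only for types that implement the auto trait `Send`.
pub fn is_send<T: Send>() -> bool {
    true
}

/// Compiles only for types with an explicit `Marker` impl.
pub fn is_marker<T: Marker>() -> bool {
    true
}

// `is_send::<Plain>()` compiles: `Send` is an auto trait and `u32` is `Send`.
// `is_marker::<Plain>()` would be rejected: nothing implements `Marker` for `Plain`.
```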
Add doc comments about safest way to initialize a vector of zeros
This adds more information about the `vec!` macro, as discussed in #54628. I think this is a good starting point, but more detail is needed to explain why `vec!` is safer than the alternatives.
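For reference, a minimal sketch of the pattern in question. The helper names `zeros` and `zeros_via_resize` are made up for this example:

```rust
/// Builds a zero-filled vector with the `vec!` macro.
pub fn zeros(len: usize) -> Vec<u8> {
    vec![0; len]
}

/// An equivalent safe alternative: reserve the capacity, then fill with zeros.
pub fn zeros_via_resize(len: usize) -> Vec<u8> {
    let mut v = Vec::with_capacity(len);
    v.resize(len, 0);
    v
}
```

Neither version ever exposes uninitialized memory, and neither needs `unsafe`.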
NLL says "borrowed content" instead of more precise "dereference of raw pointer"
Part of #52663.
Previously, move errors involving the dereference of a raw pointer would
say "borrowed content". This commit changes it to say "dereference of
raw pointer".
r? @nikomatsakis
cc @pnkfelix
Documents reference equality by address (#54197)
Clarifies the use of `ptr::eq` to test equality of references by address, via pointer coercion (issue #54197).
The same example as in the `ptr::eq` docs is shown here to clarify that
`PartialEq` compares the pointed-to values rather than addresses (comparing addresses can be desired in some cases).
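A short sketch of the difference (the `compare` helper is made up for this example and is not the doc example itself):

```rust
use std::ptr;

/// Returns `(equal by value, equal by address)` for two references.
pub fn compare(a: &i32, b: &i32) -> (bool, bool) {
    // `==` goes through `PartialEq` and compares the pointed-to values.
    // `ptr::eq` coerces both references to raw pointers and compares addresses.
    (a == b, ptr::eq(a, b))
}
```

Two distinct variables holding the same value are equal by value but not by address; a reference compared with itself is equal in both senses.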
It looks like we tend to use angle-brackets around the placeholder in
the few other places we use `Applicability::HasPlaceholders`, but that
would be confusing here, so ...
Use MaybeUninit in liballoc
All code by @japaric. This is a re-submission of a part of https://github.com/rust-lang/rust/pull/53508 that hopefully does not regress performance.
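For readers unfamiliar with the type, a minimal sketch of the `MaybeUninit` pattern. It uses the method names from today's stable API rather than the ones available when this was submitted, and the `squares` function is made up for illustration; it is not code from liballoc:

```rust
use std::mem::MaybeUninit;

/// Fills a fixed-size buffer element by element without zeroing it first.
pub fn squares() -> [u32; 4] {
    // Each slot starts out uninitialized; reading it now would be undefined behavior.
    let mut buf: [MaybeUninit<u32>; 4] = [MaybeUninit::uninit(); 4];

    for (i, slot) in buf.iter_mut().enumerate() {
        let n = i as u32;
        slot.write(n * n);
    }

    // SAFETY: the loop above wrote every element of `buf`.
    buf.map(|slot| unsafe { slot.assume_init() })
}
```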
Support for disabling PLT for better function call performance
This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.
AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](https://github.com/rust-lang/rust/pull/43170), lazy binding was disabled anyway.
This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).
Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.
## Performance
I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):
```
name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %   speedup
build_app_long    11,097           10,733          -364          -3.28%   x 1.03
build_app_short   11,089           10,742          -347          -3.13%   x 1.03
build_help_long   186,835          182,713         -4,122        -2.21%   x 1.02
build_help_short  80,949           78,455          -2,494        -3.08%   x 1.03
parse_clean       12,385           12,044          -341          -2.75%   x 1.03
parse_complex     19,438           19,017          -421          -2.17%   x 1.02
parse_lots        431,493          421,421         -10,072       -2.33%   x 1.02
```
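The derived columns follow from the two timing columns. As a sketch of the arithmetic (the function names are made up here):

```rust
/// Relative change in percent: negative means the no-plt build is faster.
pub fn diff_percent(control_ns: f64, no_plt_ns: f64) -> f64 {
    (no_plt_ns - control_ns) / control_ns * 100.0
}

/// Speedup factor: control time divided by no-plt time.
pub fn speedup(control_ns: f64, no_plt_ns: f64) -> f64 {
    control_ns / no_plt_ns
}
```

For `build_app_long`, `diff_percent(11097.0, 10733.0)` is about -3.28 and `speedup(11097.0, 10733.0)` is about 1.03, matching the first row.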
A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.
## Security benefits
**Bonus**: some of the speculative execution attacks rely on the PLT. By disabling it, we remove a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).
## Remaining PLT calls
The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
Disable the PLT where possible to improve performance
for indirect calls into shared libraries.
This optimization is enabled by default where possible.
- Add the `NonLazyBind` attribute to `rustllvm`:
This attribute informs LLVM to skip PLT calls in codegen.
- Disable PLT unconditionally:
Apply the `NonLazyBind` attribute on every function.
- Only enable no-plt when full relro is enabled:
Ensures we only enable it when we have linker support.
- Add `-Z plt` as a compiler option
This commit extends the existing lang items functionality to assert
that the `#[lang_item]` attribute is only found on the appropriate item
for any given lang item. That is, language items representing traits
must only ever have their corresponding attribute placed on a trait, for
example.
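A rough sketch of the kind of check described, using a hypothetical `Target` enum and a hand-written table. The names and the lang items listed are illustrative only and are not taken from the compiler's implementation:

```rust
/// Hypothetical classification of the item an attribute is attached to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Trait,
    Struct,
    Fn,
}

/// Illustrative table: which kind of item each lang item must be placed on.
fn expected_target(lang_item: &str) -> Option<Target> {
    match lang_item {
        "sized" | "copy" => Some(Target::Trait),
        "owned_box" => Some(Target::Struct),
        "panic_impl" => Some(Target::Fn),
        _ => None,
    }
}

/// Rejects a lang item attribute that sits on the wrong kind of item.
pub fn check_lang_item(lang_item: &str, found: Target) -> Result<(), String> {
    match expected_target(lang_item) {
        Some(expected) if expected == found => Ok(()),
        Some(expected) => Err(format!(
            "`{}` lang item must be applied to a {:?}, found a {:?}",
            lang_item, expected, found
        )),
        None => Err(format!("unknown lang item `{}`", lang_item)),
    }
}
```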
This adds an implementation of thread local storage for the
`wasm32-unknown-unknown` target when the `atomics` feature is
enabled. This comes with a notable caveat, however: it
requires a new feature of the standard library, `wasm-bindgen-threads`,
to be enabled.
Thread local storage for wasm (when `atomics` are enabled and there's
actually more than one thread) is powered by the assumption that an
external entity can fill in some information for us. It's not currently
clear who will fill in this information nor whose responsibility it
should be long-term. In the meantime there's a strategy being gamed out
in the `wasm-bindgen` project specifically, and the hope is that we can
continue to test and iterate on the standard library without committing
to a particular strategy yet.
As to the details of `wasm-bindgen`'s strategy, LLVM doesn't currently
have the ability to emit custom `global` values (thread locals in a
`WebAssembly.Module`) so we leverage the `wasm-bindgen` CLI tool to do
it for us. To that end we have a few intrinsics, assuming two global values:
* `__wbindgen_current_id` - gets the current thread id as a 32-bit
integer. It's `wasm-bindgen`'s responsibility to initialize this
per-thread and then inform libstd of the id. Currently `wasm-bindgen`
performs this initialization as part of the `start` function.
* `__wbindgen_tcb_{get,set}` - in addition to a thread id it's assumed
that there's a global available for simply storing a pointer's worth
of information (a thread control block, which currently only contains
thread local storage). This would ideally be a native `global`
injected by LLVM, but we don't have a great way to support that right
now.
To reiterate, this is all intended to be unstable and exists purely
for testing out Rust on the web with threads. The story is very likely
to change in the future and we want to make sure that we're able to do
that!
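To make the thread control block idea concrete, here is a sketch that runs on any target. It assumes a per-thread, pointer-sized slot and builds key/value thread local storage on top of it. `tcb_get` and `tcb_set` are hypothetical stand-ins for `__wbindgen_tcb_{get,set}`, emulated with `thread_local!`, and the `ThreadControlBlock` layout is invented for illustration:

```rust
use std::cell::Cell;
use std::collections::HashMap;

thread_local! {
    // Stand-in for the per-thread global that an external tool would inject.
    static TCB_SLOT: Cell<usize> = Cell::new(0);
}

/// Hypothetical equivalent of `__wbindgen_tcb_get`.
fn tcb_get() -> usize {
    TCB_SLOT.with(|slot| slot.get())
}

/// Hypothetical equivalent of `__wbindgen_tcb_set`.
fn tcb_set(value: usize) {
    TCB_SLOT.with(|slot| slot.set(value))
}

/// Invented layout: the control block only holds thread local storage.
struct ThreadControlBlock {
    keys: HashMap<usize, usize>,
}

/// Returns this thread's control block, allocating it on first use.
/// The block is intentionally leaked at thread exit to keep the sketch short.
fn current_tcb() -> *mut ThreadControlBlock {
    let existing = tcb_get() as *mut ThreadControlBlock;
    if !existing.is_null() {
        return existing;
    }
    let fresh = Box::into_raw(Box::new(ThreadControlBlock { keys: HashMap::new() }));
    tcb_set(fresh as usize);
    fresh
}

pub fn tls_set(key: usize, value: usize) {
    // SAFETY: the pointer came from `Box::into_raw` on this thread and no
    // other reference to the block is live.
    let tcb = unsafe { &mut *current_tcb() };
    tcb.keys.insert(key, value);
}

pub fn tls_get(key: usize) -> Option<usize> {
    // SAFETY: same reasoning as in `tls_set`.
    let tcb = unsafe { &*current_tcb() };
    tcb.keys.get(&key).copied()
}
```

Each thread sees only the values it stored itself, since every thread gets its own control block through its own slot.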
Fix #54707 - parse_trait_item_ now handles interpolated blocks as function body decls
Previously, parsing trait items only handled an opening brace token or a semicolon. This adds a branch to the match statement that also handles interpolated blocks.
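The shape of the change, sketched with hypothetical `Token` and `Body` types. These are not the parser's real types; they only illustrate the extra match arm:

```rust
/// Hypothetical token kinds that can follow a trait method signature.
#[derive(Debug, PartialEq)]
pub enum Token {
    Semi,
    OpenBrace,
    /// A block that was already parsed and substituted by a macro.
    InterpolatedBlock(String),
    Other,
}

/// Hypothetical result of parsing the method body position.
#[derive(Debug, PartialEq)]
pub enum Body {
    /// `fn f();` with no default body.
    None,
    /// `fn f() { ... }` parsed from an opening brace.
    Parsed,
    /// Body taken from an interpolated block.
    Interpolated(String),
}

pub fn parse_method_body(token: Token) -> Result<Body, String> {
    match token {
        Token::Semi => Ok(Body::None),
        Token::OpenBrace => Ok(Body::Parsed),
        // The new branch: accept an interpolated block as the body.
        Token::InterpolatedBlock(block) => Ok(Body::Interpolated(block)),
        other => Err(format!("expected `;` or `{{`, found {:?}", other)),
    }
}
```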
Better Diagnostic for Trait Object Capture
Part of #52663.
This commit enhances `LaterUseKind` detection to identify when a borrow
is captured by a trait object which helps explain why there is a borrow
error.
r? @nikomatsakis
cc @pnkfelix
codegen_llvm: verify that inline assembly operands are scalars
Another set of inline assembly fixes. This time let's emit an error message when the operand value cannot be coerced into the operand constraint.
Two questions:
1) Should I reuse `E0668`, which was introduced in #54568, or keep using `E0669` as it stands? The two errors do mean different things, but maybe that's not too user-friendly. Just a thought.
2) The `try_fold` returns the operand that failed to be converted into a scalar value. Any suggestions on how to use that in the error message?
Thanks!
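For context on the second question, a sketch of how `Iterator::try_fold` can hand back the first failing element. The `Operand` enum and `collect_scalars` function are hypothetical and unrelated to the codegen types:

```rust
/// Hypothetical operand value: only `Scalar` is acceptable.
#[derive(Debug, PartialEq)]
pub enum Operand {
    Scalar(i64),
    Pair(i64, i64),
}

/// Collects all scalar values, or returns the first operand that is not a scalar.
pub fn collect_scalars(operands: &[Operand]) -> Result<Vec<i64>, &Operand> {
    operands.iter().try_fold(Vec::new(), |mut scalars, operand| match operand {
        Operand::Scalar(value) => {
            scalars.push(*value);
            Ok(scalars)
        }
        // Short-circuits the fold and carries the offending operand out.
        other => Err(other),
    })
}
```

The `Err` payload is a reference to the offending operand, so a caller could look up its position or span and mention it in the diagnostic.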