The llvm.copysign and llvm.round intrinsics weren't added until LLVM 3.4, so if
we're on LLVM 3.3 we lower these to calls in libm instead of LLVM intrinsics.
This should fix our travis failures.
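As a sketch of the idea (illustrative only; the actual trans code is organized
differently), the lowering picks an intrinsic or a libm symbol based on the
LLVM version:

    // Hypothetical helper: choose the lowering for `round` by LLVM version.
    fn round_lowering(llvm_version: (u32, u32)) -> &'static str {
        if llvm_version >= (3, 4) {
            "llvm.round.f64" // the intrinsic exists as of LLVM 3.4
        } else {
            "round" // fall back to the libm symbol on LLVM 3.3
        }
    }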
The travis builds have been breaking recently because LLVM 3.5 upstream is
changing. This looks like it's likely to continue, so it would be more useful
for us if we could lock ourselves to a system LLVM version that is not changing.
This commit brings our C++ glue to LLVM back in line with what was possible
back in LLVM 3.{3,4}. I don't think we're going to be able to
reasonably protect against regressions in the future, but this kind of code is a
good sign that we can continue to use the system LLVM for simple-ish things.
Codegen for ARM won't work and it won't have some of the perf improvements we
have, but using the system LLVM should work well enough for development.
Set "Dwarf Version" to 2 on OS X to avoid toolchain incompatibility, and
set "Debug Info Version" to prevent debug info from being stripped from
bitcode.
Fixes #11352.
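A rough sketch of what this amounts to, assuming a rustllvm glue binding named
LLVMRustAddModuleFlag (the name, signature, and flag values here are
assumptions, not the exact code):

    use std::os::raw::{c_char, c_uint, c_void};

    type ModuleRef = *mut c_void; // stand-in for the LLVM module handle

    extern "C" {
        // Assumed glue binding wrapping llvm::Module::addModuleFlag.
        fn LLVMRustAddModuleFlag(m: ModuleRef, name: *const c_char, value: c_uint);
    }

    unsafe fn add_debug_module_flags(m: ModuleRef, targeting_osx: bool) {
        if targeting_osx {
            // Older OS X toolchains only understand DWARF 2.
            LLVMRustAddModuleFlag(m, "Dwarf Version\0".as_ptr() as *const c_char, 2);
        }
        // Without this flag, newer LLVMs consider the debug info outdated
        // and strip it from the bitcode (the version constant varies by release).
        LLVMRustAddModuleFlag(m, "Debug Info Version\0".as_ptr() as *const c_char, 1);
    }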
Major changes:
- Define temporary scopes in a syntax-based way that basically defaults
to the innermost statement or conditional block, except for in
a `let` initializer, where we default to the innermost block. Rules
are documented in the code, but not in the manual (yet).
See the new test run-pass/cleanup-value-scopes.rs for examples, and the
sketch after this list.
- Refactors Datum to better define cleanup roles.
- Refactor cleanup scopes to not be tied to basic blocks, permitting
us to have a very large number of scopes (one per AST node).
- Introduce nascent documentation in trans/doc.rs covering datums and
cleanup in a more comprehensive way.
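As a quick illustration of the default rules (the test file above has the
authoritative cases; helper names here are made up):

    struct Temp;
    fn make_temp() -> Temp { Temp }
    fn use_it(_t: &Temp) {}

    fn main() {
        // The temporary lives only to the end of this statement, the
        // innermost enclosing statement being its cleanup scope...
        use_it(&make_temp());

        // ...but in a `let` initializer the innermost *block* is the
        // cleanup scope, so `r` stays usable for the rest of the block.
        let r = &make_temp();
        use_it(r);
    }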
This pull request fixes #11083. The problem was that recursive type definitions were not properly handled for enum types, leading to problems with LLVM's metadata "uniquing". This bug had already been fixed for struct types some time ago (#9658), but I seem to have forgotten about enums back then. I added the offending code from issue #11083 as a test case.
We were previously reading metadata via `ar p`, but as we learned from rustdoc
a while back, spawning a process to do something is pretty slow. Turns out LLVM
has an Archive class to read archives, but it cannot write archives.
This commit adds bindings to the read-only version of the LLVM archive class
(with a new type that only has a read() method), and then it uses this class
when reading the metadata out of rlibs. When you put this in tandem with not
compressing the metadata, reading the metadata is 4x faster than it used to be.
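As a minimal sketch of the shape of those bindings (names and signatures are
illustrative; an in-memory map stands in for the underlying LLVM archive
handle):

    use std::collections::HashMap;

    /// Read-only archive wrapper: it only exposes `read()`, mirroring the
    /// fact that LLVM's Archive class cannot write archives.
    pub struct ArchiveRO {
        members: HashMap<String, Vec<u8>>, // stand-in for the LLVM handle
    }

    impl ArchiveRO {
        pub fn open(members: HashMap<String, Vec<u8>>) -> ArchiveRO {
            ArchiveRO { members }
        }

        /// Fetch a member (e.g. the metadata file) by name, without
        /// spawning an external `ar p` process.
        pub fn read(&self, name: &str) -> Option<&[u8]> {
            self.members.get(name).map(|v| &v[..])
        }
    }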
The timings I got for reading metadata from the respective libraries were:
libstd-04ff901e-0.9-pre.dylib => 100ms
libstd-04ff901e-0.9-pre.rlib => 23ms
librustuv-7945354c-0.9-pre.dylib => 4ms
librustuv-7945354c-0.9-pre.rlib => 1ms
librustc-5b94a16f-0.9-pre.dylib => 87ms
librustc-5b94a16f-0.9-pre.rlib => 35ms
libextra-a6ebb16f-0.9-pre.dylib => 63ms
libextra-a6ebb16f-0.9-pre.rlib => 15ms
libsyntax-2e4c0458-0.9-pre.dylib => 86ms
libsyntax-2e4c0458-0.9-pre.rlib => 22ms
In order to always take advantage of these faster metadata read-times, I sort
the files in filesearch based on whether they have an rlib extension or not
(prefer all rlib files first).
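Roughly (a hedged sketch, not the filesearch code itself):

    // Stable-sort candidate filenames so every rlib is considered first.
    fn prefer_rlibs(candidates: &mut Vec<String>) {
        candidates.sort_by_key(|f| if f.ends_with(".rlib") { 0u8 } else { 1 });
    }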
Overall, this halved the compile time for a `fn main() {}` crate from 0.185s to
0.095s on my system (when preferring dynamic linking). Reading metadata is still
the slowest pass of the compiler at 0.035s, but it's getting pretty close to
linking at 0.021s! The next best optimization is to just not copy the metadata
from LLVM because that's the most expensive part of reading metadata right now.
This commit implements LTO for rust leveraging LLVM's passes. What this means
is:
* When compiling an rlib, in addition to inserting foo.o into the archive, also
insert foo.bc (the LLVM bytecode) of the optimized module.
* When the compiler detects the -Z lto option, it will attempt to perform LTO on
a staticlib or binary output. The compiler will emit an error if a dylib or
rlib output is being generated.
* The actual act of performing LTO is as follows:
1. Force all upstream libraries to have an rlib version available.
2. Load the bytecode of each upstream library from the rlib.
3. Link all this bytecode into the current LLVM module (just using LLVM
APIs)
4. Run an internalization pass which internalizes all symbols except those
reachable from the local crate being compiled (see the toy sketch below).
5. Run the LLVM LTO pass manager over this entire module
6a. If assembling an archive, then add all upstream rlibs into the output
archive. This ignores all of the object/bitcode/metadata files rust
generated and placed inside the rlibs.
6b. If linking a binary, create copies of all upstream rlibs, remove the
rust-generated object-file, and then link everything as usual.
As I have explained in #10741, this process is excruciatingly slow, so this is
*not* turned on by default, and it is also why I have decided to hide it behind
a -Z flag for now. The good news is that the binary sizes are about as small as
they can be as a result of LTO, so it's definitely working.
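As a toy model of the internalization step mentioned in the list above (the
real pass operates on LLVM GlobalValues; these names are made up):

    use std::collections::HashSet;

    #[derive(Debug, PartialEq)]
    enum Linkage { External, Internal }

    struct Symbol { name: String, linkage: Linkage }

    /// Everything not reachable from the local crate of compilation gets
    /// internal linkage, letting the LTO passes inline or drop it.
    fn internalize(module: &mut [Symbol], reachable: &HashSet<String>) {
        for sym in module.iter_mut() {
            if !reachable.contains(&sym.name) {
                sym.linkage = Linkage::Internal;
            }
        }
    }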
Closes #10741, closes #10740
LLVM's JIT has been updated numerous times, and we haven't been tracking it at
all. The existing LLVM glue code no longer compiles, and the JIT isn't used for
anything currently.
This also rebases out the FixedStackSegment support that we had added to LLVM.
None of this is still in use by the compiler, and there's no need to keep this
functionality around inside of LLVM.
This is needed to unblock #10708 (where we're tripping an LLVM assertion).
Example:
void ({ i64, %tydesc*, i8*, i8*, i8 }*, i64*, %"struct.std::fmt::Formatter[#1]"*)*
Before, we would print 20 levels deep due to recursion in the type
definition.
Also fixed a nasty bug caused by calling LLVMDIBuilderCreateStructType() with a null pointer where an empty array was expected (which would trigger an unintelligible assertion somewhere down the line).
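The fix, conceptually (a sketch, not the actual rustllvm code):

    // Always hand LLVM a valid (possibly empty) member array plus a count,
    // rather than a null pointer.
    type MetadataRef = *const std::ffi::c_void; // stand-in for LLVM's handle type

    fn member_array(members: &[MetadataRef]) -> (*const MetadataRef, u32) {
        // `as_ptr()` on an empty slice is still non-null, which is what the
        // assertion inside LLVM expects; passing a raw null here was the bug.
        (members.as_ptr(), members.len() as u32)
    }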
Beforehand, it was unclear whether rust was performing the "recommended set" of
optimizations provided by LLVM. This commit changes the way we run
passes to closely mirror that of clang, which in theory does it correctly. The
notable changes include:
* Passes are no longer explicitly added one by one. This would be difficult to
keep up with as LLVM changes, and we aren't guaranteed to always know the best
order in which to run passes
* Passes are now managed by LLVM's PassManagerBuilder object. This is then used
to populate the various pass managers that are run (see the sketch after this
list).
* We now run both a FunctionPassManager and a module-wide PassManager. This is
what clang does, and I presume that we *may* see a speed boost from the
module-wide passes just having to do less work. I have not measured this.
* The codegen pass manager has been extracted to its own separate pass manager
to not get mixed up with the other passes
* All pass managers now include passes for target-specific data layout and
analysis passes
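A rough sketch of the PassManagerBuilder setup referenced in the list above,
using the LLVM-C API of that era (declarations abbreviated; the opaque handle
types are stand-ins):

    use std::os::raw::{c_uint, c_void};

    type PassManagerBuilderRef = *mut c_void;
    type PassManagerRef = *mut c_void;

    extern "C" {
        fn LLVMPassManagerBuilderCreate() -> PassManagerBuilderRef;
        fn LLVMPassManagerBuilderSetOptLevel(pmb: PassManagerBuilderRef, opt: c_uint);
        fn LLVMPassManagerBuilderPopulateFunctionPassManager(pmb: PassManagerBuilderRef,
                                                             fpm: PassManagerRef);
        fn LLVMPassManagerBuilderPopulateModulePassManager(pmb: PassManagerBuilderRef,
                                                           mpm: PassManagerRef);
        fn LLVMPassManagerBuilderDispose(pmb: PassManagerBuilderRef);
    }

    unsafe fn populate(opt: c_uint, fpm: PassManagerRef, mpm: PassManagerRef) {
        // One builder configures both pass managers, mirroring clang.
        let pmb = LLVMPassManagerBuilderCreate();
        LLVMPassManagerBuilderSetOptLevel(pmb, opt);
        LLVMPassManagerBuilderPopulateFunctionPassManager(pmb, fpm);
        LLVMPassManagerBuilderPopulateModulePassManager(pmb, mpm);
        LLVMPassManagerBuilderDispose(pmb);
    }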
Some new features include:
* You can now print all passes being run with `-Z print-llvm-passes`
* When specifying passes via `--passes`, the passes are now appended to the
default list of passes instead of overwriting them.
* The output of `--passes list` is now generated by LLVM instead of coming from
a list of passes we maintained ourselves
* Loop vectorization is turned on by default as an optimization pass and can be
disabled with `-Z no-vectorize-loops`
We currently have no need for frame pointers on any platform. They
may eventually be needed on platforms without an equivalent to the DWARF
call frame information to walk the stack in the garbage collector.
Closes #7477
The first commit message is pretty good, but whoever reviews this should probably also at least glance at the changes I made in LLVM. I basically reorganized our pending patch queue to be a bit more organized and clearer about what needs to go where. After this, our queue would be:
* Add the `no-split-stack` attribute
* Add the `fixedstacksegment` attribute
* Add split-stacks for arm android
* Add split-stacks for arm linux
* Add split stacks for mips
Then there's a patch which I added to get rust to build at all on LLVM-head, and I'm not quite sure why it's needed, but nothing seems to be crashing for now! (famous last words).
Otherwise, I just updated code to reflect the changes I made in LLVM with the only major change being the advent of the new `no_split_stack` attribute. This is work towards #1226, but someone more familiar with the code should probably actually assign the attribute to the appropriate functions.
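Hypothetical usage (which functions should carry the attribute is exactly the open question above):

    // Tag functions that must not run the split-stack prologue, e.g. code
    // that runs before the stack-limit machinery is set up.
    #[no_split_stack]
    fn early_runtime_entry() {
        // must not grow the stack via the usual __morestack path
    }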
Also as a bonus, I've verified that this closes #5774.
Adds a `--target-cpu` flag which lets you choose a more specific target cpu instead of just passing the default, `generic`. It's more or less akin to `-mcpu`/`-mtune` in clang/gcc.
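For example (the cpu name here is illustrative): `rustc --target-cpu corei7 -O foo.rs`.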
This can be applied to statics, and it indicates that LLVM will attempt to
merge the constant in .data with other statics.
I have preliminarily applied this to all of the statics generated by the new
`ifmt!` syntax extension. I compiled a file with 1000 calls to `ifmt!` and a
separate file with 1000 calls to `fmt!` to compare the sizes, and the results
were:
fmt 310k
ifmt (before) 529k
ifmt (after) 202k
This now means that ifmt! is both faster and smaller than fmt!, yay!
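An illustrative sketch; the attribute name below is my assumption, the real
name is given elsewhere in this commit:

    // Marks the static's address as insignificant so LLVM may merge
    // identical constants in .data (attribute name assumed).
    #[address_insignificant]
    static PIECES: &'static [&'static str] = &["x = ", "\n"];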
* LLVM now has a C interface to LLVMBuildAtomicRMW
* The exception handling support for the JIT seems to have been dropped
* Various interfaces have been added or headers have changed
Refactor the optimization passes to add each pass explicitly. This commit
just re-implements the same passes that were already being run.
It also adds an option (behind `-Z`) to run the LLVM lint pass on the
unoptimized IR.
The default versions (atomic_load and atomic_store) are sequentially consistent.
The atomic_load_acq intrinsic acquires as described in [1].
The atomic_store_rel intrinsic releases as described in [1].
[1]: http://llvm.org/docs/Atomics.html
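For reference, the stable `std::sync::atomic` orderings map onto these
intrinsics roughly as follows (a modern illustration, not the intrinsics
themselves):

    use std::sync::atomic::{AtomicUsize, Ordering};

    fn demo(flag: &AtomicUsize) {
        let _v = flag.load(Ordering::SeqCst);  // atomic_load (default: seq. cst.)
        let _a = flag.load(Ordering::Acquire); // atomic_load_acq
        flag.store(1, Ordering::SeqCst);       // atomic_store (default)
        flag.store(2, Ordering::Release);      // atomic_store_rel
    }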