The only change to the default passes is that O1 now doesn't run the inline
pass, just always-inline with lifetime intrinsics. O2 also now has an inlining
threshold of 225 instead of 275. Otherwise the default passes being run are the same.
I've also added a few more options for configuring the pass pipeline. Namely you
can now specify arguments to LLVM directly via the `--llvm-args` command line
option which operates similarly to `--passes`. I also added the ability to turn
off pre-population of the pass manager in case you want to run *only* your own
passes.
Beforehand, it was unclear whether rust was performing the "recommended set" of
optimizations provided by LLVM for code. This commit changes the way we run
passes to closely mirror that of clang, which in theory does it correctly. The
notable changes include:
* Passes are no longer explicitly added one by one. This would be difficult to
keep up with as LLVM changes, and we aren't guaranteed to always know the best
order in which to run passes.
* Passes are now managed by LLVM's PassManagerBuilder object. This is then used
to populate the various pass managers run.
* We now run both a FunctionPassManager and a module-wide PassManager. This is
what clang does, and I presume that we *may* see a speed boost from the
module-wide passes just having to do less work. I have not measured this.
* The codegen pass manager has been extracted to its own separate pass manager
to not get mixed up with the other passes
* All pass managers now include passes for target-specific data layout and
analysis passes
Some new features include:
* You can now print all passes being run with `-Z print-llvm-passes`
* When specifying passes via `--passes`, the passes are now appended to the
default list of passes instead of overwriting them.
* The output of `--passes list` is now generated by LLVM instead of maintaining
a list of passes ourselves
* Loop vectorization is turned on by default as an optimization pass and can be
disabled with `-Z no-vectorize-loops`
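
To make the append/pre-populate behavior above concrete, here is a toy model in Rust. It is only a sketch and not the code in rustc: `PipelineOptions`, `build_pipeline`, and the pass names are placeholders invented for the illustration.

```rust
/// Options controlling how the toy pipeline is assembled.
/// Hypothetical type: it only mirrors the behavior described in the text.
struct PipelineOptions {
    /// Whether to start from the default passes (pre-population).
    prepopulate: bool,
    /// Extra passes requested by the user, e.g. through `--passes`.
    extra_passes: Vec<String>,
}

/// Builds the ordered list of pass names that would be run.
fn build_pipeline(defaults: &[&str], opts: &PipelineOptions) -> Vec<String> {
    let mut pipeline = Vec::new();
    if opts.prepopulate {
        pipeline.extend(defaults.iter().map(|p| p.to_string()));
    }
    // User passes are appended after the defaults; they never replace them.
    pipeline.extend(opts.extra_passes.iter().cloned());
    pipeline
}

fn main() {
    // Placeholder default list, not the real set of passes.
    let defaults = ["always-inline", "instcombine"];

    let appended = PipelineOptions {
        prepopulate: true,
        extra_passes: vec!["lint".to_string()],
    };
    println!("{:?}", build_pipeline(&defaults, &appended));

    // With pre-population turned off, only the user's passes remain.
    let only_mine = PipelineOptions {
        prepopulate: false,
        extra_passes: vec!["lint".to_string()],
    };
    println!("{:?}", build_pipeline(&defaults, &only_mine));
}
```

The point of the model is that there are two independent knobs: whether the defaults are included at all, and what gets appended afterwards.
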
We currently have no need for the frame pointers on any platform. They
may eventually be needed on platforms without an equivalent to the DWARF
call frame information to walk the stack in the garbage collector.
Closes #7477
The first commit message is pretty good, but whoever reviews this should probably also at least glance at the changes I made in LLVM. I basically reorganized our pending patch queue to be a bit tidier and clearer about what needs to go where. After this, our queue would be:
* Add the `no-split-stack` attribute
* Add the `fixedstacksegment` attribute
* Add split-stacks for arm android
* Add split-stacks for arm linux
* Add split-stacks for mips
Then there's a patch which I added to get rust to build at all on LLVM-head, and I'm not quite sure why it's there, but nothing seems to be crashing for now! (famous last words).
Otherwise, I just updated code to reflect the changes I made in LLVM with the only major change being the advent of the new `no_split_stack` attribute. This is work towards #1226, but someone more familiar with the code should probably actually assign the attribute to the appropriate functions.
Also as a bonus, I've verified that this closes #5774
* This has one workaround patch (everything's testing just fine...)
* I reworked the fixedstacksegment attribute to be specified with a string
rather than using a keyword and an integer and modifying the parser
* I added a "no-split-stack" attribute along the same lines as the
"fixedstacksegment" attribute for #1226
Adds `--target-cpu` flag which lets you choose a more specific target cpu instead of just passing the default, `generic`. It's more or less akin to `-mcpu`/`-mtune` in clang/gcc.
This can be applied to statics to indicate that LLVM should attempt to
merge the constant in .data with other statics.
I have preliminarily applied this to all of the statics generated by the new
`ifmt!` syntax extension. I compiled a file with 1000 calls to `ifmt!` and a
separate file with 1000 calls to `fmt!` to compare the sizes, and the results
were:
    fmt            310k
    ifmt (before)  529k
    ifmt (after)   202k
This now means that ifmt! is both faster and smaller than fmt!, yay!
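
For a sense of scale, the relative savings can be computed from the three sizes above. This is just arithmetic on the reported numbers; `percent_smaller` is a helper made up for the sketch.

```rust
/// Returns how much smaller `after_kb` is than `before_kb`, in percent.
fn percent_smaller(before_kb: f64, after_kb: f64) -> f64 {
    (before_kb - after_kb) / before_kb * 100.0
}

fn main() {
    // Sizes in kilobytes, taken from the measurements quoted above.
    let fmt = 310.0;
    let ifmt_before = 529.0;
    let ifmt_after = 202.0;

    println!(
        "ifmt after vs. ifmt before: {:.1}% smaller",
        percent_smaller(ifmt_before, ifmt_after)
    );
    println!(
        "ifmt after vs. fmt: {:.1}% smaller",
        percent_smaller(fmt, ifmt_after)
    );
}
```

That works out to roughly 62% smaller than `ifmt!` before the change and roughly 35% smaller than `fmt!`.
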
* LLVM now has a C interface to LLVMBuildAtomicRMW
* The exception handling support for the JIT seems to have been dropped
* Various interfaces have been added or headers have changed
These blocks were required because previously we could only insert
instructions at the end of blocks, but we wanted to have all allocas in
one place, so they can be collapsed. But now we have "direct" access to
the LLVM IR builder and can position it freely. This allows us to use
the same trick that clang uses, which means that we insert a dummy
"marker" instruction to identify the spot at which we want to insert
allocas. We can then later position the IR builder at that spot and
insert the alloca instruction, without any dedicated block.
The block for loading the closure environment can now also go away,
because the function context now provides the toplevel block, and the
translation of the loading happens first, so that's good enough.
Makes the LLVM IR a bit more readable, saving a bunch of branches in the
unoptimized code, which benefits unoptimized builds.
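
The marker trick described above can be modeled without LLVM at all. The following is a toy sketch in Rust, not the trans code from this commit: `Inst`, `Block`, and their methods are invented for illustration, and dropping the marker at the end is an assumption of the sketch.

```rust
/// A toy instruction set, just enough to show the idea.
#[derive(Debug, PartialEq)]
enum Inst {
    /// Dummy "marker" instruction identifying where allocas go.
    AllocaPoint,
    Alloca(String),
    Other(String),
}

/// A single toy block that starts out holding only the marker.
struct Block {
    insts: Vec<Inst>,
}

impl Block {
    fn new() -> Self {
        Block { insts: vec![Inst::AllocaPoint] }
    }

    /// Ordinary instructions are appended at the end.
    fn push(&mut self, text: &str) {
        self.insts.push(Inst::Other(text.to_string()));
    }

    /// Allocas are inserted right before the marker, so they stay
    /// grouped at the top no matter when they are requested.
    fn alloca(&mut self, name: &str) {
        let pos = self
            .insts
            .iter()
            .position(|i| *i == Inst::AllocaPoint)
            .expect("the marker is always present until finish()");
        self.insts.insert(pos, Inst::Alloca(name.to_string()));
    }

    /// Drops the marker once the block is complete (sketch assumption).
    fn finish(mut self) -> Vec<Inst> {
        self.insts.retain(|i| *i != Inst::AllocaPoint);
        self.insts
    }
}

fn main() {
    let mut block = Block::new();
    block.push("x = add 1, 2");
    block.alloca("a"); // requested late, still lands at the top
    block.push("store x, a");
    block.alloca("b");
    println!("{:?}", block.finish());
}
```

No dedicated alloca block is needed in this model: the marker plays the role of the insertion position, and everything else is appended as usual.
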
This un-reverts the reverts of the rusti commits made a while back. These were reverted for an LLVM failure in rustpkg. I believe that this is not a problem with these commits, but rather that rustc is being used in parallel for rustpkg tests (in-process). This is not working yet (almost! see #7011), so I serialized all the tests to run one after another.
@brson, I'm mainly just guessing as to the cause of the LLVM failures in rustpkg tests. I'm confident that running tests in parallel is more likely to be the problem than those commits I made.
Additionally, this fixes two recently reported issues with rusti.
This refactors pass handling to use the argument names, so it can be used
in a similar manner to `opt`. This may be slightly less efficient than the
previous version, but it is much easier to maintain.
It also adds the ability to specify a custom pipeline on the command
line. Note, however, that this overrides the normal passes. This should
completely close #2396.
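
As an illustration of handling passes by name, here is a toy registry in Rust. It is a sketch only: the "module" is just a list of strings, the pass names `strip-nops` and `dedup` are invented, and splitting the pipeline on commas is an assumption rather than the actual command line syntax.

```rust
use std::collections::HashMap;

/// A toy pass: transforms a "module" modeled as a list of instructions.
type Pass = fn(&mut Vec<String>);

fn strip_nops(module: &mut Vec<String>) {
    module.retain(|inst| inst != "nop");
}

fn dedup(module: &mut Vec<String>) {
    // Removes consecutive duplicate instructions.
    module.dedup();
}

/// Maps argument names to passes (hypothetical names).
fn registry() -> HashMap<&'static str, Pass> {
    let mut reg: HashMap<&'static str, Pass> = HashMap::new();
    reg.insert("strip-nops", strip_nops);
    reg.insert("dedup", dedup);
    reg
}

/// Resolves a comma-separated pipeline by name and runs it in order.
/// All names are resolved first, so an unknown name leaves the module untouched.
fn run_pipeline(spec: &str, module: &mut Vec<String>) -> Result<(), String> {
    let reg = registry();
    let passes: Vec<Pass> = spec
        .split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(|name| {
            reg.get(name)
                .copied()
                .ok_or_else(|| format!("unknown pass: {}", name))
        })
        .collect::<Result<_, _>>()?;
    for pass in passes {
        pass(module);
    }
    Ok(())
}

fn main() {
    let mut module: Vec<String> = ["nop", "add", "add", "nop", "ret"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    run_pipeline("strip-nops, dedup", &mut module).unwrap();
    println!("{:?}", module);
}
```

Looking passes up by name costs a map lookup per pass, which matches the "slightly less efficient, much easier to maintain" trade-off mentioned above.
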
Refactor the optimization passes to explicitly use the passes. This commit
just re-implements the same passes as were already being run.
It also adds an option (behind `-Z`) to run the LLVM lint pass on the
unoptimized IR.
The default versions (atomic_load and atomic_store) are sequentially consistent.
The atomic_load_acq intrinsic acquires as described in [1].
The atomic_store_rel intrinsic releases as described in [1].
[1]: http://llvm.org/docs/Atomics.html
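
To show what acquire and release orderings buy you, here is a small sketch using the `std::sync::atomic` API from present-day Rust. It does not use the intrinsics named above; treating `Ordering::Acquire`, `Ordering::Release`, and `Ordering::SeqCst` as the counterparts of `atomic_load_acq`, `atomic_store_rel`, and the default versions is an assumption based on the naming. `publish_and_read` is a made-up helper.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// One thread publishes a value, the other waits for it and reads it.
fn publish_and_read() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release store: the write to `data` above is visible to any
            // thread that sees `ready == true` through an acquire load.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire load: pairs with the release store in the producer.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let value = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    value
}

fn main() {
    println!("{}", publish_and_read());

    // Sequentially consistent accesses, the strongest ordering.
    let counter = AtomicUsize::new(0);
    counter.store(7, Ordering::SeqCst);
    println!("{}", counter.load(Ordering::SeqCst));
}
```

The acquire/release pair is what makes the relaxed read of `data` safe to rely on here; with relaxed ordering on `ready` as well, seeing the flag would say nothing about `data`.
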