Turns out for OSX our data layout was subtly wrong and the LLVM update must have
exposed this. Instead of fixing this I've removed all data layouts from the
compiler to just use the defaults that LLVM provides for all targets. All data
layouts (and a number of dead modules) are removed from the compiler here.
Custom target specifications can still provide a custom data layout, but it is
now an optional key as the default will be used if one isn't specified.
This commit updates the LLVM submodule in use to the current HEAD of the LLVM
repository. This is primarily being done to start picking up unwinding support
for MSVC, which is currently unimplemented in the revision of LLVM we are using.
Along the way a few changes had to be made:
* As usual, lots of C++ debuginfo bindings in LLVM changed, so there were some
significant changes to our RustWrapper.cpp
* As usual, some pass management changed in LLVM, so clang was re-scrutinized to
ensure that we're doing the same thing as clang.
* Some optimization options are now passed directly into the
`PassManagerBuilder` instead of through CLI switches to LLVM.
* The `NoFramePointerElim` option was removed from LLVM, favoring instead the
`no-frame-pointer-elim` function attribute instead.
Additionally, LLVM has picked up some new optimizations which required fixing an
existing soundness hole in the IR we generate. It appears that the current LLVM
we use does not expose this hole. When an enum is moved, the previous slot in
memory is overwritten with a bit pattern corresponding to "dropped". When the
drop glue for this slot is run, however, the switch on the discriminant can
often start executing the `unreachable` block of the switch due to the
discriminant now being outside the normal range. This was patched over locally
for now by having the `unreachable` block just change to a `ret void`.
Position independent code has fewer requirements in executables, so pass
the appropriate flag to LLVM in order to allow more optimization. At the
moment this means faster thread-local storage.
The core library in theory has 0 dependencies, but in practice it has some in
order for it to be efficient. These dependencies are in the form of the basic
memory operations provided by libc traditionally, such as memset, memcmp, etc.
These functions are trivial to implement and themselves have 0 dependencies.
This commit adds a new crate, librlibc, which will serve the purpose of
providing these dependencies. The crate is never linked to by default, but is
available to be linked to by downstream consumers. Normally these functions are
provided by the system libc, but in other freestanding contexts a libc may not
be available. In these cases, librlibc will suffice for enabling execution with
libcore.
cc #10116
The compiler has previously been producing binaries on the order of 1.8MB for
hello world programs "fn main() {}". This is largely a result of the compilation
model used by compiling entire libraries into a single object file and because
static linking is favored by default.
When linking, linkers will pull in the entire contents of an object file if any
symbol from the object file is used. This means that if any symbol from a rust
library is used, the entire library is pulled in unconditionally, regardless of
whether the library is used or not.
Traditional C/C++ projects do not normally encounter these large executable
problems because their archives (rust's rlibs) are composed of many objects.
Because of this, linkers can eliminate entire objects from being in the final
executable. With rustc, however, the linker does not have the opportunity to
leave out entire object files.
In order to get similar benefits from dead code stripping at link time, this
commit enables the -ffunction-sections and -fdata-sections flags in LLVM, as
well as passing --gc-sections to the linker *by default*. This means that each
function and each global will be placed into its own section, allowing the
linker to GC all unused functions and data symbols.
By enabling these flags, rust is able to generate much smaller binaries default.
On linux, a hello world binary went from 1.8MB to 597K (a 67% reduction in
size). The output size of dynamic libraries remained constant, but the output
size of rlibs increased, as seen below:
libarena - 2.27% bigger ( 292872 => 299508)
libcollections - 0.64% bigger ( 6765884 => 6809076)
libflate - 0.83% bigger ( 186516 => 188060)
libfourcc - 14.71% bigger ( 307290 => 352498)
libgetopts - 4.42% bigger ( 761468 => 795102)
libglob - 2.73% bigger ( 899932 => 924542)
libgreen - 9.63% bigger ( 1281718 => 1405124)
libhexfloat - 13.88% bigger ( 333738 => 380060)
liblibc - 10.79% bigger ( 551280 => 610736)
liblog - 10.93% bigger ( 218208 => 242060)
libnative - 8.26% bigger ( 1362096 => 1474658)
libnum - 2.34% bigger ( 2583400 => 2643916)
librand - 1.72% bigger ( 1608684 => 1636394)
libregex - 6.50% bigger ( 1747768 => 1861398)
librustc - 4.21% bigger (151820192 => 158218924)
librustdoc - 8.96% bigger ( 13142604 => 14320544)
librustuv - 4.13% bigger ( 4366896 => 4547304)
libsemver - 2.66% bigger ( 396166 => 406686)
libserialize - 1.91% bigger ( 6878396 => 7009822)
libstd - 3.59% bigger ( 39485286 => 40902218)
libsync - 3.95% bigger ( 1386390 => 1441204)
libsyntax - 4.96% bigger ( 35757202 => 37530798)
libterm - 13.99% bigger ( 924580 => 1053902)
libtest - 6.04% bigger ( 2455720 => 2604092)
libtime - 2.84% bigger ( 1075708 => 1106242)
liburl - 6.53% bigger ( 590458 => 629004)
libuuid - 4.63% bigger ( 326350 => 341466)
libworkcache - 8.45% bigger ( 1230702 => 1334750)
This increase in size is a result of encoding many more section names into each
object file (rlib). These increases are moderate enough that this change seems
worthwhile to me, due to the drastic improvements seen in the final artifacts.
The overall increase of the stage2 target folder (not the size of an install)
went from 337MB to 348MB (3% increase).
Additionally, linking is generally slower when executed with all these new
sections plus the --gc-sections flag. The stage0 compiler takes 1.4s to link the
`rustc` binary, where the stage1 compiler takes 1.9s to link the binary. Three
megabytes are shaved off the binary. I found this increase in link time to be
acceptable relative to the benefits of code size gained.
This commit only enables --gc-sections for *executables*, not dynamic libraries.
LLVM does all the heavy lifting when producing an object file for a dynamic
library, so there is little else for the linker to do (remember that we only
have one object file).
I conducted similar experiments by putting a *module's* functions and data
symbols into its own section (granularity moved to a module level instead of a
function/static level). The size benefits of a hello world were seen to be on
the order of 400K rather than 1.2MB. It seemed that enough benefit was gained
using ffunction-sections that this route was less desirable, despite the lesser
increases in binary rlib size.
Many of the instances of setting a global error variable ended up leaving a
dangling pointer into free'd memory. This changes the method of error
transmission to strdup any error and "relinquish ownership" to rustc when it
gets an error. The corresponding Rust code will then free the error as
necessary.
Closes#12865
This comes with a number of fixes to be compatible with upstream LLVM:
* Previously all monomorphizations of "mem::size_of()" would receive the same
symbol. In the past LLVM would silently rename duplicated symbols, but it
appears to now be dropping the duplicate symbols and functions now. The symbol
names of monomorphized functions are now no longer solely based on the type of
the function, but rather the type and the unique hash for the
monomorphization.
* Split stacks are no longer a global feature controlled by a flag in LLVM.
Instead, they are opt-in on a per-function basis through a function attribute.
The rust #[no_split_stack] attribute will disable this, otherwise all
functions have #[split_stack] attached to them.
* The compare and swap instruction now takes two atomic orderings, one for the
successful case and one for the failure case. LLVM internally has an
implementation of calculating the appropriate failure ordering given a
particular success ordering (previously only a success ordering was
specified), and I copied that into the intrinsic translation so the failure
ordering isn't supplied on a source level for now.
* Minor tweaks to LLVM's API in terms of debuginfo, naming, c++11 conventions,
etc.
The travis builds have been breaking recently because LLVM 3.5 upstream is
changing. This looks like it's likely to continue, so it would be more useful
for us if we could lock ourselves to a system LLVM version that is not changing.
This commit has the support to bring our C++ glue to LLVM back in line with what
was possible back in LLVM 3.{3,4}. I don't think we're going to be able to
reasonably protect against regressions in the future, but this kind of code is a
good sign that we can continue to use the system LLVM for simple-ish things.
Codegen for ARM won't work and it won't have some of the perf improvements we
have, but using the system LLVM should work well enough for development.
Upstream LLVM has changed slightly such that our PassWrapper.cpp no longer
comiles (travis errors). This updates the bundled LLVM to the latest nightly
which will hopefully fix the travis errors we're seeing.
This can almost be fully disabled, as it no longer breaks retrieving a
backtrace on OS X as verified by @alexcrichton. However, it still
breaks retrieving the values of parameters. This should be fixable in
the future via a proper location list...
Closes#7477
When performing LTO, the rust compiler has an opportunity to completely strip
all landing pads in all dependent libraries. I've modified the LTO pass to
recognize the -Z no-landing-pads option when also running an LTO pass to flag
everything in LLVM as nothrow. I've verified that this prevents any and all
invoke instructions from being emitted.
I believe that this is one of our best options for moving forward with
accomodating use-cases where unwinding doesn't really make sense. This will
allow libraries to be built with landing pads by default but allow usage of them
in contexts where landing pads aren't necessary.
cc #10780
This commit implements LTO for rust leveraging LLVM's passes. What this means
is:
* When compiling an rlib, in addition to insdering foo.o into the archive, also
insert foo.bc (the LLVM bytecode) of the optimized module.
* When the compiler detects the -Z lto option, it will attempt to perform LTO on
a staticlib or binary output. The compiler will emit an error if a dylib or
rlib output is being generated.
* The actual act of performing LTO is as follows:
1. Force all upstream libraries to have an rlib version available.
2. Load the bytecode of each upstream library from the rlib.
3. Link all this bytecode into the current LLVM module (just using llvm
apis)
4. Run an internalization pass which internalizes all symbols except those
found reachable for the local crate of compilation.
5. Run the LLVM LTO pass manager over this entire module
6a. If assembling an archive, then add all upstream rlibs into the output
archive. This ignores all of the object/bitcode/metadata files rust
generated and placed inside the rlibs.
6b. If linking a binary, create copies of all upstream rlibs, remove the
rust-generated object-file, and then link everything as usual.
As I have explained in #10741, this process is excruciatingly slow, so this is
*not* turned on by default, and it is also why I have decided to hide it behind
a -Z flag for now. The good news is that the binary sizes are about as small as
they can be as a result of LTO, so it's definitely working.
Closes#10741Closes#10740
LLVM's JIT has been updated numerous times, and we haven't been tracking it at
all. The existing LLVM glue code no longer compiles, and the JIT isn't used for
anything currently.
This also rebases out the FixedStackSegment support which we have added to LLVM.
None of this is still in use by the compiler, and there's no need to keep this
functionality around inside of LLVM.
This is needed to unblock #10708 (where we're tripping an LLVM assertion).
This change adds -Z soft-float option for generating
software floating point library calls.
It also implies using soft float ABI, that is the same as llc.
It is useful for targets that have no FPU.
The only changes to the default passes is that O1 now doesn't run the inline
pass, just always-inline with lifetime intrinsics. O2 also now has a threshold
of 225 instead of 275. Otherwise the default passes being run is the same.
I've also added a few more options for configuring the pass pipeline. Namely you
can now specify arguments to LLVM directly via the `--llvm-args` command line
option which operates similarly to `--passes`. I also added the ability to turn
off pre-population of the pass manager in case you want to run *only* your own
passes.
Beforehand, it was unclear whether rust was performing the "recommended set" of
optimizations provided by LLVM for code. This commit changes the way we run
passes to closely mirror that of clang, which in theory does it correctly. The
notable changes include:
* Passes are no longer explicitly added one by one. This would be difficult to
keep up with as LLVM changes and we don't guaranteed always know the best
order in which to run passes
* Passes are now managed by LLVM's PassManagerBuilder object. This is then used
to populate the various pass managers run.
* We now run both a FunctionPassManager and a module-wide PassManager. This is
what clang does, and I presume that we *may* see a speed boost from the
module-wide passes just having to do less work. I have no measured this.
* The codegen pass manager has been extracted to its own separate pass manager
to not get mixed up with the other passes
* All pass managers now include passes for target-specific data layout and
analysis passes
Some new features include:
* You can now print all passes being run with `-Z print-llvm-passes`
* When specifying passes via `--passes`, the passes are now appended to the
default list of passes instead of overwriting them.
* The output of `--passes list` is now generated by LLVM instead of maintaining
a list of passes ourselves
* Loop vectorization is turned on by default as an optimization pass and can be
disabled with `-Z no-vectorize-loops`
This refactors pass handling to use the argument names, so it can be used
in a similar manner to `opt`. This may be slightly less efficient than the
previous version, but it is much easier to maintain.
It also adds in the ability to specify a custom pipeline on the command
line, this overrides the normal passes, however. This should completely
close#2396.
Refactor the optimization passes to explicitly use the passes. This commit
just re-implements the same passes as were already being run.
It also adds an option (behind `-Z`) to run the LLVM lint pass on the
unoptimized IR.