Position independent code has fewer requirements in executables, so pass
the appropriate flag to LLVM in order to allow more optimization. At the
moment this means faster thread-local storage.
Apparently the default getFile implementation for a memory buffer in LLVM ends
up requiring a null terminator at the end of the file. This isn't true a good
bit of the time apparently on OSX. There have been a number of failed
nightly/snapshot builds recently with this strange assertion.
This modifies the calls to MemoryBuffer::getFile to explicitly not ask for a
null terminator.
To fix#8106, we need an LLVM version that contains r211082 aka 0dee6756
which fixes a bug that blocks that issue.
There have been some tiny API changes in LLVM, and cmpxchg changed its
return type. The i1 part of the new return type is only interesting when
using the new weak cmpxchg, which we don't do.
The core library in theory has 0 dependencies, but in practice it has some in
order for it to be efficient. These dependencies are in the form of the basic
memory operations provided by libc traditionally, such as memset, memcmp, etc.
These functions are trivial to implement and themselves have 0 dependencies.
This commit adds a new crate, librlibc, which will serve the purpose of
providing these dependencies. The crate is never linked to by default, but is
available to be linked to by downstream consumers. Normally these functions are
provided by the system libc, but in other freestanding contexts a libc may not
be available. In these cases, librlibc will suffice for enabling execution with
libcore.
cc #10116
LLVM internally uses `uint64_t` for array size, but the corresponding
C API (`LLVMArrayType`) uses `unsigned int` so ths value is truncated.
Therefore rustc generates wrong type for fixed-sized large vector e.g.
`[0 x i8]` for `[0u8, ..(1 << 32)]`.
This patch adds `LLVMRustArrayType` function for `uint64_t` support.
The compiler has previously been producing binaries on the order of 1.8MB for
hello world programs "fn main() {}". This is largely a result of the compilation
model used by compiling entire libraries into a single object file and because
static linking is favored by default.
When linking, linkers will pull in the entire contents of an object file if any
symbol from the object file is used. This means that if any symbol from a rust
library is used, the entire library is pulled in unconditionally, regardless of
whether the library is used or not.
Traditional C/C++ projects do not normally encounter these large executable
problems because their archives (rust's rlibs) are composed of many objects.
Because of this, linkers can eliminate entire objects from being in the final
executable. With rustc, however, the linker does not have the opportunity to
leave out entire object files.
In order to get similar benefits from dead code stripping at link time, this
commit enables the -ffunction-sections and -fdata-sections flags in LLVM, as
well as passing --gc-sections to the linker *by default*. This means that each
function and each global will be placed into its own section, allowing the
linker to GC all unused functions and data symbols.
By enabling these flags, rust is able to generate much smaller binaries default.
On linux, a hello world binary went from 1.8MB to 597K (a 67% reduction in
size). The output size of dynamic libraries remained constant, but the output
size of rlibs increased, as seen below:
libarena - 2.27% bigger ( 292872 => 299508)
libcollections - 0.64% bigger ( 6765884 => 6809076)
libflate - 0.83% bigger ( 186516 => 188060)
libfourcc - 14.71% bigger ( 307290 => 352498)
libgetopts - 4.42% bigger ( 761468 => 795102)
libglob - 2.73% bigger ( 899932 => 924542)
libgreen - 9.63% bigger ( 1281718 => 1405124)
libhexfloat - 13.88% bigger ( 333738 => 380060)
liblibc - 10.79% bigger ( 551280 => 610736)
liblog - 10.93% bigger ( 218208 => 242060)
libnative - 8.26% bigger ( 1362096 => 1474658)
libnum - 2.34% bigger ( 2583400 => 2643916)
librand - 1.72% bigger ( 1608684 => 1636394)
libregex - 6.50% bigger ( 1747768 => 1861398)
librustc - 4.21% bigger (151820192 => 158218924)
librustdoc - 8.96% bigger ( 13142604 => 14320544)
librustuv - 4.13% bigger ( 4366896 => 4547304)
libsemver - 2.66% bigger ( 396166 => 406686)
libserialize - 1.91% bigger ( 6878396 => 7009822)
libstd - 3.59% bigger ( 39485286 => 40902218)
libsync - 3.95% bigger ( 1386390 => 1441204)
libsyntax - 4.96% bigger ( 35757202 => 37530798)
libterm - 13.99% bigger ( 924580 => 1053902)
libtest - 6.04% bigger ( 2455720 => 2604092)
libtime - 2.84% bigger ( 1075708 => 1106242)
liburl - 6.53% bigger ( 590458 => 629004)
libuuid - 4.63% bigger ( 326350 => 341466)
libworkcache - 8.45% bigger ( 1230702 => 1334750)
This increase in size is a result of encoding many more section names into each
object file (rlib). These increases are moderate enough that this change seems
worthwhile to me, due to the drastic improvements seen in the final artifacts.
The overall increase of the stage2 target folder (not the size of an install)
went from 337MB to 348MB (3% increase).
Additionally, linking is generally slower when executed with all these new
sections plus the --gc-sections flag. The stage0 compiler takes 1.4s to link the
`rustc` binary, where the stage1 compiler takes 1.9s to link the binary. Three
megabytes are shaved off the binary. I found this increase in link time to be
acceptable relative to the benefits of code size gained.
This commit only enables --gc-sections for *executables*, not dynamic libraries.
LLVM does all the heavy lifting when producing an object file for a dynamic
library, so there is little else for the linker to do (remember that we only
have one object file).
I conducted similar experiments by putting a *module's* functions and data
symbols into its own section (granularity moved to a module level instead of a
function/static level). The size benefits of a hello world were seen to be on
the order of 400K rather than 1.2MB. It seemed that enough benefit was gained
using ffunction-sections that this route was less desirable, despite the lesser
increases in binary rlib size.
Many of the instances of setting a global error variable ended up leaving a
dangling pointer into free'd memory. This changes the method of error
transmission to strdup any error and "relinquish ownership" to rustc when it
gets an error. The corresponding Rust code will then free the error as
necessary.
Closes#12865
In upgrading LLVM, only rust functions had the "split-stack" attribute added.
This commit changes the addition of LLVM's "split-stack" attribute to *always*
occur and then we remove it sometimes if the "no_split_stack" rust attribute is
present.
Closes#13625
This comes with a number of fixes to be compatible with upstream LLVM:
* Previously all monomorphizations of "mem::size_of()" would receive the same
symbol. In the past LLVM would silently rename duplicated symbols, but it
appears to now be dropping the duplicate symbols and functions now. The symbol
names of monomorphized functions are now no longer solely based on the type of
the function, but rather the type and the unique hash for the
monomorphization.
* Split stacks are no longer a global feature controlled by a flag in LLVM.
Instead, they are opt-in on a per-function basis through a function attribute.
The rust #[no_split_stack] attribute will disable this, otherwise all
functions have #[split_stack] attached to them.
* The compare and swap instruction now takes two atomic orderings, one for the
successful case and one for the failure case. LLVM internally has an
implementation of calculating the appropriate failure ordering given a
particular success ordering (previously only a success ordering was
specified), and I copied that into the intrinsic translation so the failure
ordering isn't supplied on a source level for now.
* Minor tweaks to LLVM's API in terms of debuginfo, naming, c++11 conventions,
etc.
The recent pull request to remove libc from libstd has hit a wall in compiling
on windows, and I've been trying to investigate on the try bots as to why (it
compiles locally just fine). To the best of my knowledge, the LLVM section
iterator is behaving badly when iterating over the sections of the libc DLL.
Upon investigating the LLVMGetSectionName function in LLVM, I discovered that
this function doesn't always return a null-terminated string. It returns the
data pointer of a StringRef instance (LLVM's equivalent of &str essentially),
but it has no method of returning the length of the name of the section.
This commit modifies the section iteration when loading libraries to invoke a
custom LLVMRustGetSectionName which will correctly return both the length and
the data pointer.
I have not yet verified that this will fix landing liblibc, as it will require a
snapshot before doing a full test. Regardless, this is a worrisome situation
regarding the LLVM API, and should likely be fixed anyway.
The llvm.copysign and llvm.round intrinsics weren't added until LLVM 3.4, so if
we're on LLVM 3.3 we lower these to calls in libm instead of LLVM intrinsics.
This should fix our travis failures.
The travis builds have been breaking recently because LLVM 3.5 upstream is
changing. This looks like it's likely to continue, so it would be more useful
for us if we could lock ourselves to a system LLVM version that is not changing.
This commit has the support to bring our C++ glue to LLVM back in line with what
was possible back in LLVM 3.{3,4}. I don't think we're going to be able to
reasonably protect against regressions in the future, but this kind of code is a
good sign that we can continue to use the system LLVM for simple-ish things.
Codegen for ARM won't work and it won't have some of the perf improvements we
have, but using the system LLVM should work well enough for development.
Upstream LLVM has changed slightly such that our PassWrapper.cpp no longer
comiles (travis errors). This updates the bundled LLVM to the latest nightly
which will hopefully fix the travis errors we're seeing.
This can almost be fully disabled, as it no longer breaks retrieving a
backtrace on OS X as verified by @alexcrichton. However, it still
breaks retrieving the values of parameters. This should be fixable in
the future via a proper location list...
Closes#7477
Set "Dwarf Version" to 2 on OS X to avoid toolchain incompatibility, and
set "Debug Info Version" to prevent debug info from being stripped from
bitcode.
Fixes#11352.
Major changes:
- Define temporary scopes in a syntax-based way that basically defaults
to the innermost statement or conditional block, except for in
a `let` initializer, where we default to the innermost block. Rules
are documented in the code, but not in the manual (yet).
See new test run-pass/cleanup-value-scopes.rs for examples.
- Refactors Datum to better define cleanup roles.
- Refactor cleanup scopes to not be tied to basic blocks, permitting
us to have a very large number of scopes (one per AST node).
- Introduce nascent documentation in trans/doc.rs covering datums and
cleanup in a more comprehensive way.
This pull request fixes#11083. The problem was that recursive type definitions were not properly handled for enum types, leading to problems with LLVM's metadata "uniquing". This bug has already been fixed for struct types some time ago (#9658) but I seem to have forgotten about enums back then. I added the offending code from issue #11083 as a test case.
We were previously reading metadata via `ar p`, but as learned from rustdoc
awhile back, spawning a process to do something is pretty slow. Turns out LLVM
has an Archive class to read archives, but it cannot write archives.
This commits adds bindings to the read-only version of the LLVM archive class
(with a new type that only has a read() method), and then it uses this class
when reading the metadata out of rlibs. When you put this in tandem of not
compressing the metadata, reading the metadata is 4x faster than it used to be
The timings I got for reading metadata from the respective libraries was:
libstd-04ff901e-0.9-pre.dylib => 100ms
libstd-04ff901e-0.9-pre.rlib => 23ms
librustuv-7945354c-0.9-pre.dylib => 4ms
librustuv-7945354c-0.9-pre.rlib => 1ms
librustc-5b94a16f-0.9-pre.dylib => 87ms
librustc-5b94a16f-0.9-pre.rlib => 35ms
libextra-a6ebb16f-0.9-pre.dylib => 63ms
libextra-a6ebb16f-0.9-pre.rlib => 15ms
libsyntax-2e4c0458-0.9-pre.dylib => 86ms
libsyntax-2e4c0458-0.9-pre.rlib => 22ms
In order to always take advantage of these faster metadata read-times, I sort
the files in filesearch based on whether they have an rlib extension or not
(prefer all rlib files first).
Overall, this halved the compile time for a `fn main() {}` crate from 0.185s to
0.095s on my system (when preferring dynamic linking). Reading metadata is still
the slowest pass of the compiler at 0.035s, but it's getting pretty close to
linking at 0.021s! The next best optimization is to just not copy the metadata
from LLVM because that's the most expensive part of reading metadata right now.
llvm supports both win32 native threads and pthread,
but configure tries to find pthread first.
This manually disables pthread to use native api.
This removes libpthreads-2.dll dependency on librustc.
When performing LTO, the rust compiler has an opportunity to completely strip
all landing pads in all dependent libraries. I've modified the LTO pass to
recognize the -Z no-landing-pads option when also running an LTO pass to flag
everything in LLVM as nothrow. I've verified that this prevents any and all
invoke instructions from being emitted.
I believe that this is one of our best options for moving forward with
accomodating use-cases where unwinding doesn't really make sense. This will
allow libraries to be built with landing pads by default but allow usage of them
in contexts where landing pads aren't necessary.
cc #10780