cc #6004 and #3273
This is a rewrite of TLS to get towards not requiring `@` when using task local storage. Most of the rewrite is straightforward, although there are two caveats:
1. Changing `local_set` to not require `@` is blocked on #7673
2. The code in `local_pop` is some of the most unsafe code I've written. A second set of eyes should definitely scrutinize it...
The public-facing interface currently hasn't changed, although it will have to change because `local_data::get` cannot return `Option<T>`, nor can it return `Option<&T>` (the lifetime isn't known). This will have to be changed to be given a closure which yield `&T` (or as an Option). I didn't do this part of the api rewrite in this pull request as I figured that it could wait until when `@` is fully removed.
This also doesn't deal with the issue of using something other than functions as keys, but I'm looking into using static slices (as mentioned in the issues).
for cases where it's hard to decide what id to use for the lookup); modify
irrefutable bindings code to move or copy depending on the type, rather than
threading through a flag. Also updates how local variables and arguments are
registered. These changes were hard to isolate.
We currently still handle immediate return values a lot like
non-immediate ones. We provide a slot for them and store them into
memory, often just to immediately load them again. To improve this
situation, trans_call_inner has to return a Result which contains the
immediate return value.
Also, it also needs to accept "No destination" in addition to just
SaveIn and Ignore. Since "No destination" isn't something that fits
well into the Dest type, I've chosen to simply use Option<Dest>
instead, paired with an assertion that checks that "None" is only
allowed for immediate return values.
This way when you compile with -Z trans-stats you'll get a per-function cost breakdown, sorted with the most expensive functions first. Should help highlight pathological code.
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
@catamorphism, this re-enables threadsafe rustpkg tests, @brson this will fail unless the bots have LLVM rebuilt, so this is a good indicator of whether that happened or not.
Continuation of #7430.
I haven't removed the `map` method, since the replacement `v.iter().transform(f).collect::<~[SomeType]>()` is a little ridiculous at the moment.
With these changes, exchange allocator headers are never initialized, read or written to. Removing the header will now just involve updating the code in trans using an offset to only do it if the type contained is managed.
The only thing blocking removing the initialization of the last field in the header was ~fn since it uses it to store the dynamic size/types due to captures. I temporarily switched it to a `closure_exchange_alloc` lang item (it uses the same `exchange_free`) and #7496 is filed about removing that.
Since the `exchange_free` call is now inlined all over the codebase, I don't think we should have an assert for null. It doesn't currently ever happen, but it would be fine if we started generating code that did do it. The `exchange_free` function also had a comment declaring that it must not fail, but a regular assert would cause a failure. I also removed the atomic counter because valgrind can already find these leaks, and we have valgrind bots now.
Note that exchange free does not currently print an error an out-of-memory when it aborts, because our `io` code may allocate. We could probably get away with a `#[rust_stack]` call to a `stdio` function but it would be better to make a write system call.
Currently we pass all "self" arguments by reference, for the pointer
variants this means that we end up with double indirection which causes
a unnecessary performance hit.
The fix itself is pretty straight-forward and just means that "self"
needs to be handled like any other argument, except for by-value "self"
which still needs to be passed by reference. This is because
non-pointer types can't just be stuffed into the environment slot which
is used to pass "self".
What made things tricky is that there was also a bug in the typechecker
where the method map entries are created. For type impls, that stored
the base type instead of the actual self-type in the method map, e.g.
Foo instead of &Foo for &self. That worked with pass-by-reference, but
fails with pass-by-value which needs the real type.
Code that makes use of methods seems to be about 10% faster with this
change. Also, build times are reduced by about 4%.
Fixes#4355, #4402, #5280, #4406 and #7285
Instead of determining paths from the path tag, we iterate through
modules' children recursively in the metadata. This will allow for
lazy external module resolution.
This removes the `namegen` thunk that was in `common.rs`. I also take the opportunity to refactor a few uses where we had a `str -> ident -> str` chain that seemed somewhat redundant to me.
Also cleans up some warnings that made their way in already.
Mostly just low-haning fruit, i.e. function arguments that were @ even
though & would work just as well.
Reduces librustc.so size by 200k when compiling without -O, by 100k when
compiling with -O.