This is the first of a series of refactorings to get rid of the `codemap::spanned<T>` struct (see this thread for more information: https://mail.mozilla.org/pipermail/rust-dev/2013-July/004798.html).
The changes in this PR should not change any semantics, just rename `ast::blk_` to `ast::blk` and add a span field to it. 95% of the changes were of the form `block.node.id` -> `block.id`. Only some transformations in `libsyntax::fold` were not entirely trivial.
Currently, our intrinsics are generated as functions that have the
usual setup, which means an alloca, and therefore also a jump, for
those intrinsics that return an immediate value. This is especially bad
for unoptimized builds because it means that an intrinsic like
"contains_managed" that should be just "ret 0" or "ret 1" actually ends
up allocating stack space, doing a jump and a store/load sequence
before it finally returns the value.
To fix that, we need a way to stop the generic function declaration
mechanism from allocating stack space for the return value. This
implicitly also kills the jump, because the block for static allocas
isn't required anymore.
Additionally, trans_intrinsic needs to build the return itself instead
of calling finish_fn, because the latter relies on the availability of
the return value pointer.
With these changes, we get the bare minimum code required for our
intrinsics, which makes them small enough that inlining them makes the
resulting code smaller, so we can mark them as "always inline" to get
better performing unoptimized builds.
Optimized builds also benefit slightly from this change as there's less
code for LLVM to translate and the smaller intrinsics help it to make
better inlining decisions for a few code paths.
Building stage2 librustc gets ~1% faster for the optimized version and 5% for
the unoptimized version.
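For illustration, here is a modern sketch of what such an intrinsic looks like at the source level; `needs_drop` is a hypothetical stand-in for `contains_managed`, and the attribute mirrors the "always inline" marking described above:

```rust
// Sketch only: with no alloca and no jump, this should boil down to a
// bare constant return once inlined.
#[inline(always)]
fn contains_managed_like<T>() -> bool {
    std::mem::needs_drop::<T>() // stand-in for the real intrinsic's answer
}

fn main() {
    println!("{}", contains_managed_like::<String>()); // true
    println!("{}", contains_managed_like::<u32>());    // false
}
```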
Most arms of the huge match contain the same code, differing only in
small details like the name of the LLVM intrinsic that is to be called.
Thus the duplicated code can be factored out into a few functions that
take some parameters to handle the differences.
Whenever a lang_item is required, a relevant message is displayed, often with
a span of what triggered the usage of the lang item.
Now "hello word" is as small as:
```rust
#[no_std];

extern {
    fn puts(s: *u8);
}

extern "rust-intrinsic" {
    fn transmute<T, U>(t: T) -> U;
}

#[start]
fn main(_: int, _: **u8, _: *u8) -> int {
    unsafe {
        let (ptr, _): (*u8, uint) = transmute("Hello!");
        puts(ptr);
    }
    return 0;
}
```
Allowing them in type signatures is a significant amount of extra work, unfortunately. This also doesn't apply to static values, which take a different code path.
As per @pcwalton's request, `debug!(..)` is only activated when the `debug` cfg is set; that is, for `RUST_LOG=some_module=4 ./some_program` to work, it needs to be compiled with `rustc --cfg debug some_program.rs`. (Although, there is the sneaky `__debug!(..)` macro that is always active, if you *really* need it.)
It functions by making `debug!` expand to `if false { __debug!(..) }` (expanding to an `if` like this is required to make sure `debug!` statements are typechecked and to avoid unused variable warnings), and adjusting trans to skip the pointless branches in `if true ...` and `if false ...`.
The conditional expansion change also required moving the inject-std-macros step into a new pass, and makes it actually insert them at the top of the crate; this means that the cfg stripping traverses over the macros and so filters out the unused ones.
This appears to take an unoptimised build of `librustc` from 65s to 59s, and the full bootstrap from 18m41s to 18m26s on my computer (with general background use).
`./configure --enable-debug` will enable `debug!` statements in the bootstrap build.
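A minimal sketch of the expansion strategy in modern macro syntax, with `cfg!(debug_logging)` as a stand-in for the historical `--cfg debug` check and `println!` standing in for `__debug!(..)`:

```rust
// The arguments stay type-checked even when logging is off, because the
// expansion is a real (but trivially dead) `if`.
macro_rules! debug {
    ($($arg:tt)*) => {
        if cfg!(debug_logging) {   // stand-in for `--cfg debug`
            println!($($arg)*);    // stand-in for `__debug!(..)`
        }
    };
}

fn main() {
    let x = 42;
    debug!("x = {}", x); // `x` counts as used either way
}
```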
That is, the `b` branch in `if true { a } else { b }` will not be
trans'd, and that expression will be exactly the same as `a`. This
means that, for example, macros conditionally expanding to `if false
{ .. }` (like debug!) will not waste time in LLVM (or trans).
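Concretely, the two functions below should produce identical code under this change (a small illustrative example, not taken from the patch):

```rust
fn a() -> u32 { 1 }
fn b() -> u32 { 2 }

fn with_branch() -> u32 {
    if true { a() } else { b() } // the `b()` arm is never trans'd
}

fn plain() -> u32 {
    a() // exactly what `with_branch` compiles to
}

fn main() {
    assert_eq!(with_branch(), plain());
}
```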
An alloca in an unreachable block would short-circuit with Undef, but with
type `Type` rather than type `*Type` (i.e. a plain value, not a pointer),
even though it is expected to return a pointer into the stack, leading to
confusion and LLVM asserts later.
Similarly, attaching the range metadata to a Load in an unreachable block
makes LLVM unhappy, since the Load returns Undef.
Fixes #7344.
If the TLS key is 0-sized, then the linux linker is apparently smart enough to
put everything at the same pointer. OSX on the other hand, will reserve some
space for all of them. To get around this, the TLS key now actually consumes
space to ensure that it gets a unique pointer.
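A sketch of the shape of the fix (illustrative; the real key lives in the runtime):

```rust
// Giving the key a real payload guarantees each static a distinct
// address, even under a linker that folds zero-sized objects together.
pub struct Key {
    _byte: u8, // non-zero size => unique pointer per key
}

pub static KEY_A: Key = Key { _byte: 0 };
pub static KEY_B: Key = Key { _byte: 0 };

fn main() {
    assert!(!std::ptr::eq(&KEY_A, &KEY_B)); // the two keys must not alias
}
```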
We used to have concrete types in glue functions, but the way we used
to implement that broke inlining of those functions. To fix that, we
converted all glue to just take an i8* and always casted to that type.
The problem with the old implementation was that we made a wrong
assumption about the glue functions, taking it for granted that they
always take an i8*, because that's the function type expected by the
TyDesc fields. Therefore, we always ended up with some kind of cast.
But actually, we can initially have the glue with concrete types and
only cast the functions to the generic type once we actually emit the
TyDesc data.
That means that for glue calls that can be statically resolved, we don't
need any casts, unless the glue uses a simplified type. In that case we
cast the argument. And for glue calls that are resolved at runtime, we
cast the argument to i8*, because that's what the glue function in the
TyDesc expects.
Since most of our glue calls are static, this saves a lot of bitcasts.
The size of the unoptimized librustc.ll goes down by 240k lines.
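A rough Rust analogy for the two call paths (all names illustrative; the real glue and casts are at the LLVM level):

```rust
use std::mem::transmute;

struct Concrete(u32);

// Glue is emitted with its concrete argument type...
fn drop_glue(t: *mut Concrete) {
    unsafe { (*t).0 = 0 } // illustrative body
}

// ...while the TyDesc field expects the generic i8* signature.
struct TyDesc {
    drop_glue: unsafe fn(*mut u8),
}

fn make_tydesc() -> TyDesc {
    // The function is cast to the generic type only when the TyDesc
    // data is emitted.
    let erased =
        unsafe { transmute::<fn(*mut Concrete), unsafe fn(*mut u8)>(drop_glue) };
    TyDesc { drop_glue: erased }
}

fn main() {
    let mut c = Concrete(7);
    drop_glue(&mut c); // static call: concrete types, no cast at all
    let td = make_tydesc();
    // Dynamic call through the TyDesc: cast the argument to i8*.
    unsafe { (td.drop_glue)(&mut c as *mut Concrete as *mut u8) }
}
```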
Currently, we always create a dedicated "return" basic block, but when
there's only a single predecessor for that block, it can be merged with
that predecessor. We can achieve that merge by only creating the return
block on demand, avoiding its creation when it's not required.
Reduces the pre-optimization size of librustc.ll created with --passes ""
by about 90k lines which equals about 4%.
Currently, immediate values are copied into an alloca only to have an
addressable storage so that it can be used with memcpy. Obviously we
can skip the memcpy in this case.
cc #6004 and #3273
This is a rewrite of TLS to get towards not requiring `@` when using task local storage. Most of the rewrite is straightforward, although there are two caveats:
1. Changing `local_set` to not require `@` is blocked on #7673
2. The code in `local_pop` is some of the most unsafe code I've written. A second set of eyes should definitely scrutinize it...
The public-facing interface currently hasn't changed, although it will have to change because `local_data::get` cannot return `Option<T>`, nor can it return `Option<&T>` (the lifetime isn't known). This will have to be changed to be given a closure which yields `&T` (or as an Option). I didn't do this part of the api rewrite in this pull request as I figured that it could wait until `@` is fully removed.
This also doesn't deal with the issue of using something other than functions as keys, but I'm looking into using static slices (as mentioned in the issues).
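For comparison, modern `std` settled on exactly this closure shape for thread-local access; a small example of the pattern (using today's API, not the 2013 one):

```rust
use std::cell::Cell;

thread_local! {
    static COUNTER: Cell<u32> = Cell::new(0); // stand-in for a local_data key
}

fn main() {
    // The borrow is only valid inside the closure, which sidesteps the
    // "lifetime isn't known" problem described above.
    COUNTER.with(|c| c.set(c.get() + 1));
    COUNTER.with(|c| println!("counter = {}", c.get()));
}
```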
The free-standing functions in f32, f64, i8, i16, i32, i64, u8, u16,
u32, u64, float, int, and uint are replaced with generic functions in
num instead.
This means that instead of having to know everywhere what the type is, like
~~~
f64::sin(x)
~~~
you can simply write code that uses the type-generic versions in num instead; this works for all types that implement the corresponding trait in num.
~~~
num::sin(x)
~~~
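A small sketch of the trait-based plumbing that makes this work (trait and impl are illustrative, not the exact `std::num` definitions):

```rust
// Generic `sin` dispatches through a trait, so callers never have to
// name the concrete float type.
trait Sin {
    fn sin(self) -> Self;
}

impl Sin for f64 {
    fn sin(self) -> Self {
        f64::sin(self) // defer to the inherent method
    }
}

fn sin<T: Sin>(x: T) -> T {
    x.sin()
}

fn main() {
    println!("{}", sin(std::f64::consts::FRAC_PI_2)); // ~1.0
}
```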
Note 1: If you were previously using any of those functions, just replace them
with the corresponding function with the same name in num.
Note 2: If you were using a function that corresponds to an operator, use the
operator instead.
Note 3: This is just https://github.com/mozilla/rust/pull/7090 reopened against master.
for cases where it's hard to decide what id to use for the lookup); modify
irrefutable bindings code to move or copy depending on the type, rather than
threading through a flag. Also updates how local variables and arguments are
registered. These changes were hard to isolate.
We currently still handle immediate return values a lot like
non-immediate ones. We provide a slot for them and store them into
memory, often just to immediately load them again. To improve this
situation, trans_call_inner has to return a Result which contains the
immediate return value.
It also needs to accept "No destination" in addition to just
SaveIn and Ignore. Since "No destination" isn't something that fits
well into the Dest type, I've chosen to simply use Option<Dest>
instead, paired with an assertion that checks that "None" is only
allowed for immediate return values.
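Schematically, the interface change looks like this (the types are stand-ins for the trans-internal ones):

```rust
type ValueRef = u64; // stand-in for an LLVM value handle

enum Dest {
    SaveIn(ValueRef),
    Ignore,
}

// `None` means "no destination" and is only legal for calls returning an
// immediate value, whose ValueRef is then handed back to the caller.
fn trans_call_inner(dest: Option<Dest>, returns_immediate: bool) -> Option<ValueRef> {
    assert!(dest.is_some() || returns_immediate);
    match dest {
        None => Some(1),                   // return the immediate directly
        Some(Dest::SaveIn(_slot)) => None, // stored into the caller's slot
        Some(Dest::Ignore) => None,        // dropped if drop_ty requires it
    }
}

fn main() {
    assert_eq!(trans_call_inner(None, true), Some(1));
    assert_eq!(trans_call_inner(Some(Dest::Ignore), false), None);
    assert_eq!(trans_call_inner(Some(Dest::SaveIn(0)), false), None);
}
```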
There's no need to allocate a return slot for any kind of immediate
return value, not just for nils. Also, when the return value is
ignored, we only have to copy it to a temporary alloca if it's actually
required to call drop_ty on it.
This is work continued from the now landed #7495 and #7521 pulls.
Removing the headers from unique vectors is another project, so I've separated the allocator.
This way when you compile with -Z trans-stats you'll get a per-function cost breakdown, sorted with the most expensive functions first. Should help highlight pathological code.
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
@catamorphism, this re-enables threadsafe rustpkg tests. @brson, this will fail unless the bots have LLVM rebuilt, so this is a good indicator of whether that happened or not.
Continuation of #7430.
I haven't removed the `map` method, since the replacement `v.iter().transform(f).collect::<~[SomeType]>()` is a little ridiculous at the moment.
With these changes, exchange allocator headers are never initialized, read or written to. Removing the header will now just involve updating the code in trans using an offset to only do it if the type contained is managed.
The only thing blocking removing the initialization of the last field in the header was ~fn since it uses it to store the dynamic size/types due to captures. I temporarily switched it to a `closure_exchange_alloc` lang item (it uses the same `exchange_free`) and #7496 is filed about removing that.
Since the `exchange_free` call is now inlined all over the codebase, I don't think we should have an assert for null. It doesn't currently ever happen, but it would be fine if we started generating code that did do it. The `exchange_free` function also had a comment declaring that it must not fail, but a regular assert would cause a failure. I also removed the atomic counter because valgrind can already find these leaks, and we have valgrind bots now.
Note that `exchange_free` does not currently print an error on out-of-memory when it aborts, because our `io` code may allocate. We could probably get away with a `#[rust_stack]` call to a `stdio` function but it would be better to make a write system call.
Currently we pass all "self" arguments by reference, for the pointer
variants this means that we end up with double indirection which causes
an unnecessary performance hit.
The fix itself is pretty straight-forward and just means that "self"
needs to be handled like any other argument, except for by-value "self"
which still needs to be passed by reference. This is because
non-pointer types can't just be stuffed into the environment slot which
is used to pass "self".
What made things tricky is that there was also a bug in the typechecker
where the method map entries are created. For type impls, that stored
the base type instead of the actual self-type in the method map, e.g.
Foo instead of &Foo for &self. That worked with pass-by-reference, but
fails with pass-by-value which needs the real type.
Code that makes use of methods seems to be about 10% faster with this
change. Also, build times are reduced by about 4%.
Fixes #4355, #4402, #5280, #4406 and #7285.
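A small illustration of the double indirection in ordinary Rust (not compiler code): treating every "self" as by-reference means a `&Foo` receiver arrives as `&&Foo`:

```rust
struct Foo {
    x: u32,
}

// Old scheme: even pointer-variant selfs were passed by reference, so
// the callee sees `&&Foo` and loads twice to reach the data.
fn method_old(this: &&Foo) -> u32 {
    this.x
}

// New scheme: `&self` is passed like any other argument, one load.
fn method_new(this: &Foo) -> u32 {
    this.x
}

fn main() {
    let f = Foo { x: 1 };
    let r = &f;
    assert_eq!(method_old(&r), method_new(r));
}
```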
The code that tried to revoke the cleanup for the self argument tried
to use "llself" to do so, but the cleanup might actually be registered
with a different ValueRef due to e.g. casting. Currently, this is
worked around by early revocation of the cleanup for self in
trans_self_arg.
To handle this correctly, we have to put the ValueRef for the cleanup
into the MethodData, so trans_call_inner can use it to revoke the
cleanup when it's actually supposed to.
"self" is always passed as an opaque box, so there's no point in using
the concrete self type when translating the argument. All it does is
cause the value to be cast back to an opaque box right away.
Instead of determining paths from the path tag, we iterate through
modules' children recursively in the metadata. This will allow for
lazy external module resolution.
This removes the `namegen` thunk that was in `common.rs`. I also take the opportunity to refactor a few uses where we had a `str -> ident -> str` chain that seemed somewhat redundant to me.
Also cleans up some warnings that made their way in already.
Mostly just low-hanging fruit, i.e. function arguments that were @ even
though & would work just as well.
Reduces librustc.so size by 200k when compiling without -O, by 100k when
compiling with -O.
This PR contains no real code changes. Just some documentation additions in the form of comments and some internal reordering of functions within debuginfo.rs.
This adds a `#[no_drop_flag]` attribute. This attribute tells the compiler to omit the drop flag from the struct, if it has a destructor. When the destructor is run, instead of setting the drop flag, it instead zeroes-out the struct. This means the destructor can run multiple times and therefore it is up to the developer to use it safely.
The primary usage case for this is smart-pointer types like `Rc<T>` as the extra flag caused the struct to be 1 word larger because of alignment.
This closes #7271 and #7138.
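The size motivation, illustrated with today's `Rc` (modern Rust later dropped per-value drop flags altogether, so the pointer stays one word):

```rust
use std::mem::size_of;
use std::rc::Rc;

fn main() {
    // One word: no embedded drop flag padding the struct out to two.
    assert_eq!(size_of::<Rc<u32>>(), size_of::<usize>());
    println!("Rc<u32> is {} bytes", size_of::<Rc<u32>>());
}
```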
This sets the `get_tydesc()` return type correctly and removes the intrinsic module. See #3730, #3475.
Update: this now also removes the unused shape fields in tydescs.
To achieve this, the following changes were made:
* Move TyDesc, TyVisitor and Opaque to std::unstable::intrinsics
* Convert TyDesc, TyVisitor and Opaque to lang items instead of specially
handling the intrinsics module
* Remove TypeDesc, FreeGlue and get_type_desc() from sys
Fixes #3475.
This fixes part of #3730, but not all.
Also changes the TyDesc struct to be equivalent to the generated
code, with the hope that the above issue may one day be closed for good,
i.e. that the TyDesc type can completely be specified in the Rust
sources and not be generated.
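A rough sketch of the shape such a Rust-defined descriptor takes (the field set is inferred from the description and purely illustrative):

```rust
pub struct TyDesc {
    pub size: usize,
    pub align: usize,
    pub take_glue: Option<unsafe fn(*mut u8)>,
    pub drop_glue: Option<unsafe fn(*mut u8)>,
    pub free_glue: Option<unsafe fn(*mut u8)>,
}

fn main() {
    // The layout is fully determined by ordinary Rust rules, with no
    // compiler-generated definition needed.
    println!("{}", std::mem::size_of::<TyDesc>());
}
```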
I removed the `static-method-test.rs` test because it was heavily based
on `BaseIter` and there are plenty of other more complex uses of static
methods anyway.
This makes the handling of atomic operations more generic, which
does impose a specific naming convention for the intrinsics, but that
seems ok to me, rather than having an individual case for each name.
It also adds the intrinsics to the intrinsics file.
The changes in these commits improve the IR codegen by removing unnecessary copies for certain function call arguments and by no longer allocating return values for functions returning nil. They reduce compile times by about 10% in total.
By using "void" instead of "{}" as the LLVM type for nil, we can avoid
the alloca/store/load sequence for the return value, resulting in less
and simpler IR code.
This reduces compile times by about 10%.
The removed test for issue #2611 is well covered by the `std::iterator`
module itself.
This adds the `count` method to `IteratorUtil` to replace `EqIter`.
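In today's spelling, the occurrence-counting that `EqIter` provided is just a filter plus `count` (the historical `count` method took the element to look for):

```rust
fn main() {
    let v = [1, 2, 2, 3, 2];
    let n = v.iter().filter(|&&x| x == 2).count();
    assert_eq!(n, 3); // three occurrences of 2
}
```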
This fixes the large number of problems that prevented cross crate
methods from ever working. It also fixes a couple lingering bugs with
polymorphic default methods and cleans up some of the code paths.
Closes #4102. Closes #4103.
r? nikomatsakis
Currently, by-copy function arguments are always stored into a scratch
datum, which serves two purposes. First, it is required to be able to
have a temporary cleanup, in case the call fails before the callee
actually takes ownership of the value. Second, if the argument is to be
passed by reference, the copy is required, so that the function doesn't
get a reference to the original value.
But when the datum does not need a drop glue call and it is
passed by value, there's no need to perform the extra copy.
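The decision rule boils down to this (schematic, not the trans code):

```rust
// Copy into a scratch datum only when a temporary cleanup is needed
// (drop glue) or the argument is passed by reference.
fn needs_scratch_copy(has_drop_glue: bool, passed_by_value: bool) -> bool {
    has_drop_glue || !passed_by_value
}

fn main() {
    // The case the change skips: by-value, no drop glue.
    assert!(!needs_scratch_copy(false, true));
}
```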
Currently, cleanup blocks are only reused when there are nested scopes: the
child scope's cleanup block will terminate with a jump to the parent
scope's cleanup block. But within a single scope, adding or revoking
any cleanup will force a fresh cleanup block. This means quadratic
growth with the number of allocations in a scope, because each
allocation needs a landing pad.
Instead of forcing a fresh cleanup block, we can keep a list of chained
cleanup blocks that form a prefix of the currently required cleanups.
That way, the next cleanup block only has to handle newly added
cleanups. And by keeping the whole list instead of just the latest
block, we can also handle revocations more efficiently, by only
dropping those blocks that are no longer required, instead of all of
them.
Reduces the size of librustc by about 5% and the time required to build
it by about 10%.
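A sketch of the bookkeeping (types are illustrative stand-ins for the trans-internal ones):

```rust
struct CleanupBlock {
    covers: usize, // how many of the scope's cleanups this block handles
}

struct Scope {
    cleanups: Vec<String>,     // pending cleanups (stand-in)
    blocks: Vec<CleanupBlock>, // chained blocks forming a prefix
}

impl Scope {
    // Adding a cleanup no longer invalidates existing blocks; the next
    // block emitted only has to cover the newly added suffix.
    fn add_cleanup(&mut self, c: String) {
        self.cleanups.push(c);
    }

    // Revoking drops only the blocks that covered the removed cleanups,
    // instead of throwing the whole chain away.
    fn revoke_to(&mut self, remaining: usize) {
        self.cleanups.truncate(remaining);
        while self.blocks.last().map_or(false, |b| b.covers > remaining) {
            self.blocks.pop();
        }
    }
}

fn main() {
    let mut s = Scope { cleanups: vec![], blocks: vec![] };
    s.add_cleanup("free x".to_string());
    s.blocks.push(CleanupBlock { covers: 1 });
    s.add_cleanup("free y".to_string());
    s.blocks.push(CleanupBlock { covers: 2 });
    s.revoke_to(1); // only the second block is dropped
    assert_eq!(s.blocks.len(), 1);
}
```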
Moves all the remaining functions that could reasonably be methods to be methods, except for some FFI ones (which I believe @erickt is working on, possibly) and `each_split_within`, since I'm not really sure of the details of it (I believe @kimundi wrote the current implementation, so maybe he could convert it to an external iterator method on `StrSlice`, e.g. `word_wrap_iter(&self) -> WordWrapIterator<'self>`, where `WordWrapIterator` impls `Iterator<&'self str>`. It probably won't be too hard, since it's already a state machine.)
This also cleans up the comparison impls for the string types, except I'm not sure how the lang items `eq_str` and `eq_str_uniq` need to be handled, so they (`eq_slice` and `eq`) remain stand-alone functions.
Remove all the explicit @mut-fields from CrateContext, though many
fields are still @-ptrs.
This required changing every single function call that explicitly
took a @CrateContext, so I took advantage and changed as many as I
could get away with to &-ptrs or &mut ptrs.
Currently, when calling glue functions, we cast the function to match
the argument type. This interacts very badly with LLVM and breaks
inlining of the glue code.
It's more efficient to use a unified function type for the glue
functions and always cast the function argument instead of the function.
The resulting code for rustc is about 13% faster (measured up to and
including the "trans" pass) and the resulting librustc is about 5%
smaller.
This un-reverts the reverts of the rusti commits made a while back. These were reverted for an LLVM failure in rustpkg. I believe that this is not a problem with these commits, but rather that rustc is being used in parallel for rustpkg tests (in-process). This is not working yet (almost! see #7011), so I serialized all the tests to run one after another.
@brson, I'm mainly just guessing as to the cause of the LLVM failures in rustpkg tests. I'm confident that running tests in parallel is more likely to be the problem than those commits I made.
Additionally, this fixes two recently reported issues with rusti.
The lookups for these items in external crates currently cause repeated
decoding of the EBML metadata, which is pretty slow. Adding caches to
avoid the repeated decoding reduces the time required for the type
checking of librustc by about 25%.
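The caching pattern, schematically (a def-id keyed memo in front of the expensive decode; all types are stand-ins):

```rust
use std::collections::HashMap;

struct MetadataCache {
    decoded: HashMap<u64, String>, // def-id -> decoded item (stand-in)
}

impl MetadataCache {
    fn lookup(&mut self, def_id: u64) -> &String {
        // Decode from EBML only on the first request for this id.
        self.decoded
            .entry(def_id)
            .or_insert_with(|| expensive_ebml_decode(def_id))
    }
}

fn expensive_ebml_decode(def_id: u64) -> String {
    format!("item {}", def_id) // stand-in for the real decoder
}

fn main() {
    let mut cache = MetadataCache { decoded: HashMap::new() };
    cache.lookup(7);
    cache.lookup(7); // the second lookup skips the decode entirely
}
```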
This fixes the strange random crashes in compile-fail tests.
This reverts commit 96cd61ad03.
Conflicts:
	src/librustc/driver/driver.rs
	src/libstd/str.rs
	src/libsyntax/ext/quote.rs
This almost removes the StringRef wrapper, since all strings are
Equiv-alent now. Removes a lot of `/* bad */ copy *`'s, and converts
several things to be &'static str (the lint table and the intrinsics
table).
There are many instances of .to_managed(), unfortunately.
There are now only half-a-dozen or so functions left in `std::str` that should be methods.
Highlights:
- `.substr` was removed, since most of the uses of it in the code base were actually incorrect (it had a weird mixing of a byte index and a unicode character count), adding `.slice_chars` if one wants to handle characters, and the normal `.slice` method to handle bytes.
- Code duplication between the two impls for `connect` and `concat` was removed via a new `Str` trait, that is purely designed to allow an explicit -> `&str` conversion (`.as_slice()`)
- Deconfuse the 5 different functions for converting to `[u8]` (3 of which had actually incorrect documentation: implying that they didn't have the null terminator), into 3: `as_bytes` (all strings), `as_bytes_with_null` (`&'static str`, `@str` and `~str`) and `as_bytes_with_null_consume` (`~str`). None of these allocate, unlike the old versions.
(cc @thestinger)
The confusing mixture of byte index and character count meant that every
use of .substr was incorrect; replaced by slice_chars which only uses
character indices. The old behaviour of `.substr(start, n)` can be emulated
via `.slice_from(start).slice_chars(0, n)`.
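A character-index-only version is easy to sketch in modern terms (the historical method lived on string slices directly):

```rust
// Slice by character positions only: find the byte offsets of the
// begin-th and end-th characters, then slice bytes.
fn slice_chars(s: &str, begin: usize, end: usize) -> &str {
    let byte_at =
        |n: usize| s.char_indices().nth(n).map(|(i, _)| i).unwrap_or(s.len());
    &s[byte_at(begin)..byte_at(end)]
}

fn main() {
    let s = "héllo";
    assert_eq!(slice_chars(s, 1, 3), "él"); // characters, not bytes
}
```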
This was a lot more painful than just changing `x.each` to `x.iter().advance` . I ran into my old friend #5898 and had to add underscores to some method names as a temporary workaround.
The borrow checker also had other ideas because rvalues aren't handled very well yet so temporary variables had to be added. However, storing the temporary in a variable led to dynamic `@mut` failures, so those had to be wrapped in blocks except where the scope ends immediately.
Anyway, the ugliness will be fixed as the compiler issues are fixed and this change will amount to `for x.each |x|` becoming `for x.iter |x|` and making all the iterator adaptors available.
I dropped the run-pass tests for `old_iter` because there's not much point in fixing a module that's on the way out in the next week or so.