Updated all users of HashMap, HashSet ::consume() to use
.consume_iter().
Since .consume_iter() takes the map or set by value, it needs awkward
extra code to in librusti's use of @mut HashMap, where the map value can
not be directly moved out.
Addresses issue #7719
Updated all users of HashMap, HashSet old .consume() to use .consume()
with a for loop.
Since .consume() takes the map or set by value, it needs awkward
extra code to in librusti's use of @mut HashMap, where the map value can
not be directly moved out.
r? anyone
The only bit that I'm a little concerned about is whether there's some way the assignments to `hi` could somehow still be necessary; but I think that could only be the case if it had been `&const` borrowed (or whatever the hypothetical syntax is for that), and that's not going on in this file.
It disables the insertion of `use std::prelude::*;` into the top of
all the modules below the item on which it is placed (including that
item itself).
(Similar to GHC's `-XNoImplicitPrelude`.)
This is the first of a series of refactorings to get rid of the `codemap::spanned<T>` struct (see this thread for more information: https://mail.mozilla.org/pipermail/rust-dev/2013-July/004798.html).
The changes in this PR should not change any semantics, just rename `ast::blk_` to `ast::blk` and add a span field to it. 95% of the changes were of the form `block.node.id` -> `block.id`. Only some transformations in `libsyntax::fold` where not entirely trivial.
Currently, our intrinsics are generated as functions that have the
usual setup, which means an alloca, and therefore also a jump, for
those intrinsics that return an immediate value. This is especially bad
for unoptimized builds because it means that an intrinsic like
"contains_managed" that should be just "ret 0" or "ret 1" actually ends
up allocating stack space, doing a jump and a store/load sequence
before it finally returns the value.
To fix that, we need a way to stop the generic function declaration
mechanism from allocating stack space for the return value. This
implicitly also kills the jump, because the block for static allocas
isn't required anymore.
Additionally, trans_intrinsic needs to build the return itself instead
of calling finish_fn, because the latter relies on the availability of
the return value pointer.
With these changes, we get the bare minimum code required for our
intrinsics, which makes them small enough that inlining them makes the
resulting code smaller, so we can mark them as "always inline" to get
better performing unoptimized builds.
Optimized builds also benefit slightly from this change as there's less
code for LLVM to translate and the smaller intrinsics help it to make
better inlining decisions for a few code paths.
Building stage2 librustc gets ~1% faster for the optimized version and 5% for
the unoptimized version.
Most arms of the huge match contain the same code, differing only in
small details like the name of the llvm intrinsic that is to be called.
Thus the duplicated code can be factored out into a few functions that
take some parameters to handle the differences.
Whenever a lang_item is required, some relevant message is displayed, often with
a span of what triggered the usage of the lang item.
Now "hello word" is as small as:
```rust
#[no_std];
extern {
fn puts(s: *u8);
}
extern "rust-intrinsic" {
fn transmute<T, U>(t: T) -> U;
}
#[start]
fn main(_: int, _: **u8, _: *u8) -> int {
unsafe {
let (ptr, _): (*u8, uint) = transmute("Hello!");
puts(ptr);
}
return 0;
}
```
Allowing them in type signatures is a significant amount of extra work, unfortunately. This also doesn't apply to static values, which takes a different code path.
As per @pcwalton's request, `debug!(..)` is only activated when the `debug` cfg is set; that is, for `RUST_LOG=some_module=4 ./some_program` to work, it needs to be compiled with `rustc --cfg debug some_program.rs`. (Although, there is the sneaky `__debug!(..)` macro that is always active, if you *really* need it.)
It functions by making `debug!` expand to `if false { __debug!(..) }` (expanding to an `if` like this is required to make sure `debug!` statements are typechecked and to avoid unused variable warnings), and adjusting trans to skip the pointless branches in `if true ...` and `if false ...`.
The conditional expansion change also required moving the inject-std-macros step into a new pass, and makes it actually insert them at the top of the crate; this means that the cfg stripping traverses over the macros and so filters out the unused ones.
This appears to takes an unoptimised build of `librustc` from 65s to 59s; and the full bootstrap from 18m41s to 18m26s on my computer (with general background use).
`./configure --enable-debug` will enable `debug!` statements in the bootstrap build.
That is, the `b` branch in `if true { a } else { b }` will not be
trans'd, and that expression will be exactly the same as `a`. This
means that, for example, macros conditionally expanding to `if false
{ .. }` (like debug!) will not waste time in LLVM (or trans).
An alloca in an unreachable block would shortcircuit with Undef, but with type
`Type`, rather than type `*Type` (i.e. a plain value, not a pointer) but it is
expected to return a pointer into the stack, leading to confusion and LLVM
asserts later.
Similarly, attaching the range metadata to a Load in an unreachable block
makes LLVM unhappy, since the Load returns Undef.
Fixes#7344.
Macros can be conditionally defined because stripping occurs before macro
expansion, but, the built-in macros were only added as part of the actual
expansion process and so couldn't be stripped to have definitions conditional
on cfg flags.
debug! is defined conditionally in terms of the debug config, expanding to
nothing unless the --cfg debug flag is passed (to be precise it expands to
`if false { normal_debug!(...) }` so that they are still type checked, and
to avoid unused variable lints).
If the TLS key is 0-sized, then the linux linker is apparently smart enough to
put everything at the same pointer. OSX on the other hand, will reserve some
space for all of them. To get around this, the TLS key now actuall consumes
space to ensure that it gets a unique pointer
We used to have concrete types in glue functions, but the way we used
to implement that broke inlining of those functions. To fix that, we
converted all glue to just take an i8* and always casted to that type.
The problem with the old implementation was that we made a wrong
assumption about the glue functions, taking it for granted that they
always take an i8*, because that's the function type expected by the
TyDesc fields. Therefore, we always ended up with some kind of cast.
But actually, we can initially have the glue with concrete types and
only cast the functions to the generic type once we actually emit the
TyDesc data.
That means that for glue calls that can be statically resolved, we don't
need any casts, unless the glue uses a simplified type. In that case we
cast the argument. And for glue calls that are resolved at runtime, we
cast the argument to i8*, because that's what the glue function in the
TyDesc expects.
Since most of out glue calls are static, this saves a lot of bitcasts.
The size of the unoptimized librustc.ll goes down by 240k lines.
Turns out this was a more subtle bug than I originally thought. My analysis can be found in #7732, but I also tried to put descriptive info into the comments.
Closes#7732
We used to have concrete types in glue functions, but the way we used
to implement that broke inlining of those functions. To fix that, we
converted all glue to just take an i8* and always casted to that type.
The problem with the old implementation was that we made a wrong
assumption about the glue functions, taking it for granted that they
always take an i8*, because that's the function type expected by the
TyDesc fields. Therefore, we always ended up with some kind of cast.
But actually, we can initially have the glue with concrete types and
only cast the functions to the generic type once we actually emit the
TyDesc data.
That means that for glue calls that can be statically resolved, we don't
need any casts, unless the glue uses a simplified type. In that case we
cast the argument. And for glue calls that are resolved at runtime, we
cast the argument to i8*, because that's what the glue function in the
TyDesc expects.
Since most of out glue calls are static, this saves a lot of bitcasts.
The size of the unoptimized librustc.ll goes down by 240k lines.
Currently, we always create a dedicated "return" basic block, but when
there's only a single predecessor for that block, it can be merged with
that predecessor. We can achieve that merge by only creating the return
block on demand, avoiding its creation when its not required.
Reduces the pre-optimization size of librustc.ll created with --passes ""
by about 90k lines which equals about 4%.
Currently, immediate values are copied into an alloca only to have an
addressable storage so that it can be used with memcpy. Obviously we
can skip the memcpy in this case.
cc #6004 and #3273
This is a rewrite of TLS to get towards not requiring `@` when using task local storage. Most of the rewrite is straightforward, although there are two caveats:
1. Changing `local_set` to not require `@` is blocked on #7673
2. The code in `local_pop` is some of the most unsafe code I've written. A second set of eyes should definitely scrutinize it...
The public-facing interface currently hasn't changed, although it will have to change because `local_data::get` cannot return `Option<T>`, nor can it return `Option<&T>` (the lifetime isn't known). This will have to be changed to be given a closure which yield `&T` (or as an Option). I didn't do this part of the api rewrite in this pull request as I figured that it could wait until when `@` is fully removed.
This also doesn't deal with the issue of using something other than functions as keys, but I'm looking into using static slices (as mentioned in the issues).
Currently, immediate values are copied into an alloca only to have an
addressable storage so that it can be used with memcpy. Obviously we
can skip the memcpy in this case.
The free-standing functions in f32, f64, i8, i16, i32, i64, u8, u16,
u32, u64, float, int, and uint are replaced with generic functions in
num instead.
This means that instead of having to know everywhere what the type is, like
~~~
f64::sin(x)
~~~
You can simply write code that uses the type-generic versions in num instead, this works for all types that implement the corresponding trait in num.
~~~
num::sin(x)
~~~
Note 1: If you were previously using any of those functions, just replace them
with the corresponding function with the same name in num.
Note 2: If you were using a function that corresponds to an operator, use the
operator instead.
Note 3: This is just https://github.com/mozilla/rust/pull/7090 reopened against master.
Correct treatment of irrefutable patterns. The old code was wrong in many, many ways. `ref` bindings didn't work, it sometimes copied when it should have moved, the borrow checker didn't even look at such patterns at all, we weren't consistent about preventing values with destructors from being pulled apart, etc.
Fixes#3224.
Fixes#3225.
Fixes#3255.
Fixes#6225.
Fixes#6386.
r? @catamorphism
for cases where it's hard to decide what id to use for the lookup); modify
irrefutable bindings code to move or copy depending on the type, rather than
threading through a flag. Also updates how local variables and arguments are
registered. These changes were hard to isolate.
The free-standing functions in f32, f64, i8, i16, i32, i64, u8, u16,
u32, u64, float, int, and uint are replaced with generic functions in
num instead.
If you were previously using any of those functions, just replace them
with the corresponding function with the same name in num.
Note: If you were using a function that corresponds to an operator, use
the operator instead.
We currently still handle immediate return values a lot like
non-immediate ones. We provide a slot for them and store them into
memory, often just to immediately load them again. To improve this
situation, trans_call_inner has to return a Result which contains the
immediate return value.
Also, it also needs to accept "No destination" in addition to just
SaveIn and Ignore. Since "No destination" isn't something that fits
well into the Dest type, I've chosen to simply use Option<Dest>
instead, paired with an assertion that checks that "None" is only
allowed for immediate return values.
There's no need to allocate a return slot for anykind of immediate
return value, not just not for nils. Also, when the return value is
ignored, we only have to copy it to a temporary alloca if it's actually
required to call drop_ty on it.
This is work continued from the now landed #7495 and #7521 pulls.
Removing the headers from unique vectors is another project, so I've separated the allocator.
This way when you compile with -Z trans-stats you'll get a per-function cost breakdown, sorted with the most expensive functions first. Should help highlight pathological code.
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
Also, makes the pretty-printer use & instead of @ as much as possible,
which will help with later changes, though in the interim has produced
some... interesting constructs.
After getting an ICE trying to use the `Repr` enum from middle::trans::adt (see issue #7527), I tried to implement the missing case for struct-like enum variants in `middle::ty::enum_variants()`. It seems to work now (and passes make check) but there are still some uncertainties that bother me:
+ I'm not sure I did everything, right. Especially getting the variant constructor function from the variant node id is just copied from the tuple-variant case. Someone with more experience in the code base should be able to see rather quickly whether this OK so.
+ It is kind of strange that I could not reproduce the ICE with a smaller test case. The unimplemented code path never seems to be hit in most cases, even when using the exact same `Repr` enum, just with `ty::t` replaced by an opaque pointer. Also, within the `adt` module, `Repr` and matching on it is used multiple times, again without running into problems. Can anyone explain why this is the case? That would be much appreciated.
Apart from that, I hope this PR is useful.
This a followup to #7510. @catamorphism requested a test - so I have created one, but in doing so I noticed some inconsistency in the error messages resulting from referencing nonexistent traits, so I changed the messages to be more consistent.
Change the signature of Iterator.size_hint() to always have a lower bound.
Implement .size_hint() on all remaining iterators (if it differs from the default).
Adds a lint for `static some_lowercase_name: uint = 1;`. Warning by default since it causes confusion, e.g. `static a: uint = 1; ... let a = 2;` => `error: only refutable patterns allowed here`.
I think it's WIP - but I wanted to ask for feedback (/cc @thestinger)
I had to move the impl of FromIter for vec into extra::iter because I don't think std can depend on extra, but that's a bit messed up. Similarly some FromIter uses are gone now, not sure if this is fixable or if I made a complete mess here..
common supertypes.
This was breaking with the change to regions because of the
(now incorrect) assumpton that our inference code makes,
which is that if a <: b succeeds, there is no need to compute
the LUB/GLB.
This patch makes error handling for region inference failures more
uniform by not reporting *any* region errors until the reigon inference
step. This requires threading through more information about what
caused a region constraint, so that we can still give informative
error messages.
I have only taken partial advantage of this information: when region
inference fails, we still report the same error we always did, despite
the fact that we now know precisely what caused the various constriants
and what the region variable represents, which we did not know before.
This change is required not only to improve error messages but
because the region hierarchy is not in fact fully known until regionck,
because it is not clear where closure bodies fit in (our current
treatment is unsound). Moreover, the relationships between free variables
cannot be fully determined until type inference is otherwise complete.
cc #3238.
@catamorphism, this re-enables threadsafe rustpkg tests, @brson this will fail unless the bots have LLVM rebuilt, so this is a good indicator of whether that happened or not.
Continuation of #7430.
I haven't removed the `map` method, since the replacement `v.iter().transform(f).collect::<~[SomeType]>()` is a little ridiculous at the moment.
With these changes, exchange allocator headers are never initialized, read or written to. Removing the header will now just involve updating the code in trans using an offset to only do it if the type contained is managed.
The only thing blocking removing the initialization of the last field in the header was ~fn since it uses it to store the dynamic size/types due to captures. I temporarily switched it to a `closure_exchange_alloc` lang item (it uses the same `exchange_free`) and #7496 is filed about removing that.
Since the `exchange_free` call is now inlined all over the codebase, I don't think we should have an assert for null. It doesn't currently ever happen, but it would be fine if we started generating code that did do it. The `exchange_free` function also had a comment declaring that it must not fail, but a regular assert would cause a failure. I also removed the atomic counter because valgrind can already find these leaks, and we have valgrind bots now.
Note that exchange free does not currently print an error an out-of-memory when it aborts, because our `io` code may allocate. We could probably get away with a `#[rust_stack]` call to a `stdio` function but it would be better to make a write system call.
Currently we pass all "self" arguments by reference, for the pointer
variants this means that we end up with double indirection which causes
a unnecessary performance hit.
The fix itself is pretty straight-forward and just means that "self"
needs to be handled like any other argument, except for by-value "self"
which still needs to be passed by reference. This is because
non-pointer types can't just be stuffed into the environment slot which
is used to pass "self".
What made things tricky is that there was also a bug in the typechecker
where the method map entries are created. For type impls, that stored
the base type instead of the actual self-type in the method map, e.g.
Foo instead of &Foo for &self. That worked with pass-by-reference, but
fails with pass-by-value which needs the real type.
Code that makes use of methods seems to be about 10% faster with this
change. Also, build times are reduced by about 4%.
Fixes#4355, #4402, #5280, #4406 and #7285
Currently we pass all "self" arguments by reference, for the pointer
variants this means that we end up with double indirection which causes
a unnecessary performance hit.
The fix itself is pretty straight-forward and just means that "self"
needs to be handled like any other argument, except for by-value "self"
which still needs to be passed by reference. This is because
non-pointer types can't just be stuffed into the environment slot which
is used to pass "self".
What made things tricky is that there was also a bug in the typechecker
where the method map entries are created. For type impls, that stored
the base type instead of the actual self-type in the method map, e.g.
Foo instead of &Foo for &self. That worked with pass-by-reference, but
fails with pass-by-value which needs the real type.
Code that makes use of methods seems to be about 10% faster with this
change. Also, build times are reduced by about 4%.
Fixes#4355, #4402, #5280, #4406 and #7285
The code that tried to revoke the cleanup for the self argument tried
to use "llself" to do so, but the cleanup might actually be registered
with a different ValueRef due to e.g. casting. Currently, this is
worked around by early revocation of the cleanup for self in
trans_self_arg.
To handle this correctly, we have to put the ValueRef for the cleanup
into the MethodData, so trans_call_inner can use it to revoke the
cleanup when it's actually supposed to.