Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
Implement methods `.pop_opt() -> Option<T>` and `.shift_opt() -> Option<T>` to allow retrieval of front/back of a vec in one operation without fail. .pop() and .shift() are changed to reuse the former two methods.
Follows the naming of the previous method .head_opt()
Where * = tcp, ip, url.
Formerly, extra::net::* were aliases of extra::net_*, but were the
recommended path to use. Thus, the documentation talked of the `net_*`
modules while everything else was written expecting `net::*`.
This moves the actual modules so that `extra::net::*` is the actual
location of the modules.
This will naturally break any code which used `extra::net_*` directly.
They should be altered to use `extra::net::*` (which has been the
documented way of doing things for some time).
This ensures that there is one, and only one, obvious way of doing
things.
When building Rust libraries (e.g. librustc, libstd, etc), checks for
and verbosely removes previous build products before invoking rustc.
(Also, when Make variable VERBOSE is defined, it will list all of the
libraries matching the object library's glob after the rustc
invocation has completed.)
When installing Rust libraries, checks for previous libraries in
target install directory, but does not remove them.
The thinking behind these two different modes of operation is that the
installation target, unlike the build tree, is not under the control
of this infrastructure and it is not up to this Makefile to decide if
the previous libraries should be removed.
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
In an ideal world, the AST would be completely sendable, this gets us a step closer.
It removes the local heap allocations for `view_item`, `Path`, `Lifetime` `trait_ref` `OptVec<TyParamBounds>` and `Ty`. There are also a few other smaller changes I made as things went along.
Also, makes the pretty-printer use & instead of @ as much as possible,
which will help with later changes, though in the interim has produced
some... interesting constructs.
After getting an ICE trying to use the `Repr` enum from middle::trans::adt (see issue #7527), I tried to implement the missing case for struct-like enum variants in `middle::ty::enum_variants()`. It seems to work now (and passes make check) but there are still some uncertainties that bother me:
+ I'm not sure I did everything, right. Especially getting the variant constructor function from the variant node id is just copied from the tuple-variant case. Someone with more experience in the code base should be able to see rather quickly whether this OK so.
+ It is kind of strange that I could not reproduce the ICE with a smaller test case. The unimplemented code path never seems to be hit in most cases, even when using the exact same `Repr` enum, just with `ty::t` replaced by an opaque pointer. Also, within the `adt` module, `Repr` and matching on it is used multiple times, again without running into problems. Can anyone explain why this is the case? That would be much appreciated.
Apart from that, I hope this PR is useful.
This a followup to #7510. @catamorphism requested a test - so I have created one, but in doing so I noticed some inconsistency in the error messages resulting from referencing nonexistent traits, so I changed the messages to be more consistent.
Change the signature of Iterator.size_hint() to always have a lower bound.
Implement .size_hint() on all remaining iterators (if it differs from the default).
It's broken/unmaintained and needs to be rewritten to avoid managed
pointers and needless copies. A full rewrite is necessary and the API
will need to be redone so it's not worth keeping this around.
Closes#2236, #2744
Fix an assertion in grow when using add_front.
Speeds up grow (and the functions that use it) by a lot. We retain the vector instead of creating it anew for each grow.
The struct field hi is removed since it is redundant, and all iterators are updated to use a representation closer to the Deque itself, it should make it work even if deque sizes are not powers of two.
Deque::with_capacity is also implemented, but .reserve() is not yet done.
They simply byte-swap an integer to a specific endian, like the hton* functions in C.
These intrinsics are synthesized, so maybe they should be in another file. But since they are just a single line of code each, based on the bswap intrinsics and aren't really intended for public consumption I thought they would fit in the intrinsics file.
The next step working on this could be to expose a trait / generic function for byteswapping.
Adding iterators for extra::smallintmap
Working on mutability error
Ran into ICE
More mutability problems
Working through mutability issue
working on getting tests passing
SmallIntMa tests passing
Added SmallIntSet iterators, and the tests are passing
Stripped trailing spaces
Removed extra use directive
Since ' ' is by far one of the most common characters, it is worthwhile
to put it first, and short-circuit the rest of the function.
On the same JSON benchmark, as the json_perf improvement, reading example.json
10 times from https://code.google.com/p/rapidjson/wiki/Performance.
Before: 0.16s
After: 0.11s
The deque is split at the marker lo, or logical index 0. Move the
shortest part (split by lo) when growing. This way add_front is just as
fast as add_back, on average.
The previous implementation of reverse iterators used modulus (%) of
negative indices, which did work but was fragile due to dependency on
the divisor.
The deque is determined by vec self.elts.len(), self.nelts, and self.lo,
and self.hi is calculated from these.
self.hi is just the raw index of element number `self.nelts`
Fix some issues with the deque being very slow, keep the same vec around
instead of constructing a new. Move as few elements as possible, so the
self.lo point is not moved after grow.
[o o o o o|o o o]
hi...^ ^.... lo
grows to
[. . . . .|o o o o o o o o|. . .]
^.. lo ^.. hi
If the deque is append-only, it will result in moving no elements on
grow. If the deque is prepend-only, all will be moved each time.
The bench tests added show big improvements:
Timed using `rust build -O --test extra.rs && ./extra --bench deque`
Old version:
test deque::tests::bench_add_back ... bench: 4976 ns/iter (+/- 9)
test deque::tests::bench_add_front ... bench: 4108 ns/iter (+/- 18)
test deque::tests::bench_grow ... bench: 416964 ns/iter (+/- 4197)
test deque::tests::bench_new ... bench: 408 ns/iter (+/- 12)
With this commit:
test deque::tests::bench_add_back ... bench: 12 ns/iter (+/- 0)
test deque::tests::bench_add_front ... bench: 16 ns/iter (+/- 0)
test deque::tests::bench_grow ... bench: 1515 ns/iter (+/- 30)
test deque::tests::bench_new ... bench: 419 ns/iter (+/- 3)
Avoids the overhead of read_char for every character.
Benchmark reading example.json 10 times from
https://code.google.com/p/rapidjson/wiki/Performance
Before: 2.55s
After: 0.16s
Regression testing is already done by isrustfastyet.
Add a function to safely retrieve the first element of a ~[T], as
Option<T>. Implement shift() using shift_opt().
Add tests for both .shift() and .shift_opt()
Add a function to safely retrieve the last element of a ~[T], as
Option<T>. Implement pop() using pop_opt(); it benches the same as the
old implementation when tested with optimization level 2.