Avoids the overhead of read_char for every character.
Benchmark reading example.json 10 times from
https://code.google.com/p/rapidjson/wiki/Performance
Before: 2.55s
After: 0.16s
Regression testing is already done by isrustfastyet.
There are lots of unneeded allocas and Store/Load cycles for calls with
immediate return values. This is a first step towards removing that, allowing
immediate return values to be directly returned from trans_call_inner and
trans_lang_call (for now), instead of always stuffing them into an alloca.
For now, only a few things take advantage of the new behaviour, but this
already saves 16k allocas and 43k lines in total in the unoptimized IR
for librustc. Running "make check" under time shows that CPU time for
the unoptimized test suite is reduced by about 7%.
We currently still handle immediate return values a lot like
non-immediate ones. We provide a slot for them and store them into
memory, often just to immediately load them again. To improve this
situation, trans_call_inner has to return a Result which contains the
immediate return value.
Also, it also needs to accept "No destination" in addition to just
SaveIn and Ignore. Since "No destination" isn't something that fits
well into the Dest type, I've chosen to simply use Option<Dest>
instead, paired with an assertion that checks that "None" is only
allowed for immediate return values.
A lot of cross-platform issues stem from rusti/rustpkg, so include these two test suites in the 'check-lite' target which is run on the cross-compile bots. It shouldn't be much of a performance hit because these suites are pretty fast to run.
Hopefully this will make snapshot/tarball creation easier in the future.
There's no need to allocate a return slot for anykind of immediate
return value, not just not for nils. Also, when the return value is
ignored, we only have to copy it to a temporary alloca if it's actually
required to call drop_ty on it.
This is work continued from the now landed #7495 and #7521 pulls.
Removing the headers from unique vectors is another project, so I've separated the allocator.
It's broken/unmaintained and needs to be rewritten to avoid managed
pointers and needless copies. A full rewrite is necessary and the API
will need to be redone so it's not worth keeping this around (#7628).
Closes#2236, #2744
This way when you compile with -Z trans-stats you'll get a per-function cost breakdown, sorted with the most expensive functions first. Should help highlight pathological code.
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
Implement methods `.pop_opt() -> Option<T>` and `.shift_opt() -> Option<T>` to allow retrieval of front/back of a vec in one operation without fail. .pop() and .shift() are changed to reuse the former two methods.
Follows the naming of the previous method .head_opt()
Where * = tcp, ip, url.
Formerly, extra::net::* were aliases of extra::net_*, but were the
recommended path to use. Thus, the documentation talked of the `net_*`
modules while everything else was written expecting `net::*`.
This moves the actual modules so that `extra::net::*` is the actual
location of the modules.
This will naturally break any code which used `extra::net_*` directly.
They should be altered to use `extra::net::*` (which has been the
documented way of doing things for some time).
This ensures that there is one, and only one, obvious way of doing
things.
Currently, scopes are tied to LLVM basic blocks. For each scope, there
are two new basic blocks, which means two extra jumps in the unoptimized
IR. These blocks aren't actually required, but only used to act as the
boundary for cleanups.
By keeping track of the current scope within a single basic block, we
can avoid those extra blocks and jumps, shrinking the pre-optimization
IR quite considerably. For example, the IR for trans_intrinsic goes
from ~22k lines to ~16k lines, almost 30% less.
The impact on the build times of optimized builds is rather small (about
1%), but unoptimized builds are about 11% faster. The testsuite for
unoptimized builds runs between 15% (CPU time) and 7.5% (wallclock time on
my i7) faster.
Also, in some situations this helps LLVM to generate better code by
inlining functions that it previously considered to be too large.
Likely because of the pointless blocks/jumps that were still present at
the time the inlining pass runs.
Refs #7462
In an ideal world, the AST would be completely sendable, this gets us a step closer.
It removes the local heap allocations for `view_item`, `Path`, `Lifetime` `trait_ref` `OptVec<TyParamBounds>` and `Ty`. There are also a few other smaller changes I made as things went along.
Also, makes the pretty-printer use & instead of @ as much as possible,
which will help with later changes, though in the interim has produced
some... interesting constructs.