These commits pick off some low-hanging fruit which were slowing down spawning green threads. The major speedup comes from fixing a bug in stack caching where we never used any cached stacks!
The program I used to benchmark is at the end. It was compiled with `rustc --opt-level=3 bench.rs --test` and run as `RUST_THREADS=1 ./bench --bench`. I chose to use `RUST_THREADS=1` due to #11730 as the profiles I was getting interfered too much when all the schedulers were in play (and shouldn't be after #11730 is fixed). All of the units below are in ns/iter as reported by `--bench` (lower is better).
| | green | native | raw |
| ------------- | ----- | ------ | ------ |
| osx before | 12699 | 24030 | 19734 |
| linux before | 10223 | 125983 | 122647 |
| osx after | 3847 | 25771 | 20835 |
| linux after | 2631 | 135398 | 122765 |
Note that this is *not* a benchmark of spawning green tasks vs native tasks. I put in the native numbers just to get a ballpark of where green tasks are. This is benchmark is *clearly* benefiting from stack caching. Also, OSX is clearly not 5x faster than linux, I think my VM is just much slower.
All in all, this ended up being a nice 4x speedup for spawning a green task when you're using a cached stack.
```rust
extern mod extra;
extern mod native;
use std::rt:🧵:Thread;
#[bench]
fn green(bh: &mut extra::test::BenchHarness) {
let (p, c) = SharedChan::new();
bh.iter(|| {
let c = c.clone();
spawn(proc() {
c.send(());
});
p.recv();
});
}
#[bench]
fn native(bh: &mut extra::test::BenchHarness) {
let (p, c) = SharedChan::new();
bh.iter(|| {
let c = c.clone();
native::task::spawn(proc() {
c.send(());
});
p.recv();
});
}
#[bench]
fn raw(bh: &mut extra::test::BenchHarness) {
bh.iter(|| {
Thread::start(proc() {}).join()
});
}
```
Two unfortunate allocations were wrapping a proc() in a proc() with
GreenTask::build_start_wrapper, and then boxing this proc in a ~proc() inside of
Context::new(). Both of these allocations were a direct result from two
conditions:
1. The Context::new() function has a nice api of taking a procedure argument to
start up a new context with. This inherently required an allocation by
build_start_wrapper because extra code needed to be run around the edges of a
user-provided proc() for a new task.
2. The initial bootstrap code only understood how to pass one argument to the
next function. By modifying the assembly and entry points to understand more
than one argument, more information is passed through in registers instead of
allocating a pointer-sized context.
This is sadly where I end up throwing mips under a bus because I have no idea
what's going on in the mips context switching code and don't know how to modify
it.
Closes#7767
cc #11389
Instead, use an enum to allow running both a procedure and sending the task
result over a channel. I expect the common case to be sending on a channel (e.g.
task::try), so don't require an extra allocation in the common case.
cc #11389
The condition was the wrong direction and it also didn't take equality into
account. Tests were added for both cases.
For the small benchmark of `task::try(proc() {}).unwrap()`, this takes the
iteration time on OSX from 15119 ns/iter to 6179 ns/iter (timed with
RUST_THREADS=1)
cc #11389
The first setp for #9880 is to add a new `crate` keyword. This PR does exactly that. I took a chance to refactor `parse_item_foreign_mod` and I broke it down into 2 separate methods to isolate each feature.
The next step will be to push a new stage0 snapshot and then get rid of all `extern mod` around the code.
If you were writing to something along the lines of `self.foo` then with the new
closure rules it meant that you were borrowing `self` for the entirety of the
closure, meaning that you couldn't format other fields of `self` at the same
time as writing to a buffer contained in `self`.
By lifting the borrow outside of the closure the borrow checker can better
understand that you're only borrowing one of the fields at a time. This had to
use type ascription as well in order to preserve trait object coercions.
Previously crates like `green` and `native` would still depend on their
parents when running `make check-stage2-green NO_REBUILD=1`, this
ensures that they only depend on their source files.
Also, apply NO_REBUILD to the crate doc tests, so, for example,
`check-stage2-doc-std` will use an already compiled `rustdoc` directly.
It asserted that the previous count was always nonnegative, but DISCONNECTED is
a valid value for it to see. In order to continue to remember to store
DISCONNECTED after DISCONNECTED was seen, I also added a helper method.
Closes#12226
The `id` shouldn't be changed by external code, and exposing it publicly
allows to be accidentally changed.
Also, remove the first element special case in the `select!` macro.
::num::bigint, Remove a source of O(n^2) running time in `fn shr_bits`.
I'll cut to the chase: On my laptop, this brings the running time on
`pidigits 2000` (from src/test/bench/shootout-pidigits.rs) from this:
```
% time ./pidigits 2000 > /dev/null
real 0m7.695s
user 0m7.690s
sys 0m0.005s
```
to this:
```
% time ./pidigits 2000 > /dev/null
real 0m0.322s
user 0m0.318s
sys 0m0.004s
```
The previous code was building up a vector by repeatedly making a
fresh copy for each element that was unshifted onto the front,
yielding quadratic running time. This fixes that by building up the
vector in reverse order (pushing elements onto the end) and then
reversing it.
(Another option would be to build up a zero-initialized vector of the
desired length and then installing all of the shifted result elements
into their target index, but this was easier to hack up quickly, and
yields the desired asymptotic improvement. I have been thinking of
adding a `vec::from_fn_rev` to handle this case, maybe I will try that
this weekend.)
Externally loaded libraries are able to do things that cause references
to them to survive past the expansion phase (e.g. creating @-box cycles,
launching a task or storing something in task local data). As such, the
library has to stay loaded for the lifetime of the process.
This patch gets rid of ObsoleteExternModAttributesInParens and
ObsoleteNamedExternModule since the replacement of `extern mod` with
`extern crate` avoids those cases and raises different errors. Both have
been around for at least a version which makes this a good moment to get
rid of them.
This patch adds a new keyword `crate` which is intended to replace mod
in the context of `extern mod` as part of the issue #9880. The patch
doesn't replace all `extern mod` cases since it is necessary to first
push a new snapshot 0.
The implementation could've been less invasive than this. However I
preferred to take this chance to split the `parse_item_foreign_mod`
method and pull the `extern crate` part out of there, hence the new
method `parse_item_foreign_crate`.
This patch replaces all `crate` usage with `krate` before introducing the
new keyword. This ensures that after introducing the keyword, there
won't be any compilation errors.
krate might not be the most expressive substitution for crate but it's a
very close abbreviation for it. `module` was already used in several
places already.
While working on #11363 I stumbled over a couple of ignored tests, that seem to be fixed or invalid.
* src/test/run-pass/issue-3559.rs was fixed in #4726
* src/test/compile-fail/borrowck-call-sendfn.rs was fixed in #2978
* update src/test/compile-fail/issue-5500-1.rs to work with current Rust (I'm not 100% sure if the original condition is tested as mentioned in #5500, but I think so)
* removed src/test/compile-fail/issue-5500.rs because it is tested in
src/test/run-fail/issue-5500.rs (they are the same test cases, I just renamed src/test/run-fail/addr-of-bot.rs to be consistent with the other issue name
* src/test/run-pass/issue-3559.rs was fixed in #4726
* src/test/compile-fail/borrowck-call-sendfn.rs was fixed in #2978
* update src/test/compile-fail/issue-5500-1.rs to work with current Rust
* removed src/test/compile-fail/issue-5500.rs because it is tested in
src/test/run-fail/issue-5500.rs
* src/test/compile-fail/view-items-at-top.rs fixed
* #897 fixed
* compile-fail/issue-6762.rs issue was closed as dup of #6801
* deleted compile-fail/issue-2074.rs because it became irelevant and is
irrelevant #2074, a test covering this was added in
4f92f452bd
This adopts the rules posted in #10432:
1. If a seek position is negative, then an error is generated
2. Seeks beyond the end-of-file are allowed. Future writes will fill the gap
with data and future reads will return errors.
3. Seeks within the bounds of a file are fine.
Closes#10432
Cleans up a few issues with `fourcc`:
* Corrects the endianness in the docs example
* Removes `#[cfg(not(test))]` (bors might not build this on Windows. If the build fails, I'll re-add it)
* Adds a FIXME referencing the LLVM assert issue we encountered with bors builds on Windows (Same error as #10872)
Loadable syntax extensions don't work when cross compiling (see #12102), so the
fourcc tests all need to be ignored. They're valuable tests, so they shouldn't
be outright ignored, so they're now flagged with ignore-cross-compile
This adopts the rules posted in #10432:
1. If a seek position is negative, then an error is generated
2. Seeks beyond the end-of-file are allowed. Future writes will fill the gap
with data and future reads will return errors.
3. Seeks within the bounds of a file are fine.
Closes#10432
The user-facing API-level change of this commit is that `SharedChan` is gone and `Chan` now has `clone`. The major parts of this patch are the internals which have changed.
Channels are now internally upgraded from oneshots to streams to shared channels depending on the use case. I've noticed a 3x improvement in the oneshot case and very little slowdown (if any) in the stream/shared case.
This patch is mostly a reorganization of the `std::comm` module, and the large increase in code is from either dispatching to one of 3 impls or the duplication between the stream/shared impl (because they're not entirely separate).
The `comm` module is now divided into `oneshot`, `stream`, `shared`, and `select` modules. Each module contains the implementation for that flavor of channel (or the select implementation for select).
Some notable parts of this patch
* Upgrades are done through a semi-ad-hoc scheme for oneshots and messages for streams
* Upgrades are processed ASAP and have some interesting interactions with select
* send_deferred is gone because I expect the mutex to land before this
* Some of stream/shared is straight-up duplicated, but I like having the distinction between the two modules
* Select got a little worse, but it's still "basically limping along"
* This lumps in the patch of deallocating the queue backlog on packet drop
* I'll rebase this on top of the "more errors from try_recv" patch once it lands (all the infrastructure is here already)
All in all, this shouldn't be merged until the new mutexes are merged (because send_deferred wasn't implemented).
Closes#11351
This is an attempt to remove some more of Rust's dependencies on libgcc and replace it with LLVM's compiler-rt lib. I've added compiler-rt as a submodule and changed libstd to link with it.
As far as I could verify, after this change, the only symbols still imported by std from libgcc are the stack unwinding functions. Other crates, however, still picked up symbols from libgcc, not from libstd, as I had hoped. So linking definitely requires some work.
I've only tested this on windows, 32-bit linux and android and fully expect it to fail on other platforms. Patches are welcome.
This, the Nth rewrite of channels, is not a rewrite of the core logic behind
channels, but rather their API usage. In the past, we had the distinction
between oneshot, stream, and shared channels, but the most recent rewrite
dropped oneshots in favor of streams and shared channels.
This distinction of stream vs shared has shown that it's not quite what we'd
like either, and this moves the `std::comm` module in the direction of "one
channel to rule them all". There now remains only one Chan and one Port.
This new channel is actually a hybrid oneshot/stream/shared channel under the
hood in order to optimize for the use cases in question. Additionally, this also
reduces the cognitive burden of having to choose between a Chan or a SharedChan
in an API.
My simple benchmarks show no reduction in efficiency over the existing channels
today, and a 3x improvement in the oneshot case. I sadly don't have a
pre-last-rewrite compiler to test out the old old oneshots, but I would imagine
that the performance is comparable, but slightly slower (due to atomic reference
counting).
This commit also brings the bonus bugfix to channels that the pending queue of
messages are all dropped when a Port disappears rather then when both the Port
and the Chan disappear.
Beforehand, using a concurrent queue always mandated that the "shared state" be
stored internally to the queues in order to provide a safe interface. This isn't
quite as flexible as one would want in some circumstances, so instead this
commit moves the queues to not containing the shared state.
The queues no longer have a "default useful safe" interface, but rather a
"default safe" interface (minus the useful part). The queues have to be shared
manually through an Arc or some other means. This allows them to be a little
more flexible at the cost of a usability hindrance.
I plan on using this new flexibility to upgrade a channel to a shared channel
seamlessly.
I implemented an add method for the btree in progress. It is intended to be refactored later using an alternative to .clone() that passes the borrow checker, but for now, it works as intended. r? @catamorphism