The previous code erroneously assumed that 'steals > cnt' was always true, but
that was a false assumption. The code was altered to decrement steals to a
minimum of 0 instead of taking all of cnt into account.
I didn't include the exact test from #12295 because it could run for quite
awhile, and instead set the threshold for MAX_STEALS to much lower during
testing. I found that this triggered the old bug quite frequently when running
without this fix.
Closes#12295
Change `os::args()` and `os::env()` to use `str::from_utf8_lossy()`.
Add new functions `os::args_as_bytes()` and `os::env_as_bytes()` to retrieve the args/env as byte vectors instead.
The existing methods were left returning strings because I expect that the common use-case is to want string handling.
Fixes#7188.
Parse the environment by default with from_utf8_lossy. Also provide
byte-vector equivalents (e.g. os::env_as_bytes()).
Unfortunately, setenv() can't have a byte-vector equivalent because of
Windows support, unless we want to define a setenv_bytes() that fails
under Windows for non-UTF8 (or non-UTF16).
os::args() was using str::raw::from_c_str(), which would assert if the
C-string wasn't valid UTF-8. Switch to using from_utf8_lossy() instead,
and add a separate function os::args_as_bytes() that returns the ~[u8]
byte-vectors instead.
This will hopefully bring us closer to #11937. We're still using gcc's idea of
"startup files", but this should prevent us from leaking in dependencies that we
don't quite want (libgcc for example once compiler-rt is what we use).
When tests fail, their stdout and stderr is printed as part of the summary, but
this helps suppress failure messages from #[should_fail] tests and generally
clean up the output of the test runner.
Any single-threaded task benchmark will spend a good chunk of time in `kqueue()` on osx and `epoll()` on linux, and the reason for this is that each time a task is terminated it will hit the syscall. When a task terminates, it context switches back to the scheduler thread, and the scheduler thread falls out of `run_sched_once` whenever it figures out that it did some work.
If we know that `epoll()` will return nothing, then we can continue to do work locally (only while there's work to be done). We must fall back to `epoll()` whenever there's active I/O in order to check whether it's ready or not, but without that (which is largely the case in benchmarks), we can prevent the costly syscall and can get a nice speedup.
I've separated the commits into preparation for this change and then the change itself, the last commit message has more details.
These commits pick off some low-hanging fruit which were slowing down spawning green threads. The major speedup comes from fixing a bug in stack caching where we never used any cached stacks!
The program I used to benchmark is at the end. It was compiled with `rustc --opt-level=3 bench.rs --test` and run as `RUST_THREADS=1 ./bench --bench`. I chose to use `RUST_THREADS=1` due to #11730 as the profiles I was getting interfered too much when all the schedulers were in play (and shouldn't be after #11730 is fixed). All of the units below are in ns/iter as reported by `--bench` (lower is better).
| | green | native | raw |
| ------------- | ----- | ------ | ------ |
| osx before | 12699 | 24030 | 19734 |
| linux before | 10223 | 125983 | 122647 |
| osx after | 3847 | 25771 | 20835 |
| linux after | 2631 | 135398 | 122765 |
Note that this is *not* a benchmark of spawning green tasks vs native tasks. I put in the native numbers just to get a ballpark of where green tasks are. This is benchmark is *clearly* benefiting from stack caching. Also, OSX is clearly not 5x faster than linux, I think my VM is just much slower.
All in all, this ended up being a nice 4x speedup for spawning a green task when you're using a cached stack.
```rust
extern mod extra;
extern mod native;
use std::rt:🧵:Thread;
#[bench]
fn green(bh: &mut extra::test::BenchHarness) {
let (p, c) = SharedChan::new();
bh.iter(|| {
let c = c.clone();
spawn(proc() {
c.send(());
});
p.recv();
});
}
#[bench]
fn native(bh: &mut extra::test::BenchHarness) {
let (p, c) = SharedChan::new();
bh.iter(|| {
let c = c.clone();
native::task::spawn(proc() {
c.send(());
});
p.recv();
});
}
#[bench]
fn raw(bh: &mut extra::test::BenchHarness) {
bh.iter(|| {
Thread::start(proc() {}).join()
});
}
```
Two unfortunate allocations were wrapping a proc() in a proc() with
GreenTask::build_start_wrapper, and then boxing this proc in a ~proc() inside of
Context::new(). Both of these allocations were a direct result from two
conditions:
1. The Context::new() function has a nice api of taking a procedure argument to
start up a new context with. This inherently required an allocation by
build_start_wrapper because extra code needed to be run around the edges of a
user-provided proc() for a new task.
2. The initial bootstrap code only understood how to pass one argument to the
next function. By modifying the assembly and entry points to understand more
than one argument, more information is passed through in registers instead of
allocating a pointer-sized context.
This is sadly where I end up throwing mips under a bus because I have no idea
what's going on in the mips context switching code and don't know how to modify
it.
Closes#7767
cc #11389
Instead, use an enum to allow running both a procedure and sending the task
result over a channel. I expect the common case to be sending on a channel (e.g.
task::try), so don't require an extra allocation in the common case.
cc #11389
If you were writing to something along the lines of `self.foo` then with the new
closure rules it meant that you were borrowing `self` for the entirety of the
closure, meaning that you couldn't format other fields of `self` at the same
time as writing to a buffer contained in `self`.
By lifting the borrow outside of the closure the borrow checker can better
understand that you're only borrowing one of the fields at a time. This had to
use type ascription as well in order to preserve trait object coercions.
It asserted that the previous count was always nonnegative, but DISCONNECTED is
a valid value for it to see. In order to continue to remember to store
DISCONNECTED after DISCONNECTED was seen, I also added a helper method.
Closes#12226
The `id` shouldn't be changed by external code, and exposing it publicly
allows to be accidentally changed.
Also, remove the first element special case in the `select!` macro.
The green scheduler can optimize its runtime based on this by deciding to not go
to sleep in epoll() if there is no active I/O and there is a task to be stolen.
This is implemented for librustuv by keeping a count of the number of tasks
which are currently homed. If a task is homed, and then performs a blocking I/O
operation, the count will be nonzero while the task is blocked. The homing count
is intentionally 0 when there are I/O handles, but no handles currently blocked.
The reason for this is that epoll() would only be used to wake up the scheduler
anyway.
The crux of this change was to have a `HomingMissile` contain a mutable borrowed
reference back to the `HomeHandle`. The rest of the change was just dealing with
this fallout. This reference is used to decrement the homed handle count in a
HomingMissile's destructor.
Also note that the count maintained is not atomic because all of its
increments/decrements/reads are all on the same I/O thread.
This adopts the rules posted in #10432:
1. If a seek position is negative, then an error is generated
2. Seeks beyond the end-of-file are allowed. Future writes will fill the gap
with data and future reads will return errors.
3. Seeks within the bounds of a file are fine.
Closes#10432
This adopts the rules posted in #10432:
1. If a seek position is negative, then an error is generated
2. Seeks beyond the end-of-file are allowed. Future writes will fill the gap
with data and future reads will return errors.
3. Seeks within the bounds of a file are fine.
Closes#10432
This, the Nth rewrite of channels, is not a rewrite of the core logic behind
channels, but rather their API usage. In the past, we had the distinction
between oneshot, stream, and shared channels, but the most recent rewrite
dropped oneshots in favor of streams and shared channels.
This distinction of stream vs shared has shown that it's not quite what we'd
like either, and this moves the `std::comm` module in the direction of "one
channel to rule them all". There now remains only one Chan and one Port.
This new channel is actually a hybrid oneshot/stream/shared channel under the
hood in order to optimize for the use cases in question. Additionally, this also
reduces the cognitive burden of having to choose between a Chan or a SharedChan
in an API.
My simple benchmarks show no reduction in efficiency over the existing channels
today, and a 3x improvement in the oneshot case. I sadly don't have a
pre-last-rewrite compiler to test out the old old oneshots, but I would imagine
that the performance is comparable, but slightly slower (due to atomic reference
counting).
This commit also brings the bonus bugfix to channels that the pending queue of
messages are all dropped when a Port disappears rather then when both the Port
and the Chan disappear.
Beforehand, using a concurrent queue always mandated that the "shared state" be
stored internally to the queues in order to provide a safe interface. This isn't
quite as flexible as one would want in some circumstances, so instead this
commit moves the queues to not containing the shared state.
The queues no longer have a "default useful safe" interface, but rather a
"default safe" interface (minus the useful part). The queues have to be shared
manually through an Arc or some other means. This allows them to be a little
more flexible at the cost of a usability hindrance.
I plan on using this new flexibility to upgrade a channel to a shared channel
seamlessly.
I factored the commits by affected files, for the most part. The last 7 or 8 contain the meat of the PR. The rest are small changes to closures found in the codebase. Maybe interesting to read to see some of the impact of the rules.
r? @pcwalton
Fixes#6801
This is a fairly trivial (but IMHO handy) change to implement IterBytes for IpAddr and SocketAddr.
I originally stumbled across this because I wanted to use a SocketAddr as a HashMap key and discovered that I couldn't do it directly. Had to impl IterBytes on a new intermediate type to work around it.
Thinking about swap as an example of unsafe programming. This cleans it up a bit. It also removes type parametrization over `RawPtr` from the memcpy functions to make this compile.