This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).
The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.
I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).
Closes#4877
* Added doc comments explaining what all public functionality does.
* Added the ability to spawn a detached thread
* Added the ability for the procs to return a value in 'join'
Whenever the runtime is shut down, add a few hooks to clean up some of the
statically initialized data of the runtime. Note that this is an unsafe
operation because there's no guarantee on behalf of the runtime that there's no
other code running which is using the runtime.
This helps turn down the noise a bit in the valgrind output related to
statically initialized mutexes. It doesn't turn the noise down to 0 because
there are still statically initialized mutexes in dynamic_lib and
os::with_env_lock, but I believe that it would be easy enough to add exceptions
for those cases and I don't think that it's the runtime's job to go and clean up
that data.
This patchset fixes some parts broken on Win64.
This also adds `--disable-pthreads` flags to llvm on mingw-w64 archs (both 32-bit and 64-bit, not mingw) due to bad performance. See #8996 for discussion.
The reasons for doing this are:
* The model on which linked failure is based is inherently complex
* The implementation is also very complex, and there are few remaining who
fully understand the implementation
* There are existing race conditions in the core context switching function of
the scheduler, and possibly others.
* It's unclear whether this model of linked failure maps well to a 1:1 threading
model
Linked failure is often a desired aspect of tasks, but we would like to take a
much more conservative approach in re-implementing linked failure if at all.
Closes#8674Closes#8318Closes#8863
The reasons for doing this are:
* The model on which linked failure is based is inherently complex
* The implementation is also very complex, and there are few remaining who
fully understand the implementation
* There are existing race conditions in the core context switching function of
the scheduler, and possibly others.
* It's unclear whether this model of linked failure maps well to a 1:1 threading
model
Linked failure is often a desired aspect of tasks, but we would like to take a
much more conservative approach in re-implementing linked failure if at all.
Closes#8674Closes#8318Closes#8863
I cannot tell whether the original comment was unsure about the
arithmetic calculations, or if it was unsure about the assumptions
being made about the alignment of the current allocation pointer.
The arithmetic calculation looks fine to me, though. This technique
is documented e.g. in Henry Warren's "Hacker's Delight" (section 3-1).
(I am sure one can find it elsewhere too, its not an obscure
property.)
There are issues with reading stdin when it is actually attached to a pipe, but
I have run into no problems in writing to stdout/stderr when they are attached
to pipes.
Explicitly have the only C++ portion of the runtime be one file with exception
handling. All other runtime files must now live in C and be fully defined in C.
There are issues with reading stdin when it is actually attached to a pipe, but
I have run into no problems in writing to stdout/stderr when they are attached
to pipes.
This commit re-organizes the io::native module slightly in order to have a
working implementation of rtio::IoFactory which uses native implementations. The
goal is to seamlessly multiplex among libuv/native implementations wherever
necessary.
Right now most of the native I/O is unimplemented, but we have existing bindings
for file descriptors and processes which have been hooked up. What this means is
that you can now invoke println!() from libstd with no local task, no local
scheduler, and even without libuv.
There's still plenty of work to do on the native I/O factory, but this is the
first steps into making it an official portion of the standard library. I don't
expect anyone to reach into io::native directly, but rather only std::io
primitives will be used. Each std::io interface seamlessly falls back onto the
native I/O implementation if the local scheduler doesn't have a libuv one
(hurray trait ojects!)
I was benchmarking rust-http recently, and I saw that 50% of its time was spent
creating buffered readers/writers. Albeit rust-http wasn't using
std::rt::io::buffered, but the same idea applies here. It's much cheaper to
malloc a large region and not initialize it than to set it all to 0. Buffered
readers/writers never use uninitialized data, and their internal buffers are
encapsulated, so any usage of uninitialized slots are an implementation bug in
the readers/writers.
I increased this to 4MB when I implemented abort-on-stack-overflow for Rust
functions. Now that the fixed_stack_segment attribute is removed, no rust
function will ever reasonably request 2MB of stack (due to calling an FFI
function).
The default size of 2MB should be plenty for everyday use-cases, and tasks can
still request more stack via the spawning API.
These two attributes are no longer useful now that Rust has decided to leave
segmented stacks behind. It is assumed that the rust task's stack is always
large enough to make an FFI call (due to the stack being very large).
There's always the case of stack overflow, however, to consider. This does not
change the behavior of stack overflow in Rust. This is still normally triggered
by the __morestack function and aborts the whole process.
C stack overflow will continue to corrupt the stack, however (as it did before
this commit as well). The future improvement of a guard page at the end of every
rust stack is still unimplemented and is intended to be the mechanism through
which we attempt to detect C stack overflow.
Closes#8822Closes#10155
I was benchmarking rust-http recently, and I saw that 50% of its time was spent
creating buffered readers/writers. Albeit rust-http wasn't using
std::rt::io::buffered, but the same idea applies here. It's much cheaper to
malloc a large region and not initialize it than to set it all to 0. Buffered
readers/writers never use uninitialized data, and their internal buffers are
encapsulated, so any usage of uninitialized slots are an implementation bug in
the readers/writers.
It appears that uv's support for interacting with a stdio stream as a tty when
it's actually a pipe is pretty problematic. To get around this, promote a check
to see if the stream is a tty to the top of the tty constructor, and bail out
quickly if it's not identified as a tty.
Closes#10237