This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).
The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.
I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).
Closes#4877
This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).
The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.
I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).
Closes#4877
While tracking down how this function became dead, identified a spot
(@fn cannot happen) where we probably would prefer to ICE rather than
pass silently; so added fail! invocation.
While tracking down how this function became dead, identified a spot
(@fn cannot happen) where we probably would prefer to ICE rather than
pass silently; so added fail! invocation.
I have written some benchmark tests to `push`, `push_many`, `join`,
`join_many` and `ends_with_path`.
Let me know what you think (@cmr).
Thanks in advance.
Previously, `//// foo` and `/*** foo ***/` were accepted as doc comments. This
changes that, so that only `/// foo` and `/** foo ***/` are accepted. This
confuses many newcomers and it seems weird.
Also update the manual for these changes, and modernify the EBNF for comments.
Closes#10638
Previously, `//// foo` and `/*** foo ***/` were accepted as doc comments. This
changes that, so that only `/// foo` and `/** foo ***/` are accepted. This
confuses many newcomers and it seems weird.
Also update the manual for these changes, and modernify the EBNF for comments.
Closes#10638
Instead of forcibly always aborting compilation, allow usage of
#[warn(unknown_features)] and related lint attributes to selectively abort
compilation. By default, this lint is deny.
This has one commit from a separate pull request (because these commits depend on that one), but otherwise the extra details can be found in the commit messages. The `rt::thread` module has been generally cleaned up for everyday safe usage (and it's a bug if it's not safe).
Instead of forcibly always aborting compilation, allow usage of
#[warn(unknown_features)] and related lint attributes to selectively abort
compilation. By default, this lint is deny.
* Added doc comments explaining what all public functionality does.
* Added the ability to spawn a detached thread
* Added the ability for the procs to return a value in 'join'
I've noticed I use this pattern quite a bit:
~~~rust
do spawn {
loop {
match port.try_recv() {
Some(x) => ...,
None => ...,
}
}
}
~~~
The `RecvIterator`, returned from a default `recv_iter` method on the `GenericPort` trait, allows you to reduce this down to:
~~~rust
do spawn {
for x in port.recv_iter() {
...
}
}
~~~
As demonstrated in the tests, you can also access the port from within the `for` block for further `recv`ing and `peek`ing with no borrow errors, which is quite nice.
### Rationale
There is no reason to support more than 2³² nodes or names at this moment, as compiling something that big (even without considering the quadratic space usage of some analysis passes) would take at least **64GB**.
Meanwhile, some can't (or barely can) compile rustc because it requires almost **1.5GB**.
### Potential problems
Can someone confirm this doesn't affect metadata (de)serialization? I can't tell myself, I know nothing about it.
### Results
Some structures have a size reduction of 25% to 50%: [before](https://gist.github.com/luqmana/3a82a51fa9c86d9191fa) - [after](https://gist.github.com/eddyb/5a75f8973d3d8018afd3).
Sadly, there isn't a massive change in the memory used for compiling stage2 librustc (it doesn't go over **1.4GB** as [before](http://huonw.github.io/isrustfastyet/mem/), but I can barely see the difference).
However, my own testcase (previously peaking at **1.6GB** in typeck) shows a reduction of **200**-**400MB**.
Whenever the runtime is shut down, add a few hooks to clean up some of the
statically initialized data of the runtime. Note that this is an unsafe
operation because there's no guarantee on behalf of the runtime that there's no
other code running which is using the runtime.
This helps turn down the noise a bit in the valgrind output related to
statically initialized mutexes. It doesn't turn the noise down to 0 because
there are still statically initialized mutexes in dynamic_lib and
os::with_env_lock, but I believe that it would be easy enough to add exceptions
for those cases and I don't think that it's the runtime's job to go and clean up
that data.