This patch saves and restores win64's nonvolatile registers.
This patch also saves stack information of thread environment
block (TEB), which is at %gs:0x08 and %gs:0x10.
Some extern blobs are duplicated without "stdcall" abi,
since Win64 does not use any calling convention.
(Giving any abi to them causes llvm producing wrong bytecode.)
The method names in std::rt::io::extensions::WriterByteConversions are
the same as those in std::io::WriterUtils and a resolve error causes
rustc to fail after trying to find an impl of io::Writer instead of
trying to look for rt::io::Writer as well.
These aren't used for anything at the moment and cause some TLS hits
on some perf-critical code paths. Will need to put better thought into
it in the future.
This documents how to use trait bounds in a (hopefully) user-friendly way, in the containers tutorial, and also documents the task watching implementation for runtime developers in kill.rs.
r anybody
Naturally, and sadly, turning off sanity checks in the runtime is
a noticable performance win. The particular test I'm running goes from
~1.5 s to ~1.3s.
Sanity checks are turned *on* when not optimizing, or when cfg
includes `rtdebug` or `rtassert`.
The method names in std::rt::io::extensions::WriterByteConversions are
the same as those in std::io::WriterUtils and a resolve error causes
rustc to fail after trying to find an impl of io::Writer instead of
trying to look for rt::io::Writer as well.
Same goes for ReaderByteConversions.
This resolves issue #908.
Notable changes:
- On Windows, LLVM integrated assembler emits bad stack unwind tables when segmented stacks are enabled. However, unwind info directives in the assembly output are correct, so we generate assembly first and then run it through an external assembler, just like it is already done for Android builds.
- Linker is invoked via "g++" command instead of "gcc": g++ passes the appropriate magic parameters to the linker, which ensure correct registration of stack unwind tables in dynamic libraries.
- change all uses of Path in fn args to &P
- FileStream.read assumptions were wrong (libuv file io is non-positional)
- the above will mean that we "own" Seek impl info .. should probably
push it in UvFileDescriptor..
- needs more tests
Fixed a memory leak caused by the singleton idle callback failing to close correctly. The problem was that the close function requires running inside a callback in the event loop, but we were trying to close the idle watcher after the loop returned from run. The fix was to just call run again to process this callback. There is an additional tweak to move the initialization logic fully into bootstrap, so tasks that do not ever call run do not have problems destructing.
libuv handles are tied to the event loop that created them. In order to perform IO, the handle must be on the thread with its home event loop. Thus, when as task wants to do IO it must first go to the IO handle's home event loop and pin itself to the corresponding scheduler while the IO action is in flight. Once the IO action completes, the task is unpinned and either returns to its home scheduler if it is a pinned task, or otherwise stays on the current scheduler.
Making new blocking IO implementations (i.e. files) thread safe is rather simple. Add a home field to the IO handle's struct in uvio and implement the HomingIO trait. Wrap every IO call in the HomingIO.home_for_io method, which will take care of the scheduling.
I'm not sure if this remains thread safe in the presence of asynchronous IO at the libuv level. If we decide to do that, then this set up should be revisited.
Instead of a furious storm of idle callbacks we just have one. This is a major performance gain - around 40% on my machine for the ping pong bench.
Also in this PR is a cleanup commit for the scheduler code. Was previously up as a separate PR, but bors load + imminent merge hell led me to roll them together. Was #8549.
Each IO handle has a home event loop, which created it.
When a task wants to use an IO handle, it must first make sure it is on that home event loop.
It uses the scheduler handle in the IO handle to send itself there before starting the IO action.
Once the IO action completes, the task restores its previous home state.
If it is an AnySched task, then it will be executed on the new scheduler.
If it has a normal home, then it will return there before executing any more code after the IO action.
.with_c_str() is a replacement for the old .as_c_str(), to avoid
unnecessary boilerplate.
Replace all usages of .to_c_str().with_ref() with .with_c_str().
This PR fixes#7235 and #3371, which removes trailing nulls from `str` types. Instead, it replaces the creation of c strings with a new type, `std::c_str::CString`, which wraps a malloced byte array, and respects:
* No interior nulls
* Ends with a trailing null
Mostly optimizing TLS accesses to bring local heap allocation performance
closer to that of oldsched. It's not completely at parity but removing the
branches involved in supporting oldsched and optimizing pthread_get/setspecific
to instead use our dedicated TCB slot will probably make up for it.
Mostly optimizing TLS accesses to bring local heap allocation performance
closer to that of oldsched. It's not completely at parity but removing the
branches involved in supporting oldsched and optimizing pthread_get/setspecific
to instead use our dedicated TCB slot will probably make up for it.
FromStr implemented from scratch.
It is overengineered a bit, however.
Old implementation handles errors by fail!()-ing. And it has bugs, like it accepts `127.0.0.1::127.0.0.1` as IPv6 address, and does not handle all ipv4-in-ipv6 schemes. So I decided to implement parser from scratch.
This pull request converts the scheduler from a naive shared queue scheduler to a naive workstealing scheduler. The deque is still a queue inside a lock, but there is still a substantial performance gain. Fiddling with the messaging benchmark I got a ~10x speedup and observed massively reduced memory usage.
There are still *many* locations for optimization, but based on my experience so far it is a clear performance win as it is now.
This is a fairly large rollup, but I've tested everything locally, and none of
it should be platform-specific.
r=alexcrichton (bdfdbdd)
r=brson (d803c18)
r=alexcrichton (a5041d0)
r=bstrie (317412a)
r=alexcrichton (135c85e)
r=thestinger (8805baa)
r=pcwalton (0661178)
r=cmr (9397fe0)
r=cmr (caa4135)
r=cmr (6a21d93)
r=cmr (4dc3379)
r=cmr (0aa5154)
r=cmr (18be261)
r=thestinger (f10be03)
This is a reopening of #8182, although this removes any abuse of the compiler internals. Now it's just a pure syntax extension (hard coded what the attribute names are).
This lazily initializes the taskgroup structs for ```spawn_unlinked``` tasks. If such a task never spawns another task linked to it (or a descendant of it), its taskgroup is simply never initialized at all. Also if an unlinked task spawns another unlinked task, neither of them will need to initialize their taskgroups. This works for the main task too.
I benchmarked this with the following test case and observed a ~~21% speedup (average over 4 runs: 7.85 sec -> 6.20 sec, 2.5 GHz)~~ 11% speedup, see comment below.
```
use std::task;
use std::cell::Cell;
use std::rt::comm;
static NUM: uint = 1024*256;
fn run(f: ~fn()) {
let mut t = task::task();
t.unlinked();
t.spawn(f);
}
fn main() {
do NUM.times {
let (p,c) = comm::oneshot();
let c = Cell::new(c);
do run { c.take().send(()); }
p.recv();
}
}
```
Better than that in rt::uv::net, because it:
* handles invalid input explicitly, without fail!()
* parses socket address, not just IP
* handles various ipv4-in-ipv6 addresses, like 2001:db8:122:344::192.0.2.33
(see http://tools.ietf.org/html/rfc6052 for example)
* rejects output like `127.0000000.0.1`
* does not allocate heap memory
* have unit tests
- Made naming schemes consistent between Option, Result and Either
- Changed Options Add implementation to work like the maybe monad (return None if any of the inputs is None)
- Removed duplicate Option::get and renamed all related functions to use the term `unwrap` instead
The truncation needs to be done in the console logger in order
to catch all the logging output, and because truncation only matters
when outputting to the console.
Every time run_sched_once performs a 'scheduling action' it needs to guarantee
that it runs at least one more time, so enqueue another run_sched_once callback.
The primary reason it needs to do this is because not all async callbacks
are guaranteed to run, it's only guaranteed that *a* callback will run after
enqueing one - some may get dropped.
At the moment this means we wastefully create lots of callbacks to ensure that
there will *definitely* be a callback queued up to continue running the scheduler.
The logic really needs to be tightened up here.
multicast functions now take IpAddr (without port), because they dont't
need port.
Uv* types renamed:
* UvIpAddr -> UvSocketAddr
* UvIpv4 -> UvIpv4SocketAddr
* UvIpv6 -> UvIpv6SocketAddr
"Socket address" is a common name for (ip-address, port) pair (e.g. in
sockaddr_in struct).
P. S. Are there any backward compatibility concerns? What is std::rt module, is it a part of public API?
And before collect_failure. These are both running user dtors and need to be handled
in the task try/catch block and before the final task cleanup code.
OS X defaults the ulimit for open files to 256 for programs launched
from the Terminal (GUI apps get a higher default). Unfortunately this is
too low for the rt tests, which deliberately overcommit and create a lot
of threads (which means a lot of schedulers, and each scheduler needs at
least 2 fds).
By calling sysctl() and setrlimit() we can bump the fd limit up to the
maximum allowed (on stock OS X it's 10240).
Fixes#7772.
multicast functions now take IpAddr (without port), because they dont't
need port.
Uv* types renamed:
* UvIpAddr -> UvSocketAddr
* UvIpv4 -> UvIpv4SocketAddr
* UvIpv6 -> UvIpv6SocketAddr
"Socket address" is a common name for (ip-address, port) pair (e.g. in
sockaddr_in struct).
In the first commit it is obvious why some of the barriers can be changed to ```Relaxed```, but it is not as obvious for the once I changed in ```kill.rs```. The rationale for those is documented as part of the documenting commit.
Also the last commit is a temporary hack to prevent kill signals from being received in taskgroup cleanup code, which could be fixed in a more principled way once the old runtime is gone.
old design the TLS held the scheduler struct, and the scheduler struct
held the active task. This posed all sorts of weird problems due to
how we wanted to use the contents of TLS. The cleaner approach is to
leave the active task in TLS and have the task hold the scheduler. To
make this work out the scheduler has to run inside a regular task, and
then once that is the case the context switching code is massively
simplified, as instead of three possible paths there is only one. The
logical flow is also easier to follow, as the scheduler struct acts
somewhat like a "token" indicating what is active.
These changes also necessitated changing a large number of runtime
tests, and rewriting most of the runtime testing helpers.
Polish level is "low", as I will very soon start on more scheduler
changes that will require wiping the polish off. That being said there
should be sufficient comments around anything complex to make this
entirely respectable as a standalone commit.
Change the former repetition::
for 5.times { }
to::
do 5.times { }
.times() cannot be broken with `break` or `return` anymore; for those
cases, use a numerical range loop instead.
This removes a bunch of options from the task builder interface that are irrelevant to the new scheduler and were generally unused anyway. It also bumps the stack size of new scheduler tasks so that there's enough room to run rustc and changes the interface to `Thread` to not implicitly join threads on destruction, but instead require an explicit, and mandatory, call to `join`.
Main logic in ```Implement select() for new runtime pipes.```. The guts of the ```PortOne::try_recv()``` implementation are now split up across several functions, ```optimistic_check```, ```block_on```, and ```recv_ready```.
There is one weird FIXME I left open here, in the "implement select" commit -- an assertion I couldn't get to work in the receive path, on an invariant that for some reason doesn't hold with ```SharedPort```. Still investigating this.
To be more specific:
`UPPERCASETYPE` was changed to `UppercaseType`
`type_new` was changed to `Type::new`
`type_function(value)` was changed to `value.method()`
Implements various missing tcp & udp methods.. Also fixes handling ipv4-mapped/compatible ipv6 addresses and addresses the XXX on `status_to_maybe_uv_error`.
r? @brson
This moves the raw struct layout of closures, vectors, boxes, and strings into a
new `unstable::raw` module. This is meant to be a centralized location to find
information for the layout of these values.
As safe method, `repr`, is provided to convert a rust value to its raw
representation. Unsafe methods to convert back are not provided because they are
rarely used and too numerous to write an implementation for each (not much of a
common pattern).
This is a cleanup pull request that does:
* removes `os::as_c_charp`
* moves `str::as_buf` and `str::as_c_str` into `StrSlice`
* converts some functions from `StrSlice::as_buf` to `StrSlice::as_c_str`
* renames `StrSlice::as_buf` to `StrSlice::as_imm_buf` (and adds `StrSlice::as_mut_buf` to match `vec.rs`.
* renames `UniqueStr::as_bytes_with_null_consume` to `UniqueStr::to_bytes`
* and other misc cleanups and minor optimizations
Some notes about the commits.
Exit code propagation commits:
* ```Reimplement unwrap()``` has the same old code from ```arc::unwrap``` ported to use modern atomic types and finally (it's considerably nicer this way)
* ```Add try_unwrap()``` has some new slightly-tricky (but pretty simple) concurrency primitive code
* ```Add KillHandle``` and ```Add kill::Death``` are the bulk of the logic.
Task killing commits:
* ```Implement KillHandle::kill() and friends```, ```Do a task-killed check```, and ```Add BlockedTask``` implement the killing logic;
* ```Change the HOF context switchers``` turns said logic on
Linked failure commits:
* ```Replace *rust_task ptrs``` adapts the taskgroup code to work for both runtimes
* ```Enable taskgroup code``` does what it says on the tin.
r? @brson
If the TLS key is 0-sized, then the linux linker is apparently smart enough to
put everything at the same pointer. OSX on the other hand, will reserve some
space for all of them. To get around this, the TLS key now actuall consumes
space to ensure that it gets a unique pointer
cc #6004 and #3273
This is a rewrite of TLS to get towards not requiring `@` when using task local storage. Most of the rewrite is straightforward, although there are two caveats:
1. Changing `local_set` to not require `@` is blocked on #7673
2. The code in `local_pop` is some of the most unsafe code I've written. A second set of eyes should definitely scrutinize it...
The public-facing interface currently hasn't changed, although it will have to change because `local_data::get` cannot return `Option<T>`, nor can it return `Option<&T>` (the lifetime isn't known). This will have to be changed to be given a closure which yield `&T` (or as an Option). I didn't do this part of the api rewrite in this pull request as I figured that it could wait until when `@` is fully removed.
This also doesn't deal with the issue of using something other than functions as keys, but I'm looking into using static slices (as mentioned in the issues).
r? @graydon, @nikomatsakis, @pcwalton, or @catamorphism
Sorry this is so huge, but it's been accumulating for about a month. There's lots of stuff here, mostly oriented toward enabling multithreaded scheduling and improving compatibility between the old and new runtimes. Adds task pinning so that we can create the 'platform thread' in servo.
[Here](e1555f9b56/src/libstd/rt/mod.rs (L201)) is the current runtime setup code.
About half of this has already been reviewed.
* stop using an atomic counter, this has a significant cost and
valgrind will already catch these leaks
* remove the extra layer of function calls
* remove the assert of non-null in free, freeing null is well defined
but throwing a failure from free will not be
* stop initializing the `prev`/`next` pointers
* abort on out-of-memory, failing won't necessarily work
Reopening of #7031, Closes#6963
I imagine though that this will bounce in bors once or twice... Because attributes can't be cfg(stage0)'d off, there's temporarily a lot of new stage0/stage1+ code.
This sets the `get_tydesc()` return type correctly and removes the intrinsic module. See #3730, #3475.
Update: this now also removes the unused shape fields in tydescs.
the `test/run-pass/class-trait-bounded-param.rs` test was xfailed and
written in an ancient dialect of Rust so I've just removed it
this also removes `to_vec` from DList because it's provided by
`std::iter::to_vec`
an Iterator implementation is added for OptVec but some transitional
internal iterator methods are still left
To achieve this, the following changes were made:
* Move TyDesc, TyVisitor and Opaque to std::unstable::intrinsics
* Convert TyDesc, TyVisitor and Opaque to lang items instead of specially
handling the intrinsics module
* Removed TypeDesc, FreeGlue and get_type_desc() from sys
Fixes#3475.
The Listener trait takes two type parameters, the type of connection and the type of Acceptor,
and specifies only one method, listen, which consumes the listener and produces an Acceptor.
The Acceptor trait takes one type parameter, the type of connection, and defines two methods.
The accept() method waits for an incoming connection attempt and returns the result.
The incoming() method creates an iterator over incoming connections and is a default method.
Example:
let listener = TcpListener.bind(addr); // Bind to a socket
let acceptor = listener.listen(); // Start the listener
for stream in acceptor.incoming() {
// Process incoming connections forever (or until you break out of the loop)
}
I removed the `static-method-test.rs` test because it was heavily based
on `BaseIter` and there are plenty of other more complex uses of static
methods anyway.
This is supposed to be an efficient way to link the lifetimes
of tasks into a tree. JoinLatches form a tree and when `release`
is called they wait on children then signal the parent.
This structure creates zombie tasks which currently keep the entire
task allocated. Zombie tasks are supposed to be tombstoned but that
code does not work correctly.
Fix a laundry list of warnings involving unused imports that glutted
up compilation output. There are more, but there seems to be some
false positives (where 'remedy' appears to break the build), but this
particular set of fixes seems safe.