Instead of a furious storm of idle callbacks we just have one. This is a major performance gain - around 40% on my machine for the ping pong bench.
Also in this PR is a cleanup commit for the scheduler code. Was previously up as a separate PR, but bors load + imminent merge hell led me to roll them together. Was #8549.
Each IO handle has a home event loop, which created it.
When a task wants to use an IO handle, it must first make sure it is on that home event loop.
It uses the scheduler handle in the IO handle to send itself there before starting the IO action.
Once the IO action completes, the task restores its previous home state.
If it is an AnySched task, then it will be executed on the new scheduler.
If it has a normal home, then it will return there before executing any more code after the IO action.
.with_c_str() is a replacement for the old .as_c_str(), to avoid
unnecessary boilerplate.
Replace all usages of .to_c_str().with_ref() with .with_c_str().
This PR fixes#7235 and #3371, which removes trailing nulls from `str` types. Instead, it replaces the creation of c strings with a new type, `std::c_str::CString`, which wraps a malloced byte array, and respects:
* No interior nulls
* Ends with a trailing null
Mostly optimizing TLS accesses to bring local heap allocation performance
closer to that of oldsched. It's not completely at parity but removing the
branches involved in supporting oldsched and optimizing pthread_get/setspecific
to instead use our dedicated TCB slot will probably make up for it.
Mostly optimizing TLS accesses to bring local heap allocation performance
closer to that of oldsched. It's not completely at parity but removing the
branches involved in supporting oldsched and optimizing pthread_get/setspecific
to instead use our dedicated TCB slot will probably make up for it.
FromStr implemented from scratch.
It is overengineered a bit, however.
Old implementation handles errors by fail!()-ing. And it has bugs, like it accepts `127.0.0.1::127.0.0.1` as IPv6 address, and does not handle all ipv4-in-ipv6 schemes. So I decided to implement parser from scratch.
This pull request converts the scheduler from a naive shared queue scheduler to a naive workstealing scheduler. The deque is still a queue inside a lock, but there is still a substantial performance gain. Fiddling with the messaging benchmark I got a ~10x speedup and observed massively reduced memory usage.
There are still *many* locations for optimization, but based on my experience so far it is a clear performance win as it is now.