The code compiles and runs under windows now, but I couldn't look up any
symbol from the current executable (dlopen(NULL)), and calling looked
up external function handles doesn't seem to work correctly under windows.
This the beginning of a fix for #7095.
These intrinsics are synthesized, so maybe they should be in another
file. But since they are just a single line of code each, based on the
bswap intrinsics and aren't really intended for public consumption (they should be exposed as a
single function / trait) I thought they would fit here.
r? @brson
links to issues: #7065 the race that's fixed; #7066 the perf improvement I added. There are also some minor cleanup commits here.
To measure the performance improvement from replacing the exclusive with an atomic uint, I edited the ```msgsend-ring-rw-arcs``` bench test to do a ```write_downgrade``` instead of just a ```write```, so that it stressed the code paths that accessed ```read_count```. (At first I was still using ```write``` and saw no performance difference whatsoever, whoooops.)
The bench test measures how long it takes to send 1,000,000 messages by using rwarcs to emulate pipes. I also measured the performance difference imposed by the fix to the ```access_lock``` race (which involves taking an extra semaphore in the ```cond.wait()``` path). The net result is that fixing the race imposes a 4% to 5% slowdown, but doing the atomic uint optimization gives a 6% to 8% speedup.
Note that this speedup will be most visible in read- or downgrade-heavy workloads. If an RWARC's only users are writers, the optimization doesn't matter. All the same, I think this more than justifies the extra complexity I mentioned in #7066.
The raw numbers are:
```
with xadd read count
before write_cond fix
4.18 to 4.26 us/message
with write_cond fix
4.35 to 4.39 us/message
with exclusive read count
before write_cond fix
4.41 to 4.47 us/message
with write_cond fix
4.65 to 4.76 us/message
```
The code compiles and runs under windows now, but I couldn't look up any
symbol from the current executable (dlopen(NULL)), and calling looked
up external function handles doesn't seem to work correctly under windows.
The confusing mixture of byte index and character count meant that every
use of .substr was incorrect; replaced by slice_chars which only uses
character indices. The old behaviour of `.substr(start, n)` can be emulated
via `.slice_from(start).slice_chars(0, n)`.
Fix a laundry list of warnings involving unused imports that glutted
up compilation output. There are more, but there seems to be some
false positives (where 'remedy' appears to break the build), but this
particular set of fixes seems safe.