r? @brson
links to issues: #7065 the race that's fixed; #7066 the perf improvement I added. There are also some minor cleanup commits here.
To measure the performance improvement from replacing the exclusive with an atomic uint, I edited the ```msgsend-ring-rw-arcs``` bench test to do a ```write_downgrade``` instead of just a ```write```, so that it stressed the code paths that accessed ```read_count```. (At first I was still using ```write``` and saw no performance difference whatsoever, whoooops.)
The bench test measures how long it takes to send 1,000,000 messages by using rwarcs to emulate pipes. I also measured the performance difference imposed by the fix to the ```access_lock``` race (which involves taking an extra semaphore in the ```cond.wait()``` path). The net result is that fixing the race imposes a 4% to 5% slowdown, but doing the atomic uint optimization gives a 6% to 8% speedup.
Note that this speedup will be most visible in read- or downgrade-heavy workloads. If an RWARC's only users are writers, the optimization doesn't matter. All the same, I think this more than justifies the extra complexity I mentioned in #7066.
The raw numbers are:
```
with xadd read count
before write_cond fix
4.18 to 4.26 us/message
with write_cond fix
4.35 to 4.39 us/message
with exclusive read count
before write_cond fix
4.41 to 4.47 us/message
with write_cond fix
4.65 to 4.76 us/message
```