Optimize `AtomicBool` for target that don't support byte-sized atomics
`AtomicBool` is defined to have the same layout as `bool`, which means that we guarantee that it has a size of 1 byte. However on certain architectures such as RISC-V, LLVM will emulate byte atomics using a masked CAS loop on an aligned word.
We can take advantage of the fact that `bool` only ever has a value of 0 or 1 to replace `swap` operations with `and`/`or` operations that LLVM can lower to word-sized atomic `and`/`or` operations. This takes advantage of the fact that the incoming value to a `swap` or `compare_exchange` for `AtomicBool` is often a compile-time constant.
### Example
```rust
pub fn swap_true(atomic: &AtomicBool) -> bool {
atomic.swap(true, Ordering::Relaxed)
}
```
### Old
```asm
andi a1, a0, -4
slli a0, a0, 3
li a2, 255
sllw a2, a2, a0
li a3, 1
sllw a3, a3, a0
slli a3, a3, 32
srli a3, a3, 32
.LBB1_1:
lr.w a4, (a1)
mv a5, a3
xor a5, a5, a4
and a5, a5, a2
xor a5, a5, a4
sc.w a5, a5, (a1)
bnez a5, .LBB1_1
srlw a0, a4, a0
andi a0, a0, 255
snez a0, a0
ret
```
### New
```asm
andi a1, a0, -4
slli a0, a0, 3
li a2, 1
sllw a2, a2, a0
amoor.w a1, a2, (a1)
srlw a0, a1, a0
andi a0, a0, 255
snez a0, a0
ret
```