Some modest running-time improvements to `std::collections::BitSet`, measured on bit-sets of varying set-membership densities. This work originally comes from [alt_collections](https://github.com/rayglover/alt_collections). Benchmarks are copied below; each figure is the ratio named in the header line.
```
std::collections::BitSet / alt_collections::BitSet
copy_dense ... 3.08x
copy_sparse ... 4.22x
count_dense ... 11.01x
count_sparse ... 8.11x
from_bytes ... 1.47x
intersect_dense ... 6.54x
intersect_sparse ... 4.37x
union_dense ... 5.53x
union_sparse ... 5.60x
```
The exception is `from_bytes`, which I've left unaltered in this PR since the optimization is rather obscure.
Compiling with the CPU feature `popcnt` enabled gave a further ~10% improvement on my machine, but this wasn't factored into the benchmarks above.
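
For context, here is a minimal sketch of the kind of loop that benefits from `popcnt`. It is not the code in this PR: the helper name `count_set_bits` and the `&[u32]` block layout are assumptions made for illustration. `u32::count_ones` is the standard-library method; when the target feature is enabled (for example with `RUSTFLAGS="-C target-feature=+popcnt"`), the compiler can lower it to the hardware instruction instead of a software fallback.

```rust
/// Hypothetical helper (not part of this PR): counts the set bits in a
/// bit-set whose storage is assumed to be a slice of 32-bit blocks.
fn count_set_bits(blocks: &[u32]) -> usize {
    // Sum the per-block population counts, one whole word at a time,
    // rather than testing each bit individually.
    blocks.iter().map(|b| b.count_ones() as usize).sum()
}
```

Without the target feature the same code still compiles and gives the same result; only the generated instructions differ.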
Similar improvements could be made to `BitVec`, although that would probably require more substantial changes.
Criticism welcome!