It was not obvious how to make this implementation work when the unit size is not also a power of two. For now, just make the buffer size a multiple of the unit size so that it passes all the tests.
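
As a rough illustration of the workaround, the sketch below rounds a requested buffer size up to the next multiple of the unit size. The helper name `round_up_to_multiple` and its parameters are hypothetical and do not come from this change. Rounding up (rather than down) is also an assumption, as is returning 0 to signal an invalid unit size or overflow.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the actual change: round `requested`
 * up to the nearest multiple of `unit`, so the buffer always holds a whole
 * number of units even when `unit` is not a power of two.
 * Returns 0 if `unit` is 0 or if the rounded size would overflow size_t. */
static size_t round_up_to_multiple(size_t requested, size_t unit)
{
    if (unit == 0)
        return 0;                       /* assumption: 0 signals an error */

    size_t rem = requested % unit;
    if (rem == 0)
        return requested;               /* already a whole number of units */

    size_t pad = unit - rem;            /* bytes needed to reach the next multiple */
    if (requested > (size_t)-1 - pad)
        return 0;                       /* rounding up would overflow */

    return requested + pad;
}
```

With this approach the modulo operation stands in for the bit-masking that only works when both sizes are powers of two. For example, a requested size of 1024 with a unit size of 24 would become 1032.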