This const generic implementation for certain lane sizes offers a more
limited interface than LLVM's shufflevector instruction, which normally
allows the length of U to differ from the length of T. The interface is
nonetheless designed so that its capabilities can be expanded in the
future.
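To show the difference in lane counts, here is a minimal scalar model using plain arrays. It is a sketch under stated assumptions, not the library's implementation. The function names `shufflevector_model` and `shuffle_same_len` are hypothetical. `N` stands in for the lane count of T and `M` for the length of U. The model also takes indices as ordinary runtime values, whereas LLVM requires a constant mask.

```rust
/// Hypothetical scalar model of LLVM `shufflevector` (not a real API).
/// The two N-lane inputs are treated as one 2*N-lane vector, and each
/// output lane selects one element from it by index. The output length M
/// is independent of N, which is what shufflevector permits.
fn shufflevector_model<T: Copy, const N: usize, const M: usize>(
    a: [T; N],
    b: [T; N],
    idx: [usize; M],
) -> [T; M] {
    std::array::from_fn(|i| {
        let j = idx[i];
        // Indices 0..N select from `a`; N..2*N select from `b`.
        assert!(j < 2 * N, "shuffle index {} out of range", j);
        if j < N { a[j] } else { b[j - N] }
    })
}

/// Hypothetical restricted form: the index array, and therefore the
/// output, must have the same length as the inputs.
fn shuffle_same_len<T: Copy, const N: usize>(a: [T; N], b: [T; N], idx: [usize; N]) -> [T; N] {
    shufflevector_model(a, b, idx)
}

fn main() {
    let a = [1, 2, 3, 4];
    let b = [5, 6, 7, 8];
    // Narrowing shuffle: 4-lane inputs produce a 2-lane output.
    let narrow = shufflevector_model(a, b, [0, 5]);
    // Same-length shuffle: interleave the low halves of `a` and `b`.
    let zipped = shuffle_same_len(a, b, [0, 4, 1, 5]);
    println!("{:?} {:?}", narrow, zipped); // prints "[1, 6] [1, 5, 2, 6]"
}
```

Only `shuffle_same_len` mirrors the restricted interface described above. Generalizing it would amount to letting the index length `M` vary independently of `N`, as `shufflevector_model` already does.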