This significantly improves performance. For example, on the simple-raytracer benchmark, the improvement over LLVM goes from 13% to 39%.