This improves typeck performance by 5% (LLVM times are still huge). Basically fixes #25916 (it is still O(n^2), but the example now takes <1s to compile).