Turn quadratic time on number of impl blocks into linear time
Previously, if you had a lot of inherent impl blocks on a type like:
```Rust
struct Foo;
impl Foo { fn foo_1() {} }
// ...
impl Foo { fn foo_100_000() {} }
```
The compiler would be very slow at processing it, because
an internal algorithm would run in O(n^2), where n is the number
of impl blocks. Now, we add a new algorithm that allocates but
is faster asymptotically.
Comparing rustc nightly with a local build of rustc as of this PR (results in seconds):
| N | real time before | real time after |
| - | - | - |
| 4_000 | 0.57 | 0.46 |
| 8_000 | 1.31 | 0.84 |
| 16_000 | 3.56 | 1.69 |
| 32_000 | 10.60 | 3.73 |
I've tuned up the numbers to make the effect larger than the startup noise of rustc, but the asymptotic difference should hold for smaller n as well.
Note: current state of the PR omits error messages if there are other errors present already. For now, I'm mainly interested in a perf run to study whether this issue is present at all. Please queue one for this PR. Thanks!