This uses a (per-trait) hash-table to separate impls from different TraitDefs, and makes coherence go so much quicker. I will post performance numbers tomorrow.
This is still WIP, as when there's an overlap error, impls can get printed in the wrong order, which causes a few issues. Should I pick the local impl with the smallest NodeId to print?
Could you take a look at this @nikomatsakis?