rust/compiler
bors 1be5c8f909 Auto merge of #93432 - Kobzol:stable-hash-isize-hash-compression, r=the8472
Compress amount of hashed bytes for `isize` values in StableHasher

This is another attempt to land https://github.com/rust-lang/rust/pull/92103, this time hopefully with a correct implementation w.r.t. stable hashing guarantees. The previous PR was [reverted](https://github.com/rust-lang/rust/pull/93014) because it could produce the [same hash](https://github.com/rust-lang/rust/pull/92103#issuecomment-1014625442) for different values even in quite simple situations. I have since added a basic [test](https://github.com/rust-lang/rust/pull/93193) that should guard against that situation, and this PR adds a new test specific to this optimization.

## Why this optimization helps
Since the original PR, I have tried to analyze why this optimization helps at all (and why it especially helps for `clap`). I found that the vast majority of stable-hashed `i64` values actually come from hashing `isize` (which is converted to `i64` in the stable hasher). I found only a single place in the compiler where this datatype is used directly, `rustc_span::FileName::DocTest`, and that place also showed up in the traces that I used to find out when `isize` is being hashed. I suppose that `isize`s also come from other places, but they might not be so easy to find (there were some other entries in the trace). `clap` hashes about 8.5 million `isize`s, and all of them fit into a single byte, which is why this optimization has helped it [quite a lot](https://github.com/rust-lang/rust/pull/92103#issuecomment-1005711861).

Now, I'm not sure if special-casing `isize` is the correct solution here; maybe something could be done with that `isize` inside `DocTest` or in other places, but that's for another discussion. In this PR, instead of hardcoding a special case inside `SipHasher128`, I put it into `StableHasher` and only used it for `isize` (I tested that it doesn't help for `i64`, or at least not for `clap` and a few other benchmarks that I was testing).

## New approach
Since the most common case is a single byte, I added a fast path for hashing `isize` values that are non-negative and fit within a single byte, and a cold path for the rest of the values.

To avoid the previous correctness problem, we need to make sure that each unique `isize` value sends a unique hash stream to the hasher. By hash stream I mean the sequence of bytes that will be hashed (different sequences should produce different hashes, although that is of course not guaranteed).

We have to distinguish different values that produce the same bit pattern when we combine them. For example, if we simply skipped the leading zero bytes for values that fit within a single byte, `(0xFF, 0xFFFFFFFFFFFFFFFF)` and `(0xFFFFFFFFFFFFFFFF, 0xFF)` would both send nine `0xFF` bytes to the hasher, which must not happen.
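The collision described above can be reproduced with a small sketch. The function names are hypothetical, a `Vec<u8>` stands in for the hasher's input, and little-endian byte order for the 8-byte case is an assumption; this is not the code from the reverted PR, only an illustration of the naive "skip the leading zero bytes" idea.

```rust
/// Hypothetical naive encoding: a value that fits in one byte is written as
/// that single byte, anything larger as 8 little-endian bytes.
fn naive_encode(value: u64, stream: &mut Vec<u8>) {
    if value <= 0xFF {
        stream.push(value as u8);
    } else {
        stream.extend_from_slice(&value.to_le_bytes());
    }
}

/// Collects the bytes that a sequence of values would feed to the hasher.
fn naive_stream(values: &[u64]) -> Vec<u8> {
    let mut stream = Vec::new();
    for &v in values {
        naive_encode(v, &mut stream);
    }
    stream
}

/// Returns true if the two orderings from the example send identical bytes.
fn naive_orderings_collide() -> bool {
    let a = 0xFF_u64;
    let b = 0xFFFF_FFFF_FFFF_FFFF_u64;
    // (a, b) is 1 + 8 bytes of 0xFF; (b, a) is 8 + 1 bytes of 0xFF.
    naive_stream(&[a, b]) == naive_stream(&[b, a])
}
```

Calling `naive_orderings_collide()` returns `true`: both orderings flatten to the same nine `0xFF` bytes, so no hasher could tell them apart.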

To avoid this situation, values `[0, 0xFE]` are hashed as a single byte. When we hash a larger value (treating `isize` as `u64`), we first hash an additional byte `0xFF`. Since `0xFF` cannot occur when we apply the single byte optimization, we guarantee that the hash streams will be unique when hashing two values `(a, b)` and `(b, a)` if `a != b`:
1) When both `a` and `b` are within `[0, 0xFE]`, their hash streams will be different, because each stream consists of the two values themselves in a different order.
2) When neither `a` nor `b` is within `[0, 0xFE]`, their hash streams will be different, because both values are hashed in full (after the `0xFF` marker) in a different order.
3) When `a` is within `[0, 0xFE]` and `b` isn't, when we hash `(a, b)`, the hash stream will definitely not begin with `0xFF`. When we hash `(b, a)`, the hash stream will definitely begin with `0xFF`. Therefore the hash streams will be different.
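The scheme and its three cases can be sketched as follows. This is a minimal illustration, not the actual `StableHasher` code: the names `encode_isize`, `encode_large`, and `encode_pair` are hypothetical, a `Vec<u8>` stands in for the hasher state, and little-endian byte order for the full value is an assumption.

```rust
/// Cold path (hypothetical name): write the reserved 0xFF marker, then the
/// full 8-byte value.
#[cold]
#[inline(never)]
fn encode_large(value: u64, stream: &mut Vec<u8>) {
    stream.push(0xFF);
    stream.extend_from_slice(&value.to_le_bytes());
}

/// Fast path: values in [0, 0xFE] are written as a single byte.
fn encode_isize(i: isize, stream: &mut Vec<u8>) {
    // Sign-extend to 64 bits, then reinterpret as unsigned, so negative
    // values become large values and always take the cold path.
    let value = i as i64 as u64;
    if value < 0xFF {
        stream.push(value as u8);
    } else {
        encode_large(value, stream);
    }
}

/// Collects the bytes that hashing `(a, b)` would feed to the hasher.
fn encode_pair(a: isize, b: isize) -> Vec<u8> {
    let mut stream = Vec::new();
    encode_isize(a, &mut stream);
    encode_isize(b, &mut stream);
    stream
}
```

With this sketch, `encode_pair(1, 2)` is the two bytes `[1, 2]`, `encode_pair(1, 300)` starts with `1` while `encode_pair(300, 1)` starts with the `0xFF` marker, and the problematic pair from the earlier example, `encode_pair(0xFF, -1)` versus `encode_pair(-1, 0xFF)`, now yields two different 18-byte streams.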

r? `@the8472`
2022-02-03 01:08:45 +00:00
rustc remove unused jemallocator crate 2022-01-28 16:56:05 +01:00
rustc_apfloat
rustc_arena Remove unused dep from rustc_arena 2022-02-02 17:37:14 +01:00
rustc_ast Make Decodable and Decoder infallible. 2022-01-22 10:38:31 +11:00
rustc_ast_lowering More let_else adoptions 2022-02-02 17:11:01 +01:00
rustc_ast_passes add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_ast_pretty Allow any line to have at least 60 chars 2022-01-31 10:56:57 -08:00
rustc_attr More let_else adoptions 2022-02-02 17:11:01 +01:00
rustc_borrowck Rollup merge of #93590 - est31:let_else, r=lcnr 2022-02-02 19:34:07 +01:00
rustc_builtin_macros add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_codegen_cranelift Use an indexmap to avoid sorting LocalDefIds 2022-01-22 22:34:16 -06:00
rustc_codegen_gcc Merge landing_pad and set_cleanup into cleanup_landing_pad 2022-01-24 14:10:05 +01:00
rustc_codegen_llvm Auto merge of #93154 - michaelwoerister:fix-generic-closure-and-generator-debuginfo, r=wesleywiser 2022-02-02 12:37:28 +00:00
rustc_codegen_ssa Auto merge of #93154 - michaelwoerister:fix-generic-closure-and-generator-debuginfo, r=wesleywiser 2022-02-02 12:37:28 +00:00
rustc_const_eval Rollup merge of #93546 - tmiasko:validate-switch-int, r=oli-obk 2022-02-02 19:34:04 +01:00
rustc_data_structures Auto merge of #93432 - Kobzol:stable-hash-isize-hash-compression, r=the8472 2022-02-03 01:08:45 +00:00
rustc_driver add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_error_codes Rollup merge of #88205 - danii:e0772, r=GuillaumeGomez 2022-01-29 14:46:29 +01:00
rustc_errors add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_expand add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_feature add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_fs_util
rustc_graphviz
rustc_hir Auto merge of #93285 - JulianKnodt:const_eq_2, r=oli-obk 2022-02-01 23:18:01 +00:00
rustc_hir_pretty try apply rustc_pass_by_value to Span 2022-01-27 11:29:41 +01:00
rustc_incremental add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_index implement lint for suspicious auto trait impls 2022-02-01 09:55:19 +01:00
rustc_infer Auto merge of #93285 - JulianKnodt:const_eq_2, r=oli-obk 2022-02-01 23:18:01 +00:00
rustc_interface Auto merge of #93466 - cjgillot:query-dead, r=nagisa 2022-02-02 02:29:32 +00:00
rustc_lexer
rustc_lint Rollup merge of #93290 - lcnr:same_type, r=jackh726 2022-02-01 16:08:05 +01:00
rustc_lint_defs implement lint for suspicious auto trait impls 2022-02-01 09:55:19 +01:00
rustc_llvm Use error-on-mismatch policy for PAuth module flags. 2022-01-24 16:50:10 +00:00
rustc_log
rustc_macros Make Decodable and Decoder infallible. 2022-01-22 10:38:31 +11:00
rustc_metadata add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_middle Auto merge of #93312 - pierwill:map-all-local-trait-impls, r=cjgillot 2022-02-02 15:36:12 +00:00
rustc_mir_build More let_else adoptions 2022-02-02 17:11:01 +01:00
rustc_mir_dataflow
rustc_mir_transform Rollup merge of #93290 - lcnr:same_type, r=jackh726 2022-02-01 16:08:05 +01:00
rustc_monomorphize add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_parse better suggestion for duplicated where 2022-02-02 00:29:45 -08:00
rustc_parse_format
rustc_passes Auto merge of #93466 - cjgillot:query-dead, r=nagisa 2022-02-02 02:29:32 +00:00
rustc_plugin_impl
rustc_privacy add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_query_impl add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_query_system add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_resolve Auto merge of #93312 - pierwill:map-all-local-trait-impls, r=cjgillot 2022-02-02 15:36:12 +00:00
rustc_save_analysis More let_else adoptions 2022-02-02 17:11:01 +01:00
rustc_serialize Remove two unnecessary transmutes from opaque Encoder and Decoder 2022-01-31 18:25:05 +01:00
rustc_session Add missing | between print options 2022-02-01 12:40:01 -08:00
rustc_span add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_symbol_mangling add a rustc::query_stability lint 2022-02-01 10:15:59 +01:00
rustc_target Rollup merge of #92021 - woodenarrow:br_single_fp_element, r=Mark-Simulacrum 2022-02-01 16:08:03 +01:00
rustc_trait_selection Auto merge of #93285 - JulianKnodt:const_eq_2, r=oli-obk 2022-02-01 23:18:01 +00:00
rustc_traits Remove generalization over projection 2022-01-28 00:25:36 +00:00
rustc_ty_utils remove TyS::same_type 2022-02-01 11:21:26 +01:00
rustc_type_ir
rustc_typeck don't suggest adding let due to expressions inside of while loop 2022-02-01 23:27:04 -08:00