Optimizing Stacked Borrows (part 1?): Cache locations of Tags in a Borrow Stack
Before this PR, a profile of Miri under almost any workload points quite squarely at these regions of code as being incredibly hot (each being ~40% of cycles):
dadcbebfbd/src/stacked_borrows.rs (L259-L269)
dadcbebfbd/src/stacked_borrows.rs (L362-L369)
These loops are one of at least three reasons that Stacked Borrows analysis is super-linear: both are linear in the number of borrows in the stack, and both sit on the most commonly taken paths.
I'm addressing the first loop (which is in `Stack::find_granting`) by adding a very simple LRU-style cache, implemented on a `VecDeque`, which maps recently looked-up tags to their position in the stack. For `Untagged` accesses we fall back to the same sort of linear search, but as far as I can tell there are never enough `Untagged` items for that to be significant.
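To make the idea concrete, here is a minimal sketch of such a cache. It is not the code from this PR: the `Tag` alias, the `CACHE_LEN` value, and the `find` and `truncate` methods are hypothetical stand-ins, and the real `Stack::find_granting` also has to check permissions.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for Miri's tag type.
pub type Tag = u64;

/// Assumed cache size, chosen only for illustration.
const CACHE_LEN: usize = 8;

pub struct Stack {
    /// Tags of the items in the borrow stack, bottom first.
    tags: Vec<Tag>,
    /// Recently looked-up tags and their positions, most recent first.
    cache: VecDeque<(Tag, usize)>,
}

impl Stack {
    pub fn new(tags: Vec<Tag>) -> Self {
        Stack { tags, cache: VecDeque::with_capacity(CACHE_LEN) }
    }

    /// Returns the position of `tag` in the stack, consulting the cache first.
    pub fn find(&mut self, tag: Tag) -> Option<usize> {
        // Fast path: the tag was looked up recently.
        if let Some(i) = self.cache.iter().position(|&(t, _)| t == tag) {
            // Move the hit to the front so hot tags stay cheap to find.
            let entry = self.cache.remove(i).unwrap();
            self.cache.push_front(entry);
            return Some(entry.1);
        }
        // Slow path: linear search from the top of the stack.
        let pos = self.tags.iter().rposition(|&t| t == tag)?;
        if self.cache.len() == CACHE_LEN {
            // Evict the least recently used entry.
            self.cache.pop_back();
        }
        self.cache.push_front((tag, pos));
        Some(pos)
    }

    /// Removes all items at index `len` and above, and drops cache entries
    /// that pointed at them so no stale position is ever returned.
    pub fn truncate(&mut self, len: usize) {
        self.tags.truncate(len);
        self.cache.retain(|&(_, pos)| pos < len);
    }
}
```

The important design point is invalidation: any operation that removes or moves stack items has to update or drop the affected cache entries, otherwise a hit could return a stale position.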
I'm addressing the second loop by keeping track of the region of the stack where there could be items granting `Permission::Unique`. This optimization is incredibly effective because `Read` accesses tend to dominate, and many trips through this code path now skip the loop entirely.
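A rough sketch of the range tracking, under simplifying assumptions: the stack holds only permissions, the `unique_range` field and the `push` and `disable_uniques_from` methods are hypothetical names, and the range is a conservative over-approximation rather than an exact bound.

```rust
use std::ops::Range;

/// Simplified stand-in for Miri's permission type.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Permission {
    Unique,
    SharedReadWrite,
    SharedReadOnly,
    Disabled,
}

pub struct Stack {
    pub items: Vec<Permission>,
    /// Every `Unique` item lies inside this index range.
    /// The range may be larger than necessary, never smaller.
    pub unique_range: Range<usize>,
}

impl Stack {
    pub fn new() -> Self {
        Stack { items: Vec::new(), unique_range: 0..0 }
    }

    pub fn push(&mut self, perm: Permission) {
        if perm == Permission::Unique {
            let idx = self.items.len();
            if self.unique_range.is_empty() {
                self.unique_range = idx..idx + 1;
            } else {
                self.unique_range.end = idx + 1;
            }
        }
        self.items.push(perm);
    }

    /// Disables every `Unique` item at index `first` or above.
    /// Returns how many items were scanned, to show the work saved.
    pub fn disable_uniques_from(&mut self, first: usize) -> usize {
        // Only the overlap of the request and the tracked range needs a scan.
        let start = first.max(self.unique_range.start);
        let end = self.unique_range.end.min(self.items.len());
        let mut scanned = 0;
        if start < end {
            for item in &mut self.items[start..end] {
                scanned += 1;
                if *item == Permission::Unique {
                    *item = Permission::Disabled;
                }
            }
        }
        // No `Unique` item remains at or above `first`; shrink the range.
        if first <= self.unique_range.start {
            self.unique_range = 0..0;
        } else {
            self.unique_range.end = self.unique_range.end.min(first);
        }
        scanned
    }
}
```

When the requested region lies entirely above the tracked range, which is the common case for reads through a recently created shared reference, the loop body never runs.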
These optimizations result in pretty enormous improvements:
Without raw pointer tagging, `mse` 34.5s -> 2.4s, `serde1` 5.6s -> 3.6s
With raw pointer tagging, `mse` 35.3s -> 2.4s, `serde1` 5.7s -> 3.6s
And there is hardly any impact on memory usage:
Memory usage on `mse` 844 MB -> 848 MB, `serde1` 184 MB -> 184 MB (jitter on these is a few MB).
Support (stat/fstat/lstat)64 on macos
"In order to accommodate advanced capabilities of newer file systems,
the struct stat, struct statfs, and struct dirent data structures
were updated in Mac OSX 10.5."
"TRANSITIONAL DESCRIPTION (NOW DEPRECATED)
The fstat64, lstat64 and stat64 routines are equivalent to their
corresponding non-64-suffixed routine, when 64-bit inodes are in
effect. They were added before there was support for the symbol
variants, and so are now deprecated. Instead of using these, set
the _DARWIN_USE_64_BIT_INODE macro before including header files to
force 64-bit inode support.
The stat64 structure used by these deprecated routines is the same
as the stat structure when 64-bit inodes are in effect (see above)."
"HISTORY
An lstat() function call appeared in 4.2BSD. The stat64(),
fstat64(), and lstat64() system calls first appeared in Mac OS X
10.5 (Leopard) and are now deprecated in favor of the corresponding
symbol variants. The fstatat() system call appeared in OS X 10.10"
This adds a very simple LRU-like cache which stores the locations of
often-used tags. While the implementation is very simple, the cache hit
rate is incredible at ~99.9% on most programs, and often the element at
position 0 in the cache has a hit rate of 90%. So the sub-optimality of
this cache basically vanishes into the noise in a profile.
Additionally, we keep a range which denotes where there might be an item
granting Unique permission in the stack, so that when we invalidate
Uniques we do not need to scan much of the stack, and often scan nothing
at all.
Enable permissive provenance by default
This completes the plan laid out in https://github.com/rust-lang/miri/issues/2133:
- We use permissive provenance with wildcard pointers by default.
- We print a warning on int2ptr casts. `-Zmiri-permissive-provenance` suppresses the warning; `-Zmiri-strict-provenance` turns it into a hard error.
- Raw pointer tagging is now always enabled, so we remove the `-Zmiri-tag-raw-pointers` flag and the code for untagged pointers. (Passing the flag still works, for compatibility -- but we just ignore it, with a warning.)
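For reference, this is the kind of code the new default affects. The `roundtrip` function is a made-up example, not taken from the Miri test suite: it casts a pointer to an integer and back, and the second cast is an int2ptr cast of the sort described above.

```rust
/// Reads `*x` through a pointer that went through an integer.
pub fn roundtrip(x: &i32) -> i32 {
    let ptr = x as *const i32;
    // ptr2int: the pointer is turned into a plain address.
    let addr = ptr as usize;
    // int2ptr: the cast that triggers the warning by default,
    // and a hard error under `-Zmiri-strict-provenance`.
    let ptr2 = addr as *const i32;
    // SAFETY: `ptr2` has the same address as `x`, which is still live.
    unsafe { *ptr2 }
}
```

Passing `-Zmiri-permissive-provenance` should accept this code without the warning, per the list above.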
We also fix an intptrcast issue:
- Only live allocations are considered when computing the AllocId from an address.
So, finally, Miri has a good story for ptr2int2ptr roundtrips *and* no weird false negatives when doing raw pointer stuff with Stacked Borrows. :-) 🎉 Thanks a lot to everyone who helped with this, in particular `@carbotaniuman` who convinced me this is even possible.
Fixes https://github.com/rust-lang/miri/issues/2133
Fixes https://github.com/rust-lang/miri/issues/1866
Fixes https://github.com/rust-lang/miri/issues/1993