Optimizing Stacked Borrows (part 1?): Cache locations of Tags in a Borrow Stack
Before this PR, a profile of Miri under almost any workload points quite squarely at these regions of code as being incredibly hot (each being ~40% of cycles):
dadcbebfbd/src/stacked_borrows.rs (L259-L269)dadcbebfbd/src/stacked_borrows.rs (L362-L369)
This code is one of at least three reasons that stacked borrows analysis is super-linear: These are both linear in the number of borrows in the stack and they are positioned along the most commonly-taken paths.
I'm addressing the first loop (which is in `Stack::find_granting`) by adding a very very simple sort of LRU cache implemented on a `VecDeque`, which maps recently-looked-up tags to their position in the stack. For `Untagged` access we fall back to the same sort of linear search. But as far as I can tell there are never enough `Untagged` items to be significant.
I'm addressing the second loop by keeping track of the region of stack where there could be items granting `Permission::Unique`. This optimization is incredibly effective because `Read` access tends to dominate and many trips through this code path now skip the loop entirely.
These optimizations result in pretty enormous improvements:
Without raw pointer tagging, `mse` 34.5s -> 2.4s, `serde1` 5.6s -> 3.6s
With raw pointer tagging, `mse` 35.3s -> 2.4s, `serde1` 5.7s -> 3.6s
And there is hardly any impact on memory usage:
Memory usage on `mse` 844 MB -> 848 MB, `serde1` 184 MB -> 184 MB (jitter on these is a few MB).
move arc_drop test to miri-test-libstd
That's where we have a bunch of slow and unlikely-to-regress tests already. Miri's test suite should primarily test Miri; we sometimes add libstd tests when they are cheap and/or exercise interesting code paths in Miri. https://github.com/rust-lang/miri/pull/2248 already ensures that we allow code like `Arc` with more direct tests.
./miri many-seeds: also print the seed before we try it
When using `cargo miri`, we otherwise have no way of even seeing which seed it is currently on.
Extend `environ` linux extern implementation to freebsd
This fixes the `env` test on freebsd, and enables the CI test
Signed-off-by: InfRandomness <infrandomness@gmail.com>
Support (stat/fstat/lstat)64 on macos
"In order to accommodate advanced capabilities of newer file systems,
the struct stat, struct statfs, and struct dirent data structures
were updated in Mac OSX 10.5."
"TRANSITIONAL DESCRIPTION (NOW DEPRECATED)
The fstat64, lstat64 and stat64 routines are equivalent to their
corresponding non-64-suffixed routine, when 64-bit inodes are in
effect. They were added before there was support for the symbol
variants, and so are now deprecated. Instead of using these, set
the _DARWIN_USE_64_BIT_INODE macro before including header files to
force 64-bit inode support. The stat64 structure used by these deprecated routines is the same
as the stat structure when 64-bit inodes are in effect (see above)."
"HISTORY
An lstat() function call appeared in 4.2BSD. The stat64(),
fstat64(), and lstat64() system calls first appeared in Mac OS X
10.5 (Leopard) and are now deprecated in favor of the corresponding
symbol variants. The fstatat() system call appeared in OS X 10.10"
"In order to accommodate advanced capabilities of newer file systems,
the struct stat, struct statfs, and struct dirent data structures
were updated in Mac OSX 10.5."
"TRANSITIONAL DESCRIPTION (NOW DEPRECATED)
The fstat64, lstat64 and stat64 routines are equivalent to their
corresponding non-64-suffixed routine, when 64-bit inodes are in
effect. They were added before there was support for the symbol
variants, and so are now deprecated. Instead of using these, set
the _DARWIN_USE_64_BIT_INODE macro before including header files to
force 64-bit inode support.
The stat64 structure used by these deprecated routines is the same
as the stat structure when 64-bit inodes are in effect (see above)."
"HISTORY
An lstat() function call appeared in 4.2BSD. The stat64(),
fstat64(), and lstat64() system calls first appeared in Mac OS X
10.5 (Leopard) and are now deprecated in favor of the corresponding
symbol variants. The fstatat() system call appeared in OS X 10.10"
avoid copying thread manager state in data race detector
When doing https://github.com/rust-lang/miri/pull/2047 I did not realize that there is some redundant state here that we can now remove from the data race detector.
Also this removes the vector clocks from the data race errors since those don't really help diagnose the problem.
This adds a very simple LRU-like cache which stores the locations of
often-used tags. While the implementation is very simple, the cache hit
rate is incredible at ~99.9% on most programs, and often the element at
position 0 in the cache has a hit rate of 90%. So the sub-optimality of
this cache basicaly vanishes into the noise in a profile.
Additionally, we keep a range which denotes where there might be an item
granting Unique permission in the stack, so that when we invalidate
Uniques we do not need to scan much of the stack, and often scan nothing
at all.