rustup; ptr atomics
Adds support for the operations added in https://github.com/rust-lang/rust/pull/96935.
I made the pointer binops always return the provenance of the *left* argument; `@thomcc`, I hope that is what you intended. Honestly, I have no idea whether this matches what LLVM does...
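The "left provenance wins" rule can be illustrated with a toy pointer model. This is a minimal sketch under stated assumptions: `AllocId`, `ToyPtr`, `BinOp` and `ptr_binop` are hypothetical names invented for illustration, not Miri's types, and a pointer is reduced to an optional allocation ID plus an integer address. The example shows the typical use case of setting and clearing a tag bit in an aligned pointer.

```rust
// Hypothetical toy model (NOT Miri's representation): a pointer is an
// optional provenance (which allocation it may access) plus an address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AllocId(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyPtr {
    provenance: Option<AllocId>,
    addr: usize,
}

#[derive(Clone, Copy, Debug)]
enum BinOp {
    Or,
    And,
    Xor,
}

// Combine the address bits, then keep the provenance of the LEFT operand.
// The right operand contributes only its address bits.
fn ptr_binop(op: BinOp, left: ToyPtr, right: ToyPtr) -> ToyPtr {
    let addr = match op {
        BinOp::Or => left.addr | right.addr,
        BinOp::And => left.addr & right.addr,
        BinOp::Xor => left.addr ^ right.addr,
    };
    ToyPtr { provenance: left.provenance, addr }
}

fn main() {
    let base = ToyPtr { provenance: Some(AllocId(1)), addr: 0x1000 };
    let low_bit = ToyPtr { provenance: Some(AllocId(2)), addr: 0b1 };

    // Set a tag bit: the result still points into allocation 1.
    let tagged = ptr_binop(BinOp::Or, base, low_bit);
    assert_eq!(tagged, ToyPtr { provenance: Some(AllocId(1)), addr: 0x1001 });

    // Clear it again with a provenance-free mask.
    let mask = ToyPtr { provenance: None, addr: !0b1 };
    let cleared = ptr_binop(BinOp::And, tagged, mask);
    assert_eq!(cleared, base);

    // Toggle the bit back on.
    let toggled = ptr_binop(BinOp::Xor, cleared, low_bit);
    assert_eq!(toggled, tagged);
    println!("{:?}", toggled);
}
```

Keeping the left operand's provenance fits the common pattern where the left side is "the pointer" and the right side is a plain bit mask; whether it matches LLVM's semantics is exactly the open question above.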
I also simplified our pointer comparison code while I was at it -- now that *all* comparison operators support wide pointers, we can unify those branches.
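A rough sketch of why supporting wide pointers in every operator lets the branches collapse: if a wide pointer is ordered lexicographically (address first, then metadata such as a slice length), then every comparison operator reduces to a single `Ordering` check. The lexicographic ordering is an assumption here, and `WidePtr`, `CmpOp` and `compare` are hypothetical names, not Miri's code.

```rust
use std::cmp::Ordering;

// Hypothetical model of a wide raw pointer: data address plus metadata
// (e.g. a slice length). A thin pointer could be modeled with meta = 0.
#[derive(Clone, Copy, Debug)]
struct WidePtr {
    addr: usize,
    meta: usize,
}

#[derive(Clone, Copy, Debug)]
enum CmpOp {
    Eq,
    Ne,
    Lt,
    Le,
    Gt,
    Ge,
}

// Assumption: compare by address first, then by metadata. One ordering
// computation serves all six operators, so no per-operator branch is needed.
fn compare(op: CmpOp, l: WidePtr, r: WidePtr) -> bool {
    let ord: Ordering = (l.addr, l.meta).cmp(&(r.addr, r.meta));
    match op {
        CmpOp::Eq => ord.is_eq(),
        CmpOp::Ne => ord.is_ne(),
        CmpOp::Lt => ord.is_lt(),
        CmpOp::Le => ord.is_le(),
        CmpOp::Gt => ord.is_gt(),
        CmpOp::Ge => ord.is_ge(),
    }
}

fn main() {
    let short = WidePtr { addr: 0x1000, meta: 2 };
    let long = WidePtr { addr: 0x1000, meta: 3 };
    let later = WidePtr { addr: 0x2000, meta: 0 };

    // Same address, different length: the pointers are NOT equal.
    assert!(compare(CmpOp::Ne, short, long));
    assert!(compare(CmpOp::Lt, short, long));
    // A higher address wins regardless of metadata.
    assert!(compare(CmpOp::Ge, later, long));

    for op in [CmpOp::Eq, CmpOp::Le, CmpOp::Gt] {
        println!("{:?}: {}", op, compare(op, short, short));
    }
}
```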
fix comparing wide raw pointers
Fixes https://github.com/rust-lang/rust/issues/96169
However, I am not sure whether these are the correct semantics. I'll wait for confirmation in that issue.
handle Box with allocators
This is the Miri side of https://github.com/rust-lang/rust/pull/98847.
Thanks `@DrMeepster` for doing most of the work of getting this test case to pass in Miri. :)
add command to run our benchmarks
This is quite ad-hoc but better than nothing IMO.
I have also deleted the old `benches` folder. Some of these tests have UB 😂, and the rest don't seem very useful for benchmarking the things that are slow about Miri today.
Cc `@saethlin`
remove ancient tex files
These are the sources of `@solson's` original report, I think. They will remain available in the git history, but I don't think there is much point in still carrying them around on master. The readme links to their rendered PDFs:
- https://solson.me/miri-slides.pdf
- https://solson.me/miri-report.pdf
Optimizing Stacked Borrows (part 1?): Cache locations of Tags in a Borrow Stack
Before this PR, a profile of Miri under almost any workload points squarely at these two regions of code as incredibly hot (each accounting for ~40% of cycles):
- dadcbebfbd/src/stacked_borrows.rs (L259-L269)
- dadcbebfbd/src/stacked_borrows.rs (L362-L369)
These two loops are among at least three reasons that Stacked Borrows analysis is super-linear: both are linear in the number of borrows in the stack, and both sit on the most commonly taken paths.
I'm addressing the first loop (in `Stack::find_granting`) by adding a very simple LRU cache, implemented on a `VecDeque`, which maps recently looked-up tags to their position in the stack. For `Untagged` accesses we fall back to the same linear search, but as far as I can tell there are never enough `Untagged` items for this to matter.
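A minimal sketch of a tag-to-index LRU cache on a `VecDeque`, for readers who want to see the shape of the idea. This is not the PR's code: `Tag`, `Stack`, `CACHE_LEN` and the method names are hypothetical, permissions and `Untagged` are ignored, and the cache size of 4 is arbitrary. Only `push` and `truncate` are modeled; a real stack that inserts items mid-stack would also have to shift or drop cached indices.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for a borrow tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Tag(u64);

const CACHE_LEN: usize = 4; // arbitrary size for this sketch

struct Stack {
    borrows: Vec<Tag>,
    // (tag, index in `borrows`) pairs, most recently used at the front.
    cache: VecDeque<(Tag, usize)>,
}

impl Stack {
    fn new() -> Self {
        Stack { borrows: Vec::new(), cache: VecDeque::with_capacity(CACHE_LEN) }
    }

    fn push(&mut self, tag: Tag) {
        // If this tag already occurs lower in the stack, a cached entry would
        // now point at a stale (non-topmost) index, so drop it.
        self.cache.retain(|&(t, _)| t != tag);
        self.borrows.push(tag);
    }

    // Find the topmost index of `tag`.
    fn find(&mut self, tag: Tag) -> Option<usize> {
        // Fast path: cache hit; move the entry to the front (most recent).
        if let Some(pos) = self.cache.iter().position(|&(t, _)| t == tag) {
            let entry = self.cache.remove(pos)?;
            self.cache.push_front(entry);
            return Some(entry.1);
        }
        // Slow path: linear search from the top, then remember the result,
        // evicting the least recently used entry if the cache is full.
        let idx = self.borrows.iter().rposition(|&t| t == tag)?;
        if self.cache.len() == CACHE_LEN {
            self.cache.pop_back();
        }
        self.cache.push_front((tag, idx));
        Some(idx)
    }

    fn truncate(&mut self, len: usize) {
        self.borrows.truncate(len);
        // Popped items invalidate any cached index >= len.
        self.cache.retain(|&(_, idx)| idx < len);
    }
}

fn main() {
    let mut stack = Stack::new();
    for i in 1..=10 {
        stack.push(Tag(i));
    }
    assert_eq!(stack.find(Tag(3)), Some(2)); // slow path, result cached
    assert_eq!(stack.find(Tag(3)), Some(2)); // served from the cache
    stack.truncate(2);
    assert_eq!(stack.find(Tag(3)), None);
    stack.push(Tag(1));
    assert_eq!(stack.find(Tag(1)), Some(2));
    println!("cache holds {} entries", stack.cache.len());
}
```

The point of the design is that a hit costs a scan of at most `CACHE_LEN` entries instead of the whole borrow stack, which pays off when a few tags are looked up over and over.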
I'm addressing the second loop by keeping track of the region of the stack that could contain items granting `Permission::Unique`. This optimization is incredibly effective because `Read` accesses tend to dominate, and many trips through this code path now skip the loop entirely.
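To illustrate the second optimization, here is a simplified sketch. It assumes that on a read access, `Unique` items above the granting item are disabled. It also assumes the stack tracks a (possibly over-approximated) index range that contains every `Unique` item, so a read only scans the overlap of that range with the items above the granting one. `Stack`, `unique_range` and `read_access` are hypothetical names; the `Permission` variants mirror the ones Stacked Borrows uses, but everything else is invented for the example.

```rust
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Permission {
    Unique,
    SharedReadWrite,
    SharedReadOnly,
    Disabled,
}

struct Stack {
    items: Vec<Permission>,
    // Invariant for this sketch: every Unique item lies inside this range.
    // It may over-approximate, but it never misses a Unique item.
    unique_range: Range<usize>,
}

impl Stack {
    fn new() -> Self {
        Stack { items: Vec::new(), unique_range: 0..0 }
    }

    fn push(&mut self, perm: Permission) {
        let idx = self.items.len();
        self.items.push(perm);
        if perm == Permission::Unique {
            if self.unique_range.is_empty() {
                self.unique_range = idx..idx + 1;
            } else {
                self.unique_range.end = idx + 1;
            }
        }
    }

    // Read granted by the item at `granting`: disable Unique items above it.
    // Returns how many items were inspected, to make the saving visible.
    fn read_access(&mut self, granting: usize) -> usize {
        let start = self.unique_range.start.max(granting + 1);
        let end = self.unique_range.end;
        if start >= end {
            return 0; // no possible Unique item above `granting`: skip the loop
        }
        for idx in start..end {
            if self.items[idx] == Permission::Unique {
                self.items[idx] = Permission::Disabled;
            }
        }
        // Nothing at or above `start` is Unique any more, so shrink the range.
        self.unique_range.end = start;
        end - start
    }
}

fn main() {
    let mut s = Stack::new();
    s.push(Permission::SharedReadWrite); // 0
    s.push(Permission::Unique); // 1
    s.push(Permission::SharedReadOnly); // 2
    s.push(Permission::Unique); // 3
    s.push(Permission::SharedReadOnly); // 4

    assert_eq!(s.read_access(2), 1); // only index 3 is scanned
    assert_eq!(s.items[3], Permission::Disabled);
    assert_eq!(s.read_access(4), 0); // loop skipped entirely
    assert_eq!(s.read_access(0), 2); // indices 1 and 2 scanned
    assert_eq!(s.items[1], Permission::Disabled);
    assert!(s.unique_range.is_empty());
    println!("items: {:?}", s.items);
}
```

Because reads usually find the range empty or entirely below the granting item, the common case returns before touching the stack at all.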
These optimizations result in pretty enormous improvements:
- Without raw pointer tagging: `mse` 34.5s -> 2.4s, `serde1` 5.6s -> 3.6s
- With raw pointer tagging: `mse` 35.3s -> 2.4s, `serde1` 5.7s -> 3.6s
And there is hardly any impact on memory usage:
Memory usage on `mse` 844 MB -> 848 MB, `serde1` 184 MB -> 184 MB (jitter on these is a few MB).
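To put the numbers above in proportion, a small calculation over the reported timings and memory figures (taken directly from this description, nothing re-measured) works them out as ratios:

```rust
// Speedup = time before / time after, using the timings reported above.
fn ratio(before: f64, after: f64) -> f64 {
    before / after
}

fn main() {
    let runs = [
        ("mse, no raw-ptr tagging", 34.5, 2.4),
        ("serde1, no raw-ptr tagging", 5.6, 3.6),
        ("mse, raw-ptr tagging", 35.3, 2.4),
        ("serde1, raw-ptr tagging", 5.7, 3.6),
    ];
    for (name, before, after) in runs {
        println!("{name}: {:.1}x faster", ratio(before, after));
    }
    // Relative memory growth on `mse` (844 MB -> 848 MB).
    let mem_growth_pct = (848.0 - 844.0) / 844.0 * 100.0;
    println!("mse memory: +{mem_growth_pct:.2}%");
}
```

That is roughly a 14x speedup on `mse` and about 1.6x on `serde1`, for under 0.5% extra memory on `mse`, which is well within the stated few-MB jitter.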