Projection cache and better warnings for #32330
This PR does three things:
- it lays the groundwork for the more precise subtyping rules discussed in #32330, but does not enable them;
- it issues warnings when the result of a leak-check or subtyping check relies on a late-bound region which will late become early-bound when #32330 is fixed;
- it introduces a cache for projection in the inference context.
I'm not 100% happy with the approach taken by the cache here, but it seems like a step in the right direction. It results in big wins on some test cases, but not as big as previous versions -- I think because it is caching the `Vec<Obligation>` (whereas before I just returned the normalized type with an empty vector). However, that change was needed to fix an ICE in @alexcrichton's future-rs module (I haven't fully tracked the cause of that ICE yet). Also, because trans/the collector use a fresh inference context for every call to `fulfill_obligation`, they don't profit nearly as much from this cache as they ought to.
Still, here are the results from the future-rs `retry.rs`:
```
06:26 <nmatsakis> time: 6.246; rss: 44MB item-bodies checking
06:26 <nmatsakis> time: 54.783; rss: 63MB translation item collection
06:26 <nmatsakis> time: 140.086; rss: 86MB translation
06:26 <nmatsakis> time: 0.361; rss: 46MB item-bodies checking
06:26 <nmatsakis> time: 5.299; rss: 63MB translation item collection
06:26 <nmatsakis> time: 12.140; rss: 86MB translation
```
~~Another example is the example from #31849. For that, I get 34s to run item-bodies without any cache. The version of the cache included here takes 2s to run item-bodies type-checking. An alternative version which doesn't track nested obligations takes 0.2s, but that version ICEs on @alexcrichton's future-rs (and may well be incorrect, I've not fully convinced myself of that). So, a definite win, but I think there's definitely room for further progress.~~
Pushed a modified version which improves performance of the case from #31849:
```
lunch-box. time rustc --stage0 ~/tmp/issue-31849.rs -Z no-trans
real 0m33.539s
user 0m32.932s
sys 0m0.570s
lunch-box. time rustc --stage2 ~/tmp/issue-31849.rs -Z no-trans
real 0m0.195s
user 0m0.154s
sys 0m0.042s
```
Some sort of cache is also needed for unblocking further work on lazy normalization, since that will lean even more heavily on the cache, and will also require cycle detection.
r? @arielb1
MSVC requires unwinding code to be split to a tree of *funclets*, where each funclet
can only branch to itself or to to its parent.
Luckily, the code we generates matches this pattern. Recover that structure in
an analyze pass and translate according to that.
Support 16-bit pointers as well as i/usize
I'm opening this pull request to get some feedback from the community.
Although Rust doesn't support any platforms with a native 16-bit pointer at the moment, the [AVR-Rust][ar] fork is working towards that goal. Keeping this forked logic up-to-date with the changes in master has been onerous so I'd like to merge these changes so that they get carried along when refactoring happens. I do not believe this should increase the maintenance burden.
This is based on the original work of Dylan McKay (@dylanmckay).
[ar]: https://github.com/avr-rust/rust
this introduces a DropAndReplace terminator as a fix to #30380. That terminator
is suppsoed to be translated by desugaring during drop elaboration, which is
not implemented in this commit, so this breaks `-Z orbit` temporarily.
Fix handling of FFI arguments
r? @eddyb @nikomatsakis or whoever else.
cc @alexcrichton @rust-lang/core
The strategy employed here was to essentially change code we generate from
```llvm
%s = alloca %S ; potentially smaller than argument, but never larger
%1 = bitcast %S* %s to { i64, i64 }*
store { i64, i64 } %0, { i64, i64 }* %1, align 4
```
to
```llvm
%1 = alloca { i64, i64 } ; the copy of argument itself
store { i64, i64 } %0, { i64, i64 }* %1, align 4
%s = bitcast { i64, i64 }* %1 to %S* ; potentially truncate by casting to a pointer of smaller type.
```
rustc: Add a new crate type, cdylib
This commit is an implementation of [RFC 1510] which adds a new crate type,
`cdylib`, to the compiler. This new crate type differs from the existing `dylib`
crate type in a few key ways:
* No metadata is present in the final artifact
* Symbol visibility rules are the same as executables, that is only reachable
`extern` functions are visible symbols
* LTO is allowed
* All libraries are always linked statically
This commit is relatively simple by just plubming the compiler with another
crate type which takes different branches here and there. The only major change
is an implementation of the `Linker::export_symbols` function on Unix which now
actually does something. This helps restrict the public symbols from a cdylib on
Unix.
With this PR a "hello world" `cdylib` is 7.2K while the same `dylib` is 2.4MB,
which is some nice size savings!
[RFC 1510]: https://github.com/rust-lang/rfcs/pull/1510Closes#33132
This commit is an implementation of [RFC 1510] which adds a new crate type,
`cdylib`, to the compiler. This new crate type differs from the existing `dylib`
crate type in a few key ways:
* No metadata is present in the final artifact
* Symbol visibility rules are the same as executables, that is only reachable
`extern` functions are visible symbols
* LTO is allowed
* All libraries are always linked statically
This commit is relatively simple by just plubming the compiler with another
crate type which takes different branches here and there. The only major change
is an implementation of the `Linker::export_symbols` function on Unix which now
actually does something. This helps restrict the public symbols from a cdylib on
Unix.
With this PR a "hello world" `cdylib` is 7.2K while the same `dylib` is 2.4MB,
which is some nice size savings!
[RFC 1510]: https://github.com/rust-lang/rfcs/pull/1510Closes#33132
trans-collector: Assorted fixes and refactorings needed for making trans collector-driven.
As the title says. The messages on the individual commits should do a good job of explaining what they are about.
r? @nikomatsakis
[MIR trans] Optimize trans for biased switches
Currently, all switches in MIR are exhausitive, meaning that we can have
a lot of arms that all go to the same basic block, the extreme case
being an if-let expression which results in just 2 possible cases, be
might end up with hundreds of arms for large enums.
To improve this situation and give LLVM less code to chew on, we can
detect whether there's a pre-dominant target basic block in a switch
and then promote this to be the default target, not translating the
corresponding arms at all.
In combination with #33544 this makes unoptimized MIR trans of
nickel.rs as fast as using old trans and greatly improves the times for
optimized builds, which are only 30-40% slower instead of ~300%.
cc #33111