querify layout and move param env out of the infcx
The main goal of this PR is to move the parameter environment *out* of the inference context. This is because the inference environment will soon be changing over the course of inference --- for example, when we enter into a `for<'a> fn(...)` type, we will push a new environment with an increasing universe index, rather than skolemizing the `'a` references. Similarly, each obligation will soon be able to have a distinct parameter environment, and therefore the `Obligation` struct is extended to carry a `ParamEnv<'tcx>`. (I debated about putting it into the cause; seems plausible, but also weird.)
Along the way, I also reworked how layout works, moving the layout cache into a proper query along the lines of needs-drop and friends.
Finally, this PR tweaks the inference context API, which seemed to be accumulating parameters at an alarming rate. The main way to make (for example) a subtype or equality relationship is now the following:
```
infcx.at(cause, param_env).sub(a, b)
infcx.at(cause, param_env).eq(a, b)
```
In both cases, `a` is considered the "expected" type (this used to be specified by a boolean). I tried hard to preserve the existing notion of what was "expected", although in some cases I'm not convinced it was being set on purpose one way or the other. This is why in some cases you will see me do `sup(b, a)`, which is otherwise equivalent to `sub(a, b)`, but sets the "expected type" differently.
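For instance (a sketch, reusing `cause`, `param_env`, `a`, and `b` from above), these two calls impose the same constraint but label different sides as "expected" in diagnostics:
```
infcx.at(cause, param_env).sub(a, b) // a <: b, with `a` as the "expected" type
infcx.at(cause, param_env).sup(b, a) // the same constraint, but with `b` "expected"
```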
r? @eddyb
cc @arielb1
Use the trait-environment+type as the key. Note that these
are only invoked on types that live for the entire compilation
(no inference artifacts). We no longer need the various special-case
bits and caches that were in place before.
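A self-contained sketch of the caching shape this describes (stand-in types, not rustc's actual definitions): the key pairs the trait environment with the type, so the same type queried under different environments gets distinct entries.
```
use std::collections::HashMap;

// Stand-ins for the real interned rustc types.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ParamEnv(String);
#[derive(Clone, PartialEq, Eq, Hash)]
struct Ty(String);
#[derive(Clone, Debug)]
struct Layout { size: u64, align: u64 }

struct LayoutQuery {
    // Keyed by environment *and* type, replacing the old special-case caches.
    cache: HashMap<(ParamEnv, Ty), Layout>,
}

impl LayoutQuery {
    fn layout_of(&mut self, env: ParamEnv, ty: Ty) -> Layout {
        self.cache
            .entry((env, ty))
            .or_insert_with(|| Layout { size: 8, align: 8 }) // real computation here
            .clone()
    }
}

fn main() {
    let mut q = LayoutQuery { cache: HashMap::new() };
    let layout = q.layout_of(ParamEnv("empty".into()), Ty("u64".into()));
    println!("{:?}", layout);
}
```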
When -Z profile is passed, the GCDAProfiling LLVM pass is added
to the pipeline, which uses debug information to instrument the IR.
After compiling with -Z profile, the $(OUT_DIR)/$(CRATE_NAME).gcno
file is created, containing initial profiling information.
After running the built program, the $(OUT_DIR)/$(CRATE_NAME).gcda
file is created, containing branch counters.
The created *.gcno and *.gcda files can be processed using
the "llvm-cov gcov" and "lcov" tools. The profiling data LLVM
generates does not faithfully follow GCC's format for *.gcno
and *.gcda files, so it will probably not work with other tools
(such as gcov itself) that consume these files.
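The intended workflow looks roughly like this (a sketch with an illustrative crate name; as described above, the actual file names follow the crate name):
```
$ rustc -Z profile example.rs    # produces ./example plus example.gcno
$ ./example                      # running the binary writes example.gcda
$ llvm-cov gcov example.rs       # processes the .gcno/.gcda into a report
```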
That method is *incredibly* hot, so this ends up saving 10% of trans
time.
BTW, we really should be doing dependency tracking there -- and possibly
taking the corresponding perf hit (got to find a way to make DTMs fast) --
but `layout_cache` is a non-dep-tracking map.
Arguably these could become custom queries, but I chose not to do that
because the relationship of queries and trait system is not yet fleshed
out enough. For now it seems fine to have them be `DepTrackingMap` using
the memoize pattern.
The symbol map is not good for incremental: it has inputs from every fn
in existence, and it will change if anything changes. One could imagine
cheating with the symbol-map and exempting it from the usual dependency
tracking, since the results are fully deterministic. Instead, I opted to
just add a per-CGU cache, on the premise that recomputing some symbol
names is not going to be so very expensive.
A number of things were using `crate_hash` that really ought to be using
`crate_disambiguator` (e.g., to create the plugin symbol names). They
have been updated.
It is important to remove `LinkMeta` from `SharedCrateContext` since it
contains a hash of the entire crate, and hence it will change
whenever **anything** changes (which would then require
rebuilding **everything**).
The `#[used]` attribute is similar to GCC's `__attribute__((used))`. It prevents
LLVM from optimizing away a non-exported symbol within a compilation unit
(object file) when there are no references to it.
This is better explained with an example:
```
#[used]
static LIVE: i32 = 0;        // kept even when optimized, thanks to #[used]

static REFERENCED: i32 = 0;  // kept when optimized: `exported` references it

static DEAD: i32 = 0;        // dropped when optimized: unreferenced, not `pub`

fn internal() {}             // dropped when optimized: not exported

pub fn exported() -> &'static i32 {
    &REFERENCED
}
```
Without optimizations, LLVM pretty much preserves all the static variables and
functions within the compilation unit.
```
$ rustc --crate-type=lib --emit=obj symbols.rs && nm -C symbols.o
0000000000000000 t drop::h1be0f8f27a2ba94a
0000000000000000 r symbols::REFERENCED::hb3bdfd46050bc84c
0000000000000000 r symbols::DEAD::hc2ea8f9bd06f380b
0000000000000000 r symbols::LIVE::h0970cf9889edb56e
0000000000000000 T symbols::exported::h6f096c2b1fc292b2
0000000000000000 t symbols::internal::h0ac1aadbc1e3a494
```
With optimizations, LLVM will drop dead code. Here `internal` is dropped because
it's not an exported function/symbol (i.e. not `pub`lic). `DEAD` is dropped for
the same reason. `REFERENCED` is preserved, even though it's not exported,
because it's referenced by the `exported` function. Finally, `LIVE` survives
because of the `#[used]` attribute, even though it's not exported or referenced.
```
$ rustc --crate-type=lib -C opt-level=3 --emit=obj symbols.rs && nm -C symbols.o
0000000000000000 r symbols::REFERENCED::hb3bdfd46050bc84c
0000000000000000 r symbols::LIVE::h0970cf9889edb56e
0000000000000000 T symbols::exported::h6f096c2b1fc292b2
```
Note that the linker knows nothing about `#[used]` and will drop `LIVE`
because no other object references it.
```
$ echo 'fn main() {}' >> symbols.rs
$ rustc symbols.rs && nm -C symbols | grep LIVE
```
At this time, `#[used]` only works on `static` variables.
Drop of arrays is now translated in trans::block in an ugly way that I
should clean up in a later PR, and it does not handle panics in the middle
of an array drop, but this commit & PR are growing too big.
These changes are in the same commit to avoid needing to adapt
meth::trans_object_shim to the new scheme.
One codegen-units test is broken because we instantiate the shims even
when they are not needed. This will be fixed in the next PR.
A task function is now given as a `fn` pointer to ensure that it carries
no state. Each fn can take two arguments, because that worked out to be
convenient -- these two arguments must be of some type that is
`DepGraphSafe`, a new trait that is intended to prevent "leaking"
information into the task that was derived from tracked state.
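A self-contained sketch of that shape (names and details are illustrative, not rustc's actual definitions):
```
// Only types that cannot leak tracked state get to implement this.
pub trait DepGraphSafe {}

// Stable identifiers, for example, are safe to pass into a task.
#[derive(Copy, Clone, Debug)]
pub struct DefId(pub u32);
impl DepGraphSafe for DefId {}

pub struct DepGraph;

impl DepGraph {
    // The task is a plain `fn` pointer, so it cannot close over hidden
    // state; everything it needs must flow in through the two arguments.
    pub fn with_task<A, B, R>(&self, arg1: A, arg2: B, task: fn(A, B) -> R) -> R
    where
        A: DepGraphSafe,
        B: DepGraphSafe,
    {
        // (The real version would also push/pop the task on the graph.)
        task(arg1, arg2)
    }
}

fn check_pair(a: DefId, b: DefId) -> bool {
    a.0 == b.0
}

fn main() {
    let graph = DepGraph;
    let same = graph.with_task(DefId(1), DefId(1), check_pair);
    println!("same item: {}", same);
}
```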
This intentionally leaves `DepGraph::in_task()`, the more common form,
alone. Eventually all uses of `DepGraph::in_task()` should be ported
to `with_task()`, but I wanted to start with a smaller subset.
Originally I wanted to use closures bound by an auto trait, but that
approach has some limitations:
- the trait cannot have a `read()` method; since the current method
is unused, that may not be a problem.
- more importantly, we would want the auto trait to be "undefined" for all types
*by default* -- that is, this use case doesn't really fit the typical
auto trait scenario. For example, imagine that there is a `u32` loaded
out of a `hir::Node` -- we don't really want to be passing that
`u32` into the task!
This commit introduces 128-bit integers. Stage 2 builds and produces a working compiler which
understands and supports 128-bit integers throughout.
The general strategy is to have a rustc_i128 module which provides i128/u128 aliases that are
equal to the 64-bit types during the early bootstrap stages and to the real 128-bit types later.
Since nowhere in rustc do we rely on large numbers being supported, this strategy is good enough
to get past the first bootstrap stages and end up with a fully working 128-bit capable compiler.
In order for this strategy to work, a number of locations had to be changed to use the associated
max_value/min_value functions instead of the MAX/MIN constants, and min_value (or was it
max_value?) had to be changed to use xor instead of shift, so that consteval works with both the
64-bit and 128-bit representations (the former not necessarily producing the right results in
stage 1).
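To illustrate the xor-vs-shift point (a standalone example, not the actual compiler code): in two's complement, `MIN` is the bitwise complement of `MAX`, i.e. `MAX ^ -1`, which holds regardless of how wide the evaluator's own integers are, whereas shifting a bit into the sign position is easier to get wrong when a 64-bit evaluator emulates wider types.
```
fn main() {
    // Shift form: place a 1 in the sign bit.
    let min_by_shift = (1u64 << 63) as i64;
    // Xor form: MIN is the bitwise complement of MAX.
    let min_by_xor = i64::max_value() ^ -1;
    assert_eq!(min_by_shift, i64::min_value());
    assert_eq!(min_by_xor, i64::min_value());
    println!("i64::MIN = {}", min_by_xor);
}
```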
This commit includes manual merge conflict resolution changes from a rebase by @est31.
Before this PR, type names could depend on the cratenum being used
for a given crate and also on the source location of closures.
Both are undesirable for incremental compilation where we cache
LLVM IR and don't want it to depend on formatting or in which
order crates are loaded.
Every codegen unit gets its own local counter for generating new symbol
names. This makes bitcode and object files reproducible at the binary
level even when incremental compilation is used.
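A minimal sketch of the per-unit counter idea (illustrative names, not the actual rustc types):
```
// Each codegen unit numbers its generated symbols independently, so the
// names it produces don't depend on what other units did first.
struct SymbolNamer {
    cgu_name: String,
    next_id: u64,
}

impl SymbolNamer {
    fn fresh(&mut self) -> String {
        let name = format!("{}.{}", self.cgu_name, self.next_id);
        self.next_id += 1;
        name
    }
}

fn main() {
    let mut cgu = SymbolNamer { cgu_name: "my_crate.0".to_string(), next_id: 0 };
    println!("{}", cgu.fresh()); // my_crate.0.0
    println!("{}", cgu.fresh()); // my_crate.0.1
}
```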
rustc_trans: don't round up the DST prefix size to its alignment.
Fixes #35815 by using `ty::layout` and `min_size` to compute the size of the DST prefix.
`ty::layout::Struct::min_size` is not rounded up to alignment; this matters because the alignment of the DST field itself can be smaller than that of the prefix, so rounding up would misplace it.
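A small illustration of the situation (layout values assume the then-current in-order field layout): the sized prefix below has an un-rounded `min_size` of 5 bytes, and since the tail's alignment is 1, the tail can begin at offset 5; rounding the prefix up to the struct's 4-byte alignment would wrongly place the tail at offset 8.
```
use std::mem;

#[allow(dead_code)]
struct S<T: ?Sized> {
    x: u32,  // offset 0..4
    y: u8,   // offset 4..5 -- prefix min_size = 5
    tail: T, // for T = [u8], starts at offset 5
}

fn main() {
    let sized: S<[u8; 3]> = S { x: 1, y: 2, tail: [3, 4, 5] };
    let dst: &S<[u8]> = &sized; // unsized coercion
    // 5 (prefix) + 3 (tail) = 8, already a multiple of the 4-byte alignment.
    println!("size_of_val = {}", mem::size_of_val(dst));
}
```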
Address ICEs running w/ incremental compilation and building glium
Fixes for various ICEs I encountered trying to build glium with incremental compilation enabled. Building glium now works. Of the 4 ICEs, I have test cases for 3 of them -- I didn't isolate a test for the last commit, and I kind of want to go do other things -- most notably, figuring out why incremental isn't saving much *effort*. But if it seems worthwhile, I can come back and try to narrow down the problem.
r? @michaelwoerister
Fixes #34991. Fixes #32015.
Per the discussion on #34765, we make one `DepNode::Mir` variant and use
it to represent both the MIR tracking map as well as passes that operate
on MIR. We also track loads of cached MIR (which naturally comes from
metadata).
Note that the "HAIR" pass adds a read of TypeckItemBody because it uses
a myriad of tables that are not individually tracked.
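Schematically (illustrative stand-ins, not the real definitions):
```
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct DefId(u32);

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
enum DepNode {
    // One variant stands for "the MIR of this item", covering the MIR map,
    // MIR-to-MIR passes, and MIR loaded from metadata alike.
    Mir(DefId),
    // Read by the HAIR pass, which consumes many untracked typeck tables.
    TypeckItemBody(DefId),
}

fn main() {
    // A pass over item 7 and a metadata load of item 7 hit the same node.
    assert_eq!(DepNode::Mir(DefId(7)), DepNode::Mir(DefId(7)));
    println!("{:?}", DepNode::TypeckItemBody(DefId(7)));
}
```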
Since LLVM reversed the order of the debug info graphs, we need to have
a compile unit that exists *before* any functions (`DISubprogram`s) are
created. This allows the LLVM debug info builder to automatically link
the functions to the compile unit.
In the older version, a `.o` and a `.bc` file were separate
work-products. This newer version keeps, for each codegen-unit, a set
of files of different kinds. We assume that if any kinds are available
then all the kinds we need are available, since the precise set of
switches will depend on attributes and command-line switches.
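Schematically, the work-product shape changes roughly like this (a sketch with made-up names):
```
// Before: one work-product per emitted file.
#[allow(dead_code)]
struct OldWorkProduct {
    cgu_name: String,
    file: String, // e.g. "crate.0.o" OR "crate.0.bc"
}

// After: one work-product per codegen unit, holding every file kind
// that was saved for it.
#[derive(Debug)]
enum OutputKind { Object, Bitcode }

struct WorkProduct {
    cgu_name: String,
    saved_files: Vec<(OutputKind, String)>,
}

fn main() {
    let wp = WorkProduct {
        cgu_name: "crate.0".to_string(),
        saved_files: vec![
            (OutputKind::Object, "crate.0.o".to_string()),
            (OutputKind::Bitcode, "crate.0.bc".to_string()),
        ],
    };
    println!("{} kinds saved for {}", wp.saved_files.len(), wp.cgu_name);
}
```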
Should probably test this: the effect of changing attributes in
particular might not be successfully tracked?
This checks the `previous_work_products` data from the dep-graph and
tries to simply copy a `.o` file if possible. We also add new
work-products into the dep-graph, and create edges to/from the dep-node
for a work-product.
The data tracked here was meant to compare the output of the
translation item collector to the set of translation items found
by the on-demand translator.