This is an optimization which is quite impactful for compiling small crates.
Reading libstd's metadata takes about 50ms, and a hello world before this change
took about 100ms (this change halves that time).
Recent changes made it such that this optimization wasn't performed, but I think
it's a better idea do to this for now. See #10786 for tracking this issue.
Fix#13266.
There is a little bit of acrobatics in the definition of `crate_paths`
to avoid calling `clone()` on the dylib/rlib unless we actually are
going to need them.
The other oddity is that I have replaced the `root_ident: Option<&str>`
parameter with a `root: &Option<CratePaths>`, which may surprise one
who was expecting to see something like: `root: Option<&CratePaths>`.
I went with the approach here because I could not come up with code for
the alternative that was acceptable to the borrow checker.
(i.e. semi-generalized version of prior errorinfo gathering.)
Also revised presentation to put each path on its own line, prefixed
by file:linenum information.
The recent pull request to remove libc from libstd has hit a wall in compiling
on windows, and I've been trying to investigate on the try bots as to why (it
compiles locally just fine). To the best of my knowledge, the LLVM section
iterator is behaving badly when iterating over the sections of the libc DLL.
Upon investigating the LLVMGetSectionName function in LLVM, I discovered that
this function doesn't always return a null-terminated string. It returns the
data pointer of a StringRef instance (LLVM's equivalent of &str essentially),
but it has no method of returning the length of the name of the section.
This commit modifies the section iteration when loading libraries to invoke a
custom LLVMRustGetSectionName which will correctly return both the length and
the data pointer.
I have not yet verified that this will fix landing liblibc, as it will require a
snapshot before doing a full test. Regardless, this is a worrisome situation
regarding the LLVM API, and should likely be fixed anyway.
Fix#13266.
There is a little bit of acrobatics in the definition of `crate_paths`
to avoid calling `clone()` on the dylib/rlib unless we actually are
going to need them.
The other oddity is that I have replaced the `root_ident: Option<&str>`
parameter with a `root: &Option<CratePaths>`, which may surprise one
who was expecting to see something like: `root: Option<&CratePaths>`.
I went with the approach here because I could not come up with code for
the alternative that was acceptable to the borrow checker.
Previously, any library of the pattern `lib<name>-<hash>-<version>.so` was
>considered a candidate (rightly so) for loading a crate. Sets are generated for
each unique `<hash>`, and then from these sets a candidate is selected. If a set
contained more than one element, then it immediately generated an error saying
that multiple copies of the same dylib were found.
This is incorrect because each candidate needs to be validated to actually
contain a rust library (valid metadata). This commit alters the logic to filter
each set of candidates for a hash to only libraries which are actually rust
libraries. This means that if multiple false positives are found with the right
name pattern, they're all ignored.
Closes#13010
Previously, any library of the pattern `lib<name>-<hash>-<version>.so` was
>considered a candidate (rightly so) for loading a crate. Sets are generated for
each unique `<hash>`, and then from these sets a candidate is selected. If a set
contained more than one element, then it immediately generated an error saying
that multiple copies of the same dylib were found.
This is incorrect because each candidate needs to be validated to actually
contain a rust library (valid metadata). This commit alters the logic to filter
each set of candidates for a hash to only libraries which are actually rust
libraries. This means that if multiple false positives are found with the right
name pattern, they're all ignored.
Closes#13010
Refactored get_metadata_section to return a Result<MetadataBlob,~str> instead of a Option<MetadataBlob>. This provides more clarity to the user through the debug output when using --ls.
This is kind of a continuation of my original closed pull request 2 months ago (#11544), but I think the time-span constitutes a new pull request.
When the metadata format changes, old libraries often cause librustc to abort
when reading their metadata. This should all change with the introduction of SVH
markers, but the loader for crates should gracefully handle libraries without
SVH markers still.
This commit adds support for tripping fewer assertions when loading libraries by
using maybe_get_doc when initially parsing metadata. It's still possible for
some libraries to fall through the cracks, but this should deal with a fairly
large number of them up front.
This new SVH is used to uniquely identify all crates as a snapshot in time of
their ABI/API/publicly reachable state. This current calculation is just a hash
of the entire crate's AST. This is obviously incorrect, but it is currently the
reality for today.
This change threads through the new Svh structure which originates from crate
dependencies. The concept of crate id hash is preserved to provide efficient
matching on filenames for crate loading. The inspected hash once crate metadata
is opened has been changed to use the new Svh.
The goal of this hash is to identify when upstream crates have changed but
downstream crates have not been recompiled. This will prevent the def-id drift
problem where upstream crates were recompiled, thereby changing their metadata,
but downstream crates were not recompiled.
In the future this hash can be expanded to exclude contents of the AST like doc
comments, but limitations in the compiler prevent this change from being made at
this time.
Closes#10207
The previous code passed around a {name,version} pair everywhere, but this is
better expressed as a CrateId. This patch changes these paths to store and pass
around crate ids instead of these pairs of name/version. This also prepares the
code to change the type of hash that is stored in crates.
This commit implements a layman's version of realpath() for metadata::loader to
use in order to not error on symlinks pointing to the same file.
Closes#12459
These two containers are indeed collections, so their place is in
libcollections, not in libstd. There will always be a hash map as part of the
standard distribution of Rust, but by moving it out of the standard library it
makes libstd that much more portable to more platforms and environments.
This conveniently also removes the stuttering of 'std::hashmap::HashMap',
although 'collections::HashMap' is only one character shorter.
Closes#12366.
Parentheses around assignment statements such as
let mut a = (0);
a = (1);
a += (2);
are not necessary and therefore an unnecessary_parens warning is raised when
statements like this occur.
The warning mechanism was refactored along the way to allow for code reuse
between the routines for checking expressions and statements.
Code had to be adopted throughout the compiler and standard libraries to comply
with this modification of the lint.
This commit rewrites crate loading internally in attempt to look at less
metadata and provide nicer errors. The loading is now split up into a few
stages:
1. Collect a mapping of (hash => ~[Path]) for a set of candidate libraries for a
given search. The hash is the hash in the filename and the Path is the
location of the library in question. All candidates are filtered based on
their prefix/suffix (dylib/rlib appropriate) and then the hash/version are
split up and are compared (if necessary).
This means that if you're looking for an exact hash of library you don't have
to open up the metadata of all libraries named the same, but also in your
path.
2. Once this mapping is constructed, each (hash, ~[Path]) pair is filtered down
to just a Path. This is necessary because the same rlib could show up twice
in the path in multiple locations. Right now the filenames are based on just
the crate id, so this could be indicative of multiple version of a crate
during one crate_id lifetime in the path. If multiple duplicate crates are
found, an error is generated.
3. Now that we have a mapping of (hash => Path), we error on multiple versions
saying that multiple versions were found. Only if there's one (hash => Path)
pair do we actually return that Path and its metadata.
With this restructuring, it restructures code so errors which were assertions
previously are now first-class errors. Additionally, this should read much less
metadata with lots of crates of the same name or same version in a path.
Closes#11908
It was decided a long, long time ago that libextra should not exist, but rather its modules should be split out into smaller independent libraries maintained outside of the compiler itself. The theory was to use `rustpkg` to manage dependencies in order to move everything out of the compiler, but maintain an ease of usability.
Sadly, the work on `rustpkg` isn't making progress as quickly as expected, but the need for dissolving libextra is becoming more and more pressing. Because of this, we've thought that a good interim solution would be to simply package more libraries with the rust distribution itself. Instead of dissolving libextra into libraries outside of the mozilla/rust repo, we can dissolve libraries into the mozilla/rust repo for now.
Work on this has been excruciatingly painful in the past because the makefiles are completely opaque to all but a few. Adding a new library involved adding about 100 lines spread out across 8 files (incredibly error prone). The first commit of this pull request targets this pain point. It does not rewrite the build system, but rather refactors large portions of it. Afterwards, adding a new library is as simple as modifying 2 lines (easy, right?). The build system automatically keeps track of dependencies between crates (rust *and* native), promotes binaries between stages, tracks dependencies of installed tools, etc, etc.
With this newfound buildsystem power, I chose the `extra::flate` module as the first candidate for removal from libextra. While a small module, this module is relative complex in that is has a C dependency and the compiler requires it (messing with the dependency graph a bit). Albeit I modified more than 2 lines of makefiles to accomodate libflate (the native dependency required 2 extra lines of modifications), but the removal process was easy to do and straightforward.
---
Testing-wise, I've cross-compiled, run tests, built some docs, installed, uninstalled, etc. I'm still working out a few kinks, and I'm sure that there's gonna be built system issues after this, but it should be working well for basic use!
cc #8784
This is hopefully the beginning of the long-awaited dissolution of libextra.
Using the newly created build infrastructure for building libraries, I decided
to move the first module out of libextra.
While not being a particularly meaty module in and of itself, the flate module
is required by rustc and additionally has a native C dependency. I was able to
very easily split out the C dependency from rustrt, update librustc, and
magically everything gets installed to the right locations and built
automatically.
This is meant to be a proof-of-concept commit to how easy it is to remove
modules from libextra now. I didn't put any effort into modernizing the
interface of libflate or updating it other than to remove the one glob import it
had.
We were previously reading metadata via `ar p`, but as learned from rustdoc
awhile back, spawning a process to do something is pretty slow. Turns out LLVM
has an Archive class to read archives, but it cannot write archives.
This commits adds bindings to the read-only version of the LLVM archive class
(with a new type that only has a read() method), and then it uses this class
when reading the metadata out of rlibs. When you put this in tandem of not
compressing the metadata, reading the metadata is 4x faster than it used to be
The timings I got for reading metadata from the respective libraries was:
libstd-04ff901e-0.9-pre.dylib => 100ms
libstd-04ff901e-0.9-pre.rlib => 23ms
librustuv-7945354c-0.9-pre.dylib => 4ms
librustuv-7945354c-0.9-pre.rlib => 1ms
librustc-5b94a16f-0.9-pre.dylib => 87ms
librustc-5b94a16f-0.9-pre.rlib => 35ms
libextra-a6ebb16f-0.9-pre.dylib => 63ms
libextra-a6ebb16f-0.9-pre.rlib => 15ms
libsyntax-2e4c0458-0.9-pre.dylib => 86ms
libsyntax-2e4c0458-0.9-pre.rlib => 22ms
In order to always take advantage of these faster metadata read-times, I sort
the files in filesearch based on whether they have an rlib extension or not
(prefer all rlib files first).
Overall, this halved the compile time for a `fn main() {}` crate from 0.185s to
0.095s on my system (when preferring dynamic linking). Reading metadata is still
the slowest pass of the compiler at 0.035s, but it's getting pretty close to
linking at 0.021s! The next best optimization is to just not copy the metadata
from LLVM because that's the most expensive part of reading metadata right now.
Now that the metadata is an owned value with a lifetime of a borrowed byte
slice, it's possible to have future optimizations where the metadata doesn't
need to be copied around (very expensive operation).
This replaces the link meta attributes with a pkgid attribute and uses a hash
of this as the crate hash. This makes the crate hash computable by things
other than the Rust compiler. It also switches the hash function ot SHA1 since
that is much more likely to be available in shell, Python, etc than SipHash.
Fixes#10188, #8523.
Right now whenever an rlib file is linked against, all of the metadata from the
rlib is pulled in to the final staticlib or binary. The reason for this is that
the metadata is currently stored in a section of the object file. Note that this
is intentional for dynamic libraries in order to distribute metadata bundled
with static libraries.
This commit alters the situation for rlib libraries to instead store the
metadata in a separate file in the archive. In doing so, when the archive is
passed to the linker, none of the metadata will get pulled into the result
executable. Furthermore, the metadata file is skipped when assembling rlibs into
an archive.
The snag in this implementation comes with multiple output formats. When
generating a dylib, the metadata needs to be in the object file, but when
generating an rlib this needs to be separate. In order to accomplish this, the
metadata variable is inserted into an entirely separate LLVM Module which is
then codegen'd into a different location (foo.metadata.o). This is then linked
into dynamic libraries and silently ignored for rlib files.
While changing how metadata is inserted into archives, I have also stopped
compressing metadata when inserted into rlib files. We have wanted to stop
compressing metadata, but the sections it creates in object file sections are
apparently too large. Thankfully if it's just an arbitrary file it doesn't
matter how large it is.
I have seen massive reductions in executable sizes, as well as staticlib output
sizes (to confirm that this is all working).