rust/src/librustc/back/lto.rs

112 lines
4.3 KiB
Rust
Raw Normal View History

Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
// Copyright 2013 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
rustc: Optimize reading metadata by 4x We were previously reading metadata via `ar p`, but as learned from rustdoc awhile back, spawning a process to do something is pretty slow. Turns out LLVM has an Archive class to read archives, but it cannot write archives. This commits adds bindings to the read-only version of the LLVM archive class (with a new type that only has a read() method), and then it uses this class when reading the metadata out of rlibs. When you put this in tandem of not compressing the metadata, reading the metadata is 4x faster than it used to be The timings I got for reading metadata from the respective libraries was: libstd-04ff901e-0.9-pre.dylib => 100ms libstd-04ff901e-0.9-pre.rlib => 23ms librustuv-7945354c-0.9-pre.dylib => 4ms librustuv-7945354c-0.9-pre.rlib => 1ms librustc-5b94a16f-0.9-pre.dylib => 87ms librustc-5b94a16f-0.9-pre.rlib => 35ms libextra-a6ebb16f-0.9-pre.dylib => 63ms libextra-a6ebb16f-0.9-pre.rlib => 15ms libsyntax-2e4c0458-0.9-pre.dylib => 86ms libsyntax-2e4c0458-0.9-pre.rlib => 22ms In order to always take advantage of these faster metadata read-times, I sort the files in filesearch based on whether they have an rlib extension or not (prefer all rlib files first). Overall, this halved the compile time for a `fn main() {}` crate from 0.185s to 0.095s on my system (when preferring dynamic linking). Reading metadata is still the slowest pass of the compiler at 0.035s, but it's getting pretty close to linking at 0.021s! The next best optimization is to just not copy the metadata from LLVM because that's the most expensive part of reading metadata right now.
2013-12-16 22:58:21 -06:00
use back::archive::ArchiveRO;
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
use back::link;
use driver::session;
use lib::llvm::{ModuleRef, TargetMachineRef, llvm, True, False};
use metadata::cstore;
use util::common::time;
2014-02-26 11:58:41 -06:00
use libc;
use flate;
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
2014-03-05 08:36:01 -06:00
pub fn run(sess: &session::Session, llmod: ModuleRef,
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
tm: TargetMachineRef, reachable: &[~str]) {
if sess.opts.cg.prefer_dynamic {
sess.err("cannot prefer dynamic linking when performing LTO");
sess.note("only 'staticlib' and 'bin' outputs are supported with LTO");
sess.abort_if_errors();
}
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
// Make sure we actually can run LTO
2014-03-20 21:49:20 -05:00
for crate_type in sess.crate_types.borrow().iter() {
Redesign output flags for rustc This commit removes the -c, --emit-llvm, -s, --rlib, --dylib, --staticlib, --lib, and --bin flags from rustc, adding the following flags: * --emit=[asm,ir,bc,obj,link] * --crate-type=[dylib,rlib,staticlib,bin,lib] The -o option has also been redefined to be used for *all* flavors of outputs. This means that we no longer ignore it for libraries. The --out-dir remains the same as before. The new logic for files that rustc emits is as follows: 1. Output types are dictated by the --emit flag. The default value is --emit=link, and this option can be passed multiple times and have all options stacked on one another. 2. Crate types are dictated by the --crate-type flag and the #[crate_type] attribute. The flags can be passed many times and stack with the crate attribute. 3. If the -o flag is specified, and only one output type is specified, the output will be emitted at this location. If more than one output type is specified, then the filename of -o is ignored, and all output goes in the directory that -o specifies. The -o option always ignores the --out-dir option. 4. If the --out-dir flag is specified, all output goes in this directory. 5. If -o and --out-dir are both not present, all output goes in the current directory of the process. 6. When multiple output types are specified, the filestem of all output is the same as the name of the CrateId (derived from a crate attribute or from the filestem of the crate file). Closes #7791 Closes #11056 Closes #11667
2014-02-03 17:27:54 -06:00
match *crate_type {
session::CrateTypeExecutable | session::CrateTypeStaticlib => {}
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
_ => {
sess.fatal("lto can only be run for executables and \
static library outputs");
}
}
}
// For each of our upstream dependencies, find the corresponding rlib and
// load the bitcode from the archive. Then merge it into the current LLVM
// module that we've got.
2013-12-25 14:08:04 -06:00
let crates = sess.cstore.get_used_crates(cstore::RequireStatic);
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
for (cnum, path) in crates.move_iter() {
let name = sess.cstore.get_crate_data(cnum).name.clone();
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
let path = match path {
Some(p) => p,
None => {
sess.fatal(format!("could not find rlib for: `{}`", name));
}
};
rustc: Optimize reading metadata by 4x We were previously reading metadata via `ar p`, but as learned from rustdoc awhile back, spawning a process to do something is pretty slow. Turns out LLVM has an Archive class to read archives, but it cannot write archives. This commits adds bindings to the read-only version of the LLVM archive class (with a new type that only has a read() method), and then it uses this class when reading the metadata out of rlibs. When you put this in tandem of not compressing the metadata, reading the metadata is 4x faster than it used to be The timings I got for reading metadata from the respective libraries was: libstd-04ff901e-0.9-pre.dylib => 100ms libstd-04ff901e-0.9-pre.rlib => 23ms librustuv-7945354c-0.9-pre.dylib => 4ms librustuv-7945354c-0.9-pre.rlib => 1ms librustc-5b94a16f-0.9-pre.dylib => 87ms librustc-5b94a16f-0.9-pre.rlib => 35ms libextra-a6ebb16f-0.9-pre.dylib => 63ms libextra-a6ebb16f-0.9-pre.rlib => 15ms libsyntax-2e4c0458-0.9-pre.dylib => 86ms libsyntax-2e4c0458-0.9-pre.rlib => 22ms In order to always take advantage of these faster metadata read-times, I sort the files in filesearch based on whether they have an rlib extension or not (prefer all rlib files first). Overall, this halved the compile time for a `fn main() {}` crate from 0.185s to 0.095s on my system (when preferring dynamic linking). Reading metadata is still the slowest pass of the compiler at 0.035s, but it's getting pretty close to linking at 0.021s! The next best optimization is to just not copy the metadata from LLVM because that's the most expensive part of reading metadata right now.
2013-12-16 22:58:21 -06:00
let archive = ArchiveRO::open(&path).expect("wanted an rlib");
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
debug!("reading {}", name);
let bc = time(sess.time_passes(), format!("read {}.bc.deflate", name), (), |_|
archive.read(format!("{}.bc.deflate", name)));
let bc = bc.expect("missing compressed bytecode in archive!");
let bc = time(sess.time_passes(), format!("inflate {}.bc", name), (), |_|
flate::inflate_bytes(bc));
let ptr = bc.as_slice().as_ptr();
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
debug!("linking {}", name);
time(sess.time_passes(), format!("ll link {}", name), (), |()| unsafe {
if !llvm::LLVMRustLinkInExternalBitcode(llmod,
ptr as *libc::c_char,
bc.len() as libc::size_t) {
link::llvm_err(sess, format!("failed to load bc of `{}`", name));
}
});
}
// Internalize everything but the reachable symbols of the current module
let cstrs: Vec<::std::c_str::CString> = reachable.iter().map(|s| s.to_c_str()).collect();
let arr: Vec<*i8> = cstrs.iter().map(|c| c.with_ref(|p| p)).collect();
let ptr = arr.as_ptr();
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
unsafe {
llvm::LLVMRustRunRestrictionPass(llmod, ptr as **libc::c_char,
arr.len() as libc::size_t);
}
if sess.no_landing_pads() {
unsafe {
llvm::LLVMRustMarkAllFunctionsNounwind(llmod);
}
}
Implement LTO This commit implements LTO for rust leveraging LLVM's passes. What this means is: * When compiling an rlib, in addition to insdering foo.o into the archive, also insert foo.bc (the LLVM bytecode) of the optimized module. * When the compiler detects the -Z lto option, it will attempt to perform LTO on a staticlib or binary output. The compiler will emit an error if a dylib or rlib output is being generated. * The actual act of performing LTO is as follows: 1. Force all upstream libraries to have an rlib version available. 2. Load the bytecode of each upstream library from the rlib. 3. Link all this bytecode into the current LLVM module (just using llvm apis) 4. Run an internalization pass which internalizes all symbols except those found reachable for the local crate of compilation. 5. Run the LLVM LTO pass manager over this entire module 6a. If assembling an archive, then add all upstream rlibs into the output archive. This ignores all of the object/bitcode/metadata files rust generated and placed inside the rlibs. 6b. If linking a binary, create copies of all upstream rlibs, remove the rust-generated object-file, and then link everything as usual. As I have explained in #10741, this process is excruciatingly slow, so this is *not* turned on by default, and it is also why I have decided to hide it behind a -Z flag for now. The good news is that the binary sizes are about as small as they can be as a result of LTO, so it's definitely working. Closes #10741 Closes #10740
2013-12-03 01:19:29 -06:00
// Now we have one massive module inside of llmod. Time to run the
// LTO-specific optimization passes that LLVM provides.
//
// This code is based off the code found in llvm's LTO code generator:
// tools/lto/LTOCodeGenerator.cpp
debug!("running the pass manager");
unsafe {
let pm = llvm::LLVMCreatePassManager();
llvm::LLVMRustAddAnalysisPasses(tm, pm, llmod);
"verify".with_c_str(|s| llvm::LLVMRustAddPass(pm, s));
let builder = llvm::LLVMPassManagerBuilderCreate();
llvm::LLVMPassManagerBuilderPopulateLTOPassManager(builder, pm,
/* Internalize = */ False,
/* RunInliner = */ True);
llvm::LLVMPassManagerBuilderDispose(builder);
"verify".with_c_str(|s| llvm::LLVMRustAddPass(pm, s));
time(sess.time_passes(), "LTO pases", (), |()|
llvm::LLVMRunPassManager(pm, llmod));
llvm::LLVMDisposePassManager(pm);
}
debug!("lto done");
}