This commit alters the implementation of multiple codegen units slightly to be
compatible with the MSVC linker. Currently the implementation will take the N
object files created by each codegen unit and will run `ld -r` to create a new
object file which is then passed along. The MSVC linker, however, is not able to
do this operation.
The compiler will now no longer attempt to assemble object files together but
will instead just pass through all the object files as usual. This implies that
rlibs may not contain more than one object file (if the library is compiled with
more than one codegen unit) and the output of `-C save-temps` will have changed
slightly as object files with the extension `0.o` will not be renamed to `o`
unless requested otherwise.
This commit starts passing the `--whole-archive` flag (`-force_load` on OSX) to
the linker when linking rlibs into dylibs. The primary purpose of this commit is
to ensure that the linker doesn't strip out objects from an archive when
creating a dynamic library. Information on how this can go wrong can be found in
issues #14344 and #25185.
The unfortunate part about passing this flag to the linker is that we have to
preprocess the rlib to remove the metadata and compressed bytecode found within.
This means that creating a dylib will now take longer to link as we've got to
copy around the input rlibs to a temporary location, modify them, and then
invoke the linker. This isn't done for executables, however, so the "hello
world" compile time is not affected.
This fix was instigated because of the previous commit where rlibs may not
contain multiple object files instead of one due to codegen units being greater
than one. That change prevented the main distribution from being compiled with
more than one codegen-unit and this commit fixes that.
Closes#14344Closes#25185
Special thanks to @retep998 for the [excellent writeup](https://github.com/rust-lang/rfcs/issues/1061) of tasks to be done and @ricky26 for initially blazing the trail here!
# MSVC Support
This goal of this series of commits is to add MSVC support to the Rust compiler
and build system, allowing it more easily interoperate with Visual Studio
installations and native libraries compiled outside of MinGW.
The tl;dr; of this change is that there is a new target of the compiler,
`x86_64-pc-windows-msvc`, which will not interact with the MinGW toolchain at
all and will instead use `link.exe` to assemble output artifacts.
## Why try to use MSVC?
With today's Rust distribution, when you install a compiler on Windows you also
install `gcc.exe` and a number of supporting libraries by default (this can be
opted out of). This allows installations to remain independent of MinGW
installations, but it still generally requires native code to be linked with
MinGW instead of MSVC. Some more background can also be found in #1768 about the
incompatibilities between MinGW and MSVC.
Overall the current installation strategy is quite nice so long as you don't
interact with native code, but once you do the usage of a MinGW-based `gcc.exe`
starts to get quite painful.
Relying on a nonstandard Windows toolchain has also been a long-standing "code
smell" of Rust and has been slated for remedy for quite some time now. Using a
standard toolchain is a great motivational factor for improving the
interoperability of Rust code with the native system.
## What does it mean to use MSVC?
"Using MSVC" can be a bit of a nebulous concept, but this PR defines it as:
* The build system for Rust will build as much code as possible with the MSVC
compiler, `cl.exe`.
* The build system will use native MSVC tools for managing archives.
* The compiler will link all output with `link.exe` instead of `gcc.exe`.
None of these are currently implemented today, but all are required for the
compiler to fluently interoperate with MSVC.
## How does this all work?
At the highest level, this PR adds a new target triple to the Rust compiler:
x86_64-pc-windows-msvc
All logic for using MSVC or not is scoped within this triple and code can
conditionally build for MSVC or MinGW via:
#[cfg(target_env = "msvc")]
It is expected that auto builders will be set up for MSVC-based compiles in
addition to the existing MinGW-based compiles, and we will likely soon start
shipping MSVC nightlies where `x86_64-pc-windows-msvc` is the host target triple
of the compiler.
# Summary of changes
Here I'll explain at a high level what many of the changes made were targeted
at, but many more details can be found in the commits themselves. Many thanks to
@retep998 for the excellent writeup in rust-lang/rfcs#1061 and @rick26 for a lot
of the initial proof-of-concept work!
## Build system changes
As is probably expected, a large chunk of this PR is changes to Rust's build
system to build with MSVC. At a high level **it is an explicit non goal** to
enable building outside of a MinGW shell, instead all Makefile infrastructure we
have today is retrofitted with support to use MSVC instead of the standard MSVC
toolchain. Some of the high-level changes are:
* The configure script now detects when MSVC is being targeted and adds a number
of additional requirements about the build environment:
* The `--msvc-root` option must be specified or `cl.exe` must be in PATH to
discover where MSVC is installed. The compiler in use is also required to
target x86_64.
* Once the MSVC root is known, the INCLUDE/LIB environment variables are
scraped so they can be reexported by the build system.
* CMake is required to build LLVM with MSVC (and LLVM is also configured with
CMake instead of the normal configure script).
* jemalloc is currently unconditionally disabled for MSVC targets as jemalloc
isn't a hard requirement and I don't know how to build it with MSVC.
* Invocations of a C and/or C++ compiler are now abstracted behind macros to
appropriately call the underlying compiler with the correct format of
arguments, for example there is now a macro for "assemble an archive from
objects" instead of hard-coded invocations of `$(AR) crus liboutput.a ...`
* The output filenames for standard libraries such as morestack/compiler-rt are
now "more correct" on windows as they are shipped as `foo.lib` instead of
`libfoo.a`.
* Rust targets can now depend on native tools provided by LLVM, and as you'll
see in the commits the entire MSVC target depends on `llvm-ar.exe`.
* Support for custom arbitrary makefile dependencies of Rust targets has been
added. The MSVC target for `rustc_llvm` currently requires a custom `.DEF`
file to be passed to the linker to get further linkages to complete.
## Compiler changes
The modifications made to the compiler have so far largely been minor tweaks
here and there, mostly just adding a layer of abstraction over whether MSVC or a
GNU-like linker is being used. At a high-level these changes are:
* The section name for metadata storage in dynamic libraries is called `.rustc`
for MSVC-based platorms as section names cannot contain more than 8
characters.
* The implementation of `rustc_back::Archive` was refactored, but the
functionality has remained the same.
* Targets can now specify the default `ar` utility to use, and for MSVC this
defaults to `llvm-ar.exe`
* The building of the linker command in `rustc_trans:🔙:link` has been
abstracted behind a trait for the same code path to be used between GNU and
MSVC linkers.
## Standard library changes
Only a few small changes were required to the stadnard library itself, and only
for minor differences between the C runtime of msvcrt.dll and MinGW's libc.a
* Some function names for floating point functions have leading underscores, and
some are not present at all.
* Linkage to the `advapi32` library for crypto-related functions is now
explicit.
* Some small bits of C code here and there were fixed for compatibility with
MSVC's cl.exe compiler.
# Future Work
This commit is not yet a 100% complete port to using MSVC as there are still
some key components missing as well as some unimplemented optimizations. This PR
is already getting large enough that I wanted to draw the line here, but here's
a list of what is not implemented in this PR, on purpose:
## Unwinding
The revision of our LLVM submodule [does not seem to implement][llvm] does not
support lowering SEH exception handling on the Windows MSVC targets, so
unwinding support is not currently implemented for the standard library (it's
lowered to an abort).
[llvm]: https://github.com/rust-lang/llvm/blob/rust-llvm-2015-02-19/lib/CodeGen/Passes.cpp#L454-L461
It looks like, however, that upstream LLVM has quite a bit more support for SEH
unwinding and landing pads than the current revision we have, so adding support
will likely just involve updating LLVM and then adding some shims of our own
here and there.
## dllimport and dllexport
An interesting part of Windows which MSVC forces our hand on (and apparently
MinGW didn't) is the usage of `dllimport` and `dllexport` attributes in LLVM IR
as well as native dependencies (in C these correspond to
`__declspec(dllimport)`).
Whenever a dynamic library is built by MSVC it must have its public interface
specified by functions tagged with `dllexport` or otherwise they're not
available to be linked against. This poses a few problems for the compiler, some
of which are somewhat fundamental, but this commit alters the compiler to attach
the `dllexport` attribute to all LLVM functions that are reachable (e.g. they're
already tagged with external linkage). This is suboptimal for a few reasons:
* If an object file will never be included in a dynamic library, there's no need
to attach the dllexport attribute. Most object files in Rust are not destined
to become part of a dll as binaries are statically linked by default.
* If the compiler is emitting both an rlib and a dylib, the same source object
file is currently used but with MSVC this may be less feasible. The compiler
may be able to get around this, but it may involve some invasive changes to
deal with this.
The flipside of this situation is that whenever you link to a dll and you import
a function from it, the import should be tagged with `dllimport`. At this time,
however, the compiler does not emit `dllimport` for any declarations other than
constants (where it is required), which is again suboptimal for even more
reasons!
* Calling a function imported from another dll without using `dllimport` causes
the linker/compiler to have extra overhead (one `jmp` instruction on x86) when
calling the function.
* The same object file may be used in different circumstances, so a function may
be imported from a dll if the object is linked into a dll, but it may be
just linked against if linked into an rlib.
* The compiler has no knowledge about whether native functions should be tagged
dllimport or not.
For now the compiler takes the perf hit (I do not have any numbers to this
effect) by marking very little as `dllimport` and praying the linker will take
care of everything. Fixing this problem will likely require adding a few
attributes to Rust itself (feature gated at the start) and then strongly
recommending static linkage on Windows! This may also involve shipping a
statically linked compiler on Windows instead of a dynamically linked compiler,
but these sorts of changes are pretty invasive and aren't part of this PR.
## CI integration
Thankfully we don't need to set up a new snapshot bot for the changes made here as our snapshots are freestanding already, we should be able to use the same snapshot to bootstrap both MinGW and MSVC compilers (once a new snapshot is made from these changes).
I plan on setting up a new suite of auto bots which are testing MSVC configurations for now as well, for now they'll just be bootstrapping and not running tests, but once unwinding is implemented they'll start running all tests as well and we'll eventually start gating on them as well.
---
I'd love as many eyes on this as we've got as this was one of my first interactions with MSVC and Visual Studio, so there may be glaring holes that I'm missing here and there!
cc @retep998, @ricky26, @vadimcn, @klutzy
r? @brson
It looks like section names in objects generated by `link.exe` are limited to at
most 8 characters in length, so shorten `.note.rustc` to just `.rustc`
These implementations were intended to be unstable, but currently the stability
attributes cannot handle a stable trait with an unstable `impl` block. This
commit also audits the rest of the standard library for explicitly-`#[unstable]`
impl blocks. No others were removed but some annotations were changed to
`#[stable]` as they're defacto stable anyway.
One particularly interesting `impl` marked `#[stable]` as part of this commit
is the `Add<&[T]>` impl for `Vec<T>`, which uses `push_all` and implicitly
clones all elements of the vector provided.
Closes#24791
I've been working on improving the diagnostic registration system so that it can:
* Check uniqueness of error codes *across the whole compiler*. The current method using `errorck.py` is prone to failure as it relies on simple text search - I found that it breaks when referencing an error's ident within a string (e.g. `"See also E0303"`).
* Provide JSON output of error metadata, to eventually facilitate HTML output, as well as tracking of which errors need descriptions. The current schema is:
```
<error code>: {
"description": <long description>,
"use_site": {
"filename": <filename where error is used>,
"line": <line in file where error is used>
}
}
```
[Here's][metadata-dump] a pretty-printed sample dump for `librustc`.
One thing to note is that I had to move the diagnostics arrays out of the diagnostics modules. I really wanted to be able to capture error usage information, which only becomes available as a crate is compiled. Hence all invocations of `__build_diagnostics_array!` have been moved to the ends of their respective `lib.rs` files. I tried to avoid moving the array by making a plugin that expands to nothing but couldn't invoke it in item position and gave up on hackily generating a fake item. I also briefly considered using a lint, but it seemed like it would impossible to get access to the data stored in the thread-local storage.
The next step will be to generate a web page that lists each error with its rendered description and use site. Simple mapping and filtering of the metadata files also allows us to work out which error numbers are absent, which errors are unused and which need descriptions.
[metadata-dump]: https://gist.github.com/michaelsproul/3246846ff1bea71bd049
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
librustc_driver and libsyntax. This may change pretty-printed
output from these modules in cases involving wide or combining
characters used in filenames, identifiers, etc.
The following functions are marked deprecated:
1. char.width() and str.width():
--> use unicode-width crate
2. str.graphemes() and str.grapheme_indices():
--> use unicode-segmentation crate
3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
char.compose(), char.decompose_canonical(), char.decompose_compatible(),
char.canonical_combining_class():
--> use unicode-normalization crate
This commit stabilizes a few remaining bits of the `io::Error` type:
* The `Error::new` method is now stable. The last `detail` parameter was removed
and the second `desc` parameter was generalized to `E: Into<Box<Error>>` to
allow creating an I/O error from any form of error. Currently there is no form
of downcasting, but this will be added in time.
* An implementation of `From<&str> for Box<Error>` was added to liballoc to
allow construction of errors from raw strings.
* The `Error::raw_os_error` method was stabilized as-is.
* Trait impls for `Clone`, `Eq`, and `PartialEq` were removed from `Error` as it
is not possible to use them with trait objects.
This is a breaking change due to the modification of the `new` method as well as
the removal of the trait implementations for the `Error` type.
[breaking-change]
* Marks `#[stable]` the contents of the `std::convert` module.
* Added methods `PathBuf::as_path`, `OsString::as_os_str`,
`String::as_str`, `Vec::{as_slice, as_mut_slice}`.
* Deprecates `OsStr::from_str` in favor of a new, stable, and more
general `OsStr::new`.
* Adds unstable methods `OsString::from_bytes` and `OsStr::{to_bytes,
to_cstring}` for ergonomic FFI usage.
[breaking-change]
This commit:
* Introduces `std::convert`, providing an implementation of
RFC 529.
* Deprecates the `AsPath`, `AsOsStr`, and `IntoBytes` traits, all
in favor of the corresponding generic conversion traits.
Consequently, various IO APIs now take `AsRef<Path>` rather than
`AsPath`, and so on. Since the types provided by `std` implement both
traits, this should cause relatively little breakage.
* Deprecates many `from_foo` constructors in favor of `from`.
* Changes `PathBuf::new` to take no argument (creating an empty buffer,
as per convention). The previous behavior is now available as
`PathBuf::from`.
* De-stabilizes `IntoCow`. It's not clear whether we need this separate trait.
Closes#22751Closes#14433
[breaking-change]
This commit clarifies some of the unstable features in the `str` module by
moving them out of the blanket `core` and `collections` features.
The following methods were moved to the `str_char` feature which generally
encompasses decoding specific characters from a `str` and dealing with the
result. It is unclear if any of these methods need to be stabilized for 1.0 and
the most conservative route for now is to continue providing them but to leave
them as unstable under a more specific name.
* `is_char_boundary`
* `char_at`
* `char_range_at`
* `char_at_reverse`
* `char_range_at_reverse`
* `slice_shift_char`
The following methods were moved into the generic `unicode` feature as they are
specifically enabled by the `unicode` crate itself.
* `nfd_chars`
* `nfkd_chars`
* `nfc_chars`
* `graphemes`
* `grapheme_indices`
* `width`
This commit stabilizes essentially all of the new `std::path` API. The
API itself is changed in a couple of ways (which brings it in closer
alignment with the RFC):
* `.` components are now normalized away, unless they appear at the
start of a path. This in turn effects the semantics of e.g. asking for
the file name of `foo/` or `foo/.`, both of which yield `Some("foo")`
now. This semantics is what the original RFC specified, and is also
desirable given early experience rolling out the new API.
* The `parent` method is now `without_file` and succeeds if, and only
if, `file_name` is `Some(_)`. That means, in particular, that it fails
for a path like `foo/../`. This change affects `pop` as well.
In addition, the `old_path` module is now deprecated.
[breaking-change]
r? @alexcrichton
This commit stabilizes essentially all of the new `std::path` API. The
API itself is changed in a couple of ways (which brings it in closer
alignment with the RFC):
* `.` components are now normalized away, unless they appear at the
start of a path. This in turn effects the semantics of e.g. asking for
the file name of `foo/` or `foo/.`, both of which yield `Some("foo")`
now. This semantics is what the original RFC specified, and is also
desirable given early experience rolling out the new API.
* The `parent` function now succeeds if, and only if, the path has at
least one non-root/prefix component. This change affects `pop` as
well.
* The `Prefix` component now involves a separate `PrefixComponent`
struct, to better allow for keeping both parsed and unparsed prefix data.
In addition, the `old_path` module is now deprecated.
Closes#23264
[breaking-change]
* Consumers of handle_options assume the unstable options are defined in
the getopts::Matches value if -Z unstable-options is set, but that's not
the case if there weren't any actual unstable options. Fix this by
always reparsing options when -Z unstable-options is set.
* If both argument parsing attempts fail, print the error from the second
attempt rather than the first. The error from the first is very poor
whenever unstable options are present. e.g.:
$ rustc hello.rs -Z unstable-options --show-span
error: Unrecognized option: 'show-span'.
$ rustc hello.rs -Z unstable-options --pretty --pretty
error: Unrecognized option: 'pretty'.
$ rustc hello.rs -Z unstable-options --pretty --bad-option
error: Unrecognized option: 'pretty'.
* On the second parse, add a separate pass to reject unstable options if
-Z unstable-options wasn't specified.
Fixes#21715.
r? @pnkfelix
* Consumers of handle_options assume the unstable options are defined in
the getopts::Matches value if -Z unstable-options is set, but that's not
the case if there weren't any actual unstable options. Fix this by
always reparsing options when -Z unstable-options is set.
* If both argument parsing attempts fail, print the error from the second
attempt rather than the first. The error from the first is very poor
whenever unstable options are present. e.g.:
$ rustc hello.rs -Z unstable-options --show-span
error: Unrecognized option: 'show-span'.
$ rustc hello.rs -Z unstable-options --pretty --pretty
error: Unrecognized option: 'pretty'.
$ rustc hello.rs -Z unstable-options --pretty --bad-option
error: Unrecognized option: 'pretty'.
* On the second parse, add a separate pass to reject unstable options if
-Z unstable-options wasn't specified.
Fixes#21715.
r? @pnkfelix
This commit performs another pass over the `std::char` module for stabilization.
Some minor cleanup is performed such as migrating documentation from libcore to
libunicode (where the `std`-facing trait resides) as well as a slight
reorganiation in libunicode itself. Otherwise, the stability modifications made
are:
* `char::from_digit` is now stable
* `CharExt::is_digit` is now stable
* `CharExt::to_digit` is now stable
* `CharExt::to_{lower,upper}case` are now stable after being modified to return
an iterator over characters. While the implementation today has not changed
this should allow us to implement the full set of case conversions in unicode
where some characters can map to multiple when doing an upper or lower case
mapping.
* `StrExt::to_{lower,upper}case` was added as unstable for a convenience of not
having to worry about characters expanding to more characters when you just
want the whole string to get into upper or lower case.
This is a breaking change due to the change in the signatures of the
`CharExt::to_{upper,lower}case` methods. Code can be updated to use functions
like `flat_map` or `collect` to handle the difference.
[breaking-change]
This commit performs a stabilization pass over the `std::fs` module now that
it's had some time to bake. The change was largely just adding `#[stable]` tags,
but there are a few APIs that remain `#[unstable]`.
The following apis are now marked `#[stable]`:
* `std::fs` (the name)
* `File`
* `Metadata`
* `ReadDir`
* `DirEntry`
* `OpenOptions`
* `Permissions`
* `File::{open, create}`
* `File::{sync_all, sync_data}`
* `File::set_len`
* `File::metadata`
* Trait implementations for `File` and `&File`
* `OpenOptions::new`
* `OpenOptions::{read, write, append, truncate, create}`
* `OpenOptions::open` - this function was modified, however, to not attempt to
reject cross-platform openings of directories. This means that some platforms
will succeed in opening a directory and others will fail.
* `Metadata::{is_dir, is_file, len, permissions}`
* `Permissions::{readonly, set_readonly}`
* `Iterator for ReadDir`
* `DirEntry::path`
* `remove_file` - like with `OpenOptions::open`, the extra windows code to
remove a readonly file has been removed. This means that removing a readonly
file will succeed on some platforms but fail on others.
* `metadata`
* `rename`
* `copy`
* `hard_link`
* `soft_link`
* `read_link`
* `create_dir`
* `create_dir_all`
* `remove_dir`
* `remove_dir_all`
* `read_dir`
The following apis remain `#[unstable]`.
* `WalkDir` and `walk` - there are many methods by which a directory walk can be
constructed, and it's unclear whether the current semantics are the right
ones. For example symlinks are not handled super well currently. This is now
behind a new `fs_walk` feature.
* `File::path` - this is an extra abstraction which the standard library
provides on top of what the system offers and it's unclear whether we should
be doing so. This is now behind a new `file_path` feature.
* `Metadata::{accessed, modified}` - we do not currently have a good
abstraction for a moment in time which is what these APIs should likely be
returning, so these remain `#[unstable]` for now. These are now behind a new
`fs_time` feature
* `set_file_times` - like with `Metadata::accessed`, we do not currently have
the appropriate abstraction for the arguments here so this API remains
unstable behind the `fs_time` feature gate.
* `PathExt` - the precise set of methods on this trait may change over time and
some methods may be removed. This API remains unstable behind the `path_ext`
feature gate.
* `set_permissions` - we may wish to expose a more granular ability to set the
permissions on a file instead of just a blanket \"set all permissions\" method.
This function remains behind the `fs` feature.
The following apis are now `#[deprecated]`
* The `TempDir` type is now entirely deprecated and is [located on
crates.io][tempdir] as the `tempdir` crate with [its source][github] at
rust-lang/tempdir.
[tempdir]: https://crates.io/crates/tempdir
[github]: https://github.com/rust-lang/tempdir
The stability of some of these APIs has been questioned over the past few weeks
in using these APIs, and it is intentional that the majority of APIs here are
marked `#[stable]`. The `std::fs` module has a lot of room to grow and the
material is [being tracked in a RFC issue][rfc-issue].
[rfc-issue]: rust-lang/rfcs#939
Closes#22879
[breaking-change]
The new `io` module has had some time to bake and this commit stabilizes some of
the utilities associated with it. This commit also deprecates a number of
`std::old_io::util` functions and structures.
These items are now `#[stable]`
* `Cursor`
* `Cursor::{new, into_inner, get_ref, get_mut, position, set_position}`
* Implementations of I/O traits for `Cursor<T>`
* Delegating implementations of I/O traits for references and `Box` pointers
* Implementations of I/O traits for primitives like slices and `Vec<T>`
* `ReadExt::bytes`
* `Bytes` (and impls)
* `ReadExt::chain`
* `Chain` (and impls)
* `ReadExt::take` (and impls)
* `BufReadExt::lines`
* `Lines` (and impls)
* `io::copy`
* `io::{empty, Empty}` (and impls)
* `io::{sink, Sink}` (and impls)
* `io::{repeat, Repeat}` (and impls)
These items remain `#[unstable]`
* Core I/O traits. These may want a little bit more time to bake along with the
commonly used methods like `read_to_end`.
* `BufReadExt::split` - this function may be renamed to not conflict with
`SliceExt::split`.
* `Error` - there are a number of questions about its representation,
`ErrorKind`, and usability.
These items are now `#[deprecated]` in `old_io`
* `LimitReader` - use `take` instead
* `NullWriter` - use `io::sink` instead
* `ZeroReader` - use `io::repeat` instead
* `NullReader` - use `io::empty` instead
* `MultiWriter` - use `broadcast` instead
* `ChainedReader` - use `chain` instead
* `TeeReader` - use `tee` instead
* `copy` - use `io::copy` instead
[breaking-change]