As a follow-up to PR #26849, improve one more I/O location where
we can use `Vec::resize` for better performance when zeroing
buffers.
Use the `vec![elt; n]` macro everywhere we can in the tree. It replaces
`repeat(elt).take(n).collect()`, which is more verbose, requires type
hints, and currently produces worse code. `vec![]` is preferable for vector
initialization.
The `vec![]` replacement also touches one I/O path, `Stdin::read`
on Windows, which should see a small improvement.
r? @alexcrichton
The common pattern `iter::repeat(elt).take(n).collect::<Vec<_>>()` is
exactly equivalent to `vec![elt; n]`, so do this replacement in the whole
tree.
(Strictly speaking, `vec![]` is smart enough to call `clone` only n - 1 times,
while the former pattern calls it n times, but this difference is
virtually irrelevant in practice.)
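To make the equivalence and the clone-count remark concrete, here is a small sketch (not code from the PR) that builds the same vector both ways with a hypothetical `Counted` type whose `Clone` impl bumps a thread-local counter. The n - 1 figure for `vec![]` reflects how current std implements it (clone n - 1 times, then move the original into the last slot); treat the exact counts as an implementation detail rather than a guarantee.

```rust
use std::cell::Cell;
use std::iter;

thread_local! {
    // Per-thread counter of how many times `Counted::clone` ran.
    static CLONES: Cell<usize> = Cell::new(0);
}

// Hypothetical element type used only to observe clone calls.
#[derive(Debug, PartialEq)]
struct Counted(u32);

impl Clone for Counted {
    fn clone(&self) -> Self {
        CLONES.with(|c| c.set(c.get() + 1));
        Counted(self.0)
    }
}

/// Runs `f`, returning the vector it built and the number of clones it made.
fn clones_during<F: FnOnce() -> Vec<Counted>>(f: F) -> (Vec<Counted>, usize) {
    CLONES.with(|c| c.set(0));
    let v = f();
    (v, CLONES.with(|c| c.get()))
}

fn main() {
    let n = 5;
    // Old pattern: an iterator that clones its element on every `next`.
    let (a, via_iter) = clones_during(|| iter::repeat(Counted(7)).take(n).collect());
    // New pattern: the `vec!` macro.
    let (b, via_macro) = clones_during(|| vec![Counted(7); n]);
    assert_eq!(a, b);
    println!("repeat/take/collect: {} clones, vec![]: {} clones", via_iter, via_macro);
}
```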
Since the lengths of the slices are known, we can use a counted loop
instead of iterators. This means we need only a single counter, instead
of incrementing and checking one pointer per iterator.
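The idea above can be sketched as follows. This is an illustrative version, not the exact libcore code: after the up-front length check, one index `i` bounds both slices, whereas the `zip`-based variant (included for comparison) advances and checks two iterators. The function names are hypothetical.

```rust
/// Slice equality with a single counted loop.
fn slice_eq<A: PartialEq<B>, B>(a: &[A], b: &[B]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // One counter drives both slices; the lengths are already known equal.
    for i in 0..a.len() {
        if a[i] != b[i] {
            return false;
        }
    }
    true
}

/// Matching inequality: returns early on the first differing element.
fn slice_ne<A: PartialEq<B>, B>(a: &[A], b: &[B]) -> bool {
    if a.len() != b.len() {
        return true;
    }
    for i in 0..a.len() {
        if a[i] != b[i] {
            return true;
        }
    }
    false
}

/// Iterator-based version for comparison: `zip` steps two iterators.
fn slice_eq_zip<A: PartialEq<B>, B>(a: &[A], b: &[B]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x == y)
}

fn main() {
    let x = vec![1u64; 100_000];
    let mut y = x.clone();
    println!("equal: {}", slice_eq(&x, &y));
    y[99_999] = 2;
    println!("not equal: {}, zip agrees: {}", slice_ne(&x, &y), slice_eq_zip(&x, &y) == slice_eq(&x, &y));
}
```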
Benchmarks comparing vectors with 100,000 elements:
Before:
```
running 8 tests
test eq1_u8 ... bench: 66,757 ns/iter (+/- 113)
test eq2_u16 ... bench: 111,267 ns/iter (+/- 149)
test eq3_u32 ... bench: 126,282 ns/iter (+/- 111)
test eq4_u64 ... bench: 126,418 ns/iter (+/- 155)
test ne1_u8 ... bench: 88,990 ns/iter (+/- 161)
test ne2_u16 ... bench: 89,126 ns/iter (+/- 265)
test ne3_u32 ... bench: 96,901 ns/iter (+/- 92)
test ne4_u64 ... bench: 96,750 ns/iter (+/- 137)
```
After:
```
running 8 tests
test eq1_u8 ... bench: 46,413 ns/iter (+/- 521)
test eq2_u16 ... bench: 46,500 ns/iter (+/- 74)
test eq3_u32 ... bench: 50,059 ns/iter (+/- 92)
test eq4_u64 ... bench: 54,001 ns/iter (+/- 92)
test ne1_u8 ... bench: 47,595 ns/iter (+/- 53)
test ne2_u16 ... bench: 47,521 ns/iter (+/- 59)
test ne3_u32 ... bench: 44,889 ns/iter (+/- 74)
test ne4_u64 ... bench: 47,775 ns/iter (+/- 68)
```
Fixes #23302.
Note that there's an odd situation regarding the following, most likely due to some inadequacy in `const_eval`:
```rust
enum Y {
    A = 1usize,
    B,
}
```
In this case, `Y::B as usize` might be considered a constant expression in some cases but not in others. (See #23513 for a related problem, where an enum with only one variant and no discriminant does not behave well as a constant expression either.)
Most of the complexity in this PR is future-proofing: it ensures that once `Y::B as usize` is fully treated as a constant expression, it cannot be used to set `Y::A`, and thus, indirectly, itself.
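For reference, a sketch of how the example above behaves on a modern compiler, where `Y::B as usize` is usable in constant contexts. It assumes `#[repr(usize)]` is added (not in the original) so that `1usize` is accepted as a discriminant; the names `B_DISCR` and `TABLE` are hypothetical. `B` receives the implicit discriminant `A + 1`.

```rust
// Assumption: `#[repr(usize)]` makes the discriminant type `usize`.
#[repr(usize)]
enum Y {
    A = 1usize,
    B, // implicitly 2
}

// `Y::B as usize` used as a constant expression: in a `const` item...
const B_DISCR: usize = Y::B as usize;
// ...and as an array length, which must be evaluated at compile time.
const TABLE: [u8; Y::B as usize] = [0; 2];

fn main() {
    println!("Y::A = {}, Y::B = {}, TABLE.len() = {}", Y::A as usize, B_DISCR, TABLE.len());
}
```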
This commit alters the implementation of multiple codegen units slightly to be
compatible with the MSVC linker. Currently the implementation will take the N
object files created by each codegen unit and will run `ld -r` to create a new
object file which is then passed along. The MSVC linker, however, is not able to
do this operation.
The compiler will no longer attempt to assemble object files together but
will instead just pass through all the object files as usual. This implies that
rlibs may now contain more than one object file (if the library is compiled with
more than one codegen unit) and that the output of `-C save-temps` changes
slightly, as object files with the extension `0.o` will not be renamed to `o`
unless requested otherwise.
This commit starts passing the `--whole-archive` flag (`-force_load` on OSX) to
the linker when linking rlibs into dylibs. The primary purpose of this commit is
to ensure that the linker doesn't strip out objects from an archive when
creating a dynamic library. Information on how this can go wrong can be found in
issues #14344 and #25185.
The unfortunate part about passing this flag to the linker is that we have to
preprocess the rlib to remove the metadata and compressed bytecode found within.
This means that creating a dylib will now take longer to link as we've got to
copy around the input rlibs to a temporary location, modify them, and then
invoke the linker. This isn't done for executables, however, so the "hello
world" compile time is not affected.
This fix was prompted by the previous commit, after which rlibs may contain
multiple object files instead of one when more than one codegen unit is used.
That change prevented the main distribution from being compiled with more than
one codegen unit, and this commit fixes that.
Closes #14344
Closes #25185
Improve zerofill in Vec::resize and Read::read_to_end
We needed a more efficient way to zerofill the vector in `read_to_end`,
to reduce the memory initialization overhead to a minimum.
Use the implementation of `std::vec::from_elem` (used by the `vec![]`
macro) for `Vec::resize` as well. For simple element types like `u8`, this
compiles to `memset`, which makes `Vec::resize` much more efficient.
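As an illustration of where that zerofill happens, here is a simplified sketch of a `read_to_end`-style loop. It is not the std implementation: the helper name `read_to_end_sketch` and the growth policy (start at 16 bytes, double up to 8 KiB) are assumptions. Each time the buffer is full, `Vec::resize(len + chunk, 0)` zero-fills the new tail so `read` only ever sees initialized bytes; the unused tail is truncated at the end.

```rust
use std::io::{self, Read};

/// Hypothetical helper: appends everything from `r` to `buf`, returning the
/// number of bytes read.
fn read_to_end_sketch<R: Read>(r: &mut R, buf: &mut Vec<u8>) -> io::Result<usize> {
    let start = buf.len();
    let mut len = start; // bytes of `buf` holding real data
    let mut chunk = 16; // assumed initial growth step
    loop {
        if len == buf.len() {
            // Zero-fill the newly exposed region (for u8, typically a memset).
            buf.resize(len + chunk, 0);
            chunk = chunk.saturating_mul(2).min(8192);
        }
        match r.read(&mut buf[len..]) {
            Ok(0) => {
                buf.truncate(len); // drop the unused zeroed tail
                return Ok(len - start);
            }
            Ok(n) => len += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => {
                buf.truncate(len);
                return Err(e);
            }
        }
    }
}

fn main() -> io::Result<()> {
    let mut src: &[u8] = b"hello world";
    let mut buf = Vec::new();
    let n = read_to_end_sketch(&mut src, &mut buf)?;
    println!("read {} bytes: {:?}", n, String::from_utf8_lossy(&buf));
    Ok(())
}
```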
Use the `vec![]` macro directly to create a sized, zeroed vector.
This should result in a big speedup when creating `BufReader`, because
`vec![0; cap]` compiles to a `memset` call, while the previous `extend`-based
code did not.
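A sketch contrasting the two ways of building the buffer (the function names and the 8 KiB capacity are hypothetical). Both produce identical vectors; the difference described above is in the generated code at the time, and a current compiler may optimize the `extend` form better than it did then.

```rust
use std::iter;

/// Previous style: reserve capacity, then push `cap` zeros through an iterator.
fn zeroed_via_extend(cap: usize) -> Vec<u8> {
    let mut buf = Vec::with_capacity(cap);
    buf.extend(iter::repeat(0u8).take(cap));
    buf
}

/// New style: a single `vec![0; cap]`, yielding a sized, zeroed buffer.
fn zeroed_via_macro(cap: usize) -> Vec<u8> {
    vec![0; cap]
}

fn main() {
    let cap = 8 * 1024; // hypothetical buffer capacity
    let a = zeroed_via_extend(cap);
    let b = zeroed_via_macro(cap);
    assert_eq!(a, b);
    println!("both buffers hold {} zero bytes", b.len());
}
```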
This reverts https://github.com/rust-lang/rust/pull/26599, which put the stage number in the output of `--version -v`. It was not supposed to put it in the 'stage2' compiler, which is what most people mean by the binary we deploy.
The picture is less clear, though, because of how stage 'promotions' happen in the build, and because the Windows build deploys stage3, not stage2.
cc @richo
I am not mentioning `#[unsafe_drop_flag]` because it should go away
eventually, and because it is just an attribute, not really a use of
the `unsafe` keyword.
Fixes #26345