This PR rewrites the code that previously relied on `PATH_MAX`.
On my tests, even though the user gives the buffer length explicitly, both Linux's glibc and OS X's libc seems to obey the hard limit of `PATH_MAX` internally. So, to truly remove the limitation of `PATH_MAX`, the related system calls should be rewritten from scratch in Rust, which this PR does not try to do.
However, eliminating the need of `PATH_MAX` is still a good idea for various reasons, such as: (1) they might change the implementation in the future, and (2) some platforms don't have a hard-coded `PATH_MAX`, such as GNU Hurd.
More details are in the commit messages.
Fixes#27454.
r? @alexcrichton
The major change here is in the tiny commit at the end and makes it so that we no longer emit lifetime intrinsics for allocas for function arguments. They are live for the whole function anyway, so the intrinsics add no value. This makes the resulting IR more clear, and reduces the peak memory usage and LLVM times by about 1-4%, depending on the crate.
The remaining changes are just preparatory cleanups and fixes for missing lifetime intrinsics.
Make `std::sys::os::getcwd` call `Vec::reserve(1)` followed by
`Vec::set_len` to double the buffer. This is to align with other similar
functions, such as:
- `std::sys_common::io::read_to_end_uninitialized`
- `std::sys::fs::readlink`
Also, reduce the initial buffer size from 2048 to 512. The previous size was
introduced with 4bc26ce in 2013, but it seems a bit excessive. This is
probably because buffer doubling was not implemented back then.
- Rewrite `std::sys::fs::readlink` not to rely on `PATH_MAX`
It currently has the following problems:
1. It uses `_PC_NAME_MAX` to query the maximum length of a file path in
the underlying system. However, the meaning of the constant is the
maximum length of *a path component*, not a full path. The correct
constant should be `_PC_PATH_MAX`.
2. `pathconf` *may* fail if the referred file does not exist. This can
be problematic if the file which the symbolic link points to does not
exist, but the link itself does exist. In this case, the current
implementation resorts to the hard-coded value of `1024`, which is not
ideal.
3. There may exist a platform where there is no limit on file path
lengths in general. That's the reaon why GNU Hurd doesn't define
`PATH_MAX` at all, in addition to having `pathconf` always returning
`-1`. In these platforms, the content of the symbolic link can be
silently truncated if the length exceeds the hard-coded limit mentioned
above.
4. The value obtained by `pathconf` may be outdated at the point of
actually calling `readlink`. This is inherently racy.
This commit introduces a loop that gradually increases the length of the
buffer passed to `readlink`, eliminating the need of `pathconf`.
- Remove the arbitrary memory limit of `std::sys::fs::realpath`
As per POSIX 2013, `realpath` will return a malloc'ed buffer if the
second argument is a null pointer.[1]
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html
- Comment on functions that are still using `PATH_MAX`
There are some functions that only work in terms of `PATH_MAX`, such as
`F_GETPATH` in OS X. Comments on them for posterity.
The implementation of the remainder operation belongs to
librustc_trans, but it is also stubbed out in libcore in order to
expose it as a trait on primitive types. Instead of exposing some
implementation details (like the upcast to `f64` in MSVC), use a
minimal implementation just like that of the `Div` trait.
The old code is temporarily needed in order to keep the MSVC build
working. It should be possible to remove this code after the bootstrap
compiler is updated to contain the MSVC workaround from #27875.
This does cause some breakage due to deficiencies in resolve -
`path::Components` is both an `Iterator` and implements `Eq`, `Ord`,
etc. If one calls e.g. `partial_cmp` on a `Components` and passes a
`&Components` intending to target the `PartialOrd` impl, the compiler
will select the `partial_cmp` from `Iterator` and then error out. I
doubt anyone will run into breakage from `Components` specifically, but
we should see if there are third party types that will run into issues.
`iter::order::equals` wasn't moved to `Iterator` since it's exactly the
same as `iter::order::eq` but with an `Eq` instead of `PartialEq` bound,
which doensn't seem very useful.
I also updated `le`, `gt`, etc to use `partial_cmp` which lets us drop
the extra `PartialEq` bound.
cc #27737
r? @alexcrichton
This does cause some breakage due to deficiencies in resolve -
`path::Components` is both an `Iterator` and implements `Eq`, `Ord`,
etc. If one calls e.g. `partial_cmp` on a `Components` and passes a
`&Components` intending to target the `PartialOrd` impl, the compiler
will select the `partial_cmp` from `Iterator` and then error out. I
doubt anyone will run into breakage from `Components` specifically, but
we should see if there are third party types that will run into issues.
`iter::order::equals` wasn't moved to `Iterator` since it's exactly the
same as `iter::order::eq` but with an `Eq` instead of `PartialEq` bound,
which doensn't seem very useful.
I also updated `le`, `gt`, etc to use `partial_cmp` which lets us drop
the extra `PartialEq` bound.
cc #27737
This fixes the case where we try to re-build & re-install rust to the
same prefix (without uninstalling) while using an llvm-root that is the
same as the prefix.
Without this, builds like that fail with:
'error: multiple dylib candidates for `std` found'
See https://github.com/jmesmon/meta-rust/issues/6 for some details.
May also be related to #20342.
* Rename `Utf16Items` to `Utf16Decoder`. "Items" is meaningless.
* Generalize it to any `u16` iterator, not just `[u16].iter()`
* Make it yield `Result` instead of a custom `Utf16Item` enum that was isomorphic to `Result`. This enable using the `FromIterator for Result` impl.
* Replace `Utf16Item::to_char_lossy` with a `Utf16Decoder::lossy` iterator adaptor.
This is a [breaking change], but only for users of the unstable `rustc_unicode` crate.
I’d like this functionality to be stabilized and re-exported in `std` eventually, as the "low-level equivalent" of `String::from_utf16` and `String::from_utf16_lossy` like #27784 is the low-level equivalent of #27714.
CC @aturon, @alexcrichton
* Suppresses warnings that main is unused when testing (#12327)
* Makes `--test` work with explicit `#[start]` (#11766)
* Fixes some cases where the normal main would not be disabled by `--test`, resulting in compilation failures.
This fixes the case where we try to re-build & re-install rust to the
same prefix (without uninstalling) while using an llvm-root that is the
same as the prefix.
Without this, builds like that fail with:
'error: multiple dylib candidates for `std` found'
See https://github.com/jmesmon/meta-rust/issues/6 for some details.
May also be related to #20342.
These have been removed and should not be documented here.
Should the replacement crates on crates.io be linked to, or is that not wanted in the core docs?
Correct iterator adaptor Chain
The iterator protocol specifies that the iteration ends with the return
value `None` from `.next()` (or `.next_back()`) and it is unspecified
what further calls return. The chain adaptor must account for this in
its DoubleEndedIterator implementation.
It uses three states:
- Both `a` and `b` are valid
- Only the Front iterator (`a`) is valid
- Only the Back iterator (`b`) is valid
The fourth state (neither iterator is valid) only occurs after Chain has
returned None once, so we don't need to store this state.
Fixes#26316
The iterator protocol specifies that the iteration ends with the return
value `None` from `.next()` (or `.next_back()`) and it is unspecified
what further calls return. The chain adaptor must account for this in
its DoubleEndedIterator implementation.
It uses three states:
- Both `a` and `b` are valid
- Only the Front iterator (`a`) is valid
- Only the Back iterator (`b`) is valid
The fourth state (neither iterator is valid) only occurs after Chain has
returned None once, so we don't need to store this state.
Fixes#26316
Function arguments are live for the whole function scope, so adding
lifetime intrinsics around them adds no value. The same is true for drop
hint allocas and everything else that goes directly through
lvalue_scratch_datum. So the easiest fix is to emit lifetime intrinsics
only for lvalue datums that are created in to_lvalue_datum_in_scope().
The reduces peak memory usage and LLVM times by about 1-4%, depending on
the crate.
Combining them seemed like a good idea at the time, but turns out that
handling lifetimes separately makes it somewhat easier to handle cases
where we don't want the intrinsics, and let's you see more easily where
the start/end pairs are.
The issues that the comments referred to were fixed before the PR even
landed but we never got around to remove the hack of skipping the
lifetime start.
According to https://msdn.microsoft.com/en-us/library/windows/desktop/ms679351(v=vs.85).aspx:
> If the function succeeds, the return value is the number of TCHARs stored in the output buffer,
> excluding the terminating null character.
_**Completely untested**_… since I have no Windows machine or anything of a sort to test this on.
r? @aturon