Optimize some functions in std::process
* Be sure that `read_to_end` gets directed towards `read_to_end_uninitialized` for all handles
* When spawning a child that guaranteed doesn't need a stdin, don't actually create a stdin pipe for that process, instead just redirect it to /dev/null
* When calling `wait_with_output`, don't spawn threads to read out the pipes of the child. Instead drain all pipes on the calling thread and *then* wait on the process.
Functionally, it is intended that nothing changes as part of this PR
---
Note that this was the same as #31613, and even after that it turned out that fixing Windows was easier than I thought! To copy a comment from over there:
> As some rationale for this as well, it's always bothered me that we've spawned threads in the standard library for this (seems a bit overkill), and I've also been curious lately as to our why our build times for Windows are so much higher than Unix (on the buildbots we have). I have done basically 0 investigation into why, but I figured it can't help to try to optimize Command::output which I believe is called quite a few times during the test suite.