c245c5bbad
Enable parallel codegen (2 units) by default when --opt-level is 0 or 1. This gives a minor speedup on large crates (~10%), with only a tiny slowdown (~2%) for small ones (which usually build in under a second regardless). The previous default (no parallelization) still applies when the user requests optimization (--opt-level 2 or 3) or has enabled LTO (which is incompatible with parallel codegen).

This commit also changes the Rust build system to use parallel codegen when appropriate. This means codegen-units=4 for stage0 always, and also for stage1 and stage2 when configured with --disable-optimize. (Other settings use codegen-units=1 for stage1 and stage2, to get maximum performance for release binaries.)

The build system also sets codegen-units=1 for compiletest tests (compiletest does its own parallelization) and uses the same setting as stage2 for crate tests.
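
For reference, the selection rules described in this message can be sketched roughly as follows. This is an illustration only, not the code changed by this commit: the names default_codegen_units, BuildTarget, and build_codegen_units are hypothetical, and the real logic lives in the compiler's option handling and in the makefiles.

```rust
/// Hypothetical helper: the number of codegen units the compiler picks
/// when the user has not specified one explicitly.
fn default_codegen_units(opt_level: u8, lto: bool) -> usize {
    // LTO is incompatible with parallel codegen, so it always gets one unit.
    if lto {
        return 1;
    }
    match opt_level {
        // Unoptimized or lightly optimized builds: favor compile time.
        0 | 1 => 2,
        // --opt-level 2 or 3: keep a single unit, as before.
        _ => 1,
    }
}

/// Hypothetical model of what the build system is compiling.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BuildTarget {
    Stage(u8),
    CompiletestTests,
    CrateTests,
}

/// Hypothetical helper: the codegen-units value the build system passes.
/// `optimize` is false when configured with --disable-optimize.
fn build_codegen_units(target: BuildTarget, optimize: bool) -> usize {
    match target {
        // stage0 always uses parallel codegen.
        BuildTarget::Stage(0) => 4,
        // stage1 and stage2 use one unit unless optimization is disabled.
        BuildTarget::Stage(_) => {
            if optimize {
                1
            } else {
                4
            }
        }
        // compiletest does its own parallelization.
        BuildTarget::CompiletestTests => 1,
        // Crate tests follow the stage2 setting.
        BuildTarget::CrateTests => build_codegen_units(BuildTarget::Stage(2), optimize),
    }
}
```

Under these assumptions, default_codegen_units(0, false) yields 2, while enabling LTO or requesting --opt-level 2 or 3 yields 1.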