rust/compiler/rustc_codegen_gcc
bors a77322c16f Auto merge of #118310 - scottmcm:three-way-compare, r=davidtwco
Add `Ord::cmp` for primitives as a `BinOp` in MIR

Update: most of this OP was written months ago.  See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.

---

There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches.  Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:

1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.

Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic.  Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical.  Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)?  But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)?  And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers.  Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.

As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR.  The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks.  Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues.  (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)

---

r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
2024-04-02 19:21:44 +00:00
..
.github/workflows Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
build_sysroot Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
build_system Correctly handle cargo_target_dir 2024-03-06 16:24:02 +01:00
deps Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
doc Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
example Codegen const panic messages as function calls 2024-03-22 09:55:50 -04:00
patches stabilize ptr.is_aligned, move ptr.is_aligned_to to a new feature gate 2024-03-29 19:59:46 -04:00
src Auto merge of #118310 - scottmcm:three-way-compare, r=davidtwco 2024-04-02 19:21:44 +00:00
tests Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
tools
.gitignore Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
.ignore Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
.rustfmt.toml Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
build.rs Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
Cargo.lock Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
Cargo.toml Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
config.example.toml Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
libgccjit.version Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
LICENSE-APACHE
LICENSE-MIT
messages.ftl Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
Readme.md Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
rust-toolchain Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00
y.sh Merge commit 'b385428e3ddf330805241e7758e773f933357c4b' into subtree-update_cg_gcc_2024-03-05 2024-03-05 19:58:36 +01:00

WIP libgccjit codegen backend for rust

Chat on IRC Chat on Matrix

This is a GCC codegen for rustc, which means it can be loaded by the existing rustc frontend, but benefits from GCC: more architectures are supported and GCC's optimizations are used.

Despite its name, libgccjit can be used for ahead-of-time compilation, as is used here.

Motivation

The primary goal of this project is to be able to compile Rust code on platforms unsupported by LLVM. A secondary goal is to check if using the gcc backend will provide any run-time speed improvement for the programs compiled using rustc.

Building

This requires a patched libgccjit in order to work. You need to use my fork of gcc which already includes these patches.

$ cp config.example.toml config.toml

If don't need to test GCC patches you wrote in our GCC fork, then the default configuration should be all you need. You can update the rustc_codegen_gcc without worrying about GCC.

Building with your own GCC version

If you wrote a patch for GCC and want to test it without this backend, you will need to do a few more things.

To build it (most of these instructions come from here, so don't hesitate to take a look there if you encounter an issue):

$ git clone https://github.com/antoyo/gcc
$ sudo apt install flex libmpfr-dev libgmp-dev libmpc3 libmpc-dev
$ mkdir gcc-build gcc-install
$ cd gcc-build
$ ../gcc/configure \
    --enable-host-shared \
    --enable-languages=jit \
    --enable-checking=release \ # it enables extra checks which allow to find bugs
    --disable-bootstrap \
    --disable-multilib \
    --prefix=$(pwd)/../gcc-install
$ make -j4 # You can replace `4` with another number depending on how many cores you have.

If you want to run libgccjit tests, you will need to also enable the C++ language in the configure:

--enable-languages=jit,c++

Then to run libgccjit tests:

$ cd gcc # from the `gcc-build` folder
$ make check-jit
# To run one specific test:
$ make check-jit RUNTESTFLAGS="-v -v -v jit.exp=jit.dg/test-asm.cc"

Put the path to your custom build of libgccjit in the file config.toml.

You now need to set the gcc-path value in config.toml with the result of this command:

$ dirname $(readlink -f `find . -name libgccjit.so`)

and to comment the download-gccjit setting:

gcc-path = "[MY PATH]"
# download-gccjit = true

Then you can run commands like this:

$ ./y.sh prepare # download and patch sysroot src and install hyperfine for benchmarking
$ ./y.sh build --release

To run the tests:

$ ./y.sh test --release

Usage

$CG_GCCJIT_DIR is the directory you cloned this repo into in the following instructions:

export CG_GCCJIT_DIR=[the full path to rustc_codegen_gcc]

Cargo

$ CHANNEL="release" $CG_GCCJIT_DIR/y.sh cargo run

If you compiled cg_gccjit in debug mode (aka you didn't pass --release to ./y.sh test) you should use CHANNEL="debug" instead or omit CHANNEL="release" completely.

LTO

To use LTO, you need to set the variable FAT_LTO=1 and EMBED_LTO_BITCODE=1 in addition to setting lto = "fat" in the Cargo.toml. Don't set FAT_LTO when compiling the sysroot, though: only set EMBED_LTO_BITCODE=1.

Failing to set EMBED_LTO_BITCODE will give you the following error:

error: failed to copy bitcode to object file: No such file or directory (os error 2)

Rustc

You should prefer using the Cargo method.

$ LIBRARY_PATH="[gcc-path value]" LD_LIBRARY_PATH="[gcc-path value]" rustc +$(cat $CG_GCCJIT_DIR/rust-toolchain | grep 'channel' | cut -d '=' -f 2 | sed 's/"//g' | sed 's/ //g') -Cpanic=abort -Zcodegen-backend=$CG_GCCJIT_DIR/target/release/librustc_codegen_gcc.so --sysroot $CG_GCCJIT_DIR/build_sysroot/sysroot my_crate.rs

Env vars

CG_GCCJIT_INCR_CACHE_DISABLED
Don't cache object files in the incremental cache. Useful during development of cg_gccjit to make it possible to use incremental mode for all analyses performed by rustc without caching object files when their content should have been changed by a change to cg_gccjit.
CG_GCCJIT_DISPLAY_CG_TIME
Display the time it took to perform codegen for a crate
CG_RUSTFLAGS
Send additional flags to rustc. Can be used to build the sysroot without unwinding by setting `CG_RUSTFLAGS=-Cpanic=abort`.
CG_GCCJIT_DUMP_TO_FILE
Dump a C-like representation to /tmp/gccjit_dumps and enable debug info in order to debug this C-like representation.

Extra documentation

More specific documentation is available in the doc folder:

Licensing

While this crate is licensed under a dual Apache/MIT license, it links to libgccjit which is under the GPLv3+ and thus, the resulting toolchain (rustc + GCC codegen) will need to be released under the GPL license.

However, programs compiled with rustc_codegen_gcc do not need to be released under a GPL license.