Mark libserialize functions as inline
Got to thinking: "what if that big pile of tiny functions isn't inlining as it should?"
So, a few `replace-regex` passes later, the local perf run says this:
<details>
![](https://i.imgur.com/gvdJEgG.png)
</details>
Not huge, but still a win, which is interesting. I'd like to verify it with a real perf run, but I understand there's a backlog.
I didn't notice any increase in compile time or binary sizes for rustc/libs.
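For reference, a minimal sketch of the kind of change involved (the type and method names below are hypothetical, not the actual `libserialize` items): adding `#[inline]` to small, single-expression serialization helpers, which otherwise are generally not candidates for cross-crate inlining outside of LTO or generic instantiation.

```rust
// Hypothetical example: tiny forwarding methods like these benefit from
// `#[inline]`, since without the attribute their bodies are not available
// for inlining in downstream crates.
pub struct Encoder {
    buf: Vec<u8>,
}

impl Encoder {
    #[inline]
    pub fn emit_u8(&mut self, v: u8) {
        self.buf.push(v);
    }

    #[inline]
    pub fn emit_bool(&mut self, v: bool) {
        self.emit_u8(v as u8);
    }
}

fn main() {
    let mut e = Encoder { buf: Vec::new() };
    e.emit_bool(true);
    assert_eq!(e.buf, vec![1]);
}
```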
Two small improvements
In `librustc_apfloat/ieee.rs`, use the `Iterator::[r]find` methods to simplify the code. In `libserialize/json.rs`, use the fact that `Vec::last` returns `None` for an empty `Vec` to collapse the code into a single `match`.
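A hedged sketch of both patterns on made-up data (the values are illustrative, not the actual code from those files):

```rust
fn main() {
    // `Iterator::find` / `rfind` return the first (or last) element matching
    // a predicate, replacing a hand-written search loop.
    let digits = [0u32, 0, 3, 0, 7, 0];
    assert_eq!(digits.iter().find(|&&d| d != 0), Some(&3));
    assert_eq!(digits.iter().rfind(|&&d| d != 0), Some(&7));

    // `last()` on an empty `Vec` returns `None`, so a separate `is_empty()`
    // check plus an indexing step collapses into a single match.
    let stack: Vec<char> = vec![];
    match stack.last() {
        Some(&c) => println!("top of stack: {}", c),
        None => println!("stack is empty"),
    }
}
```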
Documented impl From on line 367 of libserialize/json.rs
This is for the `impl From` mentioned in #51430, assigned to @skade.
Hopefully I didn't miss anything or get anything wrong. I looked over another PR for a different part of the same issue to match the expected formatting.
Thanks!
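For illustration, a minimal sketch of the shape of a documented `impl From`, using a hypothetical stand-in enum rather than the actual `Json` type from `libserialize/json.rs`:

```rust
// Hypothetical stand-in for `serialize::json::Json`; the real enum has more
// variants. This only illustrates documenting an `impl From`.
#[derive(Debug, PartialEq)]
pub enum Json {
    I64(i64),
}

impl From<i64> for Json {
    /// Converts an `i64` into `Json::I64`.
    fn from(v: i64) -> Json {
        Json::I64(v)
    }
}

fn main() {
    assert_eq!(Json::from(42i64), Json::I64(42));
}
```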
A few cleanups
- change `skip(1).next()` to `nth(1)` (see the sketch after this list)
- collapse some `if-else` expressions
- remove a few explicit `return`s
- remove an unnecessary field name
- dereference once instead of matching on multiple references
- prefer `iter().enumerate()` to indexing with `for`
- remove some unnecessary lifetime annotations
- use `writeln!()` instead of `write!()`+`\n`
- remove redundant parentheses
- shorten some enum variant names
- a few other cleanups suggested by `clippy`
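A hedged illustration of two of the patterns above, on made-up data (not code taken from the actual diff):

```rust
use std::fmt::Write;

fn main() {
    // `skip(1).next()` and `nth(1)` are equivalent; the latter is shorter
    // and is what clippy's `iter_skip_next` lint suggests.
    let words = ["zero", "one", "two"];
    assert_eq!(words.iter().skip(1).next(), Some(&"one"));
    assert_eq!(words.iter().nth(1), Some(&"one"));

    // `writeln!` replaces `write!` with a trailing "\n".
    let mut out = String::new();
    write!(out, "total: {}\n", 3).unwrap();
    writeln!(out, "total: {}", 3).unwrap();
    assert_eq!(out, "total: 3\ntotal: 3\n");
}
```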
Replace push loops with extend() where possible
Where that wasn't possible, I set the vector's capacity up front instead.
According to my [simple benchmark](https://gist.github.com/ljedrz/568e97621b749849684c1da71c27dceb) `extend`ing a vector can be over **10 times** faster than `push`ing to it in a loop:
10 elements (6.1 times faster):
```
test bench_extension ... bench: 75 ns/iter (+/- 23)
test bench_push_loop ... bench: 458 ns/iter (+/- 142)
```
100 elements (11.12 times faster):
```
test bench_extension ... bench: 87 ns/iter (+/- 26)
test bench_push_loop ... bench: 968 ns/iter (+/- 3,528)
```
1000 elements (11.04 times faster):
```
test bench_extension ... bench: 311 ns/iter (+/- 9)
test bench_push_loop ... bench: 3,436 ns/iter (+/- 233)
```
Seems like a good idea to use `extend` as much as possible.
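A hedged sketch of the transformation itself (not the benchmark code), using made-up data:

```rust
fn main() {
    let src = [1u32, 2, 3, 4, 5];

    // Push loop: may reallocate several times as the vector grows.
    let mut squares_push = Vec::new();
    for &x in &src {
        squares_push.push(x * x);
    }

    // extend(): the iterator's size hint lets Vec reserve capacity up front.
    let mut squares_extend = Vec::new();
    squares_extend.extend(src.iter().map(|&x| x * x));

    // Fallback where extend() doesn't fit: reserve the capacity, then push.
    let mut squares_reserved = Vec::with_capacity(src.len());
    for &x in &src {
        squares_reserved.push(x * x);
    }

    assert_eq!(squares_push, squares_extend);
    assert_eq!(squares_push, squares_reserved);
}
```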
Speed up leb128 encoding and decoding for unsigned values.
Make the implementations of some leb128 functions potentially faster.
@Mark-Simulacrum, could you please trigger a perf.rlo run?
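For context, a hedged sketch of unsigned LEB128 encoding and decoding, written as a generic illustration of the format rather than the rustc implementation: each byte carries 7 bits of payload, with the high bit set on every byte except the last.

```rust
// Minimal unsigned LEB128 encode/decode, for illustration only.
fn write_uleb128(buf: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte);          // last byte: high bit clear
            return;
        }
        buf.push(byte | 0x80);       // more bytes follow: high bit set
    }
}

fn read_uleb128(buf: &[u8]) -> (u64, usize) {
    let mut result = 0u64;
    let mut shift = 0;
    let mut pos = 0;
    loop {
        let byte = buf[pos];
        pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return (result, pos);    // (decoded value, bytes consumed)
        }
        shift += 7;
    }
}

fn main() {
    let mut buf = Vec::new();
    write_uleb128(&mut buf, 624_485);
    assert_eq!(buf, [0xe5, 0x8e, 0x26]);
    assert_eq!(read_uleb128(&buf), (624_485, 3));
}
```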
This saves roughly 32 bits of storage per `Fingerprint`.
On average, this reduces the size of the `/target/{mode}/incremental`
folder by roughly 5%.
Fixes #45875