Customize `<FlatMap as Iterator>::fold`
`FlatMap` can use internal iteration for its `fold`, which shows a
performance advantage in the new benchmarks:
test iter::bench_flat_map_chain_ref_sum ... bench: 4,354,111 ns/iter (+/- 108,871)
test iter::bench_flat_map_chain_sum ... bench: 468,167 ns/iter (+/- 2,274)
test iter::bench_flat_map_ref_sum ... bench: 449,616 ns/iter (+/- 6,257)
test iter::bench_flat_map_sum ... bench: 348,010 ns/iter (+/- 1,227)
... where the "ref" benches are using `by_ref()` that isn't optimized.
So this change shows a decent advantage on its own, but much more when
combined with a `chain` iterator that also optimizes `fold`.