E.g. when the local is immutable **and** its type is `Freeze`.
This makes the simple raytracer runtime benchmark 1% faster than cg_llvm
without optimizations. Previously it was 2% slower.
cc #691
cc #684
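
A minimal sketch of why immutability alone is not enough, assuming `Freeze`
here means the compiler-internal marker for types without interior
mutability. The function name is hypothetical and only illustrates the
condition; it is not code from this change.

```rust
use std::cell::Cell;

/// Mutates an immutable local through a shared reference and returns
/// the value observed afterwards. (Hypothetical helper for illustration.)
fn mutate_through_shared_ref() -> i32 {
    // The binding is immutable, but `Cell<i32>` has interior mutability,
    // so it is not `Freeze`.
    let counter = Cell::new(1);
    let shared = &counter;
    // The write goes through a shared reference, so the local needs a
    // stable memory location for the change to be observable.
    shared.set(2);
    counter.get()
}

fn main() {
    assert_eq!(mutate_through_shared_ref(), 2);
}
```

An immutable local of a `Freeze` type such as `i32` cannot change this way
after initialization.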
`write_cvalue` didn't work for `Box<[u8]>`, because the inner fat ptr
was wrapped inside a newtype, which meant `Box<[u8]>` itself only had
one field.
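
The layout issue can be sketched as follows. `Wrapper` and `size_in_words`
are hypothetical names used only for illustration; `Wrapper` stands in for
the newtype layers inside `Box` and is not the actual definition.

```rust
use std::mem::size_of;

// Hypothetical newtype: a single field that is itself a fat pointer
// (data pointer plus length).
#[allow(dead_code)]
struct Wrapper(*const [u8]);

/// Size of `T` measured in pointer-sized words.
fn size_in_words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // One field at the type level, but two words of data.
    assert_eq!(size_in_words::<Wrapper>(), 2);
    // `Box<[u8]>` is likewise two words wide...
    assert_eq!(size_in_words::<Box<[u8]>>(), 2);
    // ...while a thin `Box<u8>` is a single word.
    assert_eq!(size_in_words::<Box<u8>>(), 1);
}
```

Code that decides how to copy a value by counting its fields sees only one
field here, even though the value consists of two pointer-sized parts.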
This also simplifies `CValue::force_stack` by reusing `write_cvalue`
when the value is not already on the stack.