Rollup merge of #60187 - tmandry:generator-optimization, r=eddyb
Generator optimization: Overlap locals that never have storage live at the same time
The specific goal of this optimization is async fns that use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](08bfe16129/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization.
More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.**
To determine whether we can overlap two locals in the generator layout, we check whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this: we iterate over every location in the MIR and, for each local, build a bitset of the locals it might potentially conflict with.
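As a rough illustration of the conflict computation (not the actual rustc code: plain nested `Vec`s stand in for `BitMatrix`, and the hypothetical `live_at` sets stand in for the `MaybeStorageLive` results at each location), a minimal sketch:

```rust
/// `live_at[loc]` is the set of locals whose storage is (maybe) live at MIR
/// location `loc`, as a dataflow analysis like `MaybeStorageLive` would report.
/// Two locals conflict if any location has both of them storage-live;
/// conflicting locals must not share memory in the generator layout.
fn storage_conflicts(num_locals: usize, live_at: &[Vec<usize>]) -> Vec<Vec<bool>> {
    let mut conflicts = vec![vec![false; num_locals]; num_locals];
    for live in live_at {
        for &a in live {
            for &b in live {
                conflicts[a][b] = true;
            }
        }
    }
    conflicts
}

fn main() {
    // Locals 0 and 1 are storage-live together somewhere; local 2 never is.
    let live_at = vec![vec![0, 1], vec![2]];
    let conflicts = storage_conflicts(3, &live_at);
    assert!(conflicts[0][1]);  // 0 and 1 can never overlap
    assert!(!conflicts[0][2]); // 0 and 2 are allowed to overlap
}
```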
Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and a variant includes the set of locals that are live across its suspension point. (Note that we use liveness rather than storage-liveness here; this ensures that a local has actually been initialized in every variant it is included in. If a local is not live across a suspension point, it doesn't need to be included in that variant.) It's important to note that the variants are only a "view" into our layout.
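In code terms, the variant list is just the per-suspension-point liveness sets prepended with three empty variants for the UNRESUMED, RETURNED, and POISONED states (the PR's `RESERVED_VARIANTS = 3`). A minimal sketch, using plain `Vec`s in place of the PR's `IndexVec`/`BitSet` types:

```rust
/// Variants 0..3 are the reserved UNRESUMED/RETURNED/POISONED states and hold
/// no saved locals; variant 3 + i holds the saved locals that are live across
/// suspension point i.
fn variant_fields(live_at_suspension: &[Vec<usize>]) -> Vec<Vec<usize>> {
    const RESERVED_VARIANTS: usize = 3;
    std::iter::repeat(Vec::new())
        .take(RESERVED_VARIANTS)
        .chain(live_at_suspension.iter().cloned())
        .collect()
}
```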
For the layout computation, we use a simplified approach.
1. Start with the set of locals assigned to only one variant. The rest are disqualified.
2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping.
Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".
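The disqualification step itself is greedy: when two conflicting locals sit in different variants, the PR removes whichever of the two has more conflicts overall (a heuristic the PR itself notes is not always optimal). A simplified, self-contained sketch of that rule; the real code works on `GeneratorSavedLocal` bitsets rather than these toy types:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Eligibility {
    Assigned(usize), // the single variant this local belongs to
    Ineligible,      // lives in the non-overlapping prefix
}

/// For every conflicting pair in different variants, disqualify the local
/// with more total conflicts; disqualified locals get dedicated prefix space.
fn disqualify(assignments: &mut [Eligibility], conflicts: &[Vec<bool>]) {
    let count = |l: usize| conflicts[l].iter().filter(|&&c| c).count();
    for a in 0..assignments.len() {
        for b in (a + 1)..assignments.len() {
            if !conflicts[a][b]
                || assignments[a] == assignments[b]
                || assignments[a] == Eligibility::Ineligible
                || assignments[b] == Eligibility::Ineligible
            {
                continue; // overlap is still possible (or already resolved)
            }
            let remove = if count(a) > count(b) { a } else { b };
            assignments[remove] = Eligibility::Ineligible;
        }
    }
}
```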
You can think of a generator as an enum where some fields are shared between variants. So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:
```rust
enum Generator {
    Unresumed,
    Poisoned,
    Returned,
    Suspend0(A, B, X),
    Suspend1(B),
    Suspend2(A, Y, Z),
}
```
where every mention of `A` and `B` refers to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them.
Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.
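The payoff is easiest to see on a generator whose locals live in disjoint scopes. This is essentially the `overlap-locals.rs` test added by this PR, where all four `i32` locals share one slot and the whole generator fits in 8 bytes (the exact size is an observation from the test, not a guarantee):

```rust
#![feature(generators)]

fn main() {
    let a = || {
        { let w: i32 = 4; yield; println!("{:?}", w); }
        { let x: i32 = 5; yield; println!("{:?}", x); }
        { let y: i32 = 6; yield; println!("{:?}", y); }
        { let z: i32 = 7; yield; println!("{:?}", z); }
    };
    // Without the overlap optimization this would need four separate i32
    // slots; with it, w/x/y/z all reuse the same 4 bytes.
    assert_eq!(8, std::mem::size_of_val(&a));
}
```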
---
Depends on:
- [x] #59897 Multi-variant layouts for generators
- [x] #60840 Preserve local scopes in generator MIR
- [x] #61373 Emit StorageDead along unwind paths for generators
Before merging:
- [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889)
- [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
- [x] Clean up TODO
- [x] Fix the layout code to enforce that the same field never moves around in the generator
- [x] Add tests for async/await
- [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity)
- [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at https://github.com/rust-lang/rust/pull/59897#issuecomment-489468627)
Commit: a60a5db7de
@@ -9,6 +9,7 @@
use crate::hir::{self, InlineAsm as HirInlineAsm};
use crate::mir::interpret::{ConstValue, InterpError, Scalar};
use crate::mir::visit::MirVisitable;
use rustc_data_structures::bit_set::BitMatrix;
use rustc_data_structures::fx::FxHashSet;
use rustc_data_structures::graph::dominators::{dominators, Dominators};
use rustc_data_structures::graph::{self, GraphPredecessors, GraphSuccessors};
@@ -2997,6 +2998,11 @@ pub struct GeneratorLayout<'tcx> {
    /// be stored in multiple variants.
    pub variant_fields: IndexVec<VariantIdx, IndexVec<Field, GeneratorSavedLocal>>,

    /// Which saved locals are storage-live at the same time. Locals that do not
    /// have conflicts with each other are allowed to overlap in the computed
    /// layout.
    pub storage_conflicts: BitMatrix<GeneratorSavedLocal, GeneratorSavedLocal>,

    /// Names and scopes of all the stored generator locals.
    /// NOTE(tmandry) This is *strictly* a temporary hack for codegen
    /// debuginfo generation, and will be removed at some point.
@@ -3193,6 +3199,7 @@ impl<'tcx> TypeFoldable<'tcx> for Body<'tcx> {
impl<'tcx> TypeFoldable<'tcx> for GeneratorLayout<'tcx> {
    field_tys,
    variant_fields,
    storage_conflicts,
    __local_debuginfo_codegen_only_do_not_use,
}
}
@@ -3572,6 +3579,15 @@ fn super_visit_with<V: TypeVisitor<'tcx>>(&self, _: &mut V) -> bool {
    }
}

impl<'tcx, R: Idx, C: Idx> TypeFoldable<'tcx> for BitMatrix<R, C> {
    fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, _: &mut F) -> Self {
        self.clone()
    }
    fn super_visit_with<V: TypeVisitor<'tcx>>(&self, _: &mut V) -> bool {
        false
    }
}

impl<'tcx> TypeFoldable<'tcx> for Constant<'tcx> {
    fn super_fold_with<'gcx: 'tcx, F: TypeFolder<'gcx, 'tcx>>(&self, folder: &mut F) -> Self {
        Constant {
@ -14,6 +14,10 @@
|
||||
|
||||
use crate::hir;
|
||||
use crate::ich::StableHashingContext;
|
||||
use crate::mir::{GeneratorLayout, GeneratorSavedLocal};
|
||||
use crate::ty::GeneratorSubsts;
|
||||
use crate::ty::subst::Subst;
|
||||
use rustc_data_structures::bit_set::BitSet;
|
||||
use rustc_data_structures::indexed_vec::{IndexVec, Idx};
|
||||
use rustc_data_structures::stable_hasher::{HashStable, StableHasher,
|
||||
StableHasherResult};
|
||||
@ -212,7 +216,250 @@ pub struct LayoutCx<'tcx, C> {
|
||||
pub param_env: ty::ParamEnv<'tcx>,
|
||||
}
|
||||
|
||||
#[derive(Copy, Clone, Debug)]
|
||||
enum StructKind {
|
||||
/// A tuple, closure, or univariant which cannot be coerced to unsized.
|
||||
AlwaysSized,
|
||||
/// A univariant, the last field of which may be coerced to unsized.
|
||||
MaybeUnsized,
|
||||
/// A univariant, but with a prefix of an arbitrary size & alignment (e.g., enum tag).
|
||||
Prefixed(Size, Align),
|
||||
}
|
||||
|
||||
impl<'a, 'tcx> LayoutCx<'tcx, TyCtxt<'a, 'tcx, 'tcx>> {
|
||||
fn scalar_pair(&self, a: Scalar, b: Scalar) -> LayoutDetails {
|
||||
let dl = self.data_layout();
|
||||
let b_align = b.value.align(dl);
|
||||
let align = a.value.align(dl).max(b_align).max(dl.aggregate_align);
|
||||
let b_offset = a.value.size(dl).align_to(b_align.abi);
|
||||
let size = (b_offset + b.value.size(dl)).align_to(align.abi);
|
||||
LayoutDetails {
|
||||
variants: Variants::Single { index: VariantIdx::new(0) },
|
||||
fields: FieldPlacement::Arbitrary {
|
||||
offsets: vec![Size::ZERO, b_offset],
|
||||
memory_index: vec![0, 1]
|
||||
},
|
||||
abi: Abi::ScalarPair(a, b),
|
||||
align,
|
||||
size
|
||||
}
|
||||
}
|
||||
|
||||
fn univariant_uninterned(&self,
|
||||
ty: Ty<'tcx>,
|
||||
fields: &[TyLayout<'_>],
|
||||
repr: &ReprOptions,
|
||||
kind: StructKind) -> Result<LayoutDetails, LayoutError<'tcx>> {
|
||||
let dl = self.data_layout();
|
||||
let packed = repr.packed();
|
||||
if packed && repr.align > 0 {
|
||||
bug!("struct cannot be packed and aligned");
|
||||
}
|
||||
|
||||
let pack = Align::from_bytes(repr.pack as u64).unwrap();
|
||||
|
||||
let mut align = if packed {
|
||||
dl.i8_align
|
||||
} else {
|
||||
dl.aggregate_align
|
||||
};
|
||||
|
||||
let mut sized = true;
|
||||
let mut offsets = vec![Size::ZERO; fields.len()];
|
||||
let mut inverse_memory_index: Vec<u32> = (0..fields.len() as u32).collect();
|
||||
|
||||
let mut optimize = !repr.inhibit_struct_field_reordering_opt();
|
||||
if let StructKind::Prefixed(_, align) = kind {
|
||||
optimize &= align.bytes() == 1;
|
||||
}
|
||||
|
||||
if optimize {
|
||||
let end = if let StructKind::MaybeUnsized = kind {
|
||||
fields.len() - 1
|
||||
} else {
|
||||
fields.len()
|
||||
};
|
||||
let optimizing = &mut inverse_memory_index[..end];
|
||||
let field_align = |f: &TyLayout<'_>| {
|
||||
if packed { f.align.abi.min(pack) } else { f.align.abi }
|
||||
};
|
||||
match kind {
|
||||
StructKind::AlwaysSized |
|
||||
StructKind::MaybeUnsized => {
|
||||
optimizing.sort_by_key(|&x| {
|
||||
// Place ZSTs first to avoid "interesting offsets",
|
||||
// especially with only one or two non-ZST fields.
|
||||
let f = &fields[x as usize];
|
||||
(!f.is_zst(), cmp::Reverse(field_align(f)))
|
||||
});
|
||||
}
|
||||
StructKind::Prefixed(..) => {
|
||||
optimizing.sort_by_key(|&x| field_align(&fields[x as usize]));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// inverse_memory_index holds field indices by increasing memory offset.
|
||||
// That is, if field 5 has offset 0, the first element of inverse_memory_index is 5.
|
||||
// We now write field offsets to the corresponding offset slot;
|
||||
// field 5 with offset 0 puts 0 in offsets[5].
|
||||
// At the bottom of this function, we use inverse_memory_index to produce memory_index.
|
||||
|
||||
let mut offset = Size::ZERO;
|
||||
|
||||
if let StructKind::Prefixed(prefix_size, prefix_align) = kind {
|
||||
let prefix_align = if packed {
|
||||
prefix_align.min(pack)
|
||||
} else {
|
||||
prefix_align
|
||||
};
|
||||
align = align.max(AbiAndPrefAlign::new(prefix_align));
|
||||
offset = prefix_size.align_to(prefix_align);
|
||||
}
|
||||
|
||||
for &i in &inverse_memory_index {
|
||||
let field = fields[i as usize];
|
||||
if !sized {
|
||||
bug!("univariant: field #{} of `{}` comes after unsized field",
|
||||
offsets.len(), ty);
|
||||
}
|
||||
|
||||
if field.is_unsized() {
|
||||
sized = false;
|
||||
}
|
||||
|
||||
// Invariant: offset < dl.obj_size_bound() <= 1<<61
|
||||
let field_align = if packed {
|
||||
field.align.min(AbiAndPrefAlign::new(pack))
|
||||
} else {
|
||||
field.align
|
||||
};
|
||||
offset = offset.align_to(field_align.abi);
|
||||
align = align.max(field_align);
|
||||
|
||||
debug!("univariant offset: {:?} field: {:#?}", offset, field);
|
||||
offsets[i as usize] = offset;
|
||||
|
||||
offset = offset.checked_add(field.size, dl)
|
||||
.ok_or(LayoutError::SizeOverflow(ty))?;
|
||||
}
|
||||
|
||||
if repr.align > 0 {
|
||||
let repr_align = repr.align as u64;
|
||||
align = align.max(AbiAndPrefAlign::new(Align::from_bytes(repr_align).unwrap()));
|
||||
debug!("univariant repr_align: {:?}", repr_align);
|
||||
}
|
||||
|
||||
debug!("univariant min_size: {:?}", offset);
|
||||
let min_size = offset;
|
||||
|
||||
// As stated above, inverse_memory_index holds field indices by increasing offset.
|
||||
// This makes it an already-sorted view of the offsets vec.
|
||||
// To invert it, consider:
|
||||
// If field 5 has offset 0, offsets[0] is 5, and memory_index[5] should be 0.
|
||||
// Field 5 would be the first element, so memory_index is i:
|
||||
// Note: if we didn't optimize, it's already right.
|
||||
|
||||
let mut memory_index;
|
||||
if optimize {
|
||||
memory_index = vec![0; inverse_memory_index.len()];
|
||||
|
||||
for i in 0..inverse_memory_index.len() {
|
||||
memory_index[inverse_memory_index[i] as usize] = i as u32;
|
||||
}
|
||||
} else {
|
||||
memory_index = inverse_memory_index;
|
||||
}
|
||||
|
||||
let size = min_size.align_to(align.abi);
|
||||
let mut abi = Abi::Aggregate { sized };
|
||||
|
||||
// Unpack newtype ABIs and find scalar pairs.
|
||||
if sized && size.bytes() > 0 {
|
||||
// All other fields must be ZSTs, and we need them to all start at 0.
|
||||
let mut zst_offsets =
|
||||
offsets.iter().enumerate().filter(|&(i, _)| fields[i].is_zst());
|
||||
if zst_offsets.all(|(_, o)| o.bytes() == 0) {
|
||||
let mut non_zst_fields =
|
||||
fields.iter().enumerate().filter(|&(_, f)| !f.is_zst());
|
||||
|
||||
match (non_zst_fields.next(), non_zst_fields.next(), non_zst_fields.next()) {
|
||||
// We have exactly one non-ZST field.
|
||||
(Some((i, field)), None, None) => {
|
||||
// Field fills the struct and it has a scalar or scalar pair ABI.
|
||||
if offsets[i].bytes() == 0 &&
|
||||
align.abi == field.align.abi &&
|
||||
size == field.size {
|
||||
match field.abi {
|
||||
// For plain scalars, or vectors of them, we can't unpack
|
||||
// newtypes for `#[repr(C)]`, as that affects C ABIs.
|
||||
Abi::Scalar(_) | Abi::Vector { .. } if optimize => {
|
||||
abi = field.abi.clone();
|
||||
}
|
||||
// But scalar pairs are Rust-specific and get
|
||||
// treated as aggregates by C ABIs anyway.
|
||||
Abi::ScalarPair(..) => {
|
||||
abi = field.abi.clone();
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Two non-ZST fields, and they're both scalars.
|
||||
(Some((i, &TyLayout {
|
||||
details: &LayoutDetails { abi: Abi::Scalar(ref a), .. }, ..
|
||||
})), Some((j, &TyLayout {
|
||||
details: &LayoutDetails { abi: Abi::Scalar(ref b), .. }, ..
|
||||
})), None) => {
|
||||
// Order by the memory placement, not source order.
|
||||
let ((i, a), (j, b)) = if offsets[i] < offsets[j] {
|
||||
((i, a), (j, b))
|
||||
} else {
|
||||
((j, b), (i, a))
|
||||
};
|
||||
let pair = self.scalar_pair(a.clone(), b.clone());
|
||||
let pair_offsets = match pair.fields {
|
||||
FieldPlacement::Arbitrary {
|
||||
ref offsets,
|
||||
ref memory_index
|
||||
} => {
|
||||
assert_eq!(memory_index, &[0, 1]);
|
||||
offsets
|
||||
}
|
||||
_ => bug!()
|
||||
};
|
||||
if offsets[i] == pair_offsets[0] &&
|
||||
offsets[j] == pair_offsets[1] &&
|
||||
align == pair.align &&
|
||||
size == pair.size {
|
||||
// We can use `ScalarPair` only when it matches our
|
||||
// already computed layout (including `#[repr(C)]`).
|
||||
abi = pair.abi;
|
||||
}
|
||||
}
|
||||
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if sized && fields.iter().any(|f| f.abi.is_uninhabited()) {
|
||||
abi = Abi::Uninhabited;
|
||||
}
|
||||
|
||||
Ok(LayoutDetails {
|
||||
variants: Variants::Single { index: VariantIdx::new(0) },
|
||||
fields: FieldPlacement::Arbitrary {
|
||||
offsets,
|
||||
memory_index
|
||||
},
|
||||
abi,
|
||||
align,
|
||||
size
|
||||
})
|
||||
}
|
||||
|
||||
fn layout_raw_uncached(&self, ty: Ty<'tcx>) -> Result<&'tcx LayoutDetails, LayoutError<'tcx>> {
|
||||
let tcx = self.tcx;
|
||||
let param_env = self.param_env;
|
||||
@ -228,244 +475,9 @@ fn layout_raw_uncached(&self, ty: Ty<'tcx>) -> Result<&'tcx LayoutDetails, Layou
|
||||
let scalar = |value: Primitive| {
|
||||
tcx.intern_layout(LayoutDetails::scalar(self, scalar_unit(value)))
|
||||
};
|
||||
let scalar_pair = |a: Scalar, b: Scalar| {
|
||||
let b_align = b.value.align(dl);
|
||||
let align = a.value.align(dl).max(b_align).max(dl.aggregate_align);
|
||||
let b_offset = a.value.size(dl).align_to(b_align.abi);
|
||||
let size = (b_offset + b.value.size(dl)).align_to(align.abi);
|
||||
LayoutDetails {
|
||||
variants: Variants::Single { index: VariantIdx::new(0) },
|
||||
fields: FieldPlacement::Arbitrary {
|
||||
offsets: vec![Size::ZERO, b_offset],
|
||||
memory_index: vec![0, 1]
|
||||
},
|
||||
abi: Abi::ScalarPair(a, b),
|
||||
align,
|
||||
size
|
||||
}
|
||||
};
|
||||
|
||||
#[derive(Copy, Clone, Debug)]
|
||||
enum StructKind {
|
||||
/// A tuple, closure, or univariant which cannot be coerced to unsized.
|
||||
AlwaysSized,
|
||||
/// A univariant, the last field of which may be coerced to unsized.
|
||||
MaybeUnsized,
|
||||
/// A univariant, but with a prefix of an arbitrary size & alignment (e.g., enum tag).
|
||||
Prefixed(Size, Align),
|
||||
}
|
||||
|
||||
let univariant_uninterned = |fields: &[TyLayout<'_>], repr: &ReprOptions, kind| {
|
||||
let packed = repr.packed();
|
||||
if packed && repr.align > 0 {
|
||||
bug!("struct cannot be packed and aligned");
|
||||
}
|
||||
|
||||
let pack = Align::from_bytes(repr.pack as u64).unwrap();
|
||||
|
||||
let mut align = if packed {
|
||||
dl.i8_align
|
||||
} else {
|
||||
dl.aggregate_align
|
||||
};
|
||||
|
||||
let mut sized = true;
|
||||
let mut offsets = vec![Size::ZERO; fields.len()];
|
||||
let mut inverse_memory_index: Vec<u32> = (0..fields.len() as u32).collect();
|
||||
|
||||
let mut optimize = !repr.inhibit_struct_field_reordering_opt();
|
||||
if let StructKind::Prefixed(_, align) = kind {
|
||||
optimize &= align.bytes() == 1;
|
||||
}
|
||||
|
||||
if optimize {
|
||||
let end = if let StructKind::MaybeUnsized = kind {
|
||||
fields.len() - 1
|
||||
} else {
|
||||
fields.len()
|
||||
};
|
||||
let optimizing = &mut inverse_memory_index[..end];
|
||||
let field_align = |f: &TyLayout<'_>| {
|
||||
if packed { f.align.abi.min(pack) } else { f.align.abi }
|
||||
};
|
||||
match kind {
|
||||
StructKind::AlwaysSized |
|
||||
StructKind::MaybeUnsized => {
|
||||
optimizing.sort_by_key(|&x| {
|
||||
// Place ZSTs first to avoid "interesting offsets",
|
||||
// especially with only one or two non-ZST fields.
|
||||
let f = &fields[x as usize];
|
||||
(!f.is_zst(), cmp::Reverse(field_align(f)))
|
||||
});
|
||||
}
|
||||
StructKind::Prefixed(..) => {
|
||||
optimizing.sort_by_key(|&x| field_align(&fields[x as usize]));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// inverse_memory_index holds field indices by increasing memory offset.
|
||||
// That is, if field 5 has offset 0, the first element of inverse_memory_index is 5.
|
||||
// We now write field offsets to the corresponding offset slot;
|
||||
// field 5 with offset 0 puts 0 in offsets[5].
|
||||
// At the bottom of this function, we use inverse_memory_index to produce memory_index.
|
||||
|
||||
let mut offset = Size::ZERO;
|
||||
|
||||
if let StructKind::Prefixed(prefix_size, prefix_align) = kind {
|
||||
let prefix_align = if packed {
|
||||
prefix_align.min(pack)
|
||||
} else {
|
||||
prefix_align
|
||||
};
|
||||
align = align.max(AbiAndPrefAlign::new(prefix_align));
|
||||
offset = prefix_size.align_to(prefix_align);
|
||||
}
|
||||
|
||||
for &i in &inverse_memory_index {
|
||||
let field = fields[i as usize];
|
||||
if !sized {
|
||||
bug!("univariant: field #{} of `{}` comes after unsized field",
|
||||
offsets.len(), ty);
|
||||
}
|
||||
|
||||
if field.is_unsized() {
|
||||
sized = false;
|
||||
}
|
||||
|
||||
// Invariant: offset < dl.obj_size_bound() <= 1<<61
|
||||
let field_align = if packed {
|
||||
field.align.min(AbiAndPrefAlign::new(pack))
|
||||
} else {
|
||||
field.align
|
||||
};
|
||||
offset = offset.align_to(field_align.abi);
|
||||
align = align.max(field_align);
|
||||
|
||||
debug!("univariant offset: {:?} field: {:#?}", offset, field);
|
||||
offsets[i as usize] = offset;
|
||||
|
||||
offset = offset.checked_add(field.size, dl)
|
||||
.ok_or(LayoutError::SizeOverflow(ty))?;
|
||||
}
|
||||
|
||||
if repr.align > 0 {
|
||||
let repr_align = repr.align as u64;
|
||||
align = align.max(AbiAndPrefAlign::new(Align::from_bytes(repr_align).unwrap()));
|
||||
debug!("univariant repr_align: {:?}", repr_align);
|
||||
}
|
||||
|
||||
debug!("univariant min_size: {:?}", offset);
|
||||
let min_size = offset;
|
||||
|
||||
// As stated above, inverse_memory_index holds field indices by increasing offset.
|
||||
// This makes it an already-sorted view of the offsets vec.
|
||||
// To invert it, consider:
|
||||
// If field 5 has offset 0, offsets[0] is 5, and memory_index[5] should be 0.
|
||||
// Field 5 would be the first element, so memory_index is i:
|
||||
// Note: if we didn't optimize, it's already right.
|
||||
|
||||
let mut memory_index;
|
||||
if optimize {
|
||||
memory_index = vec![0; inverse_memory_index.len()];
|
||||
|
||||
for i in 0..inverse_memory_index.len() {
|
||||
memory_index[inverse_memory_index[i] as usize] = i as u32;
|
||||
}
|
||||
} else {
|
||||
memory_index = inverse_memory_index;
|
||||
}
|
||||
|
||||
let size = min_size.align_to(align.abi);
|
||||
let mut abi = Abi::Aggregate { sized };
|
||||
|
||||
// Unpack newtype ABIs and find scalar pairs.
|
||||
if sized && size.bytes() > 0 {
|
||||
// All other fields must be ZSTs, and we need them to all start at 0.
|
||||
let mut zst_offsets =
|
||||
offsets.iter().enumerate().filter(|&(i, _)| fields[i].is_zst());
|
||||
if zst_offsets.all(|(_, o)| o.bytes() == 0) {
|
||||
let mut non_zst_fields =
|
||||
fields.iter().enumerate().filter(|&(_, f)| !f.is_zst());
|
||||
|
||||
match (non_zst_fields.next(), non_zst_fields.next(), non_zst_fields.next()) {
|
||||
// We have exactly one non-ZST field.
|
||||
(Some((i, field)), None, None) => {
|
||||
// Field fills the struct and it has a scalar or scalar pair ABI.
|
||||
if offsets[i].bytes() == 0 &&
|
||||
align.abi == field.align.abi &&
|
||||
size == field.size {
|
||||
match field.abi {
|
||||
// For plain scalars, or vectors of them, we can't unpack
|
||||
// newtypes for `#[repr(C)]`, as that affects C ABIs.
|
||||
Abi::Scalar(_) | Abi::Vector { .. } if optimize => {
|
||||
abi = field.abi.clone();
|
||||
}
|
||||
// But scalar pairs are Rust-specific and get
|
||||
// treated as aggregates by C ABIs anyway.
|
||||
Abi::ScalarPair(..) => {
|
||||
abi = field.abi.clone();
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Two non-ZST fields, and they're both scalars.
|
||||
(Some((i, &TyLayout {
|
||||
details: &LayoutDetails { abi: Abi::Scalar(ref a), .. }, ..
|
||||
})), Some((j, &TyLayout {
|
||||
details: &LayoutDetails { abi: Abi::Scalar(ref b), .. }, ..
|
||||
})), None) => {
|
||||
// Order by the memory placement, not source order.
|
||||
let ((i, a), (j, b)) = if offsets[i] < offsets[j] {
|
||||
((i, a), (j, b))
|
||||
} else {
|
||||
((j, b), (i, a))
|
||||
};
|
||||
let pair = scalar_pair(a.clone(), b.clone());
|
||||
let pair_offsets = match pair.fields {
|
||||
FieldPlacement::Arbitrary {
|
||||
ref offsets,
|
||||
ref memory_index
|
||||
} => {
|
||||
assert_eq!(memory_index, &[0, 1]);
|
||||
offsets
|
||||
}
|
||||
_ => bug!()
|
||||
};
|
||||
if offsets[i] == pair_offsets[0] &&
|
||||
offsets[j] == pair_offsets[1] &&
|
||||
align == pair.align &&
|
||||
size == pair.size {
|
||||
// We can use `ScalarPair` only when it matches our
|
||||
// already computed layout (including `#[repr(C)]`).
|
||||
abi = pair.abi;
|
||||
}
|
||||
}
|
||||
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if sized && fields.iter().any(|f| f.abi.is_uninhabited()) {
|
||||
abi = Abi::Uninhabited;
|
||||
}
|
||||
|
||||
Ok(LayoutDetails {
|
||||
variants: Variants::Single { index: VariantIdx::new(0) },
|
||||
fields: FieldPlacement::Arbitrary {
|
||||
offsets,
|
||||
memory_index
|
||||
},
|
||||
abi,
|
||||
align,
|
||||
size
|
||||
})
|
||||
};
|
||||
let univariant = |fields: &[TyLayout<'_>], repr: &ReprOptions, kind| {
|
||||
Ok(tcx.intern_layout(univariant_uninterned(fields, repr, kind)?))
|
||||
Ok(tcx.intern_layout(self.univariant_uninterned(ty, fields, repr, kind)?))
|
||||
};
|
||||
debug_assert!(!ty.has_infer_types());
|
||||
|
||||
@ -537,7 +549,7 @@ enum StructKind {
|
||||
};
|
||||
|
||||
// Effectively a (ptr, meta) tuple.
|
||||
tcx.intern_layout(scalar_pair(data_ptr, metadata))
|
||||
tcx.intern_layout(self.scalar_pair(data_ptr, metadata))
|
||||
}
|
||||
|
||||
// Arrays and slices.
|
||||
@ -602,7 +614,7 @@ enum StructKind {
|
||||
univariant(&[], &ReprOptions::default(), StructKind::AlwaysSized)?
|
||||
}
|
||||
ty::Dynamic(..) | ty::Foreign(..) => {
|
||||
let mut unit = univariant_uninterned(&[], &ReprOptions::default(),
|
||||
let mut unit = self.univariant_uninterned(ty, &[], &ReprOptions::default(),
|
||||
StructKind::AlwaysSized)?;
|
||||
match unit.abi {
|
||||
Abi::Aggregate { ref mut sized } => *sized = false,
|
||||
@ -611,64 +623,7 @@ enum StructKind {
|
||||
tcx.intern_layout(unit)
|
||||
}
|
||||
|
||||
ty::Generator(def_id, ref substs, _) => {
|
||||
// FIXME(tmandry): For fields that are repeated in multiple
|
||||
// variants in the GeneratorLayout, we need code to ensure that
|
||||
// the offset of these fields never change. Right now this is
|
||||
// not an issue since every variant has every field, but once we
|
||||
// optimize this we have to be more careful.
|
||||
|
||||
let discr_index = substs.prefix_tys(def_id, tcx).count();
|
||||
let prefix_tys = substs.prefix_tys(def_id, tcx)
|
||||
.chain(iter::once(substs.discr_ty(tcx)));
|
||||
let prefix = univariant_uninterned(
|
||||
&prefix_tys.map(|ty| self.layout_of(ty)).collect::<Result<Vec<_>, _>>()?,
|
||||
&ReprOptions::default(),
|
||||
StructKind::AlwaysSized)?;
|
||||
|
||||
let mut size = prefix.size;
|
||||
let mut align = prefix.align;
|
||||
let variants_tys = substs.state_tys(def_id, tcx);
|
||||
let variants = variants_tys.enumerate().map(|(i, variant_tys)| {
|
||||
let mut variant = univariant_uninterned(
|
||||
&variant_tys.map(|ty| self.layout_of(ty)).collect::<Result<Vec<_>, _>>()?,
|
||||
&ReprOptions::default(),
|
||||
StructKind::Prefixed(prefix.size, prefix.align.abi))?;
|
||||
|
||||
variant.variants = Variants::Single { index: VariantIdx::new(i) };
|
||||
|
||||
size = size.max(variant.size);
|
||||
align = align.max(variant.align);
|
||||
|
||||
Ok(variant)
|
||||
}).collect::<Result<IndexVec<VariantIdx, _>, _>>()?;
|
||||
|
||||
let abi = if prefix.abi.is_uninhabited() ||
|
||||
variants.iter().all(|v| v.abi.is_uninhabited()) {
|
||||
Abi::Uninhabited
|
||||
} else {
|
||||
Abi::Aggregate { sized: true }
|
||||
};
|
||||
let discr = match &self.layout_of(substs.discr_ty(tcx))?.abi {
|
||||
Abi::Scalar(s) => s.clone(),
|
||||
_ => bug!(),
|
||||
};
|
||||
|
||||
let layout = tcx.intern_layout(LayoutDetails {
|
||||
variants: Variants::Multiple {
|
||||
discr,
|
||||
discr_kind: DiscriminantKind::Tag,
|
||||
discr_index,
|
||||
variants,
|
||||
},
|
||||
fields: prefix.fields,
|
||||
abi,
|
||||
size,
|
||||
align,
|
||||
});
|
||||
debug!("generator layout ({:?}): {:#?}", ty, layout);
|
||||
layout
|
||||
}
|
||||
ty::Generator(def_id, substs, _) => self.generator_layout(ty, def_id, &substs)?,
|
||||
|
||||
ty::Closure(def_id, ref substs) => {
|
||||
let tys = substs.upvar_tys(def_id, tcx);
|
||||
@ -853,7 +808,7 @@ enum StructKind {
|
||||
else { StructKind::AlwaysSized }
|
||||
};
|
||||
|
||||
let mut st = univariant_uninterned(&variants[v], &def.repr, kind)?;
|
||||
let mut st = self.univariant_uninterned(ty, &variants[v], &def.repr, kind)?;
|
||||
st.variants = Variants::Single { index: v };
|
||||
let (start, end) = self.tcx.layout_scalar_valid_range(def.did);
|
||||
match st.abi {
|
||||
@ -932,7 +887,7 @@ enum StructKind {
|
||||
|
||||
let mut align = dl.aggregate_align;
|
||||
let st = variants.iter_enumerated().map(|(j, v)| {
|
||||
let mut st = univariant_uninterned(v,
|
||||
let mut st = self.univariant_uninterned(ty, v,
|
||||
&def.repr, StructKind::AlwaysSized)?;
|
||||
st.variants = Variants::Single { index: j };
|
||||
|
||||
@ -1040,7 +995,7 @@ enum StructKind {
|
||||
|
||||
// Create the set of structs that represent each variant.
|
||||
let mut layout_variants = variants.iter_enumerated().map(|(i, field_layouts)| {
|
||||
let mut st = univariant_uninterned(&field_layouts,
|
||||
let mut st = self.univariant_uninterned(ty, &field_layouts,
|
||||
&def.repr, StructKind::Prefixed(min_ity.size(), prefix_align))?;
|
||||
st.variants = Variants::Single { index: i };
|
||||
// Find the first field we can't move later
|
||||
@ -1172,7 +1127,7 @@ enum StructKind {
|
||||
}
|
||||
}
|
||||
if let Some((prim, offset)) = common_prim {
|
||||
let pair = scalar_pair(tag.clone(), scalar_unit(prim));
|
||||
let pair = self.scalar_pair(tag.clone(), scalar_unit(prim));
|
||||
let pair_offsets = match pair.fields {
|
||||
FieldPlacement::Arbitrary {
|
||||
ref offsets,
|
||||
@ -1237,7 +1192,259 @@ enum StructKind {
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
/// Overlap eligibility and variant assignment for each GeneratorSavedLocal.
|
||||
#[derive(Clone, Debug, PartialEq)]
|
||||
enum SavedLocalEligibility {
|
||||
Unassigned,
|
||||
Assigned(VariantIdx),
|
||||
// FIXME: Use newtype_index so we aren't wasting bytes
|
||||
Ineligible(Option<u32>),
|
||||
}
|
||||
|
||||
// When laying out generators, we divide our saved local fields into two
|
||||
// categories: overlap-eligible and overlap-ineligible.
|
||||
//
|
||||
// Those fields which are ineligible for overlap go in a "prefix" at the
|
||||
// beginning of the layout, and always have space reserved for them.
|
||||
//
|
||||
// Overlap-eligible fields are only assigned to one variant, so we lay
|
||||
// those fields out for each variant and put them right after the
|
||||
// prefix.
|
||||
//
|
||||
// Finally, in the layout details, we point to the fields from the
|
||||
// variants they are assigned to. It is possible for some fields to be
|
||||
// included in multiple variants. No field ever "moves around" in the
|
||||
// layout; its offset is always the same.
|
||||
//
|
||||
// Also included in the layout are the upvars and the discriminant.
|
||||
// These are included as fields on the "outer" layout; they are not part
|
||||
// of any variant.
|
||||
impl<'a, 'tcx> LayoutCx<'tcx, TyCtxt<'a, 'tcx, 'tcx>> {
|
||||
/// Compute the eligibility and assignment of each local.
|
||||
fn generator_saved_local_eligibility(&self, info: &GeneratorLayout<'tcx>)
|
||||
-> (BitSet<GeneratorSavedLocal>, IndexVec<GeneratorSavedLocal, SavedLocalEligibility>) {
|
||||
use SavedLocalEligibility::*;
|
||||
|
||||
let mut assignments: IndexVec<GeneratorSavedLocal, SavedLocalEligibility> =
|
||||
IndexVec::from_elem_n(Unassigned, info.field_tys.len());
|
||||
|
||||
// The saved locals not eligible for overlap. These will get
|
||||
// "promoted" to the prefix of our generator.
|
||||
let mut ineligible_locals = BitSet::new_empty(info.field_tys.len());
|
||||
|
||||
// Figure out which of our saved locals are fields in only
|
||||
// one variant. The rest are deemed ineligible for overlap.
|
||||
for (variant_index, fields) in info.variant_fields.iter_enumerated() {
|
||||
for local in fields {
|
||||
match assignments[*local] {
|
||||
Unassigned => {
|
||||
assignments[*local] = Assigned(variant_index);
|
||||
}
|
||||
Assigned(idx) => {
|
||||
// We've already seen this local at another suspension
|
||||
// point, so it is no longer a candidate.
|
||||
trace!("removing local {:?} in >1 variant ({:?}, {:?})",
|
||||
local, variant_index, idx);
|
||||
ineligible_locals.insert(*local);
|
||||
assignments[*local] = Ineligible(None);
|
||||
}
|
||||
Ineligible(_) => {},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Next, check every pair of eligible locals to see if they
|
||||
// conflict.
|
||||
for local_a in info.storage_conflicts.rows() {
|
||||
let conflicts_a = info.storage_conflicts.count(local_a);
|
||||
if ineligible_locals.contains(local_a) {
|
||||
continue;
|
||||
}
|
||||
|
||||
for local_b in info.storage_conflicts.iter(local_a) {
|
||||
// local_a and local_b are storage live at the same time, therefore they
|
||||
// cannot overlap in the generator layout. The only way to guarantee
|
||||
// this is if they are in the same variant, or one is ineligible
|
||||
// (which means it is stored in every variant).
|
||||
if ineligible_locals.contains(local_b) ||
|
||||
assignments[local_a] == assignments[local_b]
|
||||
{
|
||||
continue;
|
||||
}
|
||||
|
||||
// If they conflict, we will choose one to make ineligible.
|
||||
// This is not always optimal; it's just a greedy heuristic that
|
||||
// seems to produce good results most of the time.
|
||||
let conflicts_b = info.storage_conflicts.count(local_b);
|
||||
let (remove, other) = if conflicts_a > conflicts_b {
|
||||
(local_a, local_b)
|
||||
} else {
|
||||
(local_b, local_a)
|
||||
};
|
||||
ineligible_locals.insert(remove);
|
||||
assignments[remove] = Ineligible(None);
|
||||
trace!("removing local {:?} due to conflict with {:?}", remove, other);
|
||||
}
|
||||
}
|
||||
|
||||
// Write down the order of our locals that will be promoted to the prefix.
|
||||
{
|
||||
let mut idx = 0u32;
|
||||
for local in ineligible_locals.iter() {
|
||||
assignments[local] = Ineligible(Some(idx));
|
||||
idx += 1;
|
||||
}
|
||||
}
|
||||
debug!("generator saved local assignments: {:?}", assignments);
|
||||
|
||||
(ineligible_locals, assignments)
|
||||
}
|
||||
|
||||
/// Compute the full generator layout.
|
||||
fn generator_layout(
|
||||
&self,
|
||||
ty: Ty<'tcx>,
|
||||
def_id: hir::def_id::DefId,
|
||||
substs: &GeneratorSubsts<'tcx>,
|
||||
) -> Result<&'tcx LayoutDetails, LayoutError<'tcx>> {
|
||||
use SavedLocalEligibility::*;
|
||||
let tcx = self.tcx;
|
||||
let recompute_memory_index = |offsets: &[Size]| -> Vec<u32> {
|
||||
debug!("recompute_memory_index({:?})", offsets);
|
||||
let mut inverse_index = (0..offsets.len() as u32).collect::<Vec<_>>();
|
||||
inverse_index.sort_unstable_by_key(|i| offsets[*i as usize]);
|
||||
|
||||
let mut index = vec![0; offsets.len()];
|
||||
for i in 0..index.len() {
|
||||
index[inverse_index[i] as usize] = i as u32;
|
||||
}
|
||||
debug!("recompute_memory_index() => {:?}", index);
|
||||
index
|
||||
};
|
||||
let subst_field = |ty: Ty<'tcx>| { ty.subst(tcx, substs.substs) };
|
||||
|
||||
let info = tcx.generator_layout(def_id);
|
||||
let (ineligible_locals, assignments) = self.generator_saved_local_eligibility(&info);
|
||||
|
||||
// Build a prefix layout, including "promoting" all ineligible
|
||||
// locals as part of the prefix. We compute the layout of all of
|
||||
// these fields at once to get optimal packing.
|
||||
let discr_index = substs.prefix_tys(def_id, tcx).count();
|
||||
let promoted_tys =
|
||||
ineligible_locals.iter().map(|local| subst_field(info.field_tys[local]));
|
||||
let prefix_tys = substs.prefix_tys(def_id, tcx)
|
||||
.chain(iter::once(substs.discr_ty(tcx)))
|
||||
.chain(promoted_tys);
|
||||
let prefix = self.univariant_uninterned(
|
||||
ty,
|
||||
&prefix_tys.map(|ty| self.layout_of(ty)).collect::<Result<Vec<_>, _>>()?,
|
||||
&ReprOptions::default(),
|
||||
StructKind::AlwaysSized)?;
|
||||
let (prefix_size, prefix_align) = (prefix.size, prefix.align);
|
||||
|
||||
// Split the prefix layout into the "outer" fields (upvars and
|
||||
// discriminant) and the "promoted" fields. Promoted fields will
|
||||
// get included in each variant that requested them in
|
||||
// GeneratorLayout.
|
||||
debug!("prefix = {:#?}", prefix);
|
||||
let (outer_fields, promoted_offsets) = match prefix.fields {
|
||||
FieldPlacement::Arbitrary { mut offsets, .. } => {
|
||||
let offsets_b = offsets.split_off(discr_index + 1);
|
||||
let offsets_a = offsets;
|
||||
|
||||
let memory_index = recompute_memory_index(&offsets_a);
|
||||
let outer_fields = FieldPlacement::Arbitrary { offsets: offsets_a, memory_index };
|
||||
(outer_fields, offsets_b)
|
||||
}
|
||||
_ => bug!(),
|
||||
};
|
||||
|
||||
let mut size = prefix.size;
|
||||
let mut align = prefix.align;
|
||||
let variants = info.variant_fields.iter_enumerated().map(|(index, variant_fields)| {
|
||||
// Only include overlap-eligible fields when we compute our variant layout.
|
||||
let variant_only_tys = variant_fields
|
||||
.iter()
|
||||
.filter(|local| {
|
||||
match assignments[**local] {
|
||||
Unassigned => bug!(),
|
||||
Assigned(v) if v == index => true,
|
||||
Assigned(_) => bug!("assignment does not match variant"),
|
||||
Ineligible(_) => false,
|
||||
}
|
||||
})
|
||||
.map(|local| subst_field(info.field_tys[*local]));
|
||||
|
||||
let mut variant = self.univariant_uninterned(
|
||||
ty,
|
||||
&variant_only_tys
|
||||
.map(|ty| self.layout_of(ty))
|
||||
.collect::<Result<Vec<_>, _>>()?,
|
||||
&ReprOptions::default(),
|
||||
StructKind::Prefixed(prefix_size, prefix_align.abi))?;
|
||||
variant.variants = Variants::Single { index };
|
||||
|
||||
let offsets = match variant.fields {
|
||||
FieldPlacement::Arbitrary { offsets, .. } => offsets,
|
||||
_ => bug!(),
|
||||
};
|
||||
|
||||
// Now, stitch the promoted and variant-only fields back together in
|
||||
// the order they are mentioned by our GeneratorLayout.
|
||||
let mut next_variant_field = 0;
|
||||
let mut combined_offsets = Vec::new();
|
||||
for local in variant_fields.iter() {
|
||||
match assignments[*local] {
|
||||
Unassigned => bug!(),
|
||||
Assigned(_) => {
|
||||
combined_offsets.push(offsets[next_variant_field]);
|
||||
next_variant_field += 1;
|
||||
}
|
||||
Ineligible(field_idx) => {
|
||||
let field_idx = field_idx.unwrap() as usize;
|
||||
combined_offsets.push(promoted_offsets[field_idx]);
|
||||
}
|
||||
}
|
||||
}
|
||||
let memory_index = recompute_memory_index(&combined_offsets);
|
||||
variant.fields = FieldPlacement::Arbitrary { offsets: combined_offsets, memory_index };
|
||||
|
||||
size = size.max(variant.size);
|
||||
align = align.max(variant.align);
|
||||
Ok(variant)
|
||||
}).collect::<Result<IndexVec<VariantIdx, _>, _>>()?;
|
||||
|
||||
let abi = if prefix.abi.is_uninhabited() ||
|
||||
variants.iter().all(|v| v.abi.is_uninhabited()) {
|
||||
Abi::Uninhabited
|
||||
} else {
|
||||
Abi::Aggregate { sized: true }
|
||||
};
|
||||
let discr = match &self.layout_of(substs.discr_ty(tcx))?.abi {
|
||||
Abi::Scalar(s) => s.clone(),
|
||||
_ => bug!(),
|
||||
};
|
||||
|
||||
let layout = tcx.intern_layout(LayoutDetails {
|
||||
variants: Variants::Multiple {
|
||||
discr,
|
||||
discr_kind: DiscriminantKind::Tag,
|
||||
discr_index,
|
||||
variants,
|
||||
},
|
||||
fields: outer_fields,
|
||||
abi,
|
||||
size,
|
||||
align,
|
||||
});
|
||||
debug!("generator layout ({:?}): {:#?}", ty, layout);
|
||||
Ok(layout)
|
||||
}
|
||||
}
|
||||
|
||||
impl<'a, 'tcx> LayoutCx<'tcx, TyCtxt<'a, 'tcx, 'tcx>> {
|
||||
/// This is invoked by the `layout_raw` query to record the final
|
||||
/// layout of each type.
|
||||
#[inline(always)]
|
||||
|
@ -636,7 +636,7 @@ pub fn contains(&self, elem: T) -> bool {
|
||||
///
|
||||
/// All operations that involve a row and/or column index will panic if the
|
||||
/// index exceeds the relevant bound.
|
||||
#[derive(Clone, Debug)]
|
||||
#[derive(Clone, Debug, Eq, PartialEq, RustcDecodable, RustcEncodable)]
|
||||
pub struct BitMatrix<R: Idx, C: Idx> {
|
||||
num_rows: usize,
|
||||
num_columns: usize,
|
||||
@ -658,6 +658,23 @@ pub fn new(num_rows: usize, num_columns: usize) -> BitMatrix<R, C> {
|
||||
}
|
||||
}
|
||||
|
||||
/// Creates a new matrix, with `row` used as the value for every row.
|
||||
pub fn from_row_n(row: &BitSet<C>, num_rows: usize) -> BitMatrix<R, C> {
|
||||
let num_columns = row.domain_size();
|
||||
let words_per_row = num_words(num_columns);
|
||||
assert_eq!(words_per_row, row.words().len());
|
||||
BitMatrix {
|
||||
num_rows,
|
||||
num_columns,
|
||||
words: iter::repeat(row.words()).take(num_rows).flatten().cloned().collect(),
|
||||
marker: PhantomData,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn rows(&self) -> impl Iterator<Item = R> {
|
||||
(0..self.num_rows).map(R::new)
|
||||
}
|
||||
|
||||
/// The range of bits for a given row.
|
||||
fn range(&self, row: R) -> (usize, usize) {
|
||||
let words_per_row = num_words(self.num_columns);
|
||||
@ -737,6 +754,49 @@ pub fn union_rows(&mut self, read: R, write: R) -> bool {
|
||||
changed
|
||||
}
|
||||
|
||||
/// Adds the bits from `with` to the bits from row `write`, and
|
||||
/// returns `true` if anything changed.
|
||||
pub fn union_row_with(&mut self, with: &BitSet<C>, write: R) -> bool {
|
||||
assert!(write.index() < self.num_rows);
|
||||
assert_eq!(with.domain_size(), self.num_columns);
|
||||
let (write_start, write_end) = self.range(write);
|
||||
let mut changed = false;
|
||||
for (read_index, write_index) in (0..with.words().len()).zip(write_start..write_end) {
|
||||
let word = self.words[write_index];
|
||||
let new_word = word | with.words()[read_index];
|
||||
self.words[write_index] = new_word;
|
||||
changed |= word != new_word;
|
||||
}
|
||||
changed
|
||||
}
|
||||
|
||||
/// Sets every cell in `row` to true.
|
||||
pub fn insert_all_into_row(&mut self, row: R) {
|
||||
assert!(row.index() < self.num_rows);
|
||||
let (start, end) = self.range(row);
|
||||
let words = &mut self.words[..];
|
||||
for index in start..end {
|
||||
words[index] = !0;
|
||||
}
|
||||
self.clear_excess_bits(row);
|
||||
}
|
||||
|
||||
/// Clear excess bits in the final word of the row.
|
||||
fn clear_excess_bits(&mut self, row: R) {
|
||||
let num_bits_in_final_word = self.num_columns % WORD_BITS;
|
||||
if num_bits_in_final_word > 0 {
|
||||
let mask = (1 << num_bits_in_final_word) - 1;
|
||||
let (_, end) = self.range(row);
|
||||
let final_word_idx = end - 1;
|
||||
self.words[final_word_idx] &= mask;
|
||||
}
|
||||
}
|
||||
|
||||
/// Gets a slice of the underlying words.
|
||||
pub fn words(&self) -> &[Word] {
|
||||
&self.words
|
||||
}
|
||||
|
||||
/// Iterates through all the columns set to true in a given row of
|
||||
/// the matrix.
|
||||
pub fn iter<'a>(&'a self, row: R) -> BitIter<'a, C> {
|
||||
@ -748,6 +808,12 @@ pub fn iter<'a>(&'a self, row: R) -> BitIter<'a, C> {
|
||||
marker: PhantomData,
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns the number of elements in `row`.
|
||||
pub fn count(&self, row: R) -> usize {
|
||||
let (start, end) = self.range(row);
|
||||
self.words[start..end].iter().map(|e| e.count_ones() as usize).sum()
|
||||
}
|
||||
}
|
||||
|
||||
/// A fixed-column-size, variable-row-size 2D bit matrix with a moderately
|
||||
@ -1057,6 +1123,7 @@ fn matrix_iter() {
|
||||
matrix.insert(2, 99);
|
||||
matrix.insert(4, 0);
|
||||
matrix.union_rows(3, 5);
|
||||
matrix.insert_all_into_row(6);
|
||||
|
||||
let expected = [99];
|
||||
let mut iter = expected.iter();
|
||||
@ -1068,6 +1135,7 @@ fn matrix_iter() {
|
||||
|
||||
let expected = [22, 75];
|
||||
let mut iter = expected.iter();
|
||||
assert_eq!(matrix.count(3), expected.len());
|
||||
for i in matrix.iter(3) {
|
||||
let j = *iter.next().unwrap();
|
||||
assert_eq!(i, j);
|
||||
@ -1076,6 +1144,7 @@ fn matrix_iter() {
|
||||
|
||||
let expected = [0];
|
||||
let mut iter = expected.iter();
|
||||
assert_eq!(matrix.count(4), expected.len());
|
||||
for i in matrix.iter(4) {
|
||||
let j = *iter.next().unwrap();
|
||||
assert_eq!(i, j);
|
||||
@ -1084,11 +1153,24 @@ fn matrix_iter() {
|
||||
|
||||
let expected = [22, 75];
|
||||
let mut iter = expected.iter();
|
||||
assert_eq!(matrix.count(5), expected.len());
|
||||
for i in matrix.iter(5) {
|
||||
let j = *iter.next().unwrap();
|
||||
assert_eq!(i, j);
|
||||
}
|
||||
assert!(iter.next().is_none());
|
||||
|
||||
assert_eq!(matrix.count(6), 100);
|
||||
let mut count = 0;
|
||||
for (idx, i) in matrix.iter(6).enumerate() {
|
||||
assert_eq!(idx, i);
|
||||
count += 1;
|
||||
}
|
||||
assert_eq!(count, 100);
|
||||
|
||||
if let Some(i) = matrix.iter(7).next() {
|
||||
panic!("expected no elements in row, but contains element {:?}", i);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
@@ -503,6 +503,16 @@ fn hash_stable<W: StableHasherResult>(&self,
    }
}

impl<R: indexed_vec::Idx, C: indexed_vec::Idx, CTX> HashStable<CTX>
for bit_set::BitMatrix<R, C>
{
    fn hash_stable<W: StableHasherResult>(&self,
                                          ctx: &mut CTX,
                                          hasher: &mut StableHasher<W>) {
        self.words().hash_stable(ctx, hasher);
    }
}

impl_stable_hash_via_hash!(::std::path::Path);
impl_stable_hash_via_hash!(::std::path::PathBuf);
@@ -131,6 +131,11 @@ pub fn with_iter_outgoing<F>(&self, f: F)
        curr_state.subtract(&self.stmt_kill);
        f(curr_state.iter());
    }

    /// Returns a bitset of the elements present in the current state.
    pub fn as_dense(&self) -> &BitSet<BD::Idx> {
        &self.curr_state
    }
}

impl<'tcx, BD> FlowsAtLocation for FlowAtLocation<'tcx, BD>
@ -59,13 +59,14 @@
|
||||
use rustc::ty::subst::SubstsRef;
|
||||
use rustc_data_structures::fx::FxHashMap;
|
||||
use rustc_data_structures::indexed_vec::{Idx, IndexVec};
|
||||
use rustc_data_structures::bit_set::BitSet;
|
||||
use rustc_data_structures::bit_set::{BitSet, BitMatrix};
|
||||
use std::borrow::Cow;
|
||||
use std::iter;
|
||||
use std::mem;
|
||||
use crate::transform::{MirPass, MirSource};
|
||||
use crate::transform::simplify;
|
||||
use crate::transform::no_landing_pads::no_landing_pads;
|
||||
use crate::dataflow::{DataflowResults, DataflowResultsConsumer, FlowAtLocation};
|
||||
use crate::dataflow::{do_dataflow, DebugFormatted, state_for_location};
|
||||
use crate::dataflow::{MaybeStorageLive, HaveBeenBorrowedLocals};
|
||||
use crate::util::dump_mir;
|
||||
@ -393,16 +394,33 @@ fn visit_statement(&mut self,
|
||||
}
|
||||
}
|
||||
|
||||
struct LivenessInfo {
|
||||
/// Which locals are live across any suspension point.
|
||||
///
|
||||
/// GeneratorSavedLocal is indexed in terms of the elements in this set;
|
||||
/// i.e. GeneratorSavedLocal::new(1) corresponds to the second local
|
||||
/// included in this set.
|
||||
live_locals: liveness::LiveVarSet,
|
||||
|
||||
/// The set of saved locals live at each suspension point.
|
||||
live_locals_at_suspension_points: Vec<BitSet<GeneratorSavedLocal>>,
|
||||
|
||||
/// For every saved local, the set of other saved locals that are
|
||||
/// storage-live at the same time as this local. We cannot overlap locals in
|
||||
/// the layout which have conflicting storage.
|
||||
storage_conflicts: BitMatrix<GeneratorSavedLocal, GeneratorSavedLocal>,
|
||||
|
||||
/// For every suspending block, the locals which are storage-live across
|
||||
/// that suspension point.
|
||||
storage_liveness: FxHashMap<BasicBlock, liveness::LiveVarSet>,
|
||||
}
|
||||
|
||||
fn locals_live_across_suspend_points(
|
||||
tcx: TyCtxt<'a, 'tcx, 'tcx>,
|
||||
body: &Body<'tcx>,
|
||||
source: MirSource<'tcx>,
|
||||
movable: bool,
|
||||
) -> (
|
||||
liveness::LiveVarSet,
|
||||
FxHashMap<BasicBlock, liveness::LiveVarSet>,
|
||||
BitSet<BasicBlock>,
|
||||
) {
|
||||
) -> LivenessInfo {
|
||||
let dead_unwinds = BitSet::new_empty(body.basic_blocks().len());
|
||||
let def_id = source.def_id();
|
||||
|
||||
@ -432,7 +450,7 @@ fn locals_live_across_suspend_points(
|
||||
};
|
||||
|
||||
// Calculate the liveness of MIR locals ignoring borrows.
|
||||
let mut set = liveness::LiveVarSet::new_empty(body.local_decls.len());
|
||||
let mut live_locals = liveness::LiveVarSet::new_empty(body.local_decls.len());
|
||||
let mut liveness = liveness::liveness_of_locals(
|
||||
body,
|
||||
);
|
||||
@ -445,13 +463,10 @@ fn locals_live_across_suspend_points(
|
||||
);
|
||||
|
||||
let mut storage_liveness_map = FxHashMap::default();
|
||||
|
||||
let mut suspending_blocks = BitSet::new_empty(body.basic_blocks().len());
|
||||
let mut live_locals_at_suspension_points = Vec::new();
|
||||
|
||||
for (block, data) in body.basic_blocks().iter_enumerated() {
|
||||
if let TerminatorKind::Yield { .. } = data.terminator().kind {
|
||||
suspending_blocks.insert(block);
|
||||
|
||||
let loc = Location {
|
||||
block: block,
|
||||
statement_index: data.statements.len(),
|
||||
@ -490,20 +505,177 @@ fn locals_live_across_suspend_points(
|
||||
// Locals live are live at this point only if they are used across
|
||||
// suspension points (the `liveness` variable)
|
||||
// and their storage is live (the `storage_liveness` variable)
|
||||
storage_liveness.intersect(&liveness.outs[block]);
|
||||
let mut live_locals_here = storage_liveness;
|
||||
live_locals_here.intersect(&liveness.outs[block]);
|
||||
|
||||
let live_locals = storage_liveness;
|
||||
// The generator argument is ignored
|
||||
live_locals_here.remove(self_arg());
|
||||
|
||||
// Add the locals life at this suspension point to the set of locals which live across
|
||||
// Add the locals live at this suspension point to the set of locals which live across
|
||||
// any suspension points
|
||||
set.union(&live_locals);
|
||||
live_locals.union(&live_locals_here);
|
||||
|
||||
live_locals_at_suspension_points.push(live_locals_here);
|
||||
}
|
||||
}
|
||||
|
||||
// The generator argument is ignored
|
||||
set.remove(self_arg());
|
||||
// Renumber our liveness_map bitsets to include only the locals we are
|
||||
// saving.
|
||||
let live_locals_at_suspension_points = live_locals_at_suspension_points
|
||||
.iter()
|
||||
.map(|live_here| renumber_bitset(&live_here, &live_locals))
|
||||
.collect();
|
||||
|
||||
(set, storage_liveness_map, suspending_blocks)
|
||||
let storage_conflicts = compute_storage_conflicts(
|
||||
body,
|
||||
&live_locals,
|
||||
&ignored,
|
||||
storage_live,
|
||||
storage_live_analysis);
|
||||
|
||||
LivenessInfo {
|
||||
live_locals,
|
||||
live_locals_at_suspension_points,
|
||||
storage_conflicts,
|
||||
storage_liveness: storage_liveness_map,
|
||||
}
|
||||
}
|
||||
|
||||
/// Renumbers the items present in `stored_locals` and applies the renumbering
|
||||
/// to 'input`.
|
||||
///
|
||||
/// For example, if `stored_locals = [1, 3, 5]`, this would be renumbered to
|
||||
/// `[0, 1, 2]`. Thus, if `input = [3, 5]` we would return `[1, 2]`.
|
||||
fn renumber_bitset(input: &BitSet<Local>, stored_locals: &liveness::LiveVarSet)
|
||||
-> BitSet<GeneratorSavedLocal> {
|
||||
assert!(stored_locals.superset(&input), "{:?} not a superset of {:?}", stored_locals, input);
|
||||
let mut out = BitSet::new_empty(stored_locals.count());
|
||||
for (idx, local) in stored_locals.iter().enumerate() {
|
||||
let saved_local = GeneratorSavedLocal::from(idx);
|
||||
if input.contains(local) {
|
||||
out.insert(saved_local);
|
||||
}
|
||||
}
|
||||
debug!("renumber_bitset({:?}, {:?}) => {:?}", input, stored_locals, out);
|
||||
out
|
||||
}
|
||||
|
||||
/// For every saved local, looks for which locals are StorageLive at the same
|
||||
/// time. Generates a bitset for every local of all the other locals that may be
|
||||
/// StorageLive simultaneously with that local. This is used in the layout
|
||||
/// computation; see `GeneratorLayout` for more.
|
||||
fn compute_storage_conflicts(
|
||||
body: &'mir Body<'tcx>,
|
||||
stored_locals: &liveness::LiveVarSet,
|
||||
ignored: &StorageIgnored,
|
||||
storage_live: DataflowResults<'tcx, MaybeStorageLive<'mir, 'tcx>>,
|
||||
_storage_live_analysis: MaybeStorageLive<'mir, 'tcx>,
|
||||
) -> BitMatrix<GeneratorSavedLocal, GeneratorSavedLocal> {
|
||||
assert_eq!(body.local_decls.len(), ignored.0.domain_size());
|
||||
assert_eq!(body.local_decls.len(), stored_locals.domain_size());
|
||||
debug!("compute_storage_conflicts({:?})", body.span);
|
||||
debug!("ignored = {:?}", ignored.0);
|
||||
|
||||
// Storage ignored locals are not eligible for overlap, since their storage
|
||||
// is always live.
|
||||
let mut ineligible_locals = ignored.0.clone();
|
||||
ineligible_locals.intersect(&stored_locals);
|
||||
|
||||
// Compute the storage conflicts for all eligible locals.
|
||||
let mut visitor = StorageConflictVisitor {
|
||||
body,
|
||||
stored_locals: &stored_locals,
|
||||
local_conflicts: BitMatrix::from_row_n(&ineligible_locals, body.local_decls.len())
|
||||
};
|
||||
let mut state = FlowAtLocation::new(storage_live);
|
||||
visitor.analyze_results(&mut state);
|
||||
let local_conflicts = visitor.local_conflicts;
|
||||
|
||||
// Compress the matrix using only stored locals (Local -> GeneratorSavedLocal).
|
||||
//
|
||||
// NOTE: Today we store a full conflict bitset for every local. Technically
|
||||
// this is twice as many bits as we need, since the relation is symmetric.
|
||||
// However, in practice these bitsets are not usually large. The layout code
|
||||
// also needs to keep track of how many conflicts each local has, so it's
|
||||
// simpler to keep it this way for now.
|
||||
let mut storage_conflicts = BitMatrix::new(stored_locals.count(), stored_locals.count());
|
||||
for (idx_a, local_a) in stored_locals.iter().enumerate() {
|
||||
let saved_local_a = GeneratorSavedLocal::new(idx_a);
|
||||
if ineligible_locals.contains(local_a) {
|
||||
// Conflicts with everything.
|
||||
storage_conflicts.insert_all_into_row(saved_local_a);
|
||||
} else {
|
||||
// Keep overlap information only for stored locals.
|
||||
for (idx_b, local_b) in stored_locals.iter().enumerate() {
|
||||
let saved_local_b = GeneratorSavedLocal::new(idx_b);
|
||||
if local_conflicts.contains(local_a, local_b) {
|
||||
storage_conflicts.insert(saved_local_a, saved_local_b);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
storage_conflicts
|
||||
}
|
||||
|
||||
struct StorageConflictVisitor<'body, 'tcx: 'body, 's> {
|
||||
body: &'body Body<'tcx>,
|
||||
stored_locals: &'s liveness::LiveVarSet,
|
||||
// FIXME(tmandry): Consider using sparse bitsets here once we have good
|
||||
// benchmarks for generators.
|
||||
local_conflicts: BitMatrix<Local, Local>,
|
||||
}
|
||||
|
||||
impl<'body, 'tcx: 'body, 's> DataflowResultsConsumer<'body, 'tcx>
|
||||
for StorageConflictVisitor<'body, 'tcx, 's> {
|
||||
type FlowState = FlowAtLocation<'tcx, MaybeStorageLive<'body, 'tcx>>;
|
||||
|
||||
fn body(&self) -> &'body Body<'tcx> {
|
||||
self.body
|
||||
}
|
||||
|
||||
fn visit_block_entry(&mut self,
|
||||
block: BasicBlock,
|
||||
flow_state: &Self::FlowState) {
|
||||
// statement_index is only used for logging, so this is fine.
|
||||
self.apply_state(flow_state, Location { block, statement_index: 0 });
|
||||
}
|
||||
|
||||
fn visit_statement_entry(&mut self,
|
||||
loc: Location,
|
||||
_stmt: &Statement<'tcx>,
|
||||
flow_state: &Self::FlowState) {
|
||||
self.apply_state(flow_state, loc);
|
||||
}
|
||||
|
||||
fn visit_terminator_entry(&mut self,
|
||||
loc: Location,
|
||||
_term: &Terminator<'tcx>,
|
||||
flow_state: &Self::FlowState) {
|
||||
self.apply_state(flow_state, loc);
|
||||
}
|
||||
}
|
||||
|
||||
impl<'body, 'tcx: 'body, 's> StorageConflictVisitor<'body, 'tcx, 's> {
|
||||
fn apply_state(&mut self,
|
||||
flow_state: &FlowAtLocation<'tcx, MaybeStorageLive<'body, 'tcx>>,
|
||||
loc: Location) {
|
||||
// Ignore unreachable blocks.
|
||||
match self.body.basic_blocks()[loc.block].terminator().kind {
|
||||
TerminatorKind::Unreachable => return,
|
||||
_ => (),
|
||||
};
|
||||
|
||||
let mut eligible_storage_live = flow_state.as_dense().clone();
|
||||
eligible_storage_live.intersect(&self.stored_locals);
|
||||
|
||||
for local in eligible_storage_live.iter() {
|
||||
self.local_conflicts.union_row_with(&eligible_storage_live, local);
|
||||
}
|
||||
|
||||
if eligible_storage_live.count() > 1 {
|
||||
trace!("at {:?}, eligible_storage_live={:?}", loc, eligible_storage_live);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn compute_layout<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
|
||||
@ -517,8 +689,9 @@ fn compute_layout<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
|
||||
FxHashMap<BasicBlock, liveness::LiveVarSet>)
|
||||
{
|
||||
// Use a liveness analysis to compute locals which are live across a suspension point
|
||||
let (live_locals, storage_liveness, suspending_blocks) =
|
||||
locals_live_across_suspend_points(tcx, body, source, movable);
|
||||
let LivenessInfo {
|
||||
live_locals, live_locals_at_suspension_points, storage_conflicts, storage_liveness
|
||||
} = locals_live_across_suspend_points(tcx, body, source, movable);
|
||||
|
||||
// Erase regions from the types passed in from typeck so we can compare them with
|
||||
// MIR types
|
||||
@ -547,37 +720,47 @@ fn compute_layout<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
|
||||
|
||||
let dummy_local = LocalDecl::new_internal(tcx.mk_unit(), body.span);
|
||||
|
||||
// Gather live locals and their indices replacing values in mir.local_decls with a dummy
|
||||
// to avoid changing local indices
|
||||
let live_decls = live_locals.iter().map(|local| {
|
||||
// Gather live locals and their indices replacing values in body.local_decls
|
||||
// with a dummy to avoid changing local indices.
|
||||
let mut locals = IndexVec::<GeneratorSavedLocal, _>::new();
|
||||
let mut tys = IndexVec::<GeneratorSavedLocal, _>::new();
|
||||
let mut decls = IndexVec::<GeneratorSavedLocal, _>::new();
|
||||
for (idx, local) in live_locals.iter().enumerate() {
|
||||
let var = mem::replace(&mut body.local_decls[local], dummy_local.clone());
|
||||
(local, var)
|
||||
});
|
||||
|
||||
// For now we will access everything via variant #3, leaving empty variants
|
||||
// for the UNRESUMED, RETURNED, and POISONED states.
|
||||
// If there were a yield-less generator without a variant #3, it would not
|
||||
// have any vars to remap, so we would never use this.
|
||||
let variant_index = VariantIdx::new(3);
|
||||
|
||||
// Create a map from local indices to generator struct indices.
|
||||
// We also create a vector of the LocalDecls of these locals.
|
||||
let mut remap = FxHashMap::default();
|
||||
let mut decls = IndexVec::new();
|
||||
for (idx, (local, var)) in live_decls.enumerate() {
|
||||
remap.insert(local, (var.ty, variant_index, idx));
|
||||
locals.push(local);
|
||||
tys.push(var.ty);
|
||||
decls.push(var);
|
||||
debug!("generator saved local {:?} => {:?}", GeneratorSavedLocal::from(idx), local);
|
||||
}
|
||||
let field_tys = decls.iter().map(|field| field.ty).collect::<IndexVec<_, _>>();
|
||||
|
||||
// Put every var in each variant, for now.
|
||||
let all_vars = (0..field_tys.len()).map(GeneratorSavedLocal::from).collect();
|
||||
let empty_variants = iter::repeat(IndexVec::new()).take(3);
|
||||
let state_variants = iter::repeat(all_vars).take(suspending_blocks.count());
|
||||
// Leave empty variants for the UNRESUMED, RETURNED, and POISONED states.
|
||||
const RESERVED_VARIANTS: usize = 3;
|
||||
|
||||
// Build the generator variant field list.
|
||||
// Create a map from local indices to generator struct indices.
|
||||
let mut variant_fields: IndexVec<VariantIdx, IndexVec<Field, GeneratorSavedLocal>> =
|
||||
iter::repeat(IndexVec::new()).take(RESERVED_VARIANTS).collect();
|
||||
let mut remap = FxHashMap::default();
|
||||
for (suspension_point_idx, live_locals) in live_locals_at_suspension_points.iter().enumerate() {
|
||||
let variant_index = VariantIdx::from(RESERVED_VARIANTS + suspension_point_idx);
|
||||
let mut fields = IndexVec::new();
|
||||
for (idx, saved_local) in live_locals.iter().enumerate() {
|
||||
fields.push(saved_local);
|
||||
// Note that if a field is included in multiple variants, we will
|
||||
// just use the first one here. That's fine; fields do not move
|
||||
// around inside generators, so it doesn't matter which variant
|
||||
// index we access them by.
|
||||
remap.entry(locals[saved_local]).or_insert((tys[saved_local], variant_index, idx));
|
||||
}
|
||||
variant_fields.push(fields);
|
||||
}
|
||||
debug!("generator variant_fields = {:?}", variant_fields);
|
||||
debug!("generator storage_conflicts = {:#?}", storage_conflicts);
|
||||
|
||||
let layout = GeneratorLayout {
|
||||
field_tys,
|
||||
variant_fields: empty_variants.chain(state_variants).collect(),
|
||||
field_tys: tys,
|
||||
variant_fields,
|
||||
storage_conflicts,
|
||||
__local_debuginfo_codegen_only_do_not_use: decls,
|
||||
};
|
||||
|
||||
|
src/test/run-pass/async-fn-size.rs (new file, 106 lines)
@@ -0,0 +1,106 @@
|
||||
// edition:2018
|
||||
// aux-build:arc_wake.rs
|
||||
|
||||
#![feature(async_await, await_macro)]
|
||||
|
||||
extern crate arc_wake;
|
||||
|
||||
use std::pin::Pin;
|
||||
use std::future::Future;
|
||||
use std::sync::{
|
||||
Arc,
|
||||
atomic::{self, AtomicUsize},
|
||||
};
|
||||
use std::task::{Context, Poll};
|
||||
use arc_wake::ArcWake;
|
||||
|
||||
struct Counter {
|
||||
wakes: AtomicUsize,
|
||||
}
|
||||
|
||||
impl ArcWake for Counter {
|
||||
fn wake(self: Arc<Self>) {
|
||||
Self::wake_by_ref(&self)
|
||||
}
|
||||
fn wake_by_ref(arc_self: &Arc<Self>) {
|
||||
arc_self.wakes.fetch_add(1, atomic::Ordering::SeqCst);
|
||||
}
|
||||
}
|
||||
|
||||
struct WakeOnceThenComplete(bool, u8);
|
||||
|
||||
impl Future for WakeOnceThenComplete {
|
||||
type Output = u8;
|
||||
fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u8> {
|
||||
if self.0 {
|
||||
Poll::Ready(self.1)
|
||||
} else {
|
||||
cx.waker().wake_by_ref();
|
||||
self.0 = true;
|
||||
Poll::Pending
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn wait(fut: impl Future<Output = u8>) -> u8 {
|
||||
let mut fut = Box::pin(fut);
|
||||
let counter = Arc::new(Counter { wakes: AtomicUsize::new(0) });
|
||||
let waker = ArcWake::into_waker(counter.clone());
|
||||
let mut cx = Context::from_waker(&waker);
|
||||
loop {
|
||||
match fut.as_mut().poll(&mut cx) {
|
||||
Poll::Ready(out) => return out,
|
||||
Poll::Pending => (),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn base() -> WakeOnceThenComplete { WakeOnceThenComplete(false, 1) }
|
||||
|
||||
async fn await1_level1() -> u8 {
|
||||
await!(base())
|
||||
}
|
||||
|
||||
async fn await2_level1() -> u8 {
|
||||
await!(base()) + await!(base())
|
||||
}
|
||||
|
||||
async fn await3_level1() -> u8 {
|
||||
await!(base()) + await!(base()) + await!(base())
|
||||
}
|
||||
|
||||
async fn await3_level2() -> u8 {
|
||||
await!(await3_level1()) + await!(await3_level1()) + await!(await3_level1())
|
||||
}
|
||||
|
||||
async fn await3_level3() -> u8 {
|
||||
await!(await3_level2()) + await!(await3_level2()) + await!(await3_level2())
|
||||
}
|
||||
|
||||
async fn await3_level4() -> u8 {
|
||||
await!(await3_level3()) + await!(await3_level3()) + await!(await3_level3())
|
||||
}
|
||||
|
||||
async fn await3_level5() -> u8 {
|
||||
await!(await3_level4()) + await!(await3_level4()) + await!(await3_level4())
|
||||
}
|
||||
|
||||
fn main() {
|
||||
assert_eq!(2, std::mem::size_of_val(&base()));
|
||||
assert_eq!(8, std::mem::size_of_val(&await1_level1()));
|
||||
assert_eq!(12, std::mem::size_of_val(&await2_level1()));
|
||||
assert_eq!(12, std::mem::size_of_val(&await3_level1()));
|
||||
assert_eq!(20, std::mem::size_of_val(&await3_level2()));
|
||||
assert_eq!(28, std::mem::size_of_val(&await3_level3()));
|
||||
assert_eq!(36, std::mem::size_of_val(&await3_level4()));
|
||||
assert_eq!(44, std::mem::size_of_val(&await3_level5()));
|
||||
|
||||
assert_eq!(1, wait(base()));
|
||||
assert_eq!(1, wait(await1_level1()));
|
||||
assert_eq!(2, wait(await2_level1()));
|
||||
assert_eq!(3, wait(await3_level1()));
|
||||
assert_eq!(9, wait(await3_level2()));
|
||||
assert_eq!(27, wait(await3_level3()));
|
||||
assert_eq!(81, wait(await3_level4()));
|
||||
assert_eq!(243, wait(await3_level5()));
|
||||
}
|
src/test/run-pass/generator/overlap-locals.rs (new file, 27 lines)
@@ -0,0 +1,27 @@
|
||||
#![feature(generators)]
|
||||
|
||||
fn main() {
|
||||
let a = || {
|
||||
{
|
||||
let w: i32 = 4;
|
||||
yield;
|
||||
println!("{:?}", w);
|
||||
}
|
||||
{
|
||||
let x: i32 = 5;
|
||||
yield;
|
||||
println!("{:?}", x);
|
||||
}
|
||||
{
|
||||
let y: i32 = 6;
|
||||
yield;
|
||||
println!("{:?}", y);
|
||||
}
|
||||
{
|
||||
let z: i32 = 7;
|
||||
yield;
|
||||
println!("{:?}", z);
|
||||
}
|
||||
};
|
||||
assert_eq!(8, std::mem::size_of_val(&a));
|
||||
}