This trades an O(n) allocation + memcpy for a O(1) proc allocation (for the destructor). Most users only need &[u8] anyway (all of the users in the main repo), and so this offers large gains.