26 KiB
Serde Rust Serialization Framework
Serde is a powerful framework that enables serialization libraries to generically serialize Rust data structures without the overhead of runtime type information. In many situations, the handshake protocol between serializers and serializees can be completely optimized away, leaving Serde to perform roughly the same speed as a hand written serializer for a specific type.
Simple Serde Example
Here is a simple example that uses
serde_json, which uses Serde under the
covers, to generate and parse JSON. First, lets start off with the Cargo.toml
file:
[package]
name = "serde_example"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]
[dependencies]
serde_json = "0.8"
Next, the src/main.rs
file itself:
extern crate serde_json;
use std::collections::HashMap;
use serde_json::Value;
use serde_json::builder::{ArrayBuilder, ObjectBuilder};
fn main() {
// Serde has support for many of the builtin Rust types, like arrays..:
let v = vec![1, 2];
let serialized = serde_json::to_string(&v).unwrap();
println!("serialized vec: {:?}", serialized);
let deserialized: Vec<u32> = serde_json::from_str(&serialized).unwrap();
println!("deserialized vec: {:?}", deserialized);
// ... and maps:
let mut map = HashMap::new();
map.insert("x".to_string(), 1);
map.insert("y".to_string(), 2);
let serialized = serde_json::to_string(&map).unwrap();
println!("serialized map: {:?}", serialized);
let deserialized: HashMap<String, u32> = serde_json::from_str(&serialized).unwrap();
println!("deserialized map: {:?}", deserialized);
// It also can handle complex objects:
let value = ObjectBuilder::new()
.insert("int", 1)
.insert("string", "a string")
.insert("array", ArrayBuilder::new()
.push(1)
.push(2)
.unwrap())
.unwrap();
let serialized = serde_json::to_string(&value).unwrap();
println!("serialized value: {:?}", serialized);
let deserialized: serde_json::Value = serde_json::from_str(&serialized).unwrap();
println!("deserialized value: {:?}", deserialized);
}
This produces the following output when run:
% cargo run
serialized vec: "[1,2]"
deserialized vec: [1, 2]
serialized map: "{\"y\":2,\"x\":1}"
deserialized map: {"y": 2, "x": 1}
serialized value: "{\"array\":[1,2],\"int\":1,\"string\":\"a string\"}"
deserialized value: {"array":[1,2],"int":1,"string":"a string"}
Using Serde with Stable Rust and serde_codegen
The example before used serde_json::Value
as the in-memory representation of
the JSON value, but it's also possible for Serde to serialize to and from
regular Rust types. However, the code to do this can be a bit complicated to
write. So instead, Serde also has some powerful code generation libraries that
work with Stable and Nightly Rust that eliminate much of the complexity of hand
rolling serialization and deserialization for a given type.
First lets see how we would use Stable Rust, which is currently a tad more
complicated than Nightly Rust due to having to work around compiler plugins
being unstable. We will use serde_codegen
which is based on the code
generation library syntex. First we need
to setup the Cargo.toml
that builds the project:
[package]
name = "serde_example"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]
build = "build.rs"
[build-dependencies]
serde_codegen = "0.8"
[dependencies]
serde = "0.8"
serde_json = "0.8"
Next, we define our source file, src/main.rs.in
. Note this is a different
extension than usual because we need to do code generation:
#[derive(Serialize, Deserialize, Debug)]
struct Point {
x: i32,
y: i32,
}
fn main() {
let point = Point { x: 1, y: 2 };
let serialized = serde_json::to_string(&point).unwrap();
println!("{}", serialized);
let deserialized: Point = serde_json::from_str(&serialized).unwrap();
println!("{:?}", deserialized);
}
To finish up the main source code, we define a very simple src/main.rs
that
uses the generated code.
src/main.rs
:
extern crate serde;
extern crate serde_json;
include!(concat!(env!("OUT_DIR"), "/main.rs"));
The last step is to actually drive the code generation, with the build.rs
script:
extern crate serde_codegen;
use std::env;
use std::path::Path;
pub fn main() {
let out_dir = env::var_os("OUT_DIR").unwrap();
let src = Path::new("src/main.rs.in");
let dst = Path::new(&out_dir).join("main.rs");
serde_codegen::expand(&src, &dst).unwrap();
}
All this produces this when run:
% cargo run
{"x":1,"y":2}
Point { x: 1, y: 2 }
While this works well with Stable Rust, be aware that the error locations currently are reported in the generated file instead of in the source file.
Using Serde with Nightly Rust and serde_macros
The prior example is a bit more complicated than it needs to be due to compiler
plugins being unstable. However, if you are already using Nightly Rust, you can
use serde_macros
, which has a much simpler interface. First, here is the new
Cargo.toml
:
[package]
name = "serde_example_nightly"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]
[dependencies]
serde = "0.8"
serde_json = "0.8"
serde_macros = "0.8"
Note that it doesn't need a build script. Now the src/main.rs
, which enables
the plugin feature, and registers the serde_macros
plugin:
#![feature(custom_derive, plugin)]
#![plugin(serde_macros)]
extern crate serde_json;
#[derive(Serialize, Deserialize, Debug)]
struct Point {
x: i32,
y: i32,
}
fn main() {
let point = Point { x: 1, y: 2 };
let serialized = serde_json::to_string(&point).unwrap();
println!("{}", serialized);
let deserialized: Point = serde_json::from_str(&serialized).unwrap();
println!("{:?}", deserialized);
}
This also produces the same output:
% cargo run
{"x":1,"y":2}
Point { x: 1, y: 2 }
You may find it easier to develop with Nightly Rust and serde\_macros
, then
deploy with Stable Rust and serde_codegen
. It's possible to combine both
approaches in one setup:
Cargo.toml
:
[package]
name = "serde_example"
version = "0.1.0"
authors = ["Erick Tryzelaar <erick.tryzelaar@gmail.com>"]
build = "build.rs"
[features]
default = ["serde_codegen"]
nightly = ["serde_macros"]
[build-dependencies]
serde_codegen = { version = "0.8", optional = true }
[dependencies]
serde = "0.8"
serde_json = "0.8"
serde_macros = { version = "0.8", optional = true }
build.rs
:
#[cfg(not(feature = "serde_macros"))]
mod inner {
extern crate serde_codegen;
use std::env;
use std::path::Path;
pub fn main() {
let out_dir = env::var_os("OUT_DIR").unwrap();
let src = Path::new("src/main.rs.in");
let dst = Path::new(&out_dir).join("main.rs");
serde_codegen::expand(&src, &dst).unwrap();
}
}
#[cfg(feature = "serde_macros")]
mod inner {
pub fn main() {}
}
fn main() {
inner::main();
}
src/main.rs
:
#![cfg_attr(feature = "serde_macros", feature(custom_derive, plugin))]
#![cfg_attr(feature = "serde_macros", plugin(serde_macros))]
extern crate serde;
extern crate serde_json;
#[cfg(feature = "serde_macros")]
include!("main.rs.in");
#[cfg(not(feature = "serde_macros"))]
include!(concat!(env!("OUT_DIR"), "/main.rs"));
The src/main.rs.in
is the same as before.
Then to run with stable:
% cargo build
...
Or with nightly:
% cargo build --features nightly --no-default-features
...
Serialization without Macros
Under the covers, Serde extensively uses the Visitor pattern to thread state between the Serializer and Serialize without the two having specific information about each other's concrete type. This has many of the same benefits as frameworks that use runtime type information without the overhead. In fact, when compiling with optimizations, Rust is able to remove most or all the visitor state, and generate code that's nearly as fast as a hand written serializer format for a specific type.
To see it in action, lets look at how a simple type like i32
is serialized.
The
Serializer
is threaded through the type:
impl serde::Serialize for i32 {
fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
where S: serde::Serializer,
{
serializer.serialize_i32(*self)
}
}
As you can see it's pretty simple. More complex types like BTreeMap
need to
pass a
MapVisitor
to the
Serializer
in order to walk through the type:
impl<K, V> Serialize for BTreeMap<K, V>
where K: Serialize + Ord,
V: Serialize,
{
#[inline]
fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
where S: Serializer,
{
serializer.serialize_map(MapIteratorVisitor::new(self.iter(), Some(self.len())))
}
}
pub struct MapIteratorVisitor<Iter> {
iter: Iter,
len: Option<usize>,
}
impl<K, V, Iter> MapIteratorVisitor<Iter>
where Iter: Iterator<Item=(K, V)>
{
#[inline]
pub fn new(iter: Iter, len: Option<usize>) -> MapIteratorVisitor<Iter> {
MapIteratorVisitor {
iter: iter,
len: len,
}
}
}
impl<K, V, I> MapVisitor for MapIteratorVisitor<I>
where K: Serialize,
V: Serialize,
I: Iterator<Item=(K, V)>,
{
#[inline]
fn visit<S>(&mut self, serializer: &mut S) -> Result<Option<()>, S::Error>
where S: Serializer,
{
match self.iter.next() {
Some((key, value)) => {
let value = try!(serializer.serialize_map_elt(key, value));
Ok(Some(value))
}
None => Ok(None)
}
}
#[inline]
fn len(&self) -> Option<usize> {
self.len
}
}
Serializing structs follow this same pattern. In fact, structs are represented as a named map. Its visitor uses a simple state machine to iterate through all the fields:
extern crate serde;
extern crate serde_json;
struct Point {
x: i32,
y: i32,
}
impl serde::Serialize for Point {
fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
where S: serde::Serializer
{
serializer.serialize_struct("Point", PointMapVisitor {
value: self,
state: 0,
})
}
}
struct PointMapVisitor<'a> {
value: &'a Point,
state: u8,
}
impl<'a> serde::ser::MapVisitor for PointMapVisitor<'a> {
fn visit<S>(&mut self, serializer: &mut S) -> Result<Option<()>, S::Error>
where S: serde::Serializer
{
match self.state {
0 => {
self.state += 1;
Ok(Some(try!(serializer.serialize_struct_elt("x", &self.value.x))))
}
1 => {
self.state += 1;
Ok(Some(try!(serializer.serialize_struct_elt("y", &self.value.y))))
}
_ => {
Ok(None)
}
}
}
}
fn main() {
let point = Point { x: 1, y: 2 };
let serialized = serde_json::to_string(&point).unwrap();
println!("{}", serialized);
}
Deserialization without Macros
Deserialization is a little more complicated since there's a bit more error
handling that needs to occur. Let's start with the simple i32
Deserialize
implementation. It passes a
Visitor to the
Deserializer.
The Visitor
can create the i32
from a variety of different types:
impl Deserialize for i32 {
fn deserialize<D>(deserializer: &mut D) -> Result<i32, D::Error>
where D: serde::Deserializer,
{
deserializer.deserialize(I32Visitor)
}
}
struct I32Visitor;
impl serde::de::Visitor for I32Visitor {
type Value = i32;
fn visit_i16<E>(&mut self, value: i16) -> Result<i32, E>
where E: Error,
{
self.visit_i32(value as i32)
}
fn visit_i32<E>(&mut self, value: i32) -> Result<i32, E>
where E: Error,
{
Ok(value)
}
...
Since it's possible for this type to get passed an unexpected type, we need a way to error out. This is done by way of the Error trait, which allows a Deserialize to generate an error for a few common error conditions. Here's how it could be used:
...
fn visit_string<E>(&mut self, _: String) -> Result<i32, E>
where E: Error,
{
Err(serde::de::Error::custom("expect a string"))
}
...
Maps follow a similar pattern as before, and use a MapVisitor to walk through the values generated by the Deserializer.
impl<K, V> serde::Deserialize for BTreeMap<K, V>
where K: serde::Deserialize + Eq + Ord,
V: serde::Deserialize,
{
fn deserialize<D>(deserializer: &mut D) -> Result<BTreeMap<K, V>, D::Error>
where D: serde::Deserializer,
{
deserializer.deserialize(BTreeMapVisitor::new())
}
}
pub struct BTreeMapVisitor<K, V> {
marker: PhantomData<BTreeMap<K, V>>,
}
impl<K, V> BTreeMapVisitor<K, V> {
pub fn new() -> Self {
BTreeMapVisitor {
marker: PhantomData,
}
}
}
impl<K, V> serde::de::Visitor for BTreeMapVisitor<K, V>
where K: serde::de::Deserialize + Ord,
V: serde::de::Deserialize
{
type Value = BTreeMap<K, V>;
fn visit_unit<E>(&mut self) -> Result<BTreeMap<K, V>, E>
where E: Error,
{
Ok(BTreeMap::new())
}
fn visit_map<V_>(&mut self, mut visitor: V_) -> Result<BTreeMap<K, V>, V_::Error>
where V_: MapVisitor,
{
let mut values = BTreeMap::new();
while let Some((key, value)) = try!(visitor.visit()) {
values.insert(key, value);
}
try!(visitor.end());
Ok(values)
}
}
Deserializing structs goes a step further in order to support not allocating a
String
to hold the field names. This is done by custom field enum that
deserializes an enum variant from a string. So for our Point
example from
before, we need to generate:
extern crate serde;
extern crate serde_json;
#[derive(Debug)]
struct Point {
x: i32,
y: i32,
}
enum PointField {
X,
Y,
}
impl serde::Deserialize for PointField {
fn deserialize<D>(deserializer: &mut D) -> Result<PointField, D::Error>
where D: serde::de::Deserializer
{
struct PointFieldVisitor;
impl serde::de::Visitor for PointFieldVisitor {
type Value = PointField;
fn visit_str<E>(&mut self, value: &str) -> Result<PointField, E>
where E: serde::de::Error
{
match value {
"x" => Ok(PointField::X),
"y" => Ok(PointField::Y),
_ => Err(serde::de::Error::custom("expected x or y")),
}
}
}
deserializer.deserialize(PointFieldVisitor)
}
}
impl serde::Deserialize for Point {
fn deserialize<D>(deserializer: &mut D) -> Result<Point, D::Error>
where D: serde::de::Deserializer
{
static FIELDS: &'static [&'static str] = &["x", "y"];
deserializer.deserialize_struct("Point", FIELDS, PointVisitor)
}
}
struct PointVisitor;
impl serde::de::Visitor for PointVisitor {
type Value = Point;
fn visit_map<V>(&mut self, mut visitor: V) -> Result<Point, V::Error>
where V: serde::de::MapVisitor
{
let mut x = None;
let mut y = None;
loop {
match try!(visitor.visit_key()) {
Some(PointField::X) => { x = Some(try!(visitor.visit_value())); }
Some(PointField::Y) => { y = Some(try!(visitor.visit_value())); }
None => { break; }
}
}
let x = match x {
Some(x) => x,
None => try!(visitor.missing_field("x")),
};
let y = match y {
Some(y) => y,
None => try!(visitor.missing_field("y")),
};
try!(visitor.end());
Ok(Point{ x: x, y: y })
}
}
fn main() {
let serialized = "{\"x\":1,\"y\":2}";
let deserialized: Point = serde_json::from_str(&serialized).unwrap();
println!("{:?}", deserialized);
}
Design Considerations and tradeoffs for Serializers and Deserializers
Serde serialization and deserialization implementations are written in such a
way that they err on being able to represent more values, and also provide
better error messages when they are passed an incorrect type to deserialize
from. For example, by default, it is a syntax error to deserialize a String
into an Option<String>
. This is implemented such that it is possible to
distinguish between the values None
and Some(())
, if the serialization
format supports option types.
However, many formats do not have option types, and represents optional values
as either a null
, or some other value. Serde Serializer
s and
Deserializer
s can opt-in support for this. For serialization, this is pretty
easy. Simply implement these methods:
...
fn visit_none(&mut self) -> Result<(), Self::Error> {
self.visit_unit()
}
fn visit_some<T>(&mut self, value: T) -> Result<(), Self::Error> {
value.serialize(self)
}
...
For deserialization, this can be implemented by way of the
Deserializer::visit_option
hook, which presumes that there is some ability to peek at what is the
next value in the serialized token stream. This following example is from
serde_tests::TokenDeserializer,
where it checks to see if the next value is an Option
, a ()
, or some other
value:
...
fn visit_option<V>(&mut self, mut visitor: V) -> Result<V::Value, Error>
where V: de::Visitor,
{
match self.tokens.peek() {
Some(&Token::Option(false)) => {
self.tokens.next();
visitor.visit_none()
}
Some(&Token::Option(true)) => {
self.tokens.next();
visitor.visit_some(self)
}
Some(&Token::Unit) => {
self.tokens.next();
visitor.visit_none()
}
Some(_) => visitor.visit_some(self),
None => Err(Error::EndOfStreamError),
}
}
...
Annotations
serde_codegen
and serde_macros
support annotations that help to customize
how types are serialized. Here are the supported annotations:
Container Annotations:
Annotation | Function |
---|---|
#[serde(rename="name")] |
Serialize and deserialize this container with the given name |
#[serde(rename(serialize="name1"))] |
Serialize this container with the given name |
#[serde(rename(deserialize="name1"))] |
Deserialize this container with the given name |
#[serde(deny_unknown_fields)] |
Always error during serialization when encountering unknown fields. When absent, unknown fields are ignored for self-describing formats like JSON. |
#[serde(bound="T: MyTrait")] |
Where-clause for the Serialize and Deserialize impls. This replaces any bounds inferred by Serde. |
#[serde(bound(serialize="T: MyTrait"))] |
Where-clause for the Serialize impl. |
#[serde(bound(deserialize="T: MyTrait"))] |
Where-clause for the Deserialize impl. |
Variant Annotations:
Annotation | Function |
---|---|
#[serde(rename="name")] |
Serialize and deserialize this variant with the given name |
#[serde(rename(serialize="name1"))] |
Serialize this variant with the given name |
#[serde(rename(deserialize="name1"))] |
Deserialize this variant with the given name |
Field Annotations:
Annotation | Function |
---|---|
#[serde(rename="name")] |
Serialize and deserialize this field with the given name |
#[serde(rename(serialize="name1"))] |
Serialize this field with the given name |
#[serde(rename(deserialize="name1"))] |
Deserialize this field with the given name |
#[serde(default)] |
If the value is not specified, use the Default::default() |
#[serde(default="$path")] |
Call the path to a function fn() -> T to build the value |
#[serde(skip_serializing)] |
Do not serialize this value |
#[serde(skip_deserializing)] |
Always use Default::default() or #[serde(default="$path")] instead of deserializing this value |
#[serde(skip_serializing_if="$path")] |
Do not serialize this value if this function fn(&T) -> bool returns true |
#[serde(serialize_with="$path")] |
Call a function fn<S>(&T, &mut S) -> Result<(), S::Error> where S: Serializer to serialize this value of type T |
#[serde(deserialize_with="$path")] |
Call a function fn<D>(&mut D) -> Result<T, D::Error> where D: Deserializer to deserialize this value of type T |
#[serde(bound="T: MyTrait")] |
Where-clause for the Serialize and Deserialize impls. This replaces any bounds inferred by Serde for the current field. |
#[serde(bound(serialize="T: MyTrait"))] |
Where-clause for the Serialize impl. |
#[serde(bound(deserialize="T: MyTrait"))] |
Where-clause for the Deserialize impl. |
Using in no_std
crates
The core serde
package defines a number of features to enable usage in a
variety of freestanding environments. Enable any or none of the following
features, and use default-features = false
in your Cargo.toml
:
alloc
(impliesnightly
)collections
(impliesalloc
andnightly
)std
(default)
If you only use default-features = false
, you will receive a stock no_std
serde with no support for any of the collection types.
Serialization Formats Using Serde
Format | Name |
---|---|
Bincode | bincode |
env vars | envy |
Hjson | serde_hjson |
JSON | serde_json |
MessagePack | rmp |
XML | serde_xml |
YAML | serde_yaml |