Add simple data-race detector
Partially fixes data-race detection, see #1372, based on Dynamic Race Detection for C++11
- This does not explore weak memory behaviour, only exploring one sequentially consistent ordering.
- Data-race detection is only enabled after the first thread is created, so should have minimal overhead for non-concurrent execution.
- ~~Does not attempt to re-use thread id's so creating and joining threads lots of time in an execution will result in the vector clocks growing in size and slowing down program execution~~ It does now