fedb775fbb
Not included in the build by default, since it's fragile and kludgy. Do something like this to run it: cd doc/tutorial RUSTC=../../build/stage2/bin/rustc bash test.sh Closes #1143
214 lines
7.8 KiB
Markdown
214 lines
7.8 KiB
Markdown
# Interacting with foreign code
|
|
|
|
One of Rust's aims, as a system programming language, is to
|
|
interoperate well with C code.
|
|
|
|
We'll start with an example. It's a bit bigger than usual, and
|
|
contains a number of new concepts. We'll go over it one piece at a
|
|
time.
|
|
|
|
This is a program that uses OpenSSL's `SHA1` function to compute the
|
|
hash of its first command-line argument, which it then converts to a
|
|
hexadecimal string and prints to standard output. If you have the
|
|
OpenSSL libraries installed, it should 'just work'.
|
|
|
|
use std;
|
|
import std::{vec, str};
|
|
|
|
native mod crypto {
|
|
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
|
|
}
|
|
|
|
fn as_hex(data: [u8]) -> str {
|
|
let acc = "";
|
|
for byte in data { acc += #fmt("%02x", byte as uint); }
|
|
ret acc;
|
|
}
|
|
|
|
fn sha1(data: str) -> str unsafe {
|
|
let bytes = str::bytes(data);
|
|
let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
|
|
vec::len(bytes), std::ptr::null());
|
|
ret as_hex(vec::unsafe::from_buf(hash, 20u));
|
|
}
|
|
|
|
fn main(args: [str]) {
|
|
std::io::println(sha1(args[1]));
|
|
}
|
|
|
|
## Native modules
|
|
|
|
Before we can call `SHA1`, we have to declare it. That is what this
|
|
part of the program is responsible for:
|
|
|
|
native mod crypto {
|
|
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
|
|
}
|
|
|
|
A `native` module declaration tells the compiler that the program
|
|
should be linked with a library by that name, and that the given list
|
|
of functions are available in that library.
|
|
|
|
In this case, it'll change the name `crypto` to a shared library name
|
|
in a platform-specific way (`libcrypto.so` on Linux, for example), and
|
|
link that in. If you want the module to have a different name from the
|
|
actual library, you can use the `"link_name"` attribute, like:
|
|
|
|
#[link_name = "crypto"]
|
|
native mod something {
|
|
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
|
|
}
|
|
|
|
## Native calling conventions
|
|
|
|
Most native C code use the cdecl calling convention, so that is what
|
|
Rust uses by default when calling native functions. Some native functions,
|
|
most notably the Windows API, use other calling conventions, so Rust
|
|
provides a way to to hint to the compiler which is expected by using
|
|
the `"abi"` attribute:
|
|
|
|
#[cfg(target_os = "win32")]
|
|
#[abi = "stdcall"]
|
|
native mod kernel32 {
|
|
fn SetEnvironmentVariableA(n: *u8, v: *u8) -> int;
|
|
}
|
|
|
|
The `"abi"` attribute applies to a native mod (it can not be applied
|
|
to a single function within a module), and must be either `"cdecl"`
|
|
or `"stdcall"`. Other conventions may be defined in the future.
|
|
|
|
## Unsafe pointers
|
|
|
|
The native `SHA1` function is declared to take three arguments, and
|
|
return a pointer.
|
|
|
|
# native mod crypto {
|
|
fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
|
|
# }
|
|
|
|
When declaring the argument types to a foreign function, the Rust
|
|
compiler has no way to check whether your declaration is correct, so
|
|
you have to be careful. If you get the number or types of the
|
|
arguments wrong, you're likely to get a segmentation fault. Or,
|
|
probably even worse, your code will work on one platform, but break on
|
|
another.
|
|
|
|
In this case, `SHA1` is defined as taking two `unsigned char*`
|
|
arguments and one `unsigned long`. The rust equivalents are `*u8`
|
|
unsafe pointers and an `uint` (which, like `unsigned long`, is a
|
|
machine-word-sized type).
|
|
|
|
Unsafe pointers can be created through various functions in the
|
|
standard lib, usually with `unsafe` somewhere in their name. You can
|
|
dereference an unsafe pointer with `*` operator, but use
|
|
caution—unlike Rust's other pointer types, unsafe pointers are
|
|
completely unmanaged, so they might point at invalid memory, or be
|
|
null pointers.
|
|
|
|
## Unsafe blocks
|
|
|
|
The `sha1` function is the most obscure part of the program.
|
|
|
|
# import std::{str, vec};
|
|
# mod crypto { fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } }
|
|
# fn as_hex(data: [u8]) -> str { "hi" }
|
|
fn sha1(data: str) -> str unsafe {
|
|
let bytes = str::bytes(data);
|
|
let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
|
|
vec::len(bytes), std::ptr::null());
|
|
ret as_hex(vec::unsafe::from_buf(hash, 20u));
|
|
}
|
|
|
|
Firstly, what does the `unsafe` keyword at the top of the function
|
|
mean? `unsafe` is a block modifier—it declares the block following it
|
|
to be known to be unsafe.
|
|
|
|
Some operations, like dereferencing unsafe pointers or calling
|
|
functions that have been marked unsafe, are only allowed inside unsafe
|
|
blocks. With the `unsafe` keyword, you're telling the compiler 'I know
|
|
what I'm doing'. The main motivation for such an annotation is that
|
|
when you have a memory error (and you will, if you're using unsafe
|
|
constructs), you have some idea where to look—it will most likely be
|
|
caused by some unsafe code.
|
|
|
|
Unsafe blocks isolate unsafety. Unsafe functions, on the other hand,
|
|
advertise it to the world. An unsafe function is written like this:
|
|
|
|
unsafe fn kaboom() { log "I'm harmless!"; }
|
|
|
|
This function can only be called from an unsafe block or another
|
|
unsafe function.
|
|
|
|
## Pointer fiddling
|
|
|
|
The standard library defines a number of helper functions for dealing
|
|
with unsafe data, casting between types, and generally subverting
|
|
Rust's safety mechanisms.
|
|
|
|
Let's look at our `sha1` function again.
|
|
|
|
# import std::{str, vec};
|
|
# mod crypto { fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } }
|
|
# fn as_hex(data: [u8]) -> str { "hi" }
|
|
# fn x(data: str) -> str unsafe {
|
|
let bytes = str::bytes(data);
|
|
let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
|
|
vec::len(bytes), std::ptr::null());
|
|
ret as_hex(vec::unsafe::from_buf(hash, 20u));
|
|
# }
|
|
|
|
The `str::bytes` function is perfectly safe, it converts a string to
|
|
an `[u8]`. This byte array is then fed to `vec::unsafe::to_ptr`, which
|
|
returns an unsafe pointer to its contents.
|
|
|
|
This pointer will become invalid as soon as the vector it points into
|
|
is cleaned up, so you should be very careful how you use it. In this
|
|
case, the local variable `bytes` outlives the pointer, so we're good.
|
|
|
|
Passing a null pointer as third argument to `SHA1` causes it to use a
|
|
static buffer, and thus save us the effort of allocating memory
|
|
ourselves. `ptr::null` is a generic function that will return an
|
|
unsafe null pointer of the correct type (Rust generics are awesome
|
|
like that—they can take the right form depending on the type that they
|
|
are expected to return).
|
|
|
|
Finally, `vec::unsafe::from_buf` builds up a new `[u8]` from the
|
|
unsafe pointer that was returned by `SHA1`. SHA1 digests are always
|
|
twenty bytes long, so we can pass `20u` for the length of the new
|
|
vector.
|
|
|
|
## Passing structures
|
|
|
|
C functions often take pointers to structs as arguments. Since Rust
|
|
records are binary-compatible with C structs, Rust programs can call
|
|
such functions directly.
|
|
|
|
This program uses the Posix function `gettimeofday` to get a
|
|
microsecond-resolution timer.
|
|
|
|
use std;
|
|
type timeval = {mutable tv_sec: u32,
|
|
mutable tv_usec: u32};
|
|
#[link_name = ""]
|
|
native mod libc {
|
|
fn gettimeofday(tv: *timeval, tz: *()) -> i32;
|
|
}
|
|
fn unix_time_in_microseconds() -> u64 unsafe {
|
|
let x = {mutable tv_sec: 0u32, mutable tv_usec: 0u32};
|
|
libc::gettimeofday(std::ptr::addr_of(x), std::ptr::null());
|
|
ret (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64);
|
|
}
|
|
|
|
The `#[link_name = ""]` sets the name of the native module to the
|
|
empty string to prevent the rust compiler from trying to link it.
|
|
The standard C library is already linked with Rust programs.
|
|
|
|
A `timeval`, in C, is a struct with two 32-bit integers. Thus, we
|
|
define a record type with the same contents, and declare
|
|
`gettimeofday` to take a pointer to such a record.
|
|
|
|
The second argument to `gettimeofday` (the time zone) is not used by
|
|
this program, so it simply declares it to be a pointer to the nil
|
|
type. Since null pointer look the same, no matter which type they are
|
|
supposed to point at, this is safe.
|