# Interacting with foreign code

One of Rust's aims, as a systems programming language, is to
interoperate well with C code.

We'll start with an example. It's a bit bigger than usual, and
contains a number of new concepts. We'll go over it one piece at a
time.

This is a program that uses OpenSSL's `SHA1` function to compute the
hash of its first command-line argument, which it then converts to a
hexadecimal string and prints to standard output. If you have the
OpenSSL libraries installed, it should 'just work'.

    use std;
    import std::{vec, str};
    
    native mod crypto {
        fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
    }
    
    fn as_hex(data: [u8]) -> str {
        let acc = "";
        for byte in data { acc += #fmt("%02x", byte as uint); }
        ret acc;
    }

    fn sha1(data: str) -> str unsafe {
        let bytes = str::bytes(data);
        let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
                                vec::len(bytes), std::ptr::null());
        ret as_hex(vec::unsafe::from_buf(hash, 20u));
    }
    
    fn main(args: [str]) {
        std::io::println(sha1(args[1]));
    }
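
The hex-encoding step is independent of the foreign call, so it can be
tried on its own. The following is a minimal sketch in present-day Rust
syntax, which differs from the dialect used in this tutorial (`String`
and `write!` stand in for `str` and `#fmt`). It is an illustration, not
the tutorial's own code.

```rust
use std::fmt::Write;

/// Render each byte as two lowercase hex digits, e.g. 0x0f becomes "0f".
fn as_hex(data: &[u8]) -> String {
    let mut acc = String::with_capacity(data.len() * 2);
    for byte in data {
        // Writing into a String cannot fail, so unwrapping is fine here.
        write!(acc, "{:02x}", byte).unwrap();
    }
    acc
}
```

For example, `as_hex(&[0x00, 0xab])` yields `"00ab"`.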

## Native modules

Before we can call `SHA1`, we have to declare it. That is what this
part of the program is responsible for:

    native mod crypto {
        fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
    }

A `native` module declaration tells the compiler that the program
should be linked with a library by that name, and that the listed
functions are available in that library.

In this case, the compiler will translate the name `crypto` into a
shared library name in a platform-specific way (`libcrypto.so` on
Linux, for example), and link that in. If you want the module to have
a different name from the actual library, you can use the
`"link_name"` attribute, like:

    #[link_name = "crypto"]
    native mod something {
        fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
    }
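
For comparison, present-day Rust spells the same idea as an `extern`
block rather than a `native mod`. The sketch below is an illustration
under several assumptions and is not part of the original program: it
declares `abs` from the C standard library (so no extra library has to
be installed), it uses the `unsafe extern` form, which needs a recent
compiler (Rust 1.82 or later), and the wrapper name `c_abs` is made up
for this example.

```rust
use std::os::raw::c_int;

// Declaration of the C function `int abs(int)`. The compiler cannot check
// this signature against the C header, so it has to be written with care.
// A library other than the C standard library would be named with a
// `#[link(name = "...")]` attribute on this block.
unsafe extern "C" {
    fn abs(x: c_int) -> c_int;
}

/// Hypothetical safe wrapper around the foreign function.
/// `abs(INT_MIN)` is undefined in C, so that input is rejected up front.
fn c_abs(x: i32) -> Option<i32> {
    if x == i32::MIN {
        return None;
    }
    // Foreign functions are unsafe to call by default.
    Some(unsafe { abs(x) })
}
```

With this in place, `c_abs(-5)` evaluates to `Some(5)`.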

## Native calling conventions

Most native C code uses the cdecl calling convention, so that is what
Rust uses by default when calling native functions. Some native functions,
most notably the Windows API, use other calling conventions, so Rust
provides a way to tell the compiler which one is expected by using
the `"abi"` attribute:

    #[cfg(target_os = "win32")]
    #[abi = "stdcall"]
    native mod kernel32 {
        fn SetEnvironmentVariableA(n: *u8, v: *u8) -> int;
    }

The `"abi"` attribute applies to a native mod (it cannot be applied
to a single function within a module), and must be either `"cdecl"`
or `"stdcall"`. Other conventions may be defined in the future.
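
As a point of comparison, present-day Rust (not the dialect described
here) attaches the calling convention to individual functions and to
function pointer types. The sketch below assumes that newer syntax; the
function names are hypothetical. It uses `"system"`, which selects
stdcall on 32-bit Windows and the C convention elsewhere, so the code
compiles on any platform.

```rust
// A Rust function that uses the C calling convention.
extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

// A Rust function that uses the platform's API calling convention.
extern "system" fn add_system(a: i32, b: i32) -> i32 {
    a + b
}

// The convention is part of the function pointer type, so each helper
// only accepts functions declared with the matching ABI string.
fn apply_c(f: extern "C" fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    f(a, b)
}

fn apply_system(f: extern "system" fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    f(a, b)
}
```

Here `apply_c(add_c, 2, 3)` returns `5`, while passing `add_c` to
`apply_system` is rejected by the type checker.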

## Unsafe pointers

The native `SHA1` function is declared to take three arguments, and
return a pointer.

    # native mod crypto {
    fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8;
    # }

When declaring the argument types to a foreign function, the Rust
compiler has no way to check whether your declaration is correct, so
you have to be careful. If you get the number or types of the
arguments wrong, you're likely to get a segmentation fault. Or,
probably even worse, your code will work on one platform, but break on
another.

In this case, `SHA1` is defined as taking two `unsigned char*`
arguments and one `unsigned long`. The Rust equivalents are `*u8`
unsafe pointers and a `uint` (which, like `unsigned long`, is a
machine-word-sized type).

Unsafe pointers can be created through various functions in the
standard library, usually with `unsafe` somewhere in their name. You
can dereference an unsafe pointer with the `*` operator, but use
caution: unlike Rust's other pointer types, unsafe pointers are
completely unmanaged, so they might point at invalid memory, or be
null pointers.
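
The null case, at least, can be checked before dereferencing. The
following sketch uses present-day Rust syntax (`*const u8` rather than
`*u8`), and the helper name `read_or` is hypothetical. A dangling
pointer cannot be detected this way, which is why the function is still
marked `unsafe`.

```rust
/// Read through a raw pointer, falling back to `default` when it is null.
/// The caller must guarantee that a non-null `p` points at a live `u8`.
unsafe fn read_or(p: *const u8, default: u8) -> u8 {
    if p.is_null() {
        default
    } else {
        // Dereferencing is only allowed inside an unsafe context.
        unsafe { *p }
    }
}
```

A reference coerces to a raw pointer, so `unsafe { read_or(&x, 0) }`
reads the value of a local `x`, while passing `std::ptr::null()`
returns the default.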

## Unsafe blocks

The `sha1` function is the most obscure part of the program.

    # import std::{str, vec};
    # mod crypto { fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } }
    # fn as_hex(data: [u8]) -> str { "hi" }
    fn sha1(data: str) -> str unsafe {
        let bytes = str::bytes(data);
        let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
                                vec::len(bytes), std::ptr::null());
        ret as_hex(vec::unsafe::from_buf(hash, 20u));
    }

Firstly, what does the `unsafe` keyword at the top of the function
mean? `unsafe` is a block modifier: it marks the block that follows
as one the programmer knows to contain unsafe operations.

Some operations, like dereferencing unsafe pointers or calling
functions that have been marked unsafe, are only allowed inside unsafe
blocks. With the `unsafe` keyword, you're telling the compiler 'I know
what I'm doing'. The main motivation for such an annotation is that
when you have a memory error (and you will, if you're using unsafe
constructs), you have some idea where to look—it will most likely be
caused by some unsafe code.

Unsafe blocks isolate unsafety. Unsafe functions, on the other hand,
advertise it to the world. An unsafe function is written like this:

    unsafe fn kaboom() { log "I'm harmless!"; }

This function can only be called from an unsafe block or another
unsafe function.
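
The relationship between the two can be sketched as follows, in
present-day Rust syntax and with a hypothetical wrapper named
`defused`. The unsafe function advertises its unsafety; the safe
wrapper isolates the call inside an unsafe block.

```rust
/// Marked `unsafe` purely for illustration; the body does nothing dangerous.
unsafe fn kaboom() -> &'static str {
    "I'm harmless!"
}

/// A safe function that confines the unsafe call to one block.
fn defused() -> &'static str {
    // Calling `kaboom()` outside of `unsafe { ... }` is a compile error.
    unsafe { kaboom() }
}
```

Callers of `defused` need no `unsafe` annotation of their own.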

## Pointer fiddling

The standard library defines a number of helper functions for dealing
with unsafe data, casting between types, and generally subverting
Rust's safety mechanisms.

Let's look at our `sha1` function again.

    # import std::{str, vec};
    # mod crypto { fn SHA1(src: *u8, sz: uint, out: *u8) -> *u8 { out } }
    # fn as_hex(data: [u8]) -> str { "hi" }
    # fn x(data: str) -> str unsafe {
    let bytes = str::bytes(data);
    let hash = crypto::SHA1(vec::unsafe::to_ptr(bytes),
                            vec::len(bytes), std::ptr::null());
    ret as_hex(vec::unsafe::from_buf(hash, 20u));
    # }

The `str::bytes` function is perfectly safe; it converts a string to
a `[u8]`. This byte array is then fed to `vec::unsafe::to_ptr`, which
returns an unsafe pointer to its contents.

This pointer will become invalid as soon as the vector it points into
is cleaned up, so you should be very careful how you use it. In this
case, the local variable `bytes` outlives the pointer, so we're good.

Passing a null pointer as the third argument to `SHA1` causes it to
use a static buffer, and thus saves us the effort of allocating memory
ourselves. `ptr::null` is a generic function that will return an
unsafe null pointer of the correct type (Rust generics are awesome
like that: they can take the right form depending on the type that
they are expected to return).

Finally, `vec::unsafe::from_buf` builds up a new `[u8]` from the
unsafe pointer that was returned by `SHA1`. SHA1 digests are always
twenty bytes long, so we can pass `20u` for the length of the new
vector.
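
The same three steps (borrow the bytes, pass a pointer and a length,
copy twenty bytes out of the returned buffer) can be sketched in
present-day Rust. To keep the example runnable without OpenSSL, it
calls a hypothetical stand-in named `fake_sha1` that does not hash
anything and only returns a fixed pattern; `as_ptr`, `null_mut`, and
`slice::from_raw_parts` play the roles of `to_ptr`, `ptr::null`, and
`from_buf`.

```rust
use std::ptr;
use std::slice;

const DIGEST_LEN: usize = 20;

// Fixed 20-byte pattern standing in for a real digest.
static FAKE_DIGEST: [u8; DIGEST_LEN] = [0xab; DIGEST_LEN];

// Hypothetical stand-in for the C `SHA1` function. It ignores its input
// and hands back FAKE_DIGEST, using the static buffer when `out` is null.
unsafe extern "C" fn fake_sha1(_src: *const u8, _len: usize, out: *mut u8) -> *const u8 {
    if out.is_null() {
        return FAKE_DIGEST.as_ptr();
    }
    unsafe { ptr::copy_nonoverlapping(FAKE_DIGEST.as_ptr(), out, DIGEST_LEN) };
    out as *const u8
}

fn digest(data: &str) -> Vec<u8> {
    // Safe: borrow the string's contents as a byte slice.
    let bytes = data.as_bytes();
    unsafe {
        // `bytes` stays alive for the whole call, so the pointer is valid.
        let hash = fake_sha1(bytes.as_ptr(), bytes.len(), ptr::null_mut());
        // Copy the 20 bytes behind the returned pointer into an owned vector.
        slice::from_raw_parts(hash, DIGEST_LEN).to_vec()
    }
}
```

Copying into an owned vector matters here because the returned pointer
refers to a buffer that the caller does not own.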

## Passing structures

C functions often take pointers to structs as arguments. Since Rust
records are binary-compatible with C structs, Rust programs can call
such functions directly.

This program uses the POSIX function `gettimeofday` to get a
microsecond-resolution timer.

    use std;
    type timeval = {mutable tv_sec: u32,
                    mutable tv_usec: u32};
    #[nolink]
    native mod libc {
        fn gettimeofday(tv: *timeval, tz: *()) -> i32;
    }
    fn unix_time_in_microseconds() -> u64 unsafe {
        let x = {mutable tv_sec: 0u32, mutable tv_usec: 0u32};
        libc::gettimeofday(std::ptr::addr_of(x), std::ptr::null());
        ret (x.tv_sec as u64) * 1000_000_u64 + (x.tv_usec as u64);
    }

The `#[nolink]` attribute sets the name of the native module to the
empty string to prevent the Rust compiler from trying to link it.
The standard C library is already linked with Rust programs.

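A `timeval`, in C, is a struct with two integer fields, which are
32 bits wide on the 32-bit platforms this example assumes (on many
64-bit systems they are wider). Thus, we define a record type with the
same contents, and declare `gettimeofday` to take a pointer to such a
record.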

The second argument to `gettimeofday` (the time zone) is not used by
this program, so it simply declares it to be a pointer to the nil
type. Since null pointers look the same, no matter which type they are
supposed to point at, this is safe.
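
For comparison, here is a sketch of the same pattern in present-day
Rust, where C-compatible layout has to be requested with `#[repr(C)]`.
Everything in it is an assumption made for illustration: the field
widths (`i64`, as on many 64-bit Unix systems), the type name
`TimeVal`, and a hypothetical stand-in `fake_gettimeofday` that fills
in a fixed instant so the example does not depend on the platform's
real definition.

```rust
use std::ffi::c_void;
use std::ptr;

// `#[repr(C)]` asks for the field layout a C compiler would use.
#[repr(C)]
struct TimeVal {
    tv_sec: i64,
    tv_usec: i64,
}

// Hypothetical stand-in for `gettimeofday`: writes a fixed time through
// the pointer and returns 0, or -1 if the pointer is null.
unsafe extern "C" fn fake_gettimeofday(tv: *mut TimeVal, _tz: *mut c_void) -> i32 {
    if tv.is_null() {
        return -1;
    }
    unsafe {
        (*tv).tv_sec = 5;
        (*tv).tv_usec = 250;
    }
    0
}

fn time_in_microseconds() -> Option<u64> {
    let mut x = TimeVal { tv_sec: 0, tv_usec: 0 };
    // `&mut x` coerces to `*mut TimeVal`; the unused time zone is null.
    let rc = unsafe { fake_gettimeofday(&mut x, ptr::null_mut()) };
    if rc != 0 {
        return None;
    }
    Some((x.tv_sec as u64) * 1_000_000 + (x.tv_usec as u64))
}
```

With the fixed values above, `time_in_microseconds()` returns
`Some(5_000_250)`.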