rust/doc/tutorial/syntax.md
2011-11-16 15:02:00 -08:00

Syntax Basics

Braces

Assuming you've programmed in any C-family language (C++, Java, JavaScript, C#, or PHP), Rust will feel familiar. The main surface difference to be aware of is that the bodies of if statements and of loops have to be wrapped in braces. Single-statement, brace-less bodies are not allowed.

If the verbosity of that bothers you, consider the fact that this allows you to omit the parentheses around the condition in if, while, and similar constructs. This will save you two characters every time. As a bonus, you no longer have to spend any mental energy on deciding whether you need to add braces or not, or on adding them after the fact when adding a statement to an if branch.

Accounting for these differences, the surface syntax of Rust statements and expressions is C-like. Function calls are written myfunc(arg1, arg2), operators have mostly the same name and precedence that they have in C, comments look the same, and constructs like if and while are available:

fn main() {
    if 1 < 2 {
        while false { call_a_function(10 * 4); }
    } else if 4 < 3 || 3 < 4 {
        // Comments are C++-style too
    } else {
        /* Multi-line comment syntax */
    }
}

Expression syntax

Though it isn't apparent in all code, there is a fundamental difference between Rust's syntax and that of its predecessors in this family of languages. A lot of things that are statements in C are expressions in Rust. This allows for useless things like this (which passes nil, the void type, to a function):

a_function(while false {});

But also useful things like this:

let x = if the_stars_align() { 4 }
        else if something_else() { 3 }
        else { 0 };

This piece of code will bind the variable x to a value depending on the conditions. Note the condition bodies, which look like { expression }. The lack of a semicolon after the last statement in a braced block gives the whole block the value of that last expression. If the branches of the if had looked like { 4; }, the above example would simply assign nil (void) to x. But without the semicolon, each branch has a different value, and x gets the value of the branch that was taken.
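
The effect of the trailing semicolon can be sketched in present-day Rust syntax, which is an assumption here since it differs in places from the dialect this tutorial describes. The function names pick and unit_block are hypothetical, and the conditions from the example above are passed in as plain booleans.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Each branch ends without a semicolon, so the branch's last expression
/// becomes the value of the whole `if`, which is then bound to `x`.
fn pick(stars_align: bool, something_else: bool) -> i32 {
    let x = if stars_align { 4 }
            else if something_else { 3 }
            else { 0 };
    x
}

/// The semicolon after `4` discards the value, so the block evaluates to `()`.
fn unit_block() {
    let nothing: () = { 4; };
    nothing
}
```

Here pick(false, true) takes the second branch and yields 3, while unit_block() yields only the nil value ().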

This also works for function bodies. This function returns a boolean:

fn is_four(x: int) -> bool { x == 4 }

In short, everything that's not a declaration (let for variables, fn for functions, and so on) is an expression.

If all those things are expressions, you might conclude that you have to add a terminating semicolon after every statement, even ones that are not traditionally terminated with a semicolon in C (like while). That is not the case, though. Expressions that end in a block only need a semicolon if that block contains a trailing expression. while loops do not allow trailing expressions, and if statements tend to only have a trailing expression when you want to use their value for something—in which case you'll have embedded it in a bigger statement, like the let x = ... example above.

Identifiers

Rust identifiers must start with an alphabetic character or an underscore, and after that may contain any alphanumeric character, and more underscores.

NOTE: The parser doesn't currently recognize non-ASCII alphabetic characters. This is a bug that will eventually be fixed.

The double-colon (::) is used as a module separator, so std::io::println means 'the thing named println in the module named io in the module named std'.

Rust will normally emit warnings about unused variables. These can be suppressed by using a variable name that starts with an underscore.

fn this_warns(x: int) {}
fn this_doesnt(_x: int) {}

Variable declaration

The let keyword, as we've seen, introduces a local variable. Global constants can be defined with const:

use std;
const repeat: uint = 5u;
fn main() {
    let count = 0u;
    while count < repeat {
        std::io::println("Hi!");
        count += 1u;
    }
}
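
For comparison, a rough equivalent in present-day Rust syntax might look like the sketch below. It assumes current conventions that the text above does not state: a local that is reassigned is declared with let mut, constants are written in upper case, and the u suffix is replaced by a type annotation. The function name greetings is hypothetical, and the lines are collected into a vector instead of printed so the result can be inspected.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// A global constant, visible to every function in the file.
const REPEAT: u32 = 5;

/// Builds one "Hi!" line per loop iteration instead of printing it.
fn greetings() -> Vec<String> {
    let mut lines = Vec::new();
    // `count` is reassigned in the loop, so it is declared mutable.
    let mut count = 0;
    while count < REPEAT {
        lines.push(String::from("Hi!"));
        count += 1;
    }
    lines
}
```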

Types

The -> bool in the is_four example is the way a function's return type is written. For functions that do not return a meaningful value (these conceptually return nil in Rust), you can optionally say -> () (() is how nil is written), but usually the return annotation is simply left off, as in the fn main() { ... } examples we've seen earlier.

Every argument to a function must have its type declared (for example, x: int). Inside the function, type inference will be able to automatically deduce the type of most locals (generic functions, which we'll come back to later, will occasionally need additional annotation). Locals can be written either with or without a type annotation:

// The type of this vector will be inferred based on its use.
let x = [];
// Explicitly say this is a vector of integers.
let y: [int] = [];
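
The same contrast between inferred and annotated locals can be sketched in present-day Rust syntax, where the growable vector type is written Vec<T> (an assumption relative to the [T] notation used in this tutorial). The function names inferred and annotated are hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// The local has no annotation; its element type is inferred from how it is used.
fn inferred() -> Vec<i32> {
    let mut x = Vec::new();
    x.push(1);
    x
}

/// The local states explicitly that it is a vector of 32-bit integers.
fn annotated() -> Vec<i32> {
    let y: Vec<i32> = Vec::new();
    y
}
```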

The basic types are written like this:

()
Nil, the type that has only a single value.
bool
Boolean type.
int
A machine-pointer-sized integer.
uint
A machine-pointer-sized unsigned integer.
i8, i16, i32, i64
Signed integers with a specific size (in bits).
u8, u16, u32, u64
Unsigned integers with a specific size.
f32, f64
Floating-point types.
float
The largest floating-point type efficiently supported on the target machine.
char
A character is a 32-bit Unicode code point.
str
String type. A string contains a UTF-8 encoded sequence of characters.

These can be combined in composite types, which will be described in more detail later on (the Ts here stand for any other type):

[T]
Vector type.
[mutable T]
Mutable vector type.
(T1, T2)
Tuple type. Any arity above 1 is supported.
{fname1: T1, fname2: T2}
Record type.
fn(arg1: T1, arg2: T2) -> T3, lambda(), block()
Function types.
@T, ~T, *T
Pointer types.
obj { fn method1() }
Object type.

Types can be given names with type declarations:

type monster_size = uint;

This will provide a synonym, monster_size, for unsigned integers. It will not actually create a new type—monster_size and uint can be used interchangeably, and using one where the other is expected is not a type error. Read about single-variant tags further on if you need to create a type name that's not just a synonym.
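
That a type declaration only introduces a synonym can be illustrated with a small sketch in present-day Rust syntax (an assumption: u32 stands in for uint, and the alias is spelled MonsterSize to follow current naming conventions). The functions grow and alias_is_interchangeable are hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// A synonym for u32; it does not create a distinct type.
type MonsterSize = u32;

fn grow(size: MonsterSize) -> MonsterSize {
    size + 1
}

fn alias_is_interchangeable() -> u32 {
    let plain: u32 = 41;
    // Passing a u32 where MonsterSize is expected is not a type error,
    // and the MonsterSize result can be returned as a u32.
    grow(plain)
}
```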

Literals

Integers can be written in decimal (144), hexadecimal (0x90), and binary (0b10010000) base. Without a suffix, an integer literal is considered to be of type int. Add a u (144u) to make it a uint instead. Literals of the fixed-size integer types can be created by suffixing the literal with the type name (255u8, 50i64, etc.).

Note that, in Rust, no implicit conversion between integer types happens. If you are adding one to a variable of type uint, you must type v += 1u—saying += 1 will give you a type error.
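
A small sketch of the literal forms and of explicit conversion, written in present-day Rust syntax. This is an assumption: current compilers infer the type of an unsuffixed literal from context, so the sketch shows the lack of implicit conversion with two variables of different integer types instead. All function names are hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Decimal, hexadecimal, and binary spellings of the same number.
fn same_value_three_ways() -> bool {
    144 == 0x90 && 0x90 == 0b10010000
}

/// Adds a literal that carries an explicit type suffix.
fn add_one(v: u32) -> u32 {
    let mut v = v;
    v += 1u32;
    v
}

/// A u8 is not converted to u64 implicitly; the conversion is spelled out.
fn widen(small: u8) -> u64 {
    u64::from(small) + 50u64
}
```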

Floating point numbers are written 0.0, 1e6, or 2.1e-4. Without suffix, the literal is assumed to be of type float. Suffixes f32 and f64 can be used to create literals of a specific type. The suffix f can be used to write float literals without a dot or exponent: 3f.

The nil literal is written just like the type: (). The keywords true and false produce the boolean literals.

Character literals are written between single quotes, as in 'x'. You may put non-ASCII characters between single quotes (your source file should be encoded as UTF-8 in that case). Rust understands a number of character escapes, using the backslash character:

\n
A newline (Unicode character 10).
\r
A carriage return (13).
\t
A tab character (9).
\\, \', \"
Simply escapes the following character.
\xHH, \uHHHH, \UHHHHHHHH
Unicode escapes, where the H characters are the hexadecimal digits that form the character code.

String literals allow the same escape sequences. They are written between double quotes ("hello"). Rust strings may contain newlines. When a newline is preceded by a backslash, it, and all white space following it, will not appear in the resulting string literal.
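
The character codes and the backslash-newline rule can be checked with a short sketch in present-day Rust syntax (an assumption; only the \xHH escape form is used, since the Unicode escape spelling may differ between dialects). The function names are hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Numeric codes of the newline, carriage return, and tab escapes.
fn escape_codes() -> (u32, u32, u32) {
    ('\n' as u32, '\r' as u32, '\t' as u32)
}

/// A two-digit hexadecimal escape; 0x41 is the code for 'A'.
fn hex_escape() -> char {
    '\x41'
}

/// The backslash before the line break removes the break and the
/// indentation that follows it from the resulting string.
fn continued() -> &'static str {
    "one \
     two"
}
```

Under these assumptions, continued() yields the single-line string "one two".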

Operators

Rust's set of operators contains very few surprises. The main differences from C are that ++ and -- are missing, and that the bitwise binary operators have higher precedence than the comparison operators: in C, x & 2 > 0 comes out as x & (2 > 0), whereas in Rust it means (x & 2) > 0, which is more likely to be what you expect (unless you are a C veteran).
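
The precedence rule can be tried out with a one-line test for a set bit. The sketch uses present-day Rust syntax (an assumption), and the function name bit_one_set is hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// True when the bit with value 2 is set in `x`.
fn bit_one_set(x: u32) -> bool {
    // No parentheses needed: this is read as (x & 2) > 0.
    x & 2 > 0
}
```

With the C reading, x & (2 > 0), this function would not even type-check, because an integer cannot be combined with a boolean using &.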

Binary arithmetic is done with *, /, %, +, and - (multiply, divide, remainder, plus, minus). - is also a unary prefix operator (there are no unary postfix operators in Rust) that does negation.

Binary shifting is done with >> (shift right), >>> (arithmetic shift right), and << (shift left). Logical bitwise operators are &, |, and ^ (and, or, and exclusive or), and unary ! for bitwise negation (or boolean negation when applied to a boolean value).

The comparison operators are the traditional ==, !=, <, >, <=, and >=. Short-circuiting (lazy) boolean operators are written && (and) and || (or).

Rust has a ternary conditional operator ?:, as in:

let message = badness < 10 ? "error" : "FATAL ERROR";

For type casting, Rust uses the binary as operator, which has a precedence between the bitwise combination operators (&, |, ^) and the comparison operators. It takes an expression on the left side, and a type on the right side, and will, if a meaningful conversion exists, convert the result of the expression to the given type.

let x: float = 4.0;
let y: uint = x as uint;
assert y == 4u;
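
A minimal sketch of the same cast in present-day Rust syntax (an assumption: f64 and u32 stand in for float and uint). The function name truncate is hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Converts a floating-point value to an unsigned integer with `as`.
/// The fractional part is dropped, not rounded.
fn truncate(x: f64) -> u32 {
    x as u32
}
```

Both truncate(4.0) and truncate(4.9) yield 4 under these assumptions.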

Attributes

Every definition can be annotated with attributes. Attributes are meta information that can serve a variety of purposes. One of those is conditional compilation:

#[cfg(target_os = "win32")]
fn register_win_service() { /* ... */ }

This will cause the function to vanish without a trace during compilation on a non-Windows platform, much like #ifdef in C (it allows cfg(flag=value) and cfg(flag) forms, where the second simply checks whether the configuration flag is defined at all). Flags for target_os and target_arch are set by the compiler. It is possible to set additional flags with the --cfg command-line option.
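
Conditional compilation can be sketched with two definitions of the same function, only one of which survives on any given platform. The sketch uses present-day Rust syntax, where the operating system value is spelled "windows" and a condition can be negated with not(...); both are assumptions relative to the text above. The function names platform_name and describe are hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Kept only when compiling for Windows.
#[cfg(target_os = "windows")]
fn platform_name() -> &'static str {
    "windows"
}

/// Kept on every other platform, so exactly one definition exists.
#[cfg(not(target_os = "windows"))]
fn platform_name() -> &'static str {
    "not windows"
}

fn describe() -> String {
    format!("compiled for: {}", platform_name())
}
```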

Attributes always look like #[attr], where attr can be simply a name (as in #[test], which is used by the built-in test framework), a name followed by = and then a literal (as in #[license = "BSD"], which is a valid way to annotate a Rust program as being released under a BSD-style license), or a name followed by a comma-separated list of nested attributes, as in the cfg example above, or in this crate metadata declaration:

#[link(name = "std",
       vers = "0.1",
       url = "http://rust-lang.org/src/std")];

An attribute without a semicolon following it applies to the definition that follows it. When terminated with a semicolon, it applies to the current context. The cfg example above could also be written like this:

fn register_win_service() {
    #[cfg(target_os = "win32")];
    /* ... */
}

Syntax extensions

There are plans to support user-defined syntax (macros) in Rust. This currently only exists in very limited form.

The compiler defines a few built-in syntax extensions. The most useful one is #fmt, a printf-style text formatting macro that is expanded at compile time.

std::io::println(#fmt("%s is %d", "the answer", 42));

#fmt supports most of the directives that printf supports, but will give you a compile-time error when the types of the directives don't match the types of the arguments.
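
In present-day Rust the closest counterpart is, as far as this sketch assumes, the format! macro, which uses {} placeholders instead of printf-style directives and is likewise checked at compile time. The function name answer_line is hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Builds the same text as the #fmt example; leaving out an argument
/// would be rejected when the program is compiled, not when it runs.
fn answer_line() -> String {
    format!("{} is {}", "the answer", 42)
}
```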

All syntax extensions look like #word. Another built-in one is #env, which will look up its argument as an environment variable at compile time.

std::io::println(#env("PATH"));
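
A comparable compile-time lookup in present-day Rust can be sketched with the option_env! macro, chosen here (as an assumption) because it yields None instead of failing the build when the variable is missing. The function name path_at_build_time and the fallback text are hypothetical.

```rust
// Sketch in present-day Rust syntax (an assumption; not the dialect of this tutorial).

/// Returns the value PATH had when the program was compiled,
/// or a placeholder if it was not set at that time.
fn path_at_build_time() -> &'static str {
    option_env!("PATH").unwrap_or("<PATH not set at build time>")
}
```

The value is baked into the binary, so changing PATH before running the program has no effect on the result.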