Go to file
Mazdak Farrokhzad 327c54ed02
Rollup merge of #60081 - pawroman:cleanup_unicode_script, r=varkor
Refactor unicode.py script

Hi, I noticed that the `unicode.py` script used some deprecated escapes in regular expressions. E.g. `\d`, `\w`, `\.` will be illegal in the future without "raw strings". This is now fixed. I have also cleaned up the script quite a bit.

## Escape deprecation

OK (note the `r`):
`re.compile(r"\d")`

Deprecated (from Python 3.6 onwards, see [here][link1] and [here][link2]):
`re.compile("\d")`.

[link1]: https://docs.python.org/3.6/whatsnew/3.6.html#deprecated-python-behavior
[link2]: https://bugs.python.org/issue27364

This was evident running the script using Python 3.7 like so:

```
$ python3 -Wall unicode.py
unicode.py:227: DeprecationWarning: invalid escape sequence \w
  re1 = re.compile("^ *([0-9A-F]+) *; *(\w+)")
unicode.py:228: DeprecationWarning: invalid escape sequence \.
  re2 = re.compile("^ *([0-9A-F]+)\.\.([0-9A-F]+) *; *(\w+)")
unicode.py:453: DeprecationWarning: invalid escape sequence \d
  pattern = "for Version (\d+)\.(\d+)\.(\d+) of the Unicode"
```

The documentation states that
> A backslash-character pair that is not a valid escape sequence now generates a DeprecationWarning. Although this will eventually become a SyntaxError, that will not be for several Python releases.

## Testing

To test my changes, I had to add support for choosing the Unicode version to use. The script will default to latest release (which is 12.0.0 at the moment, repo has 11.0.0 checked in).

The script generates the exact same output for version 11.0.0 with Python 2.7 and 3.7 and no longer generates any deprecation warnings:

```
$ python3 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 -Wall unicode.py -v 11.0.0
Using Unicode version: 11.0.0
Regenerated tables.rs.
$ git diff tables.rs
$ python2 --version
Python 2.7.16
$ python3 --version
Python 3.7.3
```

## Extra functionality

Furthermore, the script will check and download the latest Unicode version by default (without the `-v` argument). The `--help` is below:

```
$ ./unicode.py --help
usage: unicode.py [-h] [-v VERSION]

Regenerate Unicode tables (tables.rs).

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        Unicode version to use (if not specified, defaults to
                        latest available final release).
```

## Cleanups

I have cleaned up the code quite a bit, with Python best practices and code style in mind. I'm happy to provide more details and rationale for all my changes if the reviewers so desire.

One externally visible change is that the Unicode data will now be downloaded into `src/libcore/unicode/downloaded` directory suffixed by Unicode version:

```
$ pwd
.../rust/src/libcore/unicode
$ exa -T downloaded/
downloaded
├── 11.0.0
│  ├── DerivedCoreProperties.txt
│  ├── DerivedNormalizationProps.txt
│  ├── PropList.txt
│  ├── ReadMe.txt
│  ├── Scripts.txt
│  ├── SpecialCasing.txt
│  └── UnicodeData.txt
└── 12.0.0
   ├── DerivedCoreProperties.txt
   ├── DerivedNormalizationProps.txt
   ├── PropList.txt
   ├── ReadMe.txt
   ├── Scripts.txt
   ├── SpecialCasing.txt
   └── UnicodeData.txt
```
2019-07-06 22:14:33 +02:00
.azure-pipelines ci: explicitly disable CRLF conversion on Windows 2019-07-01 21:57:08 +02:00
src Rollup merge of #60081 - pawroman:cleanup_unicode_script, r=varkor 2019-07-06 22:14:33 +02:00
.gitattributes ignore images line ending on older git versions 2019-01-23 12:44:15 +01:00
.gitignore Rollup merge of #60081 - pawroman:cleanup_unicode_script, r=varkor 2019-07-06 22:14:33 +02:00
.gitmodules Rebase LLVM to 8.0.0 final 2019-03-18 15:59:24 -07:00
.mailmap Fix michaelwoerister's mailmap 2019-07-01 20:23:35 -04:00
.travis.yml ci: finish the migration to azure 2019-07-01 17:12:28 +02:00
appveyor.yml ci: finish the migration to azure 2019-07-01 17:12:28 +02:00
Cargo.lock Rollup merge of #62151 - alexcrichton:update-openssl, r=Mark-Simulacrum 2019-07-06 02:37:53 +02:00
Cargo.toml Make libstd depend on the hashbrown crate 2019-04-24 06:54:14 +08:00
CODE_OF_CONDUCT.md Make extern ref HTTPS 2019-01-07 09:52:32 -05:00
config.toml.example Address review comments 2019-05-24 13:01:23 +03:00
configure rustbuild: Rewrite the configure script in Python 2017-08-27 18:53:30 -07:00
CONTRIBUTING.md Rollup merge of #59234 - stepnivlk:add-no_merge_policy, r=oli-obk 2019-03-31 16:10:36 +02:00
COPYRIGHT Rebase to the llvm-project monorepo 2019-01-25 15:39:54 -08:00
LICENSE-APACHE Update license, add license boilerplate to most files. Remainder will follow. 2012-12-03 17:12:14 -08:00
LICENSE-MIT LICENSE-MIT: Remove inaccurate (misattributed) copyright notice 2017-07-26 16:51:58 -07:00
README.md Update README.md 2019-07-04 17:35:26 -04:00
RELEASES.md before_exec actually will only get deprecated with 1.37 2019-06-30 18:30:46 +02:00
rustfmt.toml Add rustfmt toml 2019-05-03 21:45:37 +03:00
triagebot.toml Allow claiming issues with triagebot 2019-05-16 22:22:18 +02:00
x.py

The Rust Programming Language

This is the main source code repository for Rust. It contains the compiler, standard library, and documentation.

Quick Start

Read "Installation" from The Book.

Installing from Source

Note: If you wish to contribute to the compiler, you should read this chapter of the rustc-guide instead of this section.

Building on *nix

  1. Make sure you have installed the dependencies:

    • g++ 4.7 or later or clang++ 3.x or later
    • python 2.7 (but not 3.x)
    • GNU make 3.81 or later
    • cmake 3.4.3 or later
    • curl
    • git
  2. Clone the source with git:

    $ git clone https://github.com/rust-lang/rust.git
    $ cd rust
    
  1. Build and install:

    $ ./x.py build && sudo ./x.py install
    

    If after running sudo ./x.py install you see an error message like

    error: failed to load source for a dependency on 'cc'
    

    then run these two commands and then try sudo ./x.py install again:

    $ cargo install cargo-vendor
    
    $ cargo vendor
    

    Note: Install locations can be adjusted by copying the config file from ./config.toml.example to ./config.toml, and adjusting the prefix option under [install]. Various other options, such as enabling debug information, are also supported, and are documented in the config file.

    When complete, sudo ./x.py install will place several programs into /usr/local/bin: rustc, the Rust compiler, and rustdoc, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.

Building on Windows

There are two prominent ABIs in use on Windows: the native (MSVC) ABI used by Visual Studio, and the GNU ABI used by the GCC toolchain. Which version of Rust you need depends largely on what C/C++ libraries you want to interoperate with: for interop with software produced by Visual Studio use the MSVC build of Rust; for interop with GNU software built using the MinGW/MSYS2 toolchain use the GNU build.

MinGW

MSYS2 can be used to easily build Rust on Windows:

  1. Grab the latest MSYS2 installer and go through the installer.

  2. Run mingw32_shell.bat or mingw64_shell.bat from wherever you installed MSYS2 (i.e. C:\msys64), depending on whether you want 32-bit or 64-bit Rust. (As of the latest version of MSYS2 you have to run msys2_shell.cmd -mingw32 or msys2_shell.cmd -mingw64 from the command line instead)

  3. From this terminal, install the required tools:

    # Update package mirrors (may be needed if you have a fresh install of MSYS2)
    $ pacman -Sy pacman-mirrors
    
    # Install build tools needed for Rust. If you're building a 32-bit compiler,
    # then replace "x86_64" below with "i686". If you've already got git, python,
    # or CMake installed and in PATH you can remove them from this list. Note
    # that it is important that you do **not** use the 'python2' and 'cmake'
    # packages from the 'msys2' subsystem. The build has historically been known
    # to fail with these packages.
    $ pacman -S git \
                make \
                diffutils \
                tar \
                mingw-w64-x86_64-python2 \
                mingw-w64-x86_64-cmake \
                mingw-w64-x86_64-gcc
    
  4. Navigate to Rust's source code (or clone it), then build it:

    $ ./x.py build && ./x.py install
    

MSVC

MSVC builds of Rust additionally require an installation of Visual Studio 2017 (or later) so rustc can use its linker. The simplest way is to get the Visual Studio, check the “C++ build tools” and “Windows 10 SDK” workload.

(If you're installing cmake yourself, be careful that “C++ CMake tools for Windows” doesn't get included under “Individual components”.)

With these dependencies installed, you can build the compiler in a cmd.exe shell with:

> python x.py build

Currently, building Rust only works with some known versions of Visual Studio. If you have a more recent version installed the build system doesn't understand then you may need to force rustbuild to use an older version. This can be done by manually calling the appropriate vcvars file before running the bootstrap.

> CALL "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
> python x.py build

Specifying an ABI

Each specific ABI can also be used from either environment (for example, using the GNU ABI in PowerShell) by using an explicit build triple. The available Windows build triples are:

  • GNU ABI (using GCC)
    • i686-pc-windows-gnu
    • x86_64-pc-windows-gnu
  • The MSVC ABI
    • i686-pc-windows-msvc
    • x86_64-pc-windows-msvc

The build triple can be specified by either specifying --build=<triple> when invoking x.py commands, or by copying the config.toml file (as described in Building From Source), and modifying the build option under the [build] section.

Configure and Make

While it's not the recommended build system, this project also provides a configure script and makefile (the latter of which just invokes x.py).

$ ./configure
$ make && sudo make install

When using the configure script, the generated config.mk file may override the config.toml file. To go back to the config.toml file, delete the generated config.mk file.

Building Documentation

If youd like to build the documentation, its almost the same:

$ ./x.py doc

The generated documentation will appear under doc in the build directory for the ABI used. I.e., if the ABI was x86_64-pc-windows-msvc, the directory will be build\x86_64-pc-windows-msvc\doc.

Notes

Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier stage of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.

Snapshot binaries are currently built and tested on several platforms:

Platform / Architecture x86 x86_64
Windows (7, 8, 10, ...)
Linux (2.6.18 or later)
OSX (10.7 Lion or later)

You may find that other platforms work, but these are our officially supported build environments that are most likely to work.

There is more advice about hacking on Rust in CONTRIBUTING.md.

Getting Help

The Rust community congregates in a few places:

Contributing

To contribute to Rust, please see CONTRIBUTING.

Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust. And a good place to ask for help would be #rust-beginners.

The rustc guide might be a good place to start if you want to find out how various parts of the compiler work.

Also, you may find the rustdocs for the compiler itself useful.

License

Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.

See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.

Trademark

The Rust programming language is an open source, community project governed by a core team. It is also sponsored by the Mozilla Foundation (“Mozilla”), which owns and protects the Rust and Cargo trademarks and logos (the “Rust Trademarks”).

If you want to use these names or brands, please read the media guide.

Third-party logos may be subject to third-party copyrights and trademarks. See Licenses for details.