Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split `Char` trait into `Char` (libcore) and `UnicodeChar` (libunicode)
- Unicode-aware functions now live in libunicode
- `is_alphabetic`, `is_XID_start`, `is_XID_continue`, `is_lowercase`,
`is_uppercase`, `is_whitespace`, `is_alphanumeric`, `is_control`, `is_digit`,
`to_uppercase`, `to_lowercase`
- added `width` method in UnicodeChar trait
- determines printed width of character in columns, or None if it is a non-NULL control character
- takes a boolean argument indicating whether the present context is CJK or not (characters with 'A'mbiguous widths are double-wide in CJK contexts, single-wide otherwise)
- split `StrSlice` into `StrSlice` (libcore) and `UnicodeStrSlice` (libunicode)
- functionality formerly in `StrSlice` that relied upon Unicode functionality from `Char` is now in `UnicodeStrSlice`
- `words`, `is_whitespace`, `is_alphanumeric`, `trim`, `trim_left`, `trim_right`
- also moved `Words` type alias into libunicode because `words` method is in `UnicodeStrSlice`
- unified Unicode tables from libcollections, libcore, and libregex into libunicode
- updated `unicode.py` in `src/etc` to generate aforementioned tables
- generated new tables based on latest Unicode data
- added `UnicodeChar` and `UnicodeStrSlice` traits to prelude
- libunicode is now the collection point for the `std::char` module, combining the libunicode functionality with the `Char` functionality from libcore
- thus, moved doc comment for `char` from `core::char` to `unicode::char`
- libcollections remains the collection point for `std::str`
The Unicode-aware functions that previously lived in the `Char` and `StrSlice` traits are no longer available to programs that only use libcore. To regain use of these methods, include the libunicode crate and `use` the `UnicodeChar` and/or `UnicodeStrSlice` traits:
extern crate unicode;
use unicode::UnicodeChar;
use unicode::UnicodeStrSlice;
use unicode::Words; // if you want to use the words() method
NOTE: this does *not* impact programs that use libstd, since UnicodeChar and UnicodeStrSlice have been added to the prelude.
closes#15224
[breaking-change]
- it allows to lookup using any str-equiv object, making TreeMaps finally usable (for example, it is much easier to work with JSON with lookup values being static strs)
- actually provides pretty flexible solution which could be extended to other equivalent types (although it might be not that performant)
- unicode tests live in coretest crate
- libcollections str tests need UnicodeChar trait.
- libregex perlw tests were checking a char in the Alphabetic category,
\x2161. Confirmed perl 5.18 considers this a \w character. Changed to
\x2961, which is not \w as the test expects.
Removing recursion from TreeMap implementation, because we don't have TCO. No need to add ```O(logn)``` extra stack frames to search in a tree.
I find it curious that ```find_mut``` and ```find``` basically duplicated the same logic, but in different ways (iterative vs recursive), possibly to maneuvre around mutability rules, but that's a more fundamental issue to deal with elsewhere.
Thanks to acrichto for the magic trick to appease borrowck (another issue to deal with elsewhere).
- Treat WOFF as binary files so that git does not perform newline normalization.
- Replace corrupt Heuristica files with Source Serif Pro — italics are [almost in production](https://github.com/adobe/source-serif-pro/issues/2) so I left Heuristica Italic which makes a good pair with SSP. Overall, Source Serif Pro is I think a better fit for rustdoc (cc @TheHydroImpulse). This ought to fix#15527.
- Store Source Code Pro locally in order to make offline docs freestanding. Fixes#14778.
Preview: http://adrientetar.legtux.org/cached/rust-docs/core.html
r? @alexcrichton
Now, the lexer will categorize every byte in its input according to the
grammar. The parser skips over these while parsing, thus avoiding their
presence in the input to syntax extensions.
This removes a bunch of token types. Tokens now store the original, unaltered
numeric literal (that is still checked for correctness), which is parsed into
an actual number later, as needed, when creating the AST.
This can change how syntax extensions work, but otherwise poses no visible
changes.
[breaking-change]
This shuffles things around a bit so that LIT_CHAR and co store an Ident
which is the original, unaltered literal in the source. When creating the AST,
unescape and postprocess them.
This changes how syntax extensions can work, slightly, but otherwise poses no
visible changes. To get a useful value out of one of these tokens, call
`parse::{char_lit, byte_lit, bin_lit, str_lit}`
[breaking-change]
Rather than just dumping the id in the interner, which is useless, actually
print the interned string. Adjust the lexer logging to use Show instead of
Poly.
This patch adds hygiene for methods. This one was more difficult than the others, due principally to issues surrounding `self`. Specifically, there were a whole bunch of places in the code that assumed that a `self` identifier could be discarded and then made up again later, causing the discard of contexts and hygiene breakage.
I'm leaving off `rustdoc` usage because it won't work unless this is a `pub fn`, and I want to talk about public/private in the context of modules. I'm also not mentioning `//!` because it is exclusively used to provide the overview of a module.
formerly, the self identifier was being discarded during parsing, which
stymies hygiene. The best fix here seems to be to attach a self identifier
to ExplicitSelf_, a change that rippled through the rest of the compiler,
but without any obvious damage.
The let-syntax expander is different in that it doesn't apply
a mark to its token trees before expansion. This is used
for macro_rules, and it's because macro_rules is essentially
MTWT's let-syntax. You don't want to mark before expand sees
let-syntax, because there's no "after" syntax to mark again.
In some sense, the cleaner approach might be to introduce a new
AST node that macro_rules expands into; this would make it clearer
that the expansion of a macro is distinct from the addition of a
new macro binding.
This should work for now, though...
This commit disables rustc's emission of rpath attributes into dynamic libraries
and executables by default. The functionality is still preserved, but it must
now be manually enabled via a `-C rpath` flag.
This involved a few changes to the local build system:
* --disable-rpath is now the default configure option
* Makefiles now prefer our own LD_LIBRARY_PATH over the user's LD_LIBRARY_PATH
in order to support building rust with rust already installed.
* The compiletest program was taught to correctly pass through the aux dir as a
component of LD_LIBRARY_PATH in more situations.
The major impact of this change is that neither rustdoc nor rustc will work
out-of-the-box in all situations because they are dynamically linked. It must be
arranged to ensure that the libraries of a rust installation are part of the
LD_LIBRARY_PATH. The default installation paths for all platforms ensure this,
but if an installation is in a nonstandard location, then configuration may be
necessary.
Additionally, for all developers of rustc, it will no longer be possible to run
$target/stageN/bin/rustc out-of-the-box. The old behavior can be regained
through the `--enable-rpath` option to the configure script.
This change brings linux/mac installations in line with windows installations
where rpath is not possible.
Closes#11747
[breaking-change]