178 Commits

Author SHA1 Message Date
bors
413bafdabf Auto merge of #33128 - xen0n:more-confusing-unicode-chars, r=nagisa
Add more aliases for Unicode confusable chars

Building upon #29837, this PR:

* added aliases for space characters,
* distinguished square brackets from parens, and
* added common CJK punctuation characters as aliases.

This will especially help CJK users who may have forgotten to switch off IME when coding.
2016-05-05 08:50:23 -07:00
Georg Brandl
9e2300015b lexer: do not display char confusingly in error message
Current code leads to messages like "... use a \xHH escape: \u{e4}"
which is confusing.

The printed span already points to the offending character, which
should be enough to identify the non-ASCII problem.

Fixes: #29088
2016-05-02 07:45:57 +02:00
mitaa
6887202ea3 Make some fatal lexer errors recoverable 2016-04-27 20:48:18 +02:00
Vadim Petrochenkov
6c44bea644 syntax: Check paths in visibilities for type parameters
syntax: Merge PathParsingMode::NoTypesAllowed and PathParsingMode::ImportPrefix
syntax: Rename PathParsingMode and its variants to better express their purpose
syntax: Remove obsolete error message about 'self lifetime
syntax: Remove ALLOW_MODULE_PATHS workaround
syntax/resolve: Adjust some error messages
resolve: Compare unhygienic (not renamed) names with keywords::Invalid, invalid identifiers may appear to be valid after renaming
2016-04-24 20:59:44 +03:00
Vadim Petrochenkov
546c052d22 syntax: Get rid of token::IdentStyle 2016-04-24 20:59:44 +03:00
Wang Xuerui
496081c5c7 add more confusable CJK square bracket aliases 2016-04-21 20:17:51 +08:00
Wang Xuerui
47d5c90fbe correct aliases for square brackets 2016-04-21 20:09:26 +08:00
Wang Xuerui
5f70e8f6cd add confusable space characters 2016-04-21 20:05:47 +08:00
Wang Xuerui
dbe700ef65 add more characters easily inputtable with CJK IMEs 2016-04-21 17:51:47 +08:00
bors
8b7c3f20e8 Auto merge of #29734 - Ryman:whitespace_consistency, r=Aatch
libsyntax: be more accepting of whitespace in lexer

Fixes #29590.

Perhaps this may need more thorough testing?

r? @Aatch
2016-03-07 20:06:17 -08:00
Kevin Butler
24578e0fe5 libsyntax: accept only whitespace with the PATTERN_WHITE_SPACE property
This aligns with unicode recommendations and should be stable for all future
unicode releases. See http://unicode.org/reports/tr31/#R3.

This renames `libsyntax::lexer::is_whitespace` to `is_pattern_whitespace`
so potentially breaks users of libsyntax.
2016-01-16 00:57:12 +00:00
Kevin Butler
8a27230102 libsyntax: use char::is_whitespace instead of custom implementations
Fixes #29590.
2016-01-14 22:47:50 +00:00
Greg Chapple
acc9428c6a Display better snippet for invalid char literal
Given this code:

    fn main() {
        let _ = 'abcd';
    }

The compiler would give a message like:

    error: character literal may only contain one codepoint: ';
    let _ = 'abcd';
                 ^~

With this change, the message now displays:

    error: character literal may only contain one codepoint: 'abcd'
    let _ = 'abcd'
            ^~~~~~

Fixes #30033
2016-01-14 17:34:42 +00:00
Tshepang Lekhonkhobe
aa3b4c668e re-instate comment that was mysteriously disappeared 2016-01-12 21:00:09 +02:00
Tshepang Lekhonkhobe
249b5c0b4a address review comment 2016-01-04 21:35:06 +02:00
Tshepang Lekhonkhobe
4a1062873e fix "make tidy" failure 2016-01-03 11:20:06 +02:00
Tshepang Lekhonkhobe
f20a139981 run rustfmt on syntax::parse::lexer 2016-01-03 11:14:09 +02:00
Nick Cameron
95dc7efad0 use structured errors 2015-12-30 14:27:59 +13:00
Nick Cameron
ff0c74f7d4 test errors 2015-12-17 10:00:16 +13:00
Nick Cameron
6309b0f5bb move error handling from libsyntax/diagnostics.rs to libsyntax/errors/*
Also split out emitters into their own module.
2015-12-17 09:35:50 +13:00
Huon Wilson
41f7f0c341 Add some unicode aliases for ". 2015-11-18 17:16:32 +11:00
Ravi Shankar
7f63c7cf4c Detect confusing unicode characters and show the alternative 2015-11-17 12:14:28 +05:30
Steve Klabnik
00e9ad1df8 Improve error message for char literals
If you try to put something that's bigger than a char into a char
literal, you get an error:

    fn main() {
        let c = 'ஶ்ரீ';
    }

    error: unterminated character constant:

This is a very compiler-centric message. Yes, it's technically
'unterminated', but that's not what you, the user did wrong.

Instead, this commit changes it to

    error: character literal may only contain one codepoint

As this actually tells you what went wrong.

Fixes #28851
2015-11-05 09:34:14 +01:00
Eli Friedman
329e487e58 Start pushing panics outward in lexer. 2015-10-27 20:09:10 -07:00
Barosl Lee
c7fa52df34 Prevent /**/ from being parsed as a doc comment
Previously, `/**/` was incorrectly regarded as a doc comment because it
starts with `/**` and ends with `*/`. However, this caused an ICE
because some code assumed that the length of a doc comment is at least
5. This commit adds an additional check to `is_block_doc_comment` that
tests the length of the input.

Fixes #28844.
2015-10-10 04:49:31 +09:00
Ms2ger
b093060c2a Stop re-exporting AttrStyle's variants and rename them. 2015-10-01 18:03:34 +02:00
Alex Crichton
48615a68fb std: Account for CRLF in {str, BufRead}::lines
This commit is an implementation of [RFC 1212][rfc] which tweaks the behavior of
the `str::lines` and `BufRead::lines` iterators. Both iterators now account for
`\r\n` sequences in addition to `\n`, allowing for less surprising behavior
across platforms (especially in the `BufRead` case). Splitting *only* on the
`\n` character can still be achieved with `split('\n')` in both cases.

The `str::lines_any` function is also now deprecated as `str::lines` is a
drop-in replacement for it.

[rfc]: https://github.com/rust-lang/rfcs/blob/master/text/1212-line-endings.md

Closes #28032
2015-09-03 23:01:41 -07:00
Vadim Petrochenkov
405c616eaf Use consistent terminology for byte string literals
Avoid confusion with binary integer literals and binary operator expressions in libsyntax
2015-09-03 10:54:53 +03:00
Simonas Kazlauskas
cca0ea718d Replace illegal with invalid in most diagnostics 2015-07-29 01:59:31 +03:00
Nick Cameron
f47d20aecd Use a span from the correct file for the inner span of a module
This basically only affects modules which are empty (or only contain comments).

Closes #26755
2015-07-21 21:55:19 +12:00
bors
07be6299d8 Auto merge of #26947 - nagisa:unicode-escape-error, r=nrc
Inspired by the now-mysteriously-closed https://github.com/rust-lang/rust/pull/26782.

This PR introduces better error messages when unicode escapes have invalid format (e.g. `\uFFFF`). It also makes rustc always tell the user that escape may not be used in byte-strings and bytes and fixes some spans to not include unecessary characters and include escape backslash in some others.
2015-07-13 04:00:49 +00:00
Simonas Kazlauskas
4d65ef4549 Tell unicode escapes can’t be used as bytes earlier/more 2015-07-13 02:09:22 +03:00
Wesley Wiser
93ddee6cee Change some instances of .connect() to .join() 2015-07-10 19:40:46 -04:00
Simonas Kazlauskas
d22f189da1 Improve some of the string escape diagnostic spans 2015-07-10 22:28:51 +03:00
Simonas Kazlauskas
0bd5dd6449 Improve incomplete unicode escape reporting
This improves diagnostic messages when \u escape is used incorrectly and { is
missing. Instead of saying “unknown character escape: u”, it will now report
that unicode escape sequence is incomplete and suggest what the correct syntax
is.
2015-07-10 22:26:19 +03:00
Ariel Ben-Yehuda
a18d9842ed Make the unused_mut lint smarter with respect to locals.
Fixes #26332
2015-07-01 00:12:12 +03:00
Yongqian Li
f21682ba2d fix minor indentation issues 2015-06-22 15:30:56 -07:00
bors
c23a9d42ea Auto merge of #25387 - eddyb:syn-file-loader, r=nikomatsakis
This allows compiling entire crates from memory or preprocessing source files before they are tokenized.

Minor API refactoring included, which is a [breaking-change] for libsyntax users:
* `ParseSess::{next_node_id, reserve_node_ids}` moved to rustc's `Session`
* `new_parse_sess` -> `ParseSess::new`
* `new_parse_sess_special_handler` -> `ParseSess::with_span_handler`
* `mk_span_handler` -> `SpanHandler::new`
* `default_handler` -> `Handler::new`
* `mk_handler` -> `Handler::with_emitter`
* `string_to_filemap(sess source, path)` -> `sess.codemap().new_filemap(path, source)`
2015-05-17 00:05:34 +00:00
Lee Jeffery
2dcc200be0 Fix stupid mistake from previous commit 2015-05-14 18:28:28 +01:00
Lee Jeffery
93af5f9b44 Make BytePos calculation same as original 2015-05-14 18:19:51 +01:00
Eduard Burtescu
f786437bd2 syntax: refactor (Span)Handler and ParseSess constructors to be methods. 2015-05-14 01:47:56 +03:00
Lee Jeffery
4f82c3151b Added test to check that newlines are stripped from comments 2015-05-13 22:06:26 +01:00
Lee Jeffery
aef0581513 Fix byte offset and error message inconsistencies 2015-05-13 22:05:01 +01:00
Lee Jeffery
a76244fcef Fix CRLF line-ending parsing for comments. 2015-05-08 20:33:58 +01:00
Geoffry Song
2d9831dea5 Interpolate AST nodes in quasiquote.
This changes the `ToTokens` implementations for expressions, statements,
etc. with almost-trivial ones that produce `Interpolated(*Nt(...))`
pseudo-tokens. In this way, quasiquote now works the same way as macros
do: already-parsed AST fragments are used as-is, not reparsed.

The `ToSource` trait is removed. Quasiquote no longer involves
pretty-printing at all, which removes the need for the
`encode_with_hygiene` hack. All associated machinery is removed.

A new `Nonterminal` is added, NtArm, which the parser now interpolates.
This is just for quasiquote, not macros (although it could be in the
future).

`ToTokens` is no longer implemented for `Arg` (although this could be
added again) and `Generics` (which I don't think makes sense).

This breaks any compiler extensions that relied on the ability of
`ToTokens` to turn AST fragments back into inspectable token trees. For
this reason, this closes #16987.

As such, this is a [breaking-change].

Fixes #16472.
Fixes #15962.
Fixes #17397.
Fixes #16617.
2015-04-25 21:42:10 -04:00
Johannes Oertel
07cc7d9960 Change name of unit test sub-module to "tests".
Changes the style guidelines regarding unit tests to recommend using a
sub-module named "tests" instead of "test" for unit tests as "test"
might clash with imports of libtest.
2015-04-24 23:06:41 +02:00
Erick Tryzelaar
19c8d70174 syntax: Copy unstable str::char_at into libsyntax 2015-04-21 10:23:53 -07:00
Erick Tryzelaar
2937cce70c syntax: Replace String::from_str with the stable String::from 2015-04-21 10:08:27 -07:00
Erick Tryzelaar
cfb9d286ea syntax: remove uses of .into_cow() 2015-04-21 10:08:26 -07:00
Tamir Duberstein
10f15e72e6 Negative case of len() -> is_empty()
`s/([^\(\s]+\.)len\(\) [(?:!=)>] 0/!$1is_empty()/g`
2015-04-14 20:26:03 -07:00