mikros/rust - rust - Gitea.pterpstra.com

Author	SHA1	Message	Date
Tshepang Mbambo	b66f92197a	rustc_lexer::TokenKind improve docs	2022-10-26 23:32:14 +02:00
bors	6201eabde8	Auto merge of #102302 - nnethercote:more-lexer-improvements, r=matklad More lexer improvements A follow-up to #99884. r? `@matklad`	2022-09-28 08:14:04 +00:00
Nicholas Nethercote	d0a26acb2a	Address review comments.	2022-09-28 11:15:23 +10:00
Nicholas Nethercote	94cb5e86ea	Small cleanups in unescaping code. - Rename `unescape_raw_str_or_raw_byte_str` as `unescape_raw_str_or_byte_str`, which is more accurate. - Remove the unused `Mode::in_single_quotes` method. - Make some assertions more precise, and add a missing one to `unescape_char_or_byte`. - Change all the assertions to `debug_assert!`, because this code is reasonably hot, and the assertions aren't required for memory safety, and any violations are likely to be sufficiently obvious that normal tests will trigger them.	2022-09-28 08:31:24 +10:00
Nicholas Nethercote	c91c64708b	Fix an incorrect comment. If a `\x` escape occurs in a non-byte literal (e.g. char literal, string literal), it must be <= 0x7f.	2022-09-27 15:25:34 +10:00
Nicholas Nethercote	da84f0f4c3	Add `rustc_lexer::TokenKind::Eof`. For alignment with `rustc_ast::TokenKind::Eof`. Plus it's a bit faster, due to less `Option` manipulation in `StringReader::next_token`.	2022-09-26 13:48:08 +10:00
Nicholas Nethercote	cc0022a363	Rename some things. `Cursor` keeps track of the position within the current token. But it uses confusing names that don't make it clear that the "length consumed" is just within the current token. This commit renames things to make this clearer.	2022-09-26 13:43:19 +10:00
Nicholas Nethercote	aa6bfaf04b	Make `rustc_lexer::cursor::Cursor` public. `Cursor` is currently hidden, and the main tokenization path uses `rustc_lexer::first_token` which involves constructing a new `Cursor` for every single token, which is weird. Also, `first_token` can't handle empty input, so callers have to check for that first. This commit makes `Cursor` public, so `StringReader` can contain a `Cursor`, which results in a simpler structure. The commit also changes `StringReader::advance_token` so it returns an `Option<Token>`, simplifying the empty input case.	2022-09-26 13:36:35 +10:00
Takayuki Maeda	bdc865d8f7	remove unnecessary `PartialOrd` and `Ord`	2022-09-08 06:15:33 +09:00
5225225	09ea9f0a87	Add diagnostic translation lints to crates that don't emit them	2022-08-18 19:29:02 +01:00
Nicholas Nethercote	99f5c79d64	Shrink `Token`. From 72 bytes to 12 bytes (on x86-64). There are two parts to this: - Changing various source code offsets from 64-bit to 32-bit. This is not a problem because the rest of rustc also uses 32-bit source code offsets. This means `Token` is no longer `Copy` but this causes no problems. - Removing the `RawStrError` from `LiteralKind`. Raw string literal invalidity is now indicated by a `None` value within `RawStr`/`RawByteStr`, and the new `validate_raw_str` function can be used to re-lex an invalid raw string literal to get the `RawStrError`. There is one very small change in behaviour. Previously, if a raw string literal matched both the `InvalidStarter` and `TooManyHashes` cases, the latter would override the former. This has now changed, because `raw_double_quoted_string` now uses `?` and so returns immediately upon detecting the `InvalidStarter` case. I think this is a slight improvement to report the earlier-detected error, and it explains the change in the `test_too_many_hashes` test. The commit also removes a couple of comments that refer to #77629 and say that the size of these types doesn't affect performance. These comments are wrong, though the performance effect is small.	2022-08-01 08:53:04 +10:00
Nicholas Nethercote	b4fdf648ea	Inline `first_token`. Because it's tiny and hot.	2022-08-01 08:11:15 +10:00
Proloy Mishra	8c22b6bcac	fix typo in comment	2022-06-28 19:59:09 +05:30
Grisha Vartanyan	38e0ae590c	Reduce max hash in raw strings from u16 to u8	2022-03-23 22:13:55 +01:00
Grisha	4e3dbb3c19	Add test for >65535 hashes in lexing raw string	2022-03-16 06:37:41 +01:00
Nicholas Nethercote	37d9ea745b	Improve `scan_escape`. `scan_escape` currently has a fast path (for when the first char isn't '\\') and a slow path. This commit changes `scan_escape` so it only handles the slow path, i.e. the actual escaping code. The fast path is inlined into the two call sites. This change makes the code faster, because there is no function call overhead on the fast path. (`scan_escape` is a big function and doesn't get inlined.) This change also improves readability, because it removes a bunch of mode checks on the fast paths.	2022-02-24 17:01:01 +11:00
bors	2a9e0831d6	Auto merge of #91393 - Julian-Wollersberger:lexer_optimization, r=petrochenkov Optimize `rustc_lexer` The `cursor.first()` method in `rustc_lexer` now calls the `chars.next()` method instead of `chars.nth_char(0)`. This allows LLVM to optimize the code better. The biggest win is that `eat_while()` is now fully inlined and generates better assembly. This improves the lexer's performance by 35% in a micro-benchmark I made (Lexing all 18MB of code in the compiler directory). But lexing is only a small part of the overall compilation time, so I don't know how significant it is. Big thanks to criterion and `cargo asm`.	2021-12-03 13:20:14 +00:00
Julian Wollersberger	1f147a2ed7	Replace `nth_char(0)` with `next()` in `cursor.first()` and optimize the iterator returned by `tokenize()`. This improves lexer performance by 35%	2021-12-01 19:14:10 +01:00
Esteban Kuber	38979a3ba1	update comment to be more accurate	2021-11-23 20:37:23 +00:00
Esteban Kuber	5a68abb094	Tokenize emoji as if they were valid identifiers In the lexer, consider emojis to be valid identifiers and reject them later to avoid knock down parse errors.	2021-11-23 20:35:07 +00:00
Matthias Krüger	0a5640b55f	use matches!() macro in more places	2021-11-06 16:13:14 +01:00
Matthias Krüger	4457014398	Revert "Auto merge of #89709 - clemenswasser:apply_clippy_suggestions_2, r=petrochenkov" The PR had some unforeseen perf regressions that are not as easy to find. Revert the PR for now. This reverts commit `6ae8912a3e`, reversing changes made to `86d6d2b738`.	2021-10-15 11:28:23 +02:00
Clemens Wasser	71dd0b928b	Apply clippy suggestions	2021-10-10 15:38:19 +02:00
Frank Steffahn	2396fad095	Fix more “a”/“an” typos	2021-08-22 17:27:18 +02:00
Anton Golov	07aacf53c5	Renamed variable str -> tail for clarity	2021-08-11 13:57:28 +02:00
Anton Golov	a03fbfe2ff	Warn when an escaped newline skips multiple lines	2021-08-11 11:35:08 +02:00
Anton Golov	5d59b4412e	Add warning when whitespace is not skipped after an escaped newline.	2021-07-30 16:26:39 +02:00
Ibraheem Ahmed	a397fdcc38	Remove ASCII fast path from rustc_lexer::{is_id_continue, is_id_start}	2021-07-26 20:17:28 -04:00
Mara Bos	0eeeebc990	Rename 'bad prefix' to 'unknown prefix'.	2021-06-26 23:11:14 +08:00
Mara Bos	6adce70a58	Improve comments for reserved prefixes. Co-authored-by: Niko Matsakis <niko@alum.mit.edu>	2021-06-26 23:11:13 +08:00
lrh2000	8dee9bc8fc	Reserve prefixed identifiers and string literals (RFC 3101) This commit denies any identifiers immediately followed by one of three tokens `"`, `'` or `#`, which is stricter than the requirements of RFC 3101 but may be necessary according to the discussion at [Zulip]. [Zulip]: https://rust-lang.zulipchat.com/#narrow/stream/268952-edition-2021/topic/reserved.20prefixes/near/238470099	2021-06-26 23:09:43 +08:00
pierwill	0019ca9141	Fix outdated crate names in compiler docs Changes `librustc_X` to `rustc_X`, only in documentation comments. Plain code comments are left unchanged. Also fix incorrect file paths.	2021-04-08 11:12:14 -05:00
Hanzhen Liang	f942c3cbf4	Return EOF_CHAR constant instead of magic char.	2021-01-07 13:20:04 +01:00
Hirochika Matsumoto	56530a2f25	Fix typo	2020-12-18 22:13:25 +09:00
Joshua Nelson	0ad3dce83a	Fix some clippy lints	2020-12-03 17:08:19 -05:00
Joshua Nelson	5339bd1ebe	Add back missing comments	2020-10-30 10:13:41 -04:00
Joshua Nelson	57c6ed0c07	Fix even more clippy warnings	2020-10-30 10:13:39 -04:00
Julian Wollersberger	bd49ded308	Noticed a potential bug in `eat_while()`: it doesn't account for number of UTF8 bytes. Fixed it by inlining it in the two places where the count is used and simplified the logic there.	2020-10-09 11:12:54 +02:00
LingMan	fc20b7841c	Fix typo in rustc_lexer docs Also add an Oxford comma while we're editing that line.	2020-09-21 05:43:39 +02:00
Vadim Petrochenkov	b1491eacfc	lexer: Tiny improvement to shebang detection Lexer now discerns between regular comments and doc comments, so use that. The change only affects the choice of reported errors.	2020-09-02 00:40:19 +03:00
Aleksey Kladov	ccffea5b6b	Move lexer unit tests to rustc_lexer StringReader is an internal abstraction which at the moment changes a lot, so these unit tests cause quite a bit of friction. Moving them to rustc_lexer and more integrated-testing style should make them much less annoying, hopefully without decreasing their usefulness much. Note that coloncolon tests are removed (it's unclear what those are testing). \r\n tests are removed as well, as we normalize line endings even before lexing.	2020-08-30 19:53:36 +02:00
mark	9e5f7d5631	mv compiler to compiler/	2020-08-30 18:45:07 +03:00

42 Commits