Remove `TokenStreamBuilder`
`TokenStreamBuilder` is used to combine multiple token streams. It can be removed, leaving the code a little simpler and a little faster.
r? `@Aaron1011`
`TokenStreamBuilder` exists to concatenate multiple `TokenStream`s
together. This commit removes it, and moves the concatenation
functionality directly into `TokenStream`, via two new methods
`push_tree` and `push_stream`. This makes things both simpler and
faster.
`push_tree` is particularly important. `TokenStreamBuilder` only had a
single `push` method, which pushed a stream. But in practice most of the
time we push a single token tree rather than a stream, and `push_tree`
avoids the need to build a token stream with a single entry (which
requires two allocations, one for the `Lrc` and one for the `Vec`).
The main `push_tree` use arises from a change to one of the `ToInternal`
impls in `proc_macro_server.rs`. It now returns a `SmallVec` instead of
a `TokenStream`. This return value is then iterated over by
`concat_trees`, which does `push_tree` on each element. Furthermore, the
use of `SmallVec` avoids more allocations, because there is always only
one or two token trees.
Note: the removed `TokenStreamBuilder::push` method had some code to
deal with a quadratic blowup case from #57735. This commit removes the
code. I tried and failed to reproduce the blowup from that PR, before
and after this change. Various other changes have happened to
`TokenStreamBuilder` in the meantime, so I suspect the original problem
is no longer relevant, though I don't have proof of this. Generally
speaking, repeatedly extending a `Vec` without pre-determining its
capacity is *not* quadratic. It's also incredibly common, within rustc
and many other Rust programs, so if there were performance problems
there you'd think it would show up in other places, too.
`TokenTree::Punct` is handled outside the `match`. This commits moves it
inside the `match`, avoiding the need for the `return`s and making it
easier to read.
proc_macro/bridge: send diagnostics over the bridge as a struct
This removes some RPC when creating and emitting diagnostics, and
simplifies the bridge slightly.
After this change, there are no remaining methods which take advantage
of the support for `&mut` references to objects in the store as
arguments, meaning that support for them could technically be removed if
we wanted. The only remaining uses of immutable references into the
store are `TokenStream` and `SourceFile`.
r? `@eddyb`
- Rename `ast::Lit::token` as `ast::Lit::token_lit`, because its type is
`token::Lit`, which is not a token. (This has been confusing me for a
long time.)
reasonable because we have an `ast::token::Lit` inside an `ast::Lit`.
- Rename `LitKind::{from,to}_lit_token` as
`LitKind::{from,to}_token_lit`, to match the above change and
`token::Lit`.
This removes some RPC when creating and emitting diagnostics, and
simplifies the bridge slightly.
After this change, there are no remaining methods which take advantage
of the support for `&mut` references to objects in the store as
arguments, meaning that support for them could technically be removed if
we wanted. The only remaining uses of immutable references into the
store are `TokenStream` and `SourceFile`.
A `TokenStream` contains a `Lrc<Vec<(TokenTree, Spacing)>>`. But this is
not quite right. `Spacing` makes sense for `TokenTree::Token`, but does
not make sense for `TokenTree::Delimited`, because a
`TokenTree::Delimited` cannot be joined with another `TokenTree`.
This commit fixes this problem, by adding `Spacing` to `TokenTree::Token`,
changing `TokenStream` to contain a `Lrc<Vec<TokenTree>>`, and removing the
`TreeAndSpacing` typedef.
The commit removes these two impls:
- `impl From<TokenTree> for TokenStream`
- `impl From<TokenTree> for TreeAndSpacing`
These were useful, but also resulted in code with many `.into()` calls
that was hard to read, particularly for anyone not highly familiar with
the relevant types. This commit makes some other changes to compensate:
- `TokenTree::token()` becomes `TokenTree::token_{alone,joint}()`.
- `TokenStream::token_{alone,joint}()` are added.
- `TokenStream::delimited` is added.
This results in things like this:
```rust
TokenTree::token(token::Semi, stmt.span).into()
```
changing to this:
```rust
TokenStream::token_alone(token::Semi, stmt.span)
```
This makes the type of the result, and its spacing, clearer.
These changes also simplifies `Cursor` and `CursorRef`, because they no longer
need to distinguish between `next` and `next_with_spacing`.
This method is still only used for Literal::subspan, however the
implementation only depends on the Span component, so it is simpler and
more efficient for now to pass down only the information that is needed.
In the future, if more information about the Literal is required in the
implementation (e.g. to validate that spans line up as expected with
source text), that extra information can be added back with extra
arguments.
This builds on the symbol infrastructure built for `Ident` to replicate
the `LitKind` and `Lit` structures in rustc within the `proc_macro`
client, allowing literals to be fully created and interacted with from
the client thread. Only parsing and subspan operations still require
sync RPC.
Doing this for all unicode identifiers would require a dependency on
`unicode-normalization` and `rustc_lexer`, which is currently not
possible for `proc_macro` due to it being built concurrently with `std`
and `core`. Instead, ASCII identifiers are validated locally, and an RPC
message is used to validate unicode identifiers when needed.
String values are interned on the both the server and client when
deserializing, to avoid unnecessary copies and keep Ident cheap to copy and
move. This appears to be important for performance.
The client-side interner is based roughly on the one from rustc_span, and uses
an arena inspired by rustc_arena.
RPC messages passing symbols always include the full value. This could
potentially be optimized in the future if it is revealed to be a
performance bottleneck.
Despite now having a relevant implementaion of Display for Ident, ToString is
still specialized, as it is a hot-path for this object.
The symbol infrastructure will also be used for literals in the next
part.
proc_macro: Fix expand_expr expansion of bool literals
Previously, the expand_expr method would expand bool literals as a
`Literal` token containing a `LitKind::Bool`, rather than as an `Ident`.
This is not a valid token, and the `LitKind::Bool` case needs to be
handled seperately.
Tests were added to more deeply compare the streams in the expand-expr
test suite to catch mistakes like this in the future.
This greatly reduces round-trips to fetch relevant extra information about the
token in proc macro code, and avoids RPC messages to create Group tokens.
This greatly reduces round-trips to fetch relevant extra information about the
token in proc macro code, and avoids RPC messages to create Punct tokens.
Previously, the expand_expr method would expand bool literals as a
`Literal` token containing a `LitKind::Bool`, rather than as an `Ident`.
This is not a valid token, and the `LitKind::Bool` case needs to be
handled seperately.
Tests were added to more deeply compare the streams in the expand-expr
test suite to catch mistakes like this in the future.
It's a weird function: it lets you modify the token stream in the middle
of iteration. There is only one call site, and it is only used for the
rare `ProceduralMasquerade` legacy case.
Batch proc_macro RPC for TokenStream iteration and combination operations
This is the first part of #86822, split off as requested in https://github.com/rust-lang/rust/pull/86822#pullrequestreview-1008655452. It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.
This is an experimental patch to try to reduce the codegen complexity of
TokenStream's FromIterator and Extend implementations for downstream
crates, by moving the core logic into a helper type. This might help
improve build performance of crates which depend on proc_macro as
iterators are used less, and the compiler may take less time to do
things like attempt specializations or other iterator optimizations.
The change intentionally sacrifices some optimization opportunities,
such as using the specializations for collecting iterators derived from
Vec::into_iter() into Vec.
This is one of the simpler potential approaches to reducing the amount
of code generated in crates depending on proc_macro, so it seems worth
trying before other more-involved changes.
This significantly reduces the cost of common interactions with TokenStream
when running with the CrossThread execution strategy, by reducing the number of
RPC calls required.
`MultiSpan` contains labels, which are more complicated with the
introduction of diagnostic translation and will use types from
`rustc_errors` - however, `rustc_errors` depends on `rustc_span` so
`rustc_span` cannot use types like `DiagnosticMessage` without
dependency cycles. Introduce a new `rustc_error_messages` crate that can
contain `DiagnosticMessage` and `MultiSpan`.
Signed-off-by: David Wood <david.wood@huawei.com>
Reduce unnecessary escaping in proc_macro::Literal::character/string
I noticed that https://doc.rust-lang.org/proc_macro/struct.Literal.html#method.character is producing unreadable literals that make macro-expanded code unnecessarily hard to read. Since the proc macro server was using `escape_unicode()`, every char is escaped using `\u{…}` regardless of whether there is any need to do so. For example `Literal::character('=')` would previously produce `'\u{3d}'` which unnecessarily obscures the meaning when reading the macro-expanded code.
I've changed Literal::string also in this PR because `str`'s `Debug` impl is also smarter than just calling `escape_debug` on every char. For example `Literal::string("ferris's")` would previously produce `"ferris\'s"` but will now produce `"ferris's"`.
This feature is aimed at giving proc macros access to powers similar to
those used by builtin macros such as `format_args!` or `concat!`. These
macros are able to accept macros in place of string literal parameters,
such as the format string, as they perform recursive macro expansion
while being expanded.
This can be especially useful in many cases thanks to helper macros like
`concat!`, `stringify!` and `include_str!` which are often used to
construct string literals at compile-time in user code.
For now, this method only allows expanding macros which produce
literals, although more expresisons will be supported before the method
is stabilized.
The only reason to use `abort_if_errors` is when the program is so broken that either:
1. later passes get confused and ICE
2. any diagnostics from later passes would be noise
This is never the case for lints, because the compiler has to be able to deal with `allow`-ed lints.
So it can continue to lint and compile even if there are lint errors.
Encode spans relative to the enclosing item
The aim of this PR is to avoid recomputing queries when code is moved without modification.
MCP at https://github.com/rust-lang/compiler-team/issues/443
This is achieved by :
1. storing the HIR owner LocalDefId information inside the span;
2. encoding and decoding spans relative to the enclosing item in the incremental on-disk cache;
3. marking a dependency to the `source_span(LocalDefId)` query when we translate a span from the short (`Span`) representation to its explicit (`SpanData`) representation.
Since all client code uses `Span`, step 3 ensures that all manipulations
of span byte positions actually create the dependency edge between
the caller and the `source_span(LocalDefId)`.
This query return the actual absolute span of the parent item.
As a consequence, any source code motion that changes the absolute byte position of a node will either:
- modify the distance to the parent's beginning, so change the relative span's hash;
- dirty `source_span`, and trigger the incremental recomputation of all code that
depends on the span's absolute byte position.
With this scheme, I believe the dependency tracking to be accurate.
For the moment, the spans are marked during lowering.
I'd rather do this during def-collection,
but the AST MutVisitor is not practical enough just yet.
The only difference is that we attach macro-expanded spans
to their expansion point instead of the macro itself.
Add proc_macro::Span::{before, after}.
This adds `proc_macro::Span::before()` and `proc_macro::Span::after()` to get a zero width span at the start or end of the span.
These are equivalent to rustc's `Span::shrink_to_lo()` and `Span::shrink_to_hi()` but with a less cryptic name. They are useful when generating diagnostlics like "missing \<thing\> after \<thing\>".
E.g.
```rust
syn::Error::new(ident.span().after(), "missing `:` after field name").into_compile_error()
```