//! The Rust parser.
//!
//! NOTE: The crate is undergoing refactors, don't believe everything the docs
//! say :-)
//!
//! The parser doesn't know about the concrete representation of tokens and
//! syntax trees. Abstract [`TokenSource`] and [`TreeSink`] traits are used
//! instead. As a consequence, this crate does not contain a lexer.
//!
//! The [`Parser`] struct from the [`parser`] module is a cursor into the
//! sequence of tokens. Parsing routines use [`Parser`] to inspect the current
//! state and advance the parsing.
//!
//! The actual parsing happens in the [`grammar`] module.
//!
//! Tests for this crate live in the `syntax` crate.
//!
//! [`Parser`]: crate::parser::Parser

#![warn(rust_2018_idioms, unused_lifetimes, semicolon_in_expressions_from_macros)]
#![allow(rustdoc::private_intra_doc_links)]

mod lexed_str;
mod token_set;
mod syntax_kind;
mod event;
mod parser;
mod grammar;
mod input;
mod output;
mod shortcuts;

#[cfg(test)]
mod tests;

pub(crate) use token_set::TokenSet;

// internal: replace TreeSink with a data structure
//
// The general theme of this is to make parser a better independent library.
//
// The specific thing we do here is replacing callback based TreeSink with a
// data structure. That is, rather than calling user-provided tree
// construction methods, the parser now spits out a very bare-bones tree,
// effectively a log of a DFS traversal.
//
// This makes the parser usable without any *specific* tree sink, and allows
// us to, e.g., move tests into this crate.
//
// Now, it's also true that this is a distinction without a difference, as
// the old and the new interface are equivalent in expressiveness. Still,
// this new thing seems somewhat simpler. But yeah, I admit I don't have a
// super strong motivation here, just a hunch that this is better.
pub use crate::{
    input::Input,
    lexed_str::LexedStr,
    output::{Output, Step},
    shortcuts::StrStep,
    syntax_kind::SyntaxKind,
};

/// Parse the whole of the input as a given syntactic construct.
///
/// This covers two main use-cases:
///
///   * Parsing a Rust file.
///   * Parsing a result of macro expansion.
///
/// That is, for something like
///
/// ```ignore
/// quick_check! {
///     fn prop() {}
/// }
/// ```
///
/// the input to the macro will be parsed with [`PrefixEntryPoint::Item`], and
/// the result of the expansion will be parsed with
/// [`TopEntryPoint::MacroItems`].
///
/// [`TopEntryPoint::parse`] guarantees that:
///
///   * all input is consumed
///   * the result is a valid tree (there's one root node)
#[derive(Debug)]
pub enum TopEntryPoint {
    SourceFile,
    MacroStmts,
    MacroItems,
    Pattern,
    Type,
    Expr,
    /// Edge case -- macros generally don't expand to attributes, with the
    /// exception of `cfg_attr` which does!
    MetaItem,
}

impl TopEntryPoint {
    pub fn parse(&self, input: &Input) -> Output {
        let entry_point: fn(&'_ mut parser::Parser<'_>) = match self {
            TopEntryPoint::SourceFile => grammar::entry::top::source_file,
            TopEntryPoint::MacroStmts => grammar::entry::top::macro_stmts,
            TopEntryPoint::MacroItems => grammar::entry::top::macro_items,
            TopEntryPoint::Pattern => grammar::entry::top::pattern,
            TopEntryPoint::Type => grammar::entry::top::type_,
            TopEntryPoint::Expr => grammar::entry::top::expr,
            TopEntryPoint::MetaItem => grammar::entry::top::meta_item,
        };
        let mut p = parser::Parser::new(input);
        entry_point(&mut p);
        let events = p.finish();
        let res = event::process(events);

        // In debug builds, check the guarantee documented on `TopEntryPoint`:
        // only the very first step may be seen at depth zero, and the log
        // must end back at depth zero.
        if cfg!(debug_assertions) {
            let mut depth = 0;
            let mut first = true;
            for step in res.iter() {
                assert!(depth > 0 || first);
                first = false;
                match step {
                    Step::Enter { .. } => depth += 1,
                    Step::Exit => depth -= 1,
                    Step::FloatSplit { ends_in_dot: has_pseudo_dot } => {
                        depth -= 1 + !has_pseudo_dot as usize
                    }
                    Step::Token { .. } | Step::Error { .. } => (),
                }
            }
            assert!(!first, "no tree at all");
            assert_eq!(depth, 0, "unbalanced tree");
        }

        res
    }
}
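
/*
The debug-only check in `TopEntryPoint::parse` can be read as a small
algorithm over a flat log of steps. What follows is a minimal, self-contained
sketch of that idea, not code from this crate. `SketchStep` and
`is_balanced_step_log` are hypothetical names introduced for illustration, and
the sketch assumes a simplified step type: the payloads of the real `Step` and
its `FloatSplit` variant are left out.

```rust
/// Hypothetical, simplified stand-in for this crate's `Step`
/// (assumption: no payloads and no `FloatSplit` variant).
#[derive(Debug, Clone, Copy)]
pub enum SketchStep {
    Enter,
    Exit,
    Token,
    Error,
}

/// Returns `true` if the log is non-empty, never returns to depth zero
/// before its final step, and ends at depth zero.
pub fn is_balanced_step_log(steps: &[SketchStep]) -> bool {
    let mut depth: usize = 0;
    for (i, step) in steps.iter().enumerate() {
        // Only the very first step may be seen at depth zero.
        if i > 0 && depth == 0 {
            return false;
        }
        match step {
            SketchStep::Enter => depth += 1,
            SketchStep::Exit => match depth.checked_sub(1) {
                Some(d) => depth = d,
                // An `Exit` without a matching `Enter`.
                None => return false,
            },
            SketchStep::Token | SketchStep::Error => (),
        }
    }
    !steps.is_empty() && depth == 0
}
```

With this sketch, a log such as `[Enter, Token, Enter, Token, Exit, Exit]` is
accepted, while `[Enter, Exit, Enter, Exit]` (two sibling roots) and
`[Enter, Token]` (an unclosed node) are rejected. Unlike the assertions in
`parse`, the sketch returns `false` instead of panicking.
*/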

/// Parse a prefix of the input as a given syntactic construct.
///
/// This is used by the macro-by-example parser to implement things like
/// `$i:item`, and the naming of variants follows the naming of macro
/// fragments.
///
/// Note that this is generally non-optional -- the result is intentionally not
/// `Option<Output>`. The way MBE works, by the time we *try* to parse `$e:expr`
/// we have already committed to an expression. In other words, this API by
/// design can't be used to implement "rollback and try another alternative"
/// logic.
#[derive(Debug)]
pub enum PrefixEntryPoint {
    Vis,
    Block,
    Stmt,
    Pat,
    PatTop,
    Ty,
    Expr,
    Path,
    Item,
    MetaItem,
}

impl PrefixEntryPoint {
    pub fn parse(&self, input: &Input) -> Output {
        let entry_point: fn(&'_ mut parser::Parser<'_>) = match self {
            PrefixEntryPoint::Vis => grammar::entry::prefix::vis,
            PrefixEntryPoint::Block => grammar::entry::prefix::block,
            PrefixEntryPoint::Stmt => grammar::entry::prefix::stmt,
            PrefixEntryPoint::Pat => grammar::entry::prefix::pat,
            PrefixEntryPoint::PatTop => grammar::entry::prefix::pat_top,
            PrefixEntryPoint::Ty => grammar::entry::prefix::ty,
            PrefixEntryPoint::Expr => grammar::entry::prefix::expr,
            PrefixEntryPoint::Path => grammar::entry::prefix::path,
            PrefixEntryPoint::Item => grammar::entry::prefix::item,
            PrefixEntryPoint::MetaItem => grammar::entry::prefix::meta_item,
        };
        let mut p = parser::Parser::new(input);
        entry_point(&mut p);
        let events = p.finish();
        event::process(events)
    }
}

/// A parsing function for a specific braced-block.
pub struct Reparser(fn(&mut parser::Parser<'_>));

impl Reparser {
    /// If the node is a braced block, return the corresponding `Reparser`.
    pub fn for_node(
        node: SyntaxKind,
        first_child: Option<SyntaxKind>,
        parent: Option<SyntaxKind>,
    ) -> Option<Reparser> {
        grammar::reparser(node, first_child, parent).map(Reparser)
    }

    /// Re-parse given tokens using this `Reparser`.
    ///
    /// Tokens must start with `{`, end with `}` and form a valid brace
    /// sequence.
    pub fn parse(self, tokens: &Input) -> Output {
        let Reparser(r) = self;
        let mut p = parser::Parser::new(tokens);
        r(&mut p);
        let events = p.finish();
        event::process(events)
    }
}
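
/*
`Reparser::parse` expects tokens that start with `{`, end with `}` and form a
valid brace sequence, but it does not check this itself. A caller might
validate the precondition along the following lines. This is only a sketch
under stated assumptions: `BraceTok` and `is_valid_brace_block` are
hypothetical names that do not exist in this crate, and real callers would
look at `SyntaxKind`s rather than this three-variant enum.

```rust
/// Hypothetical token classification (assumption: everything that is not a
/// curly brace is lumped into `Other`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BraceTok {
    LCurly,
    RCurly,
    Other,
}

/// Returns `true` if `tokens` starts with `{`, ends with `}`, and the tokens
/// strictly between those two braces are balanced on their own.
pub fn is_valid_brace_block(tokens: &[BraceTok]) -> bool {
    // Need at least `{` and `}` as distinct first and last tokens.
    if tokens.len() < 2
        || tokens[0] != BraceTok::LCurly
        || tokens[tokens.len() - 1] != BraceTok::RCurly
    {
        return false;
    }
    // Track nesting depth of the inner tokens only.
    let mut balance: usize = 0;
    for tok in &tokens[1..tokens.len() - 1] {
        match tok {
            BraceTok::LCurly => balance += 1,
            BraceTok::RCurly => match balance.checked_sub(1) {
                Some(b) => balance = b,
                // The outer block would close before the last token.
                None => return false,
            },
            BraceTok::Other => (),
        }
    }
    balance == 0
}
```

Checking the inner tokens separately rejects inputs such as `{ } { }`, which
start and end with braces but are two blocks rather than one.
*/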