Documentation

Lean.Parser.Basic

Basic Lean parser infrastructure #

The Lean parser was developed with the following primary goals in mind:

Given these constraints, we decided to implement a combinatoric, non-monadic, lexer-less, memoizing recursive-descent parser. Using combinators instead of some more formal and introspectible grammar representation ensures ultimate flexibility as well as efficient extensibility: there is (almost) no pre-processing necessary when extending the grammar with a new parser. However, because all the results the combinators produce are of the homogeneous Syntax type, the basic parser type is not actually a monad but a monomorphic linear function ParserState → ParserState, avoiding constructing and deconstructing countless monadic return values. Instead of explicitly returning syntax objects, parsers push (zero or more of) them onto a syntax stack inside the linear state. Chaining parsers via >> accumulates their output on the stack. Combinators such as node then pop off all syntax objects produced during their invocation and wrap them in a single Syntax.node object that is again pushed on this stack. Instead of calling node directly, we usually use the macro leading_parser p, which unfolds to node k p where the new syntax node kind k is the name of the declaration being defined.

The lack of a dedicated lexer ensures we can modify and replace the lexical grammar at any point, and simplifies detecting and propagating whitespace. The parser still has a concept of "tokens", however, and caches the most recent one for performance: when tokenFn is called twice at the same position in the input, it will reuse the result of the first call. tokenFn recognizes some built-in variable-length tokens such as identifiers as well as any fixed token in the ParserContext's TokenTable (a trie); however, the same cache field and strategy could be reused by custom token parsers. Tokens also play a central role in the prattParser combinator, which selects a leading parser followed by zero or more trailing parsers based on the current token (via peekToken); see the documentation of prattParser for more details. Tokens are specified via the symbol parser, or with symbolNoWs for tokens that should not be preceded by whitespace.

The Parser type is extended with additional metadata over the mere parsing function to propagate token information: collectTokens collects all tokens within a parser for registering. firstTokens holds information about the "FIRST" token set used to speed up parser selection in prattParser. This approach of combining static and dynamic information in the parser type is inspired by the paper "Deterministic, Error-Correcting Combinator Parsers" by Swierstra and Duponcheel. If multiple parsers accept the same current token, prattParser tries all of them using the backtracking longestMatchFn combinator. This is the only case where standard parsers might execute arbitrary backtracking. Repeated invocations of the same category or concrete parser at the same position are cached where possible; see withCache.

Finally, error reporting follows the standard combinatoric approach of collecting a single unexpected token/... and zero or more expected tokens (see Error below). Expected tokens are e.g. set by symbol and merged by <|>. Combinators running multiple parsers should check if an error message is set in the parser state (hasError) and act accordingly. Error recovery is left to the designer of the specific language; for example, Lean's top-level parseCommand loop skips tokens until the next command keyword on error.

Equations
  • One or more equations did not get rendered due to their size.
Equations
@[noinline]
Equations

The andthen(p, q) combinator, usually written as adjacency in syntax declarations (p q), parses p followed by q.

The arity of this parser is the sum of the arities of p and q: that is, it accumulates all the nodes produced by p followed by the nodes from q into the list of arguments to the surrounding parse node.

Equations
@[noinline]
Equations
Equations

Generate an error at the position saved with the withPosition combinator. If delta == true, then it reports at saved position+1. This useful to make sure a parser consumed at least one character.

Equations

Succeeds if c.prec <= prec

Equations

Succeeds if c.lhsPrec >= prec

Equations
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.mergeOrElseErrors (s : ParserState) (error1 : Error) (iniPos : String.Pos) (mergeErrors : Bool) :
Equations
Equations
  • One or more equations did not get rendered due to their size.
@[noinline]
Equations

Run p, falling back to q if p failed without consuming any input.

NOTE: In order for the pretty printer to retrace an orelse, p must be a call to node or some other parser producing a single node kind. Nested orelse calls are flattened for this, i.e. (node k1 p1 <|> node k2 p2) <|> ... is fine as well.

Equations
@[noinline]
Equations
Equations
  • One or more equations did not get rendered due to their size.

The atomic(p) parser parses p, returns the same result as p and fails iff p fails, but if p fails after consuming some tokens atomic(p) will fail without consuming tokens. This is important for the p <|> q combinator, because it is not backtracking, and will fail if p fails after consuming some tokens. To get backtracking behavior, use atomic(p) <|> q instead.

This parser has the same arity as p - it produces the same result as p.

Equations

Information about the state of the parse prior to the failing parser's execution

  • initialPos : String.Pos

    The position prior to the failing parser

  • initialSize : Nat

    The syntax stack height prior to the failing parser's execution

Instances For

Recover from errors in p using recover to consume input until a known-good state has appeared. If recover fails itself, then no recovery is performed.

recover is provided with information about the failing parser's effects , and it is run in the state immediately after the failure.

Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.recover' (parser : Parser) (handler : RecoveryContextParser) :

Recover from errors in parser using handler to consume input until a known-good state has appeared. If handler fails itself, then no recovery is performed.

handler is provided with information about the failing parser's effects , and it is run in the state immediately after the failure.

The interactions between <|> and recover' are subtle, especially for syntactic categories that admit user extension. Consider avoiding it in these cases.

Equations
def Lean.Parser.recover (parser handler : Parser) :

Recover from errors in parser using handler to consume input until a known-good state has appeared. If handler fails itself, then no recovery is performed.

handler is run in the state immediately after the failure.

The interactions between <|> and recover are subtle, especially for syntactic categories that admit user extension. Consider avoiding it in these cases.

Equations
@[noinline]
Equations

lookahead(p) runs p and fails if p does, but it produces no parse nodes and rewinds the position to the original state on success. So for example lookahead("=>") will ensure that the next token is "=>", without actually consuming this token.

This parser has arity 0 - it does not capture anything.

Equations
Equations
  • One or more equations did not get rendered due to their size.

notFollowedBy(p, "foo") succeeds iff p fails; if p succeeds then it fails with the message "unexpected foo".

This parser has arity 0 - it does not capture anything.

Equations
def Lean.Parser.sepByFn (allowTrailingSep : Bool) (p sep : ParserFn) :
Equations
def Lean.Parser.sepBy1Fn (allowTrailingSep : Bool) (p sep : ParserFn) :
Equations
@[noinline]
Equations
@[noinline]
Equations
def Lean.Parser.sepByNoAntiquot (p sep : Parser) (allowTrailingSep : Bool := false) :
Equations
def Lean.Parser.sepBy1NoAntiquot (p sep : Parser) (allowTrailingSep : Bool := false) :
Equations

Apply f to the syntax object produced by p

Equations
@[noinline]
Equations
def Lean.Parser.satisfyFn (p : CharBool) (errorMsg : String := "unexpected character") :
Equations
partial def Lean.Parser.finishCommentBlock (pushMissingOnError : Bool) (nesting : Nat) :
Equations

Consume whitespace and comments

Equations
def Lean.Parser.rawFn (p : ParserFn) (trailingWs : Bool := false) :

Match an arbitrary Parser and return the consumed String in a Syntax.atom.

Equations
def Lean.Parser.chFn (c : Char) (trailingWs : Bool := false) :
Equations
def Lean.Parser.rawCh (c : Char) (trailingWs : Bool := false) :
Equations
Equations
  • One or more equations did not get rendered due to their size.
partial def Lean.Parser.stringGapFn (seenNewline : Bool) :

Parses the whitespace after the \ when there is a string gap. Raises an error if the whitespace does not contain exactly one newline character.

def Lean.Parser.quotedCharCoreFn (isQuotable : CharBool) (inString : Bool) :

Parses a string quotation after a \.

  • isQuotable determines which characters are valid escapes
  • inString enables features that are only valid within strings, in particular "\" newline whitespace* gaps.
Equations
  • One or more equations did not get rendered due to their size.
Equations

Like quotedCharFn but enables escapes that are only valid inside strings. In particular, string gaps ("\" newline whitespace*).

Equations

Push (Syntax.node tk <new-atom>) onto syntax stack if parse was successful.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
partial def Lean.Parser.isRawStrLitStart (input : String) (i : String.Pos) :

Raw strings have the syntax r##...#"..."#...## with zero or more #'s. If we are looking at a valid start to a raw string (r##...#"), returns true. We assume i begins at the position immediately after r.

Parses a raw string literal assuming isRawStrLitStart has returned true. The startPos is the start of the raw string (at the r). The parser state is assumed to be immediately after the r.

Equations

Gives the "unterminated raw string literal" error.

Equations

Parses the #'s and " at the beginning of the raw string. The num variable counts the number of #s after the r.

Parses characters after the first ". If we need to start counting #'s to decide if we are closing the raw string literal, we switch to closingState.

partial def Lean.Parser.rawStrLitFnAux.closingState (startPos : String.Pos) (num closingNum : Nat) :

Parses # characters immediately after a ". The closingNum variable counts the number of #s seen after the ". Note: num > 0 since the num = 0 case is entirely handled by normalState.

partial def Lean.Parser.takeDigitsFn (isDigit : CharBool) (expecting : String) (needDigit : Bool) :

Parses a sequence of the form many (many '_' >> many1 digit), but if needDigit is true the parsed result must be nonempty.

Note: this does not report that it is expecting _ if we reach EOI or an unexpected character. Rationale: this error happens if there is already a _, and while sequences of _ are allowed, it's a bit perverse to suggest extending the sequence.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
def Lean.Parser.mkIdResult (startPos : String.Pos) (tk : Option Token) (val : Name) :
Equations
  • One or more equations did not get rendered due to their size.
Equations
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
Equations
Equations

Check if the following token is the symbol or identifier sym. Useful for parsing local tokens that have not been added to the token table (but may have been so by some unrelated code).

For example, the universe max Function is parsed using this combinator so that it can still be used as an identifier outside of universe (but registering it as a token in a Term Syntax would not break the universe Parser).

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.strAux (sym errorMsg : String) (j : String.Pos) :
Equations
Equations
def Lean.Parser.checkWsBefore (errorMsg : String := "space before") :

The ws parser requires that there is some whitespace at this location. For example, the parser "foo" ws "+" parses foo + or foo/- -/+ but not foo+.

This parser has arity 0 - it does not capture anything.

Equations
Equations
def Lean.Parser.checkLinebreakBefore (errorMsg : String := "line break") :

The linebreak parser requires that there is at least one line break at this location. (The line break may be inside a comment.)

This parser has arity 0 - it does not capture anything.

Equations
def Lean.Parser.checkNoWsBefore (errorMsg : String := "no space before") :

The noWs parser requires that there is no whitespace between the preceding and following parsers. For example, the parser "foo" noWs "+" parses foo+ but not foo +.

This is almost the same as "foo+", but using this parser will make foo+ a token, which may cause problems for the use of "foo" and "+" as separate tokens in other parsers.

This parser has arity 0 - it does not capture anything.

Equations
def Lean.Parser.unicodeSymbolFnAux (sym asciiSym : String) (expected : List String) :
Equations
Equations
Equations

Parses a token and asserts the result is of the given kind. desc is used in error messages as the token kind.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.ParserState.keepPrevError (s : ParserState) (oldStackSize : Nat) (oldStopPos : String.Pos) (oldError : Option Error) (oldLhsPrec : Nat) :
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.ParserState.mergeErrors (s : ParserState) (oldStackSize : Nat) (oldError : Error) :
Equations
  • One or more equations did not get rendered due to their size.
  • s.mergeErrors oldStackSize oldError = s
Equations
  • One or more equations did not get rendered due to their size.
Equations
Equations
def Lean.Parser.runLongestMatchParser (left? : Option Syntax) (startLhsPrec : Nat) (p : ParserFn) :

Auxiliary function used to execute parsers provided to longestMatchFn. Push left? into the stack if it is not none, and execute p.

Remark: p must produce exactly one syntax node. Remark: the left? is not none when we are processing trailing parsers.

Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.longestMatchStep (left? : Option Syntax) (startSize startLhsPrec : Nat) (startPos : String.Pos) (prevPrio prio : Nat) (p : ParserFn) :
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.longestMatchFnAux (left? : Option Syntax) (startSize startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (ps : List (Parser × Nat)) :
Equations
def Lean.Parser.longestMatchFnAux.parse (left? : Option Syntax) (startSize startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (ps : List (Parser × Nat)) :
Equations
  • One or more equations did not get rendered due to their size.
Equations
Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.checkColEq (errorMsg : String := "checkColEq") :

The colEq parser ensures that the next token starts at exactly the column of the saved position (see withPosition). This can be used to do whitespace sensitive syntax like a by block or do block, where all the lines have to line up.

This parser has arity 0 - it does not capture anything.

Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.checkColGe (errorMsg : String := "checkColGe") :

The colGe parser requires that the next token starts from at least the column of the saved position (see withPosition), but allows it to be more indented. This can be used for whitespace sensitive syntax to ensure that a block does not go outside a certain indentation scope. For example it is used in the lean grammar for else if, to ensure that the else is not less indented than the if it matches with.

This parser has arity 0 - it does not capture anything.

Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.checkColGt (errorMsg : String := "checkColGt") :

The colGt parser requires that the next token starts a strictly greater column than the saved position (see withPosition). This can be used for whitespace sensitive syntax for the arguments to a tactic, to ensure that the following tactic is not interpreted as an argument.

example (x : False) : False := by
  revert x
  exact id

Here, the revert tactic is followed by a list of colGt ident, because otherwise it would interpret exact as an identifier and try to revert a variable named exact.

This parser has arity 0 - it does not capture anything.

Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.checkLineEq (errorMsg : String := "checkLineEq") :

The lineEq parser requires that the current token is on the same line as the saved position (see withPosition). This can be used to ensure that composite tokens are not "broken up" across different lines. For example, else if is parsed using lineEq to ensure that the two tokens are on the same line.

This parser has arity 0 - it does not capture anything.

Equations

withPosition(p) runs p while setting the "saved position" to the current position. This has no effect on its own, but various other parsers access this position to achieve some composite effect:

  • colGt, colGe, colEq compare the column of the saved position to the current position, used to implement Python-style indentation sensitive blocks
  • lineEq ensures that the current position is still on the same line as the saved position, used to implement composite tokens

The saved position is only available in the read-only state, which is why this is a scoping parser: after the withPosition(..) block the saved position will be restored to its original value.

This parser has the same arity as p - it just forwards the results of p.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.

withoutPosition(p) runs p without the saved position, meaning that position-checking parsers like colGt will have no effect. This is usually used by bracketing constructs like (...) so that the user can locally override whitespace sensitivity.

This parser has the same arity as p - it just forwards the results of p.

Equations
  • One or more equations did not get rendered due to their size.

withForbidden tk p runs p with tk as a "forbidden token". This means that if the token appears anywhere in p (unless it is nested in withoutForbidden), parsing will immediately stop there, making tk effectively a lowest-precedence operator. This is used for parsers like for x in arr do ...: arr is parsed as withForbidden "do" term because otherwise arr do ... would be treated as an application.

This parser has the same arity as p - it just forwards the results of p.

Equations
  • One or more equations did not get rendered due to their size.

withoutForbidden(p) runs p disabling the "forbidden token" (see withForbidden), if any. This is usually used by bracketing constructs like (...) because there is no parsing ambiguity inside these nested constructs.

This parser has the same arity as p - it just forwards the results of p.

Equations
  • One or more equations did not get rendered due to their size.
Equations
def Lean.Parser.TokenMap.insert {α : Type} (map : TokenMap α) (k : Name) (v : α) :
Equations
Instances For

Specifies how the parsing table lookup function behaves for identifiers.

The function Lean.Parser.prattParser uses two tables: one each for leading and trailing parsers. These tables map tokens to parsers. Because keyword tokens are distinct from identifier tokens, keywords and identifiers cannot be confused, even when they are syntactically identical. Specifying an alternative leading identifier behavior allows greater flexiblity and makes it possible to avoid reserved keywords in some situations.

When the leading token is syntactically an identifier, the current syntax category's LeadingIdentBehavior specifies how the parsing table lookup function behaves, and allows controlled “punning” between identifiers and keywords. This feature is used to avoid creating a reserved symbol for each built-in tactic (e.g., apply or assumption). As a result, tactic names can be used as identifiers.

  • default : LeadingIdentBehavior

    If the leading token is an identifier, then the parser just executes the parsers associated with the auxiliary token “ident”, which parses identifiers.

  • symbol : LeadingIdentBehavior

    If the leading token is an identifier <foo>, and there are parsers P associated with the token <foo>, then the parser executes P. Otherwise, it executes only the parsers associated with the auxiliary token “ident”, which parses identifiers.

  • both : LeadingIdentBehavior

    If the leading token is an identifier <foo>, then it executes the parsers associated with token <foo> and parsers associated with the auxiliary token “ident”, which parses identifiers.

Instances For

Each parser category is implemented using a Pratt's parser. The system comes equipped with the following categories: level, term, tactic, and command. Users and plugins may define extra categories.

The method

categoryParser `term prec

executes the Pratt's parser for category term with precedence prec. That is, only parsers with precedence at least prec are considered. The method termParser prec is equivalent to the method above.

  • declName : Name

    The name of a declaration which will be used as the target of go-to-definition queries and from which doc strings will be extracted. This is a dummy declaration of type Lean.Parser.Category created by declare_syntax_cat, but for builtin categories the declaration is made manually and passed to registerBuiltinParserAttribute.

  • The list of syntax nodes that can parse into this category. This can be used to list all syntaxes in the category.

  • The parsing tables, which consist of a dynamic set of parser functions based on the syntaxes that have been declared so far.

  • The LeadingIdentBehavior, which specifies how the parsing table lookup function behaves for the first identifier to be parsed. This is used by the tactic parser to avoid creating a reserved symbol for each builtin tactic (e.g., apply, assumption, etc.).

Instances For
Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.categoryParser (catName : Name) (prec : Nat) :
Equations
  • One or more equations did not get rendered due to their size.

Antiquotations #

Fail if previous token is immediately followed by ':'.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.mkAntiquot (name : String) (kind : SyntaxNodeKind) (anonymous : Bool := true) (isPseudoKind : Bool := false) :

Define parser for $e (if anonymous == true) and $e:name. kind is embedded in the antiquotation's kind, and checked at syntax match unless isPseudoKind is true. Antiquotations can be escaped as in $$e, which produces the syntax tree for $e.

Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.withAntiquotFn (antiquotP p : ParserFn) (isCatAntiquot : Bool := false) :
Equations
  • One or more equations did not get rendered due to their size.

Optimized version of mkAntiquot ... <|> p.

Equations

Parse $[p]suffix, e.g. $[p],*.

Equations
  • One or more equations did not get rendered due to their size.

Parse suffix after an antiquotation, e.g. $x,*, and put both into a new node.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.nodeWithAntiquot (name : String) (kind : SyntaxNodeKind) (p : Parser) (anonymous : Bool := false) :
Equations

End of Antiquotations #

def Lean.Parser.sepBy (p : Parser) (sep : String) (psep : Parser := symbol sep) (allowTrailingSep : Bool := false) :
Equations
def Lean.Parser.sepBy1 (p : Parser) (sep : String) (psep : Parser := symbol sep) (allowTrailingSep : Bool := false) :
Equations
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Parser.leadingParser (kind : Name) (tables : PrattParsingTables) (behavior : LeadingIdentBehavior) (antiquotParser : ParserFn) :
Equations
def Lean.Parser.prattParser (kind : Name) (tables : PrattParsingTables) (behavior : LeadingIdentBehavior) (antiquotParser : ParserFn) :

Implements a variant of Pratt's algorithm. In Pratt's algorithms tokens have a right and left binding power. In our implementation, parsers have precedence instead. This method selects a parser (or more, via longestMatchFn) from leadingTable based on the current token. Note that the unindexed leadingParsers parsers are also tried. We have the unidexed leadingParsers because some parsers do not have a "first token". Example:

syntax term:51 "≤" ident "<" term "|" term : index

Example, in principle, the set of first tokens for this parser is any token that can start a term, but this set is always changing. Thus, this parsing rule is stored as an unindexed leading parser at leadingParsers. After processing the leading parser, we chain with parsers from trailingTable/trailingParsers that have precedence at least c.prec where c is the ParsingContext. Recall that c.prec is set by categoryParser.

Note that in the original Pratt's algorithm, precedences are only checked before calling trailing parsers. In our implementation, leading and trailing parsers check the precedence. We claim our algorithm is more flexible, modular and easier to understand.

antiquotParser should be a mkAntiquot parser (or always fail) and is tried before all other parsers. It should not be added to the regular leading parsers because it would heavily overlap with antiquotation parsers nested inside them.

Equations
  • One or more equations did not get rendered due to their size.
Equations
  • One or more equations did not get rendered due to their size.
def Lean.Syntax.foldArgsM {m : Type u_1 → Type u_2} [Monad m] {β : Type u_1} (s : Syntax) (f : Syntaxβm β) (b : β) :
m β
Equations
def Lean.Syntax.foldArgs {β : Type u_1} (s : Syntax) (f : Syntaxββ) (b : β) :
β
Equations
def Lean.Syntax.forArgsM {m : TypeType u_1} [Monad m] (s : Syntax) (f : Syntaxm Unit) :
Equations