Compiler Services: Processing SyntaxTree
This tutorial demonstrates how to get the SyntaxTree (AST) for F# code and how to walk over the tree. This can be used for creating tools such as code formatter, basic refactoring or code navigation tools. The untyped syntax tree contains information about the code structure, but does not contain types and there are some ambiguities that are resolved only later by the type checker. You can also combine the SyntaxTree information with the API available from editor services.
NOTE: The FSharp.Compiler.Service API is subject to change when later versions of the nuget package are published
Getting the SyntaxTree
To access the untyped AST, you need to create an instance of FSharpChecker
.
This type represents a context for type checking and parsing and corresponds either
to a stand-alone F# script file (e.g. opened in Visual Studio) or to a loaded project
file with multiple files. Once you have an instance of FSharpChecker
, you can
use it to perform "untyped parse" which is the first step of type-checking. The
second phase is "typed parse" and is used by editor services.
To use the interactive checker, reference FSharp.Compiler.Service.dll
and open the
SourceCodeServices
namespace:
#r "FSharp.Compiler.Service.dll"
open System
open FSharp.Compiler.CodeAnalysis
open FSharp.Compiler.Text
Performing untyped parse
The untyped parse operation is very fast (compared to type checking, which can
take notable amount of time) and so we can perform it synchronously. First, we
need to create FSharpChecker
- the constructor takes an argument that
can be used to notify the checker about file changes (which we ignore).
// Create an interactive checker instance
let checker = FSharpChecker.Create()
To get the AST, we define a function that takes file name and the source code
(the file is only used for location information and does not have to exist).
We first need to get "interactive checker options" which represents the context.
For simple tasks, you can use GetProjectOptionsFromScriptRoot
which infers
the context for a script file. Then we use the ParseFile
method and
return the ParseTree
property:
/// Get untyped tree for a specified input
let getUntypedTree (file, input) =
// Get compiler options for the 'project' implied by a single script file
let projOptions, diagnostics =
checker.GetProjectOptionsFromScript(file, input, assumeDotNetFramework=false)
|> Async.RunSynchronously
let parsingOptions, _errors = checker.GetParsingOptionsFromProjectOptions(projOptions)
// Run the first phase (untyped parsing) of the compiler
let parseFileResults =
checker.ParseFile(file, input, parsingOptions)
|> Async.RunSynchronously
parseFileResults.ParseTree
Walking over the AST
The abstract syntax tree is defined as a number of discriminated unions that represent
different syntactical elements (such as expressions, patterns, declarations etc.). The best
way to understand the AST is to look at the definitions in
SyntaxTree.fsi
in the source code.
The relevant parts are in the following namespace:
open FSharp.Compiler.Syntax
When processing the AST, you will typically write a number of mutually recursive functions that pattern match on the different syntactical elements. There is a number of elements that need to be supported - the top-level element is module or namespace declaration, containing declarations inside a module (let bindings, types etc.). A let declaration inside a module then contains expressions, which can contain patterns.
Walking over patterns and expressions
We start by looking at functions that walk over expressions and patterns - as we walk,
we print information about the visited elements. For patterns, the input is of type
SynPat
and has a number of cases including Wild
(for _
pattern), Named
(for
<pat> as name
) and LongIdent
(for a Foo.Bar
name). Note that the parsed pattern
is occasionally more complex than what is in the source code (in particular, Named
is
used more often):
/// Walk over a pattern - this is for example used in
/// let <pat> = <expr> or in the 'match' expression
let rec visitPattern = function
| SynPat.Wild _ ->
printfn " .. underscore pattern"
| SynPat.Named(ident = SynIdent(ident = name)) ->
printfn " .. named as '%s'" name.idText
| SynPat.LongIdent(longDotId = SynLongIdent(id = ident)) ->
let names = String.concat "." [ for i in ident -> i.idText ]
printfn " .. identifier: %s" names
| pat -> printfn " .. other pattern: %A" pat
The function is recursive (for nested patterns such as (foo, _) as bar
), but it does not
call any of the functions defined later (because patterns cannot contain other syntactical
elements).
The next function iterates over expressions - this is where most of the work would be and
there are around 20 cases to cover (type SynExpr.
and you'll get completion with other
options). In the following, we only show how to handle if .. then ..
and let .. = ...
:
/// Walk over an expression - if expression contains two or three
/// sub-expressions (two if the 'else' branch is missing), let expression
/// contains pattern and two sub-expressions
let rec visitExpression e =
match e with
| SynExpr.IfThenElse(ifExpr=cond; thenExpr=trueBranch; elseExpr=falseBranchOpt) ->
// Visit all sub-expressions
printfn "Conditional:"
visitExpression cond
visitExpression trueBranch
falseBranchOpt |> Option.iter visitExpression
| SynExpr.LetOrUse(_, _, bindings, body, _, _) ->
// Visit bindings (there may be multiple
// for 'let .. = .. and .. = .. in ...'
printfn "LetOrUse with the following bindings:"
for binding in bindings do
let (SynBinding(headPat = headPat; expr = init)) = binding
visitPattern headPat
visitExpression init
// Visit the body expression
printfn "And the following body:"
visitExpression body
| expr -> printfn " - not supported expression: %A" expr
The visitExpression
function will be called from a function that visits all top-level
declarations inside a module. In this tutorial, we ignore types and members, but that would
be another source of calls to visitExpression
.
Walking over declarations
As mentioned earlier, the AST of a file contains a number of module or namespace declarations
(top-level node) that contain declarations inside a module (let bindings or types) or inside
a namespace (just types). The following function walks over declarations - we ignore types,
nested modules and all other elements and look only at top-level let
bindings (values and
functions):
/// Walk over a list of declarations in a module. This is anything
/// that you can write as a top-level inside module (let bindings,
/// nested modules, type declarations etc.)
let visitDeclarations decls =
for declaration in decls do
match declaration with
| SynModuleDecl.Let(isRec, bindings, range) ->
// Let binding as a declaration is similar to let binding
// as an expression (in visitExpression), but has no body
for binding in bindings do
let (SynBinding(headPat = pat; expr = body)) = binding
visitPattern pat
visitExpression body
| _ -> printfn " - not supported declaration: %A" declaration
The visitDeclarations
function will be called from a function that walks over a
sequence of module or namespace declarations. This corresponds, for example, to a file
with multiple namespace Foo
declarations:
/// Walk over all module or namespace declarations
/// (basically 'module Foo =' or 'namespace Foo.Bar')
/// Note that there is one implicitly, even if the file
/// does not explicitly define it..
let visitModulesAndNamespaces modulesOrNss =
for moduleOrNs in modulesOrNss do
let (SynModuleOrNamespace(longId = lid; decls = decls)) = moduleOrNs
printfn "Namespace or module: %A" lid
visitDeclarations decls
Now that we have functions that walk over the elements of the AST (starting from declaration, down to expressions and patterns), we can get AST of a sample input and run the above function.
Putting things together
As already discussed, the getUntypedTree
function uses FSharpChecker
to run the first
phase (parsing) on the AST and get back the tree. The function requires F# source code together
with location of the file. The location does not have to exist (it is used only for location
information) and it can be in both Unix and Windows formats:
// Sample input for the compiler service
let input =
"""
let foo() =
let msg = "Hello world"
if true then
printfn "%s" msg
"""
// File name in Unix format
let file = "/home/user/Test.fsx"
// Get the AST of sample F# code
let tree = getUntypedTree(file, SourceText.ofString input)
When you run the code in F# interactive, you can enter tree;;
in the interactive console and
see a pretty printed representation of the data structure - the tree contains a lot of information,
so this is not particularly readable, but it gives you a good idea about how the tree looks.
The returned tree
value is again a discriminated union that can be two different cases - one case
is ParsedInput.SigFile
which represents F# signature file (*.fsi
) and the other one is
ParsedInput.ImplFile
representing regular source code (*.fsx
or *.fs
). The implementation
file contains a sequence of modules or namespaces that we can pass to the function implemented
in the previous step:
// Extract implementation file details
match tree with
| ParsedInput.ImplFile(implFile) ->
// Extract declarations and walk over them
let (ParsedImplFileInput(contents = modules)) = implFile
visitModulesAndNamespaces modules
| _ -> failwith "F# Interface file (*.fsi) not supported."
Summary
In this tutorial, we looked at the basics of working with the untyped abstract syntax tree. This is a comprehensive topic, so it is not possible to explain everything in a single article. The Fantomas project is a good example of a tool based on the untyped AST that can help you understand more. In practice, it is also useful to combine the information here with some information you can obtain from the editor services discussed in the next tutorial.
namespace FSharp
--------------------
namespace Microsoft.FSharp
<summary> Used to parse and check F# source code. </summary>
Get untyped tree for a specified input
type Async = static member AsBeginEnd: computation: ('Arg -> Async<'T>) -> ('Arg * AsyncCallback * objnull -> IAsyncResult) * (IAsyncResult -> 'T) * (IAsyncResult -> unit) static member AwaitEvent: event: IEvent<'Del,'T> * ?cancelAction: (unit -> unit) -> Async<'T> (requires delegate and 'Del :> Delegate) static member AwaitIAsyncResult: iar: IAsyncResult * ?millisecondsTimeout: int -> Async<bool> static member AwaitTask: task: Task<'T> -> Async<'T> + 1 overload static member AwaitWaitHandle: waitHandle: WaitHandle * ?millisecondsTimeout: int -> Async<bool> static member CancelDefaultToken: unit -> unit static member Catch: computation: Async<'T> -> Async<Choice<'T,exn>> static member Choice: computations: Async<'T option> seq -> Async<'T option> static member FromBeginEnd: beginAction: (AsyncCallback * objnull -> IAsyncResult) * endAction: (IAsyncResult -> 'T) * ?cancelAction: (unit -> unit) -> Async<'T> + 3 overloads static member FromContinuations: callback: (('T -> unit) * (exn -> unit) * (OperationCanceledException -> unit) -> unit) -> Async<'T> ...
--------------------
type Async<'T>
member FSharpChecker.ParseFile: fileName: string * sourceText: ISourceText * options: FSharpParsingOptions * ?cache: bool * ?userOpName: string -> Async<FSharpParseFileResults>
<summary> The syntax tree resulting from the parse </summary>
Walk over a pattern - this is for example used in
let <pat> = <expr> or in the 'match' expression
module SynPat from FSharp.Compiler.Syntax
--------------------
type SynPat = | Const of constant: SynConst * range: range | Wild of range: range | Named of ident: SynIdent * isThisVal: bool * accessibility: SynAccess option * range: range | Typed of pat: SynPat * targetType: SynType * range: range | Attrib of pat: SynPat * attributes: SynAttributes * range: range | Or of lhsPat: SynPat * rhsPat: SynPat * range: range * trivia: SynPatOrTrivia | ListCons of lhsPat: SynPat * rhsPat: SynPat * range: range * trivia: SynPatListConsTrivia | Ands of pats: SynPat list * range: range | As of lhsPat: SynPat * rhsPat: SynPat * range: range | LongIdent of longDotId: SynLongIdent * extraId: Ident option * typarDecls: SynValTyparDecls option * argPats: SynArgPats * accessibility: SynAccess option * range: range ... member Range: range with get
<summary> Represents a syntax tree for an F# pattern </summary>
<summary> A wildcard '_' in a pattern </summary>
<summary> A name pattern 'ident' </summary>
union case SynIdent.SynIdent: ident: Ident * trivia: FSharp.Compiler.SyntaxTrivia.IdentTrivia option -> SynIdent
--------------------
type SynIdent = | SynIdent of ident: Ident * trivia: IdentTrivia option member Range: range with get
<summary> Represents an identifier with potentially additional trivia information. </summary>
<summary> A long identifier pattern possibly with argument patterns </summary>
union case SynLongIdent.SynLongIdent: id: LongIdent * dotRanges: range list * trivia: FSharp.Compiler.SyntaxTrivia.IdentTrivia option list -> SynLongIdent
--------------------
type SynLongIdent = | SynLongIdent of id: LongIdent * dotRanges: range list * trivia: IdentTrivia option list member Dots: range list with get member IdentsWithTrivia: SynIdent list with get member LongIdent: LongIdent with get member Range: range with get member RangeWithoutAnyExtraDot: range with get member ThereIsAnExtraDotAtTheEnd: bool with get member Trivia: IdentTrivia list with get
<summary> Represents a long identifier with possible '.' at end. Typically dotRanges.Length = lid.Length-1, but they may be same if (incomplete) code ends in a dot, e.g. "Foo.Bar." The dots mostly matter for parsing, and are typically ignored by the typechecker, but if dotRanges.Length = lid.Length, then the parser must have reported an error, so the typechecker is allowed more freedom about typechecking these expressions. LongIdent can be empty list - it is used to denote that name of some AST element is absent (i.e. empty type name in inherit) </summary>
type String = interface IEnumerable<char> interface IEnumerable interface ICloneable interface IComparable interface IComparable<string> interface IConvertible interface IEquatable<string> interface IParsable<string> interface ISpanParsable<string> new: value: nativeptr<char> -> unit + 8 overloads ...
<summary>Represents text as a sequence of UTF-16 code units.</summary>
--------------------
String(value: nativeptr<char>) : String
String(value: char array) : String
String(value: ReadOnlySpan<char>) : String
String(value: nativeptr<sbyte>) : String
String(c: char, count: int) : String
String(value: nativeptr<char>, startIndex: int, length: int) : String
String(value: char array, startIndex: int, length: int) : String
String(value: nativeptr<sbyte>, startIndex: int, length: int) : String
String(value: nativeptr<sbyte>, startIndex: int, length: int, enc: Text.Encoding) : String
Walk over an expression - if expression contains two or three
sub-expressions (two if the 'else' branch is missing), let expression
contains pattern and two sub-expressions
module SynExpr from FSharp.Compiler.Syntax
--------------------
type SynExpr = | Paren of expr: SynExpr * leftParenRange: range * rightParenRange: range option * range: range | Quote of operator: SynExpr * isRaw: bool * quotedExpr: SynExpr * isFromQueryExpression: bool * range: range | Const of constant: SynConst * range: range | Typed of expr: SynExpr * targetType: SynType * range: range | Tuple of isStruct: bool * exprs: SynExpr list * commaRanges: range list * range: range | AnonRecd of isStruct: bool * copyInfo: (SynExpr * BlockSeparator) option * recordFields: (SynLongIdent * range option * SynExpr) list * range: range * trivia: SynExprAnonRecdTrivia | ArrayOrList of isArray: bool * exprs: SynExpr list * range: range | Record of baseInfo: (SynType * SynExpr * range * BlockSeparator option * range) option * copyInfo: (SynExpr * BlockSeparator) option * recordFields: SynExprRecordField list * range: range | New of isProtected: bool * targetType: SynType * expr: SynExpr * range: range | ObjExpr of objType: SynType * argOptions: (SynExpr * Ident option) option * withKeyword: range option * bindings: SynBinding list * members: SynMemberDefns * extraImpls: SynInterfaceImpl list * newExprRange: range * range: range ... member IsArbExprAndThusAlreadyReportedError: bool with get member Range: range with get member RangeOfFirstPortion: range with get member RangeWithoutAnyExtraDot: range with get
<summary> Represents a syntax tree for F# expressions </summary>
<summary> F# syntax: if expr then expr F# syntax: if expr then expr else expr </summary>
<summary> F# syntax: let pat = expr in expr F# syntax: let f pat1 .. patN = expr in expr F# syntax: let rec f pat1 .. patN = expr in expr F# syntax: use pat = expr in expr </summary>
union case SynBinding.SynBinding: accessibility: SynAccess option * kind: SynBindingKind * isInline: bool * isMutable: bool * attributes: SynAttributes * xmlDoc: FSharp.Compiler.Xml.PreXmlDoc * valData: SynValData * headPat: SynPat * returnInfo: SynBindingReturnInfo option * expr: SynExpr * range: range * debugPoint: DebugPointAtBinding * trivia: FSharp.Compiler.SyntaxTrivia.SynBindingTrivia -> SynBinding
--------------------
type SynBinding = | SynBinding of accessibility: SynAccess option * kind: SynBindingKind * isInline: bool * isMutable: bool * attributes: SynAttributes * xmlDoc: PreXmlDoc * valData: SynValData * headPat: SynPat * returnInfo: SynBindingReturnInfo option * expr: SynExpr * range: range * debugPoint: DebugPointAtBinding * trivia: SynBindingTrivia member RangeOfBindingWithRhs: range with get member RangeOfBindingWithoutRhs: range with get member RangeOfHeadPattern: range with get
<summary> Represents a binding for a 'let' or 'member' declaration </summary>
Walk over a list of declarations in a module. This is anything
that you can write as a top-level inside module (let bindings,
nested modules, type declarations etc.)
<summary> Represents a definition within a module </summary>
<summary> A 'let' definition within a module </summary>
val range: range
--------------------
type range = Range
<summary> Represents a range within a file </summary>
Walk over all module or namespace declarations
(basically 'module Foo =' or 'namespace Foo.Bar')
Note that there is one implicitly, even if the file
does not explicitly define it..
union case SynModuleOrNamespace.SynModuleOrNamespace: longId: LongIdent * isRecursive: bool * kind: SynModuleOrNamespaceKind * decls: SynModuleDecl list * xmlDoc: FSharp.Compiler.Xml.PreXmlDoc * attribs: SynAttributes * accessibility: SynAccess option * range: range * trivia: FSharp.Compiler.SyntaxTrivia.SynModuleOrNamespaceTrivia -> SynModuleOrNamespace
--------------------
type SynModuleOrNamespace = | SynModuleOrNamespace of longId: LongIdent * isRecursive: bool * kind: SynModuleOrNamespaceKind * decls: SynModuleDecl list * xmlDoc: PreXmlDoc * attribs: SynAttributes * accessibility: SynAccess option * range: range * trivia: SynModuleOrNamespaceTrivia member Range: range with get
<summary> Represents the definition of a module or namespace </summary>
Get untyped tree for a specified input
<summary> Functions related to ISourceText objects </summary>
<summary> Creates an ISourceText object from the given string </summary>
module ParsedInput from FSharp.Compiler.Syntax
<summary> Holds operations for working with the untyped abstract syntax tree (<see cref="T:FSharp.Compiler.Syntax.ParsedInput" />). </summary>
--------------------
type ParsedInput = | ImplFile of ParsedImplFileInput | SigFile of ParsedSigFileInput member FileName: string with get member Identifiers: Set<string> with get member QualifiedName: QualifiedNameOfFile with get member Range: range with get member ScopedPragmas: ScopedPragma list with get
<summary> Represents the syntax tree for a parsed implementation or signature file </summary>
<summary> A parsed implementation file </summary>
union case ParsedImplFileInput.ParsedImplFileInput: fileName: string * isScript: bool * qualifiedNameOfFile: QualifiedNameOfFile * scopedPragmas: ScopedPragma list * hashDirectives: ParsedHashDirective list * contents: SynModuleOrNamespace list * flags: bool * bool * trivia: FSharp.Compiler.SyntaxTrivia.ParsedImplFileInputTrivia * identifiers: Set<string> -> ParsedImplFileInput
--------------------
type ParsedImplFileInput = | ParsedImplFileInput of fileName: string * isScript: bool * qualifiedNameOfFile: QualifiedNameOfFile * scopedPragmas: ScopedPragma list * hashDirectives: ParsedHashDirective list * contents: SynModuleOrNamespace list * flags: bool * bool * trivia: ParsedImplFileInputTrivia * identifiers: Set<string> member Contents: SynModuleOrNamespace list with get member FileName: string with get member HashDirectives: ParsedHashDirective list with get member IsExe: bool with get member IsLastCompiland: bool with get member IsScript: bool with get member QualifiedName: QualifiedNameOfFile with get member ScopedPragmas: ScopedPragma list with get member Trivia: ParsedImplFileInputTrivia with get
<summary> Represents the full syntax tree, file name and other parsing information for an implementation file </summary>