navigation

Compiler Services: Processing untyped syntax tree

This tutorial demonstrates how to get the untyped abstract syntax tree (AST) for F# code and how to walk over the tree. This can be used for creating tools such as code formatter, basic refactoring or code navigation tools. The untyped syntax tree contains information about the code structure, but does not contain types and there are some ambiguities that are resolved only later by the type checker. You can also combine the untyped AST information with the API available from editor services.

NOTE: The FSharp.Compiler.Service API is subject to change when later versions of the nuget package are published

Getting the untyped AST

To access the untyped AST, you need to create an instance of FSharpChecker. This type represents a context for type checking and parsing and corresponds either to a stand-alone F# script file (e.g. opened in Visual Studio) or to a loaded project file with multiple files. Once you have an instance of FSharpChecker, you can use it to perform "untyped parse" which is the first step of type-checking. The second phase is "typed parse" and is used by editor services.

To use the interactive checker, reference FSharp.Compiler.Service.dll and open the SourceCodeServices namespace:

#r "FSharp.Compiler.Service.dll"
open System
open FSharp.Compiler.SourceCodeServices
open FSharp.Compiler.Text

Performing untyped parse

The untyped parse operation is very fast (compared to type checking, which can take notable amount of time) and so we can perform it synchronously. First, we need to create FSharpChecker - the constructor takes an argument that can be used to notify the checker about file changes (which we ignore).

// Create an interactive checker instance
let checker = FSharpChecker.Create()

To get the AST, we define a function that takes file name and the source code (the file is only used for location information and does not have to exist). We first need to get "interactive checker options" which represents the context. For simple tasks, you can use GetProjectOptionsFromScriptRoot which infers the context for a script file. Then we use the ParseFile method and return the ParseTree property:

/// Get untyped tree for a specified input
let getUntypedTree (file, input) =
  // Get compiler options for the 'project' implied by a single script file
  let projOptions, errors =
      checker.GetProjectOptionsFromScript(file, input)
      |> Async.RunSynchronously

  let parsingOptions, _errors = checker.GetParsingOptionsFromProjectOptions(projOptions)

  // Run the first phase (untyped parsing) of the compiler
  let parseFileResults =
      checker.ParseFile(file, input, parsingOptions)
      |> Async.RunSynchronously

  match parseFileResults.ParseTree with
  | Some tree -> tree
  | None -> failwith "Something went wrong during parsing!"

Walking over the AST

The abstract syntax tree is defined as a number of discriminated unions that represent different syntactical elements (such as expressions, patterns, declarations etc.). The best way to understand the AST is to look at the definitions in ast.fs in the source code.

The relevant parts are in the following namespace:

open FSharp.Compiler.SyntaxTree

When processing the AST, you will typically write a number of mutually recursive functions that pattern match on the different syntactical elements. There is a number of elements that need to be supported - the top-level element is module or namespace declaration, containing declarations inside a module (let bindings, types etc.). A let declaration inside a module then contains expression, which can contain patterns.

Walking over patterns and expressions

We start by looking at functions that walk over expressions and patterns - as we walk, we print information about the visited elements. For patterns, the input is of type SynPat and has a number of cases including Wild (for _ pattern), Named (for <pat> as name) and LongIdent (for a Foo.Bar name). Note that the parsed pattern is occasionally more complex than what is in the source code (in particular, Named is used more often):

/// Walk over a pattern - this is for example used in
/// let <pat> = <expr> or in the 'match' expression
let rec visitPattern = function
  | SynPat.Wild(_) ->
      printfn "  .. underscore pattern"
  | SynPat.Named(pat, name, _, _, _) ->
      visitPattern pat
      printfn "  .. named as '%s'" name.idText
  | SynPat.LongIdent(LongIdentWithDots(ident, _), _, _, _, _, _) ->
      let names = String.concat "." [ for i in ident -> i.idText ]
      printfn "  .. identifier: %s" names
  | pat -> printfn "  .. other pattern: %A" pat

The function is recursive (for nested patterns such as (foo, _) as bar), but it does not call any of the functions defined later (because patterns cannot contain other syntactical elements).

The next function iterates over expressions - this is where most of the work would be and there are around 20 cases to cover (type SynExpr. and you'll get completion with other options). In the following, we only show how to handle if .. then .. and let .. = ...:

/// Walk over an expression - if expression contains two or three
/// sub-expressions (two if the 'else' branch is missing), let expression
/// contains pattern and two sub-expressions
let rec visitExpression = function
  | SynExpr.IfThenElse(cond, trueBranch, falseBranchOpt, _, _, _, _) ->
      // Visit all sub-expressions
      printfn "Conditional:"
      visitExpression cond
      visitExpression trueBranch
      falseBranchOpt |> Option.iter visitExpression

  | SynExpr.LetOrUse(_, _, bindings, body, _) ->
      // Visit bindings (there may be multiple
      // for 'let .. = .. and .. = .. in ...'
      printfn "LetOrUse with the following bindings:"
      for binding in bindings do
        let (Binding(access, kind, inlin, mutabl, attrs, xmlDoc,
                     data, pat, retInfo, init, m, sp)) = binding
        visitPattern pat
        visitExpression init
      // Visit the body expression
      printfn "And the following body:"
      visitExpression body
  | expr -> printfn " - not supported expression: %A" expr

The visitExpression function will be called from a function that visits all top-level declarations inside a module. In this tutorial, we ignore types and members, but that would be another source of calls to visitExpression.

Walking over declarations

As mentioned earlier, the AST of a file contains a number of module or namespace declarations (top-level node) that contain declarations inside a module (let bindings or types) or inside a namespace (just types). The following functions walks over declarations - we ignore types, nested modules and all other elements and look only at top-level let bindings (values and functions):

/// Walk over a list of declarations in a module. This is anything
/// that you can write as a top-level inside module (let bindings,
/// nested modules, type declarations etc.)
let visitDeclarations decls =
  for declaration in decls do
    match declaration with
    | SynModuleDecl.Let(isRec, bindings, range) ->
        // Let binding as a declaration is similar to let binding
        // as an expression (in visitExpression), but has no body
        for binding in bindings do
          let (Binding(access, kind, inlin, mutabl, attrs, xmlDoc,
                       data, pat, retInfo, body, m, sp)) = binding
          visitPattern pat
          visitExpression body
    | _ -> printfn " - not supported declaration: %A" declaration

The visitDeclarations function will be called from a function that walks over a sequence of module or namespace declarations. This corresponds, for example, to a file with multiple namespace Foo declarations:

/// Walk over all module or namespace declarations
/// (basically 'module Foo =' or 'namespace Foo.Bar')
/// Note that there is one implicitly, even if the file
/// does not explicitly define it..
let visitModulesAndNamespaces modulesOrNss =
  for moduleOrNs in modulesOrNss do
    let (SynModuleOrNamespace(lid, isRec, isMod, decls, xml, attrs, _, m)) = moduleOrNs
    printfn "Namespace or module: %A" lid
    visitDeclarations decls

Now that we have functions that walk over the elements of the AST (starting from declaration, down to expressions and patterns), we can get AST of a sample input and run the above function.

Putting things together

As already discussed, the getUntypedTree function uses FSharpChecker to run the first phase (parsing) on the AST and get back the tree. The function requires F# source code together with location of the file. The location does not have to exist (it is used only for location information) and it can be in both Unix and Windows formats:

// Sample input for the compiler service
let input =
  """
  let foo() =
    let msg = "Hello world"
    if true then
      printfn "%s" msg
  """

// File name in Unix format
let file = "/home/user/Test.fsx"

// Get the AST of sample F# code
let tree = getUntypedTree(file, SourceText.ofString input)

When you run the code in F# interactive, you can enter tree;; in the interactive console and see pretty printed representation of the data structure - the tree contains a lot of information, so this is not particularly readable, but it gives you good idea about how the tree looks.

The returned tree value is again a discriminated union that can be two different cases - one case is ParsedInput.SigFile which represents F# signature file (*.fsi) and the other one is ParsedInput.ImplFile representing regular source code (*.fsx or *.fs). The implementation file contains a sequence of modules or namespaces that we can pass to the function implemented in the previous step:

// Extract implementation file details
match tree with
| ParsedInput.ImplFile(implFile) ->
    // Extract declarations and walk over them
    let (ParsedImplFileInput(fn, script, name, _, _, modules, _)) = implFile
    visitModulesAndNamespaces modules
| _ -> failwith "F# Interface file (*.fsi) not supported."

Summary

In this tutorial, we looked at basic of working with the untyped abstract syntax tree. This is a comprehensive topic, so it is not possible to explain everything in a single article. The Fantomas project is a good example of tool based on the untyped AST that can help you understand more. In practice, it is also useful to combine the information here with some information you can obtain from the editor services discussed in the next tutorial.

namespace System
Multiple items
namespace FSharp

--------------------
namespace Microsoft.FSharp
namespace FSharp.Compiler
namespace FSharp.Compiler.SourceCodeServices
Multiple items
namespace FSharp

--------------------
namespace Microsoft.FSharp

--------------------
type FSharpAttribute =
  member Format : context:FSharpDisplayContext -> string
  member AttributeType : FSharpEntity
  member ConstructorArguments : IList<FSharpType * obj>
  member IsUnresolved : bool
  member NamedArguments : IList<FSharpType * string * bool * obj>
namespace FSharp.Compiler.Text
val checker : FSharpChecker
type FSharpChecker =
  member CheckFileInProject : parsed:FSharpParseFileResults * filename:string * fileversion:int * sourceText:ISourceText * options:FSharpProjectOptions * ?textSnapshotInfo:obj * ?userOpName:string -> Async<FSharpCheckFileAnswer>
  member CheckProjectInBackground : options:FSharpProjectOptions * ?userOpName:string -> unit
  member ClearLanguageServiceRootCachesAndCollectAndFinalizeAllTransients : unit -> unit
  member Compile : argv:string [] * ?userOpName:string -> Async<FSharpErrorInfo [] * int>
  member Compile : ast:ParsedInput list * assemblyName:string * outFile:string * dependencies:string list * ?pdbFile:string * ?executable:bool * ?noframework:bool * ?userOpName:string -> Async<FSharpErrorInfo [] * int>
  member CompileToDynamicAssembly : otherFlags:string [] * execute:(TextWriter * TextWriter) option * ?userOpName:string -> Async<FSharpErrorInfo [] * int * Assembly option>
  member CompileToDynamicAssembly : ast:ParsedInput list * assemblyName:string * dependencies:string list * execute:(TextWriter * TextWriter) option * ?debug:bool * ?noframework:bool * ?userOpName:string -> Async<FSharpErrorInfo [] * int * Assembly option>
  member FindBackgroundReferencesInFile : filename:string * options:FSharpProjectOptions * symbol:FSharpSymbol * ?userOpName:string -> Async<seq<range>>
  member GetBackgroundCheckResultsForFileInProject : filename:string * options:FSharpProjectOptions * ?userOpName:string -> Async<FSharpParseFileResults * FSharpCheckFileResults>
  member GetBackgroundParseResultsForFileInProject : filename:string * options:FSharpProjectOptions * ?userOpName:string -> Async<FSharpParseFileResults>
  ...
static member FSharpChecker.Create : ?projectCacheSize:int * ?keepAssemblyContents:bool * ?keepAllBackgroundResolutions:bool * ?legacyReferenceResolver:FSharp.Compiler.ReferenceResolver.Resolver * ?tryGetMetadataSnapshot:FSharp.Compiler.AbstractIL.ILBinaryReader.ILReaderTryGetMetadataSnapshot * ?suggestNamesForErrors:bool * ?keepAllBackgroundSymbolUses:bool * ?enableBackgroundItemKeyStoreAndSemanticClassification:bool -> FSharpChecker
val getUntypedTree : file:string * input:ISourceText -> FSharp.Compiler.SyntaxTree.ParsedInput


 Get untyped tree for a specified input
val file : string
val input : ISourceText
val projOptions : FSharpProjectOptions
val errors : FSharpErrorInfo list
Multiple items
type Async =
  static member AsBeginEnd : computation:('Arg -> Async<'T>) -> ('Arg * AsyncCallback * obj -> IAsyncResult) * (IAsyncResult -> 'T) * (IAsyncResult -> unit)
  static member AwaitEvent : event:IEvent<'Del,'T> * ?cancelAction:(unit -> unit) -> Async<'T> (requires delegate and 'Del :> Delegate)
  static member AwaitIAsyncResult : iar:IAsyncResult * ?millisecondsTimeout:int -> Async<bool>
  static member AwaitTask : task:Task -> Async<unit>
  static member AwaitTask : task:Task<'T> -> Async<'T>
  static member AwaitWaitHandle : waitHandle:WaitHandle * ?millisecondsTimeout:int -> Async<bool>
  static member CancelDefaultToken : unit -> unit
  static member Catch : computation:Async<'T> -> Async<Choice<'T,exn>>
  static member Choice : computations:seq<Async<'T option>> -> Async<'T option>
  static member FromBeginEnd : beginAction:(AsyncCallback * obj -> IAsyncResult) * endAction:(IAsyncResult -> 'T) * ?cancelAction:(unit -> unit) -> Async<'T>
  ...

--------------------
type Async<'T> =
static member Async.RunSynchronously : computation:Async<'T> * ?timeout:int * ?cancellationToken:Threading.CancellationToken -> 'T
val parsingOptions : FSharpParsingOptions
val _errors : FSharpErrorInfo list
val parseFileResults : FSharpParseFileResults
union case Option.Some: Value: 'T -> Option<'T>
val tree : FSharp.Compiler.SyntaxTree.ParsedInput
union case Option.None: Option<'T>
val failwith : message:string -> 'T
module SyntaxTree

from FSharp.Compiler
val visitPattern : _arg1:SynPat -> unit


 Walk over a pattern - this is for example used in
 let <pat> = <expr> or in the 'match' expression
type SynPat =
  | Const of constant: SynConst * range: range
  | Wild of range: range
  | Named of pat: SynPat * ident: Ident * isSelfIdentifier: bool * accessibility: SynAccess option * range: range
  | Typed of pat: SynPat * targetType: SynType * range: range
  | Attrib of pat: SynPat * attributes: SynAttributes * range: range
  | Or of lhsPat: SynPat * rhsPat: SynPat * range: range
  | Ands of pats: SynPat list * range: range
  | LongIdent of longDotId: LongIdentWithDots * extraId: Ident option * typarDecls: SynValTyparDecls option * argPats: SynArgPats * accessibility: SynAccess option * range: range
  | Tuple of isStruct: bool * elementPats: SynPat list * range: range
  | Paren of pat: SynPat * range: range
  ...
    member Range : range
union case SynPat.Wild: range: FSharp.Compiler.Range.range -> SynPat
val printfn : format:Printf.TextWriterFormat<'T> -> 'T
union case SynPat.Named: pat: SynPat * ident: Ident * isSelfIdentifier: bool * accessibility: SynAccess option * range: FSharp.Compiler.Range.range -> SynPat
val pat : SynPat
val name : Ident
union case SynPat.LongIdent: longDotId: LongIdentWithDots * extraId: Ident option * typarDecls: SynValTyparDecls option * argPats: SynArgPats * accessibility: SynAccess option * range: FSharp.Compiler.Range.range -> SynPat
Multiple items
union case LongIdentWithDots.LongIdentWithDots: id: LongIdent * dotms: FSharp.Compiler.Range.range list -> LongIdentWithDots

--------------------
type LongIdentWithDots =
  | LongIdentWithDots of id: LongIdent * dotms: range list
    member Lid : LongIdent
    member Range : range
    member RangeSansAnyExtraDot : range
    member ThereIsAnExtraDotAtTheEnd : bool
val ident : LongIdent
val names : string
Multiple items
type String =
  new : value:char[] -> string + 8 overloads
  member Chars : int -> char
  member Clone : unit -> obj
  member CompareTo : value:obj -> int + 1 overload
  member Contains : value:string -> bool + 3 overloads
  member CopyTo : sourceIndex:int * destination:char[] * destinationIndex:int * count:int -> unit
  member EndsWith : value:string -> bool + 3 overloads
  member EnumerateRunes : unit -> StringRuneEnumerator
  member Equals : obj:obj -> bool + 2 overloads
  member GetEnumerator : unit -> CharEnumerator
  ...

--------------------
String(value: char []) : String
String(value: nativeptr<char>) : String
String(value: nativeptr<sbyte>) : String
String(value: ReadOnlySpan<char>) : String
String(c: char, count: int) : String
String(value: char [], startIndex: int, length: int) : String
String(value: nativeptr<char>, startIndex: int, length: int) : String
String(value: nativeptr<sbyte>, startIndex: int, length: int) : String
String(value: nativeptr<sbyte>, startIndex: int, length: int, enc: Text.Encoding) : String
val concat : sep:string -> strings:seq<string> -> string
val i : Ident
val visitExpression : _arg1:SynExpr -> unit


 Walk over an expression - if expression contains two or three
 sub-expressions (two if the 'else' branch is missing), let expression
 contains pattern and two sub-expressions
type SynExpr =
  | Paren of expr: SynExpr * leftParenRange: range * rightParenRange: range option * range: range
  | Quote of operator: SynExpr * isRaw: bool * quotedExpr: SynExpr * isFromQueryExpression: bool * range: range
  | Const of constant: SynConst * range: range
  | Typed of expr: SynExpr * targetType: SynType * range: range
  | Tuple of isStruct: bool * exprs: SynExpr list * commaRanges: range list * range: range
  | AnonRecd of isStruct: bool * copyInfo: (SynExpr * BlockSeparator) option * recordFields: (Ident * SynExpr) list * range: range
  | ArrayOrList of isList: bool * exprs: SynExpr list * range: range
  | Record of baseInfo: (SynType * SynExpr * range * BlockSeparator option * range) option * copyInfo: (SynExpr * BlockSeparator) option * recordFields: (RecordFieldName * SynExpr option * BlockSeparator option) list * range: range
  | New of isProtected: bool * targetType: SynType * expr: SynExpr * range: range
  | ObjExpr of objType: SynType * argOptions: (SynExpr * Ident option) option * bindings: SynBinding list * extraImpls: SynInterfaceImpl list * newExprRange: range * range: range
  ...
    member IsArbExprAndThusAlreadyReportedError : bool
    member Range : range
    member RangeOfFirstPortion : range
    member RangeSansAnyExtraDot : range
union case SynExpr.IfThenElse: ifExpr: SynExpr * thenExpr: SynExpr * elseExpr: SynExpr option * spIfToThen: DebugPointForBinding * isFromErrorRecovery: bool * ifToThenRange: FSharp.Compiler.Range.range * range: FSharp.Compiler.Range.range -> SynExpr
val cond : SynExpr
val trueBranch : SynExpr
val falseBranchOpt : SynExpr option
module Option

from Microsoft.FSharp.Core
val iter : action:('T -> unit) -> option:'T option -> unit
union case SynExpr.LetOrUse: isRecursive: bool * isUse: bool * bindings: SynBinding list * body: SynExpr * range: FSharp.Compiler.Range.range -> SynExpr
val bindings : SynBinding list
val body : SynExpr
val binding : SynBinding
union case SynBinding.Binding: accessibility: SynAccess option * kind: SynBindingKind * mustInline: bool * isMutable: bool * attributes: SynAttributes * xmlDoc: FSharp.Compiler.XmlDoc.PreXmlDoc * valData: SynValData * headPat: SynPat * returnInfo: SynBindingReturnInfo option * expr: SynExpr * range: FSharp.Compiler.Range.range * seqPoint: DebugPointForBinding -> SynBinding
val access : SynAccess option
val kind : SynBindingKind
val inlin : bool
val mutabl : bool
val attrs : SynAttributes
val xmlDoc : FSharp.Compiler.XmlDoc.PreXmlDoc
val data : SynValData
val retInfo : SynBindingReturnInfo option
val init : SynExpr
val m : FSharp.Compiler.Range.range
val sp : DebugPointForBinding
val expr : SynExpr
val visitDeclarations : decls:seq<SynModuleDecl> -> unit


 Walk over a list of declarations in a module. This is anything
 that you can write as a top-level inside module (let bindings,
 nested modules, type declarations etc.)
val decls : seq<SynModuleDecl>
val declaration : SynModuleDecl
type SynModuleDecl =
  | ModuleAbbrev of ident: Ident * longId: LongIdent * range: range
  | NestedModule of moduleInfo: SynComponentInfo * isRecursive: bool * decls: SynModuleDecl list * isContinuing: bool * range: range
  | Let of isRecursive: bool * bindings: SynBinding list * range: range
  | DoExpr of spInfo: DebugPointForBinding * expr: SynExpr * range: range
  | Types of typeDefns: SynTypeDefn list * range: range
  | Exception of exnDefn: SynExceptionDefn * range: range
  | Open of longDotId: LongIdentWithDots * range: range
  | Attributes of attributes: SynAttributes * range: range
  | HashDirective of hashDirective: ParsedHashDirective * range: range
  | NamespaceFragment of fragment: SynModuleOrNamespace
    member Range : range
union case SynModuleDecl.Let: isRecursive: bool * bindings: SynBinding list * range: FSharp.Compiler.Range.range -> SynModuleDecl
val isRec : bool
val range : FSharp.Compiler.Range.range
val visitModulesAndNamespaces : modulesOrNss:seq<SynModuleOrNamespace> -> unit


 Walk over all module or namespace declarations
 (basically 'module Foo =' or 'namespace Foo.Bar')
 Note that there is one implicitly, even if the file
 does not explicitly define it..
val modulesOrNss : seq<SynModuleOrNamespace>
val moduleOrNs : SynModuleOrNamespace
Multiple items
union case SynModuleOrNamespace.SynModuleOrNamespace: longId: LongIdent * isRecursive: bool * kind: SynModuleOrNamespaceKind * decls: SynModuleDecl list * xmlDoc: FSharp.Compiler.XmlDoc.PreXmlDoc * attribs: SynAttributes * accessibility: SynAccess option * range: FSharp.Compiler.Range.range -> SynModuleOrNamespace

--------------------
type SynModuleOrNamespace =
  | SynModuleOrNamespace of longId: LongIdent * isRecursive: bool * kind: SynModuleOrNamespaceKind * decls: SynModuleDecl list * xmlDoc: PreXmlDoc * attribs: SynAttributes * accessibility: SynAccess option * range: range
    member Range : range
val lid : LongIdent
[<Struct>]
val isMod : SynModuleOrNamespaceKind
val decls : SynModuleDecl list
val xml : FSharp.Compiler.XmlDoc.PreXmlDoc
val input : string
val tree : ParsedInput
val getUntypedTree : file:string * input:ISourceText -> ParsedInput


 Get untyped tree for a specified input
module SourceText

from FSharp.Compiler.Text
val ofString : string -> ISourceText
Multiple items
module ParsedInput

from FSharp.Compiler.SourceCodeServices

--------------------
type ParsedInput =
  | ImplFile of ParsedImplFileInput
  | SigFile of ParsedSigFileInput
    member Range : range
union case ParsedInput.ImplFile: ParsedImplFileInput -> ParsedInput
val implFile : ParsedImplFileInput
Multiple items
union case ParsedImplFileInput.ParsedImplFileInput: fileName: string * isScript: bool * qualifiedNameOfFile: QualifiedNameOfFile * scopedPragmas: ScopedPragma list * hashDirectives: ParsedHashDirective list * modules: SynModuleOrNamespace list * isLastCompiland: bool * bool -> ParsedImplFileInput

--------------------
type ParsedImplFileInput = | ParsedImplFileInput of fileName: string * isScript: bool * qualifiedNameOfFile: QualifiedNameOfFile * scopedPragmas: ScopedPragma list * hashDirectives: ParsedHashDirective list * modules: SynModuleOrNamespace list * isLastCompiland: bool * bool
val fn : string
val script : bool
val name : QualifiedNameOfFile
val modules : SynModuleOrNamespace list