Processing large inputs without stack overflows
The compiler accepts large inputs such as:
- Large literals, such as let str = "a1" + "a2" + ... + "a1000"
- Large array expressions
- Large list expressions
- Long lists of sequential expressions
- Long lists of bindings, such as let v1 = e1 in let v2 = e2 in ....
- Long sequences of if .. then ... else expressions
- Long sequences of match x with ... | ... expressions
- Combinations of these
The compiler performs constant folding for large constants, so there are no costs to using them at runtime. However, constant folding is limited by the machine's stack size when compiling, which can lead to StackOverflow exceptions if those constants are very large. The same can be observed for certain kinds of array, list, or sequence expressions. This appears to be more prominent when compiling on macOS because macOS has a smaller stack size.
Many sources of StackOverflow exceptions prior to F# 4.7 when processing these kinds of constructs were resolved by processing them on the heap via continuation passing techniques. This avoids filling data on the stack and appears to have negligible effects on overall throughput or memory usage of the compiler.
There are two techniques to deal with this:
- Linearizing processing of specific input shapes, keeping stacks small
- Using stack guards to temporarily move processing to a new thread when a certain recursion depth threshold is reached
Linearizing processing of certain inputs
Aside from array expressions, most of the previously-listed inputs are called "linear" expressions. This means that there is a single linear hole in the shape of expressions. For example:
- expr :: HOLE (list expressions or other right-linear constructions)
- expr; HOLE (sequential expressions)
- let v = expr in HOLE (let expressions)
- if expr then expr else HOLE (conditional expression)
- match expr with pat[vs] -> e1[vs] | pat2 -> HOLE (for example, match expr with Some x -> ... | None -> ...)
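To illustrate why these shapes matter, the following minimal sketch builds a long chain of let bindings on a toy expression type and measures how deeply the chain nests along the HOLE position. The Expr type, buildLets, and holeDepth are hypothetical names invented for this illustration; they are not the compiler's TypedTree or any compiler API.

```fsharp
// Toy expression type for illustration only (not the compiler's TypedTree).
type Expr =
    | Const of int
    | Let of string * Expr * Expr   // let v = rhs in HOLE

/// Builds "let v = 0 in let v = 0 in ... in 0" with n bindings, using a loop.
let buildLets n =
    let mutable e = Const 0
    for _ in 1 .. n do
        e <- Let ("v", Const 0, e)
    e

/// Walks only the HOLE position. The recursive call is in tail position,
/// so no stack frames accumulate.
let rec holeDepth expr acc =
    match expr with
    | Let (_, _, body) -> holeDepth body (acc + 1)
    | Const _ -> acc

let depth = holeDepth (buildLets 100000) 0
printfn "nesting depth along the hole: %d" depth
```

Each binding is small, yet the expression nests one level deeper per binding. A traversal that recurses into the body without a tailcall would need one stack frame per binding.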
Processing these constructs with continuation passing is more difficult than a more "natural" approach that would use the stack.
Consider the following contrived example:
and remapLinearExpr g compgen tmenv expr contf =
    match expr with
    | Expr.Let (bind, bodyExpr, m, _) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenvinner bodyExpr (contf << (fun bodyExpr' ->
            ...))
    | Expr.Sequential (expr1, expr2, dir, spSeq, m) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenv expr2 (contf << (fun expr2' ->
            ...))
    | LinearMatchExpr (spBind, exprm, dtree, tg1, expr2, sp2, m2, ty) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenv expr2 (contf << (fun expr2' -> ...))
    | LinearOpExpr (op, tyargs, argsFront, argLast, m) ->
        ...
        // tailcall for the linear position
        remapLinearExpr g compgen tmenv argLast (contf << (fun argLast' -> ...))
    // non-linear case: process normally and pass the result to the continuation
    | _ -> contf (remapExpr g compgen tmenv expr)

and remapExpr (g: TcGlobals) (compgen: ValCopyFlag) (tmenv: Remap) expr =
    match expr with
    ...
    | LinearOpExpr _
    | LinearMatchExpr _
    | Expr.Sequential _
    | Expr.Let _ -> remapLinearExpr g compgen tmenv expr (fun x -> x)
The remapExpr operation becomes two functions, remapExpr (for non-linear cases) and remapLinearExpr (for linear cases). remapLinearExpr uses tailcalls for constructs in the HOLE positions mentioned previously, passing the result to the continuation.
Some common aspects of this style of programming are:
- The tell-tale use of contf (continuation function)
- The processing of the body expression e of a let-expression is tail-recursive, if the next construct is also a let-expression
- The processing of the e2 expression of a sequential-expression is tail-recursive
- The processing of the second expression in a cons is tail-recursive
The previous example is considered incomplete, because arbitrary combinations of let and sequential expressions aren't going to be dealt with in a tail-recursive way. The compiler generally tries to handle these combinations tail-recursively as well.
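To make the style concrete, here is a minimal, self-contained sketch of the same split on a toy expression type. Everything in it (the Expr type, incrExpr, incrLinear, buildLets, countOnes) is hypothetical and invented for this illustration; it is not compiler code. It also assumes tail calls are enabled, which is typically the case for optimized builds, so that invoking the composed continuation does not grow the stack.

```fsharp
// Toy expression type for illustration only (not the compiler's TypedTree).
type Expr =
    | Const of int
    | Add of Expr * Expr           // non-linear: ordinary recursion on both sides
    | Let of string * Expr * Expr  // linear: the body is the HOLE
    | Sequential of Expr * Expr    // linear: the second expression is the HOLE

/// Hypothetical rewrite that adds 1 to every constant (linear cases).
let rec incrLinear expr (contf: Expr -> Expr) : Expr =
    match expr with
    | Let (v, rhs, body) ->
        let rhs' = incrExpr rhs
        // tailcall for the linear position
        incrLinear body (contf << (fun body' -> Let (v, rhs', body')))
    | Sequential (e1, e2) ->
        let e1' = incrExpr e1
        // tailcall for the linear position
        incrLinear e2 (contf << (fun e2' -> Sequential (e1', e2')))
    | _ -> contf (incrExpr expr)

/// The same rewrite for non-linear cases; linear cases are delegated.
and incrExpr expr =
    match expr with
    | Const n -> Const (n + 1)
    | Add (a, b) -> Add (incrExpr a, incrExpr b)
    | Let _
    | Sequential _ -> incrLinear expr id

/// Builds "let v = 0 in let v = 0 in ... in 0" with n bindings, using a loop.
let buildLets n =
    let mutable e = Const 0
    for _ in 1 .. n do
        e <- Let ("v", Const 0, e)
    e

/// Counts rewritten constants along the let chain (tail-recursive).
let rec countOnes expr acc =
    match expr with
    | Let (_, Const 1, body) -> countOnes body (acc + 1)
    | Const 1 -> acc + 1
    | _ -> acc

let ones = countOnes (incrExpr (buildLets 100000)) 0
printfn "constants rewritten along the chain: %d" ones
```

The rebuilt nodes are held in closures on the heap rather than in stack frames, which is why a long chain of bindings can be processed without deep recursion. Because Let and Sequential are both handled inside incrLinear, this toy version also copes with mixtures of the two in the HOLE position.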
Stack Guards
The StackGuard type counts the depth of synchronous recursive processing and moves processing to a new thread if a limit is reached. Compilation globals are re-installed on the new thread. Sample:
let TcStackGuardDepth = StackGuard.GetDepthOption "Tc"
...
    stackGuard = StackGuard(TcStackGuardDepth)
let rec ....
and TcExpr cenv ty (env: TcEnv) tpenv (expr: SynExpr) =
    // Guard the stack for deeply nested expressions
    cenv.stackGuard.Guard <| fun () ->
    ...
Note that a guarded call is not a tailcall, because a counter must be decremented after the call returns, so the guard will appear in the stack frames of recursive processing. Stack guards are used systematically for recursive processing of:
- SyntaxTree SynExpr
- TypedTree Expr
We don't use it for other inputs.
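The idea behind a stack guard can be sketched in a few lines. ToyStackGuard, the toy Expr type, buildNested, and depthOf below are hypothetical names invented for this illustration. The compiler's actual StackGuard differs in detail; for example, it also re-installs compilation globals, which this sketch does not attempt.

```fsharp
open System.Threading

/// Hypothetical, simplified guard: counts nested Guard calls and, every
/// maxDepth levels, runs the guarded function on a fresh thread (fresh stack).
type ToyStackGuard(maxDepth: int) =
    let mutable depth = 0

    member _.Guard (f: unit -> 'T) : 'T =
        depth <- depth + 1
        try
            if depth % maxDepth = 0 then
                let outcome : Result<'T, exn> option ref = ref None
                let worker =
                    Thread(ThreadStart(fun () ->
                        outcome.Value <-
                            try Some (Ok (f ()))
                            with e -> Some (Error e)))
                worker.Start()
                worker.Join() // the caller blocks, so processing stays synchronous
                match outcome.Value with
                | Some (Ok v) -> v
                | Some (Error e) -> raise e
                | None -> failwith "worker thread produced no result"
            else
                f ()
        finally
            // This decrement runs after the call, which is why the guarded
            // call cannot be a tailcall.
            depth <- depth - 1

// Toy input for illustration only.
type Expr =
    | Leaf
    | Wrap of Expr

let buildNested n =
    let mutable e = Leaf
    for _ in 1 .. n do
        e <- Wrap e
    e

let guard = ToyStackGuard(500)

/// Deliberately non-tail-recursive; the guard bounds the depth per thread.
let rec depthOf expr : int =
    guard.Guard (fun () ->
        match expr with
        | Leaf -> 0
        | Wrap inner -> 1 + depthOf inner)

let measured = depthOf (buildNested 50000)
printfn "measured depth: %d" measured
```

With a threshold of 500, each thread only ever holds a few hundred levels of recursion, at the price of one extra thread per 500 levels. The threshold here is an arbitrary choice for the sketch, not a value taken from the compiler.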