Compiling Equality
This spec covers how equality is compiled and executed by the F# compiler and library, based mainly on the types involved in the equality operation after all inlining, type specialization and other optimizations have been applied.
What do we mean by an equality operation?
This spec is about the semantics and performance of the following coding constructs
a = b
a <> b
It is also about the semantics and performance of uses of the following FSharp.Core
constructs which, after inlining, generate code that contains an equality check at the specific EQTYPE
* HashIdentity.Structural<'T>
* {Array,Seq,List}.contains
* {Array,Seq,List}.countBy
* {Array,Seq,List}.groupBy
* {Array,Seq,List}.distinct
* {Array,Seq,List}.distinctBy
* {Array,Seq,List}.except
All of which have implied equality checks. Some of these operations are inlined, see below, which in turn affects the semantics and performance of the overall operation.
ER vs PER equality
In math, a (binary) relation is a way to describe a relationship between the elements of sets. "Greater than" is a relation for numbers, "Subset of" is a relation for sets.
Here we talk about 3 particular relations:
1) Reflexivity - every element is related to itself
- For integers, =
is reflexive (a = a
is always true) and >
is not (a > a
is never true)
2) Symmetry - if a
is related to b
, then b
is related to a
- For integers, =
is symmetric (a = b
-> b = a
) and >
is not (if a > b
then b > a
is false)
3) Transitivity - if a
is related to b
, and b
is related to c
, then a
is also related c
- For integers, >
is transitive (a > b
&& b > c
-> a > c
) and √
is not (a = √b
&& b = √c
doesn't mean a = √c
)
If a relation has 1, 2, and 3, we talk about Equivalence Relation (ER). If a relation only has 2 and 3, we talk about Partial Equivalence Relation (PER).
This matters in comparing floats since they include NaN. Depending on if we consider NaN = NaN
true or false, we talk about ER or PER comparison respectively.
What is the type known to the compiler and library for an equality operation?
The static type known to the F# compiler is crucial to determining the performance of the operation. The runtime type of the equality check is also significant in some situations.
Here we define the relevant static type EQTYPE
for the different constructs above:
Basics
a = b
:EQTYPE
is the statically known type ofa
orb
a <> b
:EQTYPE
is the statically known type ofa
orb
Inlined constructs
HashIdentity.Structural<'T>
,EQTYPE
is the inlined'T
(results in specialized equality)Array.contains<'T>
,EQTYPE
is the inlined'T
(results in specialized equality)List.contains<T>
likewiseSeq.contains<T>
likewise
These only result in naked generic equality if themselves used from a non-inlined generic context.
Non-inlined constructs always resulting in naked generic equality
Array.groupBy<'Key, 'T> f array
,EQTYPE
is non-inlined'Key
, results in naked generic equalityArray.countBy array
likewise for'T
Array.distinct<'T> array
likewiseArray.distinctBy array
likewiseArray.except array
likewiseList.groupBy
likewiseList.countBy
likewiseList.distinct
likewiseList.distinctBy
likewiseList.except
likewiseSeq.groupBy
likewiseSeq.countBy
likewiseSeq.distinct
likewiseSeq.distinctBy
likewiseSeq.except
likewise
These always result in naked generic equality checks.
Example 1:
let x = HashIdentity.Structural<byte> // EQTYPE known to compiler is `byte`
Example 2 (a non-inlined "naked" generic context):
let f2<'T> () =
... some long code
// EQTYPE known to the compiler is `'T`
// RUNTIME-EQTYPE known to the library is `byte`
let x = HashIdentity.Structural<'T>
... some long code
f2<byte>() // performance of this is determined by EQTYPE<'T> and RUNTIME-EQTYPE<byte>
Example 3 (an inlined generic context):
let f3<'T> () =
... some long code
// EQTYPE known to the compiler is `byte`
// RUNTIME-EQTYPE known to the library is `byte`
let x = HashIdentity.Structural<'T>
... some long code
f3<byte>() // performance of this is determined by EQTYPE<byte> and RUNTIME-EQTYPE<byte>
Example 4 (a generic struct type in a non-inline generic context):
let f4<'T> () =
... some long code
// EQTYPE known to the compiler is `SomeStructType<'T>`
// RUNTIME-EQTYPE known to the library is `SomeStructType<byte>`
let x = HashIdentity.Structural<SomeStructType<'T>>
... some long code
f4<byte>() // performance of this determined by EQTYPE<SomeStructType<'T>> and RUNTIME-EQTYPE<SomeStructType<byte>>
How we compile equality "a = b"
This very much depends on the EQTYPE
involved in the equality as known by the compiler
Aim here is to flesh these all out with: * Semantics: what semantics the user expects, and what the semantics actually is * Perf expectation: what perf the user expects * Compilation today: How we actually compile today * Perf today: What is the perf we achieve today * (Optional) sharplab.io link to how things are in whatever version is selected in sharplab * (Optional) notes
primitive integer types (int32
, int64
, ...)
let f (x: int) (y: int) = (x = y)
- Semantics: equality on primitive
- Perf: User expects full performance down to native
- Compilation today: compiles to IL instruction ✅
- Perf today: good ✅
- sharplab int32
primitive floating point types (float32
, float64
)
let f (x: float32) (y: float32) = (x = y)
- Semantics: IEEE floating point equality (respecting NaN etc.)
- Perf: User expects full performance down to native
- Compilation today: compiles to IL instruction ✅
- Perf today: good ✅
- sharplab float32
primitive string
, decimal
- Semantics: .NET equivalent equality, non-localized for strings
- Perf: User expects full performance down to native
- Compilation today: compiles to
String.Equals
orDecimal.op_Equality
call ✅ - Perf today: good ✅
- sharplab decimal
- sharplab string
reference tuple type (size <= 5)
- Semantics: User expects structural
- Perf: User expects flattening to constituent checks
- Compilation today: tuple equality is flattened to constituent checks ✅
- Perf today: good ✅
- sharplab (int * double * 'T), with example reductions/optimizations noted
reference tuple type (size > 5)
- Semantics: User expects structural
- Perf: User expects flattening to constituent checks
- Compilation today: not flattened, compiled to
GenericEqualityIntrinsic
- Perf today: the check does type tests, does virtual calls via
IStructuralEqualityComparer
, boxes etc. ❌(Problem1) - sharplab for size 6
struct tuple type
- Semantics: User expects structural
- Perf: User expects flattening to constituent checks or at least the same optimizations as tuples
- Compilation today: compiled to
GenericEqualityIntrinsic
- Perf today: boxes, does type tests, does virtual calls via
IStructuralEqualityComparer
etc. ❌(Problem2) - sharplab for size 3
C# or F# enum type
- Semantics: User expects identical to equality on the underlying type
- Perf: User expects same perf as flattening to underlying type
- Compilation today: flattens to underlying type
- Perf today: good ✅
- sharplab for C# enum int
- sharplab for F# enum int
C# struct type
- Semantics: User expects call to
IEquatable<T>
if present, but F# spec says callthis.Equals(box that)
, in practice these are the same - Perf expected: no boxing
- Compilation today: EqualityComparer
.Default - Perf today: good ✅
- sharplab
F# struct type (records, tuples - with compiler-generated structural equality)
- Semantics: User expects field-by-field structural equality
- Perf expected: no boxing
- Compilation today:
GenericEqualityIntrinsic<SomeStructType>
- Perf today: good ✅
- sharplab
array type (byte[], int[], some-struct-type[], ...)
- Semantics: User expects structural
- Perf expected: User expects perf is sum of constituent parts
- Compilation today: either
FSharpEqualityComparer_PER`1<uint8[]>::get_EqualityComparer().Equals(...)
orFSharpEqualityComparer_PER`1<T[]>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
- sharplab for
byte[]
F# large reference record/union type
Here "large" means the compiler-generated structural equality is NOT inlined.
- Semantics: User expects structural by default
- Perf expected: User expects perf is sum of constituent parts, type-specialized if generic
- Compilation today: direct call to
Equals(T)
- Perf today: the call to
Equals(T)
has specialized code but boxes fields if struct or generic, see Problem3 ❌, Problem4 ❌
F# tiny reference (anonymous) record or union type
Here "tiny" means the compiler-generated structural equality IS inlined.
- Semantics: User expects structural by default
- Perf expected: User expects perf is sum of constituent parts, type-specialized if generic
- Compilation today:
FSharpEqualityComparer_ER`1<!a>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
Generic 'T
in non-inlined generic code
- Semantics: User expects the PER equality semantics of whatever
'T
actually is - Perf expected: User expects no boxing
- Compilation today:
FSharpEqualityComparer_ER`1<!a>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
Generic 'T
in recursive position in structural comparison
This case happens in structural equality for tuple types and other structural types
- Semantics: User expects the PER equality semantics of whatever
'T
actually is - Perf: User expects no boxing
- Compilation today:
FSharpEqualityComparer_ER`1<!a>::get_EqualityComparer().Equals(...)
- Perf today: good ✅
- Sharplab
Techniques available to us
- Flatten and inline
- RCG: Use reflective code generation internally in FSharp.Core
- KFS: Rely on known semantics of F# structural types and treat those as special
- TS: Hand-code type-specializations using static optimization conditions in FSharp.Core
- TT: Type-indexed tables of baked (poss by reflection) equality comparers and functions, where some pre-computation is done
- DV: De-virtualization
- DEQ: Use
EqualityComparer<'T>.Default
where possible
Notes on previous attempts to improve things
#5112
- Uses TT, DEQ, KFS, DV
- Focuses on solving Problem4
- 99% not breaking, apart from the case of value types with custom equality implemented differently than the
EqualityComparer.Default
- the change would lead to the usage of the custom implementation which is reasonable
Note: this included changes to the optimizer to reduce GenericEqualityIntrinsic down to a type-indexed table lookup fetching an IEqualityComparer
and calling it. These hand-coded reductions appear unnecessary as the reduction doesn't open up any further optimizations.
val byte: value: 'T -> byte (requires member op_Explicit)
--------------------
type byte = System.Byte
--------------------
type byte<'Measure> = byte
val int: value: 'T -> int (requires member op_Explicit)
--------------------
type int = int32
--------------------
type int<'Measure> = int
val float32: value: 'T -> float32 (requires member op_Explicit)
--------------------
type float32 = System.Single
--------------------
type float32<'Measure> = float32