# F# Data: XML Type Provider

This article demonstrates how to use the XML type provider to access XML documents in a statically typed way. We first look at how the structure is inferred and then demonstrate the provider by parsing an RSS feed.

The XML type provider provides statically typed access to XML documents. It takes a sample document as input (or a document whose root XML node contains multiple child nodes that are used as samples). The generated type can then be used to read files with the same structure. If the loaded file does not match the structure of the sample, a runtime error may occur (but only when you access, for example, an element that does not exist).

## Introducing the provider

The type provider is located in the FSharp.Data.dll assembly. Assuming the assembly is located in the ../../../bin directory, we can load it in F# Interactive as follows. Note that we also need a reference to System.Xml.Linq, because the provider uses the XDocument type under the hood:

```fsharp
#r "../../../bin/FSharp.Data.dll"
#r "System.Xml.Linq.dll"
open FSharp.Data
```

### Inferring type from sample

The XmlProvider<...> takes one required static parameter of type string. The parameter can be either a sample XML string or a sample file. A file path can be relative to the current folder, or the file can be accessed online via http or https. In practice, this is unlikely to lead to ambiguities.

The following sample generates a type that can read simple XML documents with a root node containing two attributes:

```fsharp
type Author = XmlProvider<"""<author name="Paul Feyerabend" born="1924" />""">
let sample = Author.Parse("""<author name="Karl Popper" born="1902" />""")

printfn "%s (%d)" sample.Name sample.Born
```

The type provider generates a type Author that has properties corresponding to the attributes of the root element of the XML document. The types of the properties are inferred based on the values in the sample document. In this case, the Name property has a type string and Born is int.
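To make the inference concrete, here is a rough hand-written equivalent that uses plain LINQ to XML (System.Xml.Linq) instead of the provider. It is a sketch of what the provided Name and Born properties amount to. It is not the provider's actual implementation, and it assumes it is run in `dotnet fsi`, where System.Xml.Linq is available by default.

```fsharp
open System.Xml.Linq

// A sample author element in the attribute-based layout.
let authorXml = XElement.Parse("""<author name="Karl Popper" born="1902" />""")

// Look up each attribute by name and convert its text to the inferred type.
let name = authorXml.Attribute(XName.Get "name").Value        // string
let born = int (authorXml.Attribute(XName.Get "born").Value)  // int

printfn "%s (%d)" name born
```

The provider performs these lookups and conversions for you, and it checks the property names at compile time.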

XML is a flexible format, so we could represent the same document differently. Instead of using attributes, we could use nested nodes (<name> and <born> nested under <author>) that directly contain the values:

```fsharp
type AuthorAlt = XmlProvider<"<author><name>Karl Popper</name><born>1902</born></author>">
let doc = "<author><name>Paul Feyerabend</name><born>1924</born></author>"
let sampleAlt = AuthorAlt.Parse(doc)

printfn "%s (%d)" sampleAlt.Name sampleAlt.Born
```

The generated type provides exactly the same API for reading documents that follow this convention. (Note that you cannot use AuthorAlt to parse samples that use the first style. The implementations of the types differ; they just provide the same public API.)
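The following sketch uses plain LINQ to XML rather than the provider. It illustrates why the two layouts are not interchangeable: with child elements, the lookup goes through `Element` instead of `Attribute`, so code written for one layout finds nothing in the other. It assumes `dotnet fsi` and is only an analogy for the generated types, not their real code.

```fsharp
open System.Xml.Linq

// Element-based layout: the values live in child elements, not attributes.
let authorAltXml =
    XElement.Parse("<author><name>Paul Feyerabend</name><born>1924</born></author>")

let altName = authorAltXml.Element(XName.Get "name").Value
let altBorn = int (authorAltXml.Element(XName.Get "born").Value)

// The same child-element lookup applied to the attribute layout returns null.
let attrStyle = XElement.Parse("""<author name="Karl Popper" born="1902" />""")
let missing = attrStyle.Element(XName.Get "name")

printfn "%s (%d)" altName altBorn
printfn "child <name> found in attribute layout: %b" (not (isNull missing))
```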

The provider turns a node into a simply typed property only when the node contains just a primitive value and has no children or attributes.
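The rule above can be restated as a small predicate over LINQ to XML elements. The helper `isSimpleNode` is hypothetical. It expresses the stated rule in code and is not taken from the provider's source.

```fsharp
open System.Xml.Linq

// Hypothetical helper: a node maps to a simply typed property only when
// it has no attributes and no child elements.
let isSimpleNode (e: XElement) =
    not e.HasAttributes && not e.HasElements

let simple = XElement.Parse("<born>1924</born>")
let withAttr = XElement.Parse("""<name full="true">Karl Popper</name>""")
let withChild = XElement.Parse("<author><name>Karl Popper</name></author>")

printfn "%b %b %b" (isSimpleNode simple) (isSimpleNode withAttr) (isSimpleNode withChild)
// true false false
```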

### Types for more complex structure

Now let's look at a number of examples that have more interesting structure. First of all, what if a node contains some value, but also has some attributes?

```fsharp
type Detailed = XmlProvider<"""<author><name full="true">Karl Popper</name></author>""">
let info = Detailed.Parse("""<author><name full="false">Thomas Kuhn</name></author>""")

printfn "%s (full=%b)" info.Name.Value info.Name.Full
```

If the node cannot be represented as a simple type (like string) then the provider builds a new type with multiple properties. Here, it generates a property Full (based on the name of the attribute) and infers its type to be boolean. Then it adds a property with a (special) name Value that returns the content of the element.
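The sketch below shows how such a node could be modelled by hand. The record `NameInfo` is a hypothetical stand-in for the generated Name type. The attribute becomes a typed field, and the element's text becomes `Value`. Plain LINQ to XML is used instead of the provider.

```fsharp
open System
open System.Xml.Linq

// Hypothetical record mirroring the generated type: one field per attribute,
// plus Value for the element's text content.
type NameInfo = { Full: bool; Value: string }

let detailedXml =
    XElement.Parse("""<author><name full="false">Thomas Kuhn</name></author>""")
let nameElem = detailedXml.Element(XName.Get "name")

let kuhn =
    { Full = Boolean.Parse(nameElem.Attribute(XName.Get "full").Value)
      Value = nameElem.Value }

printfn "%s (full=%b)" kuhn.Value kuhn.Full
```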

### Types for multiple simple elements

Another interesting case is when there are multiple nodes that contain just a primitive value. The following example shows what happens when the root node contains multiple <value> nodes. Note that the GetSample method returns the document that was used as the sample for inference, so the same text serves as the runtime value and we do not need to call Parse:

```fsharp
type Test = XmlProvider<"<root><value>1</value><value>3</value></root>">

Test.GetSample().Values
|> Seq.iter (printfn "%d")
```

The type provider generates a property Values that returns an array with the values. Because the <value> nodes do not contain any attributes or children, they are turned into int values, so the Values property returns just int[]!
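A plain LINQ to XML version of the same idea collects every `<value>` child and converts each one to int, which produces an `int[]`. This is an illustrative sketch, not the provider's code.

```fsharp
open System.Xml.Linq

let root = XElement.Parse("<root><value>1</value><value>3</value></root>")

// Each <value> is a simple node, so it is read as a plain int,
// and the repeated elements become an array.
let values : int[] =
    root.Elements(XName.Get "value")
    |> Seq.map (fun e -> int e.Value)
    |> Seq.toArray

values |> Seq.iter (printfn "%d")
```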

## Processing philosophers

In this section we look at an example that demonstrates how the type provider works on a simple document that lists authors that write about a specific topic. The sample document data/Writers.xml looks as follows:

```xml
<authors topic="Philosophy of Science">
  <author name="Paul Feyerabend" born="1924" />
  <author name="Thomas Kuhn" />
</authors>
```

At runtime, we use the generated type provider to parse the following string. It has the same structure as the sample document, except that one of the author nodes also contains a died attribute:

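```fsharp
let authors = """
  <authors topic="Philosophy of Mathematics">
    <author name="Bertrand Russell" />
    <author name="Ludwig Wittgenstein" born="1889" />
    <author name="Alfred North Whitehead" died="1947" />
  </authors> """
```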

When initializing the XmlProvider, we can pass it a file name or a web URL. The Load and AsyncLoad methods allow reading the data from a file or from a web resource. The Parse method takes the data as a string, so we can now print the information as follows:

```fsharp
type Authors = XmlProvider<"../data/Writers.xml">
let topic = Authors.Parse(authors)

printfn "%s" topic.Topic
for author in topic.Authors do
  printf " - %s" author.Name
  author.Born |> Option.iter (printf " (%d)")
  printfn ""
```

The value topic has a property Topic (of type string) which returns the value of the attribute with the same name. It also has a property Authors that returns an array with all the authors. The Born property is missing for some authors, so it becomes option<int> and we need to print it using Option.iter.
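The optional Born property can be mimicked with plain LINQ to XML by turning a missing attribute (which comes back as null) into None. The helper `tryAttr` is hypothetical and written only for this sketch. The document is a copy of the Writers.xml sample shown above.

```fsharp
open System.Xml.Linq

let writers =
    XElement.Parse("""<authors topic="Philosophy of Science">
  <author name="Paul Feyerabend" born="1924" />
  <author name="Thomas Kuhn" />
</authors>""")

// Hypothetical helper: Attribute returns null when the attribute is absent,
// and Option.ofObj maps that null to None.
let tryAttr (name: string) (e: XElement) =
    e.Attribute(XName.Get name) |> Option.ofObj |> Option.map (fun a -> a.Value)

let bornYears =
    [ for a in writers.Elements(XName.Get "author") do
        let name = a.Attribute(XName.Get "name").Value
        let born = tryAttr "born" a |> Option.map int
        yield name, born ]

for (name, born) in bornYears do
    printf " - %s" name
    born |> Option.iter (printf " (%d)")
    printfn ""
```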

The died attribute was not present in the sample used for the inference, so we cannot obtain it in a statically typed way (although it can still be obtained dynamically using author.XElement.Attribute(XName.Get("died"))).
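Reading such an attribute dynamically comes down to an ordinary LINQ to XML lookup on the underlying element. In the sketch below, a freshly parsed XElement stands in for `author.XElement`. The lookup is wrapped in an option so that a missing `died` attribute becomes None rather than null.

```fsharp
open System.Xml.Linq

let withDied = XElement.Parse("""<author name="Alfred North Whitehead" died="1947" />""")
let withoutDied = XElement.Parse("""<author name="Bertrand Russell" />""")

// Dynamic lookup: nothing here is checked at compile time, so a typo in
// "died" would only show up as None at runtime.
let tryDied (e: XElement) =
    e.Attribute(XName.Get "died")
    |> Option.ofObj
    |> Option.map (fun a -> int a.Value)

printfn "%A %A" (tryDied withDied) (tryDied withoutDied)
```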

## Global inference mode

In the examples shown earlier, an element was never (recursively) contained in an element of the same name (for example, <author> never contained another <author>). However, when we work with documents such as XHTML files, this can often be the case. Consider, for example, the following sample (a simplified version of data/HtmlBody.xml):

```xml
<div id="root">
  <span>Main text</span>
  <div id="first">
    <div>Second text</div>
  </div>
</div>
```

Here, a <div> element can contain other <div> elements and it is quite clear that they should all have the same type - we want to be able to write a recursive function that processes <div> elements. To make this possible, you need to set an optional parameter Global to true:

```fsharp
type Html = XmlProvider<"../data/HtmlBody.xml", Global=true>
let html = Html.GetSample()
```

When the Global parameter is true, the type provider unifies all elements of the same name. This means that all <div> elements have the same type, with a union of all attributes and all possible child nodes that appear in the sample document.
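The following hypothetical sketch shows what "unifying all elements of the same name" means. It groups every element in the sample by name and takes the union of the attribute names and child element names seen across the whole document. It uses plain LINQ to XML and only illustrates the idea. It does not reproduce how the provider actually performs inference.

```fsharp
open System.Xml.Linq

let body =
    XElement.Parse("""<div id="root">
  <span>Main text</span>
  <div id="first">
    <div>Second text</div>
  </div>
</div>""")

// For each element name: (union of attribute names, union of child element names).
let shapes =
    body.DescendantsAndSelf()
    |> Seq.groupBy (fun e -> e.Name.LocalName)
    |> Seq.map (fun (name, elems) ->
        let attrs =
            elems |> Seq.collect (fun e -> e.Attributes()) |> Seq.map (fun a -> a.Name.LocalName) |> Set.ofSeq
        let children =
            elems |> Seq.collect (fun e -> e.Elements()) |> Seq.map (fun c -> c.Name.LocalName) |> Set.ofSeq
        name, (attrs, children))
    |> Map.ofSeq

for KeyValue(name, (attrs, children)) in shapes do
    printfn "%s: attributes=%A children=%A" name (Set.toList attrs) (Set.toList children)
```

All three `<div>` elements collapse into one shape with an `id` attribute and `span` and `div` children. This is why a single recursive Html.Div type is possible.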

The type is nested under the type Html, so we can write a printDiv function that takes Html.Div and acts as follows:

```fsharp
/// Prints the content of a <div> element
let rec printDiv (div:Html.Div) =
  div.Spans |> Seq.iter (printfn "%s")
  div.Divs |> Seq.iter printDiv
  if div.Spans.Length = 0 && div.Divs.Length = 0 then
      div.Value |> Option.iter (printfn "%s")

// Print the root <div> element with all children
printDiv html
```

The function first prints all text included as <span> (the element never has any attributes in our sample, so it is inferred as string), then it recursively prints the content of all <div> elements. If the element does not contain nested elements, then we print the Value (inner text).
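For comparison, here is the same traversal written against raw XElement values instead of the provided Html.Div type. It is a sketch that returns the lines as a list rather than printing them, so the result is easy to inspect. The function name `divText` is hypothetical.

```fsharp
open System
open System.Xml.Linq

let body =
    XElement.Parse("""<div id="root">
  <span>Main text</span>
  <div id="first">
    <div>Second text</div>
  </div>
</div>""")

// Same order as printDiv: spans first, then nested divs recursively;
// a <div> with neither falls back to its inner text (if any).
let rec divText (div: XElement) : string list =
    let spans = [ for s in div.Elements(XName.Get "span") -> s.Value ]
    let divs = [ for d in div.Elements(XName.Get "div") -> d ]
    if spans.IsEmpty && divs.IsEmpty && not (String.IsNullOrWhiteSpace div.Value) then
        [ div.Value ]
    else
        spans @ List.collect divText divs

divText body |> List.iter (printfn "%s")
```

Without the generated type, the element names are plain strings, so a misspelled `"span"` would compile and simply return nothing.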

To conclude this introduction with a more interesting example, let's look at how to parse an RSS feed. As discussed earlier, we can use relative paths or web addresses when calling the type provider:

```fsharp
type Rss = XmlProvider<"http://tomasp.net/blog/rss.aspx">
```

This code builds a type Rss that represents RSS feeds (with the features that are used on http://tomasp.net). The type Rss provides the static methods Parse, Load and AsyncLoad to construct it. Here, we just want to reuse the same URI as the schema, so we use the GetSample static method:

```fsharp
let blog = Rss.GetSample()
```

Printing the title of the RSS feed together with a list of recent posts is now quite easy - you can simply type blog followed by . and see what the autocompletion offers. The code looks like this:

```fsharp
// Title is a property returning string
printfn "%s" blog.Channel.Title

// Get all item nodes and print title with link
for item in blog.Channel.Items do
  printfn " - %s (%s)" item.Title item.Link
```
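To see what the provided Channel and Items properties navigate, here is an offline sketch that uses plain LINQ to XML. The feed below is a tiny made-up RSS 2.0 document with placeholder example.com links, not the real tomasp.net feed. It lets the example run without network access.

```fsharp
open System.Xml.Linq

let rss =
    XElement.Parse("""<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>""")

// <rss> -> <channel> -> <title> and the repeated <item> elements.
let channel = rss.Element(XName.Get "channel")
let title = channel.Element(XName.Get "title").Value
let items =
    [ for item in channel.Elements(XName.Get "item") ->
        (item.Element(XName.Get "title").Value, item.Element(XName.Get "link").Value) ]

printfn "%s" title
for (t, link) in items do
    printfn " - %s (%s)" t link
```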

## Transforming XML

In this example, we will also create XML in addition to consuming it. Consider the problem of flattening a data set. Let's say you have XML data that looks like this:

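```fsharp
[<Literal>]
let customersXmlSample = """
  <Customers>
    <Customer name="ACME">
      <Order Number="A012345">
        <OrderLine Item="widget" Quantity="1"/>
      </Order>
      <Order Number="A012346">
        <OrderLine Item="trinket" Quantity="2"/>
      </Order>
    </Customer>
    <Customer name="Southwind">
      <Order Number="A012347">
        <OrderLine Item="skyhook" Quantity="3"/>
        <OrderLine Item="gizmo" Quantity="4"/>
      </Order>
    </Customer>
  </Customers>"""
```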

and you want to transform it into something like this:

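```fsharp
[<Literal>]
let orderLinesXmlSample = """
  <OrderLines>
    <OrderLine Customer="ACME" Order="A012345" Item="widget" Quantity="1"/>
    <OrderLine Customer="ACME" Order="A012346" Item="trinket" Quantity="2"/>
    <OrderLine Customer="Southwind" Order="A012347" Item="skyhook" Quantity="3"/>
    <OrderLine Customer="Southwind" Order="A012347" Item="gizmo" Quantity="4"/>
  </OrderLines>"""
```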

We'll create types from both the input and output samples and use the constructors on the types generated by the XmlProvider:

```fsharp
type InputXml = XmlProvider<customersXmlSample>
type OutputXml = XmlProvider<orderLinesXmlSample>

let orderLines =
  OutputXml.OrderLines
    [| for customer in InputXml.GetSample().Customers do
         for order in customer.Orders do
           for line in order.OrderLines do
             yield OutputXml.OrderLine
                     (customer.Name, order.Number, line.Item, line.Quantity) |]
```
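For comparison, the same flattening can be written with plain LINQ to XML. The sketch below parses a Customers document that matches the OrderLines output sample. It walks Customer, then Order, then OrderLine, and builds one flat `<OrderLine>` per line by hand. Nothing here is generated, so the element and attribute names are ordinary strings that the compiler cannot check.

```fsharp
open System.Xml.Linq

let xn (s: string) = XName.Get s

let customers =
    XElement.Parse("""<Customers>
  <Customer name="ACME">
    <Order Number="A012345"><OrderLine Item="widget" Quantity="1"/></Order>
    <Order Number="A012346"><OrderLine Item="trinket" Quantity="2"/></Order>
  </Customer>
  <Customer name="Southwind">
    <Order Number="A012347">
      <OrderLine Item="skyhook" Quantity="3"/>
      <OrderLine Item="gizmo" Quantity="4"/>
    </Order>
  </Customer>
</Customers>""")

// Copy the customer name and order number down onto every order line.
let flatLines =
    [ for customer in customers.Elements(xn "Customer") do
        for order in customer.Elements(xn "Order") do
          for line in order.Elements(xn "OrderLine") do
            yield XElement(xn "OrderLine",
                           XAttribute(xn "Customer", customer.Attribute(xn "name").Value),
                           XAttribute(xn "Order", order.Attribute(xn "Number").Value),
                           XAttribute(xn "Item", line.Attribute(xn "Item").Value),
                           XAttribute(xn "Quantity", int (line.Attribute(xn "Quantity").Value))) ]

let flattened = XElement(xn "OrderLines", flatLines)

printfn "%s" (flattened.ToString())
```

The provider-based version replaces every string lookup with a checked property and every hand-built XElement with a generated constructor.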
