F# Data: WorldBank Provider

The World Bank is an international organization that provides financial and technical assistance to developing countries around the world. As one of its activities, the World Bank also collects development indicators and other data about countries around the world. Its data catalog contains over 8,000 indicators that can be accessed programmatically.

The WorldBank Type Provider makes the WorldBank data easily accessible to F# programs and scripts in a type-safe manner. This article provides an introduction. The type provider is also used on the Try F# web site in the "Data Science" tutorial, so you can find more examples there.

Introducing the provider

The following example loads the FSharp.Data.dll library (in F# Interactive), initializes a connection to the WorldBank using the GetDataContext method, and then retrieves the most recent tertiary gross enrolment ratio for the UK, an indicator of how much of the population attends university:

#r "../../../bin/FSharp.Data.dll"
open FSharp.Data

let data = WorldBankData.GetDataContext()

data
  .Countries.``United Kingdom``
  .Indicators.``Gross enrolment ratio, tertiary, both sexes (%)``
|> Seq.maxBy fst

When generating the data context, the WorldBank Type Provider retrieves the list of all countries known to the WorldBank and the list of all supported indicators. Both of these dimensions are provided as properties, so you can use autocomplete to discover the available data sources. Most of the indicators have long names containing spaces and punctuation, so we need to wrap the name in double backticks (``).
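
The double-backtick syntax is a general F# feature rather than something specific to the type provider. As a minimal sketch, the hypothetical record below (SampleIndicators and its values are made up for illustration and are not part of the library) declares and reads members whose names contain spaces and punctuation:

```fsharp
// Hypothetical record standing in for the provider-generated properties.
type SampleIndicators =
    { ``Population, total``: float
      ``Gross enrolment ratio, tertiary, both sexes (%)``: float }

// Illustrative values only, not real World Bank figures.
let sample =
    { ``Population, total`` = 1000.0
      ``Gross enrolment ratio, tertiary, both sexes (%)`` = 50.0 }

// Members with spaces or punctuation are read using the same quoting.
let ratio = sample.``Gross enrolment ratio, tertiary, both sexes (%)``
printfn "%.1f" ratio
```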

The result of the Gross enrolment ratio, tertiary, both sexes (%) property is a sequence with values for different years. Using Seq.maxBy fst we get the most recent available value.
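
To see why Seq.maxBy fst picks the most recent value, here is a small sketch on a hand-written sequence of (year, value) pairs. The numbers are invented for illustration and the tryLatest helper is hypothetical; it only shows one way to guard against an indicator that has no data, since Seq.maxBy raises an exception on an empty sequence.

```fsharp
// Illustrative (year, value) pairs; the numbers are made up, not World Bank data.
let sampleSeries = [ (2009, 57.0); (2011, 59.0); (2010, 58.0) ]

// fst projects the year, so maxBy returns the pair with the largest year,
// regardless of the order in which the pairs appear.
let latestYear, latestValue = sampleSeries |> Seq.maxBy fst

// Hypothetical helper: returns None instead of throwing when there is no data.
let tryLatest (series: seq<int * float>) =
    if Seq.isEmpty series then None
    else Some (Seq.maxBy fst series)

printfn "%d: %.1f" latestYear latestValue
```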

Charting World Bank data

We can easily see how the university enrollment changes over time by using the FSharp.Charting library and plotting the data:

#load "../../../packages/FSharp.Charting/lib/net45/FSharp.Charting.fsx"
open FSharp.Charting

data.Countries.``United Kingdom``
    .Indicators.``Gross enrolment ratio, tertiary, both sexes (%)``
|> Chart.Line

The Chart.Line function takes a sequence of pairs containing X and Y values, so we can call it directly with the World Bank data set using the year as the X value and the value as a Y value.

[Figure: line chart of the United Kingdom's tertiary gross enrolment ratio by year]
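
Because the indicator is just a sequence of (year, value) pairs, it can be reshaped with ordinary Seq functions before it is charted. The sketch below uses an invented series (not World Bank data) to restrict the range of years and to derive year-over-year changes; both results are still sequences of pairs and so have the shape described above.

```fsharp
// Illustrative (year, value) pairs; the numbers are made up.
let series = [ (2008, 55.0); (2009, 57.0); (2010, 58.0); (2011, 59.0) ]

// Keep only the points from 2009 onwards.
let recent =
    series
    |> Seq.filter (fun (year, _) -> year >= 2009)
    |> Seq.toList

// Year-over-year change, still expressed as (x, y) pairs.
let changes =
    series
    |> Seq.pairwise
    |> Seq.map (fun ((_, prev), (year, curr)) -> year, curr - prev)
    |> Seq.toList
```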

Using World Bank data asynchronously

If you need to download large amounts of data or run the operation without blocking the caller, you will probably want to use F# asynchronous workflows. The F# Data Library also provides the WorldBankDataProvider type, which takes a number of static parameters. If the Asynchronous parameter is set to true, the type provider generates all operations as asynchronous:

type WorldBank = WorldBankDataProvider<"World Development Indicators", Asynchronous=true>
WorldBank.GetDataContext()

The snippet above specifies "World Development Indicators" as the name of the data source (a collection of commonly available indicators) and sets the optional argument Asynchronous to true. As a result, properties such as Gross enrolment ratio, tertiary, both sexes (%) now have the type Async<(int * int)[]>, meaning that they represent an asynchronous computation that can be started and will eventually produce the data.
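
An Async value does nothing until it is started. The following sketch shows two common ways to consume one; fetchSample is a hypothetical stand-in for an asynchronous indicator property (it mirrors the type quoted above and returns invented data after a short delay), so the example runs without a network connection.

```fsharp
// Hypothetical stand-in for an asynchronous indicator property.
let fetchSample : Async<(int * int)[]> =
    async {
        do! Async.Sleep 10 // simulate network latency
        return [| (2010, 58); (2011, 59) |] // invented values
    }

// Option 1: block the current thread until the result is available.
let values = fetchSample |> Async.RunSynchronously

// Option 2: compose inside another workflow; nothing runs until it is started.
let latest =
    async {
        let! data = fetchSample
        return data |> Array.maxBy fst
    }

printfn "%A" (Async.RunSynchronously latest)
```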

Downloading data in parallel

To demonstrate the asynchronous version of the type provider, let's write code that downloads the university enrollment data for a number of countries in parallel. We first create a data context and then define an array with the countries and aggregate regions we want to process:

let wb = WorldBank.GetDataContext()

// Create an array of countries and aggregate regions to process
let countries = 
 [| wb.Countries.``Arab World``
    wb.Countries.``European Union``
    wb.Countries.Australia
    wb.Countries.Brazil
    wb.Countries.Canada
    wb.Countries.Chile
    wb.Countries.``Czech Republic``
    wb.Countries.Denmark
    wb.Countries.France
    wb.Countries.Greece
    wb.Countries.``Low income``
    wb.Countries.``High income``
    wb.Countries.``United Kingdom``
    wb.Countries.``United States`` |]

To download the information in parallel, we can create a list of asynchronous computations, compose them using Async.Parallel and then run the (single) obtained computation to perform all the downloads:

[ for c in countries ->
    c.Indicators.``Gross enrolment ratio, tertiary, both sexes (%)`` ]
|> Async.Parallel
|> Async.RunSynchronously
|> Array.map Chart.Line
|> Chart.Combine

The snippet above does more than download the data using Async.RunSynchronously: it also turns each downloaded data set into a line chart (using Chart.Line) and then creates a single composed chart using Chart.Combine.

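[Figure: combined line chart of the tertiary gross enrolment ratio for the selected countries and regions]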
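
The same pattern can be tried without the type provider. In the sketch below, fetchSeries is a hypothetical function that simulates a download and returns synthetic data together with the name it was called with. Async.Parallel returns results in the same order as the input computations, so each result can be matched back to its country by position or, as here, by carrying the name along.

```fsharp
// Hypothetical stand-in for a per-country download; the data is synthetic.
let fetchSeries (name: string) : Async<string * (int * int)[]> =
    async {
        do! Async.Sleep 50 // simulate network latency
        return (name, [| (2010, name.Length); (2011, name.Length + 1) |])
    }

let results =
    [ "Brazil"; "Chile"; "France" ]
    |> List.map fetchSeries     // one computation per country
    |> Async.Parallel           // combine into a single Async<_[]>
    |> Async.RunSynchronously   // run all of them and wait for the results

for (name, series) in results do
    printfn "%s: %d points" name series.Length
```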
