The goal of this document is to get you up and running with rsdmx as quickly as possible.
rsdmx provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework.
The SDMX framework provides two sets of standard specifications to facilitate the exchange of statistical data: standard formats and web-service specifications.
SDMX makes it possible to disseminate both data (a dataset) and metadata (the description of the dataset).
For this, the SDMX standard provides various types of documents, also known as messages:
Data documents: the Generic and Compact ones. The latter aims to provide a more compact XML document. There are other data document types derived from these two.
Metadata documents: the Data Structure Definition (DSD). As its name indicates, it describes the structure and organization of a dataset, and will generally include all the master/reference data used to characterize a dataset. The two main types of metadata are (1) the concepts, which correspond to the dimensions and/or attributes of the dataset, and (2) the codelists, which inventory the possible values to be used in the representation of dimensions and attributes.
For more information about the SDMX standards, you can visit the SDMX website, or this introduction by EUROSTAT.
rsdmx offers a low-level set of tools to read data and metadata in the SDMX-ML format. Its strategy is to keep things simple for the user: a single function, readSDMX, is used whether the document contains data or metadata, and whether the data source is local or remote.
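As a minimal sketch of this "single entry point" idea, the helper below guesses whether a source is remote and passes the result to the isURL argument of readSDMX (the argument used later in this document for local files). The names is_remote and read_sdmx_any are hypothetical and not part of rsdmx; the wrapper assumes the rsdmx package is installed.

```r
# Hypothetical helper (not part of rsdmx): TRUE when the source looks like a URL.
is_remote <- function(src) grepl("^(https?|ftp)://", src)

# Hypothetical wrapper: one call for both local and remote documents.
# Assumes rsdmx is installed; readSDMX is only invoked when the wrapper is called.
read_sdmx_any <- function(src) {
  rsdmx::readSDMX(src, isURL = is_remote(src))
}

# The detection itself runs without any network access:
is_remote("http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1")
is_remote(file.path(tempdir(), "rd_e_gerdsc.sdmx.xml"))
```

The pattern match is deliberately simple; a path that merely contains "http" somewhere in the middle is still treated as local.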
What rsdmx does support:
an SDMX format abstraction library, with a focus on the main SDMX standard XML format (SDMX-ML), and support for the three standard format versions (1.0, 2.0, 2.1)
an interface to SDMX web-services for a list of well-known data providers, such as OECD, EUROSTAT, ECB, UN FAO, UN ILO, etc. (a list that should grow in the near future!). See it in action!
Let's see then how to use rsdmx!
rsdmx can be installed from CRAN or from its development repository hosted on GitHub. For the latter, you will need the devtools package; then run:
devtools::install_github("opensdmx/rsdmx")
To load rsdmx in R, do the following:
library(rsdmx)
This section shows how to read SDMX dataset documents, either from remote data sources or from local SDMX files.
The following code snippet shows how to read a dataset from a remote data source, taking as an example the OECD StatExtracts portal: http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011
myUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)
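The request URL above is made of a few distinct parts (endpoint, dataflow, key, agency, time range). The sketch below rebuilds it from those parts and wraps the remote read in tryCatch, since a remote service may be unavailable. The function names build_oecd_url and safe_read are hypothetical, and the URL layout is inferred only from the example above, not from a specification.

```r
# Hypothetical helper: assemble the OECD request shown above from its parts.
build_oecd_url <- function(flow, key, agency, start, end,
                           base = "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData") {
  sprintf("%s/%s/%s/%s?startTime=%d&endTime=%d",
          base, flow, key, agency, as.integer(start), as.integer(end))
}

# Hypothetical wrapper: return NULL instead of stopping if the request fails.
# Assumes rsdmx is installed; nothing is downloaded until safe_read() is called.
safe_read <- function(url) {
  tryCatch(
    rsdmx::readSDMX(url),
    error = function(e) {
      message("Could not read SDMX document: ", conditionMessage(e))
      NULL
    }
  )
}

my_url <- build_oecd_url("MIG", "TOT..", "OECD", 2000, 2011)
my_url
```

Calling safe_read(my_url) would then perform the same request as the snippet above, with the failure case handled.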
You can try it out with other data sources, such as the EUROSTAT portal: http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011
The online rsdmx documentation also provides a list of data providers, either from international or national institutions, and more request examples.
The service providers mentioned above are known to rsdmx, which lets users call readSDMX with helper parameters instead of a full URL. The list of service providers can be retrieved with:
providers <- getSDMXServiceProviders()
as.data.frame(providers)
##    agencyId                                                          name
## 1       ECB                                         European Central Bank
## 2     ESTAT           Eurostat (Statistical office of the European Union)
## 3       IMF                                   International Monetary Fund
## 4      OECD        Organisation for Economic Cooperation and Development 
## 5      UNSD                            United Nations Statistics Division
## 6       FAO       Food and Agriculture Organization of the United Nations
## 7       ILO       International Labour Organization of the United Nations
## 8       UIS                                UNESCO Institute of Statistics
## 9  WBG_WITS                               World Integrated Trade Solution
## 10      ABS                               Australian Bureau of Statistics
## 11      NBB                                      National Bank of Belgium
## 12    INSEE Institut national de la statistique et des études économiques
## 13    INEGI        Instituto Nacional de Estadística y Geografía (Méjico)
## 14    ISTAT                     Istituto nazionale di statistica (Italia)
## 15   KNOEMA                                    KNOEMA knowledge plateform
##            scale country                   builder compliant
## 1  international    <NA>  SDMXREST21RequestBuilder      TRUE
## 2  international    <NA>  SDMXREST21RequestBuilder      TRUE
## 3  international    <NA>  SDMXREST20RequestBuilder     FALSE
## 4  international    <NA> SDMXDotStatRequestBuilder     FALSE
## 5  international    <NA>  SDMXREST21RequestBuilder      TRUE
## 6  international    <NA>  SDMXREST21RequestBuilder     FALSE
## 7  international    <NA>  SDMXREST21RequestBuilder     FALSE
## 8  international    <NA> SDMXDotStatRequestBuilder     FALSE
## 9  international    <NA>  SDMXREST21RequestBuilder      TRUE
## 10      national     AUS SDMXDotStatRequestBuilder     FALSE
## 11      national     BEL SDMXDotStatRequestBuilder     FALSE
## 12      national     FRA  SDMXREST21RequestBuilder      TRUE
## 13      national     MEX  SDMXREST20RequestBuilder     FALSE
## 14      national     ITA  SDMXREST21RequestBuilder      TRUE
## 15 international    <NA>        SDMXRequestBuilder     FALSE
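Since the provider list is converted to an ordinary data.frame, it can be filtered with base R. The sketch below does not call rsdmx: it rebuilds the first four rows of the table above by hand (an illustrative subset only) and selects the providers flagged as compliant.

```r
# Hand-copied subset of the provider table shown above (first four rows only).
providers_df <- data.frame(
  agencyId  = c("ECB", "ESTAT", "IMF", "OECD"),
  scale     = rep("international", 4),
  builder   = c("SDMXREST21RequestBuilder", "SDMXREST21RequestBuilder",
                "SDMXREST20RequestBuilder", "SDMXDotStatRequestBuilder"),
  compliant = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Identifiers of the providers flagged as compliant.
compliant_ids <- providers_df$agencyId[providers_df$compliant]

# Number of providers per request builder.
by_builder <- table(providers_df$builder)

compliant_ids
by_builder
```

With the real table, the same two expressions can be applied to the result of as.data.frame(providers).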
Note that it is also possible to add an SDMX service provider at runtime. To have a new SDMX service provider registered by default, please contact me!
Let's see what querying an OECD data source looks like:
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                key = list("TOT", NULL, NULL), start = 2010, end = 2011)
df <- as.data.frame(sdmx)
head(df)
##   CO2 VAR GEN COU TIME_FORMAT obsTime obsValue OBS_STATUS
## 1 TOT B11 WMN AUS         P1Y    2010   107740       <NA>
## 2 TOT B11 WMN AUS         P1Y    2011   108865       <NA>
## 3 TOT B11 TOT AUS         P1Y    2010   206714       <NA>
## 4 TOT B11 TOT AUS         P1Y    2011   210704       <NA>
## 5 TOT B12 TOT AUS         P1Y    2010    29307       <NA>
## 6 TOT B12 TOT AUS         P1Y    2011    31204       <NA>
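The key argument above, list("TOT", NULL, NULL), plays the same role as the "TOT.." segment of the OECD URL used earlier: one position per dimension, with an empty position meaning "all values". The sketch below illustrates that mapping in base R. It is an assumption about the convention, not a copy of how rsdmx builds requests internally; key_to_string is a hypothetical name, and joining several values with "+" follows the usual SDMX REST notation.

```r
# Hypothetical helper: collapse a key list into a dot-separated key string.
# NULL entries become empty positions; several values in one position are joined with "+".
key_to_string <- function(key) {
  parts <- vapply(key, function(k) paste(k, collapse = "+"), character(1))
  paste(parts, collapse = ".")
}

single_key <- key_to_string(list("TOT", NULL, NULL))
multi_key  <- key_to_string(list("TOT", c("B11", "B12"), NULL))

single_key
multi_key
```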
It is also possible to query a dataset together with its “definition”, handled
in a separate SDMX-ML document named DataStructureDefinition (DSD). It is 
particularly useful when you want to enrich your dataset with all labels. For this, 
you need the DSD which contains all reference data.
To do so, you only need to append dsd = TRUE (the default value is FALSE)
to the previous request, and specify labels = TRUE when calling as.data.frame,
as follows:
sdmx <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                key = list("TOT", NULL, NULL), start = 2010, end = 2011,
                dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)
head(df)
##   CO2 CO2_label.fr CO2_label.en VAR
## 1 TOT        Total        Total B11
## 2 TOT        Total        Total B11
## 3 TOT        Total        Total B11
## 4 TOT        Total        Total B11
## 5 TOT        Total        Total B11
## 6 TOT        Total        Total B11
##                                      VAR_label.fr
## 1 Entrées de personnes étrangères par nationalité
## 2 Entrées de personnes étrangères par nationalité
## 3 Entrées de personnes étrangères par nationalité
## 4 Entrées de personnes étrangères par nationalité
## 5 Entrées de personnes étrangères par nationalité
## 6 Entrées de personnes étrangères par nationalité
##                                   VAR_label.en GEN GEN_label.fr
## 1 Inflows of foreign population by nationality TOT        Total
## 2 Inflows of foreign population by nationality TOT        Total
## 3 Inflows of foreign population by nationality TOT        Total
## 4 Inflows of foreign population by nationality TOT        Total
## 5 Inflows of foreign population by nationality TOT        Total
## 6 Inflows of foreign population by nationality TOT        Total
##   GEN_label.en COU COU_label.fr COU_label.en TIME_FORMAT
## 1        Total AUS    Australie    Australia         P1Y
## 2        Total AUS    Australie    Australia         P1Y
## 3        Total AUS    Australie    Australia         P1Y
## 4        Total AUS    Australie    Australia         P1Y
## 5        Total AUS    Australie    Australia         P1Y
## 6        Total AUS    Australie    Australia         P1Y
##   TIME_FORMAT_label.en obsTime obsValue OBS_STATUS OBS_STATUS_label.en
## 1               Annual    2010   107740          e     Estimated value
## 2               Annual    2011   108865          e     Estimated value
## 3               Annual    2010   206714          e     Estimated value
## 4               Annual    2011   210704       <NA>                <NA>
## 5               Annual    2010    29307       <NA>                <NA>
## 6               Annual    2011    31204       <NA>                <NA>
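Conceptually, the label columns in the output above come from looking up each code in the corresponding codelist of the DSD. The sketch below reproduces that idea in base R for the COU dimension only, using two observations and one codelist entry copied from the output above. It illustrates the lookup and is not rsdmx's actual implementation; the codelist data.frame layout is an assumption.

```r
# Two observations copied from the output above (COU dimension only).
obs <- data.frame(
  COU      = c("AUS", "AUS"),
  obsTime  = c(2010, 2011),
  obsValue = c(107740, 108865),
  stringsAsFactors = FALSE
)

# Assumed codelist layout: one row per code, one column per language.
cl_cou <- data.frame(
  id       = "AUS",
  label.fr = "Australie",
  label.en = "Australia",
  stringsAsFactors = FALSE
)

# Look up each code in the codelist and append the labels as new columns.
idx <- match(obs$COU, cl_cou$id)
obs$COU_label.fr <- cl_cou$label.fr[idx]
obs$COU_label.en <- cl_cou$label.en[idx]

obs
```

A code that is missing from the codelist yields NA labels, which is consistent with the <NA> label shown above for observations without an OBS_STATUS.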
Note that if you are reading SDMX-ML documents with the native approach (with
URLs) instead of the embedded providers, it is also possible to associate a DSD
with a dataset by using the function setDSD. Let's see how it works:
#data without DSD
sdmx.data <- readSDMX(providerId = "OECD", resource = "data", flowRef = "MIG",
                key = list("TOT", NULL, NULL), start = 2010, end = 2011)
#DSD
sdmx.dsd <- readSDMX(providerId = "OECD", resource = "datastructure", resourceId = "MIG")
#associate data and dsd
sdmx.data <- setDSD(sdmx.data, sdmx.dsd)
This example shows you how to use rsdmx with local SDMX files, previously downloaded from EUROSTAT.
#bulk download from Eurostat
tf <- tempfile(tmpdir = tdir <- tempdir()) #temp file and folder
download.file("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf)
sdmx_files <- unzip(tf, exdir = tdir)
#read local SDMX (set isURL = FALSE)
sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)
By default, readSDMX considers the data source is remote. To read a local file, add isURL = FALSE.
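The example above picks the data file by position (sdmx_files[2]), which depends on the order of the files in the archive. A slightly more defensive sketch is to select the file by name pattern. The file names below are hypothetical stand-ins for the result of unzip(), and the ".sdmx.xml" / ".dsd.xml" suffixes are an assumption about the archive contents.

```r
# Hypothetical file names, standing in for the value returned by unzip().
sdmx_files <- file.path(tempdir(), c("rd_e_gerdsc.dsd.xml", "rd_e_gerdsc.sdmx.xml"))

# Select the data and DSD documents by suffix rather than by position.
data_file <- sdmx_files[grepl("\\.sdmx\\.xml$", sdmx_files)]
dsd_file  <- sdmx_files[grepl("\\.dsd\\.xml$", sdmx_files)]

basename(data_file)
basename(dsd_file)
```

The selected path can then be passed to readSDMX with isURL = FALSE, as in the example above.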
This section shows how to read SDMX metadata documents, including concepts, codelists and a complete data structure definition (DSD).
Read concept schemes from FAO data portal
csUrl <- "http://data.fao.org/sdmx/registry/conceptscheme/FAO/ALL/LATEST/?detail=full&references=none&version=2.1"
csobj <- readSDMX(csUrl)
csdf <- as.data.frame(csobj)
Read codelists from FAO data portal
clUrl <- "http://data.fao.org/sdmx/registry/codelist/FAO/CL_FAO_MAJOR_AREA/0.1"
clobj <- readSDMX(clUrl)
cldf <- as.data.frame(clobj)
This example illustrates how to read a complete DSD using an OECD StatExtracts portal data source.
dsdUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1"
dsd <- readSDMX(dsdUrl)
rsdmx is implemented in an object-oriented way with S4 classes and methods. The properties of S4 objects are called slots and can be accessed with the slot function. The following code snippet extracts the list of codelists contained in the DSD document, and reads one codelist as a data.frame.
#get codelists from DSD
cls <- slot(dsd, "codelists")
#get the identifiers of the codelists
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))
#get a codelist
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS") 
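For readers new to S4, the slot access pattern used above can be tried without any SDMX document. The sketch below defines two toy classes (ToyCodelist and ToyCodelists, hypothetical names that only mimic the nesting used above) and shows that slot(x, "name") and the @ operator are equivalent ways to read a slot.

```r
library(methods)

# Toy classes mimicking the nesting above: a container holding a list of codelists.
setClass("ToyCodelist", representation(id = "character"))
setClass("ToyCodelists", representation(codelists = "list"))

cls <- new("ToyCodelists", codelists = list(
  new("ToyCodelist", id = "CL_A"),
  new("ToyCodelist", id = "CL_B")
))

# Same pattern as above: read the list slot, then the id slot of each element.
ids <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))

# Equivalent access with the @ operator.
ids_at <- sapply(cls@codelists, function(x) x@id)

ids
slotNames("ToyCodelists")
```

slotNames is handy for discovering which slots an rsdmx object exposes before extracting them.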
In a similar way, the concepts of the dataset can be extracted from the DSD and read as a data.frame.
#get concepts from DSD
concepts <- as.data.frame(slot(dsd, "concepts"))