The French official open data portal offers a huge quantity of information and provides a well-structured API. The BARIS package lets you use this API to retrieve the data you need from the portal.
Within the portal, a data set contains one or several data frames, called resources. So whenever I use the term resource, read it as a data frame inside a data set.
The package is available on CRAN (a development version is also hosted on GitHub):
install.packages("BARIS")
Enough talking; let’s dive into a reproducible example.
The BARIS_search() function lets you search for a data set. A quick tip: in your query, use plain nouns and avoid prepositions and determiners (le, la, de, des, en, à and so on):
library(BARIS)
BARIS_search(query = "Monuments Historiques Marseille")
#> # A tibble: 20 x 11
#> id title organization page views frequency created_at last_modified
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 5cebf~ "Marse~ https://stati~ https~ 81190 unknown 2013-10-2~ 2020-06-29T0~
#> 2 536c4~ "Monum~ https://stati~ https~ 43725 annual 2013-11-0~ 2022-01-13T1~
#> 3 54a13~ "Monum~ https://stati~ https~ 0 punctual 2014-12-2~ 2015-08-07T1~
#> 4 5dde8~ "Monum~ https://stati~ https~ 57 punctual 2019-11-2~ 2019-11-28T1~
#> 5 5fdaa~ "Monum~ https://stati~ https~ 4 unknown 2020-12-1~ 2020-12-16T0~
#> 6 6206f~ "Monum~ https://stati~ https~ 2 unknown 2022-02-1~ 2022-03-24T0~
#> 7 55253~ "Monum~ <NA> https~ 0 punctual 2015-04-0~ 2016-02-10T1~
#> 8 55253~ "Monum~ <NA> https~ 0 punctual 2015-04-0~ 2015-12-23T0~
#> 9 55520~ "Monum~ <NA> https~ 0 unknown 2015-05-1~ 2015-05-12T1~
#> 10 55520~ "Monum~ <NA> https~ 0 unknown 2015-05-1~ 2015-05-12T1~
#> 11 5e78d~ "Monum~ https://stati~ https~ 6 punctual 2020-03-2~ 2020-03-23T1~
#> 12 602fb~ "Epône~ https://stati~ https~ 1 unknown 2021-02-1~ 2021-02-18T1~
#> 13 618b8~ "Avign~ https://stati~ https~ 2 unknown 2018-03-1~ 2020-12-24T0~
#> 14 58aef~ "Monum~ https://stati~ https~ 0 unknown 2017-02-2~ 2019-03-05T0~
#> 15 619f8~ "Monum~ https://stati~ https~ 2 punctual 2021-11-2~ 2021-11-26T1~
#> 16 5beab~ "Liste~ https://stati~ https~ 81297 unknown 2018-11-1~ 2016-08-04T1~
#> 17 617ba~ "Couch~ <NA> https~ 5 unknown 2021-10-2~ 2021-10-29T0~
#> 18 53699~ "Monum~ https://stati~ https~ 0 unknown 2013-09-1~ 2015-07-15T1~
#> 19 54296~ "Monum~ https://stati~ https~ 1155~ unknown 2014-09-2~ 2016-03-03T1~
#> 20 5878e~ "Monum~ https://stati~ https~ 6 unknown 2013-05-1~ 2017-07-10T0~
#> # ... with 3 more variables: last_update <chr>, archived <chr>, deleted <chr>
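One detail worth noting in the output above: the views column is printed as <chr>, so sorting it directly would order the values alphabetically. Below is a minimal sketch of ranking search results by views after converting that column to numeric. The small data frame is a placeholder standing in for a real BARIS_search() result; its ids, titles and counts are invented for illustration.

```r
# Placeholder rows standing in for a BARIS_search() result.
# The ids, titles and view counts are invented for illustration only.
results <- data.frame(
  id    = c("id_a", "id_b", "id_c"),
  title = c("Dataset A", "Dataset B", "Dataset C"),
  views = c("57", "81190", "9"),
  stringsAsFactors = FALSE
)

# `views` is character, so convert it before ordering;
# otherwise "9" would rank above "81190".
results$views <- as.numeric(results$views)
ranked <- results[order(results$views, decreasing = TRUE), ]

# ID of the most viewed data set among the placeholder rows
top_id <- ranked$id[1]
print(top_id)
```

The same two lines (conversion, then order()) should apply unchanged to the tibble returned by BARIS_search(), assuming its views column only contains digits.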
Cool, we have our data set … but wait, it would be better to get some explanation about it.
The BARIS_explain() function provides a description of a data set. It takes one argument, the ID of the data set:
BARIS_explain(datasetId = "5cebfa8306e3e77ffdb31ef5")
#> [1] "Monuments historiques situés sur le territoire de Marseille, avec adresse, numéro de base Mérimée (base de données du Ministère de la Culture recensant les monuments historiques de toute la France) et points de géolocalisation"
Don’t panic if you’re not a French speaker: you can always translate the description, for instance with the great googleLanguageR package.
Now it’s time to list the resources contained within this data set.
The BARIS_resources() function displays the available resources (data frames) within a data set. It takes the ID of the data set as its argument:
BARIS_resources(datasetId = "5cebfa8306e3e77ffdb31ef5")
#> # A tibble: 2 x 6
#> id title format published url description
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 59ea7bba-~ MARSEILLE_~ csv 2019-05-27~ https://trouve~ Monuments historiqu~
#> 2 6328f8b3-~ Plan des M~ pdf 2019-05-27~ https://trouve~ Edition Janvier 2013
You can see from above that the data set has two resources, a csv and a pdf. Now we’ve reached the interesting part: extracting the data frame that you’ll work on!
Using BARIS_extract(), you can load the resource you need directly into your R session. Currently, “only” these formats are supported: json, csv, xls, xlsx, xml, geojson and shp. For anything else, you can always rely on the url of the resource to download it manually.
To use the function, you have to specify two arguments: the ID of the resource and its format.
You can tell the two kinds of ID apart at a glance: in this example the data set ID is a plain 24-character string, whereas the resource ID is split into hyphen-separated groups.
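To avoid passing a data set ID where a resource ID is expected, the difference in shape can be checked before calling the package. The helpers below are hypothetical (they are not part of BARIS), and both patterns are inferred only from the two example IDs in this vignette, not from an official specification of the portal.

```r
# Hypothetical helpers, not part of BARIS. The patterns are inferred from
# the two example IDs shown in this vignette.
is_dataset_id <- function(x) {
  # 24 lowercase hexadecimal characters, no separators
  grepl("^[0-9a-f]{24}$", x)
}

is_resource_id <- function(x) {
  # five hyphen-separated hexadecimal groups of 8, 4, 4, 4 and 12 characters
  grepl("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", x)
}

dataset_id  <- "5cebfa8306e3e77ffdb31ef5"
resource_id <- "59ea7bba-f38a-4d75-b85f-2d1955050e53"

is_dataset_id(dataset_id)    # TRUE
is_resource_id(resource_id)  # TRUE
is_dataset_id(resource_id)   # FALSE
```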
data <- BARIS_extract(resourceId = "59ea7bba-f38a-4d75-b85f-2d1955050e53", format = "csv")
head(data)
#> # A tibble: 6 x 10
#> n_base_merimee date_de_protection_a~ denomination adresse code_postal
#> <chr> <chr> <chr> <chr> <int>
#> 1 PA00081336 Classement : liste d~ Ancienne église de~ "/" 13002
#> 2 PA00081340 Classement: 13/09/19~ Eglise Saint-Laure~ "Esplana~ 13002
#> 3 PA00081331 Classement: 29/01/19~ Chapelle et Hospic~ "2, Rue ~ 13002
#> 4 PA00081344 Classement: 16/06/19~ Fort Saint-Jean "" 13002
#> 5 PA00081325 Inscription : 23/11/~ Les deux bâtiments~ "Quai du~ 13002
#> 6 PA00081334 Inscription : 07/07/~ Clocher des Accoul~ "Montée ~ 13002
#> # ... with 5 more variables: proprietaire_du_monument <chr>,
#> # epoque_de_construction <chr>, date_de_construction <chr>, longitude <dbl>,
#> # latitude <dbl>
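For a resource whose format BARIS_extract() does not read, such as the pdf listed earlier, the url column returned by BARIS_resources() can be used for a manual download. Here is a minimal sketch: resource_url() is a hypothetical helper, not a BARIS function, and it only assumes that its input has the format and url columns shown in the BARIS_resources() output. The destination file name is also just an example.

```r
# Hypothetical helper, not part of BARIS: return the URL of the first
# resource of a given format. `resources` must have `format` and `url`
# columns, like the BARIS_resources() output shown earlier.
resource_url <- function(resources, wanted_format) {
  hit <- resources[which(resources$format == wanted_format), , drop = FALSE]
  if (nrow(hit) == 0) {
    stop("No resource with format '", wanted_format, "' in this data set.")
  }
  hit$url[[1]]
}

# Usage (needs an internet connection, so it is left commented out):
# resources <- BARIS::BARIS_resources(datasetId = "5cebfa8306e3e77ffdb31ef5")
# pdf_url   <- resource_url(resources, "pdf")
# utils::download.file(pdf_url, destfile = "plan.pdf", mode = "wb")
```

The mode = "wb" argument matters for binary files such as pdf on Windows, where the default text mode can corrupt the download.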
End of the vignette.