phsopendata provides helper functions for discovering and downloading data from the Scottish Health and Social Care Open Data platform using the CKAN API.

It can be used to:

  • search for datasets and resources;
  • download a single resource by resource ID;
  • download multiple resources from a dataset;
  • filter rows and select columns before downloading data;
  • retrieve the latest resource from datasets that publish new resources over time;
  • run SQL queries against Open Data resources.

See the package website for full documentation.

Installation

# The easiest way to get phsopendata is to install from CRAN:
install.packages("phsopendata")

Development version

To get a bug fix or a feature that has not yet been released to CRAN, install the development version of phsopendata from GitHub.

# install.packages("remotes")
remotes::install_github("Public-Health-Scotland/phsopendata")

Quick start

Find resources

To download data, you will usually need either a dataset_name or a resource_id.

These can be found in the dataset metadata, in the URL of a dataset or resource page on https://www.opendata.nhs.scot/, or by searching with list_resources(). list_datasets() is still available for compatibility, but list_resources() is recommended for new code.

library(phsopendata)

resources <- list_resources(
  dataset_contains = "gp practice",
  resource_contains = "list sizes"
)

all_datasets <- list_datasets()
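
If you already have a dataset or resource page open in a browser, the identifiers can be read straight from its URL. The sketch below assumes the usual CKAN URL layout (/dataset/<dataset_name>/resource/<resource_id>). parse_open_data_url() is a hypothetical helper written for this example, not a phsopendata function, and the all-zero resource ID is a placeholder rather than a real resource.

```r
# Hypothetical helper (not part of phsopendata): split an Open Data page URL
# into its dataset name and, if present, its resource ID.
# Assumes the CKAN layout /dataset/<dataset_name>/resource/<resource_id>.
parse_open_data_url <- function(url) {
  # Drop any query string or fragment, then any trailing slashes
  path <- sub("[?#].*$", "", url)
  path <- sub("/+$", "", path)

  # Each match is c(full_match, captured_group), or character(0) if absent
  dataset_match <- regmatches(path, regexec("/dataset/([^/]+)", path))[[1]]
  resource_match <- regmatches(path, regexec("/resource/([^/]+)", path))[[1]]

  list(
    dataset_name = if (length(dataset_match) == 2) dataset_match[2] else NA_character_,
    resource_id = if (length(resource_match) == 2) resource_match[2] else NA_character_
  )
}

# Placeholder resource ID, used only to show the URL shape
ids <- parse_open_data_url(
  "https://www.opendata.nhs.scot/dataset/gp-practice-populations/resource/00000000-0000-0000-0000-000000000000"
)

# A dataset page URL has no resource part, so resource_id is NA
dataset_only <- parse_open_data_url(
  "https://www.opendata.nhs.scot/dataset/gp-practice-populations"
)
```

ids$dataset_name can then be passed to get_dataset() and ids$resource_id to get_resource().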

Download a resource

# define a resource ID
res_id <- "a794d603-95ab-4309-8c92-b48970478c14"

# download the data
open_data <- get_resource(res_id)

Filter rows and select columns

Use the rows argument to limit the download to the first N rows of a table.

# get first 100 rows
open_data <- get_resource(
  res_id = res_id,
  rows = 100
)

You can use col_select and row_filters to query the data server-side, i.e. the data is filtered before it is downloaded.

# select columns and filter rows before downloading
open_data <- get_resource(
  res_id = res_id,
  col_select = c("GPPracticeName", "TelephoneNumber"),
  row_filters = list(
    HB = "S08000017",
    Dispensing = "Y"
  )
)
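
To make "server-side" more concrete, the sketch below builds the kind of CKAN datastore_search request that such a query corresponds to. It is an illustration only: it assumes the standard CKAN action API path on www.opendata.nhs.scot and the standard limit, fields and filters parameters, and it does not claim to reproduce how phsopendata constructs its requests internally. build_datastore_url() is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of phsopendata): build a CKAN
# datastore_search URL from the same inputs used with get_resource() above.
# Assumes the standard CKAN action API path and parameter names.
build_datastore_url <- function(res_id, rows = NULL, col_select = NULL,
                                row_filters = NULL) {
  base_url <- "https://www.opendata.nhs.scot/api/3/action/datastore_search"
  params <- c(resource_id = res_id)

  if (!is.null(rows)) {
    # Avoid scientific notation for large limits (e.g. 1e+05)
    params["limit"] <- format(rows, scientific = FALSE)
  }
  if (!is.null(col_select)) {
    # CKAN takes the column list as a comma-separated string
    params["fields"] <- paste(col_select, collapse = ",")
  }
  if (!is.null(row_filters)) {
    # CKAN takes filters as a JSON object. This naive encoding assumes
    # names and values contain no double quotes or backslashes.
    pairs <- sprintf('"%s":"%s"', names(row_filters), unlist(row_filters))
    params["filters"] <- paste0("{", paste(pairs, collapse = ","), "}")
  }

  # Percent-encode each value, then join into a query string
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

request_url <- build_datastore_url(
  res_id = "a794d603-95ab-4309-8c92-b48970478c14",
  rows = 100,
  col_select = c("GPPracticeName", "TelephoneNumber"),
  row_filters = list(HB = "S08000017", Dispensing = "Y")
)
```

Because the column list and filters travel with the request, only the matching rows and selected columns are sent back, which is why filtering this way is usually faster than downloading a full table and subsetting it locally.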

Download a dataset

In this example, we download GP Practice Population Demographics from https://www.opendata.nhs.scot/dataset/gp-practice-populations, so the dataset name is gp-practice-populations.

# If max_resources is not set, all resources are returned.
# Here we pull 10 rows from the first 2 resources only.
practice_pops <- get_dataset("gp-practice-populations", max_resources = 2, rows = 10)

Download the latest resource from a dataset

Some datasets publish new resources over time rather than replacing an existing resource. For these datasets, you can use get_latest_resource().

latest_contacts <- get_latest_resource(
  dataset_name = "gp-practice-contact-details-and-list-sizes",
  col_select = c("PracticeCode", "PracticeName", "Postcode", "Dispensing")
)

Query using SQL

For more flexible server-side queries, use get_resource_sql(). SQL queries can return a maximum of 32,000 rows.

cancelled_ops <- get_resource_sql(r"[
SELECT
    "Hospital",
    "Month",
    "TotalCancelled",
    "TotalOperations"
FROM
    "bcc860a4-49f4-4232-a76b-f559cf6eb885"
WHERE
    "Hospital" = 'D102H'
]")
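
When the same query needs to run for different hospitals or resources, it can help to build the SQL string in R first. The sketch below reuses the resource ID, columns and hospital code from the query above. build_hospital_query() is a hypothetical helper, not a phsopendata function, and the explicit LIMIT clause is an assumption added here to keep the 32,000-row cap visible in the query itself.

```r
# Hypothetical helper (not part of phsopendata): build the query text for
# one hospital. The column names are those used in the example above.
build_hospital_query <- function(res_id, hospital, limit = 32000L) {
  # Double any single quotes so the value stays inside the SQL string literal
  hospital <- gsub("'", "''", hospital, fixed = TRUE)

  # res_id is inserted as a quoted identifier; this sketch assumes it is a
  # plain resource ID containing no double quotes.
  sprintf(
    paste(
      'SELECT "Hospital", "Month", "TotalCancelled", "TotalOperations"',
      'FROM "%s"',
      "WHERE \"Hospital\" = '%s'",
      "LIMIT %d"
    ),
    res_id, hospital, as.integer(limit)
  )
}

query <- build_hospital_query(
  res_id = "bcc860a4-49f4-4232-a76b-f559cf6eb885",
  hospital = "D102H"
)

# Running the query needs network access, so it is left commented out here:
# cancelled_ops <- phsopendata::get_resource_sql(query)
```

Doubling single quotes guards the string literal against malformed input, but it is not a substitute for validating values that come from outside your own code.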

Function overview

  • list_resources() searches available datasets and resources on the Open Data platform.
  • get_resource() downloads a single resource by resource ID, with optional row filtering and column selection.
  • get_dataset() downloads multiple resources from a dataset by dataset name.
  • get_latest_resource() downloads the most recent resource from datasets that publish new resources over time.
  • get_resource_sql() runs a SQL query against one or more Open Data resources.
  • get_dataset_additional_info() returns summary information about a dataset, including the number of resources and the latest update date.
  • list_datasets() returns dataset names only. It is retained for compatibility but has been superseded by list_resources().

Contributing

This package is maintained by Public Health Scotland.

For requests, bug reports or suggestions, please use the GitHub issues page or contact the PHS Open Data team.

If you would like to share examples of how you work with open data, you can also do so in the Open Data repository, where example scripts and resources are collated.