Background
The Source Linkage Files (SLFs) project was created using SPSS syntax
until March 2022 when this was transformed into R programming language.
The project has evolved into it’s own R package called
createslf
which contains functions for maintaining the
Reproducible Analytical Pipeline (RAP) to create the SLFs on a quarterly
basis.
The package bundles together code, data, documentation, and tests,
and is easy to share with others. To create the package
createslf
we refer to the R Packages guide by
Hadley Wickham.
This package is intended to be in continuous development to support the quarterly update process of the SLFs which takes place each March, June, September and December.
Developing createslf - setup
The createslf
package has a clear structure for
development. There are a few important features which are essential for
the setup of the package:
- README.md - Readme file contains important information about the package.
- R/ directory - Main development folder where all functions are stored. See the article for more information on function setup.
- DESCRIPTION file - The DESCRIPTION file provides overall metadata about the package, such as the package name and which other packages it depends on.
- NAMESPACE file - The NAMESPACE file specifies which functions the package makes available for others to use and, optionally, imports functions from other packages. This file is generated automatically when the package is build and should not be edited by hand. Changes should be made on individual functions by editing the roxygen2 documentation.
- man/ - Documentation is stored in the man/ directory. The documentation files are generated automatically when the package is build and should not be edited by hand. Changes should be made on individual functions by editing the roxygen2 documentaiton.
-
tests/ - Package tests are stored in the
tests/ directory. Testing is a vital part of package
development as it ensures that our code is working. Testing, however,
adds an additional step to the workflow. See chapter testing basics for
more info on package tests. Please note, this tests/
directory is completely separate to SLF quarterly tests and is intented
to test the functions within the
createslf
package.
Developing a new function - Best Practice:
- All function names should be in lower case, with words separated by an underscore
- Put a space after a comma, never before
- Put a space before and after infix operators such as <-, == and +
- Limit code to 80 characters per line
- Function documentation should be generated using roxygen2
- Functions should be tested using testthat where possible
- The package should always pass devtools::check()
Documentation
Function documentation is important for maintaining the package
createslf
. For Hadley Wickham’s guide see this chapter on documentation.
In summary, documentation is important for each function as this is
where users will use the help page to search with
?XXXfunction
.
The package createslf
uses roxygen2 format for
documenting functions. Here are some of the advantages:
Code and documentation are co-located. When you modify your code, it’s easy to remember to also update your documentation.
You can use markdown, rather than having to learn a one-off markup language that only applies to
.Rd
files. In addition to formatting, the automatic hyperlinking functionality makes it much, much easier to create richly linked documentation.There’s a lot of
.Rd
boilerplate that’s automated away.roxygen2 provides a number of tools for sharing content across documentation topics and even between topics and vignettes.
Documentation example:
The function below comes from 00-update_refs and
simply returns the delayed discharges period for the update as a
STRING
. This is used in the set up of functions, allowing
the correct path to the latest delayed discharges extract. The
roxygen2 documentation is well documented with:
Title - Title of function
Description - Brief description of the functions purpose.
Return - What should the function return?
export - Include to export this during the package load.
family - Groups similar functions together.
As this function is already documented, a .Rd
file will
exist in the man/ folder. However, if there were any
changes to this function or if the documentation was updated, the
.Rd
file would also need to be updated. To do this, you can
run devtools::document()
to generate (or update) the
package’s .Rd files. The .Rd
files should not be edited by
hand and should be done by running
devtools::document()
.
#' Delayed Discharge period
#'
#' @description Get the period for Delayed Discharge
#'
#' @return The period for the Delayed Discharge file
#' as MMMYY_MMMYY
#' @export
#'
#' @family initialisation
get_dd_period <- function() {
first_part <- substr(previous_update(), 1, 3)
end_part <- substr(previous_update(), 7, 8)
dd_period <- as.character(stringr::str_glue("Jul16_{first_part}{end_part}"))
return(dd_period)
}
Checking an R package - R CMD check
Base R provides various command line tools and R CMD check is the official method for checking that an R package is valid. It is essential to pass R CMD check if you plan to submit your package to CRAN, but we highly recommend holding yourself to this standard even if you don’t intend to release your package on CRAN. R CMD check detects many common problems that you’d otherwise discover the hard way.
Our recommended way to run R CMD check is in the R console via devtools:
devtools::check()
We recommend this because it allows you to run R CMD check from
within R, which dramatically reduces friction and increases the
likelihood that you will check()
early and often! This
emphasis on fluidity and fast feedback is exactly the same motivation as
given for load_all()
. In the case of check(), it really is
executing R CMD check for you. It’s not just a high fidelity simulation,
which is the case for load_all()
.
Workflows
The workflow for checking a package is simple, but tedious:
Run
devtools::check()
, or pressCtrl/Cmd + Shift + E
.Fix the first problem.
Repeat until there are no more problems.
R CMD check returns three types of messages:
ERRORs: Severe problems that you should fix regardless of whether or not you’re submitting to CRAN.
WARNINGs: Likely problems that you must fix if you’re planning to submit to CRAN (and a good idea to look into even if you’re not).
NOTEs: Mild problems or, in a few cases, just an observation. If you are submitting to CRAN, you should strive to eliminate all NOTEs, even if they are false positives. If you have no NOTEs, human intervention is not required, and the package submission process will be easier. If it’s not possible to eliminate a NOTE, you’ll need describe why it’s OK in your submission comments, as described in Section 22.7. If you’re not submitting to CRAN, carefully read each NOTE. If it’s easy to eliminate the NOTEs, it’s worth it, so that you can continue to strive for a totally clean result. But if eliminating a NOTE will have a net negative impact on your package, it is reasonable to just tolerate it. Make sure that doesn’t lead to you ignoring other issues that really should be addressed.
Testing the package
Testing is a vital part of developing the createslf
as
it ensures that the code does what we want. Testing, however, adds an
additional step to the workflow. Please see this chapter for more
information on setting up tests.
createslf
tests sit on the tests/testthat/
directory. To set up a new test you can run
usethis::use_test()
to create a new test file.
To test the file run:
testthat::test_file("tests/testthat/test-XXX.R")
Expectations
An expectation is the finest level of testing. It makes a binary assertion about whether or not an object has the properties you expect. This object is usually the return value from a function in your package.
All expectations have a similar structure:
They start with
expect_
.They have two main arguments: the first is the actual result, the second is what you expect.
If the actual and expected results don’t agree, testthat throws an error.
Some expectations have additional arguments that control the finer points of comparing an actual and expected result.
An example of one of our tests is to check that the
convert_ca_to_lca
function is working correctly and returns
the correct lca code.
library(testthat)
library(createslf)
#>
#> Attaching package: 'createslf'
#> The following object is masked _by_ '.GlobalEnv':
#>
#> get_dd_period
test_that("Can convert ca code to lca code", {
ca <- c(
"S12000033",
"S12000049",
"S12000048",
NA,
"S12345678"
)
expect_equal(
convert_ca_to_lca(ca),
c(
"01",
"17",
"25",
NA,
NA
)
)
})
#> Test passed 🥇