class: right, middle, title-slide .title[ #
Building an R package ] .subtitle[ ##
Become an R developer
] .author[ ###
Nicolas Casajus & Aurélie Siberchicot ] .date[ ### .inst[November 2023] ] --- ## What's an R Package? --
In
, the fundamental unit of shareable code is the package. A package bundles together `code`, `data`, `documentation`, and `tests`, and is easy to share with others. .right[— **_Hadley Wickham_**] -- <br /> An
package: - is a collection of `well-documented functions` - makes your work more `reproducible` - makes your code `useful` for you and for others -- <br /> As of today (2023-11-22): - **20065** packages are available on the [`CRAN`](https://cran.r-project.org) - **2266** packages on `Bioconductor` --- ## Must-read resources <br /> .center[ [![:scale 36%](img/r-pkg.jpg)](https://r-pkgs.org/) [![:scale 34%](img/r-ext.png)](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) ] --- ## Recommended environment <br /> .center[ ![:scale 40%](img/rstudio.png) ] .center[ ![:scale 13.0%](img/hex-devtools.png) ![:scale 13.0%](img/hex-usethis.png) ![:scale 13.0%](img/hex-roxygen2.png) ![:scale 13.0%](img/hex-rmarkdown.png) ![:scale 13.0%](img/hex-testthat.png) ] .center[ ![:scale 28.0%](img/up-to-date.png) ] --- ## Development workflow <br /> ![:scale 100.0%](img/pkg-steps.png) --- ## Creating the structure <br /> ![:scale 100.0%](img/pkg-step-1.png) --- ## Creating the structure <!-- - Using **RStudio** --> ![:scale 10%](img/rstudio.png) .center[ ![:scale 32%](img/new-1.png) ![:scale 32%](img/new-2.png) ![:scale 32%](img/new-3.png) ]
Make sure to select `Create R Package using devtools`
A **package name** can only contain `letters`, `numbers`, and the `.`
Check that the chosen name is not already in use with `available::available()` -- <br /> ```r ## Alternatively ---- usethis::create_package("/absolute/path/to/the/package/name") ``` --- ## Package structure ``` . ├── (.git) # Git files system ├── (.gitignore) # Untracked files by git │ ├── (mypkg.Rproj) # RStudio files │ ├── .Rbuildignore # List of non-standard package files │ ├── R/ # Folder to store (only) R functions │ ├── myfun-1.R # A first R function │ └── myfun-2.R # A second R function │ ├── man/ # R functions documentation folder (automatically edited) │ ├── my_fun_1.Rd │ └── my_fun_2.Rd │ ├── DESCRIPTION # Package metadata │ └── NAMESPACE # Automatically edited ``` --- ## Writing an R function <br /> ![:scale 100.0%](img/pkg-step-2.png) --- ## Writing an R function
Let's create a first function `moyenne()`. -- <br /> - First, we will create a new
file in the folder **R/**. ```r usethis::use_r("moyenne") ``` -- <br /> - Now we can implement our function `moyenne()`. ```r moyenne <- function(x) sum(x) / length(x) ```
Resources: [`Tidyverse style guide`](https://style.tidyverse.org/) --- ## Time to document <br /> ![:scale 100.0%](img/pkg-step-3.png) --- ## Time to document .pull-leftt[ .center[[![:scale 75%](img/hex-roxygen2.png)](https://cran.r-project.org/web/packages/roxygen2/vignettes/roxygen2.html)] ] .pull-rightt[ - Specially-structured comments **preceding** each function definition - Lightweight syntax easy to write and to read - Syntax: `#' @field value` - Keep function definition and documentation in the same file - Automatically write `.Rd` files and **NAMESPACE** ] -- Each `roxygen` header will always start with these two fields: ```r #' @title Short title of the function (one line) #' #' @description A longer description of what the function does (several lines) ``` --
Keywords `@title` and `@description` can be omitted. ```r #' Short title of the function (one line) #' #' A longer description of what the function does (several lines) ``` --- ## Time to document
If your function has `input parameters`, each one must be documented. ```r #' @param param_1 description #' @param param_2 description ``` -- In our function `moyenne()`: ```r #' @param x a numeric vector ``` -- <br />
If your function `returns` an
object, use the keyword `@return`. ```r #' @return What the function returns. ``` -- In our function `moyenne()`: ```r #' @return A `numeric` representing the arithmetic mean of `x`. ``` -- If your function returns nothing, use the keyword `@return` as follow: ```r #' @return No return value. ``` --- ## Time to document
Add a section `@examples` to show how to use your function: ```r #' @examples #' x <- 1:10 #' moyenne(x) ``` -- <br /> If you don't want your example to be executed, use `\dontrun{}` when your example returns an error or does something that is dangerous ```r #' @examples #' \dontrun{ #' x <- 1:10 #' moyenne(x) #' } ``` or `\donttest{}` when your example takes too long (for the CRAN standards). ```r #' @examples #' \donttest{ #' moyenne(runif(10e7)) #' } ``` --- ## Time to document
Finally if you want your function to be used directly by user you need to add this tag. ```r #' @export ``` --- ## Time to document Back to the R file of our function ```r #' Compute the arithmetic mean #' #' This function computes the arithmetic mean of a numeric variable. #' #' @param x a numeric vector #' #' @return A `numeric` representing the arithmetic mean of `x`. #' #' @export #' #' @examples #' x <- 1:10 #' moyenne(x) moyenne <- function(x) sum(x) / length(x) ``` --- ## Generating the documentation
It's time to generate the corresponding `.Rd` file from this `roxygen` header.
All `.Rd` files are stored in the `man` folder. ```r devtools::document() ``` ``` ✓ Writing 'man/moyenne.Rd' ✓ Writing 'NAMESPACE' ``` -- <br />
In addition to the creation of `man/moyenne.Rd` file, the `NAMESPACE` has been updated.
This file lists which functions need to be exported, i.e. directly usable when loading the package (this file also deals with external dependencies). ```r # Generated by roxygen2: do not edit by hand export(moyenne) ``` --- ## Testing the function <br /> ![:scale 100.0%](img/pkg-step-4.png) --- ## Testing the function
Before going any further, we have to try our function. So we will load our package (and **NOT** sourcing the function). ```r devtools::load_all() ``` -- <br /> Now we can use our function: ```r moyenne(c(1, 2)) ``` ``` ## [1] 1.5 ``` -- <br /> What about `NA`? ```r moyenne(c(1, 2, NA)) ``` ``` ## [1] NA ``` ***Hum...*** --- ## Modifying the function Our function does not seem to work properly.
We need to update the code to deal with `NA` values. -- ```r moyenne <- function(x) { x <- na.omit(x) sum(x) / length(x) } ``` -- <br /> Let's test the function again: ```r ## Reload the function ---- devtools::load_all() ## Testing the function ---- moyenne(c(1, 2)) ## [1] 1.5 ## Testing the function (with NA) ---- moyenne(c(1, 2, NA)) ## [1] 1.5 ``` --- ## Modifying the function That's better, but... If user has `NA` values, this implementation will not inform him/her and will make the decision to remove `NA`. Instead we are going to let user choose to delete the `NA` or not. -- <br />
Let's add an additional parameter to our function: `na_rm` with a default value (`FALSE`). If `x` contains `NA` values and `na_rm = FALSE`, then an error will be returned. If `na_rm = TRUE`, `NA` values will be removed and the computation can be done. -- ```r moyenne <- function(x, na_rm = FALSE) { if (any(is.na(x))) { if (na_rm) { x <- na.omit(x) } else { stop("Argument 'x' contains NA values. Use 'na_rm = TRUE' to remove missing values.") } } sum(x) / length(x) } ``` --- ## Modifying the function Let's test the function again. ```r ## Reload the function ---- devtools::load_all() ## Testing the function ---- moyenne(x = c(1, 2)) ## [1] 1.5 moyenne(x = c(1, 2, NA)) ## Error in moyenne(x = c(1, 2, NA)): ## Argument 'x' contains NA values. Use 'na_rm = TRUE' to remove missing values. moyenne(x = c(1, 2, NA), na_rm = TRUE) ## [1] 1.5 ``` --- ## Update documentation <br /> ![:scale 100.0%](img/pkg-step-3.png) --- ## Update documentation ```r #' Compute the arithmetic mean #' #' This function computes the arithmetic mean of a numeric variable. #' #' @param x a numeric vector (can contain `NA` values). #' #' @param na_rm a logical value indicating whether `NA` values should be #' stripped before the computation proceeds. Default is `FALSE`. #' #' @return A `numeric` representing the arithmetic mean of `x`. #' #' @details An error will be returned if `x` contains `NA` values and `na_rm` #' is `FALSE` (default behaviour). #' #' @export #' #' @examples #' moyenne(x = c(1, 2)) #' #' \dontrun{ #' moyenne(x = c(1, 2, NA)) # error #' } #' #' moyenne(x = c(1, 2, NA), na_rm = TRUE) moyenne <- function(x, na_rm = FALSE) { ... } ``` And update the corresponding `.Rd` file and the `NAMESPACE`. ```r devtools::document() ``` --- ## And so on... <br /> ![:scale 100.0%](img/pkg-step-5.png) --- ## Package metadata Before we go any further, we need to edit some information about our package, using the `DESCRIPTION` file: - what defines our package (name, title, description, version, authors)? - who can use our package and how (licence)? - who should be contacted if there is a problem or question (the maintainer has the `cre` role)? ```r usethis::edit_file("DESCRIPTION") ``` ``` Package: mypkg Title: A Minimal but Complete R Package Version: 0.0.0.9000 Authors@R: person(given = "Nicolas", family = "Casajus", role = c("aut", "cre", "cph"), email = "nicolas.casajus@fondationbiodiversite.fr", comment = c(ORCID = "0000-0002-5537-5294")) Description: Illustrates the main structure and components of an R Package with respect to the CRAN submission policies <https://cran.r-project.org/>. License: GPL (>= 2) Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.1.2 ```
Resources: [`DESCRIPTION file`](https://r-pkgs.org/description.html) and [`Choose a license`](https://choosealicense.com) --- ## Check package <br /> ![:scale 100.0%](img/pkg-step-6.png) --- ## Check package ```r devtools::check() ``` ``` ── R CMD check results ─────────────────────────────────── mypkg 0.0.0.9000 ─────── Duration: 11.6s > checking R code for possible problems ... NOTE moyenne: no visible global function definition for ‘na.omit’ Undefined global functions or variables: na.omit Consider adding importFrom("stats", "na.omit") to your NAMESPACE file. 0 errors ✓ | 0 warnings ✓ | 1 note x ``` -- <br />
**1 note**: Let's talk about package dependencies! --- ## Package dependencies ``` > checking R code for possible problems ... NOTE moyenne: no visible global function definition for ‘na.omit’ Undefined global functions or variables: na.omit Consider adding importFrom("stats", "na.omit") to your NAMESPACE file. ``` <br />
So we need to import the function `na.omit()` from the package `{stats}`. In `roxygen` header, import: - either the whole package, with the tag `@import`: ```r #' @import stats ``` - or only specific functions from a package, with the tag `@importFrom`: ```r #' @importFrom stats na.omit ``` --- ## Package dependencies
In your R code, call external functions with `::` for clarity and efficiency.<br /> Here `stats::na.omit()` ```r moyenne <- function(x, na_rm = FALSE) { if (any(is.na(x))) { if (na_rm) { x <- stats::na.omit(x) } else { stop("Argument 'x' contains NA values. Use 'na_rm = TRUE' to remove missing values.") } } sum(x) / length(x) } ``` --- ## Package dependencies
**Do not forget** to update the `NAMESPACE` (and the `.Rd` files) with `devtools::document()`. ``` export(moyenne) importFrom(stats,na.omit) ``` --
The `NAMESPACE` controls what happens when our package is loaded but not when it's installed. This is the role of the `DESCRIPTION` file and we need to add dependencies to this file. --- ## Package dependencies
Let's add the dependency `stats` in the `DESCRIPTION` file. ```r usethis::use_package("stats", type = "Imports") ``` <br /> ``` Package: mypkg Title: A Minimal but Complete R Package Version: 0.0.0.9000 Authors@R: person(given = "Nicolas", family = "Casajus", role = c("aut", "cre", "cph"), email = "nicolas.casajus@fondationbiodiversite.fr", comment = c(ORCID = "0000-0002-5537-5294")) Description: Illustrates the main structure and components of an R Package with respect to the CRAN submission policies <https://cran.r-project.org/>. License: GPL (>= 2) Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.1.2 Imports: stats ``` --- ## Dependencies types - `Depends`: packages listed in this field are `installed` when your package is installed and are `attached` when your package is attached. **NEVER** use `Depends` and always use `Imports` (except for special cases). -- - `Imports`: packages listed in this field are `installed` when your package is installed but are `not attached` when your package is attached (there are just loaded). **ALWAYS** use this method. -- - `Suggests`: packages listed in this field are `not installed` when your package is installed. Your package can use these packages, but doesn't require them (e.g. to run tests, build vignettes, etc.). -- <br/>
Resources: Wickham & Bryan - R Packages [**Chap 9**](https://r-pkgs.org/description.html), [**Chap 10**](https://r-pkgs.org/dependencies-mindset-background.html) and [**Chap 11**](https://r-pkgs.org/dependencies-in-practice.html) --- ## Check package <br /> ![:scale 100.0%](img/pkg-step-6.png) --- ## Check package ```r devtools::check() ``` ``` ── R CMD check results ─────────────────────────────────── mypkg 0.0.0.9000 ─────── Duration: 11.6s 0 errors ✓ | 0 warnings ✓ | 0 notes ✓ ``` --- ## Install package <br /> ![:scale 100.0%](img/pkg-step-7.png) --- ## Install package ```r ## Install package ---- devtools::install() ``` -- <br />
Now we can use our package. ```r ## Load and attach the package ---- library("mypkg") ## Use the package ---- moyenne(c(1, 2)) ``` ```r ## Use the package (without attaching it) ---- mypkg::moyenne(c(1, 2)) ``` --- class: inverse, center, middle ## To go further... --- ## Advanced tests - Testing is a vital part of package development - But until now we just tried our code informally and on the fly - Problem: it's time consuming, repetitive and it can break the code
Package `{testthat}` .pull-leftt[ .center[[![:scale 75%](img/hex-testthat.png)](https://testthat.r-lib.org)] ] .pull-rightt[ - Implements a lot of unit tests - Formal automated testing - Explicits how your code should behave - Makes your code more robust ] ```r usethis::use_testthat() ``` <br />
Resources: [**R Packages Chap 13**](https://r-pkgs.org/testing-basics.html) --- ## Add a vignette .pull-leftt[ .center[[![:scale 75%](img/hex-rmarkdown.png)](https://rmarkdown.rstudio.com/)] ] .pull-rightt[ - A tutorial for your package - Shows how to use your package - Uses the syntax `Rmarkdown` ] ```r usethis::use_vignette("mypkg") ``` <br />
Resources: [**R Package Chap 17**](https://r-pkgs.org/vignettes.html) --- ## And... - Deploy on **GitHub** - `usethis::use_github()` - Add a **README** (and badges) - `usethis::use_readme_rmd()` - Add a **Website** with `pkgdown` - Add a **Logo** with `hexSticker` - CI/CD with **GitHub Actions** - `usethis::use_github_action_*()` - Check your package with `rhub` - Add a **DOI** with Zenodo - Add a **NEWS** file - `usethis::use_news_md()` <br /> - Submit to CRAN - `R CMD check pkg --as-cran` - on the current version of R-devel - on at least two OS - blank report: `0 errors | 0 warnings | 0 notes` <br /> The ultimate resources: [**https://r-pkgs.org**](https://r-pkgs.org) and [**Writing R Extensions**](https://cran.r-project.org/doc/manuals/r-release/R-exts.html)