Code refactoring

Writing and documenting functions

November 2025

Nicolas Casajus

Senior data scientist
@FRB-CESAB    

What’s a function?

A function is a block of code that performs a specific task and runs only when it is called. It can take parameters and can return a result.


 Automate common and repetitive tasks

 Increase reproducibility and readability of the code



Advantages

  • You can give a function an evocative name that makes your code easier to understand.
  • As requirements change, you only need to update code in one place, instead of many.
  • You eliminate the chance of making incidental mistakes when you copy and paste.
  • It makes it easier to reuse work from project-to-project, increasing your productivity over time.

When to write a function?



Same code reused twice
(do not repeat yourself)

and/or

Reduce code complexity
(workflow modularity)

and/or

Improve code organisation
(workflow readability)
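
The "do not repeat yourself" rule can be sketched as follows. This is only an illustration: the data frame `df` and the helper `rescale01()` are hypothetical and do not come from the slides.

```r
# Hypothetical data set (for illustration only)
df <- data.frame(a = c(1, 5, 9), b = c(10, 20, 40))

# Copy-paste version: the same expression is written once per column
df$a_scaled <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
df$b_scaled <- (df$b - min(df$b)) / (max(df$b) - min(df$b))

# Function version: the logic lives in one place
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

a_scaled <- rescale01(df$a)
b_scaled <- rescale01(df$b)
```

If the rescaling rule changes, only the body of `rescale01()` needs to be updated.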

Writing a function

## Function definition ----

function_name <- function(input) {
  
  # Body 
  # of the function
  
  return(output)
}
  • A function is defined with function()
  • A function can have an (evocative) name
  • A function can have 0, 1 or many arguments (inputs)
  • A function can return a value (output)

Always save your functions in the folder R/

Writing a function

1. Define the function

## Function definition ----

arithmetic_mean <- function(x) {
  
  y <- sum(x) / length(x)
  
  return(y)
}


2. Save & source the file

## Function load ----

source(here::here("R", "arithmetic-mean.R"))


3. Use (call) the function

## Function call ----

arithmetic_mean(x = c(4, 6, 5, 10))



  Do not forget to source the file each time you modify the function

Exiting a function

  Functions exit in two ways




Success

i.e. return a value w/ return()


Failure

i.e. throw an error w/ stop()
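
A minimal sketch of both exit paths, assuming we add an input check to `arithmetic_mean()` (the check and the error message are illustrative, not part of the original function):

```r
# Hypothetical variant of arithmetic_mean() with an input check
arithmetic_mean <- function(x) {

  # Failure: stop the execution and signal an error
  if (!is.numeric(x)) {
    stop("Argument 'x' must be a numeric vector")
  }

  # Success: return the computed value
  return(sum(x) / length(x))
}

arithmetic_mean(x = c(4, 6, 5, 10))

# The error can be caught with tryCatch() to inspect its message
msg <- tryCatch(
  arithmetic_mean(x = "a"),
  error = function(e) conditionMessage(e)
)
```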

Function return

  A function returns a single object


arithmetic_mean <- function(x) {
  
  y <- sum(x) / length(x)
  
  return(y)
}


  Use list() if you want to return more than one object


function_name <- function(x) {
  
  # ...

  result <- list(
    object_1,
    object_2,
    ...
  )
  
  return(result)
}
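
As a concrete sketch of the template above, the hypothetical helper `summarise_vector()` (a name chosen for illustration) returns several values in one named list:

```r
# Hypothetical helper returning several summary statistics at once
summarise_vector <- function(x) {

  result <- list(
    mean  = sum(x) / length(x),
    range = range(x),
    n     = length(x)
  )

  return(result)
}

stats <- summarise_vector(x = c(4, 6, 5, 10))

# Elements are accessed by name
stats$mean
stats$range
```

Naming the elements of the list makes the output easier to use than positional access such as `stats[[1]]`.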

Function return

  In most cases, you can omit return(): the function then returns the value of the last evaluated expression. This is called an implicit return


Explicit return

arithmetic_mean <- function(x) {
  
  y <- sum(x) / length(x)
  
  return(y)
}

Implicit return

arithmetic_mean <- function(x) {
  
  y <- sum(x) / length(x)
  
  y
}

Implicit return & simplification

arithmetic_mean <- function(x) {
  
  sum(x) / length(x)
}

Function return

  But sometimes, you need an explicit return


Explicit returns

arithmetic_mean <- function(x) {
  
  if (any(is.na(x))) {

    return(0)
  }

  y <- sum(x) / length(x)
  
  return(y)
}

WRONG behaviour

arithmetic_mean <- function(x) {
  
  if (any(is.na(x))) {

    0
  }

  sum(x) / length(x)
}


Explicit and implicit returns

arithmetic_mean <- function(x) {
  
  if (any(is.na(x))) {

    return(0)
  }

  sum(x) / length(x)
}
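
To see why the version without return() is wrong, the two behaviours can be compared side by side. The names `mean_wrong()` and `mean_right()` are hypothetical and only used to keep both definitions available at once.

```r
# Version without return(): the value 0 is evaluated, then discarded
mean_wrong <- function(x) {
  if (any(is.na(x))) {
    0
  }
  sum(x) / length(x)
}

# Version with an explicit early return
mean_right <- function(x) {
  if (any(is.na(x))) {
    return(0)
  }
  sum(x) / length(x)
}

vect <- c(4, NA, 5)

mean_wrong(vect)  # NA: the last expression is still evaluated
mean_right(vect)  # 0: the function exits inside the if block
```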

No return

  Not all functions necessarily return an object


For instance,

  • a function that downloads / saves a file on the disk
  • a function that creates a plot (w/ graphics)
  • a function that modifies a global option, …



  In that case, the function returns the value of its last expression, here NULL (the value returned by write.csv())

# Define the function ----
save_as_csv <- function(data) {

  write.csv(data, "filename.csv", row.names = FALSE)
}

# Create fake dataset ----
tab <- data.frame(x = 1:10, y = letters[1:10])

# Call the function ----
result <- save_as_csv(data = tab)

result
## NULL

Invisible return

  Sometimes, you do not want the returned value to be printed when the call is not assigned to a variable: wrap it in the function invisible()


This behaviour is usually used:

  • when the function returns NULL
  • when the return is not the main purpose of the function



  If you don’t assign the output of the function, it will no longer be accessible afterwards


  With an invisible return, you have to store the output of the function in a variable and print this variable to see it

# Define the function ----
save_as_csv <- function(data) {

  file_name <- here::here("outputs", "filename.csv")
  write.csv(data, file_name, row.names = FALSE)

  invisible(file_name)
}

# Call the function (no assignment, nothing printed) ----
save_as_csv(data = tab)

# Call the function (w/ assignment) ----
path <- save_as_csv(data = tab)

path
## [1] "/home/user/documents/project_a/outputs/filename.csv"
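
Visibility can also be checked without writing any file. The sketch below uses a hypothetical function `square_quietly()` and the base function `withVisible()`, which returns the value together with its visibility flag.

```r
# Hypothetical function with an invisible return
square_quietly <- function(x) {
  invisible(x^2)
}

# Nothing is printed when the call is not assigned
square_quietly(4)

# Wrapping the call in parentheses forces printing
(square_quietly(4))

# withVisible() reports the value and its visibility flag
res <- withVisible(square_quietly(4))
res$value
res$visible
```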

Arguments

  A function can have no argument

# Define the function ----
read_data <- function() {

  file_name <- file.path("data", "filename.csv")

  read.csv(file_name)
}

# Call the function ----
tab <- read_data()

  A function can have one argument

# Define the function ----
read_data <- function(filename) {

  file_name <- file.path("data", filename)

  read.csv(file_name)
}

# Call the function ----
tab <- read_data(filename = "filename.csv")

  A function can have many arguments

# Define the function ----
read_data <- function(filename, path) {

  file_name <- file.path(path, filename)

  read.csv(file_name)
}

# Call the function ----
tab <- read_data(filename = "filename.csv", path = "data")

  Arguments can have a default value

# Define the function ----
read_data <- function(filename, path = "data") {

  file_name <- file.path(path, filename)

  read.csv(file_name)
}

# Call the function ----
tab <- read_data(filename = "filename.csv")
tab <- read_data(filename = "filename.csv", path = "output")
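
A runnable sketch of the default-value mechanism, assuming a small fake CSV file written in a temporary folder (the folder and its content are illustrative):

```r
# Same read_data() as above, with a default value for 'path'
read_data <- function(filename, path = "data") {
  read.csv(file.path(path, filename))
}

# Write a small fake data set in a temporary folder
tmp_dir <- tempdir()
write.csv(data.frame(x = 1:3), file.path(tmp_dir, "filename.csv"),
          row.names = FALSE)

# The default is overridden by passing 'path' explicitly
tab <- read_data(filename = "filename.csv", path = tmp_dir)

# formals() lists the arguments and their default values
formals(read_data)$path
```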

Arguments

  Argument order matters (unless arguments are named)

# Define the function ----
read_data <- function(filename, path) {

  file_name <- file.path(path, filename)

  read.csv(file_name)
}

# Call the function ----
tab <- read_data("filename.csv", "data")
tab <- read_data(path = "data", filename = "filename.csv")
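
The effect of argument order can be seen without reading any file. The helper `build_path()` below is hypothetical: it keeps only the path construction of `read_data()`.

```r
# Hypothetical helper that only builds the path (no file is read)
build_path <- function(filename, path) {
  file.path(path, filename)
}

# Positional matching: values are assigned in the order of definition
build_path("filename.csv", "data")

# Named matching: the order no longer matters
build_path(path = "data", filename = "filename.csv")

# Swapped positional values silently give a wrong result
build_path("data", "filename.csv")
```

Naming arguments in the call avoids this kind of silent mistake.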


  Create pipeable functions

# Define the function ----
subset_data <- function(data, column) {

  data[, column, drop = FALSE]
}

data(iris)

# Call the function ----
subset_data(data = iris, column = "Petal.Length")

# Call the function (native pipe version, R >= 4.1) ----
iris |> 
  subset_data(column = "Petal.Length")

# Call the function (magrittr pipe version) ----
library(magrittr)

iris %>%
  subset_data(column = "Petal.Length")

Environments

  An object created inside a function (local environment) is not accessible from the global environment

# Define the function ----
arithmetic_mean <- function(x) {
  
  y <- sum(x) / length(x)
  y
}

y
# Error: object 'y' not found


  An object created in the global environment is accessible from inside the function

x <- 1:10

# Define the function ----
arithmetic_mean <- function() {
  
  sum(x) / length(x)
}

arithmetic_mean()
# 5.5


  But prefer passing objects as arguments

vect <- 1:10

# Define the function ----
arithmetic_mean <- function(x) {
  
  sum(x) / length(x)
}

arithmetic_mean(x = vect)
# 5.5
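
One more consequence of local environments, sketched with a hypothetical function `modify_x()`: assigning to a name inside a function creates a local object and leaves the global one unchanged.

```r
x <- 1:10

# Hypothetical function that reassigns 'x' locally
modify_x <- function() {
  x <- 0     # creates a local 'x', the global one is untouched
  x
}

modify_x()
x            # still 1:10 in the global environment
```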

Case study

From script to function

  Let’s download the spatial boundary of France

##
## DISTANT FILE
##

# Base URL ----
base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

# File name ----
file_name <- "gadm41_FRA.gpkg"

# Full URL ----
(full_url <- paste0(base_url, file_name))
# https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_FRA.gpkg

##
## LOCAL FILE
##

# Destination folder ----
(dest_dir <- here::here("data"))
# /home/nicolas/projects/demo/data

# Destination file ----
(full_path <- file.path(dest_dir, file_name))
# /home/nicolas/projects/demo/data/gadm41_FRA.gpkg

##
## DOWNLOAD FILE
##

download.file(url = full_url, destfile = full_path, mode = "wb")

  Convert into function - v0.1

download_gadm <- function() {
    
  ##
  ## DISTANT FILE
  ##

  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # File name ----
  file_name <- "gadm41_FRA.gpkg"

  # Full URL ----
  (full_url <- paste0(base_url, file_name))
  # https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_FRA.gpkg

  ##
  ## LOCAL FILE
  ##

  # Destination folder ----
  (dest_dir <- here::here("data"))
  # /home/nicolas/projects/demo/data

  # Destination file ----
  (full_path <- file.path(dest_dir, file_name))
  # /home/nicolas/projects/demo/data/gadm41_FRA.gpkg

  ##
  ## DOWNLOAD FILE
  ##

  download.file(url = full_url, destfile = full_path, mode = "wb")
}

  Clean comments - v0.2

download_gadm <- function() {
    
  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # File name ----
  file_name <- "gadm41_FRA.gpkg"

  # Full URL ----
  full_url <- paste0(base_url, file_name)

  # Destination folder ----
  dest_dir <- here::here("data")

  # Destination file ----
  full_path <- file.path(dest_dir, file_name)

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")
}

  Function return - v0.3

download_gadm <- function() {
    
  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # File name ----
  file_name <- "gadm41_FRA.gpkg"

  # Full URL ----
  full_url <- paste0(base_url, file_name)

  # Destination folder ----
  dest_dir <- here::here("data")

  # Destination file ----
  full_path <- file.path(dest_dir, file_name)

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")

  full_path
}

  Invisible return - v0.4

download_gadm <- function() {
    
  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # File name ----
  file_name <- "gadm41_FRA.gpkg"

  # Full URL ----
  full_url <- paste0(base_url, file_name)

  # Destination folder ----
  dest_dir <- here::here("data")

  # Destination file ----
  full_path <- file.path(dest_dir, file_name)

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")

  invisible(full_path)
}

  Add ‘file_name’ argument - v0.5

download_gadm <- function(file_name = "gadm41_FRA.gpkg") {
    
  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # Full URL ----
  full_url <- paste0(base_url, file_name)

  # Destination folder ----
  dest_dir <- here::here("data")

  # Destination file ----
  full_path <- file.path(dest_dir, file_name)

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")

  invisible(full_path)
}

  Rename ‘file_name’ argument - v0.6

download_gadm <- function(country = "FRA") {
    
  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # File name ----
  file_name <- paste0("gadm41_", country, ".gpkg")

  # Full URL ----
  full_url <- paste0(base_url, file_name)

  # Destination folder ----
  dest_dir <- here::here("data")

  # Destination file ----
  full_path <- file.path(dest_dir, file_name)

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")

  invisible(full_path)
}

  Add ‘dest_dir’ argument - v0.7

download_gadm <- function(
  country = "FRA", 
  dest_dir = here::here("data")
) {
    
  # Base URL ----
  base_url <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"

  # File name ----
  file_name <- paste0("gadm41_", country, ".gpkg")

  # Full URL ----
  full_url <- paste0(base_url, file_name)

  # Destination file ----
  full_path <- file.path(dest_dir, file_name)

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")

  invisible(full_path)
}

  Easy to use now!

download_gadm(country = "FRA")
download_gadm(country = "MEX", dest_dir = here::here("data", "gadm"))

  To go further:

  • Assertive programming (i.e. check arguments)
  • Create destination directory (or throw an error)
  • Check if file has been already downloaded
  • Check for valid URL, …
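
A possible next version covering the first three points is sketched below. It is not the author's implementation: the `overwrite` argument, the error messages, and the assumption that `country` is a three-letter upper-case code are all illustrative choices.

```r
download_gadm <- function(
  country   = "FRA",
  dest_dir  = here::here("data"),
  overwrite = FALSE   # hypothetical argument, not in v0.7
) {

  # Check arguments (assumes a three-letter upper-case code) ----
  if (!is.character(country) || length(country) != 1 ||
      !grepl("^[A-Z]{3}$", country)) {
    stop("Argument 'country' must be a three-letter upper-case code")
  }

  if (!is.character(dest_dir) || length(dest_dir) != 1) {
    stop("Argument 'dest_dir' must be a character of length 1")
  }

  # Create destination folder (if required) ----
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)

  # Build URL and local path ----
  base_url  <- "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/"
  file_name <- paste0("gadm41_", country, ".gpkg")
  full_url  <- paste0(base_url, file_name)
  full_path <- file.path(dest_dir, file_name)

  # Skip download if the file is already there ----
  if (file.exists(full_path) && !overwrite) {
    message("File already downloaded: ", full_path)
    return(invisible(full_path))
  }

  # Download file ----
  download.file(url = full_url, destfile = full_path, mode = "wb")

  invisible(full_path)
}
```

Because the checks come first, an invalid call fails before any folder is created or any download is attempted.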

Documenting a function

  • Specially-structured comments preceding each function definition
  • Lightweight syntax easy to write and to read
  • Syntax: #' @field value
  • Keep function definition and documentation in the same file
  • Automatically write .Rd files (in man/) and NAMESPACE w/ devtools::document()

 Get started w/ roxygen2: see vignette("roxygen2", package = "roxygen2")

#' Compute the arithmetic mean
#'
#' @description
#' This function computes the arithmetic mean of a numeric variable.
#'
#' @param x a `numeric` vector
#'
#' @return A `numeric` value representing the arithmetic mean of `x`.
#'
#' @export
#'
#' @examples
#' x <- 1:10
#' arithmetic_mean(x)
arithmetic_mean <- function(x) {
  
  sum(x) / length(x)
}
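
The same syntax extends to functions with several arguments: one `@param` tag per argument. The sketch below documents the `read_data()` function from the Arguments slides; the wording of the descriptions is illustrative.

```r
#' Read a CSV file
#'
#' @description
#' This function reads a CSV file stored in a given folder.
#'
#' @param filename a `character` of length 1. The name of the CSV file.
#'
#' @param path a `character` of length 1. The folder containing the file.
#'   Default is `"data"`.
#'
#' @return A `data.frame`.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' tab <- read_data(filename = "filename.csv")
#' }
read_data <- function(filename, path = "data") {

  file_name <- file.path(path, filename)

  read.csv(file_name)
}
```

The example is wrapped in `\dontrun{}` because it depends on a file that may not exist when the documentation is checked.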

Resources

  • Wickham H (2023) Chap. 25: Functions. In R for Data Science, (2nd ed.). Online book
  • Wickham H (2019) Chap. 6: Functions. In Advanced R, (2nd ed.). Online book
  • Wickham H & Bryan J (2023) Chap. 16: Function documentation. In R Packages, (2nd ed.). Online book