# The lapply() family

This question on Reddit got me thinking about the `lapply()` family of functions, and how a beginner might want to learn about them. Here is my take.

# Introduction

The easiest one to understand is `lapply()`, so I’ll work through that and then extend to the others. As an aside, the programming term for this is vectorising, as it allows us to perform an action over an entire vector (or list, in R) at once.

Ignoring the dots, `lapply()` takes two arguments, `X` and `FUN`. `FUN` is the function to apply, and `X` is a list of objects. When I say list, this could be an actual list, as created by the `list()` function, or a vector such as `1:10`. But if you try to put in something more complicated, like a `data.frame`, you can get unexpected results (I’ll come back to this).
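As a quick, minimal illustration of those two arguments (the inputs here are made up for the example), `lapply()` accepts either a plain vector or an actual list as `X`, and hands back a list with one result per element:

```r
# X can be a plain vector...
from_vector <- lapply(1:3, sqrt)

# ...or an actual list created with list()
from_list <- lapply(list(1, 4, 9), sqrt)

is.list(from_vector)   # TRUE: the result is a list
length(from_vector)    # 3: one result per element of X
from_list[[3]]         # 3, the square root of 9
```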

# lists

So, let’s say we have

``````X <- list(1:10,11:20,21:30)
X``````
``````## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
##
## [[2]]
##  [1] 11 12 13 14 15 16 17 18 19 20
##
## [[3]]
##  [1] 21 22 23 24 25 26 27 28 29 30``````

This list has three elements, and each element consists of a vector of 10 numbers. We can access them using `[[`, where `X[[1]]` will return the first element, the numbers 1 to 10:

``X[[1]]``
``##  [1]  1  2  3  4  5  6  7  8  9 10``

`X[[2]]` will return the second element, and so on. This is extraction, as it extracts an element from a list. Extraction can only bring out a single element. We can also subset using `[`, for example:

``X[1:2]``
``````## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
##
## [[2]]
##  [1] 11 12 13 14 15 16 17 18 19 20``````

This returns the first and second elements. `X[1]` will return a subset consisting of the first element.

``X[1]``
``````## [[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10``````

What’s the difference between `X[1]` and `X[[1]]`? Well, `X[1]` returns a list, which is just 1 element long, that element being a vector of the numbers from 1 to 10. `X[[1]]` returns the actual element at position 1.

``length(X[1])``
``## [1] 1``
``length(X[[1]])``
``## [1] 10``
``class(X[1])``
``## [1] "list"``
``class(X[[1]])``
``## [1] "integer"``

So `X[1]` is a list, just like `X` but is shorter, a subset, just like how `X[1:2]` is a subset with length 2. Whereas `X[[1]]` is the first element of `X`. This is clearer if we try to add something to these two objects:

``X[[1]] + 3``
``##  [1]  4  5  6  7  8  9 10 11 12 13``
``X[1] + 3``
``## Error in X[1] + 3: non-numeric argument to binary operator``

Again, to stress the point: `X[1]` is not a number, it is a list containing a single element. Since `X[1]` is a list, we can therefore extract that first element from it:

``X[1][[1]]``
``##  [1]  1  2  3  4  5  6  7  8  9 10``

The difference between a list and a vector is that the entries of a vector all have to be of the same type (e.g. all characters as in `c("a","b","c")` or all numbers as in `c(1,2,3)`). The `c()` function will coerce them otherwise, so `c(1,"2",3)` will coerce to characters. But the elements of a list can all be different, so `list("hello",2,1:10)` has three elements of three different types. In fact, lists can contain lists (nested lists):

``````Y <- list("hello",1:10,list("one","two","three"))
Y``````
``````## [[1]]
## [1] "hello"
##
## [[2]]
##  [1]  1  2  3  4  5  6  7  8  9 10
##
## [[3]]
## [[3]][[1]]
## [1] "one"
##
## [[3]][[2]]
## [1] "two"
##
## [[3]][[3]]
## [1] "three"``````

`Y` has three elements. If you extract the third element,

``Y[[3]]``
``````## [[1]]
## [1] "one"
##
## [[2]]
## [1] "two"
##
## [[3]]
## [1] "three"``````

you get another list. If you subset the third element,

``Y[3]``
``````## [[1]]
## [[1]][[1]]
## [1] "one"
##
## [[1]][[2]]
## [1] "two"
##
## [[1]][[3]]
## [1] "three"``````

you get a list with 1 element.

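As far as nomenclature is concerned, R actually treats a list as a kind of vector (a “generic” vector). What I’ve been calling a vector is strictly an atomic vector, which has the requirement that all entries be of the same type. You can even use extraction on an atomic vector,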

``````x <- 1:10
x[3]``````
``## [1] 3``
``x[[3]]``
``## [1] 3``

When dealing with a vector, though, the second version is much less common. The reason this works is that extraction and subsetting are essentially the same thing in a vector (because the result will always be a vector, it just might be of length 1).
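To make the list-versus-vector distinction concrete, here is a small sketch (the objects `mixed` and `v` are made up for illustration). It shows `c()` coercing to a single type while `list()` keeps each type, and one small way in which `[` and `[[` do differ on a vector: `[[` drops the name.

```r
# c() coerces everything to a single type...
class(c(1, "2", 3))        # "character"

# ...whereas list() keeps each element's own type
mixed <- list("hello", 2, 1:10)
types <- lapply(mixed, class)
types                      # "character", "numeric", "integer"

# On a named vector, `[` keeps the name but `[[` drops it
v <- c(a = 1, b = 2)
names(v["a"])              # "a"
names(v[["a"]])            # NULL
```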

# lapply()

So, now that we know what a list is, we can look at what `lapply()` does to that list. If we supply a function, `lapply()` will run that function on every element in that list. The simplest example would be, using `X` from above, `lapply(X,mean)` will return a list with the `mean()` of every element in `X`.

``lapply(X,mean)``
``````## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 15.5
##
## [[3]]
## [1] 25.5``````

Remember that the elements in `X` are the vectors of numbers, `1:10`, `11:20` and `21:30`. We’ve applied the function to the list: a list-apply.

The function doesn’t have to be one that is named; we can supply a function in-line:

``lapply(X, function(x) mean(x-5.5))``
``````## [[1]]
## [1] 0
##
## [[2]]
## [1] 10
##
## [[3]]
## [1] 20``````

This applies the function `function(x) mean(x-5.5)` to every element in `X`. You could define this function outside of the `lapply()` function earlier, but there is no need if this is the only place we plan on using it.

From `R 4.1` onwards, this is even easier with the shorthand `\()` syntax.

``lapply(X, \(x) mean(x - 5.5))``
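As a small check that these are just three spellings of the same thing, the sketch below compares a named helper (`centre()` is a made-up name for this example), an in-line `function(x)`, and the `\(x)` shorthand, which needs R 4.1 or later:

```r
X <- list(1:10, 11:20, 21:30)

# A named helper defined up front (centre() is a hypothetical name)
centre <- function(x) mean(x - 5.5)

by_name   <- lapply(X, centre)
anonymous <- lapply(X, function(x) mean(x - 5.5))
shorthand <- lapply(X, \(x) mean(x - 5.5))   # needs R >= 4.1

identical(by_name, anonymous)   # TRUE
identical(by_name, shorthand)   # TRUE
```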

So running `lapply(X,FUN)` is essentially the same as running the following `for()` loop:

``````output <- vector("list", length(X))
for (i in seq_along(X)) {
  # seq_along(X) is safer than 1:length(X) when X is empty
  output[[i]] <- FUN(X[[i]])
}``````

Compare the previous code to this:

``````output <- vector("list", length(X))
for (i in seq_along(X)) {
  output[[i]] <- mean(X[[i]])
}
output``````
``````## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 15.5
##
## [[3]]
## [1] 25.5``````

Notice that I’ve defined `output <- vector("list",length(X))` before running the `for()` loop. This line makes an empty list of the defined length. This will come up again when we move on from `lapply()`.

# dots

One part of `lapply()` that I’ve ignored is the `...` (dots) argument. These are other arguments that you want passed on to your function. Whatever is in the dots will be passed to every call to `FUN`, whether named or not:

``lapply(c("a","b","c"),paste,"2")``
``````## [[1]]
## [1] "a 2"
##
## [[2]]
## [1] "b 2"
##
## [[3]]
## [1] "c 2"``````
``lapply(list(1:10, c(1,2,NA,4), 21:30), mean, na.rm=TRUE)``
``````## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 2.333333
##
## [[3]]
## [1] 25.5``````

Essentially, this runs the following loop:

``````X <- list(1:10, c(1,2,NA,4), 21:30)
output <- vector("list", 3)
for (i in 1:3) {
  # na.rm=TRUE is what the dots pass on to every call of mean()
  output[[i]] <- mean(X[[i]], na.rm=TRUE)
}
output``````
``````## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 2.333333
##
## [[3]]
## [1] 25.5``````
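Putting the loop and the dots together, a hand-rolled version might look like the sketch below. `my_lapply()` is a hypothetical name used only for illustration; the real `lapply()` is implemented internally in C, so this is a model of its behaviour under simple assumptions rather than its actual source.

```r
# my_lapply() is a hypothetical stand-in, not how base R implements lapply()
my_lapply <- function(X, FUN, ...) {
  output <- vector("list", length(X))
  for (i in seq_along(X)) {
    # Assign with output[i] <- list(...) so that a NULL result is stored
    # rather than deleting the element
    output[i] <- list(FUN(X[[i]], ...))
  }
  names(output) <- names(X)   # carry over any names from the input
  output
}

X <- list(1:10, c(1, 2, NA, 4), 21:30)
identical(my_lapply(X, mean, na.rm = TRUE),
          lapply(X, mean, na.rm = TRUE))   # TRUE
```

Anything supplied after `FUN` is caught by `...` in `my_lapply()` and forwarded unchanged to each call, which is all the dots are doing.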

Hopefully that will be enough to understand `lapply()`. One unusual case is using `lapply()` on a `data.frame`-like structure. A `data.frame` looks like a table, but it’s actually a list, and the way it is stored is counter-intuitive: each element of the list is a column in the `data.frame`. So, if you run the following, you get a result that is only 4 elements long:

``````iris0 <- iris[,1:4]
lapply(iris0,mean)``````
``````## $Sepal.Length
## [1] 5.843333
##
## $Sepal.Width
## [1] 3.057333
##
## $Petal.Length
## [1] 3.758
##
## $Petal.Width
## [1] 1.199333``````

You might think that this would work across the rows of the `data.frame`, but it works down the columns. Also note that these outputs are now also named the same as the input list. This can be useful for keeping track of your inputs and outputs.
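The same behaviour is easy to see on a tiny example. The `data.frame` below is made up for illustration; note that `length()` reports the number of columns, and the names of the result come from the column names:

```r
# A small made-up data.frame for illustration
df <- data.frame(height = c(150, 160, 170), weight = c(55, 65, 75))

length(df)   # 2: the number of columns, not rows
names(df)    # "height" "weight"

col_means <- lapply(df, mean)
col_means$height   # 160
col_means$weight   # 65
```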

# apply()

This brings us on to `apply()`.

The `apply()` function does a similar job, but rather than working on lists it works on multi-dimensional objects, so matrices and arrays. It collapses a multi-dimensional object down by one (or more) of its dimensions, so it turns a matrix into a vector (or an array into a smaller array). As well as `X` (which should be multi-dimensional, so not a plain list) and `FUN`, it also takes `MARGIN`, which tells `apply()` which dimension(s) to keep: `1` applies `FUN` to each row, `2` to each column, and the remaining dimensions are collapsed:

``````M <- matrix(1:9,nrow=3)
apply(M,1,mean) #takes the mean of each row``````
``## [1] 4 5 6``
``apply(M,2,mean) #takes the mean of each column``
``## [1] 2 5 8``
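The array case works the same way. In this sketch (the array `A` is made up for the example), `MARGIN = c(1, 2)` calls the function once for every combination of the first two indices, so the third dimension is the one that disappears, while `MARGIN = 3` calls it once per slice:

```r
# A 2 x 3 x 4 array filled with 1..24
A <- array(1:24, dim = c(2, 3, 4))

# One sum per (row, column) pair, taken over the third dimension
by_cell <- apply(A, c(1, 2), sum)
dim(by_cell)     # 2 3
by_cell[1, 1]    # 40, i.e. 1 + 7 + 13 + 19

# One sum per 2 x 3 slice along the third dimension
by_slice <- apply(A, 3, sum)
by_slice         # 21 57 93 129
```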

The type of the result depends on what `FUN` returns (above, the integer matrix gave numeric means), and once again `apply()` can take other arguments as dots. It also works quite well with character matrices:

``````M <- matrix(letters[1:9],nrow=3)
apply(M,1,paste0,collapse="") #pastes across the rows``````
``## [1] "adg" "beh" "cfi"``
``apply(M,2,paste0,collapse="") #pastes down the columns``
``## [1] "abc" "def" "ghi"``

This means we can use `apply()` on a `data.frame` to work across the rows, rather than down the columns. In this case, even though a `data.frame` is a list, `apply()` converts it to a matrix first (with `as.matrix()`), so it still works:

``apply(iris0,1,mean)``
``````##   [1] 2.550 2.375 2.350 2.350 2.550 2.850 2.425 2.525 2.225 2.400 2.700 2.500
##  [13] 2.325 2.125 2.800 3.000 2.750 2.575 2.875 2.675 2.675 2.675 2.350 2.650
##  [25] 2.575 2.450 2.600 2.600 2.550 2.425 2.425 2.675 2.725 2.825 2.425 2.400
##  [37] 2.625 2.500 2.225 2.550 2.525 2.100 2.275 2.675 2.800 2.375 2.675 2.350
##  [49] 2.675 2.475 4.075 3.900 4.100 3.275 3.850 3.575 3.975 2.900 3.850 3.300
##  [61] 2.875 3.650 3.300 3.775 3.350 3.900 3.650 3.400 3.600 3.275 3.925 3.550
##  [73] 3.800 3.700 3.725 3.850 3.950 4.100 3.725 3.200 3.200 3.150 3.400 3.850
##  [85] 3.600 3.875 4.000 3.575 3.500 3.325 3.425 3.775 3.400 2.900 3.450 3.525
##  [97] 3.525 3.675 2.925 3.475 4.525 3.875 4.525 4.150 4.375 4.825 3.400 4.575
## [109] 4.200 4.850 4.200 4.075 4.350 3.800 4.025 4.300 4.200 5.100 4.875 3.675
## [121] 4.525 3.825 4.800 3.925 4.450 4.550 3.900 3.950 4.225 4.400 4.550 5.025
## [133] 4.250 3.925 3.925 4.775 4.425 4.200 3.900 4.375 4.450 4.350 3.875 4.550
## [145] 4.550 4.300 3.925 4.175 4.325 3.950``````

This gives a vector of the averages of each row.
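One thing to watch for is that `apply()` works on the matrix version of the `data.frame`, and a matrix can only hold a single type. The sketch below uses a made-up `data.frame` with one numeric and one character column to show what reaches `FUN` in that case:

```r
# A made-up data.frame with one numeric and one character column
df <- data.frame(n = 1:2, s = c("a", "b"))

# The matrix version can only hold one type, so the numbers become strings
is.character(as.matrix(df))   # TRUE

# Each row therefore reaches FUN as a character vector
row_classes <- unname(apply(df, 1, class))
row_classes                   # "character" "character"
```

This is why `apply()` over rows is safest when every column has the same type, as with the four numeric columns of `iris0`.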

# Rest of the family

Now for `lapply()`’s sisters:

`vapply()` takes an extra argument, `FUN.VALUE`, which is a template of the same type and length as what you want each result to be. This is the one that I use most often. You can think of it like a `lapply()` that will output something other than a list. I usually give `FUN.VALUE` as something like `integer(1)` or `character(1)`. These functions generate a vector of that type and length (filled with `0` or `""`); they are wrappers around things like `vector("integer",1)`.

``````X <- list(1:10,11:20,21:30)
vapply(X,mean,numeric(1))``````
``## [1]  5.5 15.5 25.5``

This time, we get a numeric vector, rather than a list like we would with `lapply()`. I find this much easier to ensure I’m working with the correct type of data.
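The type check is strict. As a minimal sketch, asking for `integer(1)` when the function returns a double makes `vapply()` stop with an error, which is caught here with `tryCatch()` so the example keeps running (the message string is my own, not R's):

```r
X <- list(1:10, 11:20, 21:30)

# mean() returns a double, so asking for integer(1) fails
res <- tryCatch(
  vapply(X, mean, integer(1)),
  error = function(e) "vapply() refused the result"
)
res   # "vapply() refused the result"

# length() does return a single integer, so this is fine
lens <- vapply(X, length, integer(1))
lens  # 10 10 10
```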

`sapply()` tries to simplify your output. So if `lapply()` would output a list of vectors that are all the same length, then instead of a list, `sapply()` will return a matrix:

``````X <- list(1:5, 6:10, 11:15)
sapply(X,range)``````
``````##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    5   10   15``````

Each column in this result is the same as one of the elements of the list `lapply(X,range)`. They’ve just been `cbind`’d together. `sapply()` is often avoided in scripts and packages as the output can be inconsistent. `vapply()` is much preferred as it gives more control over the output. The above can be replicated with `vapply()`, which will throw an error if the output is unexpected:

``````X <- list(1:5, 6:10, 11:15)
vapply(X,range,numeric(2))``````
``````##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    5   10   15``````
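To see what inconsistent output means in practice, here is a short sketch. `keep_big()` is a made-up helper that keeps values above 2; the shape of what `sapply()` returns changes with the data, whereas `vapply()` sticks to the declared type even when there is nothing to loop over:

```r
keep_big <- function(v) v[v > 2]   # hypothetical helper: keep values above 2

# Results of equal length are simplified to a matrix...
same_len <- sapply(list(3:5, 6:8), keep_big)
is.matrix(same_len)    # TRUE

# ...but results of differing lengths fall back to a list
diff_len <- sapply(list(1:3, 4:6), keep_big)
is.list(diff_len)      # TRUE

# An empty input gives an empty list rather than an empty vector
sapply(list(), length)               # list()

# vapply() returns the declared type regardless
vapply(list(), length, integer(1))   # integer(0)
```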

`tapply()` is more complicated as it subsets `X` based on the `INDEX`. The documentation describes this as a “Ragged Array”. I have never used this directly, as I will usually do the subsetting manually using `split()`, but that is essentially what `tapply()` does behind the scenes. `tapply()` also comes with a `simplify` argument, which decides whether R will try to simplify the results, as in `sapply()`; by default it will attempt this simplification. The following are therefore (roughly) equivalent:

``````tapply(X, INDEX, FUN, simplify=FALSE)

lapply(split(X,INDEX), FUN)``````

`split()` creates a list where the first vector is split into groups based on the second argument.

So we can compare `tapply()` against `split()` combined with either `lapply()` or `vapply()`:

``````x <- 1:10
grp <- c(1,1,1,2,2,3,3,3,4,5)
tapply(x,grp,sum,simplify=FALSE)``````
``````## $`1`
## [1] 6
##
## $`2`
## [1] 9
##
## $`3`
## [1] 21
##
## $`4`
## [1] 9
##
## $`5`
## [1] 10``````
``lapply(split(x,grp),sum)``
``````## $`1`
## [1] 6
##
## $`2`
## [1] 9
##
## $`3`
## [1] 21
##
## $`4`
## [1] 9
##
## $`5`
## [1] 10``````
``tapply(x,grp,sum)``
``````##  1  2  3  4  5
##  6  9 21  9 10``````
``vapply(split(x,grp),sum,numeric(1))``
``````##  1  2  3  4  5
##  6  9 21  9 10``````
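A more typical use is a grouped summary of a column. As a sketch using the built-in `iris` data, the mean sepal length per species comes out the same from `tapply()` and from `split()` followed by `vapply()`:

```r
# Mean sepal length per species, two ways
by_tapply <- tapply(iris$Sepal.Length, iris$Species, mean)
by_split  <- vapply(split(iris$Sepal.Length, iris$Species), mean, numeric(1))

by_tapply          # setosa 5.006, versicolor 5.936, virginica 6.588
names(by_tapply)   # "setosa" "versicolor" "virginica"

all.equal(as.vector(by_tapply), unname(by_split))   # TRUE
```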

The other member of the `lapply()` family is `mapply()`. This is even more powerful as it allows you to vectorise over multiple arguments, rather than just the first. Syntactically, the difference here is that the dots are the vectorised arguments, and the non-vectorised arguments go into the `MoreArgs` argument.

``````X <- list("one","two",c("three", "four"))
Y <- list("A","B",c("C","D"))
mapply(paste,X,Y)``````
``````## [[1]]
## [1] "one A"
##
## [[2]]
## [1] "two B"
##
## [[3]]
## [1] "three C" "four D"``````

This is the same as doing:

``````list(
paste(X[[1]],Y[[1]]),
paste(X[[2]],Y[[2]]),
paste(X[[3]],Y[[3]])
)``````
``````## [[1]]
## [1] "one A"
##
## [[2]]
## [1] "two B"
##
## [[3]]
## [1] "three C" "four D"``````
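The `MoreArgs` argument mentioned above has not appeared in an example yet, so here is a minimal sketch with made-up inputs. The separator is the same for every call, so it goes into `MoreArgs` as a list, while the two vectorised arguments go in the dots:

```r
# sep is the same for every call, so it goes in MoreArgs
joined <- mapply(paste, c("a", "b", "c"), 1:3, MoreArgs = list(sep = "-"))
unname(joined)     # "a-1" "b-2" "c-3"

# SIMPLIFY = FALSE always returns a list, much like lapply()
as_list <- mapply(paste, c("a", "b", "c"), 1:3,
                  MoreArgs = list(sep = "-"), SIMPLIFY = FALSE)
is.list(as_list)   # TRUE
```

By default `mapply()` simplifies its result in the same way as `sapply()`, which is why the first call returns a character vector rather than a list.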

Here is one final example using `rep()`, which repeats its first argument a specified number of times:

``````X <- letters[1:4]
Y <- 1:4
mapply(rep,X,Y)``````
``````## $a
## [1] "a"
##
## $b
## [1] "b" "b"
##
## $c
## [1] "c" "c" "c"
##
## $d
## [1] "d" "d" "d" "d"``````
##### Dr. Michael Barrowman, PhD
###### Compliance Manager

I am a Data Scientist, PhD candidate and R Developer.