# The lapply() family

This question on Reddit got me thinking about the `lapply()` family of functions, and how a beginner might want to learn about them. Here is my take.

# Introduction

The easiest one to understand is `lapply()`, so I’ll work through that and then extend to the others. As an aside, the programming term for this is *vectorising*, as it allows us to perform an action over an entire *vector* (or *list* in R) at once.

Ignoring the dots, `lapply()` takes two arguments, `X` and `FUN`. `FUN` is the function to apply (or its name), and `X` is a list of objects. When I say list, this could be an actual list, as created by the `list()` function, or a vector such as `1:10`. But if you try to put in something more complicated, like a `data.frame()`, you can get unexpected results (I’ll come back to this).
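As a quick check of that point about `X`, here is a minimal sketch (the choice of `sqrt` is just for illustration) showing that a plain vector and an actual list are treated the same way, and that the result is a list either way:

```r
# The same three numbers, once as a vector and once as a list
from_vector <- lapply(c(1, 4, 9), sqrt)
from_list   <- lapply(list(1, 4, 9), sqrt)

class(from_vector)                 # a list, even though the input was a vector
length(from_vector)                # one result per input element
identical(from_vector, from_list)  # both inputs give the same answer
```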

# lists

So, let’s say we have

```
X <- list(1:10,11:20,21:30)
X
```

```
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [1] 11 12 13 14 15 16 17 18 19 20
##
## [[3]]
## [1] 21 22 23 24 25 26 27 28 29 30
```

This list has three elements, and each element consists of a vector of 10 numbers. We can access them using `[[`, where `X[[1]]` will return the first element, the numbers 1 to 10:

`X[[1]]`

`## [1] 1 2 3 4 5 6 7 8 9 10`

`X[[2]]` will return the second element, and so on. This is *extraction*, as it extracts an element from a list. Extraction can only bring out a single element. We can also *subset* using `[`, for example:

`X[1:2]`

```
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [1] 11 12 13 14 15 16 17 18 19 20
```

This returns the first and second elements. `X[1]` will return a subset consisting of just the first element.

`X[1]`

```
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
```

What’s the difference between `X[1]` and `X[[1]]`? Well, `X[1]` returns a list that is just 1 element long, that element being a vector of the numbers from 1 to 10. `X[[1]]` returns the actual element at position 1.

`length(X[1])`

`## [1] 1`

`length(X[[1]])`

`## [1] 10`

`class(X[1])`

`## [1] "list"`

`class(X[[1]])`

`## [1] "integer"`

So `X[1]` is a list, just like `X`, but shorter: it is a subset, just like how `X[1:2]` is a subset with length 2. Whereas `X[[1]]` *is* the first element of `X`. This is clearer if we try to add something to these two objects:

`X[[1]] + 3`

`## [1] 4 5 6 7 8 9 10 11 12 13`

`X[1] + 3`

`## Error in X[1] + 3: non-numeric argument to binary operator`

Again, to stress the point: `X[1]` is *not* a number, it is a list containing a single element. Since `X[1]` is a list, we can therefore *extract* that first element from it:

`X[1][[1]]`

`## [1] 1 2 3 4 5 6 7 8 9 10`

The difference between a list and a vector is that the entries of a vector all have to be of the same type (e.g. all characters as in `c("a","b","c")` or all numbers as in `c(1,2,3)`); the `c()` function will *coerce* them otherwise, so `c(1,"2",3)` will coerce to characters. But the elements of a list can all be different, so `list("hello",2,1:10)` has three elements, each of a different kind. In fact, lists can contain lists (nested lists):

```
Y <- list("hello",1:10,list("one","two","three"))
Y
```

```
## [[1]]
## [1] "hello"
##
## [[2]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[3]]
## [[3]][[1]]
## [1] "one"
##
## [[3]][[2]]
## [1] "two"
##
## [[3]][[3]]
## [1] "three"
```

This list also has three elements. If you *extract* the third element,

`Y[[3]]`

```
## [[1]]
## [1] "one"
##
## [[2]]
## [1] "two"
##
## [[3]]
## [1] "three"
```

you get another list. If you *subset* the third element,

`Y[3]`

```
## [[1]]
## [[1]][[1]]
## [1] "one"
##
## [[1]][[2]]
## [1] "two"
##
## [[1]][[3]]
## [1] "three"
```

you get a list with 1 element.
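To make the difference concrete, here is a short sketch of reaching the string `"two"` inside the nested list, once starting from an extraction and once starting from a subset. It reuses the `Y` defined above and nothing else:

```r
Y <- list("hello", 1:10, list("one", "two", "three"))

# Extract the inner list, then extract its second element
Y[[3]][[2]]

# Subsetting keeps the outer list wrapper, so one more extraction is needed
Y[3][[1]][[2]]
```

Both lines reach the same string; the second just has to unwrap the extra layer that `[` leaves behind.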

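As far as nomenclature is concerned, you can think of a *vector* as a restricted kind of *list*, one with the requirement that all entries be of the same type. (Strictly speaking, R has it the other way round: a list is a generic vector, and the same-typed kind is an *atomic* vector.) You can even use extraction on a vector,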

```
x <- 1:10
x[3]
```

`## [1] 3`

`x[[3]]`

`## [1] 3`

Although when dealing with a vector, the second version is much less common. The reason this works is that *extraction* and *subsetting* are essentially the same thing in a vector (because it will always return a vector, it just might be of length 1).
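A small sketch of where the two forms agree and where they part ways on a vector. The use of `try()` here is my own choice, just so the example keeps running past the error:

```r
x <- 1:10

# For a single position, the two forms agree
identical(x[3], x[[3]])

# Subsetting can return several entries at once...
x[2:4]

# ...but extraction refuses to return more than one
res <- try(x[[2:4]], silent = TRUE)
inherits(res, "try-error")
```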

# lapply()

So, now that we know what a list is, we can look at what `lapply()` does to that list. If we supply a function, `lapply()` will run that function on every element in that list. The simplest example, using `X` from above, is `lapply(X,mean)`, which returns a list with the `mean()` of every element in `X`.

`lapply(X,mean)`

```
## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 15.5
##
## [[3]]
## [1] 25.5
```

Remember that the elements in `X` are the vectors of numbers `1:10`, `11:20` and `21:30`. We’ve *applied* the function to the *list*: a *list-apply*.

The function doesn’t have to be a named one; we can also supply a function in-line:

`lapply(X, function(x) mean(x-5.5))`

```
## [[1]]
## [1] 0
##
## [[2]]
## [1] 10
##
## [[3]]
## [1] 20
```

This applies the function `function(x) mean(x-5.5)` to every element in `X`. You could define this function earlier, outside of the `lapply()` call, but there is no need if this is the only place we plan on using it.

From `R 4.1` onwards, this is even easier with the shorthand `\()` syntax:

`lapply(X, \(x) mean(x - 5.5))`

So running `lapply(X,FUN)` is the same as running the following `for()` loop:

```
output <- vector("list", length(X))  # pre-allocate an empty list
for (i in seq_along(X)) {
  output[[i]] <- FUN(X[[i]])         # apply FUN to each element in turn
}
```

Compare the previous code to this:

```
output <- vector("list", length(X))  # pre-allocate an empty list
for (i in seq_along(X)) {
  output[[i]] <- mean(X[[i]])        # here FUN is mean()
}
output
```

```
## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 15.5
##
## [[3]]
## [1] 25.5
```

Notice that I’ve defined `output <- vector("list",length(X))` before running the `for()` loop. This line makes an empty list of the required length. This will come up again when we move on from `lapply()`.
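Wrapping that loop in a function makes the comparison easy to test. The name `my_lapply` is hypothetical, and this is only a sketch of the idea: it ignores the dots, names, and edge cases such as `FUN` returning `NULL`, so it is not how R implements `lapply()` internally.

```r
# A hypothetical stand-in for lapply(), written as a plain loop
my_lapply <- function(X, FUN) {
  output <- vector("list", length(X))  # pre-allocate the result list
  for (i in seq_along(X)) {
    output[[i]] <- FUN(X[[i]])         # one call to FUN per element
  }
  output
}

X <- list(1:10, 11:20, 21:30)
identical(my_lapply(X, mean), lapply(X, mean))
```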

# dots

One part of `lapply()` that I’ve ignored is the `...` (dots) argument. These are other arguments that you want passed on to your function. Whatever is in the dots will be passed to every call to `FUN`, whether named or not:

`lapply(c("a","b","c"),paste,"2")`

```
## [[1]]
## [1] "a 2"
##
## [[2]]
## [1] "b 2"
##
## [[3]]
## [1] "c 2"
```

`lapply(list(1:10, c(1,2,NA,4), 21:30), mean, na.rm=TRUE)`

```
## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 2.333333
##
## [[3]]
## [1] 25.5
```

Essentially, this runs the following loop:

```
X <- list(1:10, c(1, 2, NA, 4), 21:30)
output <- vector("list", length(X))
for (i in seq_along(X)) {
  output[[i]] <- mean(X[[i]], na.rm = TRUE)  # the dots arrive as extra arguments
}
output
```

```
## [[1]]
## [1] 5.5
##
## [[2]]
## [1] 2.333333
##
## [[3]]
## [1] 25.5
```
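Another way to read the dots is as a shortcut for writing the extra argument inside an in-line function. A minimal sketch of that equivalence, where the names `via_dots` and `via_inline` are just for illustration:

```r
X <- list(1:10, c(1, 2, NA, 4), 21:30)

# Extra argument supplied through the dots
via_dots <- lapply(X, mean, na.rm = TRUE)

# The same thing with the extra argument written inside an in-line function
via_inline <- lapply(X, function(x) mean(x, na.rm = TRUE))

identical(via_dots, via_inline)
```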

Hopefully that will be enough to understand `lapply()`. One unusual case is using `lapply()` on a `data.frame`-like structure. Now, a `data.frame` looks like a table, but it’s actually a list, and the way the list is laid out is counterintuitive: each element of the list is a *column* in the `data.frame`. So, if you run the following, you get a result that is only 4 elements long:

```
iris0 <- iris[,1:4]
lapply(iris0,mean)
```

```
## $Sepal.Length
## [1] 5.843333
##
## $Sepal.Width
## [1] 3.057333
##
## $Petal.Length
## [1] 3.758
##
## $Petal.Width
## [1] 1.199333
```

You might think that this would work *across* the rows of the `data.frame`, but it works *down* the columns. Also note that the outputs now carry the same names as the input list. This can be useful for keeping track of your inputs and outputs.
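The column-wise behaviour is easier to believe once you poke at a `data.frame` as if it were a list. Here is a sketch using a small made-up `data.frame` (`df` and its columns are not from the post):

```r
# A small made-up data.frame: 3 rows, 2 columns
df <- data.frame(a = 1:3, b = c(2, 4, 6))

is.list(df)       # a data.frame really is a list
length(df)        # its length is the number of columns, not rows
df[["b"]]         # extraction pulls out a whole column

lapply(df, mean)  # so lapply() gives one result per column
```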

# apply()

This brings us on to `apply()`.

The `apply()` function does a similar job, however it isn’t meant for plain lists; it works on multi-dimensional objects, so matrices and arrays. It tries to collapse a multi-dimensional object down by one (or more) of its dimensions. So it turns a matrix into a vector (or an array into a smaller array). As well as `X` (which must be multi-dimensional, so definitely *not* a plain list) and `FUN`, it also takes `MARGIN`, which tells `apply()` which dimension(s) to work along (1 for rows, 2 for columns); the remaining dimensions are the ones that get collapsed:

```
M <- matrix(1:9,nrow=3)
apply(M,1,mean) #takes the mean of each row
```

`## [1] 4 5 6`

`apply(M,2,mean) #takes the mean of each column`

`## [1] 2 5 8`
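As a sanity check on `MARGIN`, the two calls above can be compared against the built-in `rowMeans()` and `colMeans()`. The three-dimensional array `A` below is made up, to show the "array into a smaller array" case:

```r
M <- matrix(1:9, nrow = 3)

# MARGIN = 1 gives one result per row, MARGIN = 2 one per column
all.equal(apply(M, 1, mean), rowMeans(M))
all.equal(apply(M, 2, mean), colMeans(M))

# A made-up 2 x 3 x 4 array: summing with MARGIN = c(1, 2)
# collapses the third dimension and leaves a 2 x 3 matrix
A <- array(1:24, dim = c(2, 3, 4))
dim(apply(A, c(1, 2), sum))
```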

The type of the result is determined by what `FUN` returns, and once again `apply()` can take other arguments as dots. So this works quite well with character matrices:

```
M <- matrix(letters[1:9],nrow=3)
apply(M,1,paste0,collapse="") #pastes across the rows
```

`## [1] "adg" "beh" "cfi"`

`apply(M,2,paste0,collapse="") #pastes down the columns`

`## [1] "abc" "def" "ghi"`

This means we can use `apply()` on a `data.frame` to work across the rows, rather than down the columns. In this case, even though a `data.frame` is a list, it is two-dimensional and `apply()` converts it to a matrix first, so it still works:

`apply(iris0,1,mean)`

```
## [1] 2.550 2.375 2.350 2.350 2.550 2.850 2.425 2.525 2.225 2.400 2.700 2.500
## [13] 2.325 2.125 2.800 3.000 2.750 2.575 2.875 2.675 2.675 2.675 2.350 2.650
## [25] 2.575 2.450 2.600 2.600 2.550 2.425 2.425 2.675 2.725 2.825 2.425 2.400
## [37] 2.625 2.500 2.225 2.550 2.525 2.100 2.275 2.675 2.800 2.375 2.675 2.350
## [49] 2.675 2.475 4.075 3.900 4.100 3.275 3.850 3.575 3.975 2.900 3.850 3.300
## [61] 2.875 3.650 3.300 3.775 3.350 3.900 3.650 3.400 3.600 3.275 3.925 3.550
## [73] 3.800 3.700 3.725 3.850 3.950 4.100 3.725 3.200 3.200 3.150 3.400 3.850
## [85] 3.600 3.875 4.000 3.575 3.500 3.325 3.425 3.775 3.400 2.900 3.450 3.525
## [97] 3.525 3.675 2.925 3.475 4.525 3.875 4.525 4.150 4.375 4.825 3.400 4.575
## [109] 4.200 4.850 4.200 4.075 4.350 3.800 4.025 4.300 4.200 5.100 4.875 3.675
## [121] 4.525 3.825 4.800 3.925 4.450 4.550 3.900 3.950 4.225 4.400 4.550 5.025
## [133] 4.250 3.925 3.925 4.775 4.425 4.200 3.900 4.375 4.450 4.350 3.875 4.550
## [145] 4.550 4.300 3.925 4.175 4.325 3.950
```

This now gives a vector of the averages of each row.
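One thing to watch for: `apply()` converts a `data.frame` to a matrix before doing anything else, and a matrix can only hold one type. With all-numeric columns like `iris0` that is harmless, but a sketch with a made-up mixed `data.frame` shows what can happen otherwise:

```r
# A made-up data.frame mixing numbers and text
df <- data.frame(n = 1:2, s = c("a", "b"))

# Each row arrives in FUN as a character vector, because the
# data.frame has been converted to a character matrix first
apply(df, 1, class)

# lapply() works column by column, so each column keeps its own type
lapply(df, class)
```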

# Rest of the family

Now for `lapply()`’s sisters:

`vapply()` takes an extra argument, `FUN.VALUE`, which is a template of the same type (and length) as what you want each result to be. This is the one that I use most often. You can think of it as an `lapply()` that outputs something other than a list. I usually give `FUN.VALUE` as something like `integer(1)` or `character(1)`. These functions generate blank vectors of that type and length; they are wrappers around things like `vector("integer",1)`.

```
X <- list(1:10,11:20,21:30)
vapply(X,mean,numeric(1))
```

`## [1] 5.5 15.5 25.5`

This time, we get a numeric vector, rather than a list like we would with `lapply()`. I find this makes it much easier to ensure I’m working with the correct type of data.
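The flip side of that control is that `vapply()` complains loudly when a result doesn’t match the template. A short sketch of two such mismatches, where wrapping the calls in `try()` is my own addition so the script carries on after each error:

```r
X <- list(1:10, 11:20, 21:30)

# mean() returns a double, so asking for integer(1) is an error
bad_type <- try(vapply(X, mean, integer(1)), silent = TRUE)
inherits(bad_type, "try-error")

# range() returns two values, so asking for length 1 is also an error
bad_length <- try(vapply(X, range, numeric(1)), silent = TRUE)
inherits(bad_length, "try-error")
```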

`sapply()` tries to simplify your output. So if `lapply()` would output a list of vectors that are all the same length, then instead of a list, `sapply()` will return a matrix:

```
X <- list(1:5, 6:10, 11:15)
sapply(X,range)
```

```
## [,1] [,2] [,3]
## [1,] 1 6 11
## [2,] 5 10 15
```

Each column in this result is the same as one of the elements of the list `lapply(X,range)`. They’ve just been `cbind`’d together. The use of `sapply()` is often discouraged, as the type of the output can be inconsistent. `vapply()` is much preferred as it gives more control over the output. The above can be replicated with `vapply()`, which will throw an error if the output is unexpected:

```
X <- list(1:5, 6:10, 11:15)
vapply(X,range,numeric(2))
```

```
## [,1] [,2] [,3]
## [1,] 1 6 11
## [2,] 5 10 15
```
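To illustrate the inconsistency, here is a sketch of one style of `sapply()` call giving three different shapes of output depending only on the input. The inputs are made up for the purpose:

```r
# The shape of the output depends on the data, not on the code
sapply(c(2, 2), seq_len)   # equal lengths: a matrix
sapply(1:3, seq_len)       # unequal lengths: falls back to a list
sapply(list(), length)     # empty input: an empty list

# vapply() returns the declared type even for empty input
vapply(list(), length, integer(1))
```

That last case is the one that tends to bite inside functions, where an empty input is easy to overlook.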

`tapply()` is more complicated as it subsets `X` based on the `INDEX`. The documentation describes this as a *“Ragged Array”*. I have *never* used this directly, as I will usually do the subsetting manually using `split()`, but that is essentially what `tapply()` does behind the scenes. `tapply()` also comes with a `simplify` argument, which decides whether R will try to simplify the results, like in `sapply()`, or not; by default it will attempt this simplification. The following are therefore (roughly) equivalent:

```
tapply(X, INDEX, FUN, simplify=FALSE)
lapply(split(X,INDEX), FUN)
```

`split()` creates a list where the first argument is split into groups based on the second argument.
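It helps to look at what `split()` returns on its own before handing it to anything else. A minimal sketch, with `groups` as an illustrative name:

```r
x   <- 1:10
grp <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 5)

groups <- split(x, grp)
names(groups)   # one list element per distinct value of grp
groups[["3"]]   # the entries of x where grp is 3
```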

So we can compare `tapply()` against both an `lapply()` and a `vapply()` over the split data:

```
x <- 1:10
grp <- c(1,1,1,2,2,3,3,3,4,5)
tapply(x,grp,sum,simplify=FALSE)
```

```
## $`1`
## [1] 6
##
## $`2`
## [1] 9
##
## $`3`
## [1] 21
##
## $`4`
## [1] 9
##
## $`5`
## [1] 10
```

`lapply(split(x,grp),sum)`

```
## $`1`
## [1] 6
##
## $`2`
## [1] 9
##
## $`3`
## [1] 21
##
## $`4`
## [1] 9
##
## $`5`
## [1] 10
```

`tapply(x,grp,sum)`

```
## 1 2 3 4 5
## 6 9 21 9 10
```

`vapply(split(x,grp),sum,numeric(1))`

```
## 1 2 3 4 5
## 6 9 21 9 10
```

The other member of the `lapply()` family is `mapply()`. This is even more powerful as it allows you to *vectorise* over multiple arguments, rather than just the first. Syntactically, the difference here is that `FUN` comes first, the dots are the vectorised arguments, and the non-vectorised arguments go into the `MoreArgs` argument.

```
X <- list("one","two",c("three", "four"))
Y <- list("A","B",c("C","D"))
mapply(paste,X,Y)
```

```
## [[1]]
## [1] "one A"
##
## [[2]]
## [1] "two B"
##
## [[3]]
## [1] "three C" "four D"
```

This is the same as doing:

```
list(
paste(X[[1]],Y[[1]]),
paste(X[[2]],Y[[2]]),
paste(X[[3]],Y[[3]])
)
```

```
## [[1]]
## [1] "one A"
##
## [[2]]
## [1] "two B"
##
## [[3]]
## [1] "three C" "four D"
```
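The example above doesn’t use `MoreArgs`, so here is a small sketch that does, with made-up inputs. It also shows that `mapply()` simplifies its output by default when it can, which is why the earlier result stayed a list (the third element had length 2) while these collapse to vectors:

```r
# sep is the same for every call, so it goes in MoreArgs
# rather than in the dots
pasted <- mapply(paste, c("one", "two"), c("A", "B"),
                 MoreArgs = list(sep = "-"))
unname(pasted)

# When every call returns a single value, mapply() simplifies
# the result to a vector, much like sapply()
added <- mapply(function(x, y) x + y, 1:3, 4:6)
added
```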

Here is one final example using `rep()`, which repeats its first argument a specified number of times:

```
X <- letters[1:4]
Y <- 1:4
mapply(rep,X,Y)
```

```
## $a
## [1] "a"
##
## $b
## [1] "b" "b"
##
## $c
## [1] "c" "c" "c"
##
## $d
## [1] "d" "d" "d" "d"
```