For those coming to R from other languages, it seems very odd that R users import code from packages the way that they do. For instance, if you are used to Python, this is the general pattern of “using code from elsewhere”:
import math
import numpy as np
from random import randint
# Usage:
math.floor(3.2)
np.array(...)
randint(1, 6)
Meanwhile, in R, you’re likely to see
library(dplyr)
data_frame(x = 1, y = "A") %>%
  mutate(z = TRUE)
… with the subtext being that users must simply know that the data_frame(), mutate(), and %>% functions are actually all from the dplyr package. Why is R so unusual here?
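When it is unclear where an attached function came from, base R can report it. The following is a minimal sketch, using the tools package (which ships with R) as a stand-in for a third-party package such as dplyr; the variable names are purely illustrative.

```r
# tools ships with R; it stands in here for any attached third-party package.
library(tools)

# Which attached package provides a function with this name?
where <- find("file_ext")
print(where)

# Which namespace was the function actually defined in?
ns_name <- environmentName(environment(file_ext))
print(ns_name)
```

The same two calls work for any function made available by library(), which is one way to recover the provenance that the calling code does not spell out.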
Unlike C or some familiar Lisps, R has proper namespace support, so it’s not as though conflicting symbols are simply overwritten when you use library() in the way they would be when using #include in C. Moreover, there is one place where R users do work with a highly granular mechanism to control how symbols are imported into an environment: the NAMESPACE file included with every R package. If you crack open one of these files, you’ll see lines like
importFrom(dplyr, mutate, data_frame, %>%)
which look a bit like pseudo-R code. Yet there is no importFrom() function in R itself. It’s not hard to write one, either, if we make use of the not-widely-advertised namespace API:
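importFrom <- function(pkg, ...) {
  # Capture the package name and the symbols unevaluated, so that bare
  # names such as tibble and as_tibble can be passed without quotes.
  pkg <- as.character(substitute(pkg))
  call <- match.call()
  if (length(call) < 3) stop("no symbols to import")
  symbols <- vapply(3:length(call), function(i) {
    as.character(call[[i]])
  }, character(1))
  # Load the namespace without attaching it, then copy the requested
  # exports into the caller's environment.
  ns <- loadNamespace(pkg)
  importIntoEnv(parent.frame(), symbols, ns, symbols)
  invisible(NULL)
}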
# Confirm that it works:
importFrom(tibble, is.tibble, as_tibble)
df <- dplyr::data_frame(x = 1, y = "A")
is.tibble(df)
#> [1] TRUE
is.tibble
#> function(x) {
#> "tbl_df" %in% class(x)
#> }
#> <bytecode: 0x55c14454ca60>
#> <environment: namespace:tibble>
So that’s clearly not the barrier. In fact, if this appeals to you, there is a fully-realized vision of an import*() function for R available in the import package. But there is still a question of why the language itself does not include such a function.
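Short of a dedicated import function, base R does already allow per-symbol access through the :: operator. The sketch below is only an illustration of that workaround, again using the tools package that ships with R; it is not how the import package works internally, and the file name is made up.

```r
# Call an exported function without attaching its package.
ext <- tools::file_ext("report.csv")
print(ext)

# Bind one exported function to a local name: a manual, one-symbol "import".
file_ext <- tools::file_ext
same <- identical(file_ext("report.csv"), ext)
print(same)
```

Unlike library(), neither line adds the package to the search path, so no other names from it become visible.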
I was curious about whether anything had been written about this before, so I asked on the r-devel mailing list about the historical reason that R does not have an importFrom() function. While I didn’t get an answer from those involved, those who did comment raised some interesting points.
The first is that larger R programs tend to get refactored into R packages, which do have careful management of imported functions and packages. (This largely aligns with my own experience.) And the second is that R has a strong culture of development-by-interpreter, where library() is a useful shortcut to getting the third-party functions that you want.
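The import bookkeeping that packages carry can be inspected from a running session. This is a rough sketch that relies on getNamespaceImports(), one of base R’s internal namespace support functions and so not really intended for everyday use; stats is chosen only because it ships with every R installation.

```r
# Inspect the imports recorded for an installed package's namespace.
# stats is used only because it ships with every R installation.
imports <- getNamespaceImports("stats")

# Each entry is named after the package that the symbols were imported from.
print(unique(names(imports)))
```

Scripts that rely on library() have no comparable record; the search path is the only trace of what was brought in.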
I suspect that the history of the language plays a part as well. When namespace support was added to the R language in 2003, the language had a strong emphasis on maintaining compatibility with the S language, so that new users would not need to rewrite much of their existing code. S used library(), and thus R provided library(). And insofar as there was a design principle for the S language, it was that users could begin using it without thinking of themselves as “programmers”, and gradually slide into programming as their analyses demanded it. The library() approach is in line with this philosophy, in my view.
And of course, there is also the reality that R has almost thirty years of development and design decisions to account for – almost 50 if we include its S ancestors dating back to 1976. By this measure, R is contemporary with early versions of Unix (Version 6, 1975), Emacs (1976), and TeX (1978). I suspect that some of these decisions are regretted or resented by the core R developers, but most of these now-features have users expecting at least some semblance of backwards compatibility.
In the end I think that the main reason that we don’t see mass migration to
solutions like the import package speaks to a deeper division in the R
community. The R users who are likely to raise concerns over using library()
in scripts – those worried about code hygiene or dependency management, for
example – are likely willing and able to solve these problems in other ways,
notably by refactoring code into packages. And as Roger Peng argued in his
recent keynote at useR! 2018, R has focused on programmer-oriented features at
its peril.
For the large contingent of R users writing run-once data analysis scripts – still the key constituency of the language – library() is exactly what you want: no-hassle access to third-party functions. That this is the default recommendation, as opposed to some version of importFrom(), reflects the nature of the R community.