Unconstant Conjunction A personal blog

Why There is No importFrom() Function in R

    27 August 2018

For those coming to R from other languages, it seems very odd that R users import code from packages the way that they do. For instance, if you are used to Python, this is the general pattern of “using code from elsewhere”:

import math
import numpy as np
from random import randint

# Usage:
math.floor(3.2)
np.array(...)
randint(1, 6)

Meanwhile, in R, you’re likely to see

library(dplyr)

data_frame(x = 1, y = "A") %>%
  mutate(z = TRUE)

… with the subtext being that users must simply know that the data_frame(), mutate(), and %>% functions are actually all from the dplyr package. Why is R so unusual here?
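R does at least make it easy to find out after the fact. As a minimal sketch using only packages that ship with R (with median() standing in for a third-party function), the console can report where an unqualified name resolves, and the :: operator lets a call state its origin explicitly:

```r
# Ask R which attached package provides an unqualified name.
find("median")
#> [1] "package:stats"

# A function also records the namespace it was defined in.
environmentName(environment(median))
#> [1] "stats"

# The `::` operator makes the origin explicit at the call site,
# and works whether or not the package has been attached.
stats::median(c(1, 5, 9))
#> [1] 5
```

The same qualified form works for the example above, e.g. dplyr::mutate(), at the cost of more typing.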

Unlike C or some familiar Lisps, R has proper namespace support, so it’s not as though conflicting symbols are simply overwritten when you use library() in the way they would be when using #include in C. Moreover, there is one place that R users do work with a highly granular mechanism to control how symbols are imported into an environment: when working with the NAMESPACE file included with every R package. If you crack open one of these files, you’ll see lines like

importFrom(dplyr, mutate, data_frame, "%>%")

which look a bit like pseudo-R code. Yet there is no importFrom() function in R itself. It’s not hard to write one if we make use of the not-widely-advertised namespace API, either:

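importFrom <- function(pkg, ...) {
  # Capture the package name and the symbols unevaluated, as in a NAMESPACE file.
  pkg <- as.character(substitute(pkg))
  symbols <- vapply(
    as.list(substitute(list(...)))[-1], as.character, character(1)
  )
  # Load the namespace without attaching the package to the search path.
  ns <- loadNamespace(pkg)
  # Copy the requested exports into the caller's environment.
  importIntoEnv(parent.frame(), symbols, ns, symbols)
  invisible(NULL)
}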

# Confirm that it works:
importFrom(tibble, is.tibble, as_tibble)

df <- dplyr::data_frame(x = 1, y = "A")
is.tibble(df)
#> [1] TRUE
is.tibble
#> function(x) {
#>   "tbl_df" %in% class(x)
#> }
#> <bytecode: 0x55c14454ca60>
#> <environment: namespace:tibble>
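Since importIntoEnv() is one of R’s internal namespace support functions, a more conservative variant might stick to the reflection helpers instead. The following is only a sketch: import_from() is a hypothetical name, it takes the symbols as a character vector rather than as bare names, and it is demonstrated with the tools package because that ships with R but is not attached by default.

```r
# Hypothetical helper: copy exported objects from a package into an environment.
import_from <- function(pkg, symbols, envir = parent.frame()) {
  ns <- loadNamespace(pkg)  # loads the package without attaching it
  for (sym in symbols) {
    # getExportedValue() signals an error if `sym` is not exported by the package.
    assign(sym, getExportedValue(ns, sym), envir = envir)
  }
  invisible(NULL)
}

import_from("tools", c("file_ext", "toTitleCase"))

file_ext("report.csv")
#> [1] "csv"
```

Unlike library(tools), this brings in only the two named functions, which is the same granularity that an importFrom() directive gives a package.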

So that’s clearly not the barrier. In fact, if this appeals to you, the import package offers a fully realized version of an import*() function for R. But there is still the question of why the language itself does not include such a function.

I was curious about whether anything had been written about this before, so I asked about the historical reason that R does not have an importFrom() function on the r-devel mailing list. While I didn’t get an answer from those involved, those who did comment raised some interesting points.

The first is that larger R programs tend to get refactored into R packages, which do have careful management of imported functions and packages. (This largely aligns with my own experience.) And the second is that R has a strong culture of development-by-interpreter, where library() is a useful shortcut to getting the third-party functions that you want.

I also suspect that the history of the language plays a part. When namespace support was added to the R language in 2003, the language had a strong emphasis on maintaining compatibility with the S language, so that new users would not need to rewrite much of their existing code. S used library(), and thus R provided library(). And insofar as there was a design principle for the S language, it was that users could begin using it without thinking of themselves as “programmers”, and gradually slide into programming as their analyses demanded it. The library() approach is in line with this philosophy, in my view.

And of course, there is also the reality that R has almost thirty years of development and design decisions to account for, or more than 40 if we include its S ancestors dating back to 1976. By this measure, R’s lineage is contemporary with early versions of Unix (1975), Emacs (1976), and TeX (1978). I’m sure that some of these decisions are regretted or resented by the core R developers, but most of these now-features have users expecting at least some semblance of backwards compatibility.

In the end I think that the main reason that we don’t see mass migration to solutions like the import package speaks to a deeper division in the R community. The R users who are likely to raise concerns over using library() in scripts – those worried about code hygiene or dependency management, for example – are likely willing and able to solve these problems in other ways, notably by refactoring code into packages. And as Roger Peng argued in his recent keynote at useR! 2018, R has focused on programmer-oriented features at its peril.

For the large contingent of R users writing run-once data analysis scripts – still the key constituency of the language – library() is exactly what you want: no-hassle access to third-party functions. That this is the default recommendation, as opposed to some version of importFrom(), reflects the nature of the R community.
