Unconstant Conjunction latest posts

For and Against data.table

    27 June 2019 // tagged

I was reminded by Megan Stodel’s recent post – “Three reasons why I use data.table” – of the enduring data.table vs. dplyr argument.

To Stodel’s points in favour of data.table, which I would summarise as (1) performance, (2) good design, and (3) that it makes you feel a bit iconoclastic in these tidyverse times, I would add the following:

  • A very stable, actively-maintained, and battle-tested codebase. This is also true of both dplyr and base R, of course, but it is valuable nonetheless.

  • Zero third-party dependencies. In contrast, dplyr has, at the time of writing, 22 direct or indirect package dependencies. In combination with the choice of C over Rcpp, this leads to very fast compilation and installation times.

Both of these are highly valued in production environments, which happens to be the focus of my current work.

However, I would usually advocate against using data.table unless you really, really do need its performance benefits.

Continue Reading →

Copyright in Closed-Source R Packages: The Right Way

    18 March 2019 // tagged

There are a wealth of excellent resources for R users on creating and maintaining R packages, which remain the best way to share your code within your organisation or the larger community. However, almost all of these resources tend to focus on open-source packages, tools, and workflows. As a result of this, they tend to skim over more corporate issues like copyright assignment.

Yet many R users are working on their code in proprietary environments, creating packages for internal use by their company. That code is closed source, with the copyright belonging to their organisation. If you fall into this category, how should you communicate that in your R package? This post is intended to provide a very clear answer to that question.

Continue Reading →

What Does R Value?

    13 December 2018 // tagged

I’m fan of Bryan Cantrill’s argument that we ought to think about “platform values” when assessing technologies, highlighted in a recent talk, but explained more thoroughly in an earlier one on his experience with the node.js community.

In my reading, he argues that of the many values (such as performance, security, or expressiveness) that programming languages or platforms may hold, many are in conflict, and the platform inevitably emphasises some of these values over others. These decisions are a reflection of explicit or implicit “platform values”.

Cantrill illustrates this with a series of examples, but, no surprise, the R platform does not make his shortlist. I couldn’t resist trying to cook up my own taxonomy of platform values for the R language and its community.

More broadly, though, Cantrill believes that the values of a platform affect the projects that adopt them, and conflicts can arise between the values of a project and its respective choice of platform. This strongly echoes my own experience in the R community, and I’ve gotten a measure of clarity for future and existing projects by learning to articulate these conflicts.

Continue Reading →

Why There is No importFrom() Function in R

    27 August 2018 // tagged

For those coming to R from other languages, it seems very odd that R users import code from packages the way that they do. For instance, if you are used to Python, this is the general pattern of “using code from elsewhere”:

import math
import numpy as np
from random import randint

# Usage:

Meanwhile, in R, you’re likely to see


data_frame(x = 1, y = "A") %>%
  mutate(z = TRUE)

… with the subtext being that users must simply know that the data_frame(), mutate(), and %>% functions are actually all from the dplyr package. Why is R so unusual here?

Continue Reading →

Communicating with UDP Sockets from R

    6 July 2018 // tagged

Recently I wanted to send some Shiny usage data from R to a certain metrics server. Since R includes write.socket() and friends for opening arbitrary network sockets, it seemed at the outset that this would be quite simple. However, I ran into an interesting roadblock along the way.

It turns out that R’s socket API only supports TCP connections, which you can confirm by looking at the source code – and in this case I needed to send UDP packets instead. This was a little surprising to me, since most other languages would include UDP support out of the box; it’s a core internet protocol, after all. For whatever reason, this seems not to be the case with R, and even after searching CRAN and GitHub I wasn’t able to find an existing package that provides UDP socket support.

To remedy this, I put together a simple way to write messages to UDP sockets from R.

Continue Reading →