Unconstant Conjunction latest posts

A Bayesian Estimate of BackBlaze's Hard Drive Failure Rates

    13 February 2020 // tagged
Bayesian Hard Drive Failure Rates

Each quarter the backup service BackBlaze publishes data on the failure rate of its hundreds of thousands of hard drives, most recently on February 11th. Since the failure rate of different models can vary widely, these posts sometimes make a splash in the tech community. They’re also notable as the only large public dataset on drive failures:

BackBlaze 2019 Annualized Hard Drive Failure Rates

One of the things that strikes me about the presentation above is that BackBlaze uses simple averages to compute the “Annualized Failure Rate” (AFR), despite the fact that the actual count data vary by orders of magnitude, down to a single digit. This might lead us to question the accuracy of the estimates for smaller samples; in fact, the authors are sensitive to this possibility and suppress data from drives with fewer than 5,000 days of operation in Q4 2019 (although these drives are detailed in the text of the article and available in their public datasets).

This looks like a perfect use case for a Bayesian approach: we want to combine a prior expectation of the failure rate (which might be close to the historical average across all drives) with observed failure events to produce a more accurate estimate for each model.
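As a rough sketch of what this combination can look like (not necessarily the model used in the full post), one common choice is a Gamma-Poisson model: failures are treated as Poisson events over the observed drive-years, and the prior on the annual failure rate is a Gamma distribution. The function name `shrink_afr`, the prior parameters, and the counts below are all made up for illustration; they are not BackBlaze's figures.

```r
# Shrink per-model annualized failure rates toward a prior using a
# Gamma-Poisson model. All names and numbers here are illustrative.
#
# Assumed prior: Gamma(shape = 2, rate = 100), i.e. a prior mean of 2% per
# year, worth roughly "2 failures in 100 drive-years" of information.
shrink_afr <- function(failures, drive_days,
                       prior_shape = 2, prior_rate = 100) {
  drive_years <- drive_days / 365

  # Conjugate update: add observed failures to the shape and observed
  # exposure (in drive-years) to the rate.
  post_shape <- prior_shape + failures
  post_rate <- prior_rate + drive_years

  data.frame(
    naive_afr = failures / drive_years,      # simple average
    posterior_afr = post_shape / post_rate,  # posterior mean
    lower = qgamma(0.025, shape = post_shape, rate = post_rate),
    upper = qgamma(0.975, shape = post_shape, rate = post_rate)
  )
}

# Two hypothetical models: one with very little exposure, one with a lot.
est <- shrink_afr(failures = c(2, 150), drive_days = c(4000, 5e6))
print(est)
```

Under these assumptions the estimate for the small-sample model is pulled strongly toward the prior mean, while the estimate for the model with millions of drive-days barely moves from the simple average.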


Browsing Twitch.tv From Emacs

A video of my presentation from EmacsConf 2019 is now available. You can check out the recording below or see the slides here.


Browser-based applications can sometimes punish even new machines. In 2015, due to limited hardware, I was no longer able to use the popular video streaming site Twitch.tv to follow eSports. I investigated some alternatives at the time, but they lacked discovery and curation features, and so I decided to write a full-fledged Twitch client in my favourite text editor, Emacs. Years later, I still use this little bit of Emacs Lisp almost every day.

The talk discusses how I was able to use the richness of the built-in Emacs features and some community packages to build this client, as well as the various bumps along the way.

The code is available on GitHub.


Structured Errors in Plumber APIs

If you’ve used the Plumber package to make R models or other code accessible to others via an API, sooner or later you will need to decide how to handle and report errors.

By default, Plumber will catch R-level errors (like calls to stop()) and report them to users of your API as a JSON-encoded error message with HTTP status code 500 – also known as Internal Server Error. This might look something like the following from the command line:

$ curl -v localhost:8000/status
> GET /status HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 500 Internal Server Error
< Date: Sun, 24 Mar 2019 22:56:27 GMT
< Content-Type: application/json
< Date: Sun, 24 Mar 2019 10:56:27 PM GMT
< Connection: close
< Content-Length: 97
< 
* Closing connection 0
{"error":["500 - Internal server error"],"message":["Error: Missing required 'id' parameter.\n"]}

There are two problems with this approach: first, it gives you almost zero control over how errors are reported to real users, and second, it’s badly behaved at the protocol level – HTTP status codes provide for much more granular and semantically meaningful error reporting.

In my view, the key to overcoming these problems is treating errors as more than simply a message and adding additional context when they are emitted. This is sometimes called structured error handling, and although it has not been used much historically in R, this may be changing. As you’ll see, we can take advantage of R’s powerful condition system to implement rich error handling and reporting for Plumber APIs with relative ease.
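To make the idea concrete, here is a minimal base R sketch of an error that carries an HTTP status code alongside its message. The names `api_error`, `get_item`, and `handle` are hypothetical and exist only for this illustration; wiring the result into Plumber's own request and response objects is deliberately left out.

```r
# Hypothetical constructor: an error condition that also carries an HTTP
# status code. The extra class lets handlers pick it out by type.
api_error <- function(message, status = 500L) {
  structure(
    class = c("api_error", "error", "condition"),
    list(message = message, call = NULL, status = status)
  )
}

# Hypothetical endpoint logic: signal a 400 rather than a bare stop().
get_item <- function(id = NULL) {
  if (is.null(id)) {
    stop(api_error("Missing required 'id' parameter.", status = 400L))
  }
  list(id = id)
}

# Evaluate an expression and translate conditions into a status/body pair.
# The more specific handler is listed first so it wins over plain errors.
handle <- function(expr) {
  tryCatch(
    list(status = 200L, body = expr),
    api_error = function(e) {
      list(status = e$status, body = list(error = conditionMessage(e)))
    },
    error = function(e) {
      list(status = 500L, body = list(error = "Internal server error"))
    }
  )
}

result <- handle(get_item())
str(result)
```

Because the condition still inherits from "error", code that only knows about ordinary errors keeps working, while code that knows about the extra class can report a more meaningful status.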


Writing Proprietary R Packages

Author’s note: this is a lightly modified version of the talk I gave at the GTA R User’s Group in May of this year. You can find the original slides here. Unfortunately, the talk was not recorded.

As I have noted before, most resources for R package authors are pitched at those writing open-source packages — usually hosted on GitHub, and with the goal of ending up on CRAN.

These are valuable resources, and reflect the healthy free and open-source (FOSS) R package ecosystem. But it is not the whole story. Many R users, especially those working as data scientists in industry, can and should be writing packages for internal use within their company or organisation.

Yet there is comparatively little out there about how to actually put together high-quality packages in these environments.

This post is my attempt to address that gap.

At work we have more than 50 internal R packages, and over the last two years I have been heavily involved in building up the culture and tooling that make managing those packages possible.

I’ll focus on three major themes: code, tooling, and culture.


An Autoconf Primer for R Package Authors

Have you ever noticed something like the following when you’re installing an R package?

checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
...
configure: creating ./config.status
config.status: creating src/Makevars

On Windows, users install pre-built packages, but on all other platforms (including macOS and Linux), they are built from source, including any native C, C++, or Fortran code. Often it’s enough to use the default R settings to build these packages, but sometimes you might need to know a bit more about a user’s system in order to get things working correctly.

For this reason, R permits packages to have a ./configure shell script to make these checks before a package is installed.
