Unconstant Conjunction latest posts

Joining RStudio

Over the last few years I’ve become interested in opportunities for using R in production, in the broad sense of code with users other than its original author. This matters because production use is an important avenue for R users to bring value to others. In the words of David Robinson, “anything still on your computer is, to a first approximation, useless.”

My public work in the R community has also largely fallen into this category, including production-grade metrics and logging packages, connecting R to RabbitMQ (which is very popular in enterprise production systems), and writing R’s first external profiler – a tool for answering questions like “why is my R code slow in production?”

My private work in R has included platforms and tools for building and hosting many dozens of internal packages, as well as the development of production R APIs and Shiny apps. I was also intimately involved in migrating many of these workloads to Kubernetes – increasingly the target for production workloads in industry, and a tool I feel strongly about as part of R’s future.

In my experience it can be empowering for R users to have a clear path from ad-hoc analysis to “data products” – graphics, emails, reports, APIs, or even interactive applications – as they need them. And so I’ve been advocating for these data products to have production-friendly defaults (or at least production-friendly stories), all while personally discovering what it means, operationally, to actually take R to production.

A few months ago I had the realisation that if I’m really serious about expanding the frontier for R in production, the best place to do so is at RStudio. No organisation is as plugged into the broader ecosystem as they are, and no organisation has as broad a reach to improve these tools and techniques for the wider data science community, in R and beyond.

So I’m happy to announce that I have joined RStudio, at least in part to work on improving the story for R in production, and to help out the folks making that leap.

Continue Reading →

Shipping Application Logs from RStudio Connect

RStudio has an enterprise offering called RStudio Connect that is designed to host R (and now Python) content. In my experience it’s a great platform for R users working in the RStudio IDE, and works particularly well for frictionless deployment of internal Shiny apps or RMarkdown reports.

But one of the things we’ve struggled with in using Connect is making Shiny app logs useful. You can view logs for individual sessions in the browser (if you know what you’re looking for), but there’s no searching or aggregation of any kind. This is the existing interface:

RStudio Connect Log GUI

This is doubly frustrating for users familiar with any other log management system – a crowded field, with many commercial and open-source solutions available. These solutions typically offer not only searching but complex querying, visualisation, and ad-hoc dashboard capabilities. For example, check out the landing page for Elasticsearch/Kibana, which is the most popular open-source option. Common commercial choices include Splunk and Datadog, and all of the cloud vendors have highly-integrated platforms of their own.

Unfortunately, Connect won’t work natively with any of these tools; its log management is limited to the per-session GUI.

Motivated by our own internal desire to get Shiny logs into Splunk, I wrote a Connect-specific plugin for the open-source Fluent Bit project. Fluent Bit itself is a fast, lightweight log forwarding and aggregation program with a vibrant plugin community.
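To sketch how the pieces fit together: Fluent Bit picks up Connect’s application logs through the plugin and forwards them to a destination of your choice. The configuration below is illustrative only – the input plugin’s real name and options live in the project’s README (the `rsconnect` name here is a hypothetical stand-in), while the `splunk` output is Fluent Bit’s built-in one:

```ini
[INPUT]
    # Hypothetical name for the custom Connect input plugin; check the
    # plugin's README for its actual name and settings.
    Name  rsconnect

[OUTPUT]
    # Fluent Bit's built-in Splunk output, pointed at an HTTP Event
    # Collector (HEC) endpoint. The host and token are placeholders.
    Name          splunk
    Match         *
    Host          splunk.example.com
    Port          8088
    Splunk_Token  ${SPLUNK_TOKEN}
    TLS           On
```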

Continue Reading →

.NET, rsync, and the Linux Page Cache: A Kubernetes War Story

Everyone gets a Kubernetes war story, right? Well, here’s mine.

At work we have a huge, business-critical C# service that I helped to port to Linux (from the Windows-only .NET Framework) to run under Kubernetes. This service emits a large number of logs.

Like, a lot of logs. Many gigabytes of logs per day, in sequentially-numbered text files of 50MB each.

For various reasons the developers of this application preferred to be able to look at these logs as actual files for post-mortem analysis rather than through our existing centralised logging tools. Under Windows they would remote into a server and inspect a mounted drive with the logs.

As part of the porting effort they asked me to emulate this workflow.

Continue Reading →

Pushing Prometheus Metrics from R Scripts and Reports

From R to Pushgateway to Prometheus

The openmetrics R package now supports pushing metrics to a Prometheus Pushgateway instance, which is useful for short-lived batch scripts or RMarkdown reports.

You might want to expose metrics from these scripts or reports to Prometheus in order to improve monitoring and alerting on failures, but many of these processes are not around long enough to run a webserver that Prometheus can pull from.

This is where the Pushgateway comes in. It allows you to push metrics to a centralised location where they can be aggregated and then scraped by Prometheus itself. But beware: there are a limited number of use cases for pushing metrics, and you should always prefer pull-based methods when possible.
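As a minimal sketch of what this looks like in a batch script (the metric names and the local Pushgateway URL are placeholders for illustration):

```r
library(openmetrics)

# Record what the batch run did. These metrics live in the package's
# default registry until they are pushed.
jobs <- counter_metric("batch_jobs_processed", "Jobs processed by this run.")
duration <- gauge_metric("batch_duration_seconds", "Wall time of the run.")

started <- Sys.time()
# ... the actual work of the script goes here ...
jobs$inc(5)  # e.g. five jobs processed this run
duration$set(as.numeric(Sys.time() - started, units = "secs"))

# Hand the metrics off to the Pushgateway before the script exits, so
# that Prometheus can scrape them later from a long-lived location.
push_to_gateway("http://localhost:9091", job = "nightly-batch")
```

The job name groups the pushed metrics on the gateway, so successive runs of the same script update a single set of series rather than piling up new ones.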

Continue Reading →

Annotating Deployments in Grafana Using the Process Start Time Metric

Grafana sports a feature called Annotations that allows you to label a timestamp on a dashboard with meaningful events – most commonly deployments, campaigns, or outages:

Process start time annotations on a Grafana panel

(In this case annotating the simulated deployment of a Fluent Bit container, which I’ve used to forward container logs out of the cluster.)

Annotations can be input manually, but the only recommendations I’ve seen for generating them automatically are to use something like Loki, or to teach your CI/CD system to interact with Grafana’s web API. However, if you’re running a simple Prometheus + Grafana stack (say, using the Prometheus Operator on Kubernetes), you might be reluctant to add more complexity to your setup just to get deployment annotations.

Fortunately, there’s a simpler alternative for this narrow case: you can use the process_start_time_seconds metric from Prometheus to get an approximate idea of when apps or pods were started. I haven’t seen this approach recommended elsewhere, which is the purpose of this post.

Continue Reading →