Have you ever noticed something like the following when you’re installing an R package?
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
...
configure: creating ./config.status
config.status: creating src/Makevars
On Windows, users install pre-built packages, but on all other platforms (including macOS and Linux), packages are built from source, including any native C, C++, or Fortran code. Often it’s enough to use the default R settings to build these packages, but sometimes you might need to know a bit more about a user’s system in order to get things working correctly.
For this reason, R permits packages to have a ./configure shell script to make these checks before a package is installed.
Far and away the most common reason you’d need a ./configure script is to check for the presence of a system library that your package needs. You can’t be sure in advance what this library will be called – it may have a different name on Linux and macOS, for example – or where it will be located on a user’s machine, so you need at least a little bit of logic to try and track it down.
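To give a rough sense of what that logic involves, here is a minimal shell sketch of a hand-rolled header search. The find_header function and the candidate directories are hypothetical, chosen only for illustration; a real check would also need to confirm that the library actually links.

```shell
#!/bin/sh
# Hypothetical helper: print an -I flag for the first directory that
# contains the given header, or return non-zero if none does.
find_header() {
  header=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      printf '%s\n' "-I$dir"
      return 0
    fi
  done
  return 1
}

# Example: look for zlib.h in a few commonly used include directories
# (an assumed list, not an exhaustive one).
find_header zlib.h /usr/include /usr/local/include /opt/homebrew/include ||
  echo "zlib.h not found in the directories searched"
```

Even this small check says nothing about library names, compilers, or linker flags, so hand-written scripts tend to grow quickly.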
Some package authors like to write ./configure scripts by hand (for example, data.table and protolite). But many package authors instead opt to use Autoconf, which is an arcane but highly useful language and tool for generating ./configure scripts that will work on most systems. It also comes with a ton of built-in macros for the kinds of checks you are likely to make. It is Autoconf that generates output like the snippet at the start of the post.
Unfortunately, there are few resources out there on how to write Autoconf scripts for R packages.1 Partly as a result, these scripts can be messy and inconsistent in practice. This post aims to provide the boilerplate you need to get started with some “best practices” in place.
Getting Started
To use Autoconf, you’ll generally need the following:
- A configure.ac script;
- Template files, most often a src/Makevars.in;
- Various entries in your .gitignore or .Rbuildignore; and
- Optionally, a ./cleanup script to delete various intermediate files.
First, create an src/Makevars.in
file with the following contents:2
PKG_CPPFLAGS = -I. @PKG_CPPFLAGS@
PKG_LIBS = @PKG_LIBS@
This is a template file that Autoconf will fill in to produce a final Makevars file telling R how to compile your package correctly. Because src/Makevars is generated at install time, you should add it to both your .Rbuildignore and your .gitignore; src/Makevars.in itself must ship with the package, so keep it out of .Rbuildignore.
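If the templating step feels opaque, the following shell sketch imitates the substitution that the generated configure script performs on this template. It is only an approximation using sed: the render_template function and the /opt/zlib/include path are made up for illustration, and the real substitution is done by config.status.

```shell
#!/bin/sh
# Hypothetical stand-in for the @VAR@ substitution done by config.status.
# $1: value for @PKG_CPPFLAGS@, $2: value for @PKG_LIBS@; template on stdin.
# Note: values containing '|', '&' or backslashes would confuse sed here.
render_template() {
  sed -e "s|@PKG_CPPFLAGS@|$1|g" -e "s|@PKG_LIBS@|$2|g"
}

printf 'PKG_CPPFLAGS = -I. @PKG_CPPFLAGS@\nPKG_LIBS = @PKG_LIBS@\n' |
  render_template "-I/opt/zlib/include" "-lz"
# prints:
#   PKG_CPPFLAGS = -I. -I/opt/zlib/include
#   PKG_LIBS = -lz
```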
Second, create a file called configure.ac like the one below:
AC_INIT([<package>],[<version>])
# Find the compiler and compiler flags used by R.
: ${R_HOME=`R RHOME`}
if test -z "${R_HOME}"; then
echo "could not determine R_HOME"
exit 1
fi
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
# Search for a system library, in this case the zlib compression library (which
# contains the function 'deflate'). This sets the variable 'ac_cv_search_deflate'
# to what must be passed to ${PKG_LIBS}.
AC_SEARCH_LIBS([deflate], [z], [], [AC_MSG_ERROR([The zlib library is required.])])
AC_CHECK_HEADERS([zlib.h], [], [AC_MSG_ERROR([The zlib library headers are required.])])
# Write the flags into the src/Makevars file.
AC_SUBST([PKG_CPPFLAGS], ["${PKG_CPPFLAGS}"])
AC_SUBST([PKG_LIBS], ["${LIBS} ${PKG_LIBS} ${ac_cv_search_deflate}"])
AC_CONFIG_FILES([src/Makevars])
AC_OUTPUT
echo "
--------------------------------------------------
Configuration for ${PACKAGE_NAME} ${PACKAGE_VERSION}
cppflags: ${CPPFLAGS} ${PKG_CPPFLAGS}
libs: ${PKG_LIBS}
--------------------------------------------------
"
I was not kidding about the “arcane” bit. But you can now run the command autoconf (or, better yet, autoreconf) in your package directory to generate a configure script that R can understand.
Now, the contents of this generated script are scary and near-unreadable, but
that doesn’t really matter, since you will never edit the file by hand anyway.
It is good practice to check both the configure.ac and configure files into version control, and in fact R CMD check will even warn you if you forget.
Finally, you might also notice that autoreconf and the generated configure script produce a bunch of intermediate files that you will want to add to your .gitignore:
autom4te.cache
config.log
config.status
This should give you a functional Autoconf setup that will check for the zlib
library on the user’s system and link it to your compiled code.
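If you would rather script the ignore entries than add them by hand, a small helper along these lines can do it. This is only a sketch: add_ignore_entries is a hypothetical function name, and it simply appends any entry that is not already present as a whole line.

```shell
#!/bin/sh
# Hypothetical helper: append each entry to an ignore file unless the file
# already contains that exact line.
# $1: path to the ignore file; remaining args: entries to add.
add_ignore_entries() {
  file=$1
  shift
  touch "$file"
  for entry in "$@"; do
    grep -qxF -e "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
  done
}
```

Running add_ignore_entries .gitignore autom4te.cache config.log config.status src/Makevars from the package root would cover the files mentioned so far, and running it twice leaves the file unchanged.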
More Common Patterns
1. Using C++ Instead of C
For a project using C++ code instead of C – for example, if you are using Rcpp – you can make the following small adjustment. Replace
CC=`"${R_HOME}/bin/R" CMD config CC`
CFLAGS=`"${R_HOME}/bin/R" CMD config CFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
in the example above with
CXX=`"${R_HOME}/bin/R" CMD config CXX`
CXXFLAGS=`"${R_HOME}/bin/R" CMD config CXXFLAGS`
CPPFLAGS=`"${R_HOME}/bin/R" CMD config CPPFLAGS`
AC_LANG(C++)
AC_PROG_CPP
Autoconf will then run automated checks for C++ features (instead of C). Note that it is important to set CXX, CXXFLAGS and CPPFLAGS before you make these checks, to ensure that Autoconf will check these features using the C++ toolchain that R will actually use to compile your package.
2. Finding Libraries Using pkg-config
Most Linux distributions3 come with a program called pkg-config that allows you to query installed libraries for the compiler and linker flags needed to build against them.
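To see what such a query looks like outside of Autoconf, here is a small shell sketch. The query_flags wrapper is hypothetical; it relies only on the standard --exists, --cflags and --libs options of pkg-config and falls back on a caller-supplied default when pkg-config or the module is missing.

```shell
#!/bin/sh
# Hypothetical wrapper around pkg-config.
# $1: module name, $2: "--cflags" or "--libs", $3: fallback value.
query_flags() {
  if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$1"; then
    pkg-config "$2" "$1"
  else
    printf '%s\n' "$3"
  fi
}

# The output depends on what is installed on your machine.
echo "zlib compile flags: $(query_flags zlib --cflags "")"
echo "zlib linker flags:  $(query_flags zlib --libs "-lz")"
```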
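I would suggest using pkg-config by default, falling back on AC_CHECK_HEADERS otherwise. pkg-config ships with a PKG_PROG_PKG_CONFIG macro for Autoconf (in its pkg.m4 file) that will set $PKG_CONFIG if the program is available. Because the macro comes from pkg-config rather than from Autoconf itself, pkg-config needs to be installed on the machine where you run autoreconf. With that in place, you can set up these checks fairly easily: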
have_zlib=no
ZLIB_CXXFLAGS=""
ZLIB_LIBS="-lz"
PKG_PROG_PKG_CONFIG
if test -n "${PKG_CONFIG}"; then
  AC_MSG_CHECKING([pkg-config for zlib])
  if "${PKG_CONFIG}" --exists zlib; then
    have_zlib=yes
    ZLIB_CXXFLAGS=`"${PKG_CONFIG}" --cflags zlib`
    ZLIB_LIBS=`"${PKG_CONFIG}" --libs zlib`
  fi
  AC_MSG_RESULT([${have_zlib}])
fi
# Fall back on a plain header check when pkg-config cannot find zlib.
if test "x${have_zlib}" = xno; then
  AC_CHECK_HEADERS([zlib.h], [have_zlib=yes],
    [AC_MSG_ERROR([The zlib library headers are required.])])
fi
# Pass the zlib include and linker flags through to src/Makevars.
AC_SUBST([PKG_CPPFLAGS], ["${PKG_CPPFLAGS} ${ZLIB_CXXFLAGS}"])
AC_SUBST([PKG_LIBS], ["${LIBS} ${PKG_LIBS} ${ZLIB_LIBS}"])
3. Showing Users What They Need to Install
The examples above will print
The zlib library headers are required.
in case of failure. This isn’t a terribly helpful message for users, so an emerging best practice for R packages is to provide more actionable messages explaining what users need to install on various systems to get things working.
For example, you could remove the error message from the AC_CHECK_HEADERS call above and add the following block below it:
if test "x${have_zlib}" = xno; then
AC_MSG_FAILURE([
---------------------------------------------
'zlib' and its header files are required.
Please install:
* zlib1g-dev (on Debian and Ubuntu)
* zlib-devel (on Fedora, CentOS and RHEL)
* zlib (via Homebrew on macOS)
* libz1 (on Solaris)
and try again.
If you believe this library is installed on your system but
this script is simply unable to find it, you can specify the
include and lib paths manually:
R CMD INSTALL ${PACKAGE_NAME} \\
--configure-vars='LIBS=-L/path/to/libs CPPFLAGS=-I/path/to/headers'
---------------------------------------------])
fi
A few quick searches should turn up the various package names on each platform. The ones in the example above are the main platforms that CRAN checks against, although instructions for Solaris are seldom worth the bother.
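If you want to go one step further and show only the hint that applies to the user's platform, a lookup along these lines is one option. This is a sketch rather than an established convention: zlib_hint is a hypothetical function, the package names are the ones from the message above, and the distribution IDs are assumed to match the ID field of /etc/os-release.

```shell
#!/bin/sh
# Hypothetical helper: map a platform to the zlib package a user should
# install. $1: output of `uname -s`; $2: optional ID from /etc/os-release.
zlib_hint() {
  case "$1" in
    Darwin)
      echo "zlib (via Homebrew)" ;;
    Linux)
      case "${2:-}" in
        debian|ubuntu)      echo "zlib1g-dev" ;;
        fedora|centos|rhel) echo "zlib-devel" ;;
        *)                  echo "the zlib development package" ;;
      esac ;;
    *)
      echo "zlib and its header files" ;;
  esac
}

zlib_hint Linux ubuntu
# prints: zlib1g-dev
```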
4. Using a Cleanup Script
R permits packages to have a ./cleanup script. From Writing R Extensions:
Under a Unix-alike only, an executable (Bourne shell) script cleanup is executed as the last thing by R CMD INSTALL if option --clean was given, and by R CMD build when preparing the package for building from its source.
This can be a useful way to automatically clean up generated Autoconf files. Here is one to get you started:
#!/bin/sh
rm -f config.*
rm -f src/Makevars
rm -rf autom4te.cache
5. Windows Support
Windows users generally install binary packages, and will not run the ./configure script during installation. However, you might want to help out the CRAN maintainers who create these binary packages by creating a static, Windows-oriented src/Makevars.win file:
PKG_CPPFLAGS = -I.
PKG_LIBS = -lz
This will obviously break when it cannot find zlib, but we’re punting that responsibility to the binary package maintainers in this case.
Further Reading
If you’d like to dig deeper, the following popular R packages all have Autoconf scripts that I have looked to for inspiration in the past:
- The git2r package checks for libgit2, and if none is found, falls back on a bundled version of that library instead.
- The RProtoBuf package checks not only for the Protobuf library, but also the protoc compiler needed to generate some of the package’s C++ headers.
- The rJava package allows for discovering (and further customising, via user-supplied flags) many features of the R <-> Java bridge.
- The sf package checks for some complex and demanding system dependencies.
- The uuid package checks for some system libraries, but also checks if the socket address structure on that platform contains a particular field.
And, for resources on Autoconf more broadly, check out the official documentation and Autotools Mythbuster, which has a more example-driven approach.
1. I once tried to get a use_autoconf() function incorporated into the usethis package, but Hadley Wickham thought that too few users would benefit for it to be worth maintaining.
2. If you have an existing src/Makevars, it should be fairly clear how to move it to src/Makevars.in and add the right templating syntax.
3. pkg-config is also available for macOS, although it is not installed by default. Other Unix-based systems (such as Solaris and the BSDs) usually have pkg-config as well.