Selectively Loading Functions from R Packages with base::use()

It seems named imports will be available from R-4.5.0 (?)

Every year, around late April, a new R release arrives. The spring release usually increments the minor (or occasionally the major) version, so if things proceed at the usual pace, R-4.5.0 should be released in April 2025.

While looking through the R-devel NEWS (which should become the release notes for R-4.5.0), besides items like "C23 will be the default C standard for R package installation" and "-DR_NO_REMAP will be automatically enabled when compiling C++ code," the following item caught my eye.

New function use() to use packages in R scripts with full control over what gets added to the search path. (Actually already available since R 4.4.0.)

You didn't know about use(), did you?

I suspect most people hadn't noticed it. In fact, it is already available in the currently released R-4.4.x, so I gave it a try.

Named imports via base::use()

Checking the help for base::use() shows the following (confirmed in R-4.4.2):

Description

Use packages in R scripts by loading their namespace and attaching a
package environment including (a subset of) their exports to the
search path.

Usage

use(package, include.only)

Arguments

package

a character string given the name of a package.

include.only

character vector of names of objects to include in the attached
environment frame. If missing, all exports are included.

Details

This is a simple wrapper around library which always uses
attach.required = FALSE, so that packages listed in the Depends clause
of the DESCRIPTION file of the package to be used never get attached
automatically to the search path.

This therefore allows to write R scripts with full control over what
gets found on the search path. In addition, such scripts can easily be
integrated as package code, replacing the calls to use by the
corresponding ImportFrom directives in ‘NAMESPACE’ files.

Value

(invisibly) a logical indicating whether the package to be used is
available.

Note

This functionality is still experimental: interfaces may change in
future versions.

"Use packages in R scripts by loading their namespace and attaching a package environment including (a subset of) their exports to the search path" essentially means that you can load a package's namespace and add a package environment containing only specific objects from that package to the search path.

This description alone may be hard to follow, so consider a concrete example:

use("dplyr", c("filter", "mutate"))
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:stats':
#>
#>     filter

This attaches only filter() and mutate() out of everything the dplyr package exports. I'm not sure what this feature is generally called, but borrowing the terminology used for import in JavaScript, I will call it a "named import" here.

Named import??

The term "named import" might sound a bit vague, but it means binding a specific exported object directly to its own name in the importing scope. By contrast, something like Python's import numpy as np, where a whole namespace is bound to a (potentially different) name, is called a "namespace import" on the page linked above.

Since we have performed a named import of only filter() and mutate() from dplyr here, we can call filter() and mutate() without the :: prefix as shown below. However, trying to call select() (intending to use dplyr::select()) results in an error.

dat <- mtcars |>
  filter(mpg > 20) |>
  mutate(mpg = mpg * 2)
dat
#>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4      42.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag  42.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710     45.6   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive 42.8   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> Merc 240D      48.8   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> Merc 230       45.6   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> Fiat 128       64.8   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Honda Civic    60.8   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Toyota Corolla 67.8   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> Toyota Corona  43.0   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> Fiat X1-9      54.6   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Porsche 914-2  52.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> Lotus Europa   60.8   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Volvo 142E     42.8   4 121.0 109 4.11 2.780 18.60  1  1    4    2

try(select(dat, mpg))
#>
#> Error in select(dat, mpg) : could not find function "select"

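Since the help describes use() as a simple wrapper around library(), the same named import can presumably be written out by hand. The following is a minimal sketch of that long-hand form, not the actual source of use(). It uses the tools package, which ships with R, as a stand-in for dplyr so that it runs without installing anything; that substitution is my own assumption for illustration.

```r
# Long-hand form of use("tools", c("file_ext", "toTitleCase")).
# 'tools' ships with R and stands in for dplyr here (assumption for portability).
ok <- library(
  "tools",
  character.only  = TRUE,   # the package name is given as a string
  logical.return  = TRUE,   # return TRUE/FALSE, like use() does
  include.only    = c("file_ext", "toTitleCase"),
  attach.required = FALSE   # never attach packages listed in Depends
)

# Only the two requested names end up in the attached package environment.
named_import_names <- ls("package:tools")

file_ext("report.csv")
#> [1] "csv"
```

Other exports of tools, such as md5sum(), stay reachable only with the tools:: prefix.
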
Explanation

Let's explain in more detail what "Use packages in R scripts by loading their namespace and attaching a package environment including (a subset of) their exports to the search path" means.

Environments in the R language are explained in Chapter 7 of Hadley's Advanced R, so please refer to that as well if you're interested.

First, an environment is, roughly speaking, like a box in which variables live. When you do something like dat <- mtcars in R, the value of mtcars is bound to the name dat within an environment named .GlobalEnv.

When we run library(dplyr), it might look at first glance as if the functions provided by the dplyr package have been added collectively to our current environment. However, this doesn't mean that functions like filter provided by dplyr are newly bound in .GlobalEnv. In fact, we can create another function named filter in .GlobalEnv, and the function created that way will be called with higher priority than dplyr's filter.
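
The masking behaviour can be tried without dplyr at all. The sketch below uses stats::filter, which is attached by default in a fresh session, as a stand-in for dplyr's filter; the mechanism should be the same. The function body "my filter" is just a placeholder.

```r
# In a fresh session, the name 'filter' resolves to the attached stats package.
before <- identical(filter, stats::filter)

# Bind our own function named 'filter' in the global environment.
assign("filter", function(...) "my filter", envir = globalenv())

mine  <- filter(1:3)                       # the global binding is found first
after <- identical(filter, stats::filter)  # FALSE while the global binding exists

# Removing the global binding makes the stats version visible again.
rm(list = "filter", envir = globalenv())
restored <- identical(filter, stats::filter)
```

Nothing inside the stats package is modified by this; only the lookup order decides which filter is found.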

This is because, conceptually, environments are nested. .GlobalEnv is the innermost environment, and it is surrounded by package environments loaded via library(). A variable that doesn't exist in the inner environment is searched for in its parent environment one level out, and if it doesn't exist there either, it's searched for in the next parent environment.

In other words, library() corresponds to preparing the specified package environment as a parent of .GlobalEnv. Advanced R explains this point as follows:

Each package attached by library() or require() becomes one of the parents of the global environment. The immediate parent of the global environment is the last package you attached, the parent of that package is the second to last package you attached, …

If you follow all the parents back, you see the order in which every package has been attached. This is known as the search path because all objects in these environments can be found from the top-level interactive workspace.

You can check the list of package environments on the search path using base::search(). If we actually call search() after running the previous code, you can see that the immediate parent environment of .GlobalEnv is package:dplyr as follows:

search()
#>
#>  [1] ".GlobalEnv"        "package:dplyr"     "package:stats"
#>  [4] "package:graphics"  "package:grDevices" "package:utils"
#>  [7] "package:datasets"  "package:methods"   "Autoloads"
#> [10] "package:base"

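The same list can be reproduced by following parent.env() by hand, starting from the global environment. walk_parents() below is a hypothetical helper written for this illustration. Note that environmentName() labels the first and last entries differently from search() ("R_GlobalEnv" and "base" rather than ".GlobalEnv" and "package:base").

```r
# Follow the chain of parent environments from .GlobalEnv to the empty environment.
walk_parents <- function(env = globalenv()) {
  env_names <- character()
  while (!identical(env, emptyenv())) {
    env_names <- c(env_names, environmentName(env))
    env <- parent.env(env)
  }
  env_names
}

chain <- walk_parents()
chain

# The chain has one entry per element of the search path.
length(chain) == length(search())
```
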
By the way, each package environment is derived from a separate environment that is not on the search path. These are called package namespaces. A namespace is loaded in its entirety as soon as any object in it is accessed (for example via ::), but the package environment isn't added to the search path unless you call library(). In R terminology, the former (the namespace is loaded) is called loading, while the latter (the package environment is added to the search path) is called attaching.
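
The difference between loading and attaching can be observed with base functions alone. This sketch uses the splines package, which ships with R, and assumes it has not been attached in the current session.

```r
pkg <- "splines"

# Whether the namespace happens to be loaded already depends on the session.
was_loaded <- isNamespaceLoaded(pkg)

# Touching a single export with :: loads the whole namespace ...
bs_fun <- splines::bs
loaded_now <- isNamespaceLoaded(pkg)

# ... but nothing is added to the search path, so the package is not attached.
attached_now <- paste0("package:", pkg) %in% search()

c(loaded = loaded_now, attached = attached_now)
```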

Let's verify this. First, since package:dplyr was on the search path earlier, the dplyr package environment is already attached. You can access the dplyr package environment attached in this session via rlang::pkg_env("dplyr"), and you can confirm that while it contains filter and mutate, it does not contain select.

rlang::is_attached("package:dplyr")
#>
#> [1] TRUE
rlang::env_has(rlang::pkg_env("dplyr"), c("filter", "mutate", "select"))
#>
#> filter mutate select
#>   TRUE   TRUE  FALSE

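If rlang is not installed, roughly the same checks can be done with base R. This is a sketch assuming R 4.4.0 or later (for base::use()), with the tools package standing in for dplyr.

```r
# Requires R >= 4.4.0. 'tools' ships with R and stands in for dplyr (assumption).
use("tools", c("file_ext", "toTitleCase"))

# Base-R counterpart of rlang::is_attached(): look for the entry on the search path.
attached <- "package:tools" %in% search()

# Base-R counterpart of rlang::env_has(): check the package environment itself,
# without falling through to its parents.
pkg_env <- as.environment("package:tools")
has <- vapply(
  c("file_ext", "toTitleCase", "md5sum"),
  exists,
  logical(1),
  envir = pkg_env,
  inherits = FALSE
)
has
```

md5sum is exported by tools but was not listed in include.only, so it should be reported as absent from the attached environment.
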
Confusing Aspects

Now, what happens if we run library(dplyr) again?

library(dplyr)
rlang::env_has(rlang::pkg_env("dplyr"), c("filter", "mutate", "select"))
#> filter mutate select
#>   TRUE   TRUE  FALSE

It doesn't change. It seems that with the current specification (?), once you have selectively performed a named import, you cannot update the contents of that package environment even if you use library() or use() again. For instance, if you want to add just dplyr::select to the search path later, you need to detach("package:dplyr") first.

detach("package:dplyr")
use("dplyr", "select")
rlang::env_has(rlang::pkg_env("dplyr"), c("filter", "mutate", "select"))
#> filter mutate select
#>  FALSE  FALSE   TRUE

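If this detach-then-use dance comes up often, it could be wrapped in a small helper. reattach() below is a hypothetical function of my own, not part of base R, and the sketch assumes R 4.4.0 or later. It again uses the tools package as a stand-in for dplyr.

```r
# Hypothetical helper: detach the package if it is attached, then use() it again
# with a new set of names. Requires R >= 4.4.0 for base::use().
reattach <- function(package, include.only) {
  search_name <- paste0("package:", package)
  if (search_name %in% search()) {
    detach(search_name, character.only = TRUE)
  }
  use(package, include.only)
}

reattach("tools", "file_ext")
reattach("tools", "toTitleCase")

# Only the names from the most recent call remain attached.
attached_names <- ls("package:tools")
attached_names
#> [1] "toTitleCase"
```

Note that this replaces the previous selection rather than adding to it; to keep earlier names, they have to be passed again in include.only.
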
Also, since this seems to be a simple wrapper for library(), it affects the outer search path even when used inside a function.

rlang::is_attached("package:tidyr")
#> [1] FALSE
use_tidyr <- \() {
  use("tidyr", c("pivot_longer", "pivot_wider"))
}
use_tidyr()
rlang::is_attached("package:tidyr")
#> [1] TRUE

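For comparison, plain base R already allows a function-local binding that never touches the search path: assign the export to a local name with ::. This is only a sketch of the idea, not a replacement for what box or import offer. ext_of() is a hypothetical function, and the snippet assumes tools has not been attached in the session.

```r
# Bind one export to a local name; the binding disappears when the call returns.
ext_of <- function(path) {
  file_ext <- tools::file_ext   # local to this function call
  file_ext(path)
}

result <- ext_of("report.csv")
result
#> [1] "csv"

# The tools namespace is loaded, but nothing was attached.
leaked <- "package:tools" %in% search()
leaked
```
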
The help says that scripts written with use() can easily be integrated into package code by replacing the calls to use() with the corresponding importFrom directives in NAMESPACE, but use() itself cannot restrict the scope of the attachment to the inside of a regular function. Therefore, it's unlikely that box::use() or import::here() will become obsolete. If you want that kind of scoped import, those packages are the ones to look into.

Summary

  • base::use(), announced in the NEWS for R-4.5.0 (and actually available since R-4.4.0), lets you load only specific functions from an R package.
  • Will there be situations where this is useful to know? Perhaps...?
  • Let's keep it in mind.