library(gh)
gh generally sends a Personal Access Token (PAT) with its requests. Some endpoints of the GitHub API can be accessed without authenticating yourself. But once your API use becomes more frequent, you will want a PAT to prevent problems with rate limits and to access all possible endpoints.
This article describes how to store your PAT so that gh can find it (automatically, in most cases). The function gh uses for this is gh_token().
More resources on PAT management:
- usethis::gh_token_help() and usethis::git_sitrep() help you check whether a PAT is discoverable and has suitable scopes.
- usethis::create_github_token() guides you through the process of getting a new PAT.
- gitcreds::gitcreds_set() helps you explicitly put your PAT into the Git credential store.
gh::gh() allows the user to provide a PAT via the .token argument and to specify a host other than “github.com” via the .api_url argument. (Some companies and universities run their own instance of GitHub Enterprise.)
gh(endpoint, ..., .token = NULL, ..., .api_url = NULL, ...)
However, it’s annoying to provide your PAT or host every time, and it’s unsafe for your PAT to appear explicitly in your R code. It’s important that the user can provide the PAT and/or API URL directly, but this should rarely be necessary. gh::gh() is designed to work well with more secure, less fiddly ways of expressing what you want.
How are .api_url and .token determined when the user does not provide them?
.api_url defaults to the value of the GITHUB_API_URL environment variable and, if that is unset, falls back to "https://api.github.com". This is always done before worrying about the PAT.
The PAT is then obtained via a call to gh_token(.api_url). That is, the token is looked up based on the host.
gh now uses the gitcreds package to interact with the Git credential store.
gh calls gitcreds::gitcreds_get() with a URL to try to find a matching PAT. gitcreds::gitcreds_get() checks session environment variables and then the local Git credential store. Therefore, if you have previously used a PAT with, e.g., command line Git, gh may retrieve and re-use it. You can call gitcreds::gitcreds_get() directly, yourself, if you want to see what is found for a specific URL.
gitcreds::gitcreds_get()
If you see something like this:
#> <gitcreds>
#> protocol: https
#> host : github.com
#> username: PersonalAccessToken
#> password: <-- hidden -->
that means that gitcreds could get the PAT from the Git credential store. You can call gitcreds_get()$password to see the actual PAT.
If no matching PAT is found, gitcreds::gitcreds_get() errors.
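The discovery order described above (host first, then an environment variable, then the credential store) can be sketched in base R. This is a minimal illustration using the hypothetical helpers default_api_url() and find_pat(); it is not the code of gh or gitcreds, and in real use gh::gh_token() does this work for you. The credential-store step is passed in as a function so that the sketch runs without Git.

```r
# Hypothetical helpers illustrating the documented lookup order.
# Neither function is part of gh or gitcreds.

# Step 1: decide the host first. Use GITHUB_API_URL when it is set and
# non-empty, otherwise fall back to the public GitHub API.
default_api_url <- function() {
  url <- Sys.getenv("GITHUB_API_URL")
  if (nzchar(url)) url else "https://api.github.com"
}

# Step 2: look for a PAT for that host. A non-empty environment variable
# wins; otherwise ask the credential store. `store_lookup` stands in for
# a call such as gitcreds::gitcreds_get() and, like it, may signal an
# error when nothing is found.
find_pat <- function(env_var, store_lookup) {
  pat <- Sys.getenv(env_var)
  if (nzchar(pat)) {
    return(pat)
  }
  store_lookup()
}

# Example with a made-up variable name that is assumed to be unset,
# so the lookup falls through to the (fake) credential store.
find_pat("EXAMPLE_PAT_UNSET", function() "pat-from-store")
#> [1] "pat-from-store"
```

Passing the store lookup as a function keeps the ordering logic separate from any real credential helper, which is what makes the sketch testable offline.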
If you don’t have a Git installation, or your Git installation does not have a working credential store, then you can specify the PAT in an environment variable. For github.com you can set the GITHUB_PAT_GITHUB_COM or GITHUB_PAT variable.
For a different GitHub host, call gitcreds::gitcreds_cache_envvar() with the API URL to see the environment variable you need to set. For example:
gitcreds::gitcreds_cache_envvar("https://github.acme.com")
#> [1] "GITHUB_PAT_GITHUB_ACME_COM"
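The two variable names in this article follow a visible pattern: GITHUB_PAT_ plus the host name, upper-cased, with dots replaced by underscores. The sketch below reproduces that pattern with a hypothetical helper, pat_envvar_name(). It is a simplification, not gitcreds's implementation; gitcreds::gitcreds_cache_envvar() remains the authoritative way to get the name, since it may handle cases (for example ports or user names in the URL) that this sketch ignores.

```r
# Hypothetical, simplified helper reproducing the naming pattern above.
# Not gitcreds's implementation; edge cases are deliberately ignored.
pat_envvar_name <- function(url) {
  host <- sub("^[a-z]+://", "", url)  # drop the scheme, e.g. "https://"
  host <- sub("/.*$", "", host)       # drop any path, e.g. "/api/v3"
  paste0("GITHUB_PAT_", toupper(gsub(".", "_", host, fixed = TRUE)))
}

pat_envvar_name("https://github.acme.com")
#> [1] "GITHUB_PAT_GITHUB_ACME_COM"
pat_envvar_name("https://github.com")
#> [1] "GITHUB_PAT_GITHUB_COM"
```

In practice, set the resulting variable outside your R code (in your shell, or as a CI secret) so that the PAT itself never appears in a script.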
On a machine used for interactive development, we recommend:
- Store your PAT(s) in an official credential store.
- Do not store your PAT(s) in plain text in, e.g., .Renviron. In the past, this has been a common and recommended practice for pragmatic reasons. However, gitcreds/gh have now evolved to the point where it’s possible for all of us to follow better security practices.
- If you use a general-purpose password manager, like 1Password or LastPass, you may also want to store your PAT(s) there. Why? If your PAT is “forgotten” from the OS-level credential store, intentionally or not, you’ll need to provide it again when prompted.
If you don’t have any other record of your PAT, you’ll have to get a new PAT whenever this happens. This is not the end of the world. But if you aren’t disciplined about deleting lost PATs from https://github.com/settings/tokens, you will eventually find yourself in a confusing situation where you can’t be sure which PAT(s) are in use.
On a headless system, such as on a CI/CD platform, provide the necessary PAT(s) via secure environment variables. Regular environment variables can be used to configure less sensitive settings, such as the API host. Don’t expose your PAT by doing something silly like dumping all environment variables to a log file.
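On a CI/CD platform it can help to confirm that the secret actually reached the job, without printing it. The helper below, pat_status(), is hypothetical and assumes the PAT lives in an environment variable such as GITHUB_PAT; it reports only whether the variable is set, never its value.

```r
# Hypothetical helper: say whether a PAT environment variable is set
# without ever including its value in the output (safe for CI logs).
pat_status <- function(var = "GITHUB_PAT") {
  if (nzchar(Sys.getenv(var))) {
    sprintf("%s is set (value hidden)", var)
  } else {
    sprintf("%s is not set", var)
  }
}

message(pat_status())
```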
Note that on GitHub Actions, specifically, an access token is automatically available to the workflow as the GITHUB_TOKEN secret. It is not strictly a personal access token, but gh can send it in the same way.
That is why many workflows in the R community contain this snippet:
env:
  GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
This makes the automatic token available as the GITHUB_PAT environment variable. If that token doesn’t have the right permissions, then you’ll need to explicitly provide one that does (see GitHub’s documentation on the GITHUB_TOKEN secret for more).
If there is no PAT to be had, gh::gh() sends a request with no token. (Internally, the Authorization header is omitted if the PAT is found to be the empty string, "".)
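The empty-string rule can be illustrated with a hypothetical auth_headers() function. This is a sketch of the behavior described above, not gh’s internal code; the "token" authorization scheme is an assumption (GitHub’s REST API also accepts "Bearer").

```r
# Hypothetical sketch of the rule described above; not gh's source code.
# Returns a named character vector of extra request headers.
auth_headers <- function(token) {
  if (identical(token, "")) {
    return(character())  # no PAT: omit the Authorization header entirely
  }
  c(Authorization = paste("token", token))  # "token" scheme is assumed
}

length(auth_headers(""))
#> [1] 0
```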
What do PAT-related failures look like?
If no PAT is sent and the endpoint requires no auth, the request probably succeeds! At least until you run up against rate limits. If the endpoint requires auth, you’ll get an HTTP error, possibly this one:
GitHub API error (401): 401 Unauthorized
Message: Requires authentication
If a PAT is first discovered in an environment variable, it is taken at face value. The two most common ways to arrive here are PAT specification via .Renviron or as a secret in a CI/CD platform, such as GitHub Actions. If the PAT is invalid, the first affected request will fail, probably like so:
GitHub API error (401): 401 Unauthorized
Message: Bad credentials
This will also be the experience if an invalid PAT is provided directly via .token.
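The message that accompanies a 401 hints at which of the two situations above you are in. The hypothetical diagnose_401() helper below encodes only the two messages quoted in this article; it is a sketch, and GitHub’s wording may differ for other endpoints.

```r
# Hypothetical helper: map the 401 messages quoted above to a likely cause.
diagnose_401 <- function(message) {
  if (grepl("Requires authentication", message, fixed = TRUE)) {
    "No PAT was sent: check that one is discoverable for this host."
  } else if (grepl("Bad credentials", message, fixed = TRUE)) {
    "A PAT was sent but rejected: it may be mistyped, expired, or revoked."
  } else {
    "Unrecognized 401 message."
  }
}

diagnose_401("Bad credentials")
```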
Even a valid PAT can lead to a downstream error if it has insufficient scopes for a specific request.
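For classic PATs, GitHub reports the token’s scopes in the X-OAuth-Scopes response header (fine-grained PATs use permissions instead, so this check does not apply to them). The hypothetical missing_scopes() helper below compares such a header value against the scopes a request needs. It is a plain set difference and does not know that a broader scope, such as repo, can cover narrower ones.

```r
# Hypothetical helper: given the comma-separated value of the
# X-OAuth-Scopes header, return the needed scopes the PAT lacks.
missing_scopes <- function(scopes_header, needed) {
  have <- trimws(strsplit(scopes_header, ",", fixed = TRUE)[[1]])
  setdiff(needed, have)
}

missing_scopes("repo, user", c("repo", "workflow"))
#> [1] "workflow"
```

Interactively, gh::gh_whoami() also reports the scopes of the PAT that gh finds.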