Density ratio estimation is described as follows: for given two data samples \(x1\) and \(x2\) from unknown distributions \(p(x)\) and \(q(x)\) respectively, estimate
\[ w(x) = \frac{p(x)}{q(x)} \]
where \(x1\) and \(x2\) are \(d\)-dimensional real numbers.
The estimated density ratio function \(w(x)\) can be used in many applications such as anomaly detection [Hido et al. 2011], change-point detection [Liu et al. 2013], and covariate shift adaptation [Sugiyama et al. 2007]. Other useful applications about density ratio estimation were summarized by [Sugiyama et al. 2012].
The package densratio provides a function densratio()
that returns an object with a method to estimate density ratio as compute_density_ratio()
.
For example,
set.seed(3)
x1 <- rnorm(200, mean = 1, sd = 1/8)
x2 <- rnorm(200, mean = 1, sd = 1/2)
library(densratio)
densratio_obj <- densratio(x1, x2)
The function densratio()
estimates the density ratio of \(p(x)\) to \(q(x)\), \[
w(x) = \frac{p(x)}{q(x)} = \frac{\rm{Norm}(1, 1/8)}{\rm{Norm}(1, 1/2)}
\] and provides a function to compute estimated density ratio.
The densratio object has a function compute_density_ratio()
that can compute density ratio \(\hat{w}(x) \simeq p(x)/q(x)\) for any \(d\)-dimensional input \(x\) (here \(d=1\)).
new_x <- seq(0, 2, by = 0.05)
w_hat <- densratio_obj$compute_density_ratio(new_x)
plot(new_x, w_hat, pch=19)
In this case, the true density ratio \(w(x) = p(x)/q(x) = \rm{Norm}(1, 1/8) / \rm{Norm}(1, 1/2)\) is known. So we can compare \(w(x)\) with the estimated density ratio \(\hat{w}(x)\).
true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)
plot(true_density_ratio, xlim=c(0, 2), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")
plot(densratio_obj$compute_density_ratio, xlim=c(0, 2), lwd=2, col="green", add=TRUE)
legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)
You can install the densratio package from CRAN.
You can also install the package from GitHub.
install.packages("remotes") # If you have not installed "remotes" package
remotes::install_github("hoxo-m/densratio")
The source code for densratio package is available on GitHub at
The package provides densratio()
. The function returns an object that has a function to compute estimated density ratio.
For data samples x1
and x2
,
library(densratio)
x1 <- rnorm(200, mean = 1, sd = 1/8)
x2 <- rnorm(200, mean = 1, sd = 1/2)
result <- densratio(x1, x2)
In this case, densratio_obj$compute_density_ratio()
can compute estimated density ratio.
densratio()
has method
argument that you can pass "uLSIF"
, "RuSLIF"
, or "KLIEP"
.
The methods assume that density ratio are represented by linear model:
\[ w(x) = \theta_1 K(x, c_1) + \theta_2 K(x, c_2) + ... + \theta_b K(x, c_b) \]
where
\[ K(x, c) = \exp\left(-\frac{\|x - c\|^2}{2 \sigma ^ 2}\right) \]
is the Gaussian (RBF) kernel.
densratio()
performs the following:
densratio
object, and are used when to compute density ratio in the call compute_density_ratio()
.You can display information of densratio objects. Moreover, you can change some conditions to specify arguments of densratio()
.
##
## Call:
## densratio(x1 = x1, x2 = x2, method = "uLSIF")
##
## Kernel Information:
## Kernel type: Gaussian
## Number of kernels: 100
## Bandwidth (sigma): 0.1
## Centers: num [1:100, 1] 0.907 1.093 1.18 1.136 1.046 ...
##
## Kernel Weights:
## num [1:100] 0.067455 0.040045 0.000459 0.016849 0.067084 ...
##
## Regularization Parameter (lambda): 1
##
## Function to Estimate Density Ratio:
## compute_density_ratio()
kernel_num
argument. In default, kernel_num = 100
.sigma = "auto"
, the algorithm automatically select an optimal value by cross validation. If you set sigma
a number, that will be used. If you set sigma
a numeric vector, the algorithm select an optimal value in them by cross validation.x1
underlying a numerator distribution \(p(x)\). You can find the whole values in result$kernel_info$centers
.theta
parameters in the linear kernel model. You can find these values in result$kernel_weights
.compute_density_ratio()
.So far, the input data samples x1
and x2
were one dimensional. densratio()
allows to input multidimensional data samples as matrix
, as long as their dimensions are the same.
For example,
library(densratio)
library(mvtnorm)
set.seed(3)
x1 <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))
x2 <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))
densratio_obj_d2 <- densratio(x1, x2)
densratio_obj_d2
##
## Call:
## densratio(x1 = x1, x2 = x2, method = "uLSIF")
##
## Kernel Information:
## Kernel type: Gaussian
## Number of kernels: 100
## Bandwidth (sigma): 0.316
## Centers: num [1:100, 1:2] 1.257 0.758 1.122 1.3 1.386 ...
##
## Kernel Weights:
## num [1:100] 0.0756 0.0986 0.059 0.0797 0.0421 ...
##
## Regularization Parameter (lambda): 0.3162278
##
## Function to Estimate Density Ratio:
## compute_density_ratio()
In this case, as well, we can compare the true density ratio with the estimated density ratio.
true_density_ratio <- function(x) {
dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /
dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))
}
N <- 20
range <- seq(0, 2, length.out = N)
input <- expand.grid(range, range)
w_true <- matrix(true_density_ratio(input), nrow = N)
w_hat <- matrix(densratio_obj_d2$compute_density_ratio(input), nrow = N)
par(mfrow = c(1, 2))
contour(range, range, w_true, main = "True Density Ratio")
contour(range, range, w_hat, main = "Estimated Density Ratio")