library(targets)
library(ggplot2)
library(tidyverse)
library(tidybayes)
You can give uncorrelated random numbers a specific correlation by using the Cholesky decomposition. This comes in handy when you're modelling correlated random variables using MCMC. The trick works because chol() returns the upper-triangular factor R of a correlation matrix Sigma, with t(R) %*% R equal to Sigma; if z holds uncorrelated, unit-variance numbers, then t(R) %*% z has covariance t(R) %*% R, which is exactly Sigma.
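Here's a minimal sketch of that identity using only base R, with a hand-picked target correlation of 0.6 (my choice, purely for illustration):

Sigma <- matrix(c(1, 0.6, 0.6, 1), ncol = 2)  # illustrative target
R <- chol(Sigma)                              # upper triangular: t(R) %*% R == Sigma
z <- matrix(rnorm(2e4), ncol = 2)             # uncorrelated columns
y <- t(t(R) %*% t(z))                         # impose the correlation
round(cor(y), 2)                              # approximately 0.6

The examples below draw a random target matrix from the LKJ distribution instead of hand-picking one.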
eg <- rethinking::rlkjcorr(1, 2, 1)  # one random 2x2 correlation matrix from the LKJ distribution
eg

           [,1]      [,2]
[1,]  1.000000 -0.351295
[2,] -0.351295  1.000000

cc <- chol(eg)
purrr::rerun(30, {
  zz <- matrix(data = rnorm(500), ncol = 2)
  # plot(zz)
  rr <- t(cc) %*% t(zz)
  # plot(t(rr))
  cor(t(rr))[1, 2]
}) |>
  purrr::flatten_dbl() |>
  density() |>
  plot(main = "Simulated correlations")
Warning: `rerun()` was deprecated in purrr 1.0.0.
ℹ Please use `map()` instead.
# Previously
rerun(30, {
zz <- matrix(data = rnorm(500), ncol = 2)
rr <- t(cc) %*% t(zz)
cor(t(rr))[1, 2]
})
# Now
map(1:30, ~ {
zz <- matrix(data = rnorm(500), ncol = 2)
rr <- t(cc) %*% t(zz)
cor(t(rr))[1, 2]
})
abline(v = eg[1, 2])  # mark the target correlation on the density plot
Here the correlation we want is -0.351295, and the simulated correlations are centred right on that target: the Cholesky transform really does hand the intended correlation to the uncorrelated random numbers, up to sampling noise.
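The spread around the target is just sampling noise from using 250 pairs per run; with more pairs the simulated correlation lands much closer to the target (the 25,000 pairs here are my own choice for illustration):

zz_big <- matrix(rnorm(50000), ncol = 2)  # 25,000 pairs instead of 250
cor(t(t(cc) %*% t(zz_big)))[1, 2]         # very close to eg[1, 2]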
eg <- rethinking::rlkjcorr(1, 2, 5)
print(eg)

           [,1]       [,2]
[1,]  1.0000000 -0.5282398
[2,] -0.5282398  1.0000000

cc <- chol(eg)
t(cc) %*% cc  # reconstructs eg, confirming the decomposition

           [,1]       [,2]
[1,]  1.0000000 -0.5282398
[2,] -0.5282398  1.0000000

mm <- matrix(rnorm(4000, mean = 0, sd = 1), ncol = 2)
plot(mm)

yy <- t(t(cc) %*% t(mm))
plot(yy)
Doing it by hand
Because the case for only two random variables is pretty simple, we can actually write out the Cholesky decomposition by hand. Here I want to give some independent random numbers a correlation of -0.8:
p <- -.8
L <- matrix(c(1, 0, p, sqrt(1 - p^2)), ncol = 2, byrow = TRUE)
zz <- matrix(data = rnorm(1000), ncol = 2)
yy <- t(L %*% t(zz))
plot(yy)
cor(yy)

           [,1]       [,2]
[1,]  1.0000000 -0.7809624
[2,] -0.7809624  1.0000000
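Why does that particular L work? Multiplying the lower-triangular factor by its transpose recovers the target correlation matrix, which is where the sqrt(1 - p^2) term comes from:

$$
L L^\top =
\begin{pmatrix} 1 & 0 \\ p & \sqrt{1 - p^2} \end{pmatrix}
\begin{pmatrix} 1 & p \\ 0 & \sqrt{1 - p^2} \end{pmatrix} =
\begin{pmatrix} 1 & p \\ p & p^2 + (1 - p^2) \end{pmatrix} =
\begin{pmatrix} 1 & p \\ p & 1 \end{pmatrix}
$$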
This can be written without matrix multiplication like this:
z1 <- rnorm(1000)
z2 <- rnorm(1000)

y1 <- z1
y2 <- p*z1 + sqrt(1 - p^2)*z2

plot(y1, y2)
cor(y2, y1)

[1] -0.8054704
In a Stan model, I might want to use a scaled beta distribution as a prior on the correlation between, say, slopes and intercepts:
mu <- .84
phi <- 80
g <- rbeta(1, mu*phi, (1 - mu)*phi)  # beta draw on (0, 1), mean-precision parameterization
p_trans <- g*2 - 1                   # rescale to (-1, 1)

sim_y_corr <- function(n, p) {
  z1 <- rnorm(n)
  z2 <- rnorm(n)

  y1 <- z1
  y2 <- p*z1 + sqrt(1 - p^2)*z2

  return(data.frame(y1, y2))
}

sim_y_corr(n = 1000, p = p_trans) |> plot()
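To get a feel for what correlations this prior implies, one can simulate many draws of the transformed beta instead of just one (a quick sketch; the 10,000 draws are my own choice):

g_many <- rbeta(1e4, mu*phi, (1 - mu)*phi)*2 - 1
hist(g_many, main = "Implied prior on the correlation")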
Alternatively, I could work on the logit scale:
q <- rnorm(1, mean = -1, sd = .1)
p_trans_n <- plogis(q)*2 - 1

sim_y_corr(n = 1000, p = p_trans_n) |> plot()
This is even a little easier to think about: after all the transformations, 0 on the logit scale still corresponds to a correlation of 0, and negative and positive values on the logit scale correspond to negative and positive correlations.
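A quick check of that mapping:

plogis(0)*2 - 1   # 0 on the logit scale -> correlation of 0
plogis(-1)*2 - 1  # negative -> about -0.46
plogis(1)*2 - 1   # positive -> about +0.46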
Another way to make correlation matrices is here: https://www.rdatagen.net/post/2023-02-14-flexible-correlation-generation-an-update-to-gencorgen-in-simstudy/