Here is a quick tutorial on how to use the Robbins-Monro method of stochastic root searching to estimate the intermediate correlation matrix that initializes the NORTA method. This method is described in the article Simultaneous estimation of the intermediate correlation matrix for arbitrary marginal densities co-authored with Dr. Bruno Zumbo and Dr. Edward Kroc. (It’s still in-press, so expect this entry to be updated once we can link to the actual article)
First, we need to decide what the final correlation matrix will be (the one we are interested in simulating from). In our article we simulated data from a 3-factor model with equal inter-factor correlations of 0.3, equal loadings of 0.7 and equal error variances of 0.51. The final correlation matrix is therefore:
L <- matrix(c(0.7, 0.7, 0.7,
0.0, 0.0, 0.0,
0.0, 0.0, 0.0,
0.0, 0.0, 0.0,
0.7, 0.7, 0.7,
0.0, 0.0, 0.0,
0.0, 0.0, 0.0,
0.0, 0.0, 0.0,
0.7, 0.7, 0.7),9,3)
Phi <- matrix(c(1.0, 0.3, 0.3,
0.3, 1.0, 0.3,
0.3, 0.3, 1.0),3,3)
Psi <- diag(9)
diag(Psi) <- 0.51
Sigm <- L%*%Phi%*%t(L) + Psi
Cx <- Sigm
m <- dim(Cx)[1]
Next, one needs to decide on the types of marginal distributions and parameters one would want to simulate from. To highlight the flexibility of the method, a variety of marginal distributions (some discrete, some continuous) are used. The densities are specified as:
list_dist <- list(function(L) qchisq(L,df=1),
function(L) qt(L,df=5),
function(L) qbinom(L,5,0.5),
function(L) qbeta(L, shape1=1, shape2=5),
function(L) qpois(L, lambda=7),
function(L) qgamma(L, shape=3, rate=5),
function(L) qnorm(L,mean=0, sd=1),
function(L) qunif(L,min=0, max=1),
function(L) qbinom(L,1,0.3))
M<- c(1,
0,
5*0.5,
1/(1+5),
7,
3/5,
0,
0.5,
1*0.3)
S <- c(sqrt(2),
sqrt(5/(5-2)),
sqrt(5*0.5*0.5),
sqrt(5/((1+5)^2*(1+5+1))),
sqrt(7),
sqrt(3/5^2),
sqrt(1),
sqrt(1/12),
sqrt(1*0.7*0.3))
MM <- M%*%t(M)
SS <- S%*%t(S)
The ‘M
‘ and ‘S
‘ variables are for ‘mean’ and ‘standard deviation’ respectively. Notice how the entries follow the same order as the distributions in the list_dist
list. Also, keep an eye on the matrix approach to encode this information with ‘MM
‘ and ‘SS
‘. The function that estimates the intermediate correlation matrix needs these entries.
Once the final correlation matrix and the marginal densities are been decided on, we have all the elements we need to estimate the intermediate correlation matrix:
#Final correlation as starting point
C0 <- Cx
#Increasing iterations (iter) and tolerance (tol) increases the accuracy of
#estimation, but it also increases computing time.
iter<-500
tol<-0.001
#Can only take values in (0,1). Values close to 1 increase the accuracy of
#estimation, but it also increases computing time.
alph<-0.75
a<-1
#Sample size to verify empirical approximation. We recommend to set it at
#100,000
n<-100000
### Computation of Cz ###
##################################################################
L<-RM(C0,Cx,MM,SS,list_dist,a,iter,tol,n,alph)
Cz<-L[[1]]
Cx_est<-GN(Cz,Cx,MM,SS,list_dist,n)
And now that all the elements are in place, please find the link to the RM()
function that implements the stochastic root searching.