

Ahh  Mark Leeds just pointed out offlist that it was supposed to be
a correlation matrix.
Perhaps
R < matrix(runif(10000), ncol=100)
R < (R * lower.tri(R)) + t(R * lower.tri(R))
diag(R) < 1
These are of course uniformly distributed positive correlations.
Kingsford
On Fri, Aug 28, 2009 at 7:36 PM, Kingsford
Jones< [hidden email]> wrote:
> R < matrix(runif(10000), ncol=100)
>
> hth,
>
> Kingsford Jones
>
> On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma< [hidden email]> wrote:
>> Hi,
>>
>> How can I generate a random 100x100 correlation matrix, R={r_ij},
>> where about 10% of r_ij are greater than 0.9
>>
>> Thanks in advance.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html>> and provide commented, minimal, selfcontained, reproducible code.
>>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi Kingsford,
There is more structure to a correlation matrix than that meets the eye! Your method will produce a matrix R that looks "like" a correlation matrix, but beware  it is an impostor!
You can obtain a valid correlation matrix, Q, from the impostor R by using the `nearPD' function in the "Matrix" package, which finds the positive definite matrix Q that is "nearest" to R. However, note that when R is far from a positivedefinite matrix, this step may give a Q that does not have the desired property.
require(Matrix)
R < matrix(runif(16), ncol=4)
R < (R * lower.tri(R)) + t(R * lower.tri(R))
diag(R) < 1
eigen(R)$val
Q < nearPD(R, posd.tol=1.e04)$mat
eigen(Q)$val
max(abs(Q  R)) # maximum discrepancy between R and Q
Another easy way to produce a valid correlation matrix is:
R < matrix(runif(36), ncol=6)
RtR < R %*% t(R)
Q < cov2cor(RtR)
But this does not have the property that the correlations are uniformly distributed.
Hope this helps,
Ravi.
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 5022619
email: [hidden email]
 Original Message 
From: Kingsford Jones < [hidden email]>
Date: Friday, August 28, 2009 10:12 pm
Subject: Re: [R] how to generate a random correlation matrix with restrictions
To: Ning Ma < [hidden email]>
Cc: [hidden email]
> Ahh  Mark Leeds just pointed out offlist that it was supposed to be
> a correlation matrix.
>
> Perhaps
>
> R < matrix(runif(10000), ncol=100)
> R < (R * lower.tri(R)) + t(R * lower.tri(R))
> diag(R) < 1
>
> These are of course uniformly distributed positive correlations.
>
> Kingsford
>
>
> On Fri, Aug 28, 2009 at 7:36 PM, Kingsford
> Jones< [hidden email]> wrote:
> > R < matrix(runif(10000), ncol=100)
> >
> > hth,
> >
> > Kingsford Jones
> >
> > On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma< [hidden email]> wrote:
> >> Hi,
> >>
> >> How can I generate a random 100x100 correlation matrix, R={r_ij},
> >> where about 10% of r_ij are greater than 0.9
> >>
> >> Thanks in advance.
> >>
> >> ______________________________________________
> >> [hidden email] mailing list
> >>
> >> PLEASE do read the posting guide
> >> and provide commented, minimal, selfcontained, reproducible code.
> >>
> >
>
> ______________________________________________
> [hidden email] mailing list
>
> PLEASE do read the posting guide
> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Thanks Ravi  with my limited linear algebra skills I was assuming
invertible symmetric was sufficient (rather than just necessary) for
positive definiteness. So, the open challenge is to generate a pd
matrix of dimension 100 with r_ij = 1 if i=j else 1< r_ij< 1, with
about 10% of the elements >.9.
I tried for a bit without luck, but I can offer another way to
generate a pd matrix:
ev < runif(100) # choose some positive eigenvalues
U < svd(matrix(runif(10000), nc=100))$u # an orthogonal matrix
pdM < U %*% diag(ev) %*% t(U)
Kingsford
On Fri, Aug 28, 2009 at 9:41 PM, Ravi Varadhan< [hidden email]> wrote:
> Hi Kingsford,
>
> There is more structure to a correlation matrix than that meets the eye! Your method will produce a matrix R that looks "like" a correlation matrix, but beware  it is an impostor!
>
> You can obtain a valid correlation matrix, Q, from the impostor R by using the `nearPD' function in the "Matrix" package, which finds the positive definite matrix Q that is "nearest" to R. However, note that when R is far from a positivedefinite matrix, this step may give a Q that does not have the desired property.
>
> require(Matrix)
>
> R < matrix(runif(16), ncol=4)
>
> R < (R * lower.tri(R)) + t(R * lower.tri(R))
>
> diag(R) < 1
>
> eigen(R)$val
>
> Q < nearPD(R, posd.tol=1.e04)$mat
>
> eigen(Q)$val
>
> max(abs(Q  R)) # maximum discrepancy between R and Q
>
> Another easy way to produce a valid correlation matrix is:
>
> R < matrix(runif(36), ncol=6)
>
> RtR < R %*% t(R)
>
> Q < cov2cor(RtR)
>
> But this does not have the property that the correlations are uniformly distributed.
>
> Hope this helps,
> Ravi.
>
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
>
> Ph. (410) 5022619
> email: [hidden email]
>
>
>  Original Message 
> From: Kingsford Jones < [hidden email]>
> Date: Friday, August 28, 2009 10:12 pm
> Subject: Re: [R] how to generate a random correlation matrix with restrictions
> To: Ning Ma < [hidden email]>
> Cc: [hidden email]
>
>
>> Ahh  Mark Leeds just pointed out offlist that it was supposed to be
>> a correlation matrix.
>>
>> Perhaps
>>
>> R < matrix(runif(10000), ncol=100)
>> R < (R * lower.tri(R)) + t(R * lower.tri(R))
>> diag(R) < 1
>>
>> These are of course uniformly distributed positive correlations.
>>
>> Kingsford
>>
>>
>> On Fri, Aug 28, 2009 at 7:36 PM, Kingsford
>> Jones< [hidden email]> wrote:
>> > R < matrix(runif(10000), ncol=100)
>> >
>> > hth,
>> >
>> > Kingsford Jones
>> >
>> > On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma< [hidden email]> wrote:
>> >> Hi,
>> >>
>> >> How can I generate a random 100x100 correlation matrix, R={r_ij},
>> >> where about 10% of r_ij are greater than 0.9
>> >>
>> >> Thanks in advance.
>> >>
>> >> ______________________________________________
>> >> [hidden email] mailing list
>> >>
>> >> PLEASE do read the posting guide
>> >> and provide commented, minimal, selfcontained, reproducible code.
>> >>
>> >
>>
>> ______________________________________________
>> [hidden email] mailing list
>>
>> PLEASE do read the posting guide
>> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Yes, Kingsford. This problem does not appear to be trival. I am not sure if there is any literature on this. I have seen a paper by Davies and Higham (BIT 2000) on "Stable generation of correlation matrices. There is also a paper by Harry Joe on generating correlation matrices with given partial correlation structure. I am not sure if these papers would be helpful for this problem. Of course, one can readily cook up an adhoc, trialerror procedure to generate the type of matrices that Ning wants.
By the way, the easiest way to generate a PD matrix is:
N < 10
amat < matrix(rnorm(N*N), N, N)
A < amat %*% t(amat) # A is PD
Best,
Ravi
____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 5022619
email: [hidden email]
 Original Message 
From: Kingsford Jones < [hidden email]>
Date: Saturday, August 29, 2009 7:43 pm
Subject: Re: [R] how to generate a random correlation matrix with restrictions
To: Ravi Varadhan < [hidden email]>
Cc: Ning Ma < [hidden email]>, [hidden email]
> Thanks Ravi  with my limited linear algebra skills I was assuming
> invertible symmetric was sufficient (rather than just necessary) for
> positive definiteness. So, the open challenge is to generate a pd
> matrix of dimension 100 with r_ij = 1 if i=j else 1< r_ij< 1, with
> about 10% of the elements >.9.
>
> I tried for a bit without luck, but I can offer another way to
> generate a pd matrix:
>
> ev < runif(100) # choose some positive eigenvalues
> U < svd(matrix(runif(10000), nc=100))$u # an orthogonal matrix
> pdM < U %*% diag(ev) %*% t(U)
>
>
> Kingsford
>
>
>
> On Fri, Aug 28, 2009 at 9:41 PM, Ravi Varadhan< [hidden email]> wrote:
> > Hi Kingsford,
> >
> > There is more structure to a correlation matrix than that meets the
> eye! Your method will produce a matrix R that looks "like" a
> correlation matrix, but beware  it is an impostor!
> >
> > You can obtain a valid correlation matrix, Q, from the impostor R
> by using the `nearPD' function in the "Matrix" package, which finds
> the positive definite matrix Q that is "nearest" to R. However, note
> that when R is far from a positivedefinite matrix, this step may give
> a Q that does not have the desired property.
> >
> > require(Matrix)
> >
> > R < matrix(runif(16), ncol=4)
> >
> > R < (R * lower.tri(R)) + t(R * lower.tri(R))
> >
> > diag(R) < 1
> >
> > eigen(R)$val
> >
> > Q < nearPD(R, posd.tol=1.e04)$mat
> >
> > eigen(Q)$val
> >
> > max(abs(Q  R)) # maximum discrepancy between R and Q
> >
> > Another easy way to produce a valid correlation matrix is:
> >
> > R < matrix(runif(36), ncol=6)
> >
> > RtR < R %*% t(R)
> >
> > Q < cov2cor(RtR)
> >
> > But this does not have the property that the correlations are
> uniformly distributed.
> >
> > Hope this helps,
> > Ravi.
> >
> > ____________________________________________________________________
> >
> > Ravi Varadhan, Ph.D.
> > Assistant Professor,
> > Division of Geriatric Medicine and Gerontology
> > School of Medicine
> > Johns Hopkins University
> >
> > Ph. (410) 5022619
> > email: [hidden email]
> >
> >
> >  Original Message 
> > From: Kingsford Jones < [hidden email]>
> > Date: Friday, August 28, 2009 10:12 pm
> > Subject: Re: [R] how to generate a random correlation matrix with
> restrictions
> > To: Ning Ma < [hidden email]>
> > Cc: [hidden email]
> >
> >
> >> Ahh  Mark Leeds just pointed out offlist that it was supposed
> to be
> >> a correlation matrix.
> >>
> >> Perhaps
> >>
> >> R < matrix(runif(10000), ncol=100)
> >> R < (R * lower.tri(R)) + t(R * lower.tri(R))
> >> diag(R) < 1
> >>
> >> These are of course uniformly distributed positive correlations.
> >>
> >> Kingsford
> >>
> >>
> >> On Fri, Aug 28, 2009 at 7:36 PM, Kingsford
> >> Jones< [hidden email]> wrote:
> >> > R < matrix(runif(10000), ncol=100)
> >> >
> >> > hth,
> >> >
> >> > Kingsford Jones
> >> >
> >> > On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma< [hidden email]> wrote:
> >> >> Hi,
> >> >>
> >> >> How can I generate a random 100x100 correlation matrix, R={r_ij},
> >> >> where about 10% of r_ij are greater than 0.9
> >> >>
> >> >> Thanks in advance.
> >> >>
> >> >> ______________________________________________
> >> >> [hidden email] mailing list
> >> >>
> >> >> PLEASE do read the posting guide
> >> >> and provide commented, minimal, selfcontained, reproducible code.
> >> >>
> >> >
> >>
> >> ______________________________________________
> >> [hidden email] mailing list
> >>
> >> PLEASE do read the posting guide
> >> and provide commented, minimal, selfcontained, reproducible code.
> >
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hi,
i want to pick up your last method and add that "about 10% of r_ij are
greater than 0.9" could mean for example that there is a cluster of
about sqrt(1000) variables that are highly correlated.
But first let us note, that there is no canonical meaning for the
randomness of a random correlation matrix, so whether this is adequate
depends on the underlying problem/simulation study/test cases.
Instead of drawing random vectors via rnorm, one could draw for example
42 vectors from a multivariate normal distribution with high correlation
in the first 42 components:
require("Matrix")
require("mvtnorm")
n < 100
n1 < 42
n2 < n  n1
R < bdiag(matrix(0.99,nrow=n1,ncol=n1)+diag(n1)*0.01,diag(n2)*0.01)
R < as.matrix(R)
random.correlation.matrix < function() {
m1 < rmvnorm(n1,rep(0,n),R)
m2 < rmvnorm(n2,rep(0,n),diag(n))
m < rbind(m1,m2)
for (i in 1:n) {
# normalization  we want a correlation matrix
m[i,] < m[i,]/sqrt(t(m[i,])%*%m[i,])
}
m%*%t(m)
}
Let us count how many entries have an absolute value greater than 0.9:
count.gt.90 < function() {
m < random.correlation.matrix()
sum(abs(m)>0.90)
}
> mean(replicate(100, count.gt.90()))
[1] 999.32
hist(replicate(100, count.gt.90()))
Looks well... But remember that you now have every time 42 highly
correlated variables (which are also always the first 42 ones). Perhaps
you want to make this number also random. And perhaps you want two or
three or a random number of different clusters of highly correlated
variables? Depends all on your objective  have fun! :)
Cheers, Kornelius.
> Yes, Kingsford. This problem does not appear to be trival. I am not sure if there is any literature on this. I have seen a paper by Davies and Higham (BIT 2000) on "Stable generation of correlation matrices. There is also a paper by Harry Joe on generating correlation matrices with given partial correlation structure. I am not sure if these papers would be helpful for this problem. Of course, one can readily cook up an adhoc, trialerror procedure to generate the type of matrices that Ning wants.
>
> By the way, the easiest way to generate a PD matrix is:
>
> N < 10
> amat < matrix(rnorm(N*N), N, N)
> A < amat %*% t(amat) # A is PD
>
> Best,
> Ravi
>
> ____________________________________________________________________
>
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology
> School of Medicine
> Johns Hopkins University
>
> Ph. (410) 5022619
> email: [hidden email]
>
>
>  Original Message 
> From: Kingsford Jones < [hidden email]>
> Date: Saturday, August 29, 2009 7:43 pm
> Subject: Re: [R] how to generate a random correlation matrix with restrictions
> To: Ravi Varadhan < [hidden email]>
> Cc: Ning Ma < [hidden email]>, [hidden email]
>
>
> > Thanks Ravi  with my limited linear algebra skills I was assuming
> > invertible symmetric was sufficient (rather than just necessary) for
> > positive definiteness. So, the open challenge is to generate a pd
> > matrix of dimension 100 with r_ij = 1 if i=j else 1< r_ij< 1, with
> > about 10% of the elements >.9.
> >
> > I tried for a bit without luck, but I can offer another way to
> > generate a pd matrix:
> >
> > ev < runif(100) # choose some positive eigenvalues
> > U < svd(matrix(runif(10000), nc=100))$u # an orthogonal matrix
> > pdM < U %*% diag(ev) %*% t(U)
> >
> >
> > Kingsford
> >
> >
> >
> > On Fri, Aug 28, 2009 at 9:41 PM, Ravi Varadhan< [hidden email]> wrote:
> > > Hi Kingsford,
> > >
> > > There is more structure to a correlation matrix than that meets the
> > eye! Your method will produce a matrix R that looks "like" a
> > correlation matrix, but beware  it is an impostor!
> > >
> > > You can obtain a valid correlation matrix, Q, from the impostor R
> > by using the `nearPD' function in the "Matrix" package, which finds
> > the positive definite matrix Q that is "nearest" to R. However, note
> > that when R is far from a positivedefinite matrix, this step may give
> > a Q that does not have the desired property.
> > >
> > > require(Matrix)
> > >
> > > R < matrix(runif(16), ncol=4)
> > >
> > > R < (R * lower.tri(R)) + t(R * lower.tri(R))
> > >
> > > diag(R) < 1
> > >
> > > eigen(R)$val
> > >
> > > Q < nearPD(R, posd.tol=1.e04)$mat
> > >
> > > eigen(Q)$val
> > >
> > > max(abs(Q  R)) # maximum discrepancy between R and Q
> > >
> > > Another easy way to produce a valid correlation matrix is:
> > >
> > > R < matrix(runif(36), ncol=6)
> > >
> > > RtR < R %*% t(R)
> > >
> > > Q < cov2cor(RtR)
> > >
> > > But this does not have the property that the correlations are
> > uniformly distributed.
> > >
> > > Hope this helps,
> > > Ravi.
> > >
> > > ____________________________________________________________________
> > >
> > > Ravi Varadhan, Ph.D.
> > > Assistant Professor,
> > > Division of Geriatric Medicine and Gerontology
> > > School of Medicine
> > > Johns Hopkins University
> > >
> > > Ph. (410) 5022619
> > > email: [hidden email]
> > >
> > >
> > >  Original Message 
> > > From: Kingsford Jones < [hidden email]>
> > > Date: Friday, August 28, 2009 10:12 pm
> > > Subject: Re: [R] how to generate a random correlation matrix with
> > restrictions
> > > To: Ning Ma < [hidden email]>
> > > Cc: [hidden email]
> > >
> > >
> > >> Ahh  Mark Leeds just pointed out offlist that it was supposed
> > to be
> > >> a correlation matrix.
> > >>
> > >> Perhaps
> > >>
> > >> R < matrix(runif(10000), ncol=100)
> > >> R < (R * lower.tri(R)) + t(R * lower.tri(R))
> > >> diag(R) < 1
> > >>
> > >> These are of course uniformly distributed positive correlations.
> > >>
> > >> Kingsford
> > >>
> > >>
> > >> On Fri, Aug 28, 2009 at 7:36 PM, Kingsford
> > >> Jones< [hidden email]> wrote:
> > >> > R < matrix(runif(10000), ncol=100)
> > >> >
> > >> > hth,
> > >> >
> > >> > Kingsford Jones
> > >> >
> > >> > On Fri, Aug 28, 2009 at 7:26 PM, Ning Ma< [hidden email]> wrote:
> > >> >> Hi,
> > >> >>
> > >> >> How can I generate a random 100x100 correlation matrix, R={r_ij},
> > >> >> where about 10% of r_ij are greater than 0.9
> > >> >>
> > >> >> Thanks in advance.
> > >> >>
> > >> >> ______________________________________________
> > >> >> [hidden email] mailing list
> > >> >>
> > >> >> PLEASE do read the posting guide
> > >> >> and provide commented, minimal, selfcontained, reproducible code.
> > >> >>
> > >> >
> > >>
> > >> ______________________________________________
> > >> [hidden email] mailing list
> > >>
> > >> PLEASE do read the posting guide
> > >> and provide commented, minimal, selfcontained, reproducible code.
> > >
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


So, you want about 10% of the correlations to be greater than 0.9. And
what about the others? It would be helpful to know why you need this
kind of matrix and what you are going to do with it.
Here is an attempt, based on generating a random variance matrix from
a Wishart distribution and transforming it to a correlation matrix.
(Package dlm contains a Wishart generator).
This will generate all the correlations to be around 0.9.
> require(dlm)
> n < 100
> df < n + 5
> S < matrix(0.9, n, n)
> diag(S) < 1
> S < S / df
> set.seed(125)
> x < cov2cor(rwishart(df, Sigma = S))
> summary(x[x!=1]) # offdiagonal elements
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8576 0.9025 0.9116 0.9108 0.9196 0.9525
>
This is one possibility to obtain just about 10% of the correlations
to be around 0.9.
> N < n * (n  1) / 2 # number of correlations
> k < 0.1 * N # want this number close to 0.9
> n1 < round(max(Re(polyroot(c(2*k, 1, 1)))))
> S[row(S) > n1  col(S) > n1] < 0
> diag(S) < S[1, 1]
> y < cov2cor(rwishart(df, Sigma = S))
> mean(y[y != 1] > 0.8) # about 10%
[1] 0.1002020
> summary(y[y != 1 & y > 0.8])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.8630 0.8976 0.9051 0.9045 0.9126 0.9424
>
Again, with more background info we may be able to give you more
meaningful advice.
HTH
Giovanni
> Date: Sat, 29 Aug 2009 09:26:34 +0800
> From: Ning Ma < [hidden email]>
> Sender: [hidden email]
> Precedence: list
> DKIMSignature: v=1; a=rsasha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
> DomainKeySignature: a=rsasha1; c=nofws; d=gmail.com; s=gamma;
>
> Hi,
>
> How can I generate a random 100x100 correlation matrix, R={r_ij},
> where about 10% of r_ij are greater than 0.9
>
> Thanks in advance.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
>

Giovanni Petris < [hidden email]>
Associate Professor
Department of Mathematical Sciences
University of Arkansas  Fayetteville, AR 72701
Ph: (479) 5756324, 5758630 (fax)
http://definetti.uark.edu/~gpetris/______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

