

RFinanciers,
For any of you who, like us, use SVD for risk modeling and stat arb trading, we've discovered what I think is a very serious bug in R/LAPACK's SVD calculation. Since this bug has the potential to create bogus risk models, I thought this audience might be interested in this post.
Specifically, for many matrices that are not full rank and have a few small eigenvalues (much like covariance from stock returns!), SVD may return completely bogus and impossible results: for a correlation matrix the sum of the singular values will be many times too large and the 'u' and 'v' matrix will not be even close to orthonormal. For a random matrix with certain distributions of eigenvalues (first discovered using a covariance from 1minute bar stock returns), SVD produces bogus results 20%50% of the time.
I've posted the bug report with an example here:
https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962For any of you using risk models or stat arb that use SVD, you may want to see if this error affects you.
Also, (selfishly) I wanted to post this out to fellow practitioners already knew about this problem and were using a better algorithm (I noticed there are more than one LAPACK svd algorithm). The slow method of SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works fine (but it's very slow).
Last, here's a simple 'svd' replacement function that performs a rudimentary check on the output of svd. It fails if 'u' is not orthonormal. Generally 'u', 'd', and 'v' all seem to give wrong answers at the same time, but no guarantees that problems with 'v' and 'd' won't slip through.
svd < function(...) {
x < base::svd(...)
if (!is.complex(x$u) && any(abs(colSums(x$u^2)1) > 1e3))
stop("svd gave incorrect result")
x
}
Suggestions, comments appreciated.
Robert
Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor  Boston, MA  02109
Direct: (617)3928396
This email, and any attachments hereto, are intended fo...{{dropped:11}}
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.


Robert,
First off, this is a pretty interesting phenomenon, and let me start
off by saying that I have not figured out what is going on. However, I do
have some more information to contribute:
I am unable to recreate the bug in MATLAB, which uses a different
subroutine from LAPACK: DGESVD whereas R uses DGESDD
I then tried to see if I exported the same random matrix and tried to run
it through the MATLAB svd function what would happen. However, if I write
the offending matrix to a csv file, then load it back into R and calculate
the svd over again, it works and we get sum(sv) ~ 600.
I know this doesn't help, but I guess its more information..
I'll let you know if I find anything else,
Bob
for (i in 1:10) {
set.seed(i)
R < matrix(rnorm(x1*x2), x1, x2)
ee < eigen(tcrossprod(R))
## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
D < ee$vectors
cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
## Scale 'cx' to the corresponding correlation matrix and verify diag=1
cx < cov2cor(cx)
stopifnot(all(diag(cx)==1))
## Calculate sum of singular values (should be 'x1')
sv < sum(svd(cx)$d)
print(abs(sv  x1))
if (abs(svx1) > 1e2) {
write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
FALSE)
x.R < read.table("svd.csv", sep = ",")
x.R < as.matrix(x.R)
R.old < R
R < x.R
ee < eigen(tcrossprod(R))
## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
D < ee$vectors
cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
## Scale 'cx' to the corresponding correlation matrix and verify
diag=1
cx < cov2cor(cx)
stopifnot(all(diag(cx)==1))
## Calculate sum of singular values (should be 'x1')
sv < sum(svd(cx)$d)
browser()
cat("Seed=", i, "\n")
stop("Average singular value of correlation matrix is ", round(sv,
3),
". Should be ", x1)
}
}
On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
[hidden email]> wrote:
> RFinanciers,
> For any of you who, like us, use SVD for risk modeling and stat arb
> trading, we've discovered what I think is a very serious bug in R/LAPACK's
> SVD calculation. Since this bug has the potential to create bogus risk
> models, I thought this audience might be interested in this post.
>
> Specifically, for many matrices that are not full rank and have a few
> small eigenvalues (much like covariance from stock returns!), SVD may
> return completely bogus and impossible results: for a correlation matrix
> the sum of the singular values will be many times too large and the 'u' and
> 'v' matrix will not be even close to orthonormal. For a random matrix with
> certain distributions of eigenvalues (first discovered using a covariance
> from 1minute bar stock returns), SVD produces bogus results 20%50% of the
> time.
>
> I've posted the bug report with an example here:
> https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962>
> For any of you using risk models or stat arb that use SVD, you may want to
> see if this error affects you.
>
> Also, (selfishly) I wanted to post this out to fellow practitioners
> already knew about this problem and were using a better algorithm (I
> noticed there are more than one LAPACK svd algorithm). The slow method of
> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
> fine (but it's very slow).
>
> Last, here's a simple 'svd' replacement function that performs a
> rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
> time, but no guarantees that problems with 'v' and 'd' won't slip through.
>
> svd < function(...) {
> x < base::svd(...)
> if (!is.complex(x$u) && any(abs(colSums(x$u^2)1) > 1e3))
> stop("svd gave incorrect result")
> x
> }
>
> Suggestions, comments appreciated.
> Robert
>
> Robert McGehee, CFA
> Geode Capital Management, LLC
> One Post Office Square, 28th Floor  Boston, MA  02109
> Direct: (617)3928396
>
> This email, and any attachments hereto, are intended fo...{{dropped:11}}
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rsigfinance>  Subscriberposting only. If you want to post, subscribe first.
>  Also note that this is not the rhelp list where general R questions
> should go.
>
[[alternative HTML version deleted]]
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.


Thanks for highlighting that there is risk where
I hadn't expected any.
I've seen a different weird phenomenon in which
R's 'svd' throws an error because of nonconvergence
in the algorithm. Changing the order of the rows
makes it go away.
Pat
On 28/06/2012 16:52, Robert Harlow wrote:
> Robert,
> First off, this is a pretty interesting phenomenon, and let me start
> off by saying that I have not figured out what is going on. However, I do
> have some more information to contribute:
> I am unable to recreate the bug in MATLAB, which uses a different
> subroutine from LAPACK: DGESVD whereas R uses DGESDD
>
> I then tried to see if I exported the same random matrix and tried to run
> it through the MATLAB svd function what would happen. However, if I write
> the offending matrix to a csv file, then load it back into R and calculate
> the svd over again, it works and we get sum(sv) ~ 600.
>
> I know this doesn't help, but I guess its more information..
> I'll let you know if I find anything else,
> Bob
>
> for (i in 1:10) {
> set.seed(i)
> R < matrix(rnorm(x1*x2), x1, x2)
> ee < eigen(tcrossprod(R))
>
> ## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
> D < ee$vectors
> cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>
> ## Scale 'cx' to the corresponding correlation matrix and verify diag=1
> cx < cov2cor(cx)
> stopifnot(all(diag(cx)==1))
>
> ## Calculate sum of singular values (should be 'x1')
> sv < sum(svd(cx)$d)
> print(abs(sv  x1))
> if (abs(svx1) > 1e2) {
> write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
> FALSE)
> x.R < read.table("svd.csv", sep = ",")
> x.R < as.matrix(x.R)
> R.old < R
> R < x.R
> ee < eigen(tcrossprod(R))
> ## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
> D < ee$vectors
> cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>
> ## Scale 'cx' to the corresponding correlation matrix and verify
> diag=1
> cx < cov2cor(cx)
> stopifnot(all(diag(cx)==1))
>
> ## Calculate sum of singular values (should be 'x1')
> sv < sum(svd(cx)$d)
> browser()
> cat("Seed=", i, "\n")
> stop("Average singular value of correlation matrix is ", round(sv,
> 3),
> ". Should be ", x1)
> }
> }
>
>
>
> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
> [hidden email]> wrote:
>
>> RFinanciers,
>> For any of you who, like us, use SVD for risk modeling and stat arb
>> trading, we've discovered what I think is a very serious bug in R/LAPACK's
>> SVD calculation. Since this bug has the potential to create bogus risk
>> models, I thought this audience might be interested in this post.
>>
>> Specifically, for many matrices that are not full rank and have a few
>> small eigenvalues (much like covariance from stock returns!), SVD may
>> return completely bogus and impossible results: for a correlation matrix
>> the sum of the singular values will be many times too large and the 'u' and
>> 'v' matrix will not be even close to orthonormal. For a random matrix with
>> certain distributions of eigenvalues (first discovered using a covariance
>> from 1minute bar stock returns), SVD produces bogus results 20%50% of the
>> time.
>>
>> I've posted the bug report with an example here:
>> https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962>>
>> For any of you using risk models or stat arb that use SVD, you may want to
>> see if this error affects you.
>>
>> Also, (selfishly) I wanted to post this out to fellow practitioners
>> already knew about this problem and were using a better algorithm (I
>> noticed there are more than one LAPACK svd algorithm). The slow method of
>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
>> fine (but it's very slow).
>>
>> Last, here's a simple 'svd' replacement function that performs a
>> rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
>> time, but no guarantees that problems with 'v' and 'd' won't slip through.
>>
>> svd < function(...) {
>> x < base::svd(...)
>> if (!is.complex(x$u) && any(abs(colSums(x$u^2)1) > 1e3))
>> stop("svd gave incorrect result")
>> x
>> }
>>
>> Suggestions, comments appreciated.
>> Robert
>>
>> Robert McGehee, CFA
>> Geode Capital Management, LLC
>> One Post Office Square, 28th Floor  Boston, MA  02109
>> Direct: (617)3928396
>>
>> This email, and any attachments hereto, are intended fo...{{dropped:11}}
>>
>> _______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rsigfinance>>  Subscriberposting only. If you want to post, subscribe first.
>>  Also note that this is not the rhelp list where general R questions
>> should go.
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rsigfinance>  Subscriberposting only. If you want to post, subscribe first.
>  Also note that this is not the rhelp list where general R questions should go.
>

Patrick Burns
[hidden email]
http://www.burnsstat.comhttp://www.portfolioprobe.com/blogtwitter: @portfolioprobe
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.


According to Peter Dalgard:
The issue goes away when relinking against LAPACK 3.4.1. R's generic
LAPACK is extracted from version 3.1.1 and could be upgraded with some
effort (it isn't a straightforward copy).
https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962#c2Good catch Robert!
*Pierre Lapointe*
*Brockhouse Cooper*  Global Macro Strategist
5142277711  [hidden email]
On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns < [hidden email]>wrote:
> Thanks for highlighting that there is risk where
> I hadn't expected any.
>
> I've seen a different weird phenomenon in which
> R's 'svd' throws an error because of nonconvergence
> in the algorithm. Changing the order of the rows
> makes it go away.
>
> Pat
>
>
> On 28/06/2012 16:52, Robert Harlow wrote:
>
>> Robert,
>> First off, this is a pretty interesting phenomenon, and let me start
>> off by saying that I have not figured out what is going on. However, I do
>> have some more information to contribute:
>> I am unable to recreate the bug in MATLAB, which uses a different
>> subroutine from LAPACK: DGESVD whereas R uses DGESDD
>>
>> I then tried to see if I exported the same random matrix and tried to run
>> it through the MATLAB svd function what would happen. However, if I write
>> the offending matrix to a csv file, then load it back into R and calculate
>> the svd over again, it works and we get sum(sv) ~ 600.
>>
>> I know this doesn't help, but I guess its more information..
>> I'll let you know if I find anything else,
>> Bob
>>
>> for (i in 1:10) {
>> set.seed(i)
>> R < matrix(rnorm(x1*x2), x1, x2)
>> ee < eigen(tcrossprod(R))
>>
>> ## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
>> D < ee$vectors
>> cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>>
>> ## Scale 'cx' to the corresponding correlation matrix and verify
>> diag=1
>> cx < cov2cor(cx)
>> stopifnot(all(diag(cx)==1))
>>
>> ## Calculate sum of singular values (should be 'x1')
>> sv < sum(svd(cx)$d)
>> print(abs(sv  x1))
>> if (abs(svx1) > 1e2) {
>> write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names
>> =
>> FALSE)
>> x.R < read.table("svd.csv", sep = ",")
>> x.R < as.matrix(x.R)
>> R.old < R
>> R < x.R
>> ee < eigen(tcrossprod(R))
>> ## 'cx' is a positive semidefinite matrix with eigenvalues of
>> 'ev'
>> D < ee$vectors
>> cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev),
>> ee$vectors[,1:x2])
>>
>> ## Scale 'cx' to the corresponding correlation matrix and verify
>> diag=1
>> cx < cov2cor(cx)
>> stopifnot(all(diag(cx)==1))
>>
>> ## Calculate sum of singular values (should be 'x1')
>> sv < sum(svd(cx)$d)
>> browser()
>> cat("Seed=", i, "\n")
>> stop("Average singular value of correlation matrix is ", round(sv,
>> 3),
>> ". Should be ", x1)
>> }
>> }
>>
>>
>>
>> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
>> Robert.McGehee@geodecapital.**com < [hidden email]>>
>> wrote:
>>
>> RFinanciers,
>>> For any of you who, like us, use SVD for risk modeling and stat arb
>>> trading, we've discovered what I think is a very serious bug in
>>> R/LAPACK's
>>> SVD calculation. Since this bug has the potential to create bogus risk
>>> models, I thought this audience might be interested in this post.
>>>
>>> Specifically, for many matrices that are not full rank and have a few
>>> small eigenvalues (much like covariance from stock returns!), SVD may
>>> return completely bogus and impossible results: for a correlation matrix
>>> the sum of the singular values will be many times too large and the 'u'
>>> and
>>> 'v' matrix will not be even close to orthonormal. For a random matrix
>>> with
>>> certain distributions of eigenvalues (first discovered using a covariance
>>> from 1minute bar stock returns), SVD produces bogus results 20%50% of
>>> the
>>> time.
>>>
>>> I've posted the bug report with an example here:
>>> https://bugs.rproject.org/**bugzilla3/show_bug.cgi?id=**14962< https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962>
>>>
>>> For any of you using risk models or stat arb that use SVD, you may want
>>> to
>>> see if this error affects you.
>>>
>>> Also, (selfishly) I wanted to post this out to fellow practitioners
>>> already knew about this problem and were using a better algorithm (I
>>> noticed there are more than one LAPACK svd algorithm). The slow method of
>>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still
>>> works
>>> fine (but it's very slow).
>>>
>>> Last, here's a simple 'svd' replacement function that performs a
>>> rudimentary check on the output of svd. It fails if 'u' is not
>>> orthonormal.
>>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
>>> time, but no guarantees that problems with 'v' and 'd' won't slip
>>> through.
>>>
>>> svd < function(...) {
>>> x < base::svd(...)
>>> if (!is.complex(x$u) && any(abs(colSums(x$u^2)1) > 1e3))
>>> stop("svd gave incorrect result")
>>> x
>>> }
>>>
>>> Suggestions, comments appreciated.
>>> Robert
>>>
>>> Robert McGehee, CFA
>>> Geode Capital Management, LLC
>>> One Post Office Square, 28th Floor  Boston, MA  02109
>>> Direct: (617)3928396
>>>
>>> This email, and any attachments hereto, are intended fo...{{dropped:11}}
>>>
>>> ______________________________**_________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/rsigfinance< https://stat.ethz.ch/mailman/listinfo/rsigfinance>
>>>  Subscriberposting only. If you want to post, subscribe first.
>>>  Also note that this is not the rhelp list where general R questions
>>> should go.
>>>
>>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/**listinfo/rsigfinance< https://stat.ethz.ch/mailman/listinfo/rsigfinance>
>>  Subscriberposting only. If you want to post, subscribe first.
>>  Also note that this is not the rhelp list where general R questions
>> should go.
>>
>>
> 
> Patrick Burns
> [hidden email]
> http://www.burnsstat.com> http://www.portfolioprobe.com/**blog < http://www.portfolioprobe.com/blog>
> twitter: @portfolioprobe
>
>
> ______________________________**_________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/**listinfo/rsigfinance< https://stat.ethz.ch/mailman/listinfo/rsigfinance>
>  Subscriberposting only. If you want to post, subscribe first.
>  Also note that this is not the rhelp list where general R questions
> should go.
>
[[alternative HTML version deleted]]
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.


Another temporary fix could be to round your matrix R:
R < round(R, 12)
seems to work just fine...
On Thu, Jun 28, 2012 at 12:07 PM, Pierre Lapointe < [hidden email]>wrote:
> According to Peter Dalgard:
>
> The issue goes away when relinking against LAPACK 3.4.1. R's generic
> LAPACK is extracted from version 3.1.1 and could be upgraded with some
> effort (it isn't a straightforward copy).
>
> https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962#c2>
> Good catch Robert!
>
> *Pierre Lapointe*
>
> *Brockhouse Cooper*  Global Macro Strategist
>
> 5142277711  [hidden email]
>
>
>
> On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns < [hidden email]
> >wrote:
>
> > Thanks for highlighting that there is risk where
> > I hadn't expected any.
> >
> > I've seen a different weird phenomenon in which
> > R's 'svd' throws an error because of nonconvergence
> > in the algorithm. Changing the order of the rows
> > makes it go away.
> >
> > Pat
> >
> >
> > On 28/06/2012 16:52, Robert Harlow wrote:
> >
> >> Robert,
> >> First off, this is a pretty interesting phenomenon, and let me start
> >> off by saying that I have not figured out what is going on. However, I
> do
> >> have some more information to contribute:
> >> I am unable to recreate the bug in MATLAB, which uses a different
> >> subroutine from LAPACK: DGESVD whereas R uses DGESDD
> >>
> >> I then tried to see if I exported the same random matrix and tried to
> run
> >> it through the MATLAB svd function what would happen. However, if I
> write
> >> the offending matrix to a csv file, then load it back into R and
> calculate
> >> the svd over again, it works and we get sum(sv) ~ 600.
> >>
> >> I know this doesn't help, but I guess its more information..
> >> I'll let you know if I find anything else,
> >> Bob
> >>
> >> for (i in 1:10) {
> >> set.seed(i)
> >> R < matrix(rnorm(x1*x2), x1, x2)
> >> ee < eigen(tcrossprod(R))
> >>
> >> ## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
> >> D < ee$vectors
> >> cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
> >>
> >> ## Scale 'cx' to the corresponding correlation matrix and verify
> >> diag=1
> >> cx < cov2cor(cx)
> >> stopifnot(all(diag(cx)==1))
> >>
> >> ## Calculate sum of singular values (should be 'x1')
> >> sv < sum(svd(cx)$d)
> >> print(abs(sv  x1))
> >> if (abs(svx1) > 1e2) {
> >> write.table(R, "svd.csv", sep = ",", row.names = FALSE,
> col.names
> >> =
> >> FALSE)
> >> x.R < read.table("svd.csv", sep = ",")
> >> x.R < as.matrix(x.R)
> >> R.old < R
> >> R < x.R
> >> ee < eigen(tcrossprod(R))
> >> ## 'cx' is a positive semidefinite matrix with eigenvalues of
> >> 'ev'
> >> D < ee$vectors
> >> cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev),
> >> ee$vectors[,1:x2])
> >>
> >> ## Scale 'cx' to the corresponding correlation matrix and verify
> >> diag=1
> >> cx < cov2cor(cx)
> >> stopifnot(all(diag(cx)==1))
> >>
> >> ## Calculate sum of singular values (should be 'x1')
> >> sv < sum(svd(cx)$d)
> >> browser()
> >> cat("Seed=", i, "\n")
> >> stop("Average singular value of correlation matrix is ",
> round(sv,
> >> 3),
> >> ". Should be ", x1)
> >> }
> >> }
> >>
> >>
> >>
> >> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
> >> Robert.McGehee@geodecapital.**com < [hidden email]>>
> >> wrote:
> >>
> >> RFinanciers,
> >>> For any of you who, like us, use SVD for risk modeling and stat arb
> >>> trading, we've discovered what I think is a very serious bug in
> >>> R/LAPACK's
> >>> SVD calculation. Since this bug has the potential to create bogus risk
> >>> models, I thought this audience might be interested in this post.
> >>>
> >>> Specifically, for many matrices that are not full rank and have a few
> >>> small eigenvalues (much like covariance from stock returns!), SVD may
> >>> return completely bogus and impossible results: for a correlation
> matrix
> >>> the sum of the singular values will be many times too large and the 'u'
> >>> and
> >>> 'v' matrix will not be even close to orthonormal. For a random matrix
> >>> with
> >>> certain distributions of eigenvalues (first discovered using a
> covariance
> >>> from 1minute bar stock returns), SVD produces bogus results 20%50% of
> >>> the
> >>> time.
> >>>
> >>> I've posted the bug report with an example here:
> >>> https://bugs.rproject.org/**bugzilla3/show_bug.cgi?id=**14962<
> https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962>
> >>>
> >>> For any of you using risk models or stat arb that use SVD, you may want
> >>> to
> >>> see if this error affects you.
> >>>
> >>> Also, (selfishly) I wanted to post this out to fellow practitioners
> >>> already knew about this problem and were using a better algorithm (I
> >>> noticed there are more than one LAPACK svd algorithm). The slow method
> of
> >>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still
> >>> works
> >>> fine (but it's very slow).
> >>>
> >>> Last, here's a simple 'svd' replacement function that performs a
> >>> rudimentary check on the output of svd. It fails if 'u' is not
> >>> orthonormal.
> >>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
> >>> time, but no guarantees that problems with 'v' and 'd' won't slip
> >>> through.
> >>>
> >>> svd < function(...) {
> >>> x < base::svd(...)
> >>> if (!is.complex(x$u) && any(abs(colSums(x$u^2)1) > 1e3))
> >>> stop("svd gave incorrect result")
> >>> x
> >>> }
> >>>
> >>> Suggestions, comments appreciated.
> >>> Robert
> >>>
> >>> Robert McGehee, CFA
> >>> Geode Capital Management, LLC
> >>> One Post Office Square, 28th Floor  Boston, MA  02109
> >>> Direct: (617)3928396
> >>>
> >>> This email, and any attachments hereto, are intended
> fo...{{dropped:11}}
> >>>
> >>> ______________________________**_________________
> >>> [hidden email] mailing list
> >>> https://stat.ethz.ch/mailman/**listinfo/rsigfinance<
> https://stat.ethz.ch/mailman/listinfo/rsigfinance>
> >>>  Subscriberposting only. If you want to post, subscribe first.
> >>>  Also note that this is not the rhelp list where general R questions
> >>> should go.
> >>>
> >>>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________**_________________
> >> [hidden email] mailing list
> >> https://stat.ethz.ch/mailman/**listinfo/rsigfinance<
> https://stat.ethz.ch/mailman/listinfo/rsigfinance>
> >>  Subscriberposting only. If you want to post, subscribe first.
> >>  Also note that this is not the rhelp list where general R questions
> >> should go.
> >>
> >>
> > 
> > Patrick Burns
> > [hidden email]
> > http://www.burnsstat.com> > http://www.portfolioprobe.com/**blog < http://www.portfolioprobe.com/blog> >
> > twitter: @portfolioprobe
> >
> >
> > ______________________________**_________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/**listinfo/rsigfinance<
> https://stat.ethz.ch/mailman/listinfo/rsigfinance>
> >  Subscriberposting only. If you want to post, subscribe first.
> >  Also note that this is not the rhelp list where general R questions
> > should go.
> >
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rsigfinance>  Subscriberposting only. If you want to post, subscribe first.
>  Also note that this is not the rhelp list where general R questions
> should go.
>
[[alternative HTML version deleted]]
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.


Quite right. I just rebuilt R linked against the latest version of LAPACK and all the problems went away.
From: Pierre Lapointe [mailto: [hidden email]]
Sent: Thursday, June 28, 2012 12:08 PM
To: Patrick Burns
Cc: [hidden email]; McGehee, Robert
Subject: Re: [RSIGFinance] Incorrect SVD Calculation
According to Peter Dalgard:
The issue goes away when relinking against LAPACK 3.4.1. R's generic LAPACK is extracted from version 3.1.1 and could be upgraded with some effort (it isn't a straightforward copy).
https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962#c2Good catch Robert!
Pierre Lapointe
Brockhouse Cooper  Global Macro Strategist
5142277711  [hidden email]<mailto: [hidden email]>
On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns < [hidden email]<mailto: [hidden email]>> wrote:
Thanks for highlighting that there is risk where
I hadn't expected any.
I've seen a different weird phenomenon in which
R's 'svd' throws an error because of nonconvergence
in the algorithm. Changing the order of the rows
makes it go away.
Pat
On 28/06/2012 16:52, Robert Harlow wrote:
Robert,
First off, this is a pretty interesting phenomenon, and let me start
off by saying that I have not figured out what is going on. However, I do
have some more information to contribute:
I am unable to recreate the bug in MATLAB, which uses a different
subroutine from LAPACK: DGESVD whereas R uses DGESDD
I then tried to see if I exported the same random matrix and tried to run
it through the MATLAB svd function what would happen. However, if I write
the offending matrix to a csv file, then load it back into R and calculate
the svd over again, it works and we get sum(sv) ~ 600.
I know this doesn't help, but I guess its more information..
I'll let you know if I find anything else,
Bob
for (i in 1:10) {
set.seed(i)
R < matrix(rnorm(x1*x2), x1, x2)
ee < eigen(tcrossprod(R))
## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
D < ee$vectors
cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
## Scale 'cx' to the corresponding correlation matrix and verify diag=1
cx < cov2cor(cx)
stopifnot(all(diag(cx)==1))
## Calculate sum of singular values (should be 'x1')
sv < sum(svd(cx)$d)
print(abs(sv  x1))
if (abs(svx1) > 1e2) {
write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
FALSE)
x.R < read.table("svd.csv", sep = ",")
x.R < as.matrix(x.R)
R.old < R
R < x.R
ee < eigen(tcrossprod(R))
## 'cx' is a positive semidefinite matrix with eigenvalues of 'ev'
D < ee$vectors
cx < ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
## Scale 'cx' to the corresponding correlation matrix and verify
diag=1
cx < cov2cor(cx)
stopifnot(all(diag(cx)==1))
## Calculate sum of singular values (should be 'x1')
sv < sum(svd(cx)$d)
browser()
cat("Seed=", i, "\n")
stop("Average singular value of correlation matrix is ", round(sv,
3),
". Should be ", x1)
}
}
On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
[hidden email]<mailto: [hidden email]>> wrote:
RFinanciers,
For any of you who, like us, use SVD for risk modeling and stat arb
trading, we've discovered what I think is a very serious bug in R/LAPACK's
SVD calculation. Since this bug has the potential to create bogus risk
models, I thought this audience might be interested in this post.
Specifically, for many matrices that are not full rank and have a few
small eigenvalues (much like covariance from stock returns!), SVD may
return completely bogus and impossible results: for a correlation matrix
the sum of the singular values will be many times too large and the 'u' and
'v' matrix will not be even close to orthonormal. For a random matrix with
certain distributions of eigenvalues (first discovered using a covariance
from 1minute bar stock returns), SVD produces bogus results 20%50% of the
time.
I've posted the bug report with an example here:
https://bugs.rproject.org/bugzilla3/show_bug.cgi?id=14962For any of you using risk models or stat arb that use SVD, you may want to
see if this error affects you.
Also, (selfishly) I wanted to post this out to fellow practitioners
already knew about this problem and were using a better algorithm (I
noticed there are more than one LAPACK svd algorithm). The slow method of
SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
fine (but it's very slow).
Last, here's a simple 'svd' replacement function that performs a
rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
time, but no guarantees that problems with 'v' and 'd' won't slip through.
svd < function(...) {
x < base::svd(...)
if (!is.complex(x$u) && any(abs(colSums(x$u^2)1) > 1e3))
stop("svd gave incorrect result")
x
}
Suggestions, comments appreciated.
Robert
Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor  Boston, MA  02109
Direct: (617)3928396<tel:%28617%293928396>
This email, and any attachments hereto, are intended fo...{{dropped:11}}
_______________________________________________
[hidden email]<mailto: [hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions
should go.
[[alternative HTML version deleted]]
_______________________________________________
[hidden email]<mailto: [hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.

Patrick Burns
[hidden email]<mailto: [hidden email]>
http://www.burnsstat.comhttp://www.portfolioprobe.com/blogtwitter: @portfolioprobe
_______________________________________________
[hidden email]<mailto: [hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.
[[alternative HTML version deleted]]
_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rsigfinance Subscriberposting only. If you want to post, subscribe first.
 Also note that this is not the rhelp list where general R questions should go.

