Incorrect SVD Calculation

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Incorrect SVD Calculation

McGehee, Robert
R-Financiers,
For any of you who, like us, use SVD for risk modeling and stat arb trading, we've discovered what I think is a very serious bug in R/LAPACK's SVD calculation. Since this bug has the potential to create bogus risk models, I thought this audience might be interested in this post.

Specifically, for many matrices that are not full rank and have a few small eigenvalues (much like covariance from stock returns!), SVD may return completely bogus and impossible results: for a correlation matrix the sum of the singular values will be many times too large and the 'u' and 'v' matrix will not be even close to orthonormal. For a random matrix with certain distributions of eigenvalues (first discovered using a covariance from 1-minute bar stock returns), SVD produces bogus results 20%-50% of the time.

I've posted the bug report with an example here:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962

For any of you using risk models or stat arb that use SVD, you may want to see if this error affects you.

Also, (selfishly) I wanted to post this out to fellow practitioners already knew about this problem and were using a better algorithm (I noticed there are more than one LAPACK svd algorithm). The slow method of SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works fine (but it's very slow).

Last, here's a simple 'svd' replacement function that performs a rudimentary check on the output of svd. It fails if 'u' is not orthonormal. Generally 'u', 'd', and 'v' all seem to give wrong answers at the same time, but no guarantees that problems with 'v' and 'd' won't slip through.

svd <- function(...) {
    x <- base::svd(...)
    if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
        stop("svd gave incorrect result")
    x
}

Suggestions, comments appreciated.
--Robert

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Direct: (617)392-8396

This e-mail, and any attachments hereto, are intended fo...{{dropped:11}}

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Bob
Reply | Threaded
Open this post in threaded view
|

Re: Incorrect SVD Calculation

Bob
Robert,
    First off, this is a pretty interesting phenomenon, and let me start
off by saying that I have not figured out what is going on.  However, I do
have some more information to contribute:
I am unable to recreate the bug in MATLAB, which uses a different
subroutine from LAPACK: DGESVD whereas R uses DGESDD

I then tried to see if I exported the same random matrix and tried to run
it through the MATLAB svd function what would happen.  However, if I write
the offending matrix to a csv file, then load it back into R and calculate
the svd over again, it works and we get sum(sv) ~ 600.

I know this doesn't help, but I guess its more information..
I'll let you know if I find anything else,
-Bob

for (i in 1:10) {
    set.seed(i)
    R <- matrix(rnorm(x1*x2), x1, x2)
    ee <- eigen(tcrossprod(R))

    ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
    D <- ee$vectors
    cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])

    ## Scale 'cx' to the corresponding correlation matrix and verify diag=1
    cx <- cov2cor(cx)
    stopifnot(all(diag(cx)==1))

    ## Calculate sum of singular values (should be 'x1')
    sv <- sum(svd(cx)$d)
    print(abs(sv - x1))
    if (abs(sv-x1) > 1e-2) {
        write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
FALSE)
        x.R <- read.table("svd.csv", sep = ",")
        x.R <- as.matrix(x.R)
        R.old <- R
        R <- x.R
        ee <- eigen(tcrossprod(R))
        ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
        D <- ee$vectors
        cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])

        ## Scale 'cx' to the corresponding correlation matrix and verify
diag=1
        cx <- cov2cor(cx)
        stopifnot(all(diag(cx)==1))

        ## Calculate sum of singular values (should be 'x1')
        sv <- sum(svd(cx)$d)
        browser()
        cat("Seed=", i, "\n")
        stop("Average singular value of correlation matrix is ", round(sv,
3),
". Should be ", x1)
    }
}



On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
[hidden email]> wrote:

> R-Financiers,
> For any of you who, like us, use SVD for risk modeling and stat arb
> trading, we've discovered what I think is a very serious bug in R/LAPACK's
> SVD calculation. Since this bug has the potential to create bogus risk
> models, I thought this audience might be interested in this post.
>
> Specifically, for many matrices that are not full rank and have a few
> small eigenvalues (much like covariance from stock returns!), SVD may
> return completely bogus and impossible results: for a correlation matrix
> the sum of the singular values will be many times too large and the 'u' and
> 'v' matrix will not be even close to orthonormal. For a random matrix with
> certain distributions of eigenvalues (first discovered using a covariance
> from 1-minute bar stock returns), SVD produces bogus results 20%-50% of the
> time.
>
> I've posted the bug report with an example here:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962
>
> For any of you using risk models or stat arb that use SVD, you may want to
> see if this error affects you.
>
> Also, (selfishly) I wanted to post this out to fellow practitioners
> already knew about this problem and were using a better algorithm (I
> noticed there are more than one LAPACK svd algorithm). The slow method of
> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
> fine (but it's very slow).
>
> Last, here's a simple 'svd' replacement function that performs a
> rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
> time, but no guarantees that problems with 'v' and 'd' won't slip through.
>
> svd <- function(...) {
>    x <- base::svd(...)
>    if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
>        stop("svd gave incorrect result")
>    x
> }
>
> Suggestions, comments appreciated.
> --Robert
>
> Robert McGehee, CFA
> Geode Capital Management, LLC
> One Post Office Square, 28th Floor | Boston, MA | 02109
> Direct: (617)392-8396
>
> This e-mail, and any attachments hereto, are intended fo...{{dropped:11}}
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>

        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|

Re: Incorrect SVD Calculation

Patrick Burns-2
Thanks for highlighting that there is risk where
I hadn't expected any.

I've seen a different weird phenomenon in which
R's 'svd' throws an error because of non-convergence
in the algorithm.  Changing the order of the rows
makes it go away.

Pat

On 28/06/2012 16:52, Robert Harlow wrote:

> Robert,
>      First off, this is a pretty interesting phenomenon, and let me start
> off by saying that I have not figured out what is going on.  However, I do
> have some more information to contribute:
> I am unable to recreate the bug in MATLAB, which uses a different
> subroutine from LAPACK: DGESVD whereas R uses DGESDD
>
> I then tried to see if I exported the same random matrix and tried to run
> it through the MATLAB svd function what would happen.  However, if I write
> the offending matrix to a csv file, then load it back into R and calculate
> the svd over again, it works and we get sum(sv) ~ 600.
>
> I know this doesn't help, but I guess its more information..
> I'll let you know if I find anything else,
> -Bob
>
> for (i in 1:10) {
>      set.seed(i)
>      R <- matrix(rnorm(x1*x2), x1, x2)
>      ee <- eigen(tcrossprod(R))
>
>      ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
>      D <- ee$vectors
>      cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>
>      ## Scale 'cx' to the corresponding correlation matrix and verify diag=1
>      cx <- cov2cor(cx)
>      stopifnot(all(diag(cx)==1))
>
>      ## Calculate sum of singular values (should be 'x1')
>      sv <- sum(svd(cx)$d)
>      print(abs(sv - x1))
>      if (abs(sv-x1) > 1e-2) {
>          write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
> FALSE)
>          x.R <- read.table("svd.csv", sep = ",")
>          x.R <- as.matrix(x.R)
>          R.old <- R
>          R <- x.R
>          ee <- eigen(tcrossprod(R))
>          ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
>          D <- ee$vectors
>          cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>
>          ## Scale 'cx' to the corresponding correlation matrix and verify
> diag=1
>          cx <- cov2cor(cx)
>          stopifnot(all(diag(cx)==1))
>
>          ## Calculate sum of singular values (should be 'x1')
>          sv <- sum(svd(cx)$d)
>          browser()
>          cat("Seed=", i, "\n")
>          stop("Average singular value of correlation matrix is ", round(sv,
> 3),
> ". Should be ", x1)
>      }
> }
>
>
>
> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
> [hidden email]> wrote:
>
>> R-Financiers,
>> For any of you who, like us, use SVD for risk modeling and stat arb
>> trading, we've discovered what I think is a very serious bug in R/LAPACK's
>> SVD calculation. Since this bug has the potential to create bogus risk
>> models, I thought this audience might be interested in this post.
>>
>> Specifically, for many matrices that are not full rank and have a few
>> small eigenvalues (much like covariance from stock returns!), SVD may
>> return completely bogus and impossible results: for a correlation matrix
>> the sum of the singular values will be many times too large and the 'u' and
>> 'v' matrix will not be even close to orthonormal. For a random matrix with
>> certain distributions of eigenvalues (first discovered using a covariance
>> from 1-minute bar stock returns), SVD produces bogus results 20%-50% of the
>> time.
>>
>> I've posted the bug report with an example here:
>> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962
>>
>> For any of you using risk models or stat arb that use SVD, you may want to
>> see if this error affects you.
>>
>> Also, (selfishly) I wanted to post this out to fellow practitioners
>> already knew about this problem and were using a better algorithm (I
>> noticed there are more than one LAPACK svd algorithm). The slow method of
>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
>> fine (but it's very slow).
>>
>> Last, here's a simple 'svd' replacement function that performs a
>> rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
>> time, but no guarantees that problems with 'v' and 'd' won't slip through.
>>
>> svd <- function(...) {
>>     x <- base::svd(...)
>>     if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
>>         stop("svd gave incorrect result")
>>     x
>> }
>>
>> Suggestions, comments appreciated.
>> --Robert
>>
>> Robert McGehee, CFA
>> Geode Capital Management, LLC
>> One Post Office Square, 28th Floor | Boston, MA | 02109
>> Direct: (617)392-8396
>>
>> This e-mail, and any attachments hereto, are intended fo...{{dropped:11}}
>>
>> _______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.
>

--
Patrick Burns
[hidden email]
http://www.burns-stat.com
http://www.portfolioprobe.com/blog
twitter: @portfolioprobe

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|

Re: Incorrect SVD Calculation

Pierre Lapointe
According to Peter Dalgard:

The issue goes away when relinking against LAPACK 3.4.1. R's generic
LAPACK is extracted from version 3.1.1 and could be upgraded with some
effort (it isn't a straightforward copy).

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962#c2

Good catch Robert!

*Pierre Lapointe*

*Brockhouse Cooper* | Global Macro Strategist

514-227-7711 | [hidden email]



On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns <[hidden email]>wrote:

> Thanks for highlighting that there is risk where
> I hadn't expected any.
>
> I've seen a different weird phenomenon in which
> R's 'svd' throws an error because of non-convergence
> in the algorithm.  Changing the order of the rows
> makes it go away.
>
> Pat
>
>
> On 28/06/2012 16:52, Robert Harlow wrote:
>
>> Robert,
>>     First off, this is a pretty interesting phenomenon, and let me start
>> off by saying that I have not figured out what is going on.  However, I do
>> have some more information to contribute:
>> I am unable to recreate the bug in MATLAB, which uses a different
>> subroutine from LAPACK: DGESVD whereas R uses DGESDD
>>
>> I then tried to see if I exported the same random matrix and tried to run
>> it through the MATLAB svd function what would happen.  However, if I write
>> the offending matrix to a csv file, then load it back into R and calculate
>> the svd over again, it works and we get sum(sv) ~ 600.
>>
>> I know this doesn't help, but I guess its more information..
>> I'll let you know if I find anything else,
>> -Bob
>>
>> for (i in 1:10) {
>>     set.seed(i)
>>     R <- matrix(rnorm(x1*x2), x1, x2)
>>     ee <- eigen(tcrossprod(R))
>>
>>     ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
>>     D <- ee$vectors
>>     cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
>>
>>     ## Scale 'cx' to the corresponding correlation matrix and verify
>> diag=1
>>     cx <- cov2cor(cx)
>>     stopifnot(all(diag(cx)==1))
>>
>>     ## Calculate sum of singular values (should be 'x1')
>>     sv <- sum(svd(cx)$d)
>>     print(abs(sv - x1))
>>     if (abs(sv-x1) > 1e-2) {
>>         write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names
>> =
>> FALSE)
>>         x.R <- read.table("svd.csv", sep = ",")
>>         x.R <- as.matrix(x.R)
>>         R.old <- R
>>         R <- x.R
>>         ee <- eigen(tcrossprod(R))
>>         ## 'cx' is a positive semi-definite matrix with eigenvalues of
>> 'ev'
>>         D <- ee$vectors
>>         cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev),
>> ee$vectors[,1:x2])
>>
>>         ## Scale 'cx' to the corresponding correlation matrix and verify
>> diag=1
>>         cx <- cov2cor(cx)
>>         stopifnot(all(diag(cx)==1))
>>
>>         ## Calculate sum of singular values (should be 'x1')
>>         sv <- sum(svd(cx)$d)
>>         browser()
>>         cat("Seed=", i, "\n")
>>         stop("Average singular value of correlation matrix is ", round(sv,
>> 3),
>> ". Should be ", x1)
>>     }
>> }
>>
>>
>>
>> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
>> Robert.McGehee@geodecapital.**com <[hidden email]>>
>> wrote:
>>
>>  R-Financiers,
>>> For any of you who, like us, use SVD for risk modeling and stat arb
>>> trading, we've discovered what I think is a very serious bug in
>>> R/LAPACK's
>>> SVD calculation. Since this bug has the potential to create bogus risk
>>> models, I thought this audience might be interested in this post.
>>>
>>> Specifically, for many matrices that are not full rank and have a few
>>> small eigenvalues (much like covariance from stock returns!), SVD may
>>> return completely bogus and impossible results: for a correlation matrix
>>> the sum of the singular values will be many times too large and the 'u'
>>> and
>>> 'v' matrix will not be even close to orthonormal. For a random matrix
>>> with
>>> certain distributions of eigenvalues (first discovered using a covariance
>>> from 1-minute bar stock returns), SVD produces bogus results 20%-50% of
>>> the
>>> time.
>>>
>>> I've posted the bug report with an example here:
>>> https://bugs.r-project.org/**bugzilla3/show_bug.cgi?id=**14962<https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962>
>>>
>>> For any of you using risk models or stat arb that use SVD, you may want
>>> to
>>> see if this error affects you.
>>>
>>> Also, (selfishly) I wanted to post this out to fellow practitioners
>>> already knew about this problem and were using a better algorithm (I
>>> noticed there are more than one LAPACK svd algorithm). The slow method of
>>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still
>>> works
>>> fine (but it's very slow).
>>>
>>> Last, here's a simple 'svd' replacement function that performs a
>>> rudimentary check on the output of svd. It fails if 'u' is not
>>> orthonormal.
>>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
>>> time, but no guarantees that problems with 'v' and 'd' won't slip
>>> through.
>>>
>>> svd <- function(...) {
>>>    x <- base::svd(...)
>>>    if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
>>>        stop("svd gave incorrect result")
>>>    x
>>> }
>>>
>>> Suggestions, comments appreciated.
>>> --Robert
>>>
>>> Robert McGehee, CFA
>>> Geode Capital Management, LLC
>>> One Post Office Square, 28th Floor | Boston, MA | 02109
>>> Direct: (617)392-8396
>>>
>>> This e-mail, and any attachments hereto, are intended fo...{{dropped:11}}
>>>
>>> ______________________________**_________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/**listinfo/r-sig-finance<https://stat.ethz.ch/mailman/listinfo/r-sig-finance>
>>> -- Subscriber-posting only. If you want to post, subscribe first.
>>> -- Also note that this is not the r-help list where general R questions
>>> should go.
>>>
>>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/**listinfo/r-sig-finance<https://stat.ethz.ch/mailman/listinfo/r-sig-finance>
>> -- Subscriber-posting only. If you want to post, subscribe first.
>> -- Also note that this is not the r-help list where general R questions
>> should go.
>>
>>
> --
> Patrick Burns
> [hidden email]
> http://www.burns-stat.com
> http://www.portfolioprobe.com/**blog <http://www.portfolioprobe.com/blog>
> twitter: @portfolioprobe
>
>
> ______________________________**_________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/**listinfo/r-sig-finance<https://stat.ethz.ch/mailman/listinfo/r-sig-finance>
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>

        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Bob
Reply | Threaded
Open this post in threaded view
|

Re: Incorrect SVD Calculation

Bob
Another temporary fix could be to round your matrix R:
R <- round(R, 12)
seems to work just fine...

On Thu, Jun 28, 2012 at 12:07 PM, Pierre Lapointe <[hidden email]>wrote:

> According to Peter Dalgard:
>
> The issue goes away when relinking against LAPACK 3.4.1. R's generic
> LAPACK is extracted from version 3.1.1 and could be upgraded with some
> effort (it isn't a straightforward copy).
>
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962#c2
>
> Good catch Robert!
>
> *Pierre Lapointe*
>
> *Brockhouse Cooper* | Global Macro Strategist
>
> 514-227-7711 | [hidden email]
>
>
>
> On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns <[hidden email]
> >wrote:
>
> > Thanks for highlighting that there is risk where
> > I hadn't expected any.
> >
> > I've seen a different weird phenomenon in which
> > R's 'svd' throws an error because of non-convergence
> > in the algorithm.  Changing the order of the rows
> > makes it go away.
> >
> > Pat
> >
> >
> > On 28/06/2012 16:52, Robert Harlow wrote:
> >
> >> Robert,
> >>     First off, this is a pretty interesting phenomenon, and let me start
> >> off by saying that I have not figured out what is going on.  However, I
> do
> >> have some more information to contribute:
> >> I am unable to recreate the bug in MATLAB, which uses a different
> >> subroutine from LAPACK: DGESVD whereas R uses DGESDD
> >>
> >> I then tried to see if I exported the same random matrix and tried to
> run
> >> it through the MATLAB svd function what would happen.  However, if I
> write
> >> the offending matrix to a csv file, then load it back into R and
> calculate
> >> the svd over again, it works and we get sum(sv) ~ 600.
> >>
> >> I know this doesn't help, but I guess its more information..
> >> I'll let you know if I find anything else,
> >> -Bob
> >>
> >> for (i in 1:10) {
> >>     set.seed(i)
> >>     R <- matrix(rnorm(x1*x2), x1, x2)
> >>     ee <- eigen(tcrossprod(R))
> >>
> >>     ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
> >>     D <- ee$vectors
> >>     cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])
> >>
> >>     ## Scale 'cx' to the corresponding correlation matrix and verify
> >> diag=1
> >>     cx <- cov2cor(cx)
> >>     stopifnot(all(diag(cx)==1))
> >>
> >>     ## Calculate sum of singular values (should be 'x1')
> >>     sv <- sum(svd(cx)$d)
> >>     print(abs(sv - x1))
> >>     if (abs(sv-x1) > 1e-2) {
> >>         write.table(R, "svd.csv", sep = ",", row.names = FALSE,
> col.names
> >> =
> >> FALSE)
> >>         x.R <- read.table("svd.csv", sep = ",")
> >>         x.R <- as.matrix(x.R)
> >>         R.old <- R
> >>         R <- x.R
> >>         ee <- eigen(tcrossprod(R))
> >>         ## 'cx' is a positive semi-definite matrix with eigenvalues of
> >> 'ev'
> >>         D <- ee$vectors
> >>         cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev),
> >> ee$vectors[,1:x2])
> >>
> >>         ## Scale 'cx' to the corresponding correlation matrix and verify
> >> diag=1
> >>         cx <- cov2cor(cx)
> >>         stopifnot(all(diag(cx)==1))
> >>
> >>         ## Calculate sum of singular values (should be 'x1')
> >>         sv <- sum(svd(cx)$d)
> >>         browser()
> >>         cat("Seed=", i, "\n")
> >>         stop("Average singular value of correlation matrix is ",
> round(sv,
> >> 3),
> >> ". Should be ", x1)
> >>     }
> >> }
> >>
> >>
> >>
> >> On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
> >> Robert.McGehee@geodecapital.**com <[hidden email]>>
> >> wrote:
> >>
> >>  R-Financiers,
> >>> For any of you who, like us, use SVD for risk modeling and stat arb
> >>> trading, we've discovered what I think is a very serious bug in
> >>> R/LAPACK's
> >>> SVD calculation. Since this bug has the potential to create bogus risk
> >>> models, I thought this audience might be interested in this post.
> >>>
> >>> Specifically, for many matrices that are not full rank and have a few
> >>> small eigenvalues (much like covariance from stock returns!), SVD may
> >>> return completely bogus and impossible results: for a correlation
> matrix
> >>> the sum of the singular values will be many times too large and the 'u'
> >>> and
> >>> 'v' matrix will not be even close to orthonormal. For a random matrix
> >>> with
> >>> certain distributions of eigenvalues (first discovered using a
> covariance
> >>> from 1-minute bar stock returns), SVD produces bogus results 20%-50% of
> >>> the
> >>> time.
> >>>
> >>> I've posted the bug report with an example here:
> >>> https://bugs.r-project.org/**bugzilla3/show_bug.cgi?id=**14962<
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962>
> >>>
> >>> For any of you using risk models or stat arb that use SVD, you may want
> >>> to
> >>> see if this error affects you.
> >>>
> >>> Also, (selfishly) I wanted to post this out to fellow practitioners
> >>> already knew about this problem and were using a better algorithm (I
> >>> noticed there are more than one LAPACK svd algorithm). The slow method
> of
> >>> SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still
> >>> works
> >>> fine (but it's very slow).
> >>>
> >>> Last, here's a simple 'svd' replacement function that performs a
> >>> rudimentary check on the output of svd. It fails if 'u' is not
> >>> orthonormal.
> >>> Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
> >>> time, but no guarantees that problems with 'v' and 'd' won't slip
> >>> through.
> >>>
> >>> svd <- function(...) {
> >>>    x <- base::svd(...)
> >>>    if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
> >>>        stop("svd gave incorrect result")
> >>>    x
> >>> }
> >>>
> >>> Suggestions, comments appreciated.
> >>> --Robert
> >>>
> >>> Robert McGehee, CFA
> >>> Geode Capital Management, LLC
> >>> One Post Office Square, 28th Floor | Boston, MA | 02109
> >>> Direct: (617)392-8396
> >>>
> >>> This e-mail, and any attachments hereto, are intended
> fo...{{dropped:11}}
> >>>
> >>> ______________________________**_________________
> >>> [hidden email] mailing list
> >>> https://stat.ethz.ch/mailman/**listinfo/r-sig-finance<
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>
> >>> -- Subscriber-posting only. If you want to post, subscribe first.
> >>> -- Also note that this is not the r-help list where general R questions
> >>> should go.
> >>>
> >>>
> >>        [[alternative HTML version deleted]]
> >>
> >> ______________________________**_________________
> >> [hidden email] mailing list
> >> https://stat.ethz.ch/mailman/**listinfo/r-sig-finance<
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>
> >> -- Subscriber-posting only. If you want to post, subscribe first.
> >> -- Also note that this is not the r-help list where general R questions
> >> should go.
> >>
> >>
> > --
> > Patrick Burns
> > [hidden email]
> > http://www.burns-stat.com
> > http://www.portfolioprobe.com/**blog <http://www.portfolioprobe.com/blog
> >
> > twitter: @portfolioprobe
> >
> >
> > ______________________________**_________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/**listinfo/r-sig-finance<
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance>
> > -- Subscriber-posting only. If you want to post, subscribe first.
> > -- Also note that this is not the r-help list where general R questions
> > should go.
> >
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions
> should go.
>

        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
Reply | Threaded
Open this post in threaded view
|

Re: Incorrect SVD Calculation

McGehee, Robert
In reply to this post by Pierre Lapointe
Quite right. I just rebuilt R linked against the latest version of LAPACK and all the problems went away.

From: Pierre Lapointe [mailto:[hidden email]]
Sent: Thursday, June 28, 2012 12:08 PM
To: Patrick Burns
Cc: [hidden email]; McGehee, Robert
Subject: Re: [R-SIG-Finance] Incorrect SVD Calculation

According to Peter Dalgard:

The issue goes away when relinking against LAPACK 3.4.1. R's generic LAPACK is extracted from version 3.1.1 and could be upgraded with some effort (it isn't a straightforward copy).

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962#c2
Good catch Robert!

Pierre Lapointe
Brockhouse Cooper | Global Macro Strategist
514-227-7711 | [hidden email]<mailto:[hidden email]>



On Thu, Jun 28, 2012 at 11:58 AM, Patrick Burns <[hidden email]<mailto:[hidden email]>> wrote:
Thanks for highlighting that there is risk where
I hadn't expected any.

I've seen a different weird phenomenon in which
R's 'svd' throws an error because of non-convergence
in the algorithm.  Changing the order of the rows
makes it go away.

Pat


On 28/06/2012 16:52, Robert Harlow wrote:
Robert,
    First off, this is a pretty interesting phenomenon, and let me start
off by saying that I have not figured out what is going on.  However, I do
have some more information to contribute:
I am unable to recreate the bug in MATLAB, which uses a different
subroutine from LAPACK: DGESVD whereas R uses DGESDD

I then tried to see if I exported the same random matrix and tried to run
it through the MATLAB svd function what would happen.  However, if I write
the offending matrix to a csv file, then load it back into R and calculate
the svd over again, it works and we get sum(sv) ~ 600.

I know this doesn't help, but I guess its more information..
I'll let you know if I find anything else,
-Bob

for (i in 1:10) {
    set.seed(i)
    R <- matrix(rnorm(x1*x2), x1, x2)
    ee <- eigen(tcrossprod(R))

    ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
    D <- ee$vectors
    cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])

    ## Scale 'cx' to the corresponding correlation matrix and verify diag=1
    cx <- cov2cor(cx)
    stopifnot(all(diag(cx)==1))

    ## Calculate sum of singular values (should be 'x1')
    sv <- sum(svd(cx)$d)
    print(abs(sv - x1))
    if (abs(sv-x1) > 1e-2) {
        write.table(R, "svd.csv", sep = ",", row.names = FALSE, col.names =
FALSE)
        x.R <- read.table("svd.csv", sep = ",")
        x.R <- as.matrix(x.R)
        R.old <- R
        R <- x.R
        ee <- eigen(tcrossprod(R))
        ## 'cx' is a positive semi-definite matrix with eigenvalues of 'ev'
        D <- ee$vectors
        cx <- ee$vectors[, 1:x2] %*% tcrossprod(diag(ev), ee$vectors[,1:x2])

        ## Scale 'cx' to the corresponding correlation matrix and verify
diag=1
        cx <- cov2cor(cx)
        stopifnot(all(diag(cx)==1))

        ## Calculate sum of singular values (should be 'x1')
        sv <- sum(svd(cx)$d)
        browser()
        cat("Seed=", i, "\n")
        stop("Average singular value of correlation matrix is ", round(sv,
3),
". Should be ", x1)
    }
}



On Wed, Jun 27, 2012 at 6:31 PM, McGehee, Robert <
[hidden email]<mailto:[hidden email]>> wrote:
R-Financiers,
For any of you who, like us, use SVD for risk modeling and stat arb
trading, we've discovered what I think is a very serious bug in R/LAPACK's
SVD calculation. Since this bug has the potential to create bogus risk
models, I thought this audience might be interested in this post.

Specifically, for many matrices that are not full rank and have a few
small eigenvalues (much like covariance from stock returns!), SVD may
return completely bogus and impossible results: for a correlation matrix
the sum of the singular values will be many times too large and the 'u' and
'v' matrix will not be even close to orthonormal. For a random matrix with
certain distributions of eigenvalues (first discovered using a covariance
from 1-minute bar stock returns), SVD produces bogus results 20%-50% of the
time.

I've posted the bug report with an example here:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14962

For any of you using risk models or stat arb that use SVD, you may want to
see if this error affects you.

Also, (selfishly) I wanted to post this out to fellow practitioners
already knew about this problem and were using a better algorithm (I
noticed there are more than one LAPACK svd algorithm). The slow method of
SVD calculation by eigen(crossprod(x)) and eigen(tcrossprod(x)) still works
fine (but it's very slow).

Last, here's a simple 'svd' replacement function that performs a
rudimentary check on the output of svd. It fails if 'u' is not orthonormal.
Generally 'u', 'd', and 'v' all seem to give wrong answers at the same
time, but no guarantees that problems with 'v' and 'd' won't slip through.

svd <- function(...) {
   x <- base::svd(...)
   if (!is.complex(x$u) && any(abs(colSums(x$u^2)-1) > 1e-3))
       stop("svd gave incorrect result")
   x
}

Suggestions, comments appreciated.
--Robert

Robert McGehee, CFA
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Direct: (617)392-8396<tel:%28617%29392-8396>

This e-mail, and any attachments hereto, are intended fo...{{dropped:11}}

_______________________________________________
[hidden email]<mailto:[hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions
should go.

       [[alternative HTML version deleted]]

_______________________________________________
[hidden email]<mailto:[hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.

--
Patrick Burns
[hidden email]<mailto:[hidden email]>
http://www.burns-stat.com
http://www.portfolioprobe.com/blog
twitter: @portfolioprobe


_______________________________________________
[hidden email]<mailto:[hidden email]> mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.


        [[alternative HTML version deleted]]

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.