# Inconsistent rank in qr()

7 messages
Open this post in threaded view
|

## Inconsistent rank in qr()

 Hi, I have noticed different rank values calculated by qr() depending on LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned. Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned which is only occasionally a true rank. Would not it be more consistent to replace the rank in the latter case by something based on the following pseudo code ? d=abs(diag(qr)) rank=sum(d >= d[1]*tol) Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3) and diagonal term in qr matrix are put in decreasing order (according to their absolute values). Serguei. How to reproduce: a=diag(2) a[2,2]=0 qaf=qr(a, LAPACK=FALSE) qaf\$rank # shows 1. OK it's the true rank value qat=qr(a, LAPACK=TRUE) qat\$rank #shows 2. Bad, it's not the expected value. ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Inconsistent rank in qr()

 This behavior is noted in the qr documentation, no? rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case. > On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote: > > Hi, > > I have noticed different rank values calculated by qr() depending on > LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned. > Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned > which is only occasionally a true rank. > > Would not it be more consistent to replace the rank in the latter case by something > based on the following pseudo code ? > > d=abs(diag(qr)) > rank=sum(d >= d[1]*tol) > > Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3) > and diagonal term in qr matrix are put in decreasing order (according to their absolute values). > > Serguei. > > How to reproduce: > > a=diag(2) > a[2,2]=0 > qaf=qr(a, LAPACK=FALSE) > qaf\$rank # shows 1. OK it's the true rank value > qat=qr(a, LAPACK=TRUE) > qat\$rank #shows 2. Bad, it's not the expected value. > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Inconsistent rank in qr()

 Le 22/01/2018 à 17:40, Keith O'Hara a écrit : > This behavior is noted in the qr documentation, no? > > rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case. For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A)) but here the meaning of "always is full rank" is somewhat confusing. Does it mean that only full rank matrices must be submitted to qr() when LAPACK=TRUE? May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent columns in A) is so easy. Why to discard lapack case from it (even properly documented)? > > > >> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote: >> >> Hi, >> >> I have noticed different rank values calculated by qr() depending on >> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned. >> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned >> which is only occasionally a true rank. >> >> Would not it be more consistent to replace the rank in the latter case by something >> based on the following pseudo code ? >> >> d=abs(diag(qr)) >> rank=sum(d >= d[1]*tol) >> >> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3) >> and diagonal term in qr matrix are put in decreasing order (according to their absolute values). >> >> Serguei. >> >> How to reproduce: >> >> a=diag(2) >> a[2,2]=0 >> qaf=qr(a, LAPACK=FALSE) >> qaf\$rank # shows 1. OK it's the true rank value >> qat=qr(a, LAPACK=TRUE) >> qat\$rank #shows 2. Bad, it's not the expected value. >> >> ______________________________________________ >> [hidden email] mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel> -- Serguei Sokol Ingenieur de recherche INRA Cellule mathématique LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504 135 Avenue de Rangueil 31077 Toulouse Cedex 04 tel: +33 5 6155 9849 email: [hidden email] http://www.lisbp.fr______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Inconsistent rank in qr()

 I agree the result is a little confusing, but the behavior is in line with the documentation and so not ‘unexpected’ as such... I don’t think this is a matter of semantics, more of a ‘return the rank when we have it for free’ situation—when A is real-valued, qr(A,LAPACK=false) calls a modified version of Linpack’s dqrdc which computes the rank internally; see dqrdc2.f, output ‘k’. Lapack’s dqeqp3 doesn’t return the rank as is. Just my 2 cents on a potential fix: Matrix::rankMatrix follows the logic of your code, and I think this would be simpler to implement than modifying Lapack.c in two places (around lines 657 and 1175). Keith > On Jan 22, 2018, at 11:57 AM, Serguei Sokol <[hidden email]> wrote: > > Le 22/01/2018 à 17:40, Keith O'Hara a écrit : >> This behavior is noted in the qr documentation, no? >> >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case. > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A)) > but here the meaning of "always is full rank" is somewhat confusing. Does it mean > that only full rank matrices must be submitted to qr() when LAPACK=TRUE? > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent > columns in A) is so easy. Why to discard lapack case from it (even properly documented)? > >> >> >> >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote: >>> >>> Hi, >>> >>> I have noticed different rank values calculated by qr() depending on >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned. >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned >>> which is only occasionally a true rank. >>> >>> Would not it be more consistent to replace the rank in the latter case by something >>> based on the following pseudo code ? >>> >>> d=abs(diag(qr)) >>> rank=sum(d >= d[1]*tol) >>> >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3) >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values). >>> >>> Serguei. >>> >>> How to reproduce: >>> >>> a=diag(2) >>> a[2,2]=0 >>> qaf=qr(a, LAPACK=FALSE) >>> qaf\$rank # shows 1. OK it's the true rank value >>> qat=qr(a, LAPACK=TRUE) >>> qat\$rank #shows 2. Bad, it's not the expected value. >>> >>> ______________________________________________ >>> [hidden email] mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel>> > > -- > Serguei Sokol > Ingenieur de recherche INRA > > Cellule mathématique > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504 > 135 Avenue de Rangueil > 31077 Toulouse Cedex 04 > > tel: +33 5 6155 9849 > email: [hidden email] > http://www.lisbp.fr          [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Inconsistent rank in qr()

 In reply to this post by Serguei Sokol >>>>> Serguei Sokol <[hidden email]> >>>>>     on Mon, 22 Jan 2018 17:57:47 +0100 writes:     > Le 22/01/2018 à 17:40, Keith O'Hara a écrit :     >> This behavior is noted in the qr documentation, no?     >>     >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.     > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))     > but here the meaning of "always is full rank" is somewhat confusing. Does it mean     > that only full rank matrices must be submitted to qr() when LAPACK=TRUE?     > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix     > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent     > columns in A) is so easy. Why to discard lapack case from it (even properly documented)? Because 99.5% of caller to qr()  never look at '\$rank', so why should we compute it every time qr() is called? ==> Matrix :: rankMatrix() does use "qr" as one of its several methods. -------------- As wiser people than me have said (I'm paraphrasing, don't find a nice citation):   While the rank of a matrix is a very well defined concept in   mathematics (theory), its practical computation on a finite   precision computer is much more challenging. The ?rankMatrix  help page (package Matrix, part of your R)    https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.htmlstarts with the following 'Description' __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition. __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).     >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:     >>>     >>> Hi,     >>>     >>> I have noticed different rank values calculated by qr() depending on     >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.     >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned     >>> which is only occasionally a true rank.     >>>     >>> Would not it be more consistent to replace the rank in the latter case by something     >>> based on the following pseudo code ?     >>>     >>> d=abs(diag(qr))     >>> rank=sum(d >= d[1]*tol)     >>>     >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)     >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).     >>>     >>> Serguei.     >>>     >>> How to reproduce:     >>>     >>> a=diag(2)     >>> a[2,2]=0     >>> qaf=qr(a, LAPACK=FALSE)     >>> qaf\$rank # shows 1. OK it's the true rank value     >>> qat=qr(a, LAPACK=TRUE)     >>> qat\$rank #shows 2. Bad, it's not the expected value.     >>>     > --     > Serguei Sokol     > Ingenieur de recherche INRA     > Cellule mathématique     > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504     > 135 Avenue de Rangueil     > 31077 Toulouse Cedex 04     > tel: +33 5 6155 9849     > email: [hidden email]     > http://www.lisbp.fr______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Inconsistent rank in qr()

 Le 23/01/2018 à 08:47, Martin Maechler a écrit : >>>>>> Serguei Sokol <[hidden email]> >>>>>>      on Mon, 22 Jan 2018 17:57:47 +0100 writes: >      > Le 22/01/2018 à 17:40, Keith O'Hara a écrit : >      >> This behavior is noted in the qr documentation, no? >      >> >      >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case. >      > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A)) >      > but here the meaning of "always is full rank" is somewhat confusing. Does it mean >      > that only full rank matrices must be submitted to qr() when LAPACK=TRUE? >      > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix >      > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent >      > columns in A) is so easy. Why to discard lapack case from it (even properly documented)? > > Because 99.5% of caller to qr()  never look at '\$rank', > so why should we compute it every time qr() is called? 1. Because R already does it for linpack so it would be consistent to do so for lapack too. 2. Because R pretends that it is a part of a returned qr class. 3. Because its calculation is a negligible fraction of QR itself. > > ==> Matrix :: rankMatrix() does use "qr" as one of its several methods. > > -------------- > > As wiser people than me have said (I'm paraphrasing, don't find a nice citation): > >    While the rank of a matrix is a very well defined concept in >    mathematics (theory), its practical computation on a finite >    precision computer is much more challenging. True. It is indeed depending of round-off errors during QR calculations and tolerance setting but putting it just as min(nrow(A), ncol(A)) and still calling it rank of "full rank" is by far the most misleading choice to my mind. Once again, if we are already calculating it for linpack let do it in most consistent way for lapack too. I can propose a patch if you will. Serguei. > > The ?rankMatrix  help page (package Matrix, part of your R) >     https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html> starts with the following 'Description' > > __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition. > > __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m). > > >      >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote: >      >>> >      >>> Hi, >      >>> >      >>> I have noticed different rank values calculated by qr() depending on >      >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned. >      >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned >      >>> which is only occasionally a true rank. >      >>> >      >>> Would not it be more consistent to replace the rank in the latter case by something >      >>> based on the following pseudo code ? >      >>> >      >>> d=abs(diag(qr)) >      >>> rank=sum(d >= d[1]*tol) >      >>> >      >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3) >      >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values). >      >>> >      >>> Serguei. >      >>> >      >>> How to reproduce: >      >>> >      >>> a=diag(2) >      >>> a[2,2]=0 >      >>> qaf=qr(a, LAPACK=FALSE) >      >>> qaf\$rank # shows 1. OK it's the true rank value >      >>> qat=qr(a, LAPACK=TRUE) >      >>> qat\$rank #shows 2. Bad, it's not the expected value. >      >>> > >      > -- >      > Serguei Sokol >      > Ingenieur de recherche INRA > >      > Cellule mathématique >      > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504 >      > 135 Avenue de Rangueil >      > 31077 Toulouse Cedex 04 > >      > tel: +33 5 6155 9849 >      > email: [hidden email] >      > http://www.lisbp.fr> ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Open this post in threaded view
|

## Re: Inconsistent rank in qr()

 A simpler fix: return rank=NA when Lapack is used. Keith > On Jan 23, 2018, at 5:36 AM, Serguei Sokol <[hidden email]> wrote: > > Le 23/01/2018 à 08:47, Martin Maechler a écrit : >>>>>>> Serguei Sokol <[hidden email]> >>>>>>>     on Mon, 22 Jan 2018 17:57:47 +0100 writes: >>     > Le 22/01/2018 à 17:40, Keith O'Hara a écrit : >>     >> This behavior is noted in the qr documentation, no? >>     >> >>     >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case. >>     > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A)) >>     > but here the meaning of "always is full rank" is somewhat confusing. Does it mean >>     > that only full rank matrices must be submitted to qr() when LAPACK=TRUE? >>     > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix >>     > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent >>     > columns in A) is so easy. Why to discard lapack case from it (even properly documented)? >> >> Because 99.5% of caller to qr()  never look at '\$rank', >> so why should we compute it every time qr() is called? > 1. Because R already does it for linpack so it would be consistent to do so for lapack too. > 2. Because R pretends that it is a part of a returned qr class. > 3. Because its calculation is a negligible fraction of QR itself. > >> >> ==> Matrix :: rankMatrix() does use "qr" as one of its several methods. >> >> -------------- >> >> As wiser people than me have said (I'm paraphrasing, don't find a nice citation): >> >>   While the rank of a matrix is a very well defined concept in >>   mathematics (theory), its practical computation on a finite >>   precision computer is much more challenging. > True. It is indeed depending of round-off errors during QR calculations and tolerance > setting but putting it just as min(nrow(A), ncol(A)) and still calling it rank of "full rank" > is by far the most misleading choice to my mind. > > Once again, if we are already calculating it for linpack let do it in most consistent > way for lapack too. I can propose a patch if you will. > > Serguei. > >> >> The ?rankMatrix  help page (package Matrix, part of your R) >>    https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html>> starts with the following 'Description' >> >> __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition. >> >> __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m). >> >> >>     >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote: >>     >>> >>     >>> Hi, >>     >>> >>     >>> I have noticed different rank values calculated by qr() depending on >>     >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned. >>     >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned >>     >>> which is only occasionally a true rank. >>     >>> >>     >>> Would not it be more consistent to replace the rank in the latter case by something >>     >>> based on the following pseudo code ? >>     >>> >>     >>> d=abs(diag(qr)) >>     >>> rank=sum(d >= d[1]*tol) >>     >>> >>     >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3) >>     >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values). >>     >>> >>     >>> Serguei. >>     >>> >>     >>> How to reproduce: >>     >>> >>     >>> a=diag(2) >>     >>> a[2,2]=0 >>     >>> qaf=qr(a, LAPACK=FALSE) >>     >>> qaf\$rank # shows 1. OK it's the true rank value >>     >>> qat=qr(a, LAPACK=TRUE) >>     >>> qat\$rank #shows 2. Bad, it's not the expected value. >>     >>> >> >>     > -- >>     > Serguei Sokol >>     > Ingenieur de recherche INRA >> >>     > Cellule mathématique >>     > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504 >>     > 135 Avenue de Rangueil >>     > 31077 Toulouse Cedex 04 >> >>     > tel: +33 5 6155 9849 >>     > email: [hidden email] >>     > http://www.lisbp.fr        [[alternative HTML version deleted]] ______________________________________________ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-devel