Inconsistent rank in qr()

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Inconsistent rank in qr()

Serguei Sokol
Hi,

I have noticed different rank values calculated by qr() depending on
LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
which is only occasionally a true rank.

Would not it be more consistent to replace the rank in the latter case by something
based on the following pseudo code ?

d=abs(diag(qr))
rank=sum(d >= d[1]*tol)

Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
and diagonal term in qr matrix are put in decreasing order (according to their absolute values).

Serguei.

How to reproduce:

a=diag(2)
a[2,2]=0
qaf=qr(a, LAPACK=FALSE)
qaf$rank # shows 1. OK it's the true rank value
qat=qr(a, LAPACK=TRUE)
qat$rank #shows 2. Bad, it's not the expected value.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent rank in qr()

Keith O'Hara
This behavior is noted in the qr documentation, no?

rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.



> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:
>
> Hi,
>
> I have noticed different rank values calculated by qr() depending on
> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
> which is only occasionally a true rank.
>
> Would not it be more consistent to replace the rank in the latter case by something
> based on the following pseudo code ?
>
> d=abs(diag(qr))
> rank=sum(d >= d[1]*tol)
>
> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
>
> Serguei.
>
> How to reproduce:
>
> a=diag(2)
> a[2,2]=0
> qaf=qr(a, LAPACK=FALSE)
> qaf$rank # shows 1. OK it's the true rank value
> qat=qr(a, LAPACK=TRUE)
> qat$rank #shows 2. Bad, it's not the expected value.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent rank in qr()

Serguei Sokol
Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
> This behavior is noted in the qr documentation, no?
>
> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
but here the meaning of "always is full rank" is somewhat confusing. Does it mean
that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
columns in A) is so easy. Why to discard lapack case from it (even properly documented)?

>
>
>
>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:
>>
>> Hi,
>>
>> I have noticed different rank values calculated by qr() depending on
>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
>> which is only occasionally a true rank.
>>
>> Would not it be more consistent to replace the rank in the latter case by something
>> based on the following pseudo code ?
>>
>> d=abs(diag(qr))
>> rank=sum(d >= d[1]*tol)
>>
>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
>>
>> Serguei.
>>
>> How to reproduce:
>>
>> a=diag(2)
>> a[2,2]=0
>> qaf=qr(a, LAPACK=FALSE)
>> qaf$rank # shows 1. OK it's the true rank value
>> qat=qr(a, LAPACK=TRUE)
>> qat$rank #shows 2. Bad, it's not the expected value.
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Serguei Sokol
Ingenieur de recherche INRA

Cellule mathématique
LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 6155 9849
email: [hidden email]
http://www.lisbp.fr

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent rank in qr()

Keith O'Hara
I agree the result is a little confusing, but the behavior is in line with the documentation and so not ‘unexpected’ as such...

I don’t think this is a matter of semantics, more of a ‘return the rank when we have it for free’ situation—when A is real-valued, qr(A,LAPACK=false) calls a modified version of Linpack’s dqrdc which computes the rank internally; see dqrdc2.f, output ‘k’. Lapack’s dqeqp3 doesn’t return the rank as is.

Just my 2 cents on a potential fix: Matrix::rankMatrix follows the logic of your code, and I think this would be simpler to implement than modifying Lapack.c in two places (around lines 657 and 1175).

Keith


> On Jan 22, 2018, at 11:57 AM, Serguei Sokol <[hidden email]> wrote:
>
> Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
>> This behavior is noted in the qr documentation, no?
>>
>> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
> For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
> but here the meaning of "always is full rank" is somewhat confusing. Does it mean
> that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
> May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
> but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
> columns in A) is so easy. Why to discard lapack case from it (even properly documented)?
>
>>
>>
>>
>>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I have noticed different rank values calculated by qr() depending on
>>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
>>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
>>> which is only occasionally a true rank.
>>>
>>> Would not it be more consistent to replace the rank in the latter case by something
>>> based on the following pseudo code ?
>>>
>>> d=abs(diag(qr))
>>> rank=sum(d >= d[1]*tol)
>>>
>>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
>>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
>>>
>>> Serguei.
>>>
>>> How to reproduce:
>>>
>>> a=diag(2)
>>> a[2,2]=0
>>> qaf=qr(a, LAPACK=FALSE)
>>> qaf$rank # shows 1. OK it's the true rank value
>>> qat=qr(a, LAPACK=TRUE)
>>> qat$rank #shows 2. Bad, it's not the expected value.
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Serguei Sokol
> Ingenieur de recherche INRA
>
> Cellule mathématique
> LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
> 135 Avenue de Rangueil
> 31077 Toulouse Cedex 04
>
> tel: +33 5 6155 9849
> email: [hidden email] <mailto:[hidden email]>
> http://www.lisbp.fr <http://www.lisbp.fr/>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent rank in qr()

Martin Maechler
In reply to this post by Serguei Sokol
>>>>> Serguei Sokol <[hidden email]>
>>>>>     on Mon, 22 Jan 2018 17:57:47 +0100 writes:

    > Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
    >> This behavior is noted in the qr documentation, no?
    >>
    >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
    > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
    > but here the meaning of "always is full rank" is somewhat confusing. Does it mean
    > that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
    > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
    > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
    > columns in A) is so easy. Why to discard lapack case from it (even properly documented)?

Because 99.5% of caller to qr()  never look at '$rank',
so why should we compute it every time qr() is called?

==> Matrix :: rankMatrix() does use "qr" as one of its several methods.

--------------

As wiser people than me have said (I'm paraphrasing, don't find a nice citation):

  While the rank of a matrix is a very well defined concept in
  mathematics (theory), its practical computation on a finite
  precision computer is much more challenging.

The ?rankMatrix  help page (package Matrix, part of your R)
   https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
starts with the following 'Description'

__ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition.

__ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).


    >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:
    >>>
    >>> Hi,
    >>>
    >>> I have noticed different rank values calculated by qr() depending on
    >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
    >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
    >>> which is only occasionally a true rank.
    >>>
    >>> Would not it be more consistent to replace the rank in the latter case by something
    >>> based on the following pseudo code ?
    >>>
    >>> d=abs(diag(qr))
    >>> rank=sum(d >= d[1]*tol)
    >>>
    >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
    >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
    >>>
    >>> Serguei.
    >>>
    >>> How to reproduce:
    >>>
    >>> a=diag(2)
    >>> a[2,2]=0
    >>> qaf=qr(a, LAPACK=FALSE)
    >>> qaf$rank # shows 1. OK it's the true rank value
    >>> qat=qr(a, LAPACK=TRUE)
    >>> qat$rank #shows 2. Bad, it's not the expected value.
    >>>

    > --
    > Serguei Sokol
    > Ingenieur de recherche INRA

    > Cellule mathématique
    > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
    > 135 Avenue de Rangueil
    > 31077 Toulouse Cedex 04

    > tel: +33 5 6155 9849
    > email: [hidden email]
    > http://www.lisbp.fr

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent rank in qr()

Serguei Sokol
Le 23/01/2018 à 08:47, Martin Maechler a écrit :

>>>>>> Serguei Sokol <[hidden email]>
>>>>>>      on Mon, 22 Jan 2018 17:57:47 +0100 writes:
>      > Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
>      >> This behavior is noted in the qr documentation, no?
>      >>
>      >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
>      > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
>      > but here the meaning of "always is full rank" is somewhat confusing. Does it mean
>      > that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
>      > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
>      > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
>      > columns in A) is so easy. Why to discard lapack case from it (even properly documented)?
>
> Because 99.5% of caller to qr()  never look at '$rank',
> so why should we compute it every time qr() is called?
1. Because R already does it for linpack so it would be consistent to do so for lapack too.
2. Because R pretends that it is a part of a returned qr class.
3. Because its calculation is a negligible fraction of QR itself.

>
> ==> Matrix :: rankMatrix() does use "qr" as one of its several methods.
>
> --------------
>
> As wiser people than me have said (I'm paraphrasing, don't find a nice citation):
>
>    While the rank of a matrix is a very well defined concept in
>    mathematics (theory), its practical computation on a finite
>    precision computer is much more challenging.
True. It is indeed depending of round-off errors during QR calculations and tolerance
setting but putting it just as min(nrow(A), ncol(A)) and still calling it rank of "full rank"
is by far the most misleading choice to my mind.

Once again, if we are already calculating it for linpack let do it in most consistent
way for lapack too. I can propose a patch if you will.

Serguei.

>
> The ?rankMatrix  help page (package Matrix, part of your R)
>     https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
> starts with the following 'Description'
>
> __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition.
>
> __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).
>
>
>      >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:
>      >>>
>      >>> Hi,
>      >>>
>      >>> I have noticed different rank values calculated by qr() depending on
>      >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
>      >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
>      >>> which is only occasionally a true rank.
>      >>>
>      >>> Would not it be more consistent to replace the rank in the latter case by something
>      >>> based on the following pseudo code ?
>      >>>
>      >>> d=abs(diag(qr))
>      >>> rank=sum(d >= d[1]*tol)
>      >>>
>      >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
>      >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
>      >>>
>      >>> Serguei.
>      >>>
>      >>> How to reproduce:
>      >>>
>      >>> a=diag(2)
>      >>> a[2,2]=0
>      >>> qaf=qr(a, LAPACK=FALSE)
>      >>> qaf$rank # shows 1. OK it's the true rank value
>      >>> qat=qr(a, LAPACK=TRUE)
>      >>> qat$rank #shows 2. Bad, it's not the expected value.
>      >>>
>
>      > --
>      > Serguei Sokol
>      > Ingenieur de recherche INRA
>
>      > Cellule mathématique
>      > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
>      > 135 Avenue de Rangueil
>      > 31077 Toulouse Cedex 04
>
>      > tel: +33 5 6155 9849
>      > email: [hidden email]
>      > http://www.lisbp.fr
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Inconsistent rank in qr()

Keith O'Hara
A simpler fix: return rank=NA when Lapack is used.

Keith

> On Jan 23, 2018, at 5:36 AM, Serguei Sokol <[hidden email]> wrote:
>
> Le 23/01/2018 à 08:47, Martin Maechler a écrit :
>>>>>>> Serguei Sokol <[hidden email]>
>>>>>>>     on Mon, 22 Jan 2018 17:57:47 +0100 writes:
>>     > Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
>>     >> This behavior is noted in the qr documentation, no?
>>     >>
>>     >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
>>     > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
>>     > but here the meaning of "always is full rank" is somewhat confusing. Does it mean
>>     > that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
>>     > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
>>     > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
>>     > columns in A) is so easy. Why to discard lapack case from it (even properly documented)?
>>
>> Because 99.5% of caller to qr()  never look at '$rank',
>> so why should we compute it every time qr() is called?
> 1. Because R already does it for linpack so it would be consistent to do so for lapack too.
> 2. Because R pretends that it is a part of a returned qr class.
> 3. Because its calculation is a negligible fraction of QR itself.
>
>>
>> ==> Matrix :: rankMatrix() does use "qr" as one of its several methods.
>>
>> --------------
>>
>> As wiser people than me have said (I'm paraphrasing, don't find a nice citation):
>>
>>   While the rank of a matrix is a very well defined concept in
>>   mathematics (theory), its practical computation on a finite
>>   precision computer is much more challenging.
> True. It is indeed depending of round-off errors during QR calculations and tolerance
> setting but putting it just as min(nrow(A), ncol(A)) and still calling it rank of "full rank"
> is by far the most misleading choice to my mind.
>
> Once again, if we are already calculating it for linpack let do it in most consistent
> way for lapack too. I can propose a patch if you will.
>
> Serguei.
>
>>
>> The ?rankMatrix  help page (package Matrix, part of your R)
>>    https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
>> starts with the following 'Description'
>>
>> __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition.
>>
>> __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).
>>
>>
>>     >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <[hidden email]> wrote:
>>     >>>
>>     >>> Hi,
>>     >>>
>>     >>> I have noticed different rank values calculated by qr() depending on
>>     >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
>>     >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
>>     >>> which is only occasionally a true rank.
>>     >>>
>>     >>> Would not it be more consistent to replace the rank in the latter case by something
>>     >>> based on the following pseudo code ?
>>     >>>
>>     >>> d=abs(diag(qr))
>>     >>> rank=sum(d >= d[1]*tol)
>>     >>>
>>     >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
>>     >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
>>     >>>
>>     >>> Serguei.
>>     >>>
>>     >>> How to reproduce:
>>     >>>
>>     >>> a=diag(2)
>>     >>> a[2,2]=0
>>     >>> qaf=qr(a, LAPACK=FALSE)
>>     >>> qaf$rank # shows 1. OK it's the true rank value
>>     >>> qat=qr(a, LAPACK=TRUE)
>>     >>> qat$rank #shows 2. Bad, it's not the expected value.
>>     >>>
>>
>>     > --
>>     > Serguei Sokol
>>     > Ingenieur de recherche INRA
>>
>>     > Cellule mathématique
>>     > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
>>     > 135 Avenue de Rangueil
>>     > 31077 Toulouse Cedex 04
>>
>>     > tel: +33 5 6155 9849
>>     > email: [hidden email]
>>     > http://www.lisbp.fr


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel