matrix subset problem with factors

matrix subset problem with factors

ঋষি  ( ऋषि / rIsHi )
Hi All,

I would like to report what looks like a bug in subsetting a matrix by row
names when the names are passed as a factor. Factors may not be safe to use
as an index, but in that case R should generate a warning message. We often
use values returned by packages, which may be factors, to subset a matrix,
and this can silently produce a wrong calculation.

If a factor is not expected as an index in a matrix operation, I wish R
would throw an error or warning message.

Below is the code to reproduce it.

> x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
>
> rNames <- as.factor(c("X","Z"))
> # Some functions from other packages return factors, which can easily be overlooked
> rNames
[1] X Z
Levels: X Z
>
> x[rNames,]
  A B C
X 1 4 7
Y 2 5 8
>
> ## The intended result is rows X and Z, not rows X and Y
>
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.1
>





--



With regards
Rishi Das Roy


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: matrix subset problem with factors

Marc Schwartz via R-help
Hi,

I get the same behavior in R 3.5.2 on macOS.

Others may feel differently, but I am not sure that this is a bug. It may instead reflect a need to clarify in ?Extract that the following passage, found under "Atomic vectors":

"The index object i can be numeric, logical, character or empty. Indexing by factors is allowed and is equivalent to indexing by the numeric codes (see factor) and not by the character values which are printed (for which use [as.character(i)])."

also applies to the indexing of matrices and arrays.

Since matrices and arrays in R are vectors with 'dim' attributes, the behavior is essentially consistent with what is described above.

Thus, perhaps the second sentence quoted above, or similar wording, could simply be added to the section on Matrices and arrays.
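
The equivalence described in ?Extract can be checked directly. The following is a minimal sketch that reuses the matrix from the original post; the names codes, by_factor, by_codes and by_name are illustrative only and do not come from the thread.

```r
x <- matrix(1:9, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))
rNames <- factor(c("X", "Z"))

# A factor stores integer codes into its levels: here X -> 1, Z -> 2
codes <- as.integer(rNames)

# Indexing by the factor uses those codes, so rows 1 and 2 are returned
by_factor <- x[rNames, ]
by_codes  <- x[codes, ]
identical(by_factor, by_codes)
rownames(by_factor)   # "X" "Y"

# Converting to character first selects rows by their names
by_name <- x[as.character(rNames), ]
rownames(by_name)     # "X" "Z"
```

Wrapping the index in as.character(), as the help page suggests, is the smallest change that gives the result the original poster intended.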

Regards,

Marc Schwartz

In reply to the post by ঋষি ( ऋषि / rIsHi ) of Feb 20, 2019, 4:23 AM.

Re: matrix subset problem with factors

Jeff Newmiller
With no official weight, I second the opinion that the existing behavior is appropriate and not a bug.

Functions should not "unexpectedly" return factors. A common example is the read.table family of functions, which by default return factors, but that behaviour is deterministic and controllable with the as.is or stringsAsFactors arguments. If you have functions that randomly return different types, then the bug is in those functions.
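
As a hedged illustration of the point about as.is and stringsAsFactors, the sketch below reads a small inline table three ways. The data and the variable names (csv, d_factor, d_char, d_asis) are made up for the example. Note that the default described in this post applies to R versions before 4.0.0; from 4.0.0 on, stringsAsFactors defaults to FALSE, so the arguments are set explicitly here.

```r
# Small inline data set, invented for illustration
csv <- "id,value\nX,1\nZ,3"

# Explicitly request factors for string columns
d_factor <- read.csv(text = csv, stringsAsFactors = TRUE)
class(d_factor$id)   # "factor"

# Explicitly keep string columns as character
d_char <- read.csv(text = csv, stringsAsFactors = FALSE)
class(d_char$id)     # "character"

# as.is = TRUE also leaves character columns unconverted
d_asis <- read.csv(text = csv, as.is = TRUE)
class(d_asis$id)     # "character"
```

Setting the argument explicitly makes the column type independent of the R version in use.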

Don't confuse factors and character data types... they are distinct and used for different purposes.
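
To make the distinction concrete, and for code that cannot control where its index comes from, one option is a small guard around the subsetting step. The sketch below is an assumption-laden example, not something proposed in this thread: index_rows is a hypothetical helper, not a base R function, and converting with a warning is only one possible policy (stopping with an error would be another).

```r
# Hypothetical helper (not part of base R): index rows by label, and warn
# when a factor is supplied instead of silently using its integer codes.
index_rows <- function(m, i) {
  if (is.factor(i)) {
    warning("factor index converted with as.character()")
    i <- as.character(i)
  }
  m[i, , drop = FALSE]
}

x <- matrix(1:9, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))
f <- factor(c("X", "Z"))

typeof(f)        # "integer": a factor stores codes, not strings
is.character(f)  # FALSE

# The helper warns, then selects rows by label rather than by code
res <- suppressWarnings(index_rows(x, f))
rownames(res)    # "X" "Z"
```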

In reply to the post by Marc Schwartz via R-help of February 20, 2019, 12:59:54 PM PST.