na.omit not omitting rows

na.omit not omitting rows

Ted Stankowich
Hello! I'm trying to create a subset of a dataset and then remove all rows with NAs in them. Ultimately, I am running phylogenetic analyses with trees that require the tree tiplabels to match exactly with the rows in the dataframe. But when I use na.omit to delete the rows with NAs, there is still a trace of those omitted rows in the data.frame, which then causes an error in the phylogenetic analyses. Is there any way to completely scrub those omitted rows from the dataframe? The code is below. As you can see from the result of the final str(Protect1) line, there are attributes with the omitted features still in the dataframe (356 species names in the UphamComplBinomial factor, but only 319 observations). These traces are causing errors with the phylo analyses.

> Protect1=as.data.frame(cbind(UphamComplBinomial, DarkEum, NoctCrep, Shade))  #Create the dataframe with variables of interest from an attached dataset
> row.names(Protect1)=Protect1$UphamComplBinomial #assign species names as rownames
> Protect1=as.data.frame(na.omit(Protect1)) #drop rows with missing data
> str(Protect1)
'data.frame': 319 obs. of  4 variables:
 $ UphamComplBinomial: Factor w/ 356 levels "Allenopithecus_nigroviridis_CERCOPITHECIDAE_PRIMATES",..: 1 2 3 4 5 8 9 10 11 12 ...
 $ DarkEum           : Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
 $ NoctCrep          : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
 $ Shade             : Factor w/ 59 levels "0.1","0.2","0.25",..: 10 58 53 17 49 52 52 39 39 41 ...
 - attr(*, "na.action")= 'omit' Named int  6 7 23 36 37 40 42 50 51 60 ...
  ..- attr(*, "names")= chr  "Alouatta_macconnelli_ATELIDAE_PRIMATES" "Alouatta_nigerrima_ATELIDAE_PRIMATES" "Ateles_fusciceps_ATELIDAE_PRIMATES" "Callicebus_baptista_PITHECIIDAE_PRIMATES" ...

Dr. Ted Stankowich
Associate Professor
Department of Biological Sciences
California State University Long Beach
Long Beach, CA 90840
[hidden email]
562-985-4826
http://www.csulb.edu/mammal-lab/
@CSULBMammalLab

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: na.omit not omitting rows

Bill Dunlap
Does droplevels() help?

> d <- data.frame(size = factor(c("S","M","M","L","L"), levels=c("S","M","L")),
+                 id=c(101,NA,NA,104,105))
> str(d)
'data.frame':   5 obs. of  2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 2 2 3 3
 $ id  : num  101 NA NA 104 105
> str(na.omit(d))
'data.frame':   3 obs. of  2 variables:
 $ size: Factor w/ 3 levels "S","M","L": 1 3 3
 $ id  : num  101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
  ..- attr(*, "names")= chr [1:2] "2" "3"
> str(droplevels(na.omit(d)))
'data.frame':   3 obs. of  2 variables:
 $ size: Factor w/ 2 levels "S","L": 1 2 2
 $ id  : num  101 104 105
 - attr(*, "na.action")= 'omit' Named int [1:2] 2 3
  ..- attr(*, "names")= chr [1:2] "2" "3"
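
The transcript above touches two separate "traces" of the omitted rows: unused factor levels and the "na.action" attribute. As a minimal sketch reusing the same toy data frame `d`, the checks below illustrate that droplevels() addresses only the first. The name `kept` and the nlevels() comparisons are added for illustration and are not part of the original reply.

```r
# Toy data from the reply above
d <- data.frame(size = factor(c("S","M","M","L","L"), levels = c("S","M","L")),
                id = c(101, NA, NA, 104, 105))

kept <- na.omit(d)
nlevels(kept$size)              # 3: the unused level "M" is still recorded
nlevels(droplevels(kept)$size)  # 2: droplevels() discards unused levels

# droplevels() does not touch the "na.action" attribute
is.null(attr(droplevels(kept), "na.action"))  # FALSE
```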

Bill Dunlap
TIBCO Software
wdunlap tibco.com



Re: na.omit not omitting rows

Ted Stankowich
Thanks, but no, that doesn't work. The "na.action" attribute left by na.omit is still in the data frame, as you can see in the str() output in your post. The problem line is likely:  - attr(*, "na.action")= 'omit' Named int [1:2] 2 3


Re: na.omit not omitting rows

Rui Barradas
Hello,

If the problem is the "na.action" attribute, here are two ways of
solving it.

First, an example data set.

set.seed(2020)    # Make the example reproducible
UphamComplBinomial <- sprintf("f%03d", 1:356)
is.na(UphamComplBinomial) <- sample(356, 37)    # set 37 random entries to NA
DarkEum <- factor(sample(1:2, 356, TRUE))
Protect1 <- data.frame(UphamComplBinomial = factor(UphamComplBinomial),
                       DarkEum)


1. Setting the attribute "na.action" to NULL removes it

Protect2 <- na.omit(Protect1)
attributes(Protect2)
attr(Protect2, "na.action") <- NULL
attributes(Protect2)
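
If both the attribute and any unused factor levels need to go, the steps discussed in this thread could be wrapped together. The following is only a sketch: `scrub_na_rows` is a hypothetical helper name, not a base R function, and the small data frame is made up for the demonstration.

```r
# Hypothetical helper (not from the thread): remove incomplete rows,
# then strip the bookkeeping that na.omit() leaves behind.
scrub_na_rows <- function(df) {
  out <- na.omit(df)                 # drop rows containing any NA
  attr(out, "na.action") <- NULL     # remove the record of omitted rows
  droplevels(out)                    # drop factor levels no longer in use
}

d <- data.frame(size = factor(c("S", "M", "M", "L", "L")),
                id = c(101, NA, NA, 104, 105))
clean <- scrub_na_rows(d)
str(clean)
```

Whether the unused levels, the attribute, or both upset a given downstream function depends on that function, so it may be worth checking each separately.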


2. Use an index vector to subset the data

na <- is.na(Protect1$UphamComplBinomial)
Protect3 <- Protect1[!na, ]


The results are identical. But if you have more than one column with
NA's, this second way will be more complicated.

identical(Protect2, Protect3)
#[1] TRUE
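
For the case with NAs in more than one column, one option that keeps the index-vector approach simple is complete.cases() from the stats package, which flags rows with no missing values in any column. A minimal sketch on made-up data (the data frame `dat` and its columns are assumptions for illustration):

```r
# Small made-up data frame with NAs in two different columns
dat <- data.frame(species = c("a", "b", "c", "d"),
                  x = c(1, NA, 3, 4),
                  y = c("u", "v", NA, "w"))

keep <- complete.cases(dat)   # TRUE for rows with no NA in any column
dat2 <- dat[keep, ]           # plain subsetting adds no "na.action" attribute

nrow(dat2)                    # 2
```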


Hope this helps,

Rui Barradas


Re: na.omit not omitting rows

Ted Stankowich
This worked! Thank you!


Re: na.omit not omitting rows

David Winsemius
Perhaps indexing with rowSums(is.na(dfrm))?
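
One way this suggestion might be spelled out, as a sketch on made-up data (`dfrm` is the placeholder name from the message; the columns and the names `n_missing` and `dfrm_complete` are assumptions for illustration):

```r
# Made-up data frame; `dfrm` is the placeholder name used in the suggestion
dfrm <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

n_missing <- rowSums(is.na(dfrm))        # number of NAs in each row
dfrm_complete <- dfrm[n_missing == 0, ]  # keep rows with no NAs at all
```

Because this is ordinary subsetting, the result carries no "na.action" attribute, though factor columns would still keep any unused levels.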


David


Re: na.omit not omitting rows

Ted Stankowich
Thanks - a previous response resolved the issue and I'm off and running with the analyses.
