NA rows appeared in data.frame

NA rows appeared in data.frame

Ernest Han
Dear All,

After replacing some values in a data.frame, NA rows have appeared
and cannot be removed. I have googled this issue and found that
several people have encountered it. Solutions on Stack Overflow seem to
provide workarounds but do not remove the rows from the data.frame.
Therefore, I am turning to experts in this community for help.

The code is as follows,

> t1 <- iris
> t1[t1$Petal.Width==1.8, "Petal.Width"] <- NA
> t1[t1$Petal.Width == 2.0, ]
      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
NA              NA          NA           NA          NA      <NA>
NA.1            NA          NA           NA          NA      <NA>
NA.2            NA          NA           NA          NA      <NA>
NA.3            NA          NA           NA          NA      <NA>
111            6.5         3.2          5.1           2 virginica
114            5.7         2.5          5.0           2 virginica
NA.4            NA          NA           NA          NA      <NA>
122            5.6         2.8          4.9           2 virginica
123            7.7         2.8          6.7           2 virginica
NA.5            NA          NA           NA          NA      <NA>
NA.6            NA          NA           NA          NA      <NA>
NA.7            NA          NA           NA          NA      <NA>
NA.8            NA          NA           NA          NA      <NA>
132            7.9         3.8          6.4           2 virginica
NA.9            NA          NA           NA          NA      <NA>
NA.10           NA          NA           NA          NA      <NA>
148            6.5         3.0          5.2           2 virginica
NA.11           NA          NA           NA          NA      <NA>

## Twelve values were replaced, twelve NA rows appeared.

### MISC INFO ###
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.5.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"


Thank you,
Ernest

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: NA rows appeared in data.frame

Rui Barradas
Hello,

You have to test for NA. Twelve of the values of t1$Petal.Width are
NA, so t1$Petal.Width == 2.0 on its own returns 12 NA values.

t1[t1$Petal.Width == 2.0 & !is.na(t1$Petal.Width == 2.0), ]

Or use which(t1$Petal.Width == 2.0).

t1[which(t1$Petal.Width == 2.0), ]
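
A minimal sketch of why the two forms above work, assuming the built-in iris data and the NA assignment from the original post. The names cond, with_na and without_na are only illustrative. An NA in a logical row index produces an all-NA row, while which() returns only the positions that are TRUE.

```r
# Rebuild the example from the original post.
t1 <- iris
t1[t1$Petal.Width == 1.8, "Petal.Width"] <- NA

cond <- t1$Petal.Width == 2.0   # logical vector holding TRUE, FALSE and NA
table(cond, useNA = "ifany")    # counts of FALSE, TRUE and NA

# Each NA in the logical index yields one all-NA row in the result;
# which() keeps only the positions where cond is TRUE.
with_na    <- t1[cond, ]
without_na <- t1[which(cond), ]

n_na_rows <- sum(is.na(cond))
nrow(with_na) - nrow(without_na) == n_na_rows

# The explicit NA test selects the same rows as which().
all.equal(without_na, t1[cond & !is.na(cond), ])
```

Note that t1 itself is not modified by any of these selections; the NA rows exist only in the result of the first subsetting call.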


Hope this helps,

Rui Barradas


Re: NA rows appeared in data.frame

PIKAL Petr
Hi

If you want to remove rows with NA values from your data you could use

?complete.cases

or

t2 <- t1[!is.na(t1$Petal.Width),]

Cheers
Petr


Re: NA rows appeared in data.frame

S Ellison-2
In reply to this post by Ernest Han
> After replacing some values in a data.frame, NAs rows have appeared
> and cannot be removed.
I'm not clear why you say 'cannot be removed', which sounds quite a bit stronger than 'I couldn't ...'.
The example you gave returned new NA rows because your logical test included NAs (Petal.Width == 2.0 returns NA for all of the NA petal widths, and an NA in indexing returns an NA row).
But 'cannot be removed' sounded to me as if you've read somewhere that it's impossible, or that you've tried something that should work and didn't; if you meant either of those you'll have to say what the problem was.

In the meantime:
If you want to remove rows containing _any_ NAs, see ?complete.cases and use something like
t1[complete.cases(t1),]

If you want to remove rows that are _all_ NA, you may need something like

subset(t1, apply(t1, 1, function(x) !all(is.na(x))))
(or the equivalent '[' usage)
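
A short sketch contrasting the two filters, assuming the t1 built in the original post. The names any_na and all_na are illustrative, and rowSums(is.na(...)) is used here as an alternative to the apply() call above, not as the method suggested in the thread.

```r
t1 <- iris
t1[t1$Petal.Width == 1.8, "Petal.Width"] <- NA

any_na <- !complete.cases(t1)             # TRUE where a row has at least one NA
all_na <- rowSums(is.na(t1)) == ncol(t1)  # TRUE where every column is NA

sum(any_na)   # rows that the complete.cases() filter would drop
sum(all_na)   # rows that are NA in every column

t1_complete  <- t1[!any_na, ]
t1_no_all_na <- t1[!all_na, ]
dim(t1_complete)
dim(t1_no_all_na)
```

If sum(all_na) is zero, the data frame contains no all-NA rows to remove, and any such rows seen on screen come from the indexing expression rather than from the data.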

and, as an aside, using '==' for floating point numbers is not generally safe; for example
> sqrt(2)^2 == 2.0
[1] FALSE

See R FAQ 7.31 for details of why '==' is bad for floating point, if you haven't already.
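
A hedged sketch of tolerance-based comparison as an alternative to '=='. The tolerance 1e-8 and the names x, tol and near_two are arbitrary choices for illustration, not values taken from the thread.

```r
x <- sqrt(2)^2

x == 2.0                    # exact comparison; FALSE, as in the example above
isTRUE(all.equal(x, 2.0))   # comparison within all.equal()'s default tolerance

# For a whole column, an explicit tolerance keeps the test vectorised.
tol <- 1e-8
near_two <- abs(iris$Petal.Width - 2.0) < tol
sum(near_two)
```

all.equal() returns either TRUE or a character description of the difference, which is why it is wrapped in isTRUE() when a plain logical is needed.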


S Ellison


Re: NA rows appeared in data.frame

Ernest Han
In reply to this post by PIKAL Petr
Dear Rui and Petr,

Thank you for taking time and effort to help.

Rui's solution is an effective workaround so that I can continue to
work with the data. However, the appearance of these NA rows (with NA
rownames) is clearly erroneous (possibly buggy behaviour in R's
base code). What I am interested in is a solution that removes these NA
rows.

The reasons are that (1) prior to the NA assignment, one does not
need to test for NA values, and (2) sometimes these NA values are
needed as part of the data to indicate missing data.

> t1[t1$Petal.Width==1.8, "Petal.Width"] <- NA

Petr's solution is also not apt in my case, because it removes 12 rows
that have NA values in "Petal.Width". I would like a solution that
keeps the 150 rows, but not the mysterious 12 rows with all NA values
in all columns.

Once again, I appreciate your suggestions and I am hoping that this
'erroneous' behaviour has a fix.

Cheers,
Ernest


Re: NA rows appeared in data.frame

PIKAL Petr
Hi

You assigned NA to one variable in some rows of a 150-row data frame, so there are no "mysterious" NA rows in your data. If you want to select anything based on a column with NA values, you have to perform the selection using which() (as Rui suggested).
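
A small check of this point, assuming the same t1 as in the original post. It also shows subset(), which is a different route from the which() approach recommended here: subset() treats an NA condition as FALSE. The name sel is illustrative.

```r
t1 <- iris
t1[t1$Petal.Width == 1.8, "Petal.Width"] <- NA

dim(t1)                      # still 150 rows: nothing was added to t1 itself
sum(is.na(t1$Petal.Width))   # the assigned NAs sit in a single column

# subset() takes missing values in the condition as FALSE,
# so the result contains no all-NA rows.
sel <- subset(t1, Petal.Width == 2.0)
nrow(sel)
```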

This is documented on the help page (see ?Extract), although the wording is probably not easy to follow (maybe an example added to the help page would be useful).
-----
NAs in indexing

When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list. (It returns 00 for a raw result.)
-----
I believe there is a reason for this behaviour: you are comparing 2 to NA, and NA basically means "I do not know". The value could be 2, and therefore the rows with NA are also returned. If I am wrong, I hope the R gurus will correct me.

You said you want to remove rows with NA values, therefore I suggested the complete.cases function. After this you end up with an object stripped of the rows with NA values, and so with fewer rows.

I would be rather cautious with the word "erroneous". I remember the old days when Excel treated empty cells as zeros and gave "erroneous" calculations, but I believe that was pretty sensible from an accountant's point of view, as an empty cell means 0.

In almost all cases, analysis in R gives you correct results; you just need to tell R how to apply a function to an object with NA values.
> mean(t1$Petal.Width)
[1] NA
> mean(t1$Petal.Width, na.rm=TRUE)
[1] 1.147101

Cheers
Petr


> Petr's solution is also not apt in my case, because it removes 12 rows that have
> NA values in "Petal.Width". I would like a solution that keeps the 150 rows, but
> not the mysterious 12 rows with all NA values in all columns.

Now I am puzzled: what do you really want?

With your example and my suggestion you get

t1 <- iris
t1[t1$Petal.Width==1.8, "Petal.Width"] <- NA
t2 <- t1[!is.na(t1$Petal.Width),]
t2[t2$Petal.Width == 2.0, ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
111          6.5         3.2          5.1           2 virginica
114          5.7         2.5          5.0           2 virginica
122          5.6         2.8          4.9           2 virginica
123          7.7         2.8          6.7           2 virginica
132          7.9         3.8          6.4           2 virginica
148          6.5         3.0          5.2           2 virginica
>

> dim(t2)
[1] 138   5
> dim(t1)
[1] 150   5
>


Re: NA rows appeared in data.frame

Peter Dalgaard-2
There is some logic to getting an unknown something back when you don't know whether you want it or not. It is certainly more informative than getting nothing at all, which is what you would get if you had indexed with FALSE.

However, a more straightforward argument is that when you use integer indexing as a lookup table, as in

color <- c("red","blue")[gender]

then clearly you want NA if gender is NA. The rest then follows from coercion rules: NA is by default of mode "logical", and much confusion could arise if x[NA] != x[NA_integer_], for instance if x[c(1,NA)] != c(x[1], x[NA]).
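
A runnable sketch of the lookup-table argument. The gender codes and the vector x are made-up values for illustration only.

```r
# Hypothetical integer codes, one of them missing.
gender <- c(1L, 2L, NA, 1L)
color  <- c("red", "blue")[gender]
color                        # the missing code gives a missing colour

# Indexing with a vector that contains NA agrees element by element
# with indexing one position at a time.
x <- c(10, 20, 30)
identical(x[c(1, NA)], c(x[1], x[NA_integer_]))
```

The same rule applied to the rows of a data frame is what produces an all-NA row for each NA in the index.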

-pd

> On 16 Jan 2019, at 08:29 , PIKAL Petr <[hidden email]> wrote:
>
> I believe that this behaviour has some reason, because you compare 2 to NA and NA is basically "I do not know". So it could be 2 and therefore also rows with NA are returned. If I am wrong, I hope R gurus will correct me.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]
