subsetting and NAs

Eric Archer
R-help,

I'm getting some unexpected behavior with subsetting a data frame
(aircraft flight data) that I can't sort out.
Here is a simplified version of my data frame and problem:

 > flight
      FlightID TailNo FlightDate HobbsTime FlightCost       Date year
1         4497  6009K       <NA>       2.2      330.0       <NA>   NA
2         4498  6009K       <NA>       0.8      120.0       <NA>   NA
3         4499  6009K       <NA>       0.9      135.0       <NA>   NA
4         4500  6009K       <NA>       1.1      165.0       <NA>   NA
5         4501  6009K       <NA>       1.5      225.0       <NA>   NA
2587      7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
2588      7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
2589      7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
2590      7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
2591      7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
29793    35208  91630  1/21/2006       1.4      107.8 2006-01-21 2006
29794    35209  91630  1/21/2006       0.7       53.9 2006-01-21 2006
29795    35210  9725B  1/21/2006       1.4      138.6 2006-01-21 2006
29796    35212  91630  1/28/2006       1.0       77.0 2006-01-28 2006
29797    35213  91630  1/28/2006       1.6      123.2 2006-01-28 2006
29798    35214  3386E   1/5/2006       1.1       86.9 2006-01-05 2006

I then try to extract the error years:

 > errors <- flight[flight$year > 2006,]
 > errors
     FlightID TailNo FlightDate HobbsTime FlightCost       Date year
NA         NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.1       NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.2       NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.3       NA   <NA>       <NA>        NA         NA       <NA>   NA
NA.4       NA   <NA>       <NA>        NA         NA       <NA>   NA
2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009

Would someone please explain to me why the new data frame has all
columns (and row names) replaced with NA where year was NA and how to
avoid this behavior?
Thanks in advance.

I am using R v2.2.1 on Windows XP.

Cheers,
eric

Sample Data:

flight <- structure(list(FlightID = c(4497, 4498, 4499, 4500, 4501, 7083,
7084, 7085, 7086, 7087, 35208, 35209, 35210, 35212, 35213, 35214
), TailNo = structure(c(28, 28, 28, 28, 28, 49, 49, 49, 49, 49,
47, 47, 54, 47, 47, 15), .Label = c("12345", "133BW", "152GB",
"172CM", "172RW", "1955L", "2219E", "222WC", "231NW", "2496M",
"2630V", "2726E", "2903A", "2977G", "3386E", "3803E", "3979V",
"409EV", "43160", "46275", "4644B", "47885", "4922D", "4975F",
"5073H", "5317P", "5335P", "6009K", "6013X", "6036J", "6360D",
"64048", "6495R", "66038", "67844", "6913R", "733XL", "734BT",
"738QA", "808LP", "8148F", "8164Z", "8269T", "8451R", "8654V",
"8715E", "91630", "9199Z", "9206N", "92SA", "936GW", "9488G",
"9596H", "9725B", "9756U", "ELITE", "N20BY", "N53MF"), class = "factor"),
    FlightDate = c(NA, NA, NA, NA, NA, "4/8/2009", "4/10/2009",
    "4/11/2009", "4/12/2009", "4/15/2009", "1/21/2006", "1/21/2006",
    "1/21/2006", "1/28/2006", "1/28/2006", "1/5/2006"), HobbsTime = c(2.2,
    0.8, 0.9, 1.1, 1.5, 1.5, 1.3, 1.9, 1.3, 1.1, 1.4, 0.7, 1.4,
    1, 1.6, 1.1), FlightCost = c(330, 120, 135, 165, 225, 103.5,
    89.7, 131.1, 89.7, 75.9, 107.8, 53.9, 138.6, 77, 123.2, 86.9
    ), Date = structure(c(NA, NA, NA, NA, NA, 1239174000, 1239346800,
    1239433200, 1239519600, 1239778800, 1137830400, 1137830400,
    1137830400, 1138435200, 1138435200, 1136448000), tzone = "",
    class = c("POSIXt", "POSIXct")), year = c(NA, NA, NA, NA, NA,
    2009, 2009, 2009, 2009, 2009, 2006, 2006, 2006, 2006, 2006,
    2006)), .Names = c("FlightID", "TailNo", "FlightDate", "HobbsTime",
"FlightCost", "Date", "year"), row.names = c("1", "2", "3", "4", "5",
"2587", "2588", "2589", "2590", "2591", "29793", "29794", "29795",
"29796", "29797", "29798"), class = "data.frame")


--

Eric Archer, Ph.D.
NOAA-SWFSC
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121,7003(FAX)
[hidden email]


"Lighthouses are more helpful than churches."
    - Benjamin Franklin

"Cogita tute" - Think for yourself

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: subsetting and NAs

P Ehlers


Eric Archer wrote:

  [snip]

flight$year > 2006 will return TRUE/FALSE, not row numbers. Try this:

errors <- subset(flight, subset = year > 2006)
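
As a rough illustration of why subset() helps here: it treats a missing
(NA) condition as FALSE, whereas "[" keeps one all-NA row for each NA in
the condition. The sketch below uses a small made-up data frame (toy, id
and year are hypothetical names, not the poster's data):

```r
# Toy data frame (hypothetical, not the poster's data): two rows have a missing year.
toy <- data.frame(id = 1:5, year = c(NA, 2009, 2006, NA, 2009))

# Logical indexing with "[" keeps one all-NA row per NA in the condition.
by_bracket <- toy[toy$year > 2006, ]

# subset() treats an NA condition as FALSE and drops those rows.
by_subset <- subset(toy, subset = year > 2006)

print(nrow(by_bracket))  # 4 rows: 2 real matches plus 2 all-NA rows
print(nrow(by_subset))   # 2 rows: ids 2 and 5
```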

Peter Ehlers

Re: subsetting and NAs

Eric Archer
Had I just looked at flight$year > 2006, I would've seen what was up.
Thanks much Peter!

Cheers,
eric

P Ehlers wrote:
>
>  [snip]
>
> flight$year > 2006 will return TRUE/FALSE, not row numbers. Try this:
>
> errors <- subset(flight, subset = year > 2006)
>
> Peter Ehlers
>


Re: subsetting and NAs

Patrick Drechsler
In reply to this post by Eric Archer

Eric Archer wrote on 20 Mar 2006 19:46:44 MET:

> I'm getting some unexpected behavior with subsetting a data
> frame (aircraft flight data) that I can't sort out.  Here is a
> simplified version of my data frame and problem:
>
>  > flight
>       FlightID TailNo FlightDate HobbsTime FlightCost       Date year
> 1         4497  6009K       <NA>       2.2      330.0       <NA>   NA
> 2         4498  6009K       <NA>       0.8      120.0       <NA>   NA
> 3         4499  6009K       <NA>       0.9      135.0       <NA>   NA
> 4         4500  6009K       <NA>       1.1      165.0       <NA>   NA
> 5         4501  6009K       <NA>       1.5      225.0       <NA>   NA
> 2587      7083  9206N   4/8/2009       1.5      103.5 2009-04-08 2009
> 2588      7084  9206N  4/10/2009       1.3       89.7 2009-04-10 2009
> 2589      7085  9206N  4/11/2009       1.9      131.1 2009-04-11 2009
> 2590      7086  9206N  4/12/2009       1.3       89.7 2009-04-12 2009
> 2591      7087  9206N  4/15/2009       1.1       75.9 2009-04-15 2009
> 29793    35208  91630  1/21/2006       1.4      107.8 2006-01-21 2006
> 29794    35209  91630  1/21/2006       0.7       53.9 2006-01-21 2006
> 29795    35210  9725B  1/21/2006       1.4      138.6 2006-01-21 2006
> 29796    35212  91630  1/28/2006       1.0       77.0 2006-01-28 2006
> 29797    35213  91630  1/28/2006       1.6      123.2 2006-01-28 2006
> 29798    35214  3386E   1/5/2006       1.1       86.9 2006-01-05 2006
>
> I then try to extract the error years :

flight <- flight[complete.cases(flight),]  # delete rows containing NAs

> errors <- flight[flight$year > 2006,]
> errors

     FlightID TailNo FlightDate HobbsTime FlightCost                Date year
2587     7083  9206N   4/8/2009       1.5      103.5 2009-04-08 08:00:00 2009
2588     7084  9206N  4/10/2009       1.3       89.7 2009-04-10 08:00:00 2009
2589     7085  9206N  4/11/2009       1.9      131.1 2009-04-11 08:00:00 2009
2590     7086  9206N  4/12/2009       1.3       89.7 2009-04-12 08:00:00 2009
2591     7087  9206N  4/15/2009       1.1       75.9 2009-04-15 08:00:00 2009
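
One caveat worth sketching: complete.cases() applied to the whole data
frame drops a row if any column is NA, not just year. If only the year
column matters, testing that column alone may be closer to what is wanted.
The example below uses a made-up data frame (toy, id, cost and year are
hypothetical names) in which one row has a valid year but a missing cost:

```r
# Hypothetical toy data: row 3 has a valid year but a missing cost.
toy <- data.frame(id   = 1:4,
                  cost = c(10, 20, NA, 40),
                  year = c(NA, 2009, 2009, 2006))

# complete.cases() on the whole frame drops any row with an NA anywhere.
all_cols <- toy[complete.cases(toy), ]

# Checking only the column used in the condition keeps row 3.
year_only <- toy[!is.na(toy$year) & toy$year > 2006, ]

print(all_cols[all_cols$year > 2006, "id"])  # 2
print(year_only$id)                          # 2 3
```

Which of the two is appropriate depends on whether rows with other
missing fields should count as errors.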


HTH

Patrick
--
Money is better than poverty - if only for financial reasons.
                                                      [Woody Allen]

Re: subsetting and NAs

Gabor Grothendieck
In reply to this post by P Ehlers
On 3/20/06, P Ehlers <[hidden email]> wrote:

>
>
> Eric Archer wrote:
>  [snip]
>
> flight$year > 2006 will return TRUE/FALSE, not row numbers. Try this:
>
> errors <- subset(flight, subset = year > 2006)
>

Another solution is:

flight[which(flight$year > 2006),]

Also note that the problem is not the TRUE and FALSE.  The problem is
that in addition to the TRUE and FALSE entries there are NA entries.

For example, if flight2 has no NA entries, the original code works fine:

flight2 <- na.omit(flight)
flight2[flight2$year > 2006,] # ok
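
The same effect can be seen on a plain vector. This is only a small
sketch with made-up values (year and cond are illustrative names), showing
that the comparison yields NA where year is NA, that an NA index produces
an NA element, and that which() keeps only the TRUE positions:

```r
year <- c(NA, 2009, 2006, NA, 2009)   # hypothetical values

cond <- year > 2006
print(cond)               # NA TRUE FALSE NA TRUE

# An NA in a logical index yields an NA element rather than skipping it.
print(year[cond])         # NA 2009 NA 2009

# which() returns only the positions that are TRUE, so NAs drop out.
print(which(cond))        # 2 5
print(year[which(cond)])  # 2009 2009
```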
