correcting a few data in a large data frame

Mr. Natural
The data frame lwf records the survival of bushes over an 8-year period. Years are called bouts. Dead bushes are recorded as 0 and live bushes as 1.
str(lwf)
'data.frame':   638 obs. of  9 variables:
 $ bushno: int  1 2 3 4 5 6 7 8 9 10 ...
 $ bout1 : int  0 1 0 1 1 1 0 1 0 1 ...
 $ bout2 : int  0 1 0 0 0 0 0 0 0 1 ...
 $ bout3 : int  0 1 0 0 0 0 0 0 0 1 ...
 $ bout4 : int  0 1 0 0 0 0 0 0 0 0 ...
 $ bout5 : int  0 1 0 0 0 0 0 0 0 0 ...
 $ bout6 : int  0 1 0 0 0 0 0 0 0 0 ...
 $ bout7 : int  0 1 0 0 0 0 0 0 0 0 ...
 $ bout8 : int  0 1 0 0 0 0 0 0 0 0 ...

head(lwf)
  bushno bout1 bout2 bout3 bout4 bout5 bout6 bout7 bout8
1      1     0     0     0     0     0     0     0     0
2      2     1     1     1     1     1     1     1     1
3      3     0     0     0     0     0     0     0     0
4      4     1     0     0     0     0     0     0     0
5      5     1     0     0     0     0     0     0     0
6      6     1     0     0     0     0     0     0     0

A number of the data are incorrect. For example, bush 145 is recorded as dead (0) in year three
when it should be alive (1). The bushes do not come back to life after they die.

> lwf[lwf$bushno==145,]
    bushno bout1 bout2 bout3 bout4 bout5 bout6 bout7 bout8
144    145     1     1     0     1     1     1     1     1


I know that I can do this with fix(lwf) or edit(lwf). However, I would like to learn some more R.
What code could I use to correct these data?

I have been screwing around with such as
lwfb[(lwf$bushno==145) & (lwf$bout3==0),0]<- lwf[(lwf$bushno==145) & (lwf$bout3==0),1]
to no avail.
Any help appreciated. Thanks, MN



Re: correcting a few data in a large data frame

David Winsemius

On May 31, 2010, at 5:29 PM, Mr. Natural wrote:

> I know that I can do this with fix(lwf) or edit(lwf). However, I would like
> to learn some more R.
> What code could I use to correct these data?

rle is a function that records the lengths and values of runs. A valid
row has at most two runs (1s, then 0s), so your problem is to find rows
where the rle-encoded data has more than two runs. Perhaps something like:

apply(lwf[ , -1], 1, function(x){ length( rle(x)$values ) >2 } )
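
To see what that call returns, here is a minimal sketch on a small stand-in for lwf. The name lwf_demo is made up for illustration; its rows copy bushes 1, 2 and 4 from head(lwf) plus the bush 145 record shown in the question.

```r
# Small stand-in for lwf (lwf_demo is a hypothetical name).
bouts <- rbind(
  c(0, 0, 0, 0, 0, 0, 0, 0),  # bush 1: dead throughout
  c(1, 1, 1, 1, 1, 1, 1, 1),  # bush 2: alive throughout
  c(1, 0, 0, 0, 0, 0, 0, 0),  # bush 4: dies after bout1
  c(1, 1, 0, 1, 1, 1, 1, 1)   # bush 145: the suspect record
)
colnames(bouts) <- paste0("bout", 1:8)
lwf_demo <- data.frame(bushno = c(1L, 2L, 4L, 145L), bouts)

# rle() on a single row: three runs (1 1 | 0 | 1 1 1 1 1)
rle(bouts[4, ])

# Flag every row with more than two runs, then list the bush numbers
suspect <- apply(lwf_demo[, -1], 1, function(x) length(rle(x)$values) > 2)
lwf_demo$bushno[suspect]
```

On the real data, replacing lwf_demo with lwf should list the bushes that need a closer look.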

>
> I have been screwing around with such as
> lwfb[(lwf$bushno==145) & (lwf$bout3==0),0]<- lwf[(lwf$bushno==145) &
> (lwf$bout3==0),1]
> to no avail.

If all you want to do is correct these by hand then:

lwf[lwf$bushno==145 , "bout3"] <- 1

Or if you want to work on a copy (safer):

lwfb <- lwf
lwfb[lwfb$bushno==145 , "bout3"] <- 1
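
In the attempt quoted above, the index after the comma is a column position, so 0 selects no column and 1 picks out bushno rather than supplying the value 1. If there are several cells to correct, one possible approach is to keep the corrections in a small table and apply them in a loop to a copy. This is only a sketch: lwf_demo, the fixes table, and bush 999 are hypothetical, and only the bush 145 correction comes from the question.

```r
# Small stand-in for lwf (hypothetical; bush 999 is invented for the example).
bouts <- rbind(
  c(1, 0, 0, 0, 0, 0, 0, 0),  # bush 4: consistent record
  c(1, 1, 0, 1, 1, 1, 1, 1),  # bush 145: bout3 should be 1
  c(1, 0, 1, 1, 0, 0, 0, 0)   # bush 999 (invented): bout2 should be 1
)
colnames(bouts) <- paste0("bout", 1:8)
lwf_demo <- data.frame(bushno = c(4L, 145L, 999L), bouts)

# Hypothetical list of hand-checked corrections: which bush, which bout, new value
fixes <- data.frame(
  bushno = c(145L, 999L),
  bout   = c("bout3", "bout2"),
  value  = c(1, 1),
  stringsAsFactors = FALSE
)

lwfb <- lwf_demo  # work on a copy so the original stays untouched
for (i in seq_len(nrow(fixes))) {
  row <- lwfb$bushno == fixes$bushno[i]
  stopifnot(sum(row) == 1)  # each bush number should match exactly one row
  lwfb[row, fixes$bout[i]] <- fixes$value[i]
}

lwfb[lwfb$bushno %in% fixes$bushno, ]
```

Keeping the corrections in a table leaves a record of what was changed, which fix() and edit() do not.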

> Any help appreciated.Thanks, MN
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/correcting-a-few-data-in-a-large-data-frame-tp2237834p2237834.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


Re: correcting a few data in a large data frame

djmuseR
In reply to this post by Mr. Natural
Hi:

A simple diagnostic is to check how many runs exist in a row. Ideally,
it should be one or two. If it's more than two, something is amiss.
Hence, define f() as a function to count the runs in a given row and
call the apply() function with it:

f <- function(x) length(rle(x)$lengths)
apply(lwf[, -1], 1, f)
1 2 3 4 5 6
1 1 1 2 2 2
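
One caveat with counting runs: a row that starts dead and then turns alive (0 0 1 1 ...) has only two runs and would pass. If a 0 followed by a 1 is always an error in these data (an assumption), checking for any upward step with diff() is a possible complement. The sketch below uses a hypothetical lwf_demo, and bush 998 is invented to show the case.

```r
# Small stand-in for lwf (hypothetical; bush 998 is invented for the example).
bouts <- rbind(
  c(1, 1, 1, 1, 1, 1, 1, 1),  # bush 2: alive throughout
  c(1, 0, 0, 0, 0, 0, 0, 0),  # bush 4: dies after bout1
  c(1, 1, 0, 1, 1, 1, 1, 1),  # bush 145: three runs
  c(0, 0, 1, 1, 1, 1, 1, 1)   # bush 998 (invented): two runs, dead then alive
)
colnames(bouts) <- paste0("bout", 1:8)
lwf_demo <- data.frame(bushno = c(2L, 4L, 145L, 998L), bouts)

# Number of runs per row, as in f() above
f <- function(x) length(rle(x)$lengths)
runs <- apply(lwf_demo[, -1], 1, f)

# TRUE where any bout goes from 0 to 1, i.e. the bush "comes back to life"
revived <- apply(lwf_demo[, -1], 1, function(x) any(diff(x) > 0))

data.frame(bushno = lwf_demo$bushno, runs = runs, revived = revived)
```

Here bush 998 has two runs, so the run count alone does not flag it, while the diff() check does.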

HTH,
Dennis



Re: correcting a few data in a large data frame

Mr. Natural
Dennis. Tres cool. I will try it. regards, MN

Re: correcting a few data in a large data frame. Thanks

Mr. Natural
In reply to this post by David Winsemius
David: Thanks. I cannot believe that I had not tried the simple,
lwf[lwf$bushno==145 , "bout3"] <- 1


I will mess around with the runs suggestion too.
Thanks, Don
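
For anyone who wants to go from flagging to fixing, here is a rough sketch of one possible automatic repair. It rests on an assumption the thread does not confirm: that a later live observation is trustworthy, so every bout before a bush's last recorded 1 can be set to 1. The names lwf_demo and fill_alive are made up for illustration.

```r
# Small stand-in for lwf (hypothetical name; rows follow the thread's examples).
bouts <- rbind(
  c(0, 0, 0, 0, 0, 0, 0, 0),  # bush 1: dead throughout
  c(1, 0, 0, 0, 0, 0, 0, 0),  # bush 4: dies after bout1
  c(1, 1, 0, 1, 1, 1, 1, 1)   # bush 145: bout3 wrongly recorded as 0
)
colnames(bouts) <- paste0("bout", 1:8)
lwf_demo <- data.frame(bushno = c(1L, 4L, 145L), bouts)

# Set every bout up to the last observed 1 to 1 (ASSUMES later 1s are correct).
fill_alive <- function(x) {
  last_alive <- max(which(x == 1), 0)  # 0 when the bush was never recorded alive
  x[seq_len(last_alive)] <- 1
  x
}

# apply() returns one column per input row, so transpose back to rows
fixed <- t(apply(as.matrix(lwf_demo[, -1]), 1, fill_alive))

lwfb <- lwf_demo  # keep the original intact
lwfb[, -1] <- as.data.frame(fixed)
lwfb
```

Comparing lwfb with the original (for example with which(lwfb != lwf_demo, arr.ind = TRUE)) shows exactly which cells were changed, so each one can still be checked by hand.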