Cleaning data


bayan sardini
Hi

I want to clean my data frame based on the age column: I want to delete the rows where the difference between consecutive elements, age[i+1] - age[i], is an integer. I used

a <- diff(df$age)
for(i in a){if(is.integer(a) == true){df <- df[-a,]
}}

but it doesn't work. Any ideas?

Thanks in advance
Bayan
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Cleaning data

Eric Berger
Hi Bayan,
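In your code, 'a' is a vector and is.integer(a) is a logical of length 1.
It tests how the whole vector is stored, not whether each value is a whole
number, so it is FALSE whenever 'a' is stored as double, which is the case
if even one age is not an integer. (R holds all the elements of a vector
in the same type.)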
You need to decide whether something "close enough" to an integer is to be
considered an integer - e.g. a distance of 0.000001 = 1e-6.
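
To make the two points above concrete, here is a minimal sketch on made-up ages. The vector `age` and the helper `is_whole` are hypothetical names for illustration, not from the original post. It shows that is.integer() reports storage type only, while a tolerance test checks each value.

```r
# Hypothetical ages, for illustration only
age <- c(21.5, 22.5, 23.1, 24.1)
a <- diff(age)

# is.integer() looks at the storage type of the whole vector,
# so it returns one value and says nothing about individual elements
is.integer(a)        # FALSE: 'a' is stored as double
is.integer(c(1, 2))  # FALSE: whole values, but still stored as double
is.integer(1:2)      # TRUE: integer storage

# Hypothetical helper: TRUE where a value is within 'tol' of a whole number
is_whole <- function(x, tol = 1e-6) abs(x - round(x)) < tol
whole <- is_whole(a)
print(whole)
```

The tolerance of 1e-6 is the same arbitrary choice as in the text; pick whatever "close enough" means for your data.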

 a <- diff(df$age)
df <- df[ c( TRUE, abs( a - round(a,0) ) > 1e-6 ), ]

I added the 'TRUE' at the beginning to always keep the first row of df. If
you prefer to always keep the last row then move the TRUE to the end.
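
As a rough illustration of where the TRUE goes, here is a sketch on a made-up data frame (the `id` column and the age values are assumptions for the example). It builds the logical filter once and applies it both ways.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(id = 1:5, age = c(21.5, 22.5, 23.1, 25.1, 25.4))

a <- diff(df$age)                      # differences between adjacent ages
not_whole <- abs(a - round(a)) > 1e-6  # TRUE where the difference is not (nearly) an integer

# TRUE first: row 1 is always kept, the later row of each flagged pair is dropped
keep_first <- df[c(TRUE, not_whole), ]

# TRUE last: the last row is always kept, the earlier row of each flagged pair is dropped
keep_last <- df[c(not_whole, TRUE), ]

print(keep_first$id)
print(keep_last$id)
```

Because diff() returns one element fewer than the number of rows, the extra TRUE is what brings the logical index back to the right length.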

HTH,

Eric





Re: Cleaning data

Jim Lemon-4
In reply to this post by bayan sardini
Hi Bayan,
Your question seems to imply that the "age" column contains floating
point numbers, e.g.

df
height  weight  age
170     72      21.5
...

If this is so, you will only find an integer in diff(age) if two
adjacent numbers happen to have the same decimal fraction _and_ the
subtraction does not produce a very small decimal remainder due to one
or both of the numbers being unable to be represented exactly in
binary notation as Eric pointed out. This seems an unusual criterion
for discarding values. Perhaps if you explain why an integer result is
undesirable it would help. It can be done:

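d<-diff(df$age)
# positions where the difference is within 1e-6 of a whole number
badrows<-which(abs(d-round(d))<1e-6)
if(length(badrows)) df<-df[-badrows,]

OR

if(length(badrows)) df<-df[-(badrows+1),]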

if you want to delete the second rather than the first age.
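
A minimal sketch of index-based deletion on made-up data, using a tolerance test to find the near-integer differences (the data frame and the 1e-6 tolerance are assumptions for the example). It also shows a pitfall worth guarding against: negative indexing with an empty index vector drops every row rather than none.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(id = 1:4, age = c(30.2, 31.2, 31.9, 33.0))

d <- diff(df$age)
badrows <- which(abs(d - round(d)) < 1e-6)  # position of the first row of each flagged pair

drop_first  <- df[-badrows, ]        # delete the first age of each pair
drop_second <- df[-(badrows + 1), ]  # delete the second age of each pair

# Pitfall: -integer(0) is still integer(0), which selects zero rows
none <- integer(0)
empty_case <- df[-none, ]

# One way around it: index by the rows to keep instead of the rows to drop
safe <- df[setdiff(seq_len(nrow(df)), none), ]

print(drop_first$id)
print(drop_second$id)
print(nrow(empty_case))
```

Indexing by the rows to keep behaves the same whether or not any rows were flagged, so it avoids the need for a separate length check.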

Jim
