How to erase (replace) certain elements in the data.frame?

sneaffer
Hello R-world,
Please help me get around a little problem.
I have a data.frame in which I would like some values to be NA for a later imputation step.

I've come up with the following piece of code:

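random.del <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)    # number of cells to blank in each row
  x.del <- x
  # leave the first n.keeprows rows intact; blank k random cells in each later row
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)        # k distinct column indices for row i
    x.del[i,j] <- NA
  }
  return (x.del)
}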
 
The problem is that random.del() turns out to be slow on large samples.
Is there a more efficient or elegant way to do the same thing?

Thanks,
Sergey

Re: How to erase (replace) certain elements in the data.frame?

Thomas Levine
This should do the same thing

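random.del <- function (x, n.keeprows, del.percent){
  # blank del.percent of the cells in one column (a vector)
  del<-function(col){
    col[sample.int(length(col),length(col)*del.percent/100)]<-NA
    col
  }
  # note: starts at row n.keeprows, not n.keeprows+1 as in the original
  change<-n.keeprows:nrow(x)
  # lapply() on a data frame works column by column
  x[change,]<-lapply(x[change,],del)
  x
}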

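This is faster because it works on whole columns: there is one call to del() per column and a single data-frame assignment, instead of one assignment per row.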

[1] "Mine"
   user  system elapsed
  0.004   0.000   0.002
[1] "Yours"
   user  system elapsed
  1.172   0.020   1.193

Tom

On Sat, Apr 23, 2011 at 8:37 PM, sneaffer <[hidden email]> wrote:

> Is there any other more effective/charming way to do the same?

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to erase (replace) certain elements in the data.frame?

Joshua Wiley-2
In reply to this post by sneaffer
Hi Sergey,

This is not an answer to your exact question, but can you use a
matrix? If you can use a matrix instead of a data frame, you should
get a considerable performance boost. Even for very large matrices
(at least on my system), it is fast enough that I find it hard to believe
it is a bottleneck in the overall imputation process. For example,
for a 1000 by 100 object
as a data frame:
> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   1.09    0.02    1.12
and as a matrix:
> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   0.02    0.00    0.01
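A minimal sketch of doing the deletion on a matrix and converting the result back, assuming every column of the data frame is numeric (as.matrix() would otherwise coerce all columns to a common type such as character). The helper name del.via.matrix and its del.fun argument are hypothetical; del.fun can be Sergey's random.del() or any function with the same arguments.

```r
# Hypothetical helper: run an NA-inserting function (for example Sergey's
# random.del) on a matrix copy of an all-numeric data frame, then convert back.
del.via.matrix <- function(x, n.keeprows, del.percent, del.fun) {
  # as.matrix() coerces mixed column types to a common type (e.g. character),
  # so refuse anything that is not all-numeric
  stopifnot(all(vapply(x, is.numeric, logical(1))))
  m <- del.fun(as.matrix(x), n.keeprows, del.percent)
  as.data.frame(m)   # column names carry over from the matrix
}
```

Usage would look like del.via.matrix(dat, 100, 50, random.del), where dat stands for your own all-numeric data frame. If the data frame mixes integer and double columns, the integer columns come back as double after the round trip.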

Beyond that, for very large objects, this revision gives a modest
speed-up (around 5 seconds for a 1 million by 100 object on my
system). The gain is small for matrices and is completely dwarfed by
other bottlenecks for data frames, and it comes at the cost of
readability and flexibility:

rdel <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- as.integer(n.items * del.percent / 100)
  cols <- 1:n.items
  lcols <- length(cols)
  for (i in (n.keeprows+1):nrow(x)){
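    # .Internal() skips sample()'s argument checks; it is not part of the
    # public API, and sample.int(lcols, k) is the supported equivalent
    j <- cols[.Internal(sample(lcols, k, FALSE, NULL))]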
    x[i,j] <- NA
  }
  return(x)
}

If you must use a data frame, you can gain some performance increase
(for a 10000 by 100 data frame, it takes about 30 seconds on my system
versus 40 for your original function) by using:

random.del2 <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)
    x <- `[<-.data.frame`(x, i, j, NA)  # the result must be assigned back to x
  }
  return(x)
}

which basically just saves R the trouble of figuring out which
assignment method to use.  Of course the problem is that your function
becomes extremely specialized.  If you pass anything to it but a data
frame, good things will not happen.
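Since most of the time goes into the repeated per-row assignment, one option is to keep Sergey's row-by-row sampling but modify the object only once, through a logical mask. A sketch under that assumption: random.del.idx is a hypothetical name, k is rounded down with floor(), and the final x[mask] <- NA relies on base R accepting a logical-matrix index for both matrices and data frames.

```r
# Hypothetical helper: same row-stratified deletion as random.del(), but the
# data are modified by a single logical-matrix assignment instead of one
# `[<-` call per row.
random.del.idx <- function(x, n.keeprows, del.percent) {
  n.items <- ncol(x)
  k <- floor(n.items * del.percent / 100)   # cells to blank in each row
  rows <- seq_len(nrow(x))
  rows <- rows[rows > n.keeprows]           # rows eligible for deletion
  if (k == 0 || length(rows) == 0) return(x)
  # Cheap loop: one sample.int() call per row, no data-frame assignment here
  cols <- as.vector(vapply(rows, function(i) sample.int(n.items, k), integer(k)))
  mask <- matrix(FALSE, nrow(x), n.items)
  mask[cbind(rep(rows, each = k), cols)] <- TRUE
  x[mask] <- NA                             # one assignment for the whole object
  x
}
```

Because the mask assignment goes column by column inside `[<-.data.frame`, factor, character and numeric columns keep their types.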

Cheers,

Josh

On Sat, Apr 23, 2011 at 5:37 PM, sneaffer <[hidden email]> wrote:

> Is there any other more effective/charming way to do the same?



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: How to erase (replace) certain elements in the data.frame?

sneaffer
In reply to this post by Thomas Levine
Thanks a lot, guys.
Thomas, your method is great, precisely the thing I've been looking for.
Oh dear, how I love R for those list comprehension tricks!

Re: How to erase (replace) certain elements in the data.frame?

Joshua Wiley-2
In reply to this post by Thomas Levine
On Sat, Apr 23, 2011 at 11:35 PM, Thomas Levine <[hidden email]> wrote:
> This should do the same thing

Did you actually test it? I get very different results.

>
> random.del <- function (x, n.keeprows, del.percent){
>   del<-function(col){
>     col[sample.int(length(col),length(col)*del.percent/100)]<-NA
>     col
>   }
>   change<-n.keeprows:nrow(x)
>   x[change,]<-lapply(x[change,],del)

but a data frame is a list of column vectors, so lapply() works column
by column, while Sergey's function went row by row. However, using
sample.int() is a much better idea than what I did with sample().

>   x
> }
>
> This is faster because it's vectorized.

but in such a way that you cannot guarantee the same number of cells
is missing from each row. Try:

mine <- random.del(x, 100, 50)  # x and the arguments are placeholders; Tom's version
rowSums(is.na(mine))
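To see the difference concretely, here is a small sketch on made-up toy data. del.by.column is a hypothetical name for Tom's first version, with the row range changed to start at n.keeprows + 1 to match Sergey's original. It blanks the same number of cells in every column but leaves the per-row counts to chance.

```r
# Tom's column-wise approach (row range adjusted to start at n.keeprows + 1)
del.by.column <- function(x, n.keeprows, del.percent) {
  del <- function(col) {
    col[sample.int(length(col), length(col) * del.percent / 100)] <- NA
    col
  }
  change <- (n.keeprows + 1):nrow(x)
  x[change, ] <- lapply(x[change, ], del)   # one del() call per column
  x
}

set.seed(42)
toy <- as.data.frame(matrix(1:200, nrow = 20))   # 20 rows, 10 columns
out <- del.by.column(toy, 10, 50)
colSums(is.na(out))   # 5 in every column (half of the 10 changed rows)
rowSums(is.na(out))   # rows 1-10 are 0; rows 11-20 total 50 but need not be equal
```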


Re: How to erase (replace) certain elements in the data.frame?

Thomas Levine
In reply to this post by sneaffer
As Joshua said, mine was indeed different from yours. And it didn't
work on non-numeric data. But this one seems to work right:

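random.del_vec <- function (x, n.keeprows, del.percent){
  # blank floor(del.percent %) of the entries in one row
  del<-function(notkeep){
    k<-floor(length(notkeep)*del.percent/100)
    notkeep[sample.int(length(notkeep),k)]<-NA
    notkeep
  }
  change<-(n.keeprows+1):nrow(x)
  # apply() converts the data frame to a matrix first, so with mixed column
  # types every value is coerced to character; t() restores rows x columns
  x[change,]<-t(apply(x[change,],1,del))
  x
}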

On the other hand, maybe you really didn't want the stratification by row.
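If row stratification is not needed, a simple random sample of cells from the eligible rows is another option. A minimal sketch: random.del.cells is a hypothetical name, and the total number of blanked cells is rounded down with floor().

```r
# Hypothetical helper: blank del.percent of all cells below row n.keeprows,
# chosen at random without regard to which row they fall in.
random.del.cells <- function(x, n.keeprows, del.percent) {
  rows <- seq_len(nrow(x))
  rows <- rows[rows > n.keeprows]          # rows eligible for deletion
  n.cells <- length(rows) * ncol(x)        # size of the eligible block
  k <- floor(n.cells * del.percent / 100)  # total cells to blank
  if (k == 0) return(x)
  block <- matrix(FALSE, length(rows), ncol(x))
  block[sample.int(n.cells, k)] <- TRUE    # simple random sample of cells
  mask <- matrix(FALSE, nrow(x), ncol(x))
  mask[rows, ] <- block
  x[mask] <- NA                            # works for matrices and data frames
  x
}
```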

Tom

On Sun, Apr 24, 2011 at 8:31 AM, sneaffer <[hidden email]> wrote:

> Thanks a lot, guys.