Hi Sergey,

This is not an answer to your exact question, but can you use a
matrix? If you can use a matrix instead of a data frame, you should
get a considerable performance boost. Even for very large matrices
(at least on my system), it is fast enough that I find it hard to
believe it is a bottleneck in the overall imputation process. For
example, for a 1000 by 100 object as a data frame:

> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   1.09    0.02    1.12

and as a matrix:

> system.time(r0 <- random.del(mat, 100, 50))
   user  system elapsed
   0.02    0.00    0.01
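For anyone who wants to reproduce that comparison, here is a rough sketch. It assumes random.del is defined exactly as in the original post (repeated here so the block runs on its own); the object names df and mat, the seed, and the random test data are illustrative choices, and the timings will differ from machine to machine.

```r
# random.del as posted in the original question
random.del <- function(x, n.keeprows, del.percent) {
  n.items <- ncol(x)
  k <- n.items * (del.percent / 100)
  x.del <- x
  for (i in (n.keeprows + 1):nrow(x)) {
    j <- sample(1:n.items, k)
    x.del[i, j] <- NA
  }
  return(x.del)
}

set.seed(1)  # arbitrary seed, only so the run is repeatable
mat <- matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100)  # 1000 by 100 matrix
df  <- as.data.frame(mat)                                  # same values as a data frame

t.df  <- system.time(r.df  <- random.del(df,  100, 50))
t.mat <- system.time(r.mat <- random.del(mat, 100, 50))
cat("data frame elapsed:", t.df[["elapsed"]], "\n")
cat("matrix elapsed:    ", t.mat[["elapsed"]], "\n")

# convert back if a data frame is needed further down the pipeline
r.back <- as.data.frame(r.mat)
```

Note that converting a data frame with as.matrix() only makes sense when all columns share one type; mixed numeric and character columns would be coerced to character.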

Beyond that, for very large objects, the revision below gives a slight
performance increase (around 5 seconds for a 1 million row by 100
column object on my system). That gain is small for matrices and
completely dwarfed by other bottlenecks for data frames, and it comes
at the cost of readability/flexibility:

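rdel <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  # number of columns to set to NA in each row
  k <- as.integer(n.items * del.percent / 100)
  cols <- 1:n.items
  lcols <- length(cols)
  for (i in (n.keeprows+1):nrow(x)){
    # call the internal sampler directly to skip the overhead of sample();
    # .Internal is not part of R's public API, so this may break across versions
    j <- cols[.Internal(sample(lcols, k, FALSE, NULL))]
    x[i,j] <- NA
  }
  return(x)
}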
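A different way to attack the row loop on a matrix is to collect all (row, column) pairs first and assign NA once through a two-column index matrix. The sketch below is not from the thread and has not been timed here; rdel.vec is a made-up name, and it uses the public sample.int() rather than .Internal.

```r
# Hypothetical helper (not from the thread): same result shape as rdel,
# but the NAs are written in a single assignment.
rdel.vec <- function(x, n.keeprows, del.percent) {
  stopifnot(is.matrix(x), n.keeprows >= 0, n.keeprows < nrow(x))
  n.items <- ncol(x)
  k <- as.integer(n.items * del.percent / 100)
  if (k < 1L) return(x)                 # nothing to delete
  rows <- (n.keeprows + 1):nrow(x)
  # for every row to be modified, draw k distinct column numbers
  cols <- vapply(rows, function(i) sample.int(n.items, k), integer(k))
  # two-column matrix of (row, column) positions, k entries per row
  idx <- cbind(rep(rows, each = k), as.vector(cols))
  x[idx] <- NA
  x
}

set.seed(42)  # arbitrary seed
m  <- matrix(1, nrow = 10, ncol = 20)
m2 <- rdel.vec(m, n.keeprows = 3, del.percent = 25)
rowSums(is.na(m2))  # rows 1-3 untouched, 5 NAs in each remaining row
```

The up-front check on n.keeprows also avoids the case where (n.keeprows+1):nrow(x) would count backwards because n.keeprows is at least nrow(x).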

If you must use a data frame, you can gain some performance increase
(for a 10000 by 100 data frame, it takes about 30 seconds on my system
versus 40 for your original function) by using:

random.del2 <- function (x, n.keeprows, del.percent){
  n.items <- ncol(x)
  k <- n.items*(del.percent/100)
  for (i in (n.keeprows+1):nrow(x)){
    j <- sample(1:n.items, k)
    # the method returns the modified copy, so it must be assigned back to x
    x <- `[<-.data.frame`(x, i, j, NA)
  }
  return(x)
}

which basically just saves R the trouble of figuring out which
assignment method to use (the data frame method is called directly
instead of going through dispatch). Of course the problem is that
your function becomes extremely specialized. If you pass anything to
it but a data frame, good things will not happen.
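One way to soften that risk is to check the input before the loop starts. The sketch below is only an illustration: random.del2.checked is a made-up name, the error message is arbitrary, and sample.int() is used in place of sample(1:n.items, k).

```r
# Hypothetical wrapper: same idea as random.del2, but it refuses
# anything that is not a data frame.
random.del2.checked <- function(x, n.keeprows, del.percent) {
  if (!is.data.frame(x)) {
    stop("x must be a data frame, got an object of class: ",
         paste(class(x), collapse = "/"))
  }
  n.items <- ncol(x)
  k <- as.integer(n.items * del.percent / 100)
  for (i in (n.keeprows + 1):nrow(x)) {
    j <- sample.int(n.items, k)
    # `[<-.data.frame` returns the modified copy, so assign it back to x
    x <- `[<-.data.frame`(x, i, j, NA)
  }
  x
}

set.seed(7)  # arbitrary seed
d   <- as.data.frame(matrix(1, nrow = 6, ncol = 10))
res <- random.del2.checked(d, n.keeprows = 2, del.percent = 30)
rowSums(is.na(res))  # 0 for the first two rows, 3 for the rest

# a matrix is rejected up front with a readable message
msg <- tryCatch(random.del2.checked(as.matrix(d), 2, 30),
                error = function(e) conditionMessage(e))
```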

Cheers,

Josh

On Sat, Apr 23, 2011 at 5:37 PM, sneaffer <[hidden email]> wrote:

> Hello R-world,
> Please, help me to get round my little mess
> I have a data.frame in which I'd rather like some values to be NA for the
> future imputation process.
>
> I've come up with the following piece of code:
>
> random.del <- function (x, n.keeprows, del.percent){
>   n.items <- ncol(x)
>   k <- n.items*(del.percent/100)
>   x.del <- x
>   for (i in (n.keeprows+1):nrow(x)){
>     j <- sample(1:n.items, k)
>     x.del[i,j] <- NA
>   }
>   return (x.del)
> }
>
> The problem is that random.del turns out to be slow on huge samples.
> Is there any other more effective/charming way to do the same?
>
> Thanks,
> Sergey

>

> --
> View this message in context: http://r.789695.n4.nabble.com/How-to-erase-replace-certain-elements-in-the-data-frame-tp3470883p3470883.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--

Joshua Wiley

Ph.D. Student, Health Psychology

University of California, Los Angeles

http://www.joshuawiley.com/