replacing percentage of values in data frame


a217
I've been trying to work out how to change a certain percentage of the values in a data frame, but I've struggled to find information on doing this in R.

For example:

#################example data##############
> data
      V1    V2    V3  V4 V5  V6 V7
1   chr1   500   500 CHH  0 0.5  +
2   chr1   550   550 CHH  0 0.0  +
3   chr2   700   700 CHH  0 0.0  +
4   chr2  1000  1000 CHH  0 0.0  +
5   chr3   100   100 CHH  0 0.0  +
6   chr4   450   450  CG  0 0.0  +
7   chr5   450   450 CHH  0 0.0  +
8   chr5 50034 50034 CHG  0 0.0  +
9   chr7 50055 50055 CHG  0 0.0  +
10 chr10 50063 50063 CHH  0 0.0  +

> dput(data)
structure(list(V1 = structure(c(1L, 1L, 3L, 3L, 4L, 5L, 6L, 6L,
7L, 2L), .Label = c("chr1", "chr10", "chr2", "chr3", "chr4",
"chr5", "chr7"), class = "factor"), V2 = c(500L, 550L, 700L,
1000L, 100L, 450L, 450L, 50034L, 50055L, 50063L), V3 = c(500L,
550L, 700L, 1000L, 100L, 450L, 450L, 50034L, 50055L, 50063L),
    V4 = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 3L), .Label = c("CG",
    "CHG", "CHH"), class = "factor"), V5 = c(0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L), V6 = c(0.5, 0, 0, 0, 0, 0, 0, 0,
    0, 0), V7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L), .Label = "+", class = "factor")), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7"), class = "data.frame", row.names = c(NA,
-10L))
>
############################

For instance, say I'd like to change 20% of the values in column 6 (V6) to 1, regardless of whether they are currently 0 or some other value. How would I approach this?

I am working with a large data frame and need to replace the values in one column for 10-20% of the rows. I hope this makes sense.

Re: replacing percentage of values in data frame

Henrique Dallazuanna
Try this:

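# pick ceiling(20%) of the row indices at random (without replacement)
# and set V6 to 1 in those rows
data$V6[sample(nrow(data), ceiling(length(data$V6) * 0.2))] <- 1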
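A minimal sketch wrapping the same idea in a reusable function, assuming you want an exact share of rows changed (rounded up, as in the one-liner above) and a reproducible draw. The function name `replace_fraction` and its arguments are hypothetical, not part of base R or the original reply, and the 10-row data frame is only a stand-in for the example's V6 column.

```r
# Hypothetical helper (not from the original thread): set a random
# proportion `prop` of the rows in column `col` to `value`.
replace_fraction <- function(df, col, prop, value = 1) {
  stopifnot(is.data.frame(df), col %in% names(df), prop >= 0, prop <= 1)
  n <- nrow(df)
  k <- ceiling(n * prop)      # number of rows to change, rounded up
  idx <- sample.int(n, k)     # k distinct row indices, drawn without replacement
  df[[col]][idx] <- value     # overwrite only the chosen rows
  df
}

# Stand-in for the example data: 10 rows, V6 is 0.5 once and 0 otherwise
data <- data.frame(V6 = c(0.5, rep(0, 9)))

set.seed(42)                  # make the random choice reproducible
out <- replace_fraction(data, "V6", 0.2)
sum(out$V6 == 1)              # 2 rows (20% of 10) now hold 1
```

Because rows are drawn without replacement, exactly `ceiling(n * prop)` distinct rows are overwritten, although rows that already hold `value` count toward that total. If an approximate share is acceptable, a per-row draw such as `data$V6[runif(nrow(data)) < 0.2] <- 1` changes each row independently with probability 0.2, so the number changed varies from run to run. For the 10-20% case on a large frame, call the helper with `prop = 0.1` or `prop = 0.2`.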

On Wed, Oct 19, 2011 at 9:38 PM, a217 <[hidden email]> wrote:



--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.