# Sampling problems


## Sampling problems

Hi,

I need to randomly sample my dataset 1000 times. Each sample needs to be 80% of the data. I know how to do that; my problem is that I need not only the 80% but also the corresponding 20% each time. Is there any way to do that?

Alternatively, I was thinking of something like the setdiff() function to compare my 80% sample with the original dataset and obtain the corresponding 20%. Unfortunately, setdiff() works only for vectors. Do you know of a similar function for data frames?

Thanks
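One way to read the setdiff() idea is to apply it to row numbers rather than to the data frame itself, since row numbers are an ordinary vector. The sketch below assumes a small made-up data frame called `df` (not the poster's data) and simply illustrates that reading:

```r
# Hypothetical stand-in for the poster's dataset.
df <- data.frame(id = 1:10, value = letters[1:10])

# Row numbers of the 80% sample.
idx <- sample(nrow(df), 8)

# setdiff() on the row numbers gives the rows that were not sampled.
rest <- setdiff(seq_len(nrow(df)), idx)

sample80 <- df[idx, ]   # the sampled 80%
sample20 <- df[rest, ]  # the corresponding 20%
```

The two pieces never overlap because `rest` is built from whatever `idx` left out.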

## Re: Sampling problems

You could make a vector containing the number of TRUE values that makes up 80% of your data, and the number of FALSE values that makes up 20% of your data. Use sample() to reorder it, then use it to divide your dataset.

If you had provided a reproducible example, I could write you code.

Sarah

On Wed, Mar 7, 2012 at 11:41 AM, Oritteropus <[hidden email]> wrote:
> Hi,
> I need to sample randomly my dataset for 1000 times. The sample need to be
> the 80%. I know how to do that, my problem is that not only I need the 80%,
> but I also need the corresponding 20% each time. Is there any way to do
> that?
> Alternatively, I was thinking to something like setdiff () function to
> compare my 80% sample to the original dataset and obtain the corresponding
> 20%, unfortunately setdiff works just for vectors, do you know a similar
> function for dataframes?
> Thanks

--
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
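The TRUE/FALSE approach described in this reply might look roughly like the following. This is only a sketch: the data frame `df` and the names `keep`, `sample80` and `sample20` are made up for illustration, and rounding 80% of the row count to a whole number is an assumption.

```r
# Hypothetical example data; `df` stands in for the real dataset.
df <- data.frame(id = 1:10, value = (1:10)^2)

n <- nrow(df)
n_keep <- round(0.8 * n)  # assumed rounding rule for the 80% part

# n_keep TRUE values and (n - n_keep) FALSE values, shuffled by sample().
keep <- sample(rep(c(TRUE, FALSE), times = c(n_keep, n - n_keep)))

sample80 <- df[keep, ]   # rows flagged TRUE
sample20 <- df[!keep, ]  # rows flagged FALSE
```

Because a single logical vector drives both subsets, every row ends up in exactly one of them.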

## Re: Sampling problems

In reply to this post by Oritteropus

On Wed, Mar 07, 2012 at 08:41:35AM -0800, Oritteropus wrote:
> Hi,
> I need to sample randomly my dataset for 1000 times. The sample need to be
> the 80%. I know how to do that, my problem is that not only I need the 80%,
> but I also need the corresponding 20% each time. Is there any way to do
> that?

Hi.

If you use sample() to get the 80% and store the indices, you can also get the remaining cases:

    a <- matrix(1:30, ncol=3)
    i <- sample(10, 8)
    a[sort(i), ]

         [,1] [,2] [,3]
    [1,]    1   11   21
    [2,]    2   12   22
    [3,]    3   13   23
    [4,]    4   14   24
    [5,]    6   16   26
    [6,]    7   17   27
    [7,]    8   18   28
    [8,]   10   20   30

    a[-i, ]

         [,1] [,2] [,3]
    [1,]    5   15   25
    [2,]    9   19   29

Hope this helps.

Petr Savicky.

## Re: Sampling problems

In reply to this post by Oritteropus

On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:
> Hi,
> I need to sample randomly my dataset for 1000 times. The sample need to be
> the 80%. I know how to do that, my problem is that not only I need the 80%,
> but I also need the corresponding 20% each time. Is there any way to do
> that?
> Alternatively, I was thinking to something like setdiff () function to
> compare my 80% sample to the original dataset and obtain the corresponding
> 20%, unfortunately setdiff works just for vectors, do you know a similar
> function for dataframes?

Create an index vector with runif or sample, then use that to get your sample and use negative indexing to get the remainder.

    idx <- sample(1:1000, 800)
    x[ idx, ]  # 80%
    x[ -idx, ] # the other 20%

(I think this does presume you have not mucked with the default rownames.)

--
David Winsemius, MD
West Hartford, CT
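Since the original question asks for 1000 repetitions, the index-vector idea could be wrapped in a loop along these lines. This is a rough sketch under assumptions: `x` is a made-up data frame, `splits`, `train` and `test` are illustrative names, and set.seed() is there only to make the sketch reproducible.

```r
set.seed(1)  # only so the sketch gives the same result each run

# Hypothetical stand-in for the real dataset.
x <- data.frame(id = 1:50, y = rnorm(50))

n <- nrow(x)
n_rep <- 1000

# For each repetition, draw the row indices once and reuse them for both parts.
splits <- lapply(seq_len(n_rep), function(k) {
  idx <- sample(n, size = round(0.8 * n))
  list(train = x[idx, ],   # the 80% sample
       test  = x[-idx, ])  # the corresponding 20%
})

# The first split, for example:
nrow(splits[[1]]$train)
nrow(splits[[1]]$test)
```

Storing only `idx` for each repetition, rather than both data frames, would use less memory if the dataset is large.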

## Re: Sampling problems

Hi, thank you, but it works for vectors and matrices and not for data frames. It gives me this error message:

    MeanA <- read.csv("MeanAmf.csv",header=T)
    mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]
    remainder<-MeanA[-mysample]

    Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
    In Ops.factor(left) : - not meaningful for factors

Any other way?

## Re: Sampling problems

In reply to this post by Sarah Goslee

Hi Sarah, it is not clear to me how to do that. Can you show me, please? Imagine I have a situation like this:

    MeanA <- read.csv("MeanAmf.csv",header=T)
    mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Then?

## Re: Sampling problems

In reply to this post by Oritteropus

Hi

I have only a faint idea what your problem was, as there is no context in your message, but maybe

    remainder<-MeanA[-mysample, ]

could work.

Regards
Petr

> Hi, thank you but it does work for vectors and matrix but not dataframes, it
> gives me this message error:
>
> MeanA <- read.csv("MeanAmf.csv",header=T)
> mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]
> remainder<-MeanA[-mysample]
> Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
> In Ops.factor(left) : - not meaningful for factors
>
> Any other way?

## Re: Sampling problems

In reply to this post by Oritteropus

> Hi, thank you but it does work for vectors and matrix but not dataframes, it
> gives me this message error:
>
> MeanA <- read.csv("MeanAmf.csv",header=T)
> mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Well, maybe a slight correction:

    mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)
    chosen.one<-MeanA[mysample,]
    remainder<-MeanA[-mysample,]

Regards
Petr

> remainder<-MeanA[-mysample]
> Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
> In Ops.factor(left) : - not meaningful for factors
>
> Any other way?

## Re: Sampling problems

In reply to this post by PIKAL Petr

Thanks, but it doesn't work either; it gives me the same error message. It works only if my first sample is taken this way:

    mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)

However, this way it samples only the row numbers:

     [1] 71 24 12 36  2 39 69 62 43 38  9 44 13 54 50 63 67 66 37 28

but not the data inside. I need to sample this way:

    mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

to get a sample like this:

    HRkm  Mean.mf  Mean.mfm  Loc  Diet  Terr  Soc  Type  Soc.Ter  W.cat.0.25  W.cat.0.5
    -2.49  -0.43  2.57  A  O  T  S  D  TS  b  23  -2.05  0.67  T  C  N  S  D  NS  A

This is an example of my data frame.
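The row numbers and the sampled data are not mutually exclusive: keeping the row numbers in their own variable lets the same vector produce both the sampled rows (with all their columns) and the remainder. The sketch below uses a made-up data frame `dat` with two illustrative columns, not the poster's MeanAmf.csv; the names `rows` and `remainder2` are likewise hypothetical.

```r
# Hypothetical stand-in for MeanA; column values are invented.
dat <- data.frame(
  HRkm = c(0.5, 1.2, 3.4, 2.2, 0.9, 4.1, 1.7, 2.8, 3.3, 0.6),
  Loc  = factor(rep(c("A", "T"), 5))
)

rows <- sample(nrow(dat), 8)  # row numbers only

mysample  <- dat[rows, ]      # the sampled rows, with their data
remainder <- dat[-rows, ]     # the rows that were not sampled

# If only the sampled data frame is at hand, its row names can identify
# the remainder, assuming the row names are unique and unchanged.
remainder2 <- dat[!(rownames(dat) %in% rownames(mysample)), ]
```

The second route relies on sampling without replacement, so that each sampled row keeps its original row name.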