Hi,
I need to randomly sample my dataset 1000 times, and each sample needs to be 80% of the data. I know how to do that; my problem is that I need not only the 80% but also the corresponding 20% each time. Is there any way to do that? Alternatively, I was thinking of something like the setdiff() function to compare my 80% sample with the original dataset and obtain the corresponding 20%. Unfortunately, setdiff() works only for vectors. Do you know a similar function for data frames? Thanks
You could make a vector containing the number of TRUE values that makes up 80% of your data and the number of FALSE values that makes up 20% of your data. Use sample() to reorder it, then use it to divide your dataset. If you had provided a reproducible example, I could write you code.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
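Sarah's suggestion might look like the following. This is a minimal sketch, assuming a hypothetical data frame `df` in place of your own data; the 80% count is computed from nrow() and rounded to a whole number of rows. Calling sample() on a logical vector of length greater than one simply shuffles it.

```r
# Hypothetical stand-in for your own data frame
set.seed(1)
df <- data.frame(id = 1:10, value = rnorm(10))

# Number of rows that go into the 80% part (rounded to a whole row)
n80 <- round(0.8 * nrow(df))

# n80 TRUE values followed by the remaining FALSE values, then shuffled
in80 <- sample(rep(c(TRUE, FALSE), times = c(n80, nrow(df) - n80)))

part80 <- df[in80, ]   # rows flagged TRUE
part20 <- df[!in80, ]  # the complementary rows
```

Because the two parts come from the same logical vector and its negation, every row lands in exactly one of them.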
In reply to this post by Oritteropus
On Wed, Mar 07, 2012 at 08:41:35AM -0800, Oritteropus wrote:
> ... but I also need the corresponding 20% each time. Is there any way to do that?

Hi.

If you use sample() to get the 80% and store the indices, you can also get the remaining cases:

  a <- matrix(1:30, ncol=3)
  i <- sample(10, 8)
  a[sort(i), ]

       [,1] [,2] [,3]
  [1,]    1   11   21
  [2,]    2   12   22
  [3,]    3   13   23
  [4,]    4   14   24
  [5,]    6   16   26
  [6,]    7   17   27
  [7,]    8   18   28
  [8,]   10   20   30

  a[-i, ]

       [,1] [,2] [,3]
  [1,]    5   15   25
  [2,]    9   19   29

Hope this helps.

Petr Savicky.
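The same negative indexing works on a data frame, including one with factor columns, as long as the index vector is kept separately from the sampled rows. A minimal sketch using a small hypothetical data frame; it also shows that the setdiff() idea from the original question works once it is applied to the row indices rather than to the data frame itself.

```r
# Small hypothetical data frame with a numeric and a factor column
df <- data.frame(x = 1:10, f = factor(rep(c("a", "b"), 5)))

set.seed(2)
i <- sample(nrow(df), 8)          # store the sampled row indices

sampled <- df[sort(i), ]          # the 80% (sorted only for readability)
rest    <- df[-i, ]               # negative indexing drops those rows

# The same complement via setdiff() on the row indices
rest_idx <- setdiff(seq_len(nrow(df)), i)
rest2    <- df[rest_idx, ]        # same rows, in the same order, as rest
```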
In reply to this post by Oritteropus
On Mar 7, 2012, at 11:41 AM, Oritteropus wrote:

> ... but I also need the corresponding 20% each time. Is there any way to do that?

Create an index vector with runif or sample, use it to get your sample, and use negative indexing to get the remainder:

  idx <- sample(1:1000, 800)
  x[ idx, ]   # 80%
  x[ -idx, ]  # the other 20%

(I think this does presume you have not mucked with the default rownames.)

--
David Winsemius, MD
West Hartford, CT
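To get the 1000 splits the original poster asked for, a fresh index vector can be drawn in each repetition. This is a minimal sketch, assuming a hypothetical data frame `df`. It stores only the indices, so memory stays small, and either part can be rebuilt on demand with `df[idx, ]` and `df[-idx, ]`.

```r
set.seed(123)
df <- data.frame(id = 1:50, y = runif(50))   # hypothetical example data

n80 <- round(0.8 * nrow(df))

# One integer vector of sampled row indices per repetition
idx_list <- replicate(1000, sample(nrow(df), n80), simplify = FALSE)

# Build the two parts for, e.g., the first repetition
idx    <- idx_list[[1]]
part80 <- df[idx, ]
part20 <- df[-idx, ]
```

Keeping indices instead of 1000 copies of each subset is a design choice; if the subsets are needed directly, the same two indexing calls can go inside a `lapply()` over `idx_list`.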
Hi, thank you, but it works for vectors and matrices, not for data frames. It gives me this error message:

  MeanA <- read.csv("MeanAmf.csv", header=T)
  mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]
  remainder <- MeanA[-mysample]
  Error in `[.default`(MeanA, -mysample) : invalid subscript type 'list'
  In Ops.factor(left) : - not meaningful for factors

Any other way?
In reply to this post by Sarah Goslee
Hi Sarah, it is not clear to me how to do that. Can you show me, please?
Imagine I have a situation like this:

  MeanA <- read.csv("MeanAmf.csv", header=T)
  mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Then?
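If only the sampled data frame (`mysample` above) has been kept, its row names still record which rows of `MeanA` were drawn, because `[` keeps the original row names when subsetting. A minimal sketch of a setdiff()-style complement for data frames, using a hypothetical stand-in for `MeanA` since MeanAmf.csv is not available; it relies on data frame row names being unique, which R enforces.

```r
# Hypothetical stand-in for MeanA (the real data come from MeanAmf.csv)
set.seed(3)
MeanA <- data.frame(HRkm = round(rnorm(25), 2),
                    Loc  = factor(sample(c("A", "T"), 25, replace = TRUE)))

# Sample 20 rows, as in the message above
mysample <- MeanA[sample(1:nrow(MeanA), 20, replace = FALSE), ]

# Rows of MeanA whose row names do not appear in mysample
remainder <- MeanA[!(rownames(MeanA) %in% rownames(mysample)), ]
```

Equivalently, `MeanA[setdiff(rownames(MeanA), rownames(mysample)), ]` selects the same rows by name.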
In reply to this post by Oritteropus
Hi
I have only a faint idea what your problem was, as there is no context in your message, but maybe

  remainder <- MeanA[-mysample, ]

could work.

Regards
Petr
In reply to this post by Oritteropus
> mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

Well, maybe a slight correction:

  mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)
  chosen.one <- MeanA[mysample,]
  remainder <- MeanA[-mysample,]

Regards
Petr
In reply to this post by PIKAL Petr
Thanks, but it doesn't work either; it gives me the same error message.
It works only if my first sample is taken this way:

  mysample <- sample(1:nrow(MeanA), 20, replace=FALSE)

However, this way it samples only the row numbers:

  [1] 71 24 12 36 2 39 69 62 43 38 9 44 13 54 50 63 67 66 37 28

and not the data themselves. I need to sample this way:

  mysample <- MeanA[sample(1:nrow(MeanA), 20, replace=FALSE),]

to get a sample like this:

  HRkm Mean.mf Mean.mfm Loc Diet Terr Soc Type Soc.Ter W.cat.0.25 W.cat.0.5
  -2.49 -0.43 2.57 A O T S D TS b
  23 -2.05 0.67 T C N S D NS A

This is an example of my data frame.
Please use dput() to give a reproducible example: I can make this work on a data frame quite easily --

  x <- data.frame(1:10, letters[1:10], rnorm(10))
  str(x)
  print(x)
  x[sample(nrow(x), 5), ]

So it's not a problem with something being a data frame or having factors.

Michael
In reply to this post by Oritteropus
Hi everybody,
Thank you all for your suggestions; you have been very helpful. In the end, however, I solved it this way:

  mysample <- MaxDH[sample(1:nrow(MaxDH), 150, replace=FALSE),]
  A <- mysample[1:120,]
  B <- mysample[121:150,]

So simple in the end...

Best,
Luca
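Luca's approach shuffles the rows and cuts the shuffled copy in two. It splits the whole data set 80/20 only when `MaxDH` has exactly 150 rows; if it has more, A and B together cover just a 150-row subsample. A minimal sketch of the same idea with the cut point computed from nrow(), assuming a hypothetical data frame `df` in place of `MaxDH`:

```r
set.seed(4)
df <- data.frame(id = 1:150, z = rnorm(150))  # hypothetical stand-in for MaxDH

shuffled <- df[sample(nrow(df)), ]   # all rows in random order
n80 <- round(0.8 * nrow(df))         # 120 when there are 150 rows

A <- shuffled[seq_len(n80), ]        # first 80% of the shuffled rows
B <- shuffled[-seq_len(n80), ]       # the remaining 20%
```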