doing 1000 permutations and doing test statistics distribution

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

doing 1000 permutations and doing test statistics distribution

anikaM
Hello,

I have a matrix
> dim(dat)
[1] 15568   132

It looks like this:

                   NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
           4.33             4.63             3.85
Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
           6.26             6.24             5.99
W38p0ogk.wIBVRXllY             7.13             7.35             7.55
           7.37             7.36             7.55
QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
           5.39             4.75             4.96
BZKiEvS0eQ305U0v34             6.35             7.02             6.76
           5.45             5.25             5.02
6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
           5.61             5.66             5.37

So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
named ex. NoD_14381_norm.1


How to do 1000 permutations of these 132 columns and on each created
new permuted matrix perform this code:

subject="all_replicate"
targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep=''))
Treat <- factor(targets$Treatment,levels=c("C","T"))
Replicates <- factor(targets$rep)
design <- model.matrix(~Replicates+Treat)
corfit <- duplicateCorrelation(dat, block = targets$Subject)
corfit$consensus.correlation
fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
fit<-eBayes(fit)
qval.cutoff=0.1; FC.cutoff=0.17
y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes)

y1 for each iteration of permutation would  have P.Value column and
these I would have plotted on the end to find the distribution of all
p values generated in those 1000 permutations.

Please advise,
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

Bert Gunter-2
If you just want to permute columns of a matrix,

?sample
> sample.int(10)
 [1]  9  2 10  8  4  6  3  1  5  7

and you can just use this as an index into the columns of your matrix,
presumably within a loop of some sort.

If I have misunderstood, just ignore.

Cheers,
Bert




On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]>
wrote:

> Hello,
>
> I have a matrix
> > dim(dat)
> [1] 15568   132
>
> It looks like this:
>
>                    NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
>            4.33             4.63             3.85
> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
>            6.26             6.24             5.99
> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
>            7.37             7.36             7.55
> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
>            5.39             4.75             4.96
> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
>            5.45             5.25             5.02
> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
>            5.61             5.66             5.37
>
> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
> named ex. NoD_14381_norm.1
>
>
> How to do 1000 permutations of these 132 columns and on each created
> new permuted matrix perform this code:
>
> subject="all_replicate"
> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt",
> sep=''))
> Treat <- factor(targets$Treatment,levels=c("C","T"))
> Replicates <- factor(targets$rep)
> design <- model.matrix(~Replicates+Treat)
> corfit <- duplicateCorrelation(dat, block = targets$Subject)
> corfit$consensus.correlation
> fit
> <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
> fit<-eBayes(fit)
> qval.cutoff=0.1; FC.cutoff=0.17
> y1=topTable(fit, coef="TreatT",
> n=nrow(genes),adjust.method="BH",genelist=genes)
>
> y1 for each iteration of permutation would  have P.Value column and
> these I would have plotted on the end to find the distribution of all
> p values generated in those 1000 permutations.
>
> Please advise,
> Ana
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

anikaM
Hi Bert,

thanks for getting back to me. I have to permute those 132 columns
1000 times and perform the code given in the previous email.

Can you please show me how you would do that in the loop? This is also
a huge data set ...

Thanks
Ana

On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]> wrote:

>
> If you just want to permute columns of a matrix,
>
> ?sample
> > sample.int(10)
>  [1]  9  2 10  8  4  6  3  1  5  7
>
> and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort.
>
> If I have misunderstood, just ignore.
>
> Cheers,
> Bert
>
>
>
>
> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]> wrote:
>>
>> Hello,
>>
>> I have a matrix
>> > dim(dat)
>> [1] 15568   132
>>
>> It looks like this:
>>
>>                    NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
>> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
>>            4.33             4.63             3.85
>> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
>>            6.26             6.24             5.99
>> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
>>            7.37             7.36             7.55
>> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
>>            5.39             4.75             4.96
>> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
>>            5.45             5.25             5.02
>> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
>>            5.61             5.66             5.37
>>
>> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
>> named ex. NoD_14381_norm.1
>>
>>
>> How to do 1000 permutations of these 132 columns and on each created
>> new permuted matrix perform this code:
>>
>> subject="all_replicate"
>> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep=''))
>> Treat <- factor(targets$Treatment,levels=c("C","T"))
>> Replicates <- factor(targets$rep)
>> design <- model.matrix(~Replicates+Treat)
>> corfit <- duplicateCorrelation(dat, block = targets$Subject)
>> corfit$consensus.correlation
>> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
>> fit<-eBayes(fit)
>> qval.cutoff=0.1; FC.cutoff=0.17
>> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes)
>>
>> y1 for each iteration of permutation would  have P.Value column and
>> these I would have plotted on the end to find the distribution of all
>> p values generated in those 1000 permutations.
>>
>> Please advise,
>> Ana
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

anikaM
Basically I would just reshuffle column names in each of 1000 permutations
how to do that and perform everything I described in my initial email

On Tue, 4 Feb 2020 at 14:46, Ana Marija <[hidden email]> wrote:

> Hi Bert,
>
> thanks for getting back to me. I have to permute those 132 columns
> 1000 times and perform the code given in the previous email.
>
> Can you please show me how you would do that in the loop? This is also
> a huge data set ...
>
> Thanks
> Ana
>
> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]> wrote:
> >
> > If you just want to permute columns of a matrix,
> >
> > ?sample
> > > sample.int(10)
> >  [1]  9  2 10  8  4  6  3  1  5  7
> >
> > and you can just use this as an index into the columns of your matrix,
> presumably within a loop of some sort.
> >
> > If I have misunderstood, just ignore.
> >
> > Cheers,
> > Bert
> >
> >
> >
> >
> > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]>
> wrote:
> >>
> >> Hello,
> >>
> >> I have a matrix
> >> > dim(dat)
> >> [1] 15568   132
> >>
> >> It looks like this:
> >>
> >>                    NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
> >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
> >> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
> >>            4.33             4.63             3.85
> >> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
> >>            6.26             6.24             5.99
> >> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
> >>            7.37             7.36             7.55
> >> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
> >>            5.39             4.75             4.96
> >> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
> >>            5.45             5.25             5.02
> >> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
> >>            5.61             5.66             5.37
> >>
> >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
> >> named ex. NoD_14381_norm.1
> >>
> >>
> >> How to do 1000 permutations of these 132 columns and on each created
> >> new permuted matrix perform this code:
> >>
> >> subject="all_replicate"
> >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt",
> sep=''))
> >> Treat <- factor(targets$Treatment,levels=c("C","T"))
> >> Replicates <- factor(targets$rep)
> >> design <- model.matrix(~Replicates+Treat)
> >> corfit <- duplicateCorrelation(dat, block = targets$Subject)
> >> corfit$consensus.correlation
> >> fit
> <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
> >> fit<-eBayes(fit)
> >> qval.cutoff=0.1; FC.cutoff=0.17
> >> y1=topTable(fit, coef="TreatT",
> n=nrow(genes),adjust.method="BH",genelist=genes)
> >>
> >> y1 for each iteration of permutation would  have P.Value column and
> >> these I would have plotted on the end to find the distribution of all
> >> p values generated in those 1000 permutations.
> >>
> >> Please advise,
> >> Ana
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

Bert Gunter-2
In reply to this post by anikaM
I am not going to do your programming for you. If the following doesn't
suffice, maybe someone else will provide you something that will.

m = your matrix

code = your code that uses m

your list of results <- lapply(seq_len(1000), FUN = function(m){
  m <- m[, sample.int(132)]
 code
} )

or use an explicit for() loop to populate a list or vector with your
results.

Again, if I have misunderstood what you want to do, then clarify, and
perhaps someone else will help.

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <[hidden email]>
wrote:

> Hi Bert,
>
> thanks for getting back to me. I have to permute those 132 columns
> 1000 times and perform the code given in the previous email.
>
> Can you please show me how you would do that in the loop? This is also
> a huge data set ...
>
> Thanks
> Ana
>
> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]> wrote:
> >
> > If you just want to permute columns of a matrix,
> >
> > ?sample
> > > sample.int(10)
> >  [1]  9  2 10  8  4  6  3  1  5  7
> >
> > and you can just use this as an index into the columns of your matrix,
> presumably within a loop of some sort.
> >
> > If I have misunderstood, just ignore.
> >
> > Cheers,
> > Bert
> >
> >
> >
> >
> > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]>
> wrote:
> >>
> >> Hello,
> >>
> >> I have a matrix
> >> > dim(dat)
> >> [1] 15568   132
> >>
> >> It looks like this:
> >>
> >>                    NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
> >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
> >> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
> >>            4.33             4.63             3.85
> >> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
> >>            6.26             6.24             5.99
> >> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
> >>            7.37             7.36             7.55
> >> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
> >>            5.39             4.75             4.96
> >> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
> >>            5.45             5.25             5.02
> >> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
> >>            5.61             5.66             5.37
> >>
> >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
> >> named ex. NoD_14381_norm.1
> >>
> >>
> >> How to do 1000 permutations of these 132 columns and on each created
> >> new permuted matrix perform this code:
> >>
> >> subject="all_replicate"
> >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt",
> sep=''))
> >> Treat <- factor(targets$Treatment,levels=c("C","T"))
> >> Replicates <- factor(targets$rep)
> >> design <- model.matrix(~Replicates+Treat)
> >> corfit <- duplicateCorrelation(dat, block = targets$Subject)
> >> corfit$consensus.correlation
> >> fit
> <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
> >> fit<-eBayes(fit)
> >> qval.cutoff=0.1; FC.cutoff=0.17
> >> y1=topTable(fit, coef="TreatT",
> n=nrow(genes),adjust.method="BH",genelist=genes)
> >>
> >> y1 for each iteration of permutation would  have P.Value column and
> >> these I would have plotted on the end to find the distribution of all
> >> p values generated in those 1000 permutations.
> >>
> >> Please advise,
> >> Ana
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

anikaM
I tired your code on this simplified data just for say 10 permutations:

dat <- read.table(text = "   code.1 code.2 code.3 code.4
1     82     93     NA     NA
2     15     85     93     NA
3     93     89     NA     NA
4     81     NA     NA     NA",
                  header = TRUE, stringsAsFactors = FALSE)

dat2=data.matrix(dat)

> result<- lapply(seq_len(10), FUN = function(dat2){
+     dat2 <- dat2[, sample.int(4)]
+     print(colnames(dat2))
+ } )
Error in dat2[, sample.int(4)] : incorrect number of dimensions


On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <[hidden email]> wrote:

>
> I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will.
>
> m = your matrix
>
> code = your code that uses m
>
> your list of results <- lapply(seq_len(1000), FUN = function(m){
>   m <- m[, sample.int(132)]
>  code
> } )
>
> or use an explicit for() loop to populate a list or vector with your results.
>
> Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help.
>
> -- Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <[hidden email]> wrote:
>>
>> Hi Bert,
>>
>> thanks for getting back to me. I have to permute those 132 columns
>> 1000 times and perform the code given in the previous email.
>>
>> Can you please show me how you would do that in the loop? This is also
>> a huge data set ...
>>
>> Thanks
>> Ana
>>
>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]> wrote:
>> >
>> > If you just want to permute columns of a matrix,
>> >
>> > ?sample
>> > > sample.int(10)
>> >  [1]  9  2 10  8  4  6  3  1  5  7
>> >
>> > and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort.
>> >
>> > If I have misunderstood, just ignore.
>> >
>> > Cheers,
>> > Bert
>> >
>> >
>> >
>> >
>> > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have a matrix
>> >> > dim(dat)
>> >> [1] 15568   132
>> >>
>> >> It looks like this:
>> >>
>> >>                    NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
>> >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
>> >> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
>> >>            4.33             4.63             3.85
>> >> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
>> >>            6.26             6.24             5.99
>> >> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
>> >>            7.37             7.36             7.55
>> >> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
>> >>            5.39             4.75             4.96
>> >> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
>> >>            5.45             5.25             5.02
>> >> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
>> >>            5.61             5.66             5.37
>> >>
>> >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
>> >> named ex. NoD_14381_norm.1
>> >>
>> >>
>> >> How to do 1000 permutations of these 132 columns and on each created
>> >> new permuted matrix perform this code:
>> >>
>> >> subject="all_replicate"
>> >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep=''))
>> >> Treat <- factor(targets$Treatment,levels=c("C","T"))
>> >> Replicates <- factor(targets$rep)
>> >> design <- model.matrix(~Replicates+Treat)
>> >> corfit <- duplicateCorrelation(dat, block = targets$Subject)
>> >> corfit$consensus.correlation
>> >> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
>> >> fit<-eBayes(fit)
>> >> qval.cutoff=0.1; FC.cutoff=0.17
>> >> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes)
>> >>
>> >> y1 for each iteration of permutation would  have P.Value column and
>> >> these I would have plotted on the end to find the distribution of all
>> >> p values generated in those 1000 permutations.
>> >>
>> >> Please advise,
>> >> Ana
>> >>
>> >> ______________________________________________
>> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

Duncan Murdoch-2
On 04/02/2020 4:28 p.m., Ana Marija wrote:

> I tired your code on this simplified data just for say 10 permutations:
>
> dat <- read.table(text = "   code.1 code.2 code.3 code.4
> 1     82     93     NA     NA
> 2     15     85     93     NA
> 3     93     89     NA     NA
> 4     81     NA     NA     NA",
>                    header = TRUE, stringsAsFactors = FALSE)
>
> dat2=data.matrix(dat)
>
>> result<- lapply(seq_len(10), FUN = function(dat2){
> +     dat2 <- dat2[, sample.int(4)]
> +     print(colnames(dat2))
> + } )
> Error in dat2[, sample.int(4)] : incorrect number of dimensions

Yes, Bert did suggest that, but it's obviously wrong.  The argument to
FUN is an element of seq_len(10), it's not the full dataset.  Try

result<- lapply(seq_len(10), FUN = function(i){
      dat <- dat2[, sample.int(4)]
      print(colnames(dat))
  } )

Duncan Murdoch

>
>
> On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <[hidden email]> wrote:
>>
>> I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will.
>>
>> m = your matrix
>>
>> code = your code that uses m
>>
>> your list of results <- lapply(seq_len(1000), FUN = function(m){
>>    m <- m[, sample.int(132)]
>>   code
>> } )
>>
>> or use an explicit for() loop to populate a list or vector with your results.
>>
>> Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help.
>>
>> -- Bert
>>
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <[hidden email]> wrote:
>>>
>>> Hi Bert,
>>>
>>> thanks for getting back to me. I have to permute those 132 columns
>>> 1000 times and perform the code given in the previous email.
>>>
>>> Can you please show me how you would do that in the loop? This is also
>>> a huge data set ...
>>>
>>> Thanks
>>> Ana
>>>
>>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]> wrote:
>>>>
>>>> If you just want to permute columns of a matrix,
>>>>
>>>> ?sample
>>>>> sample.int(10)
>>>>   [1]  9  2 10  8  4  6  3  1  5  7
>>>>
>>>> and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort.
>>>>
>>>> If I have misunderstood, just ignore.
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have a matrix
>>>>>> dim(dat)
>>>>> [1] 15568   132
>>>>>
>>>>> It looks like this:
>>>>>
>>>>>                     NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
>>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
>>>>> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
>>>>>             4.33             4.63             3.85
>>>>> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
>>>>>             6.26             6.24             5.99
>>>>> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
>>>>>             7.37             7.36             7.55
>>>>> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
>>>>>             5.39             4.75             4.96
>>>>> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
>>>>>             5.45             5.25             5.02
>>>>> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
>>>>>             5.61             5.66             5.37
>>>>>
>>>>> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
>>>>> named ex. NoD_14381_norm.1
>>>>>
>>>>>
>>>>> How to do 1000 permutations of these 132 columns and on each created
>>>>> new permuted matrix perform this code:
>>>>>
>>>>> subject="all_replicate"
>>>>> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep=''))
>>>>> Treat <- factor(targets$Treatment,levels=c("C","T"))
>>>>> Replicates <- factor(targets$rep)
>>>>> design <- model.matrix(~Replicates+Treat)
>>>>> corfit <- duplicateCorrelation(dat, block = targets$Subject)
>>>>> corfit$consensus.correlation
>>>>> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
>>>>> fit<-eBayes(fit)
>>>>> qval.cutoff=0.1; FC.cutoff=0.17
>>>>> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes)
>>>>>
>>>>> y1 for each iteration of permutation would  have P.Value column and
>>>>> these I would have plotted on the end to find the distribution of all
>>>>> p values generated in those 1000 permutations.
>>>>>
>>>>> Please advise,
>>>>> Ana
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

Bert Gunter-2
Yes, a clear thinko... Thanks for the correction.

-- Bert

On Tue, Feb 4, 2020 at 1:41 PM Duncan Murdoch <[hidden email]>
wrote:

> On 04/02/2020 4:28 p.m., Ana Marija wrote:
> > I tired your code on this simplified data just for say 10 permutations:
> >
> > dat <- read.table(text = "   code.1 code.2 code.3 code.4
> > 1     82     93     NA     NA
> > 2     15     85     93     NA
> > 3     93     89     NA     NA
> > 4     81     NA     NA     NA",
> >                    header = TRUE, stringsAsFactors = FALSE)
> >
> > dat2=data.matrix(dat)
> >
> >> result<- lapply(seq_len(10), FUN = function(dat2){
> > +     dat2 <- dat2[, sample.int(4)]
> > +     print(colnames(dat2))
> > + } )
> > Error in dat2[, sample.int(4)] : incorrect number of dimensions
>
> Yes, Bert did suggest that, but it's obviously wrong.  The argument to
> FUN is an element of seq_len(10), it's not the full dataset.  Try
>
> result<- lapply(seq_len(10), FUN = function(i){
>       dat <- dat2[, sample.int(4)]
>       print(colnames(dat))
>   } )
>
> Duncan Murdoch
>
> >
> >
> > On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <[hidden email]>
> wrote:
> >>
> >> I am not going to do your programming for you. If the following doesn't
> suffice, maybe someone else will provide you something that will.
> >>
> >> m = your matrix
> >>
> >> code = your code that uses m
> >>
> >> your list of results <- lapply(seq_len(1000), FUN = function(m){
> >>    m <- m[, sample.int(132)]
> >>   code
> >> } )
> >>
> >> or use an explicit for() loop to populate a list or vector with your
> results.
> >>
> >> Again, if I have misunderstood what you want to do, then clarify, and
> perhaps someone else will help.
> >>
> >> -- Bert
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <[hidden email]>
> wrote:
> >>>
> >>> Hi Bert,
> >>>
> >>> thanks for getting back to me. I have to permute those 132 columns
> >>> 1000 times and perform the code given in the previous email.
> >>>
> >>> Can you please show me how you would do that in the loop? This is also
> >>> a huge data set ...
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]>
> wrote:
> >>>>
> >>>> If you just want to permute columns of a matrix,
> >>>>
> >>>> ?sample
> >>>>> sample.int(10)
> >>>>   [1]  9  2 10  8  4  6  3  1  5  7
> >>>>
> >>>> and you can just use this as an index into the columns of your
> matrix, presumably within a loop of some sort.
> >>>>
> >>>> If I have misunderstood, just ignore.
> >>>>
> >>>> Cheers,
> >>>> Bert
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <
> [hidden email]> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I have a matrix
> >>>>>> dim(dat)
> >>>>> [1] 15568   132
> >>>>>
> >>>>> It looks like this:
> >>>>>
> >>>>>                     NoD_14381_norm.1 NoD_14381_norm.2
> NoD_14381_norm.3
> >>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
> >>>>> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
> >>>>>             4.33             4.63             3.85
> >>>>> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
> >>>>>             6.26             6.24             5.99
> >>>>> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
> >>>>>             7.37             7.36             7.55
> >>>>> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
> >>>>>             5.39             4.75             4.96
> >>>>> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
> >>>>>             5.45             5.25             5.02
> >>>>> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
> >>>>>             5.61             5.66             5.37
> >>>>>
> >>>>> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and
> subjects
> >>>>> named ex. NoD_14381_norm.1
> >>>>>
> >>>>>
> >>>>> How to do 1000 permutations of these 132 columns and on each created
> >>>>> new permuted matrix perform this code:
> >>>>>
> >>>>> subject="all_replicate"
> >>>>>
> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt",
> sep=''))
> >>>>> Treat <- factor(targets$Treatment,levels=c("C","T"))
> >>>>> Replicates <- factor(targets$rep)
> >>>>> design <- model.matrix(~Replicates+Treat)
> >>>>> corfit <- duplicateCorrelation(dat, block = targets$Subject)
> >>>>> corfit$consensus.correlation
> >>>>> fit
> <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
> >>>>> fit<-eBayes(fit)
> >>>>> qval.cutoff=0.1; FC.cutoff=0.17
> >>>>> y1=topTable(fit, coef="TreatT",
> n=nrow(genes),adjust.method="BH",genelist=genes)
> >>>>>
> >>>>> y1 for each iteration of permutation would  have P.Value column and
> >>>>> these I would have plotted on the end to find the distribution of all
> >>>>> p values generated in those 1000 permutations.
> >>>>>
> >>>>> Please advise,
> >>>>> Ana
> >>>>>
> >>>>> ______________________________________________
> >>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

anikaM
I tried to solve the task via following code:

manyorders <- replicate(100, sample(colnames(dat)), simplify=FALSE)

all_results <- lapply(manyorders, function(ord) {
  tmpdat <- `colnames<-`(dat, ord) # copies and renames in one line
  corfit <- duplicateCorrelation(dat, block = targets$Subject)
  corfit$consensus.correlation
  fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
  fit <- eBayes(fit)
  y1 <- topTable(fit, coef="TreatT", n=nrow(genes),
adjust.method="BH", genelist=genes)
  list(fit = fit, y1 = y1)
})

and I wrote all_results in a file
write.table(all_results, file="all_res", sep = " ", row.names = FALSE,
col.names = TRUE,quote=FALSE)

when I tried to open the file:
> a=read.table("all_res", header=T)
Error in read.table("all_res", header = T) :
  more columns than column names

also with fread:
> a=fread("all_res")
Warning messages:
1: In fread("all_res") :
  Detected 35300 column names but the data has 35700 columns (i.e.
invalid file). Added 400 extra default column names at the end.
2: In fread("all_res") :
  Stopped early on line 5. Expected 35700 fields but found 35400.
Consider fill=TRUE and comment.char=. First discarded non-empty line:
<<5.73583235032546 0.182418204566858 0.178323702841331
-1.69234503648485 -1.83571739423166 -1.72136694103431 1.35210840970636
1.37698365727967 1.65643614366521 1.33750809366081 1.22116614774455
1.07000318265432 -1.13789084968265 -1.0757049956716 -1.24824627469937
-0.208441151406013 -0.246971068064524 -0.272463550264579
-0.561691522816148 -0.407713100376217 -0.538071283637418
-0.979274581542649 -0.909855772342568 -0.974827213844384
0.21700425934175 0.134586251989901 0.0818947419096577 -0.6788584605>>

my original dat matrix has 132 columns and around 15000 rows

Can you please advise on this?

Thanks
Ana

On Tue, Feb 4, 2020 at 4:45 PM Bert Gunter <[hidden email]> wrote:

>
> Yes, a clear thinko... Thanks for the correction.
>
> -- Bert
>
> On Tue, Feb 4, 2020 at 1:41 PM Duncan Murdoch <[hidden email]> wrote:
>>
>> On 04/02/2020 4:28 p.m., Ana Marija wrote:
>> > I tired your code on this simplified data just for say 10 permutations:
>> >
>> > dat <- read.table(text = "   code.1 code.2 code.3 code.4
>> > 1     82     93     NA     NA
>> > 2     15     85     93     NA
>> > 3     93     89     NA     NA
>> > 4     81     NA     NA     NA",
>> >                    header = TRUE, stringsAsFactors = FALSE)
>> >
>> > dat2=data.matrix(dat)
>> >
>> >> result<- lapply(seq_len(10), FUN = function(dat2){
>> > +     dat2 <- dat2[, sample.int(4)]
>> > +     print(colnames(dat2))
>> > + } )
>> > Error in dat2[, sample.int(4)] : incorrect number of dimensions
>>
>> Yes, Bert did suggest that, but it's obviously wrong.  The argument to
>> FUN is an element of seq_len(10), it's not the full dataset.  Try
>>
>> result<- lapply(seq_len(10), FUN = function(i){
>>       dat <- dat2[, sample.int(4)]
>>       print(colnames(dat))
>>   } )
>>
>> Duncan Murdoch
>>
>> >
>> >
>> > On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <[hidden email]> wrote:
>> >>
>> >> I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will.
>> >>
>> >> m = your matrix
>> >>
>> >> code = your code that uses m
>> >>
>> >> your list of results <- lapply(seq_len(1000), FUN = function(m){
>> >>    m <- m[, sample.int(132)]
>> >>   code
>> >> } )
>> >>
>> >> or use an explicit for() loop to populate a list or vector with your results.
>> >>
>> >> Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help.
>> >>
>> >> -- Bert
>> >>
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>
>> >>
>> >> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <[hidden email]> wrote:
>> >>>
>> >>> Hi Bert,
>> >>>
>> >>> thanks for getting back to me. I have to permute those 132 columns
>> >>> 1000 times and perform the code given in the previous email.
>> >>>
>> >>> Can you please show me how you would do that in the loop? This is also
>> >>> a huge data set ...
>> >>>
>> >>> Thanks
>> >>> Ana
>> >>>
>> >>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <[hidden email]> wrote:
>> >>>>
>> >>>> If you just want to permute columns of a matrix,
>> >>>>
>> >>>> ?sample
>> >>>>> sample.int(10)
>> >>>>   [1]  9  2 10  8  4  6  3  1  5  7
>> >>>>
>> >>>> and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort.
>> >>>>
>> >>>> If I have misunderstood, just ignore.
>> >>>>
>> >>>> Cheers,
>> >>>> Bert
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <[hidden email]> wrote:
>> >>>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I have a matrix
>> >>>>>> dim(dat)
>> >>>>> [1] 15568   132
>> >>>>>
>> >>>>> It looks like this:
>> >>>>>
>> >>>>>                     NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3
>> >>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
>> >>>>> Ku8QhfS0n_hIOABXuE             4.75             4.25             4.79
>> >>>>>             4.33             4.63             3.85
>> >>>>> Bx496XsFXiAlj.Eaeo             6.15             6.23             6.55
>> >>>>>             6.26             6.24             5.99
>> >>>>> W38p0ogk.wIBVRXllY             7.13             7.35             7.55
>> >>>>>             7.37             7.36             7.55
>> >>>>> QIBkqIS9LR5DfTlTS8             6.27             6.73             6.45
>> >>>>>             5.39             4.75             4.96
>> >>>>> BZKiEvS0eQ305U0v34             6.35             7.02             6.76
>> >>>>>             5.45             5.25             5.02
>> >>>>> 6TheVd.HiE1UF3lX6g             5.53             5.02             5.36
>> >>>>>             5.61             5.66             5.37
>> >>>>>
>> >>>>> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects
>> >>>>> named ex. NoD_14381_norm.1
>> >>>>>
>> >>>>>
>> >>>>> How to do 1000 permutations of these 132 columns and on each created
>> >>>>> new permuted matrix perform this code:
>> >>>>>
>> >>>>> subject="all_replicate"
>> >>>>> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep=''))
>> >>>>> Treat <- factor(targets$Treatment,levels=c("C","T"))
>> >>>>> Replicates <- factor(targets$rep)
>> >>>>> design <- model.matrix(~Replicates+Treat)
>> >>>>> corfit <- duplicateCorrelation(dat, block = targets$Subject)
>> >>>>> corfit$consensus.correlation
>> >>>>> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
>> >>>>> fit<-eBayes(fit)
>> >>>>> qval.cutoff=0.1; FC.cutoff=0.17
>> >>>>> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes)
>> >>>>>
>> >>>>> y1 for each iteration of permutation would  have P.Value column and
>> >>>>> these I would have plotted on the end to find the distribution of all
>> >>>>> p values generated in those 1000 permutations.
>> >>>>>
>> >>>>> Please advise,
>> >>>>> Ana
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > ______________________________________________
>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: doing 1000 permutations and doing test statistics distribution

Ivan Krylov
On Wed, 5 Feb 2020 09:15:16 -0600
Ana Marija <[hidden email]> wrote:

> I tried to solve the task via following code:

> all_results <- lapply(manyorders, function(ord) {

# ...

>  list(fit = fit, y1 = y1)
> })

> and I wrote all_results in a file
> write.table(all_results, file="all_res", sep = " ", row.names = FALSE,
> col.names = TRUE,quote=FALSE)

all_results is a list of lists of pairs of data.frames and MArrayLM
S3 objects, while write.table works best with data.frames (or things
that make sense when coerced to a data.frame). I am afraid that the
result of coercing a list of lists of lists into a data.frame may not
make sense.

What do you need the "all_res" file for? If you just want persistence
(save the data between runs of R programs, but not examine it
manually), consider using saveRDS/readRDS instead. If you need to
interoperate with non-R programs or want a text file for transparency
reasons, read on.

> when I tried to open the file:
> > a=read.table("all_res", header=T)  
> Error in read.table("all_res", header = T) :
>   more columns than column names

Do any of the strings stored in all_results contain spaces?
(y1[,'genelist'] might.) A combination of sep=" " and quote=FALSE could
make such data impossible to read back unambiguously. Does it help to
re-enable quote=TRUE or switch to a different sep (like "\t") that's
guaranteed to be absent from strings you are trying to save? Either
way, read.table() will not reconstruct the same all_results that was
fed to write.table() previously unless all_results is already a
data.frame (which it isn't).

--
Best regards,
Ivan

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.