RNA Seq Analysis in R

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

RNA Seq Analysis in R

Anas Jamshed
I choose microarray data GSE75693 of 30 patients with stable kidney
transplantation and 15 with BKVN to identify differentially expressed genes
(DEGs). I performed this in GEO2R and find R script there and Runs R script
Successfully on R studio as well. The R script is :

 # Differential expression analysis with limma

library(Biobase)
library(GEOquery)
library(limma)
# load series and platform data from GEO

gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)if
(length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
<- 1
gset <- gset[[idx]]
# make proper column names to match toptable
fvarLabels(gset) <- make.names(fvarLabels(gset))
# group names for all samples
gsms <- paste0("000000000000000000000000000000XXXXXXXXXXXXXXX11111",
        "1111111111XXXXXXXXXXXXXXXXXXX")
sml <- c()for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
# eliminate samples marked as "X"
sel <- which(sml != "X")
sml <- sml[sel]
gset <- gset[ ,sel]
# log2 transform
exprs(gset) <- log2(exprs(gset))
# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)

tT <- subset(tT,
select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2)

After running this no genes are found plz help me

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RNA Seq Analysis in R

Jeff Newmiller
https://www.bioconductor.org/help/

On August 1, 2020 4:01:08 AM PDT, Anas Jamshed <[hidden email]> wrote:

>I choose microarray data GSE75693 of 30 patients with stable kidney
>transplantation and 15 with BKVN to identify differentially expressed
>genes
>(DEGs). I performed this in GEO2R and find R script there and Runs R
>script
>Successfully on R studio as well. The R script is :
>
> # Differential expression analysis with limma
>
>library(Biobase)
>library(GEOquery)
>library(limma)
># load series and platform data from GEO
>
>gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)if
>(length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
><- 1
>gset <- gset[[idx]]
># make proper column names to match toptable
>fvarLabels(gset) <- make.names(fvarLabels(gset))
># group names for all samples
>gsms <- paste0("000000000000000000000000000000XXXXXXXXXXXXXXX11111",
>        "1111111111XXXXXXXXXXXXXXXXXXX")
>sml <- c()for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
># eliminate samples marked as "X"
>sel <- which(sml != "X")
>sml <- sml[sel]
>gset <- gset[ ,sel]
># log2 transform
>exprs(gset) <- log2(exprs(gset))
># set up the data and proceed with analysis
>sml <- paste("G", sml, sep="")    # set group names
>fl <- as.factor(sml)
>gset$description <- fl
>design <- model.matrix(~ description + 0, gset)
>colnames(design) <- levels(fl)
>fit <- lmFit(gset, design)
>cont.matrix <- makeContrasts(G1-G0, levels=design)
>fit2 <- contrasts.fit(fit, cont.matrix)
>fit2 <- eBayes(fit2, 0.01)
>tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)
>
>tT <- subset(tT,
>select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
>DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2)
>
>After running this no genes are found plz help me
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RNA Seq Analysis in R

Matthew McCormack
As with the previous post, I agree that Bioconductor will be a better
place to ask this question.

As a quick thought you also might try to adjust the p-value in the last
line:

DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2). You could play
around with the P.Value, 0.01 is pretty low, you could try 0.05 and
maybe abs(logFC) > 1.

But, first you should try to print out tT with something like
write.table(tT, file = TopTable.txt, sep = "\t").

This will write out tT to a tab-delimited text file (in the directory
that you are working in) that you can import into Excel and then inspect
the logFC and p-values for the top 1250 genes.

Matthew

On 8/1/20 1:13 PM, Jeff Newmiller wrote:

>          External Email - Use Caution
>
> https://www.bioconductor.org/help/
>
> On August 1, 2020 4:01:08 AM PDT, Anas Jamshed <[hidden email]> wrote:
>> I choose microarray data GSE75693 of 30 patients with stable kidney
>> transplantation and 15 with BKVN to identify differentially expressed
>> genes
>> (DEGs). I performed this in GEO2R and find R script there and Runs R
>> script
>> Successfully on R studio as well. The R script is :
>>
>> # Differential expression analysis with limma
>>
>> library(Biobase)
>> library(GEOquery)
>> library(limma)
>> # load series and platform data from GEO
>>
>> gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)if
>> (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx
>> <- 1
>> gset <- gset[[idx]]
>> # make proper column names to match toptable
>> fvarLabels(gset) <- make.names(fvarLabels(gset))
>> # group names for all samples
>> gsms <- paste0("000000000000000000000000000000XXXXXXXXXXXXXXX11111",
>>         "1111111111XXXXXXXXXXXXXXXXXXX")
>> sml <- c()for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }
>> # eliminate samples marked as "X"
>> sel <- which(sml != "X")
>> sml <- sml[sel]
>> gset <- gset[ ,sel]
>> # log2 transform
>> exprs(gset) <- log2(exprs(gset))
>> # set up the data and proceed with analysis
>> sml <- paste("G", sml, sep="")    # set group names
>> fl <- as.factor(sml)
>> gset$description <- fl
>> design <- model.matrix(~ description + 0, gset)
>> colnames(design) <- levels(fl)
>> fit <- lmFit(gset, design)
>> cont.matrix <- makeContrasts(G1-G0, levels=design)
>> fit2 <- contrasts.fit(fit, cont.matrix)
>> fit2 <- eBayes(fit2, 0.01)
>> tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)
>>
>> tT <- subset(tT,
>> select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
>> DEGs = subset(tT, P.Value < 0.01 & abs(logFC) > 2)
>>
>> After running this no genes are found plz help me
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: RNA Seq Analysis in R

Rasmus Liland-3
On 2020-08-01 15:52 -0400, Matthew McCormack wrote:
| On 8/1/20 1:13 PM, Jeff Newmiller wrote:
| | On August 1, 2020 4:01:08 AM PDT, Anas Jamshed wrote:
| | | I performed this in GEO2R and find
| | | R script there and Runs R script

Anas, how did you come up with this
script at all by reading the article?

How can you be sure that
limma::lmFit/limma::eBayes procedure was
the one Jia et al. used in their
article?

The three author emails are listed on
page 1 of the article.

| | | After running this no genes are
| | | found plz help me
| |
| | https://www.bioconductor.org/help/
|
| As with the previous post, I agree
| that Bioconductor will be a better
| place to ask this question.
|
| As a quick thought you also might try
| to adjust the p-value in the last
| line:
 
This is the "distribution" of
possible log2 Fold Change in tT:

        > tab <- table(signif(tT$logFC, 1))
        > tab[as.character(sort(
        +   as.numeric(names(tab)),
        +   decreasing=F))]
         -0.5  -0.4  -0.3  -0.2  -0.1 -0.09   0.1
            1    25   158   376   185     7    49
          0.2   0.3   0.4   0.5   0.6   0.7
          250   140    42    11     4     2

... knowing full well “regulated” is
supposed to be abs(logFC)>1, we can
instead select above .5 there to get
the few up-regulated ones ...

        > rownames(tT) <- NULL
        > subset(x=tT,
        +   subset=
        +     adj.P.Val < .01 &
        +     abs(logFC) > .5,
        +   select=adj.P.Val:Gene.symbol)
               adj.P.Val      P.Value        t
        4   7.457501e-05 5.894525e-09 7.075753
        5   7.457501e-05 7.877860e-09 6.993182
        9   1.170092e-04 1.926078e-08 6.738920
        29  2.565179e-04 1.432599e-07 6.168230
        42  3.202947e-04 2.511181e-07 6.008168
        60  4.039665e-04 4.433103e-07 5.845695
        343 1.043185e-03 6.555444e-06 5.066127
        475 1.391091e-03 1.208538e-05 4.885755
                    B     logFC       Gene.symbol
        4   10.264385 0.6225559             REG1A
        5    9.996103 0.6630585              TNMD
        9    9.168329 0.5138611            NKAIN4
        29   7.306904 0.5538644              C1QB
        42   6.785641 0.5530439             ISG20
        60   6.257651 0.5082288              GZMH
        343  3.755608 0.5543619 MIR155///MIR155HG
        475  3.188253 0.7264114            CXCL13

... none of which are in the network of
important proteins in figure 5 on page 6.

Best,
Rasmus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment