how to create a txt file with parsed columns

how to create a txt file with parsed columns

anikaM
Hello,

I have two data frames:

head(a)
              GENE        rs       BETA
1  ENSG00000154803 rs2605134  0.0360182
2  ENSG00000154803 rs7405677  0.0525463
3  ENSG00000154803 rs7211573  0.0525531
4  ENSG00000154803 rs2746026  0.0466392
5  ENSG00000141030 rs2605134  0.0806140
6  ENSG00000141030 rs7405677  0.0251654
7  ENSG00000141030 rs7211573  0.0252775
8  ENSG00000141030 rs2746026  0.0976396
9  ENSG00000205309 rs2605134  0.0838975
10 ENSG00000205309 rs7405677 -0.2148500
11 ENSG00000205309 rs7211573 -0.2148170
12 ENSG00000205309 rs2746026  0.1013920
13 ENSG00000215030 rs2605134  0.1261050
14 ENSG00000215030 rs7405677  0.0165236
15 ENSG00000215030 rs7211573  0.0163509
16 ENSG00000215030 rs2746026  0.1201180
17 ENSG00000141026 rs2605134  0.0485897
18 ENSG00000141026 rs7405677 -0.0929964
19 ENSG00000141026 rs7211573 -0.0930321
20 ENSG00000141026 rs2746026  0.0623033

head(b)
          rs       GWAS
1  rs2605134  0.0315177
2  rs7405677 -0.0816389
3  rs7211573 -0.0797796
4  rs2746026  0.0199350
5 rs11658521  0.0728377
6  rs9914107  0.0720096
7 rs56964223  0.0723903

Data frame a has:
> length(unique(a$GENE))
[1] 51
> dim(a)
[1] 287   3

and the whole of data frame b is shown above.

I would like to create a txt file with one row for each rs in data
frame b and one column for each ENSG in data frame a, holding the
matching BETA value. If a particular ENSG has no matching rs from data
frame b, the value under it would be zero. So the txt file would have
7 rows (one for each unique rs in data frame b) and 53 columns (51
ENSG columns, one for the rs identifiers, and one for GWAS).

So one row of that txt file would look like this.

GENES      ENSG00000154803  ENSG00000141030  ENSG00000205309  ENSG00000215030  ENSG00000141026  GWAS
rs2605134  0.0360182        0.0806140        0.0838975        0.1261050        0.0485897        0.0315177


Please advise,
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to create a txt file with parsed columns

Jim Lemon-4
Hi Ana,
Is this what you want?

a<-read.table(text="GENE        rs       BETA
1  ENSG00000154803 rs2605134  0.0360182
2  ENSG00000154803 rs7405677  0.0525463
3  ENSG00000154803 rs7211573  0.0525531
4  ENSG00000154803 rs2746026  0.0466392
5  ENSG00000141030 rs2605134  0.0806140
6  ENSG00000141030 rs7405677  0.0251654
7  ENSG00000141030 rs7211573  0.0252775
8  ENSG00000141030 rs2746026  0.0976396
9  ENSG00000205309 rs2605134  0.0838975
10 ENSG00000205309 rs7405677 -0.2148500
11 ENSG00000205309 rs7211573 -0.2148170
12 ENSG00000205309 rs2746026  0.1013920
13 ENSG00000215030 rs2605134  0.1261050
14 ENSG00000215030 rs7405677  0.0165236
15 ENSG00000215030 rs7211573  0.0163509
16 ENSG00000215030 rs2746026  0.1201180
17 ENSG00000141026 rs2605134  0.0485897
18 ENSG00000141026 rs7405677 -0.0929964
19 ENSG00000141026 rs7211573 -0.0930321
20 ENSG00000141026 rs2746026  0.0623033",
header=TRUE,stringsAsFactors=FALSE)
b<-read.table(text="rs       GWAS
1  rs2605134  0.0315177
2  rs7405677 -0.0816389
3  rs7211573 -0.0797796
4  rs2746026  0.0199350
5 rs11658521  0.0728377
6  rs9914107  0.0720096
7 rs56964223  0.0723903",
header=TRUE,stringsAsFactors=FALSE)
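# keep the rows of a whose rs also occurs in b, and attach GWAS
ab<-merge(a,b,by="rs")
library(prettyR)
# reshape to one row per rs, spreading GENE and BETA across columns
abc<-stretch_df(ab,idvar="rs",to.stretch=c("GENE","BETA"))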

Jim
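
For comparison, the wide layout asked for in the question can also be sketched with base R's reshape(), without an extra package. This is a minimal sketch on a five-row subset of the data above, not the approach used in this reply; the object name wide and the prefix-stripping step are illustrative choices. Gene/rs pairs that do not occur in a come out as NA.

```r
# Five-row subset of data frame a from the thread (row names omitted).
a <- read.table(text = "GENE rs BETA
ENSG00000154803 rs2605134 0.0360182
ENSG00000154803 rs7405677 0.0525463
ENSG00000141030 rs2605134 0.0806140
ENSG00000141030 rs7405677 0.0251654
ENSG00000205309 rs2605134 0.0838975",
header = TRUE, stringsAsFactors = FALSE)

# One row per rs, one BETA column per GENE.
wide <- reshape(a, idvar = "rs", timevar = "GENE", direction = "wide")

# reshape() names the new columns "BETA.<gene>"; drop the prefix.
names(wide) <- sub("^BETA\\.", "", names(wide))
rownames(wide) <- NULL

print(wide)
```

The NA cells can be set to zero afterwards with wide[is.na(wide)] <- 0 if zeros are wanted for missing pairs.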

(In reply to Ana Marija, Mon, Dec 9, 2019 at 11:10 AM)

Re: how to create a txt file with parsed columns

anikaM
Thanks for getting back to me. I resolved my problem with this:

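library(reshape2)
# one row per rs, one column per GENE, filled with BETA
wide <- dcast(a, rs ~ GENE, value.var = "BETA")
# attach the GWAS column from b
d <- merge(wide, b, by = "rs")
# gene/rs pairs that do not occur in a become 0
d[is.na(d)] <- 0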
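
One caveat with the steps above: merge() defaults to an inner join, so any rs that appears in b but not in a is dropped, and nothing is written to disk yet. The sketch below is a base-R variant, assuming xtabs() in place of dcast(), that keeps every rs from b via all.y = TRUE and writes a tab-separated file. The data subset, the object names, and the file name genes_by_rs.txt are placeholders. Note that xtabs() sums BETA if a gene/rs pair occurs more than once.

```r
# Small subset of the thread's data: rs9914107 is in b but not in a.
a <- read.table(text = "GENE rs BETA
ENSG00000154803 rs2605134 0.0360182
ENSG00000154803 rs7405677 0.0525463
ENSG00000141030 rs2605134 0.0806140",
header = TRUE, stringsAsFactors = FALSE)
b <- read.table(text = "rs GWAS
rs2605134 0.0315177
rs7405677 -0.0816389
rs9914107 0.0720096",
header = TRUE, stringsAsFactors = FALSE)

# Cross-tabulate BETA by rs and GENE; pairs absent from a come out as 0.
wide <- as.data.frame.matrix(xtabs(BETA ~ rs + GENE, data = a))
wide$rs <- rownames(wide)

# all.y = TRUE keeps every rs in b, including those with no row in a.
d <- merge(wide, b, by = "rs", all.y = TRUE)
d[is.na(d)] <- 0

# Write a tab-separated text file (placeholder name, temporary directory).
out_file <- file.path(tempdir(), "genes_by_rs.txt")
write.table(d, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

With the full data from the question, this should give one row for each of the 7 rs values in b, with rs first, the gene columns next, and GWAS last.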

(In reply to Jim Lemon, Sun, Dec 8, 2019 at 11:03 PM)
