negative vector length when merging data frames

negative vector length when merging data frames

anikaM
Hello,

I have two data frames like this:

> head(l4)
    X1    X2 X3 X4  X5  variant_id pval_nominal     gene_id.LCL
1 chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
2 chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
3 chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
4 chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
5 chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
6 chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232
> head(asign)
              gene  chr                chr_pos   pos p.val.Retina
1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
> m = merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
Error in merge.data.frame(l4, asign, by.x = c("X1", "X2"), by.y = c("chr",  :
  negative length vectors are not allowed
> sapply(l4,class)
          X1           X2           X3           X4           X5   variant_id
 "character"  "character"  "character"  "character"  "character"  "character"
pval_nominal  gene_id.LCL
   "numeric"  "character"
> sapply(asign,class)
        gene          chr      chr_pos          pos p.val.Retina
 "character"  "character"  "character"  "character"  "character"

Please advise as to why I am getting this error when merging?

Thanks
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: negative vector length when merging data frames

anikaM
I also tried left_join but I got: Error: std::bad_alloc

> df3 <- left_join(l4, asign, by = c("chr","pos"))
Error: std::bad_alloc
> dim(l4)
[1] 166941635         8
> dim(asign)
[1] 107371528         5
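
For scale, here is a back-of-the-envelope sketch of what these dimensions imply. It assumes a 64-bit R build, where each cell of a character or numeric column takes at least 8 bytes (a pointer or a double); the string data itself comes on top of that. The names rows, cols, bytes and gib are only illustrative.

```r
# Dimensions reported by dim(l4) and dim(asign) above.
rows <- c(l4 = 166941635, asign = 107371528)
cols <- c(l4 = 8, asign = 5)

# Assumption: 8 bytes per cell (64-bit pointer or double); string data is extra.
bytes <- rows * cols * 8
gib <- bytes / 2^30
print(round(gib, 2))

# Both inputs are below the largest value an R integer can hold (2^31 - 1) ...
print(rows < .Machine$integer.max)

# ... but a join returns one row per matching pair of keys, so the result can
# exceed that limit when a key occurs several times in both tables.
```

Under that assumption the two inputs alone need roughly 14 GiB before any join is attempted, which would be consistent with the std::bad_alloc reported above.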

Re: negative vector length when merging data frames

Jim Lemon-4
In reply to this post by anikaM
Hi Ana,
When I run this example taken from your email:

l4<-read.table(text="X1 X2 X3 X4  X5 variant_id pval_nominal gene_id.LCL
chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232",
header=TRUE,stringsAsFactors=FALSE)
asign<-read.table(text="gene  chr  chr_pos   pos p.val.Retina
ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572",
header=TRUE,stringsAsFactors=FALSE)
merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
 [1] X1           X2           X3           X4           X5
 [6] variant_id   pval_nominal gene_id.LCL  gene         chr_pos
[11] p.val.Retina
<0 rows> (or 0-length row.names)

It works okay, but there are no matches in the join. So I can't even
guess what the problem is.

Jim

Re: negative vector length when merging data frames

Jim Lemon-4
In reply to this post by anikaM
Ah, it looks like a memory allocation problem.

Jim
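
One way this error can arise (an assumption here, since the full data are not available): merge() builds one output row per matching pair of keys, so with repeated (chr, pos) values on both sides the number of result rows can exceed .Machine$integer.max, and the length computation may then overflow. The sketch below uses made-up toy data with the thread's key column names and estimates how many rows an inner join would return without performing it. The names kx, ky, shared and expected_rows are illustrative.

```r
# Toy stand-ins for l4 and asign (hypothetical values, key columns only).
l4 <- data.frame(X1 = c("chr1", "chr1", "chr1"),
                 X2 = c("100", "100", "200"),
                 stringsAsFactors = FALSE)
asign <- data.frame(chr = c("chr1", "chr1", "chr1"),
                    pos = c("100", "100", "300"),
                    stringsAsFactors = FALSE)

# Count how often each (chromosome, position) key occurs on each side.
kx <- table(paste(l4$X1, l4$X2, sep = ":"))
ky <- table(paste(asign$chr, asign$pos, sep = ":"))

# An inner join yields n_x * n_y rows for every key present in both tables.
shared <- intersect(names(kx), names(ky))
expected_rows <- sum(as.numeric(kx[shared]) * as.numeric(ky[shared]))

print(expected_rows)
print(expected_rows > .Machine$integer.max)
```

On the full tables, building the keys with paste() and table() would itself be costly, so this is meant to show the counting idea rather than to be run unchanged on 166 million rows.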

Re: negative vector length when merging data frames

anikaM
In reply to this post by Jim Lemon-4
Hi Jim,

I think one of the issues is that the data frames are so big:
> dim(l4)
[1] 166941635         8
> dim(asign)
[1] 107371528         5

so my small example would not reproduce the error.

Re: negative vector length when merging data frames

Jim Lemon-4
Yes. Have you tried the bigmemory package?

Jim

Re: negative vector length when merging data frames

anikaM
No. Could you please send me an example of what the command would look like in my case?

Re: negative vector length when merging data frames

Jim Lemon-4
I don't have it installed - that was merely a suggestion. I notice
that both the data.table and dplyr packages come up as possibilities
when searching for "merge big datasets in r". Apparently, if you have a
database manager available, the best way is to read the two datasets
into tables and do the join via SQL or whatever query language it offers.

Jim
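
As a rough illustration of the data.table route mentioned above, here is a minimal sketch on made-up toy data that reuses the thread's column names. It assumes the data.table package is installed; whether it would cope with the full tables depends on available memory and on how many rows the join produces, so treat it as a starting point rather than a tested solution for the real data.

```r
library(data.table)

# Toy stand-ins for the two tables (hypothetical values, real column names).
# For existing data frames, setDT(l4) would convert in place without copying.
l4 <- data.table(X1 = c("chr1", "chr1", "chr1"),
                 X2 = c("13550", "14671", "14677"),
                 pval_nominal = c(0.375614, 0.474708, 0.699887))
asign <- data.table(chr = c("chr1", "chr1"),
                    pos = c("13550", "14677"),
                    p.val.Retina = c("0.381708", "0.218132"))

# Inner join on (X1, X2) = (chr, pos). With allow.cartesian = FALSE,
# data.table stops with an error if the join would produce many more rows
# than the inputs, instead of silently building a huge result.
m <- merge(l4, asign, by.x = c("X1", "X2"), by.y = c("chr", "pos"),
           allow.cartesian = FALSE)
print(m)
```

The `1:` row labels in the head(asign) output earlier in the thread suggest asign may already be a data.table; the original call still dispatched to merge.data.frame because the first argument, l4, was a plain data frame.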

Re: negative vector length when merging data frames

anikaM
Thanks, but I would need a solution in R.

On Wed, Oct 23, 2019 at 6:31 PM Jim Lemon <[hidden email]> wrote:

>
> I don't have it installed - that was merely a suggestion. I notice
> that both data.table and dplyr packages are mentioned as possibilities
> for "merge big datasets in r". Apparently the best way to do it if you
> have a database manager is to read the two datasets into tables and do
> the join via SQL or whatever language is available.
>
> Jim
>
> On Thu, Oct 24, 2019 at 10:17 AM Ana Marija <[hidden email]> wrote:
> >
> > no can you please send me an example how the command would look like in my case?
> >
> > On Wed, Oct 23, 2019 at 6:16 PM Jim Lemon <[hidden email]> wrote:
> > >
> > > Yes. Have you tried the bigmemory package?
> > >
> > > Jim
> > >
> > > On Thu, Oct 24, 2019 at 10:08 AM Ana Marija <[hidden email]> wrote:
> > > >
> > > > Hi Jim,
> > > >
> > > > I think one of the issue is that data frames are so big,
> > > > > dim(l4)
> > > > [1] 166941635         8
> > > > > dim(asign)
> > > > [1] 107371528         5
> > > >
> > > > so my example would not reproduce the error
> > > >
> > > > On Wed, Oct 23, 2019 at 6:05 PM Jim Lemon <[hidden email]> wrote:
> > > > >
> > > > > Hi Ana,
> > > > > When I run this example taken from your email:
> > > > >
> > > > > l4<-read.table(text="X1 X2 X3 X4  X5 variant_id pval_nominal gene_id.LCL
> > > > > chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
> > > > > chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
> > > > > chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
> > > > > chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
> > > > > chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
> > > > > chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232",
> > > > > header=TRUE,stringsAsFactors=FALSE)
> > > > > asign<-read.table(text="gene  chr  chr_pos   pos p.val.Retina
> > > > > ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
> > > > > ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
> > > > > ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
> > > > > ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
> > > > > ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
> > > > > ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572",
> > > > > header=TRUE,stringsAsFactors=FALSE)
> > > > > merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
> > > > >  [1] X1           X2           X3           X4           X5
> > > > > [6] variant_id   pval_nominal gene_id.LCL  gene         chr_pos
> > > > > [11] p.val.Retina
> > > > > <0 rows> (or 0-length row.names)
> > > > >
> > > > > It works okay, but there are no matches in the join. So I can't even
> > > > > guess what the problem is.
> > > > >
> > > > > Jim
> > > > >
> > > > > On Thu, Oct 24, 2019 at 9:33 AM Ana Marija <[hidden email]> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I have two data frames like this:
> > > > > >
> > > > > > > head(l4)
> > > > > >     X1    X2 X3 X4  X5  variant_id pval_nominal     gene_id.LCL
> > > > > > 1 chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
> > > > > > 2 chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
> > > > > > 3 chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
> > > > > > 4 chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
> > > > > > 5 chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
> > > > > > 6 chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232
> > > > > > > head(asign)
> > > > > >               gene  chr                chr_pos   pos p.val.Retina
> > > > > > 1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
> > > > > > 2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
> > > > > > 3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
> > > > > > 4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
> > > > > > 5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
> > > > > > 6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
> > > > > > > m = merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
> > > > > > Error in merge.data.frame(l4, asign, by.x = c("X1", "X2"), by.y = c("chr",  :
> > > > > >   negative length vectors are not allowed
> > > > > > > sapply(l4,class)
> > > > > >           X1           X2           X3           X4           X5   variant_id
> > > > > >  "character"  "character"  "character"  "character"  "character"  "character"
> > > > > > pval_nominal  gene_id.LCL
> > > > > >    "numeric"  "character"
> > > > > > > sapply(asign,class)
> > > > > >         gene          chr      chr_pos          pos p.val.Retina
> > > > > >  "character"  "character"  "character"  "character"  "character"
> > > > > >
> > > > > > Please advise as to why I am getting this error when merging?
> > > > > >
> > > > > > Thanks
> > > > > > Ana
> > > > > >
> > > > > > ______________________________________________
> > > > > > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > > > and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: negative vector length when merging data frames

Duncan Murdoch-2
In reply to this post by anikaM
On 23/10/2019 7:04 p.m., Ana Marija wrote:
> I also tried left_join but I got: Error: std::bad_alloc
>
>> df3 <- left_join(l4, asign, by = c("chr","pos"))
> Error: std::bad_alloc

Looks like bugs in whatever packages you're finding "left_join" (and
previously "merge") in.  Are those from dplyr and base?  Showing us
str(l4), str(asign), and sessionInfo() would be helpful.
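
A minimal sketch of how that information could be gathered. The two data frames below are tiny stand-ins (an assumption, since the real l4 and asign are not available here); on the real objects only the str(), class(), and sessionInfo() calls would be needed.

```r
# Tiny stand-ins for the real tables (hypothetical values).
l4 <- data.frame(X1 = "chr1", X2 = "13550", stringsAsFactors = FALSE)
asign <- data.frame(chr = "chr1", pos = "13110", stringsAsFactors = FALSE)

# Structure and class of both inputs; merge() dispatches on the class of
# its first argument, so class(l4) decides which merge method runs.
str(l4)
str(asign)
print(class(l4))
print(class(asign))

# Where the merge generic itself is defined.
merge_home <- environmentName(environment(merge))
print(merge_home)

# R version, platform, and attached package versions.
info <- utils::sessionInfo()
print(info)
```

In the output quoted in this thread, head(asign) labels its rows "1:", "2:", and so on, which is how data.table prints, so reporting class(asign) alongside str(asign) seems worthwhile.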

Duncan Murdoch

Re: negative vector length when merging data frames

anikaM
I am using R 3.6.1 and these libraries:
library(data.table)
library(dplyr)

Re: negative vector length when merging data frames

Jeff Newmiller
Ana... contributed packages like data.table and dplyr are developed completely independently of R and have their own version numbers; both also give instructions for reporting bugs in their package descriptions [1][2].

As for getting help here, you really need to supply ALL of the information requested to make progress on clarifying next steps... Duncan asked for several items that you did not provide.

Also, note that dplyr and data.table take very different approaches to handling data, and have been known not to play well with each other. At the very least, I would suggest using as.data.frame to convert to a standard data representation before switching from functions in one of these packages to functions in the other.

[1] https://cran.r-project.org/web/packages/data.table/index.html

[2] https://cran.r-project.org/web/packages/dplyr/index.html
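
A minimal sketch of the as.data.frame step, assuming a one-row stand-in for asign (the values are made up). If data.table is not installed, the sketch falls back to a data frame with an extra, hypothetical class in front, which as.data.frame strips in the same way.

```r
# Build a small table with a package-specific class.
if (requireNamespace("data.table", quietly = TRUE)) {
  asign <- data.table::data.table(chr = "chr1", pos = "13110")
} else {
  # Fallback stand-in: "stand_in_table" is a hypothetical class name.
  asign <- structure(
    data.frame(chr = "chr1", pos = "13110", stringsAsFactors = FALSE),
    class = c("stand_in_table", "data.frame")
  )
}
print(class(asign))

# Convert to a plain data frame before handing it to another package.
asign_df <- as.data.frame(asign)
print(class(asign_df))
```

Note that as.data.frame makes a copy, which matters for tables with over a hundred million rows; whether that copy fits in memory is something to check first.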

Re: negative vector length when merging data frames

Michael Dewey-3
In reply to this post by anikaM
Dear Ana

Since this appears to be genetics data, have you thought of looking to
Bioconductor for help? I do not use genetic data sets, but people there
must use big files every day, three times before breakfast.

Michael

--
Michael
http://www.dewey.myzen.co.uk/home.html

Re: negative vector length when merging data frames

Rui Barradas
In reply to this post by anikaM
Hello,

Sometimes sqldf::sqldf tends to save memory. Maybe if you try

library(sqldf)

sqldf('select l4.*, asign.gene, asign.chr_pos, asign.`p.val.Retina`
       from l4
       inner join asign
       on X1 = asign.chr and X2 = asign.pos')

Or you can filter the rows that match first, then merge the results.
Something along the lines of

# read in only the columns needed with fread, it's fast
l4join <- data.table::fread(l4_file, select = c("X1", "X2"))
ajoin <- data.table::fread(asign_file, select = c("chr", "pos"))

# create indices with the matches on both sides
i1 <- (l4join$X1 %in% ajoin$chr) & (l4join$X2 %in% ajoin$pos)
i2 <- (ajoin$chr %in% l4join$X1) & (ajoin$pos %in% l4join$X2)

rm(l4join, ajoin)   # don't need this any more, remove them

# now the real fread's
l4 <- data.table::fread(l4_file)
asign <- data.table::fread(asign_file)

# extract the relevant rows and merge
res <- l4[i1, ]
res2 <- asign[i2, setdiff(names(asign), names(l4))]
merge(res, res2, by.x = c("X1", "X2"), by.y = c("chr", "pos"))
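
To make the filter-first idea concrete, here is a small base R sketch on made-up rows. It is a variation rather than the code above: it matches on a combined chr:pos key (an assumption, so that a row must match on both columns together), and it estimates how many rows the merge would return before running it. A result with more than .Machine$integer.max rows is one common way to reach "negative length vectors are not allowed", although nothing in this thread confirms that this is the cause here.

```r
# Made-up stand-ins for the two large tables (hypothetical values).
l4 <- data.frame(
  X1 = c("chr1", "chr1", "chr2"),
  X2 = c("13550", "14671", "500"),
  pval_nominal = c(0.375614, 0.474708, 0.9),
  stringsAsFactors = FALSE
)
asign <- data.frame(
  chr = c("chr1", "chr1", "chr3"),
  pos = c("14671", "10177", "500"),
  p.val.Retina = c("0.38", "0.95", "0.21"),
  stringsAsFactors = FALSE
)

# Combined key, so chromosome and position must match in the same row.
key_l4 <- paste(l4$X1, l4$X2, sep = ":")
key_as <- paste(asign$chr, asign$pos, sep = ":")

# Rows per key on each side; a key seen m times in l4 and n times in
# asign contributes m * n rows to an inner join.
n_l4 <- table(key_l4)
n_as <- table(key_as)
shared <- intersect(names(n_l4), names(n_as))
expected_rows <- sum(as.numeric(n_l4[shared]) * as.numeric(n_as[shared]))
fits <- expected_rows <= .Machine$integer.max

# Keep only rows whose key occurs on both sides, then merge what is left.
res <- merge(l4[key_l4 %in% shared, ], asign[key_as %in% shared, ],
             by.x = c("X1", "X2"), by.y = c("chr", "pos"))
print(expected_rows)
print(fits)
print(res)
```

If expected_rows turns out to be very large on the real data, that points to many rows sharing the same chr and pos, in which case adding further key columns (for example the alleles) might be needed before any join can succeed.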


Hope this helps,

Rui Barradas

Re: negative vector length when merging data frames

anikaM
Hi Rui,

Thank you so much for this. I tried sqldf, but it didn't help.
Next I tried your second method and followed your steps until:

> res2 <- asign[i2, setdiff(names(asign), names(l4))]
> m=merge(res, res2, by.x = c("chr", "pos"), by.y = c("chr", "pos"))
Error in merge.data.table(res, res2, by.x = c("chr", "pos"), by.y = c("chr",  :
  Elements listed in `by.y` must be valid column names in y.
> head(res)
    chr   pos a1 a2  a3         variant_id pval_nominal           gene_id
1: chr1 54490  G  A b38 chr1_54490_G_A_b38     0.608495 ENSG00000227232.5
2: chr1 58814  G  A b38 chr1_58814_G_A_b38     0.295211 ENSG00000227232.5
3: chr1 60351  A  G b38 chr1_60351_A_G_b38     0.439788 ENSG00000227232.5
4: chr1 61920  G  A b38 chr1_61920_G_A_b38     0.319528 ENSG00000227232.5
5: chr1 63671  G  A b38 chr1_63671_G_A_b38     0.237739 ENSG00000227232.5
6: chr1 64931  G  A b38 chr1_64931_G_A_b38     0.276679 ENSG00000227232.5
> head(res2)
[1] "gene"         "chr_pos"      "p.val.Retina"
> dim(res)
[1] 111478253         8
> head(l4)
    chr   pos a1 a2  a3         variant_id pval_nominal           gene_id
1: chr1 13550  G  A b38 chr1_13550_G_A_b38     0.375614 ENSG00000227232.5
2: chr1 14671  G  C b38 chr1_14671_G_C_b38     0.474708 ENSG00000227232.5
3: chr1 14677  G  A b38 chr1_14677_G_A_b38     0.699887 ENSG00000227232.5
4: chr1 16841  G  T b38 chr1_16841_G_T_b38     0.127895 ENSG00000227232.5
5: chr1 16856  A  G b38 chr1_16856_A_G_b38     0.627822 ENSG00000227232.5
6: chr1 17005  A  G b38 chr1_17005_A_G_b38     0.802803 ENSG00000227232.5
> head(asign)
              gene  chr                chr_pos   pos p.val.Retina
1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
> length(i2)
[1] 107371528

Everything is the same as I stated initially in the problem, except
that, as you can see, I renamed the columns in l4, so instead of
X1 and X2 I now have "chr" and "pos".

Do you know why this command didn't return anything?
res2 <- asign[i2, setdiff(names(asign), names(l4))]
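
A small sketch of two things that may be going on, judging from the head(res2) output above; it uses only the column names shown in this post plus a made-up one-row table, so treat it as an illustration under those assumptions. First, once l4 also has columns named chr and pos, setdiff() removes them from the asign side, so by.y has nothing to match. Second, inside a data.table, a j argument such as setdiff(...) is evaluated as an ordinary expression and its value is returned, unless with = FALSE asks for column selection.

```r
# Column names as printed in this post.
l4_names <- c("chr", "pos", "a1", "a2", "a3",
              "variant_id", "pval_nominal", "gene_id")
asign_names <- c("gene", "chr", "chr_pos", "pos", "p.val.Retina")

# After the rename, the key columns are dropped along with the rest.
extra <- setdiff(asign_names, l4_names)
print(extra)

# Keep the key columns explicitly, then add the columns unique to asign.
keep <- union(c("chr", "pos"), extra)
print(keep)

if (requireNamespace("data.table", quietly = TRUE)) {
  # One-row stand-in for asign (hypothetical values).
  asign <- data.table::data.table(
    gene = "ENSG00000227232", chr = "chr1", chr_pos = "1:10177:A:AC",
    pos = "10177", p.val.Retina = "0.381708"
  )
  # j evaluated as an expression: the result is whatever setdiff() returns.
  as_value <- asign[, setdiff(names(asign), l4_names)]
  print(class(as_value))
  # j treated as a set of column names.
  as_cols <- asign[, keep, with = FALSE]
  print(names(as_cols))
}
```

With the key columns kept this way, the by.y = c("chr", "pos") argument in the merge call would at least refer to columns that exist in res2.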

On Thu, Oct 24, 2019 at 2:17 PM Rui Barradas <[hidden email]> wrote:

>
> Hello,
>
> Sometimes sqldf::sqldf tends to save memory. Maybe if you try
>
> library(sqldf)
>
> sqldf('select l4.*, asign.gene, asign.chr_pos, asign.`p.val.Retina`
>        from l4
>        inner join asign
>        on X1 = asign.chr and X2 = asign.pos')
>
> Or you can filter the rows that match first, then merge the results.
> Something along the lines of
>
> # read in only the columns needed with fread, it's fast
> l4join <- data.table::fread(l4_file, select = c("X1", "X2"))
> ajoin <- data.table::fread(asign_file, select = c("chr", "pos"))
>
> # create indices with the matches on both sides
> i1 <- (l4join$X1 %in% ajoin$chr) & (l4join$X2 %in% ajoin$pos)
> i2 <- (ajoin$chr %in% l4join$X1) & (ajoin$pos %in% l4join$X2)
>
> rm(l4join, ajoin)   # don't need this any more, remove them
>
> # now the real fread's
> l4 <- data.table::fread(l4_file)
> asign <- data.table::fread(asign_file)
>
> # extract the relevant rows and merge
> res <- l4[i1, ]
> res2 <- asign[i2, setdiff(names(asign), names(l4))]
> merge(res, res2, by.x = c("X1", "X2"), by.y = c("chr", "pos"))
>
>
> Hope this helps,
>
> Rui Barradas
>
>
>
>
>
>
> Às 00:08 de 24/10/19, Ana Marija escreveu:
> > Hi Jim,
> >
> > I think one of the issue is that data frames are so big,
> >> dim(l4)
> > [1] 166941635         8
> >> dim(asign)
> > [1] 107371528         5
> >
> > so my example would not reproduce the error
> >
> > On Wed, Oct 23, 2019 at 6:05 PM Jim Lemon <[hidden email]> wrote:
> >>
> >> Hi Ana,
> >> When I run this example taken from your email:
> >>
> >> l4<-read.table(text="X1 X2 X3 X4  X5 variant_id pval_nominal gene_id.LCL
> >> chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
> >> chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
> >> chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
> >> chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
> >> chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
> >> chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232",
> >> header=TRUE,stringsAsFactors=FALSE)
> >> asign<-read.table(text="gene  chr  chr_pos   pos p.val.Retina
> >> ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
> >> ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
> >> ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
> >> ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
> >> ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
> >> ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572",
> >> header=TRUE,stringsAsFactors=FALSE)
> >> merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
> >>  [1] X1           X2           X3           X4           X5
> >>  [6] variant_id   pval_nominal gene_id.LCL  gene         chr_pos
> >> [11] p.val.Retina
> >> <0 rows> (or 0-length row.names)
> >>
> >> It works okay, but there are no matches in the join. So I can't even
> >> guess what the problem is.
> >>
> >> Jim
> >>
> >> On Thu, Oct 24, 2019 at 9:33 AM Ana Marija <[hidden email]> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I have two data frames like this:
> >>>
> >>>> head(l4)
> >>>      X1    X2 X3 X4  X5  variant_id pval_nominal     gene_id.LCL
> >>> 1 chr1 13550  G  A b38 1:13550:G:A     0.375614 ENSG00000227232
> >>> 2 chr1 14671  G  C b38 1:14671:G:C     0.474708 ENSG00000227232
> >>> 3 chr1 14677  G  A b38 1:14677:G:A     0.699887 ENSG00000227232
> >>> 4 chr1 16841  G  T b38 1:16841:G:T     0.127895 ENSG00000227232
> >>> 5 chr1 16856  A  G b38 1:16856:A:G     0.627822 ENSG00000227232
> >>> 6 chr1 17005  A  G b38 1:17005:A:G     0.802803 ENSG00000227232
> >>>> head(asign)
> >>>                gene  chr                chr_pos   pos p.val.Retina
> >>> 1: ENSG00000227232 chr1           1:10177:A:AC 10177     0.381708
> >>> 2: ENSG00000227232 chr1 rs145072688:10352:T:TA 10352     0.959523
> >>> 3: ENSG00000227232 chr1            1:11008:C:G 11008     0.218132
> >>> 4: ENSG00000227232 chr1            1:11012:C:G 11012     0.218132
> >>> 5: ENSG00000227232 chr1            1:13110:G:A 13110     0.998262
> >>> 6: ENSG00000227232 chr1  rs201725126:13116:T:G 13116     0.438572
> >>>> m = merge(l4, asign, by.x=c("X1", "X2"), by.y=c("chr", "pos"))
> >>> Error in merge.data.frame(l4, asign, by.x = c("X1", "X2"), by.y = c("chr",  :
> >>>    negative length vectors are not allowed
> >>>> sapply(l4,class)
> >>>            X1           X2           X3           X4           X5   variant_id
> >>>   "character"  "character"  "character"  "character"  "character"  "character"
> >>> pval_nominal  gene_id.LCL
> >>>     "numeric"  "character"
> >>>> sapply(asign,class)
> >>>          gene          chr      chr_pos          pos p.val.Retina
> >>>   "character"  "character"  "character"  "character"  "character"
> >>>
> >>> Please advise as to why I am getting this error when merging?
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >
> >
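
A possible way to check the size problem before merging, offered as a sketch rather than a diagnosis: in R, "negative length vectors are not allowed" usually means that some result would need more than 2^31 - 1 elements. With 166941635 and 107371528 rows, a many-to-many match on (X1, X2) and (chr, pos) could plausibly reach that limit. The toy data frames x and y below are hypothetical stand-ins, not the real l4 and asign, and the names kx, ky, shared, expected_rows and too_big are made up for illustration. The idea is to count how many rows an inner join would produce without actually building it.

```r
# Hypothetical toy tables standing in for l4 and asign (not the real data)
x <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                pos = c("100", "100", "200", "300"),
                stringsAsFactors = FALSE)
y <- data.frame(chr = c("chr1", "chr1", "chr2", "chr2"),
                pos = c("100", "100", "300", "300"),
                stringsAsFactors = FALSE)

# Count how often each (chr, pos) key occurs on each side
kx <- table(paste(x$chr, x$pos, sep = ":"))
ky <- table(paste(y$chr, y$pos, sep = ":"))

# An inner join produces count_x * count_y rows for every shared key.
# as.numeric() avoids integer overflow in the product and the sum.
shared <- intersect(names(kx), names(ky))
expected_rows <- sum(as.numeric(kx[shared]) * as.numeric(ky[shared]))

# TRUE would mean the merged result cannot be indexed with 32-bit integers
too_big <- expected_rows > .Machine$integer.max

expected_rows
too_big
```

On the toy data the count agrees with nrow(merge(x, y, by = c("chr", "pos"))). On the real tables, building the pasted keys is itself memory-hungry, so a grouped count in data.table may be a better fit there. If the expected row count turns out to be huge, duplicated (chr, pos) pairs on both sides are the likely reason, and adding the gene column to the join keys may be worth considering.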
