Loading large tar.gz XenaHub Data into R


Spencer Brackett
Good evening,

I am attempting to load the following Xena dataset:
https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz

I am trying to unpack the dataset and read it into R as a table, but
because of the file's size I am having some trouble. Here are the
commands I have tried so far:

library(data.table)  # provides fread()
HumanMethylation450 <- fread("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")

readLines("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")

### The two attempts above failed with warning messages from R ###

Methyl <- read.delim("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")

## This attempt is still running, and has been for quite some time ##

Any ideas as to what else I could try?

Best,

Spencer

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Loading large tar.gz XenaHub Data into R

Bert Gunter-2
These are gzipped files, I assume. So see ?gzfile and associated info
for how to open a gzip connection and read from it. You may also
prefer to search (e.g. at rseek.org) on "read a gzipped file" or
similar for possible alternatives.

Of course, if they're not gzipped files, then ignore the above. If
they are, your current approach is hopeless.
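A minimal sketch of the gzfile() approach, assuming the file has already been saved to a local path (the helper name `count_gz_lines` and the demo file are hypothetical). Opening the connection explicitly lets readLines() pull the decompressed text a chunk at a time, so a large file never has to sit in memory all at once:

```r
# Hypothetical helper: stream a local .gz text file through a gzip
# connection, `chunk` lines at a time, and count the lines.
count_gz_lines <- function(path, chunk = 10000L) {
  con <- gzfile(path, open = "rt")  # decompresses transparently while reading
  on.exit(close(con))
  total <- 0L
  repeat {
    lines <- readLines(con, n = chunk)  # continues where the previous call stopped
    if (length(lines) == 0L) break      # end of file reached
    total <- total + length(lines)      # replace with real per-chunk processing
  }
  total
}

# Usage with a small gzipped file written for the demo:
demo <- tempfile(fileext = ".gz")
out <- gzfile(demo, open = "w")
writeLines(sprintf("probe%d\t%.2f", 1:25, (1:25) / 100), out)
close(out)
count_gz_lines(demo, chunk = 10L)  # 25
```

The chunk size trades memory for the number of reads; the per-chunk body is where filtering or accumulation would go.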


Cheers,
Bert

On Thu, Aug 1, 2019 at 3:13 PM Spencer Brackett
<[hidden email]> wrote:

Re: Loading large tar.gz XenaHub Data into R

R help mailing list-2
By the way, instead of saying only that there were warnings, it would be
nice to show some of them.  E.g.,
> z <- readLines("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")
[ Hit control-C or Esc to interrupt, or wait a long time ]
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In readLines("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz") :
  line 1 appears to contain an embedded nul
2: In readLines("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz") :
  line 4 appears to contain an embedded nul
3: In readLines("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz") :
  line 7 appears to contain an embedded nul
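Warnings about embedded nuls are what one would expect when compressed bytes are read as if they were text. One quick way to test the gzip guess on a local copy is to inspect the first two bytes: every gzip stream begins with 0x1f 0x8b (RFC 1952). A minimal sketch; the helper name `is_gzip` and the demo files are hypothetical:

```r
# Hypothetical helper: TRUE if the file starts with the gzip magic bytes.
is_gzip <- function(path) {
  # Binary mode returns the bytes exactly as stored on disk;
  # raw = TRUE also disables R's check for compressed files.
  con <- file(path, open = "rb", raw = TRUE)
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Usage: one gzipped and one plain-text file written for the demo
gz  <- tempfile(fileext = ".gz")
txt <- tempfile(fileext = ".txt")
con <- gzfile(gz, open = "w"); writeLines("a\tb", con); close(con)
writeLines("a\tb", txt)
is_gzip(gz)   # TRUE
is_gzip(txt)  # FALSE
```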

Bert's guess looks right: the following gives 10 long lines of
reasonable-looking data. Remove the 'n=10' to get all of it.

z <- readLines(gzcon(url("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")), n=10)
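readLines() returns a character vector with one element per line, so turning `z` into a table takes one more step. A minimal sketch using read.delim(text = ...); it assumes the file is tab-separated with a header row of sample IDs and probe IDs in the first column, and the three lines below are made-up stand-ins for `z`, not real data:

```r
# Made-up stand-in for z <- readLines(gzcon(url(...))):
# a header row of sample IDs, then one row per probe (assumed layout).
z <- c("probe\ts1\ts2",
       "probe1\t0.12\t0.87",
       "probe2\t0.55\tNA")

# Parse the lines already in memory as a tab-delimited table;
# row.names = 1 turns the probe column into row names.
methyl <- read.delim(text = z, row.names = 1, check.names = FALSE)
dim(methyl)             # 2 2
methyl["probe1", "s1"]  # 0.12
```

This keeps both `z` and the parsed table in memory, which may matter for a file of this size.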

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 1, 2019 at 3:37 PM Bert Gunter <[hidden email]> wrote:


Re: Loading large tar.gz XenaHub Data into R

Spencer Brackett
Thank you both for your advice! The command
z <- readLines(gzcon(url("https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")))
worked out nicely.
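Because the file is large, it can also pay to download it once and read the local copy, so an interrupted or failed read does not mean fetching it again. A minimal sketch; the helper `read_gz_table` and its caching location are hypothetical, and it relies on download.file() with mode = "wb" (binary mode, which matters on Windows) and on gzfile() for decompression:

```r
# Hypothetical helper: cache a remote .gz file locally, then read it as a
# tab-delimited table through a gzip connection.
read_gz_table <- function(src, destfile = file.path(tempdir(), basename(src))) {
  if (grepl("^(https?|ftp)://", src)) {
    if (!file.exists(destfile)) {
      download.file(src, destfile, mode = "wb")  # binary mode keeps bytes intact
    }
    src <- destfile  # from here on, work with the local copy
  }
  # read.delim() opens the unopened gzfile connection and closes it afterwards
  read.delim(gzfile(src), check.names = FALSE)
}

# Usage (commented out because it downloads the full file from the thread):
# methyl <- read_gz_table(
#   "https://tcga.xenahubs.net/download/TCGA.GBMLGG.sampleMap/HumanMethylation450.gz")
```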

On Thu, Aug 1, 2019 at 6:47 PM William Dunlap <[hidden email]> wrote:
