Help converting .txt to .csv file

Spencer Brackett
Good evening,

I am attempting to analyze the protein expression data contained within
these two ICGC, TCGA datasets (one for GBM and the other for LGG):

*File for GBM  protein expression*:
https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D

*File for LGG protein expression*:
https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D

When I tried to convert the files from .txt (opened in Notepad) to .csv (via
Excel), the data appeared in the columns as unorganized, seemingly random
text... not at all how a typical CSV should be arranged. I need the
datasets converted to .csv in order to analyze them in R, which is why
I am hoping someone here might help me do that. If not, is there
perhaps some other way that I could analyze the datasets in R, which again
were downloaded from the ICGC data portal?

Best,

Spencer Brackett

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Help converting .txt to .csv file

Bert Gunter-2
Inline.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Dec 26, 2018 at 3:04 PM Spencer Brackett <
[hidden email]> wrote:

> Good evening,
>
> I am attempting to analyze the protein expression data contained within
> these two ICGC, TCGA datasets (one for GBM and the other for LGG)
>
> ...
>   When I tried to transfer the files from .txt (via Notepad) to .csv (via
> Excel), the data appeared in the columns as unorganized and random
> script... not like how a typical csv should be arranged at all. I need the
> dataset to be converted into .csv in order to analyze it in R,


Huh?? Why do you think this? A csv is just a comma delimited text file.

R can input pretty much any kind of file, ONCE YOU KNOW THE FORMAT OF WHAT
YOU ARE INPUTTING. This should be provided by the links that you gave. Then
see ?read.table or, more generally, ?scan for how to read the (text) file
into R into whatever data structure you need. See also the R data
import/export manual. Or possibly post to the Bioconductor list where they
specialize in this sort of thing and may already have packages that can
access the repositories and bring in the data in the form you need them.
They also have lots of software there for analysis, too.
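
The advice above (learn the format first, then read) can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the small tab-separated sample file and its column names are made up here and merely stand in for whatever the portal actually delivers.

```r
# Hypothetical stand-in for a downloaded file: a tiny tab-separated sample.
path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene\tvalue",
             "DO1\tEGFR\t0.52",
             "DO2\tTP53\t-1.10"), path)

# Step 1: look at the raw text to learn the format (header? separator?).
first_lines <- readLines(path, n = 2)
print(first_lines)

# Step 2: check that every line has the same number of tab-separated fields.
n_fields <- count.fields(path, sep = "\t")
print(n_fields)

# Step 3: read the file using the separator identified above.
dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```

If the field counts differ between lines, that is usually a sign that the separator guess is wrong or that the file needs arguments such as quote or fill; see ?read.table.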

Cheers,
Bert







Re: Help converting .txt to .csv file

Richard M. Heiberger
I looked at the first file.  It gives an option to download as TSV
(tab separated values).
That is the same as CSV except with tabs instead of commas.
You do not need any external software to read it.  Read the downloaded
file directly into R.

read.delim looks as if it would work directly on the downloaded file.
?read.delim
The notation "\t" means the tab character.
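
A minimal sketch of that suggestion, assuming a tab-separated download. The file contents and column names below are invented for illustration; in practice the path would point at the file saved from the portal.

```r
# Hypothetical stand-in for the TSV downloaded from the portal.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("icgc_donor_id\tgene_name\tnormalized_expression_level",
             "DO0001\tEGFR\t0.52",
             "DO0002\tTP53\t-1.10"), tsv_path)

# read.delim() defaults to header = TRUE and sep = "\t".
gbm <- read.delim(tsv_path, stringsAsFactors = FALSE)
head(gbm)

# Optional: write a comma-separated copy if some other tool really needs one.
csv_path <- tempfile(fileext = ".csv")
write.csv(gbm, csv_path, row.names = FALSE)
```

The write.csv() step is only needed if a .csv file is wanted for another program; R itself can work with the data frame as soon as read.delim() returns.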

As an aside, stay away from Notepad. It is too naive for almost
anything interesting.
The specific case I often see is people reading Linux-style text files
with Notepad, which doesn't understand NL-terminated lines, so nicely
formatted text files become illegible.


Re: Help converting .txt to .csv file

Spencer Brackett
Mr. Heiberger,

 Thank you for the insight! I will try out your suggestion.

Best,

Spencer Brackett


Re: Help converting .txt to .csv file

Spencer Brackett
Hello again,

I tried reading the downloaded file directly into R as was suggested, but have
thus far been unsuccessful. This is what I got on my second
attempt...

> GBM protein_expression<-(file.choose(), header=TRUE, sep="\t")
Error: unexpected symbol in "GBM protein_expression"
> GBM protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE, sep="\t")
Error: unexpected symbol in "GBM protein_expression"
>

What part of the argument is in error?

Also, I tried importing the dataset as an Excel file in RStudio to see if I
could solve my problem that way. However, the import has been
stuck at 'retrieving preview data' and no data is appearing. Is the
data file perhaps too large or in the wrong format?




Fwd: UPDATE

Spencer Brackett
I tried importing the file without preview and received the following...

library(readxl)
> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
Error: Can't establish that the input is either xls or xlsx.
> View(GBM_protein_expression)
Error in View : object 'GBM_protein_expression' not found
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
  cannot open compressed file 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds', probable reason 'No such file or directory'
> library(readxl)
> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
readxl works best with a newer version of the tibble package.
You currently have tibble v1.4.2.
Falling back to column name repair from tibble <= v1.4.2.
Message displays once per session.
> View(GBM_protein_expression)

Also, the area above my console says that no data is available in the
table. Is this perhaps the result of lack of preview or the fact that the
excel file itself contains no numerical data, but only TRUE or FALSE
entries?


Re: Fwd: UPDATE

Jeff Newmiller
CSV and TSV are not Excel files. Yes, I know Excel will open them, but that does not make them Excel files.

Read a TSV file with read.table or read.csv, setting the sep argument to "\t".
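
Applied to the earlier attempt, that might look something like the sketch below. Two things differ from the calls that produced "unexpected symbol": the object name contains no space, and the reading function is actually named before the parentheses. The sample file is a made-up stand-in so the code runs on its own; in an interactive session file.choose() could supply the path instead.

```r
# Hypothetical stand-in for the downloaded TSV file.
path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene\tvalue",
             "DO1\tEGFR\t0.52",
             "DO2\tTP53\t-1.10"), path)

# Object names cannot contain spaces, and the function name must be present.
GBM_protein_expression <- read.table(path, header = TRUE, sep = "\t",
                                     stringsAsFactors = FALSE)

# read.csv() with sep = "\t" reads the same sample the same way.
same_data <- read.csv(path, sep = "\t", stringsAsFactors = FALSE)

# Interactive variant (not run here):
# GBM_protein_expression <- read.table(file.choose(), header = TRUE, sep = "\t")
```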


Re: Fwd: UPDATE

Jeff Newmiller
Please always reply-all to keep the list involved.

If you used Save As to change the data format to Excel AND the file extension to xlsx, then yes, you should be able to read with readxl. I don't recommend it, though... Excel often changes data silently and in irregularly located places in your file.
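
One way to avoid the Excel round trip is to read the original text file and keep every value exactly as written. The sketch below assumes a tab-separated file; the sample rows (identifiers with leading zeros, gene symbols of the kind that spreadsheet programs are commonly reported to turn into dates) are invented for illustration.

```r
# Hypothetical stand-in for the downloaded TSV file.
path <- tempfile(fileext = ".tsv")
writeLines(c("sample_id\tgene_name\tvalue",
             "007\tSEPT2\t1.50",
             "012\tMARCH1\t0.25"), path)

# colClasses = "character" turns off type guessing, so nothing is converted on import.
raw <- read.delim(path, colClasses = "character")

# Convert only the columns that are meant to be numeric, and do so explicitly.
raw$value <- as.numeric(raw$value)
str(raw)
```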

On December 26, 2018 7:38:16 PM PST, Spencer Brackett <[hidden email]> wrote:

>So even if I imported the file from ICGC to my desktop as an Excel
>file, and can view and save the data as such, it is still a TSV?
>> >>
>> >>
>> >>
>> >> On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett <
>> >> [hidden email]> wrote:
>> >>
>> >>> Mr. Heiberger,
>> >>>
>> >>>  Thank you for the insight! I will try out suggestion.
>> >>>
>> >>> Best,
>> >>>
>> >>> Spencer Brackett
>> >>>
>> >>> On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger
>> ><[hidden email]>
>> >>> wrote:
>> >>>
>> >>>> I looked at the first file.  It gives an option to download as
>TSV
>> >>>> (tab separated values).
>> >>>> That is the same as CSV except with tabs instead of commas.
>> >>>> You do not need any external software to read it.  Read the
>> >downloaded
>> >>>> file directly into R.
>> >>>>
>> >>>> read.delim looks as if it would work directly on the downloaded
>> >file.
>> >>>> ?read.delim
>> >>>> The notation "\t" means the tab character.
>> >>>>
>> >>>> As an aside, stay away from notepad. it is too naive for almost
>> >>>> anything interesting.
>> >>>> The specific case I often see is people reading linux-style text
>> >files
>> >>>> with notepad, which doesn't
>> >>>> understand NL terminated lines.  nicely formatted text files
>become
>> >>>> illegible.
>> >>>>
>> >>>> On Wed, Dec 26, 2018 at 6:04 PM Spencer Brackett
>> >>>> <[hidden email]> wrote:
>> >>>> >
>> >>>> > Good evening,
>> >>>> >
>> >>>> > I am attempting to anaylze the protein expression data
>contained
>> >within
>> >>>> > these two ICGC, TCGA datasets (one for GBM and the other for
>LGG)
>> >>>> >
>> >>>> > *File for GBM  protein expression*:
>> >>>> >
>> >>>>
>> >
>>
>https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>> >>>> >
>> >>>> > *File for LGG protein expression:*
>> >>>> >
>> >>>> >
>> >>>> > *
>> >>>>
>> >
>>
>https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>> >>>> > <
>> >>>>
>> >
>>
>https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>> >>>> >*
>> >>>> >
>> >>>> >   When I tried to transfer the files from .txt (via Notepad)
>to
>> >.csv
>> >>>> (via
>> >>>> > Excel), the data appeared in the columns as unorganized and
>> >random
>> >>>> > script... not like how a typical csv should be arranged at
>all. I
>> >need
>> >>>> the
>> >>>> > dataset to be converted into .csv in order to analyze it in R,
>> >which
>> >>>> is why
>> >>>> > I am hoping someone here might help me in doing that. If not,
>is
>> >there
>> >>>> > perhaps some other way that I could analyze the datatsets on
>R,
>> >which
>> >>>> again
>> >>>> > is downloaded from the dataportal ICGC?
>> >>>> >
>> >>>> > Best,
>> >>>> >
>> >>>> > Spencer Brackett
>> >>>> >
>> >>>> >         [[alternative HTML version deleted]]
>> >>>> >
>> >>>> > ______________________________________________
>> >>>> > [hidden email] mailing list -- To UNSUBSCRIBE and more,
>see
>> >>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>> > PLEASE do read the posting guide
>> >>>> http://www.R-project.org/posting-guide.html
>> >>>> > and provide commented, minimal, self-contained, reproducible
>> >code.
>> >>>>
>> >>>
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> >______________________________________________
>> >[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >PLEASE do read the posting guide
>> >http://www.R-project.org/posting-guide.html
>> >and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Sent from my phone. Please excuse my brevity.
>>

--
Sent from my phone. Please excuse my brevity.


Re: Fwd: UPDATE

Spencer Brackett
Sorry, my mistake.

So could I still use read.table, and should I try using a .txt version of
the file to avoid the silent changes you described?

Also, when I tried to simplify this process by downloading the dataset into
RStudio as opposed to the R GUI, I received the following...
> library(readxl)
> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
Error: Can't establish that the input is either xls or xlsx.
> View(GBM_protein_expression)
Error in View : object 'GBM_protein_expression' not found
Error in gzfile(file, mode) : cannot open the connection
In addition: Warning message:
In gzfile(file, mode) :
  cannot open compressed file 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds', probable reason 'No such file or directory'
> library(readxl)
> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
readxl works best with a newer version of the tibble package.
You currently have tibble v1.4.2.
Falling back to column name repair from tibble <= v1.4.2.
Message displays once per session.
> View(GBM_protein_expression)


The area above my console says that no data is available in the table. Is
this perhaps the result of the missing preview (which I did not complete at
the time I hit import, as the preview failed to load), or of the fact that
the Excel file itself contains no numerical data, only TRUE or FALSE
entries?



Re: Fwd: UPDATE

Richard M. Heiberger
This is wrong because the file is a csv file. read_excel is designed
for xls and xlsx files.

GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")

How did you get a csv? It downloads as tsv.

The statement you should use is in base R; no library() statement is needed.

GBM_protein_expression <- read.delim("C:/Users/Spencer/Desktop/GBM protein_expression.csv")

read.delim is the same as read.csv except that it sets the sep
argument to "\t".
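
To illustrate that equivalence, here is a small self-contained sketch. It does not use the downloaded file; it writes a three-row tab-separated file to a temporary directory, and the column names and values are invented for the example.

```r
# Hypothetical stand-in for the downloaded file: three rows, tab-separated.
tsv_path <- file.path(tempdir(), "example_expression.tsv")
writeLines(c("gene\texpression\tflag",
             "EGFR\t0.42\tTRUE",
             "TP53\t-1.10\tFALSE",
             "PTEN\t0.07\tTRUE"), tsv_path)

# read.delim() uses sep = "\t" by default.
via_delim <- read.delim(tsv_path)

# read.csv() gives the same result once sep is set to a tab.
via_csv <- read.csv(tsv_path, sep = "\t")

# read.table() needs both header and sep spelled out.
via_table <- read.table(tsv_path, header = TRUE, sep = "\t")

# Compare the two readers and look at the column types that were inferred.
print(identical(via_delim, via_csv))
str(via_delim)
```

The str() output also shows how a column holding only TRUE or FALSE is typed, which may be relevant to the question about logical entries.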





Re: Fwd: UPDATE

Caitlin Gibbons
Does this help, Spencer? The read.delim() function assumes a tab separator by default, but here I set it explicitly using the read.csv() function. The downloaded file is NOT an Excel file, so this should help.

GBM_protein_expression <- read.csv("C:/Users/Spencer/Desktop/GBM protein_expression.tsv", sep="\t")
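
Before reading, it may also help to confirm that R can see the file at the given path, since a mistyped path fails before any parsing happens. A rough sketch, using a temporary file whose name contains a space as a stand-in for the Desktop file (the file name and contents are assumptions for illustration):

```r
# Stand-in file with a space in its name, written to a temporary directory.
demo_path <- file.path(tempdir(), "GBM protein_expression_demo.tsv")
writeLines(c("sample\tprotein\tvalue",
             "s1\tP1\t0.5",
             "s2\tP2\t1.5"), demo_path)

# Check that the path resolves before trying to parse it.
if (!file.exists(demo_path)) {
  stop("File not found: ", demo_path)
}

# A space inside a quoted path is fine; the quotes must be straight
# double quotes, not the curly ones a phone keyboard may insert.
demo <- read.csv(demo_path, sep = "\t")

# Inspect the result in the console rather than relying on a viewer pane.
print(dim(demo))
print(head(demo))
```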

Sent from my iPhone

>>>>>>>>>> <
>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>>>>>>>>>> *
>>>>>>>>>>
>>>>>>>>>>  When I tried to transfer the files from .txt (via Notepad)
>>>> to
>>>>>> .csv
>>>>>>>>> (via
>>>>>>>>>> Excel), the data appeared in the columns as unorganized and
>>>>>> random
>>>>>>>>>> script... not like how a typical csv should be arranged at
>>>> all. I
>>>>>> need
>>>>>>>>> the
>>>>>>>>>> dataset to be converted into .csv in order to analyze it in R,
>>>>>> which
>>>>>>>>> is why
>>>>>>>>>> I am hoping someone here might help me in doing that. If not,
>>>> is
>>>>>> there
>>>>>>>>>> perhaps some other way that I could analyze the datatsets on
>>>> R,
>>>>>> which
>>>>>>>>> again
>>>>>>>>>> is downloaded from the dataportal ICGC?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Spencer Brackett

Re: Fwd: UPDATE

Spencer Brackett
Caitlin,

  I tried your command in both RGui and RStudio, but both attempts produced
errors. I believe I made a mistake somewhere in labeling/downloading the
files, which is the source of the confusion in R. I will re-examine the files
saved on my desktop to determine the error. Regardless, would it be better
to use read.table or read.csv when attempting to read in my datasets? I
tried using read_excel (from readxl) in RStudio, as this process seemed much
easier; however, it would seem that my proclivity for error prevents such.

Best,

Spencer
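
On the read.table versus read.csv question, a minimal sketch of how the base-R readers relate when the input is tab-separated. The file contents and column names below are invented for illustration (they are not taken from the ICGC download), and the temporary file merely stands in for whatever is saved on the Desktop:

```r
# Create a small tab-separated file in a temporary location.
# The columns and values are invented purely for this illustration.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene_name\texpression",
             "D1\tEGFR\t0.52",
             "D2\tTP53\t-1.10"),
           tsv_path)

# Confirm the path points at a real file before trying to read it.
stopifnot(file.exists(tsv_path))

# read.delim() defaults to sep = "\t"; read.csv() and read.table()
# need the separator spelled out (and read.table() needs header = TRUE).
via_delim <- read.delim(tsv_path)
via_csv   <- read.csv(tsv_path, sep = "\t")
via_table <- read.table(tsv_path, header = TRUE, sep = "\t")

# Compare the three results.
print(dim(via_delim))
print(all.equal(via_delim, via_csv))
print(all.equal(via_delim, via_table))
```

For a simple file like this one the three calls should give the same data frame, so the choice is mostly a matter of which defaults are convenient. With the real download, the same calls should work with the full Desktop path in place of tsv_path, provided file.exists() returns TRUE for it.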

On Wed, Dec 26, 2018 at 11:55 PM Caitlin Gibbons <[hidden email]>
wrote:

> Does this help, Spencer? The read.delim() function assumes a tab character
> by default, but I specified it explicitly using the read.csv function. The
> downloaded file is NOT an Excel file, so this should help.
>
> GBM_protein_expression <- read.csv(
>   "C:/Users/Spencer/Desktop/GBM protein_expression.tsv", sep = "\t")
>
> Sent from my iPhone
>
> > On Dec 26, 2018, at 9:23 PM, Richard M. Heiberger <[hidden email]>
> wrote:
> >
> > this is wrong because the file is a csv file.  read_excel is designed
> > for xls files.
> > GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM
> > protein_expression.csv")
> >
> > How did you get a csv? it downloads as tsv.
> >
> > the statement you should use is in base, no library() statement is
> needed.
> >
> > GBM_protein_expression <- read.delim(
> >   "C:/Users/Spencer/Desktop/GBM protein_expression.csv")
> >
> > read.delim is the same as read.csv except that it sets the sep
> > argument to "\t".
> >
> >
> >
> > On Wed, Dec 26, 2018 at 11:11 PM Spencer Brackett
> > <[hidden email]> wrote:
> >>
> >> Sorry, my mistake.
> >>
> >> So I could still use read.table and should I try using a .txt version of
> >> the file to avoid the silent changes you described?
> >>
> >> Also, when I tried to simply this process by downloading the dataset
> onto
> >> RStudio opposed to R (Gui) I received the following...
> >> library(readxl)
> >>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM
> >> protein_expression.csv")
> >> Error: Can't establish that the input is either xls or xlsx.
> >>> View(GBM_protein_expression)
> >> Error in View : object 'GBM_protein_expression' not found
> >> Error in gzfile(file, mode) : cannot open the connection
> >> In addition: Warning message:
> >> In gzfile(file, mode) :
> >>  cannot open compressed file
> >> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds',
> >> probable reason 'No such file or directory'
> >>> library(readxl)
> >>> GBM_protein_expression <-
> >> read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
> >> readxl works best with a newer version of the tibble package.
> >> You currently have tibble v1.4.2.
> >> Falling back to column name repair from tibble <= v1.4.2.
> >> Message displays once per session.
> >>> View(GBM_protein_expression)
> >>
> >>
> >> Is this perhaps the result of lack of preview (which I did not complete
> at
> >> the time I hit import as the preview failed to load), or the fact that
> the
> >> excel file itself contains no numerical data, but only TRUE or FALSE
> >> entries?
> >>
> >> On Wed, Dec 26, 2018 at 10:59 PM Jeff Newmiller <
> [hidden email]>
> >> wrote:
> >>
> >>> Please always reply-all to keep the list involved.
> >>>
> >>> If you used Save As to change the data format to Excel AND the file
> >>> extension to xlsx, then yes, you should be able to read with readxl. I
> >>> don't recommend it, though... Excel often changes data silently and in
> >>> irregularly located places in your file.
> >>>
> >>> On December 26, 2018 7:38:16 PM PST, Spencer Brackett <
> >>> [hidden email]> wrote:
> >>>> So even if I imported the file form ICGC to my desktop as an excel
> >>>> file,
> >>>> and can view and saved the data as such, it is still a TSV?
> >>>>
> >>>> On Wed, Dec 26, 2018 at 10:35 PM Jeff Newmiller
> >>>> <[hidden email]>
> >>>> wrote:
> >>>>
> >>>>> CSV and TSV are not Excel files. Yes, I know Excel will open them,
> >>>> but
> >>>>> that does not make them Excel files.
> >>>>>
> >>>>> Read a TSV file with read.table or read.csv, setting the sep argument
> >>>> to
> >>>>> "\t".
> >>>>>
> >>>>> On December 26, 2018 7:26:35 PM PST, Spencer Brackett <
> >>>>> [hidden email]> wrote:
> >>>>>> I tried importing the file without preview and recieved the
> >>>>>> following....
> >>>>>>
> >>>>>> library(readxl)
> >>>>>>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM
> >>>>>> protein_expression.csv")
> >>>>>> Error: Can't establish that the input is either xls or xlsx.
> >>>>>>> View(GBM_protein_expression)
> >>>>>> Error in View : object 'GBM_protein_expression' not found
> >>>>>> Error in gzfile(file, mode) : cannot open the connection
> >>>>>> In addition: Warning message:
> >>>>>> In gzfile(file, mode) :
> >>>>>> cannot open compressed file
> >>>>>
> >>>>>
> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds',
> >>>>>> probable reason 'No such file or directory'
> >>>>>>> library(readxl)
> >>>>>>> GBM_protein_expression <-
> >>>>>> read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
> >>>>>> readxl works best with a newer version of the tibble package.
> >>>>>> You currently have tibble v1.4.2.
> >>>>>> Falling back to column name repair from tibble <= v1.4.2.
> >>>>>> Message displays once per session.
> >>>>>>> View(GBM_protein_expression)
> >>>>>>
> >>>>>> Also, the area above my console says that no data is available in
> >>>> the
> >>>>>> table. Is this perhaps the result of lack of preview or the fact
> >>>> that
> >>>>>> the
> >>>>>> excel file itself contains no numerical data, but only TRUE or FALSE
> >>>>>> entries?
> >>>>>>
> >>>>>> On Wed, Dec 26, 2018 at 9:57 PM Spencer Brackett <
> >>>>>> [hidden email]> wrote:
> >>>>>>
> >>>>>>> Hello again,
> >>>>>>>
> >>>>>>> I worked on directly downloading the file into R as was suggested,
> >>>>>> but
> >>>>>>> have thus far been unsuccessful. This is what  I generated on my
> >>>>>> second
> >>>>>>> attempt...
> >>>>>>>
> >>>>>>> GBM protein_expression<-(file.choose(), header=TRUE, sep="\t")
> >>>>>>> Error: unexpected symbol in "GBM protein_expression"
> >>>>>>>> GBM
> >>>>>>>
> >>>>>
> >>>
> >>>>>
> protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE,
> >>>>>>> sep="\t")
> >>>>>>> Error: unexpected symbol in "GBM protein_expression"
> >>>>>>>>
> >>>>>>>
> >>>>>>> What part of the argument is in error?
> >>>>>>>
> >>>>>>> Also I tried importing the dataset as an excel file on RStudio to
> >>>> see
> >>>>>> if I
> >>>>>>> could solve my problem that way. However, my imported excel file
> >>>> has
> >>>>>> been
> >>>>>>> stuck in the 'retrieving preview data' and no data is appearing.
> >>>> Is
> >>>>>> the
> >>>>>>> data file prehaps too large or in the wrong format?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett <
> >>>>>>> [hidden email]> wrote:
> >>>>>>>
> >>>>>>>> Mr. Heiberger,
> >>>>>>>>
> >>>>>>>> Thank you for the insight! I will try out suggestion.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Spencer Brackett
> >>>>>>>>
> >>>>>>>> On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger
> >>>>>> <[hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I looked at the first file.  It gives an option to download as
> >>>> TSV
> >>>>>>>>> (tab separated values).
> >>>>>>>>> That is the same as CSV except with tabs instead of commas.
> >>>>>>>>> You do not need any external software to read it.  Read the
> >>>>>> downloaded
> >>>>>>>>> file directly into R.
> >>>>>>>>>
> >>>>>>>>> read.delim looks as if it would work directly on the downloaded
> >>>>>> file.
> >>>>>>>>> ?read.delim
> >>>>>>>>> The notation "\t" means the tab character.
> >>>>>>>>>
> >>>>>>>>> As an aside, stay away from notepad. it is too naive for almost
> >>>>>>>>> anything interesting.
> >>>>>>>>> The specific case I often see is people reading linux-style text
> >>>>>> files
> >>>>>>>>> with notepad, which doesn't
> >>>>>>>>> understand NL terminated lines.  nicely formatted text files
> >>>> become
> >>>>>>>>> illegible.
> >>>>>>>>>
> >>>>>>>>> On Wed, Dec 26, 2018 at 6:04 PM Spencer Brackett
> >>>>>>>>> <[hidden email]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Good evening,
> >>>>>>>>>>
> >>>>>>>>>> I am attempting to anaylze the protein expression data
> >>>> contained
> >>>>>> within
> >>>>>>>>>> these two ICGC, TCGA datasets (one for GBM and the other for
> >>>> LGG)
> >>>>>>>>>>
> >>>>>>>>>> *File for GBM  protein expression*:
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
> >>>>>>>>>>
> >>>>>>>>>> *File for LGG protein expression:*
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> *
> >>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
> >>>>>>>>>> <
> >>>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
> >>>>>>>>>> *
> >>>>>>>>>>
> >>>>>>>>>>  When I tried to transfer the files from .txt (via Notepad)
> >>>> to
> >>>>>> .csv
> >>>>>>>>> (via
> >>>>>>>>>> Excel), the data appeared in the columns as unorganized and
> >>>>>> random
> >>>>>>>>>> script... not like how a typical csv should be arranged at
> >>>> all. I
> >>>>>> need
> >>>>>>>>> the
> >>>>>>>>>> dataset to be converted into .csv in order to analyze it in R,
> >>>>>> which
> >>>>>>>>> is why
> >>>>>>>>>> I am hoping someone here might help me in doing that. If not,
> >>>> is
> >>>>>> there
> >>>>>>>>>> perhaps some other way that I could analyze the datatsets on
> >>>> R,
> >>>>>> which
> >>>>>>>>> again
> >>>>>>>>>> is downloaded from the dataportal ICGC?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>>
> >>>>>>>>>> Spencer Brackett

Re: Fwd: UPDATE

Spencer Brackett
Follow up,

Would read.txt also work, as I am certain that I have both datasets in .txt
files? As to a previous user's question concerning the .csv nature of the
supposed Excel file, I am uncertain as to how it came to be saved as such.
The file is most certainly in Excel format.
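
As far as I am aware, base R has no function named read.txt; a .txt file is read with read.table or one of its wrappers, chosen by the separator the file actually uses. A rough sketch of how one might peek at the first lines to find out, using a made-up temporary file (the column names are hypothetical and not taken from the ICGC data):

```r
# Stand-in for a downloaded .txt file; the contents are invented.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("donor_id\tgene_name\texpression",
             "D1\tEGFR\t0.52"),
           txt_path)

# Look at the raw text of the first two lines without parsing it.
first_lines <- readLines(txt_path, n = 2)
print(first_lines)

# If every inspected line contains a tab, treat the file as tab-separated;
# otherwise fall back to comma-separated. This is only a heuristic.
has_tabs <- all(grepl("\t", first_lines, fixed = TRUE))
expr <- if (has_tabs) read.delim(txt_path) else read.csv(txt_path)

str(expr)
```

The file extension itself carries no weight here: what matters is the separator found inside the file, which is why inspecting the raw lines first can save some guesswork.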



Re: Fwd: UPDATE

Caitlin Gibbons
Is the file being saved as .xls, .xlsx, .csv, .tsv, or .txt?
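
Once the extension is known, the choice of reader follows from it. A hedged sketch of that mapping: read_by_extension is a hypothetical helper written for this illustration (not part of any package), and treating .txt as tab-separated is an assumption based on the TSV download discussed earlier in the thread:

```r
# Hypothetical helper: pick a reader from the file extension.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
         tsv = ,
         txt = read.delim(path),   # assumes .txt holds tab-separated text
         csv = read.csv(path),
         xls = ,
         xlsx = {
           # readxl is only needed for genuine Excel workbooks.
           if (!requireNamespace("readxl", quietly = TRUE)) {
             stop("Package 'readxl' is required to read Excel files")
           }
           readxl::read_excel(path)
         },
         stop("Unrecognised file extension: '", ext, "'"))
}

# Try it on two small made-up files.
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("gene_name\texpression", "EGFR\t0.52"), tsv_file)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("gene_name,expression", "EGFR,0.52"), csv_file)

from_tsv <- read_by_extension(tsv_file)
from_csv <- read_by_extension(csv_file)
print(all.equal(from_tsv, from_csv))
```

Note that renaming a file does not change its contents: a tab-separated download renamed to .xlsx is still plain text, and read_excel would be expected to reject it.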


On Wed, Dec 26, 2018 at 10:14 PM Spencer Brackett <
[hidden email]> wrote:

> Follow up,
>
> Would read.txt also work, as I am certain that I have both datasets in
> .txt files? As to a previous users question concern the .csv nature of the
> supposed excel file, I am uncertain as to how this was translated as such.
> The file is most certainly in excel.
>
>
> On Thu, Dec 27, 2018 at 12:10 AM Spencer Brackett <
> [hidden email]> wrote:
>
>> Caitlin,
>>
>>   I tried your command in both RGui and RStudio but both came up as
>> errors. I believe I made a mistake somewhere I labeling/downloading the
>> files, which is the source of the confusion in R. I will re-examine the
>> files saved on my desktop to determine the error. Regardless, would it be
>> better to use a read.table or read.csv function when attempting to download
>> my datasets? I tried using read.xl on RStudio as this process seemed much
>> easier, however, it would seem that my proclivity to error prevents such.
>>
>> Best,
>>
>> Spencer
>>
>> On Wed, Dec 26, 2018 at 11:55 PM Caitlin Gibbons <[hidden email]>
>> wrote:
>>
>>> Does this help, Spencer? The read.delim() function assumes a tab
>>> character by default, but I specified it explicitly using the read.csv
>>> function. The downloaded file is NOT an Excel file, so this should help.
>>>
>>> GBM_protein_expression <- read.csv(
>>>   "C:/Users/Spencer/Desktop/GBM protein_expression.tsv", sep = "\t")
>>>
>>> Sent from my iPhone
>>>
>>> > On Dec 26, 2018, at 9:23 PM, Richard M. Heiberger <[hidden email]>
>>> wrote:
>>> >
>>> > this is wrong because the file is a csv file.  read_excel is designed
>>> > for xls files.
>>> > GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
>>> >
>>> > How did you get a csv? it downloads as tsv.
>>> >
>>> > the statement you should use is in base, no library() statement is
>>> needed.
>>> >
>>> > GBM_protein_expression <- read.delim("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
>>> >
>>> > read.delim is the same as read.csv except that it sets the sep
>>> > argument to "\t".
>>> >
>>> >
>>> >
>>> > On Wed, Dec 26, 2018 at 11:11 PM Spencer Brackett
>>> > <[hidden email]> wrote:
>>> >>
>>> >> Sorry, my mistake.
>>> >>
>>> >> So I could still use read.table, and should I try using a .txt version
>>> >> of the file to avoid the silent changes you described?
>>> >>
>>> >> Also, when I tried to simplify this process by importing the dataset
>>> >> into RStudio as opposed to R (Gui), I received the following...
>>> >> library(readxl)
>>> >>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM
>>> >> protein_expression.csv")
>>> >> Error: Can't establish that the input is either xls or xlsx.
>>> >>> View(GBM_protein_expression)
>>> >> Error in View : object 'GBM_protein_expression' not found
>>> >> Error in gzfile(file, mode) : cannot open the connection
>>> >> In addition: Warning message:
>>> >> In gzfile(file, mode) :
>>> >>  cannot open compressed file
>>> >>
>>> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds',
>>> >> probable reason 'No such file or directory'
>>> >>> library(readxl)
>>> >>> GBM_protein_expression <-
>>> >> read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
>>> >> readxl works best with a newer version of the tibble package.
>>> >> You currently have tibble v1.4.2.
>>> >> Falling back to column name repair from tibble <= v1.4.2.
>>> >> Message displays once per session.
>>> >>> View(GBM_protein_expression)
>>> >>
>>> >>
>>> >> Is this perhaps the result of lack of preview (which I did not
>>> complete at
>>> >> the time I hit import as the preview failed to load), or the fact
>>> that the
>>> >> excel file itself contains no numerical data, but only TRUE or FALSE
>>> >> entries?
>>> >>
>>> >> On Wed, Dec 26, 2018 at 10:59 PM Jeff Newmiller <
>>> [hidden email]>
>>> >> wrote:
>>> >>
>>> >>> Please always reply-all to keep the list involved.
>>> >>>
>>> >>> If you used Save As to change the data format to Excel AND the file
>>> >>> extension to xlsx, then yes, you should be able to read with readxl.
>>> I
>>> >>> don't recommend it, though... Excel often changes data silently and
>>> in
>>> >>> irregularly located places in your file.
>>> >>>
>>> >>> On December 26, 2018 7:38:16 PM PST, Spencer Brackett <
>>> >>> [hidden email]> wrote:
>>> >>>> So even if I imported the file from ICGC to my desktop as an Excel
>>> >>>> file, and can view and save the data as such, it is still a TSV?
>>> >>>>
>>> >>>> On Wed, Dec 26, 2018 at 10:35 PM Jeff Newmiller
>>> >>>> <[hidden email]>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> CSV and TSV are not Excel files. Yes, I know Excel will open them,
>>> >>>> but
>>> >>>>> that does not make them Excel files.
>>> >>>>>
>>> >>>>> Read a TSV file with read.table or read.csv, setting the sep
>>> argument
>>> >>>> to
>>> >>>>> "\t".
>>> >>>>>
>>> >>>>> On December 26, 2018 7:26:35 PM PST, Spencer Brackett <
>>> >>>>> [hidden email]> wrote:
>>> >>>>>> I tried importing the file without preview and received the
>>> >>>>>> following....
>>> >>>>>>
>>> >>>>>> library(readxl)
>>> >>>>>>> GBM_protein_expression <-
>>> read_excel("C:/Users/Spencer/Desktop/GBM
>>> >>>>>> protein_expression.csv")
>>> >>>>>> Error: Can't establish that the input is either xls or xlsx.
>>> >>>>>>> View(GBM_protein_expression)
>>> >>>>>> Error in View : object 'GBM_protein_expression' not found
>>> >>>>>> Error in gzfile(file, mode) : cannot open the connection
>>> >>>>>> In addition: Warning message:
>>> >>>>>> In gzfile(file, mode) :
>>> >>>>>> cannot open compressed file
>>> >>>>>
>>> >>>>>
>>> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds',
>>> >>>>>> probable reason 'No such file or directory'
>>> >>>>>>> library(readxl)
>>> >>>>>>> GBM_protein_expression <-
>>> >>>>>> read_excel("C:/Users/Spencer/Desktop/GBM_protein_
>>> expression.xlsx")
>>> >>>>>> readxl works best with a newer version of the tibble package.
>>> >>>>>> You currently have tibble v1.4.2.
>>> >>>>>> Falling back to column name repair from tibble <= v1.4.2.
>>> >>>>>> Message displays once per session.
>>> >>>>>>> View(GBM_protein_expression)
>>> >>>>>>
>>> >>>>>> Also, the area above my console says that no data is available in
>>> >>>> the
>>> >>>>>> table. Is this perhaps the result of lack of preview or the fact
>>> >>>> that
>>> >>>>>> the
>>> >>>>>> excel file itself contains no numerical data, but only TRUE or
>>> FALSE
>>> >>>>>> entries?
>>> >>>>>>
>>> >>>>>> On Wed, Dec 26, 2018 at 9:57 PM Spencer Brackett <
>>> >>>>>> [hidden email]> wrote:
>>> >>>>>>
>>> >>>>>>> Hello again,
>>> >>>>>>>
>>> >>>>>>> I worked on directly downloading the file into R as was suggested,
>>> >>>>>>> but have thus far been unsuccessful. This is what I generated on my
>>> >>>>>>> second attempt...
>>> >>>>>>>
>>> >>>>>>> GBM protein_expression<-(file.choose(), header=TRUE, sep="\t")
>>> >>>>>>> Error: unexpected symbol in "GBM protein_expression"
>>> >>>>>>>> GBM
>>> >>>>>>>
>>> >>>>>
>>> >>>
>>> >>>>>
>>> protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE,
>>> >>>>>>> sep="\t")
>>> >>>>>>> Error: unexpected symbol in "GBM protein_expression"
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> What part of the argument is in error?
>>> >>>>>>>
>>> >>>>>>> Also I tried importing the dataset as an Excel file in RStudio to see
>>> >>>>>>> if I could solve my problem that way. However, my imported Excel file
>>> >>>>>>> has been stuck at 'retrieving preview data' and no data is appearing.
>>> >>>>>>> Is the data file perhaps too large or in the wrong format?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett <
>>> >>>>>>> [hidden email]> wrote:
>>> >>>>>>>
>>> >>>>>>>> Mr. Heiberger,
>>> >>>>>>>>
>>> >>>>>>>> Thank you for the insight! I will try out suggestion.
>>> >>>>>>>>
>>> >>>>>>>> Best,
>>> >>>>>>>>
>>> >>>>>>>> Spencer Brackett
>>> >>>>>>>>
>>> >>>>>>>> On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger
>>> >>>>>> <[hidden email]>
>>> >>>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> I looked at the first file.  It gives an option to download as
>>> >>>> TSV
>>> >>>>>>>>> (tab separated values).
>>> >>>>>>>>> That is the same as CSV except with tabs instead of commas.
>>> >>>>>>>>> You do not need any external software to read it.  Read the
>>> >>>>>> downloaded
>>> >>>>>>>>> file directly into R.
>>> >>>>>>>>>
>>> >>>>>>>>> read.delim looks as if it would work directly on the downloaded
>>> >>>>>> file.
>>> >>>>>>>>> ?read.delim
>>> >>>>>>>>> The notation "\t" means the tab character.
>>> >>>>>>>>>
>>> >>>>>>>>> As an aside, stay away from notepad. it is too naive for almost
>>> >>>>>>>>> anything interesting.
>>> >>>>>>>>> The specific case I often see is people reading linux-style
>>> text
>>> >>>>>> files
>>> >>>>>>>>> with notepad, which doesn't
>>> >>>>>>>>> understand NL terminated lines.  nicely formatted text files
>>> >>>> become
>>> >>>>>>>>> illegible.
>>> >>>>>>>>>


Re: Help converting .txt to .csv file

PIKAL Petr
In reply to this post by Spencer Brackett
Hi

See inline

> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Spencer Brackett
> Sent: Thursday, December 27, 2018 3:57 AM
> To: Richard M. Heiberger <[hidden email]>
> Cc: R-help <[hidden email]>
> Subject: Re: [R] Help converting .txt to .csv file
>
> Hello again,
>
> I worked on directly downloading the file into R as was suggested, but have
> thus far been unsuccessful. This is what  I generated on my second attempt...
>
>  GBM protein_expression<-(file.choose(), header=TRUE, sep="\t")
> Error: unexpected symbol in "GBM protein_expression"
> > GBM

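You forgot to call a read.* function. Also note that an object name cannot
contain a space, which is what "unexpected symbol" is complaining about.

Something like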
protein_expression <- read.delim(file.choose())

or

protein_expression <- read.table(file.choose(), header=TRUE, sep="\t")

if your files are tab delimited, as Richard suggested.
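
Two things go wrong in the quoted attempt: the object name contains a space, and no read.* function is actually called. Here is a small self-contained sketch of both fixes. It writes a made-up tab-delimited file to a temporary location in place of the real download, so the column names and values are assumptions, not ICGC data. The last step writes a comma-separated copy, in case a .csv is still wanted.

```r
# Made-up stand-in for the downloaded TSV (hypothetical columns and values)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("donor_id\tgene_name\tvalue",
             "D1\tAKT1\t0.52",
             "D2\tEGFR\t-1.10"), tsv_path)

# Valid object name: an underscore instead of a space
GBM_protein_expression <- read.delim(tsv_path)

# The same call with read.delim's defaults spelled out for read.table
same_data <- read.table(tsv_path, header = TRUE, sep = "\t",
                        quote = "\"", fill = TRUE, comment.char = "")

# Inspect what was read: column names and types
str(GBM_protein_expression)

# Optional: write a comma-separated copy of the data
csv_path <- tempfile(fileext = ".csv")
write.csv(GBM_protein_expression, csv_path, row.names = FALSE)
```

In an interactive session, tsv_path can be replaced by file.choose() to pick the downloaded file by hand.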

Cheers
Petr

> protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE,
> sep="\t")
> Error: unexpected symbol in "GBM protein_expression"
> >
>
> What part of the argument is in error?
>
> Also I tried importing the dataset as an Excel file in RStudio to see if I could
> solve my problem that way. However, my imported Excel file has been stuck at
> 'retrieving preview data' and no data is appearing. Is the data file perhaps
> too large or in the wrong format?
>
>
>
> On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett <
> [hidden email]> wrote:
>
> > Mr. Heiberger,
> >
> >  Thank you for the insight! I will try out your suggestion.
> >
> > Best,
> >
> > Spencer Brackett
> >
> > On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger <[hidden email]>
> > wrote:
> >
> >> I looked at the first file.  It gives an option to download as TSV
> >> (tab separated values).
> >> That is the same as CSV except with tabs instead of commas.
> >> You do not need any external software to read it.  Read the
> >> downloaded file directly into R.
> >>
> >> read.delim looks as if it would work directly on the downloaded file.
> >> ?read.delim
> >> The notation "\t" means the tab character.
> >>
> >> As an aside, stay away from notepad. it is too naive for almost
> >> anything interesting.
> >> The specific case I often see is people reading linux-style text
> >> files with notepad, which doesn't understand NL terminated lines.
> >> nicely formatted text files become illegible.
> >>