data reshape


Yuan Chun Ding
Hi R users,

I have a folder (called genotype) with 652 files; the file names are GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc. Each file contains a single column of data without a header, as below:
201
2/2
238
3/4
245
1/2
.....
983255
3/3
983766
None


There are 20528 rows in total.

I need to read all 652 files in the genotype folder and then reshape the single column in each file as:
SampleID             201        238        245        ....   983255         983766
GTEX-1A3MV     2/2         3/4        1/2                         3/3         None

There are 10264 data columns plus the sample ID column, so 10265 columns in total after data reshaping.

After reading the 652 files and reshaping the single column in each, I will stack them with the rbind function, giving a table with 653 rows and 10265 columns.


Thank you,

Ding

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: data reshape

Bert Gunter-2
Did you even make an attempt to do this? -- or would you like us to do all
your work for you?

If you made an attempt, show us your code and errors.
If not, we usually expect you to try on your own first.
If you have no idea where to start, perhaps you need to spend some more
time with tutorials to learn basic R functionality before proceeding.

Bert

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <[hidden email]> wrote:

> [...]

Re: data reshape

Yuan Chun Ding
Hi Bert,

Sorry, I was in a hurry to get home yesterday afternoon, so I just posted my question hoping to get some advice.

Here is what I got yesterday before going home.
---------------------------------------------------------------
setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")

# pattern is a regular expression, so escape the dot and anchor the end
file_list <- list.files(pattern = "\\.out$")

# Read all 652 files into RStudio; NOT all files have the same number of rows
for (i in seq_along(file_list)) {
  assign(substr(file_list[i], 1, nchar(file_list[i]) - 4),
         read.delim(file_list[i], header = FALSE))
}

# The first file, GTEX_1117F, has the following format: one column and 19482 rows.
# 4 is a marker id, 25/48 is its marker value.
#  V1
#  4
# 25/48
# 201
# 2/2
# ...
# 648589
# None

# Turn this one-column file into a two-column file as below, so that the first
# column is the marker id and the second is the corresponding marker value for
# the sample GTEX_1117F:
#  VNTRid      GTEX_1117F
#   4               25/48
#   201            2/2
#    ...          ...
# 648589          None

for (i in seq_along(file_list)) {
  temp <- read.delim(file_list[i], header = FALSE)
  v <- as.character(temp$V1)
  odd  <- seq(1, length(v) - 1, 2)   # rows holding marker ids
  even <- seq(2, length(v), 2)       # rows holding marker values
  output <- matrix("", ncol = 2, nrow = length(v) / 2)
  colnames(output) <- c("VNTRid", substr(file_list[i], 1, nchar(file_list[i]) - 4))
  # the parentheses matter: 1:n/2 would yield 0.5, 1, 1.5, ...
  for (j in 1:(length(v) / 2)) {
    output[j, 1] <- v[odd[j]]
    output[j, 2] <- v[even[j]]
  }
  assign(gsub("-", "_", substr(file_list[i], 1, nchar(file_list[i]) - 4)), as.data.frame(output))
}
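
The pairing of odd and even rows can also be done without an inner loop. A minimal sketch, assuming each file strictly alternates marker id and marker value; the helper name read_pairs and the demo file contents are made up for illustration and are not from the original post:

```r
# Hypothetical helper: turn one alternating id/value file into a
# two-column data frame (VNTRid plus one column named after the sample).
read_pairs <- function(path, sample_id) {
  v <- readLines(path)
  v <- v[nzchar(v)]                    # drop blank lines, if any
  stopifnot(length(v) %% 2 == 0)       # ids and values must pair up
  out <- data.frame(VNTRid = v[c(TRUE, FALSE)],   # 1st, 3rd, 5th, ... line
                    value  = v[c(FALSE, TRUE)],   # 2nd, 4th, 6th, ... line
                    stringsAsFactors = FALSE)
  names(out)[2] <- sample_id
  out
}

# Tiny invented file standing in for one of the .out files
tmp <- tempfile(fileext = ".out")
writeLines(c("4", "25/48", "201", "2/2", "648589", "None"), tmp)
demo <- read_pairs(tmp, "GTEX_1117F")
print(demo)
```

The logical index c(TRUE, FALSE) is recycled along the vector, which is what selects every other element.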

Yesterday, I intended to reshape the output above from long to wide, using VNTRid as the key.
Since not all files have the same number of rows, the reshaped files would not bind correctly with the rbind function.
On my way to work this morning, I changed my mind: I will not reshape to wide format, and I actually like the long format I generated. I will read in a VNTR marker annotation file with VNTRid in the first column and the marker location on the human chromosomes in the second column; this annotation file should include all the VNTR markers. I know the VNTRids in the annotation file are the same as the VNTRids in the 652 files I read in.
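
The rbind problem described above arises because different samples carry different marker sets. One possible workaround, sketched here with invented sample names and genotypes, is to index each sample's named vector by the union of all marker ids, so that missing markers become NA before binding:

```r
# Invented per-sample genotypes, named by marker id
s1 <- c("4" = "25/48", "201" = "2/2")
s2 <- c("201" = "1/2", "245" = "None")

# Union of marker ids across samples, in order of first appearance
all_ids <- union(names(s1), names(s2))

# Indexing by name yields NA for markers a sample does not have
wide <- rbind(GTEX_A = unname(s1[all_ids]),
              GTEX_B = unname(s2[all_ids]))
colnames(wide) <- all_ids
print(wide)
```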

Do you know a good way to merge all those 652 files (with two columns) ?

Thank you,

Ding


#merge all 652 files into one file with VNTRid as the first column; the 2nd to 653rd columns
#are the genotypes, with the sample IDs as headers

From: Bert Gunter [mailto:[hidden email]]
Sent: Thursday, December 19, 2019 6:52 PM
To: Yuan Chun Ding
Cc: [hidden email]
Subject: Re: [R] data reshape

[...]

Re: data reshape

Bert Gunter-2
?merge ## note the all.x option
Example:
> a <- data.frame(x = 1:3, y1 = 11:13)
> b <- data.frame(x = c(1,3), y2 = 21:22)

> merge(a,b, all.x = TRUE)
  x y1 y2
1 1 11 21
2 2 12 NA
3 3 13 22
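
Applied to the VNTR case, the annotation table would sit on the left so that every annotated marker is kept. A small sketch with invented marker ids, locations, and genotypes (the column names location and GTEX_A are assumptions), contrasting all.x = TRUE with all = TRUE:

```r
# Invented annotation table: one row per known marker
annot <- data.frame(VNTRid   = c("4", "201", "238"),
                    location = c("chr1", "chr1", "chr2"),
                    stringsAsFactors = FALSE)

# Invented genotypes for one sample; marker 245 is not in the annotation
geno <- data.frame(VNTRid = c("201", "245"),
                   GTEX_A = c("2/2", "None"),
                   stringsAsFactors = FALSE)

# all.x = TRUE keeps every annotation row and drops unannotated markers
left <- merge(annot, geno, by = "VNTRid", all.x = TRUE)

# all = TRUE keeps the union of markers from both tables
full <- merge(annot, geno, by = "VNTRid", all = TRUE)

print(left)
print(full)
```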


Bert Gunter


On Fri, Dec 20, 2019 at 9:00 AM Yuan Chun Ding <[hidden email]> wrote:

> [...]

Re: data reshape

Bert Gunter-2
It is perhaps worth noting that (assuming I understand correctly) this can
easily be done in one go, without any overt looping, as a nice application of
Reduce() after all your files are read into your global environment.

Example:

> a.out <- data.frame(x = 1:3, y1 = 11:13)
> b.out <- data.frame(x = c(1,3), y2 = 21:22)
> d.out <- data.frame(x = c(2:3), y3 = c(.5,.6))

> nm <- ls(pat = ".*out$")
> f <- function(dat, y) merge(dat, get(y), all = TRUE)
> allofthem <- Reduce(f, nm[-1], init = get(nm[1]))
> allofthem
  x y1 y2  y3
1 1 11 21  NA
2 2 12 NA 0.5
3 3 13 22 0.6

## note the change to "all = TRUE" in the merge() call
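
The same idea works without get() and ls() if the data frames are kept in a list rather than as separate objects in the global environment. A sketch under that assumption, with invented sample names and genotypes:

```r
# Invented two-column tables, one per sample, keyed by VNTRid
samples <- list(
  data.frame(VNTRid = c("4", "201"),   GTEX_A = c("25/48", "2/2"),
             stringsAsFactors = FALSE),
  data.frame(VNTRid = c("201", "245"), GTEX_B = c("1/2", "None"),
             stringsAsFactors = FALSE),
  data.frame(VNTRid = c("4", "245"),   GTEX_C = c("3/3", "2/4"),
             stringsAsFactors = FALSE)
)

# Full outer join on the marker id, applied pairwise across the list
merge_by_id <- function(x, y) merge(x, y, by = "VNTRid", all = TRUE)
merged <- Reduce(merge_by_id, samples)
print(merged)
```

Keeping the inputs in a list also avoids picking up unrelated objects whose names happen to match the ls() pattern.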

Cheers,
Bert



On Fri, Dec 20, 2019 at 9:37 AM Bert Gunter <[hidden email]> wrote:

> [...]

Re: data reshape

Yuan Chun Ding
Hi Bert,

Thank you for the elegant code example. I achieved my goal using the lapply and do.call functions together. Reduce is the nicer approach, and I am looking into it.

Ding
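
The lapply()/do.call() code is not shown in the thread, so the following is only one plausible reading: stack the per-sample tables in long format with do.call(rbind, lapply(...)), then spread them with base R's reshape(). The raw list and its contents are invented for illustration:

```r
# Invented raw input: one character vector per sample,
# alternating marker id and genotype
raw <- list(
  GTEX_A = c("4", "25/48", "201", "2/2"),
  GTEX_B = c("201", "1/2", "245", "None")
)

# One long-format data frame per sample
long_list <- lapply(names(raw), function(id) {
  v <- raw[[id]]
  data.frame(sample   = id,
             VNTRid   = v[c(TRUE, FALSE)],
             genotype = v[c(FALSE, TRUE)],
             stringsAsFactors = FALSE)
})

# Stack all samples, then spread to one genotype column per sample
long <- do.call(rbind, long_list)
wide <- reshape(long, idvar = "VNTRid", timevar = "sample",
                direction = "wide")
print(wide)
```

reshape() names the new columns by pasting the value column and the sample id together (for example genotype.GTEX_A); markers missing from a sample come out as NA.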

From: Bert Gunter [mailto:[hidden email]]
Sent: Friday, December 20, 2019 11:47 AM
To: Yuan Chun Ding
Cc: [hidden email]
Subject: Re: [R] data reshape

________________________________
[Attention: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.]
________________________________
It is perhaps worth noting that (assuming I understand correctly) this can easily be done in one go without any overt looping as a nice application of Reduce() after all your files are read into your global environment as a nice application of Reduce().

Example:

> a.out <- data.frame(x = 1:3, y1 = 11:13)
> b.out <- data.frame(x = c(1,3), y2 = 21:22)
> d.out <- data.frame(x = c(2:3), y3 = c(.5,.6))

> nm <- ls(pat = ".*out$")
> f <- function(dat, y) merge(dat, get(y), all = TRUE)
> allofthem <- Reduce(f, nm[-1], init = get(nm[1]))
> allofthem
  x y1 y2  y3
1 1 11 21  NA
2 2 12 NA 0.5
3 3 13 22 0.6

## note the change to "all = TRUE" in the merge() call

Cheers,
Bert



On Fri, Dec 20, 2019 at 9:37 AM Bert Gunter <[hidden email]<mailto:[hidden email]>> wrote:
?merge ## note the all.x option
Example:
> a <- data.frame(x = 1:3, y1 = 11:13)
> b <- data.frame(x = c(1,3), y2 = 21:22)

> merge(a,b, all.x = TRUE)
  x y1 y2
1 1 11 21
2 2 12 NA
3 3 13 22


Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Dec 20, 2019 at 9:00 AM Yuan Chun Ding <[hidden email]<mailto:[hidden email]>> wrote:
Hi Bert,

Sorry that I was in a hurry  going home yesterday afternoon and just posted my question and hoped to get some advice.

Here is what I got yesterday before going home.
---------------------------------------------------------------
setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")

file_list <- list.files(pattern="*.out")

#to read all 652 files into Rstudio and found that NOT all files have same number of rows
for (i in 1:length(file_list)){

  assign( substr(file_list[i], 1, nchar(file_list[i]) -4) ,

         read.delim(file_list[i], head=F))
}

#the first file, GTEX_1117F, in the following format,  one column and 19482 rows
#4 is marker id, 25/48 is its marker value;
#  V1
#  4
# 25/48
# 201
# 2/2
# ...
# 648589
# None

#to make this one-column file into a two-column file as below
# so first column is marker id, second is corresponding marker values for the sample GTEX_1117F
#  VNTRid      GTEX_1117F
#   4               25/48
#   201            2/2
#    ...          ...
# 648589          None

for (i in 1:length(file_list)){
  temp <- read.delim(file_list[i], header=FALSE)
  even <- seq(2, length(temp$V1), 2)    # rows holding marker values
  odd <- seq(1, length(temp$V1)-1, 2)   # rows holding marker ids
  output <- matrix("", ncol=2, nrow=length(temp$V1)/2)
  colnames(output) <- c("VNTRid", substr(file_list[i], 1, nchar(file_list[i])-4))
  # parentheses matter: 1:length(x)/2 would divide the whole sequence by 2
  for (j in 1:(length(temp$V1)/2)){
    output[j,1] <- as.character(temp$V1)[odd[j]]
    output[j,2] <- as.character(temp$V1)[even[j]]
  }
  assign(gsub("-","_", substr(file_list[i], 1, nchar(file_list[i])-4)), as.data.frame(output))
}
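
The id/value pairing can also be done without the inner loop by filling a two-column matrix row by row. A minimal sketch, assuming the input is a vector of alternating marker ids and values; pair_lines is a hypothetical helper name, not something from the code above:

```r
# Hypothetical helper: turn the alternating id/value vector into two columns
pair_lines <- function(lines, sample_id) {
  stopifnot(length(lines) %% 2 == 0)          # ids and values must pair up
  m <- matrix(as.character(lines), ncol = 2, byrow = TRUE)
  out <- data.frame(VNTRid = m[, 1], value = m[, 2], stringsAsFactors = FALSE)
  names(out)[2] <- sample_id                  # second column is named after the sample
  out
}

demo <- pair_lines(c("4", "25/48", "201", "2/2", "648589", "None"), "GTEX_1117F")
print(demo)
```

For a real file, the lines argument could come from readLines() on that file, and the results could be collected in a list rather than created with assign().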

Yesterday, I intended to reshape the output file above from long to wide using VNTRid as the key.
Since not all files have the same number of rows, after reshaping, those files would not bind correctly using the rbind function.
On my way to work this morning, I changed my intention; I will not reshape to wide format, and I actually like the long format I generated. I will read in a VNTR marker annotation file with VNTRid in the first column and marker locations on human chromosomes in the second column; this annotation file should include all the VNTR markers. I know the VNTRids in the annotation file are the same as the VNTRids in the 652 files I read in.

Do you know a good way to merge all those 652 files (with two columns) ?

Thank you,

Ding


#merge all 652 files into one file with VNTRid as first column; 2nd to 653rd columns are genotypes with
#sample ID as header,  so

From: Bert Gunter [mailto:[hidden email]]
Sent: Thursday, December 19, 2019 6:52 PM
To: Yuan Chun Ding
Cc: [hidden email]
Subject: Re: [R] data reshape

Did you even make an attempt to do this? -- or would you like us to do all your work for you?

If you made an attempt, show us your code and errors.
If not, we usually expect you to try on your own first.
If you have no idea where to start, perhaps you need to spend some more time with tutorials to learn basic R functionality before proceeding.

Bert

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <[hidden email]> wrote:
Hi R users,

I have a folder (called genotype) with 652 files; the file names are  GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc; in each file,  only one column of data without a header as below
201
2/2
238
3/4
245
1/2
.....
983255
3/3
983766
None


A total of 20528 rows;

I need to read all those 652 files in the genotype folder and then reshape the one column in each file as:
SampleID             201        238        245        ....   983255         983766
GTEX-1A3MV     2/2         3/4        1/2                         3/3         None

There are 10264 data columns plus the sample ID column, so 10265 columns in total after data reshaping.

After reading those 652 files and reshaping the one column in each file, I will stack them with the rbind function, giving a file with a dimension of 653 rows by 10265 columns.


Thank you,

Ding


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.