how to read this kind of csv in R?

R help mailing list-2
I have hundreds of CSV files. Each file is laid out as follows:

aa(cm)
1, 2 , 3,

bb(mm)
1, 2, 3,
4, 5, 6,
7, 8, 9,

cc(mm)
3, 4, 5,
7, 5, 9,
6, 5, 8,

How can I use read.table or read.csv to convert these CSV files
into a tidy data frame like the following?

aa, bb, cc
1, 1, 3
1, 2, 4
1, 3, 5
2, 4, 7
2, 5, 5
2, 6, 9
3, 7, 6
3, 8, 5
3, 9, 8

many thanks.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to read this kind of csv in R?

Duncan Murdoch-2
On 06/10/2019 7:29 a.m., vod vos via R-help wrote:

> I got hundreds of csv files. The real formats in each csv file are as follows:
>
> aa(cm)
> 1, 2 , 3,
>
> bb(mm)
> 1, 2, 3,
> 4, 5, 6,
> 7, 8, 9,
>
> cc(mm)
> 3, 4, 5,
> 7, 5, 9,
> 6, 5, 8,
>
> How can I use read.table or read.csv to convert the csv files
> to a tidy data frame format as follow:
>
> aa, bb, cc
> 1, 1, 3
> 1, 2, 4
> 1, 3, 5
> 2, 4, 7
> 2, 5, 5
> 2, 6, 9
> 3, 7, 6
> 3, 8, 5
> 3, 9, 8
>
> many thanks.

You'll need more than those two functions to do the transformation you
want.  To work out what you need, write out the process in detail in
English (or another natural language), not in code.  For example:

1.  Read aa from file 1.
2.  Read bb from file 2.
3.  Read cc from file 3.
4.  Expand all vectors to the same length.
5.  Combine them into a single dataframe.

Then work out each step separately.  I think you'll want to use
something like scan("filename", skip = 1, sep = ",") in steps 1, 2, and
3, but this will add NA values at the end of each line because of the
final comma, so you could do this:

aa <- scan("file1", skip = 1, sep = ",")
aa <- aa[!is.na(aa)]

and similarly for the others.

I don't know the rules for expanding that you'll need in your real data,
but for your example step 4 could be

   aa <- rep(aa, each = 3)

Then step 5 could be

   result <- data.frame(aa, bb, cc)
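
Put together, the five steps might look like the following sketch. It assumes the three blocks really are in separate files. To keep it self-contained, the file contents are passed through the `text` argument of `scan()` instead of a file name, and `read_block` is a hypothetical helper name, not something from the original post.

```r
# Hypothetical helper: read one block, skipping its header line.
# The trailing comma on each line can yield an NA, which is dropped.
read_block <- function(txt) {
  x <- scan(text = txt, skip = 1, sep = ",", quiet = TRUE)
  x[!is.na(x)]
}

# Stand-ins for the contents of the three files (assumed layout)
aa <- read_block("aa(cm)\n1, 2 , 3,")
bb <- read_block("bb(mm)\n1, 2, 3,\n4, 5, 6,\n7, 8, 9,")
cc <- read_block("cc(mm)\n3, 4, 5,\n7, 5, 9,\n6, 5, 8,")

# Step 4: repeat each aa value once per value in a row of bb
aa <- rep(aa, each = length(bb) / length(aa))

# Step 5: combine everything into one data frame
result <- data.frame(aa, bb, cc)
print(result)
```

With real files, the `text = txt` argument would be replaced by the file name, as in the `scan()` calls above.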

Duncan Murdoch

Re: how to read this kind of csv in R?

R help mailing list-2
The problem is that aa, bb and cc are all in a single CSV file,
and the file contains no blank lines.
The file looks like printed list output:

aa(cm)
 1, 2 , 3,
 bb(mm)
  1, 2, 3,
 4, 5, 6,
 7, 8, 9,
 cc(mm)
 3, 4, 5,
 7, 5, 9,
 6, 5, 8,



 ---- On Sunday, 06 October 2019 05:08:41 -0700, Duncan Murdoch <[hidden email]> wrote ----
 > You'll need more than those two functions to do the transformation you
 > want.  To work out what you need, write out the process in detail in
 > English (or another natural language), not in code.  For example:
 >
 > 1.  Read aa from file 1.
 > 2.  Read bb from file 2.
 > 3.  Read cc from file 3.

Re: how to read this kind of csv in R?

Duncan Murdoch-2
On 06/10/2019 8:23 a.m., vod vos wrote:
> The problem is aa, bb and cc all in a single csv file
> contains no blank line.

So what steps do you need, and which of them do you need help with?

Duncan Murdoch
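
For the single-file layout, one possible answer to that question is: (1) read all lines, (2) find the header lines, (3) group the data lines under their headers, (4) convert each group to numbers. A minimal sketch, assuming that header lines are the only lines containing letters; the sample vector stands in for the result of `readLines()` on a real file.

```r
# Sample content standing in for readLines("file.csv") (assumed layout)
lns <- c("aa(cm)", " 1, 2 , 3,", " bb(mm)", "  1, 2, 3,", " 4, 5, 6,",
         " 7, 8, 9,", " cc(mm)", " 3, 4, 5,", " 7, 5, 9,", " 6, 5, 8,")
lns <- trimws(lns)

# Step 2: header lines are assumed to be the only ones with letters
is_header <- grepl("[[:alpha:]]", lns)

# Step 3: cumsum() gives every line the index of the header above it
group <- cumsum(is_header)
blocks <- split(lns[!is_header], group[!is_header])
names(blocks) <- sub("\\(.*\\)$", "", lns[is_header])

# Step 4: split on commas and convert; strsplit() drops the empty
# field after the trailing comma
blocks <- lapply(blocks, function(x) as.numeric(unlist(strsplit(x, ","))))
str(blocks)
```

The result is a named list with one numeric vector per block, which can then be expanded and combined as in steps 4 and 5 of the earlier reply.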

Re: how to read this kind of csv in R?

Rui Barradas
In reply to this post by R help mailing list-2
Hello,

It is not clear if all files have

* a first block with just one data line
* all other blocks with as many rows as the numbers in that first data line.

If yes, maybe something like this?

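# Read all lines, strip surrounding whitespace, drop empty lines
lns <- trimws(readLines("strange.csv"))
lns <- lns[nzchar(lns)]
# Remove the trailing comma of each data line
lns <- sub(",$", "", lns)
# Header lines are the ones that contain letters, e.g. "aa(cm)"
i_title <- grep("[[:alpha:]]", lns)

# For each header, collect the data lines up to the next header
tmp <- lapply(seq_along(i_title), function(i){
   tmp <- if(i < length(i_title)){
     lns[(i_title[i] + 1):(i_title[i + 1] - 1)]
   }else{
     lns[(i_title[i] + 1):length(lns)]
   }
   # n: number of values in one data line of this block
   list(n = length(strsplit(tmp[1], ",")[[1]]),
        text = unlist(strsplit(tmp, ",")))
})

# Each value of the first block labels one row of the other blocks,
# so it is repeated once per value in such a row
n <- max(sapply(tmp[-1], '[[', 'n'))
tmp <- lapply(tmp, function(x) as.numeric(x$text))
tmp[[1]] <- rep(tmp[[1]], each = n)
res <- do.call(cbind.data.frame, tmp)
names(res) <- lns[i_title]
res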


If you have hundreds of files, you should make a function out of the
code above.
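
Applying such a function to many files could be sketched as follows. `read_one` is a hypothetical, deliberately simplified stand-in for a function built from the code above, and the two small files are written to a temporary directory only so that the example runs on its own.

```r
# Hypothetical, simplified reader: returns all numbers in a file
# (empty lines and header lines, i.e. lines with letters, are skipped)
read_one <- function(path) {
  lns <- readLines(path)
  lns <- lns[nzchar(lns) & !grepl("[[:alpha:]]", lns)]
  data.frame(file = basename(path),
             value = as.numeric(unlist(strsplit(lns, ","))))
}

# Create two sample files so the example is self-contained
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("aa(cm)", "1, 2 , 3,"), file.path(dir, "a.csv"))
writeLines(c("aa(cm)", "4, 5 , 6,"), file.path(dir, "b.csv"))

# Read every CSV file in the directory and stack the results
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
all_data <- do.call(rbind, lapply(files, read_one))
print(all_data)
```

Keeping the file name as a column makes it possible to trace each row back to its source after the files are stacked.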

Hope this helps,

Rui Barradas

Re: how to read this kind of csv in R?

R help mailing list-2
The CSV file was exported from Windows (DOS format), so its line endings differ from Unix ones.
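
Windows line endings should not be a problem for `readLines()`, which accepts LF, CRLF and CR as line terminators. A small check, writing a file with CRLF endings byte by byte so that the result does not depend on the platform:

```r
# Write a file with explicit Windows (CRLF) line endings
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("aa(cm)\r\n1, 2 , 3,\r\n"), path)

# readLines() strips the line terminator, including the "\r"
lns <- readLines(path)
print(lns)

# Defensive cleanup in case a stray "\r" survives some other reader
lns <- sub("\r$", "", lns)
```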


 ---- On Monday, 07 October 2019 01:18:54 -0700, <[hidden email]> wrote ----
 > I am going mad trying to import this strange CSV format.
 >
 > The real CSV file is now attached. The raw data set is huge.
 >
 > Many thanks.
 >
 >
 >
 >
 >  ---- On Sunday, 06 October 2019 07:58:37 -0700, Rui Barradas <[hidden email]> wrote ----
 >  > Hello,
 >  >
 >  > It is not clear if all files have
 >  >
 >  > * a first block with just one data line
 >  > * all other blocks with as many rows as the numbers in that first data line.

Re: how to read this kind of csv in R?

Rui Barradas
In reply to this post by Rui Barradas
Hello,

OK, I had some spare time. Try



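readCSVFile <- function(filename){
   # Read the file, remove all spaces, drop empty lines and strip
   # the trailing field separator (";" in the attached file)
   lns <- readLines(filename)
   lns <- gsub(" ", "", lns)
   lns <- lns[nzchar(lns)]
   lns <- sub(";$", "", lns)
   # Header lines are the ones that contain letters
   i_title <- grep("[[:alpha:]]", lns)

   # Data lines of every block except the first
   blocks <- lapply(seq_along(i_title)[-1], function(i){
     j <- i_title[i] + 1
     k <- if(i == length(i_title)) length(lns) else i_title[i + 1] - 1
     lns[j:k]
   })

   # n is the number of values in one row of a block; each value of
   # the first block is repeated n times to label that row
   n <- length(unlist(strsplit(blocks[[1]][1], ";")))
   first <- unlist(strsplit(lns[i_title[1] + 1], ";"))
   first <- as.numeric(first)
   first <- rep(first, each = n)

   # Flatten each block row by row and convert it to numeric
   # (assumes "." as the decimal mark)
   blocks <- lapply(blocks, function(x){
     as.numeric(unlist(strsplit(x, ";")))
   })
   res <- do.call(cbind.data.frame, blocks)
   res <- cbind.data.frame(first, res)

   # Column names are the headers without a trailing "[unit]"
   names(res) <- sub("\\[.*\\]$", "", lns[i_title])
   res
}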

df1 <- readCSVFile("strange.csv")
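
The repetition step in `readCSVFile` is easiest to see on a block that is not square. In this sketch, the values of `first` and `block` are made up for illustration: two values in the first block, and a second block with two rows of four values each.

```r
# Made-up data: one label per row of the block
first <- c(10, 20)
block <- c("1;2;3;4", "5;6;7;8")

# Number of values in one row of the block
n <- length(strsplit(block[1], ";")[[1]])

# Flatten the block row by row and repeat each label n times
values <- as.numeric(unlist(strsplit(block, ";")))
res <- data.frame(first = rep(first, each = n), value = values)
print(res)
```

Because `each` is the row length rather than the number of rows, the labels stay aligned with their rows even when the two differ.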


If this function doesn't do it, please make an effort on your own.
R-Help is not a code-writing service; it's a mailing list for
*questions* about R code.

Hope this helps,

Rui Barradas
