Read

Read

Val-17
Hi all, I am trying to read a messy data set but am having difficulty.  The
data has several columns separated by one or more blank spaces.  The
length of a column's values may differ from row to row.  The first
row (header) has four columns. However, a row may not have all four
column values.  For instance, the first data row has only the first
two column values. The fourth data row has the first and last column
values; the second and third column values are missing for this
row.  How do I read this data set correctly? Here are my sample data
set, the output, and the desired output.  To make each data point clear,
I have added the row and column numbers to it. I cannot use fixed-width
reading because the length of a given column may differ from row to
row.

dat <- read.table(text="x1  x2  x3 x4
1 B12
2         C23
322 B32      D34
4                 D44
51         D53
60 D62            ", header=TRUE, fill=TRUE, na.strings=c("","NA"))

Output
   x1  x2   x3 x4
1   1 B12 <NA> NA
2   2 C23 <NA> NA
3 322 B32  D34 NA
4   4 D44 <NA> NA
5  51 D53 <NA> NA
6  60 D62 <NA> NA


Desired output
   x1   x2   x3   x4
1   1  B12 <NA> <NA>
2   2 <NA>  C23 <NA>
3 322  B32 <NA>  D34
4   4 <NA> <NA>  D44
5  51 <NA>  D53 <NA>
6  60  D62 <NA> <NA>

Thank you,

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Read

Bill Dunlap-2
Since the columns in the file are separated by a space character, " ",
add the read.table argument sep=" ".

-Bill

On Mon, Feb 22, 2021 at 2:21 PM Val <[hidden email]> wrote:


Re: Read

Val-17
I tried that and it did not work. Please see the error message:
Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2       C23
\n322 B32      D34 \n4            D44 \n51     D53\n60 D62         ",
:
  more columns than column names
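
A rough sketch of where this error probably comes from: with sep=" " every single space counts as a delimiter, so a run of spaces produces several empty fields and the data rows end up with more fields than the header. Base R's count.fields() reports the number of fields per line for a given separator. The text is copied from the error message above; the names txt, count_with, n_single and n_white are only illustrative.

```r
# Sample text as it appears in the error message above
txt <- "x1  x2  x3 x4\n1 B12 \n2       C23 \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         "

# Count fields per line for a given separator, closing the connection afterwards
count_with <- function(sep) {
  con <- textConnection(txt)
  on.exit(close(con))
  count.fields(con, sep = sep)
}

n_single <- count_with(" ")  # every single space is a separator
n_white  <- count_with("")   # any run of whitespace is one separator (the default)

# First row is the header, the rest are the six data rows
data.frame(line = seq_along(n_single),
           sep_space = n_single,
           sep_whitespace = n_white)
```

If the sep_space count for any data row exceeds the count for the header, read.table() has more columns than names to give them.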

On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap <[hidden email]> wrote:

>
> Since the columns in the file are separated by a space character, " ",
> add the read.table argument sep=" ".
>
> -Bill

Re: Read

Bill Dunlap-2
You said the column values were separated by space characters.
Copying the text from gmail shows that some column names and column
values are separated by single spaces (e.g., between x1 and x2) and
some by multiple spaces (e.g., between x3 and x4).  Did the mail mess
up the spacing or is there some other way to tell where the omitted
values are?

-Bill
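
One way to look into this question, as a sketch only: list the character position at which each token starts on every line. If the starts line up under the header tokens, a positional reading is plausible; if they do not, the spacing alone cannot say which values were omitted. The sample text is taken from the error message earlier in the thread, and the names txt, lns and starts are illustrative.

```r
# Sample text as quoted in the error message earlier in the thread
txt <- "x1  x2  x3 x4\n1 B12 \n2       C23 \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         "

# One element per line of the input
lns <- strsplit(txt, "\n", fixed = TRUE)[[1]]

# Start position of every run of non-space characters on each line
starts <- lapply(gregexpr("[^ ]+", lns), as.integer)
names(starts) <- lns

starts
```

Comparing the first element (the header) with the others shows whether the data tokens begin at the same positions as the column names.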

On Mon, Feb 22, 2021 at 2:54 PM Val <[hidden email]> wrote:

>
> I Tried that one and it did not work. Please see the error message
> Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2       C23
> \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         ",
> :
>   more columns than column names

Re: Read

Val-17
That is my problem. The spacing between columns is not consistent.  It
may be a single space or multiple spaces (two or three).

On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap <[hidden email]> wrote:

>
> You said the column values were separated by space characters.
> Copying the text from gmail shows that some column names and column
> values are separated by single spaces (e.g., between x1 and x2) and
> some by multiple spaces (e.g., between x3 and x4.  Did the mail mess
> up the spacing or is there some other way to tell where the omitted
> values are?
>
> -Bill

Re: Read

jholtman
Try this:

> library(tidyverse)

> text <-  "x1  x2  x3 x4\n1 B12 \n2       C23 \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         "

> # read in the data as characters and replace multiple blanks with single blank
> input <- read_lines(text)

> input <- str_replace_all(input, ' +', ' ')

> mydata <- read_delim(input, ' ', col_names = TRUE)
Warning: 5 parsing failures.
row col  expected    actual         file
  1  -- 4 columns 3 columns literal data
  2  -- 4 columns 3 columns literal data
  4  -- 4 columns 3 columns literal data
  5  -- 4 columns 2 columns literal data
  6  -- 4 columns 3 columns literal data

> mydata
# A tibble: 6 x 4
     x1 x2    x3    x4
  <dbl> <chr> <chr> <lgl>
1     1 B12   NA    NA
2     2 C23   NA    NA
3   322 B32   D34   NA
4     4 D44   NA    NA
5    51 D53   NA    NA
6    60 D62   NA    NA
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Feb 22, 2021 at 4:49 PM Val <[hidden email]> wrote:

> That is my problem. The spacing between columns is not consistent.  It
>   may be  single space  or multiple spaces (two or three).

Re: Read

jholtman
I messed up; I did not see your 'desired' output. That will be hard, since
there is no consistent number of spaces that identifies the intended
column.  Do you have any hint as to how to interpret the spacing,
especially if you have several hundred more lines?  Is the output supposed
to be 'fixed' field?

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Feb 22, 2021 at 5:00 PM jim holtman <[hidden email]> wrote:


Re: Read

Val-17
Let us say the maximum number of spaces is two, and the output should
not be fixed field but preferably a CSV file.
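
Once the values sit in the right columns, writing the CSV is the easy part. A minimal sketch, assuming a hand-built data frame standing in for the correctly parsed result (dat and out_file are placeholder names, and only three rows are shown):

```r
# Hypothetical data frame standing in for the correctly parsed data
dat <- data.frame(
  x1 = c(1, 2, 322),
  x2 = c("B12", NA, "B32"),
  x3 = c(NA, "C23", NA),
  x4 = c(NA, NA, "D34"),
  stringsAsFactors = FALSE
)

# Placeholder path; replace with the real output file name
out_file <- tempfile(fileext = ".csv")

# Write missing values as empty fields and leave out the row names
write.csv(dat, out_file, row.names = FALSE, na = "")

# Read it back, treating empty fields as NA, to check the round trip
back <- read.csv(out_file, na.strings = "", stringsAsFactors = FALSE)
back
```

With na = "" the missing values become empty fields between commas, so the column positions survive in the file even where a value is absent.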

On Mon, Feb 22, 2021 at 8:05 PM jim holtman <[hidden email]> wrote:

>
> Messed up did not see your 'desired' output which will be hard since there is not a consistent number of spaces that would represent the desired column number.  Do you have any hit as to how to interpret the spacing especially you have several hundred more lines?  Is the output supposed to the 'fixed' field?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>

Re: Read

Jeff Newmiller
This gets it into a data frame. If you know which columns should be numeric you can convert them.

s <-
"x1  x2  x3 x4
1 B22
2         C33
322 B22      D34
4                 D44
51         D53
60 D62            
"

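# Read the text line by line and drop a trailing empty line, if any
tc <- textConnection( s )
lns <- readLines( tc )
close( tc )
if ( "" == lns[ length( lns ) ] )
  lns <- lns[ -length( lns ) ]

# Split each line on runs of spaces; the first element holds the header
L <- strsplit( lns, " +" )
n <- length( L[[ 1 ]] )
# Pad short rows on the right with NA so that every row has n fields
pad <- function( v ) if ( length( v ) < n ) c( v, rep( NA, n - length( v ) ) ) else v
m <- do.call( rbind, lapply( L[ -1 ], pad ) )
colnames( m ) <- L[[ 1 ]]
result <- as.data.frame( m, stringsAsFactors = FALSE )
result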
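
The conversion mentioned above could look roughly like this. It is a sketch on a small all-character data frame built by hand (chr_df, by_hand and guessed are illustrative names); type.convert() with a data frame argument needs a reasonably recent version of R.

```r
# Small all-character data frame, similar in shape to the result above
chr_df <- data.frame(
  x1 = c("1", "2", "322"),
  x2 = c("B12", "C23", "B32"),
  stringsAsFactors = FALSE
)

# Option 1: convert a column you know to be numeric by hand
by_hand <- chr_df
by_hand$x1 <- as.numeric(by_hand$x1)

# Option 2: let R guess the type of each column;
# as.is = TRUE keeps non-numeric columns as character instead of factor
guessed <- type.convert(chr_df, as.is = TRUE)

str(by_hand)
str(guessed)
```

Converting by hand is safer when a column may contain stray text, because as.numeric() turns anything it cannot parse into NA with a warning rather than silently leaving the column as character.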

--
Sent from my phone. Please excuse my brevity.

Re: Read

jholtman
It looks like we can look at the last digit of the data and that would
be the column number; is that correct?

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: Read

R help mailing list-2
This discussion is a bit weird, so can we step back?

Someone wants help on how to read in a file that apparently was not written
following one of several consistent sets of rules.

If it was fixed width, R has functions that can read that.

If it was separated by commas, tabs, single spaces, arbitrary whitespace,
with or without a header line, we have functions that can read that if
properly called.
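
To illustrate the regular cases, here is a rough sketch with made-up input (the strings and column names below are assumptions for the example, not the poster's file). `read.fwf()` handles fixed-width text and `read.table()` handles whitespace-separated text with a header line:

```r
# Fixed width: each field occupies the same character positions on every line
fw_lines <- c("  1B22   ",
              "322   D34")
tc <- textConnection(fw_lines)
fixed <- read.fwf(tc, widths = c(3, 3, 3),
                  col.names = c("x1", "x2", "x3"),
                  strip.white = TRUE, na.strings = "",
                  stringsAsFactors = FALSE)
close(tc)

# Arbitrary whitespace, header line, and an explicit NA placeholder per gap
ws <- read.table(text = "x1 x2  x3
1  B22 NA
322 NA   D34", header = TRUE, stringsAsFactors = FALSE)

fixed
ws
```

Both calls rely on the file following one rule throughout: fixed positions in the first case, one token per column in the second.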

ALL the above normally assume that all the resulting columns are the same
length. If any are meant to be shorter, you still leave the separators in
place and put some NA or similar into the result. And, the functions we
normally talk about do NOT read in and produce multiple vectors but
something like a data.frame.
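
As a small example of leaving the separators in place, the CSV text below (made up from the sample values in this thread) keeps a comma for every empty field, so `read.csv()` can fill in the NA values itself:

```r
# Every row has three commas, even where a field is empty
csv_text <- "x1,x2,x3,x4
1,B22,,
2,,C33,
322,B22,,D34"

# Empty fields become NA in character columns via na.strings = ""
dat <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)
dat
```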

So the choice is either to make sure the darn data is in a consistent
format, or try a different plan. Fair enough?

Some are suggesting parsing it yourself line by line. Certainly that can be
done. But unless you know some schema to help you disambiguate, what do you
do if you reach a row that is too short and has enough data for only two
columns? Which of the columns do you assign it to? If you had a clear rule, ...
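
A quick way to see the ambiguity, as a sketch with two made-up rows: once you split on runs of spaces, a value meant for the second column and one meant for a later column look identical. Only the character offsets differ, and whether those offsets mean anything depends on how the file was written.

```r
# Two rows where the second value sits at different horizontal positions
rows <- c("2 C33", paste0("2", strrep(" ", 9), "C33"))

# Splitting on runs of spaces yields identical tokens for both rows
tokens <- strsplit(rows, " +")
identical(tokens[[1]], tokens[[2]])

# The start position of each token survives only if we ask for it explicitly
starts <- lapply(gregexpr("[^ ]+", rows), as.integer)
starts
```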

And what if you have different data types? R does not handle that within a
single vector, that is, a single column of a data.frame, albeit it can if
you make it a list column.

If this data is a one-time thing, perhaps it should be copied into something
like Excel by a human and edited so every column is filled as you wish, and
THEN saved as something like a CSV file. Then it can happily be imported
the usual way, including NA values as needed.

If the person really wants 4 independent vectors of different lengths to
read in, there are plenty of ways to do that and no need to lump them in
this odd format.




Re: Read

jholtman
This gives the desired output:

> library(tidyverse)
> text <-  "x1  x2  x3 x4\n1 B12 \n2       C23 \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         "
>
> # read in the data as characters and split to a list
> input <- str_split(str_trim(read_lines(text)), ' +')
>
> max_cols <- 4  # assume a max of 4 columns
>
> # put data in the correct column
> x_matrix <- do.call(rbind, map(input, ~{
+   result <- character(max_cols)
+   result[1] <- .x[1]
+   for (i in 2:length(.x)){
+     result[as.integer(str_sub(.x[i], -1))] <- .x[i]
+   }
+   result
+ }))
>
> # now add commas to convert to CSV
> x_csv <- apply(x_matrix, 1, paste, collapse = ',')
>
> # now read in and create desired output
> read_csv(x_csv)
# A tibble: 6 x 4
     x1 x2    x3    x4
  <dbl> <chr> <chr> <chr>
1     1 B12   <NA>  <NA>
2     2 <NA>  C23   <NA>
3   322 B32   <NA>  D34
4     4 <NA>  <NA>  D44
5    51 <NA>  D53   <NA>
6    60 D62   <NA>  <NA>
>
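
For anyone who prefers to avoid the tidyverse dependency, the same idea can be sketched in base R. It rests on the same assumption as above, namely that the last character of each value after the first is its column number; `txt`, `place_row` and `dat` are names made up for this example.

```r
txt <- "x1  x2  x3 x4\n1 B12 \n2       C23 \n322 B32      D34 \n4            D44 \n51     D53\n60 D62         "

# Split into lines, trim them, and tokenize on runs of spaces
lines  <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
tokens <- strsplit(lines, " +")
n_cols <- length(tokens[[1]])   # the header defines the number of columns

# Place each value in the column named by its last character
place_row <- function(v) {
  out <- rep(NA_character_, n_cols)
  out[1] <- v[1]                # the first token always goes to column 1
  rest <- v[-1]
  idx <- as.integer(substring(rest, nchar(rest)))
  out[idx] <- rest
  out
}

m <- do.call(rbind, lapply(tokens[-1], place_row))
colnames(m) <- tokens[[1]]
dat <- type.convert(as.data.frame(m, stringsAsFactors = FALSE), as.is = TRUE)
dat
```

If the real data do not carry the column number in the value, this rule does not apply and some other way of locating the missing fields is needed.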



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

