Arranging column data to create plots


R help mailing list-2
Dear All,

I need some help arranging data that was imported.

The imported data frame looks something like this (the actual file is huge, so this is example data):

DF:
IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
Name1  21  15  25  10
Name2  15  18  35  24  27  45
Name3  17  21  30  22  15  40  32  55

I would like to create a new data frame laid out as follows:

NewDF:
IDKey   X   Y
Name1  21  15
Name1  25  10
Name2  15  18
Name2  35  24
Name2  27  45
Name3  17  21
Name3  30  22
Name3  15  40
Name3  32  55

With the data like this I think I can do the following:

ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line

and get 3 lines with the various number of points.

The point is that each of the XY pairs is a data point tied to NameX.  I would like to rearrange the data so I can plot the points/lines by the IDKey.  There will be at least 2 points, but the number of points for each IDKey can be as many as 4.

I have tried using the gather() function from the tidyverse package, but I can't make it work.  The issue is that I believe I need two separate gather statements (one for X, another for Y) to consolidate the data.  This causes the pairs to not stay together and the data becomes jumbled.

Thoughts?
Thanks for your help

Michael E. Reed


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Arranging column data to create plots

Ulrik Stervbo-2
Hi Michael,

Try gather() from the tidyr package.

HTH
Ulrik

Michael Reed via R-help <[hidden email]> wrote on Sun, 16 July
2017, 10:19:

> Dear All,
>
> I need some help arranging data that was imported.
>
> The imported data frame looks something like this (the actual file is
> huge, so this is example data)
>
> DF:
> IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
> Name1  21  15  25  10
> Name2  15  18  35  24  27  45
> Name3  17  21  30  22  15  40  32  55
>
> I would like to create a new data frame with the following
>
> NewDF:
> IDKey   X   Y
> Name1  21  15
> Name1  25  10
> Name2  15  18
> Name2  35  24
> Name2  27  45
> Name3  17  21
> Name3  30  22
> Name3  15  40
> Name3  32  55
>
> With the data like this I think I can do the following
>
> ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line
>
> and get 3 lines with the various number of points.
>
> The point is that each of the XY pairs is a data point tied to NameX.  I
> would like to rearrange the data so I can plot the points/lines by the
> IDKey.  There will be at least 2 points, but the number of points for each
> IDKey can be as many as 4.
>
> I have tried using the gather() function from the tidyverse package, but I
> can't make it work.  The issue is that I believe I need two separate gather
> statements (one for X, another for Y) to consolidate the data.  This causes
> the pairs to not stay together and the data becomes jumbled.
>
> Thoughts
> Thanks for your help
>
> Michael E. Reed

Re: Arranging column data to create plots

Jeff Newmiller
On Sat, 15 Jul 2017, Michael Reed via R-help wrote:

> Dear All,
>
> I need some help arranging data that was imported.

It would be helpful if you were to use dput to give us the sample data
since you say you have already imported it.
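As a rough illustration of that suggestion, the sketch below builds a small stand-in data frame (only a few of the example columns, typed in by hand, so not the poster's real import) and shows that dput() prints R code which recreates the object, NA values included. The round trip through capture.output() is only there to demonstrate that nothing is lost.

```r
# Small stand-in for the imported data; the values come from the example
# table in the question, the column subset is an assumption for brevity.
DF <- data.frame(
  IDKey = c("Name1", "Name2", "Name3"),
  X1 = c(21L, 15L, 17L),
  Y1 = c(15L, 18L, 21L),
  X3 = c(NA, 27L, 15L),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object, so it can be pasted
# straight into a mailing-list post.
dput(DF)

# Round trip: capture that text, evaluate it, and compare with the original.
txt <- paste(capture.output(dput(DF)), collapse = "\n")
DF_copy <- eval(parse(text = txt))
all.equal(DF, DF_copy)
```

Pasting the dput() output removes any guessing about column types or about how blank cells were read in.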

> The imported data frame looks something like this (the actual file is
> huge, so this is example data)
>
> DF:
> IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
> Name1  21  15  25  10
> Name2  15  18  35  24  27  45
> Name3  17  21  30  22  15  40  32  55

The X3 etc. values are blank in this listing, but they would be NA in an
actual data frame, so I don't know if my workaround was the same as your
workaround. dput would have clarified the starting point.

> I would like to create a new data frame with the following
>
> NewDF:
> IDKey   X   Y
> Name1  21  15
> Name1  25  10
> Name2  15  18
> Name2  35  24
> Name2  27  45
> Name3  17  21
> Name3  30  22
> Name3  15  40
> Name3  32  55
>
> With the data like this I think I can do the following
>
> ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line

You are missing parentheses. If you use the reprex library to test your
examples before posting them, you can be sure your simple errors don't
send us off on wild goose chases.
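For reference, one way the plotting call could be written with the parentheses balanced is sketched below. NewDF is typed in from the desired layout in the question; adding geom_point() is an extra that the original call did not have, included here only so the individual points are visible.

```r
library(ggplot2)

# NewDF as laid out in the question (the desired long format).
NewDF <- data.frame(
  IDKey = rep(c("Name1", "Name2", "Name3"), times = c(2, 3, 4)),
  X = c(21, 25, 15, 35, 27, 17, 30, 15, 32),
  Y = c(15, 10, 18, 24, 45, 21, 22, 40, 55)
)

# aes() is closed before the "+", and geom_line is called with "()".
p <- ggplot(NewDF, aes(x = X, y = Y, color = IDKey)) +
  geom_line() +
  geom_point()
print(p)
```

Note that geom_line() connects points in order of X. If the points should instead be joined in the order they appear in the data, geom_path() may be the better fit.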

> and get 3 lines with the various number of points.
>
> The point is that each of the XY pairs is a data point tied to NameX.
> I would like to rearrange the data so I can plot the points/lines by the
> IDKey.  There will be at least 2 points, but the number of points for
> each IDKey can be as many as 4.
>
> I have tried using the gather() function from the tidyverse package, but

The tidyverse package is a virtual package that pulls in many packages.
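A quick way to check which package a function actually comes from is sketched here. It assumes tidyr is installed and loads only that package rather than the whole tidyverse.

```r
# Loading tidyr alone is enough to get gather(); library(tidyverse) would
# attach it as well, along with several other packages.
library(tidyr)

# A function's enclosing environment is the namespace that defines it.
environmentName(environment(gather))
```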

> I can't make it work.  The issue is that I believe I need two separate
> gather statements (one for X, another for Y) to consolidate the data.
> This causes the pairs to not stay together and the data becomes jumbled.

No, what you need is a gather-spread.

######
library(dplyr)
library(tidyr)

DF <- read.table( text=
"IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
Name1   21  15  25  10  NA  NA  NA  NA
Name2   15  18  35  24  27  45  NA  NA
Name3   17  21  30  22  15  40  32  55
", header=TRUE, as.is=TRUE )

NewDF <- (   DF
          %>% gather( XY, value, -IDKey )
          %>% separate( XY, c( "Coord", "Num" ), 1 )
          %>% spread( Coord, value )
          %>% filter( !is.na( X ) & !is.na( Y ) )
          )
######
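To see why the pairs stay together with this approach, it may help to look at the intermediate stages one at a time. The sketch below repeats the same calls without the pipe; the names long and wide are made up for this illustration.

```r
library(tidyr)

DF <- read.table(text = "IDKey X1 Y1 X2 Y2 X3 Y3 X4 Y4
Name1 21 15 25 10 NA NA NA NA
Name2 15 18 35 24 27 45 NA NA
Name3 17 21 30 22 15 40 32 55
", header = TRUE, as.is = TRUE)

# Stage 1: one row per (IDKey, original column) combination.
long <- gather(DF, XY, value, -IDKey)

# Stage 2: split e.g. "X1" after its first character into
# Coord = "X" and Num = "1".
long <- separate(long, XY, c("Coord", "Num"), sep = 1)
head(long)

# Stage 3: the Coord values become columns. IDKey and Num identify each
# row, so every X lands next to the Y that shares its Num.
wide <- spread(long, Coord, value)
head(wide)
```

The Num column is what a pair of separate gather() calls would lose: it records which X belongs to which Y.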

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


Re: Arranging column data to create plots

Jeff Newmiller
Correction at the end.

On Sun, 16 Jul 2017, Jeff Newmiller wrote:

> On Sat, 15 Jul 2017, Michael Reed via R-help wrote:
>
>> Dear All,
>>
>> I need some help arranging data that was imported.
>
> It would be helpful if you were to use dput to give us the sample data since
> you say you have already imported it.
>
>> The imported data frame looks something like this (the actual file is huge,
>> so this is example data)
>>
>> DF:
>> IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
>> Name1  21  15  25  10
>> Name2  15  18  35  24  27  45
>> Name3  17  21  30  22  15  40  32  55
>
> That data is missing in X3 etc, but would be NA in an actual data frame, so I
> don't know if my workaround was the same as your workaround. Dput
> would have clarified the starting point.
>
>> I would like to create a new data frame with the following
>>
>> NewDF:
>> IDKey   X   Y
>> Name1  21  15
>> Name1  25  10
>> Name2  15  18
>> Name2  35  24
>> Name2  27  45
>> Name3  17  21
>> Name3  30  22
>> Name3  15  40
>> Name3  32  55
>>
>> With the data like this I think I can do the following
>>
>> ggplot(NewDF, aes(x=X, y=Y, color=IDKey) + geom_line
>
> You are missing parentheses. If you use the reprex library to test your
> examples before posting them, you can be sure your simple errors don't send
> us off on wild goose chases.
>
>> and get 3 lines with the various number of points.
>>
>> The point is that each of the XY pairs is a data point tied to NameX. I
>> would like to rearrange the data so I can plot the points/lines by the
>> IDKey.  There will be at least 2 points, but the number of points for each
>> IDKey can be as many as 4.
>>
>> I have tried using the gather() function from the tidyverse package, but
>
> The tidyverse package is a virtual package that pulls in many packages.
>
>> I can't make it work.  The issue is that I believe I need two separate
>> gather statements (one for X, another for Y) to consolidate the data. This
>> causes the pairs to not stay together and the data becomes jumbled.
>
> No, what you need is a gather-spread.
>
> ######
> library(dplyr)
> library(tidyr)
>
> DF <- read.table( text=
> "IDKey  X1  Y1  X2  Y2  X3  Y3  X4  Y4
> Name1   21  15  25  10  NA  NA  NA  NA
> Name2   15  18  35  24  27  45  NA  NA
> Name3   17  21  30  22  15  40  32  55
> ", header=TRUE, as.is=TRUE )
>
> NewDF <- (   DF
>         %>% gather( XY, value, -IDKey )
>         %>% separate( XY, c( "Coord", "Num" ), 1 )
>         %>% spread( Coord, value )
>         %>% filter( !is.na( X ) & !is.na( Y ) )
>         )
> ######

Sorry, should have practiced what I preached...

##########
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)

DF <- structure(list(
  IDKey = c("Name1", "Name2", "Name3"),
  X1 = c(21L, 15L, 17L), Y1 = c(15L, 18L, 21L),
  X2 = c(25L, 35L, 30L), Y2 = c(10L, 24L, 22L),
  X3 = c(NA, 27L, 15L),  Y3 = c(NA, 45L, 40L),
  X4 = c(NA, NA, 32L),   Y4 = c(NA, NA, 55L)),
  .Names = c("IDKey", "X1", "Y1", "X2", "Y2", "X3", "Y3", "X4", "Y4"),
  class = "data.frame", row.names = c(NA, -3L))

NewDF <- (   DF
          %>% gather( XY, value, -IDKey )
          %>% separate( XY, c( "Coord", "Num" ), 1 )
          %>% spread( Coord, value )
          %>% filter( !is.na( X ) & !is.na( Y ) )
          )
NewDF
#>   IDKey Num  X  Y
#> 1 Name1   1 21 15
#> 2 Name1   2 25 10
#> 3 Name2   1 15 18
#> 4 Name2   2 35 24
#> 5 Name2   3 27 45
#> 6 Name3   1 17 21
#> 7 Name3   2 30 22
#> 8 Name3   3 15 40
#> 9 Name3   4 32 55
##########
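For readers on a newer tidyr (1.0.0 or later), the same reshaping can probably be done in a single call with pivot_longer(). This is an alternative that was not part of the thread, sketched on the assumption that every column name is an X or a Y followed by a number.

```r
library(tidyr)

DF <- data.frame(
  IDKey = c("Name1", "Name2", "Name3"),
  X1 = c(21L, 15L, 17L), Y1 = c(15L, 18L, 21L),
  X2 = c(25L, 35L, 30L), Y2 = c(10L, 24L, 22L),
  X3 = c(NA, 27L, 15L),  Y3 = c(NA, 45L, 40L),
  X4 = c(NA, NA, 32L),   Y4 = c(NA, NA, 55L),
  stringsAsFactors = FALSE
)

# ".value" says that the first captured group ("X" or "Y") names the
# output column; the second group (the digits) goes into "Num".
NewDF <- pivot_longer(
  DF,
  cols = -IDKey,
  names_to = c(".value", "Num"),
  names_pattern = "([XY])(\\d+)",
  values_drop_na = TRUE  # drop rows whose X and Y are both NA
)
NewDF
```

Num is kept as a character column here, as in the gather-spread version. It can be converted with as.integer() if a numeric index is wanted.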
