reshaping column items into rows per unique ID

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

reshaping column items into rows per unique ID

Allaisone 1
Hi All

I have a datafram which looks like this :

CustomerID    DietType
1                           a
1                            c
1                            b
2                            f
2                             a
3                             j
4                             c
4                             c
4                              f

And I would like to reshape this so I can see the list of DietTypes per customer in rows instead of columns like this :

> MyDf
CustomerID      DietType   DietType  DietType
1                                a            c               b
2                                 f             a
3                                 j
4                                 c              c             f

I tried many times using melt(),spread (),and dcast () functions but was not able to produce the desired table. The best attempt was by typing :

# 1) Adding new column with unique values:
MyDf $newcol <- c (1:9)
#2) then :
NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)

This produces the desired table but with many NA values like this :

CustomerID    1   2   3    4     5    6     7   8   9
1                    a  c    b   NA NA NA NA NA NA
2                  NA NA NA  f     a  NA NA NA NA
3                  NA NA NA NA NA  j   NA NA NA
4                  NA NA NA NA NA NA c     c     f

  As you see, the lette/s indicating DietType move to the right side each time we move down leaving many NA values and as my original files is very large, I expect that the final output would contain around 800,000 columns and 70,000 rows. This is why my code works with small data but does not work with my large file because of memory issue even though I'm using large PC.

What changes I need to do with my code to produce the desired table where the list of DietTypes are grouped in rows exactly like the second table shown abover?

Regards
Allaisnoe

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: reshaping column items into rows per unique ID

Bert Gunter-2
I believe you need to spend time with an R tutorial or two: a data frame
(presumably the "table" data structure you describe) can *not* contain
"blanks" -- all columns must be the same length, which means NA's are
filled in as needed.

Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any
local version of R (maybe it would in some server version -- others more
knowledgeable should comment on this).

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <[hidden email]> wrote:

> Hi All
>
> I have a datafram which looks like this :
>
> CustomerID    DietType
> 1                           a
> 1                            c
> 1                            b
> 2                            f
> 2                             a
> 3                             j
> 4                             c
> 4                             c
> 4                              f
>
> And I would like to reshape this so I can see the list of DietTypes per
> customer in rows instead of columns like this :
>
> > MyDf
> CustomerID      DietType   DietType  DietType
> 1                                a            c               b
> 2                                 f             a
> 3                                 j
> 4                                 c              c             f
>
> I tried many times using melt(),spread (),and dcast () functions but was
> not able to produce the desired table. The best attempt was by typing :
>
> # 1) Adding new column with unique values:
> MyDf $newcol <- c (1:9)
> #2) then :
> NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
>
> This produces the desired table but with many NA values like this :
>
> CustomerID    1   2   3    4     5    6     7   8   9
> 1                    a  c    b   NA NA NA NA NA NA
> 2                  NA NA NA  f     a  NA NA NA NA
> 3                  NA NA NA NA NA  j   NA NA NA
> 4                  NA NA NA NA NA NA c     c     f
>
>   As you see, the lette/s indicating DietType move to the right side each
> time we move down leaving many NA values and as my original files is very
> large, I expect that the final output would contain around 800,000 columns
> and 70,000 rows. This is why my code works with small data but does not
> work with my large file because of memory issue even though I'm using large
> PC.
>
> What changes I need to do with my code to produce the desired table where
> the list of DietTypes are grouped in rows exactly like the second table
> shown abover?
>
> Regards
> Allaisnoe
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: reshaping column items into rows per unique ID

Eric Berger
Hi Allaisone,
I took a slightly different approach but you might find this either as or
more useful than your approach, or at least a start on the path to a
solution you need.

df1   <-
data.frame(CustId=c(1,1,1,2,3,3,4,4,4),DietType=c("a","c","b","f","a","j","c","c","f"),
                    stringsAsFactors=FALSE)
custs <- unique(df1$CustId)
dtype <- unique(df1$DietType)
nc    <- length(custs)
nd    <- length(dtype)
df2   <- as.data.frame( matrix(rep(0,nc*(nd+1)),nrow=nc),
stringsAsFactors=FALSE)
colnames(df2) <- c("CustId",dtype[order(dtype)])
df2$CustId <- custs[ order(custs) ]

for ( i in 1:nrow(df1) ) {
  iRow <- match(df1$CustId[i],df2$CustId)
  iCol <- match(df1$DietType[i],colnames(df2))
  df2[ iRow, iCol ] <- df2[ iRow, iCol] + 1
}

> df2
#       CustId   a  b  c  f   j
# 1             1  1  1  1  0  0
# 2              2  0 0  0  0  0
# 3              3  1 0  0  0  1
# 4              4  0 0  2  1  0

The dataframe df2 will have a column for the CustId and one column for each
unique diet type.
Each row is a unique customerId, and each entry contains the number of
times the given diet type occurred for that customer.

I hope that helps,
Eric



On Sun, Feb 25, 2018 at 7:08 PM, Bert Gunter <[hidden email]> wrote:

> I believe you need to spend time with an R tutorial or two: a data frame
> (presumably the "table" data structure you describe) can *not* contain
> "blanks" -- all columns must be the same length, which means NA's are
> filled in as needed.
>
> Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any
> local version of R (maybe it would in some server version -- others more
> knowledgeable should comment on this).
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <[hidden email]>
> wrote:
>
> > Hi All
> >
> > I have a datafram which looks like this :
> >
> > CustomerID    DietType
> > 1                           a
> > 1                            c
> > 1                            b
> > 2                            f
> > 2                             a
> > 3                             j
> > 4                             c
> > 4                             c
> > 4                              f
> >
> > And I would like to reshape this so I can see the list of DietTypes per
> > customer in rows instead of columns like this :
> >
> > > MyDf
> > CustomerID      DietType   DietType  DietType
> > 1                                a            c               b
> > 2                                 f             a
> > 3                                 j
> > 4                                 c              c             f
> >
> > I tried many times using melt(),spread (),and dcast () functions but was
> > not able to produce the desired table. The best attempt was by typing :
> >
> > # 1) Adding new column with unique values:
> > MyDf $newcol <- c (1:9)
> > #2) then :
> > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
> >
> > This produces the desired table but with many NA values like this :
> >
> > CustomerID    1   2   3    4     5    6     7   8   9
> > 1                    a  c    b   NA NA NA NA NA NA
> > 2                  NA NA NA  f     a  NA NA NA NA
> > 3                  NA NA NA NA NA  j   NA NA NA
> > 4                  NA NA NA NA NA NA c     c     f
> >
> >   As you see, the lette/s indicating DietType move to the right side each
> > time we move down leaving many NA values and as my original files is very
> > large, I expect that the final output would contain around 800,000
> columns
> > and 70,000 rows. This is why my code works with small data but does not
> > work with my large file because of memory issue even though I'm using
> large
> > PC.
> >
> > What changes I need to do with my code to produce the desired table where
> > the list of DietTypes are grouped in rows exactly like the second table
> > shown abover?
> >
> > Regards
> > Allaisnoe
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: reshaping column items into rows per unique ID

Peter Dalgaard-2
It depends quite strongly on what you want to do with the result, but I wonder if what is really needed might be a list of diettypes per person, i.e. continuing from Eric's code

> On 25 Feb 2018, at 18:56 , Eric Berger <[hidden email]> wrote:
>
> Hi Allaisone,
> I took a slightly different approach but you might find this either as or
> more useful than your approach, or at least a start on the path to a
> solution you need.
>
> df1   <-
> data.frame(CustId=c(1,1,1,2,3,3,4,4,4),DietType=c("a","c","b","f","a","j","c","c","f"),
>                    stringsAsFactors=FALSE)

> with(df1, tapply(DietType, CustId, list))
$`1`
[1] "a" "c" "b"

$`2`
[1] "f"

$`3`
[1] "a" "j"

$`4`
[1] "c" "c" "f"

or maybe get rid of duplicates with

> with(df1, tapply(DietType, CustId, unique))
$`1`
[1] "a" "c" "b"

$`2`
[1] "f"

$`3`
[1] "a" "j"

$`4`
[1] "c" "f"



--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: reshaping column items into rows per unique ID

Jim Lemon-4
In reply to this post by Allaisone 1
Hi Allaisone,
If you want a data frame as the output you will have to put up with a
few NA values unless each Customer has the same number of diet types:

a1df<-read.table(text="CustomerID    DietType
1                           a
1                            c
1                            b
2                            f
2                             a
3                             j
4                             c
4                             c
4                              f",
header=TRUE,stringsAsFactors=FALSE)
library(prettyR)
stretch_df(a1df,"CustomerID","DietType")
  CustomerID DietType_1 DietType_2 DietType_3
1          1          a          c          b
2          2          f          a       <NA>
3          3          j       <NA>       <NA>
4          4          c          c          f

Jim


On Sun, Feb 25, 2018 at 11:59 PM, Allaisone 1 <[hidden email]> wrote:

> Hi All
>
> I have a datafram which looks like this :
>
> CustomerID    DietType
> 1                           a
> 1                            c
> 1                            b
> 2                            f
> 2                             a
> 3                             j
> 4                             c
> 4                             c
> 4                              f
>
> And I would like to reshape this so I can see the list of DietTypes per customer in rows instead of columns like this :
>
>> MyDf
> CustomerID      DietType   DietType  DietType
> 1                                a            c               b
> 2                                 f             a
> 3                                 j
> 4                                 c              c             f
>
> I tried many times using melt(),spread (),and dcast () functions but was not able to produce the desired table. The best attempt was by typing :
>
> # 1) Adding new column with unique values:
> MyDf $newcol <- c (1:9)
> #2) then :
> NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
>
> This produces the desired table but with many NA values like this :
>
> CustomerID    1   2   3    4     5    6     7   8   9
> 1                    a  c    b   NA NA NA NA NA NA
> 2                  NA NA NA  f     a  NA NA NA NA
> 3                  NA NA NA NA NA  j   NA NA NA
> 4                  NA NA NA NA NA NA c     c     f
>
>   As you see, the lette/s indicating DietType move to the right side each time we move down leaving many NA values and as my original files is very large, I expect that the final output would contain around 800,000 columns and 70,000 rows. This is why my code works with small data but does not work with my large file because of memory issue even though I'm using large PC.
>
> What changes I need to do with my code to produce the desired table where the list of DietTypes are grouped in rows exactly like the second table shown abover?
>
> Regards
> Allaisnoe
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: reshaping column items into rows per unique ID

Allaisone 1
Many thanks all for your very helpful replays.

I wasn't aware by stretch_df function in PrettyR package!.. It has produced exactly what I want in a very simple and efficient way!. Thank you so much Jim.

Kind Regards
Allaisone

________________________________________
From: Jim Lemon <[hidden email]>
Sent: 25 February 2018 21:48:51
To: Allaisone 1; r-help mailing list
Subject: Re: [R] reshaping column items into rows per unique ID

Hi Allaisone,
If you want a data frame as the output you will have to put up with a
few NA values unless each Customer has the same number of diet types:

a1df<-read.table(text="CustomerID    DietType
1                           a
1                            c
1                            b
2                            f
2                             a
3                             j
4                             c
4                             c
4                              f",
header=TRUE,stringsAsFactors=FALSE)
library(prettyR)
stretch_df(a1df,"CustomerID","DietType")
  CustomerID DietType_1 DietType_2 DietType_3
1          1          a          c          b
2          2          f          a       <NA>
3          3          j       <NA>       <NA>
4          4          c          c          f

Jim


On Sun, Feb 25, 2018 at 11:59 PM, Allaisone 1 <[hidden email]> wrote:

> Hi All
>
> I have a datafram which looks like this :
>
> CustomerID    DietType
> 1                           a
> 1                            c
> 1                            b
> 2                            f
> 2                             a
> 3                             j
> 4                             c
> 4                             c
> 4                              f
>
> And I would like to reshape this so I can see the list of DietTypes per customer in rows instead of columns like this :
>
>> MyDf
> CustomerID      DietType   DietType  DietType
> 1                                a            c               b
> 2                                 f             a
> 3                                 j
> 4                                 c              c             f
>
> I tried many times using melt(),spread (),and dcast () functions but was not able to produce the desired table. The best attempt was by typing :
>
> # 1) Adding new column with unique values:
> MyDf $newcol <- c (1:9)
> #2) then :
> NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType)
>
> This produces the desired table but with many NA values like this :
>
> CustomerID    1   2   3    4     5    6     7   8   9
> 1                    a  c    b   NA NA NA NA NA NA
> 2                  NA NA NA  f     a  NA NA NA NA
> 3                  NA NA NA NA NA  j   NA NA NA
> 4                  NA NA NA NA NA NA c     c     f
>
>   As you see, the lette/s indicating DietType move to the right side each time we move down leaving many NA values and as my original files is very large, I expect that the final output would contain around 800,000 columns and 70,000 rows. This is why my code works with small data but does not work with my large file because of memory issue even though I'm using large PC.
>
> What changes I need to do with my code to produce the desired table where the list of DietTypes are grouped in rows exactly like the second table shown abover?
>
> Regards
> Allaisnoe
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.