Expanding data ...

John Poulsen
Hello,

I have a dataset that has counts, but I need to expand the dataset so
that each of the counts has its own line in the dataset (row) and is
given an ID.  It looks something like:

Site Type Cnt
1 "A" 3
1 "B" 0
2 "C" 2

I want the dataset to look like:

Site Type ID
1 "A" 1
1 "A" 2
1 "A" 3
1 "B" 0
2 "C" 1
2 "C" 2

I can do this using loops, but I was wondering if anyone knows a more
efficient way of expanding the data on counts and giving id numbers.

Thanks for your help,
John

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Expanding data ...

Philipp Pagel-5
On Sun, Nov 16, 2008 at 07:31:04AM -0500, John Poulsen wrote:

> I have a dataset that has counts, but I need to expand the dataset so
> that each of the counts has its own line in the dataset (row) and is
> given an ID.  It looks something like:
>
> Site Type Cnt
> 1 "A" 3
> 1 "B" 0
> 2 "C" 2
>
> I want the dataset to look like:
>
> Site Type ID
> 1 "A" 1
> 1 "A" 2
> 1 "A" 3
> 1 "B" 0
> 2 "C" 1
> 2 "C" 2
>
> I can do this using loops, but I was wondering if anyone knows a more  
> efficient way of expanding the data on counts and giving id numbers.

The following will almost do what you want:

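# create example data (stringsAsFactors = TRUE so that 'type' is a factor,
# as in the str() output below; this was the default before R 4.0.0)
df <- data.frame(site = c(1, 1, 2), type = c('A', 'B', 'C'), cnt = c(3, 0, 2),
                 stringsAsFactors = TRUE)

# expand according to cnt column: repeat each row index cnt times
df2 <- df[rep(seq_len(nrow(df)), times = df$cnt), ]
# generate ID column: number the rows 1..n within each type
# (relies on the rows being grouped by type, in factor-level order)
df2$ID <- unlist(tapply(df2$cnt, df2$type, seq_along))
# get rid of cnt column
df2$cnt <- NULL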
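
As an alternative sketch (not part of the solution above), the ID column
could also be built with base R's sequence(), which numbers the copies of
each original row directly and so does not depend on how the rows are
ordered. The name df3 is made up for this illustration:

```r
# same example data as above
df <- data.frame(site = c(1, 1, 2), type = c("A", "B", "C"), cnt = c(3, 0, 2),
                 stringsAsFactors = TRUE)

# repeat each row index cnt times; rows with cnt == 0 drop out
df3 <- df[rep(seq_len(nrow(df)), times = df$cnt), c("site", "type")]
# sequence(c(3, 0, 2)) concatenates 1:3, nothing, and 1:2
df3$ID <- sequence(df$cnt)
# replace the duplicated row names (1, 1.1, ...) with plain 1..n
rownames(df3) <- NULL
```

Because the IDs come straight from the cnt vector, this also works when the
same type appears at more than one site.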


There is one major difference from your example above: as Type 'B' has zero
counts, it will not occur in the expanded dataset, which seems the right thing
to do to me. Keeping a row for zero counts and assigning an ID of 0 is
inconsistent with how positive counts are treated. But factor 'type' still has
level 'B', even though it no longer occurs in the actual data:

> str(df2)
'data.frame':   5 obs. of  3 variables:
 $ site: num  1 1 1 2 2
 $ type: Factor w/ 3 levels "A","B","C": 1 1 1 3 3
 $ ID  : int  1 2 3 1 2
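
If the unused level gets in the way later (for example in table() output),
it can be dropped. A small sketch, assuming R 2.12.0 or later, where
droplevels() is available; the vector below only mimics the 'type' column
of the expanded data:

```r
# a factor that still carries the unused level "B"
type <- factor(c("A", "A", "A", "C", "C"), levels = c("A", "B", "C"))
levels(type)               # "A" "B" "C"

# remove levels that no longer occur in the data
type <- droplevels(type)
levels(type)               # "A" "C"
```

droplevels() also accepts a whole data frame and then cleans every factor
column in it.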

Maybe this already solves your problem. If not: why do you want special
treatment of empty categories? Maybe you can use this solution and take care of
the zero counts in a different way than you had planned, originally?
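
If the zero-count rows really must stay in the result with an ID of 0, as in
the original request, one possible sketch is to repeat such rows exactly once
and overwrite their ID afterwards. The names reps, idx and df4 are made up
for this illustration:

```r
df <- data.frame(site = c(1, 1, 2), type = c("A", "B", "C"), cnt = c(3, 0, 2),
                 stringsAsFactors = TRUE)

# repeat every row at least once so that zero-count rows survive
reps <- pmax(df$cnt, 1)
idx  <- rep(seq_len(nrow(df)), times = reps)
df4  <- df[idx, c("site", "type")]

# running number within each original row ...
df4$ID <- sequence(reps)
# ... except for rows that came from a zero count, which get ID 0
df4$ID[df$cnt[idx] == 0] <- 0L
rownames(df4) <- NULL
```

This gives six rows, with the single 'B' row carrying ID 0.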

cu
        Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
