how to deduplicate records, e.g. using melt() and cast()

Karl Brand
Esteemed UseRs,

This must be embarrassingly trivial to achieve with, e.g., melt() and
cast(): deduplicating records ("pw.X" in the example) for a given set of
responses ("cond.Y" in the example).

Hopefully the runnable example shows clearly what I have and what I'm
trying to convert it to. But I'm just not getting it (?cast, that is)! So
I'd really appreciate someone's patience in clarifying this, using the
reshape package or any other approach.

With sincere thanks in advance,

Karl


## Runnable example
## The data.frame I have:
library("reshape")
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))
my.df
## The data.frame I want:
wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
                        cond.one = c(0.5, 0.4, NA),
                        cond.two = c(0.6, 0.9, 0.2),
                        cond.three = c(NA, 0.1, NA))
wanted.df


--
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to deduplicate records, e.g. using melt() and cast()

PIKAL Petr
Hi

I would vote for aggregate():

> aggregate(my.df[,-1], list(pathway=my.df$pathway), mean, na.rm=TRUE)
  pathway cond.one cond.two cond.three
1    pw.A      0.5      0.6        NaN
2    pw.B      0.4      0.9        0.1
3    pw.C      NaN      0.2        NaN
>
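Note that mean() of an all-missing group returns NaN rather than NA, so the output above differs slightly from wanted.df. A minimal sketch of one way to turn those cells back into NA, assuming the my.df from the original post (the name `res` is introduced here for illustration only):

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

## Same call as above; all-NA groups come back as NaN
res <- aggregate(my.df[, -1], list(pathway = my.df$pathway), mean, na.rm = TRUE)

## is.nan() has no data.frame method, so convert column by column
res[-1] <- lapply(res[-1], function(x) replace(x, is.nan(x), NA))
res
```

Whether NaN or NA is preferable depends on how the result is used later; is.na() is TRUE for both.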

Regards
Petr



Re: how to deduplicate records, e.g. using melt() and cast()

D. Rizopoulos
In reply to this post by Karl Brand
you could try aggregate(), e.g.,

my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))


aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)

or, to get NA rather than 0 for groups whose values are all missing:

sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(my.df[-1], my.df['pathway'], sum.)
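
To see how the two calls differ, each result can be compared with the wanted.df from the original post. A small sketch (the names `by.sum` and `by.sum.` are hypothetical, introduced only for this comparison):

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))
wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
                        cond.one = c(0.5, 0.4, NA),
                        cond.two = c(0.6, 0.9, 0.2),
                        cond.three = c(NA, 0.1, NA))

sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

by.sum  <- aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)
by.sum. <- aggregate(my.df[-1], my.df['pathway'], sum.)

## sum(..., na.rm = TRUE) over an all-NA group is 0, so empty cells become 0
by.sum$cond.three

## sum.() leaves those cells as NA; compare the values with wanted.df
all.equal(by.sum., wanted.df, check.attributes = FALSE)
```

Both versions add values together, so they only deduplicate cleanly when each pathway has at most one non-missing value per condition, as in the example data.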


I hope it helps.

Best,
Dimitris


On 5/7/2012 11:50 AM, Karl Brand wrote:

> [...]

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/


Re: how to deduplicate records, e.g. using melt() and cast()

Karl Brand
Dimitris, Petr,

Thank you! aggregate() is my lesson for today, not melt() | cast().

Really appreciate the super fast help,

Karl

On 07/05/12 12:09, Dimitris Rizopoulos wrote:

> [...]

--
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161


Re: how to deduplicate records, e.g. using melt() and cast()

Jan van der Laan
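Using reshape:

library(reshape)
## Melt to long format, dropping the missing values
m <- melt(my.df, id.vars = "pathway", na.rm = TRUE)
## Cast back to wide format; fill = NA marks pathway/condition pairs with no value
cast(m, pathway ~ variable, sum, fill = NA)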
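
In the example each pathway has at most one non-missing value per condition, so summing and picking the first non-missing value give the same result. Where duplicates may carry more than one value and adding them would be wrong, the latter may be safer. A sketch using base R only, assuming the my.df from the original post (`first.non.na` is a hypothetical helper, not part of reshape):

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3),
                                rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

## Hypothetical helper: first non-missing value of x, or NA if there is none
first.non.na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA else x[1]
}

## One row per pathway, one column per condition
dedup <- aggregate(my.df[-1], my.df["pathway"], first.non.na)
dedup
```

This swaps sum for a different summary function; the grouping step is the same aggregate() call suggested earlier in the thread.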

Jan


On 05/07/2012 12:30 PM, Karl Brand wrote:

> [...]


Re: how to deduplicate records, e.g. using melt() and cast()

Karl Brand
Fantastic Jan,

Thanks a lot for the example of how I can achieve this with melt()/cast().
Very good for my understanding of these functions.

Karl


On 07/05/12 13:49, Jan van der Laan wrote:

> [...]

--
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161
