aggregate and list elements of variables in data.frame

aggregate and list elements of variables in data.frame

Massimo Bressan
# Given the following simplified, reproducible example:

t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
t

# I need to get the following result:

r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('1,3,6,9','2,5,8','4','7,10'))
r

# i.e. aggregate over the variable "A" and list all elements of the variable "id" that share the same corresponding value of "A".
# Any help with that?

# So far I've only managed to "aggregate" and "count", like this:

library(sqldf)
sqldf('select count(*) as count_id, A as unique_A from t group by A')

library(dplyr)
t%>%group_by(unique_A=A) %>% summarise(count_id = n())

# Thank you



______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: aggregate and list elements of variables in data.frame

Ivan Calandra-5
Hi Massimo,

Something along these lines could help, I guess:
t$A <- factor(t$A)
sapply(levels(t$A), function(x) which(t$A==x))

You can then play with the output using paste()

Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 06/06/2018 10:13, Massimo Bressan wrote:

> [...]


Re: aggregate and list elements of variables in data.frame

Massimo Bressan
Thanks for the help.

I'm posting the complete solution here:

t<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))
t$A <- factor(t$A)
l<-sapply(levels(t$A), function(x) which(t$A==x))
r<-data.frame(list_id=unlist(lapply(l, paste, collapse = ", ")))
r<-cbind(unique_A=row.names(r),r)
row.names(r)<-NULL
r

best



--

------------------------------------------------------------
Massimo Bressan

ARPAV
Agenzia Regionale per la Prevenzione e
Protezione Ambientale del Veneto

Dipartimento Provinciale di Treviso
Via Santa Barbara, 5/a
31100 Treviso, Italy

tel: +39 0422 558545
fax: +39 0422 558516
e-mail: [hidden email]
------------------------------------------------------------


Re: aggregate and list elements of variables in data.frame

Massimo Bressan
Sorry, but on looking at the example again I realised that the posted solution is not quite what I need: I do not need the indices, but rather the corresponding values of column "id" for each value of "A".

# Please consider this new example

t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
t

# I need to get this result
r<-data.frame(unique_A=c(123, 345, 678, 789),list_id=c('18,20,27,4','91,54,15','68','26,97'))
r

# Any help with this, please?





From: "Massimo Bressan" <[hidden email]>
To: "r-help" <[hidden email]>
Sent: Thursday, 7 June 2018 10:09:55
Subject: Re: aggregate and list elements of variables in data.frame

[...]

Re: aggregate and list elements of variables in data.frame

Ivan Calandra-5
Using which() to subset t$id should do the trick (once t$A has been converted to a factor, as before):

t$A <- factor(t$A)
sapply(levels(t$A), function(x) t$id[which(t$A==x)])
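
A possible alternative, sketched here under the assumption that base R is enough: levels() only returns something when A is a factor (for a plain numeric column it is NULL), whereas split() groups the ids by A without any prior conversion. The names groups and list_id are made up for this illustration and do not come from the thread.

```r
# Data from the second example in this thread
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# split() returns a named list with one element per distinct value of A,
# ordered by A; each element holds the ids belonging to that value
groups <- split(t$id, t$A)

# Collapse each group of ids into a single comma-separated string
list_id <- vapply(groups, paste, character(1), collapse = ", ")

# Assemble the result; the list names carry the A values as text
r <- data.frame(unique_A = as.numeric(names(groups)),
                list_id  = unname(list_id),
                stringsAsFactors = FALSE)
r
```

vapply() is used instead of sapply() so that the result is guaranteed to be one string per group.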

Ivan


On 07/06/2018 14:21, Massimo Bressan wrote:

> [...]


Re: aggregate and list elements of variables in data.frame

Ben Tupper-2
Hi,

Does this do what you want? I had to change the id values to something more obvious. It uses tibbles, which allow each variable to be a list.

library(tibble)
library(dplyr)
x       <- tibble(id=LETTERS[1:10],
                A=c(123,345,123,678,345,123,789,345,123,789))
uA      <- unique(x$A)
idx     <- lapply(uA, function(v) which(x$A %in% v))
vals    <- lapply(idx, function(index) x$id[index])

r <- tibble(unique_A = uA, list_idx = idx, list_vals = vals)


> r
# A tibble: 4 x 3
  unique_A list_idx  list_vals
     <dbl> <list>    <list>  
1     123. <int [4]> <chr [4]>
2     345. <int [3]> <chr [3]>
3     678. <int [1]> <chr [1]>
4     789. <int [2]> <chr [2]>
> r$list_idx[1]
[[1]]
[1] 1 3 6 9

> r$list_vals[1]
[[1]]
[1] "A" "C" "F" "I"


Cheers,
ben




Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/







Re: aggregate and list elements of variables in data.frame

Massimo Bressan
Thank you for the help.

This is my solution, based on your valuable hint but without going through a 'tibble':

x<-data.frame(id=LETTERS[1:10], A=c(123,345,123,678,345,123,789,345,123,789))
uA<-unique(x$A)
idx<-lapply(uA, function(v) which(x$A %in% v))
vals<- lapply(idx, function(index) x$id[index])
data.frame(unique_A = uA, list_vals=unlist(lapply(vals, paste, collapse = ", ")))

best



From: "Ben Tupper" <[hidden email]>
To: "Massimo Bressan" <[hidden email]>
Cc: "r-help" <[hidden email]>
Sent: Thursday, 7 June 2018 14:47:55
Subject: Re: [R] aggregate and list elements of variables in data.frame

[...]

Re: aggregate and list elements of variables in data.frame

Massimo Bressan
# OK, here is my final, "best and most compact" solution to the problem, merging the different contributions (thanks to all indeed)

t<-data.frame(id=c(18,91,20,68,54,27,26,15,4,97),A=c(123,345,123,678,345,123,789,345,123,789))
l<-sapply(unique(t$A), function(x) t$id[which(t$A==x)])
r<-data.frame(unique_A= unique(t$A), list_id=unlist(lapply(l, paste, collapse = ", ")))
r
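
One caveat that may be worth checking before reusing this pattern: sapply() returns a list here only because the groups have different sizes. The sketch below uses made-up data (t2, with every value of A occurring exactly twice) to show that sapply() then simplifies to a matrix, while lapply() keeps one element per group. The names t2, m, l2 and r2 are hypothetical.

```r
# Hypothetical data: every value of A occurs exactly twice
t2 <- data.frame(id = 1:4, A = c(10, 20, 10, 20))

# sapply() simplifies equal-length results into a matrix, not a list,
# so pasting over its elements no longer gives one string per group
m <- sapply(unique(t2$A), function(x) t2$id[t2$A == x])
is.matrix(m)

# lapply() always returns a list, whatever the group sizes are
l2 <- lapply(unique(t2$A), function(x) t2$id[t2$A == x])
r2 <- data.frame(unique_A = unique(t2$A),
                 list_id  = vapply(l2, paste, character(1), collapse = ", "),
                 stringsAsFactors = FALSE)
r2
```

Passing simplify = FALSE to sapply() would be another way to keep the list structure.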



Re: aggregate and list elements of variables in data.frame

Bert Gunter-2
which() is unnecessary. Use logical subscripting:

... t$id[t$A ==x]

Further simplification can be gotten by using the with() function:

l <- with(t, sapply(unique(A), function(x) id[A ==x]))

Check this though -- there might be scoping issues.
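
A minimal way to run that check, assuming the example data from earlier in the thread: build the list once with explicit t$ references and once inside with(), then compare the two. The names l_explicit and l_with are made up for this sketch.

```r
t <- data.frame(id = c(18, 91, 20, 68, 54, 27, 26, 15, 4, 97),
                A  = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# Explicit version, using logical subscripting instead of which()
l_explicit <- sapply(unique(t$A), function(x) t$id[t$A == x])

# with() version: the anonymous function is created inside with(),
# so id and A are looked up in the columns of t first
l_with <- with(t, sapply(unique(A), function(x) id[A == x]))

identical(l_explicit, l_with)  # TRUE
```

Note that logical subscripting and which() differ when A contains NA: t$id[t$A == x] then returns NA entries for those rows, whereas which() drops them.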

Cheers,
Bert



On Thu, Jun 7, 2018, 6:49 AM Massimo Bressan <[hidden email]>
wrote:

> [...]


Re: aggregate and list elements of variables in data.frame

Eik Vettorazzi-2
Hi,
if you are willing to use dplyr, you can do all in one line of code:

library(dplyr)
df<-data.frame(id=1:10,A=c(123,345,123,678,345,123,789,345,123,789))

df%>%group_by(unique_A=A)%>%summarise(list_id=paste(id,collapse=", "))->r
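
For readers who prefer to stay in base R, a roughly equivalent sketch uses aggregate() with a formula, passing collapse through to paste(). This is offered as an alternative, not as part of the dplyr answer; the name r_base is made up here.

```r
df <- data.frame(id = 1:10, A = c(123, 345, 123, 678, 345, 123, 789, 345, 123, 789))

# Group id by A and paste each group's ids into one string;
# the extra argument collapse = ", " is handed on to paste()
r_base <- aggregate(id ~ A, data = df, FUN = paste, collapse = ", ")

# Rename the columns to match the names used in this thread
names(r_base) <- c("unique_A", "list_id")
r_base
```

The rows come back sorted by A, which matches the order of the requested result.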

cheers



--
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistrasse 52
building W 34
20246 Hamburg

Phone: +49 (0) 40 7410 - 58243
Fax:   +49 (0) 40 7410 - 57790
Web: www.uke.de/imbe
--
