|
|
I'm teaching myself R for data preparation. I found an MIT course on data preparation that uses Python, but I'm working through it in R. The first exercise prepares data from a database of contributions made to candidates for U.S. president. The file format is described at ftp://ftp.fec.gov/FEC/Presidential_Map/2012/DATA_DICTIONARIES/CONTRIBUTOR_FORMAT.txt. How can I print, in R, a table showing in how many states President Obama is the top candidate (by total amount of donations received)?
I tried using tapply, but I don't understand how to work with more than one grouping variable. Could anyone help me with this?
|
|
Hi
How did you use tapply? Did you read its help page? It points to ?aggregate, which may be what you are looking for.
Regards
Petr
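A minimal sketch of both suggestions, assuming a data frame with the columns named later in this thread (cand_nm, contbr_st, contb_receipt_amt). The data frame name contrib and all values are made up for illustration:

```r
# Hypothetical sample; only the column names come from the thread.
contrib <- data.frame(
  cand_nm = c("Obama", "Obama", "Doug", "Doug", "Obama", "Doug"),
  contbr_st = c("CA", "CA", "CA", "NY", "NY", "NY"),
  contb_receipt_amt = c(100, 50, 120, 30, 70, 10),
  stringsAsFactors = FALSE
)

# tapply() accepts a list of grouping vectors. With two of them the result
# is a matrix: candidates in rows, states in columns.
totals <- tapply(contrib$contb_receipt_amt,
                 list(contrib$cand_nm, contrib$contbr_st),
                 sum)
print(totals)

# aggregate() computes the same sums in long format, one row per
# candidate/state combination.
totals_long <- aggregate(contb_receipt_amt ~ cand_nm + contbr_st,
                         data = contrib, FUN = sum)
print(totals_long)
```

The matrix form is convenient for comparing candidates within a state; the long form is easier to merge or sort.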
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
|
interTable <- data.frame(tapply(data$contb_receipt_amt, list(data$cand_nm, data$contbr_st), sum))
I managed to create a table with the total contribution (contb_receipt_amt) for each presidential candidate (cand_nm) in each state (contbr_st). From interTable, how could I create a table of the states where candidate 'Obama' has received the greater contribution?
thanks
|
|
Hi
Greater than what? What does the table look like? Sorry, I forgot my crystal ball at home.
Maybe you want to look to
?"]"
Regards
Petr
|
|
Inline.
On Mon, Oct 22, 2012 at 6:55 AM, PIKAL Petr < [hidden email]> wrote:
> Maybe you want to look to
>
> ?"]"
-- or ?"[" rather. :-)
-- Bert
--
Bert Gunter
Genentech Nonclinical Biostatistics
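
As a rough illustration of what the ?"[" help page covers, here is a small sketch of indexing a matrix by row and column names. The matrix tbl is a made-up stand-in for the table of totals discussed in this thread:

```r
# Illustrative totals matrix: candidates in rows, states in columns.
# matrix() fills column by column, so AL = (250, 15) and CA = (250, 520).
tbl <- matrix(c(250, 15, 250, 520), nrow = 2,
              dimnames = list(c("Doug", "Obama"), c("AL", "CA")))

obama_row <- tbl["Obama", ]        # all states for one candidate
ca_col    <- tbl[, "CA"]           # all candidates for one state
one_cell  <- tbl["Obama", "CA"]    # a single total

# Logical indexing: names of the states where Obama's total exceeds 100.
big_states <- names(obama_row)[obama_row > 100]
print(big_states)
```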
|
|
Hi
THX
Correct, I forgot to check.
Regards
Petr
|
|
I believe my earlier message could not be understood. To make it easier, I'll give an example. Assume my table is as shown below, with the amount received by each candidate for president in each state of the country.
          AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600
I would like to list the states where Obama has a higher amount received (i.e. CA and NY) and also the number of such states, in this case 2. How can I do this?
Thanks
|
|
I used these commands previously:
> data <- read.csv("test.csv")
> tbl <- data.frame(tapply(data$contb_receipt_amt, list(data$cand_nm, data$contbr_st), sum))
> tbl
          AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600
|
|
Hi
and what is wrong?
Petr
|
|
Hi,
If the criterion is to pick the top 2 contributing states for each candidate:
dat1<-read.table(text="
AL AR CA NY
Doug 250 250 250 NA
Jennifer 20 340 300 100
Michele 250 500 250 60
Obama 15 45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")
#for Obama
apply(dat1,1,function(x,n) x[which(rank(x)>length(x)-n)],n=2)[4]
#$Obama
# CA NY
#520 600
Your question was to list the states where Obama received a higher amount, but higher compared to what?
A.K.
|
|
To build this example I reduced the number of records drastically. In the original database there are 48 000 candidates and dozens of states. There is no way to analyze the data visually, and I would not put 400 MB of tables here. But based on the example, how could I list the states where Obama received more in contributions?
|
|
Hi,
Suppose you have a threshold (say >500); then:
dat1<-read.table(text="
AL AR CA NY
Doug 250 250 250 NA
Jennifer 20 340 300 100
Michele 250 500 250 60
Obama 15 45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")
res<-unlist(lapply(split(dat1,rownames(dat1)),function(x) x[x[!is.na(x)]>500]))
res
#Obama.CA Obama.NY
#     520      600
# And suppose the threshold is >400
res1<-unlist(lapply(split(dat1,rownames(dat1)),function(x) x[x[!is.na(x)]>400]))
res1
#Michele.AR Obama.CA Obama.NY
# 500 520 600
res1[grep("Obama",names(res1))] #amount received for Obama
#Obama.CA Obama.NY
# 520 600
length(res1[grep("Obama",names(res1))])
#[1] 2
A.K.
|
|
The criterion is to list where Obama has a higher number of contributions. The table shows the total contribution that each presidential candidate received in each state of the country.
The table shown is an example; the query should be generic, for a database with hundreds of candidates and dozens of states. The original database is 450 MB. In the real database I don't know in how many states Obama has more donations, but in the sample it is CA and NY. Michele receives the most in AR.
Thanks
|
|
Hi,
Your statement:
"The criterion is to list where Obama has a higher number of contributions."
Does "higher" mean higher than the highest of all the other candidates, or something else?
I am a bit confused by the statement.
A.K.
|
|
I meant where Obama has a higher value compared to the other candidates. Looking at the NY column, Obama has the highest value, so he wins that state. Looking at the AR column, Michele wins. I JUST want to list where Obama wins.
Thank you! This seems to work; I just do not understand why you used a threshold.
I will study your solution, thanks again!
|
|
Hi,
Your question is not clear.
Suppose you want to find the two highest contributions for each candidate:
dat1<-read.table(text="
AL AR CA NY
Doug 250 250 250 NA
Jennifer 20 340 300 100
Michele 250 500 250 60
Obama 15 45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")
res1<-unlist(lapply(split(dat1,rownames(dat1)),function(x) tail(apply(x,1,sort),2)))
nam1<-unlist(lapply(lapply(split(dat1,rownames(dat1)),function(x) tail(apply(x,1,sort),2)),function(x) dimnames(x)[1]),use.names=F)
names(res1)<-paste(names(res1),nam1,sep="_")
names(res1)<-gsub("\\d+","",names(res1))
res1
# Doug_AR Doug_CA Jennifer_CA Jennifer_AR Michele_CA Michele_AR
# 250 250 300 340 250 500
# Obama_CA Obama_NY
# 520 600
#Contribution for Obama
res1[grep("Obama",names(res1))]
#Obama_CA Obama_NY
# 520 600
A.K.
|
|
Hi,
I think I understand what you meant. This will output all the states where the contribution for Obama is higher than that of every other candidate.
dat1<-read.table(text="
AL AR CA NY
Doug 250 250 250 NA
Jennifer 20 340 300 100
Michele 250 500 250 60
Obama 15 45 520 600
",header=TRUE,stringsAsFactors=FALSE,sep="")
res<-unlist(lapply(apply(dat1,2,function(x) x[!is.na(x)]),function(x) x[all(x["Obama"]>x[names(x)!=names(x)[grep("Obama",names(x))]])]))
res[grep("Obama",names(res))]
#CA.Obama NY.Obama
# 520 600
A.K.
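
The same rule (Obama's total is strictly greater than every other candidate's total in the state) can also be sketched with a per-column maximum. This is an alternative formulation, not the code posted above; the names m, cand, others, best_other and won_states are illustrative:

```r
dat1 <- read.table(text = "
AL AR CA NY
Doug 250 250 250 NA
Jennifer 20 340 300 100
Michele 250 500 250 60
Obama 15 45 520 600
", header = TRUE)

m <- as.matrix(dat1)
cand <- "Obama"

# Totals of everyone except the chosen candidate.
others <- m[rownames(m) != cand, , drop = FALSE]

# Highest total among the other candidates in each state, ignoring NA.
# Assumption: each state has at least one non-NA total among the others;
# otherwise max() returns -Inf with a warning.
best_other <- apply(others, 2, max, na.rm = TRUE)

# Strict comparison, so a tie does not count as a win.
wins <- m[cand, ] > best_other
won_states <- names(wins)[which(wins)]   # which() drops NA comparisons

print(won_states)          # states won by the candidate
print(length(won_states))  # number of such states
```

Changing cand selects another candidate without touching the rest of the code.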
|
|
          AL  AR  CA  NY
Doug     250 250 250  NA
Jennifer  20 340 300 100
Michele  250 500 250  60
Obama     15  45 520 600
My English is not very good, so I'll try again. I want to list ALL states in the country where Obama had the greatest contribution. The table above shows the total contribution received by each candidate in a given state. In AL, Obama did not receive more than Doug. In AR, he did not receive more than the other candidates. In CA he received a total of $520, and since 520 > 300 > 250 >= 250, CA should be selected. In NY he also had the largest contribution, $600, and since 600 > 100 > 60, NY should be selected as well.
I want this to work for N presidential candidates and M states of the country. The table above is only an example.
Sorry again; to me it was clear. =(
Thanks
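
One way to sketch this for any number of candidates and states is to find the winner of every column at once and then filter by name. The helper winner_of and the variable names are hypothetical; treating a tie for first place as "no winner" is an assumption (the AL column above has such a tie between Doug and Michele):

```r
totals <- as.matrix(read.table(text = "
AL AR CA NY
Doug 250 250 250 NA
Jennifer 20 340 300 100
Michele 250 500 250 60
Obama 15 45 520 600
", header = TRUE))

# Returns the name of the single top candidate in one state,
# or NA when the top total is shared (assumed tie rule).
winner_of <- function(x) {
  top <- which(x == max(x, na.rm = TRUE))  # which() ignores NA entries
  if (length(top) == 1) names(x)[top] else NA_character_
}

# One winner (or NA) per state, for however many rows and columns exist.
state_winner <- apply(totals, 2, winner_of)
print(state_winner)

# States won by a given candidate, and how many there are.
obama_states <- names(state_winner)[which(state_winner == "Obama")]
print(obama_states)
print(length(obama_states))

# Number of states won by each candidate.
print(table(state_winner))
```

With the full data set, totals would be the matrix returned by the tapply call shown earlier in the thread rather than a table typed in by hand.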
|
|