Correct subsetting in R

R help mailing list-2
Hi all,
I have two data frames, one of which does not have the ID column:

    > str(data)
    'data.frame': 499 obs. of  608 variables:
    $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
    $ alright      : int  1 0 0 0 0 0 0 1 2 1 ...
    $ bad          : int  1 0 0 0 0 0 0 0 0 0 ...
    $ boy          : int  1 2 1 1 0 2 2 4 2 1 ...
    $ cooki        : int  1 2 2 1 0 1 1 4 2 3 ...
    $ curtain      : int  1 0 0 0 0 2 0 2 0 0 ...
    $ dish         : int  2 1 0 1 0 0 1 2 2 2 ...
    $ doesnt       : int  1 0 0 0 0 0 0 0 1 0 ...
    $ dont         : int  2 1 4 2 0 0 2 1 2 0 ...
    $ fall         : int  3 1 0 0 1 0 1 2 3 2 ...
    $ fell         : int  1 0 0 0 0 0 0 0 0 0 ...

and the other one is:

    > str(training)
    'data.frame': 375 obs. of  607 variables:
    $ alright      : num  1 0 0 0 1 2 1 0 0 0 ...
    $ bad          : num  1 0 0 0 0 0 0 0 0 0 ...
    $ boy          : num  1 1 2 2 4 2 1 0 1 0 ...
    $ cooki        : num  1 1 1 1 4 2 3 1 2 2 ...
    $ curtain      : num  1 0 2 0 2 0 0 0 0 0 ...
    $ dish         : num  2 1 0 1 2 2 2 1 4 1 ...
    $ doesnt       : num  1 0 0 0 0 1 0 0 0 0 ...
    $ dont         : num  2 2 0 2 1 2 0 0 1 0 ...
    $ fall         : num  3 0 0 1 2 3 2 0 2 0 ...
    $ fell         : num  1 0 0 0 0 0 0 0 0 0 ...
Does anyone know how I can get the IDs for the rows of training from data?
Thanks for any help!
Elahe

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Correct subsetting in R

R help mailing list-2
But row.names() cannot give me the IDs.

On Wednesday, November 1, 2017 9:45 AM, David Wolfskill <[hidden email]> wrote:




row.names() appears to be what is wanted.

Peace,
david
--
David H. Wolfskill                [hidden email]
Unsubstantiated claims of "Fake News" are evidence that the claimant lies again.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.


Re: Correct subsetting in R

Eric Berger
In reply to this post by R help mailing list-2
matches <- merge(training, data, by = intersect(names(training), names(data)))

HTH,
Eric



Re: Correct subsetting in R

R help mailing list-2
In reply to this post by R help mailing list-2

That's not what I want. The first data frame has 499 observations, and the second data frame is a subset of the first one with 375 observations. I want something that returns the ID for each row of the training data frame.



Re: Correct subsetting in R

Eric Berger
training$TrainingRownum <- seq_len(nrow(training))
data$DataRownum <- seq_len(nrow(data))
matches <- merge(training, data, by = intersect(names(training), names(data)))

The data frame 'matches' now has additional columns telling you the row in
each data frame corresponding to the matched items.
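
As a small illustration of the row-number approach above, here is a sketch on made-up toy data. The values and the variable training_ids are hypothetical and not from the original post. Because merge() sorts its result on the key columns by default, the sketch reorders by TrainingRownum to get the IDs back in the order of training.

```r
# Toy stand-ins for the poster's data frames (values are invented).
data <- data.frame(ID   = c(11L, 12L, 13L, 14L),
                   boy  = c(1, 2, 1, 0),
                   dish = c(2, 1, 0, 0))
training <- data.frame(boy = c(0, 1), dish = c(0, 2))

# Record each row's position before merging.
training$TrainingRownum <- seq_len(nrow(training))
data$DataRownum <- seq_len(nrow(data))

# Merge on the columns the two data frames share (here: boy, dish).
matches <- merge(training, data, by = intersect(names(training), names(data)))

# Restore the row order of training and read off the IDs.
matches <- matches[order(matches$TrainingRownum), ]
training_ids <- matches$ID
print(training_ids)
```

If a row of training matches several rows of data, it appears several times in matches; table(matches$TrainingRownum) is one way to spot that.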

Regards,
Eric


Re: Correct subsetting in R

Peter Dalgaard-2
In reply to this post by R help mailing list-2

> On 1 Nov 2017, at 18:03 , Elahe chalabi via R-help <[hidden email]> wrote:
>
> But row.names() cannot give me the IDs
>

Is "training" extracted from "data" using standard data frame indexing? If so, data[row.names(training), "ID"] should give you the relevant values.
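
To illustrate the row-name lookup, here is a minimal sketch on invented toy data. It assumes that training was created by ordinary row indexing of data and that its row names were not reset afterwards; the index vector c(4, 1, 5) and the variable ids are made up for the example.

```r
# Toy stand-in for the full data frame (values are invented).
data <- data.frame(ID   = c(101L, 102L, 103L, 104L, 105L),
                   boy  = c(1, 2, 1, 1, 0),
                   dish = c(2, 1, 0, 1, 0))

# Standard data frame indexing keeps the original row names
# and here also drops the ID column.
training <- data[c(4, 1, 5), -1]
print(row.names(training))

# Look the IDs up in data by those retained row names.
ids <- data[row.names(training), "ID"]
print(ids)
```

If the row names of training are simply 1, 2, 3, ... although its rows were not the first rows of data, they were probably reset at some point and this lookup will silently return the wrong IDs.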

If not, then you are in trouble, because you cannot tell apart two IDs that have identical responses in columns 2:608. You might proceed with something like

signature1 <- do.call("paste", data[names(training)]) # one string per row of data, built from the shared columns only (no ID)
any(duplicated(signature1)) # if TRUE, you're not quite happy, because two or more IDs are indistinguishable

signature2 <- do.call("paste", training) # one string per row of training
m <- match(signature2, signature1) # row of data that matches each row of training

any(duplicated(m)) # ouch if TRUE... will require more thought

any(is.na(m)) # even more ouch, if TRUE...

data$ID[m]
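
The signature-matching idea can be sketched on toy data as follows. The values are invented, the names sig_data, sig_train and training_ids are hypothetical, and the sketch assumes the signatures are built only from the columns the two data frames share, so the ID column is left out.

```r
# Toy stand-ins: data has integer columns, training has numeric ones,
# as in the str() output of the original post (values are invented).
data <- data.frame(ID   = 1:5,
                   boy  = c(1L, 2L, 1L, 1L, 0L),
                   dish = c(2L, 1L, 0L, 1L, 0L))
training <- data.frame(boy = c(1, 0, 2), dish = c(1, 0, 1))

# One string per row, pasted from the shared columns only.
sig_data  <- do.call("paste", data[names(training)])
sig_train <- do.call("paste", training)

# Sanity checks before trusting the match.
stopifnot(!any(duplicated(sig_data)))   # rows of data are distinguishable
m <- match(sig_train, sig_data)
stopifnot(!any(is.na(m)))               # every training row was found
stopifnot(!any(duplicated(m)))          # no two training rows hit the same data row

training_ids <- data$ID[m]
print(training_ids)
```

Pasting integer and whole-number numeric columns yields the same strings, which is why the type difference between the two data frames does not matter here; values with fractional parts would need more care.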


-pd


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]
