subsetting comparison problem

subsetting comparison problem

Neha Aggarwal
Hello All,
I am facing a unique problem and have been unable to find any help in the R help pages
or online. I would appreciate your help with the following problem:
I have two data frames (samples below) and an expected output:

R Dataframe1:
      C1  C2  C3  C4 ...... CN
R1     0   1   0   1
R2     1   0   1   1
R3     1   0   0   0
.
.
.
RN

U Dataframe2:
      C1  C2  C3  C4 ...... CN
U1     1   1   0   1
U2     1   1   1   1


Expected Output:
U1 satisfies R1, R3
U2 satisfies R1, R2, R3

So this is a data frame comparison problem with a subset aspect.
There are two data frames, R and U, with the same column names. Each row
of R (Dataframe1) has certain columns marked with 1s, and each row of U
(Dataframe2) likewise has certain columns marked with 1s.

I have to find the relationships between the Rs and the Us. I start with
each row of U (say row U1) and try to find all rows of R whose 1-columns
are a subset of U1's 1-columns.

I can't find a way to compare rows to see whether one is a subset of
another. What can I try? Any pointers or packages would be a great help.
Please help.

Thanks
Neha

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: subsetting comparison problem

Jeff Newmiller
In reply to this post by Neha Aggarwal
Responses inline.

On Sun, 11 Mar 2018, Neha Aggarwal wrote:

> [[alternative HTML version deleted]]

As the Posting Guide says (you have read it, haven't you?), please post
plain text... the mailing list mangles your code with varying levels of
damage as it tries to fix this problem for you. It also helps if you can
pose your question in R code rather than pseudo-code and formatted data
tables.

Your problem appears to be an outer join of binary subsets... I don't
think this is a very common problem structure (in most cases you want to
avoid outer joins if you can because they are computationally expensive),
but you can read ?outer and ?expand.grid to see some ways to pair up all
possible row indexes.  If you know that the number of columns in both inputs
is <32, this problem can be optimized for speed and memory with the bitops
package, or for larger size problems you can use the bit package. The
below code shows the skeleton of logic with no such optimizations, and is
likely the most practical solution for a one-off analysis:

##############
r <- read.table( text=
"         C1   C2     C3   C4
R1        0     1       0       1
R2        1     0       1       1
R3        1     0       0       0
", header=TRUE )

u <- read.table( text=
"       C1      C2      C3      C4
U1     1       1       0       1
U2      1       1       1       1
", header=TRUE )

rmx <- as.matrix( r )
umx <- as.matrix( u )

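# every (R, U) pairing; character names so the matrices are indexed by row name
result <- expand.grid( R = rownames( rmx )
                      , U = rownames( umx )
                      , stringsAsFactors = FALSE
                      )

# see how (R and U are columns of result, hence with()):
with( result, 1L - umx[ U, ] )  # 1 for every 0 in u
with( result, rmx[ R, ] )       # 1 for every 1 in r
with( result, ( 1L - umx[ U, ] ) * rmx[ R, ] ) # 1 where u has 0 and r has 1

# do it:
# for every pairing, 0 if any column has a 0 in u and a 1 in r, else 1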
result$IN <- 1L - with( result
                       , apply(   ( 1L - umx[ U, ] ) # any 0 column
                                * rmx[ R, ]  # any 1 column
                              , 1  # by rows
                              , max
                              )
                       )
result
# show key pairings only
result[ as.logical( result$IN ), c( "U", "R" ) ]
##############

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
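
A rough sketch of the bit-packing idea mentioned above, for readers who want to see its shape. It uses base R's bitwAnd() rather than the bitops or bit packages (a substitution made here, not what the post prescribes), and the helper name pack() is made up for illustration. It assumes fewer than 32 columns, so that each row fits into one R integer.

```r
r <- as.matrix(read.table(text = "C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0", header = TRUE))
u <- as.matrix(read.table(text = "C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1", header = TRUE))

# pack() is a hypothetical helper: one integer per row, one bit per column
# (column k contributes 2^(k-1)); only valid for fewer than 32 columns
pack <- function(m) {
  weights <- 2^(seq_len(ncol(m)) - 1)
  setNames(as.integer(m %*% weights), rownames(m))
}
rbits <- pack(r)
ubits <- pack(u)

# an R row is a subset of a U row exactly when AND-ing the two
# bit patterns gives back the R pattern unchanged
is_subset <- outer(rbits, ubits, function(a, b) bitwAnd(a, b) == a)
is_subset
```

With the sample data this marks R1 and R3 as subsets of U1, and all three R rows as subsets of U2, which matches the expected output in the question.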

Re: subsetting comparison problem

David Winsemius
In reply to this post by Neha Aggarwal

> On Mar 11, 2018, at 3:32 PM, Neha Aggarwal <[hidden email]> wrote:
>
> Expected Output:
> U1 satisfies R1, R3
> U2 satisfies R1, R2, R3
>

I don't think you have communicated what sort of meaning is attached to the word "satisfies".

Here's a double loop that reports membership of the column names of each row of U (Dataframe2) in each row of R (Dataframe1):

 apply( Dataframe2, 1, function(x) {
     z2 <- names(x)[ x == 1 ]            # columns set to 1 in this U row
     apply( Dataframe1, 1, function(y) {
         z4 <- names(y)[ y == 1 ]        # columns set to 1 in this R row
         z4[ z4 %in% z2 ]                # those that are also set in the U row
     })
 })
$U1
$U1$R1
[1] "C2" "C4"

$U1$R2
[1] "C1" "C4"

$U1$R3
[1] "C1"


$U2
$U2$R1
[1] "C2" "C4"

$U2$R2
[1] "C1" "C3" "C4"

$U2$R3
[1] "C1"

--
David.


David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
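
If "satisfies" is taken to mean that every column set to 1 in an R row is also set to 1 in the U row (an assumption, though it agrees with the expected output in the question), the membership listing above can be reduced to a yes/no test per pair. The sketch below is one way to do that; the helper ones() and the names r_sets, u_sets and satisfied are invented for illustration.

```r
Dataframe1 <- read.table(text = "C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0", header = TRUE)
Dataframe2 <- read.table(text = "C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1", header = TRUE)

# ones() is a hypothetical helper: for each row, the names of the columns set to 1
ones <- function(df) {
  m <- as.matrix(df) == 1
  setNames(lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ]]),
           rownames(m))
}
r_sets <- ones(Dataframe1)
u_sets <- ones(Dataframe2)

# for each U row, keep the R rows whose column set is fully contained in it
satisfied <- lapply(u_sets, function(ucols) {
  keep <- vapply(r_sets, function(rcols) all(rcols %in% ucols), logical(1))
  names(r_sets)[keep]
})
satisfied
```

For the sample data, satisfied$U1 holds "R1" and "R3", and satisfied$U2 holds "R1", "R2" and "R3".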

Re: subsetting comparison problem

Jim Lemon-4
Hi Neha,
This might help:

R<-read.table(text="C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0",
header=TRUE)
U<-read.table(text="C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1",
header=TRUE)
# read.table() returns data frames; the same loop also works on matrices
for(ui in seq_len(nrow(U))) {
 for(ri in seq_len(nrow(R))) {
  # the R row is a subset of the U row if all of its 1s are matched in U
  if(sum(U[ui,]&R[ri,])==sum(R[ri,]))
   cat("R$",rownames(R)[ri]," subset of ","U$",rownames(U)[ui],"\n",sep="")
 }
}

Jim
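
The same test as the double loop above (count of shared 1s equals the count of 1s in the R row) can probably be done for all pairs at once with a matrix product. This is only a sketch under the assumption that both inputs contain nothing but 0s and 1s; the names shared, subset_of and report are made up here.

```r
R <- read.table(text = "C1 C2 C3 C4
R1 0 1 0 1
R2 1 0 1 1
R3 1 0 0 0", header = TRUE)
U <- read.table(text = "C1 C2 C3 C4
U1 1 1 0 1
U2 1 1 1 1", header = TRUE)
rmx <- as.matrix(R)
umx <- as.matrix(U)

# shared[i, j] = number of columns that are 1 in both R row i and U row j
shared <- tcrossprod(rmx, umx)

# R row i is a subset of U row j when all of its 1s are shared
subset_of <- shared == rowSums(rmx)

# one line per U row, in the form used in the question
report <- vapply(colnames(subset_of), function(un) {
  paste(un, "satisfies",
        paste(rownames(subset_of)[subset_of[, un]], collapse = ", "))
}, character(1))
writeLines(report)
```

This avoids the explicit loops, at the cost of building an nrow(R) by nrow(U) matrix in memory.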

On Mon, Mar 12, 2018 at 1:59 PM, David Winsemius <[hidden email]> wrote:

>
> I don't think you have communicated what sort of meaning is attached to the word "satisfies".