merging tables based on both row and column names

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

merging tables based on both row and column names

baccts
Dear R users,

I am trying to merge tables based on both their row names and column names.
My ultimate goal is to build a distribution table of values for each combination of row and column names.
I have more test tables, more x's and y's than in the toy example below.
Thanks in advance for your help.

For example :
test1 <- data.frame(rbind(c(0.1,0.2),0.3,0.1))
rownames(test1)=c('y1','y2','y3')
colnames(test1) = c('x1','x2');
test2 <- data.frame(rbind(c(0.8,0.9,0.5),c(0.5,0.1,0.6)))
rownames(test2) = c('y2','y5')
colnames(test2) = c('x1','x3','x2')

test1
       x1   x2
y1  0.1  0.2
y2  0.3  0.3
y3  0.1  0.1

test2
       x1   x3   x2
y2  0.8  0.9  0.5
y5  0.5  0.1  0.6

I would like to combine test1 and test2 such that if the column name and row name are both the same they are combined.

combined_test
           x1              x2             x3
y1      0.1              0.2           NA
y2  (0.3,0.8)    (0.3,0.5)      0.9
y3      0.1              0.1           NA
y5      0.5              0.6           0.1

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: merging tables based on both row and column names

Frank Schwidom

test1 <- (rbind(c(0.1,0.2),0.3,0.1))
rownames(test1)=c('y1','y2','y3')
colnames(test1) = c('x1','x2');
test2 <- (rbind(c(0.8,0.9,0.5),c(0.5,0.1,0.6)))
rownames(test2) = c('y2','y5')
colnames(test2) = c('x1','x3','x2')


lTest12 <- list( test1, test2)
namesRow <- unique( unlist( lapply( lTest12, rownames)))
namesCol <- unique( unlist( lapply( lTest12, colnames)))
tmp1 <- do.call( cbind, lapply( lTest12, function( x) as.vector( x[ match( namesRow, rownames( x)), match( namesCol, colnames( x))])))
tmp2 <- apply( tmp1, 1, list)
tmp3 <- lapply( tmp2, function( x) x[[1]][ !is.na( x[[1]])])
dimnames1 <- list( namesRow, namesCol)
tmp4 <- array( data= tmp3, dim= sapply( dimnames1, length), dimnames= dimnames1)

paste( tmp4)

 [1] "0.1"         "c(0.3, 0.8)" "0.1"         "0.5"         "0.2"        
 [6] "c(0.3, 0.5)" "0.1"         "0.6"         "numeric(0)"  "0.9"        
[11] "numeric(0)"  "0.1"        

tmp4
   x1        x2        x3      
y1 0.1       0.2       Numeric,0
y2 Numeric,2 Numeric,2 0.9      
y3 0.1       0.1       Numeric,0
y5 0.5       0.6       0.1      


Regards.


On 2015-09-28 18:46:18, C Lin wrote:

> Dear R users,
>
> I am trying to merge tables based on both their row names and column names.
> My ultimate goal is to build a distribution table of values for each combination of row and column names.
> I have more test tables, more x's and y's than in the toy example below.
> Thanks in advance for your help.
>
> For example :
> test1 <- data.frame(rbind(c(0.1,0.2),0.3,0.1))
> rownames(test1)=c('y1','y2','y3')
> colnames(test1) = c('x1','x2');
> test2 <- data.frame(rbind(c(0.8,0.9,0.5),c(0.5,0.1,0.6)))
> rownames(test2) = c('y2','y5')
> colnames(test2) = c('x1','x3','x2')
>
> test1
>        x1   x2
> y1  0.1  0.2
> y2  0.3  0.3
> y3  0.1  0.1
>
> test2
>        x1   x3   x2
> y2  0.8  0.9  0.5
> y5  0.5  0.1  0.6
>
> I would like to combine test1 and test2 such that if the column name and row name are both the same they are combined.
>
> combined_test
>            x1              x2             x3
> y1      0.1              0.2           NA
> y2  (0.3,0.8)    (0.3,0.5)      0.9
> y3      0.1              0.1           NA
> y5      0.5              0.6           0.1
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: merging tables based on both row and column names

Giorgio Garziano
In reply to this post by baccts
Another approach:

test1 <- data.frame(rbind(c(0.1,0.2),0.3,0.1))
rownames(test1) = c('y1','y2','y3')
colnames(test1) = c('x1','x2');
test2 <- data.frame(rbind(c(0.8,0.9,0.5),c(0.5,0.1,0.6)))
rownames(test2) = c('y2','y5')
colnames(test2) = c('x1','x3','x2')

> test1
    x1  x2
y1 0.1 0.2
y2 0.3 0.3
y3 0.1 0.1

> test2
    x1  x3  x2
y2 0.8 0.9 0.5
y5 0.5 0.1 0.6

t1.r <- rownames(test1)
t2.r <- rownames(test2)
t1.c <- colnames(test1)
t2.c <- colnames(test2)

col <- unique(union(t1.c, t2.c))
ncol <- length(col)
row <- unique(union(t1.r, t2.r))
nrow <- length(row)

m <- matrix(list(), nrow=nrow, ncol=ncol)
rownames(m) <- row
colnames(m) <- col

for (i in 1:nrow) {
 for (j in 1:ncol) {
     rowname <- row[i]
     colname <- col[j]
     v <- c()
     if (!is.null(test1[rowname, colname]) && !is.na(test1[rowname, colname])) {
       v <- c(test1[rowname, colname])
     }
     if (!is.null(test2[rowname, colname]) && !is.na(test2[rowname, colname])) {
       v <- c(v, test2[rowname, colname])
     }
     if (!is.null(v)) {
       m[rowname, colname] <- list(v)
     } else {
       m[rowname, colname] <- NA
     }
  }
}


> m
x1        x2        x3
y1 0.1       0.2       NA
y2 Numeric,2 Numeric,2 0.9
y3 0.1       0.1       NA
y5 0.5       0.6       0.1


> m["y2",]
$x1
[1] 0.3 0.8

$x2
[1] 0.3 0.5

$x3
[1] 0.9

--
Giorgio Garziano

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: merging tables based on both row and column names

baccts
Thank you, Frank and Giorgio for your replies. Both of your solutions work for my need.
Frank, I ended up using your codes. Like that it's short.

For multiple test data frame, I ended up creating a list in the beginning and put each test data frame into the list as such:

ltest = list();
ltest[[i]]=test;

Thanks again for your help.

Lin
Reply | Threaded
Open this post in threaded view
|

Re: merging tables based on both row and column names

Giorgio Garziano
In reply to this post by baccts
I reworked Frank Schwidom's solution to make it shorter than its original version.

  test1 <- (rbind(c(0.1,0.2),0.3,0.1))
  rownames(test1)=c('y1','y2','y3')
  colnames(test1) = c('x1','x2');
  test2 <- (rbind(c(0.8,0.9,0.5),c(0.5,0.1,0.6)))
  rownames(test2) = c('y2','y5')
  colnames(test2) = c('x1','x3','x2')

  lTest12 <- list(test1, test2)
  namesRow <- unique( unlist( lapply(lTest12, rownames)))
  namesCol <- unique( unlist( lapply(lTest12, colnames)))

# here reworked code starts

  tmp1 <- sapply(lTest12, function(x) as.vector( x[match(namesRow, rownames(x)), match(namesCol, colnames(x))]))
  tmp2 <- apply(tmp1, 1, function(x) { na.omit(x) })
  dimnames1 <- list(namesRow, namesCol)
  tmp3 <- array(data = tmp2, dim = sapply(dimnames1, length), dimnames = dimnames1)
  tmp3

  > paste(tmp3)
  [1] "0.1"         "c(0.3, 0.8)" "0.1"         "0.5"         "0.2"
  [6] "c(0.3, 0.5)" "0.1"         "0.6"         "numeric(0)"  "0.9"
  [11] "numeric(0)"  "0.1"

  > tmp3
  x1           x2        x3
  y1 0.1       0.2       Numeric,0
  y2 Numeric,2 Numeric,2 0.9
  y3 0.1       0.1       Numeric,0
  y5 0.5       0.6       0.1
  >

  > tmp3["y2","x1"]
  [[1]]
  [1] 0.3 0.8

  > tmp3["y2","x2"]
  [[1]]
  [1] 0.3 0.5


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: merging tables based on both row and column names

Giorgio Garziano
In reply to this post by baccts
Replacing na.omit() with !is.na() appears to improve performance with time.

  rm(list=ls())

  test1 <- (rbind(c(0.1,0.2),0.3,0.1))
  rownames(test1)=c('y1','y2','y3')
  colnames(test1) = c('x1','x2');
  test2 <- (rbind(c(0.8,0.9,0.5),c(0.5,0.1,0.6)))
  rownames(test2) = c('y2','y5')
  colnames(test2) = c('x1','x3','x2')

  solution_3 <- function(test1, test2) {
    lTest12 <- list(test1, test2)
    namesRow <- unique( unlist( lapply(lTest12, rownames)))
    namesCol <- unique( unlist( lapply(lTest12, colnames)))
    tmp1 <- sapply(lTest12, function(x) as.vector(x[match(namesRow, rownames(x)), match(namesCol, colnames(x))]))
    tmp2 <- apply(tmp1, 1, function(x) { x[!is.na(x)] })
    dimnames1 <- list(namesRow, namesCol)
    tmp3 <- array(data = tmp2, dim = sapply(dimnames1, length), dimnames = dimnames1)
    tmp3
  }

  system.time(for(i in 1:10000) {solution_3(test1, test2)})




        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.