Bilateral matrix

Bilateral matrix

Miluji Sb
I have data on current and previous location of individuals. I would like
to have a matrix with bilateral movement between locations. I would like
the final output to look like the second table below.

I have tried crosstab() from the ecodist package, but I do not have another
variable to measure the flow. Ultimately I would like to compute the
probability of movement between cities (movements to city_i / total movements
from city_j).

Is it possible to aggregate the data in this way? Any guidance would be
highly appreciated. Thank you!

# Original data
structure(list(id = 101:115, current_location = structure(c(2L,
8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label =
c("Austin",
"Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
"New York"), class = "factor"), previous_location = structure(c(6L,
2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label =
c("Atlanta",
"Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
), class = "factor")), class = "data.frame", row.names = c(NA,
-15L))

# Expected output
structure(list(X = structure(c(3L, 1L, 2L), .Label = c("Austin",
"Houston", "OKC"), class = "factor"), Boston = c(2L, NA, NA),
    New.York = c(NA, 2L, 2L), Cambridge = c(2L, NA, NA)), class =
"data.frame", row.names = c(NA,
-3L))

Sincerely,

Milu

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Bilateral matrix

Huzefa Khalil
Dear Miluji,

If I understand correctly, this should get you what you need.

temp1 <-
structure(list(id = 101:115, current_location = structure(c(2L,
8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label =
c("Austin",
"Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
"New York"), class = "factor"), previous_location = structure(c(6L,
2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label =
c("Atlanta",
"Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
), class = "factor")), class = "data.frame", row.names = c(NA,
-15L))

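library(reshape2)  # provides dcast()

# Count the individuals for each (previous, current) pair of locations.
dcast(temp1, previous_location ~ current_location,
      value.var = "id", fun.aggregate = length)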


Re: Bilateral matrix

Bert Gunter-2
Or, in base R, see ?xtabs, as in:

xtabs(~ previous_location + current_location, data = x)

where x is your data frame. (You can convert the 0s to NAs if you like.)
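
To get from these counts to the movement probabilities asked about in the original post, one option is prop.table() on the xtabs() result. The following is only a sketch: the data frame name `moves` is hypothetical and holds just the two location columns of the posted data, written out as character vectors, and the names `flows`, `probs` and `flows_na` are illustrative as well.

```r
# Hypothetical name `moves`: the two location columns of the posted data.
moves <- data.frame(
  previous_location = c("OKC", "Austin", "Houston", "OKC", "Tulsa",
                        "New Orleans", "Atlanta", "Cleveland", "OKC", "Austin",
                        "OKC", "Austin", "Houston", "OKC", "Tulsa"),
  current_location  = c("Boston", "New York", "New York", "Cambridge", "Lynn",
                        "Houston", "Austin", "Boston", "New Orleans", "Durham",
                        "Boston", "New York", "New York", "Cambridge", "Lynn")
)

# Counts of moves for every (previous, current) pair.
flows <- xtabs(~ previous_location + current_location, data = moves)

# Row-wise proportions: the share of movers from each previous city
# that ended up in each current city.
probs <- prop.table(flows, margin = 1)

# Optional: show empty cells as NA instead of 0.
flows_na <- flows
flows_na[flows_na == 0] <- NA

round(probs["OKC", ], 2)
```

With margin = 1 each row of `probs` sums to 1, which corresponds to "movement to city_i / total movement from city_j" when the rows are the previous locations.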


Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Bilateral matrix

Miluji Sb
Dear Bert and Huzefa,

Apologies for the late reply; my account was hacked and I have only just
managed to recover it.

Thank you very much for your replies and the solutions. Both work well.

I was wondering if there was any way to ensure (force) that all possible
combinations show up in the output. The full dataset has 25 cities but of
course people have not moved from Boston to all the other 24 cities. I
would like all the combinations if possible.

Thank you again!

Sincerely,

Milu


Re: Bilateral matrix

R help mailing list-2
Make current_location and previous_location factors with the same set of
levels.  The levels could be the union of the values in the two columns or
a predetermined list.  E.g.,

> x <- data.frame(previous_location=c("Mount Vernon","Burlington"),
+                 current_location=c("Sedro Woolley","Burlington"))
> allCities <- levels(factor(unlist(x))) # union of observed values
> allCities
[1] "Burlington"    "Mount Vernon"  "Sedro Woolley"
> x[] <- lapply(x, factor, levels=allCities)
> xtabs(~previous_location + current_location,data=x)
                 current_location
previous_location Burlington Mount Vernon Sedro Woolley
    Burlington             1            0             0
    Mount Vernon           0            0             1
    Sedro Woolley          0            0             0

or, using an externally determined set of cities

> allCities <- c("Anacortes","Burlington","Concrete","Mount Vernon",
+                "Sedro Woolley")
> x[] <- lapply(x, factor, levels=allCities)
> xtabs(~previous_location + current_location,data=x)
                 current_location
previous_location Anacortes Burlington Concrete Mount Vernon Sedro Woolley
    Anacortes             0          0        0            0             0
    Burlington            0          1        0            0             0
    Concrete              0          0        0            0             0
    Mount Vernon          0          0        0            0             1
    Sedro Woolley         0          0        0            0             0
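
The same idea can be wrapped in a small function. This is a sketch only: `bilateral_counts` and its argument names are hypothetical, and it uses table() rather than xtabs() so that it can take plain vectors. When no city list is supplied it falls back to the union of observed values, as in the first example above.

```r
# Hypothetical helper: square origin-by-destination count matrix.
# `cities` is the full set of cities; if NULL, use the observed union.
bilateral_counts <- function(from, to, cities = NULL) {
  if (is.null(cities)) {
    cities <- sort(union(from, to))
  }
  # Giving both factors the same levels makes the table square.
  from <- factor(from, levels = cities)
  to   <- factor(to,   levels = cities)
  table(previous_location = from, current_location = to)
}

tab <- bilateral_counts(c("Mount Vernon", "Burlington"),
                        c("Sedro Woolley", "Burlington"))

# Wide data frame with one row per previous location.
wide <- as.data.frame.matrix(tab)
wide
```

One thing to watch with an externally supplied list: a value that is not in `cities` becomes NA when the factor is built and is then left out of the table without any warning, so it is worth checking for misspelled city names first.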


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: Bilateral matrix

Bert Gunter-2
xtabs does this automatically if your cross-classifying variables are
factors whose levels include all the cities (sorted, if you like):

> x <- sample(letters[1:5], 8, replace=TRUE)
> y <- sample(letters[1:5], 8, replace=TRUE)

> xtabs(~ x + y)
   y
x   c d e
  a 1 0 0
  b 0 0 1
  c 1 0 0
  d 1 1 1
  e 1 1 0

> lvls <- sort(union(x,y))
> x <- factor(x, levels = lvls)
> y <- factor(y, levels = lvls)

> xtabs( ~ x + y)
   y
x   a b c d e
  a 0 0 1 0 0
  b 0 0 0 0 1
  c 0 0 1 0 0
  d 0 0 1 1 1
  e 0 0 1 1 0
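
One caveat when such full tables are turned into probabilities: a city that nobody left has a row of zeros, and dividing by that row total gives NaN. A small sketch of one way to handle it, assuming illustrative vectors chosen to match the counts printed above plus an extra city "f" with no movers (the vectors and all object names here are hypothetical, not the sampled values from the post):

```r
# Illustrative moves that reproduce the counts in the table above.
x <- c("a", "b", "c", "d", "d", "d", "e", "e")   # previous
y <- c("c", "e", "c", "c", "d", "e", "c", "d")   # current

# Full level set, including a city "f" that has no moves at all.
lvls <- c(sort(union(x, y)), "f")
fx <- factor(x, levels = lvls)
fy <- factor(y, levels = lvls)

counts <- table(from = fx, to = fy)

# Row proportions; the all-zero row for "f" comes out as NaN (0/0).
probs <- prop.table(counts, margin = 1)

# One possible convention: report 0 for cities with no outflow.
probs[is.nan(probs)] <- 0
probs
```

Whether 0 or NA is the better placeholder for such rows depends on what the probabilities are used for afterwards.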

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Bilateral matrix

Miluji Sb
Dear William and Bert,

Thank you for your replies and elegant solutions. I am having trouble with
the fact that two of the previous locations do not appear among the current
locations (that is, no one moved to OKC or Dallas from other cities), so
these two cities are not being included in the output.

I have provided a better sample of the data and the ideal output (wide form,
a 10x10 bilateral matrix), but I haven't been able to produce it. Would it be
easier if I created a variable for each ID, equal to 1 if the person moved?
I am a bit lost - thank you again!

### data
structure(list(ID = 1:12, previous_location. = structure(c(3L,
9L, 8L, 10L, 2L, 5L, 1L, 7L, 4L, 6L, 10L, 5L), .Label = c("Atlanta",
"Austin", "Boston", "Cambridge", "Dallas", "Durham", "Lynn",
"New Orleans", "New York", "OKC"), class = "factor"), current_location. =
structure(c(8L,
3L, 3L, 8L, 4L, 1L, 4L, 5L, 6L, 4L, 7L, 2L), .Label = c("Atlanta",
"Austin", "Boston", "Cambridge", "Durham", "Lynn", "New Orleans",
"New York"), class = "factor")), class = "data.frame", row.names = c(NA,
-12L))

### ideal output
structure(list(previous_location. = structure(c(3L, 9L, 8L, 10L,
2L, 5L, 1L, 7L, 4L, 6L), .Label = c("Atlanta", "Austin", "Boston",
"Cambridge", "Dallas", "Durham", "Lynn", "New Orleans", "New York",
"OKC"), class = "factor"), Boston = c(0L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), New.York = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), New.Orleans = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L), OKC = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Austin = c(0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), Dallas = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Atlanta = c(0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L), Lynn = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L), Cambridge = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L), Durham = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names =
c(NA,
-10L))

Sincerely,

Milu

On Wed, May 16, 2018 at 5:12 PM, Bert Gunter <[hidden email]> wrote:

> xtabs does this automatically if your cross classifying variables are
> factors with levels all the cities (sorted, if you like):
>
>  > x <- sample(letters[1:5],8, rep=TRUE)
> > y <- sample(letters[1:5],8,rep=TRUE)
>
> > xtabs(~ x + y)
>    y
> x   c d e
>   a 1 0 0
>   b 0 0 1
>   c 1 0 0
>   d 1 1 1
>   e 1 1 0
>
> > lvls <- sort(union(x,y))
> > x <- factor(x, levels = lvls)
> > y <- factor(y, levels = lvls)
>
> > xtabs( ~ x + y)
>    y
> x   a b c d e
>   a 0 0 1 0 0
>   b 0 0 0 0 1
>   c 0 0 1 0 0
>   d 0 0 1 1 1
>   e 0 0 1 1 0
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Wed, May 16, 2018 at 7:49 AM, Miluji Sb <[hidden email]> wrote:
>
>> Dear Bert and Huzefa,
>>
>> Apologies for the late reply, my account got hacked and I have just
>> managed to recover it.
>>
>> Thank you very much for your replies and the solutions. Both work well.
>>
>> I was wondering if there was any way to ensure (force) that all possible
>> combinations show up in the output. The full dataset has 25 cities but of
>> course people have not moved from Boston to all the other 24 cities. I
>> would like all the combinations if possible.
>>
>> Thank you again!
>>
>> Sincerely,
>>
>> Milu
>>
>> On Tue, May 8, 2018 at 6:28 PM, Bert Gunter <[hidden email]>
>> wrote:
>>
>>> or in base R : ?xtabs    ??
>>>
>>> as in:
>>> xtabs(~previous_location + current_location,data=x)
>>>
>>> (You can convert the 0s to NA's if you like)
>>>
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>> On Tue, May 8, 2018 at 9:21 AM, Huzefa Khalil <[hidden email]>
>>> wrote:
>>>
>>>> Dear Miluji,
>>>>
>>>> If I understand correctly, this should get you what you need.
>>>>
>>>> temp1 <-
>>>> structure(list(id = 101:115, current_location = structure(c(2L,
>>>> 8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label =
>>>> c("Austin",
>>>> "Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
>>>> "New York"), class = "factor"), previous_location = structure(c(6L,
>>>> 2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label =
>>>> c("Atlanta",
>>>> "Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
>>>> ), class = "factor")), class = "data.frame", row.names = c(NA,
>>>> -15L))
>>>>
>>>> dcast(temp1, previous_location ~ current_location)
>>>>
>>>> On Tue, May 8, 2018 at 12:10 PM, Miluji Sb <[hidden email]> wrote:
>>>> > I have data on current and previous location of individuals. I would
>>>> like
>>>> > to have a matrix with bilateral movement between locations. I would
>>>> like
>>>> > the final output to look like the second table below.
>>>> >
>>>> > I have tried using crosstab() from the ecodist but I do not have
>>>> another
>>>> > variable to measure the flow. Ultimately I would like to compute the
>>>> > probability of movement between cities (movement to city_i/total
>>>> movement
>>>> > from city_j).
>>>> >
>>>> > Is it possible to aggregate the data in this way? Any guidance would
>>>> be
>>>> > highly appreciated. Thank you!
>>>> >
>>>> > # Original data
>>>> > structure(list(id = 101:115, current_location = structure(c(2L,
>>>> > 8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label =
>>>> > c("Austin",
>>>> > "Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
>>>> > "New York"), class = "factor"), previous_location = structure(c(6L,
>>>> > 2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label =
>>>> > c("Atlanta",
>>>> > "Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
>>>> > ), class = "factor")), class = "data.frame", row.names = c(NA,
>>>> > -15L))
>>>> >
>>>> > # Expected output
>>>> > structure(list(X = structure(c(3L, 1L, 2L), .Label = c("Austin",
>>>> > "Houston", "OKC"), class = "factor"), Boston = c(2L, NA, NA),
>>>> >     New.York = c(NA, 2L, 2L), Cambridge = c(2L, NA, NA)), class =
>>>> > "data.frame", row.names = c(NA,
>>>> > -3L))
>>>> >
>>>> > Sincerely,
>>>> >
>>>> > Milu
>>>> >
>>>> >         [[alternative HTML version deleted]]
>>>> >
>>>> > ______________________________________________
>>>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide http://www.R-project.org/posti
>>>> ng-guide.html
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>>> ng-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Bilateral matrix

David Winsemius

> On May 17, 2018, at 6:40 AM, Miluji Sb <[hidden email]> wrote:
>
> Dear William and Ben,
>
> Thank you for your replies and elegant solutions. I am having trouble with
> the fact that two of the previous locations do not appear in current
> locations (that is no one moved to OKC and Dallas from other cities), so
> these two cities are not being included in the output.

William told you to make sure that the two location factors have the same levels (the .Label entries in your dput output). At the moment they do not: Dallas and OKC are missing from the levels of the current location factor. Bert showed you how to fix that.
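
As a minimal sketch of that fix, assuming the sample data quoted below is retyped into a data frame (the names moves, all_cities and flows are made up here for illustration), both columns can be given one shared set of levels before tabulating:

```r
# The twelve moves from the quoted sample data, retyped as character vectors.
# The column names keep the trailing dots used in the posted data.
moves <- data.frame(
  ID = 1:12,
  previous_location. = c("Boston", "New York", "New Orleans", "OKC",
                         "Austin", "Dallas", "Atlanta", "Lynn",
                         "Cambridge", "Durham", "OKC", "Dallas"),
  current_location.  = c("New York", "Boston", "Boston", "New York",
                         "Cambridge", "Atlanta", "Cambridge", "Durham",
                         "Lynn", "Cambridge", "New Orleans", "Austin"),
  stringsAsFactors = TRUE
)

# One shared, sorted set of levels covering every city seen in either column
all_cities <- sort(union(levels(moves$previous_location.),
                         levels(moves$current_location.)))
moves$previous_location. <- factor(moves$previous_location., levels = all_cities)
moves$current_location.  <- factor(moves$current_location.,  levels = all_cities)

# xtabs keeps unused factor levels by default, so the result is square
flows <- xtabs(~ previous_location. + current_location., data = moves)
dim(flows)
flows["OKC", ]
```

Because both factors now carry all ten cities, OKC and Dallas appear as columns of zeros instead of being dropped.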

Please read all the replies for meaning.

--
David.

> I have provided a better sample of the data and the ideal output (wide form,
> a 10x10 bilateral matrix) but haven't been able to produce it. Would it be
> easier if I created an indicator variable for each ID, equal to 1 if the
> person moved? I am a bit lost - thank you again!
>
> ### data
> structure(list(ID = 1:12, previous_location. = structure(c(3L,
> 9L, 8L, 10L, 2L, 5L, 1L, 7L, 4L, 6L, 10L, 5L), .Label = c("Atlanta",
> "Austin", "Boston", "Cambridge", "Dallas", "Durham", "Lynn",
> "New Orleans", "New York", "OKC"), class = "factor"), current_location. =
> structure(c(8L,
> 3L, 3L, 8L, 4L, 1L, 4L, 5L, 6L, 4L, 7L, 2L), .Label = c("Atlanta",
> "Austin", "Boston", "Cambridge", "Durham", "Lynn", "New Orleans",
> "New York"), class = "factor")), class = "data.frame", row.names = c(NA,
> -12L))
>
> ### ideal output
> structure(list(previous_location. = structure(c(3L, 9L, 8L, 10L,
> 2L, 5L, 1L, 7L, 4L, 6L), .Label = c("Atlanta", "Austin", "Boston",
> "Cambridge", "Dallas", "Durham", "Lynn", "New Orleans", "New York",
> "OKC"), class = "factor"), Boston = c(0L, 1L, 1L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L), New.York = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
> 0L, 0L), New.Orleans = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
> 0L), OKC = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Austin = c(0L,
> 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), Dallas = c(0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L), Atlanta = c(0L, 0L, 0L, 0L, 0L, 1L,
> 0L, 0L, 0L, 0L), Lynn = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
> 0L), Cambridge = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L), Durham = c(0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names =
> c(NA,
> -10L))
>
> Sincerely,
>
> Milu
>
> On Wed, May 16, 2018 at 5:12 PM, Bert Gunter <[hidden email]> wrote:
>
>> xtabs does this automatically if your cross classifying variables are
>> factors with levels all the cities (sorted, if you like):
>>
>>> x <- sample(letters[1:5], 8, replace = TRUE)
>>> y <- sample(letters[1:5], 8, replace = TRUE)
>>
>>> xtabs(~ x + y)
>>   y
>> x   c d e
>>  a 1 0 0
>>  b 0 0 1
>>  c 1 0 0
>>  d 1 1 1
>>  e 1 1 0
>>
>>> lvls <- sort(union(x,y))
>>> x <- factor(x, levels = lvls)
>>> y <- factor(y, levels = lvls)
>>
>>> xtabs( ~ x + y)
>>   y
>> x   a b c d e
>>  a 0 0 1 0 0
>>  b 0 0 0 0 1
>>  c 0 0 1 0 0
>>  d 0 0 1 1 1
>>  e 0 0 1 1 0
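
Since the stated goal is the probability of movement (moves to city_i divided by total moves from city_j), a table like the one above could be turned into row proportions with prop.table. This is only a sketch on made-up letters; the names from, to, d, counts and probs are assumptions for illustration:

```r
# Made-up origin/destination pairs, one element per person
from <- c("a", "a", "b", "d", "d", "d", "e", "e")
to   <- c("c", "c", "e", "c", "d", "e", "c", "d")

# Same levels on both sides, as in the example above
lvls <- sort(union(from, to))
d <- data.frame(from = factor(from, levels = lvls),
                to   = factor(to,   levels = lvls))

counts <- xtabs(~ from + to, data = d)

# margin = 1 divides each row by its row total, so rows are origins
# and each non-empty row sums to 1
probs <- prop.table(counts, margin = 1)

# Origins with no recorded moves (here "c") give 0/0 = NaN in their row
probs
```

Whether an empty origin row should stay NaN or be set to 0 depends on how the probabilities are used afterwards.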
>>
>> Cheers,
>> Bert
>>
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Wed, May 16, 2018 at 7:49 AM, Miluji Sb <[hidden email]> wrote:
>>
>>> Dear Bert and Huzefa,
>>>
>>> Apologies for the late reply, my account got hacked and I have just
>>> managed to recover it.
>>>
>>> Thank you very much for your replies and the solutions. Both work well.
>>>
>>> I was wondering if there was any way to ensure (force) that all possible
>>> combinations show up in the output. The full dataset has 25 cities but of
>>> course people have not moved from Boston to all the other 24 cities. I
>>> would like all the combinations if possible.
>>>
>>> Thank you again!
>>>
>>> Sincerely,
>>>
>>> Milu
>>>
>>> On Tue, May 8, 2018 at 6:28 PM, Bert Gunter <[hidden email]>
>>> wrote:
>>>
>>>> Or in base R, see ?xtabs, as in:
>>>>
>>>> xtabs(~ previous_location + current_location, data = x)
>>>>
>>>> (You can convert the 0s to NA's if you like)
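
One way the zero-to-NA conversion might look, sketched on a small made-up data frame (d, counts, counts_na and wide are illustrative names, not from the thread):

```r
# Four made-up moves using city names from the original post
d <- data.frame(
  previous_location = c("OKC", "OKC", "Austin", "Houston"),
  current_location  = c("Boston", "Boston", "New York", "New York"),
  stringsAsFactors = TRUE
)

counts <- xtabs(~ previous_location + current_location, data = d)

# Replace empty cells with NA, leaving the observed counts untouched
counts_na <- counts
counts_na[counts_na == 0] <- NA

# Optional: a wide data frame, one row per previous location;
# unclass() drops the table class so the matrix method is used
wide <- as.data.frame(unclass(counts_na))
wide
```

Note that as.data.frame may alter column names containing spaces unless check.names-style options are considered, so it is worth checking the names of wide before relying on them.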
>>>>
>>>>
>>>> Cheers,
>>>> Bert
>>>>
>>>> On Tue, May 8, 2018 at 9:21 AM, Huzefa Khalil <[hidden email]>
>>>> wrote:
>>>>
>>>>> Dear Miluji,
>>>>>
>>>>> If I understand correctly, this should get you what you need.
>>>>>
>>>>> temp1 <-
>>>>> structure(list(id = 101:115, current_location = structure(c(2L,
>>>>> 8L, 8L, 3L, 6L, 5L, 1L, 2L, 7L, 4L, 2L, 8L, 8L, 3L, 6L), .Label =
>>>>> c("Austin",
>>>>> "Boston", "Cambridge", "Durham", "Houston", "Lynn", "New Orleans",
>>>>> "New York"), class = "factor"), previous_location = structure(c(6L,
>>>>> 2L, 4L, 6L, 7L, 5L, 1L, 3L, 6L, 2L, 6L, 2L, 4L, 6L, 7L), .Label =
>>>>> c("Atlanta",
>>>>> "Austin", "Cleveland", "Houston", "New Orleans", "OKC", "Tulsa"
>>>>> ), class = "factor")), class = "data.frame", row.names = c(NA,
>>>>> -15L))
>>>>>
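>>>>> # assuming reshape2's dcast(); count rows per origin/destination pair
>>>>> library(reshape2)
>>>>> dcast(temp1, previous_location ~ current_location,
>>>>>       value.var = "id", fun.aggregate = length)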
>>>>>