family

Val-17
Hi all,
I am reading a huge data set (12M rows) that contains family information:
Offspring, Parent1 and Parent2.

Every Parent1 and Parent2 should also appear in the first column as an
offspring, before the rows of their own offspring. Their own parent
information (Parent1 and Parent2) should be set to zero if unknown. Also,
the first column should be unique.


Here is my sample data set and desired output.


# stringsAsFactors = FALSE keeps the names as character strings
fam <- read.table(text = " offspring  Parent1 Parent2
Smith Alex1  Alexa
Carla Alex1     0
Jacky Smith   Abbot
Jack  0       Jacky
Almo  Jack    Carla
 ", header = TRUE, stringsAsFactors = FALSE)



Desired output:
Offspring Parent1 Parent2
Alex1     0       0
Alexa     0       0
Abbot     0       0
Smith     Alex1   Alexa
Carla     Alex1   0
Jacky     Smith   Abbot
Jack      0       Jacky
Almo      Jack    Carla

Thank you.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: family

Jeff Newmiller
This question is about algorithm help... or rather, "do my work for me", not about R.

Study up on "directed acyclic graphs" and topological sorting [1]. There are packages on CRAN for such data structures (e.g. pooh::tsort, and the packages in the gR Task View, "gRaphical Models in R"), but you should at least be aware of the possible approaches before we talk about implementing one of them, which is the "R" part that is on topic for this list.

[1] https://en.wikipedia.org/wiki/Topological_sorting
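
For readers who want to see what a topological sort looks like on this kind of data, here is a minimal base R sketch. It is not pooh::tsort and not a tuned solution for 12M rows. The helper name sort_pedigree is hypothetical, and the sketch assumes that "0" marks an unknown parent and that any parent without a row of its own is a founder. The sample rows are deliberately given in reverse order so the sort has something to do.

```r
# Sample pedigree from the question, rows reversed on purpose
fam <- data.frame(
  offspring = c("Almo", "Jack", "Jacky", "Carla", "Smith"),
  Parent1   = c("Jack", "0", "Smith", "Alex1", "Alex1"),
  Parent2   = c("Carla", "Jacky", "Abbot", "0", "Alexa"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reorder rows so that every individual comes
# after both of its parents (a round-by-round topological sort).
sort_pedigree <- function(ped) {
  parents <- union(ped$Parent1, ped$Parent2)
  # Assumption: "0" and parents with no row of their own count as
  # already placed, so they never block their offspring.
  placed    <- union("0", setdiff(parents, ped$offspring))
  remaining <- seq_len(nrow(ped))
  ordered   <- integer(0)
  while (length(remaining) > 0) {
    # Rows whose two parents have both been placed in an earlier round
    ready <- remaining[ped$Parent1[remaining] %in% placed &
                       ped$Parent2[remaining] %in% placed]
    if (length(ready) == 0) stop("pedigree contains a cycle")
    ordered   <- c(ordered, ready)
    placed    <- c(placed, ped$offspring[ready])
    remaining <- setdiff(remaining, ready)
  }
  ped[ordered, , drop = FALSE]
}

sorted <- sort_pedigree(fam)
```

Each pass of the loop scans the remaining rows once, so the number of passes equals the number of generations in the pedigree. That is fine for a sketch; for very deep or very large pedigrees a dedicated graph package may be a better fit.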
--
Sent from my phone. Please excuse my brevity.


Re: family

David Winsemius

> On Nov 17, 2017, at 4:28 PM, Val <[hidden email]> wrote:
>
> Hi all,
> I am reading a huge data set(12M rows) that contains family information,
> Offspring, Parent1 and Parent2
>
> Parent1 and parent2 should be in the first column as an offspring
> before their offspring information. Their parent information (parent1
> and parent2) should be  set to zero, if unknown.  Also the first
> column should be unique.
>
>
> Here is my sample data  set  and desired output.
>
>
> fam <- read.table(textConnection(" offspring  Parent1 Parent2
> Smith Alex1  Alexa
> Carla Alex1     0
> Jacky Smith   Abbot
> Jack  0       Jacky
> Almo  Jack    Carla
> "),header = TRUE)
>
>
>
> desired output.
> Offspring Parent1 Parent2
> Alex1      0        0
> Alexa      0        0
> Abbot      0        0
> Smith    Alex1  Alexa
> Carla    Alex1      0
> Jacky    Smith   Abbot
> Jack       0     Jacky
> Almo     Jack    Carla

You might get useful ideas by looking at ?"%in%" and ?union (set operations).

> fam$Parent1[!fam$Parent1 %in% fam$offspring]
[1] "Alex1" "Alex1" "0"    
> fam$Parent2[!fam$Parent2 %in% fam$offspring]
[1] "Alexa" "0"     "Abbot"
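
Building on the set-operation hint, here is one possible sketch of how the missing parents could be collected and prepended. The names parents, founders, founder_rows and result are made up for illustration, and the sketch assumes that "0" means "unknown" and that the names are character strings rather than factors.

```r
# Sample data from the question, as character columns
fam <- data.frame(
  offspring = c("Smith", "Carla", "Jacky", "Jack", "Almo"),
  Parent1   = c("Alex1", "Alex1", "Smith", "0", "Jack"),
  Parent2   = c("Alexa", "0", "Abbot", "Jacky", "Carla"),
  stringsAsFactors = FALSE
)

# Every name that occurs as a parent, without duplicates
parents <- union(fam$Parent1, fam$Parent2)

# Parents that never occur as offspring; drop the "0" placeholder
founders <- setdiff(parents, c(fam$offspring, "0"))

# One row per founder, with both parents set to "0" (unknown)
founder_rows <- data.frame(
  offspring = founders,
  Parent1   = rep("0", length(founders)),
  Parent2   = rep("0", length(founders)),
  stringsAsFactors = FALSE
)

# Founders first, then the original rows in their original order
result <- rbind(founder_rows, fam)

# The first column should be unique
stopifnot(!anyDuplicated(result$offspring))
```

On the sample data this gives the rows in the order of the desired output, but only because the original rows already list parents before their offspring. If that does not hold in the full data set, the rows still need to be sorted so that parents come first.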

David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
