create groups from data with duplicates, such that each group has a duplicate represented once

kevinkariuki
Hi, I have a sequencing run with ~3000 samples (attached dataset). The samples were initially tagged and amplified by PCR in duplicate. The tags used range from MID01 to MID26.

MID01-MID13 were used for pair 1 while MID14-MID26 were used for pair 2. The tags are re-used to allow samples to be pooled.

Pooling will mix one set of samples tagged with MID01-MID26 into the first group, the next set of samples tagged with MID01-MID26 into the second group, and so on.

I'm hoping to get an R script that can create these groups so that, within each group, every tag appears only once. An example is shown below.

ID    TagA    TagB    group
180   MID03   MID10   group1
181   MID04   MID06   group1
182   MID05   MID07   group1
183   MID03   MID09   group2
184   MID04   MID10   group2
185   MID05   MID06   group2
186   MID01   MID06   group3
187   MID02   MID07   group3
188   MID03   MID08   group3
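
The thread never shows the solution that was eventually used, so the following is only a minimal sketch of one way to get the grouping in the example above. It assumes the rows are processed in their existing order, that TagA and TagB share one pool of tags, and that a new group starts as soon as a row carries a tag already seen in the current group. `assign_groups` and `samples` are hypothetical names, not part of any package or of the original data set.

```r
# Example rows copied from the table in the question.
samples <- data.frame(
  ID   = 180:188,
  TagA = c("MID03", "MID04", "MID05", "MID03", "MID04", "MID05",
           "MID01", "MID02", "MID03"),
  TagB = c("MID10", "MID06", "MID07", "MID09", "MID10", "MID06",
           "MID06", "MID07", "MID08"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk through the rows in order and open a new
# group whenever either tag of the current row is already in the group.
assign_groups <- function(tag_a, tag_b) {
  group_no <- integer(length(tag_a))
  current  <- 1L
  seen     <- character(0)          # tags used so far in the current group
  for (i in seq_along(tag_a)) {
    tags <- c(tag_a[i], tag_b[i])
    if (any(tags %in% seen)) {      # a tag would repeat: start a new group
      current <- current + 1L
      seen    <- character(0)
    }
    seen        <- c(seen, tags)
    group_no[i] <- current
  }
  paste0("group", group_no)
}

samples$group <- assign_groups(samples$TagA, samples$TagB)
print(samples)
```

On these nine rows the sketch gives group1 for IDs 180-182, group2 for 183-185 and group3 for 186-188, as in the example. If tags only need to be unique within their own column, the check would have to be done per column instead.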



______________________________________________________________________

This e-mail contains information which is confidential. It is intended only for the use of the named recipient. If you have received this e-mail in error, please let us know by replying to the sender, and immediately delete it from your system.  Please note, that in these circumstances, the use, disclosure, distribution or copying of this information is strictly prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility for the  accuracy or completeness of this message as it has been transmitted over a public network. Although the Programme has taken reasonable precautions to ensure no viruses are present in emails, it cannot accept responsibility for any loss or damage arising from the use of the email or attachments. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of KEMRI-Wellcome Trust Programme.
______________________________________________________________________
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: create groups from data with duplicates, such that each group has a duplicate represented once

PIKAL Petr
Hi

Instead of an attachment, which is usually removed, you should use dput.

Something like the output of
dput(head(yourdata,30))
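
As an illustration of what this produces, here is a small sketch. `yourdata` is a made-up three-row data frame standing in for the real data set, which is not available in the thread.

```r
# Made-up stand-in for the poster's data set.
yourdata <- data.frame(
  ID   = 180:182,
  TagA = c("MID03", "MID04", "MID05"),
  TagB = c("MID10", "MID06", "MID07"),
  stringsAsFactors = FALSE
)

# Prints R code that recreates the first rows; this text can be pasted
# straight into a plain-text email.
dput(head(yourdata, 30))

# Anyone reading the email can rebuild the object from that text.
txt     <- paste(capture.output(dput(yourdata)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, yourdata)
```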

To remove duplicate values, see unique or duplicated.
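
A short sketch of both functions on the TagA values from the example in the question. Note that they find or drop repeated values; on their own they do not assign rows to groups.

```r
# TagA column from the example table in the question.
tags <- c("MID03", "MID04", "MID05", "MID03", "MID04", "MID05",
          "MID01", "MID02", "MID03")

unique(tags)             # each distinct tag once, in order of first appearance
duplicated(tags)         # TRUE for every repeat of an earlier value
tags[!duplicated(tags)]  # same values as unique(tags)
```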

Cheers
Petr

> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Kevin Wamae
> Sent: Thursday, January 17, 2019 1:29 AM
> To: [hidden email]
> Subject: [R] create groups from data with duplicates, such that each group has
> a duplicate represented once
Personal data: Information about the processing and protection of business partners' personal data is available at: https://www.precheza.cz/en/personal-data-protection-principles/
Confidentiality: This email and any documents attached to it may be confidential and are subject to the legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/

Re: create groups from data with duplicates, such that each group has a duplicate represented once

kevinkariuki
Dear Petr, thank you for the guidance.

A colleague managed to solve it.

I'll definitely use "dput" for future postings.

Regards
------------------
Kevin Wamae

On 17/01/2019, 03:57, "PIKAL Petr" <[hidden email]> wrote:

    Hi
   
    Instead of attachment which is usually removed you should use dput
   
    Something like output from
    dput(head(yourdata,30))
   
    To remove duplicate values see
   
    unique or duplicated
   
    Cheers
    Petr
   