Merge the data from multiple text files

R help mailing list-2
I have multiple text files, each containing Boolean rules.
Here is an example of text files 1 and 2:
Text file 1:
A = not(B or C)
B = A and C
C = D
Text file 2:
A = D and E
B = not(D)

I want to merge the contents of the text files as follows:
A = not(B or C) and (D and E)
B = not(D) and (A and C)
C = D
Is there code in R to merge the data from multiple text files?
Thank you
Priya 

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Merge the data from multiple text files

David Winsemius

On 1/5/19 7:28 AM, Priya Arasu via R-help wrote:

> I have multiple text files, where each file has Boolean rules.
> Example of my text file 1 and 2
> Text file 1:
> A = not(B or C)
> B = A and C
> C = D
> Text file 2:
> A = D and E
> B = not(D)
>
> I want to merge the contents in text file as follows
> A = not(B or C) and (D and E)
> B = not(D) and (A and C)
> C = D
> Is there a code in R to merge the data from multiple text files?


There is a `merge` function. For this use case you would need to first
parse your expressions so that the LHS was in one character column and
the RHS was in another character column in each of 2 dataframes. Then
merge on the LHS columns and `paste` matching values from the two
columns. You will probably need to learn how to use `ifelse` and `is.na`.
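
A minimal sketch of the approach described above. It assumes the rules from each file are already in character vectors (`rules1` and `rules2` are hypothetical names), that every rule has the form "LHS = RHS", and that a bare "x and y" should be parenthesised before combining, as in the requested output. `parse_rules` and `wrap` are illustrative helpers, not existing R functions.

```r
# Hypothetical inputs: the rules from each text file as character vectors.
rules1 <- c("A = not(B or C)", "B = A and C", "C = D")
rules2 <- c("A = D and E", "B = not(D)")

# Split each rule at the first " = " into LHS and RHS character columns.
parse_rules <- function(x) {
  data.frame(LHS = sub(" = .*$", "", x),
             RHS = sub("^[^=]* = ", "", x),
             stringsAsFactors = FALSE)
}

# Wrap a bare "x and y" in parentheses; leave anything that already
# contains parentheses untouched.
wrap <- function(x) {
  ifelse(grepl("\\band\\b", x) & !grepl("[()]", x), paste0("(", x, ")"), x)
}

# Outer merge on LHS: a rule missing from one file gets NA in that column.
m <- merge(parse_rules(rules1), parse_rules(rules2),
           by = "LHS", all = TRUE, suffixes = c(".1", ".2"))

# Combine the two RHS columns, falling back to whichever side is present.
m$RHS <- ifelse(is.na(m$RHS.1), m$RHS.2,
         ifelse(is.na(m$RHS.2), m$RHS.1,
                paste(wrap(m$RHS.1), "and", wrap(m$RHS.2))))

merged <- paste(m$LHS, "=", m$RHS)
print(merged)
```

With the two example files this gives "A = not(B or C) and (D and E)", "B = (A and C) and not(D)" and "C = D". The B rule has its terms in the opposite order from the requested output, which is logically equivalent. `merge` handles two data frames at a time, so more than two files would need repeated merging or a different grouping step.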

> Thank you
> Priya
>
> [[alternative HTML version deleted]]


You also need to learn that R-help is a plain-text mailing list and that each
mail client has its own method for composing mail in plain text.


--

David.


Re: Merge the data from multiple text files

David Carlson
To expand on David W's answer, here is an approach to your example. If you have many text files, you would want to process them together rather than individually. You gave us two examples so I'll use those and read them from the console using readLines(), but you would use the same function to open the files on your computer:

> TF1 <- readLines(n=3)
A = not(B or C)
B = A and C
C = D
>
> TF2 <- readLines(n=2)
A = D and E
B = not(D)
>
> TF <- sort(c(TF1, TF2))
> TF
[1] "A = D and E"     "A = not(B or C)" "B = A and C"     "B = not(D)"
[5] "C = D"
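
For files on disk rather than console input, the same step might look like the sketch below. The directory and file names are assumptions made for illustration; the files are written to a temporary directory first only so that the example is self-contained.

```r
# Create two small rule files in a temporary directory so the sketch is
# self-contained; in practice the directory and file names would be your own.
rule_dir <- file.path(tempdir(), "rules")
dir.create(rule_dir, showWarnings = FALSE)
writeLines(c("A = not(B or C)", "B = A and C", "C = D"),
           file.path(rule_dir, "file1.txt"))
writeLines(c("A = D and E", "B = not(D)"),
           file.path(rule_dir, "file2.txt"))

# List every .txt file in the directory and read each into a character vector.
files <- list.files(rule_dir, pattern = "\\.txt$", full.names = TRUE)
rule_list <- lapply(files, readLines)

# Flatten to one vector, drop blank lines, and sort as in the example above.
TF <- unlist(rule_list)
TF <- sort(TF[nzchar(trimws(TF))])
print(TF)
```

Using `list.files` with `lapply` means the code does not change when more rule files are added to the directory.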

Now we have combined the files into a single character vector called TF and sorted them. Next we need to parse them into the left and right hand sides. We will replace " = " with "\t" (tab) to do that:

> TF.delim <- gsub(" = ", "\t", TF)
> TF.data <- read.delim(text=TF.delim, header=FALSE, as.is=TRUE)
> colnames(TF.data) <- c("LHS", "RHS")
> print(TF.data, right=FALSE)
  LHS RHS
1 A   D and E
2 A   not(B or C)
3 B   A and C
4 B   not(D)
5 C   D

TF.data is a data frame with two columns. The tricky part is to add surrounding parentheses to rows 1 and 3 to get your example output:

> paren1 <- grepl("and", TF.data$RHS)
> paren2 <- !grepl("\\(.*\\)", TF.data$RHS)
> paren <- apply(cbind(paren1, paren2), 1, all)
> TF.data$RHS[paren] <- paste0("(", TF.data$RHS[paren], ")")
> print(TF.data, right=FALSE)
  LHS RHS
1 A   (D and E)
2 A   not(B or C)
3 B   (A and C)
4 B   not(D)
5 C   D

The first three lines identify the rows that have the word "and" but do not already have parentheses. The fourth line adds the surrounding parentheses. Finally we will combine the rows that belong to the same LHS value with split and create a list:

> TF.list <- split(TF.data$RHS, TF.data$LHS)
> TF.list
$`A`
[1] "(D and E)"   "not(B or C)"

$B
[1] "(A and C)" "not(D)"  

$C
[1] "D"

> TF.and <- lapply(TF.list, paste, collapse=" and ")
> TF.final <- lapply(names(TF.and), function(x) paste(x, "=", TF.and[[x]]))
> TF.final <- do.call(rbind, TF.final)
> TF.final
     [,1]                          
[1,] "A = (D and E) and not(B or C)"
[2,] "B = (A and C) and not(D)"
[3,] "C = D"
> write(TF.final, file="TF.output.txt")

The text file "TF.output.txt" contains the three lines.
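
Because the rules were sorted, the terms come out in alphabetical order ("(D and E) and not(B or C)") rather than in the order requested. A sketch of one way to keep the order in which the rules were read is shown here; `merge_rules` is a hypothetical helper written for this example, and it assumes every rule has the form "LHS = RHS".

```r
# Hypothetical helper: merge rules given as "LHS = RHS" strings, keeping the
# order in which rules were read instead of sorting them alphabetically.
merge_rules <- function(rules) {
  lhs <- sub(" = .*$", "", rules)
  rhs <- sub("^[^=]* = ", "", rules)

  # Parenthesise bare "x and y" terms that contain no parentheses yet.
  bare <- grepl("\\band\\b", rhs) & !grepl("[()]", rhs)
  rhs[bare] <- paste0("(", rhs[bare], ")")

  # A factor with levels in order of first appearance keeps split() from
  # reordering the groups; within a group, input order is preserved.
  groups <- split(rhs, factor(lhs, levels = unique(lhs)))
  joined <- vapply(groups, paste, character(1), collapse = " and ")
  unname(paste(names(joined), "=", joined))
}

file1 <- c("A = not(B or C)", "B = A and C", "C = D")
file2 <- c("A = D and E", "B = not(D)")
result <- merge_rules(c(file1, file2))
print(result)
```

Here the A rule becomes "A = not(B or C) and (D and E)", since file 1 is read before file 2. The result is a plain character vector, so `writeLines(result, "TF.output.txt")` would write it out one rule per line.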

----------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University


Re: Merge the data from multiple text files

R help mailing list-2
Thank you, David Winsemius and David L Carlson.
@David L Carlson, thank you for the code. I have one more issue when merging the files. Please advise. For example:
In text file 1:
A = not(B or C)
B = A and C
C = D
In text file 2:
A = not(C or D) and (D and E)

So when I merge using your code, it merges A = not(B or C) and (D and E). How do I merge A as A = not(B or C or D) and (D and E)? I also have duplicates like A = not(B or C) and not(C or D), which I would like instead as A = not(B or C or D).
Thanks
Priya


Re: Merge the data from multiple text files

Jeff Newmiller
I think it is rather presumptuous of you to think that anyone is going to write an expression optimizer for some unspecified language on the R-help mailing list. I am sure that such tasks can be handled in R, but it is non-trivial and the background needed would be very off-topic here.


Re: Merge the data from multiple text files

David Carlson
Thank you. I couldn't have said it better myself. It would probably be simpler if you process the lines first to remove duplicates and break compound statements into simple statements. Even then it will be a challenge to not end up with statements that are internally contradictory, e.g. (A and B) and not(B).

David C
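
A narrow sketch of the preprocessing idea, not a general expression optimizer. It assumes balanced parentheses, single spaces around "and" and "or", and that the only simplification wanted is removing duplicate terms and merging flat terms of the form not(x or y) using the identity not(X) and not(Y) = not(X or Y). `split_top_and` and `simplify_terms` are hypothetical helpers written for this example.

```r
# Split a right-hand side into its top-level "and" terms, ignoring any "and"
# nested inside parentheses. Assumes balanced parentheses and single spaces.
split_top_and <- function(rhs) {
  chars <- strsplit(rhs, "")[[1]]
  # Nesting depth at each character position.
  depth <- cumsum(chars == "(") - cumsum(chars == ")")
  hits <- gregexpr(" and ", rhs, fixed = TRUE)[[1]]
  hits <- hits[hits > 0 & depth[pmax(hits, 1)] == 0]
  starts <- c(1, hits + 5)
  ends <- c(hits - 1, nchar(rhs))
  substring(rhs, starts, ends)
}

# Drop duplicate terms and merge flat not(x or y ...) terms into one.
simplify_terms <- function(terms) {
  terms <- unique(trimws(terms))
  # Flat negated disjunctions only: no nested parentheses, no "and" inside.
  is_not <- grepl("^not ?\\([^()]*\\)$", terms) &
    !grepl(" and ", terms, fixed = TRUE)
  if (sum(is_not) > 1) {
    inner <- sub("^not ?\\((.*)\\)$", "\\1", terms[is_not])
    operands <- unique(unlist(strsplit(inner, " or ", fixed = TRUE)))
    # Put the merged term where the first not(...) term stood.
    idx <- which(is_not)
    terms[idx[1]] <- paste0("not(", paste(operands, collapse = " or "), ")")
    terms <- terms[-idx[-1]]
  }
  paste(terms, collapse = " and ")
}

# The two right-hand sides for A from the follow-up question.
rhs_a <- c("not(B or C)", "not(C or D) and (D and E)")
terms <- unlist(lapply(rhs_a, split_top_and))
result <- paste("A =", simplify_terms(terms))
print(result)
```

For the follow-up example this yields "A = not(B or C or D) and (D and E)". It does nothing about the harder problem raised above: a merged rule such as (A and B) and not(B) passes through unchanged, and detecting that kind of contradiction would need a real Boolean expression parser.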
