Filtering data with dplyr or grep and losing data?

Satish Vadlamani
Hello Experts:

I have a log file with up to about 1,200 characters on each line. What I
want to do is read it in and then extract certain portions of each line
into new columns. First I want to keep only the rows that contain the text
“[DF_API: input string]”. When I read the file and filter for the rows I am
interested in, it almost seems like I am losing data. I tried this with both
the dplyr filter() and base grep() and got the same result.

I'm not sure why this is the case and would appreciate your help.

The code is below:

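library(dplyr)
library(stringr)  # str_detect() comes from stringr, not dplyr
setwd("C:/Users/satis/Documents/VF/df_issue_dec01")

sec1 <- read.delim(file = "secondary1_aa_small.log")
head(sec1)
names(sec1) <- c("V1")

# dplyr: keep rows whose V1 contains the literal text
# (str_detect() already returns TRUE/FALSE, so "== TRUE" is not needed)
sec1_test <- filter(sec1, str_detect(V1, fixed("DF_API: input string")))
head(sec1_test)

# base R: same filter; drop = FALSE keeps a one-column data frame
# instead of collapsing the result to a bare vector
sec1_test2 <- sec1[grep("DF_API: input string", sec1$V1, fixed = TRUE), , drop = FALSE]
head(sec1_test2)

write.csv(sec1_test, file = "test_out.txt", row.names = FALSE, quote = FALSE)
write.csv(sec1_test2, file = "test2_out.txt", row.names = FALSE, quote = FALSE)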
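A minimal base-R sketch, using a made-up three-line log rather than the posted file, of one way rows can go missing before any filtering happens. read.delim() defaults to header = TRUE, so the first log line becomes the column name instead of a data row. Its default quote = "\"" can also fold several lines into one field when a log line contains an unpaired double quote, and its tab separator splits any line containing a tab into extra columns (names(sec1) <- c("V1") then names only the first). Whether any of this applies to the posted file is an assumption to check, not a diagnosis.

```r
# Made-up log lines, used only for illustration (not the posted data)
tmp <- tempfile(fileext = ".log")
writeLines(c(
  "2018-12-19 10:00:01 [DF_API: input string] id=1",
  "2018-12-19 10:00:02 some other message",
  "2018-12-19 10:00:03 [DF_API: input string] id=2"
), tmp)

pattern <- "DF_API: input string"

# Defaults (header = TRUE): line 1 is consumed as the column name
as_header <- read.delim(tmp)
nrow(as_header)                                     # 2
sum(grepl(pattern, as_header[[1]], fixed = TRUE))   # 1

# header = FALSE, quote = "": every line stays a data row, in column V1
as_data <- read.delim(tmp, header = FALSE, quote = "",
                      stringsAsFactors = FALSE)
nrow(as_data)                                       # 3
sum(grepl(pattern, as_data$V1, fixed = TRUE))       # 2

# readLines(): one string per physical line, no field parsing at all
raw <- readLines(tmp)
sum(grepl(pattern, raw, fixed = TRUE))              # 2
```

Reading with readLines() and filtering the character vector directly sidesteps header, quote and separator handling entirely, which is usually what you want for free-form log lines.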

The data (and code) are available at the link below. Sorry, I should have used dput().

https://spaces.hightail.com/space/arJlYkgIev


Satish Vadlamani


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Filtering data with dplyr or grep and losing data?

Sarah Goslee
Hi,

What does "it almost seems like I am losing data" mean?

Are you losing data? If so, what rows are being excluded that you
think should be included?
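One way to answer that on the posted file is to list lines that mention DF_API loosely but fail the exact filter. This is a hypothetical sketch: the helper name near_misses(), the loose pattern (any case, flexible spacing around the colon) and the sample lines are all made up for illustration.

```r
# Hypothetical diagnostic: lines matching a loose DF_API pattern
# that the exact (fixed, case-sensitive) filter rejects
near_misses <- function(lines, strict = "DF_API: input string",
                        loose = "df_api\\s*:?\\s*input\\s+string") {
  strict_hit <- grepl(strict, lines, fixed = TRUE)
  loose_hit  <- grepl(loose, lines, ignore.case = TRUE, perl = TRUE)
  lines[loose_hit & !strict_hit]
}

# Made-up sample lines, not taken from the posted file
sample_lines <- c(
  "10:00:01 [DF_API: input string] id=1",   # exact match, kept by the filter
  "10:00:02 [DF_API:  input string] id=2",  # two spaces after the colon
  "10:00:03 [df_api: input string] id=3",   # lower case
  "10:00:04 unrelated message"
)
near_misses(sample_lines)  # returns the id=2 and id=3 lines
```

On the real file, something like near_misses(readLines("secondary1_aa_small.log")) would list such lines; an empty result suggests the exact pattern is not what is excluding rows.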

There are 90 rows in the test file that meet your criterion, as far as
I can tell, and 90 rows in my R output.
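That count comparison can be reproduced along these lines. count_check() is a hypothetical helper, and the demo runs on a made-up temporary file rather than the posted one. If the two numbers agree (as with the 90 and 90 reported here for the posted file), the filtering step dropped no matching rows.

```r
# Hypothetical helper: lines in the raw file that contain the pattern
# versus rows that survived the filtering step
count_check <- function(path, filtered, pattern = "DF_API: input string") {
  raw_hits <- sum(grepl(pattern, readLines(path), fixed = TRUE))
  c(raw_file = raw_hits, filtered = nrow(filtered))
}

# Demo on a made-up temporary file
tmp <- tempfile(fileext = ".log")
writeLines(c("a [DF_API: input string] 1",
             "b some other message",
             "c [DF_API: input string] 2"), tmp)
demo_df <- data.frame(V1 = readLines(tmp), stringsAsFactors = FALSE)
demo_filtered <- demo_df[grepl("DF_API: input string", demo_df$V1, fixed = TRUE), ,
                         drop = FALSE]
count_check(tmp, demo_filtered)  # raw_file 2, filtered 2
```

With the code from the question, the equivalent call would be count_check("secondary1_aa_small.log", sec1_test).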

So apparently "almost seems like I am losing data" means "I am not
losing data but am confused."

So we need to know more about what you are expecting but not getting.

Sarah

On Wed, Dec 19, 2018 at 3:21 PM Satish Vadlamani
<[hidden email]> wrote:



--
Sarah Goslee (she/her)
http://www.numberwright.com
