Checking for similar file names in two different directories


Thomas Subia-2
Colleagues,

I have two locations where my data resides.
One folder is for data taken under treatment A
One folder is for data taken under treatment B

"G:\ 0020-49785 10806.xls"
"Q:\ 301864 4519 10806.xls"

Here 10806 is the part of the file name that is common to both directories.

Is there a way to have R extract the file-name parts common to both directories?

Thomas Subia
Statistician / Senior Quality Engineer
ASQ CQE

IMG Companies 
225 Mountain Vista Parkway
Livermore, CA 94551
T. (925) 273-1106
F. (925) 273-1111
E. [hidden email]


Precision Manufacturing for Emerging Technologies
imgprecision.com 

The contents of this message, together with any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain information that is legally privileged, confidential and exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 immediately by telephone or by return E-mail and delete this message, along with any attachments, from your computer. Thank you.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Checking for similar file names in two different directories

Bert Gunter-2
?list.files and ?regexp

Warning: the following is obviously untested:

Gfiles <- list.files("G:", pattern = ".*10806\\.xls$")

should then give you a character vector of the names of the files you want
to feed to read.xls() or to whatever function the currently favored package
for reading Excel files provides.
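A minimal runnable sketch of that suggestion, using two temporary folders as hypothetical stand-ins for G: and Q: (the folder and file names are made up). It adds `full.names = TRUE`, so each result includes its directory and can be passed straight to a reader; without it, list.files() returns bare names relative to the listed folder. The leading space in the pattern is a deliberate choice so that a name such as "0020-49785 110806.xls" is not picked up as well.

```r
# Hypothetical stand-ins for the treatment A (G:) and treatment B (Q:) folders
dirA <- file.path(tempdir(), "treatmentA")
dirB <- file.path(tempdir(), "treatmentB")
dir.create(dirA, showWarnings = FALSE)
dir.create(dirB, showWarnings = FALSE)
invisible(file.create(file.path(dirA, c("0020-49785 10806.xls",
                                        "0020-49785 110806.xls"))))
invisible(file.create(file.path(dirB, "301864 4519 10806.xls")))

# Names ending in " 10806.xls"; full.names = TRUE keeps the directory part
Afiles <- list.files(dirA, pattern = " 10806\\.xls$", full.names = TRUE)
Bfiles <- list.files(dirB, pattern = " 10806\\.xls$", full.names = TRUE)

basename(Afiles)
#[1] "0020-49785 10806.xls"
```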

Cheers,
Bert



On Thu, Dec 26, 2019 at 9:54 AM Thomas Subia <[hidden email]>
wrote:

> Colleagues,
>
> I have two locations where my data resides.
> One folder is for data taken under treatment A
> One folder is for data taken under treatment B
>
> "G:\ 0020-49785 10806.xls"
> "Q:\ 301864 4519 10806.xls"
>
> Here the 10806 is the part which is common to both directories.
>
> Is there a way to have R extract parts common to both directories?
>

Re: Checking for similar file names in two different directories

Rui Barradas
In reply to this post by Thomas Subia-2
Hello,

I am not sure if the following code is what you need but maybe you can
get some inspiration from it.


x <- c("G:\ 0020-49785 10806.xls", "Q:\ 301864 4519 10806.xls")

y <- strsplit(x, split = "[^[:alnum:]]+")
eq <- sapply(y[[1]], `==`, y[[2]])  # eq[j, k] is y[[2]][j] == y[[1]][k]
i <- apply(eq, 2, any)              # columns correspond to y[[1]]

y[[1]][i]
#[1] "10806" "xls"


This returns "10806" but also the file extension "xls".
It could also be made to loop over a vector of file names.
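Looping over whole vectors of file names might look like the sketch below. It swaps the comparison matrix for intersect(), strips the extension with tools::file_path_sans_ext() first so "xls" is no longer reported, and works on base names only, because tokens from the directory part of a full path would otherwise count as common too. The helper names tokens() and common_parts() and the example file names are hypothetical.

```r
# Hypothetical example names (base names only, as list.files() returns them)
a <- c("0020-49785 10806.xls", "0020-49786 10999.xls")
b <- c("301864 4519 10806.xls", "301865 4520 11000.xls")

# Split one file name into alphanumeric tokens, dropping the extension first
tokens <- function(f) {
  strsplit(tools::file_path_sans_ext(f), "[^[:alnum:]]+")[[1]]
}

# Compare every name in 'a' with every name in 'b'; keep pairs sharing a token
common_parts <- function(a, b) {
  res <- list()
  for (fa in a) {
    for (fb in b) {
      common <- intersect(tokens(fa), tokens(fb))
      if (length(common) > 0) {
        res[[length(res) + 1]] <- data.frame(
          A = fa, B = fb, common = paste(common, collapse = " "),
          stringsAsFactors = FALSE)
      }
    }
  }
  do.call(rbind, res)  # NULL when no pair shares a token
}

common_parts(a, b)
```

With these names only the first pair matches, with common part "10806".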


Hope this helps,

Rui Barradas

On 26/12/19 at 17:54, Thomas Subia wrote:

> Colleagues,
>
> I have two locations where my data resides.
> One folder is for data taken under treatment A
> One folder is for data taken under treatment B
>
> "G:\ 0020-49785 10806.xls"
> "Q:\ 301864 4519 10806.xls"
>
> Here the 10806 is the part which is common to both directories.
>
> Is there a way to have R extract parts common to both directories?
>

Re: Checking for similar file names in two different directories

Richard O'Keefe-2
In reply to this post by Thomas Subia-2
I think you had better start by defining what you mean by "similar".
Examples are good, but not enough.

On Fri, 27 Dec 2019 at 06:54, Thomas Subia <[hidden email]> wrote:

>
> Colleagues,
>
> I have two locations where my data resides.
> One folder is for data taken under treatment A
> One folder is for data taken under treatment B
>
> "G:\ 0020-49785 10806.xls"
> "Q:\ 301864 4519 10806.xls"
>
> Here the 10806 is the part which is common to both directories.
>
> Is there a way to have R extract parts common to both directories?
>

Re: Checking for similar file names in two different directories

Bert Gunter-2
In reply to this post by Thomas Subia-2
AHA! -- I think I now see what you mean.

My previous suggestion was almost useless as it assumes you already know
what the "common" parts are ... but you don't.

However, if the filename parts at the end are separated by a space
from the preceding part of the filename, i.e. like "stuff xxxxxxx.xls",
then something like the following example should work, I think:

## Read in *all* the filenames from both directories, as I previously suggested.

Gfiles <- list.files("G:")
Qfiles <- list.files("Q:")

Suppose this gave you (a simplified example):

> Gfiles
 [1] "kjqdx 157.xls" "aorgz 287.xls" "ioldc 380.xls" "fpnxr 509.xls"
 [5] "wytcg 853.xls" "xujos 964.xls" "xdeto 217.xls" "nqriu 574.xls"
 [9] "jclir 480.xls" "fndyu 769.xls"
> Qfiles
 [1] "vexrb 509.xls" "jxeio 770.xls" "zhmwf 920.xls" "cajdq 287.xls"
 [5] "nwdic 259.xls" "sqjkb 889.xls" "brhfu 157.xls" "uyirq 574.xls"
 [9] "ijfqm 480.xls" "nedhj 982.xls"

## all that's important is the " xxx.xls" at the end
## extract the filename part, omitting the ".xls", using regexes
> Gnm <- sub("^.+ (.+)\\.xls$","\\1",Gfiles)
> Qnm <- sub("^.+ (.+)\\.xls$","\\1",Qfiles)

> Gnm
 [1] "157" "287" "380" "509" "853" "964" "217" "574" "480" "769"
> Qnm
 [1] "509" "770" "920" "287" "259" "889" "157" "574" "480" "982"

> ## The 'common' parts are:
> intersect(Gnm,Qnm)
[1] "157" "287" "509" "574" "480"

You can now use these as I described previously to extract your common
files.
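A minimal sketch of that last step, assuming list.files() was called with `full.names = TRUE` so each name carries its directory (the paths below are made up, modelled on the simplified example above). The key is extracted with the same regex, and match() pairs each common key with its file in each directory; if a key occurs more than once in one directory, match() returns only the first occurrence.

```r
# Hypothetical full paths standing in for list.files("G:", full.names = TRUE)
Gfiles <- c("G:/kjqdx 157.xls", "G:/aorgz 287.xls", "G:/ioldc 380.xls")
Qfiles <- c("Q:/brhfu 157.xls", "Q:/cajdq 287.xls", "Q:/nedhj 982.xls")

# Key = the part after the last space, without ".xls"
key <- function(f) sub("^.+ (.+)\\.xls$", "\\1", basename(f))

Gkey <- key(Gfiles)
Qkey <- key(Qfiles)
common <- intersect(Gkey, Qkey)

# One row per common key, with the matching file from each directory
pairs <- data.frame(key = common,
                    G = Gfiles[match(common, Gkey)],
                    Q = Qfiles[match(common, Qkey)],
                    stringsAsFactors = FALSE)
pairs
```

Each row of `pairs` can then be handed to whichever Excel reader you use.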

A similar strategy can be used for any other definition of "common" you
wish to use, *provided* you can define "common" uniquely and specifically
enough to match it in the filenames.
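For instance, take a hypothetical alternative definition: "common" means the run of digits just before ".xls", whatever separator precedes it (space, underscore, hyphen). Only the key extraction changes; the intersect() step stays the same. perl = TRUE is used so the lazy `.*?` behaves as in Perl regexes, and names without such a number are dropped first so they cannot yield a bogus key.

```r
# Hypothetical names using different separators before the number
Gfiles <- c("run_157.xls", "run-287.xls", "notes.xls")
Qfiles <- c("batch 157.xls", "batch_920.xls")

# Keep only names ending in digits + ".xls", then extract those digits
trailing_number <- function(f) {
  f <- f[grepl("[0-9]+\\.xls$", f)]
  sub(".*?([0-9]+)\\.xls$", "\\1", f, perl = TRUE)
}

intersect(trailing_number(Gfiles), trailing_number(Qfiles))
#[1] "157"
```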


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, Dec 26, 2019 at 9:54 AM Thomas Subia <[hidden email]>
wrote:

> Colleagues,
>
> I have two locations where my data resides.
> One folder is for data taken under treatment A
> One folder is for data taken under treatment B
>
> "G:\ 0020-49785 10806.xls"
> "Q:\ 301864 4519 10806.xls"
>
> Here the 10806 is the part which is common to both directories.
>
> Is there a way to have R extract parts common to both directories?
>