

R 4.0.2
OS X
Colleagues
I have strings that contain a space in an unexpected location. The intended string is:
"STRING 01. Remainder of the string"
However, variants are:
"STR ING 01. Remainder of the string"
"STRIN G 01. Remainder of the string"
I would like a general approach to deleting a space, but only if it appears before the period. Any suggestions on a regular expression for this?
Dennis
Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1866PLessThan (18667537784)
www.PLessThan.com < http://www.plessthan.com/>
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


On 2020-07-28 21:20, Dennis Fisher wrote:
> R 4.0.2
> OS X
>
> Colleagues
>
> I have strings that contain a space in an unexpected location. The
> intended string is:
> "STRING 01. Remainder of the string"
> However, variants are:
> "STR ING 01. Remainder of the string"
> "STRIN G 01. Remainder of the string"
>
> I would like a general approach to deleting a space, but only if it
> appears before the period. Any suggestions on a regular expression
> for this?
You aren't deleting the space before 0? Is that in the requirement?


Only the spaces in STRING. However, if I inadvertently delete the space between STRING and NN, I can add it back in.
Dennis Fisher MD


On 2020-07-28 21:31, Dennis Fisher wrote:
> Only the spaces in STRING. However, if I inadvertently delete the
> space between STRING and NN, I can add it back in.
>
Can there only be one space in STR ING or is ST RI NG possible?


It is possible that there will be more than one space, but most likely only one (i.e., a solution for one space will suffice; a solution for more than one space would be even better).
Dennis Fisher MD


This regex would do it, I think: \s(?=.*\s\d*\.)
It looks for a whitespace character (\s) that is followed, somewhere later in the string, by another whitespace character, optional digits, and a period.
text <- "STR ING 01. Remainder of the string"
stringr::str_replace_all(text, "\\s(?=.*\\s\\d*\\.)", "")
Should do it I think!
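For readers without stringr, the same lookahead can probably be used in base R, since gsub() accepts Perl-style patterns when perl = TRUE. What follows is a minimal sketch, assuming the strings look like the examples in this thread; the third example string and the name `fixed` are added here only for illustration.

```r
# Example strings modelled on the ones in the original question
text <- c("STRING 01. Remainder of the string",
          "STR ING 01. Remainder of the string",
          "STR IN G 01. Remainder of the string")

# perl = TRUE enables the lookahead (?=...), which the default
# regex engine used by gsub() does not support
fixed <- gsub("\\s(?=.*\\s\\d*\\.)", "", text, perl = TRUE)
print(fixed)
```

One caveat: because the `.*` inside the lookahead is unanchored, a remainder that itself contains a space followed by digits and a period (for example "see item 2.") would cause spaces further left to be deleted as well, so this is best treated as a starting point rather than a general fix.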


Dear Dennis,
Perhaps by using gregexpr to look for
dots, removing spaces from the substring up to the first
match, then pasting it back together.

strings <-
  c("STRING 01. Remainder of the string.",
    "STR ING 01. Remainder of the string.",
    "STRIN G 01. Remainder of the string.")
search <- gregexpr("\\.", strings)  # positions of all dots in each string
lens <- nchar(strings)
FUN <- function(i, strings, search, lens) {
  # everything up to and including the first dot, with spaces removed
  before.dot <- substr(strings[i], 1, search[[i]][1])
  before.dot <- gsub(" ", "", before.dot)
  # everything after the first dot is left untouched
  after.dot <- substr(strings[i], search[[i]][1]+1, lens[i])
  return(paste0(before.dot, after.dot))
}
simplify2array(parallel::mclapply(
  X=1:length(strings),
  FUN=FUN,
  strings=strings,
  search=search,
  lens=lens))

yields

[1] "STRING01. Remainder of the string."
[2] "STRING01. Remainder of the string."
[3] "STRING01. Remainder of the string."

Yes, I know, the space just before 01
also disappears ...
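If the space before the number needs to come back, one possibility is a second substitution that re-inserts it between the leading letters and the first digit. This is only a sketch: it assumes every prefix has the form letters, then digits, then a period, as in the examples above, and `squeezed` and `restored` are made-up names standing in for the output shown above.

```r
# Stand-in for the output above: all spaces before the first period removed
squeezed <- c("STRING01. Remainder of the string.",
              "STRING01. Remainder of the string.",
              "STRING01. Remainder of the string.")

# Put one space back between the leading run of letters and the first digit
restored <- sub("^([[:alpha:]]+)([[:digit:]])", "\\1 \\2", squeezed)
print(restored)
```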
Best,
Rasmus


I forgot about regexpr ... this is
simpler I think:

strings <-
  c("STRING 01. Remainder of the string.",
    "STR ING 01. Remainder of the string.",
    "STRIN G 01. Remainder of the string.")
# search for the first dot and the three chars in front of it
search <- regexpr("...\\.", strings)
ml <- attr(search, "match.length")
paste0(
  gsub(" ", "", substr(strings, 1, search)),
  substr(strings, search, search+ml-1),
  substr(strings, search+ml, nchar(strings))
)
/Rasmus


1. Thanks for the nice reprex.
2. However, I thought there was still a bit of ambiguity. I interpreted
your specification to mean: "any number of spaces could occur in the
beginning alphabetic part of the strings before one or more digits occur
followed by a '.' (a period) and then more stuff after."
3. My strategy was simply to split the strings into the first part
consisting of the alphabetic characters and spaces and the second part with
the numbers and everything else. Then I just removed the spaces in the
first part. You can then concatenate them together again (using paste())
however you like. Thus:
> x
[1] "STRING 01. Remainder of the string"  "STR ING 01. Remainder of the string"
[3] "STRIN G 01. Remainder of the string"
> p1 <- gsub(" ", "", gsub("([^[:digit:]]+)[[:digit:]]+\\..*$", "\\1", x))
> p2 <- gsub("[^[:digit:]]+([[:digit:]]+\\..*$)", "\\1", x)
> p1
[1] "STRING" "STRING" "STRING"
> p2
[1] "01. Remainder of the string" "01. Remainder of the string"
[3] "01. Remainder of the string"
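The two pieces can then be joined with paste(), which also restores a single space before the number. Below is a minimal sketch that wraps the steps above in a function; the name `fix_prefix` is hypothetical, and the patterns assume each string contains digits followed by a period.

```r
fix_prefix <- function(x) {
  # First part: everything before the first digit, with all spaces removed
  p1 <- gsub(" ", "", gsub("([^[:digit:]]+)[[:digit:]]+\\..*$", "\\1", x))
  # Second part: the digits, the period, and everything after them
  p2 <- gsub("[^[:digit:]]+([[:digit:]]+\\..*$)", "\\1", x)
  # paste() joins the parts with a single space by default
  paste(p1, p2)
}

x <- c("STRING 01. Remainder of the string",
       "STR ING 01. Remainder of the string",
       "STR IN G 01. Remainder of the string")
result <- fix_prefix(x)
print(result)
```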
I look forward to better approaches using basic regexes (no additional
packages), however.
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


Note that my previous strategy can be expressed slightly more clearly as:
x <- c("STRING 01. Remainder of the string",
       "STR ING 01. Remainder of the string",
       "STRIN G 01. Remainder of the string",
       "STR IN G 01. Remainder of the string") ## more spaces in this last example entry
rx <- "([^[:digit:]]+)([[:digit:]]+.+)"
> gsub(" ", "", gsub(rx, "\\1", x))
[1] "STRING" "STRING" "STRING" "STRING"
> gsub(rx, "\\2", x)
[1] "01. Remainder of the string" "01. Remainder of the string"
[3] "01. Remainder of the string" "01. Remainder of the string"
Bert Gunter


The first response has to be "how did the spaces get there
in the first place?" Can you fix the process that creates
the data? If the process sometimes generates one extra
space, are you sure it never generates two?
But let's treat this purely as a regular expression
problem: if there is a space before a dot, you want
to delete the first such space. In vi(1) you would do
s/^\([^ .]*\) \([^.]*\)/\1\2/
but apparently there is *supposed* to be a space before
the 01, so it is only when there are two or more spaces
that one should be deleted, so we'd want
s/^\([^ .]*\) \([^ .]* \)/\1\2/
I leave converting that to R as an exercise for the reader.
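As a rough take on that exercise, the second vi substitution carries over to base R's sub() almost unchanged, because R's default (extended) regular expressions use bare parentheses for groups. The sketch below is one possible translation, not necessarily what was intended; the helper name `drop_stray_spaces` and the loop that repeats the substitution to handle more than one stray space are additions for illustration.

```r
drop_stray_spaces <- function(x) {
  repeat {
    # Delete the first space only when a second space follows it
    # before any period is reached
    y <- sub("^([^ .]*) ([^ .]* )", "\\1\\2", x)
    if (identical(y, x)) break   # nothing left to delete
    x <- y
  }
  x
}

x <- c("STRING 01. Remainder of the string",
       "STR ING 01. Remainder of the string",
       "STR IN G 01. Remainder of the string")
cleaned <- drop_stray_spaces(x)
print(cleaned)
```

Each pass removes at most one space per string, so the loop stops once only the intended space before the number is left.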


Richard
In reply to your “first response”, the text was originally in a Word document and it did NOT contain the errant spaces. I used read_docx in the textreadr package to access the text. The spaces were added during that step. I am copying the maintainer of that package to see if he has any idea as to the source.
Thanks for your regular expression suggestion.
Dennis


The spaces may not have been VISIBLE in the Word document,
but that does not mean that there wasn't anything THERE.
- What happens if you open the document in Word and
  save it as plain text?
- What happens if you open the document in Word and
  save it as RTF, then read that using read_rtf?
- If you do that, what does the RTF look like?
- Was the Word document typed by hand, or was it
  the result of some other process?
The thing is, "our troubles come not as single spies,
but as whole battalions", so I'm wondering what _else_
is going wrong in the conversion process.
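One way to check what is actually there is to look at the code points of the text once it has been read into R. The sketch below is an illustration only: the string `s` is made up and contains a non-breaking space (U+00A0) on purpose; whether the real text contains anything like that is not known from this thread.

```r
# Hypothetical example: "\u00a0" is a non-breaking space, which prints like
# an ordinary space but is a different character
s <- "STR\u00a0ING 01. Remainder of the string"

# Code points of the first eight characters; 32 is an ordinary space
codes <- utf8ToInt(s)
print(codes[1:8])

# Positions of characters outside the printable ASCII range (32 to 126)
odd <- which(codes < 32 | codes > 126)
print(odd)
```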


Hi!
How about this?
--- snip ---
> x <- c("STRING 01. Remainder of the string", "STR ING 01. Remainder of the string",
+        "STRIN G 01. Remainder of the string", "STR IN G 01. Remainder of the string")
> x1 <- unlist(strsplit(x, "\\."))
> for (i in seq(1, length(x1), 2)) { x[(i+1) %/% 2] <- paste(gsub(" ", "", x1[i]), x1[i+1], sep=".") }
> x
[1] "STRING01. Remainder of the string" "STRING01. Remainder of the string"
[3] "STRING01. Remainder of the string" "STRING01. Remainder of the string"
--- snip ---
Or do I miss something?
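One thing to watch for: strsplit() splits at every period, so a remainder that contains further periods yields more than two pieces per string, and the loop above would then pair them up wrongly. Here is a sketch of a variant that splits only at the first period, using regmatches() with invert = TRUE; the example strings with extra periods are invented for illustration, and like the code above it also removes the space before the number.

```r
# Invented examples; the first remainder contains an additional period
x <- c("STR ING 01. Remainder. More text here",
       "STRIN G 02. Another remainder")

# regexpr() finds only the first period; invert = TRUE returns the text
# before and after that match instead of the match itself
parts <- regmatches(x, regexpr("\\.", x), invert = TRUE)

fixed <- vapply(parts, function(p) {
  if (length(p) < 2) return(p[1])   # no period found: leave unchanged
  paste0(gsub(" ", "", p[1], fixed = TRUE), ".", p[2])
}, character(1))
print(fixed)
```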
Best,
Kimmo


Richard
Per your requests:
1. Plain text: no spaces
2. read_docx: spaces
3. read_rtf: no spaces
4. Not requested by you: copying from the Word document, then pasting into “vim”: no spaces
The Word document was created by hand, but #1, #3, and #4 confirm that it contains no spaces. The offending entity here is textreadr::read_docx.
That addresses how the spaces arose. But my question was not about that; rather, I was looking for a general fix when that situation arises.
Dennis

