Unable to read csv files with comma in values

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Unable to read csv files with comma in values

amit.cs03@gmail.com
Hi,

I have a bunch of csv files to read in R. I'm unable to read them correctly
because in some of the files, there is a column ("Role") which has comma in
the values.

Sample data:

User, Role, Rule, GAPId
Sam, [HadoopAnalyst, DBA, Developer], R46443

I'm trying to play with the below code but it doesnt work:

files <- list.files(pattern='.*REDUNDANT(.*).csv$')

tbl <- sapply(files, function(f) {
  gsub('\\[|\\]', '"', readLines(f)) %>%
    read.csv(text = ., check.names = FALSE)
}) %>%
  bind_rows(.id = "id") %>%
  select(id, User, Rule) %>%
  distinct()

Please assist.

Thanks

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Unable to read csv files with comma in values

Duncan Murdoch-2
On 06/04/2019 10:03 a.m., Amit Govil wrote:

> Hi,
>
> I have a bunch of csv files to read in R. I'm unable to read them correctly
> because in some of the files, there is a column ("Role") which has comma in
> the values.
>
> Sample data:
>
> User, Role, Rule, GAPId
> Sam, [HadoopAnalyst, DBA, Developer], R46443
>
> I'm trying to play with the below code but it doesnt work:

Since you didn't give a reproducible example, you should at least say
what "doesn't work" means.

But here's some general advice:  if you want to debug code, don't write
huge expressions like the chain of functions below, put things in
temporary variables and make sure you get what you were expecting at
each stage.

Instead of

>
> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>
> tbl <- sapply(files, function(f) {
>    gsub('\\[|\\]', '"', readLines(f)) %>%
>      read.csv(text = ., check.names = FALSE)
> }) %>%
>    bind_rows(.id = "id") %>%
>    select(id, User, Rule) %>%
>    distinct()

try


files <- list.files(pattern='.*REDUNDANT(.*).csv$')

tmp1 <- sapply(files, function(f) {
   gsub('\\[|\\]', '"', readLines(f)) %>%
     read.csv(text = ., check.names = FALSE)
})

tmp2 <- tmp1 %>% bind_rows(.id = "id")

tmp3 <- tmp2 %>% select(id, User, Rule)

tbl <- tmp3 %>% distinct()

(You don't need pipes here, but it will make it easier to put the giant
expression back together at the end.)

Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
wrong.

Duncan Murdoch

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Unable to read csv files with comma in values

Bert Gunter-2
(Note: This follows an earlier mistaken reply just to Duncan)

Multiple "amens!" to Duncan's comments...

However:

Here is a start at my interpretation of how to do what you want. Note first
that your "example" listed 4 fields in the line, but you showed only 3. I
modified your example for 3 text fields, only one of which has brackets
([...]) in it I assume. Here is a little example of how to use regex's to
replace the commas within the brackets by "-", which would presumably then
allow you to easily convert the text into a data frame e.g. using
textConnection() and read.csv. Obviously, if this is not what you meant,
read no further.

##Example
txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
        "Mary, [Stats, Designer, R], t14")

wh <- grep("\\[.+\\]",txt)  ## which records need to be modified?
fixup <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ## bracketed
expressions, changing "," to "-"

## Unfortunately, the "replacement" argument in sub() is not vectorized, se
we need a loop:

for(i in wh) txt[wh[i]] <- sub("\\[.+\\]",fixup[i],txt[wh[i]]) ## replace
original bracketed text with fixed up bracketed text

> txt
[1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
[2] "Jan, DBA, R101"
[3] "Mary, [HadoopAnalyst-DBA-Developer], t14"


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Apr 7, 2019 at 9:00 AM Duncan Murdoch <[hidden email]>
wrote:

> On 06/04/2019 10:03 a.m., Amit Govil wrote:
> > Hi,
> >
> > I have a bunch of csv files to read in R. I'm unable to read them
> correctly
> > because in some of the files, there is a column ("Role") which has comma
> in
> > the values.
> >
> > Sample data:
> >
> > User, Role, Rule, GAPId
> > Sam, [HadoopAnalyst, DBA, Developer], R46443
> >
> > I'm trying to play with the below code but it doesnt work:
>
> Since you didn't give a reproducible example, you should at least say
> what "doesn't work" means.
>
> But here's some general advice:  if you want to debug code, don't write
> huge expressions like the chain of functions below, put things in
> temporary variables and make sure you get what you were expecting at
> each stage.
>
> Instead of
> >
> > files <- list.files(pattern='.*REDUNDANT(.*).csv$')
> >
> > tbl <- sapply(files, function(f) {
> >    gsub('\\[|\\]', '"', readLines(f)) %>%
> >      read.csv(text = ., check.names = FALSE)
> > }) %>%
> >    bind_rows(.id = "id") %>%
> >    select(id, User, Rule) %>%
> >    distinct()
>
> try
>
>
> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>
> tmp1 <- sapply(files, function(f) {
>    gsub('\\[|\\]', '"', readLines(f)) %>%
>      read.csv(text = ., check.names = FALSE)
> })
>
> tmp2 <- tmp1 %>% bind_rows(.id = "id")
>
> tmp3 <- tmp2 %>% select(id, User, Rule)
>
> tbl <- tmp3 %>% distinct()
>
> (You don't need pipes here, but it will make it easier to put the giant
> expression back together at the end.)
>
> Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
> wrong.
>
> Duncan Murdoch
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Unable to read csv files with comma in values

Bert Gunter-2
... and here's another perhaps simpler, perhaps more efficient (??) way of
doing it using strsplit().Note that it uses the fixed field position, 2, of
the bracketed roles. Adjust as needed.

A better solution would be a regex that avoids the loops (here, the sapply)
altogether, but I don't know how to do this. Maybe someone cleverer will
offer such a solution.

txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
        "Mary, [Stats, Designer, R], t14")

wh <-  grep("\\[.+\\]", txt)
spl <-  strsplit(txt[wh], "\\[|\\]")
txt[wh] <-  sapply(spl, function(y)
   paste0(y[1], gsub(" *, *","-", y[2]), y[-(1:2)]))

> txt
[1] "Sam, HadoopAnalyst-DBA-Developer, R46443 "
[2] "Jan, DBA, R101"
[3] "Mary, Stats-Designer-R, t14"

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Apr 7, 2019 at 9:55 AM Bert Gunter <[hidden email]> wrote:

> (Note: This follows an earlier mistaken reply just to Duncan)
>
> Multiple "amens!" to Duncan's comments...
>
> However:
>
> Here is a start at my interpretation of how to do what you want. Note
> first that your "example" listed 4 fields in the line, but you showed only
> 3. I modified your example for 3 text fields, only one of which has
> brackets ([...]) in it I assume. Here is a little example of how to use
> regex's to replace the commas within the brackets by "-", which would
> presumably then allow you to easily convert the text into a data frame e.g.
> using textConnection() and read.csv. Obviously, if this is not what you
> meant, read no further.
>
> ##Example
> txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
>         "Mary, [Stats, Designer, R], t14")
>
> wh <- grep("\\[.+\\]",txt)  ## which records need to be modified?
> fixup <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ##
> bracketed expressions, changing "," to "-"
>
> ## Unfortunately, the "replacement" argument in sub() is not vectorized,
> se we need a loop:
>
> for(i in wh) txt[wh[i]] <- sub("\\[.+\\]",fixup[i],txt[wh[i]]) ## replace
> original bracketed text with fixed up bracketed text
>
> > txt
> [1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
> [2] "Jan, DBA, R101"
> [3] "Mary, [HadoopAnalyst-DBA-Developer], t14"
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, Apr 7, 2019 at 9:00 AM Duncan Murdoch <[hidden email]>
> wrote:
>
>> On 06/04/2019 10:03 a.m., Amit Govil wrote:
>> > Hi,
>> >
>> > I have a bunch of csv files to read in R. I'm unable to read them
>> correctly
>> > because in some of the files, there is a column ("Role") which has
>> comma in
>> > the values.
>> >
>> > Sample data:
>> >
>> > User, Role, Rule, GAPId
>> > Sam, [HadoopAnalyst, DBA, Developer], R46443
>> >
>> > I'm trying to play with the below code but it doesnt work:
>>
>> Since you didn't give a reproducible example, you should at least say
>> what "doesn't work" means.
>>
>> But here's some general advice:  if you want to debug code, don't write
>> huge expressions like the chain of functions below, put things in
>> temporary variables and make sure you get what you were expecting at
>> each stage.
>>
>> Instead of
>> >
>> > files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>> >
>> > tbl <- sapply(files, function(f) {
>> >    gsub('\\[|\\]', '"', readLines(f)) %>%
>> >      read.csv(text = ., check.names = FALSE)
>> > }) %>%
>> >    bind_rows(.id = "id") %>%
>> >    select(id, User, Rule) %>%
>> >    distinct()
>>
>> try
>>
>>
>> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>>
>> tmp1 <- sapply(files, function(f) {
>>    gsub('\\[|\\]', '"', readLines(f)) %>%
>>      read.csv(text = ., check.names = FALSE)
>> })
>>
>> tmp2 <- tmp1 %>% bind_rows(.id = "id")
>>
>> tmp3 <- tmp2 %>% select(id, User, Rule)
>>
>> tbl <- tmp3 %>% distinct()
>>
>> (You don't need pipes here, but it will make it easier to put the giant
>> expression back together at the end.)
>>
>> Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
>> wrong.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Unable to read csv files with comma in values

Bert Gunter-2
... and if anyone cares, here's a way to do it using vectorization (no
loops) by working only on the subvector containing bracketed text  and
using the brackets to break up the strings into 3 separate pieces,
replacing the commas in the middle piece with dashes, and then
reassembling. Quite clumsy, so a better solution is still needed, but here
it is:

txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
        "Mary, [Stats, Designer, R], t14")
wh <- grep("\\[.+\\]",txt)
txt1 <- sub("(.+), *\\[.+","\\1",txt[wh]) ## before "["
txt2 <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ## bracketed
part
txt3 <- sub(".*\\], *(.+?) *$","\\1",txt[wh]) ## after "]"
txt[wh]<- paste(txt1, txt2, txt3, sep = ", ")

> txt
[1] "Sam, [HadoopAnalyst-DBA-Developer], R46443"
[2] "Jan, DBA, R101"
[3] "Mary, [Stats-Designer-R], t14"


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Apr 7, 2019 at 10:35 AM Bert Gunter <[hidden email]> wrote:

> ... and here's another perhaps simpler, perhaps more efficient (??) way of
> doing it using strsplit().Note that it uses the fixed field position, 2, of
> the bracketed roles. Adjust as needed.
>
> A better solution would be a regex that avoids the loops (here, the
> sapply) altogether, but I don't know how to do this. Maybe someone cleverer
> will offer such a solution.
>
> txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
>         "Mary, [Stats, Designer, R], t14")
>
> wh <-  grep("\\[.+\\]", txt)
> spl <-  strsplit(txt[wh], "\\[|\\]")
> txt[wh] <-  sapply(spl, function(y)
>    paste0(y[1], gsub(" *, *","-", y[2]), y[-(1:2)]))
>
> > txt
> [1] "Sam, HadoopAnalyst-DBA-Developer, R46443 "
> [2] "Jan, DBA, R101"
> [3] "Mary, Stats-Designer-R, t14"
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, Apr 7, 2019 at 9:55 AM Bert Gunter <[hidden email]> wrote:
>
>> (Note: This follows an earlier mistaken reply just to Duncan)
>>
>> Multiple "amens!" to Duncan's comments...
>>
>> However:
>>
>> Here is a start at my interpretation of how to do what you want. Note
>> first that your "example" listed 4 fields in the line, but you showed only
>> 3. I modified your example for 3 text fields, only one of which has
>> brackets ([...]) in it I assume. Here is a little example of how to use
>> regex's to replace the commas within the brackets by "-", which would
>> presumably then allow you to easily convert the text into a data frame e.g.
>> using textConnection() and read.csv. Obviously, if this is not what you
>> meant, read no further.
>>
>> ##Example
>> txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
>>         "Mary, [Stats, Designer, R], t14")
>>
>> wh <- grep("\\[.+\\]",txt)  ## which records need to be modified?
>> fixup <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ##
>> bracketed expressions, changing "," to "-"
>>
>> ## Unfortunately, the "replacement" argument in sub() is not vectorized,
>> se we need a loop:
>>
>> for(i in wh) txt[wh[i]] <- sub("\\[.+\\]",fixup[i],txt[wh[i]]) ## replace
>> original bracketed text with fixed up bracketed text
>>
>> > txt
>> [1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
>> [2] "Jan, DBA, R101"
>> [3] "Mary, [HadoopAnalyst-DBA-Developer], t14"
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, Apr 7, 2019 at 9:00 AM Duncan Murdoch <[hidden email]>
>> wrote:
>>
>>> On 06/04/2019 10:03 a.m., Amit Govil wrote:
>>> > Hi,
>>> >
>>> > I have a bunch of csv files to read in R. I'm unable to read them
>>> correctly
>>> > because in some of the files, there is a column ("Role") which has
>>> comma in
>>> > the values.
>>> >
>>> > Sample data:
>>> >
>>> > User, Role, Rule, GAPId
>>> > Sam, [HadoopAnalyst, DBA, Developer], R46443
>>> >
>>> > I'm trying to play with the below code but it doesnt work:
>>>
>>> Since you didn't give a reproducible example, you should at least say
>>> what "doesn't work" means.
>>>
>>> But here's some general advice:  if you want to debug code, don't write
>>> huge expressions like the chain of functions below, put things in
>>> temporary variables and make sure you get what you were expecting at
>>> each stage.
>>>
>>> Instead of
>>> >
>>> > files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>>> >
>>> > tbl <- sapply(files, function(f) {
>>> >    gsub('\\[|\\]', '"', readLines(f)) %>%
>>> >      read.csv(text = ., check.names = FALSE)
>>> > }) %>%
>>> >    bind_rows(.id = "id") %>%
>>> >    select(id, User, Rule) %>%
>>> >    distinct()
>>>
>>> try
>>>
>>>
>>> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>>>
>>> tmp1 <- sapply(files, function(f) {
>>>    gsub('\\[|\\]', '"', readLines(f)) %>%
>>>      read.csv(text = ., check.names = FALSE)
>>> })
>>>
>>> tmp2 <- tmp1 %>% bind_rows(.id = "id")
>>>
>>> tmp3 <- tmp2 %>% select(id, User, Rule)
>>>
>>> tbl <- tmp3 %>% distinct()
>>>
>>> (You don't need pipes here, but it will make it easier to put the giant
>>> expression back together at the end.)
>>>
>>> Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
>>> wrong.
>>>
>>> Duncan Murdoch
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Unable to read csv files with comma in values

amit.cs03@gmail.com
Sorry for reverting back late (as I had some issues with my laptop) and
thank you Bert and Duncan for your suggestions and approach. I'll try the
solutions proposed and get back to you.

Thanks
Amit

On Mon, Apr 8, 2019 at 3:30 AM Bert Gunter <[hidden email]> wrote:

> ... and if anyone cares, here's a way to do it using vectorization (no
> loops) by working only on the subvector containing bracketed text  and
> using the brackets to break up the strings into 3 separate pieces,
> replacing the commas in the middle piece with dashes, and then
> reassembling. Quite clumsy, so a better solution is still needed, but here
> it is:
>
> txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
>         "Mary, [Stats, Designer, R], t14")
> wh <- grep("\\[.+\\]",txt)
> txt1 <- sub("(.+), *\\[.+","\\1",txt[wh]) ## before "["
> txt2 <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ##
> bracketed part
> txt3 <- sub(".*\\], *(.+?) *$","\\1",txt[wh]) ## after "]"
> txt[wh]<- paste(txt1, txt2, txt3, sep = ", ")
>
> > txt
> [1] "Sam, [HadoopAnalyst-DBA-Developer], R46443"
> [2] "Jan, DBA, R101"
> [3] "Mary, [Stats-Designer-R], t14"
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, Apr 7, 2019 at 10:35 AM Bert Gunter <[hidden email]>
> wrote:
>
>> ... and here's another perhaps simpler, perhaps more efficient (??) way
>> of doing it using strsplit().Note that it uses the fixed field position, 2,
>> of the bracketed roles. Adjust as needed.
>>
>> A better solution would be a regex that avoids the loops (here, the
>> sapply) altogether, but I don't know how to do this. Maybe someone cleverer
>> will offer such a solution.
>>
>> txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
>>         "Mary, [Stats, Designer, R], t14")
>>
>> wh <-  grep("\\[.+\\]", txt)
>> spl <-  strsplit(txt[wh], "\\[|\\]")
>> txt[wh] <-  sapply(spl, function(y)
>>    paste0(y[1], gsub(" *, *","-", y[2]), y[-(1:2)]))
>>
>> > txt
>> [1] "Sam, HadoopAnalyst-DBA-Developer, R46443 "
>> [2] "Jan, DBA, R101"
>> [3] "Mary, Stats-Designer-R, t14"
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, Apr 7, 2019 at 9:55 AM Bert Gunter <[hidden email]>
>> wrote:
>>
>>> (Note: This follows an earlier mistaken reply just to Duncan)
>>>
>>> Multiple "amens!" to Duncan's comments...
>>>
>>> However:
>>>
>>> Here is a start at my interpretation of how to do what you want. Note
>>> first that your "example" listed 4 fields in the line, but you showed only
>>> 3. I modified your example for 3 text fields, only one of which has
>>> brackets ([...]) in it I assume. Here is a little example of how to use
>>> regex's to replace the commas within the brackets by "-", which would
>>> presumably then allow you to easily convert the text into a data frame e.g.
>>> using textConnection() and read.csv. Obviously, if this is not what you
>>> meant, read no further.
>>>
>>> ##Example
>>> txt <-c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ","Jan, DBA, R101",
>>>         "Mary, [Stats, Designer, R], t14")
>>>
>>> wh <- grep("\\[.+\\]",txt)  ## which records need to be modified?
>>> fixup <- gsub(" *, *","-",sub(".+(\\[.+\\]).+","\\1",txt[wh])) ##
>>> bracketed expressions, changing "," to "-"
>>>
>>> ## Unfortunately, the "replacement" argument in sub() is not vectorized,
>>> se we need a loop:
>>>
>>> for(i in wh) txt[wh[i]] <- sub("\\[.+\\]",fixup[i],txt[wh[i]]) ##
>>> replace original bracketed text with fixed up bracketed text
>>>
>>> > txt
>>> [1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
>>> [2] "Jan, DBA, R101"
>>> [3] "Mary, [HadoopAnalyst-DBA-Developer], t14"
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Sun, Apr 7, 2019 at 9:00 AM Duncan Murdoch <[hidden email]>
>>> wrote:
>>>
>>>> On 06/04/2019 10:03 a.m., Amit Govil wrote:
>>>> > Hi,
>>>> >
>>>> > I have a bunch of csv files to read in R. I'm unable to read them
>>>> correctly
>>>> > because in some of the files, there is a column ("Role") which has
>>>> comma in
>>>> > the values.
>>>> >
>>>> > Sample data:
>>>> >
>>>> > User, Role, Rule, GAPId
>>>> > Sam, [HadoopAnalyst, DBA, Developer], R46443
>>>> >
>>>> > I'm trying to play with the below code but it doesnt work:
>>>>
>>>> Since you didn't give a reproducible example, you should at least say
>>>> what "doesn't work" means.
>>>>
>>>> But here's some general advice:  if you want to debug code, don't write
>>>> huge expressions like the chain of functions below, put things in
>>>> temporary variables and make sure you get what you were expecting at
>>>> each stage.
>>>>
>>>> Instead of
>>>> >
>>>> > files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>>>> >
>>>> > tbl <- sapply(files, function(f) {
>>>> >    gsub('\\[|\\]', '"', readLines(f)) %>%
>>>> >      read.csv(text = ., check.names = FALSE)
>>>> > }) %>%
>>>> >    bind_rows(.id = "id") %>%
>>>> >    select(id, User, Rule) %>%
>>>> >    distinct()
>>>>
>>>> try
>>>>
>>>>
>>>> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>>>>
>>>> tmp1 <- sapply(files, function(f) {
>>>>    gsub('\\[|\\]', '"', readLines(f)) %>%
>>>>      read.csv(text = ., check.names = FALSE)
>>>> })
>>>>
>>>> tmp2 <- tmp1 %>% bind_rows(.id = "id")
>>>>
>>>> tmp3 <- tmp2 %>% select(id, User, Rule)
>>>>
>>>> tbl <- tmp3 %>% distinct()
>>>>
>>>> (You don't need pipes here, but it will make it easier to put the giant
>>>> expression back together at the end.)
>>>>
>>>> Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
>>>> wrong.
>>>>
>>>> Duncan Murdoch
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.