Help with read.csv.sql()

classic Classic list List threaded Threaded
18 messages Options
H-2
Reply | Threaded
Open this post in threaded view
|

Help with read.csv.sql()

H-2
I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.

The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.

I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.

It seems that the above should be feasible but I am missing something? Does anyone know?

A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.

Is it possible to do this when importing using read.csv.sql()?

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Bert Gunter-2
Is there some reason that you can't make the changes to the data frame
(column names, as.date(), ...) *after* you have read all your data in?

Do all your csv files use the same names and date formats?


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email]> wrote:

> I have created a dataframe with columns that are characters, integers and
> numeric and with column names assigned by me. I am using read.csv.sql() to
> read portions of a number of large csv files into this dataframe, each csv
> file having a header row with columb names.
>
> The problem I am having is that the csv files have header rows with column
> names that are slightly different from the column names I have assigned in
> the dataframe and it seems that when I read the csv data into the
> dataframe, the column names from the csv file replace the column names I
> chose when creating the dataframe.
>
> I have been unable to figure out if it is possible to assign column names
> of my choosing in the read.csv.sql() function? I have tried various
> variations but none seem to work. I tried colClasses = c(....) but that did
> not work, I tried field.types = c(...) but could not get that to work
> either.
>
> It seems that the above should be feasible but I am missing something?
> Does anyone know?
>
> A secondary issue is that the csv files have a column with a date in
> mm/dd/yyyy format that I would like to make into a Date type column in my
> dataframe. Again, I have been unable to find a way - if at all possible -
> to force a conversion into a Date format when importing into the dataframe.
> The best I have so far is to import is a character column and then use
> as.Date() to later force the conversion of the dataframe column.
>
> Is it possible to do this when importing using read.csv.sql()?
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
H-2
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

H-2
On 07/17/2020 09:49 PM, Bert Gunter wrote:

> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in?
>
> Do all your csv files use the same names and date formats?
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email] <mailto:[hidden email]>> wrote:
>
>     I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>
>     The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>
>     I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>
>     It seems that the above should be feasible but I am missing something? Does anyone know?
>
>     A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>
>     Is it possible to do this when importing using read.csv.sql()?
>
>     ______________________________________________
>     [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
>     https://stat.ethz.ch/mailman/listinfo/r-help
>     PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>     and provide commented, minimal, self-contained, reproducible code.
>
Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible.

Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have.


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Rui Barradas
Hello,

I don't believe that what you are asking for is possible but like Bert
suggested, you can do it after reading in the data.
You could write a convenience function to read the data, then change
what you need to change.
Then the function would return this final object.

Rui Barradas

Às 16:43 de 18/07/2020, H escreveu:

> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in?
>>
>> Do all your csv files use the same names and date formats?
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email] <mailto:[hidden email]>> wrote:
>>
>>      I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>>
>>      The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>>
>>      I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>>
>>      It seems that the above should be feasible but I am missing something? Does anyone know?
>>
>>      A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>>
>>      Is it possible to do this when importing using read.csv.sql()?
>>
>>      ______________________________________________
>>      [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>      and provide commented, minimal, self-contained, reproducible code.
>>
> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible.
>
> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have.
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Este e-mail foi verificado em termos de vírus pelo software antivírus Avast.
https://www.avast.com/antivirus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
H-2
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

H-2
On 07/18/2020 11:54 AM, Rui Barradas wrote:

> Hello,
>
> I don't believe that what you are asking for is possible but like Bert suggested, you can do it after reading in the data.
> You could write a convenience function to read the data, then change what you need to change.
> Then the function would return this final object.
>
> Rui Barradas
>
> Às 16:43 de 18/07/2020, H escreveu:
>
>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in?
>>>
>>> Do all your csv files use the same names and date formats?
>>>
>>>
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email] <mailto:[hidden email]>> wrote:
>>>
>>>      I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>>>
>>>      The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>>>
>>>      I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>>>
>>>      It seems that the above should be feasible but I am missing something? Does anyone know?
>>>
>>>      A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>>>
>>>      Is it possible to do this when importing using read.csv.sql()?
>>>
>>>      ______________________________________________
>>>      [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>      and provide commented, minimal, self-contained, reproducible code.
>>>
>> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible.
>>
>> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have.
>>
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
The documentation for read.csv.sql() suggests that colClasses() and/or field.types() should work but I may well have misunderstood the documentation, hence my question in this group.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Rui Barradas
Hello,

The documentation says the following.

field.types
A list whose names are the column names and whose contents are the
SQLite types (not the R class names) of the columns.


So argument field.types is a named list.

  - The list members names are the column names of the table to be read.
  - The list members values are SQLite types, like "CHAR", "VARCHAR",
"INT", etc.


As for colClasses, those are R class names.

Rui Barradas




Às 17:59 de 18/07/2020, H escreveu:

> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>> Hello,
>>
>> I don't believe that what you are asking for is possible but like Bert suggested, you can do it after reading in the data.
>> You could write a convenience function to read the data, then change what you need to change.
>> Then the function would return this final object.
>>
>> Rui Barradas
>>
>> Às 16:43 de 18/07/2020, H escreveu:
>>
>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in?
>>>>
>>>> Do all your csv files use the same names and date formats?
>>>>
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>>
>>>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email] <mailto:[hidden email]>> wrote:
>>>>
>>>>       I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>>>>
>>>>       The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>>>>
>>>>       I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>>>>
>>>>       It seems that the above should be feasible but I am missing something? Does anyone know?
>>>>
>>>>       A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>>>>
>>>>       Is it possible to do this when importing using read.csv.sql()?
>>>>
>>>>       ______________________________________________
>>>>       [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
>>>>       https://stat.ethz.ch/mailman/listinfo/r-help
>>>>       PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>       and provide commented, minimal, self-contained, reproducible code.
>>>>
>>> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible.
>>>
>>> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have.
>>>
>>>
>>>      [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> The documentation for read.csv.sql() suggests that colClasses() and/or field.types() should work but I may well have misunderstood the documentation, hence my question in this group.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Este e-mail foi verificado em termos de vírus pelo software antivírus Avast.
https://www.avast.com/antivirus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

R help mailing list-2
In reply to this post by H-2
Do either of the postings/threads below help?

https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html

Otherwise you can try reading through the FAQ on Github:

https://github.com/ggrothendieck/sqldf

HTH, Bill.

W. Michels, Ph.D.



On Sat, Jul 18, 2020 at 9:59 AM H <[hidden email]> wrote:

>
> On 07/18/2020 11:54 AM, Rui Barradas wrote:
> > Hello,
> >
> > I don't believe that what you are asking for is possible but like Bert suggested, you can do it after reading in the data.
> > You could write a convenience function to read the data, then change what you need to change.
> > Then the function would return this final object.
> >
> > Rui Barradas
> >
> > Às 16:43 de 18/07/2020, H escreveu:
> >
> >> On 07/17/2020 09:49 PM, Bert Gunter wrote:
> >>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in?
> >>>
> >>> Do all your csv files use the same names and date formats?
> >>>
> >>>
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>>
> >>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email] <mailto:[hidden email]>> wrote:
> >>>
> >>>      I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
> >>>
> >>>      The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
> >>>
> >>>      I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
> >>>
> >>>      It seems that the above should be feasible but I am missing something? Does anyone know?
> >>>
> >>>      A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
> >>>
> >>>      Is it possible to do this when importing using read.csv.sql()?
> >>>
> >>>      ______________________________________________
> >>>      [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
> >>>      https://stat.ethz.ch/mailman/listinfo/r-help
> >>>      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>      and provide commented, minimal, self-contained, reproducible code.
> >>>
> >> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible.
> >>
> >> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have.
> >>
> >>
> >>     [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> The documentation for read.csv.sql() suggests that colClasses() and/or field.types() should work but I may well have misunderstood the documentation, hence my question in this group.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Rasmus Liland-3
In reply to this post by Rui Barradas
On 2020-07-18 18:09 +0100, Rui Barradas wrote:
| Às 17:59 de 18/07/2020, H escreveu:
| | On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email]> wrote:
| | |
| | | The problem I am having is that
| | | the csv files have header rows
| | | with column names that are
| | | slightly different from the column
| | | names I have assigned in the
| | | dataframe and it seems that when I
| | | read the csv data into the
| | | dataframe, the column names from
| | | the csv file replace the column
| | | names I chose when creating the
| | | dataframe.
| | |
| | | A secondary issue is that the csv
| | | files have a column with a date in
| | | mm/dd/yyyy format that I would
| | | like to make into a Date type
| | | column in my dataframe. Again, I
| | | have been unable to find a way -
| | | if at all possible - to force a
| | | conversion into a Date format when
| | | importing into the dataframe. The
| | | best I have so far is to import is
| | | a character column and then use
| | | as.Date() to later force the
| | | conversion of the dataframe
| | | column.
| |
| | The documentation for read.csv.sql()
| | suggests that colClasses() and/or
| | field.types() should work but I may
| | well have misunderstood the
| | documentation, hence my question in
| | this group.
|
| As for colClasses, those are R class
| names.

Ok Mister H, I might have hit the nail
on the head this time with this badass
example for your usecase:

        # Make a csv with %d/%m/%Y dates in it ...
        Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158
        STM05-1 2005/02/28 19:44 Good -35.487 177.129
        STM05-1 2005/02/28 23:01 Unknown -35.399 177.064
        STM05-1 2005/03/01 07:28 Unknown -34.978 177.268
        STM05-1 2005/03/01 18:06 Poor -34.799 177.027
        STM05-1 2005/03/01 18:47 Poor -34.85 177.059
        STM05-2 2005/02/28 12:49 Good -35.928 177.328
        STM05-2 2005/02/28 21:23 Poor -35.926 177.314
        "
        DF <- read.table(textConnection(Lines), as.is = TRUE,
          col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))
        DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"), "%d/%m/%Y")
        write.csv(DF, file="df.csv", row.names=FALSE)
       
        colClasses <-
          c("character",
            "Date",
            "character",
            "character",
            "numeric",
            "numeric")
        sql <- paste0(
          "select ",
            "date(",  # [2]
              "substr(Date, 8, 4) || '-' || ",  # [1]
              "substr(Date, 5, 2) || '-' || ",
              "substr(Date, 2, 2)), Long, Lat, Quality ",
          "from ff where Quality like '%oo%' and Long>177.129")
        ff <- file(description="df.csv", open="r")
        dat <- sqldf::read.csv.sql(
          sql=sql, colClasses=colClasses)
        close(ff)
       
        str(dat)
       
        as.Date(dat[,1])
        dat[,3]

Both sqlite and Postgres has a function
substr you can call on strings like
this.[5]  I have a hunch this has always
been possible in sql from way back ...

The warning from sqldf about unused
connections, might suggest file
descriptor handling to be a bit crusty
... [3]

The thing is, defining the second column
as of type Date in colClasses happens to
work, but it's still character when you
check with str(dat) ... perhaps it has
something to do with this info from [4]:

        as_tibble_row() converts a vector to
        a tibble with one row. The input
        must be a bare vector, e.g. vectors
        of dates are not supported yet. If
        the input is a list, all elements
        must have length one.

[1] https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite
[2] https://www.sqlite.org/lang_datefunc.html
[3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8
[4] https://tibble.tidyverse.org/reference/as_tibble.html
[5] https://www.sqlite.org/lang_corefunc.html#substr,
https://www.postgresql.org/docs/9.1/functions-string.html,
http://www.h2database.com/html/functions.html#substring 

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment
H-2
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

H-2
In reply to this post by R help mailing list-2
On 07/18/2020 01:38 PM, William Michels wrote:

> Do either of the postings/threads below help?
>
> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>
> Otherwise you can try reading through the FAQ on Github:
>
> https://github.com/ggrothendieck/sqldf
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
>
>
> On Sat, Jul 18, 2020 at 9:59 AM H <[hidden email]> wrote:
>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>> Hello,
>>>
>>> I don't believe that what you are asking for is possible but like Bert suggested, you can do it after reading in the data.
>>> You could write a convenience function to read the data, then change what you need to change.
>>> Then the function would return this final object.
>>>
>>> Rui Barradas
>>>
>>> Às 16:43 de 18/07/2020, H escreveu:
>>>
>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in?
>>>>>
>>>>> Do all your csv files use the same names and date formats?
>>>>>
>>>>>
>>>>> Bert Gunter
>>>>>
>>>>> "The trouble with having an open mind is that people keep coming along and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>
>>>>>
>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email] <mailto:[hidden email]>> wrote:
>>>>>
>>>>>      I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>>>>>
>>>>>      The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>>>>>
>>>>>      I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>>>>>
>>>>>      It seems that the above should be feasible but I am missing something? Does anyone know?
>>>>>
>>>>>      A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>>>>>
>>>>>      Is it possible to do this when importing using read.csv.sql()?
>>>>>
>>>>>      ______________________________________________
>>>>>      [hidden email] <mailto:[hidden email]> mailing list -- To UNSUBSCRIBE and more, see
>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>      PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>      and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible.
>>>>
>>>> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have.
>>>>
>>>>
>>>>     [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>> The documentation for read.csv.sql() suggests that colClasses() and/or field.types() should work but I may well have misunderstood the documentation, hence my question in this group.
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

I had read the sqldf() documentation but was left with the impression that what I want to do is not easily doable.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
H-2
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

H-2
In reply to this post by Rasmus Liland-3
On 07/18/2020 11:42 PM, Rasmus Liland wrote:

> On 2020-07-18 18:09 +0100, Rui Barradas wrote:
> | �s 17:59 de 18/07/2020, H escreveu:
> | | On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email]> wrote:
> | | |
> | | | The problem I am having is that
> | | | the csv files have header rows
> | | | with column names that are
> | | | slightly different from the column
> | | | names I have assigned in the
> | | | dataframe and it seems that when I
> | | | read the csv data into the
> | | | dataframe, the column names from
> | | | the csv file replace the column
> | | | names I chose when creating the
> | | | dataframe.
> | | |
> | | | A secondary issue is that the csv
> | | | files have a column with a date in
> | | | mm/dd/yyyy format that I would
> | | | like to make into a Date type
> | | | column in my dataframe. Again, I
> | | | have been unable to find a way -
> | | | if at all possible - to force a
> | | | conversion into a Date format when
> | | | importing into the dataframe. The
> | | | best I have so far is to import is
> | | | a character column and then use
> | | | as.Date() to later force the
> | | | conversion of the dataframe
> | | | column.
> | |
> | | The documentation for read.csv.sql()
> | | suggests that colClasses() and/or
> | | field.types() should work but I may
> | | well have misunderstood the
> | | documentation, hence my question in
> | | this group.
> |
> | As for colClasses, those are R class
> | names.
>
> Ok Mister H, I might have hit the nail
> on the head this time with this badass
> example for your usecase:
>
> # Make a csv with %d/%m/%Y dates in it ...
> Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158
> STM05-1 2005/02/28 19:44 Good -35.487 177.129
> STM05-1 2005/02/28 23:01 Unknown -35.399 177.064
> STM05-1 2005/03/01 07:28 Unknown -34.978 177.268
> STM05-1 2005/03/01 18:06 Poor -34.799 177.027
> STM05-1 2005/03/01 18:47 Poor -34.85 177.059
> STM05-2 2005/02/28 12:49 Good -35.928 177.328
> STM05-2 2005/02/28 21:23 Poor -35.926 177.314
> "
> DF <- read.table(textConnection(Lines), as.is = TRUE,
>  col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long"))
> DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"), "%d/%m/%Y")
> write.csv(DF, file="df.csv", row.names=FALSE)
>
> colClasses <-
>  c("character",
>    "Date",
>    "character",
>    "character",
>    "numeric",
>    "numeric")
> sql <- paste0(
>  "select ",
>    "date(",  # [2]
>      "substr(Date, 8, 4) || '-' || ",  # [1]
>      "substr(Date, 5, 2) || '-' || ",
>      "substr(Date, 2, 2)), Long, Lat, Quality ",
>  "from ff where Quality like '%oo%' and Long>177.129")
> ff <- file(description="df.csv", open="r")
> dat <- sqldf::read.csv.sql(
>  sql=sql, colClasses=colClasses)
> close(ff)
>
> str(dat)
>
> as.Date(dat[,1])
> dat[,3]
>
> Both sqlite and Postgres has a function
> substr you can call on strings like
> this.[5]  I have a hunch this has always
> been possible in sql from way back ...
>
> The warning from sqldf about unused
> connections, might suggest file
> descriptor handling to be a bit crusty
> ... [3]
>
> The thing is, defining the second column
> as of type Date in colClasses happens to
> work, but it's still character when you
> check with str(dat) ... perhaps it has
> something to do with this info from [4]:
>
> as_tibble_row() converts a vector to
> a tibble with one row. The input
> must be a bare vector, e.g. vectors
> of dates are not supported yet. If
> the input is a list, all elements
> must have length one.
>
> [1] https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite
> [2] https://www.sqlite.org/lang_datefunc.html
> [3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8
> [4] https://tibble.tidyverse.org/reference/as_tibble.html
> [5] https://www.sqlite.org/lang_corefunc.html#substr,
> https://www.postgresql.org/docs/9.1/functions-string.html,
> http://www.h2database.com/html/functions.html#substring 
>
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Thank you for your extensive example. However, I have decided to simply convert column types as necessary and rename columns as desired after importing the data since that seems the simplest solution.


        [[alternative HTML version deleted]]


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Rasmus Liland-3
On 2020-07-20 17:54 -0400, H wrote:

> On 07/18/2020 11:42 PM, Rasmus Liland wrote:
> >
> > The thing is, defining the second column
> > as of type Date in colClasses happens to
> > work, but it's still character when you
> > check with str(dat) ... perhaps it has
> > something to do with this info from [4]:
> >
> > as_tibble_row() converts a vector to
> > a tibble with one row. The input
> > must be a bare vector, e.g. vectors
> > of dates are not supported yet. If
> > the input is a list, all elements
> > must have length one.
> >
> > [4] https://tibble.tidyverse.org/reference/as_tibble.html
>
> Thank you for your extensive example.
> However, I have decided to simply
> convert column types as necessary and
> rename columns as desired after
> importing the data since that seems
> the simplest solution.
Dear H,

Right, I am glad you figured this out.  
Please just elaborate (if you want
to/are able) what solution/idea you were
after so I, others, learn something for
another time.

I imagined you were sitting on tonnes of
csv files and were going to handle dates
on some very specific, rarely-occurring
rows in there ...

Best,
Rasmus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

R help mailing list-2
In reply to this post by H-2
You might achieve this using readr:

```
library(readr)

lines <- "Id, Date, Time, Quality, Lat, Long
     STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
     STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
     STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
     STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
     STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
     STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
     STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
     STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"

read_csv(lines)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime", "myQuality", "myLat",
"myLong"),
   col_types = cols(
     myDate = col_date(format = ""),
     myTime = col_time(format = ""),
     myLat = col_number(),
     myLong = col_number(),
     .default = col_character()
   )
   )

read_csv(
   lines,
   col_types = cols_only(
     Id = col_character(),
     Date = col_date(format = ""),
     Time = col_time(format = "")
   )
)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime", "myQuality", "myLat",
"myLong"),
   col_types = cols_only(
     myId = col_character(),
     myDate = col_date(format = ""),
     myTime = col_time(format = "")
   )
)
```

HTH
Ulrik

On 2020-07-20 02:07, H wrote:

> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>>
>> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>>
>> Otherwise you can try reading through the FAQ on Github:
>>
>> https://github.com/ggrothendieck/sqldf
>>
>> HTH, Bill.
>>
>> W. Michels, Ph.D.
>>
>>
>>
>> On Sat, Jul 18, 2020 at 9:59 AM H <[hidden email]> wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> I don't believe that what you are asking for is possible but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>>
>>>> Rui Barradas
>>>>
>>>> Às 16:43 de 18/07/2020, H escreveu:
>>>>
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have read all
>>>>>> your data in?
>>>>>>
>>>>>> Do all your csv files use the same names and date formats?
>>>>>>
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email]
>>>>>> <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>>      I have created a dataframe with columns that are characters,
>>>>>> integers and numeric and with column names assigned by me. I am
>>>>>> using read.csv.sql() to read portions of a number of large csv
>>>>>> files into this dataframe, each csv file having a header row with
>>>>>> columb names.
>>>>>>
>>>>>>      The problem I am having is that the csv files have header
>>>>>> rows with column names that are slightly different from the column
>>>>>> names I have assigned in the dataframe and it seems that when I
>>>>>> read the csv data into the dataframe, the column names from the
>>>>>> csv file replace the column names I chose when creating the
>>>>>> dataframe.
>>>>>>
>>>>>>      I have been unable to figure out if it is possible to assign
>>>>>> column names of my choosing in the read.csv.sql() function? I have
>>>>>> tried various variations but none seem to work. I tried colClasses
>>>>>> = c(....) but that did not work, I tried field.types = c(...) but
>>>>>> could not get that to work either.
>>>>>>
>>>>>>      It seems that the above should be feasible but I am missing
>>>>>> something? Does anyone know?
>>>>>>
>>>>>>      A secondary issue is that the csv files have a column with a
>>>>>> date in mm/dd/yyyy format that I would like to make into a Date
>>>>>> type column in my dataframe. Again, I have been unable to find a
>>>>>> way - if at all possible - to force a conversion into a Date
>>>>>> format when importing into the dataframe. The best I have so far
>>>>>> is to import is a character column and then use as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>>
>>>>>>      Is it possible to do this when importing using
>>>>>> read.csv.sql()?
>>>>>>
>>>>>>      ______________________________________________
>>>>>>      [hidden email] <mailto:[hidden email]> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>      PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>      and provide commented, minimal, self-contained, reproducible
>>>>>> code.
>>>>>>
>>>>> Yes, the files use the same column names and date format (at least
>>>>> as far as I know now.) I agree I could do it as you suggest above
>>>>> but from a purist perspective I would rather do it when importing
>>>>> the data using read.csv.sql(), particularly if column names and/or
>>>>> date format might change, or be different between different files.
>>>>> I am indeed selecting rows from a large number of csv files so this
>>>>> is entirely plausible.
>>>>>
>>>>> Has anyone been able to name columns in the read.csv.sql() call
>>>>> and/or force date format conversion in the call itself? The first
>>>>> refers to naming columns differently from what a header in the csv
>>>>> file may have.
>>>>>
>>>>>
>>>>>     [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses()
>>> and/or field.types() should work but I may well have misunderstood
>>> the documentation, hence my question in this group.
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

R help mailing list-2
In reply to this post by H-2
You might achieve this using readr:

```
library(readr)

lines <- "Id, Date, Time, Quality, Lat, Long
     STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
     STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
     STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
     STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
     STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
     STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
     STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
     STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"

read_csv(lines)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime", "myQuality", "myLat",
"myLong"),
   col_types = cols(
     myDate = col_date(format = ""),
     myTime = col_time(format = ""),
     myLat = col_number(),
     myLong = col_number(),
     .default = col_character()
   )
   )

read_csv(
   lines,
   col_types = cols_only(
     Id = col_character(),
     Date = col_date(format = ""),
     Time = col_time(format = "")
   )
)

read_csv(
   lines,
   skip = 1, # Ignore the header row
   col_names = c("myId", "myDate", "myTime", "myQuality", "myLat",
"myLong"),
   col_types = cols_only(
     myId = col_character(),
     myDate = col_date(format = ""),
     myTime = col_time(format = "")
   )
)
```

HTH
Ulrik

On 2020-07-20 02:07, H wrote:

> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>>
>> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>>
>> Otherwise you can try reading through the FAQ on Github:
>>
>> https://github.com/ggrothendieck/sqldf
>>
>> HTH, Bill.
>>
>> W. Michels, Ph.D.
>>
>>
>>
>> On Sat, Jul 18, 2020 at 9:59 AM H <[hidden email]> wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> I don't believe that what you are asking for is possible but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>>
>>>> Rui Barradas
>>>>
>>>> Às 16:43 de 18/07/2020, H escreveu:
>>>>
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have read all
>>>>>> your data in?
>>>>>>
>>>>>> Do all your csv files use the same names and date formats?
>>>>>>
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <[hidden email]
>>>>>> <mailto:[hidden email]>> wrote:
>>>>>>
>>>>>>      I have created a dataframe with columns that are characters,
>>>>>> integers and numeric and with column names assigned by me. I am
>>>>>> using read.csv.sql() to read portions of a number of large csv
>>>>>> files into this dataframe, each csv file having a header row with
>>>>>> columb names.
>>>>>>
>>>>>>      The problem I am having is that the csv files have header
>>>>>> rows with column names that are slightly different from the column
>>>>>> names I have assigned in the dataframe and it seems that when I
>>>>>> read the csv data into the dataframe, the column names from the
>>>>>> csv file replace the column names I chose when creating the
>>>>>> dataframe.
>>>>>>
>>>>>>      I have been unable to figure out if it is possible to assign
>>>>>> column names of my choosing in the read.csv.sql() function? I have
>>>>>> tried various variations but none seem to work. I tried colClasses
>>>>>> = c(....) but that did not work, I tried field.types = c(...) but
>>>>>> could not get that to work either.
>>>>>>
>>>>>>      It seems that the above should be feasible but I am missing
>>>>>> something? Does anyone know?
>>>>>>
>>>>>>      A secondary issue is that the csv files have a column with a
>>>>>> date in mm/dd/yyyy format that I would like to make into a Date
>>>>>> type column in my dataframe. Again, I have been unable to find a
>>>>>> way - if at all possible - to force a conversion into a Date
>>>>>> format when importing into the dataframe. The best I have so far
>>>>>> is to import is a character column and then use as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>>
>>>>>>      Is it possible to do this when importing using
>>>>>> read.csv.sql()?
>>>>>>
>>>>>>      ______________________________________________
>>>>>>      [hidden email] <mailto:[hidden email]> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>>      https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>      PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>      and provide commented, minimal, self-contained, reproducible
>>>>>> code.
>>>>>>
>>>>> Yes, the files use the same column names and date format (at least
>>>>> as far as I know now.) I agree I could do it as you suggest above
>>>>> but from a purist perspective I would rather do it when importing
>>>>> the data using read.csv.sql(), particularly if column names and/or
>>>>> date format might change, or be different between different files.
>>>>> I am indeed selecting rows from a large number of csv files so this
>>>>> is entirely plausible.
>>>>>
>>>>> Has anyone been able to name columns in the read.csv.sql() call
>>>>> and/or force date format conversion in the call itself? The first
>>>>> refers to naming columns differently from what a header in the csv
>>>>> file may have.
>>>>>
>>>>>
>>>>>     [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses()
>>> and/or field.types() should work but I may well have misunderstood
>>> the documentation, hence my question in this group.
>>>
>>> ______________________________________________
>>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Rasmus Liland-3
Dear Ulrik,

On 2020-07-29 17:14 +0200, Ulrik Stervbo via R-help wrote:
> library(readr)
> read_csv(

This thread was about
sqldf::read.csv.sql ...

What is the purpose of bringing up
readr::read_csv?  I am unfamilliar with
it, so it might be a good one.

Best,
Rasmus

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

R help mailing list-2
True, but the question was also how to control for formats and naming columns while loading the file.

The only way I know how to do this (sans work on my part) is through the functions in readr. So, 50% on topic :-)

Best,
Ulrik


On 29 Jul 2020, 17:59, at 17:59, Rasmus Liland <[hidden email]> wrote:

>Dear Ulrik,
>
>On 2020-07-29 17:14 +0200, Ulrik Stervbo via R-help wrote:
>> library(readr)
>> read_csv(
>
>This thread was about
>sqldf::read.csv.sql ...
>
>What is the purpose of bringing up
>readr::read_csv?  I am unfamilliar with
>it, so it might be a good one.
>
>Best,
>Rasmus
>
>
>------------------------------------------------------------------------
>
>______________________________________________
>[hidden email] mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Rasmus Liland-3
On 2020-07-29 22:56 +0200, Ulrik Stervbo wrote:
> So, 50% on topic :-)

I guess so haha :-)  Still I hope this
is useful for H <[hidden email]>,
others, or he solved it but still no
bottom line volatile situation open for
new ideas.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

Gabor Grothendieck
In reply to this post by H-2
Probably simplest to assign the names afterwards as others have
suggested but it could be done like this:

  library(sqldf)
  write.csv(BOD, "BOD.csv", quote = FALSE, row.names = FALSE)  # test data

  read.csv.sql("BOD.csv", "select Time as Time2, demand as demand2 from file")

giving the column names Time2 and demand2 rather than the original column names.

    Time2 demand2
  1     1     8.3
  2     2    10.3
  3     3    19.0
  4     4    16.0
  5     5    15.6
  6     7    19.8

On Fri, Jul 17, 2020 at 9:28 PM H <[hidden email]> wrote:

>
> I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>
> The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>
> I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>
> It seems that the above should be feasible but I am missing something? Does anyone know?
>
> A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>
> Is it possible to do this when importing using read.csv.sql()?
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
H-2
Reply | Threaded
Open this post in threaded view
|

Re: Help with read.csv.sql()

H-2
On 07/30/2020 06:09 PM, Gabor Grothendieck wrote:

> Probably simplest to assign the names afterwards as others have
> suggested but it could be done like this:
>
>   library(sqldf)
>   write.csv(BOD, "BOD.csv", quote = FALSE, row.names = FALSE)  # test data
>
>   read.csv.sql("BOD.csv", "select Time as Time2, demand as demand2 from file")
>
> giving the column names Time2 and demand2 rather than the original column names.
>
>     Time2 demand2
>   1     1     8.3
>   2     2    10.3
>   3     3    19.0
>   4     4    16.0
>   5     5    15.6
>   6     7    19.8
>
> On Fri, Jul 17, 2020 at 9:28 PM H <[hidden email]> wrote:
>> I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names.
>>
>> The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe.
>>
>> I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either.
>>
>> It seems that the above should be feasible but I am missing something? Does anyone know?
>>
>> A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column.
>>
>> Is it possible to do this when importing using read.csv.sql()?
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
Apologies, I had tuned out from this discussion since I solved the problem by renaming the columns after reading the file. Your suggestion to do it in the SQL statement itself, however, seems to be neatest one though!

Thank you.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.