iterators : checkFunc with ireadLines

Laurent Rhelp
Dear R-Help List,

    I would like to use an iterator to read a file, keeping only some
lines selected by their name (the first field of the line), and then
process them in a foreach loop. I wanted to use the checkFunc argument,
as in the following example found on the internet, which selects only
prime numbers:

    iprime <- iter(1:100, checkFunc = function(n) isprime(n))

(https://datawookie.netlify.app/blog/2013/11/iterators-in-r/)

However, the checkFunc argument does not seem to be available for the
function ireadLines (package iterators). So I wrote the code below to
solve my problem, but I am sure I am missing something about using
iterators with files. Since I found nothing on the web about ireadLines
and the checkFunc argument, could somebody help me understand how to
use an iterator (and a foreach loop) on a file, keeping only selected
lines?

Thank you very much
Laurent

Here is my current code:

##        mock file to read: test.txt
##
# Time    0    0.000999    0.001999    0.002998    0.003998    0.004997    0.005997    0.006996    0.007996
# N023    -0.031323    -0.035026    -0.029759    -0.024886    -0.024464    -0.026816    -0.03369    -0.041067    -0.038747
# N053    -0.014083    -0.004741    0.001443    -0.010152    -0.012996    -0.005337    -0.008738    -0.015094    -0.012104
# N123    -0.019008    -0.013494    -0.01318    -0.029208    -0.032748    -0.020243    -0.015089    -0.014439    -0.011681
# N163    -0.054023    -0.049345    -0.037158    -0.04112    -0.044612    -0.036953    -0.036061    -0.044516    -0.046436
# N193    -0.022171    -0.022384    -0.022338    -0.023304    -0.022569    -0.021827    -0.021996    -0.021755    -0.021846


# sensors to keep
sensors <- c("N053", "N163")

library(iterators)
library(rlist)

## the function get_Lines_iter reads lines until it finds a selected
## sensor, then processes that line (it must be defined before use)
get_Lines_iter <- function(iter, sensors, sep = "\t", quiet = FALSE) {
  repeat {
    ## read the next record in the iterator
    r <- try(nextElem(iter), silent = TRUE)
    if (inherits(r, "try-error")) {
      stop("The iterator is empty")
    }
    ## split the read line according to the separator
    r_txt <- textConnection(r)
    fields <- scan(file = r_txt, what = character(), sep = sep, quiet = quiet)
    close(r_txt)
    ## test if we have to keep the line
    if (fields[1] %in% sensors) {
      ## data processing for the selected line (for the example,
      ## conversion to a data frame)
      x <- data.frame(as.numeric(fields[-1]))
      names(x) <- fields[1]
      print(paste0("Sensor ", fields[1], " ok"))
      return(x)
    }
    print(paste0("Sensor ", fields[1], " not selected"))
  }
}

file_name <- "test.txt"
con_obj <- file(file_name, "r")
ifile <- ireadLines(con_obj, n = 1)

## I do not do a loop for the example
res <- list()

r <- get_Lines_iter(ifile, sensors)
res <- list.append(res, r)
res
r <- get_Lines_iter(ifile, sensors)
res <- list.append(res, r)
res
## a third call finds no more selected sensor and stops with
## "The iterator is empty"
r <- try(get_Lines_iter(ifile, sensors))
do.call("cbind", res)
close(con_obj)
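ireadLines() has no checkFunc argument, but the iterators package lets you build your own iterator: a list holding a nextElem function, with class c(..., "abstractiter", "iter"). A minimal sketch, assuming tab-separated lines whose first field is the sensor name; ifilterLines() is a hypothetical helper name, not part of the package, and the temporary file holds only a short excerpt of the mock data:

```r
library(iterators)

# Hypothetical helper (not in the iterators package): wraps ireadLines()
# and only returns lines whose first field is in `keep`, which is roughly
# what checkFunc does for iter() on a vector.
ifilterLines <- function(con, keep, sep = "\t") {
  it <- ireadLines(con, n = 1)
  nextEl <- function() {
    repeat {
      # nextElem() signals "StopIteration" once the file is exhausted;
      # that condition simply propagates to the caller.
      line <- nextElem(it)
      fields <- strsplit(line, sep, fixed = TRUE)[[1]]
      if (fields[1] %in% keep) {
        out <- data.frame(as.numeric(fields[-1]))
        names(out) <- fields[1]
        return(out)
      }
    }
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("ifilterLines", "abstractiter", "iter")
  obj
}

# Usage on an excerpt of the mock data (first three values only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999\t0.001999",
             "N023\t-0.031323\t-0.035026\t-0.029759",
             "N053\t-0.014083\t-0.004741\t0.001443",
             "N163\t-0.054023\t-0.049345\t-0.037158"), tmp)

con <- file(tmp, "r")
it <- ifilterLines(con, keep = c("N053", "N163"))
res <- list()
repeat {
  # stop cleanly on "StopIteration", re-raise any other error
  x <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(x)) break
  res[[length(res) + 1]] <- x
}
close(con)
res <- do.call(cbind, res)
res
```

Because the object also inherits from "iter", it should be usable directly as a foreach() argument as well, e.g. foreach(x = ifilterLines(con, sensors), .combine = cbind) %do% x.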







--
L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
https://www.avast.com/antivirus


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: iterators : checkFunc with ireadLines

R help mailing list-2

Dear Laurent,

I'm going through your code quickly, and my first question is whether
you loaded the "gmp" package, which provides isprime():

> library(gmp)

Attaching package: ‘gmp’

The following objects are masked from ‘package:base’:

    %*%, apply, crossprod, matrix, tcrossprod

> library(iterators)
> iter(1:100, checkFunc = function(n) isprime(n))
$state
<environment: 0x7fbead8837f0>

$length
[1] 100

$checkFunc
function (n)
isprime(n)

$recycle
[1] FALSE

attr(,"class")
[1] "containeriter" "iter"
>
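To see what checkFunc does, it may help to pull values out of such an iterator with nextElem(). A minimal sketch; it uses a small base-R primality test (is_prime(), a hypothetical helper) instead of gmp::isprime() so that it runs without gmp:

```r
library(iterators)

# Hypothetical base-R replacement for gmp::isprime(), fine for small n
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  all(n %% 2:floor(sqrt(n)) != 0)
}

# checkFunc is applied to each candidate; nextElem() skips the ones
# for which it returns FALSE
iprime <- iter(1:20, checkFunc = is_prime)

primes <- integer(0)
repeat {
  p <- tryCatch(nextElem(iprime), error = function(e) {
    # "StopIteration" means no candidates are left
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(p)) break
  primes <- c(primes, p)
}
primes
# [1]  2  3  5  7 11 13 17 19
```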

HTH, Bill.

W. Michels, Ph.D.



On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <[hidden email]> wrote:

Re: iterators : checkFunc with ireadLines

R help mailing list-2
In reply to this post by Laurent Rhelp

Apologies, Laurent, for this two-part answer. I misunderstood your
post where you stated you wanted to "filter(ing) some
selected lines according to the line name... ." I thought that meant
you had a separate index (like a series of primes) that you wanted to
use to read in only selected line numbers from a file (test file below
with the numbers 1:1000, each on a separate line):

> library(gmp)
> library(iterators)
> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 2
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 3
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 5
> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
Read 1 item
[1] 7
>

However, what you really seem to want is to read each line of a
(possibly enormous) file, test each line "string-wise" to keep or
discard it, and, if you keep it, append it to a list. I can certainly
see the advantage of this strategy for reading very, very large files,
but it's not clear to me how the "ireadLines" function (in the
"iterators" package) will help you, since it doesn't seem to generate
anything but a sequential index.
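For reference, a quick check suggests that ireadLines() yields the text of each line (a character vector of up to n lines) rather than an index. A minimal sketch with a made-up two-line temporary file:

```r
library(iterators)

tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999", "N053\t-0.014083\t-0.004741"), tmp)

con <- file(tmp, "r")
it <- ireadLines(con, n = 1)
first <- nextElem(it)   # the full text of line 1
second <- nextElem(it)  # the full text of line 2
close(con)

# the sensor name is simply the first tab-separated field
strsplit(second, "\t", fixed = TRUE)[[1]][1]
# [1] "N053"
```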

Anyway, below is an absolutely standard read-in of your data using
read.table(). Hopefully some of the code I've posted has been useful
to you.

> sensors <-  c("N053", "N163")
> read.table("test2.txt")
    V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
> Laurent_data <- read.table("test2.txt")
> Laurent_data[Laurent_data$V1 %in% sensors, ]
    V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
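When the file does not fit in memory, the same row selection can be done without any iterator by reading the file in fixed-size chunks through one open connection and keeping only the wanted rows of each chunk. A sketch under assumptions: tab-separated fields with the sensor name first; read_selected() and the chunk size are hypothetical choices, not an established API:

```r
# Hypothetical helper: stream `path` in chunks of `chunk` lines and keep
# only the rows whose first field is in `keep`.
read_selected <- function(path, keep, chunk = 10000L, sep = "\t") {
  con <- file(path, "r")
  on.exit(close(con))
  kept <- character(0)
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break                     # end of file
    first <- vapply(strsplit(lines, sep, fixed = TRUE), `[`, "", 1L)
    kept <- c(kept, lines[first %in% keep])            # only wanted rows stay in memory
  }
  if (length(kept) == 0L) return(NULL)
  read.table(text = kept, sep = sep, stringsAsFactors = FALSE)
}

tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N023\t-0.031323\t-0.035026",
             "N053\t-0.014083\t-0.004741",
             "N163\t-0.054023\t-0.049345"), tmp)
sel <- read_selected(tmp, c("N053", "N163"), chunk = 2L)
sel
```

Memory use is bounded by the chunk size plus the selected rows, so the chunk size can be tuned to whatever fits.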

Best, Bill.

W. Michels, Ph.D.


On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <[hidden email]> wrote:


Re: iterators : checkFunc with ireadLines

Laurent Rhelp
In reply to this post by Laurent Rhelp

Dear William,
  Thank you for your answer.
My file is very large, so I cannot read it into memory (I cannot use
read.table). I want to load into memory only the lines I need to
process. With readLines, as I did, it works, but I would like to use an
iterator and a foreach loop to understand this way of working, because
I thought it would make for nicer code.
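One way to combine ireadLines() with a foreach loop while keeping only the selected lines is the when() filter of the foreach package, used with the %:% operator; it plays a role similar to checkFunc. A minimal sketch, assuming the foreach package is installed and the lines are tab-separated with the sensor name first (the temporary file stands in for the real test.txt):

```r
library(iterators)
library(foreach)

sensors <- c("N053", "N163")

tmp <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N023\t-0.031323\t-0.035026",
             "N053\t-0.014083\t-0.004741",
             "N163\t-0.054023\t-0.049345"), tmp)

con <- file(tmp, "r")
res <- foreach(line = ireadLines(con, n = 1), .combine = cbind) %:%
  # when() skips every line whose first field is not a wanted sensor
  when(sub("\t.*$", "", line) %in% sensors) %do% {
    fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
    x <- data.frame(as.numeric(fields[-1]))
    names(x) <- fields[1]
    x
  }
close(con)
res
```

Only one line at a time is read from the connection; the selected ones are combined column by column with cbind.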


On 18/05/2020 at 04:54, William Michels wrote:


Re: iterators : checkFunc with ireadLines

R help mailing list-2
In reply to this post by Laurent Rhelp

Hi Laurent,

Thank you for explaining your size limitations. Below is an example
using the read.fwf() function to grab only the first column of your
input file (here, the first 2000 rows). This column is converted to an
index, and the index is used to create an iterator for skipping lines
when reading input with scan(). (You could process your large file in
successive 2000-line chunks, or whatever number of lines fits into
memory.) Maybe not as elegant as the approach you were going for, but
read.fwf() should be pretty efficient:

> sensors <-  c("N053", "N163")
> read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
    V1
1 Time
2 N023
3 N053
4 N123
5 N163
6 N193
> first_col <- read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
> which(first_col$V1 %in% sensors)
[1] 3 5
> index1 <- which(first_col$V1 %in% sensors)
> iter_index1 <- iter(1:2000, checkFunc= function(n) {n %in% index1})
> unlist(scan(file="test2.txt", what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
 [1] "N053"      "-0.014083" "-0.004741" "0.001443"  "-0.010152"
"-0.012996" "-0.005337" "-0.008738" "-0.015094" "-0.012104"
> unlist(scan(file="test2.txt", what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
 [1] "N163"      "-0.054023" "-0.049345" "-0.037158" "-0.04112"
"-0.044612" "-0.036953" "-0.036061" "-0.044516" "-0.046436"
>

(Note for this email and the previous one, I've deleted the first
"hash" character from each line of your test file for clarity).

HTH, Bill.

W. Michels, Ph.D.
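A single-pass alternative, sketched under stated assumptions: each scan(file = ..., skip = k) call in the transcript above opens the file again and reads past the first k lines, so fetching many selected lines this way rereads the start of the file repeatedly. The helper below (ifilterLines is a hypothetical name, not part of the iterators package) wraps one open connection, reads a single line at a time with readLines(con, n = 1), and returns only lines whose first field is in keep, which is roughly what checkFunc does for iter(). It assumes whitespace-separated fields (the original code used sep = '\t') and signals the end of data with stop("StopIteration"), the convention used by the iterators package.

```r
## Hypothetical helper (base R only): a filtering line iterator.
## Reads one line at a time from an open connection and returns only
## lines whose first whitespace-separated field is in `keep`.
ifilterLines <- function(con, keep) {
  nextEl <- function() {
    repeat {
      line <- readLines(con, n = 1)
      ## readLines() returns character(0) at end of file
      if (length(line) == 0) stop("StopIteration", call. = FALSE)
      fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
      if (length(fields) > 0 && fields[1] %in% keep) {
        return(list(name = fields[1], values = as.numeric(fields[-1])))
      }
      ## otherwise the line is not selected: read the next one
    }
  }
  obj <- list(nextElem = nextEl)
  ## Class convention used by the iterators package for custom iterators
  class(obj) <- c("ifilterLines", "abstractiter", "iter")
  obj
}

## Usage on a small mock file (first column = sensor name)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Time 0 0.000999 0.001999",
  "N023 -0.031323 -0.035026 -0.029759",
  "N053 -0.014083 -0.004741 0.001443",
  "N123 -0.019008 -0.013494 -0.01318",
  "N163 -0.054023 -0.049345 -0.037158"
), tmp)

con <- file(tmp, "r")
it <- ifilterLines(con, keep = c("N053", "N163"))
results <- list()
repeat {
  r <- tryCatch(it$nextElem(), error = function(e) {
    ## treat "StopIteration" as end of data; re-raise anything else
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(r)) break
  results[[r$name]] <- r$values
}
close(con)
str(results)
```

With the iterators and foreach packages loaded, nextElem(it) or foreach(r = it) %do% r$values would presumably replace the manual repeat loop, because iterators' nextElem() method for abstractiter objects calls the object's own nextElem function; treat that combination as an untested assumption. Only one line is held in memory at a time, and the file is read once from top to bottom.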





On Mon, May 18, 2020 at 3:35 AM Laurent Rhelp <[hidden email]> wrote:

>
> Dear William,
> Thank you for your answer.
> My file is very large, so I cannot read it into memory (I cannot use
> read.table). I want to hold in memory only the line I need to process.
> It works with readLines, as I did, but I would like to use an iterator
> and a foreach loop to learn this approach, because I thought it
> would lead to cleaner code.
>
>
> On 18/05/2020 at 04:54, William Michels wrote:

Re: iterators : checkFunc with ireadLines

Laurent Rhelp
In reply to this post by Laurent Rhelp


GREAT! This is exactly what I had in mind.
I like the nextElem call in the skip argument.
Thank you very much, William.
Best regards,
Laurent


On 18/05/2020 at 20:37, William Michels wrote:


Re: iterators : checkFunc with ireadLines

Jeff Newmiller
In reply to this post by R help mailing list-2

Laurent... Bill is suggesting building your own indexed database... but this has been done before, so re-inventing the wheel seems inefficient and risky. It is actually impossible to build such an index without reading through the entire file at least once anyway, so you are better off looking at ways to process the entire file efficiently.

For example, you could load the data into an SQLite database with a couple of lines of code and then query it with SQL directly, through the sqldf data frame interface, or with dplyr.

Or you could look at read_csv_chunked() from the readr package.
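The chunked idea can be illustrated without extra packages. The sketch below is a base-R stand-in, not readr's read_csv_chunked() API: filter_chunked is a hypothetical name, and it assumes whitespace-separated columns with the sensor name first and the same number of columns in every kept row. chunk_size plays the role of Bill's 2000-line chunks.

```r
## Hypothetical helper (base R only): filter a large whitespace-delimited
## file chunk by chunk, keeping rows whose first field is in `keep`.
## At most `chunk_size` raw lines are held in memory at once.
filter_chunked <- function(path, keep, chunk_size = 2000L) {
  con <- file(path, "r")
  on.exit(close(con))
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break                 # end of file
    ## first whitespace-separated token of each line
    ## (blank lines are left unchanged and never match `keep`)
    first <- sub("^[[:space:]]*([^[:space:]]+).*$", "\\1", lines)
    sel <- lines[first %in% keep]
    if (length(sel) > 0) {
      ## parse only the selected lines; assumes a constant column count
      kept[[length(kept) + 1]] <- read.table(text = sel, stringsAsFactors = FALSE)
    }
  }
  if (length(kept) == 0) return(NULL)
  do.call(rbind, kept)
}

## Usage on a small mock file, with a tiny chunk size to force several chunks
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Time 0 0.000999 0.001999",
  "N023 -0.031323 -0.035026 -0.029759",
  "N053 -0.014083 -0.004741 0.001443",
  "N123 -0.019008 -0.013494 -0.01318",
  "N163 -0.054023 -0.049345 -0.037158",
  "N193 -0.022171 -0.022384 -0.022338"
), tmp)
res <- filter_chunked(tmp, keep = c("N053", "N163"), chunk_size = 2L)
print(res)
```

Because each chunk is filtered on its first token before any numeric parsing, memory use grows with chunk_size plus the number of kept rows rather than with the size of the whole file.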

On May 18, 2020 11:37:46 AM PDT, William Michels via R-help <[hidden email]> wrote:


--
Sent from my phone. Please excuse my brevity.


Re: iterators : checkFunc with ireadLines

Laurent Rhelp
In reply to this post by R help mailing list-2

OK, thank you for the advice. I will take some time to look at these
packages in detail.


On 19/05/2020 at 05:44, Jeff Newmiller wrote:

> Laurent... Bill is suggesting building your own indexed database... but this has been done before, so re-inventing the wheel seems inefficient and risky. It is actually impossible to create such a beast without reading the entire file into memory at least temporarily anyway, so you are better off looking at ways to process the entire file efficiently.
>
> For example, you could load the data into a sqlite database in a couple of lines of code and use SQL directly or use the sqldf data frame interface, or use dplyr to query the database.
>
> Or you could look at read_csv_chunked from readr package.
>
> On May 18, 2020 11:37:46 AM PDT, William Michels via R-help <[hidden email]> wrote:
>> Hi Laurent,
>>
>> Thank you for explaining your size limitations. Below is an example
>> using the read.fwf() function to grab the first column of your input
>> file (in 2000 row chunks). This column is converted to an index, and
>> the index is used to create an iterator useful for skipping lines when
>> reading input with scan(). (You could try processing your large file
>> in successive 2000 line chunks, or whatever number of lines fits into
>> memory). Maybe not as elegant as the approach you were going for, but
>> read.fwf() should be pretty efficient:
>>
>>> sensors <-  c("N053", "N163")
>>> read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
>>     V1
>> 1 Time
>> 2 N023
>> 3 N053
>> 4 N123
>> 5 N163
>> 6 N193
>>> first_col <- read.fwf("test2.txt", widths=c(4), as.is=TRUE, flush=TRUE, n=2000, skip=0)
>>> which(first_col$V1 %in% sensors)
>> [1] 3 5
>>> index1 <- which(first_col$V1 %in% sensors)
>>> iter_index1 <- iter(1:2000, checkFunc= function(n) {n %in% index1})
>>> unlist(scan(file="test2.txt", what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
>> [1] "N053"      "-0.014083" "-0.004741" "0.001443"  "-0.010152" "-0.012996" "-0.005337" "-0.008738" "-0.015094" "-0.012104"
>>> unlist(scan(file="test2.txt", what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, skip=nextElem(iter_index1)-1, nlines=1, quiet=TRUE))
>> [1] "N163"      "-0.054023" "-0.049345" "-0.037158" "-0.04112"  "-0.044612" "-0.036953" "-0.036061" "-0.044516" "-0.046436"
>> (Note for this email and the previous one, I've deleted the first
>> "hash" character from each line of your test file for clarity).
>>
>> HTH, Bill.
>>
>> W. Michels, Ph.D.
>>
>>
>>
>>
>>
>> On Mon, May 18, 2020 at 3:35 AM Laurent Rhelp <[hidden email]> wrote:
>>> Dear William,
>>>    Thank you for your answer.
>>> My file is very large, so I cannot read it into memory (I cannot use
>>> read.table). I want to load into memory only the lines I need to process.
>>> With readLines, as I did, it works, but I would like to use an iterator
>>> and a foreach loop to understand this approach, because I thought it
>>> was a better way to write nice code.
>>>
>>>
>>> On 18/05/2020 at 04:54, William Michels wrote:
>>>> Apologies, Laurent, for this two-part answer. I misunderstood your
>>>> post where you stated you wanted to "filter(ing) some
>>>> selected lines according to the line name... ." I thought that meant
>>>> you had a separate index (like a series of primes) that you wanted to
>>>> use to only read-in selected line numbers from a file (test file below
>>>> with numbers 1:1000 each on a separate line):
>>>>
>>>>> library(gmp)
>>>>> library(iterators)
>>>>> iprime <- iter(1:100, checkFunc = function(n) isprime(n))
>>>>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>>> Read 1 item
>>>> [1] 2
>>>>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>>> Read 1 item
>>>> [1] 3
>>>>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>>> Read 1 item
>>>> [1] 5
>>>>> scan(file="one_thou_lines.txt", skip=nextElem(iprime)-1, nlines=1)
>>>> Read 1 item
>>>> [1] 7
>>>> However, what it really seems that you want to do is read each line of
>>>> a (possibly enormous) file, test each line "string-wise" to keep or
>>>> discard, and if you're keeping it, append the line to a list. I can
>>>> certainly see the advantage of this strategy for reading in very, very
>>>> large files, but it's not clear to me how the "ireadLines" function
>>>> (in the "iterators" package) will help you, since it doesn't seem to
>>>> generate anything but a sequential index.
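For what it is worth, iterators::ireadLines() returns successive lines of the connection (n at a time), and the keep-or-discard loop described above can be packaged as a small custom iterator. The sketch below is a hypothetical helper, ifilter_lines(), written in base R only. It follows the conventions described in the iterators package's vignette on writing custom iterators (signal an error whose message is "StopIteration" when exhausted, and carry the class c("abstractiter", "iter")); that this makes it consumable by nextElem() and foreach() is an assumption not exercised here.

```r
# Hypothetical helper: an iterator over an *open* connection that returns
# only the lines whose first whitespace-delimited field is in `keep`.
ifilter_lines <- function(con, keep) {
  next_elem <- function() {
    repeat {
      line <- readLines(con, n = 1L)
      # readLines() returns character(0) once the connection is exhausted
      if (length(line) == 0L) stop("StopIteration", call. = FALSE)
      if (sub("[[:space:]].*$", "", line) %in% keep) return(line)
      # otherwise the line is not selected: read the next one
    }
  }
  structure(list(nextElem = next_elem), class = c("abstractiter", "iter"))
}

# Usage in plain base R: pull elements until "StopIteration" is signalled
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N023\t-0.031323\t-0.035026",
             "N053\t-0.014083\t-0.004741",
             "N163\t-0.054023\t-0.049345"), demo_file)
con <- file(demo_file, open = "r")
it <- ifilter_lines(con, c("N053", "N163"))
kept <- list()
repeat {
  line <- tryCatch(it$nextElem(), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(line)) break
  fields <- strsplit(line, "[[:space:]]+")[[1L]]
  kept[[fields[1L]]] <- as.numeric(fields[-1L])   # sensor ID -> values
}
close(con)
str(kept)
```

With the iterators and foreach packages attached, the same object should also work as nextElem(it) or in foreach(line = it) %do% { ... }, relying on their handling of the "abstractiter" class; only one line is held in memory at a time.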
>>>>
>>>> Anyway, below is an absolutely standard read-in of your data using
>>>> read.table(). Hopefully some of the code I've posted has been useful
>>>> to you.
>>>>
>>>>> sensors <-  c("N053", "N163")
>>>>> read.table("test2.txt")
>>>>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
>>>> 1 Time  0.000000  0.000999  0.001999  0.002998  0.003998  0.004997  0.005997  0.006996  0.007996
>>>> 2 N023 -0.031323 -0.035026 -0.029759 -0.024886 -0.024464 -0.026816 -0.033690 -0.041067 -0.038747
>>>> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>>> 4 N123 -0.019008 -0.013494 -0.013180 -0.029208 -0.032748 -0.020243 -0.015089 -0.014439 -0.011681
>>>> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>>> 6 N193 -0.022171 -0.022384 -0.022338 -0.023304 -0.022569 -0.021827 -0.021996 -0.021755 -0.021846
>>>>> Laurent_data <- read.table("test2.txt")
>>>>> Laurent_data[Laurent_data$V1 %in% sensors, ]
>>>>     V1        V2        V3        V4        V5        V6        V7        V8        V9       V10
>>>> 3 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094 -0.012104
>>>> 5 N163 -0.054023 -0.049345 -0.037158 -0.041120 -0.044612 -0.036953 -0.036061 -0.044516 -0.046436
>>>>
>>>> Best, Bill.
>>>>
>>>> W. Michels, Ph.D.
>>>>
>>>>
>>>> On Sun, May 17, 2020 at 5:43 PM Laurent Rhelp <[hidden email]> wrote:

Re: iterators : checkFunc with ireadLines

Jeff Newmiller
In reply to this post by R help mailing list-2

There is also apparently a package called disk.frame that you might consider.

On May 19, 2020 12:07:38 AM PDT, Laurent Rhelp <[hidden email]> wrote:

>OK, thank you for the advice. I will take some time to look at these
>packages in detail.


Re: iterators : checkFunc with ireadLines

Ivan Krylov
In reply to this post by Laurent Rhelp
Hi Laurent,

I am not saying this will work every time and I do recognise that this
is very different from a more general solution that you had envisioned,
but if you are on a UNIX-like system or have the relevant utilities
installed and on the %PATH% on Windows, you can filter the input file
line-by-line using a pipe and an external program:

On Sun, 17 May 2020 15:52:30 +0200
Laurent Rhelp <[hidden email]> wrote:

> # sensors to keep
> sensors <-  c("N053", "N163")

# filter on the beginning of the line
i <- pipe("grep -E '^(N053|N163)' test.txt")
# or:
# filter on the beginning of the given column
# (use $2 for the second column, etc.)
i <- pipe("awk '($1 ~ \"^(N053|N163)\")' test.txt")
# or:
# since your message is full of Unicode non-breaking spaces, I have to
# bring in heavier machinery to handle those correctly;
# only this solution manages to match full column values
# (here you can also use $F[1] for second column and so on)
i <- pipe("perl -CSD -F'\\s+' -lE \\
 'print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/' \\
 test.txt
")
lines <- read.table(i) # closes i when done

The downside of this approach is having to shell-escape the command
lines, which can become complicated, and choosing between use of regular
expressions and more wordy programs (Unicode whitespace in the input
doesn't help, either).
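A small variation on the grep approach, sketched under the assumption that the sensor IDs contain no regular-expression metacharacters and are followed by ASCII whitespace (a tab in the demo file): build the pattern from the sensors vector and let shQuote() do the quoting. shQuote() uses sh-style quoting by default and cmd-style quoting on Windows, which may matter there because single quotes are not quoting characters for cmd.exe. Reading the pipe with readLines() first also avoids read.table() failing with "no lines available in input" when nothing matches. The demo file is illustrative.

```r
sensors <- c("N053", "N163")

# Anchor at the start of the line and require whitespace after the ID,
# so that a hypothetical "N0530" is not matched by "N053".
pattern <- paste0("^(", paste(sensors, collapse = "|"), ")[[:space:]]")

# Demo file (tab-separated); replace with the real file name
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Time\t0\t0.000999",
             "N053\t-0.014083\t-0.004741",
             "N0530\t1\t2",
             "N163\t-0.054023\t-0.049345"), demo_file)

# shQuote() picks the quoting style of the platform's shell
cmd <- paste("grep -E", shQuote(pattern), shQuote(demo_file))

selected <- NULL
if (nzchar(Sys.which("grep"))) {     # is grep on the PATH that R sees?
  out <- readLines(pipe(cmd))        # readLines() opens and closes the pipe
  if (length(out) > 0L) selected <- read.table(text = out)
} else {
  message("grep not found on the PATH seen by R")
}
print(selected)
```

Printing cmd, and checking Sys.which("grep"), is a quick way to see exactly what R hands to the shell when a command works in a terminal but not through pipe().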

--
Best regards,
Ivan


Re: iterators : checkFunc with ireadLines

Laurent Rhelp
Hi Ivan,
   Indeed, it is a good idea. I am on MS Windows, but I can use the
bash shell that I use with Git. I will see how to do that with the Unix
command-line tools.


On 20/05/2020 at 09:46, Ivan Krylov wrote:


Re: iterators : checkFunc with ireadLines

R help mailing list-2
Hi Laurent,

Seeking to give you an "R-only" solution, I thought the read.fwf()
function might be useful (to read-in your first column of data, only).
However Jeff is correct that this is a poor strategy, since read.fwf()
reads the entire file into R (documented in "Fixed-width-format
files", Section 2.2: R Data Import/Export Manual).

Jeff has suggested a number of packages, as well as using a database.
Ivan Krylov has posted answers using grep, awk and perl (perl5--to
disambiguate). [In point of fact, the R Data Import/Export Manual
suggests using perl]. Similar to Ivan, I've posted code below using
the Raku programming language (the language formerly known as Perl6).
Regexes are claimed to be more readable, but are currently very slow
in Raku. However, on the plus side, the language is designed to handle
Unicode gracefully:

> # pipe() using raku-grep on Laurent's data (sep=mult whitespace):
> con_obj1 <- pipe(paste("raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );' ", "Laurents.txt"), open="rt");
> p6_import_a <- scan(file=con_obj1, what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, quiet=TRUE);
> close(con_obj1);
> as.data.frame(sapply(p6_import_a, t), stringsAsFactors=FALSE);
  V1   V2        V3        V4        V5        V6        V7        V8        V9       V10
1  2 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094
2  4 N163 -0.054023 -0.049345 -0.037158  -0.04112 -0.044612 -0.036953 -0.036061 -0.044516
>
> # pipe() using raku-grep "starts-with" to find genbankID ( >3GB TSV file)
> # "lines[0..5]" restricts raku to reading first 6 lines!
> # change "lines[0..5]" to "lines" to run raku code on whole file:
> con_obj2 <- pipe(paste("raku -e '.put for lines[0..5].grep( *.starts-with(q[A00145]), :p);' ", "genbankIDs_3GB.tsv"), "rt");
> p6_import_b <- read.table(con_obj2, sep="\t");
> close(con_obj2)
> p6_import_b
  V1     V2       V3          V4 V5
1  4 A00145 A00145.1 IFN-alpha A NA
>
> # unicode test using R's system() function:
> try(system("raku -ne '.grep( /  你好  |  こんにちは  |  مرحبا  |  Привет  /, :v ).put;'  hello_7lang.txt", intern = TRUE, ignore.stderr = FALSE))
[1] ""                    ""                    ""                    "你好 Chinese"
[5] "こんにちは Japanese" "مرحبا Arabic"        "Привет Russian"
>

[special thanks to Brad Gilbert, Joseph Brenner and others on the
perl6-users mailing list. All errors above are my own.]

HTH, Bill.

W. Michels, Ph.D.




On Fri, May 22, 2020 at 4:48 AM Laurent Rhelp <[hidden email]> wrote:


Re: iterators : checkFunc with ireadLines

R help mailing list-2
Please strike the sentence in brackets, "[In point of fact, the R Data
Import/Export Manual suggests using perl]" (i.e., using perl to pre-process
data before loading it into R). The manual's recommendation only pertains
to large fixed-width-format files [see #1], whereas Laurent's data is
whitespace-delimited:

> read.table( "Laurents.txt")
> read.delim( "Laurents.txt", sep="")
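A quick check of the two calls above on a few lines of the mock data (inlined through the text= argument, so no file is needed). Both split fields on whitespace, but they are not quite interchangeable: read.delim() defaults to header = TRUE, so unless header = FALSE is given, the "Time" row is consumed as column names.

```r
txt <- c("Time 0 0.000999 0.001999",
         "N053 -0.014083 -0.004741 0.001443",
         "N163 -0.054023 -0.049345 -0.037158")

a <- read.table(text = txt)                             # header = FALSE by default
b <- read.delim(text = txt, sep = "")                   # header = TRUE by default
d <- read.delim(text = txt, sep = "", header = FALSE)   # same layout as `a`

nrow(a)                 # 3 rows: Time, N053, N163
nrow(b)                 # 2 rows: the "Time" line became the column names
names(b)[1]             # "Time"
isTRUE(all.equal(a, d))
```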

Best Regards, Bill.

W. Michels, Ph.D.

Citation:
[#1] https://cran.r-project.org/doc/manuals/r-release/R-data.html#Fixed_002dwidth_002dformat-files


Re: iterators : checkFunc with ireadLines

Laurent Rhelp
In reply to this post by R help mailing list-2
I installed raku on my PC to test your solution.

The command raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );'
Laurents.txt works fine when I type it at the bash prompt, but when I
use the pipe command in R as you suggest, there is nothing in lines after
lines <- read.table(i)

There is the same problem with Ivan's solution: the command grep -E
'^(N053|N163)' test.txt works fine at the bash prompt, but
i <- pipe("grep -E '^(N053|N163)' test.txt"); lines <- read.table(i) does not.

Maybe it is because I work with MS Windows?

thx
LP




On 24/05/2020 at 04:34, William Michels wrote:

> Hi Laurent,
>
> Seeking to give you an "R-only" solution, I thought the read.fwf()
> function might be useful (to read in your first column of data only).
> However Jeff is correct that this is a poor strategy, since read.fwf()
> reads the entire file into R (documented in "Fixed-width-format
> files", Section 2.2: R Data Import/Export Manual).
>
> Jeff has suggested a number of packages, as well as using a database.
> Ivan Krylov has posted answers using grep, awk and perl (perl5--to
> disambiguate). [In point of fact, the R Data Import/Export Manual
> suggests using perl]. Similar to Ivan, I've posted code below using
> the Raku programming language (the language formerly known as Perl6).
> Regexes are claimed to be more readable, but are currently very slow
> in Raku. However on the plus side, the language is designed to handle
> Unicode gracefully:
>
>> # pipe() using raku-grep on Laurent's data (sep=mult whitespace):
>> con_obj1 <- pipe(paste("raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );' ", "Laurents.txt"), open="rt");
>> p6_import_a <- scan(file=con_obj1, what=list("","","","","","","","","",""), flush=TRUE, multi.line=FALSE, quiet=TRUE);
>> close(con_obj1);
>> as.data.frame(sapply(p6_import_a, t), stringsAsFactors=FALSE);
>    V1   V2        V3        V4        V5        V6        V7        V8        V9       V10
> 1  2 N053 -0.014083 -0.004741  0.001443 -0.010152 -0.012996 -0.005337 -0.008738 -0.015094
> 2  4 N163 -0.054023 -0.049345 -0.037158  -0.04112 -0.044612 -0.036953 -0.036061 -0.044516
>> # pipe() using raku-grep "starts-with" to find genbankID ( >3GB TSV file)
>> # "lines[0..5]" restricts raku to reading first 6 lines!
>> # change "lines[0..5]" to "lines" to run raku code on whole file:
>> con_obj2 <- pipe(paste("raku -e '.put for lines[0..5].grep( *.starts-with(q[A00145]), :p);' ", "genbankIDs_3GB.tsv"), "rt");
>> p6_import_b <- read.table(con_obj2, sep="\t");
>> close(con_obj2)
>> p6_import_b
>    V1     V2       V3          V4 V5
> 1  4 A00145 A00145.1 IFN-alpha A NA
>> # unicode test using R's system() function:
>> try(system("raku -ne '.grep( /  你好  |  こんにちは  |  مرحبا  |  Привет  /, :v ).put;'  hello_7lang.txt", intern = TRUE, ignore.stderr = FALSE))
> [1] ""                    ""                    ""                    "你好 Chinese"
> [5] "こんにちは Japanese" "مرحبا Arabic"        "Привет Russian"
> [special thanks to Brad Gilbert, Joseph Brenner and others on the
> perl6-users mailing list. All errors above are my own.]
>
> HTH, Bill.
>
> W. Michels, Ph.D.
>
>
>
>
> On Fri, May 22, 2020 at 4:48 AM Laurent Rhelp <[hidden email]> wrote:
>> Hi Ivan,
>>     Indeed, it is a good idea. I am on MS Windows, but I can use the
>> bash shell that I use with Git. I will see how to do that with the Unix
>> command-line tools.
>>
>>
>> On 20/05/2020 at 09:46, Ivan Krylov wrote:
>>> Hi Laurent,
>>>
>>> I am not saying this will work every time and I do recognise that this
>>> is very different from a more general solution that you had envisioned,
>>> but if you are on a UNIX-like system or have the relevant utilities
>>> installed and on the %PATH% on Windows, you can filter the input file
>>> line-by-line using a pipe and an external program:
>>>
>>> On Sun, 17 May 2020 15:52:30 +0200
>>> Laurent Rhelp <[hidden email]> wrote:
>>>
>>>> # sensors to keep
>>>> sensors <-  c("N053", "N163")
>>> # filter on the beginning of the line
>>> i <- pipe("grep -E '^(N053|N163)' test.txt")
>>> # or:
>>> # filter on the beginning of the given column
>>> # (use $2 for the second column, etc.)
>>> i <- pipe("awk '($1 ~ \"^(N053|N163)\")' test.txt")
>>> # or:
>>> # since your message is full of Unicode non-breaking spaces, I have to
>>> # bring in heavier machinery to handle those correctly;
>>> # only this solution manages to match full column values
>>> # (here you can also use $F[1] for second column and so on)
>>> i <- pipe("perl -CSD -F'\\s+' -lE \\
>>>    'print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/' \\
>>>    test.txt
>>> ")
>>> lines <- read.table(i) # closes i when done
>>>
>>> The downside of this approach is having to shell-escape the command
>>> lines, which can become complicated, and choosing between use of regular
>>> expressions and more wordy programs (Unicode whitespace in the input
>>> doesn't help, either).
>>>
>>

Re: iterators : checkFunc with ireadLines

Ivan Krylov
On Wed, 27 May 2020 10:56:42 +0200
Laurent Rhelp <[hidden email]> wrote:

> May be it is because I work with MS windows ?

That is probably the case.

On Windows, pipe() invokes "%COMSPEC% /c <description>", and the
command-line quoting rules differ between a POSIX shell and cmd.exe
plus the runtimes of Windows applications [*].

Can you run raku / perl / grep / awk from the cmd prompt? If not, check
your %PATH% variable. Either way, shQuote() should be able to handle the
quoting madness for us: the inner call quotes for the application's
runtime, and the outer call escapes for cmd.exe itself:

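# Windows only: the inner shQuote() (type "cmd" by default on Windows)
# quotes for the program's runtime, and the outer type = 'cmd2' escapes
# the whole line for cmd.exe. Each call returns a pipe() connection
# that can be passed to read.table().

# raku
pipe(shQuote(
 paste(
  'raku', '-e',
  shQuote('.put for lines.grep( / ^^N053 | ^^N163 /, :p );'),
  'Laurents.txt'
 ),
 type = 'cmd2'
))

# grep
pipe(shQuote(
 paste('grep', '-E', shQuote('^(N053|N163)'), 'test.txt'),
 'cmd2'
))

# awk
pipe(shQuote(
 paste('awk', shQuote('($1 ~ "^(N053|N163)")'), 'test.txt'),
 'cmd2'
))

# perl
pipe(shQuote(
 paste(
  'perl', '-CSD', '-F', shQuote('\\s+'), '-lE',
  shQuote('print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/'),
  'test.txt'
 ), 'cmd2'
))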

This way, we can even pretend that we are passing an _array_ of command
line arguments to the child process, like K&R intended, and not
building a command _line_ to be interpreted by the command line
interpreter and application runtime.
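The advice above can be folded into one helper; this is a minimal sketch, and build_cmd() is a hypothetical name, not an R function. It quotes every argument with the platform's default shQuote() type ("sh" on Unix-alikes, "cmd" on Windows), adds the outer type = "cmd2" escaping only on Windows as in the calls above, and prints the command line so it can be inspected before running it. Sys.which() returns "" when a program is not on the PATH, which covers the %PATH% check. Quoting every argument, including -E, is harmless for a POSIX shell. The example assumes grep is installed.

```r
# Hypothetical helper (not base R): quote each argument for the current
# platform, then assemble a command line suitable for pipe().
build_cmd <- function(prog, args) {
  on_windows <- .Platform$OS.type == "windows"
  inner <- if (on_windows) "cmd" else "sh"
  line <- paste(c(prog, shQuote(args, type = inner)), collapse = " ")
  # pipe() on Windows runs "%COMSPEC% /c <line>", so escape once more
  # for cmd.exe, as in the calls above.
  if (on_windows) shQuote(line, type = "cmd2") else line
}

# Usage with a small mock file standing in for test.txt.
path <- tempfile(fileext = ".txt")
writeLines(c("N023 -0.031323", "N053 -0.014083", "N163 -0.054023"), path)

cmd <- build_cmd("grep", c("-E", "^(N053|N163)", path))
cat(cmd, "\n")  # inspect the command line before running it

if (nzchar(Sys.which("grep"))) {
  # read.table() opens and closes the pipe itself.
  kept <- read.table(pipe(cmd), stringsAsFactors = FALSE)
  print(kept)
} else {
  message("grep is not on the PATH; check %PATH% (or $PATH) first")
}
```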

--
Best regards,
Ivan

[*] In POSIX, the command line is an array of NUL-terminated C strings.
In Windows, the command line is a single NUL-terminated C string, so
the runtime of the application is responsible for obtaining an array of
command line arguments from that:
https://docs.microsoft.com/ru-ru/archive/blogs/twistylittlepassagesallalike/everyone-quotes-command-line-arguments-the-wrong-way


Re: iterators : checkFunc with ireadLines

R help mailing list-2
In reply to this post by Laurent Rhelp
Hi Laurent,

Offhand, I would have guessed that the problem you're seeing has to
do with command-line quoting differences between Windows and
Linux/Mac systems. I've noticed that Windows users tend to have more
command-line success with "exterior double-quotes / interior
single-quotes", while Linux/Mac users tend to have more success with
"exterior single-quotes / interior double-quotes". The problem is
exacerbated in R by system() or pipe() calls, which require another
(exterior) set of quotation marks.
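Part of that extra layer is R's own string syntax. As a sketch, assuming R >= 4.0.0 (which added raw string literals such as r"(...)"), Ivan's awk and perl commands can be written without backslash-escaping anything for R; the bracket form r"[...]" is used because the commands themselves contain )". Raw strings only remove the R layer of escaping; the shell or cmd.exe layer still applies. The perl command is shown on a single line instead of with backslash-newline continuations.

```r
# Requires R >= 4.0.0 for raw string literals.
awk_escaped <- "awk '($1 ~ \"^(N053|N163)\")' test.txt"
awk_raw     <- r"[awk '($1 ~ "^(N053|N163)")' test.txt]"

perl_escaped <- "perl -CSD -F'\\s+' -lE 'print join qq{\\t}, @F if $F[0] =~ /^(N053|N163)$/' test.txt"
perl_raw     <- r"[perl -CSD -F'\s+' -lE 'print join qq{\t}, @F if $F[0] =~ /^(N053|N163)$/' test.txt]"

# Both spellings produce exactly the same command string.
stopifnot(identical(awk_escaped, awk_raw), identical(perl_escaped, perl_raw))
cat(awk_raw, "\n")
```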

1. You can print out your connection object to make sure that the
interior code was read properly into R. Also, take a look at the
'connections' help page to see if there are other parameters you need
to explicitly set (like encoding). Here's the first (working) example
from my last post to you:

> ?connections
> con_obj1
                                                              description
"raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );'  Laurents.txt"
                                                                    class
                                                                   "pipe"
                                                                     mode
                                                                     "rt"
                                                                     text
                                                                   "text"
                                                                   opened
                                                                 "opened"
                                                                 can read
                                                                    "yes"
                                                                can write
                                                                     "no"
>

2. You can try 'backslash-escaping' interior quotes in your system()
or pipe() calls. Also, in two of my previous examples I used paste() to
break up complicated quoting into more manageable chunks. You can try
these calls with 'backslash-escaped' interior quotes and without
paste(); note that inside an R double-quoted string, \' is simply ', so
the command string itself is unchanged, as the printed description shows:

> con_obj1 <- pipe("raku -e \'.put for lines.grep( / ^^N053 | ^^N163 /, :p );\' Laurents.txt", open="rt");
> con_obj1
                                                             description
"raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );' Laurents.txt"
                                                                   class
                                                                  "pipe"
                                                                    mode
                                                                    "rt"
                                                                    text
                                                                  "text"
                                                                  opened
                                                                "opened"
                                                                can read
                                                                   "yes"
                                                               can write
                                                                    "no"
>

3. If R creates your 'con_obj' without throwing an error, then you
should try the most basic functions for reading data into R, something
like readLines(). Again, recreate your 'con_obj' with different
encodings, if necessary. Be careful about reading from the same
connection object with multiple R functions (an unlikely scenario, but
one that should be mentioned). Below, the first scan() fails because it
expects numbers by default; then 'con_obj1' is consumed by readLines(),
so the second call to scan() reads nothing:

> rm(con_obj1)
> # note: dropped ':p' adverb below to simplify
> con_obj1 <- pipe("raku -e \'.put for lines.grep( / ^^N053 | ^^N163 / );\' Laurents.txt", open="rt");
> scan(con_obj1)
Error in scan(con_obj1) : scan() expected 'a real', got 'N053'
> con_obj1 <- pipe("raku -e \'.put for lines.grep( / ^^N053 | ^^N163 / );\' Laurents.txt", open="rt");
> readLines(con_obj1)
[1] "N053    -0.014083    -0.004741    0.001443    -0.010152 -0.012996    -0.005337    -0.008738    -0.015094    -0.012104"
[2] "N163    -0.054023    -0.049345    -0.037158    -0.04112 -0.044612    -0.036953    -0.036061    -0.044516    -0.046436"
> scan(con_obj1)
Read 0 items
numeric(0)

>
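A minimal, shell-free sketch of points 1 and 3 follows. It uses a temporary file connection as a stand-in for the raku pipe, an assumption made only so the example runs anywhere. summary() returns the fields shown in the printed connection; a second reader gets nothing once readLines() has consumed the input; and giving scan() a what = list(...) template avoids the "expected 'a real'" error. Unlike a file, a pipe cannot be rewound, so it has to be recreated for each reader.

```r
# A small file standing in for the output of the raku/grep pipe.
path <- tempfile(fileext = ".txt")
writeLines(c("N053 -0.014083 -0.004741", "N163 -0.054023 -0.049345"), path)

con <- file(path, open = "rt")
info <- summary(con)      # description, class, mode, text, opened, ...
first <- readLines(con)   # consumes both lines
second <- readLines(con)  # character(0): nothing is left to read
close(con)

# scan() expects numbers by default; describe each column with `what`.
con <- file(path, open = "rt")
cols <- scan(con, what = list("", 0, 0), quiet = TRUE)
close(con)

str(cols)
```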

Otherwise, you can post here again and we'll try to help. If you
become convinced it's a raku problem, you can check the raku grep
documentation at https://docs.raku.org/routine/grep, or post a question to
the perl6-users mailing list at [hidden email].

HTH, Bill.

W. Michels, Ph.D.
On Wed, May 27, 2020 at 1:56 AM Laurent Rhelp <[hidden email]> wrote:

>
> I installed raku on my PC to test your solution.
>
> The command raku -e '.put for lines.grep( / ^^N053 | ^^N163 /, :p );'
> Laurents.txt works fine when I type it at the bash prompt, but when I
> use the pipe command in R as you suggest, there is nothing in lines after
> lines <- read.table(i)
>
> There is the same problem with Ivan's solution: the command grep -E
> '^(N053|N163)' test.txt works fine at the bash prompt, but
> i <- pipe("grep -E '^(N053|N163)' test.txt"); lines <- read.table(i) does not.
>
> Maybe it is because I work with MS Windows?
>
> thx
> LP