# Problems of data processing

6 messages

## Problems of data processing

I have two problems with the data processing of my large database (50000 rows). For example, a sample is as follows:

    Num <- c(1,2,3,4,4,4,5,5)
    Date <- c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
              "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
    Place <- c("x1","x1","x3","x4","x4","x4","x5","x5")
    X <- c(1,"",2,3,3,3,6,6)
    Y <- c(1,"",9,7,7,7,8,8)
    toto <- data.frame(Num,Date,Place,X,Y)

The first problem is to keep one line for each Num, the one with the "minimum" date. I managed to do it with loops, but I would like a solution without loops, which would be better for my large database.

The second problem is to fill in the missing coordinates. For example, for the same place "x1", Num=2 doesn't have X and Y, but we have this information for Num=1.

The resulting example data should look like this:

    Num <- c(1,2,3,4,5)
    Date <- c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14", "1/1/04 9:05")
    Place <- c("x1","x1","x3","x4","x5")
    X <- c(1,1,2,3,6)
    Y <- c(1,1,9,7,8)
    toto <- data.frame(Num,Date,Place,X,Y)

Does anybody know how to do this? Thanks.

Florent Bonneu
Laboratoire de Statistique et Probabilités
bureau 148 bât. 1R2
Université Toulouse 3
118 route de Narbonne - 31062 Toulouse cedex 9
[hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

## Re: Problems of data processing

Something is wrong in the X and Y definitions... but this could work:

    do.call("rbind", lapply(split(toto, toto$Num),
        function(x) x[which.min(as.POSIXct(strptime(x$Date, "%d/%m/%y %H:%M"))),]))

I don't understand the second query; do you want to keep the first line when there are several lines for the same place?
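For the first problem, a loop-free alternative to the `split`/`lapply` approach is to sort once and drop repeated `Num` values. The following is only a sketch under a few assumptions: the missing coordinates are written as `NA`, the dates are read as day/month/year (the sample cannot distinguish this from month/day/year), and the names `when`, `sorted` and `earliest` are illustrative, not from the thread.

```r
# Sample data from the original post, with NA standing in for the missing coordinates.
toto <- data.frame(
  Num   = c(1, 2, 3, 4, 4, 4, 5, 5),
  Date  = c("1/1/04 0:48", "1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
            "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06"),
  Place = c("x1", "x1", "x3", "x4", "x4", "x4", "x5", "x5"),
  X     = c(1, NA, 2, 3, 3, 3, 6, 6),
  Y     = c(1, NA, 9, 7, 7, 7, 8, 8),
  stringsAsFactors = FALSE
)

# Parse the dates once for the whole table (day/month order is an assumption).
when <- as.POSIXct(strptime(toto$Date, "%d/%m/%y %H:%M", tz = "UTC"))

# Sort by Num, then by time, and keep only the first row of each Num.
sorted   <- toto[order(toto$Num, when), ]
earliest <- sorted[!duplicated(sorted$Num), ]
rownames(earliest) <- NULL
```

Because the dates are parsed and sorted a single time rather than once per group, this form may scale more comfortably to the 50000-row table mentioned in the question.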

## Re: Problems of data processing

Indeed,

    X <- c(1,NA,2,3,3,3,6,6)
    Y <- c(1,NA,9,7,7,7,8,8)

I want to obtain one line for each Num. It's not a problem if there are several lines for the same place, because my identifier is Num. I just want to take X and Y from another line for the same place where they are filled in. For example, "Num=2" is at the place "x1", like "Num=1", but we don't have the coordinates X and Y for "Num=2". Now, the same coordinates are known for "Num=1", so I want to retrieve these coordinates in my "Num=2" line for the X and Y columns.

Jacques VESLOT wrote:

> I don't understand the second query; do you want to keep the first line when there are several lines for the same place?

## Re: Problems of data processing

OK! So try this:

    merge(toto[1:3], unique(na.omit(toto[3:5])), by="Place", all.x=TRUE)

Florent Bonneu wrote:

> Now, the same coordinates are known for "Num=1", so I want to retrieve these coordinates in my "Num=2" line for the X and Y columns.
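To see how the `merge` suggestion fits together with the one-row-per-`Num` requirement, here is a small end-to-end sketch on the sample data. It assumes the missing coordinates are `NA`, that each `Place` has at most one distinct known (X, Y) pair (otherwise `merge` would duplicate rows), and that dates are day/month/year. The names `coords`, `filled`, `when` and `result` are illustrative only.

```r
# Sample data from the thread, with NA for the missing coordinates.
toto <- data.frame(
  Num   = c(1, 2, 3, 4, 4, 4, 5, 5),
  Date  = c("1/1/04 0:48", "1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
            "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06"),
  Place = c("x1", "x1", "x3", "x4", "x4", "x4", "x5", "x5"),
  X     = c(1, NA, 2, 3, 3, 3, 6, 6),
  Y     = c(1, NA, 9, 7, 7, 7, 8, 8),
  stringsAsFactors = FALSE
)

# Lookup table: one row of known coordinates per Place.
coords <- unique(na.omit(toto[c("Place", "X", "Y")]))

# Re-attach coordinates by Place; a row stays NA only if its Place
# has no known coordinates anywhere in the data.
filled <- merge(toto[c("Num", "Date", "Place")], coords,
                by = "Place", all.x = TRUE)

# merge() reorders rows and columns, so sort by Num and parsed time,
# restore the column order, then keep the earliest row of each Num.
when   <- as.POSIXct(strptime(filled$Date, "%d/%m/%y %H:%M", tz = "UTC"))
filled <- filled[order(filled$Num, when), c("Num", "Date", "Place", "X", "Y")]
result <- filled[!duplicated(filled$Num), ]
rownames(result) <- NULL
```

On the sample, `result` has one row per `Num`, and the `Num=2` row picks up the coordinates recorded for place "x1".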