

I have two problems for the data processing of my large data base (50000 rows). For example, a sample is as follows
Num < c(1,2,3,4,4,4,5,5)
Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14", "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
Place < c("x1","x1","x3","x4","x4","x4","x5","x5")
X < c(1,””,2,3,3,3,6,6)
Y < c(1,””,9,7,7,7,8,8)
toto < data.frame(Num,Date,Place,X,Y)
The first problem is to keep one line for each Num with the “minimum” date. I managed to do it with loops but i would like a solution without using loops. It will be better for my large data base.
The other one is to retrieve the coordinates illinformed. For example, for the same place “x1”, Num=2 doesn't have X and Y. But, we have this information for Num=1.
The example data base must be like this
Num < c(1,2,3,4,5)
Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14", "1/1/04 9:05")
Place < c("x1","x1","x3","x4","x5")
X < c(1,1,2,3,6)
Y < c(1,1,9,7,8)
toto < data.frame(Num,Date,Place,X,Y)
Somebody know how to do ?
Thanks.
Florent Bonneu
Laboratoire de Statistique et Probabilités
bureau 148 bât. 1R2
Université Toulouse 3
118 route de Narbonne  31062 Toulouse cedex 9
[hidden email] <mailto: [hidden email]>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide! http://www.Rproject.org/postingguide.html


something wrong in X and Y definitions... but this could work:
do.call("rbind", lapply(split(toto, toto$Num),
function(x) x[which.min(as.POSIXct(strptime(toto$Date, "%d/%m/%y
%H:%M"))),]))
i don't understand the second query; do you want to keep the first line
when there are several lines for the same place ?
Florent Bonneu a écrit :
>I have two problems for the data processing of my large data base (50000 rows). For example, a sample is as follows
>
>Num < c(1,2,3,4,4,4,5,5)
>Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14", "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
>Place < c("x1","x1","x3","x4","x4","x4","x5","x5")
>X < c(1,””,2,3,3,3,6,6)
>Y < c(1,””,9,7,7,7,8,8)
>
>toto < data.frame(Num,Date,Place,X,Y)
>
>The first problem is to keep one line for each Num with the “minimum” date. I managed to do it with loops but i would like a solution without using loops. It will be better for my large data base.
>
>The other one is to retrieve the coordinates illinformed. For example, for the same place “x1”, Num=2 doesn't have X and Y. But, we have this information for Num=1.
>
>The example data base must be like this
>
>Num < c(1,2,3,4,5)
>Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14", "1/1/04 9:05")
>Place < c("x1","x1","x3","x4","x5")
>X < c(1,1,2,3,6)
>Y < c(1,1,9,7,8)
>
>toto < data.frame(Num,Date,Place,X,Y)
>
>
>Somebody know how to do ?
>Thanks.
>
>Florent Bonneu
>Laboratoire de Statistique et Probabilités
>bureau 148 bât. 1R2
>Université Toulouse 3
>118 route de Narbonne  31062 Toulouse cedex 9
> [hidden email] <mailto: [hidden email]>
>
>______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/rhelp>PLEASE do read the posting guide! http://www.Rproject.org/postingguide.html>
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide! http://www.Rproject.org/postingguide.html


Indeed,
X < c(1,Na,2,3,3,3,6,6)
Y < c(1,Na,9,7,7,7,8,8)
I want to obtain one line for each Num. It's not a problem if there are
several lines for the same place, because my identifier is Num. I just
want to get X and Y wellinformed in an other line for the same place.
For example, "Num=2" is at the place "x1", like "Num=1", but we don't
have the coordinates X and Y for "Num=2". Now, the same coordinates are
wellinformed for "Num=1", so i want to retrieve this coordinates in my
line "Num=2" for my columns X and Y.
Jacques VESLOT wrote:
> something wrong in X and Y definitions... but this could work:
>
> do.call("rbind", lapply(split(toto, toto$Num),
> function(x) x[which.min(as.POSIXct(strptime(toto$Date, "%d/%m/%y
> %H:%M"))),]))
>
> i don't understand the second query; do you want to keep the first
> line when there are several lines for the same place ?
>
>
> Florent Bonneu a écrit :
>
>> I have two problems for the data processing of my large data base
>> (50000 rows). For example, a sample is as follows
>>
>> Num < c(1,2,3,4,4,4,5,5)
>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
>> "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
>> Place < c("x1","x1","x3","x4","x4","x4","x5","x5")
>> X < c(1,””,2,3,3,3,6,6)
>> Y < c(1,””,9,7,7,7,8,8)
>>
>> toto < data.frame(Num,Date,Place,X,Y)
>>
>> The first problem is to keep one line for each Num with the “minimum”
>> date. I managed to do it with loops but i would like a solution
>> without using loops. It will be better for my large data base.
>>
>> The other one is to retrieve the coordinates illinformed. For
>> example, for the same place “x1”, Num=2 doesn't have X and Y. But, we
>> have this information for Num=1.
>>
>> The example data base must be like this
>>
>> Num < c(1,2,3,4,5)
>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
>> "1/1/04 9:05")
>> Place < c("x1","x1","x3","x4","x5")
>> X < c(1,1,2,3,6)
>> Y < c(1,1,9,7,8)
>>
>> toto < data.frame(Num,Date,Place,X,Y)
>>
>> Somebody know how to do ?
>> Thanks.
>>
>> Florent Bonneu
>> Laboratoire de Statistique et Probabilités
>> bureau 148 bât. 1R2
>> Université Toulouse 3
>> 118 route de Narbonne  31062 Toulouse cedex 9
>> [hidden email] <mailto: [hidden email]>
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide!
>> http://www.Rproject.org/postingguide.html>>
>>
>>
>
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide! http://www.Rproject.org/postingguide.html


OK ! so try this:
merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
Florent Bonneu a écrit :
> Indeed,
> X < c(1,Na,2,3,3,3,6,6)
> Y < c(1,Na,9,7,7,7,8,8)
>
> I want to obtain one line for each Num. It's not a problem if there
> are several lines for the same place, because my identifier is Num. I
> just want to get X and Y wellinformed in an other line for the same
> place. For example, "Num=2" is at the place "x1", like "Num=1", but we
> don't have the coordinates X and Y for "Num=2". Now, the same
> coordinates are wellinformed for "Num=1", so i want to retrieve this
> coordinates in my line "Num=2" for my columns X and Y.
>
>
>
> Jacques VESLOT wrote:
>
>> something wrong in X and Y definitions... but this could work:
>>
>> do.call("rbind", lapply(split(toto, toto$Num),
>> function(x) x[which.min(as.POSIXct(strptime(toto$Date, "%d/%m/%y
>> %H:%M"))),]))
>>
>> i don't understand the second query; do you want to keep the first
>> line when there are several lines for the same place ?
>>
>>
>> Florent Bonneu a écrit :
>>
>>> I have two problems for the data processing of my large data base
>>> (50000 rows). For example, a sample is as follows
>>>
>>> Num < c(1,2,3,4,4,4,5,5)
>>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
>>> "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
>>> Place < c("x1","x1","x3","x4","x4","x4","x5","x5")
>>> X < c(1,””,2,3,3,3,6,6)
>>> Y < c(1,””,9,7,7,7,8,8)
>>>
>>> toto < data.frame(Num,Date,Place,X,Y)
>>>
>>> The first problem is to keep one line for each Num with the
>>> “minimum” date. I managed to do it with loops but i would like a
>>> solution without using loops. It will be better for my large data base.
>>>
>>> The other one is to retrieve the coordinates illinformed. For
>>> example, for the same place “x1”, Num=2 doesn't have X and Y. But,
>>> we have this information for Num=1.
>>>
>>> The example data base must be like this
>>>
>>> Num < c(1,2,3,4,5)
>>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04 2:14",
>>> "1/1/04 9:05")
>>> Place < c("x1","x1","x3","x4","x5")
>>> X < c(1,1,2,3,6)
>>> Y < c(1,1,9,7,8)
>>>
>>> toto < data.frame(Num,Date,Place,X,Y)
>>> Somebody know how to do ?
>>> Thanks.
>>>
>>> Florent Bonneu
>>> Laboratoire de Statistique et Probabilités
>>> bureau 148 bât. 1R2
>>> Université Toulouse 3
>>> 118 route de Narbonne  31062 Toulouse cedex 9
>>> [hidden email] <mailto: [hidden email]>
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>> PLEASE do read the posting guide!
>>> http://www.Rproject.org/postingguide.html>>>
>>>
>>>
>>
>>
>>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide! http://www.Rproject.org/postingguide.html


Thank you very much for your help but I think there is an error for the
answer to the first problem I spent time on searching the solution but
I failed to find it. I tried to put "which.max" instead of "which.min"
but it doesn't work. I tried to do my best but i didn't have any idea to
solve this problem.
An example :
Num < c(1,2,4,3,4,4,5,5,5)
Date < c("1/1/04 0:48","1/1/04 8:02", "1/1/04 1:55", "1/1/04 2:14", "1/1/04 1:19", "1/1/04 1:02", "1/1/04 11:15", "1/1/04 9:06", "1/1/04 10:32")
Place < c("x1","x1","x4","x3","x4","x4","x5","x5","x5")
X < c(1,NA,3,2,3,3,6,6,6)
Y < c(1,NA,7,9,7,7,8,8,8)
toto < data.frame(Num,Date,Place,X,Y)
toto[order(toto$Num,as.numeric(as.POSIXct(strptime(toto$Date, "%d/%m/%y %H:%M")))),]
toto < merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
help < do.call("rbind", lapply(split(toto, toto$Num),
function(x) x[which.min(as.numeric(as.POSIXct(strptime(toto$Date, "%d/%m/%y %H:%M")))),]))
help
The solution must be
Num < c(1,2,3,4,5)
Date < c("1/1/04 0:48","1/1/04 8:02", "1/1/04 2:14", "1/1/04 1:02", "1/1/04 9:06")
Place < c("x1","x1","x3","x4","x5")
X < c(1,1,2,3,6)
Y < c(1,1,9,7,8)
toto < data.frame(Num,Date,Place,X,Y)
Any suggestion is welcome.
Florent Bonneu.
Jacques VESLOT wrote:
> OK ! so try this:
> merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
>
>
> Florent Bonneu a écrit :
>
>> Indeed,
>> X < c(1,Na,2,3,3,3,6,6)
>> Y < c(1,Na,9,7,7,7,8,8)
>>
>> I want to obtain one line for each Num. It's not a problem if there
>> are several lines for the same place, because my identifier is Num. I
>> just want to get X and Y wellinformed in an other line for the same
>> place. For example, "Num=2" is at the place "x1", like "Num=1", but
>> we don't have the coordinates X and Y for "Num=2". Now, the same
>> coordinates are wellinformed for "Num=1", so i want to retrieve this
>> coordinates in my line "Num=2" for my columns X and Y.
>>
>>
>>
>> Jacques VESLOT wrote:
>>
>>> something wrong in X and Y definitions... but this could work:
>>>
>>> do.call("rbind", lapply(split(toto, toto$Num),
>>> function(x) x[which.min(as.POSIXct(strptime(toto$Date, "%d/%m/%y
>>> %H:%M"))),]))
>>>
>>> i don't understand the second query; do you want to keep the first
>>> line when there are several lines for the same place ?
>>>
>>>
>>> Florent Bonneu a écrit :
>>>
>>>> I have two problems for the data processing of my large data base
>>>> (50000 rows). For example, a sample is as follows
>>>>
>>>> Num < c(1,2,3,4,4,4,5,5)
>>>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04
>>>> 2:14", "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
>>>> Place < c("x1","x1","x3","x4","x4","x4","x5","x5")
>>>> X < c(1,””,2,3,3,3,6,6)
>>>> Y < c(1,””,9,7,7,7,8,8)
>>>>
>>>> toto < data.frame(Num,Date,Place,X,Y)
>>>>
>>>> The first problem is to keep one line for each Num with the
>>>> “minimum” date. I managed to do it with loops but i would like a
>>>> solution without using loops. It will be better for my large data
>>>> base.
>>>>
>>>> The other one is to retrieve the coordinates illinformed. For
>>>> example, for the same place “x1”, Num=2 doesn't have X and Y. But,
>>>> we have this information for Num=1.
>>>>
>>>> The example data base must be like this
>>>>
>>>> Num < c(1,2,3,4,5)
>>>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04
>>>> 2:14", "1/1/04 9:05")
>>>> Place < c("x1","x1","x3","x4","x5")
>>>> X < c(1,1,2,3,6)
>>>> Y < c(1,1,9,7,8)
>>>>
>>>> toto < data.frame(Num,Date,Place,X,Y)
>>>> Somebody know how to do ?
>>>> Thanks.
>>>>
>>>> Florent Bonneu
>>>> Laboratoire de Statistique et Probabilités
>>>> bureau 148 bât. 1R2
>>>> Université Toulouse 3
>>>> 118 route de Narbonne  31062 Toulouse cedex 9
>>>> [hidden email] <mailto: [hidden email]>
>>>>
>>>> ______________________________________________
>>>> [hidden email] mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>> PLEASE do read the posting guide!
>>>> http://www.Rproject.org/postingguide.html>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide! http://www.Rproject.org/postingguide.html


sorry, i let toto$Date in the function within lapply() instead of x$Date !
now, it works :
> toto
Num Date Place X Y
1 1 1/1/04 0:48 x1 1 1
2 2 1/1/04 8:02 x1 NA NA
3 4 1/1/04 1:55 x4 3 7
4 3 1/1/04 2:14 x3 2 9
5 4 1/1/04 1:19 x4 3 7
6 4 1/1/04 1:02 x4 3 7
7 5 1/1/04 11:15 x5 6 8
8 5 1/1/04 9:06 x5 6 8
9 5 1/1/04 10:32 x5 6 8
> toto < do.call("rbind", lapply(split(toto, toto$Num),
+ function(x) x[which.min(as.POSIXct(strptime(x$Date, "%d/%m/%y
%H:%M"))),]))
> toto
Num Date Place X Y
1 1 1/1/04 0:48 x1 1 1
2 2 1/1/04 8:02 x1 NA NA
3 3 1/1/04 2:14 x3 2 9
4 4 1/1/04 1:02 x4 3 7
5 5 1/1/04 9:06 x5 6 8
> toto < merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
> toto
Place Num Date X Y
1 x1 1 1/1/04 0:48 1 1
2 x1 2 1/1/04 8:02 1 1
3 x3 3 1/1/04 2:14 2 9
4 x4 4 1/1/04 1:02 3 7
5 x5 5 1/1/04 9:06 6 8
Florent Bonneu a écrit :
> Thank you very much for your help but I think there is an error for
> the answer to the first problem I spent time on searching the
> solution but I failed to find it. I tried to put "which.max" instead
> of "which.min" but it doesn't work. I tried to do my best but i didn't
> have any idea to solve this problem.
>
> An example :
>
> Num < c(1,2,4,3,4,4,5,5,5)
> Date < c("1/1/04 0:48","1/1/04 8:02", "1/1/04 1:55", "1/1/04 2:14",
> "1/1/04 1:19", "1/1/04 1:02", "1/1/04 11:15", "1/1/04 9:06", "1/1/04
> 10:32")
> Place < c("x1","x1","x4","x3","x4","x4","x5","x5","x5")
> X < c(1,NA,3,2,3,3,6,6,6)
> Y < c(1,NA,7,9,7,7,8,8,8)
> toto < data.frame(Num,Date,Place,X,Y)
> toto[order(toto$Num,as.numeric(as.POSIXct(strptime(toto$Date,
> "%d/%m/%y %H:%M")))),]
> toto < merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
> help < do.call("rbind", lapply(split(toto, toto$Num),
> function(x) x[which.min(as.numeric(as.POSIXct(strptime(toto$Date,
> "%d/%m/%y %H:%M")))),]))
> help
>
> The solution must be
>
> Num < c(1,2,3,4,5)
> Date < c("1/1/04 0:48","1/1/04 8:02", "1/1/04 2:14", "1/1/04 1:02",
> "1/1/04 9:06")
> Place < c("x1","x1","x3","x4","x5")
> X < c(1,1,2,3,6)
> Y < c(1,1,9,7,8)
> toto < data.frame(Num,Date,Place,X,Y)
>
>
> Any suggestion is welcome.
>
> Florent Bonneu.
>
>
>
> Jacques VESLOT wrote:
>
>> OK ! so try this:
>> merge(toto[1:3], unique(na.omit(toto[3:5])),by="Place",all.x=T)
>>
>>
>> Florent Bonneu a écrit :
>>
>>> Indeed,
>>> X < c(1,Na,2,3,3,3,6,6)
>>> Y < c(1,Na,9,7,7,7,8,8)
>>>
>>> I want to obtain one line for each Num. It's not a problem if there
>>> are several lines for the same place, because my identifier is Num.
>>> I just want to get X and Y wellinformed in an other line for the
>>> same place. For example, "Num=2" is at the place "x1", like "Num=1",
>>> but we don't have the coordinates X and Y for "Num=2". Now, the
>>> same coordinates are wellinformed for "Num=1", so i want to
>>> retrieve this coordinates in my line "Num=2" for my columns X and Y.
>>>
>>>
>>>
>>> Jacques VESLOT wrote:
>>>
>>>> something wrong in X and Y definitions... but this could work:
>>>>
>>>> do.call("rbind", lapply(split(toto, toto$Num),
>>>> function(x) x[which.min(as.POSIXct(strptime(toto$Date, "%d/%m/%y
>>>> %H:%M"))),]))
>>>>
>>>> i don't understand the second query; do you want to keep the first
>>>> line when there are several lines for the same place ?
>>>>
>>>>
>>>> Florent Bonneu a écrit :
>>>>
>>>>> I have two problems for the data processing of my large data base
>>>>> (50000 rows). For example, a sample is as follows
>>>>>
>>>>> Num < c(1,2,3,4,4,4,5,5)
>>>>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04
>>>>> 2:14", "1/1/04 3:09", "1/1/04 8:02", "1/1/04 9:05", "1/1/04 9:06")
>>>>> Place < c("x1","x1","x3","x4","x4","x4","x5","x5")
>>>>> X < c(1,””,2,3,3,3,6,6)
>>>>> Y < c(1,””,9,7,7,7,8,8)
>>>>>
>>>>> toto < data.frame(Num,Date,Place,X,Y)
>>>>>
>>>>> The first problem is to keep one line for each Num with the
>>>>> “minimum” date. I managed to do it with loops but i would like a
>>>>> solution without using loops. It will be better for my large data
>>>>> base.
>>>>>
>>>>> The other one is to retrieve the coordinates illinformed. For
>>>>> example, for the same place “x1”, Num=2 doesn't have X and Y. But,
>>>>> we have this information for Num=1.
>>>>>
>>>>> The example data base must be like this
>>>>>
>>>>> Num < c(1,2,3,4,5)
>>>>> Date < c("1/1/04 0:48","1/1/04 1:52", "1/1/04 1:55", "1/1/04
>>>>> 2:14", "1/1/04 9:05")
>>>>> Place < c("x1","x1","x3","x4","x5")
>>>>> X < c(1,1,2,3,6)
>>>>> Y < c(1,1,9,7,8)
>>>>>
>>>>> toto < data.frame(Num,Date,Place,X,Y)
>>>>> Somebody know how to do ?
>>>>> Thanks.
>>>>>
>>>>> Florent Bonneu
>>>>> Laboratoire de Statistique et Probabilités
>>>>> bureau 148 bât. 1R2
>>>>> Université Toulouse 3
>>>>> 118 route de Narbonne  31062 Toulouse cedex 9
>>>>> [hidden email] <mailto: [hidden email]>
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/rhelp>>>>> PLEASE do read the posting guide!
>>>>> http://www.Rproject.org/postingguide.html>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide! http://www.Rproject.org/postingguide.html

