

Hi,
I have a dataset (d_vigi) with this kind of data:
behavior type  duration(s)  observation nr  species
Nonvigilant    5            1               red deer
Vigilant       2            1               red deer
Vigilant       2            1               red deer
Nonvigilant    3            1               red deer
Vigilant       7            2               red deer
Vigilant       2            2               red deer
Nonvigilant    1            2               red deer
Unkown         2            2               red deer
Now I have to calculate the percentage of time spent on vigilant behavior
per observation.
So eventually I need to end up with something like this:
Observation nr  Species   vigilant (s)  total (s)  percentage of vigilant (%)
1               red deer  4             12         33
2               red deer  9             12         75
Now I know how to calculate the total number of seconds per observation.
But I don't know how to get the total seconds of vigilant behavior per
observation (red numbers). If I could get there, I would know how to
calculate the percentage.
I calculated the total duration per observation this way:
for(id in d_vigi$Obs.nr){
  d_vigi$t.duration[d_vigi$Obs.nr==id] <- sum(d_vigi$'Duration.(s).x'[d_vigi$Obs.nr==id])
}
This does work and gives me the total (s), but I don't know how to get
the sum of the seconds just for the vigilant behavior per observation
number. Is there anyone who could help me?
Thanks,
Krissie
Dear Krissie
I think you may be looking for the aggregate command.
Note that this is a plain text list so if you post in HTML we do not see
what you see. In this case we did not see any red numbers.
Michael
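For readers following along, here is a minimal sketch of how aggregate could be used for this. It assumes simplified column names (behavior, duration, obs) and a toy data frame modelled on the example rows in the question. These names are illustrative only and are not the poster's real columns.

```r
# Toy data modelled on the example in the question (simplified, assumed names)
d <- data.frame(
  behavior = c("Nonvigilant", "Vigilant", "Vigilant", "Nonvigilant",
               "Vigilant", "Vigilant", "Nonvigilant", "Unknown"),
  duration = c(5, 2, 2, 3, 7, 2, 1, 2),
  obs      = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# Total seconds per observation
total <- aggregate(duration ~ obs, data = d, FUN = sum)
names(total)[2] <- "total"

# Vigilant seconds per observation: the same call on the vigilant rows only
vigi <- aggregate(duration ~ obs, data = d[d$behavior == "Vigilant", ], FUN = sum)
names(vigi)[2] <- "vigilant"

# all.x = TRUE keeps observations without any vigilant rows (NA, set to 0 below)
out <- merge(total, vigi, by = "obs", all.x = TRUE)
out$vigilant[is.na(out$vigilant)] <- 0
out$percent <- round(100 * out$vigilant / out$total)
out
```

The result has one row per observation, so it is a summary table rather than extra columns on the original data.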
Hello,
Try the following.
First aggregate the data, then get the totals, then the percentages.
Finally, put the species in the result.
agg <- aggregate(formula = `duration(s)` ~ `observation nr` + `behavior type`,
                 data = d_vigi,
                 FUN = sum,
                 subset = `behavior type` == 'Vigilant')
agg$total <- tapply(d_vigi$`duration(s)`, d_vigi$`observation nr`, FUN = sum)
agg$percent <- round(100 * agg$`duration(s)`/agg$total)
res <- merge(agg, d_vigi[c(1, 3:4)])
res[!duplicated(res), ]
Data in dput format:
d_vigi <-
structure(list(`behavior type` = c("Nonvigilant", "Vigilant",
"Vigilant", "Nonvigilant", "Vigilant", "Vigilant", "Nonvigilant",
"Unkown"), `duration(s)` = c(5L, 2L, 2L, 3L, 7L, 2L, 1L, 2L),
`observation nr` = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), species =
c("red deer",
"red deer", "red deer", "red deer", "red deer", "red deer",
"red deer", "red deer")), class = "data.frame", row.names = c(NA,
8L))
Hope this helps,
Rui Barradas
Hi,
Thanks for your response.
I do get what you're doing. However, the table I sent is just a small piece
of the complete database, so having to type everything in by hand with
structure list (c ......) would be too much work.
Just to give you an idea, the database has around 16000 rows and 40
columns with other variables that I do want to keep. So I want to
find a way to keep everything and just add a couple of columns with the
calculated time for vigilant behavior and the percentage.
Still, thanks for thinking along with me. I am looking into the aggregate
function. Hopefully, this could be a solution.
Krissie
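One way to keep every row and column and only add new ones is base R's ave(), which returns one value per input row. The sketch below is only an illustration. It assumes the column names used in the loop earlier in the thread (Obs.nr, Behavioral.category, Duration.(s).x) and a small toy data frame in place of the real 16000-row data.

```r
# Toy stand-in for the real data; column names are assumed from the thread
d_vigi <- data.frame(
  Behavioral.category = c("Nonvigilant", "Vigilant", "Vigilant", "Nonvigilant",
                          "Vigilant", "Vigilant", "Nonvigilant", "Unknown"),
  "Duration.(s).x"    = c(5, 2, 2, 3, 7, 2, 1, 2),
  Obs.nr              = c(1, 1, 1, 1, 2, 2, 2, 2),
  check.names = FALSE  # keep the non-syntactic column name as it is
)

dur     <- d_vigi$`Duration.(s).x`
is_vigi <- d_vigi$Behavioral.category == "Vigilant"

# Total seconds per observation, repeated on every row of that observation
d_vigi$t.duration <- ave(dur, d_vigi$Obs.nr, FUN = sum)

# Multiplying by the logical sets non-vigilant rows to 0 before summing,
# so observations without vigilant behavior get 0 rather than being dropped
d_vigi$t.vigilant <- ave(dur * is_vigi, d_vigi$Obs.nr, FUN = sum)

d_vigi$pct.vigilant <- round(100 * d_vigi$t.vigilant / d_vigi$t.duration)
d_vigi
```

Because nothing is merged or subset, any other columns in the data frame stay untouched.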
Hi,
So one thing I managed to do was this:
d_vigi$combi <- paste(d_vigi$Behavioral.category, d_vigi$Obs.nr, sep = "")
This created a new column with a combination of the category and the
observation number.
Afterwards I did this:
for(id in d_vigi$combi){
  d_vigi$durationpercat[d_vigi$combi==id] <- sum(d_vigi$'Duration.(s).x'[d_vigi$combi==id])
}
So this created another new column with the correct duration per category.
So that means that I have this:
Behavioral category  Duration  Obs nr  species   combi         durationpercat
Nonvigilant          5         1       red deer  Nonvigilant1  8
Vigilant             2         1       red deer  Vigilant1     4
Vigilant             2         1       red deer  Vigilant1     4
Nonvigilant          3         1       red deer  Nonvigilant1  8
Vigilant             7         2       red deer  Vigilant2     9
Vigilant             2         2       red deer  Vigilant2     9
Nonvigilant          1         2       red deer  Nonvigilant2  1
Unknown              2         2       red deer  Unknown2      2
However, this doesn't work for me further along the line. I need to have
the duration for vigilant behaviour in a separate column. I really don't
know how to get there.
Hopefully, you understand where my problem lies. I need to have
three columns for vigilant, nonvigilant and unknown. That way I could fill
in zeros for the observations where there wasn't any vigilant behaviour.
Krissie
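A possible route to one column per category, with zeros where a category does not occur, is xtabs() followed by a merge back onto the original rows. This is a sketch under assumptions. The column names are taken from the code earlier in the thread, the data frame is a toy one, and observation 3 is invented here purely to show the zero case.

```r
# Toy data; observation 3 is made up to illustrate an observation
# without any vigilant behavior
d_vigi <- data.frame(
  Behavioral.category = c("Nonvigilant", "Vigilant", "Vigilant", "Nonvigilant",
                          "Vigilant", "Vigilant", "Nonvigilant", "Unknown",
                          "Nonvigilant"),
  "Duration.(s).x"    = c(5, 2, 2, 3, 7, 2, 1, 2, 4),
  Obs.nr              = c(1, 1, 1, 1, 2, 2, 2, 2, 3),
  check.names = FALSE
)

# Sum of durations for every observation x category combination;
# combinations that do not occur come out as 0
wide <- xtabs(`Duration.(s).x` ~ Obs.nr + Behavioral.category, data = d_vigi)

# Turn the table into a data frame with one column per category
wide <- as.data.frame.matrix(wide)
wide$Obs.nr <- as.numeric(rownames(wide))

# Attach the per-category columns to every original row
res <- merge(d_vigi, wide, by = "Obs.nr")
res
```

Compared with the pasted combi key, this puts each category's duration in its own column (Nonvigilant, Unknown, Vigilant), so percentages can be computed directly from those columns.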
Dear Krissie
I think you misunderstood Rui's response. He was generating some fake
data to test the code, not suggesting you rebuild your data frame.
Michael
Hello,
Michael, you're almost right. I copied and pasted the OP's data, but with so
many blank spaces I had to put single quotes around those values and
column names. Then, when I wrote the answer, I also posted the output of
dput(d_vigi)
so that others who might want to give it a try would have their job
made easier.
To the OP: the code before "Data in dput format:" is the proposed
solution, not the structure(list(etc)) call, as Michael says.
If you run that statement, you will get the data as you posted it,
even with the original column names.
Hope this helps,
Rui Barradas
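As a small illustration of the dput round trip described above, the sketch below uses a made-up three-row data frame (d_small and its columns are hypothetical). It shows that the text printed by dput() can be evaluated to rebuild the same object, which is why it is convenient for sharing a sample of data on a plain text list.

```r
# Hypothetical small data frame, for illustration only
d_small <- data.frame(obs = c(1L, 1L, 2L), duration = c(5L, 2L, 7L))

# dput() prints R code that recreates the object; capture it as text here
txt <- paste(capture.output(dput(d_small)), collapse = "\n")
cat(txt, "\n")

# Evaluating that text rebuilds the data frame
d_back <- eval(parse(text = txt))
all.equal(d_small, d_back)
```

For a large data frame, something like dput(head(d_vigi, 20)) would give a sample short enough to paste into a message.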
