Detect and replace omitted data

Jonny Armstrong
I am analyzing the spatial distribution of fish in a stream. The stream is
divided into equally sized units, and the number of fish in each unit is
counted. My problem is that my dataset is missing rows where the count in a
unit equals zero. I need to create zero data for the missing units.

For example:
day<-(c(rep(4,8),rep(6,8)))
unit<-c(seq(1,8,1),seq(2,16,2))
value<-floor(rnorm(16,25,10))
x<-cbind(day,unit,value)

x
      day unit value
 [1,]   4    1    19
 [2,]   4    2    15
 [3,]   4    3    16
 [4,]   4    4    20
 [5,]   4    5    17
 [6,]   4    6    15
 [7,]   4    7    14
 [8,]   4    8    29
 [9,]   6    2    18
[10,]   6    4    22
[11,]   6    6    27
[12,]   6    8    16
[13,]   6   10    45
[14,]   6   12    36
[15,]   6   14    34
[16,]   6   16    13

Let's say the stream has 16 units. For each day, I want to fill in rows for
any missing units (e.g., units 9-16 for day 4, the odd-numbered units on day
6) with values of zero.

Does anyone know a relatively concise way to do this?
Thank you.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Detect and replace omitted data

Sarah Goslee
Hi Jonny,

On Tue, Oct 18, 2011 at 1:02 PM, Jonny Armstrong
<[hidden email]> wrote:

> I am analyzing the spatial distribution of fish in a stream. The stream is
> divided into equally sized units, and the number of fish in each unit is
> counted. My problem is that my dataset is missing rows where the count in a
> unit equals zero. I need to create zero data for the missing units.
>
> For example:
> day<-(c(rep(4,8),rep(6,8)))
> unit<-c(seq(1,8,1),seq(2,16,2))
> value<-floor(rnorm(16,25,10))
> x<-cbind(day,unit,value)

Thanks for the actual reproducible example.

> x
>      day unit value
>  [1,]   4    1    19
>  [2,]   4    2    15
>  [3,]   4    3    16
>  [4,]   4    4    20
>  [5,]   4    5    17
>  [6,]   4    6    15
>  [7,]   4    7    14
>  [8,]   4    8    29
>  [9,]   6    2    18
> [10,]   6    4    22
> [11,]   6    6    27
> [12,]   6    8    16
> [13,]   6   10    45
> [14,]   6   12    36
> [15,]   6   14    34
> [16,]   6   16    13
>
> Lets say the stream has 16 units. For each day, I want to fill in rows for
> any missing units (e.g., units 9-16 for day 4, the odd numbered units on day
> 6) with values of zero.

Here's one option, though it may not be terribly concise:

all.samples <- expand.grid(day=unique(x[,"day"]), unit=1:16)
all.samples <- all.samples[order(all.samples[,"day"], all.samples[,"unit"]),]
x.final <- merge(x, all.samples, all.y=TRUE)
x.final[is.na(x.final[,"value"]), "value"] <- 0
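
For reference, the same expand.grid()/merge() idea as a self-contained sketch. The helper name fill_zeros, its n_units argument, and the seed are assumptions made for illustration; they are not from the thread.

```r
# Hypothetical helper: pad a (day, unit, value) table with zero rows.
fill_zeros <- function(x, n_units = 16) {
  x <- as.data.frame(x)
  # every day/unit combination that should exist
  full <- expand.grid(day = sort(unique(x$day)), unit = seq_len(n_units))
  # all.y = TRUE keeps grid rows with no match in x; their value is NA
  out <- merge(x, full, by = c("day", "unit"), all.y = TRUE)
  out$value[is.na(out$value)] <- 0
  out[order(out$day, out$unit), ]
}

set.seed(1)  # arbitrary seed, only so the example is repeatable
x <- cbind(day = rep(c(4, 6), each = 8),
           unit = c(1:8, seq(2, 16, 2)),
           value = floor(rnorm(16, 25, 10)))
x.final <- fill_zeros(x)
nrow(x.final)  # 32 rows: 2 days x 16 units
```

Only days that occur in the data are padded here; a day with no fish at all would still be absent.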

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org


Re: Detect and replace omitted data

Trevor Davies
In reply to this post by Jonny Armstrong
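Here is one option:

# approximate version of your data
a <- data.frame(day = c(rep(4, 8), rep(6, 8)),
                unit = c(1:8, seq(2, 16, 2)),
                value = round(runif(16, 1, 34), 0))
# full grid: every unit for each day (unit = 1:16 is recycled across both days)
b <- data.frame(day = c(rep(4, 16), rep(6, 16)), unit = 1:16)

# keep every row of b; day/unit pairs absent from a get NA, which becomes 0
b1 <- merge(a, b, by = c('day', 'unit'), all.y = TRUE)
b1$value[is.na(b1$value)] <- 0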


Re: Detect and replace omitted data

David Winsemius
In reply to this post by Sarah Goslee

On Oct 18, 2011, at 2:24 PM, Sarah Goslee wrote:

> Hi Jonny,
>
> On Tue, Oct 18, 2011 at 1:02 PM, Jonny Armstrong
> <[hidden email]> wrote:
>> I am analyzing the spatial distribution of fish in a stream. The  
>> stream is
>> divided into equally sized units, and the number of fish in each  
>> unit is
>> counted. My problem is that my dataset is missing rows where the  
>> count in a
>> unit equals zero. I need to create zero data for the missing units.
>>
>> For example:
>> day<-(c(rep(4,8),rep(6,8)))
>> unit<-c(seq(1,8,1),seq(2,16,2))
>> value<-floor(rnorm(16,25,10))
>> x<-cbind(day,unit,value)
>
> Thanks for the actual reproducible example.
>
>> x
>>      day unit value
>>  [1,]   4    1    19
>>  [2,]   4    2    15
>>  [3,]   4    3    16
>>  [4,]   4    4    20
>>  [5,]   4    5    17
>>  [6,]   4    6    15
>>  [7,]   4    7    14
>>  [8,]   4    8    29
>>  [9,]   6    2    18
>> [10,]   6    4    22
>> [11,]   6    6    27
>> [12,]   6    8    16
>> [13,]   6   10    45
>> [14,]   6   12    36
>> [15,]   6   14    34
>> [16,]   6   16    13
>>
>> Lets say the stream has 16 units. For each day, I want to fill in  
>> rows for
>> any missing units (e.g., units 9-16 for day 4, the odd numbered  
>> units on day
>> 6) with values of zero.

I could not figure out precisely what you wanted. If "day" is the row
designator, and you want values by 'unit' and 'day' with zeros for the
missing, then that is exactly what `xtabs` delivers:

 > xtabs(value ~ day+unit, data=x)
    unit
day  1  2  3  4  5  6  7  8 10 12 14 16
   4 25 34  3 25 38 18 19 33  0  0  0  0
   6  0 22  0 42  0 37  0  4 12 31 17 28

You cannot get much more concise than that.
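
One caveat that the output above already shows: the table only has columns for units that occur somewhere in the data, so units 9, 11, 13 and 15 are absent rather than zero. A small sketch for detecting them, assuming the stream has 16 units (the seed and the name missing.units are illustrative only):

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable
x <- cbind(day = rep(c(4, 6), each = 8),
           unit = c(1:8, seq(2, 16, 2)),
           value = floor(rnorm(16, 25, 10)))
tab <- xtabs(value ~ day + unit, data = as.data.frame(x))
# units that never occur in the data have no column in the table
missing.units <- setdiff(1:16, as.numeric(colnames(tab)))
missing.units  # 9 11 13 15
```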

--
david.


David Winsemius, MD
West Hartford, CT


Re: Detect and replace omitted data

Jonny Armstrong
Thank you for the quick and helpful replies. Problem solved.
Jonny

Re: Detect and replace omitted data

djmuseR
In reply to this post by David Winsemius
Prompted by David's xtabs() suggestion, one way to do what I think the
OP wants is to
 * define day and unit as factors whose levels comprise the full range
of desired values;
 * use xtabs();
 * return the result as a data frame.
Something like

x <- data.frame( day = factor(rep(c(4, 6), each = 8), levels = 4:6),
                 unit = factor(c(1:8, seq(2,16,2)), levels = 1:16),
                 value = floor(rnorm(16,25,10)) )
as.data.frame(with(x, xtabs(value ~ unit + day)))
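
A possible follow-up, sketched under two assumptions: only the surveyed days (4 and 6) are wanted as levels, and the result should have numeric day and unit columns again. as.data.frame() on a table returns the classifying variables as factors, so they are converted back through as.character(). The seed and the name long are illustrative only.

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable
x <- data.frame(day = factor(rep(c(4, 6), each = 8), levels = c(4, 6)),
                unit = factor(c(1:8, seq(2, 16, 2)), levels = 1:16),
                value = floor(rnorm(16, 25, 10)))
# responseName names the count column "value" instead of the default "Freq"
long <- as.data.frame(xtabs(value ~ unit + day, data = x),
                      responseName = "value")
# go through as.character() to recover the numbers, not the level codes
long$day  <- as.numeric(as.character(long$day))
long$unit <- as.numeric(as.character(long$unit))
long <- long[order(long$day, long$unit), c("day", "unit", "value")]
```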

HTH,
Dennis


Re: Detect and replace omitted data

David Winsemius

On Oct 18, 2011, at 2:53 PM, Dennis Murphy wrote:

> Prompted by David's xtabs() suggestion, one way to do what I think the
> OP wants is to
> * define day and unit as factors whose levels comprise the full range
> of desired values;
> * use xtabs();
> * return the result as a data frame.
> Something like
>
> x <- data.frame( day = factor(rep(c(4, 6), each = 8), levels = 4:6),
>                 unit = factor(c(1:8, seq(2,16,2)), levels = 1:16),
>                 value = floor(rnorm(16,25,10)) )
> as.data.frame(with(x, xtabs(value ~ unit + day)))

Oh, ... sometimes I'm "slow". Dennis' code has its virtues, but
sometimes people want to avoid factors. One could also create a numeric
matrix of zeroes to fill in the missing cells and rbind it to the
analysis matrix, just in the data= argument to xtabs:

  zeroes <- cbind(day =seq( min(day), max(day), by=1),
                 unit=seq(min(unit), max(unit), by=1),
                 value=0)   # ignore warning

xtabs(value~day+unit, data=rbind(x, zeroes) )
    unit
day  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   4 25 34  3 25 38 18 19 33  0  0  0  0  0  0  0  0
   5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   6  0 22  0 42  0 37  0  4  0 12  0 31  0 17  0 28
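
A variant of the same zero-padding trick that avoids the recycling warning, sketched with expand.grid() so that every day/unit pair gets exactly one zero row. It assumes only the observed days should appear (so no empty day 5 row) and that there are 16 units; the seed and the name tab are illustrative only.

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable
day   <- rep(c(4, 6), each = 8)
unit  <- c(1:8, seq(2, 16, 2))
value <- floor(rnorm(16, 25, 10))
x <- cbind(day, unit, value)

# one zero row per day/unit combination; nothing is recycled, so no warning
zeroes <- cbind(as.matrix(expand.grid(day = unique(day), unit = 1:16)),
                value = 0)
# adding zeroes leaves the observed sums unchanged but creates every cell
tab <- xtabs(value ~ day + unit, data = as.data.frame(rbind(x, zeroes)))
dim(tab)  # 2 days by 16 units
```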


--
David.



David Winsemius, MD
West Hartford, CT
