OK - I got the data - now what? :-)

OK - I got the data - now what? :-)

Mark Knecht
OK, I guess I'm getting better at the data part of R. I wrote a
program outside of R this morning to dump a bunch of experimental
data. It's a sort of ragged array - about 700 rows and 400 columns,
but the amount of data in each column varies based on the length of
the experiment. The real data ends with a 0 following some non-zero
value. It might be as short as 5 to 10 columns or as many as 390. The
first 9 columns contain some data about when the experiment was run
and a few other things I thought I might be interested in later. All
the data starts in column 10 and has headers saying C1, C2, C3, C4,
etc., up to C390. The first value for every experiment is some value I
will normalize and then the values following are above and below the
original tracing out the path that the experiment took, ending
somewhere to the right but not a fixed number of readings.

R reads it in fine and it looks good so far.

Now, what I thought I might do with R is plot all 700 rows as
individual lines, giving them some color based on info in columns 1-9,
but suddenly I'm lost again in plots which I think should be fairly
easy. How would I go about creating a plot for even one line, much
less all of them? I don't have a row with 1,2,3,4 to use as the X axis
values. I could go back and put one in the data but then I don't think
that should really be required, or I could go back and make the
headers for the whole array 1:400 and then plot from 10:400 but I
thought I read that headers cannot start with numbers.

Maybe the X axis values for a plot can actually be non-numeric C1, C2,
C3, C4, etc and I could use line (C1,0) to (C2,5) and so on? Or maybe
I should strip the C from C1 and be left with 1? Maybe the best thing
is to copy the data for one line to another data.frame or array and
then plot that?
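
A minimal sketch of the options weighed above, not a definitive answer. It assumes a small made-up data frame (`dat`, with a hypothetical `run` column standing in for the nine descriptive columns). No extra row of 1,2,3,4 is needed: the reading number can be recovered by stripping the "C" from the header. On the header question, read.table() and read.csv() by default pass headers through make.names() (check.names = TRUE), so a header such as 1 would come back as X1.

```r
# Hypothetical stand-in for the real data: one metadata column, four readings.
dat <- data.frame(run = c("a", "b"),
                  C1 = c(1.0, 2.0), C2 = c(1.2, 1.8),
                  C3 = c(0.9, 2.1), C4 = c(1.1, 0))

# Find the reading columns by name rather than by position.
reading_cols <- grep("^C[0-9]+$", names(dat))

# Strip the leading "C" to get numeric x values.
x <- as.integer(sub("^C", "", names(dat)[reading_cols]))

# One row as a plain numeric vector; unlist() drops the data-frame wrapper.
y <- unlist(dat[1, reading_cols])

plot(x, y, type = "l", xlab = "reading", ylab = "value")
```

Calling plot(y, type = "l") without x would give the same picture here, because a numeric vector is plotted against its index by default.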

Just sort of lost looking at help files. Thanks for any ideas you can
send along. Ask questions if I didn't explain my problem well enough.
Not looking for anyone to do my work, just trying to get the concepts
right

Cheers,
Mark

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: OK - I got the data - now what? :-)

jholtman
See if this example helps; it shows how to plot either the rows or the columns
of a data frame:

> test <- data.frame(C1=runif(10), C2=runif(10), C3=runif(10))
> test
           C1        C2        C3
1  0.91287592 0.3390729 0.4346595
2  0.29360337 0.8394404 0.7125147
3  0.45906573 0.3466835 0.3999944
4  0.33239467 0.3337749 0.3253522
5  0.65087047 0.4763512 0.7570871
6  0.25801678 0.8921983 0.2026923
7  0.47854525 0.8643395 0.7111212
8  0.76631067 0.3899895 0.1216919
9  0.08424691 0.7773207 0.2454885
10 0.87532133 0.9606180 0.1433044
> # this will plot each column (C1, C2, C3)
> matplot(test, type='o')
> # plot each row
> matplot(t(test), type='o')
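
The same matplot() idea can probably be stretched to the ragged case described in the question. The following is only a sketch under stated assumptions: the data frame `dat` and its `grp` column are invented for illustration, and every 0 in the reading columns is assumed to be padding rather than a genuine reading.

```r
set.seed(1)  # only so the sketch is repeatable
# Hypothetical data: 'grp' stands in for one of the descriptive columns.
dat <- data.frame(grp = c(1, 2, 1, 2),
                  C1 = runif(4) + 1, C2 = runif(4) + 1,
                  C3 = runif(4) + 1, C4 = runif(4) + 1)
dat$C3[2] <- 0; dat$C4[2] <- 0   # row 2 ends after C2
dat$C4[3] <- 0                   # row 3 ends after C3

# Keep only the reading columns and convert to a numeric matrix.
m <- as.matrix(dat[, grep("^C[0-9]+$", names(dat))])

# Treat 0 as "no reading": matplot() does not draw NA values.
m[m == 0] <- NA

# matplot() draws one line per column, so transpose to get one line per row.
matplot(t(m), type = "l", lty = 1, col = dat$grp,
        xlab = "reading", ylab = "value")
```

Selecting columns with grep() on the names means the code does not depend on which position C1 happens to occupy.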





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: OK - I got the data - now what? :-)

Mark Wardle
In reply to this post by Mark Knecht
Hi. Essentially your data is currently in "wide" format, with repeated
measures in different columns. For most analysis and in particular for
graphing, it is frequently helpful to reshape your data into a "long"
format, with one row per data value and additional variables to list
experiment or subject identifier, experimental conditions etc.

see ?reshape and Dr. Wickham's reshape package (http://had.co.nz/reshape/)
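
As a rough illustration of the wide-to-long idea, here is a sketch using base R's reshape(). The data frame `wide` and the column names `id`, `grp`, `step` and `value` are made up for the example, and dropping every 0 assumes that 0 never occurs as a genuine reading.

```r
# Hypothetical wide data: 'id' and 'grp' stand in for the descriptive columns.
wide <- data.frame(id = 1:3, grp = c("a", "b", "a"),
                   C1 = c(1.0, 2.0, 3.0),
                   C2 = c(1.5, 0.0, 2.5),
                   C3 = c(0.0, 0.0, 2.0))

# Stack C1..C3 into one 'value' column; 'step' records which reading
# each value came from.
long <- reshape(wide, direction = "long",
                varying = c("C1", "C2", "C3"), v.names = "value",
                timevar = "step", times = 1:3, idvar = "id")

# Drop the zero padding (assumes 0 is never a real reading).
long <- long[long$value != 0, ]
long
```

In this layout each experiment is identified by `id`, so grouping, colouring or subsetting by the descriptive columns becomes an ordinary row filter.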

Good luck,

Mark





--
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK


Re: OK - I got the data - now what? :-)

Mark Knecht
In reply to this post by jholtman

Hey Jim,
   Thanks for the pointers on matplot. I suspect that will be useful
one of these days.

   At the bottom I'm including a little code that builds a test case closer
to what I have to deal with. My problem with your data was that
you plot everything. In my data I need to plot only a portion of it,
and in the array not every cell is valid - I don't want to plot cells
that have 0.00 as a value. In the array 'test' I need to plot the
general area defined by C1:C6, each row as a line, but stop plotting
each row when I run into a 0. Keep in mind that I don't know what
column C1 starts in. It is likely to change over time.

   I think the root cause of a number of my coding problems in R right
now is my lack of skills in reading and grabbing portions of the data
out of arrays. I'm new at this (and not a programmer). I need to find
some good examples to read and test on that subject. If I could locate
which column was called C1, then read row 3 from C1 up to the last
value before a 0, I'd have proper data to plot for one line. Repeat as
necessary through the array and I get all the lines. Doing the lines
one at a time should allow me the opportunity to apply color or not
plot based on values in the first few columns.

Thanks,
Mark

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
                   C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test <- round(test, 2)

# Make array ragged
test$C3[2] <- 0; test$C4[2] <- 0; test$C5[2] <- 0; test$C6[2] <- 0
test$C4[3] <- 0; test$C5[3] <- 0; test$C6[3] <- 0
test$C6[7] <- 0
test$C4[8] <- 0; test$C5[8] <- 0; test$C6[8] <- 0

# Print array
test


Re: OK - I got the data - now what? :-)

Mark Knecht
In reply to this post by Mark Wardle
On Sun, Jul 5, 2009 at 12:00 AM, Mark Wardle<[hidden email]> wrote:

> Hi. Essentially your data is currently in "wide" format, with repeated
> measures in different columns. For most analysis and in particular for
> graphing, it is frequently helpful to reshape your data into a "long"
> format, with one row per data value and additional variables to list
> experiment or subject identifier, experimental conditions etc.
>
> see ?reshape and Dr. Wickham's reshape package (http://had.co.nz/reshape/)
>
> Good luck,
>
> Mark
>

This looks interesting. Thanks!

cheers,
Mark


Re: OK - I got the data - now what? :-)

David Winsemius
In reply to this post by Mark Knecht


?"[" for the help page on Extract which is a gold mine of useful methods

A single row can be extracted with:
test[3, ]

Two rows:
test[3:4, ]

And individual columns of that one-row data frame can be picked out by position:
 > test[3,][4:5]
     C2   C3
3 0.66 0.51

You can then find positions with logical tests and which():
which(names(test)=="C1")   # 3; names() gives the column names in order
which(test[3,] == 0.0)     # 6 7 8 (C4, C5 and C6 are all 0 in row 3)

(Note:  one of the most frequent newbie questions is why some  
seemingly obvious equality expressions are FALSE):
 > sqrt(2)*sqrt(2) == 2
[1] FALSE
So if your values are calculated from other values then consider using  
all.equal()
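
A small sketch of the difference, using the same expression as above. The explicit tolerance in the last line is an arbitrary choice made for illustration.

```r
x <- sqrt(2) * sqrt(2)

x == 2                      # exact comparison of doubles
isTRUE(all.equal(x, 2))     # comparison within a small tolerance

# A sentinel that was written into the data as a literal 0 (as in the
# ragged 'test' example) can be tested with == 0; the tolerance matters
# when the values being compared were computed.
abs(x - 2) < 1e-8           # explicit tolerance, chosen arbitrarily here
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a character description of the difference, not FALSE, when the values differ.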

And repeated applications of the testing criteria process are effective:

test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
     C1   C2   C3
3 0.52 0.66 0.51

(and a warning that does not seem accurate to me.)

In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
   numerical expression has 3 elements: only the first used

Seems to me that all of the elements were used. I cannot explain that
warning but am pretty sure it can be ignored.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: OK - I got the data - now what? :-)

Uwe Ligges-3


David Winsemius wrote:

> And repeated applications of the testing criteria process are effective:
>
> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>     C1   C2   C3
> 3 0.52 0.66 0.51
>
> (and a warning that does not seem accurate to me.)
>
> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>   numerical expression has 3 elements: only the first used


David,

# which(test[3,] == 0.0)
[1] 6 7 8

In a:b, both a and b must be length-1 vectors (scalars); otherwise only the
first element (in this case 6) is used.

That leads us to the conclusion that either the line above is not really
the cleanest way to write this, or you intended something different ....

Best,
Uwe
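
One way to keep both endpoints of a:b scalar is to ask for the position of the first zero only. The following is a sketch, not the method used by anyone in this thread: the helper name `valid_run` is made up, the small `test` data frame is a hand-written stand-in for the random one above, and it assumes every column from C1 to the end holds readings.

```r
# Hand-written stand-in for the ragged 'test' data frame.
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.52, 0.31, 0.77),
                   C2 = c(0.66, 0.48, 0.12),
                   C3 = c(0.51, 0.00, 0.94),
                   C4 = c(0.00, 0.00, 0.35))

first_col <- match("C1", names(test))       # scalar position of C1
readings  <- test[, first_col:ncol(test)]   # C1 onwards

# Return the readings of one row up to, but not including, the first 0.
# match() yields a single position (or NA when the row contains no 0),
# so no multi-element value ever reaches ':' or seq_len().
valid_run <- function(row_values) {
  v <- unlist(row_values)
  first_zero <- match(0, v)
  if (is.na(first_zero)) v else v[seq_len(first_zero - 1)]
}

valid_run(readings[1, ])   # C1, C2, C3
valid_run(readings[2, ])   # C1, C2
valid_run(readings[3, ])   # all four readings; this row has no 0
```

Each result is a plain numeric vector, so it could be handed to plot() or lines() one row at a time, which leaves room to choose a colour per row from the descriptive columns.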




Re: OK - I got the data - now what? :-)

Mark Knecht
In reply to this post by David Winsemius
On Sun, Jul 5, 2009 at 7:35 AM, David Winsemius<[hidden email]> wrote:

>> you plot everything. In my data I need to plot only a portion of it,
>> and in the array not every cell is valid - I don't want to plot cells
>> that have 0.00 as a value. In the array 'test' I need to plot the
>> general area defined by C1:C6, each row as a line, but stop plotting
>> each row when I run into a 0. Keep in mind that I don't know what
>> column C1 starts in. It is likely to change over time.
>>
>>  I think the root cause of a number of my coding problems in R right
>> now is my lack of skills in reading and grabbing portions of the data
>> out of arrays. I'm new at this. (And not a programmer) I need to find
>> some good examples to read and test on that subject. If I could locate
>> which column was called C1, then read row 3 from C1 up to the last
>> value before a 0, I'd have proper data to plot for one line. Repeat as
>> necessary through the array and I get all the lines. Doing the lines
>> one at a time should allow me the opportunity to apply color or not
>> plot based on values in the first few columns.
>>
>> Thanks,
>> Mark
>>
>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>> test<-round(test,2)
>>
>> #Make array ragged
>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>> test$C6[7]<-0
>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>
>> #Print array
>> test
>
> ?"[" for the help page on Extract which is a gold mine of useful methods
>
> A single row can be extracted with:
> test[3, ]
>
> Two rows:
> test[3:4, ]
>
> And individual elements of a vector can be further specified:
>> test[3,][4:5]
>    C2   C3
> 3 0.66 0.51
>
> You can then access or determine numerical values with logical functions
> such as which:
> which(names(test)=="C1")   # 3; names() gives an ordered listing of column names
> which(test[3,] == 0.0)     # 6 7 8
>
> (Note:  one of the most frequent newbie questions is why some seemingly
> obvious equality expressions are FALSE):
>> sqrt(2)*sqrt(2) == 2
> [1] FALSE
> So if your values are calculated from other values then consider using
> all.equal()
>
> And repeated applications of the testing criteria process are effective:
>
> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>    C1   C2   C3
> 3 0.52 0.66 0.51
>
> (and a warning that does not seem accurate to me.)
>
> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>  numerical expression has 3 elements: only the first used
>
> Seems to me that all of the element were used. I cannot explain that warning
> but am pretty sure it can be ignored.
>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>

Really GREAT examples. They're giving me lots of ideas. In fact, with a
little study they helped me resolve your warning message. Since the
expression

which(test[3,]==0)

returns a vector of integer values, I was able to choose only the first
of those values with [1] and the warning disappears:

> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
    C1   C2  C3
3 0.01 0.37 0.4
Warning message:
In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
  numerical expression has 3 elements: only the first used


> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)[1]-1)]
    C1   C2  C3
3 0.01 0.37 0.4
>

LOTS more study to do but I think this helps me move forward.

Thanks!

- Mark
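
The [1] fix above can be tried on fixed numbers so the result is repeatable. This is only a sketch: the three-row data frame below is made up for illustration (the thread's 'test' uses runif(), so its values differ on every run), and start_col, zero_cols and first_zero are names chosen here, not taken from the thread.

```r
# Hypothetical fixed data: three rows shaped like the thread's 'test' frame
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.91, 0.29, 0.52), C2 = c(0.34, 0.84, 0.66),
                   C3 = c(0.43, 0.00, 0.51), C4 = c(0.33, 0.00, 0.00),
                   C5 = c(0.65, 0.00, 0.00), C6 = c(0.26, 0.00, 0.00))

start_col  <- which(names(test) == "C1")   # position of the first data column
zero_cols  <- which(test[3, ] == 0)        # every column of row 3 holding a 0
first_zero <- zero_cols[1]                 # keep only the first hit

# Both ends of ':' are now single numbers, so there is nothing to warn about
row3 <- test[3, start_col:(first_zero - 1)]
print(row3)
```

Storing the which() result once also avoids repeating the same search inside the subscript.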


Re: OK - I got the data - now what? :-)

David Winsemius
In reply to this post by Uwe Ligges-3

On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:

>
>
> David Winsemius wrote:
>>
>> So if your values are calculated from other values then consider  
>> using all.equal()
>> And repeated applications of the testing criteria process are  
>> effective:
>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>    C1   C2   C3
>> 3 0.52 0.66 0.51
>> (and a warning that does not seem accurate to me.)
>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>  numerical expression has 3 elements: only the first used
>
>
> David,
>
> # which(test[3,] == 0.0)
> [1] 6 7 8
>
> and in a:b a and b must be length 1 vectors (scalars) otherwise just  
> the first element (in this case 6) is used.
>
> That leads us to the conclusion that writing the line above is not  
> really the cleanest way or you intended something different ....

Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks
as though I would not be getting in trouble this way, but a cleaner
method would be to access only the first element of which(test[3, ] ==
0):

test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]

>
> David

>> Seems to me that all of the element were used. I cannot explain  
>> that warning but am pretty sure it can be ignored.
>>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: OK - I got the data - now what? :-)

Mark Knecht
On Sun, Jul 5, 2009 at 8:18 AM, David Winsemius<[hidden email]> wrote:

>
> On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
>
>>
>>
>> David Winsemius wrote:
>>>
>>> So if your values are calculated from other values then consider using
>>> all.equal()
>>> And repeated applications of the testing criteria process are effective:
>>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>>   C1   C2   C3
>>> 3 0.52 0.66 0.51
>>> (and a warning that does not seem accurate to me.)
>>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>>  numerical expression has 3 elements: only the first used
>>
>>
>> David,
>>
>> # which(test[3,] == 0.0)
>> [1] 6 7 8
>>
>> and in a:b a and b must be length 1 vectors (scalars) otherwise just the
>> first element (in this case 6) is used.
>>
>> That leads us to the conclusion that writing the line above is not really
>> the cleanest way or you intended something different ....
>
> Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks as
> though I would not be getting in trouble this way, but a cleaner method
> would be to access only the first element of which(test[3, ] == 0):
>
> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
>
>>
>> David
>
>>> Seems to me that all of the element were used. I cannot explain that
>>> warning but am pretty sure it can be ignored.
>>>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>

OK - making lots more headway. Thanks for your help.

QUESTION: How do I handle the case where I'm testing for 0 and don't
find it? In this case I need all of the row from C1:C6.

test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
test<-round(test,2)

#Make array ragged
test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
test$C6[7]<-0
test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0

test

#C1 always the same so calculate it only once
StartCol <- which(names(test)=="C1")

#Print row 3 explicitly
test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]

#Row 6 fails because 0 is not found
test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]

EndCol <- which(test[6,] == 0.0)[1]-1
EndCol

Thanks,
Mark
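
For readers following along, the row 6 failure can be reproduced in isolation. The sketch below uses a made-up one-row data frame (row6) with no zeros; the variable names hits and end_col are illustrative only.

```r
# Hypothetical row with no zeros, standing in for row 6 of 'test'
row6 <- data.frame(A = 6, B = 100, C1 = 0.33, C2 = 0.84, C3 = 0.51)

hits <- which(row6 == 0)     # integer(0): nothing matched
end_col <- hits[1] - 1       # indexing past the end gives NA, and NA - 1 is NA

# ':' refuses an NA endpoint, which is why the row 6 subset stops with an error
result <- tryCatch(3:end_col, error = function(e) conditionMessage(e))
print(result)
```

So the thing to guard against is the NA that comes back from which(...)[1] when no 0 is present.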


Re: OK - I got the data - now what? :-)

David Winsemius

On Jul 5, 2009, at 12:19 PM, Mark Knecht wrote:

> On Sun, Jul 5, 2009 at 8:18 AM, David  
> Winsemius<[hidden email]> wrote:
>>
>> On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
>>
>>>
>>>
>>> David Winsemius wrote:
>>>>
>>>> So if your values are calculated from other values then consider  
>>>> using
>>>> all.equal()
>>>> And repeated applications of the testing criteria process are  
>>>> effective:
>>>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>>>   C1   C2   C3
>>>> 3 0.52 0.66 0.51
>>>> (and a warning that does not seem accurate to me.)
>>>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>>>  numerical expression has 3 elements: only the first used
>>>
>>>
>>> David,
>>>
>>> # which(test[3,] == 0.0)
>>> [1] 6 7 8
>>>
>>> and in a:b a and b must be length 1 vectors (scalars) otherwise  
>>> just the
>>> first element (in this case 6) is used.
>>>
>>> That leads us to the conclusion that writing the line above is not  
>>> really
>>> the cleanest way or you intended something different ....
>>
>> Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks as
>> though I would not be getting in trouble this way, but a cleaner method
>> would be to access only the first element of which(test[3, ] == 0):
>>
>> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
>>
>>>
>>> David
>>
>>>> Seems to me that all of the element were used. I cannot explain  
>>>> that
>>>> warning but am pretty sure it can be ignored.
>>>>
>>
>> David
>
> OK - making lots more headway. Thanks for your help.
>
> QUESTION: How do I handle the case where I'm testing for 0 and don't
> find it? In this case I need all of the row from C1:C6.
>
> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
> test<-round(test,2)
>
> #Make array ragged
> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
> test$C6[7]<-0
> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>
> test
>
> #C1 always the same so calculate it only once
> StartCol <- which(names(test)=="C1")
>
> #Print row 3 explicitly
> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>
> #Row 6 fails because 0 is not found
> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>
> EndCol <- which(test[6,] == 0.0)[1]-1
> EndCol
>

It's getting a bit Baroque, but here is a solution that handles an NA:

test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
                               ncol(test),   which(test[6,] == 0.0)[1]-1 )
             ]
#####-----
     C1   C2   C3   C4   C5   C6
6 0.33 0.84 0.51 0.86 0.84 0.15


Maybe an R-meister can offer something more compact?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
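
One possible way to make the same NA handling a little easier to read is to run the search once and branch with a plain if/else. This is a sketch under assumptions, not the code from the message above: the two-row data frame is invented, and end_col_for_row is a hypothetical helper name.

```r
# Hypothetical fixed data: row 1 has no zeros, row 2 ends after C2
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.33, 0.29), C2 = c(0.84, 0.84),
                   C3 = c(0.51, 0.00), C4 = c(0.86, 0.00))
StartCol <- which(names(test) == "C1")

end_col_for_row <- function(df, i) {        # hypothetical helper name
  first_zero <- which(df[i, ] == 0)[1]      # NA when the row has no zero
  # A single scalar condition, so plain if/else is enough
  if (is.na(first_zero)) ncol(df) else first_zero - 1
}

end_cols <- sapply(seq_len(nrow(test)), function(i) end_col_for_row(test, i))
print(end_cols)
print(test[1, StartCol:end_cols[1]])
print(test[2, StartCol:end_cols[2]])
```

Like the original, this searches the whole row, so it assumes none of the leading information columns ever holds a 0.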


Re: OK - I got the data - now what? :-)

Uwe Ligges-3


David Winsemius wrote:

>
> On Jul 5, 2009, at 12:19 PM, Mark Knecht wrote:
>
>> On Sun, Jul 5, 2009 at 8:18 AM, David
>> Winsemius<[hidden email]> wrote:
>>>
>>> On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
>>>
>>>>
>>>>
>>>> David Winsemius wrote:
>>>>>
>>>>> So if your values are calculated from other values then consider using
>>>>> all.equal()
>>>>> And repeated applications of the testing criteria process are
>>>>> effective:
>>>>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>>>>   C1   C2   C3
>>>>> 3 0.52 0.66 0.51
>>>>> (and a warning that does not seem accurate to me.)
>>>>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>>>>  numerical expression has 3 elements: only the first used
>>>>
>>>>
>>>> David,
>>>>
>>>> # which(test[3,] == 0.0)
>>>> [1] 6 7 8
>>>>
>>>> and in a:b a and b must be length 1 vectors (scalars) otherwise just
>>>> the
>>>> first element (in this case 6) is used.
>>>>
>>>> That leads us to the conclusion that writing the line above is not
>>>> really
>>>> the cleanest way or you intended something different ....
>>>
>>> Thanks, Uwe. I see my confusion. I did want 6 to be used and it
>>> looks as though I would not be getting in trouble this way, but a
>>> cleaner method would be to access only the first element of
>>> which(test[3, ] == 0):
>>>
>>> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
>>>
>>>>
>>>> David
>>>
>>>>> Seems to me that all of the element were used. I cannot explain that
>>>>> warning but am pretty sure it can be ignored.
>>>>>
>>>
>>> David
>>
>> OK - making lots more headway. Thanks for your help.
>>
>> QUESTION: How do I handle the case where I'm testing for 0 and don't
>> find it? In this case I need all of the row from C1:C6.
>>
>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>> test<-round(test,2)
>>
>> #Make array ragged
>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>> test$C6[7]<-0
>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>
>> test
>>
>> #C1 always the same so calculate it only once
>> StartCol <- which(names(test)=="C1")
>>
>> #Print row 3 explicitly
>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>
>> #Row 6 fails because 0 is not found
>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>
>> EndCol <- which(test[6,] == 0.0)[1]-1
>> EndCol
>>
>
> It's getting a bit Baroque, but here is a solution that handles an NA:
>
> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>                               ncol(test),   which(test[6,] == 0.0)[1]-1 )
>             ]
> #####-----
>     C1   C2   C3   C4   C5   C6
> 6 0.33 0.84 0.51 0.86 0.84 0.15
>
>
> Maybe an R-meister can offer something more compact?


So let's wait for some R-meister, I'd write even more ....

Reason: testing for exactly zero after possible calculations is a bit
dangerous. Also, ifelse() is designed for vectorized operations and is not
efficient for scalar operations, particularly since both expressions may
be evaluated, so if() else would be preferable, but we could use min()
instead. Finally, a:b could end up as 5:3 without a warning, so I'd use
seq() instead.

Hence I'd prefer:

temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm = TRUE),
            by = 1)]


Best,
Uwe Ligges
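
Two of the points above (tolerance-based comparison, and seq() instead of a:b) can be seen with a few stand-alone lines. The variable names are illustrative, and the string returned by the error handler is made up for this sketch.

```r
# Exact comparison can fail for computed values; all.equal() allows a tolerance
exact  <- sqrt(2) * sqrt(2) == 2                    # FALSE
approx <- isTRUE(all.equal(sqrt(2) * sqrt(2), 2))   # TRUE

# a:b silently counts down when b < a ...
down <- 5:3                                         # 5 4 3

# ... whereas seq() with by = 1 stops with an error instead
up <- tryCatch(seq(from = 5, to = 3, by = 1),
               error = function(e) "error: wrong direction")

print(list(exact = exact, approx = approx, down = down, up = up))
```

The error from seq() is the useful part here: a row whose first data value is already 0 gets flagged rather than quietly returning columns in reverse order.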



> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>


Re: OK - I got the data - now what? :-)

Mark Knecht
2009/7/5 Uwe Ligges <[hidden email]>:

>
>
> David Winsemius wrote:
>>
>> On Jul 5, 2009, at 12:19 PM, Mark Knecht wrote:
>>
>>> On Sun, Jul 5, 2009 at 8:18 AM, David Winsemius<[hidden email]>
>>> wrote:
>>>>
>>>> On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
>>>>
>>>>>
>>>>>
>>>>> David Winsemius wrote:
>>>>>>
>>>>>> So if your values are calculated from other values then consider using
>>>>>> all.equal()
>>>>>> And repeated applications of the testing criteria process are
>>>>>> effective:
>>>>>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
>>>>>>  C1   C2   C3
>>>>>> 3 0.52 0.66 0.51
>>>>>> (and a warning that does not seem accurate to me.)
>>>>>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
>>>>>>  numerical expression has 3 elements: only the first used
>>>>>
>>>>>
>>>>> David,
>>>>>
>>>>> # which(test[3,] == 0.0)
>>>>> [1] 6 7 8
>>>>>
>>>>> and in a:b a and b must be length 1 vectors (scalars) otherwise just
>>>>> the
>>>>> first element (in this case 6) is used.
>>>>>
>>>>> That leads us to the conclusion that writing the line above is not
>>>>> really
>>>>> the cleanest way or you intended something different ....
>>>>
>>>> Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks
>>>> as though I would not be getting in trouble this way, but a cleaner
>>>> method would be to access only the first element of
>>>> which(test[3, ] == 0):
>>>>
>>>> test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
>>>>
>>>>>
>>>>> David
>>>>
>>>>>> Seems to me that all of the element were used. I cannot explain that
>>>>>> warning but am pretty sure it can be ignored.
>>>>>>
>>>>
>>>> David
>>>
>>> OK - making lots more headway. Thanks for your help.
>>>
>>> QUESTION: How do I handle the case where I'm testing for 0 and don't
>>> find it? In this case I need all of the row from C1:C6.
>>>
>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>> test<-round(test,2)
>>>
>>> #Make array ragged
>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>> test$C6[7]<-0
>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>
>>> test
>>>
>>> #C1 always the same so calculate it only once
>>> StartCol <- which(names(test)=="C1")
>>>
>>> #Print row 3 explicitly
>>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>>
>>> #Row 6 fails because 0 is not found
>>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>>
>>> EndCol <- which(test[6,] == 0.0)[1]-1
>>> EndCol
>>>
>>
>> It's getting a bit Baroque, but here is a solution that handles an NA:
>>
>> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>>                              ncol(test),   which(test[6,] == 0.0)[1]-1 )
>>            ]
>> #####-----
>>    C1   C2   C3   C4   C5   C6
>> 6 0.33 0.84 0.51 0.86 0.84 0.15
>>
>>
>> Maybe an R-meister can offer something more compact?
>
>
> So let's wait for some R-meister, I'd write even more ....
>
> Reason: testing for exactly zero after possible calculations is a bit
> dangerous and ifelse() is designed for vectorized operations but is not
> efficient for scalar operations, particularly since both expressions are
> evaluated, so if() else would be preferable, but we could use min() instead.
> Finally, a:b could end up in 5:3 without a warning and I'd use seq()
> instead.
>
> Hence I'd prefer:
>
> temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)), 0))[1]
> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm =
> TRUE), by = 1)]
>
>

I appreciate both of the answers. I don't completely understand them,
but I do appreciate them. Thanks!

I was wondering whether it's easy to simply test the last column for
==0, and if true run the previous command, if false just return
everything up to the end of the row?
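
That idea could look roughly like the sketch below. It assumes, as described earlier in the thread, that a row which ends early is padded with 0 all the way to the last column. The data frame is invented and get_run is a hypothetical helper name.

```r
# Hypothetical fixed data: row 1 fills every column, row 2 stops after C2
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.33, 0.29), C2 = c(0.84, 0.84),
                   C3 = c(0.51, 0.00), C4 = c(0.86, 0.00))
StartCol <- which(names(test) == "C1")
LastCol  <- ncol(test)

get_run <- function(df, i) {   # hypothetical helper name
  if (df[i, LastCol] == 0) {
    # The row ends early: stop just before the first 0 in the data columns
    vals <- unlist(df[i, StartCol:LastCol])
    first_zero <- which(vals == 0)[1]
    df[i, seq_len(first_zero - 1) + StartCol - 1]
  } else {
    # No trailing 0: the experiment used the whole row
    df[i, StartCol:LastCol]
  }
}

print(get_run(test, 1))
print(get_run(test, 2))
```

Checking only the last column is cheap, and searching only the data columns means a 0 in the information columns cannot be mistaken for the end of an experiment.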

Currently my data is one experiment per row, but that's wasting space
as most experiments only take 20% of the row and 80% of the row is
filled with 0's. I might want to make the array narrower and have a
flag somewhere in the first 10 columns that says this row is a
continuation of the previous row. That way I could pack the
array better, use less memory, and when I do finally test for 0 I'd have
a short line to traverse.

Just an idea.

Anyway, I suspect either of these will suit my short term needs. On to
the next step.

Cheers,
Mark
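
Since the original goal was one plotted line per experiment, stopping at the first 0, one possible next step is to blank out the padding with NA and hand the result to matplot(), as suggested at the start of the thread. This is a sketch on invented data; the matrix name m and the axis labels are assumptions.

```r
# Hypothetical fixed data shaped like the thread's 'test' frame
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.91, 0.29, 0.52), C2 = c(0.34, 0.84, 0.66),
                   C3 = c(0.43, 0.00, 0.51), C4 = c(0.33, 0.00, 0.00))
StartCol <- which(names(test) == "C1")
m <- as.matrix(test[, StartCol:ncol(test)])

# Blank out each row from its first 0 onward; NA values are not drawn
for (i in seq_len(nrow(m))) {
  first_zero <- which(m[i, ] == 0)[1]
  if (!is.na(first_zero)) m[i, first_zero:ncol(m)] <- NA
}

# matplot() draws one line per column, so transpose to get one line per row
if (interactive()) matplot(t(m), type = "o", xlab = "Reading", ylab = "Value")
print(m)
```

The x axis is simply the reading number 1, 2, 3, ..., so no extra row of X values is needed in the data.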


Re: OK - I got the data - now what? :-)

David Winsemius
In reply to this post by Uwe Ligges-3

On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:

>>> [snipped preamble]
>>>
>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>> test<-round(test,2)
>>>
>>> #Make array ragged
>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>> test$C6[7]<-0
>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>
>>> test
>>>
>>> #C1 always the same so calculate it only once
>>> StartCol <- which(names(test)=="C1")
>>>
>>> #Print row 3 explicitly
>>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>>
>>> #Row 6 fails because 0 is not found
>>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>>
>>> EndCol <- which(test[6,] == 0.0)[1]-1
>>> EndCol
>>>
>> It's getting a bit Baroque, but here is a solution that handles an  
>> NA:
>> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>>                              ncol(test),   which(test[6,] == 0.0)[1]-1 )
>>            ]
>> #####-----
>>    C1   C2   C3   C4   C5   C6
>> 6 0.33 0.84 0.51 0.86 0.84 0.15
>> Maybe an R-meister can offer something more compact?
>
>
> So let's wait for some R-meister, I'd write even more ....
>
> Reason: testing for exactly zero after possible calculations is a  
> bit dangerous and ifelse() is designed for vectorized operations but  
> is not efficient for scalar operations, particularly since both  
> expressions are evaluated, so if() else would be preferable, but we  
> could use min() instead. Finally, a:b could end up in 5:3 without a  
> warning and I'd use seq() instead.
>
> Hence I'd prefer:
>
> temp <- which(sapply(test[6,], function(x, y)  
> isTRUE(all.equal(x,y)), 0))[1]

This appears to be a learning moment for me. Do I have it correct that
the first argument to sapply, the vector test[6,], gets passed
element-wise to the first parameter of the function, x, and the second
argument, 0, gets passed via recycling to the second parameter,
y, through the ... mechanism of the sapply function?

> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm  
> = TRUE), by = 1)]

I had tried a min() solution and got Inf in return when there was an  
NA in the vector, but did not realize that it had an na.rm mode.

Thanks for the meisterhaft corrections.
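
The min() behaviour mentioned here can be checked on its own. In this sketch first_zero stands for what which(...)[1] returns when no 0 is found, and n_cols is an assumed column count; both names are illustrative.

```r
first_zero <- NA_integer_   # what which(...)[1] gives when no 0 is found
n_cols <- 8L                # assumed number of columns

with_na    <- min(c(first_zero - 1L, n_cols))                 # NA propagates
without_na <- min(c(first_zero - 1L, n_cols), na.rm = TRUE)   # NA dropped: 8

# With nothing left after dropping NAs, min() warns and returns Inf
all_missing <- suppressWarnings(min(first_zero, na.rm = TRUE))

print(c(with_na = with_na, without_na = without_na, all_missing = all_missing))
```

Including ncol(test) in the vector passed to min() is what keeps the result finite when the row has no 0.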

>
>
> Best,
> Uwe Ligges

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: OK - I got the data - now what? :-)

Uwe Ligges-3


David Winsemius wrote:

>
> On Jul 5, 2009, at 1:19 PM, Uwe Ligges wrote:
>
>>>> [snipped preamble]
>>>>
>>>> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
>>>> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
>>>> test<-round(test,2)
>>>>
>>>> #Make array ragged
>>>> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
>>>> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
>>>> test$C6[7]<-0
>>>> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>>>>
>>>> test
>>>>
>>>> #C1 always the same so calculate it only once
>>>> StartCol <- which(names(test)=="C1")
>>>>
>>>> #Print row 3 explicitly
>>>> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>>>>
>>>> #Row 6 fails because 0 is not found
>>>> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>>>>
>>>> EndCol <- which(test[6,] == 0.0)[1]-1
>>>> EndCol
>>>>
>>> It's getting a bit Baroque, but here is a solution that handles an NA:
>>> test[6,][StartCol :ifelse(is.na( which(test[6,] == 0.0)[1]) ,
>>>                              ncol(test),   which(test[6,] == 0.0)[1]-1 )
>>>            ]
>>> #####-----
>>>    C1   C2   C3   C4   C5   C6
>>> 6 0.33 0.84 0.51 0.86 0.84 0.15
>>> Maybe an R-meister can offer something more compact?
>>
>>
>> So let's wait for some R-meister, I'd write even more ....
>>
>> Reason: testing for exactly zero after possible calculations is a bit
>> dangerous and ifelse() is designed for vectorized operations but is
>> not efficient for scalar operations, particularly since both
>> expressions are evaluated, so if() else would be preferable, but we
>> could use min() instead. Finally, a:b could end up in 5:3 without a
>> warning and I'd use seq() instead.
>>
>> Hence I'd prefer:
>>
>> temp <- which(sapply(test[6,], function(x, y) isTRUE(all.equal(x,y)),
>> 0))[1]
>
> This appears to be a learning moment for me. Do I have it correct that
> the first argument to sapply, the vector test[6,], gets passed
> element-wise to the first parameter of the function, x,


Yes.


> and the second
> argument, 0, gets passed via recycling to the second parameter, y,
> through the ... mechanism of the sapply function?


No, each time the whole thing (which is just 0 here) is passed on to the
function as is, not via recycling.
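
A small stand-alone check makes the difference visible. The two-element vector c(10, 20) and the named vector vals are made up for illustration.

```r
# The extra argument c(10, 20) is handed to every call as a whole
lens <- sapply(1:3, function(x, y) length(y), c(10, 20))
print(lens)    # y has both elements in each of the three calls

# Applied to the thread's case: each element is compared against the same 0
vals <- c(C1 = 0.52, C2 = 0.66, C3 = 0, C4 = 0)   # hypothetical row values
is_zero <- sapply(vals, function(x, y) isTRUE(all.equal(x, y)), 0)
print(which(is_zero)[1])
```

If recycling were involved, y would arrive one element at a time in the first call; instead its length is 2 every time.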



>> test[6, seq(from = StartCol, to = min(c(temp - 1, ncol(test)), na.rm =
>> TRUE), by = 1)]
>
> I had tried a min() solution and got Inf in return when there was an NA
> in the vector, but did not realize that it had an na.rm mode.
>
> Thanks for the meisterhaft corrections.


:-)


Uwe


>>
>>
>> Best,
>> Uwe Ligges
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>


Re: OK - I got the data - now what? :-)

Henrique Dallazuanna
In reply to this post by Mark Knecht
Try this:

subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]

subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]



On Sun, Jul 5, 2009 at 1:19 PM, Mark Knecht <[hidden email]> wrote:

> On Sun, Jul 5, 2009 at 8:18 AM, David Winsemius<[hidden email]>
> wrote:
> >
> > On Jul 5, 2009, at 10:50 AM, Uwe Ligges wrote:
> >
> >>
> >>
> >> David Winsemius wrote:
> >>>
> >>> So if your values are calculated from other values then consider using
> >>> all.equal()
> >>> And repeated applications of the testing criteria process are
> effective:
> >>> test[3,][which(names(test)=="C1"):(which(test[3,] == 0.0)-1)]
> >>>   C1   C2   C3
> >>> 3 0.52 0.66 0.51
> >>> (and a warning that does not seem accurate to me.)
> >>> In which(names(test) == "C1"):(which(test[3, ] == 0) - 1) :
> >>>  numerical expression has 3 elements: only the first used
> >>
> >>
> >> David,
> >>
> >> # which(test[3,] == 0.0)
> >> [1] 6 7 8
> >>
> >> and in a:b a and b must be length 1 vectors (scalars) otherwise just the
> >> first element (in this case 6) is used.
> >>
> >> That leads us to the conclusion that writing the line above is not
> really
> >> the cleanest way or you intended something different ....
> >
> > Thanks, Uwe. I see my confusion. I did want 6 to be used and it looks as
> > though I would not be getting in trouble this way, but a cleaner method
> > would be to access only the first element of which(test[3, ] == 0):
> >
> > test[3,][ which(names(test) == "C1") : (which(test[3,] == 0.0)[1]-1) ]
> >
> >>
> >> David
> >
> >>> Seems to me that all of the element were used. I cannot explain that
> >>> warning but am pretty sure it can be ignored.
> >>>
> >
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> >
> >
>
> OK - making lots more headway. Thanks for your help.
>
> QUESTION: How do I handle the case where I'm testing for 0 and don't
> find it? In this case I need all of the row from C1:C6.
>
> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
> test<-round(test,2)
>
> #Make array ragged
> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
> test$C6[7]<-0
> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>
> test
>
> #C1 always the same so calculate it only once
> StartCol <- which(names(test)=="C1")
>
> #Print row 3 explicitly
> test[3,][StartCol :(which(test[3,] == 0.0)[1]-1)]
>
> #Row 6 fails because 0 is not found
> test[6,][StartCol :(which(test[6,] == 0.0)[1]-1)]
>
> EndCol <- which(test[6,] == 0.0)[1]-1
> EndCol
>
> Thanks,
> Mark
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: OK - I got the data - now what? :-)

Mark Knecht
On Sun, Jul 5, 2009 at 12:30 PM, Henrique Dallazuanna<[hidden email]> wrote:
> Try this:
>
> subset(test[3,], select=C1:C6)[,subset(test[3,], select = C1:C6) > 0]
>
> subset(test[6,], select=C1:C6)[,subset(test[6,], select = C1:C6) > 0]
>
>

I must admit I like this one. Pleasing to look at. It seems
approachable. Thanks!

If I understand this the second subset gets evaluated first producing
either TRUE or FALSE, and then the first subset gets evaluated but
only for the entries that are TRUE? Is that the process?

Thanks,
Mark
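
Splitting the one-liner into steps may help answer that. The sketch below uses an invented two-row data frame, and the names vals, keep and picked are chosen here. It also uses unlist() and drop = FALSE, which are small departures from the code as posted.

```r
# Hypothetical fixed data: row 1 ends with a single padding 0
test <- data.frame(A = 1:2, B = 100,
                   C1 = c(0.52, 0.33), C2 = c(0.66, 0.84),
                   C3 = c(0.51, 0.51), C4 = c(0.00, 0.86))

vals   <- subset(test[1, ], select = C1:C4)   # data columns of one row
keep   <- unlist(vals) > 0                    # one TRUE/FALSE per column
picked <- vals[, keep, drop = FALSE]          # only the columns marked TRUE

print(keep)
print(picked)
```

So yes, the inner comparison is evaluated first and yields one TRUE or FALSE per column, and the outer subscript keeps the TRUE columns. Note that "> 0" drops every non-positive value, not only the padding after the first 0, so it is equivalent only as long as real readings are always positive.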


Re: OK - I got the data - now what? :-)

Henrique Dallazuanna
Yes,

First, select only columns C1 to C6, then look for values greater than 0,
and then use that result to select the columns from the original subset.
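
The same steps can be run one at a time to see what each returns. This is only a sketch: `row3` is a made-up stand-in for test[3,] (the thread's data is random), and `vals`, `keep` and `result` are names chosen here for illustration.

```r
# Hypothetical stand-in for test[3, ]: three readings, then padding zeros
row3 <- data.frame(A = 3, B = 100, C1 = 0.52, C2 = 0.66, C3 = 0.51,
                   C4 = 0, C5 = 0, C6 = 0)

# Step 1: keep only the measurement columns
vals <- subset(row3, select = C1:C6)

# Step 2: compare each cell with 0, giving one TRUE/FALSE per column
keep <- unlist(vals) > 0

# Step 3: use the logical vector to pick columns;
# drop = FALSE keeps a data frame even if only one column survives
result <- vals[, keep, drop = FALSE]
print(result)
```

Note that the test `> 0` keeps every positive cell rather than stopping at the first zero, so it assumes that real readings are always strictly positive.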

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Re: OK - I got the data - now what? :-)

Mark Knecht
On Sun, Jul 5, 2009 at 1:00 PM, Henrique Dallazuanna<[hidden email]> wrote:

> Yes,
>
> First, select only columns C1 to C6, then look for values greater than 0,
> and then use that result to select the columns from the original subset.

Thanks for the further explanation.

One small difference in this approach is that in the general case I
have to supply the name of the last column whereas the other just
starts at the beginning and goes until it's done. No big deal and
possibly an advantage as I could search a subset of the data on the
row, i.e. supply both the start and stop columns, for instance
C61:C120. This could be valuable as each column generally represents 1
minute further into the experiment, so that range would look at the
second hour only.
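
A rough sketch of how the two ideas could be combined: name the start and stop columns, but still cut the row off just before the first 0, and fall back to the whole range when no 0 is found. The helper `row_segment()` and the small fixed data frame are hypothetical, written only for illustration, and the helper assumes both column names exist in the data frame.

```r
# Hypothetical helper: readings of one row between two named columns,
# cut off just before the first 0 (or the whole range if there is no 0)
row_segment <- function(df, row, first, last) {
  cols <- which(names(df) == first):which(names(df) == last)
  vals <- unlist(df[row, cols])
  zero_at <- which(vals == 0)
  if (length(zero_at) > 0) {
    vals <- vals[seq_len(zero_at[1] - 1)]
  }
  vals
}

# Small fixed example with made-up numbers
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.31, 0.52, 0.78), C2 = c(0.45, 0.66, 0.12),
                   C3 = c(0.00, 0.51, 0.94), C4 = c(0.00, 0.00, 0.37))

row_segment(test, 1, "C1", "C4")  # stops before the 0 in C3
row_segment(test, 3, "C1", "C4")  # no 0 found: all four readings
row_segment(test, 2, "C2", "C3")  # a sub-range of columns
```

Checking length(zero_at) before indexing avoids the NA that which(...)[1] returns when the row contains no 0.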

Cheers,
Mark

Re: OK - I got the data - now what? :-)

hadley wickham
In reply to this post by Mark Knecht
>   I think the root cause of a number of my coding problems in R right
> now is my lack of skills in reading and grabbing portions of the data
> out of arrays. I'm new at this. (And not a programmer) I need to find
> some good examples to read and test on that subject. If I could locate
> which column was called C1, then read row 3 from C1 up to the last
> value before a 0, I'd have proper data to plot for one line. Repeat as
> necessary through the array and I get all the lines. Doing the lines
> one at a time should allow me the opportunity to apply color or not
> plot based on values in the first few columns.
>
> Thanks,
> Mark
>
> test <- data.frame(A=1:10, B=100, C1=runif(10), C2=runif(10),
> C3=runif(10), C4=runif(10), C5=runif(10), C6=runif(10))
> test<-round(test,2)
>
> #Make array ragged
> test$C3[2]<-0;test$C4[2]<-0;test$C5[2]<-0;test$C6[2]<-0
> test$C4[3]<-0;test$C5[3]<-0;test$C6[3]<-0
> test$C6[7]<-0
> test$C4[8]<-0;test$C5[8]<-0;test$C6[8]<-0
>
> #Print array
> test

Are the zeros always going to be arranged like this, i.e. for each
experiment there is a point at which all later values are zero?  If
so, the following is a much simpler way of getting to the core of your
data, without fussing with overly complicated matrix indexing:

library(reshape)
testm <- melt(test, id = c("A", "B"))
subset(testm, value > 0)

I suspect you will also find this form easier to plot and analyse.
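
For readers without the reshape package, a similar long layout can be built in base R and then drawn one line per experiment. This is a sketch under assumptions, not the melt() call above: the data frame holds made-up numbers, and `reading_cols`, `long` and `step` are names invented here.

```r
# Made-up wide data: two id columns, then readings padded with zeros
test <- data.frame(A = 1:3, B = 100,
                   C1 = c(0.31, 0.52, 0.78), C2 = c(0.45, 0.66, 0.12),
                   C3 = c(0.00, 0.51, 0.94), C4 = c(0.00, 0.00, 0.37))

reading_cols <- grep("^C[0-9]+$", names(test), value = TRUE)

# One row per (experiment, reading); a base-R stand-in for melt()
long <- data.frame(
  A     = rep(test$A, times = length(reading_cols)),
  step  = rep(seq_along(reading_cols), each = nrow(test)),
  value = unlist(test[reading_cols], use.names = FALSE)
)
long <- subset(long, value > 0)

# One line per experiment, coloured by A (drawn only in an interactive session)
if (interactive()) {
  plot(range(long$step), range(long$value), type = "n",
       xlab = "reading", ylab = "value")
  for (id in unique(long$A)) {
    one <- long[long$A == id, ]
    lines(one$step, one$value, col = id)
  }
}
```

The `step` column plays the role of the X axis, so the wide data needs no extra 1, 2, 3, ... row.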

Hadley

--
http://had.co.nz/
