

Hi guys,
I'm a newbie, so sorry for the easy question.
I have a matrix (459x28) in which a large number of observations are
repeated (the same places sampled at different times).
One of the columns holds the ID of the sampling place.
What I would like is to extract a sub-matrix for every sampling point.
I can do it manually, e.g. x1 <- data.frame(dataset[dataset$ID=="x1",])
but is it possible to write a script and let R do it?
That way I would get n sub-matrices, one for each of the n IDs found in the original column.
Cheers
Matteo
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Hello,
Try the following.
result <- lapply(unique(dataset$ID), function(uid) dataset[dataset$ID ==
uid, ])
names(result) <- unique(dataset$ID)
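As a rough illustration of the two lines above, here is a minimal sketch on a made-up data frame. The toy `dataset` (its values are borrowed from the small example posted later in this thread) and the helper variable `ids` are assumptions for the demo, not part of the original suggestion.

```r
# Toy data standing in for the real 459x28 data set (assumed for illustration)
dataset <- data.frame(
  ID     = c("x1", "x1", "x1", "x2", "x3", "x3"),
  value1 = c(10, 12, 11, 15, 11, 13),
  value2 = c(12, 22, 9, 10, 11, 8),
  stringsAsFactors = FALSE
)

# One data frame per distinct ID, collected in a list
ids <- unique(dataset$ID)
result <- lapply(ids, function(uid) dataset[dataset$ID == uid, ])

# Label each list element with the ID it holds
names(result) <- ids

names(result)         # the IDs, in order of first appearance
nrow(result[["x1"]])  # number of rows sampled at place x1
```

The assignment to names() prints nothing; it only attaches labels, which then allow access by ID.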
Hope this helps,
Rui Barradas


First of all, is your data structure a matrix or a data frame? They
are different!
Assuming the latter, a shorter version of Rui's answer that avoids
unique() and automatically takes care of names is:
result <- by(dataset, dataset$ID, I)
See ?by, ?tapply, and ?split
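A minimal sketch of the point above, assuming a small made-up data frame called `dataset` (the values are illustrative only). It first checks that the object really is a data frame, since `dataset$ID` is an error on a matrix, and then compares what by() and split() return.

```r
# Toy data frame (assumed for illustration)
dataset <- data.frame(
  ID     = c("x1", "x1", "x1", "x2", "x3", "x3"),
  value1 = c(10, 12, 11, 15, 11, 13),
  value2 = c(12, 22, 9, 10, 11, 8),
  stringsAsFactors = FALSE
)

# `$` works on data frames but not on matrices, so check first;
# a matrix could be converted with as.data.frame()
stopifnot(is.data.frame(dataset))

# Group the rows by ID; I() returns each piece unchanged
by_result    <- by(dataset, dataset$ID, I)
split_result <- split(dataset, dataset$ID)

class(by_result)     # "by"
class(split_result)  # "list"
names(by_result)     # group labels are attached automatically
names(split_result)
```

Both calls name the pieces after the IDs without a separate names() step; the difference is that by() wraps the pieces in an object of class "by", while split() returns a plain list.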
-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 4677374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdbfunctionalgroups/pdbbiostatistics/pdbncbhome.htm


Hello,
You don't have a sub-data.frame; what you have is a list, with each
element of that list a data.frame. Try looking at, for instance, result[[1]].
This should be the data.frame corresponding to the first ID.
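A short sketch of how the list can be inspected, assuming a made-up `dataset` with the same shape as the example data discussed in this thread. The variable names `first` and `same` are only for the demo.

```r
# Toy data frame (assumed for illustration)
dataset <- data.frame(
  ID     = c("x1", "x1", "x1", "x2", "x3", "x3"),
  value1 = c(10, 12, 11, 15, 11, 13),
  value2 = c(12, 22, 9, 10, 11, 8),
  stringsAsFactors = FALSE
)

result <- lapply(unique(dataset$ID), function(uid) dataset[dataset$ID == uid, ])
names(result) <- unique(dataset$ID)

class(result)            # "list"
first <- result[[1]]     # element picked by position
same  <- result[["x1"]]  # element picked by name, once names() is set
identical(first, same)   # TRUE

# Compact overview: one line per list element
str(result, max.level = 1)
```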
Rui Barradas
On 24-06-2013 18:03, matteo wrote:
> Hi,
>
>> result <- lapply(unique(dataset$ID), function(uid) dataset[dataset$ID
>> == uid, ])
> Ok, I have the element result as a list
>
>> names(result) <- unique(dataset$ID)
> Nothing happens. I don't have any submatrix...
>
> Matteo


On Jun 24, 2013, at 10:03 AM, matteo wrote:
> Hi,
>
>> result <- lapply(unique(dataset$ID), function(uid) dataset[dataset$ID == uid, ])
> Ok, I have the element result as a list
>
>> names(result) <- unique(dataset$ID)
> Nothing happens. I don't have any submatrix...
What were you expecting to happen? You just assigned names to the result. You should be able to execute:
names(result)
Or:
str(result)

David Winsemius
Alameda, CA, USA


Hello,
I had forgotten a much simpler solution. The following should do it.
split(dataset, dataset$ID)
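A minimal sketch of split() on a made-up data frame (the toy `dataset`, the variable `pieces`, and the extra level "x4" are assumptions for illustration). It also shows one thing to watch for: when the grouping column is a factor with levels that do not occur in the data, split() returns an empty piece for each of them unless drop = TRUE is given.

```r
# Toy data frame (assumed for illustration)
dataset <- data.frame(
  ID     = c("x1", "x1", "x1", "x2", "x3", "x3"),
  value1 = c(10, 12, 11, 15, 11, 13),
  value2 = c(12, 22, 9, 10, 11, 8),
  stringsAsFactors = FALSE
)

# A named list with one data frame per ID
pieces <- split(dataset, dataset$ID)
names(pieces)
pieces$x3

# A factor with an unused level "x4" (hypothetical) yields an empty piece
dataset$ID <- factor(dataset$ID, levels = c("x1", "x2", "x3", "x4"))
length(split(dataset, dataset$ID))               # 4
length(split(dataset, dataset$ID, drop = TRUE))  # 3
```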
Rui Barradas


Oh yes, that's even better!
-- Bert


First of all, thanks for all the replies!!
What you have written helps, but is not entirely the answer to my problem.
What I'd like is to create new data.frames, each one named after an
ID in the original data frame and containing all the columns.
For example, in the original dataframe one column (ID) has 5 different
elements:
ID value1 value2
x1 10 12
x1 12 22
x1 11 9
x2 15 10
x3 11 11
x3 13 8
I need a command able to split the data frame into smaller,
separate data frames, so that they look like
x1 is
ID value1 value2
x1 10 12
x1 12 22
x1 11 9
x2 is
ID value1 value2
x2 15 10
and x3 is
ID value1 value2
x3 11 11
x3 13 8
Sorry if I'm not able to explain it better; as I said, I'm very new to
R.....
Thanks
Matteo


Inline below...
On Mon, Jun 24, 2013 at 11:31 AM, matteo < [hidden email]> wrote:
> First of all, thanks for all the replies!!
> What you have written helps, but is not entirely the answer to my problem.
>
> What I'd have is the creation of new data.frames each of one named with the
> ID of the original dataframe and with all the columns.
No you don't! You want what you were provided, a list of data frames.
Anything you want to do can be done, probably more conveniently, with
that.
Read "An Introduction to R" and learn to work with lists. Being a
newbie is no excuse for not making an effort to learn.
-- Bert


> First of all, thanks for all the replies!!
> What you have written helps, but is not entirely the answer to my problem.
>
> What I'd have is the creation of new data.frames each of one named with
> the ID of the original dataframe and with all the columns.
What was suggested gave you a list of data.frames, each named with the ID .
You can use the syntax list$name or list[["name"]] to refer to a data.frame.
R> splitData <- split(allData, allData$ID)
R> splitData$x1
  ID value1 value2
1 x1     10     12
2 x1     12     22
3 x1     11      9
R> splitData$x2
  ID value1 value2
4 x2     15     10
You seem to want a function that creates a bunch of data.frames in the
current environment instead of one that creates them in a list created to
hold them. This is not necessary and actually gets in the way most of the
time.
If you want to refer to 'x1' instead of 'splitData$x1' you can use 'with', as in
R> with(splitData, mean(x1$value2) - mean(x2$value2))
[1] 4.333333
instead of the slightly wordier
R> mean(splitData$x1$value2) - mean(splitData$x2$value2)
[1] 4.333333
If you want to process each sub-data.frame (these are data.frames, not matrices)
you can use lapply() or sapply() or vapply() on the list
R> dm <- sapply(splitData, function(x) mean(x$value2) - mean(x$value1))
R> dm
       x1        x2        x3
 3.333333 -5.000000 -2.500000
R> dm["x2"]
x2
-5
If you put all those names into the current environment you stand the chance
of clobbering some other dataset whose name matched one of the entries in
allData$ID. Also you would have to use some rather ugly code involving get()
and assign() to manipulate the objects. Learn to love lists.
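To make the contrast concrete, here is a rough sketch that counts the rows per ID both ways. The toy `allData`, the scratch environment `scratch`, and the names `rows_list` and `rows_env` are assumptions for illustration; the scratch environment is used only so that the demo cannot overwrite anything in the workspace.

```r
# Toy data frame (assumed for illustration)
allData <- data.frame(
  ID     = c("x1", "x1", "x1", "x2", "x3", "x3"),
  value1 = c(10, 12, 11, 15, 11, 13),
  value2 = c(12, 22, 9, 10, 11, 8),
  stringsAsFactors = FALSE
)
splitData <- split(allData, allData$ID)

# List version: one call covers every ID
rows_list <- vapply(splitData, nrow, integer(1))

# Separate-objects version: create one variable per ID with assign(),
# then look each one up again by name with get()
scratch <- new.env()
for (id in names(splitData)) {
  assign(id, splitData[[id]], envir = scratch)
}
rows_env <- vapply(
  ls(scratch),
  function(id) nrow(get(id, envir = scratch)),
  integer(1)
)

rows_list
rows_env
```

Both give the same counts, but the second version needs the object names as strings at every step, which is the bookkeeping the list avoids.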
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

