Split dataframe into new dataframes

Johannes Radinger
Hi,

I want to split a data frame based on a grouping variable (in one column). The resulting new
data frames should each be stored in a new variable. I tried to split the data frame using split()
and to store the pieces using a for loop, but that's not working so far:

df <- data.frame(A=c("A1","A1","A2","A2"),B=seq(1:4))

Fsplit <- function(x,y){
        ls <- split(x,f=x$y)
        for (i in names(ls)){
                i <- ls$i
        }
}

Fsplit(df,A) #1st argument is dataframe to split, 2nd argument grouping variable
 

Any suggestions on how to get that done?

Best regards
Johannes

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Split dataframe into new dataframes

David Winsemius

On Feb 8, 2012, at 4:11 PM, Johannes Radinger wrote:

It appears you want the names of the levels of df$A to be the names of
separate variables in the global environment. If that is correct, then
see the FAQ. I'm not sure which entry in the Miscellaneous section it
is, but you should be looking for the one that tells you how to
construct a named variable.

Or:

? assign
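A minimal sketch of the ?assign route, assuming the goal is one workspace variable per level of df$A. The loop variable `nm` and the intermediate list `pieces` are illustrative names, not from the thread, and `B = 1:4` is used in place of the equivalent `seq(1:4)`.

```r
# Example data from the original post
df <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)

# split() returns a named list of data frames, one per level of A
pieces <- split(df, df$A)

# assign() creates a variable whose name is given as a string
for (nm in names(pieces)) {
  assign(nm, pieces[[nm]], envir = .GlobalEnv)
}

A1
#    A B
# 1 A1 1
# 2 A1 2
```

By contrast, the loop in the original post only rebinds the local loop variable `i`, and `ls$i` looks for a list element literally named "i"; assign() takes the target name as a string instead.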

--
David Winsemius, MD
West Hartford, CT

Re: Split dataframe into new dataframes

Johannes Radinger

On 08.02.2012, at 22:19, David Winsemius wrote:

Your hint about the global environment put me on the right track. It seems that this task can be done with list2env(), although there is still a problem with my function. How
can I pass the name of the data frame and the column name to the function?

df <- data.frame(A=c("A1","A1","A2","A2"),B=seq(1:4))

Fsplit <- function(x,y){
        ls <- split(x,f=x$y)
        list2env(ls,envir = .GlobalEnv)
}

Fsplit(df,A)

/johannes
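One way to keep the bare-name call Fsplit(df, A) working is to capture the unevaluated argument with substitute() and turn it into a string with deparse(). This is a sketch that was not proposed in the thread; the `envir` argument is an assumption added for flexibility.

```r
df <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)

# Variant of Fsplit() accepting a bare column name, e.g. Fsplit(df, A)
Fsplit <- function(x, y, envir = .GlobalEnv) {
  col <- deparse(substitute(y))    # the symbol A becomes the string "A"
  pieces <- split(x, f = x[[col]]) # look the column up by its name
  list2env(pieces, envir = envir)  # one variable per group: A1, A2
  invisible(names(pieces))         # names of the variables created
}

Fsplit(df, A)
A2
#    A B
# 3 A2 3
# 4 A2 4
```

Capturing names this way is convenient interactively but fragile: if Fsplit() is called from another function that passes the column name in a variable, substitute() sees that variable's name rather than its value.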

Re: Split dataframe into new dataframes

David Winsemius

On Feb 8, 2012, at 5:06 PM, Johannes Radinger wrote:

I still have not figured out what you really want to do. The simple
answer to what you asked for in your written request is:

dfvar <- split(df, df$A)

So what is it about that result that is not useful for your (as yet
unstated) destination?

> split(df, df$A)
$A1
   A B
1 A1 1
2 A1 2

$A2
   A B
3 A2 3
4 A2 4
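As an illustration of David's point, the list returned by split() can often be used as is: each group can be processed by name or with sapply()/lapply(), without creating separate variables. Summing column B is an arbitrary example operation.

```r
df <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)
dfvar <- split(df, df$A)

# Apply the same operation to every group
sapply(dfvar, function(d) sum(d$B))
# A1 A2
#  3  7

# A single group is still available by name
dfvar[["A2"]]
#    A B
# 3 A2 3
# 4 A2 4
```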






David Winsemius, MD
West Hartford, CT


Re: Split dataframe into new dataframes

Johannes Radinger

On 08.02.2012, at 23:47, David Winsemius wrote:

Sorry for not being clear enough; you are right that "split(df, df$A)" is what I want.
In addition, I then want to store each element of the list as a new data frame,
where the variable name is the name of the list element (which can be done with list2env()).
Is that clear enough so far?

What I want exactly is to combine those two operations (split, list2env) in
one function. I need the function for other tasks in R.

/johannes

Re: Split dataframe into new dataframes

David Winsemius

On Feb 8, 2012, at 6:29 PM, Johannes Radinger wrote:

If you want to put that list into an environment, that's fine with me. Or
you can access the pieces from the split list of data frames, dfvar,
using with():

> with(dfvar, A1)
   A B
1 A1 1
2 A1 2

Note: with() does not work well inside other functions.
For programming purposes this might be safer:

> new.env <- environment()
> list2env(dfvar, new.env)
<environment: R_GlobalEnv>
> new.env$A1
   A B
1 A1 1
2 A1 2
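At top level, environment() returns the global environment, as the printed <environment: R_GlobalEnv> above shows, so the transcript creates A1 and A2 as ordinary workspace variables. If the aim is a separate container, new.env() creates a fresh environment. A sketch, with the variable renamed to `e` (an arbitrary choice) so it does not reuse the name of the new.env() function:

```r
df <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)
dfvar <- split(df, df$A)

e <- new.env()                # empty environment, separate from the workspace
list2env(dfvar, envir = e)    # returns e; A1 and A2 now live inside it

ls(e)                         # [1] "A1" "A2"
e$A1                          # or get("A1", envir = e)
```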




David Winsemius, MD
West Hartford, CT

Re: Split dataframe into new dataframes

Michael Weylandt
In reply to this post by Johannes Radinger
You can, of course, write your own function to do all of this "in one step":

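split2env <- function(x, f, drop = FALSE, ...){
     # split x by f, then copy each piece into an environment;
     # further arguments (e.g. envir = .GlobalEnv) are passed to list2env()
     list2env(split(x, f, drop), ...)
}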
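A short usage sketch for split2env(), repeating the definition so the snippet runs on its own. When no envir argument is supplied, list2env() creates and returns a new environment; passing envir = .GlobalEnv instead writes the pieces into the workspace, which is what Johannes asked for.

```r
split2env <- function(x, f, drop = FALSE, ...) {
  list2env(split(x, f, drop), ...)
}

df <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)

# Default: the pieces go into a new environment, which is returned
e <- split2env(df, df$A)
ls(e)                                     # [1] "A1" "A2"

# Explicit target: create A1 and A2 in the global environment
split2env(df, df$A, envir = .GlobalEnv)
A1
```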

But why do you need this for other tasks as a one-liner?

M


On Wed, Feb 8, 2012 at 6:29 PM, Johannes Radinger <[hidden email]> wrote:


Re: Split dataframe into new dataframes

glsnow
In reply to this post by Johannes Radinger
I think one of your problems (the others have been addressed by
other posters) is that you want the expression x$y to represent the column of x
whose name is stored in y, not a column literally named "y". The problem here
is that the $ notation is a magical shortcut, and like any other magic,
if used incorrectly it is likely to do the programmatic equivalent of
turning yourself into a toad. It is better to use [[ instead of the
shortcut: try replacing x$y with x[[y]] and see if that fixes that
problem.
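A sketch of the function with that change applied. Because x[[y]] evaluates y, the column name then has to be passed as a string, Fsplit(df, "A"); with a bare A, R would look for an object called A and fail if none exists. The `envir` argument and the invisible return value are additions for illustration.

```r
df <- data.frame(A = c("A1", "A1", "A2", "A2"), B = 1:4)

# y is a character string naming the grouping column, e.g. "A"
Fsplit <- function(x, y, envir = .GlobalEnv) {
  pieces <- split(x, f = x[[y]])   # [[ uses the value of y, unlike $
  list2env(pieces, envir = envir)  # one variable per group: A1, A2
  invisible(names(pieces))         # names of the variables created
}

Fsplit(df, "A")
A1
#    A B
# 1 A1 1
# 2 A1 2
```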

On Wed, Feb 8, 2012 at 2:11 PM, Johannes Radinger <[hidden email]> wrote:


--
Gregory (Greg) L. Snow Ph.D.
[hidden email]
