Split a data.frame

Bogaso
Hi,

I am struggling to split a data.frame according to the scheme below:

DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF

split_str = c('a', 'c')

Now, for each element in split_str, R should find the row of DF that
contains that element, and return the rows of DF starting from the row
after that element and ending with the row before the next element of
split_str.

So in my case, I should see 2 data.frames

1st data.frame with name = 'v' (i.e. the 2nd row of DF)

2nd data.frame with 0 rows (as there is no row left after 'c')

Similarly, if split_str = c('v'), then my 2 data.frames will be

1st data.frame with name = 'a'
2nd data.frame with name = 'c'

Any idea how to implement the above scheme efficiently would be highly
appreciated. I tried the split() function, but it does not give the
right answer.

Thanks,

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Split a data.frame

Rui Barradas
Hello,

Maybe something like the following.

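splitDF <- function(data, col, s){
     n <- nrow(data)
     inx <- which(data[[col]] %in% s)
     # each piece ends just before the next match, or at the last row
     ends <- c(inx[-1] - 1, n)
     lapply(seq_along(inx), function(i){
         # seq_len() gives an empty index when no rows lie in between
         k <- inx[i] + seq_len(ends[i] - inx[i])
         data[k, , drop = FALSE]
     })
}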

splitDF(DF, "name", split_str)


Hope this helps,

Rui Barradas

In reply to the post by Christofer Bogaso of 5/19/2018 12:07 PM.

Re: Split a data.frame

jholtman
DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF
##   name val
## 1    a   0
## 2    v   0
## 3    c   0
split_str = c('a', 'c')
# If we assume that the values in split_str are ordered in the same order
# as in the dataframe, then this might work.

offsets <- match(split_str, DF$name)
# Since you only want the rows in between

DF[(offsets[1] + 1):(offsets[2] - 1), ]
##   name val
## 2    v   0


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

In reply to the post by Rui Barradas of Sat, May 19, 2018 at 7:58 AM.

Re: Split a data.frame

Bert Gunter-2
In reply to this post by Rui Barradas
...
yes, but note that:

which(data[[col]] %in% s)

can be replaced by match, which gives the row of the first match for
each element of s:

match(s, data[[col]])

Corner cases (nothing matches, etc.) would also have to be checked, and
the matched row numbers should probably be sorted for safety.
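
To make the difference concrete, here is a small sketch (the names `x`, `s`, `rows_in`, `rows_match` and `rows_clean` are only for illustration and are not from the thread). `which(x %in% s)` returns every matching row in data order, whereas `match(s, x)` returns the first matching row for each element of `s`, in the order of `s`, with `NA` where nothing matches. That is why the sort and the `NA` check mentioned above may be needed.

```r
x <- c("b", "a", "v", "z", "c", "d")   # stand-in for data[[col]]
s <- c("c", "a", "q")                  # deliberately unordered; "q" is absent

rows_in <- which(x %in% s)             # rows of x whose value is in s, in data order
rows_match <- match(s, x)              # first row of x for each element of s

# drop the non-matches and sort so the row numbers are increasing
rows_clean <- sort(rows_match[!is.na(rows_match)])

print(rows_in)
print(rows_match)
print(rows_clean)
```

With duplicated values in `x`, the two approaches would also differ in length, since `match` reports only the first occurrence.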

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

Re: Split a data.frame

jholtman
In reply to this post by Bogaso
Forgot to take care of the boundary conditions:

# revised data.frame to take care of boundary conditions
DF = data.frame(name = c('b', 'a','v','z', 'c','d'), val = 0); DF
##   name val
## 1    b   0
## 2    a   0
## 3    v   0
## 4    z   0
## 5    c   0
## 6    d   0
split_str = c('a', 'c')

# If we assume that the values in split_str are ordered in
# the same order as in the dataframe, then this might work.
offsets <- match(split_str, DF$name)

# now find the values in between the offsets
ret_indx <- NULL
for (i in seq_len(length(offsets) - 1)){
  if (offsets[i + 1] - offsets[i] > 1){  # something in between
    ret_indx <- c(ret_indx, (offsets[i] + 1):(offsets[i+1] - 1))
  }
}
DF[ret_indx, ]
##   name val
## 3    v   0
## 4    z   0
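
The loop above returns the in-between rows as a single data.frame. If separate pieces are wanted, as in the original question, one possible sketch uses `split()` together with `cumsum()`. The helper name `split_after` is hypothetical, and the sketch assumes that rows before the first matching element should be dropped and that the last piece runs to the end of the data.

```r
# split_after() is a hypothetical helper name, not something from the thread
split_after <- function(data, col, s) {
  is_delim <- data[[col]] %in% s
  # number of delimiter rows seen so far; 0 means "before the first delimiter"
  grp <- cumsum(is_delim)
  # keep non-delimiter rows that come after at least one delimiter
  keep <- !is_delim & grp > 0
  # explicit factor levels keep empty pieces as 0-row data.frames
  split(data[keep, , drop = FALSE],
        factor(grp[keep], levels = seq_len(sum(is_delim))))
}

DF <- data.frame(name = c("a", "v", "c"), val = 0)
pieces <- split_after(DF, "name", c("a", "c"))
pieces
```

For the example in the question this gives two pieces: one holding the row with name 'v' and one with zero rows. Dropping the `grp > 0` condition (and adding a level 0) would also keep the rows before the first match, which is what the `split_str = c('v')` example seems to ask for.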



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Re: Split a data.frame

K. Elo-2
In reply to this post by Bogaso
Hi!

How about this:

--- snip ---

for (i in seq_len(length(split_str) - 1)) {
    first <- which(DF$name == split_str[i])[1] + 1      # row after this element
    last  <- which(DF$name == split_str[i + 1])[1] - 1  # row before the next one
    # seq_len() gives an empty index when no rows lie in between
    assign(paste("DF", i, sep = ""), DF[first - 1 + seq_len(last - first + 1), ])
}

--- snip ---

'assign' creates for each subset a new data.frame DFn, where n is a
count (1,2,...).

But note: if your DF has duplicates in 'name' (e.g. two rows with 'a'
in 'DF$name'), my solution will use the first occurrence only (for both
the start and the end).
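
A common alternative to `assign()` is to collect the subsets in a named list, which keeps the workspace tidy and is easier to loop over. The following is only a sketch: `pos` and `pieces` are illustrative names, and it assumes that every element of `split_str` occurs in `DF$name` and that they appear in data order. `match()` makes the "first occurrence only" behaviour explicit.

```r
DF <- data.frame(name = c("b", "a", "v", "z", "c", "d"), val = 0)
split_str <- c("a", "c")

# first row at which each element of split_str occurs
pos <- match(split_str, DF$name)

pieces <- lapply(seq_len(length(pos) - 1), function(i) {
  # number of rows strictly between two consecutive positions
  n_between <- max(pos[i + 1] - pos[i] - 1, 0)
  DF[pos[i] + seq_len(n_between), , drop = FALSE]
})
names(pieces) <- paste0("DF", seq_along(pieces))

pieces$DF1
```

Each subset is then available as `pieces$DF1`, `pieces$DF2`, and so on, or by position as `pieces[[1]]`.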

HTH,
Kimmo
