A basic design question for R

Onur Uncu
Hello R Community,

I have the following design question. I have a data set that looks
like this (shortened for the sake of example).

Gender  Age
M       70
F       65
M       70

Each row represents a person with an age/gender combination. We could
put this data into a data frame.

Now, I would like to do some actuarial analysis on this data set. To
do so, I need to create and store a mortality curve for each person in
the table (a mortality curve is a matrix with 2 columns: date and
survival probability). I can write a function that returns a mortality
curve given gender and age. The question is the following: in what
data format should I store all these mortality curve objects? Should I
add a column to the data frame in which each entry is a matrix (a
mortality curve)? This way, each mortality curve would be stored next
to the age/gender data in the data frame. However, I have read in
several places that putting vectors/matrices as elements of a data
frame is a bad idea, and I do not know why. What is a good design
choice in this instance? How should I store the mortality curves?

Thank you for your help.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: A basic design question for R

steven mosher
Use a list, or create a new class that is a list.
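
The "new class which is a list" idea could look roughly like the sketch below. It is only an illustration: the class name mortality_curves, the constructor new_mortality_curves() and the example numbers are all made up here and do not come from the thread.

```r
# Hypothetical constructor: a "mortality_curves" object is a plain list of
# matrices with an S3 class attribute added on top.
new_mortality_curves <- function(curves) {
  stopifnot(is.list(curves), all(vapply(curves, is.matrix, logical(1))))
  structure(curves, class = c("mortality_curves", "list"))
}

# A print method so the object summarises itself instead of dumping every matrix
print.mortality_curves <- function(x, ...) {
  cat("<mortality_curves>", length(x), "curve(s)\n")
  invisible(x)
}

# Made-up example curves: columns are a year offset and a survival probability
curves <- new_mortality_curves(list(
  cbind(year = 1:3, survival = c(0.97, 0.94, 0.90)),
  cbind(year = 1:3, survival = c(0.98, 0.96, 0.93))
))

print(curves)
curves[[2]]   # ordinary list indexing still works
```

Because the object is still a list underneath, lapply() and [[ ]] keep working; the class only adds a place to hang methods such as print().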

Re: A basic design question for R

Onur Uncu
Thank you.  But isn't a data frame already a list? What is wrong with
adding a column to the existing data frame (a column with the
mortality curve matrices)?

Sorry if I am being difficult. Just want to learn good design in R.



Re: A basic design question for R

Rui Barradas
In reply to this post by Onur Uncu
Hello,

Follow this example. It uses a list to hold the mortality curves.
Since there are only two different gender/age combinations, it first
gets all such unique combinations and then creates a list of the
appropriate length. It then assigns a matrix to the first list element.

DF <- read.table(text="
Gender  Age
  M          70
  F           65
  M          70
", header=TRUE)

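# get unique gender&age combinations, e.g. "M.70"
# (paste() on the columns avoids the padding that apply() can introduce
#  when it first converts the data frame to a character matrix)
nms <- unique(paste(DF$Gender, DF$Age, sep="."))
n <- length(nms)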

# Create a list:
#   lists are meant to hold any type of related objects
mort.curve <- vector("list", n)
names(mort.curve) <- nms

# Assign a value to its 1st element
mort.curve[[ 1 ]] <- matrix(1:12, nrow=4)
mort.curve$M.70  # see it
mort.curve[[ "M.70" ]]  # the same
mort.curve[[ nms[1] ]]  # the same


Alternatively, if you want each data.frame row to correspond to its own
list element, the list would be vector("list", nrow(DF)). Either way,
lists are very flexible and a natural first choice for this sort of problem.

Hope this helps,

Rui Barradas


Re: A basic design question for R

Petr Savicky
In reply to this post by Onur Uncu
On Sat, Jun 16, 2012 at 06:14:38PM +0100, Onur Uncu wrote:
> Thank you.  But isn't a data frame already a list?

A data frame is a list of columns. The suggestion was to use a list
whose length is the number of rows and which contains a matrix
for each row.

> What is wrong with
> adding a column to the existing data frame (a column with the
> mortality curve matrices)?

The elements of an ordinary data frame column cannot be matrices,
since each column is normally an atomic vector. If the matrices
can be unfolded into vectors, then these vectors can be included
in the rows. Something like

  Gender  Age  x1  x2  x3  y1  y2  y3
   M      70   ...
   F      65   ...
   M      70   ...

where ... represent six numbers in each row, which form a matrix

  x1  y1
  x2  y2
  x3  y3

However, I think a list of matrices is more flexible.
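
The unfolding described above can be sketched as follows, assuming every matrix has the same 3 x 2 shape. The matrices and their values are made up for illustration.

```r
DF <- data.frame(Gender = c("M", "F", "M"), Age = c(70, 65, 70),
                 stringsAsFactors = FALSE)

# Made-up 3 x 2 matrices, one per row of DF
curves <- list(
  cbind(x = 1:3, y = c(0.97, 0.94, 0.90)),
  cbind(x = 1:3, y = c(0.98, 0.96, 0.93)),
  cbind(x = 1:3, y = c(0.97, 0.94, 0.90))
)

# as.vector() unfolds column by column, giving x1 x2 x3 y1 y2 y3 per matrix
flat <- t(sapply(curves, as.vector))
colnames(flat) <- c(paste0("x", 1:3), paste0("y", 1:3))
wide <- cbind(DF, flat)

# Fold row 2 back into its 3 x 2 matrix
m2 <- matrix(unlist(wide[2, 3:8]), ncol = 2,
             dimnames = list(NULL, c("x", "y")))
```

This only works while all curves have the same number of rows, which is one reason a list of matrices is the more flexible choice.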

Petr Savicky.


Re: A basic design question for R

Jeff Newmiller
In reply to this post by Onur Uncu
A data frame is a list of vectors all with the same length. Note that vectors have simple types. Sticking other types of objects into it violates the generic constraint stated above, leading to incompatibility with many functions that normally work with data frames.
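
A small sketch of that constraint, using made-up placeholder matrices: the original columns are atomic vectors of equal length, while a column holding matrices has to be a list, which is the kind of column that functions expecting atomic columns may not handle.

```r
DF <- data.frame(Gender = c("M", "F", "M"), Age = c(70, 65, 70),
                 stringsAsFactors = FALSE)

is.list(DF)             # a data frame is a list ...
sapply(DF, length)      # ... whose elements all have the same length
sapply(DF, is.atomic)   # ... and are simple (atomic) vectors

# Placeholder matrices, one per row (contents are irrelevant here)
curves <- list(matrix(0, 2, 2), matrix(0, 3, 2), matrix(0, 2, 2))

# R accepts a list of length nrow(DF) as a column, but the result is a
# list column, no longer a simple vector
DF$curve <- curves
is.atomic(DF$curve)
is.list(DF$curve)
```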

If you want to maintain data frame semantics for your original data, here are two possibilities I see right off:

1) Make a list with two elements: the first being the original data frame, and the second being a list of matrices whose length equals the number of rows of the data frame, maintaining index correspondence.

2) Represent the list of matrices as a single data frame, with one column that identifies the row in the original data frame for each row of the mortality matrix. This is a more relational (as in SQL) solution.
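
Both possibilities can be sketched briefly. The names portfolio, people, person and long are made up for this illustration, as are the curve values; a year offset stands in for the date column.

```r
DF <- data.frame(Gender = c("M", "F", "M"), Age = c(70, 65, 70),
                 stringsAsFactors = FALSE)

# Made-up curves, one per row of DF (year offset, survival probability)
curves <- list(
  cbind(year = 1:2, survival = c(0.97, 0.94)),
  cbind(year = 1:3, survival = c(0.98, 0.96, 0.93)),
  cbind(year = 1:2, survival = c(0.97, 0.94))
)

# Option 1: a two-element list; curves[[i]] belongs to people[i, ]
portfolio <- list(people = DF, curves = curves)
stopifnot(length(portfolio$curves) == nrow(portfolio$people))

# Option 2: one long data frame, keyed by the row number in DF
long <- do.call(rbind, lapply(seq_along(curves), function(i) {
  data.frame(person = i, curves[[i]])
}))

# Recover the curve for person 2 as a matrix again
curve2 <- as.matrix(long[long$person == 2, c("year", "survival")])
```

Option 2 copes naturally with curves of different lengths and can be joined back to the original rows with merge(), at the cost of rebuilding a matrix whenever one is needed.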

Re: A basic design question for R

Onur Uncu
Thank you all for the useful replies. I will go with the list of
matrices idea as suggested multiple times.


