|
Hello R Community,
I have the following design question. I have a data set that looks like this (shortened for the sake of example).

Gender Age
M      70
F      65
M      70

Each row represents a person with an age/gender combination. We could put this data into a data frame.

Now, I would like to do some actuarial analysis on this data set. To do so, I need to create and store a mortality curve for each person in the table (a mortality curve is a matrix with 2 columns: date and survival probability). I can write a function that returns a mortality curve given gender and age. The question is the following: In what data format should I store all these mortality curve objects? Should I add a column to the data frame and each entry in that column is a matrix (a mortality curve)? This way, the mortality curve would be stored next to age/gender data in the data frame. However, I read in several places that putting vectors/matrices as elements of a data frame is a bad idea. I do not know why. What is a good design choice in this instance please? How should I store the mortality curves?

Thank you for your help.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
|
Use a list, or create a new class that is a list.

On Jun 16, 2012 8:52 AM, "Onur Uncu" <[hidden email]> wrote:
> [...] In what data format should I store all these mortality curve
> objects? [...]
|
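The "new class which is a list" suggestion could look roughly like the sketch below, using an S3 class. The class name `mortality_curves`, the constructor `new_mortality_curves()` and the placeholder matrices are hypothetical names chosen for illustration; none of them appear in the thread.

```r
# Hypothetical constructor: wraps a plain list of matrices in an S3 class.
new_mortality_curves <- function(curves) {
  stopifnot(is.list(curves), all(vapply(curves, is.matrix, logical(1))))
  structure(curves, class = c("mortality_curves", "list"))
}

# A print method so the object shows a short summary, not every matrix.
print.mortality_curves <- function(x, ...) {
  cat("<mortality_curves> with", length(x), "curve(s)\n")
  invisible(x)
}

# Placeholder matrices standing in for real mortality curves.
curves <- new_mortality_curves(list(
  M.70 = matrix(1:6, ncol = 2),
  F.65 = matrix(1:8, ncol = 2)
))

curves            # dispatches to print.mortality_curves
curves[["F.65"]]  # ordinary list indexing with [[ still works
```

The object is still a list underneath, so existing list tools such as `lapply()` keep working; the class only adds a place to hang validation and methods.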
|
Thank you. But isn't a data frame already a list? What is wrong with adding a column to the existing data frame (a column with the mortality curve matrices)?

Sorry if I am being difficult. Just want to learn good design in R.

On Sat, Jun 16, 2012 at 5:49 PM, steven mosher <[hidden email]> wrote:
> use a list. or create new class which is a list
|
|
In reply to this post by Onur Uncu
Hello,

Follow this example. It uses a list to hold the mortality curves. Since there are only two different gender/age combinations, it first gets all such unique combinations and then creates a list of the appropriate length. It then assigns a matrix to the first list element.

DF <- read.table(text="
Gender Age
M 70
F 65
M 70
", header=TRUE)

# get unique gender & age combinations
# (paste() the columns directly: apply() would first convert DF to a
# character matrix, which can pad numbers of different widths with spaces)
nms <- unique(paste(DF$Gender, DF$Age, sep="."))
n <- length(nms)

# Create a list:
# lists are meant to hold any type of related objects
mort.curve <- vector("list", n)
names(mort.curve) <- nms

# Assign a value to its 1st element
mort.curve[[ 1 ]] <- matrix(1:12, nrow=4)

mort.curve$M.70         # see it
mort.curve[[ "M.70" ]]  # the same
mort.curve[[ nms[1] ]]  # the same

Alternatively, if you want each data.frame row to correspond to its own list element, the list would be vector("list", nrow(DF)).

Anyway, lists are very flexible, and the premier choice for that sort of problem.

Hope this helps,

Rui Barradas

On 16-06-2012 16:50, Onur Uncu wrote:
> [...] In what data format should I store all these mortality curve
> objects? [...]
|
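The one-element-per-row alternative mentioned above can be sketched as follows. The function `mortality_curve()` is a hypothetical stand-in for the poster's own function, its decay rates are invented so that the example runs, and the first matrix column is a year offset rather than a real date.

```r
DF <- data.frame(Gender = c("M", "F", "M"), Age = c(70, 65, 70))

# Hypothetical stand-in for the real mortality function: returns a
# two-column matrix (year offset, survival probability). The rates are
# made up purely so the example runs.
mortality_curve <- function(gender, age, horizon = 5) {
  rate <- if (gender == "M") 0.05 else 0.04
  t <- seq_len(horizon)
  cbind(year = t, survival = exp(-rate * t * age / 65))
}

# One list element per data frame row, in row order.
curves <- Map(mortality_curve, as.character(DF$Gender), DF$Age)

# Labels only; they repeat when two people share gender and age.
names(curves) <- paste(DF$Gender, DF$Age, sep = ".")

curves[[2]]  # curve for the person in row 2 of DF
```

Indexing by position (`curves[[i]]` for row `i` of `DF`) keeps the correspondence unambiguous even when names repeat.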
|
In reply to this post by Onur Uncu
On Sat, Jun 16, 2012 at 06:14:38PM +0100, Onur Uncu wrote:
> Thank you. But isn't a data frame already a list?

A data frame is a list of columns. The suggestion was to use a list whose length is the number of rows and which contains a matrix for each row.

> What is wrong with
> adding a column to the existing data frame (a column with the
> mortality curve matrices)?

In an ordinary data frame, each column is an atomic vector, so its elements cannot be matrices. If the matrices can be unfolded into vectors, then these vectors can be included in the rows. Something like

Gender Age x1 x2 x3 y1 y2 y3
M      70  ...
F      65  ...
M      70  ...

where ... represents six numbers in each row, which form a matrix

x1 y1
x2 y2
x3 y3

However, I think a list of matrices is more flexible.

Petr Savicky.
|
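The unfolding described above can be sketched as follows, assuming every matrix has the same 3 x 2 shape. The matrices are placeholders with invented values, and the names `flat`, `wide` and `m2` are illustrative only.

```r
DF <- data.frame(Gender = c("M", "F", "M"), Age = c(70, 65, 70))

# Placeholder 3 x 2 matrices (columns x and y), one per row of DF.
curves <- list(
  cbind(x = 1:3, y = c(0.90, 0.80, 0.70)),
  cbind(x = 1:3, y = c(0.95, 0.90, 0.85)),
  cbind(x = 1:3, y = c(0.90, 0.80, 0.70))
)

# as.vector() unfolds a matrix column by column: x1 x2 x3 y1 y2 y3.
flat <- t(vapply(curves, as.vector, numeric(6)))
colnames(flat) <- c(paste0("x", 1:3), paste0("y", 1:3))
wide <- cbind(DF, flat)

# Rebuild the matrix for row 2 from the unfolded columns.
m2 <- matrix(unlist(wide[2, 3:8]), ncol = 2,
             dimnames = list(NULL, c("x", "y")))
```

This layout only works when all curves have the same dimensions, which is one reason a list of matrices is the more flexible choice.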
|
In reply to this post by Onur Uncu
A data frame is a list of vectors all with the same length. Note that vectors have simple types. Sticking other types of objects into it violates the generic constraint stated above, leading to incompatibility with many functions that normally work with data frames.
If you want to maintain data frame semantics for your original data, I see two possibilities right off:

1) Make a list with two elements: the first being the original data frame, and the second being a list of matrices of the same length as the number of rows of the data frame, maintaining index correspondence.

2) Represent the list of matrices as a single data frame, with one column that identifies the row in the original data frame for each row of the mortality matrix. This is a more relational (as in SQL) solution.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>                  Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Onur Uncu <[hidden email]> wrote:
>Thank you. But isn't a data frame already a list? What is wrong with
>adding a column to the existing data frame (a column with the
>mortality curve matrices)?
>
>Sorry if I am being difficult. Just want to learn good design in R.
|
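Both layouts described in this reply can be sketched in a few lines of R. The curve values are placeholders, and the names `portfolio`, `long`, `person` and `merged` are assumptions made for this example rather than anything from the thread.

```r
DF <- data.frame(Gender = c("M", "F", "M"), Age = c(70, 65, 70))

# Placeholder curves, one per row of DF (values are made up).
curves <- list(
  cbind(year = 1:2, survival = c(0.97, 0.93)),
  cbind(year = 1:3, survival = c(0.98, 0.95, 0.91)),
  cbind(year = 1:2, survival = c(0.97, 0.93))
)

# Option 1: a list holding the data frame and a parallel list of matrices.
portfolio <- list(people = DF, curves = curves)
stopifnot(length(portfolio$curves) == nrow(portfolio$people))

# Option 2: one long data frame; `person` is the row number in DF.
long <- do.call(rbind, lapply(seq_along(curves), function(i) {
  data.frame(person = i, curves[[i]])
}))

# Join back to the person-level data when needed.
merged <- merge(cbind(person = seq_len(nrow(DF)), DF), long, by = "person")
```

Option 1 keeps each curve as a matrix, which suits matrix arithmetic; option 2 handles curves of different lengths in one table and works with the usual data frame tools such as `merge()` and `aggregate()`.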
|
Thank you all for the useful replies. I will go with the list of matrices idea as suggested multiple times.

On Sat, Jun 16, 2012 at 6:50 PM, Jeff Newmiller <[hidden email]> wrote:
> A data frame is a list of vectors all with the same length. [...]
|
