Using R and the Tidyverse for an economic model


Using R and the Tidyverse for an economic model

R help mailing list-2
I've been translating an economic model from Python into R, and I thought
members of the list would like to see a presentation I've written about it.
I've blogged this at
http://www.j-paine.org/blog/2018/03/r-taxben-a-microsimulation-economic-model-in-r.html
, and the presentation itself is a slideshow at
http://www.j-paine.org/rtaxben/R/reveal/rtaxben.html . The slideshow is
written as one side of a conversation which reveals R and the Tidyverse a
feature at a time to a colleague not familiar with R. Those who _are_
familiar with R might prefer the version at
http://www.j-paine.org/rtaxben/R/reveal/rtaxben_anim.html . It has exactly
the same material but, as explained in my introduction, is quicker to read.
Read the blog post first.

Our model, R-Taxben, is a microeconomic model, which simulates at the level
of individual people rather than bulk variables such as unemployment and
inflation. It works, roughly speaking, by reading survey data about actual
households, then applying taxes and benefits to calculate net income and
expenditure from gross. It has four main parts: (1) read and process
parameters which describe the taxes and benefits; (2) read the household
data from CSV files and transform into data frames usable by the model; (3)
apply the taxes and benefits, calculating such things as council tax, VAT,
child benefit, and pensions; (4) display the results.
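
As a rough illustration of parts (2) and (3), here is a minimal sketch in
base R. The column names, the flat 20% rate, the allowance, and the benefit
amount are all invented for the example and are not taken from R-Taxben.

```r
# Hypothetical household data; the real model reads this from CSV files.
households <- data.frame(
  id           = c(1, 2, 3),
  gross_income = c(12000, 30000, 8000),
  num_children = c(0, 2, 1)
)

# Hypothetical parameters describing a toy tax-benefit system.
params <- list(
  allowance     = 10000,  # income below this is untaxed
  tax_rate      = 0.20,   # flat rate on income above the allowance
  child_benefit = 1000    # annual benefit per child
)

# Apply the toy rules to every household at once (vectorised over rows).
apply_rules <- function(hh, p) {
  hh$tax        <- pmax(hh$gross_income - p$allowance, 0) * p$tax_rate
  hh$benefit    <- hh$num_children * p$child_benefit
  hh$net_income <- hh$gross_income - hh$tax + hh$benefit
  hh
}

result <- apply_rules(households, params)
print(result)
```

Keeping the parameters in one list, separate from the rules, is what would
let a different tax regime be tried without touching the calculation code.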

My slides are mainly about (2) and (4), but do touch on the others. I
suggest, for example, that legible R code for (3) could be used as a
"reference standard" against which to describe the notoriously complex UK
benefits system. Organisations such as the Child Poverty Action Group have
written handbooks for benefits advisers which try to specify the system
precisely. We'd like to use R for an electronic version of these.

I've said quite a bit about R for probing and plotting data, not only for
economists, but also for students learning about economics, fiscal policy,
and statistics. After a brief intro to base R, I've concentrated on the
Tidyverse, because of what I see as its advantages. There are lots of small
demos of the Tidyverse scattered around the web, but fewer of big projects
which use lots of different features from it. So my examples here might be
useful.

Reliability and accuracy are vital, which is why I have more slides about
testing than about anything else, with examples of "testthat".
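
To give a flavour of what such tests look like, here is a small sketch
using testthat. The function income_tax() and its rule are hypothetical,
made up for this example rather than copied from the model.

```r
library(testthat)

# Hypothetical rule: flat-rate tax on income above an allowance.
income_tax <- function(gross, allowance = 10000, rate = 0.20) {
  stopifnot(is.numeric(gross), all(gross >= 0, na.rm = TRUE))
  pmax(gross - allowance, 0) * rate
}

test_that("income at or below the allowance is untaxed", {
  expect_equal(income_tax(c(0, 5000, 10000)), c(0, 0, 0))
})

test_that("income above the allowance is taxed at the flat rate", {
  expect_equal(income_tax(30000), 4000)
})

test_that("negative income is rejected", {
  expect_error(income_tax(-1))
})
```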

Near the end, I show a web interface, built using Vis.js, which displays
dataflow in the model. The aim is to make it completely scrutable, so that
none of its economic assumptions are a mystery.

We're looking for funding to go beyond this prototype. There are places
where we'll probably need help with such things as efficiency (see the
section on representation-independent selectors), efficiency again
(multiple JOINs), and the best way to overcome lack of static typing. It
would be great to have R experts, even R implementors, who were willing to
advise on this, and even to collaborate on our grant applications.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Using R and the Tidyverse for an economic model

Jeff Newmiller
Looks like you have made an impressive start and some attractive
introductions. I have no significant interest in your topic (sorry), but
it seems that you are re-inventing the wheel a bit with regard to much of
your documentation and modularization... R packages can help you solve
these problems in a cross-platform way. You might try starting with [1]
and referring to [2] as needed.

Regarding your representation-independent selectors... this looks to me
like yet another representation (I think the term is "network database"),
and subject to specific advantages and limitations that this
representation imposes. For most work I do, tidy data frames have an
excellent balance of speed and adaptability. For other types of analyses,
multi-dimensional arrays would be better. Nested lists are extremely
flexible, but not particularly fast (some would say quite slow, but that
depends on your use case). Sometimes a relational database or the
data.table package [3] can be used for increased performance, but your
functional interface would not be compatible with _merging_ the
information efficiently, while dplyr can theoretically support any data
store that presents a tabular data interface with data merge capability.
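
A minimal sketch of the merging point, in base R with invented tables and
column names (dplyr's left_join() would do the same job on tibbles):

```r
# Hypothetical tables: one row per person, one row per household.
people <- data.frame(
  person_id    = 1:4,
  household_id = c(1L, 1L, 2L, 3L),
  earnings     = c(15000, 9000, 22000, 0)
)
households <- data.frame(
  household_id = 1:3,
  region       = c("North", "South", "North")
)

# One merge attaches the household attributes to every person at once,
# instead of calling a selector function once per person.
joined <- merge(people, households, by = "household_id")

# Total earnings per region, computed on the merged table.
earnings_by_region <- aggregate(earnings ~ region, data = joined, FUN = sum)
print(earnings_by_region)
```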

R seems to work best when used in the functional paradigm operating on
general-purpose objects... functions that transform, analyze, and present
data. Having more general classes of objects means more re-use and ad-hoc
analysis can occur. If I make an object of class "myspecial", only
functions I write will be useful. Making it a subclass of a more general
class is one way to make it more widely useful, but not creating a subclass
at all, and using the general class directly, can be the most flexible
design principle... which is what "tidy data" aspires to do with data frames.

That is, I think you should not be avoiding $ (or more generally the "[["
operator)... you should be embracing it and enabling users to use it as
well. Just don't go all multi-level with it... prefer multi-column indexes
in data frames in most cases (e.g. [4]).
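
A small sketch of the difference, with made-up data: the same values held
first as a multi-level list, then as a data frame indexed by two columns.

```r
# Nested-list representation (hypothetical): region -> year -> value.
nested <- list(
  North = list(`2017` = 410, `2018` = 425),
  South = list(`2017` = 390, `2018` = 402)
)
from_nested <- nested[["North"]][["2018"]]  # multi-level "[[" lookup

# The same information as a tidy data frame keyed by two columns.
flat <- data.frame(
  region = c("North", "North", "South", "South"),
  year   = c(2017, 2018, 2017, 2018),
  value  = c(410, 425, 390, 402)
)
from_flat <- flat$value[flat$region == "North" & flat$year == 2018]

# Whole-column operations stay simple in the flat form.
mean_by_region <- tapply(flat$value, flat$region, mean)
print(mean_by_region)
```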

[1] http://r-pkgs.had.co.nz/
[2] https://cran.r-project.org/doc/manuals/R-exts.html
[3] https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
[4] https://www.jstatsoft.org/article/view/v040i01/v40i01.pdf

On Mon, 26 Mar 2018, Jocelyn Ireson-Paine via R-help wrote:

> [...]

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


Re: Using R and the Tidyverse for an economic model

R help mailing list-2
Jeff, thanks for taking the time to read and comment.

About the points you raise:

1) Packages. I do understand the advantages of packages. But I'm still
prototyping, and it seemed too complicated to make every source file a
separate package, when I'm still often moving source code from one to
another. At the moment, one directory full of source files is simpler
to experiment with and test.

2) Documentation. What I'm concerned with here is not so much where to
put the documentation, as what to write. I follow the
abstract-datatype approach (
https://www.geeksforgeeks.org/abstract-data-types/ ), which is that
the internal structure of a value should be hidden. What's important
is its behaviour: the functions one can call on it, and the types of
result they return. The implementor should be free to change
representation as radically as he or she desires, as long as those
calls remain the same. That is, the _interface_ is not affected. (See
the cartoon I drew at
http://www.j-paine.org/dobbs/engineers_honouring_the_uniform_referent_principle_708.png
.)

But $ and [[ are pervasive in R, and as you say, there are times when
I shouldn't avoid them. On the other hand, I know that I'm likely to
make big changes to some structures while experimenting with
efficiency. I want to proof the rest of the program against those
changes, and I want to proof the rest of my _documentation_ against
those changes. So that means I have to document in a
representation-independent way.
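
Here is a minimal sketch of what I mean. All the names (new_household,
annual_income, income_after_allowance) are hypothetical ones invented for
this example; the point is only that the client function never touches $
directly.

```r
# Representation A (hypothetical): annual income held in a named list.
new_household <- function(id, annual_income) {
  list(id = id, annual_income = annual_income)
}
annual_income <- function(hh) hh$annual_income

# Client code uses only the constructor and the selector.
income_after_allowance <- function(hh, allowance = 10000) {
  max(annual_income(hh) - allowance, 0)
}
before <- income_after_allowance(new_household(1, 26000))

# Representation B: weekly income is stored instead. Only the constructor
# and the selector change; the client code above is untouched.
new_household <- function(id, annual_income) {
  list(id = id, weekly_income = annual_income / 52)
}
annual_income <- function(hh) hh$weekly_income * 52

after <- income_after_allowance(new_household(1, 26000))
stopifnot(isTRUE(all.equal(before, after)))
```

The cost, of course, is one extra function call per access, which is
where the efficiency question comes in.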

There are also some very R-specific things that I need to think about.
For example, I need to say, for each function, whether it is vectorised,
don't I? And I need to say, for each value, whether NA can occur in it,
and if so, what it means. These are the documentation-related things
I've been thinking about.
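
As a sketch of the kind of thing I might write, here is a hypothetical
function documented with roxygen-style comments. The rule, the amount, and
the meaning given to NA are assumptions made up for the example.

```r
#' Child benefit for a household (hypothetical rule and amount).
#'
#' Vectorised: `num_children` may be a vector, and the result has the
#' same length.
#' NA: an NA in `num_children` means "not recorded in the survey" and
#' gives NA in the result rather than zero.
child_benefit <- function(num_children, per_child = 1000) {
  num_children * per_child
}

benefits <- child_benefit(c(0, 2, NA))
print(benefits)
```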

3) Classes. I agree with you, and I have not used any of R's object
systems. Most of my data is, in fact, lists (possibly
tree-structured), vectors, or tibbles.

4) Representation-independent selectors. See 2. It's the
abstract-datatype thing again, wanting to proof code and documentation
outside a module against changes to how the data defined by that
module is stored.

Thanks again. I do welcome comment on these points, because there are
lots of trade-offs to consider, and I know I may be overlooking all
kinds of things I should know.

Jocelyn

On Tue, Mar 27, 2018 at 7:20 PM, Jeff Newmiller
<[hidden email]> wrote:

> [...]
