How to import sensitive data when multiple users collaborate on R-script?

Nikolai Stenfors
We conduct medical research and our data files therefore contain sensitive
data, not to be shared in the cloud (Dropbox, Box, Drive, Bitbucket, GitHub).
When we collaborate on an R analysis script, we stumble upon the following
annoyance. Researcher 1 has a line in the script importing the sensitive
data from his/her personal computer. Researcher 2 has to add another
line importing the data from his/her personal computer. Thus, we have lines
in the script that are unnecessary for one or the other researcher. How can
we avoid this? Is there another way of conducting the collaboration, some
other workflow?

I'm perhaps looking for something like:
"If the script is run on researcher 1's computer, load the file from this
directory. If the script is run on researcher 2's computer, load the data
from that directory".

Example:
## Import data-------------------------------------
# Researcher 1 imports data from laptop1 (unnecessary line for Researcher 2)
data <- read.table("/path/to_researcher1_computer/sensitive_data.csv")

# Researcher 2 imports data from laptop2 (unnecessary line for Researcher 1)
data <- read.table("/path/to_researcher2_computer/sensitive_data.csv")

## Clean data
data$var1 <- NULL

## Analyze data
boxplot(data$var2)

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to import sensitive data when multiple users collaborate on R-script?

Tom Wright-9
My general approach to this is to put the function for loading data
into a separate file which is then sourced in the main analysis file.
Occasionally I'll use a construct like:

if (file.exists("loadData_local.R")) {
  source("loadData_local.R")
} else {
  source("loadData_generic.R")
}

Where loadData_generic.R contains the path to some sample (non-sensitive) data.
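
A minimal sketch of how the two files might fit together, assuming both
define a function with the same name. The name load_data() and the sample
file name are hypothetical and not from the post above. Only
loadData_generic.R and the sample data would be shared; a
loadData_local.R would stay on each researcher's own computer.

```r
# Self-contained demo in a temporary directory. The function name
# load_data() and all file names below are illustrative assumptions.
demo_dir <- tempfile("collab_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Non-sensitive sample data that can be shared alongside the script.
write.csv(data.frame(var1 = 1:3, var2 = c(2.5, 3.1, 4.8)),
          "sample_data.csv", row.names = FALSE)

# loadData_generic.R: shared with everyone, reads the sample data.
writeLines('load_data <- function() read.csv("sample_data.csv")',
           "loadData_generic.R")

# A researcher may also keep a private loadData_local.R (never shared) that
# defines load_data() to read the sensitive file from their own machine.

# Main analysis script: prefer the local loader when it exists.
if (file.exists("loadData_local.R")) {
  source("loadData_local.R")
} else {
  source("loadData_generic.R")
}

data <- load_data()
setwd(old_wd)
```

Because the analysis code only ever calls load_data(), nothing in the
shared script needs to mention anyone's personal paths.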

On Tue, May 31, 2016 at 6:44 AM, Nikolai Stenfors
<[hidden email]> wrote:

> [...]

Re: How to import sensitive data when multiple users collaborate on R-script?

John McKown
In reply to this post by Nikolai Stenfors
Can you have the researchers input the name of the data file to be
analyzed? I use code similar to:

arguments <- commandArgs(trailingOnly = TRUE)
#
# I put in the next command due to my own ignorance.
# If you invoke an R script file using just R, you
# need to say something like:
# R CMD BATCH "--args ... other arguments ..." script.R
#
# but if you use Rscript, you invoke it like:
# Rscript script.R ... other arguments ...
#
# Well, I got confused and did:
# Rscript script.R --args ... other arguments ...
#
# The next line adjusts for my own idiocy. The length check avoids an
# error when the script is run without any arguments.
if (length(arguments) > 0 && arguments[1] == "--args") {
  arguments <- arguments[-1]
}
#
for (file in arguments) {
  # ...
}

Please ignore the line about my own idiocy :-}

Another thought is to use an environment variable which is set in the
user's logon profile (or the Windows registry, forgive my ignorance of
Windows). I think this would be something like:

filename <- Sys.getenv("FILENAME")
if (filename == "") {
  # ... no file name in environment, what to do?
}

You could have someone do this for the user, if he is not familiar with
the process.
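
A minimal sketch of the environment-variable idea, assuming each
researcher sets FILENAME outside the shared script, for example with a
line such as FILENAME=/path/to/sensitive_data.csv in their own
~/.Renviron file, which R reads at startup. The helper name
get_data_path() is hypothetical, and stopping with an error is only one
possible answer to the "what to do?" question above.

```r
# Return the data file path from an environment variable, or stop with a
# clear message if the variable is not set. get_data_path() is an
# illustrative helper, not part of base R.
get_data_path <- function(var = "FILENAME") {
  path <- Sys.getenv(var, unset = "")
  if (!nzchar(path)) {
    stop("Environment variable ", var, " is not set; ",
         "set it to the location of the data file.")
  }
  path
}

# Demonstration only: set the variable for this session and point it at a
# small temporary file so the example runs anywhere.
demo_file <- tempfile(fileext = ".csv")
write.csv(data.frame(var1 = 1:2, var2 = c(10, 20)), demo_file,
          row.names = FALSE)
Sys.setenv(FILENAME = demo_file)

data <- read.csv(get_data_path())
```

With this in place the shared script contains no paths at all, only the
name of the variable that everyone agrees on.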



--
The unfacts, did we have them, are too imprecisely few to warrant our
certitude.

Maranatha! <><
John McKown

Re: How to import sensitive data when multiple users collaborate on R-script?

Jeff Newmiller
In reply to this post by Nikolai Stenfors
Assume everyone will begin their work in a suitable working directory for
their computer. Put data in that working directory or some directory "near"
it. Then use relative paths to the data instead of absolute paths (don't
use paths that start with "/"). I usually start by reading in a
"configuration" file that I keep customized per computer, which includes
such things as the names of files I want to analyze. Sometimes there is
only one row in that file; other times I select one row on the fly to use.
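
A minimal sketch of this workflow, assuming the configuration file is a
small CSV with one row per data set. The file names (config.csv,
data/study_a.csv) and column names (label, file) are hypothetical and
chosen only for illustration.

```r
# Demo setup in a temporary working directory; all file and column names
# are illustrative assumptions.
demo_dir <- tempfile("project_")
dir.create(file.path(demo_dir, "data"), recursive = TRUE)
old_wd <- setwd(demo_dir)

# Stand-in for the sensitive data, stored "near" the working directory.
write.csv(data.frame(var1 = 1:3, var2 = c(5, 7, 9)),
          file.path("data", "study_a.csv"), row.names = FALSE)

# Per-computer configuration file: kept out of the shared script, one row
# per data set this researcher might analyze.
write.csv(data.frame(label = c("study_a", "study_b"),
                     file  = c("data/study_a.csv", "data/study_b.csv")),
          "config.csv", row.names = FALSE)

# Shared script: read the configuration, pick one row, and load the data
# through a relative path.
config <- read.csv("config.csv", stringsAsFactors = FALSE)
chosen <- config[config$label == "study_a", ]
data <- read.csv(chosen$file)

setwd(old_wd)
```

The shared script then only depends on the working directory and on the
layout of the configuration file, not on any one computer's directories.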
--
Sent from my phone. Please excuse my brevity.

Re: How to import sensitive data when multiple users collaborate on R-script?

MacQueen, Don
In reply to this post by Nikolai Stenfors
There are lots of ways to handle this kind of thing, and the other
suggestions are good. But specific to your "something like" idea, see the
output of

  Sys.info()

in particular
  Sys.info()['nodename']
  Sys.info()['user']
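
A minimal sketch of how the node name could select the path. The node
names laptop1 and laptop2 and the helper path_for_node() are
hypothetical; the paths are the placeholders from the original question.

```r
# Map each computer's node name to its local data path. Replace the names
# with whatever Sys.info()[["nodename"]] reports on each machine.
data_paths <- c(
  laptop1 = "/path/to_researcher1_computer/sensitive_data.csv",
  laptop2 = "/path/to_researcher2_computer/sensitive_data.csv"
)

# Look up the path for the current computer, failing with a clear message
# when the computer is not listed.
path_for_node <- function(node = Sys.info()[["nodename"]],
                          paths = data_paths) {
  if (!node %in% names(paths)) {
    stop("No data path configured for computer '", node, "'")
  }
  paths[[node]]
}

# In the shared script this would be used as:
# data <- read.table(path_for_node())
```

The same lookup could be keyed on Sys.info()[["user"]] instead if several
researchers share one computer.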

-Don

--
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062




