Data contamination


mazlina Abu Bakar
Dear experts,

I badly need help.
I'm trying to generate panel data with an error term from N(0,1) and alpha
from U(0,20). The explanatory variables are from a multivariate standard
normal distribution. A problem arose when I tried to contaminate the data in
Y by adding an additional term from N(50,1). I ask the computer to choose 5
random observations from Y using the command runif(5,1,50), since we have 50
observations altogether. I worry the computer will choose the same
observation twice. Will that happen? I attach the code for comments.
Thanks for your help.

-Mag

##########################
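library(mvtnorm)  # provides rmvnorm()

N <- 10    # number of individuals
Ti <- 5    # time periods per individual
K <- 3     # number of explanatory variables
alpha <- matrix(1:N)   # individual effects
beta <- matrix(1:K)    # slope coefficients

generate.p <- function(N, Ti, K) {
    X <- matrix(NA_real_, nrow = N * Ti, ncol = K)
    Y <- matrix(NA_real_, nrow = N * Ti, ncol = 1)
    c <- 1
    for (j in 1:N) {
        rows <- c:(Ti * j)   # rows belonging to individual j
        X[rows, ] <- rmvnorm(Ti, rep(0, K), diag(K))
        # rnorm(Ti) draws N(0,1) errors; rnorm(Ti, 1) would set the mean to 1
        Y[rows, ] <- alpha[j] + X[rows, ] %*% beta + matrix(rnorm(Ti))
        c <- 1 + Ti * j
    }
    data.sim <- cbind(Y, X)
    # the line in question: contaminate 5 randomly chosen rows of Y
    data.sim[runif(5, 1, 50), 1] <- data.sim[runif(5, 1, 50), 1] + rnorm(5, mean = 20, sd = 1)
    data.sim
}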
#####################

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Data contamination

Wu Gong
Hi,

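You are right: runif(5,1,50) returns non-integer values that are truncated when used as indices, so the same row can be selected twice. Your code also calls runif(5,1,50) twice, so the rows read on the right-hand side are generally not the rows written on the left. Try ?sample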

selected <- sample(1:50, 5)
data.sim[selected, 1] <- data.sim[selected, 1] + rnorm(5, mean = 20, sd = 1)
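
To check the difference yourself, here is a minimal sketch. The matrix `toy` and the seed are hypothetical stand-ins for illustration, not part of your code; `toy` has the same layout as data.sim (50 rows, Y in column 1). It shows that truncated runif() indices stay within 1..49 and may repeat, while sample() picks 5 distinct rows, so exactly 5 values of Y change.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Hypothetical stand-in for data.sim: 50 rows, Y in column 1
toy <- cbind(Y = rnorm(50), X1 = rnorm(50))
before <- toy[, 1]

# Non-integer indices are truncated toward zero, so indices from
# runif(n, 1, 50) fall in 1..49 and can repeat; row 50 is never picked.
idx <- trunc(runif(10000, 1, 50))
range(idx)
any(duplicated(trunc(runif(5, 1, 50))))  # TRUE in some runs, FALSE in others

# sample() draws without replacement by default, so the 5 rows are distinct
selected <- sample(nrow(toy), 5)
toy[selected, 1] <- toy[selected, 1] + rnorm(length(selected), mean = 20, sd = 1)

# Rows whose Y value actually changed
changed <- which(toy[, 1] != before)
```

Storing the indices in `selected` once and reusing them on both sides of the assignment is what keeps the rows read and the rows written identical.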

Hope it helps.