Dear R helpers,
I have two queries.

(1) If a dataset contains some variables whose entries are all 0, and I want to delete those particular columns before analysing, how do I achieve this? For example:

dataset1

sr_no  var1  var2  var3  var4  var5
    1     5     0     3     1     0
    2     3     0     2     9     0
    3     4     0     4     7     0
    4    11     0     1     6     0

In the above dataset, var2 and var5 are all 0's, so I don't want to select these columns. It will not always be these two variables that are zero, so in general how can the dataset be filtered to keep only the non-zero columns?

(2) Suppose I have a variable number of datasets, say n = 10. I wish to write a loop that writes each of these datasets to a different csv file, e.g.

    for (i in 1:10)
    {
        write.csv(data.frame(dataset[,,i]), 'data_set[i].csv', row.names = FALSE)
    }

The result of this command is a single csv file 'data_set[i].csv' containing the last dataset (owing to the wrong command written by me).

What I need is data_set[1].csv, data_set[2].csv, ..., data_set[10].csv, i.e. 10 different csv files containing 10 different datasets.

Thanking you in advance,

Anna

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dear Anna,
19.02.2010 08:17, Anna Carter wrote:
> (1) If the dataset contains some variables having all the entries = 0
> and while analysing I want to delete those particular columns, how do
> achieve this. i.e.

Let's suppose 'df' is your data frame, then:

    subset(df, select=which(colSums(df)!=0))

should do the work :)

HTH,
Kimmo
In reply to this post by Anna Carter
Hi Anna,
A column with all 0s will have a column sum of zero (this assumes there are no negative entries). So do this:

    dataset1[, which(colSums(dataset1) > 0)]

If you have a list of data.frames you could do this:

    for (index in 1:10) {
        write.csv(yourListOfTables[[index]],
                  file = paste("Dataset", index, ".csv", sep = ""),
                  row.names = FALSE)
    }

- Dario.
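For the second question, the key point is that the file name has to be built from the loop index rather than written as a literal string. A minimal sketch, assuming the datasets are stored as slices of a three-dimensional array as in the original post; the array `dataset`, its dimensions, and the output directory `out_dir` are made up for illustration:

```r
# Hypothetical data: 10 datasets of 4 rows x 3 columns stored in a 3-d array.
dataset <- array(seq_len(4 * 3 * 10), dim = c(4, 3, 10),
                 dimnames = list(NULL, c("var1", "var2", "var3"), NULL))

# Write into a temporary directory so the sketch does not touch the working directory.
out_dir <- tempfile("csv_out_")
dir.create(out_dir)

for (i in seq_len(dim(dataset)[3])) {
  # Build a distinct file name per slice: data_set1.csv, data_set2.csv, ...
  out_file <- file.path(out_dir, paste0("data_set", i, ".csv"))
  write.csv(as.data.frame(dataset[, , i]), out_file, row.names = FALSE)
}

written <- list.files(out_dir)
```

Using seq_len(dim(dataset)[3]) instead of a hard-coded 1:10 lets the same loop handle a variable number of datasets.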
In reply to this post by K. Elo
On Feb 19, 2010, at 1:36 AM, K. Elo wrote:

> Dear Anna,
>
> 19.02.2010 08:17, Anna Carter wrote:
>> (1) If the dataset contains some variables having all the entries = 0
>> and while analysing I want to delete those particular columns, how do
>> achieve this. i.e.
>
> Let's suppose 'df' is your data frame, then:
>
> subset(df, select=which(colSums(df)!=0))
>
> should do the work :)

It would not work if there were paired negative and positive values or any collection that summed to zero.

    dataset1 <- structure(list(sr_no = 1:4, var1 = c(5L, 3L, 4L, 11L),
        var2 = c(0, 0, 1, -1), var3 = c(3L, 2L, 4L, 1L),
        var4 = c(1L, 9L, 7L, 6L), var5 = c(0L, 0L, 0L, 0L)),
        .Names = c("sr_no", "var1", "var2", "var3", "var4", "var5"),
        row.names = c(NA, -4L), class = "data.frame")

Perhaps:

    idx <- vector()
    for (x in seq_along(names(dataset1))) {
        if (!all(dataset1[, x] == 0)) idx <- c(idx, x)
    }
    dataset1[, idx]

      sr_no var1 var2 var3 var4
    1     1    5    0    3    1
    2     2    3    0    2    9
    3     3    4    1    4    7
    4     4   11   -1    1    6

Or a modification to Kimmo Elo's code which would still "break" if any columns were character:

    subset(dataset1, select=which(colSums(abs(dataset1))!=0))

      sr_no var1 var2 var3 var4
    1     1    5    0    3    1
    2     2    3    0    2    9
    3     3    4    1    4    7
    4     4   11   -1    1    6

>
> HTH,
> Kimmo

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
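The colSums() variants stop with an error as soon as the data frame has a character column. One way around that, sketched here under the assumption that non-numeric columns should always be kept, is to test each column on its own. The `label` column and the helper `is_all_zero` are invented for illustration and are not part of anyone's posted code:

```r
dataset1 <- data.frame(
  sr_no = 1:4,
  var1  = c(5L, 3L, 4L, 11L),
  var2  = c(0, 0, 1, -1),      # sums to zero but is not all zero
  var5  = c(0L, 0L, 0L, 0L),   # genuinely all zero
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# TRUE only for numeric columns whose every entry is zero.
# isTRUE() turns an NA result (column with missing values) into FALSE.
is_all_zero <- function(x) is.numeric(x) && isTRUE(all(x == 0))

drop_col <- vapply(dataset1, is_all_zero, logical(1))
result   <- dataset1[, !drop_col, drop = FALSE]
names(result)
```

Here only var5 is removed; var2 survives because it is tested entry by entry, and label survives because it is not numeric.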
In reply to this post by Anna Carter
> (2) Suppose I have a variable number of datasets, say n = 10. I wish to write a loop assigning each of these datasets to different csv files e.g.
>
> for (i in 1:10)
> {
> write.csv(data.frame(dataset[,,i]), 'data_set[i].csv', row.names = FALSE)
> }
>
> The result of this command is generation of a csv file 'data_set[i].csv' containing the last dataset (owing to the wrong command written by me).
>
> What I need is creation of say data_set[1].csv, data_set[2].csv, .........data_set[10].csv i.e. 10 different csv files containing 10 different datasets.

Why do you want to create csv datasets? They lose much of the structure that is in the R object. If you are trying to transfer them to somewhere else, then a direct transfer would be a better choice.

If the goal is Excel, there are about 5 options. I prefer RExcel because it allows the tightest coordination of the R and the Excel calculations.

If the goal is one of the other popular statistical systems, they also have direct connections, often through the foreign package.

Rich
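To illustrate the point about lost structure: a CSV round trip keeps only the printed values, while R's native serialization restores the object as it was. A small sketch, assuming the data only need to be read back into R; the data frame and file names are made up, and saveRDS()/readRDS() are just one of several ways to store an R object:

```r
df <- data.frame(
  id   = 1:3,
  when = as.Date(c("2010-02-17", "2010-02-18", "2010-02-19")),
  dose = factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

write.csv(df, csv_file, row.names = FALSE)
saveRDS(df, rds_file)

from_csv <- read.csv(csv_file)
from_rds <- readRDS(rds_file)

class(from_csv$when)      # the Date class is gone after the CSV round trip
identical(from_rds, df)   # the RDS copy can be compared with the original
```

The custom ordering of the factor levels is likewise not stored in the CSV file, so it would have to be re-established by hand after reading.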
In reply to this post by K. Elo
(Forgot to cc. reply to K. Elo, apologies if you get it twice)
K. Elo wrote:
> Dear Anna,
>
> 19.02.2010 08:17, Anna Carter wrote:
>> (1) If the dataset contains some variables having all the entries = 0
>> and while analysing I want to delete those particular columns, how do
>> achieve this. i.e.
>
> Let's suppose 'df' is your data frame, then:
>
> subset(df, select=which(colSums(df)!=0))
>
> should do the work :)

Beware negative entries in df! which(colSums(df!=0)!=0) may work better, but it is a bit "sneaky".

I'd also avoid subset in favour of df[....] or df[,....]. And why use indexing with which() when you can use the logical index directly? My preference goes to

    df[,apply(df,2,any)]

(a student assistant once almost killed me when I showed her that after she had spent days programming the same thing using loops and whatnots...)

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                        FAX: (+45) 35327907
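The variants discussed so far can be compared side by side. The sketch below uses the example data from earlier in the thread, in which var2 sums to zero without being all zero; `keep_sum`, `keep_nz` and `keep_any` are illustrative names only. Because any() warns when it has to coerce non-integer numbers to logical, the comparison `df != 0` is written out explicitly here rather than passing the raw values:

```r
df <- data.frame(
  sr_no = 1:4,
  var1  = c(5L, 3L, 4L, 11L),
  var2  = c(0, 0, 1, -1),
  var3  = c(3L, 2L, 4L, 1L),
  var4  = c(1L, 9L, 7L, 6L),
  var5  = c(0L, 0L, 0L, 0L)
)

keep_sum <- colSums(df) != 0          # wrongly marks var2 for removal
keep_nz  <- colSums(df != 0) > 0      # counts non-zero entries per column
keep_any <- apply(df != 0, 2, any)    # same test, written with any()

# A logical vector can index the columns directly; drop = FALSE keeps the
# result a data frame even if only one column survives.
result <- df[, keep_nz, drop = FALSE]
names(result)
```

On this data the last two tests agree and keep var2, while the plain column sum does not.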
Hi!
Right, my solution did not take into account paired negative values summing up to zero. This should work in all cases:

    df[, which(colSums(df!=0)!=0)]

Kind regards,
Kimmo
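One situation the expression above does not cover is missing values: a column containing NA gets an NA count from colSums(), and which() silently skips it, so the column is dropped. A sketch of one possible adjustment, assuming NA entries should simply be ignored when deciding whether a column is all zero (the data frame is made up for illustration):

```r
df <- data.frame(
  a = c(1, NA, 3),   # real data plus one missing value
  b = c(0, 0, 0),    # all zero
  c = c(0, NA, 0)    # zero apart from a missing value
)

# Original form: the counts for a and c are NA, and which() skips NA.
kept_original <- names(df)[which(colSums(df != 0) != 0)]

# With na.rm = TRUE only the observed entries are counted.
keep <- colSums(df != 0, na.rm = TRUE) > 0
kept_na_rm <- names(df[, keep, drop = FALSE])
```

Whether a column such as `c` should count as "all zero" is a judgment call that depends on the data; the sketch treats it as such.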