changing one character in the name of dataframes repeatedly


Laszlo
Dear R-community,

I'd like to ask you a question concerning R again. I will try to keep this simple so as not to confuse anyone.

I have a small data frame, which I created as follows:

a <-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6)
b <-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
df <-as.data.frame(cbind(a,b))
df

In the next step I would like to create smaller data frames by splitting the main 'df' data frame according to the values in df$a. Something like this:
a b
1 1
1 2
1 3
1 4
1 5

a  b
2  6
2  7
2  8
2  9
2 10

a  b
3 11
3 12
3 13
3 14
3 15

a  b
4 16
4 17
4 18
4 19
4 20

etc.

This part is not difficult. However, I also want the name of each small data frame to indicate which value of df$a was used to select its rows.

For example:

df.1 would contain only those rows of df where df$a is 1:

df.1
a b
1 1
1 2
1 3
1 4
1 5

df.2 would contain only those rows of df where df$a is 2:

df.2
a  b
2  6
2  7
2  8
2  9
2 10

df.3 would contain only those rows of df where df$a is 3:

df.3
a  b
3 11
3 12
3 13
3 14
3 15
etc...

I know I could do it this way:
df.1 <-split(df,df$a)[[1]]
df.2 <-split(df,df$a)[[2]]
df.3 <-split(df,df$a)[[3]]
df.4 <-split(df,df$a)[[4]]
etc...

But my real df data frame consists of more than 4400 records, so it is impractical to do this "manually" by repeating the split call many times.

I tried to solve the problem with a loop, in the following (wrong) way:
for (i in 1:6)
    {
    df.i <-split(df,df$a)[[i]]
    }

When I then tried to access df.1, df.2, etc., R gave me the message:
Error: object 'df.1' not found.

However, it did recognize df.i, which printed the following:
   a  b
6 26
6 27
6 28
6 29
6 30

Can you help me with this matter? I wonder if there is a proper way to do this which I haven't figured out yet...

Thank you very much and have a pleasant weekend,
Laszlo Bodnar


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: changing one character in the name of dataframes repeatedly

Ivan Calandra
Hi,

I think what you want is assign():
for (i in 1:6) assign(paste("df", i, sep="."), split(df,df$a)[[i]])

But using lists is usually a better solution since you can work with
them using functions such as lapply().
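
To make the difference concrete, here is a minimal sketch using the data from the original post (built with rep() and 1:30 as shorthand). Inside the loop, `df.i` is one literal name, so every pass overwrites the same object; assign() takes the name as a string, and get() looks it up again. The name `pieces` is only illustrative.

```r
a <- rep(1:6, each = 5)
b <- 1:30
df <- data.frame(a, b)

pieces <- split(df, df$a)   # split once, outside the loop

# 'df.i' is a single fixed name: each iteration overwrites it,
# so only the last group (a == 6) survives.
for (i in 1:6) df.i <- pieces[[i]]
unique(df.i$a)   # 6

# assign() takes the object name as a string, so paste() can build it.
for (i in seq_along(pieces)) assign(paste("df", i, sep = "."), pieces[[i]])
exists("df.1")   # TRUE

# get() retrieves an object by a constructed name.
nrow(get(paste("df", 3, sep = ".")))   # 5
```

Note that every later step then needs paste() and get() to reach the objects again, which is one reason a list tends to be easier to work with.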

First, you don't need cbind() to create your data.frame:
df2 <- data.frame(a,b)
identical(df, df2)
[1] TRUE

Then, I think that
df_split <- split(df, df$a)
does pretty much what you want.
You could additionally adjust the names like this:
names(df_split) <- paste("df", 1:length(df_split), sep=".")
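
A short sketch of the list-based workflow, assuming the goal is to run the same computation on every piece. The summary used here (the mean of b) is only a placeholder for whatever you actually need, and `means` is an illustrative name. This variant builds the labels from the names split() has already assigned, so they stay correct even if the values in df$a are not exactly 1, 2, 3, ...

```r
a <- rep(1:6, each = 5)
b <- 1:30
df <- data.frame(a, b)

df_split <- split(df, df$a)
# Label each piece with the value of 'a' it holds.
names(df_split) <- paste("df", names(df_split), sep = ".")

df_split[["df.2"]]   # one piece, selected by name
df_split$df.2        # the same piece

# Apply one function to every piece at once.
means <- sapply(df_split, function(piece) mean(piece$b))
means[["df.1"]]      # 3
```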

As a last comment, depending on what your ultimate goal is, you might
not need to split the data at all. Take a look at ?aggregate, ?by,
?summaryBy (from package doBy) and ?ddply (from package plyr), for example.
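
For instance, if the aim were a per-group summary, a sketch with the base functions aggregate() and by() could look like this. The choice of mean() is an assumption made purely for illustration, as is the name `group_means`.

```r
a <- rep(1:6, each = 5)
b <- 1:30
df <- data.frame(a, b)

# One row per value of 'a', holding the mean of 'b' within that group.
group_means <- aggregate(b ~ a, data = df, FUN = mean)
group_means

# by() hands each subset of rows to the function in turn.
by(df, df$a, function(piece) mean(piece$b))
```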

HTH,
Ivan


On 3/11/2011 at 16:49, Bodnar Laszlo EB_HU wrote:


--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
[hidden email]

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

Re: changing one character in the name of dataframes repeatedly

David Winsemius
In reply to this post by Laszlo

On Mar 11, 2011, at 10:49 AM, Bodnar Laszlo EB_HU wrote:

> Dear R-community,
>
> I'd like to ask you a question concerning R again. I try to keep  
> this simple because I am not willing to confuse you at all.
>
> I have a little data frame which I have created the following way:
>
> a <-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6)
> b <-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
> df <-as.data.frame(cbind(a,b))
> df
>
> Now in the next step I would have liked to create smaller dataframes  
> where the data have been extracted (basically splitted) from the  
> main 'df' dataframe according to the numbers in df$a.

?split

> Something like this:
> a b
> 1 1
> 1 2
> 1 3
> 1 4
> 1 5
>
> a  b
> 2  6
> 2  7
> 2  8
> 2  9
> 2 10
>
> a  b
> 3 11
> 3 12
> 3 13
> 3 14
> 3 15
>
> a  b
> 4 16
> 4 17
> 4 18
> 4 19
> 4 20
>
> etc.
>
> It is not quite difficult to do this part. But!! I also want that
> the name of each and every small dataframe should refer to the fact
> that according to which number in df$a have I selected the data in df$b.

split.name <- function(df) { z <- split(df, df[["a"]])
                 # prefix the names split() already assigned; this keeps labels matched to groups even if 'a' is unsorted
                 names(z) <- paste("df", names(z), sep="."); z}

 > split.name(df)
$df.1
   a b
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5

$df.2
    a  b
6  2  6
7  2  7
<output snipped>

--
David.


David Winsemius, MD
West Hartford, CT

Re: changing one character in the name of dataframes repeatedly

Laszlo
In reply to this post by Ivan Calandra
Hello Ivan,

Thank you very much for your comments, they were really useful and I’ll try to memorize and use them in the future.

Getting back to my problem: let me put it a different way, because I'm afraid it is a bit more difficult than I thought.

Here is my revised data set (it is closer to my real data than the 'df' data frame in my previous message, although still simplified).

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <-data.frame(id,a,b,c,d,e)
df

Basically, I would like to get the distribution of the values in each column (a, b, c, d, e) within each group (1, 2, 3), where the groups are defined by column 'id'.

So, for column 'a' and group 1 (see column 'id'):
as.numeric(table(df[1:10,2]))[1]/sum(as.numeric(table(df[1:10,2])))
as.numeric(table(df[1:10,2]))[2]/sum(as.numeric(table(df[1:10,2])))

The first call returns [1] 0.3 and the second returns [1] 0.7.

To briefly explain these results: in column 'a', among the records that have 1 in column 'id',
the value 1 occurred 3 times, and
the value 3 occurred 7 times.
3 / (3+7) = 0.3, and 7 / (3+7) = 0.7

Here is another example, for column 'a' and group 2 (again, see column 'id'):
as.numeric(table(df[11:20,2]))[1]/sum(as.numeric(table(df[11:20,2])))
as.numeric(table(df[11:20,2]))[2]/sum(as.numeric(table(df[11:20,2])))
as.numeric(table(df[11:20,2]))[3]/sum(as.numeric(table(df[11:20,2])))

Running this code gives 0.4, 0.3, 0.3.

Again, to explain: in column 'a', among the observations that have 2 in column 'id',
the value 1 occurred 4 times,
the value 2 occurred 3 times, and
the value 3 occurred 3 times.
So the results are 4/10 = 0.4, 3/10 = 0.3, 3/10 = 0.3, and so on.

So this is what I would like to do: calculate the distributions for each "custom-defined" subset and then collect these values in a data frame.

The reason I wanted to solve the problem with indices like 'i', 'k', etc. (as we discussed previously) is that I will have to change the input 'df' data frame regularly, so both the number of rows and the number of columns might change over time.
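
The calculation described above can be sketched without hard-coded row ranges, under the assumption that the grouping column is always called 'id' and every other column holds values to tabulate. Only columns a and b are included here to keep the example short; the names `value_cols`, `lev` and `result` are illustrative. table() counts each value per group, and prop.table() with margin = 1 divides each group's counts by that group's total.

```r
id <- rep(1:3, each = 10)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
df <- data.frame(id, a, b)

value_cols <- setdiff(names(df), "id")        # every column except the grouping one
lev <- sort(unique(unlist(df[value_cols])))   # all values that occur anywhere: 1, 2, 3

result <- do.call(rbind, lapply(value_cols, function(col) {
  # Fixing the levels keeps zero counts (e.g. no 2s in column a for id 1).
  tab <- table(id = df$id, value = factor(df[[col]], levels = lev))
  props <- as.data.frame(prop.table(tab, margin = 1),
                         responseName = "proportion",
                         stringsAsFactors = FALSE)
  cbind(column = col, props)
}))

# Proportions for column 'a' within group 1: 0.3, 0.0, 0.7
subset(result, column == "a" & id == "1")
```

Because nothing depends on row positions or on the number of columns, the same code should keep working when rows or columns are added to df, as long as 'id' remains the grouping column.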

Thank you again,
Laszlo