Splitting a vector into data frame

Splitting a vector into data frame

BHM
Hi,

1. I have scraped some data from the web; a subset is shown below:

> dput(temp.data)
c("Armenia", "Armenia", "43827", "39200", "35700", "36700", "39341",
"30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
"0", "0", "0", "0", "Austria", "Austria", "135417", "166200",
"144500", "147300", "163211", "162536", "155412", "133667", "134962",
"146440", "131188", "100001", "100000", "80000", "35000")

2. The corresponding list of countries is as follows:

> dput(raw.country)
c("Armenia", "Austria", "Belarus", "Belgium", "Brazil", "Bulgaria",
"Canada", "Castile-Leon (Hiszania)", "Catalonia", "Chile", "Colombia",
"Costarica", "Croatia", "Cyprus", "Czech Republic", "Ecuador",
"Estonia", "Finland", "France", "Georgia", "Germany", "Ghana",
"Greece", "Hungary", "Indonesia", "Iran", "Ireland", "Israel",
"Italy", "Kazakhstan", "Kyrgyzstan", "Latvia", "Lithuania", "Macedonia",
"Malaysia", "Mexico", "Moldova", "Mongolia", "Netherland", "Norway",
"Pakistan", "Panama", "Paraguay", "Peru", "Poland", "Portugal",
"Puertorico", "Romania", "Russia", "Serbia", "Slovakia", "Slovenia",
"Spain", "Sweden", "Switzerland", "Tunisia", "Ukraine", "United Kingdom",
"USA", "Venezuela", "Vltava", "World Total")


3. I want to organize the data into a data frame, where each row
contains the 20 values for the corresponding country.
The country name, which appears twice in the vector, should be kept only once. Something like:

Armenia "43827", "39200", "35700", "36700", "39341",
"30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
"0", "0", "0", "0",

"Austria", "135417", "166200",
"144500", "147300", "163211", "162536", "155412", "133667", "134962",
"146440", "131188", "100001", "100000", "80000", "35000"

and so on


Thanks

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Splitting a vector into data frame

Boris Steipe
Your data rows have different numbers of columns. Thus your problem is not sufficiently specified.
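
For illustration, a minimal sketch of how the mismatch can be checked on the posted sample. It assumes every country name occurs exactly twice in a row at the start of its block; the names `countries`, `hits`, `starts` and `n.values` are not from the original post.

```r
# Sample vector exactly as posted.
temp.data <- c("Armenia", "Armenia", "43827", "39200", "35700", "36700", "39341",
"30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
"0", "0", "0", "0", "Austria", "Austria", "135417", "166200",
"144500", "147300", "163211", "162536", "155412", "133667", "134962",
"146440", "131188", "100001", "100000", "80000", "35000")

# Only the two countries present in the sample (first entries of raw.country).
countries <- c("Armenia", "Austria")

# Positions of all country-name entries; each name is assumed to occur twice.
hits <- which(temp.data %in% countries)

# Keep the first position of each pair: these mark where a block begins.
starts <- hits[c(TRUE, FALSE)]

# Block length minus the two name entries = number of values per country.
n.values <- diff(c(starts, length(temp.data) + 1)) - 2
names(n.values) <- temp.data[starts]
n.values
```

If the counts printed by `n.values` are not all equal, the vector cannot be reshaped with a single fixed `ncol`.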

B.
On Mar 24, 2016, at 6:30 AM, Burhan ul haq <[hidden email]> wrote:

> [...]

Re: Splitting a vector into data frame

Jim Lemon-4
Hi Burhan,
As all of your values seem to be character, perhaps:

country.df <- as.data.frame(matrix(temp.data, ncol = 22, byrow = TRUE)[, 2:22])

if there really are 2 country names and 20 values for each country. As
Boris has pointed out, there are different numbers of values following
the country names in your example.
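
To make the fixed-width case concrete, here is a small sketch on made-up data. The vector `toy`, its 3 values per country, and the names `n.val`, `m`, `vals` and `toy.df` are assumptions for illustration only, not part of the scraped data. It also converts the values to numeric, which the original question did not explicitly ask for.

```r
# Hypothetical data: 2 name entries followed by exactly 3 values per country.
toy <- c("Armenia", "Armenia", "10", "20", " 30",
         "Austria", "Austria", "40", "50", "60")
n.val <- 3

# Guard: the reshape is only valid if the length is a multiple of the block width.
stopifnot(length(toy) %% (n.val + 2) == 0)

# One row per country; columns 1 and 2 both hold the country name.
m <- matrix(toy, ncol = n.val + 2, byrow = TRUE)

# Drop both name columns, convert the rest to numeric, and restore the shape.
vals <- matrix(as.numeric(m[, -(1:2)]), nrow = nrow(m))

# Keep the name once, as an ordinary column.
toy.df <- data.frame(country = m[, 1], vals, stringsAsFactors = FALSE)
toy.df
```

The `stopifnot()` guard fails early when the blocks are not all the same width, instead of letting `matrix()` recycle values.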

Jim



Re: Splitting a vector into data frame

Ivan Calandra-4
Hi!

As Boris explained, if you do not always have the same number of values
per country, you need to provide more details, e.g. should the empty
cells be filled with NA?

But if you do always have 20 values per country (unlike in your sample
data), then this could work for you:
mydf <- data.frame(matrix(temp.data, nrow=2, ncol=22, byrow=TRUE))
You can then subset to remove the 1st column:
mydf[-1]
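
If the answer to the NA question above is yes, one possible sketch for blocks of unequal length follows. It assumes that every value converts cleanly to a number, that no country name does, and that each block starts with the name given twice; `is.name`, `new.block`, `blocks`, `vals`, `width`, `padded` and `padded.df` are names chosen here for illustration.

```r
# Sample vector exactly as posted.
temp.data <- c("Armenia", "Armenia", "43827", "39200", "35700", "36700", "39341",
"30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
"0", "0", "0", "0", "Austria", "Austria", "135417", "166200",
"144500", "147300", "163211", "162536", "155412", "133667", "134962",
"146440", "131188", "100001", "100000", "80000", "35000")

# Assumption: anything that does not parse as a number is a country name.
is.name <- is.na(suppressWarnings(as.numeric(temp.data)))

# A new block begins at each name that does not directly follow another name.
new.block <- is.name & !c(FALSE, head(is.name, -1))
blocks <- split(temp.data, cumsum(new.block))

# Drop the two name entries of each block and convert the rest to numeric.
vals <- lapply(blocks, function(b) as.numeric(b[-(1:2)]))

# Pad shorter blocks with NA up to the longest one.
width <- max(lengths(vals))
padded <- t(sapply(vals, function(v) { length(v) <- width; v }))

padded.df <- data.frame(country = sapply(blocks, `[`, 1), padded,
                        stringsAsFactors = FALSE)
padded.df
```

Padding on the right is itself an assumption: if the missing values belong at other positions in the series, the NA entries would have to be placed differently.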

HTH,
Ivan

--
Ivan Calandra, PhD
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
[hidden email]
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/
