Split a character variable into several character variables by a character

Mao Jianfeng
Dear R-listers,

I have a data frame like the one below, and I want to split a character
variable ("popcode" or "codetot") into several new variables. For example,
I want to split "BCPy01-01" (in popcode) into "BCPy01" and "01". I have
tried the strsplit() and substring() functions, but I still cannot get the
splitting to work.

Any advice? Thanks in advance.

df1:
popcode     codetot   p3need
BCPy01-01 BCPy01-01-1 100.0000
BCPy01-01 BCPy01-01-2 100.0000
BCPy01-01 BCPy01-01-3 100.0000
BCPy01-02 BCPy01-02-1  92.5926
BCPy01-02 BCPy01-02-1 100.0000
BCPy01-02 BCPy01-02-2  92.5926
BCPy01-02 BCPy01-02-2 100.0000
BCPy01-02 BCPy01-02-3  92.5926
BCPy01-02 BCPy01-02-3 100.0000
BCPy01-03 BCPy01-03-1 100.0000

Regards,

Mao Jian-feng

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Split a character variable into several character variables by a character

Francisco J. Zagmutt
Hello Mao,

If the popcode variable has a fixed number of characters (i.e., each entry
has 9 characters), you can use a simple call to substr():

dat <- read.table("clipboard", header = TRUE)  # read the table copied from your email
varleft <- substr(dat$popcode, 1, 6)   # characters 1-6, e.g. "BCPy01"
varright <- substr(dat$popcode, 8, 9)  # characters 8-9, e.g. "01"
datnew <- data.frame(dat, varleft, varright)

 > datnew
      popcode     codetot   p3need varleft varright
1  BCPy01-01 BCPy01-01-1 100.0000  BCPy01       01
2  BCPy01-01 BCPy01-01-2 100.0000  BCPy01       01
3  BCPy01-01 BCPy01-01-3 100.0000  BCPy01       01
4  BCPy01-02 BCPy01-02-1  92.5926  BCPy01       02
5  BCPy01-02 BCPy01-02-1 100.0000  BCPy01       02
6  BCPy01-02 BCPy01-02-2  92.5926  BCPy01       02
7  BCPy01-02 BCPy01-02-2 100.0000  BCPy01       02
8  BCPy01-02 BCPy01-02-3  92.5926  BCPy01       02
9  BCPy01-02 BCPy01-02-3 100.0000  BCPy01       02
10 BCPy01-03 BCPy01-03-1 100.0000  BCPy01       03


You can use a similar construction for codetot.
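
As a rough sketch of that similar construction, assuming every codetot entry has exactly 11 characters laid out as in the example data (six characters, "-", two digits, "-", one digit), the three pieces could be pulled out by position. The vector codetot and the names codeleft, codemid and coderight are made up for illustration and are not part of the original post.

```r
# Hypothetical stand-in for dat$codetot, copied from the example data
codetot <- c("BCPy01-01-1", "BCPy01-02-3", "BCPy01-03-1")

# Fixed positions: characters 1-6, 8-9 and 11 (positions 7 and 10 hold the "-")
codeleft  <- substr(codetot, 1, 6)
codemid   <- substr(codetot, 8, 9)
coderight <- substr(codetot, 11, 11)

data.frame(codetot, codeleft, codemid, coderight)
```

If the pieces can vary in length, a fixed-position approach like this no longer fits, and splitting on the "-" is probably the safer route.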

I hope this helps,

Francisco

Francisco J. Zagmutt
Vose Consulting
2891 20th Street
Boulder, CO, 80304
USA
[hidden email]
www.voseconsulting.com

Re: Split a character variable into several character variables by a character

Darren Norris
Just an alternative: try gsub() to perform the split at "-".
Strictly speaking it is not splitting, but substituting everything before or after the "-" with nothing.
(I have just seen that this won't be a good solution for your "codetot" column, which contains two "-" characters, so see the strsplit example below as well!)
See ?gsub

from Francisco's example:
 
dat <- read.table("clipboard", header = TRUE)  # read from your email
gsub("-.*", "", dat$popcode)  # gives the BCPy01 part of column popcode
gsub(".*-", "", dat$popcode)  # gives the 01 part of column popcode

then to add these vectors as columns to your dataframe:

dat$popcodeStart<-gsub("-.*","",dat$popcode)
dat$popcodeEnd<-gsub(".*-","",dat$popcode)

dat
     popcode     codetot   p3need varleft varright popcodeStart popcodeEnd
1  BCPy01-01 BCPy01-01-1 100.0000  BCPy01        1       BCPy01         01
2  BCPy01-01 BCPy01-01-2 100.0000  BCPy01        1       BCPy01         01
3  BCPy01-01 BCPy01-01-3 100.0000  BCPy01        1       BCPy01         01
4  BCPy01-02 BCPy01-02-1  92.5926  BCPy01        2       BCPy01         02
5  BCPy01-02 BCPy01-02-1 100.0000  BCPy01        2       BCPy01         02
6  BCPy01-02 BCPy01-02-2  92.5926  BCPy01        2       BCPy01         02
7  BCPy01-02 BCPy01-02-2 100.0000  BCPy01        2       BCPy01         02
8  BCPy01-02 BCPy01-02-3  92.5926  BCPy01        2       BCPy01         02
9  BCPy01-02 BCPy01-02-3 100.0000  BCPy01        2       BCPy01         02
10 BCPy01-03 BCPy01-03-1 100.0000  BCPy01        3       BCPy01         03
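
To make the caveat about "codetot" concrete, here is a small sketch of what the two patterns do when a value contains two "-" characters. The vector x and the names first, last and middle are made up for illustration; the last pattern is only one possible way to reach the middle piece and assumes exactly three "-"-separated parts.

```r
# Hypothetical values: one popcode-style and one codetot-style entry
x <- c("BCPy01-01", "BCPy01-01-1")

# "-.*" matches from the FIRST "-" to the end of the string
first <- gsub("-.*", "", x)

# ".*-" is greedy, so it matches everything up to the LAST "-"
last <- gsub(".*-", "", x)

# The middle piece of a three-part value needs a capture group
middle <- sub("^[^-]*-([^-]*)-.*$", "\\1", "BCPy01-01-1")

first   # "BCPy01" "BCPy01"
last    # "01" "1"
middle  # "01"
```

With only these two gsub() calls the "01" in the middle of "BCPy01-01-1" is never returned, which is why splitting on "-" is the more natural choice for codetot.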

or with strsplit:

dat <- read.table("clipboard", header = TRUE)  # read from your email
# strsplit() returns a list; rbind the pieces into a character matrix
newdat <- do.call("rbind", strsplit(as.character(dat$codetot), "-"))
colnames(newdat) <- c("codetotStart", "codetotMid", "codetotEnd")  # add column names
newd <- data.frame(dat, newdat)  # create new data frame
newd
     popcode     codetot   p3need codetotStart codetotMid codetotEnd
1  BCPy01-01 BCPy01-01-1 100.0000       BCPy01         01          1
2  BCPy01-01 BCPy01-01-2 100.0000       BCPy01         01          2
3  BCPy01-01 BCPy01-01-3 100.0000       BCPy01         01          3
4  BCPy01-02 BCPy01-02-1  92.5926       BCPy01         02          1
5  BCPy01-02 BCPy01-02-1 100.0000       BCPy01         02          1
6  BCPy01-02 BCPy01-02-2  92.5926       BCPy01         02          2
7  BCPy01-02 BCPy01-02-2 100.0000       BCPy01         02          2
8  BCPy01-02 BCPy01-02-3  92.5926       BCPy01         02          3
9  BCPy01-02 BCPy01-02-3 100.0000       BCPy01         02          3
10 BCPy01-03 BCPy01-03-1 100.0000       BCPy01         03          1
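
The same idea should carry over to popcode. Below is a minimal, self-contained sketch that types in a few rows of the example data directly (so it does not depend on the clipboard) and checks that every entry splits into the same number of pieces before binding them. The object names parts and pieces are illustrative only, and the data are shortened from the original post.

```r
# A few rows of the example data, typed in so the sketch is reproducible
dat <- read.table(text = "
popcode codetot p3need
BCPy01-01 BCPy01-01-1 100.0000
BCPy01-02 BCPy01-02-1 92.5926
BCPy01-03 BCPy01-03-1 100.0000
", header = TRUE, stringsAsFactors = FALSE)

# fixed = TRUE treats "-" as a literal character rather than a regular expression
parts <- strsplit(dat$popcode, "-", fixed = TRUE)

# rbind only gives a clean matrix if every entry has the same number of pieces
stopifnot(all(lengths(parts) == 2))
pieces <- do.call(rbind, parts)

dat$popcodeStart <- pieces[, 1]
dat$popcodeEnd   <- pieces[, 2]
dat
```

The stopifnot() check is a design choice rather than a requirement: without it, do.call(rbind, ...) would silently recycle values when some entries contain more or fewer "-" characters than others.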

Hope that helps,
Darren


<quote author="Francisco Zagmutt">
Hello Mao,



dat<-read.table("clipboard", header=T)#Read from your email
varleft<-substr(dat$popcode,0,6)
varright<-substr(dat$popcode,8,9)
datnew<-data.frame(dat,varleft,varright)

 > datnew