Hi,

I am sorry. I didn't test it properly.

Please check whether this works (although you already have Hervé's solution).

For ##2

eyaSpl2 <- rep("#",sum(length(eyaSpl),length(CDS1[,1]))) ##as in previous code

indx <- CDS1[,1]+rep(seq(0,length(CDS1[,1]),by=2),each=2)[-c(1,40)]

eyaSpl2[-indx] <- eyaSpl
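To see why the offsets work, here is a minimal sketch on toy data. The 10-character sequence and the positions in `pos` are made up for illustration; `pos` stands in for `CDS1[,1]`. The hard-coded `40` above appears to be `length(CDS1[,1]) + 2` for 38 positions, so the sketch writes it as `n + 2`, which is an assumption about the intent.

```r
# Toy data (hypothetical): a 10-character sequence and two position pairs.
eyaSpl <- strsplit("abcdefghij", "")[[1]]
pos    <- c(3, 5, 8, 9)   # pairs (3,5) and (8,9); stands in for CDS1[,1]
n      <- length(pos)

# Each start marker is shifted by the '#' inserted before it (0, 2, 4, ...).
# Each end marker is shifted by those, plus its own opening '#', plus one
# because it sits after the end position (2, 4, 6, ...).
offset <- rep(seq(0, n, by = 2), each = 2)[-c(1, n + 2)]
indx   <- pos + offset    # 3 7 10 13

out <- rep("#", length(eyaSpl) + n)
out[-indx] <- eyaSpl      # fill every non-marker slot with the original characters
toy2 <- paste(out, collapse = "")
toy2
# [1] "ab#cde#fg#hi#j"
```

The `#` lands right before position 3 and right after position 5, and likewise around positions 8 and 9.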

###testing

indx2 <- which(eyaSpl2=="#")

lst1 <- lapply(split(CDS1[,1],((seq_along(CDS1[,1])-1)%/%2)+1),function(x) paste(eyaSpl[(x[1]-1):(x[2]+1)],collapse=""))

lst2 <- lapply(split(indx2,((seq_along(indx2)-1)%/%2)+1),function(x) paste(eyaSpl2[(x[1]-1):(x[2]+1)],collapse=""))

lst1[[1]]

#[1] "fapkkaakafmfffakkannpaaapkacfaapfdk"

lst2[[1]]

#[1] "f#apkkaakafmfffakkannpaaapkacfaapfd#k"

####

lst1[[2]]

#[1] "kkpaaakaaaafkpkfbfakaaofakapkpppfcgaanfpfakaappffakk"

lst2[[2]]

#[1] "k#kpaaakaaaafkpkfbfakaaofakapkpppfcgaanfpfakaappffak#k"

####

lst1[[19]]

#[1] "kfafaafapkfffpphpkkakkapapfeaknfafpfckaffpfhpkkfpfpefahfaakfafpkkaappakakpapppkpaaf"

lst2[[19]]

#[1] "k#fafaafapkfffpphpkkakkapapfeaknfafpfckaffpfhpkkfpfpefahfaakfafpkkaappakakpapppkpaa#f"

A.K.

Hi A.K., thanks for your help. I have some follow-up queries.

For ##2, the code doesn't seem to get exactly what I was after. For example, for the first position pair, the code generates:

a#fapkkaakafmfffakkannpaaapkacfaapf#dk

whereas the # signs should be around this:

af#apkkaakafmfffakkannpaaapkacfaapfd#k

The positions of # are also slightly off for the later position pairs.

On Thursday, January 23, 2014 2:04 PM, arun <[hidden email]> wrote:

Hi,

Try:

CDS1 <- read.table("CDS coordinates.txt",header=FALSE)

CDS2 <- split(CDS1[,1],as.numeric(as.character(gl(nrow(CDS1),2,length=nrow(CDS1)))))

eya4 <- readChar("eya4_lagan_HM_cp.txt",file.info("eya4_lagan_HM_cp.txt")$size)

eyaSpl<- head(strsplit(eya4,"")[[1]],-1)

length(eyaSpl)

#[1] 311522

eyaSpl1 <- eyaSpl

##1

for(i in seq_along(CDS2)){

## overwrite every character in the i-th range with "#"

eyaSpl1[seq(CDS2[[i]][1],CDS2[[i]][2],by=1)] <- "#"

}
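Note that the loop above overwrites each character in a range with its own "#", so the sequence keeps its length. If Query 1 is read literally (delete the range and put a single # in its place), one possible sketch is below. The toy sequence and the pairs in `CDS2` are hypothetical, and this is only one way to do it, not necessarily what was intended.

```r
# Toy data (hypothetical): a 10-character sequence and two position pairs.
eyaSpl <- strsplit("abcdefghij", "")[[1]]
CDS2   <- list(c(3, 5), c(8, 9))

res <- eyaSpl
# Work from the last pair to the first so that earlier positions stay valid
# while the vector shrinks.
for (p in rev(CDS2)) {
  res <- c(head(res, p[1] - 1), "#", tail(res, length(res) - p[2]))
}
toy1 <- paste(res, collapse = "")
toy1
# [1] "ab#fg#j"
```

Processing the pairs in reverse order avoids having to track how much the sequence has already been shortened.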

##2

eyaSpl2 <- rep("#",sum(length(eyaSpl),length(CDS1[,1])))

vec1 <- unlist(lapply(CDS2,function(x) c(x[1]-1,x[2]+1)),use.names=FALSE)

eyaSpl2[-vec1] <- eyaSpl

eyaSpl2New <- paste(eyaSpl2,collapse="")
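A quick check of the ##2 indexing on toy data can help verify where the # characters end up. The sequence and pairs below are made up for illustration; the code otherwise follows the same steps as above.

```r
# Toy data (hypothetical): a 10-character sequence and two position pairs.
eyaSpl <- strsplit("abcdefghij", "")[[1]]
CDS2   <- list(c(3, 5), c(8, 9))

vec1 <- unlist(lapply(CDS2, function(x) c(x[1] - 1, x[2] + 1)), use.names = FALSE)
out  <- rep("#", length(eyaSpl) + length(vec1))
out[-vec1] <- eyaSpl
got  <- paste(out, collapse = "")
got
# [1] "a#bcd##ef#ghij"

# What Query 2 asks for on this toy input: '#' before 3, after 5, before 8, after 9.
want <- "ab#cde#fg#hi#j"
identical(got, want)
# [1] FALSE
```

On this input the markers do not land where Query 2 wants them, because `vec1` holds positions in the original sequence and does not account for the shift caused by each # already inserted.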

A.K.

I have a data file here, which is imported into R by:

eya4_lagan_HM_cp <- "E:/blahblah/eya4_lagan_HM_cp.txt"

eya4_lagan_HM_cp <- readChar(eya4_lagan_HM_cp, file.info(eya4_lagan_HM_cp)$size)

Label the first character of the sequence as position "1" and the last character as position "311,522" (note the sequence contains 311,522 characters in total). I have two closely related queries.

**Query 1)**

Now I have a data file with a list of positions here. The positions are read in "pairs"; take the first pair, 44184 and 44216, as an example. I wish to delete the subsequence from position 44184 (inclusive) to position 44216 (inclusive) from the previous sequence `eya4_lagan_HM_cp` and, in its place, insert the character #. In other words, substitute the subsequence from 44184 to 44216 with #. I would like to do this with the rest of the pairs, that is, for 151795 and 151844, I want to delete from position 151795 (inclusive) to 151844 (inclusive) in `eya4_lagan_HM_cp` and replace it with #, and so on.

**Query 2)**

Now I would like to do something slightly different with the data file with the list of positions. Take the first pair as an example again. I would like to insert a # right before position 44184, in other words, insert a # between positions 44183 and 44184 in `eya4_lagan_HM_cp`, and then I would like to insert a # right after position 44216, i.e., insert a # between positions 44216 and 44217. I would like to repeat this procedure for all position pairs. So for the next pair, I would like a # right before 151795 and a # right after 151844.

Thank you.

______________________________________________

[hidden email] mailing list

https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.