Survival Analysis

Survival Analysis

Hadii
Hello guys, this problem was never answered, and I happened to come across
the same problem. Kindly help. This is a simple R program that I have been
trying to run. I keep running into the "singular matrix" warning and end up
with no sensible results. Can anyone suggest any changes or a way around
this?

I am a total rookie when working with R.

Thanks,
Haddison

> library(survival)
Loading required package: splines
> args(coxph)
function (formula, data, weights, subset, na.action, init, control,
    method = c("efron", "breslow", "exact"), singular.ok = TRUE,
    robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
NULL
> test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working
Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
> sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
data=test1)
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2 ; beta may be infinite.
2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen,  :
  X matrix deemed to be singular; variable 3
> summary(sondes)
Call:
coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
    DODamen, data = test1)

  n= 1737, number of events= 58
   (1 observation deleted due to missingness)

               coef  exp(coef)   se(coef)  z Pr(>|z|)
DOLoomis -2.152e+00  1.163e-01  1.161e+05  0        1
DOI55     4.560e-01  1.578e+00  3.755e+04  0        1
DODamen          NA         NA  0.000e+00 NA       NA

         exp(coef) exp(-coef) lower .95 upper .95
DOLoomis    0.1163     8.5995         0       Inf
DOI55       1.5777     0.6338         0       Inf
DODamen         NA         NA        NA        NA

Concordance= 0.5  (se = 0 )
Rsquare= 0   (max possible= 0.01 )
Likelihood ratio test= 0  on 2 df,   p=1
Wald test            = 0  on 2 df,   p=1
Score (logrank) test = 0  on 2 df,   p=1
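
One way to investigate warnings like the two above is to check the covariates before fitting. The sketch below uses made-up data: the column names mirror the post, but every value is simulated, and the helper `varies_in_risk_sets` is a hypothetical name introduced here. It uses base R only and illustrates two checks; it is not a diagnosis of the original data set.

```r
set.seed(1)

# Made-up counting-process data: 5 fish followed over 10 intervals.
# Assumption for illustration: the covariates depend only on the interval,
# so at any given time every fish shares the same reading.
d <- expand.grid(fish = 1:5, Stop = 1:10)
d$Start <- d$Stop - 1
do_loomis <- rnorm(10, mean = 8)
do_i55    <- rnorm(10, mean = 7)
d$DOLoomis <- do_loomis[d$Stop]
d$DOI55    <- do_i55[d$Stop]
d$DODamen  <- 0.5 * d$DOLoomis + 0.5 * d$DOI55   # exact linear combination
d$Depart   <- as.integer((d$fish == 1 & d$Stop == 3) |
                         (d$fish == 2 & d$Stop == 6))
# Drop the rows that would follow a departure
d <- d[!(d$fish == 1 & d$Stop > 3) & !(d$fish == 2 & d$Stop > 6), ]

# Check 1: rank of the design matrix (intercept included).
# A rank below the number of columns means at least one covariate is a
# linear combination of the others.
X <- model.matrix(~ DOLoomis + DOI55 + DODamen, data = d)
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")

# Check 2: does a covariate vary among the rows at risk at each event time?
# Returns one logical per distinct event time.
varies_in_risk_sets <- function(data, var) {
  event_times <- unique(data$Stop[data$Depart == 1])
  vapply(event_times, function(t) {
    at_risk <- data$Start < t & data$Stop >= t
    length(unique(data[[var]][at_risk])) > 1
  }, logical(1))
}
varies <- varies_in_risk_sets(d, "DOLoomis")
print(varies)
```

If the rank is smaller than the number of columns, one covariate is redundant given the others. If a covariate never varies within a risk set, every subject at risk contributes the same term to the partial likelihood, so the data carry no information about that coefficient.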

On Wed, 1 May 2019, 1:00 pm , <[hidden email]> wrote:

> Send R-help mailing list submissions to
>         [hidden email]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://stat.ethz.ch/mailman/listinfo/r-help
> or, via email, send a message with subject or body 'help' to
>         [hidden email]
>
> You can reach the person managing the list at
>         [hidden email]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-help digest..."
>
>
> Today's Topics:
>
>    1. Re: Bug in R 3.6.0? (Martin Maechler)
>    2. Re: Bug in R 3.6.0? ([hidden email])
>    3. Time series (trend over time) for irregular sampling dates
>       and multiple sites (Catarina Serra Gonçalves)
>    4. Re:  Time series (trend over time) for irregular sampling
>       dates and multiple sites (Bert Gunter)
>    5. Passing formula as parameter to `lm` within `sapply` causes
>       error [BUG?] (Jens Heumann)
>    6. (no subject) (Haddison Mureithi)
>    7. Help with loop for column means into new column by a subset
>       Factor w/131 levels (Bill Poling)
>    8. Re: Help with loop for column means into new column by a
>       subset Factor w/131 levels (Bill Poling)
>    9. transpose and split dataframe (Matthew)
>   10. Re: transpose and split dataframe (David L Carlson)
>   11. Re: Passing formula as parameter to `lm` within `sapply`
>       causes error [BUG?] (David Winsemius)
>   12. Fwd: Re:  transpose and split dataframe (Matthew)
>   13. Re: transpose and split dataframe (Jim Lemon)
>   14. Re:  Time series (trend over time) for irregular sampling
>       dates and multiple sites (Abs Spurdle)
>   15. Re: Fwd: Re:  transpose and split dataframe (David L Carlson)
>   16. Re: Passing formula as parameter to `lm` within `sapply`
>       causes error [BUG?] (Duncan Murdoch)
>   17. Re:  Time series (trend over time) for irregular sampling
>       dates and multiple sites (Abs Spurdle)
>   18. Re:  Time series (trend over time) for irregular sampling
>       dates and multiple sites (Abs Spurdle)
>   19. Re: Passing formula as parameter to `lm` within `sapply`
>       causes error [BUG?] (Jens Heumann)
>   20. Re: Passing formula as parameter to `lm` within `sapply`
>       causes error [BUG?] (peter dalgaard)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 30 Apr 2019 16:54:10 +0200
> From: Martin Maechler <[hidden email]>
> To: Morgan Morgan <[hidden email]>
> Cc: <[hidden email]>
> Subject: Re: [R] Bug in R 3.6.0?
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> >>>>> Morgan Morgan
> >>>>>     on Mon, 29 Apr 2019 21:42:36 +0100 writes:
>
>     > Hi,
>     > I am using R 3.6.0 on Windows. The issue that I report below does not
>     > exist with previous versions of R.
>     > In order to reproduce the error you must install a package of your choice
>     > from source (tar.gz).
>
>     > -Create a .Rprofile file with the following command in it: setwd("D:/")
>     > -Close your R session and re-open it. Your working directory must now be
>     > set to D:
>     > -Install a package of your choice from source, example :
>     > install.packages("data.table",type="source")
>
>     > In my case the package fails to install and I get the following error
>     > message:
>
>     > ** R
>     > ** inst
>     > ** byte-compile and prepare package for lazy loading
>     > Error in tools:::.read_description(file) :
>     > file 'DESCRIPTION' does not exist
>     > Calls: suppressPackageStartupMessages ... withCallingHandlers ->
>     > .getRequiredPackages -> <Anonymous> -> <Anonymous>
>     > Execution halted
>     > ERROR: lazy loading failed for package 'data.table'
>     > * removing 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>     > * restoring previous
>     > 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>     > Warning in install.packages :
>     > installation of package ‘data.table’ had non-zero exit status
>
>     > Now remove the .Rprofile file, restart your R session and try to
>     > install the package with the same command.
>     > In that case everything should be installed just fine.
>
>     > FYI the issue happens on macOS as well and I suspect it also does on
>     > all linux systems.
>
>     > My question: Is this expected or is it a bug?
>
> It is a bug, thank you very much for reporting it.
>
> I've been told privately by Ömer An (thank you!) who's been
> affected as well, that this problem seems to affect others, and
> that there's a thread about this over at the Rstudio support site
>
>
> https://support.rstudio.com/hc/en-us/community/posts/200704708-Build-tool-does-not-recognize-DESCRIPTION-file
>
> There, users mention that (all?) packages are affected which
> have a multiline 'Description:' field in their DESCRIPTION file.
> Of course, many if not most packages have this property.
>
> Indeed, I can reproduce the problem (e.g. with my 'sfsmisc'
> package) if I ("silly enough to") add a setwd() call to my
> Rprofile file  (the one I set via env.var  R_PROFILE or R_PROFILE_USER).
>
> This is clearly a bug, and indeed a bad one.
>
> It seems all R core (and other R expert users who have tried R
> 3.6.0 alpha, beta, and RC versions) have *not* seen the bug as they
> are intuitively smart not to mess with R's working directory in
> a global R profile file ...
>
> For now you definitively have to work around by not doing what's
> the problem : do *NOT* setwd() in your  ~/.Rprofile or other
> such R init files.
>
> Best,
> Martin Maechler
> ETH Zurich and  R Core Team
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 30 Apr 2019 16:15:46 +0200
> From: <[hidden email]>
> To: "'Morgan Morgan'" <[hidden email]>,
>         <[hidden email]>
> Subject: Re: [R] Bug in R 3.6.0?
> Message-ID: <002d01d4ff5f$34816be0$9d8443a0$@free.fr>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I have exactly the same problem when I install one of my own packages:
>
> Error in tools:::.read_description(file) :
>   file 'DESCRIPTION' does not exist
> Calls: suppressPackageStartupMessages ... withCallingHandlers ->
> .getRequiredPackages -> <Anonymous> -> <Anonymous>
> Exécution arrêtée
> ERROR: lazy loading failed for package 'RRegArch'
>
> Best,
> Ollivier
>
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 1 May 2019 00:57:43 +1000
> From: Catarina Serra Gonçalves <[hidden email]>
> To: [hidden email]
> Subject: [R] Time series (trend over time) for irregular sampling
>         dates and multiple sites
> Message-ID:
>         <
> [hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> I have a dataset of marine debris items (number of items standardized per
> effort: Items/(number of volunteers*Hours*Lenght)) taken from 2 main
> locations (WA and Queensland) in Australia (8 sub-sites in total: 4 in WA
> and 4 in Queensland) at irregular sampling intervals over a period of 15
> years.
>
> I want to test whether the amount of debris in these locations has changed
> over the years and, more specifically, whether it changed after the
> implementation of a mitigation strategy (in 2013).
>
> Here’s the head of the data: [image: https://i.stack.imgur.com/VNIpb.png]
>
> Description of each of the variables in the data frame:
>
> *eventid* = each sampling (clean-up) event
>
> *Location* = Queensland and New South Wales
>
> *Sites* = all the 9 sampling beaches
>
> *Date* = specific dates for the clean-up events (day-month-year)
>
> *Date1* = specific dates for the clean-up events (day-month-year) in
> POSIXct format
>
> *Year* = year of the sampling event (2004 to 2018)
>
> *Month* = month of the sampling event (jan to dec)
>
> *nMonth* = the number assigned to the month of the sampling event (1 to 12)
>
> *Day* = day of sampling (1 to 31)
>
> *Days* = days since the first clean-up date (just another way of using
> the dates)
>
> *MARPOL* = before and after implementation (factor with 2 levels)
>
> *DaysC* = days between sampling events for the same site, i.e. the number
> of days since the previous clean-up event
>
> *DaysI* = days since intervention; all dates before implementation are
> zero, and afterwards we count the number of days since the implementation
> date (1 jan 2013)
>
> *DaysIa* = same as DaysI, but with negative values (days) instead of zero
> before the intervention
>
> *Items* = number of fishing and shipping items counted in each clean-up
> event
>
> *Hours* = hours spent by all volunteers together at each clean-up event
>
> *Lenght* = length of beach sampled by all volunteers together at each
> clean-up event
>
> *volunteers* = all volunteers at each clean-up event
>
> *HoursVolunteer* = hours spent by each volunteer at each clean-up event
> (Hours/volunteers)
>
> *Ieffort* = the items standardized by the effort (hours, volunteers and
> length)
>
> *GrossWeight* and *GrossTotal* are not relevant
> ------------------------------
> Problems:
>
> My data has a few problems: (1) I think I will need to fix the effects of
> seasonal variation (Monthly) and (2) of possible spatial correlation
> (probability of finding an item is higher after finding one since they can
> come from the same ship). (3) How do I handle the fact that the
> measurements were not taken at a regular interval?
>
> I was trying to use GAMs to analyse the data and see the trends over time.
> The model I came across is the following:
>
> m4<- gamm(Ieffort ~ s(DaysIa)+MARPOL+ s(nMonth, bs = "ps", k = 12),
> random=list(Site=~1,Location=~1),data = d)
>
> *thank you in advance.*
> -
> *Catarina Serra Gonçalves *
> PhD candidate
>
> Adrift Lab  <https://adriftlab.org>
> University of Tasmania <http://www.utas.edu.au/> | Institute for Marine
> and
> Antarctic Studies  <http://www.imas.utas.edu.au/>
> Launceston, TAS | Australia
>
> Personal website <https://catarinasg.wixsite.com/acserra>
> <https://catarinasg.wixsite.com/acserra>| E-mail  <[hidden email]> |
> Twitter <https://twitter.com/CatarinaSerraG>
> Research Gate
> <https://www.researchgate.net/profile/Catarina_Serra_Goncalves> | Google
> Scholar <https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 30 Apr 2019 08:28:37 -0700
> From: Bert Gunter <[hidden email]>
> To: Catarina Serra Gonçalves <[hidden email]>
> Cc: R-help <[hidden email]>
> Subject: Re: [R]  Time series (trend over time) for irregular sampling
>         dates and multiple sites
> Message-ID:
>         <CAGxFJbT2YSB1xcs0MajpeqtHbbn4T1ycYoSOBEFvMucFme1t=
> [hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> I have 0 expertise, but I suggest that you check out the SpatioTemporal
> task view on CRAN (or possibly others, like Environmetrics). You might also
> want to move this to the R-sig-geo list, where you are more likely
> to find relevant expertise.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 30 Apr 2019 17:24:33 +0200
> From: Jens Heumann <[hidden email]>
> To: <[hidden email]>
> Subject: [R] Passing formula as parameter to `lm` within `sapply`
>         causes error [BUG?]
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hi,
>
> `lm` won't take formula as a parameter when it is within a `sapply`; see
> example below. Please, could anyone either point me to a syntax error or
> confirm that this might be a bug?
>
> Best,
> Jens
>
> [Disclaimer: This is my first post here, following advice of how to
> proceed with possible bugs from here: https://www.r-project.org/bugs.html]
>
>
> SUMMARY
>
> While `lm` alone accepts the formula parameter `FO` without problems, the
> same call within `sapply` causes an error. When everything except the
> formula `FO` is passed as a parameter, it still works. All parameters work
> fine within a similar `for` loop.
>
>
> MCVE (see data / R-version at bottom)
>
>  > summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
>    Estimate Std. Error    t value   Pr(>|t|)
>   1.6269038  0.9042738  1.7991275  0.3229600
>  > summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
>    Estimate Std. Error    t value   Pr(>|t|)
>   1.6269038  0.9042738  1.7991275  0.3229600
>  > sapply(unique(df1$z), function(s)
> +   summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
>                  [,1]       [,2]         [,3]
> Estimate   1.6269038 -0.1404174 -0.010338774
> Std. Error 0.9042738  0.4577001  1.858138516
> t value    1.7991275 -0.3067890 -0.005564049
> Pr(>|t|)   0.3229600  0.8104951  0.996457853
>  > sapply(unique(data[[st]]), function(s)
> +   summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])  # !!!
> Error in eval(substitute(subset), data, env) : object 's' not found
>  > sapply(unique(data[[st]]), function(s)
> +   summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
>                  [,1]       [,2]         [,3]
> Estimate   1.6269038 -0.1404174 -0.010338774
> Std. Error 0.9042738  0.4577001  1.858138516
> t value    1.7991275 -0.3067890 -0.005564049
> Pr(>|t|)   0.3229600  0.8104951  0.996457853
>  > m <- matrix(NA, 4, length(unique(data[[st]])))
>  > for (s in unique(data[[st]])) {
> +   m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
> + }
>  > m
>            [,1]       [,2]         [,3]
> [1,] 1.6269038 -0.1404174 -0.010338774
> [2,] 0.9042738  0.4577001  1.858138516
> [3,] 1.7991275 -0.3067890 -0.005564049
> [4,] 0.3229600  0.8104951  0.996457853
>
> # DATA #################################################################
>
> df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
> 0.363128411337339,
> 0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894,
> -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
> 0.740171482827397, 2.64977380403845, -0.755998096151299, 0.125479556323628,
> -0.239445852485142, 2.14747239550901, -0.37891195982917, -0.638031707027734
> ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
> 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA,
> -9L))
>
> FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1
>
> ########################################################################
>
>  > R.version
>                 _
> platform       x86_64-w64-mingw32
> arch           x86_64
> os             mingw32
> system         x86_64, mingw32
> status
> major          3
> minor          6.0
> year           2019
> month          04
> day            26
> svn rev        76424
> language       R
> version.string R version 3.6.0 (2019-04-26)
> nickname       Planting of a Tree
>
> #########################################################################
>
> NOTE: Question on SO two days ago
> (
> https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
>
> brought many views but neither answer nor bug confirmation.
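
The help page for `lm` states that variables not found in `data` are taken from `environment(formula)`. Under that reading, `FO` is created at top level, so the `subset` expression cannot see `s`, which exists only inside the anonymous function; in the `for` loop `s` is a top-level variable and is found. A minimal sketch of one workaround, assuming that explanation applies here: the data are simulated and `fit_by_group` is a hypothetical helper name.

```r
set.seed(42)

# Simulated stand-in for df1: 3 groups of 10 rows each
df1 <- data.frame(
  x = rnorm(30),
  y = rnorm(30),
  z = rep(1:3, each = 10),
  w = runif(30, min = 0.5, max = 1.5)
)
FO <- y ~ x   # created here, so its environment does not contain `s`

# Hypothetical helper: re-home the formula in the function's own environment
# so that `subset` and `weights` can see the local variables `s` and `data`.
fit_by_group <- function(s, fo, data) {
  environment(fo) <- environment()
  fit <- lm(fo, data = data, subset = data[["z"]] == s,
            weights = data[["w"]])
  coef(summary(fit))[1, ]   # intercept row, as in the post
}

res <- sapply(unique(df1$z), fit_by_group, fo = FO, data = df1)
print(res)
```

Changing the formula's environment only affects the local copy `fo`; the original `FO` is left untouched.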
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 29 Apr 2019 21:38:00 +0300
> From: Haddison Mureithi <[hidden email]>
> To: [hidden email]
> Subject: [R] (no subject)
> Message-ID:
>         <CABVwvn6y_M2M1o41HryKYp=
> [hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> Hello guys, this problem was never answered, and I happened to come across
> the same problem. Kindly help. This is a simple R program that I have been
> trying to run. I keep running into the "singular matrix" warning and end up
> with no sensible results. Can anyone suggest any changes or a way around
> this?
>
> I am a total rookie when working with R.
>
> Thanks,
> Rasika
>
> > library(survival)
> Loading required package: splines
> > args(coxph)
> function (formula, data, weights, subset, na.action, init, control,
>     method = c("efron", "breslow", "exact"), singular.ok = TRUE,
>     robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
> NULL
> > test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working
> Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
> > sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
> data=test1)
> Warning messages:
> 1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>   Loglik converged before variable  1,2 ; beta may be infinite.
> 2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen,  :
>   X matrix deemed to be singular; variable 3
> > summary(sondes)
> Call:
> coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
>     DODamen, data = test1)
>
>   n= 1737, number of events= 58
>    (1 observation deleted due to missingness)
>
>                coef  exp(coef)   se(coef)  z Pr(>|z|)
> DOLoomis -2.152e+00  1.163e-01  1.161e+05  0        1
> DOI55     4.560e-01  1.578e+00  3.755e+04  0        1
> DODamen          NA         NA  0.000e+00 NA       NA
>
>          exp(coef) exp(-coef) lower .95 upper .95
> DOLoomis    0.1163     8.5995         0       Inf
> DOI55       1.5777     0.6338         0       Inf
> DODamen         NA         NA        NA        NA
>
> Concordance= 0.5  (se = 0 )
> Rsquare= 0   (max possible= 0.01 )
> Likelihood ratio test= 0  on 2 df,   p=1
> Wald test            = 0  on 2 df,   p=1
> Score (logrank) test = 0  on 2 df,   p=1
>
>         [[alternative HTML version deleted]]
>
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 30 Apr 2019 16:50:48 +0000
> From: Bill Poling <[hidden email]>
> To: "r-help ([hidden email])" <[hidden email]>
> Subject: [R] Help with loop for column means into new column by a
>         subset Factor w/131 levels
> Message-ID:
>         <
> [hidden email]
> >
>
> Content-Type: text/plain; charset="windows-1252"
>
> Good afternoon.
>
> #RStudio Version 1.1.456
> sessionInfo()
> #R version 3.5.3 (2019-03-11)
> #Platform: x86_64-w64-mingw32/x64 (64-bit)
> #Running under: Windows >= 8 x64 (build 9200)
>
>
>
> #I have a DF of 8 columns and 14025 rows
>
> str(hcd2tmp2)
>
> # 'data.frame':14025 obs. of  8 variables:
> # $ Submitted_Charge: num  21021 15360 40561 29495 7904 ...
> # $ Allowed_Amt     : num  18393 6254 40561 29495 7904 ...
> # $ Submitted_Units : num  60 240 420 45 120 215 215 15 57 2 ...
> # $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117 125 24 85 85 90 86 25 ...
> # $ AllowByLimit    : num  4.268 0.949 7.913 6.124 3.524 ...
> # $ UnitsByDose     : num  600 240 420 450 120 215 215 750 570 500 ...
> # $ LimitByUnits    : num  4310 6591 5126 4816 2243 ...
> # $ HCPCSCodeDose1  : num  10 1 1 10 1 1 1 50 10 250 ...
>
> #I would like to create four additional columns that are the means of four
> #current columns in the DF.
> #Current columns
> #Allowed_Amt
> #LimitByUnits
> #AllowByLimit
> #UnitsByDose
>
> #The goal is to be able to identify rows where (for instance) Allowed_Amt
> #is greater than the average (aka outliers).
>
> #The trick is that I want the means of those columns based on a Factor value
> #The Factor is:
> #Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"
>
> #So each of my four new columns will have 131 distinct values based on the
> #mean for the specific Procedure_Code1 grouping
>
> #In SQL it would look something like this:
>
> #SELECT *,
> # NewCol1 = AVG(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
> # NewCol2 = AVG(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
> # NewCol3 = AVG(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
> # NewCol4 = AVG(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
> #INTO NewTable
> #FROM Oldtable
>
> #Here are some sample data
>
> head(hcd2tmp2, n=40)
> #    Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1 AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
> #  1         21020.70    18393.12              60           J1745    4.2679810         600      4309.56             10
> #  2         15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
> #  3         40561.32    40561.32             420           J9306    7.9133539         420      5125.68              1
> #  4         29495.25    29495.25              45           J9355    6.1244417         450      4815.99             10
> #  5          7904.30     7904.30             120           J0897    3.5243000         120      2242.80              1
> #  6         15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
> #  7         15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
> #  8           461.90        0.00              15           J9045    0.0000000         750        46.38             50
> #  9         27340.96    15092.21              57           J9035    3.2600227         570      4629.48             10
> # 10           768.00      576.00               2           J1190    1.3617343         500       422.99            250
> # 11           101.00       38.38               5           J2250   59.9687500           5         0.64              1
> # 12         17458.40        0.00             200           J9033    0.0000000         200      5990.00              1
> # 13          7885.10     7569.70               1           J1745  105.3835445          10        71.83             10
> # 14          2015.00     1155.78               4           J2785    5.0051100           0       230.92              0
> # 15           443.72      443.72              12           J9045   11.9601078         600        37.10             50
> # 16        113750.00   113750.00             600           J2350    3.3025003         600     34443.60              1
> # 17          3582.85     3582.85              10           J2469   30.5573561         250       117.25             25
> # 18          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
> # 19          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
> # 20         39664.09        0.00              74           J9355    0.0000000         740      7919.63             10
> # 21           166.71      102.53               9           J9045    3.6841538         450        27.83             50
> # 22         13823.61     9676.53               1           J2505    2.0785247           6      4655.48              6
> # 23         90954.00    26436.53             360           J1786    1.7443775        3600     15155.28             10
> # 24          4800.00     3494.40             800           J3262    0.8861838         800      3943.20              1
> # 25           216.00      105.84               4           J0696   42.3360000        1000         2.50            250
> # 26          5300.00     4770.00               1           J0178    4.9677151           1       960.20              1
> # 27         35203.00    35203.00             200           J9271    3.5772498         200      9840.80              1
> # 28         17589.15    17589.15             300           J3380    2.9696855         300      5922.90              1
> # 29         18394.64    17842.79               1           J9355  166.7238834          10       107.02             10
> # 30           770.00      731.50              10           J2469    6.2388060         250       117.25             25
> # 31           461.90        0.00              15           J9045    0.0000000         750        46.38             50
> # 32          8160.00     3342.40              80           J1459    1.0260818       40000      3257.44            500
> # 33          1653.48      314.16               6           J9305    0.7661505          60       410.05             10
> # 34         13036.50        0.00             194           J9034    0.0000000         194      4652.31              1
> # 35         10486.87        0.00             156           J9034    0.0000000         156      3741.04              1
> # 36         15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
> # 37          1616.83     1616.83             150           J1453    5.2528590         150       307.80              1
> # 38         80685.74    34772.43              96           J9035    4.4597077         960      7797.02             10
> # 39         85220.58    35925.13             287           J9299    4.5577715         287      7882.17              1
> # 40          3860.17     1627.27              13           J9299    4.5577963          13       357.03              1
>
>
> #I hope this is enough information to warrant your support
> #Thank you
> #WHP
>
>
>
> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Tue, 30 Apr 2019 18:45:40 +0000
> From: Bill Poling <[hidden email]>
> To: "r-help ([hidden email])" <[hidden email]>
> Subject: Re: [R] Help with loop for column means into new column by a
>         subset Factor w/131 levels
> Message-ID:
>         <
> [hidden email]
> >
>
> Content-Type: text/plain; charset="windows-1252"
>
> I ran this routine but I was thinking there must be a more elegant way of
> doing this.
>
>
> #
> https://community.rstudio.com/t/how-to-average-mean-variables-in-r-based-on-the-level-of-another-variable-and-save-this-as-a-new-variable/8764/8
>
> hcd2tmp2_summmary <- hcd2tmp2 %>%
>   group_by(Procedure_Code1) %>%
>   summarize(average = mean(Allowed_Amt))
> # A tibble: 131 x 2
> # Procedure_Code1 average
> # <fct>             <dbl>
> # 1 A9606            57785.
> # 2 J0129             5420.
> # 3 J0178             4700.
> # 4 J0180            13392.
> # 5 J0202            56328.
> # 6 J0256            17366.
> # 7 J0257             7563.
> # 8 J0485             2450.
> # 9 J0490             6398.
> # 10 J0585            4492.
> # ... with 121 more rows
>
> hcd2tmp2 <- hcd2tmp %>%
>   group_by(Procedure_Code1) %>%
>   summarise(Avg_Allowed_Amt = mean(Allowed_Amt))
>
> view(hcd2tmp2)
>
>
> hcd2tmp3 <- hcd2tmp %>%
>   group_by(Procedure_Code1) %>%
>   summarise(Avg_AllowByLimit = mean(AllowByLimit))
>
> view(hcd2tmp3)
>
>
> hcd2tmp4 <- hcd2tmp %>%
>   group_by(Procedure_Code1) %>%
>   summarise(Avg_UnitsByDose = mean(UnitsByDose))
>
> view(hcd2tmp4)
>
> hcd2tmp5 <- hcd2tmp %>%
>   group_by(Procedure_Code1) %>%
>   summarise(Avg_LimitByUnits = mean(LimitByUnits))
>
> view(hcd2tmp5)
>
> #Joins----
>
>
> hcd2tmp <- left_join(hcd2tmp2, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
> hcd2tmp <- left_join(hcd2tmp3, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
> hcd2tmp <- left_join(hcd2tmp4, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
> hcd2tmp <- left_join(hcd2tmp5, hcd2tmp, by =
> c("Procedure_Code1"="Procedure_Code1"))
>
> view(hcd2tmp)
>
> hcd2tmp$Avg_LimitByUnits <- round(hcd2tmp$Avg_LimitByUnits, digits = 2)
> hcd2tmp$Avg_Allowed_Amt <- round(hcd2tmp$Avg_Allowed_Amt, digits = 2)
> hcd2tmp$Avg_AllowByLimit <- round(hcd2tmp$Avg_AllowByLimit, digits = 2)
> hcd2tmp$Avg_UnitsByDose <- round(hcd2tmp$Avg_UnitsByDose, digits = 2)
>
> view(hcd2tmp)
>
> #Over under columns----
> hcd2tmp$AllowByLimitFlag <- hcd2tmp$AllowByLimit > hcd2tmp$Avg_AllowByLimit
> hcd2tmp$LimitByUnitsFlag <- hcd2tmp$LimitByUnits > hcd2tmp$Avg_LimitByUnits
> hcd2tmp$Allowed_AmtFlag  <- hcd2tmp$Allowed_Amt  > hcd2tmp$Avg_Allowed_Amt
> hcd2tmp$UnitsByDoseFlag  <- hcd2tmp$UnitsByDose  > hcd2tmp$Avg_UnitsByDose
>
> view(hcd2tmp)
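
A possible shortcut for the summarise-and-join steps above, offered as a minimal sketch rather than the poster's own method: base R's ave() returns each group's mean repeated on every row of that group, so the average and flag columns can be added without any joins. The data frame hcd and its values are a hypothetical stand-in for hcd2tmp, and only two of the four columns are shown.

```r
# Hypothetical stand-in for hcd2tmp (made-up subset, two of the four columns)
hcd <- data.frame(
  Procedure_Code1 = factor(c("J9045", "J9045", "J9299", "J9299", "J9299")),
  Allowed_Amt     = c(0, 443.72, 6254.40, 35925.13, 1627.27),
  UnitsByDose     = c(750, 600, 240, 287, 13)
)

cols <- c("Allowed_Amt", "UnitsByDose")
for (col in cols) {
  # ave() gives the per-group mean, repeated for every row of the group
  avg <- round(ave(hcd[[col]], hcd$Procedure_Code1, FUN = mean), 2)
  hcd[[paste0("Avg_", col)]] <- avg
  # TRUE where the row's value is above its group's average
  hcd[[paste0(col, "Flag")]] <- hcd[[col]] > avg
}

print(hcd)
```

With dplyr, group_by(Procedure_Code1) followed by mutate() instead of summarise() should give the same per-row group means.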
>
>
> -----Original Message-----
> From: Bill Poling
> Sent: Tuesday, April 30, 2019 12:51 PM
> To: r-help ([hidden email]) <[hidden email]>
> Cc: Bill Poling <[hidden email]>
> Subject: Help with loop for column means into new column by a subset
> Factor w/131 levels
>
> Good afternoon.
>
> #RStudio Version 1.1.456
> sessionInfo()
> #R version 3.5.3 (2019-03-11)
> #Platform: x86_64-w64-mingw32/x64 (64-bit)
> #Running under: Windows >= 8 x64 (build 9200)
>
>
>
> #I have a DF of 8 columns and 14025 rows
>
> str(hcd2tmp2)
>
> # 'data.frame':14025 obs. of  8 variables:
> # $ Submitted_Charge: num  21021 15360 40561 29495 7904 ...
> # $ Allowed_Amt     : num  18393 6254 40561 29495 7904 ...
> # $ Submitted_Units : num  60 240 420 45 120 215 215 15 57 2 ...
> # $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117 125 24 85 85 90 86 25 ...
> # $ AllowByLimit    : num  4.268 0.949 7.913 6.124 3.524 ...
> # $ UnitsByDose     : num  600 240 420 450 120 215 215 750 570 500 ...
> # $ LimitByUnits    : num  4310 6591 5126 4816 2243 ...
> # $ HCPCSCodeDose1  : num  10 1 1 10 1 1 1 50 10 250 ...
>
> #I would like to create four additional columns that are the mean of four current columns in the DF.
> #Current columns
> #Allowed_Amt
> #LimitByUnits
> #AllowByLimit
> #UnitsByDose
>
> #The goal is to be able to identify rows where (for instance) Allowed_Amt is greater than the average (aka outliers).
>
> #The trick is that I want the means of those columns computed per level of a factor.
> #The factor is:
> #Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"
>
> #So each of my four new columns will have 131 distinct values based on the mean for the specific Procedure_Code1 grouping
>
> #In SQL it would look something like this:
>
> #SELECT *,
> # NewCol1 = AVG(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
> # NewCol2 = AVG(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
> # NewCol3 = AVG(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
> # NewCol4 = AVG(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
> #INTO NewTable
> #FROM Oldtable
>
> #Here are some sample data
>
> head(hcd2tmp2, n=40)
> #    Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1 AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
> #  1         21020.70    18393.12              60           J1745    4.2679810         600      4309.56             10
> #  2         15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
> #  3         40561.32    40561.32             420           J9306    7.9133539         420      5125.68              1
> #  4         29495.25    29495.25              45           J9355    6.1244417         450      4815.99             10
> #  5          7904.30     7904.30             120           J0897    3.5243000         120      2242.80              1
> #  6         15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
> #  7         15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
> #  8           461.90        0.00              15           J9045    0.0000000         750        46.38             50
> #  9         27340.96    15092.21              57           J9035    3.2600227         570      4629.48             10
> # 10           768.00      576.00               2           J1190    1.3617343         500       422.99            250
> # 11           101.00       38.38               5           J2250   59.9687500           5         0.64              1
> # 12         17458.40        0.00             200           J9033    0.0000000         200      5990.00              1
> # 13          7885.10     7569.70               1           J1745  105.3835445          10        71.83             10
> # 14          2015.00     1155.78               4           J2785    5.0051100           0       230.92              0
> # 15           443.72      443.72              12           J9045   11.9601078         600        37.10             50
> # 16        113750.00   113750.00             600           J2350    3.3025003         600     34443.60              1
> # 17          3582.85     3582.85              10           J2469   30.5573561         250       117.25             25
> # 18          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
> # 19          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
> # 20         39664.09        0.00              74           J9355    0.0000000         740      7919.63             10
> # 21           166.71      102.53               9           J9045    3.6841538         450        27.83             50
> # 22         13823.61     9676.53               1           J2505    2.0785247           6      4655.48              6
> # 23         90954.00    26436.53             360           J1786    1.7443775        3600     15155.28             10
> # 24          4800.00     3494.40             800           J3262    0.8861838         800      3943.20              1
> # 25           216.00      105.84               4           J0696   42.3360000        1000         2.50            250
> # 26          5300.00     4770.00               1           J0178    4.9677151           1       960.20              1
> # 27         35203.00    35203.00             200           J9271    3.5772498         200      9840.80              1
> # 28         17589.15    17589.15             300           J3380    2.9696855         300      5922.90              1
> # 29         18394.64    17842.79               1           J9355  166.7238834          10       107.02             10
> # 30           770.00      731.50              10           J2469    6.2388060         250       117.25             25
> # 31           461.90        0.00              15           J9045    0.0000000         750        46.38             50
> # 32          8160.00     3342.40              80           J1459    1.0260818       40000      3257.44            500
> # 33          1653.48      314.16               6           J9305    0.7661505          60       410.05             10
> # 34         13036.50        0.00             194           J9034    0.0000000         194      4652.31              1
> # 35         10486.87        0.00             156           J9034    0.0000000         156      3741.04              1
> # 36         15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
> # 37          1616.83     1616.83             150           J1453    5.2528590         150       307.80              1
> # 38         80685.74    34772.43              96           J9035    4.4597077         960      7797.02             10
> # 39         85220.58    35925.13             287           J9299    4.5577715         287      7882.17              1
> # 40          3860.17     1627.27              13           J9299    4.5577963          13       357.03              1
>
>
> #I hope this is enough information to warrant your support
> #Thank you
> #WHP
>
>
>
>
>
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 30 Apr 2019 15:24:57 -0400
> From: Matthew <[hidden email]>
> To: "r-help ([hidden email])" <[hidden email]>
> Subject: [R] transpose and split dataframe
> Message-ID:
>         <[hidden email]>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> I have a data frame that is a lot bigger, but for simplicity's sake we can
> say it looks like this:
>
> Regulator    hits
> AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980    AT2G85403,AT4G89223
>
>     In other words:
>
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits         : Factor w/ 6 levels
>
>    I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>
>    I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2])), but I am getting nowhere.
>
> Matthew
>
>
>
>
> ------------------------------
>
> Message: 10
> Date: Tue, 30 Apr 2019 21:04:50 +0000
> From: David L Carlson <[hidden email]>
> To: "[hidden email]" <[hidden email]>, Matthew
>         <[hidden email]>
> Subject: Re: [R] transpose and split dataframe
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> I neglected to copy this to the list:
>
> I think we need more information. Can you give us the structure of the
> data with str(YourDataFrame)? Alternatively, you could copy a small piece
> into your email message by pasting the results of the following
> code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say "hits"
> is a factor with a variable number of elements. If each value of "hits" was
> a single character string, it would only have 2 factor levels not 6 and
> your efforts to parse the string would make more sense. Transposing to a
> data frame would only be possible if each column was padded with NAs to
> make them equal in length. Since your example tries to use the name TF2list,
> it is possible that you do not have a data frame but a list and you have no
> factor levels, just character vectors.
>
> If you are not familiar with R, it may be helpful to tell us what your
> overall goal is rather than an intermediate step. Very likely R can easily
> handle what you want by doing things a different way.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
>
> ------------------------------
>
> Message: 11
> Date: Tue, 30 Apr 2019 15:03:09 -0600
> From: David Winsemius <[hidden email]>
> To: Jens Heumann <[hidden email]>
> Cc: [hidden email]
> Subject: Re: [R] Passing formula as parameter to `lm` within `sapply`
>         causes error [BUG?]
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> Try using do.call
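
To make the do.call suggestion concrete, here is a minimal sketch, not the responder's own code. As far as I understand it, lm() looks up the subset and weights expressions in the data and then in the environment of the formula; FO is created at top level, so the function argument s is not visible from there. do.call() evaluates the arguments before lm() is called, so lm() receives plain vectors. The data are a rounded copy of the df1 example quoted below, so the numbers will differ slightly from the posted output.

```r
# Rounded copy of the df1 example from the quoted message (assumption: rounding is harmless here)
df1 <- data.frame(
  x = c(1.37, -0.56, 0.36, 0.63, 0.40, -0.11, 1.51, -0.09, 2.02),
  y = c(1.31, 0.74, 2.65, -0.76, 0.13, -0.24, 2.15, -0.38, -0.64),
  z = rep(1:3, each = 3),
  w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)
)
FO <- y ~ x
st <- "z"
ws <- "w"

res <- sapply(unique(df1[[st]]), function(s) {
  # The list elements are evaluated here, inside the function, so lm()
  # is handed a logical vector and a numeric vector, not expressions using `s`
  fit <- do.call("lm", list(formula = FO, data = df1,
                            subset = df1[[st]] == s,
                            weights = df1[[ws]]))
  summary(fit)$coefficients[1, ]
})

print(res)
```

An alternative that should also work is to set environment(FO) <- environment() inside the function before calling lm(), so that the formula's environment can see s.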
>
> —
> David
>
> Sent from my iPhone
>
> > On Apr 30, 2019, at 9:24 AM, Jens Heumann <
> [hidden email]> wrote:
> >
> > Hi,
> >
> > `lm` won't take formula as a parameter when it is within a `sapply`; see
> example below. Please, could anyone either point me to a syntax error or
> confirm that this might be a bug?
> >
> > Best,
> > Jens
> >
> > [Disclaimer: This is my first post here, following advice of how to
> proceed with possible bugs from here: https://www.r-project.org/bugs.html]
> >
> >
> > SUMMARY
> >
> > While `lm` alone accepts formula parameter `FO` well, the same within a
> `sapply` causes an error. When putting everything as parameter but formula
> `FO`, it's still working, though. All parameters work fine within a similar
> `for` loop.
> >
> >
> > MCVE (see data / R-version at bottom)
> >
> > > summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
> >  Estimate Std. Error    t value   Pr(>|t|)
> > 1.6269038  0.9042738  1.7991275  0.3229600
> > > summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
> >  Estimate Std. Error    t value   Pr(>|t|)
> > 1.6269038  0.9042738  1.7991275  0.3229600
> > > sapply(unique(df1$z), function(s)
> > +   summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
> >                [,1]       [,2]         [,3]
> > Estimate   1.6269038 -0.1404174 -0.010338774
> > Std. Error 0.9042738  0.4577001  1.858138516
> > t value    1.7991275 -0.3067890 -0.005564049
> > Pr(>|t|)   0.3229600  0.8104951  0.996457853
> > > sapply(unique(data[[st]]), function(s)
> > +   summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])  # !!!
> > Error in eval(substitute(subset), data, env) : object 's' not found
> > > sapply(unique(data[[st]]), function(s)
> > +   summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
> >                [,1]       [,2]         [,3]
> > Estimate   1.6269038 -0.1404174 -0.010338774
> > Std. Error 0.9042738  0.4577001  1.858138516
> > t value    1.7991275 -0.3067890 -0.005564049
> > Pr(>|t|)   0.3229600  0.8104951  0.996457853
> > > m <- matrix(NA, 4, length(unique(data[[st]])))
> > > for (s in unique(data[[st]])) {
> > +   m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
> > + }
> > > m
> >          [,1]       [,2]         [,3]
> > [1,] 1.6269038 -0.1404174 -0.010338774
> > [2,] 0.9042738  0.4577001  1.858138516
> > [3,] 1.7991275 -0.3067890 -0.005564049
> > [4,] 0.3229600  0.8104951  0.996457853
> >
> > # DATA #################################################################
> >
> > df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
> 0.363128411337339,
> > 0.63286260496104, 0.404268323140999, -0.106124516091484,
> 1.51152199743894,
> > -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
> > 0.740171482827397, 2.64977380403845, -0.755998096151299,
> 0.125479556323628,
> > -0.239445852485142, 2.14747239550901, -0.37891195982917,
> -0.638031707027734
> > ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
> > 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA,
> > -9L))
> >
> > FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1
> >
> > ########################################################################
> >
> > > R.version
> >               _
> > platform       x86_64-w64-mingw32
> > arch           x86_64
> > os             mingw32
> > system         x86_64, mingw32
> > status
> > major          3
> > minor          6.0
> > year           2019
> > month          04
> > day            26
> > svn rev        76424
> > language       R
> > version.string R version 3.6.0 (2019-04-26)
> > nickname       Planting of a Tree
> >
> > #########################################################################
> >
> > NOTE: Question on SO two days ago (
> https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
> brought many views but neither answer nor bug confirmation.
> >
>
>
>
>
> ------------------------------
>
> Message: 12
> Date: Tue, 30 Apr 2019 17:31:28 -0400
> From: Matthew <[hidden email]>
> To: "[hidden email]" <[hidden email]>
> Subject: [R] Fwd: Re:  transpose and split dataframe
> Message-ID:
>         <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> Thanks for your reply. I was trying to simplify it a little, but must
> have got it wrong. Here is the real dataframe, TF2list:
>
>   str(TF2list)
> 'data.frame':    152 obs. of  2 variables:
>   $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54
> 54 82 82 82 82 82 ...
>   $ hits     : Factor w/ 97 levels "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"| __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>
>     And the first few lines resulting from dput(head(TF2list)):
>
> dput(head(TF2list))
> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
> 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380",
> "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300",
> "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600",
> "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...
>
> This is another way of looking at the first 4 entries (Regulator is
> tab-separated from hits):
>
>      Regulator    hits
> 1    AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
> 2    AT1G29860    AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
> 3    AT1G2986     AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
>
>     So, the goal would be to
>
> first: Transpose the existing dataframe so that the factor Regulator
> becomes a column name (column 1 name = AT1G69490, column 2 name =
> AT1G29860, etc.) and the hits associated with each Regulator become
> rows. Hits is a comma-separated 'list' (I do not know if
> technically it is an R list), so it would have to be comma
> 'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
> col 1 row 2 = AT5G24110, etc.); like this:
>
> AT1G69490
> AT4G31950
> AT5G24110
> AT1G05675
> AT5G64905
>
> (... I did not include all the rows)
>
> I think it would be best to actually make the first entry a separate
> dataframe (1 column with name = AT1G69490 and number of rows depending
> on the number of hits), then make the second column (column name =
> AT1G29860, and number of rows depending on the number of hits) into a
> new dataframe and do a full join of the two dataframes; continue by
> making the third column (column name = AT1G2986) into a dataframe and
> full join it with the previous; continue for the 152 observations so
> that the end result is a dataframe with 152 columns and number of rows
> depending on the entry with the greatest number of hits. The full joins
> I can do with dplyr, but getting up to that point seems rather difficult.
>
> This would get me what my ultimate goal would be; each Regulator is a
> column name (152 columns) and a given row has either NA or the same hit.
>
>     This seems very difficult to me, but I appreciate any attempt.
>
> Matthew
>
>
>
>
>
>
> ------------------------------
>
> Message: 13
> Date: Wed, 1 May 2019 07:46:32 +1000
> From: Jim Lemon <[hidden email]>
> To: Matthew <[hidden email]>
> Cc: "r-help ([hidden email])" <[hidden email]>
> Subject: Re: [R] transpose and split dataframe
> Message-ID:
>         <CA+8X3fUjv3APb=
> [hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Matthew,
> Is this what you are trying to do?
>
> mmdf<-read.table(text="Regulator    hits
> AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980    AT2G85403,AT4G89223",header=TRUE,
> stringsAsFactors=FALSE)
> # split the second column at the commas
> hitsplit<-strsplit(mmdf$hits,",")
> # define a function that will fill with NAs
> NAfill<-function(x,n) return(x[1:n])
> # get the maximum length of hits
> maxlen<-max(unlist(lapply(hitsplit,length)))
> # fill the list with NAs
> hitsplit<-lapply(hitsplit,NAfill,maxlen)
> # change the names of the list
> names(hitsplit)<-mmdf$Regulator
> # convert to a data frame
> tmmdf<-as.data.frame(hitsplit)
>
> Jim
>
>
>
>
>
> ------------------------------
>
> Message: 14
> Date: Wed, 1 May 2019 09:58:34 +1200
> From: Abs Spurdle <[hidden email]>
> To: =?UTF-8?Q?Catarina_Serra_Gon=C3=A7alves?= <[hidden email]>
> Cc: r-help <[hidden email]>
> Subject: Re: [R]  Time series (trend over time) for irregular sampling
>         dates and multiple sites
> Message-ID:
>         <
> [hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> > My data has a few problems: (1) I think I will need to fix the effects of
> > seasonal variation (Monthly) and (2) of possible spatial correlation
> > (probability of finding an item is higher after finding one since they
> can
> > come from the same ship). (3) How do I handle the fact that the
> > measurements were not taken at a regular interval?
>
> Can I ask two questions:
> (1) Is the data autocorrelated (or "Seasonal") over time?
> If not then this problem is a lot simpler.
> (2) Can you expand on the following statement?
> "possible spatial correlation (probability of finding an item is higher
> after finding one since they can come from the same ship"
>
>
>
>
>
> ------------------------------
>
> Message: 15
> Date: Tue, 30 Apr 2019 22:29:24 +0000
> From: David L Carlson <[hidden email]>
> To: Matthew <[hidden email]>, "[hidden email]"
>         <[hidden email]>
> Subject: Re: [R] Fwd: Re:  transpose and split dataframe
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> If you read the data frame with read.csv() or one of the other read.table()
> family functions, use the as.is=TRUE argument to prevent conversion to factors. If
> not, do the conversion first:
>
> # Convert factors to characters
> DataMatrix <- sapply(TF2list, as.character)
> # Split the vector of hits
> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
> # Use the values in Regulator to name the parts of the list
> names(DataList) <- DataMatrix[,"Regulator"]
>
> # Now create a data frame
> # How long is the longest list of hits?
> mx <- max(sapply(DataList, length))
> # Now add NAs to vectors shorter than mx
> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x))))
> # Finally convert back to a data frame
> TF2list2 <- do.call(data.frame, DataList2)
>
> Try this on a portion of the list, say 25 lines and print each object to
> see what is happening.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
>
>
> -----Original Message-----
> From: R-help <[hidden email]> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 4:31 PM
> To: [hidden email]
> Subject: [R] Fwd: Re: transpose and split dataframe
>
> Thanks for your reply. I was trying to simplify it a little, but must
> have got it wrong. Here is the real dataframe, TF2list:
>
>   str(TF2list)
> 'data.frame':    152 obs. of  2 variables:
>   $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54
> 54 82 82 82 82 82 ...
>   $ hits     : Factor w/ 97 levels
> "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
>
> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>
>     And the first few lines resulting from dput(head(TF2list)):
>
> dput(head(TF2list))
> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
>

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Survuval Anaysis

Michael Dewey-3
Without more details it is hard to answer, but it is suspicious that the
model is dropping one of your predictors and that the standard errors of
the other two are very large. This suggests you should investigate the joint
distribution of your predictors and the events.
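
As a rough illustration of what such an investigation might look like, here is a minimal base-R sketch. The data frame `demo` is simulated and entirely made up; only the column names are borrowed from the original post, and the third predictor is deliberately built as an exact sum of the other two so that the checks have something to find. The real cause in the poster's file may well be different.

```r
# Simulated stand-in for the poster's data; every value here is made up.
set.seed(1)
n <- 200
demo <- data.frame(
  Depart   = rep(c(0, 1), c(190, 10)),     # event indicator, 10 events
  DOLoomis = round(runif(n, 2, 10), 1),
  DOI55    = round(runif(n, 2, 10), 1)
)
# Assumption for the demo: the third predictor is an exact linear
# combination of the first two, one way to end up with a singular X matrix.
demo$DODamen <- demo$DOLoomis + demo$DOI55

preds <- c("DOLoomis", "DOI55", "DODamen")

# 1. Pairwise correlations: values near +/-1 point to redundant predictors.
pred_cor <- cor(demo[preds], use = "complete.obs")

# 2. Rank of the centred predictor matrix (a Cox model has no intercept).
X <- scale(as.matrix(demo[preds]), center = TRUE, scale = FALSE)
x_rank <- qr(X)$rank

# 3. Range of each predictor among the rows that carry an event.
event_summary <- sapply(demo[demo$Depart == 1, preds], range)

print(round(pred_cor, 2))
cat("rank:", x_rank, "of", length(preds), "columns\n")
print(event_summary)
```

A rank lower than the number of columns means at least one predictor is a linear combination of the others. If a predictor barely varies among the event rows, that can also leave the coefficients poorly determined.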

Michael

On 02/05/2019 13:37, Haddison Mureithi wrote:

> Hello guys this problem was never answered and I happened to come across
> the same problem , kindly help. This is a simple R program that I have been
> trying to run. I keep running into the "singular matrix" error. I end up
> with no sensible results. Can anyone suggest any changes or a way around
> this?
>
> I am a total rookie when working with R.
>
> Thanks,
> Haddison
>
>> library(survival)
> Loading required package: splines
>> args(coxph)
> function (formula, data, weights, subset, na.action, init, control,
>      method = c("efron", "breslow", "exact"), singular.ok = TRUE,
>      robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
> NULL
>> test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working
> Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
>> sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
> data=test1)
> Warning messages:
> 1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>    Loglik converged before variable  1,2 ; beta may be infinite.
> 2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen,  :
>    X matrix deemed to be singular; variable 3
>> summary(sondes)
> Call:
> coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
>      DODamen, data = test1)
>
>    n= 1737, number of events= 58
>     (1 observation deleted due to missingness)
>
>                 coef  exp(coef)   se(coef)  z Pr(>|z|)
> DOLoomis -2.152e+00  1.163e-01  1.161e+05  0        1
> DOI55     4.560e-01  1.578e+00  3.755e+04  0        1
> DODamen          NA         NA  0.000e+00 NA       NA
>
>           exp(coef) exp(-coef) lower .95 upper .95
> DOLoomis    0.1163     8.5995         0       Inf
> DOI55       1.5777     0.6338         0       Inf
> DODamen         NA         NA        NA        NA
>
> Concordance= 0.5  (se = 0 )
> Rsquare= 0   (max possible= 0.01 )
> Likelihood ratio test= 0  on 2 df,   p=1
> Wald test            = 0  on 2 df,   p=1
> Score (logrank) test = 0  on 2 df,   p=1
>
> On Wed, 1 May 2019, 1:00 pm , <[hidden email]> wrote:
>
>>
>> Today's Topics:
>>
>>     1. Re: Bug in R 3.6.0? (Martin Maechler)
>>     2. Re: Bug in R 3.6.0? ([hidden email])
>>     3. Time series (trend over time) for irregular sampling dates
>>        and multiple sites (Catarina Serra Gonçalves)
>>     4. Re:  Time series (trend over time) for irregular sampling
>>        dates and multiple sites (Bert Gunter)
>>     5. Passing formula as parameter to `lm` within `sapply` causes
>>        error [BUG?] (Jens Heumann)
>>     6. (no subject) (Haddison Mureithi)
>>     7. Help with loop for column means into new column by a subset
>>        Factor w/131 levels (Bill Poling)
>>     8. Re: Help with loop for column means into new column by a
>>        subset Factor w/131 levels (Bill Poling)
>>     9. transpose and split dataframe (Matthew)
>>    10. Re: transpose and split dataframe (David L Carlson)
>>    11. Re: Passing formula as parameter to `lm` within `sapply`
>>        causes error [BUG?] (David Winsemius)
>>    12. Fwd: Re:  transpose and split dataframe (Matthew)
>>    13. Re: transpose and split dataframe (Jim Lemon)
>>    14. Re:  Time series (trend over time) for irregular sampling
>>        dates and multiple sites (Abs Spurdle)
>>    15. Re: Fwd: Re:  transpose and split dataframe (David L Carlson)
>>    16. Re: Passing formula as parameter to `lm` within `sapply`
>>        causes error [BUG?] (Duncan Murdoch)
>>    17. Re:  Time series (trend over time) for irregular sampling
>>        dates and multiple sites (Abs Spurdle)
>>    18. Re:  Time series (trend over time) for irregular sampling
>>        dates and multiple sites (Abs Spurdle)
>>    19. Re: Passing formula as parameter to `lm` within `sapply`
>>        causes error [BUG?] (Jens Heumann)
>>    20. Re: Passing formula as parameter to `lm` within `sapply`
>>        causes error [BUG?] (peter dalgaard)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 30 Apr 2019 16:54:10 +0200
>> From: Martin Maechler <[hidden email]>
>> To: Morgan Morgan <[hidden email]>
>> Cc: <[hidden email]>
>> Subject: Re: [R] Bug in R 3.6.0?
>> Message-ID: <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>>>>>>> Morgan Morgan
>>>>>>>      on Mon, 29 Apr 2019 21:42:36 +0100 writes:
>>
>>      > Hi,
>>      > I am using R 3.6.0 on Windows. The issue that I report below does
>>      > not exist with previous versions of R.
>>      > In order to reproduce the error you must install a package of your
>>      > choice from source (tar.gz).
>>
>>      > -Create a .Rprofile file with the following command in it:
>>      > setwd("D:/")
>>      > -Close your R session and re-open it. Your working directory must
>>      > now be set to D:
>>      > -Install a package of your choice from source, for example:
>>      > install.packages("data.table",type="source")
>>
>>      > In my case the package fails to install and I get the following
>>      > error message:
>>
>>      > ** R
>>      > ** inst
>>      > ** byte-compile and prepare package for lazy loading
>>      > Error in tools:::.read_description(file) :
>>      > file 'DESCRIPTION' does not exist
>>      > Calls: suppressPackageStartupMessages ... withCallingHandlers ->
>>      > .getRequiredPackages -> <Anonymous> -> <Anonymous>
>>      > Execution halted
>>      > ERROR: lazy loading failed for package 'data.table'
>>      > * removing 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>>      > * restoring previous
>>      > 'C:/Users/Morgan/Documents/R/win-library/3.6/data.table'
>>      > Warning in install.packages :
>>      > installation of package ‘data.table’ had non-zero exit status
>>
>>      > Now remove the .Rprofile file, restart your R session and try to
>>      > install the package with the same command.
>>      > In that case everything should install just fine.
>>
>>      > FYI the issue happens on macOS as well and I suspect it also does
>>      > on all Linux systems.
>>
>>      > My question: Is this expected or is it a bug?
>>
>> It is a bug, thank you very much for reporting it.
>>
>> I've been told privately by Ömer An (thank you!) who's been
>> affected as well, that this problem seems to affect others, and
>> that there's a thread about this over at the Rstudio support site
>>
>>
>> https://support.rstudio.com/hc/en-us/community/posts/200704708-Build-tool-does-not-recognize-DESCRIPTION-file
>>
>> There, users mention that (all?) packages are affected which
>> have a multiline 'Description:' field in their DESCRIPTION file.
>> Of course, many if not most packages have this property.
>>
>> Indeed, I can reproduce the problem (e.g. with my 'sfsmisc'
>> package) if I ("silly enough to") add a setwd() call to my
>> Rprofile file  (the one I set via env.var  R_PROFILE or R_PROFILE_USER).
>>
>> This is clearly a bug, and indeed a bad one.
>>
>> It seems all R core (and other R expert users who have tried R
>> 3.6.0 alpha, beta, and RC versions) have *not* seen the bug as they
>> are intuitively smart not to mess with R's working directory in
>> a global R profile file ...
>>
>> For now you definitely have to work around it by avoiding what triggers
>> the problem: do *NOT* call setwd() in your ~/.Rprofile or other
>> such R init files.
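
To check whether a startup file is affected, something like the following sketch could be used. The helper name `find_setwd_calls` is hypothetical (it is not part of R), and the demonstration runs on a temporary file rather than on a real profile.

```r
# Hypothetical helper: return the active lines of a startup file that
# call setwd(). Commented-out lines are ignored.
find_setwd_calls <- function(path) {
  if (!file.exists(path)) return(character(0))
  lines <- readLines(path, warn = FALSE)
  active <- !grepl("^\\s*#", lines, perl = TRUE)
  lines[active & grepl("\\bsetwd\\s*\\(", lines, perl = TRUE)]
}

# Demonstration on a temporary file standing in for ~/.Rprofile.
tmp <- tempfile(fileext = ".Rprofile")
writeLines(c("options(digits = 4)",
             "# setwd('C:/old')",
             "setwd(\"D:/\")"), tmp)
hits <- find_setwd_calls(tmp)
print(hits)
```

On a real system the same function could be pointed at `path.expand("~/.Rprofile")` or at the file named by `Sys.getenv("R_PROFILE_USER")`.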
>>
>> Best,
>> Martin Maechler
>> ETH Zurich and  R Core Team
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 30 Apr 2019 16:15:46 +0200
>> From: <[hidden email]>
>> To: "'Morgan Morgan'" <[hidden email]>,
>>          <[hidden email]>
>> Subject: Re: [R] Bug in R 3.6.0?
>> Message-ID: <002d01d4ff5f$34816be0$9d8443a0$@free.fr>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hello,
>>
>> I have exactly the same problem when I install one of my own packages:
>>
>> Error in tools:::.read_description(file) :
>>    file 'DESCRIPTION' does not exist
>> Calls: suppressPackageStartupMessages ... withCallingHandlers ->
>> .getRequiredPackages -> <Anonymous> -> <Anonymous>
>> Execution halted
>> ERROR: lazy loading failed for package 'RRegArch'
>>
>> Best,
>> Ollivier
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 1 May 2019 00:57:43 +1000
>> From: Catarina Serra Gonçalves <[hidden email]>
>> To: [hidden email]
>> Subject: [R] Time series (trend over time) for irregular sampling
>>          dates and multiple sites
>> Message-ID:
>>          <
>> [hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I have a dataset of marine debris items (number of items standardized per
>> effort: Items/(number of volunteers*Hours*Lenght)) taken from 2 main
>> locations (WA and Queensland) in Australia (8 Sub Sites in total: 4 in WA
>> and 4 in Queensland) at irregular sampling intervals over a period of 15
>> years.
>>
>> I want to test whether the amount of debris in these locations has changed
>> over the years and, more specifically, whether it changed after the
>> implementation of a mitigation strategy (in 2013).
>> Here's the head of the data: <https://i.stack.imgur.com/VNIpb.png>
>>
>> Description of each of the variables in the data frame:
>>
>> *eventid* = each sampling (clean-up) event
>>
>> *Location* = Queensland and New South Wales
>>
>> *Sites* = all the 9 sampling beaches
>>
>> *Date* = specific dates for the clean-up events (day-month-year)
>>
>> *Date1* = specific dates for the clean-up events (day-month-year) in
>> POSIXct format
>>
>> *Year* = year of the sampling event (2004 to 2018)
>>
>> *Month* = month of the sampling event (Jan to Dec)
>>
>> *nMonth* = number assigned to the month of the sampling event (1 to 12)
>>
>> *Day* = day of sampling (1 to 31)
>>
>> *Days* = days since the first clean-up date (just another way of using
>> the dates)
>>
>> *MARPOL* = before and after implementation (factor with 2 levels)
>>
>> *DaysC* = days between sampling events for the same site, i.e. number of
>> days since the previous clean-up event
>>
>> *DaysI* = days since intervention: all dates before implementation are
>> zero, and afterwards we count the number of days since the implementation
>> date (1 Jan 2013)
>>
>> *DaysIa* = same as DaysI, but with negative values (days) instead of zero
>> before the intervention
>>
>> *Items* = number of fishing and shipping items counted in each clean-up
>> event
>>
>> *Hours* = hours spent by all volunteers together at each clean-up event
>>
>> *Lenght* = length of beach sampled by all volunteers together at each
>> clean-up event
>>
>> *volunteers* = all volunteers at each clean-up event
>>
>> *HoursVolunteer* = hours spent by each volunteer at each clean-up event
>> (Hours/volunteers)
>>
>> *Ieffort* = the items standardized by the effort (hours, volunteers and
>> length)
>>
>> *GrossWeight* and *GrossTotal* are not relevant
>> ------------------------------
>> Problems:
>>
>> My data has a few problems: (1) I think I will need to fix the effects of
>> seasonal variation (Monthly) and (2) of possible spatial correlation
>> (probability of finding an item is higher after finding one since they can
>> come from the same ship). (3) How do I handle the fact that the
>> measurements were not taken at a regular interval?
>>
>> I was trying to use GAMs to analyse the data and see the trends over time.
>> The model I came across is the following:
>>
>> m4<- gamm(Ieffort ~ s(DaysIa)+MARPOL+ s(nMonth, bs = "ps", k = 12),
>> random=list(Site=~1,Location=~1),data = d)
>>
>> *thank you in advance.*
>> -
>> *Catarina Serra Gonçalves *
>> PhD candidate
>>
>> Adrift Lab  <https://adriftlab.org>
>> University of Tasmania <http://www.utas.edu.au/> | Institute for Marine
>> and
>> Antarctic Studies  <http://www.imas.utas.edu.au/>
>> Launceston, TAS | Australia
>>
>> Personal website <https://catarinasg.wixsite.com/acserra>
>> <https://catarinasg.wixsite.com/acserra>| E-mail  <[hidden email]> |
>> Twitter <https://twitter.com/CatarinaSerraG>
>> Research Gate
>> <https://www.researchgate.net/profile/Catarina_Serra_Goncalves> | Google
>> Scholar <https://scholar.google.pt/citations?user=8nBrRFwAAAAJ&hl=en>
>>
>>          [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Tue, 30 Apr 2019 08:28:37 -0700
>> From: Bert Gunter <[hidden email]>
>> To: Catarina Serra Gonçalves <[hidden email]>
>> Cc: R-help <[hidden email]>
>> Subject: Re: [R]  Time series (trend over time) for irregular sampling
>>          dates and multiple sites
>> Message-ID:
>>          <CAGxFJbT2YSB1xcs0MajpeqtHbbn4T1ycYoSOBEFvMucFme1t=
>> [hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I have 0 expertise, but I suggest that you check out the SpatioTemporal
>> task view on CRAN (or possibly others, like Environmetrics). You might also
>> want to move this to the R-sig-Geo list, where you are probably more likely
>> to find relevant expertise.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Tue, 30 Apr 2019 17:24:33 +0200
>> From: Jens Heumann <[hidden email]>
>> To: <[hidden email]>
>> Subject: [R] Passing formula as parameter to `lm` within `sapply`
>>          causes error [BUG?]
>> Message-ID: <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>>
>> Hi,
>>
>> `lm` won't take formula as a parameter when it is within a `sapply`; see
>> example below. Please, could anyone either point me to a syntax error or
>> confirm that this might be a bug?
>>
>> Best,
>> Jens
>>
>> [Disclaimer: This is my first post here, following advice of how to
>> proceed with possible bugs from here: https://www.r-project.org/bugs.html]
>>
>>
>> SUMMARY
>>
>> While `lm` alone accepts formula parameter `FO` well, the same within a
>> `sapply` causes an error. When putting everything as parameter but
>> formula `FO`, it's still working, though. All parameters work fine
>> within a similar `for` loop.
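
One likely explanation is that `lm()` builds its model frame by evaluating `subset` and `weights` first in `data` and then in the environment of the formula. `FO` was created at top level, so the loop variable `s` of the anonymous function is not visible from there. The sketch below works around this by copying the formula and pointing its environment at the current function call. The data frame `toy` and the wrapper `fit_by_group` are made up for illustration and are not the poster's objects.

```r
# Toy data, made up for illustration (not the poster's df1).
set.seed(42)
toy <- data.frame(x = rnorm(9), y = rnorm(9),
                  z = rep(1:3, each = 3),
                  w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1))
FO <- y ~ x; st <- "z"; ws <- "w"

# Hypothetical wrapper: intercept row of the coefficient table per group.
fit_by_group <- function(formula, data, st, ws) {
  sapply(unique(data[[st]]), function(s) {
    # Copy the formula and set its environment to this call, so that `s`,
    # `data`, `st` and `ws` are all visible when the subset and weights
    # expressions are evaluated.
    f <- formula
    environment(f) <- environment()
    summary(lm(f, data, data[[st]] == s, data[[ws]]))$coef[1, ]
  })
}

res <- fit_by_group(FO, toy, st, ws)
print(res)
```

Subsetting the data frame before calling `lm()` and passing the weights as a column of that subset is another option that avoids touching the formula's environment.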
>>
>>
>> MCVE (see data / R-version at bottom)
>>
>>   > summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
>>     Estimate Std. Error    t value   Pr(>|t|)
>>    1.6269038  0.9042738  1.7991275  0.3229600
>>   > summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
>>     Estimate Std. Error    t value   Pr(>|t|)
>>    1.6269038  0.9042738  1.7991275  0.3229600
>>   > sapply(unique(df1$z), function(s)
>> +   summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
>>                   [,1]       [,2]         [,3]
>> Estimate   1.6269038 -0.1404174 -0.010338774
>> Std. Error 0.9042738  0.4577001  1.858138516
>> t value    1.7991275 -0.3067890 -0.005564049
>> Pr(>|t|)   0.3229600  0.8104951  0.996457853
>>   > sapply(unique(data[[st]]), function(s)
>> +   summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])  # !!!
>> Error in eval(substitute(subset), data, env) : object 's' not found
>>   > sapply(unique(data[[st]]), function(s)
>> +   summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
>>                   [,1]       [,2]         [,3]
>> Estimate   1.6269038 -0.1404174 -0.010338774
>> Std. Error 0.9042738  0.4577001  1.858138516
>> t value    1.7991275 -0.3067890 -0.005564049
>> Pr(>|t|)   0.3229600  0.8104951  0.996457853
>>   > m <- matrix(NA, 4, length(unique(data[[st]])))
>>   > for (s in unique(data[[st]])) {
>> +   m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
>> + }
>>   > m
>>             [,1]       [,2]         [,3]
>> [1,] 1.6269038 -0.1404174 -0.010338774
>> [2,] 0.9042738  0.4577001  1.858138516
>> [3,] 1.7991275 -0.3067890 -0.005564049
>> [4,] 0.3229600  0.8104951  0.996457853
>>
>> # DATA #################################################################
>>
>> df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
>> 0.363128411337339,
>> 0.63286260496104, 0.404268323140999, -0.106124516091484, 1.51152199743894,
>> -0.0946590384130976, 2.01842371387704), y = c(1.30824434809425,
>> 0.740171482827397, 2.64977380403845, -0.755998096151299, 0.125479556323628,
>> -0.239445852485142, 2.14747239550901, -0.37891195982917, -0.638031707027734
>> ), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), w = c(0.7, 0.8,
>> 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)), class = "data.frame", row.names = c(NA,
>> -9L))
>>
>> FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1
>>
>> ########################################################################
>>
>>   > R.version
>>                  _
>> platform       x86_64-w64-mingw32
>> arch           x86_64
>> os             mingw32
>> system         x86_64, mingw32
>> status
>> major          3
>> minor          6.0
>> year           2019
>> month          04
>> day            26
>> svn rev        76424
>> language       R
>> version.string R version 3.6.0 (2019-04-26)
>> nickname       Planting of a Tree
>>
>> #########################################################################
>>
>> NOTE: Question on SO two days ago
>> (https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
>> brought many views but neither answer nor bug confirmation.
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 6
>> Date: Mon, 29 Apr 2019 21:38:00 +0300
>> From: Haddison Mureithi <[hidden email]>
>> To: [hidden email]
>> Subject: [R] (no subject)
>> Message-ID:
>>          <CABVwvn6y_M2M1o41HryKYp=
>> [hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hello guys this problem was never answered and I happened to come across
>> the same problem , kindly help. This is a simple R program that I have been
>> trying to run. I keep running into the "singular matrix" error. I end up
>> with no sensible results. Can anyone suggest any changes or a way around
>> this?
>>
>> I am a total rookie when working with R.
>>
>> Thanks,
>> Rasika
>>
>>> library(survival)
>> Loading required package: splines
>>> args(coxph)
>> function (formula, data, weights, subset, na.action, init, control,
>>      method = c("efron", "breslow", "exact"), singular.ok = TRUE,
>>      robust = FALSE, model = FALSE, x = FALSE, y = TRUE, tt, ...)
>> NULL
>>> test1<-read.table("S:/FISHDO/03_Phase_I_Field_Work/Data_6_28_2011/Working
>> Folder/R_files/4SondesJuly24.csv", header=T, sep=",")
>>> sondes<-coxph(Surv(Start, Stop, Depart)~DOLoomis + DOI55 + DODamen,
>> data=test1)
>> Warning messages:
>> 1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
>>    Loglik converged before variable  1,2 ; beta may be infinite.
>> 2: In coxph(Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 + DODamen,  :
>>    X matrix deemed to be singular; variable 3
>>> summary(sondes)
>> Call:
>> coxph(formula = Surv(Start, Stop, Depart) ~ DOLoomis + DOI55 +
>>      DODamen, data = test1)
>>
>>    n= 1737, number of events= 58
>>     (1 observation deleted due to missingness)
>>
>>                 coef  exp(coef)   se(coef)  z Pr(>|z|)
>> DOLoomis -2.152e+00  1.163e-01  1.161e+05  0        1
>> DOI55     4.560e-01  1.578e+00  3.755e+04  0        1
>> DODamen          NA         NA  0.000e+00 NA       NA
>>
>>           exp(coef) exp(-coef) lower .95 upper .95
>> DOLoomis    0.1163     8.5995         0       Inf
>> DOI55       1.5777     0.6338         0       Inf
>> DODamen         NA         NA        NA        NA
>>
>> Concordance= 0.5  (se = 0 )
>> Rsquare= 0   (max possible= 0.01 )
>> Likelihood ratio test= 0  on 2 df,   p=1
>> Wald test            = 0  on 2 df,   p=1
>> Score (logrank) test = 0  on 2 df,   p=1
>>
>>          [[alternative HTML version deleted]]
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 7
>> Date: Tue, 30 Apr 2019 16:50:48 +0000
>> From: Bill Poling <[hidden email]>
>> To: "r-help ([hidden email])" <[hidden email]>
>> Subject: [R] Help with loop for column means into new column by a
>>          subset Factor w/131 levels
>> Message-ID:
>>          <
>> [hidden email]
>>>
>>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> Good afternoon.
>>
>> #RStudio Version 1.1.456
>> sessionInfo()
>> #R version 3.5.3 (2019-03-11)
>> #Platform: x86_64-w64-mingw32/x64 (64-bit)
>> #Running under: Windows >= 8 x64 (build 9200)
>>
>>
>>
>> #I have a DF of 8 columns and 14025 rows
>>
>> str(hcd2tmp2)
>>
>> # 'data.frame':14025 obs. of  8 variables:
>> # $ Submitted_Charge: num  21021 15360 40561 29495 7904 ...
>> # $ Allowed_Amt     : num  18393 6254 40561 29495 7904 ...
>> # $ Submitted_Units : num  60 240 420 45 120 215 215 15 57 2 ...
>> # $ Procedure_Code1 : Factor w/ 131 levels "A9606","J0129",..: 43 113 117
>> 125 24 85 85 90 86 25 ...
>> # $ AllowByLimit    : num  4.268 0.949 7.913 6.124 3.524 ...
>> # $ UnitsByDose     : num  600 240 420 450 120 215 215 750 570 500 ...
>> # $ LimitByUnits    : num  4310 6591 5126 4816 2243 ...
>> # $ HCPCSCodeDose1  : num  10 1 1 10 1 1 1 50 10 250 ...
>>
>> #I would like to create four additional columns that are the mean of four
>> current columns in the DF.
>> #Current columns
>> #Allowed_Amt
>> #LimitByUnits
>> #AllowByLimit
>> #UnitsByDose
>>
>> #The goal is to be able to identify rows where (for instance) Allowed_Amt
>> is greater than the average (aka outliers).
>>
>> #The trick is that I want the means of those columns based on a Factor value
>> #The Factor is:
>> #Procedure_Code1 : Factor w/ 131 levels "A9606","J0129"
>>
>> #So each of my four new columns will have 131 distinct values based on the
>> mean for the specific Procedure_Code1 grouping
>>
>> #In SQL it would look something like this:
>>
>> #SELECT *,
>> # NewCol1 = mean(Allowed_Amt) OVER (PARTITION BY Procedure_Code1),
>> # NewCol2 = mean(LimitByUnits) OVER (PARTITION BY Procedure_Code1),
>> # NewCol3 = mean(AllowByLimit) OVER (PARTITION BY Procedure_Code1),
>> # NewCol4 = mean(UnitsByDose) OVER (PARTITION BY Procedure_Code1)
>> #INTO NewTable
>> #FROM Oldtable
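
In base R, the windowed mean in the SQL above can be approximated with `ave()`, which returns one value per row holding the mean of that row's group. The sketch below uses a small made-up data frame `claims` with two of the column names from the post; the `_mean` suffix and the flag column are assumptions made for the example.

```r
# Small made-up data frame using two of the poster's column names.
claims <- data.frame(
  Procedure_Code1 = factor(c("J1745", "J9299", "J1745", "J9299", "J9306")),
  Allowed_Amt     = c(100, 50, 300, 150, 40),
  LimitByUnits    = c(10, 20, 30, 40, 50)
)

cols <- c("Allowed_Amt", "LimitByUnits")
for (col in cols) {
  # ave() returns a vector as long as its input; each element is the mean
  # of the Procedure_Code1 group that the row belongs to.
  claims[[paste0(col, "_mean")]] <-
    ave(claims[[col]], claims$Procedure_Code1, FUN = mean)
}

# Flag rows whose Allowed_Amt exceeds the mean of their own group.
claims$Allowed_above_mean <- claims$Allowed_Amt > claims$Allowed_Amt_mean
print(claims)
```

Note that `FUN` has to be passed by name, because `ave()` treats all unnamed arguments after the first as grouping variables.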
>>
>> #Here are some sample data
>>
>> head(hcd2tmp2, n=40)
>> #    Submitted_Charge Allowed_Amt Submitted_Units Procedure_Code1 AllowByLimit UnitsByDose LimitByUnits HCPCSCodeDose1
>> #  1         21020.70    18393.12              60           J1745    4.2679810         600      4309.56             10
>> #  2         15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
>> #  3         40561.32    40561.32             420           J9306    7.9133539         420      5125.68              1
>> #  4         29495.25    29495.25              45           J9355    6.1244417         450      4815.99             10
>> #  5          7904.30     7904.30             120           J0897    3.5243000         120      2242.80              1
>> #  6         15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
>> #  7         15331.95    10614.31             215           J9034    2.0586686         215      5155.91              1
>> #  8           461.90        0.00              15           J9045    0.0000000         750        46.38             50
>> #  9         27340.96    15092.21              57           J9035    3.2600227         570      4629.48             10
>> # 10           768.00      576.00               2           J1190    1.3617343         500       422.99            250
>> # 11           101.00       38.38               5           J2250   59.9687500           5         0.64              1
>> # 12         17458.40        0.00             200           J9033    0.0000000         200      5990.00              1
>> # 13          7885.10     7569.70               1           J1745  105.3835445          10        71.83             10
>> # 14          2015.00     1155.78               4           J2785    5.0051100           0       230.92              0
>> # 15           443.72      443.72              12           J9045   11.9601078         600        37.10             50
>> # 16        113750.00   113750.00             600           J2350    3.3025003         600     34443.60              1
>> # 17          3582.85     3582.85              10           J2469   30.5573561         250       117.25             25
>> # 18          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
>> # 19          5152.65     5152.65              50           J2796    1.4362988         500      3587.45             10
>> # 20         39664.09        0.00              74           J9355    0.0000000         740      7919.63             10
>> # 21           166.71      102.53               9           J9045    3.6841538         450        27.83             50
>> # 22         13823.61     9676.53               1           J2505    2.0785247           6      4655.48              6
>> # 23         90954.00    26436.53             360           J1786    1.7443775        3600     15155.28             10
>> # 24          4800.00     3494.40             800           J3262    0.8861838         800      3943.20              1
>> # 25           216.00      105.84               4           J0696   42.3360000        1000         2.50            250
>> # 26          5300.00     4770.00               1           J0178    4.9677151           1       960.20              1
>> # 27         35203.00    35203.00             200           J9271    3.5772498         200      9840.80              1
>> # 28         17589.15    17589.15             300           J3380    2.9696855         300      5922.90              1
>> # 29         18394.64    17842.79               1           J9355  166.7238834          10       107.02             10
>> # 30           770.00      731.50              10           J2469    6.2388060         250       117.25             25
>> # 31           461.90        0.00              15           J9045    0.0000000         750        46.38             50
>> # 32          8160.00     3342.40              80           J1459    1.0260818       40000      3257.44            500
>> # 33          1653.48      314.16               6           J9305    0.7661505          60       410.05             10
>> # 34         13036.50        0.00             194           J9034    0.0000000         194      4652.31              1
>> # 35         10486.87        0.00             156           J9034    0.0000000         156      3741.04              1
>> # 36         15360.00     6254.40             240           J9299    0.9488785         240      6591.36              1
>> # 37          1616.83     1616.83             150           J1453    5.2528590         150       307.80              1
>> # 38         80685.74    34772.43              96           J9035    4.4597077         960      7797.02             10
>> # 39         85220.58    35925.13             287           J9299    4.5577715         287      7882.17              1
>> # 40          3860.17     1627.27              13           J9299    4.5577963          13       357.03              1
>>
>>
>> #I hope this is enough information to warrant your support
>> #Thank you
>> #WHP
>>
>>
>>
>> Confidentiality Notice This message is sent from Zelis. ...{{dropped:13}}
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 8
>> Date: Tue, 30 Apr 2019 18:45:40 +0000
>> From: Bill Poling <[hidden email]>
>> To: "r-help ([hidden email])" <[hidden email]>
>> Subject: Re: [R] Help with loop for column means into new column by a
>>          subset Factor w/131 levels
>> Message-ID: <[hidden email]>
>>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> I ran this routine but I was thinking there must be a more elegant way of
>> doing this.
>>
>>
>> # https://community.rstudio.com/t/how-to-average-mean-variables-in-r-based-on-the-level-of-another-variable-and-save-this-as-a-new-variable/8764/8
>>
>> hcd2tmp2_summmary <- hcd2tmp2 %>%
>>    group_by(Procedure_Code1) %>%
>>    summarize(average = mean(Allowed_Amt))
>> # A tibble: 131 x 2
>> # Procedure_Code1 average
>> # <fct>             <dbl>
>> # 1 A9606            57785.
>> # 2 J0129             5420.
>> # 3 J0178             4700.
>> # 4 J0180            13392.
>> # 5 J0202            56328.
>> # 6 J0256            17366.
>> # 7 J0257             7563.
>> # 8 J0485             2450.
>> # 9 J0490             6398.
>> # 10 J0585            4492.
>> # ... with 121 more rows
>>
>> hcd2tmp2 <- hcd2tmp %>%
>>    group_by(Procedure_Code1) %>%
>>    summarise(Avg_Allowed_Amt = mean(Allowed_Amt))
>>
>> view(hcd2tmp2)
>>
>>
>> hcd2tmp3 <- hcd2tmp %>%
>>    group_by(Procedure_Code1) %>%
>>    summarise(Avg_AllowByLimit = mean(AllowByLimit))
>>
>> view(hcd2tmp3)
>>
>>
>> hcd2tmp4 <- hcd2tmp %>%
>>    group_by(Procedure_Code1) %>%
>>    summarise(Avg_UnitsByDose = mean(UnitsByDose))
>>
>> view(hcd2tmp4)
>>
>> hcd2tmp5 <- hcd2tmp %>%
>>    group_by(Procedure_Code1) %>%
>>    summarise(Avg_LimitByUnits = mean(LimitByUnits))
>>
>> view(hcd2tmp5)
>>
>> #Joins----
>>
>>
>> hcd2tmp <- left_join(hcd2tmp2, hcd2tmp, by = "Procedure_Code1")
>> hcd2tmp <- left_join(hcd2tmp3, hcd2tmp, by = "Procedure_Code1")
>> hcd2tmp <- left_join(hcd2tmp4, hcd2tmp, by = "Procedure_Code1")
>> hcd2tmp <- left_join(hcd2tmp5, hcd2tmp, by = "Procedure_Code1")
>>
>> view(hcd2tmp)
>>
>> hcd2tmp$Avg_LimitByUnits <- round(hcd2tmp$Avg_LimitByUnits, digits = 2)
>> hcd2tmp$Avg_Allowed_Amt <- round(hcd2tmp$Avg_Allowed_Amt, digits = 2)
>> hcd2tmp$Avg_AllowByLimit <- round(hcd2tmp$Avg_AllowByLimit, digits = 2)
>> hcd2tmp$Avg_UnitsByDose <- round(hcd2tmp$Avg_UnitsByDose, digits = 2)
>>
>> view(hcd2tmp)
>>
>> #Over under columns----
>> hcd2tmp$AllowByLimitFlag <- hcd2tmp$AllowByLimit > hcd2tmp$Avg_AllowByLimit
>> hcd2tmp$LimitByUnitsFlag <- hcd2tmp$LimitByUnits > hcd2tmp$Avg_LimitByUnits
>> hcd2tmp$Allowed_AmtFlag  <- hcd2tmp$Allowed_Amt  > hcd2tmp$Avg_Allowed_Amt
>> hcd2tmp$UnitsByDoseFlag  <- hcd2tmp$UnitsByDose  > hcd2tmp$Avg_UnitsByDose
>>
>> view(hcd2tmp)
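
One possibly more compact route, sketched here in base R: `ave()` returns the group mean repeated on every row of its group, which is the same shape the SQL `OVER (PARTITION BY ...)` form produces, so no summarise-then-join round trip is needed. The data frame `toy` below is a hypothetical stand-in built from a few rows adapted from the posted sample, and the `Avg_` / `Flag` naming simply mirrors the names used in the post; treat it as a sketch under those assumptions, not as the definitive answer.

```r
# Toy stand-in for hcd2tmp (a few rows adapted from the posted sample;
# AllowByLimit values are rounded for brevity)
toy <- data.frame(
  Procedure_Code1 = factor(c("J1745", "J1745", "J9299", "J9299", "J9299")),
  Allowed_Amt     = c(18393.12, 7569.70, 6254.40, 35925.13, 1627.27),
  LimitByUnits    = c(4309.56, 71.83, 6591.36, 7882.17, 357.03),
  AllowByLimit    = c(4.27, 105.38, 0.95, 4.56, 4.56),
  UnitsByDose     = c(600, 10, 240, 287, 13)
)

cols <- c("Allowed_Amt", "LimitByUnits", "AllowByLimit", "UnitsByDose")

for (col in cols) {
  # ave() computes the mean within each Procedure_Code1 level and
  # repeats it on every row of that level (like a SQL window function)
  grp_mean <- ave(toy[[col]], toy$Procedure_Code1, FUN = mean)
  toy[[paste0("Avg_", col)]] <- round(grp_mean, 2)
  # TRUE where the row value exceeds its (rounded) group mean
  toy[[paste0(col, "Flag")]] <- toy[[col]] > toy[[paste0("Avg_", col)]]
}

toy
```

The loop adds eight columns (four means, four flags) in one pass; with the real data only the `toy` construction would be replaced by the existing data frame.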
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 9
>> Date: Tue, 30 Apr 2019 15:24:57 -0400
>> From: Matthew <[hidden email]>
>> To: "r-help ([hidden email])" <[hidden email]>
>> Subject: [R] transpose and split dataframe
>> Message-ID:
>>          <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>>
>> I have a data frame that is a lot bigger, but for simplicity's sake we can
>> say it looks like this:
>>
>> Regulator    hits
>> AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
>> AT2G55980    AT2G85403,AT4G89223
>>
>>      In other words:
>>
>> data.frame : 2 obs. of 2 variables
>> $Regulator: Factor w/ 2 levels
>> $hits         : Factor w/ 6 levels
>>
>>     I want to transpose it so that Regulator is now the column headings
>> and each of the AGI numbers now separated by commas is a row. So,
>> AT1G69490 is now the header of the first column and AT4G31950 is row 1
>> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
>> column 2 and AT2G85403 is row 1 of column 2, etc.
>>
>>     I have tried playing around with strsplit(TF2list[2:2]) and
>> strsplit(as.character(TF2list[2:2])), but I am getting nowhere.
>>
>> Matthew
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 10
>> Date: Tue, 30 Apr 2019 21:04:50 +0000
>> From: David L Carlson <[hidden email]>
>> To: "[hidden email]" <[hidden email]>, Matthew
>>          <[hidden email]>
>> Subject: Re: [R] transpose and split dataframe
>> Message-ID: <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I neglected to copy this to the list:
>>
>> I think we need more information. Can you give us the structure of the
>> data with str(YourDataFrame). Alternatively you could copy a small piece
>> into your email message by copying and pasting the results of the following
>> code:
>>
>> dput(head(YourDataFrame))
>>
>> The data frame you present could not be a data frame since you say "hits"
>> is a factor with a variable number of elements. If each value of "hits" was
>> a single character string, it would only have 2 factor levels not 6 and
>> your efforts to parse the string would make more sense. Transposing to a
>> data frame would only be possible if each column was padded with NAs to
>> make them equal in length. Since your example tries to use the name TF2list,
>> it is possible that you do not have a data frame but a list and you have no
>> factor levels, just character vectors.
>>
>> If you are not familiar with R, it may be helpful to tell us what your
>> overall goal is rather than an intermediate step. Very likely R can easily
>> handle what you want by doing things a different way.
>>
>> ----------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>>
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> ------------------------------
>>
>> Message: 11
>> Date: Tue, 30 Apr 2019 15:03:09 -0600
>> From: David Winsemius <[hidden email]>
>> To: Jens Heumann <[hidden email]>
>> Cc: [hidden email]
>> Subject: Re: [R] Passing formula as parameter to `lm` within `sapply`
>>          causes error [BUG?]
>> Message-ID: <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Try using do.call
>>
>> —
>> David
>>
>> Sent from my iPhone
>>
>>> On Apr 30, 2019, at 9:24 AM, Jens Heumann <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> `lm` won't take formula as a parameter when it is within a `sapply`; see
>>> example below. Please, could anyone either point me to a syntax error or
>>> confirm that this might be a bug?
>>>
>>> Best,
>>> Jens
>>>
>>> [Disclaimer: This is my first post here, following advice of how to
>>> proceed with possible bugs from here: https://www.r-project.org/bugs.html]
>>>
>>>
>>> SUMMARY
>>>
>>> While `lm` alone accepts formula parameter `FO` well, the same within a
>>> `sapply` causes an error. When putting everything as parameter but formula
>>> `FO`, it's still working, though. All parameters work fine within a similar
>>> `for` loop.
>>>
>>>
>>> MCVE (see data / R-version at bottom)
>>>
>>>> summary(lm(y ~ x, df1, df1[["z"]] == 1, df1[["w"]]))$coef[1, ]
>>>   Estimate Std. Error    t value   Pr(>|t|)
>>> 1.6269038  0.9042738  1.7991275  0.3229600
>>>> summary(lm(FO, data, data[[st]] == st1, data[[ws]]))$coef[1, ]
>>>   Estimate Std. Error    t value   Pr(>|t|)
>>> 1.6269038  0.9042738  1.7991275  0.3229600
>>>> sapply(unique(df1$z), function(s)
>>> +   summary(lm(y ~ x, df1, df1[["z"]] == s, df1[[ws]]))$coef[1, ])
>>>                 [,1]       [,2]         [,3]
>>> Estimate   1.6269038 -0.1404174 -0.010338774
>>> Std. Error 0.9042738  0.4577001  1.858138516
>>> t value    1.7991275 -0.3067890 -0.005564049
>>> Pr(>|t|)   0.3229600  0.8104951  0.996457853
>>>> sapply(unique(data[[st]]), function(s)
>>> +   summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ])  # !!!
>>> Error in eval(substitute(subset), data, env) : object 's' not found
>>>> sapply(unique(data[[st]]), function(s)
>>> +   summary(lm(y ~ x, data, data[[st]] == s, data[[ws]]))$coef[1, ])
>>>                 [,1]       [,2]         [,3]
>>> Estimate   1.6269038 -0.1404174 -0.010338774
>>> Std. Error 0.9042738  0.4577001  1.858138516
>>> t value    1.7991275 -0.3067890 -0.005564049
>>> Pr(>|t|)   0.3229600  0.8104951  0.996457853
>>>> m <- matrix(NA, 4, length(unique(data[[st]])))
>>>> for (s in unique(data[[st]])) {
>>> +   m[, s] <- summary(lm(FO, data, data[[st]] == s, data[[ws]]))$coef[1, ]
>>> + }
>>>> m
>>>           [,1]       [,2]         [,3]
>>> [1,] 1.6269038 -0.1404174 -0.010338774
>>> [2,] 0.9042738  0.4577001  1.858138516
>>> [3,] 1.7991275 -0.3067890 -0.005564049
>>> [4,] 0.3229600  0.8104951  0.996457853
>>>
>>> # DATA #################################################################
>>>
>>> df1 <- structure(list(x = c(1.37095844714667, -0.564698171396089,
>>> 0.363128411337339, 0.63286260496104, 0.404268323140999,
>>> -0.106124516091484, 1.51152199743894, -0.0946590384130976,
>>> 2.01842371387704), y = c(1.30824434809425, 0.740171482827397,
>>> 2.64977380403845, -0.755998096151299, 0.125479556323628,
>>> -0.239445852485142, 2.14747239550901, -0.37891195982917,
>>> -0.638031707027734), z = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
>>> w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)),
>>> class = "data.frame", row.names = c(NA, -9L))
>>>
>>> FO <- y ~ x; data <- df1; st <- "z"; ws <- "w"; st1 <- 1
>>>
>>> ########################################################################
>>>
>>>> R.version
>>>                _
>>> platform       x86_64-w64-mingw32
>>> arch           x86_64
>>> os             mingw32
>>> system         x86_64, mingw32
>>> status
>>> major          3
>>> minor          6.0
>>> year           2019
>>> month          04
>>> day            26
>>> svn rev        76424
>>> language       R
>>> version.string R version 3.6.0 (2019-04-26)
>>> nickname       Planting of a Tree
>>>
>>> #########################################################################
>>>
>>> NOTE: Question on SO two days ago
>>> (https://stackoverflow.com/questions/55893189/passing-formula-as-parameter-to-lm-within-sapply-causes-error-bug-confirmation)
>>> brought many views but neither answer nor bug confirmation.
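
For what it is worth, the `do.call` suggestion can be sketched as follows. The likely reason for the error is that `lm` evaluates `subset` and `weights` first in `data` and then in the environment of the formula; a formula created at top level (like `FO`) carries the global environment, where the function-local `s` does not exist. `do.call` evaluates the arguments before `lm` is called, so `lm` receives plain vectors instead of expressions. The data below are a rounded, toy version of the posted `df1`, and `fit_stratum` is a hypothetical helper name; this is a sketch under those assumptions rather than a confirmed diagnosis.

```r
# Toy data loosely based on the posted df1 (values rounded for brevity)
df1 <- data.frame(
  x = c(1.37, -0.56, 0.36, 0.63, 0.40, -0.11, 1.51, -0.09, 2.02),
  y = c(1.31, 0.74, 2.65, -0.76, 0.13, -0.24, 2.15, -0.38, -0.64),
  z = rep(1:3, each = 3),
  w = c(0.7, 0.8, 1.2, 0.9, 1.3, 1.2, 0.8, 1, 1)
)
FO <- y ~ x; st <- "z"; ws <- "w"

# Hypothetical helper: fit one stratum and return the intercept row
fit_stratum <- function(s) {
  # The list elements are evaluated here, where `s` is visible,
  # so lm() gets a logical vector and a numeric vector, not expressions
  fit <- do.call("lm", list(formula = FO, data = df1,
                            subset  = df1[[st]] == s,
                            weights = df1[[ws]]))
  summary(fit)$coef[1, ]
}

res <- sapply(unique(df1[[st]]), fit_stratum)
res
```

Subsetting the data frame before calling `lm`, or resetting `environment(FO)` inside the function, would be alternative ways around the same scoping issue.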
>>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 12
>> Date: Tue, 30 Apr 2019 17:31:28 -0400
>> From: Matthew <[hidden email]>
>> To: "[hidden email]" <[hidden email]>
>> Subject: [R] Fwd: Re:  transpose and split dataframe
>> Message-ID:
>>          <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Thanks for your reply. I was trying to simplify it a little, but must
>> have got it wrong. Here is the real dataframe, TF2list:
>>
>>    str(TF2list)
>> 'data.frame':    152 obs. of  2 variables:
>>    $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54 54 82 82 82 82 82 ...
>>    $ hits     : Factor w/ 97 levels "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"| __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>>
>>      And the first few lines resulting from dput(head(TF2list)):
>>
>> dput(head(TF2list))
>> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
>> 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380",
>> "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300",
>> "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600",
>> "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ...
>>
>> This is another way of looking at the first 4 entries (Regulator is
>> tab-separated from hits):
>>
>> Regulator    hits
>> 1  AT1G69490
>>    AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
>> 2  AT1G29860
>>    AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
>>
>> 3  AT1G2986
>>    AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
>>
>>      So, the goal would be to
>>
>> first: Transpose the existing dataframe so that the factor Regulator
>> becomes a column name (column 1 name = AT1G69490, column 2 name =
>> AT1G29860, etc.) and the hits associated with each Regulator become
>> rows. Hits is a comma separated 'list' (I do not know if
>> technically it is an R list), so it would have to be comma
>> 'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
>> col 1 row 2 = AT5G24110, etc.); like this:
>>
>> AT1G69490
>> AT4G31950
>> AT5G24110
>> AT1G05675
>> AT5G64905
>>
>> (... I did not include all the rows)
>>
>> I think it would be best to make the first entry a separate
>> dataframe (1 column with name = AT1G69490 and number of rows depending
>> on the number of hits), then make the second column (column name =
>> AT1G29860, and number of rows depending on the number of hits) into a
>> new dataframe and do a full join of the two dataframes; continue by
>> making the third column (column name = AT1G2986) into a dataframe and
>> full join it with the previous; continue for the 152 observations so
>> that the end result is a dataframe with 152 columns and number of rows
>> depending on the entry with the greatest number of hits. The full joins
>> I can do with dplyr, but getting up to that point seems rather difficult.
>>
>> This would get me what my ultimate goal would be; each Regulator is a
>> column name (152 columns) and a given row has either NA or the same hit.
>>
>>      This seems very difficult to me, but I appreciate any attempt.
>>
>> Matthew
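
The two shapes described above can be sketched without any joins. Since both columns are factors, `strsplit()` needs `as.character()` first. The toy `TF2list` below is hypothetical: it reuses the two Regulator IDs from the simplified example, but the second hits string is altered so that one hit is shared, purely to show the alignment. Variant 1 pads each column with `NA`; variant 2 gives one row per distinct hit, so a given row holds either `NA` or the same hit in every column.

```r
# Toy version of TF2list: both columns are factors, as in the str() output.
# The second hits string is invented so that AT4G31950 is shared.
TF2list <- data.frame(
  Regulator = c("AT1G69490", "AT2G55980"),
  hits = c("AT4G31950,AT5G24110,AT1G26380,AT1G05675",
           "AT2G85403,AT4G31950"),
  stringsAsFactors = TRUE
)

# strsplit() needs character input, so convert the factor first
hits <- strsplit(as.character(TF2list$hits), ",", fixed = TRUE)
names(hits) <- as.character(TF2list$Regulator)

# Variant 1: one column per Regulator, padded with NA to equal length
n <- max(lengths(hits))
padded <- as.data.frame(lapply(hits, `length<-`, n), stringsAsFactors = FALSE)

# Variant 2: one row per distinct hit, so equal hits line up across columns
all_hits <- unique(unlist(hits, use.names = FALSE))
aligned <- sapply(hits, function(h) ifelse(all_hits %in% h, all_hits, NA_character_))
rownames(aligned) <- all_hits

padded
aligned
```

The real TF2list has repeated Regulator values (152 rows, 87 levels), so repeated column names would still need a decision, for example merging the hits of a repeated Regulator or keeping the columns apart.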
>>
>>
>>
>>
>>
>> ------------------------------
>>
>> Message: 13
>> Date: Wed, 1 May 2019 07:46:32 +1000
>> From: Jim Lemon <[hidden email]>
>> To: Matthew <[hidden email]>
>> Cc: "r-help ([hidden email])" <[hidden email]>
>> Subject: Re: [R] transpose and split dataframe
>> Message-ID:
>>          <CA+8X3fUjv3APb=
>> [hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Matthew,
>> Is this what you are trying to do?
>>
>> mmdf<-read.table(text="Regulator    hits
>> AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
>> AT2G55980    AT2G85403,AT4G89223",header=TRUE,
>> stringsAsFactors=FALSE)
>> # split the second column at the commas
>> hitsplit<-strsplit(mmdf$hits,",")
>> # define a function that will fill with NAs
>> NAfill<-function(x,n) return(x[1:n])
>> # get the maximum length of hits
>> maxlen<-max(unlist(lapply(hitsplit,length)))
>> # fill the list with NAs
>> hitsplit<-lapply(hitsplit,NAfill,maxlen)
>> # change the names of the list
>> names(hitsplit)<-mmdf$Regulator
>> # convert to a data frame
>> tmmdf<-as.data.frame(hitsplit)
>>
>> Jim
>>
>> On Wed, May 1, 2019 at 5:25 AM Matthew <[hidden email]>
>> wrote:
>>>
>>> I have a data frame that is a lot bigger but for simplicity sake we can
>>> say it looks like this:
>>>
>>> Regulator    hits
>>> AT1G69490    AT4G31950,AT5G24110,AT1G26380,AT1G05675
>>> AT2G55980    AT2G85403,AT4G89223
>>>
>>>      In other words:
>>>
>>> data.frame : 2 obs. of 2 variables
>>> $Regulator: Factor w/ 2 levels
>>> $hits         : Factor w/ 6 levels
>>>
>>>     I want to transpose it so that Regulator is now the column headings
>>> and each of the AGI numbers now separated by commas is a row. So,
>>> AT1G69490 is now the header of the first column and AT4G31950 is row 1
>>> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
>>> column 2 and AT2G85403 is row 1 of column 2, etc.
>>>
>>>     I have tried playing around with strsplit(TF2list[2:2]) and
>>> strsplit(as.character(TF2list[2:2])), but I am getting nowhere.
>>>
>>> Matthew
>>>
>>
>> ------------------------------
>>
>> Message: 14
>> Date: Wed, 1 May 2019 09:58:34 +1200
>> From: Abs Spurdle <[hidden email]>
>> To: =?UTF-8?Q?Catarina_Serra_Gon=C3=A7alves?= <[hidden email]>
>> Cc: r-help <[hidden email]>
>> Subject: Re: [R]  Time series (trend over time) for irregular sampling
>>          dates and multiple sites
>> Message-ID:
>>          <
>> [hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>>> My data has a few problems: (1) I think I will need to fix the effects of
>>> seasonal variation (Monthly) and (2) of possible spatial correlation
>>> (probability of finding an item is higher after finding one since they can
>>> come from the same ship). (3) How do I handle the fact that the
>>> measurements were not taken at a regular interval?
>>
>> Can I ask two questions:
>> (1) Is the data autocorrelated (or "Seasonal") over time?
>> If not then this problem is a lot simpler.
>> (2) Can you expand on the following statement?
>> "possible spatial correlation (probability of finding an item is higher
>> after finding one since they can come from the same ship)"
>>
>>
>> ------------------------------
>>
>> Message: 15
>> Date: Tue, 30 Apr 2019 22:29:24 +0000
>> From: David L Carlson <[hidden email]>
>> To: Matthew <[hidden email]>, "[hidden email]"
>>          <[hidden email]>
>> Subject: Re: [R] Fwd: Re:  transpose and split dataframe
>> Message-ID: <[hidden email]>
>> Content-Type: text/plain; charset="utf-8"
>>
>> If you read the data frame with read.csv() or one of the other read.table()
>> functions, use the as.is=TRUE argument to prevent conversion to factors. If
>> not, do the conversion first:
>>
>> # Convert factors to characters
>> DataMatrix <- sapply(TF2list, as.character)
>> # Split the vector of hits
>> DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
>> # Use the values in Regulator to name the parts of the list
>> names(DataList) <- DataMatrix[,"Regulator"]
>>
>> # Now create a data frame
>> # How long is the longest list of hits?
>> mx <- max(sapply(DataList, length))
>> # Now add NAs to vectors shorter than mx
>> DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x))))
>> # Finally convert back to a data frame
>> TF2list2 <- do.call(data.frame, DataList2)
>>
>> Try this on a portion of the list, say 25 lines, and print each object to
>> see what is happening.
>>
>> ----------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77843-4352
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: R-help <[hidden email]> On Behalf Of Matthew
>> Sent: Tuesday, April 30, 2019 4:31 PM
>> To: [hidden email]
>> Subject: [R] Fwd: Re: transpose and split dataframe
>>
>> Thanks for your reply. I was trying to simplify it a little, but must
>> have got it wrong. Here is the real dataframe, TF2list:
>>
>>    str(TF2list)
>> 'data.frame':    152 obs. of  2 variables:
>>    $ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54
>> 54 82 82 82 82 82 ...
>>    $ hits     : Factor w/ 97 levels
>> "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"| __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>>
>>      And the first few lines resulting from dput(head(TF2list)):
>>
>> dput(head(TF2list))
>> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
>>
>

--
Michael
http://www.dewey.myzen.co.uk/home.html


Re: Survuval Anaysis

Peter Dalgaard-2
Also, please do not include every single other message from a digested list!!

And yes, most likely there is a linear dependency between the predictors, or the 3rd one is constant. There could be other possibilities, though.
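
Both possibilities can be checked directly before refitting. The following is only a rough sketch: the poster's data are not shown in the thread, so test1 here is simulated, and the third predictor is deliberately built as an exact linear combination of the other two. Only the column names are taken from the original post; the values and the dependency itself are assumptions for illustration.

```r
# Simulated stand-in for test1 (hypothetical values, not the poster's data).
set.seed(1)
n <- 200
test1 <- data.frame(
  DOLoomis = rnorm(n, mean = 8),
  DOI55    = rnorm(n, mean = 7)
)
# Assumption for illustration: the third predictor is an exact linear
# combination of the first two.
test1$DODamen <- 2 * test1$DOLoomis - test1$DOI55

# Check 1: a constant predictor has a standard deviation of zero.
sds <- sapply(test1[c("DOLoomis", "DOI55", "DODamen")], sd, na.rm = TRUE)
print(sds)

# Check 2: compare the rank of the design matrix with its column count.
X <- model.matrix(~ DOLoomis + DOI55 + DODamen, data = test1)
q <- qr(X)
rank_X <- q$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")

# Columns that qr() pivoted past the rank are the ones it treats as redundant.
dependent <- colnames(X)[q$pivot[seq_len(ncol(X)) > rank_X]]
print(dependent)
```

If the rank is smaller than the number of columns, at least one predictor carries no information beyond the others, which is the situation coxph() reports as "X matrix deemed to be singular". On the real data, the same two checks would be run on the data frame read from the CSV file instead of the simulated one.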

> On 2 May 2019, at 17:44 , Michael Dewey <[hidden email]> wrote:
>
> Without more details it is hard to answer, but it is suspicious that it is dropping one of your predictors and that the standard errors of the others are very large. This suggests you should investigate the joint distribution of your predictors and the events.
>
> Michael
>

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]
