|
Dear all,
I am having trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config:

MacOS 10.5
R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)

I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with:

Sys.setenv(NOAWT=TRUE)

The command tm_map(reuters, stemDocument) gives the following errors:

- First time:
Error in .jnew(name) :
  java.lang.InternalError: Can't start the AWT because Java was started on the first thread.  Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line
Refreshing GOE props...

- Second time:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
(etc.)

I have already searched the Web for a solution, but I have found nothing useful.

Here is the full source code (all the libraries are already loaded):

------
Sys.setenv(NOAWT=TRUE)
source <- ReutersSource("reuters-21578.xml", encoding="UTF-8")
reuters <- Corpus(source)
reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, removePunctuation)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, removeNumbers)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, stemDocument)
------

Thank you for your help,

Julien

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
|
On Friday, 13 January 2012 at 15:49 +0100, Julien Velcin wrote:
> Dear all,
>
> I have some troubles using the stemming algorithm provided by the tm
> (text mining) + Snowball packages.
> Here is my config:
>
> MacOS 10.5
> R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)
>
> I have installed all the needed packages (tm, rJava, RWeka, Snowball)
> + dependencies. I have deactivated AWT (as described in
> http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html)
> with:
>
> Sys.setenv(NOAWT=TRUE)
>
> The command tm_map(reuters, stemDocument) gives the following errors:
>
> - First time:
> Error in .jnew(name) :
>   java.lang.InternalError: Can't start the AWT because Java was
> started on the first thread.  Make sure StartOnFirstThread is not
> specified in your application's Info.plist or on the command line
> Refreshing GOE props...

There's a good workaround, though: run your code from JGR, which is a GUI written in Java. Snowball works well this way.

Cheers
|
|
I use version 1.6:

$ java -version
java version "1.6.0_26"

Julien

On Jan 15, 2012, at 8:55 PM, Milan Bouchet-Valat wrote:

> On Sunday, 15 January 2012 at 16:32 +0100, Julien Velcin wrote:
>> Unfortunately, it doesn't work. I've installed JGR and launched my
>> script. I still obtain an error:
>>
>> Error in .jcall("RWekaInterfaces", "[S", "stem", .jcast(stemmer,
>> "weka/core/stemmers/Stemmer"), :
>>   RcallMethod: cannot determine object class
>>
>> Any new idea?
> Just a guess, but what version of Java do you have? You can find this in
> the Java preferences panel (type "Java" in Spotlight to find it). 1.6 is
> required, and often only 1.5 is used by default on OS X.
>
> Regards
|
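The `java -version` output above comes from the shell; the JVM that rJava starts inside R may not be the same one. A minimal sketch for checking the version from within R, assuming rJava is installed. `jvm_version` is a hypothetical helper name, not part of rJava, and setting NOAWT first simply follows the advice given elsewhere in this thread:

```r
# Disable AWT before the JVM is started, as discussed in this thread.
Sys.setenv(NOAWT = TRUE)

# Hypothetical helper (not part of rJava): returns the version of the JVM
# that rJava actually starts, or NA if rJava or the JVM is unavailable.
jvm_version <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return(NA_character_)
  }
  tryCatch({
    rJava::.jinit()  # start the JVM (no effect if it is already running)
    # Call the static method java.lang.System.getProperty("java.version")
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)
}

print(jvm_version())
```

If the value printed here differs from the shell output, R is picking up a different Java installation than the terminal.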
|
On Sunday, 15 January 2012 at 23:06 +0100, Julien Velcin wrote:
> I use the version 1.6:
>
> $ java -version
> java version "1.6.0_26"

OK, so nothing wrong on that side.

FWIW, I've found another person with the same problem:
https://list.scms.waikato.ac.nz/pipermail/wekalist/2010-June/048895.html

He says that when R is run as root, it works. But he's on Linux, and I'm not sure that's really likely to be the problem on OS X. You might still want to try this, anyway.

Let's CC Kurt Hornik, he'll be much more clueful than I am... ;-)
|
|
I had already seen this post. If I'm logged in as root (it is possible under Mac OS X), here is the result:

* First run:

> reuters <- tm_map(reuters, stemDocument)
Error in .jnew(name) :
  java.lang.InternalError: Can't start the AWT because Java was started on the first thread.  Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line
Unable to create WEKA_HOME (/Users/admin/wekafiles)
Unable to create packages directory (/Users/admin/wekafiles/packages)
Unable to create repository cache directory (/Users/admin/wekafiles/repCache)
Unable to create WEKA_HOME (/Users/admin/wekafiles)
Unable to create packages directory (/Users/admin/wekafiles/packages)
Unable to create repository cache directory (/Users/admin/wekafiles/repCache)
Refreshing GOE props...

* Second run:

> reuters <- tm_map(reuters, stemDocument)
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!

BTW, thank you very much for your assistance.

Julien
|
|
On Monday, 16 January 2012 at 11:27 +0100, Julien Velcin wrote:
> I had already seen this post. If I'm logged as root (it is possible
> under macosx), here is the result:
>
> * First run:
>
> > reuters <- tm_map(reuters, stemDocument)
> Error in .jnew(name) :
>   java.lang.InternalError: Can't start the AWT because Java was
> started on the first thread.  Make sure StartOnFirstThread is not
> specified in your application's Info.plist or on the command line
> Unable to create WEKA_HOME (/Users/admin/wekafiles)
> Unable to create packages directory (/Users/admin/wekafiles/packages)
> Unable to create repository cache directory (/Users/admin/wekafiles/repCache)
> Unable to create WEKA_HOME (/Users/admin/wekafiles)
> Unable to create packages directory (/Users/admin/wekafiles/packages)
> Unable to create repository cache directory (/Users/admin/wekafiles/repCache)
> Refreshing GOE props...

Running as root doesn't mean you can bypass the JGR trick. I think you're experiencing two different problems here, so you still need to apply the workaround for the first one.

Cheers
|
|
OK, but unfortunately JGR doesn't work when run as root! I think this is because the packages needed (e.g., RWeka) are installed locally (for the user). The installation "at system level (in R framework)" fails:

"Currently it is not possible to install binary packages from a remote repository as root.
Please use the CRAN binary of R to allow admin users to install system-wide packages without becoming root. Alternatively you can either use command-line version of R as root or install the packages from local files."

If I follow this advice (launch R as root and try to install the package manually):

> install.packages('RWeka')
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... Error: .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared object '/Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so':
  dlopen(/Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so, 10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
  Referenced from: /Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so
  Reason: image not found

Julien
|
|
On Monday, 16 January 2012 at 13:07 +0100, Julien Velcin wrote:
> Ok but unfortunately JGR doesn't work when run as root! I think this
> is because the packages needed (e.g., RWeka) are installed locally
> (for the user). The installation "at system level (in R framework)"
> fails:
>
> "Currently it is not possible to install binary packages from a remote
> repository as root.
> Please use the CRAN binary of R to allow admin users to install
> system-wide packages without becoming root. Alternatively you can either
> use command-line version of R as root or install the packages from local
> files."
>
> If I follow these advices (launch R as a root and try to install the
> package manually):
>
> > install.packages('RWeka')
> --- Please select a CRAN mirror for use in this session ---
> Loading Tcl/Tk interface ... Error: .onLoad failed in loadNamespace()
> for 'tcltk', details:
>   call: dyn.load(file, DLLpath = DLLpath, ...)
>   error: unable to load shared object
> '/Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so':
>   dlopen(/Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so, 10):
> Library not loaded: /usr/local/lib/libtcl8.5.dylib
>   Referenced from: /Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so
>   Reason: image not found

OK, I'm really not sure it's worth trying to install these packages as root. Like I said, the fact that one user found it useful on Linux doesn't mean it will work for you on OS X.

I think you'd better have a look at the R packages in /Library/Frameworks/ and check that the Snowball, RWeka and RWekajars files have correct permissions (particularly the jars/ subdirectories). You should also try reinstalling these packages, just in case (and check that no error messages are printed).

If it doesn't work, I'll wait for Kurt to step in, as I'm going to waste your time on silly attempts if I go on. ;-)
|
|
On Jan 16, 2012, at 8:18 AM, Milan Bouchet-Valat wrote:

> On Monday, 16 January 2012 at 13:07 +0100, Julien Velcin wrote:

snipped

>>> install.packages('RWeka')
>> --- Please select a CRAN mirror for use in this session ---
>> Loading Tcl/Tk interface ... Error: .onLoad failed in loadNamespace()
>> for 'tcltk', details:
>>   call: dyn.load(file, DLLpath = DLLpath, ...)
>>   error: unable to load shared object
>> '/Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so':
>>   dlopen(/Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so, 10):
>> Library not loaded: /usr/local/lib/libtcl8.5.dylib
>>   Referenced from: /Library/Frameworks/R.framework/Versions/2.14/Resources/library/tcltk/libs/x86_64/tcltk.so
>>   Reason: image not found
> OK, I'm really not sure that's worth trying to install these packages as
> root. Like I said, the fact that one user found it useful on Linux
> doesn't mean it will work for you on OS X.
>
> I think you'd better have a look at R packages in /Library/Frameworks/
> and check that Snowball, RWeka and RWekajars files have correct
> permissions (particularly the jars/ subdirectories).

The fact that the OP knows how to log in as root on a Mac suggests that he probably already knows that the Disk Utility.app program is the typical way to check and repair permissions, but I thought I would mention it in case my presumption is wrong.

> You should also try
> reinstalling these packages, just in case (and check no error messages
> are printed).

--
David Winsemius, MD
West Hartford, CT
|
|
In reply to this post by Milan Bouchet-Valat
My libraries are not in "/Library/Frameworks/..." but in "/Users/myname/Library/R/2.14/library/...". I ran "chmod a+x" on all the files (including the directories). No change :(.

Besides, I have already tried to re-install the packages (using different versions of R: 2.12, 2.13, 2.14).

Julien
|
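Note that "chmod a+x" only adds the execute (search) bit; reading a jar file also needs read permission. A rough sketch for checking this from R, assuming the packages live in the first entry of `.libPaths()`. `check_jar_access` is a hypothetical helper written for illustration, and it searches for .jar files recursively rather than assuming a particular subdirectory name:

```r
# Hypothetical helper (not part of any package): lists the .jar files found
# under the given package directories and reports whether the current user
# has read permission on each of them.
check_jar_access <- function(lib = .libPaths()[1],
                             pkgs = c("Snowball", "RWeka", "RWekajars")) {
  dirs <- file.path(lib, pkgs)
  dirs <- dirs[file.exists(dirs)]          # skip packages not installed here
  jars <- list.files(dirs, pattern = "\\.jar$",
                     recursive = TRUE, full.names = TRUE)
  data.frame(jar = jars,
             readable = file.access(jars, mode = 4) == 0,  # mode 4 = read
             stringsAsFactors = FALSE)
}

print(check_jar_access())
```

An empty result means no jar files were found under that library path, which would itself point to an incomplete installation.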
|
In reply to this post by David Winsemius
Yes, I've tried to repair the permissions through Disk Utility, but it doesn't help.

Julien
|
|
In reply to this post by Julien Velcin
The Sys.setenv(NOAWT=TRUE) code did indeed solve my problem, which was exactly what Julien described.

The key is that you have to deactivate AWT BEFORE loading RWeka/Snowball. If I do so, it fires a few warning messages, but that should not affect anything. I am running the lsa package, which requires RWeka and Snowball. My R version is 2.14.1, under Mac OS X 10.6.8. My code snippet is below:

> dtm<-textmatrix(ldir,minWordLength=1,stopwords=stopwords_en,stemming=TRUE,language="english")
Refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...
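
The ordering described above can be sketched as follows. This is only an illustration using the package names mentioned in this thread; `load_after_noawt` is a hypothetical helper, not part of tm, RWeka or Snowball:

```r
# Disable AWT first; per this thread, this has to happen before
# RWeka/Snowball are loaded.
Sys.setenv(NOAWT = TRUE)

# Hypothetical helper (not part of any package): attaches the packages only
# after confirming that NOAWT is set in the current session.
load_after_noawt <- function(pkgs = c("RWeka", "Snowball", "tm")) {
  if (!nzchar(Sys.getenv("NOAWT"))) {
    stop("NOAWT is not set; call Sys.setenv(NOAWT = TRUE) first")
  }
  if ("rJava" %in% loadedNamespaces()) {
    # Assumption: if rJava is already loaded, setting NOAWT now may be too late.
    warning("rJava is already loaded; consider restarting R and setting NOAWT first")
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(TRUE)
}

# Usage (requires the packages to be installed):
# load_after_noawt()
# reuters <- tm_map(reuters, stemDocument)
```

Running this in a fresh R session, before anything else touches Java, is probably the safest way to apply it.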
|
|
THANK YOU! Indeed, the key is to disable AWT before loading the R packages. At last, it works with just a few warnings.

Julien
|
|
Hi, I'm having the same problem, but the aforementioned solution didn't work for me. I keep getting an error message and the Stemmer is still reportedly unknown. See code below. Please let me know if I'm overlooking anything. Thanks.

> Sys.setenv(NOAWT=TRUE)
> library(tm)
> library(Snowball)
> library(RWeka)
> library(rJava)
> library(RWekajars)
> data("crude")
> stemDocument(crude[[1]])
Error in .jnew(name) :
  java.lang.InternalError: Can't start the AWT because Java was started on the first thread.  Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
> stemDocument(crude[[1]])
Stemmer 'porter' unknown!
Diamond Shamrock Corp said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel.
    The reduction brings its posted price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said.
    "The price reduction today was made in the light of falling oil product prices and a weak crude oil market," a company spokeswoman said.
    Diamond is the latest in a line of U.S. oil companies that have cut its contract, or posted, prices over the last two days citing weak oil markets.
 Reuter
Stemmer 'english' unknown!
>
|
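When the NOAWT setting appears to have no effect, it may help to inspect the session state before loading anything. A small diagnostic sketch in base R; `awt_state` is a hypothetical helper name, and the idea that an already loaded rJava (for example from a startup file or a restored workspace) defeats the setting is an assumption based on the "before loading" advice in this thread:

```r
# Hypothetical diagnostic helper (not part of any package): reports whether
# NOAWT is set, whether rJava is already loaded, and which front end is
# running R.
awt_state <- function() {
  list(
    noawt_set    = nzchar(Sys.getenv("NOAWT")),
    rjava_loaded = "rJava" %in% loadedNamespaces(),
    gui          = .Platform$GUI
  )
}

str(awt_state())
```

Calling this as the very first command of a fresh session shows whether rJava has been loaded before Sys.setenv(NOAWT=TRUE) had a chance to run.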
