[R-pkgs] Natural Language Processing for non-English languages with udpipe

Dear R users,

I'm happy to announce the release of version 0.3 of the udpipe R package on
CRAN (https://CRAN.R-project.org/package=udpipe). The udpipe R package is a
Natural Language Processing toolkit that provides language-agnostic
'tokenization', 'parts of speech tagging', 'lemmatization', 'morphological
feature tagging' and 'dependency parsing' of raw text. Besides text
parsing, the R package also allows you to train your own annotation models
on 'treebank' data in the 'CoNLL-U' format, which is described at
http://universaldependencies.org/format.html.
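For readers unfamiliar with CoNLL-U: each token occupies one line with ten
tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL,
DEPS, MISC). Sentences are separated by blank lines, and lines starting
with '#' hold comments. The sketch below is a minimal reader for that
layout, written in Python purely for illustration. It is not part of udpipe
and does not reflect how the package reads treebanks internally. The
function name parse_conllu and the hand-written sample sentence are
hypothetical.

```python
from typing import Dict, List

# The ten tab-separated columns defined by the CoNLL-U format
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Hypothetical helper: split CoNLL-U text into sentences of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry sentence metadata
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        token = dict(zip(CONLLU_FIELDS, columns))
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1")
        if "-" in token["id"] or "." in token["id"]:
            continue
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Hand-written example sentence (not udpipe output)
sample = "\n".join([
    "# text = Cats sleep.",
    "1\tCats\tcat\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tsleep\tsleep\tVERB\tVBP\tMood=Ind|Tense=Pres\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
    "",
])

sentences = parse_conllu(sample)
for token in sentences[0]:
    # HEAD is the ID of the governing token; 0 marks the root of the sentence
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

The HEAD and DEPREL columns together encode the dependency tree, which is
the kind of structure the 'dependency parsing' step produces.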

The R package provides direct access to pre-trained language models for
more than 50 languages. The following models are directly available:

afrikaans, ancient_greek-proiel, ancient_greek, arabic, basque, belarusian,
bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt,
czech, danish, dutch-lassysmall, dutch, english-lines, english-partut,
english, estonian, finnish-ftb, finnish, french-partut, french-sequoia,
french, galician-treegal, galician, german, gothic, greek, hebrew, hindi,
hungarian, indonesian, irish, italian, japanese, kazakh, korean,
latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal,
norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br,
portuguese, romanian, russian-syntagrus, russian, sanskrit, serbian,
slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines,
swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese

We hope that the package will allow other R users to build natural language
applications on top of the resulting tokens, parts of speech tags,
morphological features and dependency parsing output. In particular, we
hope to see applications that are not limited to English, such as the
textrank and cleanNLP R packages, to name a few.

Note that the package has no external software dependencies (neither Java
nor Python) and depends on only two R packages (Rcpp and data.table), which
makes it easy to install on any platform.

The package is available on CRAN at
https://CRAN.R-project.org/package=udpipe and is developed at
https://github.com/bnosac/udpipe.
A small Docusaurus website is available at
https://bnosac.github.io/udpipe/en

We hope you enjoy using it, and we would like to thank Milan Straka for all
the work done on UDPipe, as well as everyone involved in
http://universaldependencies.org.

all the best,
Jan

Jan Wijffels
Statistician
www.bnosac.be  | +32 486 611708

_______________________________________________
R-packages mailing list
https://stat.ethz.ch/mailman/listinfo/r-packages