Components

List of U-Compare compatible UIMA components

This is a list of UIMA components that are compatible with the U-Compare "comparable" type system. These components are included in the U-Compare single-click-to-launch package. See the components page for other components.

Abbreviations: UT or U-Tokyo for the University of Tokyo, UM or U-Man for the University of Manchester, CCP for the Center for Computational Pharmacology at the University of Colorado Health Science Center.

Syntactic Tools

Sentence Detectors

Name | Provider | Developer | Description
GENIA Sentence Detector | U-Tokyo | Yuichiroh Matsubayashi, U-Tokyo | Trained on the GENIA corpus.
LingPipe Sentence Detector | CCP | Alias-i | Requires downloading and importing the lingpipe.ucz package separately from our download page.
NaCTeM Sentence Breaker | NaCTeM at U-Manchester | Scott Piao, NaCTeM, U-Manchester | English sentence boundary detector that employs heuristic rules, including error-correction rules, compiled from corpus resources.
OpenNLP Sentence Detector | CCP | OpenNLP | From the OpenNLP project.
UIMA Sentence Detector | U-Tokyo | Apache UIMA | From the Apache UIMA examples.

Tokenizers

Name | Provider | Developer | Description
GENIA Tagger | U-Tokyo | Yoshimasa Tsuruoka, U-Tokyo (GENIA project) | Trained on the WSJ, GENIA, and PennBioIE corpora.
OpenNLP Tokenizer | U-Tokyo | OpenNLP/Apache UIMA | From Apache UIMA examples.
Penn Bio Tokenizer | CCP | U-Penn | Part of Penn BioTagger.
UIMA Tokenizer | U-Tokyo | Apache UIMA | From Apache UIMA examples.

Part-of-Speech Taggers

Name | Provider | Developer | Description
GENIATagger | U-Tokyo | Yoshimasa Tsuruoka, U-Tokyo (GENIA project) | Trained on the WSJ, GENIA, and PennBioIE corpora.
SteppTagger | U-Tokyo | Yoshimasa Tsuruoka, NaCTeM | Based on probabilistic models and tuned to biomedical text; trained on the WSJ, GENIA, and PennBioIE corpora.
LingPipe POS Tagger | CCP | Alias-i | Trained on the GENIA corpus using a Hidden Markov Model. Requires downloading and importing the lingpipe.ucz package separately from our download page.
OpenNLPTagger | U-Tokyo | OpenNLP/Apache UIMA | From Apache UIMA examples.

Lemmatizers

Name | Provider | Developer | Description
morpha | NaCTeM/U-Compare | G. Minnen, et al., U-Sussex (morph) | A fast and robust morphological analyser for English based on finite-state techniques that returns the lemma and inflection type of a word, given the word form and its part of speech.
GENIATagger | U-Tokyo | Yoshimasa Tsuruoka, U-Tokyo (GENIA project) | Trained on the WSJ, GENIA, and PennBioIE corpora.
Enju | U-Tokyo | Yusuke Miyao, U-Tokyo (Enju) | HPSG parser with predicate-argument structures (PAS) as well as phrase structures, trained with newswire articles (Penn Treebank).

Syntactic Parsers (CFG/Deep/Dependency Parsers)

Name | Provider | Developer | Description
Enju | U-Tokyo | Yusuke Miyao, U-Tokyo (Enju) | HPSG parser with predicate-argument structures (PAS) as well as phrase structures, trained with newswire articles (Penn Treebank).
OpenNLPParser | U-Tokyo | OpenNLP/Apache UIMA | CFG parser from Apache UIMA examples.