Components

List of U-Compare compatible UIMA components

This is a list of UIMA components that are compatible with the U-Compare "comparable" type system. These components are included in the U-Compare single-click-to-launch package. See the components page for other components.

Abbreviations: UT or U-Tokyo for the University of Tokyo, UM or U-Man for the University of Manchester, CCP for the Center for Computational Pharmacology at the University of Colorado Health Science Center.

Semantic Tools

Biological Named Entity Recognizers

Name | Provider | Developer | Description
ABNER-NLPBA | CCP | University of Wisconsin-Madison; CCP | CRF trained on the NLPBA corpus. Tokenizer and sentence detector included.
ABNER-BioCreative | CCP | University of Wisconsin-Madison; CCP | CRF trained on the BioCreative corpus. Tokenizer and sentence detector included.
ABNER with User Model | CCP/U-Compare | University of Wisconsin-Madison; CCP | Uses a user-specified trained CRF model. Tokenizer and sentence detector included.
GENIATagger | U-Tokyo | Yoshimasa Tsuruoka, U-Tokyo (GENIA project) | Uses Maximum Entropy, trained on the NLPBA data set, derived from the GENIA corpus.
NaCTeM Species Word Detector | NaCTeM/U-Compare | Xinglong Wang, NaCTeM and Claire Grover, U-Edinburgh | Detects words that indicate model organisms (e.g., mouse, human, murine) in running text. The list of organisms was derived from NCBI Taxonomy and the UniProt controlled vocabulary of species.
NeMine | NaCTeM, U-Manchester | Yutaka Sasaki, NaCTeM, U-Manchester | CRF trained on the Genia Corpus/JNLPBA-2004 shared task data, with BioThesaurus as dictionary.
MedTNER-M | U-Tokyo | Kazuhiro Yoshida, U-Tokyo | Protein mention detector based on a Maximum Entropy Markov Model, trained on the Genia Corpus using the Protein_molecule tags. When tags are nested, the outermost possible tag is used.
Moara CBR-Tagger (BC2 model) | National Center of Biotechnology, Madrid | Mariana Lara Neves, National Center of Biotechnology, Madrid | Wrapper for the CBR-Tagger, trained with the BioCreative 2 GM model.
Moara CBR-Tagger (BC2 and BC1 yeast, mouse and fly models) | National Center of Biotechnology, Madrid | Mariana Lara Neves, National Center of Biotechnology, Madrid | Wrapper for the CBR-Tagger, trained with the BioCreative 2 GM model and the BioCreative 1 GN models for yeast, mouse and fly.
LingPipe Entity Tagger (Genia) | CCP | Alias-i | Trained on the Genia Corpus. You have to download and import the lingpipe.ucz package separately from our download page.
LingPipe Entity Tagger (Genia-NLPBA) | CCP | Alias-i | Trained on the NLPBA data set, part of the Genia Corpus. You have to download and import the lingpipe.ucz package separately from our download page.
LingPipe Entity Tagger (GeneTag) | CCP | Alias-i | Hidden Markov Model trained on GeneTag. You have to download and import the lingpipe.ucz package separately from our download page.
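
The NaCTeM Species Word Detector entry above describes spotting words that indicate model organisms by using a word list. The following is a minimal sketch of that general idea, not the component's actual implementation. The three-entry SPECIES_WORDS table and the detect_species_words function are hypothetical names made up for illustration; the real component derives its list from NCBI Taxonomy and the UniProt controlled vocabulary of species.

```python
import re

# Hypothetical, deliberately tiny word list (assumption for illustration).
# The real detector uses a list derived from NCBI Taxonomy and UniProt.
SPECIES_WORDS = {
    "mouse": "Mus musculus",
    "murine": "Mus musculus",
    "human": "Homo sapiens",
}


def detect_species_words(text):
    """Return (start, end, surface form, organism) for each listed word."""
    hits = []
    # Scan alphabetic tokens and look each one up case-insensitively.
    for match in re.finditer(r"[A-Za-z]+", text):
        organism = SPECIES_WORDS.get(match.group(0).lower())
        if organism is not None:
            hits.append((match.start(), match.end(), match.group(0), organism))
    return hits


if __name__ == "__main__":
    for hit in detect_species_words("Murine and human cells"):
        print(hit)
```

Character offsets are returned alongside the surface form because UIMA annotations are defined by begin and end offsets into the document text.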

Other Named Entity Recognizers

Name | Provider | Developer | Description
OpenNLPNER | U-Tokyo | OpenNLP/Apache UIMA | From the Apache UIMA examples. Detects Person, Title, Place.

Named Entity Normalizers

Name | Provider | Developer | Description
NaCTeM Species Disambiguator | NaCTeM/U-Compare | Xinglong Wang, NaCTeM | Normalises biological named entity mentions in text to NCBI Taxonomy IDs, which indicate the entities' model organisms. The program uses a maximum entropy multi-class classification model and a binary relation classification SVM model, both trained on the Edinburgh TXM corpus.
MedTNER | U-Tokyo | Kazuhiro Yoshida, U-Tokyo | Trained on the JNLPBA data. Named entities are normalized to UniProt entries, with a MaxEnt classifier for disambiguation.

Abbreviation Detector

Name | Provider | Developer | Description
extractabbrev | NaCTeM/U-Compare | A. Schwartz and M. Hearst, U-Berkeley | The ExtractAbbrev class implements a simple algorithm for extraction of abbreviations and their definitions from biomedical text. Abbreviations (short forms) are extracted from the input file, and those abbreviations for which a definition (long form) is found are printed out, along with that definition, one per line.
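
The short form/long form matching described above can be sketched as follows. This is a simplified Python illustration of the right-to-left character matching idea from Schwartz and Hearst's algorithm, not the ExtractAbbrev class itself: the function names find_best_long_form and extract_pairs are hypothetical, only the "long form (SF)" pattern is handled, and the validity checks here are assumed to be cruder than those in the original.

```python
import re


def find_best_long_form(short_form, long_form):
    """Match short-form characters right to left inside the candidate text.

    Returns the shortest matching long form, or None if there is no match.
    """
    l_index = len(long_form) - 1
    for s_index in range(len(short_form) - 1, -1, -1):
        curr = short_form[s_index].lower()
        if not curr.isalnum():
            continue  # skip punctuation inside the short form
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while (l_index >= 0 and long_form[l_index].lower() != curr) or (
            s_index == 0 and l_index > 0 and long_form[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
    # Trim the candidate to start at the matched word.
    start = long_form.rfind(" ", 0, l_index + 1) + 1
    return long_form[start:]


def extract_pairs(sentence):
    """Find (short form, long form) pairs for the pattern 'long form (SF)'."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        # Simplified validity check (assumption): 2 to 10 chars, one letter.
        if not 2 <= len(short_form) <= 10:
            continue
        if not any(c.isalpha() for c in short_form):
            continue
        # Search window: at most min(|SF| + 5, |SF| * 2) preceding words.
        words = sentence[: match.start()].split()
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form:
            pairs.append((short_form, long_form))
    return pairs


if __name__ == "__main__":
    text = "We use a Hidden Markov Model (HMM) for tagging."
    for short_form, long_form in extract_pairs(text):
        print(short_form, long_form, sep="\t")
```

Run as a script, the sketch prints each short form with its long form, one pair per line, mirroring the output format described in the table entry.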