List of U-Compare compatible UIMA components
This is a list of UIMA components that are compatible with the U-Compare
"comparable" type system. These components are included in the
U-Compare single-click-to-launch package. See the components page for other components.
Abbreviations: UT or U-Tokyo for the University of Tokyo, UM or U-Man for the University of Manchester, CCP for the Center for Computational Pharmacology
at the University of Colorado Health Science Center.
Semantic Tools
Biological Named Entity Recognizers
| Name | Provider | Developer | Description |
|---|---|---|---|
| ABNER-NLPBA | CCP | University of Wisconsin-Madison, CCP | CRF trained on the NLPBA corpus. Tokenizer and sentence detector included. |
| ABNER-BioCreative | CCP | University of Wisconsin-Madison, CCP | CRF trained on the BioCreative corpus. Tokenizer and sentence detector included. |
| ABNER with User Model | CCP/U-Compare | University of Wisconsin-Madison, CCP | Uses a user-specified trained CRF model. Tokenizer and sentence detector included. |
| GENIATagger | U-Tokyo | Yoshimasa Tsuruoka, U-Tokyo (GENIA project) | Uses Maximum Entropy, trained on the NLPBA data set, derived from the GENIA corpus. |
| NaCTeM Species Word Detector | NaCTeM/U-Compare | Xinglong Wang, NaCTeM and Claire Grover, U-Edinburgh | Detects words that indicate model organisms (e.g., mouse, human, murine) in running text. The list of organisms was derived from NCBI Taxonomy and the UniProt controlled vocabulary of species. |
| NeMine | NaCTeM U-Manchester | Yutaka Sasaki, NaCTeM U-Manchester | CRF trained on Genia Corpus/JNLPBA-2004 shared task data with BioThesaurus as dictionary. |
| MedTNER-M | U-Tokyo | Kazuhiro Yoshida, U-Tokyo | Protein mention detector based on a Maximum Entropy Markov Model, trained on the Genia Corpus using the Protein_molecule tags. When tags are nested, the outermost tag is used. |
| Moara CBR-Tagger (BC2 model) | National Center of Biotechnology, Madrid | Mariana Lara Neves, National Center of Biotechnology, Madrid | Wrapper for the CBR-Tagger, trained with the BioCreative 2 GM model. |
| Moara CBR-Tagger (BC2 and BC1 yeast, mouse and fly models) | National Center of Biotechnology, Madrid | Mariana Lara Neves, National Center of Biotechnology, Madrid | Wrapper for the CBR-Tagger, trained with the BioCreative 2 GM model and the BioCreative 1 GN models for the yeast, mouse and fly. |
| LingPipe Entity Tagger (Genia) | CCP | Alias-i | Trained on the Genia Corpus. You have to download and import the lingpipe.ucz package separately from our download page. |
| LingPipe Entity Tagger (Genia-NLPBA) | CCP | Alias-i | Trained on the NLPBA data set, part of the Genia Corpus. You have to download and import the lingpipe.ucz package separately from our download page. |
| LingPipe Entity Tagger (GeneTag) | CCP | Alias-i | Hidden Markov Model trained on GeneTag. You have to download and import the lingpipe.ucz package separately from our download page. |
Other Named Entity Recognizers
| Name | Provider | Developer | Description |
|---|---|---|---|
| OpenNLPNER | U-Tokyo | OpenNLP/Apache UIMA | From Apache UIMA examples. Detects Person, Title, Place. |
Named Entity Normalizers
| Name | Provider | Developer | Description |
|---|---|---|---|
| NaCTeM Species Disambiguator | NaCTeM/U-Compare | Xinglong Wang, NaCTeM | Normalises biological named entity mentions in text to NCBI Taxonomy IDs, which indicate the entities' model organisms. The program uses a maximum entropy multi-classification model and a binary relation classification SVM model, both trained on the Edinburgh TXM corpus. |
| MedTNER | U-Tokyo | Kazuhiro Yoshida, U-Tokyo | Trained on the JNLPBA data. NEs are normalized to UniProt entries, with a MaxEnt classifier for disambiguation. |
Abbreviation Detector
| Name | Provider | Developer | Description |
|---|---|---|---|
| extractabbrev | NaCTeM/U-Compare | A. Schwartz and M. Hearst, U-Berkeley | The ExtractAbbrev class implements a simple algorithm for extraction of abbreviations and their definitions from biomedical text. Abbreviations (short forms) are extracted from the input file, and those abbreviations for which a definition (long form) is found are printed out, along with that definition, one per line. |
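
To illustrate the kind of matching the extractabbrev entry describes, here is a minimal Python sketch in the spirit of the Schwartz and Hearst approach. It is not the ExtractAbbrev source: the function names are hypothetical, only the "long form (short form)" pattern is handled, and the short-form filters are simplified assumptions.

```python
import re


def is_valid_short_form(short_form):
    """Simplified filter (an assumption): short, at most two words, has a letter."""
    return (2 <= len(short_form) <= 10
            and len(short_form.split()) <= 2
            and short_form[0].isalnum()
            and any(c.isalpha() for c in short_form))


def find_best_long_form(short_form, candidate):
    """Match short-form characters right to left against the candidate text.

    Returns the shortest matching long form, or None if no match exists.
    """
    l_index = len(candidate) - 1
    for s_index in range(len(short_form) - 1, -1, -1):
        ch = short_form[s_index].lower()
        if not ch.isalnum():
            continue  # punctuation in the short form is ignored
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while ((l_index >= 0 and candidate[l_index].lower() != ch)
               or (s_index == 0 and l_index > 0
                   and candidate[l_index - 1].isalnum())):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
    # Start the long form at the beginning of the word that was matched last.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]


def extract_abbreviations(sentence):
    """Return (short form, long form) pairs found in one sentence."""
    pairs = []
    for match in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = match.group(1).strip()
        if not is_valid_short_form(short_form):
            continue
        # Search window: a bounded number of words before the parenthesis.
        words = sentence[:match.start()].split()
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = find_best_long_form(short_form, candidate)
        if long_form is not None:
            pairs.append((short_form, long_form))
    return pairs
```

For example, `extract_abbreviations("We studied the heat shock transcription factor (HSF) in yeast.")` returns `[("HSF", "heat shock transcription factor")]`, while a parenthesized token with no matching preceding text yields an empty list.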