Components

List of U-Compare compatible UIMA components

This is a list of UIMA components that are compatible with the U-Compare "comparable" type system. These components are included in the U-Compare single-click-to-launch package. See components for other components.

Abbreviations: UT or U-Tokyo for the University of Tokyo, UM or U-Man for the University of Manchester, CCP for the Center for Computational Pharmacology at the University of Colorado Health Sciences Center.

Collection Readers for Biological Named Entities Annotated Corpora

Name | Provider | Developer | Description
Bio1 Corpus Collection Reader | CCP | Nigel Collier, NII | 200 abstracts and their titles retrieved from MEDLINE papers, with biological named entities annotated.
Biocreative 1a Format Collection Reader | U-Compare | Yoshinobu Kano, U-Compare | Reads Biocreative task 1a format files. You must prepare the corpus files yourself; the Biocreative Task 1a corpus can be obtained after registering on their website.
BioIE Oncology Corpus Collection Reader | CCP | U-Penn | 1157 PubMed abstracts from the oncology domain, with biological named entities annotated.
NLPBA Collection Reader | U-Compare | Jin-Dong Kim et al., NLPBA (GENIA group) | Reads the NLPBA test/training corpus. The training data is 2000 MEDLINE abstracts from the GENIA version 3 corpus, collected using the search terms “human”, “blood cell”, and “transcription factor”; the test data comes from 404 abstracts.
Texas Corpus Collection Reader | CCP | U-Texas | 750 MEDLINE abstracts with protein named entities annotated.
Yapex Reference Corpus Collection Reader | CCP | Swedish Institute of Computer Science | MEDLINE abstracts obtained by posing the query 'protein binding [Mesh term] AND interaction AND molecular' with the parameters 'abstract', 'english', 'human', and 'publication date 1996-2001' to MEDLINE. From this set, 99 abstracts were drawn randomly to form the reference (training) collection, containing 1745 protein names.
Yapex Test Corpus Collection Reader | CCP | Swedish Institute of Computer Science | MEDLINE abstracts obtained by posing the query 'protein binding [Mesh term] AND interaction AND molecular' with the parameters 'abstract', 'english', 'human', and 'publication date 1996-2001' to MEDLINE. From this set, 48 abstracts were drawn randomly to form the test collection, together with 53 abstracts from the GENIA corpus, for a total of 101 abstracts containing 1966 protein names.
GENIACollectionReader | U-Tokyo | U-Tokyo (GENIA Project) | UIMA version in preparation.

Collection Readers for Biological Event Annotated Corpora

Name | Provider | Developer | Description
AImed Collection Reader | U-Tokyo | Bunescu, et al. 2005, U-Texas | Proteins, protein-protein interactions, and sentences are annotated for 225 PubMed abstracts.
Bionlp 09 Shared Task Reader | U-Compare | Yoshinobu Kano, U-Compare | Reads the *.txt, *.a1 and *.a2 files of a BioNLP 09 Shared Task format corpus. If no directory is specified, retrieves six sample documents. You should download the corpus from the BioNLP Shared Task website or prepare your own.
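To make the *.txt / *.a1 pairing concrete, the following is a minimal Python sketch of how text-bound annotations in an .a1 file relate to the raw text. It is not the U-Compare reader's implementation; it assumes the usual standoff layout of a tab-separated ID, a "Type start end" field, and the covered text. The names TextBound and parse_text_bound, and the sample sentence, are hypothetical.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    ident: str  # annotation ID, e.g. "T1"
    label: str  # annotation type, e.g. "Protein"
    start: int  # character offset into the .txt content (inclusive)
    end: int    # character offset into the .txt content (exclusive)
    text: str   # covered text as recorded in the annotation file


def parse_text_bound(lines):
    """Parse text-bound ("T") lines; all other line types are skipped."""
    annotations = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue  # event and other non-text-bound lines are not handled here
        ident, span, text = line.split("\t", 2)
        label, start, end = span.split(" ")
        annotations.append(TextBound(ident, label, int(start), int(end), text))
    return annotations


# Made-up sample data standing in for a .txt file and its .a1 file.
document = "IL-2 activates STAT5."
a1_lines = ["T1\tProtein 0 4\tIL-2", "T2\tProtein 15 20\tSTAT5"]

for ann in parse_text_bound(a1_lines):
    # Applying the offsets to the raw text should reproduce the recorded text.
    assert document[ann.start:ann.end] == ann.text
    print(ann.ident, ann.label, ann.text)
```

The offset check in the loop is a cheap way to detect a mismatched or re-encoded .txt file before feeding a corpus to a reader.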

Collection Readers for Generic Formats

Name | Provider | Developer | Description
BIO Format Collection Reader | U-Compare | Yoshinobu Kano, U-Compare | Reads BIO (or IOB) format annotated files from the specified folder. Mappings from the BIO tags to the UIMA types should be specified.
Input Text Reader | U-Compare | Yoshinobu Kano, U-Compare | Allows interactive text input (typing directly or right-clicking to paste) in the U-Compare GUI.
File System Collection Reader | Apache UIMA | Apache UIMA | Reads raw text from files in the specified directory.
XMI Collection Reader | Apache UIMA | Apache UIMA | Reads UIMA XMI format files in the specified directory.
XMI Single File Reader | Apache UIMA/U-Compare | Apache UIMA/U-Compare | Reads the specified UIMA XMI format file.
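As an illustration of what a mapping from BIO tags to types involves, here is a small Python sketch that turns BIO-tagged tokens into typed spans. It is only a sketch under stated assumptions, not the BIO Format Collection Reader itself: the mapping TAG_TO_TYPE, the function bio_to_spans, and the type names are hypothetical, and spans are expressed as token indices rather than UIMA annotations.

```python
# Hypothetical mapping from BIO tag suffixes to target type names.
TAG_TO_TYPE = {"protein": "Protein", "DNA": "DNA"}


def bio_to_spans(tagged_tokens, tag_to_type):
    """Turn (token, tag) pairs into (type, first_token, end_token_exclusive) spans."""
    spans = []
    current = None  # [type_name, start, end] of the entity being built, if any
    for i, (_, tag) in enumerate(tagged_tokens):
        prefix, _, suffix = tag.partition("-")
        continues = (
            prefix == "I"
            and current is not None
            and current[0] == tag_to_type.get(suffix)
        )
        if continues:
            current[2] = i + 1  # extend the open entity by one token
            continue
        if current is not None:
            spans.append(tuple(current))  # close the open entity
            current = None
        if prefix in ("B", "I") and suffix in tag_to_type:
            # An I- tag with no open entity is treated as a start (lenient IOB).
            current = [tag_to_type[suffix], i, i + 1]
    if current is not None:
        spans.append(tuple(current))
    return spans


tagged = [
    ("The", "O"), ("IL-2", "B-DNA"), ("gene", "I-DNA"),
    ("encodes", "O"), ("interleukin-2", "B-protein"), (".", "O"),
]
print(bio_to_spans(tagged, TAG_TO_TYPE))
# prints [('DNA', 1, 3), ('Protein', 4, 5)]
```

Tags whose suffix is missing from the mapping are silently dropped in this sketch; a stricter variant could raise an error instead.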
XMI, Inline XML, Annotation Printer, BIO

Writers

Name | Provider | Developer | Description
BIO Format Writer Cas Consumer | U-Compare | Yoshinobu Kano, U-Compare | Writes annotations into BIO (or IOB) format files. Mappings from the UIMA types to the BIO tags should be specified. BIO annotations should not overlap in their text positions.
XMI Format Writer Cas Consumer | Apache UIMA/U-Compare | Apache UIMA/U-Compare | Writes annotations into XMI format files.
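The non-overlap requirement follows from the BIO scheme itself: each token carries exactly one tag, so two annotations covering the same token cannot both be written. The Python sketch below shows this under simplifying assumptions (token-index spans, a hypothetical function spans_to_bio); it is not the writer's actual code.

```python
def spans_to_bio(num_tokens, spans):
    """Encode (label, start, end) token spans (end exclusive) as a list of BIO tags."""
    tags = ["O"] * num_tokens
    for label, start, end in sorted(spans, key=lambda s: s[1]):
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span: {label} [{start}, {end})")
        if any(tag != "O" for tag in tags[start:end]):
            # One tag per token: a second annotation on the same token cannot be encoded.
            raise ValueError(f"overlapping annotation: {label} [{start}, {end})")
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


tokens = ["The", "IL-2", "gene", "encodes", "interleukin-2", "."]
tags = spans_to_bio(len(tokens), [("DNA", 1, 3), ("protein", 4, 5)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")

try:
    spans_to_bio(len(tokens), [("DNA", 1, 3), ("protein", 1, 2)])
except ValueError as err:
    print("rejected:", err)
```

Raising an error is one possible policy; keeping the longest or the first annotation would be alternatives, at the cost of silently dropping data.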