List of U-Compare compatible UIMA components
This is a list of UIMA components that are compatible with the U-Compare
"comparable" type system. These components are included in the
U-Compare single-click-to-launch package. See the components page for other components.
Abbreviations: UT or U-Tokyo for the University of Tokyo, UM or U-Man for the University of Manchester, CCP for the Center for Computational Pharmacology
at the University of Colorado Health Sciences Center.
Collection Readers for Biological Named Entities Annotated Corpora
| Name | Provider | Developer | Description |
|---|---|---|---|
| Bio1 Corpus Collection Reader | CCP | Nigel Collier, NII | 200 abstracts and their titles retrieved from MEDLINE papers, with biological named entities. |
| Biocreative 1a Format Collection Reader | U-Compare | Yoshinobu Kano, U-Compare | Reads Biocreative task 1a format files. You must supply the corpus files yourself; the Biocreative Task 1a corpus can be obtained after registering on the Biocreative website. |
| BioIE Oncology Corpus Collection Reader | CCP | U-Penn | 1157 PubMed abstracts from the oncology domain, with biological named entities annotated. |
| NLPBA Collection Reader | U-Compare | Jin-Dong Kim et al., NLPBA (GENIA group) | Reads the NLPBA test/training corpus. The training data is 2000 MEDLINE abstracts of the GENIA version 3 corpus, collected using the search terms “human”, “blood cell”, and “transcription factor”; the test data is from 404 abstracts. |
| Texas Corpus Collection Reader | CCP | U-Texas | 750 MEDLINE abstracts with protein named entities annotated. |
| Yapex Reference Corpus Collection Reader | CCP | Swedish Institute of Computer Science | MEDLINE abstracts obtained by posing the query 'protein binding [Mesh term] AND interaction AND molecular' with the parameters 'abstract', 'english', 'human', and 'publication date 1996-2001' to MEDLINE. From this set 99 abstracts were drawn randomly to form the reference (training) collection, containing 1745 protein names. |
| Yapex Test Corpus Collection Reader | CCP | Swedish Institute of Computer Science | MEDLINE abstracts obtained by posing the query 'protein binding [Mesh term] AND interaction AND molecular' with the parameters 'abstract', 'english', 'human', and 'publication date 1996-2001' to MEDLINE. From this set 48 abstracts were drawn randomly to form the test collection, together with 53 abstracts from the GENIA corpus. In total, the test collection has 101 abstracts (cf. the reference collection above) containing 1966 protein names. |
| GENIACollectionReader | U-Tokyo | U-Tokyo (GENIA Project) | UIMA version in preparation. |
Collection Readers for Biological Event Annotated Corpora
| Name | Provider | Developer | Description |
|---|---|---|---|
| AImed Collection Reader | U-Tokyo | Bunescu, et al. 2005, U-Texas | Proteins, protein-protein interactions, and sentences are annotated for 225 PubMed abstracts. |
| Bionlp 09 Shared Task Reader | U-Compare | Yoshinobu Kano, U-Compare | Reads the *.txt, *.a1 and *.a2 files of a BioNLP 09 Shared Task format corpus. If no directory is specified, six sample documents are retrieved. You should download the corpus from the BioNLP Shared Task website or prepare your own. |
Collection Readers for Generic Formats
| Name | Provider | Developer | Description |
|---|---|---|---|
| BIO Format Collection Reader | U-Compare | Yoshinobu Kano, U-Compare | Reads BIO (or IOB) format annotated files from the specified folder. Mappings from the BIO tags to the UIMA types must be specified. |
| Input Text Reader | U-Compare | Yoshinobu Kano, U-Compare | Allows interactive text input (typing directly or right-click to paste) in the U-Compare GUI. |
| File System Collection Reader | Apache UIMA | Apache UIMA | Reads raw texts from files in the specified directory. |
| XMI Collection Reader | Apache UIMA | Apache UIMA | Reads UIMA XMI format files in the specified directory. |
| XMI Single File Reader | Apache UIMA/U-Compare | Apache UIMA/U-Compare | Reads the specified UIMA XMI format file. |
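
The BIO Format Collection Reader listed above needs a mapping from BIO tags to UIMA types. As a rough illustration of what such a mapping does, and not the reader's actual implementation, the Python sketch below turns a BIO-tagged token sequence into typed spans. The function name `bio_to_spans`, the type names in `TAG_TO_TYPE`, and the use of token indices instead of character offsets are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from BIO labels to UIMA type names.
# The real names depend on the type system in use.
TAG_TO_TYPE: Dict[str, str] = {
    "protein": "example.types.Protein",
    "cell_type": "example.types.CellType",
}


def bio_to_spans(
    tagged_tokens: List[Tuple[str, str]],
    tag_to_type: Dict[str, str],
) -> List[Tuple[str, int, int]]:
    """Convert (token, BIO tag) pairs into (type name, start, end) spans.

    start and end are token indices, with end exclusive. Labels that are
    missing from tag_to_type are ignored.
    """
    spans: List[Tuple[str, int, int]] = []
    open_label = None  # label of the span currently being built, if any
    start = 0
    for index, (_token, tag) in enumerate(tagged_tokens):
        prefix, _, label = tag.partition("-")
        # An I- tag with the same label extends the open span.
        if prefix == "I" and label == open_label:
            continue
        # Any other tag closes the span that is currently open.
        if open_label is not None:
            spans.append((tag_to_type[open_label], start, index))
            open_label = None
        # B- starts a new span; a stray I- is treated like B-.
        if prefix in ("B", "I") and label in tag_to_type:
            open_label, start = label, index
    if open_label is not None:
        spans.append((tag_to_type[open_label], start, len(tagged_tokens)))
    return spans


example = [
    ("IL-2", "B-protein"), ("gene", "I-protein"),
    ("expression", "O"), ("in", "O"),
    ("T", "B-cell_type"), ("cells", "I-cell_type"),
]
print(bio_to_spans(example, TAG_TO_TYPE))
```

A real reader would record character offsets, since UIMA annotations are anchored by begin and end positions in the document text; token indices are used here only to keep the sketch short.
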
Writers
| Name | Provider | Developer | Description |
|---|---|---|---|
| BIO Format Writer Cas Consumer | U-Compare | Yoshinobu Kano, U-Compare | Writes annotations into BIO (or IOB) format files. Mappings from the UIMA types to the BIO tags must be specified. Annotations written as BIO tags must not overlap in their text positions. |
| XMI Format Writer Cas Consumer | Apache UIMA/U-Compare | Apache UIMA/U-Compare | Writes annotations into XMI format files. |
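
The non-overlap requirement of the BIO Format Writer follows from the format itself: each token carries exactly one tag, so two annotations cannot cover the same token. The Python sketch below illustrates this under simplifying assumptions and is not the writer's actual code. The function name `spans_to_bio`, the type names in `TYPE_TO_TAG`, and the token-index spans are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from UIMA type names to BIO labels.
TYPE_TO_TAG: Dict[str, str] = {
    "example.types.Protein": "protein",
    "example.types.CellType": "cell_type",
}


def spans_to_bio(
    n_tokens: int,
    spans: List[Tuple[str, int, int]],
    type_to_tag: Dict[str, str],
) -> List[str]:
    """Convert (type name, start, end) token spans into one BIO tag per token.

    end is exclusive. Raises ValueError if a span is out of range or if two
    spans overlap, because BIO allows only one tag per token.
    """
    tags = ["O"] * n_tokens
    for type_name, start, end in sorted(spans, key=lambda s: s[1]):
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(type_name, start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(type_name, start, end)}")
        label = type_to_tag[type_name]
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


spans = [("example.types.Protein", 0, 2), ("example.types.CellType", 4, 6)]
print(spans_to_bio(6, spans, TYPE_TO_TAG))
```

Rejecting overlaps with an error is one possible design choice for the sketch; a writer could equally drop or truncate the conflicting annotation, and the source does not say which behaviour the component uses.
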