Sections:
- Common Requirements
- UCLoader Options
- U-Compare Simple Stand-Off Format
- Editing Workflow Descriptor
- Calling U-Compare GUI Generated Workflows
Command Line Mode without U-Compare GUIs
UCLoader is our default launcher system: UCLoader.class is launched from the command line, while UCLoaderLauncher is intended for GUI-based startup. Please refer to the Launch U-Compare page for details.
UCLoader can also be used as a pure command line tool with specific options. This page describes that usage.
Common Requirements
Java 6 must be installed, and U-Compare should have been launched by UCLoader at least once. An Internet connection is required when you use web service components, and when you launch UCLoader for the first time.
Please download UCLoader.class from here.
UCLoader Options
Assuming that you saved UCLoader.class in the current directory, run:
java -cp . -XmsXXXm -XmxXXXm -Djavaws.workflow.path="path/to/yourworkflow.xml" UCLoader --jnlp http://u-compare.org/lib/u-compare-runworkflow.jnlp
"-cp" is the Java VM option that specifies your classpath (in this case the current directory "." is specified). UCLoader.class must be on your classpath.
"-Xms" and "-Xmx" are the Java VM options that specify the initial and maximum amount of heap memory to allocate.
Set the "-Djavaws.workflow.path" value to the location of your workflow descriptor file (see details below). The file should be on the classpath, and the location should be given relative to the classpath root.
This launcher runs any UIMA CPE workflow descriptor. All of the required resources should be on your classpath, but you don't have to include the U-Compare predefined resources. Before the workflow is launched, all of the U-Compare resources are added to your classpath, so the environment is the same as, and shared with, the GUI mode.
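If you drive UCLoader from a script rather than typing the command by hand, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of U-Compare: the function names build_ucloader_command and start_ucloader are hypothetical, and it assumes that "java" is on your PATH and that UCLoader.class sits in the classpath directory you pass in.

```python
import subprocess

JNLP_URL = "http://u-compare.org/lib/u-compare-runworkflow.jnlp"


def build_ucloader_command(workflow_path, xms_mb, xmx_mb, classpath="."):
    """Return the UCLoader command line as an argument list (hypothetical helper)."""
    return [
        "java",
        "-cp", classpath,                           # must contain UCLoader.class
        f"-Xms{xms_mb}m",                           # initial heap size in megabytes
        f"-Xmx{xmx_mb}m",                           # maximum heap size in megabytes
        f"-Djavaws.workflow.path={workflow_path}",  # relative to the classpath root
        "UCLoader",
        "--jnlp", JNLP_URL,
    ]


def start_ucloader(workflow_path, xms_mb, xmx_mb):
    """Start UCLoader with pipes attached to its standard input and output."""
    return subprocess.Popen(
        build_ucloader_command(workflow_path, xms_mb, xmx_mb),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Passing the arguments as a list avoids shell quoting problems around the workflow path.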
Example Workflow
Please download the example workflow descriptor from here. Save this file to the directory where UCLoader.class is stored.
Then run:
java -cp . -Xms400m -Xmx800m -Djavaws.workflow.path="StdinProteinTagging-CPE.xml" UCLoader --jnlp http://u-compare.org/lib/u-compare-runworkflow.jnlp
After initialization has finished, enter English sentences and press Enter. You will get the annotations produced by a sentence splitter and a protein mention tagger on the standard output stream, shown in your console.
Note that this type-a-line-then-Enter input is only a convenience; the formal format is described below.
U-Compare Simple Stand-Off Format
Since this option runs any UIMA CPE workflow, it is up to your workflow what happens - your components may create GUI windows, network connections, etc.
However, for the developer's convenience, we prepared I/O components which take input from the standard input stream and output all of the generated annotations to the standard output stream. The above example workflow uses these components.
The input format is defined as:
[byte_length_of_rawtext]\n[rawtext]\n[annotations]\n\n
Example:
21
This is the raw text.
annotations...
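The record layout can be sketched in Python as follows. This is an illustration only: the function name build_record is hypothetical, and the use of UTF-8 is an assumption, since the format above states a byte length but does not name an encoding. The sketch follows the template literally, so the annotation lines are joined by newlines and followed by the two closing newlines.

```python
def build_record(raw_text, annotation_lines=()):
    """Encode one input record: byte length, raw text, annotations, blank line."""
    raw_bytes = raw_text.encode("utf-8")  # assumption: UTF-8 encoding
    annotations = "\n".join(annotation_lines).encode("utf-8")
    header = str(len(raw_bytes)).encode("ascii")  # length in bytes, not characters
    return header + b"\n" + raw_bytes + b"\n" + annotations + b"\n\n"
```

Note that the length is counted on the encoded bytes, so for non-ASCII text it differs from the number of characters.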
The output format is the same as the annotations part of the input.
The annotations part is defined as follows:
- The annotations part is a sequence of lines terminated by a blank line. Each line represents a single annotation.
- Each annotation consists of fields separated by white space.
- Legend: begin end typename uniqueID [featureName="value"|featureName=referredUniqueID]*
e.g.
22 26 jp.ac.u_tokyo.is.s.www_tsujii.bio.Protein id="gene_prod3" category="ion channel"
32 46 .Protein id="gene_prod82" category="ion channel"
0 0 jp.ac.u_tokyo.is.s.www_tsujii.bio.Interaction p1="gene_prod3" p2="gene_prod82"
Notes:
- begin and end are integer offsets. Both are 0 for annotations that do not span text. Offsets are counted in characters.
- typename is the full package name of the annotation type as defined in UIMA. A name that starts with a dot (.) is an abbreviation for the default package (jp.ac.u_tokyo.is.s.www_tsujii).
- uniqueID is a string made by concatenating one or more alphabetic letters with an integer that is unique for that letter prefix. The default prefix is "u".
- featureName and its value type should match the UIMA definition of the annotation type. Literal values should be enclosed in double quotes and escaped in the XML manner. A reference to another annotation should not be quoted; its value is that annotation's unique ID.
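To read the output from a script, each annotation line can be split into its fields. The following Python sketch is one possible reading of the legend and notes above, not a reference implementation: the function name parse_annotation and the returned dictionary layout are hypothetical. It treats the uniqueID field as optional, since the sample lines above do not show one, and it assumes that quoted values contain no raw double quote because they are XML-escaped.

```python
import re
from xml.sax.saxutils import unescape

DEFAULT_PACKAGE = "jp.ac.u_tokyo.is.s.www_tsujii"

# begin, end, typename, then an optional uniqueID such as "u12".
_HEAD = re.compile(r'\s*(\d+)\s+(\d+)\s+(\S+)(?:\s+([A-Za-z]+\d+)(?=\s|$))?')
# featureName="literal" or featureName=referredUniqueID
_FEATURE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_annotation(line):
    """Parse one annotation line into a dict (hypothetical layout)."""
    head = _HEAD.match(line)
    if head is None:
        raise ValueError(f"not an annotation line: {line!r}")
    begin, end, typename, unique_id = head.groups()
    if typename.startswith("."):
        typename = DEFAULT_PACKAGE + typename  # expand the leading-dot abbreviation
    literals, references = {}, {}
    for match in _FEATURE.finditer(line, head.end()):
        name, quoted, bare = match.groups()
        if quoted is not None:
            # Quoted values are literals, escaped in the XML manner.
            literals[name] = unescape(quoted, {"&quot;": '"', "&apos;": "'"})
        else:
            # Unquoted values are unique IDs of other annotations.
            references[name] = bare
    return {
        "begin": int(begin),
        "end": int(end),
        "type": typename,
        "id": unique_id,
        "literals": literals,
        "references": references,
    }
```

Literal features and references are kept in separate dictionaries so that a caller can resolve the references against the unique IDs once all lines of a record have been read.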
Editing Workflow Descriptor
The workflow descriptor is a UIMA CPE (Collection Processing Engine) descriptor, whose format is fully described in the official Apache UIMA documentation.
That said, it is simple to edit the workflow to change which components are called.
Please open the example workflow descriptor above in your text editor.
You will see <casProcessor> XML tags. Each of them corresponds to an analysis engine processor, and the processors are called in the order in which they appear in this XML file.
If you want to delete one of the processors, just delete the whole corresponding <casProcessor> block.
If you want to insert a new processor, copy one of the <casProcessor> blocks to the proper place, then change the name field of the <import> tag in that <casProcessor>.
This field specifies the location of the component descriptor file: the path from the classpath root, joined by dots (.), without the .xml suffix.
If you want to use the U-Compare predefined components, it is enough to specify this field. Please refer to the next section, Using U-Compare Components from your UIMA workflow, for how to get the locations of the U-Compare components.
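Before editing a descriptor by hand, it can help to check which processors it currently calls and what name value a new component would need. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the component names in the usage are invented, and it relies only on what is described above, namely that each <casProcessor> contains an <import> tag with a name field. Tags are matched by local name so that an XML namespace on the descriptor does not matter.

```python
import xml.etree.ElementTree as ET


def import_name_from_path(descriptor_path):
    """Turn a classpath-relative path such as 'a/b/C.xml' into 'a.b.C'."""
    if descriptor_path.endswith(".xml"):
        descriptor_path = descriptor_path[:-len(".xml")]
    return descriptor_path.strip("/").replace("/", ".")


def _local(tag):
    """Strip an XML namespace such as '{uri}casProcessor' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def list_processor_imports(cpe_xml):
    """Return the <import> name of each <casProcessor>, in document order."""
    root = ET.fromstring(cpe_xml)
    names = []
    for element in root.iter():
        if _local(element.tag) == "casProcessor":
            for child in element.iter():
                if _local(child.tag) == "import" and child.get("name"):
                    names.append(child.get("name"))
    return names
```

For example, a hypothetical descriptor stored at org/example/MyTagger.xml under the classpath root would be referenced as org.example.MyTagger.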
Calling U-Compare GUI Generated Workflows
Upcoming.
