
  Previous: Creating a Workflow Index: User Guide Next: Comparison Workflows


Editing a workflow

The workflow created in the previous section, which consists of only two components, is hardly representative of the complexity of typical UIMA NLP workflows. In this section we will add a few more components and customize their parameters.

Deleting Components

We will continue working on the workflow from the previous section, so if for any reason it is no longer present in the workflow viewer, reload the saved version from the workflow menu.

As mentioned earlier, the ``Annotation Viewer'' component in this workflow doesn't add any functionality that we can't get from the show button in the annotation statistics tab of the Session Manager. It's also likely to get in the way of the workflow we want to build, so let's remove it. To do so, click the red X in its top right-hand corner.

Image removebutton

Once you click the X, the component should disappear, leaving a workflow that contains only the AIMED reader. The ``Analysis Engines and CAS Consumers'' box should now be empty, and we can build our new workflow there.

Input and Output types

The workflow we are going to build takes its input documents from the AIMED reader and ultimately writes named entity recognition (NER) annotated output to a file on the file system. The named entity recognizer we are going to use is called ``Genia Tagger'' and can be found in the English : Named Entity Recognizers directory of the component library.

Image genia

The component library shows this component as requiring two types of input annotation: Token and Sentence. We already know from the previous workflow that the AIMED reader produces Sentence annotations, but we will need another component to produce the Token annotations. For this, almost any tokenizer component will suffice; I chose ``UIMA Tokenizer''.

Image uimatokenizer

This component requires Sentence annotations, which we already have, and outputs the Token annotations the tagger needs. The final component we need is one that prints the output to a file. The CAS Consumers folder in the component library provides several options; I am going to use ``Annotation Printer'', but feel free to experiment with others.

Image annotationprinter
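The type reasoning above (reader produces Sentence, tokenizer turns Sentence into Token, tagger needs both) can be checked mechanically. The following is a minimal sketch that models each component only by the input and output types shown in the component library; the `Component` class and `missing_inputs` function are hypothetical and not part of U-Compare, the tagger's output type name `NamedEntity` is an assumption, and the reader is simplified to Sentence output only. Note that U-Compare itself does not enforce this (the Inputs and Outputs lists are mostly documentation), so treat it as a way to reason about component order.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Component(NamedTuple):
    """Hypothetical stand-in for a component's declared input/output types."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


def missing_inputs(workflow: List[Component]) -> Dict[str, Set[str]]:
    """Map each component to the required types no earlier component produces."""
    available: Set[str] = set()
    problems: Dict[str, Set[str]] = {}
    for component in workflow:
        missing = set(component.inputs) - available
        if missing:
            problems[component.name] = missing
        # Types produced here become available to every later component.
        available |= component.outputs
    return problems


reader = Component("AIMED reader", frozenset(), frozenset({"Sentence"}))
tokenizer = Component("UIMA Tokenizer", frozenset({"Sentence"}), frozenset({"Token"}))
# Assumption: "NamedEntity" is a placeholder name for the tagger's output type.
tagger = Component("Genia Tagger", frozenset({"Token", "Sentence"}), frozenset({"NamedEntity"}))
# The guide does not list the printer's inputs, so none are declared here.
printer = Component("Annotation Printer", frozenset(), frozenset())

print(missing_inputs([reader, tokenizer, tagger, printer]))  # {}
print(missing_inputs([reader, tagger, tokenizer, printer]))  # {'Genia Tagger': {'Token'}}
```

The second call shows why order matters: placing the tagger before the tokenizer leaves its Token input unsatisfied.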

Assembling the Workflow

This time we are going to add three components to the ``Analysis Engines and CAS Consumers'' box of the workflow viewer instead of just one, so the order of the components now matters. Begin by dragging and dropping the tokenizer component into the box.

Now drag the ``Genia Tagger'' component over the tokenizer, but don't release it. When the mouse pointer is over the top half of the tokenizer component, a gray line appears immediately above it; when the pointer is over the bottom half, the line appears below it. This gray line, indicated by the red arrow in the following image, shows the position at which the new component will be inserted.

Image secondcomponent

Move the tagger over the bottom half of the tokenizer and drop it to add it to the end of the workflow. Then drag and drop the ``Annotation Printer'' into place after the ``Genia Tagger''. Don't worry if you add a component in the wrong place: components already in the workflow can still be dragged and dropped into the correct position.
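The gray-line rule described above amounts to a simple insertion-index calculation. This is a hypothetical model of that behavior, not U-Compare code: each component box is assumed to be a `(top, height)` pair in screen pixels, and appending when the pointer is over no component at all is an assumption the guide does not state.

```python
from typing import List, Sequence, Tuple

# Each box is (top, height) in screen pixels, listed in workflow order.
Box = Tuple[float, float]


def drop_index(pointer_y: float, boxes: Sequence[Box]) -> int:
    """Index at which a dropped component is inserted, following the gray-line rule."""
    for i, (top, height) in enumerate(boxes):
        if top <= pointer_y < top + height:
            # Top half: gray line above this component; bottom half: below it.
            return i if pointer_y < top + height / 2 else i + 1
    # Assumption: a drop that is not over any component appends to the end.
    return len(boxes)


def insert_component(workflow: List[str], component: str,
                     pointer_y: float, boxes: Sequence[Box]) -> List[str]:
    """Return a new workflow list with the component inserted at the drop position."""
    updated = list(workflow)
    updated.insert(drop_index(pointer_y, boxes), component)
    return updated


# The tokenizer occupies y = 0..100; dropping at y = 80 (bottom half) appends the tagger.
print(insert_component(["UIMA Tokenizer"], "Genia Tagger", 80, [(0, 100)]))
# ['UIMA Tokenizer', 'Genia Tagger']
```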

Once all the components are in the right order, click the play button to make sure the workflow runs as expected. The Session Manager should display results as in previous examples, and because the workflow now includes a CAS Consumer, a file should also be created on your hard drive. Its location depends on the ``Output Directory'' shown in the CAS Consumer's configuration parameters box in the workflow viewer. For ``Annotation Printer'' this is temp-uima-output/annotprint/annotations.txt, located under the U-Compare folder in the user's home directory. Locate this file and confirm that it exists.
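If you prefer to check for the file from a script rather than a file browser, the following sketch builds the default path given above and reports whether the file exists. The folder name `U-Compare` is an assumption based on the phrase "the U-Compare folder"; adjust the `U_COMPARE_DIR` constant if your installation names it differently.

```python
from pathlib import Path
from typing import Optional

# Assumption: the "U-Compare folder" is literally named "U-Compare";
# change this if your installation uses a different folder name.
U_COMPARE_DIR = "U-Compare"


def annotation_printer_output(home: Optional[Path] = None) -> Path:
    """Default Annotation Printer output path described in this guide."""
    base = Path.home() if home is None else home
    return base / U_COMPARE_DIR / "temp-uima-output" / "annotprint" / "annotations.txt"


if __name__ == "__main__":
    output = annotation_printer_output()
    if output.is_file():
        print(f"Found {output} ({output.stat().st_size} bytes)")
    else:
        print(f"Not found: {output}; check the Output Directory parameter")
```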

Component Parameters

The default save location of the CAS Consumer may not be the most convenient one for us, so let's change it. To do this we need to edit the default parameters of the CAS Consumer component. Begin by clicking the orange spanner icon in the component's top right-hand corner.

Image editbutton

This displays the edit view of the component in the workflow viewer. In this view we can change not only the component's parameters but also its name, description, and input and output types. For the ``Annotation Printer'' component, this view looks as follows.

Image annotationprinteredit

At the top are buttons to confirm or cancel any changes we make to the component, as well as a drop-down menu that gives access to any saved parameter configurations. Below these is the edit panel, the top part of which contains editable title and description fields.

Below the description is the ``Inputs'' box. Values listed in a component's ``Inputs'' (and ``Outputs'') box generally serve only as documentation and don't affect the component's behavior. They are, however, used to determine the relevant parts of the type system when setting ``Outputs to Compare'' for parallel aggregate components. This will be explained later.

The setting we want to change is located below all of these, in the ``Configuration Parameters'' panel.

Image configparams

To change this setting, either type the desired output location directly into the input box or use the ``Browse'' button. Once you have entered the new location, save your changes by clicking the ``Confirm Changes'' button.

Image confirmchanges

U-Compare should return to the workflow level view. Once there, run the workflow and confirm that the output file is saved to the new location.

Configurations

Once you find the file, its contents reveal another problem with the current workflow: it includes many of the gold standard annotations produced by the collection reader, which aren't actually used in the workflow. Let's configure the collection reader so that these annotations aren't produced. Start by opening the edit view of the AIMED reader.

Image editview

The AIMED reader offers far more customizable parameters than the CAS Consumer did. The parameters we want to change are ``Generate Protein Annotations'' and ``Generate Ppi Annotations''; clear the check boxes next to both. While we are here, let's also configure the collection reader to read several documents at a time. Change the ``Number of Articles'' field to 5, which causes the collection reader to read five documents for processing each time we run the workflow. When you have finished making these changes, the parameters should look as follows.

Image aimedafter

At this point we could simply confirm the changes and run the workflow as before, but first let's save these settings as a configuration so that we can reuse them in the future. To do this, click the ``Configuration'' drop-down menu and select ``Save config as''.

Image saveconfigas

You will be prompted for a name for the configuration; I chose ``Sentence Only''. Click the drop-down menu again and you will see your configuration listed.

Image configsaved

Every instance of the AIMED collection reader in any workflow you create will now offer this option in the configuration drop-down menu. This may not seem like much of a time saver, but it is particularly useful for aggregate components, because it recursively saves the parameter settings of all descendant components (as well as the choice of descendant components itself). The red X next to each configuration name in the drop-down menu deletes that configuration.
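To make the recursive behavior concrete, here is a hypothetical model of what a saved configuration captures: a snapshot of a component's parameters plus, recursively, a snapshot of each child component, so the list of children (the choice of descendants) is recorded too. `ComponentNode`, `save_config`, and the aggregate named "Example aggregate" are illustrative names only; this is not U-Compare's storage format. The AIMED reader values are the ones set in this section.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ComponentNode:
    """Hypothetical model of a component: its parameters plus any child components."""
    name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    children: List["ComponentNode"] = field(default_factory=list)


def save_config(node: ComponentNode) -> Dict[str, Any]:
    """Snapshot parameters recursively, including which descendants are present."""
    return {
        "component": node.name,
        # Deep copy so later edits to the component don't alter the saved snapshot.
        "parameters": copy.deepcopy(node.parameters),
        "children": [save_config(child) for child in node.children],
    }


aimed = ComponentNode("AIMED reader", {
    "Generate Protein Annotations": False,
    "Generate Ppi Annotations": False,
    "Number of Articles": 5,
})
saved_configs = {"Sentence Only": save_config(aimed)}

# For an aggregate, the snapshot also records which components it contains.
aggregate = ComponentNode("Example aggregate",
                          children=[ComponentNode("UIMA Tokenizer"), ComponentNode("Genia Tagger")])
print([child["component"] for child in save_config(aggregate)["children"]])
# ['UIMA Tokenizer', 'Genia Tagger']
```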

Now that we have saved the configuration, confirm the changes and run the workflow to make sure that everything worked as expected.

Section Summary

In this section we have seen how to construct a more realistic workflow from a greater number of components and how to configure the behavior of the components we used. We have yet to see how to create U-Compare comparison workflows or how to use aggregate components. These topics will be covered in the next section.
