U-Compare: Advanced Biological Text Analytics Using Workflows
Text mining groups at the University of Tokyo and the University of Colorado School of Medicine, in collaboration with the National Centre for Text Mining at the University of Manchester, have recently developed a powerful workflow-based text analytics tool called U-Compare, which is freely available online. U-Compare is built on top of UIMA (Unstructured Information Management Architecture) and lets users build complex NLP workflows through an easy-to-use visual programming interface. In U-Compare, NLP workflows are created by dragging and dropping the available UIMA components.
UIMA provides an open source framework, component collection and infrastructure for unstructured information analysis and search. UIMA originated at IBM and is now incubated at the Apache Software Foundation, while its specification standardization effort is hosted at OASIS. It was designed to be an industrial-strength platform, and several IBM applications, such as OmniFind, are based on it.
Further, UIMA enables applications to be decomposed into components, for example “language identification” => “language specific segmentation” => “sentence boundary detection” => “entity detection (person/place names etc.)”. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages. UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
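The component decomposition described above can be illustrated with a small sketch. This is not the UIMA API: it is a plain Python toy in which the names `Annotation`, `Document`, `run_pipeline` and the three component functions are all hypothetical, and the detection rules are deliberately naive. It only shows the idea that each component reads the annotations left by earlier ones and adds its own to a shared document.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "sentence", "token", "entity"
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span


@dataclass
class Document:
    """Hypothetical shared container passed from component to component."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]


def sentence_detector(doc):
    # Naive rule (an assumption): a sentence runs up to the next . ! or ?
    for m in re.finditer(r"\S[^.!?]*[.!?]?", doc.text):
        doc.annotations.append(Annotation("sentence", m.start(), m.end()))


def tokenizer(doc):
    # Depends on the sentence annotations produced upstream.
    for s in doc.select("sentence"):
        for m in re.finditer(r"\w+|[^\w\s]", doc.text[s.begin:s.end]):
            doc.annotations.append(
                Annotation("token", s.begin + m.start(), s.begin + m.end()))


def entity_detector(doc):
    # Toy rule (an assumption): all-caps alphanumeric tokens are entities.
    for t in doc.select("token"):
        if re.fullmatch(r"[A-Z][A-Z0-9]+", doc.text[t.begin:t.end]):
            doc.annotations.append(Annotation("entity", t.begin, t.end))


def run_pipeline(text, components):
    """Plays the role of the framework: runs the components in order."""
    doc = Document(text)
    for component in components:
        component(doc)
    return doc


doc = run_pipeline("BRCA1 binds RAD51. It repairs DNA.",
                   [sentence_detector, tokenizer, entity_detector])
print([doc.text[a.begin:a.end] for a in doc.select("entity")])
```

Because every component talks only to the shared document, any stage can be swapped for another implementation without touching the rest of the chain, which is the property a workflow builder relies on.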
According to its developers, U-Compare contains the largest collection of UIMA components. There are seven functional component classes: collection readers, sentence detectors, tokenizers, POS taggers, syntactic parsers, relation extractors and named entity recognizers. Each of these classes offers several components; for example, in the named entity recognizer class users can choose among options such as GENIA Tagger, LingPipe Entity Tagger and OpenNLP. In addition, U-Compare provides a special parallel flow component, which can be used to build comparison workflows that compare the outputs of different tool and corpus combinations.
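The idea behind a comparison workflow can be sketched in a few lines. The example below is a minimal illustration and not U-Compare's implementation: `tagger_a`, `tagger_b`, `run_parallel` and `compare_spans` are hypothetical names, the two taggers are toy stand-ins for real named entity recognizers, and exact span matching is assumed as the comparison criterion.

```python
import re


def tagger_a(text):
    # Toy tagger (an assumption): all-caps alphanumeric words are entities.
    return {(m.start(), m.end())
            for m in re.finditer(r"\b[A-Z][A-Z0-9]+\b", text)}


def tagger_b(text, lexicon=("BRCA1", "p53")):
    # Toy tagger (an assumption): dictionary lookup against a tiny lexicon.
    return {(m.start(), m.end())
            for term in lexicon
            for m in re.finditer(re.escape(term), text)}


def run_parallel(text, taggers):
    """Run every tagger on the same input, as a parallel flow would."""
    return {name: tagger(text) for name, tagger in taggers.items()}


def compare_spans(reference, predicted):
    """Exact-match precision, recall and F1 between two sets of spans."""
    matched = reference & predicted
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


outputs = run_parallel("BRCA1 and p53 interact with RAD51.",
                       {"a": tagger_a, "b": tagger_b})
print(compare_spans(outputs["a"], outputs["b"]))
```

Here the output of one tool is treated as the reference for the other; in a tool-versus-corpus comparison the reference set would come from gold-standard annotations instead.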
Apart from functional components, the current version of U-Compare includes two visualizer components: Annotation Viewer and MoriV. Annotation Viewer can be used to display annotations generated at any point in the workflow, while MoriV is a tree structure visualizer.
U-Compare also provides a developer API, which includes the official UIMA Java/C++ APIs along with a simpler interface that allows developers to access a UIMA workflow via the standard I/O streams or via stored files. U-Compare is available as a Java Web Start application.
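To give a feel for what a stream-based tool looks like, here is a rough sketch of a whitespace tokenizer that reads text from one stream and writes stand-off annotations to another. The tab-separated "begin, end, text" output format and the names `process_stream` and `main` are assumptions made for illustration; the actual exchange format used by U-Compare is not described here.

```python
import re
import sys


def process_stream(infile, outfile):
    """Read text line by line and write one 'begin<TAB>end<TAB>token' row
    per whitespace-delimited token, with offsets relative to the whole input.
    Returns the number of tokens written."""
    offset = 0  # characters consumed from earlier lines
    count = 0
    for line in infile:
        for m in re.finditer(r"\S+", line):
            outfile.write(
                f"{offset + m.start()}\t{offset + m.end()}\t{m.group()}\n")
            count += 1
        offset += len(line)
    return count


def main():
    # Wire the tool to the standard streams so a workflow can pipe text in
    # and collect annotations out.
    process_stream(sys.stdin, sys.stdout)
```

Calling `main()` from a script entry point turns the function into a command-line filter, so a tool written in any language could take part in a workflow as long as it honours the agreed stream format.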
Reference:
Kano, Y., Baumgartner, W., McCrohon, L., Ananiadou, S., Cohen, K., Hunter, L., & Tsujii, J. (2009). U-Compare: share and compare text mining tools with UIMA. Bioinformatics. DOI: 10.1093/bioinformatics/btp289



