ConverSpeech: Text data mining for model organism curation

The Customer

ConverSpeech LLC, a provider of text data mining software to life science institutions.


Scientific literature contains the most trusted information on gene function. Model organism databases (for budding yeast, grains, and the fruit fly, for example) serve as repositories of information obtained from that literature. Highly educated biologists read the scientific articles, identify the important gene-related discoveries, and create database entries that describe the evidence given in the articles: for example, whether a scientific claim is supported by experimentation or inferred from electronic annotation. The number and importance of these discoveries have created a pressing need for efficient ways to access full-text articles and identify the relevant information in them.

The Application

In collaboration with model organism database curators, and with support from the National Human Genome Research Institute of the National Institutes of Health, we set out to build a system that analyzes full-text articles and shows curators where in each article there is likely to be evidence for a new scientific claim about a gene's function.


We needed software that could extract the text from a PDF file for analysis and annotation by our system, and then generate a PDF file from that annotated text. In this way, curators could see highlighted text indicating exactly where in the article a gene-related discovery may be reported. Highlighting would also reveal the experimental evidence given in support of the discovery.

The Big Faceless Solution

If the results of our analysis were not rendered within the PDF version of the scientific article, database curators would have to read the article and then consult a separate display or printout of sentences extracted from it without their context. Without additional highlighting of the passages that offer evidence for a gene-related discovery, curators would also have to search through the article again to find that evidence. By letting us render our results directly in the article, Big Faceless software allows us to bring efficiency and accuracy to an important scientific activity.

Colleen E. Crangle
Founder of ConverSpeech
ConverSpeech, LLC