New gene curation software created for genetic biologists

University of Illinois researchers have created a software curation tool for genetic biologists, and with it a new approach to searching for information. They call it the BeeSpace Navigator.

The project began as a collaboration between researchers at the Institute for Genomic Biology and the department of computer science, led by Bruce Schatz, professor and head of medical information science at the U. of I. The team described the software and its applications in the web server issue of the journal Nucleic Acids Research.

When biologists are looking for information about a certain gene or its function, they go to curators. Curators maintain large amounts of information from academic papers and scientific studies. A curator can extract as much information as possible from the papers in his or her collection and provide the biologist with a thorough summary of what’s known about the gene – its location, function, sequence, regulation and more.

Curators do this by placing the information into an online database such as FlyBase.

“The question was, could you make an automatic version of that, which is accurate enough to be helpful?” Schatz said.

Schatz and his collaborators built BeeSpace Navigator, free online software that draws on databases of scholarly publications. The semantic indexing that supports the automatic curation was performed on the Cloud Computing Testbed, a national computing datacenter hosted at the U. of I.

Initially BeeSpace was built around literature about the bee genome, but it has gradually been expanded to the entire Medline database and has been used to study many insects as well as mice, pigs and fish.

BeeSpace Navigator’s main strength is its targeted searching. A broad, generic search of all the compiled data would return a disorganized set of results – similar to the millions of hits a Google search would generate. But with BeeSpace, users create “spaces,” or special collections of literature to search. The software can also take a large collection of articles on a topic and automatically partition it into subsets based on which words occur together, a function called clustering.
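To make the idea of clustering concrete, here is a minimal Python sketch that groups articles by the words they share. It is an illustration only, not BeeSpace's actual algorithm: the greedy "leader" strategy, the Jaccard overlap measure, the 0.3 threshold and the sample articles are all assumptions chosen for brevity.

```python
def jaccard(a, b):
    """Fraction of words two word sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_articles(articles, threshold=0.3):
    """Greedy 'leader' clustering (an assumed, simplified technique).

    Each article joins the first cluster whose founding article shares
    enough words with it; otherwise it founds a new cluster.
    """
    clusters = []  # list of (leader_words, [titles])
    for title, text in articles.items():
        words = set(text.lower().split())
        for leader_words, members in clusters:
            if jaccard(words, leader_words) >= threshold:
                members.append(title)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append((words, [title]))
    return [members for _, members in clusters]


# Hypothetical article keywords, invented for this example.
articles = {
    "paper_a": "foraging behavior honeybee gene expression",
    "paper_b": "honeybee foraging gene regulation behavior",
    "paper_c": "wing development fruit fly embryo",
    "paper_d": "fruit fly embryo wing patterning",
}

clusters = cluster_articles(articles)
print(clusters)
```

With this sample data, the two foraging papers land in one subset and the two wing-development papers in another, because each pair shares most of its words.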

“The first thing you have to do if you have something that’s simulating a curator is to decide what papers it’s going to look at,” Schatz said. “Then you have to decide what to extract from the text, and then what you’re going to do with what you’ve extracted, what service you’re going to provide. The system is designed to have easy ways of doing that.”

The user-friendly interface lets biologists build a unique space in a few simple steps, using sub-searches and filters.

For instance, an entomologist interested in the genetic basis for foraging as a social behavior in bees would start with insect literature, and then narrow it down to genes that are associated in the literature with both foraging and social behavior. This is a specific intersection of topics that typical search engines could not handle.
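The foraging example can be sketched in a few lines of Python. This is a hypothetical illustration of the filter-then-intersect idea, not the navigator's real interface: the `Article` record, the function names, the topic labels and the placeholder gene names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    topics: frozenset  # topic labels attached to the article (assumed)
    genes: frozenset   # gene names mentioned in the article (assumed)


def build_space(articles, required_topic):
    """Keep only the articles tagged with the given topic."""
    return [a for a in articles if required_topic in a.topics]


def genes_linked_to_all(space, topics):
    """Genes mentioned in at least one article for every listed topic."""
    per_topic = [
        {g for a in space if topic in a.topics for g in a.genes}
        for topic in topics
    ]
    return set.intersection(*per_topic) if per_topic else set()


# Placeholder corpus; gene names are invented, not real genes.
corpus = [
    Article("a1", frozenset({"insect", "foraging"}),
            frozenset({"gene_a", "gene_b"})),
    Article("a2", frozenset({"insect", "social behavior"}),
            frozenset({"gene_a", "gene_c"})),
    Article("a3", frozenset({"mammal", "foraging", "social behavior"}),
            frozenset({"gene_d"})),
    Article("a4", frozenset({"insect", "development"}),
            frozenset({"gene_c"})),
]

insect_space = build_space(corpus, "insect")
candidates = genes_linked_to_all(insect_space, ["foraging", "social behavior"])
print(candidates)
```

Starting from the insect space excludes the mammal paper entirely, and the intersection keeps only the gene that appears in both the foraging and the social behavior literature.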

This directed data navigation has several advantages. It is much more targeted than a simple search, yet it can analyze much more data than a human curator. It is also useful in fields where there are no human curators, since only the most-studied animals, such as mice and flies, have their own professional curators.

Schatz and his team built the navigator to perform several tasks that biologists often carry out when trying to interpret gene function. The program can summarize a gene, as a curator would, and it can also analyze the literature to draw conclusions about gene function.

For example, if one study shows that a gene controls a specific chemical and another study shows that the chemical plays a role in a certain behavior, the software makes the link that the gene could, in part, regulate that behavior.
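This kind of two-step link can be sketched as a simple chained lookup. The Python below is only an illustration of the reasoning, under the assumption that gene-to-chemical and chemical-to-behavior findings have already been extracted from the literature; the function name and all gene, chemical and behavior names are placeholders.

```python
def infer_gene_behavior_links(gene_to_chemicals, chemical_to_behaviors):
    """Chain two kinds of findings into candidate gene-behavior links.

    Returns (gene, chemical, behavior) triples. Each triple is a
    hypothesis suggested by the literature, not a confirmed result.
    """
    links = set()
    for gene, chemicals in gene_to_chemicals.items():
        for chemical in chemicals:
            for behavior in chemical_to_behaviors.get(chemical, ()):
                links.add((gene, chemical, behavior))
    return links


# Placeholder findings, invented for this example.
gene_to_chemicals = {
    "gene_a": {"chemical_1"},
    "gene_b": {"chemical_2"},
}
chemical_to_behaviors = {
    "chemical_1": {"foraging"},
}

links = infer_gene_behavior_links(gene_to_chemicals, chemical_to_behaviors)
print(links)
```

Keeping the intermediate chemical in each triple lets a biologist see why the link was proposed and check the two underlying studies.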

BeeSpace can also perform vocabulary switching, an automatic translation across species or behaviors. For example, if a specific honeybee gene is known to be similar to a fruit fly gene whose function has been documented in much more detail, the navigator can make the connection and show a bee scientist information on the fly gene that may be helpful.
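One way to picture vocabulary switching is as a lookup that falls back to a similar gene in a better-studied species. The sketch below is a hypothetical simplification, not the navigator's implementation: it assumes a precomputed table of similar genes, and the function name, gene names and annotation text are placeholders.

```python
def switch_vocabulary(gene, similar_genes, annotations):
    """Return (source_gene, notes) for a gene.

    If the gene has no notes of its own, fall back to the notes of a
    similar gene in another species, when one is known.
    """
    if annotations.get(gene):
        return gene, annotations[gene]
    counterpart = similar_genes.get(gene)
    if counterpart and annotations.get(counterpart):
        return counterpart, annotations[counterpart]
    return None, []


# Placeholder data, invented for this example.
similar_genes = {"bee_gene_1": "fly_gene_1"}
annotations = {"fly_gene_1": ["function documented in fly literature"]}

source, notes = switch_vocabulary("bee_gene_1", similar_genes, annotations)
print(source, notes)
```

Returning the source gene alongside the notes makes it clear to the reader that the information comes from the fly, not the bee.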

“The main point of the project is automatically finding out what genes do that don’t have known function,” Schatz said. “If a biologist is trying to figure out what these genes do, they’re happy with anything. They want to get as much information as possible.”

The BeeSpace Navigator is now in its fourth version. It is available free online, along with overview documentation.