Swedberg lecture: Trying to survive the data deluge: bioinformatics tools for analyzing and visualizing large data samples

Dr. Reinhard Schneider from the European Molecular Biology Laboratory gave a lecture at BMC in Uppsala with the title seen above. It seemed quite relevant to the stuff I'm currently doing at Science for Life Laboratory (where I'm employed for 2 months): investigating LIMS options for NextGen sequencing data, as well as learning about analysis tools in the area.

Reinhard presented four different tools that his group has developed or is working on, which try to solve some of the problems of grasping heterogeneous data sources. From the lecture info:

  • BioCompendium: a software pipeline which collects a comprehensive view of the already stored or published information regarding a gene or a list of genes and which allows the analysis of data patterns.
  • Arena3D: a 3D visualization tool for the analysis of large networks of heterogeneous data.
  • Reflect: an augmented web browsing tool which links today’s web with the database world (http://reflect.ws).
  • Mutationfinder: a tool, which allows the half-automatic extraction of mutational information from full text publications and linking it back to the underlying database sequence entry.

I think these summaries capture quite nicely what the tools are all about, and I don't have much more to say about them.

I realize - now after the lecture - that these tools are not directly related to at least the first steps of sequence visualization and analysis, and are thus more relevant to hook into later stages of a workflow. For example, BioCompendium takes a list of genes rather than the raw sequences themselves, thus leaving the annotation part to be solved first.

Also, tools like BioCompendium seem at first sight to be too complex to be worth installing on the Uppnex systems. More realistic would be the Arena3D visualization tool, as an addition to Bioclipse (it's developed in Java), but unfortunately it's not open source.

Anyway ... some more random notes:

The Reflect tool, while not directly relevant to Uppnex in general, seems quite cool ... I tried pasting the URL of an experimental project we are working on (http://drugmet.rilspace.com/wiki/Maltase-glucoamylase,_intestinal) into the field at http://reflect.ws/. Clicking the highlighted items then brings up a small info box with some nice related info. We also got to know that this tool is nowadays turned on by default in the online version of some of the Cell journals.

In general, Reinhard seems to have an interest in developing visualization tools that can span from the general - doing at least some kind of visualization even before you know what you are looking at - to the more specific, where the tool is adapted to particular use cases. They are also interested in extending the current visualization tools (which seem to be mainly built around clustering) to spatial and time-dependent visualization. I think that's an interesting goal ... especially if they succeed.