UPPMAX arranges NextGen Sequencing Cloud/Hadoop hackathon in May

For you next gen sequencing bioinformaticians interested in getting some hands on new cloud based technologies for computation, mainly around the hadoop framework, and being around in Uppsala at the end of May 2012, may want to have a look at this:

As the event page on UPPMAX states:
"The hackathon will focus on next challenges that cloud adoption poses: massively distributed data processing frameworks such as Hadoop, distributed cloud databases and distributed bioinformatics applications."

Based on my experiences from a very useful (as in getting new hands on experience) and interesting (as in making new contacts) hackathon at CSC in Helsinki, Finland, I am sure this will be a highly interesting and useful hackathon as well, for all who are faced with the challenges of big sequencing data.

Apply before April 30 to get (EU/COST action: SeqAhead) funding! See you in Uppsala at the end of May!

iRODS for managing next gen sequencing data in an HPC environment

My work in the UPPNEX project at UPPMAX HPC Center includes helping develop/find a solution to Next Generation Sequencing data management. The main aim is to automat data handling and making data handling easier for users.

One of the main challenges is  how to handle all this data in a multiple projects/users environment where access restrictions are critical, while many users also want to share certain data, etc.

Jonas Hagberg found out about iRODS, which after some initial research seems to fit our needs very well. It also now seems to be what some big sequencing centers are focusing on right now (Sanger, Broad ...).

Basically, iRODS is a rule-oriented data management system, that sits as a logical layer on top of actual file systems, provides a unified file identifier namespace, and can automate things like data migration between fast cache-like storage and longer time archiving storage, meta data tagging etc. (or, the automation itself can be controlled by manual tagging). Client access is done via the shell through the i-commands, via a web-file manager interface, fuse module, Java or PHP API. All in all, iRODS looks surprisingly mature, and to provide good flexibility while keeping the tech-stack reasonably simple.

The Sanger iRODS slides (March 2011) were very good indeed (they much describe the problems that we face). Also there seems to be some slides (April 2011) from a corresponding initiative at Broad Inistitute.

Installation of iRODS on ubuntu is mainly executing an automated shell script, and is done in a few minutes, so I expect to be diving into iRODS rather full time from now on.

Some iRODS Links: