iRODS

Thoughts and bits from the iRODS Workshop in Linköping

I just came back home from an interesting PRACE iRODS workshop at the National Supercomputing Centre in Linköping. Find below my random notes and impressions. (The presentations are found here).

Among the top bits were the tutorials by Leesa Brieger, not to mention the talks by DICE director Reagan Moore. He covered a lot of interesting stuff that is going on both with the iRODS software itself, and in how it is being used.

From the tutorials, I was very happy to get a federation up and working between two local datagrids that I managed to install on my laptop, and to actually iput a file from one of them to the other, yay! :) ... and to get the iRODS rule engine debugger idbug to work (see this page for some instructions). The idbug debugger lets you watch, in real time, exactly which iRODS rules and actions are being executed when you trigger an icommand etc. Seems like a totally essential tool for any iRODS rules development!

Back to Reagan Moore's talks and what is going on with the iRODS software: many nice new things are supposed to come in the upcoming 3.2 release, which is actually supposed to be released next week or so.

Some things that sounded especially interesting:

  • Support for registration and sharing of workflows (Reagan showed some nice stuff about this. It looked really interesting, although I have to look closer at it to understand it in detail).
  • Pluggable authentication modules (will enable easier use of things like YubiKey, and not least LDAP integration!)
  • Pluggable microservices (you can add them without recompiling the source code)
  • Pluggable storage resource drivers
  • Support for storage drivers for some interesting protocols like THREDDS, OPeNDAP, pyDAP, ERDDAP, NetCDF etc.

Reagan also covered a bit of what is expected to show up in the near future. Among the interesting bits there were DDN's work on integrating iRODS directly into the storage controllers of their storage system hardware, and the ongoing NSF-funded project to create a nationwide (as in US) and discipline-crossing data federation network. Read more about that at datafed.org.

One thing I found extra interesting was that Reagan mentioned some group (CIBER-U) doing MediaWiki integration of iRODS! Gotta figure out if they share their code ... sounds really interesting!

Among my personal notes for further checking up is also the "Fedora Commons" project, which seems to have a lot of overlap with iRODS, so I'd be interested to figure out how the two actually compare in terms of key features.


iRODS for managing next gen sequencing data in an HPC environment

My work in the UPPNEX project at UPPMAX HPC Center includes helping develop/find a solution to Next Generation Sequencing data management. The main aim is to automate data handling and make it easier for users.

One of the main challenges is how to handle all this data in a multi-project, multi-user environment where access restrictions are critical, while many users also want to share certain data, etc.

Jonas Hagberg found out about iRODS, which after some initial research seems to fit our needs very well. It also seems to be what some big sequencing centers are focusing on right now (Sanger, Broad ...).

Basically, iRODS is a rule-oriented data management system that sits as a logical layer on top of the actual file systems, provides a unified file identifier namespace, and can automate things like data migration between fast cache-like storage and longer-term archive storage, metadata tagging etc. (or, the automation itself can be controlled by manual tagging). Client access is done via the shell through the i-commands, via a web file manager interface, a FUSE module, or the Java or PHP APIs. All in all, iRODS looks surprisingly mature, and seems to provide good flexibility while keeping the tech stack reasonably simple.
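
To make the "logical layer plus rules" idea a bit more concrete, here is a toy sketch in plain Python. It is not the iRODS API and does not talk to any iRODS server; the class names, the "cache"/"archive" resource names, the "status" metadata key and the /toyZone paths are all made up for illustration. It only shows the principle: users see a stable logical path, a catalog remembers which storage resource currently holds the data, and a rule moves tagged objects from cache to archive without changing their logical path.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    logical_path: str   # stable identifier that users see (hypothetical)
    resource: str       # storage resource currently holding the data
    metadata: dict = field(default_factory=dict)


class ToyCatalog:
    """Toy stand-in for a catalog mapping logical paths to storage."""

    def __init__(self):
        self.objects = {}

    def put(self, logical_path, resource="cache"):
        # New data lands on the fast cache-like resource by default.
        obj = DataObject(logical_path, resource)
        self.objects[logical_path] = obj
        return obj

    def tag(self, logical_path, key, value):
        # Manual metadata tagging, which rules can later react to.
        self.objects[logical_path].metadata[key] = value

    def apply_rule(self, condition, action):
        # Run the action on every object matching the condition and
        # return the logical paths that were affected.
        changed = []
        for obj in self.objects.values():
            if condition(obj):
                action(obj)
                changed.append(obj.logical_path)
        return changed


def is_archivable(obj):
    # Assumed policy: only cached objects tagged status=archive qualify.
    return obj.resource == "cache" and obj.metadata.get("status") == "archive"


def move_to_archive(obj):
    # Only the physical location changes; the logical path stays the same.
    obj.resource = "archive"


catalog = ToyCatalog()
catalog.put("/toyZone/projA/run1.fastq")
catalog.put("/toyZone/projA/run2.fastq")
catalog.tag("/toyZone/projA/run1.fastq", "status", "archive")

moved = catalog.apply_rule(is_archivable, move_to_archive)
print(moved)
```

In a real iRODS setup the equivalent of the condition and the action would be written as rules and microservices on the server side; the sketch just illustrates why manual tagging can be enough to drive the automation.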

The Sanger iRODS slides (March 2011) were very good indeed (they very much describe the problems that we face). There also seem to be some slides (April 2011) from a corresponding initiative at the Broad Institute.

Installing iRODS on Ubuntu is mainly a matter of running an automated shell script, and is done in a few minutes, so I expect to be diving into iRODS more or less full time from now on.

Some iRODS Links: