How to create a Bioinformatics pipeline using Spotify's Luigi!
Many large streaming companies, such as Spotify and Pandora, are pushing toward better and more stable frameworks. One of the best ways to get there is to go open source, collect useful feedback, and keep improving the platform.
Spotify developed Luigi, a Python framework for handling user logs and mining them by plugging in machine learning algorithms, with the goal of improving its recommendation system and the suggestions it makes to listeners.
Luigi works much like other make-like Python frameworks for pipeline development, such as Ruffus or Snakemake, but it has two advantages over them: it is designed to create Hadoop-friendly pipelines, and it comes with a visual diagnostic of each part of your pipeline while it is running. Another feature I like is that it notifies you by email when a task fails.
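To make the "make-like" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Luigi itself: the `Task` base class, the `build` function, and the `WriteFasta` and `GcContent` steps are hypothetical stand-ins that mimic the `requires()` / `output()` / `run()` shape of a Luigi task. The point is only to show how a step is skipped when its output file already exists and how upstream steps run first.

```python
import os
import tempfile


class Task:
    """Toy stand-in for a make-like task (hypothetical, NOT luigi.Task)."""

    def requires(self):
        return []  # upstream tasks that must finish first

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError  # the actual work

    def complete(self):
        # As in make, a task counts as done once its output file exists.
        return os.path.exists(self.output())


def build(task, log):
    """Run upstream tasks first, then `task`; skip anything already done."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class WriteFasta(Task):
    """Hypothetical first step: write two tiny made-up reads to a FASTA file."""

    def __init__(self, workdir):
        self.workdir = workdir

    def output(self):
        return os.path.join(self.workdir, "reads.fasta")

    def run(self):
        with open(self.output(), "w") as handle:
            handle.write(">read1\nGGCC\n>read2\nATAT\n")


class GcContent(Task):
    """Hypothetical second step: compute the GC fraction of all reads."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return [WriteFasta(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "gc.txt")

    def run(self):
        with open(self.requires()[0].output()) as handle:
            seq = "".join(l.strip() for l in handle if not l.startswith(">"))
        gc = (seq.count("G") + seq.count("C")) / len(seq)
        with open(self.output(), "w") as handle:
            handle.write(f"{gc:.2f}\n")


def demo():
    """Build the pipeline twice; the second build has nothing left to do."""
    workdir = tempfile.mkdtemp()
    first_log, second_log = [], []
    build(GcContent(workdir), first_log)
    build(GcContent(workdir), second_log)
    with open(os.path.join(workdir, "gc.txt")) as handle:
        return first_log, second_log, handle.read().strip()
```

Calling `demo()` runs `WriteFasta` before `GcContent` on the first build and runs nothing on the second, because both output files already exist. In real Luigi the same three methods live on `luigi.Task`, outputs are target objects rather than plain paths, and a scheduler takes the place of the toy `build` function.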
Here is a simple adaptation of Luigi for bioinformatics. This pipeline:
<pre><code>def HelloCrowd(message):
    """Welcome CodersCrowd members by printing a message."""
    # print() is a function in Python 3; the parameter no longer shadows the built-in str
    print(message)
    return

HelloCrowd("We Love Bioinformatics")
</code></pre>