
Beyond the data clouds

2010 August 24
by abhishektiwari


No doubt, cloud computing is hot at the moment. Everyone is jumping on the bandwagon before it becomes too late. Currently, data clouds seem to be the major focus for most companies and institutions that are adopting cloud computing as part of their long-term strategy. These organizations use data clouds both for on-demand computation and to persist and manage data. Distributed and replicated data clouds not only enable faster access to resources but also ensure higher availability, scalability, and fault tolerance. Data clouds have proven highly attractive to the scientific community as well. Large-scale genomic analysis in the cloud is one of many examples where the community is enjoying the best of on-demand computing and storage technologies. Most use cases in the life sciences community focus on mining the huge amounts of data produced by third-generation sequencers.

Another niche area where cloud computing is making its way is simulation-based science and engineering. Compared to data clouds, modeling and simulation of science and engineering problems in scalable cloud computing environments is still in its infancy. There is some excitement about Amazon’s recently announced High Performance Computing (HPC) cloud services, but there is a lot of uncertainty about how far cloud-based HPC clusters can compete with on-premise HPC clusters or in-house dedicated machines. For instance, it remains to be seen how multi-tenancy in the cloud will affect HPC performance. Exclusive access to cloud computing nodes is far too expensive for both cloud infrastructure providers and users, especially when scientific applications require large numbers of nodes. In addition, dedicated or exclusive nodes don’t fit well with the economics of cloud computing; in fact, multi-tenancy is a prerequisite for cloud computing. There are concerns about processing, memory, storage, and network usage patterns in shared multi-tenant environments. This is an open and unexplored area for both the scientific community and cloud infrastructure providers. Before people start adopting cloud-based HPC services, these concerns need to be explored and addressed through benchmarking studies. As Mohamed Ahmed suggests,

Cloud infrastructure is still lucrative if comparing its economics to building in-house HPC machines. However, cloud for HPC has to be efficient enough to reach proper performance ceilings without disappointing customers who probably experienced at a certain point to run their HPC applications on dedicated machines.

I could not agree more. The lack of performance guarantees on shared cloud infrastructure is a major issue that cloud computing users face on a regular basis, irrespective of the type of application they are running. Currently, most cloud infrastructure providers guarantee only the uptime of their nodes, and in some cases they provide persistent access to resources through fully reserved RAM and storage allocations without over-subscription. Some of them guarantee a minimum CPU availability proportional to the reserved size, but there are often huge gaps between what is promised and what is delivered. There is growing demand for Service Level Agreements (SLAs) that cover both performance and availability. Performance is less of an issue for data clouds than for HPC applications: HPC applications are computationally intensive and can be highly demanding within a given time period, while data clouds place a more uniform load on the infrastructure.
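One simple way to look at the gap between promised and delivered CPU performance is to run the same fixed workload repeatedly on a node and see how much the timings spread. The following is only a minimal sketch under my own assumptions, not a real HPC benchmark: the function names `fixed_workload` and `measure_variability`, the workload itself, and the choice of statistics are all hypothetical and picked for illustration.

```python
import statistics
import time


def fixed_workload(n: int = 200_000) -> int:
    """A deterministic CPU-bound task: sum of squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure_variability(runs: int = 10, n: int = 200_000) -> dict:
    """Time the same workload several times and summarize the spread."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fixed_workload(n)
        timings.append(time.perf_counter() - start)
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)
    return {
        "mean_s": mean,
        "stdev_s": stdev,
        "cv": stdev / mean,  # coefficient of variation
        "worst_over_best": max(timings) / min(timings),
    }


if __name__ == "__main__":
    print(measure_variability())
```

On a dedicated machine one would expect the coefficient of variation to stay small. A noticeably larger spread on a shared node would be a hint, though not proof, that neighbouring tenants are affecting the delivered performance.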

Over the next few weeks, through a series of blog posts, we will focus on some interesting modelling and simulation applications for high-throughput computational science built around cloud-based HPC clusters. So stay tuned.

Cross-posted on iCODONS blog

Point-of-care diagnostics using microfluidic devices

2010 August 21
by abhishektiwari

Working in uncharted scientific territory

2010 August 18
by abhishektiwari

Not much food for your reading today, just pointers to two interesting articles about how young scientists can shape their careers by exploring uncharted scientific territory.
Bridging the Gap between Scientific Disciplines

With math, if you do it once — unless you made a mistake — it’s going to be the same every time you do it. However, if I put on my biology hat, it’s very hard to come up with a mathematical model that abstracts at the right level because [the biology is] very complex. It’s almost an art, really.

Clearly, there are risks involved in this process, but science cannot afford to skimp on innovation just because the risks are high.
Taking “The Road Not Taken”: On the Benefits of Diversifying Your Academic Portfolio

New discoveries are naturally made in unexplored territories. Young people are more capable of exploring the “roads not taken” because they lack an unwarranted baggage of prejudice (or adopt a flat Bayesian prior) on the likelihood of discovery along these roads. The window of opportunity in a scientist’s career is often short: after tenure, most senior researchers get distracted by administrative and fund-raising concerns, and prefer to maintain a conservative profile that promotes old ideas within their discipline.

In the second article, Abraham Loeb recommends a new investment strategy for young researchers:
• 50% in bonds (mainstream research areas, mostly incremental, low risk)
• 30% in stocks (evolving research areas that will sooner or later become mainstream, medium risk)
• 20% in venture capital (innovative and interdisciplinary research areas or uncharted territory, high risk)
I hope you will enjoy both articles.

Probability processing explained

2010 August 17
by abhishektiwari

According to a press release from MIT spin-out Lyric Semiconductor, a new processor that uses probability values between 0 and 1 instead of digital 1s and 0s will be available on the market within the next 12 months.

Data is represented as bits (1s and 0s). Boolean logic gates perform operations on these bits. Lyric has invented a new kind of logic gate circuit that uses transistors as dimmer switches instead of as on/off switches. These circuits can accept inputs and calculate outputs that are between 0 and 1, directly representing probabilities – levels of certainty.

A digital processor steps through these operations serially in order to perform a function. In order to improve efficiency even further, Lyric’s processors are designed to perform many probability computations in parallel.

Lyric’s approach can accelerate search, fraud detection, spam filtering, financial modeling, genome sequence analysis, and many other important present and future applications that involve simultaneously considering many possible alternatives and deciding on the best fit – the best guess for the answer.
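To get a feel for what a gate that works on probabilities might compute, here is a minimal sketch in Python. It is not Lyric’s circuit design, which the press release does not describe in detail. It only applies the standard probability rules for independent events, and the function names (`p_not`, `p_and`, `p_or`, `p_xor`) are made up for illustration. Each input is the probability that a bit is 1.

```python
def _check(p: float) -> float:
    """Reject values that are not valid probabilities."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"not a probability: {p}")
    return p


def p_not(a: float) -> float:
    """Probability that the bit is 0."""
    return 1.0 - _check(a)


def p_and(a: float, b: float) -> float:
    """Probability that both bits are 1 (assumes independent inputs)."""
    return _check(a) * _check(b)


def p_or(a: float, b: float) -> float:
    """Probability that at least one bit is 1 (assumes independent inputs)."""
    return _check(a) + _check(b) - a * b


def p_xor(a: float, b: float) -> float:
    """Probability that exactly one bit is 1 (assumes independent inputs)."""
    return _check(a) * (1.0 - _check(b)) + b * (1.0 - a)


if __name__ == "__main__":
    # With inputs of exactly 0 or 1 these reduce to ordinary Boolean gates.
    print(p_and(1.0, 0.0))  # 0.0
    # With uncertain inputs the output is itself a level of certainty.
    print(p_or(0.5, 0.5))   # 0.75
```

The independence assumption is the important caveat here: when the inputs are correlated, these simple formulas no longer hold.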

Moving to GitHub

2010 August 16
by abhishektiwari
I am in the process of moving my code repositories to my newly created GitHub account. I will slowly move my private SVN and Mercurial repositories to GitHub, but it is going to take some time, as I need to do some clean-up before staging the different projects. I still think Mercurial is really powerful and easy to use, but putting a code base on GitHub is more social and visible (it’s partly hype, and we have to be part of the crowd). One of the main reasons for moving to the GitHub platform was the strong presence of the Ruby and Rails community, compared to BitBucket, which seems to be more Pythonic. In addition, I think GitHub is also team and organization friendly: managing workflows and working with teams is easy with GitHub.

 Top languages on GitHub