Big data, Big challenges
Image by Photo Blog 0001 via Flickr
Well, those are not the only problems we have: most of our freshly minted university graduates are not prepared to face this kind of data deluge. Why? For the most part, university students have used rather modest computing systems to support their studies. They learn to collect and manipulate information on personal computers or on what are known as clusters, where servers are cabled together to form a larger computer. But even these machines fail to churn through enough data to really challenge and train a young mind meant to ponder the mega-scale problems of tomorrow.
I guess this is not something we can blame on students alone. A lack of resources and of exposure to new technologies is one of the reasons. To tackle this issue, Google and IBM are now promoting Internet-scale research at places like the University of Washington and Purdue by giving students wide access to their powerful computational infrastructure. The idea is to encourage students to churn through data with the help of open-source tools like Hadoop, which is used for processing Internet-scale data sets. Hadoop is an open-source implementation of MapReduce, a software framework introduced by Google to support distributed computing on mega-scale data sets across clusters of computers. By the start of 2008, Google was processing over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters, which gives a glimpse of Google's Internet-scale capabilities. In a similar initiative to promote learning about cloud-based distributed computing, Amazon Web Services (AWS) is providing its on-demand infrastructure for educational purposes for free.
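To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch of the classic word-count example in Python. It is only an illustration of the map, shuffle, and reduce steps, not Hadoop's or Google's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this post, and a real framework would run these steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key.

    In a real cluster the framework does this between map and reduce,
    routing every pair with the same key to the same reducer.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big challenges", "data deluge"]
    print(word_count(docs))
```

The point of the design is that each document can be mapped independently and each key can be reduced independently, which is what lets a framework like Hadoop spread the work over a cluster.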
So far we have talked about the next generation of data coming out of high-throughput technologies in different scientific disciplines, and we all agree that it will have a great impact on research infrastructure, research funding and beyond (if and only if it is managed properly). On a further note, this data will need to be annotated with metadata, then archived and curated. Each of these is a mammoth task, which means the focus should be not only on one-time analysis but also on future reusability and interoperability.
In the following video, Roger Magoulas (Director of Research at O'Reilly) talks about Big Data in general, gives a glimpse into future technologies, and offers general advice to organizations interested in improving their proficiency in handling web-scale data.






