April 20, 2012

Analytics, Big data - the emerging revolution of Data Science

Guest post by Arindam Banerji, Unit Technology Officer, Manufacturing, Infosys Limited

Like it or not, how we think of data science and business intelligence is changing - not only in terms of technologies and capabilities, but also in terms of what consumers of those technologies expect. The changes are drastic enough to warrant thinking of this as a new era in how science is done; its impact could be as large as the introduction of the web, bringing new business models, sub-industries and entirely new ways of doing science.

Building on the work of Turing Award winner Jim Gray, Jnan Dash offers up why data science is quite probably an entirely new model of doing science:

[So what is the Fourth Paradigm? Here is the explanation.
1. Thousand years ago - Experimental Science
- Description of natural phenomena
2. Last few hundred years - Theoretical Science
- Newton's Laws, Maxwell's Equations...
3. Last few decades - Computational Science
- Simulation of complex phenomena
4. Today
- Data-Intensive Science (unify theory, experiment, & simulation)] - Jnan Dash in "The Fourth Paradigm in Science"

Not surprisingly, Mike Loukides, in his post at http://radar.oreilly.com/2010/06/what-is-data-science.html, observes:

"The future belongs to the companies and people that turn data into products"

Definitions being what they are - call it analytics, big data or data science - this discipline is changing how we run our companies, make decisions, create new business models, manage risk and reinvent the IT nervous systems of our institutions (corporations, non-profits).

So, what are the changes that are revolutionizing business intelligence, data and our products?

Predict the Future
Volatility in markets and the globally connected nature of disparate businesses make gut-feel decisions ineffective. The parameters for business decisions are now so complex that decision makers across corporations are asking us data scientists: "help us predict the future - simple post-mortems of the past will not cut it".

So, a windshield maker may want to predict changes in Chinese auto industry growth to improve the accuracy of its sales & operations planning. A fast food restaurant chain needs a better handle on what buns will cost in the market two months out. As we'll see later, such analysis cannot be based simply on our traditional sources of data.
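To make the idea concrete, here is a minimal sketch of the simplest kind of forward-looking estimate: a least-squares trend line extrapolated two months ahead. The prices below are invented for illustration only; a real commodity forecast would draw on far richer models and far more data.

```python
def linear_forecast(series, steps_ahead):
    """Fit a least-squares linear trend to a series of equally spaced
    observations and extrapolate it steps_ahead points into the future."""
    n = len(series)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(series) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, series))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    # Last observation sits at index n - 1; project steps_ahead beyond it.
    return intercept + slope * (n - 1 + steps_ahead)

# Hypothetical monthly bun prices ($/unit) - invented for illustration.
monthly_bun_price = [1.00, 1.02, 1.05, 1.04, 1.08, 1.10]
price_in_two_months = linear_forecast(monthly_bun_price, 2)
```

A straight-line extrapolation like this is only a baseline; the point of the post is precisely that traditional internal data (past prices alone) is rarely enough, and external signals must feed the model.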

Visualizations
Depending on whom you ask, at most 5% to 8% of the people who should be using BI tools within corporations actually end up using them. The result is poor decision making and sub-optimal visibility into events.

The fix for this is not just a question of training or overcoming inertia, but of fundamentally rethinking the science of data visualization - or, as some people call it, "telling a story with data". Thankfully, this is happening, and good examples exist - such as the visualization of Walmart's store growth in the US at http://projects.flowingdata.com/walmart/. Visualizations that can be easily contextualized for the kinds of problems and decisions at hand, and made more sensitive to the skills of the decision-maker, are critical next steps. I find initiatives such as Many Eyes (IBM), which lets you experiment with different visualization models, very helpful in deciding on appropriate models for telling data stories.

Big Data
What is big data? Several definitions exist, but perhaps one of the better ones is "big data is when the size of the data itself becomes part of the problem". The volume of data we're beginning to deal with is almost oppressive - as far back as 2007, Hilbert estimated that humankind was able to store as much as 295 exabytes (trillion megabytes) of optimally compressed information in its technological devices. Martin Hilbert's analysis of technology capacity is an eye-opener (see http://martinhilbert.net/WorldInfoCapacity.html). But we would be remiss to focus only on the volume of data: as Doug Laney points out, velocity - the speed at which data builds up and is output - and variety in data sources and formats are equally critical elements of big data.

At the sizes and varieties many researchers predict, traditional tools and approaches become ineffective - but they also open up new opportunities. So, the insurance industry is now able to experiment with pay-as-you-go warranty schemes, as telematics units in cars can collect a myriad of information about our driving behavior - something that was not possible until a few years ago.
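As a toy illustration of how telematics data might feed such a scheme, the sketch below derives a usage-based premium from per-trip readings. The field names, weights and rates are all hypothetical; real actuarial models are far richer and would ingest raw sensor streams, not monthly summaries.

```python
# Hypothetical rate parameters - invented for illustration.
BASE_MONTHLY_PREMIUM = 40.00   # flat base rate, $
PER_KM_RATE = 0.03             # usage charge, $/km driven
HARSH_BRAKE_SURCHARGE = 0.50   # surcharge per harsh-braking event, $

def monthly_premium(trips):
    """Compute a pay-as-you-drive premium from a month of trip records.

    trips: list of dicts, each with 'km' driven and a 'harsh_brakes'
    event count reported by the in-car telematics unit (hypothetical schema).
    """
    total_km = sum(t["km"] for t in trips)
    total_brakes = sum(t["harsh_brakes"] for t in trips)
    return (BASE_MONTHLY_PREMIUM
            + total_km * PER_KM_RATE
            + total_brakes * HARSH_BRAKE_SURCHARGE)

# One month of (invented) driving: a careful long trip and a short risky one.
trips = [{"km": 300, "harsh_brakes": 0},
         {"km": 120, "harsh_brakes": 2}]
premium = monthly_premium(trips)
```

The design point is that the premium now reflects observed behavior rather than coarse demographic proxies, which is exactly what was impossible before cars streamed this data.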

Clearly, the science of BI has changed - but the possibilities are enormous, as we'll see in future posts; consider, as an example, the ability to predict epidemic outbreaks.

[Scientists may have found a way to counter an upcoming influenza outbreak - Google.
"Researchers at Johns Hopkins have found 'Google Flu Trends' a powerful early warning system for emergency departments. They say monitoring Internet search traffic about influenza may prove to be a better way for hospital emergency rooms to prepare for a surge in sick patients compared to waiting for outdated government flu case reports"] - Times of India, Jan 12, 2012.
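The intuition behind such early-warning systems can be sketched with a lagged correlation: if this week's search volume tracks next week's emergency-room visits more closely than the same week's, the searches carry predictive signal. The weekly counts below are invented for illustration and do not come from Google Flu Trends or the Johns Hopkins study.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented weekly counts: flu-related searches, and ER flu cases that
# (in this toy data) rise roughly one week later.
searches  = [10, 12, 20, 35, 50, 48, 30]
er_visits = [ 8,  9, 11, 19, 33, 49, 47]

# Correlate this week's searches with NEXT week's ER visits (lag of one week)
lagged_corr = pearson(searches[:-1], er_visits[1:])
same_week_corr = pearson(searches, er_visits)
```

In this toy data the lagged correlation is much stronger than the same-week one, which is the sense in which search traffic can act as an early warning rather than a mere mirror of official case reports.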
