Big Data: Marching Toward A Knowledge-Driven Society

Praveen Mandal from SGI explains the steps we will need to take to exploit the potential of Big Data and the Internet of Things

The invention of more cost-effective, low-power, miniature sensors with ubiquitous network access, coupled with an advancing cloud infrastructure, is creating a market disruption: the Internet of Things. We are witnessing the birth of a new era in which our movements, behaviour and buying habits are continuously captured. Athletic wear, thermostats, fire alarms, even golf clubs are now capable of recording and storing data around the clock.

Alex Pentland, a leading researcher at Massachusetts Institute of Technology’s (MIT) Human Dynamics Lab, suggests that once you collect enough data, you can get what he calls a “God’s eye view of humanity”. How exciting and scary is this?

With this big data disruption come increased challenges. How do we keep up with all of this data growth, analyse what is relevant, discover insights and, in some cases, predict outcomes and change them for the better?

Discovery: the “unknown unknowns” problem

Former US Secretary of Defence Donald Rumsfeld once said something profound:

“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.”

Most of the effort so far has been focused on data mining for the “known knowns” and the “known unknowns”; search engines are one of the most widely used tools for this endeavour.

What about the “unknown unknowns?” How do you form a hypothesis to validate when you don’t even know where to start? Furthermore, how much of our past research needs to be revisited because we started with a narrow hypothesis that was proven with a severely constrained or incomplete data set? The process of knowledge discovery must be free from a biased initial hypothesis, especially when humans are only capable of analysing a handful of dimensions of data at any given time. Discovering answers to some of life’s more perplexing questions will require advancements in four critical areas:

  1. Access to large enough data sets
  2. Advanced algorithms and technologies
  3. Human intuition and judgment
  4. Data security and privacy policies

Big data sets

Although they can be viewed as daunting, the increasing size and variety of data sets are creating tremendous opportunities. Larger and more heterogeneous data sets will help create algorithmic models and validate their relevance and accuracy.

As an example, access to large personal data sets will allow leading researchers to study human and social behaviour and enable them to model how social networks work by observing what people do. These data sets are created with the help of various sensors and social media activity, all marked by time and geospatial coordinates. 

Leading researchers from MIT are studying these ‘data crumbs’ to better understand how to design more efficient urban areas (a.k.a. ‘City Science’), communication networks and organisations, and how best to put the right incentives in place to deliver better outcomes. This study of collective intelligence, of idea flow within a social structure and of how this translates into changes in one’s behaviour is what researchers are referring to as “Social Physics.”

Advanced algorithms and technologies

Data analysis is moving from simple rule-based algorithms to model-based algorithms through the use of artificial intelligence techniques such as machine learning. Models allow computers to observe and process multiple dimensions versus the very limited (two or three) dimensions that typical simple rule-based algorithms take into account. This ability to observe and analyse across thousands of dimensions will open doors for more relevant and accurate models, allowing both man and machine to gain data insights to predict and influence better business outcomes.

However, there will be no one-size-fits-all solution. Every data set will have its own characteristics, but the measure of success is ultimately the same: create relevant models, gain insight, predict outcomes and drive value. As a result, we are now starting to see industries transform the way they look at their landscape. For example, High Frequency Trading (“HFT”) is now transforming itself into the “intelligent trading” market by making use of advanced analytics.

As algorithms become more advanced, they will put pressure on various technologies to provide real-time insights into various data streams. We must think beyond Moore’s Law and consider how we load, move (or don’t move), process and store data so that it satisfies the needs of the workflow of interest. This will require advances in processor, networking, memory, storage and software technologies.

Human intuition and judgment

What is the role that computers should play in artificial intelligence (AI)? Some of the leading AI researchers, such as Marvin Minsky, believe that ultimately you can model intuition and judgment, whereas many philosophers argue that unconscious instincts can never be modelled. This debate will linger on but until AI can get us there, there will be cases where we must rely on human intuition and judgment to help with the process of knowledge discovery.

Advanced interactive visualisation software tools can help with this discovery process, especially for “unknown unknowns” problem sets. Human interaction with machine-generated data can result in updates that make the machine-learned models more accurate, which in turn shortens the time to insight and feeds our insatiable appetite for value.

Data security and privacy policies

As we know, data exists everywhere and data generation technologies are on the verge of ubiquity. Technology titans such as Google, Facebook and Dropbox all have access to various aspects of personal and business data. Policy discussions about sharing data across geopolitical boundaries will eventually result in accessible data that advances big data research and innovation while maintaining data security, data privacy and data sovereignty, with enforcement for violations.


There is a TV commercial that talks about “the age of knowing” and it couldn’t be truer. It is exciting. The massive amounts of data being generated create opportunities to gain insights that can help improve health care, public safety, a corporation’s sales performance, city planning and communication efficiencies, just to name a few.

The insights will come through research and innovation that is driven by mathematics and computer science working in tandem with other disciplines, such as the physical sciences, social sciences or the humanities, to create relevant computational models. These are exciting times and 5, 10 or 20 years from now, it will be interesting to take a look into life’s rear-view mirror and see the knowledge we have gained through analysis of large data sets and its impact on humanity.

This article was written by Praveen Mandal, SVP and GM of Emerging Technologies at supercomputing specialist SGI.