Data Lakes and Big Data Analytics

Big Data Analytics

As the quantities of data collected today by businesses grow exponentially, managing and then analyzing these vast datasets is now an essential skill all enterprises must master.

If data is the new oil, then this resource must be harnessed by all businesses. Having a Big Data analytics strategy for your company is vital to ensure the real value – often hidden inside the data collected – is revealed and then utilized.

Data volumes are about to increase yet again, thanks to IoT and the rollout of the 5G network: once every device is connected, the quantity of data available will explode. If your business is struggling today with its data analysis approach, the near future will bring even greater challenges.

Having a detailed data analysis strategy is also a business imperative. Those companies that can connect their datasets to BI will be able to leverage the insights they gain to accelerate product and service development.

Big Data analysis touches every market sector: from banking to healthcare, using data to influence outcomes and to innovate is now a core competence all businesses are developing at pace. When Machine Learning techniques are applied, the BI delivered from these vast datasets becomes a resource no business can ignore.

The power of Big Data analytics is the insights delivered. These insights lead to increased customer personalization and accurate predictive marketing. These techniques can take multiple data points and assign them to individuals or groups. Here, data becomes a flexible tool all businesses can afford to use, as analytical approaches can scale.

“Predictive analytics is the high-value use case we are beginning to see in the market, particularly with the maturity of IoT solutions,” Michael Glenn, Market Intelligence at Exasol, told Silicon. “For example, operators of train lines fit their trains with sensors that can predict when a particular part is going to break, so that service can be done before the fact, rather than waiting for the breakage and subsequent disruptions in service.

“To do this well, you need a powerful analytics engine to look at historical performance data, and near-real-time data (e.g. the temperature of a particular part). There are massive opex reductions when predictive analytics are successfully deployed, as well as obvious benefits for customer experience when travel disruptions are reduced.”
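The pattern Glenn describes can be sketched in a few lines. The example below is a minimal illustration, not any operator's actual pipeline: the file names, feature columns and alert threshold are assumptions. A classifier is trained on historical, labelled sensor readings and then scores near-real-time readings for failure risk.

```python
# Illustrative predictive-maintenance sketch (hypothetical files and columns):
# train on historical sensor readings labelled with past failures, then flag
# parts in the latest readings whose predicted failure risk is high.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

history = pd.read_csv("sensor_history.csv")    # hypothetical historical data
features = ["temperature", "vibration", "hours_in_service"]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(history[features], history["failed_within_30_days"])

latest = pd.read_csv("latest_readings.csv")    # hypothetical near-real-time feed
latest["failure_risk"] = model.predict_proba(latest[features])[:, 1]

# Schedule service for any part whose predicted risk crosses a chosen threshold.
for _, row in latest[latest["failure_risk"] > 0.8].iterrows():
    print(f"Schedule maintenance for part {row['part_id']} "
          f"(risk {row['failure_risk']:.0%})")
```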

Big Data analytics

In its report, BBVA succinctly identifies five components of Big Data analytics: volume, velocity, variety, veracity and value. Balancing these components in your business’s approach to data analytics will ensure practical action can be taken from the information mined from these vast datasets.

The large cloud service providers have ensured their services have high levels of analytics built-in. Microsoft Azure is a good example. Research by Forrester concluded: “A wide range of IT professionals, including database administrators, data scientists, and infrastructure support become more efficient as a result of the time savings that Azure Analytics with Power BI provides with its better tools and automation. Across the board, the average time savings is 1.73 hours per week. Business users include power users such as business analysts as well as consumers of business intelligence. The average time savings is 1.75 hours per week. The total risk-adjusted savings in effort over three years is $4.9 million.”

It is also vital to understand the tools that are now needed to extract insights from the large datasets businesses are currently handling. Speaking to Silicon, Chris Roberts, head of data centre at Goonhilly Earth Station, said: “The sheer volume of data from IoT will make AI and machine learning a necessity. We are anticipating a wave of new applications that use the open mapping data from Earth observation satellites combined with IoT data from autonomous vehicles and smart speakers, for example.”

Roberts continued: “While environmental applications are first out of the blocks, there are many business opportunities arising from using this open data – from allowing companies to price competitors’ assets, to predicting GDP and global conflict. An early example was when Orbital Insight forecast US retailer JCPenney’s struggles even before it shut down 130 stores in 2017 and before Wall Street had an inkling. It was tracking satellite images of 250,000 car park spaces for 96 retail chains across the US when it spotted trouble.”

Seeing patterns

As much of the information contained within the data businesses are managing will be highly personal, security is paramount. As data is analyzed – often by automated AI systems – bias must be minimized in the results.

Says Warren Poschman, Senior Solutions Architect, Comforte AG: “One of the biggest challenges to data analytics is how to provide uniform privacy and data security across all platforms while still maintaining open access and not destroying the analytic value of the data – something that is necessary to glean the valuable insights that drive the modern business. This is where data-centric security technologies such as tokenization shine – allowing organizations to protect their data while maintaining referential integrity across multiple platforms.

“The key is not to rely on any one platform but instead to protect data across the enterprise such that data is protected upon ingestion, cross-platform analytics are performed on protected data, and shared data is actually protected data. Not only does this ensure privacy at all times and in all systems, but it also enables analytics on large datasets that would have normally been heavily restricted or off limits due to their data toxicity. The end result is that organizations can monetize their data and gain fruitful insights while being compliant with privacy regulations and satisfying internal risk and security.”
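As a rough illustration of the referential integrity Poschman describes, the sketch below uses deterministic tokenization: the same sensitive value always maps to the same token, so protected datasets from different systems can still be joined and analyzed. The key handling and token format here are assumptions for the example, not Comforte's product.

```python
# Minimal sketch of data-centric protection with referential integrity:
# the same input always produces the same token, so analytics can join and
# aggregate protected data without seeing the raw values. Illustrative only.
import hmac
import hashlib

SECRET_KEY = b"replace-with-a-properly-managed-secret"   # hypothetical key

def tokenize(value: str) -> str:
    """Deterministically map a sensitive value to a stable token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]

# Two systems tokenize the same customer identifier independently...
crm_token = tokenize("customer-8841")
billing_token = tokenize("customer-8841")

# ...and the tokens still match, so the protected datasets can be joined.
assert crm_token == billing_token
print(crm_token)
```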

For CIOs and CTOs, Big Data analytics is now a fact of life. Commenting to Silicon, James Lawson, AI Evangelist at DataRobot, said: “AI and Machine Learning are unlocking the true potential of Big Data analytics. Without intelligence, Big Data often has limited practical value to justify storage costs. Across every organization we’ve spoken to over the last 12 months, the use of AI is dominating the Big Data agenda for CIOs. This has been validated by external surveys, with PwC analysis suggesting 77% of executives see AI and Big Data as interconnected.”

McKinsey, in its report on advanced analytics, concluded: “Analytics create value when big data and advanced algorithms are applied to business problems to yield a solution that is measurably better than before. By identifying, sizing, prioritizing, and phasing all applicable use cases, businesses can create an analytics strategy that generates value.”

The C-suite is driving Big Data analytics. Increasingly, businesses can see the true value of the data they hold if they have robust and meaningful analytics in place. The future will see analytics continue to mature and expand in reach. Machine Learning is also developing rapidly in parallel, as automation is necessary to make sense of these expanding datasets.

Silicon in Focus

Aditya Sriram, Lead Data Scientist, Information Builders.

Aditya Sriram is a PhD candidate in the Faculty of Engineering at the University of Waterloo, where he is a member of KIMIA Lab (Laboratory for Knowledge Inference in Medical Image Analysis). He brings an extensive understanding of how Artificial Intelligence moves research and industries forward.

Since 2011, his research activities have encompassed content-based retrieval of medical images using machine learning, deep learning, and computer vision approaches. He has developed learning schemes and descriptors for medical imaging and published his work in top-tier journals and conferences. Aditya Sriram is a Lead Data Scientist at Information Builders in Toronto, Ontario, Canada.

How has Big Data analytics evolved over the last few years?

The genesis of Big Data-led organizations lies in strategizing best practices to better store and aggregate their data, using technologies like data lakes, business intelligence platforms and master data management (MDM). Over the last few years, Big Data solutions have evolved from data residency to the optimization and retrieval of data. Moving forward, the integration of Artificial Intelligence with Big Data will enable organizations to operationalize their big datasets to drive actionable insights.

What are the key trends in analytics with big datasets? Predictive analytics, for instance?

There are three key trends in analytics that depend on big datasets: Artificial Intelligence, streaming IoT, and Cloud Computing. These trends are progressively evolving business intelligence (BI) platforms to help organizations change how they conduct and gain insight into their businesses.

  • AI Platforms
    The coupling of AI and Data Governance is designed to help organizations sanitize their data to gain a single consolidated view of their data sources. In this application, AI is positioned as a trusted advisor, providing guidance and recommendations to detect outliers and suggest data corrections (a minimal illustration follows this list).
  • Streaming IoT data
    Many organizations are strategically incorporating sensory data and real-time data into their business processes. Although IoT data is useful information, the value-added component is to couple IoT with AI to provide more accurate responses to the data in real-time. The generated data responses emphasize the importance of a BI platform that communicates the results, or alerts, appropriately to humans for actionable outcomes.
  • Cloud Computing
    While some organizations may want to keep their data on-premise, it becomes expensive to store and maintain high volumes of data incoming from multiple sources. Hence, cloud and hybrid cloud solutions provide a quick and easy way to access big data, significantly reducing the overall cost for organizations.
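As a simple illustration of the “trusted advisor” role described under AI Platforms above, the sketch below flags outlying values in a hypothetical column and suggests a correction for a data steward to review. The interquartile-range rule is a deliberately basic stand-in for whatever models a real governance platform would use.

```python
# Flag outliers in a data source and suggest a correction (illustrative only).
import statistics

readings = [98.1, 97.9, 102.4, 99.0, 910.0, 100.2, 98.7]   # hypothetical values

q1, _, q3 = statistics.quantiles(readings, n=4)   # quartiles of the column
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
median = statistics.median(readings)

for i, value in enumerate(readings):
    if not low <= value <= high:
        print(f"Row {i}: value {value} looks like an outlier; "
              f"suggested correction: {median}")
```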

How is AI and Machine Learning impacting Big Data analytics?

The overlap of AI and Big Data is developing into a synergistic relationship in which the two disciplines work in concert: AI is valueless without meaningful data, and Big Data now depends on analytics driven by AI. The following are a few examples of where AI depends on, and leverages, large datasets:

  • Retrieval and reasoning
  • Automated learning and dynamic scheduling
  • IoT streaming data
  • Natural language processing
  • Computer vision (images or video data)

Natural Language Processing (NLP) is among the fields in AI that require massive amounts of data. For example, NLP techniques would not be possible without large samples of human speech, written records, and recordings. To obtain a generalized model for NLP, the algorithm needs to capture a high volume, variation and variety of linguistic data points to yield high accuracy. In conclusion, big data is continuing to grow, and AI will be used in concert with it to help the end-user by automating tasks.

Is a new approach needed for data analytics when IoT becomes more widespread?

As IoT engagement strategies expand within organizations, the combination of IoT streaming data and AI, enabling businesses to collect and transform data into usable and valuable information, will be at the forefront.

Common interdisciplinary use-cases include predictive maintenance, conversational agents (chatbots), automatic customization of KPI to enhance user experiences, dynamic thresholding, modern cybersecurity and outlier detection. In essence, IoT improves business models by providing accurate real-time information on the systems; thereafter, AI absorbs the dynamic nature of IoT data to provide actionable insights.
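Dynamic thresholding, one of the use-cases listed above, can be illustrated simply: rather than a fixed alert level, the threshold adapts to the recent behavior of the stream. The sensor values and smoothing parameter below are made up for the example.

```python
# Dynamic thresholding over a streaming feed: keep exponentially weighted
# estimates of the mean and variance, and alert when a reading deviates by
# more than three standard deviations from the current mean. Illustrative only.
stream = [20.1, 20.3, 19.9, 20.4, 20.2, 27.8, 20.0, 20.1]   # hypothetical feed

alpha = 0.2                      # smoothing factor for the running estimates
mean, var = stream[0], 0.0

for t, value in enumerate(stream[1:], start=1):
    deviation = value - mean
    threshold = 3 * (var ** 0.5)
    if var > 0 and abs(deviation) > threshold:
        print(f"t={t}: reading {value} breached the dynamic threshold "
              f"({mean:.1f} ± {threshold:.1f})")
    # update the running mean and variance so the threshold tracks the stream
    mean = mean + alpha * deviation
    var = (1 - alpha) * (var + alpha * deviation * deviation)
```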

How are privacy and security being supported across the analytical systems being used to find value in large datasets?

There are undoubtedly significant challenges in securing Big Data, which include protecting data and transaction logs, validating inputs, controlling access, and preserving privacy in real-time. Although encryption at multiple stages can ensure the confidentiality, integrity, and availability of data, enterprises are avidly working on big data best practices that promote innovation without sacrificing the privacy of data.

These practices include becoming highly competent in procuring and managing cloud services, with well-defined responsibilities for both the cloud service provider and the cloud service user; and sanitizing data to avoid the aforementioned privacy issues by consolidating it into a single source of truth through cleansing, pruning, matching, and merging data at the initial stages.
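That consolidation step can be sketched briefly. The example below uses made-up records and a simple similarity ratio rather than a full master data management workflow: near-duplicate records from different sources are matched and merged into a single golden record.

```python
# Match near-duplicate records and merge them into one golden record
# (illustrative matching rule and threshold; not a full MDM implementation).
from difflib import SequenceMatcher

records = [
    {"source": "crm",     "name": "Acme Industries Ltd",  "city": "Toronto"},
    {"source": "billing", "name": "ACME Industries Ltd.", "city": "Toronto"},
    {"source": "crm",     "name": "Globex Corporation",   "city": "Ottawa"},
]

def similar(a: str, b: str) -> float:
    """Crude string similarity between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

golden = []
for record in records:
    match = next((g for g in golden if similar(g["name"], record["name"]) > 0.9), None)
    if match:
        # merge: keep existing values, fill in any fields the golden record lacks
        for key, value in record.items():
            match.setdefault(key, value)
    else:
        golden.append(dict(record))

print(f"{len(records)} source records consolidated into {len(golden)} golden records")
```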