Make no mistake: the Big Data movement has done more than just introduce a tidal wave of aggressive marketing campaigns. While it’s true that the “Big” in Big Data is an increasingly ambiguous term – as others have recently noted, someday we’ll just call it “data” again – there’s no denying that it has fundamentally changed the manner by which organisations today make decisions.

Beyond fundamentals, this movement has significantly increased the demand for two relatively new and critical organizational roles: Data Scientists and Data Explorers.

Defining the Roles

The Data Scientist role is key because the characteristics of Big Data (volume, variety and velocity) require professional data management, data mining and modeling expertise, with a special emphasis on the richest analysis types (including statistical and predictive). The Data Scientist also needs new skills gained from experience working with the modern, multi-structured data types common in a Big Data solution.

These new skills can be acquired most easily by attending vendor training (an example is IBM’s Big Data University, which offers a wide variety of online courses on managing, processing and analyzing Big Data) and by simply rolling up one’s sleeves and building a pilot project that uses some of the newest Big Data technologies.

The next most important role is called – at least at Jaspersoft – the Data Explorer. This term is broad and spans several analytic skill levels, but is meant to describe anyone in a business function who needs to put Big Data to work to help make better business decisions. The Data Explorer brings critical business domain knowledge to the table, knowledge without which gaining new insight from Big Data simply isn’t possible. The combination of the Data Scientist and the Data Explorer creates a new level of unity between business analysis and IT. In this sense, Big Data is the driver of this newfound unity.

Bridging the IT-Business Gap

The general division of labor between the Data Scientist and the Data Explorer is becoming better defined. The Data Scientist models and analyzes the data in its richest and most timely forms (admittedly in a wide variety of ways). In this sense, the Data Scientist may not even know the relevant questions to ask of the data before the analysis begins; rather, some of the most valuable discoveries may be the questions themselves.

The Data Explorer is most interested in iterative discovery, probably on more constrained data sets, which are better suited to making any number of specific data-driven business decisions. Data exploration is, therefore, typically better suited to answering pre-defined business questions.

Once the data is captured from its origin (e.g., a live set of web click-stream data) and managed into a Big Data source (such as Apache HBase, which runs on Hadoop, or Apache Cassandra), the Data Scientist and Data Explorer can get to work. The Data Scientist may be involved in helping to prepare the data for use where possible (a process which commonly requires some use of Hadoop MapReduce or a traditional ETL – extract, transform and load – tool). When Big Data is being directly accessed and used natively, as is often the case with Jaspersoft, the Data Scientist would probably validate that a robust and useful data connection has been established. At this point, the Data Explorer is in a position to use an analytic or reporting tool to access, probe and analyze the data.
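To make the data preparation step a little more concrete, here is a minimal sketch of the map, shuffle and reduce pattern applied to click-stream records. It is plain Python rather than Hadoop MapReduce itself, and the record layout, the sample data and every function name are assumptions made up for illustration; a real job would read from the Big Data source instead of an in-memory list.

```python
from collections import defaultdict

# Hypothetical click-stream records: (user_id, page_url) pairs.
# In practice these would be read from the Big Data source.
CLICKS = [
    ("u1", "/home"),
    ("u2", "/home"),
    ("u1", "/pricing"),
    ("u3", "/home"),
    ("u2", "/pricing"),
    ("u1", "/home"),
]


def map_phase(records):
    """Emit a (page, 1) pair for every click record."""
    for _user, page in records:
        yield page, 1


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get a view count per page."""
    return {key: sum(values) for key, values in grouped.items()}


def count_page_views(records):
    """Run the three phases in order on an iterable of click records."""
    return reduce_phase(shuffle(map_phase(records)))


page_views = count_page_views(CLICKS)
print(page_views)
```

The result is a small, constrained summary (views per page) of the kind a Data Explorer could then probe with a reporting tool, while the raw click records stay in the underlying store.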

The Big Data Skills Shortage

Modern analytic and reporting tools designed for working with Big Data are quickly becoming quite powerful and easy to use, even for the Data Explorer. While most articles and discussions focus on the skills shortage among Data Scientists (and reports of this shortage are largely accurate, in my estimation), what isn’t talked about enough is the skills shortage among Data Explorers. By this I mean that EVERY business person MUST possess sound analytic skills in order to thrive in this new, information-driven economy.

Many of those in business functions today do not possess an adequate analytic skill set. And so I think this “volume skills shortage” will soon be seen as the bigger overall problem to solve. Ideally, colleges and universities would more commonly offer degrees and certificates in “Analytics” or “Information-based Decision Making” – or something along these lines – so that a much larger number of graduates who possess a reasonable level of analytic acumen become available.

The Big Data Change Agent: Open Source Software

Lastly, I am proud that open source software has become such an important change agent during this past decade. It has provided an unbelievably affordable, powerful, secure and modern foundation for a completely new IT infrastructure (cloud-based, scale-out, mobile-connected) and has enabled affordable access to and use of these new Big Data types. In each major area and layer of software, we find open source leading in features, functions, and breadth of use. In fact, the continuing maturation of open source cloud and Big Data software systems is transforming the modern computing landscape right before our eyes. No wonder that nearly all of the most important Big Data projects have come from the open source community and the democratized skills it has nourished.

Ultimately, it’s this democratization that will allow more people in more organisations to thrive in an increasingly competitive, information-driven economy. It is clear that we are all now competing on the basis of time and information. Big Data and open source technologies are allowing nearly anyone to compete and succeed in this new battleground, regardless of size. Building and breeding more Data Scientists and Data Explorers is now required to allow the continued growth and success of this new Big Data era.

Brian Gentile is CEO of open source firm Jaspersoft.

TechWeekEurope Staff
