How To Take An Apache Hadoop Concept Into Production

Apache: Big Data Europe 2015 – Gary Richardson, UK head of data engineering, KPMG, shares his top Hadoop project tips

Sell To Everyone

What people tend to forget is that, as technologists, we’re always pitching and selling ideas. We have to start thinking like guys who sell things for a living.

We’ve got to think of data less in terms of its technological merits and more in terms of its opportunities – data opportunities within an organisation. If you sit down with a bunch of sales guys they’ll say things like: “Who’s actually going to buy this? Who are the influencers?”

It’s really the first step in taking a fantastic idea and transitioning it to the production stage. You have to get out there and start selling the merits of the technology.

Experiment

In my team, we’ve experimented quite a lot. We were the first people in the entire KPMG organisation to implement a Linux box.

Our company used to be very risk-averse. “It’s too risky,” they would tell us. “You can’t do that. It’s open source. We don’t do open source.” We finally convinced them, got Linux and experimented. Most companies hate the idea of failure, but we took the chance and we’ve done some pretty miraculous things.

Think Data Architecture

You probably all think that’s what you do already. But what we don’t tend to do is talk to the people who stop projects from getting to production. In your average organisation, those are the guys who like to call themselves the enterprise architecture team.

In the world of Hadoop, you have to convince these people that Hadoop and its approach to data architecture is safe. Convincing these people is half the battle.

Talk About Investment Cases

We talk a lot about great use cases. What we’re not talking about, and the bit that turns on the finance guys who write the cheques, are investment cases. As well as thinking like the people who sell, we have to think like the finance people.

Take credit scoring, for example. Mortgage lenders can use the technology to improve detection of who they shouldn’t be selling a mortgage to. If it improves that detection even by, say, six percent, that can have a $100m impact on the bottom line. That can be done by taking every bit of logged data on the application form and comparing it to the data of good mortgage customers. To do this we need a log data processing capability, which Hadoop can give us.
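To make that concrete, here is a minimal PySpark sketch of the comparison described above – application-form logs scored against the profile of historically good customers. The paths, column names and thresholds are hypothetical illustrations for this article, not an actual KPMG pipeline; a real lender would train a proper model rather than compare against fixed ratios.

```python
# Minimal sketch: flag mortgage applications that deviate from the profile
# of historically good customers. All paths, columns and the 1.5x / 0.6x
# multipliers below are assumptions made for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mortgage-scoring-sketch").getOrCreate()

# Raw application-form logs landed in HDFS (hypothetical path and schema).
applications = spark.read.json("hdfs:///data/raw/mortgage_applications/")

# Aggregate profile of customers who repaid without incident.
good = spark.read.parquet("hdfs:///data/curated/good_customers/")
profile = good.agg(
    F.avg("income").alias("avg_income"),
    F.avg("debt_ratio").alias("avg_debt_ratio"),
)

# Cross-join the single-row profile onto every application and compute a
# crude deviation flag.
scored = applications.crossJoin(profile).withColumn(
    "risk_flag",
    (F.col("debt_ratio") > F.col("avg_debt_ratio") * 1.5)
    | (F.col("income") < F.col("avg_income") * 0.6),
)

# Flagged applications would be routed for manual underwriting review.
(scored.filter("risk_flag")
       .select("application_id", "income", "debt_ratio")
       .write.mode("overwrite")
       .parquet("hdfs:///data/review/flagged/"))
```

The point of the sketch is the shape of the workload: every field logged on the form is available for comparison at once, which is exactly the bulk log-processing capability Hadoop provides.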

Start Competing

When you look at what you can do with the likes of Apache Hadoop and Apache Spark, there is a great capability to compete. You can use them to out-compete your rivals.

You have to think competitively. There are great examples of terrific companies like Spotify who have gone from having 100 nodes to 1,700 nodes because they can see the benefit. It can improve the service they offer customers.

In the insurance world, imagine a situation where, before you make a claim because your washing machine has broken and water has gone everywhere, a device was streaming data into one of these platforms, which could predict the potential problem and recommend a service engineer be called out. This would make you, as an insurer, a much more appealing proposition than other insurers, and this is the kind of thing we need to be thinking of. The technology exists to make us better than our competitors, so we must take advantage of that.
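As an illustration, here is a hedged Spark Structured Streaming sketch of that idea: appliance telemetry arrives from a Kafka topic, and readings outside a normal operating range produce an alert to book an engineer before the leak happens. The topic name, schema and thresholds are all assumptions made for the example.

```python
# Sketch: stream washing-machine telemetry and flag readings that suggest an
# imminent fault. Broker address, topic, schema and thresholds are
# hypothetical; a production system would use a trained anomaly model.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("appliance-telemetry-sketch").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("vibration", DoubleType()),   # assumed sensor reading
    StructField("water_flow", DoubleType()),  # litres per minute, assumed
])

# Stream telemetry from a hypothetical Kafka topic into the platform.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "appliance-telemetry")
       .load())

readings = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("r")
).select("r.*")

# Flag devices drifting outside normal operating bounds.
alerts = readings.filter(
    (F.col("vibration") > 8.0) | (F.col("water_flow") > 12.0)
)

# Each alert row is a candidate for proactively dispatching an engineer;
# the console sink here stands in for a real alerting system.
query = (alerts.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()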
