AI and unlocking super-massive dark data

The rise of “big data” holds noble promises in terms of analysing behaviour, healthcare, trends and helping society as a whole. And it has a dark side.

Around 75% of “data” on the open web is unstructured; dark, if you will. With little in the way of commercially available tools to unlock its power, organisations are turning to technology companies, or buying them outright (as with Apple’s acquisition of Lattice), to make sense of this data.

And there is a lot of data out there. It’s estimated that by 2020 there will be 44 zettabytes of data available across the interwebs.

That’s 44,000,000,000,000,000,000,000 bytes of data.
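For anyone who wants to check the zeros, here is a quick sketch of the arithmetic. It assumes the SI definition of a zettabyte (10^21 bytes); the variable names are just for illustration.

```python
# Assumption: SI definition, 1 zettabyte = 10**21 bytes.
ZETTABYTE = 10**21

total_bytes = 44 * ZETTABYTE

# Format with thousands separators so the zeros are easy to count.
formatted = f"{total_bytes:,}"
print(formatted)  # 44,000,000,000,000,000,000,000
```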

With that amount of data to mine, machine learning is essential to maximise the potential of big data. And this is where AI comes in. Dark data consists of jumbled information without tags, lacking categorisation and structure. It’s the job of AI to make sense of this data jungle.

Big data has enormous value to organisations and, consequently, we see some of the largest data organisations on the planet (think Facebook, Google, Apple, NSA, and GCHQ) investing heavily to unlock data’s true potential. Although we might think of Facebook as a social network, or maybe an advertising company, in reality Facebook is a data company that monetises its access to vast data through a myriad of channels.

As people move to existing purely in the Information Complex, consumers in this post-product world have become the “thing” that organisations now mine. We are the raw material. Whether that’s for sales and marketing, or for managing society, it’s clear that everything we do and say is now up for grabs. For a price.

There are some clear benefits to society as a whole: crime detection, biotech research, relationship management and lifestyle automation. But there is also the growth in commercialising our “private data”.

There is, in fact, very little that is private anymore. Facebook, Google et al. all process every single piece of data that passes through them, and that’s not restricted to “posted” information. Every website you look at, every product, every search, and the adverts you see are all drenched in data and tracking information.

Much of the data might be notionally “anonymised”, but the raw data is there, and it’s only a matter of joining up the dots using powerful AI systems to make sense of all of it. These systems already exist. They might not be used widely, due to the legal and lobbying actions of pro-privacy organisations, but this will change. It is changing now.
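To make “joining up the dots” concrete, here is a toy sketch of the simplest version of the idea: matching nameless records against a named dataset on the attributes they share. Everything in it is made up for illustration (the records, the field names, and the `reidentify` helper), and it needs no AI at all; real systems would presumably use far larger datasets and fuzzier matching.

```python
# Hypothetical "anonymised" records: names stripped, other attributes kept.
anonymised = [
    {"postcode": "AB1 2CD", "birth_year": 1980, "gender": "F", "searches": ["mortgage rates"]},
    {"postcode": "EF3 4GH", "birth_year": 1975, "gender": "M", "searches": ["back pain"]},
    {"postcode": "ZZ9 9ZZ", "birth_year": 1990, "gender": "F", "searches": ["cheap flights"]},
]

# Hypothetical named dataset, e.g. something published or bought elsewhere.
public = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "postcode": "EF3 4GH", "birth_year": 1975, "gender": "M"},
]

# Attributes that are harmless alone but identifying in combination.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def key(record):
    """Build a lookup key from the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised_rows, named_rows):
    """Attach names to anonymised rows whose attributes match exactly one person."""
    index = {}
    for person in named_rows:
        index.setdefault(key(person), []).append(person["name"])

    matches = {}
    for row in anonymised_rows:
        names = index.get(key(row), [])
        if len(names) == 1:  # an unambiguous match means the row is no longer anonymous
            matches[names[0]] = row["searches"]
    return matches


result = reidentify(anonymised, public)
print(result)
```

Two of the three “anonymous” rows end up with a name attached; the third survives only because nobody in the named dataset shares its attributes.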

Sadly, the conflict between the needs of large organisations and the man-in-the-street’s desire for free services has generated a culture of acceptance, or ignorance, when it comes to poor (or no) privacy.

In fact, seeking any form of privacy on the internet today is costly in terms of effort, money, knowledge, interoperability, features, and performance. It represents another application of the two-speed web, with the rich and informed maintaining privacy at the expense of the less well-off, and less informed.

All your base are belong to us

Should we accept that all our data is for sale? That is to say, that everything we do, say, write, post, look at, or even think about (via deduction) is available to any organisation, anywhere in the world, for a price, while we get no direct slice of the sale price and don’t even know who’s got our data.

I’d argue not, and there are others who think the same. There are services out there that strive to protect us all from the excesses of data mining, but they are not popular as they lack the commercial clout to market themselves, and besides – the man-in-the-street expects all this for free. Doh!

There might be a glimmer of hope. GDPR comes into effect a year from now. It offers the potential for greater protection of privacy. Most individuals have never heard of it and probably have no interest in it, but be assured that big data companies are very aware of it and are lobbying hard to make it work for them, not against them.