Is there a data volume required to do Big Data?

My company only has 100 gigabytes of data. Is it ready to “do Big Data”?

To be honest, this question does not really make sense.

The volume of data you have access to depends heavily on your activity. 10 GB of video recordings are nowhere near equivalent to 10 GB of text: the more than 5 million Yelp user reviews, covering nearly 174,000 different businesses, fit on a simple 4 GB USB stick¹. And yet, this data is enough to run properly designed learning and prediction algorithms.
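To put those figures in perspective, here is a back-of-the-envelope calculation in Python. It only uses the two numbers quoted above (over 5 million reviews, at most 4 GB), so the result is an upper bound on the average size of a review. The 5 Mbit/s video bitrate is an assumption chosen purely for illustration, and gigabytes are taken as 10^9 bytes.

```python
GB = 10**9  # decimal gigabyte, as used for USB sticks

def avg_bytes_per_record(total_bytes: float, n_records: int) -> float:
    """Average storage taken by one record, in bytes."""
    return total_bytes / n_records

# Figures from the text: 5+ million reviews fit in 4 GB,
# so each review takes at most this many bytes on average.
reviews = 5_000_000
per_review = avg_bytes_per_record(4 * GB, reviews)

# Assumption for illustration: video encoded at 5 Mbit/s.
VIDEO_BITRATE_BPS = 5_000_000  # bits per second
seconds_of_video = 10 * GB * 8 / VIDEO_BITRATE_BPS

# How many reviews of that average size fit in the same 10 GB.
reviews_in_10gb = 10 * GB / per_review

print(f"At most ~{per_review:.0f} bytes per review")
print(f"10 GB ~ {seconds_of_video / 3600:.1f} hours of video")
print(f"10 GB ~ {reviews_in_10gb / 1e6:.1f} million reviews")
```

Under these assumptions, the same 10 GB holds under five hours of video but more than twelve million text reviews, which is why raw volume says little about how much usable information you have.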

The term “Big Data” is vague, probably deliberately. It stands, all at once, for a volume of data, a set of technologies and techniques for managing and analyzing large amounts of data, specific types of data, a new paradigm for how a company operates…

As a result, Big Data (like “Artificial Intelligence” or “Deep Learning”, for that matter) is perceived as something impressive, perhaps even frightening or mysterious, and therefore tends to provoke two types of behaviour among digital players:

To the former, we would say that taking advantage of rising trends to promote a business is an integral part of how businesses operate. And this may genuinely drive the adoption of Big Data tools in the medium term.

To the latter, we answer that they are probably richer in data than they think… and that data does not need to be big to be useful.

If you are trying to find out whether your company is ready for Big Data, you have probably already come across Big Data Maturity Models (BDMMs), which were created precisely to assess your degree of maturity before you start a transition to Big Data.

But tell me, when you are considering doing something, do you like being told you are not mature enough to do it? Nah. No one likes it.

The same goes for Big Data. You may not yet be ready or equipped for it, but mature? Yes, you are, as soon as you work in a company whose raw material is data and you have started to question your operations and your future strategy.

BDMMs offer one opinion among many (and opinions are always worth hearing), but they can also discourage you from starting your transition to Big Data.

We consider you ready to start this change as soon as you:

What about the data volume?

This question still comes up at this point, but this time with regard to Machine/Deep Learning algorithms: depending on your activity, even a few hundred MB of data may be enough to benefit from the power and relevance of machine learning. What matters first and foremost is the quality of your data. But let’s not kid ourselves either: if all you have is an Excel file with 12 rows and 8 columns, just grab a pen and paper.

At the opposite extreme, infobesity, that is, the obsession with acquiring data or generating it without restraint, can quickly become a curse if you lack the right tools to manage these volumes, and if they grow so large that you can no longer really understand what they contain. It is better to have relatively little data that is clear and of good quality than a huge data center full of data that may be irrelevant, or even incorrect, and that leads you straight to completely wrong decisions.
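Checking data quality does not require any Big Data tooling to begin with. The Python sketch below is a minimal example, assuming records are plain dictionaries and using hypothetical field names (`customer_id`, `amount`, `date`) with made-up values: it counts rows missing a required field and exact duplicate rows, two of the simplest signs of data that could mislead an analysis.

```python
from collections import Counter

def quality_report(rows, required):
    """Count rows with a missing/empty required field and exact duplicates."""
    # A row is incomplete if any required field is absent, None or empty.
    missing = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required)
    )
    # Exact duplicates: identical key/value pairs; each extra copy counts once.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}

# Hypothetical records, for illustration only.
rows = [
    {"customer_id": "C1", "amount": 120.0, "date": "2019-03-01"},
    {"customer_id": "C2", "amount": None, "date": "2019-03-01"},
    {"customer_id": "C1", "amount": 120.0, "date": "2019-03-01"},
    {"customer_id": "", "amount": 75.5, "date": "2019-03-02"},
]

report = quality_report(rows, required=["customer_id", "amount", "date"])
print(report)
```

Half of these four rows are incomplete and one is a duplicate: a small, clean dataset would beat a much larger one with these proportions.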

In addition, access to external databases, which keeps growing and is becoming ever easier, makes it possible to considerably enrich your own data. These external data (weather, images, sounds, text, maps, etc.) add significant value by providing a broad and solid base of information that complements your internal data: the data that is specific to your business and makes it unique.
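In practice, enrichment often comes down to joining your records with an external source on a shared key. The Python sketch below is a minimal left join, assuming daily sales for a single store and a weather table keyed by date; all names and values are made up for illustration. Records without a match are kept as they are rather than dropped.

```python
from datetime import date

def enrich(records, external, key):
    """Left join: add external fields to each record sharing the same key.

    Records with no external match are kept unchanged. On a field-name
    clash, the internal value wins over the external one.
    """
    enriched = []
    for rec in records:
        extra = external.get(rec[key], {})
        enriched.append({**extra, **rec})
    return enriched

# Hypothetical internal data: daily sales for one store.
sales = [
    {"date": date(2019, 7, 1), "store": "Lyon", "units": 42},
    {"date": date(2019, 7, 2), "store": "Lyon", "units": 17},
    {"date": date(2019, 7, 3), "store": "Lyon", "units": 30},
]

# Hypothetical external data: weather for the same location, by date.
weather = {
    date(2019, 7, 1): {"temp_c": 31.0, "rain_mm": 0.0},
    date(2019, 7, 2): {"temp_c": 19.5, "rain_mm": 12.0},
}

enriched = enrich(sales, weather, key="date")
for row in enriched:
    print(row)
```

On real volumes, a dataframe library or a SQL `LEFT JOIN` does the same job; the logic of matching on a shared key stays the same.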

In any case, gradually adopting Big Data tools is a profitable investment for the future, as it will push you to improve the quality of your data, the way you manage it and how you then process it.

Just as a Dolby Surround 5.1 system will make you want to watch movies in 4K instead of 480p, access to the various Big Data solutions will make you want to explore new avenues in your data, along with the means to do so.


1 https://www.kaggle.com/yelp-dataset/yelp-dataset