The term big data turned up around 2012 and quickly became common in business journalism. At the time, nobody knew how important big data would become. The Economist claims that the world’s most valuable resource is no longer a raw material, but rather data.1 But before we explain how this can be the case, we must define what is meant by big data. The term itself indicates that it is about large quantities of data, which is correct. But how large? Extremely large is the answer. The more data that becomes available, the more intricate the patterns, connections and trends that can be revealed, which is particularly interesting when it comes to the purchasing preferences of people and companies. It is about understanding customers through the data that they have previously generated.
Without ‘big data’ you are blind and deaf in the middle of a freeway. | Geoffrey Moore, management consultant and writer
Let’s make a comparison to get an idea of the quantity of data that we are talking about. Data is stored in bytes (a byte is eight bits, in other words a combination of eight ones and zeros). If every byte is represented by a grain of rice, a modern laptop with a 500-gigabyte hard disk contains the equivalent of 1,500 trucks full of rice.2 This may sound like a lot compared to the first home computers, whose memory was typically equivalent to a teacup of rice, but it is nothing compared to some of today’s actors.
The amount of information that Internet companies like Google, Facebook and Amazon collect about their users corresponds to enormous amounts of rice. Every Google search, every status update and every online purchase generates new data that is saved and can be analyzed. Qualified estimates indicate that they have reached exabyte (10^18 bytes) levels, and Google at least is getting close to the zettabyte (10^21 bytes) level. Expressed in grains of rice, that would fill the entire Pacific Ocean. At the current growth rate (IDC claims that the quantity of data doubles approximately every two years) we will be up to the yottabyte (10^24 bytes) level within ten years.3 That corresponds to a ball of rice as large as our entire planet. The word “big” in big data thereby has its explanation:
- Byte: one grain of rice
- Kilobyte: one cup of rice
- Megabyte: eight bags of rice
- Gigabyte: three truckloads
- Terabyte: two container ships
- Petabyte: covers a small town
- Exabyte: covers a small country
- Zettabyte: fills the entire Pacific Ocean
- Yottabyte: a ball of rice as big as the Earth
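Each step in the scale above is a factor of 1,000. The following is a minimal Python sketch of that arithmetic, assuming decimal (SI) prefixes and reusing the rice equivalents from the list. The names `PREFIXES`, `RICE`, `to_bytes`, `largest_unit` and `trucks_of_rice` are made up for this illustration and do not come from any library.

```python
# Decimal (SI) byte prefixes, smallest first: each step up is a factor of 1,000.
PREFIXES = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
            "petabyte", "exabyte", "zettabyte", "yottabyte"]

# Rice equivalents copied from the list above (an analogy, not a measurement).
RICE = {
    "byte": "one grain of rice",
    "kilobyte": "one cup of rice",
    "megabyte": "eight bags of rice",
    "gigabyte": "three truckloads",
    "terabyte": "two container ships",
    "petabyte": "covers a small town",
    "exabyte": "covers a small country",
    "zettabyte": "fills the entire Pacific Ocean",
    "yottabyte": "a ball of rice as big as the Earth",
}

# From the list: one gigabyte corresponds to three truckloads of rice.
TRUCKS_PER_GIGABYTE = 3


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes (decimal prefixes)."""
    return amount * 1000 ** PREFIXES.index(unit)


def largest_unit(num_bytes):
    """Return the largest unit whose size does not exceed num_bytes."""
    unit = PREFIXES[0]
    for exponent, name in enumerate(PREFIXES):
        if num_bytes >= 1000 ** exponent:
            unit = name
    return unit


def trucks_of_rice(gigabytes):
    """Truckloads of rice for a given number of gigabytes, per the analogy."""
    return gigabytes * TRUCKS_PER_GIGABYTE


if __name__ == "__main__":
    laptop = to_bytes(500, "gigabyte")
    print(largest_unit(laptop), "->", RICE[largest_unit(laptop)])
    print("500 GB laptop:", trucks_of_rice(500), "trucks of rice")
```

With these assumptions, the 500-gigabyte laptop from the earlier comparison comes out at 500 × 3 = 1,500 trucks, which matches the figure in the text.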
A natural question is: what will they do with all that data? Brian Krzanich, the CEO of the processor manufacturer Intel, answers this question well: “Those who have the best data can develop the best AI tools, smart algorithms and data analysis.”4 The reason is, to a large degree, that today’s solutions for AI and machine learning require a lot of data for training. The systems become more precise and sophisticated the more data they have access to.
By attracting huge numbers of users to their websites and apps, certain companies gain access to enormous amounts of data that can be used to refine their AI systems further. Being number two is the same as sharing hundredth place: it will be unbelievably difficult to catch up and achieve equivalent performance without plenty of data. This is also the reason why so many apps are free today. By collecting unique data, companies hope to later be able to utilize this new knowledge, or sell it.
Selling data has become a large market in itself. It is probably true that the most important information is that which one doesn’t have. It is becoming increasingly common to buy third-party data that can be combined with one’s own. An example is the company Q Data, which has created a marketplace for data buyers and sellers. There, one can for example buy data about places that 90 million Germans visited during 2017, or data about 100,000 stroke patients during one month in 2018. The latter costs 400,000 US dollars. Another example is Oracle, which offers its customers five billion customer profiles and a billion company identities. Iota and Fetch are other emerging marketplaces for purchasing data.
It is obvious that people’s privacy can suffer as a result of this extensive data collection. We constantly leave digital footprints behind us when we use apps and websites. Each Google search, each Facebook like, each credit-card purchase, each phone call, each film on Netflix, each GPS navigation and each run with RunKeeper generates data that can have value on a market. If a service is free, it is usually you and your data that are the products. There is always free cheese in a rat trap, as the saying goes. Perhaps the most publicized recent case of a privacy intrusion was the exposure of Cambridge Analytica, which in violation of Facebook’s rules misappropriated 50 million user profiles belonging to American voters prior to the presidential election in 2016.5 A certain amount of care and sensitivity appears to be required when collecting and managing big data.
Regardless of which industry one is in or how large an actor one is, this trend clearly shows the importance of beginning to collect data. Those who get an early start can gain a lead that competitors may find difficult to close. Even if it is difficult today to understand its full value, the magnitude of this shift is enormous, which will become more apparent with time. And of course, exploring the value of the data that one already owns can also be worthwhile.
1 The Economist (2017, 6 May). The world’s most valuable resource is no longer oil, but data. Available: https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data
2 Moughal, J. (2017, 14 July). What is big data? [blog post]. Downloaded 2018-10-22 from https://www.c-sharpcorner.com/article/what-is-big-databig-data/
3 IDC (2014). Executive Summary: Data growth, business opportunities, and the IT imperatives. Downloaded 2018-10-22 from https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
4 Krzanich, B. (2016, 15 November). Data is the new oil in the future of automated driving. [blog post]. Downloaded 2018-10-22 from https://newsroom.intel.com/editorials/krzanich-the-future-of-automated-driving/
5 Svenska Dagbladet (2018, 4 April). Facebook: 55 000 svenska användare drabbade [Facebook: 55,000 Swedish users affected]. Available: https://dfw.cbslocal.com/2018/03/17/data-analytics-firm-harvested-50-million-facebook-profiles/