by Alexandra Ritter & Florian Stahl
Through the digital transformation, a new kind of data-driven economy has emerged “based on the datafication of virtually any aspect of human, social, political and economic activity” (Ciuriak 2018). Digital data is considered the new capital, as it has transformed numerous aspects of the economy owing to unique characteristics that differ dramatically from those of analog datasets (Cukier 2010). For instance, digital GPS technologies make real-time tracking of an individual’s location possible, giving companies micro-level insights into consumers’ preferences. Hence, digital data is becoming an important economic asset for companies’ corporate performance and growth (Chase 2013). The effective transformation of data into valuable insights can lead to “a new type of competitive advantage” (Lambrecht and Tucker 2017). To conduct such a transformation effectively, firms need to understand the specific characteristics of data.
This blog post intends to provide a concise overview of the most important economic attributes of digital data and aims to analyze the multiple trade-offs that firms face within the new digitalized world.
Definition of Data
Big Data is a concept with multiple definitions that vary depending on the context. However, it is typically described by three – or recently often by four – “V’s”, which are considered to form the basis of its definition: the volume of data, the variety of gathered data, the velocity at which data is collected, and the degree of veracity (Gandomi and Haider 2015). Moreover, data is also considered a non-rival intangible asset made of bits, which is a key distinction from goods made of atoms (Goldfarb and Tucker 2019). Additionally, several data definitions include the interplay of processing technology and analytical methods as necessary prerequisites to utilize data (e.g. Boyd and Crawford 2012). The following figure summarizes the understanding of big data in this blog post:
Distinction of Data, Information, and Knowledge
In everyday conversations, data, information, and knowledge are often used interchangeably, even though they refer to different economic goods. Therefore, this section defines and demarcates these terms to provide a better understanding of their effect on the economy.
First of all, data, information, and knowledge are all intangible assets, each with its own characteristics, creating different kinds of utility (Boisot and Canals 2004). Information is – compared to physical goods – non-rival and can be divided into bit strings, such as 110001 (Jones and Tonetti 2018). Data is a type of information “defined at the syntactic level” (Duch-Brown, Martens, and Mueller-Langer 2017), meaning that it can be interpreted as the grammar of a language or the arithmetic operators in mathematics. In turn, information is described by Duch-Brown, Martens, and Mueller-Langer (2017) as “the semantic content that can be extracted from data or signals.” Information transforms syntax into a meaningful context and thereby changes expectations or current knowledge. This extraction is only possible with prior structural and contextual knowledge (e.g. Boisot and Canals 2004). With knowledge of numbers and arithmetic operators, a person can, for instance, read a formula. In summary, data becomes information when it is processed, organized, analyzed, and placed in a meaningful context with the help of contextual and structural knowledge. The interplay of these terms is visualized in the following figure.
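The data-to-information step can be illustrated with a minimal sketch. All records and field names below are hypothetical; the point is that raw syntactic records become information only once contextual knowledge (here, that repeated visits to the same location signal a preference) is applied.

```python
from collections import Counter

# Raw data: syntactic records that carry no meaning on their own.
raw_gps_pings = [
    {"user": "u1", "location": "coffee_shop"},
    {"user": "u1", "location": "coffee_shop"},
    {"user": "u1", "location": "bookstore"},
]

def extract_information(pings):
    """Apply contextual knowledge: visit frequency approximates preference."""
    visits = Counter(p["location"] for p in pings)
    top_location, _ = visits.most_common(1)[0]
    return top_location

print(extract_information(raw_gps_pings))  # coffee_shop
```

Without the contextual rule encoded in `extract_information`, the ping records remain mere syntax; with it, they yield the semantic content "this user prefers the coffee shop".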
Characteristics of Data Value
Data by itself is not valuable. But its characteristics, in combination with a firm’s technical ability to analyze datasets using experiments and algorithms, can generate valuable insights, create profit-enhancing opportunities, and form the basis for strategic actions (Lambrecht and Tucker 2017). A firm’s challenge is the identification of critical pieces of data in large data pools originating from various sources. In addition, data can only be effectively utilized by recognizing and accounting for its quality features. Janssen, van der Voort, and Wahyudi (2017) underline that the better firms integrate suitable systems to handle and transfer big data, the easier it becomes to draw valuable insights from data analysis. This leads to the identification of new market drivers, key performance indicators, and consumer demand patterns. These insights maximize the efficiency of organizational processes and the alignment between demand and supply.
Volume of Data:
When thinking about big data, volume is usually the first thing that comes to mind. Titles like “Data, data everywhere” (Cukier 2010) highlight the immense amount of data that is available in today’s world.
Companies like Walmart handle over one million customer transactions every hour, which equals more than 2.5 petabytes of data (Cukier 2010). Rich datasets on their current or potential customers are “an information mine” (Ciuriak 2018b) if data noise can be successfully excluded from the analysis. Therefore, a large amount of analyzed data improves a firm’s ability to predict general trends as well as individual preferences (Acquisti 2014; Acquisti and College 2010). However, the value density of data falls as volume increases, demanding more extensive and time-consuming analysis. Therefore, firms have to consider how to deal with a massive volume of data.
Economies of Scale of Data:
An advantage of large datasets, however, is the resulting economies of scale, created through efficiencies formed by volume, not by variety. Average production costs fall as the volume of output increases (Goldhar and Jelinek 1983).
Economies of scale are steep in the data-driven marketplace by virtue of low distribution costs of digitalized products, (near) zero production costs (Rifkin 2014), and near-frictionless commerce. Additionally, collecting data advances data-driven services, which in turn may attract more customers, leading to more accessible data to collect. For example, the more people use Google as a search engine, the more accurate its services become as more data can be gathered. This positive feedback loop enables stronger companies to cement their market position while weakening smaller firms (OECD 2015; Mihet and Philippon 2018).
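The feedback loop can be made concrete with a stylized simulation. All parameters below are illustrative assumptions, not estimates from the cited literature: users generate data, accumulated data improves service quality, and better quality attracts more users, letting the larger firm pull further ahead of the smaller one.

```python
# Stylized data feedback loop: users -> data -> quality -> more users.
def simulate_feedback_loop(initial_users, periods,
                           base_growth=0.01, saturation=1_000_000):
    users, data, history = float(initial_users), 0.0, []
    for _ in range(periods):
        data += users                         # each user contributes data
        quality = data / (data + saturation)  # more data -> better service
        users *= 1 + base_growth + quality    # better service -> more users
        history.append(users)
    return history

big = simulate_feedback_loop(10_000, periods=10)
small = simulate_feedback_loop(1_000, periods=10)
# The large firm's lead over the small firm widens every period.
print(big[-1] / small[-1] > big[0] / small[0])  # True
```

Because the data-rich firm enjoys a strictly higher growth factor in every period, its relative lead compounds over time, mirroring the market-cementing effect described above.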
Economies of Scope of Data:
Economies of scope are the advantages that can result from producing or distributing multiple products within similar processes (Chandler and Hikino 1994). For example, a fashion outlet that sells clothes can in addition sell shoes, jewelry, and coffee and can therefore achieve economies of scope through diversification (Goldhar and Jelinek 1983).
Merging related datasets may yield more insights than interpreting and contextualizing the datasets separately would, while the costs of abstracting knowledge from them decrease. For example, a student of business administration deepens their knowledge by also reading business-related literature, as lectures and literature overlap. Furthermore, merging data from adjacent areas can also create economies of scope. Combining data on a person’s location with their shopping behavior and payment data might enable firms to use the added insights to create more individualized advertising (Duch-Brown, Martens, and Mueller-Langer 2017). A more diversified set of data thus leads to a positive feedback loop on which a firm can capitalize.
Economies of scope and economies of scale of digital data are the reason why firms are “data-hungry” (Duch-Brown, Martens, and Mueller-Langer 2017) and explain data trade and mergers, such as Facebook and WhatsApp or Google and DoubleClick. By extending the collection of data, firms are able to match data better and thereby track and analyze customer behavior and preferences in a way that was not possible before the merger. Therefore, mergers may create entry barriers because competitors cannot replicate the information derived from the merged datasets. However, at some point, economies of scope might add only little value and lead to diminishing returns (Duch-Brown, Martens, and Mueller-Langer 2017).
Searchability of Data:
It is often tempting for firms to collect a tremendous amount of data because the cost of looking for data is near zero. Due to online search engines like Google or Yahoo, every person can search the internet and find and compare information within seconds. Hence, the range and quality of search are enhanced, along with the ability to compare prices and product variety. Firms theoretically encounter higher price transparency, but the endogenous character of search costs gives companies the opportunity to manipulate web browsing to increase their surplus. Moreover, low search costs positively affect variety: it becomes easier to discover firms with fairly unknown niche products, which increases their brand awareness and gives them the opportunity to increase their margin.
Interoperability of Data:
Digital technology has changed how information is produced, stored, distributed and transported. The storage, replication, transportation, search and verification costs of data have decreased to a minimum, enabling high connectivity between and interoperability of digital datasets (Duch-Brown, Martens, and Mueller-Langer 2017).
Interoperability is “the ability of two or more systems to exchange (…) info and/ or data and subsequently, be able to use it“ (Duch-Brown, Martens, and Mueller-Langer 2017). For instance, taxpayers’ data can be combined with their social security number or social media accounts to detect fraudulent taxpayers (Janssen, van der Voort, and Wahyudi 2017).
Data as an Information Good:
Big data is an information good that is non-rival and partly excludable (Jones and Tonetti 2018), raising debates about privacy protection laws and welfare.
Non-rivalry of data:
Non-rivalry refers to the characteristic of a good that can be utilized by multiple parties at the same time without loss of utility for anyone (Lambrecht and Tucker 2017; OECD 2015). An analogy for a rival good is a cup of coffee: if one party takes a sip of it, the utility of the coffee is diminished for other parties. In contrast, a common example of a non-rival good is that a “person can start a fire without diminishing another’s fire” (e.g. Goldfarb and Tucker 2019). If a firm is using cookies to track a person’s online behavior, other firms can still use this data to interpret this person’s online traffic. A number of firms or algorithms can use data simultaneously without diminishing its amount, and the costs of replicating digital goods are near zero. Therefore, digital data has been found to be non-rival by nature (e.g. Duch-Brown, Martens, and Mueller-Langer 2017).
Excludability and accessibility of data:
Several authors (e.g. Acquisti and College 2010) have discussed excludability, privacy issues, and ownership of data, and whether governmental restrictions lead to an increase or decrease of welfare for society. The recent academic debate has especially focused on the question of whether non-personal data should be protected through a new legal framework and how to manage access to such data.
Non-personal data, for example traffic data, cannot be traced back to individuals (Surblyte 2016). Semi-personal data is data that is anonymized but can be traced back to individuals through technological tools. Analyzing semi-personal data can reveal individuals’ patterns of behavior or even preferences for particular products (Surblyte 2016). Therefore, semi-personal data is considered the most interesting data for enterprises. However, before discussing access to non-personal data, the question of whether or not data can be excluded has to be answered first.
“A good or an asset is excludable if you can prevent somebody from using it” (Mihet and Philippon 2018). Physical assets are excludable: whenever you close your restaurant, you exclude other people from entering it. In contrast, once data has been published, it cannot be protected from use by others. Many firms fear creative destruction if they share their data, as privileged access to data is regarded as a source of competitive advantage (Ciuriak 2018a). Additionally, if data is not a by-product of transactions, its creation is often costly (Duch-Brown, Martens, and Mueller-Langer 2017). However, Jones and Tonetti (2018) argue that not sharing data leads to an inefficient use of it.
Easley et al. (2018) discuss the two fundamental questions regarding data ownership: “Who gains and who loses if consumer data is shared and what happens to the total surplus? Second, if there is private ownership of consumer data how does the initial ownership of data affect sharing and thus consumer welfare, firm profit, and surplus?”
The foundation of legal ownership rights lies in the Arrow Information Paradox (Arrow 1972), which shows that once data is seen by a buyer, the valuable information in that data may be fully exposed. However, according to Duch-Brown, Martens, and Mueller-Langer (2017), firms can – especially on platforms – commercialize the value of their data without revealing it completely, opening up the opportunity for data trade. Additionally, the non-rival nature of data means that firms can bundle a large number of goods without substantially increasing costs and thereby reduce competition. Relevant examples of this phenomenon are Netflix, Spotify, and Apple Music (Goldfarb and Tucker 2019). In summary, because big data is non-rival and has near-zero costs of production and distribution, it is imitable, which raises the question of whether ownership protection laws are needed.
A monopolistic approach eliminates data sharing, which could lead to large power inequalities and block access for downstream users. Hence, firms would be able to charge users high prices. In addition, it has been shown that excessive protection of data cuts down innovation and increases an imbalance in knowledge. In contrast, data sharing induces economies of scope and increases welfare (Duch-Brown, Martens, and Mueller-Langer 2017). For that reason, economists (OECD 2015) argue that datasets should be a public good, much like Wikipedia, in order to improve market outcomes. Providing data for free signals the firm’s skills and quality to potential employees and customers. Moreover, offering the core product for free attracts new customers while allowing firms like Spotify to sell their add-ons at a premium (Goldfarb and Tucker 2019).
Even though sharing data always increases total surplus, firms may be better off in an economy with undisclosed data. They may maximize their gains in markets where they are monopolists, but they face losses in global markets when their private information becomes public. The market structure determines who stands on the winning side: if a firm does not own enough local markets, data sharing could decrease its profits (Easley et al. 2018). Nevertheless, Easley et al. (2018) point out that excluding data to protect competitive advantages results in efficiency losses and might lead to a prisoner’s dilemma. In their opinion, mechanisms like data sales or governance are needed to ensure “socially optimal sharing” (Easley et al. 2018).
Easley et al. (2018) and Jones and Tonetti (2018) conclude that data property rights matter. Depending on whether firms, consumers, a social planner, or the government own data, a different degree of welfare is reached. Jones and Tonetti (2018) show through a numerical example of their framework that welfare is maximized when data is owned by a social planner who shares data without regard for creative destruction or privacy. If firms own data, they exploit it extensively but refuse to share much of it with others, which results in lower welfare (89.17%). The costs of forced sharing might be a disincentive for firms to create data, or an incentive to add noise to diminish its value for the public. Prohibiting sharing is particularly harmful, as it leads to a welfare of only 34.29% compared to the perfect allocation. The authors infer that the initial ownership of data should be given to consumers, as they adequately balance data sharing and privacy, leading to a near-perfect allocation. Sharing data leads to a scale effect and increases the consumption and variety of consumer goods. The increasing returns to scale associated with data and its non-rival nature may even encourage firms to merge into a “single-economy-wide firm” (Jones and Tonetti 2018) to exploit the scale effect.
In summary, several authors have shown that sharing data leads to an increase in welfare for firms and consumers. However, their opinions differ regarding the optimal amount of sharing and how valuable data can be for firms.
Trackability of Microdata:
Through digital technologies, large amounts of individuals’ surfing and click behavior on the internet can be costlessly recorded and saved in databases. Marketers often use a combination of different web beacons, such as web tags or page tags, to record consumers’ past internet activities, derive their current needs, and detect common trends (Tucker 2010). As a consequence, firms are able to advertise suitable products at customized prices to individuals. For instance, micro-tracking empowers marketers to notice when a couple decides to become pregnant. Subsequently, they might decide to show them web banners with buggies or cribs (Acquisti 2014; Easley et al. 2018). Personalization and one-to-one markets become possible and lead to an increase in advertising effectiveness. However, micro-tracking also results in an asymmetric distribution of information and might infringe on consumers’ privacy (Gandomi and Haider 2015).
Tracking improves the ability to target specific markets or customers with reduced advertising costs because the likelihood of only addressing receptive customers is increased (Acquisti 2014; Acquisti and College 2010).
Low tracking costs enable the differentiation of products in a new way, for example through price discrimination (Goldfarb and Tucker 2019). Shiller (2016) shows that personalized price discrimination using individual-level tracking of web browsing behavior raises profits by 14.55%, with some consumers paying twice as much for a product as others. When targeting a consumer offline, companies have to rely on “noisy signals based on media demographics” (Goldfarb and Tucker 2019), while a consumer’s digital footprint can be used to directly target that particular person, allowing higher revenues for firms (e.g. Acquisti and College 2010). In turn, higher revenues allow firms to invest in new services and business models. Moreover, tailored advertising may also be beneficial for customers: targeting gives consumers useful information and insights on items they are interested in and reduces their cost of acquiring valuable information (Goldfarb and Tucker 2019).
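The revenue logic behind personalized price discrimination can be sketched in a few lines. The willingness-to-pay values below are hypothetical, not Shiller's data: with perfect individual-level tracking, each consumer can be charged their own willingness to pay, while a single uniform price must trade off margin against lost sales.

```python
# Hypothetical willingness-to-pay (WTP) values inferred from tracking.
wtps = [4.0, 7.0, 10.0, 15.0]

def uniform_revenue(wtps, price):
    """Revenue at one price: only consumers with WTP >= price buy."""
    return sum(price for w in wtps if w >= price)

def best_uniform_revenue(wtps):
    """Best achievable revenue with a single posted price."""
    return max(uniform_revenue(wtps, p) for p in wtps)

def personalized_revenue(wtps):
    """With perfect tracking, each consumer pays exactly their WTP."""
    return sum(wtps)

print(best_uniform_revenue(wtps))  # 21.0 (price 7 sells to three buyers)
print(personalized_revenue(wtps))  # 36.0
```

The gap between the two figures is the surplus that individual-level tracking lets the firm extract, which is why some tracked consumers end up paying far more than others for the same product.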
All in all, one can say that “big data is equivalent to an upward shift in the matching function between firms and customers” (Mihet and Philippon 2018). To maximize advertising effectiveness, firms need to conduct experiments on the influence of behaviorally targeted advertisements and then develop advanced algorithms and data processing tools to exploit microdata on website visitors.
Verifiability of Data:
Even though the reduction in tracking costs facilitates the verification of firms’ reputation, consumers still prefer face-to-face transactions and are more risk-averse towards online purchases. The difficulty for firms lies in establishing a system of trust in purely digital transactions. Goldfarb and Tucker (2019) summarize several papers showing empirically that better-rated and thereby more trusted sellers can demand higher prices (Houser and Wooders 2006; Lucking-Reiley et al. 2007). Consumers trust trademarks and user reviews as a source of information about product quality. To achieve higher sales, firms can provide information on product quality, for example on Amazon, to inform their customers through positive reviews that their product is the best available (Chevalier and Mayzlin 2006). Overall, it has become easier “to establish an online reputation (…) but the mechanisms for damaging that reputation in form of consumer complaints have also become easier” (Goldfarb and Tucker 2019). Social media enables a rapid spread of information not only about customers but also about firms. Jeff Bezos, the founder of Amazon, summarized the effect of the internet on reputation perfectly: “If you make customers unhappy in the physical world, they might each tell six friends. If you make customers unhappy on the Internet, they can each tell 6,000 friends” (Newman 2015). Firms can use the internet in favor of their reputation; however, there is also a ubiquitous danger of losing esteem within seconds.
Costly Characteristics of Data
Several characteristics of data make the creation of value from data challenging. Incomplete and noisy datasets, collected at different points in time, challenge firms seeking to exploit the value of data.
Understanding and dealing with the costly characteristics of data is of the utmost importance for companies, as they otherwise might extract erroneous insights from their collected data.
Volume and Variety of Data:
As previously discussed, the world is filled with a vast amount of data which is continuously growing. Data-driven firms are eager to collect all available data because they are unable to determine the value of the data beforehand (Janssen, van der Voort, and Wahyudi 2017). However, competitive advantage is not obtained by possessing the highest amount of data but rather by having the organizational capabilities and technologies to transform data into valuable information (Duch-Brown, Martens, and Mueller-Langer 2017).
Captured data comes in high variety and is therefore highly heterogeneous, falling into structured and unstructured data. “90% of generated data is unstructured” (IBM 2019), including tweets, images, videos, and audio. Consequently, traditional approaches, for instance linear modeling, are often futile (Cai and Zhu 2015; Varian 2014). Varian (2014) describes several new big data analysis tools that account for the complex and flexible relationships between datasets. He highlights the growing importance of machine learning techniques, as they make it possible to effectively combine and analyze big data, and advises economists to enhance their knowledge of machine learning.
Data analysis creates a trade-off between studying all available data and focusing on a subset of it. The key to success is figuring out which data should be filtered out as erroneous and which subset of data a firm should focus on. Hence, firms need to establish reliable tools to analyze and interpret data.
Velocity of Data:
Velocity is the third “V” of most data definitions and refers to the speed “at which data is generated and at which it should be analyzed and acted upon“ (e.g. Cai and Zhu 2015). IBM (2019) estimates a rate of 50,000 gigabytes per second of global internet traffic. Smartphones and sensors in particular have “lead to an unprecedented rate of data creation” (Gandomi and Haider 2015). Every 60 seconds, two hours of footage are uploaded to YouTube and 216,000 Instagram pictures are posted (IBM 2019). Uncountable amounts of data are generated within seconds, constituting a large source of information. But data is highly perishable and therefore needs to be analyzed in real time (Gandomi and Haider 2015).
Theoretically, a quick analysis of data enables immediate feedback. Search engines like Google constantly optimize their search algorithms and online product services by analyzing their users’ clickstream data. Online data can easily be used to improve product offerings, optimize user experiences on websites (Tucker 2010), and enable interactive relationships with individual customers. But only through the utilization of advanced big data technologies can firms analyze a high volume of data in a timely and effective manner to create “real-time intelligence” (Gandomi and Haider 2015). Otherwise, data becomes outdated and useless, ultimately leading to decision-making mistakes (Cai and Zhu 2015; Gandomi and Haider 2015).
Veracity of Data:
Veracity is probably the most important and simultaneously most challenging characteristic of data. It deals with the degree to which firms can trust and rely on data and the outcome of the analysis of that data. Data can be incomplete, out-of-date, fake and noisy (e.g. Gandomi and Haider 2015). According to IBM (2019), $3.1 trillion per year is lost in the US economy due to poor data quality. Data itself is never valuable until it is translated into relevant information. Yet low-quality data will always result in low-quality insights. To extract truthful, objective, and credible information, the collected data needs to be truthful, objective, and credible as well. Subjective data can be valuable to companies but it is crucial that firms are aware of its subjectivity (Lukoianova and Rubin 2014).
Lukoianova and Rubin (2014) developed a big data veracity index which measures the degree to which collected data is objective, truthful, and credible (OTC), normalizing these dimensions on the (0,1) interval, with 1 referring to maximum OTC. In contrast, “big data of low quality is subjective, deceptive and implausible” (Lukoianova and Rubin 2014), expressed by an index of 0. Deception is the intentional creation of false content with the aim of leading readers to wrong conclusions. The tremendous amount of textual content on the internet has led to a rise in deceptive content, which may lead to detrimental results. The 2016 U.S. presidential election shows how data can be manipulated and used to psychologically exploit people (Kauflin 2018).
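A veracity score in the spirit of the OTC index can be sketched as follows. The equal-weight average used here is an illustrative assumption, not Lukoianova and Rubin's exact formula; the point is only that each component is normalized to the unit interval and combined into a single score.

```python
# Hedged sketch of an OTC-style veracity score (equal weights assumed).
def otc_index(objectivity, truthfulness, credibility):
    """Combine three [0, 1]-normalized components into one veracity score."""
    for score in (objectivity, truthfulness, credibility):
        if not 0.0 <= score <= 1.0:
            raise ValueError("each component must lie in [0, 1]")
    return (objectivity + truthfulness + credibility) / 3

print(otc_index(1.0, 1.0, 1.0))            # 1.0 -> maximally objective, truthful, credible
print(round(otc_index(0.2, 0.1, 0.0), 2))  # 0.1 -> subjective, deceptive, implausible
```

A firm could apply such a score per dataset and route anything below a chosen threshold to manual review before it feeds any decision-making process.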
The decision to disclose or protect data leads to several trade-offs. Undisclosed data – data that is protected rather than shared with other parties – can impose entry barriers and limit competition. Protected data can also be seen as an opportunity cost: firms might choose to forgo potentially valuable data gathering in order not to conflict with privacy protection laws and to keep a good reputation. Furthermore, firms lose money if they overinvest in data security and protection (Acquisti and College 2010). According to Acquisti and College (2010), data restrictions hinder innovation because firms lack customer data, and consequently welfare is diminished. Privacy protection laws may also diminish the quality of available information about economic operators in the market, denying the market the signals needed to analyze and efficiently use data for production and pricing decisions (Acquisti 2014). Consumers, on the other hand, are often more likely to buy products from firms that protect their customers’ data. As a result, several trade-offs for firms and consumers regarding how much data should be shared have arisen.
Portability and Interoperability of Data:
Portability refers to the ability to move data between different parties. It makes services closer substitutes, which intensifies competition and forces firms to offer better quality and/or superior customer service (Duch-Brown, Martens, and Mueller-Langer 2017).
Interoperability of data constitutes one of the main reasons for data-driven mergers. The incentive behind a merger is to stop competitors from entering the market by preventing data portability and interoperability. This is especially relevant for firms in multi-sided markets that use their superior access to data to strengthen their market position. Platforms like Opodo or Momondo gain from a large amount of interoperable data. However, airlines like Lufthansa are exposed to more transparency and competition in their industry, putting them under price pressure (Duch-Brown, Martens, and Mueller-Langer 2017). Nevertheless, data sharing increases consumption and maximizes total surplus.
Data will impact the entire economy, not only through behavioral tracking and price discrimination. Managers therefore need to give data a greater share of their attention, as underestimating it could lead to the downfall of a company.
First, managers need to make their firms data-ready. To do so, they may establish a new organizational structure and develop new data analysis technologies. Data scientists are essential for extracting the right data from a tremendous pool of datasets. They should work together with marketers to tackle consumers’ preferences and predict trends. It is the obligation of managers to hire the right people and provide special training, for example with a focus on machine learning, to harness the power of data and improve decision making.

Second, due to the decrease of veracity in data, a systematic approach to analyzing large datasets has to be established. To ensure high-quality decisions, datasets have to be truthful, objective, and credible.

Third, managers should be aware that sharing data within and outside their firm does not harm but benefits their firm in generating valuable insights from collected data. Nevertheless, they should never betray their customers’ trust. In the fast-growing data-driven economy, reputation can be damaged quickly, and newcomers can easily steal customers from incumbents due to the interoperability of data.

Fourth, capital should be invested in detailed research on the characteristics of data. Data is a valuable economic input, and every decision builds upon the foundation of a good database. Therefore, investing in research on the characteristics and effects of data will result in increased returns, as superior performance in big data analytics creates a competitive edge. However, it should not be forgotten that even if data is analyzed in time and processed correctly, a competitive advantage derives from the interplay of data insights, innovative ideas, commercial strategies, service, and, especially, a good customer relationship.