Big Data Characteristics
Volume
Volume means “How much data is generated.” Today, organizations, people, and systems generate or collect vast amounts of data, ranging from terabytes (TB) to petabytes (PB), exabytes (EB), and beyond.
The name Big Data itself refers to this enormous size. The size of data plays a crucial role in determining its value, and whether a particular dataset can be considered Big Data at all depends on its volume. Hence, ‘Volume’ is one characteristic that must be considered when dealing with Big Data.
Volume = Very large amount of data
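To get a feel for these units, the following Python sketch scales a raw byte count into decimal (SI) units, where each step is a factor of 1,000. The function name `human_readable` and the workload figures (10,000 devices each writing 5 GB per day) are hypothetical, chosen only to show how quickly modest per-device volumes add up to terabytes and petabytes.

```python
# Decimal (SI) units: each step is a factor of 1,000.
# (Binary units such as TiB and PiB use factors of 1,024 instead.)
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def human_readable(num_bytes: float) -> str:
    """Divide by 1,000 until the value is below 1,000 (or the largest unit is reached)."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


# Hypothetical workload: 10,000 devices, each writing 5 GB (5 * 10**9 bytes) per day.
daily_bytes = 10_000 * 5 * 10**9
yearly_bytes = daily_bytes * 365
print(human_readable(daily_bytes))   # 50.00 TB
print(human_readable(yearly_bytes))  # 18.25 PB
```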
Velocity
Velocity means “How fast data is produced.” Today, organizations, people, and systems generate huge amounts of data at a very fast rate. The term ‘velocity’ refers to the speed at which data is generated. How fast the data is generated, and how fast it can be processed to meet demand, determines the real potential of the data.
Big Data Velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
Velocity = Data produced at a very fast rate
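Velocity matters because processing has to keep up with generation. As a rough illustration, assuming constant arrival and processing rates and an initially empty queue, the backlog of unprocessed events grows linearly whenever data arrives faster than it can be handled. The function `backlog` and the rates below are hypothetical; real streams are bursty, so this is only a back-of-the-envelope sketch.

```python
def backlog(arrival_per_sec: float, processed_per_sec: float, seconds: float) -> float:
    """Unprocessed events after `seconds`, starting from an empty queue.

    Assumes constant rates; a consumer faster than the producer keeps the backlog at zero.
    """
    return max(0.0, float(arrival_per_sec - processed_per_sec) * seconds)


# Hypothetical stream: 12,000 events/s arrive; the consumer handles 10,000 events/s.
print(backlog(12_000, 10_000, 60))     # 120000.0 (after one minute)
print(backlog(12_000, 10_000, 3_600))  # 7200000.0 (after one hour)
print(backlog(12_000, 15_000, 3_600))  # 0.0 (consumer keeps up)
```

With arrivals running only 20% above processing capacity, millions of events are left unprocessed within an hour, which is why velocity concerns processing speed as much as generation speed.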
Variety
Variety means “Different forms of data.” Today, organizations, people, and systems generate huge amounts of data at a very fast rate and in many different formats. Variety refers to the heterogeneity of data sources and of the data itself, which may be structured or unstructured.
In earlier days, spreadsheets and databases were the only sources of data that most applications considered. Today, data in the form of emails, photos, videos, output from monitoring devices, PDFs, audio, etc. is also used by analysis applications. This variety of unstructured data poses certain challenges for storing, mining, and analyzing it.
Variety = Data produced in different formats
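One common way to cope with variety is to route each incoming item to a format-specific parser and keep anything without a fixed schema as raw content. The sketch below assumes, hypothetically, that each item arrives as text tagged with a content type; it parses JSON and CSV with Python's standard `json` and `csv` modules and stores everything else (for example, an email body) unparsed. The function `normalize` is illustrative only, and binary formats such as photos, video, or audio would need their own handling.

```python
import csv
import io
import json


def normalize(content_type: str, payload: str) -> dict:
    """Route one text payload to a parser based on its (hypothetical) content-type tag."""
    if content_type == "application/json":
        return {"kind": "structured", "data": json.loads(payload)}
    if content_type == "text/csv":
        # DictReader uses the first row as the header; all values stay strings.
        rows = list(csv.DictReader(io.StringIO(payload)))
        return {"kind": "structured", "data": rows}
    # No known schema (e.g. an email body): keep the raw text for later processing.
    return {"kind": "unstructured", "data": payload}


incoming = [
    ("application/json", '{"user": "a1", "amount": 42}'),
    ("text/csv", "user,amount\nb2,17\n"),
    ("text/plain", "Hi team, the sensor on line 3 is offline."),
]
for content_type, payload in incoming:
    print(normalize(content_type, payload))
```

Note that the CSV values come back as strings, so even structured sources often need type conversion before analysis.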
Three “Vs” Paradigm of Big Data
The Three “Vs” Paradigm (Volume, Velocity, Variety) of Big Data was defined by Doug Laney in 2001. If our organization’s data fits this 3V paradigm, we are dealing with a Big Data problem, and we should use Big Data solutions to solve it.
The Fourth “V”: Veracity
Veracity
Veracity means “The quality, correctness, or accuracy of captured data.” Of the 4 Vs, it is the most important for any Big Data solution: without correct data, there is no use in storing large amounts of it at a fast rate and in different formats. The data must provide correct business value.
Veracity also covers the inconsistency that data can sometimes show, which hampers the ability to handle and manage the data effectively.
Veracity = The correctness of data
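Veracity is usually checked with explicit data-quality rules. The sketch below applies three such rules to hypothetical sensor readings: a required identifier, a plausible value range, and duplicate detection on an (id, timestamp) key. The field names (`sensor_id`, `ts`, `temp_c`), the -50 to 150 °C range, and the functions `validate` and `quality_report` are all assumptions for illustration; real rules depend on the domain.

```python
def validate(record: dict) -> list:
    """Return the rule violations for one reading (an empty list means clean)."""
    problems = []
    if not record.get("sensor_id"):
        problems.append("missing sensor_id")
    temp = record.get("temp_c")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        problems.append("temp_c is not a number")
    elif not -50 <= temp <= 150:  # hypothetical plausible range in degrees Celsius
        problems.append("temp_c out of plausible range")
    return problems


def quality_report(records: list) -> tuple:
    """Return (share of clean records, {index: problems}) for a batch.

    An empty batch scores 0.0, since there is nothing to trust.
    """
    if not records:
        return 0.0, {}
    seen = set()
    issues = {}
    clean = 0
    for i, rec in enumerate(records):
        problems = validate(rec)
        key = (rec.get("sensor_id"), rec.get("ts"))
        if key in seen:
            problems.append("duplicate reading")
        seen.add(key)
        if problems:
            issues[i] = problems
        else:
            clean += 1
    return clean / len(records), issues


readings = [
    {"sensor_id": "s1", "ts": 1, "temp_c": 21.5},
    {"sensor_id": "s1", "ts": 1, "temp_c": 21.5},  # same id and timestamp again
    {"sensor_id": "", "ts": 2, "temp_c": 22.0},    # missing identifier
    {"sensor_id": "s2", "ts": 2, "temp_c": 999},   # physically implausible
]
score, issues = quality_report(readings)
print(score)  # 0.25
print(issues)
```

A low score like this one is a signal to fix the data pipeline before trusting any analysis built on it.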
Importance of Veracity
This 4th V answers the following questions:
- How accurate is that data in predicting business value?
- Do the results of a big data analysis make sense?
Big Data 4Vs in Simple Terminology
- Volume: The Amount of Data
- Velocity: The Speed of Data Generation and Processing
- Variety: The Different Types of Data
- Veracity: The Correctness of Data