Big Data
Big Data refers to a volume of data (structured or unstructured) so large that it cannot be stored and processed
using the traditional approach (i.e. using conventional computers, processors or storage devices)
within a given time frame.
How much data is called Big Data?
Data in Gigabytes, Terabytes, Petabytes or Exabytes, or anything larger than these in size.
Even a small amount of data can be referred to as Big Data, depending on the context in which it is being used.
Example: If we try to attach a 100 MB document to an email, we would not be able to do so,
because the email system would not support an attachment of this size.
Therefore, with respect to email, this 100 MB attachment can be termed Big Data.
Let’s take an example from real-world scenarios.
Popular networking sites such as Facebook, Twitter, Instagram, LinkedIn, YouTube etc. each receive a huge volume of data on a daily basis. Facebook receives around 100 TB of data each day,
Twitter processes around 400 million tweets each day, LinkedIn receives many terabytes of
data each day, and on YouTube around 48 hours of new video are uploaded each minute.
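To get a feel for these numbers, the daily figures quoted above can be converted into per-second and per-day rates. The short Python sketch below is only a rough calculation; it assumes decimal units (1 TB = 1,000 GB), and the variable names are made up for illustration.

```python
# Rough rates derived from the figures quoted in the text.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day
MINUTES_PER_DAY = 24 * 60        # 1,440 minutes in a day

facebook_tb_per_day = 100        # figure quoted in the text
tweets_per_day = 400_000_000     # figure quoted in the text
youtube_hours_per_minute = 48    # figure quoted in the text

# Assumption: decimal units, 1 TB = 1,000 GB.
facebook_gb_per_second = facebook_tb_per_day * 1000 / SECONDS_PER_DAY
tweets_per_second = tweets_per_day / SECONDS_PER_DAY
youtube_hours_per_day = youtube_hours_per_minute * MINUTES_PER_DAY

print(f"Facebook: about {facebook_gb_per_second:.2f} GB per second")
print(f"Twitter:  about {tweets_per_second:.0f} tweets per second")
print(f"YouTube:  {youtube_hours_per_day} hours of new video per day")
```

In other words, under these figures a single site would have to absorb more than a gigabyte of new data every second, which is why the rate of arrival matters as much as the total size.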
But as the number of users increases day by day, storing and processing this data becomes
challenging. Since this data holds a lot of valuable information, it
needs to be processed in a short span of time. By using this valuable
information, companies can boost their sales and generate more revenue. But
using a traditional computing system, we would not be able to accomplish this
task within the stipulated time. Therefore, we can term this data Big
Data.
How is Big Data classified?
Basically, Big Data is classified into 3 categories.
1. Structured Data – Data that has a proper format associated with it is called Structured Data.
· Example – Data present within databases (a college student database etc.), Excel spreadsheets etc.
2. Semi-Structured Data – Data that does not have a rigid format (such as a fixed table layout) but still carries some organizing structure, such as tags, headers or named fields, is called Semi-Structured Data.
· Example – Data present within emails, log files, Word documents etc.
3. Unstructured Data – Data that does not have any format associated with it is called Unstructured Data.
· Example – Image files, audio files, video files etc.
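The difference between the three categories is easier to see with a small sample of each. The Python sketch below is illustrative only: the student table, the log lines and the byte string are made-up stand-ins, not real data.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns
# (a hypothetical college student table).
structured_text = "roll_no,name,department\n1,Asha,Physics\n2,Ravi,Chemistry\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: each record names its own fields, but the set of
# fields can differ from record to record (hypothetical log lines).
semi_structured_text = '''
{"level": "INFO", "message": "user logged in", "user": "asha"}
{"level": "ERROR", "message": "disk full"}
'''
log_records = [json.loads(line) for line in semi_structured_text.strip().splitlines()]

# Unstructured: raw bytes with no fields at all
# (a stand-in for the contents of an image file).
unstructured_blob = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01, 0x02])

print("Structured columns:", list(rows[0].keys()))
print("Semi-structured fields per record:", [sorted(r) for r in log_records])
print("Unstructured size in bytes:", len(unstructured_blob))
```

The structured rows can be queried by column straight away, the log records need a check for which fields are present, and the byte string has to be interpreted by a separate program before anything can be learned from it.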
Characteristics of Big Data
· Volume – The amount of data that is being generated.
· Velocity – The speed at which the data is being generated.
· Variety – The different types of data being generated.
· Veracity – The quality or trustworthiness of the data that is being generated.
Other important characteristics.
· Variability – The inconsistency in the data that is being generated.
· Value – The utility of the data being generated.
· Virality – The speed at which data can be transmitted over a network.
How is Big Data stored and processed?
The traditional approach to storing Big Data.
In the traditional approach, the data generated by organizations such as stock markets,
banks, hospitals etc. is given as input to an ETL system (ETL – Extract,
Transform, Load: these database functions are combined into one system and used
to pull data out of one system and transfer it to another). The ETL system
converts this data to a proper format and loads it into the database. The
end users can then perform analytics and generate reports from this data.
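The extract, transform and load steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL product: the bank transaction records, the table name and the function names are all hypothetical, and an in-memory SQLite database stands in for the organization's database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with inconsistent formatting.
RAW_CSV = """account,amount,branch
 A-101 ,250.50,north
A-102,99.5,South
A-101,100,NORTH
"""

def extract(text):
    """Extract: read the raw records out of the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: trim whitespace, convert amounts to numbers, normalize branch names."""
    return [
        (r["account"].strip(), float(r["amount"]), r["branch"].strip().lower())
        for r in records
    ]

def load(conn, rows):
    """Load: write the cleaned rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions (account TEXT, amount REAL, branch TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # assumption: in-memory database for the sketch
load(conn, transform(extract(RAW_CSV)))

# The end user's report: total amount per branch.
report = conn.execute(
    "SELECT branch, SUM(amount) FROM transactions GROUP BY branch ORDER BY branch"
).fetchall()
print(report)
```

Every record passes through a single machine and a single database here, which works for three rows but is exactly the part that becomes a bottleneck as the data grows.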
But as this data grows, it becomes very challenging to manage and process it using the traditional
approach.
Drawbacks of using the Traditional Approach
· Expensive System – It requires huge investments to establish or upgrade the system, which makes it
infeasible for small and mid-size companies.
· Scalability – It becomes a challenging task to expand the system as the data grows.
· Time-Consuming – With the traditional approach, the system takes a large amount of time to process the data and extract valuable information
from it.
Applications of Big Data in Various Fields
· Healthcare – With the advancement in technology and the development of health tracking devices, a huge number of
individuals' activities are monitored and analysed every day. The volume is too large for
these data to be handled manually. Such data are termed Big Data and
are processed to give a valuable outcome.
· Education – As demand changes with developments in technology, educational institutions and colleges
collect data from various sources and process it to determine the ongoing
demand for a particular profession or degree, and develop the curriculum as per
the needs. This helps the industry meet its demand by supplying people
with that particular domain knowledge.
· Insurance – Insurance companies collect data on “determinants of health” such as food habits, TV consumption,
marital status, purchasing habits etc. and process the data to estimate the
life expectancy of the individual and to determine the premium for their
health policies.
· Information Technology – By applying the principles of data analysis along with machine learning and machine
intelligence, an organization's IT department can predict potential issues and
avoid or overcome them. Thus, Big Data plays an important role in
Information Technology.
Benefits of Big Data Processing
Improved Customer Services – With Big Data, companies can analyse users' habits and patterns from various
social media sites and target their users more precisely, thus increasing the
customer satisfaction rate.
Early Identification of Risk of Product/Services – By replacing the traditional feedback form with Big
Data systems, organizations can detect changes in demand early and
hence adjust their strategies.