There is a recent term in computer science, “Big Data,”
which has a very loose definition and causes a lot of confusion within the
industry. The problem it describes, however, is older than computers. A
perfect example of Big Data is ancient history that was recorded on scrolls. A
scroll could only hold so much information before it was full and could hold no
more. A single scroll was not too much to handle and carry around, but the
amount of recorded data quickly expanded to hundreds and thousands of scrolls.
This is what we refer to as Big Data, and we are still trying to come up with a
solution to handle data that grows too large to be managed easily.
Most companies have an entire division dedicated to the
management of data, and they have a big issue to face: we now produce and gather
more data in a single day than was collected in all of history
prior to the computer age. Big Data is not a problem that will just go away, and
one of the ways we have begun to manage this landslide of data is to form
tighter data structures.
I know, I have now introduced another new term to talk about
an old problem. Don’t worry, data structures are easy. Remember the scroll? It
had a linear data structure: things were recorded on the scroll in the order
that they happened and stored as characters of a written language. This is a
very loose structure that is usually referred to as unstructured data, because
you can write anything on a scroll. To have a real data structure, you need a set
format for recording the data. A great example of a data structure you have all
seen is your federal income tax return form. The form provides a set number of
blocks to record your information, and it is rejected if you go
outside of the boundaries. This is a
data structure in paper format.
So how do data structures help to manage Big Data? The
biggest way is by keeping the data in a known order, with a known size and
known fields. For example, you might want to keep an address book; it would
have all your friends’ names, addresses, phone numbers, and birthdays. What if
you just started writing your friends’ information on a blank sheet of paper in
a random order?
Bill, 9/1/73, 123 Main Street, Smith, MO, Licking, 65462, John
Licking, Stevens, MO, 4/23/85, 573-414-5555, 65462, 573-341-5565, 123 Cedar
Street.
It would quickly become impossible to find anyone’s contact information
in your address book. Even with just the two friends in my example, you already
have a Big Data problem: we don’t know what information belongs together.
If we take the same two people and provide a structure for
the data, it suddenly becomes much more usable.
Bill Smith, 123 Main Street,
Licking, MO 65462, 573-414-5555, 9/1/73; John Stevens, 123 Cedar Street,
Licking, MO 65462, 573-341-5565, 4/23/85.
It is still not easily readable by a
computer. There is a known order and a field separator, the comma, but
there is no known length, which complicates things
for computer software. A computer likes to store data structures of a known
length, so you need to define a size for each data field and a character to
represent empty space. In my example we will use 15 characters for every field,
and ^ will represent empty space.
Our address book now looks like this:
Bill Smith^^^^^
123 Main Street
Licking, MO^^^^
65462^^^^^^^^^^
573-414-5555^^^
09/01/1973^^^^^
John Stevens^^^
123 Cedar Stree
Licking, MO^^^^
65462^^^^^^^^^^
573-341-5565^^^
04/23/1985^^^^^
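To make the idea concrete, here is a minimal Python sketch of the same fixed-width layout. The names (FIELD_WIDTH, pack_field, pack_record, unpack_record) are my own inventions for illustration, not part of any standard library, and the sketch assumes every record has the same six fields. Notice that a value longer than 15 characters gets cut off, which is why the street in the listing above reads "123 Cedar Stree".

```python
FIELD_WIDTH = 15   # every field is 15 characters, as in the example above
PAD = "^"          # character standing in for empty space


def pack_field(value: str) -> str:
    """Truncate or pad a value to exactly FIELD_WIDTH characters."""
    return value[:FIELD_WIDTH].ljust(FIELD_WIDTH, PAD)


def pack_record(fields: list[str]) -> str:
    """Join the padded fields into one fixed-length record."""
    return "".join(pack_field(f) for f in fields)


def unpack_record(record: str) -> list[str]:
    """Slice a record back into fields and strip the padding."""
    return [
        record[i:i + FIELD_WIDTH].rstrip(PAD)
        for i in range(0, len(record), FIELD_WIDTH)
    ]


bill = ["Bill Smith", "123 Main Street", "Licking, MO",
        "65462", "573-414-5555", "09/01/1973"]
john = ["John Stevens", "123 Cedar Street", "Licking, MO",
        "65462", "573-341-5565", "04/23/1985"]

# The whole address book is one long string of equal-sized records.
book = pack_record(bill) + pack_record(john)
RECORD_SIZE = FIELD_WIDTH * len(bill)  # 6 fields x 15 characters = 90

# Because every record has the same length, the second person
# starts at a known position and no searching is needed.
second = unpack_record(book[RECORD_SIZE:2 * RECORD_SIZE])
print(second[0])  # John Stevens
print(second[1])  # 123 Cedar Stree (the 16th character was cut off)
```

The payoff of the known length shows up in the last few lines: to find the second friend, the program jumps straight to character 90 instead of reading through everything that came before.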