Friday, March 13, 2015

Document Data Store

Life moves so fast. You gotta document the good times, man - Big Boi

Really, if you are not documenting events, you are not trying to remember them.
So true, right?

Hey dude, listen, we are not here to discuss what we do or what we ignore. Could you please stick to the task?
OK, OK. I am getting there. I am trying to get you to think of a document, any document.

A document? Which kind?
Any document you want to remember, or may be carrying right now. It may be a photocopy of your identity proof, a newspaper article, a medical receipt, or whatever else. It is a document, and documents help us remember things.

Yes, they are helpful at times.
That's the point. In the world of NoSQL, it is true as well. Documents are helpful.

Really? How?
Well, as we mentioned earlier in the Big Data introduction, Big Data is mostly unstructured or semi-structured data whose data model can change dynamically. In that sense, the concept of a document is helpful.
To answer your second question: how often have you seen two documents with exactly the same structure?

Chances are slim, right?
Sounds familiar from the Big Data community?

Yes, I got it now. But let's see how you can provide a structural definition of an unstructured model.
Well, to be honest, it's really a tough job for me. I have thought of describing it in the following way, although I know it's not complete:
Document Data Store: A document data store is one of the major categories of NoSQL database. It is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Documents still have a structure within them, but that structure can be changed on the fly.

What do you mean by the term 'document-oriented'?
The document is the heart of any document data store. A document is a data structure composed of field and value pairs.
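To make 'field and value pairs' concrete, here is a minimal sketch in Python. A plain dictionary stands in for a document; the field names and values are made up for illustration, and no particular document database is assumed.

```python
import json

# A document is a set of field/value pairs. Values may be scalars,
# lists, or other nested documents. All names here are illustrative.
document = {
    "_id": "ID-001",                      # hypothetical unique key
    "name": "A. Person",
    "languages": ["Bengali", "English"],  # a field can hold a list
    "address": {"city": "Kolkata", "zip": 700098},  # or a nested document
}

# Fields are read by name; nested fields by walking down the structure.
city = document["address"]["city"]

# A document of this shape serializes to JSON text and back unchanged.
as_text = json.dumps(document)
round_tripped = json.loads(as_text)
```

Nothing in the dictionary declares a schema up front; the structure is simply whatever fields the document carries.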
Let's discuss it with a scenario. That way, we can get into it in a more natural way.

Let's say we have to provide an identity proof document at a competition. In most cases, it will have your name, your address, your photo and your date of birth. Let's say this is the most common format of the document. As the user, you are not worried about how or what to do with it; you just carry this information with you. Maybe you are bringing a passport while I am bringing a driving license. Both serve the same main purpose, proof of identity, but in a different format or style.

To stretch your thinking a bit further, imagine that someone else brings a professional identity card, which serves the same purpose.

Puzzled? Here are some images to feed your brain.


Did you get the point?

Hmm, the documents differ in format but serve the same purpose. All have a name, an address and an image. But the address differs in format, and so does the name. The image sizes are different.

Good. Now, to add to your point, the documents also differ in content as a whole. Have you noticed that?

Yes. Apart from the common things, they also have some specific attributes, like a Govt. stamp, an Employee Id or a License Number.

You might have overlooked the fact that each has its own unique key for identification.

Yes. They have a unique key for identification as well.

So, that's it. Of the 4 types of NoSQL store, you now have the concept of one: the document-oriented data store.

What?
Yes, dude, you have grasped the concept of a document. Now let's try to put it into bullets.
  1. Documents vary in format, but some basic structure can be maintained
  2. Apart from the generic structure, a document may also contain some specific contents
  3. Each document has a unique key
  4. Documents can store any kind of information, like text, images, binary data etc.
  5. The structure of any document can be modified on the fly by adding or removing contents
That's it. Let's turn those mental images into documents. Let's see what they look like:

    {
      _id : 'DL-0420110149XXXX',
      name : 'Palash Kanti Kundu',
      occupation : 'Software Engineer',
      organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],
      gender : 'Male',
      address : [ {
           _id : 123456,
           type : 'Current',
           city : 'Kolkata',
           zip : 700098
      }, {
           _id : 156,
           type : 'Permanent',
           city : 'Barddhaman'
      } ]
    }

Now, another document in the same category might be different. Let's see:

    {
      _id : 'KO-123',
      name : 'Palash Kanti Kundu',
      occupation : 'Software Engineer',
      organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],
      address : [ {
           _id : 123456,
           type : 'Current',
           city : 'Kolkata',
           zip : 700098
      } ],
      dateOfBirth : '21-08-1990'
    }

Another one might be like the following,

    {
      _id : 51531642,
      firstName : 'Palash',
      middleName : 'Kanti',
      lastName : 'Kundu',
      occupation : 'Software Engineer',
      organization : 'HCL Technologies',
      designation : 'Senior Software Engineer',
      previousOrganisation : 'Cognizant Technology Solutions',
      address : [ {
           _id : 123456,
           type : 'Work',
           city : 'Kolkata',
           zip : 700156
      } ],
      dateOfBirth : '21-08-1990',
      experience : ['Java', 'Spring', 'Hibernate', 'jQuery', 'Oracle PL/SQL'],
      maritalStatus : 'Single'
    }

All three describe a person and differ in format, yet we can find the information we need on demand. We can see that parts of the structure have been added, removed or modified. Still, the data is useful in all its flavours.
So, basically you are maintaining a structure in an unstructured way. That's where the beauty lies. That's what makes this very powerful.
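Here is a rough sketch of that idea in Python. It is not how any real document database is implemented; `store`, `save` and `full_name` are hypothetical names invented for this illustration, and the three documents are trimmed-down versions of the ones above.

```python
# An in-memory stand-in for a document store: documents keyed by unique _id.
# Only a sketch; a real store adds persistence, indexing and a query language.
store = {}

def save(doc):
    """Insert or overwrite a document under its unique _id (hypothetical helper)."""
    store[doc["_id"]] = doc

def full_name(doc):
    """Return the name whichever of the two name formats the document uses."""
    if "name" in doc:
        return doc["name"]
    parts = [doc.get("firstName"), doc.get("middleName"), doc.get("lastName")]
    return " ".join(p for p in parts if p)

save({"_id": "DL-0420110149XXXX", "name": "Palash Kanti Kundu", "gender": "Male"})
save({"_id": "KO-123", "name": "Palash Kanti Kundu", "dateOfBirth": "21-08-1990"})
save({"_id": 51531642, "firstName": "Palash", "middleName": "Kanti",
      "lastName": "Kundu", "dateOfBirth": "21-08-1990"})

# The same question can be asked of every document despite the differing shapes.
names = [full_name(doc) for doc in store.values()]

# Structure can change on the fly: add a field to one document, drop it from another.
store["KO-123"]["gender"] = "Male"
store["DL-0420110149XXXX"].pop("gender", None)

# A field that a document lacks is simply absent, so read it defensively.
gender = store[51531642].get("gender")  # this document never had the field
```

Notice that the flexibility moves work into the reading code: `full_name` has to know about both name formats, because nothing in the store enforces one.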

It is worth mentioning here that, when abused, it is the worst tool you will ever find for a data solution, while handled with care it is a powerful one.
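One possible reading of 'handling with care', and it is only an assumption on my part, is to agree on a small basic structure and check it in the application before saving, since the store itself will accept anything. The field choices and the `problems` helper below are hypothetical, sketched in Python:

```python
# Hypothetical minimal structure every identity document is expected to carry.
REQUIRED_FIELDS = {"_id", "address"}
# Either name format seen in the examples is accepted.
NAME_FIELDS = ({"name"}, {"firstName", "lastName"})

def problems(doc):
    """Return a list of problems found; an empty list means the document passes."""
    present = set(doc)
    found = [f"missing field: {field}" for field in sorted(REQUIRED_FIELDS - present)]
    if not any(fields <= present for fields in NAME_FIELDS):
        found.append("missing a name")
    return found

ok = problems({"_id": "KO-123", "name": "Palash Kanti Kundu", "address": []})
bad = problems({"name": "Palash Kanti Kundu"})
```

Extra fields are still welcome here; the check only guards the common core that every reader of the data relies on.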

With great power comes great responsibility.

Palash Kanti Kundu
