Thursday, March 26, 2015

Introduction to MongoDB

Let's start our journey in the MongoDB Way...

The first thing everyone is eager to know is,

What is MongoDB ?
MongoDB is a Non Relational JSON Document Database.

Now, each part of the definition carries weight, so let's look at each of them separately,

Non Relational: MongoDB is a NoSQL database which does not support relations between different collections (the counterpart of tables in an RDBMS). In an RDBMS we have multiple tables connected to each other through references, and the tables can be joined to fetch data. In MongoDB, at the time of writing, joins are not supported.
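
Without joins, related data is either embedded in one document or stitched together by the application. Here is a minimal sketch of the second option in plain Python. The two lists are only in-memory stand-ins for collections, and the names (customers, orders, customer_id, "Ananya") are made up for illustration, not taken from any MongoDB API.

```python
# Hypothetical in-memory stand-ins for two collections.
customers = [
    {"_id": 1, "name": "Palash"},
    {"_id": 2, "name": "Ananya"},
]
orders = [
    {"_id": 101, "customer_id": 1, "item": "Keyboard"},
    {"_id": 102, "customer_id": 2, "item": "Monitor"},
    {"_id": 103, "customer_id": 1, "item": "Mouse"},
]


def orders_with_customer(orders, customers):
    """Resolve each order's customer reference in application code."""
    # Index customers by _id so each lookup is a single dictionary access.
    by_id = {c["_id"]: c for c in customers}
    return [
        {"item": o["item"], "customer": by_id[o["customer_id"]]["name"]}
        for o in orders
    ]


joined = orders_with_customer(orders, customers)
print(joined)
```

The database does no matching here; the application holds the references and follows them itself.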

JSON: JSON stands for JavaScript Object Notation. It is a lightweight, text-based data-interchange format built on a subset of the JavaScript programming language. It is easy for humans to read and write and easy for machines to parse. JSON data is represented as key-value pairs. Following is an example,

    { "street": "Sukanta Nagar" }

More detail on JSON is available on the JSON website.
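
To see the key-value idea in action, here is a small sketch that parses JSON text and writes it back out using Python's standard json module. The city and zip fields are added only for illustration.

```python
import json

# An address written as JSON text; city and zip are illustrative extras.
text = '{"street": "Sukanta Nagar", "city": "Kolkata", "zip": 700098}'

# Parse the JSON text into a Python dictionary of key-value pairs.
address = json.loads(text)
print(address["street"])

# Serialize the dictionary back to JSON text.
round_trip = json.dumps(address)
print(round_trip)
```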

Document Database: We have already gone through document databases. If you want to refresh your memory, you are encouraged to review that topic once again.

MongoDB Features: So far, we know that MongoDB is a document database. Let's see what it offers,
  • MongoDB is highly scalable
  • MongoDB is JSON based, and JSON maps naturally onto the data structures of most programming languages. So, ideally, you can store anything in it as long as it can be represented as JSON
  • MongoDB supports dynamic schema. By this we mean that no schema format is specified up front, so documents in the same collection do not have to share a structure. You can view the related video from MongoDB University
  • MongoDB is highly efficient
  • MongoDB is easy to use
On the other hand, it lacks some features of an RDBMS, like,
  • MongoDB does not support joins
  • MongoDB does not support transactions across multiple documents
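
The dynamic schema point from the list above is easy to picture with plain Python dictionaries. This is only a sketch: the list named people stands in for a collection, and nothing here talks to a real MongoDB server.

```python
# A plain list stands in for a collection in this sketch.
people = []

# Two documents in the same "collection" with different fields.
people.append({"_id": 1, "name": "Palash Kanti Kundu", "city": "Kolkata"})
people.append({"_id": 2, "name": "Palash Kanti Kundu", "dateOfBirth": "21-08-1990"})

# Add a field to one document on the fly; the other document is untouched.
people[0]["occupation"] = "Software Engineer"

for person in people:
    print(sorted(person.keys()))
```

Nothing forces the two documents to agree on their fields, which is both the convenience and the risk of a dynamic schema.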

To get more detailed insight, you can check out here or watch the following video from MongoDB University.

Hope this article helps you get some information on MongoDB. Now it's time to set your system up for MongoDB usage. Here is the resource you will need to set up MongoDB.

Palash Kanti Kundu

Friday, March 13, 2015

Document Data Store

Life moves so fast. You gotta document the good times, man - Big Boi

Really, if you are not documenting events, you are not trying to remember them.
So true, right ?

Hey dude, listen, we are not here to discuss what we do or what we ignore. Could you please stick to the topic?
OK OK. I am getting there. I just want you to think of a document.

Document, which kind ?
Any document you want to remember, or maybe one you are carrying right now. It may be a photocopy of your identity proof, a newspaper article, a medical receipt or anything else. Whatever it is, it is a document, and documents help us remember things.

Yes, they are helpful at times.
That's the point. In the world of NoSQL it's true as well. Documents are helpful.

Really ? How ?
Well, as we mentioned earlier in the Big Data introduction, Big Data is mostly unstructured or semi-structured data whose data model can change dynamically. In that sense, the concept of a document is helpful.
To answer your second question: how often have you seen two documents with exactly the same structure?

Chances are low, right?
Does that sound familiar from the Big Data community?

Yes, I get it now. But let's see how you can give a structural definition of an unstructured model.
Well, to be honest, it's a really tough job for me. I have thought of describing it in the following way, although I know it's not complete,
Document Data Store: A document data store is one of the major categories of NoSQL database. It is a computer program designed for storing, retrieving and managing document-oriented information, also known as semi-structured data. Documents still have a structure, but that structure can be changed on the fly.

What do you mean by the term 'document-oriented' ?
Document is the heart of any document data store. A document is a data structure composed of field and value pairs.
Let's discuss this with a scenario. That way, we can get into it in a more natural way.

Let's say we have to provide identity proof documents for a competition. In most cases, such a document will have your name, your address, your photo and your date of birth. Let's say this is the most common format. As a user, you are not worried about how or what to do; you just have this information with you. Maybe you are bringing a passport while I am bringing a driving license. Both serve the main purpose of identity proof, but in a different format or style.

To stretch your thinking a little further, imagine someone brings a professional identity card, which serves the same purpose.

Puzzled? Here are some images to feed your brain.


Did you get the point?

Hmm, the documents differ in format but serve the same purpose. All have a name, an address and an image. But the addresses differ in format, the names differ in format, and the image sizes are different.

Good, now to add on to your point, the documents also differ in the content as a whole. Have you noticed that ?

Yes. Apart from the common things, they have some specific attributes as well, like a Govt. Stamp, an Employee Id or a License Number.

You might have overlooked the fact that all have their own unique key for identification.

Yes. They have unique key as well for identification.

So, that's it. Of the 4 types of NoSQL, you now have the concept of one type, known as the document-oriented data store.

What ?
Yes, dude you have grasped the concept of a document. Now let's try to put some bullets on them.
  1. Documents vary in format but some basic structure can be maintained
  2. Apart from the generic structure, a document may also contain some specific contents as well
  3. Each document has a unique key
  4. Documents can store any kind of information, such as text, images or binary data
  5. The structure of any document can be modified on the fly by adding or removing contents
That's it. Let's turn those mental images into documents. Let's see what they look like,

    {
      _id : 'DL-0420110149XXXX',
      name : 'Palash Kanti Kundu',
      occupation : 'Software Engineer',
      organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],
      gender : 'Male',
      address : [ {
           _id : 123456,
           type : 'Current',
           city : 'Kolkata',
           zip : 700098
      }, {
           _id : 156,
           type : 'Permanent',
           city : 'Barddhaman'
      } ]
    }

Now another document in the same category might be different. Let's see,

    {
      _id : 'KO-123',
      name : 'Palash Kanti Kundu',
      occupation : 'Software Engineer',
      organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],
      address : [ {
           _id : 123456,
           type : 'Current',
           city : 'Kolkata',
           zip : 700098
      } ],
      dateOfBirth : '21-08-1990'
    }

Another one might be like the following,

    {
      _id : 51531642,
      firstName : 'Palash',
      middleName : 'Kanti',
      lastName : 'Kundu',
      occupation : 'Software Engineer',
      organization : 'HCL Technologies',
      designation : 'Senior Software Engineer',
      previousOrganisation : 'Cognizant Technology Solutions',
      address : [ {
           _id : 123456,
           type : 'Work',
           city : 'Kolkata',
           zip : 700156
      } ],
      dateOfBirth : '21-08-1990',
      experience : ['Java', 'Spring', 'Hibernate', 'jQuery', 'Oracle PL/SQL'],
      maritalStatus : 'Single'
    }

All three of these describe a person, and they differ in format, but we can still find the information we need on demand. We can see that parts of the structure have been added, removed or modified from one document to the next. Still, the data is useful in all its flavours.
So, basically you are maintaining a structure in an unstructured way. That's where the beauty lies. That's what makes this very powerful.
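
Finding what we need across differently shaped documents usually means the application has to know about each shape. Here is a rough sketch in Python, using trimmed copies of the first and third documents above. The helper full_name and the variable names driving_license and employee_card are my own labels, chosen only for illustration.

```python
def full_name(doc):
    """Return a person's full name whichever layout the document uses."""
    if "name" in doc:
        return doc["name"]
    # Fall back to the split layout and skip any part that is missing.
    parts = [doc.get("firstName"), doc.get("middleName"), doc.get("lastName")]
    return " ".join(p for p in parts if p)


driving_license = {"_id": "DL-0420110149XXXX", "name": "Palash Kanti Kundu"}
employee_card = {
    "_id": 51531642,
    "firstName": "Palash",
    "middleName": "Kanti",
    "lastName": "Kundu",
    "dateOfBirth": "21-08-1990",
}

for doc in (driving_license, employee_card):
    # dateOfBirth is optional, so ask for it with a default.
    print(doc["_id"], full_name(doc), doc.get("dateOfBirth", "unknown"))
```

The flexibility of the store moves the responsibility for handling each shape into the code that reads the data.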

It is worth mentioning here that, when abused, this is the worst tool you will ever find in a data solution scenario, while handled with care it is a powerful one.

With great power comes great responsibility.

Palash Kanti Kundu

Friday, March 6, 2015

NoSQL Types Deep Dive

'Every time you dive, you hope you'll see something new - some new species. Sometimes the ocean gives you a gift, sometimes it doesn't.' - James Cameron

Well, the world of knowledge also works in the same way. But in most cases, it gives us a gift. Following on from the previous post, we are now going to deep dive into the NoSQL types to learn some more about this area of Big Data.

We learnt that we can categorize NoSQL databases into 4 major types, and we have some basic knowledge of them. Let's explore them a little further, taking each of them one by one.

  1. Key-Value pair databases - The simplest of all the four categories. Conceptually, we can look at them as a HashMap<Key, Value>, where the key is the primary key for the value to be stored and the value is the raw data. This value can be anything. The database just stores the value blindly, without caring what's inside.
    Since it is much like a HashMap<Key, Value>, the set of operations is also analogous. We can get the value of a key, put a key-value pair into the database, or delete the value associated with a key. Querying is only possible through the key itself. Mappings are usually accompanied by cache mechanisms to maximize performance.

      On the plus side, because all access goes through a single primary key, this type performs well and scales incrementally.


      On the minus side, we cannot query this database by value; all access must go through the primary key. It is up to the application to understand what it originally stored and how to process the value on retrieval.
      Implementing relationships between data is not recommended with this type.
      Since the database has no columns, updating part of the data is cumbersome.

      Use cases:
      Key-value databases are best utilized in the following situations:
      •Storing user session data
      •Maintaining schema-less user profiles
      •Storing user preferences
      •Storing shopping cart data
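
The get, put and delete operations described for key-value stores can be sketched in a few lines of Python. This is a toy model, not any real product's API: KeyValueStore and the key "cart:42" are made-up names, and the value is stored as opaque bytes to mimic a database that does not look inside.

```python
import json


class KeyValueStore:
    """Toy key-value store: values are opaque bytes, reachable only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The store never looks inside the value; the application encodes it.
cart = {"user": 42, "items": ["book", "pen"]}
store.put("cart:42", json.dumps(cart).encode("utf-8"))

# Updating part of the value means: fetch, decode, change, encode, store again.
restored = json.loads(store.get("cart:42").decode("utf-8"))
restored["items"].append("ink")
store.put("cart:42", json.dumps(restored).encode("utf-8"))

print(json.loads(store.get("cart:42").decode("utf-8")))
```

Notice that adding one item to the cart rewrites the whole value, which is the cumbersome partial update mentioned above.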
    1. Document Database - This one is my favourite data store, and we'll go through it in detail in the next sections. In fact, this type offers a flexible path for migrating from an RDBMS to NoSQL. It allows data to be stored in a semi-structured way. A document simply refers to a piece of data that has multiple attributes attached to it. The tricky part is that different documents can have different structures, or they may share the same structure throughout the whole application. The application has the flexibility to add or remove attributes in a document on the fly.
      This type works with XML, JSON or BSON data, which maps easily onto the in-memory representation of objects; this is really helpful for object-oriented programming languages like Java. The way data is stored also differs from a key-value store. Document databases don't store values blindly; they know the structure of the data and also store metadata, so querying on the data itself is possible. Interestingly, a document store can hold a document within another document, because the underlying data representations (XML, JSON, BSON) support nesting.

          {
            _id : 1,
            name : 'Palash Kanti Kundu',
            occupation : 'Software Engineer',
            organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],
            address : [ {
                 _id : 123456,
                 type : 'Current',
                 city : 'Kolkata',
                 zip : 700098
            }, {
                 _id : 156,
                 type : 'Permanent',
                 city : 'Barddhaman'
            } ]
          }
      Document Data
      Use cases:
      Document Store databases are useful when you have to implement
      •Content management systems
      •Blogging platforms
      •Analytics platforms
      •E-commerce platforms
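
Because a document database knows the structure of what it stores, it can answer queries on fields, including fields of nested documents. Here is a rough sketch of that idea in Python. The function find and its dotted path syntax are assumptions made for this example, the second document ("Example Person") is invented, and a real document database would normally use indexes instead of scanning every document.

```python
def find(collection, field_path, value):
    """Return documents whose dotted field path holds the given value."""

    def values_at(node, keys):
        if not keys:
            # End of the path: a list matches if any of its elements matches.
            if isinstance(node, list):
                yield from node
            else:
                yield node
        elif isinstance(node, list):
            # A list of nested documents: look inside each one.
            for item in node:
                yield from values_at(item, keys)
        elif isinstance(node, dict) and keys[0] in node:
            yield from values_at(node[keys[0]], keys[1:])

    keys = field_path.split(".")
    return [doc for doc in collection if value in values_at(doc, keys)]


people = [
    {
        "_id": 1,
        "name": "Palash Kanti Kundu",
        "organization": ["HCL Technologies", "Cognizant Technology Solutions"],
        "address": [
            {"type": "Current", "city": "Kolkata", "zip": 700098},
            {"type": "Permanent", "city": "Barddhaman"},
        ],
    },
    {
        "_id": 2,
        "name": "Example Person",
        "address": [{"type": "Current", "city": "Barddhaman"}],
    },
]

print([doc["_id"] for doc in find(people, "address.city", "Barddhaman")])
print([doc["_id"] for doc in find(people, "organization", "HCL Technologies")])
```

The second document has no organization field at all, and the query simply skips it instead of failing.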
    2. Column Family store - Column-family databases are row-based databases. In this type of database, data is stored in rows that have a unique row id, and instead of the documents and values used by document stores and key-value stores, the data is stored in the form of flexible columns.
      The key difference between a column store and an SQL database is that in a column store you don't have to keep the same set of columns in every row. You can add a new column to any row without having to add it to all the rows of the database. Because of their similarity to SQL databases, column stores are easier to query than the previously mentioned NoSQL databases, but they are not as flexible as document stores or key-value stores for storing arbitrary information.
      Column Family database
      Use Cases:
      Developers mainly use column databases in
      •Content management systems
      •Blogging platforms
      •Systems that maintain counters
      •Services that have expiring usage
      •Systems that require heavy write requests
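
The "flexible columns per row" idea can be pictured as a dictionary of rows, where each row is itself a dictionary of columns. This is only a mental model in Python, with made-up row ids and column names; real column-family stores add storage and distribution details that are not shown here.

```python
# Row id -> {column name: value}; each row carries only the columns it needs.
rows = {
    "user:1": {"name": "Palash", "city": "Kolkata"},
    "user:2": {"name": "Ananya"},
}

# Add a column to a single row; no other row changes.
rows["user:2"]["page_views"] = 1

# Counters are one of the use cases listed above.
rows["user:2"]["page_views"] += 1


def columns_of(row_id):
    """Return the column names present in one row, sorted for display."""
    return sorted(rows[row_id])


print(columns_of("user:1"))
print(columns_of("user:2"))
```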
    3. Graph databases - Connections are the main theme of this type. Its backbone is graph theory, with the concepts of nodes, edges and properties. Traversal algorithms such as BFS and DFS are used to explore connections; BFS, for example, finds the shortest path between two nodes in an unweighted graph. This type is extremely useful for highly connected data.
      This type provides great flexibility when querying relationships between data and also supports index-free searches.
      Graph databases
      Use cases:
      Graph-based databases are enormously useful in applications that have connected data, such as social networks, routing infocenters, recommendation engines, spatial data and mapping applications, and other applications requiring unique key relations.
      They are extremely useful in analytic applications, especially those that require predictions, recommendations and consequence-analysis engines.
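
To make the graph idea concrete, here is a small breadth-first search in Python that finds the shortest chain of connections between two people. The graph is a plain adjacency dictionary with placeholder node names; it illustrates the traversal only and is not how any particular graph database stores or queries data.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: shortest path by number of edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Who is connected to whom; the node names are placeholders.
friends = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(shortest_path(friends, "A", "E"))
```

BFS visits nodes level by level, so the first path that reaches the goal is a shortest one when every edge counts the same.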
    So, we now have a basic idea of the four NoSQL types.
    In the next sections, we'll be looking into document data stores and one of the popular implementations of this type, MongoDB.

    Palash Kanti Kundu

    Tuesday, March 3, 2015

    NoSQL Types

    At this point we are a little bit familiar with NoSQL databases, the CAP Theorem and BASE.

    Let's see, what we can do with NoSQL.
    NoSQL comes in different flavours. We broadly define four categories of NoSQL.

    1. Column Families or wide column store
    2. Document Store
    3. Key/ value Store
    4. Graph Data store
    Let's check each of them in the next section,
    1. Column Families or wide column store - we know about matrix transpose, where columns become rows and rows become columns. We can look at this type of NoSQL database in the same way. These databases work on the strategy of storing data as columns instead of rows.
    2. Document Store - the most interesting part of this kind of database is that you can put the data in the database the same way you hold it in memory. These databases follow the XML, BSON or JSON data models, which map easily onto object-oriented programming languages like Java.
      This kind is perfect for storing semi-structured information. So, it can be used in web applications, where semi-structured data is very common.
    3. Key/ value Store - this is the simplest of all NoSQL databases because it simply works with key-value pairs. Again, we can see this one as a HashMap<String, Object> or HashMap<Integer, Object> in Java.
    4. Graph databases - this kind of NoSQL database works on nodes, properties and relations between nodes. Social networking is a platform where NoSQL graph databases play a key role.
      An interesting point to note here is that, although they are part of the NoSQL world, some graph databases, such as Neo4j, support ACID transactions.
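
The transpose analogy used for column stores in the list above can be shown with a few lines of Python. This sketch only illustrates turning rows into columns; the sample records are made up, and it says nothing about how a real wide column store lays out its data.

```python
# Row-oriented: one record per entry.
rows = [
    {"id": 1, "name": "Palash", "city": "Kolkata"},
    {"id": 2, "name": "Ananya", "city": "Barddhaman"},
]

# Column-oriented: one list of values per column (the "transpose").
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Reading a single column no longer touches the other fields.
print(columns["city"])
```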
    Each type has its own pros and cons. In the next sections we will learn about them. For your reference, here is a list of NoSQL databases and their types: NoSQL databases.

    Hope this gives you a little bit of information on Big Data. If you liked this article, please reshare it with your network...
    Palash Kanti Kundu