Wednesday, February 11, 2015

Introduction to NoSQL

We already have an idea of what Big Data is, and we understand why we cannot afford to ignore it.

First, let us see why it is tough to handle Big Data the traditional RDBMS way.
  1. We all do this: in an RDBMS we define relations, so interrelated objects are basically different tables joined together.
    For example, a User object can have attributes like firstName and lastName, plus an Address object. Address in turn can have attributes like zip, city, and state. If we want to define the same thing the RDBMS way, we get two different tables, USER INFO and ADDRESS INFO, which we then need to join using a foreign key relation.

    Data representation in RDBMS and Memory

      So, think of reads and writes:
      i. While reading the data, we need to retrieve records from two different tables, merge them into one object, and represent it in memory.

      ii. While writing the data, we have to split the in-memory object into two different representations and save each one back to the data store.
      Either way, we are doing extra operations. In real-life scenarios, things are more complicated, with lots of joins in a query.
  2. We all do this one too: we define the structure of the data before we can work on it. In an RDBMS, the data structure has to be defined beforehand, so we cannot add anything dynamically. Suppose we need to add a contact to the User object. We have to add another table and establish a relation between Contact and User, and only then can we effectively use Contact in User. Nowadays, much of the data does not follow a fixed structure, so an RDBMS struggles to process it.
  3. A traditional RDBMS depends heavily on the processing power of a single machine. If we need to process more data, we need a more powerful machine. Vertical scalability can be achieved only up to a certain limit, so at some point we are bound to be saturated. A more powerful system also needs more knowledge to maintain. Cost is a key factor too: a more powerful machine is more costly.
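
The split-and-merge work described in the first point can be sketched in a few lines of Python using the standard `sqlite3` module. This is only an illustration under assumptions: the table and column names (`user_info`, `address_info`, `user_id`), the sample values, and the helper functions `save_user` and `load_user` are hypothetical and chosen to mirror the User and Address example above.

```python
import sqlite3

# The object as the application sees it in memory (sample values are made up).
user = {
    "firstName": "Asha",
    "lastName": "Roy",
    "address": {"zip": "700001", "city": "Kolkata", "state": "WB"},
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_info ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute(
    "CREATE TABLE address_info ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES user_info(id), "
    "zip TEXT, city TEXT, state TEXT)"
)


def save_user(conn, user):
    """Write path: split one in-memory object into two rows."""
    cur = conn.execute(
        "INSERT INTO user_info (first_name, last_name) VALUES (?, ?)",
        (user["firstName"], user["lastName"]),
    )
    user_id = cur.lastrowid
    addr = user["address"]
    conn.execute(
        "INSERT INTO address_info (user_id, zip, city, state) "
        "VALUES (?, ?, ?, ?)",
        (user_id, addr["zip"], addr["city"], addr["state"]),
    )
    conn.commit()
    return user_id


def load_user(conn, user_id):
    """Read path: join the two tables and merge the row into one object."""
    row = conn.execute(
        "SELECT u.first_name, u.last_name, a.zip, a.city, a.state "
        "FROM user_info u JOIN address_info a ON a.user_id = u.id "
        "WHERE u.id = ?",
        (user_id,),
    ).fetchone()
    return {
        "firstName": row[0],
        "lastName": row[1],
        "address": {"zip": row[2], "city": row[3], "state": row[4]},
    }


user_id = save_user(conn, user)
restored = load_user(conn, user_id)
```

Even for this tiny object, the code needs two inserts on the way in and a join plus a manual merge on the way out. Each additional nested object, such as Contact, would add another table and another join.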

    Now, what do we do with Big Data?
    Well, technology has evolved quite sensibly, so we can think of a more sophisticated solution to this problem.

    NoSQL to the rescue!

    NoSQL is a newer approach to processing Big Data. It defines data in a more logical way, which makes the data far more sensible to deal with.

    So, what is this NoSQL?
    I cannot define NoSQL, because no formal definition exists. It is a new-age concept for dealing with a new-age problem: Big Data.

    OK, does this deal with the problems?
    Yes, NoSQL deals with the major problems of handling Big Data in an RDBMS in a much finer way.

    1. NoSQL stores data in much the same way we define it in memory, so little or no extra processing is required while retrieving or storing it.
    2. NoSQL does not need the structure of the data to be defined beforehand. We can simply store the data as we want it to be, and still get what we need as and when required. (More on this is coming up.)
    3. NoSQL works on a distributed framework, where we just add more similar systems to the existing network when needed. No more powerful system is required to handle more data. NoSQL scales out instead of scaling up, so cost and maintenance effort are lower in the NoSQL case. The following diagram depicts the difference:
      Vertical vs Horizontal Scalability
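
To make the first two points concrete, here is a minimal sketch in Python. It does not use a real NoSQL product: the in-memory dictionary `store` and the helpers `put` and `get` are hypothetical stand-ins for a document store, and the keys and sample values are made up for illustration.

```python
import json

# Hypothetical stand-in for a document store: key -> JSON text.
store = {}


def put(key, document):
    """Store the whole object as one document, in the shape it has in memory."""
    store[key] = json.dumps(document)


def get(key):
    """Fetch the document back as one object, with no join or merge step."""
    return json.loads(store[key])


user = {
    "firstName": "Asha",
    "lastName": "Roy",
    "address": {"zip": "700001", "city": "Kolkata", "state": "WB"},
}
put("user:1", user)

# Adding a contact needs no new table and no schema change.
doc = get("user:1")
doc["contact"] = {"email": "asha@example.com"}
put("user:1", doc)

# A second document may have a completely different shape.
put("user:2", {"firstName": "Ravi"})
```

The nested address travels with the user as a single unit, and the two stored documents do not have to share a structure. Real document stores add indexing, querying, and distribution on top of this basic idea.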
    Well, good to know. But as far as I know, nothing is 100% perfect. There must be some catch in NoSQL as well.
    Yes, nothing is perfect, and neither is NoSQL. Consistency is the issue that needs special care while working with NoSQL. I assume you know about ACID. An RDBMS is built on top of ACID, but NoSQL is quite different in this area and generally offers weaker guarantees.

    What NoSQL works with instead is what we call 'BASE'. That said, some specialized NoSQL systems do offer ACID guarantees.

    'BASE'. Now, what is that?
    We'll cover this in our next discussion.

    Palash Kanti Kundu