Sunday, November 29, 2015

Working with the shell - JavaScript

In our previous post on creating collections, we used the mongo client, which is a shell client. This client is an interactive JavaScript interpreter, so you can write a JavaScript program and see the result immediately.

OK, let me prove my point. Let's start by running the server and the shell.

In two different Command Prompt windows, run the following commands, one in each,

mongod

mongo

Now, you are going to see something similar to the following on the client,
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Palash>mongo
2015-11-29T13:53:42.288+0530 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
MongoDB shell version: 3.0.1
connecting to: test
>

OK, that's it. You have successfully started the client shell. Now, let's run a simple JavaScript program on it.
> for(i=0;i<5;i++){print("Hello World");}

On running this program, we get the following output, as expected,
Hello World
Hello World
Hello World
Hello World
Hello World
>

That was a fairly simple JavaScript program. Now, let's define some variables in the shell,
> x=1
1
> y=2
2
> z=5
5
>

You can see that when a variable is assigned a value, the value is printed on the next line: the shell prints the value of each expression you type, and an assignment evaluates to the assigned value. Now let's query the variables back. We simply type a variable name; the shell evaluates it and prints the corresponding value. Let's try that,
> x
1
> y
2
> z
5
>

In a similar fashion, we can define JavaScript objects, which the shell prints in JSON form. Here is an example,
> doc={name:"Palash"}
{ "name" : "Palash" }

Now, let us try to use the defined variables in a program,
> for(var i=0;i<z;i++){print(JSON.stringify(doc));}
{"name":"Palash"}
{"name":"Palash"}
{"name":"Palash"}
{"name":"Palash"}
{"name":"Palash"}
>

In our next example, we are going to insert lots of documents into our database, using the loop counter as the _id of each document. Let's write a short JavaScript program that creates 9999 documents in a collection named huge.

But first, let's run 'show tables' to check whether a collection with that name already exists.
> show tables
Employee
Employees
system.indexes
>

As you can see, there is no collection with that name in my database. If you already have a collection named huge, please use a different name for the following test.
> length=9999
9999
> for(var i=0;i<length;i++){
... doc = {_id:i, name:"developer"+i};
... db.huge.insert(doc);
... }
WriteResult({ "nInserted" : 1 })
>
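The loop above issues 9999 separate insert() calls. As a sketch, here is the document-building part of that loop pulled out into a plain JavaScript function. The name makeDeveloperDocs is hypothetical, not part of MongoDB, and the function only builds an array in memory; it never talks to the server. In the mongo shell, such an array could presumably be handed to a single db.huge.insert(docs) call, since the shell's insert() also accepts an array of documents.

```javascript
// Build the same documents the shell loop inserts one at a time:
// { _id: 0, name: "developer0" } ... { _id: n - 1, name: "developer" + (n - 1) }
// Hypothetical helper: it only creates plain objects and never contacts a server.
function makeDeveloperDocs(n) {
  var docs = [];
  for (var i = 0; i < n; i++) {
    docs.push({ _id: i, name: "developer" + i });
  }
  return docs;
}

var docs = makeDeveloperDocs(9999);
// docs.length is 9999; docs[0] is { _id: 0, name: "developer0" }
```

Building the documents separately from inserting them makes it easy to check their shape before anything is written to the database.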

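The single WriteResult with nInserted : 1 printed after the loop is the result of its last insert() call only; the shell echoes the value of the last statement, not one line per insert. So, let's go back to the database and check that all the documents made it,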
> db.huge.count();
9999
>

So, the new collection holds 9999 documents. Now let's look at some of them,
> db.huge.find();
{ "_id" : 0, "name" : "developer0" }
{ "_id" : 1, "name" : "developer1" }
{ "_id" : 2, "name" : "developer2" }
{ "_id" : 3, "name" : "developer3" }
{ "_id" : 4, "name" : "developer4" }
{ "_id" : 5, "name" : "developer5" }
{ "_id" : 6, "name" : "developer6" }
{ "_id" : 7, "name" : "developer7" }
{ "_id" : 8, "name" : "developer8" }
{ "_id" : 9, "name" : "developer9" }
{ "_id" : 10, "name" : "developer10" }
{ "_id" : 11, "name" : "developer11" }
{ "_id" : 12, "name" : "developer12" }
{ "_id" : 13, "name" : "developer13" }
{ "_id" : 14, "name" : "developer14" }
{ "_id" : 15, "name" : "developer15" }
{ "_id" : 16, "name" : "developer16" }
{ "_id" : 17, "name" : "developer17" }
{ "_id" : 18, "name" : "developer18" }
{ "_id" : 19, "name" : "developer19" }
Type "it" for more
>

Only the first 20 documents are displayed, and at the end the shell tells us to type "it" for more. Let's see what happens when we type "it",

> it
{ "_id" : 20, "name" : "developer20" }
{ "_id" : 21, "name" : "developer21" }
{ "_id" : 22, "name" : "developer22" }
{ "_id" : 23, "name" : "developer23" }
{ "_id" : 24, "name" : "developer24" }
{ "_id" : 25, "name" : "developer25" }
{ "_id" : 26, "name" : "developer26" }
{ "_id" : 27, "name" : "developer27" }
{ "_id" : 28, "name" : "developer28" }
{ "_id" : 29, "name" : "developer29" }
{ "_id" : 30, "name" : "developer30" }
{ "_id" : 31, "name" : "developer31" }
{ "_id" : 32, "name" : "developer32" }
{ "_id" : 33, "name" : "developer33" }
{ "_id" : 34, "name" : "developer34" }
{ "_id" : 35, "name" : "developer35" }
{ "_id" : 36, "name" : "developer36" }
{ "_id" : 37, "name" : "developer37" }
{ "_id" : 38, "name" : "developer38" }
{ "_id" : 39, "name" : "developer39" }
Type "it" for more
>

Now we see the next 20 documents on the screen, followed by the same instruction to type "it".

If you keep typing "it", the shell keeps printing the next 20 documents until the whole result set has been displayed.
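The shell prints a query's results 20 documents at a time and waits for "it" before printing the next batch (in the legacy mongo shell this page size is, to my knowledge, controlled by DBQuery.shellBatchSize). The plain JavaScript sketch below only mimics that paging on an in-memory array; the names pageOf and pageCount are hypothetical, it is not how the shell's cursor is implemented, and it assumes a page size of 20 to match the output above.

```javascript
// Return the slice of `items` shown on page `pageIndex` (0-based) when the
// display shows `pageSize` items at a time, mimicking the shell's "it" paging.
function pageOf(items, pageIndex, pageSize) {
  var start = pageIndex * pageSize;
  return items.slice(start, start + pageSize);
}

// Number of pages needed to show every item.
function pageCount(total, pageSize) {
  return Math.ceil(total / pageSize);
}

var ids = [];
for (var i = 0; i < 9999; i++) {
  ids.push(i);
}

var firstPage = pageOf(ids, 0, 20);    // _id 0 .. 19, as printed by find()
var secondPage = pageOf(ids, 1, 20);   // _id 20 .. 39, as printed after "it"
var pages = pageCount(ids.length, 20); // 500 pages: 499 full pages plus one of 19
```

With 500 pages, the first one printed by find() plus 499 "it" commands would display all 9999 documents of the huge collection.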

OK, we'll discuss this particular topic in a later post. Let's get back to the shell.

So, the shell can interpret any JavaScript program. This is handy, and we have just used it to create a large collection. Large collections will prove useful when we discuss performance in later posts.

Well, that is just one of the most beautiful features the mongo client offers. Here are some more features that we can use. Take some time to go through the following video,


I hope you now have a basic understanding of the features the mongo client provides. In our next post, we'll go through MongoDB's internal data representation, known as BSON.

Palash Kanti Kundu

Sunday, August 23, 2015

Working with the shell - Create Collection, Insert Document

Now that you have installed MongoDB and have the shell ready to use, let's do some basic operations.

First, start the MongoDB server.
Open a command prompt/shell and run the following command,
mongod

When you run the command, you are likely to see the following output,
2015-08-23T12:36:06.334+0530 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
2015-08-23T12:36:06.393+0530 W -        [initandlisten] Detected unclean shutdown - C:\data\db\mongod.lock is not empty.
2015-08-23T12:36:06.423+0530 I JOURNAL  [initandlisten] journal dir=C:\data\db\journal
2015-08-23T12:36:06.438+0530 I JOURNAL  [initandlisten] recover begin
2015-08-23T12:36:06.459+0530 I JOURNAL  [initandlisten] recover lsn: 55843
2015-08-23T12:36:06.459+0530 I JOURNAL  [initandlisten] recover C:\data\db\journal\j._0
2015-08-23T12:36:06.484+0530 I JOURNAL  [initandlisten] recover skipping application of section seq:0 < lsn:55843
2015-08-23T12:36:06.585+0530 I JOURNAL  [initandlisten] recover cleaning up
2015-08-23T12:36:06.585+0530 I JOURNAL  [initandlisten] removeJournalFiles
2015-08-23T12:36:06.588+0530 I JOURNAL  [initandlisten] recover done
2015-08-23T12:36:06.672+0530 I JOURNAL  [durability] Durability thread started
2015-08-23T12:36:06.833+0530 I CONTROL  [initandlisten] MongoDB starting : pid=2100 port=27017 dbpath=C:\data\db\ 64-bit host=Palash-PC
2015-08-23T12:36:06.833+0530 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2015-08-23T12:36:06.834+0530 I CONTROL  [initandlisten] db version v3.0.1
2015-08-23T12:36:06.834+0530 I CONTROL  [initandlisten] git version: 534b5a3f9d10f00cd27737fbcd951032248b5952
2015-08-23T12:36:06.834+0530 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1j-fips 15 Oct 2014
2015-08-23T12:36:06.835+0530 I CONTROL  [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
2015-08-23T12:36:06.835+0530 I CONTROL  [initandlisten] allocator: system
2015-08-23T12:36:06.835+0530 I CONTROL  [initandlisten] options: {}
2015-08-23T12:36:06.992+0530 I JOURNAL  [journal writer] Journal writer thread started
2015-08-23T12:36:08.557+0530 I NETWORK  [initandlisten] waiting for connections on port 27017
2015-08-23T12:36:16.464+0530 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:53085 #1 (1 connection now open)
At this point, MongoDB is running on localhost (127.0.0.1) on port 27017 by default. We'll see later how to change these defaults.

Now the server is up and running and ready to accept connections from clients.

Have you noticed something ?
Don't you think it is really fast at starting up and accepting connections ?

Which client ?
You can use any client, a Java client, a Python client, a .NET Client, a shell client or if you are crazy enough, you can write your own client.

For now, we'll just use the mongo shell client.

How do I get that client ?
MongoDB installs this basic client along with the server. Just open another console and run the following command,
mongo

This also connects to localhost:27017, defaults to the test database and most likely prints something similar to the following,
2015-08-23T12:36:16.370+0530 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
MongoDB shell version: 3.0.1
connecting to: test
>

That's it. Let's start doing something. First, let's see which collections are already present. Run either of the following,
show tables
or,
show collections

If you are using this for the first time, you'll find the following,
system.indexes

Otherwise, you can find other collections you might have created earlier.

Hold on, I've not created system.indexes.

Well, this collection is a system collection, which stores the details of the indexes created in the database.

OK, how do I create my own collection ?
The easiest way to create a collection is to create no collection at all.

What ? Have you lost your mind ?
I think I am in my right mind.
This is hard to believe at first, but it is true for MongoDB: you don't create a collection until you use it. The collection is created when you insert the first document into it; MongoDB creates it implicitly for you.

However, you can also create a collection explicitly using the db.createCollection() command. That is a more advanced topic, which we can discuss later.

Cool !!!
Yup, it is. Another interesting thing to notice here is that you can even query a non-existing collection.

What ???
Perfectly true, let me show you an example.
Follow these steps,
  1. Run the show collections command
  2. You will see the existing collections
  3. Then query a non-existing collection.
This is how it looks in the shell,
> show collections
system.indexes
> db.animals.find()
>

So, you don't have a collection named animals, but you can still query it using the find() method.
As expected, there are no documents to return. If it did return something, that would be a real mess in the database, and perhaps no one would use it.

Hmm...(Aww...). Don't you think it is a mess indeed ? You are performing an operation which should not be performed.
Well, I don't want to debate this topic. I just know that querying a non-existing collection gives no output instead of throwing an exception. Personally, I like this feature. Maybe your choice is different.

Now, let's dive deeper. We have the database server up and running, and we are connected to it via the mongo shell. Now, let's try to store some information in the database.

From the SQL perspective, to store information you need to create a table first. But it is different in MongoDB, as you might have already experienced.

So, the question arises: where do we put the information ? Well, we'll put it in collections. But to get at a collection, you first need a handle on the container that holds it.

As it happens, the mongo shell gives us an object representing the current database to simplify our work, and we can perform our operations through that object. It is available in the shell as db. Let's try it,
>db
test

This returns the name of the database the shell is connected to. In my case, it returns test.

OK. So, how do I create a collection ?
Well, let's move into it. Pick any collection name; let's start with an Employees collection.
db.Employees.insert({'sap_code': 51531642, 'organization': 'HCL Technologies', 'department': 'Development', 'role': 'Senior Software Engineer'});

And when run, it should print the following,
WriteResult({ "nInserted" : 1 })

Now check the collections again,
show collections

And this time you should see the Employees collection.

Employees
system.indexes

So, let's try to understand what we've fired and what instruction was provided to the database.

As we've seen earlier, db refers to the current database, and we accessed a collection on it using the dot operator. So, db.Employees represents the Employees collection in the current database. On that collection, we invoked the insert operation, which takes a JavaScript object as its argument.

In a nutshell, we've instructed the database server to add a document (a JavaScript object) to the Employees collection in the current database. On the server side, it checks whether the collection exists. If it does, the argument of the insert method is added to that collection; otherwise, the collection is created first and the document is added to it.
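The server-side behaviour described above (check for the collection, create it if it is missing, then add the document) can be modelled with a small in-memory sketch. Everything here, including the names createDatabase, insert, find and collectionNames, is hypothetical and only imitates the observable behaviour; it is not MongoDB's implementation. It also reproduces the earlier animals example, where querying a non-existing collection returns nothing instead of an error.

```javascript
// Toy in-memory "database": a map from collection name to an array of documents.
// Hypothetical sketch of the observable behaviour only.
function createDatabase() {
  var collections = {};

  function exists(name) {
    return Object.prototype.hasOwnProperty.call(collections, name);
  }

  return {
    // Insert a document, creating the collection on first use (implicit creation).
    insert: function (name, doc) {
      if (!exists(name)) {
        collections[name] = [];
      }
      collections[name].push(doc);
      return { nInserted: 1 };
    },
    // Querying a collection that does not exist returns an empty result
    // and does NOT create the collection.
    find: function (name) {
      return exists(name) ? collections[name].slice() : [];
    },
    // Roughly what "show collections" lists, sorted by name.
    collectionNames: function () {
      return Object.keys(collections).sort();
    }
  };
}

var toyDb = createDatabase();
var before = toyDb.find("animals");           // [] and no collection is created
toyDb.insert("Employees", { sap_code: 51531642 });
var names = toyDb.collectionNames();          // ["Employees"]
```

The key design point is that only insert() creates a collection; find() on a missing name is a harmless read.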

That's it, you have successfully created a collection in the DB. Let's fetch the records from the collection Employees,

> db.Employees.find();
{ "_id" : ObjectId("55d97adafa54c8ea260b8288"), "sap_code" : 51531642, "organization" : "HCL Technologies", "department" : "Development", "role" : "Senior Software Engineer" }
>

As you can see, there is only one document in the collection. But the result is somewhat difficult to read. Let's make it pretty: just add pretty() at the end.

> db.Employees.find().pretty()
{
        "_id" : ObjectId("55d97adafa54c8ea260b8288"),
        "sap_code" : 51531642,
        "organization" : "HCL Technologies",
        "department" : "Development",
        "role" : "Senior Software Engineer"
}
>

Now, that's easy to read. But hold on, I did not add that _id to the document.
Well, that's MongoDB's doing. Every document in MongoDB needs a unique identifier, stored in its _id field. That is the primary key of the document.
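An ObjectId like the one generated above is 12 bytes, written as 24 hex characters, and its first 4 bytes hold the creation time in seconds since the Unix epoch. In the shell, ObjectId(...).getTimestamp() returns that time as a Date. The plain JavaScript sketch below decodes the timestamp by hand from the hex string; the function name objectIdTime is hypothetical.

```javascript
// Decode the creation time embedded in an ObjectId hex string.
// The first 8 hex characters (4 bytes) hold seconds since 1970-01-01T00:00:00Z.
function objectIdTime(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters, got: " + hex);
  }
  var seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id generated for the first Employees document above.
var created = objectIdTime("55d97adafa54c8ea260b8288");
// created.toISOString() is "2015-08-23T07:48:42.000Z" (13:18:42 IST, UTC+05:30)
```

The decoded time falls on the date of this post, shortly after the server in the log above was started.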

If you provide an _id value that is unique within the collection, MongoDB uses that value as the primary key. If you don't provide an _id, MongoDB generates one for you. You can use values such as numbers, strings or ObjectIds as _id. Another important point is that the _id value is immutable: once a document has been created with an _id value, you cannot change it. Let's try adding a document with an _id to the Employees collection.

> db.Employees.insert({ "_id" : 1, "sap_code" : 51534225, "organization" : "HCL Technologies", "department" : "Development", "role" : "Technical Lead" });
WriteResult({ "nInserted" : 1 })
>


The document is added to the Employees collection. Let's try to add the same document (with the same _id) once again,


> db.Employees.insert({ "_id" : 1, "sap_code" : 51534225, "organization" : "HCL Technologies", "department" : "Development", "role" : "Technical Lead" });
WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 11000,
                "errmsg" : "E11000 duplicate key error index: test.Employees.$_id_ dup key: { : 1.0 }"
        }
})
>
>

This time we get a duplicate key error (code 11000), and the message tells us exactly where the conflict is. In my case, it points to test.Employees.$_id_, the _id index of the Employees collection in the test database.
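To make the uniqueness rule concrete, here is a toy collection in plain JavaScript that rejects a second document with the same _id and returns objects shaped like the two WriteResults shown above. It is a hypothetical sketch (createUniqueIdCollection and its methods are made up for illustration), not MongoDB's implementation, and it assumes every document already carries an _id.

```javascript
// Toy collection that enforces a unique _id, imitating the result shapes
// shown above: { nInserted: 1 } or { nInserted: 0, writeError: {...} }.
// Hypothetical sketch; assumes every inserted document has an _id.
function createUniqueIdCollection() {
  var byId = {}; // key: JSON text of the _id value, value: the document

  return {
    insert: function (doc) {
      var key = JSON.stringify(doc._id);
      if (Object.prototype.hasOwnProperty.call(byId, key)) {
        return {
          nInserted: 0,
          writeError: { code: 11000, errmsg: "duplicate key: _id " + key }
        };
      }
      byId[key] = doc;
      return { nInserted: 1 };
    },
    count: function () {
      return Object.keys(byId).length;
    }
  };
}

var employees = createUniqueIdCollection();
var first = employees.insert({ _id: 1, role: "Technical Lead" });  // nInserted: 1
var second = employees.insert({ _id: 1, role: "Technical Lead" }); // nInserted: 0, code 11000
```

As in the shell session, the failed insert leaves the collection unchanged: it still holds a single document with _id 1.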

Now let's query the database and see what happened to the Employees collection.


> db.Employees.find().pretty();
{
        "_id" : ObjectId("55d97adafa54c8ea260b8288"),
        "sap_code" : 51531642,
        "organization" : "HCL Technologies",
        "department" : "Development",
        "role" : "Senior Software Engineer"
}
{
        "_id" : 1,
        "sap_code" : 51534225,
        "organization" : "HCL Technologies",
        "department" : "Development",
        "role" : "Technical Lead"
}
>


Watch out for letter case in your commands. The mongo shell is case-sensitive, as you would expect from JavaScript, so if you get the case wrong you won't see the expected result. For example, the following query returns nothing, because employees and Employees are different collections,

> db.employees.find().pretty()
>
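The same rule applies to any JavaScript object: property names are case-sensitive, so Employees and employees are two unrelated names. A minimal plain JavaScript illustration (fakeDb is a hypothetical stand-in, not the real db object):

```javascript
// A stand-in object with one property, playing the role of "db".
var fakeDb = { Employees: ["document 1", "document 2", "document 3"] };

var exact = fakeDb.Employees;     // the array with 3 entries
var wrongCase = fakeDb.employees; // undefined: a different, non-existent property
```

The mongo shell goes one step further: instead of undefined, db.employees gives you a handle to a (non-existing) collection named employees, which is why the find() above silently returns nothing rather than failing.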

OK, let's add one more document to the Employees collection,

> db.Employees.insert({'sap_code':51534225,'name':'Monojit Das',  'organization':'HCL Technologies', 'department':'Development', 'role':'Technical Lead'})
WriteResult({ "nInserted" : 1 })
>

Let's query the collection again,


> db.Employees.find().pretty();
{
        "_id" : ObjectId("55d97adafa54c8ea260b8288"),
        "sap_code" : 51531642,
        "organization" : "HCL Technologies",
        "department" : "Development",
        "role" : "Senior Software Engineer"
}
{
        "_id" : ObjectId("55d98196fa54c8ea260b8289"),
        "sap_code" : 51534225,
        "name" : "Monojit Das",
        "organization" : "HCL Technologies",
        "department" : "Development",
        "role" : "Technical Lead"
}
{
        "_id" : 1,
        "sap_code" : 51534225,
        "organization" : "HCL Technologies",
        "department" : "Development",
        "role" : "Technical Lead"
}
>

Hey, you have just added one more column to the record, and no error ?
Yes, I've added one more field, name, to this document. This is perfectly fine with MongoDB: you can add or remove fields in one document without affecting the other documents in the collection. You can go through the article on Document Data Store to get this clarified further.


I hope that, at this point, you are able to
  1. Create a collection
  2. Insert a document in the collection
  3. Find all the documents in the collection
  4. Show the results in a more readable fashion

Let's watch a small video depicting all the above information,



Next, we'll go through more on inserting documents and on the shell.

Palash Kanti Kundu

Thursday, June 4, 2015

Shell is on the way !!!

Now that's all set: you have the MongoDB server installed, the system path is set and all the basic setup is working fine.

No, I haven't done anything yet.
Well, you can do so very easily with the help of the MongoDB post-installation guide. If you need any help, you can follow the steps described there.

Now, let's take a closer look at each part. MongoDB has two parts: a server and a client interface. As developers, we are less bothered about the server jobs and other DBA-related stuff.
So, let's go for the client part.

While installing MongoDB, we found something named mongo.

Now the question arises: what is mongo, and why do we need it ?
Well, in a nutshell, mongo is the shell client of MongoDB. This interactive JavaScript shell provides a powerful interface for system administrators and developers. It lets you test queries and operations against the server without any specific client or driver.

Hmm, so it is somewhat analogous to Oracle SQL Developer ?
Well, yes. You can draw analogy between these two from a user point of view. However, mongo provides a fully functional JavaScript environment for MongoDB.

Hmm, so how do I use it ?
If you have set up your system with the MongoDB post-installation guide, simply open a command prompt, type mongo and hit Enter. You will be connected through the shell client, which by default connects to the server on localhost, port 27017, and uses the test database.

What if I have another database to connect to ?
Hold on dear, I'm getting there, one step at a time.

OK go on. So, what is next ?
As MongoDB and RDBMSs are different systems, so are their terminologies. Let's try to connect some dots between the two. If you are coming from a strong RDBMS background, the following can help you get started. If you are new to databases, I recommend skipping this section.

The following table presents the various SQL terminology and concepts and the corresponding MongoDB terminology and concepts.

SQL Terms/Concepts                                      MongoDB Terms/Concepts
database                                                database
table                                                   collection
row                                                     document or BSON document
column                                                  field
index                                                   index
table joins                                             embedded documents and linking
primary key (any unique column or column combination)  primary key (automatically set to the _id field)
aggregation (e.g. group by)                             aggregation pipeline
The following table presents some database executables and the corresponding MongoDB executables. We are keeping it short for convenience.

                  MongoDB   MySQL    Oracle    Informix    DB2
Database Server   mongod    mysqld   oracle    IDS         DB2 Server
Database Client   mongo     mysql    sqlplus   DB-Access   DB2 Client

More on this can be found here.

Well, at this point, you are connected to the database and you know some of the terminology used in the MongoDB world. Let's take another step.

You have a running database, but you cannot find any collection or document in it. So, let's first create a document in the database.

Create a document in the database: To insert a document into the database, we simply invoke the insert() function on a collection. Let's do that first,

 db.inventory.insert(  
   {  
    item: "ABC1",  
    details: {  
     model: "14Q3",  
     manufacturer: "XYZ Company"  
    },  
    stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],  
    category: "clothing"  
   }  
 )  

When you fire this command in the console and everything works fine, you are most likely to get something like the following,
 WriteResult({ "nInserted" : 1 })  

What does that mean at all ?
Well, WriteResult is an object returned by the write operation. nInserted is a field of WriteResult that holds the number of documents inserted into the database. If the insert operation fails, the WriteResult contains a writeError field describing the error, and nInserted is 0.
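To make the two outcomes concrete, the sketch below reads plain objects shaped like the WriteResults printed in this series: { nInserted: 1 } on success, and { nInserted: 0, writeError: { code, errmsg } } on failure. The function describeInsert is hypothetical and works on plain objects, not on the shell's WriteResult class (which, in the shell, also offers a hasWriteError() method).

```javascript
// Summarise a WriteResult-like object of the two shapes shown in this series:
//   { nInserted: 1 }                                -> success
//   { nInserted: 0, writeError: { code, errmsg } }  -> failure
// Hypothetical helper working on plain objects only.
function describeInsert(result) {
  if (result.writeError) {
    return "failed (code " + result.writeError.code + "): " + result.writeError.errmsg;
  }
  return "inserted " + result.nInserted + " document(s)";
}

var ok = describeInsert({ nInserted: 1 }); // "inserted 1 document(s)"
```

Checking for writeError first matters, because a failed insert still carries an nInserted field (with the value 0).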

Wait, you mean to say that, we have inserted a row in a table without even mentioning the structure of the table ?
Well, you could put it that way, with one correction. The phrase should read,
We have inserted a document in a collection without even mentioning the structure of the collection.
Yes, that is exactly what we did: we inserted a document without defining anything beforehand. If you take a look at the MongoDB introduction, you will find more information.

Hmm, I am eager to see it now.
Yeah, you are. So was I !!!
Let's do the job then. Run the following command,
 db.inventory.find()  

Voila, you get to see,

 { "_id" : ObjectId("53d98f133bb604791249ca99"), "item" : "ABC1", "details" : { "model" : "14Q3", "manufacturer" : "XYZ Company" }, "stock" : [ { "size" : "S", "qty" : 25 }, { "size" : "M", "qty" : 50 } ], "category" : "clothing" }  
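The stored document keeps its nesting: details is an embedded document and stock is an array of embedded documents. MongoDB queries reach into embedded documents with dot notation, for example db.inventory.find({ "details.manufacturer": "XYZ Company" }). The plain JavaScript sketch below only shows how such a dotted path walks through nested objects; the function getPath is hypothetical and ignores the extra rules MongoDB applies when matching inside arrays.

```javascript
// Follow a dotted path such as "details.manufacturer" through nested objects.
// Returns undefined as soon as a step is missing. Arrays are reached only by
// numeric steps ("stock.0.qty"); MongoDB's own array matching rules are richer.
function getPath(doc, path) {
  var parts = path.split(".");
  var current = doc;
  for (var i = 0; i < parts.length; i++) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    current = current[parts[i]];
  }
  return current;
}

var item = {
  item: "ABC1",
  details: { model: "14Q3", manufacturer: "XYZ Company" },
  stock: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
  category: "clothing"
};

var maker = getPath(item, "details.manufacturer"); // "XYZ Company"
var firstQty = getPath(item, "stock.0.qty");       // 25
var missing = getPath(item, "details.color");      // undefined
```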

Well, at this point we are going to disconnect from the server and close the client. But if you have developed some interest, I would recommend the following. If you get time, go through it,
SQL to MongoDB

Bye for now

Palash Kanti Kundu

Sunday, April 12, 2015

First things first...

Well, I hope you have a working MongoDB installation on your system. Now, let's see what it looks like !!!

I am using Windows 7 for this tutorial. If you are using another operating system, everything except the OS-specific instructions should work fine.

We know that MongoDB comes with a database server and a client bundled in the distribution.

Now, we are going to add them to the system path.
  1. Right-click the My Computer icon and click Properties.
  2. A window for system settings will pop up. In the left pane, click Advanced system settings.
  3. This opens another window; click Environment Variables...
  4. In the pop-up, you see the available system variables. Find the variable named Path and click Edit...
  5. In the Edit System Variable pop-up, append the MongoDB Server bin path to the existing value, separated from the previous entry by a semicolon. In my case, the path is C:\Program Files\MongoDB\Server\3.0\bin
  6. Now, click OK three times and close the Control Panel window. Then click the Start Menu, type cmd in the search box and hit Enter
  7. This opens the console. Now type cd ../.. and hit Enter
  8. Type mkdir data and hit Enter
  9. Type cd data and hit Enter
  10. Type mkdir db and hit Enter
  11. Type mongod and hit Enter
  12. This starts the MongoDB server installed on your system on port 27017, and the data is stored in the \data\db directory (C:\data\db in my case).
  13. Open another console as described in Step 6
  14. Type mongo and hit Enter
  15. This invokes the mongo shell, which connects to the server running on localhost, port 27017, and to the test database by default
Congratulations, you have just completed the very basics of MongoDB. Next, we'll look into the mongo shell.


Palash Kanti Kundu

Thursday, March 26, 2015

Introduction to MongoDB

Let's start our journey in the MongoDB Way...

The first thing everyone is eager to know is,

What is MongoDB ?
MongoDB is a Non Relational JSON Document Database.

Now, each part of the definition carries weight, so let's look at each of them separately,

Non Relational: MongoDB is a NoSQL database that does not support relations between different collections (tables, in the RDBMS analogy). In an RDBMS, we have multiple tables connected to each other through references, and the tables can be joined to fetch data. In MongoDB, joins are not allowed.

JSON: JSON stands for JavaScript Object Notation. It is a text-based, lightweight data-interchange format, built on a subset of the JavaScript programming language, that is easy for humans to read and write and easy for machines to parse. A JSON object is a set of key-value pairs. Following is an example,

 {
   "_id": 10,
   "firstName": "Palash",
   "middleName": "Kanti",
   "lastName": "Kundu",
   "address": {
     "_id": 4,
     "houseNumber": "G-97",
     "street": "Sukanta Nagar",
     "city": "Kolkata"
   },
   "hobbies": [
     "Reading",
     "Cycling"
   ]
 }

More detail on JSON is available on the JSON website.
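Because JSON is plain text, any JavaScript environment can convert between the text form and an object. Here is a minimal sketch using the built-in JSON.parse and JSON.stringify on a shortened version of the example. Strict JSON requires double-quoted keys, which is why the keys appear in quotes inside the string; JavaScript source (and the mongo shell) also accepts unquoted keys, but JSON.parse does not.

```javascript
// A shortened version of the example as JSON text (strict JSON: keys in double quotes).
var text = '{"_id":10,"firstName":"Palash","address":{"city":"Kolkata"},"hobbies":["Reading","Cycling"]}';

// Text -> object: nested objects and arrays become JavaScript objects and arrays.
var person = JSON.parse(text);
var city = person.address.city;         // "Kolkata"
var hobbyCount = person.hobbies.length; // 2

// Object -> text: produces the same compact string again.
var roundTrip = JSON.stringify(person);

// Unquoted keys are fine in JavaScript source, but they are not valid JSON text.
var rejected = false;
try {
  JSON.parse('{firstName:"Palash"}');
} catch (e) {
  rejected = true; // JSON.parse throws a SyntaxError
}
```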

Document Database: We have already gone through document databases. If you want to refresh your knowledge, you are encouraged to review that topic once again.

MongoDB Features: As of now, we know that MongoDB is a document database. Let's see what it offers,
  • MongoDB is highly scalable
  • MongoDB is JSON based, and JSON maps naturally onto the data structures of most programming languages. So, ideally, you can store anything in it that can be represented as JSON
  • MongoDB supports a dynamic schema. By this, we mean there is no fixed format for the schema, so you can put documents of any shape in a collection without worrying about the structure of the data. You can view the following video from MongoDB University,
  • MongoDB is highly efficient
  • MongoDB is easy to use
On the other hand, it lacks some features of an RDBMS, like,
  • MongoDB does not support joins
  • MongoDB does not support transactions



To get a more detailed insight, you can check out here or you can watch the following video from MongoDB University,


I hope this article helps you get to know MongoDB. Now it's time to set up your system for MongoDB. Here is the resource you will need to set up MongoDB.


Palash Kanti Kundu

Friday, March 13, 2015

Document Data Store

Life moves so fast. You gotta document the good times, man - Big Boi


Really, if you are not documenting events, you are not trying to remember them.
So true, right ?


Hey dude, listen we are not here to discuss what we do or what we ignore. Could you please try to be more task specific ?
OK, OK. I'm getting there. I just want you to think of a document, any document.

Document, which kind ?
Any document you want to remember, or maybe one you are carrying right now. It may be a photocopy of your identity proof, a newspaper article, a medical receipt or whatever else; it is a document, and documents help us remember things.

Yes, they are helpful at times.
That's the point. It's true in the world of NoSQL, too: documents are helpful.

Really ? How ?
Well, as we mentioned earlier in the Big Data introduction, Big Data is mostly unstructured or semi-structured data whose data model can change dynamically. In that sense, the concept of a document is helpful.
To answer your second question: how often have you seen two documents with exactly the same structure ?


Chances are less, right ?
Sounds familiar to the Big Data community ?

Yes, I got it now. But let's see how you can provide a structural definition of an unstructured model.
Well, to be honest, it's a really tough job for me. I have thought of describing it in the following way, although I know it's not complete,
Document Data Store: A document data store is one of the major categories of NoSQL databases. It is a computer program designed for storing, retrieving and managing document-oriented information, also known as semi-structured data. Documents still have a structure within them, but that structure can be changed on the fly.

What do you mean by the term 'document-oriented' ?
Document is the heart of any document data store. A document is a data structure composed of field and value pairs.
Let's discuss it with a scenario. That way, we can get into it in a more natural way.

Let's say we have to provide some identity proof documents at a competition. In most cases, the document will have your name, your address, your photo and your date of birth. Let's say this is the most common format of the document. As users, we are not worried about how the document is structured; we just carry this information with us. Maybe you bring your passport while I bring my driving license. Both essentially serve the main purpose of identity proof, but in a different format or style.

To stretch your thinking a little more, think of someone who brings a professional identity card, which also serves the same purpose.

Puzzled ? Here are some images for your brain to feed.



 



Did you get the point?

Hmm, documents differ in format but serve the same purpose. All have a name, an address and an image. But the addresses differ in format, the names differ in format and the image sizes are different.

Good, now to add to your point, the documents also differ in their content as a whole. Have you noticed that ?

Yes. Apart from these common things, they have some specific attributes as well, like a Govt. stamp, an employee ID or a license number.

You might have overlooked the fact that all have their own unique key for identification.

Yes. They have unique key as well for identification.

So, that's it: of the 4 types of NoSQL databases, you now have the concept of one, known as the document-oriented data store.

What ?
Yes, dude, you have grasped the concept of a document. Now let's summarize it in a few points.
  1. Documents vary in format but some basic structure can be maintained
  2. Apart from the generic structure, a document may also contain some specific contents as well
  3. Each document has a unique key
  4. Documents can store any kind of information like text, image, binary etc
  5. Structure of any document can be modified on the fly by adding or removing contents from the document
That's it. Let's turn these mental images into documents and see what they look like,

 {  
      _id : 'DL-0420110149XXXX',
      name : 'Palash Kanti Kundu',  
      occupation : 'Software Engineer',  
      organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],
      gender : 'Male',  
      address : [ {  
           _id : 123456,  
           type : 'Current',  
           city : 'Kolkata',
           zip : 700098  
      }, {  
           _id : 156,  
           type : 'Permanent',  
           city : 'Barddhaman'  
      } ]  
 }  

Now, another document in the same category might be different. Let's see:

 {  
      _id : 'KO-123',  
      name : 'Palash Kanti Kundu',  
      occupation : 'Software Engineer',  
      organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],  
      address : [ {  
           _id : 123456,  
           type : 'Current',  
           city : 'Kolkata',
           zip : 700098  
      } ],  
      dateOfBirth : '21-08-1990'
 }  

Another one might look like the following:

 {  
      _id : 51531642,  
      firstName : 'Palash',
      middleName : 'Kanti',
      lastName : 'Kundu',  
      occupation : 'Software Engineer',  
      organization : 'HCL Technologies',  
      designation : 'Senior Software Engineer',
      previousOrganisation : 'Cognizant Technology Solutions',
      address : [ {  
           _id : 123456,  
           type : 'Work',  
           city : 'Kolkata',
           zip : 700156  
      } ],  
      dateOfBirth : '21-08-1990',
      experience : ['Java', 'Spring', 'Hibernate', 'jQuery', 'Oracle PL/SQL'],
      maritalStatus : 'Single'
 }  

All three describe the same person and differ in format, yet we can find the information we need on demand. We can also see that the structure of the data has been extended, trimmed or modified from one document to the next. Still, the data is useful in all its flavours.
So, basically you are maintaining a structure in an unstructured way. That's where the beauty lies. That's what makes this very powerful.
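Finding information "on demand" across these differently shaped documents can be sketched in plain JavaScript. This is an illustration only: the array holds trimmed-down copies of the three documents above (just the fields used here), and `fullName` is a hypothetical helper, not part of any database API. It reads the name whether a document stores it as a single `name` field or as `firstName`/`middleName`/`lastName`, and it filters on a field (`dateOfBirth`) that only some documents have.

```javascript
// Trimmed-down copies of the three documents above.
const docs = [
  { _id: "DL-0420110149XXXX", name: "Palash Kanti Kundu", gender: "Male" },
  { _id: "KO-123", name: "Palash Kanti Kundu", dateOfBirth: "21-08-1990" },
  {
    _id: 51531642,
    firstName: "Palash",
    middleName: "Kanti",
    lastName: "Kundu",
    dateOfBirth: "21-08-1990",
    maritalStatus: "Single",
  },
];

// Hypothetical helper: the application decides how to read each format.
function fullName(doc) {
  if (doc.name !== undefined) {
    return doc.name; // single-field format
  }
  // split format; skip any part a document leaves out
  return [doc.firstName, doc.middleName, doc.lastName]
    .filter((part) => part !== undefined)
    .join(" ");
}

// A field missing from a document is simply undefined, so filtering on it is safe.
const withDateOfBirth = docs
  .filter((doc) => doc.dateOfBirth !== undefined)
  .map((doc) => doc._id);

console.log(docs.map(fullName)); // "Palash Kanti Kundu" three times
console.log(withDateOfBirth);    // the KO-123 and 51531642 documents
```

Note that the knowledge of how to read each format lives in the application here; that is part of the care this flexibility demands.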

It is worth mentioning here that, when abused, this is the worst tool you will ever find in a data solution scenario, while, handled with care, it is a powerful one.


With great power comes great responsibility.




Palash Kanti Kundu

Friday, March 6, 2015

NoSQL Types Deep Dive

'Every time you dive, you hope you'll see something new - some new species. Sometimes the ocean gives you a gift, sometimes it doesn't.' - James Cameron

Well, the world of knowledge works the same way, but in most cases it does give us a gift. Building on the previous post, we are now going to deep dive into the NoSQL types to learn more about this area of Big Data.

We learnt that we can categorize NoSQL databases into 4 major types, and we have some basic knowledge of them. Let's get a better idea of each, taking them one by one.

  1. Key-Value pair databases - The simplest of the four categories. Conceptually, we can look at them as a HashMap<Key, Value>, where the key is the primary key for the value to be stored and the value is the raw data. This value can be anything; the database just stores it blindly, without caring what's inside.
    Much like a HashMap<Key, Value>, the set of operations is analogous: we can get the value for a key, put a key-value pair into the database, or delete the value associated with a key. Queries are only possible through the key itself. The mappings are usually backed by caching mechanisms to maximize performance.

    Key-Value
      Pros:
      Because all access goes through a single primary key, this type performs well and scales incrementally.

      Cons:

      We cannot query this database by value; every access must go through the primary key. It is up to the application to know what it originally stored and how to process the value on retrieval.
      Implementing relationships between data is not recommended with this type.
      Since the database has no notion of columns inside a value, updating part of the data is cumbersome: the application has to read the whole value, change it and write it back.

      Use cases:
      Key-value databases are best utilized in the following situations:
      •Storing user session data
      •Maintaining schema-less user profiles
      •Storing user preferences
      •Storing shopping cart data
    2. Document Database - This one is my favourite data store, and we'll go through this type in depth in the next sections. In fact, this type provides the flexibility to migrate to NoSQL from an RDBMS. It allows data to be stored in a semi-structured way. A document simply refers to a piece of data that has multiple attributes attached to it. The tricky part is that different documents can have different structures, or they may share the same structure throughout the whole application. The application has the flexibility to add or remove attributes in a document on the fly.
      This type works on XML, JSON or BSON data, which maps easily to the in-memory representation of objects; that is really helpful for object-oriented programming languages like Java. Storage also differs from a key-value store: document databases don't store values blindly, since they understand the structure of the data and store its metadata as well. So, querying on the data itself is possible with this type. Interestingly, a document store can hold a document within another document, because its backbone data representation (XML, JSON, BSON) supports nesting.

       {  
            _id : 1,  
            name : 'Palash Kanti Kundu',  
            occupation : 'Software Engineer',  
            organization : [ 'HCL Technologies', 'Cognizant Technology Solutions' ],  
            address : [ {  
                 _id : 123456,  
                 type : 'Current',  
                 city : 'Kolkata',
                 zip : 700098  
            }, {  
                 _id : 156,  
                 type : 'Permanent',  
                 city : 'Barddhaman'  
            } ]  
       }  
      Document Data
      Use cases:
      Document Store databases are useful when you have to implement
      •Content management systems
      •Blogging platforms
      •Analytics platforms
      •E-commerce platforms
    3. Column Family store - Column-family databases are row-based databases. In this type of database, data is stored in rows that have a unique row id, and instead of the documents and ‘values’ of document stores and key-value stores, the data is stored in the form of flexible columns.
      The key difference between a column store and an SQL database is that in a column store you don't have to keep the same set of columns in every row. You can add a new column to any row without having to add it to all the rows of the database. Because of their similarity to SQL databases, column stores are easier to query than the previously mentioned NoSQL databases, but they are not as flexible as document stores or key-value stores in storing arbitrary information.
      Column Family database
      Use Cases:
      Developers mainly use column databases in
      •Content management systems
      •Blogging platforms
      •Systems that maintain counters
      •Services that have expiring usage
      •Systems that require heavy write requests
    4. Graph databases - Connections are the main theme of this type. As a backbone, graph theory is implemented with the concepts of nodes, edges and properties. Traversal algorithms like BFS and DFS are used to explore these connections; BFS, for example, finds the shortest path between two nodes when every edge counts as one hop. This type is extremely useful in connected data architectures.
      This type provides great flexibility while querying highly connected data and also supports index-free searches.
      Graph databases
      Use cases:
      Graph based databases are enormously useful in applications that have connected data, such as social networks, routing infocenters, recommendation engine applications, spatial data and mapping applications and other applications requiring unique key relations.


      They are extremely useful in analytic applications, especially those that require prediction, recommendation and consequence-analysis engines.
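To make the four data models above a little more concrete, here is a toy, in-memory sketch in plain JavaScript. It only mimics the access patterns described in this post and is not how any real database is implemented; every key, row id, person and node name in it is made up. It shows a key-value `Map` that stores opaque JSON strings and is reached only by key, a document array filtered on a nested field, column-family style rows where one row gains a column the others lack, and a graph stored as an adjacency list, with breadth-first search (BFS) returning a shortest path by number of hops.

```javascript
// Toy in-memory sketches of the four NoSQL data models described above.
// Illustrative only: real databases add persistence, indexing, distribution, etc.

// 1. Key-value: opaque values, reachable only through their key.
const kv = new Map();
kv.set("session:42", JSON.stringify({ user: "palash", cart: [101, 205] })); // put
const session = JSON.parse(kv.get("session:42")); // get; the application decodes the value
kv.delete("session:42"); // delete

// 2. Document: the store understands fields, so it can filter on a nested one.
const people = [
  { _id: 1, name: "Palash Kanti Kundu", address: [{ type: "Current", city: "Kolkata" }] },
  { _id: 2, name: "Another Person", address: [{ type: "Permanent", city: "Barddhaman" }] },
];
const inKolkata = people.filter((doc) =>
  doc.address.some((addr) => addr.city === "Kolkata")
);

// 3. Column family: rows keyed by a row id, each with its own set of columns.
const rows = new Map([
  ["row1", { name: "Palash", city: "Kolkata" }],
  ["row2", { name: "Another Person" }],
]);
rows.get("row2").zip = 700098; // new column on one row only; row1 is unchanged

// 4. Graph: nodes and directed edges as an adjacency list.
const edges = { A: ["B", "C"], B: ["D"], C: ["D"], D: ["E"], E: [] };

// Breadth-first search: the first time the goal is dequeued, its path has the fewest hops.
function shortestPath(graph, start, goal) {
  const previous = new Map([[start, null]]); // predecessor of each visited node
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === goal) {
      const path = [];
      for (let n = goal; n !== null; n = previous.get(n)) {
        path.unshift(n); // walk back from goal to start
      }
      return path;
    }
    for (const next of graph[node] || []) {
      if (!previous.has(next)) {
        previous.set(next, node);
        queue.push(next);
      }
    }
  }
  return null; // goal is not reachable from start
}

console.log(session.cart);                    // 101 and 205
console.log(kv.has("session:42"));            // false
console.log(inKolkata.map((doc) => doc._id)); // only _id 1
console.log(shortestPath(edges, "A", "E"));   // A -> B -> D -> E
```

In MongoDB, the document-style filter would be written as a query such as `db.people.find({ "address.city": "Kolkata" })`, where `people` is a hypothetical collection name; dot notation on an array of embedded documents matches when any element has that city.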
    So, we now have a basic idea of all four NoSQL types.
    In the next sections, we'll be looking into Document Data Stores and one of the popular implementations of this type, MongoDB.

    Palash Kanti Kundu