Comparison of MongoDB and Cassandra in Terms of Security
1. Introduction
MongoDB and Cassandra are both open-source databases: MongoDB is document
oriented, while Cassandra is aimed at very large datasets. Both belong to the
NoSQL family. NoSQL databases are designed mainly for scalability, fast
storage, fast access to data and security (Anon., n.d.). They can run across a
large number of nodes and offer features that were not practical with an
RDBMS. Reads and writes can proceed at the same time without conflicting. The
data is distributed over clusters of up to thousands of machines and is
accessed through nodes or routers. This paper compares the two databases in
terms of performance, storage, retrieval time, scalability, reliability and
security. Their data models differ: MongoDB is a document store and Cassandra
is a wide-column store. Cassandra was released as open source in 2008 and is
maintained by the Apache Software Foundation; MongoDB is developed by MongoDB
Inc. Cassandra is written in Java and MongoDB in C++ (Anon., n.d.). Both
databases are schema-free. Cassandra has no server-side scripting, whereas
MongoDB supports server-side JavaScript.
No distributed database can satisfy all three CAP properties (consistency,
availability and partition tolerance) at once. MongoDB follows CP, whereas
Cassandra follows AP. Under CP, the data that is returned is accurate, but some
data may be unavailable during a network partition; under AP, data is always
returned, but some of it may be out of date. Cassandra is mostly used for IoT,
recommendation engines, fraud detection applications, playlists, product
catalogs and messaging applications. It belongs to the scalability-oriented
class of NoSQL databases (Bushik, 2012). MongoDB, in contrast, helps businesses
transform themselves by harnessing the power of the data they store. It is used
by organizations from startups to large companies to build applications that
perform complex tasks. Cassandra requires less administration than MongoDB.
This report presents the main aspects of both databases and compares them.
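The CP and AP trade-off described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Replica`, `write` and `read` are hypothetical names, the majority rule is a simplification, and neither MongoDB nor Cassandra is implemented this way.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; `reachable` models a network partition."""
    data: dict = field(default_factory=dict)
    reachable: bool = True


def write(replicas, key, value, mode):
    """Apply a write under a simplified 'CP' or 'AP' policy."""
    live = [r for r in replicas if r.reachable]
    if mode == "CP" and len(live) <= len(replicas) // 2:
        # CP: refuse the write unless a majority of replicas can accept it.
        raise RuntimeError("no majority reachable; write rejected")
    # AP (or CP with a majority): update every replica we can reach.
    for r in live:
        r.data[key] = value


def read(replica, key):
    """Read from a single replica; the value may be stale."""
    return replica.data.get(key)


replicas = [Replica(), Replica(), Replica()]
write(replicas, "k", "v1", mode="AP")
replicas[1].reachable = False
replicas[2].reachable = False          # only one replica is reachable now
write(replicas, "k", "v2", mode="AP")  # accepted: availability is kept
print(read(replicas[0], "k"), read(replicas[2], "k"))  # prints "v2 v1"
```

In AP mode the cut-off replica still answers with the old value, while the same write in CP mode would be rejected until a majority is reachable again.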
2. MongoDB
MongoDB can run as a single standalone instance. High availability is provided
by replica sets, which handle failures (MongoDB, n.d.). A sharded cluster
divides a large data set and stores the parts on different machines. Combining
replica sets with sharded clusters provides high redundancy, and the
distribution of the data is transparent to applications. The main features of
MongoDB are given below:
- Fast, iterative development.
- Flexible data model.
- Multi-datacenter scalability.
- Integrated feature set.
- Lower total cost of ownership (TCO).
- Long-term commitment.
- Flexibility.
Data Management for MongoDB
Linear scalability
MongoDB provides cost-efficient horizontal scale-out through sharding, and the
process is transparent to applications. Sharding distributes the data across
multiple partitions, known as shards. Deploying MongoDB in this pattern removes
the bottleneck limitations of a single server (Ellis, 2009) and reduces
complexity. As the data grows, it is redistributed across the cluster and the
cluster can be enlarged. Unlike in some other databases, this process is
handled automatically, so application developers do not need to write sharding
logic. MongoDB also supports several sharding policies, which make it easy for
developers to distribute data across the resources of the cluster to suit the
workload. They are as given below:
Range sharding
Because MongoDB mainly stores documents, the documents are partitioned across
shards according to the value of the shard key. Documents whose shard key
values are close to each other are likely to be stored on the same shard.
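Range-based placement can be sketched in a few lines. The boundaries and shard names below are hypothetical and only illustrate why neighbouring key values tend to land together; MongoDB manages its own ranges internally.

```python
import bisect

# Hypothetical boundaries on the shard key. Shard i holds the keys k with
# BOUNDS[i-1] <= k < BOUNDS[i]; the first and last ranges are open-ended.
BOUNDS = [100, 200, 300]
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key: int) -> str:
    """Return the shard whose key range contains `key`."""
    return SHARDS[bisect.bisect_right(BOUNDS, key)]


print(shard_for(150), shard_for(151))  # neighbouring keys share a shard
print(shard_for(99), shard_for(100))   # keys on either side of a boundary
```

Range queries benefit from this layout because a contiguous span of keys touches few shards, at the cost of possible hot spots when inserts cluster around one range.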
Hashed sharding
Documents are distributed according to an MD5 hash of the shard key value; the
hash is used for placement, not for encryption. This spreads the data evenly
across the shards (Gajendran, 2012).
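A minimal sketch of hash-based placement follows. The function name `hashed_shard`, the use of the first eight digest bytes and the modulo step are assumptions made for illustration; MongoDB's internal hashing and chunk assignment differ in detail.

```python
import hashlib
from collections import Counter


def hashed_shard(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard index via an MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential keys are scattered across shards instead of piling onto one.
counts = Counter(hashed_shard(f"user{i}", 4) for i in range(1000))
print(sorted(counts.items()))
```

Compared with range sharding, this gives up locality of neighbouring keys in exchange for a more even spread of writes.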
Zone sharding
This lets administrators define their own rules for data placement within a
sharded cluster by assigning ranges of shard key values to zones.
Administrators can refine these rules continuously and change the key ranges
to trigger data migration (Hoberman, 2014).
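The idea of zones can be sketched as a table of key ranges and the shards assigned to them. The zone names, year-based shard key and shard names here are hypothetical, chosen only to show how changing a range boundary changes where a document belongs.

```python
from typing import Optional

# Hypothetical zones: each maps a half-open range of the shard key (a year)
# to the shards assigned to that zone.
ZONES = [
    {"name": "archive", "min": 1970, "max": 2023, "shards": ["shard-hdd-1"]},
    {"name": "hot", "min": 2023, "max": 2100,
     "shards": ["shard-ssd-1", "shard-ssd-2"]},
]


def zone_for(year: int) -> Optional[dict]:
    """Return the zone whose range contains `year`, or None if unzoned."""
    for zone in ZONES:
        if zone["min"] <= year < zone["max"]:
            return zone
    return None


print(zone_for(2023)["name"])  # prints "hot"

# The administrator moves the boundary; documents for 2023 now belong to
# the archive zone and would be migrated there.
ZONES[0]["max"] = 2024
ZONES[1]["min"] = 2024
print(zone_for(2023)["name"])  # prints "archive"
```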
2.1 Architecture of MongoDB
The diagram below shows the MongoDB architecture. It contains application
servers, configuration servers and sharded MongoDB instances, each of which is
a replica set. A sharded cluster consists of shards, configuration servers and
query routers. The data is stored in shards, each backed by a replica set that
provides data consistency and availability (Anon., n.d.). The router in the
diagram is the query router: it handles queries and provides the interface to
the applications used by clients, which reach the data in the shards through
it. The router's main job is to target the right shards for a query and return
the results to the client. A cluster can have several routers, which gives
fast access to the data and high availability.
The config servers store the cluster's metadata, which maps the cluster's data
set to the shards. The routers use this metadata to locate particular data in
the shards. There are three config servers in the sharded cluster shown in the
diagram.
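How a router uses config-server metadata can be sketched as follows. `ConfigServer` and `Router` are hypothetical toy classes, and each shard is modelled as a plain dictionary; the real components communicate over the network and are far more involved.

```python
class ConfigServer:
    """Holds cluster metadata: which shard owns which shard-key range."""

    def __init__(self, ranges):
        # ranges: list of (low, high, shard_name), half-open [low, high)
        self.ranges = ranges


class Router:
    """Routes client queries to shards using the config metadata."""

    def __init__(self, config, shards):
        self.config = config
        self.shards = shards  # shard_name -> {shard_key: document}

    def find_by_key(self, key):
        """Targeted query: look up the owning shard and ask only that one."""
        for low, high, name in self.config.ranges:
            if low <= key < high:
                return self.shards[name].get(key)
        return None

    def find_where(self, predicate):
        """No shard key given, so every shard has to be queried."""
        return [doc for shard in self.shards.values()
                for doc in shard.values() if predicate(doc)]


config = ConfigServer([(0, 100, "shardA"), (100, 200, "shardB")])
shards = {"shardA": {7: {"name": "ann"}}, "shardB": {150: {"name": "bob"}}}
router = Router(config, shards)
print(router.find_by_key(150))
print(router.find_where(lambda doc: doc["name"].startswith("a")))
```

The client only ever talks to the router; the metadata lookup is what lets a query with a shard key avoid touching the other shards.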
Figure 1: Architecture of MongoDB
2.2 Security
Over the last decade there has been a significant increase in hacking and data
security incidents. It has been predicted that by 2021 cybercrime might cost
the global economy $6.2 trillion annually. Threats to data security are a
constant concern for industry. Data plays a vital role in business growth and
analysis, and it is the task of administrators to protect it from being
manipulated or stolen. MongoDB includes security measures for defending
itself, controlling access to data and detecting changes in the database
(Anon., n.d.). The diagram below gives an overview of this security.
Figure 2: MongoDB
MongoDB supports external mechanisms for authentication and database access.
These include LDAP, Kerberos, PKI certificates and Windows Active Directory.
The Lightweight Directory Access Protocol (LDAP) is mostly used in business
computer networks and operates on a distributed directory (Hoberman, 2014). A
computer that wants to use LDAP must log in to the server and follow the
protocol.
Authentication provides a good deal of security, but strong authorization
services are required as well. In MongoDB, user permissions can be set
according to the access each user needs, and they can also be managed through
an LDAP server. Auditing is also provided; administrators can use it to track
access to the database in a log.
Encryption is one of the oldest and most effective measures for data security.
MongoDB uses it to encrypt data on the network, and a separate encrypted
storage engine protects stored data. These built-in features give proper
management and performance for data access and protection. Encrypted data can
only be accessed by authorized users.
3. Cassandra
Cassandra is a column-oriented, distributed, fault-tolerant, scalable and
high-performance database (Hewitt, 2010). High availability is difficult to
achieve with big data storage, so the data is partitioned and stored in
different locations. Cassandra provides this high availability, and its other
features are given below:
- Handles large amounts of data (big data).
- Access is fast and random.
- Schema is variable.
- All nodes eventually see the same data.
- Data can be processed and accessed quickly.
- Data is partitioned and distributed.
- Availability is higher than in other databases.
Availability, consistency and partition tolerance cannot all be fully achieved
at once. Cassandra gives high availability at the cost of some consistency. It
was developed by Avinash Lakshman to power Facebook's messaging search. In this
database every node plays the same role, so there is no single point of
failure. As in MongoDB, the data is distributed across a cluster (Ellis, 2009).
The replication strategies can be configured flexibly by the administrator
according to need. The database is designed as a distributed system so that it
can span multiple data centers and a large number of nodes.
It is well suited to disaster recovery. Adding new machines gives a
significant increase in read and write throughput. Data is automatically
replicated to a number of nodes to provide fault tolerance, which also helps
protect data in cloud deployments. Cassandra integrates with Hadoop, including
MapReduce support, and is also supported by Apache Hive (Abramova, 2014).
Cassandra has its own query language, CQL. It is an alternative to SQL that
adds a layer hiding the details of the database structure. Drivers are
available for Java (for example JDBC) and a number of other languages.
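Automatic replication to several nodes can be sketched with a simple hash ring. This is a simplified model under stated assumptions: `Ring` and `token` are hypothetical names, each node owns a single token, and MD5 stands in for Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string onto the ring (MD5 is an assumption made here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Peer-to-peer ring in which every node plays the same role."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = min(replication_factor, len(nodes))
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and take `rf` nodes."""
        start = bisect.bisect_right(self.tokens, (token(partition_key), ""))
        size = len(self.tokens)
        return [self.tokens[(start + i) % size][1] for i in range(self.rf)]


ring = Ring(["node1", "node2", "node3", "node4", "node5"])
print(ring.replicas("user:42"))  # three distinct nodes hold this row
```

Because every key is stored on several nodes, losing one machine still leaves other copies available, which is the fault tolerance described above.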
3.1 Architecture of Cassandra
The main components of Cassandra are the node, cluster, data center, table,
commit log, mem-table and bloom filter (Gajendran, 2012). This section
describes the architecture. First, it should be understood that Cassandra was
designed on the assumption that system failures can and do occur. Data is
distributed peer-to-peer, and all nodes are the same.
Data is partitioned automatically when it is written to the database. Hence
there is no specific place where data is written sequentially; it can be
placed on any node. A write first goes to the commit log and is then also
written to an in-memory structure, the mem-table (Bushik, 2012). The diagram
below shows the architecture of Cassandra: there are two Cassandra clusters,
each containing web client access and a number of nodes. The cluster
configuration is provided by a middle-tier architecture.
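The write path described above (commit log first, then mem-table) can be sketched with a toy node. The class `Node`, the `memtable_limit` parameter and the flush step to an on-disk table are illustrative assumptions; bloom filters and compaction are left out.

```python
class Node:
    """Toy model of a single node's write and read path."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only log used for durability
        self.memtable = {}     # in-memory structure with recent writes
        self.sstables = []     # flushed, immutable tables (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record in the commit log
        self.memtable[key] = value            # 2. then update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the mem-table as a sorted table and start a fresh one."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            if key in table:
                return table[key]
        return None


node = Node()
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 10)]:
    node.write(k, v)
print(node.read("a"), node.read("b"), len(node.sstables))  # prints "10 2 1"
```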
The architecture of Cassandra also supports replication of
data for fault tolerance and efficiency.
Figure 3: Architecture of Cassandra
3.2 Security
Data security is of great importance today. Industry needs data that cannot be
manipulated or accessed by third parties. Administrators can create users and
give them permission to access the database, using the CREATE USER command.
Cassandra stores users and their passwords internally in the cluster's own
database. Its query language can also be used to drop or alter such users
(Bushik, 2012). Permission management is in the hands of the administrator,
who can grant users different levels of permission to access data. Cassandra
provides a number of security features, given below:
3.2.1 Client-to-node encryption
This is an optional extra security feature provided by Cassandra. SSL helps
prevent data from being compromised in transit: communication between the
client and the cluster is protected with SSL encryption, which is configured
independently in Cassandra. For additional security, settings in the
cassandra.yaml file can be overridden at the virtual machine level, where the
configuration and protocol can be changed to meet an organization's
requirements. Cassandra uses SSL encryption for client-to-node and
node-to-node traffic and for server certificates. Data is protected on the way
from the client machine using Secure Sockets Layer, and data transferred
within the cluster is protected in the same way. Certificates have to be
generated for all of this protection.
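On the client side, an encrypted connection usually starts from a TLS context that verifies the server certificate. The sketch below uses only Python's standard `ssl` module; the helper name `make_client_tls_context` and the `ca_file` parameter are hypothetical, and passing the context to a Cassandra driver is assumed rather than shown.

```python
import ssl


def make_client_tls_context(ca_file=None):
    """Build a TLS context for the client end of a client-to-node link."""
    # PROTOCOL_TLS_CLIENT turns on certificate verification and hostname
    # checking by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file is not None:
        # Trust the certificate authority that signed the cluster's
        # certificates (the path is supplied by the caller).
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


ctx = make_client_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The server side still needs its own certificates and the matching encryption options in cassandra.yaml, which is the certificate generation step mentioned above.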
3.2.2 Authentication
Cassandra also supports pluggable authentication. The authenticator setting in
the cassandra.yaml file lets administrators enable this feature.
AllowAllAuthenticator is the default; it performs no authentication and does
not require credentials. PasswordAuthenticator is the built-in alternative,
and the credentials it stores are encrypted (Hewitt, 2010).
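The difference between the two authenticators can be sketched with toy Python classes. They borrow Cassandra's class names for readability but are not its implementation; in particular, the salted PBKDF2 hashing used here is an assumption standing in for whatever Cassandra does internally.

```python
import hashlib
import hmac
import os


class AllowAllAuthenticator:
    """Accepts every connection without asking for credentials."""

    def authenticate(self, username=None, password=None):
        return True


class PasswordAuthenticator:
    """Checks a username and password against stored, hashed credentials."""

    def __init__(self):
        self._users = {}  # username -> (salt, derived key)

    def create_user(self, username, password):
        salt = os.urandom(16)
        self._users[username] = (salt, self._derive(password, salt))

    def authenticate(self, username, password):
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, self._derive(password, salt))

    @staticmethod
    def _derive(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                   salt, 100_000)


auth = PasswordAuthenticator()
auth.create_user("alice", "s3cret")
print(auth.authenticate("alice", "s3cret"), auth.authenticate("alice", "x"))
```

Only the salt and derived key are kept, so the plain-text password is never stored.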
3.2.3 Authorization
Authorization is configured in Cassandra using the authorizer setting in the
cassandra.yaml file. It is set to AllowAllAuthorizer by default, which performs
no permission checks and gives every user all permissions. Cassandra thus
provides options for adding security and changing it according to use. It is
flexible enough to provide the level of security required by the organization
and its administrators (Ellis, 2009).
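The effect of swapping the authorizer can be sketched in the same toy style. `GrantAuthorizer` is a hypothetical stand-in for a permission-checking authorizer, and the permission and resource strings are illustrative only.

```python
class AllowAllAuthorizer:
    """Default behaviour: no permission checks at all."""

    def is_allowed(self, user, permission, resource):
        return True


class GrantAuthorizer:
    """Hypothetical authorizer that allows only explicitly granted access."""

    def __init__(self):
        self._grants = set()  # (user, permission, resource) triples

    def grant(self, user, permission, resource):
        self._grants.add((user, permission, resource))

    def revoke(self, user, permission, resource):
        self._grants.discard((user, permission, resource))

    def is_allowed(self, user, permission, resource):
        return (user, permission, resource) in self._grants


authorizer = GrantAuthorizer()
authorizer.grant("alice", "SELECT", "shop.orders")
print(authorizer.is_allowed("alice", "SELECT", "shop.orders"))  # True
print(authorizer.is_allowed("alice", "MODIFY", "shop.orders"))  # False
```

With the allow-all variant both checks would pass, which is why the default is unsuitable once different users need different levels of access.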