NoSQL = Not Only SQL (Structured Query Language)
The name says that "NOT ONLY STRUCTURED QUERIES would exist in data science".
With Cloud / Big Data / IoT, new concepts arrived: Volume, Variety, Velocity.
RDBMS could not respond well to these needs.
- Data can vary (Variety). This means UNSTRUCTURED DATA, while RDBMS like
STRUCTURED DATA.
- Data volume may be of any size, up to PBs (petabytes), and RDBMS do not operate well with PB data sizes.
Also, data continuously grows, but RDBMS scalability has finite limits.
RDBMS generally scale up (a bigger machine); for practically unlimited scalability, scaling out (more machines) is needed.
For scaling out, NoSQL systems always work with replicated, distributed (partitioned/sharded) data.
Data replication is built in, and it is a MUST, not an OPTION.
- Data velocity is high, so the DB must always perform well.
RDBMS can have complex, long-running queries involving JOINs or FULL TABLE SCANS etc.
JOINs are generally not supported (or strongly discouraged) in NoSQL systems.
Design is simpler in NoSQL, and that simplicity brings better performance.
(NoSQL systems are Key-Value Stores at heart. Key-value logic is simpler, and having NO JOINs is another simplification.)
Query performance is generally more predictable and faster.
(No surprises like full table scans, which makes response times more predictable.)
NoSQL systems are designed for such
big data processing needs.
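A minimal sketch of the scale-out idea from the Volume point: each key is assigned to a partition by hashing, and the data is then copied to several nodes. The node names, the replication factor of 3 and the simple "hash modulo node count" placement are assumptions for illustration only; real systems such as Cassandra use consistent hashing with tokens, so that adding a node moves only part of the data.

```python
import hashlib

# Hypothetical four-node cluster; names are illustrative only.
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 3  # assumption: every key is stored on 3 nodes


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replicas_for(key: str) -> list[str]:
    """The hash picks the first node; the replicas are the next nodes around the ring."""
    start = partition_for(key, len(NODES))
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


if __name__ == "__main__":
    for user_id in ["user:1", "user:2", "user:3"]:
        print(user_id, "->", replicas_for(user_id))
```

Because placement is computed from the key alone, any node can work out where a key lives without a central lookup; the trade-off of the modulo scheme is that changing the node count reshuffles most keys.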
NoSQL systems care about PERFORMANCE a lot, while RDBMS care about CONSISTENCY a lot.
(The more you care about CONSISTENCY in a NoSQL system, the more LATENCY you get.)
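The consistency/latency trade-off can be sketched with a toy model: if a write is only confirmed after W replicas acknowledge it, the client waits as long as the W-th fastest replica. The replica latencies below are made-up numbers, and the ONE / QUORUM / ALL labels are borrowed from Cassandra's consistency levels purely as names.

```python
# Hypothetical response times (ms) of 5 replicas for one write; not measured data.
replica_latencies_ms = [4, 7, 9, 30, 120]


def write_latency(latencies: list[float], acks_required: int) -> float:
    """The coordinator can reply once `acks_required` replicas have acknowledged,
    so the write takes as long as the acks_required-th fastest replica."""
    if not 1 <= acks_required <= len(latencies):
        raise ValueError("acks_required must be between 1 and the replica count")
    return sorted(latencies)[acks_required - 1]


# 1 ack, a majority (3 of 5) and all 5 acks.
for acks, label in [(1, "ONE"), (3, "QUORUM"), (5, "ALL")]:
    print(f"{label:>6}: waits {write_latency(replica_latencies_ms, acks)} ms")
```

With these numbers, ONE waits 4 ms, QUORUM waits 9 ms and ALL waits 120 ms: the stronger the guarantee, the more a single slow replica dominates.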
RDBMS are generally READ-OPTIMIZED, while NoSQL systems are READ & WRITE
OPTIMIZED.
- NoSQL systems have a flexible schema. (If your app still needs rigid schema
rules, those live in application scope with NoSQL.)
(The app can implement SCHEMA-ON-READ semantics: it applies a schema and
projects only the fields it needs at the time it reads the data.)
- NoSQL systems are generally
Write-Once-Read-Many systems.
- Relations between tables etc. are not as rigid in NoSQL as in RDBMS.
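The schema-on-read point can be sketched in a few lines: documents are stored as-is with varying fields, and the application applies its own schema when reading. The field names, the READ_SCHEMA dict and the rule that maps an older "fullName" field to "name" are all hypothetical, chosen only to show where schema rules live in this model.

```python
import json

# Raw documents as they might sit in a schemaless store: fields vary per record.
RAW_DOCS = [
    '{"id": 1, "name": "Ada", "email": "ada@example.com", "age": 36}',
    '{"id": 2, "name": "Linus", "tags": ["kernel"]}',
    '{"id": 3, "fullName": "Grace", "age": "85"}',
]

# The schema the application wants at read time (hypothetical).
READ_SCHEMA = {"id": int, "name": str, "age": int}


def read_with_schema(raw: str) -> dict:
    """Parse a stored document and project it onto READ_SCHEMA.
    Missing fields become None, extra fields are dropped, types are coerced."""
    doc = json.loads(raw)
    # Hypothetical app-level rule: older records used "fullName" instead of "name".
    if "name" not in doc and "fullName" in doc:
        doc["name"] = doc["fullName"]
    projected = {}
    for field, field_type in READ_SCHEMA.items():
        value = doc.get(field)
        projected[field] = field_type(value) if value is not None else None
    return projected


rows = [read_with_schema(r) for r in RAW_DOCS]
print(rows)
```

The store never rejected the inconsistent records; the rigid rules (required fields, types, renamed fields) are enforced by the application when it reads.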
NoSQL systems are categorized as
KEY-VALUE-STORE / DOCUMENT-STORE / COLUMN-STORE / GRAPH-BASED
Only graph-based NoSQL systems carry relationships, stored inside the graph (but
not as rigidly as in RDBMS).
The other 3 NoSQL categories do not carry data relationships. (These 3 are
called AGGREGATE STORES.)
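A minimal sketch of what "aggregate store" means in practice, contrasted with a JOIN: the relational style keeps customers and orders apart and combines them at query time, while the aggregate style writes the whole customer-with-orders unit under one key so a read is a single lookup. The data, the "customer:<id>" key format and the helper names are hypothetical; a plain dict stands in for the key-value store.

```python
# Relational style: two "tables" that must be JOINed on customer_id at query time.
customers = {42: {"name": "Ada"}}
orders = [
    {"order_id": "o-1", "customer_id": 42, "total": 30.0},
    {"order_id": "o-2", "customer_id": 42, "total": 12.5},
]


def orders_with_join(customer_id: int) -> dict:
    """Emulates a JOIN: scan the orders table and combine it with the customer row."""
    return {
        "name": customers[customer_id]["name"],
        "orders": [o for o in orders if o["customer_id"] == customer_id],
    }


# Aggregate-store style: a dict stands in for the key-value store.
kv_store: dict[str, dict] = {}


def put_customer_aggregate(customer_id: int) -> None:
    """Denormalize once, at write time, and store the whole aggregate under one key."""
    kv_store[f"customer:{customer_id}"] = orders_with_join(customer_id)


def get_customer_aggregate(customer_id: int) -> dict:
    """A read is a single key lookup: no JOIN and no scan."""
    return kv_store[f"customer:{customer_id}"]


put_customer_aggregate(42)
print(get_customer_aggregate(42))
```

The cost moves from read time to write time: if the customer's name changes, every aggregate that copied it must be rewritten, which is why aggregate stores fit data that is mostly read as a unit.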
Data relationships may be important for AI applications, fraud detection
applications, recommendation engines etc.
Graph databases also come in different graph models, like "property graphs" /
"hypergraphs" / "RDF triple stores".
Sample DB names for each NoSQL category are listed below.
- KEY-VALUE-STORE NOSQL DB: LevelDB, Oracle NoSQL, Redis
- DOCUMENT-STORE NOSQL DB: MongoDB, CouchDB
- COLUMN-STORE NOSQL DB: BigTable, Cassandra, HBase
- GRAPH-BASED NOSQL DB: DataStax, Neo4j, OrientDB, Hashgraph
- Key-Value (KV) stores are the essential form of NoSQL systems. They are
simple and fast.
- Document-based systems are KV stores with more advanced features.
You can store any kind of hierarchical data with document-based solutions.
- Column-based solutions can store as many columns as the application needs.
E.g., with Oracle Database you can't create a table with more than 1000 columns.
In HBase, you can create tables with millions of rows and millions of columns.
- Graph-based solutions keep relationships.
That information can be useful for AI. The GREMLIN API (the graph traversal
language of Apache TinkerPop) can be used for querying.
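The kind of relationship query a graph store is built for can be sketched with a plain adjacency list: a "friends of friends" recommendation and a shortest-path check of the kind used when looking for fraud rings. This is ordinary Python, not Gremlin or any graph database's API, and the "knows" edges are invented for illustration.

```python
from collections import deque
from typing import Optional

# A tiny graph as an adjacency list: vertex -> set of vertices it "knows".
# Vertex names and edges are made up for illustration.
KNOWS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def recommend_friends(start: str) -> set[str]:
    """People exactly two hops away who are not already direct friends."""
    direct = KNOWS.get(start, set())
    two_hops = {fof for friend in direct for fof in KNOWS.get(friend, set())}
    return two_hops - direct - {start}


def hops_between(src: str, dst: str) -> Optional[int]:
    """Breadth-first search: length of the shortest path, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in KNOWS.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(recommend_friends("alice"))
print(hops_between("bob", "erin"))
```

Here recommend_friends("alice") yields dave and erin, and bob reaches erin in 3 hops. In Gremlin, a roughly similar two-hop walk would start like `g.V().has('name','alice').out('knows').out('knows').dedup()`; a graph database stores the edges directly, so such traversals do not need JOINs.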
JSON (JavaScript Object Notation) is the most common native data modelling
format in NoSQL, especially for document stores, but there may be variations
like MongoDB's BSON (Binary JSON). Each DB can also have its own query
language, like Cassandra CQL, the HBase shell, HiveQL (Hive over HBase) etc.
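A short sketch of why JSON suits hierarchical data: one document holds nested objects and arrays, and fields are addressed by a dotted path, similar in spirit to the dot notation document stores such as MongoDB use in queries (e.g. "address.city"). The document content and the get_path helper are hypothetical; this is not any database's API.

```python
import json

# A hierarchical customer document, as a document store might hold it (as JSON or BSON).
doc_text = """
{
  "_id": "c-42",
  "name": "Ada",
  "address": {"city": "London", "geo": {"lat": 51.5, "lon": -0.12}},
  "orders": [{"id": "o-1", "total": 30.0}, {"id": "o-2", "total": 12.5}]
}
"""
doc = json.loads(doc_text)


def get_path(document: dict, path: str):
    """Follow a dotted path such as "address.geo.lat".
    Numeric parts index into lists (hypothetical helper, for illustration)."""
    current = document
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


print(get_path(doc, "address.city"))
print(get_path(doc, "orders.1.total"))
```

The nested address and the order list travel with the customer in one document, so no separate tables or JOINs are needed to read them back.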
NoSQL DB alternatives are still constrained by the CAP theorem. This can also
be a criterion for choosing your NoSQL database. For example, BigTable cares a
lot about CONSISTENCY. MongoDB, with its single-primary replica sets, also
favours CONSISTENCY over AVAILABILITY in the event of a NETWORK PARTITION (the
side without a majority of members cannot accept writes). Cassandra optimizes
for AVAILABILITY, LATENCY and RELAXED CONSISTENCY.
In some DB solutions, these trade-offs can be configured with tunables that set
how much to care about each property (for example, Cassandra's per-query
consistency levels such as ONE, QUORUM and ALL).
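A sketch of the reasoning behind such tunables, using the classic Dynamo-style quorum rule: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N, so it sees at least one copy of the latest write. The brute-force check confirms the rule on small clusters. This is a simplification that ignores failures, repair and clock issues; it is not any specific database's implementation.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Quorum rule: read and write replica sets must overlap when R + W > N."""
    return r + w > n


def always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-subset of replicas intersect every W-subset?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


N = 3
settings = {
    "read ONE / write ONE": (1, 1),
    "read QUORUM / write QUORUM": (quorum(N), quorum(N)),
    "read ONE / write ALL": (1, N),
}
for name, (r, w) in settings.items():
    print(f"{name}: R={r}, W={w}, overlap guaranteed={is_strongly_consistent(N, r, w)}")
```

With N = 3, ONE/ONE gives low latency but no overlap guarantee, while QUORUM/QUORUM (2 + 2 > 3) and ONE/ALL (1 + 3 > 3) both guarantee it; they just place the waiting on different sides (reads vs writes).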