To SQL or to NoSQL? Weighing the Trade-offs

Posted on Posted in Enterprise Information Management, Uncategorized

Over the past thirty years, relational databases have proved a reliable option for storing business information.  With the rise of Web 2.0, the need for scalable systems to accommodate vast quantities of data introduced a series of data storage and retrieval technologies that provide alternatives to the then-standard relational model.  While these NoSQL systems fulfil a need for a high degree of flexibility, they cannot compete with the security and authority of relationally-organized information.  Recently, the past decade’s debate of whether NoSQL databases will usurp the dominance of relational databases faces an increased complication in the potential of Web 3.0 disruption.  Emerging research across the field of information science indicates a trend not of one system out-competing another, but rather of increased specialization of database technologies, tailored to specific problems, and specific types of data.

Introduction

The relational-database model is attributed to Edgar Codd, who developed it in 1970 [6].  Relational databases management systems (RDBMS’s) structure data in sets of tables fitting pre-defined categories.  The tables each contain columns, representing pre-defined sub-categories, which contain unique instances of data in rows.  The relational-database model functionally allows users to access and/or reassemble the data in different ways, without reorganizing the tables [6].  To modify data, RDBMS’s generally utilize Structured Query Language (SQL), a programming language designed to work with rationally organized and fixed tables.  Many RDBMS’s utilize the following ACID principles to ensure the correctness of transitions [5,9].  Together, these properties are used to assure the valid state of the database, when viewed at a single point in time [4].

NoSQL database management systems (DBMS’s) emerged directly from the need for Web 2.0 companies to ingest, store and process volumes of growing data, flexibly.  These developers and communities – companies such as Google, Amazon, Yahoo and Facebook – realized that RDBMS’s were not suitable for the real-time management of voluminous, heterogeneous social media data [6, 8].  Instead of relying on a fixed, relational schema, NoSQL databases store values that can be indexed for retrieval by keys.  In general, these DBMS’s use column-oriented databases, document-based stores, and graph databases, and object-oriented databases architectures, which are uncommon in traditional RDBMS design [1, 5, 6, 7, 8].  Further, NoSQL systems may be classified as non-aggregate oriented, which heavily utilize relationships, or aggregate oriented, which divide relationships [1].

Seeking Equilibrium

Both relational and non-relational database technologies trade-off numerous benefits and risks, depending on the applications, data types, and objectives at hand.  For instance, because SQL systems are so prevalent, they possess a variety of features that make them user-accessible, such as a standardized language, customer support, standard user interfaces, and a plethora of design/management tools.  However, because these systems are so ubiquitous across a variety of applications, there is a lack of information guiding business-user decisions of when to use non-relational alternatives, and which alternative is optimal for a particular situation.  Although it promises agility, the trend towards implementing customized NoSQL solutions throughout an enterprise still smells too strongly of “early-adopter risk”.  For many established organizations, the industry debate between SQL or NoSQL is really a struggle between two devils – SQL systems, at least, are known.

Consistency vs. Performance

The most significant trade-off between SQL and NoSQL systems – i.e. relational databases vs. "everything else" – is the security and trustworthiness of vital, operational data for the agility, scalability and flexibility of big data.  Relational databases are specialized to structure data in a specific, well-defined, well-organized model [9].  Because they fully support ACID principles, transactions are not only highly reliable, the database also guarantees crash recovery.  The security risks that do exist are defined, and new research and product iterations continually improve against these gaps [9].  However, the same fail-saves that guarantee data also restrain performance [6].  The relational database stores multiple copies of data, which is centralized and unencrypted; in this way, the RDBMS is both inefficient and vulnerable to fraud, error and security attacks [3].

NoSQL databases do not require pre-defined schema, relationships or keys; less complex models translate to less time spent preparing data [7].  Additionally, that NoSQL systems don’t fully support ACID principles also translates into faster performance of storing and retrieving data [7].  Theses performance-focused design features lend themselves particularly to the manipulation of Big Data.  At the same time, however, their design leaves security as an afterthought [9].  Because NoSQL systems cannot be evaluated using ACID, transaction reliability is not natively assured [5, 6, 9].  Where ACID principles are programmed into NoSQL systems, designers face a performance vs. consistency trade-off – i.e. performance is negatively affected [6].  Further, there is some research that indicates a wide variation in the distribution of NoSQL database performance measures based on the type of operation performed, as well as the number of synchronous users [8].  Without ACID support, changes made in close proximity can overwrite other changes made within the database, especially if they occur close in time.

Precision Vs. Scale

Scaling is the bane of SQL databases.  Designed to structure data requiring precision and accuracy through adherence to ACID principles, relational databases do not join tables well across distributed architectures.  RDBMS scalability is thus dependent upon vertical architecture improvements (s.a. increased RAM and storage capacity).  In vertical scaling, the costs and inefficiencies incurred by potential improvement initiatives are both impractical and prohibitive.  Lacking the features of scalability and data distribution, relational databases are not well suited to handle data warehouse applications or rise to the challenge of handling big data [9].

Where relational databases falter, NoSQL designs leap forward.  Their dependence on horizontal scalability means that more server nodes can be added inexpensively [7, 9].  In non-relational models, the lack of dependence on table joins means that data can be distributed across a cluster of commodity servers [5, 7].  The ability to distribute data is fundamental to the requirement to maintain large volumes of data.  Further, the scalability of NoSQL systems also supports a high degree of availability for large data sets [9].  According to Big Data engineer Jarosław Kurpanik, operating on a cluster environment supports both the high productivity of data recording and minimizes readout delays [5].  However, the tradeoff for scalability is in the need for NoSQL systems to give up ACID properties.  This trade-off results in less-strict assurance of the transactions.  For some applications, such as social-media, this trade-off is imperative in favor of performance.  However, as more organizations look to leverage the vendor-promised performance potential of NoSQL systems, the assurance of their operational data becomes of heightened importance.

Complexity vs. Agility

The structured nature of the relational model works best in representing structured data.  Complexity becomes an issue when user’s data doesn’t fit easily into tables [9].  Then, the process of data preparation can be intensive in fitting data within a relational model.  Further, the queries required to access this less structured information housed within a relational model can entail large amounts of complex code [6].  The complexity of representing anything but structured data within an RDMBS imposes the trade-off of the agility required of modern developments.

NoSQL datasets aim to both reduce modelling complexity and technical requirements in favor of performance agility.  As a result, NoSQL systems flexibly handle the storage of unstructured, semi-structured or structured data [1, 9].  In some cases, it is even possible to represent data based on two or more models [1].  This flexibility allows systems to change and evolve over time.  Because NoSQL systems are still immature, they lack many of the built-in support features familiar to relational databases.  While the reduction of these features reduces the complexity of the systems, it introduces complexity to potential business consumers.  Principally, the lack of a standard “No-SQL” query language for DBMS’s introduces a level of complexity to end-users.  Without SQL, these systems require manual programming.  Further, the queries themselves range in complexity, from fast for simple tasks, and cumbersome for more complex executions [6].

Discussion

Both relational and non-relational databases lend themselves to particular uses.  Relational databases are specialized to serve business operations, financial transactions, asset management and personal data management [3, 9].  SQL systems best support structured data requiring precision [6].  Historically designed and optimized to assure transaction consistency, RDBMS’s currently provide the foundation of many OLTP and OLAP applications [2].  NoSQL databases, on the other hand, are specialized to process Big Data.  They are specialized to support massively parallel and geographically distributed database systems, such as web applications; the trade-off for performance over precision lends them to public and content-centric applications [9]. Further, the flexibility of NoSQL systems in managing text as well as capability for agile stream processing lends them to data warehouse applications [9]. An emerging need is for precision in processing large data, and here we find new breeds of hybrid SQL/NoSQL architectures.

With all of these trade-offs to weigh and balance, and with more technological innovations emerging on the horizon, it is clear that the question of whether or not NoSQL systems will outcompete SQL systems is inadequate.  Database technologies are clearly diversifying – and this is merely the first wave.  Hybrid SQL/NoSQL technologies seek to lend scalability and flexibility of NoSQL models to familiar, relational environments.  Blockchain architectures are beginning to shift the paradigm of centralized ownership of mutable data to one of decentralized ownership of immutable transactions.  A more important question, particularly for businesses seeking to invest in their analytic capability, is when it is ideal to use which technologies to most effectively turn information into actionable insight.  However, in order to build an adoption-support decision support framework, an abundance of research across a variety of scenarios is yet required.

References

(1) Bajat, B., Višnjevac, N., Mihajlović, R., Šoškić, M., & Cvijetinović, Ž. (2017). Using NoSQL databases in the 3D cadastre domain. Geodetski Vestnik, 61(03), 412-426. doi:10.15292//geodetski-vestnik.2017.03.412-426

(2) Baralis, E., Valle, A. D., Garza, P., Rossi, C., & Scullino, F. (2017). SQL versus NoSQL databases for geospatial applications. 2017 IEEE International Conference on Big Data (Big Data), 3388-3396. doi:10.1109/bigdata.2017.8258324

(3) Brandon, D. (2016). The blockchain: The future of business information systems? International Journal of the Academic Business World, 10(2), 33-40.

(4) Greenspan, G. (2016, March 17). Blockchains vs centralized databases. Retrieved from https://www.multichain.com/blog/2016/03/blockchains-vs-centralized-databases/

(5) Kurpanik, J. (2017). NoSQL databases as a data warehouse for decision support systems. Journal of Science of the Military Academy of Land Forces, 49(3), 124-131. doi:10.5604/01.3001.0010.5128

(6) Leavitt, N. (2010). Will NoSQL databases live up to their promise? Computer, 43(2), 12-14. doi:10.1109/mc.2010.58

(7) Lee, K. K., Tang, W., & Choi, K. (2013). Alternatives to relational database: Comparison of NoSQL and XML approaches for clinical data storage. Computer Methods and Programs in Biomedicine, 110(1), 99-109. doi:10.1016/j.cmpb.2012.10.018

(8) Li, Y., & Manoharan, S. (2013). A performance comparison of SQL and NoSQL databases. 2013 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). doi:10.1109/pacrim.2013.6625441

(9) Mohamed, M. A., Altrafi, O. G., & Ismail, M. O. (2014). Relational vs. NoSQL databases: A survey. International Journal of Computer and Information Technology, 03(03), 598-601.