The buck stops here – NoSQL viewed in another light

Written by Thomas Widmann

Wednesday, 20 August 2014 00:00

The well-known statement “The buck stops here”, which stood on the desk in the White House, could just as well be placed on the desks of countless IT architects.

Source: Buck Stops Here sign on Truman’s Desk, studyourhistory.com/archives/596

The concepts and technologies of the mainframe, now more than 45 years old, are experiencing an almost unnoticed renaissance: the mainframe is not dead. Tremendous performance and reliability are its distinguishing features. These properties are at once the hallmark and the desired goal of any IT architecture.

This also applies to the well-known enterprises of the Web X.0 era, such as search engines and online marketplaces. In the end, however, the question remains why NoSQL, and not the mainframe, is linked almost inextricably with these companies. A closer look makes the reasons obvious. What makes all the difference is not computationally intensive operations but the sheer amount of information that has to be processed.

The Price of Availability

Looking at the mainframe alone, the installed capacity, measured in MIPS (millions of instructions per second), has almost doubled in recent years. Costs are coupled almost linearly to the MIPS figures.

On the mainframe, then, one effectively pays for the number of possible operations. In many organizations these are in fact always the same technical operations. Of course, the expected number of operations often differs from the number actually performed. The economics of the mainframe are therefore directly tied to knowing the extent of the technical operations required. For the Web X.0 companies, this was and still is an unknown. Perhaps that is the main reason why these companies have not relied on the mainframe, and why they had to find alternatives to it.

The alternatives they found fall under the umbrella term “NoSQL”. The term covers not just the NoSQL databases themselves but the entire ecosystem surrounding them. NoSQL is thus the answer to a dynamic and rapidly changing business environment. But how can NoSQL serve this environment when neither the volume of data nor the required availability (retrievability) decreases?

The pillars of the mainframe

The mainframe is distinguished by three essential features.

  • Scaling. On the mainframe, transactions are distributed and parallelized across the free MIPS in a seemingly optimal way.
  • Throughput. By keeping data in VSAM files, the mainframe achieves optimal I/O throughput and efficient access to large amounts of data.
  • Reliability. The mainframe’s hardware architecture ensures its reliability.
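The following Python sketch makes the scaling bullet more concrete. It models the idea of spreading transactions over the free capacity with a greedy "most free capacity first" rule. This is a toy model built on assumptions and does not describe how z/OS actually dispatches work. The processor names, capacities and transaction costs are hypothetical, and "capacity" stands in loosely for free MIPS.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Transaction:
    name: str
    cost: int  # hypothetical cost, in the same units as processor capacity


def dispatch(transactions, processors):
    """Greedily assign each transaction to the processor with the most free capacity.

    `processors` maps a (hypothetical) processor name to its free capacity.
    Returns a dict mapping each processor name to the transaction names it received.
    """
    # heapq is a min-heap, so free capacity is stored negated; ties break on name.
    heap = [(-free, name) for name, free in processors.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in processors}
    # Placing the largest transactions first usually gives a more even balance.
    for tx in sorted(transactions, key=lambda t: t.cost, reverse=True):
        neg_free, name = heapq.heappop(heap)
        assignment[name].append(tx.name)
        # Free capacity shrinks by the transaction's cost.
        heapq.heappush(heap, (neg_free + tx.cost, name))
    return assignment


txs = [Transaction("a", 60), Transaction("b", 50), Transaction("c", 40), Transaction("d", 30)]
print(dispatch(txs, {"cp0": 100, "cp1": 100}))  # {'cp0': ['a', 'd'], 'cp1': ['b', 'c']}
```

In the example, both processors end up with a load of 90. The greedy rule only approximates an optimal distribution, which fits the article's "seemingly optimal".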

Replicating the scaling and throughput of the mainframe elsewhere requires extensive experience, for example with Java Enterprise applications. Once these options are exhausted, the data management system itself has to be addressed. With relational databases, however, one sooner or later reaches the limits of feasibility. A real-world example shows that this point is reached even with a technically simple problem. One fails right away when trying to map an insurance product with thousands of product options across millions of signed contracts efficiently, while keeping the data reasonably retrievable. Even with optimal indexes, query response times fall into the unusable range. NoSQL data stores such as Google’s Bigtable are the answer to this problem. Using a free implementation such as Apache HBase, the comparison between mainframe and NoSQL concepts is easy to establish.
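The relational mapping and the Bigtable approach differ mainly in their data model. The Python sketch below is a minimal in-memory illustration of a Bigtable/HBase-style sparse, wide row. It is not the HBase client API. Each contract row stores only the options it actually uses, as "family:qualifier" columns, instead of one mostly empty column per option. The row keys, the column family "opt" and the option names are hypothetical.

```python
from collections import defaultdict


class WideRowTable:
    """Toy in-memory model of a Bigtable/HBase-style table (not the HBase API).

    Each row key maps to a sparse dict of "family:qualifier" -> value, so a row
    holds only the columns that were actually written.
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Writing a cell creates the row and the column on demand.
        self._rows[row_key][column] = value

    def get(self, row_key):
        # A missing row simply reads as empty; nothing is stored for it.
        return dict(self._rows.get(row_key, {}))

    def rows_with(self, column):
        # Full scan over all rows; a real design would typically rely on
        # row-key design or a secondary index for such queries instead.
        return sorted(key for key, cols in self._rows.items() if column in cols)


# Hypothetical contracts and product options.
table = WideRowTable()
table.put("contract#0000001", "opt:glass_cover", "yes")
table.put("contract#0000001", "opt:deductible", "500")
table.put("contract#0000002", "opt:deductible", "1000")
print(table.get("contract#0000002"))  # {'opt:deductible': '1000'}
```

An option that a contract does not use occupies no storage at all. That is why thousands of possible options per product do not inflate every row.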

NoSQL compared to the mainframe

The ecosystem around HBase includes numerous other Apache Hadoop projects. When these are included in the comparison, the parallels with the mainframe become immediately apparent. For example, the data storage of Mahout, like that of HBase, is based on the Hadoop Distributed File System, HDFS for short. HDFS is a distributed file system whose distributed design makes it resilient to hardware failures. In addition to this fault tolerance, it is explicitly designed for batch processing of files up to terabytes in size (so-called streaming data access to large data sets) [http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html]. Files in the mainframe environment exhibit the same characteristics.
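HDFS achieves its fault tolerance by splitting files into large blocks and storing several replicas of each block on different DataNodes. The sketch below illustrates that idea in Python under simplifying assumptions. Replicas are placed round-robin on distinct nodes, whereas real HDFS uses a rack-aware placement policy. The node names and sizes are hypothetical. HDFS uses a default replication factor of 3, and its default block size depends on the Hadoop version.

```python
import math


def plan_blocks(file_size, block_size, datanodes, replication=3):
    """Split a file into fixed-size blocks and put each block's replicas on distinct nodes.

    Simplified model: round-robin placement, ignoring the rack awareness of the
    real HDFS placement policy. Returns a list of (block_index, length, nodes).
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of datanodes")
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - i * block_size)
        # Consecutive nodes from a rotating start, so no two replicas share a node.
        nodes = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, length, nodes))
    return plan


def survives(plan, failed):
    """True if every block still has at least one replica on a node not in `failed`."""
    return all(any(node not in failed for node in nodes) for _, _, nodes in plan)


# Hypothetical cluster and sizes (in MB).
plan = plan_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"])
print(survives(plan, {"dn1", "dn2"}))  # True: every block keeps a replica on dn3 or dn4
```

With three replicas per block, any two nodes can fail without data loss. Losing three of the four nodes, however, leaves at least one block without a surviving copy.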

With files this large, an efficient access method becomes mandatory. In the mainframe environment, for example, VSAM files provide it. A VSAM file, however, need not represent only a physical file; it also has logical properties. These logical properties correspond to the concept of Apache HBase [http://hbase.apache.org/]. The keys in the VSAM catalog, the VVDS (VSAM Volume Data Set), are similar to the RowKey of a Bigtable data store [http://www.redbooks.ibm.com/abstracts/sg246105.html?Open]. This means that alternatives for reliability and data throughput (that is, availability) can readily be found in the world of commodity hardware.
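The analogy between keyed VSAM access and the HBase RowKey rests on one property. Records are kept sorted by key, so a lookup or a range scan can jump directly to the right position. The following Python sketch models only that property, using binary search. It is not VSAM or HBase code, and the key format "c#001" is a hypothetical row-key design. The scan uses a half-open range with an inclusive start key and an exclusive stop key, as an HBase scan does by default.

```python
import bisect


class SortedKeyStore:
    """Toy keyed store that keeps records in key order (not VSAM or HBase code).

    Point lookups and range scans use binary search over the sorted keys.
    """

    def __init__(self):
        self._keys = []
        self._values = []

    def _find(self, key):
        # Index of the first stored key that is >= key.
        return bisect.bisect_left(self._keys, key)

    def put(self, key, value):
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing record
        else:
            self._keys.insert(i, key)  # insert while keeping keys sorted
            self._values.insert(i, value)

    def get(self, key):
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo, hi = self._find(start), self._find(stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


# Hypothetical row keys: "<prefix>#<number>".
store = SortedKeyStore()
for key in ["c#003", "d#001", "c#001", "c#002"]:
    store.put(key, key.upper())
# "$" sorts directly after "#", so this scan returns exactly the "c#" prefix.
print([k for k, _ in store.scan("c#", "c$")])  # ['c#001', 'c#002', 'c#003']
```

Because the order of the keys determines which records are adjacent, designing the key is the main lever for efficient access in both worlds.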

But what about scalability? The NoSQL environment offers suitable tools for it as well. With MapReduce, for example, complex tasks on HBase data can be processed in a very short time [http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html]. In MapReduce, extensive computations (map operations) run in parallel as far as possible. Successive merge steps (reduce operations) then reduce the results of the map operations to the final result [http://en.wikipedia.org/wiki/MapReduce]. Anyone familiar with the mainframe and its processing strategies under z/OS can easily draw parallels.
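The map, shuffle and reduce steps described above can be sketched in Python with the classic word-count example. This is a single-process illustration and not Hadoop's API. A thread pool stands in for the parallel map tasks that a cluster would spread across nodes. In CPython the threads mainly show that the map tasks are independent; they deliver little real CPU parallelism. The grouping step plays the role of the framework's shuffle.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped):
    """Group the intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines, workers=4):
    # Map tasks are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


print(word_count(["The buck stops here", "the mainframe is not dead"]))
```

On a real cluster, each reduce key can also be processed on a different node. That is why the amount of data, rather than the power of a single machine, sets the limit.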

Conclusion

In conclusion, the transition from the mainframe to commodity hardware and development environments is possible. But technical feasibility is not the only critical factor. It is crucial that the existing workforce’s technical know-how can continue to be used. It is a question of a proper migration, of profitability and of strategic decisions. The mainframe is not dead, but it can be superseded.