The buck stops here – NoSQL in a different light

Written by Thomas Widmann


Wednesday, 20 August 2014 00:00

The well-known motto “The buck stops here”, displayed on President Truman’s desk in the White House, applies equally well to the countless desks of IT architects.

[Image: “The Buck Stops Here” sign on Truman’s desk. Source: studyourhistory.com/archives/596]

The concepts and technologies of the mainframe, now more than 45 years old, are experiencing an almost unnoticed renaissance: the mainframe is not dead. It is characterised by enormous performance and reliability, and these characteristics are both a benchmark and a goal for every IT architecture.

This also applies to the well-known companies of the Web X.0 age, such as search engines or marketplaces. The question therefore arises why it is NoSQL, and not the mainframe, that is almost inseparably linked to these companies. On closer inspection the reasons are obvious: it is not computationally intensive operations but the sheer volume of stored data to be processed that makes the difference.

Price of availability

Looking at the mainframe alone, the number of installed mainframe MIPS has almost doubled in recent years. Costs rise almost linearly with the number of MIPS.


Hence, with the mainframe, one effectively pays for the number of possible operations. In many organisations, this capacity is sized to the expected number of business operations. Of course, expectations often differ from the actual number of operations. The profitability of the mainframe is thus directly linked to knowing the scope of the business operations to be expected. For Web X.0 companies this figure is, and always was, unknown. It is possibly the decisive reason why these companies did not rely on the mainframe and had to find alternatives to it.
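The link between paid-for capacity and actual usage can be made concrete with a small calculation. The following is only a minimal sketch, assuming a strictly linear price per MIPS; the function name, the price and the workload figures are invented for illustration and are not taken from any real price list.

```python
def cost_per_used_mips(provisioned_mips: float, used_mips: float,
                       price_per_mips: float) -> float:
    """Effective cost per MIPS actually used, under an assumed linear price model."""
    if used_mips <= 0:
        raise ValueError("used_mips must be positive")
    if used_mips > provisioned_mips:
        raise ValueError("cannot use more capacity than was provisioned")
    # Linear model: the bill depends on provisioned capacity, not on usage.
    total_cost = provisioned_mips * price_per_mips
    return total_cost / used_mips


if __name__ == "__main__":
    price = 1000.0  # hypothetical currency units per MIPS
    for used in (1000, 500, 100):
        # The less of the paid-for capacity is used, the dearer each used MIPS becomes.
        print(used, cost_per_used_mips(1000, used, price))
```

If the workload is known in advance, provisioned and used capacity can be kept close together. If it is unknown, as in the Web X.0 case, the effective price per operation becomes hard to predict.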

Alternatives were found, and they are grouped under the collective term “NoSQL”. This term covers not only the NoSQL databases themselves but the complete ecosystem around them. NoSQL is thus the answer to a dynamic and rapidly changing business environment. But how can NoSQL be an answer to this environment if the data volume and the required availability do not decrease?

The pillars of the mainframe

The mainframe is characterised by three essential features.

  • Scaling. On the mainframe, transactions are distributed across the free MIPS and processed in parallel, with a seemingly optimal distribution.
  • Throughput. By storing data in VSAM files, the mainframe achieves high I/O throughput and efficient access to large data volumes.
  • Reliability. The hardware architecture of the mainframe ensures its system stability.

Scaling and throughput in particular require a wealth of experience when they have to be achieved elsewhere, for example with Java Enterprise applications. Once the possibilities at application level have been exhausted, the next step is to look at data management. Sooner or later, however, relational databases reach their limits. A real-life example shows how this point can be reached with a problem that is almost trivial to state: an attempt to map an insurance product with thousands of product variants across millions of contracts, efficiently and with adequate availability, fails. Even with optimal indexes, query response times degrade to the point of being unusable. The answer to this problem is NoSQL data stores such as Google Bigtable. With a free implementation such as Apache HBase, the necessary comparison between mainframe and NoSQL concepts is easily made.

NoSQL compared to the mainframe

In the ecosystem around HBase there are countless other Apache Hadoop projects. If these are included in the comparison, the parallels to the mainframe become immediately apparent. HBase’s data management is based on the Hadoop file system, HDFS for short. HDFS is a distributed file system that tolerates hardware failures thanks to its distributed design. In addition to this fault tolerance, it is explicitly designed for the batch processing of files up to several terabytes in size (so-called Streaming Data Access on Large Data Sets) [http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html]. Files in the mainframe environment have the same properties.
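Why a distributed design makes a file system resistant to hardware errors can be illustrated with a toy model of block replication. This is a sketch under stated assumptions, not HDFS code: the round-robin placement and the function names are invented for illustration (the real placement policy is more elaborate), and only the replication factor of 3 mirrors the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        # Consecutive node indices modulo len(nodes) are distinct
        # as long as replication <= len(nodes).
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a healthy node."""
    failed = set(failed_nodes)
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5"]
    placement = place_blocks(10, nodes)
    print(readable_after_failure(placement, ["n1"]))
    print(readable_after_failure(placement, ["n1", "n2", "n3"]))
```

With three replicas per block, any two nodes can fail without data becoming unreadable. Only the loss of all three nodes holding the same block makes part of the file unavailable.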

With such large files, an efficient access method is essential. In the mainframe environment, this is provided for example by VSAM files. A VSAM file, however, does not necessarily correspond to a single physical file. These logical properties correspond to the concept of Apache HBase [http://hbase.apache.org/]. The keys of the VSAM catalogue VVDS (VSAM Volume Data Set) are similar to the RowKey of the Bigtable data store [http://www.redbooks.ibm.com/abstracts/sg246105.html?Open]. This makes it easy to find alternatives for reliability and data throughput (availability) in the commodity hardware environment.
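The idea of keyed access that VSAM and the RowKey have in common can be sketched with a toy store that keeps its keys in sorted order. The class, its methods and the key layout (product variant, then contract number) are assumptions made for illustration and do not reproduce the HBase or VSAM APIs.

```python
import bisect


class SortedKeyStore:
    """Toy key-ordered store: sorted keys allow lookups and prefix scans
    via binary search instead of a scan over all rows."""

    def __init__(self):
        self._keys = []   # row keys, always kept sorted
        self._rows = {}   # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Return all (key, row) pairs whose key starts with `prefix`, in key order."""
        # Keys sharing a prefix are contiguous in sorted order,
        # so the scan can stop at the first non-matching key.
        i = bisect.bisect_left(self._keys, prefix)
        result = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return result


if __name__ == "__main__":
    store = SortedKeyStore()
    # Hypothetical composite key: "<product variant>#<contract number>"
    store.put("P0042#C000002", {"premium": 120})
    store.put("P0007#C000001", {"premium": 80})
    store.put("P0042#C000001", {"premium": 95})
    print(store.scan_prefix("P0042#"))
```

Because all contracts of one product variant sit next to each other in key order, they can be read in one contiguous scan. That is the property that makes a well-chosen key so important in both worlds.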

But what about scaling? Here, too, adequate help can be found in the NoSQL environment. Using MapReduce, complex tasks, for example based on HBase data, can be processed within a very short time [http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html]. With MapReduce, the often extensive calculations (map operations) are processed in parallel as far as possible. The results of the map operations are then combined in several merge steps (reduce operations) into the final result [http://en.wikipedia.org/wiki/MapReduce]. Anyone who knows the mainframe and the processing strategies under z/OS can easily identify the parallels.
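The map and reduce steps described above can be sketched in a few lines. This is a minimal single-machine illustration, not the Hadoop API: the function names and the sample records are invented, and the thread pool merely stands in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(records):
    """Map step: total the premiums per product for one chunk of records."""
    partial = Counter()
    for product, premium in records:
        partial[product] += premium
    return partial


def merge(left, right):
    """Reduce step: merge two partial results into one."""
    merged = Counter(left)
    merged.update(right)  # adds the values of matching keys
    return merged


def map_reduce(chunks, workers=4):
    # Each chunk is mapped independently; the pool stands in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # The partial results are merged step by step into the final result.
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    chunks = [
        [("A", 100), ("B", 50)],
        [("A", 25)],
        [("C", 10), ("B", 5)],
    ]
    print(dict(map_reduce(chunks)))
```

Since the map step never looks beyond its own chunk, the chunks can be processed on as many machines as are available. Only the comparatively small partial results have to be brought together in the reduce step.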


Conclusion

It can be stated that a move from the mainframe to commodity hardware and a corresponding development environment is possible. But technical feasibility alone is not decisive. It is crucial that the know-how of the existing staff can continue to be used. It is a question of the right migration path, of profitability and of strategic decisions. The mainframe is not dead, but it can be replaced.