Technologies for Big Data Applications work group

Worldwide, the amount of data doubles every two years, and some are already talking about a "gold rush". Big Data technologies make it possible to process very large volumes of unstructured data, to increase its value and to make it economically usable.


High Performance Cluster for Big Data Applications


The Faculty of Computer Science has developed a scalable system in which various standard software components work together on a use case with real data. Since the beginning of 2013, teaching staff and students have had access to a High Performance Cluster (HPC) for a variety of data analyses. In past semesters this has resulted in several bachelor's and master's theses, some of them outstanding. In his award-winning thesis, "Analysis and Implementation of a Big Data Architecture on a Computer Cluster", Tim Schmiedl developed the Lioncub platform on the basis of Apache Hadoop, the distributed database system HBase and the streaming framework Storm.

As a high-performance reference platform for Big Data technologies, Lioncub is being continually extended and optimized by the Faculty. The platform currently manages more than one million articles from social networks and news portals and is accessible via a web interface. After entering a password, users can view documents and display results as diagrams or URL listings within a few seconds. However, this is not the most important application of Lioncub. Its primary function for the Faculty is as a research platform for innovative software from the field of Big Data. It can also easily be adapted to other applications. For example, after data has been read from a vehicle's CAN bus, Lioncub can be used to draw conclusions about the driver's driving behaviour.

The Faculty will continually extend the capacity of the HPC in the coming years; the first stage is already in progress. The HPC offers great potential for teaching and research: Big Data is expected to have far-reaching effects, not only on product development, business processes and science, but also on society as a whole.


Event Processing for Big Data

Traditional Big Data technologies are optimized for storing and processing mass data, but they show deficits in near real-time data evaluation. As a result, a range of architectures and frameworks has been introduced in the last few years to address this deficit. Technologies from the field of event processing have played an important role in this, because they allow information to be derived directly from incoming data. The particular importance of event processing for the real-time processing of mass data was set out in the 2014/15 HFU Research Report of the same name.

Starting with investigations into the performance of Complex Event Processing (CEP) implementations, research at the Faculty includes examining the combination of CEP with the Storm streaming framework. Projects and theses examine various approaches to combining declarative CEP query languages with the scalability mechanisms of Storm. One possibility, for example, is to carry out the pattern recognition directly in a Storm bolt (a so-called "CEP bolt"). Philipp Stussak examined this approach more closely in his master's thesis "Complex Event Processing of large Amounts of Data", which in October 2014 won the Aesculap Prize for particularly innovative thesis work.
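
To make the idea of a "CEP bolt" more concrete, the following is a minimal sketch in plain Python. It is not taken from the thesis and does not use the Storm API. The class CepBolt, its process method, the Event fields and the login event names are all hypothetical and chosen for illustration only. The sketch assumes a very simple pattern ("an event of one kind followed by an event of another kind for the same key within a time window") and assumes that events arrive in timestamp order. A real CEP engine would express such a pattern in a declarative query language instead.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """A single incoming event (hypothetical schema)."""
    timestamp: float  # seconds since some reference point
    kind: str         # event type, e.g. "login_failed"
    key: str          # grouping key, e.g. a user or sensor ID


class CepBolt:
    """Toy stand-in for a stream-processing bolt (not the Storm API).

    Detects the pattern "`first` followed by `second` for the same key
    within `window_seconds`". Assumes events arrive in timestamp order.
    """

    def __init__(self, first: str, second: str, window_seconds: float) -> None:
        self.first = first
        self.second = second
        self.window_seconds = window_seconds
        # Per key: `first` events still waiting for a matching `second` event.
        self._pending: Dict[str, Deque[Event]] = {}

    def process(self, event: Event) -> List[Tuple[Event, Event]]:
        """Handle one event and return the pattern matches it completes."""
        queue = self._pending.setdefault(event.key, deque())

        # Drop candidates that have fallen out of the time window.
        while queue and event.timestamp - queue[0].timestamp > self.window_seconds:
            queue.popleft()

        matches: List[Tuple[Event, Event]] = []
        if event.kind == self.second:
            # Every remaining candidate forms a match with this event.
            matches = [(earlier, event) for earlier in queue]
            queue.clear()
        if event.kind == self.first:
            queue.append(event)
        return matches


if __name__ == "__main__":
    bolt = CepBolt("login_failed", "login_ok", window_seconds=10.0)
    stream = [
        Event(0.0, "login_failed", "alice"),
        Event(3.0, "login_failed", "bob"),
        Event(5.0, "login_ok", "alice"),
        Event(20.0, "login_ok", "bob"),
    ]
    for incoming in stream:
        for earlier, later in bolt.process(incoming):
            print(f"match for {later.key}: {earlier.timestamp} -> {later.timestamp}")
```

In this example stream only the pair for "alice" is reported, because the second event for "bob" arrives 17 seconds after the first and therefore outside the 10-second window. Because the state is kept per key, the stream could in principle be partitioned by key across several bolt instances, which is the kind of scalability mechanism the combination with Storm aims to use.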

The effectiveness of the possible combinations will now be examined further using selected application scenarios and tested in practice on the HPC infrastructure.
