A scientific paper entitled “BigDataStack: A Holistic Data-driven Stack for Big Data Applications and Operations” has been co-authored by UBITECH and is presented at the IEEE International Congress on Big Data (Big Data Congress 2018), which takes place from 2 to 7 July 2018 in San Francisco, California, USA. In this paper, Dr. Panagiotis Gouvas and his co-authors present the architecture of a complete stack, BigDataStack, based on a frontrunner infrastructure management system that drives decisions according to data aspects, making it fully scalable, runtime adaptable and highly performant in addressing the needs of big data operations and data-intensive applications. Furthermore, the stack goes beyond purely infrastructure elements by introducing techniques for the dimensioning of big data applications, the modelling and analysis of processes, and the provision of data-as-a-service through a proposed seamless analytics framework.
In particular, the proposed data-driven BigDataStack architecture aims at ensuring that infrastructure management is fully efficient and optimized for data operations and data-intensive applications. As a holistic solution, the BigDataStack architecture also incorporates approaches that range from data-focused application analysis and dimensioning, through process modelling, management and runtime optimization, to information-driven networking. Moreover, the architecture introduces a toolkit that allows the specification of analytics tasks and their efficient integration and execution on top of the proposed infrastructure management system.
While the majority of the approaches for data operations and data-intensive applications (e.g., Hadoop, Spark, Hive, etc.) “run on top” of typical infrastructure management systems (e.g., Mesos, Docker, OpenStack, etc.), BigDataStack provides a data-driven infrastructure management system that is fully efficient and optimized for data operations, managing resources according to data-based decisions. The goal is to provide Data as a Service as an optimum offering on top of an environment that is managed through data-driven decisions, turning raw data into valuable knowledge through data functions across the complete data path.