
UBITECH presents a scientific paper on the autonomous orchestration of big data services at IEEE BIGDATASERVICE 2021

UBITECH’s paper entitled “Towards Platform-Agnostic and Autonomous Orchestration of Big Data Services” has been accepted for presentation at the 7th IEEE International Conference on Big Data Computing Service and Machine Learning Applications (BIGDATASERVICE 2021), held online from August 23rd to August 26th, 2021. UBITECH’s Privacy-preserving Distributed Machine Learning research group presents a comprehensive microservices architecture that eases the management and enactment of end-to-end big data workflows. It is developed together with intuitive graphical user interfaces that hide from the end user the specifics of the underlying network, storage and compute infrastructure. Named the Big Data Apps Composition Environment, it facilitates the design, composition, configuration, orchestration, enactment, and validation of end-to-end big data analytic services actuated into deployment workflows. The approach of Ms. Iatropoulou, Mr. Petrou, Dr. Karagiorgou and Dr. Alexandrou differs from current engines in that it adopts a big data-driven methodology that scales to multiple executors and embeds notebooks for on-demand and real-time scripting analytics. As a result, the deployment of big data services and analytic applications is accelerated, and semi-automatic scaling is supported through the definition of multiple executors, which improves the time performance of demanding tasks.

In particular, Ms. Iatropoulou and her co-authors introduce the Big Data Apps Composition Environment (BDCE), which decouples the design, development and execution of big data services and analytic processes, including diagnostic, descriptive, predictive and prescriptive analysis on top of big data and distributed machine or deep learning frameworks. A typical BDCE analysis process may include the following steps: retrieving datasets from distributed or centralised endpoints; selecting a specific big data service together with the associated software image and execution endpoints; adjusting the relevant deployment configuration parameters, such as those associated with network, storage and compute resources; and adjusting the service's input/output parameters according to their type and attributes (i.e., text, image, file, external/internal data sources or storage systems).
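As a rough illustration of the four steps above, the following Python sketch models a single service selection as a plain data structure and checks whether its deployment parameters are complete. It is not taken from the paper: the class name `ServiceStep`, all field names, the endpoint and the image reference are hypothetical and chosen only for readability.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceStep:
    """One big data service in an analysis process (all names are hypothetical)."""
    dataset_endpoints: list   # step 1: where the datasets are retrieved from
    service_name: str         # step 2: the selected big data service
    image: str                # step 2: the associated software image
    deployment: dict = field(default_factory=dict)  # step 3: network, storage, compute
    io_params: dict = field(default_factory=dict)   # step 4: input/output types

    def validate(self) -> list:
        """Return a list of problems; an empty list means the step looks complete."""
        problems = []
        if not self.dataset_endpoints:
            problems.append("no dataset endpoint selected")
        if not self.image:
            problems.append("no software image selected")
        for key in ("network", "storage", "compute"):
            if key not in self.deployment:
                problems.append(f"missing deployment parameter: {key}")
        return problems


# Hypothetical example values; the compute setting is left out on purpose.
step = ServiceStep(
    dataset_endpoints=["hdfs://example/datasets/input"],
    service_name="example-clustering",
    image="registry.example/clustering:latest",
    deployment={"network": "default", "storage": "10Gi"},
    io_params={"input": "file", "output": "text"},
)
print(step.validate())  # ['missing deployment parameter: compute']
```

A check of this kind would typically run before a workflow is actuated, so that an incomplete configuration is reported to the user instead of failing at deployment time.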

The Big Data Apps Composition Environment provides access to a set of registered big data, data curation and machine learning algorithms by means of containerised application images (i.e., Apps). These Apps are integrated with a graph design and authoring user interface with pop-up fields for the main deployment configuration and execution parameters. The back-end services retrieved by BDCE are provided as microservices, and new services can be registered by adding a new Node as a containerised application image. The technology stack of the Big Data Apps Composition Environment is based on open-source software, including custom UIs in Angular and Spring Cloud Data Flow as the core data workflow management solution. The latter has been heavily re-engineered to incorporate big data, HPC, machine learning and deep learning frameworks, embedded notebooks for on-demand analysis, and interactive visualisations. Once a big data application has been composed from multiple Nodes connected within an end-to-end workflow, the user can save the pipeline by writing all its specifications to a YAML file for further use; this file can be directly actuated for deployment and can also be stored centrally in a structured storage schema for future use.
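To give a feel for what a saved pipeline might look like, the sketch below composes two Nodes into a small workflow and serialises it to YAML text using only the Python standard library. The schema (`name`, `nodes`, `id`, `image`, `executors`, `next`) and the image references are assumptions made for illustration; the actual file format used by BDCE is not described here.

```python
import json


def to_yaml(pipeline: dict) -> str:
    """Emit a small YAML document for a pipeline with a hypothetical schema.

    Strings are written with json.dumps, since a JSON string is also a
    valid double-quoted YAML scalar.
    """
    lines = [f"name: {json.dumps(pipeline['name'])}", "nodes:"]
    for node in pipeline["nodes"]:
        lines.append(f"  - id: {json.dumps(node['id'])}")
        lines.append(f"    image: {json.dumps(node['image'])}")
        lines.append(f"    executors: {node['executors']}")
        # Downstream Nodes are written as a YAML flow sequence.
        targets = ", ".join(json.dumps(t) for t in node["next"])
        lines.append(f"    next: [{targets}]")
    return "\n".join(lines) + "\n"


# Hypothetical two-Node workflow: data curation feeding a training step
# that is scaled to four executors.
pipeline = {
    "name": "demo-pipeline",
    "nodes": [
        {"id": "curate", "image": "registry.example/curation:1.0",
         "executors": 1, "next": ["train"]},
        {"id": "train", "image": "registry.example/training:1.0",
         "executors": 4, "next": []},
    ],
}

yaml_text = to_yaml(pipeline)
print(yaml_text)
```

Keeping the number of executors as a per-Node field mirrors the idea of scaling demanding tasks individually, although how BDCE records this setting is an assumption here.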