DEF

Parallel execution and pushing large and time consuming compute-tasks into the Cloud becomes more and more important. Several research projects and providers of execution environments work on tools and frameworks to extend a specific programming language or Problem Solving Environment (PSE) with features to enable their product for parallel execution, e.g. parallel fortran [1], Unified-Parallel C (UPC) [2], MATLAB [3]. Cloud providers offer program language specific APIs for their Cloud environment to enable developers to integrate applications into the Cloud, e.g. Amazon Web Services (AWS)1.

In our EnFiLo2 project we were confronted with the situation that several analysts were working on different problems in the domains of energy, finance, and logistics, applying algorithms developed by themselves to complex optimization and simulation problems, each of them using their preferred programming and runtime environment. The analysts found out that a large number of algorithms developed by their colleagues could well be used for the problems they were working on, probably in just a slightly adapted, more abstract form, so that they could be applied to a different domain. And the analysts now wanted to find a way to reuse at least parts of the algorithms of their colleagues to make their lives easier and not always have to re-implement already existing code. So it is a central objective of the EnFiLo project to define and extract reusable parts of computationally complex algorithms in the form of library routines which can be executed in parallel and which can be used across domains. Though the resulting library routines were reusable and domain independent, the next problem was that the analysts were using different programming and runtime environments, which means that the reusable library routines could not directly be invoked from the runtime environments of the colleagues. Therefore a solution must be found to enable the parallel execution of library routines which can be written in an arbitrary programming language and which can be invoked from applications also written in an another programming language, without the application programmer needing to know the implementation details or programming language of the library routine used in his application. This was the idea to develop the Distributed Execution Framework (DEF).

Therefore the central goal of the DEF is to develop a system that allows for parallelized execution of library routines, independent of programming languages or platforms. We found that, before the DEF, there were no environments that support the parallelized invocation of tasks across arbitrary execution environments. In this paper we describe the design of a flexible, easy-to-use framework and execution environment that allows the parallel execution of library routines across different runtime environments invoked from some arbitrary client.

The DEF allows to (1) deploy arbitrary routines into a central algorithm library and (2) integrate these library routines at runtime into user programs in a way that enables the routines to be executed in parallel in the Cloud. Figure 1 illustrates that users can integrate implementations of their algorithms into an algorithm library. These library routines typically represent reusable modules which are invoked several times to operate independently on different data sets within larger programs. The developer can choose an arbitrary programming language or PSE to implement the routine. In this sense, the library routines follow the Single Program Multiple Data (SPMD) model [4, 5]. Every library routine needs to have a signature description document in which the interface of the library routine is defined.

https://static-content.springer.com/image/art%3A10.1186%2Fs13677-016-0070-z/MediaObjects/13677_2016_70_Fig1_HTML.gif
Fig. 1

DEF – General Overview

Based on the signature description document, any user can call the routine from any client program by applying it to the correct parameters and data. Typically, these library routines are called multiple times within a loop which is referred to as “loop parallelization” ([6], p. 122). Therefore these routines represent the computationally intensive and time consuming parts of the program, which predestines them to be executed in parallel on the worker nodes of a compute cluster.

An a priori analysis of the problem sets in the EnFiLo project revealed that the applications to be developed for our project can be reduced to calls of embarrassingly parallel computations, which means that the same routine is invoked in parallel on different independent input parameter sets, i.e. there is no communication required between the parallel computations ([6], p. 60) ([7], p. 121). The parameters (esp. large collections) for the library routines can be uploaded as shared resources before the library routines are invoked (call by reference) or they (esp. small base type parameters) can be directly attached to the routine call (call by value). The DEF then distributes the library calls among the worker nodes of a compute cluster. The results of the calls can synchronously or asynchronously be retrieved from the DEF and can be used for further computations.

To be able to provide flexible cluster structures we have set up the DEF on Cloud technologies. In this sense, the DEF could be viewed as a PaaS (Platform as a Service)3 that enables the deployment and concurrent execution of library routines. To be installable on multiple platforms, the DEF runtime environment is implemented in Java. With the DEF we want to achieve the following benefits:

  • High performance through parallelization

  • High efficiency by utilizing potentially all available (local) resources

  • High scalability by using elastic Cloud technology

  • High user acceptance by providing a simple API for integrating the parallel invocations of library routines into user applications

  • High flexibility by offering a programming language independent library of efficient algorithm implementations that can be integrated into user applications

  • Openness to different data formats regarding input and output parameters by using JSON as data exchange format

  • The DEF components are deployable on all common desktop-server hardware/OS platforms

  • Support users on searching for appropriate algorithms in the library

  • Security by supporting private and public Cloud infrastructures on demand

The novelty in this approach lies in enabling parallelized invocations of library routines, completely independent of programming languages or platforms. This is achieved by decoupling the client program from the implementation of the library routine and by dynamically distributing the library routine invocations to available worker instances in the cloud. This provides for programming language and platform independence as well as elasticity for large scale computations. To give an example: With the presented approach, analysts will able to execute MATLAB code in parallel without using MATLAB Parallel Computing Toolbox, invoked from a client implemented in Python. In this example the DEF shows an approximately linear speedup by adding more worker nodes.