P-SOCRATES will develop a complete and coherent software system stack able to bridge the gap between the application design and the hardware many-core platform. The project will investigate a new programming framework that combines real-time embedded mapping and scheduling techniques with high-performance parallel programming models and their associated tools for expressing application parallelism. The programming model will be extended to support real-time properties and timing information. The software stack (shown in Figure) will extract a task dependency graph from the application and statically map these tasks to operating-system threads, which will then be dynamically scheduled on the many-core platform.
Current high-performance parallel programming models extract performance from parallel architectures based on decisions taken at run time. This is the case of OmpSs, whose annotations allow tracking data dependencies between the tasks of a program, similarly to how out-of-order processors track data dependencies between instructions. These annotations are interpreted by a source-to-source compiler that emits calls to the runtime system, which dynamically generates the task dependency graph at run time. This graph is then used to decide which tasks can run in parallel on the available processor resources. Unfortunately, the use of run-time information makes it difficult to provide real-time guarantees.
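The mechanism above can be sketched in a few lines of Python. This is an illustrative model, not the OmpSs API: tasks declare which data they read and write (mirroring the annotations), and the runtime derives the dependency graph and the set of tasks that may run in parallel.

```python
# Minimal sketch (hypothetical, not the OmpSs API): a runtime that builds
# a task dependency graph from declared 'in'/'out' data accesses, the way
# an OmpSs-like runtime derives dependencies from source annotations.

class TaskGraph:
    def __init__(self):
        self.tasks = []           # task names in submission order
        self.edges = []           # (producer, consumer) dependency pairs
        self.last_writer = {}     # datum -> task that last wrote it

    def add_task(self, name, ins=(), outs=()):
        # A task depends on the last writer of each datum it reads
        # (read-after-write) or overwrites (write-after-write).
        for d in tuple(ins) + tuple(outs):
            if d in self.last_writer:
                self.edges.append((self.last_writer[d], name))
            # write-after-read hazards omitted for brevity
        for d in outs:
            self.last_writer[d] = name
        self.tasks.append(name)

    def ready(self, done):
        # Tasks whose predecessors have all completed can run in parallel.
        return [t for t in self.tasks
                if t not in done
                and all(p in done for (p, c) in self.edges if c == t)]

g = TaskGraph()
g.add_task("A", outs=["x"])
g.add_task("B", ins=["x"], outs=["y"])
g.add_task("C", ins=["x"], outs=["z"])
g.add_task("D", ins=["y", "z"])

print(g.ready(done=set()))    # only A has no pending inputs
print(g.ready(done={"A"}))    # B and C are independent and can run in parallel
```

Because the graph only materialises as tasks are submitted, which tasks end up running concurrently is a run-time outcome, which is exactly what complicates static timing analysis.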
Therefore, to provide real-time guarantees without performance degradation, it is necessary to identify, at design time, the run-time configuration that must be statically built to provide timing guarantees. To do so, P-SOCRATES will investigate enhanced parallel programming models that extend current ones with new annotations and compiler techniques to automatically generate an extended task dependency graph containing not only the data dependencies among tasks, but also the information needed to derive the impact of resource sharing on execution time when tasks communicate.
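One use of such an extended graph can be sketched as follows. In this hypothetical Python example (the function name, WCET figures, and communication costs are illustrative assumptions, not project results), each task carries a worst-case execution time and each edge a cost modelling the shared-resource interaction, from which a critical-path bound on the response time can be derived offline.

```python
# Hypothetical sketch: timing analysis over an extended task dependency
# graph whose nodes carry WCET estimates and whose edges carry a cost
# for the shared-resource interaction when tasks communicate.

def critical_path(wcet, edges, comm=None):
    """Longest path through the DAG: a lower bound on the makespan and a
    starting point for static timing analysis.
    wcet:  {task: worst-case execution time}
    edges: [(pred, succ)] dependency pairs
    comm:  {(pred, succ): communication/contention cost}, optional
    """
    comm = comm or {}
    finish = {}
    remaining = dict(wcet)
    # Process tasks whose predecessors are resolved (assumes a DAG).
    while remaining:
        for t in list(remaining):
            preds = [p for (p, s) in edges if s == t]
            if all(p in finish for p in preds):
                start = max((finish[p] + comm.get((p, t), 0)
                             for p in preds), default=0)
                finish[t] = start + wcet[t]
                del remaining[t]
    return max(finish.values())

wcet = {"A": 2, "B": 3, "C": 5, "D": 1}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
comm = {("C", "D"): 1}
print(critical_path(wcet, edges, comm))  # path A->C->D: 2 + 5 + 1 + 1 = 9
```

The point of the extended annotations is precisely to make the `comm` term derivable at design time, rather than an unknown run-time effect.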
This information will then be used by the mapping and scheduling algorithms and by the timing and schedulability analysis module. The mapping algorithm will be enhanced to statically build the required run-time configuration, efficiently assigning tasks to threads so as to guarantee timing requirements without performance degradation. The underlying scheduling algorithm, implemented within the operating system, will then dynamically translate the task-to-thread mapping into an efficient thread-to-core allocation, selecting which thread to execute on each core and arbitrating access to other shared resources.
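The static task-to-thread mapping step can be illustrated with a simple load-balancing heuristic. This is a sketch under assumed inputs (the function name and WCET values are hypothetical, and the project's actual mapping algorithm would also account for dependencies and deadlines, not just load): tasks are assigned greedily, longest first, to the currently least-loaded thread, after which the OS scheduler decides at run time which thread runs on which core.

```python
# Illustrative sketch (hypothetical names, not the project's API):
# a greedy longest-processing-time-first static mapping of tasks to
# threads that balances per-thread load; the thread-to-core allocation
# is then left to the operating system's dynamic scheduler.
import heapq

def map_tasks_to_threads(wcet, n_threads):
    """Assign each task (longest WCET first) to the least-loaded thread.
    Returns the task-to-thread mapping and the per-thread total load."""
    heap = [(0.0, tid) for tid in range(n_threads)]  # (load, thread id)
    heapq.heapify(heap)
    mapping = {}
    for task in sorted(wcet, key=wcet.get, reverse=True):
        load, tid = heapq.heappop(heap)       # least-loaded thread
        mapping[task] = tid
        heapq.heappush(heap, (load + wcet[task], tid))
    loads = {}
    for task, tid in mapping.items():
        loads[tid] = loads.get(tid, 0) + wcet[task]
    return mapping, loads

wcet = {"A": 4, "B": 3, "C": 3, "D": 2, "E": 2}
mapping, loads = map_tasks_to_threads(wcet, n_threads=2)
print(mapping, loads)
```

Separating the static mapping (done once, analysable offline) from the dynamic thread-to-core scheduling (done by the OS) is what lets the design keep timing guarantees while still exploiting run-time flexibility.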