Michael Philippsen

Prof. Dr. Michael Philippsen

Computer Science Department
Programming Systems Group (Informatik 2)

Room: Raum 05.139
Martensstraße 3
91058 Erlangen

Student contact hour

On Mondays from 12:00 - 13:00 in room 05.139. Please prearrange an appointment by email.




  • Verification and validation in industrial practice

    (Own Funds)

    Term: since 01.01.2022

    Detection of flaky tests based on software version control data and test execution history

    Regression tests are carried out often and because of their volume also fully automatically. They are intended to ensure that changes to individual components of a software system do not have any unexpected side effects on the behavior of subsystems that they should not affect. However, even if a test case executes only unmodified code, it can still sometimes succeed and sometimes fail. This so-called "flaky" behavior can have different reasons, including race conditions due to concurrent execution or temporarily unavailable resources (e.g., network or databases). Flaky tests are a nuisance to the testing process in every respect, because they slow down or even interrupt the entire test execution and they undermine the confidence in the test results: if a test run is successful, it cannot necessarily be concluded that the program is really error-free, and if the test fails, expensive resources may have to be invested to reproduce and possibly fix the problem.

    The easiest way to detect test flakyness is to repeatedly run test cases on the identical code base until the test result changes or there is a reasonable statistical confidence that the test is non-flaky. However, this is rarely possible in an industrial environment, as integration or system tests can be extremely time-consuming and resource-demanding, e.g., because they require the availability of special test hardware. For this reason, it is desirable to classify test cases with regard to their flakyness without repeated re-execution, but instead to only use the information already available from the preceding development and test phases.

    In 2022, we implemented and compare various so-called black box methods for detecting test flakyness and evaluated them in a real industrial test process with 200 test cases. We classify test cases exclusively on the basis of generally available information from version control systems and test execution tools, i.e., in particular without an extensive analysis of the code base and without monitoring of the test coverage, which would in most cases be impossible for embedded systems anyway. From the 122 available indicators (including the test execution time, the number of lines of code, or the number of changed lines of code in the last 3, 14, and 54 days) we extracted different subsets and examined their suitability for detecting test flakyness using different techniques. The methods applied on the feature subsets include rule-based methods (e.g., "a test is flaky if it has failed at least five times within the observation window, but not five times in a row"), empirical evaluations (including the computation of the cumulative weighted "flip rate", i.e., the frequency of alternating between test success and failure) as well as various methods from the domain of machine learning (e.g., classification trees, random forest, or multi-layer perceptrons). By using AI-based classifiers together with the SHAP approach for explaining AI models we determined the four most important indicators ("features") for detecting test flakyness in the industrial environment under consideration. The so-called "gradient boosting" with the complete set of indicators has proven to be optimal (with an F1-score of 96.5%). The same method with only four selected features achieved just marginally lower accuracy and recall values (with almost the same F1 score).

    Synergies of a-priori and a-posteriori analysis methods to explain artificial intelligence

    Artificial intelligence is rapidly conquering more domains of everyday life and machines make more critical decisions: braking or evasive maneuvers in autonomous driving, credit(un)worthiness of individuals or companies, diagnosis of diseases from various examination results (e.g., cancer detection from CT/MRT scans), and many more. In order for such a system to receive trust in a real-life productive setting, it must be ensured and proven that the learned decision rules are correct and reflect reality. The training of a machine model itself is a very resource-intensive process and the quality of the result can usually only be quantified afterwards with extremely great effort and well-founded specialist knowledge. The success and quality of the learned model not only depends on the choice of a particular AI method, but is also strongly influenced by the magnitude and quality of the training data.

    In 2022, we therefore examined which qualitative and quantitative properties an input set must have ("a priori evaluation") in order to achieve a good AI model ("a posteriori evaluation"). For this purpose, we compared various evaluation criteria from the literature and we defined four basic indicators based on them: representativeness, freedom from redundancy, completeness, and correctness. The associated metrics allow a quantitative evaluation of the training data in advance of preparing the model. To investigate the impact of poor training data on an AI model, we experimented with the so-called "dSprites" dataset, a popular generator for image files used in the evaluation of image resp. pattern recognition methods. This way, we generated different training data sets that differ in exactly one of the four basic indicators and have quantitatively different "a priori quality". We used all of them to train two different AI models: Random Forest and Convolutional Neural Networks. Finally, we quantitatively evaluated the quality of the classification by the respective model using the usual statistical measures (accuracy, precision, recall, F1-score). In addition, we used SHAP (a method for explaining AI models) to determine the reasons for any misclassification in cases of poor data quality. As expected, the model quality highly correlates with the training data quality: the better the latter is with regard to the four basic indicators, the more precise is the classification of unknown data by the trained models. However, a noteworthy discovery has emerged while experimenting with the lack of redundancy: If a trained model is evaluated with completely new/unknown inputs, the accuracy of the classification is sometimes significantly worse than if the available input data is split into a training and an evaluation data set: In the latter case, the a posteriori evaluation of the trained AI system misleadingly suggests a higher model quality.

    Few-Shot Out-of-Domain Detection in Natural Language Processing Applications

    Natural language processing (NLP for short) using artificial intelligence has many areas of application, e.g., telephone or written dialogue systems (so-called chat bots) that provide cinema information, book a ticket, take sick leave, or answer various questions arising during certain industrial processes. Such chat bots are often also involved in social media, e.g., to recognize critical statements and to moderate them if necessary. With increasing progress in the field of artificial intelligence in general and NLP in particular, self-learning models are spreading that dynamically (and therefore mostly unsupervised) supplement their technical and linguistic knowledge from concrete practical use. But such approaches are susceptible to intentional or unintentional malicious disguise. Examples from industrial practice have shown that chat bots quickly "learn" for instance racist statements in social networks and then make dangerous extremist statements. It is therefore of central importance that NLP-based models are able to distinguish between valid "In-Domain (ID)" and invalid "Out-Of-Domain (OOD)" data (i.e., both inputs and outputs). However, the developers of an NLP system need an immense amount of ID and OOD training data for the initial training of the AI model. While the former are already difficult to find in sufficient quantities, the a priori choice of the latter is usually hardly possible in a meaningful way.

    In 2022, we therefore examined and compared different approaches to OOD detection that work with little to no training data at all (hence called "few-shot"). The currently best and most widespread, transformer-based and pre-trained language model RoBERTa served as the basis for the experimental evaluation. To improve the OOD detection, we applied "fine-tuning" and examined how reliable the adaptation of a pre-trained model to a specific domain can be done. In addition, we implemented various scoring methods and evaluated them to determine threshold values for the classification of ID and OOD data. To solve the problem of missing training data, we also evaluated a technique called "data augmentation": with little effort GPT3 ("Generative Pretrained Transformer 3", an autoregressive language model that uses deep learning to generate human-like text) can generate additional and safe ID and OOD data to train and evaluate NLP models.

    Application of weighted combinatorics in the generation and selection of parameters and their representatives in software testing

    Some functional testing methods (so-called black box tests), such as the equivalence class testing or boundary value analysis, focus on individual parameters. For these parameters, they determine representatives (values or classes of values) to be considered in the test. Since not just a single parameter but several parameters are usually required to perform such tests, representatives of several parameters must be combined with each other to be used for test execution. Well-understood combinatorial methods such as "All Combinations", "Pair-wise" or "Each choice" are usually used for this purpose. They do not take into account information about weights (attributes such as importance or priority) of the parameters and equivalence class representatives, which would affect the number of associated test cases (e.g. due to importance) or their recommended order (in terms of prioritization). In addition, in the case of the equivalence class method, there are scenarios in which a combination of several invalid classes in a single test case could optionally be explicitly desired, completely undesirable or limited to a certain number in order to specifically test fault combinations on the one hand, but also to simplify fault localization on the other. There is reason to believe that by considering such weights and options, more targeted and ultimately more efficient test cases can be derived.

    In 2023, we evaluated and compared known combinatorial approaches that take into account weights when combining parameters or their values. Based on this, we developed a novel approach to generate and select parameters and their representatives in software testing. The proposed method uses a weighting system to prioritize the individual parameters, their equivalence classes and concrete representatives, in a set of test cases. If necessary, their interactions can also be specifically weighted in order to allow certain combinations to occur more frequently in the generated test cases. To evaluate the approach, we defined a suitable prototype data structure that represents the various weightings. We then implemented evaluation functions for existing sets of test cases in order to quantitatively determine how well such a test case set satisfies the specified combinatorics. In a further step, we used these evaluation functions in combination with various systematic methods and heuristics (SAT solver Z3, simulated annealing, and genetic algorithms) to generate new test cases that match the weighting or to optimize existing sets by adding missing test cases. Simulated Annealing was the fastest and gave the best results in the test series. Although the SAT-approach worked well for small problems, it was no longer practical for larger test cases due to exorbitant runtimes.

  • Promote computer science as the basis for successful STEM studies along the entire education chain.

    (Third Party Funds Single)

    Term: 01.11.2019 - 31.10.2022
    Funding source: Bayerisches Staatsministerium für Wissenschaft und Kunst (StMWK) (seit 2018)
    URL: https://www.ddi.tf.fau.de/forschung/laufende-projekte/cs4mints-informatik-als-grundlage-eines-erfolgreichen-mint-studiums-entlan

    Progressive digitalization is changing not only the job market but also the educational landscape. With funding from the DigitalPakt Schule and in detail from the BAYERN DIGITAL II program, serious changes in computer science education are being driven forward, which entail new challenges at the various levels of education. 

    The CS4MINTS project addresses these challenges along with the educational levels and ties in with measures already launched as part of the MINTerAKTIV project, such as strengthening the encounter of increasing student heterogeneity in the introductory computer science course.

    For example, for promoting gifted students, the Frühstudium in computer science is actively promoted for girls, and the offer is explicitly expanded. A significant increase in the proportion of women in computer science is to be achieved in the long term through early action against gender-specific stereotypes regarding computer science and an expansion of the training program to include gender-sensitive computer science instruction in all types of schools.

    The expansion of the compulsory subject of computer science in all schools also creates a great need for suitable teaching concepts and a strengthening of teacher training. For this purpose, a regional network is to be established during the project period to provide university-developed and evaluated teaching ideas for strengthening STEM in the curricular and extra-curricular settings. 

    In 2020, we began the initial piloting of the design to automate feedback in the introductory programming exercises. For this purpose, the return values of the JUnit tests of students' solutions were analyzed, and possible sources of errors were investigated. The next step is to work out a way to infer programming errors or student misconceptions based on these return values. Finally, these efforts aim to provide the students with automatically generated, competence-oriented feedback available to them after the programming tasks have been submitted (or, if necessary, already during the development phase). The feedback should show where errors occurred in the program code and point out possible causes.

    Concerning handling heterogeneity, we have compared the Repetitorium Informatik (RIP) course content with the Bavarian curriculum of different school types in 2020. Subsequently, the content must be adapted so that first-year students from the most diverse educational backgrounds have equal opportunities to identify possible deficits through the Repetitorium and remedy them. Besides, a daily programming consultation hour was set up for the first time during the Repetitorium in the winter term 2020. Here, participants were able to ask questions and receive feedback on the assignments.

    For many students, the initial steps of learning to program is one of the major challenges at the beginning of their studies. In order to provide additional feedback to novice programmers, we have designed and piloted the Feedback+ project in 2021. Within the framework of Feedback+, students have the opportunity to document problems that occur during the processing of the exercises or during the setup/use of the programming environment. They can also receive additional feedback in individual consultation sessions (weekly). For this purpose, we have set up a StudOn environment in which problems can be systematically documented. An initial evaluation in the form of individual interviews with the participating students received consistently positive feedback and is motivation to continue the project.

  • Cooperative Exploration and Analysis of Software in a Virtual/Augmented Reality Appliance

    (Third Party Funds Group – Sub project)

    Overall project: Cooperative Exploration and Analysis of Software in a Virtual/Augmented Reality Appliance
    Term: 01.09.2018 - 31.12.2022
    Funding source: Bundesministerium für Wirtschaft und Technologie (BMWi)
    URL: https://www2.cs.fau.de/research/Holoware/
    Understanding software has a large share in the programming efforts of a software systems, up to 30% in development projects and up to 80% in maintenance projects. Therefore, an efficient and effective way for comprehending software is necessary in a modern software engineering workplace. Three-dimensional software visualization already boosts comprehension and efficiency, so utilization of the latest virtual reality techniques seems natural. Within the scope of the Holoware project, we create an environment to cooperatively explore and analyze a software project using virtual/augmented reality techniques as well as artificial intelligence algorithms. The software project in question is being visualized in said virtual reality, such that multiple participants can simultaneously explore and analyze the software. They can cooperate by communicating about their findings. Different participants benefit from different perspectives on the software, which is augmented by domain specific additional information. This provides them with intuitive access to the structure and behaviour of the software. Various use cases are possible, for example the cooperative analysis of a run time anomaly in a team of domain experts. The domain experts can see the same static structure, augmented with domain specific and detailed information. In the VR environment, they can share their findings and cooperate using their different expertise.

    In addition, the static and dynamic properties of the software system are analyzed. Static properties include source code, static call relationships or metrics such as LoC, cyclomatic complexity, etc. Dynamic properties can be grouped into logs, traces, runtime metrics, or configurations that are read in at runtime. The challenge lies in aggregating, analyzing, and correlating this wealth of information. An anomaly and significance detection is developed that automatically detects both structural and runtime anomalies. In addition, a prediction system is set up to make statements about component health. This makes it possible, for example, to predict which components are at risk of failing in the near future. Previously, the log entries were added to the traces, creating a detailed picture of the dynamic call relationships. These dynamic relationships are mapped to the static call graph because they describe calls that do not result from the static analysis (for example, REST calls across several distributed components).

    In 2018, the following significant contributions have been made: 

    • Development of a functional VR visualization prototype for demonstration and research purposes.
    • Mapping between dynamic run time data and static structure (required by later analysis and visualization tasks).
    • First draft and implementation of the trace anomaly detection by an unsupervised learning procedure. Evaluation and further improvements will follow in the coming months. 

    In 2019 we achieved the following improvements: 

    • Extension of the prototype to display dynamic software behaviour.
    • Cooperative (remote-)usability of the visualization prototype.
    • Interpretation of commit messages for anomaly detection.
    • Clustering system calls according to use cases. 

    Our paper "Towards Collaborative and Dynamic Software Visualization in VR" has been accepted for publication at the International Conference on Computer Graphics Theory and Applications (VISIGRAPP) 2020. It presents the efficiency of our prototype at increasing the software understanding process.  In 2020, our paper "A Layered Software City for Dependency Visualization" was accepted at the International Conference on Computer Graphics Theory and Applications (VISIGRAPP) 2021 and later received the "Best Paper Award". We demonstrated that our Layered Layout for Software Cities simplifies the analysis of software architecture and outperforms the standard layout by far. We successfully concluded the research project with a final prototype and the resulting publications. 

    In 2021, after the end of the official project funding we were asked to submit an extended version of the award paper (" Static And Dynamic Dependency Visualization In A Layered Software City") for review to a journal. Here we present a night view of the city that shows dynamic dependencies as arcs. We thus addressed a central, yet remaining issue: the visualization of dynamic dependencies. In the paper "Trace Visualization within the Software. City Metaphor: A Controlled Experiment on Program Comprehension" at the IEEE Working Conference on Software Visualization (VISSOFT), we displayed dynamic dependencies within the Software City by means of light intensities and were able to show that this representation is more helpful than drawing all dependencies. Also for this paper, we were invited to submit an extended journal article "Trace Visualization within the Software City Metaphor: Controlled Experiments on Program Comprehension" for review. This article demonstrates an extended visualization of dynamic dependencies and color arcs based on HTTP status codes. 

    In 2022, both journal papers were accepted: "Static And Dynamic Dependency Visualization in a Layered Software City" is published in Springer Nature Computer Science Journal and "Trace Visualization within the Software City Metaphor: Controlled Experiments on Program Comprehension" was accepted for the Information and Software Technology Journal. For the finalization of Holoware, all extensions were combined into one single visualization. For this purpose, different views were applied, allowing the user to switch between them: in the day view, the software architecture can be analyzed in the novel Holoware layered layout, and in the night view, dynamic dependencies are displayed. As part of a master thesis, Holoware was also implemented as an AR visualization, so that it can easily be used as a showcase or in everyday work.
    In mid 2023, we finalized the project with the dissertation "Visualizing the statics, dynamics and infrastructure of software using the city metaphor". It summarizes all investigated aspects: (a) the static structure of the system to understand the software architecture, (b) the dynamics of the system to understand the dynamic dependencies (e.g. modern microservice architectures), and (c) the infrastructure of the system to analyze costs and promote the understanding of software operation. We also uncovered another use case: the use of Holoware at trade fairs. The visualization of the software makes it easy to get into conversation with other software developers, as the visualized software can be discussed immediately. To this end, we simplified the setup of the AR and VR visualization so that Holoware can easily be started without a lot of prior technical knowledge. In addition, we improved the contrast of the visualization to make it easier to recognize outlines and arcs, especially in very bright lighting conditions.

  • Automatic Testing of Compilers

    (Own Funds)

    Compilers for programming languages are very complex applications and their correctness is crucial: If a compiler is erroneous (i.e., if its behavior deviates from that defined by the language specification), it may generate wrong code or crash with an error message. Often, such errors are hard to detect or circumvent. Thus, users typically demand a bug-free compiler implementation.

    Unfortunately, research studies and online bug databases suggest that probably no real compiler is bug-free. Several research works therefore aim to improve the quality of compilers. Since the formal verification (i.e., a proof of a compiler's correctness) is often prohibited in practice, most of the recent works focus on techniques for extensively testing compilers in an automated way. For this purpose, the compiler under test is usually fed with a test program and its behavior (or that of the generated program) is checked: If the actual behavior does not match the expectation (e.g., if the compiler crashes when fed with a valid test program), a compiler bug has been found. If this testing process is to be carried out in a fully automated way, three main challenges arise:

    • Where do the test programs come from that are fed into the compiler?
    • What is the expected behavior of the compiler or its output program? How can one determine if the compiler worked correctly?
    • How can test programs that indicate an error in the compiler be prepared to be most helpful in fixing the error in the compiler?

    While the scientific literature proposes several approaches for dealing with the second challenge (which are also already established in practice), the automatic generation of random test programs still remains a challenge. If all parts of a compiler should be tested, the test programs have to conform to all rules of the respective programming language, i.e., they have to be syntactically and semantically correct (and thus compilable). Due to the large number of rules of "real" programming languages, the generation of such compilable programs is a non-trivial task. This is further complicated by the fact that the program generation has to be as efficient as possible: Research suggests that the efficiency of such an approach significantly impacts its effectivity -- in a practical scenario, a tool can only be used for detecting compiler bugs if it can generate many (and large) programs in short time.

    The lack of an appropriate test program generator and the high costs associated with the development of such a tool often prevent the automatic testing of compilers in practice. Our research project therefore aims to reduce the effort for users to implement efficient program generators.

    Large programs generated by efficient automatic generation of random test programs are difficult to use for debugging. Typically, only a small part of the program is the cause of the error, and as many other parts as possible must be automatically removed before the error can be corrected.
    This so-called test case reduction also uses the solutions already mentioned for detecting the expected behavior so that a joint consideration makes sense.
    Test case reduction is an essential component for automatically generated programs and should be designed to process error-triggering programs from all sources.

    Unfortunately, it is often unclear which of the various methods presented in the scientific literature is best suited to a particular situation. Additionally, test case reduction can be a time-consuming process. Our research project aims to create a significant collection of unreduced test cases and to use them to compare and improve existing procedures.

    In 2018, we started the development of such a tool. As input, it requires a specification of a programming language's syntactic and semantic rules by means of an abstract attribute grammar. Such a grammar allows for a short notation of the rules on a high level of abstraction. Our newly devised algorithm then generates test programs that conform to all of the specified rules. It uses several novel technical ideas to reduce its expected runtime. This way, it can generate large sets of test programs in acceptable time, even when executed on a standard desktop computer. A first evaluation of our approach did not only show that it is efficient and effective, but also that it is versatile. Our approach detected several bugs in the C compilers gcc and clang (and achieved a bug detection rate which is comparable to that of a state-of-the-art C program generator from the literature) as well as multiple bugs in different SMT solvers. Some of the bugs that we detected were previously unknown to the respective developers.

    In 2019, we implemented additional features for the definition of language specifications and improved the efficiency of our program generator. These two contributions considerably increased the throughput of our tool. By developing additional language specifications, we were also able to uncover bugs in compilers for the programming languages Lua and SQL. The results of our work led to a publication that we submitted at the end of 2019 (and which has been accepted by now). Besides the work on our program generator, we also began working on a test case reduction technique. It reduces the size of a randomly generated test program that triggers a compiler bug since this eases the search for the bug's root cause.

    In 2020, we focussed on language-agnostic techniques for the automatic reduction of test programs. The scientific literature has proposed different reduction techniques, but since there is no conclusive comparison of these techniques yet, it is still unclear how efficient and effective the proposed techniques really are. We identified two main reasons for this, which also hamper the development and evaluation of new techniques. Firstly, the available implementations of the proposed reduction techniques use different implementation languages, program representations and input grammars. Therefore, a fair comparison of the proposed techniques is almost impossible with the available implementations. Secondly, there is no collection of (still unreduced) test programs that can be used for the evaluation of reduction techniques. As a result, the published techniques have only been evaluated with few test programs each, which compromises the significance of the published results. Furthermore, since some techniques have only been evaluated with test programs in a single programming language, it is still unclear how well these techniques generalize to other programming languages (i.e., how language-agnostic they really are). To close these gaps, we initiated the development of a framework that contains implementations of the most important reduction techniques and that enables a fair comparison of these techniques. In addition, we also started to work on a benchmark that already contains about 300 test programs in C and SMT-LIB 2 that trigger about 100 different bugs in real compilers. This benchmark not only enables conclusive comparisons of reduction techniques but also reduces the work for the evaluation of future techniques. Some first experiments already exposed that there is no reduction technique yet that performs best in all cases.

    In this year, we also investigated how the random program generator that has been developed in the context of this research project can be extended to not only detect functional bugs but also performance problems in compilers. A new technique has been developed within a thesis that first generates a set of random test programs and then applies an optimization technique to gradually mutate these programs. The goal is to find programs for which the compiler under test has a considerably higher runtime than a reference implementation. First experiments have shown that this approach can indeed detect performance problems in compilers.

    In 2021, we finished the implementation of the most important test case reduction techniques from the scientific literature as well as the construction of a benchmark for their evaluation. Building upon our framework and benchmark, we also conducted a quantitative comparison of the different techniques; to the best of our knowledge, this is by far the most extensive and conclusive comparison of the available reduction techniques to date. Our results show that there is no reduction technique yet that performs best in all cases. Furthermore, we detected that there are possible outliers for each technique, both in terms of efficiency (i.e., how quickly a reduction technique is able to reduce an input program) and effectiveness (i.e., how small the result of a reduction technique is). This indicates that there is still room for future work on test case reduction, and our results give some insights for the development of such future techniques. For example, we found that the hoisting of nodes in a program's syntax tree is mandatory for the generation of small results (i.e., to achieve a high effectiveness) and that an efficient procedure for handling list structures in the syntax tree is necessary. The results of our work led to a publication submitted and accepted in 2021.

    In this year, we also investigated if and how the effectiveness of our program generator can be increased by considering the coverage of the input grammar during the generation. To this end and within a thesis, several context-free coverage metrics from the scientific literature have been adapted, implemented and evaluated. The results showed that the correlation between the coverage w.r.t. a context-free coverage metric and the ability to detect bugs in a compiler is rather limited. Therefore, more advanced coverage metrics that also consider context-sensitive, semantic properties should be evaluated in future work.

    In 2022, we initiated the development of a new framework for the implementation of language-adapted reduction techniques. This framework introduces a novel domain-specific language (DSL) that allows the specification of reduction techniques in a simple and concise way. The framework and the developed DSL make is possible to easily adapt existing reduction techniques to the peculiarities and requirements of a specific programming language. It is our hope that such language-adapted reduction techniques can be even more efficient and effective than the existing, language-agnostic reduction techniques. In addition, the developed framework should also reduce the effort for the development of future reduction techniques; this way, our framework could make a valuable contribution to the research in this area.

    In 2023, the focus of the research project was on list structures, which had already been briefly addressed in 2021:
    Almost all methods investigated since 2021 group nodes in the syntax tree into lists in order to select only the necessary nodes from these lists using a list reduction. Our experiments have shown that in some cases 70% or more of the reduction time is spent on lists with more than 2 elements. These lists are relevant because there are several list reduction methods in the scientific literature, but they do not differ for lists with 2 or fewer elements. Since they take such a large fraction of time, we have worked on integrating these different list reduction methods into our implementations of the major reduction methods developed in 2020/2021. In addition to the methods found in the literature, we also considered methods that are only described on a website or whose source code is freely accessible.

    We also investigated how a list reduction can be interrupted at one point and resumed later. The idea was to reduce another list in the meantime, based on a prioritization, so that the list with the greater impact on the reduction always comes first. In some cases, the hoped-for speedup occurred, but questions remain that require further experiments with prioritizing reducers and interrupted list reduction methods.

  • OpenMP for reconfigurable heterogenous architectures

    (Third Party Funds Group – Sub project)

    Overall project: OpenMP für rekonfigurierbare heterogene Architekturen
    Term: 01.11.2017 - 31.12.2023
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)
    URL: https://www2.cs.fau.de/research/ORKA/
    High-Performance Computing (HPC) is an important component of Europe's capacity for innovation and it is also seen as a building block of the digitization of the European industry. Reconfigurable technologies such as Field Programmable Gate Array (FPGA) modules are gaining in importance due to their energy efficiency, performance, and flexibility.
    There is also a trend towards heterogeneous systems with accelerators utilizing FPGAs. The great flexibility of FPGAs allows for a large class of HPC applications to be realized with FPGAs. However, FPGA programming has mainly been reserved for specialists as it is very time consuming. For that reason, the use of FPGAs in areas of scientific HPC is still rare today.
    In the HPC environment, there are various programming models for heterogeneous systems offering certain types of accelerators. Common models include OpenCL (http://www.opencl.org), OpenACC (https://www.openacc.org) and OpenMP (https://www.OpenMP.org). These standards, however, are not yet available for the use with FPGAs.

    Goals of the ORKA project are:

    1. Development of an OpenMP 4.0 compiler targeting heterogeneous computing platforms with FPGA accelerators in order to simplify the usage of such systems.
    2. Design and implementation of a source-to-source framework transforming C/C++ code with OpenMP 4.0 directives into executable programs utilizing both the host CPU and an FPGA.
    3. Utilization (and improvement) of existing algorithms mapping program code to FPGA hardware.
    4. Development of new (possibly heuristic) methods to optimize programs for inherently parallel architectures.

    In 2018, the following important contributions were made:

    • Development of a source-to-source compiler prototype for the rewriting of OpenMP C source code (cf. goal 2).
    • Development of an HLS compiler prototype capable of translating C code into hardware. This prototype later served as starting point for the work towards the goals 3 and 4.
    • Development of several experimental FPGA infrastructures for the execution of accelerator cores (necessary for the goals 1 and 2).

    In 2019, the following significant contributions were achieved:

    • Publication of two peer-reviewed papers: "OpenMP on FPGAs - A Survey" and "OpenMP to FPGA Offloading Prototype using OpenCL SDK".
    • Improvement of the source-to-source compiler in order to properly support OpenMP-target-outlining for FPGA targets (incl. smoke tests).
    • Completion of the first working ORKA-HPC prototype supporting a complete OpenMP-to-FPGA flow.
    • Formulation of a genome for the pragma-based genetic optimization of the high-level synthesis step during the ORKA-HPC flow.
    • Extension of the TaPaSCo composer to allow for hardware synchronization primitives inside of TaPaSCo systems.

    In 2020, the following significant contributions were achieved:

    • Improvement of the Genetic Optimization.
    • Engineering of a Docker container for reliable reproduction of results.
    • Integration of software components from project partners.
    • Development of a plugin architecture for Low-Level-Platforms.
    • Implementation and integration of two LLP plugin components.
    • Broadening of the accepted subset of OpenMP.
    • Enhancement of the test suite.

    In 2021, the following significant contributions were achieved:

    • Enhancement of the benchmark suite.
    • Enhancement of the test suite.
    • Successful project completion with live demo for the project sponsor.
    • Publication of the paper "ORKA-HPC - Practical OpenMP for FPGAs".
    • Release of the source code and the reproduction package.
    • Enhancement of the accepted OpenMP subset with new clauses to control the FPGA related transformations.
    • Improvement of the Genetic Optimization.
    • Comparison of the estimated performance data given by the HLS and the real performance.
    • Synthesis of a linear regression model for performance prediction based on that comparison.
    • Implementation of an infrastructure for the translation of OpenMP reduction clauses.
    • Automated translation of the OpenMP pragma `parallel for` into a parallel FPGA system.

    In 2022, the following significant contributions were achieved:

    • Generation and publication of an extensive dataset on HLS area estimates and actual performance.
    • Creation and comparative evaluation of different regression models to predict actual system performance from early (area) estimates.
    • Evaluation of the area estimates generated by the HLS.
    • Publication of the paper “Reducing OpenMP to FPGA Round-trip Times with Predictive Modelling”.
    • Development of a method to detect and remove redundant read operations in FPGA stencil codes based on the polyhedral model.
    • Implementation of the method for ORKA-HPC.
    • Quantitative evaluation of that method to show the strength of the method and to show when to use it.
    • Publication of the paper “Employing Polyhedral Methods to Reduce Data Movement in FPGA Stencil Codes”.

    In 2023, the following significant contributions were achieved:

    • Development and implementation of an optimization method for canonical loop shells (e.g. from OpenMP target regions) for FPGA hardware generation using HLS. The core of the method is a loop restructuring based on the polyhedral model that uses loop tiling, pipeline processing, and port widening to avoid unnecessary data transfers from/to the onboard RAM of the FPGA, increase the number of parallel active circuits, maximize data throughput to FPGA board RAM, and hide read/write latencies.
    • Quantitative evaluation of the strengths and application areas of this optimization method using ORKA-HPC.
    • Publication of the method in the conference paper "Employing polyhedral methods to optimize stencils on FPGAs with stencil-specific caches, data reuse, and wide data bursts".
    • Publication of a reproduction package for the optimization method.
    • Presentation of the method at the conference "14th International Workshop on Polyhedral Compilation Techniques" in a half-hour talk.
    • Development of a method for the fully automatic integration of multi-purpose caches into FPGA solutions generated from OpenMP.
    • Evaluation of multi-purpose caches in combination with HLS generated hardware blocks.
    • Publication of the paper "Multipurpose Cacheing to Accelerate OpenMP Target Regions on FPGAs" (Best Paper Award).
  • Recurrent Neuronal Networks (RNNs) for Real-Time Estimation of Nonlinear Motion Models

    (Third Party Funds Single)

    Term: 01.10.2017 - 31.03.2021
    Funding source: Fraunhofer-Gesellschaft
    URL: https://www2.cs.fau.de/research/RuNN/
    With the growing availability of information about an environment (e.g., the geometry of a gymnasium) and about the objects therein (e.g., athletes in the gymnasium), there is an increasing interest in bringing that information together profitably (so-called information fusion) and in processing that information. For example, one would like to reconstruct physically correct animations (e.g., in virtual reality, VR) of complex and highly dynamic movements (e.g., in sports situations) in real-time. Likewise, e.g., manufacturing plants of the industry, which suffer from unfavorable environmental conditions (e.g., magnetic field interference or missing GPS signal), benefit from, e.g., high-precision goods location. Typically, to describe movements, one uses either poses that describe a "snapshot" of a state of motion (e.g., idle state, stoppage), or a motion model that describes movement over time (e.g., walking or running). In addition, human movements may be identified, detected, and sensed by different sensors (e.g., on the body) and mapped in the form of poses and motion models. Different types of modern sensors (e.g., camera, radio, and inertial sensors) provide information of varying quality.In principle, with the help of expensive and highly precise measuring instruments, the extraction of the poses and resp. of the motion model, for example, from positions on small tracking areas is possible without errors. Positions, e.g., of human extremities, can describe or be described by poses and motion models. Camera-based sensors deliver the required high-frequency and high-precision reference measurements on small areas. However, as the size of the tracking surface increases, the usability of camera-based systems decreases (due to inaccuracies or occlusion issues). Likewise, on large areas radio and inertial sensors only provide noisy and inaccurate measurements. Although a combination of radio and inertial sensors based on Bayesian filters achieves greater accuracy, it is still inadequate to precisely sense human motion on large areas, e.g., in sports, as human movement changes abruptly and rapidly. Thus, the resulting motion models are inaccurate.Furthermore, every human movement is highly nonlinear (or unpredictable). We cannot map this nonlinearity correctly with today's motion models. Bayes filters describe these models but these (statistical) methods break down a nonlinear problem into linear subproblems, which in turn cannot physically represent the motion. In addition, current methods produce high latency when they require accuracy.Due to these three problems (inaccurate position data on large areas, nonlinearity, and latency), today's methods are unusable, e.g., for sports applications that require short response times. This project aims to counteract these nonlinearities by using machine learning methods. The project includes research on recurrent neural networks (RNN) to create nonlinear motion models. As modern Bayesian filtering methods (e.g., Kalman and Particle filters) and other statistical methods can only describe the linear portions of nonlinear human movements (e.g., the relative position of the head w.r.t. trunk while walking or running) they are thus physically not completely correct.Therefore, the main goal is to evaluate how machine learning methods can describe complex and nonlinear movements. We therefore examined whether RNNs describe the movements of an object physically correctly and support or replace previous methods. As part of a large-scale parameter study, we simulated physically correct movements and optimized RNN procedures on these simulations. We successfully showed that, with the help of suitable training methods, RNN models can either learn physical relationships or shapes of movement.
    This project addresses three key topics:
    I. A basic implementation investigates how and why methods of machine learning can be used to determine models of human movement.
    In 2018, we first established a deeper understanding of the initial situation and problem definition. With the help of different basic implementations (different motion models) we investigated (1) how different movements (e.g., humans: walk, run, slalom; vehicles: meander, zig-zag) affect measurement inaccuracies of different sensor families, (2) how measurement inaccuracies of different sensor families (e.g., visible orientation errors, audible noise, and deliberated artificial errors) affect human motion, and (3) how different filter methods for error correction (that balance accuracy and latency) affect both motion and sensing. In addition, we showed (4) how measurement inaccuracies (due to the use of current Bayesian filtering techniques) correlate nonlinearly with human posture (e.g., gait apparatus) and predictably affect health (simulator sickness) through machine learning.We studied methods of machine and deep learning for motion detection (humans: head, body, upper and lower extremity; vehicles: single- and bi-axial) and motion reconstruction (5) based on inertial, camera, and radio sensors, as well as various methods for feature extraction (e.g., SVM, DT, k-NN, VAE, 2D-CNN, 3D-CNN, RNN, LSTM, M/GRU). These were interconnected into different hybrid filter models to enrich extracted features with temporal and context-sensitive motion information, potentially creating more accurate, robust, and close to real-time motion models. In this way, these mechanics learned (6) motion models for multi-axis vehicles (e.g., forklifts) based on inertial, radio, and camera data, which generalize for different environments or tracking surfaces (with varying size, shape, and sensory structure, e.g., magnetic field, multipath, texturing, and illumination). Furthermore (7), we gained a deeper understanding of the effects of non-constant accelerated motion models on radio signals. On the basis of these findings, we trained an LSTM model that predicts different movement speeds and motion forms of a single-axis robot (i.e., Segway) close to real-time and more accurately than conventional methods.
    In 2019, we found that these models can also predict human movement (human movement model). We also determined that the LSTM models can either be fully self-sufficient at runtime or integrated as support points into localization estimates, e.g., into Pedestrian Dead Reckoning (PDR) methods.
    II. Based on this, we try to find ways to optimize the basic implementation in terms of robustness, latency, and reusability.
    In 2018, we used the findings from I. (1-7) to stabilize so-called (1) relative Pedestrian Dead Reckoning (PDR) methods using motion classifiers. These enable a generalization to any environment. A deeper radio signal understanding (2) allowed to learn long-term errors in RNN-based motion models. This improves the position accuracy, stability, and a near real-time prediction. First experiments showed the robustness of the movement models (3) with the help of different real (unknown to the models) movement trajectories for one- and two-axis vehicles. Furthermore, we investigated (4) how hybrid filter models (e.g., interconnection of feature extractors such as 2D/3D-CNNs and time-series trackers such as RNNs-LSTM) provide more accurate, more stable, and filtered (outlier-corrected) results.
    In 2019, we showed that models of the RNN family extrapolate movements into the future so that they compensate for the latency of the processing pipeline and beyond. Furthermore, we examined the explainability, interpretability, and robustness of the models examined here, and their reusability on the human movement.With the help of a simulator, we generated physically correct movements, e.g., positions of pedestrians, cyclists, cars, and planes. Based on this data, we showed that RNN models can interpolate between different types of movement and can compensate for missing data points, interpret white and random noise as such, and can extrapolate movements. The latter enables processing-specific latency to be compensated and enables human movement to be predicted from radio and inertial data in real time.Novel RNN architecture. Furthermore, in 2019, we researched a new architecture, or topology, of a neural network, that balances the strengths and weaknesses of flat neural networks (NN) and recurrent networks. We found this optimal NN for determining physically correct movement in a large-scale parameter study. In particular, we also optimized the model architecture and parameters for human-centered localization. These optimal architectures predict human movement far into the future from as little sensor information as possible. The architecture with the least localization error combines two DNNs with an RNN.Interpretability of models. In 2019, we examined the functionality of this new model. For this purpose, we researched a new process pipeline for the interpretation and explanation of the model. The pipeline uses the mutual information flow and the mutual transfer entropy in combination with various targeted manipulations of the hidden states and suitable visualization techniques to describe the state of the model at any time, both subjectively and objectively. In addition, we adapted a variational auto-encoder (VAE) to better visualize and interpret extracted features of a neural network. We designed and parameterized the VAE such that the reconstruction error of the signal is within the range of the measurement noise and at the same time forced the model to store disentangled features in its latent space. This disentanglement enabled the first subjective statements about the interrelationships of the features that are really necessary to optimally code the channel state of a radio signal.Compression. In 2019, we discovered a side effect of the VAE that offers the possibility of decentralized preprocessing of the channel information directly on the antenna. This compression then reduces the data traffic, lowers the communication load, and thus increases the number of possible participants in the communication and localization in a closed sensor network.Influence of the variation of the input information. In 2019, we also examined how changes in the input sequence length of a recurrent neural network affect the learning success and the type of results of the model. We discovered that a longer sequence persuades the model to be a motion model, i.e., to learn the form of movement, while with shorter sequences the model tends to learn physical relationships. The optimal balance between short and long sequences represents the highest accuracy.We investigated speed estimation using the new method. When used in a PDR model, this increased the position accuracy. An initial work in 2019 has examined in detail which methods are best suited to estimate the speed of human movement from a raw inertial signal. A new process, a combination of a one-dimensional CNN and a BLSTM, has replaced the state of the art.
    In 2020, we optimized the architecture of the model,with regard to its prediction accuracy and investigated the effects of a deep fusion of Bayesian and DL methods on the prediction accuracy and robustness.
    Optimization. In 2020, we improved the existing CNN and RNN architecture and proposed the fusion of ResNet and BLSTM. We replaced the CNN with a residual network to extract deeper and higher quality features from a continuous data stream. We showed that this architecture entails higher computing costs, but surpasses the accuracy of the state-of-the-art. In addition, the RNN architecture can be scaled down to counter the blurring of the context vector of the LSTM cells with very long input sequences, as the remaining ResNet network offers more qualitative features.Deep Bayesian Method. In 2020 we investigated whether methods of the RNN family can extract certain movement properties from recorded movement data to replace the measurement-, process-, and transition-noise distributions of a Kalman filter (KF). We showed that highly optimized LSTM cells can reconstruct trajectories more robust (low error variance) and more precise (positional accuracy)  than an equally highly optimized KF. The deep coupling of LSTM in KF, so-called Deep Bayes, provided the most robust and precise positions and trajectories. This study also showed that methods trained on realistic synthetic data, the Deep Bayesian method, needed the least real data to adapt to a new unknown domain, e.g., unknown motion shapes and velocity distribution.

    III. Finally, a demonstration of feasibility shall be tested.
    In 2018, a large-scale social science study opened the world's largest virtual dinosaur museum and showed that (1) a pre-selected (application-optimized) model of human movement robustly and accurately (i.e., without a significant impact on simulator sickness) maps human motion, resp. predicts it. We used this as a basis for comparison tests with other models that are human-centered and generalize to different environments.
    In 2019, we developed two new live demonstrators that are based on the research results achieved in I and II. (1) A model railway that crosses a landscape with a tunnel at variable speeds. The tunnel represents realistic and typical environmental characteristics that lead to nonlinear multipath propagation of a radio transmitter to be located and ultimately to an incorrectly determined position. This demonstrator shows that the RNN methods researched as part of the research project can localize highly precisely and robustly, both on complex channel impulse responses and on dimensionally reduced response times, and also deliver better results than conventional Kalman filters. (2) We used the second demonstrator to visualize the movement of a person's upper extremity. We recorded human movement using inexpensive inertial sensors attached to both arm joints, classified using machine-based and deep learning, and derived motion parameters. A graphic user interface visualizes the movement and the derived parameters in near real time.The planned generalizability, e.g., of human-centered models and the applicability of RNN-based methods in different environments, has been demonstrated using (1) and (2).In 2019, we applied the proposed methods in the following applications:Application: Radio Signal. We classified the channel information of a radio system hierarchically. We translated the localization problem of a Line of Sight (LoS) and Non Line of Sight (NLoS) classifier into a binary problem. Hence, we now can precisely localize a position within a meter, based on individual channel information from a single antenna if the environment provides heterogeneous channel propagation.Furthermore, we simulated LoS and NLoS channel information and used it to interpolate between different channels. This enables the providers of radio systems to respond to changing or new environments in the channel information a-priori in the simulation. By selectively retraining the models with the simulated knowledge, we enabled more robust models.Application: Camera and Radio Signal. We have shown how the RNN methods relate to information from other sensor families, e.g., video images, by combining radio and camera systems when training a model, the two sensor information streams merge smoothly, even in the event of occlusion of the camera. This yields a more robust and precise localization of multiple people.Application: Camera Signal. We used an RNN method to examine the temporal relationships between events in images. In contrast to the previous work, which uses heterogeneous sensor information, this network only uses image information. However, the model uses the image information in such a way that it interprets the images differently: as spatial information, i.e., a single image, and as temporal information, i.e., several images in the input. This splitting implies that individual images can be used as two fictitious virtual sensor information streams to recognize results spatially (features) and to better predict them temporally (temporal relationships). Another work uses camera images to localize the camera itself. For this purpose, we built a new processing pipeline that breaks up the video signal over time and learns absolute and relative information in different neural networks and merges their outputs into an optimal pose in a fusion network.Application: EEG Signal. In a cooperation project we applied the researched methods to other sensor data. We recorded beta- and gamma-waves of the human brain in different emotional states. When used to train an RNN, it correctly predicted the emotions of a test person in 90% of all cases from raw EEG data. Application: Simulator Sickness. We have shown how the visualization in VR affects human perception and movement anomalies, resp. simulator sickness, and how the neural networks researched here can be used to predict the effects.In 2020, we developed a new live demonstrator based on the research results achieved in II.Application: Gait reconstruction in VR. In 2020, we used the existing CNN-RNN model to predict human movement, namely gait cycle and gait phases, using sensor data from a head-mounted inertial sensor to visualize a virtual avatar in VR in real time. We showed that the DL model has significantly lower latencies than the state-of-the-art, since it can recognize gait phases earlier and predict future ones more precisely. However, this is at the expense of the required computing effort and thus the required hardware.

    The project was successfully completed in 2021. In 2021, as part of a successfully completed dissertation, essential findings from the course of the project were linked and conclusions were drawn, and numerous research questions were addressed and answered.
    As part of the research project, more than 15 qualification theses, 6 patent families, and more than 20 scientific publications were successfully completed and published. The core contribution of the project is the knowledge of the applicability and pitfalls of recurrent neural networks (RNN), their different cell types and architectures, in different application areas. Conclusion: The ability of the RNN family to deal with dynamics in data streams, e.g., failures, delays, and different sequence lengths in time series data, makes them indispensable in a large number of application areas today.
    The project is continued within the framework of seminars at the FAU and extracurricular research activities at Fraunhofer IIS within the framework of the ADA Lovelace Center.
    In 2022, time series augmentation was investigated. For this purpose, various generative methods, namely Variational Autoencoder (VAE) and Generative Adversarial Networks (GAN), were evaluated for their ability to generate time series of different application domains, e.g., features of radio signals, e.g., signal strength, channel impulse response, characteristics of GNSS spectra and multidimensional signals from inertial sensors. A novel architecture called ARCGAN was proposed, which combines all the known advantages of state-of-the-art methods and can therefore generate significantly more similar (effective) time series than the state-of-the-art.
    In 2023, we investigated generative methods based on attention mechanisms, transformer architectures, and GPT with respect to their predictive performance for time series. To this end, we evaluated methods such as Legendre Units (LMU), novel transformer architectures, and TimeGPT to better forecast localization information. We could show that using appropriate input prompts and calibration, preconfigured GPT models can be adapted to new areas of application, to make the training significantly more efficient, and to also save energy.

    In 2024, we further examine GPT-like models for their uncertainty, explainability, and adaptability. In addition, we analyze the feasibility of these generative methods in relation to various fields of application, e.g., forecasting and anomaly detection, anomaly characterization, anomaly localization, and anomaly mitigation.

  • Computer Science basics as an essential building block of modern STEM field curricula

    (Third Party Funds Single)

    Term: 01.10.2016 - 30.09.2019
    Funding source: Bayerisches Staatsministerium für Bildung und Kultus, Wissenschaft und Kunst (ab 10/2013)
    URL: https://www2.cs.fau.de/research/GIFzuMINTS/

    The increasing digitalization of all areas of science and life render competencies in the foundations of computer science essential for all tech students and more. For the success of their academic studies, often their courses, especially the introductory ones, are problematic hurdles that may lead to a dropout.
    For this reason, this project expands the support that students get while they are still at school, while transitioning from school to university, and during the introductory phase. To address the study orientation phase when future STEM students are still at school, we (a) use our regional and national contacts to provide support for seminars and we (b) offer advanced training for teachers as they act as multipliers when future students choose their degrees. To address the transition from school to university, we focus on the fact that freshmen show up with different previous knowledge. We offer revision courses to bring the students onto the same page, i.e., to make their knowledge more homogeneous. In the introductory phase, special intensification exercises and tutoring that take heterogeneity into account strive to lower the dropout rates.
    In 2018, one focus was to evaluate the effectiveness of our measures: the increased range of exercise groups, the more extensive support from the tutors, the correlation between exercise attendence and dropout rate, the effects of participation in the revision courses on the performance in the exercises and in the exam, etc.
    In order to attract and qualify teachers as multipliers, we expanded the range of advanced training courses for teachers: we demonstrated innovative approaches, examples and content for teaching so that the participants can pass on to their students what they have learned themselves.
    To quantitatively and qualitatively improve the W seminar papers written in computer science at school we compiled a 24-page brochure and sent it to schools in surrounding counties. This brochure supports teachers in the design and implementation of W seminars in IT by providing subject suggestions, tips, and a checklist for students.
    The GIFzuMINTS project ended in 2019 with a special highlight: On May 20, 2019, the Bavarian Minister of State for Science and Art, Bernd Sibler, and the deputy general manager of vbw bayme vbm, Dr. Christof Prechtl, visited us in a status meeting. Minister Bernd Sibler was impressed: "The concept of the FAU is perfectly tailored to the requirements of a degree in computer science. The young students are supported from the very beginning immediately after finishing school. That is exactly our concern, which we pursue with MINTerAKTIV: We want every student to receive the support she/he needs to successfully complete his/her academic studies."
    By the end of the project, the measures developed and implemented were thoroughly evaluated and established as permanent offers. The revision course on computer science was transformed into a continuous virtual offer for self-study and updated to the latest state of the art. The course for talented students that prepares them to participate in international programming competitions was expanded and set up as a formal module of the curriculum. In order to ensure that the measures sustain, we applied for subsequent funding, which has already been approved as CS4MINTS.

  • Adaptive Algorithms for RF-based Locating Systems

    (Third Party Funds Single)

    Term: 15.05.2016 - 31.03.2017
    Funding source: Industrie
    URL: https://www2.cs.fau.de/research/EAAFLS/

    The goal of this project is the development of adaptive algorithms for radio-based realtime locating systems. In the scope of this project we cover three essential topics:

    Automatic configuration of event detectors. In previous research projects we built the basics for the analysis of noisy sensor data streams. However, event
    detectors still need to be parameterized carefully to yield satisfying results. This work package explores the possibilities of an automatic configuration of the event detectors on existing sensor and event data streams.
    In 2016 we investigated concepts to extract optimal configurations from available sensor data streams. For soccer sport scientists manually annotated matches and scenes (e.g. player A kicks the ball with his/her left foot at time t). These manually annotated scenes may later by used to optimize the hierarchy of event detectors.

    Evaluation of machine learning techniques for locating applications. In previous research projects we already developed machine learning algorithms for radio-based locating systems (e.g., evolutionary algorithms to estimate antenna positions and orientations). This work package investigates further approaches that use machine learning to enhance the performance of realtime locating systems.
    In 2016 we evaluated concepts to replace parts of the position estimation algorithms by machine learning algorithms. Up to now a signal processing chain (analog/digital conversion, time of arrival estimation, Kalman filtering, motion estimation) uses the raw sensor data to calculate a position. This often results in high installation and configuration costs for the setup of locating systems in the application environment.

    Evaluation of vision-based techniques to support radio-based realtime locating.
    Radio-based locating systems have strengths if objects are occluded as microwaves may pass through the occluding objects. However, metallic surfaces in the environment pose challenges as they reflect RF-signals. Hence, the RF-signal that a transmitter emits arrives at the antennas over multiple paths. It is then often difficult to extract the directly received parts of the signal at the antenna and hence it is a challenge to properly estimate the distance between the antenna and the emitter. In this work package we investigate vision-based locating techniques that may help RF-based systems in calculating positions.
    In 2016 we developed two systems: CNNLok may be used by objects carrying a camera (self-localization), i.e., inside-out tracking, whereas InfraLok uses cameras installed in the environment to track objects with infrared light. CNNLok uses a convolutional neural network (CNN) that is trained on several camera images taken in the environment (at known places). At runtime the CNN receives a camera image and calculates the position of the camera. InfraLok detects infrared LEDs using a multi-camera system and calculated the position of objects in space.

  • Software Watermarking

    (Own Funds)

    Software watermarking means hiding selected features in code, in order to identify it or prove its authenticity. This is useful for fighting software piracy, but also for checking the correct distribution of open-source software (like for instance projects under the GNU license). The previously proposed methods assume that the watermark can be introduced at the time of software development, and require the understanding and input of the author for the embedding process. The goal of our research is the development of a watermarking framework that automates this process by introducing the watermark during the compilation phase into newly developed or even into legacy code. As a first approach we studied a method that is based on symbolic execution and function synthesis.
    In 2018, two bachelor theses analyzed two methods of symbolic execution and function synthesis in order to determine the most appropriate one for our approach. In 2019, we investigated the idea to use concolic execution in the context of the LLVM compiler infrastructure in order to hide a watermark in an unused register. Using a modified register allocation, one register can be reserved for storing the watermark. In 2020, we extended the framework (now called LLWM) for automatically embedding software watermarks into source code (based on the LLVM compiler infrastructure) with further dynamic methods. The newly introduced methods rely on replacing/hiding jump targets and on call graph modifications. In 2021, we added other adapted, dynamic methods that have already been published, as well as a newly developed method to LLWM. The added methods are based, among other things, on the conversion of conditional constructs into semantically equivalent loops or on the integration of hash functions, that leave the functionality of the program unchanged but increase its resilience. Our newly developed method IR-Mark now not only specifically selects the functions in which the code generator avoids using a certain register. IR-Mark now adds some dynamic computation of fake values that makes use of this register to blurr what is going on. There is a publication on both LLWM and IR-Mark. In 2022, we added another adapted procedure to the LLWM framework. The method uses exception handling to hide the watermark. In 2023, we adapted more methods to expand the LLWM framework. These include embedding techniques based on principles of number theory and aliasing.
  • Automatic Detection of Race-Conditions

    (Own Funds)

    Large software projects with hundreds of developers are difficult to review and often contain many bugs. Automatic tests are a well established technique to test sequential and deterministic software. They test the whole project (system test) or each module by itself (unit test). However, recent software contains more and more parallelism. This introduces several new bug patterns, like deadlocks and concurrent memory accesses, that are harder or even impossible to be detected reliably using conventional test methods. Whether the faulty behavior actually shows at runtime depends on the concrete scheduling of the threads which is indeterministic and varies between individual executions depending on the underlaying system. Due to this unpredictable behavior such bugs do not necessarily manifest in an arbitrary test run or may never arise in the testing environment at all. As a result, conventional tests are not well suited for modern, concurrent software.
    With the project AuDeRace, we develop methods to efficiently and reliably detect concurrent bugs while keeping the additional effort for developers as low as possible. In an initial approach we define a testing framework that allows the specification of a scheduling plan to regain deterministic execution. However, a major problem still remains: The developer has to identify and implement well suited test cases that cover the potential fault in the program and execute them in a special deterministic way in order to trigger the failure. Especially in the context of concurrency, it is difficult to visualize the behavior of a program and identify the problematic parts. To overcome this, the critical parts shall automatically be narrowed down before even writing dedicated test cases. Existing approaches and tools for this purpose generate too many false positives or the analysis is very time consuming, making their application to real world code prohibitive. The goal of this project is to generate less false positives and increase the analysis speed by combining existing static and dynamic analysis. This allows for the efficient use in not only small example codes but also large and complex software projects.

    In 2016 existing approaches were studied regarding their usability as a starting basis for our project. The most promising method uses model checking and predefined assertions to construct thread schedules that trigger the faulty behavior. However, the approach is currently infeasible for larger projects because only very small codes could be analyzed in reasonable time. Therefore, we focused on automatically detecting and removing statements that are unrelated to the parallelism respectively to the potentially faulty code parts in order to decrease the execution time of the preliminary static analysis.

    In 2017 the work on automatically reducing programs to speed up furher analysis was continued. Furthermore, we evaluated whether the concept of mutation testing can be applied to parallel software as well. The results indicate that this extension is indeed possible to rate tests qualitativly. However, to complete the analysis for larger programs in reasonable time, a few heuristics need to be applied during the process.

    In 2018 the focus moved to a deterministic execution of test cases. A concept to reproduce results during the execution was developed: In addition to the test case, a schedule specifies the dynamic behaviour of the threads. Instrumenting the code at previously marked positions and other relevant byte code instructions allows a separate control thread to enforce the schedule. When modifying the source code, the marked positions in the code need to be updated as well to keep them consistent with the test cases. A merging technique similar to the ones used in version control systems shall be used to automatically update the positions.

    Up to 2019, this project was a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi.fau.de/ ). In this context, several improvements for the quality of concurrent software were analyzed. The take-away result was that different approaches are applicable and required, but they also often suffer from long analysis times.
    Beyond the ESI project, we improved the usefulness of mutation testing by developing a tool for equivalence detection and test case generation. A submitted paper got accepted.
    In 2020, we studied and evaluated approaches to detect external race conditions. Whereas in classic race conditions, several threads of the analyzed software fail to work together properly, in external race conditions, software interacts with independent, unknown components. Examples are other programs, the operating system, or even malicious code written by attackers who interferes with the analyzed software.
    In 2021, we developed a system to statically detect race conditions when using external resources. A common pattern is to check properties of files and later access the files again, assuming the previously determined properties still hold. However, if the file is modified in the meantime, numerous problems can occur (time-of-check to time-of-use). Besides unexpected result, attackers can even modify the file to enforce malicious behavior and compromise the system. With our approach, such vulnerable codes can be detected in the software.

  • Parallel code analysis on a GPU

    (Own Funds)

    Term: 01.07.2013 - 30.09.2020
    URL: https://www2.cs.fau.de/research/ParCAn/

    In compiler construction there are analyses that propagate information along the edges of a graph and modify it, until a fix point is reached and the information no longer changes. In this project we built the ParCAn framework to accelerate such analyses by exploiting the massive parallelism of graphic cards.

    In 2016, our research focus was on synchronization mechanisms for GPUs. Known synchronization methods for CPUs (e. g. a spin lock) cannot be used without further adjustment on GPUs since their special architectural properties easily lead to dead- and livelocks. Synchronization is required (even for predominantly dataparallel graph implementations) if data dependences occur dynamically. We have therefore developed a novel synchronization mechanism which solves two non-trivial problems related to GPUs: First, we prevent dead- and livelocks. Second, we retain as much parallelism as possible by allowing dataparallel threads to work on disjoint areas of a data structure concurrently. For example, think of threads that modify disjoint locations of a graph without affecting its structural integrity. In our approach, a programmer can provide rules that describe the conditions under which a parallel access is allowed. At runtime, we check these rules and determine how many threads can run in parallel.

    We currently extend the above synchronisation mechanism with a scheduler that redistributes conflicting data access so that the SIMD execution on a GPU causes less serialization than without the re-ordering. Hence, the degree of parallelism grows. The underlying idea exploits that GPUs organize threads in hierarchical units. If the above synchronization mechanism detects a conflicting access in one of these units, it checks on the next smaller unit whether the conflict can also be found there. If this is not the case, the fewer threads in that unit can run in parallel. This is much better than serializing all threads in the enclosing unit. In this situation it is the scheduler’s task to re-distribute the detected collisions across the units so that as many threads as possible can run in parallel. As the scheduling is performed at run-time it needs to be efficient, must itself run in parallel, and potentially make use of the dynamic thread creation capabilities of modern GPUs.

    Graphs are fundamental data structures to represent relations between data (e.g., social networks, web link analysis). Graphs can have millions/billions of vertices/edges. GPUs can process graphs with 1000th of threads in parallel very efficiently. Graph Analyses use the Bulk Synchronous Parallel Model (BSP) which divides the analysis into three strictly separated phases: computation, communication and synchronization. The two latter ones require communications with the host system (CPU) that slow down execution. Our GPU-based compiler works after the BSP model too. Internally the code is represented as (control flow-) graph. This graph is transferred to the GPU and gets analyzed. Every code modification triggers this cycle. The Graph has thus to be generated and transferred to the GPU very fast.

    Publications in the field of graph-analysis focus on optimizing the computation time. The end-to-end execution time (including communication and synchronization) is ignored but has a strong impact on the run-time. Our compiler considers every phase of the BSP. In 2017 we published a paper that significantly reduces the time for synchronization.
    In addition, we focus on speeding up of the communication phase of the BSP model. Communication here means the transfer of the graph in both directions (GPU Host). While the graph and data structure used has strong impact on the run-time behavior it also influences the computation phase. Since there is no publication in the literature that systematically investigates the impact of the data-structure on the end-to-end run-time of a GPU graph analysis, we implemented a number of benchmarks that use different attributes of graphs (e.g., access successor/predecessor, random node access) and eight different graph data structures to represent graphs on the GPU. For the measurements we used a number of structurally different graphs. The results are likely to help developers in picking the right graph data structure for their GPU-problem.

    In 2018 we completed our comparative study on the efficiency of graph data structures on GPUs. To show the effectiveness of our framework we integrated it into the LLVM compiler framework. We picked four LLVM analyses and parallelized them with ParCAn. Ample measurements show that our framework can accelerate LLVM’s compilation process by up to 40%. A publication was accepted at the 28th International Conference on Compiler Construction and will receive the Best-Paper-Award.

    In 2019, ParCAn was adjusted to the new execution model of NVIDIA’s latest GPU architectures. With the introduction of the Volta architecture, threads can now achieve progress independently of the others. Since Volta every thread has its own program counter and call stack. Previously, a group of threads (called a warp) shared both a common program counter as well as a call stack. The threads either executed the same instruction or were idle (lock-step execution). Applications that are not adjusted to this execution model will compute wrong results. As threads now execute independently of each other, race conditions can occur within warps. Older lock-step fashioned execution models inserted synchronization points to prevent this implicitly. We inspected ParCAn’s source code for code fragments susceptible to causing race conditions on new architectures. These fragments were adjusted to now execute properly on the latest NVIDIA architectures.
    In 2020, we successfully completed this research project. We demonstrated that parallelizing the particularly cost-intensive data flow analyses can speed-up the compilation process of up to 31%. Thus, our research leads the way towards parallelized compilers that meet the requirements of today's software projects. The importance of this research topic was underlined by a "Best Paper Award" at the renowned "Compiler Construction" conference, see references.

    The use of the GPU as the target architecture raised other research-related questions that were also published.

    Some analyses store their information in a global data structure that can be modified by all threads simultaneously. Especially the high number of concurrent threads on a GPU demands for efficient synchronization. Thus, as part of the research project we implemented an efficient framework for establishing mutual exclusion, see the LNCS-paper in the references. Previous approaches inevitably resulted in deadlocks when the GPU is fully utilized. Moreover, by using a variant of the inspection-execution paradigm we further improved the efficiency of the framework.

    Another research topic considered the efficiency of graph structures on GPUs. At its core, ParCAn implements a graph traversal algorithm. The program to be translated is converted into a graph, the control flow graph (CFG), on which the analyses are executed. Due to the large number of parallel accesses, the CFG represents a critical data structure for the performance of ParCAn. For this reason, we conducted an extensive study comparing the performance of graph data structures. We used the results to determine the best possible data structure to represent the CFG. We derived general criteria that allow to make assumptions about the performance of a data structure under certain conditions. Even outside of the context of ParCAN, developers can use these criteria, represented as a decision tree, to choose the most appropriate data structure for their static graph algorithms. The results of the study were presented at the GPGPU workshop, see references.

  • Design for Diagnosability

    (Third Party Funds Single)

    Term: 15.05.2013 - 30.09.2018
    Funding source: Bayerisches Staatsministerium für Wirtschaft und Medien, Energie und Technologie (StMWIVT) (ab 10/2013)
    URL: http://www2.informatik.uni-erlangen.de/research/DfD/
    Many software systems behave obtrusively during the test phase or even in normal operation. The diagnosis and the therapy of such runtime anomalies is often time consuming and complex, up to being impossible. There are several possible consequences for using the software system: long response times, inexplicable behaviors, and crashes. The longer the consequences remain unresolved, the higher is the accumulated economic damage.
    "Design for Diagnosability" is a tool chain targeted towards increasing the diagnosability of software systems. By using the tool chain that consists of modeling languages, components, and tools, runtime anomalies can easily be identified and solved, ideally already while developing the software system. Our cooperation partner QAware GmbH provides a tool called Software EKG that enables developers to explore runtime metrics of software systems by visualizing them as time series.
    The research project Design for Diagnosability enhances the eco-system of the existing Software EKG. The Software-Blackbox measures technical and functional runtime values of a software system in a minimally intrusive way. We store the measured values as time series in a newly developed time series database, called Chronix. Chronix is an extremely efficient storage of time series that optimizes disk space as well as response times. Chronix is an open source project (www.chronix.io) and is free to use for everyone.
    The newly developed Time-Series-API analyzes these values, e.g., by means of an outlier detection mechanism. The Time-Series-API provides multiple additional building blocks to implement further strategies for identifying runtime anomalies.
    The mentioned tools in combination with the existing Software EKG will become the so-called Dynamic Analysis Workbench. This tool enables developers to diagnose, explain, and fix any occurring runtime anomalies both quickly and reliably. It will provide diagnosis plans to localize and identify the root causes of runtime anomalies. The full tool chain aims at increasing the quality of software systems, particularly with respect to the metrics mean-time-to-repair and mean-time-between-defects.

    Before we have successfully completed the project in July 2016, we have made the following contributions:

    • We have linked Chronix and a framework for distributed data processing so that our anomaly analyses now scale to huge sets of time series data.
    • We extended Chronix with additional components. Among them are, for example, a more efficient storage model, some adapters for more time series databases, additional server-side analysis functions, and some new time series types.
    • We have published our benchmark for time series databases.
    • We have investigated and implemented an approach to link application-level calls, e.g., a login of a user, down to the resulting calls on the OS level.

    Although funding expired in 2016, we made further contributions in 2017:

    • We presented Chronix at the FAST conference in Santa Clara, CA in February 2017.
    • We have equipped Chronix with interfaces to attach time series databases that are used in the industry.
    • We have developed an approach that determines the ideal cluster configuration (w.r.t. processing time and costs) for a given analysis (specific function and set of time series).
    • We have expanded Spark, a framework for distributed processing of large-scale data, so that it now can make use of GPUs in distributed time series analyses. We presented the results at the Apache Big Data Conference in Miami, Florida, in May 2017.

    We continued to make further contributions to the research project in 2018:

    • We have published a paper at PROFES 2018 that describes techniques and insights on how runtime data in a large software project can be offered to all project participants at the development stage to improve their collaboration.
    • We have maintained the Chronix Open Source project and stabilized it further (updating versions, fixing bugs, etc.).
  • Echtzeitkritische Systeme

    (Third Party Funds Single)

    Term: 01.01.2013 - 31.12.2013
    Funding source: Industrie
  • Techniques and tools for iterative development and optimization of software for embedded multicore systems

    (Third Party Funds Single)

    Term: 15.10.2012 - 30.11.2014
    Funding source: Bayerisches Staatsministerium für Wirtschaft, Infrastruktur, Verkehr und Technologie (StMWIVT) (bis 09/2013)
    Multicore processors are of rising importance in embedded systems as these processors offer high performance while maintaining low power consumption. Developing parallel software for these platforms poses new challenges for many industrial sectors because established tools and software libraries are not multicore enabled. The efficient development, optimization and testing of multicore-software is still open research, especially for reliable real-time embedded systems.

    In the multi-partner project "WEMUCS" [http://www.multicore-tools.de/] new methods and tools for efficient iterative development, optimization, and testing of multicore software have been created over the past two years. Innovative tools and technologies for modeling, simulation, visualization, tracing, and testing have been developed and integrated into a single tool chain. Using case studies from different industries (automotive, telecommunications, industry automation) these tools were evaluated and improved.

    Although several well-known methods for test case generation and best practice coverage measures for classical single-core applications exist, no such methods for multi-core software have established themselves yet. Unfortunately, it is the interaction of concurrent threads that can cause faults that cannot be discovered by testing the individual threads in isolation. As part of the WEMUCS project (more precisely: work package AP3 [http://www.multicore-tools.de/de/test.html]) and based on an industrial size case study, we developed a generic technique (called a "testing pipeline" below) that systematically creates test cases to find and analyze the impact of concurrent side effects.

    To evaluate the new testing pipeline (including the automated parallelization of sequential code) on real world examples, our project partner Siemens created a complete model of a large luggage conveyor belt, including the code to control the belt. Such a luggage conveyor can be found at every airport. The case study's model is used to automatically derive luggage conveyor belt systems of different sizes, i.e., built from an arbitrary number of feeder or outlet belts. The hardware of the conveyor belt is emulated on the SIMIT simulation tool from Siemens and the control software (written in the programming language AWL) runs on a software-based PLC.

    The first step of the testing pipeline converts the AWL code into a more comprehensible and human-readable programming language called HLL. We have completed this converter during this reporting period. In step two, our tool then transforms previously sequential parts of the AWL code into HLL units that are executed in parallel. When applied to a luggage conveyor belt system built from eight feeders, eight outlets and an inteconnecting circular belt of straight and curved segments, our tool automatically transcoded 11.704 lines of sequential AWL code into 34 KB of parallel HLL code.

    In step three another tool developed by the chair then analyzes the HLL code and automatically generates a testing model. This model represents the interprocedural control flow of the concurrent subroutines and also holds all the thread switches that might be relevant for testing. The testing model consists of a set of hierarchically organized UML activities (currently encoded as an XMI document that can be imported into Enterprise Architect by Sparx Systems). When applied to the case study outlined above, our tool automatically generates 103 UML activity diagrams (with 1,302 nodes and 2,581 edges).

    Step four is optional. The tester can manually adapt the testing model as needed (e.g. by changing priorities or inserting additional verification points). Then the completed model can be loaded into the MBTsuite tool, a model-based testing tool developed by our project partner sepp.med GmbH. This tool is highly configurable to generate test cases that cover as many parts of the testing model as possible. We ran MBTsuite on a standard PC and applied it to the testing model of our case study; within six minutes MBTSuite generated a highly optimized test set consisting of only 10 test cases that nevertheless cover 99% of the nodes and 78% of the edges.

    In cooperation with our project partner sepp.med GmbH we built two export modules for the above-mentioned MBTsuite. One module outputs the generated test cases as a human-readable spreadsheet, the other module outputs an executable test set in the Java language. The exported spreadsheet contains one individual sheet per test case, with one column per thread. The rows visualize the thread interleaving that gets tested. The Java file holds a Java class with a test method per test case. Every method holds a sequence of test steps that discretely control thread interleavings. This way, each test case execution leads to a unique and reproducible execution of the parallel System Under Test (SUT) written in HLL. Each run of a test instructs our HLL emulator to load and initialize the SUT and each subsequent test step instructs the HLL emulator to execute a certain set of instructions from a certain thread. During this fully controlled execution, each test case emits a detailed protocol of its execution for the final visualization step.

    A plug-in from sepp.med that creates a visualization layer in Enterprise Architect's UML editor visualizes the resulting log file. Colored nodes and edges tell the user which control flow paths a test has covered. If a test case "fails" (if a race condition or a logic error is found in the tested program), its graphical trace ends at the failing statement. The tester can then follow the control flow back in time in order to understand the underlying reason for the failure.

    We have implemented a prototype of the full testing pipeline and demonstrated its applicability to an industrial size case study. This tool is a major contribution to testing concurrent code for embedded systems. It is a contribution of the Programming Systems Group to the "ESI initiative" [http://www.esi-anwendungszentrum.de].

  • Incremental Code Analysis

    (Own Funds)

    Term: 01.04.2012 - 30.06.2017
    URL: https://www2.cs.fau.de/research/InCA/

    To ensure that errors in a program design are caught early in the development process, it is useful to detect mistakes already during the editing of the code. For that the employed analysis has to be fast enough to enable interactive use. One method to achieve this is the use of incremental analysis, which combines analysis results of parts of the program to analyze the whole program. As an advantage, it is then possible to re-use large parts of the analysis results when a small change to the program occurs, namely for the unaffected parts of the program and for libraries. Thus the work required for analysis can be drastically reduced, which makes the analysis useful for interactive use.
    Our analysis is based on determining, for (parts of) functions, which effects their execution can have on the state of a program at runtime. To achieve this, the runtime state of a program is modeled as a graph, which describes the variables and objects in the program's memory and their points-to relationship. The function is executed symbolically, to determine the changes made to the graph or, equivalently, to the runtime state described by it. To determine the effects of executing pieces of code in order, function calls, loops, etc., the change descriptions for smaller parts of the program can be combined in various ways, resulting in descriptions for the behavior of larger parts of the program. The analysis works in a bottom-up fashion, analyzing a called method before analyzing the callee (with recursion being analyzable as well).

    In 2015 we focused on improving the algorithms and data structures used for the analysis. We were able to significantly improve the runtime and memory requirements for analyzing a given program. Additionally, the analyzed program may now contain more, and more expressive language features.

    In 2016 we focused on improving the algorithms and data structures used for the analysis. We improved both the scalability of the analysis towards large code bases with more than 1 mio. statements, and the incremental analysis, where we re-used the analysis results for unmodified program parts, drastically speeding up the analysis for typical software projects (i.e. with a large code base and small, incremental changes)

    In 2017 we continued improving the algorithms and data structures used for the analysis. In addition to further development of the analysis' scalability towards large code bases, and of incremental analysis (where we re-used the analysis results for unmodified program parts), we focused on an easy to grasp documentation of the analysis, in order to understand it, and to lay theoretical basics to verify the analysis' correctness.

  • Inter-Thread Testing

    (Own Funds)

    Term: 01.01.2012 - 31.12.2013

    In order to achieve higher computing performance, microprocessor manufacturers do not try to achieve faster clock speed anymore - on the contrary: the absolute number of cycles has even decreased, while the number of independent processing units (cores) per processor is continually increased. Due to this evolution, developers must learn to think outside the box: The only way to make their applications faster (in terms of efficiency) is to modularize their programs such that independent sections of code execute concurrently. Unfortunately, present-day systems have reached a level of functional complexity, such that even software for sequential execution is significantly error-prone - the parallelization for multiple cores adds yet another dimension to the non-functional complexity. Although research in the field of software engineering emerged several different quality assurance measures, there are still very few effective methods for testing concurrent applications, as the broad emergence of multi-core systems is relatively young.

    This project aims to fill that gap by providing an automated test system. First of all, a testing criteria hierarchy is needed, which provides different coverage metrics tightly tailored to the concept of concurrency. Whilst for example branch coverage for sequential programs requires the execution of each program branch during the test (i.e. making the condition of an if-statements both true and false - even if there is no explicit else branch), a thorough test completion criterion for concurrent applications must demand for the systematic execution of all relevant thread interleavings (i.e. all possibly occurring orderings of statements, where two threads may modify a shared memory area). A testing criterion defines the properties of the 'final' test set only, but does not provide any support for identifying individual test cases. In contrast to testing sequentially executed code, test scenarios for parallel applications must also comprise control information for deterministically steering the execution of the TUT (Threads Under Test).

    In 2012, a framework for Java has been developed, which automatically generates such control structures for TUT. The tester must provide the bytecode of his application only; further details such as source code or restrictions of the test scenario selection are optional. The approach uses aspect-oriented programming techniques to enclose memory access statements (reads or writes of variables, responsible for typical race conditions) with automatically generated advices. After weaving the aspects into the SUT (System Under Test), variable accesses are intercepted at runtime, the execution of the corresponding thread is halted until the desired test scenario is reached, and the conflicting threads are reactivated in the order imposed by the given test scenario. In order to demonstrate the functionality, some naive sequence control strategies were implemented, e.g. alternately granting access to shared variables from different threads.

    In 2013, the prototypical implementation of the InThreaT framework has been reengineered as an Eclipse plugin. This way, our approach can also be applied in a multi-project environment and the required functionality integrates seamlessly into the development environment (IDE) familiar to programmers and testers. In addition, the configuration effort required from the tester could be reduced, as e.g. selecting and persisting the interesting points of interleaving also became intuitively usable parts of the IDE. In order to automatically explore all relevant interleavings, we need an infrastructure to enrich functional test cases with control information, required to systematically (re)execute individual test cases. In 2013, such an approach for JUnit has been evaluated and implemented prototypically, which allows to mark individual test cases or whole test classes with adequate annotations.

  • Embedded Realtime Language Development Framework

    (Own Funds)

    Term: 01.01.2012 - 30.11.2014

    ErLaDeF is our test-bed for new programming language and compiler techniques. Our main focus is on building infrastructure for easier (hard + soft) real-time embedded parallel systems programming.
    We focus on hard real-time embedded systems as they are about to go massively parallel in the near future.
    Real-time and embedded systems also have hard constraints on resource usage. For example, a task should complete in a fixed amount of time, have guaranteed upper-limits on the amount of memory used, etc. We are developing different ways to manage this concurrency using a combination of strategies: simpler language features, automatic parallelization, libraries of parallel programming patterns, deep compiler analysis, model checking, and making compiler analysis fast enough for interactive use.

    Runtime Parallelization of Programs

    Our automatic parallelization efforts are currently focused on dynamic parallelization. While a program is running, it is analyzed to find loops where parallelization can help performance. Our current idea is to run long-running loops three times. The first two runs analyze the memory accesses of the loop and can both run in parallel. The first run stores in a shared data structure for every memory address in which loop iteration a write access happens. We do not need any synchronization for this data structure, only the guarantee that one value is written to memory, when two concurrent writes happen. In the second pass we check for every memory access, if it has a dependency to one of the stored write accesses. A write access is part of any data-dependency, so we can find all types of data dependencies. If we do not find any, the loop is actually run in parallel. If we find dependencies, the loop is executed sequentially. We can execute the analyses in parallel to a modified sequential execution of the first loop iterations.

    In 2013 we have explored alternatives to polymorphism and inheritance that may be easier to analyze. We have also examined alternative thread synchronization methods, for example transactional memory, implicit synchronization, remote procedure calls, etc.

    In 2014 we enhanced the analysis so that a loop can start running while the remainder of the loop is analyzed to see if it can be run in parallel. To allow the sequential loop to execute while the tail of the loop is analyzed we needed to instrument the sequential loop slightly. The result is that the loop runs only slightly slower if the loop cannot be parallelized, but if the loop is found to be parallelizable, speedup is near to linear.

    Finally, we also created a new language that uses the above library for run-time parallelization. Any loops that the programmer marked as candidates for run-time parallelization are analyzed for constructs that the library cannot yet handle. If the loop is clean, code is generated that uses the library's macros.

    Design Patterns for Parallel Programming

    A library of parallel programming patterns allows a programmer to select well known parallelization and inter-core communication strategies from a well-debugged library. We are performing research into what (communication) patterns actually exist and when they can be applied. We have collected over 30 different patterns for parallel communication. In 2013 we investigated mechanisms to automatically determine the fitting implementation for a given software and hardware environment. We also added a set of distributed channels where cores can send data from one local memory to another. The
    distributed channels allow the library to be used to program modern Network-on-Chip (NoC) processors.

    Script based language for embedded systems (Pylon)

    Pylon is a language that is close to scripting languages (but is statically typed). A large part of the complexity that a programmer would normally take care of when creating an application, is moved to the compiler (i.e., type inference). The programmer does not have to think about types at all. By analyzing the expressions in the program, types are inferred (duck typing). The language is also implicitly parallel, the programmer does not need to have expert knowledge to parallelize an application. The compiler automatically decides what to run in parallel. Finally, the language is kept simple so that learning the language remains easy for novice programmers. For example, we kept the number of keywords small.
    Any language constructs that make analyzing the program hard for a compiler has been omitted (pointer arithmetic, inheritance, etc.). Any removed features have been replaced by simpler variants that can be easily analyzed. The current focus of this project lies in supporting the programmer in designing the code. The previous programming language research results have been absorbed in the Pylon project. For example, the prior research results on alternatives to polymorphism and inheritance have been added to the Pylon project. This allows us to report errors at compile time where other languages can only find them at run-time.

    Interactive Program Analysis

    To ensure that program design errors are caught early in the development cycle, it is necessary to find bugs while editing. This requires that any program analysis works at interactive speeds. We are following two approaches to this. The first approach centers around algorithmic changes to program analysis problems. Making analysis problems lazy means that only those parts of a program should be examined that are pertinent to the question that is currently asked by the compiler. For example, if the compiler needs to know which functions access a certain object, it should not examine unrelated functions, classes, or packages. Making program analysis incremental means that a small change in the program should only require small work for the (re-)analysis. To achieve that, a program is split recursively into parts. Then, for each of the parts, it
    is calculated which effect it would have during an execution of the program. For each part, a symbolic representation of its effects are saved. These representations can then for one be used to find the errors that occur when two of the parts interact (concurrently or non-concurrently). Also, we can deduce the effects that a bigger part of the program has during it’s execution by combining the effects of the smaller parts the bigger part consists of. This enables incremental analysis, because changes in one place do not cause the whole program to be reanalyzed, as the symbolic representations of the effects of unchanged parts of the program stay unchanged as well.

    In 2013 our key focuses were twofold: Firstly, we developed data structures that can both precisely and efficiently describe the effects of a part of a program. Secondly, we developed both efficient and precise algorithms to create and use these data structures.

    In 2014 we expanded and modified this analysis framework in order to support big code bases, to analyze them and keep the analysis results for later use.  This enables us to precisely analyze programs that use libraries, by first analyzing the library, and then using the library{'}s analysis result to get precise analysis results for our program.

    Our second approach to bring compiler analysis to interactive speed is to make the analysis itself parallel. In 2013 we continued to develop data-parallel formulations of basic compiler analyses. We have started to implement a generic data-parallel predicate propagation framework. Its data-parallel forms are then portably executable on many different multi-core architectures.

    Object-oriented languages offer the possibility to dynamically allcoate objects. The memory required for this is allocated at run-tie. However, in contrast to desktop systems, embedded systems typically have very little memory. If the 'new' operator is used often in an embedded system (that are now starting to be programmed with higher-level languages such as Java and C++ that include 'new'), memory can be exhausted at run-time causing an embedded system to crash.

    In 2014, we created an analysis to find this problem at compile-time and report this to the developer.
    To detect memory exhaustion at compile-time, the analysis determines the live-time of references to objects. If there are no more references to an object, the object can be removed from memory. Normally, such reference counting schemes are performed at run-time, we, however, perform reference counting at compile-time in an interactive fashion. The result is that memory management errors can be found at compile-time instead of at run-time. Additionally, the static reference counting increases a program{'}s performance as the reference counts do not have to be manipulated at run-time.

    If it is statically determined that an object can be removed, the developer needs to insert a 'delete' statement. With explicit memory management, we are now able to statically determine a program's worst-case memory requirements. The whole analysis outlined above is integrated into Pylon and a predicate propagation framework previously reported on. Note that the analysis is language independent, however, and can be applied to other languages as well (Java, C++, etc.). However, we can in that case no guarantee that reference counts are correctly computed as we require Pylon{'}s analyzability for this.

    The ErLaDeF project is a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi-anwendungszentrum.de/)

  • Compiler-supported parallelization for multi-core architectures

    (Own Funds)

    Term: 01.01.2011 - 31.12.2016

    Several issues significantly retard the development of quicker and more efficient computer architectures. Traditional technologies can no longer contribute to offer more hardware speed. Basic problems are the divergent ratio of the latencies of memory access and CPU speeds as well as the heat and waste of energy caused by increasing clock rates. Homogeneous and heterogeneous multi-core and many-core architectures were presented as a possible answer and offer enormous performance to the programmer. The multi-level cache hierarchy and decreased clock rates help avoid most of the above problems. Potentially, performance can increase even further by specialization of some hardware components. Current target architectures are GPUs with hundreds of arithmetic units and the Intel XeonPhi processor that provides 60 and more cores including hyper threading on a single board.

    While data parallel problems can be relatively simple accelerated by using the new hardware architectures, the implementation of task parallel problems is our main research focus. The difficulty is often the irregularity of the resulting task tree and thus the different task run times. From the point of view of a programming systems research group, there are - among others - the following open questions: Which core executes which work packet in which order? When do you donate a work packet from one compute node to another? Which data belongs to a work packet, are multiple cores/compute nodes allowed to access the data simultaneously? How do we have to merge data from multiple compute nodes? How can a compiler together with a runtime system create tasks and distribute work packets?

    In 2011, we have implemented and extended the Cilk programming model for the heterogeneous CellBE architecture (one PowerPC core (PPU) with eight SPU "coprocessors"). The CellBE architecture offers an enormous computing potential on a single chip. To move a work packet in the heterogeneous architecture, we have extended the Cilk programming model by an extra keyword. A source to source transformation then creates code for both, the PPU and SPU cores. Furthermore, we have moved the data along with the work packets in the SPU local stores and used a garbage collection technique to free memory from remote SPUs later.

    In 2012, we focused on graphic cards (GPUs) as a second target architecture. GPUs offer a lot more performance than ordinary CPUs, however achieving peak performance may be difficult. For data parallel problems, the performance can be achieved using Cuda (NVidia) or OpenCL (AMD) relatively easy. However, it is much more difficult to port task parallel problems with reasonable performance to the GPU, which is one of the goals on our roadmap. Thus, we design, implement and compare various load balancing algorithms. In 2012 we designed a first approach with hierarchical queues under the principle of work donation.

    In 2013, while further developing the load balancing algorithms for the GPU, we also targeted our work towards the Intel XeonPhi processor. With its many-core architecture and large register sets (and thus the ability to issue vector instructions on multiple data), the XeonPhi processor is a new challenge for load balancing algorithms. In practice, we extended and adopted Cilk for the XeonPhi such that we can automatically merge functions during the source-to-source transformation. This increases the Intel compiler{'}s chances to automatically parallelize. We implemented several analyses that not only increase the number of candidate functions for merging but also avoid (or at least handle) divergence in those merged functions.

    In 2014, we have extended our existing implementation for XeonPhi processors in a way that we can distribute the work over multiple XeonPhi processors. In contrast to the technique of work stealing that is used to distribute work over the many cores of a single XeonPhi, we use work donation to distribute the work to other XeonPhi processors. With a new source code annotation it is possible to mark the necessary data ranges for a work packet. These data ranges are then distributed along with the work packet and merged at synchronization points, which was the main challenge of the implementation. Furthermore, we have started to extend the clang compiler of the LLVM framework with support for Cilk in order to automatically generate CUDA code for execution on GPUs. Along with the generated CUDA code, we have designed a lightweight but general runtime system that manages execution and execution order of the work packets. We plan to implement analyses to avoid execution divergence as much as possible.

    In 2015, we evaluated and compared multiple load balancing algorithms to execute Cilk programs on the GPU. Therefore, we implemented queuing algorithms for parallel access and improved the automated generation of the necessary CUDA code. The correct placement of Cilk keywords for synchronization is still a challenge for the programmer. Thus, we generate from plain, recursive C code multiple, "plausible" code variants including synchronisation statements. These code variants will then be executed speculatively and the result from the fastest, correct variant will be used for further computations. Furthermore, the size of the base case of the recursion is crucial for an optimal performance improvement. Consequently, we started to optimize the size of the base case using analysis during compile and run time and will replace recursive calls with function inlining and vectorisation.

  • Automatic Code Parallelization at Runtime

    (Own Funds)

    Term: 01.01.2011 - 30.04.2016

    Our automatic parallelization efforts are currently focused on dynamic parallelization. While a program is running, it is analyzed to find loops where parallelization can help performance. Our current idea is to run long-running loops three times. The first two runs analyze the memory accesses of the loop and can both run in parallel. The first run stores in a shared data structure for every memory address in which loop iteration a write access happens. We do not need any synchronization for this data structure, only the guarantee that one value is written to memory, when two concurrent writes happen. In the second pass we check for every memory access, if it has a dependency to one of the stored write accesses. A write access is part of any data-dependency, so we can find all types of data dependencies. If we do not find any, the loop is actually run in parallel. If we find dependencies, the loop is executed sequentially. We can execute the analyses in parallel to a modified sequential execution of the first loop iterations.

    In 2014 we enhanced the analysis so that a loop can start running while the remainder of the loop is analyzed to see if it can be run in parallel. To allow the sequential loop to execute while the tail of the loop is analyzed we needed to instrument the sequential loop slightly. The result is that the loop runs only slightly slower if the loop cannot be parallelized, but if the loop is found to be parallelizable, speedup is near to linear.

    Finally, we also created a new language that uses the above library for run-time parallelization. Any loops that the programmer marked as candidates for run-time parallelization are analyzed for constructs that the library cannot yet handle. If the loop is clean, code is generated that uses the library's macros.

    The project is a contribution of the Chair of Computer Science 2 (Programming Systems) to the IZ ESI (Embedded Systems Initiative, http://www.esi-anwendungszentrum.de/)

  • Efficient Software Architectures for Distributed Event Processing Systems

    (Third Party Funds Single)

    Term: 15.11.2010 - 31.12.2015
    Funding source: Fraunhofer-Gesellschaft

    Localization Systems (also known as Realtime Location Systems, or RTLS) become more and more popular in industry sectors such as logistics, automation, and many more. These systems provide valuable information about whereabouts of objects at runtime. Therefore, processes can be traced, analyzed, and optimized. Besides the research activities at the core of localization systems (like resilience and interference-free location technologies or methods for highly accurate positioning), algorithms and techniques emerge that identify meaningful information for further processing steps. Our research focuses on automatic configuration methods for RTLSs as well as on the generation of dynamic motion models and techniques for event processing on position streams at runtime.

    In 2011, we investigated whether events can be predicted after analyzing and learning event streams from the localization system at runtime. As a result, we are able to deduce models that represent the information buried in the event stream to predict future events.

    We developed several methods and techniques in 2012 that process and detect events with low latency. Events (composite, complex) can be detected by means of a hierarchical aggregation of sub-events that themselves are detected by (several) event detectors processing sub-information in the event stream. This greatly reduces the complexity of the detection components and renders them fully maintainable. They even can use parallel or distributed cluster architectures more efficiently so that important events can be detected within a few milliseconds.

    In 2013 we further minimized detection latency in distributed event-based systems: first, a new migration technique modifies and optimizes the allocation of software components in a networked environment at runtime to minimize networking overhead and detection latencies. Second, a speculative event processing technique uses conservative buffering techniques to exploit available system resources. We also created and published a representative data set (consisting of realtime position data and event streams) and a corresponding task description.

    In 2014 we investigated fundamental approaches zu handle uncertainties (both w.r.t. the definition of event detectors and to the events). We implemented a promising prototype of an event-based system that is no longer deterministic but instead evaluates several possible system states in parallel to achieve a detection with a much higher robustness and correctness. The domain expert can parameterize the event detectors by attaching probabilities or probability functions to the generated events.

    In 2015 we improved, optimized and published our approach. Furthermore we started to investigate approaches to learn optimal parameter sets for the event detectors. Thus, a manual adjustment and tuning of parameters (like thresholds) becomes unnecessary.

    The project is a contribution of the Programming Systems Group to the IZ ESI http://www.esi.uni-erlangen.de/

  • Analysis of Code Repositories

    (Own Funds)

    Term: 01.01.2010 - 12.04.2024
    URL: https://www2.cs.fau.de/research/AnaCoRe/
    Software developers often modify their projects in a similar or repetitive way. The reasons for these changes include the adoption of a changed interface to a library, the correction of mistakes in functionally similar components, or the parallelization of sequential parts of a program. If developers have to perform the necessary changes on their own, the modifications can easily introduce errors, for example due to a missed change location. Therefore, an automatic technique is desireable that identifies similar changes and uses this knowledge to support developers with further modifications.

    Extraction of Code-Changes
    In 2017, we developed a new code recommendation tool called ARES (Accurate REcommendation System). It creates more accurate recommendation compared to previous tools as its algorithms take care of code movements during pattern and recommendation creation. The foundation of ARES lies in the comparison of two versions of the same program. It extracts the changes between the two versions and creates patterns based on the changed methods. ARES uses these patterns to suggest similar changes for the source code of different programs automatically.
    The extraction of code changes is based on trees. In 2016 we developed (and visibly published) a new tree-based algorithm (MTDIFF) that improves the accuracy of the change extraction.

    Symbolic Execution of Code-Fragments
    In 2014 we developed a new symbolic code execution engine called SYFEX to determine the behavioral similarity of two code fragments. In this way we aim to improve the quality of the recommendations. Depending on the number and the generality of the patterns in the database, it is possible that without the new engine SIFE generates some unfitting recommendations. To present only the fitting recommendations to the developers, we compare the summary of the semantics/behavior of the recommendation with summary of the semantics/behavior of the database pattern. If both differ too severely, our tool drops the recommendation from the results. The distinctive features of SYFEX are its applicability to isolated code fragments and its automatic configuration that does not require any human interaction.
    In 2015 SYFEX was refined and applied to code fragments from the repositories of different software projects. In 2016 we investigated to which extend SYFEX can be used to gauge the semantic similarity of submissions for a programming contest. In 2017 and 2018 we optimized the implementation of SYFEX. We also began collecting a data set of semantically similar methods from open source repositories. We published this data set in 2019.Techniques for symbolic execution use algorithms to check the satisfiability of logical/mathematical expressions in order to detect valid execution paths in a program. Usually, these algorithms account for a large part of the total runtime of a symbolic execution. To accelerate this satisfiability check, we experimented with a technique to replace complicated expressions with simpler equivalent expressions. These simpler expressions are obtained by using program synthesis. In the year 2020, we extended this program synthesis with a novel technique that can quickly detect whether a fixed set of operations can be used to construct an expression that is equivalent to the complicated expression. We published this approach in 2021 and were able to show that the technique can reduce the runtime of common program synthesizers by 33% on average. We subsequently extended this technique to other classes of program synthesis problems. In 2022, we performed a comprehensive evaluation of these extensions. This evaluation showed that these extensions similarly improve the runtime of program synthesizers on a larger class of program synthesis problems. We completed the work on unrealizability detectors for bit vector program synthesis in 2023 and described it in detail in a Dissertation.
    Detection of Semantically Similar Code Fragments
    SYFEX computes the semantic similarity of two code fragments. Therefore, it allows to identify pairs or groups of semantically similar code fragments (semantic clones). However, the high runtime of SYFEX (and similar tools) limit their applicability to larger software projects. In 2016, we started the development of a technique to accelerate the detection of semantically similar code fragments. The technique is based on so-called base comparators that compare two code fragments using a single criterion (e.g., the number of used control structures or the structure of the control flow graph) and that have a low runtime. These base comparators can be combined to form a hierarchy of comparators. To compute the semantic similarity of two code fragments as accurately as possible, we use genetic programming to search for hierarchies that approximate the similarity values as reported by SYFEX for a number of pairs of code fragments. A prototype implementation confirmed that the method is capable of detecting pairs of semantically similar code fragments.
    We further improved the implementation of this approach in 2017 and 2018. Additionally, we focused on evaluating the approach with pairs of methods from software repositories and from programming exercises. Moreover, we created a data set of semantically similar methods from open-source software repositories that we published in 2019.
    Techniques for symbolic execution rely on algorithms to detect the satisfiability of logic/mathematic expressions. These are used to detect whether an execution path in a program is feasible. The algorithms often use a large amount of the total computation time. To improve the speed of this satisfiability check, in the years 2019 and 2020 we experimented with a technique to replace complicated expressions with simpler expressions that have the same meaning. These simpler expressions result from the application of program synthesis. In 2020 we augmented the program synthesis with a novel approach to detect beforehand if some operations can form an expression with the same meaning as a more complicated expression.
    Semantic Code Search
    The functionality that has to be implemented during the development of a software product is often already available as part of program libraries. It is often advisable to reuse such an implementation instead of rewriting it, for example to reduce the effort for developing and testing the code.
    To reuse an implementation that fits the purpose, developers have to find it first. To this end developers already use code search engines on a regular basis. State-of-the-art search engines work on a syntactic level, i.e., the user specifies some keywords or names of variables and methods that should be searched for. However, current approaches do not consider the semantics of the code that the user seeks. As a consequence, relevant but syntactically different implementations often remain undetected ("false negatives") or the results include syntactically similar but semantically irrelevant implementations ("false positives"). The search for code fragments on a semantic level is the subject of current research.
    In 2017 we began the development of a new method for semantic code search. The user specifies the desired functionality in terms of input/output examples. A function synthesis algorithm from the literature is then used to create a method that implements the specified functionality as accurately as possible. Using our approach to detect similar code fragments, this synthesized method is then compared to the methods of program libraries to find semantically similar implementations. These implementations are then presented as search results to the user. A first evaluation of our prototypical implementation shows the feasibility and practicability of the approach.

    Clustering of Similar Code-Changes
    To create generalized change patterns, it is necessary that the set of extracted code changes is split into subsets of changes that are similar to each other. In 2015 this detection of similar code changes was improved and resulted in a new tool, called C3. We developed and evaluated different metrics for a pairwise similarity comparison of the extracted code changes. Subsequently, we evaluated different clustering algorithms known from the literature and implemented new heuristics to automatically choose the respective parameters to replace the previous naive approach for the detection of similar code changes. This clearly improved the results compared to the previous approach, i.e., C3's new techniques detect more groups of similar changes that can be processed by SIFE to generate recommendations.
    The aim of the second improvement is to automatically refine the resulting groups of similar code changes. For this purpose we evaluated several machine learning algorithms for outlier detection to remove those code changes that have been spuriously assigned to a group.
    In 2016 we implemented a new similarity metric for the comparison of two code changes that essentially considers the textual difference between the changes (as generated, for example, by the Unix tool 'diff'). We published both a paper on C3 and the dataset (consisting of groups of similar changes) that we generated for the evaluation of our tool under an open-source license, see https://github.com/FAU-Inf2/cthree . This dataset can be used as a reference or as input data for future research.  In addition, we prototypically extended C3 by techniques for an incremental similarity computation and clustering. This allows us to reuse results from previous runs and to only perform the absolutely necessary work whenever new code changes are added to a software archive.

  • Software Project Control Center

    (Third Party Funds Single)

    Term: 01.11.2009 - 31.12.2015
    Funding source: Bundesministerium für Wirtschaft und Technologie (BMWi)

    Prototypical implementation of a new tool for quality assurance during software development

    Modern software systems are growing increasingly complex with respect to functional, technical and organizational aspects. Thus, both the number of requirements per system and the degree of their interconnectivity constantly increase. Furthermore the technical parameters, e.g., for distribution and reliability are getting more complex and software is developed by teams that are not only spread around the globe but also suffer from increasing time pressure. Due to this, the functional, technical, and organizational control of software development projects is getting more difficult.

    The "Software Project Control Center" is a tool that helps the project leader, the software architect, the requirements engineer, or the head of development. Its purpose is to make all aspects of the development process transparent and thus to allow for better project control. To achieve transparence, the tool distills and gathers properties from all artifacts and correlations between them. It presents/visualizes this information in a way suitable for the individual needs of the users.

    The Software Project Control Center unifies the access to relations between artifacts (traceability) and to their properties (metrics) within software development projects. Thus, their efficiency can be significantly increased. The artifacts, their relations, and related metrics are gathered and integrated in a central data store. This data can be analyzed and visualized, metrics can be computed, and rules can be checked.

    For the Software Project Control Center project we cooperate with the QAware GmbH, Munich. The AIF ZIM program of the German Federal Ministry of Economics and Technology funded the first 30 months of the project.

    The Software Project Control Center is divided into two subsystems. The integration pipeline gathers traceability data and metrics from a variety of software engineering tools. The analysis core allows to analyze the integrated data in a holistic way. Each subsystem is developed in a separate subproject.

    The project partner QAware GmbH implemented the integration pipeline. The first step was to define TraceML, a modeling language for traceability information in conjunction with metrics. The language contains a meta-model and a model library. TraceML allows to define customized traceability models in an efficient way. The integration pipeline is realized using TraceML as lingua franca in all processing steps: From the extraction of traceability information to its transformation and integrated representation. We used the Eclipse Modeling Framework to define the TraceML models on each meta-model level. Furthermore, we used the Modeling Workflow Engine for model transformations and Eclipse CDO as our model repository. A set of wide-spread tools for software engineering are connected to the integration pipeline including Subversion, Eclipse, Jira, Enterprise Architect and Maven.

    The main contribution of our group to this project is the analysis core, i.e., the design and implementation of a domain-specific language for graph-based traceability analysis. Our Traceability Query Language (TracQL) significantly reduces the effort that is necessary to implement traceability analyses. This is crucial for both industry and the research community as lack of expressiveness and inefficient runtimes of other known approaches used to hinder the implementation of traceability analysis. TracQL eases not only the extraction, but also the analysis of traceability data using graph traversals that are denoted in a concise functional programming style. The language itself is built on top of Scala, a multi-paradigm programming language, and was successfully applied to several real-world industrial projects.

    In 2014, we improved the modularity of the language to make it both more adaptable and extendable in terms of structure and operations. This not only increases its expressiveness but also improves the reusability of existing traceability analyses.

    In 2015, we evaluated and documented our approach in order to emphasize its core attributes and to show its effectiveness. The three core attributes are:
    - Representation independence: TracQL is adaptable to various data sources at which their data types are available statically typed.
    - Modularity: the approach is both modifiable and extendable in terms of structure and operations.
    - Applicability: the language has a better expressiveness and performance than other approaches.

  • OpenMP/Java

    (Third Party Funds Single)

    Term: 01.10.2009 - 01.10.2015
    Funding source: Industrie
    JaMP is an implementation of the well-known OpenMP standard adapted for Java. JaMP allows one to program, for example, a parallel for loop or a barrier without resorting to low-level thread programming. For example:
    class Test {
    ...void foo() {
    ......//#omp parallel for
    ......for (int i=0;i .........a[i] = b[i] + c[i]
    } is valid JaMP code. JaMP currently supports all of OpenMP 2.0 with partial support for 3.0 features, e.g., the collapse clause. JaMP generates pure Java 1.5 code that runs on every JVM. It also translates parallel for loops to CUDA-enabled graphics cards for extra speed gains. If a particular loop is not CUDA-able, it is translated to a threaded version that uses the cores of a typical multi-core machine. JaMP also supports the use of multiple machines and compute accelerators to solve a single problem. This is achieved by means of two abstraction layers. The lower layer provides abstract compute devices that wrap around the actual CUDA GPUs, OpenCL GPUs, or multicore CPUs, wherever they might be in a cluster. The upper layer provides partitioned and replicated arrays. A partitioned array automatically partitions itself over the abstract compute devices and takes the individual accelerator speeds into account to achieve an equitable distribution. The JaMP compiler applies code-analysis to decide which type of abstract array to use for a specific Java array in the user’s program.
    In 2010, the JaMP environment was extended to support the use of multiple machines and compute accelerators to solve a single problem. We have developed two new abstraction layers. The lower layer provides abstract compute devices which wrap around the actual CUDA GPUs, OpenCL GPUs, or multicore CPUs, wherever they might be in a cluster. The upper provides partitioned and replicated arrays. A partitioned array automatically partitions itself over the abstract compute devices and takes the individual accelerator speeds into account to achieve an equitable distribution. The JaMP compiler applies code-analysis to decide which type of abstract array to use for a specific Java array in the user’s program.
    In 2012, we extended the JaMP framework to also handle Java objects on multiple ma- chines and accelerators (and not just arrays of primitive types). We added two different ways to handle objects. Standard shared objects are replicated on all compute devices. Arrays of objects are now also replicated or partitioned over the different devices. To increase the performance of the program, the framework has to break with Java’s semantics. Java’s object structure is mapped to a flat memory structure for the execution on the different devices.
    In 2013, we examined how to better support Java objects in OpenMP parallel code, regardless of where the code is executed. We found that we needed to restrict the language slightly by forbidding inheritance of objects used in a parallel block. This ensures that the objects will not be of a different type than what is seen at compile time. We use this property to, for example, allow object inlining into arrays to occur naturally. With the added inlining, communication of objects and arrays over the network and to the compute devices was accelerated enormously, including a small performance increase on the devices themselves.
    In 2014 we developed a JaMP implementation for Android 4.0. Currently this version only supports the SIMD construct of OpenMP.
    In 2015 we added OpenMP tasks (OpenMP 3.0) to JaMP. This makes it possible to parallelize recursive algorithms with JaMP.
  • Parallelization techniques for embedded systems in automation

    (Own Funds)

    Term: 01.06.2009 - 31.12.2015

    This project was launched in 2009 to address the refactorization and parallelization of applications used in the field of industry automation. These programs are executed on specially designed embedded systems. This hardware forms an industry standard and is used worldwide. As multicore-architectures are increasingly used in embedded systems, existing sequential software must be parallelized for these new architectures in order to improve performance. As these programs are typically used in the industrial domain for the control of processes and factory automation they have a long life cycle. Because of this, the programs often are not being maintained by their original developers any more. Besides that, a lot of effort was spent to guarantee that the programs work reliably. For these reasons the software is only extended in a very reluctant way.

    Therefore, a migration of these legacy applications to new hardware and a parallelization cannot be done manually, as it is too error prone. Thus, we need tools that perform these tasks automatically or aid the developer with the migration and parallelization.

    Research on parallelization techniques

    We developed a special compiler for the parallelization of existing automation programs. First, we examined automation applications with respect to automatic parallelizability. We found that it is hard to perform an efficient automatic parallelization with existing techniques. Therefore this part of the project focuses on two steps to handle this situation. As first step, we developed a data dependence analysis that identifies potential critical sections in a parallel program, presents them to the programmer and adds their protection to the code. We ware able to show that our approach to identify critical sections finds atomic blocks that closely match the atomic blocks that an expert would add to the code. Besides that, we showed in 2014 that the impacts on execution times are negligible if our technique finds atomic blocks that are larger than necessary or that are not necessary at all.

    As second step we have refined and enhanced existing techniques (software transactional memory (STM) and lock inference) to implement atomic blocks. In our approach, an atomic block uses STM as long as lock inference would lead to coarse-grained synchronization. The atomic block switches from STM to lock inference as soon as a fine-grained synchronization is possible. With this technique, an atomic block always uses fine-grained synchronization while the runtime overhead of STM is minimized at the same time. We showed that (compared to a pure STM or lock inference implementation) our technique speeds up execution times by a factor between 1.1 and 6.3. Although fine-grained synchronization in general leads to better performance than a coarse-grained solution, there are cases where a coarse-grained implementation shows equal performance. We therefore presented a runtime mechanism for an STM that also works together with our combined technique. This runtime mechanism starts with a small number of locks, i.e., a coarse-grained locking, where accesses to different shared variables are protected by the same lock. If this coarse-grained locking leads to too many non-conflicting accesses waiting for the same lock, our approach gradually increases the number of locks. This makes the locking more fine-grained so that non-conflicting accesses can be executed concurrently. Our runtime mechanism that dynamically tunes the locking-granularity makes the programs run up to 3.0 times faster than a fixed coarsegrained synchronization.

    We completed this project part in 2014.

    Research on migration techniques

    Our research on the migration of legacy applications originally consisted of having a tool that automatically replaces suboptimal code constructs with better code. The code sequences that had to be replaced as well as the replacement codes were specified by developers by means of a newly developed pattern description language. However, we found this approach to be too difficult for novice developers.

    This led us to the development of a new tool that automatically learns and generalizes patterns from source code archives, recognizes them in other projects, and presents recommendations to developers. The foundation of our tool lies in the comparison of two versions of the same program. It extracts the changes that were made between two source codes, derives generalized patterns of suboptimal and better code from these changes, and saves the patterns in a database. Our tool then uses these patterns to suggest similar changes for the source code of different programs.

    In 2014 we developed a new symbolic code execution engine to minimize the number of wrong recommendations. Depending on the number and the generality of the patterns in the database, it is possible that without the new engine our tool generates some unfitting recommendations. To discard the unfitting ones, we compare the summary of the semantics/behavior of the recommendation with summary of the semantics/behavior of the database pattern. If both differ too severely, our tool drops the recommendation from the results. The distinctive features of our approach are its applicability to isolated code fragments and its automatic configuration that does not require any human interaction.

    The latest results of our tool SIFE are found online (last update: 2014-05-09).

    Parts of the project are funded by the "ESI-Anwendungszentrum" [http://www.esi-anwendungszentrum.de/]

  • Integrated Tool Chain for Meta-model-based Process Modeling and Execution

    (Third Party Funds Single)

    Term: 01.10.2008 - 31.12.2012
    Funding source: Bundesministerium für Wirtschaft und Technologie (BMWi)
    As demands on the development of complex software systems are continuously increasing, compliance with well-defined software development processes becomes even more important. Especially large and globally distributed software development projects tend to require long-running and dynamically changeable processes spanning multiple organizations. In order to describe and support such processes, there is a strong need for suitable process modeling languages and for powerful support by tools. The results of a preceding cooperation project show that today's tools markets lack integrated tool chains which actually support the fine-grained and precise modeling of software development processes as well as their computer-aided execution, controlling and monitoring. A cooperation project has bridged this gap. This cooperation project was carried out together with develop group as an industrial partner and was funded by BMWi. It started in October 2008 and has been scheduled for three researchers. The project was finished in September 2011. The objective of this cooperation project was to prototype an integrated tool chain by using a rigorous, meta-model based approach that supports modeling, enactment, and execution of industrial software development processes. Bearing the applicability of such a tool in mind, this approach was mainly intended to provide a wide adaptability of process models to different industrial development scenarios, to define a user-friendly concept of process description and to establish an extensive computer-aided process execution support, contributing to the efficiency of development activities. These benefits were achieved by a high grade of formalism, by an integrated, generic concept of process modeling and process enactment and by using commonly accepted industrial standards (UML, SPEM). The integrated tool chain developed in this project is based on an extension of the SPEM standard (eSPEM - enactable SPEM). eSPEM adds a behaviour modeling concept by reusing UML activity and state machine diagrams. In addition, eSPEM provides behaviour modeling concepts that are specific to software development processes, for example, dynamic task creation and scheduling. In 2012, an overview of the tool chain and eSPEM has been presented at the "First Workshop on Academics Modeling with Eclipse" which was held in conjunction with the "8th European Conference on Modeling Foundations and Applications". In addition, practical experiences from modeling SDPs in industrial projects have shown a rising importance of standards and reference models which are subsequently summarized under the term quality standard. These quality standards are used to specify requirements for target-oriented and effective execution of software development projects. These requirements are thereby defined to address different goals related to e.g. quality and efficiency (Automotive SPICE, CMMI) or safety (ISO 26262 Road Vehicles - Functional Safety) aspects of SWDPMs (Software Development Process Models). In other words, these requirements - often described in terms of best practices - are imposed on the software process definition that is typically described by SWDPMs. Tracing these requirements to the process definition is a precondition for supporting efficient assessment activities and process improvement projects. An additional goal of this research project lies therefore in the integration of these quality standards with SWDPMs with a special focus on environments that requires conformance to more than one quality standard (e.g. CMMI, Automotive SPICE and ISO 26262).
  • Wireless Localization

    (Third Party Funds Single)

    Term: 01.05.2008 - 14.11.2013
    Funding source: Fraunhofer-Gesellschaft
    Localization Systems (also known as Realtime Location Systems, or RTLS) become more and more popular in industry sectors such as logistics, automation and many more. These systems provide valueable information about whereabouts from objects at runtime. Therefore, processes can be traced, analyzed and optimized. Besides the research activities at the core of localization systems (like resilience and interference-free location technologies or methods for highly accurate positioning), there emerge algorithms and techniques to identify meaningful information for further processing steps. In this context, the aim of the wireless localization project is the research on automatic configuration methods for RTLSs as well as the generation of dynamic moving models and techniques for event processing on position streams at runtime. In 2009, we continued the development of our algorithms to estimate the receiving antenna's position (pose) of location systems. The algorithms estimate measuring points which allow a fast and accurate measurement pose. We applied an robot to execute the measurement automatically. The developed algorithm considers obstacles and the receiving characteristics of the location system and can sort out errors contained in the measurement data (e.g. multipath measurements). In 2010, models have been developed that can be used as dynamic moving models. Learning methods are applied to adapt the models at run-time. A formal language called TBL (Trajectory Behavior Language) was developed for describing trajectories. Additional algorithms can shrink the size of that description and hence compress the data size required to store trajectories. We are evaluating methods for learning the moving models online. These are evaluated in a study with respect to motion prediction. Moreover, it is being investigated whether events can be predicted by analyzing and learning event streams from the localisation system at runtime.
  • Model Driven Component Composition

    (Third Party Funds Single)

    Term: 15.06.2007 - 31.12.2011
    Funding source: Industrie
    This project is part of the INI.FAU collaboration between AUDI AG and the University of Erlangen-Nuremberg. It examines model-driven ways to integrate vehicle functions on electronic control units (ECUs). Moreover, the project develops supporting methods and tools for this task. The insights gained in the course of this project will be practically validated by integrating a damper control system into an AUTOSAR ECU. In the automotive industry it is common practice to develop in-car-software on a high level of abstraction and in a model-based way. To eliminate uncertainties concerning resource consumption and runtime it is necessary to test the developed software on the target hardware as early as possible. But due to cost and safety requirements the integration of the software into an ECU is very time-consuming and demands special expertise going beyond that of the function developer. AUTOSAR (AUTomotive Open Systen ARchitecture) is on the way to become a standard for the basic software on ECUs. But due to the novelty of this standard there are neither processes nor tools to support the integration of the developed in-car-software into an ECU. In 2008, we have examined the modeling expressiveness of AUTOSAR with respect to both its applicability and possible conflicts with existing standards and technologies that are currently in use at Audi. Furthermore, the automatic generation of an AUTOSAR software architecture from a single damper control component has been implemented. Since 2009, a model-driven approach that supports the integration of software into an ECU is being implemented and integrated into the tool set used at Audi. In particular we are looking at the automatic configuration of the bus communication by means of a bus database and the automatic task scheduling among the application processes. The model-driven development, which in this case is based on the Eclipse Modeling Framework, will enable easier tayloring of the emerging prototype to changing requirements. In 2010, we exploited the information that is available in an AUTOSAR project to automatically configure local and remote communication between software components. We have also developed a genetic algorithm that uses dependency information to automatically generate task schedulings that minimize communcation latencies between cooperating software components. The existing prototype has been extended with the above-mentioned methods.
  • JavaParty

    (Own Funds)

    Term: 01.04.2007 - 31.12.2010

    JavaParty [http://svn.ipd.uni-karlsruhe.de/trac/javaparty/wiki/JavaParty] allows easy porting of multi-threaded Java programs to distributed environments such as e.g. clusters. Regular Java already supports parallel applications with threads and synchronization mechanisms. While multi-threaded Java programs are limited to a single address space, JavaParty extends the capabilities of Java to distributed computing environments.
    The normal way of porting a parallel application to a distributed environment is the use of a communication library. Java's remote method invocation (RMI) renders the implementation of communication protocols unnecessary, but still leads to increased program complexity. The reasons for increased complexity are the limited RMI capabilities and the additional functionality that has to be implemented for creation and access of remote objects.
    The JavaParty approach is different. JavaParty classes can be declared as remote. While regular Java classes are limited to a single virtual machine, remote classes and their instances are visible and accessible anywhere in the distributed JavaParty environment. As far as remote classes are concerned, the JavaParty environment can be viewed as a Java virtual machine that is distributed over several computers. Access and creation of remote classes are syntactically indistinguishable from regular Java classes.

    In 2007/08, a complete new version of the JavaParty compiler was implemented. This version supports the current Java Standard 1.5/1.6. The implementation is based on the open and freely available Eclipse compiler framework. Thus, future developments of the Java language and corresponding extensions for the Eclipse compiler will automatically become available for JavaParty as well.

    In 2009, the JavaParty compiler was extended by a semantics for inner classes.

    We have reached the following goals in 2010:
    - Most of the previously self-implemented structures of the run-time system were replaced with more efficient standard Java implementations, because of matters of stability.
    - Due to compatibility and security reasons, the communication layer (KaRMI) was reimplemented based on Java's current socket technology.
    - A new "near" context was added to remote classes. With this new context that provides host local memory for each instance, locality-aware algorithms can be expressed.

  • ParSeMiS - the Parallel and Sequential Graph Mining Suite

    (Own Funds)

    Term: 01.05.2006 - 31.12.2010

    The ParSeMiS project (Parallel and Sequential Graph Mining Suite) searches for frequent, interesting substructures in graph databases. This task is becoming increasingly popular because science and commerce need to detect, store, and process complex relations in huge graph structures.
    For huge data that cannot be worked on manually, algorithms are needed that detect interesting correlations. Since in general the problem is NP-hard and requires huge amounts of computation time and memory, parallel or specialized algorithms and heuristics are required that can perform the search within time boundaries and memory limits.
    Our target is to provide an efficient and flexible tool for searching in arbitrary graph data, to improve the adaption to new application areas, and to simplify and unify the design of new mining algorithms.

    Aufbauend auf den Ergebnissen des Projekts ParMol2 wurden 2006/2007 folgende Ziele erreicht:

    • Restrukturierung und Neudesign der gewachsenen ParMol-Strukturen zu einer flexiblen Graphbibiliothek.
    • Ergänzung des objekt-orientierten Graphdesigns zu kompakteren, zur Parallelisierung besser geeigneten Datenstrukturen.
    • Überführung und Zerlegung der Algorithmen gSpan und Gaston in die neuen Strukturen und Einbau von Erweiterungen für das aktuelle Anwendungsgebiet ”Prozedurale Abstraktion”.
    • Entwurf und Implementierung eines neuen Algorithmus zur Suche in gerichteten azyklischen Graphen (DAGs) als Spezialisierung für die Prozedurale Abstraktion.
    • Implementierung einer angepassten grafischen Anzeige für DAGs.

    In 2008, the following goals have been achieved:

    • Documentation and publication of the source code to enlarge the user base of the project,
    • Implementation of a specialized graph layout for DAGs,
    • Restructuring of the graphical user interface, and
    • Added support for clusters of multi-core nodes.

    In 2009, the following goals have been attacked:

    • Optimizations for embedding-based frequency mining: A more detailed look at the pruning-related properties of the maximum-clique-test resulted in a huge run-time improvement, particularly for low frequencies that are of special interest for applications. This is achieved by an early detection during the NP-complete test that decides for a fragment whether can become frequent at all.
    • Improved distribution for clusters of multi-core nodes: Co-location of threads in the same memory speeds up parallel search. First results have been seen in 2009, more are expected in 2010.

    In 2010, the distributed stack implementations have also been tested on other algorithms and data structures.

  • Tapir

    (Own Funds)

    Term: 01.01.2006 - 31.12.2010

    Tapir is a new programming language to ease systems programming. Systems programming includes networking protocols, operating systems, middlewares, DSM systems, etc. Such systems are critical for the functioning of a system as they supply services that are required by user applications. For example, an operating system supplies an operating environment and abstracts from concrete hardware in doing so. A DSM system simulates a single shared memory machine by abstracting from the single machines inside a cluster so that a (user-level application on a) cluster can be programmed without having to program explicit message passing.

    Compared to application programming, systems programming has a different set of requirements. The programming 'style' is also very different from the styles used in programming user-level applications. Finally, the performance requirements are usually very strict in systems programs as the complete system's performance greatly relies on the underlying layers of systems programs. Bugs in systems code have also great repercussions on a complete system's reliability. Combined, we can directly conclude the following when using high-level languages for systems programming:
    - High-level languages, such as C++, C#, and Java 'hide' implementation details from the programmer. A programmer for example no longer needs to know how exactly a method call is implemented. This knowledge is, however, required when doing systems programming.
    - High-level languages supply functionality that is neither required nor wanted. For example, when programming an operating system, automatic language driven exception handling or garbage collection are not wanted.
    - Systems programs require no high abstraction levels like common high-level programming languages supply. Likewise, the libraries that a given language offers can simply not be supplied in a systems context. Usually this is due to the system itself supplying the functionality that the library is to provide.

    The basics of the Tapir language have been created. While Tapir has some similarities to Java, C#, and C++, all unneeded and unwanted functionality of the above have been removed. For example, Tapir has no automatic memory management, no exception handling, and no type-casts. Class and object concepts have been kept, but inheritance has been removed. The resulting Tapir programs can be verified by means of model-checking, even while the programming is still being developed. Verification is made easy as code pieces that are implementation details can be marked as such so that they can be safely ignored by the verifier. Tapir code can be executed in parallel for example also on a graphics card without the possibility of the common programming errors associated with parallelism can occur.

    Even while the language is still being developed, a prototype DSM protocol has already been implemented in the Tapir language. We have evaluated RDMA-based DSM protocols so that they can be added to the Tapir language. Tapir's semantic checks are implemented by means of model-checking. Model-checking, however, is a very memory intensive analysis. This made us write our own Java Virtual machine, called LVM, which is especially suited for managing large numbers of objects. LVM outperforms standard Java VMs as soon as swapping becomes necessary.

    2006/2007 wurde an den grundlegenden Spracheigenschaften von Tapir gearbeitet. Obwohl Tapir an existierende Hochsprachen wie C++ und Java angelehnt ist, wurden alle unnötigen Eigenschaften und Funktionen entfernt. Beispielsweise fehlen Tapir Speicherbereinigung, Ausnahmebehandlung und Typwandlungen; Klassen und Objekte können zwar definiert werden, jedoch ist keine Vererbungsbeziehung zwischen Klassen erlaubt. Das mit Tapir spezifizierte Systemprogramm kann mit Model Checking-Techniken bereits während der Entwicklung auf Fehler überprüft werden. Ein prototypischer Übersetzer und ein Verifikationswerkzeug wurden 2006/2007 implementiert.

    In 2008, LVM was optimized both for sequential execution and for distributed execution on a cluster of workstations. This allows for faster verification of Tapir programs on clusters and for faster running of scientific Java programs.

    In 2009, the Tapir language was itself improved to allow both easier automatic program verification and to allow more efficient code to be generated. For example, the language has become easier to verify because code pieces that are implementation details can be marked as such and be safely ignored by the verifier. The efficiency of the language has been improved such that selected parts of programs can now be executed in parallel for example also on a graphics card without the possibility of the common programming errors associated with parallelism can occur.

  • Cluster and Grid computing made easy

    (Own Funds)

    Term: 01.01.2006 - 31.12.2010

    Jackal is a project to create a distributed-shared-memory system for Java. This means that you can write a multi-threaded program (that you could run using normal JVMs on single machines as well) and deploy it on a cluster connected by a network. Jackal also sports a nice check-pointer so it can periodically write the program state to disk for fault tolerance.
    To make things more interesting, you can write the program also using OpenMP annotations which allows re-parallelization. Combined with checkpointing this allows a program to be restarted on a different number of machines as that it was started with.
    The Jackal-DSM is not only suited for traditional clusters where each node containes only a single CPU and core, but also for newer style clusters where each node in the cluster contains a multi-core CPU. Additionally, Jackal also has extensions and tools for Grid computing.

  • International Collegiate Programming Contest at the FAU

    (Own Funds)

    Since 1977 the International Collegiate Programming Contest (ICPC) takes place every year. Teams of three students try to solve about 13 programming problems within five hours. What makes this task even harder, is that there is only one computer available per team. The problems demand for solid knowledge of algorithms from all areas of computer science and mathematics, e.g., graphs, combinatorics, strings, algebra, and geometry. To solve the problems, the teams need to find a correct and efficient algorithm and implement it.The ICPC consists of three rounds. First, each participating university hosts a local contest to find the up to three teams that are afterwards competing in one of the various regional contests. Germany lies in the catchment area of the Northwestern European Regional Contest (NWERC) with competing teams from Great Britain, Benelux, Scandinavia, etc. The winners of all regionals in the world (and some second place holders) advance to the world finals in spring of the following year (2023 in Sharm El Sheikh, Egypt).
    On January 28, 2023, the Winter Contest took place once again. 75 teams from 16 universities participated, including 13 teams from Erlangen. Our best team finished 10th. On June 17, the German Collegiate Programming Contest was held at several German universities, with 14 teams from Erlangen. The best FAU team secured the 11th position out of 105 participating teams from all over Germany. The NWERC took place on November 26 in Delft. FAU was represented by 3 teams, which finished on the 32nd, 96th, and 125th positions among 143 participating teams. As usual, we also conducted the main seminar "Hello World! - Advanced Programming" in 2023.





































Current Courses

Optimierungen in Übersetzern

Title Optimierungen in Übersetzern
Short text inf2-ue2
Module frequency nur im Sommersemester
Semester hours per week 2

Voraussetzung zur Teilnahme an der Prüfung ist die erfolgreiche Bearbeitung der Übungsaufgaben.

In der Vorlesung werden ausgewählte Kapitel aus dem Übersetzerbau besprochen.

Schwerpunktmäßig werden Optimierungstechniken für die Übersetzung imperativer Programmiersprachen diskutiert, insbesondere solche, die für Hochleistungsrechner und Parallelrechner von Bedeutung sind. Begleitend dazu werden einige oft verwendere Techniken und Repräsetationsformen vorgestellt, die erforderlich sind, um die zur Optimierung benötigten Informationen geeignet zu berechnen bzw. zu verwalten.

Die folgenden Stichworte geben einen Überblick über die in der Vorlesung angesprochenen Einzelthemen:
- Abhängigkeitsanalyse, Abhängigkeitsgraph, Array-Index-Analyse, SSA Graph, Steuerungsflußgraph, Dominatoren,
- datenflußbasierte Schleifentransformationen: Strength Reduction, Elimination von Induktionsvariablen, Verschiebung von schleifeninvariantem Code, Schleifenentzweigung,
- Schleifenumordnungen: Schleifenvertauschung, Wellenparallelisierung, Schleifenumkehr, Strip Mining, Kachelbildung, Schleifenaufspaltung, Schleifenvereinigung,
- Schleifenrestrukturierung: Ausrollen, Schleifenzusammenfassung, Schleifenersetzung: Reduktion, Schleifenmustererkennung,
- Speicherzugriffstransformationen: Array-Padding, Speicherbank-Konflikte, Skalarexpansion und Array-Kontraktion,
- Partielle Auswertung: Konstantenpropagierung, Konstantenfaltung, Algebraische Vereinfachungen, Strength Reduction,
- Redundanzentfernung: unerreichter Code, unnötiger Code, tote Variablen, gemeinsame Teilausdrücke,
- Prozeduraufruftransformationen: Blattprozeduren, Inlining, Prozedurduplizierung, Prozedureinbettung, Rekursionselimination, Funktionsvorauswertung,
- Optimierungen für Parallelrechner: Datenaufspaltung, Skalarreplikation, Arrayreplikation, Daten- und Aktivitätsausrichtung, Guards, Botschaftenkombination, Latenzzeitverbergung, Prefetch und Poststore, Synchronpunktelimination,
- Pointer- und Aliasanalyse

Die Materialien zur Lehrveranstaltung werden über StudOn bereitgestellt.

1. Parallelgruppe

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
wöchentlich Thu, 16:15 - 17:45 18.04.2024 - 18.07.2024 30.05.2024
  • Tobias Heineken
  • Florian Mayer

"Hallo Welt!" für Fortgeschrittene

Title "Hallo Welt!" für Fortgeschrittene
Short text HW
Module frequency nur im Sommersemester
Semester hours per week 3


Programmierwettbewerbe wie der International Collegiate Programming Contest (ICPC) der ACM bieten die Möglichkeit, die eigenen Programmier- und Teamfähigkeiten an einer Vielzahl algorithmischer Probleme aus ganz verschiedenen Gebieten wie Geometrie, Kombinatorik, String-Verarbeitung und Zahlentheorie zu testen. Dabei treten die Studenten in 3er-Teams an, haben aber nur einen Computer zur Verfügung. Oft ist die Teamstrategie entscheidend für den Erfolg der Gruppe.

In diesem Seminar werden wichtige Algorithmen zur Lösung von Problemen aus den verschiedenen Gebieten in wöchentlichen, studentischen Vorträgen vorgestellt und Standardverfahren eingeübt. Neben den Vorträgen werden zum Thema passende Aufgaben besprochen und diskutiert. Zusätzlich müssen eine gewisse Anzahl an Aufgaben in Einzelarbeit gelöst werden.

Das Seminar bereitet auf die Teilnahme am Programmierwettbewerb der Universität Erlangen-Nürnberg Ende des Sommersemesters vor. Es besteht Teilnahmepflicht für diesen Wettbewerb.

Die Materialien zur Lehrveranstaltung werden über StudOn bereitgestellt.

Empfohlene Literatur:

  • Skiena/Revilla, Programming Challenges. The Programming Contest Training Manual. Springer 2003.
  • Cormen/Leiserson/Rivest/Stein, Introduction to Algorithms. MIT Press 2001.

1. Parallelgruppe

Maximum number of participants: 18

Link to Campo

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
wöchentlich Mon, 14:00 - 16:00 15.04.2024 - 15.07.2024 20.05.2024 11302.04.150
Einzeltermin Fri, 14:00 - 16:00 19.04.2024 - 19.04.2024 11302.04.150

Begleitseminar zu Bachelor- und Masterarbeiten

Title Begleitseminar zu Bachelor- und Masterarbeiten
Short text inf2-bs-bama
Module frequency in jedem Semester
Semester hours per week 3

1. Parallelgruppe

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
wöchentlich Mon, 12:15 - 13:45 15.04.2024 - 15.07.2024 20.05.2024
  • Prof. Dr. Michael Philippsen

Intensivübungen zu Parallele und Funktionale Programmierung

Title Intensivübungen zu Parallele und Funktionale Programmierung
Short text PFP-IÜ
Module frequency nur im Sommersemester
Semester hours per week 2

1. Parallelgruppe

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
nach Vereinbarung - -
  • Julian Brandner

Übungen zu Optimierungen in Übersetzern

Title Übungen zu Optimierungen in Übersetzern
Short text inf2-ueb-uebersetzer
Module frequency nur im Sommersemester
Semester hours per week 2

Zeit und Ort für die Übungen werden in der ersten Vorlesungsstunde vereinbart.

In der Übung werden die in der Vorlesung vorgestellten Konzepte und Algorithmen zur Optimierung von Programmen durch einen Übersetzer wiederholt und vertieft.

Im Rahmen der Projektübungen erweitern die Übungsteilnehmer den in Übersetzerbau 1 implementierten Übersetzer um eine Auswahl der vorgestellten Algorithmen.

Die Materialien zur Lehrveranstaltung werden über StudOn bereitgestellt.

1. Parallelgruppe

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
wöchentlich Fri, 10:15 - 11:45 19.04.2024 - 19.07.2024 31.05.2024 11302.02.133

In der Übung werden die in der Vorlesung vorgestellten Konzepte und Algorithmen zur Optimierung von Programmen durch einen Übersetzer wiederholt und vertieft.

Im Rahmen der Projektübungen erweitern die Übungsteilnehmer den in Übersetzerbau 1 implementierten Übersetzer um eine Auswahl der vorgestellten Algorithmen.

Die Materialien zur Lehrveranstaltung werden über StudOn bereitgestellt.

2. Parallelgruppe

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
wöchentlich Mon, 10:15 - 11:45 15.04.2024 - 15.07.2024 20.05.2024
  • Tobias Heineken

In der Übung werden die in der Vorlesung vorgestellten Konzepte und Algorithmen zur Optimierung von Programmen durch einen Übersetzer wiederholt und vertieft.

Im Rahmen der Projektübungen erweitern die Übungsteilnehmer den in Übersetzerbau 1 implementierten Übersetzer um eine Auswahl der vorgestellten Algorithmen.

Die Materialien zur Lehrveranstaltung werden über StudOn bereitgestellt.

3. Parallelgruppe

Date and Time Start date - End date Cancellation date Lecturer(s) Comment Room
wöchentlich Mon, 14:15 - 15:45 15.04.2024 - 15.07.2024 20.05.2024
  • Florian Mayer


  • (Secondary Application: WO2013091908)
    Inventor(s): , ,
  • (Secondary Application: WO2013091907)
    Inventor(s): , ,
  • (Priority Patent Application: DE102011089180)
    Inventor(s): , ,
  • (Priority Patent Application: DE102011089181)
    Inventor(s): , ,
  • (Priority Patent Application: EP2469496 (EP10196851))
    Inventor(s): ,

Committee work

Programme-/Steering Committees

Working groups, commissions, committees


  • Mitglied Berufungsausschuss W3 Data- und Software-Engineering (Nachfolge Leis), 12/2022-.
  • Vertrauensdozent der GI in Erlangen, seit 04/2004.
  • Lehrbelastungskommission, seit 05/2002.


  • Mitglied Berufungskommission W3 Informatik (Systemsoftware), Friedrich-Schiller Universität FSU Jena, 01/2020-06/2022
  • Vorsitzender Berufungsausschuss W2 Didaktik der Informatik (Nachfolge Romeike), 11/2018-11/2019.
  • Kommissarische Leitung der Professur für Didaktuk der Infrmatik, 10/2018-11/2019.
  • Mitglied der Studienkommission Informatik, 11/2018-11/2019.
  • Mitglied des Vorstands des Zentrums für Lehrerbildung, 10/2018-11/2019.
  • Mitglied Berufungsausschuss W3 Experimentelle Astroteilchenphysik (Nachfolge Anton), 12/2017-05/2019.
  • Mitglied Berufungsausschuss W3 Visual Computing (Nachfolge Greiner), 06/2016-12/2017.
  • Mitglied der Raum- und Baukommission der Technischen Fakultät, 10/2013-04/2015.
  • Stellvertretender Sprecher der Kollegialen Leitung des Department Informatik, 10/2011-09/2013.
  • Kommissarische Leitung der Professur für Didaktik der Informatik, 08/2012-09/2013.
  • Vorsitzender Berufungsverfahren W2-Professur Didaktik der Informatik (Nachfolge Brinda), 07/2012-02/2013.
  • Mitglied der Prüfungsausschusses MA Internationale Wirtschaftsinformatik, 12/2008-11/2019.
  • Mitglied der Berufungskommission W1-Professur für Digitalen Sport, 12/2009-02/2011.
  • Mitglied der Berufungskommission W3-Professur für IT-Sicherheitsinfrastrukturen, 10/2009-12/2010.
  • Schriftführer des Berufungsausschusses W2-Professur Open Source Software, 07/2008-09/2009.
  • Externes Mitglied der Berufungskommission W3-Professur für Softwaresysteme an der Universität Passau, 06/2008-10/2008.
  • Mitglied des Senats und des Hochschulrats der Friedrich-Alexander-Universität, 10/2007-09/2009.
  • Mitglied des Fachbereichsrats der Technischen Fakultät, 10/2004-09/2009.
  • Mitglied der Kommission zur Verteilung und Verwendung der Studienbeiträge der Informatik, 11/2006-09/2009 (für den Studiengang Informatik: 11/2006-09/2007, für den Studiengang IuK 10/2007-09/2009, 05/2010-09/2010)
  • Mitglied der Berufungskommission W2-Professur für technisch-wissenschaftl. Höchstleistungsrechnen, 05/2006-12/2007.
  • IT-Generalist der DFG-Expertenkommission zur Begleitung des Projekts Online-Wahl der Fachkollegien 2007, 04/2006-06/2008.
  • Mitglied der Berufungskommission W2-Professur für Informatik (Datenbanksysteme, Nachfolge Jablonski), 01/2006-09/2007.
  • Kommissarische Leitung des Lehrstuhls Informatik 3, Rechnerarchitektur, 10/2005-02/2009.
  • Exportbeauftragter des Instituts für Informatik, 10/2005-11/2007.
  • Arbeitsgruppe Bachelor/Master für den Studiengang Informatik, 05/2005-08/2007.
  • Arbeitsgruppe Bachelor/Master für den Studiengang Informations- und Kommunikationstechnik, 05/2005-01/2006.
  • Geschäftsführender Vorstand des Instituts für Informatik, 10/2004-09/2005.
  • Mitglied der Strukturkommission der Technischen Fakultät, 10/2004-09/2005.
  • Mitglied des Consilium Techfak, 10/2004-09/2005.
  • Vorstand des Interdisziplinären Zentrums für funktionale Genomik, FUGE, 09/2004-06/2009.
  • Arbeitsgruppe Bibliotheksmodernisierung, 04/2004-12/2012.
  • Mitglied der Berufungskommission W3-Professur für Informatik (Rechnerarchitektur, Nachfolge Dal Cin), 11/2003-02/2009.
  • Mitglied der Studienkommission Informations- und Kommunikationstechnik, 10/2003-09/2005.
  • Mitglied der Studienkommission Wirtschaftsinformatik, seit 04/2002.
  • Mitglied der Studienkommission Informatik, 04/2002-09/2011.
  • Senatsberichterstatter Berufungsverfahren C3-Professur Organische Chemie (Nachf. Saalfrank), 08/2004-02/2005.
  • Schriftführer Berufungsverfahren C3-Professur Didaktik der Informatik, 04/2004-03/2005.
  • Mitglied der Berufungskommission C3-Professur für Informatik (Nachfolge Müller), 11/2003-07/2004.
  • Mitglied der Berufungskommission C3-Professur für Numerische Simulation mit Höchstleistungsrechnern, 07/2002-01/2003.
  • Mitglied der Berufungskommission C4-Professur für Informatik (Rechnernetze und Kommunikationssysteme), Nachfolge Herzog, 04/2002-02/2003.

Reviewing for journals

  • ACM Transactions on Software Engineering and Methodology, TOSEM
  • ACM Transactions on Programming Languages and Systems, TOPLAS
  • IEEE Transactions on Parallel and Distributed Systems
  • Journal of Parallel and Distributed Computing
  • Concurrency – Practice and Experience
  • Software – Practice and Experience
  • Journal of Systems and Software
  • Informatik-Spektrum
  • Informatik – Forschung und Entwicklung


Supervised theses

Doctoral theses and postdoctoral dissertations


















Student theses

Sorted alphabetically in UnivIS

Student theses at KIT (Karlsruhe)

  1. Marqc Schanne: Software-Architekturen für lokalitätsabhägige Diensterbringung auf mobilen Endgeräten. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 2002
  2. Sven Buth: Persistenz von verteilten Objekten im Rahmen eines offenen, verteilten eCommerce-Frameworks. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 2002
  3. Jochen Reber: Verteilter Garbage Collector für JavaParty. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 2000
  4. Thorsten Schlachter: Entwicklung eines Java-Applets zur diagrammbasierten Navigation innerhalb des WWW. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1999
  5. Edwin Günthner: Komplexe Zahlen für Java. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1999
  6. Christian Nester: Ein flexibles RMI Design für eine effiziente Cluster Computing Implementierung. [DA]
    Betreuer: Philippsen, M. abgeschlossen 1999
  7. Daniel Lukic: ParaStation-Anbindung für Java. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1998
  8. Jörg Afflerbach: Vergleich von verteilten JavaParty-Servlets mit äquivalenten CGI-Skripts. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1998
  9. Thomas Dehoust: Abbildung heterogener Datensätze in Java. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1998
  10. Guido Malpohl: Erkennung von Plagiaten unter einer Vielzahl von ähnlichen Java-Programmen. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1997
  11. Bernhard Haumacher: Lokalitätsoptimierung durch statische Typanalyse in JavaParty. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1997
  12. Matthias Kölsch: Dynamische Datenobjekt- und Threadverteilung in JavaParty. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1997
  13. Christian Nester: Parallelisierung rekursiver Benchmarks für JavaParty mit expliziter Datenobjekt- und Threadverteilung. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1997
  14. Matthias Jacob: Parallele Realisierung geophysikalischer Basisalgorithmen in JavaParty. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1997
  15. Oliver Reiff: Optimierungsmöglichkeiten für Java-Bytecode. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1996
  16. Marc Schanne: Laufzeitverhalten und Portierungsaspekte der Java-VM und ausgewählter Java-Bibliotheken. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1996
  17. Edwin Günthner: Portierung der Java VM auf den Multimedia Video Prozessor MVP TMS320C80. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1996
  18. Matthias Zenger: Transparente Objektverteilung in Java. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1996
  19. Matthias Winkel: Erweiterung von Java um ein FORALL. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1996
  20. Roland Kasper: Modula-2*-Benchmarks in einem Netz von Arbeitsplatzrechnern. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1993
  21. Markus Mock: Alignment in Modula-2*. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1992
  22. Stefan Hänßgen: Ein symbolishcer X Windows Debugger für Modula-2*. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1992
  23. Paul Lukowicz: Code-Erzeugung für Modula-2* für verschiedene Maschinenarchitekturen. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1992
  24. Hendrik Mager: Die semantische Analyse von Modula-2*. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1992
  25. Ernst Heinz: Automatische Elimination von Synchronisationsbarriere in synchronen FORALLs. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1991
  26. Stephan Teiwes: Die schnellste Art zu multiplizieren? – Der Algorithmus von Schönhage und Strassen auf der Connection Machine. [SA]
    Betreuer: Philippsen, M.: abgeschlossen 1991
  27. Ralf Kretzschmar: Ein Modula-2*-Übersetzer für die Connection Machine. [DA]
    Betreuer: Philippsen, M.: abgeschlossen 1991

Curriculum vitae

Professional career
04/02 – heute Full Professor (W3), Chair of the Programming Systems Group (Informatik 2) of the Friedrich-Alexander Universität Erlangen-Nürnberg, Germany
06/10 Rejected appointment as Full Professor (W3) for Parallel and Distributed Architectures at the Johannes-Gutenberg University Mainz
01/98 – 03/02 Department manager of the Softwaretechnik/Authorized Java Center group at FZI Forschungszentrum Informatik, Karlsruhe, Germany
09/95 – 09/01 Assistant Professor (Hochschulassistent, C1) at IPD, Institute for Programming Systems, Chair of Prof. Tichy, at KIT, Karlsruhe Institute of Technology
09/94 – 08/95 Post-Doc at ICSI (International Computer Science Institute) of the University of Berkeley, California
02/90 – 08/94 Research Assistant (BAT IIa) and PhD student at IPD, Institute for Programming Systems, Chair of Prof. Tichy, at KIT, Karlsruhe Institute of Technology


07/01 Habilitation in Computer Science at KIT, Karlsruhe Institute of Technology, Topic: Performance Aspects of Parallel Object-Oriented Programming Languages.
11/93 PhD (Dr. rer. nat.) in Computer Science (summa cum laude), at KIT, Karlsruhe Institute of Technology, Topic: Optimization Techniques for Compiling Parallel Programming Languages; Advisors: Prof. Dr. Walter F. Tichy and Prof. Dr. G. Goos
WS 85/86 – 89/90 Diplom (BA and MA) in Computer Science with Minor Industrial Engineering and Management (Wirtschaftsingenieurwesen), at KIT, Karlsruhe Institute of Technology
01/90 Diplom/MA (A/sehr gut)
08/89 – 12/89 Diploma/MA thesis at ENC, IBM European Networking Center, in Heidelberg; Topic: Replication for a distributed file system in a heterogeneous network; Advisors: Dr. Ulf Hollberg (IBM) and Prof. Dr. G. Krüger (Institute of Telematics)
01/89 – 03/89 BA thesis (Studienarbeit) at the Institute of Telematics; Topic: Classification of consistency protocols for replicated file systems in distributed systems; Advisors: Dr. Cora Förster and Prof. Dr. G. Krüger
04/88 – 07/88 Student Assistant at IPD, the Institute for Programming Systems, Chair of Prof. Tichy, Teaching Assistant for Informatik IV
04/88 – 12/88 Student Assistant at FZI Forschungszentrum Informatik, Distributed Relational Databases, Project Kardamom, Principle Investigator Prof. Dr. P. Lockemann
10/87 Vordiplom/BA (B/gut)
05/85 Abitur (secondary school exam/university entrance level qualification), (1.6, third of class of 1985)
08/76 – 05/85 Theodor-Heuss-Gymnasium, Essen-Kettwig, Germany
08/72 – 06/76 Schmachtenbergschule, Kettwig, Germany

Prizes, awards, nominations 

2014 von der Technischen Fakultät der Friedrich-Alexander Universität Erlangen-Nürnberg und ihrer Fachschaft Informatik nominiert für den Ars legendi-Preis für exzellente Hochschullehre des Stifterverbands und der Hochschulrektorenkonferenz

International experience

05/15 – 16/15 ICSI (International Computer Science Institute) of UCB, University of Berkeley, CA
12/10 – 03/11 Microsoft Research, Research in Software Engineering (RiSE) Group, Redmond, WA
09/94 – 08/95 ICSI (International Computer Science Institute) of UCB, University of Berkeley, CA
02/96 – 04/96 another research stay at ICSI in Berkeley, CA
02/92 – 03/92 Research stay at INRIA (Institut National de Recherche en Informatique et en Automatique), Sophia Antipolis, France
02/91 – 03/91 another research stay at INRIA in Sophia Antipolis, France
02/90 – heute Countless trips to international scientific conferences to give formal presentations


04/91 – heute Self-empoyed management consulting and expert’s reports for various industry and crafts enterprises
10/99 – 01/13 Design and development of a use-case specific Content-Management-Systems; for ISO Arzneimittel GmbH & Co. KG
12/95 – 12/97 Design and development of a Java extension for scalable Internet services and electronic trade; .for Electric Communities, CA
01/96 – 05/96 Design of an applicaiton for Mercedes-Benz Lease & Finanz GmbH (now Mercedes-Benz-Bank AG)
07/85 – 03/91 Working student at Stinnes Organisationsberatung GmbH, various tasks across both Stinnes AG (now DB Schenker AG) and Veba AG (now E.ON AG)
01/84 – 12/86 Freelance system analyst and software developer at the headquarter of Horten AG (now Galeria Karstadt Kaufhof GmbH)
07/84 – 08/84 Working student at Brenntag Mineralöl GmbH; analysis and black box testing of an externally procured merchandise planning and control system