Software Engineering

Software engineering offers a wide range of applications for applied machine learning research. In cooperation with the Department of Software Engineering, the group is active in software quality, software testing, complex event processing, and the Internet of Things.

Recently, the group has conducted research on finding security vulnerabilities in source code and provided a manually validated dataset of refactoring transformations in real systems to facilitate software quality research. The group also provided a detailed analysis of the differences between bytecode-based and source-code-based instrumentation methods in program analysis, and introduced new methods for the hierarchical delta debugging process.
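Hierarchical delta debugging applies Zeller's ddmin algorithm level by level to a tree-structured input such as a parse tree. For readers unfamiliar with the underlying reduction step, the following is a minimal Python sketch of plain ddmin on a flat list. It does not show the group's new hierarchical methods, and the `fails` predicate is a hypothetical stand-in for re-running the failing test on a reduced input.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def _split(seq: List[T], n: int) -> List[List[T]]:
    """Split seq into n contiguous, non-empty chunks of near-equal size."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(items: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Reduce a failing input to a 1-minimal failing subsequence.

    `fails(candidate)` returns True if the candidate still triggers the
    failure; the full `items` sequence is assumed to fail.
    """
    current = list(items)
    n = 2  # granularity: number of chunks to split into
    while len(current) >= 2:
        chunks = _split(current, n)
        reduced = False
        # Try each chunk on its own.
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        if not reduced:
            # Try each complement (the input with one chunk removed).
            for i in range(n):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(current):
                break  # single-element chunks and no reduction: 1-minimal
            n = min(2 * n, len(current))  # refine granularity
    return current


# Hypothetical failure: the bug shows up only when both 3 and 7 are present.
print(ddmin(list(range(10)), lambda c: 3 in c and 7 in c))  # [3, 7]
```

Every candidate passed to `fails` is a subsequence of the original input, so the result keeps the original order. A hierarchical variant runs the same kind of reduction on the nodes of each tree level, pruning whole subtrees instead of individual list elements.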

Current active research areas are as follows:

  • Machine learning on source code, source code similarity
  • Software quality models
  • Testing, test suite quality
  • Fault localization, automated debugging
  • Test-code traceability


László Vidács (contact), Péter Pusztai, Péter Hegedűs, Gergő Balogh, András Kicsi, László Tóth, Balázs Nagy


Multi-label classification for tagging user feedback given in natural language

When users or customers express their expectations of the software, they use natural language. Feedback and requirement sentences often cover more than one aspect of these expectations, so a single sentence may belong to more than one class. The objective of the project is to develop machine learning (and deep learning) based methods for multi-label classification and to build a tagging tool based on these methods. The method is also to be extended to support the process of tagging sentences with multiple labels.
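Binary relevance is one standard way to reduce multi-label classification to ordinary binary problems: train one independent yes/no classifier per tag and assign every tag whose classifier says yes. The sketch below assumes a tiny multinomial Naive Bayes as the per-tag classifier, a naive whitespace tokenizer, and made-up feedback sentences and tag names; it is an illustration of the technique, not the project's own (possibly deep-learning) method.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

PUNCTUATION = ".,!?;:"


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase, split on whitespace, strip punctuation."""
    tokens = (t.strip(PUNCTUATION) for t in text.lower().split())
    return [t for t in tokens if t]


class BinaryRelevanceNB:
    """Multi-label classifier: one independent binary Naive Bayes per tag."""

    def fit(self, texts: List[str], tags: List[Set[str]]) -> "BinaryRelevanceNB":
        docs = [tokenize(t) for t in texts]
        self.vocab = {w for d in docs for w in d}
        self.tags = sorted(set().union(*tags))
        # models[tag][has_tag] = (log prior, word counts, total word count)
        self.models: Dict[str, Dict[bool, Tuple[float, Counter, int]]] = {}
        for tag in self.tags:
            self.models[tag] = {}
            for has_tag in (True, False):
                subset = [d for d, ts in zip(docs, tags) if (tag in ts) == has_tag]
                counts = Counter(w for d in subset for w in d)
                # Add-one smoothing on the prior keeps an empty side valid.
                log_prior = math.log((len(subset) + 1) / (len(docs) + 2))
                self.models[tag][has_tag] = (log_prior, counts, sum(counts.values()))
        return self

    def _log_score(self, tag: str, has_tag: bool, tokens: List[str]) -> float:
        log_prior, counts, total = self.models[tag][has_tag]
        v = len(self.vocab)
        # Laplace-smoothed multinomial likelihood; words unseen in training are ignored.
        return log_prior + sum(
            math.log((counts[w] + 1) / (total + v)) for w in tokens if w in self.vocab
        )

    def predict(self, text: str) -> Set[str]:
        tokens = tokenize(text)
        return {
            tag for tag in self.tags
            if self._log_score(tag, True, tokens) > self._log_score(tag, False, tokens)
        }


# Made-up training feedback with hypothetical tags.
feedback = [
    "app crashes on start",
    "crashes after update",
    "app is slow",
    "very slow loading",
    "add dark mode",
    "add export option",
    "slow and crashes",
]
tags = [{"bug"}, {"bug"}, {"performance"}, {"performance"},
        {"feature"}, {"feature"}, {"bug", "performance"}]

clf = BinaryRelevanceNB().fit(feedback, tags)
print(sorted(clf.predict("crashes and slow")))  # ['bug', 'performance']
```

Binary relevance ignores correlations between tags; classifier chains, or a single neural network with one sigmoid output per tag, are common alternatives when tags tend to co-occur.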

Feature extraction and analysis

Product line architecture is a timely answer to the challenges of maintaining several program variants within the same codebase. Semi-automatic feature extraction starts from the high-level feature list provided by domain experts and uses information retrieval and static code analysis techniques to determine the program code implementing a given feature. To support the adoption of a product line architecture for separately developed but similar products, feature-level similarity and quality metrics are defined.

Partners: SZEGED Software Ltd., Hungary

Co-evolution analysis of production and test code

Motivated by the large number of automated tests, we apply association rule learning to the version control history of software projects to reason about how developers maintain tests and production code in parallel.

Partners: University of Klagenfurt, Austria

Recovering test-to-code traceability links from source code

Recovering traceability links between tests and the class or method they are intended to test facilitates software evolution activities such as fault localization and program understanding. For the JUnit framework, a clean naming convention is promoted to establish these links. When naming conventions are not followed, we apply natural language processing, information retrieval, and static analysis techniques to recover the links.
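The association rule idea behind the co-evolution analysis of production and test code can be illustrated with pairwise rules mined from commits. A minimal sketch, assuming each commit is reduced to the set of files it touched and only rules with one file on each side are mined (general learners such as Apriori also handle larger itemsets); the function name, file names and thresholds are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def co_change_rules(commits: List[Set[str]], min_support: float,
                    min_confidence: float) -> List[Rule]:
    """Mine pairwise rules A -> B ("when A changes, B changes too")."""
    n = len(commits)
    file_count: Counter = Counter()
    pair_count: Counter = Counter()
    for files in commits:
        file_count.update(files)
        for a, b in permutations(sorted(files), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n                 # share of commits touching both files
        confidence = together / file_count[a]  # estimated P(B changed | A changed)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Made-up history: each commit is the set of files it touched.
history = [
    {"Stack.java", "StackTest.java"},
    {"Stack.java"},
    {"Stack.java", "StackTest.java"},
    {"Queue.java"},
]
for a, b, s, c in co_change_rules(history, min_support=0.25, min_confidence=0.6):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
# Stack.java -> StackTest.java: support=0.50, confidence=0.67
# StackTest.java -> Stack.java: support=0.50, confidence=1.00
```

Here `Stack.java` changes without `StackTest.java` in one of its three commits, so the production-to-test rule has lower confidence than the reverse rule; an asymmetry like this is one possible hint that a test is not kept up to date with its production code.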

Classification of non-functional requirements

Requirements engineering is one of the very first tasks in the software development process, and it fundamentally influences the quality of the software under development. Requirements are mostly given in natural language and can be either functional or non-functional. Non-functional requirements are the foundation of the software's quality aspects, such as security, usability, and reliability, which makes classifying them an important task of software engineering. The objective of the project is to develop machine learning (and deep learning) based methods and tools that support system analysts in classifying non-functional requirements given in natural language. The resulting collection of classified non-functional requirements can be used in both the analysis and the design phases.
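A simple baseline for classifying requirement sentences is a TF-IDF nearest-centroid classifier: each requirement becomes a weighted bag of words, each category is represented by the normalized sum of its training vectors, and a new requirement goes to the category with the highest cosine similarity. A minimal sketch, assuming a naive tokenizer, a smoothed IDF, and made-up requirements and category names; it illustrates the classification task, not the project's machine-learning or deep-learning methods.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional

PUNCTUATION = ".,!?;:"
Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase, split on whitespace, strip punctuation."""
    tokens = (t.strip(PUNCTUATION) for t in text.lower().split())
    return [t for t in tokens if t]


def normalize(v: Vector) -> Vector:
    """Scale a sparse vector to unit length (empty if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {w: x / norm for w, x in v.items()} if norm else {}


def dot(a: Vector, b: Vector) -> float:
    return sum(x * b.get(w, 0.0) for w, x in a.items())


class NearestCentroidTfIdf:
    """Assign a requirement to the category whose TF-IDF centroid is most similar."""

    def fit(self, texts: List[str], categories: List[str]) -> "NearestCentroidTfIdf":
        docs = [tokenize(t) for t in texts]
        n = len(docs)
        df = Counter(w for d in docs for w in set(d))  # document frequency
        self.idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
        sums: Dict[str, Vector] = defaultdict(dict)
        for doc, cat in zip(docs, categories):
            for w, x in self._vector(doc).items():
                sums[cat][w] = sums[cat].get(w, 0.0) + x
        self.centroids = {cat: normalize(v) for cat, v in sums.items()}
        return self

    def _vector(self, tokens: List[str]) -> Vector:
        # Words unseen in training have no IDF weight and are dropped.
        tf = Counter(t for t in tokens if t in self.idf)
        return normalize({w: c * self.idf[w] for w, c in tf.items()})

    def predict(self, text: str) -> Optional[str]:
        v = self._vector(tokenize(text))
        if not v:
            return None  # no known words: nothing to compare against
        # Both vectors have unit length, so the dot product is the cosine similarity.
        return max(self.centroids, key=lambda cat: dot(v, self.centroids[cat]))


requirements = [
    ("The system shall encrypt all stored passwords.", "security"),
    ("Only authenticated users shall access patient records.", "security"),
    ("The interface shall be easy to learn for new users.", "usability"),
    ("Error messages shall be easy to understand.", "usability"),
    ("The service shall remain available during maintenance.", "reliability"),
    ("The system shall recover from a crash within one minute.", "reliability"),
]
clf = NearestCentroidTfIdf().fit([r for r, _ in requirements], [c for _, c in requirements])
print(clf.predict("encrypt stored passwords"))  # security
```

Words shared by every category, such as "shall" and "the", receive the lowest IDF weight, so the decision is driven mostly by category-specific vocabulary; learned models are expected to do better once phrasing varies more than in this toy data.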

Selected publications