Automated Uncertainty Data Classification and Inclusion for Enhanced Visual Trend Analytics (Bachelor/Master)

Big data is often called the oil of the 21st century. It is important to note, however, that collecting and storing data on a massive scale does not automatically lead to new insights or knowledge. Appropriate processing and graphical analysis methods are required to extract meaning from the data. In particular, combining data mining approaches with visual analytics leads to genuinely beneficial applications that support decision making, e.g. in innovation or technology management. For (visual) trend analysis, the challenge is selecting promising data that enables the identification of trends. The usual data foundation is patent data, since patents have a very clear structure, such as a topical classification, alongside a comprehensive description. The biggest downside, however, is their poor timeliness: registering a patent usually takes between one and three years, which limits the possibilities of identifying trends early. An additional opportunity that has emerged in recent years is digital libraries. Thanks to a number of open-access initiatives, a variety of publicly accessible databases are now available, covering a broad range of research fields.

The integration of such digital libraries is normally less restrictive and relatively easy to realize. Besides licensing issues, the biggest challenges in using this data are data quality and data completeness. Often, only the basic publication information is available, for instance title, author names, publication year and the name of the book, proceedings or journal. Content data of the specific publication, such as the abstract or full text, is frequently missing. Yet precisely this content data is required for a sufficiently insightful trend analysis, e.g. of the competing technologies of certain enterprises.

The goal of this thesis is to address this lack of information by considering and including third-party sources, such as portals like CiteSeerX, OpenAIRE/Zenodo, ResearchGate or the websites of the authors' research institutions. In practice, however, these data sources do not always host the actual paper; sometimes only the presentation or other documents are hosted under the original paper title. This makes it necessary to classify the originality of a document based on a number of aspects, such as the use of established proceedings templates and the validation of given meta information such as the author names, proceedings name and year.
To work on the thesis, it is first required to research Human-Computer Interaction (HCI) and Information Visualization (IV), followed by research on (Visual) Trend Analysis, with a particular focus on data completion and enrichment through the inclusion of third-party data. A subsequent state-of-the-art review should outline similar approaches and systems that might be relevant for the thesis. The main part will be the creation of a conceptual model and design that enables the collection of data from third-party sources to advance the trend analytics processing and analysis. The concept should consider the uncertainty level of such external data, so that critical data is excluded from processing to avoid blurring the data and decreasing its quality. Based on the concept, a prototypical implementation should act as a proof of concept. The prototype should then be applied to an existing database such as EuroGraphics or DBLP. An evaluation that compares the processing results (once with data completion disabled and once with it enabled) concludes the thesis.
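
To illustrate the idea of an uncertainty level, the following minimal Python sketch validates a document found at a third-party source against the known meta information and excludes it when too many checks fail. It is only an assumption of how such a check could look, not a prescribed design: the names Metadata, uncertainty and accept, the four equally weighted checks and the threshold of 0.25 are all hypothetical, and a real concept would likely add further aspects such as template detection.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    """Known publication record (hypothetical structure for this sketch)."""
    title: str
    authors: list
    year: int
    venue: str


def uncertainty(known: Metadata, candidate_text: str) -> float:
    """Return a score in [0, 1]; 0.0 means every metadata check passed."""
    text = candidate_text.lower()
    checks = [
        # The known title appears verbatim in the candidate document.
        known.title.lower() in text,
        # Every author's last name appears (trivially true without authors).
        all(name.split()[-1].lower() in text for name in known.authors),
        # The publication year appears.
        str(known.year) in text,
        # The proceedings or journal name appears.
        known.venue.lower() in text,
    ]
    # Assumption: all checks are weighted equally.
    return 1.0 - sum(checks) / len(checks)


def accept(known: Metadata, candidate_text: str,
           max_uncertainty: float = 0.25) -> bool:
    """Include the candidate only if its uncertainty is low enough."""
    return uncertainty(known, candidate_text) <= max_uncertainty


# Made-up example data, not taken from any real database.
known = Metadata("Visual Trend Analytics", ["Jane Doe", "John Smith"],
                 2016, "Example Proceedings")
paper = ("Visual Trend Analytics\nJane Doe, John Smith\n"
         "Example Proceedings 2016\nAbstract ...")
slides = "Visual Trend Analytics - talk slides by J. Doe"

print(accept(known, paper))   # the full paper passes all four checks
print(accept(known, slides))  # the slides fail three checks and are excluded
```

In this sketch the slide deck shares the paper title but lacks the second author, the year and the venue, so it would be excluded from further processing. The threshold controls the trade-off between data completeness and the risk of blurring the data.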


Dipl.-Inf. Dirk Burkhardt