TU Darmstadt / GRIS, Fraunhoferstr. 5, Room 073
Who: Muhammad Ali Riaz (Author), Prof. Dr. Arjan Kuijper (Supervisor), Dipl.-Inf. Dirk Burkhardt
Master Thesis – “Visual Trend Analysis on Condensed Expert Data beside Research Library Data for Enhanced Insights”
Abstract:
In the present age of information, we live amidst seas of digital text documents, including academic publications, white papers, news articles, patents, and newspapers. To cope with the ever-increasing number of text documents, researchers in text mining and information visualization have developed tools and techniques to facilitate text analysis. In visual trend analysis on text data, the use of well-structured patent data and public digital libraries is well established. However, both sources of information have their limitations. For instance, the registration process for patents takes at least one year, which makes the extracted insights unsuitable for research on present scenarios. In contrast to patent data, digital libraries are up to date but provide only high-level insights limited to broader research domains. Moreover, data usage is largely restricted to meta-information, such as title, author names, and abstract, because the libraries do not provide the full text.
For certain types of detailed analysis, such as competitor analysis or portfolio analysis, data from digital libraries is not enough; it also makes sense to analyze the full text. Moreover, it can be beneficial to analyze only a limited dataset that an expert has filtered towards a very specific field, such as additive printing or smart wearables for medical observations. Sometimes a mixture of digital library data and manually collected documents is needed to validate a certain trend: the former gives the big picture, and the latter gives a very condensed overview of the present scenario.
The thesis therefore focuses on such documents manually collected by experts, which can be defined as condensed data. The major goal of this thesis is to conceptualize and implement a solution that enables the creation and analysis of such a condensed data set and thereby compensates for the limitations of digital library data analysis. As a result, a visual trend analysis system for analyzing text documents is presented that combines state-of-the-art text analytics and information visualization techniques. In a nutshell, the presented trend analysis system does two things. Firstly, it extracts raw data from text documents in the form of unstructured text and metadata, converts it into structured and analyzable formats, extracts trends from it, and presents them with appropriate visualizations. Secondly, the system performs gap-analysis tasks between two data sources, which in this case are digital library data and data from manually collected text documents (Condensed Expert Data). The proposed visual trend analysis system can be used by researchers to analyze research trends, by organizations to identify current market buzz and industry trends, and in many other use cases where text data is the primary source of valuable information.
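To make the two capabilities more concrete, the following is a minimal sketch, not the system described in the thesis. It assumes each document has already been reduced to a (year, text) pair, counts term occurrences per year as a crude stand-in for trend extraction, and treats gap analysis as a simple set difference between the vocabularies of the two sources. The function names, the stopword list, and the sample documents are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list for this sketch.
STOPWORDS = {"and", "of", "the", "for", "in", "a"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens without stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_trends(docs):
    """Map each term to a Counter of occurrences per year.

    docs is an iterable of (year, text) pairs.
    """
    trends = defaultdict(Counter)
    for year, text in docs:
        for term in tokenize(text):
            trends[term][year] += 1
    return trends


def gap_terms(library_docs, condensed_docs):
    """Return terms that occur in the condensed data but not in the library data."""
    library_vocab = {t for _, text in library_docs for t in tokenize(text)}
    condensed_vocab = {t for _, text in condensed_docs for t in tokenize(text)}
    return sorted(condensed_vocab - library_vocab)


# Made-up sample data: library metadata versus expert-collected full text.
library_docs = [
    (2019, "Additive printing of polymers"),
    (2020, "Printing metal parts"),
]
condensed_docs = [
    (2020, "Smart wearables and additive printing"),
]

if __name__ == "__main__":
    print(dict(term_trends(library_docs)["printing"]))
    print(gap_terms(library_docs, condensed_docs))
```

A real pipeline would need document parsing, phrase extraction, and normalization before such counts become meaningful; the sketch only shows the shape of the data flowing from raw text to a per-year trend table and a gap list.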