In his thesis, Irtaza Rasheed implemented a universal name disambiguation approach that considers almost any existing property to identify authors. After the author of a paper is identified, the normalized written form of the name on the paper is used to refine the author model and to give an overview of the different written forms of the author’s name. To achieve this, the thesis first examines research on Human-Computer Interaction with a specific focus on (Visual) Trend Analysis, followed by a survey of different name disambiguation techniques. Building on that, it develops a concept and implements a generalized method for author name and affiliation disambiguation, evaluating different properties along the way.
TU Darmstadt / GRIS, Fraunhoferstr. 5 (Darmstadt), Room tba
!!!!! Due to the Corona crisis and the accompanying restrictions at the TU Darmstadt, the exam will be non-public! !!!!!
Who: Ubaid Rana (Author), Prof. Dr. Arjan Kuijper (Supervisor), Dipl.-Inf. Dirk Burkhardt (Advisor/Co-Supervisor)
What: Master Thesis – “Named-Entity Recognition on Publications and Raw-Text for Meticulous Insight at Visual Trend Analytics”
In the modern data-driven era, a massive number of research documents are available from publicly accessible digital libraries in the form of academic papers, journals and publications. This plethora of data does not by itself lead to new insights or knowledge. Therefore, suitable analysis techniques and graphical tools are needed to derive knowledge and gain insight from this big data. To address this issue, researchers have developed visual analytics systems combined with machine learning methods, e.g. text mining with interactive data visualization, which help users gain new insights into current and upcoming technology trends. These trends are significant for researchers, business analysts, and decision-makers in innovation, technology management and strategic decision-making.
Nearly every existing search portal uses only traditional meta-information, e.g. author and title, to find documents that match a search request, and overlooks the opportunity to extract content-related information. This limits the possibility of discovering the most relevant publications and leaves out the knowledge required for trend analysis. To collect this specific information, named entity recognition is needed so that results and trends can be identified more precisely. State-of-the-art systems use a static approach to named entity recognition, which means that upcoming technologies remain undetected. Modern techniques such as distant supervision leverage large, community-maintained data sources, such as Wikipedia, to extract entities dynamically. Nonetheless, these methods are still unstable and have not previously been tried on complex scenarios such as trend analysis.
The aim of this thesis is to enable entity recognition on both static tables and dynamic, community-updated data sources such as Wikipedia and DBpedia for trend analysis. To accomplish this goal, a model is proposed that enables entity extraction on DBpedia and translates the extracted entities into interactive visualizations. Analysts can use these visualizations to gain trend insights, evaluate research trends, or analyze prevailing market moods and industry trends.
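To illustrate the static-table side of this idea, the following is a minimal sketch of dictionary-based entity recognition with per-year mention counts. It is not the method from the thesis: the entity table `GAZETTEER`, the functions `extract_entities` and `trend_counts`, and the sample documents are all hypothetical and chosen only for illustration. A dynamic variant would refresh the table from a community-maintained source such as DBpedia instead of hard-coding it.

```python
import re
from collections import Counter, defaultdict

# Hypothetical static entity table (assumption, not taken from the thesis).
# A dynamic variant would rebuild this from a source such as DBpedia.
GAZETTEER = {
    "machine learning": "Technology",
    "text mining": "Technology",
    "named entity recognition": "Technology",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return (term, type) pairs for every gazetteer term found in the text."""
    lowered = text.lower()
    found = []
    for term, entity_type in gazetteer.items():
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for _match in re.finditer(pattern, lowered):
            found.append((term, entity_type))
    return found


def trend_counts(documents, gazetteer=GAZETTEER):
    """Count entity mentions per publication year.

    `documents` is an iterable of (year, text) pairs.
    """
    counts = defaultdict(Counter)
    for year, text in documents:
        for term, _entity_type in extract_entities(text, gazetteer):
            counts[year][term] += 1
    return counts


# Invented sample data, for illustration only.
docs = [
    (2018, "Text mining supports trend analysis."),
    (2019, "Machine learning and text mining for named entity recognition."),
]
trends = trend_counts(docs)
```

The per-year counters in `trends` are the kind of data a visualization layer could plot over time. A term that is missing from the static table is never counted, which is the limitation that dynamic sources are meant to address.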