AISIS 2021

OCTOBER 11-15, 2021

Atilla Alkan: Natural Language Processing for analyzing messages of Astrophysical observations

Sep 21, 2021 (updated Oct 1, 2021)

Summary

Time-domain astronomy is the observation and study of the most violent transient cosmic phenomena, such as tidal disruption events, supernovae and gamma-ray bursts, along with neutrinos and many other sources of a wide variety of radiation and particles. While initial detections are typically reported in machine-readable formats such as the IVOA-standardized VOEvent, subsequent observational reports on these phenomena are largely distributed as reports written manually by observers (e.g. GCN Circulars, Astronomer's Telegrams (ATels), TNS reports). To allow other observatories to react and conduct their own follow-up observations, the information characterising a new transient phenomenon has to be communicated and analyzed very rapidly. However, improved observation techniques and growing interest in time-domain astronomy have led to a substantial increase in the number of these reports, saturating the way astrophysicists read, analyze and classify this information.

We therefore aim to develop neural Natural Language Processing (NLP) methods that address the challenges of extracting and summarizing information from astrophysical observation reports. First, we identified astrophysical named entities, defined annotation guidelines for them, and highlighted areas of ambiguity regarding some entities. Second, we implemented and trained our own Word2Vec model in order to adapt it to the astrophysics domain. We have started to annotate and analyze real observation reports in order to optimize the procedures and relate them to the properties of the texts. As next steps, we will implement the detection of speculation and negation and set up Graph Convolutional Networks to resolve co-references within and across documents.
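As an illustration of what annotating named entities in an observation report can look like, the following minimal Python sketch converts token-level entity spans into BIO tags, a common encoding for Named Entity Recognition training data. It is not the annotation scheme used in this work: the labels (INSTRUMENT, SOURCE, DATE), the function name spans_to_bio and the example sentence are all invented for illustration.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); labels here are hypothetical,
# not the entity types defined in the authors' annotation guidelines.
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert non-overlapping token spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in sorted(spans):
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# Invented sentence and annotations, for illustration only.
tokens = ["TelescopeX", "observed", "GRB", "000000A", "on", "2021-01-01"]
spans = [(0, 1, "INSTRUMENT"), (2, 4, "SOURCE"), (5, 6, "DATE")]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

Multi-token entities such as a source designation are where annotation guidelines matter most: whether the prefix and the identifier form one entity or two is exactly the kind of ambiguity a guideline has to settle.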

In this contribution to AISIS 2021, we will outline our general ideas on using NLP to analyze astronomical observation reports. We will present the current state of the art of NLP in astrophysics and discuss our approach to Named Entity Recognition and co-reference resolution.

Instituto de Ciencias Nucleares, UNAM - Circuito Exterior S/N, Ciudad Universitaria, Col. Universidad Nacional Autónoma de México, Zip Code 04510, Mexico City