Extracting Terminological Concept Systems from Natural Language Text (Text2TCS)

Project duration: July 2020 until September 2021

Project description: Terminology is the foundation of any specialised communication. Because it serves both the acquisition of knowledge and successful communication within a domain, terminological inconsistencies are a major source of misunderstanding in (multilingual) specialised communication.

Automatic term extraction is currently limited to producing a list of term candidates. However, to make the relations between terms explicit, it is necessary to generate terminological concepts and visualise their semantic relations. The Text2TCS application will automatically extract hierarchical and semantic relations from monolingual and bilingual text corpora to create a terminological concept system (TCS). It will rely on findings from ontology learning, machine learning, Discourse Representation Theory and Combinatory Categorial Grammar. Such a TCS is a valuable resource for coordinating communication across borders and language barriers, and it is extremely important in times of crisis, such as the COVID-19 pandemic. It helps ensure that different parties in a communication situation, such as medical, political, and news teams, consistently refer to phenomena using the same words. The final outcome of Text2TCS will be an easy-to-use extraction application made freely available on the European Language Grid, a Europe-wide platform for language technologies and language resources.

Funding: Pilot project of the European Language Grid (ELG; grant agreement 825627)

Contact person: Dagmar Gromann

Website: text2tcs.univie.ac.at/en/