Proceedings of the 2nd Workshop on Natural Language Processing for Computer-Mediated Communication / Social Media at GSCL2015 (NLP4CMC2015)

September 2015 Cmc-Corpora Conference Proceeding

Abstract

Over the past decade, there has been a growing interest in collecting, processing and analyzing data from genres of social media and computer-mediated communication (CMC): As part of large corpora which have been automatically crawled from the web, CMC data are often regarded as an unloved “bycatch” which is difficult to handle with NLP tools that have been optimized for processing edited text; on the other hand, the existence of CMC data in web corpora is relevant for all research and application contexts which require data sets that represent the full diversity of genres and linguistic variation on the web. For corpus-based variational linguistics, CMC corpora are an important resource for closing the “CMC gap” both in corpora of contemporary written language and in corpora of spoken language: Since CMC and social media make up an important part of contemporary everyday communication, investigations into language change and linguistic variation need to be able to include CMC and social media data into their empirical analyses.

Nevertheless, the development of approaches and tools for processing the linguistic and structural peculiarities of CMC genres and for building CMC corpora is lacking behind the interest of dealing with these types of data in the field of language technology, corpus-based linguistics and web mining.

The goal of the NLP4CMC workshops is to provide a platform for the presentation of results and the discussion of ongoing work in adapting NLP tools for processing CMC data and in using NLP solutions for building and annotating social media corpora. The main focus of the workshops is on German data, but submissions on NLP approaches, annotation experiments and CMC corpus projects for data of other European languages are also welcome.

The 1st NLP4CMC workshop was held in September 2014 at KONVENS at the University of Hildesheim. This volume presents proceedings from the 2nd NLP4CMC workshop which has been held in September 2015 at the annual conference of the German Society for Language Technology and Computational Linguistics (GSCL) at the University of Duisburg-Essen.

Type

Publication

German Society for Computational Linguistics & Language Technology