Conference Series

Series of conferences dedicated to the collection, annotation, processing, and analysis of computer-mediated communication (CMC) and social media corpora.

More about the Conference Series


The CLARIN K(nowledge)-Centre for Computer-Mediated Communication and Social Media Corpora (CKCMC) is dedicated to questions about the representation, standardisation and distribution of CMC corpora.

More about the CKCMC


Computer-Mediated Communication

Communication between humans via networked devices has become an everyday part of people’s lives across different generations, cultures, geographical areas, and social classes. Shaped by the specific social and technical context in which it is produced, synchronous and asynchronous computer-mediated communication (CMC) has become increasingly participatory, interactive, and multimodal. It encompasses public and private online communication, such as posts on blogs and forums, comments on online news sites, messages on social media and networking sites such as Twitter and Facebook and in mobile phone applications such as WhatsApp, e-mail, and chat rooms.

All this user-generated CMC and social media content offers a wide range of research opportunities for a growing multidisciplinary research community. The themes it examines often relate to, but are not limited to, the interaction between language, CMC, and society, for example language variation, pragmatics, and media and communication studies. The data is also valuable for the development of robust NLP tools that can deal with non-standard spelling, vocabulary, and grammar. However, the compilation and dissemination of such corpora are hindered by the unclear legal status of CMC data when distributed as a resource to the scientific community, a problem further exacerbated by the rapidly changing terms of service of content providers.

The ambition of this still-growing research community is for research into CMC to be based on large, structured data sets, as is the case in many scientific communities. These data sets (corpora) are often built collaboratively from the work of different research teams and disseminated across the research community so that they may form the basis for new analyses and comparative or counter-analyses. With this in mind, a growing number of projects started in the mid-2000s to collect and structure CMC corpora. These empirical resources cover a broad range of CMC genres and languages, and the projects aim to make them available both to the wider scientific community and to business enterprises that develop approaches and tools for web mining, opinion and trend detection, semantic content analysis, or machine translation.

It is gr8 2 c u 2nite.TY 4 ur treats.
SMS jargon from the early 2000s. Picture by Miss Puzzle, CC BY-SA 4.0, via Wikimedia Commons.