EmpiriST2015 is a community shared task on tokenization and part-of-speech
(PoS) tagging of German computer-mediated communication (CMC) and social media
data. Its goal is to encourage developers of NLP applications to adapt their
tools and resources for the processing of written CMC discourse. The website of
the shared task provides
data samples, an extended PoS tagset for CMC and detailed annotation guidelines
for tokenizing and PoS tagging German CMC data.