Datasets
Email Signature and Reply Dataset
These 617 email messages are annotated with signature lines and
reply-to lines, and can be used for extraction experiments. They were collected for the
CEAS-2004 paper. The messages are a subset of the 20 Newsgroups
dataset (produced by Ken Lang at CMU in the mid-1990s).
Download: Signature and Reply Dataset, or in Minorthird Format
American English Nicknames Collection
The American English Nicknames Collection was created for our NAACL-HLT 2012 paper and is currently distributed by the Linguistic Data Consortium (LDC).
Download from here.