Datasets

    Email Signature and Reply Dataset

    These 617 email messages are annotated with signature lines and reply-to lines, and can be used for extraction experiments. They were collected for the CEAS-2004 paper. The messages are a subset of the 20 Newsgroups dataset (produced by Ken Lang at CMU in the mid-1990s). Download: Signature and Reply Dataset, or in Minorthird Format


    American English Nicknames Collection

    The American English Nicknames Collection was created for our NAACL-HLT 2012 paper and is currently distributed by the Linguistic Data Consortium (LDC). Download from here.