README for Hungarian Facebook political comments corpus

http://corpus.nytud.hu/trendminer/

Release date: 2014-11-10
Contact: Márton Miháltz, RILMTA

Files:

comments_annotations.tsv.tgz
  Contains all 1.9M Facebook comments with NLP annotations, spread over
  3 directories of files; each file contains about 500 comments.
  Each comment is headed by a line:
    #START____
  The lines that follow are tab-separated columns describing the tokens
  of the comment:
  Blank lines denote sentence boundaries (the last blank line marks the
  end of that comment).

comments_corpus.tgz
  Contains 1 TSV file with 20.9M annotation units over the comments.
  Columns:
  The first 3 columns reference ids in the comments file above (the
  3-tuple of page, post and comment ids uniquely identifies a comment in
  the corpus).
  Sentence and token indices start from 0.
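The layout of the comment files (a header line per comment, one tab-separated token per line, blank lines between sentences) can be read with a small parser. This is a minimal sketch, not a tool shipped with the corpus. It assumes that a header line is any line beginning with "#START" (the rest of the header is kept verbatim, since its exact content is not described here), and it keeps token columns as raw string lists because their meaning is not assumed. The names `Comment` and `parse_comments` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Comment:
    # The full header line (assumed to begin with "#START"), kept verbatim.
    header: str
    # sentences -> tokens -> raw tab-separated column values (strings).
    sentences: List[List[List[str]]] = field(default_factory=list)


def parse_comments(lines: Iterable[str]) -> List[Comment]:
    """Hypothetical parser for one comments file, given as an iterable of lines."""
    comments: List[Comment] = []
    sentence: List[List[str]] = []

    def flush_sentence() -> None:
        # Attach the pending sentence to the current comment, if any.
        # Token lines appearing before the first header are dropped.
        nonlocal sentence
        if sentence and comments:
            comments[-1].sentences.append(sentence)
        sentence = []

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#START"):
            flush_sentence()            # close any sentence left open
            comments.append(Comment(header=line))
        elif line.strip() == "":
            flush_sentence()            # blank line = sentence boundary
        else:
            sentence.append(line.split("\t"))
    flush_sentence()                    # tolerate a missing final blank line
    return comments
```

For a real file, something like `parse_comments(open(path, encoding="utf-8"))` should work, assuming the extracted files are UTF-8 encoded (the encoding is not stated in this README).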
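Because the first 3 columns of the annotation file are the page, post and comment ids, annotation units can be grouped per comment using that 3-tuple as a key. The sketch below is illustrative only. It assumes the file has no header row (skip the first line if it does), keeps the ids as strings rather than assuming they are integers, and leaves the remaining columns uninterpreted since they are not documented here. The function name `group_annotations` is hypothetical. Lines are split on tabs directly rather than with the `csv` module, so quote characters inside fields are not treated specially.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (page id, post id, comment id), kept as strings.
CommentKey = Tuple[str, str, str]


def group_annotations(lines: Iterable[str]) -> Dict[CommentKey, List[List[str]]]:
    """Hypothetical helper: map each comment key to its annotation rows."""
    groups: Dict[CommentKey, List[List[str]]] = defaultdict(list)
    for raw in lines:
        row = raw.rstrip("\r\n").split("\t")
        if len(row) < 3:
            continue  # skip blank or malformed lines
        key = (row[0], row[1], row[2])
        groups[key].append(row[3:])  # remaining columns, uninterpreted
    return dict(groups)
```

With 20.9M annotation units, holding everything in memory may be costly; streaming the file and processing one key at a time is an alternative if rows for the same comment turn out to be contiguous.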