Asst.Prof.Dr. Gülşen Cebiroğlu Eryiğit



This web page is outdated.

Please check for updated versions of these resources.



ITU Validation Set

The ITU Validation Set is a dependency treebank of 300 sentences from three different genres. The details of the data set may be obtained from here.

There are 2 versions available:

ITU Validation Set V1: Multi Word Expressions are expressed as single units in sentences. (download)

ITU Validation Set V2: Multi Word Expressions are expressed as separate units in sentences (linked to each other by a dependency labeled "MWE"). (download)
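
The relation between the two versions can be illustrated with a small sketch. It is not the tool used to build the data sets, and the file format of the downloads is not described on this page, so the sketch works on an in-memory sentence. The Token type, the merge_mwe function, the underscore used to join forms, the "SUBJECT" and "ROOT" labels, and the toy sentence are all assumptions made for illustration. It also assumes that every token carrying the "MWE" label points, possibly through other "MWE" links, at the token that stands for the whole expression.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """Hypothetical in-memory token; not the treebank's file format."""
    id: int      # 1-based position in the sentence
    form: str    # surface form
    head: int    # id of the head token, 0 for the root
    label: str   # dependency label


def merge_mwe(tokens: List[Token], mwe_label: str = "MWE") -> List[Token]:
    """Collapse tokens linked by `mwe_label` into single units (V2-style -> V1-style)."""
    by_id: Dict[int, Token] = {t.id: t for t in tokens}

    def unit_of(tid: int) -> int:
        # Follow MWE links up to the token that represents the whole expression.
        while by_id[tid].label == mwe_label and by_id[tid].head != 0:
            tid = by_id[tid].head
        return tid

    # Group tokens by their representative, keeping sentence order inside a group.
    groups: Dict[int, List[Token]] = {}
    for t in tokens:
        groups.setdefault(unit_of(t.id), []).append(t)

    # Renumber the merged units by the position of their first member.
    reps = sorted(groups, key=lambda r: groups[r][0].id)
    new_id = {rep: i for i, rep in enumerate(reps, start=1)}

    merged: List[Token] = []
    for rep in reps:
        rep_tok = by_id[rep]
        form = "_".join(m.form for m in groups[rep])  # joining with "_" is an assumption
        head = 0 if rep_tok.head == 0 else new_id[unit_of(rep_tok.head)]
        merged.append(Token(new_id[rep], form, head, rep_tok.label))
    return merged


# Invented toy sentence, not taken from the treebank.
toy = [
    Token(1, "Ahmet", 3, "SUBJECT"),
    Token(2, "karar", 3, "MWE"),
    Token(3, "verdi", 0, "ROOT"),
]

for tok in merge_mwe(toy):
    print(tok.id, tok.form, tok.head, tok.label)
```

On the toy sentence the two tokens joined by the "MWE" link become one unit, and the head index of the remaining token is renumbered to point at that unit.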

When using the ITU Validation Sets, please cite:

Gülşen Eryiğit. The Impact of Automatic Morphological Analysis & Disambiguation on Dependency Parsing of Turkish. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, May 2012. (bibtex) (pdf)


METU-Sabancı Turkish Dependency Treebank

New version in which Multi Word Expressions are split into multiple units and manually tagged. After this splitting, the newly introduced dependencies between MWE tokens are labeled with an "MWE" dependency label, and all dependencies within the sentences are rearranged accordingly.

This new version of the treebank is built upon the original treebank which may be obtained from

The treebank may be downloaded from here.

When using the METU-Sabancı Turkish Treebank, please cite the following papers:

Gülşen Eryiğit, Tugay İlbay, Ozan Arkan Can. Multiword Expressions in Statistical Dependency Parsing. In Proceedings of the Workshop on Statistical Parsing of Morphologically-Rich Languages (SPMRL) at IWPT, Dublin, October 2011. (bibtex) (pdf)

Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür, "Building a Turkish Treebank", Invited chapter in "Building and Exploiting Syntactically-annotated Corpora", Anne Abeillé, Editor, Kluwer Academic Publishers, 2003.

Nart B. Atalay, Kemal Oflazer, Bilge Say, "The Annotation Process in the Turkish Treebank", in "Proceedings of the EACL Workshop on Linguistically Interpreted Corpora - LINC", April 13-14, 2003, Budapest, Hungary.