Turkish Parsing Model for Maltparser


Downloads:

The Java reimplementation may be downloaded: ITU Treebank Annotation Tool v3.0

Installation guideline for the Java version (download)

(The ITU Annotation Tool was reimplemented in Java by Hüseyin Sular and Deniz Ece Aktan in July 2011.)

The executable of Version 2.0 may be downloaded: ITU Treebank Annotation Tool v2.2

Copyright:

The ITU Treebank annotation tool was developed for processing Turkish sentences.
The tool consists of three annotation stages:
morphological analysis, morphological disambiguation and syntactic analysis.
Each of these stages is integrated with existing analyzers in order to guide human annotators.
Our semiautomatic treebank annotation tool is currently used both for creating new data sets and
for correcting the existing Turkish treebank.

License:
ITU Treebank annotation tool by Gülşen Eryiğit is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial — You may not use this work for commercial purposes.
Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

Attribution Info:
Please cite the following paper if you make use of this resource in your research.

@inproceedings{Eryigit07,
author={G{\"u}l{\c s}en Eryi{\u g}it},
title={ITU Treebank Annotation Tool},
booktitle={Proceedings of the ACL workshop on Linguistic Annotation (LAW 2007)},
year={2007},
month={24-30 June},
address={Prague}
}

*********************************************************************************************************

References:

Gülşen Eryiğit. ITU Treebank Annotation Tool. In Proceedings of the Linguistic Annotation Workshop at ACL 2007, Prague, 24-30 June 2007.

FAQ:

Q. Is it possible to change the language of the interface?

A. Yes. The interface is currently available in both English and Turkish. To add a new interface language, apply the following steps. First, examine the file EnglishInterface.txt, which is located in the base directory of the project.

Then create a new *Interface.txt file (e.g. SpanishInterface.txt) that follows the same order numbers and format for the button names.

Finally, in the Main.java class, add a new button and put the following lines of code in its action listener:

lang.setLangIndex(2);

initialize();

changeLan();

The argument '2' of the first method call is the language index: 0 is Turkish, 1 is English, and further numbers (2, 3, etc.) may be assigned to other languages.
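
The steps above can be sketched in Java as follows. This is only an illustrative sketch, not the tool's actual code: the class LanguageSwitchSketch, the nested Lang class and the call counters are hypothetical stand-ins so that the example compiles on its own. Only the three calls inside the listener (lang.setLangIndex, initialize, changeLan) are taken from the answer above.

```java
import java.awt.event.ActionListener;

public class LanguageSwitchSketch {

    /** Hypothetical stand-in for the tool's language holder (the "lang" object). */
    static class Lang {
        private int langIndex = 1; // assumed default: 1 = English

        void setLangIndex(int index) {
            this.langIndex = index;
        }

        int getLangIndex() {
            return langIndex;
        }
    }

    final Lang lang = new Lang();

    // Counters that only exist so the sketch can show the calls happened.
    int initializeCalls = 0;
    int changeLanCalls = 0;

    /** Placeholder for the tool's initialize() method. */
    void initialize() {
        initializeCalls++;
    }

    /** Placeholder for the tool's changeLan() method. */
    void changeLan() {
        changeLanCalls++;
    }

    /** Builds the listener for a language button; index 2, 3, ... are new languages. */
    ActionListener languageListener(int index) {
        return event -> {
            lang.setLangIndex(index);
            initialize();
            changeLan();
        };
    }

    public static void main(String[] args) {
        LanguageSwitchSketch sketch = new LanguageSwitchSketch();
        // Simulate a click on the new language button (the event itself is not used).
        sketch.languageListener(2).actionPerformed(null);
        System.out.println("language index: " + sketch.lang.getLangIndex());
    }
}
```

In Main.java, such a listener would be attached to the new button, for example with spanishButton.addActionListener(languageListener(2)), where spanishButton is a hypothetical javax.swing.JButton.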

Q. Why are some Turkish characters not displayed correctly?

A. The input files must be encoded in UTF-8. Please convert your files to UTF-8.
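
If a text editor is not at hand, the conversion can also be done with a few lines of Java. The following is a minimal sketch and not part of the tool: the class name ToUtf8Sketch and the method convertToUtf8 are hypothetical, and the source encoding has to be supplied by the user (legacy Turkish files are often in windows-1254 or ISO-8859-9, but that is an assumption about your data).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ToUtf8Sketch {

    /**
     * Reads the input file using the given source charset and writes the
     * same text to the output file encoded as UTF-8.
     * Bytes that are invalid in the source charset are silently replaced.
     */
    public static void convertToUtf8(Path input, Path output, Charset sourceCharset)
            throws IOException {
        byte[] raw = Files.readAllBytes(input);
        String text = new String(raw, sourceCharset);
        Files.write(output, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ToUtf8Sketch <input> <output> <source-charset>");
            return;
        }
        convertToUtf8(Paths.get(args[0]), Paths.get(args[1]), Charset.forName(args[2]));
    }
}
```

A possible invocation would be: java ToUtf8Sketch sentences.txt sentences-utf8.txt windows-1254 (the file names are placeholders).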

Q. Is it possible to change the dependency labels used in the syntactic analysis screen?

A. Yes. The file "dep.txt", which is located in the main project directory, contains the dependency labels that appear in the combo boxes of the syntactic analysis screen. Edit this file to add or remove labels.

Q. Is it possible to use the tool for other agglutinative languages?
A. The tool may be adapted to other similar languages. Please send an email to gulsenc (at) itu (dot) edu (dot) tr for further assistance. (The output of the other language's morphological analyzer should be in the format of the sample file "output2.txt".)

Q. Is it possible to use the tool under Linux?
A. Possibly, but this has not been tested yet. You will need the Linux binaries for the Xerox tools, and the calls to the Perl scripts within the Java code will have to be changed.

Q. Why did you use Perl?
A. Just for code reuse. The tags used in the output of the Turkish morphological analyzer and in the Turkish Treebank are unfortunately not the same, and the Perl scripts perform the required transformations. These scripts were originally used for creating the CoNLL format of the treebank and are reused as they are. You may discard them or change them in order to adapt the tool to languages other than Turkish. The scripts are prepareconllfromxml.pl and preparexmlfromconll.pl.

Contact:

We would appreciate it if you could send us your bug reports and any updates for adapting the tool to new languages and environments.

gulsenc (at) itu (dot) edu (dot) tr