Asst.Prof.Dr. Gülşen Cebiroğlu Eryiğit


Turkish Named Entity Recognizer

On this page, we introduce a new Turkish Named Entity Recognizer.

The details of this work are given in the paper listed under Attribution Info below; please refer to it when using the tool.

Features

Copyright

Download

Usage


Features:

- The model is pretrained on news data.

- It is written in Java and was developed with Eclipse.

- It uses CRFs (conditional random fields) as its learning algorithm.


Copyright: Turkish NER Tagger tool by Gökhan Akın ŞEKER and Gülşen ERYİĞİT is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Please see http://creativecommons.org/licenses/by-nc-sa/3.0/ for details.

You are free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work

Under the following conditions:
- Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
- Noncommercial: You may not use this work for commercial purposes.
- Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

Attribution Info:
Please cite the following paper if you make use of this resource in your research:
Gökhan Akın Şeker, Gülşen Eryiğit. Initial explorations on using CRFs for Turkish Named Entity Recognition. In Proceedings of the 24th International Conference on Computational Linguistics, COLING 2012, Mumbai, India, 8-15 December 2012.

@inproceedings{SekerAndEryigit2012,
author={G\"{o}khan Ak{\i}n \c{S}eker and G{\" u}l\-{\c s}en Er\-yi\-{\u g}it},
title={Initial explorations on using CRFs for Turkish Named Entity Recognition},
booktitle={Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012)},
year={2012},
month={8-15 December},
address={Mumbai, India}
}

Third Party Tools Used By Turkish NER Tagger Tool:
- CRF++: Yet Another CRF toolkit by Taku Kudo http://crfpp.googlecode.com/svn/trunk/doc/index.html

Download:

The NER tool is now available as SaaS (software as a service). For further details, please send an email to gulsen (dot) cebiroglu @ itu.edu.tr (remove the spaces and replace "(dot)" with ".").

Usage

Usage - Standalone

The input should be tokenized (one token per line, with sentences separated by a line of five stars, "*****") and morphologically analyzed using the preprocessing tools described in the Turkish NLP pipeline (the morphological analyzer and disambiguator).

The input file should be encoded in UTF-8.
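As a minimal sketch of the input layout described above, the hypothetical Java helper below takes sentences whose tokens have already been analyzed by the Turkish NLP pipeline tools, writes one token line per line with a "*****" line between sentences, and saves the file explicitly as UTF-8. The class and method names (NerInputWriter, toNerInput, write) are illustrative assumptions, not part of the NER tool. The page also does not say whether a separator is expected after the last sentence, so this sketch places it only between sentences.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: assembles already-analyzed sentences into the standalone input format. */
class NerInputWriter {

    /** Separator line between sentences: five stars. */
    static final String SENTENCE_SEPARATOR = "*****";

    /**
     * Each sentence is a list of "surface analysis" lines, e.g.
     * "Recep Recep+Noun+Prop+A3sg+Pnon+Nom", as produced by the
     * morphological analyzer and disambiguator (not done here).
     * Assumption: the separator is placed only between consecutive sentences.
     */
    static String toNerInput(List<List<String>> sentences) {
        StringJoiner out = new StringJoiner("\n");
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) {
                out.add(SENTENCE_SEPARATOR);
            }
            for (String tokenLine : sentences.get(i)) {
                out.add(tokenLine); // one token per line
            }
        }
        return out.toString();
    }

    /** Writes the file explicitly as UTF-8 instead of relying on the platform default charset. */
    static void write(Path file, List<List<String>> sentences) throws IOException {
        Files.writeString(file, toNerInput(sentences) + "\n", StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> sentences = List.of(
                List.of("Recep Recep+Noun+Prop+A3sg+Pnon+Nom", ". .+Punc"),
                List.of("3 3+Num+Card", ". .+Punc"));
        Path file = Files.createTempFile("ner-input", ".txt");
        write(file, sentences);
        System.out.print(Files.readString(file, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly matters on systems whose default encoding is not UTF-8, where Turkish characters such as ğ, ı and ş would otherwise be written incorrectly.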

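The input file for the sentence "Başbakan Recep Tayyip Erdoğan'ın başkanlığında Bakanlar Kurulu toplantısı yapılırken Başbakanlık'ta 3 el silah sesi duyuldu." ("While the Council of Ministers meeting was being held under the chairmanship of Prime Minister Recep Tayyip Erdoğan, three gunshots were heard at the Prime Ministry.") is given below: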

Başbakan başbakan+Noun+A3sg+Pnon+Nom
Recep Recep+Noun+Prop+A3sg+Pnon+Nom
Tayyip Tayyip+Noun+Prop+A3sg+Pnon+Nom
Erdoğan'ın Erdoğan+Noun+Prop+A3sg+Pnon+Gen
başkanlığında başkanlık+Noun+A3sg+P2sg+Loc
Bakanlar bakan+Noun+A3pl+Pnon+Nom
Kurulu kurul+Noun+A3sg+P3sg+Nom
toplantısı toplantı+Noun+A3sg+P3sg+Nom
yapılırken yap+Verb^DB+Verb+Pass+Pos+Aor^DB+Adverb+While
Başbakanlık'ta Başbakanlık+Noun+Prop+A3sg+Pnon+Loc
3 3+Num+Card
el el+Noun+A3sg+Pnon+Nom
silah silah+Noun+A3sg+Pnon+Nom
sesi ses+Noun+A3sg+P3sg+Nom
duyuldu duy+Verb^DB+Verb+Pass+Pos+Past+A3sg
. .+Punc
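Each line above pairs a surface form with its morphological analysis: the root comes first, followed by "+"-separated tags, and "^DB" marks a derivational boundary that starts a new inflectional group (for example, duyuldu is the root duy with a Verb tag, followed by a second group Verb+Pass+Pos+Past+A3sg). The hypothetical Java reader below is one way to split such a line under that reading. It assumes that columns are whitespace-separated and that a root never contains "^DB"; it only illustrates the format and is not code from the NER tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for one "surface analysis" line of the standalone input format. */
class AnalysisLine {
    final String surface;          // surface form, e.g. "duyuldu"
    final String root;             // root before the first tag, e.g. "duy"
    final List<List<String>> igs;  // inflectional groups: tag lists split at "^DB"

    AnalysisLine(String surface, String root, List<List<String>> igs) {
        this.surface = surface;
        this.root = root;
        this.igs = igs;
    }

    static AnalysisLine parse(String line) {
        // Assumption: columns are separated by whitespace; any extra column
        // (such as the NER tag in the output) is ignored here.
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 2) {
            throw new IllegalArgumentException("expected 'surface analysis': " + line);
        }
        String analysis = cols[1];
        // Search from index 1 so that a root consisting of "+" itself is kept.
        int firstPlus = analysis.indexOf('+', 1);
        if (firstPlus < 0) {
            return new AnalysisLine(cols[0], analysis, List.of());
        }
        String root = analysis.substring(0, firstPlus);
        List<List<String>> igs = new ArrayList<>();
        // "^DB+" starts a new inflectional group after a derivation.
        for (String ig : analysis.substring(firstPlus + 1).split("\\^DB\\+")) {
            igs.add(List.of(ig.split("\\+")));
        }
        return new AnalysisLine(cols[0], root, igs);
    }

    public static void main(String[] args) {
        AnalysisLine a = parse("duyuldu duy+Verb^DB+Verb+Pass+Pos+Past+A3sg");
        System.out.println(a.surface + " -> root " + a.root + ", groups " + a.igs);
    }
}
```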

The output contains each input line with its predicted named entity tag appended as a final column:

Başbakan başbakan+Noun+A3sg+Pnon+Nom O
Recep Recep+Noun+Prop+A3sg+Pnon+Nom B-PERSON
Tayyip Tayyip+Noun+Prop+A3sg+Pnon+Nom I-PERSON
Erdoğan'ın Erdoğan+Noun+Prop+A3sg+Pnon+Gen I-PERSON
başkanlığında başkanlık+Noun+A3sg+P2sg+Loc O
Bakanlar bakan+Noun+A3pl+Pnon+Nom B-ORGANIZATION
Kurulu kurul+Noun+A3sg+P3sg+Nom I-ORGANIZATION
toplantısı toplantı+Noun+A3sg+P3sg+Nom O
yapılırken yap+Verb^DB+Verb+Pass+Pos+Aor^DB+Adverb+While O
Başbakanlık'ta Başbakanlık+Noun+Prop+A3sg+Pnon+Loc B-ORGANIZATION
3 3+Num+Card O
el el+Noun+A3sg+Pnon+Nom O
silah silah+Noun+A3sg+Pnon+Nom O
sesi ses+Noun+A3sg+P3sg+Nom O
duyuldu duy+Verb^DB+Verb+Pass+Pos+Past+A3sg O
. .+Punc O
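The last column of the output follows the B-/I-/O scheme: B- marks the first token of an entity, I- a continuation of the same entity type, and O a token outside any entity. Below is a minimal sketch, assuming that the tag is always the last whitespace-separated column and that sentence separators, if they appear in the output, are "*****" lines, of how a client could collect tagged tokens into entity spans (for the example above: the PERSON Recep Tayyip Erdoğan'ın, the ORGANIZATION Bakanlar Kurulu and the ORGANIZATION Başbakanlık'ta). EntitySpans is a hypothetical helper name, and treating a stray I- tag as the start of a new entity is a design choice of this sketch, not documented behavior of the tool.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups B-/I- tagged tokens from the tagger output into entity spans. */
class EntitySpans {

    /** One recognized entity: its type (e.g. PERSON) and the surface tokens it covers. */
    static final class Entity {
        final String type;
        final List<String> tokens = new ArrayList<>();

        Entity(String type) {
            this.type = type;
        }

        @Override
        public String toString() {
            return type + ": " + String.join(" ", tokens);
        }
    }

    static List<Entity> extract(List<String> outputLines) {
        List<Entity> entities = new ArrayList<>();
        Entity current = null;
        for (String line : outputLines) {
            String trimmed = line.trim();
            // Blank lines and the "*****" sentence separator close any open entity.
            if (trimmed.isEmpty() || trimmed.equals("*****")) {
                current = null;
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            String surface = cols[0];
            String tag = cols[cols.length - 1]; // assumption: NER tag is the last column
            if (tag.startsWith("B-")) {
                current = new Entity(tag.substring(2));
                current.tokens.add(surface);
                entities.add(current);
            } else if (tag.startsWith("I-")) {
                String type = tag.substring(2);
                if (current == null || !current.type.equals(type)) {
                    // Lenient repair: an I- tag with no matching open entity starts a new one.
                    current = new Entity(type);
                    entities.add(current);
                }
                current.tokens.add(surface);
            } else {
                current = null; // "O" (or any other tag) is outside every entity
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> output = List.of(
                "Recep Recep+Noun+Prop+A3sg+Pnon+Nom B-PERSON",
                "Tayyip Tayyip+Noun+Prop+A3sg+Pnon+Nom I-PERSON",
                "Bakanlar bakan+Noun+A3pl+Pnon+Nom B-ORGANIZATION",
                "Kurulu kurul+Noun+A3sg+P3sg+Nom I-ORGANIZATION",
                "3 3+Num+Card O");
        for (Entity e : extract(output)) {
            System.out.println(e);
        }
    }
}
```

On this shortened copy of the example output, main prints "PERSON: Recep Tayyip" and "ORGANIZATION: Bakanlar Kurulu", one per line.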