| Reference : Automated text categorization in a dead language. The detection of genres in Late Egyptian |
| Scientific congresses and symposiums : Paper published in a book | |||
| Arts & humanities : Languages & linguistics Arts & humanities : Classical & oriental studies Engineering, computing & technology : Computer science | |||
| http://hdl.handle.net/2268/110298 | |||
| Automated text categorization in a dead language. The detection of genres in Late Egyptian | |
| English | |
Gohy, Stéphanie [Université de Liège - ULg > Département des sciences de l'antiquité > Egyptologie] | |
Martin Leon, Benjamin [Université de Liège - ULg > Département des sciences de l'antiquité > Egyptologie] | |
Polis, Stéphane [Université de Liège - ULg > Département des sciences de l'antiquité > Egyptologie] | |
| 2013 | |
| Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010 | |
Polis, Stéphane | |
Winand, Jean | |
| Presses Universitaires de Liège | |
| Aegyptiaca Leodiensia 9 | |
| 61-74 | |
| Yes | |
| Yes | |
| International | |
| Liège | |
| Belgium | |
| Informatique & Égyptologie 2010. Texts, Languages & Information Technology in Egyptology | |
| 6-8 July 2010 | |
| Stéphane Polis & Jean Winand | |
| Liège | |
| Belgium | |
| [en] This paper is a first step in applying machine learning methods typical of Automated Text Categorization (ATC) to Automatic Genre Identification (AGI) in Late Egyptian, a language written in hieroglyphic or hieratic script and attested in documents from Ancient Egypt dating from ca. 1350-700 BCE. The study is divided into three parts. After a general introduction to AGI (§1), we introduce the levels of annotation that are integrated in the Ramses corpus and can be used when performing AGI on Late Egyptian (§2). In the following section (§3), we offer a brief survey of the types of features that have been discussed in the literature on AGI, before proceeding with three case studies in which we apply supervised machine learning methods, namely the naïve Bayes classifier (§4.1), the Support Vector Machine (§4.2), and the Segment and Combine approach (§4.3), to a selection of texts in the corpus. Their respective performances are tested using lexical, part-of-speech and inflectional features. | |
| Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS | |
| Researchers | |