Reference : Automated text categorization in a dead language. The detection of genres in Late Egyptian
Scientific congresses and symposiums : Paper published in a book
Arts & humanities : Languages & linguistics
Arts & humanities : Classical & oriental studies
Engineering, computing & technology : Computer science
http://hdl.handle.net/2268/110298
English
Gohy, Stéphanie [Université de Liège - ULg > Département des sciences de l'antiquité > Egyptologie]
Martin Leon, Benjamin [Université de Liège - ULg > Département des sciences de l'antiquité > Egyptologie]
Polis, Stéphane [Université de Liège - ULg > Département des sciences de l'antiquité > Egyptologie]
2013
Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010
Polis, Stéphane
Winand, Jean
Presses Universitaires de Liège
Aegyptiaca Leodiensia 9
61-74
Yes
Yes
International
Liège
Belgium
Informatique & Égyptologie 2010. Texts, Languages & Information Technology in Egyptology
6-8 July 2010
Stéphane Polis & Jean Winand
Liège
Belgium
[en] This paper is a first step in applying machine learning methods typical of Automated Text Categorization (ATC) for Automatic Genre Identification (AGI) in Late Egyptian, a language written in either hieroglyphic or hieratic scripts that is found in documents from Ancient Egypt dating from ca. 1350-700 BCE. The study is divided into three parts. After a general introduction on AGI (§1), we introduce the levels of annotation that are integrated in the Ramses corpus and can be used when performing AGI on Late Egyptian (§2). In the following section (§3) we offer a brief survey of the types of features that have been discussed in the literature on AGI, before proceeding with three case studies where we apply supervised machine learning methods, namely the naïve Bayes classifier (§4.1), the Support Vector Machine (§4.2), and the Segment and Combine approach (§4.3), to a selection of texts in the corpus. Their respective performances are tested using lexical, part-of-speech and inflectional features.
Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS
Researchers

File(s) associated to this reference

Fulltext file(s):

File | Commentary | Version | Size | Access
AegLeod9_04_Ramses3.pdf | | Publisher postprint | 2.92 MB | Open access


All documents in ORBi are protected by a user license.