Reference : A Virtualization-Based Retrieval and Update API for XML-Encoded Corpora
Scientific congresses and symposiums : Paper published in a book
Engineering, computing & technology : Computer science
Briquet, Cyril [ATILF (CNRS & Nancy-Université)]
Renders, Pascale [Université de Liège - ULg > Département de langues et littératures romanes > Linguistique du français - Dialectologie wallonne]
Petitjean, Etienne [ATILF (CNRS & Nancy-Université)]
Proceedings of Balisage: The Markup Conference 2010
Balisage Series on Markup Technologies, vol. 5
Balisage: The Markup Conference 2010
from 03-08-2010 to 06-08-2010
Montréal, QC
[en] XML ; corpus ; API ; text retrieval ; algorithm ; virtualization
[en] Providing support for flexible automated tagging of text-oriented XML documents (i.e. text with interspersed markup) is challenging. It requires tag-aware full-text search (i.e. the capability to skip some tags or to make whole sections of the document invisible), match points, and transparent updates. This paper describes an API that addresses the problem. Based on the virtualization of selected sections of the XML document, the API produces a tag-aware representation, backed by the document, that is transparently searchable (using keyword search or regular expressions) and updatable, offering support for natural linguistic reasoning.
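The idea of a searchable, updatable view backed by the document can be illustrated with a minimal Python sketch. This is not the API described in the paper: the names `VirtualString`, `search`, and `wrap` are hypothetical, tags are recognized with a simplified regular expression (no comments, CDATA, namespaces, or entity decoding), and an update is limited to wrapping a match in a new element. The sketch only shows how each visible character can keep its offset in the source, so that a match found in the tag-free text can be mapped back to the XML.

```python
import re

# Simplified tag pattern (assumption): start, end, and self-closing tags only.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


class VirtualString:
    """Tag-aware view of an XML string; each visible character keeps its source offset."""

    def __init__(self, xml, invisible=()):
        self.xml = xml
        self.invisible = set(invisible)  # elements whose whole content is hidden
        self._build()

    def _build(self):
        chars, offsets = [], []
        hidden_depth = 0  # > 0 while inside an invisible element
        pos = 0
        for m in TAG.finditer(self.xml):
            if hidden_depth == 0:
                for i in range(pos, m.start()):
                    chars.append(self.xml[i])
                    offsets.append(i)
            closing, name, self_closing = m.groups()
            if name in self.invisible and not self_closing:
                hidden_depth += -1 if closing else 1
            pos = m.end()
        if hidden_depth == 0:  # text after the last tag, if any
            for i in range(pos, len(self.xml)):
                chars.append(self.xml[i])
                offsets.append(i)
        self.text = "".join(chars)
        self.offsets = offsets

    def search(self, pattern):
        """Yield (matched_text, source_start, source_end) for each non-empty regex match."""
        for m in re.finditer(pattern, self.text):
            if m.end() > m.start():
                yield m.group(0), self.offsets[m.start()], self.offsets[m.end() - 1] + 1

    def wrap(self, src_start, src_end, tag):
        """Surround a source span with <tag>...</tag>, then rebuild the view."""
        depth = 0
        for m in TAG.finditer(self.xml, src_start, src_end):
            if m.group(3):  # self-closing tags do not change nesting
                continue
            depth += -1 if m.group(1) else 1
            if depth < 0:
                raise ValueError("span crosses an element boundary")
        if depth != 0:
            raise ValueError("span crosses an element boundary")
        self.xml = (self.xml[:src_start] + f"<{tag}>" + self.xml[src_start:src_end]
                    + f"</{tag}>" + self.xml[src_end:])
        self._build()


doc = "<p>The cat<note>felis</note> sat on the <hi>mat</hi>.</p>"
vs = VirtualString(doc, invisible={"note"})
print(vs.text)  # The cat sat on the mat.

# The match spans a hidden <note> element in the source.
matched, start, end = next(vs.search(r"cat\s+sat"))
vs.wrap(start, end, "term")
print(vs.xml)
```

In this sketch the search runs on the tag-free text, while the update is applied to the underlying XML, and the view is rebuilt afterwards. Refusing spans that cross an element boundary is a design choice made here to keep the output well-formed; a fuller implementation would need a policy for such matches.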
Researchers ; Professionals

File(s) associated with this reference

Fulltext file (open access): balisage2010-virtual-strings-api-paper.zip (online version: postprint, 233.27 kB)

Additional material (open access): Bal2010briq1111-slides.pdf, slides of the conference presentation (1.04 MB)

