Proceedings of Balisage: The Markup Conference 2010
Balisage Series on Markup Technologies, vol. 5
No
International
Balisage: The Markup Conference 2010
from 3 August 2010 to 6 August 2010
Montréal, QC
Canada
[en] XML ; corpus ; API ; text retrieval ; algorithm ; virtualization
[en] Supporting flexible automated tagging of text-oriented XML documents (i.e., text with interspersed markup) is challenging. It requires tag-aware full-text search (i.e., the ability to skip selected tags or to hide whole sections of the document), match points, and transparent updates. This paper describes an API that addresses these requirements. By virtualizing selected sections of the XML document, the API produces a tag-aware representation, backed by the document, that can be searched (by keyword or regular expression) and updated transparently, and that supports natural linguistic reasoning.
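The idea of a tag-aware, document-backed view can be illustrated with a minimal sketch. This is not the API described in the paper: the function names (`build_view`, `search`, `replace`) and the `hidden` parameter are hypothetical, and the sketch assumes Python's standard `xml.etree.ElementTree`. Tags not listed in `hidden` are skipped (their text stays visible), elements listed in `hidden` are made invisible together with their content, and every piece of the virtual text remembers which node it came from, so that a match can be traced back to, and written back into, the document.

```python
import re
import xml.etree.ElementTree as ET


def build_view(root, hidden=frozenset()):
    """Return (text, spans) for a virtual, tag-free view of the document.

    Each span is (start, end, element, slot), where slot is "text" or
    "tail", so any offset in the virtual text maps back to a tree node.
    Elements whose tag is in `hidden` contribute no content at all.
    """
    chunks, spans, pos = [], [], 0

    def add(elem, slot):
        nonlocal pos
        s = getattr(elem, slot)
        if s:
            chunks.append(s)
            spans.append((pos, pos + len(s), elem, slot))
            pos += len(s)

    def walk(elem):
        if elem.tag in hidden:
            return  # whole section is invisible
        add(elem, "text")
        for child in elem:
            walk(child)
            add(child, "tail")  # the tail belongs to the parent's content

    walk(root)
    return "".join(chunks), spans


def search(root, pattern, hidden=frozenset()):
    """Yield (matched_text, nodes) for each regex match in the view."""
    text, spans = build_view(root, hidden)
    for m in re.finditer(pattern, text):
        nodes = [(elem.tag, slot) for lo, hi, elem, slot in spans
                 if lo < m.end() and hi > m.start()]
        yield m.group(0), nodes


def replace(root, pattern, repl, hidden=frozenset()):
    """Write replacements back into the tree; return how many were made.

    Simplification (an assumption of this sketch): only matches that lie
    inside a single text chunk are updated; others are left untouched.
    """
    text, spans = build_view(root, hidden)
    count = 0
    # Work right to left so earlier offsets stay valid after each edit.
    for m in reversed(list(re.finditer(pattern, text))):
        for lo, hi, elem, slot in spans:
            if lo <= m.start() and m.end() <= hi:
                s = getattr(elem, slot)
                a, b = m.start() - lo, m.end() - lo
                setattr(elem, slot, s[:a] + repl + s[b:])
                count += 1
                break
    return count


doc = "<p>The <b>quick</b> brown<note>editorial aside</note> fox</p>"
root = ET.fromstring(doc)
view_text, _ = build_view(root, hidden={"note"})
hits = list(search(root, r"quick brown fox", hidden={"note"}))
```

Here the phrase "quick brown fox" is found even though it crosses a `<b>` boundary and is interrupted by a hidden `<note>` element, and the match reports the three nodes it touches. A fuller design would also handle updates that span several nodes, which this sketch deliberately leaves out.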