About: Context Sensitive Pattern Based Segmentation: A Thai Challenge

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes / Values
rdf:type
Description
  • Written Thai text is a string of symbols without explicit word boundaries. We describe a method for developing a segmentation tool from a corpus of already segmented text. The methodology is based on the technology of competing patterns. A new Unicode pattern generation program, OPATGEN, is used for the learning phase. We show the feasibility of the methodology by generating patterns for Thai segmentation from the already segmented text of the Thai corpus ORCHID: the segmentation algorithm quickly reaches an F-score of 93%. Finally, we enumerate possible new applications of the pattern technique and conclude by suggesting a general Pattern Translation Process. The technology is general and can be applied to other segmentation tasks in any language, such as phonetic and morphological segmentation, word hyphenation, sentence segmentation, and text topic segmentation.
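The description names competing patterns but does not show how they are applied. The following is a minimal sketch, assuming Liang-style patterns as used for TeX hyphenation: digits between letters give a level for that gap, the highest level from any matching pattern wins, and an odd winning level marks a boundary while an even one suppresses it. The function names, the toy Latin-letter patterns, and the boundary-based F-score are illustrative assumptions; they are not taken from OPATGEN or from the ORCHID experiment, and the record does not say how the reported F-score was computed.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'a2b' into its letters 'ab' and gap levels [0, 2, 0]."""
    letters, levels = [], [0]
    for ch in pattern:
        if ch in "0123456789":   # ASCII digits only, so Thai digits stay ordinary text
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def segment(text, patterns):
    """Segment text with competing patterns; '.' marks the text edges (assumed absent from text)."""
    padded = "." + text + "."
    # scores[j] holds the winning level for the gap just before padded[j].
    scores = [0] * (len(padded) + 1)
    for letters, levels in patterns:
        start = padded.find(letters)
        while start != -1:
            for offset, level in enumerate(levels):
                scores[start + offset] = max(scores[start + offset], level)
            start = padded.find(letters, start + 1)
    words, last = [], 0
    for i in range(1, len(text)):
        if scores[i + 1] % 2 == 1:   # odd level: boundary before text[i]
            words.append(text[last:i])
            last = i
    words.append(text[last:])
    return words


def boundaries(words):
    """Return the set of character offsets at which a segmentation cuts the text."""
    cuts, pos = set(), 0
    for word in words[:-1]:
        pos += len(word)
        cuts.add(pos)
    return cuts


def f_score(predicted, gold):
    """Harmonic mean of boundary precision and recall (one possible definition)."""
    p, g = boundaries(predicted), boundaries(gold)
    hits = len(p & g)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(p), hits / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy patterns: '1b' proposes a cut before every b, 'a2b' vetoes it after a,
# and 'ca3b' overrides the veto when the context is 'cab'.
toy_patterns = [parse_pattern(p) for p in ["1b", "a2b", "ca3b"]]
print(segment("abcab", toy_patterns))
```

In this toy run the first `b` stays attached to the preceding `a` because the level-2 pattern outranks the level-1 pattern, while the second `b` is split off because the longer level-3 pattern outranks both. Real pattern sets would be generated from a segmented corpus rather than written by hand.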
Title
  • Context Sensitive Pattern Based Segmentation: A Thai Challenge
skos:prefLabel
  • Context Sensitive Pattern Based Segmentation: A Thai Challenge
skos:notation
  • RIV/00216224:14330/03:00008605!RIV08-MSM-14330___
http://linked.open.../vavai/riv/strany
  • 65-72
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • Z(MSM 143300003)
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 602093
http://linked.open...ai/riv/idVysledku
  • RIV/00216224:14330/03:00008605
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • segmentation Thai competing patterns (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [4F982D010574]
http://linked.open...v/mistoKonaniAkce
  • Budapest
http://linked.open...i/riv/mistoVydani
  • Budapest
http://linked.open...i/riv/nazevZdroje
  • Proceedings of EACL 2003 workshop Computational Linguistics for South Asian Languages -- Expanding Synergies with Europe
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Sojka, Petr
  • Antoš, David
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
http://linked.open...n/vavai/riv/zamer
number of pages
http://purl.org/ne...btex#hasPublisher
  • Association for Computational Linguistics
https://schema.org/isbn
  • 1-932432-02-7
http://localhost/t...ganizacniJednotka
  • 14330