. "R" . "arabic; contemporary; toolkit; morphological; state; finite; based; corpus"@en . . "0955-792X" . . . "1"^^ . . "4"^^ . "[6198C9CEE7EE]" . "We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for Modern Standard Arabic (MSA) distributed under the GPLv3 license (http://aracomlex.sourceforge.net). The morphological transducer is based on a lexical database specifically constructed for this purpose. In contrast to previous resources, the database is tuned to MSA, eliminating lexical entries no longer attested in contemporary use. The database is built using a corpus of 1,089,111,204 word tokens, a pre-annotation tool, machine learning techniques and knowledge-based pattern matching to automatically acquire lexical knowledge. Our morphological transducer is evaluated and compared to LDC's SAMA (Standard Arabic Morphological Analyser). We also develop a finite-state morphological guesser as part of a methodology for extracting unknown word forms, lemmatizing them, and giving them a priority weight for inclusion in the lexicon."@en . . . . "RIV/00216208:11320/13:10194805" . "We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for Modern Standard Arabic (MSA) distributed under the GPLv3 license (http://aracomlex.sourceforge.net). The morphological transducer is based on a lexical database specifically constructed for this purpose. In contrast to previous resources, the database is tuned to MSA, eliminating lexical entries no longer attested in contemporary use. The database is built using a corpus of 1,089,111,204 word tokens, a pre-annotation tool, machine learning techniques and knowledge-based pattern matching to automatically acquire lexical knowledge. Our morphological transducer is evaluated and compared to LDC's SAMA (Standard Arabic Morphological Analyser). 
We also develop a finite-state morphological guesser as part of a methodology for extracting unknown word forms, lemmatizing them, and giving them a priority weight for inclusion in the lexicon." . "A corpus-based finite-state morphological toolkit for contemporary Arabic" . "Attia, Mohammed" . . . . "A corpus-based finite-state morphological toolkit for contemporary Arabic" . "A corpus-based finite-state morphological toolkit for contemporary Arabic"@en . . "58554" . "11320" . "Genabith, Josef" . "Pecina, Pavel" . "January 8," . . "RIV/00216208:11320/13:10194805!RIV14-MSM-11320___" . "18"^^ . . . "Toral, Antonio" . "A corpus-based finite-state morphological toolkit for contemporary Arabic"@en . "1" . "GB - Spojen\u00E9 kr\u00E1lovstv\u00ED Velk\u00E9 Brit\u00E1nie a Severn\u00EDho Irska" . "Journal of Logic and Computation" . "10.1093/logcom/exs070" . . "http://logcom.oxfordjournals.org/content/early/2013/01/08/logcom.exs070.abstract" . . . . .