Title: Teraman: A tool for N-gram extraction from large datasets
Czech title (translated): Teraman: A tool for extracting N-grams from large texts
Authors: Hanák, Ivo; Češka, Zdeněk; Tesař, Roman
Conference: IEEE 3rd International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca
Publisher: IEEE, New York
Year: 2007
Pages: 209-216
ISBN: 1-4244-1491-1
Keywords: large data processing; N-gram extraction; batch processing
Project: P(2C06009)
Record identifiers: RIV/49777513:23520/07:00000331; RIV/49777513:23520/07:00000331!RIV08-MSM-23520___
Other record values: 454680; 23520; [5C39C411B019]; 3; 8; 3

Abstract: In natural language processing (NLP), text documents are mainly represented by single words. Recent studies have shown that this approach can often be improved by employing other, more sophisticated features. Among them, N-grams in particular have been used successfully for this purpose, and many algorithms and procedures for their extraction have been proposed. However, these are usually not primarily intended for processing large data, which has become a critical task. In this paper we present an algorithm for N-gram extraction from huge datasets. The experiments indicate that our approach achieves outstanding results compared with other available solutions in terms of speed and the amount of data processed.

Abstract (Czech version, translated): In natural language processing tasks, individual words are most often used to represent text documents. However, overall results can often be improved by using other, more sophisticated features. These include n-grams, for whose extraction algorithms based on various principles have been published. Existing techniques, however, are not primarily designed for processing large volumes of data, which is currently an essential requirement. In this paper we present an algorithm for extracting n-grams from large text corpora. Comparisons with other approaches indicate that our solution achieves significantly better results with respect to …
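The abstract does not describe how Teraman itself extracts N-grams, so the following is not the authors' algorithm. It is a minimal Python sketch, under stated assumptions, of one generic batch-oriented technique for counting N-grams when the counts do not fit in memory: count one batch in memory, spill it to disk as a run sorted by N-gram, then k-way merge the runs and sum equal N-grams. The function names (`count_ngrams`, `ngrams`, `_spill`, `_read_run`), the `max_in_memory` threshold, lowercase whitespace tokenization, and the rule that N-grams never cross document boundaries are all hypothetical choices made for illustration.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[str]:
    """Yield the contiguous n-grams of a token list, joined by single spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def _spill(counts: Counter, tmpdir: str, runs: List[str]) -> None:
    """Write one batch of counts to disk as a run sorted by n-gram, then free the memory."""
    path = os.path.join(tmpdir, f"run{len(runs)}.tsv")
    with open(path, "w", encoding="utf-8") as f:
        for gram in sorted(counts):
            f.write(f"{gram}\t{counts[gram]}\n")
    runs.append(path)
    counts.clear()


def _read_run(path: str) -> Iterator[Tuple[str, int]]:
    """Stream (n-gram, count) pairs back from one sorted run file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            gram, count = line.rstrip("\n").split("\t")
            yield gram, int(count)


def count_ngrams(docs: Iterable[str], n: int,
                 max_in_memory: int = 1_000_000) -> Iterator[Tuple[str, int]]:
    """Count n-grams over a document stream with bounded memory (hypothetical helper).

    Counts accumulate in memory until `max_in_memory` distinct n-grams are
    held, then are spilled to disk as a sorted run. The runs are finally
    merged and equal n-grams summed. Yields (n-gram, total count) pairs
    in sorted order.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        runs: List[str] = []
        counts: Counter = Counter()
        for doc in docs:
            # Assumption: lowercase + whitespace tokenization; n-grams never cross documents.
            counts.update(ngrams(doc.lower().split(), n))
            if len(counts) >= max_in_memory:
                _spill(counts, tmpdir, runs)
        if counts:
            _spill(counts, tmpdir, runs)
        # k-way merge of the sorted runs; equal n-grams arrive adjacent.
        merged = heapq.merge(*(_read_run(p) for p in runs), key=lambda item: item[0])
        for gram, group in groupby(merged, key=lambda item: item[0]):
            yield gram, sum(count for _, count in group)
```

For example, `list(count_ngrams(["the cat sat", "the cat ran"], 2, max_in_memory=2))` returns `[("cat ran", 1), ("cat sat", 1), ("the cat", 2)]`. Keeping each run sorted means the final merge reads every run sequentially and holds only one pending entry per run in memory, so peak memory is governed by `max_in_memory` rather than by the size of the corpus.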