Attributes | Values |
---|
rdf:type
| |
Description
| - In natural language processing (NLP), single words are mainly used to represent text documents. Recent studies have shown that this approach can often be improved by employing other, more sophisticated features. Among them, N-grams in particular have been successfully used for this purpose, and many algorithms and procedures for their extraction have been proposed. However, they are usually not primarily intended for large data processing, which has become a critical task. In this paper we present an algorithm for N-gram extraction from huge datasets. The experiments indicate that our approach reaches outstanding results among other available solutions in terms of speed and amount of processed data. (en)
- V úlohách zpracování přirozeného jazyka jsou k reprezentaci textových dokumentů nejčastěji používaná jednotlivá slova. Celkové výsledky lze však často vylepšit použitím dalších, sofistikovanějších položek. Mezi ně patří i n-gramy, pro jejichž extrakci byly publikovány algoritmy založené na různých principech. Existující techniky však nejsou primárně určeny pro zpracování velkého objemu dat, což je v současné době zásadní požadavek. V tomto článku prezentujeme algoritmus pro extrakci n-gramů z rozsáhlých textových korpusů. Srovnání s jinými přístupy naznačují, že naše řešení dosahuje výrazně lepších výsledků s ohledem na (cs)
|
Title
| - Teraman: A tool for N-gram extraction from large datasets
- Teraman: Nástroj pro extrakci N-gramů z rozsáhlých textů (cs)
- Teraman: A tool for N-gram extraction from large datasets (en)
|
skos:prefLabel
| - Teraman: A tool for N-gram extraction from large datasets
- Teraman: Nástroj pro extrakci N-gramů z rozsáhlých textů (cs)
- Teraman: A tool for N-gram extraction from large datasets (en)
|
skos:notation
| - RIV/49777513:23520/07:00000331!RIV08-MSM-23520___
|
http://linked.open.../vavai/riv/strany
| |
http://linked.open...avai/riv/aktivita
| |
http://linked.open...avai/riv/aktivity
| |
http://linked.open...vai/riv/dodaniDat
| |
http://linked.open...aciTvurceVysledku
| |
http://linked.open.../riv/druhVysledku
| |
http://linked.open...iv/duvernostUdaju
| |
http://linked.open...titaPredkladatele
| |
http://linked.open...dnocenehoVysledku
| |
http://linked.open...ai/riv/idVysledku
| - RIV/49777513:23520/07:00000331
|
http://linked.open...riv/jazykVysledku
| |
http://linked.open.../riv/klicovaSlova
| - large data processing; N-gram extraction; batch processing (en)
|
http://linked.open.../riv/klicoveSlovo
| |
http://linked.open...ontrolniKodProRIV
| |
http://linked.open...v/mistoKonaniAkce
| |
http://linked.open...i/riv/mistoVydani
| |
http://linked.open...i/riv/nazevZdroje
| - IEEE 3rd International conference on intelligent computer communication and processing
|
http://linked.open...in/vavai/riv/obor
| |
http://linked.open...ichTvurcuVysledku
| |
http://linked.open...cetTvurcuVysledku
| |
http://linked.open...vavai/riv/projekt
| |
http://linked.open...UplatneniVysledku
| |
http://linked.open...iv/tvurceVysledku
| - Tesař, Roman
- Hanák, Ivo
- Češka, Zdeněk
|
http://linked.open...vavai/riv/typAkce
| |
http://linked.open.../riv/zahajeniAkce
| |
number of pages
| |
http://purl.org/ne...btex#hasPublisher
| |
https://schema.org/isbn
| |
http://localhost/t...ganizacniJednotka
| |
is http://linked.open...avai/riv/vysledek
of | |