About: Teraman: A tool for N-gram extraction from large datasets

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes   Values
rdf:type
Description
  • In natural language processing (NLP), text documents are mostly represented by single words. Recent studies have shown that this approach can often be improved by employing other, more sophisticated features. Among them, N-grams in particular have been successfully used for this purpose, and many algorithms and procedures for their extraction have been proposed. However, they are usually not primarily intended for large data processing, which has recently become a critical task. In this paper we present an algorithm for N-gram extraction from huge datasets. The experiments indicate that our approach reaches outstanding results among other available solutions in terms of speed and amount of processed data.
  • In natural language processing (NLP), text documents are mostly represented by single words. Recent studies have shown that this approach can often be improved by employing other, more sophisticated features. Among them, N-grams in particular have been successfully used for this purpose, and many algorithms and procedures for their extraction have been proposed. However, they are usually not primarily intended for large data processing, which has recently become a critical task. In this paper we present an algorithm for N-gram extraction from huge datasets. The experiments indicate that our approach reaches outstanding results among other available solutions in terms of speed and amount of processed data. (en)
  • In natural language processing tasks, individual words are most often used to represent text documents. Overall results can, however, often be improved by using other, more sophisticated features. These include n-grams, for whose extraction algorithms based on various principles have been published. Existing techniques, however, are not primarily designed for processing large volumes of data, which is currently a fundamental requirement. In this article we present an algorithm for n-gram extraction from large text corpora. Comparisons with other approaches indicate that our solution achieves significantly better results with regard to (cs, translated)
Title
  • Teraman: A tool for N-gram extraction from large datasets
  • Teraman: Nástroj pro extrakci N-gramů z rozsáhlých textů (cs)
  • Teraman: A tool for N-gram extraction from large datasets (en)
skos:prefLabel
  • Teraman: A tool for N-gram extraction from large datasets
  • Teraman: Nástroj pro extrakci N-gramů z rozsáhlých textů (cs)
  • Teraman: A tool for N-gram extraction from large datasets (en)
skos:notation
  • RIV/49777513:23520/07:00000331!RIV08-MSM-23520___
http://linked.open.../vavai/riv/strany
  • 209-216
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(2C06009)
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 454680
http://linked.open...ai/riv/idVysledku
  • RIV/49777513:23520/07:00000331
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • large data processing; N-gram extraction; batch processing (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [5C39C411B019]
http://linked.open...v/mistoKonaniAkce
  • Cluj-Napoca
http://linked.open...i/riv/mistoVydani
  • New York
http://linked.open...i/riv/nazevZdroje
  • IEEE 3rd International conference on intelligent computer communication and processing
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Tesař, Roman
  • Hanák, Ivo
  • Češka, Zdeněk
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
number of pages
http://purl.org/ne...btex#hasPublisher
  • IEEE
https://schema.org/isbn
  • 1-4244-1491-1
http://localhost/t...ganizacniJednotka
  • 23520
is http://linked.open...avai/riv/vysledek of