About: Teraman: A tool for N-gram extraction from large datasets


An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
Description	In natural language processing (NLP), text documents are mainly represented by single words. Recent studies have shown that this approach can often be improved by employing other, more sophisticated features. Among them, N-grams in particular have been successfully used for this purpose, and many algorithms and procedures for their extraction have been proposed. However, these are usually not primarily intended for large data processing, which has become a critical task. In this paper we present an algorithm for N-gram extraction from huge datasets. The experiments indicate that our approach reaches outstanding results among other available solutions in terms of speed and amount of processed data. (en) In natural language processing tasks, single words are most often used to represent text documents. Overall results can, however, often be improved by using other, more sophisticated items. These include n-grams, for whose extraction algorithms based on various principles have been published. Existing techniques, however, are not primarily intended for processing large volumes of data, which is currently an essential requirement. In this article we present an algorithm for extracting n-grams from large text corpora. Comparisons with other approaches indicate that our solution achieves markedly better results with regard to … (cs, translated)
Title	Teraman: A tool for N-gram extraction from large datasets (en); Teraman: Nástroj pro extrakci N-gramů z rozsáhlých textů (cs)
skos:prefLabel	Teraman: A tool for N-gram extraction from large datasets (en); Teraman: Nástroj pro extrakci N-gramů z rozsáhlých textů (cs)
skos:notation	RIV/49777513:23520/07:00000331!RIV08-MSM-23520___
http://linked.open.../vavai/riv/strany	209-216
http://linked.open...avai/riv/aktivita	P
http://linked.open...avai/riv/aktivity	P(2C06009)
http://linked.open...vai/riv/dodaniDat	2008
http://linked.open...aciTvurceVysledku	Hanák, Ivo; Tesař, Roman; Češka, Zdeněk
http://linked.open.../riv/druhVysledku	D - Article in proceedings
http://linked.open...iv/duvernostUdaju	S - Complete and true data not subject to protection under special legal regulations
http://linked.open...titaPredkladatele	Západočeská univerzita v Plzni / Fakulta aplikovaných věd
http://linked.open...dnocenehoVysledku	454680
http://linked.open...ai/riv/idVysledku	RIV/49777513:23520/07:00000331
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	large data processing; N-gram extraction; batch processing (en)
http://linked.open.../riv/klicoveSlovo	batch processing; large data processing; N-gram extraction
http://linked.open...ontrolniKodProRIV	[5C39C411B019]
http://linked.open...v/mistoKonaniAkce	Cluj-Napoca
http://linked.open...i/riv/mistoVydani	New York
http://linked.open...i/riv/nazevZdroje	IEEE 3rd International conference on intelligent computer communication and processing
http://linked.open...in/vavai/riv/obor	JC
http://linked.open...ichTvurcuVysledku	3 (xsd:int)
http://linked.open...cetTvurcuVysledku	3 (xsd:int)
http://linked.open...vavai/riv/projekt	Complex knowledge base tools for natural language communication with the semantic web
http://linked.open...UplatneniVysledku	2007
http://linked.open...iv/tvurceVysledku	Tesař, Roman; Hanák, Ivo; Češka, Zdeněk
http://linked.open...vavai/riv/typAkce	WRD - Worldwide
http://linked.open.../riv/zahajeniAkce	2007-01-01 (xsd:date)
number of pages	8 (xsd:int)
http://purl.org/ne...btex#hasPublisher	IEEE
https://schema.org/isbn	1-4244-1491-1
http://localhost/t...ganizacniJednotka	23520
is http://linked.open...avai/riv/vysledek of	Teraman: A tool for N-gram extraction from large datasets
