This HTML5 document contains 46 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix   IRI
n20      http://linked.opendata.cz/ontology/domain/vavai/riv/typAkce/
dcterms  http://purl.org/dc/terms/
n16      http://purl.org/net/nknouf/ns/bibtex#
n17      http://localhost/temp/predkladatel/
n9       http://linked.opendata.cz/resource/domain/vavai/vysledek/RIV%2F00216224%3A14330%2F13%3A00070327%21RIV14-MSM-14330___/
n14      http://linked.opendata.cz/resource/domain/vavai/projekt/
n4       http://linked.opendata.cz/resource/domain/vavai/riv/tvurce/
n11      http://linked.opendata.cz/resource/domain/vavai/subjekt/
n10      http://linked.opendata.cz/ontology/domain/vavai/
n22      https://schema.org/
s        http://schema.org/
skos     http://www.w3.org/2004/02/skos/core#
n3       http://linked.opendata.cz/ontology/domain/vavai/riv/
n2       http://linked.opendata.cz/resource/domain/vavai/vysledek/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
n18      http://linked.opendata.cz/ontology/domain/vavai/riv/klicoveSlovo/
n19      http://linked.opendata.cz/ontology/domain/vavai/riv/duvernostUdaju/
xsdh     http://www.w3.org/2001/XMLSchema#
n21      http://linked.opendata.cz/ontology/domain/vavai/riv/aktivita/
n6       http://linked.opendata.cz/ontology/domain/vavai/riv/jazykVysledku/
n15      http://linked.opendata.cz/ontology/domain/vavai/riv/obor/
n8       http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/
n7       http://reference.data.gov.uk/id/gregorian-year/
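
Each prefixed name in the statements below (for example n8:D) abbreviates a full IRI: the prefix is replaced by its IRI from the table above and the local part is appended. The following is a minimal sketch of that expansion; the dictionary copies only a few rows of the table, and the names PREFIXES and expand are hypothetical helpers introduced for illustration.

```python
# A few rows copied from the prefix table; extend as needed.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "n3": "http://linked.opendata.cz/ontology/domain/vavai/riv/",
    "n7": "http://reference.data.gov.uk/id/gregorian-year/",
    "n8": "http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI using the prefix table."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("n8:D"))
print(expand("n7:2013"))
```

The local part is appended as-is, so percent-encoded names such as those under n18 stay encoded in the resulting IRI.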

Statements

Subject Item
n2:RIV%2F00216224%3A14330%2F13%3A00070327%21RIV14-MSM-14330___
rdf:type
n10:Vysledek skos:Concept
dcterms:description
The paper presents a work still in progress, but with promising results. We offer a new method of construction of word to number and number to word indices for very large corpus data (tens of billions of tokens), which is up to an order of magnitude faster than the current approach. We use HAT-trie for sorting the data and Daciuk’s algorithm for building a minimal deterministic finite state automaton from sorted data. The latter we reimplemented and our new implementation is roughly three times faster and with smaller memory footprint than the one of Daciuk. This is useful not only for building word-number indices, but also for many other applications, e.g. building data for morphological analysers.
dcterms:title
Fast Construction of a Word-Number Index for Large Data
skos:prefLabel
Fast Construction of a Word-Number Index for Large Data
skos:notation
RIV/00216224:14330/13:00070327!RIV14-MSM-14330___
n10:predkladatel
n11:orjk%3A14330
n3:aktivita
n21:S n21:P
n3:aktivity
P(LM2010013), S
n3:dodaniDat
n7:2014
n3:domaciTvurceVysledku
n4:1322451 n4:6616844 n4:5837189
n3:druhVysledku
n8:D
n3:duvernostUdaju
n19:S
n3:entitaPredkladatele
n9:predkladatel
n3:idSjednocenehoVysledku
74703
n3:idVysledku
RIV/00216224:14330/13:00070327
n3:jazykVysledku
n6:eng
n3:klicovaSlova
word to number index; number to word index; finite state automata; hat-trie
n3:klicoveSlovo
n18:word%20to%20number%20index n18:hat-trie n18:finite%20state%20automata n18:number%20to%20word%20index
n3:kontrolniKodProRIV
[7EBD1E1A47FE]
n3:mistoKonaniAkce
Brno
n3:mistoVydani
Brno
n3:nazevZdroje
RASLAN 2013 Recent Advances in Slavonic Natural Language Processing
n3:obor
n15:IN
n3:pocetDomacichTvurcuVysledku
3
n3:pocetTvurcuVysledku
3
n3:projekt
n14:LM2010013
n3:rokUplatneniVysledku
n7:2013
n3:tvurceVysledku
Rychlý, Pavel; Šmerk, Pavel; Jakubíček, Miloš
n3:typAkce
n20:CST
n3:zahajeniAkce
2013-01-01+01:00
s:numberOfPages
5
n16:hasPublisher
Tribun EU
n22:isbn
9788026305200
n17:organizacniJednotka
14330
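
The dcterms:description above mentions building a minimal deterministic finite state automaton from sorted data with Daciuk's algorithm and using it for word to number and number to word lookup. The sketch below illustrates that general idea on a toy word list. It is not the authors' implementation: Python's built-in sorted() stands in for the HAT-trie sort, and the numbering scheme (counting the words reachable from each state, so that a word's number is its rank in sorted order) is an assumption that the record does not confirm. All names (State, build_minimal_dfa, word_to_number, number_to_word) are hypothetical.

```python
class State:
    """Automaton state; `edges` keeps characters in insertion (sorted) order."""

    def __init__(self):
        self.edges = {}     # char -> State
        self.final = False
        self.count = 0      # number of words accepted starting from this state

    def signature(self):
        # Two states are equivalent if finality and outgoing edges match.
        return (self.final, tuple((ch, id(t)) for ch, t in self.edges.items()))


def build_minimal_dfa(sorted_words):
    """Incremental construction from strictly increasing input (Daciuk-style)."""
    register = {}           # signature -> canonical state
    root = State()

    def replace_or_register(state):
        ch, child = next(reversed(state.edges.items()))  # most recent edge
        if child.edges:
            replace_or_register(child)
        sig = child.signature()
        if sig in register:
            state.edges[ch] = register[sig]   # reuse an equivalent state
        else:
            register[sig] = child

    previous = None
    for word in sorted_words:
        if previous is not None and word <= previous:
            raise ValueError("input must be sorted and free of duplicates")
        previous = word
        state, i = root, 0
        while i < len(word) and word[i] in state.edges:  # common prefix
            state = state.edges[word[i]]
            i += 1
        if state.edges:
            replace_or_register(state)        # minimize the previous suffix
        for ch in word[i:]:                   # append the new suffix
            state.edges[ch] = State()
            state = state.edges[ch]
        state.final = True
    if root.edges:
        replace_or_register(root)

    def fill_counts(state):
        if state.count == 0:                  # not computed yet
            state.count = int(state.final) + sum(
                fill_counts(t) for t in state.edges.values())
        return state.count

    fill_counts(root)
    return root


def word_to_number(root, word):
    """Return the rank of `word` in sorted order, or None if it is absent."""
    state, number = root, 0
    for ch in word:
        if ch not in state.edges:
            return None
        if state.final:
            number += 1                       # a shorter word ends here
        for other, target in state.edges.items():
            if other == ch:
                break
            number += target.count            # skip words under smaller edges
        state = state.edges[ch]
    return number if state.final else None


def number_to_word(root, number):
    """Inverse of word_to_number; returns None if `number` is out of range."""
    if not 0 <= number < root.count:
        return None
    state, chars = root, []
    while True:
        if state.final:
            if number == 0:
                return "".join(chars)
            number -= 1
        for ch, target in state.edges.items():
            if number < target.count:
                chars.append(ch)
                state = target
                break
            number -= target.count


words = sorted(["corpus", "corpora", "token", "tokens", "trie", "tries"])
root = build_minimal_dfa(words)
print(word_to_number(root, "token"), number_to_word(root, 4))
```

Because equivalent suffixes are merged during construction, the states reached after "token" and after "trie" are the same object in this toy example, which is where the memory saving of a minimal automaton over a plain trie comes from.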