This HTML5 document contains 44 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix   IRI
n15      http://linked.opendata.cz/ontology/domain/vavai/riv/typAkce/
dcterms  http://purl.org/dc/terms/
n20      http://purl.org/net/nknouf/ns/bibtex#
n5       http://localhost/temp/predkladatel/
n16      http://linked.opendata.cz/resource/domain/vavai/projekt/
n4       http://linked.opendata.cz/resource/domain/vavai/riv/tvurce/
n18      http://linked.opendata.cz/resource/domain/vavai/subjekt/
n17      http://linked.opendata.cz/ontology/domain/vavai/
n6       https://schema.org/
s        http://schema.org/
rdfs     http://www.w3.org/2000/01/rdf-schema#
skos     http://www.w3.org/2004/02/skos/core#
n3       http://linked.opendata.cz/ontology/domain/vavai/riv/
n2       http://linked.opendata.cz/resource/domain/vavai/vysledek/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
n12      http://linked.opendata.cz/ontology/domain/vavai/riv/klicoveSlovo/
n14      http://linked.opendata.cz/ontology/domain/vavai/riv/duvernostUdaju/
xsdh     http://www.w3.org/2001/XMLSchema#
n22      http://linked.opendata.cz/ontology/domain/vavai/riv/jazykVysledku/
n11      http://linked.opendata.cz/ontology/domain/vavai/riv/aktivita/
n21      http://linked.opendata.cz/ontology/domain/vavai/riv/obor/
n23      http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/
n7       http://linked.opendata.cz/resource/domain/vavai/vysledek/RIV%2F00216224%3A14330%2F12%3A00064722%21RIV13-MSM-14330___/
n13      http://reference.data.gov.uk/id/gregorian-year/
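The prefixes above act as plain string abbreviations: a compact name such as `n3:obor` stands for the namespace IRI with the local part appended. The following is a minimal Python sketch of that expansion, assuming simple concatenation with no extra escaping; the `expand_curie` helper is hypothetical, and only a subset of the table is included.

```python
# Hypothetical subset of the prefix table above (prefix -> namespace IRI).
PREFIXES = {
    "n2": "http://linked.opendata.cz/resource/domain/vavai/vysledek/",
    "n3": "http://linked.opendata.cz/ontology/domain/vavai/riv/",
    "n12": "http://linked.opendata.cz/ontology/domain/vavai/riv/klicoveSlovo/",
    "n13": "http://reference.data.gov.uk/id/gregorian-year/",
    "n17": "http://linked.opendata.cz/ontology/domain/vavai/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a compact name like 'n3:obor' by concatenating namespace and local part.

    Assumption: plain concatenation is enough (no escaping of the local part).
    Raises KeyError when the text before ':' is not a known prefix.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("n3:obor"))
print(expand_curie("n13:2012"))
```

The split happens at the first `:`, so a full IRI such as `http://...` is rejected because `http` is not in the table. It is not misread as a prefixed name.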

Statements

Subject Item
n2:RIV%2F00216224%3A14330%2F12%3A00064722%21RIV13-MSM-14330___
rdf:type
skos:Concept n17:Vysledek
rdfs:seeAlso
http://raslan2012.nlp-consulting.net/program
dcterms:description
The paper presents a work in progress: building a morphologically annotated corpus of Tajik with more than 100 million tokens. The corpus is, and will remain, by far the largest available computer corpus of Tajik: even its current size is almost 85 million tokens. Because available text sources are rather scarce, reaching this goal also requires including lower-quality texts. This short paper briefly reviews the current state of the corpus and the analyzer, discusses problems with either “normalization” or at least categorization of low-quality texts, and finally outlines the perspectives for the near future.
dcterms:title
Towards 100M Morphologically Annotated Corpus of Tajik
skos:prefLabel
Towards 100M Morphologically Annotated Corpus of Tajik
skos:notation
RIV/00216224:14330/12:00064722!RIV13-MSM-14330___
n17:predkladatel
n18:orjk%3A14330
n3:aktivita
n11:P
n3:aktivity
P(LM2010013)
n3:dodaniDat
n13:2013
n3:domaciTvurceVysledku
n4:1322451 n4:8884439 Dovudov, Gulshan
n3:druhVysledku
n23:D
n3:duvernostUdaju
n14:S
n3:entitaPredkladatele
n7:predkladatel
n3:idSjednocenehoVysledku
174736
n3:idVysledku
RIV/00216224:14330/12:00064722
n3:jazykVysledku
n22:eng
n3:klicovaSlova
web corpora; Tajik
n3:klicoveSlovo
n12:Tajik n12:web%20corpora
n3:kontrolniKodProRIV
[7C1CE48E9CB5]
n3:mistoKonaniAkce
Karlova Studánka
n3:mistoVydani
Brno
n3:nazevZdroje
Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2012
n3:obor
n21:AI
n3:pocetDomacichTvurcuVysledku
3
n3:pocetTvurcuVysledku
3
n3:projekt
n16:LM2010013
n3:rokUplatneniVysledku
n13:2012
n3:tvurceVysledku
Šmerk, Pavel; Suchomel, Vít; Dovudov, Gulshan
n3:typAkce
n15:EUR
n3:zahajeniAkce
2012-01-01+01:00
s:numberOfPages
4
n20:hasPublisher
Tribun EU
n6:isbn
9788026303138
n5:organizacniJednotka
14330
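In this record, the subject's local name (after the `n2:` prefix) is the `skos:notation` value with `/`, `:` and `!` percent-encoded. The part of the notation before `!` equals the `n3:idVysledku` value. The following is a minimal Python sketch of that relationship using only `urllib.parse` from the standard library. The helper names (`local_name_to_notation`, `notation_to_local_name`, `result_id`) are hypothetical, and splitting on `!` is an assumption drawn from this single record.

```python
from urllib.parse import quote, unquote

# Local part of the subject IRI, as listed under "Subject Item".
SUBJECT_LOCAL = "RIV%2F00216224%3A14330%2F12%3A00064722%21RIV13-MSM-14330___"


def local_name_to_notation(local: str) -> str:
    """Percent-decode the IRI local name into the plain code used by skos:notation."""
    return unquote(local)


def notation_to_local_name(notation: str) -> str:
    """Percent-encode the code; safe="" escapes '/', ':' and '!' as in the subject IRI."""
    return quote(notation, safe="")


def result_id(notation: str) -> str:
    """Hypothetical helper: keep the part before '!', which matches n3:idVysledku here."""
    return notation.split("!", 1)[0]


notation = local_name_to_notation(SUBJECT_LOCAL)
print(notation)             # RIV/00216224:14330/12:00064722!RIV13-MSM-14330___
print(result_id(notation))  # RIV/00216224:14330/12:00064722
```

`quote(..., safe="")` leaves letters, digits and `_.-~` unescaped and uses uppercase hex. The round trip therefore reproduces the subject's local name exactly.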