This HTML5 document contains 45 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix    IRI
n7        http://linked.opendata.cz/ontology/domain/vavai/riv/typAkce/
n20       http://linked.opendata.cz/resource/domain/vavai/vysledek/RIV%2F00216208%3A11320%2F12%3A10130077%21RIV13-MSM-11320___/
dcterms   http://purl.org/dc/terms/
n13       http://purl.org/net/nknouf/ns/bibtex#
n3        http://localhost/temp/predkladatel/
n9        http://linked.opendata.cz/resource/domain/vavai/projekt/
n16       http://linked.opendata.cz/resource/domain/vavai/subjekt/
n14       http://linked.opendata.cz/ontology/domain/vavai/
n19       https://schema.org/
s         http://schema.org/
skos      http://www.w3.org/2004/02/skos/core#
n4        http://linked.opendata.cz/ontology/domain/vavai/riv/
n2        http://linked.opendata.cz/resource/domain/vavai/vysledek/
rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
n5        http://linked.opendata.cz/ontology/domain/vavai/riv/klicoveSlovo/
n10       http://linked.opendata.cz/ontology/domain/vavai/riv/duvernostUdaju/
xsdh      http://www.w3.org/2001/XMLSchema#
n18       http://linked.opendata.cz/ontology/domain/vavai/riv/jazykVysledku/
n8        http://linked.opendata.cz/ontology/domain/vavai/riv/aktivita/
n21       http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/
n15       http://linked.opendata.cz/ontology/domain/vavai/riv/obor/
n12       http://reference.data.gov.uk/id/gregorian-year/
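Each prefixed name in the statements below stands for a full IRI, formed by joining the prefix's IRI from the table above with the local part after the colon. The following is a minimal sketch of that expansion, not part of the original page. The `PREFIXES` dictionary copies only a few rows of the table, and the `expand` helper is a hypothetical name introduced for illustration.

```python
# A few prefix-to-IRI mappings copied from the table above (not the full table).
PREFIXES = {
    "n2": "http://linked.opendata.cz/resource/domain/vavai/vysledek/",
    "n4": "http://linked.opendata.cz/ontology/domain/vavai/riv/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'skos:prefLabel' into a full IRI.

    Hypothetical helper: splits on the first colon and concatenates the
    prefix IRI with the local part. Raises KeyError for unknown prefixes.
    """
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    print(expand("skos:prefLabel"))
    print(expand("n4:druhVysledku"))
```

Splitting on the first colon only keeps local parts that themselves contain percent-encoded or punctuation characters intact.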

Statements

Subject Item
n2:RIV%2F00216208%3A11320%2F12%3A10130077%21RIV13-MSM-11320___
rdf:type
skos:Concept n14:Vysledek
dcterms:description
This paper describes the creation process of an Indonesian-English parallel corpus (IDENTIC). The corpus contains 45,000 sentences collected from different sources in different genres. Several manual text preprocessing tasks, such as alignment and spelling correction, are applied to the corpus to assure its quality. We also apply language specific text processing such as tokenization on both sides and clitic normalization on the Indonesian side. The corpus is available in two different formats: 'plain', stored in text format and 'morphologically enriched', stored in CoNLL format. Some parts of the corpus are publicly available at the IDENTIC homepage.
dcterms:title
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
skos:prefLabel
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
skos:notation
RIV/00216208:11320/12:10130077!RIV13-MSM-11320___
n14:predkladatel
n16:orjk%3A11320
n4:aktivita
n8:P
n4:aktivity
P(LC536), P(LM2010013)
n4:dodaniDat
n12:2013
n4:domaciTvurceVysledku
Larasati, Septina Dian
n4:druhVysledku
n21:D
n4:duvernostUdaju
n10:S
n4:entitaPredkladatele
n20:predkladatel
n4:idSjednocenehoVysledku
140252
n4:idVysledku
RIV/00216208:11320/12:10130077
n4:jazykVysledku
n18:eng
n4:klicovaSlova
corpus; parallel; english; indonesian; enriched; morphologically; corpus; identic
n4:klicoveSlovo
n5:enriched n5:identic n5:english n5:parallel n5:morphologically n5:corpus n5:indonesian
n4:kontrolniKodProRIV
[0969C85B3E89]
n4:mistoKonaniAkce
İstanbul, Turkey
n4:mistoVydani
İstanbul, Turkey
n4:nazevZdroje
Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
n4:obor
n15:IN
n4:pocetDomacichTvurcuVysledku
1
n4:pocetTvurcuVysledku
1
n4:projekt
n9:LC536 n9:LM2010013
n4:rokUplatneniVysledku
n12:2012
n4:tvurceVysledku
Larasati, Septina Dian
n4:typAkce
n7:CST
n4:zahajeniAkce
2012-05-21+02:00
s:numberOfPages
5
n13:hasPublisher
European Language Resources Association
n19:isbn
978-2-9517408-7-7
n3:organizacniJednotka
11320
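The local part of the subject IRI is the percent-encoded form of the skos:notation value, and the n4:idVysledku value is the portion of that notation before the "!" separator. The sketch below checks this relationship using only the Python standard library. The constant names are introduced here for illustration, and reading "!" as a separator is inferred from the values listed above rather than from any documented rule.

```python
from urllib.parse import unquote

# Values copied from the statements above.
SUBJECT_LOCAL_NAME = "RIV%2F00216208%3A11320%2F12%3A10130077%21RIV13-MSM-11320___"
NOTATION = "RIV/00216208:11320/12:10130077!RIV13-MSM-11320___"
RESULT_ID = "RIV/00216208:11320/12:10130077"

# Percent-decoding turns %2F into "/", %3A into ":" and %21 into "!".
decoded = unquote(SUBJECT_LOCAL_NAME)

# Assumption: the text before the first "!" is the result identifier.
result_id = decoded.split("!", 1)[0]

if __name__ == "__main__":
    print(decoded == NOTATION)
    print(result_id == RESULT_ID)
```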