This HTML5 document contains 40 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix    IRI
dcterms   http://purl.org/dc/terms/
n14       http://localhost/temp/predkladatel/
n9        http://linked.opendata.cz/resource/domain/vavai/projekt/
n7        http://linked.opendata.cz/resource/domain/vavai/riv/tvurce/
n18       http://linked.opendata.cz/resource/domain/vavai/subjekt/
n16       http://linked.opendata.cz/ontology/domain/vavai/
n10       http://linked.opendata.cz/resource/domain/vavai/vysledek/RIV%2F00216208%3A11320%2F12%3A10194834%21RIV14-MSM-11320___/
rdfs      http://www.w3.org/2000/01/rdf-schema#
skos      http://www.w3.org/2004/02/skos/core#
n3        http://linked.opendata.cz/ontology/domain/vavai/riv/
n2        http://linked.opendata.cz/resource/domain/vavai/vysledek/
rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
n19       http://linked.opendata.cz/ontology/domain/vavai/riv/vyuzitiJinymSubjektem/
n8        http://linked.opendata.cz/ontology/domain/vavai/riv/klicoveSlovo/
n20       http://linked.opendata.cz/ontology/domain/vavai/riv/duvernostUdaju/
xsdh      http://www.w3.org/2001/XMLSchema#
n17       http://linked.opendata.cz/ontology/domain/vavai/riv/aktivita/
n12       http://linked.opendata.cz/ontology/domain/vavai/riv/jazykVysledku/
n15       http://linked.opendata.cz/ontology/domain/vavai/riv/druhVysledku/
n11       http://linked.opendata.cz/ontology/domain/vavai/riv/obor/
n5        http://reference.data.gov.uk/id/gregorian-year/
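Each prefixed name in the statements stands for a full IRI obtained by concatenating the prefix's IRI with the local part. The following is a minimal sketch of that expansion, not part of the original page: the `expand` helper is hypothetical, and `PREFIXES` holds only a subset of the table, copied as listed. It also shows that percent-decoding the local part of the subject IRI yields the `skos:notation` value.

```python
from urllib.parse import unquote

# Subset of the namespace prefix table, copied as listed in the document.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "n2": "http://linked.opendata.cz/resource/domain/vavai/vysledek/",
    "n3": "http://linked.opendata.cz/ontology/domain/vavai/riv/",
    "n5": "http://reference.data.gov.uk/id/gregorian-year/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand 'prefix:local' into a full IRI (hypothetical helper)."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("n3:obor"))
print(expand("n5:2012"))

# The subject's local part is percent-encoded; decoding it gives the notation.
subject = expand("n2:RIV%2F00216208%3A11320%2F12%3A10194834%21RIV14-MSM-11320___")
local_part = subject.rsplit("/", 1)[-1]
print(unquote(local_part))
```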

Statements

Subject Item
n2:RIV%2F00216208%3A11320%2F12%3A10194834%21RIV14-MSM-11320___
rdf:type
skos:Concept n16:Vysledek
rdfs:seeAlso
https://wiki.ufal.ms.mff.cuni.cz/user:zeman:intersecting-parallel-corpora
dcterms:description
The organizers of the annual Workshop on Machine Translation (WMT) prepare and distribute parallel corpora that can be used to train systems for the shared tasks. Two core types of corpora are the News Commentary corpus and the Europarl corpus. Both are available in several language pairs, always between English and another European language: cs-en, de-en, es-en and fr-en. The corpora are not multi-parallel. They come from the same source and there is significant overlap but still some sentences are translated to only a subset of the languages. The bi-parallel subsets do not all have the same number of sentence pairs. Such corpora cannot be directly used to train a system for e.g. de-cs (German-Czech). However, we can use English as a pivot language. If we identify the intersection of the English parts of cs-en and de-en, we can take the non-English counterparts of the overlapping English sentences to create a de-cs parallel corpus. That is what this software does.
dcterms:title
Intersecting Parallel Corpora
skos:prefLabel
Intersecting Parallel Corpora
skos:notation
RIV/00216208:11320/12:10194834!RIV14-MSM-11320___
n16:predkladatel
n18:orjk%3A11320
n3:aktivita
n17:P
n3:aktivity
P(7E11051)
n3:dodaniDat
n5:2014
n3:domaciTvurceVysledku
n7:9363661 n7:2630176
n3:druhVysledku
n15:R
n3:duvernostUdaju
n20:S
n3:ekonomickeParametry
The tool saves costs for obtaining, translating and annotating new parallel data in cases where texts exist for other language pairs.
n3:entitaPredkladatele
n10:predkladatel
n3:idSjednocenehoVysledku
142685
n3:idVysledku
RIV/00216208:11320/12:10194834
n3:interniIdentifikace
IPC
n3:jazykVysledku
n12:eng
n3:klicovaSlova
corpora; parallel; intersecting
n3:klicoveSlovo
n8:parallel n8:corpora n8:intersecting
n3:kontrolniKodProRIV
[6D687EE37D39]
n3:obor
n11:AI
n3:pocetDomacichTvurcuVysledku
2
n3:pocetTvurcuVysledku
2
n3:projekt
n9:7E11051
n3:rokUplatneniVysledku
n5:2012
n3:technickeParametry
https://wiki.ufal.ms.mff.cuni.cz/user:zeman:intersecting-parallel-corpora
n3:tvurceVysledku
Bojar, Ondřej; Zeman, Daniel
n3:vlastnik
n10:vlastnikVysledku
n3:vyuzitiJinymSubjektem
n19:N
n14:organizacniJednotka
11320
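
The dcterms:description value outlines a pivot approach: intersect the English sides of the cs-en and de-en corpora, then pair the non-English counterparts. The sketch below illustrates that idea only. It is not the actual tool: the function name `intersect_via_pivot`, the exact-string matching of English sentences, the choice to keep the first translation when an English sentence repeats, and the toy sentences are all assumptions made for illustration.

```python
def intersect_via_pivot(cs_en, de_en):
    """Build (German, Czech) pairs from two corpora that share English sentences.

    Both inputs are lists of (foreign_sentence, english_sentence) tuples.
    Matching is by exact English string, which is an assumption of this sketch.
    """
    # Index Czech sentences by their English counterpart.
    cs_by_en = {}
    for cs, en in cs_en:
        cs_by_en.setdefault(en, cs)  # keep the first Czech translation seen

    # Walk the German corpus and keep pairs whose English side is in both corpora.
    de_cs = []
    seen = set()
    for de, en in de_en:
        if en in cs_by_en and en not in seen:
            seen.add(en)
            de_cs.append((de, cs_by_en[en]))
    return de_cs


# Invented toy data, not taken from News Commentary or Europarl.
cs_en = [("Dobré ráno.", "Good morning."), ("Děkuji.", "Thank you."), ("Ano.", "Yes.")]
de_en = [("Danke.", "Thank you."), ("Nein.", "No."), ("Guten Morgen.", "Good morning.")]

print(intersect_via_pivot(cs_en, de_en))
# [('Danke.', 'Děkuji.'), ('Guten Morgen.', 'Dobré ráno.')]
```

Sentences present in only one corpus ("Yes." and "No." here) drop out, which reflects the description's point that the bi-parallel subsets do not all contain the same sentence pairs.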