About: Internet as a language corpus


An entity of type http://linked.opendata.cz/ontology/domain/vavai/Projekt, within data space linked.opendata.cz

Attributes	Values
rdf:type	http://linked.opendata.cz/ontology/domain/vavai/Projekt
rdfs:seeAlso	http://www.isvav.cz/projectDetail.do?rowId=GA405/09/0278
Description	Sufficient amounts of language data (text corpora) are essential for methods of computational linguistics and natural language processing. Rapid development of computer technology allows processing of much larger datasets than before, but data on that scale is not available. Currently, the largest Czech corpora contain at most hundreds of millions of tokens (Czech National Corpus), which is not sufficient for many methods. Building text corpora is a time-consuming and expensive process and cannot satisfy the needs of current research in the field. The proposed project aims to build a text corpus at least ten times larger than currently available corpora at incomparably lower expense. The corpus will be built from data publicly available on the internet. Automatically downloaded data will be filtered, cleaned up and linguistically processed. Because processing is completely automatic, the language quality of such a corpus will be lower than that of classical corpora, but its significant advantage will be its size. (en) Building a large Czech text corpus from data available on the internet and its basic linguistic processing by automatic methods.
Title	Internet as a language corpus (en) Internet jako jazykový korpus
skos:notation	GA405/09/0278
http://linked.open...avai/cep/aktivita	Standard projects
http://linked.open...kovaStatniPodpora	http://linked.opendata.cz/resource/domain/vavai/projekt/GA405%2F09%2F0278/celkovaStatniPodpora
http://linked.open...ep/celkoveNaklady	http://linked.opendata.cz/resource/domain/vavai/projekt/GA405%2F09%2F0278/celkoveNaklady
http://linked.open...datumDodatniDoRIV	2015-03-02 (xsd:date)
http://linked.open...i/cep/druhSouteze	VS - Public tender in research and development
http://linked.open...ep/duvernostUdaju	S - Complete and truthful data not subject to protection under special legal regulations
http://linked.open.../cep/fazeProjektu	91499435
http://linked.open...ai/cep/hlavniObor	JD - Use of computers, robotics and its applications
http://linked.open...hodnoceniProjektu	V - Outstanding results (of international significance, etc.). The goals and expected results stated in the contract / decision on granting support were also met.
http://linked.open...vai/cep/kategorie	ZV - Basic research
http://linked.open.../cep/klicovaSlova	internet; language corpus; text data cleaning; tagging (en)
http://linked.open...ep/partnetrHlavni	Matematicko-fyzikální fakulta (Faculty of Mathematics and Physics)
http://linked.open...inujicichPrijemcu	0 (xsd:int)
http://linked.open...cep/pocetPrijemcu	1 (xsd:int)
http://linked.open...ocetSpoluPrijemcu	0 (xsd:int)
http://linked.open.../pocetVysledkuRIV	12 (xsd:int)
http://linked.open...enychVysledkuVRIV	12 (xsd:int)
http://linked.open...lneniVMinulemRoce	2011-04-16 (xsd:date)
http://linked.open.../prideleniPodpory	http://linked.opendata.cz/resource/domain/vavai/cep/prideleniPodpory/405%2F09%2F0278
http://linked.open...iciPoslednihoRoku	2011
http://linked.open...atUdajeProjZameru	2012
http://linked.open.../vavai/cep/soutez	SGA02009GA-ST
http://linked.open...usZobrazovaneFaze	DUU
http://linked.open...ai/cep/typPojektu	P - Research and development project financed from the state budget
http://linked.open...ep/ukonceniReseni	2011-12-31 (xsd:date)
http://linked.open.../cep/vedlejsiObor	IN - Informatics
http://linked.open...ep/zahajeniReseni	2009-01-01 (xsd:date)
http://linked.open...jektu+dodavatelem	The project was carried out excellently, both from a professional standpoint and in terms of the use of funds. (en)
http://linked.open...tniCyklusProjektu	ZBKU
http://linked.open.../cep/klicoveSlovo	internet; language corpus; text data cleaning
is http://linked.open...vavai/riv/projekt of	Semi-Supervised Training for the Averaged Perceptron POS Tagger; Dependency Parsing as a Sequence Labeling Task; Ke vztahu kognitivního obsahu a jazykového významu; An Augmented Three-Pass System Combination Framework: DCU Combination System for WMT 2010; Integration of Speech and Text Processing Modules into a Real-Time Dialogue System; Building a Web Corpus of Czech; On Syntax and Semantics of Czech Infinitival Constructions: A Case Study; A High-Quality Web Corpus of Czech; Comparable Fora; Delimitation of information between grammatical rules and lexicon; O rezultativnosti (především) v češtině; Delimitation of information between grammatical rules and lexicon
is http://linked.open...vavai/cep/projekt of	http://linked.opendata.cz/resource/domain/vavai/cep/ucast/GA405%2F09%2F0278/2009/orjk%3A11320 http://linked.opendata.cz/resource/domain/vavai/cep/ucast/GA405%2F09%2F0278/2010/orjk%3A11320 http://linked.opendata.cz/resource/domain/vavai/cep/ucast/GA405%2F09%2F0278/2011/orjk%3A11320 http://linked.opendata.cz/resource/domain/vavai/cep/ucast/GA405%2F09%2F0278/2012/orjk%3A11320
