About: Internet as a language corpus

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Projekt, within Data Space : linked.opendata.cz associated with source document(s)

Attributes    Values
rdf:type
rdfs:seeAlso
Description
  • Sufficient amounts of language data (text corpora) are essential for methods of computational linguistics and natural language processing. Rapid development of computer technology allows much larger datasets to be processed than before. However, such data is not available. Currently, the largest Czech corpora (such as the Czech National Corpus) contain at most hundreds of millions of tokens, which is not sufficient for many methods. Building text corpora by conventional means is a time-consuming and expensive process and cannot satisfy the needs of current research in the field. The proposed project aims to build a text corpus at least ten times larger than currently available corpora at a far lower cost. The corpus will be built from data publicly available on the internet. Automatically downloaded data will be filtered, cleaned up and linguistically processed. Because processing is fully automatic, the language quality of such a corpus will be lower than that of classical corpora, but its significant advantage will be its size. (en)
  • Building a large Czech text corpus from data available on the internet, and its basic linguistic processing by automatic methods.
Title
  • Internet as a language corpus (en)
  • Internet jako jazykový korpus
skos:notation
  • GA405/09/0278
http://linked.open...avai/cep/aktivita
http://linked.open...kovaStatniPodpora
http://linked.open...ep/celkoveNaklady
http://linked.open...datumDodatniDoRIV
http://linked.open...i/cep/druhSouteze
http://linked.open...ep/duvernostUdaju
http://linked.open.../cep/fazeProjektu
http://linked.open...ai/cep/hlavniObor
http://linked.open...hodnoceniProjektu
http://linked.open...vai/cep/kategorie
http://linked.open.../cep/klicovaSlova
  • internet; language corpus; text data cleaning; tagging (en)
http://linked.open...ep/partnetrHlavni
http://linked.open...inujicichPrijemcu
http://linked.open...cep/pocetPrijemcu
http://linked.open...ocetSpoluPrijemcu
http://linked.open.../pocetVysledkuRIV
http://linked.open...enychVysledkuVRIV
http://linked.open...lneniVMinulemRoce
http://linked.open.../prideleniPodpory
http://linked.open...iciPoslednihoRoku
http://linked.open...atUdajeProjZameru
http://linked.open.../vavai/cep/soutez
http://linked.open...usZobrazovaneFaze
http://linked.open...ai/cep/typPojektu
http://linked.open...ep/ukonceniReseni
http://linked.open.../cep/vedlejsiObor
http://linked.open...ep/zahajeniReseni
http://linked.open...jektu+dodavatelem
  • Řešení projektu proběhlo výtečně, jak z hlediska odborného tak i z hlediska čerpání finančních prostředků. (cs)
  • The project was carried out excellently, both in terms of its technical results and in terms of its use of funds. (en)
http://linked.open...tniCyklusProjektu
http://linked.open.../cep/klicoveSlovo
  • internet
  • language corpus
  • text data cleaning
is http://linked.open...vavai/riv/projekt of
is http://linked.open...vavai/cep/projekt of
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software