About: Internet as a language corpus

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Projekt, within Data Space : linked.opendata.cz:8890 associated with source document(s)

Attributes   Values
rdf:type
Description
  • Building a large Czech text corpus from data available on the internet, with basic linguistic processing by automatic methods. (cs)
  • Sufficient amounts of language data (text corpora) are absolutely essential for methods of computational linguistics and natural language processing. Rapid development of computer technology allows processing of much larger datasets than before. However, such data is not available. Currently, the largest Czech corpora contain at most hundreds of millions of tokens (Czech National Corpus), which is not sufficient for many methods. Building text corpora is a time-consuming and expensive process and cannot possibly satisfy the needs of current research in the field. The proposed project aims to build a text corpus at least ten times larger than currently available corpora at incomparably lower expense. The corpus will be built from data publicly available on the internet. Automatically downloaded data will be filtered, cleaned up, and linguistically processed. Because processing is completely automatic, the language quality of such a corpus will be lower than that of classical corpora, but its significant advantage will be its size. (en)
Title
  • Internet as a language corpus (en)
  • Internet jako jazykový korpus (cs)
http://linked.open...vai/cislo-smlouvy
http://linked.open...avai/druh-souteze
http://linked.open...domain/vavai/faze
http://linked.open...vavai/hlavni-obor
http://linked.open...vai/vedlejsi-obor
http://linked.open...vavai/id-aktivity
http://linked.open.../vavai/id-souteze
http://linked.open...n/vavai/kategorie
http://linked.open...vai/klicova-slova
  • internet; language corpus; text data cleaning; tagging (en)
http://linked.open...avai/konec-reseni
http://linked.open...nujicich-prijemcu
http://linked.open...avai/poskytovatel
http://linked.open...avai/start-reseni
http://linked.open...ai/statni-podpora
http://linked.open...vavai/typProjektu
http://linked.open...ai/uznane-naklady
http://linked.open...ai/pocet-prijemcu
http://linked.open...cet-spoluprijemcu
http://linked.open...ai/pocet-vysledku
http://linked.open...ku-zverejnovanych
is http://linked.open...ain/vavai/projekt of

