About: Finding Terms in Corpora for Many Languages with the Sketch Engine

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes   Values
rdf:type
rdfs:seeAlso
Description
  • Term candidates for a domain, in a language, can be found by: (1) taking a corpus for the domain, and a reference corpus for the language; (2) identifying the grammatical shape of a term in the language; (3) tokenising, lemmatising and POS-tagging both corpora; (4) identifying (and counting) the items in each corpus which match the grammatical shape; (5) for each item in the domain corpus, comparing its frequency with its frequency in the reference corpus. Then, the items with the highest frequency in the domain corpus in comparison to the reference corpus will be the top term candidates. None of the steps above are unusual or innovative for NLP (see, e.g., (Aker et al., 2013), (Gojun et al., 2012)). However, it is far from trivial to implement them all, for numerous languages, in an environment that makes it easy for non-programmers to find the terms in a domain. This is what we have done in the Sketch Engine (Kilgarriff et al., 2004), and will demonstrate. (en)
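The steps listed in the description can be illustrated with a minimal sketch. It is not the Sketch Engine implementation: the tag names (`ADJ`, `NOUN`), the toy term grammar, the function names, and the smoothed ratio of per-million frequencies used for the comparison are all assumptions chosen for illustration, and the input is assumed to be already tokenised, lemmatised and POS-tagged.

```python
from collections import Counter


def match_term_shapes(tagged_tokens, max_len=3):
    """Yield lemma sequences matching a toy term grammar: (ADJ|NOUN)* NOUN.

    tagged_tokens is a list of (lemma, tag) pairs. The grammar and the tag
    names are illustrative assumptions, not taken from the source.
    """
    n = len(tagged_tokens)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            span = tagged_tokens[start:end]
            tags = [tag for _, tag in span]
            head_is_noun = tags[-1] == "NOUN"
            modifiers_ok = all(t in ("ADJ", "NOUN") for t in tags[:-1])
            if head_is_noun and modifiers_ok:
                yield " ".join(lemma for lemma, _ in span)


def rank_terms(domain_tokens, reference_tokens, smoothing=1.0):
    """Rank domain items by frequency relative to the reference corpus.

    The score is a smoothed ratio of frequencies per million tokens; this
    particular formula is an assumption, one common way to do the comparison.
    """
    domain_counts = Counter(match_term_shapes(domain_tokens))
    reference_counts = Counter(match_term_shapes(reference_tokens))
    domain_size = max(len(domain_tokens), 1)
    reference_size = max(len(reference_tokens), 1)

    scores = {}
    for item, freq in domain_counts.items():
        domain_fpm = freq * 1_000_000 / domain_size
        # Counter returns 0 for items that never occur in the reference corpus.
        reference_fpm = reference_counts[item] * 1_000_000 / reference_size
        scores[item] = (domain_fpm + smoothing) / (reference_fpm + smoothing)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Tiny hand-made corpora, purely for demonstration.
domain = [("neural", "ADJ"), ("network", "NOUN"), ("learn", "VERB"),
          ("neural", "ADJ"), ("network", "NOUN")]
reference = [("the", "DET"), ("network", "NOUN"), ("be", "VERB"),
             ("big", "ADJ"), ("house", "NOUN")]

ranked = rank_terms(domain, reference)
for term, score in ranked:
    print(f"{term}\t{score:.2f}")
```

In this toy run, an item that is frequent in the domain corpus but absent from the reference corpus ranks above one that is common in both, which is the behaviour the description asks for. The smoothing constant keeps the ratio defined for items with zero reference frequency.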
Title
  • Finding Terms in Corpora for Many Languages with the Sketch Engine (en)
skos:prefLabel
  • Finding Terms in Corpora for Many Languages with the Sketch Engine (en)
skos:notation
  • RIV/00216224:14330/14:00075387!RIV15-MSM-14330___
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(LM2010013), S
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 16860
http://linked.open...ai/riv/idVysledku
  • RIV/00216224:14330/14:00075387
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • terminology; terms; corpora; sketch engine (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [7749E15EF4C1]
http://linked.open...v/mistoKonaniAkce
  • Gothenburg, Sweden
http://linked.open...i/riv/mistoVydani
  • Gothenburg, Sweden
http://linked.open...i/riv/nazevZdroje
  • Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Jakubíček, Miloš
  • Kovář, Vojtěch
  • Suchomel, Vít
  • Kilgarriff, Adam
  • Rychlý, Pavel
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
number of pages
http://purl.org/ne...btex#hasPublisher
  • The Association for Computational Linguistics
https://schema.org/isbn
  • 9781937284756
http://localhost/t...ganizacniJednotka
  • 14330