About: Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes   Values
rdf:type
Description
  • This paper presents a quantitative performance analysis of two different approaches to the lemmatization of Czech text data. The first is based on a manually prepared dictionary of lemmas and a set of derivation rules, while the second is based on automatic inference of the dictionary and the rules from training data. The comparison is done by evaluating the mean Generalized Average Precision (mGAP) measure of the lemmatized documents and search queries in a set of information retrieval (IR) experiments. Such a method is suitable for efficient and rather reliable comparison of lemmatization performance, since correct lemmatization has proven to be crucial for IR effectiveness in highly inflected languages. Moreover, the proposed indirect comparison of the lemmatizers circumvents the need for manually lemmatized test data, which are hard to obtain and also face the problem of incompatible sets of lemmas across different systems. (en)
Title
  • Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance (en)
skos:prefLabel
  • Comparison of Different Lemmatization Approaches through the Means of Information Retrieval Performance (en)
skos:notation
  • RIV/49777513:23520/10:00504240!RIV11-AV0-23520___
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(1ET101470416), P(LC536), S
http://linked.open...iv/cisloPeriodika
  • 6231
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 251418
http://linked.open...ai/riv/idVysledku
  • RIV/49777513:23520/10:00504240
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • lemmatization; information retrieval (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...odStatuVydavatele
  • DE - Federal Republic of Germany
http://linked.open...ontrolniKodProRIV
  • [C6E5E8B49AB8]
http://linked.open...i/riv/nazevZdroje
  • Lecture Notes in Artificial Intelligence
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...v/svazekPeriodika
  • 2010
http://linked.open...iv/tvurceVysledku
  • Kanis, Jakub
  • Skorkovská, Lucie
issn
  • 0302-9743
number of pages
http://localhost/t...ganizacniJednotka
  • 23520