About: Selecting text entries using a few positive samples and similarity ranking

An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
Description	This research was inspired by procedures that are used by human bibliographic searchers: Given some textual and only 'positive' (relevant, interesting) examples coming from just one category, find promptly and simply, in an available collection of various unlabeled documents, the most similar ones that belong to a relevant topic defined by an applicant. The problem of categorizing unlabeled relevant and irrelevant textual documents is solved here by using a small subset of available relevant patterns labeled manually in advance. Unlabeled text items are compared with these labeled patterns. The unlabeled samples are then ranked according to their degree of similarity with the patterns. The most similar (relevant) items are at the top of the ranking, and entries further from the top are gradually less and less similar. The authors emphasize that this simple method, aimed at processing large volumes of text entries, provides initial filtering results from the accuracy point of view; users can avoid the demanding task of labeling many training examples in order to apply a chosen classifier and, at the same time, can quickly obtain the relevant items. The ranking-based approach gives results that can possibly be used further in subsequent text-item processing, where the number of irrelevant items is no longer as high as at the beginning. Even if this relatively simple automatic search is not error-free due to the overlapping of documents, it can help process particularly large volumes of unstructured textual data. (en)
Title	Selecting text entries using a few positive samples and similarity ranking (en)
skos:prefLabel	Selecting text entries using a few positive samples and similarity ranking (en)
skos:notation	RIV/62156489:43110/11:00173470!RIV12-MSM-43110___
http://linked.open...avai/riv/aktivita	Z
http://linked.open...avai/riv/aktivity	Z(MSM6215648904)
http://linked.open...iv/cisloPeriodika	4
http://linked.open...vai/riv/dodaniDat	2012
http://linked.open...aciTvurceVysledku	Dařena, František; Žižka, Jan
http://linked.open.../riv/druhVysledku	J - Article in a scholarly periodical
http://linked.open...iv/duvernostUdaju	S - Complete and true data not subject to protection under special legal regulations
http://linked.open...titaPredkladatele	Mendelova univerzita v Brně / Provozně ekonomická fakulta
http://linked.open...dnocenehoVysledku	228678
http://linked.open...ai/riv/idVysledku	RIV/62156489:43110/11:00173470
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	text similarity; one-class categorization; ranking by similarity; machine learning; natural language processing; unlabeled text documents; pattern recognition; non-semantic documents (en)
http://linked.open.../riv/klicoveSlovo	natural language processing; non-semantic documents; one-class categorization; ranking by similarity; text similarity; unlabeled text documents; machine learning; pattern recognition
http://linked.open...odStatuVydavatele	CZ - Czech Republic
http://linked.open...ontrolniKodProRIV	[CEB1EB21823D]
http://linked.open...i/riv/nazevZdroje	Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
http://linked.open...in/vavai/riv/obor	IN
http://linked.open...ichTvurcuVysledku	2 (xsd:int)
http://linked.open...cetTvurcuVysledku	3 (xsd:int)
http://linked.open...UplatneniVysledku	2011
http://linked.open...v/svazekPeriodika	LIX
http://linked.open...iv/tvurceVysledku	Žižka, Jan; Dařena, František; Svoboda, Arnošt
http://linked.open...n/vavai/riv/zamer	The Czech Economy in the Process of Integration and Globalisation, and the Development of Agricultural Sector and the Sector of Services under the New Conditions of the Integrated European Market
issn	1211-8516
number of pages	10 (xsd:int)
http://localhost/t...ganizacniJednotka	43110
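
The ranking procedure summarized in the Description can be illustrated with a minimal sketch. The record does not state which similarity measure or text representation the authors used, so everything below is an assumption made for illustration: term-frequency vectors, cosine similarity, and a score equal to the mean similarity to the labeled positive samples. The function names (tokenize, cosine, rank_by_similarity) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(positive_samples, unlabeled):
    """Return (score, text) pairs sorted from most to least similar.

    The score of an unlabeled entry is its mean cosine similarity to the
    labeled positive samples. Both the measure and the averaging are
    assumptions of this sketch, not details taken from the record.
    """
    if not positive_samples:
        raise ValueError("at least one positive sample is required")
    patterns = [Counter(tokenize(p)) for p in positive_samples]
    scored = []
    for text in unlabeled:
        vec = Counter(tokenize(text))
        score = sum(cosine(vec, p) for p in patterns) / len(patterns)
        scored.append((score, text))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative data only: two positive samples and three unlabeled entries.
positives = [
    "the hotel room was clean and quiet",
    "clean room and friendly hotel staff",
]
unlabeled = [
    "stock prices fell sharply today",
    "quiet hotel with a clean room",
    "the staff was friendly",
]
ranking = rank_by_similarity(positives, unlabeled)
```

Entries near the top of ranking would be passed on for further processing, while a cutoff (by score or by rank position) would discard the tail; where to place that cutoff is left open here, as the record does not specify it.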
