About: Strigil: A Framework for Data Extraction in Semi-Structured Web Documents

An Entity of Type: http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space: linked.opendata.cz, associated with source document(s)

Attributes   Values
rdf:type
Description
  • In this paper we introduce Strigil, a framework for automated data extraction. It is an easily configurable tool for retrieving data from textual or weakly structured documents. The paper describes the framework architecture and its important components. Additionally, we propose a scraping language, inspired by XSL transformations, designed to extract data from different kinds of documents. Although there are many different approaches focused on various aspects of data scraping, they are usually highly specialized to a particular domain or data source. We compare these solutions and discuss their advantages and disadvantages. Our scraping language is designed to work with an ontology so that scraped data map directly to classes and attributes. (en)
Title
  • Strigil: A Framework for Data Extraction in Semi-Structured Web Documents (en)
skos:prefLabel
  • Strigil: A Framework for Data Extraction in Semi-Structured Web Documents (en)
skos:notation
  • RIV/00216208:11320/13:10192339!RIV14-TA0-11320___
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • P(TA02010182)
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 108265
http://linked.open...ai/riv/idVysledku
  • RIV/00216208:11320/13:10192339
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • Web; Semi-Structured Data; Data Extraction; Framework; Strigil (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [4484F86780FE]
http://linked.open...v/mistoKonaniAkce
  • Vienna, Austria
http://linked.open...i/riv/mistoVydani
  • ACM Press
http://linked.open...i/riv/nazevZdroje
  • Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...vavai/riv/projekt
http://linked.open...UplatneniVysledku
http://linked.open...iv/tvurceVysledku
  • Nečaský, Martin
  • Stárka, Jakub
  • Holubová, Irena
http://linked.open...vavai/riv/typAkce
http://linked.open.../riv/zahajeniAkce
number of pages
http://purl.org/ne...btex#hasPublisher
  • ACM Press
https://schema.org/isbn
  • 978-1-4503-2113-6
http://localhost/t...ganizacniJednotka
  • 11320