About: Real-time big data webmining and data processing

An entity of type http://linked.opendata.cz/ontology/domain/vavai/Vysledek (result), within the data space linked.opendata.cz, associated with source document(s)

Attributes | Values
rdf:type
rdfs:seeAlso
Description
  • The source code is the backend of a web application. It is written in Python 2.7. No GUI is necessary because the script is run automatically at a specified time interval by a phone or PC. The front end can be built separately (JS, PHP, MySQL); for example: http://space-walk.info/phd/pages/cz/realmining.php. The main goal is to analyse big data volumes from websites in real time and visualise them through the selected API. In this case, the Plot.ly and Google services are used together with MySQL, PHP and JavaScript to handle processing and visualisation.
    The code includes basic classes for handling HTML structure:
    MLStripper(HTMLParser):
      - strips the HTML structure (tags, JS, etc.)
    ParseIt:
      - analyses the target websites
      - uses the Counter collection and the BeautifulSoup library to transform HTML into objects more easily, which allows elegant attribute handling
      - saves data into associative arrays (dictionaries), sometimes in multidimensional structures
      - filters words using the bad word dictionaries
    Badwords:
      - manages the bad word dictionaries; its use is optional, as stopwords.txt is usually good enough
    PublishResults:
      - uses the Plot.ly service as an API to visualise graphs
      - requires app_cfg.py to be set up for access to MySQL and the API account
    SpecificAnalyzes:
      - searches for the top word contexts based on a parameter value (the distance)
    Crimes:
      - searches for words associated with specific crimes
      - on a positive match, scans the JSON dictionary of towns and returns the matching town (only towns with more than 5,000 inhabitants)
      - JSON is used because of the complicated structure of the Czech language
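The description says MLStripper subclasses HTMLParser to clear tags and JavaScript from a page. The following is a minimal sketch of that idea, not the author's code: it is written for Python 3 (under Python 2.7 the module is imported as `HTMLParser` rather than `html.parser`), and the helper `strip_tags` and the choice to skip `<script>`/`<style>` bodies are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collect visible text, dropping tags and the bodies of <script>/<style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_data(self):
        # Join chunks with spaces, then collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    stripper = MLStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered text
    return stripper.get_data()
```

For example, `strip_tags("<p>Hello <b>world</b></p><script>var x = 1;</script>")` yields `"Hello world"`. Joining chunks with a space keeps words from adjacent block elements apart, at the cost of splitting words that are broken by inline tags.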
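ParseIt is described as counting words with the Counter collection and filtering them through bad word dictionaries, with stopwords.txt usually sufficient. A minimal stdlib-only sketch of that step follows, assuming plain text that has already had its markup removed (the original uses BeautifulSoup for the HTML side, which is omitted here). The one-word-per-line stopword format and the names `load_stopwords` and `top_words` are assumptions.

```python
import re
from collections import Counter


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line (assumed format)."""
    return {line.strip().lower() for line in lines if line.strip()}


def top_words(text, stopwords, n=10):
    """Return the n most frequent words that are neither stopwords nor pure numbers."""
    # \w+ is Unicode-aware in Python 3, so Czech letters such as č or ř stay inside words
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in stopwords and not w.isdigit())
    return counts.most_common(n)
```

A typical call would read the list from disk, e.g. `with open("stopwords.txt", encoding="utf-8") as f: stop = load_stopwords(f)`, and then pass `stop` to `top_words`.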
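PublishResults needs app_cfg.py to be set up with the MySQL and Plot.ly account details. The record does not show that file, so the fragment below is purely hypothetical: every variable name and value is a placeholder chosen for illustration, not the project's actual configuration.

```python
# app_cfg.py: hypothetical layout; all names and values are placeholders.
# Keep real credentials out of version control.

# Connection settings for the MySQL database that stores the mined results
MYSQL = {
    "host": "localhost",
    "user": "webmining",
    "password": "change-me",
    "database": "webmining",
}

# Account details for the Plot.ly graphing service
PLOTLY = {
    "username": "your-plotly-username",
    "api_key": "your-plotly-api-key",
}
```

Keeping these values in one module lets the publishing code import them while the credentials themselves stay in a single, easily excluded file.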
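SpecificAnalyzes searches for the top word contexts, with the distance as a parameter. One plausible reading, sketched below as an assumption rather than the author's method, is a symmetric window: count every word within `distance` positions of each occurrence of a target word. The function name `context_words` is hypothetical.

```python
from collections import Counter


def context_words(tokens, target, distance=3):
    """Count words appearing within `distance` positions of each occurrence of `target`."""
    contexts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Clamp the window to the bounds of the token list
        lo = max(0, i - distance)
        hi = min(len(tokens), i + distance + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target occurrence itself
                contexts[tokens[j]] += 1
    return contexts
```

`context_words(tokens, "praha", 2).most_common(5)` would then give the five strongest contexts. If two occurrences of the target fall within one window, each counts the other as context, which is usually acceptable for frequency ranking.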
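Crimes checks a text for crime-related words and, on a match, looks the town up in a JSON dictionary, keeping only towns with more than 5,000 inhabitants. The JSON layout is not given in the record, so the sketch assumes one where each town lists the inflected forms under which its name can appear in Czech text. The towns, populations and crime word list are invented sample data, and `build_form_index` and `find_crime_town` are hypothetical names.

```python
import json

# Hypothetical JSON layout: town -> population and inflected Czech forms.
# "Velkov" and "Malov" and their figures are invented sample data.
TOWNS_JSON = """
{
  "Velkov": {"population": 12000, "forms": ["velkov", "velkova", "velkově"]},
  "Malov":  {"population": 800,   "forms": ["malov", "malova", "malově"]}
}
"""

# Hypothetical sample list: theft, burglary, robbery
CRIME_WORDS = {"krádež", "vloupání", "loupež"}
MIN_POPULATION = 5000


def build_form_index(towns, min_population=MIN_POPULATION):
    """Map every inflected form to its town, keeping towns above the threshold."""
    index = {}
    for name, info in towns.items():
        if info["population"] > min_population:
            for form in info["forms"]:
                index[form] = name
    return index


def find_crime_town(tokens, form_index, crime_words=CRIME_WORDS):
    """Return the first town mentioned if the tokens contain a crime word, else None."""
    if not crime_words.intersection(tokens):
        return None
    for token in tokens:
        if token in form_index:
            return form_index[token]
    return None
```

Listing the inflected forms explicitly sidesteps Czech declension: "ve Velkově" (in Velkov) matches through the form "velkově" without any morphological analysis.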
Title
  • Real-time big data webmining and data processing
skos:prefLabel
  • Real-time big data webmining and data processing
skos:notation
  • RIV/00216275:25410/13:39896075!RIV14-MSM-25410___
http://linked.open...avai/predkladatel
http://linked.open...avai/riv/aktivita
http://linked.open...avai/riv/aktivity
  • S
http://linked.open...vai/riv/dodaniDat
http://linked.open...aciTvurceVysledku
http://linked.open.../riv/druhVysledku
http://linked.open...iv/duvernostUdaju
http://linked.open...onomickeParametry
  • Rapid acquisition of required information from large volumes of data; multidisciplinary
http://linked.open...titaPredkladatele
http://linked.open...dnocenehoVysledku
  • 101630
http://linked.open...ai/riv/idVysledku
  • RIV/00216275:25410/13:39896075
http://linked.open...terniIdentifikace
  • 0.8
http://linked.open...riv/jazykVysledku
http://linked.open.../riv/klicovaSlova
  • python, webmining, big data (en)
http://linked.open.../riv/klicoveSlovo
http://linked.open...ontrolniKodProRIV
  • [AE92380473DE]
http://linked.open.../licencniPoplatek
http://linked.open...in/vavai/riv/obor
http://linked.open...ichTvurcuVysledku
http://linked.open...cetTvurcuVysledku
http://linked.open...UplatneniVysledku
http://linked.open...echnickeParametry
  • python 2.7, beautiful soup
http://linked.open...iv/tvurceVysledku
  • Hovad, Jan
http://linked.open...avai/riv/vlastnik
http://linked.open...itiJinymSubjektem
http://localhost/t...ganizacniJednotka
  • 25410

Data on this page belongs to its respective rights holders.