About: Real-time big data webmining and data processing


An Entity of Type : http://linked.opendata.cz/ontology/domain/vavai/Vysledek, within Data Space : linked.opendata.cz associated with source document(s)

Attributes	Values
rdf:type	skos:Concept http://linked.opendata.cz/ontology/domain/vavai/Vysledek
rdfs:seeAlso	http://projekty-usii.upce.cz/soubory/vytvoreny_software/hovad/2013/realtimeWebmining.zip
Description	The source code is the backend of a web application. It is written in Python 2.7; a GUI is not necessary because the script is run automatically at a specified time interval by a phone or PC. The front end can be built separately (JS, PHP, MySQL), for example: http://space-walk.info/phd/pages/cz/realmining.php. The main goal is to analyze big data volumes from websites in real time and visualize them through a selected API. In this case, Plot.ly and Google services are used together with MySQL, PHP, and JavaScript for processing and visualization. The code includes basic classes for handling HTML structure:
MLStripper(HTMLParser):
- strips the HTML structure (tags, JS, etc.)
ParseIt:
- analyzes the target websites
- uses the Counter collection and the BeautifulSoup library to turn HTML into objects, which allows elegant attribute handling
- saves data into associative arrays (dictionaries), sometimes in multidimensional structures
- filters words using the bad-word dictionaries
Badwords:
- manipulates the bad-word dictionaries; usage is optional, and stopwords.txt is usually good enough
PublishResults:
- uses the Plot.ly service as an API to render graphs
- requires app_cfg.py to be set up for access to MySQL and the API account
SpecificAnalyzes:
- searches the contexts of the top words, based on a parametric value (distance)
Crimes:
- searches through sets of words related to a specific crime
- on a positive occurrence, scans the JSON dictionary of towns and returns the matching town (only towns with more than 5 000 inhabitants)
- JSON is used because of the complicated structure of the Czech language (en)
Title	Real-time big data webmining and data processing (en)
skos:prefLabel	Real-time big data webmining and data processing (en)
skos:notation	RIV/00216275:25410/13:39896075!RIV14-MSM-25410___
http://linked.open...avai/predkladatel	Fakulta ekonomicko-správní
http://linked.open...avai/riv/aktivita	S
http://linked.open...avai/riv/aktivity	S
http://linked.open...vai/riv/dodaniDat	2014
http://linked.open...aciTvurceVysledku	Hovad, Jan
http://linked.open.../riv/druhVysledku	R - Software
http://linked.open...iv/duvernostUdaju	S - Complete and true data not subject to protection under special legal regulations
http://linked.open...onomickeParametry	Fast retrieval of the needed information from a large volume of data, multidisciplinary
http://linked.open...titaPredkladatele	Univerzita Pardubice / Fakulta ekonomicko-správní
http://linked.open...dnocenehoVysledku	101630
http://linked.open...ai/riv/idVysledku	RIV/00216275:25410/13:39896075
http://linked.open...terniIdentifikace	0.8
http://linked.open...riv/jazykVysledku	eng - English
http://linked.open.../riv/klicovaSlova	python, webmining, big data (en)
http://linked.open.../riv/klicoveSlovo	big data webmining python
http://linked.open...ontrolniKodProRIV	[AE92380473DE]
http://linked.open.../licencniPoplatek	Z - In some cases the licensor does not require a license fee for the result
http://linked.open...in/vavai/riv/obor	IN
http://linked.open...ichTvurcuVysledku	1 (xsd:int)
http://linked.open...cetTvurcuVysledku	1 (xsd:int)
http://linked.open...UplatneniVysledku	2013
http://linked.open...echnickeParametry	python 2.7, beautiful soup
http://linked.open...iv/tvurceVysledku	Hovad, Jan
http://linked.open...avai/riv/vlastnik	Univerzita Pardubice
http://linked.open...itiJinymSubjektem	P - Acquiring a license is necessary in some cases
http://localhost/t...ganizacniJednotka	25410
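
The description mentions an MLStripper(HTMLParser) class that strips HTML and a ParseIt step that counts words with Counter and filters them against stop-word dictionaries. The following is a minimal sketch of that idea, not the author's code. It assumes Python 3 and the standard library only (the original targets Python 2.7 and BeautifulSoup), and the helper names strip_tags and top_words are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    """Collects text nodes and drops tags, scripts, and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not script or style content.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        return " ".join(self._chunks)


def strip_tags(html):
    """Hypothetical helper: return the visible text of an HTML string."""
    stripper = MLStripper()
    stripper.feed(html)
    stripper.close()
    return stripper.get_text()


def top_words(html, stopwords, n=10):
    """Hypothetical helper: most frequent words that are not stop words."""
    words = re.findall(r"\w+", strip_tags(html).lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)
```

In such a setup the stop words would be loaded once from a file such as stopwords.txt into a set, so that each membership check stays cheap even on large pages.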

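The description also mentions a context search driven by a distance parameter and a lookup of towns in a JSON dictionary once crime-related words occur. A rough sketch of both steps follows. The function names, the layout of the JSON dictionary (canonical town name mapped to inflected forms), and the sample entries are assumptions made for illustration; the record does not specify the real structure.

```python
import json

# Hypothetical town dictionary: canonical name -> a few inflected forms.
# A real file would list only towns above the population threshold.
TOWNS_JSON = """
{
  "Pardubice": ["pardubice", "pardubic", "pardubicemi"],
  "Praha": ["praha", "prahy", "praze", "prahu", "prahou"]
}
"""


def word_contexts(words, target, distance=2):
    """Return the words within `distance` positions of each hit of `target`."""
    contexts = []
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - distance)
            contexts.append(words[start:i] + words[i + 1:i + 1 + distance])
    return contexts


def find_town(words, crime_words, towns):
    """Return the first known town, but only if a crime word occurs."""
    if not any(word in crime_words for word in words):
        return None
    for word in words:
        for town, forms in towns.items():
            if word in forms:
                return town
    return None


if __name__ == "__main__":
    towns = json.loads(TOWNS_JSON)
    tokens = "police investigate theft in praze".split()
    print(word_contexts(tokens, "theft", distance=1))
    print(find_town(tokens, {"theft"}, towns))
```

Mapping inflected forms to one canonical name is one plausible reason a lookup table helps with Czech, where town names change with grammatical case; the original dictionary may be organized differently.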