Course Outline

Quick Overview

  • Data Sources
  • Mining Data
  • Recommender systems
  • Target Marketing


  • Structured vs unstructured
  • Static vs streamed
  • Attitudinal, behavioural and demographic data
  • Data-driven vs user-driven analytics
  • Data validity
  • Volume, velocity and variety of data


  • Building models
  • Statistical Models
  • Machine learning

Data Classification

  • Clustering
  • kGroups, k-means, nearest neighbours
  • Ant colonies, birds flocking
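As an illustration of the k-means topic listed above, here is a minimal sketch in plain Python. It is not course material: the function names `kmeans` and `nearest`, the parameters `iterations` and `seed`, and the sample points are all assumptions made for this example.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=50, seed=0):
    """Cluster a list of equal-length tuples into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group every point with its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                updated.append(centroids[i])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sample, k=2)
print(centroids, labels)
```

The result depends on the starting centroids, so on less cleanly separated data it is common to run the algorithm several times with different seeds and keep the best grouping.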

Predictive Models

  • Decision trees
  • Support vector machine
  • Naive Bayes classification
  • Neural networks
  • Markov Model
  • Regression
  • Ensemble methods


  • Benefit/Cost ratio
  • Cost of software
  • Cost of development
  • Potential benefits

Building Models

  • Data Preparation (MapReduce)
  • Data cleansing
  • Choosing methods
  • Developing the model
  • Testing the model
  • Model evaluation
  • Model deployment and integration

Overview of Open Source and commercial software

  • Selection of R-project packages
  • Python libraries
  • Hadoop and Mahout
  • Selected Apache projects related to Big Data and Analytics
  • Selected commercial solutions
  • Integration with existing software and data sources


Understanding of traditional data management and analysis methods such as SQL, data warehouses, business intelligence and OLAP. Understanding of basic statistics and probability (mean, variance, probability, conditional probability, etc.).

 21 Hours

