Predictive Modelling and Machine Learning

Synopsis:

This course introduces the principles, theories and concepts of statistics and data modelling. Students will learn how to construct and interpret graphical presentations of data, conduct appropriate statistical tests, apply suitable data modelling techniques and interpret the results. Students will apply these techniques in practical projects, through which they will learn how to develop real-world analytics solutions using RapidMiner.

Prerequisite: Nil

Topics

2020 March

Materials on Google Classroom

https://classroom.google.com
Code to join the Google Classroom: bfrtayn

Schedule

10 Mar, 10 a.m. to 3 p.m., Day 1: Introduction and Descriptive Statistics and Data Visualization
11 Mar, 10 a.m. to 3 p.m., Day 2: Inferential Statistics
12 Mar, 10 a.m. to 3 p.m., Day 3: Guest lecture - Topic: Artificial Intelligence in Mechanical Engineering and Industrial Automation
13 Mar, 10 a.m. to 3 p.m., Day 4: Predictive Modelling Part 1
16 Mar, 10 a.m. to 3 p.m., Day 5: Predictive Modelling Part 2
17 Mar, 10 a.m. to 3 p.m., Day 6: Predictive Modelling Part 3
18 Mar, 10 a.m. to 3 p.m., Day 7: Consultation and Revision
19 Mar, 10 a.m. to 3 p.m., Day 8: Consultation and Revision
20 Mar, 10 a.m. to 3 p.m., Day 9: Final test and closing

Venue: UE0002

Assessment:

  1. Lab Practical 15%
  2. Group Assignment 35%
  3. Written Exam 50%

Instructors:

  1. Ai Huey LIM lim_ai_huey@nyp.edu.sg
  2. Joanne FOO joanne_foo@nyp.edu.sg

Extra Materials

Tools and Software

  1. Microsoft Excel
  2. Tableau Desktop: a student license will be issued during the lab session. To activate Tableau on a PC behind the proxy firewall, look here.
  3. RapidMiner Studio: the trial license allows 1000 rows of test records. To register RapidMiner on a PC behind the proxy firewall:
    1. Launch RapidMiner Studio.
    2. When the application prompts for a license, choose “manually enter the license”.
    3. Request and retrieve the license key from "My RapidMiner".
    4. Copy the key from the link above and paste it into the license key window in RapidMiner Studio.

References:

  1. Statistics for Managers: Using Microsoft Excel (6th ed.), David M. Levine et al. (2011), OT, B.
  2. Foundations of Predictive Analytics, James Wu & Stephen Coggeshall (2012), OT, B.
  3. Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber & Jian Pei (2011), OT, B.

Old materials

2017 October

  1. Statistics Essentials (Part 1 || Part 2)
    1. Descriptive Statistics
      1. Types of data
      2. Central Tendency
    2. Data Visualisation
      1. Types of Charts
      2. Dashboard Interaction
      3. Animation
    3. Inferential Statistics
      1. Univariate Statistical Techniques
      2. Multivariate Statistical Techniques
      3. Parametric and non-parametric tests
  2. Predictive Modelling (Part 1 || Part 2 || Part 3)
    1. CRISP-DM model
    2. Supervised Learning
      1. Classification
      2. Decision Tree
      3. Linear Regression / Logistic Regression
    3. Unsupervised Learning
      1. Association Rule Mining
      2. Types of Clustering
    4. Applications of Predictive Models in Domains
  3. Scala, Spark and Machine Learning
    1. Scala
      1. slides
      2. exercises
      3. source code
    2. Spark Machine Learning
      1. slide 1
      2. exercise 1
      3. slide 2
      4. exercise 2
    3. Exercise and discussion

Assessment:

  1. Practical assignment (individual, 50%)
  2. Group research report and presentation (2-3 in a group, 50%)