Predictive Modelling and Machine Learning
Synopsis:
This course introduces the principles, theories and concepts of statistics and data modelling.
Students will learn how to construct and interpret graphical presentations of data, conduct
appropriate statistical tests, apply suitable data modelling techniques and interpret
the results. Students will apply these statistics and data modelling
techniques in practical projects, through which they will learn how to develop real-world
analytics solutions using RapidMiner.
Prerequisite: Nil
Topics
2020 March
Materials are on Google Classroom
Code to join the Google Classroom: bfrtayn
Schedule
Venue: UE0002
Assessment:
- Lab Practical 15%
- Group Assignment 35%
- Written Exam 50%
Instructors:
- Ai Huey LIM lim_ai_huey@nyp.edu.sg
- Joanne FOO joanne_foo@nyp.edu.sg
Extra Materials
Tools and Software
- Microsoft Excel
- Tableau
A Desktop student license will be issued during the lab session.
To activate Tableau on a PC behind the proxy firewall, look here.
- RapidMiner
The Studio trial license allows 1000 rows of test records.
To register RapidMiner on a PC behind the proxy firewall:
- Launch RapidMiner Studio.
- When the application prompts for a license, choose “manually enter the license”.
- Apply for and retrieve the license key from "My RapidMiner".
- Copy and paste the key from "My RapidMiner" into the license key window in RapidMiner Studio.
References:
- Statistics for Managers: Using Microsoft Excel (6th ed.), David M. Levine et al. (2011), OT, B.
- Foundations of Predictive Analytics, James Wu & Stephen Coggeshall (2012), OT, B.
- Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber & Jian Pei (2011), OT, B.
Old materials
2017 October
- Statistics Essentials (Part 1 || Part 2)
- Descriptive Statistics
- Types of data
- Central Tendency
- Data Visualisation
- Types of Charts
- Dashboard Interaction
- Animation
- Inferential Statistics
- Univariate Statistical Techniques
- Multivariate Statistical Techniques
- Parametric and non-parametric tests
- Predictive Modelling (Part 1 || Part 2 || Part 3)
- CRISP-DM model
- Supervised Learning
- Classification
- Decision Tree
- Linear Regression / Logistic Regression
- Unsupervised Learning
- Association Rule Mining
- Types of Clustering
- Applications of Predictive Models in Domains
Scala, Spark and Machine Learning
- Scala
- slides
- exercises
- source code
- Spark Machine Learning
- slide 1
- exercise 1
- slide 2
- exercise 2
- Exercise and discussion
Assessment:
- Practical assignment (individual, 50%)
- Group research report and presentation (2-3 in a group, 50%)
- Suggested topics are available, but you are not limited to them; you are
encouraged to explore other topics and applications of predictive modelling.
- Pick an application of machine learning (e.g. sentiment analysis, facial recognition, object identification or search-based regression test prioritization) that can be solved using at least two different ML approaches.
- Study the papers and tools related to these approaches in depth.
- Compare these approaches in the context of the application of study.
- Write a summary of the problem, the solutions and the comparison.
- A 10-minute presentation is required.
- Deadline: 15 Mar 2019
- Email your presentation slides and report to keith_fwa@nyp.edu.sg