MA3700: Mathematical Methods for Data Mining

School: Cardiff School of Mathematics
Department Code: MATHS
Module Code: MA3700
External Subject Code: G100
Number of Credits: 10
Level: L6
Language of Delivery: English
Module Leader: Professor Alexander Balinsky
Semester: Spring Semester
Academic Year: 2015/6

Outline Description of Module

Recent advances in the processing power, storage capacity, and interconnectivity of computer technology are creating unprecedented quantities of digital data. Data mining (also known as Knowledge Discovery in Data, or KDD), the science of extracting useful knowledge from such huge data repositories, has emerged as a young and interdisciplinary field. Data mining techniques have been widely applied to problems in industry, science, engineering and government, and it is widely believed that data mining will have a profound impact on our society.

This module provides an introduction to the basic ideas and methods of mathematical data mining. We will consider the following problems: classification, cluster and outlier analysis, mining time-series and sequence data, text mining and web mining, and pattern analysis.

This is a lecture-based module open to all students with a suitable grounding. It covers the fundamental data mining ideas (clustering, support vector machine analysis, semi-supervised learning, information retrieval, collaborative filtering, harmonic analysis) and the most important algorithms (the k-means algorithm, support vector machines, the PageRank algorithm, k-nearest neighbour classification, Naive Bayes).

On completion of the module a student should be able to

  • Know and understand concepts of data mining.
  • Know and understand mining on different kinds of data.
  • Know and understand mining for different kinds of knowledge.
  • Understand main principles of pattern analysis and machine learning.
  • Understand representation discovery using harmonic analysis.

How the module will be delivered

27 lectures of 50 minutes each

Some handouts will be provided in hard copy or via Learning Central, but students will be expected to take notes of lectures.

Students are also expected to undertake at least 50 hours of private study, including preparation of solutions to given exercises.

Skills that will be practised and developed

Skills:

The ability to understand and apply mathematical tools for data mining in many contemporary applications.

Transferable Skills:

Ability to recognize, formulate and solve mathematical problems in an interdisciplinary environment.

How the module will be assessed

Formative assessment is carried out in the problem classes.  Feedback to students on their solutions and their progress towards learning outcomes is provided during these classes.  

The in-course element of summative assessment is based on a class test (taken under examination conditions) similar in form to the tutorial exercises.

The major component of summative assessment is the written examination at the end of the module. This gives students the opportunity to demonstrate their overall achievement of learning outcomes. It also allows them to give evidence of the higher levels of knowledge and understanding required for above-average marks.

The examination paper offers a choice of three out of four equally weighted questions.

Assessment Breakdown

Type | % | Title | Duration (hrs)
Exam - Spring Semester | 85 | Mathematical Methods For Data Mining | 2
Class Test | 15 | Class Test | N/A

Syllabus content

  • Concepts of Data Mining.
  • Classification, cluster and outlier analysis.
  • Algorithms of Data Mining.
  • Collaborative filtering and recommender systems.
  • General pattern theory.
  • The mathematics of machine learning.
  • Applications in Text Mining and Web Mining.

Essential Reading and Resource List

The Top Ten Algorithms in Data Mining. Data Mining and Knowledge Discovery Series, Chapman & Hall/CRC, 2009.

Text Mining: Classification, Clustering, and Applications. Data Mining and Knowledge Discovery Series, Chapman & Hall/CRC, 2009.

Introduction to Semi-Supervised Learning. Zhu, X., & Goldberg, A. B., Morgan & Claypool, 2009.

Background Reading and Resource List

Not applicable.


Copyright Cardiff University. Registered charity no. 1136855