MA3700: Mathematical Methods for Data Mining
Item | Details |
---|---|
School | Cardiff School of Mathematics |
Department Code | MATHS |
Module Code | MA3700 |
External Subject Code | G100 |
Number of Credits | 10 |
Level | L6 |
Language of Delivery | English |
Module Leader | Professor Alexander Balinsky |
Semester | Spring Semester |
Academic Year | 2014/5 |
Outline Description of Module
Tremendous recent advances in the processing power, storage capacity and interconnectivity of computer technology are creating unprecedented quantities of digital data. Data mining (also known as Knowledge Discovery in Data, or KDD), the science of extracting useful knowledge from such huge data repositories, has emerged as a young, interdisciplinary field. Data mining techniques have been widely applied to problems in industry, science, engineering and government, and it is widely believed that data mining will have a profound impact on our society.
This module provides an introduction to the basic ideas and methods of mathematical data mining. It considers the following problems: classification; cluster and outlier analysis; mining time-series and sequence data; text mining and web mining; and pattern analysis.
It is a lecture-based module, open to all students with a suitable grounding. It covers the fundamental ideas of data mining (clustering, support vector machine analysis, semi-supervised learning, information retrieval, collaborative filtering and harmonic analysis) and the most important algorithms (the k-means algorithm, support vector machines, the PageRank algorithm, k-nearest neighbour classification and Naive Bayes).
On completion of the module a student should be able to:
- Know and understand the concepts of data mining.
- Know and understand mining of different kinds of data.
- Know and understand mining for different kinds of knowledge.
- Understand the main principles of pattern analysis and machine learning.
- Understand representation discovery using harmonic analysis.
How the module will be delivered
27 lectures of 50 minutes each.
Some handouts will be provided in hard copy or via Learning Central, but students will be expected to take their own notes during lectures.
Students are also expected to undertake at least 50 hours of private study, including preparing solutions to the exercises set.
Skills that will be practised and developed
Skills:
The ability to understand and apply mathematical tools for data mining in many contemporary applications.
Transferable Skills:
Ability to recognize, formulate and solve mathematical problems in an interdisciplinary environment.
How the module will be assessed
Formative assessment is carried out in the problem classes. Feedback to students on their solutions and their progress towards learning outcomes is provided during these classes.
The in-course element of summative assessment is based on a class test (taken under examination conditions) similar in form to the tutorial exercises.
The major component of summative assessment is the written examination at the end of the module. This gives students the opportunity to demonstrate their overall achievement of the learning outcomes. It also allows them to give evidence of the higher levels of knowledge and understanding required for above-average marks.
The examination paper contains four equally weighted questions, of which candidates answer three.
Assessment Breakdown
Type | % | Title | Duration (hrs) |
---|---|---|---|
Exam - Spring Semester | 85 | Mathematical Methods for Data Mining | 2 |
Class Test | 15 | Class Test | N/A |
Syllabus content
- Concepts of data mining.
- Classification, cluster and outlier analysis.
- Data mining algorithms.
- Collaborative filtering and recommender systems.
- General pattern theory.
- The mathematics of machine learning.
- Applications in text mining and web mining.
Essential Reading and Resource List
The Top Ten Algorithms in Data Mining. Data Mining and Knowledge Discovery Series, Chapman & Hall/CRC, 2009.
Text Mining: Classification, Clustering, and Applications. Data Mining and Knowledge Discovery Series, Chapman & Hall/CRC, 2009.
Introduction to Semi-Supervised Learning. Zhu, X., & Goldberg, A. B., Morgan & Claypool, 2009.
Background Reading and Resource List
Not applicable.