DAMI - Data Mining in Computer and System Sciences


Advanced level - Second cycle course


90 credits in Computer and Systems Sciences or 90 credits in a DSV bachelor programme with at least 60 credits in Computer and Systems Sciences.

Short description

As data is becoming more and more readily available, the need to analyse and make use of these large amounts of data is rapidly growing. Data mining deals with techniques that can find interesting and useful patterns in large volumes of data. This course covers basic concepts, techniques and algorithms in data mining combined with hands-on experimentation.


Knowledge and understanding: After having taken the course, the student is expected to:

Abilities and skills: After having taken the course, the student is expected to be able to:

Judgements and values: After having taken the course, the student is expected to:


Data mining and machine learning
Fielded applications
Machine learning and statistics
Generalization as search
Data mining and ethics

Input: Concepts, instances, attributes
  What’s a concept?
  What’s in an example?
  What’s in an attribute?
  Preparing the input

Output: Knowledge representation
  Decision tables
  Decision trees
  Classification rules
  Association rules
  Rules with exceptions
  Rules involving relations
  Trees for numeric prediction
  Instance-based representation
  Clusters

Algorithms: The basic methods
  Inferring rudimentary rules
  Statistical modeling
  Divide-and-conquer: constructing decision trees
  Covering algorithms: constructing rules
  Mining association rules
  Linear models
  Instance-based learning
  Clustering
  Further reading

Credibility: Evaluating what’s been learned
  Training and testing
  Predicting performance
  Cross-validation
  Other estimates
  Comparing data mining schemes
  Predicting probabilities
  Counting the cost
  Evaluating numeric prediction
  The minimum description length (MDL) principle
  Applying MDL to clustering

Real machine learning schemes
  Decision trees
  Classification rules
  Extending linear models
  Instance-based learning
  Numeric prediction
  Clustering
  Bayesian networks

Transformations: Engineering the input and output
  Attribute selection
  Discretizing numeric attributes
  Some useful transformations
  Automatic data cleansing
  Combining multiple models
  Using unlabeled data
  Further reading

Moving on: Extensions and applications
  Learning from massive datasets
  Incorporating domain knowledge
  Text and Web mining
  Adversarial situations
  Ubiquitous data mining


Lectures: 8 x 2 hours
Assignment: 1
Seminars: 12 hours


DAMI (last edited 2020-03-27 14:36:36 by sm@su.se)