= DAMI - Data Mining in Computer and Systems Sciences =
== Requirements ==
90 hp (higher education credits) in Computer and Systems Sciences
== Short description ==
As data is becoming more and more readily available, the need to analyse and make use of these large amounts of data is rapidly growing. Data mining deals with techniques that can find interesting and useful patterns in large volumes of data. This course covers basic concepts, techniques and algorithms in data mining combined with hands-on experimentation.
== Aim ==
Knowledge and understanding:
After having taken the course, the student is expected to:
* know how to apply data mining to large data sets
* have knowledge of the basic concepts in data mining
* be familiar with basic techniques and algorithms used in data mining
Abilities and skills:
After having taken the course, the student is expected to be able to:
* formulate a data mining problem
* represent a data set in a form that will be useful for data mining
* evaluate the performance of different machine learning algorithms
Judgements and values:
After having taken the course, the student is expected to:
* be able to critically select appropriate tools, representations and algorithms for a given data mining scenario
* be able to critically reflect on the ethical and privacy aspects of a proposed data mining study, such as whether the design or the results of the study may have negative effects on people, either through their direct involvement in the study or through its results.
== Syllabus ==
* Data mining and machine learning
  * Fielded applications
  * Machine learning and statistics
  * Generalization as search
  * Data mining and ethics
* Input: Concepts, instances, attributes
  * What’s a concept?
  * What’s in an example?
  * What’s in an attribute?
  * Preparing the input
* Output: Knowledge representation
  * Decision tables
  * Decision trees
  * Classification rules
  * Association rules
  * Rules with exceptions
  * Rules involving relations
  * Trees for numeric prediction
  * Instance-based representation
  * Clusters
* Algorithms: The basic methods
  * Inferring rudimentary rules
  * Statistical modeling
  * Divide-and-conquer: constructing decision trees
  * Covering algorithms: constructing rules
  * Mining association rules
  * Linear models
  * Instance-based learning
  * Clustering
  * Further reading
* Credibility: Evaluating what’s been learned
  * Training and testing
  * Predicting performance
  * Cross-validation
  * Other estimates
  * Comparing data mining schemes
  * Predicting probabilities
  * Counting the cost
  * Evaluating numeric prediction
  * The minimum description length (MDL) principle
  * Applying MDL to clustering
* Real machine learning schemes
  * Decision trees
  * Classification rules
  * Extending linear models
  * Instance-based learning
  * Numeric prediction
  * Clustering
  * Bayesian networks
* Transformations: Engineering the input and output
  * Attribute selection
  * Discretizing numeric attributes
  * Some useful transformations
  * Automatic data cleansing
  * Combining multiple models
  * Using unlabeled data
  * Further reading
* Moving on: Extensions and applications
  * Learning from massive datasets
  * Incorporating domain knowledge
  * Text and Web mining
  * Adversarial situations
  * Ubiquitous data mining
== Outline ==
* Lectures: 8 × 2 hours
* Assignments: 1
* Seminars: 12 hours
----
CategoryCategory