ISBI - Techniques for Internet Search and Business Intelligence
The course gives insight into techniques for searching and monitoring information on the Internet. After completing the course, students should be able to:
- Compare models of information retrieval and explain their advantages and disadvantages.
- Measure the quality of information retrieval tools.
- Explain the principles and algorithms used by major search engines and apply this knowledge for developing one’s own web documents.
- Explain how Business Intelligence systems work and describe their strengths and weaknesses.
- Write a specification for a Business Intelligence system that fulfills given requirements.
- Explain and choose language technology tools that increase the quality of document retrieval and filtering.
- Use the terminology and concepts of information retrieval and business intelligence.
Fundamentals of Information Retrieval: Boolean, term-weighting, and vector-space text retrieval models; document similarity measures; quality measures (precision and recall); document indexes and their access methods; morphological and semantic analysis in text retrieval.
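As an illustration of the quality and similarity measures named above, the following minimal sketch computes precision and recall for one query and a cosine similarity between two texts. It assumes plain term-frequency vectors and whitespace tokenization; the function names and document identifiers are hypothetical and not taken from the course material.

```python
import math
from collections import Counter


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the search tool
    relevant:  document ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two raw term-frequency vectors.

    Assumption: terms are lowercased, whitespace-separated tokens;
    a real system would normally add stemming and term weighting.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"]))
    print(cosine_similarity("web search engine", "search engine ranking"))
```

Precision penalizes irrelevant results in the answer list, while recall penalizes relevant documents that were missed, so the two are normally reported together.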
Query analysis: Processing the search terms and the index using word stemming, query expansion, fuzzy matching, compound splitting, and compound joining to increase search quality. Another technique is automatic translation of search terms into other languages, which enables cross-language information retrieval.
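Fuzzy matching, one of the techniques listed above, can be sketched with the Levenshtein edit distance: a misspelled search term is matched against index terms that lie within a small number of single-character edits. This is only one possible approach; the function names and the `max_distance` threshold are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(query, index_terms, max_distance=1):
    """Return the index terms within max_distance edits of the query."""
    return [t for t in index_terms if edit_distance(query, t) <= max_distance]


if __name__ == "__main__":
    print(fuzzy_match("serch", ["search", "engine", "crawler"]))
```

A larger `max_distance` finds more candidate terms but also admits more false matches, which raises recall at the expense of precision.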
Information clustering and presentation: Sorting of text flows using automatic and semi-automatic clustering. Automatic document summarization removes redundant information from a document and produces a shorter summary. Multi-document summarization condenses several documents into one. Machine translation is used to present results in the user's native language.
Search Engines: Architecture of a search engine; crawlers and features that hinder crawling; keyword-based retrieval; link analysis and PageRank; optimization of websites for search engines (Search Engine Optimization) and search engine spamming; paid listing; meta-search engines; web directories. Furthermore, there is authoritative information accessible over the Internet that is not visible to ordinary search engines. This material resides on the "invisible web", which largely consists of content-rich databases from universities, libraries, associations, businesses, and government agencies.
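The link-analysis idea behind PageRank can be illustrated with a small power-iteration sketch. It assumes the commonly used damping factor 0.85 and spreads the rank of pages without outgoing links evenly over all pages; the three-page link graph is invented for the example, and production search engines use far more elaborate variants.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to about 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical pages
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In this toy graph, page C is linked from both A and B and therefore ends up with a higher rank than B, which is linked only from A.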
Monitoring tools: News archives and indexing tools, news alerts and agents, and RSS-based news surveillance tools.
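A simple news alert over an RSS feed can be sketched as follows: the feed is parsed and items whose title or description mention a watched keyword are reported. The sample feed, its URLs, and the function name are invented for illustration; a real monitoring tool would download the feed periodically and remember which items it has already reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>New search engine launched</title>
      <link>http://example.org/news/1</link>
      <description>A startup presents a new web crawler.</description>
    </item>
    <item>
      <title>Weather report</title>
      <link>http://example.org/news/2</link>
      <description>Rain expected during the weekend.</description>
    </item>
  </channel>
</rss>"""


def keyword_alerts(feed_xml, keywords):
    """Return (title, link) pairs for feed items mentioning any keyword."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    alerts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        if any(k in text for k in wanted):
            alerts.append((title, item.findtext("link", default="")))
    return alerts


if __name__ == "__main__":
    for title, link in keyword_alerts(SAMPLE_FEED, ["crawler", "PageRank"]):
        print(title, link)
```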
Question-Answering Systems deliver the answer to the question the user has in mind while searching, instead of a ranked list of documents. The three main question-answering approaches are based on Natural Language Processing, Information Retrieval, and question templates.
Study pace: half speed
Credits (p): 7.5
Lectures: 14 lectures x 2 hours
Assignments: 3
Lab sessions: 3 occasions x 2 hours
Lab sessions are carried out in groups of at most two students, and assignments in groups of at most four students. Lab sessions take place at the university at fixed times under the supervision of the course managers. Assignments are done at home, but there are supervision sessions where students can ask questions and get support from the teacher.
Distance students must be present on campus only for the exam; all other tasks can be completed remotely using electronic means of communication. A distance student who is unable to form a group may solve all tasks alone. See http://www.dsv.su.se/~eriks/66BI/66BIdist.html.