This project will be offered in winter term 2023/24. It can be taken as part of either the Bachelor’s or the Master’s program. More information will be provided soon.

Dates and Deadlines:

  • 08.10.2023 Registration deadline for the Project

Content

In this project, a Web Search Engine is to be developed. The core tasks are roughly the following:

  • Implement an HTML Parser.
  • Design and Implement a Web Crawler.
  • Design the required database schema to store the contents of visited pages and the link structure.
  • Write an SQL-based query processor to execute Google-style keyword queries.
  • Devise and create index structures to speed up query processing.
  • Implement alternate query processors using threshold algorithms.
  • Realize alternate methods to compute the score of how well a document matches the query.
  • For this, implement Google’s PageRank algorithm and integrate it into the scoring model.
  • Implement an HTML-based user interface and a Web service.
  • Use the Web services of your fellow students to realize a meta search engine.
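
To give a rough idea of the PageRank task listed above, here is a minimal sketch in Python that computes PageRank by power iteration over a small in-memory link graph. It is only an illustration under stated assumptions, not the solution expected in the project: the function name `pagerank`, the damping factor 0.85, the fixed number of iterations, and the dictionary-based graph are all choices made for this example. In the project, the link structure would come from the database filled by the crawler.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not project requirements.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages without outgoing links spread their rank over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its rank evenly among its link targets.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical four-page link graph, used only for illustration.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(example_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

One possible way to integrate such a score into the scoring model is to combine it with a text-based relevance score, for example as a weighted sum. How the two are weighted is a design decision left to the exercise sheets.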

Prerequisites

  • Participants should have successfully attended the core lecture Datenbanksysteme (database systems) or equivalent.
  • The lecture Information Retrieval and Data Mining is not a prerequisite, but it is recommended for understanding the theory behind most aspects of this project.
  • Having attended the beginner’s course Informationssysteme (information systems), or equivalent, is assumed anyway.

Registration

  • The number of participants is limited.
  • Registration is not done on a first-come, first-served basis.
  • To register, download this JSON template registration file, rename it to yourmatriculationnumber.json, edit it to reflect your information, and send it as an email attachment to Damjan Gjurovski (damjan.gjurovski@cs.rptu.de). Make sure the file is valid JSON and encoded in ASCII or UTF-8; a UTF-8 file must not start with a byte order mark. Please register and send the email from your official university email account: @cs.uni-kl.de or @student.uni-kl.de or @rhrk.uni-kl.de or @rptu….
  • Registration is due by 08.10.2023.
  • Soon after the end of the registration, we will let you know whether or not you got a slot in the project.
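
Before sending the registration file, you may want to check the encoding and JSON requirements stated above. The following Python sketch is one possible way to do so, using only the standard library. The function name `check_registration_bytes` is made up for this example, and the sketch does not check the fields of the template, since those are defined by the template file itself.

```python
import codecs
import json
import sys
from pathlib import Path


def check_registration_bytes(raw):
    """Return None if raw is acceptable, otherwise a short problem description."""
    # A UTF-8 file must not start with a byte order mark.
    if raw.startswith(codecs.BOM_UTF8):
        return "file starts with a UTF-8 byte order mark"
    # ASCII is a subset of UTF-8, so one decoding step covers both encodings.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"file is neither ASCII nor UTF-8: {exc}"
    # The content must parse as JSON.
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"file is not valid JSON: {exc}"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_registration.py yourmatriculationnumber.json
    problem = check_registration_bytes(Path(sys.argv[1]).read_bytes())
    print(problem or "file looks fine")
```

Passing this check only means that the file is well-formed. It does not confirm that the information in it is complete or correct.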

Literature

We will introduce the main concepts of the required techniques/tools when handing out the individual exercise sheets. In addition, the following are standard books for databases and information retrieval you might want to consult. We will also give specific pointers to Web sources during the semester.

  • Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, 2008.
  • Information Retrieval: Implementing and Evaluating Search Engines, by Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack.
  • Datenbanksysteme: Eine Einführung (German), by Alfons Kemper and André Eickler.
  • Database Management Systems, by Raghu Ramakrishnan and Johannes Gehrke.