Theses
Thesis topics
Please note: As I will be leaving the University of Duisburg-Essen on 30 September 2024, I am no longer accepting new theses. I will, of course, continue to supervise all theses already in progress to completion, as promised.
General remarks
In general, these topics are suitable for both bachelor's and master's theses. However, the requirements naturally differ between the levels, e.g., regarding the depth of the literature review, the difficulty of the solution, and the discussion.
While the topics in this document are described in English, theses can be written in either English or German.
To familiarize yourself with our group's research, please visit https://searchstudies.org. When you apply for a topic, I expect that you have already engaged with the relevant literature and our group's work. Many of the topics listed below are connected to projects, in particular RAT (see https://searchstudies.org/research/rat/).
For a general introduction to search engines, I refer you to my book "Suchmaschinen verstehen" (German) or "Understanding Search Engines" (English). Both versions of the book are available electronically through UDE.
Are you interested in a topic from this list?
If you are interested in one of the topics listed below, please send me a short expression of interest (one page) stating why you are interested in this particular topic and which courses you have taken that gave you the knowledge needed to work on it. Please follow the guidelines for writing the expression of interest. The structure of the Micro Article is helpful for structuring your topic, both for the expression of interest and later for the exposé and the thesis itself. Please send your expression of interest to dirk.lewandowski@uni-due.de. I will get back to you within one week. (Since I have recently received several expressions of interest that were obviously AI-generated or AI-polished: please refrain from doing this; it does not help.)
Once your short proposal has been accepted, you must write an exposé in which you outline the thesis and give an initial literature overview. Please follow the guidelines for writing the exposé. If you submit a revised exposé, please mark the passages you have changed. I usually respond with feedback on your exposé within two weeks. Unfortunately, it can sometimes take a little longer (especially towards the end of the semester).
Length of the thesis
To answer this popular question right away: The examination regulations for the Applied Computer Science degree program state: "The bachelor's thesis should generally comprise 30 to 50 pages (100,000 characters)." Similarly, the examination regulations for the master's program state: "The master's thesis should generally comprise 60 to 100 pages (200,000 characters)." If you are enrolled in a different degree program, please follow the examination regulations that apply to you.
Registering the thesis
Once I have accepted your exposé (possibly in a version revised after my feedback), the thesis is registered with the examination office. To do so, please fill in the registration form and send it to me by email and, as a signed original, by post. If you prefer a particular second examiner, please contact that person in advance and ask whether they are willing to act as second supervisor. If you leave the field blank, I will ask a second examiner. As soon as I have received the form, I will forward it to the examination office, which will inform you of the official submission deadline. Please note, however, that the working period starts on the date entered on the form.
How do I write a thesis?
Writing a thesis follows a large number of rules, ranging from the outline to correct citation. I strongly recommend that you familiarize yourself with these rules. Unfortunately, there are repeatedly theses that are good in substance but have to be graded lower because of formal shortcomings. That is a pity, and you can easily avoid it in your own thesis.
There are many books on the subject. Read one of them, for example "Sicher zur Abschlussarbeit in Natur- und Ingenieurwissenschaften" (https://link.springer.com/book/10.1007/978-3-658-36544-8; access via the university library/VPN). The book "Writing for Computer Science" (https://ebookcentral.proquest.com/lib/uni-due/detail.action?docID=1974126; access via the university library/VPN) is aimed specifically at writing in computer science. I recommend it in general, but especially if you are writing your master's thesis. Finally, the systematic literature search also often causes problems; here, the book "Erfolgreich recherchieren - Informatik" (https://doi.org/10.1515/9783110298956; access via the university library/VPN) is one helpful resource.
A helpful tool for better understanding your own topic in the bigger picture is the Micro Article (https://de.slideshare.net/lichtfouse/micro-arten).
Template and citation style
You are not required to use a particular template as long as you meet all requirements. Detailed guidance and a downloadable template are available on the pages of the Intelligent Systems group (https://www.uni-due.de/is/bachelor_und_masterarbeiten_aufbau_einer_abschlussarbeit). Please bear in mind, however, that the structure can differ depending on the topic. For citing literature, I recommend APA Style (https://apastyle.apa.org/); any other common citation style is also fine as long as it is applied consistently.
Code and research data belong in the electronic appendix of your thesis and do not need to be printed. You can, of course, include code excerpts in the main text of the thesis if you want to explain something specific about them.
Office hours
If you would like to discuss a topic with me during office hours, please book an appointment at https://calendly.com/dirklewandowski/.
Open thesis topics
As I will be leaving UDE on 30 September 2024, I am no longer accepting new theses.
Reserved thesis topics
Reserved thesis topic: Topic extraction and query generation from websites
In search engine research, it is often of interest to compare a website’s standing within search engine results to that of other (competing) websites. To increase the evidential value of such research, a system should be developed that extracts relevant keywords / key phrases from webpages within a website that can later be used to query search engines and measure how the website performs relative to competing sites. For instance, when researchers are interested in the representation of political topics in search engines, the system might extract keywords and key phrases from the website of a particular political party. These keywords are later used to query search engines and compare their results automatically (this is not part of the thesis, as this is already implemented in the RAT research software).
The system must be able to crawl a website or parts of a website, depending on the size of the website. Potential topical query words and phrases shall be extracted from the crawled documents. These words and phrases should be ranked by their popularity. Stop word lists shall be used to ensure that the words/phrases can later be used as keywords describing the website's content. The system should work for English and German at least.
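The extraction and ranking step could be sketched as follows. This is only a minimal illustration under stated assumptions: the stop word lists are tiny placeholders, "popularity" is approximated by the frequency of a word or phrase in the crawled text, and the function names (`tokenize`, `candidate_phrases`, `rank_keywords`) are hypothetical. Crawling itself is not shown.

```python
import re
from collections import Counter

# Tiny placeholder stop word lists (assumption); a real system would load
# complete lists for each supported language.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"},
    "de": {"der", "die", "das", "und", "oder", "von", "zu", "in", "für", "ist", "auf", "mit"},
}

# Runs of letters only; works for umlauts and other non-ASCII letters.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def candidate_phrases(tokens, stop_words, max_len=2):
    """Yield all n-grams (n <= max_len) that contain no stop word."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(token in stop_words for token in gram):
                yield " ".join(gram)


def rank_keywords(documents, lang="en", max_len=2, top_k=10):
    """Rank candidate words/phrases by their frequency across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(candidate_phrases(tokenize(text), STOP_WORDS[lang], max_len))
    return counts.most_common(top_k)
```

Frequency within the crawled pages is only a stand-in here; a thesis might instead rank candidates by external popularity data.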
Recommended reading:
- Schultheiß, S.; Lewandowski, D.; Mach, S.; Yagci, N. (2023). Query sampler: generating query sets for analyzing search engines using keyword research tools. PeerJ Computer Science. https://doi.org/10.7717/peerj-cs.1421
- Technical documentation of the Result Assessment Tool. https://doi.org/10.17605/OSF.IO/3Z4DF
This topic is related to the RAT project. If the work is completed successfully and the student wishes, the developed software can be implemented into the RAT software toolkit.
Reserved thesis topic: An investigation of p-Hacking in Interactive Information Retrieval
P-hacking is the malicious practice of tuning the results of studies that use hypothesis testing so that the results reach a certain significance level and the study becomes "publishable". While the practice has been researched in psychology, there are no studies investigating sub-disciplines of computer science, like interactive information retrieval (IIR). In this thesis, the phenomenon of p-hacking should be investigated in that area.
Work to be done (preliminary list):
- Provide a literature review of the problem of p-hacking and studies done so far.
- Describe methods to detect p-hacking in published studies.
- Build a corpus of hypothesis testing studies from interactive information retrieval.
- Test results from this corpus for the occurrence of p-hacking, using standard methods.
- Discuss the results and make suggestions (based on the literature).
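To illustrate what testing a corpus of p-values might look like, here is a minimal sketch of a simple binomial test in the spirit of p-curve analysis. It is a simplification and not the full method: it only compares how many significant p-values fall below alpha/2 with how many fall between alpha/2 and alpha. The function names and the returned dictionary keys are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_curve_binomial(p_values, alpha=0.05):
    """Compare 'low' (p < alpha/2) and 'high' significant p-values.

    Under a flat p-curve, both bins are equally likely. Many low values
    (right skew) point towards evidential value; many high values
    (left skew) are a warning sign for p-hacking.
    """
    significant = [p for p in p_values if p < alpha]
    n = len(significant)
    low = sum(1 for p in significant if p < alpha / 2)
    high = n - low
    return {
        "n_significant": n,
        "n_low": low,
        "n_high": high,
        "p_right_skew": binomial_tail(low, n) if n else None,
        "p_left_skew": binomial_tail(high, n) if n else None,
    }
```

A small `p_left_skew` would suggest that significant results cluster just below the significance threshold; with few studies, the test has little power and should be interpreted with care.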
Recommended reading:
- Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-Curve: A Key to the File-Drawer. Journal of Experimental Psychology: General, 143(2), 534–547. https://doi.org/10.1037/a0033242
- p-Checker: http://shinyapps.org/apps/p-checker
Reserved thesis topic: Identifying affiliate links in webpages
Affiliate links allow website owners to earn a commission when they send a user to a shop or other website and that user buys something there. In addition to, or as an alternative to, advertisements, affiliate links offer an easy way for website owners to monetize their content.
In search engine research, one topic of interest is how commercialized search engine results are. In this thesis, an addition to the RAT software shall be developed. It should identify affiliate links in the HTML code of search results.
Work to be done (preliminary list):
- Provide an overview of affiliate marketing, the business of affiliate links on the web, and search result analysis (literature review).
- Develop software that allows for the upload of batches of webpages, extracts affiliate links from them, and scores each webpage according to its number and prominence of affiliate links.
- Test the approach using a considerable number of pages (test data will be provided).
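The extraction and scoring step might start from a sketch like the following. The heuristics are assumptions chosen for illustration (a handful of query parameter names and the `rel="sponsored"` link attribute); they are neither complete nor taken from RAT, and the sketch only counts affiliate links without modelling their prominence.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# Hypothetical heuristics for illustration only.
AFFILIATE_PARAMS = {"tag", "affid", "aff_id", "affiliate_id", "partner_id"}
AFFILIATE_REL = {"sponsored"}


class LinkCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href:
            rel = set((attr.get("rel") or "").lower().split())
            self.links.append((href, rel))


def is_affiliate(href, rel):
    """Apply the heuristics: sponsored rel value or a known query parameter."""
    query = parse_qs(urlparse(href).query)
    return bool(rel & AFFILIATE_REL) or any(key.lower() in AFFILIATE_PARAMS for key in query)


def score_page(html):
    """Return link counts and the share of links flagged as affiliate links."""
    parser = LinkCollector()
    parser.feed(html)
    hits = [href for href, rel in parser.links if is_affiliate(href, rel)]
    total = len(parser.links)
    return {
        "links": total,
        "affiliate_links": len(hits),
        "share": len(hits) / total if total else 0.0,
        "affiliate_urls": hits,
    }
```

A prominence score could extend this, for example by weighting links according to their position in the document.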
Recommended reading:
- Lewandowski, D.; Sünkler, S.; Yagci, N. (2021). The Influence of search engine optimization on Google’s results: A multidimensional approach for detecting SEO. 13th ACM Web Science Conference, 2021. https://doi.org/10.1145/3447535.3462479
- Technical documentation of the Result Assessment Tool. https://doi.org/10.17605/OSF.IO/3Z4DF
This topic is related to the RAT project. If the work is completed successfully and the student wishes, the developed software can be implemented into the RAT software toolkit.
Reserved thesis topic: To what degree do news websites point to external sources?
The Web is structured through links. An essential assumption underlying the design of the Web is that it is beneficial for content providers and users to have links not only pointing to other pages on the same website but also to external sources. For content providers, linking to external sources allows them to provide contextual information without the need to produce that content themselves. For users, external sources give more information on particular aspects of a topic they may be interested in.
However, anecdotal evidence states that especially popular news websites (like spiegel.de and tagesschau.de) do not link to many external sources (anymore). The question to be addressed in this thesis is whether this actually holds. An empirical investigation will be conducted that analyzes links on news websites.
Work to be done (preliminary list):
- Provide an overview of the idea and importance of linking to external sources, and news websites’ strategies regarding linking (literature review)
- Build a sample of popular news websites, covering at least three countries
- Crawl these websites and analyze the outgoing links, considering the sections on the news websites (e.g., politics, sports)
- Discuss the findings in the context of the Web as a source for a diversity of opinions
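The link analysis could begin with a classification of links as internal or external, as in this minimal sketch. It assumes that a site is identified by its hostname without a leading "www." (a public suffix list would be more robust) and that the section is the first path segment of the article URL; both are simplifications, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urljoin, urlparse


def site_key(host):
    """Crude site identifier: lowercase hostname without a leading 'www.'."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_links(page_url, hrefs):
    """Count internal and external http(s) links found on one page."""
    page_site = site_key(urlparse(page_url).hostname)
    counts = Counter()
    for href in hrefs:
        target = urlparse(urljoin(page_url, href))  # resolve relative links
        if target.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        target_site = site_key(target.hostname)
        internal = target_site == page_site or target_site.endswith("." + page_site)
        counts["internal" if internal else "external"] += 1
    return counts


def section_of(page_url):
    """Use the first path segment as the section (assumption about URL layout)."""
    parts = [part for part in urlparse(page_url).path.split("/") if part]
    return parts[0] if parts else "(home)"
```

Aggregating the counts per section and per website would then give the share of external links needed for the comparison.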
This topic is related to the RAT project. If the work is completed successfully and the student wishes, the developed software can be implemented into the RAT software toolkit.
Reserved thesis topic: Representation of Political Topics in Search Engines
Which sources dominate Google’s and other search engines’ results when users type in political queries? To answer that question, the top results of selected search engines will be analyzed and compared.
Work to be done (preliminary list):
- Provide an overview of search engine source distribution analyses (literature review)
- Build a set of relevant queries (e.g., from Wahl-O-Mat statements, see https://www.bpb.de/themen/wahl-o-mat/45484/archiv/)
- Collect search results for these queries (you will get access to RAT to collect the results)
- Analyze the results. The analysis should include comparing the sources presented in the different search engines, analyzing the most popular sources per search engine and topic, and analyzing search engine optimization (SEO) results. (Data for the SEO analysis will be provided.)
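A first step of the source analysis could look like the following sketch, which counts how often each source appears per search engine. The input format, tuples of (search engine, query, URL), is an assumption for illustration and not the actual RAT export format; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def source_distribution(results):
    """Count result sources (hostnames) per search engine.

    `results` is an iterable of (search_engine, query, url) tuples.
    """
    per_engine = defaultdict(Counter)
    for engine, _query, url in results:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        per_engine[engine][host] += 1
    return per_engine


def top_sources(distribution, engine, n=10):
    """Return the n most frequent sources for one search engine."""
    return distribution[engine].most_common(n)
```

Grouping by topic instead of (or in addition to) search engine works the same way once each query carries a topic label.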
Recommended reading:
- Yagci, N.; Sünkler, S.; Häußler, H.; Lewandowski, D. (2022). A Comparison of Source Distribution and Result Overlap in Web Search Engines. In: Proceedings of the Association for Information Science and Technology, 59: 346-357. https://doi.org/10.1002/pra2.758
- Lewandowski, D.; Sünkler, S.; Yagci, N. (2021): The Influence of search engine optimization on Google’s results: A multidimensional approach for detecting SEO. 13th ACM Web Science Conference, 2021. https://doi.org/10.1145/3447535.3462479
Reserved thesis topic: Evaluation of methods for web scraping to capture all content on a website
In web scraping with applications such as Selenium, Scrapy, or cURL, technical hurdles arise because website operators use mechanisms to detect these applications. If the applications are recognized as bots, CAPTCHAs, for example, are displayed to ensure that the requests are made by real people and not by computer programs. Another challenge in web scraping is that some programs do not execute JavaScript, so dynamic content cannot be captured. The aim of this thesis is twofold: to investigate which mechanisms and programs are best suited to making requests more "human", so that detection mechanisms such as CAPTCHAs do not take effect immediately, and to identify which programs are best suited to automatically capturing as much content as possible on a website, including dynamically generated content. In practice, for example, user agents have so far been randomized.
Main steps:
- Review of programs for web scraping with evaluation of what content can be captured on a website
- Development of methods for designing more "human" queries
- Provision of the most suitable program and methods for comprehensive web scraping
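The user agent randomization mentioned above could be sketched as follows, using only the Python standard library. The user agent strings, the `Accept-Language` value, and the delay parameters are placeholders chosen for illustration; the sketch builds request objects but does not send them.

```python
import random
import urllib.request

# Placeholder user agent strings (assumption); a study would maintain a
# current, documented list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def build_request(url, rng=random):
    """Create a request with a randomly chosen user agent header."""
    headers = {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    }
    return urllib.request.Request(url, headers=headers)


def request_delay(rng=random, base=2.0, jitter=1.5):
    """Return a waiting time in seconds between base and base + jitter."""
    return base + rng.uniform(0, jitter)
```

Passing a seeded `random.Random` instance as `rng` makes a scraping run reproducible, which helps when comparing methods in an evaluation.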
Reserved thesis topic: A machine learning approach to web genre classification
For an assessment of the content on a website, it is helpful to know what type of website it is. Websites can be categorized according to various criteria, such as news sites, online stores, academic offers, etc. Using manually classified pages as a training dataset, machine learning approaches such as decision trees, clustering, and naive Bayes can be used to investigate how a classifier for the content of a website can be developed. Note that ambiguous classifications can occur.
Main steps:
- Development of categories for the content classification of websites (based on a literature review)
- Development of a training data set for the content categories
- Evaluation of the suitability of machine learning methods for the automated classification of websites into categories
- A classifier for the automated recognition of the content category of a website
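As an illustration of one of the approaches named above, here is a minimal multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It is a sketch, not a recommendation for the thesis: the whitespace-free tokenization, the class name `NaiveBayes`, and the `margin` rule for flagging ambiguous cases are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {word for counts in self.word_counts.values() for word in counts}
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for every class."""
        tokens = [token for token in tokenize(text) if token in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text, margin=0.0):
        """Return the best class, or 'ambiguous' if the top two are too close."""
        ranked = sorted(self.log_scores(text).items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= margin:
            return "ambiguous"
        return ranked[0][0]
```

The `margin` parameter is one simple way to surface ambiguous classifications instead of forcing a single label.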
Reserved thesis topic: An analysis of personalization in search engines
Personalization is a way for search engines to improve result quality by tailoring search results to individual users. However, there has been an ongoing debate on whether personalization leads to users seeing only confirming information on debated topics (see, e.g., the discussion on “filter bubbles”). Research does not offer a consistent picture of how and to which degree search engine results are personalized. This thesis will aim to conduct a study on personalization and localization in Google’s search results.
Work to be done (preliminary list):
- Provide an overview of research on personalization in search engines (literature review)
- Use the Datenspende Plug-In (some adjustments will be needed)
- Develop a set of search queries
- Collect data from a broad user sample (distribute the plug-in widely; we can help with recruiting participants)
- Analyze the data in terms of personalization vs localization
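One conceivable way to separate personalization from localization is to compare result overlap between users in the same region with overlap between users in different regions. The following sketch assumes, for illustration, that the collected data for one query is available as a list of (region, result URLs) pairs; this format and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two result lists, ignoring rank."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


def _mean(values):
    return sum(values) / len(values) if values else None


def overlap_within_and_across(observations):
    """Average pairwise overlap within the same region and across regions.

    `observations` is a list of (region, result_urls) for a single query.
    """
    within, across = [], []
    for (region_a, urls_a), (region_b, urls_b) in combinations(observations, 2):
        bucket = within if region_a == region_b else across
        bucket.append(jaccard(urls_a, urls_b))
    return {"within_region": _mean(within), "across_regions": _mean(across)}
```

If overlap is high within regions but lower across regions, the differences are more plausibly due to localization than to individual personalization; rank-sensitive measures would refine this picture.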
Recommended reading:
- Krafft, T. D., Gamer, M., & Zweig, K. A. (2019). What did you see? A study to measure personalization in Google’s search engine. EPJ Data Science, 8(1), 38. https://doi.org/10.1140/epjds/s13688-019-0217-5
This topic is related to the RAT project. If the work is completed successfully and the student wishes, the developed software can be implemented into the RAT software toolkit.
Reserved thesis topic: Extending the classification of SEO indicators
Search engine optimization (SEO), i.e., the optimization of one's own content in order to be listed preferentially by commercial search engines such as Google, has an enormous influence on the results displayed by search engines at the top ranks. In previous work within the framework of the SEO Effect project, a list of indicators with which the presence of SEO on websites can be automatically assessed was compiled and verified in empirical studies.
In this thesis, further indicators already identified should be added to the model and tested for their suitability. Main steps:
(1) Identify suitable indicators.
(2) Develop a system that recognizes these indicators on HTML pages.
(3) Evaluate the system using real search result data. (The data for the evaluation will be provided).
(4) Statistical analysis and interpretation of the results.
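Step (2) could be approached along the lines of the following sketch, which checks an HTML page for a few markup features. The three indicators shown (meta description, canonical link, Open Graph tags) are hypothetical examples chosen for illustration and are not necessarily part of the existing model.

```python
from html.parser import HTMLParser


class IndicatorParser(HTMLParser):
    """Record whether selected (hypothetical) indicators occur in a page."""

    def __init__(self):
        super().__init__()
        self.found = {
            "meta_description": False,
            "canonical_link": False,
            "open_graph": False,
        }

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            is_description = attr.get("name", "").lower() == "description"
            if is_description and attr.get("content", "").strip():
                self.found["meta_description"] = True
            if attr.get("property", "").lower().startswith("og:"):
                self.found["open_graph"] = True
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.found["canonical_link"] = True


def detect_indicators(html):
    """Return a dict mapping indicator names to True/False."""
    parser = IndicatorParser()
    parser.feed(html)
    return parser.found
```

The resulting boolean features per page could then feed into the statistical analysis in step (4).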
If the classifier is successfully completed, it can be included as an analysis component in the software Result Assessment Tool (RAT), if this is desired by the student.
Reserved thesis topic: Who will click on Google ads? A machine learning approach to predict ad clicks on search engine result pages
On search engine result pages, at least two types of results are presented: organic results and advertisements. Studies found that the distinction between these types is often blurred, and that the majority of users cannot reliably distinguish between ads and organic results. This thesis will analyze an existing dataset with data from more than 2,000 German internet users. The outcome of the thesis will be a predictive model that predicts who will click on ads on a search engine result page.
Work to be done (preliminary list):
- Provide an overview of research on search engine advertising with a focus on users’ understanding of ads as a type of search results (literature review)
- Provide a descriptive analysis of the dataset provided
- Clean the data for further processing
- Develop a predictive model
- Evaluate the model with real users
Dataset:
- Schultheiß, S., & Lewandowski, D. (2022). Data set of a representative online survey on search engines with a focus on search engine optimization (SEO): a cross-sectional study. F1000Research, 11(376). https://doi.org/10.12688/f1000research.109662.1
Reserved thesis topic: Visualizing retrieval study results
In information retrieval studies, standard measures like precision, NDCG, etc., are computed to compare information retrieval systems. In this thesis, an add-on to the Result Assessment Tool (RAT; see https://searchstudies.org/research/rat/) should be developed that takes RAT’s output, computes appropriate measures and visualizes the results.
Work to be done (preliminary list):
- Provide an overview of information retrieval measures (literature review)
- Identify appropriate information retrieval measures
- Implement computation and visualization of the metrics
- Develop a user interface that allows users to select appropriate measures and download visualizations. It must be possible to further modify these figures.
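For the computation part, two of the measures mentioned above can be sketched as follows. The input is assumed to be a list of relevance judgments in ranked order (0 meaning not relevant); this format is an assumption and not RAT's actual output. The NDCG variant shown uses a linear gain and computes the ideal ranking from the judged results only. Visualization is not covered.

```python
import math


def precision_at_k(relevances, k):
    """Share of the top-k results judged relevant (relevance > 0)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal ordering of the same judgments."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, `precision_at_k([1, 0, 1, 0], 4)` gives 0.5, since two of the four results are relevant.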
This topic is related to the RAT project. If the work is completed successfully and the student wishes, the developed software can be implemented into the RAT software toolkit.
Reserved thesis topic: Replication of Heinström's study on fast surfers, broad scanners and deep divers
Jannica Heinström published an influential study in 2005 in which she grouped people into fast surfers, broad scanners and deep divers based on their information-seeking behaviour. This division is based on a survey in which the five-factor model (Big Five) was used in addition to scales on information behaviour.
In this thesis, Heinström's study will be critically evaluated and empirically replicated. So far, the applicability of the results to other user groups, the validity of the findings over time and the role of situational versus personal factors have not been sufficiently investigated.
Reserved thesis topic: An App review scraper for RAT
In the RAT project, we scrape different types of web documents. Copies of these documents are stored in a database. However, these are static copies of whole webpages and are not updated. In this thesis, software is to be developed that scrapes reviews from app stores (like Apple's App Store and Google's Play Store). The software should allow for regularly scraping the given URLs for new reviews and adding them to the database. The resulting software will allow researchers to systematically compare reviews and extract relevant information from them.
Work to be done (preliminary list):
- Review the literature on studying user reviews and on extracting reviews.
- Develop software that, given a list of URLs of apps in major app stores, regularly visits these URLs, extracts new reviews, and stores them in a database. Results should be downloadable in a structured format.
- Evaluate the software.
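The requirement to add only new reviews on each visit can be sketched with SQLite as follows. The table layout and field names are hypothetical and not RAT's database schema; the sketch assumes that each store provides a stable identifier per review, and the scraping itself is not shown.

```python
import sqlite3

# Hypothetical schema for illustration; (store, review_id) identifies a review.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    store     TEXT NOT NULL,
    app_url   TEXT NOT NULL,
    review_id TEXT NOT NULL,
    rating    INTEGER,
    body      TEXT,
    PRIMARY KEY (store, review_id)
)
"""


def open_db(path=":memory:"):
    """Open (or create) the review database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_new_reviews(conn, reviews):
    """Insert reviews that are not stored yet; return how many were new.

    `reviews` is a list of dicts with the keys used in the schema above.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO reviews (store, app_url, review_id, rating, body) "
        "VALUES (:store, :app_url, :review_id, :rating, :body)",
        reviews,
    )
    conn.commit()
    return conn.total_changes - before
```

Because duplicates are ignored at the database level, the scraper can re-visit a URL at any interval without tracking which reviews it has already seen.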
This topic is related to the RAT project. If the work is completed successfully and the student wishes, the developed software can be implemented into the RAT software toolkit.