09 September 2024| doi: 10.5281/zenodo.13753450

From theory to practice and back again: A journey in Public Interest AI

In our research group on public interest-oriented AI we not only take a theoretical approach, but also a practical one, developing technical prototypes. One of these is Simba, our tool that automatically simplifies German-language text online. In this blog post, we reflect on our initial thoughts on public interest AI using our experiences from developing Simba.

What’s what

Simba features two AI-supported tools designed to help users better understand German-language online texts. The first is an internet app that simplifies personal texts, while the second is a browser extension that automatically summarises texts on websites. Both use an AI-based language model to automatically simplify German-language texts. Simplification is the act of reducing complexity whilst still maintaining the core message; it involves, for example, replacing longer words with shorter synonyms, shortening sentences, or inserting additional information to make relations between concepts clearer. We have been developing Simba against the backdrop of our six Public Interest AI principles. These principles were formulated towards the beginning of our research project and encompass the criteria ‘justification’, ‘equity’, ‘participatory design/deliberation’, ‘technical standards/safeguards’, ‘open for validation’, and ‘sustainability’, which are elaborated upon in the image below.


But first: why?

“Why” is an important question to ask in the context of public interest AI, reflected in the principles ‘justification’ and ‘equity’. Does the system serve a societal purpose without hindering equity?

The six Public Interest AI principles

The ‘societal purpose’ of simplified language is to give as many different people as possible access to the same information. According to the LEO study, conducted by the University of Hamburg in 2018, around 12% of German-speaking adults in Germany have low literacy skills, which means they are capable of reading and writing simple sentences at most. The target groups of simplified language range from people with cognitive disabilities, to non-native speakers, to children, to non-experts. Whilst we still maintain that certain processes themselves could be systematically simplified – particularly in a bureaucratic context – we believe that providing as much information as possible in simplified language is beneficial for a democratic society and contributes to promoting equity.

An AI-based tool could improve efficiency and support translators in providing a wider range of simplified information. Our research has revealed that the websites of public administrations, the education sector, and the scientific community often exclude a substantial portion of the population from vital information because of the complex language they use.

Let’s get technical

Developing an AI-based tool requires technical knowledge, not just for the success of the tool but also to ensure that no unnecessary risks are created and resources (such as time and money) are not wasted; these points are reflected in the condition ‘technical standards’. As a small team, we are reliant on technology made by other players – Simba is based on a large language model from Meta called Llama-3. The model is available on HuggingFace, a platform that hosts language models, and is often referred to as being open-weight: there is little documentation of the training data or process, yet the model itself is openly available. Whilst using a fully open model would be ideal, Llama-3 is highly efficient and produces outputs of a high quality compared to other models that we tested. We used our own datasets to further fine-tune Llama-3; this means we used simplification data to adjust the model to perform better at this specific task. With the condition of ‘sustainability’ in mind, we used efficient fine-tuning techniques which result in lower emissions. Finally, we performed evaluation in-house, with annotations done by our team, so we could ensure fair working conditions.
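To give a rough sense of why efficient fine-tuning techniques need fewer resources, here is a minimal sketch of the arithmetic behind one common family of such techniques, low-rank adaptation. This post does not say which technique was used for Simba, so the choice of low-rank adaptation, the function names, and the layer dimensions below are all illustrative assumptions rather than a description of our actual setup.

```python
# Illustrative arithmetic only: compares how many parameters are trained when
# one weight matrix is fully fine-tuned versus adapted with two small low-rank
# matrices. Low-rank adaptation is an ASSUMED example technique; the post does
# not name the method used for Simba. All dimensions are hypothetical.


def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters when the frozen matrix is adapted by B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * d_in + d_out * rank


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the full matrix's parameters that the low-rank update trains."""
    return low_rank_params(d_in, d_out, rank) / full_finetune_params(d_in, d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8  # hypothetical layer size and rank
    print("full fine-tuning:", full_finetune_params(d_in, d_out))
    print("low-rank update: ", low_rank_params(d_in, d_out, rank))
    print(f"fraction trained: {trainable_fraction(d_in, d_out, rank):.4%}")
```

With these hypothetical numbers, the low-rank update trains well under one percent of the parameters in that matrix. Fewer trainable parameters generally means less memory for gradients and optimiser state, which is one reason such techniques tend to need less hardware and energy than full fine-tuning.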

Open validation and deliberation

Our thinking behind Simba is open, with blog posts such as this one and research papers in the pipeline. Additionally, our code is on GitHub, our models are accessible on HuggingFace, a subset of our training data is documented and publicly available, and our website is accessible to the public. On our website in particular, we try to adapt information about Simba for a less technically versed audience. We began the project by consulting the existing literature and experts, and by holding an initial consultation with potential stakeholders: a group of researchers who have experience with disability and discrimination. We also explicitly ask for feedback and actively encourage other researchers, professionals and dedicated users from the simplification world to collaborate with us and help us make Simba better. So far, we have exchanged thoughts and ideas with a variety of organisations and potential users. We believe that it is only through collaboration that we can truly have a tool that works for as many different people as possible.

The openness of Simba – including explanations tailored to different target groups – fulfils a necessary prerequisite for participation and deliberation. However, we propose that public interest AI projects need to be further differentiated by type in order to think about how the conditions apply in different contexts, in particular this aspect of participation. Projects realised in the public sector, by mid to large commercial entities, and by small non-profit groups will have different resources available and potentially a different impact. For Simba, a more participatory design process would have been difficult, given the costs of involving stakeholders and users and the intricacies involved in design.

A strong foundation

Overall, our original thoughts on public interest AI provided a valuable foundation for thinking about AI for the public, and since then we have learnt from a variety of public interest AI projects, including our own prototypes. Our experiences with developing Simba have shown us that the principles formulated at the beginning of the project are still relevant, but that slight adjustments would be valuable. In particular, having a justification, having technical safeguards, being open for validation and fostering collaboration are important factors regardless of the size and scope of a project, whereas participation and sustainability should be considered in a more nuanced way. Simba, for example, has been developed in the context of a research project by a small team; including more participatory processes would be difficult given the costs of involving stakeholders/users, and the intricacies involved in (co-)design. Additionally, as is the case for many research projects, success is often measured by academic publications and less so by practical output. Ideally, in the long term, more economic resources and incentives could be dedicated to practical outputs created in research contexts. In the meantime, we hope that this documentation of our insights and lessons learned will help researchers and developers navigate their own public interest AI projects.

This post represents the view of the author and does not necessarily represent the view of the institute itself. For more information about the topics of these articles and associated research projects, please contact info@hiig.de.

Freya Hewett

Researcher: AI & Society Lab
