
Colombia 2 – HansUp

Challenge 4: Transforming citizen engagement in Ghana through AI and parliamentary data

Ghana is recognized as one of the most stable democracies in West Africa. It operates as a presidential republic with a tripartite structure: executive, legislative, and judicial powers, combining elements of customary law and the Anglo-Saxon legal system. Despite its democratic progress, Ghana faces challenges regarding transparency and access to information. Parliamentary debates, officially recorded in the Hansard, are published in flat, hard-to-navigate PDF formats. These documents are essential for political oversight and decision-making, but their current structure prevents efficient searches, automated analysis, and data visualization—limiting citizen participation, journalistic work, and academic research.

In this context, the Ghana Statistical Service (GSS), in partnership with the Parliament of Ghana, is participating in NGDA 2025 to lead a digital innovation challenge. The goal is to develop an artificial intelligence-based solution that uses technologies such as OCR and language models to digitize and structure the parliamentary Hansard. This will enable fast searches, intuitive information organization, and easier visualization through a platform accessible to legislators, journalists, researchers, and the general public.

Challenge Context

Background

Since 2024, the Ghana Statistical Service (GSS) has been working with the Embassy of Denmark and Statistics Denmark through the Strategic Sector Cooperation (SSC). This partnership gives GSS opportunities like joining the NGDA initiative and using new technologies to improve data processes in Ghana.

By law, GSS collects and publishes data to support policies and citizen engagement. While it has mainly focused on quantitative data, it is now expanding into qualitative analysis with advanced data science methods.

The partnership focuses on the large amount of qualitative information contained in parliamentary proceedings. Organizing this data is complex, but GSS plans to use data science tools to extract and structure it, making parliamentary records easier to access.

This project will process the Parliamentary Hansard—the official record of debates, bills, budgets, and oversight discussions—by extracting, structuring, and analyzing its content. The aim is to provide insights that support policymaking, improve governance, and encourage citizen participation.

Understanding the Problem

This challenge focuses on transforming citizen engagement in Ghana by making parliamentary records more accessible through large language models. The project seeks to digitize the Parliamentary Hansard using AI and Optical Character Recognition (OCR), turning paper-based records into structured, searchable, and user-friendly data. By improving accessibility and usability, this initiative will not only help policymakers integrate data and statistics more effectively into legislative processes but also empower citizens to better engage with parliamentary debates, decisions, and oversight.

Defining the problem

The problem lies in the lack of a digital solution that ensures long-term viability, accessibility, secure data handling, and efficient classification of information. Current systems often fail to update and adapt to different devices, making scalability and portability limited. Accessibility remains a challenge, as interfaces are not always easy to read, intuitive, or inclusive of both online and offline modes, visual aids, and intelligent search options. At the same time, data handling requires strict policies and reliable verification methods, such as biometrics, to guarantee ethical use and security. Finally, information is often poorly organized, lacking clear topic identification, effective filters, and coherent classification methods. These gaps highlight the need for a user-friendly, secure, and scalable system that can enhance usability, trust, and meaningful engagement with digital records.

Requirements specification

The proposed system for digitizing and structuring the Ghanaian Parliamentary Hansard is based on four core pillars: sustainability, accessibility, security, and efficient information management.

SUSTAINABILITY

The solution must support long-term use by allowing regular updates and ensuring compatibility across different devices. It should be designed with scalability and portability in mind to remain functional as technology evolves.

ACCESSIBILITY

To ensure broad usability, the platform must be easily readable, offer visual aids, support both online and offline access, and include an intuitive interface and intelligent search features.

SECURITY

The system must manage data ethically and securely, implementing encrypted communications and user credentialing. It should be resilient against disruptions while ensuring responsible access to sensitive legislative data.

INFORMATION MANAGEMENT

The platform must allow the classification of discussions by topic, offer efficient organization, and include advanced search filters. It should support automated topic modeling to structure data meaningfully and help users find relevant information quickly.

OUR SOLUTION: HansUp

How might we digitize parliamentary registers and develop user-friendly platforms that enhance the use of data and statistics in legislative processes?

Methodology

The team approached the challenge through a structured design process. The first step was to frame the problem by asking "How Might We" questions, which opened space for different perspectives and uncertainties around parliamentary documentation and citizen engagement.

Next, the team created an affinity diagram to answer the question "What could be the most optimal way to document parliamentary sessions?" Insights were organized into three categories: accessibility, optimal representation, and information and data management. From this exercise, the Design Specification table was developed, outlining requirements, criteria, and measures to guide the solution.

Building on this foundation, the team generated a catalog of ideas, analyzing possible solutions in terms of functionality, strengths, weaknesses, and expected benefits. This step allowed the comparison of alternatives while keeping in mind the challenges of complexity, data classification, and costs.

Finally, the process concluded with the Concept Description table, which refined the proposal into a concrete solution. At this stage, the value proposition, target users, technologies, and benefits were clearly defined, ensuring the project directly addressed issues such as extended search times, difficulty in analysis, and lack of accessibility in parliamentary records.

Concept

The project proposes the development of a lightweight mobile application that digitizes and restructures the Parliamentary Hansard using artificial intelligence. By leveraging OCR technology and natural language processing, the application would extract and classify the content of debates, enabling intuitive navigation through filters, search tools, and visual dashboards. The solution aims to be available both online and offline. This user-friendly platform transforms Hansards into an accessible resource, promoting timely, inclusive, and informed engagement with parliamentary data.

How does it work?

Here, a video demonstration showcasing the mock-ups of our application is presented.

The application, HansUp, offers login and registration for users who wish to access the legal information contained in the Hansards. This registration is intended to keep a record of the people who access this data and to ensure the security of the information processed there.

Once logged in, users are directed to the Hansards search section, which displays an organized list of records sorted chronologically in ascending order. The main objective of this interface is to provide a structured and intuitive overview of the available data, functioning as an index that allows users to easily navigate through the records and identify those of particular interest.

To refine their searches, users can apply two types of filters — either independently or in combination — designed to enhance precision and usability:

Date Filter: Enables users to locate Hansards corresponding to a specific time frame, particularly useful when the user wishes to review sessions held within certain dates without necessarily knowing the topics discussed.

Keyword Filter: Allows users to perform searches based on keywords, which are matched against the Motion and Conclusion sections. The filtering algorithm incorporates semantic similarity, meaning that results may include related terms or topics even if the exact keyword is not explicitly present in the text. This ensures that responses are retrieved efficiently and with contextual relevance.
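The two filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `Hansard` record, the `RELATED_TERMS` table, and the function names are hypothetical, and a hand-written table of related terms stands in for the semantic similarity model, which the proposal does not specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set


@dataclass
class Hansard:
    # Hypothetical record shape; only the fields the filters need.
    session_date: date
    motion: str
    conclusion: str


# Assumption: a tiny related-terms table replaces a real semantic model.
RELATED_TERMS = {
    "education": {"education", "school", "schools", "teachers"},
    "health": {"health", "hospital", "hospitals", "clinic"},
}


def tokens(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,;:()").lower() for word in text.split()}


def matches_keyword(record: Hansard, keyword: str) -> bool:
    """Match the keyword, or any related term, in Motion or Conclusion."""
    wanted = RELATED_TERMS.get(keyword.lower(), {keyword.lower()})
    searchable = tokens(record.motion) | tokens(record.conclusion)
    return bool(wanted & searchable)


def filter_hansards(
    records: List[Hansard],
    start: Optional[date] = None,
    end: Optional[date] = None,
    keyword: Optional[str] = None,
) -> List[Hansard]:
    """Apply the date and keyword filters independently or together."""
    selected = []
    for record in records:
        if start and record.session_date < start:
            continue
        if end and record.session_date > end:
            continue
        if keyword and not matches_keyword(record, keyword):
            continue
        selected.append(record)
    # Ascending chronological order, as in the search screen.
    return sorted(selected, key=lambda record: record.session_date)
```

In this sketch, a search for "education" would also return a record whose Conclusion mentions only "teachers", which is the behavior the Keyword Filter aims for; a production system would likely replace the lookup table with text embeddings.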

Beyond search and filtering, the interface presents three core components that facilitate a rapid assessment of each record’s relevance:

  • Motion Section: Summarizes the main topic or objective of the session held on a specific date. It provides a concise description of the central discussion point, enabling users to quickly understand the subject matter before deciding whether to explore the record in greater detail.
  • Conclusion Section: Offers an overview of the session’s outcomes, including a chronological summary of the deliberations and the most relevant events. This section helps users grasp the flow and structure of the session without needing to read the full document.
  • Agreed/Negative Section: Indicates whether the motion presented during the session was approved or rejected. This status helps users immediately identify the final outcome of the deliberation.

The mock-ups presented below depict the layout and functional components of the application’s initial screen.

Finally, once users identify a Hansard of interest, they may select it to access an expanded view that contains detailed information about the session.

First, the date of the session is displayed, followed by a title summarizing the main objective or topic, and a brief description that expands upon and provides additional detail to the information previously presented in the Motion section.

On this screen, three subsections are displayed, which are detailed below.

TOPIC SUMMARY

This section presents the topics discussed during the parliamentary session in chronological order, along with relevant information for their interpretation and analysis. Each topic card is designed to provide a concise yet informative overview of the discussion’s structure and content.

At the top of each card, the title highlights the central theme addressed in that segment of the session, followed by a brief description summarizing the key points and arguments raised by the participants. This description offers users an immediate understanding of the nature and scope of the debate surrounding that topic.

At the bottom of the card, a progress bar visually represents the proportion of total session time devoted to the discussion of that topic, offering an intuitive grasp of its relative importance within the session. Meanwhile, in the upper right corner, the interface displays the members of parliament who contributed to the discussion, providing context on participant engagement and perspectives.

The primary purpose of this component is to deliver a structured and accessible overview of the debate’s composition. By identifying the main issues addressed and the participants involved, users can rapidly infer the parliament’s priorities, stances, and recurring interests related to the topic of inquiry.

Users can select a topic of interest to access more detailed information about it. This screen first displays the title of the Hansard to which the topic belongs, along with the main theme under discussion. It then provides a more comprehensive summary of the debate, highlighting the key arguments and developments that took place.

Additionally, the interface includes sections for keywords and top speakers, which allow users to identify the most relevant concepts and the members who demonstrated the highest level of participation or influence during the discussion.

At the bottom of the screen, the comments and contributions of each member are displayed in chronological order. For every participant, the interface presents their name, photograph, and a concise summary of their intervention within the debate.

When a user selects a specific speaker, a detailed profile of that member is shown, including a brief biographical overview and a summary of their main areas of interest. This feature allows users to gain contextual awareness of who the speaker is and to better understand their perspective and role in the discussion.

The following are the mock-ups illustrating the functional flow of the Topic Summary feature.

MOTIONS

The Motions section mirrors the purpose and structure of the Topic Summary feature. It displays the motions discussed during the selected session, each represented by a card containing the title, concise description, proposing member, and a status indicator denoting whether the motion was approved or rejected. This component is designed to offer a succinct and structured overview of the motions debated, enabling users to quickly grasp the session’s key legislative actions and outcomes.

When a user selects a motion of interest, the interface—similar to that of the Topic Summary view—displays more detailed information regarding the purpose of the motion, accompanied by keywords, top speakers, and a chronological overview of the discussion. At any point, users may also access detailed information about a specific member of parliament, allowing for a more comprehensive understanding of their participation and contributions within the debate.

TIMELINE

The objective of this section is to transform the textual and qualitative content of the Hansards into quantitative and numerical data, thereby facilitating the interpretation of dialogues and debates. This analytical layer provides an additional perspective that enhances the understanding of both the nature of parliamentary sessions and the flow of the discussions that take place within them.

This interface includes two sub-screens, offering either a general overview or a more detailed view of the available statistics.

The general overview presents a chronological summary of the session, divided into time intervals of approximately ten minutes each. Within every time frame, the interface highlights the total number of interventions and the members of parliament who participated. This visualization allows users to identify, at a glance, the most active or critical moments of the debate—those characterized by a higher density of participation—and to observe which members were most engaged during those key segments.

Moreover, this design enhances navigability by enabling users to efficiently locate and access the specific moments in which topics of personal or analytical interest were discussed during the session.

The Specific Statistics sub-screen presents data corresponding to each ten-minute time window of the session. For each interval, it displays the number of interventions per member and a summary of the overall discussion within that period. This summary does not provide speaker-level detail; rather, it offers an abstract and aggregated representation of the conversational dynamics. The intent is to capture the density and flow of exchanges, highlighting key moments in the debate and facilitating the identification of specific time frames—such as when a particular idea or theme was discussed—without requiring detailed knowledge of who proposed it.
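The ten-minute windows used by both sub-screens amount to bucketing interventions by their offset from the start of the session. The sketch below assumes each intervention is available as a pair of (minutes from session start, speaker name); that input shape and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_MINUTES = 10


def interventions_per_window(
    interventions: Iterable[Tuple[float, str]],
) -> Dict[int, Dict[str, int]]:
    """Count interventions per member within each ten-minute window.

    Keys of the result are window start offsets in minutes (0, 10, 20, ...).
    """
    windows = defaultdict(Counter)
    for minute, speaker in interventions:
        window_start = (int(minute) // WINDOW_MINUTES) * WINDOW_MINUTES
        windows[window_start][speaker] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


def busiest_window(windows: Dict[int, Dict[str, int]]) -> int:
    """Return the start offset of the window with the most interventions."""
    return max(windows, key=lambda start: sum(windows[start].values()))
```

The per-window totals feed the general overview, while the per-member counts inside each window correspond to the Specific Statistics sub-screen.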

If users wish to obtain speaker-specific information or a deeper account of how a general idea evolved, they can refer to the previously described subsections — Topic Summary and Motions — which provide more granular insights into the flow of arguments and participants’ contributions.

In summary, the Timeline section is designed to offer a macro-level understanding of the debate’s structure and temporal evolution, serving as a guide for where to begin exploration, while the Topic Summary and Motions sections provide the micro-level detail necessary to analyze the development of ideas and parliamentary discourse.

Finally, we believe there is significant potential in generating numerical data from the Hansards, as it can give rise to valuable analytical and research applications. For instance, the resulting data could be used to train predictive models aimed at forecasting the dynamics and behavioral patterns of future parliamentary sessions. This transformation of qualitative debate records into structured data not only enhances interpretability but also opens new avenues for computational analysis, trend identification, and data-driven decision-making within the study of legislative processes. Of course, these are merely conceptual ideas that are beyond the scope of the present project.

HOW WILL THE SYSTEM BE DEVELOPED?

Below, we present a component view of the architecture designed for the project.

MIRO VIEW ARCHITECTURE

This architectural proposal aims to meet both the functional and non-functional requirements of HansUp. The team decided to divide it into three layers, each with a well-defined and isolated responsibility.

EXTRACTION LAYER

The primary function of this layer is to collect and clean the unstructured data extracted from the Hansards. Through a dedicated web scraping service, the system gathers all the information published on the official website of the Parliament of Ghana. Once the data is collected, it is processed by two specialized components designed to clean and refine the extracted content.

Since Hansards are published in both scanned and plain-text formats, two processing components are employed, one for each data type. The OCR Processing component handles scanned documents, extracting and cleaning their textual content, while the NLP Processing component handles the cleaning and linguistic normalization of plain-text documents.

The outcome of these processes is a unified corpus of plain text, free from redundant or irrelevant information that is often present in the original Hansards. Given that Hansards, by their nature, are highly detailed and include procedural and contextual minutiae, this post-processed text is optimized for subsequent analysis and information retrieval.
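The routing between the two processing components could look roughly like the sketch below. It is illustrative only: the function names are hypothetical, the OCR engine is passed in as a callable because the proposal does not name one, and the cleaning rules (dropping page-number lines, collapsing whitespace) are simple placeholders for the real normalization steps.

```python
import re
from typing import Callable, List, Optional


def clean_text(raw: str) -> str:
    """Drop blank and page-number lines, and collapse repeated whitespace."""
    kept = []
    for line in raw.splitlines():
        line = line.strip()
        # Lines such as "12" or "Page 3" are treated as layout residue.
        if not line or re.fullmatch(r"(page\s+)?\d+", line, flags=re.IGNORECASE):
            continue
        kept.append(re.sub(r"\s+", " ", line))
    return "\n".join(kept)


def extract_hansard_text(
    embedded_text: Optional[str],
    page_images: List[object],
    ocr: Callable[[List[object]], str],
) -> str:
    """Route a document to the plain-text path or the OCR path."""
    if embedded_text and embedded_text.strip():
        # Plain-text document: normalization only.
        return clean_text(embedded_text)
    # Scanned document: recognize the text first, then normalize it.
    return clean_text(ocr(page_images))
```

Both paths end in the same cleaning step, which is what yields the unified plain-text corpus regardless of the source format.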

The cleaned and standardized data is stored in a non-relational database implemented in Cassandra. The selection of Cassandra is motivated by its suitability for managing large-scale text data efficiently, as it provides distributed storage and high scalability. Furthermore, due to the substantial data volume and the need for high throughput, all communication between the components in this layer is handled asynchronously through Kafka.
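The asynchronous hand-off between components can be illustrated without any infrastructure. In the sketch below, an `asyncio.Queue` stands in for a Kafka topic and a Python list stands in for the Cassandra table; both substitutions, and all function names, are assumptions made for illustration. A real deployment would use Kafka and Cassandra client libraries instead.

```python
import asyncio
from typing import List, Optional


async def scraper(out_queue: asyncio.Queue, documents: List[str]) -> None:
    """Producer: publish each scraped document, then signal completion."""
    for document in documents:
        await out_queue.put(document)
    await out_queue.put(None)  # sentinel meaning "no more documents"


async def cleaner(in_queue: asyncio.Queue, store: List[str]) -> None:
    """Consumer: normalize whitespace and persist each document."""
    while True:
        document: Optional[str] = await in_queue.get()
        if document is None:
            break
        store.append(" ".join(document.split()))


async def run_pipeline(documents: List[str]) -> List[str]:
    """Run producer and consumer concurrently over a bounded queue."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    store: List[str] = []
    await asyncio.gather(scraper(queue, documents), cleaner(queue, store))
    return store
```

The bounded queue shows the main benefit of the asynchronous design: the scraper and the cleaner proceed at their own pace, and a slow consumer applies back-pressure rather than losing documents.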

TRANSFORMATION LAYER

Once the Hansard data has been processed into clean plain text, the Transformation Layer is responsible for converting this information into lighter, summarized, and semantically structured data that supports the functional requirements of HansUp. This layer is composed of three classification components, each designed to process and transform the textual data to feed three graph-based databases implemented in Neo4j. Once again, all communication between the components in this layer is handled asynchronously through Kafka.

The Chronological Classifier extracts the chronological details of parliamentary sessions for specific dates. It identifies and structures information related to the temporal sequence of events, which is later used to present session timelines and generate time-based statistics. The Conversational Classifier focuses on identifying the flow of interactions during the sessions—specifically, which speakers participated and the subjects of their interventions. Meanwhile, the Topic-Based Classifier is responsible for producing summaries, identifying discussion topics, and extracting motions or resolutions debated in each Hansard.

The purpose of these classifiers is to structure the unorganized textual content into explicit, meaningful, and computationally accessible data representations. In other words, they convert plain text into a semantically rich structure that captures the logical and conversational flow of parliamentary debates. The outputs of these classifiers are stored in Neo4j databases. Neo4j was selected because its graph-based structure preserves the semantic relationships inherent to conversational exchanges and provides a lightweight yet expressive representation of the complex interconnections identified by the classifiers.
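To make the graph representation concrete, the sketch below builds a small in-memory graph from classified speaker turns. It is not Neo4j code: the node labels (`Member`, `Topic`), the `SPOKE_ON` relationship, and the function names are hypothetical, chosen only to show the kind of structure the classifiers might emit.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (speaker, relationship, topic)


def build_debate_graph(
    turns: List[Tuple[str, str]],
) -> Tuple[Set[Tuple[str, str]], Dict[Edge, int]]:
    """Turn (speaker, topic) pairs into labeled nodes and weighted edges."""
    nodes: Set[Tuple[str, str]] = set()
    edges: Dict[Edge, int] = defaultdict(int)
    for speaker, topic in turns:
        nodes.add(("Member", speaker))
        nodes.add(("Topic", topic))
        # Edge weight counts how often the member spoke on the topic.
        edges[(speaker, "SPOKE_ON", topic)] += 1
    return nodes, dict(edges)


def top_speakers(edges: Dict[Edge, int], topic: str, limit: int = 3) -> List[str]:
    """Rank members by number of interventions on a topic."""
    counts = [
        (count, speaker)
        for (speaker, _relation, edge_topic), count in edges.items()
        if edge_topic == topic
    ]
    # Most interventions first; ties broken alphabetically.
    counts.sort(key=lambda pair: (-pair[0], pair[1]))
    return [speaker for _count, speaker in counts[:limit]]
```

A query such as `top_speakers` maps naturally onto the "top speakers" element of the Topic Summary and Motions screens, which is one reason a graph model fits this data.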

Together, the Extraction and Transformation Layers address the project’s requirements for accessibility and information management. They enable the conversion of unstructured, large-scale Hansard data into lightweight, portable, and expressive datasets that can be efficiently stored and consumed by low-resource clients or even accessed offline, when part of the data is stored locally.

APPLICATION LAYER

The Application Layer is responsible for managing the design and orchestration mechanisms that enable the cleaned and processed data—originating from the preceding layers—to be consumed, analyzed, and delivered to the end user. This layer incorporates two architectural patterns, Microservices and Backend-Driven UI (BDUI), along with an infrastructure tactic, Reverse Proxy.

The microservices architecture decouples the application logic into independent components, enhancing scalability and maintainability. Given the high volume of textual data processed in the project, each microservice handles a specific subset of functionalities, ensuring lightweight operation and clear responsibility boundaries that contribute to the overall performance of the system. An authentication service is also implemented to manage user registration and login processes securely.

The Backend-Driven UI (BDUI) pattern fosters flexibility and sustainability by delegating the control of the user interface to the backend. Within this architecture, a dedicated UI Provider component acts as an orchestration layer—fetching data from the various services and invoking the Backend Content UI module to render the corresponding screens. The Backend Content UI component centralizes all graphical resources, ensuring consistency, reusability, and a unified visual design. Since these graphical elements are managed server-side, updates to the user interface can be deployed dynamically without requiring users to install new releases.
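A minimal sketch of the Backend-Driven UI idea follows. The backend describes a screen as data, and the client only knows how to draw generic component types. The payload schema, the component type `card`, and the function names are all hypothetical; the proposal does not define a wire format.

```python
import json
from typing import Dict, List


def build_search_screen(hansards: List[Dict[str, str]]) -> Dict:
    """Backend side: describe the search screen as data, not client code."""
    return {
        "screen": "hansard_search",
        "components": [
            {
                "type": "card",
                "title": hansard["motion"],
                "subtitle": hansard["date"],
                "badge": hansard["status"],
            }
            for hansard in hansards
        ],
    }


def to_wire(screen: Dict) -> str:
    """Serialize the screen description for transport to the client."""
    return json.dumps(screen)


def render(payload: str) -> str:
    """Client side: a generic renderer that only knows component types."""
    screen = json.loads(payload)
    lines = []
    for component in screen["components"]:
        if component["type"] == "card":
            lines.append(
                f"[{component['badge']}] {component['subtitle']} {component['title']}"
            )
    return "\n".join(lines)
```

Because the layout travels in the payload, changing which fields a card shows requires only a backend change, which is the property that lets interface updates ship without a new app release.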

Finally, the Reverse Proxy component enhances the system’s security and reliability. By providing encryption and masking the internal logic and service calls, it exposes a single secure endpoint to the mobile client, thereby reducing the attack surface and mitigating potential threats. Consequently, the proposed architectural design ensures compliance with the previously defined functional and non-functional requirements.

TARGET GROUP

The primary users of this solution include members of parliament, data analysts in the Ghanaian Parliament’s Data Management Department, media professionals, researchers, civil society organizations, NGOs, and academia. These groups frequently consult Hansards for insights into legislative processes, debates, and decision-making, yet face challenges due to the current document format and lack of digital accessibility.

STRENGTHS, OPPORTUNITIES AND BENEFITS

This solution enhances transparency and citizen participation by improving access to vital legislative information and enabling more efficient data management. Through graphical elements, it supports diverse learning preferences and helps both internal and external users quickly retrieve and comprehend parliamentary content. It reduces the time required for information retrieval, improves clarity, and makes key insights from debates more actionable, thereby supporting better decision-making, media reporting, academic research, and citizen oversight. For Members of Parliament, it offers clear visualizations of debate trends and speaker participation, aiding in legislative planning and preparation. The ability to update records after each session ensures that users access the most current data, encouraging reuse, knowledge continuity, and meaningful public engagement that reinforces democratic participation and informed citizenship.

Business Model Canvas

OUR TEAM

Jose Simón Ramos Sandoval

Universidad Nacional de Colombia

Systems and Computing engineering


Sofía Osejo Gallo

Universidad Nacional de Colombia

Electronics engineering


Danna Carolina Caballero Cañón

Universidad Nacional de Colombia

Architecture
