Ghana 2 – Next Generation Digital Action 2025 – Challenge 4

Empowering citizens through accessible parliamentary records for a more transparent democracy.

Bridging the Gap in Civic Participation

Re-imagining Civic Access: Transforming Ghana’s Parliamentary Hansard through AI

Elisha Soglo-Ahianyo, Kingsley Owusu Agyekum, Sussana Agyekum, Stephanie Davis
Supervised By: Dr. Ing. Alexander Boakye Marful

Top Level Keywords:

Parliamentary Hansard, Optical Character Recognition, AI Powered Solution, Machine Learning

Understanding The Problem

In Ghana’s democracy, Parliament is the nerve center of national governance. It is the institution where laws are crafted, budgets are debated, and public policies are shaped. Yet, despite its central role, the everyday Ghanaian remains disconnected from the very proceedings that govern their lives. The official record of Parliament—the Hansard—exists to bridge this gap, but its current format fails to do so. Often found in lengthy PDFs or stored in printed documents, the Hansard is largely inaccessible, difficult to search, and virtually unreadable for those outside the political or academic elite.​

At the heart of our project is a powerful idea: what if the Hansard could speak in the language of the people? What if it could be transformed from a bureaucratic archive into a dynamic tool for civic participation? As part of the 2025 Next Generation Digital Action (NGDA) Challenge, our team is working to answer that question by designing a platform that makes the Hansard not only digital, but meaningful, contextual, and accessible to all.​

Member of Parliament speaking

Defining the Problem:

The Hansard of Ghana’s 8th Parliament, which contains the official transcripts of parliamentary debates, decisions, and proceedings, remains largely inaccessible to the general public because of its format. Much of it is stored as scanned documents or lengthy, formally structured PDFs that are difficult to search, analyze, or understand, especially for citizens, researchers, and even policymakers who need quick access to specific information. This limits transparency, civic engagement, and data-driven decision-making. Without modern tools to extract key insights, such as who said what, when, and about which topic, valuable parliamentary knowledge remains locked away from those who could use it to promote accountability, participation, and informed discourse. The problem is not merely one of digitization but of intelligent accessibility: how do we transform these records into living, searchable, citizen-friendly information? Addressing this issue is critical to bridging the gap between Parliament and the people it serves.

Approaching the Challenge

Our Methodology

In this initial phase of our work, we have adopted a methodical and context-driven approach. The first step has been to digitize the Hansard using Optical Character Recognition (OCR) tools such as Amazon Textract and Tesseract. These tools extract text from scanned parliamentary documents and prepare it for analysis. Once extracted, the text is cleaned and structured into meaningful units. We segment the content into debates, questions, responses, committee reports, and procedural items, and label each unit with metadata such as speaker names, political affiliations, dates, bill titles, and keywords.
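As an illustration of the segmentation step, the sketch below splits cleaned text into speaker turns and attaches basic metadata. It is a minimal example under stated assumptions, not our production parser: the turn-header pattern (an honorific, a name, an optional affiliation in parentheses, then a colon), the `Turn` record, and the `segment_turns` function are all hypothetical, and real Hansard layouts will need their own rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed (hypothetical) turn header: "Mr Name (Affiliation): first words of the speech"
SPEAKER_LINE = re.compile(
    r"^(?P<speaker>(?:Mr|Mrs|Ms|Dr|Hon\.?)\s+[^:(]+?)\s*"
    r"(?:\((?P<affiliation>[^)]*)\))?\s*:\s*(?P<text>.*)$"
)


@dataclass
class Turn:
    speaker: str
    affiliation: Optional[str]
    date: str
    text: str


def segment_turns(cleaned_text: str, sitting_date: str) -> List[Turn]:
    """Split cleaned Hansard text into speaker turns labelled with metadata."""
    turns: List[Turn] = []
    for raw in cleaned_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            # A new speaker header starts a new turn.
            turns.append(
                Turn(
                    speaker=match["speaker"].strip(),
                    affiliation=match["affiliation"],
                    date=sitting_date,
                    text=match["text"].strip(),
                )
            )
        elif turns:
            # Any other line continues the current speaker's turn.
            turns[-1].text = (turns[-1].text + " " + line).strip()
    return turns
```

Each resulting `Turn` can then be tagged further, for example with bill titles and keywords, before indexing.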

Beyond digitization, our work also involves applying Natural Language Processing (NLP) tools to extract useful information. Using pre-trained models fine-tuned on parliamentary language, we are developing systems that identify key entities, such as Members of Parliament, regions, ministries, and policy areas, and automatically generate plain-language summaries of each session. These summaries will give users a simplified overview of what was discussed, who contributed, and what decisions were made.
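To make the idea of entity extraction concrete, here is a deliberately simple stand-in: a dictionary (gazetteer) lookup rather than the fine-tuned models described above. The `GAZETTEER` contents and the `find_entities` function are illustrative assumptions; a trained NER model would replace the lookup while producing output of a similar shape.

```python
import re
from typing import Dict, List

# Small illustrative gazetteer; a real one would be far larger.
GAZETTEER: Dict[str, List[str]] = {
    "MINISTRY": ["Ministry of Education", "Ministry of Finance"],
    "REGION": ["Ashanti Region", "Volta Region"],
}


def find_entities(text: str, gazetteer: Dict[str, List[str]]) -> List[dict]:
    """Return gazetteer matches in order of appearance in the text."""
    found = []
    for label, names in gazetteer.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append(
                    {"label": label, "text": match.group(0), "start": match.start()}
                )
    return sorted(found, key=lambda entity: entity["start"])
```

A lookup like this cannot handle misspellings, abbreviations, or names it has never seen, which is the main reason to prefer a trained model for the real system.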

Challenges with the existing problem:

We identified two main formats in which Hansard documents appear. Scanned PDFs are image-based and require Optical Character Recognition (OCR) to extract text. Typed PDFs are text-based and directly machine-readable. To handle both formats, we designed a dual-pipeline architecture.
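One way to route a document to the right pipeline is to check how many of its pages carry an embedded text layer. The sketch below assumes the per-page text has already been pulled out by a PDF library (for example, the result of calling pdfplumber's `page.extract_text()` on each page); the `choose_pipeline` function and both thresholds are hypothetical values chosen for illustration.

```python
from typing import List, Optional


def choose_pipeline(
    page_texts: List[Optional[str]],
    min_chars: int = 50,
    min_text_ratio: float = 0.8,
) -> str:
    """Return 'typed' if most pages have an embedded text layer, else 'scanned'.

    page_texts holds the text extracted from each page; image-only pages
    typically yield an empty string or None.
    """
    if not page_texts:
        return "scanned"
    pages_with_text = sum(
        1 for text in page_texts if len((text or "").strip()) >= min_chars
    )
    ratio = pages_with_text / len(page_texts)
    return "typed" if ratio >= min_text_ratio else "scanned"
```

Documents that mix scanned and typed pages fall back to the OCR pipeline under this rule, which costs more processing time but avoids silently dropping image-only pages.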

Tackling the challenge

The scanned Hansards pipeline begins with OCR processing using tools like Tesseract, Google Cloud Vision, and Amazon Textract to convert scanned pages into machine-readable text, with a target accuracy above 95 percent. Extracted text is then cleaned, structured, and formatted with standardized tags such as speaker names, dates, session numbers, topics, and bills. Using NLP techniques, we identify key entities such as MPs, committees, and policy themes. Cleaned text is embedded using Sentence Transformers and stored in a vector store such as FAISS or ChromaDB. Citizens can search in natural language, for example, “What was said about education in 2022?”, and receive AI-generated summaries grounded in official records.
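The embed-then-search idea can be shown with a self-contained toy. The sketch below swaps in word-count vectors and brute-force cosine similarity in place of Sentence Transformers and FAISS or ChromaDB, so it only matches on shared words and does not capture meaning the way a real embedding model does. The `embed`, `cosine`, and `search` functions are hypothetical names used for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence-embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, blocks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k blocks most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(block)), block) for block in blocks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In the real pipeline the vectors would be dense embeddings computed once at indexing time, and the vector store would replace the brute-force loop with an efficient nearest-neighbour lookup.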

The typed Hansards pipeline begins with direct text extraction using Python libraries such as pdfplumber and the LangChain PDF loaders. Documents are split into logical blocks (by speaker, date, or debate topic) to improve precision in search and summarization. We apply a Phi-3 language model, fine-tuned with QLoRA, to create an intelligent chatbot-like interface. A Retrieval-Augmented Generation (RAG) pipeline retrieves the most relevant blocks from the database and uses Phi-3 to generate responses grounded in actual debate content. All of this is made accessible through a citizen-facing website that supports search by MP, bill, date, and topic, with local language support and mobile optimization.
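The grounding step of a RAG pipeline largely comes down to how retrieved blocks are placed into the prompt. The following sketch shows one possible shape, with the language model passed in as a plain function so the example runs without Phi-3. The block fields (`speaker`, `date`, `text`), the prompt wording, and the `build_prompt` and `answer` functions are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def build_prompt(question: str, blocks: List[Dict[str, str]]) -> str:
    """Assemble a prompt that restricts the model to numbered Hansard excerpts."""
    sources = "\n".join(
        f"[{i}] {block['speaker']}, {block['date']}: {block['text']}"
        for i, block in enumerate(blocks, start=1)
    )
    return (
        "Answer the question using only the Hansard excerpts below. "
        "Cite excerpts by their bracketed number. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieved: List[Dict[str, str]],
    generate: Callable[[str], str],
) -> str:
    """Generate a grounded answer; refuse when retrieval returns nothing."""
    if not retrieved:
        return "No matching Hansard passages were found."
    return generate(build_prompt(question, retrieved))
```

Numbering the excerpts lets the interface link each claim in the answer back to the passage it came from, and returning early on an empty retrieval keeps the model from answering without any source material.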

Design Concept:

The technology behind the platform includes a range of tools and frameworks. For OCR, we use Google Cloud Vision, Amazon Textract, and Tesseract. Text extraction is performed with pdfplumber and LangChain. NLP and entity extraction use spaCy and Hugging Face Transformers. Embedding is done with MiniLM, BERT, and Sentence Transformers. We store vector data using FAISS, ChromaDB, or Weaviate. The core language model is Phi-3 from Microsoft, fine-tuned using QLoRA. AI search is implemented through a Retrieval-Augmented Generation (RAG) setup. The frontend is built with either React and Tailwind CSS or Streamlit, while the backend uses FastAPI with Supabase or PostgreSQL.

Imagine a citizen asking, “What did Hon. Samuel Okudzeto Ablakwa say about university funding in 2022?” Our system retrieves the relevant Hansard sections using semantic vector search and sends them to Phi-3, which generates a concise summary grounded in those sections. The answer appears within seconds: readable, traceable, and shareable.

Images of chatbot responding to the question “What did Hon. Samuel Okudzeto Ablakwa say about university funding in 2022?”

Challenging the challenge

Innovative Aspects of Our Solution – Theory of Scaling Science

We will track impact through four indicators: an OCR accuracy rate above 95 percent for scanned documents; a search response time under two seconds; user engagement, measured by repeat visits and average session duration; and adoption by MPs, journalists, and civil society users.
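The OCR accuracy indicator needs a concrete definition before it can be measured. One common choice, assumed here rather than fixed by the project, is character-level accuracy: one minus the character error rate, computed as the Levenshtein edit distance between a hand-checked reference transcription and the OCR output, divided by the reference length. The `edit_distance` and `char_accuracy` functions are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy = 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    error_rate = edit_distance(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - error_rate)
```

Under this definition, a page meets the target when `char_accuracy(reference, ocr_output)` exceeds 0.95 on a manually verified sample.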

We see this platform as the foundation for a broader civic tech ecosystem. We envision open API access for researchers and developers, integration with other datasets such as budgets and SDG tracking, sentiment and policy trend analysis over time, and a mobile app with notifications and parliamentary alerts.

We believe everyone deserves to know what happens in Parliament — not just lawyers or journalists. Whether you’re a teacher, student, farmer, or civil servant, this platform is for you. We’re not just opening documents. We’re opening democracy. Follow our updates, test the platform, share feedback, and help us shape this tool for Ghana and beyond.

The Crew

This project is being driven by a vibrant team with diverse skills. The urban planner conducts systems research and maps information needs. The cybersecurity expert leads backend and web development with security compliance. The architect crafts a clean, human-centered design and mobile UX. The ICT lead integrates NLP workflows and ensures technical cohesion. The telecom engineer optimizes for low-bandwidth users and supports system deployment.

Elisha Soglo-Ahianyo

TEAM LEAD

Kwame Nkrumah University of Science and Technology

MPhil. Cyber-Security and Digital Forensics

A full-stack software developer with experience building cutting-edge software, currently exploring web penetration testing

Kingsley Owusu Agyekum

Kwame Nkrumah University of Science and Technology

BSc. Telecommunication Engineering

A telecom engineering student with a strong passion for wireless communication and computer networking

Stephanie Davies

Kwame Nkrumah University of Science and Technology

MPhil. Architecture

An architectural academic and researcher exploring how sustainability and digital innovation in education can shape more equitable, resilient built environments

Susanna Agyekum

Kwame Nkrumah University of Science and Technology

MPhil. Planning

An urban development, environmental policy, and sustainability enthusiast with over three years of experience