FinTech / Automation

Internal Tool / Client Deployment

A document processing pipeline that extracts, classifies, and structures transaction data from M-Pesa statements, removing manual data entry from reconciliation.

Most businesses in Kenya run significant transaction volumes through M-Pesa. The statements those transactions generate are structured PDFs — readable by humans, but not by accounting systems or business intelligence tools. We built the extraction and processing layer that changes that.

The problem

A financial services company was receiving M-Pesa statements from hundreds of agents on a monthly cycle. Each statement contained hundreds of transactions. A team extracted the data by hand: copying figures from PDFs into spreadsheets, classifying transaction types, and building the monthly reconciliation report.

The process took three days per month. It was prone to transcription errors. Those errors had downstream effects on agent commissions, compliance reporting, and client billing. And the volume was growing — more agents meant more statements, which meant more manual work.

The problem was not the data. The data was all there. The problem was that no tool existed to read it in the format it came in.

What we built

We built a document processing pipeline that ingests M-Pesa PDF statements, extracts every transaction, and outputs structured data ready for analysis and reconciliation. The extraction handles the variation in M-Pesa statement formatting across different account types and date ranges.
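As an illustration of the extraction step, here is a minimal sketch of turning statement text into structured records. It assumes the PDF has already been converted to plain text by a separate tool, and that each transaction sits on one line in the order receipt, completion time, details, status, paid in, withdrawn, balance. That row layout, the `Transaction` fields, the `parse_statement_text` name, and the sample rows are all assumptions made for this example, not the production parser or real data.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Iterator

# Assumed row layout (hypothetical, for illustration only):
# <receipt> <YYYY-MM-DD HH:MM:SS> <details> Completed <paid in> <withdrawn> <balance>
ROW = re.compile(
    r"^(?P<receipt>[A-Z0-9]{10})\s+"
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<details>.+?)\s+Completed\s+"
    r"(?P<paid_in>-?[\d,]+\.\d{2})\s+"
    r"(?P<withdrawn>-?[\d,]+\.\d{2})\s+"
    r"(?P<balance>-?[\d,]+\.\d{2})$"
)


@dataclass(frozen=True)
class Transaction:
    receipt: str
    completed_at: datetime
    details: str
    paid_in: Decimal
    withdrawn: Decimal
    balance: Decimal


def _money(text: str) -> Decimal:
    """Convert '1,500.00' style text to an exact Decimal."""
    return Decimal(text.replace(",", ""))


def parse_statement_text(text: str) -> Iterator[Transaction]:
    """Yield one Transaction per line that matches the assumed row layout."""
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip headers, footers, and anything that is not a row
        yield Transaction(
            receipt=match["receipt"],
            completed_at=datetime.strptime(match["time"], "%Y-%m-%d %H:%M:%S"),
            details=match["details"],
            paid_in=_money(match["paid_in"]),
            withdrawn=_money(match["withdrawn"]),
            balance=_money(match["balance"]),
        )


# Made-up sample rows, not real statement data.
SAMPLE = """\
Receipt No. Completion Time Details Transaction Status Paid In Withdrawn Balance
QAB1CD2EF3 2024-01-05 09:14:02 Customer Transfer to 0712XXX345 Completed 0.00 -1,500.00 8,250.00
QAB1CD2EF4 2024-01-05 11:30:45 Funds received from 0798XXX210 Completed 2,000.00 0.00 10,250.00
"""

transactions = list(parse_statement_text(SAMPLE))
```

Amounts are kept as `Decimal` rather than `float` so that reconciliation totals do not pick up rounding drift. Lines that do not match are skipped silently here; a real pipeline would more likely log them for review.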

Transactions are classified automatically: by type (send money, receive money, buy goods, pay bill, withdrawal), by amount range, and by counterparty type where identifiable. The classified output feeds directly into the reporting layer.
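One simple way to approach this kind of classification is a rule table over the transaction description, plus fixed amount bands. The sketch below is illustrative only: the keywords, the band thresholds, and the names `classify_type` and `amount_band` are assumptions for the example, not the rules used in the deployed pipeline.

```python
from decimal import Decimal
from enum import Enum


class TxType(str, Enum):
    SEND_MONEY = "send money"
    RECEIVE_MONEY = "receive money"
    BUY_GOODS = "buy goods"
    PAY_BILL = "pay bill"
    WITHDRAWAL = "withdrawal"
    UNKNOWN = "unknown"


# Hypothetical keyword rules, checked in order; the first match wins.
KEYWORD_RULES = [
    ("pay bill", TxType.PAY_BILL),
    ("merchant payment", TxType.BUY_GOODS),
    ("withdrawal", TxType.WITHDRAWAL),
    ("funds received", TxType.RECEIVE_MONEY),
    ("customer transfer", TxType.SEND_MONEY),
]

# Hypothetical amount bands as (exclusive upper bound, label).
AMOUNT_BANDS = [
    (Decimal("1000"), "under 1,000"),
    (Decimal("10000"), "1,000 to under 10,000"),
]
TOP_BAND = "10,000 and over"


def classify_type(details: str) -> TxType:
    """Return the first transaction type whose keyword appears in the details."""
    lowered = details.lower()
    for keyword, tx_type in KEYWORD_RULES:
        if keyword in lowered:
            return tx_type
    return TxType.UNKNOWN  # left for a human reviewer to resolve


def amount_band(amount: Decimal) -> str:
    """Bucket a transaction by absolute size, ignoring its sign."""
    size = abs(amount)
    for upper, label in AMOUNT_BANDS:
        if size < upper:
            return label
    return TOP_BAND
```

Returning an explicit `UNKNOWN` rather than guessing keeps unmatched transactions visible in the review step instead of hiding them under a wrong label.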

The pipeline runs on a schedule. Statements are dropped into the input folder; reports come out the other end. No human in the loop for the extraction and classification steps. The team reviews the output, not the raw data.
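The drop-folder pattern described above can be sketched as a single function that a scheduler (cron or similar, which is an assumption here) calls on each run. The folder arguments, the `run_once` name, and the `process` callback are hypothetical and stand in for the real extraction and reporting steps.

```python
from pathlib import Path
from typing import Callable


def run_once(
    input_dir: Path,
    done_dir: Path,
    process: Callable[[Path], None],
) -> list[str]:
    """Process every PDF waiting in input_dir, then move it to done_dir.

    Returns the names of the files handled in this run.
    """
    done_dir.mkdir(parents=True, exist_ok=True)
    handled = []
    for pdf in sorted(input_dir.glob("*.pdf")):
        process(pdf)  # extraction, classification, report generation
        pdf.rename(done_dir / pdf.name)  # moved only after processing succeeds
        handled.append(pdf.name)
    return handled
```

Because a file is moved only after `process` returns, a statement that fails partway stays in the input folder and is picked up again on the next scheduled run.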

What changed

  • Three-day manual process reduced to under two hours — including review time
  • Zero manual data entry in the reconciliation workflow
  • Every transaction is classified and tagged — the audit trail is automatic
  • Reports are generated on demand, not at month-end after days of prep
  • Scales with volume — adding more agents adds statements, not more manual work
  • Transcription errors in the extraction step dropped to zero, since figures are no longer copied by hand

Working on something similar?

Tell us about the process you want to fix.
