Automating Occupation Coding in Zimbabwe: A Leap Toward Smarter Labour Statistics

By Clapton Munongerwa | October 2025

In October 2025, the Zimbabwe National Statistics Agency (ZIMSTAT) launched a transformative initiative to automate occupation coding using Natural Language Processing (NLP) and the International Standard Classification of Occupations (ISCO). Developed with technical support from the UK’s Office for National Statistics (ONS) Data Science Campus, this marks a major milestone in modernizing labour statistics across Africa.

Why Automate Occupation Coding?
Manual occupation coding is labour-intensive, costly, and prone to inconsistencies. The open-source SOCCoder package addresses these problems by offering:

  • ✅ Faster processing of large datasets
  • ✅ Standardized logic for consistency
  • ✅ Scalable architecture for national surveys
  • ✅ Confidence scores for transparency

How It Works: From Job Titles to ISCO Codes
SOCCoder is a TF-IDF-based fuzzy-matching classifier that maps job titles and descriptions to ISCO codes. The workflow has five steps:

  1. Organize Inputs: Job titles, manual codes, tasks, and industry details
  2. Preprocess: Clean and standardize data to match coder requirements
  3. Apply AI: SOCCoder predicts the top three ISCO codes, each with a confidence score
  4. Quality Check: Flags ambiguous matches and validates predictions
  5. Export Results: Outputs saved in Excel with original and predicted codes
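
The matching idea behind steps 2 and 3 can be sketched with the Python standard library alone. This is a minimal illustration of TF-IDF scoring with cosine similarity, not the SOCCoder implementation or its API: the four-entry dictionary, the function names, and the use of cosine similarity as the confidence score are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical mini-dictionary: ISCO code -> description text.
# A real dictionary would hold many more titles and task phrases per code.
ISCO_DICTIONARY = {
    "9211": "crop farm labourers maize field worker planting harvesting",
    "6111": "field crop growers farmer grows maize wheat for sale",
    "9212": "livestock farm labourers cattle herding feeding animals",
    "2341": "primary school teachers teaching pupils classroom",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every token."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Weight term counts by IDF; tokens unseen in the dictionary are dropped."""
    counts = Counter(tokens)
    return {tok: cnt * idf[tok] for tok, cnt in counts.items() if tok in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(job_text, dictionary=ISCO_DICTIONARY, k=3):
    """Return the k best (code, score) pairs, highest score first."""
    docs = {code: tokenize(desc) for code, desc in dictionary.items()}
    idf = build_idf(docs.values())
    query = tfidf_vector(tokenize(job_text), idf)
    scored = [(code, cosine(query, tfidf_vector(toks, idf))) for code, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    for code, score in top_matches("Maize farm worker, planting and harvesting"):
        print(code, round(score, 2))
```

Because this sketch matches whole words, a misspelt token such as "maiz" contributes nothing to the score, which shows in miniature why clean inputs and a well-structured dictionary matter.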

Table 1. Top three predicted ISCO codes: Maize farm workers

Rank  ISCO Code  Description               Confidence
1     9211       Crop farm labourers       0.85
2     6111       Field crop growers        0.72
3     9212       Livestock farm labourers  0.60

In this example, the top prediction (9211, Crop farm labourers) matched the manually assigned code.
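
To turn individual matches like this one into an accuracy figure, predictions can be compared with manual codes across a whole batch. The sketch below assumes each record holds a manual code and a ranked list of predicted codes; the record layout and the function name are hypothetical, and the second sample record is invented purely for illustration.

```python
def agreement_rates(records):
    """Return (top-1, top-3) agreement between manual and predicted codes.

    Each record is a dict with a 'manual' code and a ranked 'predicted' list.
    """
    if not records:
        return 0.0, 0.0
    # Top-1: the first prediction equals the manual code.
    top1 = sum(r["predicted"][:1] == [r["manual"]] for r in records)
    # Top-3: the manual code appears anywhere in the first three predictions.
    top3 = sum(r["manual"] in r["predicted"][:3] for r in records)
    return top1 / len(records), top3 / len(records)


sample = [
    {"manual": "9211", "predicted": ["9211", "6111", "9212"]},  # the Table 1 example
    {"manual": "6111", "predicted": ["9211", "6111", "9212"]},  # invented record
]
top1_rate, top3_rate = agreement_rates(sample)
print(f"top-1 agreement: {top1_rate:.2f}, top-3 agreement: {top3_rate:.2f}")
```

Reporting both rates separates cases where the coder is right outright from cases where the correct code is among the suggestions shown to a human reviewer.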

Human-in-the-Loop Quality Assurance
SOCCoder supports manual verification through a feedback loop:

  • High-confidence predictions → Adopted
  • Moderate-confidence predictions → Verified
  • Low-confidence predictions → Reviewed in depth

This ensures transparency and reliability throughout the coding process.
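
The three-way routing above could be expressed as a small triage function. The thresholds of 0.80 and 0.60 are placeholders chosen for illustration, since the cut-offs used in practice are not stated here; the function names and the record layout are likewise assumptions.

```python
# Hypothetical cut-offs; the thresholds actually used are not stated in this article.
HIGH_CONFIDENCE = 0.80
MODERATE_CONFIDENCE = 0.60


def triage(confidence, high=HIGH_CONFIDENCE, moderate=MODERATE_CONFIDENCE):
    """Map a confidence score in [0, 1] to a review action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence >= high:
        return "adopt"    # accept the predicted code as is
    if confidence >= moderate:
        return "verify"   # a coder confirms or corrects the prediction
    return "review"       # a coder examines the record in depth


def route_records(records):
    """Group records by the action chosen for their top prediction."""
    queues = {"adopt": [], "verify": [], "review": []}
    for record in records:
        queues[triage(record["confidence"])].append(record)
    return queues


if __name__ == "__main__":
    batch = [
        {"title": "Maize farm worker", "code": "9211", "confidence": 0.85},
        {"title": "Farm hand", "code": "9211", "confidence": 0.65},  # invented record
        {"title": "Helper", "code": "9212", "confidence": 0.30},     # invented record
    ]
    for action, items in route_records(batch).items():
        print(action, [item["title"] for item in items])
```

Keeping the thresholds as parameters lets them be tuned against manually coded samples, so the share of records sent for human review can be adjusted to the staff time available.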

Why This Work Matters
Automating occupation coding offers measurable benefits:

  • ⏱️ Efficiency: Weeks of manual work reduced to minutes
  • 📊 Consistency: Reduces human error and subjectivity
  • 🌍 Scalability: Ideal for national-level surveys
  • 🔎 Transparency: Confidence scores guide decisions

Limitations & Considerations

  • Currently supports English only
  • Sensitive to typos and vague job titles
  • Local job terms may require manual interpretation
  • Requires well-structured ISCO dictionaries

These limitations highlight the importance of pairing automation with expert oversight.

Looking Ahead
ZIMSTAT plans to enhance the tool with:

  • Multilingual support
  • Improved handling of local terminology
  • Easier setup protocols
  • Transformer-based NLP (e.g., SBERT) for semantic classification

These upgrades will make the tool even more robust and adaptable across diverse survey environments.

Explore the Tool
The occupation coder is open-source and available on GitHub:
ONS Data Science Campus – occupationcoder-international

About the Author
Clapton Munongerwa is an economist and data scientist specializing in labour statistics. He currently leads the Labour Statistics Department at ZIMSTAT, overseeing the production of official labour statistics through labour force surveys, establishment-based surveys, and the integration of administrative data.

Contact: munongerwac@zimstat.co.zw