Enhancing Scope 3 Category 1 Emissions Estimation with Machine Learning and LLMs

Context and Motivation

Companies increasingly disclose their greenhouse gas emissions through initiatives like CDP. However, Scope 3 emissions, and in particular Category 1 (Purchased Goods and Services), remain highly uncertain. Reported data are often incomplete, inconsistent, or methodologically ambiguous, with errors ranging from boundary misinterpretations to simple typos. These inconsistencies limit comparability across firms and reduce confidence in corporate carbon data.

Limitations of Current Approaches

To compensate for missing or unreliable disclosures, estimation methods often rely on sector-level emission intensity factors, applying average values across companies. While pragmatic, this approach ignores firm-level differences such as size, capital intensity, or supplier engagement strategies, all of which can strongly influence emissions. The result is a process that is simple but lacks explanatory power and precision.
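To make the limitation concrete, the sector-average approach can be sketched as a single multiplication. This is a minimal illustration, not a reference implementation: the choice of revenue as the activity variable, the sector names, and the intensity values are all assumptions made for the example and are not real emission factors.

```python
# Hypothetical sector-average intensities, in tCO2e per million USD of revenue.
# Placeholder values for illustration only; they are not real emission factors.
SECTOR_INTENSITY = {
    "chemicals": 400.0,
    "software": 50.0,
}


def sector_average_estimate(sector: str, revenue_musd: float) -> float:
    """Estimate Category 1 emissions as sector intensity times revenue."""
    return SECTOR_INTENSITY[sector] * revenue_musd


# Two firms in the same sector with equal revenue receive identical estimates,
# whatever their capital intensity or supplier engagement strategy.
firm_a = sector_average_estimate("chemicals", 1200.0)
firm_b = sector_average_estimate("chemicals", 1200.0)
print(firm_a, firm_b)
```

Because the only firm-level input is the activity variable, every difference between companies within a sector collapses into one number.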

Methodology Overview

This project proposes a data-driven framework to improve the accuracy, scalability, and interpretability of Scope 3 Category 1 emission estimates. It combines:

1. Machine Learning (ML) techniques to model emissions based on company-level variables.

2. Large Language Models (LLMs) to extract additional insights from unstructured textual data.

Leveraging LLMs for Variable Extraction

Many relevant variables, such as supply chain integration, sourcing strategies, or decarbonization practices, are described in unstructured text (e.g., sustainability reports, supplier documents, ESG disclosures). To capture this missing information, LLMs are employed to extract high-signal qualitative indicators that can then be integrated into the ML models. This step aims to enrich the feature space with contextual and behavioral dimensions that conventional datasets overlook, enhancing the model’s explanatory depth and predictive precision.
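One way the extraction step could feed the ML models is sketched below: the LLM is asked for a constrained JSON answer, which is then validated and encoded as integer features. Everything named here is hypothetical, including the indicator names, the allowed labels, and the prompt wording. The LLM call itself is replaced by a canned response, since the source does not specify a model or API.

```python
import json

# Hypothetical indicator schema; the exact variables are an assumption.
INDICATORS = {
    "supplier_engagement": ("none", "partial", "systematic"),
    "sourcing_strategy": ("global", "regional", "local"),
}

PROMPT_TEMPLATE = (
    "Read the report excerpt and answer in JSON with the keys {keys}. "
    "Allowed values per key: {allowed}.\n\nExcerpt:\n{text}"
)


def build_prompt(text: str) -> str:
    """Fill the prompt template with the schema and the report excerpt."""
    return PROMPT_TEMPLATE.format(
        keys=list(INDICATORS), allowed=INDICATORS, text=text
    )


def parse_indicators(raw_response: str) -> dict[str, int]:
    """Validate the model's JSON answer and encode each label as an integer."""
    answer = json.loads(raw_response)
    features = {}
    for name, levels in INDICATORS.items():
        value = answer.get(name)
        # Missing or out-of-schema labels become -1 instead of being trusted.
        features[name] = levels.index(value) if value in levels else -1
    return features


# Canned response standing in for a real LLM call (out of scope here).
fake_response = (
    '{"supplier_engagement": "systematic", "sourcing_strategy": "offshore"}'
)
features = parse_indicators(fake_response)
print(features)
```

Restricting answers to a closed set of labels keeps the extracted indicators comparable across companies and makes out-of-schema answers easy to flag before they reach the model.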

Expected Outcomes and Impact

By integrating structured and unstructured data through ML and LLMs, this project seeks to establish a more reliable and scalable methodology for estimating Scope 3 Category 1 emissions. Beyond improving numerical accuracy, it demonstrates how modern tools can strengthen corporate carbon transparency, foster better comparability between firms, and ultimately support more effective climate action across global supply chains.
