SPDX for Developers: The Ideals and Realities of License Management

February 6, 2026

This article is a machine translation from Japanese. It may contain translation errors.

I’m Yuki from PredNext. I joined PredNext last year. In my previous role, I was involved in international standardization of communication protocols and their deployment in society, which gave me a deep appreciation for how important it is to “exchange information in a common format” when bringing technology to the wider world. Software license management is another area where a common format is needed.

In modern software development, it’s not unusual for a project to have over 100 dependency packages. Are you accurately tracking and managing all of their licenses? This article outlines the challenges of license management and explains the “SPDX License List”—a resource that software developers can start using today.

Do You Know Every Package Your Project Uses?

Taking AI development as an example, PyTorch depends on about 50 packages including transitive dependencies, and Hugging Face Transformers—which itself uses PyTorch—depends on about 100 packages. Manually managing all of these dependencies in a spreadsheet would require enormous effort.

This article explains SPDX, the standard designed to solve this problem. Let’s start with SBOM (Software Bill of Materials), the concept behind SPDX.

SBOM (Software Bill of Materials)

SBOM is often compared to a food ingredient label—it’s a mechanism for clearly stating “what’s inside” a piece of software, just as ingredient labels tell you what’s in a food product.

Figure (same concept): a software SBOM lists components such as numpy 2.4.0, torch 2.10.0, pandas 3.0.0, and requests 2.32.5, just as a food ingredient label lists flour, sugar, eggs, and butter.

SBOMs gained traction from a security perspective. In the United States, following incidents such as the SolarWinds supply chain attack in late 2020, Executive Order 14028 (May 2021) called for suppliers to provide SBOMs for software procured by the federal government. When the critical “Apache Log4j” vulnerability (Log4Shell) was discovered in December of the same year, many organizations were unable to immediately determine whether they were affected, which reinforced the importance of SBOMs. In the EU, the Cyber Resilience Act will require manufacturers to create and maintain SBOMs starting in 2027.

Standard SBOM formats include SPDX, CycloneDX, and SWID. This article focuses on SPDX, which has the longest history and the most widespread adoption across ecosystems.

What Is SPDX?

SPDX (Software Package Data Exchange) is an open standard for SBOMs led by the Linux Foundation. It has been in operation since 2010 and is internationally standardized as ISO/IEC 5962:2021. The SPDX project consists of three components:

  • Specification — The SBOM specification
  • License List — License identifiers ← The focus of this article
  • Tools — Generation and validation tools

A key strength of SPDX is its comprehensive framework for license management.

Historically, automating license management has been fraught with challenges.

First, naming is inconsistent: the same license might be called MIT, MIT License, or Expat.

Second, licensing situations can be complex. Sometimes a different license applies to only part of the code, or a project is dual-licensed so that users can choose between licenses. Debian’s copyright file was a pioneering effort at machine-readable license declarations, but it was too coarse-grained for managing dependency licenses; for example, both GPLv2 and GPLv3 were lumped together as simply “GPL.”

Third, there was no standard way to declare licenses in a machine-readable format. The only option was to check whether a file named LICENSE or LICENSE.txt existed at the top of a repository, which is of limited use for automated detection.
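To make the naming problem concrete, here is a minimal Python sketch that maps free-form license names onto SPDX identifiers. The `ALIASES` table and the `normalize` function are hypothetical and cover only a handful of spellings; a real tool would need a much larger, curated mapping. Genuinely ambiguous names, such as a bare “GPL,” are deliberately left unmapped so that a human has to decide.

```python
from __future__ import annotations

# Hypothetical alias table for illustration only; it is far from complete.
ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "expat": "MIT",  # Debian's name for the license SPDX calls MIT
    "apache 2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
    "bsd 3-clause license": "BSD-3-Clause",
    "new bsd license": "BSD-3-Clause",
}


def normalize(name: str) -> str | None:
    """Return the SPDX identifier for a known alias, or None if unknown or ambiguous."""
    # Compare case-insensitively and collapse runs of whitespace.
    key = " ".join(name.lower().split())
    return ALIASES.get(key)


for raw in ["MIT License", "Expat", "Apache License, Version 2.0", "GPL"]:
    print(f"{raw!r} -> {normalize(raw)}")
```

Returning `None` instead of guessing keeps the ambiguity visible, which matters more in license work than maximizing the number of automatic matches.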

SPDX addresses these challenges by providing the “SPDX License List,” which assigns a unique identifier to each license. This list is now used as a standard in major package managers and platforms including npm, PyPI, Cargo, and GitHub.

SPDX License List

The SPDX License List assigns a unique identifier (SPDX License Identifier) to each license. This standardizes how licenses are identified and referenced, making management easier.

Key SPDX License Identifiers

Identifier          License Name
MIT                 MIT License
Apache-2.0          Apache License 2.0
GPL-3.0-only        GNU GPL v3.0 only
GPL-3.0-or-later    GNU GPL v3.0 or later
BSD-3-Clause        BSD 3-Clause License

For GNU licenses (GPL, LGPL, AGPL), the version scope must be stated explicitly. Instead of the deprecated bare identifier GPL-3.0, you use GPL-3.0-only (v3.0 only) or GPL-3.0-or-later (v3.0 or any later version, at the user’s choice).
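As a sketch of how a tool might enforce this rule, the hypothetical `check_gnu_identifier` function below flags old-style GNU identifiers. An identifier with a trailing “+” (such as GPL-2.0+) maps unambiguously to the -or-later form, while a bare version like GPL-3.0 cannot be fixed automatically, because only the licensor knows which scope was intended. The regular expression is an illustrative assumption that covers only the GPL, LGPL, and AGPL families.

```python
from __future__ import annotations

import re

# Matches bare or "+"-suffixed identifiers such as GPL-3.0 or LGPL-2.1+.
# Illustrative assumption: only the GPL, LGPL, and AGPL families are covered.
_OLD_GNU_ID = re.compile(r"^(AGPL|LGPL|GPL)-(\d\.\d)(\+)?$")


def check_gnu_identifier(license_id: str) -> str:
    """Suggest a current identifier for an old-style GNU identifier, or return 'ok'."""
    match = _OLD_GNU_ID.match(license_id.strip())
    if match is None:
        return "ok"  # already -only / -or-later, or not a GNU identifier
    family, version, plus = match.groups()
    if plus:
        # "+" meant "this version or any later version".
        return f"replace with {family}-{version}-or-later"
    # A bare version does not say which scope the licensor intended.
    return f"ambiguous: choose {family}-{version}-only or {family}-{version}-or-later"


for lid in ["GPL-2.0+", "GPL-3.0", "GPL-3.0-only", "MIT"]:
    print(lid, "->", check_gnu_identifier(lid))
```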

Exceptions

Exceptions are additional clauses to a license. SPDX defines identifiers such as GCC-exception-3.1, LLVM-exception, and Classpath-exception-2.0.

Particularly important are compiler-related exceptions. When you build a program with compilers like GCC or Clang, the resulting binary links against libraries provided by the compiler (such as libgcc or libc++). Normally, this would impose the obligations of these libraries’ licenses (GPL copyleft or Apache 2.0 attribution requirements) on the user’s program, but these exceptions waive those obligations. For example, GCC-exception-3.1 is an exception that prevents the GPL from propagating when linking against GCC-provided libraries. Similarly, LLVM-exception exempts Apache 2.0 attribution requirements when linking against Clang-provided libraries. Additionally, Classpath-exception-2.0 is used in OpenJDK and prevents the GPL from propagating when linking with independent modules.

License Expression Syntax

A distinctive feature of SPDX is its license expression syntax. Previously, complex licensing conditions such as “choose either MIT or Apache-2.0” or “GPL with an exception clause” could only be described in natural language. However, SPDX defined three operators—OR, AND, and WITH—enabling such complex conditions to be expressed in a machine-readable format. This makes automated license detection, aggregation, and verification practical, simplifying license management even for projects with many dependency packages.

The following operators are used to express multiple licenses:

OR operator: Dual licensing

  • Users can choose either license
  • Example: MIT OR Apache-2.0 (choose either MIT or Apache-2.0; Rust uses this license)

AND operator: Multiple conditions

  • Users must comply with both license conditions
  • Example: LGPL-2.1-only AND BSD-3-Clause (both LGPL v2.1 and the Modified BSD License apply; this can occur when some files in a library were imported from another project with a different license)

WITH operator: Adding exception clauses

  • An exception clause is added to the license
  • Example: GPL-2.0-only WITH Classpath-exception-2.0 (GPL v2, but linking with independent modules does not trigger copyleft)
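Because the three operators have a fixed precedence (WITH binds most tightly, then AND, then OR, with parentheses for grouping), license expressions can be parsed mechanically. The following minimal Python sketch parses an expression into a tree and then checks it against an allowlist: an OR node is acceptable if any branch is, and an AND node only if every branch is. The names `parse`, `satisfiable`, and `Node` are hypothetical. The sketch matches operators in uppercase only, does not validate identifiers against the SPDX License List, and treats “X WITH E” as acceptable when X is, on the assumption that an exception only adds permissions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# A token is "(", ")", or a run of identifier characters (operators included).
_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:\-]+)")
_RESERVED = {"AND", "OR", "WITH", "(", ")"}


@dataclass
class Node:
    op: str      # "ID", "WITH", "AND", or "OR"
    args: tuple  # ("MIT",) for ID; (Node, exception_id) for WITH; (Node, Node) otherwise


def tokenize(expr: str) -> list[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Parser:
    """Recursive descent: OR (lowest) < AND < WITH (highest precedence)."""

    def __init__(self, tokens: list[str]):
        self.tokens, self.i = tokens, 0

    def peek(self) -> str | None:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.i += 1
        return tok

    def parse_or(self) -> Node:
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = Node("OR", (node, self.parse_and()))
        return node

    def parse_and(self) -> Node:
        node = self.parse_with()
        while self.peek() == "AND":
            self.take()
            node = Node("AND", (node, self.parse_with()))
        return node

    def parse_with(self) -> Node:
        node = self.parse_atom()
        if self.peek() == "WITH":
            self.take()
            exception_id = self.take()
            if node.op != "ID" or exception_id in _RESERVED:
                raise ValueError("WITH needs a license identifier and an exception identifier")
            node = Node("WITH", (node, exception_id))
        return node

    def parse_atom(self) -> Node:
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in _RESERVED:
            raise ValueError(f"unexpected token {tok!r}")
        return Node("ID", (tok,))


def parse(expr: str) -> Node:
    parser = _Parser(tokenize(expr))
    tree = parser.parse_or()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return tree


def satisfiable(node: Node, allowed: set[str]) -> bool:
    """True if some choice among the OR branches uses only allowed licenses."""
    if node.op == "ID":
        return node.args[0] in allowed
    if node.op == "WITH":
        license_id = node.args[0].args[0]
        # Assumption: an exception only adds permissions, so "X WITH E" is
        # acceptable if either the exact combination or X itself is allowed.
        return f"{license_id} WITH {node.args[1]}" in allowed or license_id in allowed
    results = [satisfiable(child, allowed) for child in node.args]
    return any(results) if node.op == "OR" else all(results)
```

For example, `parse("MIT OR Apache-2.0 AND BSD-3-Clause")` yields an OR node whose second branch is an AND node, because AND binds more tightly than OR. With an allowlist of only `{"BSD-3-Clause"}`, `satisfiable(parse("LGPL-2.1-only AND BSD-3-Clause"), ...)` returns `False`, since both halves of an AND must be acceptable.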

License Management in the Python Ecosystem

In Python, PEP 639 defines how to declare licenses in pyproject.toml. It specifies two core metadata fields, License-Expression and License-File, which are populated from the license and license-files keys in pyproject.toml:

[project]
name = "my-package"
license = "MIT"  # or SPDX license expression syntax like "MIT OR Apache-2.0"
license-files = ["LICENSE", "NOTICE"]

The License-Expression field is defined to use SPDX license expression syntax, which makes it extremely useful for automated license management. Ideally, License-Expression would eventually resolve the problems described earlier, but as of 2026 we are still far from that ideal, and many issues remain.

Although it has been over a year since PEP 639 defined License-Expression, many packages have yet to adopt this field. When trying to check the licenses of dependency packages, you’ll encounter various issues:

  • Packages that haven’t adopted PEP 639 at all
  • Packages with PEP 639 compliance pull requests left unmerged
  • Packages where PEP 639 fixes have been merged but not released
  • Cases where the latest version supports PEP 639, but you depend on an older version that lacks License-Expression
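A quick way to gauge how widespread these gaps are in your own environment is to read the installed packages’ core metadata directly. The sketch below uses the standard library’s `importlib.metadata` and reports, for each installed distribution, whether its license comes from License-Expression, from `License ::` Trove classifiers, or from the legacy free-text License field. The helper names `classify` and `survey` are hypothetical, and the fallback order is an assumption chosen for illustration.

```python
from __future__ import annotations

from importlib.metadata import distributions


def classify(expression: str | None, classifiers: list[str], legacy: str | None) -> tuple[str, str]:
    """Return (metadata source, license text), preferring License-Expression."""
    if expression and expression.strip():
        return "License-Expression", expression.strip()
    # Trove classifiers look like "License :: OSI Approved :: MIT License".
    names = [c.split(" :: ")[-1] for c in classifiers if c.startswith("License ::")]
    if names:
        return "Classifier", ", ".join(names)
    if legacy and legacy.strip():
        # The legacy field sometimes holds the full license text; keep the first line.
        return "License", legacy.strip().splitlines()[0]
    return "unknown", ""


def survey() -> dict[str, tuple[str, str]]:
    """Classify every distribution installed in the current environment."""
    report = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # defensive: skip distributions with unreadable metadata
            continue
        report[str(meta.get("Name"))] = classify(
            meta.get("License-Expression"),
            meta.get_all("Classifier") or [],
            meta.get("License"),
        )
    return report


# Print only the packages that still lack License-Expression.
for name, (source, value) in sorted(survey().items(), key=lambda item: item[0].lower()):
    if source != "License-Expression":
        print(f"{name}: [{source}] {value}")
```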

Beyond these relatively straightforward issues, we also encountered subtler problems that required reading the source code to understand.

For example, ninja lists its PyPI license metadata as “Apache Software License, BSD License,” but the correct license is Apache-2.0 alone; the BSD entry is incorrect. It appears to have been introduced through a copy-paste error from another package, and our colleague Tokunaga has submitted a PR to fix it.

torchaudio is distributed under the BSD license, but if ffmpeg is installed in the environment, torchaudio will use it. ffmpeg is often built with the --enable-gpl option, which places the build under the GPL, so the license that effectively applies through torchaudio can vary from one environment to another.

Similarly, soundfile is distributed under the BSD license but internally uses libsndfile, which is LGPL. Because libsndfile is a C library without machine-readable license information, you would not notice this without checking the repository.
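Environment-dependent cases like torchaudio and ffmpeg can at least be flagged automatically. The sketch below, with the hypothetical helpers `ffmpeg_license_hint` and `check_local_ffmpeg`, runs `ffmpeg -version` and inspects the `configuration:` line it prints for `--enable-gpl` and `--enable-nonfree`. This is only a heuristic: the `ffmpeg` binary found on PATH is not necessarily the same build as the FFmpeg shared libraries that torchaudio may load.

```python
from __future__ import annotations

import shutil
import subprocess


def ffmpeg_license_hint(version_output: str) -> str:
    """Guess the license of an FFmpeg build from its `configuration:` line (heuristic)."""
    for line in version_output.splitlines():
        if line.strip().startswith("configuration:"):
            flags = line.split()
            if "--enable-nonfree" in flags:
                return "nonfree (not redistributable)"
            if "--enable-gpl" in flags:
                return "GPL"
            return "LGPL"  # FFmpeg's default license when neither flag is set
    return "unknown"  # no configuration line found


def check_local_ffmpeg() -> str | None:
    """Run the ffmpeg binary on PATH, if any, and return a license hint."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None  # ffmpeg is not installed (or not on PATH)
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return ffmpeg_license_hint(result.stdout)
```

Running `check_local_ffmpeg()` on each build or deployment machine gives a rough signal of which license actually applies there, which a static dependency list cannot provide.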

As these examples show, you cannot rely solely on automated detection by tools. Metadata can be incorrect, licenses can vary by environment, and native dependencies require separate verification.

Putting SPDX to Use

Despite the challenges mentioned above, using SPDX License Identifiers does improve license management. Even if it’s not perfect, describing and retrieving information in a standardized format significantly reduces the burden of manual verification.

Checking Licenses of Your Dependencies

Start by checking the licenses of the packages your project depends on. Each language ecosystem has license-checking tools.

Python: pip-licenses

pip install pip-licenses
pip-licenses --format=markdown

Example output:

| Name      | Version | License                    |
|-----------|---------|----------------------------|
| numpy     | 2.4.1   | BSD-3-Clause               |
| packaging | 26.0    | Apache-2.0 OR BSD-2-Clause |
| uvicorn   | 0.40.0  | BSD-3-Clause               |

pip-licenses (v5.5.0) prioritizes PEP 639 License-Expression metadata when available. For packages without License-Expression, it falls back to traditional Trove Classifiers and License metadata.
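Once the output is machine-readable, a simple policy check can run in CI. The sketch below assumes that `pip-licenses --format=json` emits a list of objects with `Name`, `Version`, and `License` keys; the `ALLOWED` set and the `review` and `load_rows` functions are hypothetical. It accepts a package if its license, or any alternative in a plain OR expression, is on the allowlist, and it routes anything containing AND, WITH, or parentheses to manual review rather than guessing. Non-SPDX strings from the classifier fallback (such as “BSD License”) end up in the not-allowed list, which is deliberate: they need a human look.

```python
from __future__ import annotations

import json
import subprocess

# Example policy (hypothetical); adjust to your organization's rules.
ALLOWED = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


def review(rows: list[dict], allowed: set[str] = ALLOWED) -> tuple[list[tuple[str, str]], list[str]]:
    """Split pip-licenses rows into (not_allowed, needs_manual_review)."""
    not_allowed, manual = [], []
    for row in rows:
        expr = row["License"].strip()
        if "(" in expr or " AND " in expr or " WITH " in expr:
            manual.append(row["Name"])  # compound expression: use a real parser
            continue
        # A plain OR expression is fine if any alternative is allowed.
        choices = [part.strip() for part in expr.split(" OR ")]
        if not any(choice in allowed for choice in choices):
            not_allowed.append((row["Name"], expr))
    return not_allowed, manual


def load_rows() -> list[dict]:
    """Run pip-licenses (which must be installed) and parse its JSON output."""
    result = subprocess.run(
        ["pip-licenses", "--format=json"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

In CI, you might call `review(load_rows())` and fail the job when either returned list is non-empty.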

Node.js: license-checker

npx license-checker --summary

Rust: cargo-license

cargo install cargo-license
cargo license

Adding SPDX License Identifiers to Your Own Code

If you write open-source software, add SPDX License Identifiers to your own code. This allows other developers who use your code to accurately understand the license terms.

Source files

Add the identifier as a comment near the top of the file (after the shebang line, if there is one, typically alongside the copyright notice):

# SPDX-License-Identifier: MIT
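To check that your own files actually carry the tag, a small scanner is enough. The hypothetical `find_spdx_identifier` below looks for an `SPDX-License-Identifier:` tag in the first few lines of a file and strips a trailing C-style `*/` if present. Limiting the search to the top of the file (ten lines here) is an assumption that matches the convention of placing the tag in the header, not a rule taken from the SPDX specification.

```python
from __future__ import annotations

import re
from pathlib import Path

# Capture everything after the tag, minus an optional trailing "*/" and whitespace.
_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_spdx_identifier(text: str, max_lines: int = 10) -> str | None:
    """Return the SPDX expression declared near the top of a file, if any."""
    for line in text.splitlines()[:max_lines]:
        match = _TAG.search(line)
        if match and match.group(1).strip():
            return match.group(1).strip()
    return None


def files_missing_tag(root: str, pattern: str = "*.py") -> list[str]:
    """List files under `root` matching `pattern` that declare no SPDX identifier."""
    return [
        str(path)
        for path in sorted(Path(root).rglob(pattern))
        if find_spdx_identifier(path.read_text(encoding="utf-8", errors="replace")) is None
    ]
```

For instance, `files_missing_tag("src")` would list the Python files under a hypothetical `src` directory that still lack the tag.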

pyproject.toml (Python)

Follow PEP 639 and use SPDX license expression syntax:

[project]
name = "my-package"
license = "MIT"
license-files = ["LICENSE"]

package.json (npm)

{
  "name": "my-package",
  "license": "MIT"
}

Cargo.toml (Rust)

[package]
name = "my-package"
license = "MIT OR Apache-2.0"

Summary

This article introduced the challenges of license management and how SPDX addresses them.

  • Inconsistent naming is resolved by SPDX License Identifiers, which uniquely identify each license.
  • Complex licensing conditions can now be expressed in a machine-readable format using license expression syntax (OR, AND, WITH).
  • The lack of a standard, machine-readable way to declare licenses is being addressed as major package managers (npm, PyPI, Cargo) adopt SPDX notation in their package metadata.

We also discussed the current state of standardization (PEP 639) in the Python ecosystem and the practical challenges encountered in the field.

License management with SPDX is not perfect. There are challenges that tools alone cannot solve, such as metadata errors, environment-dependent variations, and native dependency issues. However, adoption of SPDX License Identifiers in major package managers continues to grow, and SBOM requirements under the Cyber Resilience Act will begin to apply in 2027. This wave of standardization should increase the reliability of license management and improve transparency across the entire ecosystem.

Work With Us

While this article focused on SPDX, our core business is providing AI-related consulting services. If you’re interested in practical AI applications, please reach out through our contact form. We look forward to hearing from you.
