SPDX in Practice for Developers: Checking Dependency Licenses for Transformers with pip-licenses

March 19, 2026

This article is a machine translation from Japanese. It may contain translation errors.

I’m Yuki from PredNext. The previous article introduced SPDX license identifiers and the expression syntax (AND/OR/WITH), and noted that license notation remains inconsistent while the migration to PEP 639 is still in progress.

In this article, I’ll use Hugging Face Transformers as a case study and walk through checking dependency licenses with pip-licenses. I’ll cover three common needs: getting an overview, blocking prohibited licenses, and restricting to an allowlist. For each one I’ll show how pip-licenses handles it, and I’ll also touch on notation inconsistencies and platform differences.

Setup

I used uv to create a project with transformers[torch] as a dependency.

uv init hf-license-check --python 3.12
cd hf-license-check
uv add 'transformers[torch]' pip-licenses

Checking Dependency Licenses for Transformers

Running uv run pip-licenses lists the licenses of all installed packages. The -f option changes the output format (markdown, json, csv, and so on), --with-urls adds each package’s URL, and --with-license-file adds the full license text. These options are useful when you need to compile license information into a document for submission.

I checked the license status of the transformers[torch] environment on macOS (Python 3.12.7, pip-licenses 5.5.1) and aggregated the results by license family:

| Family | Count |
| --- | --- |
| BSD family | 15 |
| MIT family | 9 |
| Apache family | 7 |
| MPL family | 2 |
| ISC | 1 |
| PSF | 1 |

Out of 35 packages, there were 0 GPL-family licenses and 0 UNKNOWN entries. All are permissive open-source licenses.
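As a rough illustration of this kind of aggregation, the sketch below counts families from the JSON that pip-licenses -f json emits (a list of objects with Name, Version, and License keys). The keyword order in FAMILY_KEYWORDS is my own assumption for the example, chosen so that a compound expression lands in exactly one family; it is not necessarily the rule used to build the table above.

```python
import json
from collections import Counter

# Keyword -> family, checked in order. The order is an illustrative assumption:
# "BSD" and "MPL" come first so a compound expression lands in a single family.
FAMILY_KEYWORDS = [
    ("BSD", "BSD family"),
    ("MPL", "MPL family"),
    ("Apache", "Apache family"),
    ("MIT", "MIT family"),
    ("ISC", "ISC"),
    ("PSF", "PSF"),
]


def license_family(license_text: str) -> str:
    """Return the first family whose keyword appears in the license string."""
    for keyword, family in FAMILY_KEYWORDS:
        if keyword in license_text:
            return family
    return "Other"


def count_families(report_json: str) -> Counter:
    """Count packages per family from `pip-licenses -f json` output."""
    packages = json.loads(report_json)
    return Counter(license_family(pkg["License"]) for pkg in packages)


# A few rows copied from the full list, in the JSON shape pip-licenses emits.
SAMPLE_REPORT = json.dumps([
    {"Name": "Jinja2", "Version": "3.1.6", "License": "BSD License"},
    {"Name": "PyYAML", "Version": "6.0.3", "License": "MIT License"},
    {"Name": "accelerate", "Version": "1.13.0", "License": "Apache Software License"},
    {"Name": "certifi", "Version": "2026.2.25",
     "License": "Mozilla Public License 2.0 (MPL 2.0)"},
    {"Name": "packaging", "Version": "26.0", "License": "Apache-2.0 OR BSD-2-Clause"},
    {"Name": "tqdm", "Version": "4.67.3", "License": "MPL-2.0 AND MIT"},
    {"Name": "shellingham", "Version": "1.5.4", "License": "ISC License (ISCL)"},
    {"Name": "typing_extensions", "Version": "4.15.0", "License": "PSF-2.0"},
])

print(count_families(SAMPLE_REPORT))
```

In practice you would feed it the real report, for example the contents of a file written with uv run pip-licenses -f json --output-file=licenses.json (the file name is arbitrary).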

Full list of all 35 packages

| Name | Version | License |
| --- | --- | --- |
| Jinja2 | 3.1.6 | BSD License |
| MarkupSafe | 3.0.3 | BSD-3-Clause |
| PyYAML | 6.0.3 | MIT License |
| Pygments | 2.19.2 | BSD License |
| accelerate | 1.13.0 | Apache Software License |
| annotated-doc | 0.0.4 | MIT |
| anyio | 4.12.1 | MIT |
| certifi | 2026.2.25 | Mozilla Public License 2.0 (MPL 2.0) |
| click | 8.3.1 | BSD-3-Clause |
| filelock | 3.25.1 | MIT |
| fsspec | 2026.2.0 | BSD-3-Clause |
| h11 | 0.16.0 | MIT License |
| hf-xet | 1.3.2 | Apache-2.0 |
| httpcore | 1.0.9 | BSD-3-Clause |
| httpx | 0.28.1 | BSD License |
| huggingface_hub | 1.6.0 | Apache Software License |
| idna | 3.11 | BSD-3-Clause |
| markdown-it-py | 4.0.0 | MIT License |
| mdurl | 0.1.2 | MIT License |
| mpmath | 1.3.0 | BSD License |
| networkx | 3.6.1 | BSD-3-Clause |
| numpy | 2.4.3 | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 |
| packaging | 26.0 | Apache-2.0 OR BSD-2-Clause |
| psutil | 7.2.2 | BSD-3-Clause |
| regex | 2026.2.28 | Apache-2.0 AND CNRI-Python |
| rich | 14.3.3 | MIT License |
| safetensors | 0.7.0 | Apache Software License |
| shellingham | 1.5.4 | ISC License (ISCL) |
| sympy | 1.14.0 | BSD License |
| tokenizers | 0.22.2 | Apache Software License |
| torch | 2.10.0 | BSD-3-Clause |
| tqdm | 4.67.3 | MPL-2.0 AND MIT |
| transformers | 5.3.0 | Apache 2.0 License |
| typer | 0.24.1 | MIT |
| typing_extensions | 4.15.0 | PSF-2.0 |

License Notation Inconsistencies

Looking at the list, you can see that multiple notations coexist for the same license. For example, the MIT License appears as both “MIT License” and “MIT,” BSD licenses appear as both “BSD License” and “BSD-3-Clause,” and Apache licenses appear as “Apache Software License,” “Apache 2.0 License,” and “Apache-2.0.”

This inconsistency is a transitional issue as the ecosystem migrates to PEP 639. PEP 639 moves Python package license metadata toward SPDX expressions via License-Expression, but as of 2026, many packages still use the legacy Classifier-based notation.

These notation inconsistencies affect the policy checks described in the next section.
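To make the problem concrete, here is a minimal sketch of a normalization table for the legacy notations that appear in the list. The split into UNAMBIGUOUS and AMBIGUOUS is my own judgment for illustration: a string like BSD License does not say which BSD variant applies, so the sketch refuses to guess rather than pick one. Unknown strings are passed through unchanged on the assumption that they are already SPDX identifiers or expressions.

```python
from typing import Optional

# Legacy notations from the list that map to exactly one SPDX identifier.
UNAMBIGUOUS = {
    "MIT License": "MIT",
    "Apache 2.0 License": "Apache-2.0",
    "ISC License (ISCL)": "ISC",
    "Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}

# Notations that do not pin down a single SPDX identifier:
# "BSD License" could be BSD-2-Clause or BSD-3-Clause, and
# "Apache Software License" does not state a version.
AMBIGUOUS = {"BSD License", "Apache Software License"}


def to_spdx(notation: str) -> Optional[str]:
    """Map a legacy notation to an SPDX identifier, or None if it is ambiguous.

    Strings not in either table are returned unchanged (assumed to be SPDX).
    """
    if notation in AMBIGUOUS:
        return None
    return UNAMBIGUOUS.get(notation, notation)


for name in ["MIT License", "BSD License", "BSD-3-Clause"]:
    print(name, "->", to_spdx(name))
```

A None result is a signal to read the package’s actual license file instead of trusting the metadata string.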

Policy Checks with pip-licenses

pip-licenses provides options for automated license verification. Here, I’ll look at how pip-licenses addresses three common needs.

Need 1: Compiling dependency licenses into a report

The basic pip-licenses command retrieves a list of dependency package licenses. With additional options, it can also serve as source material for third-party reports or SBOM creation.

uv run pip-licenses -f markdown --with-urls --with-license-file \
  --output-file=license-report.md

Need 2: Blocking copyleft licenses like GPL (blocklist)

The most common need in commercial projects is confirming that copyleft licenses such as GPL and AGPL are not included in dependencies. Combining --fail-on with --partial-match causes an error (exit code 1) when a specified license is found.

uv run pip-licenses --fail-on="GPL;AGPL;LGPL" --partial-match

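--partial-match enables substring matching. The string GPL matches both the Classifier-style GNU General Public License v3 (GPLv3) and the SPDX-style GPL-3.0-only, so you can roughly block by license family even when notation varies. Because GPL is also a substring of AGPL and LGPL, the extra entries are redundant here and serve mainly as documentation. Note that a notation that spells the name out without the abbreviation, such as the Classifier GNU Affero General Public License v3, contains none of these strings and would not be caught.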

Since the Transformers dependencies contain no GPL-family licenses, this check passes without issues. For CI integration, this blocklist approach is the simplest and most robust.
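The substring idea can be modeled in a few lines. This is a simplified sketch of the matching concept, not pip-licenses’ actual implementation, and the case-insensitive comparison is an assumption on my part. It is handy for checking which notations a blocklist entry would and would not catch before relying on it in CI.

```python
from typing import List

# Same entries as the --fail-on example above.
FAIL_ON = ["GPL", "AGPL", "LGPL"]


def blocked_by(license_text: str, fail_on: List[str]) -> List[str]:
    """Return the blocklist entries that occur as substrings of the license.

    Comparison is case-insensitive (an assumption made for this sketch).
    """
    lowered = license_text.lower()
    return [entry for entry in fail_on if entry.lower() in lowered]


for text in [
    "GPL-3.0-only",
    "GNU General Public License v3 (GPLv3)",
    "LGPL-3.0-only",
    "MPL-2.0 AND MIT",
    "GNU Affero General Public License v3",  # no abbreviation in the string
]:
    print(f"{text!r}: {blocked_by(text, FAIL_ON)}")
```

The last sample returns an empty list because the string never contains GPL as a substring, which is the kind of gap a family-level blocklist can have.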

Need 3: Restricting to only allowed licenses (allowlist)

For stricter policy management, --allow-only lets you specify a list of permitted licenses, and any license not in the list triggers an error.

However, --allow-only checks by exact string match by default, so you need to handle both notation inconsistencies and SPDX compound expressions. For example, even if you allow Apache-2.0 and CNRI-Python individually, the combined string Apache-2.0 AND CNRI-Python won’t pass unless it’s also in the allowlist. The current dependencies include the following compound expressions:

| Package | License Expression | Meaning |
| --- | --- | --- |
| numpy | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 | Five licenses apply to different parts of the code |
| packaging | Apache-2.0 OR BSD-2-Clause | Dual license: users may choose either |
| regex | Apache-2.0 AND CNRI-Python | Both licenses apply simultaneously |
| tqdm | MPL-2.0 AND MIT | Both MPL-2.0 and MIT apply |

Running with an allowlist that accounts for both notation inconsistencies and compound expressions:

uv run pip-licenses \
  --allow-only="MIT;BSD-3-Clause;Apache-2.0;ISC;PSF-2.0;0BSD;Zlib;CC0-1.0;CNRI-Python;\
MIT License;BSD License;Apache Software License;Apache 2.0 License;\
ISC License (ISCL);Mozilla Public License 2.0 (MPL 2.0);MPL-2.0;\
Apache-2.0 AND CNRI-Python;Apache-2.0 OR BSD-2-Clause;\
BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0;MPL-2.0 AND MIT"

This passes all 35 packages. However, the list needs updating whenever dependency packages migrate to SPDX expressions or new license notations are added, resulting in high maintenance cost. While --partial-match can be combined with --allow-only to simplify the list, there is a risk of unintended matches, so use it according to your needs.
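One way to reduce that maintenance cost is to post-process the report yourself and evaluate compound expressions against a list of individual identifiers. The following is a minimal sketch under stated assumptions: it is not something pip-licenses does, it handles only flat AND/OR expressions (no parentheses, no WITH), and it relies on the SPDX rule that AND binds more tightly than OR. An OR expression passes if at least one alternative is fully allowed; an AND expression passes only if every term is allowed.

```python
from typing import Set

ALLOWED = {
    "MIT", "BSD-3-Clause", "Apache-2.0", "0BSD", "Zlib",
    "CC0-1.0", "CNRI-Python", "MPL-2.0",
}


def expression_allowed(expression: str, allowed: Set[str]) -> bool:
    """Evaluate a flat SPDX expression against a set of allowed identifiers.

    Limitations of this sketch: parentheses and WITH are not supported.
    """
    # OR has the lowest precedence, so split on it first.
    alternatives = expression.split(" OR ")
    return any(
        all(term.strip() in allowed for term in alternative.split(" AND "))
        for alternative in alternatives
    )


for expr in [
    "BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0",
    "Apache-2.0 OR BSD-2-Clause",
    "MIT AND GPL-3.0-only",
]:
    print(expr, "->", expression_allowed(expr, ALLOWED))
```

With this approach the allowlist only needs individual identifiers, although legacy notations such as MIT License still have to be normalized or listed separately.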

Results Differ Across Platforms

Even for the same packages, the license composition changes by platform. All results so far were from macOS, so I used Docker to install the same transformers[torch] in a Linux environment and ran pip-licenses.

On Linux x86_64, 16 additional NVIDIA CUDA packages are installed as PyTorch dependencies. While all 35 packages on macOS had permissive open-source licenses, on Linux x86_64, 16 out of 53 packages (about 30%) have proprietary licenses.

Moreover, the license notation for these 16 NVIDIA packages is split across four variations:

| Notation | Count | Example |
| --- | --- | --- |
| Other/Proprietary License | 13 | nvidia-cublas-cu12, etc. |
| NVIDIA Proprietary Software | 1 | nvidia-cusparselt-cu12 |
| LicenseRef-NVIDIA-Proprietary | 1 | nvidia-nvshmem-cu12 |
| LicenseRef-NVIDIA-SOFTWARE-LICENSE | 1 | cuda-bindings |

There are no standard SPDX identifiers for proprietary licenses, so packages that adopt SPDX notation each define their own LicenseRef- identifier. All 16 packages above are governed by the same NVIDIA CUDA EULA, yet three approaches coexist: Classifier-based (Other/Proprietary License), legacy License metadata (NVIDIA Proprietary Software), and SPDX LicenseRef.

Running license checks on the same platform as your deployment target is the most reliable approach.
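If you do collect reports on more than one platform, a small diff makes the platform-specific packages easy to see. The sketch below assumes two reports produced with pip-licenses -f json; the sample data contains only a handful of rows taken from this article, and the Version fields that the real output includes are omitted for brevity.

```python
import json
from typing import Dict


def licenses_by_package(report_json: str) -> Dict[str, str]:
    """Index a `pip-licenses -f json` report by package name."""
    return {pkg["Name"]: pkg["License"] for pkg in json.loads(report_json)}


def platform_only(base_json: str, other_json: str) -> Dict[str, str]:
    """Return packages that appear in the second report but not in the first."""
    base = licenses_by_package(base_json)
    other = licenses_by_package(other_json)
    return {name: lic for name, lic in other.items() if name not in base}


# Sample rows only; Version fields omitted for brevity.
MACOS_REPORT = json.dumps([
    {"Name": "torch", "License": "BSD-3-Clause"},
    {"Name": "tqdm", "License": "MPL-2.0 AND MIT"},
])
LINUX_REPORT = json.dumps([
    {"Name": "torch", "License": "BSD-3-Clause"},
    {"Name": "tqdm", "License": "MPL-2.0 AND MIT"},
    {"Name": "nvidia-cublas-cu12", "License": "Other/Proprietary License"},
    {"Name": "nvidia-nvshmem-cu12", "License": "LicenseRef-NVIDIA-Proprietary"},
])

for name, lic in platform_only(MACOS_REPORT, LINUX_REPORT).items():
    print(f"Linux only: {name} ({lic})")
```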

Automation with GitHub Actions

Here’s how to integrate license checks into CI. The following is a configuration example that automatically runs license checks when pyproject.toml or uv.lock changes. It uses the simple and robust blocklist (--fail-on) approach.

GitHub Actions workflow example
name: License Check

on:
  pull_request:
    paths:
      - 'pyproject.toml'
      - 'uv.lock'

jobs:
  license-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

      - name: Check licenses
        run: |
          uv run pip-licenses \
            --fail-on="GPL;AGPL;LGPL;EUPL;SSPL" \
            --partial-match

      - name: Generate license report
        if: always()
        run: uv run pip-licenses -f markdown --with-urls > license-report.md

      - name: Upload license report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: license-report
          path: license-report.md

This workflow runs on ubuntu-latest (Linux x86_64), so as shown in the previous section, additional packages such as the NVIDIA CUDA ones may be included. If you accept packages with proprietary licenses, leave the check as is. If you want to reject them, add their license notations to --fail-on; if you want to skip specific packages instead, list them in --ignore-packages. For stricter management with --allow-only, you’ll need to handle the notation inconsistencies and compound expressions mentioned earlier.

Tools Beyond pip-licenses

pip-licenses only covers Python package metadata. When distributing container images, tools that can scan the entire image filesystem are useful. Here, I compare results from scanning the Docker image mentioned earlier using syft (Anchore’s SBOM generator) and trivy (Aqua Security’s security scanner).

Comparison of Three Tools

| Item | pip-licenses | syft | trivy |
| --- | --- | --- | --- |
| Approach | Reads Python metadata | Scans entire image → generates SBOM | Scans entire image → classifies licenses |
| Python packages detected | 53 | 70 | 70 |
| OS package detection | No | Yes (173 packages) | Yes (624 license entries) |
| License notation | Classifier / SPDX as-is | Normalized to SPDX identifiers (some LicenseRef) | Normalized to SPDX identifiers |
| Severity classification | None | None | Yes (notice / reciprocal / restricted) |

The difference in Python package detection counts comes from different detection targets. The Docker image has 58 packages registered in pip list. By default, pip-licenses excludes 5 of them: the system packages (pip, setuptools) and its own dependencies (pip-licenses, prettytable, wcwidth), which leaves 53. syft and trivy scan the filesystem directly and also detect the 12 packages vendored inside setuptools, for a total of 70 (58 + 12).

License Notation Normalization

syft and trivy normalize license notations to SPDX identifiers. Here’s a comparison with pip-licenses:

| Package | pip-licenses | syft | trivy |
| --- | --- | --- | --- |
| PyYAML | MIT License | MIT | MIT |
| Pygments | BSD License | BSD-2-Clause | BSD-3-Clause |
| httpx | BSD License | BSD-3-Clause | BSD-3-Clause |
| Jinja2 | BSD License | NOASSERTION | BSD-3-Clause |
| accelerate | Apache Software License | LicenseRef-Apache | Apache-2.0 |
| transformers | Apache 2.0 License | LicenseRef-Apache-2.0-License | Apache-2.0 |
| certifi | Mozilla Public License 2.0 (MPL 2.0) | MPL-2.0 | MPL-2.0 |

trivy excels in normalization consistency, converting Classifier-based notations like Apache Software License to the SPDX identifier Apache-2.0. syft tends to fall back to LicenseRef-* for some cases, and even produces NOASSERTION for Jinja2.

However, normalization isn’t always accurate. In the table above, trivy identifies Pygments as BSD-3-Clause while syft identifies it as BSD-2-Clause. Pygments’ actual license is BSD-2-Clause, making syft more accurate. trivy tends to uniformly convert BSD License to BSD-3-Clause.

Even larger discrepancies appear with NVIDIA packages. For example, nvidia-nccl-cu12 is reported as Other/Proprietary License by pip-licenses and trivy, but syft reports BSD-3-Clause. NCCL’s source code is actually published under Apache-2.0 (partially BSD), so none of the tools are fully accurate. Since the licensing of NVIDIA CUDA-related packages is complex, individual verification is recommended.

As shown, different tools can produce conflicting results for the same package. Don’t blindly trust a single tool’s output — cross-checking and reviewing the original license text is essential.
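Cross-checking can be partly automated. The sketch below takes per-package results from two tools and lists the packages where they disagree, so that manual review can focus on those. The data is copied from the syft and trivy columns of the comparison table; how you extract those values from each tool’s output is left out, and the dictionary shape is just an assumption for this example.

```python
from typing import Dict, List

# syft / trivy results copied from the comparison table above.
REPORTS: Dict[str, Dict[str, str]] = {
    "PyYAML": {"syft": "MIT", "trivy": "MIT"},
    "Pygments": {"syft": "BSD-2-Clause", "trivy": "BSD-3-Clause"},
    "httpx": {"syft": "BSD-3-Clause", "trivy": "BSD-3-Clause"},
    "Jinja2": {"syft": "NOASSERTION", "trivy": "BSD-3-Clause"},
    "accelerate": {"syft": "LicenseRef-Apache", "trivy": "Apache-2.0"},
    "transformers": {"syft": "LicenseRef-Apache-2.0-License", "trivy": "Apache-2.0"},
    "certifi": {"syft": "MPL-2.0", "trivy": "MPL-2.0"},
}


def disagreements(reports: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the names of packages for which the tools report different licenses."""
    return sorted(
        name for name, by_tool in reports.items()
        if len(set(by_tool.values())) > 1
    )


for name in disagreements(REPORTS):
    print(f"review manually: {name} {REPORTS[name]}")
```

Agreement between tools does not prove the result is correct, so this only narrows down where to start reading license texts.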

trivy’s Severity Classification

trivy automatically classifies licenses by category and severity, which can be directly used for policy design.

| Category | Severity | Example Licenses |
| --- | --- | --- |
| notice | LOW | MIT, BSD-3-Clause, Apache-2.0, ISC, etc. |
| reciprocal | MEDIUM | MPL-2.0 (certifi, tqdm) |
| restricted | HIGH | LGPL-3.0-only (autocommand) |
| unknown | UNKNOWN | NVIDIA proprietary, PSF-2.0, etc. |
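As one possible policy built on these severities, the sketch below sorts findings into three buckets: fail at or above a threshold, send anything without a known rank to manual review, and accept the rest. The tuple format and the triage function are hypothetical names for illustration, not part of trivy; only the severities shown in the table are ranked.

```python
from typing import Dict, Iterable, List, Tuple

# Only the severities listed in the table above are ranked here.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}

Finding = Tuple[str, str, str]  # (package, license, severity)


def triage(findings: Iterable[Finding], fail_at: str = "HIGH") -> Dict[str, List[str]]:
    """Split findings into 'fail', 'review', and 'ok' buckets by severity."""
    threshold = SEVERITY_RANK[fail_at]
    result: Dict[str, List[str]] = {"fail": [], "review": [], "ok": []}
    for package, _license_id, severity in findings:
        rank = SEVERITY_RANK.get(severity)
        if rank is None:
            # UNKNOWN (or any unranked severity) needs a human decision.
            result["review"].append(package)
        elif rank >= threshold:
            result["fail"].append(package)
        else:
            result["ok"].append(package)
    return result


SAMPLE_FINDINGS = [
    ("PyYAML", "MIT", "LOW"),
    ("certifi", "MPL-2.0", "MEDIUM"),
    ("autocommand", "LGPL-3.0-only", "HIGH"),
    ("typing_extensions", "PSF-2.0", "UNKNOWN"),
]

print(triage(SAMPLE_FINDINGS))
```

Treating UNKNOWN as "review" rather than "ok" matters here, because that bucket holds both the NVIDIA proprietary licenses and a permissive license like PSF-2.0.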

Vendored Packages Invisible to pip-licenses

An LGPLv3 package undetectable by pip-licenses was discovered by syft and trivy: autocommand, vendored inside setuptools’ _vendor/ directory (syft: LicenseRef-LGPLv3, trivy: LGPL-3.0-only, severity: HIGH).

transformers[torch]
  └→ torch
       └→ setuptools  (required dependency for Python 3.12+)
            └→ _vendor/autocommand  (vendored, LGPLv3)

Vendored packages don’t appear in pip list, so they can’t be detected by pip-licenses’ --fail-on="GPL;AGPL;LGPL" either. Of the 12 packages vendored by setuptools, autocommand is the only one with LGPLv3 (copyleft).

autocommand is not loaded at runtime. It’s only imported by setuptools’ internal development support utilities and is unrelated to the transformers/torch workflow. However, whether including it in a container image constitutes “distribution” is subject to interpretation, so it could be flagged in strict license audits.
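If you want a quick check without installing a scanner, a filesystem walk can surface vendored copies. The sketch below assumes that vendored packages keep their own .dist-info directory somewhere below the top level of site-packages, which is how filesystem scanners can see them; vendored code that ships without metadata would not be found this way. The function name is my own.

```python
from pathlib import Path
from typing import List


def vendored_distributions(site_packages: Path) -> List[str]:
    """List *.dist-info directories nested below the top level of site-packages.

    Top-level *.dist-info directories belong to normally installed packages;
    anything deeper is treated as a vendored copy (assumption of this sketch).
    """
    found = []
    for dist_info in site_packages.rglob("*.dist-info"):
        if dist_info.is_dir() and dist_info.parent != site_packages:
            found.append(dist_info.relative_to(site_packages).as_posix())
    return sorted(found)
```

You could call it with Path(sysconfig.get_paths()["purelib"]) from the standard library to scan the current environment, then read each directory’s METADATA file to see the declared license.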

OS-Level Licenses

When distributing container images, OS package licenses also need to be considered. trivy also scans Debian packages inside the container (bash, coreutils, tar, etc.) and reported 624 license entries. Even if Python packages show “no GPL,” OS packages contain numerous GPL-2.0 / GPL-3.0 licenses.

Choosing the Right Tool

| Purpose | Recommended Tool | Reason |
| --- | --- | --- |
| License checks during development | pip-licenses | Clean output aligned with pip’s dependency tree. Easy CI integration |
| Container image distribution | syft / trivy | Covers licenses of all physically included files |
| License auditing & compliance | Multiple tools combined | pip-licenses alone misses vendored packages |

Summary

pip-licenses makes it easy to list dependency licenses for Python projects. You can choose the policy check method that fits your needs:

  • Report generation (-f markdown --with-urls): Get an overview of dependency licenses
  • Blocking prohibited licenses (--fail-on + --partial-match): The simplest and most robust approach for excluding copyleft licenses like GPL
  • Restricting to an allowlist (--allow-only): The strictest approach, but requires handling notation inconsistencies and SPDX compound expressions, resulting in high maintenance cost

Keep the following points in mind when managing dependency licenses:

  • Notation inconsistencies: The coexistence of Classifier-based and SPDX expressions means the same license may have multiple notations
  • Platform differences: License composition changes by platform, so checks should run on the same environment as your deployment target
  • Vendored packages: Vendored packages undetectable by pip-licenses exist. When distributing containers, combining syft/trivy is effective

Start by understanding how many licenses are involved in a single pip install, and build your license management from there.
