SPDX in Practice for Developers: Checking Dependency Licenses for Transformers with pip-licenses
This article is a machine translation from Japanese. It may contain translation errors.
I’m Yuki from PredNext. In the previous article, I introduced SPDX license identifiers and expression syntax (AND/OR/WITH), and touched on the current state of inconsistent license notation even as the transition to PEP 639 progresses.
In this article, I’ll use Hugging Face Transformers as a case study and walk through checking dependency licenses with pip-licenses. I’ll cover three common needs — “getting an overview,” “blocking prohibited licenses,” and “restricting to an allowlist” — showing how pip-licenses handles each, while also touching on notation inconsistencies and platform differences.
Setup
I used uv to create a project with transformers[torch] as a dependency.
```bash
uv init hf-license-check --python 3.12
cd hf-license-check
uv add 'transformers[torch]' pip-licenses
```
Checking Dependency Licenses for Transformers
Running uv run pip-licenses lists the licenses of all installed packages. The output format can be changed with the -f option to markdown, json, csv, etc. You can include package URLs with --with-urls and full license text with --with-license-file. This is useful for compiling license information for submission.
I checked the license status of the transformers[torch] environment on macOS (Python 3.12.7, pip-licenses 5.5.1) and aggregated the results by license family:
| Family | Count |
|---|---|
| BSD family | 15 |
| MIT family | 9 |
| Apache family | 7 |
| MPL family | 2 |
| ISC | 1 |
| PSF | 1 |
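Out of 35 packages, there were 0 GPL-family licenses and 0 UNKNOWN entries. All are open-source licenses, and all except the two MPL-2.0 packages (certifi and tqdm) are permissive; MPL-2.0 is a weak, file-level copyleft license.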
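pip-licenses can count packages per license string with its --summary option, but that keeps notation variants such as MIT and MIT License apart, so a table like the one above needs an extra grouping step. The article does not say how the grouping was done; the bash sketch below is one hypothetical way to do it. The family names and the pattern order are my assumptions. Because a compound expression is assigned to the first family whose pattern matches, order matters: with copyleft checked first and BSD checked before MPL, Apache, and MIT, the 35 License strings from the full list below give the same counts as the table (packaging’s Apache-2.0 OR BSD-2-Clause, for instance, lands in the BSD family).

```bash
#!/usr/bin/env bash
# Coarse license-family grouping (hypothetical; names and order are assumptions).
# The first matching pattern wins, so compound expressions are assigned by order.
license_family() {
  case "$1" in
    *GPL*|*"General Public License"*)     echo "GPL family" ;;  # copyleft checked first
    *BSD*)                                echo "BSD family" ;;
    *MPL*|*Mozilla*)                      echo "MPL family" ;;
    *Apache*)                             echo "Apache family" ;;
    *MIT*)                                echo "MIT family" ;;
    *ISC*)                                echo "ISC" ;;
    *PSF*|*"Python Software Foundation"*) echo "PSF" ;;
    *)                                    echo "Other/unknown" ;;
  esac
}

# Read one license string per line on stdin; print "count family", largest first.
count_families() {
  local line
  while IFS= read -r line; do
    [[ -z $line ]] && continue   # skip blank lines
    license_family "$line"
  done | sort | uniq -c | sort -rn
}
```

Assuming jq is installed, something like uv run pip-licenses -f json | jq -r '.[].License' | count_families would feed it the current environment.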
Full list of all 35 packages:
| Name | Version | License |
|---|---|---|
| Jinja2 | 3.1.6 | BSD License |
| MarkupSafe | 3.0.3 | BSD-3-Clause |
| PyYAML | 6.0.3 | MIT License |
| Pygments | 2.19.2 | BSD License |
| accelerate | 1.13.0 | Apache Software License |
| annotated-doc | 0.0.4 | MIT |
| anyio | 4.12.1 | MIT |
| certifi | 2026.2.25 | Mozilla Public License 2.0 (MPL 2.0) |
| click | 8.3.1 | BSD-3-Clause |
| filelock | 3.25.1 | MIT |
| fsspec | 2026.2.0 | BSD-3-Clause |
| h11 | 0.16.0 | MIT License |
| hf-xet | 1.3.2 | Apache-2.0 |
| httpcore | 1.0.9 | BSD-3-Clause |
| httpx | 0.28.1 | BSD License |
| huggingface_hub | 1.6.0 | Apache Software License |
| idna | 3.11 | BSD-3-Clause |
| markdown-it-py | 4.0.0 | MIT License |
| mdurl | 0.1.2 | MIT License |
| mpmath | 1.3.0 | BSD License |
| networkx | 3.6.1 | BSD-3-Clause |
| numpy | 2.4.3 | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 |
| packaging | 26.0 | Apache-2.0 OR BSD-2-Clause |
| psutil | 7.2.2 | BSD-3-Clause |
| regex | 2026.2.28 | Apache-2.0 AND CNRI-Python |
| rich | 14.3.3 | MIT License |
| safetensors | 0.7.0 | Apache Software License |
| shellingham | 1.5.4 | ISC License (ISCL) |
| sympy | 1.14.0 | BSD License |
| tokenizers | 0.22.2 | Apache Software License |
| torch | 2.10.0 | BSD-3-Clause |
| tqdm | 4.67.3 | MPL-2.0 AND MIT |
| transformers | 5.3.0 | Apache 2.0 License |
| typer | 0.24.1 | MIT |
| typing_extensions | 4.15.0 | PSF-2.0 |
License Notation Inconsistencies
Looking at the list, you can see that multiple notations coexist for the same license. For example, the MIT License appears as both “MIT License” and “MIT,” BSD licenses appear as both “BSD License” and “BSD-3-Clause,” and Apache licenses appear as “Apache Software License,” “Apache 2.0 License,” and “Apache-2.0.”
This inconsistency is a transitional issue as the ecosystem migrates to PEP 639. PEP 639 moves Python package license metadata to SPDX expressions in the new License-Expression field, but as of 2026 many packages still use the legacy notation: License :: Trove classifiers or a free-text License field.
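One way to reduce this noise before applying any policy is to map the legacy strings to SPDX identifiers yourself. The bash function below is a hypothetical sketch that covers only the notations in the list above. It deliberately refuses to map BSD License and Apache Software License, because those classifiers name neither a BSD variant nor an Apache version; anything it does not recognize is passed through on the assumption that it is already an SPDX expression.

```bash
#!/usr/bin/env bash
# to_spdx LICENSE
# Hypothetical mapping for the legacy notations seen in this article only.
# Prints the SPDX identifier (or the input unchanged) and returns
#   0 if mapped, or if the input is assumed to be SPDX already;
#   1 if it is a legacy name too vague to map (needs a manual check).
to_spdx() {
  case "$1" in
    "MIT License")                          echo "MIT" ;;
    "ISC License (ISCL)")                   echo "ISC" ;;
    "Apache 2.0 License")                   echo "Apache-2.0" ;;
    "Mozilla Public License 2.0 (MPL 2.0)") echo "MPL-2.0" ;;
    # These classifiers name no BSD variant / Apache version, so do not guess.
    "BSD License"|"Apache Software License") echo "$1"; return 1 ;;
    *)                                      echo "$1" ;;  # assumed to be SPDX already
  esac
}
```

Collecting the packages for which to_spdx returns 1 gives a short list whose license files need a manual look.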
These notation inconsistencies affect the policy checks described in the next section.
Policy Checks with pip-licenses
pip-licenses provides options for automated license verification. Here, I’ll look at how pip-licenses addresses three common needs.
Need 1: Compiling dependency licenses into a report
The basic pip-licenses command retrieves a list of dependency package licenses. With additional options, it can also serve as source material for third-party reports or SBOM creation.
```bash
uv run pip-licenses -f markdown --with-urls --with-license-file \
  --output-file=license-report.md
```
Need 2: Blocking copyleft licenses like GPL (blocklist)
The most common need in commercial projects is confirming that copyleft licenses such as GPL and AGPL are not included in dependencies. Combining --fail-on with --partial-match causes an error (exit code 1) when a specified license is found.
```bash
uv run pip-licenses --fail-on="GPL;AGPL;LGPL" --partial-match
```
--partial-match enables substring matching. The string GPL matches both GNU General Public License v3 (GPLv3) and GPL-3.0-only, so you can roughly block by license family even when notation varies.
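The following bash function emulates plain substring matching to show what such a blocklist does and does not catch. It is an approximation for illustration, not pip-licenses’ own code, and details such as case handling may differ in the real option. Two consequences follow from substring matching: GPL on its own already matches AGPL and LGPL strings, and a string that never contains the abbreviation slips through. The Trove classifier GNU Affero General Public License v3, for example, contains neither GPL nor AGPL, so a pattern such as General Public License is a cheap addition.

```bash
#!/usr/bin/env bash
# matches_any LICENSE "PATTERN1;PATTERN2;..."
# Return 0 if LICENSE contains any of the ;-separated patterns as a substring.
# Approximation of --partial-match for illustration only (case-sensitive).
matches_any() {
  local license=$1 pattern
  local -a patterns
  IFS=';' read -r -a patterns <<< "$2"
  for pattern in "${patterns[@]}"; do
    if [[ -n $pattern && $license == *"$pattern"* ]]; then return 0; fi
  done
  return 1
}

for lic in "GNU General Public License v3 (GPLv3)" "GPL-3.0-only" \
           "LGPL-3.0-only" "GNU Affero General Public License v3" "MIT"; do
  if matches_any "$lic" "GPL;AGPL;LGPL"; then
    echo "blocked: $lic"
  else
    echo "passed:  $lic"
  fi
done
```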
Since the Transformers dependencies contain no GPL-family licenses, this check passes without issues. For CI integration, this blocklist approach is the simplest and most robust.
Need 3: Restricting to only allowed licenses (allowlist)
For stricter policy management, --allow-only lets you specify a list of permitted licenses, and any license not in the list triggers an error.
However, --allow-only checks by exact string match by default, so you need to handle both notation inconsistencies and SPDX compound expressions. For example, even if you allow Apache-2.0 and CNRI-Python individually, the combined string Apache-2.0 AND CNRI-Python won’t pass unless it’s also in the allowlist. The current dependencies include the following compound expressions:
| Package | License Expression | Meaning |
|---|---|---|
| numpy | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 | Five licenses apply to different parts of the code |
| packaging | Apache-2.0 OR BSD-2-Clause | Dual license — users may choose either |
| regex | Apache-2.0 AND CNRI-Python | Both licenses apply simultaneously |
| tqdm | MPL-2.0 AND MIT | Both MPL-2.0 and MIT apply |
Running with an allowlist that accounts for both notation inconsistencies and compound expressions:
```bash
uv run pip-licenses \
  --allow-only="MIT;BSD-3-Clause;Apache-2.0;ISC;PSF-2.0;0BSD;Zlib;CC0-1.0;CNRI-Python;\
MIT License;BSD License;Apache Software License;Apache 2.0 License;\
ISC License (ISCL);Mozilla Public License 2.0 (MPL 2.0);MPL-2.0;\
Apache-2.0 AND CNRI-Python;Apache-2.0 OR BSD-2-Clause;\
BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0;MPL-2.0 AND MIT"
```
This passes all 35 packages. However, the list needs updating whenever dependency packages migrate to SPDX expressions or new license notations are added, resulting in high maintenance cost. While --partial-match can be combined with --allow-only to simplify the list, there is a risk of unintended matches, so use it according to your needs.
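If keeping compound strings in the allowlist becomes tedious, another option is a pre-check of your own that evaluates the expression instead of comparing whole strings. The bash function below is a hypothetical helper, not a pip-licenses feature. It uses SPDX’s precedence rule, under which AND binds more tightly than OR: an expression is accepted if at least one OR alternative consists entirely of allowed identifiers. It assumes a flat expression with upper-case operators that is already in SPDX form, and it does not handle parentheses or WITH exceptions.

```bash
#!/usr/bin/env bash
# spdx_allowed EXPRESSION ALLOWED_ID...
# Hypothetical helper (not part of pip-licenses): succeed if EXPRESSION can be
# satisfied using only ALLOWED_IDs.
# Assumptions: flat SPDX expression, upper-case AND/OR, no parentheses, no WITH.
spdx_allowed() {
  local expr=$1; shift
  local nl=$'\n' alt term id all_ok found
  # AND binds more tightly than OR, so split into OR-alternatives first.
  while IFS= read -r alt; do
    all_ok=1
    # Every AND-term of this alternative must be on the allowlist.
    while IFS= read -r term; do
      found=0
      for id in "$@"; do
        if [[ $term == "$id" ]]; then found=1; break; fi
      done
      if (( ! found )); then all_ok=0; break; fi
    done <<< "${alt// AND /$nl}"
    if (( all_ok )); then return 0; fi
  done <<< "${expr// OR /$nl}"
  return 1
}
```

With this rule, spdx_allowed "Apache-2.0 OR BSD-2-Clause" MIT Apache-2.0 succeeds because choosing Apache-2.0 is enough, while spdx_allowed "Apache-2.0 AND CNRI-Python" MIT Apache-2.0 fails because CNRI-Python applies as well.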
Results Differ Across Platforms
Even for the same packages, the license composition changes by platform. All results so far were from macOS, so I used Docker to install the same transformers[torch] in a Linux environment and ran pip-licenses.
On Linux x86_64, PyTorch pulls in additional dependencies, including 16 NVIDIA CUDA-related packages. While all 35 packages on macOS had open-source licenses, on Linux x86_64 16 of the 53 packages (about 30%) have proprietary licenses.
Moreover, the license notation for these 16 NVIDIA packages is split across four variations:
| Notation | Count | Example |
|---|---|---|
| Other/Proprietary License | 13 | nvidia-cublas-cu12, etc. |
| NVIDIA Proprietary Software | 1 | nvidia-cusparselt-cu12 |
| LicenseRef-NVIDIA-Proprietary | 1 | nvidia-nvshmem-cu12 |
| LicenseRef-NVIDIA-SOFTWARE-LICENSE | 1 | cuda-bindings |
SPDX has no standard identifiers for proprietary licenses, so packages that use SPDX expressions must define their own LicenseRef- identifier, and the two packages here that do so chose different ones. All 16 packages above are governed by the same NVIDIA CUDA EULA, yet three approaches coexist: Classifier-based (Other/Proprietary License), legacy metadata (NVIDIA Proprietary Software), and SPDX LicenseRef.
Running license checks on the same platform as your deployment target is the most reliable approach.
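To see exactly what a platform adds, you can compare the package names reported on each one. The bash sketch below assumes one name per line per platform, for example produced by uv run pip-licenses -f json | jq -r '.[].Name' on macOS and again inside the Linux container; the file names and the use of jq are assumptions.

```bash
#!/usr/bin/env bash
# only_in_second FILE_A FILE_B
# Print the package names that appear in FILE_B but not in FILE_A
# (one name per line in each file; order does not matter).
only_in_second() {
  # comm needs sorted input; -1 hides names only in A, -3 hides names in both.
  comm -13 <(sort -u "$1") <(sort -u "$2")
}

# Example (file names are assumptions):
#   only_in_second macos.txt linux.txt
```

Run against the two lists, this would show the Linux-only dependencies, which are the entries whose licenses deserve a closer look.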
Automation with GitHub Actions
Here’s how to integrate license checks into CI. The following is a configuration example that automatically runs license checks when pyproject.toml or uv.lock changes. It uses the simple and robust blocklist (--fail-on) approach.
GitHub Actions workflow example:
```yaml
name: License Check

on:
  pull_request:
    paths:
      - 'pyproject.toml'
      - 'uv.lock'

jobs:
  license-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Install dependencies
        run: uv sync
      - name: Check licenses
        run: |
          uv run pip-licenses \
            --fail-on="GPL;AGPL;LGPL;EUPL;SSPL" \
            --partial-match
      - name: Generate license report
        if: always()
        run: uv run pip-licenses -f markdown --with-urls > license-report.md
      - name: Upload license report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: license-report
          path: license-report.md
```
By default, CI runs on ubuntu-latest (Linux x86_64), so, as shown in the previous section, additional packages such as the NVIDIA CUDA libraries may be installed. If you accept proprietary licenses, the workflow can stay as it is, since none of the NVIDIA notations contain the blocked strings. If you want CI to reject them, add their license notations to --fail-on. Conversely, --ignore-packages removes specific packages from the check entirely, which is useful for packages you have already reviewed by hand. For stricter management with --allow-only, you’ll need to handle the notation inconsistencies and compound expressions mentioned earlier.
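If you do want CI to reject the NVIDIA packages, the four notations in the platform table share two substrings, Proprietary and LicenseRef-NVIDIA, that could be appended to the --fail-on list. The bash snippet below checks that assumption with a plain substring test before you rely on it; it approximates --partial-match rather than calling pip-licenses, and the extended list is a suggestion, not the configuration used above. Note that Other/Proprietary License is a generic classifier, so this would also block any non-NVIDIA package that uses it.

```bash
#!/usr/bin/env bash
# Suggested (not original) extension of the CI blocklist: the last two entries are new.
FAIL_ON="GPL;AGPL;LGPL;EUPL;SSPL;Proprietary;LicenseRef-NVIDIA"

# The four NVIDIA notations observed on Linux x86_64 in this article.
NVIDIA_NOTATIONS=(
  "Other/Proprietary License"
  "NVIDIA Proprietary Software"
  "LicenseRef-NVIDIA-Proprietary"
  "LicenseRef-NVIDIA-SOFTWARE-LICENSE"
)

# Print each notation that no ;-separated pattern in FAIL_ON matches as a
# substring (an approximation of --partial-match, for checking only).
uncaught() {
  local notation pattern hit
  local -a patterns
  IFS=';' read -r -a patterns <<< "$FAIL_ON"
  for notation in "${NVIDIA_NOTATIONS[@]}"; do
    hit=0
    for pattern in "${patterns[@]}"; do
      if [[ $notation == *"$pattern"* ]]; then hit=1; break; fi
    done
    if (( ! hit )); then echo "$notation"; fi
  done
}

uncaught   # prints nothing for the list above

# If nothing is printed, the CI step could become:
#   uv run pip-licenses --fail-on="$FAIL_ON" --partial-match
```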
Tools Beyond pip-licenses
pip-licenses only covers Python package metadata. When distributing container images, tools that can scan the entire image filesystem are useful. Here, I compare results from scanning the Docker image mentioned earlier using syft (Anchore’s SBOM generator) and trivy (Aqua Security’s security scanner).
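The exact scan commands are not shown here. As a sketch, scans of this kind could be run as below; the image name is a placeholder, and although syft’s -o spdx-json=FILE and trivy’s image --scanners license --format json --output FILE are options I believe current releases accept, check them against your installed versions. This is not necessarily how the following results were produced. Setting DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for container-level license scans; IMAGE is a placeholder.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

scan_licenses() {
  local image=$1
  # 1) SPDX SBOM of the whole image with syft; 2) license findings with trivy.
  run syft "$image" -o spdx-json=sbom.spdx.json || return
  run trivy image --scanners license --format json --output trivy-licenses.json "$image"
}

# Example: DRY_RUN=1 scan_licenses hf-license-check:latest
```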
Comparison of Three Tools
| Item | pip-licenses | syft | trivy |
|---|---|---|---|
| Approach | Reads Python metadata | Scans entire image → generates SBOM | Scans entire image → classifies licenses |
| Python packages detected | 53 | 70 | 70 |
| OS package detection | No | Yes (173 packages) | Yes (624 license entries) |
| License notation | Classifier / SPDX as-is | Normalized to SPDX identifiers (some LicenseRef) | Normalized to SPDX identifiers |
| Severity classification | None | None | Yes (notice / reciprocal / restricted) |
The difference in Python package counts comes from different detection targets. The Docker image has 58 packages registered in pip list. By default, pip-licenses excludes 5 of them, namely the system packages (pip, setuptools) and pip-licenses itself with its dependencies (prettytable, wcwidth), leaving 53. syft and trivy scan the filesystem directly and also detect the 12 packages vendored inside setuptools, for a total of 70.
License Notation Normalization
syft and trivy normalize license notations to SPDX identifiers. Here’s a comparison with pip-licenses:
| Package | pip-licenses | syft | trivy |
|---|---|---|---|
| PyYAML | MIT License | MIT | MIT |
| Pygments | BSD License | BSD-2-Clause | BSD-3-Clause |
| httpx | BSD License | BSD-3-Clause | BSD-3-Clause |
| Jinja2 | BSD License | NOASSERTION | BSD-3-Clause |
| accelerate | Apache Software License | LicenseRef-Apache | Apache-2.0 |
| transformers | Apache 2.0 License | LicenseRef-Apache-2.0-License | Apache-2.0 |
| certifi | Mozilla Public License 2.0 (MPL 2.0) | MPL-2.0 | MPL-2.0 |
trivy excels in normalization consistency, converting Classifier-based notations like Apache Software License to the SPDX identifier Apache-2.0. syft tends to fall back to LicenseRef-* for some cases, and even produces NOASSERTION for Jinja2.
However, normalization isn’t always accurate. In the table above, trivy identifies Pygments as BSD-3-Clause while syft identifies it as BSD-2-Clause. Pygments’ actual license is BSD-2-Clause, making syft more accurate. trivy tends to uniformly convert BSD License to BSD-3-Clause.
Even larger discrepancies appear with NVIDIA packages. For example, nvidia-nccl-cu12 is reported as Other/Proprietary License by pip-licenses and trivy, but syft reports BSD-3-Clause. NCCL’s source code is actually published under Apache-2.0 (partially BSD), so none of the tools are fully accurate. Since the licensing of NVIDIA CUDA-related packages is complex, individual verification is recommended.
As shown, different tools can produce conflicting results for the same package. Don’t blindly trust a single tool’s output — cross-checking and reviewing the original license text is essential.
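Part of that cross-check can be automated once each tool’s result has been reduced to one identifier per package. The bash/awk sketch below is hypothetical: it expects tab-separated lines of package, syft result, and trivy result (extracting those from each tool’s JSON output is left out) and prints the packages where the two disagree or where either reports NOASSERTION.

```bash
#!/usr/bin/env bash
# disagreements: read "package<TAB>syft_license<TAB>trivy_license" lines on stdin
# and print the lines that need a manual look (tools disagree, or NOASSERTION).
# Assumption: each tool's output has already been reduced to this TSV form.
disagreements() {
  awk -F'\t' 'NF >= 3 && ($2 != $3 || $2 == "NOASSERTION" || $3 == "NOASSERTION")'
}

# Example rows taken from the comparison table above.
printf '%s\t%s\t%s\n' \
  PyYAML MIT MIT \
  Pygments BSD-2-Clause BSD-3-Clause \
  Jinja2 NOASSERTION BSD-3-Clause \
  certifi MPL-2.0 MPL-2.0 | disagreements
```

Agreement is not proof of correctness, though: in the nvidia-nccl-cu12 case above, pip-licenses and trivy agreed with each other and were still not fully accurate.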
trivy’s Severity Classification
trivy automatically classifies licenses by category and severity, which can be directly used for policy design.
| Category | Severity | Example Licenses |
|---|---|---|
| notice | LOW | MIT, BSD-3-Clause, Apache-2.0, ISC, etc. |
| reciprocal | MEDIUM | MPL-2.0 (certifi, tqdm) |
| restricted | HIGH | LGPL-3.0-only (autocommand) |
| unknown | UNKNOWN | NVIDIA proprietary, PSF-2.0, etc. |
Vendored Packages Invisible to pip-licenses
An LGPLv3 package undetectable by pip-licenses was discovered by syft and trivy: autocommand, vendored inside setuptools’ _vendor/ directory (syft: LicenseRef-LGPLv3, trivy: LGPL-3.0-only, severity: HIGH).
```text
transformers[torch]
└→ torch
   └→ setuptools (required dependency for Python 3.12+)
      └→ _vendor/autocommand (vendored, LGPLv3)
```
Vendored packages don’t appear in pip list, so they can’t be detected by pip-licenses’ --fail-on="GPL;AGPL;LGPL" either. Of the 12 packages vendored by setuptools, autocommand is the only one with LGPLv3 (copyleft).
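To check what is vendored in your own environment, you can read the metadata shipped inside setuptools. The bash sketch below assumes that each vendored copy under setuptools/_vendor comes with a NAME-VERSION.dist-info directory containing a METADATA file, which recent setuptools releases appear to ship; verify this for your version. It prints each directory name followed by any License, License-Expression, or License classifier header.

```bash
#!/usr/bin/env bash
# list_vendored VENDOR_DIR
# For each *.dist-info directory under VENDOR_DIR, print its name followed by
# the license-related header lines of its METADATA file.
# Assumption: vendored copies ship *.dist-info/METADATA (check your setuptools).
list_vendored() {
  local info meta
  for info in "$1"/*.dist-info; do
    [[ -d $info ]] || continue        # the glob matched nothing
    echo "${info##*/}"
    meta=$info/METADATA
    if [[ -f $meta ]]; then
      # Headers end at the first blank line; the description body is skipped.
      awk '/^$/ { exit } /^(License|License-Expression): |^Classifier: License ::/ { print "  " $0 }' "$meta"
    fi
  done
}

# Example (assumes setuptools is importable in the project environment):
#   vendor_dir=$(uv run python -c 'import os, setuptools; print(os.path.join(os.path.dirname(setuptools.__file__), "_vendor"))')
#   list_vendored "$vendor_dir"
```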
autocommand is not loaded at runtime. It’s only imported by setuptools’ internal development support utilities and is unrelated to the transformers/torch workflow. However, whether including it in a container image constitutes “distribution” is subject to interpretation, so it could be flagged in strict license audits.
OS-Level Licenses
When distributing container images, OS package licenses also need to be considered. trivy also scans Debian packages inside the container (bash, coreutils, tar, etc.) and reported 624 license entries. Even if Python packages show “no GPL,” OS packages contain numerous GPL-2.0 / GPL-3.0 licenses.
Choosing the Right Tool
| Purpose | Recommended Tool | Reason |
|---|---|---|
| License checks during development | pip-licenses | Clean output aligned with pip’s dependency tree. Easy CI integration |
| Container image distribution | syft / trivy | Covers licenses of all physically included files |
| License auditing & compliance | Multiple tools combined | pip-licenses alone misses vendored packages |
Summary
pip-licenses makes it easy to list dependency licenses for Python projects. You can choose the policy check method that fits your needs:
- Report generation (-f markdown --with-urls): Get an overview of dependency licenses
- Blocking prohibited licenses (--fail-on + --partial-match): The simplest and most robust approach for excluding copyleft licenses like GPL
- Restricting to an allowlist (--allow-only): The strictest approach, but it requires handling notation inconsistencies and SPDX compound expressions, which makes maintenance costly
Keep the following points in mind when managing dependency licenses:
- Notation inconsistencies: The coexistence of Classifier-based and SPDX expressions means the same license may have multiple notations
- Platform differences: License composition changes by platform, so checks should run on the same environment as your deployment target
- Vendored packages: Some vendored packages are invisible to pip-licenses. When distributing containers, combining pip-licenses with syft or trivy closes this gap
Start by understanding how many licenses are involved in a single pip install, and build your license management from there.
We’re Open for Business
PredNext is currently accepting new project inquiries. We specialize in AI technologies such as natural language processing and image processing, with particular strengths in model compression and acceleration. If you’re interested, please reach out through our contact form.