cpe.gcve.eu dump
This repository contains a git-friendly dump of the dataset used by cpe.gcve.eu, the GCVE CPE Editor.
The data maintained in the CPE Editor is regularly exported and pushed to this repository. This makes the current state of the CPE dataset available as a versioned, reviewable, and reproducible git tree. Consumers can use this repository to follow changes over time, mirror the dataset, or integrate the exported CPE, vendor, product, metadata, relationship, and PURL mapping data into their own tooling.
Purpose
The dump is intended to provide:
- a transparent publication channel for the data maintained on cpe.gcve.eu;
- deterministic files that are suitable for git review and diffs;
- a historical record of changes through normal git commits;
- a simple format for downstream consumers, mirrors, researchers, and vulnerability tooling.
Live editing and curation of the data happen in the CPE Editor. This repository is the exported view of that dataset and is regularly refreshed from it.
Dataset layout
This directory was generated by:
tools/export_dataset_to_git.py
from a CPE Editor portable dataset export.
The exported tree is optimized for git review and history:
- entity records are stored as deterministic JSON files;
- potentially large collections are sharded into sorted JSON Lines files;
- indexes and layout information are described in manifest.json;
- UUID-based references are preserved to make relationships stable across exports.
See manifest.json for the layout contract, shard structure, and UUID indexes.
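As a starting point for exploring the layout contract, the following minimal sketch loads manifest.json and reports which top-level keys it contains. The manifest schema is not described in this README, so the sketch makes no assumption about specific key names; `summarize_manifest` is a hypothetical helper, not part of the export tooling.

```python
import json
from pathlib import Path


def summarize_manifest(repo_root):
    """Return {top-level key: JSON type name} for <repo_root>/manifest.json."""
    manifest_path = Path(repo_root) / "manifest.json"
    with manifest_path.open(encoding="utf-8") as handle:
        manifest = json.load(handle)
    # The schema is unknown here, so only report what is actually present.
    if not isinstance(manifest, dict):
        return {"<root>": type(manifest).__name__}
    return {key: type(value).__name__ for key, value in manifest.items()}
```

Calling `summarize_manifest(".")` from the root of a clone returns a small dictionary that can be printed before writing any layout-specific parsing code.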
Counts
Current dataset counts:
- Vendors: 56,235
- Products: 249,645
- CPEs: 2,000,365
- Metadata rows: 3
- Relationships: 5
- PURL mappings: 2,612,697
- Proposals: 0
These numbers describe the exported dataset at the time this dump was generated. They may change in later commits as the CPE Editor dataset evolves.
Update model
The dataset from cpe.gcve.eu is exported on a regular basis and pushed to this repository.
Each push represents a new snapshot of the CPE Editor dataset. The git history can therefore be used to inspect what changed between exports, including additions, corrections, removals, and updates to mappings or relationships.
To update a local copy, use:
git pull
Downstream users should treat the repository history as the publication log for the dump.
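One possible way to read that publication log from a script is to call `git log` and parse its output. The sketch below assumes the `git` command-line tool is installed and that `repo_root` points at a local clone; the helper names `parse_git_log` and `list_snapshots` are illustrative only.

```python
import subprocess

# Tab-separated fields: commit hash, committer date (strict ISO 8601), subject.
LOG_FORMAT = "%H%x09%cI%x09%s"


def parse_git_log(output):
    """Turn `git log --format=LOG_FORMAT` output into a list of dicts."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_hash, date, subject = line.split("\t", 2)
        commits.append({"hash": commit_hash, "date": date, "subject": subject})
    return commits


def list_snapshots(repo_root, path=".", limit=10):
    """Return the most recent commits touching `path`, newest first."""
    result = subprocess.run(
        ["git", "-C", str(repo_root), "log", f"--max-count={limit}",
         f"--format={LOG_FORMAT}", "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_git_log(result.stdout)
```

Passing a subdirectory or file as `path` narrows the log to the commits that changed that part of the dump.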
Consuming the dump
Recommended consumption flow:
- Read manifest.json to understand the current layout and indexes.
- Process deterministic JSON entity files for vendors, products, and CPE records.
- Process sharded JSON Lines files for large collections such as mappings.
- Use UUID indexes from the manifest when resolving relationships between entities.
- Track updates through git commits rather than relying only on file modification times.
The export format is designed to be stable enough for automated ingestion while remaining easy to inspect manually during reviews.
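The shard-processing step of this flow can be sketched as a generator that streams records without loading a whole collection into memory. This is a minimal sketch under assumptions: the `.jsonl` file extension and the helper name `iter_jsonl_records` are not confirmed by this README, and the actual shard paths should be taken from manifest.json.

```python
import json
from pathlib import Path


def iter_jsonl_records(directory, pattern="*.jsonl"):
    """Yield one parsed object per non-empty line, visiting shards in sorted order."""
    # The "*.jsonl" pattern is an assumption; adjust it to the manifest's layout.
    for shard in sorted(Path(directory).rglob(pattern)):
        with shard.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(
                        f"{shard}:{line_number}: invalid JSON line"
                    ) from exc
```

Because the function is a generator, a consumer can stop early or count records with `sum(1 for _ in iter_jsonl_records(some_directory))`, which keeps memory use flat even for collections with millions of rows.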
Notes
This repository is a dump of the CPE Editor dataset. It is not a replacement for the live editor interface.
For data corrections or proposals, use the workflow provided by the CPE Editor or the contribution process documented by the GCVE project.