A tool to generate datasets and models based on vulnerability descriptions from @vulnerability-lookup.

VulnTrain


VulnTrain offers a suite of commands to generate diverse AI datasets and train models using comprehensive vulnerability data from Vulnerability-Lookup. It harnesses over one million JSON records from all supported advisory sources to build high-quality, domain-specific models.

Additionally, data from the vulnerability-lookup:meta container, including enrichment sources such as vulnrichment and Fraunhofer FKIE, is incorporated to enhance model quality.

Check out the datasets and models on Hugging Face.

For more information about the use of AI in Vulnerability-Lookup, please refer to the user manual.

Usage

Install VulnTrain:

$ pipx install VulnTrain

Three types of commands are available:

  • Dataset generation: Create and prepare datasets.
  • Model training: Train models using the prepared datasets.
    • Train a model to classify vulnerabilities by severity (model available on Hugging Face).
    • Train a model for text generation to assist in writing vulnerability descriptions (model available on Hugging Face).
  • Model validation: Assess the performance of trained models (validations, benchmarks, etc.).
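
To make the dataset generation step for severity classification more concrete, here is a minimal sketch of how a vulnerability record could be turned into a labeled training example. It is not VulnTrain's actual code: the record fields (`description`, `cvss_score`), the helper names, and the sample records are all hypothetical. The score thresholds follow the CVSS v3.x qualitative severity rating scale.

```python
from typing import Optional


def cvss_to_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def to_training_example(record: dict) -> Optional[dict]:
    """Build a text/label pair from one record, or return None if unusable.

    The field names are assumptions made for this sketch only.
    """
    description = (record.get("description") or "").strip()
    score = record.get("cvss_score")
    if not description or score is None:
        return None  # skip records without a description or a score
    return {"text": description, "label": cvss_to_severity(float(score))}


# Hypothetical sample records, for illustration only.
records = [
    {"id": "EXAMPLE-0001", "description": "Buffer overflow in the parser.", "cvss_score": 9.8},
    {"id": "EXAMPLE-0002", "description": "", "cvss_score": 5.0},
    {"id": "EXAMPLE-0003", "description": "Verbose error message leaks a path.", "cvss_score": 3.1},
]

dataset = [ex for r in records if (ex := to_training_example(r)) is not None]
```

Records with an empty description or a missing score are dropped rather than guessed at, so the resulting dataset only contains pairs a classifier can learn from.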

Check out the documentation for more information.

How to cite

Bonhomme, C., & Dulaunoy, A. (2025). VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (Version 1.4.0) [Computer software]. https://doi.org/10.48550/arXiv.2507.03607

@misc{bonhomme2025vlai,
    title={VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification},
    author={Cédric Bonhomme and Alexandre Dulaunoy},
    year={2025},
    eprint={2507.03607},
    archivePrefix={arXiv},
    primaryClass={cs.CR}
}

License

VulnTrain is licensed under the GNU General Public License version 3.

Copyright (c) 2025 Computer Incident Response Center Luxembourg (CIRCL)
Copyright (C) 2025 Cédric Bonhomme - https://github.com/cedricbonhomme
Copyright (C) 2025 Léa Ulusan - https://github.com/3LS3-1F