AIL framework - Analysis Information Leak framework

AIL Framework

Open-source framework for the collection, crawling, processing, and analysis of unstructured information.

AIL framework is an open-source platform to collect, crawl, process and analyse unstructured data from the clear web, Tor, I2P, chats, files and external feeds.

Originally developed at CIRCL, AIL helps analysts transform raw, messy content into structured intelligence through extraction, tagging, detection, correlation and investigation workflows.

What is AIL? https://ail-project.org

AIL (Analysis Information Leak) is an open-source framework for the collection, crawling, processing, and analysis of unstructured information. It supports threat intelligence, leak analysis, and investigative workflows by helping analysts extract, detect, correlate, and share relevant information from a wide range of sources.

AIL includes:

  • an extensible Python-based framework for processing and analysing unstructured information,
  • a crawler manager for continuous and authenticated collection,
  • feeders for communication platforms and external streams,
  • a detection and retro-hunt engine based on keywords, regex and YARA,
  • search, correlation and investigation capabilities to pivot across extracted data,
  • and export/integration features for platforms such as MISP.

AIL intelligence lifecycle

AIL follows a practical intelligence workflow:

  1. Collection: Continuous ingestion from chats, websites, hidden services, files and feeds.
  2. Processing: Extraction, decoding, OCR, QR/barcode parsing, enrichment and tagging.
  3. Detection: Real-time tracking with words, sets, regex, typo-squatting and YARA rules.
  4. Analysis: Search, pivoting, correlation graphs and investigations.
  5. Dissemination: Export of findings and objects to MISP intelligence-sharing platforms.

What's new in AIL v6.7

AIL is now at v6.7. Recent releases have significantly expanded its search, image analysis, crawling and document-processing capabilities.

Highlights include:

  • Unified search interface with best-match and most-recent ordering
  • Date range filtering and improved advanced search workflows
  • Image and screenshot descriptions for faster visual analysis and searchability
  • Expanded OCR and QR extraction, including support for more difficult image cases
  • Full PDF processing pipeline, including metadata extraction and translation support
  • I2P crawling support in addition to clear web and Tor collection
  • Passive SSH correlation for infrastructure analysis and deanonymization workflows
  • Improved chat exploration for platforms such as Discord, Telegram and Matrix

Features

Collection

  • Modular architecture to handle streams of unstructured information
  • Multiple feeder and importer support
  • Feeders for chat and stream sources such as Discord, Telegram and other providers
  • Crawling support for the clear web, darknet, Tor hidden services (.onion), and I2P
  • Authenticated crawling with browser sessions, cookies and local storage reuse
  • Continuous or on-demand monitoring of websites and hidden services over time
  • UI submission/import capabilities

Processing and enrichment

  • Full-text indexing of unstructured information (chats, crawled contents)
  • Extraction of URLs, hostnames, email addresses and credentials
  • Detection of phone numbers, API keys, IBANs, certificates and private keys
  • Detection of Bitcoin addresses, private keys and related cryptocurrency artifacts
  • File extraction and decoding from encoded content (Base64, hex)
  • OCR processing for screenshots and images
  • QR code and barcode extraction with reprocessing of embedded content
  • AI-assisted descriptions for images, screenshots and domains
  • PDF metadata extraction, ingestion and translation
  • Tagging system using MISP Galaxy and MISP Taxonomies
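As a rough illustration of the extraction and decoding steps listed above, the sketch below pulls URLs and email addresses out of free text and decodes Base64-looking runs, hashing each decoded blob. It is not AIL's implementation: the regular expressions, the 24-character minimum and the function names `extract_indicators` and `decode_base64_blobs` are assumptions chosen for the example.

```python
import base64
import binascii
import hashlib
import re

# Deliberately simple patterns for illustration; production extraction needs stricter rules.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Runs of at least 24 Base64 characters, with optional padding (threshold is an assumption).
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){6,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def extract_indicators(text):
    """Return the unique URLs and email addresses found in text, sorted."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "emails": sorted(set(EMAIL_RE.findall(text))),
    }


def decode_base64_blobs(text):
    """Decode Base64-looking runs and return (sha256_hex, decoded_bytes) pairs."""
    decoded = []
    for match in BASE64_RE.finditer(text):
        try:
            data = base64.b64decode(match.group(0), validate=True)
        except (binascii.Error, ValueError):
            continue  # the run only looked like Base64
        decoded.append((hashlib.sha256(data).hexdigest(), data))
    return decoded


if __name__ == "__main__":
    payload = base64.b64encode(b"secret payload for the demo").decode()
    sample = (
        "Contact admin@example.com or visit https://example.com/login "
        "for details. Payload: " + payload
    )
    print(extract_indicators(sample))
    for digest, data in decode_base64_blobs(sample):
        print(digest, data)
```

Hashing each decoded blob gives a stable identifier, which is one plausible way to let the same file be recognised when it appears in different items.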

Detection and tracking

Trackers are user-defined rules or patterns that automatically detect, tag and notify analysts about relevant information collected by AIL.

Supported tracker types:

  • word tracking
  • set-of-words tracking
  • regex tracking
  • YARA rules
  • typo-squatting detection
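To make the difference between the first three tracker types concrete, here is a minimal sketch of how word, set-of-words and regex rules could be evaluated against a piece of text. The `Tracker` class, its fields and the `min_hits` threshold are hypothetical and do not reflect AIL's internal data model. YARA and typo-squatting trackers are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Tracker:
    name: str
    kind: str          # "word", "set" or "regex"
    pattern: object    # str for "word"/"regex", list of str for "set"
    min_hits: int = 1  # only used by "set" trackers (assumed semantics)


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(tracker, text):
    """Return True when the tracker fires on the given text."""
    if tracker.kind == "word":
        return tracker.pattern.lower() in tokenize(text)
    if tracker.kind == "set":
        tokens = set(tokenize(text))
        hits = sum(1 for word in tracker.pattern if word.lower() in tokens)
        return hits >= tracker.min_hits
    if tracker.kind == "regex":
        return re.search(tracker.pattern, text) is not None
    raise ValueError(f"unknown tracker kind: {tracker.kind}")


def run_trackers(trackers, text):
    """Return the names of all trackers that match the text."""
    return [t.name for t in trackers if matches(t, text)]


if __name__ == "__main__":
    trackers = [
        Tracker("leak-word", "word", "password"),
        Tracker("bank-set", "set", ["iban", "swift", "account"], min_hits=2),
        Tracker("cve", "regex", r"CVE-\d{4}-\d{4,}"),
    ]
    text = "Dump with Password and IBAN plus account, see CVE-2024-12345"
    print(run_trackers(trackers, text))
```

In this sketch a set tracker fires only when enough of its words appear together, which is one way to reduce noise compared with a single-word rule.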

Detection capabilities include:

  • real-time tagging and classification
  • object occurrence tracking
  • webhook or email notification workflows
  • built-in YARA editor

AIL also supports Retro Hunts, enabling analysts to run newly created YARA rules against historical data to uncover previously missed content.
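The idea behind a retro hunt can be sketched in a few lines: take a rule written today and replay it over items that were stored earlier. In the sketch below a regular expression stands in for a YARA rule so that the example needs only the standard library. The `retro_hunt` function, the `(item_id, item_date, content)` tuple layout and the optional date bounds are assumptions for illustration, not AIL's API.

```python
import re
from datetime import date


def retro_hunt(rule, history, start=None, end=None):
    """Run a newly created rule against already-stored items.

    `rule` is a regular expression used here as a stand-in for a YARA rule.
    `history` is an iterable of (item_id, item_date, content) tuples.
    `start` and `end` optionally restrict the hunt to a date range.
    """
    compiled = re.compile(rule)
    hits = []
    for item_id, item_date, content in history:
        if start is not None and item_date < start:
            continue
        if end is not None and item_date > end:
            continue
        if compiled.search(content):
            hits.append(item_id)
    return hits


if __name__ == "__main__":
    history = [
        ("item-1", date(2024, 1, 5), "nothing of interest"),
        ("item-2", date(2024, 2, 10), "leaked token ghp_abc123"),
        ("item-3", date(2025, 3, 1), "another ghp_zzz999 token"),
    ]
    print(retro_hunt(r"ghp_[a-z0-9]+", history))
    print(retro_hunt(r"ghp_[a-z0-9]+", history, start=date(2025, 1, 1)))
```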

Search, correlation and investigation

  • Unified search interface with recency and relevancy ordering
  • Search by date range and specialized advanced search for selected data types
  • Search across chats, crawled domains, titles, filenames and AI-generated descriptions
  • Correlation engine and graph visualisation for relationships between:
    • decoded files and hashes
    • PGP metadata
    • domains, titles, dom-hash, favicons, cookie-names
    • usernames and user-accounts
    • CVEs
    • SSH keys
    • cryptocurrencies
    • PDF metadata
    • ...
  • Investigation workflow to group, enrich and follow analyst findings
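One simple way to think about correlation is that two objects are related when they share an extracted value, such as the same SSH key or favicon hash. The sketch below builds such a graph from a mapping of object ids to extracted `(type, value)` pairs. The function name `build_correlations`, the input shape and the sample identifiers are hypothetical; AIL's actual correlation engine is not described at this level of detail here.

```python
from collections import defaultdict
from itertools import combinations


def build_correlations(objects):
    """Link objects that share at least one extracted value.

    `objects` maps an object id to a set of (type, value) pairs,
    for example ("ssh-key", "<fingerprint>") or ("favicon", "<hash>").
    Returns a dict mapping each correlated object id to its related ids.
    """
    # Index every extracted value by the objects in which it was seen.
    by_value = defaultdict(set)
    for obj_id, values in objects.items():
        for value in values:
            by_value[value].add(obj_id)

    # Any two objects that share a value become neighbours in the graph.
    graph = defaultdict(set)
    for obj_ids in by_value.values():
        for a, b in combinations(sorted(obj_ids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


if __name__ == "__main__":
    objects = {
        "a.onion": {("ssh-key", "k1"), ("favicon", "f1")},
        "b.onion": {("ssh-key", "k1")},
        "c.onion": {("favicon", "f2")},
    }
    print(build_correlations(objects))
```

Pivoting then amounts to following edges: starting from one domain, an analyst can reach every other object that shares an artifact with it.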

Export and integrations

  • Alerting and sharing to MISP
  • Export of AIL objects and investigations to MISP formats
  • Automatic exports on selected detections and tags
  • Integrations supporting collaborative intelligence and incident-response workflows

Why AIL?

AIL is built for analysts who need to work with messy, real-world data:

  • free text,
  • screenshots,
  • PDFs and files,
  • chat messages,
  • encoded payloads,
  • content collected from web, Tor and I2P sources.

Instead of treating those sources separately, AIL helps turn them into searchable, correlated and actionable intelligence.

Screenshots

Websites, forums and hidden services

Domain CIRCL

Login-protected crawling with pre-recorded session cookies

Domain cookiejar

Extracted and decoded files

Extracted files

Correlation engine

Onion Domains Correlations

Correlation decoded image

Investigation

Investigation

Tagging system

Tags

Tags search

MISP export

misp_export

Automatic events and alerts

tags_misp_auto

UI submission

ui_submit

Installation

To install the AIL framework:

# Clone the repository
git clone https://github.com/ail-project/ail-framework.git
cd ail-framework
git submodule update --init --recursive

# Install dependencies on Debian/Ubuntu-based distributions
./installing_deps.sh

# Start AIL
cd bin
./LAUNCH.sh -l

The default installing_deps.sh script targets Debian- and Ubuntu-based distributions.

Requirements

  • Python 3.8+

How to size the hardware requirements for AIL?

Installation notes

Some optional components require additional configuration, including the Lacus crawler, the Meilisearch search indexer, and the translation component. See the HOWTO for detailed setup instructions.

Starting AIL

cd bin
./LAUNCH.sh -l

The web interface is available at:

https://localhost:7000/

The default credentials are stored in the DEFAULT_PASSWORD file, which is removed once the password is changed.

Documentation

Training

Training materials on how to use and extend the AIL framework are available at ail-project/ail-training.

Privacy and GDPR

For information on privacy and GDPR-related considerations, see the document "AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks".

This document provides guidance on using AIL in a lawful context, especially within the scope of the General Data Protection Regulation.

Research using AIL

If you use or reference AIL in academic work, you can cite it as follows:

@inproceedings{mokaddem2018ail,
  title={AIL-The design and implementation of an Analysis Information Leak framework},
  author={Mokaddem, Sami and Wagener, G{\'e}rard and Dulaunoy, Alexandre},
  booktitle={2018 IEEE International Conference on Big Data (Big Data)},
  pages={5049--5057},
  year={2018},
  organization={IEEE}
}

License

Copyright (C) 2014 Jules Debra
Copyright (c) 2021 Olivier Sagit
Copyright (C) 2014-2026 CIRCL - Computer Incident Response Center Luxembourg
Copyright (c) 2014-2024 Raphaël Vinot
Copyright (c) 2014-2026 Alexandre Dulaunoy
Copyright (c) 2016-2024 Sami Mokaddem
Copyright (c) 2018-2026 Thirion Aurélien

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.