Merge branch 'master' into gunicorn

Commit 25fb6ca377 by Alan Anselmo, 2024-02-07 11:44:52 -03:00
313 changed files with 19458 additions and 6237 deletions

.gitignore

@@ -16,6 +16,7 @@ tlsh
 Blooms
 PASTES
 CRAWLED_SCREENSHOT
+IMAGES
 BASE64
 HASHS
 DATA_ARDB

HOWTO.md

@@ -1,143 +1,72 @@
-Feeding, adding new features and contributing
-=============================================
+# Feeding, Adding new features and Contributing
-How to feed the AIL framework
------------------------------
+## [AIL Importers](./doc/README.md#ail-importers)
+Refer to the [AIL Importers Documentation](./doc/README.md#ail-importers)
-For the moment, there are three different ways to feed AIL with data:
-1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
-2. You can setup [pystemon](https://github.com/cvandeplas/pystemon) and use the custom feeder provided by AIL (see below).
-3. You can feed your own data using the [./bin/file_dir_importer.py](./bin/import_dir.py) script.
+## Feeding Data to AIL
-### Feeding AIL with pystemon
 AIL is an analysis tool, not a collector!
 However, if you want to collect some pastes and feed them to AIL, the procedure is described below. Nevertheless, moderate your queries!
-Feed data to AIL:
+1. [AIL Importers](./doc/README.md#ail-importers)
+2. ZMQ: Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
-1. Clone the [pystemon's git repository](https://github.com/cvandeplas/pystemon):
-``` git clone https://github.com/cvandeplas/pystemon.git ```
-2. Edit configuration file for pystemon ```pystemon/pystemon.yaml```:
-* Configuration of storage section (adapt to your needs):
-```
-storage:
-  archive:
-    storage-classname: FileStorage
-    save: yes
-    save-all: yes
-    dir: "alerts"
-    dir-all: "archive"
-    compress: yes
-  redis:
-    storage-classname: RedisStorage
-    save: yes
-    save-all: yes
-    server: "localhost"
-    port: 6379
-    database: 10
-    lookup: no
-```
-* Change configuration for paste-sites according to your needs (don't forget to throttle download time and/or update time).
-3. Install python dependencies inside the virtual environment:
-```
-cd ail-framework/
-. ./AILENV/bin/activate
-cd pystemon/ #cd to pystemon folder
-pip3 install -U -r requirements.txt
-```
-4. Edit configuration file ```ail-framework/configs/core.cfg```:
-* Modify the "pystemonpath" path accordingly
-5. Launch ail-framework, pystemon and pystemon-feeder.py (still inside virtual environment):
-* Option 1 (recommended):
-```
-./ail-framework/bin/LAUNCH.py -l #starts ail-framework
-./ail-framework/bin/LAUNCH.py -f #starts pystemon and the pystemon-feeder.py
-```
-* Option 2 (you may need two terminal windows):
-```
-./ail-framework/bin/LAUNCH.py -l #starts ail-framework
-./pystemon/pystemon.py
-./ail-framework/bin/feeder/pystemon-feeder.py
-```
-How to create a new module
---------------------------
+## How to create a new module
-If you want to add a new processing or analysis module in AIL, follow these simple steps:
+To add a new processing or analysis module to AIL, follow these steps:
-1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to at least one module at minimum (Usually, Redis_Global).
-2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
+1. Add your module name in [./configs/modules.cfg](./configs/modules.cfg) and subscribe to at least one module (usually `Item`).
+2. Use [./bin/modules/modules/TemplateModule.py](./bin/modules/modules/TemplateModule.py) as a sample module and create a new file in bin/modules with the module name used in the `modules.cfg` configuration.
-How to contribute a module
---------------------------
+## Contributions
-Feel free to fork the code, play with it, make some patches or add additional analysis modules.
-To contribute your module, feel free to pull your contribution.
+Contributions are welcome! Fork the repository, experiment with the code, and submit your modules or patches through a pull request.
-Additional information
-======================
-Crawler
----------------------
+## Crawler
-In AIL, you can crawl websites and Tor hidden services. Don't forget to review the proxy configuration of your Tor client and especially if you enabled the SOCKS5 proxy
-[//]: # (and binding on the appropriate IP address reachable via the dockers where Splash runs.)
+AIL supports crawling of websites and Tor hidden services. Ensure your Tor client's proxy configuration is correct, especially the SOCKS5 proxy settings.
 ### Installation
 [Install Lacus](https://github.com/ail-project/lacus)
 ### Configuration
 1. Lacus URL:
-In the webinterface, go to ``Crawlers>Settings`` and click on the Edit button
+In the web interface, go to `Crawlers` > `Settings` and click on the Edit button
-![Splash Manager Config](./doc/screenshots/lacus_config.png?raw=true "AIL Lacus Config")
-![Splash Manager Config](./doc/screenshots/lacus_config_edit.png?raw=true "AIL Lacus Config")
+![AIL Crawler Config](./doc/screenshots/lacus_config.png?raw=true "AIL Lacus Config")
+![AIL Crawler Config Edit](./doc/screenshots/lacus_config_edit.png?raw=true "AIL Lacus Config")
-2. Launch AIL Crawlers:
+2. Number of Crawlers:
 Choose the number of crawlers you want to launch
-![Splash Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures.png?raw=true "AIL Lacus Nb Crawlers Config")
-![Splash Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures_edit.png?raw=true "AIL Lacus Nb Crawlers Config")
+![Crawler Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures.png?raw=true "AIL Lacus Nb Crawlers Config")
+![Crawler Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures_edit.png?raw=true "AIL Lacus Nb Crawlers Config")
-Kvrocks Migration
----------------------
-**Important Note:
-We are currently working on a [migration script](https://github.com/ail-project/ail-framework/blob/master/bin/DB_KVROCKS_MIGRATION.py) to facilitate the migration to Kvrocks.
-Once this script is ready, AIL version 5.0 will be released.**
-Please note that the current version of this migration script only supports migrating the database on the same server.
-(If you plan to migrate to another server, we will provide additional instructions in this section once the migration script is completed)
+## Chats Translation with LibreTranslate
+Chat messages can be translated using [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), an open-source, self-hosted machine translation engine.
+### Installation:
+1. Install LibreTranslate by running the following command:
+```bash
+pip install libretranslate
+```
+2. Run libretranslate:
+```bash
+libretranslate
+```
+### Configuration:
+To enable LibreTranslate for chat translation, edit the LibreTranslate URL in the [./configs/core.cfg](./configs/core.cfg) file under the [Translation] section.
+```
+[Translation]
+libretranslate = http://127.0.0.1:5000
+```
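For reference, a quick way to check that a LibreTranslate instance configured as above answers requests is a direct call to its documented `/translate` endpoint. This is a hedged sketch, not part of the commit; it assumes the default URL from the [Translation] section:

```python
# Smoke test against the LibreTranslate instance configured above (assumption:
# default URL http://127.0.0.1:5000 from the [Translation] section).
import requests

def translate(text, target="en", url="http://127.0.0.1:5000"):
    # LibreTranslate exposes POST /translate; "auto" lets the server
    # detect the source language.
    r = requests.post(f"{url}/translate",
                      json={"q": text, "source": "auto", "target": target})
    r.raise_for_status()
    return r.json()["translatedText"]

if __name__ == "__main__":
    print(translate("bonjour tout le monde"))
```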
-To migrate your database to Kvrocks:
-1. Launch ARDB and Kvrocks
-2. Pull from remote
-```
-git checkout master
-git pull
-```
-3. Launch the migration script:
-```
-git checkout master
-git pull
-cd bin/
-./DB_KVROCKS_MIGRATION.py
-```
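The rewritten HOWTO above points new modules at TemplateModule.py and modules.cfg. As a rough illustration of the shape such a module takes, here is a minimal sketch based only on the AbstractModule usage visible elsewhere in this commit (pending_seconds, compute(), self.logger, run()); the class and accessor names are assumptions, not the actual template:

```python
# Hypothetical minimal AIL module, modeled on the AbstractModule pattern
# seen in this commit (Sync_module, Crawler). Not the actual TemplateModule.
import os
import sys

sys.path.append(os.environ['AIL_BIN'])
from modules.abstract_module import AbstractModule


class MyModule(AbstractModule):
    """Example module: receives messages from its modules.cfg subscription."""

    def __init__(self):
        super(MyModule, self).__init__()
        # Seconds to idle between two processed messages
        self.pending_seconds = 10
        self.logger.info(f'Module {self.module_name} initialized')

    def compute(self, message):
        # Process one message from the queue this module subscribes to
        self.logger.info(f'{self.module_name} received: {message}')


if __name__ == '__main__':
    module = MyModule()
    module.run()
```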

README.md

@@ -1,9 +1,6 @@
-AIL
-===
+# AIL framework
 <p align="center">
-  <img src="https://raw.githubusercontent.com/ail-project/ail-framework/master/var/www/static/image/ail-icon.png" height="250" />
+  <img src="https://raw.githubusercontent.com/ail-project/ail-framework/master/var/www/static/image/ail-icon.png" height="400" />
 </p>
 <table>
   <tr>
@@ -12,7 +9,7 @@ AIL
   </tr>
   <tr>
     <td>CI</td>
-    <td><a href="https://github.com/CIRCL/AIL-framework/actions/workflows/ail_framework_test.yml"><img src="https://github.com/CIRCL/AIL-framework/actions/workflows/ail_framework_test.yml/badge.svg"></a></td>
+    <td><a href="https://github.com/ail-project/ail-framework/actions/workflows/ail_framework_test.yml"><img src="https://github.com/ail-project/ail-framework/actions/workflows/ail_framework_test.yml/badge.svg"></a></td>
   </tr>
   <tr>
     <td>Gitter</td>
@@ -28,59 +25,72 @@ AIL
   </tr>
 </table>
-![Logo](./doc/logo/logo-small.png?raw=true "AIL logo")
 AIL framework - Framework for Analysis of Information Leaks
 AIL is a modular framework to analyse potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).
-![Dashboard](./doc/screenshots/dashboard.png?raw=true "AIL framework dashboard")
+![Dashboard](./doc/screenshots/dashboard0.png?raw=true "AIL framework dashboard")
-![Finding webshells with AIL](./doc/screenshots/webshells.gif?raw=true "Finding websheels with AIL")
+![Finding webshells with AIL](./doc/screenshots/webshells.gif?raw=true "Finding webshells with AIL")
-Features
---------
+## AIL v5.0
+AIL v5.0 introduces significant improvements and new features:
+- **Codebase Rewrite**: The codebase has undergone a substantial rewrite, resulting in enhanced performance and speed improvements.
+- **Database Upgrade**: The database has been migrated from ARDB to Kvrocks.
+- **New Correlation Engine**: AIL v5.0 introduces a new powerful correlation engine with two new correlation types: CVE and Title.
+- **Enhanced Logging**: The logging system has been improved to provide better troubleshooting capabilities.
+- **Tagging Support**: [AIL objects](./doc/README.md#ail-objects) now support tagging, allowing users to categorize and label extracted information for easier analysis and organization.
+- **Trackers**: Improved object filtering; PGP and decoded tracking added.
+- **UI Content Visualization**: The user interface has been upgraded to visualize extracted and tracked information.
+- **New Crawler Lacus**: Improved crawling capabilities.
+- **Modular Importers and Exporters**: New importers (ZMQ, AIL Feeders) and exporters (MISP, Mail, TheHive) with a modular design that allows easy creation and customization by extending an abstract class.
+- **Module Queues**: Improved queuing mechanism between detection modules.
+- **New Objects CVE and Title**: Extract and correlate CVE IDs and web page titles.
-* Modular architecture to handle streams of unstructured or structured information
-* Default support for external ZMQ feeds, such as provided by CIRCL or other providers
-* Multiple feed support
-* Each module can process and reprocess the information already processed by AIL
-* Detecting and extracting URLs including their geographical location (e.g. IP address location)
-* Extracting and validating potential leaks of credit card numbers, credentials, ...
-* Extracting and validating leaked email addresses, including DNS MX validation
-* Module for extracting Tor .onion addresses (to be further processed for analysis)
-* Keep tracks of duplicates (and diffing between each duplicate found)
-* Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
-* A full-text indexer module to index unstructured information
-* Statistics on modules and web
-* Real-time modules manager in terminal
-* Global sentiment analysis for each providers based on nltk vader module
-* Terms, Set of terms and Regex tracking and occurrence
-* Many more modules for extracting phone numbers, credentials and others
-* Alerting to [MISP](https://github.com/MISP/MISP) to share found leaks within a threat intelligence platform using [MISP standard](https://www.misp-project.org/objects.html#_ail_leak)
-* Detect and decode encoded file (Base64, hex encoded or your own decoding scheme) and store files
-* Detect Amazon AWS and Google API keys
-* Detect Bitcoin address and Bitcoin private keys
-* Detect private keys, certificate, keys (including SSH, OpenVPN)
-* Detect IBAN bank accounts
-* Tagging system with [MISP Galaxy](https://github.com/MISP/misp-galaxy) and [MISP Taxonomies](https://github.com/MISP/misp-taxonomies) tags
-* UI paste submission
-* Create events on [MISP](https://github.com/MISP/MISP) and cases on [The Hive](https://github.com/TheHive-Project/TheHive)
-* Automatic paste export at detection on [MISP](https://github.com/MISP/MISP) (events) and [The Hive](https://github.com/TheHive-Project/TheHive) (alerts) on selected tags
-* Extracted and decoded files can be searched by date range, type of file (mime-type) and encoding discovered
-* Graph relationships between decoded file (hashes), similar PGP UIDs and addresses of cryptocurrencies
-* Tor hidden services crawler to crawl and parse output
-* Tor onion availability is monitored to detect up and down of hidden services
-* Browser hidden services are screenshot and integrated in the analysed output including a blurring screenshot interface (to avoid "burning the eyes" of the security analysis with specific content)
-* Tor hidden services is part of the standard framework, all the AIL modules are available to the crawled hidden services
-* Generic web crawler to trigger crawling on demand or at regular interval URL or Tor hidden services
+## Features
+- Modular architecture to handle streams of unstructured or structured information
+- Default support for external ZMQ feeds, such as provided by CIRCL or other providers
+- Multiple Importers and feeds support
+- Each module can process and reprocess the information already analyzed by AIL
+- Detecting and extracting URLs including their geographical location (e.g. IP address location)
+- Extracting and validating potential leaks of credit card numbers, credentials, ...
+- Extracting and validating leaked email addresses, including DNS MX validation
+- Module for extracting Tor .onion addresses for further analysis
+- Keep tracks of credentials duplicates (and diffing between each duplicate found)
+- Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
+- A full-text indexer module to index unstructured information
+- Terms, Set of terms, Regex, typo squatting and YARA tracking and occurrence
+- YARA Retro Hunt
+- Many more modules for extracting phone numbers, credentials, and more
+- Alerting to [MISP](https://github.com/MISP/MISP) to share found leaks within a threat intelligence platform using [MISP standard](https://www.misp-project.org/objects.html#_ail_leak)
+- Detecting and decoding encoded files (Base64, hex encoded or your own decoding scheme) and storing files
+- Detecting Amazon AWS and Google API keys
+- Detecting Bitcoin addresses and Bitcoin private keys
+- Detecting private keys, certificates, keys (including SSH, OpenVPN)
+- Detecting IBAN bank accounts
+- Tagging system with [MISP Galaxy](https://github.com/MISP/misp-galaxy) and [MISP Taxonomies](https://github.com/MISP/misp-taxonomies) tags
+- UI submission
+- Create events on [MISP](https://github.com/MISP/MISP) and cases on [The Hive](https://github.com/TheHive-Project/TheHive)
+- Automatic export on detection with [MISP](https://github.com/MISP/MISP) (events) and [The Hive](https://github.com/TheHive-Project/TheHive) (alerts) on selected tags
+- Extracted and decoded files can be searched by date range, type of file (mime-type) and encoding discovered
+- Correlation engine and graph to visualize relationships between decoded files (hashes), PGP UIDs, domains, usernames, and cryptocurrency addresses
+- Websites, forums and Tor hidden services crawler to crawl and parse output
+- Domain availability monitoring to detect up and down of websites and hidden services
+- Browsed hidden services are automatically captured and integrated into the analyzed output, including a blurring screenshot interface (to avoid "burning the eyes" of security analysts with sensitive content)
+- Tor hidden services are part of the standard framework; all the AIL modules are available for crawled hidden services
+- Crawler scheduler to trigger crawling on demand or at regular intervals for URLs or Tor hidden services
-Installation
-------------
+## Installation
-Type these command lines for a fully automated installation and start AIL framework:
+To install the AIL framework, run the following commands:
 ```bash
 # Clone the repo first
 git clone https://github.com/ail-project/ail-framework.git
@@ -89,10 +99,6 @@ cd ail-framework
 # For Debian and Ubuntu based distributions
 ./installing_deps.sh
-# For Centos based distributions (Tested: Centos 8)
-chmod u+x centos_installing_deps.sh
-./centos_installing_deps.sh
 # Launch ail
 cd ~/ail-framework/
 cd bin/
@@ -101,59 +107,52 @@ cd bin/
 The default [installing_deps.sh](./installing_deps.sh) is for Debian and Ubuntu based distributions.
-There is also a [Travis file](.travis.yml) used for automating the installation that can be used to build and install AIL on other systems.
 Requirement:
-- Python 3.6+
+- Python 3.7+
-Installation Notes
-------------
+## Installation Notes
-In order to use AIL combined with **ZFS** or **unprivileged LXC** it's necessary to disable Direct I/O in `$AIL_HOME/configs/6382.conf` by changing the value of the directive `use_direct_io_for_flush_and_compaction` to `false`.
-Tor installation instructions can be found in the [HOWTO](https://github.com/ail-project/ail-framework/blob/master/HOWTO.md#installationconfiguration)
+For Lacus Crawler installation instructions, refer to the [HOWTO](https://github.com/ail-project/ail-framework/blob/master/HOWTO.md#crawler)
-Starting AIL
---------------------------
+## Starting AIL
+To start AIL, use the following commands:
 ```bash
 cd bin/
 ./LAUNCH.sh -l
 ```
-Eventually you can browse the status of the AIL framework website at the following URL:
+You can access the AIL framework web interface at the following URL:
 ```
 https://localhost:7000/
 ```
-The default credentials for the web interface are located in ``DEFAULT_PASSWORD``. This file is removed when you change your password.
+The default credentials for the web interface are located in the ``DEFAULT_PASSWORD`` file, which is deleted when you change your password.
-Training
---------
+## Training
-CIRCL organises training on how to use or extend the AIL framework. AIL training materials are available at [https://www.circl.lu/services/ail-training-materials/](https://www.circl.lu/services/ail-training-materials/).
+CIRCL organises training on how to use or extend the AIL framework. AIL training materials are available at [https://github.com/ail-project/ail-training](https://github.com/ail-project/ail-training).
-API
------
+## API
-The API documentation is available in [doc/README.md](doc/README.md)
+The API documentation is available in [doc/api.md](doc/api.md)
-HOWTO
------
+## HOWTO
 HOWTO are available in [HOWTO.md](HOWTO.md)
-Privacy and GDPR
-----------------
+## Privacy and GDPR
-[AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks](https://www.circl.lu/assets/files/information-leaks-analysis-and-gdpr.pdf) document provides an overview how to use AIL in a lawfulness context especially in the scope of General Data Protection Regulation.
+For information on AIL's compliance with GDPR and privacy considerations, refer to the [AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks](https://www.circl.lu/assets/files/information-leaks-analysis-and-gdpr.pdf) document. This document provides an overview of how to use AIL in a lawful context, especially in the scope of the General Data Protection Regulation.
-Research using AIL
-------------------
+## Research using AIL
-If you write academic paper, relying or using AIL, it can be cited with the following BibTeX:
+If you use or reference AIL in an academic paper, you can cite it using the following BibTeX:
 ~~~~
 @inproceedings{mokaddem2018ail,
@@ -166,75 +165,64 @@ If you write academic paper, relying or using AIL, it can be cited with the foll
 }
 ~~~~
-Screenshots
-===========
+## Screenshots
-Tor hidden service crawler
---------------------------
-![Tor hidden service](./doc/screenshots/ail-bitcoinmixer.png?raw=true "Tor hidden service crawler")
+### Websites, Forums and Tor Hidden-Services
+![Domain CIRCL](./doc/screenshots/domain_circl.png?raw=true "Tor hidden service crawler")
+#### Login protected, pre-recorded session cookies:
+![Domain cookiejar](./doc/screenshots/crawler-cookiejar-domain-crawled.png?raw=true "Tor hidden service crawler")
-Trending charts
----------------
-![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending")
-Extracted encoded files from pastes
------------------------------------
-![Extracted files from pastes](./doc/screenshots/ail-hashedfiles.png?raw=true "AIL extracted decoded files statistics")
-![Relationships between extracted files from encoded file in unstructured data](./doc/screenshots/hashedfile-graph.png?raw=true "Relationships between extracted files from encoded file in unstructured data")
+### Extracted encoded files from items
+![Extracted files](./doc/screenshots/decodeds_dashboard.png?raw=true "AIL extracted decoded files statistics")
+### Correlation Engine
+![Correlation decoded image](./doc/screenshots/correlation_decoded_image.png?raw=true "Correlation decoded image")
-Browsing
---------
-![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes")
+### Investigation
+![Investigation](./doc/screenshots/investigation_mixer.png?raw=true "AIL framework cookiejar")
-Tagging system
---------
-![Tags](./doc/screenshots/tags.png?raw=true "AIL framework tags")
+### Tagging system
+![Tags](./doc/screenshots/tags_search.png?raw=true "AIL framework tags")
+![Tags search](./doc/screenshots/tags_search_items.png?raw=true "AIL framework tags items search")
-MISP and The Hive, automatic events and alerts creation
---------
-![paste_submit](./doc/screenshots/tag_auto_export.png?raw=true "AIL framework MISP and Hive auto export")
+### MISP Export
+![misp_export](./doc/screenshots/misp_export.png?raw=true "AIL framework MISP Export")
+### MISP and The Hive, automatic events and alerts creation
+![tags_misp_auto](./doc/screenshots/tags_misp_auto.png?raw=true "AIL framework MISP and Hive auto export")
-Paste submission
---------
-![paste_submit](./doc/screenshots/paste_submit.png?raw=true "AIL framework paste submission")
+### UI submission
+![ui_submit](./doc/screenshots/ui_submit.png?raw=true "AIL framework UI importer")
-Sentiment analysis
-------------------
-![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis")
-Terms tracker
----------------------------
-![Term-tracker](./doc/screenshots/term-tracker.png?raw=true "AIL framework termManager")
+### Trackers
+![tracker-create](./doc/screenshots/tracker_create.png?raw=true "AIL framework create tracker")
+![tracker-yara](./doc/screenshots/tracker_yara.png?raw=true "AIL framework Yara tracker")
+![retro-hunt](./doc/screenshots/retro_hunt.png?raw=true "AIL framework Retro Hunt")
-[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
-Command line module manager
----------------------------
-![Module-Manager](./doc/screenshots/module_information.png?raw=true "AIL framework ModuleInformationV2.py")
-License
-=======
+## License
 ```
 Copyright (C) 2014 Jules Debra
-Copyright (C) 2014-2021 CIRCL - Computer Incident Response Center Luxembourg (c/o smile, security made in Lëtzebuerg, Groupement d'Intérêt Economique)
-Copyright (c) 2014-2021 Raphaël Vinot
-Copyright (c) 2014-2021 Alexandre Dulaunoy
-Copyright (c) 2016-2021 Sami Mokaddem
-Copyright (c) 2018-2021 Thirion Aurélien
 Copyright (c) 2021 Olivier Sagit
+Copyright (C) 2014-2023 CIRCL - Computer Incident Response Center Luxembourg (c/o smile, security made in Lëtzebuerg, Groupement d'Intérêt Economique)
+Copyright (c) 2014-2023 Raphaël Vinot
+Copyright (c) 2014-2023 Alexandre Dulaunoy
+Copyright (c) 2016-2023 Sami Mokaddem
+Copyright (c) 2018-2023 Thirion Aurélien
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU Affero General Public License as published by

bin/LAUNCH.sh

@@ -20,26 +20,28 @@ if [ -e "${DIR}/AILENV/bin/python" ]; then
     export AIL_VENV=${AIL_HOME}/AILENV/
     . ./AILENV/bin/activate
 else
-    echo "Please make sure you have a AIL-framework environment, au revoir"
+    echo "Please make sure AILENV is installed"
     exit 1
 fi
 export PATH=$AIL_VENV/bin:$PATH
 export PATH=$AIL_HOME:$PATH
 export PATH=$AIL_REDIS:$PATH
-export PATH=$AIL_ARDB:$PATH
+export PATH=$AIL_KVROCKS:$PATH
 export PATH=$AIL_BIN:$PATH
 export PATH=$AIL_FLASK:$PATH
-isredis=`screen -ls | egrep '[0-9]+.Redis_AIL' | cut -d. -f1`
-isardb=`screen -ls | egrep '[0-9]+.ARDB_AIL' | cut -d. -f1`
-iskvrocks=`screen -ls | egrep '[0-9]+.KVROCKS_AIL' | cut -d. -f1`
-islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
-is_ail_core=`screen -ls | egrep '[0-9]+.Core_AIL' | cut -d. -f1`
-is_ail_2_ail=`screen -ls | egrep '[0-9]+.AIL_2_AIL' | cut -d. -f1`
-isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
-isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
-isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
+function check_screens {
+    isredis=`screen -ls | egrep '[0-9]+.Redis_AIL' | cut -d. -f1`
+    isardb=`screen -ls | egrep '[0-9]+.ARDB_AIL' | cut -d. -f1`
+    iskvrocks=`screen -ls | egrep '[0-9]+.KVROCKS_AIL' | cut -d. -f1`
+    islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
+    is_ail_core=`screen -ls | egrep '[0-9]+.Core_AIL' | cut -d. -f1`
+    is_ail_2_ail=`screen -ls | egrep '[0-9]+.AIL_2_AIL' | cut -d. -f1`
+    isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
+    isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
+    isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
+}
 function helptext {
     echo -e $YELLOW"
@@ -59,7 +61,6 @@ function helptext {
     - All the queuing modules.
     - All the processing modules.
     - All Redis in memory servers.
-    - All ARDB on disk servers.
     - All KVROCKS servers.
     "$DEFAULT"
     (Inside screen Daemons)
@@ -69,6 +70,7 @@ function helptext {
     LAUNCH.sh
         [-l  | --launchAuto]          LAUNCH DB + Scripts
        [-k  | --killAll]             Kill DB + Scripts
+        [-r  | --restart]             Restart
        [-ks | --killscript]          Scripts
        [-u  | --update]              Update AIL
        [-ut | --thirdpartyUpdate]    Update UI/Frontend
@@ -265,14 +267,17 @@ function launching_scripts {
     sleep 0.1
     screen -S "Script_AIL" -X screen -t "SQLInjectionDetection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./SQLInjectionDetection.py; read x"
     sleep 0.1
-    screen -S "Script_AIL" -X screen -t "LibInjection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./LibInjection.py; read x"
-    sleep 0.1
-    screen -S "Script_AIL" -X screen -t "Zerobins" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Zerobins.py; read x"
-    sleep 0.1
+    # screen -S "Script_AIL" -X screen -t "LibInjection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./LibInjection.py; read x"
+    # sleep 0.1
+    # screen -S "Script_AIL" -X screen -t "Pasties" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Pasties.py; read x"
+    # sleep 0.1
     screen -S "Script_AIL" -X screen -t "MISP_Thehive_Auto_Push" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./MISP_Thehive_Auto_Push.py; read x"
     sleep 0.1
+    screen -S "Script_AIL" -X screen -t "Exif" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Exif.py; read x"
+    sleep 0.1
     ##################################
     #         TRACKERS MODULES       #
     ##################################
@@ -607,7 +612,7 @@ function launch_all {
 function menu_display {
-    options=("Redis" "Ardb" "Kvrocks" "Logs" "Scripts" "Flask" "Killall" "Update" "Update-config" "Update-thirdparty")
+    options=("Redis" "Kvrocks" "Logs" "Scripts" "Flask" "Killall" "Update" "Update-config" "Update-thirdparty")
     menu() {
         echo "What do you want to Launch?:"
@@ -635,9 +640,6 @@ function menu_display {
             Redis)
                 launch_redis;
                 ;;
-            Ardb)
-                launch_ardb;
-                ;;
             Kvrocks)
                 launch_kvrocks;
                 ;;
@ -679,31 +681,38 @@ function menu_display {
} }
#echo "$@" #echo "$@"
check_screens;
while [ "$1" != "" ]; do while [ "$1" != "" ]; do
case $1 in case $1 in
-l | --launchAuto ) launch_all "automatic"; -l | --launchAuto ) check_screens;
launch_all "automatic";
;; ;;
-lr | --launchRedis ) launch_redis; -lr | --launchRedis ) check_screens;
launch_redis;
;; ;;
-la | --launchARDB ) launch_ardb; -la | --launchARDB ) launch_ardb;
;; ;;
-lk | --launchKVROCKS ) launch_kvrocks; -lk | --launchKVROCKS ) check_screens;
launch_kvrocks;
;; ;;
-lrv | --launchRedisVerify ) launch_redis; -lrv | --launchRedisVerify ) launch_redis;
wait_until_redis_is_ready; wait_until_redis_is_ready;
;; ;;
-lav | --launchARDBVerify ) launch_ardb;
wait_until_ardb_is_ready;
;;
-lkv | --launchKVORCKSVerify ) launch_kvrocks; -lkv | --launchKVORCKSVerify ) launch_kvrocks;
wait_until_kvrocks_is_ready; wait_until_kvrocks_is_ready;
;; ;;
--set_kvrocks_namespaces ) set_kvrocks_namespaces; --set_kvrocks_namespaces ) set_kvrocks_namespaces;
;; ;;
-k | --killAll ) killall; -k | --killAll ) check_screens;
killall;
;; ;;
-ks | --killscript ) killscript; -r | --restart ) killall;
sleep 0.1;
check_screens;
launch_all "automatic";
;;
-ks | --killscript ) check_screens;
killscript;
;; ;;
-m | --menu ) menu_display; -m | --menu ) menu_display;
;; ;;

bin/modules/D4Client.py

@@ -34,16 +34,20 @@ class D4Client(AbstractModule):
         self.d4_client = d4.create_d4_client()
         self.last_refresh = time.time()
+        self.last_config_check = time.time()
         # Send module state to logs
         self.logger.info(f'Module {self.module_name} initialized')
     def compute(self, dns_record):
         # Refresh D4 Client
-        if self.last_refresh < d4.get_config_last_update_time():
-            self.d4_client = d4.create_d4_client()
-            self.last_refresh = time.time()
-            print('D4 Client: config updated')
+        if self.last_config_check < int(time.time()) - 30:
+            print('refresh rrrr')
+            if self.last_refresh < d4.get_config_last_update_time():
+                self.d4_client = d4.create_d4_client()
+                self.last_refresh = time.time()
+                print('D4 Client: config updated')
+            self.last_config_check = time.time()
         if self.d4_client:
             # Send DNS Record to D4Server
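The change above rate-limits the config lookup to once every 30 seconds instead of running it on every record. Isolated as a generic sketch (hypothetical helper, not AIL code; the callables stand in for d4.get_config_last_update_time and d4.create_d4_client):

```python
import time

REFRESH_INTERVAL = 30  # seconds, matching the hardcoded value above


class ConfigRefresher:
    # Generic restatement of the pattern: poll for config changes at most
    # once per interval, no matter how often the hot path runs.
    def __init__(self, get_last_update_time, rebuild_client):
        self.get_last_update_time = get_last_update_time
        self.rebuild_client = rebuild_client
        self.client = rebuild_client()
        self.last_refresh = time.time()
        self.last_config_check = time.time()

    def tick(self):
        # Called on every processed record; only touches the config store
        # when the check interval has elapsed.
        if self.last_config_check < int(time.time()) - REFRESH_INTERVAL:
            if self.last_refresh < self.get_last_update_time():
                self.client = self.rebuild_client()
                self.last_refresh = time.time()
            self.last_config_check = time.time()
```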

bin/core/Sync_importer.py

@@ -23,7 +23,7 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from core import ail_2_ail
 from modules.abstract_module import AbstractModule
-# from lib.ConfigLoader import ConfigLoader
+from lib.objects.Items import Item
 #### CONFIG ####
 # config_loader = ConfigLoader()
@@ -76,10 +76,11 @@ class Sync_importer(AbstractModule):
         # # TODO: create default id
         item_id = ail_stream['meta']['ail:id']
+        item = Item(item_id)
-        message = f'sync {item_id} {b64_gzip_content}'
+        message = f'sync {b64_gzip_content}'
-        print(item_id)
+        print(item.id)
-        self.add_message_to_queue(message, 'Importers')
+        self.add_message_to_queue(obj=item, message=message, queue='Importers')
 if __name__ == '__main__':

bin/core/Sync_module.py

@@ -15,13 +15,16 @@ This module .
 import os
 import sys
 import time
+import traceback
 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
 from core import ail_2_ail
-from lib.objects.Items import Item
+from lib.ail_queues import get_processed_end_obj, timeout_processed_objs, get_last_queue_timeout
+from lib.exceptions import ModuleQueueError
+from lib.objects import ail_objects
 from modules.abstract_module import AbstractModule
@@ -30,14 +33,15 @@ class Sync_module(AbstractModule):
     Sync_module module for AIL framework
     """
-    def __init__(self):
-        super(Sync_module, self).__init__()
+    def __init__(self, queue=False):  # FIXME MODIFY/ADD QUEUE
+        super(Sync_module, self).__init__(queue=queue)
         # Waiting time in seconds between to message processed
         self.pending_seconds = 10
         self.dict_sync_queues = ail_2_ail.get_all_sync_queue_dict()
         self.last_refresh = time.time()
+        self.last_refresh_queues = time.time()
         print(self.dict_sync_queues)
@@ -53,17 +57,7 @@ class Sync_module(AbstractModule):
             print('sync queues refreshed')
             print(self.dict_sync_queues)
-        # Extract object from message
-        # # TODO: USE JSON DICT ????
-        mess_split = message.split(';')
-        if len(mess_split) == 3:
-            obj_type = mess_split[0]
-            obj_subtype = mess_split[1]
-            obj_id = mess_split[2]
-            # OBJECT => Item
-            # if obj_type == 'item':
-            obj = Item(obj_id)
+        obj = ail_objects.get_obj_from_global_id(message)
         tags = obj.get_tags()
@@ -77,16 +71,56 @@ class Sync_module(AbstractModule):
                 obj_dict = obj.get_default_meta()
                 # send to queue push and/or pull
                 for dict_ail in self.dict_sync_queues[queue_uuid]['ail_instances']:
-                    print(f'ail_uuid: {dict_ail["ail_uuid"]} obj: {message}')
+                    print(f'ail_uuid: {dict_ail["ail_uuid"]} obj: {obj.type}:{obj.get_subtype(r_str=True)}:{obj.id}')
                     ail_2_ail.add_object_to_sync_queue(queue_uuid, dict_ail['ail_uuid'], obj_dict,
                                                        push=dict_ail['push'], pull=dict_ail['pull'])
+    def run(self):
+        """
+        Run Module endless process
+        """
+        # Endless loop processing messages from the input queue
+        while self.proceed:
+            # Timeout queues
+            # timeout_processed_objs()
+            if self.last_refresh_queues < time.time():
+                timeout_processed_objs()
+                self.last_refresh_queues = time.time() + 120
+                self.redis_logger.debug('Timeout queues')
+                # print('Timeout queues')
+            # Get one message (paste) from the QueueIn (copy of Redis_Global publish)
+            global_id = get_processed_end_obj()
+            if global_id:
+                try:
+                    # Module processing with the message from the queue
+                    self.compute(global_id)
+                except Exception as err:
+                    if self.debug:
+                        self.queue.error()
+                        raise err
+                    # LOG ERROR
+                    trace = traceback.format_tb(err.__traceback__)
+                    trace = ''.join(trace)
+                    self.logger.critical(f"Error in module {self.module_name}: {__name__} : {err}")
+                    self.logger.critical(f"Module {self.module_name} input message: {global_id}")
+                    self.logger.critical(trace)
+                    if isinstance(err, ModuleQueueError):
+                        self.queue.error()
+                        raise err
-        else:
-            # Malformed message
-            raise Exception(f'too many values to unpack (expected 3) given {len(mess_split)} with message {message}')
+            else:
+                self.computeNone()
+                # Wait before next process
+                self.logger.debug(f"{self.module_name}, waiting for new message, Idling {self.pending_seconds}s")
+                time.sleep(self.pending_seconds)
 if __name__ == '__main__':
-    module = Sync_module()
+    module = Sync_module(queue=False)  # FIXME MODIFY/ADD QUEUE
     module.run()

bin/core/ail_2_ail.py

@@ -11,7 +11,7 @@ import uuid
 import subprocess
-from flask import escape
+from markupsafe import escape
 sys.path.append(os.environ['AIL_BIN'])
 ##################################
@@ -141,7 +141,10 @@ def is_server_client_sync_mode_connected(ail_uuid, sync_mode):
     return res == 1
 def is_server_client_connected(ail_uuid):
-    return r_cache.sismember('ail_2_ail:server:all_clients', ail_uuid)
+    try:
+        return r_cache.sismember('ail_2_ail:server:all_clients', ail_uuid)
+    except:
+        return False
 def clear_server_connected_clients():
     for ail_uuid in get_server_all_connected_clients():
@@ -398,7 +401,10 @@ def get_all_ail_instance_keys():
     return r_serv_sync.smembers(f'ail:instance:key:all')
 def is_allowed_ail_instance_key(key):
-    return r_serv_sync.sismember(f'ail:instance:key:all', key)
+    try:
+        return r_serv_sync.sismember(f'ail:instance:key:all', key)
+    except:
+        return False
 def get_ail_instance_key(ail_uuid):
     return r_serv_sync.hget(f'ail:instance:{ail_uuid}', 'api_key')
@@ -427,7 +433,10 @@ def get_ail_instance_all_sync_queue(ail_uuid):
     return r_serv_sync.smembers(f'ail:instance:sync_queue:{ail_uuid}')
 def is_ail_instance_queue(ail_uuid, queue_uuid):
-    return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
+    try:
+        return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
+    except:
+        return False
 def exists_ail_instance(ail_uuid):
     return r_serv_sync.exists(f'ail:instance:{ail_uuid}')
@@ -439,7 +448,10 @@ def get_ail_instance_description(ail_uuid):
     return r_serv_sync.hget(f'ail:instance:{ail_uuid}', 'description')
 def exists_ail_instance(ail_uuid):
-    return r_serv_sync.sismember('ail:instance:all', ail_uuid)
+    try:
+        return r_serv_sync.sismember('ail:instance:all', ail_uuid)
+    except:
+        return False
 def is_ail_instance_push_enabled(ail_uuid):
     res = r_serv_sync.hget(f'ail:instance:{ail_uuid}', 'push')
@@ -935,7 +947,10 @@ def get_all_sync_queue_dict():
     return dict_sync_queues
 def is_queue_registred_by_ail_instance(queue_uuid, ail_uuid):
-    return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
+    try:
+        return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
+    except:
+        return False
 def register_ail_to_sync_queue(ail_uuid, queue_uuid):
     is_linked = is_ail_instance_linked_to_sync_queue(ail_uuid)
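The same try/except wrapper is applied to every sismember membership check above, so that a Redis/Kvrocks error surfaces as False rather than an exception. A hedged note: factoring it into a helper (hypothetical, not in the commit) would express the pattern once:

```python
# Hypothetical refactor of the repeated pattern above; not part of the commit.
def safe_sismember(r_server, key, value):
    # Return False instead of raising when the backend errors out
    # (e.g. wrong type on a migrated key or a connection failure).
    try:
        return r_server.sismember(key, value)
    except Exception:
        return False

# Usage, mirroring is_allowed_ail_instance_key:
#   return safe_sismember(r_serv_sync, 'ail:instance:key:all', key)
```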

bin/crawlers/Crawler.py

@@ -6,6 +6,7 @@ import logging.config
 import sys
 import time
+from pyail import PyAIL
 from requests.exceptions import ConnectionError
 sys.path.append(os.environ['AIL_BIN'])
@@ -16,9 +17,13 @@ from modules.abstract_module import AbstractModule
 from lib import ail_logger
 from lib import crawlers
 from lib.ConfigLoader import ConfigLoader
+from lib.objects import CookiesNames
+from lib.objects import Etags
 from lib.objects.Domains import Domain
 from lib.objects.Items import Item
 from lib.objects import Screenshots
+from lib.objects import Titles
+from trackers.Tracker_Yara import Tracker_Yara
 logging.config.dictConfig(ail_logger.get_config(name='crawlers'))
@@ -32,12 +37,23 @@ class Crawler(AbstractModule):
         # Waiting time in seconds between to message processed
         self.pending_seconds = 1
+        self.tracker_yara = Tracker_Yara(queue=False)
         config_loader = ConfigLoader()
         self.default_har = config_loader.get_config_boolean('Crawler', 'default_har')
         self.default_screenshot = config_loader.get_config_boolean('Crawler', 'default_screenshot')
         self.default_depth_limit = config_loader.get_config_int('Crawler', 'default_depth_limit')
+        ail_url_to_push_discovery = config_loader.get_config_str('Crawler', 'ail_url_to_push_onion_discovery')
+        ail_key_to_push_discovery = config_loader.get_config_str('Crawler', 'ail_key_to_push_onion_discovery')
+        if ail_url_to_push_discovery and ail_key_to_push_discovery:
+            ail = PyAIL(ail_url_to_push_discovery, ail_key_to_push_discovery, ssl=False)
+            if ail.ping_ail():
+                self.ail_to_push_discovery = ail
+        else:
+            self.ail_to_push_discovery = None
         # TODO: LIMIT MAX NUMBERS OF CRAWLED PAGES
         # update hardcoded blacklist
@@ -55,12 +71,15 @@ class Crawler(AbstractModule):
         self.har = None
         self.screenshot = None
         self.root_item = None
-        self.har_dir = None
+        self.date = None
         self.items_dir = None
+        self.original_domain = None
         self.domain = None
         # TODO Replace with warning list ???
-        self.placeholder_screenshots = {'27e14ace10b0f96acd2bd919aaa98a964597532c35b6409dff6cc8eec8214748'}
+        self.placeholder_screenshots = {'07244254f73e822bd4a95d916d8b27f2246b02c428adc29082d09550c6ed6e1a',  # blank
+                                        '27e14ace10b0f96acd2bd919aaa98a964597532c35b6409dff6cc8eec8214748',  # not found
+                                        '3e66bf4cc250a68c10f8a30643d73e50e68bf1d4a38d4adc5bfc4659ca2974c0'}  # 404
         # Send module state to logs
         self.logger.info('Crawler initialized')
@@ -102,7 +121,9 @@ class Crawler(AbstractModule):
         if crawlers.get_nb_crawler_captures() < crawlers.get_crawler_max_captures():
             task_row = crawlers.add_task_to_lacus_queue()
             if task_row:
-                task_uuid, priority = task_row
+                task, priority = task_row
+                task.start()
+                task_uuid = task.uuid
                 try:
                     self.enqueue_capture(task_uuid, priority)
                 except ConnectionError:
@@ -117,15 +138,30 @@ class Crawler(AbstractModule):
         if capture:
             try:
                 status = self.lacus.get_capture_status(capture.uuid)
-                if status != crawlers.CaptureStatus.DONE:  # TODO ADD GLOBAL TIMEOUT -> Save start time ### print start time
+                if status == crawlers.CaptureStatus.DONE:
+                    return capture
+                elif status == crawlers.CaptureStatus.UNKNOWN:
+                    capture_start = capture.get_start_time(r_str=False)
+                    if capture_start == 0:
+                        task = capture.get_task()
+                        task.delete()
+                        capture.delete()
+                        self.logger.warning(f'capture UNKNOWN ERROR STATE, {task.uuid} Removed from queue')
+                        return None
+                    if int(time.time()) - capture_start > 600:  # TODO ADD in new crawler config
+                        task = capture.get_task()
+                        task.reset()
+                        capture.delete()
+                        self.logger.warning(f'capture UNKNOWN Timeout, {task.uuid} Send back in queue')
+                    else:
+                        capture.update(status)
+                else:
                     capture.update(status)
                     print(capture.uuid, crawlers.CaptureStatus(status).name, int(time.time()))
-                else:
-                    return capture
             except ConnectionError:
                 print(capture.uuid)
-                capture.update(self, -1)
+                capture.update(-1)
                 self.refresh_lacus_status()
         time.sleep(self.pending_seconds)
@@ -166,6 +202,24 @@ class Crawler(AbstractModule):
         crawlers.create_capture(capture_uuid, task_uuid)
         print(task.uuid, capture_uuid, 'launched')
+        if self.ail_to_push_discovery:
+            if task.get_depth() == 1 and priority < 10 and task.get_domain().endswith('.onion'):
+                har = task.get_har()
+                screenshot = task.get_screenshot()
+                # parent_id = task.get_parent()
+                # if parent_id != 'manual' and parent_id != 'auto':
+                #     parent = parent_id[19:-36]
+                # else:
+                #     parent = 'AIL_capture'
+                if not url:
+                    raise Exception(f'Error: url is None, {task.uuid}, {capture_uuid}, {url}')
+                self.ail_to_push_discovery.add_crawler_capture(task_uuid, capture_uuid, url, har=har,  # parent=parent,
+                                                               screenshot=screenshot, depth_limit=1, proxy='force_tor')
+                print(task.uuid, capture_uuid, url, 'Added to ail_to_push_discovery')
         return capture_uuid
     # CRAWL DOMAIN
@@ -175,34 +229,52 @@ class Crawler(AbstractModule):
         task = capture.get_task()
         domain = task.get_domain()
         print(domain)
+        if not domain:
+            if self.debug:
+                raise Exception(f'Error: domain {domain} - task {task.uuid} - capture {capture.uuid}')
+            else:
+                self.logger.critical(f'Error: domain {domain} - task {task.uuid} - capture {capture.uuid}')
+                print(f'Error: domain {domain}')
+                return None
         self.domain = Domain(domain)
+        self.original_domain = Domain(domain)
         epoch = int(time.time())
         parent_id = task.get_parent()
         entries = self.lacus.get_capture(capture.uuid)
-        print(entries['status'])
+        print(entries.get('status'))
         self.har = task.get_har()
         self.screenshot = task.get_screenshot()
         # DEBUG
         # self.har = True
         # self.screenshot = True
-        str_date = crawlers.get_current_date(separator=True)
-        self.har_dir = crawlers.get_date_har_dir(str_date)
-        self.items_dir = crawlers.get_date_crawled_items_source(str_date)
+        self.date = crawlers.get_current_date(separator=True)
+        self.items_dir = crawlers.get_date_crawled_items_source(self.date)
         self.root_item = None
         # Save Capture
         self.save_capture_response(parent_id, entries)
-        self.domain.update_daterange(str_date.replace('/', ''))
+        self.domain.update_daterange(self.date.replace('/', ''))
-        # Origin + History
+        # Origin + History + tags
         if self.root_item:
             self.domain.set_last_origin(parent_id)
+            # Tags
+            for tag in task.get_tags():
+                self.domain.add_tag(tag)
         self.domain.add_history(epoch, root_item=self.root_item)
-        elif self.domain.was_up():
-            self.domain.add_history(epoch, root_item=epoch)
+        if self.domain != self.original_domain:
+            self.original_domain.update_daterange(self.date.replace('/', ''))
+            if self.root_item:
+                self.original_domain.set_last_origin(parent_id)
+                # Tags
+                for tag in task.get_tags():
+                    self.domain.add_tag(tag)
+            self.original_domain.add_history(epoch, root_item=self.root_item)
+            crawlers.update_last_crawled_domain(self.original_domain.get_domain_type(), self.original_domain.id, epoch)
         crawlers.update_last_crawled_domain(self.domain.get_domain_type(), self.domain.id, epoch)
         print('capture:', capture.uuid, 'completed')
@@ -215,12 +287,12 @@ class Crawler(AbstractModule):
         if 'error' in entries:
             # TODO IMPROVE ERROR MESSAGE
             self.logger.warning(str(entries['error']))
-            print(entries['error'])
+            print(entries.get('error'))
         if entries.get('html'):
             print('retrieved content')
             # print(entries.get('html'))
-        if 'last_redirected_url' in entries and entries['last_redirected_url']:
+        if 'last_redirected_url' in entries and entries.get('last_redirected_url'):
             last_url = entries['last_redirected_url']
             unpacked_last_url = crawlers.unpack_url(last_url)
             current_domain = unpacked_last_url['domain']
@@ -235,32 +307,45 @@ class Crawler(AbstractModule):
         else:
             last_url = f'http://{self.domain.id}'
-        if 'html' in entries and entries['html']:
+        if 'html' in entries and entries.get('html'):
             item_id = crawlers.create_item_id(self.items_dir, self.domain.id)
-            print(item_id)
-            gzip64encoded = crawlers.get_gzipped_b64_item(item_id, entries['html'])
+            item = Item(item_id)
+            print(item.id)
+            gzip64encoded = crawlers.get_gzipped_b64_item(item.id, entries['html'])
             # send item to Global
-            relay_message = f'crawler {item_id} {gzip64encoded}'
-            self.add_message_to_queue(relay_message, 'Importers')
+            relay_message = f'crawler {gzip64encoded}'
+            self.add_message_to_queue(obj=item, message=relay_message, queue='Importers')
-            # Tag
-            msg = f'infoleak:submission="crawler";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            # Tag  # TODO replace me with metadata to tags
+            msg = f'infoleak:submission="crawler"'  # TODO FIXME
+            self.add_message_to_queue(obj=item, message=msg, queue='Tags')
+            # TODO replace me with metadata to add
             crawlers.create_item_metadata(item_id, last_url, parent_id)
             if self.root_item is None:
                 self.root_item = item_id
             parent_id = item_id
+            title_content = crawlers.extract_title_from_html(entries['html'])
+            if title_content:
+                title = Titles.create_title(title_content)
+                title.add(item.get_date(), item)
+                # Tracker
+                self.tracker_yara.compute_manual(title)
+                if not title.is_tags_safe():
+                    unsafe_tag = 'dark-web:topic="pornography-child-exploitation"'
+                    self.domain.add_tag(unsafe_tag)
+                    item.add_tag(unsafe_tag)
         # SCREENSHOT
         if self.screenshot:
-            if 'png' in entries and entries['png']:
+            if 'png' in entries and entries.get('png'):
                 screenshot = Screenshots.create_screenshot(entries['png'], b64=False)
                 if screenshot:
                     if not screenshot.is_tags_safe():
                         unsafe_tag = 'dark-web:topic="pornography-child-exploitation"'
                         self.domain.add_tag(unsafe_tag)
-                        item = Item(item_id)
                         item.add_tag(unsafe_tag)
                     # Remove Placeholder pages # TODO Replace with warning list ???
                     if screenshot.id not in self.placeholder_screenshots:
@@ -269,8 +354,19 @@ class Crawler(AbstractModule):
                         screenshot.add_correlation('domain', '', self.domain.id)
         # HAR
         if self.har:
-            if 'har' in entries and entries['har']:
-                crawlers.save_har(self.har_dir, item_id, entries['har'])
+            if 'har' in entries and entries.get('har'):
+                har_id = crawlers.create_har_id(self.date, item_id)
+                crawlers.save_har(har_id, entries['har'])
+                for cookie_name in crawlers.extract_cookies_names_from_har(entries['har']):
+                    print(cookie_name)
+                    cookie = CookiesNames.create(cookie_name)
+                    cookie.add(self.date.replace('/', ''), self.domain)
+                for etag_content in crawlers.extract_etag_from_har(entries['har']):
+                    print(etag_content)
+                    etag = Etags.create(etag_content)
+                    etag.add(self.date.replace('/', ''), self.domain)
+                crawlers.extract_hhhash(entries['har'], self.domain.id, self.date.replace('/', ''))
         # Next Children
         entries_children = entries.get('children')
         if entries_children:
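The new HAR post-processing above pulls cookie names and ETags out of each capture. As a rough illustration of what such extraction involves, here is a sketch against the standard HAR 1.2 layout; the commit's real helpers live in lib/crawlers and are not shown in this diff, so their exact behavior may differ:

```python
# Hypothetical sketch of HAR cookie/ETag extraction, based on the standard
# HAR 1.2 structure (log.entries[].response); the commit's actual helpers
# (crawlers.extract_cookies_names_from_har, crawlers.extract_etag_from_har)
# may differ.
def extract_cookies_names_from_har(har):
    cookies = set()
    for entry in har.get('log', {}).get('entries', []):
        for cookie in entry.get('response', {}).get('cookies', []):
            name = cookie.get('name')
            if name:
                cookies.add(name)
    return cookies


def extract_etag_from_har(har):
    etags = set()
    for entry in har.get('log', {}).get('entries', []):
        for header in entry.get('response', {}).get('headers', []):
            if header.get('name', '').lower() == 'etag':
                etag = header.get('value')
                if etag:
                    etags.add(etag)
    return etags
```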

bin/exporter/MISPExporter.py

@@ -319,10 +319,6 @@ class MISPExporterAutoDaily(MISPExporter):
     def __init__(self, url='', key='', ssl=False):
         super().__init__(url=url, key=key, ssl=ssl)
-        # create event if don't exists
-        try:
-            self.event_id = self.get_daily_event_id()
-        except MISPConnectionError:
-            self.event_id = - 1
+        self.event_id = - 1
         self.date = datetime.date.today()
@@ -345,6 +341,7 @@ class MISPExporterAutoDaily(MISPExporter):
             self.add_event_object(self.event_id, obj)
         except MISPConnectionError:
+            self.event_id = - 1
             return -1

bin/exporter/MailExporter.py

@@ -8,9 +8,12 @@ Import Content
 """
 import os
+import logging
+import logging.config
 import sys
 from abc import ABC
+from ssl import create_default_context

 import smtplib
 from email.mime.multipart import MIMEMultipart
@@ -22,17 +25,22 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
+from lib import ail_logger
 from exporter.abstract_exporter import AbstractExporter
 from lib.ConfigLoader import ConfigLoader
 # from lib.objects.abstract_object import AbstractObject
 # from lib.Tracker import Tracker

+logging.config.dictConfig(ail_logger.get_config(name='modules'))

 class MailExporter(AbstractExporter, ABC):
-    def __init__(self, host=None, port=None, password=None, user='', sender=''):
+    def __init__(self, host=None, port=None, password=None, user='', sender='', cert_required=None, ca_file=None):
         super().__init__()
         config_loader = ConfigLoader()
+        self.logger = logging.getLogger(f'{self.__class__.__name__}')

         if host:
             self.host = host
             self.port = port
@@ -45,6 +53,15 @@ class MailExporter(AbstractExporter, ABC):
         self.pw = config_loader.get_config_str("Notifications", "sender_pw")
         if self.pw == 'None':
             self.pw = None
+        if cert_required is not None:
+            self.cert_required = bool(cert_required)
+            self.ca_file = ca_file
+        else:
+            self.cert_required = config_loader.get_config_boolean("Notifications", "cert_required")
+            if self.cert_required:
+                self.ca_file = config_loader.get_config_str("Notifications", "ca_file")
+            else:
+                self.ca_file = None
         if user:
             self.user = user
         else:
@@ -67,8 +84,12 @@ class MailExporter(AbstractExporter, ABC):
             smtp_server = smtplib.SMTP(self.host, self.port)
             smtp_server.starttls()
         except smtplib.SMTPNotSupportedError:
-            print("The server does not support the STARTTLS extension.")
-            smtp_server = smtplib.SMTP_SSL(self.host, self.port)
+            self.logger.info(f"The server {self.host}:{self.port} does not support the STARTTLS extension.")
+            if self.cert_required:
+                context = create_default_context(cafile=self.ca_file)
+            else:
+                context = None
+            smtp_server = smtplib.SMTP_SSL(self.host, self.port, context=context)

         smtp_server.ehlo()
         if self.user is not None:
@@ -80,7 +101,7 @@ class MailExporter(AbstractExporter, ABC):
             return smtp_server
         # except Exception as err:
         #     traceback.print_tb(err.__traceback__)
-        #     logger.warning(err)
+        #     self.logger.warning(err)

     def _export(self, recipient, subject, body):
         mime_msg = MIMEMultipart()
@@ -95,24 +116,35 @@ class MailExporter(AbstractExporter, ABC):
         smtp_client.quit()
         # except Exception as err:
         #     traceback.print_tb(err.__traceback__)
-        #     logger.warning(err)
-        print(f'Send notification: {subject} to {recipient}')
+        #     self.logger.warning(err)
+        self.logger.info(f'Send notification: {subject} to {recipient}')

 class MailExporterTracker(MailExporter):

     def __init__(self, host=None, port=None, password=None, user='', sender=''):
         super().__init__(host=host, port=port, password=password, user=user, sender=sender)

-    def export(self, tracker, obj):  # TODO match
+    def export(self, tracker, obj, matches=[]):
         tracker_type = tracker.get_type()
         tracker_name = tracker.get_tracked()
-        subject = f'AIL Framework Tracker: {tracker_name}'  # TODO custom subject
+        description = tracker.get_description()
+        if not description:
+            description = tracker_name
+        subject = f'AIL Framework Tracker: {description}'
         body = f"AIL Framework, New occurrence for {tracker_type} tracker: {tracker_name}\n"
         body += f'Item: {obj.id}\nurl:{obj.get_link()}'
-        # TODO match option
-        # if match:
-        #     body += f'Tracker Match:\n\n{escape(match)}'
+        if matches:
+            body += '\n'
+            nb = 1
+            for match in matches:
+                body += f'\nMatch {nb}: {match[0]}\nExtract:\n{match[1]}\n\n'
+                nb += 1
+        else:
+            body = f"AIL Framework, New occurrence for {tracker_type} tracker: {tracker_name}\n"
+            body += f'Item: {obj.id}\nurl:{obj.get_link()}'
+        # print(body)

         for mail in tracker.get_mails():
             self._export(mail, subject, body)
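The `create_default_context(cafile=...)` call used for the SMTPS fallback is plain standard library: it builds a verifying TLS context pinned to a dedicated CA bundle instead of the system store. A self-contained sketch of the same connection logic (host, port, and CA path are placeholders, not values from the config):

```
import smtplib
from ssl import create_default_context

def connect_smtp(host, port, ca_file=None):
    try:
        server = smtplib.SMTP(host, port)
        server.starttls()  # explicit TLS upgrade on a plaintext port
    except smtplib.SMTPNotSupportedError:
        # fall back to implicit TLS; verify against a custom CA bundle if given
        context = create_default_context(cafile=ca_file) if ca_file else None
        server = smtplib.SMTP_SSL(host, port, context=context)
    server.ehlo()
    return server
```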
@@ -56,6 +56,8 @@ class FeederImporter(AbstractImporter):
         feeders = [f[:-3] for f in os.listdir(feeder_dir) if os.path.isfile(os.path.join(feeder_dir, f))]
         self.feeders = {}
         for feeder in feeders:
+            if feeder == 'abstract_chats_feeder':
+                continue
             print(feeder)
             part = feeder.split('.')[-1]
             # import json importer class
@@ -87,13 +89,27 @@ class FeederImporter(AbstractImporter):
         feeder_name = feeder.get_name()
         print(f'importing: {feeder_name} feeder')

-        item_id = feeder.get_item_id()
+        # Get the data object:
+        data_obj = feeder.get_obj()
+
         # process meta
         if feeder.get_json_meta():
-            feeder.process_meta()
-        gzip64_content = feeder.get_gzip64_content()
+            objs = feeder.process_meta()
+            if objs is None:
+                objs = set()
+        else:
+            objs = set()

-        return f'{feeder_name} {item_id} {gzip64_content}'
+        if data_obj:
+            objs.add(data_obj)
+
+        for obj in objs:
+            if obj.type == 'item':  # objects saved on disk as files (Items)
+                gzip64_content = feeder.get_gzip64_content()
+                return obj, f'{feeder_name} {gzip64_content}'
+            else:  # Messages are saved in the DB
+                if obj.exists() and obj.type != 'chat':
+                    return obj, f'{feeder_name}'
@@ -112,11 +128,14 @@ class FeederModuleImporter(AbstractModule):
     def compute(self, message):
         # TODO HANDLE Invalid JSON
         json_data = json.loads(message)
-        relay_message = self.importer.importer(json_data)
-        self.add_message_to_queue(relay_message)
+        # TODO multiple objs + messages
+        obj, relay_message = self.importer.importer(json_data)
+        self.add_message_to_queue(obj=obj, message=relay_message)

 # Launch Importer
 if __name__ == '__main__':
     module = FeederModuleImporter()
+    # module.debug = True
     module.run()
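For reference, the JSON envelope `FeederImporter` consumes, as implied by the `DefaultFeeder` accessors further down (`source`, `meta`, `data`), looks roughly like this Python dict; the field values are illustrative only:

```
feeder_json = {
    'source': 'my_feeder',                  # feeder name, shown in the UI
    'meta': {},                             # optional feeder-specific metadata
    'data': '<base64(gzip(raw content))>',  # gzip64-encoded payload
}
```

Items are relayed together with that payload (`f'{feeder_name} {gzip64_content}'`), while DB-backed objects such as chat messages only relay the feeder name, since their content is already persisted by `process_meta()`.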
@@ -19,42 +19,39 @@ sys.path.append(os.environ['AIL_BIN'])
 from importer.abstract_importer import AbstractImporter
 # from modules.abstract_module import AbstractModule
 from lib import ail_logger
-from lib.ail_queues import AILQueue
+# from lib.ail_queues import AILQueue
 from lib import ail_files  # TODO RENAME ME
+from lib.objects.Items import Item

 logging.config.dictConfig(ail_logger.get_config(name='modules'))

+# TODO Clean queue on object destruct
 class FileImporter(AbstractImporter):
     def __init__(self, feeder='file_import'):
-        super().__init__()
+        super().__init__(queue=True)
         self.logger = logging.getLogger(f'{self.__class__.__name__}')
         self.feeder_name = feeder  # TODO sanitize feeder name
-        # Setup the I/O queues
-        self.queue = AILQueue('FileImporter', 'manual')

     def importer(self, path):
         if os.path.isfile(path):
             with open(path, 'rb') as f:
                 content = f.read()
-            if content:
-                mimetype = ail_files.get_mimetype(content)
-                if ail_files.is_text(mimetype):
-                    item_id = ail_files.create_item_id(self.feeder_name, path)
-                    content = ail_files.create_gzipped_b64(content)
-                    if content:
-                        message = f'dir_import {item_id} {content}'
-                        self.logger.info(message)
-                        self.queue.send_message(message)
-                elif mimetype == 'application/gzip':
-                    item_id = ail_files.create_item_id(self.feeder_name, path)
-                    content = ail_files.create_b64(content)
-                    if content:
-                        message = f'dir_import {item_id} {content}'
-                        self.logger.info(message)
-                        self.queue.send_message(message)
+            mimetype = ail_files.get_mimetype(content)
+            item_id = ail_files.create_item_id(self.feeder_name, path)
+            gzipped = False
+            if mimetype == 'application/gzip':
+                gzipped = True
+            elif not ail_files.is_text(mimetype):
+                return None
+            source = 'dir_import'
+            message = self.create_message(content, gzipped=gzipped, source=source)
+            self.logger.info(f'{source} {item_id}')
+            obj = Item(item_id)
+            if message:
+                self.add_message_to_queue(obj, message=message)

 class DirImporter(AbstractImporter):
     def __init__(self):
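The rewritten `importer()` collapses the two old branches into one path: already-gzipped files are flagged `gzipped=True`, non-text mimetypes are skipped, and `create_message()` (defined on `AbstractImporter` further down) performs the gzip+base64 step itself. `ail_files.get_mimetype` is internal to AIL; a sketch of equivalent gating with the `python-magic` package, which is an assumption here, not necessarily what AIL uses:

```
import magic  # python-magic: detect mimetype from content, not extension

def should_import(content: bytes):
    mimetype = magic.from_buffer(content, mime=True)
    if mimetype == 'application/gzip':
        return True, True    # import it, payload is already gzipped
    if mimetype.startswith('text/'):
        return True, False   # import it, still needs compression
    return False, False      # skip other binaries
```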
@@ -10,9 +10,7 @@
 # https://github.com/cvandeplas/pystemon/blob/master/pystemon.yaml#L52
 #
-import base64
 import os
-import gzip
 import sys

 import redis
@@ -24,6 +22,8 @@ from importer.abstract_importer import AbstractImporter
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
+from lib.objects.Items import Item

 class PystemonImporter(AbstractImporter):
     def __init__(self, pystemon_dir, host='localhost', port=6379, db=10):
         super().__init__()
@@ -32,10 +32,6 @@ class PystemonImporter(AbstractImporter):
         self.r_pystemon = redis.StrictRedis(host=host, port=port, db=db, decode_responses=True)
         self.dir_pystemon = pystemon_dir

-    # # TODO: add exception
-    def encode_and_compress_data(self, content):
-        return base64.b64encode(gzip.compress(content)).decode()
-
     def importer(self):
         item_id = self.r_pystemon.lpop("pastes")
         print(item_id)
@@ -53,11 +49,19 @@ class PystemonImporter(AbstractImporter):
             if not content:
                 return None

-            b64_gzipped_content = self.encode_and_compress_data(content)
-            print(item_id, b64_gzipped_content)
-            return f'{item_id} {b64_gzipped_content}'
+            if full_item_path[-3:] == '.gz':
+                gzipped = True
+            else:
+                gzipped = False
+            # TODO handle multiple objects
+            source = 'pystemon'
+            message = self.create_message(content, gzipped=gzipped, source=source)
+            self.logger.info(f'{source} {item_id}')
+            return item_id, message
         except IOError as e:
-            print(f'Error: {full_item_path}, IOError')
+            self.logger.error(f'Error {e}: {full_item_path}, IOError')
             return None
@@ -81,8 +85,10 @@ class PystemonModuleImporter(AbstractModule):
         return self.importer.importer()

     def compute(self, message):
-        relay_message = f'pystemon {message}'
-        self.add_message_to_queue(relay_message)
+        if message:
+            item_id, message = message
+            item = Item(item_id)
+            self.add_message_to_queue(obj=item, message=message)

 if __name__ == '__main__':
@@ -4,15 +4,13 @@
 Importer Class
 ================

-Import Content
+ZMQ Importer

 """
 import os
 import sys

 import zmq

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
@@ -21,6 +19,8 @@ from importer.abstract_importer import AbstractImporter
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
+from lib.objects.Items import Item

 class ZMQImporters(AbstractImporter):
     def __init__(self):
         super().__init__()
@@ -56,6 +56,8 @@ class ZMQModuleImporter(AbstractModule):
         super().__init__()

         config_loader = ConfigLoader()
+        self.default_feeder_name = config_loader.get_config_str("Module_Mixer", "default_unnamed_feed_name")
+
         addresses = config_loader.get_config_str('ZMQ_Global', 'address')
         addresses = addresses.split(',')
         channel = config_loader.get_config_str('ZMQ_Global', 'channel')
@@ -63,7 +65,6 @@ class ZMQModuleImporter(AbstractModule):
         for address in addresses:
             self.zmq_importer.add(address.strip(), channel)

-    # TODO MESSAGE SOURCE - UI
     def get_message(self):
         for message in self.zmq_importer.importer():
             # remove channel from message
@@ -72,8 +73,20 @@ class ZMQModuleImporter(AbstractModule):
     def compute(self, messages):
         for message in messages:
             message = message.decode()
-            print(message.split(' ', 1)[0])
-            self.add_message_to_queue(message)
+            obj_id, gzip64encoded = message.split(' ', 1)  # TODO ADD LOGS
+            splitted = obj_id.split('>>', 1)
+            if len(splitted) == 2:
+                feeder_name, obj_id = splitted
+            else:
+                feeder_name = self.default_feeder_name
+
+            obj = Item(obj_id)
+            # f'{source} {content}'
+            relay_message = f'{feeder_name} {gzip64encoded}'
+            print(f'feeder_name item::{obj_id}')
+            self.add_message_to_queue(obj=obj, message=relay_message)

 if __name__ == '__main__':
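The new `compute()` supports an optional `feeder>>` prefix on the ZMQ object id, falling back to the configured `default_unnamed_feed_name` when absent. A quick standalone illustration of the parsing (sample values invented):

```
message = 'pystemon>>archive/2024/02/07/abc.gz H4sIAAAA...'
obj_id, gzip64encoded = message.split(' ', 1)
splitted = obj_id.split('>>', 1)
if len(splitted) == 2:
    feeder_name, obj_id = splitted   # 'pystemon', 'archive/2024/02/07/abc.gz'
else:
    feeder_name = 'unnamed_feeder'   # default_unnamed_feed_name from the config
```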
@@ -7,26 +7,41 @@ Importer Class
 Import Content

 """
+import base64
+import gzip
+import logging
+import logging.config
 import os
 import sys
 from abc import ABC, abstractmethod

-# sys.path.append(os.environ['AIL_BIN'])
+sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
 # from ConfigLoader import ConfigLoader
+from lib import ail_logger
+from lib.ail_queues import AILQueue

+logging.config.dictConfig(ail_logger.get_config(name='modules'))

-class AbstractImporter(ABC):
-    def __init__(self):
+# TODO Clean queue on object destruct
+class AbstractImporter(ABC):  # TODO ail queues
+    def __init__(self, queue=False):
         """
-        Init Module
-        importer_name: str; set the importer name if different from the instance ClassName
+        AIL Importer
+        :param queue: allow pushing messages to other modules
         """
         # Module name if provided else instance className
         self.name = self._name()
+        self.logger = logging.getLogger(f'{self.__class__.__name__}')
+        # Setup the I/O queues for one-shot importers
+        if queue:
+            self.queue = AILQueue(self.name, 'importer_manual')

     @abstractmethod
     def importer(self, *args, **kwargs):
@@ -39,4 +54,57 @@ class AbstractImporter(ABC):
         """
         return self.__class__.__name__

+    def add_message_to_queue(self, obj, message='', queue=None):
+        """
+        Add message to queue
+        :param obj: AIL object
+        :param message: message to send in queue
+        :param queue: queue name or module name
+
+        ex: add_message_to_queue(item_id, 'Mail')
+        """
+        if not obj:
+            raise Exception(f'Invalid AIL object, {obj}')
+        obj_global_id = obj.get_global_id()
+        self.queue.send_message(obj_global_id, message, queue)
+
+    def get_available_queues(self):
+        return self.queue.get_out_queues()
+
+    @staticmethod
+    def b64(content):
+        if isinstance(content, str):
+            content = content.encode()
+        return base64.b64encode(content).decode()
+
+    @staticmethod
+    def create_gzip(content):
+        if isinstance(content, str):
+            content = content.encode()
+        return gzip.compress(content)
+
+    def b64_gzip(self, content):
+        try:
+            gzipped = self.create_gzip(content)
+            return self.b64(gzipped)
+        except Exception as e:
+            self.logger.warning(e)
+            return ''
+
+    def create_message(self, content, b64=False, gzipped=False, source=None):
+        if not source:
+            source = self.name
+        if content:
+            if not gzipped:
+                content = self.b64_gzip(content)
+            elif not b64:
+                content = self.b64(content)
+            if not content:
+                return None
+            if isinstance(content, bytes):
+                content = content.decode()
+            return f'{source} {content}'
+        else:
+            return f'{source}'
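`create_message()` is now the single point where payloads are serialized: plain content is gzipped then base64-encoded, already-gzipped content is only base64-encoded, and the result is prefixed with the source name. A round-trip sketch of that wire format using only the standard library:

```
import base64
import gzip

def encode(source, content: bytes):
    return f'{source} {base64.b64encode(gzip.compress(content)).decode()}'

def decode(message):
    source, b64_payload = message.split(' ', 1)
    return source, gzip.decompress(base64.standard_b64decode(b64_payload))

msg = encode('dir_import', b'leaked data sample')
assert decode(msg) == ('dir_import', b'leaked data sample')
```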
@@ -33,3 +33,4 @@ class BgpMonitorFeeder(DefaultFeeder):
         tag = 'infoleak:automatic-detection=bgp_monitor'
         item = Item(self.get_item_id())
         item.add_tag(tag)
+        return set()
@@ -9,14 +9,21 @@ Process Feeder Json (example: Twitter feeder)
 """
 import os
 import datetime
+import sys
 import uuid

+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from lib.objects import ail_objects

 class DefaultFeeder:
     """Default Feeder"""

     def __init__(self, json_data):
         self.json_data = json_data
-        self.item_id = None
+        self.obj = None
         self.name = None

     def get_name(self):
@@ -24,8 +31,12 @@ class DefaultFeeder:
         Return the feeder name, the first part of the item_id, displayed in the UI
         """
         if not self.name:
-            return self.get_source()
-        return self.name
+            name = self.get_source()
+        else:
+            name = self.name
+        if not name:
+            name = 'default'
+        return name

     def get_source(self):
         return self.json_data.get('source')
@@ -51,15 +62,22 @@ class DefaultFeeder:
         """
         return self.json_data.get('data')

+    def get_obj_type(self):
+        meta = self.get_json_meta()
+        return meta.get('type', 'item')
+
     ## OVERWRITE ME ##
-    def get_item_id(self):
+    def get_obj(self):
         """
-        Return item id. define item id
+        Return the obj global id. Default: item object
         """
         date = datetime.date.today().strftime("%Y/%m/%d")
-        item_id = os.path.join(self.get_name(), date, str(uuid.uuid4()))
-        self.item_id = f'{item_id}.gz'
-        return self.item_id
+        obj_id = os.path.join(self.get_name(), date, str(uuid.uuid4()))
+        obj_id = f'{obj_id}.gz'
+        obj_id = f'item::{obj_id}'
+        self.obj = ail_objects.get_obj_from_global_id(obj_id)
+        return self.obj

     ## OVERWRITE ME ##
     def process_meta(self):
@@ -67,4 +85,4 @@ class DefaultFeeder:
         Process JSON meta field.
         """
         # meta = self.get_json_meta()
-        pass
+        return set()
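The key change in `DefaultFeeder` is that feeders now hand back full AIL objects addressed by a *global id*, a `type:subtype:id` string (`item::...` for items, whose subtype is empty). A small illustration of how such an id decomposes (the uuid is invented):

```
global_id = 'item::crawled/2024/02/07/3f2a...'   # built as f'item::{obj_id}' above
obj_type, subtype, obj_id = global_id.split(':', 2)
# obj_type = 'item', subtype = '' (items have no subtype),
# obj_id = 'crawled/2024/02/07/3f2a...'
```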
bin/importer/feeders/Discord.py (new executable file)
@@ -0,0 +1,38 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
"""
The Discord Feeder Importer Module
================
Process Discord JSON
"""
import os
import sys
import datetime
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from importer.feeders.abstract_chats_feeder import AbstractChatFeeder
from lib.ConfigLoader import ConfigLoader
from lib.objects import ail_objects
from lib.objects.Chats import Chat
from lib.objects import Messages
from lib.objects import UsersAccount
from lib.objects.Usernames import Username
import base64
class DiscordFeeder(AbstractChatFeeder):
def __init__(self, json_data):
super().__init__('discord', json_data)
# def get_obj(self):.
# obj_id = Messages.create_obj_id('telegram', chat_id, message_id, timestamp)
# obj_id = f'message:telegram:{obj_id}'
# self.obj = ail_objects.get_obj_from_global_id(obj_id)
# return self.obj
@@ -17,7 +17,7 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from importer.feeders.Default import DefaultFeeder
 from lib.objects.Usernames import Username
-from lib import item_basic
+from lib.objects.Items import Item

 class JabberFeeder(DefaultFeeder):
@@ -36,7 +36,7 @@ class JabberFeeder(DefaultFeeder):
         self.item_id = f'{item_id}.gz'
         return self.item_id

-    def process_meta(self):
+    def process_meta(self):  # TODO replace me by message
         """
         Process JSON meta field.
         """
@@ -44,10 +44,12 @@ class JabberFeeder(DefaultFeeder):
         # item_basic.add_map_obj_id_item_id(jabber_id, item_id, 'jabber_id') ##############################################
         to = str(self.json_data['meta']['jabber:to'])
         fr = str(self.json_data['meta']['jabber:from'])
-        date = item_basic.get_item_date(item_id)
+        item = Item(self.item_id)
+        date = item.get_date()

         user_to = Username(to, 'jabber')
         user_fr = Username(fr, 'jabber')
-        user_to.add(date, self.item_id)
-        user_fr.add(date, self.item_id)
-        return None
+        user_to.add(date, item)
+        user_fr.add(date, item)
+        return set()
@@ -15,42 +15,24 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
-from importer.feeders.Default import DefaultFeeder
-from lib.ConfigLoader import ConfigLoader
-from lib.objects import ail_objects
-from lib.objects.Chats import Chat
-from lib.objects import Messages
-from lib.objects import UsersAccount
+from importer.feeders.abstract_chats_feeder import AbstractChatFeeder
 from lib.objects.Usernames import Username
-from lib import item_basic

+import base64

-class TelegramFeeder(DefaultFeeder):
+class TelegramFeeder(AbstractChatFeeder):

     def __init__(self, json_data):
-        super().__init__(json_data)
-        self.name = 'telegram'
+        super().__init__('telegram', json_data)

-    # define item id
-    def get_item_id(self):
-        # TODO use telegram message date
-        date = datetime.date.today().strftime("%Y/%m/%d")
-        channel_id = str(self.json_data['meta']['channel_id'])
-        message_id = str(self.json_data['meta']['message_id'])
-        item_id = f'{channel_id}_{message_id}'
-        item_id = os.path.join('telegram', date, item_id)
-        self.item_id = f'{item_id}.gz'
-        return self.item_id
-
-    def process_meta(self):
-        """
-        Process JSON meta field.
-        """
-        # channel_id = str(self.json_data['meta']['channel_id'])
-        # message_id = str(self.json_data['meta']['message_id'])
-        # telegram_id = f'{channel_id}_{message_id}'
-        # item_basic.add_map_obj_id_item_id(telegram_id, item_id, 'telegram_id') #########################################
-        user = None
-        if self.json_data['meta'].get('user'):
-            user = str(self.json_data['meta']['user'])
-        elif self.json_data['meta'].get('channel'):
-            user = str(self.json_data['meta']['channel'].get('username'))
-        if user:
-            date = item_basic.get_item_date(self.item_id)
-            username = Username(user, 'telegram')
-            username.add(date, self.item_id)
-        return None
+    # def get_obj(self):
+    #     obj_id = Messages.create_obj_id('telegram', chat_id, message_id, timestamp)
+    #     obj_id = f'message:telegram:{obj_id}'
+    #     self.obj = ail_objects.get_obj_from_global_id(obj_id)
+    #     return self.obj
@@ -17,7 +17,7 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from importer.feeders.Default import DefaultFeeder
 from lib.objects.Usernames import Username
-from lib import item_basic
+from lib.objects.Items import Item

 class TwitterFeeder(DefaultFeeder):
@@ -40,9 +40,9 @@ class TwitterFeeder(DefaultFeeder):
         '''
         # tweet_id = str(self.json_data['meta']['twitter:tweet_id'])
         # item_basic.add_map_obj_id_item_id(tweet_id, item_id, 'twitter_id') ############################################
+        item = Item(self.item_id)
-        date = item_basic.get_item_date(self.item_id)
+        date = item.get_date()
         user = str(self.json_data['meta']['twitter:id'])
         username = Username(user, 'twitter')
-        username.add(date, item_id)
-        return None
+        username.add(date, item)
+        return set()
@ -56,3 +56,5 @@ class UrlextractFeeder(DefaultFeeder):
item = Item(self.item_id) item = Item(self.item_id)
item.set_parent(parent_id) item.set_parent(parent_id)
return set()
@@ -0,0 +1,394 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
"""
Abstract Chat JSON Feeder Importer Module
================
Process chat feeder JSON (e.g. Telegram, Discord)
"""
import datetime
import os
import sys
from abc import ABC
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from importer.feeders.Default import DefaultFeeder
from lib.objects.Chats import Chat
from lib.objects import ChatSubChannels
from lib.objects import ChatThreads
from lib.objects import Images
from lib.objects import Messages
from lib.objects import FilesNames
# from lib.objects import Files
from lib.objects import UsersAccount
from lib.objects.Usernames import Username
from lib import chats_viewer
import base64
import io
import gzip
# TODO remove compression ???
def _gunzip_bytes_obj(bytes_obj):
gunzipped_bytes_obj = None
try:
in_ = io.BytesIO()
in_.write(bytes_obj)
in_.seek(0)
with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
gunzipped_bytes_obj = fo.read()
except Exception as e:
print(f'Global; Invalid Gzip file: {e}')
return gunzipped_bytes_obj
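# NB: the io.BytesIO dance above is equivalent to the one-liner
# gzip.decompress(bytes_obj); both inflate an in-memory gzip member.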
class AbstractChatFeeder(DefaultFeeder, ABC):
def __init__(self, name, json_data):
super().__init__(json_data)
self.obj = None
self.name = name
def get_chat_protocol(self): # TODO # # # # # # # # # # # # #
return self.name
def get_chat_network(self):
return self.json_data['meta'].get('network', None)
def get_chat_address(self):
return self.json_data['meta'].get('address', None)
def get_chat_instance_uuid(self):
chat_instance_uuid = chats_viewer.create_chat_service_instance(self.get_chat_protocol(),
network=self.get_chat_network(),
address=self.get_chat_address())
# TODO SET
return chat_instance_uuid
def get_chat_id(self): # TODO RAISE ERROR IF NONE
return self.json_data['meta']['chat']['id']
def get_subchannel_id(self):
return self.json_data['meta']['chat'].get('subchannel', {}).get('id')
def get_subchannels(self):
pass
def get_thread_id(self):
return self.json_data['meta'].get('thread', {}).get('id')
def get_message_id(self):
return self.json_data['meta']['id']
def get_media_name(self):
return self.json_data['meta'].get('media', {}).get('name')
def get_reactions(self):
return self.json_data['meta'].get('reactions', [])
def get_message_timestamp(self):
if not self.json_data['meta'].get('date'):
return None
else:
return self.json_data['meta']['date']['timestamp']
# if self.json_data['meta'].get('date'):
# date = datetime.datetime.fromtimestamp( self.json_data['meta']['date']['timestamp'])
# date = date.strftime('%Y/%m/%d')
# else:
# date = datetime.date.today().strftime("%Y/%m/%d")
def get_message_date_timestamp(self):
timestamp = self.get_message_timestamp()
date = datetime.datetime.fromtimestamp(timestamp)
date = date.strftime('%Y%m%d')
return date, timestamp
def get_message_sender_id(self):
return self.json_data['meta']['sender']['id']
def get_message_reply(self):
return self.json_data['meta'].get('reply_to') # TODO change to reply ???
def get_message_reply_id(self):
return self.json_data['meta'].get('reply_to', {}).get('message_id')
def get_message_forward(self):
return self.json_data['meta'].get('forward')
def get_message_content(self):
decoded = base64.standard_b64decode(self.json_data['data'])
return _gunzip_bytes_obj(decoded)
def get_obj(self):
#### TIMESTAMP ####
timestamp = self.get_message_timestamp()
#### Create Object ID ####
chat_id = self.get_chat_id()
try:
message_id = self.get_message_id()
except KeyError:
if chat_id:
self.obj = Chat(chat_id, self.get_chat_instance_uuid())
return self.obj
else:
self.obj = None
return None
thread_id = self.get_thread_id()
# channel id
# thread id
# TODO sanitize obj type
obj_type = self.get_obj_type()
if obj_type == 'image':
self.obj = Images.Image(self.json_data['data-sha256'])
else:
obj_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, thread_id=thread_id)
self.obj = Messages.Message(obj_id)
return self.obj
def process_chat(self, new_objs, obj, date, timestamp, reply_id=None):
meta = self.json_data['meta']['chat'] # todo replace me by function
chat = Chat(self.get_chat_id(), self.get_chat_instance_uuid())
subchannel = None
thread = None
# date stat + correlation
chat.add(date, obj)
if meta.get('name'):
chat.set_name(meta['name'])
if meta.get('info'):
chat.set_info(meta['info'])
if meta.get('date'): # TODO check if already exists
chat.set_created_at(int(meta['date']['timestamp']))
if meta.get('icon'):
img = Images.create(meta['icon'], b64=True)
img.add(date, chat)
chat.set_icon(img.get_global_id())
new_objs.add(img)
if meta.get('username'):
username = Username(meta['username'], self.get_chat_protocol())
chat.update_username_timeline(username.get_global_id(), timestamp)
if meta.get('subchannel'):
subchannel, thread = self.process_subchannel(obj, date, timestamp, reply_id=reply_id)
chat.add_children(obj_global_id=subchannel.get_global_id())
else:
if obj.type == 'message':
if self.get_thread_id():
thread = self.process_thread(obj, chat, date, timestamp, reply_id=reply_id)
else:
chat.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
chats_obj = [chat]
if subchannel:
chats_obj.append(subchannel)
if thread:
chats_obj.append(thread)
return chats_obj
def process_subchannel(self, obj, date, timestamp, reply_id=None): # TODO CREATE DATE
meta = self.json_data['meta']['chat']['subchannel']
subchannel = ChatSubChannels.ChatSubChannel(f'{self.get_chat_id()}/{meta["id"]}', self.get_chat_instance_uuid())
thread = None
# TODO correlation with obj = message/image
subchannel.add(date)
if meta.get('date'): # TODO check if already exists
subchannel.set_created_at(int(meta['date']['timestamp']))
if meta.get('name'):
subchannel.set_name(meta['name'])
# subchannel.update_name(meta['name'], timestamp) # TODO #################
if meta.get('info'):
subchannel.set_info(meta['info'])
if obj.type == 'message':
if self.get_thread_id():
thread = self.process_thread(obj, subchannel, date, timestamp, reply_id=reply_id)
else:
subchannel.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
return subchannel, thread
def process_thread(self, obj, obj_chat, date, timestamp, reply_id=None):
meta = self.json_data['meta']['thread']
thread_id = self.get_thread_id()
p_chat_id = meta['parent'].get('chat')
p_subchannel_id = meta['parent'].get('subchannel')
p_message_id = meta['parent'].get('message')
# print(thread_id, p_chat_id, p_subchannel_id, p_message_id)
if p_chat_id == self.get_chat_id() and p_subchannel_id == self.get_subchannel_id():
thread = ChatThreads.create(thread_id, self.get_chat_instance_uuid(), p_chat_id, p_subchannel_id, p_message_id, obj_chat)
thread.add(date, obj)
thread.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
# TODO OTHERS CORRELATIONS TO ADD
if meta.get('name'):
thread.set_name(meta['name'])
return thread
# TODO
# else:
# # ADD NEW MESSAGE REF (used by discord)
def process_sender(self, new_objs, obj, date, timestamp):
meta = self.json_data['meta'].get('sender')
if not meta:
return None
user_account = UsersAccount.UserAccount(meta['id'], self.get_chat_instance_uuid())
# date stat + correlation
user_account.add(date, obj)
if meta.get('username'):
username = Username(meta['username'], self.get_chat_protocol())
# TODO timeline or/and correlation ????
user_account.add_correlation(username.type, username.get_subtype(r_str=True), username.id)
user_account.update_username_timeline(username.get_global_id(), timestamp)
# Username---Message
username.add(date) # TODO # correlation message ???
# ADDITIONAL METAS
if meta.get('firstname'):
user_account.set_first_name(meta['firstname'])
if meta.get('lastname'):
user_account.set_last_name(meta['lastname'])
if meta.get('phone'):
user_account.set_phone(meta['phone'])
if meta.get('icon'):
img = Images.create(meta['icon'], b64=True)
img.add(date, user_account)
user_account.set_icon(img.get_global_id())
new_objs.add(img)
if meta.get('info'):
user_account.set_info(meta['info'])
return user_account
def process_meta(self): # TODO CHECK MANDATORY FIELDS
"""
Process JSON meta field.
"""
# meta = self.get_json_meta()
objs = set()
if self.obj:
objs.add(self.obj)
new_objs = set()
date, timestamp = self.get_message_date_timestamp()
# REPLY
reply_id = self.get_message_reply_id()
print(self.obj.type)
# TODO FILES + FILES REF
# get object by meta object type
if self.obj.type == 'message':
# Content
obj = Messages.create(self.obj.id, self.get_message_content())
# FILENAME
media_name = self.get_media_name()
if media_name:
print(media_name)
FilesNames.FilesNames().create(media_name, date, obj)
for reaction in self.get_reactions():
obj.add_reaction(reaction['reaction'], int(reaction['count']))
elif self.obj.type == 'chat':
pass
else:
chat_id = self.get_chat_id()
thread_id = self.get_thread_id()
channel_id = self.get_subchannel_id()
message_id = self.get_message_id()
message_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, channel_id=channel_id, thread_id=thread_id)
message = Messages.Message(message_id)
# create an empty message if the message doesn't exist yet
if not message.exists():
message.create('')
objs.add(message)
if message.exists(): # TODO Correlation user-account image/filename ????
obj = Images.create(self.get_message_content())
obj.add(date, message)
obj.set_parent(obj_global_id=message.get_global_id())
# FILENAME
media_name = self.get_media_name()
if media_name:
FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
for reaction in self.get_reactions():
message.add_reaction(reaction['reaction'], int(reaction['count']))
for obj in objs: # TODO PERF avoid parsing metas multiple times
# TODO get created subchannel + thread
# => create correlation user-account with object
print(obj.id)
# CHAT
chat_objs = self.process_chat(new_objs, obj, date, timestamp, reply_id=reply_id)
# Message forward
# if self.get_json_meta().get('forward'):
# forward_from = self.get_message_forward()
# print('-----------------------------------------------------------')
# print(forward_from)
# if forward_from:
# forward_from_type = forward_from['from']['type']
# if forward_from_type == 'channel' or forward_from_type == 'chat':
# chat_forward_id = forward_from['from']['id']
# chat_forward = Chat(chat_forward_id, self.get_chat_instance_uuid())
# if chat_forward.exists():
# for chat_obj in chat_objs:
# if chat_obj.type == 'chat':
# chat_forward.add_relationship(chat_obj.get_global_id(), 'forward')
# # chat_forward.add_relationship(obj.get_global_id(), 'forward')
# SENDER # TODO HANDLE NULL SENDER
user_account = self.process_sender(new_objs, obj, date, timestamp)
if user_account:
# UserAccount---ChatObjects
for obj_chat in chat_objs:
user_account.add_correlation(obj_chat.type, obj_chat.get_subtype(r_str=True), obj_chat.id)
# if chat: # TODO Chat---Username correlation ???
# # Chat---Username => need to handle members and participants
# chat.add_correlation(username.type, username.get_subtype(r_str=True), username.id)
# TODO Sender image -> correlation
# image
# -> subchannel ?
# -> thread id ?
return new_objs | objs
@@ -83,6 +83,7 @@ class ConfigLoader(object):
         else:
             return []

+
 # # # #  Directory Config  # # # #

 config_loader = ConfigLoader()
@@ -85,18 +85,18 @@ def add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, id_2):
     r_serv_db.sadd(f'obj:duplicates:{obj_type}:{subtype}:{obj_id}', f'{similarity}:{algo}:{id_2}')

-def add_duplicate(algo, hash_, similarity, obj_type, subtype, id, date_ymonth):
+def add_duplicate(algo, hash_, similarity, obj_type, subtype, obj_id, date_ymonth):
     obj2_id = get_object_id_by_hash(algo, hash_, date_ymonth)
     # same content
     if similarity == 100:
-        dups = get_obj_duplicates(obj_type, subtype, id)
+        dups = get_obj_duplicates(obj_type, subtype, obj_id)
         for dup_id in dups:
             for algo_dict in dups[dup_id]:
                 if algo_dict['similarity'] == 100 and algo_dict['algo'] == algo:
-                    add_obj_duplicate(algo, similarity, obj_type, subtype, id, dups[dup_id])
-                    add_obj_duplicate(algo, similarity, obj_type, subtype, dups[dup_id], id)
+                    add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, dups[dup_id])
+                    add_obj_duplicate(algo, similarity, obj_type, subtype, dups[dup_id], obj_id)
-    add_obj_duplicate(algo, similarity, obj_type, subtype, id, obj2_id)
-    add_obj_duplicate(algo, similarity, obj_type, subtype, obj2_id, id)
+    add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, obj2_id)
+    add_obj_duplicate(algo, similarity, obj_type, subtype, obj2_id, obj_id)

 # TODO
 def delete_obj_duplicates():
@@ -16,12 +16,13 @@
 import time
 import uuid
 from enum import Enum
-from flask import escape
+from markupsafe import escape

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
+from lib import ail_core
 from lib import ConfigLoader
 from lib import Tag
 from lib.exceptions import UpdateInvestigationError
@@ -234,18 +235,27 @@ class Investigation(object):
             objs.append(dict_obj)
         return objs

+    def get_objects_comment(self, obj_global_id):
+        return r_tracking.hget(f'investigations:objs:comment:{self.uuid}', obj_global_id)
+
+    def set_objects_comment(self, obj_global_id, comment):
+        if comment:
+            r_tracking.hset(f'investigations:objs:comment:{self.uuid}', obj_global_id, comment)
+
     # # TODO: def register_object(self, Object): in OBJECT CLASS
-    def register_object(self, obj_id, obj_type, subtype):
+    def register_object(self, obj_id, obj_type, subtype, comment=''):
         r_tracking.sadd(f'investigations:objs:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
         r_tracking.sadd(f'obj:investigations:{obj_type}:{subtype}:{obj_id}', self.uuid)
+        if comment:
+            self.set_objects_comment(f'{obj_type}:{subtype}:{obj_id}', comment)
         timestamp = int(time.time())
         self.set_last_change(timestamp)

     def unregister_object(self, obj_id, obj_type, subtype):
         r_tracking.srem(f'investigations:objs:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
         r_tracking.srem(f'obj:investigations:{obj_type}:{subtype}:{obj_id}', self.uuid)
+        r_tracking.hdel(f'investigations:objs:comment:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
         timestamp = int(time.time())
         self.set_last_change(timestamp)
@@ -350,7 +360,7 @@ def get_investigations_selector():
     for investigation_uuid in get_all_investigations():
         investigation = Investigation(investigation_uuid)
         name = investigation.get_info()
-        l_investigations.append({"id":investigation_uuid, "name": name})
+        l_investigations.append({"id": investigation_uuid, "name": name})
     return l_investigations

 #{id:'8dc4b81aeff94a9799bd70ba556fa345',name:"Paris"}
@@ -445,14 +455,18 @@ def api_register_object(json_dict):
     investigation = Investigation(investigation_uuid)
     obj_type = json_dict.get('type', '').replace(' ', '')
-    if not exists_obj_type(obj_type):
+    if obj_type not in ail_core.get_all_objects():
         return {"status": "error", "reason": f"Invalid Object Type: {obj_type}"}, 400
     subtype = json_dict.get('subtype', '')
     if subtype == 'None':
         subtype = ''
     obj_id = json_dict.get('id', '').replace(' ', '')
-    res = investigation.register_object(obj_id, obj_type, subtype)
+
+    comment = json_dict.get('comment', '')
+    # if comment:
+    #     comment = escape(comment)
+    res = investigation.register_object(obj_id, obj_type, subtype, comment=comment)
     return res, 200

 def api_unregister_object(json_dict):
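Per-object comments are stored in a Redis hash keyed by the investigation uuid, with the object's `type:subtype:id` string as the hash field, so each object costs one `HGET`/`HSET`/`HDEL`. A sketch of the lifecycle using redis-py (the connection parameters and values are placeholders; the uuid reuses the example from the file's own comment):

```
import redis

r_tracking = redis.Redis(decode_responses=True)
inv_uuid = '8dc4b81aeff94a9799bd70ba556fa345'
obj_key = 'item::crawled/2024/02/07/3f2a...'   # f'{obj_type}:{subtype}:{obj_id}'

r_tracking.hset(f'investigations:objs:comment:{inv_uuid}', obj_key, 'pivot found here')
print(r_tracking.hget(f'investigations:objs:comment:{inv_uuid}', obj_key))
r_tracking.hdel(f'investigations:objs:comment:{inv_uuid}', obj_key)  # on unregister
```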
@@ -2,7 +2,24 @@
 # -*-coding:UTF-8 -*

 import os
+import re
 import sys

+import html2text
+import gcld3
+
+from libretranslatepy import LibreTranslateAPI
+
+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from lib.ConfigLoader import ConfigLoader
+
+config_loader = ConfigLoader()
+r_cache = config_loader.get_redis_conn("Redis_Cache")
+TRANSLATOR_URL = config_loader.get_config_str('Translation', 'libretranslate')
+config_loader = None
+
 dict_iso_languages = {
     'af': 'Afrikaans',
@@ -237,3 +254,201 @@ def get_iso_from_languages(l_languages, sort=False):
     if sort:
         l_iso = sorted(l_iso)
     return l_iso
class LanguageDetector:
pass
def get_translator_instance():
return TRANSLATOR_URL
def _get_html2text(content, ignore_links=False):
h = html2text.HTML2Text()
h.ignore_links = ignore_links
h.ignore_images = ignore_links
return h.handle(content)
def _clean_text_to_translate(content, html=False, keys_blocks=True):
if html:
content = _get_html2text(content, ignore_links=True)
# REMOVE URLS
regex = r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b'
url_regex = re.compile(regex)
urls = url_regex.findall(content)
urls = sorted(urls, key=len, reverse=True)
for url in urls:
content = content.replace(url, '')
# REMOVE PGP Blocks
if keys_blocks:
regex_pgp_public_blocs = r'-----BEGIN PGP PUBLIC KEY BLOCK-----[\s\S]+?-----END PGP PUBLIC KEY BLOCK-----'
regex_pgp_signature = r'-----BEGIN PGP SIGNATURE-----[\s\S]+?-----END PGP SIGNATURE-----'
regex_pgp_message = r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----'
re.compile(regex_pgp_public_blocs)
re.compile(regex_pgp_signature)
re.compile(regex_pgp_message)
res = re.findall(regex_pgp_public_blocs, content)
for it in res:
content = content.replace(it, '')
res = re.findall(regex_pgp_signature, content)
for it in res:
content = content.replace(it, '')
res = re.findall(regex_pgp_message, content)
for it in res:
content = content.replace(it, '')
return content
#### AIL Objects ####
def get_obj_translation(obj_global_id, content, field='', source=None, target='en'):
"""
Returns translated content
"""
translation = r_cache.get(f'translation:{target}:{obj_global_id}:{field}')
if translation:
# DEBUG
# print('cache')
# r_cache.expire(f'translation:{target}:{obj_global_id}:{field}', 0)
return translation
translation = LanguageTranslator().translate(content, source=source, target=target)
if translation:
r_cache.set(f'translation:{target}:{obj_global_id}:{field}', translation)
r_cache.expire(f'translation:{target}:{obj_global_id}:{field}', 300)
return translation
## --AIL Objects-- ##
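`get_obj_translation()` caches per-object translations in Redis for 300 seconds before falling back to the translator. The `set` + `expire` pair can also be expressed atomically with `setex`; a generic sketch of the same pattern with redis-py (connection parameters are placeholders, key layout taken from the function above):

```
import redis

r_cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cached_translation(obj_global_id, target, field, translate):
    key = f'translation:{target}:{obj_global_id}:{field}'
    translation = r_cache.get(key)
    if translation:
        return translation
    translation = translate()                 # expensive call, e.g. LibreTranslate
    if translation:
        r_cache.setex(key, 300, translation)  # set value and TTL in one command
    return translation
```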
class LanguagesDetector:
def __init__(self, nb_langs=3, min_proportion=0.2, min_probability=0.7, min_len=0):
self.lt = LibreTranslateAPI(get_translator_instance())
try:
self.lt.languages()
except Exception:
self.lt = None
self.detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
self.nb_langs = nb_langs
self.min_proportion = min_proportion
self.min_probability = min_probability
self.min_len = min_len
def detect_gcld3(self, content):
languages = []
content = _clean_text_to_translate(content, html=True)
if self.min_len > 0:
if len(content) < self.min_len:
return languages
for lang in self.detector.FindTopNMostFreqLangs(content, num_langs=self.nb_langs):
if lang.proportion >= self.min_proportion and lang.probability >= self.min_probability and lang.is_reliable:
languages.append(lang.language)
return languages
def detect_libretranslate(self, content):
languages = []
try:
# [{"confidence": 0.6, "language": "en"}]
resp = self.lt.detect(content)
except Exception as e: # TODO ERROR MESSAGE
raise Exception(f'libretranslate error: {e}')
# resp = []
if resp:
if isinstance(resp, dict):
raise Exception(f'libretranslate error {resp}')
for language in resp:
if language.confidence >= self.min_probability:
languages.append(language)
return languages
def detect(self, content, force_gcld3=False):
# gcld3
if len(content) >= 200 or not self.lt or force_gcld3:
language = self.detect_gcld3(content)
# libretranslate
else:
language = self.detect_libretranslate(content)
return language
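# Usage (illustrative): detect() routes long texts (>= 200 chars) to the local
# gcld3 model and shorter ones to LibreTranslate when it is reachable.
#   detector = LanguagesDetector(nb_langs=1, min_len=10)
#   detector.detect('Ceci est un exemple de texte en français.')  # e.g. ['fr']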
class LanguageTranslator:
def __init__(self):
self.lt = LibreTranslateAPI(get_translator_instance())
def languages(self):
languages = []
try:
for dict_lang in self.lt.languages():
languages.append({'iso': dict_lang['code'], 'language': dict_lang['name']})
except Exception as e:
print(e)
return languages
def detect_gcld3(self, content):
content = _clean_text_to_translate(content, html=True)
detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
lang = detector.FindLanguage(content)
# print(lang.language)
# print(lang.is_reliable)
# print(lang.proportion)
# print(lang.probability)
return lang.language
def detect_libretranslate(self, content):
try:
language = self.lt.detect(content)
except: # TODO ERROR MESSAGE
language = None
if language:
return language[0].get('language')
def detect(self, content):
# gcld3
if len(content) >= 200:
language = self.detect_gcld3(content)
# libretranslate
else:
language = self.detect_libretranslate(content)
return language
def translate(self, content, source=None, target="en"): # TODO source target
if target not in get_translation_languages():
return None
translation = None
if content:
if not source:
source = self.detect(content)
# print(source, content)
if source:
if source != target:
try:
# print(content, source, target)
translation = self.lt.translate(content, source, target)
except:
translation = None
# TODO LOG and display error
if translation == content:
print('EQUAL')
translation = None
return translation
LIST_LANGUAGES = {}
def get_translation_languages():
global LIST_LANGUAGES
if not LIST_LANGUAGES:
try:
LIST_LANGUAGES = {}
for lang in LanguageTranslator().languages():
LIST_LANGUAGES[lang['iso']] = lang['language']
except Exception as e:
print(e)
LIST_LANGUAGES = {}
return LIST_LANGUAGES
if __name__ == '__main__':
# t_content = ''
langg = LanguageTranslator()
# langg = LanguagesDetector()
# lang.translate(t_content, source='ru')
langg.languages()
@@ -64,7 +64,7 @@ unsafe_tags = build_unsafe_tags()
 # get set_keys: intersection
 def get_obj_keys_by_tags(tags, obj_type, subtype='', date=None):
     l_set_keys = []
-    if obj_type == 'item':
+    if obj_type == 'item' or obj_type == 'message':
         for tag in tags:
             l_set_keys.append(f'{obj_type}:{subtype}:{tag}:{date}')
     else:
@@ -96,8 +96,6 @@ def get_taxonomies():
 def get_active_taxonomies():
     return r_tags.smembers('taxonomies:enabled')

-'active_taxonomies'
-
 def is_taxonomy_enabled(taxonomy):
     # enabled = r_tags.sismember('taxonomies:enabled', taxonomy)
     try:
@@ -340,7 +338,7 @@ def get_galaxy_meta(galaxy_name, nb_active_tags=False):
     else:
         meta['icon'] = f'fas fa-{icon}'
     if nb_active_tags:
-        meta['nb_active_tags'] = get_galaxy_nb_tags_enabled(galaxy)
+        meta['nb_active_tags'] = get_galaxy_nb_tags_enabled(galaxy.type)
         meta['nb_tags'] = len(get_galaxy_tags(galaxy.type))
     return meta
@@ -389,9 +387,13 @@ def get_cluster_tags(cluster_type, enabled=False):
         meta_tag = {'tag': tag, 'description': cluster_val.description}
         if enabled:
             meta_tag['enabled'] = is_galaxy_tag_enabled(cluster_type, tag)
-        synonyms = cluster_val.meta.synonyms
-        if not synonyms:
-            synonyms = []
+        cluster_val_meta = cluster_val.meta
+        if cluster_val_meta:
+            synonyms = cluster_val_meta.synonyms
+            if not synonyms:
+                synonyms = []
+        else:
+            synonyms = []
         meta_tag['synonyms'] = synonyms
         tags.append(meta_tag)
     return tags
@@ -633,7 +635,7 @@ def update_tag_metadata(tag, date, delete=False):  # # TODO: delete Tags
 #     r_tags.smembers(f'{tag}:{date}')
 #     r_tags.smembers(f'{obj_type}:{tag}')
 def get_tag_objects(tag, obj_type, subtype='', date=''):
-    if obj_type == 'item':
+    if obj_type == 'item' or obj_type == 'message':
         return r_tags.smembers(f'{obj_type}:{subtype}:{tag}:{date}')
     else:
         return r_tags.smembers(f'{obj_type}:{subtype}:{tag}')
@@ -641,23 +643,32 @@
 def get_object_tags(obj_type, obj_id, subtype=''):
     return r_tags.smembers(f'tag:{obj_type}:{subtype}:{obj_id}')

-def add_object_tag(tag, obj_type, id, subtype=''):
-    if r_tags.sadd(f'tag:{obj_type}:{subtype}:{id}', tag) == 1:
+def add_object_tag(tag, obj_type, obj_id, subtype=''):
+    if r_tags.sadd(f'tag:{obj_type}:{subtype}:{obj_id}', tag) == 1:
         r_tags.sadd('list_tags', tag)
         r_tags.sadd(f'list_tags:{obj_type}', tag)
         r_tags.sadd(f'list_tags:{obj_type}:{subtype}', tag)
         if obj_type == 'item':
-            date = item_basic.get_item_date(id)
-            r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', id)
+            date = item_basic.get_item_date(obj_id)
+            r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', obj_id)

             # add domain tag
-            if item_basic.is_crawled(id) and tag != 'infoleak:submission="crawler"' and tag != 'infoleak:submission="manual"':
-                domain = item_basic.get_item_domain(id)
+            if item_basic.is_crawled(obj_id) and tag != 'infoleak:submission="crawler"' and tag != 'infoleak:submission="manual"':
+                domain = item_basic.get_item_domain(obj_id)
                 add_object_tag(tag, "domain", domain)
+
+            update_tag_metadata(tag, date)
+        # MESSAGE
+        elif obj_type == 'message':
+            timestamp = obj_id.split('/')[1]
+            date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
+            r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', obj_id)
+            # TODO ADD CHAT TAGS ????
             update_tag_metadata(tag, date)
         else:
-            r_tags.sadd(f'{obj_type}:{subtype}:{tag}', id)
+            r_tags.sadd(f'{obj_type}:{subtype}:{tag}', obj_id)

         r_tags.hincrby(f'daily_tags:{datetime.date.today().strftime("%Y%m%d")}', tag, 1)
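Tagging now also handles `message` objects, whose date is recovered from the timestamp segment embedded in the object id rather than from a dated filesystem path. A toy illustration of that derivation (the id layout shown, `<chat instance>/<timestamp>/<uuid>`, is inferred from the `split('/')[1]` above and may differ in detail):

```
import datetime

obj_id = 'c6f5e0a0/1707315892/9d2b...'   # hypothetical message id
timestamp = obj_id.split('/')[1]
date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
# date == '20240207' (in the machine's local timezone)
```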
@@ -673,8 +684,8 @@ def confirm_tag(tag, obj):
 # TODO REVIEW ME
 def update_tag_global_by_obj_type(tag, obj_type, subtype=''):
     tag_deleted = False
-    if obj_type == 'item':
-        if not r_tags.exists(f'tag_metadata:{tag}'):
+    if obj_type == 'item' or obj_type == 'message':
+        if not r_tags.exists(f'tag_metadata:{tag}'):  # TODO FIXME
             tag_deleted = True
     else:
         if not r_tags.exists(f'{obj_type}:{subtype}:{tag}'):
@@ -705,6 +716,12 @@ def delete_object_tag(tag, obj_type, id, subtype=''):
             date = item_basic.get_item_date(id)
             r_tags.srem(f'{obj_type}:{subtype}:{tag}:{date}', id)

+            update_tag_metadata(tag, date, delete=True)
+        elif obj_type == 'message':
+            timestamp = id.split('/')[1]
+            date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
+            r_tags.srem(f'{obj_type}:{subtype}:{tag}:{date}', id)
+
             update_tag_metadata(tag, date, delete=True)
         else:
             r_tags.srem(f'{obj_type}:{subtype}:{tag}', id)
@@ -727,7 +744,7 @@ def delete_object_tags(obj_type, subtype, obj_id):
 def get_obj_by_tags(obj_type, l_tags, date_from=None, date_to=None, nb_obj=50, page=1):
     # with daterange
     l_tagged_obj = []
-    if obj_type == 'item':
+    if obj_type == 'item' or obj_type == 'message':
         # sanitize date
         date_range = sanitise_tags_date_range(l_tags, date_from=date_from, date_to=date_to)
         l_dates = Date.substract_date(date_range['date_from'], date_range['date_to'])
@ -1183,12 +1200,17 @@ def get_enabled_tags_with_synonyms_ui():
# TYPE -> taxonomy/galaxy/custom # TYPE -> taxonomy/galaxy/custom
# TODO GET OBJ Types
class Tag: class Tag:
def __int__(self, name: str, local=False): # TODO Get first seen by object, obj='item def __int__(self, name: str, local=False): # TODO Get first seen by object, obj='item
self.name = name self.name = name
self.local = local self.local = local
# TODO
def exists(self):
pass
def is_local(self): def is_local(self):
return self.local return self.local
@ -1199,7 +1221,11 @@ class Tag:
else: else:
return 'taxonomy' return 'taxonomy'
def is_taxonomy(self):
return not self.local and self.is_galaxy()
def is_galaxy(self):
return not self.local and self.name.startswith('misp-galaxy:')
def get_first_seen(self, r_int=False): def get_first_seen(self, r_int=False):
first_seen = r_tags.hget(f'meta:tag:{self.name}', 'first_seen') first_seen = r_tags.hget(f'meta:tag:{self.name}', 'first_seen')
@ -1210,6 +1236,9 @@ class Tag:
first_seen = 99999999 first_seen = 99999999
return first_seen return first_seen
def set_first_seen(self, first_seen):
return r_tags.hget(f'meta:tag:{self.name}', 'first_seen', int(first_seen))
def get_last_seen(self, r_int=False): def get_last_seen(self, r_int=False):
last_seen = r_tags.hget(f'meta:tag:{self.name}', 'last_seen') # 'last_seen:object' -> only if date or daterange last_seen = r_tags.hget(f'meta:tag:{self.name}', 'last_seen') # 'last_seen:object' -> only if date or daterange
if r_int: if r_int:
@ -1219,6 +1248,9 @@ class Tag:
last_seen = 0 last_seen = 0
return last_seen return last_seen
def set_last_seen(self, last_seen):
return r_tags.hset(f'meta:tag:{self.name}', 'last_seen', int(last_seen))
def get_color(self): def get_color(self):
color = r_tags.hget(f'meta:tag:{self.name}', 'color') color = r_tags.hget(f'meta:tag:{self.name}', 'color')
if not color: if not color:
@@ -1241,6 +1273,131 @@ class Tag:
'local': self.is_local()} 'local': self.is_local()}
return meta return meta
def update_obj_type_first_seen(self, obj_type, first_seen, last_seen): # TODO SUBTYPE ##################################
if int(first_seen) > int(last_seen):
raise Exception(f'INVALID first_seen/last_seen, {first_seen}/{last_seen}')
for date in Date.get_daterange(first_seen, last_seen):
date = int(date)
if date == last_seen:
if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') > 0:
r_tags.hset(f'tag_metadata:{self.name}', 'first_seen', first_seen)
else:
r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen') # TODO SUBTYPE
r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen') # TODO SUBTYPE
r_tags.srem(f'list_tags:{obj_type}', self.name) # TODO SUBTYPE
elif r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') > 0:
r_tags.hset(f'tag_metadata:{self.name}', 'first_seen', first_seen) # TODO METADATA OBJECT NAME
def update_obj_type_last_seen(self, obj_type, first_seen, last_seen): # TODO SUBTYPE ##################################
if int(first_seen) > int(last_seen):
raise Exception(f'INVALID first_seen/last_seen, {first_seen}/{last_seen}')
for date in reversed(Date.get_daterange(first_seen, last_seen)):
date = int(date)
if date == last_seen:
if r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') > 0:
r_tags.hset(f'tag_metadata:{self.name}', 'last_seen', last_seen)
else:
r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen') # TODO SUBTYPE
r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen') # TODO SUBTYPE
r_tags.srem(f'list_tags:{obj_type}', self.name) # TODO SUBTYPE
elif r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') > 0:
r_tags.hset(f'tag_metadata:{self.name}', 'last_seen', last_seen) # TODO METADATA OBJECT NAME
# TODO
# TODO Update First seen and last seen
# TODO SUBTYPE CHATS ??????????????
def update_obj_type_date(self, obj_type, date, op='add', first_seen=None, last_seen=None):
date = int(date)
if not first_seen:
first_seen = self.get_first_seen(r_int=True)
if not last_seen:
last_seen = self.get_last_seen(r_int=True)
# Add tag
if op == 'add':
if date < first_seen:
self.set_first_seen(date)
if date > last_seen:
self.set_last_seen(date)
# Delete tag
else:
if date == first_seen and date == last_seen:
# TODO OBJECTS ##############################################################################################
if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') < 1: ####################### TODO OBJ SUBTYPE ???????????????????
r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen')
r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen')
# TODO CHECK IF DELETE FULL TAG LIST ############################
elif date == first_seen:
if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') < 1:
if int(last_seen) >= int(first_seen):
self.update_obj_type_first_seen(obj_type, first_seen, last_seen) # TODO OBJ_TYPE
elif date == last_seen:
if r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') < 1:
if int(last_seen) >= int(first_seen):
self.update_obj_type_last_seen(obj_type, first_seen, last_seen) # TODO OBJ_TYPE
# STATS
nb = r_tags.hincrby(f'daily_tags:{date}', self.name, -1)
if nb < 1:
r_tags.hdel(f'daily_tags:{date}', self.name)
# TODO -> CHECK IF TAG EXISTS + UPDATE FIRST SEEN/LAST SEEN
def update(self, date=None):
pass
# TODO CHANGE ME TO SUB FUNCTION ##### add_object_tag(tag, obj_type, obj_id, subtype='')
def add(self, obj_type, subtype, obj_id):
if subtype is None:
subtype = ''
if r_tags.sadd(f'tag:{obj_type}:{subtype}:{obj_id}', self.name) == 1:
r_tags.sadd('list_tags', self.name)
r_tags.sadd(f'list_tags:{obj_type}', self.name)
if subtype:
r_tags.sadd(f'list_tags:{obj_type}:{subtype}', self.name)
if obj_type == 'item':
date = item_basic.get_item_date(obj_id)
# add domain tag
if item_basic.is_crawled(obj_id) and self.name != 'infoleak:submission="crawler"' and self.name != 'infoleak:submission="manual"':
domain = item_basic.get_item_domain(obj_id)
self.add('domain', '', domain)
elif obj_type == 'message':
timestamp = obj_id.split('/')[1]
date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
else:
date = None
if date:
r_tags.sadd(f'{obj_type}:{subtype}:{self.name}:{date}', obj_id)
update_tag_metadata(self.name, date)
else:
r_tags.sadd(f'{obj_type}:{subtype}:{self.name}', obj_id)
# TODO REPLACE ME BY DATE TAGS ????
# STATS BY TYPE ???
# DAILY STATS
r_tags.hincrby(f'daily_tags:{datetime.date.today().strftime("%Y%m%d")}', self.name, 1)
# TODO CREATE FUNCTION GET OBJECT DATE
def remove(self, obj_type, subtype, obj_id):
# TODO CHECK IN ALL OBJECT TO DELETE
pass
def delete(self):
pass
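Both `Tag.add()` and `delete_object_tag()` above derive the daily index date for `message` objects from a timestamp embedded in the object id. A minimal sketch of that derivation, assuming the `<uuid>/<unix-timestamp>/...` id layout implied by `obj_id.split('/')[1]` (the example id is illustrative):

```
import datetime

def message_date(obj_id):
    # second '/'-separated field is a unix timestamp (assumed layout)
    timestamp = obj_id.split('/')[1]
    return datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')

print(message_date('00098785-7e70-5d12/1696754000/42'))  # '20231008' (local time)
```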
#### TAG AUTO PUSH #### #### TAG AUTO PUSH ####
@@ -1381,7 +1538,7 @@ def api_add_obj_tags(tags=[], galaxy_tags=[], object_id=None, object_type="item"
# r_serv_metadata.srem('tag:{}'.format(object_id), tag) # r_serv_metadata.srem('tag:{}'.format(object_id), tag)
# r_tags.srem('{}:{}'.format(object_type, tag), object_id) # r_tags.srem('{}:{}'.format(object_type, tag), object_id)
def delete_tag(object_type, tag, object_id, obj_date=None): ################################ # TODO: def delete_tag(object_type, tag, object_id, obj_date=None): ################################ # TODO: REMOVE ME
# tag exist # tag exist
if is_obj_tagged(object_id, tag): if is_obj_tagged(object_id, tag):
if not obj_date: if not obj_date:
@@ -1447,6 +1604,29 @@ def get_list_of_solo_tags_to_export_by_type(export_type): # by type
return None return None
#r_serv_db.smembers('whitelist_hive') #r_serv_db.smembers('whitelist_hive')
def _fix_tag_obj_id(date_from):
date_to = datetime.date.today().strftime("%Y%m%d")
for obj_type in ail_core.get_all_objects():
print(obj_type)
for tag in get_all_obj_tags(obj_type):
if ';' in tag:
print(tag)
new_tag = tag.split(';')[0]
print(new_tag)
r_tags.hdel(f'tag_metadata:{tag}', 'first_seen')
r_tags.hdel(f'tag_metadata:{tag}', 'last_seen')
r_tags.srem(f'list_tags:{obj_type}', tag)
r_tags.srem(f'list_tags:{obj_type}:', tag)
r_tags.srem(f'list_tags', tag)
raw = get_obj_by_tags(obj_type, [tag], nb_obj=500000, date_from=date_from, date_to=date_to)
if raw.get('tagged_obj', []):
for obj_id in raw['tagged_obj']:
# print(obj_id)
delete_object_tag(tag, obj_type, obj_id)
add_object_tag(new_tag, obj_type, obj_id)
else:
update_tag_global_by_obj_type(tag, obj_type)
# if __name__ == '__main__': # if __name__ == '__main__':
# taxo = 'accessnow' # taxo = 'accessnow'
# # taxo = TAXONOMIES.get(taxo) # # taxo = TAXONOMIES.get(taxo)

View file

@@ -2,6 +2,8 @@
# -*-coding:UTF-8 -* # -*-coding:UTF-8 -*
import json import json
import os import os
import logging
import logging.config
import re import re
import sys import sys
import time import time
@@ -14,7 +16,7 @@ from ail_typo_squatting import runAll
import math import math
from collections import defaultdict from collections import defaultdict
from flask import escape from markupsafe import escape
from textblob import TextBlob from textblob import TextBlob
from nltk.tokenize import RegexpTokenizer from nltk.tokenize import RegexpTokenizer
@@ -24,11 +26,16 @@ sys.path.append(os.environ['AIL_BIN'])
################################## ##################################
from packages import Date from packages import Date
from lib.ail_core import get_objects_tracked, get_object_all_subtypes, get_objects_retro_hunted from lib.ail_core import get_objects_tracked, get_object_all_subtypes, get_objects_retro_hunted
from lib import ail_logger
from lib import ConfigLoader from lib import ConfigLoader
from lib import item_basic from lib import item_basic
from lib import Tag from lib import Tag
from lib.Users import User from lib.Users import User
# LOGS
logging.config.dictConfig(ail_logger.get_config(name='modules'))
logger = logging.getLogger()
config_loader = ConfigLoader.ConfigLoader() config_loader = ConfigLoader.ConfigLoader()
r_cache = config_loader.get_redis_conn("Redis_Cache") r_cache = config_loader.get_redis_conn("Redis_Cache")
@@ -207,6 +214,13 @@ class Tracker:
if filters: if filters:
self._set_field('filters', json.dumps(filters)) self._set_field('filters', json.dumps(filters))
def del_filters(self, tracker_type, to_track):
filters = self.get_filters()
for obj_type in filters:
r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', to_track)
r_tracker.srem(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}')
r_tracker.hdel(f'tracker:{self.uuid}', 'filters')
def get_tracked(self): def get_tracked(self):
return self._get_field('tracked') return self._get_field('tracked')
@@ -241,7 +255,8 @@ class Tracker:
return self._get_field('user_id') return self._get_field('user_id')
def webhook_export(self): def webhook_export(self):
return r_tracker.hexists(f'tracker:mail:{self.uuid}', 'webhook') webhook = self.get_webhook()
return webhook is not None and webhook
def get_webhook(self): def get_webhook(self):
return r_tracker.hget(f'tracker:{self.uuid}', 'webhook') return r_tracker.hget(f'tracker:{self.uuid}', 'webhook')
@@ -513,6 +528,7 @@ class Tracker:
self._set_mails(mails) self._set_mails(mails)
# Filters # Filters
self.del_filters(old_type, old_to_track)
if not filters: if not filters:
filters = {} filters = {}
for obj_type in get_objects_tracked(): for obj_type in get_objects_tracked():
@@ -522,9 +538,6 @@ class Tracker:
for obj_type in filters: for obj_type in filters:
r_tracker.sadd(f'trackers:objs:{tracker_type}:{obj_type}', to_track) r_tracker.sadd(f'trackers:objs:{tracker_type}:{obj_type}', to_track)
r_tracker.sadd(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}') r_tracker.sadd(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}')
if tracker_type != old_type:
r_tracker.srem(f'trackers:objs:{old_type}:{obj_type}', old_to_track)
r_tracker.srem(f'trackers:uuid:{old_type}:{old_to_track}', f'{self.uuid}:{obj_type}')
# Refresh Trackers # Refresh Trackers
trigger_trackers_refresh(tracker_type) trigger_trackers_refresh(tracker_type)
@@ -555,8 +568,6 @@ class Tracker:
os.remove(filepath) os.remove(filepath)
# Filters # Filters
filters = self.get_filters()
if not filters:
filters = get_objects_tracked() filters = get_objects_tracked()
for obj_type in filters: for obj_type in filters:
r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', tracked) r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', tracked)
@@ -564,8 +575,20 @@ class Tracker:
self._del_mails() self._del_mails()
self._del_tags() self._del_tags()
level = self.get_level()
if level == 0: # user only
user = self.get_user()
r_tracker.srem(f'user:tracker:{user}', self.uuid)
r_tracker.srem(f'user:tracker:{user}:{tracker_type}', self.uuid)
elif level == 1: # global
r_tracker.srem('global:tracker', self.uuid)
r_tracker.srem(f'global:tracker:{tracker_type}', self.uuid)
# meta # meta
r_tracker.delete(f'tracker:{self.uuid}') r_tracker.delete(f'tracker:{self.uuid}')
trigger_trackers_refresh(tracker_type)
def create_tracker(tracker_type, to_track, user_id, level, description=None, filters={}, tags=[], mails=[], webhook=None, tracker_uuid=None): def create_tracker(tracker_type, to_track, user_id, level, description=None, filters={}, tags=[], mails=[], webhook=None, tracker_uuid=None):
@@ -638,14 +661,14 @@ def get_user_trackers_meta(user_id, tracker_type=None):
metas = [] metas = []
for tracker_uuid in get_user_trackers(user_id, tracker_type=tracker_type): for tracker_uuid in get_user_trackers(user_id, tracker_type=tracker_type):
tracker = Tracker(tracker_uuid) tracker = Tracker(tracker_uuid)
metas.append(tracker.get_meta(options={'mails', 'sparkline', 'tags'})) metas.append(tracker.get_meta(options={'description', 'mails', 'sparkline', 'tags'}))
return metas return metas
def get_global_trackers_meta(tracker_type=None): def get_global_trackers_meta(tracker_type=None):
metas = [] metas = []
for tracker_uuid in get_global_trackers(tracker_type=tracker_type): for tracker_uuid in get_global_trackers(tracker_type=tracker_type):
tracker = Tracker(tracker_uuid) tracker = Tracker(tracker_uuid)
metas.append(tracker.get_meta(options={'mails', 'sparkline', 'tags'})) metas.append(tracker.get_meta(options={'description', 'mails', 'sparkline', 'tags'}))
return metas return metas
def get_users_trackers_meta(): def get_users_trackers_meta():
@@ -906,7 +929,7 @@ def api_add_tracker(dict_input, user_id):
# Filters # TODO MOVE ME # Filters # TODO MOVE ME
filters = dict_input.get('filters', {}) filters = dict_input.get('filters', {})
if filters: if filters:
if filters.keys() == {'decoded', 'item', 'pgp'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}: if filters.keys() == {'decoded', 'item', 'pgp', 'title'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
filters = {} filters = {}
for obj_type in filters: for obj_type in filters:
if obj_type not in get_objects_tracked(): if obj_type not in get_objects_tracked():
@@ -981,7 +1004,7 @@ def api_edit_tracker(dict_input, user_id):
# Filters # TODO MOVE ME # Filters # TODO MOVE ME
filters = dict_input.get('filters', {}) filters = dict_input.get('filters', {})
if filters: if filters:
if filters.keys() == {'decoded', 'item', 'pgp'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}: if filters.keys() == {'decoded', 'item', 'pgp', 'title'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
if not filters['decoded'] and not filters['item']: if not filters['decoded'] and not filters['item']:
filters = {} filters = {}
for obj_type in filters: for obj_type in filters:
@@ -1134,7 +1157,11 @@ def get_tracked_yara_rules():
for obj_type in get_objects_tracked(): for obj_type in get_objects_tracked():
rules = {} rules = {}
for tracked in _get_tracked_by_obj_type('yara', obj_type): for tracked in _get_tracked_by_obj_type('yara', obj_type):
rules[tracked] = os.path.join(get_yara_rules_dir(), tracked) rule = os.path.join(get_yara_rules_dir(), tracked)
if not os.path.exists(rule):
logger.critical(f"Yara rule doesn't exist {tracked} : {obj_type}")
else:
rules[tracked] = rule
to_track[obj_type] = yara.compile(filepaths=rules) to_track[obj_type] = yara.compile(filepaths=rules)
print(to_track) print(to_track)
return to_track return to_track

View file

@@ -81,7 +81,7 @@ def get_user_passwd_hash(user_id):
return r_serv_db.hget('ail:users:all', user_id) return r_serv_db.hget('ail:users:all', user_id)
def get_user_token(user_id): def get_user_token(user_id):
return r_serv_db.hget(f'ail:users:metadata:{user_id}', 'token') return r_serv_db.hget(f'ail:user:metadata:{user_id}', 'token')
def get_token_user(token): def get_token_user(token):
return r_serv_db.hget('ail:users:tokens', token) return r_serv_db.hget('ail:users:tokens', token)
@@ -156,6 +156,7 @@ def delete_user(user_id):
for role_id in get_all_roles(): for role_id in get_all_roles():
r_serv_db.srem(f'ail:users:role:{role_id}', user_id) r_serv_db.srem(f'ail:users:role:{role_id}', user_id)
user_token = get_user_token(user_id) user_token = get_user_token(user_id)
if user_token:
r_serv_db.hdel('ail:users:tokens', user_token) r_serv_db.hdel('ail:users:tokens', user_token)
r_serv_db.delete(f'ail:user:metadata:{user_id}') r_serv_db.delete(f'ail:user:metadata:{user_id}')
r_serv_db.hdel('ail:users:all', user_id) r_serv_db.hdel('ail:users:all', user_id)
@@ -246,7 +247,10 @@ class User(UserMixin):
self.id = "__anonymous__" self.id = "__anonymous__"
def exists(self): def exists(self):
return self.id != "__anonymous__" if self.id == "__anonymous__":
return False
else:
return r_serv_db.exists(f'ail:user:metadata:{self.id}')
# return True or False # return True or False
# def is_authenticated(): # def is_authenticated():
@@ -286,3 +290,6 @@
return True return True
else: else:
return False return False
def get_role(self):
return r_serv_db.hget(f'ail:user:metadata:{self.id}', 'role')

View file

@@ -13,9 +13,12 @@ from lib.ConfigLoader import ConfigLoader
config_loader = ConfigLoader() config_loader = ConfigLoader()
r_serv_db = config_loader.get_db_conn("Kvrocks_DB") r_serv_db = config_loader.get_db_conn("Kvrocks_DB")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
config_loader = None config_loader = None
AIL_OBJECTS = sorted({'cve', 'cryptocurrency', 'decoded', 'domain', 'item', 'pgp', 'screenshot', 'username'}) AIL_OBJECTS = sorted({'chat', 'chat-subchannel', 'chat-thread', 'cookie-name', 'cve', 'cryptocurrency', 'decoded',
'domain', 'etag', 'favicon', 'file-name', 'hhhash',
'item', 'image', 'message', 'pgp', 'screenshot', 'title', 'user-account', 'username'})
def get_ail_uuid(): def get_ail_uuid():
ail_uuid = r_serv_db.get('ail:uuid') ail_uuid = r_serv_db.get('ail:uuid')
@@ -37,19 +40,28 @@ def get_all_objects():
return AIL_OBJECTS return AIL_OBJECTS
def get_objects_with_subtypes(): def get_objects_with_subtypes():
return ['cryptocurrency', 'pgp', 'username'] return ['chat', 'cryptocurrency', 'pgp', 'username', 'user-account']
def get_object_all_subtypes(obj_type): def get_object_all_subtypes(obj_type): # TODO Dynamic subtype
if obj_type == 'chat':
return r_object.smembers(f'all_chat:subtypes')
if obj_type == 'chat-subchannel':
return r_object.smembers(f'all_chat-subchannel:subtypes')
if obj_type == 'cryptocurrency': if obj_type == 'cryptocurrency':
return ['bitcoin', 'bitcoin-cash', 'dash', 'ethereum', 'litecoin', 'monero', 'zcash'] return ['bitcoin', 'bitcoin-cash', 'dash', 'ethereum', 'litecoin', 'monero', 'zcash']
if obj_type == 'pgp': if obj_type == 'pgp':
return ['key', 'mail', 'name'] return ['key', 'mail', 'name']
if obj_type == 'username': if obj_type == 'username':
return ['telegram', 'twitter', 'jabber'] return ['telegram', 'twitter', 'jabber']
if obj_type == 'user-account':
return r_object.smembers(f'all_chat:subtypes')
return [] return []
def get_obj_queued():
return ['item', 'image']
def get_objects_tracked(): def get_objects_tracked():
return ['decoded', 'item', 'pgp'] return ['decoded', 'item', 'pgp', 'title']
def get_objects_retro_hunted(): def get_objects_retro_hunted():
return ['decoded', 'item'] return ['decoded', 'item']
@@ -65,6 +77,32 @@ def get_all_objects_with_subtypes_tuple():
str_objs.append((obj_type, '')) str_objs.append((obj_type, ''))
return str_objs return str_objs
def unpack_obj_global_id(global_id, r_type='tuple'):
if r_type == 'dict':
obj = global_id.split(':', 2)
return {'type': obj[0], 'subtype': obj[1], 'id': obj[2]}
else: # tuple(type, subtype, id)
return global_id.split(':', 2)
def unpack_objs_global_id(objs_global_id, r_type='tuple'):
objs = []
for global_id in objs_global_id:
objs.append(unpack_obj_global_id(global_id, r_type=r_type))
return objs
def unpack_correl_obj__id(obj_type, global_id, r_type='tuple'):
obj = global_id.split(':', 1)
if r_type == 'dict':
return {'type': obj_type, 'subtype': obj[0], 'id': obj[1]}
else: # tuple(type, subtype, id)
return obj_type, obj[0], obj[1]
def unpack_correl_objs_id(obj_type, correl_objs_id, r_type='tuple'):
objs = []
for correl_obj_id in correl_objs_id:
objs.append(unpack_correl_obj__id(obj_type, correl_obj_id, r_type=r_type))
return objs
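The global id helpers above rely on `maxsplit` so that separators inside the object id survive the split; with an empty subtype the id carries a double colon. A quick illustration, reusing the submitted-item id that appears later in this commit:

```
# what unpack_obj_global_id() does in tuple mode
global_id = 'item::submitted/2023/10/11/submitted_b5440009-05d5-4494-a807-a6d8e4a900cf.gz'
obj_type, subtype, obj_id = global_id.split(':', 2)
assert (obj_type, subtype) == ('item', '')
assert obj_id == 'submitted/2023/10/11/submitted_b5440009-05d5-4494-a807-a6d8e4a900cf.gz'
```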
##-- AIL OBJECTS --## ##-- AIL OBJECTS --##
#### Redis #### #### Redis ####
@@ -82,6 +120,10 @@ def zscan_iter(r_redis, name): # count ???
## -- Redis -- ## ## -- Redis -- ##
def rreplace(s, old, new, occurrence):
li = s.rsplit(old, occurrence)
return new.join(li)
def paginate_iterator(iter_elems, nb_obj=50, page=1): def paginate_iterator(iter_elems, nb_obj=50, page=1):
dict_page = {'nb_all_elem': len(iter_elems)} dict_page = {'nb_all_elem': len(iter_elems)}
nb_pages = dict_page['nb_all_elem'] / nb_obj nb_pages = dict_page['nb_all_elem'] / nb_obj

View file

@@ -1,9 +1,7 @@
#!/usr/bin/env python3 #!/usr/bin/env python3
# -*-coding:UTF-8 -* # -*-coding:UTF-8 -*
import base64
import datetime import datetime
import gzip
import logging.config import logging.config
import magic import magic
import os import os
@@ -181,15 +179,3 @@ def create_item_id(feeder_name, path):
item_id = os.path.join(feeder_name, date, basename) item_id = os.path.join(feeder_name, date, basename)
# TODO check if already exists # TODO check if already exists
return item_id return item_id
def create_b64(b_content):
return base64.standard_b64encode(b_content).decode()
def create_gzipped_b64(b_content):
try:
gzipencoded = gzip.compress(b_content)
gzip64encoded = create_b64(gzipencoded)
return gzip64encoded
except Exception as e:
logger.warning(e)
return ''

View file

@@ -6,19 +6,29 @@ import sys
import datetime import datetime
import time import time
import xxhash
sys.path.append(os.environ['AIL_BIN']) sys.path.append(os.environ['AIL_BIN'])
################################## ##################################
# Import Project packages # Import Project packages
################################## ##################################
from lib.exceptions import ModuleQueueError from lib.exceptions import ModuleQueueError
from lib.ConfigLoader import ConfigLoader from lib.ConfigLoader import ConfigLoader
from lib import ail_core
config_loader = ConfigLoader() config_loader = ConfigLoader()
r_queues = config_loader.get_redis_conn("Redis_Queues") r_queues = config_loader.get_redis_conn("Redis_Queues")
r_obj_process = config_loader.get_redis_conn("Redis_Process")
timeout_queue_obj = 172800
config_loader = None config_loader = None
MODULES_FILE = os.path.join(os.environ['AIL_HOME'], 'configs', 'modules.cfg') MODULES_FILE = os.path.join(os.environ['AIL_HOME'], 'configs', 'modules.cfg')
# # # # # # # #
# #
# AIL QUEUE #
# #
# # # # # # # #
class AILQueue: class AILQueue:
@@ -60,16 +70,38 @@ class AILQueue:
# Update queues stats # Update queues stats
r_queues.hset('queues', self.name, self.get_nb_messages()) r_queues.hset('queues', self.name, self.get_nb_messages())
r_queues.hset(f'modules', f'{self.pid}:{self.name}', int(time.time())) r_queues.hset(f'modules', f'{self.pid}:{self.name}', int(time.time()))
# Get Message # Get Message
message = r_queues.lpop(f'queue:{self.name}:in') message = r_queues.lpop(f'queue:{self.name}:in')
if not message: if not message:
return None return None
else: else:
# TODO SAVE CURRENT ITEMS (OLD Module information) row_mess = message.split(';', 1)
if len(row_mess) != 2:
return None, None, message
# raise Exception(f'Error: queue {self.name}, no AIL object provided')
else:
obj_global_id, mess = row_mess
m_hash = xxhash.xxh3_64_hexdigest(message)
add_processed_obj(obj_global_id, m_hash, module=self.name)
return obj_global_id, m_hash, mess
return message def rename_message_obj(self, new_id, old_id):
# restrict rename function
if self.name == 'Mixer' or self.name == 'Global':
rename_processed_obj(new_id, old_id)
else:
raise ModuleQueueError('This Module can\'t rename an object ID')
def send_message(self, message, queue_name=None): # condition -> not in any queue
# TODO EDIT meta
def end_message(self, obj_global_id, m_hash):
end_processed_obj(obj_global_id, m_hash, module=self.name)
def send_message(self, obj_global_id, message='', queue_name=None):
if not self.subscribers_modules: if not self.subscribers_modules:
raise ModuleQueueError('This Module doesn\'t have any subscriber') raise ModuleQueueError('This Module doesn\'t have any subscriber')
if queue_name: if queue_name:
@@ -80,8 +112,17 @@ class AILQueue:
raise ModuleQueueError('Queue name required. This module push to multiple queues') raise ModuleQueueError('Queue name required. This module push to multiple queues')
queue_name = list(self.subscribers_modules)[0] queue_name = list(self.subscribers_modules)[0]
message = f'{obj_global_id};{message}'
if obj_global_id != '::':
m_hash = xxhash.xxh3_64_hexdigest(message)
else:
m_hash = None
# Add message to all modules # Add message to all modules
for module_name in self.subscribers_modules[queue_name]: for module_name in self.subscribers_modules[queue_name]:
if m_hash:
add_processed_obj(obj_global_id, m_hash, queue=module_name)
r_queues.rpush(f'queue:{module_name}:in', message) r_queues.rpush(f'queue:{module_name}:in', message)
# stats # stats
nb_mess = r_queues.llen(f'queue:{module_name}:in') nb_mess = r_queues.llen(f'queue:{module_name}:in')
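The reworked queue wire format prefixes every message with the object's global id, and the processed-object bookkeeping is keyed on a 64-bit xxHash of the framed message. A minimal round-trip sketch of that framing (requires the `xxhash` package; the payload is illustrative):

```
import xxhash

obj_global_id = 'item::submitted/2023/10/11/submitted_b5440009-05d5-4494-a807-a6d8e4a900cf.gz'
message = f'{obj_global_id};some payload'           # producer side, as in send_message()
m_hash = xxhash.xxh3_64_hexdigest(message)          # key used by add_processed_obj()

recv_id, mess = message.split(';', 1)               # consumer side, as in get_message()
assert recv_id == obj_global_id
assert xxhash.xxh3_64_hexdigest(message) == m_hash  # same digest later closes the entry
```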
@@ -98,6 +139,7 @@ class AILQueue:
def error(self): def error(self):
r_queues.hdel(f'modules', f'{self.pid}:{self.name}') r_queues.hdel(f'modules', f'{self.pid}:{self.name}')
def get_queues_modules(): def get_queues_modules():
return r_queues.hkeys('queues') return r_queues.hkeys('queues')
@@ -132,6 +174,132 @@ def get_modules_queues_stats():
def clear_modules_queues_stats(): def clear_modules_queues_stats():
r_queues.delete('modules') r_queues.delete('modules')
# # # # # # # # #
# #
# OBJ QUEUES # PROCESS ??
# #
# # # # # # # # #
def get_processed_objs():
return r_obj_process.smembers(f'objs:process')
def get_processed_end_objs():
return r_obj_process.smembers(f'objs:processed')
def get_processed_end_obj():
return r_obj_process.spop(f'objs:processed')
def get_processed_objs_by_type(obj_type):
return r_obj_process.zrange(f'objs:process:{obj_type}', 0, -1)
def is_processed_obj_queued(obj_global_id):
return r_obj_process.exists(f'obj:queues:{obj_global_id}')
def is_processed_obj_moduled(obj_global_id):
return r_obj_process.exists(f'obj:modules:{obj_global_id}')
def is_processed_obj(obj_global_id):
return is_processed_obj_queued(obj_global_id) or is_processed_obj_moduled(obj_global_id)
def get_processed_obj_modules(obj_global_id):
return r_obj_process.zrange(f'obj:modules:{obj_global_id}', 0, -1)
def get_processed_obj_queues(obj_global_id):
return r_obj_process.zrange(f'obj:queues:{obj_global_id}', 0, -1)
def get_processed_obj(obj_global_id):
return {'modules': get_processed_obj_modules(obj_global_id), 'queues': get_processed_obj_queues(obj_global_id)}
def add_processed_obj(obj_global_id, m_hash, module=None, queue=None):
obj_type = obj_global_id.split(':', 1)[0]
new_obj = r_obj_process.sadd(f'objs:process', obj_global_id)
# first process:
if new_obj:
r_obj_process.zadd(f'objs:process:{obj_type}', {obj_global_id: int(time.time())})
if queue:
r_obj_process.zadd(f'obj:queues:{obj_global_id}', {f'{queue}:{m_hash}': int(time.time())})
if module:
r_obj_process.zadd(f'obj:modules:{obj_global_id}', {f'{module}:{m_hash}': int(time.time())})
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{module}:{m_hash}')
def end_processed_obj(obj_global_id, m_hash, module=None, queue=None):
if queue:
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{m_hash}')
if module:
r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{m_hash}')
# TODO HANDLE QUEUE DELETE
# process completed
if not is_processed_obj(obj_global_id):
obj_type = obj_global_id.split(':', 1)[0]
r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
r_obj_process.srem(f'objs:process', obj_global_id)
r_obj_process.sadd(f'objs:processed', obj_global_id) # TODO use list ??????
def rename_processed_obj(new_id, old_id):
module = get_processed_obj_modules(old_id)
# currently in a module
if len(module) == 1:
module, x_hash = module[0].split(':', 1)
obj_type = old_id.split(':', 1)[0]
r_obj_process.zrem(f'obj:modules:{old_id}', f'{module}:{x_hash}')
r_obj_process.zrem(f'objs:process:{obj_type}', old_id)
r_obj_process.srem(f'objs:process', old_id)
add_processed_obj(new_id, x_hash, module=module)
def get_last_queue_timeout():
epoch_update = r_obj_process.get('queue:obj:timeout:last')
if not epoch_update:
epoch_update = 0
return float(epoch_update)
def timeout_process_obj(obj_global_id):
for q in get_processed_obj_queues(obj_global_id):
queue, x_hash = q.split(':', 1)
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{x_hash}')
for m in get_processed_obj_modules(obj_global_id):
module, x_hash = m.split(':', 1)
r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{x_hash}')
obj_type = obj_global_id.split(':', 1)[0]
r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
r_obj_process.srem(f'objs:process', obj_global_id)
r_obj_process.sadd(f'objs:processed', obj_global_id)
print(f'timeout: {obj_global_id}')
def timeout_processed_objs():
curr_time = int(time.time())
time_limit = curr_time - timeout_queue_obj
for obj_type in ail_core.get_obj_queued():
for obj_global_id in r_obj_process.zrangebyscore(f'objs:process:{obj_type}', 0, time_limit):
timeout_process_obj(obj_global_id)
r_obj_process.set('queue:obj:timeout:last', time.time())
def delete_processed_obj(obj_global_id):
for q in get_processed_obj_queues(obj_global_id):
queue, x_hash = q.split(':', 1)
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{x_hash}')
for m in get_processed_obj_modules(obj_global_id):
module, x_hash = m.split(':', 1)
r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{x_hash}')
obj_type = obj_global_id.split(':', 1)[0]
r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
r_obj_process.srem(f'objs:process', obj_global_id)
###################################################################################
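To summarise the bookkeeping above: an object moves from `objs:process` to `objs:processed` once no queue entry and no module entry remains for it. An in-memory sketch of that invariant (the real implementation is Redis-backed; this only mirrors the set logic):

```
queues, modules = set(), set()

def add_processed(m_hash, module=None, queue=None):
    if queue:                              # queued for a subscriber module
        queues.add(f'{queue}:{m_hash}')
    if module:                             # picked up: leaves the queue set
        modules.add(f'{module}:{m_hash}')
        queues.discard(f'{module}:{m_hash}')

def end_processed(m_hash, module):
    modules.discard(f'{module}:{m_hash}')
    return not queues and not modules      # True -> fully processed

add_processed('ab12', queue='Tags')
add_processed('ab12', module='Tags')
assert end_processed('ab12', 'Tags') is True
```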
# # # # # # # #
# #
# GRAPH #
# #
# # # # # # # #
def get_queue_digraph(): def get_queue_digraph():
queues_ail = {} queues_ail = {}
modules = {} modules = {}
@@ -223,64 +391,13 @@ def save_queue_digraph():
sys.exit(1) sys.exit(1)
###########################################################################################
###########################################################################################
###########################################################################################
###########################################################################################
###########################################################################################
# def get_all_queues_name():
# return r_queues.hkeys('queues')
#
# def get_all_queues_dict_with_nb_elem():
# return r_queues.hgetall('queues')
#
# def get_all_queues_with_sorted_nb_elem():
# res = r_queues.hgetall('queues')
# res = sorted(res.items())
# return res
#
# def get_module_pid_by_queue_name(queue_name):
# return r_queues.smembers('MODULE_TYPE_{}'.format(queue_name))
#
# # # TODO: remove last msg part
# def get_module_last_process_start_time(queue_name, module_pid):
# res = r_queues.get('MODULE_{}_{}'.format(queue_name, module_pid))
# if res:
# return res.split(',')[0]
# return None
#
# def get_module_last_msg(queue_name, module_pid):
# return r_queues.get('MODULE_{}_{}_PATH'.format(queue_name, module_pid))
#
# def get_all_modules_queues_stats():
# all_modules_queues_stats = []
# for queue_name, nb_elem_queue in get_all_queues_with_sorted_nb_elem():
# l_module_pid = get_module_pid_by_queue_name(queue_name)
# for module_pid in l_module_pid:
# last_process_start_time = get_module_last_process_start_time(queue_name, module_pid)
# if last_process_start_time:
# last_process_start_time = datetime.datetime.fromtimestamp(int(last_process_start_time))
# seconds = int((datetime.datetime.now() - last_process_start_time).total_seconds())
# else:
# seconds = 0
# all_modules_queues_stats.append((queue_name, nb_elem_queue, seconds, module_pid))
# return all_modules_queues_stats
#
#
# def _get_all_messages_from_queue(queue_name):
# #self.r_temp.hset('queues', self.subscriber_name, int(self.r_temp.scard(in_set)))
# return r_queues.smembers(f'queue:{queue_name}:in')
#
# # def is_message_in queue(queue_name):
# # pass
#
# def remove_message_from_queue(queue_name, message):
# queue_key = f'queue:{queue_name}:in'
# r_queues.srem(queue_key, message)
# r_queues.hset('queues', queue_name, int(r_queues.scard(queue_key)))
if __name__ == '__main__': if __name__ == '__main__':
# clear_modules_queues_stats() # clear_modules_queues_stats()
save_queue_digraph() # save_queue_digraph()
obj_global_id = 'item::submitted/2023/10/11/submitted_b5440009-05d5-4494-a807-a6d8e4a900cf.gz'
# print(get_processed_obj(obj_global_id))
# delete_processed_obj(obj_global_id)
# while True:
# print(get_processed_obj(obj_global_id))
# time.sleep(0.5)
print(get_processed_end_objs())

View file

@@ -15,38 +15,15 @@ config_loader = ConfigLoader()
r_db = config_loader.get_db_conn("Kvrocks_DB") r_db = config_loader.get_db_conn("Kvrocks_DB")
config_loader = None config_loader = None
BACKGROUND_UPDATES = { # # # # # # # #
'v1.5': { # #
'nb_updates': 5, # UPDATE #
'message': 'Tags and Screenshots' # #
}, # # # # # # # #
'v2.4': {
'nb_updates': 1,
'message': ' Domains Tags and Correlations'
},
'v2.6': {
'nb_updates': 1,
'message': 'Domains Tags and Correlations'
},
'v2.7': {
'nb_updates': 1,
'message': 'Domains Tags'
},
'v3.4': {
'nb_updates': 1,
'message': 'Domains Languages'
},
'v3.7': {
'nb_updates': 1,
'message': 'Trackers first_seen/last_seen'
}
}
def get_ail_version(): def get_ail_version():
return r_db.get('ail:version') return r_db.get('ail:version')
def get_ail_float_version(): def get_ail_float_version():
version = get_ail_version() version = get_ail_version()
if version: if version:
@@ -55,6 +32,179 @@ def get_ail_float_version():
version = 0 version = 0
return version return version
# # # - - # # #
# # # # # # # # # # # #
# #
# UPDATE BACKGROUND #
# #
# # # # # # # # # # # #
BACKGROUND_UPDATES = {
'v5.2': {
'message': 'Compress HAR',
'scripts': ['compress_har.py']
},
}
class AILBackgroundUpdate:
"""
AIL Background Update.
"""
def __init__(self, version):
self.version = version
def _get_field(self, field):
return r_db.hget('ail:update:background', field)
def _set_field(self, field, value):
r_db.hset('ail:update:background', field, value)
def get_version(self):
return self.version
def get_message(self):
return BACKGROUND_UPDATES.get(self.version, {}).get('message', '')
def get_error(self):
return self._get_field('error')
def set_error(self, error): # TODO ADD LOGS
self._set_field('error', error)
def get_nb_scripts(self):
return len(BACKGROUND_UPDATES.get(self.version, {}).get('scripts', ['']))
def get_scripts(self):
return BACKGROUND_UPDATES.get(self.version, {}).get('scripts', [])
def get_nb_scripts_done(self):
done = self._get_field('done')
try:
done = int(done)
except (TypeError, ValueError):
done = 0
return done
def inc_nb_scripts_done(self):
self._set_field('done', self.get_nb_scripts_done() + 1)
def get_script(self):
return self._get_field('script')
def get_script_path(self):
path = os.path.basename(self.get_script())
if path:
return os.path.join(os.environ['AIL_HOME'], 'update', self.version, path)
def get_nb_to_update(self): # TODO use cache ?????
nb_to_update = self._get_field('nb_to_update')
if not nb_to_update:
nb_to_update = 1
return int(nb_to_update)
def set_nb_to_update(self, nb):
self._set_field('nb_to_update', int(nb))
def get_nb_updated(self): # TODO use cache ?????
nb_updated = self._get_field('nb_updated')
if not nb_updated:
nb_updated = 0
return int(nb_updated)
def inc_nb_updated(self): # TODO use cache ?????
r_db.hincrby('ail:update:background', 'nb_updated', 1)
def get_progress(self): # TODO use cache ?????
return self._get_field('progress')
def set_progress(self, progress):
self._set_field('progress', progress)
def update_progress(self):
nb_updated = self.get_nb_updated()
nb_to_update = self.get_nb_to_update()
if nb_updated == nb_to_update:
progress = 100
elif nb_updated > nb_to_update:
progress = 99
else:
progress = int((nb_updated * 100) / nb_to_update)
self.set_progress(progress)
print(f'{nb_updated}/{nb_to_update} updated {progress}%')
return progress
def is_running(self):
return r_db.hget('ail:update:background', 'version') == self.version
def get_meta(self, options=set()):
meta = {'version': self.get_version(),
'error': self.get_error(),
'script': self.get_script(),
'script_progress': self.get_progress(),
'nb_update': self.get_nb_scripts(),
'nb_completed': self.get_nb_scripts_done()}
meta['progress'] = int(meta['nb_completed'] * 100 / meta['nb_update'])
if 'message' in options:
meta['message'] = self.get_message()
return meta
def start(self):
self._set_field('version', self.version)
r_db.hdel('ail:update:background', 'error')
def start_script(self, script):
self.clear()
self._set_field('script', script)
self.set_progress(0)
def end_script(self):
self.set_progress(100)
self.inc_nb_scripts_done()
def clear(self):
r_db.hdel('ail:update:background', 'error')
r_db.hdel('ail:update:background', 'progress')
r_db.hdel('ail:update:background', 'nb_updated')
r_db.hdel('ail:update:background', 'nb_to_update')
def end(self):
r_db.delete('ail:update:background')
r_db.srem('ail:updates:background', self.version)
# To Add in update script
def add_background_update(version):
r_db.sadd('ail:updates:background', version)
def is_update_background_running():
return r_db.exists('ail:update:background')
def get_update_background_version():
return r_db.hget('ail:update:background', 'version')
def get_update_background_meta(options=set()):
version = get_update_background_version()
if version:
return AILBackgroundUpdate(version).get_meta(options=options)
else:
return {}
def get_update_background_to_launch():
to_launch = []
updates = r_db.smembers('ail:updates:background')
for version in BACKGROUND_UPDATES:
if version in updates:
to_launch.append(version)
return to_launch
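Note that `get_update_background_to_launch()` iterates `BACKGROUND_UPDATES` rather than the Redis set, so pending updates launch in declaration order (Python dicts preserve insertion order). A sketch, with a second, hypothetical version added for illustration:

```
BACKGROUND_UPDATES = {
    'v5.2': {'message': 'Compress HAR', 'scripts': ['compress_har.py']},
    'v9.9': {'message': 'hypothetical later update', 'scripts': []},
}
pending = {'v9.9', 'v5.2'}  # unordered, as returned by smembers()
to_launch = [version for version in BACKGROUND_UPDATES if version in pending]
assert to_launch == ['v5.2', 'v9.9']
```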
# # # - - # # #
##########################################################################################
##########################################################################################
##########################################################################################
def get_ail_all_updates(date_separator='-'): def get_ail_all_updates(date_separator='-'):
dict_update = r_db.hgetall('ail:update_date') dict_update = r_db.hgetall('ail:update_date')
@@ -87,111 +237,6 @@ def check_version(version):
return True return True
#### UPDATE BACKGROUND ####
def exits_background_update_to_launch():
return r_db.scard('ail:update:to_update') != 0
def is_version_in_background_update(version):
return r_db.sismember('ail:update:to_update', version)
def get_all_background_updates_to_launch():
return r_db.smembers('ail:update:to_update')
def get_current_background_update():
return r_db.get('ail:update:update_in_progress')
def get_current_background_update_script():
return r_db.get('ail:update:current_background_script')
def get_current_background_update_script_path(version, script_name):
return os.path.join(os.environ['AIL_HOME'], 'update', version, script_name)
def get_current_background_nb_update_completed():
return r_db.scard('ail:update:update_in_progress:completed')
def get_current_background_update_progress():
progress = r_db.get('ail:update:current_background_script_stat')
if not progress:
progress = 0
return int(progress)
def get_background_update_error():
return r_db.get('ail:update:error')
def add_background_updates_to_launch(version):
return r_db.sadd('ail:update:to_update', version)
def start_background_update(version):
r_db.delete('ail:update:error')
r_db.set('ail:update:update_in_progress', version)
def set_current_background_update_script(script_name):
r_db.set('ail:update:current_background_script', script_name)
r_db.set('ail:update:current_background_script_stat', 0)
def set_current_background_update_progress(progress):
r_db.set('ail:update:current_background_script_stat', progress)
def set_background_update_error(error):
r_db.set('ail:update:error', error)
def end_background_update_script():
r_db.sadd('ail:update:update_in_progress:completed')
def end_background_update(version):
r_db.delete('ail:update:update_in_progress')
r_db.delete('ail:update:current_background_script')
r_db.delete('ail:update:current_background_script_stat')
r_db.delete('ail:update:update_in_progress:completed')
r_db.srem('ail:update:to_update', version)
def clear_background_update():
r_db.delete('ail:update:error')
r_db.delete('ail:update:update_in_progress')
r_db.delete('ail:update:current_background_script')
r_db.delete('ail:update:current_background_script_stat')
r_db.delete('ail:update:update_in_progress:completed')
def get_update_background_message(version):
return BACKGROUND_UPDATES[version]['message']
# TODO: Detect error in subprocess
def get_update_background_metadata():
dict_update = {}
version = get_current_background_update()
if version:
dict_update['version'] = version
dict_update['script'] = get_current_background_update_script()
dict_update['script_progress'] = get_current_background_update_progress()
dict_update['nb_update'] = BACKGROUND_UPDATES[dict_update['version']]['nb_updates']
dict_update['nb_completed'] = get_current_background_nb_update_completed()
dict_update['progress'] = int(dict_update['nb_completed'] * 100 / dict_update['nb_update'])
dict_update['error'] = get_background_update_error()
return dict_update
##-- UPDATE BACKGROUND --##
if __name__ == '__main__': if __name__ == '__main__':
res = check_version('v3.1..1') res = check_version('v3.1..1')
print(res) print(res)

View file

@@ -1,6 +1,8 @@
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
import json
import logging
import os import os
import sys import sys
import requests import requests
@@ -8,6 +10,8 @@ import requests
sys.path.append(os.environ['AIL_BIN']) sys.path.append(os.environ['AIL_BIN'])
from lib.objects.CryptoCurrencies import CryptoCurrency from lib.objects.CryptoCurrencies import CryptoCurrency
logger = logging.getLogger()
blockchain_all = 'https://blockchain.info/rawaddr' blockchain_all = 'https://blockchain.info/rawaddr'
# pre-alpha script # pre-alpha script
@@ -18,23 +22,26 @@ def get_bitcoin_info(bitcoin_address, nb_transaction=50):
set_btc_in = set() set_btc_in = set()
set_btc_out = set() set_btc_out = set()
try: try:
req = requests.get('{}/{}?limit={}'.format(blockchain_all, bitcoin_address, nb_transaction)) req = requests.get(f'{blockchain_all}/{bitcoin_address}?limit={nb_transaction}')
jreq = req.json() jreq = req.json()
except Exception as e: except Exception as e:
print(e) logger.warning(e)
return dict_btc
if not jreq.get('n_tx'):
logger.critical(json.dumps(jreq))
return dict_btc return dict_btc
# print(json.dumps(jreq))
dict_btc['n_tx'] = jreq['n_tx'] dict_btc['n_tx'] = jreq['n_tx']
dict_btc['total_received'] = float(jreq['total_received'] / 100000000) dict_btc['total_received'] = float(jreq['total_received'] / 100000000)
dict_btc['total_sent'] = float(jreq['total_sent'] / 100000000) dict_btc['total_sent'] = float(jreq['total_sent'] / 100000000)
dict_btc['final_balance'] = float(jreq['final_balance'] / 100000000) dict_btc['final_balance'] = float(jreq['final_balance'] / 100000000)
for transaction in jreq['txs']: for transaction in jreq['txs']:
for input in transaction['inputs']: for t_input in transaction['inputs']:
if 'addr' in input['prev_out']: if 'addr' in t_input['prev_out']:
if input['prev_out']['addr'] != bitcoin_address: if t_input['prev_out']['addr'] != bitcoin_address:
set_btc_in.add(input['prev_out']['addr']) set_btc_in.add(t_input['prev_out']['addr'])
for output in transaction['out']: for output in transaction['out']:
if 'addr' in output: if 'addr' in output:
if output['addr'] != bitcoin_address: if output['addr'] != bitcoin_address:
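For context on the `/ 100000000` divisions above: blockchain.info reports amounts in satoshi, and 10^8 satoshi make one bitcoin, so:

```
total_received = 123456789            # satoshi, as returned by the API
print(total_received / 100000000)     # 1.23456789 BTC
```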

423
bin/lib/chats_viewer.py Executable file
View file

@@ -0,0 +1,423 @@
#!/usr/bin/python3
"""
Chats Viewer
===================
"""
import os
import sys
import time
import uuid
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects import Chats
from lib.objects import ChatSubChannels
from lib.objects import ChatThreads
from lib.objects import Messages
from lib.objects import UsersAccount
from lib.objects import Usernames
from lib import Language
config_loader = ConfigLoader()
r_db = config_loader.get_db_conn("Kvrocks_DB")
r_crawler = config_loader.get_db_conn("Kvrocks_Crawler")
r_cache = config_loader.get_redis_conn("Redis_Cache")
r_obj = config_loader.get_db_conn("Kvrocks_DB") # TEMP new DB ????
# # # # # # # #
# #
# COMMON #
# #
# # # # # # # #
# TODO ChatDefaultPlatform
# CHAT(type=chat, subtype=platform, id= chat_id)
# Channel(type=channel, subtype=platform, id=channel_id)
# Thread(type=thread, subtype=platform, id=thread_id)
# Message(type=message, subtype=platform, id=message_id)
# Protocol/Platform
# class ChatProtocols: # TODO Remove Me
#
# def __init__(self): # name ???? subtype, id ????
# # discord, mattermost, ...
# pass
#
# def get_chat_protocols(self):
# pass
#
# def get_chat_protocol(self, protocol):
# pass
#
# ################################################################
#
# def get_instances(self):
# pass
#
# def get_chats(self):
# pass
#
# def get_chats_by_instance(self, instance):
# pass
#
#
# class ChatNetwork: # uuid or protocol
# def __init__(self, network='default'):
# self.id = network
#
# def get_addresses(self):
# pass
#
#
# class ChatServerAddress: # uuid or protocol + network
# def __init__(self, address='default'):
# self.id = address
# map uuid -> type + field
# TODO option last protocol/ imported messages/chat -> unread mode ????
# # # # # # # # #
# #
# PROTOCOLS # IRC, discord, mattermost, ...
# #
# # # # # # # # # TODO icon => UI explorer by protocol + network + instance
def get_chat_protocols():
return r_obj.smembers(f'chat:protocols')
def get_chat_protocols_meta():
metas = []
for protocol_id in get_chat_protocols():
protocol = ChatProtocol(protocol_id)
metas.append(protocol.get_meta(options={'icon'}))
return metas
class ChatProtocol: # TODO first seen last seen ???? + nb by day ????
def __init__(self, protocol):
self.id = protocol
def exists(self):
return r_db.exists(f'chat:protocol:{self.id}')
def get_networks(self):
return r_db.smembers(f'chat:protocol:networks:{self.id}')
def get_nb_networks(self):
return r_db.scard(f'chat:protocol:networks:{self.id}')
def get_icon(self):
if self.id == 'discord':
icon = {'style': 'fab', 'icon': 'fa-discord'}
elif self.id == 'telegram':
icon = {'style': 'fab', 'icon': 'fa-telegram'}
else:
icon = {}
return icon
def get_meta(self, options=set()):
meta = {'id': self.id}
if 'icon' in options:
meta['icon'] = self.get_icon()
return meta
# def get_addresses(self):
# pass
#
# def get_instances_uuids(self):
# pass
# # # # # # # # # # # # # #
# #
# ChatServiceInstance #
# #
# # # # # # # # # # # # # #
# uuid -> protocol + network + server
class ChatServiceInstance:
def __init__(self, instance_uuid):
self.uuid = instance_uuid
def exists(self):
return r_obj.exists(f'chatSerIns:{self.uuid}')
def get_protocol(self): # return objects ????
return r_obj.hget(f'chatSerIns:{self.uuid}', 'protocol')
def get_network(self): # return objects ????
network = r_obj.hget(f'chatSerIns:{self.uuid}', 'network')
if network:
return network
def get_address(self): # return objects ????
address = r_obj.hget(f'chatSerIns:{self.uuid}', 'address')
if address:
return address
def get_meta(self, options=set()):
meta = {'uuid': self.uuid,
'protocol': self.get_protocol(),
'network': self.get_network(),
'address': self.get_address()}
if 'chats' in options:
meta['chats'] = []
for chat_id in self.get_chats():
meta['chats'].append(Chats.Chat(chat_id, self.uuid).get_meta({'created_at', 'icon', 'nb_subchannels', 'nb_messages'}))
return meta
def get_nb_chats(self):
return Chats.Chats().get_nb_ids_by_subtype(self.uuid)
def get_chats(self):
return Chats.Chats().get_ids_by_subtype(self.uuid)
def get_chat_service_instances():
return r_obj.smembers(f'chatSerIns:all')
def get_chat_service_instances_by_protocol(protocol):
instance_uuids = {}
for network in r_obj.smembers(f'chat:protocol:networks:{protocol}'):
inst_uuids = r_obj.hvals(f'map:chatSerIns:{protocol}:{network}')
if not network:
network = 'default'
instance_uuids[network] = inst_uuids
return instance_uuids
def get_chat_service_instance_uuid(protocol, network, address):
if not network:
network = ''
if not address:
address = ''
return r_obj.hget(f'map:chatSerIns:{protocol}:{network}', address)
def get_chat_service_instance_uuid_meta_from_network_dict(instance_uuids):
for network in instance_uuids:
metas = []
for instance_uuid in instance_uuids[network]:
metas.append(ChatServiceInstance(instance_uuid).get_meta())
instance_uuids[network] = metas
return instance_uuids
def get_chat_service_instance(protocol, network, address):
instance_uuid = get_chat_service_instance_uuid(protocol, network, address)
if instance_uuid:
return ChatServiceInstance(instance_uuid)
def create_chat_service_instance(protocol, network=None, address=None):
instance_uuid = get_chat_service_instance_uuid(protocol, network, address)
if instance_uuid:
return instance_uuid
else:
if not network:
network = ''
if not address:
address = ''
instance_uuid = str(uuid.uuid5(uuid.NAMESPACE_URL, f'{protocol}|{network}|{address}'))
r_obj.sadd(f'chatSerIns:all', instance_uuid)
# map instance - uuid
r_obj.hset(f'map:chatSerIns:{protocol}:{network}', address, instance_uuid)
r_obj.hset(f'chatSerIns:{instance_uuid}', 'protocol', protocol)
if network:
r_obj.hset(f'chatSerIns:{instance_uuid}', 'network', network)
if address:
r_obj.hset(f'chatSerIns:{instance_uuid}', 'address', address)
# protocols
r_obj.sadd(f'chat:protocols', protocol) # TODO first seen / last seen
# protocol -> network
r_obj.sadd(f'chat:protocol:networks:{protocol}', network)
return instance_uuid
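Instance uuids are deterministic: `uuid.uuid5` over `protocol|network|address` means re-importing the same chat service always resolves to the same instance uuid. A quick check (the values are illustrative):

```
import uuid

protocol, network, address = 'telegram', '', ''
a = str(uuid.uuid5(uuid.NAMESPACE_URL, f'{protocol}|{network}|{address}'))
b = str(uuid.uuid5(uuid.NAMESPACE_URL, f'{protocol}|{network}|{address}'))
assert a == b  # stable across runs and across AIL instances
```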
# INSTANCE ===> CHAT IDS
# protocol -> instance_uuids => for protocol->networks -> protocol+network => HGETALL
# protocol+network -> instance_uuids => HGETALL
# protocol -> networks ???default??? or ''
# --------------------------------------------------------
# protocol+network -> addresses => HKEYS
# protocol+network+addresse => HGET
# Chat -> subtype=uuid, id = chat id
# instance_uuid -> chat id
# protocol - uniq ID
# protocol + network -> uuid ????
# protocol + network + address -> uuid
#######################################################################################
def get_obj_chat(chat_type, chat_subtype, chat_id):
if chat_type == 'chat':
return Chats.Chat(chat_id, chat_subtype)
elif chat_type == 'chat-subchannel':
return ChatSubChannels.ChatSubChannel(chat_id, chat_subtype)
elif chat_type == 'chat-thread':
return ChatThreads.ChatThread(chat_id, chat_subtype)
def get_obj_chat_meta(obj_chat, new_options=set()):
options = set()
if obj_chat.type == 'chat':
options = {'created_at', 'icon', 'info', 'subchannels', 'threads', 'username'}
elif obj_chat.type == 'chat-subchannel':
options = {'chat', 'created_at', 'icon', 'nb_messages', 'threads'}
elif obj_chat.type == 'chat-thread':
options = {'chat', 'nb_messages'}
for option in new_options:
options.add(option)
return obj_chat.get_meta(options=options)
def get_subchannels_meta_from_global_id(subchannels, translation_target=None):
meta = []
for sub in subchannels:
_, instance_uuid, sub_id = sub.split(':', 2)
subchannel = ChatSubChannels.ChatSubChannel(sub_id, instance_uuid)
meta.append(subchannel.get_meta({'nb_messages', 'created_at', 'icon', 'translation'}, translation_target=translation_target))
return meta
def get_chat_meta_from_global_id(chat_global_id):
_, instance_uuid, chat_id = chat_global_id.split(':', 2)
chat = Chats.Chat(chat_id, instance_uuid)
return chat.get_meta()
def get_threads_metas(threads):
metas = []
for thread in threads:
metas.append(ChatThreads.ChatThread(thread['id'], thread['subtype']).get_meta(options={'name', 'nb_messages'}))
return metas
def get_username_meta_from_global_id(username_global_id):
_, instance_uuid, username_id = username_global_id.split(':', 2)
username = Usernames.Username(username_id, instance_uuid)
return username.get_meta()
#### API ####
def api_get_chat_service_instance(chat_instance_uuid):
chat_instance = ChatServiceInstance(chat_instance_uuid)
if not chat_instance.exists():
return {"status": "error", "reason": "Unknown uuid"}, 404
return chat_instance.get_meta({'chats'}), 200
def api_get_chat(chat_id, chat_instance_uuid, translation_target=None, nb=-1, page=-1):
chat = Chats.Chat(chat_id, chat_instance_uuid)
if not chat.exists():
return {"status": "error", "reason": "Unknown chat"}, 404
meta = chat.get_meta({'created_at', 'icon', 'info', 'nb_participants', 'subchannels', 'threads', 'translation', 'username'}, translation_target=translation_target)
if meta['username']:
meta['username'] = get_username_meta_from_global_id(meta['username'])
if meta['subchannels']:
meta['subchannels'] = get_subchannels_meta_from_global_id(meta['subchannels'], translation_target=translation_target)
else:
if translation_target not in Language.get_translation_languages():
translation_target = None
meta['messages'], meta['pagination'], meta['tags_messages'] = chat.get_messages(translation_target=translation_target, nb=nb, page=page)
return meta, 200
def api_get_nb_message_by_week(chat_id, chat_instance_uuid):
chat = Chats.Chat(chat_id, chat_instance_uuid)
if not chat.exists():
return {"status": "error", "reason": "Unknown chat"}, 404
week = chat.get_nb_message_this_week()
# week = chat.get_nb_message_by_week('20231109')
return week, 200
def api_get_chat_participants(chat_type, chat_subtype, chat_id):
if chat_type not in ['chat', 'chat-subchannel', 'chat-thread']:
return {"status": "error", "reason": "Unknown chat type"}, 400
chat_obj = get_obj_chat(chat_type, chat_subtype, chat_id)
if not chat_obj.exists():
return {"status": "error", "reason": "Unknown chat"}, 404
else:
meta = get_obj_chat_meta(chat_obj, new_options={'participants'})
chat_participants = []
for participant in meta['participants']:
user_account = UsersAccount.UserAccount(participant['id'], participant['subtype'])
chat_participants.append(user_account.get_meta({'icon', 'info', 'username'}))
meta['participants'] = chat_participants
return meta, 200
def api_get_subchannel(chat_id, chat_instance_uuid, translation_target=None, nb=-1, page=-1):
subchannel = ChatSubChannels.ChatSubChannel(chat_id, chat_instance_uuid)
if not subchannel.exists():
return {"status": "error", "reason": "Unknown subchannel"}, 404
meta = subchannel.get_meta({'chat', 'created_at', 'icon', 'nb_messages', 'nb_participants', 'threads', 'translation'}, translation_target=translation_target)
if meta['chat']:
meta['chat'] = get_chat_meta_from_global_id(meta['chat'])
if meta.get('threads'):
meta['threads'] = get_threads_metas(meta['threads'])
if meta.get('username'):
meta['username'] = get_username_meta_from_global_id(meta['username'])
meta['messages'], meta['pagination'], meta['tags_messages'] = subchannel.get_messages(translation_target=translation_target, nb=nb, page=page)
return meta, 200
def api_get_thread(thread_id, thread_instance_uuid, translation_target=None, nb=-1, page=-1):
thread = ChatThreads.ChatThread(thread_id, thread_instance_uuid)
if not thread.exists():
return {"status": "error", "reason": "Unknown thread"}, 404
meta = thread.get_meta({'chat', 'nb_messages', 'nb_participants'})
# if meta['chat']:
# meta['chat'] = get_chat_meta_from_global_id(meta['chat'])
meta['messages'], meta['pagination'], meta['tags_messages'] = thread.get_messages(translation_target=translation_target, nb=nb, page=page)
return meta, 200
def api_get_message(message_id, translation_target=None):
message = Messages.Message(message_id)
if not message.exists():
return {"status": "error", "reason": "Unknown uuid"}, 404
meta = message.get_meta({'chat', 'content', 'files-names', 'icon', 'images', 'link', 'parent', 'parent_meta', 'reactions', 'thread', 'translation', 'user-account'}, translation_target=translation_target)
return meta, 200
def api_get_user_account(user_id, instance_uuid, translation_target=None):
user_account = UsersAccount.UserAccount(user_id, instance_uuid)
if not user_account.exists():
return {"status": "error", "reason": "Unknown user-account"}, 404
meta = user_account.get_meta({'chats', 'icon', 'info', 'subchannels', 'threads', 'translation', 'username', 'username_meta'}, translation_target=translation_target)
return meta, 200
# # # # # # # # # # LATER
# #
# ChatCategory #
# #
# # # # # # # # # #
if __name__ == '__main__':
r = get_chat_service_instances()
print(r)
r = ChatServiceInstance(r.pop())
print(r.get_meta({'chats'}))
# r = get_chat_protocols()
# print(r)

View file

@@ -41,14 +41,26 @@ config_loader = None
################################## ##################################
CORRELATION_TYPES_BY_OBJ = { CORRELATION_TYPES_BY_OBJ = {
"cryptocurrency": ["domain", "item"], "chat": ["chat-subchannel", "chat-thread", "image", "user-account"], # message or direct correlation like cve, bitcoin, ... ???
"cve": ["domain", "item"], "chat-subchannel": ["chat", "chat-thread", "image", "message", "user-account"],
"decoded": ["domain", "item"], "chat-thread": ["chat", "chat-subchannel", "image", "message", "user-account"], # TODO user account
"domain": ["cve", "cryptocurrency", "decoded", "item", "pgp", "username", "screenshot"], "cookie-name": ["domain"],
"item": ["cve", "cryptocurrency", "decoded", "domain", "pgp", "username", "screenshot"], "cryptocurrency": ["domain", "item", "message"],
"pgp": ["domain", "item"], "cve": ["domain", "item", "message"],
"username": ["domain", "item"], "decoded": ["domain", "item", "message"],
"domain": ["cve", "cookie-name", "cryptocurrency", "decoded", "etag", "favicon", "hhhash", "item", "pgp", "title", "screenshot", "username"],
"etag": ["domain"],
"favicon": ["domain", "item"], # TODO Decoded
"file-name": ["chat", "message"],
"hhhash": ["domain"],
"image": ["chat", "message", "user-account"],
"item": ["cve", "cryptocurrency", "decoded", "domain", "favicon", "pgp", "screenshot", "title", "username"], # chat ???
"message": ["chat", "chat-subchannel", "chat-thread", "cve", "cryptocurrency", "decoded", "file-name", "image", "pgp", "user-account"], # chat ??
"pgp": ["domain", "item", "message"],
"screenshot": ["domain", "item"], "screenshot": ["domain", "item"],
"title": ["domain", "item"],
"user-account": ["chat", "chat-subchannel", "chat-thread", "image", "message", "username"],
"username": ["domain", "item", "message", "user-account"],
} }
def get_obj_correl_types(obj_type):

@@ -60,6 +72,8 @@ def sanityze_obj_correl_types(obj_type, correl_types):
    correl_types = set(correl_types).intersection(obj_correl_types)
    if not correl_types:
        correl_types = obj_correl_types
    if not correl_types:
        return []
    return correl_types
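Taken together with the mapping above, the sanitizer means an unsupported filter silently falls back to the object's full correlation list, and the new `if not correl_types: return []` guard also covers object types with no entry at all. For illustration (values taken from the mapping):

```
# 'favicon' is not a valid correlation type for 'cookie-name', so the
# sanitizer falls back to the full list from CORRELATION_TYPES_BY_OBJ.
print(sanityze_obj_correl_types('cookie-name', ['favicon']))  # ['domain']
print(sanityze_obj_correl_types('cookie-name', ['domain']))   # {'domain'}
```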
def get_nb_correlation_by_correl_type(obj_type, subtype, obj_id, correl_type):

@@ -109,6 +123,9 @@ def is_obj_correlated(obj_type, subtype, obj_id, obj2_type, subtype2, obj2_id):
    except:
        return False

def get_obj_inter_correlation(obj_type1, subtype1, obj_id1, obj_type2, subtype2, obj_id2, correl_type):
    return r_metadata.sinter(f'correlation:obj:{obj_type1}:{subtype1}:{correl_type}:{obj_id1}', f'correlation:obj:{obj_type2}:{subtype2}:{correl_type}:{obj_id2}')

def add_obj_correlation(obj1_type, subtype1, obj1_id, obj2_type, subtype2, obj2_id):
    if subtype1 is None:
        subtype1 = ''
@@ -164,20 +181,22 @@ def delete_obj_correlations(obj_type, subtype, obj_id):
def get_obj_str_id(obj_type, subtype, obj_id):
    if subtype is None:
        subtype = ''
    return f'{obj_type}:{subtype}:{obj_id}'

def get_correlations_graph_nodes_links(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1, objs_hidden=set(), flask_context=False):
    links = set()
    nodes = set()
    meta = {'complete': True, 'objs': set()}
    obj_str_id = get_obj_str_id(obj_type, subtype, obj_id)
    _get_correlations_graph_node(links, nodes, meta, obj_type, subtype, obj_id, level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, previous_str_obj='')
    return obj_str_id, nodes, links, meta

def _get_correlations_graph_node(links, nodes, meta, obj_type, subtype, obj_id, level, max_nodes, filter_types=[], objs_hidden=set(), previous_str_obj=''):
    obj_str_id = get_obj_str_id(obj_type, subtype, obj_id)
    meta['objs'].add(obj_str_id)
    nodes.add(obj_str_id)
    obj_correlations = get_correlations(obj_type, subtype, obj_id, filter_types=filter_types)
@@ -186,15 +205,22 @@ def _get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level,
    for str_obj in obj_correlations[correl_type]:
        subtype2, obj2_id = str_obj.split(':', 1)
        obj2_str_id = get_obj_str_id(correl_type, subtype2, obj2_id)
        # filter objects to hide
        if obj2_str_id in objs_hidden:
            continue
        meta['objs'].add(obj2_str_id)
        if obj2_str_id == previous_str_obj:
            continue
        if len(nodes) > max_nodes != 0:
            meta['complete'] = False
            break
        nodes.add(obj2_str_id)
        links.add((obj_str_id, obj2_str_id))
        if level > 0:
            next_level = level - 1
            _get_correlations_graph_node(links, nodes, meta, correl_type, subtype2, obj2_id, next_level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, previous_str_obj=obj_str_id)
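The graph builder now reports whether the walk was truncated and which objects it touched. A hypothetical caller (object type, id and filters are illustrative):

```
obj_str_id, nodes, links, meta = get_correlations_graph_nodes_links(
    'domain', '', 'example.onion',
    filter_types=['cookie-name', 'etag'],
    max_nodes=100, level=2, objs_hidden=set())
if not meta['complete']:
    print(f'graph truncated at {len(nodes)} nodes')  # max_nodes was reached
for src, dst in links:
    print(src, '->', dst)  # node ids are 'type:subtype:id' strings
```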


@@ -36,8 +36,10 @@ sys.path.append(os.environ['AIL_BIN'])
# Import Project packages
##################################
from packages import git_status
from packages import Date
from lib.ConfigLoader import ConfigLoader
from lib.objects.Domains import Domain
from lib.objects import HHHashs
from lib.objects.Items import Item

config_loader = ConfigLoader()
@@ -74,8 +76,8 @@ def get_current_date(separator=False):
def get_date_crawled_items_source(date):
    return os.path.join('crawled', date)

def get_har_dir():
    return HAR_DIR

def is_valid_onion_domain(domain):
    if not domain.endswith('.onion'):
@@ -133,7 +135,7 @@ def unpack_url(url):
# # # # # # # # TODO CREATE NEW OBJECT
def get_favicon_from_html(html, domain, url):
    favicon_urls, favicons = extract_favicon_from_html(html, url)
    # add root favicon
    if not favicon_urls:
        favicon_urls.add(f'{urlparse(url).scheme}://{domain}/favicon.ico')

@@ -141,9 +143,11 @@ def get_favicon_from_html(html, domain, url):
    return favicon_urls
def extract_favicon_from_html(html, url):
    favicons = set()
    favicons_urls = set()

    soup = BeautifulSoup(html, 'html.parser')
    all_icons = set()
    # If there are multiple <link rel="icon">s, the browser uses their media,
    # type, and sizes attributes to select the most appropriate icon.
    # If several icons are equally appropriate, the last one is used.

@@ -159,30 +163,293 @@ def extract_favicon_from_html(html, url):
    # - <meta name="msapplication-TileColor" content="#aaaaaa"> <meta name="theme-color" content="#ffffff">
    # - <meta name="msapplication-config" content="/icons/browserconfig.xml">

    # Root Favicon
    f = get_faup()
    f.decode(url)
    url_decoded = f.get()
    root_domain = f"{url_decoded['scheme']}://{url_decoded['domain']}"
    default_icon = f'{root_domain}/favicon.ico'
    favicons_urls.add(default_icon)
    # print(default_icon)

    # shortcut
    for shortcut in soup.find_all('link', rel='shortcut icon'):
        all_icons.add(shortcut)
    # icons
    for icon in soup.find_all('link', rel='icon'):
        all_icons.add(icon)
    for mask_icon in soup.find_all('link', rel='mask-icon'):
        all_icons.add(mask_icon)
    for apple_touche_icon in soup.find_all('link', rel='apple-touch-icon'):
        all_icons.add(apple_touche_icon)
    for msapplication in soup.find_all('meta', attrs={'name': 'msapplication-TileImage'}):  # msapplication-TileColor
        all_icons.add(msapplication)
    # msapplication-TileImage

    # print(all_icons)
    for tag in all_icons:
        icon_url = tag.get('href')
        if icon_url:
            if icon_url.startswith('//'):
                icon_url = icon_url.replace('//', '/')
            if icon_url.startswith('data:'):
                data = icon_url.split(',', 1)
                if len(data) > 1:
                    data = ''.join(data[1].split())
                    favicon = base64.b64decode(data)
                    if favicon:
                        favicons.add(favicon)
            else:
                favicon_url = urljoin(url, icon_url)
                favicons_urls.add(favicon_url)
        elif tag.get('name') == 'msapplication-TileImage':
            icon_url = tag.get('content')
            if icon_url:
                if icon_url.startswith('data:'):
                    data = icon_url.split(',', 1)
                    if len(data) > 1:
                        data = ''.join(data[1].split())
                        favicon = base64.b64decode(data)
                        if favicon:
                            favicons.add(favicon)
                else:
                    favicon_url = urljoin(url, icon_url)
                    favicons_urls.add(favicon_url)
                    print(favicon_url)

    # print(favicons_urls)
    return favicons_urls, favicons
# mmh3.hash(favicon)
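A quick check of the new return contract, a tuple of favicon URLs plus raw icons decoded from `data:` URIs; this sketch assumes the module's BeautifulSoup/faup helpers are importable and uses a made-up page URL:

```
html = '''<html><head>
<link rel="icon" href="/static/favicon.png">
<link rel="apple-touch-icon" href="https://cdn.example.com/touch.png">
</head></html>'''

favicon_urls, favicons = extract_favicon_from_html(html, 'http://example.com/index.html')
# favicon_urls -> {'http://example.com/favicon.ico',        (root default)
#                  'http://example.com/static/favicon.png',
#                  'https://cdn.example.com/touch.png'}
# favicons     -> set()  (raw bytes are only collected from data: URIs)
```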
# # # - - # # #
# # # # # # # #
# #
# TITLE #
# #
# # # # # # # #
def extract_title_from_html(html):
soup = BeautifulSoup(html, 'html.parser')
title = soup.title
if title:
title = title.string
if title:
return str(title)
return ''
def extract_description_from_html(html):
soup = BeautifulSoup(html, 'html.parser')
description = soup.find('meta', attrs={'name': 'description'})
if description:
return description['content']
return ''
def extract_keywords_from_html(html):
soup = BeautifulSoup(html, 'html.parser')
keywords = soup.find('meta', attrs={'name': 'keywords'})
if keywords:
return keywords['content']
return ''
def extract_author_from_html(html):
soup = BeautifulSoup(html, 'html.parser')
author = soup.find('meta', attrs={'name': 'author'})
if author:
return author['content']
return ''
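The four helpers above share the same pattern; exercised on an inline snippet:

```
html = '''<html><head><title>AIL Project</title>
<meta name="description" content="Analysis Information Leak framework">
<meta name="keywords" content="ail,leak,analysis">
<meta name="author" content="CIRCL">
</head></html>'''

print(extract_title_from_html(html))        # AIL Project
print(extract_description_from_html(html))  # Analysis Information Leak framework
print(extract_keywords_from_html(html))     # ail,leak,analysis
print(extract_author_from_html(html))       # CIRCL
```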
# # # - - # # #
# # # # # # # #
# #
# HAR #
# #
# # # # # # # #
def create_har_id(date, item_id):
item_id = item_id.split('/')[-1]
return os.path.join(date, f'{item_id}.json.gz')
def save_har(har_id, har_content):
# create dir
har_dir = os.path.dirname(os.path.join(get_har_dir(), har_id))
if not os.path.exists(har_dir):
os.makedirs(har_dir)
# save HAR
filename = os.path.join(get_har_dir(), har_id)
with gzip.open(filename, 'wb') as f:
f.write(json.dumps(har_content).encode())
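HAR ids embed the capture date, so storage stays date-partitioned. A sketch of the round trip (the date format and item id are illustrative assumptions):

```
har_id = create_har_id('2024/02/07', 'crawled/2024/02/07/example.onion-uuid')
# -> '2024/02/07/example.onion-uuid.json.gz'
save_har(har_id, {'log': {'entries': []}})   # gzipped JSON under HAR_DIR
har = get_har_content(har_id)                # -> {'log': {'entries': []}}
```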
def get_all_har_ids():
har_ids = []
today_root_dir = os.path.join(HAR_DIR, Date.get_today_date_str(separator=True))
dirs_year = set()
for ydir in next(os.walk(HAR_DIR))[1]:
if len(ydir) == 4:
try:
int(ydir)
dirs_year.add(ydir)
except (TypeError, ValueError):
pass
if os.path.exists(today_root_dir):
for file in [f for f in os.listdir(today_root_dir) if os.path.isfile(os.path.join(today_root_dir, f))]:
har_id = os.path.relpath(os.path.join(today_root_dir, file), HAR_DIR)
har_ids.append(har_id)
for ydir in sorted(dirs_year, reverse=False):
search_dear = os.path.join(HAR_DIR, ydir)
for root, dirs, files in os.walk(search_dear):
for file in files:
if root != today_root_dir:
har_id = os.path.relpath(os.path.join(root, file), HAR_DIR)
har_ids.append(har_id)
return har_ids
def get_month_har_ids(year, month):
har_ids = []
month_path = os.path.join(HAR_DIR, year, month)
for root, dirs, files in os.walk(month_path):
for file in files:
har_id = os.path.relpath(os.path.join(root, file), HAR_DIR)
har_ids.append(har_id)
return har_ids
def get_har_content(har_id):
har_path = os.path.join(HAR_DIR, har_id)
try:
with gzip.open(har_path) as f:
try:
return json.loads(f.read())
except json.decoder.JSONDecodeError:
return {}
except Exception as e:
print(e) # TODO LOGS
return {}
def extract_cookies_names_from_har(har):
cookies = set()
for entrie in har.get('log', {}).get('entries', []):
for cookie in entrie.get('request', {}).get('cookies', []):
name = cookie.get('name')
if name:
cookies.add(name)
for cookie in entrie.get('response', {}).get('cookies', []):
name = cookie.get('name')
if name:
cookies.add(name)
return cookies
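Cookie names are collected from both request and response entries; a self-contained check with a minimal in-memory HAR:

```
har = {'log': {'entries': [
    {'request': {'cookies': [{'name': 'sessionid', 'value': 'x'}]},
     'response': {'cookies': [{'name': 'csrftoken', 'value': 'y'}]}},
]}}
print(extract_cookies_names_from_har(har))  # {'sessionid', 'csrftoken'}
```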
def _reprocess_all_hars_cookie_name():
from lib.objects import CookiesNames
for har_id in get_all_har_ids():
domain = har_id.split('/')[-1]
domain = domain[:-44]
date = har_id.split('/')
date = f'{date[-4]}{date[-3]}{date[-2]}'
for cookie_name in extract_cookies_names_from_har(get_har_content(har_id)):
print(domain, date, cookie_name)
cookie = CookiesNames.create(cookie_name)
cookie.add(date, Domain(domain))
def extract_etag_from_har(har): # TODO check response url
etags = set()
for entrie in har.get('log', {}).get('entries', []):
for header in entrie.get('response', {}).get('headers', []):
if header.get('name') == 'etag':
# print(header)
etag = header.get('value')
if etag:
etags.add(etag)
return etags
def _reprocess_all_hars_etag():
from lib.objects import Etags
for har_id in get_all_har_ids():
domain = har_id.split('/')[-1]
domain = domain[:-44]
date = har_id.split('/')
date = f'{date[-4]}{date[-3]}{date[-2]}'
for etag_content in extract_etag_from_har(get_har_content(har_id)):
print(domain, date, etag_content)
etag = Etags.create(etag_content)
etag.add(date, Domain(domain))
def extract_hhhash_by_id(har_id, domain, date):
return extract_hhhash(get_har_content(har_id), domain, date)
def extract_hhhash(har, domain, date):
hhhashs = set()
urls = set()
for entrie in har.get('log', {}).get('entries', []):
url = entrie.get('request').get('url')
if url not in urls:
# filter redirect
if entrie.get('response').get('status') == 200: # != 301:
# print(url, entrie.get('response').get('status'))
f = get_faup()
f.decode(url)
domain_url = f.get().get('domain')
if domain_url == domain:
headers = entrie.get('response').get('headers')
hhhash_header = HHHashs.build_hhhash_headers(headers)
hhhash = HHHashs.hhhash_headers(hhhash_header)
if hhhash not in hhhashs:
print('', url, hhhash)
# -----
obj = HHHashs.create(hhhash_header, hhhash)
obj.add(date, Domain(domain))
hhhashs.add(hhhash)
urls.add(url)
print()
print()
print('HHHASH:')
for hhhash in hhhashs:
print(hhhash)
return hhhashs
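HHHashs.build_hhhash_headers/hhhash_headers live in lib/objects/HHHashs.py; conceptually an HHHash fingerprints the ordered response header names. A standalone sketch of that idea (a simplification under assumed canonicalization, not the library code):

```
import hashlib

def sketch_hhhash(headers):
    # headers: list of {'name': ..., 'value': ...} dicts from a HAR response;
    # only the header *names*, in order, contribute to the fingerprint.
    hhhash_header = ':'.join(h['name'] for h in headers)
    return 'hhh:1:' + hashlib.sha256(hhhash_header.encode()).hexdigest()

print(sketch_hhhash([{'name': 'Server', 'value': 'nginx'},
                     {'name': 'Content-Type', 'value': 'text/html'}]))
```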
def _reprocess_all_hars_hhhashs():
for har_id in get_all_har_ids():
print()
print(har_id)
domain = har_id.split('/')[-1]
domain = domain[:-44]
date = har_id.split('/')
date = f'{date[-4]}{date[-3]}{date[-2]}'
extract_hhhash_by_id(har_id, domain, date)
def _gzip_har(har_id):
har_path = os.path.join(HAR_DIR, har_id)
new_id = f'{har_path}.gz'
if not har_id.endswith('.gz'):
if not os.path.exists(new_id):
with open(har_path, 'rb') as f:
content = f.read()
if content:
with gzip.open(new_id, 'wb') as f:
r = f.write(content)
print(r)
if os.path.exists(new_id) and os.path.exists(har_path):
os.remove(har_path)
print('delete:', har_path)
def _gzip_all_hars():
for har_id in get_all_har_ids():
_gzip_har(har_id)
# # # - - # # #
################################################################################
@@ -498,8 +765,7 @@ class Cookie:
        meta[field] = value
    if r_json:
        data = json.dumps(meta, indent=4, sort_keys=True)
        meta = {'data': data, 'uuid': self.uuid}
    return meta

    def edit(self, cookie_dict):
@@ -803,6 +1069,7 @@ class CrawlerScheduler:
            task_uuid = create_task(meta['url'], depth=meta['depth'], har=meta['har'], screenshot=meta['screenshot'],
                                    header=meta['header'],
                                    cookiejar=meta['cookiejar'], proxy=meta['proxy'],
                                    tags=meta['tags'],
                                    user_agent=meta['user_agent'], parent='scheduler', priority=40)
            if task_uuid:
                schedule.set_task(task_uuid)
@@ -905,6 +1172,14 @@ class CrawlerSchedule:
    def _set_field(self, field, value):
        return r_crawler.hset(f'schedule:{self.uuid}', field, value)

    def get_tags(self):
        return r_crawler.smembers(f'schedule:tags:{self.uuid}')

    def set_tags(self, tags=[]):
        for tag in tags:
            r_crawler.sadd(f'schedule:tags:{self.uuid}', tag)
            # Tag.create_custom_tag(tag)
    def get_meta(self, ui=False):
        meta = {
            'uuid': self.uuid,

@@ -919,6 +1194,7 @@ class CrawlerSchedule:
            'cookiejar': self.get_cookiejar(),
            'header': self.get_header(),
            'proxy': self.get_proxy(),
            'tags': self.get_tags(),
        }
        status = self.get_status()
        if ui:

@@ -934,6 +1210,7 @@ class CrawlerSchedule:
        meta = {'uuid': self.uuid,
                'url': self.get_url(),
                'user': self.get_user(),
                'tags': self.get_tags(),
                'next_run': self.get_next_run(r_str=True)}
        status = self.get_status()
        if isinstance(status, ScheduleStatus):
@@ -942,7 +1219,7 @@ class CrawlerSchedule:
        return meta

    def create(self, frequency, user, url,
               depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None, tags=[]):

        if self.exists():
            raise Exception('Error: Monitor already exists')

@@ -971,6 +1248,9 @@ class CrawlerSchedule:
        if user_agent:
            self._set_field('user_agent', user_agent)

        if tags:
            self.set_tags(tags)

        r_crawler.sadd('scheduler:schedules', self.uuid)

    def delete(self):

@@ -984,12 +1264,13 @@ class CrawlerSchedule:
        # delete meta
        r_crawler.delete(f'schedule:{self.uuid}')
        r_crawler.delete(f'schedule:tags:{self.uuid}')

        r_crawler.srem('scheduler:schedules', self.uuid)

def create_schedule(frequency, user, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None, tags=[]):
    schedule_uuid = gen_uuid()
    schedule = CrawlerSchedule(schedule_uuid)
    schedule.create(frequency, user, url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar, proxy=proxy, user_agent=user_agent, tags=tags)
    return schedule_uuid

# TODO sanityze UUID
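A hypothetical call, assuming the frequency string format `months:weeks:days:hours:minutes` assembled by the API further down; the user, URL and tag values are illustrative:

```
# '0:1:0:0:0' -> every week
schedule_uuid = create_schedule('0:1:0:0:0', 'admin@admin.test',
                                'http://example.onion',
                                depth=1, har=True, screenshot=True,
                                proxy='force_tor',
                                tags=['infoleak:automatic-detection="onion"'])
```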
@@ -1046,18 +1327,29 @@ class CrawlerCapture:
        if task_uuid:
            return CrawlerTask(task_uuid)

    def get_start_time(self, r_str=True):
        start_time = self.get_task().get_start_time()
        if r_str:
            return start_time
        elif not start_time:
            return 0
        else:
            start_time = datetime.strptime(start_time, "%Y/%m/%d - %H:%M.%S").timestamp()
            return int(start_time)

    def get_status(self):
        status = r_cache.hget(f'crawler:capture:{self.uuid}', 'status')
        if not status:
            status = -1
        return status

    def is_ongoing(self):
        return self.get_status() == CaptureStatus.ONGOING

    def create(self, task_uuid):
        if self.exists():
            print(f'Capture {self.uuid} already exists')  # TODO LOGS
            return None
        launch_time = int(time.time())
        r_crawler.hset(f'crawler:task:{task_uuid}', 'capture', self.uuid)
        r_crawler.hset('crawler:captures:tasks', self.uuid, task_uuid)
@@ -1068,7 +1360,7 @@ class CrawlerCapture:
    def update(self, status):
        # Error or Reload
        if not status:
            r_cache.hset(f'crawler:capture:{self.uuid}', 'status', CaptureStatus.UNKNOWN.value)
            r_cache.zadd('crawler:captures', {self.uuid: 0})
        else:
            last_check = int(time.time())

@@ -1122,6 +1414,11 @@ def get_captures_status():
        status.append(meta)
    return status

def delete_captures():
    for capture_uuid in get_crawler_captures():
        capture = CrawlerCapture(capture_uuid)
        capture.delete()

##-- CRAWLER STATE --##
@@ -1204,6 +1501,14 @@ class CrawlerTask:
    def _set_field(self, field, value):
        return r_crawler.hset(f'crawler:task:{self.uuid}', field, value)

    def get_tags(self):
        return r_crawler.smembers(f'crawler:task:tags:{self.uuid}')

    def set_tags(self, tags):
        for tag in tags:
            r_crawler.sadd(f'crawler:task:tags:{self.uuid}', tag)
            # Tag.create_custom_tag(tag)

    def get_meta(self):
        meta = {
            'uuid': self.uuid,

@@ -1218,6 +1523,7 @@ class CrawlerTask:
            'header': self.get_header(),
            'proxy': self.get_proxy(),
            'parent': self.get_parent(),
            'tags': self.get_tags(),
        }
        return meta
@@ -1225,7 +1531,7 @@ class CrawlerTask:
    # TODO SANITIZE PRIORITY
    # PRIORITY: discovery = 0/10, feeder = 10, manual = 50, auto = 40, test = 100
    def create(self, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None,
               user_agent=None, tags=[], parent='manual', priority=0, external=False):
        if self.exists():
            raise Exception('Error: Task already exists')

@@ -1256,7 +1562,7 @@ class CrawlerTask:
        # TODO SANITIZE COOKIEJAR -> UUID

        # Check if already in queue
        hash_query = get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header, tags)
        if r_crawler.hexists(f'crawler:queue:hash', hash_query):
            self.uuid = r_crawler.hget(f'crawler:queue:hash', hash_query)
            return self.uuid

@@ -1277,9 +1583,12 @@ class CrawlerTask:
        if user_agent:
            self._set_field('user_agent', user_agent)

        if tags:
            self.set_tags(tags)

        r_crawler.hset('crawler:queue:hash', hash_query, self.uuid)
        self._set_field('hash', hash_query)
        if not external:
            self.add_to_db_crawler_queue(priority)
        # UI
        domain_type = dom.get_domain_type()
@@ -1293,6 +1602,11 @@ class CrawlerTask:
    def start(self):
        self._set_field('start_time', datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))

    def reset(self):
        priority = 49
        r_crawler.hdel(f'crawler:task:{self.uuid}', 'start_time')
        self.add_to_db_crawler_queue(priority)

    # Crawler
    def remove(self):  # zrem cache + DB
        capture_uuid = self.get_capture()
@@ -1316,10 +1630,10 @@ class CrawlerTask:
# TODO move to class ???
def get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header, tags):
    to_enqueue = {'domain': domain, 'depth': depth, 'har': har, 'screenshot': screenshot,
                  'priority': priority, 'proxy': proxy, 'cookiejar': cookiejar, 'user_agent': user_agent,
                  'header': header, 'tags': tags}
    if priority != 0:
        to_enqueue['url'] = url
    return hashlib.sha512(pickle.dumps(to_enqueue)).hexdigest()
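Since tags are now part of the pickled dict, two submissions only dedupe to the same queued task when their tag lists match (order included). A sketch with made-up values:

```
h1 = get_task_hash('http://example.onion', 'example.onion', 1, True, True,
                   90, 'force_tor', None, None, None, ['tag-a'])
h2 = get_task_hash('http://example.onion', 'example.onion', 1, True, True,
                   90, 'force_tor', None, None, None, ['tag-a'])
assert h1 == h2  # identical parameters -> same queue slot
```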
@@ -1330,12 +1644,11 @@ def add_task_to_lacus_queue():
        return None
    task_uuid, priority = task_uuid[0]
    task = CrawlerTask(task_uuid)
    return task, priority
# PRIORITY: discovery = 0/10, feeder = 10, manual = 50, auto = 40, test = 100
def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None,
                user_agent=None, tags=[], parent='manual', priority=0, task_uuid=None, external=False):
    if task_uuid:
        if CrawlerTask(task_uuid).exists():
            task_uuid = gen_uuid()

@@ -1343,7 +1656,8 @@ def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=
        task_uuid = gen_uuid()
    task = CrawlerTask(task_uuid)
    task_uuid = task.create(url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar,
                            proxy=proxy, user_agent=user_agent, tags=tags, parent=parent, priority=priority,
                            external=external)
    return task_uuid
@@ -1353,7 +1667,8 @@ def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=
# # TODO: ADD user agent
# # TODO: sanitize URL

def api_parse_task_dict_basic(data, user_id):
    url = data.get('url', None)
    if not url or url == '\n':
        return {'status': 'error', 'reason': 'No url supplied'}, 400

@@ -1379,6 +1694,31 @@ def api_add_crawler_task(data, user_id=None):
    else:
        depth_limit = 0

    # PROXY
    proxy = data.get('proxy', None)
    if proxy == 'onion' or proxy == 'tor' or proxy == 'force_tor':
        proxy = 'force_tor'
    elif proxy:
        verify = api_verify_proxy(proxy)
        if verify[1] != 200:
            return verify

    tags = data.get('tags', [])

    return {'url': url, 'depth_limit': depth_limit, 'har': har, 'screenshot': screenshot, 'proxy': proxy, 'tags': tags}, 200
def api_add_crawler_task(data, user_id=None):
    task, resp = api_parse_task_dict_basic(data, user_id)
    if resp != 200:
        return task, resp
    url = task['url']
    screenshot = task['screenshot']
    har = task['har']
    depth_limit = task['depth_limit']
    proxy = task['proxy']
    tags = task['tags']

    cookiejar_uuid = data.get('cookiejar', None)
    if cookiejar_uuid:
        cookiejar = Cookiejar(cookiejar_uuid)

@@ -1390,6 +1730,19 @@ def api_add_crawler_task(data, user_id=None):
            return {'error': 'The access to this cookiejar is restricted'}, 403
        cookiejar_uuid = cookiejar.uuid

    cookies = data.get('cookies', None)
    if not cookiejar_uuid and cookies:
        # Create new cookiejar
        cookiejar_uuid = create_cookiejar(user_id, "single-shot cookiejar", 1, None)
        cookiejar = Cookiejar(cookiejar_uuid)
        for cookie in cookies:
            try:
                name = cookie.get('name')
                value = cookie.get('value')
                cookiejar.add_cookie(name, value, None, None, None, None, None)
            except KeyError:
                return {'error': 'Invalid cookie key, please submit a valid JSON', 'cookiejar_uuid': cookiejar_uuid}, 400

    frequency = data.get('frequency', None)
    if frequency:
        if frequency not in ['monthly', 'weekly', 'daily', 'hourly']:
@@ -1410,29 +1763,47 @@ def api_add_crawler_task(data, user_id=None):
            return {'error': 'Invalid frequency'}, 400
        frequency = f'{months}:{weeks}:{days}:{hours}:{minutes}'

    if frequency:
        # TODO verify user
        task_uuid = create_schedule(frequency, user_id, url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
                                    cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None, tags=tags)
    else:
        # TODO HEADERS
        # TODO USER AGENT
        task_uuid = create_task(url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
                                cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None, tags=tags,
                                parent='manual', priority=90)

    return {'uuid': task_uuid}, 200

####  ####
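An illustrative request body for api_add_crawler_task(); the field names follow the parsing code above, all values are made up:

```
data = {
    'url': 'http://example.onion',
    'har': True,
    'screenshot': True,
    'proxy': 'tor',              # normalized to 'force_tor'
    'tags': ['my-tag'],
    'cookies': [{'name': 'sessionid', 'value': 'deadbeef'}],
    'frequency': 'daily',        # omit to enqueue a one-shot task instead
}
resp, status = api_add_crawler_task(data, user_id='admin@admin.test')
```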
# TODO cookiejar - cookies - frequency
def api_add_crawler_capture(data, user_id):
task, resp = api_parse_task_dict_basic(data, user_id)
if resp != 200:
return task, resp
task_uuid = data.get('task_uuid')
if not task_uuid:
return {'error': 'Invalid task_uuid', 'task_uuid': task_uuid}, 400
capture_uuid = data.get('capture_uuid')
if not capture_uuid:
return {'error': 'Invalid capture_uuid', 'capture_uuid': capture_uuid}, 400
# parent = data.get('parent')
# TODO parent
task_uuid = create_task(task['url'], depth=task['depth_limit'], har=task['har'], screenshot=task['screenshot'],
proxy=task['proxy'], tags=task['tags'],
parent='manual', task_uuid=task_uuid, external=True)
if not task_uuid:
return {'error': 'Aborted by Crawler', 'task_uuid': task_uuid, 'capture_uuid': capture_uuid}, 400
task = CrawlerTask(task_uuid)
create_capture(capture_uuid, task_uuid)
task.start()
return {'uuid': capture_uuid}, 200
###################################################################################
###################################################################################

@@ -1471,14 +1842,6 @@ def create_item_id(item_dir, domain):
        UUID = domain+str(uuid.uuid4())
    return os.path.join(item_dir, UUID)

# # # # # # # # # # # #
# #
# CRAWLER MANAGER # TODO REFACTOR ME
@@ -1509,13 +1872,13 @@ class CrawlerProxy:
        self.uuid = proxy_uuid

    def get_description(self):
        return r_crawler.hget(f'crawler:proxy:{self.uuid}', 'description')

    # Host
    # Port
    # Type -> need test
    def get_url(self):
        return r_crawler.hget(f'crawler:proxy:{self.uuid}', 'url')

#### CRAWLER LACUS ####
@@ -1577,7 +1940,11 @@ def ping_lacus():
        ping = False
        req_error = {'error': 'Lacus URL undefined', 'status_code': 400}
    else:
        try:
            ping = lacus.is_up
        except:
            req_error = {'error': 'Failed to connect Lacus URL', 'status_code': 400}
            ping = False
    update_lacus_connection_status(ping, req_error=req_error)
    return ping
@@ -1637,7 +2004,7 @@ def api_set_crawler_max_captures(data):
    save_nb_max_captures(nb_captures)
    return nb_captures, 200

## TEST ##

def is_test_ail_crawlers_successful():
    return r_db.hget('crawler:tor:test', 'success') == 'True'
@@ -1711,7 +2078,15 @@ def test_ail_crawlers():
    load_blacklist()

# if __name__ == '__main__':
#     delete_captures()

#     _clear_captures()
#     item_id = 'crawled/2023/02/20/data.gz'
#     item = Item(item_id)
#     content = item.get_content()
#     temp_url = ''
#     r = extract_favicon_from_html(content, temp_url)
#     print(r)
#     _reprocess_all_hars_cookie_name()
#     _reprocess_all_hars_etag()
#     _gzip_all_hars()
#     _reprocess_all_hars_hhhashs()


@@ -50,8 +50,8 @@ def is_passive_dns_enabled(cache=True):
def change_passive_dns_state(new_state):
    old_state = is_passive_dns_enabled(cache=False)
    if old_state != new_state:
        r_serv_db.hset('d4:passivedns', 'enabled', str(new_state))
        r_cache.set('d4:passivedns:enabled', str(new_state))
        update_time = time.time()
        r_serv_db.hset('d4:passivedns', 'update_time', update_time)
        r_cache.set('d4:passivedns:last_update_time', update_time)


@@ -129,7 +129,7 @@ def get_item_url(item_id):
def get_item_har(item_id):
    har = '/'.join(item_id.rsplit('/')[-4:])
    har = f'{har}.json.gz'
    path = os.path.join(ConfigLoader.get_hars_dir(), har)
    if os.path.isfile(path):
        return har
@@ -204,15 +204,22 @@ def _get_dir_source_name(directory, source_name=None, l_sources_name=set(), filt
    if not l_sources_name:
        l_sources_name = set()
    if source_name:
        path = os.path.join(directory, source_name)
        if os.path.isdir(path):
            l_dir = os.listdir(os.path.join(directory, source_name))
        else:
            l_dir = []
    else:
        l_dir = os.listdir(directory)
    # empty directory
    if not l_dir:
        if source_name:
            return l_sources_name.add(source_name)
        else:
            return l_sources_name
    else:
        for src_name in l_dir:
            if len(src_name) == 4 and source_name:
                # try:
                int(src_name)
                to_add = os.path.join(source_name)


@@ -1,12 +1,13 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import json
import logging
import os
import sys

import yara
from hashlib import sha256
from operator import itemgetter

sys.path.append(os.environ['AIL_BIN'])

@@ -15,6 +16,7 @@ sys.path.append(os.environ['AIL_BIN'])
##################################
from lib.objects import ail_objects
from lib.objects.Items import Item
from lib.objects.Titles import Title
from lib import correlations_engine
from lib import regex_helper
from lib.ConfigLoader import ConfigLoader

@@ -28,6 +30,8 @@ from modules.Onion import Onion
from modules.Phone import Phone
from modules.Tools import Tools

logger = logging.getLogger()

config_loader = ConfigLoader()
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None
@@ -58,18 +62,31 @@ def get_correl_match(extract_type, obj_id, content):
    correl = correlations_engine.get_correlation_by_correl_type('item', '', obj_id, extract_type)
    to_extract = []
    map_subtype = {}
    map_value_id = {}
    for c in correl:
        subtype, value = c.split(':', 1)

        if extract_type == 'title':
            title = Title(value).get_content()
            to_extract.append(title)
            sha256_val = sha256(title.encode()).hexdigest()
        else:
            map_subtype[value] = subtype
            to_extract.append(value)
            sha256_val = sha256(value.encode()).hexdigest()
        map_value_id[sha256_val] = value
    if to_extract:
        objs = regex_helper.regex_finditer(r_key, '|'.join(to_extract), obj_id, content)
        for obj in objs:
            if map_subtype.get(obj[2]):
                subtype = map_subtype[obj[2]]
            else:
                subtype = ''
            sha256_val = sha256(obj[2].encode()).hexdigest()
            value_id = map_value_id.get(sha256_val)
            if not value_id:
                logger.critical(f'Error module extractor: {sha256_val}\n{extract_type}\n{subtype}\n{value_id}\n{map_value_id}\n{objs}')
                value_id = 'ERROR'
            extracted.append([obj[0], obj[1], obj[2], f'{extract_type}:{subtype}:{value_id}'])
    return extracted

def _get_yara_match(data):
@@ -87,9 +104,13 @@ def _get_word_regex(word):
def convert_byte_offset_to_string(b_content, offset):
    byte_chunk = b_content[:offset + 1]
    try:
        string_chunk = byte_chunk.decode()
        offset = len(string_chunk) - 1
        return offset
    except UnicodeDecodeError as e:
        logger.error(f'Yara offset converter error, {str(e)}\n{offset}/{len(b_content)}')
        return convert_byte_offset_to_string(b_content, offset - 1)

# TODO RETRO HUNTS
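yara reports byte offsets while the UI needs character offsets; the helper above walks back when an offset lands inside a multi-byte codepoint. For example:

```
b = 'café!'.encode()                         # 6 bytes, 5 characters
print(convert_byte_offset_to_string(b, 5))   # 4 -> index of '!' in the str
```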
@@ -155,6 +176,7 @@ def extract(obj_id, content=None):
    # CHECK CACHE
    cached = r_cache.get(f'extractor:cache:{obj_id}')
    # cached = None
    if cached:
        r_cache.expire(f'extractor:cache:{obj_id}', 300)
        return json.loads(cached)
@@ -173,7 +195,7 @@ def extract(obj_id, content=None):
    if matches:
        extracted = extracted + matches

    for obj_t in ['cve', 'cryptocurrency', 'title', 'username']:  # Decoded, PGP->extract bloc
        matches = get_correl_match(obj_t, obj_id, content)
        if matches:
            extracted = extracted + matches


@ -0,0 +1,166 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from datetime import datetime
from flask import url_for
# from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects
from lib.data_retention_engine import update_obj_date
from lib.objects import ail_objects
from lib.timeline_engine import Timeline
from lib.correlations_engine import get_correlation_by_correl_type
config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None
################################################################################
################################################################################
################################################################################
class ChatSubChannel(AbstractChatObject):
"""
AIL Chat Object. (strings)
"""
# ID -> <CHAT ID>/<SubChannel ID> subtype = chat_instance_uuid
def __init__(self, id, subtype):
super(ChatSubChannel, self).__init__('chat-subchannel', id, subtype)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
return url
def get_svg_icon(self): # TODO
# if self.subtype == 'telegram':
# style = 'fab'
# icon = '\uf2c6'
# elif self.subtype == 'discord':
# style = 'fab'
# icon = '\uf099'
# else:
# style = 'fas'
# icon = '\uf007'
style = 'far'
icon = '\uf086'
return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}
# TODO TIME LAST MESSAGES
def get_meta(self, options=set(), translation_target=None):
meta = self._get_meta(options=options)
meta['tags'] = self.get_tags(r_list=True)
meta['name'] = self.get_name()
if 'chat' in options:
meta['chat'] = self.get_chat()
if 'icon' in options:
meta['icon'] = self.get_icon()
meta['img'] = meta['icon']
if 'nb_messages' in options:
meta['nb_messages'] = self.get_nb_messages()
if 'created_at' in options:
meta['created_at'] = self.get_created_at(date=True)
if 'threads' in options:
meta['threads'] = self.get_threads()
if 'participants' in options:
meta['participants'] = self.get_participants()
if 'nb_participants' in options:
meta['nb_participants'] = self.get_nb_participants()
if 'translation' in options and translation_target:
meta['translation_name'] = self.translate(meta['name'], field='name', target=translation_target)
return meta
def get_misp_object(self):
# obj_attrs = []
# if self.subtype == 'telegram':
# obj = MISPObject('telegram-account', standalone=True)
# obj_attrs.append(obj.add_attribute('username', value=self.id))
#
# elif self.subtype == 'twitter':
# obj = MISPObject('twitter-account', standalone=True)
# obj_attrs.append(obj.add_attribute('name', value=self.id))
#
# else:
# obj = MISPObject('user-account', standalone=True)
# obj_attrs.append(obj.add_attribute('username', value=self.id))
#
# first_seen = self.get_first_seen()
# last_seen = self.get_last_seen()
# if first_seen:
# obj.first_seen = first_seen
# if last_seen:
# obj.last_seen = last_seen
# if not first_seen or not last_seen:
# self.logger.warning(
# f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
#
# for obj_attr in obj_attrs:
# for tag in self.get_tags():
# obj_attr.add_tag(tag)
# return obj
return
############################################################################
############################################################################
# others optional metas, ... -> # TODO ALL meta in hset
def _get_timeline_name(self):
return Timeline(self.get_global_id(), 'username')
def update_name(self, name, timestamp):
self._get_timeline_name().add_timestamp(timestamp, name)
# TODO # # # # # # # # # # #
def get_users(self):
pass
#### Categories ####
#### Threads ####
#### Messages #### TODO set parents
# def get_last_message_id(self):
#
# return r_object.hget(f'meta:{self.type}:{self.subtype}:{self.id}', 'last:message:id')
class ChatSubChannels(AbstractChatObjects):
def __init__(self):
super().__init__('chat-subchannel')
# if __name__ == '__main__':
# chat = Chat('test', 'telegram')
# r = chat.get_messages()
# print(r)
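For orientation, a hypothetical instantiation of the new object (the subchannel id and chat instance uuid are made up; the subtype is the chat instance uuid):

```
subchannel = ChatSubChannel('mychat/general', '00000000-0000-0000-0000-000000000000')
if subchannel.exists():
    meta = subchannel.get_meta({'chat', 'nb_messages', 'created_at'})
```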

bin/lib/objects/ChatThreads.py Executable file

@ -0,0 +1,120 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from datetime import datetime
from flask import url_for
# from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects
config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None
################################################################################
################################################################################
################################################################################
class ChatThread(AbstractChatObject):
"""
AIL Chat Object. (strings)
"""
def __init__(self, id, subtype):
super().__init__('chat-thread', id, subtype)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
return url
def get_svg_icon(self): # TODO
# if self.subtype == 'telegram':
# style = 'fab'
# icon = '\uf2c6'
# elif self.subtype == 'discord':
# style = 'fab'
# icon = '\uf099'
# else:
# style = 'fas'
# icon = '\uf007'
style = 'fas'
icon = '\uf7a4'
return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['subtype'] = self.subtype
meta['tags'] = self.get_tags(r_list=True)
if 'name' in options:
meta['name'] = self.get_name()
if 'nb_messages' in options:
meta['nb_messages'] = self.get_nb_messages()
if 'participants' in options:
meta['participants'] = self.get_participants()
if 'nb_participants' in options:
meta['nb_participants'] = self.get_nb_participants()
# created_at ???
return meta
def get_misp_object(self):
return
def create(self, container_obj, message_id):
if message_id:
parent_message = container_obj.get_obj_by_message_id(message_id)
if parent_message: # TODO EXCEPTION IF DON'T EXISTS
self.set_parent(obj_global_id=parent_message)
_, _, parent_id = parent_message.split(':', 2)
self.add_correlation('message', '', parent_id)
else:
self.set_parent(obj_global_id=container_obj.get_global_id())
self.add_correlation(container_obj.get_type(), container_obj.get_subtype(r_str=True), container_obj.get_id())
def create(thread_id, chat_instance, chat_id, subchannel_id, message_id, container_obj):
if container_obj.get_type() == 'chat':
new_thread_id = f'{chat_id}/{thread_id}'
# sub-channel
else:
new_thread_id = f'{chat_id}/{subchannel_id}/{thread_id}'
thread = ChatThread(new_thread_id, chat_instance)
if not thread.is_children():
thread.create(container_obj, message_id)
return thread
class ChatThreads(AbstractChatObjects):
def __init__(self):
super().__init__('chat-thread')
# if __name__ == '__main__':
# chat = Chat('test', 'telegram')
# r = chat.get_messages()
# print(r)
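The module-level create() above composes thread ids from their container; an illustrative helper mirroring that scheme (not part of the commit):

```
def thread_global_id(chat_id, thread_id, subchannel_id=None):
    # mirrors the id scheme used by create(): '<chat>/<thread>' when the
    # container is a chat, '<chat>/<subchannel>/<thread>' otherwise
    if subchannel_id is None:
        return f'{chat_id}/{thread_id}'
    return f'{chat_id}/{subchannel_id}/{thread_id}'

print(thread_global_id('mychat', '120'))             # mychat/120
print(thread_global_id('mychat', '120', 'general'))  # mychat/general/120
```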

bin/lib/objects/Chats.py Executable file

@ -0,0 +1,216 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from datetime import datetime
from flask import url_for
# from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects
from lib.objects.abstract_subtype_object import AbstractSubtypeObject, get_all_id
from lib.data_retention_engine import update_obj_date
from lib.objects import ail_objects
from lib.timeline_engine import Timeline
from lib.correlations_engine import get_correlation_by_correl_type
config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None
################################################################################
################################################################################
################################################################################
class Chat(AbstractChatObject):
"""
AIL Chat Object.
"""
def __init__(self, id, subtype):
super(Chat, self).__init__('chat', id, subtype)
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
return url
def get_svg_icon(self): # TODO
# if self.subtype == 'telegram':
# style = 'fab'
# icon = '\uf2c6'
# elif self.subtype == 'discord':
# style = 'fab'
# icon = '\uf099'
# else:
# style = 'fas'
# icon = '\uf007'
style = 'fas'
icon = '\uf086'
return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}
def get_meta(self, options=set(), translation_target=None):
meta = self._get_meta(options=options)
meta['name'] = self.get_name()
meta['tags'] = self.get_tags(r_list=True)
if 'icon' in options:
meta['icon'] = self.get_icon()
meta['img'] = meta['icon']
if 'info' in options:
meta['info'] = self.get_info()
if 'translation' in options and translation_target:
meta['translation_info'] = self.translate(meta['info'], field='info', target=translation_target)
if 'participants' in options:
meta['participants'] = self.get_participants()
if 'nb_participants' in options:
meta['nb_participants'] = self.get_nb_participants()
if 'nb_messages' in options:
meta['nb_messages'] = self.get_nb_messages()
if 'username' in options:
meta['username'] = self.get_username()
if 'subchannels' in options:
meta['subchannels'] = self.get_subchannels()
if 'nb_subchannels' in options:
meta['nb_subchannels'] = self.get_nb_subchannels()
if 'created_at' in options:
meta['created_at'] = self.get_created_at(date=True)
if 'threads' in options:
meta['threads'] = self.get_threads()
if 'tags_safe' in options:
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
return meta
def get_misp_object(self):
# obj_attrs = []
# if self.subtype == 'telegram':
# obj = MISPObject('telegram-account', standalone=True)
# obj_attrs.append(obj.add_attribute('username', value=self.id))
#
# elif self.subtype == 'twitter':
# obj = MISPObject('twitter-account', standalone=True)
# obj_attrs.append(obj.add_attribute('name', value=self.id))
#
# else:
# obj = MISPObject('user-account', standalone=True)
# obj_attrs.append(obj.add_attribute('username', value=self.id))
#
# first_seen = self.get_first_seen()
# last_seen = self.get_last_seen()
# if first_seen:
# obj.first_seen = first_seen
# if last_seen:
# obj.last_seen = last_seen
# if not first_seen or not last_seen:
# self.logger.warning(
# f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
#
# for obj_attr in obj_attrs:
# for tag in self.get_tags():
# obj_attr.add_tag(tag)
# return obj
return
############################################################################
############################################################################
# users that send at least a message else participants/spectator
# correlation created by messages
def get_users(self):
users = set()
accounts = self.get_correlation('user-account').get('user-account', [])
for account in accounts:
users.add(account[1:])
return users
def _get_timeline_username(self):
return Timeline(self.get_global_id(), 'username')
def get_username(self):
return self._get_timeline_username().get_last_obj_id()
def get_usernames(self):
return self._get_timeline_username().get_objs_ids()
def update_username_timeline(self, username_global_id, timestamp):
self._get_timeline_username().add_timestamp(timestamp, username_global_id)
#### ChatSubChannels ####
#### Categories ####
#### Threads ####
#### Messages #### TODO set parents
# def get_last_message_id(self):
#
# return r_object.hget(f'meta:{self.type}:{self.subtype}:{self.id}', 'last:message:id')
# def add(self, timestamp, obj_id, mess_id=0, username=None, user_id=None):
# date = # TODO get date from object
# self.update_daterange(date)
# update_obj_date(date, self.type, self.subtype)
#
#
# # daily
# r_object.hincrby(f'{self.type}:{self.subtype}:{date}', self.id, 1)
# # all subtypes
# r_object.zincrby(f'{self.type}_all:{self.subtype}', 1, self.id)
#
# #######################################################################
# #######################################################################
#
# # Correlations
# self.add_correlation('item', '', item_id)
# # domain
# if is_crawled(item_id):
# domain = get_item_domain(item_id)
# self.add_correlation('domain', '', domain)
# importer -> use cache for previous reply SET to_add_id: previously_imported : expire SET key -> 30 mn
class Chats(AbstractChatObjects):
def __init__(self):
super().__init__('chat')
# TODO factorize
def get_all_subtypes():
return ail_core.get_object_all_subtypes('chat')
def get_all():
objs = {}
for subtype in get_all_subtypes():
objs[subtype] = get_all_by_subtype(subtype)
return objs
def get_all_by_subtype(subtype):
return get_all_id('chat', subtype)
if __name__ == '__main__':
chat = Chat('test', 'telegram')
r = chat.get_messages()
print(r)
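A sketch of enumerating known chats per instance subtype with the module-level helpers above (output depends on the local database):

```
for subtype, ids in get_all().items():
    print(subtype, len(ids))
```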

bin/lib/objects/CookiesNames.py Executable file

@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from hashlib import sha256
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
# TODO NEW ABSTRACT OBJECT -> daterange for all objects ????
class CookieName(AbstractDaterangeObject):
"""
AIL CookieName Object.
"""
def __init__(self, obj_id):
super(CookieName, self).__init__('cookie-name', obj_id)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_content(self, r_type='str'):
if r_type == 'str':
return self._get_field('content')
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
# TODO # CHANGE COLOR
def get_svg_icon(self):
return {'style': 'fas', 'icon': '\uf564', 'color': '#BFD677', 'radius': 5} # f563
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('cookie')
first_seen = self.get_first_seen()
last_seen = self.get_last_seen()
if first_seen:
obj.first_seen = first_seen
if last_seen:
obj.last_seen = last_seen
if not first_seen or not last_seen:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
obj_attrs.append(obj.add_attribute('cookie-name', value=self.get_content()))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_nb_seen(self):
return self.get_nb_correlation('domain')
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['tags'] = self.get_tags(r_list=True)
meta['content'] = self.get_content()
return meta
def create(self, content, _first_seen=None, _last_seen=None):
if not isinstance(content, str):
content = content.decode()
self._set_field('content', content)
self._create()
def create(content):
if isinstance(content, str):
content = content.encode()
obj_id = sha256(content).hexdigest()
cookie = CookieName(obj_id)
if not cookie.exists():
cookie.create(content)
return cookie
class CookiesNames(AbstractDaterangeObjects):
"""
CookieName Objects
"""
def __init__(self):
super().__init__('cookie-name', CookieName)
def sanitize_id_to_search(self, name_to_search):
return name_to_search # TODO
# if __name__ == '__main__':
# name_to_search = '98'
# print(search_cves_by_name(name_to_search))


@ -107,8 +107,15 @@ class CryptoCurrency(AbstractSubtypeObject):
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('coin-address')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
         obj_attrs.append(obj.add_attribute('address', value=self.id))
         crypto_symbol = self.get_currency_symbol()


@ -57,8 +57,15 @@ class Cve(AbstractDaterangeObject):
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('vulnerability')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
         obj_attrs.append(obj.add_attribute('id', value=self.id))
         for obj_attr in obj_attrs:
@ -72,9 +79,6 @@ class Cve(AbstractDaterangeObject):
         meta['tags'] = self.get_tags(r_list=True)
         return meta

-    def add(self, date, item_id):
-        self._add(date, item_id)
-
     def get_cve_search(self):
         try:
             response = requests.get(f'https://cvepremium.circl.lu/api/cve/{self.id}', timeout=10)


@ -111,13 +111,25 @@ class Decoded(AbstractDaterangeObject):
     def get_rel_path(self, mimetype=None):
         if not mimetype:
             mimetype = self.get_mimetype()
+        if not mimetype:
+            self.logger.warning(f'Decoded {self.id}: Empty mimetype')
+            return None
         return os.path.join(HASH_DIR, mimetype, self.id[0:2], self.id)

     def get_filepath(self, mimetype=None):
-        return os.path.join(os.environ['AIL_HOME'], self.get_rel_path(mimetype=mimetype))
+        rel_path = self.get_rel_path(mimetype=mimetype)
+        if not rel_path:
+            return None
+        else:
+            return os.path.join(os.environ['AIL_HOME'], rel_path)

     def get_content(self, mimetype=None, r_type='str'):
         filepath = self.get_filepath(mimetype=mimetype)
+        if not filepath:
+            if r_type == 'str':
+                return ''
+            else:
+                return b''
         if r_type == 'str':
             with open(filepath, 'r') as f:
                 content = f.read()
@ -126,7 +138,7 @@ class Decoded(AbstractDaterangeObject):
             with open(filepath, 'rb') as f:
                 content = f.read()
             return content
-        elif r_str == 'bytesio':
+        elif r_type == 'bytesio':
             with open(filepath, 'rb') as f:
                 content = BytesIO(f.read())
             return content
@ -137,15 +149,22 @@ class Decoded(AbstractDaterangeObject):
         with zipfile.ZipFile(zip_content, "w") as zf:
             # TODO: Fix password
             # zf.setpassword(b"infected")
-            zf.writestr(self.id, self.get_content().getvalue())
+            zf.writestr(self.id, self.get_content(r_type='bytesio').getvalue())
         zip_content.seek(0)
         return zip_content

     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('file')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
         obj_attrs.append(obj.add_attribute('sha1', value=self.id))
         obj_attrs.append(obj.add_attribute('mimetype', value=self.get_mimetype()))
@ -220,8 +239,8 @@ class Decoded(AbstractDaterangeObject):
         return True

-    def add(self, algo_name, date, obj_id, mimetype=None):
-        self._add(date, obj_id)
+    def add(self, date, obj, algo_name, mimetype=None):
+        self._add(date, obj)
         if not mimetype:
             mimetype = self.get_mimetype()
@ -435,13 +454,13 @@ def get_all_decodeds_objects(filters={}):
         if i >= len(files):
             files = []
     for file in files:
-        yield Decoded(file).id
+        yield Decoded(file)

 ############################################################################

 def sanityze_decoder_names(decoder_name):
-    if decoder_name not in Decodeds.get_algos():
+    if decoder_name not in get_algos():
         return None
     else:
         return decoder_name


@ -311,6 +311,9 @@ class Domain(AbstractObject):
         root_item = self.get_last_item_root()
         if root_item:
             return self.get_crawled_items(root_item)
+        else:
+            return []

     # TODO FIXME
     def get_all_urls(self, date=False, epoch=None):
@ -341,8 +344,15 @@ class Domain(AbstractObject):
         # create domain-ip obj
         obj_attrs = []
         obj = MISPObject('domain-crawled', standalone=True)
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_check()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_check()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
         obj_attrs.append(obj.add_attribute('domain', value=self.id))
         urls = self.get_all_urls(date=True, epoch=epoch)
@ -379,10 +389,10 @@ class Domain(AbstractObject):
             har = get_item_har(item_id)
             if har:
                 print(har)
-                _write_in_zip_buffer(zf, os.path.join(hars_dir, har), f'{basename}.json')
+                _write_in_zip_buffer(zf, os.path.join(hars_dir, har), f'{basename}.json.gz')
             # Screenshot
             screenshot = self._get_external_correlation('item', '', item_id, 'screenshot')
-            if screenshot:
+            if screenshot and screenshot['screenshot']:
                 screenshot = screenshot['screenshot'].pop()[1:]
                 screenshot = os.path.join(screenshot[0:2], screenshot[2:4], screenshot[4:6], screenshot[6:8],
                                           screenshot[8:10], screenshot[10:12], screenshot[12:])
@ -585,21 +595,22 @@ def get_domains_up_by_filers(domain_types, date_from=None, date_to=None, tags=[]
     return None

 def sanitize_domain_name_to_search(name_to_search, domain_type):
+    if not name_to_search:
+        return ""
     if domain_type == 'onion':
         r_name = r'[a-z0-9\.]+'
     else:
         r_name = r'[a-zA-Z0-9-_\.]+'
     # invalid domain name
     if not re.fullmatch(r_name, name_to_search):
-        res = re.match(r_name, name_to_search)
-        return {'search': name_to_search, 'error': res.string.replace( res[0], '')}
+        return ""
     return name_to_search.replace('.', '\.')

 def search_domain_by_name(name_to_search, domain_types, r_pos=False):
     domains = {}
     for domain_type in domain_types:
         r_name = sanitize_domain_name_to_search(name_to_search, domain_type)
-        if not name_to_search or isinstance(r_name, dict):
+        if not r_name:
             break
         r_name = re.compile(r_name)
         for domain in get_domains_up_by_type(domain_type):

bin/lib/objects/Etags.py Executable file

@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from hashlib import sha256
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
# TODO NEW ABSTRACT OBJECT -> daterange for all objects ????
class Etag(AbstractDaterangeObject):
"""
AIL Etag Object.
"""
def __init__(self, obj_id):
super(Etag, self).__init__('etag', obj_id)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_content(self, r_type='str'):
if r_type == 'str':
return self._get_field('content')
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
# TODO # CHANGE COLOR
def get_svg_icon(self):
return {'style': 'fas', 'icon': '\uf02b', 'color': '#556F65', 'radius': 5}
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('etag')
first_seen = self.get_first_seen()
last_seen = self.get_last_seen()
if first_seen:
obj.first_seen = first_seen
if last_seen:
obj.last_seen = last_seen
if not first_seen or not last_seen:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
obj_attrs.append(obj.add_attribute('etag', value=self.get_content()))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_nb_seen(self):
return self.get_nb_correlation('domain')
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['tags'] = self.get_tags(r_list=True)
meta['content'] = self.get_content()
return meta
def create(self, content, _first_seen=None, _last_seen=None):
if not isinstance(content, str):
content = content.decode()
self._set_field('content', content)
self._create()
def create(content):
if isinstance(content, str):
content = content.encode()
obj_id = sha256(content).hexdigest()
etag = Etag(obj_id)
if not etag.exists():
etag.create(content)
return etag
class Etags(AbstractDaterangeObjects):
"""
Etags Objects
"""
def __init__(self):
super().__init__('etag', Etag)
def sanitize_id_to_search(self, name_to_search):
return name_to_search # TODO
# if __name__ == '__main__':
# name_to_search = '98'
# print(search_cves_by_name(name_to_search))

bin/lib/objects/Favicons.py Executable file

@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import mmh3
import os
import sys
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
class Favicon(AbstractDaterangeObject):
"""
AIL Favicon Object.
"""
def __init__(self, id):
super(Favicon, self).__init__('favicon', id)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_content(self, r_type='str'):
if r_type == 'str':
return self._get_field('content')
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
# TODO # CHANGE COLOR
def get_svg_icon(self):
return {'style': 'fas', 'icon': '\uf20a', 'color': '#1E88E5', 'radius': 5} # f0c8 f45c
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('favicon')
first_seen = self.get_first_seen()
last_seen = self.get_last_seen()
if first_seen:
obj.first_seen = first_seen
if last_seen:
obj.last_seen = last_seen
if not first_seen or not last_seen:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
obj_attrs.append(obj.add_attribute('favicon-mmh3', value=self.id))
obj_attrs.append(obj.add_attribute('favicon', value=self.get_content(r_type='bytes')))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['tags'] = self.get_tags(r_list=True)
if 'content' in options:
meta['content'] = self.get_content()
return meta
# def get_links(self):
# # TODO GET ALL URLS FROM CORRELATED ITEMS
def create(self, content, _first_seen=None, _last_seen=None):
if not isinstance(content, str):
content = content.decode()
self._set_field('content', content)
self._create()
def create_favicon(content, url=None):  # TODO URL ????
    if isinstance(content, str):
        content = content.encode()
    favicon_id = mmh3.hash_bytes(content)
    favicon = Favicon(favicon_id)
    if not favicon.exists():
        favicon.create(content)
    return favicon
class Favicons(AbstractDaterangeObjects):
"""
Favicons Objects
"""
def __init__(self):
super().__init__('favicon', Favicon)
def sanitize_id_to_search(self, name_to_search):
return name_to_search # TODO
# if __name__ == '__main__':
# name_to_search = '98'
# print(search_cves_by_name(name_to_search))

bin/lib/objects/FilesNames.py Executable file

@ -0,0 +1,101 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_object = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")  # used by get_link() outside a Flask context
config_loader = None
class FileName(AbstractDaterangeObject):
"""
AIL FileName Object. (strings)
"""
# ID = file name
def __init__(self, name):
super().__init__('file-name', name)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
def get_svg_icon(self):
return {'style': 'far', 'icon': '\uf249', 'color': '#36F5D5', 'radius': 5}
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('file')
# obj_attrs.append(obj.add_attribute('sha256', value=self.id))
# obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['tags'] = self.get_tags(r_list=True)
if 'tags_safe' in options:
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
return meta
def create(self): # create ALL SET ??????
pass
def add_reference(self, date, src_ail_object, file_obj=None):
self.add(date, src_ail_object)
if file_obj:
self.add_correlation(file_obj.type, file_obj.get_subtype(r_str=True), file_obj.get_id())
# TODO USE ZSET FOR ALL OBJS IDS ??????
class FilesNames(AbstractDaterangeObjects):
"""
FileName Objects
"""
def __init__(self):
super().__init__('file-name', FileName)
def sanitize_id_to_search(self, name_to_search):
return name_to_search
# TODO sanitize file name
def create(self, name, date, src_ail_object, file_obj=None, limit=500, force=False):
if 0 < len(name) <= limit or force or limit < 0:
file_name = self.obj_class(name)
# if not file_name.exists():
# file_name.create()
file_name.add_reference(date, src_ail_object, file_obj=file_obj)
return file_name
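    # Illustrative usage (names and objects are hypothetical):
    #   FilesNames().create('report.pdf', '20230501', item_obj)  # indexed: 0 < len <= 500
    #   FilesNames().create('A' * 600, '20230501', item_obj)     # skipped unless force=True or limit < 0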
# if __name__ == '__main__':
# name_to_search = '29ba'
# print(search_screenshots_by_name(name_to_search))

bin/lib/objects/HHHashs.py Executable file

@ -0,0 +1,135 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import hashlib
import os
import sys
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
class HHHash(AbstractDaterangeObject):
"""
AIL HHHash Object.
"""
def __init__(self, obj_id):
super(HHHash, self).__init__('hhhash', obj_id)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_content(self, r_type='str'):
if r_type == 'str':
return self._get_field('content')
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
# TODO # CHANGE COLOR
def get_svg_icon(self):
return {'style': 'fas', 'icon': '\uf036', 'color': '#71D090', 'radius': 5}
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('hhhash')
first_seen = self.get_first_seen()
last_seen = self.get_last_seen()
if first_seen:
obj.first_seen = first_seen
if last_seen:
obj.last_seen = last_seen
if not first_seen or not last_seen:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
obj_attrs.append(obj.add_attribute('hhhash', value=self.get_id()))
obj_attrs.append(obj.add_attribute('hhhash-headers', value=self.get_content()))
obj_attrs.append(obj.add_attribute('hhhash-tool', value='lacus'))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_nb_seen(self):
return self.get_nb_correlation('domain')
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['tags'] = self.get_tags(r_list=True)
meta['content'] = self.get_content()
return meta
def create(self, hhhash_header, _first_seen=None, _last_seen=None): # TODO CREATE ADD FUNCTION -> urls set
self._set_field('content', hhhash_header)
self._create()
def create(hhhash_header, hhhash=None):
if not hhhash:
hhhash = hhhash_headers(hhhash_header)
hhhash = HHHash(hhhash)
if not hhhash.exists():
hhhash.create(hhhash_header)
return hhhash
def build_hhhash_headers(dict_headers): # filter_dup=True
hhhash = ''
previous_header = ''
for header in dict_headers:
header_name = header.get('name')
if header_name:
if header_name != previous_header: # remove dup headers, filter playwright invalid splitting
hhhash = f'{hhhash}:{header_name}'
previous_header = header_name
hhhash = hhhash[1:]
# print(hhhash)
return hhhash
def hhhash_headers(header_hhhash):
m = hashlib.sha256()
m.update(header_hhhash.encode())
digest = m.hexdigest()
return f"hhh:1:{digest}"
class HHHashs(AbstractDaterangeObjects):
"""
HHHashs Objects
"""
def __init__(self):
super().__init__('hhhash', HHHash)
def sanitize_id_to_search(self, name_to_search):
return name_to_search # TODO
# if __name__ == '__main__':
# name_to_search = '98'
# print(search_cves_by_name(name_to_search))

bin/lib/objects/Images.py Executable file

@ -0,0 +1,135 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import base64
import os
import sys
from hashlib import sha256
from io import BytesIO
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_serv_metadata = config_loader.get_db_conn("Kvrocks_Objects")
IMAGE_FOLDER = config_loader.get_files_directory('images')
baseurl = config_loader.get_config_str("Notifications", "ail_domain")  # used by get_link() outside a Flask context
config_loader = None
class Image(AbstractDaterangeObject):
"""
AIL Image Object.
"""
# ID = SHA256
def __init__(self, image_id):
super(Image, self).__init__('image', image_id)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def exists(self):
return os.path.isfile(self.get_filepath())
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
def get_svg_icon(self):
return {'style': 'far', 'icon': '\uf03e', 'color': '#E1F5DF', 'radius': 5}
def get_rel_path(self):
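        # e.g. an id 'aabbccddeeff<...>' is stored under 'aa/bb/cc/dd/ee/ff/<...>'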
rel_path = os.path.join(self.id[0:2], self.id[2:4], self.id[4:6], self.id[6:8], self.id[8:10], self.id[10:12], self.id[12:])
return rel_path
def get_filepath(self):
filename = os.path.join(IMAGE_FOLDER, self.get_rel_path())
return os.path.realpath(filename)
def get_file_content(self):
filepath = self.get_filepath()
with open(filepath, 'rb') as f:
file_content = BytesIO(f.read())
return file_content
def get_content(self, r_type='str'):
return self.get_file_content()
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('file')
obj_attrs.append(obj.add_attribute('sha256', value=self.id))
obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['img'] = self.id
meta['tags'] = self.get_tags(r_list=True)
if 'content' in options:
meta['content'] = self.get_content()
if 'tags_safe' in options:
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
return meta
def create(self, content):
filepath = self.get_filepath()
dirname = os.path.dirname(filepath)
if not os.path.exists(dirname):
os.makedirs(dirname)
with open(filepath, 'wb') as f:
f.write(content)
def get_screenshot_dir():
return IMAGE_FOLDER
def create(content, size_limit=5000000, b64=False, force=False):
    size = (len(content)*3) / 4  # approximate decoded size when content is base64-encoded
if size <= size_limit or size_limit < 0 or force:
if b64:
content = base64.standard_b64decode(content.encode())
image_id = sha256(content).hexdigest()
image = Image(image_id)
if not image.exists():
image.create(content)
return image
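# Illustrative usage (payloads are made up):
#   create(png_bytes)                    # raw bytes, stored under their sha256
#   create(b64_string, b64=True)         # base64 input, decoded before hashing
#   create(big_payload, size_limit=-1)   # a negative limit disables the size check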
class Images(AbstractDaterangeObjects):
"""
Image Objects
"""
def __init__(self):
super().__init__('image', Image)
def sanitize_id_to_search(self, name_to_search):
return name_to_search # TODO
# if __name__ == '__main__':
# name_to_search = '29ba'
# print(search_screenshots_by_name(name_to_search))


@ -7,10 +7,10 @@ import magic
 import os
 import re
 import sys
-import cld3
 import html2text

 from io import BytesIO
+from uuid import uuid4
 from pymisp import MISPObject
@ -18,10 +18,11 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
-from lib.ail_core import get_ail_uuid
+from lib.ail_core import get_ail_uuid, rreplace
 from lib.objects.abstract_object import AbstractObject
 from lib.ConfigLoader import ConfigLoader
 from lib import item_basic
+from lib.Language import LanguagesDetector
 from lib.data_retention_engine import update_obj_date, get_obj_date_first
 from packages import Date
@ -137,9 +138,23 @@ class Item(AbstractObject):
     ####################################################################################
     ####################################################################################

+    # TODO ADD function to check if ITEM (content + file) already exists
+
     def sanitize_id(self):
-        pass
+        if ITEMS_FOLDER in self.id:
+            self.id = self.id.replace(ITEMS_FOLDER, '', 1)
+        # limit filename length
+        basename = self.get_basename()
+        if len(basename) > 255:
+            new_basename = f'{basename[:215]}{str(uuid4())}.gz'
+            self.id = rreplace(self.id, basename, new_basename, 1)
+        return self.id

     # # TODO: sanitize_id
     # # TODO: check if already exists ?
@ -211,9 +226,13 @@ class Item(AbstractObject):
         return {'style': '', 'icon': '', 'color': color, 'radius': 5}

     def get_misp_object(self):
-        obj_date = self.get_date()
         obj = MISPObject('ail-leak', standalone=True)
-        obj.first_seen = obj_date
+        obj_date = self.get_date()
+        if obj_date:
+            obj.first_seen = obj_date
+        else:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={obj_date}')

         obj_attrs = [obj.add_attribute('first-seen', value=obj_date),
                      obj.add_attribute('raw-data', value=self.id, data=self.get_raw_content()),
@ -260,10 +279,9 @@ class Item(AbstractObject):
""" """
if options is None: if options is None:
options = set() options = set()
meta = {'id': self.id, meta = self.get_default_meta(tags=True)
'date': self.get_date(separator=True), meta['date'] = self.get_date(separator=True)
'source': self.get_source(), meta['source'] = self.get_source()
'tags': self.get_tags(r_list=True)}
# optional meta fields # optional meta fields
if 'content' in options: if 'content' in options:
meta['content'] = self.get_content() meta['content'] = self.get_content()
@ -283,6 +301,10 @@ class Item(AbstractObject):
         if 'mimetype' in options:
             content = meta.get('content')
             meta['mimetype'] = self.get_mimetype(content=content)
+        if 'investigations' in options:
+            meta['investigations'] = self.get_investigations()
+        if 'link' in options:
+            meta['link'] = self.get_link(flask_context=True)
         # meta['encoding'] = None
         return meta
@ -316,21 +338,10 @@ class Item(AbstractObject):
             nb_line += 1
         return {'nb': nb_line, 'max_length': max_length}

-    def get_languages(self, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7):
-        all_languages = []
-        ## CLEAN CONTENT ##
-        content = self.get_html2text_content(ignore_links=True)
-        content = remove_all_urls_from_content(self.id, item_content=content)
-        # REMOVE USELESS SPACE
-        content = ' '.join(content.split())
-        #- CLEAN CONTENT -#
-        #print(content)
-        #print(len(content))
-        if len(content) >= min_len:  # # TODO: # FIXME: check num langs limit
-            for lang in cld3.get_frequent_languages(content, num_langs=num_langs):
-                if lang.proportion >= min_proportion and lang.probability >= min_probability and lang.is_reliable:
-                    all_languages.append(lang)
-        return all_languages
+    # TODO RENAME ME
+    def get_languages(self, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7, force_gcld3=False):
+        ld = LanguagesDetector(nb_langs=num_langs, min_proportion=min_proportion, min_probability=min_probability, min_len=min_len)
+        return ld.detect(self.get_content(), force_gcld3=force_gcld3)
+    ##########################################

     def get_mimetype(self, content=None):
         if not content:
@ -476,7 +487,10 @@ def get_all_items_objects(filters={}):
         daterange = Date.get_daterange(date_from, date_to)
     else:
         date_from = get_obj_date_first('item')
-        daterange = Date.get_daterange(date_from, Date.get_today_date_str())
+        if date_from:
+            daterange = Date.get_daterange(date_from, Date.get_today_date_str())
+        else:
+            daterange = []
     if start_date:
         if int(start_date) > int(date_from):
             i = 0
@ -615,61 +629,6 @@ def get_item_metadata(item_id, item_content=None):
 def get_item_content(item_id):
     return item_basic.get_item_content(item_id)

-def get_item_content_html2text(item_id, item_content=None, ignore_links=False):
-    if not item_content:
-        item_content = get_item_content(item_id)
-    h = html2text.HTML2Text()
-    h.ignore_links = ignore_links
-    h.ignore_images = ignore_links
-    return h.handle(item_content)
-
-def remove_all_urls_from_content(item_id, item_content=None):
-    if not item_content:
-        item_content = get_item_content(item_id)
-    regex = r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b'
-    url_regex = re.compile(regex)
-    urls = url_regex.findall(item_content)
-    urls = sorted(urls, key=len, reverse=True)
-    for url in urls:
-        item_content = item_content.replace(url, '')
-    regex_pgp_public_blocs = r'-----BEGIN PGP PUBLIC KEY BLOCK-----[\s\S]+?-----END PGP PUBLIC KEY BLOCK-----'
-    regex_pgp_signature = r'-----BEGIN PGP SIGNATURE-----[\s\S]+?-----END PGP SIGNATURE-----'
-    regex_pgp_message = r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----'
-    re.compile(regex_pgp_public_blocs)
-    re.compile(regex_pgp_signature)
-    re.compile(regex_pgp_message)
-    res = re.findall(regex_pgp_public_blocs, item_content)
-    for it in res:
-        item_content = item_content.replace(it, '')
-    res = re.findall(regex_pgp_signature, item_content)
-    for it in res:
-        item_content = item_content.replace(it, '')
-    res = re.findall(regex_pgp_message, item_content)
-    for it in res:
-        item_content = item_content.replace(it, '')
-    return item_content
-
-def get_item_languages(item_id, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7):
-    all_languages = []
-    ## CLEAN CONTENT ##
-    content = get_item_content_html2text(item_id, ignore_links=True)
-    content = remove_all_urls_from_content(item_id, item_content=content)
-    # REMOVE USELESS SPACE
-    content = ' '.join(content.split())
-    #- CLEAN CONTENT -#
-    #print(content)
-    #print(len(content))
-    if len(content) >= min_len:
-        for lang in cld3.get_frequent_languages(content, num_langs=num_langs):
-            if lang.proportion >= min_proportion and lang.probability >= min_probability and lang.is_reliable:
-                all_languages.append(lang)
-    return all_languages

 # API
 # def get_item(request_dict):
@ -920,13 +879,13 @@ def create_item(obj_id, obj_metadata, io_content):
 #     delete_item(child_id)

-if __name__ == '__main__':
+# if __name__ == '__main__':
 #     content = 'test file content'
 #     duplicates = {'tests/2020/01/02/test.gz': [{'algo':'ssdeep', 'similarity':75}, {'algo':'tlsh', 'similarity':45}]}
 #
 #     item = Item('tests/2020/01/02/test_save.gz')
 #     item.create(content, _save=False)
-    filters = {'date_from': '20230101', 'date_to': '20230501', 'sources': ['crawled', 'submitted'], 'start': ':submitted/2023/04/28/submitted_2b3dd861-a75d-48e4-8cec-6108d41450da.gz'}
-    gen = get_all_items_objects(filters=filters)
-    for obj_id in gen:
-        print(obj_id.id)
+#     filters = {'date_from': '20230101', 'date_to': '20230501', 'sources': ['crawled', 'submitted'], 'start': ':submitted/2023/04/28/submitted_2b3dd861-a75d-48e4-8cec-6108d41450da.gz'}
+#     gen = get_all_items_objects(filters=filters)
+#     for obj_id in gen:
+#         print(obj_id.id)

bin/lib/objects/Messages.py Executable file

@ -0,0 +1,348 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import re
import sys
from datetime import datetime
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ail_core import get_ail_uuid
from lib.objects.abstract_object import AbstractObject
from lib.ConfigLoader import ConfigLoader
from lib import Language
from lib.objects import UsersAccount
from lib.data_retention_engine import update_obj_date, get_obj_date_first
# TODO Set all messages ???
from flask import url_for
config_loader = ConfigLoader()
r_cache = config_loader.get_redis_conn("Redis_Cache")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
# r_content = config_loader.get_db_conn("Kvrocks_Content")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
# TODO SAVE OR EXTRACT MESSAGE SOURCE FOR ICON ?????????
# TODO iterate on all objects
# TODO also add support for small objects ????
# A Message CAN exist without a CHAT -> do not convert it to an object
# ID: source:chat_id:message_id ????
#
# /!\ handle null chat and message id -> chat = uuid and message = timestamp ???
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<message ID> => telegram without channels
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Channel ID>/<message ID>
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Thread ID>/<message ID>
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Channel ID>/<Thread ID>/<message ID>
class Message(AbstractObject):
"""
AIL Message Object. (strings)
"""
def __init__(self, id): # TODO subtype or use source ????
super(Message, self).__init__('message', id) # message::< telegram/1692189934.380827/ChatID_MessageID >
def exists(self):
if self.subtype is None:
return r_object.exists(f'meta:{self.type}:{self.id}')
else:
return r_object.exists(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')
def get_source(self):
"""
Returns source/feeder name
"""
l_source = self.id.split('/')[:-2]
return os.path.join(*l_source)
def get_basename(self):
return os.path.basename(self.id)
def get_content(self, r_type='str'): # TODO ADD cache # TODO Compress content ???????
"""
Returns content
"""
global_id = self.get_global_id()
content = r_cache.get(f'content:{global_id}')
if not content:
content = self._get_field('content')
if content:
r_cache.set(f'content:{global_id}', content)
r_cache.expire(f'content:{global_id}', 300)
if r_type == 'str':
return content
elif r_type == 'bytes':
return content.encode()
def get_date(self):
timestamp = self.get_timestamp()
return datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
def get_timestamp(self):
dirs = self.id.split('/')
return dirs[1]
    def get_message_id(self):  # TODO optimize
        message_id = self.get_basename().rsplit('_', 1)[-1]  # basename is '<chat ID>_<message ID>'
        # if message_id.endswith('.gz'):
        #     message_id = message_id[:-3]
        return message_id
def get_chat_id(self): # TODO optimize -> use me to tag Chat
chat_id = self.get_basename().rsplit('_', 1)[0]
return chat_id
def get_thread(self):
for child in self.get_childrens():
obj_type, obj_subtype, obj_id = child.split(':', 2)
if obj_type == 'chat-thread':
nb_messages = r_object.zcard(f'messages:{obj_type}:{obj_subtype}:{obj_id}')
return {'type': obj_type, 'subtype': obj_subtype, 'id': obj_id, 'nb': nb_messages}
# TODO get Instance ID
# TODO get channel ID
# TODO get thread ID
def get_images(self):
images = []
for child in self.get_childrens():
obj_type, _, obj_id = child.split(':', 2)
if obj_type == 'image':
images.append(obj_id)
return images
def get_user_account(self, meta=False):
user_account = self.get_correlation('user-account')
if user_account.get('user-account'):
user_account = f'user-account:{user_account["user-account"].pop()}'
if meta:
_, user_account_subtype, user_account_id = user_account.split(':', 3)
user_account = UsersAccount.UserAccount(user_account_id, user_account_subtype).get_meta(options={'icon', 'username', 'username_meta'})
return user_account
def get_files_names(self):
names = []
filenames = self.get_correlation('file-name').get('file-name')
if filenames:
for name in filenames:
names.append(name[1:])
return names
def get_reactions(self):
return r_object.hgetall(f'meta:reactions:{self.type}::{self.id}')
# TODO sanitize reactions
def add_reaction(self, reactions, nb_reaction):
r_object.hset(f'meta:reactions:{self.type}::{self.id}', reactions, nb_reaction)
# Interactions between users -> use replies
# nb views
# MENTIONS -> Messages + Chats
# # relationship -> mention - Chat -> Chat
# - Message -> Chat
# - Message -> Message ??? fetch mentioned messages
# FORWARDS
# TODO Create forward CHAT -> message
# message (is forwarded) -> message (is forwarded from) ???
# # TODO get source message timestamp
#
# # is forwarded
# # forwarded from -> check if relationship
# # nb forwarded -> scard relationship
#
# Messages -> CHATS -> NB forwarded
# CHAT -> NB forwarded by chats -> NB messages -> parse full set ????
#
#
#
#
#
#
# show users chats
# message media
# flag is deleted -> event or missing from feeder pass ???
def get_translation(self, content=None, source=None, target='fr'):
"""
Returns translated content
"""
# return self._get_field('translated')
global_id = self.get_global_id()
translation = r_cache.get(f'translation:{target}:{global_id}')
r_cache.expire(f'translation:{target}:{global_id}', 0)
if translation:
return translation
if not content:
content = self.get_content()
translation = Language.LanguageTranslator().translate(content, source=source, target=target)
if translation:
r_cache.set(f'translation:{target}:{global_id}', translation)
r_cache.expire(f'translation:{target}:{global_id}', 300)
return translation
def _set_translation(self, translation):
"""
Set translated content
"""
return self._set_field('translated', translation) # translation by hash ??? -> avoid translating multiple time
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True)}
# return payload
def get_link(self, flask_context=False):
if flask_context:
url = url_for('chats_explorer.objects_message', type=self.type, id=self.id)
else:
url = f'{baseurl}/objects/message?id={self.id}'
return url
def get_svg_icon(self):
return {'style': 'fas', 'icon': '\uf4ad', 'color': '#4dffff', 'radius': 5}
def get_misp_object(self): # TODO
obj = MISPObject('instant-message', standalone=True)
obj_date = self.get_date()
if obj_date:
obj.first_seen = obj_date
else:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={obj_date}')
# obj_attrs = [obj.add_attribute('first-seen', value=obj_date),
# obj.add_attribute('raw-data', value=self.id, data=self.get_raw_content()),
# obj.add_attribute('sensor', value=get_ail_uuid())]
obj_attrs = []
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
# def get_url(self):
# return r_object.hget(f'meta:item::{self.id}', 'url')
# options: set of optional meta fields
def get_meta(self, options=None, timestamp=None, translation_target='en'):
"""
:type options: set
:type timestamp: float
"""
if options is None:
options = set()
meta = self.get_default_meta(tags=True)
# timestamp
if not timestamp:
timestamp = self.get_timestamp()
else:
timestamp = float(timestamp)
timestamp = datetime.fromtimestamp(float(timestamp))
meta['date'] = timestamp.strftime('%Y/%m/%d')
meta['hour'] = timestamp.strftime('%H:%M:%S')
meta['full_date'] = timestamp.isoformat(' ')
meta['source'] = self.get_source()
# optional meta fields
if 'content' in options:
meta['content'] = self.get_content()
if 'parent' in options:
meta['parent'] = self.get_parent()
if meta['parent'] and 'parent_meta' in options:
options.remove('parent')
parent_type, _, parent_id = meta['parent'].split(':', 3)
if parent_type == 'message':
message = Message(parent_id)
meta['reply_to'] = message.get_meta(options=options, translation_target=translation_target)
if 'investigations' in options:
meta['investigations'] = self.get_investigations()
if 'link' in options:
meta['link'] = self.get_link(flask_context=True)
if 'icon' in options:
meta['icon'] = self.get_svg_icon()
if 'user-account' in options:
meta['user-account'] = self.get_user_account(meta=True)
if not meta['user-account']:
meta['user-account'] = {'id': 'UNKNOWN'}
if 'chat' in options:
meta['chat'] = self.get_chat_id()
if 'thread' in options:
thread = self.get_thread()
if thread:
meta['thread'] = thread
if 'images' in options:
meta['images'] = self.get_images()
if 'files-names' in options:
meta['files-names'] = self.get_files_names()
if 'reactions' in options:
meta['reactions'] = self.get_reactions()
if 'translation' in options and translation_target:
meta['translation'] = self.translate(content=meta.get('content'), target=translation_target)
# meta['encoding'] = None
return meta
# def translate(self, content=None): # TODO translation plugin
# # TODO get text language
# if not content:
# content = self.get_content()
# translated = argostranslate.translate.translate(content, 'ru', 'en')
# # Save translation
# self._set_translation(translated)
# return translated
def create(self, content, translation=None, tags=[]):
self._set_field('content', content)
# r_content.get(f'content:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', content)
if translation:
self._set_translation(translation)
for tag in tags:
self.add_tag(tag)
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
pass
def create_obj_id(chat_instance, chat_id, message_id, timestamp, channel_id=None, thread_id=None):  # TODO CHECK COLLISIONS
    timestamp = int(timestamp)
    if channel_id and thread_id:
        return f'{chat_instance}/{timestamp}/{channel_id}/{chat_id}/{thread_id}/{message_id}'
    elif channel_id:
        return f'{chat_instance}/{timestamp}/{channel_id}/{chat_id}/{message_id}'
    elif thread_id:
        return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
    else:
        return f'{chat_instance}/{timestamp}/{chat_id}/{message_id}'
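# Illustrative IDs produced by create_obj_id() (all values are hypothetical):
#   create_obj_id('instance-uuid', 'chat1', 'msg9', 1692189934)
#       -> 'instance-uuid/1692189934/chat1/msg9'
#   create_obj_id('instance-uuid', 'chat1', 'msg9', 1692189934, channel_id='chan2')
#       -> 'instance-uuid/1692189934/chan2/chat1/msg9'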
# thread id of message
# thread id of chat
# thread id of subchannel
# TODO Check if already exists
# def create(source, chat_id, message_id, timestamp, content, tags=[]):
def create(obj_id, content, translation=None, tags=[]):
message = Message(obj_id)
# if not message.exists():
message.create(content, translation=translation, tags=tags)
return message
# TODO Encode translation
if __name__ == '__main__':
r = 'test'
print(r)


@ -71,8 +71,15 @@ class Pgp(AbstractSubtypeObject):
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('pgp-meta')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
         if self.subtype == 'key':
             obj_attrs.append(obj.add_attribute('key-id', value=self.id))


@ -9,6 +9,7 @@ import sys
 from hashlib import sha256
 from io import BytesIO
 from flask import url_for
+from pymisp import MISPObject

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
@ -79,15 +80,15 @@ class Screenshot(AbstractObject):
         obj_attrs = []
         obj = MISPObject('file')
-        obj_attrs.append( obj.add_attribute('sha256', value=self.id) )
-        obj_attrs.append( obj.add_attribute('attachment', value=self.id, data=self.get_file_content()) )
+        obj_attrs.append(obj.add_attribute('sha256', value=self.id))
+        obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
         for obj_attr in obj_attrs:
             for tag in self.get_tags():
                 obj_attr.add_tag(tag)
         return obj

     def get_meta(self, options=set()):
-        meta = {'id': self.id}
+        meta = self.get_default_meta()
         meta['img'] = get_screenshot_rel_path(self.id)  ######### # TODO: Rename ME ??????
         meta['tags'] = self.get_tags(r_list=True)
         if 'tags_safe' in options:

bin/lib/objects/Titles.py Executable file

@ -0,0 +1,123 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from hashlib import sha256
from flask import url_for
# import warnings
# warnings.filterwarnings("ignore", category=DeprecationWarning)
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
class Title(AbstractDaterangeObject):
"""
AIL Title Object.
"""
def __init__(self, id):
super(Title, self).__init__('title', id)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_content(self, r_type='str'):
if r_type == 'str':
return self._get_field('content')
elif r_type == 'bytes':
return self._get_field('content').encode()
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
return url
def get_svg_icon(self):
return {'style': 'fas', 'icon': '\uf1dc', 'color': '#3C7CFF', 'radius': 5}
def get_misp_object(self):
obj_attrs = []
obj = MISPObject('tsk-web-history')
first_seen = self.get_first_seen()
last_seen = self.get_last_seen()
if first_seen:
obj.first_seen = first_seen
if last_seen:
obj.last_seen = last_seen
if not first_seen or not last_seen:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
obj_attrs.append(obj.add_attribute('title', value=self.get_content()))
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_meta(self, options=set()):
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['tags'] = self.get_tags(r_list=True)
meta['content'] = self.get_content()
return meta
def create(self, content, _first_seen=None, _last_seen=None):
self._set_field('content', content)
self._create()
def create_title(content):
title_id = sha256(content.encode()).hexdigest()
title = Title(title_id)
if not title.exists():
title.create(content)
return title
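# Illustrative usage (content is made up):
#   title = create_title('Welcome to my hidden service')
#   title.id  # -> sha256 hex digest of the title string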
class Titles(AbstractDaterangeObjects):
"""
Titles Objects
"""
def __init__(self):
super().__init__('title', Title)
def sanitize_id_to_search(self, name_to_search):
return name_to_search
# if __name__ == '__main__':
# # from lib import crawlers
# # from lib.objects import Items
# # for item in Items.get_all_items_objects(filters={'sources': ['crawled']}):
# # title_content = crawlers.extract_title_from_html(item.get_content())
# # if title_content:
# # print(item.id, title_content)
# # title = create_title(title_content)
# # title.add(item.get_date(), item.id)
# titles = Titles()
# # for r in titles.get_ids_iterator():
# # print(r)
# r = titles.search_by_id('f7d57B', r_pos=True, case_sensitive=False)
# print(r)


@ -82,8 +82,16 @@ class Username(AbstractSubtypeObject):
         obj = MISPObject('user-account', standalone=True)
         obj_attrs.append(obj.add_attribute('username', value=self.id))
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
+
         for obj_attr in obj_attrs:
             for tag in self.get_tags():
                 obj_attr.add_tag(tag)

bin/lib/objects/UsersAccount.py Executable file

@ -0,0 +1,216 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
# import re
# from datetime import datetime
from flask import url_for
from pymisp import MISPObject
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_subtype_object import AbstractSubtypeObject, get_all_id
from lib.timeline_engine import Timeline
from lib.objects import Usernames
config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None
################################################################################
################################################################################
################################################################################
class UserAccount(AbstractSubtypeObject):
"""
AIL User Object. (strings)
"""
def __init__(self, id, subtype):
super(UserAccount, self).__init__('user-account', id, subtype)
# def get_ail_2_ail_payload(self):
# payload = {'raw': self.get_gzip_content(b64=True),
# 'compress': 'gzip'}
# return payload
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
def delete(self):
# # TODO:
pass
def get_link(self, flask_context=False):
if flask_context:
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
else:
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
return url
def get_svg_icon(self): # TODO change icon/color
return {'style': 'fas', 'icon': '\uf2bd', 'color': '#4dffff', 'radius': 5}
def get_first_name(self):
return self._get_field('firstname')
def get_last_name(self):
return self._get_field('lastname')
def get_phone(self):
return self._get_field('phone')
def set_first_name(self, firstname):
return self._set_field('firstname', firstname)
def set_last_name(self, lastname):
return self._set_field('lastname', lastname)
def set_phone(self, phone):
return self._set_field('phone', phone)
def get_icon(self):
icon = self._get_field('icon')
if icon:
return icon.rsplit(':', 1)[1]
def set_icon(self, icon):
self._set_field('icon', icon)
def get_info(self):
return self._get_field('info')
def set_info(self, info):
return self._set_field('info', info)
# def get_created_at(self, date=False):
# created_at = self._get_field('created_at')
# if date and created_at:
# created_at = datetime.fromtimestamp(float(created_at))
# created_at = created_at.isoformat(' ')
# return created_at
# TODO MESSAGES:
# 1) ALL MESSAGES + NB
# 2) ALL MESSAGES TIMESTAMP
# 3) ALL MESSAGES TIMESTAMP By: - chats
# - subchannel
# - thread
def get_chats(self):
chats = self.get_correlation('chat')['chat']
return chats
def get_chat_subchannels(self):
chats = self.get_correlation('chat-subchannel')['chat-subchannel']
return chats
def get_chat_threads(self):
chats = self.get_correlation('chat-thread')['chat-thread']
return chats
def _get_timeline_username(self):
return Timeline(self.get_global_id(), 'username')
def get_username(self):
return self._get_timeline_username().get_last_obj_id()
def get_usernames(self):
return self._get_timeline_username().get_objs_ids()
def update_username_timeline(self, username_global_id, timestamp):
self._get_timeline_username().add_timestamp(timestamp, username_global_id)
def get_messages_by_chat_obj(self, chat_obj):
messages = []
for mess in self.get_correlation_iter_obj(chat_obj, 'message'):
messages.append(f'message:{mess}')
return messages
def get_meta(self, options=set(), translation_target=None): # TODO Username timeline
meta = self._get_meta(options=options)
meta['id'] = self.id
meta['subtype'] = self.subtype
meta['tags'] = self.get_tags(r_list=True) # TODO add in options ????
if 'username' in options:
meta['username'] = self.get_username()
if meta['username']:
_, username_account_subtype, username_account_id = meta['username'].split(':', 3)
if 'username_meta' in options:
meta['username'] = Usernames.Username(username_account_id, username_account_subtype).get_meta()
else:
meta['username'] = {'type': 'username', 'subtype': username_account_subtype, 'id': username_account_id}
if 'usernames' in options:
meta['usernames'] = self.get_usernames()
if 'icon' in options:
meta['icon'] = self.get_icon()
if 'info' in options:
meta['info'] = self.get_info()
if 'translation' in options and translation_target:
meta['translation_info'] = self.translate(meta['info'], field='info', target=translation_target)
# if 'created_at':
# meta['created_at'] = self.get_created_at(date=True)
if 'chats' in options:
meta['chats'] = self.get_chats()
if 'subchannels' in options:
meta['subchannels'] = self.get_chat_subchannels()
if 'threads' in options:
meta['threads'] = self.get_chat_threads()
return meta
def get_misp_object(self):
obj_attrs = []
if self.subtype == 'telegram':
obj = MISPObject('telegram-account', standalone=True)
obj_attrs.append(obj.add_attribute('username', value=self.id))
elif self.subtype == 'twitter':
obj = MISPObject('twitter-account', standalone=True)
obj_attrs.append(obj.add_attribute('name', value=self.id))
else:
obj = MISPObject('user-account', standalone=True)
obj_attrs.append(obj.add_attribute('username', value=self.id))
first_seen = self.get_first_seen()
last_seen = self.get_last_seen()
if first_seen:
obj.first_seen = first_seen
if last_seen:
obj.last_seen = last_seen
if not first_seen or not last_seen:
self.logger.warning(
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
for obj_attr in obj_attrs:
for tag in self.get_tags():
obj_attr.add_tag(tag)
return obj
def get_user_by_username():
pass
def get_all_subtypes():
return ail_core.get_object_all_subtypes('user-account')
def get_all():
users = {}
for subtype in get_all_subtypes():
users[subtype] = get_all_by_subtype(subtype)
return users
def get_all_by_subtype(subtype):
return get_all_id('user-account', subtype)
if __name__ == '__main__':
from lib.objects import Chats
chat = Chats.Chat('', '00098785-7e70-5d12-a120-c5cdc1252b2b')
account = UserAccount('', '00098785-7e70-5d12-a120-c5cdc1252b2b')
print(account.get_messages_by_chat_obj(chat))
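A minimal usage sketch of the new `UserAccount` class, assuming a configured AIL environment (`AIL_BIN` set and Kvrocks running); the subtype and id are hypothetical:

```
import os
import sys
sys.path.append(os.environ['AIL_BIN'])
from lib.objects.UsersAccount import UserAccount

# Hypothetical Telegram account id
account = UserAccount('0000-1111', 'telegram')
account.set_first_name('Alice')
account.set_info('example account')

# get_meta only resolves the fields listed in options
meta = account.get_meta(options={'username', 'info'})
print(meta['id'], meta.get('info'))
```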


@@ -0,0 +1,306 @@
# -*-coding:UTF-8 -*
"""
Base Class for AIL Objects
"""
##################################
# Import External packages
##################################
import os
import sys
import time
from abc import ABC
from datetime import datetime
# from flask import url_for
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.objects.abstract_subtype_object import AbstractSubtypeObject
from lib.ail_core import unpack_correl_objs_id, zscan_iter ################
from lib.ConfigLoader import ConfigLoader
from lib.objects import Messages
from packages import Date
# from lib.data_retention_engine import update_obj_date
# LOAD CONFIG
config_loader = ConfigLoader()
r_cache = config_loader.get_redis_conn("Redis_Cache")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
config_loader = None
# # FIXME: SAVE SUBTYPE NAMES ?????
class AbstractChatObject(AbstractSubtypeObject, ABC):
"""
Abstract Subtype Object
"""
def __init__(self, obj_type, id, subtype):
""" Abstract for all the AIL object
:param obj_type: object type (item, ...)
:param id: Object ID
"""
super().__init__(obj_type, id, subtype)
# get useraccount / username
# get users ?
# timeline name ????
# info
# created
# last imported/updated
# TODO get instance
# TODO get protocol
# TODO get network
# TODO get address
def get_chat(self): # require ail object TODO ##
if self.type != 'chat':
parent = self.get_parent()
if parent:
obj_type, _ = parent.split(':', 1)
if obj_type == 'chat':
return parent
def get_subchannels(self):
subchannels = []
if self.type == 'chat': # category ???
for obj_global_id in self.get_childrens():
obj_type, _ = obj_global_id.split(':', 1)
if obj_type == 'chat-subchannel':
subchannels.append(obj_global_id)
return subchannels
def get_nb_subchannels(self):
nb = 0
if self.type == 'chat':
for obj_global_id in self.get_childrens():
obj_type, _ = obj_global_id.split(':', 1)
if obj_type == 'chat-subchannel':
nb += 1
return nb
def get_threads(self):
threads = []
for child in self.get_childrens():
obj_type, obj_subtype, obj_id = child.split(':', 2)
if obj_type == 'chat-thread':
threads.append({'type': obj_type, 'subtype': obj_subtype, 'id': obj_id})
return threads
def get_created_at(self, date=False):
created_at = self._get_field('created_at')
if date and created_at:
created_at = datetime.fromtimestamp(float(created_at))
created_at = created_at.isoformat(' ')
return created_at
def set_created_at(self, timestamp):
self._set_field('created_at', timestamp)
def get_name(self):
name = self._get_field('name')
if not name:
name = ''
return name
def set_name(self, name):
self._set_field('name', name)
def get_icon(self):
icon = self._get_field('icon')
if icon:
return icon.rsplit(':', 1)[1]
def set_icon(self, icon):
self._set_field('icon', icon)
def get_info(self):
return self._get_field('info')
def set_info(self, info):
self._set_field('info', info)
def get_nb_messages(self):
return r_object.zcard(f'messages:{self.type}:{self.subtype}:{self.id}')
def _get_messages(self, nb=-1, page=-1):
if nb < 1:
messages = r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, -1, withscores=True)
nb_pages = 0
page = 1
total = len(messages)
nb_first = 1
nb_last = total
else:
total = r_object.zcard(f'messages:{self.type}:{self.subtype}:{self.id}')
nb_pages = total / nb
if not nb_pages.is_integer():
nb_pages = int(nb_pages) + 1
else:
nb_pages = int(nb_pages)
if page > nb_pages or page < 1:
page = nb_pages
if page > 1:
start = (page - 1) * nb
else:
start = 0
messages = r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', start, start+nb-1, withscores=True)
# if messages:
# messages = reversed(messages)
nb_first = start+1
nb_last = start+nb
if nb_last > total:
nb_last = total
return messages, {'nb': nb, 'page': page, 'nb_pages': nb_pages, 'total': total, 'nb_first': nb_first, 'nb_last': nb_last}
def get_timestamp_first_message(self):
return r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0, withscores=True)
def get_timestamp_last_message(self):
return r_object.zrevrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0, withscores=True)
def get_first_message(self):
return r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0)
def get_last_message(self):
return r_object.zrevrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0)
def get_nb_message_by_hours(self, date_day, nb_day):
hours = []
# start=0, end=23
timestamp = time.mktime(datetime.strptime(date_day, "%Y%m%d").timetuple())
for i in range(24):
timestamp_end = timestamp + 3600
nb_messages = r_object.zcount(f'messages:{self.type}:{self.subtype}:{self.id}', timestamp, timestamp_end)
timestamp = timestamp_end
hours.append({'date': f'{date_day[0:4]}-{date_day[4:6]}-{date_day[6:8]}', 'day': nb_day, 'hour': i, 'count': nb_messages})
return hours
def get_nb_message_by_week(self, date_day):
date_day = Date.get_date_week_by_date(date_day)
week_messages = []
i = 0
for date in Date.daterange_add_days(date_day, 6):
week_messages = week_messages + self.get_nb_message_by_hours(date, i)
i += 1
return week_messages
def get_nb_message_this_week(self):
week_date = Date.get_current_week_day()
return self.get_nb_message_by_week(week_date)
def get_message_meta(self, message, timestamp=None, translation_target='en'): # TODO handle file message
message = Messages.Message(message[9:])
meta = message.get_meta(options={'content', 'files-names', 'images', 'link', 'parent', 'parent_meta', 'reactions', 'thread', 'translation', 'user-account'}, timestamp=timestamp, translation_target=translation_target)
return meta
def get_messages(self, start=0, page=-1, nb=500, unread=False, translation_target='en'): # threads ???? # TODO ADD last/first message timestamp + return page
# TODO return message meta
tags = {}
messages = {}
curr_date = None
try:
nb = int(nb)
except TypeError:
nb = 500
if not page:
page = -1
try:
page = int(page)
except TypeError:
page = 1
mess, pagination = self._get_messages(nb=nb, page=page)
for message in mess:
timestamp = message[1]
date_day = datetime.fromtimestamp(timestamp).strftime('%Y/%m/%d')
if date_day != curr_date:
messages[date_day] = []
curr_date = date_day
mess_dict = self.get_message_meta(message[0], timestamp=timestamp, translation_target=translation_target)
messages[date_day].append(mess_dict)
if mess_dict.get('tags'):
for tag in mess_dict['tags']:
if tag not in tags:
tags[tag] = 0
tags[tag] += 1
return messages, pagination, tags
# TODO REWRITE ADD OR ADD MESSAGE ????
# add
# add message
def get_obj_by_message_id(self, message_id):
return r_object.hget(f'messages:ids:{self.type}:{self.subtype}:{self.id}', message_id)
def add_message_cached_reply(self, reply_id, message_id):
r_cache.sadd(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{reply_id}', message_id)
r_cache.expire(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{reply_id}', 600)
def _get_message_cached_reply(self, message_id):
return r_cache.smembers(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{message_id}')
def get_cached_message_reply(self, message_id):
objs_global_id = []
for mess_id in self._get_message_cached_reply(message_id):
obj_global_id = self.get_obj_by_message_id(mess_id) # TODO CATCH EXCEPTION
if obj_global_id:
objs_global_id.append(obj_global_id)
return objs_global_id
def add_message(self, obj_global_id, message_id, timestamp, reply_id=None):
r_object.hset(f'messages:ids:{self.type}:{self.subtype}:{self.id}', message_id, obj_global_id)
r_object.zadd(f'messages:{self.type}:{self.subtype}:{self.id}', {obj_global_id: float(timestamp)})
# MESSAGE REPLY
if reply_id:
reply_obj = self.get_obj_by_message_id(reply_id) # TODO CATCH EXCEPTION
if reply_obj:
self.add_obj_children(reply_obj, obj_global_id)
else:
self.add_message_cached_reply(reply_id, message_id)
# CACHED REPLIES
for mess_id in self.get_cached_message_reply(message_id):
self.add_obj_children(obj_global_id, mess_id)
# def get_deleted_messages(self, message_id):
def get_participants(self):
return unpack_correl_objs_id('user-account', self.get_correlation('user-account')['user-account'], r_type='dict')
def get_nb_participants(self):
return self.get_nb_correlation('user-account')
# TODO move me to abstract subtype
class AbstractChatObjects(ABC):
def __init__(self, type):
self.type = type
def add_subtype(self, subtype):
r_object.sadd(f'all_{self.type}:subtypes', subtype)
def get_subtypes(self):
return r_object.smembers(f'all_{self.type}:subtypes')
def get_nb_ids_by_subtype(self, subtype):
return r_object.zcard(f'{self.type}_all:{subtype}')
def get_ids_by_subtype(self, subtype):
return r_object.zrange(f'{self.type}_all:{subtype}', 0, -1)
def get_all_id_iterator_iter(self, subtype):
return zscan_iter(r_object, f'{self.type}_all:{subtype}')
def get_ids(self):
pass
def search(self):
pass
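The `_get_messages` pagination above clamps the requested page into `[1, nb_pages]` and slices the sorted set by rank. The same arithmetic, sketched over a plain Python list instead of a Kvrocks sorted set (names are hypothetical):

```
def paginate(items, nb=500, page=1):
    # Mirror of the rank arithmetic in _get_messages, on a plain list
    total = len(items)
    nb_pages = total // nb + (1 if total % nb else 0)
    if page > nb_pages or page < 1:
        page = nb_pages
    start = (page - 1) * nb if page > 1 else 0
    selection = items[start:start + nb]
    return selection, {'page': page, 'nb_pages': nb_pages,
                       'nb_first': start + 1,
                       'nb_last': min(start + nb, total)}

msgs, pagination = paginate(list(range(1234)), nb=500, page=2)
print(pagination)  # {'page': 2, 'nb_pages': 3, 'nb_first': 501, 'nb_last': 1000}
```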


@@ -7,6 +7,7 @@ Base Class for AIL Objects
 # Import External packages
 ##################################
 import os
+import re
 import sys
 from abc import abstractmethod, ABC
@@ -44,8 +45,14 @@ class AbstractDaterangeObject(AbstractObject, ABC):
     def exists(self):
         return r_object.exists(f'meta:{self.type}:{self.id}')

+    def _get_field(self, field):  # TODO remove me (NEW in abstract)
+        return r_object.hget(f'meta:{self.type}:{self.id}', field)
+
+    def _set_field(self, field, value):  # TODO remove me (NEW in abstract)
+        return r_object.hset(f'meta:{self.type}:{self.id}', field, value)
+
     def get_first_seen(self, r_int=False):
-        first_seen = r_object.hget(f'meta:{self.type}:{self.id}', 'first_seen')
+        first_seen = self._get_field('first_seen')
         if r_int:
             if first_seen:
                 return int(first_seen)
@@ -55,7 +62,7 @@ class AbstractDaterangeObject(AbstractObject, ABC):
             return first_seen

     def get_last_seen(self, r_int=False):
-        last_seen = r_object.hget(f'meta:{self.type}:{self.id}', 'last_seen')
+        last_seen = self._get_field('last_seen')
         if r_int:
             if last_seen:
                 return int(last_seen)
@@ -64,8 +71,8 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         else:
             return last_seen

-    def get_nb_seen(self):
-        return self.get_nb_correlation('item')
+    def get_nb_seen(self):  # TODO REPLACE ME -> correlation image
+        return self.get_nb_correlation('item') + self.get_nb_correlation('message')

     def get_nb_seen_by_date(self, date):
         nb = r_object.zscore(f'{self.type}:date:{date}', self.id)
@@ -75,18 +82,19 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         return int(nb)

     def _get_meta(self, options=[]):
-        meta_dict = {'first_seen': self.get_first_seen(),
-                     'last_seen': self.get_last_seen(),
-                     'nb_seen': self.get_nb_seen()}
+        meta_dict = self.get_default_meta()
+        meta_dict['first_seen'] = self.get_first_seen()
+        meta_dict['last_seen'] = self.get_last_seen()
+        meta_dict['nb_seen'] = self.get_nb_seen()
         if 'sparkline' in options:
             meta_dict['sparkline'] = self.get_sparkline()
         return meta_dict

     def set_first_seen(self, first_seen):
-        r_object.hset(f'meta:{self.type}:{self.id}', 'first_seen', first_seen)
+        self._set_field('first_seen', first_seen)

     def set_last_seen(self, last_seen):
-        r_object.hset(f'meta:{self.type}:{self.id}', 'last_seen', last_seen)
+        self._set_field('last_seen', last_seen)

     def update_daterange(self, date):
         date = int(date)
@@ -117,9 +125,7 @@ class AbstractDaterangeObject(AbstractObject, ABC):
     def _add_create(self):
         r_object.sadd(f'{self.type}:all', self.id)

-    # TODO don't increase nb if same hash in item with different encoding
-    # if hash already in item
-    def _add(self, date, item_id):
+    def _add(self, date, obj):  # TODO OBJ=None
         if not self.exists():
             self._add_create()
             self.set_first_seen(date)
@@ -128,22 +134,132 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         self.update_daterange(date)
         update_obj_date(date, self.type)
-        if not self.is_correlated('item', '', item_id):  # if decoded not already in object
-            r_object.zincrby(f'{self.type}:date:{date}', 1, self.id)
-        # Correlations
-        self.add_correlation('item', '', item_id)
-        if is_crawled(item_id):
-            domain = get_item_domain(item_id)
-            self.add_correlation('domain', '', domain)
+        # NB Object seen by day
+        r_object.zincrby(f'{self.type}:date:{date}', 1, self.id)
+        if obj:
+            # Correlations
+            self.add_correlation(obj.type, obj.get_subtype(r_str=True), obj.get_id())
+            # Domain
+            if obj.type == 'item':
+                item_id = obj.get_id()
+                # domain
+                if is_crawled(item_id):
+                    domain = get_item_domain(item_id)
+                    self.add_correlation('domain', '', domain)
+
+    def add(self, date, obj):
+        self._add(date, obj)

     # TODO:ADD objects + Stats
-    def _create(self, first_seen, last_seen):
-        self.set_first_seen(first_seen)
-        self.set_last_seen(last_seen)
+    def _create(self, first_seen=None, last_seen=None):
+        if first_seen:
+            self.set_first_seen(first_seen)
+        if last_seen:
+            self.set_last_seen(last_seen)
         r_object.sadd(f'{self.type}:all', self.id)

     # TODO
     def _delete(self):
         pass
class AbstractDaterangeObjects(ABC):
"""
Abstract Daterange Objects
"""
def __init__(self, obj_type, obj_class):
""" Abstract for Daterange Objects
:param obj_type: object type (item, ...)
:param obj_class: object python class (Item, ...)
"""
self.type = obj_type
self.obj_class = obj_class
def get_ids(self):
return r_object.smembers(f'{self.type}:all')
# def get_ids_iterator(self):
# return r_object.sscan_iter(r_object, f'{self.type}:all')
def get_by_date(self, date):
return r_object.zrange(f'{self.type}:date:{date}', 0, -1)
def get_nb_by_date(self, date):
return r_object.zcard(f'{self.type}:date:{date}')
def get_by_daterange(self, date_from, date_to):
obj_ids = set()
for date in Date.substract_date(date_from, date_to):
obj_ids = obj_ids | set(self.get_by_date(date))
return obj_ids
def get_metas(self, obj_ids, options=set()):
dict_obj = {}
for obj_id in obj_ids:
obj = self.obj_class(obj_id)
dict_obj[obj_id] = obj.get_meta(options=options)
return dict_obj
@abstractmethod
def sanitize_id_to_search(self, id_to_search):
return id_to_search
def search_by_id(self, name_to_search, r_pos=False, case_sensitive=True):
objs = {}
if case_sensitive:
flags = 0
else:
flags = re.IGNORECASE
# for subtype in subtypes:
r_name = self.sanitize_id_to_search(name_to_search)
if not name_to_search or isinstance(r_name, dict):
return objs
r_name = re.compile(r_name, flags=flags)
for obj_id in self.get_ids(): # TODO REPLACE ME WITH AN ITERATOR
res = re.search(r_name, obj_id)
if res:
objs[obj_id] = {}
if r_pos:
objs[obj_id]['hl-start'] = res.start()
objs[obj_id]['hl-end'] = res.end()
return objs
def sanitize_content_to_search(self, content_to_search):
return content_to_search
def search_by_content(self, content_to_search, r_pos=False, case_sensitive=True):
objs = {}
if case_sensitive:
flags = 0
else:
flags = re.IGNORECASE
# for subtype in subtypes:
r_search = self.sanitize_content_to_search(content_to_search)
if not r_search or isinstance(r_search, dict):
return objs
r_search = re.compile(r_search, flags=flags)
for obj_id in self.get_ids(): # TODO REPLACE ME WITH AN ITERATOR
obj = self.obj_class(obj_id)
content = obj.get_content()
res = re.search(r_search, content)
if res:
objs[obj_id] = {}
if r_pos: # TODO ADD CONTENT ????
objs[obj_id]['hl-start'] = res.start()
objs[obj_id]['hl-end'] = res.end()
objs[obj_id]['content'] = content
return objs
def api_get_chart_nb_by_daterange(self, date_from, date_to):
date_type = []
for date in Date.substract_date(date_from, date_to):
d = {'date': f'{date[0:4]}-{date[4:6]}-{date[6:8]}',
self.type: self.get_nb_by_date(date)}
date_type.append(d)
return date_type
def api_get_meta_by_daterange(self, date_from, date_to):
date = Date.sanitise_date_range(date_from, date_to)
return self.get_metas(self.get_by_daterange(date['date_from'], date['date_to']), options={'sparkline'})
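The `search_by_id` helper above compiles the sanitized pattern once and records match offsets for highlighting. A standalone sketch of that flow (object ids are hypothetical):

```
import re

def search_ids(ids, pattern, case_sensitive=True, r_pos=False):
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags=flags)
    objs = {}
    for obj_id in ids:
        res = regex.search(obj_id)
        if res:
            objs[obj_id] = {}
            if r_pos:  # offsets used for highlighting in the UI
                objs[obj_id]['hl-start'] = res.start()
                objs[obj_id]['hl-end'] = res.end()
    return objs

print(search_ids(['CVE-2023-1234', 'CVE-2024-0001'], r'2024', r_pos=True))
```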


@@ -7,6 +7,7 @@ Base Class for AIL Objects
 # Import External packages
 ##################################
 import os
+import logging.config
 import sys
 from abc import ABC, abstractmethod
 from pymisp import MISPObject
@@ -17,23 +18,28 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
+from lib import ail_logger
 from lib import Tag
+from lib.ConfigLoader import ConfigLoader
 from lib import Duplicate
-from lib.correlations_engine import get_nb_correlations, get_correlations, add_obj_correlation, delete_obj_correlation, delete_obj_correlations, exists_obj_correlation, is_obj_correlated, get_nb_correlation_by_correl_type
+from lib.correlations_engine import get_nb_correlations, get_correlations, add_obj_correlation, delete_obj_correlation, delete_obj_correlations, exists_obj_correlation, is_obj_correlated, get_nb_correlation_by_correl_type, get_obj_inter_correlation
 from lib.Investigations import is_object_investigated, get_obj_investigations, delete_obj_investigations
+from lib.relationships_engine import get_obj_nb_relationships, add_obj_relationship
+from lib.Language import get_obj_translation
 from lib.Tracker import is_obj_tracked, get_obj_trackers, delete_obj_trackers

+logging.config.dictConfig(ail_logger.get_config(name='ail'))
+
+config_loader = ConfigLoader()
+# r_cache = config_loader.get_redis_conn("Redis_Cache")
+r_object = config_loader.get_db_conn("Kvrocks_Objects")
+config_loader = None

 class AbstractObject(ABC):
     """
     Abstract Object
     """

-    # first seen last/seen ??
-    # # TODO: - tags
-    #         - handle + refactor correlations
-    #         - creates others objects
-
     def __init__(self, obj_type, id, subtype=None):
         """ Abstract for all the AIL object
@@ -44,6 +50,8 @@ class AbstractObject(ABC):
         self.type = obj_type
         self.subtype = subtype

+        self.logger = logging.getLogger(f'{self.__class__.__name__}')
+
     def get_id(self):
         return self.id
@@ -59,14 +67,28 @@ class AbstractObject(ABC):
     def get_global_id(self):
         return f'{self.get_type()}:{self.get_subtype(r_str=True)}:{self.get_id()}'

-    def get_default_meta(self, tags=False):
+    def get_default_meta(self, tags=False, link=False):
         dict_meta = {'id': self.get_id(),
                      'type': self.get_type(),
-                     'subtype': self.get_subtype()}
+                     'subtype': self.get_subtype(r_str=True)}
         if tags:
             dict_meta['tags'] = self.get_tags()
+        if link:
+            dict_meta['link'] = self.get_link()
         return dict_meta

+    def _get_field(self, field):
+        if self.subtype is None:
+            return r_object.hget(f'meta:{self.type}:{self.id}', field)
+        else:
+            return r_object.hget(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', field)
+
+    def _set_field(self, field, value):
+        if self.subtype is None:
+            return r_object.hset(f'meta:{self.type}:{self.id}', field, value)
+        else:
+            return r_object.hset(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', field, value)
+
     ## Tags ##
     def get_tags(self, r_list=False):
         tags = Tag.get_object_tags(self.type, self.id, self.get_subtype(r_str=True))
@@ -74,7 +96,6 @@ class AbstractObject(ABC):
             tags = list(tags)
         return tags

-    ## ADD TAGS ????
     def add_tag(self, tag):
         Tag.add_object_tag(tag, self.type, self.id, subtype=self.get_subtype(r_str=True))
@@ -83,7 +104,7 @@ class AbstractObject(ABC):
         tags = self.get_tags()
         return Tag.is_tags_safe(tags)

-    #- Tags -#
+    ## -Tags- ##

     @abstractmethod
     def get_content(self):
@@ -98,10 +119,9 @@ class AbstractObject(ABC):
     def add_duplicate(self, algo, similarity, id_2):
         return Duplicate.add_obj_duplicate(algo, similarity, self.type, self.get_subtype(r_str=True), self.id, id_2)

-    # -Duplicates -#
+    ## -Duplicates- ##

     ## Investigations ##
-    # # TODO: unregister =====

     def is_investigated(self):
         if not self.subtype:
@@ -124,7 +144,7 @@ class AbstractObject(ABC):
             unregistered = delete_obj_investigations(self.id, self.type, self.subtype)
         return unregistered

-    #- Investigations -#
+    ## -Investigations- ##

     ## Trackers ##
@@ -137,7 +157,7 @@ class AbstractObject(ABC):
     def delete_trackers(self):
         return delete_obj_trackers(self.type, self.subtype, self.id)

-    #- Trackers -#
+    ## -Trackers- ##

     def _delete(self):
         # DELETE TAGS
@@ -186,15 +206,6 @@ class AbstractObject(ABC):
     def get_misp_object(self):
         pass

-    @staticmethod
-    def get_misp_object_first_last_seen(misp_obj):
-        """
-        :type misp_obj: MISPObject
-        """
-        first_seen = misp_obj.get('first_seen')
-        last_seen = misp_obj.get('last_seen')
-        return first_seen, last_seen
-
     @staticmethod
     def get_misp_object_tags(misp_obj):
         """
@@ -209,6 +220,8 @@ class AbstractObject(ABC):
         else:
             return []

+    ## Correlation ##
+
     def _get_external_correlation(self, req_type, req_subtype, req_id, obj_type):
         """
         Get object correlation
@@ -259,13 +272,79 @@ class AbstractObject(ABC):
         return is_obj_correlated(self.type, self.subtype, self.id,
                                  object2.get_type(), object2.get_subtype(r_str=True), object2.get_id())

+    def get_correlation_iter(self, obj_type2, subtype2, obj_id2, correl_type):
+        return get_obj_inter_correlation(self.type, self.get_subtype(r_str=True), self.id, obj_type2, subtype2, obj_id2, correl_type)
+
+    def get_correlation_iter_obj(self, object2, correl_type):
+        return self.get_correlation_iter(object2.get_type(), object2.get_subtype(r_str=True), object2.get_id(), correl_type)
+
     def delete_correlation(self, type2, subtype2, id2):
         """
         Get object correlations
         """
         delete_obj_correlation(self.type, self.subtype, self.id, type2, subtype2, id2)

-    # # TODO: get favicon
-    # # TODO: get url
-    # # TODO: get metadata
+    ## -Correlation- ##
+
+    ## Relationship ##
+
+    def get_nb_relationships(self, filter=[]):
+        return get_obj_nb_relationships(self.get_global_id())
+
+    def add_relationship(self, obj2_global_id, relationship, source=True):
+        # is source
+        if source:
+            print(self.get_global_id(), obj2_global_id, relationship)
+            add_obj_relationship(self.get_global_id(), obj2_global_id, relationship)
+        # is target
+        else:
+            add_obj_relationship(obj2_global_id, self.get_global_id(), relationship)
+
+    ## -Relationship- ##
+
+    ## Translation ##
+
+    def translate(self, content=None, field='', source=None, target='en'):
+        global_id = self.get_global_id()
+        if not content:
+            content = self.get_content()
+        return get_obj_translation(global_id, content, field=field, source=source, target=target)
+
+    ## -Translation- ##
+
+    ## Parent ##
+
+    def is_parent(self):
+        return r_object.exists(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')
+
+    def is_children(self):
+        return r_object.hexists(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent')
+
+    def get_parent(self):
+        return r_object.hget(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent')
+
+    def get_childrens(self):
+        return r_object.smembers(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')
+
+    def set_parent(self, obj_type=None, obj_subtype=None, obj_id=None, obj_global_id=None):  # TODO # REMOVE ITEM DUP
+        if not obj_global_id:
+            if obj_subtype is None:
+                obj_subtype = ''
+            obj_global_id = f'{obj_type}:{obj_subtype}:{obj_id}'
+        r_object.hset(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent', obj_global_id)
+        r_object.sadd(f'child:{obj_global_id}', self.get_global_id())
+
+    def add_children(self, obj_type=None, obj_subtype=None, obj_id=None, obj_global_id=None):  # TODO # REMOVE ITEM DUP
+        if not obj_global_id:
+            if obj_subtype is None:
+                obj_subtype = ''
+            obj_global_id = f'{obj_type}:{obj_subtype}:{obj_id}'
+        r_object.sadd(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', obj_global_id)
+        r_object.hset(f'meta:{obj_global_id}', 'parent', self.get_global_id())
+
+    ## others objects ##
+    def add_obj_children(self, parent_global_id, son_global_id):
+        r_object.sadd(f'child:{parent_global_id}', son_global_id)
+        r_object.hset(f'meta:{son_global_id}', 'parent', parent_global_id)
+
+    ## Parent ##
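The parent/child helpers above keep two views in sync: a `child:{global_id}` set on the parent side and a `parent` field on the child side. A dictionary-based sketch of that invariant (standalone, no Redis; ids are hypothetical):

```
children = {}   # parent global id -> set of child global ids
parent = {}     # child global id  -> parent global id

def add_obj_children(parent_global_id, child_global_id):
    # Both directions are written together so lookups stay consistent
    children.setdefault(parent_global_id, set()).add(child_global_id)
    parent[child_global_id] = parent_global_id

add_obj_children('chat:telegram:mychat', 'message:telegram:mychat/1234')
assert parent['message:telegram:mychat/1234'] == 'chat:telegram:mychat'
```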


@@ -72,7 +72,10 @@ class AbstractSubtypeObject(AbstractObject, ABC):
         return last_seen

     def get_nb_seen(self):
-        return int(r_object.zscore(f'{self.type}_all:{self.subtype}', self.id))
+        nb = r_object.zscore(f'{self.type}_all:{self.subtype}', self.id)
+        if not nb:
+            nb = 0
+        return int(nb)

     # # TODO: CHECK RESULT
     def get_nb_seen_by_date(self, date_day):
@@ -85,7 +88,10 @@ class AbstractSubtypeObject(AbstractObject, ABC):
     def _get_meta(self, options=None):
         if options is None:
             options = set()
-        meta = {'first_seen': self.get_first_seen(),
+        meta = {'id': self.id,
+                'type': self.type,
+                'subtype': self.subtype,
+                'first_seen': self.get_first_seen(),
                 'last_seen': self.get_last_seen(),
                 'nb_seen': self.get_nb_seen()}
         if 'icon' in options:
@@ -147,8 +153,11 @@ class AbstractSubtypeObject(AbstractObject, ABC):
     #       => data Retention + efficient search
     #
     #
+    def _add_subtype(self):
+        r_object.sadd(f'all_{self.type}:subtypes', self.subtype)

-    def add(self, date, item_id):
+    def add(self, date, obj=None):
+        self._add_subtype()
         self.update_daterange(date)
         update_obj_date(date, self.type, self.subtype)
         # daily
@@ -159,19 +168,21 @@ class AbstractSubtypeObject(AbstractObject, ABC):
         #######################################################################
         #######################################################################

-        # Correlations
-        self.add_correlation('item', '', item_id)
-        # domain
-        if is_crawled(item_id):
-            domain = get_item_domain(item_id)
-            self.add_correlation('domain', '', domain)
+        if obj:
+            # Correlations
+            self.add_correlation(obj.type, obj.get_subtype(r_str=True), obj.get_id())
+            if obj.type == 'item':  # TODO same for message->chat ???
+                item_id = obj.get_id()
+                # domain
+                if is_crawled(item_id):
+                    domain = get_item_domain(item_id)
+                    self.add_correlation('domain', '', domain)

     # TODO:ADD objects + Stats
-    def create(self, first_seen, last_seen):
-        self.set_first_seen(first_seen)
-        self.set_last_seen(last_seen)
+    # def create(self, first_seen, last_seen):
+    #     self.set_first_seen(first_seen)
+    #     self.set_last_seen(last_seen)

     def _delete(self):
         pass


@@ -1,6 +1,5 @@
 #!/usr/bin/env python3
 # -*-coding:UTF-8 -*

 import os
 import sys
@@ -11,16 +10,29 @@ sys.path.append(os.environ['AIL_BIN'])
 from lib.ConfigLoader import ConfigLoader
 from lib.ail_core import get_all_objects, get_object_all_subtypes
 from lib import correlations_engine
+from lib import relationships_engine
 from lib import btc_ail
 from lib import Tag

+from lib.objects import Chats
+from lib.objects import ChatSubChannels
+from lib.objects import ChatThreads
 from lib.objects import CryptoCurrencies
+from lib.objects import CookiesNames
 from lib.objects.Cves import Cve
 from lib.objects.Decodeds import Decoded, get_all_decodeds_objects, get_nb_decodeds_objects
 from lib.objects.Domains import Domain
+from lib.objects import Etags
+from lib.objects.Favicons import Favicon
+from lib.objects import FilesNames
+from lib.objects import HHHashs
 from lib.objects.Items import Item, get_all_items_objects, get_nb_items_objects
+from lib.objects import Images
+from lib.objects.Messages import Message
 from lib.objects import Pgps
 from lib.objects.Screenshots import Screenshot
+from lib.objects import Titles
+from lib.objects.UsersAccount import UserAccount
 from lib.objects import Usernames

 config_loader = ConfigLoader()
@@ -44,23 +56,49 @@ def sanitize_objs_types(objs):
     return l_types

-def get_object(obj_type, subtype, id):
+def get_object(obj_type, subtype, obj_id):
     if obj_type == 'item':
-        return Item(id)
+        return Item(obj_id)
     elif obj_type == 'domain':
-        return Domain(id)
+        return Domain(obj_id)
     elif obj_type == 'decoded':
-        return Decoded(id)
+        return Decoded(obj_id)
+    elif obj_type == 'chat':
+        return Chats.Chat(obj_id, subtype)
+    elif obj_type == 'chat-subchannel':
+        return ChatSubChannels.ChatSubChannel(obj_id, subtype)
+    elif obj_type == 'chat-thread':
+        return ChatThreads.ChatThread(obj_id, subtype)
+    elif obj_type == 'cookie-name':
+        return CookiesNames.CookieName(obj_id)
     elif obj_type == 'cve':
-        return Cve(id)
+        return Cve(obj_id)
+    elif obj_type == 'etag':
+        return Etags.Etag(obj_id)
+    elif obj_type == 'favicon':
+        return Favicon(obj_id)
+    elif obj_type == 'file-name':
+        return FilesNames.FileName(obj_id)
+    elif obj_type == 'hhhash':
+        return HHHashs.HHHash(obj_id)
+    elif obj_type == 'image':
+        return Images.Image(obj_id)
+    elif obj_type == 'message':
+        return Message(obj_id)
     elif obj_type == 'screenshot':
-        return Screenshot(id)
+        return Screenshot(obj_id)
     elif obj_type == 'cryptocurrency':
-        return CryptoCurrencies.CryptoCurrency(id, subtype)
+        return CryptoCurrencies.CryptoCurrency(obj_id, subtype)
     elif obj_type == 'pgp':
-        return Pgps.Pgp(id, subtype)
+        return Pgps.Pgp(obj_id, subtype)
+    elif obj_type == 'title':
+        return Titles.Title(obj_id)
+    elif obj_type == 'user-account':
+        return UserAccount(obj_id, subtype)
     elif obj_type == 'username':
-        return Usernames.Username(id, subtype)
+        return Usernames.Username(obj_id, subtype)
+    else:
+        raise Exception(f'Unknown AIL object: {obj_type} {subtype} {obj_id}')

 def get_objects(objects):
     objs = set()
@@ -93,9 +131,12 @@ def get_obj_global_id(obj_type, subtype, obj_id):
     obj = get_object(obj_type, subtype, obj_id)
     return obj.get_global_id()

+def get_obj_type_subtype_id_from_global_id(global_id):
+    obj_type, subtype, obj_id = global_id.split(':', 2)
+    return obj_type, subtype, obj_id

 def get_obj_from_global_id(global_id):
-    obj = global_id.split(':', 3)
+    obj = get_obj_type_subtype_id_from_global_id(global_id)
     return get_object(obj[0], obj[1], obj[2])
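The helper introduced here makes the `type:subtype:id` encoding explicit: `split(':', 2)` keeps any further colons inside the object id. A quick standalone illustration (the ids are hypothetical):

```
def parse_global_id(global_id):
    obj_type, subtype, obj_id = global_id.split(':', 2)
    return obj_type, subtype, obj_id

# The object id may itself contain ':'; maxsplit=2 preserves it
print(parse_global_id('username:telegram:alice'))
# ('username', 'telegram', 'alice')
print(parse_global_id('item::crawled/2024/02/07/example'))
# ('item', '', 'crawled/2024/02/07/example')
```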
@@ -151,7 +192,7 @@ def get_objects_meta(objs, options=set(), flask_context=False):
             subtype = obj[1]
             obj_id = obj[2]
         else:
-            obj_type, subtype, obj_id = obj.split(':', 2)
+            obj_type, subtype, obj_id = get_obj_type_subtype_id_from_global_id(obj)
         metas.append(get_object_meta(obj_type, subtype, obj_id, options=options, flask_context=flask_context))
     return metas
@@ -160,13 +201,17 @@ def get_object_card_meta(obj_type, subtype, id, related_btc=False):
     obj = get_object(obj_type, subtype, id)
     meta = obj.get_meta()
     meta['icon'] = obj.get_svg_icon()
-    if subtype or obj_type == 'cve':
+    if subtype or obj_type == 'cookie-name' or obj_type == 'cve' or obj_type == 'etag' or obj_type == 'title' or obj_type == 'favicon' or obj_type == 'hhhash':
         meta['sparkline'] = obj.get_sparkline()
         if obj_type == 'cve':
             meta['cve_search'] = obj.get_cve_search()
+        # if obj_type == 'title':
+        #     meta['cve_search'] = obj.get_cve_search()
     if subtype == 'bitcoin' and related_btc:
         meta["related_btc"] = btc_ail.get_bitcoin_info(obj.id)
     if obj.get_type() == 'decoded':
+        meta['mimetype'] = obj.get_mimetype()
+        meta['size'] = obj.get_size()
         meta["vt"] = obj.get_meta_vt()
         meta["vt"]["status"] = obj.is_vt_enabled()
     # TAGS MODAL
@@ -323,8 +368,8 @@ def get_obj_correlations(obj_type, subtype, obj_id):
     obj = get_object(obj_type, subtype, obj_id)
     return obj.get_correlations()

-def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max):
-    if len(objs) < nb_max or nb_max == -1:
+def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max, objs_hidden):
+    if len(objs) < nb_max or nb_max == 0:
         if lvl == 0:
             objs.add((obj_type, subtype, obj_id))
@@ -336,21 +381,27 @@
             for obj2_type in correlations:
                 for str_obj in correlations[obj2_type]:
                     obj2_subtype, obj2_id = str_obj.split(':', 1)
-                    _get_obj_correlations_objs(objs, obj2_type, obj2_subtype, obj2_id, filter_types, lvl, nb_max)
+                    if get_obj_global_id(obj2_type, obj2_subtype, obj2_id) in objs_hidden:
+                        continue  # filter object to hide
+                    _get_obj_correlations_objs(objs, obj2_type, obj2_subtype, obj2_id, filter_types, lvl, nb_max, objs_hidden)

-def get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=[], lvl=0, nb_max=300):
+def get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=[], lvl=0, nb_max=300, objs_hidden=set()):
     objs = set()
-    _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max)
+    _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max, objs_hidden)
     return objs

-def obj_correlations_objs_add_tags(obj_type, subtype, obj_id, tags, filter_types=[], lvl=0, nb_max=300):
-    objs = get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=filter_types, lvl=lvl, nb_max=nb_max)
+def obj_correlations_objs_add_tags(obj_type, subtype, obj_id, tags, filter_types=[], lvl=0, nb_max=300, objs_hidden=set()):
+    objs = get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=filter_types, lvl=lvl, nb_max=nb_max, objs_hidden=objs_hidden)
     # print(objs)
     for obj_tuple in objs:
         obj1_type, subtype1, id1 = obj_tuple
         add_obj_tags(obj1_type, subtype1, id1, tags)
     return objs

+def get_obj_nb_correlations(obj_type, subtype, obj_id, filter_types=[]):
+    obj = get_object(obj_type, subtype, obj_id)
+    return obj.get_nb_correlations(filter_types=filter_types)
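The recursive expansion above walks correlations up to `lvl` hops, stops once `nb_max` objects are collected (0 meaning unlimited), and skips hidden objects. A simplified standalone sketch over an in-memory correlation map (the graph content is hypothetical):

```
def expand(graph, node, lvl, nb_max, hidden=set(), objs=None):
    if objs is None:
        objs = set()
    if (len(objs) < nb_max or nb_max == 0) and node not in hidden:
        objs.add(node)
        if lvl > 0:
            for neighbour in graph.get(node, ()):
                if neighbour in hidden:
                    continue  # same role as the objs_hidden filter
                expand(graph, neighbour, lvl - 1, nb_max, hidden, objs)
    return objs

g = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
print(sorted(expand(g, 'a', lvl=2, nb_max=0)))  # ['a', 'b', 'c', 'd']
```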
################################################################################
################################################################################ TODO
################################################################################

@@ -381,7 +432,7 @@ def create_correlation_graph_links(links_set):

 def create_correlation_graph_nodes(nodes_set, obj_str_id, flask_context=True):
     graph_nodes_list = []
     for node_id in nodes_set:
-        obj_type, subtype, obj_id = node_id.split(';', 2)
+        obj_type, subtype, obj_id = get_obj_type_subtype_id_from_global_id(node_id)
         dict_node = {'id': node_id}
         dict_node['style'] = get_object_svg(obj_type, subtype, obj_id)
@@ -402,17 +453,40 @@ def create_correlation_graph_nodes(nodes_set, obj_str_id, flask_context=True):

 def get_correlations_graph_node(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1,
+                                objs_hidden=set(),
                                 flask_context=False):
-    obj_str_id, nodes, links = correlations_engine.get_correlations_graph_nodes_links(obj_type, subtype, obj_id,
+    obj_str_id, nodes, links, meta = correlations_engine.get_correlations_graph_nodes_links(obj_type, subtype, obj_id,
                                                                                       filter_types=filter_types,
                                                                                       max_nodes=max_nodes, level=level,
+                                                                                      objs_hidden=objs_hidden,
                                                                                       flask_context=flask_context)
+    # print(meta)
+    meta['objs'] = list(meta['objs'])
     return {"nodes": create_correlation_graph_nodes(nodes, obj_str_id, flask_context=flask_context),
-            "links": create_correlation_graph_links(links)}
+            "links": create_correlation_graph_links(links),
+            "meta": meta}

 # --- CORRELATION --- #

+def get_obj_nb_relationships(obj_type, subtype, obj_id, filter_types=[]):
+    obj = get_object(obj_type, subtype, obj_id)
+    return obj.get_nb_relationships(filter=filter_types)
+
+def get_relationships_graph_node(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1,
+                                 objs_hidden=set(),
+                                 flask_context=False):
+    obj_global_id = get_obj_global_id(obj_type, subtype, obj_id)
+    nodes, links, meta = relationships_engine.get_relationship_graph(obj_global_id,
+                                                                     filter_types=filter_types,
+                                                                     max_nodes=max_nodes, level=level,
+                                                                     objs_hidden=objs_hidden)
+    # print(meta)
+    meta['objs'] = list(meta['objs'])
+    return {"nodes": create_correlation_graph_nodes(nodes, obj_global_id, flask_context=flask_context),
+            "links": links,
+            "meta": meta}

 # if __name__ == '__main__':
 #     r = get_objects([{'lvl': 1, 'type': 'item', 'subtype': '', 'id': 'crawled/2020/09/14/circl.lu0f4976a4-dda4-4189-ba11-6618c4a8c951'}])


@@ -113,6 +113,34 @@ def regex_finditer(r_key, regex, item_id, content, max_time=30):
        proc.terminate()
        sys.exit(0)
def _regex_match(r_key, regex, content):
if re.match(regex, content):
r_serv_cache.set(r_key, 1)
r_serv_cache.expire(r_key, 360)
def regex_match(r_key, regex, item_id, content, max_time=30):
proc = Proc(target=_regex_match, args=(r_key, regex, content))
try:
proc.start()
proc.join(max_time)
if proc.is_alive():
proc.terminate()
# Statistics.incr_module_timeout_statistic(r_key)
err_mess = f"{r_key}: processing timeout: {item_id}"
logger.info(err_mess)
return False
else:
if r_serv_cache.exists(r_key):
r_serv_cache.delete(r_key)
return True
else:
r_serv_cache.delete(r_key)
return False
except KeyboardInterrupt:
print("Caught KeyboardInterrupt, terminating regex worker")
proc.terminate()
sys.exit(0)
def _regex_search(r_key, regex, content):
    if re.search(regex, content):
        r_serv_cache.set(r_key, 1)
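The `regex_match` wrapper above runs the regex in a separate process and kills it after `max_time` seconds, guarding against catastrophic backtracking; the result travels back through a short-lived Redis key. A self-contained sketch of the same pattern, using a `multiprocessing.Queue` instead of Redis (function names are hypothetical):

```
import re
import sys
from multiprocessing import Process, Queue

def _worker(q, regex, content):
    # Put a result only on match; an empty queue means no match or timeout
    if re.match(regex, content):
        q.put(True)

def regex_match_timeout(regex, content, max_time=30):
    q = Queue()
    proc = Process(target=_worker, args=(q, regex, content))
    try:
        proc.start()
        proc.join(max_time)
        if proc.is_alive():
            proc.terminate()
            print(f'processing timeout: {regex}', file=sys.stderr)
            return False
        return not q.empty()
    except KeyboardInterrupt:
        proc.terminate()
        raise

print(regex_match_timeout(r'\d+', '12345 leaked keys'))  # True
```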

bin/lib/relationships_engine.py Executable file

@@ -0,0 +1,111 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
config_loader = ConfigLoader()
r_rel = config_loader.get_db_conn("Kvrocks_Relationships")
config_loader = None
RELATIONSHIPS = {
"forward",
"mention"
}
def get_relationships():
return RELATIONSHIPS
def get_obj_relationships_by_type(obj_global_id, relationship):
return r_rel.smembers(f'rel:{relationship}:{obj_global_id}')
def get_obj_nb_relationships_by_type(obj_global_id, relationship):
return r_rel.scard(f'rel:{relationship}:{obj_global_id}')
def get_obj_relationships(obj_global_id):
relationships = []
for relationship in get_relationships():
for rel in get_obj_relationships_by_type(obj_global_id, relationship):
meta = {'relationship': relationship}
direction, obj_id = rel.split(':', 1)
if direction == 'i':
meta['source'] = obj_id
meta['target'] = obj_global_id
else:
meta['target'] = obj_id
meta['source'] = obj_global_id
if not obj_id.startswith('chat'):
continue
meta['id'] = obj_id
# meta['direction'] = direction
relationships.append(meta)
return relationships
def get_obj_nb_relationships(obj_global_id):
nb = {}
for relationship in get_relationships():
nb[relationship] = get_obj_nb_relationships_by_type(obj_global_id, relationship)
return nb
# TODO Filter by obj type ???
def add_obj_relationship(source, target, relationship):
r_rel.sadd(f'rel:{relationship}:{source}', f'o:{target}')
r_rel.sadd(f'rel:{relationship}:{target}', f'i:{source}')
# r_rel.sadd(f'rels:{source}', relationship)
# r_rel.sadd(f'rels:{target}', relationship)
def get_relationship_graph(obj_global_id, filter_types=[], max_nodes=300, level=1, objs_hidden=set()):
links = []
nodes = set()
meta = {'complete': True, 'objs': set()}
done = set()
done_link = set()
_get_relationship_graph(obj_global_id, links, nodes, meta, level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, done=done, done_link=done_link)
return nodes, links, meta
def _get_relationship_graph(obj_global_id, links, nodes, meta, level, max_nodes, filter_types=[], objs_hidden=set(), done=set(), done_link=set()):
meta['objs'].add(obj_global_id)
nodes.add(obj_global_id)
for rel in get_obj_relationships(obj_global_id):
meta['objs'].add(rel['id'])
if rel['id'] in done:
continue
if len(nodes) > max_nodes != 0:
meta['complete'] = False
break
nodes.add(rel['id'])
str_link = f"{rel['source']}{rel['target']}{rel['relationship']}"
if str_link not in done_link:
links.append({"source": rel['source'], "target": rel['target'], "relationship": rel['relationship']})
done_link.add(str_link)
if level > 0:
next_level = level - 1
_get_relationship_graph(rel['id'], links, nodes, meta, next_level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, done=done, done_link=done_link)
# done.add(rel['id'])
if __name__ == '__main__':
source = ''
target = ''
add_obj_relationship(source, target, 'forward')
# print(get_obj_relationships(source))
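Each relationship is stored twice, under both endpoints, with an `i:`/`o:` prefix recording direction; that is what `get_obj_relationships` decodes when it rebuilds `source` and `target`. A dictionary-based sketch of the encoding (ids are hypothetical):

```
rel = {}  # 'rel:{relationship}:{global_id}' -> set of 'i:...'/'o:...' entries

def add_relationship(source, target, relationship):
    # Written under both endpoints so either side can be queried
    rel.setdefault(f'rel:{relationship}:{source}', set()).add(f'o:{target}')
    rel.setdefault(f'rel:{relationship}:{target}', set()).add(f'i:{source}')

add_relationship('chat:telegram:alpha', 'chat:telegram:beta', 'forward')
for entry in rel['rel:forward:chat:telegram:beta']:
    direction, obj_id = entry.split(':', 1)
    # 'i' entries point at the source, 'o' entries at the target
    print('incoming from' if direction == 'i' else 'outgoing to', obj_id)
```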

bin/lib/timeline_engine.py Executable file

@@ -0,0 +1,212 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
from uuid import uuid4
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
config_loader = ConfigLoader()
r_meta = config_loader.get_db_conn("Kvrocks_Timeline")
config_loader = None
# CORRELATION_TYPES_BY_OBJ = {
# "chat": ["item", "username"], # item ???
# "cookie-name": ["domain"],
# "cryptocurrency": ["domain", "item"],
# "cve": ["domain", "item"],
# "decoded": ["domain", "item"],
# "domain": ["cve", "cookie-name", "cryptocurrency", "decoded", "etag", "favicon", "hhhash", "item", "pgp", "title", "screenshot", "username"],
# "etag": ["domain"],
# "favicon": ["domain", "item"],
# "hhhash": ["domain"],
# "item": ["chat", "cve", "cryptocurrency", "decoded", "domain", "favicon", "pgp", "screenshot", "title", "username"],
# "pgp": ["domain", "item"],
# "screenshot": ["domain", "item"],
# "title": ["domain", "item"],
# "username": ["chat", "domain", "item"],
# }
#
# def get_obj_correl_types(obj_type):
# return CORRELATION_TYPES_BY_OBJ.get(obj_type)
# def sanityze_obj_correl_types(obj_type, correl_types):
# obj_correl_types = get_obj_correl_types(obj_type)
# if correl_types:
# correl_types = set(correl_types).intersection(obj_correl_types)
# if not correl_types:
# correl_types = obj_correl_types
# if not correl_types:
# return []
# return correl_types
class Timeline:
def __init__(self, global_id, name):
self.id = global_id
self.name = name
def _get_block_obj_global_id(self, block):
return r_meta.hget(f'block:{self.id}:{self.name}', block)
def _set_block_obj_global_id(self, block, global_id):
return r_meta.hset(f'block:{self.id}:{self.name}', block, global_id)
def _get_block_timestamp(self, block, position):
return r_meta.zscore(f'line:{self.id}:{self.name}', f'{position}:{block}')
def _get_nearest_bloc_inf(self, timestamp):
inf = r_meta.zrevrangebyscore(f'line:{self.id}:{self.name}', float(timestamp), 0, start=0, num=1, withscores=True)
if inf:
inf, score = inf[0]
if inf.startswith('end'):
inf_key = f'start:{inf[4:]}'
inf_score = r_meta.zscore(f'line:{self.id}:{self.name}', inf_key)
if inf_score == score:
inf = inf_key
return inf
else:
return None
def _get_nearest_bloc_sup(self, timestamp):
sup = r_meta.zrangebyscore(f'line:{self.id}:{self.name}', float(timestamp), '+inf', start=0, num=1, withscores=True)
if sup:
sup, score = sup[0]
if sup.startswith('start'):
sup_key = f'end:{sup[6:]}'
sup_score = r_meta.zscore(f'line:{self.id}:{self.name}', sup_key)
if score == sup_score:
sup = sup_key
return sup
else:
return None
def get_first_obj_id(self):
first = r_meta.zrange(f'line:{self.id}:{self.name}', 0, 0)
if first: # start:block
first = first[0]
if first.startswith('start:'):
first = first[6:]
else:
first = first[4:]
return self._get_block_obj_global_id(first)
def get_last_obj_id(self):
last = r_meta.zrevrange(f'line:{self.id}:{self.name}', 0, 0)
if last: # end:block
last = last[0]
if last.startswith('end:'):
last = last[4:]
else:
last = last[6:]
return self._get_block_obj_global_id(last)
def get_objs_ids(self):
objs = set()
for block in r_meta.zrange(f'line:{self.id}:{self.name}', 0, -1):
if block:
if block.startswith('start:'):
objs.add(self._get_block_obj_global_id(block[6:]))
return objs
# def get_objs_ids(self):
# objs = {}
# last_obj_id = None
# for block, timestamp in r_meta.zrange(f'line:{self.id}:{self.name}', 0, -1, withscores=True):
# if block:
# if block.startswith('start:'):
# last_obj_id = self._get_block_obj_global_id(block[6:])
# objs[last_obj_id] = {'first_seen': timestamp}
# else:
# objs[last_obj_id]['last_seen'] = timestamp
# return objs
def _update_bloc(self, block, position, timestamp):
r_meta.zadd(f'line:{self.id}:{self.name}', {f'{position}:{block}': timestamp})
def _add_bloc(self, obj_global_id, timestamp, end=None):
if end:
timestamp_end = end
else:
timestamp_end = timestamp
new_bloc = str(uuid4())
r_meta.zadd(f'line:{self.id}:{self.name}', {f'start:{new_bloc}': timestamp, f'end:{new_bloc}': timestamp_end})
self._set_block_obj_global_id(new_bloc, obj_global_id)
return new_bloc
def add_timestamp(self, timestamp, obj_global_id):
inf = self._get_nearest_bloc_inf(timestamp)
sup = self._get_nearest_bloc_sup(timestamp)
if not inf and not sup:
# create new bloc
new_bloc = self._add_bloc(obj_global_id, timestamp)
return new_bloc
# timestamp < first_seen
elif not inf:
sup_pos, sup_id = sup.split(':')
sup_obj = self._get_block_obj_global_id(sup_id)
if sup_obj == obj_global_id:
self._update_bloc(sup_id, 'start', timestamp)
# create new bloc
else:
new_bloc = self._add_bloc(obj_global_id, timestamp)
return new_bloc
# timestamp > first_seen
elif not sup:
inf_pos, inf_id = inf.split(':')
inf_obj = self._get_block_obj_global_id(inf_id)
if inf_obj == obj_global_id:
self._update_bloc(inf_id, 'end', timestamp)
# create new bloc
else:
new_bloc = self._add_bloc(obj_global_id, timestamp)
return new_bloc
else:
inf_pos, inf_id = inf.split(':')
sup_pos, sup_id = sup.split(':')
inf_obj = self._get_block_obj_global_id(inf_id)
if inf_id == sup_id:
# reduce bloc + create two new bloc
if obj_global_id != inf_obj:
# get end timestamp
sup_timestamp = self._get_block_timestamp(sup_id, 'end')
# reduce original bloc
self._update_bloc(inf_id, 'end', timestamp - 1)
# Insert new bloc
new_bloc = self._add_bloc(obj_global_id, timestamp)
# Recreate end of the first bloc by a new bloc
self._add_bloc(inf_obj, timestamp + 1, end=sup_timestamp)
return new_bloc
# timestamp in existing bloc
else:
return inf_id
# different blocs: expend sup/inf bloc or create a new bloc if
elif inf_pos == 'end' and sup_pos == 'start':
# Extend inf bloc
if obj_global_id == inf_obj:
self._update_bloc(inf_id, 'end', timestamp)
return inf_id
sup_obj = self._get_block_obj_global_id(sup_id)
# Extend sup bloc
if obj_global_id == sup_obj:
self._update_bloc(sup_id, 'start', timestamp)
return sup_id
# create new bloc
new_bloc = self._add_bloc(obj_global_id, timestamp)
return new_bloc
# inf_pos == 'start' and sup_pos == 'end'
# else raise error ???
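The timeline stores `start:`/`end:` markers for each block in one sorted set and maps block uuids to object ids in a hash; `add_timestamp` extends an adjacent block when the object is unchanged and opens a new block otherwise. A hedged usage sketch, assuming a configured AIL environment with the Kvrocks_Timeline database reachable (ids and timestamps are hypothetical):

```
import os
import sys
sys.path.append(os.environ['AIL_BIN'])
from lib.timeline_engine import Timeline

timeline = Timeline('user-account:telegram:0000-1111', 'username')

# Same object over consecutive timestamps -> the existing block is extended
timeline.add_timestamp(1700000000, 'username:telegram:alice')
timeline.add_timestamp(1700000500, 'username:telegram:alice')
# A different object -> a new block is created
timeline.add_timestamp(1700001000, 'username:telegram:alice2')

print(timeline.get_last_obj_id())   # username:telegram:alice2
print(timeline.get_objs_ids())      # both usernames
```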


@@ -47,8 +47,8 @@ class ApiKey(AbstractModule):
         self.logger.info(f"Module {self.module_name} initialized")

     def compute(self, message, r_result=False):
-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         item_content = item.get_content()

         google_api_key = self.regex_findall(self.re_google_api_key, item.get_id(), item_content, r_set=True)
@@ -63,8 +63,8 @@ class ApiKey(AbstractModule):
             print(f'found google api key: {to_print}')
             self.redis_logger.warning(f'{to_print}Checked {len(google_api_key)} found Google API Key;{item.get_id()}')

-            msg = f'infoleak:automatic-detection="google-api-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="google-api-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')

         # # TODO: # FIXME: AWS regex/validate/sanitize KEY + SECRET KEY
         if aws_access_key:
@ -74,12 +74,12 @@ class ApiKey(AbstractModule):
print(f'found AWS secret key') print(f'found AWS secret key')
self.redis_logger.warning(f'{to_print}Checked {len(aws_secret_key)} found AWS secret Key;{item.get_id()}') self.redis_logger.warning(f'{to_print}Checked {len(aws_secret_key)} found AWS secret Key;{item.get_id()}')
msg = f'infoleak:automatic-detection="aws-key";{item.get_id()}' tag = 'infoleak:automatic-detection="aws-key"'
self.add_message_to_queue(msg, 'Tags') self.add_message_to_queue(message=tag, queue='Tags')
# Tags # Tags
msg = f'infoleak:automatic-detection="api-key";{item.get_id()}' tag = 'infoleak:automatic-detection="api-key"'
self.add_message_to_queue(msg, 'Tags') self.add_message_to_queue(message=tag, queue='Tags')
if r_result: if r_result:
return google_api_key, aws_access_key, aws_secret_key return google_api_key, aws_access_key, aws_secret_key
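
This ApiKey hunk shows the refactor pattern repeated across the modules below: compute() no longer parses an object id out of the queue message, the processed object is bound by the queue layer and fetched with get_obj(), and tags are queued without the item id appended. A minimal sketch of the new convention (the class below is an illustrative stand-in, not the framework's AbstractModule):

```
class ExampleModule:
    """Illustrative stand-in for an AbstractModule subclass."""
    def __init__(self, obj):
        self._obj = obj
        self.sent = []

    def get_obj(self):                 # provided by the framework in real modules
        return self._obj

    def add_message_to_queue(self, message=None, queue=None, obj=None):
        self.sent.append((queue, message))

    def compute(self, message, r_result=False):
        score = message                # the message now carries only the payload
        item = self.get_obj()          # the object comes from the queue layer
        if 'AKIA' in item:             # illustrative detection on a plain string
            tag = 'infoleak:automatic-detection="aws-key"'
            self.add_message_to_queue(message=tag, queue='Tags')
```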


@@ -6,14 +6,14 @@ The ZMQ_PubSub_Categ Module
 Each words files created under /files/ are representing categories.
 This modules take these files and compare them to
-the content of an item.
+the content of an obj.

-When a word from a item match one or more of these words file, the filename of
-the item / zhe item id is published/forwarded to the next modules.
+When a word from an obj match one or more of these words file, the filename of
+the obj / the obj id is published/forwarded to the next modules.

 Each category (each files) are representing a dynamic channel.
 This mean that if you create 1000 files under /files/ you'll have 1000 channels
-where every time there is a matching word to a category, the item containing
+where every time there is a matching word to a category, the obj containing
 this word will be pushed to this specific channel.

 ..note:: The channel will have the name of the file created.
@@ -44,7 +44,6 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item

 class Categ(AbstractModule):
@@ -81,27 +80,32 @@ class Categ(AbstractModule):
         self.categ_words = tmp_dict.items()

     def compute(self, message, r_result=False):
-        # Create Item Object
-        item = Item(message)
-        # Get item content
-        content = item.get_content()
+        # Get obj Object
+        obj = self.get_obj()
+        # Get obj content
+        content = obj.get_content()
         categ_found = []

-        # Search for pattern categories in item content
+        # Search for pattern categories in obj content
         for categ, pattern in self.categ_words:
+            if obj.type == 'message':
+                self.add_message_to_queue(message='0', queue=categ)
+            else:
                 found = set(re.findall(pattern, content))
                 lenfound = len(found)
                 if lenfound >= self.matchingThreshold:
                     categ_found.append(categ)
-                    msg = f'{item.get_id()} {lenfound}'
+                    msg = str(lenfound)

                     # Export message to categ queue
                     print(msg, categ)
-                    self.add_message_to_queue(msg, categ)
+                    self.add_message_to_queue(message=msg, queue=categ)

                     self.redis_logger.debug(
-                        f'Categ;{item.get_source()};{item.get_date()};{item.get_basename()};Detected {lenfound} as {categ};{item.get_id()}')
+                        f'Categ;{obj.get_source()};{obj.get_date()};{obj.get_basename()};Detected {lenfound} as {categ};{obj.get_id()}')

         if r_result:
             return categ_found
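
As a rough illustration of the matching above (the word file, pattern and threshold are hypothetical; one file under /files/ corresponds to one category and one output queue):

```
import re

categ_words = [('CreditCards', r'(?i)\b(visa|mastercard|cvv)\b')]
matchingThreshold = 1

content = 'leaked VISA numbers with CVV codes'
for categ, pattern in categ_words:
    found = set(re.findall(pattern, content))
    if len(found) >= matchingThreshold:
        print(f'{len(found)} match(es) -> queue "{categ}"')
```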


@@ -29,7 +29,6 @@ Redis organization:
 import os
 import sys
 import time
-import re
 from datetime import datetime

 from pyfaup.faup import Faup
@@ -85,8 +84,8 @@ class Credential(AbstractModule):

     def compute(self, message):
-        item_id, count = message.split()
-        item = Item(item_id)
+        count = message
+        item = self.get_obj()
         item_content = item.get_content()
@@ -111,8 +110,8 @@ class Credential(AbstractModule):
             print(f"========> Found more than 10 credentials in this file : {item.get_id()}")
             self.redis_logger.warning(to_print)

-            msg = f'infoleak:automatic-detection="credential";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="credential"'
+            self.add_message_to_queue(message=tag, queue='Tags')

         site_occurrence = self.regex_findall(self.regex_site_for_stats, item.get_id(), item_content)


@@ -68,8 +68,8 @@ class CreditCards(AbstractModule):
         return extracted

     def compute(self, message, r_result=False):
-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         content = item.get_content()
         all_cards = self.regex_findall(self.regex, item.id, content)
@@ -90,8 +90,8 @@ class CreditCards(AbstractModule):
             print(mess)
             self.redis_logger.warning(mess)

-            msg = f'infoleak:automatic-detection="credit-card";{item.id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="credit-card"'
+            self.add_message_to_queue(message=tag, queue='Tags')

         if r_result:
             return creditcard_set


@@ -114,7 +114,7 @@ class Cryptocurrencies(AbstractModule, ABC):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         item_id = item.get_id()
         date = item.get_date()
         content = item.get_content()
@@ -130,18 +130,18 @@ class Cryptocurrencies(AbstractModule, ABC):
                 if crypto.is_valid_address():
                     # print(address)
                     is_valid_address = True
-                    crypto.add(date, item_id)
+                    crypto.add(date, item)

             # Check private key
             if is_valid_address:
-                msg = f'{currency["tag"]};{item_id}'
-                self.add_message_to_queue(msg, 'Tags')
+                msg = f'{currency["tag"]}'
+                self.add_message_to_queue(message=msg, queue='Tags')

                 if currency.get('private_key'):
                     private_keys = self.regex_findall(currency['private_key']['regex'], item_id, content)
                     if private_keys:
-                        msg = f'{currency["private_key"]["tag"]};{item_id}'
-                        self.add_message_to_queue(msg, 'Tags')
+                        msg = f'{currency["private_key"]["tag"]}'
+                        self.add_message_to_queue(message=msg, queue='Tags')
                         # debug
                         print(private_keys)


@@ -44,9 +44,8 @@ class CveModule(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        item_id, count = message.split()
-        item = Item(item_id)
+        count = message
+        item = self.get_obj()
         item_id = item.get_id()

         cves = self.regex_findall(self.reg_cve, item_id, item.get_content())
@@ -55,15 +54,15 @@ class CveModule(AbstractModule):
             date = item.get_date()
             for cve_id in cves:
                 cve = Cves.Cve(cve_id)
-                cve.add(date, item_id)
+                cve.add(date, item)

             warning = f'{item_id} contains CVEs {cves}'
             print(warning)
             self.redis_logger.warning(warning)

-            msg = f'infoleak:automatic-detection="cve";{item_id}'
+            tag = 'infoleak:automatic-detection="cve"'
             # Send to Tags Queue
-            self.add_message_to_queue(msg, 'Tags')
+            self.add_message_to_queue(message=tag, queue='Tags')

 if __name__ == '__main__':


@@ -21,7 +21,6 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item
 from lib.objects.Decodeds import Decoded
 from trackers.Tracker_Term import Tracker_Term
 from trackers.Tracker_Regex import Tracker_Regex
@@ -87,18 +86,16 @@ class Decoder(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        item = Item(message)
-        content = item.get_content()
-        date = item.get_date()
+        content = self.obj.get_content()
+        date = self.obj.get_date()
         new_decodeds = []

         for decoder in self.decoder_order:
             find = False
             dname = decoder['name']

-            encodeds = self.regex_findall(decoder['regex'], item.id, content)
-            # PERF remove encoded from item content
+            encodeds = self.regex_findall(decoder['regex'], self.obj.id, content)
+            # PERF remove encoded from obj content
             for encoded in encodeds:
                 content = content.replace(encoded, '', 1)
             encodeds = set(encodeds)
@@ -114,33 +111,34 @@ class Decoder(AbstractModule):
                 if not decoded.exists():
                     mimetype = decoded.guess_mimetype(decoded_file)
                     if not mimetype:
-                        print(sha1_string, item.id)
-                        raise Exception(f'Invalid mimetype: {decoded.id} {item.id}')
+                        print(sha1_string, self.obj.id)
+                        raise Exception(f'Invalid mimetype: {decoded.id} {self.obj.id}')
                     decoded.save_file(decoded_file, mimetype)
                     new_decodeds.append(decoded.id)
                 else:
                     mimetype = decoded.get_mimetype()
-                decoded.add(dname, date, item.id, mimetype=mimetype)
+                decoded.add(date, self.obj, dname, mimetype=mimetype)
                 # new_decodeds.append(decoded.id)
-                self.logger.info(f'{item.id} : {dname} - {decoded.id} - {mimetype}')
+                self.logger.info(f'{self.obj.id} : {dname} - {decoded.id} - {mimetype}')

             if find:
-                self.logger.info(f'{item.id} - {dname}')
+                self.logger.info(f'{self.obj.id} - {dname}')

                 # Send to Tags
-                msg = f'infoleak:automatic-detection="{dname}";{item.id}'
-                self.add_message_to_queue(msg, 'Tags')
+                tag = f'infoleak:automatic-detection="{dname}"'
+                self.add_message_to_queue(message=tag, queue='Tags')

         ####################
         # TRACKERS DECODED
         for decoded_id in new_decodeds:
+            decoded = Decoded(decoded_id)
             try:
-                self.tracker_term.compute(decoded_id, obj_type='decoded')
-                self.tracker_regex.compute(decoded_id, obj_type='decoded')
+                self.tracker_term.compute_manual(decoded)
+                self.tracker_regex.compute_manual(decoded)
             except UnicodeDecodeError:
                 pass
-            self.tracker_yara.compute(decoded_id, obj_type='decoded')
+            self.tracker_yara.compute_manual(decoded)

 if __name__ == '__main__':
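
The per-decoder loop above strips every matched encoded string from the content (the PERF comment) and identifies decoded payloads by SHA-1. A minimal round-trip sketch of that flow for the base64 decoder (sample strings are made up):

```
import base64
import hashlib

content = 'header SGVsbG8gQUlMIQ== footer'
encoded = 'SGVsbG8gQUlMIQ=='                   # as found by the decoder regex
decoded_file = base64.b64decode(encoded)
sha1_string = hashlib.sha1(decoded_file).hexdigest()
content = content.replace(encoded, '', 1)       # PERF: shrink content between decoders
print(sha1_string, decoded_file)
```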


@@ -22,7 +22,6 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
-from lib.objects.Items import Item
 from lib.ConfigLoader import ConfigLoader
 from lib import d4
@@ -42,7 +41,13 @@ class DomClassifier(AbstractModule):
         addr_dns = config_loader.get_config_str("DomClassifier", "dns")

-        self.c = DomainClassifier.domainclassifier.Extract(rawtext="", nameservers=[addr_dns])
+        redis_host = config_loader.get_config_str('Redis_Cache', 'host')
+        redis_port = config_loader.get_config_int('Redis_Cache', 'port')
+        redis_db = config_loader.get_config_int('Redis_Cache', 'db')
+        self.dom_classifier = DomainClassifier.domainclassifier.Extract(rawtext="", nameservers=[addr_dns],
+                                                                        redis_host=redis_host,
+                                                                        redis_port=redis_port, redis_db=redis_db,
+                                                                        re_timeout=30)

         self.cc = config_loader.get_config_str("DomClassifier", "cc")
         self.cc_tld = config_loader.get_config_str("DomClassifier", "cc_tld")
@@ -51,38 +56,42 @@ class DomClassifier(AbstractModule):
         self.logger.info(f"Module: {self.module_name} Launched")

     def compute(self, message, r_result=False):
-        host, item_id = message.split()
-        item = Item(item_id)
+        host = message
+        item = self.get_obj()

         item_basename = item.get_basename()
         item_date = item.get_date()
         item_source = item.get_source()
         try:
-            self.c.text(rawtext=host)
-            print(self.c.domain)
-            self.c.validdomain(passive_dns=True, extended=False)
-            # self.logger.debug(self.c.vdomain)
+            self.dom_classifier.text(rawtext=host)
+            if not self.dom_classifier.domain:
+                return
+            print(self.dom_classifier.domain)
+            self.dom_classifier.validdomain(passive_dns=True, extended=False)
+            # self.logger.debug(self.dom_classifier.vdomain)

-            print(self.c.vdomain)
+            print(self.dom_classifier.vdomain)
             print()

-            if self.c.vdomain and d4.is_passive_dns_enabled():
-                for dns_record in self.c.vdomain:
-                    self.add_message_to_queue(dns_record)
+            if self.dom_classifier.vdomain and d4.is_passive_dns_enabled():
+                for dns_record in self.dom_classifier.vdomain:
+                    self.add_message_to_queue(obj=None, message=dns_record)

-            localizeddomains = self.c.include(expression=self.cc_tld)
+            if self.cc_tld:
+                localizeddomains = self.dom_classifier.include(expression=self.cc_tld)
                 if localizeddomains:
                     print(localizeddomains)
                     self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc_tld};{item.get_id()}")

-            localizeddomains = self.c.localizedomain(cc=self.cc)
+            if self.cc:
+                localizeddomains = self.dom_classifier.localizedomain(cc=self.cc)
                 if localizeddomains:
                     print(localizeddomains)
                     self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc};{item.get_id()}")

             if r_result:
-                return self.c.vdomain
+                return self.dom_classifier.vdomain
         except IOError as err:
             self.redis_logger.error(f"Duplicate;{item_source};{item_date};{item_basename};CRC Checksum Failed")


@@ -52,7 +52,7 @@ class Duplicates(AbstractModule):

     def compute(self, message):
         # IOError: "CRC Checksum Failed on : {id}"
-        item = Item(message)
+        item = self.get_obj()

         # Check file size
         if item.get_size() < self.min_item_size:

bin/modules/Exif.py (new executable file, +66 lines)

@@ -0,0 +1,66 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
"""
The Exif Module
======================

"""

##################################
# Import External packages
##################################
import os
import sys

from PIL import Image, ExifTags

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from modules.abstract_module import AbstractModule


class Exif(AbstractModule):
    """
    Exif module for AIL framework
    """

    def __init__(self):
        super(Exif, self).__init__()

        # Waiting time in seconds between two processed messages
        self.pending_seconds = 1

        # Send module state to logs
        self.logger.info(f'Module {self.module_name} initialized')

    def compute(self, message):
        image = self.get_obj()
        print(image)
        img = Image.open(image.get_filepath())
        img_exif = img.getexif()
        print(img_exif)
        if img_exif:
            self.logger.critical(f'Exif: {self.get_obj().id}')
            gps = img_exif.get(34853)
            print(gps)
            self.logger.critical(f'gps: {gps}')
            for key, val in img_exif.items():
                if key in ExifTags.TAGS:
                    print(f'{ExifTags.TAGS[key]}:{val}')
                    self.logger.critical(f'{ExifTags.TAGS[key]}:{val}')
                else:
                    print(f'{key}:{val}')
                    self.logger.critical(f'{key}:{val}')
            sys.exit(0)

        # tag = 'infoleak:automatic-detection="cve"'
        # Send to Tags Queue
        # self.add_message_to_queue(message=tag, queue='Tags')


if __name__ == '__main__':
    module = Exif()
    module.run()


@@ -79,24 +79,15 @@ class Global(AbstractModule):
         self.time_last_stats = time.time()
         self.processed_item = 0

-    def compute(self, message, r_result=False):
-        # Recovering the streamed message informations
-        splitted = message.split()
-        if len(splitted) == 2:
-            item, gzip64encoded = splitted
-
-            # Remove ITEMS_FOLDER from item path (crawled item + submitted)
-            if self.ITEMS_FOLDER in item:
-                item = item.replace(self.ITEMS_FOLDER, '', 1)
-
-            file_name_item = item.split('/')[-1]
-            if len(file_name_item) > 255:
-                new_file_name_item = '{}{}.gz'.format(file_name_item[:215], str(uuid4()))
-                item = self.rreplace(item, file_name_item, new_file_name_item, 1)
+    def compute(self, message, r_result=False):  # TODO move OBJ ID sanitization to importer
+        # Recovering the streamed message infos
+        gzip64encoded = message
+        if self.obj.type == 'item':
+            if gzip64encoded:

             # Creating the full filepath
-            filename = os.path.join(self.ITEMS_FOLDER, item)
+                filename = os.path.join(self.ITEMS_FOLDER, self.obj.id)
                 filename = os.path.realpath(filename)

                 # Incorrect filename
@@ -109,6 +100,7 @@ class Global(AbstractModule):
                 decoded = base64.standard_b64decode(gzip64encoded)
                 new_file_content = self.gunzip_bytes_obj(filename, decoded)

+                # TODO REWRITE ME
                 if new_file_content:
                     filename = self.check_filename(filename, new_file_content)

@@ -121,31 +113,24 @@ class Global(AbstractModule):
                     with open(filename, 'wb') as f:
                         f.write(decoded)

-                    item_id = filename
-                    # remove self.ITEMS_FOLDER from
-                    if self.ITEMS_FOLDER in item_id:
-                        item_id = item_id.replace(self.ITEMS_FOLDER, '', 1)
-
-                    item = Item(item_id)
-                    update_obj_date(item.get_date(), 'item')
-
-                    self.add_message_to_queue(item_id, 'Item')
+                    update_obj_date(self.obj.get_date(), 'item')
+
+                    self.add_message_to_queue(obj=self.obj, queue='Item')
                     self.processed_item += 1

-                    # DIRTY FIX AIL SYNC - SEND TO SYNC MODULE
-                    # # FIXME: DIRTY FIX
-                    message = f'{item.get_type()};{item.get_subtype(r_str=True)};{item.get_id()}'
-                    print(message)
-                    self.add_message_to_queue(message, 'Sync')
-
-                    print(item_id)
+                    print(self.obj.id)
                     if r_result:
-                        return item_id
+                        return self.obj.id

             else:
-                self.logger.debug(f"Empty Item: {message} not processed")
-                print(f"Empty Item: {message} not processed")
+                self.logger.info(f"Empty Item: {message} not processed")
+        elif self.obj.type == 'message':
+            # TODO send to specific object queue => image, ...
+            self.add_message_to_queue(obj=self.obj, queue='Item')
+        elif self.obj.type == 'image':
+            self.add_message_to_queue(obj=self.obj, queue='Image')
+        else:
+            self.logger.critical(f"Empty obj: {self.obj} {message} not processed")

     def check_filename(self, filename, new_file_content):
         """


@@ -18,13 +18,14 @@ import os
 import re
 import sys

+import DomainClassifier.domainclassifier

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item

 class Hosts(AbstractModule):
     """
@@ -43,29 +44,29 @@ class Hosts(AbstractModule):
         # Waiting time in seconds between to message processed
         self.pending_seconds = 1

-        self.host_regex = r'\b([a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)\b'
-        re.compile(self.host_regex)
+        redis_host = config_loader.get_config_str('Redis_Cache', 'host')
+        redis_port = config_loader.get_config_int('Redis_Cache', 'port')
+        redis_db = config_loader.get_config_int('Redis_Cache', 'db')
+        self.dom_classifier = DomainClassifier.domainclassifier.Extract(rawtext="",
+                                                                        redis_host=redis_host,
+                                                                        redis_port=redis_port,
+                                                                        redis_db=redis_db,
+                                                                        re_timeout=30)

         self.logger.info(f"Module: {self.module_name} Launched")

     def compute(self, message):
-        item = Item(message)
-        # mimetype = item_basic.get_item_mimetype(item.get_id())
-        # if mimetype.split('/')[0] == "text":
-
-        content = item.get_content()
-        hosts = self.regex_findall(self.host_regex, item.get_id(), content)
-        if hosts:
-            print(f'{len(hosts)} host {item.get_id()}')
-            for host in hosts:
-                # print(host)
-                msg = f'{host} {item.get_id()}'
-                self.add_message_to_queue(msg, 'Host')
+        obj = self.get_obj()
+        content = obj.get_content()
+        self.dom_classifier.text(content)
+        if self.dom_classifier.domain:
+            print(f'{len(self.dom_classifier.domain)} host {obj.get_id()}')
+            # print(self.dom_classifier.domain)
+            for domain in self.dom_classifier.domain:
+                if domain:
+                    self.add_message_to_queue(message=domain, queue='Host')

 if __name__ == '__main__':
     module = Hosts()
     module.run()


@@ -43,7 +43,8 @@ class IPAddress(AbstractModule):
         networks = config_loader.get_config_str("IP", "networks")
         if not networks:
             print('No IP ranges provided')
-            sys.exit(0)
+            # sys.exit(0)
+        else:
             try:
                 for network in networks.split(","):
                     self.ip_networks.add(IPv4Network(network))
@@ -62,7 +63,10 @@ class IPAddress(AbstractModule):
         self.logger.info(f"Module {self.module_name} initialized")

     def compute(self, message, r_result=False):
-        item = Item(message)
+        if not self.ip_networks:
+            return None
+
+        item = self.get_obj()
         content = item.get_content()

         # list of the regex results in the Item
@@ -82,8 +86,8 @@ class IPAddress(AbstractModule):
             self.redis_logger.warning(f'{item.get_id()} contains {item.get_id()} IPs')

             # Tag message with IP
-            msg = f'infoleak:automatic-detection="ip";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="ip"'
+            self.add_message_to_queue(message=tag, queue='Tags')

 if __name__ == "__main__":


@@ -73,7 +73,7 @@ class Iban(AbstractModule):
         return extracted

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         item_id = item.get_id()

         ibans = self.regex_findall(self.iban_regex, item_id, item.get_content())
@@ -97,8 +97,8 @@ class Iban(AbstractModule):
             to_print = f'Iban;{item.get_source()};{item.get_date()};{item.get_basename()};'
             self.redis_logger.warning(f'{to_print}Checked found {len(valid_ibans)} IBAN;{item_id}')
             # Tags
-            msg = f'infoleak:automatic-detection="iban";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="iban"'
+            self.add_message_to_queue(message=tag, queue='Tags')

 if __name__ == '__main__':


@@ -93,12 +93,12 @@ class Indexer(AbstractModule):
             self.last_refresh = time_now

     def compute(self, message):
-        docpath = message.split(" ", -1)[-1]
-
-        item = Item(message)
+        item = self.get_obj()
         item_id = item.get_id()
         item_content = item.get_content()

+        docpath = item_id
+
         self.logger.debug(f"Indexing - {self.indexname}: {docpath}")
         print(f"Indexing - {self.indexname}: {docpath}")


@@ -56,7 +56,7 @@ class Keys(AbstractModule):
         self.pending_seconds = 1

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         content = item.get_content()

         # find = False
@@ -65,107 +65,107 @@ class Keys(AbstractModule):
         if KeyEnum.PGP_MESSAGE.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a PGP enc message')

-            msg = f'infoleak:automatic-detection="pgp-message";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-message"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True
             # find = True

         if KeyEnum.PGP_PUBLIC_KEY_BLOCK.value in content:
-            msg = f'infoleak:automatic-detection="pgp-public-key-block";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-public-key-block"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True

         if KeyEnum.PGP_SIGNATURE.value in content:
-            msg = f'infoleak:automatic-detection="pgp-signature";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-signature"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True

         if KeyEnum.PGP_PRIVATE_KEY_BLOCK.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a pgp private key block message')

-            msg = f'infoleak:automatic-detection="pgp-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True

         if KeyEnum.CERTIFICATE.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a certificate message')

-            msg = f'infoleak:automatic-detection="certificate";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="certificate"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.RSA_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a RSA private key message')
             print('rsa private key message found')

-            msg = f'infoleak:automatic-detection="rsa-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="rsa-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
            # find = True

         if KeyEnum.PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a private key message')
             print('private key message found')

-            msg = f'infoleak:automatic-detection="private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.ENCRYPTED_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an encrypted private key message')
             print('encrypted private key message found')

-            msg = f'infoleak:automatic-detection="encrypted-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="encrypted-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.OPENSSH_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an openssh private key message')
             print('openssh private key message found')

-            msg = f'infoleak:automatic-detection="private-ssh-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="private-ssh-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.SSH2_ENCRYPTED_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an ssh2 private key message')
             print('SSH2 private key message found')

-            msg = f'infoleak:automatic-detection="private-ssh-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="private-ssh-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.OPENVPN_STATIC_KEY_V1.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an openssh private key message')
             print('OpenVPN Static key message found')

-            msg = f'infoleak:automatic-detection="vpn-static-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="vpn-static-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.DSA_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a dsa private key message')

-            msg = f'infoleak:automatic-detection="dsa-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="dsa-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.EC_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an ec private key message')

-            msg = f'infoleak:automatic-detection="ec-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="ec-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.PUBLIC_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a public key message')

-            msg = f'infoleak:automatic-detection="public-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="public-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         # pgp content
         if get_pgp_content:
-            self.add_message_to_queue(item.get_id(), 'PgpDump')
+            self.add_message_to_queue(queue='PgpDump')

         # if find :
         #     # Send to duplicate
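
The KeyEnum values checked above are plain substring markers. As a rough illustration (the marker strings below are the standard ASCII-armor headers, not quoted from this diff):

```
PGP_MESSAGE = '-----BEGIN PGP MESSAGE-----'
RSA_PRIVATE_KEY = '-----BEGIN RSA PRIVATE KEY-----'

content = 'noise ... -----BEGIN PGP MESSAGE----- ... noise'
for marker, tag in ((PGP_MESSAGE, 'pgp-message'),
                    (RSA_PRIVATE_KEY, 'rsa-private-key')):
    if marker in content:
        print(f'infoleak:automatic-detection="{tag}"')
```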


@@ -25,11 +25,14 @@ class Languages(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        item = Item(message)
-        if item.is_crawled():
-            domain = Domain(item.get_domain())
-            for lang in item.get_languages(min_probability=0.8):
-                domain.add_language(lang.language)
+        obj = self.get_obj()
+
+        if obj.type == 'item':
+            if obj.is_crawled():
+                domain = Domain(obj.get_domain())
+                for lang in obj.get_languages(min_probability=0.8, force_gcld3=True):
+                    print(lang)
+                    domain.add_language(lang)

 if __name__ == '__main__':


@@ -25,9 +25,6 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
-from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item
-# from lib import Statistics

 class LibInjection(AbstractModule):
     """docstring for LibInjection module."""
@@ -40,7 +37,8 @@ class LibInjection(AbstractModule):
         self.redis_logger.info(f"Module: {self.module_name} Launched")

     def compute(self, message):
-        url, item_id = message.split()
+        item = self.get_obj()
+        url = message

         self.faup.decode(url)
         url_parsed = self.faup.get()
@@ -68,7 +66,6 @@ class LibInjection(AbstractModule):
         # print(f'query is sqli : {result_query}')

         if result_path['sqli'] is True or result_query['sqli'] is True:
-            item = Item(item_id)
             item_id = item.get_id()
             print(f"Detected (libinjection) SQL in URL: {item_id}")
             print(unquote(url))
@@ -77,8 +74,8 @@ class LibInjection(AbstractModule):
             self.redis_logger.warning(to_print)

             # Add tag
-            msg = f'infoleak:automatic-detection="sql-injection";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="sql-injection"'
+            self.add_message_to_queue(message=tag, queue='Tags')

             # statistics
             # # # TODO: # FIXME: remove me


@@ -45,8 +45,9 @@ class MISP_Thehive_Auto_Push(AbstractModule):
             self.last_refresh = time.time()
             self.redis_logger.info('Tags Auto Push refreshed')

-        item_id, tag = message.split(';', 1)
-        item = Item(item_id)
+        tag = message
+        item = self.get_obj()
+        item_id = item.get_id()

         # enabled
         if 'misp' in self.tags:


@@ -135,11 +135,11 @@ class Mail(AbstractModule):
     # # TODO: sanitize mails
     def compute(self, message):
-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         item_date = item.get_date()

-        mails = self.regex_findall(self.email_regex, item_id, item.get_content())
+        mails = self.regex_findall(self.email_regex, item.id, item.get_content())
         mxdomains_email = {}
         for mail in mails:
             mxdomain = mail.rsplit('@', 1)[1].lower()
@@ -172,13 +172,13 @@ class Mail(AbstractModule):
         # for tld in mx_tlds:
         #     Statistics.add_module_tld_stats_by_date('mail', item_date, tld, mx_tlds[tld])

-        msg = f'Mails;{item.get_source()};{item_date};{item.get_basename()};Checked {num_valid_email} e-mail(s);{item_id}'
+        msg = f'Mails;{item.get_source()};{item_date};{item.get_basename()};Checked {num_valid_email} e-mail(s);{item.id}'
         if num_valid_email > self.mail_threshold:
-            print(f'{item_id} Checked {num_valid_email} e-mail(s)')
+            print(f'{item.id} Checked {num_valid_email} e-mail(s)')
             self.redis_logger.warning(msg)
             # Tags
-            msg = f'infoleak:automatic-detection="mail";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="mail"'
+            self.add_message_to_queue(message=tag, queue='Tags')
         elif num_valid_email > 0:
             self.redis_logger.info(msg)


@@ -9,7 +9,7 @@ This module is consuming the Redis-list created by the ZMQ_Feed_Q Module.

 This module take all the feeds provided in the config.

-Depending on the configuration, this module will process the feed as follow:
+Depending on the configuration, this module will process the feed as follows:
     operation_mode 1: "Avoid any duplicate from any sources"
         - The module maintain a list of content for each item
             - If the content is new, process it
@@ -64,9 +64,6 @@ class Mixer(AbstractModule):
         self.ttl_key = config_loader.get_config_int("Module_Mixer", "ttl_duplicate")
         self.default_feeder_name = config_loader.get_config_str("Module_Mixer", "default_unnamed_feed_name")

-        self.ITEMS_FOLDER = os.path.join(os.environ['AIL_HOME'], config_loader.get_config_str("Directories", "pastes")) + '/'
-        self.ITEMS_FOLDER = os.path.join(os.path.realpath(self.ITEMS_FOLDER), '')
-
         self.nb_processed_items = 0
         self.feeders_processed = {}
         self.feeders_duplicate = {}
@@ -138,30 +135,38 @@ class Mixer(AbstractModule):
     def compute(self, message):
         self.refresh_stats()
+        # obj = self.obj
+        # TODO CHECK IF NOT self.object -> get object global ID from message
         splitted = message.split()
-        # Old Feeder name "feeder>>item_id gzip64encoded"
-        if len(splitted) == 2:
-            item_id, gzip64encoded = splitted
-            try:
-                feeder_name, item_id = item_id.split('>>')
-                feeder_name.replace(" ", "")
-                if 'import_dir' in feeder_name:
-                    feeder_name = feeder_name.split('/')[1]
-            except ValueError:
-                feeder_name = self.default_feeder_name
-        # Feeder name in message: "feeder item_id gzip64encoded"
-        elif len(splitted) == 3:
-            feeder_name, item_id, gzip64encoded = splitted
+        # message -> feeder_name - content
+        # or message -> feeder_name
+
+        # feeder_name - object
+        if len(splitted) == 1:  # feeder_name - object (content already saved)
+            feeder_name = message
+            gzip64encoded = None
+        # Feeder name in message: "feeder obj_id gzip64encoded"
+        elif len(splitted) == 2:  # gzip64encoded content
+            feeder_name, gzip64encoded = splitted
         else:
-            print('Invalid message: not processed')
-            self.logger.debug(f'Invalid Item: {splitted[0]} not processed')
+            self.logger.warning(f'Invalid Message: {splitted} not processed')
             return None

-        # remove absolute path
-        item_id = item_id.replace(self.ITEMS_FOLDER, '', 1)
-
-        relay_message = f'{item_id} {gzip64encoded}'
+        if self.obj.type == 'item':
+            # Remove ITEMS_FOLDER from item path (crawled item + submitted)
+            # Limit basename length
+            obj_id = self.obj.id
+            self.obj.sanitize_id()
+            if self.obj.id != obj_id:
+                self.queue.rename_message_obj(self.obj.id, obj_id)
+
+        relay_message = gzip64encoded
         # print(relay_message)

+        # TODO only work for item object
         # Avoid any duplicate coming from any sources
         if self.operation_mode == 1:
             digest = hashlib.sha1(gzip64encoded.encode('utf8')).hexdigest()
@@ -173,7 +178,7 @@ class Mixer(AbstractModule):
                 self.r_cache.expire(digest, self.ttl_key)

                 self.increase_stat_processed(feeder_name)
-                self.add_message_to_queue(relay_message)
+                self.add_message_to_queue(message=relay_message)

         # Need To Be Fixed, Currently doesn't check the source (-> same as operation 1)
         # # Keep duplicate coming from different sources
@@ -210,7 +215,10 @@ class Mixer(AbstractModule):
         # No Filtering
         else:
             self.increase_stat_processed(feeder_name)
-            self.add_message_to_queue(relay_message)
+            if self.obj.type == 'item':
+                self.add_message_to_queue(obj=self.obj, message=gzip64encoded)
+            else:
+                self.add_message_to_queue(obj=self.obj)

 if __name__ == "__main__":


@@ -42,7 +42,8 @@ class Onion(AbstractModule):
         self.faup = crawlers.get_faup()

         # activate_crawler = p.config.get("Crawler", "activate_crawler")
+        self.har = config_loader.get_config_boolean('Crawler', 'default_har')
+        self.screenshot = config_loader.get_config_boolean('Crawler', 'default_screenshot')

         self.onion_regex = r"((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
         # self.i2p_regex = r"((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
@@ -69,8 +70,8 @@ class Onion(AbstractModule):
         onion_urls = []
         domains = []

-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         item_content = item.get_content()

         # max execution time on regex
@@ -90,8 +91,9 @@ class Onion(AbstractModule):
         if onion_urls:
             if crawlers.is_crawler_activated():
-                for domain in domains:  # TODO LOAD DEFAULT SCREENSHOT + HAR
-                    task_uuid = crawlers.create_task(domain, parent=item.get_id(), priority=0)
+                for domain in domains:
+                    task_uuid = crawlers.create_task(domain, parent=item.get_id(), priority=0,
+                                                     har=self.har, screenshot=self.screenshot)
                     if task_uuid:
                         print(f'{domain} added to crawler queue: {task_uuid}')
             else:
@@ -100,8 +102,8 @@ class Onion(AbstractModule):
                 self.redis_logger.warning(f'{to_print}Detected {len(domains)} .onion(s);{item.get_id()}')

             # TAG Item
-            msg = f'infoleak:automatic-detection="onion";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="onion"'
+            self.add_message_to_queue(message=tag, queue='Tags')

 if __name__ == "__main__":

bin/modules/Pasties.py (new executable file, +144 lines)

@@ -0,0 +1,144 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
"""
The Pasties Module
======================
This module spots domain-pasties services for further processing
"""

##################################
# Import External packages
##################################
import os
import sys
import time

from pyfaup.faup import Faup

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from modules.abstract_module import AbstractModule
from lib.ConfigLoader import ConfigLoader
from lib import crawlers

# TODO add url validator

pasties_blocklist_urls = set()
pasties_domains = {}


class Pasties(AbstractModule):
    """
    Pasties module for AIL framework
    """

    def __init__(self):
        super(Pasties, self).__init__()
        self.faup = Faup()

        config_loader = ConfigLoader()
        self.r_cache = config_loader.get_redis_conn("Redis_Cache")

        self.pasties = {}
        self.urls_blocklist = set()
        self.load_pasties_domains()

        # Send module state to logs
        self.logger.info(f'Module {self.module_name} initialized')

    def load_pasties_domains(self):
        self.pasties = {}
        self.urls_blocklist = set()

        domains_pasties = os.path.join(os.environ['AIL_HOME'], 'files/domains_pasties')
        if os.path.exists(domains_pasties):
            with open(domains_pasties) as f:
                for line in f:
                    url = line.strip()
                    if url:  # TODO validate line
                        self.faup.decode(url)
                        url_decoded = self.faup.get()
                        host = url_decoded['host']
                        # if url_decoded.get('port', ''):
                        #     host = f'{host}:{url_decoded["port"]}'
                        path = url_decoded.get('resource_path', '')
                        # print(url_decoded)
                        if path and path != '/':
                            if path[-1] != '/':
                                path = f'{path}/'
                        else:
                            path = None

                        if host in self.pasties:
                            if path:
                                self.pasties[host].add(path)
                        else:
                            if path:
                                self.pasties[host] = {path}
                            else:
                                self.pasties[host] = set()

        url_blocklist = os.path.join(os.environ['AIL_HOME'], 'files/domains_pasties_blacklist')
        if os.path.exists(url_blocklist):
            with open(url_blocklist) as f:
                for line in f:
                    url = line.strip()
                    self.faup.decode(url)
                    url_decoded = self.faup.get()
                    host = url_decoded['host']
                    # if url_decoded.get('port', ''):
                    #     host = f'{host}:{url_decoded["port"]}'
                    path = url_decoded.get('resource_path', '')
                    url = f'{host}{path}'
                    if url_decoded['query_string']:
                        url = url + url_decoded['query_string']
                    self.urls_blocklist.add(url)

    def send_to_crawler(self, url, obj_id):
        if not self.r_cache.exists(f'{self.module_name}:url:{url}'):
            self.r_cache.set(f'{self.module_name}:url:{url}', int(time.time()))
            self.r_cache.expire(f'{self.module_name}:url:{url}', 86400)
            crawlers.create_task(url, depth=0, har=False, screenshot=False, proxy='force_tor', priority=60, parent=obj_id)

    def compute(self, message):
        url = message  # was message.split(), which would pass a list to faup
        self.faup.decode(url)
        url_decoded = self.faup.get()
        # print(url_decoded)
        url_host = url_decoded['host']
        # if url_decoded.get('port', ''):
        #     url_host = f'{url_host}:{url_decoded["port"]}'
        path = url_decoded.get('resource_path', '')
        if url_host in self.pasties:
            if url.startswith('http://'):
                if url[7:] in self.urls_blocklist:
                    return None
            elif url.startswith('https://'):
                if url[8:] in self.urls_blocklist:
                    return None
            else:
                if url in self.urls_blocklist:
                    return None

            if not self.pasties[url_host]:
                if path and path != '/':
                    print('send to crawler', url_host, url)
                    self.send_to_crawler(url, self.obj.id)
            else:
                if path.endswith('/'):
                    path_end = path[:-1]
                else:
                    path_end = f'{path}/'
                for url_path in self.pasties[url_host]:
                    if path.startswith(url_path):
                        if url_path != path and url_path != path_end:
                            print('send to crawler', url_path, url)
                            self.send_to_crawler(url, self.obj.id)
                            break


if __name__ == '__main__':
    module = Pasties()
    module.run()
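
To make the host/path matching concrete, a hedged walk-through with made-up table entries (a bare host crawls any non-root path; a host with registered prefixes crawls only URLs strictly below a prefix):

```
def should_crawl(pasties, host, path):
    if host not in pasties:
        return False
    prefixes = pasties[host]
    if not prefixes:
        return bool(path and path != '/')
    path_end = path[:-1] if path.endswith('/') else f'{path}/'
    return any(path.startswith(p) and p not in (path, path_end) for p in prefixes)

pasties = {'paste.example.com': set(), 'example.net': {'/paste/'}}
print(should_crawl(pasties, 'paste.example.com', '/abcd'))  # True
print(should_crawl(pasties, 'example.net', '/paste/xyz'))   # True
print(should_crawl(pasties, 'example.net', '/paste/'))      # False (the prefix itself)
```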


@ -24,7 +24,6 @@ sys.path.append(os.environ['AIL_BIN'])
################################## ##################################
from modules.abstract_module import AbstractModule from modules.abstract_module import AbstractModule
from lib.objects import Pgps from lib.objects import Pgps
from lib.objects.Items import Item
from trackers.Tracker_Term import Tracker_Term from trackers.Tracker_Term import Tracker_Term
from trackers.Tracker_Regex import Tracker_Regex from trackers.Tracker_Regex import Tracker_Regex
from trackers.Tracker_Yara import Tracker_Yara from trackers.Tracker_Yara import Tracker_Yara
@ -61,7 +60,6 @@ class PgpDump(AbstractModule):
self.tracker_yara = Tracker_Yara(queue=False) self.tracker_yara = Tracker_Yara(queue=False)
# init # init
self.item_id = None
self.keys = set() self.keys = set()
self.private_keys = set() self.private_keys = set()
self.names = set() self.names = set()
@ -93,11 +91,11 @@ class PgpDump(AbstractModule):
print() print()
pgp_block = self.remove_html(pgp_block) pgp_block = self.remove_html(pgp_block)
# Remove Version # Remove Version
versions = self.regex_findall(self.reg_tool_version, self.item_id, pgp_block) versions = self.regex_findall(self.reg_tool_version, self.obj.id, pgp_block)
for version in versions: for version in versions:
pgp_block = pgp_block.replace(version, '') pgp_block = pgp_block.replace(version, '')
# Remove Comment # Remove Comment
comments = self.regex_findall(self.reg_block_comment, self.item_id, pgp_block) comments = self.regex_findall(self.reg_block_comment, self.obj.id, pgp_block)
for comment in comments: for comment in comments:
pgp_block = pgp_block.replace(comment, '') pgp_block = pgp_block.replace(comment, '')
# Remove Empty Lines # Remove Empty Lines
@ -130,7 +128,7 @@ class PgpDump(AbstractModule):
try: try:
output = output.decode() output = output.decode()
except UnicodeDecodeError: except UnicodeDecodeError:
self.logger.error(f'Error PgpDump UnicodeDecodeError: {self.item_id}') self.logger.error(f'Error PgpDump UnicodeDecodeError: {self.obj.id}')
output = '' output = ''
return output return output
@ -145,7 +143,7 @@ class PgpDump(AbstractModule):
private = True private = True
else: else:
private = False private = False
users = self.regex_findall(self.reg_user_id, self.item_id, pgpdump_output) users = self.regex_findall(self.reg_user_id, self.obj.id, pgpdump_output)
for user in users: for user in users:
# avoid key injection in user_id: # avoid key injection in user_id:
pgpdump_output.replace(user, '', 1) pgpdump_output.replace(user, '', 1)
@ -159,7 +157,7 @@ class PgpDump(AbstractModule):
name = user name = user
self.names.add(name) self.names.add(name)
keys = self.regex_findall(self.reg_key_id, self.item_id, pgpdump_output) keys = self.regex_findall(self.reg_key_id, self.obj.id, pgpdump_output)
for key_id in keys: for key_id in keys:
key_id = key_id.replace('Key ID - ', '', 1) key_id = key_id.replace('Key ID - ', '', 1)
if key_id != '0x0000000000000000': if key_id != '0x0000000000000000':
@ -171,28 +169,26 @@ class PgpDump(AbstractModule):
print('symmetrically encrypted') print('symmetrically encrypted')
def compute(self, message): def compute(self, message):
item = Item(message) content = self.obj.get_content()
self.item_id = item.get_id()
content = item.get_content()
pgp_blocks = [] pgp_blocks = []
# Public Block # Public Block
for pgp_block in self.regex_findall(self.reg_pgp_public_blocs, self.item_id, content): for pgp_block in self.regex_findall(self.reg_pgp_public_blocs, self.obj.id, content):
# content = content.replace(pgp_block, '') # content = content.replace(pgp_block, '')
pgp_block = self.sanitize_pgp_block(pgp_block) pgp_block = self.sanitize_pgp_block(pgp_block)
pgp_blocks.append(pgp_block) pgp_blocks.append(pgp_block)
# Private Block # Private Block
for pgp_block in self.regex_findall(self.reg_pgp_private_blocs, self.item_id, content): for pgp_block in self.regex_findall(self.reg_pgp_private_blocs, self.obj.id, content):
# content = content.replace(pgp_block, '') # content = content.replace(pgp_block, '')
pgp_block = self.sanitize_pgp_block(pgp_block) pgp_block = self.sanitize_pgp_block(pgp_block)
pgp_blocks.append(pgp_block) pgp_blocks.append(pgp_block)
# Signature # Signature
for pgp_block in self.regex_findall(self.reg_pgp_signature, self.item_id, content): for pgp_block in self.regex_findall(self.reg_pgp_signature, self.obj.id, content):
# content = content.replace(pgp_block, '') # content = content.replace(pgp_block, '')
pgp_block = self.sanitize_pgp_block(pgp_block) pgp_block = self.sanitize_pgp_block(pgp_block)
pgp_blocks.append(pgp_block) pgp_blocks.append(pgp_block)
# Message # Message
for pgp_block in self.regex_findall(self.reg_pgp_message, self.item_id, content): for pgp_block in self.regex_findall(self.reg_pgp_message, self.obj.id, content):
pgp_block = self.sanitize_pgp_block(pgp_block) pgp_block = self.sanitize_pgp_block(pgp_block)
pgp_blocks.append(pgp_block) pgp_blocks.append(pgp_block)
@@ -206,26 +202,26 @@ class PgpDump(AbstractModule):
         self.extract_id_from_pgpdump_output(pgpdump_output)

         if self.keys or self.names or self.mails:
-            print(self.item_id)
-            date = item.get_date()
+            print(self.obj.id)
+            date = self.obj.get_date()
             for key in self.keys:
                 pgp = Pgps.Pgp(key, 'key')
-                pgp.add(date, self.item_id)
+                pgp.add(date, self.obj)
                 print(f' key: {key}')
             for name in self.names:
                 pgp = Pgps.Pgp(name, 'name')
-                pgp.add(date, self.item_id)
+                pgp.add(date, self.obj)
                 print(f' name: {name}')
-                self.tracker_term.compute(name, obj_type='pgp', subtype='name')
-                self.tracker_regex.compute(name, obj_type='pgp', subtype='name')
-                self.tracker_yara.compute(name, obj_type='pgp', subtype='name')
+                self.tracker_term.compute_manual(pgp)
+                self.tracker_regex.compute_manual(pgp)
+                self.tracker_yara.compute_manual(pgp)
             for mail in self.mails:
                 pgp = Pgps.Pgp(mail, 'mail')
-                pgp.add(date, self.item_id)
+                pgp.add(date, self.obj)
                 print(f' mail: {mail}')
-                self.tracker_term.compute(mail, obj_type='pgp', subtype='mail')
-                self.tracker_regex.compute(mail, obj_type='pgp', subtype='mail')
-                self.tracker_yara.compute(mail, obj_type='pgp', subtype='mail')
+                self.tracker_term.compute_manual(pgp)
+                self.tracker_regex.compute_manual(pgp)
+                self.tracker_yara.compute_manual(pgp)

         # Keys extracted from PGP PRIVATE KEY BLOCK
         for key in self.private_keys:
@@ -234,11 +230,10 @@ class PgpDump(AbstractModule):
             print(f' private key: {key}')

         if self.symmetrically_encrypted:
-            msg = f'infoleak:automatic-detection="pgp-symmetric";{self.item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-symmetric"'
+            self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == '__main__':
     module = PgpDump()
     module.run()
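The recurring pattern in this PgpDump hunk, and in the modules below, is that the processed object now travels with the queue message (`self.obj` / `self.get_obj()`) instead of being re-parsed out of a compound `payload;item_id` string, and the trackers receive the correlated object directly via `compute_manual(pgp)`. A minimal, self-contained sketch of that contract change; the class and the item path are toys for illustration, not the real `AbstractModule` API:

```python
# Toy illustration of the message contract change in this diff (not AIL code).
class ToyModule:
    def __init__(self):
        self.obj = None  # attached by the queue layer before compute() runs

    def compute_old(self, message):
        # Before: the item id was packed into the message and split back out.
        tag, item_id = message.split(';')
        print(f'{item_id}: {tag}')

    def compute_new(self, message):
        # After: the message is just the payload; the object is already attached.
        print(f'{self.obj}: {message}')

module = ToyModule()
module.obj = 'submitted/2024/02/07/example.gz'  # hypothetical item id
module.compute_old('infoleak:automatic-detection="pgp-symmetric";submitted/2024/02/07/example.gz')
module.compute_new('infoleak:automatic-detection="pgp-symmetric"')
```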


@@ -43,13 +43,13 @@ class Phone(AbstractModule):

     def extract(self, obj_id, content, tag):
         extracted = []
-        phones = self.regex_phone_iter('US', obj_id, content)
+        phones = self.regex_phone_iter('ZZ', obj_id, content)
         for phone in phones:
             extracted.append([phone[0], phone[1], phone[2], f'tag:{tag}'])
         return extracted

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         content = item.get_content()

         # TODO use language detection to choose the country code ?
@@ -59,8 +59,8 @@ class Phone(AbstractModule):
         if results:
             # TAGS
-            msg = f'infoleak:automatic-detection="phone-number";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="phone-number"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             self.redis_logger.warning(f'{item.get_id()} contains {len(phone)} Phone numbers')
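The region switch from 'US' to 'ZZ' is worth spelling out. In the `phonenumbers` library, which this kind of matching appears to build on (treat that as an assumption here), 'ZZ' means "unknown region", so only numbers written in international `+CC ...` format are matched, avoiding the false positives that national-format US matching produces on mixed international feeds. A standalone illustration:

```python
# Standalone illustration, assuming the `phonenumbers` package is installed
# (pip install phonenumbers); this is not AIL code.
import phonenumbers

text = 'Reach me at +33 6 12 34 56 78 or at (202) 555-0171.'

# 'ZZ' (unknown region): only international-format numbers are matched.
print([m.raw_string for m in phonenumbers.PhoneNumberMatcher(text, 'ZZ')])
# expected: ['+33 6 12 34 56 78']

# 'US': national-format US numbers match as well, which is noisier.
print([m.raw_string for m in phonenumbers.PhoneNumberMatcher(text, 'US')])
# expected: ['+33 6 12 34 56 78', '(202) 555-0171']
```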


@@ -44,22 +44,21 @@ class SQLInjectionDetection(AbstractModule):
         self.logger.info(f"Module: {self.module_name} Launched")

     def compute(self, message):
-        url, item_id = message.split()
+        url = message
+        item = self.get_obj()

         if self.is_sql_injection(url):
             self.faup.decode(url)
             url_parsed = self.faup.get()

-            item = Item(item_id)
-            item_id = item.get_id()
             print(f"Detected SQL in URL: {item_id}")
             print(urllib.request.unquote(url))
             to_print = f'SQLInjection;{item.get_source()};{item.get_date()};{item.get_basename()};Detected SQL in URL;{item_id}'
             self.redis_logger.warning(to_print)

             # Tag
-            msg = f'infoleak:automatic-detection="sql-injection";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = f'infoleak:automatic-detection="sql-injection";{item_id}'
+            self.add_message_to_queue(message=tag, queue='Tags')

             # statistics
             # tld = url_parsed['tld']
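As elsewhere in this diff, the queue message shrinks to the payload itself (here, just the URL) while the item is recovered with `self.get_obj()`. The printed URL goes through `urllib.request.unquote`, which is what surfaces injected SQL hidden behind percent-encoding; a stdlib-only illustration:

```python
# Stdlib-only illustration of the unquote step above (not AIL code).
import urllib.request

url = "http://example.com/page?id=1%27%20OR%20%271%27%3D%271"
print(urllib.request.unquote(url))
# expected: http://example.com/page?id=1' OR '1'='1
```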


@@ -16,8 +16,6 @@ import gzip
 import base64
 import datetime
 import time

-# from sflock.main import unpack
-# import sflock

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
@@ -27,7 +25,7 @@ from modules.abstract_module import AbstractModule
 from lib.objects.Items import ITEMS_FOLDER
 from lib import ConfigLoader
 from lib import Tag
+from lib.objects.Items import Item


 class SubmitPaste(AbstractModule):
     """
@@ -48,7 +46,6 @@ class SubmitPaste(AbstractModule):
         """
         super(SubmitPaste, self).__init__()

-        # TODO KVROCKS
         self.r_serv_db = ConfigLoader.ConfigLoader().get_db_conn("Kvrocks_DB")
         self.r_serv_log_submit = ConfigLoader.ConfigLoader().get_redis_conn("Redis_Log_submit")
@@ -279,9 +276,11 @@ class SubmitPaste(AbstractModule):
         rel_item_path = save_path.replace(self.PASTES_FOLDER, '', 1)
         self.redis_logger.debug(f"relative path {rel_item_path}")

+        item = Item(rel_item_path)
         # send paste to Global module
-        relay_message = f"submitted {rel_item_path} {gzip64encoded}"
-        self.add_message_to_queue(relay_message)
+        relay_message = f"submitted {gzip64encoded}"
+        self.add_message_to_queue(obj=item, message=relay_message)

         # add tags
         for tag in ltags:
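Two things change here: the sflock leftovers and the stale TODO go away, and the relay message to the Global module no longer smuggles the relative path, since the freshly built `Item` rides along as the queue object. The `gzip64encoded` name implies the payload is gzip-compressed then base64-encoded; a standalone sketch under that assumption (not the exact AIL helper):

```python
# Standalone sketch of building a 'submitted' relay payload, assuming
# gzip + base64 encoding as the variable name `gzip64encoded` suggests.
import base64
import gzip

raw = b'paste content to submit'
gzip64encoded = base64.b64encode(gzip.compress(raw)).decode()

# Before: relay_message = f"submitted {rel_item_path} {gzip64encoded}"
# After: the path travels as the queue object, not inside the message.
relay_message = f"submitted {gzip64encoded}"
print(relay_message[:50])
```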


@@ -20,9 +20,6 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
-from lib.objects.Items import Item
-from lib import Tag


 class Tags(AbstractModule):
     """
@@ -39,26 +36,15 @@ class Tags(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        # Extract item ID and tag from message
-        mess_split = message.split(';')
-        if len(mess_split) == 2:
-            tag = mess_split[0]
-            item = Item(mess_split[1])
-
-            # Create a new tag
-            Tag.add_object_tag(tag, 'item', item.get_id())
-            print(f'{item.get_id()}: Tagged {tag}')
-
-            # Forward message to channel
-            self.add_message_to_queue(message, 'Tag_feed')
-
-            message = f'{item.get_type()};{item.get_subtype(r_str=True)};{item.get_id()}'
-            self.add_message_to_queue(message, 'Sync')
-        else:
-            # Malformed message
-            raise Exception(f'too many values to unpack (expected 2) given {len(mess_split)} with message {message}')
+        item = self.obj
+        tag = message
+
+        # Create a new tag
+        item.add_tag(tag)
+        print(f'{item.get_id()}: Tagged {tag}')
+
+        # Forward message to channel
+        self.add_message_to_queue(message=tag, queue='Tag_feed')


 if __name__ == '__main__':
     module = Tags()
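The whole split-and-validate dance, including the malformed-message exception, disappears because the message is now the bare tag and the tagged object arrives attached to the module. A plain sketch of the message contract before and after, with toy values:

```python
# Sketch of the Tags message contract before and after this change (toy values).
old_message = 'infoleak:automatic-detection="credential";submitted/2024/02/07/example.gz'
tag, item_id = old_message.split(';')   # old: split, validate, maybe raise

new_message = 'infoleak:automatic-detection="credential"'
tag = new_message                       # new: the message *is* the tag
print(tag)
```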


@@ -41,7 +41,7 @@ class Telegram(AbstractModule):
         self.logger.info(f"Module {self.module_name} initialized")

     def compute(self, message, r_result=False):
-        item = Item(message)
+        item = self.get_obj()
         item_content = item.get_content()
         item_date = item.get_date()
@@ -58,7 +58,7 @@ class Telegram(AbstractModule):
             user_id = dict_url.get('username')
             if user_id:
                 username = Username(user_id, 'telegram')
-                username.add(item_date, item.id)
+                username.add(item_date, item)
                 print(f'username: {user_id}')
             invite_hash = dict_url.get('invite_hash')
             if invite_hash:
@@ -73,7 +73,7 @@ class Telegram(AbstractModule):
             user_id = dict_url.get('username')
             if user_id:
                 username = Username(user_id, 'telegram')
-                username.add(item_date, item.id)
+                username.add(item_date, item)
                 print(f'username: {user_id}')
             invite_hash = dict_url.get('invite_hash')
             if invite_hash:
@@ -86,8 +86,8 @@ class Telegram(AbstractModule):
             # CREATE TAG
             if invite_code_found:
                 # tags
-                msg = f'infoleak:automatic-detection="telegram-invite-hash";{item.id}'
-                self.add_message_to_queue(msg, 'Tags')
+                tag = 'infoleak:automatic-detection="telegram-invite-hash"'
+                self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == "__main__":
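`Username.add` now receives the item object itself rather than its id, mirroring the `Pgps.Pgp.add` change above. The two URL shapes this module walks (plain username vs. invite hash) can be sketched with the stdlib alone; the parsing and the hash value below are illustrative, not AIL's actual extractor:

```python
# Stdlib-only sketch of the two t.me shapes handled above (illustrative).
from urllib.parse import urlparse

for url in ('https://t.me/some_channel',
            'https://t.me/joinchat/AAAAAEkk2WdoDrB4-Q8-gg'):
    path = urlparse(url).path.strip('/')
    if path.startswith('joinchat/'):
        print('invite_hash:', path.split('/', 1)[1])
    else:
        print('username:', path)
```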


@@ -30,15 +30,15 @@ class Template(AbstractModule):

     def __init__(self):
         super(Template, self).__init__()

-        # Pending time between two computation (computeNone) in seconds
-        self.pending_seconds = 10
+        # Pending time between two computation (computeNone) in seconds, 10 by default
+        # self.pending_seconds = 10

-        # Send module state to logs
+        # logs
         self.logger.info(f'Module {self.module_name} initialized')

     # def computeNone(self):
     #     """
-    #     Do something when there is no message in the queue
+    #     Do something when there is no message in the queue. Optional
     #     """
     #     self.logger.debug("No message in queue")
@@ -53,6 +53,5 @@ class Template(AbstractModule):

 if __name__ == '__main__':
     module = Template()
     module.run()
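The template now leans on the inherited defaults: `pending_seconds` stays at 10 unless overridden, and `computeNone` is explicitly optional. A minimal new module written against these conventions might look like the sketch below; it assumes only what the hunks in this diff show (`AbstractModule`, `self.get_obj()`, `self.logger`, `self.module_name`) and is not a drop-in AIL module:

```python
# Minimal module skeleton following the template above (a sketch).
from modules.abstract_module import AbstractModule


class MyDetector(AbstractModule):

    def __init__(self):
        super(MyDetector, self).__init__()
        # pending_seconds defaults to 10; override only if needed
        self.logger.info(f'Module {self.module_name} initialized')

    def compute(self, message):
        # the object to analyse is attached to the module, per this diff
        obj = self.get_obj()
        self.logger.debug(f'{obj.get_id()}: processing {message}')


if __name__ == '__main__':
    module = MyDetector()
    module.run()
```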
