Mirror of https://github.com/ail-project/ail-framework.git (synced 2024-11-22 22:27:17 +00:00)
Merge branch 'master' into gunicorn
Commit 25fb6ca377
313 changed files with 19458 additions and 6237 deletions
1  .gitignore  (vendored)
@@ -16,6 +16,7 @@ tlsh
Blooms
PASTES
CRAWLED_SCREENSHOT
IMAGES
BASE64
HASHS
DATA_ARDB
149  HOWTO.md
@@ -1,143 +1,72 @@
Feeding, adding new features and contributing
=============================================
# Feeding, Adding new features and Contributing

How to feed the AIL framework
-----------------------------
## [AIL Importers](./doc/README.md#ail-importers)

For the moment, there are three different ways to feed AIL with data:
Refer to the [AIL Importers Documentation](./doc/README.md#ail-importers)

1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
2. You can setup [pystemon](https://github.com/cvandeplas/pystemon) and use the custom feeder provided by AIL (see below).
3. You can feed your own data using the [./bin/file_dir_importer.py](./bin/import_dir.py) script.

### Feeding AIL with pystemon
## Feeding Data to AIL

AIL is an analysis tool, not a collector!
However, if you want to collect some pastes and feed them to AIL, the procedure is described below. Nevertheless, moderate your queries!

Feed data to AIL:
1. [AIL Importers](./doc/README.md#ail-importers)
2. ZMQ: Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.

1. Clone the [pystemon's git repository](https://github.com/cvandeplas/pystemon):
``` git clone https://github.com/cvandeplas/pystemon.git ```
## How to create a new module
2. Edit configuration file for pystemon ```pystemon/pystemon.yaml```:
* Configuration of storage section (adapt to your needs):
```
storage:
    archive:
        storage-classname: FileStorage
        save: yes
        save-all: yes
        dir: "alerts"
        dir-all: "archive"
        compress: yes

    redis:
        storage-classname: RedisStorage
        save: yes
        save-all: yes
        server: "localhost"
        port: 6379
        database: 10
        lookup: no
```
* Change configuration for paste-sites according to your needs (don't forget to throttle download time and/or update time).
3. Install python dependencies inside the virtual environment:
```
cd ail-framework/
. ./AILENV/bin/activate
cd pystemon/ #cd to pystemon folder
pip3 install -U -r requirements.txt
```
4. Edit configuration file ```ail-framework/configs/core.cfg```:
* Modify the "pystemonpath" path accordingly
To add a new processing or analysis module to AIL, follow these steps:
5. Launch ail-framework, pystemon and pystemon-feeder.py (still inside virtual environment):
* Option 1 (recommended):
```
./ail-framework/bin/LAUNCH.py -l #starts ail-framework
./ail-framework/bin/LAUNCH.py -f #starts pystemon and the pystemon-feeder.py
```
* Option 2 (you may need two terminal windows):
```
./ail-framework/bin/LAUNCH.py -l #starts ail-framework
./pystemon/pystemon.py
./ail-framework/bin/feeder/pystemon-feeder.py
```
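The feeder described above reads pastes that pystemon stored in Redis and pushes them to AIL over ZMQ as a gzipped, Base64-encoded blob prefixed with a topic and a source path. A minimal sketch of that message shape, assuming the common `"<topic> <source> <b64(gzip(content))>"` layout with topic `102`; the helper names are ours and the exact wire format may differ between AIL versions:

```python
import base64
import gzip

def encode_feed_message(topic: int, source: str, content: bytes) -> bytes:
    """Pack a paste the way a pystemon-style ZMQ feeder might:
    '<topic> <source> <base64(gzip(content))>'."""
    payload = base64.b64encode(gzip.compress(content)).decode()
    return f'{topic} {source} {payload}'.encode()

def decode_feed_message(message: bytes) -> tuple:
    """Reverse of encode_feed_message: split the three fields and decompress."""
    topic, source, payload = message.decode().split(' ', 2)
    return int(topic), source, gzip.decompress(base64.b64decode(payload))

msg = encode_feed_message(102, 'pystemon/archive/2024/test.txt', b'leaked data')
```

A real feeder would hand `msg` to a ZMQ PUB socket instead of keeping it in memory.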
How to create a new module
--------------------------

If you want to add a new processing or analysis module in AIL, follow these simple steps:

1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to at least one module (usually Redis_Global).
2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
1. Add your module name in [./configs/modules.cfg](./configs/modules.cfg) and subscribe to at least one module (usually `Item`).
2. Use [./bin/modules/modules/TemplateModule.py](./bin/modules/modules/TemplateModule.py) as a sample module and create a new file in bin/modules with the module name used in the `modules.cfg` configuration.
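The steps above boil down to subclassing the framework's abstract module and implementing `compute()`, as the `D4Client(AbstractModule)` excerpt later in this diff illustrates. Outside the framework that base class is unavailable, so this sketch uses a minimal stand-in; the class shape and method names here are assumptions for illustration, not AIL's exact API:

```python
# Minimal stand-in for AIL's AbstractModule, for illustration only.
# A real module would instead do:
#   from modules.abstract_module import AbstractModule
class AbstractModule:
    def __init__(self):
        self.module_name = self.__class__.__name__

    def compute(self, message):
        raise NotImplementedError

class TemplateExample(AbstractModule):
    """Skeleton of a processing module: receive one item, return a result."""
    def compute(self, message):
        # A real module would fetch the item content here and run detection
        return f'{self.module_name} processed: {message}'

module = TemplateExample()
result = module.compute('item_id_123')
```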
How to contribute a module
--------------------------
## Contributions

Feel free to fork the code, play with it, make some patches or add additional analysis modules.
Contributions are welcome! Fork the repository, experiment with the code, and submit your modules or patches through a pull request.

To contribute your module, feel free to pull your contribution.
## Crawler

Additional information
======================

Crawler
---------------------

In AIL, you can crawl websites and Tor hidden services. Don't forget to review the proxy configuration of your Tor client and especially if you enabled the SOCKS5 proxy

[//]: # (and binding on the appropriate IP address reachable via the dockers where Splash runs.)
AIL supports crawling of websites and Tor hidden services. Ensure your Tor client's proxy configuration is correct, especially the SOCKS5 proxy settings.

### Installation

[Install Lacus](https://github.com/ail-project/lacus)

### Configuration

1. Lacus URL:
In the webinterface, go to ``Crawlers>Settings`` and click on the Edit button
In the web interface, go to `Crawlers` > `Settings` and click on the Edit button

![AIL Crawler Config](./doc/screenshots/lacus_config.png?raw=true "AIL Lacus Config")
![Splash Manager Config](./doc/screenshots/lacus_config.png?raw=true "AIL Lacus Config")
![AIL Crawler Config Edit](./doc/screenshots/lacus_config_edit.png?raw=true "AIL Lacus Config")
![Splash Manager Config](./doc/screenshots/lacus_config_edit.png?raw=true "AIL Lacus Config")

2. Launch AIL Crawlers:
2. Number of Crawlers:
Choose the number of crawlers you want to launch

![Splash Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures.png?raw=true "AIL Lacus Nb Crawlers Config")
![Splash Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures_edit.png?raw=true "AIL Lacus Nb Crawlers Config")
![Crawler Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures.png?raw=true "AIL Lacus Nb Crawlers Config")
![Crawler Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures_edit.png?raw=true "AIL Lacus Nb Crawlers Config")
Kvrocks Migration
---------------------
**Important Note:
We are currently working on a [migration script](https://github.com/ail-project/ail-framework/blob/master/bin/DB_KVROCKS_MIGRATION.py) to facilitate the migration to Kvrocks.
Once this script is ready, AIL version 5.0 will be released.**
## Chats Translation with LibreTranslate

Please note that the current version of this migration script only supports migrating the database on the same server.
(If you plan to migrate to another server, we will provide additional instructions in this section once the migration script is completed)
Chat messages can be translated using [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), an open-source, self-hosted machine translation service.
### Installation:
1. Install LibreTranslate by running the following command:
```bash
pip install libretranslate
```
2. Run libretranslate:
```bash
libretranslate
```

### Configuration:
To enable LibreTranslate for chat translation, edit the LibreTranslate URL in the [./configs/core.cfg](./configs/core.cfg) file under the [Translation] section.
```
[Translation]
libretranslate = http://127.0.0.1:5000
```
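With the URL above configured, a translation call is an HTTP POST to LibreTranslate's `/translate` endpoint; the `q`, `source`, `target` and `format` fields follow LibreTranslate's documented API, while the helper name below is ours. A minimal client-side sketch that builds (but does not send) the request:

```python
import json
import urllib.request

def build_translate_request(base_url: str, text: str,
                            source: str = 'auto', target: str = 'en'):
    """Build a POST request for LibreTranslate's /translate endpoint."""
    payload = {'q': text, 'source': source, 'target': target, 'format': 'text'}
    return urllib.request.Request(
        f'{base_url.rstrip("/")}/translate',
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'},
    )

# Points at the URL configured in core.cfg; sending it would need a
# running LibreTranslate instance (urllib.request.urlopen(req)).
req = build_translate_request('http://127.0.0.1:5000', 'bonjour')
```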
To migrate your database to Kvrocks:
1. Launch ARDB and Kvrocks
2. Pull from remote
```
git checkout master
git pull
```
3. Launch the migration script:
```
git checkout master
git pull
cd bin/
./DB_KVROCKS_MIGRATION.py
```
224  README.md
@@ -1,9 +1,6 @@
AIL
===
# AIL framework

<p align="center">
  <img src="https://raw.githubusercontent.com/ail-project/ail-framework/master/var/www/static/image/ail-icon.png" height="250" />
</p>
<img src="https://raw.githubusercontent.com/ail-project/ail-framework/master/var/www/static/image/ail-icon.png" height="400" />

<table>
  <tr>
@@ -12,7 +9,7 @@ AIL
  </tr>
  <tr>
    <td>CI</td>
    <td><a href="https://github.com/CIRCL/AIL-framework/actions/workflows/ail_framework_test.yml"><img src="https://github.com/CIRCL/AIL-framework/actions/workflows/ail_framework_test.yml/badge.svg"></a></td>
    <td><a href="https://github.com/ail-project/ail-framework/actions/workflows/ail_framework_test.yml"><img src="https://github.com/ail-project/ail-framework/actions/workflows/ail_framework_test.yml/badge.svg"></a></td>
  </tr>
  <tr>
    <td>Gitter</td>
@@ -28,59 +25,72 @@ AIL
  </tr>
</table>

![Logo](./doc/logo/logo-small.png?raw=true "AIL logo")

AIL framework - Framework for Analysis of Information Leaks

AIL is a modular framework to analyse potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).

![Dashboard](./doc/screenshots/dashboard.png?raw=true "AIL framework dashboard")
![Dashboard](./doc/screenshots/dashboard0.png?raw=true "AIL framework dashboard")

![Finding webshells with AIL](./doc/screenshots/webshells.gif?raw=true "Finding websheels with AIL")
![Finding webshells with AIL](./doc/screenshots/webshells.gif?raw=true "Finding webshells with AIL")

Features
--------
## AIL V5.0 Version:
* Modular architecture to handle streams of unstructured or structured information
* Default support for external ZMQ feeds, such as provided by CIRCL or other providers
* Multiple feed support
* Each module can process and reprocess the information already processed by AIL
* Detecting and extracting URLs including their geographical location (e.g. IP address location)
* Extracting and validating potential leaks of credit card numbers, credentials, ...
* Extracting and validating leaked email addresses, including DNS MX validation
* Module for extracting Tor .onion addresses (to be further processed for analysis)
* Keep tracks of duplicates (and diffing between each duplicate found)
* Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
* A full-text indexer module to index unstructured information
* Statistics on modules and web
* Real-time modules manager in terminal
* Global sentiment analysis for each providers based on nltk vader module
* Terms, Set of terms and Regex tracking and occurrence
* Many more modules for extracting phone numbers, credentials and others
* Alerting to [MISP](https://github.com/MISP/MISP) to share found leaks within a threat intelligence platform using [MISP standard](https://www.misp-project.org/objects.html#_ail_leak)
* Detect and decode encoded file (Base64, hex encoded or your own decoding scheme) and store files
* Detect Amazon AWS and Google API keys
* Detect Bitcoin address and Bitcoin private keys
* Detect private keys, certificate, keys (including SSH, OpenVPN)
* Detect IBAN bank accounts
* Tagging system with [MISP Galaxy](https://github.com/MISP/misp-galaxy) and [MISP Taxonomies](https://github.com/MISP/misp-taxonomies) tags
* UI paste submission
* Create events on [MISP](https://github.com/MISP/MISP) and cases on [The Hive](https://github.com/TheHive-Project/TheHive)
* Automatic paste export at detection on [MISP](https://github.com/MISP/MISP) (events) and [The Hive](https://github.com/TheHive-Project/TheHive) (alerts) on selected tags
* Extracted and decoded files can be searched by date range, type of file (mime-type) and encoding discovered
* Graph relationships between decoded file (hashes), similar PGP UIDs and addresses of cryptocurrencies
* Tor hidden services crawler to crawl and parse output
* Tor onion availability is monitored to detect up and down of hidden services
* Browser hidden services are screenshot and integrated in the analysed output including a blurring screenshot interface (to avoid "burning the eyes" of the security analysis with specific content)
* Tor hidden services is part of the standard framework, all the AIL modules are available to the crawled hidden services
* Generic web crawler to trigger crawling on demand or at regular interval URL or Tor hidden services
AIL v5.0 introduces significant improvements and new features:

- **Codebase Rewrite**: The codebase has undergone a substantial rewrite, resulting in enhanced performance and speed improvements.
- **Database Upgrade**: The database has been migrated from ARDB to Kvrocks.
- **New Correlation Engine**: AIL v5.0 introduces a new powerful correlation engine with two new correlation types: CVE and Title.
- **Enhanced Logging**: The logging system has been improved to provide better troubleshooting capabilities.
- **Tagging Support**: [AIL objects](./doc/README.md#ail-objects) now support tagging, allowing users to categorize and label extracted information for easier analysis and organization.
- **Trackers**: Improved object filtering; PGP and decoded tracking added.
- **UI Content Visualization**: The user interface has been upgraded to visualize extracted and tracked information.
- **New Crawler Lacus**: Improved crawling capabilities.
- **Modular Importers and Exporters**: New modular importers (ZMQ, AIL Feeders) and exporters (MISP, Mail, TheHive) that allow easy creation and customization by extending an abstract class.
- **Module Queues**: Improved the queuing mechanism between detection modules.
- **New Objects CVE and Title**: Extract and correlate CVE IDs and web page titles.

## Features
- Modular architecture to handle streams of unstructured or structured information
- Default support for external ZMQ feeds, such as provided by CIRCL or other providers
- Multiple Importers and feeds support
- Each module can process and reprocess the information already analyzed by AIL
- Detecting and extracting URLs including their geographical location (e.g. IP address location)
- Extracting and validating potential leaks of credit card numbers, credentials, ...
- Extracting and validating leaked email addresses, including DNS MX validation
- Module for extracting Tor .onion addresses for further analysis
- Keep tracks of credentials duplicates (and diffing between each duplicate found)
- Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
- A full-text indexer module to index unstructured information
- Terms, Set of terms, Regex, typo squatting and YARA tracking and occurrence
- YARA Retro Hunt
- Many more modules for extracting phone numbers, credentials, and more
- Alerting to [MISP](https://github.com/MISP/MISP) to share found leaks within a threat intelligence platform using [MISP standard](https://www.misp-project.org/objects.html#_ail_leak)
- Detecting and decoding encoded files (Base64, hex encoded or your own decoding scheme) and storing files
- Detecting Amazon AWS and Google API keys
- Detecting Bitcoin addresses and Bitcoin private keys
- Detecting private keys, certificates and keys (including SSH, OpenVPN)
- Detecting IBAN bank accounts
- Tagging system with [MISP Galaxy](https://github.com/MISP/misp-galaxy) and [MISP Taxonomies](https://github.com/MISP/misp-taxonomies) tags
- UI submission
- Create events on [MISP](https://github.com/MISP/MISP) and cases on [The Hive](https://github.com/TheHive-Project/TheHive)
- Automatic export on detection with [MISP](https://github.com/MISP/MISP) (events) and [The Hive](https://github.com/TheHive-Project/TheHive) (alerts) on selected tags
- Extracted and decoded files can be searched by date range, type of file (mime-type) and encoding discovered
- Correlation engine and graphs to visualize relationships between decoded files (hashes), PGP UIDs, domains, usernames, and cryptocurrency addresses
- Websites, Forums and Tor Hidden-Services crawler to crawl and parse output
- Domain availability monitoring to detect up and down of websites and hidden services
- Browsed hidden services are automatically captured and integrated into the analyzed output, including a blurring screenshot interface (to avoid "burning the eyes" of security analysts with sensitive content)
- Tor hidden services are part of the standard framework; all the AIL modules are available for the crawled hidden services
- Crawler scheduler to trigger crawling on demand or at regular intervals for URLs or Tor hidden services
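The Base64 detection feature listed above amounts to spotting plausible Base64 runs in an item and attempting to decode them. A simplified sketch of that heuristic, with an arbitrary minimum run length (AIL's actual decoder module is more elaborate):

```python
import base64
import re

# Candidate: a long run of Base64 alphabet characters, optional '=' padding.
B64_RUN = re.compile(r'[A-Za-z0-9+/]{16,}={0,2}')

def extract_base64(text: str) -> list:
    """Return the decoded bytes of every plausible Base64 run in text."""
    decoded = []
    for match in B64_RUN.finditer(text):
        candidate = match.group()
        if len(candidate) % 4:  # valid Base64 length is a multiple of 4
            continue
        try:
            decoded.append(base64.b64decode(candidate, validate=True))
        except ValueError:
            continue
    return decoded

sample = 'log line VGhlIHF1aWNrIGJyb3duIGZveA== end'
found = extract_base64(sample)
```

False positives (long hex strings, identifiers) are expected with a heuristic this simple; a real decoder would also check the mime-type of the decoded bytes.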
Installation
------------
## Installation

Type these command lines for a fully automated installation and start AIL framework:
To install the AIL framework, run the following commands:
```bash
# Clone the repo first
git clone https://github.com/ail-project/ail-framework.git
@@ -89,10 +99,6 @@ cd ail-framework
# For Debian and Ubuntu based distributions
./installing_deps.sh

# For Centos based distributions (Tested: Centos 8)
chmod u+x centos_installing_deps.sh
./centos_installing_deps.sh

# Launch ail
cd ~/ail-framework/
cd bin/
@@ -101,59 +107,52 @@ cd bin/
The default [installing_deps.sh](./installing_deps.sh) is for Debian and Ubuntu based distributions.

There is also a [Travis file](.travis.yml) used for automating the installation that can be used to build and install AIL on other systems.

Requirement:
- Python 3.6+
- Python 3.7+

Installation Notes
------------
## Installation Notes

In order to use AIL combined with **ZFS** or **unprivileged LXC** it's necessary to disable Direct I/O in `$AIL_HOME/configs/6382.conf` by changing the value of the directive `use_direct_io_for_flush_and_compaction` to `false`.
For Lacus Crawler installation instructions, refer to the [HOWTO](https://github.com/ail-project/ail-framework/blob/master/HOWTO.md#crawler)

Tor installation instructions can be found in the [HOWTO](https://github.com/ail-project/ail-framework/blob/master/HOWTO.md#installationconfiguration)
## Starting AIL

Starting AIL
--------------------------
To start AIL, use the following commands:

```bash
cd bin/
./LAUNCH.sh -l
```

Eventually you can browse the status of the AIL framework website at the following URL:
You can access the AIL framework web interface at the following URL:

```
https://localhost:7000/
```

The default credentials for the web interface are located in ``DEFAULT_PASSWORD``. This file is removed when you change your password.
The default credentials for the web interface are located in the ``DEFAULT_PASSWORD`` file, which is deleted when you change your password.
Training
--------
## Training

CIRCL organises training on how to use or extend the AIL framework. AIL training materials are available at [https://www.circl.lu/services/ail-training-materials/](https://www.circl.lu/services/ail-training-materials/).
CIRCL organises training on how to use or extend the AIL framework. AIL training materials are available at [https://github.com/ail-project/ail-training](https://github.com/ail-project/ail-training).

API
-----
## API

The API documentation is available in [doc/README.md](doc/README.md)
The API documentation is available in [doc/api.md](doc/api.md)

HOWTO
-----
## HOWTO

HOWTO are available in [HOWTO.md](HOWTO.md)

Privacy and GDPR
----------------
## Privacy and GDPR

[AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks](https://www.circl.lu/assets/files/information-leaks-analysis-and-gdpr.pdf) document provides an overview how to use AIL in a lawfulness context especially in the scope of General Data Protection Regulation.
For information on AIL's compliance with GDPR and privacy considerations, refer to the [AIL information leaks analysis and the GDPR in the context of collection, analysis and sharing information leaks](https://www.circl.lu/assets/files/information-leaks-analysis-and-gdpr.pdf) document.

Research using AIL
------------------
this document provides an overview how to use AIL in a lawfulness context especially in the scope of General Data Protection Regulation.

If you write academic paper, relying or using AIL, it can be cited with the following BibTeX:
## Research using AIL

If you use or reference AIL in an academic paper, you can cite it using the following BibTeX:

~~~~
@inproceedings{mokaddem2018ail,
@@ -166,75 +165,64 @@ If you write academic paper, relying or using AIL, it can be cited with the foll
}
~~~~

Screenshots
===========
## Screenshots

Tor hidden service crawler
--------------------------
### Websites, Forums and Tor Hidden-Services

![Tor hidden service](./doc/screenshots/ail-bitcoinmixer.png?raw=true "Tor hidden service crawler")
![Domain CIRCL](./doc/screenshots/domain_circl.png?raw=true "Tor hidden service crawler")

Trending charts
---------------
#### Login protected, pre-recorded session cookies:
![Domain cookiejar](./doc/screenshots/crawler-cookiejar-domain-crawled.png?raw=true "Tor hidden service crawler")

![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending")
### Extracted encoded files from items

Extracted encoded files from pastes
-----------------------------------
![Extracted files](./doc/screenshots/decodeds_dashboard.png?raw=true "AIL extracted decoded files statistics")

![Extracted files from pastes](./doc/screenshots/ail-hashedfiles.png?raw=true "AIL extracted decoded files statistics")
![Relationships between extracted files from encoded file in unstructured data](./doc/screenshots/hashedfile-graph.png?raw=true "Relationships between extracted files from encoded file in unstructured data")
### Correlation Engine

Browsing
--------
![Correlation decoded image](./doc/screenshots/correlation_decoded_image.png?raw=true "Correlation decoded image")

![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes")
### Investigation

Tagging system
--------
![Investigation](./doc/screenshots/investigation_mixer.png?raw=true "AIL framework cookiejar")

![Tags](./doc/screenshots/tags.png?raw=true "AIL framework tags")
### Tagging system

MISP and The Hive, automatic events and alerts creation
--------
![Tags](./doc/screenshots/tags_search.png?raw=true "AIL framework tags")

![paste_submit](./doc/screenshots/tag_auto_export.png?raw=true "AIL framework MISP and Hive auto export")
![Tags search](./doc/screenshots/tags_search_items.png?raw=true "AIL framework tags items search")

Paste submission
--------
### MISP Export

![paste_submit](./doc/screenshots/paste_submit.png?raw=true "AIL framework paste submission")
![misp_export](./doc/screenshots/misp_export.png?raw=true "AIL framework MISP Export")

Sentiment analysis
------------------
### MISP and The Hive, automatic events and alerts creation

![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis")
![tags_misp_auto](./doc/screenshots/tags_misp_auto.png?raw=true "AIL framework MISP and Hive auto export")

Terms tracker
---------------------------
### UI submission

![Term-tracker](./doc/screenshots/term-tracker.png?raw=true "AIL framework termManager")
![ui_submit](./doc/screenshots/ui_submit.png?raw=true "AIL framework UI importer")

### Trackers

[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
![tracker-create](./doc/screenshots/tracker_create.png?raw=true "AIL framework create tracker")

Command line module manager
---------------------------
![tracker-yara](./doc/screenshots/tracker_yara.png?raw=true "AIL framework Yara tracker")

![Module-Manager](./doc/screenshots/module_information.png?raw=true "AIL framework ModuleInformationV2.py")
![retro-hunt](./doc/screenshots/retro_hunt.png?raw=true "AIL framework Retro Hunt")

License
=======
## License

```
Copyright (C) 2014 Jules Debra
Copyright (C) 2014-2021 CIRCL - Computer Incident Response Center Luxembourg (c/o smile, security made in Lëtzebuerg, Groupement d'Intérêt Economique)
Copyright (c) 2014-2021 Raphaël Vinot
Copyright (c) 2014-2021 Alexandre Dulaunoy
Copyright (c) 2016-2021 Sami Mokaddem
Copyright (c) 2018-2021 Thirion Aurélien
Copyright (c) 2021 Olivier Sagit
Copyright (C) 2014-2023 CIRCL - Computer Incident Response Center Luxembourg (c/o smile, security made in Lëtzebuerg, Groupement d'Intérêt Economique)
Copyright (c) 2014-2023 Raphaël Vinot
Copyright (c) 2014-2023 Alexandre Dulaunoy
Copyright (c) 2016-2023 Sami Mokaddem
Copyright (c) 2018-2023 Thirion Aurélien

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published by
@@ -20,26 +20,28 @@ if [ -e "${DIR}/AILENV/bin/python" ]; then
    export AIL_VENV=${AIL_HOME}/AILENV/
    . ./AILENV/bin/activate
else
    echo "Please make sure you have a AIL-framework environment, au revoir"
    echo "Please make sure AILENV is installed"
    exit 1
fi

export PATH=$AIL_VENV/bin:$PATH
export PATH=$AIL_HOME:$PATH
export PATH=$AIL_REDIS:$PATH
export PATH=$AIL_ARDB:$PATH
export PATH=$AIL_KVROCKS:$PATH
export PATH=$AIL_BIN:$PATH
export PATH=$AIL_FLASK:$PATH

isredis=`screen -ls | egrep '[0-9]+.Redis_AIL' | cut -d. -f1`
isardb=`screen -ls | egrep '[0-9]+.ARDB_AIL' | cut -d. -f1`
iskvrocks=`screen -ls | egrep '[0-9]+.KVROCKS_AIL' | cut -d. -f1`
islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
is_ail_core=`screen -ls | egrep '[0-9]+.Core_AIL' | cut -d. -f1`
is_ail_2_ail=`screen -ls | egrep '[0-9]+.AIL_2_AIL' | cut -d. -f1`
isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
function check_screens {
    isredis=`screen -ls | egrep '[0-9]+.Redis_AIL' | cut -d. -f1`
    isardb=`screen -ls | egrep '[0-9]+.ARDB_AIL' | cut -d. -f1`
    iskvrocks=`screen -ls | egrep '[0-9]+.KVROCKS_AIL' | cut -d. -f1`
    islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
    is_ail_core=`screen -ls | egrep '[0-9]+.Core_AIL' | cut -d. -f1`
    is_ail_2_ail=`screen -ls | egrep '[0-9]+.AIL_2_AIL' | cut -d. -f1`
    isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
    isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
    isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
}
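Each line of `check_screens` extracts the PID of a named `screen` session by grepping `screen -ls` output and cutting at the first dot. The same parse expressed in Python, assuming the typical `<tab><pid>.<name><tab>(Detached)` line format of `screen -ls`:

```python
import re

def parse_screen_sessions(screen_ls_output: str) -> dict:
    """Map session name -> PID from `screen -ls` style output,
    mirroring the egrep '[0-9]+.Name' | cut -d. -f1 pipeline."""
    sessions = {}
    for line in screen_ls_output.splitlines():
        match = re.match(r'\s*(\d+)\.(\S+)', line)
        if match:
            sessions[match.group(2)] = int(match.group(1))
    return sessions

# Sample output as screen -ls typically prints it
sample = (
    'There are screens on:\n'
    '\t1234.Redis_AIL\t(Detached)\n'
    '\t5678.Flask_AIL\t(Detached)\n'
    '2 Sockets in /run/screen.\n'
)
found = parse_screen_sessions(sample)
```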
function helptext {
    echo -e $YELLOW"
@@ -59,7 +61,6 @@ function helptext {
    - All the queuing modules.
    - All the processing modules.
    - All Redis in memory servers.
    - All ARDB on disk servers.
    - All KVROCKS servers.
    "$DEFAULT"
    (Inside screen Daemons)
@@ -69,6 +70,7 @@ function helptext {
    LAUNCH.sh
        [-l | --launchAuto]        LAUNCH DB + Scripts
        [-k | --killAll]           Kill DB + Scripts
        [-r | --restart]           Restart
        [-ks | --killscript]       Scripts
        [-u | --update]            Update AIL
        [-ut | --thirdpartyUpdate] Update UI/Frontend
@@ -265,14 +267,17 @@ function launching_scripts {
    sleep 0.1
    screen -S "Script_AIL" -X screen -t "SQLInjectionDetection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./SQLInjectionDetection.py; read x"
    sleep 0.1
    screen -S "Script_AIL" -X screen -t "LibInjection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./LibInjection.py; read x"
    sleep 0.1
    screen -S "Script_AIL" -X screen -t "Zerobins" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Zerobins.py; read x"
    sleep 0.1
    # screen -S "Script_AIL" -X screen -t "LibInjection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./LibInjection.py; read x"
    # sleep 0.1
    # screen -S "Script_AIL" -X screen -t "Pasties" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Pasties.py; read x"
    # sleep 0.1

    screen -S "Script_AIL" -X screen -t "MISP_Thehive_Auto_Push" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./MISP_Thehive_Auto_Push.py; read x"
    sleep 0.1

    screen -S "Script_AIL" -X screen -t "Exif" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Exif.py; read x"
    sleep 0.1

    ##################################
    #       TRACKERS MODULES         #
    ##################################
@@ -607,7 +612,7 @@ function launch_all {
function menu_display {

    options=("Redis" "Ardb" "Kvrocks" "Logs" "Scripts" "Flask" "Killall" "Update" "Update-config" "Update-thirdparty")
    options=("Redis" "Kvrocks" "Logs" "Scripts" "Flask" "Killall" "Update" "Update-config" "Update-thirdparty")

    menu() {
        echo "What do you want to Launch?:"
@@ -635,9 +640,6 @@ function menu_display {
            Redis)
                launch_redis;
                ;;
            Ardb)
                launch_ardb;
                ;;
            Kvrocks)
                launch_kvrocks;
                ;;
@@ -679,31 +681,38 @@ function menu_display {
}
|
||||
|
||||
#echo "$@"
|
||||
|
||||
check_screens;
|
||||
while [ "$1" != "" ]; do
|
||||
case $1 in
|
||||
-l | --launchAuto ) launch_all "automatic";
|
||||
-l | --launchAuto ) check_screens;
|
||||
launch_all "automatic";
|
||||
;;
|
||||
-lr | --launchRedis ) launch_redis;
|
||||
-lr | --launchRedis ) check_screens;
|
||||
launch_redis;
|
||||
;;
|
||||
-la | --launchARDB ) launch_ardb;
|
||||
;;
|
||||
-lk | --launchKVROCKS ) launch_kvrocks;
|
||||
-lk | --launchKVROCKS ) check_screens;
|
||||
launch_kvrocks;
|
||||
;;
|
||||
-lrv | --launchRedisVerify ) launch_redis;
|
||||
wait_until_redis_is_ready;
|
||||
;;
|
||||
-lav | --launchARDBVerify ) launch_ardb;
|
||||
wait_until_ardb_is_ready;
|
||||
;;
|
||||
-lkv | --launchKVORCKSVerify ) launch_kvrocks;
|
||||
wait_until_kvrocks_is_ready;
|
||||
;;
|
||||
--set_kvrocks_namespaces ) set_kvrocks_namespaces;
|
||||
;;
|
||||
-k | --killAll ) killall;
|
||||
-k | --killAll ) check_screens;
|
||||
killall;
|
||||
;;
|
||||
-ks | --killscript ) killscript;
|
||||
-r | --restart ) killall;
|
||||
sleep 0.1;
|
||||
check_screens;
|
||||
launch_all "automatic";
|
||||
;;
|
||||
-ks | --killscript ) check_screens;
|
||||
killscript;
|
||||
;;
|
||||
-m | --menu ) menu_display;
|
||||
;;
|
||||

@@ -34,16 +34,20 @@ class D4Client(AbstractModule):
self.d4_client = d4.create_d4_client()
self.last_refresh = time.time()
self.last_config_check = time.time()

# Send module state to logs
self.logger.info(f'Module {self.module_name} initialized')

def compute(self, dns_record):
# Refresh D4 Client
if self.last_refresh < d4.get_config_last_update_time():
self.d4_client = d4.create_d4_client()
self.last_refresh = time.time()
print('D4 Client: config updated')
if self.last_config_check < int(time.time()) - 30:
print('refresh rrrr')
if self.last_refresh < d4.get_config_last_update_time():
self.d4_client = d4.create_d4_client()
self.last_refresh = time.time()
print('D4 Client: config updated')
self.last_config_check = time.time()

if self.d4_client:
# Send DNS Record to D4Server

@@ -23,7 +23,7 @@ sys.path.append(os.environ['AIL_BIN'])
##################################
from core import ail_2_ail
from modules.abstract_module import AbstractModule
# from lib.ConfigLoader import ConfigLoader
from lib.objects.Items import Item

#### CONFIG ####
# config_loader = ConfigLoader()

@@ -76,10 +76,11 @@ class Sync_importer(AbstractModule):

# # TODO: create default id
item_id = ail_stream['meta']['ail:id']
item = Item(item_id)

message = f'sync {item_id} {b64_gzip_content}'
print(item_id)
self.add_message_to_queue(message, 'Importers')
message = f'sync {b64_gzip_content}'
print(item.id)
self.add_message_to_queue(obj=item, message=message, queue='Importers')


if __name__ == '__main__':

@@ -15,13 +15,16 @@ This module .
import os
import sys
import time
import traceback

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from core import ail_2_ail
from lib.objects.Items import Item
from lib.ail_queues import get_processed_end_obj, timeout_processed_objs, get_last_queue_timeout
from lib.exceptions import ModuleQueueError
from lib.objects import ail_objects
from modules.abstract_module import AbstractModule


@@ -30,14 +33,15 @@ class Sync_module(AbstractModule):
Sync_module module for AIL framework
"""

def __init__(self):
super(Sync_module, self).__init__()
def __init__(self, queue=False): # FIXME MODIFY/ADD QUEUE
super(Sync_module, self).__init__(queue=queue)

# Waiting time in seconds between to message processed
self.pending_seconds = 10

self.dict_sync_queues = ail_2_ail.get_all_sync_queue_dict()
self.last_refresh = time.time()
self.last_refresh_queues = time.time()

print(self.dict_sync_queues)

@@ -53,40 +57,70 @@ class Sync_module(AbstractModule):
print('sync queues refreshed')
print(self.dict_sync_queues)

# Extract object from message
# # TODO: USE JSON DICT ????
mess_split = message.split(';')
if len(mess_split) == 3:
obj_type = mess_split[0]
obj_subtype = mess_split[1]
obj_id = mess_split[2]
obj = ail_objects.get_obj_from_global_id(message)

# OBJECT => Item
# if obj_type == 'item':
obj = Item(obj_id)
tags = obj.get_tags()

tags = obj.get_tags()
# check filter + tags
# print(message)
for queue_uuid in self.dict_sync_queues:
filter_tags = self.dict_sync_queues[queue_uuid]['filter']
if filter_tags and tags:
# print('tags: {tags} filter: {filter_tags}')
if filter_tags.issubset(tags):
obj_dict = obj.get_default_meta()
# send to queue push and/or pull
for dict_ail in self.dict_sync_queues[queue_uuid]['ail_instances']:
print(f'ail_uuid: {dict_ail["ail_uuid"]} obj: {obj.type}:{obj.get_subtype(r_str=True)}:{obj.id}')
ail_2_ail.add_object_to_sync_queue(queue_uuid, dict_ail['ail_uuid'], obj_dict,
push=dict_ail['push'], pull=dict_ail['pull'])

# check filter + tags
# print(message)
for queue_uuid in self.dict_sync_queues:
filter_tags = self.dict_sync_queues[queue_uuid]['filter']
if filter_tags and tags:
# print('tags: {tags} filter: {filter_tags}')
if filter_tags.issubset(tags):
obj_dict = obj.get_default_meta()
# send to queue push and/or pull
for dict_ail in self.dict_sync_queues[queue_uuid]['ail_instances']:
print(f'ail_uuid: {dict_ail["ail_uuid"]} obj: {message}')
ail_2_ail.add_object_to_sync_queue(queue_uuid, dict_ail['ail_uuid'], obj_dict,
push=dict_ail['push'], pull=dict_ail['pull'])
def run(self):
"""
Run Module endless process
"""

else:
# Malformed message
raise Exception(f'too many values to unpack (expected 3) given {len(mess_split)} with message {message}')
# Endless loop processing messages from the input queue
while self.proceed:

# Timeout queues
# timeout_processed_objs()
if self.last_refresh_queues < time.time():
timeout_processed_objs()
self.last_refresh_queues = time.time() + 120
self.redis_logger.debug('Timeout queues')
# print('Timeout queues')

# Get one message (paste) from the QueueIn (copy of Redis_Global publish)
global_id = get_processed_end_obj()
if global_id:
try:
# Module processing with the message from the queue
self.compute(global_id)
except Exception as err:
if self.debug:
self.queue.error()
raise err

# LOG ERROR
trace = traceback.format_tb(err.__traceback__)
trace = ''.join(trace)
self.logger.critical(f"Error in module {self.module_name}: {__name__} : {err}")
self.logger.critical(f"Module {self.module_name} input message: {global_id}")
self.logger.critical(trace)

if isinstance(err, ModuleQueueError):
self.queue.error()
raise err

else:
self.computeNone()
# Wait before next process
self.logger.debug(f"{self.module_name}, waiting for new message, Idling {self.pending_seconds}s")
time.sleep(self.pending_seconds)


if __name__ == '__main__':

module = Sync_module()
module = Sync_module(queue=False) # FIXME MODIFY/ADD QUEUE
module.run()

@@ -11,7 +11,7 @@ import uuid

import subprocess

from flask import escape
from markupsafe import escape

sys.path.append(os.environ['AIL_BIN'])
##################################

@@ -141,7 +141,10 @@ def is_server_client_sync_mode_connected(ail_uuid, sync_mode):
return res == 1

def is_server_client_connected(ail_uuid):
return r_cache.sismember('ail_2_ail:server:all_clients', ail_uuid)
try:
return r_cache.sismember('ail_2_ail:server:all_clients', ail_uuid)
except:
return False

def clear_server_connected_clients():
for ail_uuid in get_server_all_connected_clients():

@@ -398,7 +401,10 @@ def get_all_ail_instance_keys():
return r_serv_sync.smembers(f'ail:instance:key:all')

def is_allowed_ail_instance_key(key):
return r_serv_sync.sismember(f'ail:instance:key:all', key)
try:
return r_serv_sync.sismember(f'ail:instance:key:all', key)
except:
return False

def get_ail_instance_key(ail_uuid):
return r_serv_sync.hget(f'ail:instance:{ail_uuid}', 'api_key')

@@ -427,7 +433,10 @@ def get_ail_instance_all_sync_queue(ail_uuid):
return r_serv_sync.smembers(f'ail:instance:sync_queue:{ail_uuid}')

def is_ail_instance_queue(ail_uuid, queue_uuid):
return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
try:
return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
except:
return False

def exists_ail_instance(ail_uuid):
return r_serv_sync.exists(f'ail:instance:{ail_uuid}')

@@ -439,7 +448,10 @@ def get_ail_instance_description(ail_uuid):
return r_serv_sync.hget(f'ail:instance:{ail_uuid}', 'description')

def exists_ail_instance(ail_uuid):
return r_serv_sync.sismember('ail:instance:all', ail_uuid)
try:
return r_serv_sync.sismember('ail:instance:all', ail_uuid)
except:
return False

def is_ail_instance_push_enabled(ail_uuid):
res = r_serv_sync.hget(f'ail:instance:{ail_uuid}', 'push')

@@ -935,7 +947,10 @@ def get_all_sync_queue_dict():
return dict_sync_queues

def is_queue_registred_by_ail_instance(queue_uuid, ail_uuid):
return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
try:
return r_serv_sync.sismember(f'ail:instance:sync_queue:{ail_uuid}', queue_uuid)
except:
return False

def register_ail_to_sync_queue(ail_uuid, queue_uuid):
is_linked = is_ail_instance_linked_to_sync_queue(ail_uuid)

@@ -6,6 +6,7 @@ import logging.config
import sys
import time

from pyail import PyAIL
from requests.exceptions import ConnectionError

sys.path.append(os.environ['AIL_BIN'])

@@ -16,9 +17,13 @@ from modules.abstract_module import AbstractModule
from lib import ail_logger
from lib import crawlers
from lib.ConfigLoader import ConfigLoader
from lib.objects import CookiesNames
from lib.objects import Etags
from lib.objects.Domains import Domain
from lib.objects.Items import Item
from lib.objects import Screenshots
from lib.objects import Titles
from trackers.Tracker_Yara import Tracker_Yara

logging.config.dictConfig(ail_logger.get_config(name='crawlers'))

@@ -32,12 +37,23 @@ class Crawler(AbstractModule):
# Waiting time in seconds between to message processed
self.pending_seconds = 1

self.tracker_yara = Tracker_Yara(queue=False)

config_loader = ConfigLoader()

self.default_har = config_loader.get_config_boolean('Crawler', 'default_har')
self.default_screenshot = config_loader.get_config_boolean('Crawler', 'default_screenshot')
self.default_depth_limit = config_loader.get_config_int('Crawler', 'default_depth_limit')

ail_url_to_push_discovery = config_loader.get_config_str('Crawler', 'ail_url_to_push_onion_discovery')
ail_key_to_push_discovery = config_loader.get_config_str('Crawler', 'ail_key_to_push_onion_discovery')
if ail_url_to_push_discovery and ail_key_to_push_discovery:
ail = PyAIL(ail_url_to_push_discovery, ail_key_to_push_discovery, ssl=False)
if ail.ping_ail():
self.ail_to_push_discovery = ail
else:
self.ail_to_push_discovery = None

# TODO: LIMIT MAX NUMBERS OF CRAWLED PAGES

# update hardcoded blacklist

@@ -55,12 +71,15 @@ class Crawler(AbstractModule):
self.har = None
self.screenshot = None
self.root_item = None
self.har_dir = None
self.date = None
self.items_dir = None
self.original_domain = None
self.domain = None

# TODO Replace with warning list ???
self.placeholder_screenshots = {'27e14ace10b0f96acd2bd919aaa98a964597532c35b6409dff6cc8eec8214748'}
self.placeholder_screenshots = {'07244254f73e822bd4a95d916d8b27f2246b02c428adc29082d09550c6ed6e1a' # blank
'27e14ace10b0f96acd2bd919aaa98a964597532c35b6409dff6cc8eec8214748', # not found
'3e66bf4cc250a68c10f8a30643d73e50e68bf1d4a38d4adc5bfc4659ca2974c0'} # 404

# Send module state to logs
self.logger.info('Crawler initialized')

@@ -94,7 +113,7 @@ class Crawler(AbstractModule):
self.crawler_scheduler.update_queue()
self.crawler_scheduler.process_queue()

self.refresh_lacus_status() # TODO LOG ERROR
self.refresh_lacus_status() # TODO LOG ERROR
if not self.is_lacus_up:
return None

@@ -102,7 +121,9 @@ class Crawler(AbstractModule):
if crawlers.get_nb_crawler_captures() < crawlers.get_crawler_max_captures():
task_row = crawlers.add_task_to_lacus_queue()
if task_row:
task_uuid, priority = task_row
task, priority = task_row
task.start()
task_uuid = task.uuid
try:
self.enqueue_capture(task_uuid, priority)
except ConnectionError:

@@ -117,15 +138,30 @@ class Crawler(AbstractModule):
if capture:
try:
status = self.lacus.get_capture_status(capture.uuid)
if status != crawlers.CaptureStatus.DONE: # TODO ADD GLOBAL TIMEOUT-> Save start time ### print start time
if status == crawlers.CaptureStatus.DONE:
return capture
elif status == crawlers.CaptureStatus.UNKNOWN:
capture_start = capture.get_start_time(r_str=False)
if capture_start == 0:
task = capture.get_task()
task.delete()
capture.delete()
self.logger.warning(f'capture UNKNOWN ERROR STATE, {task.uuid} Removed from queue')
return None
if int(time.time()) - capture_start > 600: # TODO ADD in new crawler config
task = capture.get_task()
task.reset()
capture.delete()
self.logger.warning(f'capture UNKNOWN Timeout, {task.uuid} Send back in queue')
else:
capture.update(status)
else:
capture.update(status)
print(capture.uuid, crawlers.CaptureStatus(status).name, int(time.time()))
else:
return capture

except ConnectionError:
print(capture.uuid)
capture.update(self, -1)
capture.update(-1)
self.refresh_lacus_status()

time.sleep(self.pending_seconds)

@@ -166,6 +202,24 @@ class Crawler(AbstractModule):

crawlers.create_capture(capture_uuid, task_uuid)
print(task.uuid, capture_uuid, 'launched')

if self.ail_to_push_discovery:

if task.get_depth() == 1 and priority < 10 and task.get_domain().endswith('.onion'):
har = task.get_har()
screenshot = task.get_screenshot()
# parent_id = task.get_parent()
# if parent_id != 'manual' and parent_id != 'auto':
# parent = parent_id[19:-36]
# else:
# parent = 'AIL_capture'

if not url:
raise Exception(f'Error: url is None, {task.uuid}, {capture_uuid}, {url}')

self.ail_to_push_discovery.add_crawler_capture(task_uuid, capture_uuid, url, har=har, # parent=parent,
screenshot=screenshot, depth_limit=1, proxy='force_tor')
print(task.uuid, capture_uuid, url, 'Added to ail_to_push_discovery')
return capture_uuid

# CRAWL DOMAIN

@@ -175,34 +229,52 @@ class Crawler(AbstractModule):
task = capture.get_task()
domain = task.get_domain()
print(domain)
if not domain:
if self.debug:
raise Exception(f'Error: domain {domain} - task {task.uuid} - capture {capture.uuid}')
else:
self.logger.critical(f'Error: domain {domain} - task {task.uuid} - capture {capture.uuid}')
print(f'Error: domain {domain}')
return None

self.domain = Domain(domain)
self.original_domain = Domain(domain)

epoch = int(time.time())
parent_id = task.get_parent()

entries = self.lacus.get_capture(capture.uuid)
print(entries['status'])
print(entries.get('status'))
self.har = task.get_har()
self.screenshot = task.get_screenshot()
# DEBUG
# self.har = True
# self.screenshot = True
str_date = crawlers.get_current_date(separator=True)
self.har_dir = crawlers.get_date_har_dir(str_date)
self.items_dir = crawlers.get_date_crawled_items_source(str_date)
self.date = crawlers.get_current_date(separator=True)
self.items_dir = crawlers.get_date_crawled_items_source(self.date)
self.root_item = None

# Save Capture
self.save_capture_response(parent_id, entries)

self.domain.update_daterange(str_date.replace('/', ''))
# Origin + History
self.domain.update_daterange(self.date.replace('/', ''))
# Origin + History + tags
if self.root_item:
self.domain.set_last_origin(parent_id)
self.domain.add_history(epoch, root_item=self.root_item)
elif self.domain.was_up():
self.domain.add_history(epoch, root_item=epoch)
# Tags
for tag in task.get_tags():
self.domain.add_tag(tag)
self.domain.add_history(epoch, root_item=self.root_item)

if self.domain != self.original_domain:
self.original_domain.update_daterange(self.date.replace('/', ''))
if self.root_item:
self.original_domain.set_last_origin(parent_id)
# Tags
for tag in task.get_tags():
self.domain.add_tag(tag)
self.original_domain.add_history(epoch, root_item=self.root_item)
crawlers.update_last_crawled_domain(self.original_domain.get_domain_type(), self.original_domain.id, epoch)

crawlers.update_last_crawled_domain(self.domain.get_domain_type(), self.domain.id, epoch)
print('capture:', capture.uuid, 'completed')

@@ -215,12 +287,12 @@ class Crawler(AbstractModule):
if 'error' in entries:
# TODO IMPROVE ERROR MESSAGE
self.logger.warning(str(entries['error']))
print(entries['error'])
print(entries.get('error'))
if entries.get('html'):
print('retrieved content')
# print(entries.get('html'))

if 'last_redirected_url' in entries and entries['last_redirected_url']:
if 'last_redirected_url' in entries and entries.get('last_redirected_url'):
last_url = entries['last_redirected_url']
unpacked_last_url = crawlers.unpack_url(last_url)
current_domain = unpacked_last_url['domain']

@@ -235,32 +307,45 @@ class Crawler(AbstractModule):
else:
last_url = f'http://{self.domain.id}'

if 'html' in entries and entries['html']:
if 'html' in entries and entries.get('html'):
item_id = crawlers.create_item_id(self.items_dir, self.domain.id)
print(item_id)
gzip64encoded = crawlers.get_gzipped_b64_item(item_id, entries['html'])
item = Item(item_id)
print(item.id)

gzip64encoded = crawlers.get_gzipped_b64_item(item.id, entries['html'])
# send item to Global
relay_message = f'crawler {item_id} {gzip64encoded}'
self.add_message_to_queue(relay_message, 'Importers')
relay_message = f'crawler {gzip64encoded}'
self.add_message_to_queue(obj=item, message=relay_message, queue='Importers')

# Tag
msg = f'infoleak:submission="crawler";{item_id}'
self.add_message_to_queue(msg, 'Tags')
# Tag # TODO replace me with metadata to tags
msg = f'infoleak:submission="crawler"' # TODO FIXME
self.add_message_to_queue(obj=item, message=msg, queue='Tags')

# TODO replace me with metadata to add
crawlers.create_item_metadata(item_id, last_url, parent_id)
if self.root_item is None:
self.root_item = item_id
parent_id = item_id

title_content = crawlers.extract_title_from_html(entries['html'])
if title_content:
title = Titles.create_title(title_content)
title.add(item.get_date(), item)
# Tracker
self.tracker_yara.compute_manual(title)
if not title.is_tags_safe():
unsafe_tag = 'dark-web:topic="pornography-child-exploitation"'
self.domain.add_tag(unsafe_tag)
item.add_tag(unsafe_tag)

# SCREENSHOT
if self.screenshot:
if 'png' in entries and entries['png']:
if 'png' in entries and entries.get('png'):
screenshot = Screenshots.create_screenshot(entries['png'], b64=False)
if screenshot:
if not screenshot.is_tags_safe():
unsafe_tag = 'dark-web:topic="pornography-child-exploitation"'
self.domain.add_tag(unsafe_tag)
item = Item(item_id)
item.add_tag(unsafe_tag)
# Remove Placeholder pages # TODO Replace with warning list ???
if screenshot.id not in self.placeholder_screenshots:

@@ -269,8 +354,19 @@ class Crawler(AbstractModule):
screenshot.add_correlation('domain', '', self.domain.id)
# HAR
if self.har:
if 'har' in entries and entries['har']:
crawlers.save_har(self.har_dir, item_id, entries['har'])
if 'har' in entries and entries.get('har'):
har_id = crawlers.create_har_id(self.date, item_id)
crawlers.save_har(har_id, entries['har'])
for cookie_name in crawlers.extract_cookies_names_from_har(entries['har']):
print(cookie_name)
cookie = CookiesNames.create(cookie_name)
cookie.add(self.date.replace('/', ''), self.domain)
for etag_content in crawlers.extract_etag_from_har(entries['har']):
print(etag_content)
etag = Etags.create(etag_content)
etag.add(self.date.replace('/', ''), self.domain)
crawlers.extract_hhhash(entries['har'], self.domain.id, self.date.replace('/', ''))

# Next Children
entries_children = entries.get('children')
if entries_children:

@@ -319,11 +319,7 @@ class MISPExporterAutoDaily(MISPExporter):
def __init__(self, url='', key='', ssl=False):
super().__init__(url=url, key=key, ssl=ssl)

# create event if don't exists
try:
self.event_id = self.get_daily_event_id()
except MISPConnectionError:
self.event_id = - 1
self.event_id = - 1
self.date = datetime.date.today()

def export(self, obj, tag):

@@ -345,6 +341,7 @@ class MISPExporterAutoDaily(MISPExporter):
self.add_event_object(self.event_id, obj)

except MISPConnectionError:
self.event_id = - 1
return -1

@@ -8,9 +8,12 @@ Import Content

"""
import os
import logging
import logging.config
import sys

from abc import ABC
from ssl import create_default_context

import smtplib
from email.mime.multipart import MIMEMultipart

@@ -22,17 +25,22 @@ sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_logger
from exporter.abstract_exporter import AbstractExporter
from lib.ConfigLoader import ConfigLoader
# from lib.objects.abstract_object import AbstractObject
# from lib.Tracker import Tracker

logging.config.dictConfig(ail_logger.get_config(name='modules'))


class MailExporter(AbstractExporter, ABC):
def __init__(self, host=None, port=None, password=None, user='', sender=''):
def __init__(self, host=None, port=None, password=None, user='', sender='', cert_required=None, ca_file=None):
super().__init__()
config_loader = ConfigLoader()

self.logger = logging.getLogger(f'{self.__class__.__name__}')

if host:
self.host = host
self.port = port

@@ -45,6 +53,15 @@ class MailExporter(AbstractExporter, ABC):
self.pw = config_loader.get_config_str("Notifications", "sender_pw")
if self.pw == 'None':
self.pw = None
if cert_required is not None:
self.cert_required = bool(cert_required)
self.ca_file = ca_file
else:
self.cert_required = config_loader.get_config_boolean("Notifications", "cert_required")
if self.cert_required:
self.ca_file = config_loader.get_config_str("Notifications", "ca_file")
else:
self.ca_file = None
if user:
self.user = user
else:

@@ -67,8 +84,12 @@ class MailExporter(AbstractExporter, ABC):
smtp_server = smtplib.SMTP(self.host, self.port)
smtp_server.starttls()
except smtplib.SMTPNotSupportedError:
print("The server does not support the STARTTLS extension.")
smtp_server = smtplib.SMTP_SSL(self.host, self.port)
self.logger.info(f"The server {self.host}:{self.port} does not support the STARTTLS extension.")
if self.cert_required:
context = create_default_context(cafile=self.ca_file)
else:
context = None
smtp_server = smtplib.SMTP_SSL(self.host, self.port, context=context)

smtp_server.ehlo()
if self.user is not None:

@@ -80,7 +101,7 @@ class MailExporter(AbstractExporter, ABC):
return smtp_server
# except Exception as err:
# traceback.print_tb(err.__traceback__)
# logger.warning(err)
# self.logger.warning(err)

def _export(self, recipient, subject, body):
mime_msg = MIMEMultipart()

@@ -95,24 +116,35 @@ class MailExporter(AbstractExporter, ABC):
smtp_client.quit()
# except Exception as err:
# traceback.print_tb(err.__traceback__)
# logger.warning(err)
print(f'Send notification: {subject} to {recipient}')
# self.logger.warning(err)
self.logger.info(f'Send notification: {subject} to {recipient}')

class MailExporterTracker(MailExporter):

def __init__(self, host=None, port=None, password=None, user='', sender=''):
super().__init__(host=host, port=port, password=password, user=user, sender=sender)

def export(self, tracker, obj): # TODO match
def export(self, tracker, obj, matches=[]):
tracker_type = tracker.get_type()
tracker_name = tracker.get_tracked()
subject = f'AIL Framework Tracker: {tracker_name}' # TODO custom subject
description = tracker.get_description()
if not description:
description = tracker_name

subject = f'AIL Framework Tracker: {description}'
body = f"AIL Framework, New occurrence for {tracker_type} tracker: {tracker_name}\n"
body += f'Item: {obj.id}\nurl:{obj.get_link()}'

# TODO match option
# if match:
# body += f'Tracker Match:\n\n{escape(match)}'
if matches:
body += '\n'
nb = 1
for match in matches:
body += f'\nMatch {nb}: {match[0]}\nExtract:\n{match[1]}\n\n'
nb += 1
else:
body = f"AIL Framework, New occurrence for {tracker_type} tracker: {tracker_name}\n"
body += f'Item: {obj.id}\nurl:{obj.get_link()}'

# print(body)
for mail in tracker.get_mails():
self._export(mail, subject, body)

@@ -56,6 +56,8 @@ class FeederImporter(AbstractImporter):
feeders = [f[:-3] for f in os.listdir(feeder_dir) if os.path.isfile(os.path.join(feeder_dir, f))]
self.feeders = {}
for feeder in feeders:
if feeder == 'abstract_chats_feeder':
continue
print(feeder)
part = feeder.split('.')[-1]
# import json importer class

@@ -87,13 +89,27 @@ class FeederImporter(AbstractImporter):
feeder_name = feeder.get_name()
print(f'importing: {feeder_name} feeder')

item_id = feeder.get_item_id()
# Get Data object:
data_obj = feeder.get_obj()

# process meta
if feeder.get_json_meta():
feeder.process_meta()
gzip64_content = feeder.get_gzip64_content()
objs = feeder.process_meta()
if objs is None:
objs = set()
else:
objs = set()

return f'{feeder_name} {item_id} {gzip64_content}'
if data_obj:
objs.add(data_obj)

for obj in objs:
if obj.type == 'item': # object save on disk as file (Items)
gzip64_content = feeder.get_gzip64_content()
return obj, f'{feeder_name} {gzip64_content}'
else: # Messages save on DB
if obj.exists() and obj.type != 'chat':
return obj, f'{feeder_name}'


class FeederModuleImporter(AbstractModule):

@@ -112,11 +128,14 @@ class FeederModuleImporter(AbstractModule):
def compute(self, message):
# TODO HANDLE Invalid JSON
json_data = json.loads(message)
relay_message = self.importer.importer(json_data)
self.add_message_to_queue(relay_message)
# TODO multiple objs + messages
obj, relay_message = self.importer.importer(json_data)
####
self.add_message_to_queue(obj=obj, message=relay_message)


# Launch Importer
if __name__ == '__main__':
module = FeederModuleImporter()
# module.debug = True
module.run()
|
@@ -19,42 +19,39 @@ sys.path.append(os.environ['AIL_BIN'])
 from importer.abstract_importer import AbstractImporter
-# from modules.abstract_module import AbstractModule
 from lib import ail_logger
-from lib.ail_queues import AILQueue
+# from lib.ail_queues import AILQueue
 from lib import ail_files  # TODO RENAME ME
+from lib.objects.Items import Item

 logging.config.dictConfig(ail_logger.get_config(name='modules'))

+# TODO Clean queue one object destruct

 class FileImporter(AbstractImporter):
     def __init__(self, feeder='file_import'):
-        super().__init__()
+        super().__init__(queue=True)
         self.logger = logging.getLogger(f'{self.__class__.__name__}')

         self.feeder_name = feeder  # TODO sanityze feeder name

-        # Setup the I/O queues
-        self.queue = AILQueue('FileImporter', 'manual')

     def importer(self, path):
         if os.path.isfile(path):
             with open(path, 'rb') as f:
                 content = f.read()
+                mimetype = ail_files.get_mimetype(content)
-                if content:
-                    mimetype = ail_files.get_mimetype(content)
-                    if ail_files.is_text(mimetype):
-                        item_id = ail_files.create_item_id(self.feeder_name, path)
-                        content = ail_files.create_gzipped_b64(content)
-                        if content:
-                            message = f'dir_import {item_id} {content}'
-                            self.logger.info(message)
-                            self.queue.send_message(message)
-                    elif mimetype == 'application/gzip':
-                        item_id = ail_files.create_item_id(self.feeder_name, path)
-                        content = ail_files.create_b64(content)
-                        if content:
-                            message = f'dir_import {item_id} {content}'
-                            self.logger.info(message)
-                            self.queue.send_message(message)
+                gzipped = False
+                if mimetype == 'application/gzip':
+                    gzipped = True
+                elif not ail_files.is_text(mimetype):  # # # #
+                    return None
+                item_id = ail_files.create_item_id(self.feeder_name, path)
+                source = 'dir_import'
+                message = self.create_message(content, gzipped=gzipped, source=source)
+                self.logger.info(f'{source} {item_id}')
+                obj = Item(item_id)
+                if message:
+                    self.add_message_to_queue(obj, message=message)


 class DirImporter(AbstractImporter):
     def __init__(self):
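The new FileImporter branches on whether the file is already gzipped (`mimetype == 'application/gzip'`) before building the message. `ail_files.get_mimetype` likely wraps a real MIME detector; as a rough stand-in under that assumption, the gzip case can be recognized from the stream's magic bytes:

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream

def is_gzipped(content: bytes) -> bool:
    # hypothetical stand-in for ail_files.get_mimetype(content) == 'application/gzip'
    return content[:2] == GZIP_MAGIC

raw = b'some plain text paste'
packed = gzip.compress(raw)
```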
@@ -10,9 +10,7 @@
 # https://github.com/cvandeplas/pystemon/blob/master/pystemon.yaml#L52
 #

-import base64
 import os
-import gzip
 import sys
 import redis

@@ -24,6 +22,8 @@ from importer.abstract_importer import AbstractImporter
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader

+from lib.objects.Items import Item

 class PystemonImporter(AbstractImporter):
     def __init__(self, pystemon_dir, host='localhost', port=6379, db=10):
         super().__init__()

@@ -32,16 +32,12 @@ class PystemonImporter(AbstractImporter):
         self.r_pystemon = redis.StrictRedis(host=host, port=port, db=db, decode_responses=True)
         self.dir_pystemon = pystemon_dir

-    # # TODO: add exception
-    def encode_and_compress_data(self, content):
-        return base64.b64encode(gzip.compress(content)).decode()

     def importer(self):
         item_id = self.r_pystemon.lpop("pastes")
-        print(item_id)
         if item_id:
+            print(item_id)
             full_item_path = os.path.join(self.dir_pystemon, item_id)  # TODO SANITIZE PATH
             # Check if pystemon file exists
             if not os.path.isfile(full_item_path):
                 print(f'Error: {full_item_path}, file not found')

@@ -53,11 +49,19 @@ class PystemonImporter(AbstractImporter):
             if not content:
                 return None

-            b64_gzipped_content = self.encode_and_compress_data(content)
-            print(item_id, b64_gzipped_content)
-            return f'{item_id} {b64_gzipped_content}'
+            if full_item_path[-3:] == '.gz':
+                gzipped = True
+            else:
+                gzipped = False

+            # TODO handle multiple objects
+            source = 'pystemon'
+            message = self.create_message(content, gzipped=gzipped, source=source)
+            self.logger.info(f'{source} {item_id}')
+            return item_id, message

         except IOError as e:
-            print(f'Error: {full_item_path}, IOError')
+            self.logger.error(f'Error {e}: {full_item_path}, IOError')
             return None


@@ -81,8 +85,10 @@ class PystemonModuleImporter(AbstractModule):
         return self.importer.importer()

     def compute(self, message):
-        relay_message = f'pystemon {message}'
-        self.add_message_to_queue(relay_message)
+        if message:
+            item_id, message = message
+            item = Item(item_id)
+            self.add_message_to_queue(obj=item, message=message)


 if __name__ == '__main__':
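The removed `encode_and_compress_data` helper (gzip then base64) moves into the shared `create_message` path. A self-contained round-trip sketch of that encoding, with `decode_message_payload` as a hypothetical name for the consumer side:

```python
import base64
import gzip

def encode_and_compress_data(content: bytes) -> str:
    # the removed helper: gzip-compress, then base64-encode to str
    return base64.b64encode(gzip.compress(content)).decode()

def decode_message_payload(b64_gzipped: str) -> bytes:
    # hypothetical consumer side: reverse base64, then gunzip
    return gzip.decompress(base64.b64decode(b64_gzipped))

paste = b'password=hunter2'
payload = encode_and_compress_data(paste)
```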
@@ -4,15 +4,13 @@
 Importer Class
 ================

-Import Content
+ZMQ Importer

 """
 import os
 import sys

 import zmq

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages

@@ -21,6 +19,8 @@ from importer.abstract_importer import AbstractImporter
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader

+from lib.objects.Items import Item

 class ZMQImporters(AbstractImporter):
     def __init__(self):
         super().__init__()

@@ -56,6 +56,8 @@ class ZMQModuleImporter(AbstractModule):
         super().__init__()

         config_loader = ConfigLoader()
+        self.default_feeder_name = config_loader.get_config_str("Module_Mixer", "default_unnamed_feed_name")

         addresses = config_loader.get_config_str('ZMQ_Global', 'address')
         addresses = addresses.split(',')
         channel = config_loader.get_config_str('ZMQ_Global', 'channel')

@@ -63,7 +65,6 @@ class ZMQModuleImporter(AbstractModule):
         for address in addresses:
             self.zmq_importer.add(address.strip(), channel)

-    # TODO MESSAGE SOURCE - UI
     def get_message(self):
         for message in self.zmq_importer.importer():
             # remove channel from message

@@ -72,8 +73,20 @@ class ZMQModuleImporter(AbstractModule):
     def compute(self, messages):
         for message in messages:
             message = message.decode()
-            print(message.split(' ', 1)[0])
-            self.add_message_to_queue(message)
+
+            obj_id, gzip64encoded = message.split(' ', 1)  # TODO ADD LOGS
+            splitted = obj_id.split('>>', 1)
+            if len(splitted) == 2:
+                feeder_name, obj_id = splitted
+            else:
+                feeder_name = self.default_feeder_name
+
+            obj = Item(obj_id)
+            # f'{source} {content}'
+            relay_message = f'{feeder_name} {gzip64encoded}'
+
+            print(f'feeder_name item::{obj_id}')
+            self.add_message_to_queue(obj=obj, message=relay_message)


 if __name__ == '__main__':
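The new ZMQ compute() splits each message into an object id and payload, and recognizes an optional `feeder>>` prefix before the id, falling back to the configured default feeder name. The same parsing in isolation (`split_zmq_message` and the sample ids are illustrative; `unnamed_feeder` stands in for the `default_unnamed_feed_name` config value):

```python
DEFAULT_FEEDER = 'unnamed_feeder'  # stand-in for the default_unnamed_feed_name config value

def split_zmq_message(message, default=DEFAULT_FEEDER):
    # same parsing as the new compute(): '<feeder>>><obj_id> <gzip64encoded>'
    obj_id, gzip64encoded = message.split(' ', 1)
    splitted = obj_id.split('>>', 1)
    if len(splitted) == 2:
        feeder_name, obj_id = splitted
    else:
        feeder_name = default
    return feeder_name, obj_id, gzip64encoded

parsed = split_zmq_message('myfeed>>submitted/2024/01/01/x.gz AAAA')
```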
@@ -7,26 +7,41 @@ Importer Class
 Import Content

 """
+import base64
+import gzip
+import logging
+import logging.config
 import os
 import sys

 from abc import ABC, abstractmethod

-# sys.path.append(os.environ['AIL_BIN'])
+sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
-# from ConfigLoader import ConfigLoader
+from lib import ail_logger
+from lib.ail_queues import AILQueue

+logging.config.dictConfig(ail_logger.get_config(name='modules'))

+# TODO Clean queue one object destruct

-class AbstractImporter(ABC):
-    def __init__(self):
+class AbstractImporter(ABC):  # TODO ail queues
+    def __init__(self, queue=False):
         """
-        Init Module
-        importer_name: str; set the importer name if different from the instance ClassName
+        AIL Importer
+        :param queue: Allow to push messages to other modules
         """
         # Module name if provided else instance className
         self.name = self._name()
+        self.logger = logging.getLogger(f'{self.__class__.__name__}')
+
+        # Setup the I/O queues for one shot importers
+        if queue:
+            self.queue = AILQueue(self.name, 'importer_manual')

     @abstractmethod
     def importer(self, *args, **kwargs):

@@ -39,4 +54,57 @@ class AbstractImporter(ABC):
         """
         return self.__class__.__name__

+    def add_message_to_queue(self, obj, message='', queue=None):
+        """
+        Add message to queue
+        :param obj: AILObject
+        :param message: message to send in queue
+        :param queue: queue name or module name
+
+        ex: add_message_to_queue(item_id, 'Mail')
+        """
+        if not obj:
+            raise Exception(f'Invalid AIL object, {obj}')
+        obj_global_id = obj.get_global_id()
+        self.queue.send_message(obj_global_id, message, queue)
+
+    def get_available_queues(self):
+        return self.queue.get_out_queues()
+
+    @staticmethod
+    def b64(content):
+        if isinstance(content, str):
+            content = content.encode()
+        return base64.b64encode(content).decode()
+
+    @staticmethod
+    def create_gzip(content):
+        if isinstance(content, str):
+            content = content.encode()
+        return gzip.compress(content)
+
+    def b64_gzip(self, content):
+        try:
+            gziped = self.create_gzip(content)
+            return self.b64(gziped)
+        except Exception as e:
+            self.logger.warning(e)
+            return ''
+
+    def create_message(self, content, b64=False, gzipped=False, source=None):
+        if not source:
+            source = self.name
+        if content:
+            if not gzipped:
+                content = self.b64_gzip(content)
+            elif not b64:
+                content = self.b64(content)
+            if not content:
+                return None
+            if isinstance(content, bytes):
+                content = content.decode()
+            return f'{source} {content}'
+        else:
+            return f'{source}'
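The new `create_message` produces `'<source> <payload>'` where raw content is gzip-compressed and base64-encoded, while already-gzipped content is only base64-encoded. A simplified standalone rendition (the real method's `b64` flag and queue plumbing are left out) with a decode round-trip:

```python
import base64
import gzip

def create_message(content, gzipped=False, source='importer'):
    # simplified rendition of AbstractImporter.create_message:
    # raw content gets gzip+b64, already-gzipped content only gets b64
    if isinstance(content, str):
        content = content.encode()
    if not gzipped:
        payload = base64.b64encode(gzip.compress(content)).decode()
    else:
        payload = base64.b64encode(content).decode()
    return f'{source} {payload}'

msg = create_message('some paste', source='file_import')
source, payload = msg.split(' ', 1)
```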
@@ -33,3 +33,4 @@ class BgpMonitorFeeder(DefaultFeeder):
         tag = 'infoleak:automatic-detection=bgp_monitor'
         item = Item(self.get_item_id())
         item.add_tag(tag)
+        return set()
@@ -9,14 +9,21 @@ Process Feeder Json (example: Twitter feeder)
 """
 import os
 import datetime
 import sys
 import uuid

+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from lib.objects import ail_objects

 class DefaultFeeder:
     """Default Feeder"""

     def __init__(self, json_data):
         self.json_data = json_data
-        self.item_id = None
+        self.obj = None
         self.name = None

     def get_name(self):

@@ -24,8 +31,12 @@ class DefaultFeeder:
         Return feeder name. first part of the item_id and display in the UI
         """
         if not self.name:
-            return self.get_source()
-        return self.name
+            name = self.get_source()
+        else:
+            name = self.name
+        if not name:
+            name = 'default'
+        return name

     def get_source(self):
         return self.json_data.get('source')

@@ -51,15 +62,22 @@ class DefaultFeeder:
         """
         return self.json_data.get('data')

+    def get_obj_type(self):
+        meta = self.get_json_meta()
+        return meta.get('type', 'item')

     ## OVERWRITE ME ##
-    def get_item_id(self):
+    def get_obj(self):
         """
-        Return item id. define item id
+        Return obj global id. define obj global id
+        Default == item object
         """
         date = datetime.date.today().strftime("%Y/%m/%d")
-        item_id = os.path.join(self.get_name(), date, str(uuid.uuid4()))
-        self.item_id = f'{item_id}.gz'
-        return self.item_id
+        obj_id = os.path.join(self.get_name(), date, str(uuid.uuid4()))
+        obj_id = f'{obj_id}.gz'
+        obj_id = f'item::{obj_id}'
+        self.obj = ail_objects.get_obj_from_global_id(obj_id)
+        return self.obj

     ## OVERWRITE ME ##
     def process_meta(self):

@@ -67,4 +85,4 @@ class DefaultFeeder:
         Process JSON meta filed.
         """
         # meta = self.get_json_meta()
-        pass
+        return set()
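`get_item_id` becomes `get_obj`, which now builds a typed global id (`item::...`) and resolves it to an object. The id-construction part can be sketched on its own (`default_obj_global_id` is a hypothetical helper name; the real method returns the resolved object, not the id string):

```python
import datetime
import os
import uuid

def default_obj_global_id(feeder_name):
    # mirrors DefaultFeeder.get_obj(): item::<feeder>/<YYYY/MM/DD>/<uuid4>.gz
    date = datetime.date.today().strftime('%Y/%m/%d')
    obj_id = os.path.join(feeder_name, date, str(uuid.uuid4()))
    return f'item::{obj_id}.gz'

gid = default_obj_global_id('pystemon')
```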
 38  bin/importer/feeders/Discord.py  Executable file
@@ -0,0 +1,38 @@
+#!/usr/bin/env python3
+# -*-coding:UTF-8 -*
+"""
+The Telegram Feeder Importer Module
+================
+
+Process Telegram JSON
+
+"""
+import os
+import sys
+import datetime
+
+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from importer.feeders.abstract_chats_feeder import AbstractChatFeeder
+from lib.ConfigLoader import ConfigLoader
+from lib.objects import ail_objects
+from lib.objects.Chats import Chat
+from lib.objects import Messages
+from lib.objects import UsersAccount
+from lib.objects.Usernames import Username
+
+import base64
+
+class DiscordFeeder(AbstractChatFeeder):
+
+    def __init__(self, json_data):
+        super().__init__('discord', json_data)
+
+    # def get_obj(self):.
+    #     obj_id = Messages.create_obj_id('telegram', chat_id, message_id, timestamp)
+    #     obj_id = f'message:telegram:{obj_id}'
+    #     self.obj = ail_objects.get_obj_from_global_id(obj_id)
+    #     return self.obj
@@ -17,7 +17,7 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from importer.feeders.Default import DefaultFeeder
 from lib.objects.Usernames import Username
-from lib import item_basic
+from lib.objects.Items import Item


 class JabberFeeder(DefaultFeeder):

@@ -36,7 +36,7 @@ class JabberFeeder(DefaultFeeder):
         self.item_id = f'{item_id}.gz'
         return self.item_id

-    def process_meta(self):
+    def process_meta(self):  # TODO replace me by message
         """
         Process JSON meta field.
         """

@@ -44,10 +44,12 @@ class JabberFeeder(DefaultFeeder):
         # item_basic.add_map_obj_id_item_id(jabber_id, item_id, 'jabber_id') ##############################################
         to = str(self.json_data['meta']['jabber:to'])
         fr = str(self.json_data['meta']['jabber:from'])
-        date = item_basic.get_item_date(item_id)
+
+        item = Item(self.item_id)
+        date = item.get_date()

         user_to = Username(to, 'jabber')
         user_fr = Username(fr, 'jabber')
-        user_to.add(date, self.item_id)
-        user_fr.add(date, self.item_id)
-        return None
+        user_to.add(date, item)
+        user_fr.add(date, item)
+        return set()
@@ -15,42 +15,24 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
-from importer.feeders.Default import DefaultFeeder
+from importer.feeders.abstract_chats_feeder import AbstractChatFeeder
+from lib.ConfigLoader import ConfigLoader
+from lib.objects import ail_objects
+from lib.objects.Chats import Chat
+from lib.objects import Messages
+from lib.objects import UsersAccount
 from lib.objects.Usernames import Username
-from lib import item_basic

-class TelegramFeeder(DefaultFeeder):
+import base64

+class TelegramFeeder(AbstractChatFeeder):

     def __init__(self, json_data):
-        super().__init__(json_data)
-        self.name = 'telegram'
+        super().__init__('telegram', json_data)

-    # define item id
-    def get_item_id(self):
-        # TODO use telegram message date
-        date = datetime.date.today().strftime("%Y/%m/%d")
-        channel_id = str(self.json_data['meta']['channel_id'])
-        message_id = str(self.json_data['meta']['message_id'])
-        item_id = f'{channel_id}_{message_id}'
-        item_id = os.path.join('telegram', date, item_id)
-        self.item_id = f'{item_id}.gz'
-        return self.item_id
+    # def get_obj(self):.
+    #     obj_id = Messages.create_obj_id('telegram', chat_id, message_id, timestamp)
+    #     obj_id = f'message:telegram:{obj_id}'
+    #     self.obj = ail_objects.get_obj_from_global_id(obj_id)
+    #     return self.obj

-    def process_meta(self):
-        """
-        Process JSON meta field.
-        """
-        # channel_id = str(self.json_data['meta']['channel_id'])
-        # message_id = str(self.json_data['meta']['message_id'])
-        # telegram_id = f'{channel_id}_{message_id}'
-        # item_basic.add_map_obj_id_item_id(telegram_id, item_id, 'telegram_id') #########################################
-        user = None
-        if self.json_data['meta'].get('user'):
-            user = str(self.json_data['meta']['user'])
-        elif self.json_data['meta'].get('channel'):
-            user = str(self.json_data['meta']['channel'].get('username'))
-        if user:
-            date = item_basic.get_item_date(self.item_id)
-            username = Username(user, 'telegram')
-            username.add(date, self.item_id)
-        return None
@@ -17,7 +17,7 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from importer.feeders.Default import DefaultFeeder
 from lib.objects.Usernames import Username
-from lib import item_basic
+from lib.objects.Items import Item


 class TwitterFeeder(DefaultFeeder):

@@ -40,9 +40,9 @@ class TwitterFeeder(DefaultFeeder):
         '''
         # tweet_id = str(self.json_data['meta']['twitter:tweet_id'])
         # item_basic.add_map_obj_id_item_id(tweet_id, item_id, 'twitter_id') ############################################

-        date = item_basic.get_item_date(self.item_id)
+        item = Item(self.item_id)
+        date = item.get_date()
         user = str(self.json_data['meta']['twitter:id'])
         username = Username(user, 'twitter')
-        username.add(date, item_id)
-        return None
+        username.add(date, item)
+        return set()
@@ -56,3 +56,5 @@ class UrlextractFeeder(DefaultFeeder):
         item = Item(self.item_id)
         item.set_parent(parent_id)
+
+        return set()
 394  bin/importer/feeders/abstract_chats_feeder.py  Executable file
@@ -0,0 +1,394 @@
+#!/usr/bin/env python3
+# -*-coding:UTF-8 -*
+"""
+Abstract Chat JSON Feeder Importer Module
+================
+
+Process Feeder Json (example: Twitter feeder)
+
+"""
+import datetime
+import os
+import sys
+
+from abc import ABC
+
+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from importer.feeders.Default import DefaultFeeder
+from lib.objects.Chats import Chat
+from lib.objects import ChatSubChannels
+from lib.objects import ChatThreads
+from lib.objects import Images
+from lib.objects import Messages
+from lib.objects import FilesNames
+# from lib.objects import Files
+from lib.objects import UsersAccount
+from lib.objects.Usernames import Username
+from lib import chats_viewer
+
+import base64
+import io
+import gzip
+
+# TODO remove compression ???
+def _gunzip_bytes_obj(bytes_obj):
+    gunzipped_bytes_obj = None
+    try:
+        in_ = io.BytesIO()
+        in_.write(bytes_obj)
+        in_.seek(0)
+
+        with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
+            gunzipped_bytes_obj = fo.read()
+    except Exception as e:
+        print(f'Global; Invalid Gzip file: {e}')
+
+    return gunzipped_bytes_obj
+
+class AbstractChatFeeder(DefaultFeeder, ABC):
+
+    def __init__(self, name, json_data):
+        super().__init__(json_data)
+        self.obj = None
+        self.name = name
+
+    def get_chat_protocol(self):  # TODO # # # # # # # # # # # # #
+        return self.name
+
+    def get_chat_network(self):
+        self.json_data['meta'].get('network', None)
+
+    def get_chat_address(self):
+        self.json_data['meta'].get('address', None)
+
+    def get_chat_instance_uuid(self):
+        chat_instance_uuid = chats_viewer.create_chat_service_instance(self.get_chat_protocol(),
+                                                                       network=self.get_chat_network(),
+                                                                       address=self.get_chat_address())
+        # TODO SET
+        return chat_instance_uuid
+
+    def get_chat_id(self):  # TODO RAISE ERROR IF NONE
+        return self.json_data['meta']['chat']['id']
+
+    def get_subchannel_id(self):
+        return self.json_data['meta']['chat'].get('subchannel', {}).get('id')
+
+    def get_subchannels(self):
+        pass
+
+    def get_thread_id(self):
+        return self.json_data['meta'].get('thread', {}).get('id')
+
+    def get_message_id(self):
+        return self.json_data['meta']['id']
+
+    def get_media_name(self):
+        return self.json_data['meta'].get('media', {}).get('name')
+
+    def get_reactions(self):
+        return self.json_data['meta'].get('reactions', [])
+
+    def get_message_timestamp(self):
+        if not self.json_data['meta'].get('date'):
+            return None
+        else:
+            return self.json_data['meta']['date']['timestamp']
+        # if self.json_data['meta'].get('date'):
+        #     date = datetime.datetime.fromtimestamp( self.json_data['meta']['date']['timestamp'])
+        #     date = date.strftime('%Y/%m/%d')
+        # else:
+        #     date = datetime.date.today().strftime("%Y/%m/%d")
+
+    def get_message_date_timestamp(self):
+        timestamp = self.get_message_timestamp()
+        date = datetime.datetime.fromtimestamp(timestamp)
+        date = date.strftime('%Y%m%d')
+        return date, timestamp
+
+    def get_message_sender_id(self):
+        return self.json_data['meta']['sender']['id']
+
+    def get_message_reply(self):
+        return self.json_data['meta'].get('reply_to')  # TODO change to reply ???
+
+    def get_message_reply_id(self):
+        return self.json_data['meta'].get('reply_to', {}).get('message_id')
+
+    def get_message_forward(self):
+        return self.json_data['meta'].get('forward')
+
+    def get_message_content(self):
+        decoded = base64.standard_b64decode(self.json_data['data'])
+        return _gunzip_bytes_obj(decoded)
+
+    def get_obj(self):
+        #### TIMESTAMP ####
+        timestamp = self.get_message_timestamp()
+
+        #### Create Object ID ####
+        chat_id = self.get_chat_id()
+        try:
+            message_id = self.get_message_id()
+        except KeyError:
+            if chat_id:
+                self.obj = Chat(chat_id, self.get_chat_instance_uuid())
+                return self.obj
+            else:
+                self.obj = None
+                return None
+
+        thread_id = self.get_thread_id()
+        # channel id
+        # thread id
+
+        # TODO sanitize obj type
+        obj_type = self.get_obj_type()
+
+        if obj_type == 'image':
+            self.obj = Images.Image(self.json_data['data-sha256'])
+
+        else:
+            obj_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, thread_id=thread_id)
+            self.obj = Messages.Message(obj_id)
+        return self.obj
+
+    def process_chat(self, new_objs, obj, date, timestamp, reply_id=None):
+        meta = self.json_data['meta']['chat']  # todo replace me by function
+        chat = Chat(self.get_chat_id(), self.get_chat_instance_uuid())
+        subchannel = None
+        thread = None
+
+        # date stat + correlation
+        chat.add(date, obj)
+
+        if meta.get('name'):
+            chat.set_name(meta['name'])
+
+        if meta.get('info'):
+            chat.set_info(meta['info'])
+
+        if meta.get('date'):  # TODO check if already exists
+            chat.set_created_at(int(meta['date']['timestamp']))
+
+        if meta.get('icon'):
+            img = Images.create(meta['icon'], b64=True)
+            img.add(date, chat)
+            chat.set_icon(img.get_global_id())
+            new_objs.add(img)
+
+        if meta.get('username'):
+            username = Username(meta['username'], self.get_chat_protocol())
+            chat.update_username_timeline(username.get_global_id(), timestamp)
+
+        if meta.get('subchannel'):
+            subchannel, thread = self.process_subchannel(obj, date, timestamp, reply_id=reply_id)
+            chat.add_children(obj_global_id=subchannel.get_global_id())
+        else:
+            if obj.type == 'message':
+                if self.get_thread_id():
+                    thread = self.process_thread(obj, chat, date, timestamp, reply_id=reply_id)
+                else:
+                    chat.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
+
+        chats_obj = [chat]
+        if subchannel:
+            chats_obj.append(subchannel)
+        if thread:
+            chats_obj.append(thread)
+        return chats_obj
+
+    def process_subchannel(self, obj, date, timestamp, reply_id=None):  # TODO CREATE DATE
+        meta = self.json_data['meta']['chat']['subchannel']
+        subchannel = ChatSubChannels.ChatSubChannel(f'{self.get_chat_id()}/{meta["id"]}', self.get_chat_instance_uuid())
+        thread = None
+
+        # TODO correlation with obj = message/image
+        subchannel.add(date)
+
+        if meta.get('date'):  # TODO check if already exists
+            subchannel.set_created_at(int(meta['date']['timestamp']))
+
+        if meta.get('name'):
+            subchannel.set_name(meta['name'])
+            # subchannel.update_name(meta['name'], timestamp)  # TODO #################
+
+        if meta.get('info'):
+            subchannel.set_info(meta['info'])
+
+        if obj.type == 'message':
+            if self.get_thread_id():
+                thread = self.process_thread(obj, subchannel, date, timestamp, reply_id=reply_id)
+            else:
+                subchannel.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
+        return subchannel, thread
+
+    def process_thread(self, obj, obj_chat, date, timestamp, reply_id=None):
+        meta = self.json_data['meta']['thread']
+        thread_id = self.get_thread_id()
+        p_chat_id = meta['parent'].get('chat')
+        p_subchannel_id = meta['parent'].get('subchannel')
+        p_message_id = meta['parent'].get('message')
+
+        # print(thread_id, p_chat_id, p_subchannel_id, p_message_id)
+
+        if p_chat_id == self.get_chat_id() and p_subchannel_id == self.get_subchannel_id():
+            thread = ChatThreads.create(thread_id, self.get_chat_instance_uuid(), p_chat_id, p_subchannel_id, p_message_id, obj_chat)
+            thread.add(date, obj)
+            thread.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
+            # TODO OTHERS CORRELATIONS TO ADD
+
+            if meta.get('name'):
+                thread.set_name(meta['name'])
+
+            return thread
+
+        # TODO
+        # else:
+        #     # ADD NEW MESSAGE REF (used by discord)
+
+    def process_sender(self, new_objs, obj, date, timestamp):
+        meta = self.json_data['meta'].get('sender')
+        if not meta:
+            return None
+
+        user_account = UsersAccount.UserAccount(meta['id'], self.get_chat_instance_uuid())
+
+        # date stat + correlation
+        user_account.add(date, obj)
+
+        if meta.get('username'):
+            username = Username(meta['username'], self.get_chat_protocol())
+            # TODO timeline or/and correlation ????
+            user_account.add_correlation(username.type, username.get_subtype(r_str=True), username.id)
+            user_account.update_username_timeline(username.get_global_id(), timestamp)
+
+            # Username---Message
+            username.add(date)  # TODO # correlation message ???
+
+        # ADDITIONAL METAS
+        if meta.get('firstname'):
+            user_account.set_first_name(meta['firstname'])
+        if meta.get('lastname'):
+            user_account.set_last_name(meta['lastname'])
+        if meta.get('phone'):
+            user_account.set_phone(meta['phone'])
+
+        if meta.get('icon'):
+            img = Images.create(meta['icon'], b64=True)
+            img.add(date, user_account)
+            user_account.set_icon(img.get_global_id())
+            new_objs.add(img)
+
+        if meta.get('info'):
+            user_account.set_info(meta['info'])
+
+        return user_account
+
+    def process_meta(self):  # TODO CHECK MANDATORY FIELDS
+        """
+        Process JSON meta filed.
+        """
+        # meta = self.get_json_meta()
+
+        objs = set()
+        if self.obj:
+            objs.add(self.obj)
+        new_objs = set()
+
+        date, timestamp = self.get_message_date_timestamp()
+
+        # REPLY
+        reply_id = self.get_message_reply_id()
+
+        print(self.obj.type)
+
+        # TODO FILES + FILES REF
+
+        # get object by meta object type
+        if self.obj.type == 'message':
+            # Content
+            obj = Messages.create(self.obj.id, self.get_message_content())
+
+            # FILENAME
+            media_name = self.get_media_name()
+            if media_name:
+                print(media_name)
+                FilesNames.FilesNames().create(media_name, date, obj)
+
+            for reaction in self.get_reactions():
+                obj.add_reaction(reaction['reaction'], int(reaction['count']))
+        elif self.obj.type == 'chat':
+            pass
+        else:
+            chat_id = self.get_chat_id()
+            thread_id = self.get_thread_id()
+            channel_id = self.get_subchannel_id()
+            message_id = self.get_message_id()
+            message_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, channel_id=channel_id, thread_id=thread_id)
+            message = Messages.Message(message_id)
+            # create empty message if message don't exist
+            if not message.exists():
+                message.create('')
+                objs.add(message)
+
+            if message.exists():  # TODO Correlation user-account image/filename ????
+                obj = Images.create(self.get_message_content())
+                obj.add(date, message)
+                obj.set_parent(obj_global_id=message.get_global_id())
+
+                # FILENAME
+                media_name = self.get_media_name()
+                if media_name:
+                    FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
+
+                for reaction in self.get_reactions():
+                    message.add_reaction(reaction['reaction'], int(reaction['count']))
+
+        for obj in objs:  # TODO PERF avoid parsing metas multiple times
+
+            # TODO get created subchannel + thread
+            #   => create correlation user-account with object
+
+            print(obj.id)
+
+            # CHAT
+            chat_objs = self.process_chat(new_objs, obj, date, timestamp, reply_id=reply_id)
+
+            # Message forward
+            # if self.get_json_meta().get('forward'):
+            #     forward_from = self.get_message_forward()
+            #     print('-----------------------------------------------------------')
+            #     print(forward_from)
+            #     if forward_from:
+            #         forward_from_type = forward_from['from']['type']
+            #         if forward_from_type == 'channel' or forward_from_type == 'chat':
+            #             chat_forward_id = forward_from['from']['id']
+            #             chat_forward = Chat(chat_forward_id, self.get_chat_instance_uuid())
+            #             if chat_forward.exists():
+            #                 for chat_obj in chat_objs:
+            #                     if chat_obj.type == 'chat':
+            #                         chat_forward.add_relationship(chat_obj.get_global_id(), 'forward')
+            #                         # chat_forward.add_relationship(obj.get_global_id(), 'forward')

+            # SENDER  # TODO HANDLE NULL SENDER
+            user_account = self.process_sender(new_objs, obj, date, timestamp)
+
+            if user_account:
+                # UserAccount---ChatObjects
+                for obj_chat in chat_objs:
+                    user_account.add_correlation(obj_chat.type, obj_chat.get_subtype(r_str=True), obj_chat.id)
+
+            # if chat:  # TODO Chat---Username correlation ???
+            #     # Chat---Username => need to handle members and participants
+            #     chat.add_correlation(username.type, username.get_subtype(r_str=True), username.id)
+
+            # TODO Sender image -> correlation
+            #   image
+            #       -> subchannel ?
+            #       -> thread id ?
+
+        return new_objs | objs
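`get_message_content` reverses the importer-side encoding: base64-decode the `data` field, then gunzip it through `_gunzip_bytes_obj`. The same mechanics in isolation (without the broad `except`, and with a constructed sample payload):

```python
import base64
import gzip
import io

def gunzip_bytes_obj(bytes_obj):
    # same mechanics as _gunzip_bytes_obj above, without the broad except
    in_ = io.BytesIO(bytes_obj)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        return fo.read()

# a chat feeder 'data' field: gzipped message content, base64-encoded
data = base64.standard_b64encode(gzip.compress(b'hello chat')).decode()
content = gunzip_bytes_obj(base64.standard_b64decode(data))
```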
@@ -83,6 +83,7 @@ class ConfigLoader(object):
         else:
             return []


+# # # # Directory Config # # # #

 config_loader = ConfigLoader()
@@ -85,18 +85,18 @@ def add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, id_2):
    r_serv_db.sadd(f'obj:duplicates:{obj_type}:{subtype}:{obj_id}', f'{similarity}:{algo}:{id_2}')


-def add_duplicate(algo, hash_, similarity, obj_type, subtype, id, date_ymonth):
+def add_duplicate(algo, hash_, similarity, obj_type, subtype, obj_id, date_ymonth):
    obj2_id = get_object_id_by_hash(algo, hash_, date_ymonth)
    # same content
    if similarity == 100:
-        dups = get_obj_duplicates(obj_type, subtype, id)
+        dups = get_obj_duplicates(obj_type, subtype, obj_id)
        for dup_id in dups:
            for algo_dict in dups[dup_id]:
                if algo_dict['similarity'] == 100 and algo_dict['algo'] == algo:
-                    add_obj_duplicate(algo, similarity, obj_type, subtype, id, dups[dup_id])
-                    add_obj_duplicate(algo, similarity, obj_type, subtype, dups[dup_id], id)
-        add_obj_duplicate(algo, similarity, obj_type, subtype, id, obj2_id)
-        add_obj_duplicate(algo, similarity, obj_type, subtype, obj2_id, id)
+                    add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, dups[dup_id])
+                    add_obj_duplicate(algo, similarity, obj_type, subtype, dups[dup_id], obj_id)
+        add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, obj2_id)
+        add_obj_duplicate(algo, similarity, obj_type, subtype, obj2_id, obj_id)

# TODO
def delete_obj_duplicates():

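The hunk above records each duplicate link in both directions (`obj_id -> obj2_id` and `obj2_id -> obj_id`), so a lookup from either object finds the pair. A minimal in-memory sketch of that symmetry, with a plain dict of sets standing in for the Redis sets (the key layout mirrors the hunk; everything else is illustrative):

```python
# In-memory stand-in for the Redis duplicate sets used in the hunk above
duplicates = {}

def add_obj_duplicate(algo, similarity, obj_type, subtype, obj_id, id_2):
    # Key mirrors f'obj:duplicates:{obj_type}:{subtype}:{obj_id}'
    duplicates.setdefault((obj_type, subtype, obj_id), set()).add(f'{similarity}:{algo}:{id_2}')

# Register the link in both directions, as the fixed code does:
add_obj_duplicate('ssdeep', 100, 'item', '', 'obj_a', 'obj_b')
add_obj_duplicate('ssdeep', 100, 'item', '', 'obj_b', 'obj_a')

assert '100:ssdeep:obj_b' in duplicates[('item', '', 'obj_a')]
assert '100:ssdeep:obj_a' in duplicates[('item', '', 'obj_b')]
```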
@@ -16,12 +16,13 @@ import time
 import uuid

 from enum import Enum
-from flask import escape
+from markupsafe import escape

 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
+from lib import ail_core
 from lib import ConfigLoader
 from lib import Tag
 from lib.exceptions import UpdateInvestigationError

@@ -234,18 +235,27 @@ class Investigation(object):
            objs.append(dict_obj)
        return objs

+    def get_objects_comment(self, obj_global_id):
+        return r_tracking.hget(f'investigations:objs:comment:{self.uuid}', obj_global_id)
+
+    def set_objects_comment(self, obj_global_id, comment):
+        if comment:
+            r_tracking.hset(f'investigations:objs:comment:{self.uuid}', obj_global_id, comment)
+
    # # TODO: def register_object(self, Object): in OBJECT CLASS
-    def register_object(self, obj_id, obj_type, subtype):
+    def register_object(self, obj_id, obj_type, subtype, comment=''):
        r_tracking.sadd(f'investigations:objs:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
        r_tracking.sadd(f'obj:investigations:{obj_type}:{subtype}:{obj_id}', self.uuid)
+        if comment:
+            self.set_objects_comment(f'{obj_type}:{subtype}:{obj_id}', comment)
        timestamp = int(time.time())
        self.set_last_change(timestamp)

    def unregister_object(self, obj_id, obj_type, subtype):
        r_tracking.srem(f'investigations:objs:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
        r_tracking.srem(f'obj:investigations:{obj_type}:{subtype}:{obj_id}', self.uuid)
+        r_tracking.hdel(f'investigations:objs:comment:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
        timestamp = int(time.time())
        self.set_last_change(timestamp)

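The hunk above stores an optional free-text comment per registered object in a per-investigation Redis hash, and drops it again on unregister. A minimal sketch of that lifecycle, with a dict of dicts standing in for the Redis hash (key layout copied from the hunk; the helpers are illustrative, not the framework API):

```python
# Dict-of-dicts stand-in for the Redis hash 'investigations:objs:comment:<uuid>'
comments = {}

def set_objects_comment(investigation_uuid, obj_global_id, comment):
    if comment:  # empty comments are never stored, as in the hunk above
        key = f'investigations:objs:comment:{investigation_uuid}'
        comments.setdefault(key, {})[obj_global_id] = comment

def get_objects_comment(investigation_uuid, obj_global_id):
    return comments.get(f'investigations:objs:comment:{investigation_uuid}', {}).get(obj_global_id)

def unregister_object(investigation_uuid, obj_global_id):
    # Unregistering also deletes the comment entry (hdel in the hunk)
    comments.get(f'investigations:objs:comment:{investigation_uuid}', {}).pop(obj_global_id, None)

set_objects_comment('uuid1', 'item::submitted/2024', 'suspicious leak')
assert get_objects_comment('uuid1', 'item::submitted/2024') == 'suspicious leak'
unregister_object('uuid1', 'item::submitted/2024')
assert get_objects_comment('uuid1', 'item::submitted/2024') is None
```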
@@ -350,7 +360,7 @@ def get_investigations_selector():
    for investigation_uuid in get_all_investigations():
        investigation = Investigation(investigation_uuid)
        name = investigation.get_info()
-        l_investigations.append({"id":investigation_uuid, "name": name})
+        l_investigations.append({"id": investigation_uuid, "name": name})
    return l_investigations

#{id:'8dc4b81aeff94a9799bd70ba556fa345',name:"Paris"}

@@ -445,14 +455,18 @@ def api_register_object(json_dict):
    investigation = Investigation(investigation_uuid)

    obj_type = json_dict.get('type', '').replace(' ', '')
-    if not exists_obj_type(obj_type):
+    if obj_type not in ail_core.get_all_objects():
        return {"status": "error", "reason": f"Invalid Object Type: {obj_type}"}, 400

    subtype = json_dict.get('subtype', '')
    if subtype == 'None':
        subtype = ''
    obj_id = json_dict.get('id', '').replace(' ', '')
-    res = investigation.register_object(obj_id, obj_type, subtype)
+
+    comment = json_dict.get('comment', '')
+    # if comment:
+    #     comment = escape(comment)
+    res = investigation.register_object(obj_id, obj_type, subtype, comment=comment)
    return res, 200

def api_unregister_object(json_dict):

@@ -2,7 +2,24 @@
 # -*-coding:UTF-8 -*

+import os
+import re
+import sys
+import html2text
+
 import gcld3
 from libretranslatepy import LibreTranslateAPI

+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from lib.ConfigLoader import ConfigLoader
+
+config_loader = ConfigLoader()
+r_cache = config_loader.get_redis_conn("Redis_Cache")
+TRANSLATOR_URL = config_loader.get_config_str('Translation', 'libretranslate')
+config_loader = None
+

 dict_iso_languages = {
     'af': 'Afrikaans',

@@ -237,3 +254,201 @@ def get_iso_from_languages(l_languages, sort=False):
    if sort:
        l_iso = sorted(l_iso)
    return l_iso


class LanguageDetector:
    pass

def get_translator_instance():
    return TRANSLATOR_URL

def _get_html2text(content, ignore_links=False):
    h = html2text.HTML2Text()
    h.ignore_links = ignore_links
    h.ignore_images = ignore_links
    return h.handle(content)

def _clean_text_to_translate(content, html=False, keys_blocks=True):
    if html:
        content = _get_html2text(content, ignore_links=True)

    # REMOVE URLS
    regex = r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b'
    url_regex = re.compile(regex)
    urls = url_regex.findall(content)
    urls = sorted(urls, key=len, reverse=True)
    for url in urls:
        content = content.replace(url, '')

    # REMOVE PGP Blocks
    if keys_blocks:
        regex_pgp_public_blocs = r'-----BEGIN PGP PUBLIC KEY BLOCK-----[\s\S]+?-----END PGP PUBLIC KEY BLOCK-----'
        regex_pgp_signature = r'-----BEGIN PGP SIGNATURE-----[\s\S]+?-----END PGP SIGNATURE-----'
        regex_pgp_message = r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----'
        re.compile(regex_pgp_public_blocs)
        re.compile(regex_pgp_signature)
        re.compile(regex_pgp_message)
        res = re.findall(regex_pgp_public_blocs, content)
        for it in res:
            content = content.replace(it, '')
        res = re.findall(regex_pgp_signature, content)
        for it in res:
            content = content.replace(it, '')
        res = re.findall(regex_pgp_message, content)
        for it in res:
            content = content.replace(it, '')
    return content

#### AIL Objects ####

def get_obj_translation(obj_global_id, content, field='', source=None, target='en'):
    """
    Returns translated content
    """
    translation = r_cache.get(f'translation:{target}:{obj_global_id}:{field}')
    if translation:
        # DEBUG
        # print('cache')
        # r_cache.expire(f'translation:{target}:{obj_global_id}:{field}', 0)
        return translation
    translation = LanguageTranslator().translate(content, source=source, target=target)
    if translation:
        r_cache.set(f'translation:{target}:{obj_global_id}:{field}', translation)
        r_cache.expire(f'translation:{target}:{obj_global_id}:{field}', 300)
    return translation

## --AIL Objects-- ##

class LanguagesDetector:

    def __init__(self, nb_langs=3, min_proportion=0.2, min_probability=0.7, min_len=0):
        self.lt = LibreTranslateAPI(get_translator_instance())
        try:
            self.lt.languages()
        except Exception:
            self.lt = None
        self.detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
        self.nb_langs = nb_langs
        self.min_proportion = min_proportion
        self.min_probability = min_probability
        self.min_len = min_len

    def detect_gcld3(self, content):
        languages = []
        content = _clean_text_to_translate(content, html=True)
        if self.min_len > 0:
            if len(content) < self.min_len:
                return languages
        for lang in self.detector.FindTopNMostFreqLangs(content, num_langs=self.nb_langs):
            if lang.proportion >= self.min_proportion and lang.probability >= self.min_probability and lang.is_reliable:
                languages.append(lang.language)
        return languages

    def detect_libretranslate(self, content):
        languages = []
        try:
            # [{"confidence": 0.6, "language": "en"}]
            resp = self.lt.detect(content)
        except Exception as e:  # TODO ERROR MESSAGE
            raise Exception(f'libretranslate error: {e}')
            # resp = []
        if resp:
            if isinstance(resp, dict):
                raise Exception(f'libretranslate error {resp}')
            for language in resp:
                if language.confidence >= self.min_probability:
                    languages.append(language)
        return languages

    def detect(self, content, force_gcld3=False):
        # gcld3
        if len(content) >= 200 or not self.lt or force_gcld3:
            language = self.detect_gcld3(content)
        # libretranslate
        else:
            language = self.detect_libretranslate(content)
        return language

class LanguageTranslator:

    def __init__(self):
        self.lt = LibreTranslateAPI(get_translator_instance())

    def languages(self):
        languages = []
        try:
            for dict_lang in self.lt.languages():
                languages.append({'iso': dict_lang['code'], 'language': dict_lang['name']})
        except Exception as e:
            print(e)
        return languages

    def detect_gcld3(self, content):
        content = _clean_text_to_translate(content, html=True)
        detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
        lang = detector.FindLanguage(content)
        # print(lang.language)
        # print(lang.is_reliable)
        # print(lang.proportion)
        # print(lang.probability)
        return lang.language

    def detect_libretranslate(self, content):
        try:
            language = self.lt.detect(content)
        except:  # TODO ERROR MESSAGE
            language = None
        if language:
            return language[0].get('language')

    def detect(self, content):
        # gcld3
        if len(content) >= 200:
            language = self.detect_gcld3(content)
        # libretranslate
        else:
            language = self.detect_libretranslate(content)
        return language

    def translate(self, content, source=None, target="en"):  # TODO source target
        if target not in get_translation_languages():
            return None
        translation = None
        if content:
            if not source:
                source = self.detect(content)
            # print(source, content)
            if source:
                if source != target:
                    try:
                        # print(content, source, target)
                        translation = self.lt.translate(content, source, target)
                    except:
                        translation = None
                        # TODO LOG and display error
                    if translation == content:
                        print('EQUAL')
                        translation = None
        return translation


LIST_LANGUAGES = {}
def get_translation_languages():
    global LIST_LANGUAGES
    if not LIST_LANGUAGES:
        try:
            LIST_LANGUAGES = {}
            for lang in LanguageTranslator().languages():
                LIST_LANGUAGES[lang['iso']] = lang['language']
        except Exception as e:
            print(e)
            LIST_LANGUAGES = {}
    return LIST_LANGUAGES


if __name__ == '__main__':
    # t_content = ''
    langg = LanguageTranslator()
    # langg = LanguagesDetector()
    # lang.translate(t_content, source='ru')
    langg.languages()

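Before language detection, `_clean_text_to_translate` above strips URLs so domain names don't skew the classifier; it removes the longest matches first so that a shorter match is never left as a fragment of a longer one. The URL-removal step can be exercised on its own (regex copied verbatim from the hunk; the `strip_urls` wrapper is illustrative):

```python
import re

# URL regex as used in _clean_text_to_translate() in the hunk above
URL_REGEX = re.compile(r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b')

def strip_urls(content: str) -> str:
    # Longest matches first, so substrings of longer URLs are not left behind
    urls = sorted(URL_REGEX.findall(content), key=len, reverse=True)
    for url in urls:
        content = content.replace(url, '')
    return content

cleaned = strip_urls("see https://example.com/page for details")
assert 'example.com' not in cleaned
print(cleaned)
```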
216	bin/lib/Tag.py

@@ -64,7 +64,7 @@ unsafe_tags = build_unsafe_tags()
 # get set_keys: intersection
 def get_obj_keys_by_tags(tags, obj_type, subtype='', date=None):
     l_set_keys = []
-    if obj_type == 'item':
+    if obj_type == 'item' or obj_type == 'message':
         for tag in tags:
             l_set_keys.append(f'{obj_type}:{subtype}:{tag}:{date}')
     else:

@@ -96,8 +96,6 @@ def get_taxonomies():
 def get_active_taxonomies():
     return r_tags.smembers('taxonomies:enabled')

-'active_taxonomies'
-
 def is_taxonomy_enabled(taxonomy):
     # enabled = r_tags.sismember('taxonomies:enabled', taxonomy)
     try:

@@ -340,7 +338,7 @@ def get_galaxy_meta(galaxy_name, nb_active_tags=False):
     else:
         meta['icon'] = f'fas fa-{icon}'
     if nb_active_tags:
-        meta['nb_active_tags'] = get_galaxy_nb_tags_enabled(galaxy)
+        meta['nb_active_tags'] = get_galaxy_nb_tags_enabled(galaxy.type)
         meta['nb_tags'] = len(get_galaxy_tags(galaxy.type))
     return meta

@@ -389,8 +387,12 @@ def get_cluster_tags(cluster_type, enabled=False):
         meta_tag = {'tag': tag, 'description': cluster_val.description}
         if enabled:
             meta_tag['enabled'] = is_galaxy_tag_enabled(cluster_type, tag)
-        synonyms = cluster_val.meta.synonyms
-        if not synonyms:
+        cluster_val_meta = cluster_val.meta
+        if cluster_val_meta:
+            synonyms = cluster_val_meta.synonyms
+            if not synonyms:
+                synonyms = []
+        else:
             synonyms = []
         meta_tag['synonyms'] = synonyms
         tags.append(meta_tag)

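The guard added above avoids dereferencing `.synonyms` when a galaxy cluster value has no `meta` attribute set. The same pattern in isolation, with `SimpleNamespace` stubs standing in for the cluster values (the `get_synonyms` helper is illustrative, not the framework API):

```python
from types import SimpleNamespace

def get_synonyms(cluster_val):
    # Mirrors the fix above: only read .synonyms when .meta is present
    cluster_val_meta = cluster_val.meta
    if cluster_val_meta:
        synonyms = cluster_val_meta.synonyms
        if not synonyms:
            synonyms = []
    else:
        synonyms = []
    return synonyms

with_meta = SimpleNamespace(meta=SimpleNamespace(synonyms=['APT 28']))
no_meta = SimpleNamespace(meta=None)

assert get_synonyms(with_meta) == ['APT 28']
assert get_synonyms(no_meta) == []   # previously raised AttributeError
```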
@@ -633,7 +635,7 @@ def update_tag_metadata(tag, date, delete=False):  # # TODO: delete Tags
 # r_tags.smembers(f'{tag}:{date}')
 # r_tags.smembers(f'{obj_type}:{tag}')
 def get_tag_objects(tag, obj_type, subtype='', date=''):
-    if obj_type == 'item':
+    if obj_type == 'item' or obj_type == 'message':
         return r_tags.smembers(f'{obj_type}:{subtype}:{tag}:{date}')
     else:
         return r_tags.smembers(f'{obj_type}:{subtype}:{tag}')

@@ -641,23 +643,32 @@ def get_tag_objects(tag, obj_type, subtype='', date=''):
 def get_object_tags(obj_type, obj_id, subtype=''):
     return r_tags.smembers(f'tag:{obj_type}:{subtype}:{obj_id}')

-def add_object_tag(tag, obj_type, id, subtype=''):
-    if r_tags.sadd(f'tag:{obj_type}:{subtype}:{id}', tag) == 1:
+def add_object_tag(tag, obj_type, obj_id, subtype=''):
+    if r_tags.sadd(f'tag:{obj_type}:{subtype}:{obj_id}', tag) == 1:
         r_tags.sadd('list_tags', tag)
         r_tags.sadd(f'list_tags:{obj_type}', tag)
         r_tags.sadd(f'list_tags:{obj_type}:{subtype}', tag)
         if obj_type == 'item':
-            date = item_basic.get_item_date(id)
-            r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', id)
+            date = item_basic.get_item_date(obj_id)
+            r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', obj_id)

             # add domain tag
-            if item_basic.is_crawled(id) and tag != 'infoleak:submission="crawler"' and tag != 'infoleak:submission="manual"':
-                domain = item_basic.get_item_domain(id)
+            if item_basic.is_crawled(obj_id) and tag != 'infoleak:submission="crawler"' and tag != 'infoleak:submission="manual"':
+                domain = item_basic.get_item_domain(obj_id)
                 add_object_tag(tag, "domain", domain)

             update_tag_metadata(tag, date)
+        # MESSAGE
+        elif obj_type == 'message':
+            timestamp = obj_id.split('/')[1]
+            date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
+            r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', obj_id)
+
+            # TODO ADD CHAT TAGS ????
+
+            update_tag_metadata(tag, date)
         else:
-            r_tags.sadd(f'{obj_type}:{subtype}:{tag}', id)
+            r_tags.sadd(f'{obj_type}:{subtype}:{tag}', obj_id)

         r_tags.hincrby(f'daily_tags:{datetime.date.today().strftime("%Y%m%d")}', tag, 1)

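For the new `message` branch above, the tag date is not looked up from item storage; it is derived from the message id itself, whose second `/`-separated segment is a Unix timestamp. A standalone sketch of that derivation (the id layout `<chat>/<unix_timestamp>/<id>` is assumed from the `obj_id.split('/')[1]` access in the hunk):

```python
import datetime

def message_tag_date(obj_id: str) -> str:
    # Assumed message id layout: '<chat>/<unix_timestamp>/<message_id>',
    # matching the obj_id.split('/')[1] access in the hunk above
    timestamp = obj_id.split('/')[1]
    return datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')

date = message_tag_date('chat-id/1700000000/abcd')
assert len(date) == 8 and date.isdigit()   # YYYYMMDD key for the daily tag set
print(date)
```

Note that `fromtimestamp` uses the local timezone, so the same message can land on adjacent daily keys on hosts in different timezones.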
@@ -673,8 +684,8 @@ def confirm_tag(tag, obj):
 # TODO REVIEW ME
 def update_tag_global_by_obj_type(tag, obj_type, subtype=''):
     tag_deleted = False
-    if obj_type == 'item':
-        if not r_tags.exists(f'tag_metadata:{tag}'):
+    if obj_type == 'item' or obj_type == 'message':
+        if not r_tags.exists(f'tag_metadata:{tag}'):  # TODO FIXME #################################################################
             tag_deleted = True
     else:
         if not r_tags.exists(f'{obj_type}:{subtype}:{tag}'):

@@ -705,6 +716,12 @@ def delete_object_tag(tag, obj_type, id, subtype=''):
         date = item_basic.get_item_date(id)
         r_tags.srem(f'{obj_type}:{subtype}:{tag}:{date}', id)

         update_tag_metadata(tag, date, delete=True)
+    elif obj_type == 'message':
+        timestamp = id.split('/')[1]
+        date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
+        r_tags.srem(f'{obj_type}:{subtype}:{tag}:{date}', id)
+
+        update_tag_metadata(tag, date, delete=True)
     else:
         r_tags.srem(f'{obj_type}:{subtype}:{tag}', id)

@@ -727,7 +744,7 @@ def delete_object_tags(obj_type, subtype, obj_id):
 def get_obj_by_tags(obj_type, l_tags, date_from=None, date_to=None, nb_obj=50, page=1):
     # with daterange
     l_tagged_obj = []
-    if obj_type=='item':
+    if obj_type=='item' or obj_type=='message':
         #sanityze date
         date_range = sanitise_tags_date_range(l_tags, date_from=date_from, date_to=date_to)
         l_dates = Date.substract_date(date_range['date_from'], date_range['date_to'])

@@ -1183,12 +1200,17 @@ def get_enabled_tags_with_synonyms_ui():

 # TYPE -> taxonomy/galaxy/custom

+# TODO GET OBJ Types
 class Tag:

     def __int__(self, name: str, local=False):  # TODO Get first seen by object, obj='item
         self.name = name
         self.local = local

+    # TODO
+    def exists(self):
+        pass
+
     def is_local(self):
         return self.local

@@ -1199,7 +1221,11 @@ class Tag:
         else:
             return 'taxonomy'

+    def is_taxonomy(self):
+        return not self.local and self.is_galaxy()
+
     def is_galaxy(self):
         return not self.local and self.name.startswith('misp-galaxy:')

     def get_first_seen(self, r_int=False):
         first_seen = r_tags.hget(f'meta:tag:{self.name}', 'first_seen')

@@ -1210,6 +1236,9 @@ class Tag:
                 first_seen = 99999999
         return first_seen

+    def set_first_seen(self, first_seen):
+        return r_tags.hget(f'meta:tag:{self.name}', 'first_seen', int(first_seen))
+
     def get_last_seen(self, r_int=False):
         last_seen = r_tags.hget(f'meta:tag:{self.name}', 'last_seen')  # 'last_seen:object' -> only if date or daterange
         if r_int:

@@ -1219,6 +1248,9 @@ class Tag:
                 last_seen = 0
         return last_seen

+    def set_last_seen(self, last_seen):
+        return r_tags.hset(f'meta:tag:{self.name}', 'last_seen', int(last_seen))
+
     def get_color(self):
         color = r_tags.hget(f'meta:tag:{self.name}', 'color')
         if not color:

@@ -1241,6 +1273,131 @@ class Tag:
                'local': self.is_local()}
        return meta

    def update_obj_type_first_seen(self, obj_type, first_seen, last_seen):  # TODO SUBTYPE ##################################
        if int(first_seen) > int(last_seen):
            raise Exception(f'INVALID first_seen/last_seen, {first_seen}/{last_seen}')

        for date in Date.get_daterange(first_seen, last_seen):
            date = int(date)
            if date == last_seen:
                if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') > 0:
                    r_tags.hset(f'tag_metadata:{self.name}', 'first_seen', first_seen)
                else:
                    r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen')  # TODO SUBTYPE
                    r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen')  # TODO SUBTYPE
                    r_tags.srem(f'list_tags:{obj_type}', self.name)  # TODO SUBTYPE

            elif r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') > 0:
                r_tags.hset(f'tag_metadata:{self.name}', 'first_seen', first_seen)  # TODO METADATA OBJECT NAME

    def update_obj_type_last_seen(self, obj_type, first_seen, last_seen):  # TODO SUBTYPE ##################################
        if int(first_seen) > int(last_seen):
            raise Exception(f'INVALID first_seen/last_seen, {first_seen}/{last_seen}')

        for date in Date.get_daterange(first_seen, last_seen).reverse():
            date = int(date)
            if date == last_seen:
                if r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') > 0:
                    r_tags.hset(f'tag_metadata:{self.name}', 'last_seen', last_seen)
                else:
                    r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen')  # TODO SUBTYPE
                    r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen')  # TODO SUBTYPE
                    r_tags.srem(f'list_tags:{obj_type}', self.name)  # TODO SUBTYPE

            elif r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') > 0:
                r_tags.hset(f'tag_metadata:{self.name}', 'last_seen', last_seen)  # TODO METADATA OBJECT NAME

    # TODO
    # TODO Update First seen and last seen
    # TODO SUBTYPE CHATS ??????????????
    def update_obj_type_date(self, obj_type, date, op='add', first_seen=None, last_seen=None):
        date = int(date)
        if not first_seen:
            first_seen = self.get_first_seen(r_int=True)
        if not last_seen:
            last_seen = self.get_last_seen(r_int=True)

        # Add tag
        if op == 'add':
            if date < first_seen:
                self.set_first_seen(date)
            if date > last_seen:
                self.set_last_seen(date)

        # Delete tag
        else:
            if date == first_seen and date == last_seen:

                # TODO OBJECTS ##############################################################################################
                if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') < 1:  ####################### TODO OBJ SUBTYPE ???????????????????
                    r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen')
                    r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen')
                    # TODO CHECK IF DELETE FULL TAG LIST ############################

            elif date == first_seen:
                if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') < 1:
                    if int(last_seen) >= int(first_seen):
                        self.update_obj_type_first_seen(obj_type, first_seen, last_seen)  # TODO OBJ_TYPE

            elif date == last_seen:
                if r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') < 1:
                    if int(last_seen) >= int(first_seen):
                        self.update_obj_type_last_seen(obj_type, first_seen, last_seen)  # TODO OBJ_TYPE

        # STATS
        nb = r_tags.hincrby(f'daily_tags:{date}', self.name, -1)
        if nb < 1:
            r_tags.hdel(f'daily_tags:{date}', self.name)

    # TODO -> CHECK IF TAG EXISTS + UPDATE FIRST SEEN/LAST SEEN
    def update(self, date=None):
        pass

    # TODO CHANGE ME TO SUB FUNCTION ##### add_object_tag(tag, obj_type, obj_id, subtype='')
    def add(self, obj_type, subtype, obj_id):
        if subtype is None:
            subtype = ''

        if r_tags.sadd(f'tag:{obj_type}:{subtype}:{obj_id}', self.name) == 1:
            r_tags.sadd('list_tags', self.name)
            r_tags.sadd(f'list_tags:{obj_type}', self.name)
            if subtype:
                r_tags.sadd(f'list_tags:{obj_type}:{subtype}', self.name)

            if obj_type == 'item':
                date = item_basic.get_item_date(obj_id)

                # add domain tag
                if item_basic.is_crawled(obj_id) and self.name != 'infoleak:submission="crawler"' and self.name != 'infoleak:submission="manual"':
                    domain = item_basic.get_item_domain(obj_id)
                    self.add('domain', '', domain)
            elif obj_type == 'message':
                timestamp = obj_id.split('/')[1]
                date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
            else:
                date = None

            if date:
                r_tags.sadd(f'{obj_type}:{subtype}:{self.name}:{date}', obj_id)
                update_tag_metadata(self.name, date)
            else:
                r_tags.sadd(f'{obj_type}:{subtype}:{self.name}', obj_id)

            # TODO REPLACE ME BY DATE TAGS ????
            # STATS BY TYPE ???
            # DAILY STATS
            r_tags.hincrby(f'daily_tags:{datetime.date.today().strftime("%Y%m%d")}', self.name, 1)

    # TODO CREATE FUNCTION GET OBJECT DATE
    def remove(self, obj_type, subtype, obj_id):
        # TODO CHECK IN ALL OBJECT TO DELETE
        pass

    def delete(self):
        pass


#### TAG AUTO PUSH ####

@@ -1381,7 +1538,7 @@ def api_add_obj_tags(tags=[], galaxy_tags=[], object_id=None, object_type="item"
        # r_serv_metadata.srem('tag:{}'.format(object_id), tag)
        # r_tags.srem('{}:{}'.format(object_type, tag), object_id)

-def delete_tag(object_type, tag, object_id, obj_date=None):  ################################ # TODO:
+def delete_tag(object_type, tag, object_id, obj_date=None):  ################################ # TODO: REMOVE ME
    # tag exist
    if is_obj_tagged(object_id, tag):
        if not obj_date:

@@ -1447,6 +1604,29 @@ def get_list_of_solo_tags_to_export_by_type(export_type):  # by type
        return None
    #r_serv_db.smembers('whitelist_hive')

+def _fix_tag_obj_id(date_from):
+    date_to = datetime.date.today().strftime("%Y%m%d")
+    for obj_type in ail_core.get_all_objects():
+        print(obj_type)
+        for tag in get_all_obj_tags(obj_type):
+            if ';' in tag:
+                print(tag)
+                new_tag = tag.split(';')[0]
+                print(new_tag)
+                r_tags.hdel(f'tag_metadata:{tag}', 'first_seen')
+                r_tags.hdel(f'tag_metadata:{tag}', 'last_seen')
+                r_tags.srem(f'list_tags:{obj_type}', tag)
+                r_tags.srem(f'list_tags:{obj_type}:', tag)
+                r_tags.srem(f'list_tags', tag)
+                raw = get_obj_by_tags(obj_type, [tag], nb_obj=500000, date_from=date_from, date_to=date_to)
+                if raw.get('tagged_obj', []):
+                    for obj_id in raw['tagged_obj']:
+                        # print(obj_id)
+                        delete_object_tag(tag, obj_type, obj_id)
+                        add_object_tag(new_tag, obj_type, obj_id)
+                else:
+                    update_tag_global_by_obj_type(tag, obj_type)

# if __name__ == '__main__':
#     taxo = 'accessnow'
#     # taxo = TAXONOMIES.get(taxo)

@@ -2,6 +2,8 @@
 # -*-coding:UTF-8 -*
+import json
 import os
 import logging
 import logging.config
+import re
 import sys
 import time

@@ -14,7 +16,7 @@ from ail_typo_squatting import runAll
 import math

 from collections import defaultdict
-from flask import escape
+from markupsafe import escape
 from textblob import TextBlob
 from nltk.tokenize import RegexpTokenizer

@@ -24,11 +26,16 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from packages import Date
 from lib.ail_core import get_objects_tracked, get_object_all_subtypes, get_objects_retro_hunted
+from lib import ail_logger
 from lib import ConfigLoader
 from lib import item_basic
 from lib import Tag
 from lib.Users import User

+# LOGS
+logging.config.dictConfig(ail_logger.get_config(name='modules'))
+logger = logging.getLogger()
+
 config_loader = ConfigLoader.ConfigLoader()
 r_cache = config_loader.get_redis_conn("Redis_Cache")

@@ -207,6 +214,13 @@ class Tracker:
         if filters:
             self._set_field('filters', json.dumps(filters))

+    def del_filters(self, tracker_type, to_track):
+        filters = self.get_filters()
+        for obj_type in filters:
+            r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', to_track)
+            r_tracker.srem(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}')
+        r_tracker.hdel(f'tracker:{self.uuid}', 'filters')
+
     def get_tracked(self):
         return self._get_field('tracked')

@@ -241,7 +255,8 @@ class Tracker:
         return self._get_field('user_id')

     def webhook_export(self):
-        return r_tracker.hexists(f'tracker:mail:{self.uuid}', 'webhook')
+        webhook = self.get_webhook()
+        return webhook is not None and webhook

     def get_webhook(self):
         return r_tracker.hget(f'tracker:{self.uuid}', 'webhook')

@@ -513,6 +528,7 @@ class Tracker:
             self._set_mails(mails)

         # Filters
+        self.del_filters(old_type, old_to_track)
         if not filters:
             filters = {}
             for obj_type in get_objects_tracked():

@@ -522,9 +538,6 @@ class Tracker:
         for obj_type in filters:
             r_tracker.sadd(f'trackers:objs:{tracker_type}:{obj_type}', to_track)
             r_tracker.sadd(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}')
-            if tracker_type != old_type:
-                r_tracker.srem(f'trackers:objs:{old_type}:{obj_type}', old_to_track)
-                r_tracker.srem(f'trackers:uuid:{old_type}:{old_to_track}', f'{self.uuid}:{obj_type}')

         # Refresh Trackers
         trigger_trackers_refresh(tracker_type)

@@ -555,17 +568,27 @@ class Tracker:
            os.remove(filepath)

        # Filters
-        filters = self.get_filters()
-        if not filters:
-            filters = get_objects_tracked()
+        filters = get_objects_tracked()
        for obj_type in filters:
            r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', tracked)
            r_tracker.srem(f'trackers:uuid:{tracker_type}:{tracked}', f'{self.uuid}:{obj_type}')

        self._del_mails()
        self._del_tags()

        level = self.get_level()

        if level == 0:  # user only
            user = self.get_user()
            r_tracker.srem(f'user:tracker:{user}', self.uuid)
            r_tracker.srem(f'user:tracker:{user}:{tracker_type}', self.uuid)
        elif level == 1:  # global
            r_tracker.srem('global:tracker', self.uuid)
            r_tracker.srem(f'global:tracker:{tracker_type}', self.uuid)

        # meta
        r_tracker.delete(f'tracker:{self.uuid}')
        trigger_trackers_refresh(tracker_type)


def create_tracker(tracker_type, to_track, user_id, level, description=None, filters={}, tags=[], mails=[], webhook=None, tracker_uuid=None):

@@ -638,14 +661,14 @@ def get_user_trackers_meta(user_id, tracker_type=None):
     metas = []
     for tracker_uuid in get_user_trackers(user_id, tracker_type=tracker_type):
         tracker = Tracker(tracker_uuid)
-        metas.append(tracker.get_meta(options={'mails', 'sparkline', 'tags'}))
+        metas.append(tracker.get_meta(options={'description', 'mails', 'sparkline', 'tags'}))
     return metas

 def get_global_trackers_meta(tracker_type=None):
     metas = []
     for tracker_uuid in get_global_trackers(tracker_type=tracker_type):
         tracker = Tracker(tracker_uuid)
-        metas.append(tracker.get_meta(options={'mails', 'sparkline', 'tags'}))
+        metas.append(tracker.get_meta(options={'description', 'mails', 'sparkline', 'tags'}))
     return metas

 def get_users_trackers_meta():

@@ -906,7 +929,7 @@ def api_add_tracker(dict_input, user_id):
     # Filters # TODO MOVE ME
     filters = dict_input.get('filters', {})
     if filters:
-        if filters.keys() == {'decoded', 'item', 'pgp'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
+        if filters.keys() == {'decoded', 'item', 'pgp', 'title'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
             filters = {}
     for obj_type in filters:
         if obj_type not in get_objects_tracked():

@ -981,7 +1004,7 @@ def api_edit_tracker(dict_input, user_id):
|
|||
# Filters # TODO MOVE ME
|
||||
filters = dict_input.get('filters', {})
|
||||
if filters:
|
||||
if filters.keys() == {'decoded', 'item', 'pgp'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
|
||||
if filters.keys() == {'decoded', 'item', 'pgp', 'title'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
|
||||
if not filters['decoded'] and not filters['item']:
|
||||
filters = {}
|
||||
for obj_type in filters:
|
||||
|
@@ -1134,7 +1157,11 @@ def get_tracked_yara_rules():
    for obj_type in get_objects_tracked():
        rules = {}
        for tracked in _get_tracked_by_obj_type('yara', obj_type):
-            rules[tracked] = os.path.join(get_yara_rules_dir(), tracked)
+            rule = os.path.join(get_yara_rules_dir(), tracked)
+            if not os.path.exists(rule):
+                logger.critical(f"Yara rule don't exists {tracked} : {obj_type}")
+            else:
+                rules[tracked] = rule
        to_track[obj_type] = yara.compile(filepaths=rules)
    print(to_track)
    return to_track

@@ -81,7 +81,7 @@ def get_user_passwd_hash(user_id):
    return r_serv_db.hget('ail:users:all', user_id)

def get_user_token(user_id):
-    return r_serv_db.hget(f'ail:users:metadata:{user_id}', 'token')
+    return r_serv_db.hget(f'ail:user:metadata:{user_id}', 'token')

def get_token_user(token):
    return r_serv_db.hget('ail:users:tokens', token)

@@ -156,7 +156,8 @@ def delete_user(user_id):
    for role_id in get_all_roles():
        r_serv_db.srem(f'ail:users:role:{role_id}', user_id)
    user_token = get_user_token(user_id)
-    r_serv_db.hdel('ail:users:tokens', user_token)
+    if user_token:
+        r_serv_db.hdel('ail:users:tokens', user_token)
    r_serv_db.delete(f'ail:user:metadata:{user_id}')
    r_serv_db.hdel('ail:users:all', user_id)
@@ -246,7 +247,10 @@ class User(UserMixin):
            self.id = "__anonymous__"

    def exists(self):
-        return self.id != "__anonymous__"
+        if self.id == "__anonymous__":
+            return False
+        else:
+            return r_serv_db.exists(f'ail:user:metadata:{self.id}')

    # return True or False
    # def is_authenticated():

@@ -286,3 +290,6 @@ class User(UserMixin):
            return True
        else:
            return False
+
+    def get_role(self):
+        return r_serv_db.hget(f'ail:user:metadata:{self.id}', 'role')

@@ -13,9 +13,12 @@ from lib.ConfigLoader import ConfigLoader

config_loader = ConfigLoader()
r_serv_db = config_loader.get_db_conn("Kvrocks_DB")
+r_object = config_loader.get_db_conn("Kvrocks_Objects")
config_loader = None

-AIL_OBJECTS = sorted({'cve', 'cryptocurrency', 'decoded', 'domain', 'item', 'pgp', 'screenshot', 'username'})
+AIL_OBJECTS = sorted({'chat', 'chat-subchannel', 'chat-thread', 'cookie-name', 'cve', 'cryptocurrency', 'decoded',
+                      'domain', 'etag', 'favicon', 'file-name', 'hhhash',
+                      'item', 'image', 'message', 'pgp', 'screenshot', 'title', 'user-account', 'username'})

def get_ail_uuid():
    ail_uuid = r_serv_db.get('ail:uuid')

@@ -37,19 +40,28 @@ def get_all_objects():
    return AIL_OBJECTS

def get_objects_with_subtypes():
-    return ['cryptocurrency', 'pgp', 'username']
+    return ['chat', 'cryptocurrency', 'pgp', 'username', 'user-account']

-def get_object_all_subtypes(obj_type):
+def get_object_all_subtypes(obj_type):  # TODO Dynamic subtype
+    if obj_type == 'chat':
+        return r_object.smembers(f'all_chat:subtypes')
+    if obj_type == 'chat-subchannel':
+        return r_object.smembers(f'all_chat-subchannel:subtypes')
    if obj_type == 'cryptocurrency':
        return ['bitcoin', 'bitcoin-cash', 'dash', 'ethereum', 'litecoin', 'monero', 'zcash']
    if obj_type == 'pgp':
        return ['key', 'mail', 'name']
    if obj_type == 'username':
        return ['telegram', 'twitter', 'jabber']
+    if obj_type == 'user-account':
+        return r_object.smembers(f'all_chat:subtypes')
+    return []
+
+def get_obj_queued():
+    return ['item', 'image']

def get_objects_tracked():
-    return ['decoded', 'item', 'pgp']
+    return ['decoded', 'item', 'pgp', 'title']

def get_objects_retro_hunted():
    return ['decoded', 'item']
@@ -65,6 +77,32 @@ def get_all_objects_with_subtypes_tuple():
            str_objs.append((obj_type, ''))
    return str_objs

def unpack_obj_global_id(global_id, r_type='tuple'):
    if r_type == 'dict':
        obj = global_id.split(':', 2)
        return {'type': obj[0], 'subtype': obj[1], 'id': obj[2]}
    else:  # tuple(type, subtype, id)
        return global_id.split(':', 2)

def unpack_objs_global_id(objs_global_id, r_type='tuple'):
    objs = []
    for global_id in objs_global_id:
        objs.append(unpack_obj_global_id(global_id, r_type=r_type))
    return objs

def unpack_correl_obj__id(obj_type, global_id, r_type='tuple'):
    obj = global_id.split(':', 1)
    if r_type == 'dict':
        return {'type': obj_type, 'subtype': obj[0], 'id': obj[1]}
    else:  # tuple(type, subtype, id)
        return obj_type, obj[0], obj[1]

def unpack_correl_objs_id(obj_type, correl_objs_id, r_type='tuple'):
    objs = []
    for correl_obj_id in correl_objs_id:
        objs.append(unpack_correl_obj__id(obj_type, correl_obj_id, r_type=r_type))
    return objs
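These helpers split global ids on `:` with a bounded maxsplit, so an object id that itself contains `:` characters survives the round trip. A standalone sketch of the same split (the sample id is illustrative):

```python
# Standalone copy of the unpack logic above; the sample id is illustrative.
def unpack_obj_global_id(global_id, r_type='tuple'):
    if r_type == 'dict':
        obj = global_id.split(':', 2)  # maxsplit=2 keeps any ':' inside the id intact
        return {'type': obj[0], 'subtype': obj[1], 'id': obj[2]}
    else:  # tuple(type, subtype, id)
        return global_id.split(':', 2)

obj = unpack_obj_global_id('item::submitted/2023/10/11/test.gz', r_type='dict')
print(obj)  # {'type': 'item', 'subtype': '', 'id': 'submitted/2023/10/11/test.gz'}
```

Note that an empty subtype (`item::...`) simply yields an empty middle field.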

##-- AIL OBJECTS --##

#### Redis ####

@@ -82,6 +120,10 @@ def zscan_iter(r_redis, name): # count ???

## -- Redis -- ##

+def rreplace(s, old, new, occurrence):
+    li = s.rsplit(old, occurrence)
+    return new.join(li)
+
def paginate_iterator(iter_elems, nb_obj=50, page=1):
    dict_page = {'nb_all_elem': len(iter_elems)}
    nb_pages = dict_page['nb_all_elem'] / nb_obj
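The `paginate_iterator` hunk is truncated at the page-count division. A minimal sketch of how such a paginator typically completes; only the `nb_all_elem / nb_obj` division comes from the hunk, the ceiling and slice details are assumptions:

```python
import math

# Minimal pagination sketch; the ceil/slice details are assumptions,
# only the nb_all_elem / nb_obj division comes from the hunk above.
def paginate_iterator(iter_elems, nb_obj=50, page=1):
    dict_page = {'nb_all_elem': len(iter_elems)}
    dict_page['nb_pages'] = math.ceil(dict_page['nb_all_elem'] / nb_obj)
    start = (page - 1) * nb_obj
    dict_page['page'] = page
    dict_page['list_elem'] = list(iter_elems)[start:start + nb_obj]
    return dict_page

res = paginate_iterator(list(range(120)), nb_obj=50, page=3)
print(res['nb_pages'], len(res['list_elem']))  # 3 20
```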


@@ -1,9 +1,7 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

-import base64
import datetime
-import gzip
import logging.config
import magic
import os

@@ -181,15 +179,3 @@ def create_item_id(feeder_name, path):
    item_id = os.path.join(feeder_name, date, basename)
    # TODO check if already exists
    return item_id
-
-def create_b64(b_content):
-    return base64.standard_b64encode(b_content).decode()
-
-def create_gzipped_b64(b_content):
-    try:
-        gzipencoded = gzip.compress(b_content)
-        gzip64encoded = create_b64(gzipencoded)
-        return gzip64encoded
-    except Exception as e:
-        logger.warning(e)
-        return ''

@@ -6,19 +6,29 @@ import sys
import datetime
import time

+import xxhash
+
sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
+from lib.exceptions import ModuleQueueError
from lib.ConfigLoader import ConfigLoader
+from lib import ail_core

config_loader = ConfigLoader()
r_queues = config_loader.get_redis_conn("Redis_Queues")
+r_obj_process = config_loader.get_redis_conn("Redis_Process")
+timeout_queue_obj = 172800
config_loader = None

MODULES_FILE = os.path.join(os.environ['AIL_HOME'], 'configs', 'modules.cfg')

# # # # # # # #
#             #
#  AIL QUEUE  #
#             #
# # # # # # # #

class AILQueue:
@@ -60,16 +70,38 @@ class AILQueue:
        # Update queues stats
        r_queues.hset('queues', self.name, self.get_nb_messages())
        r_queues.hset(f'modules', f'{self.pid}:{self.name}', int(time.time()))

        # Get Message
        message = r_queues.lpop(f'queue:{self.name}:in')
        if not message:
            return None
+        else:
+            # TODO SAVE CURRENT ITEMS (OLD Module information)
+            row_mess = message.split(';', 1)
+            if len(row_mess) != 2:
+                return None, None, message
+                # raise Exception(f'Error: queue {self.name}, no AIL object provided')
+            else:
+                obj_global_id, mess = row_mess
+                m_hash = xxhash.xxh3_64_hexdigest(message)
+                add_processed_obj(obj_global_id, m_hash, module=self.name)
+                return obj_global_id, m_hash, mess

-        return message
+    def rename_message_obj(self, new_id, old_id):
+        # restrict rename function
+        if self.name == 'Mixer' or self.name == 'Global':
+            rename_processed_obj(new_id, old_id)
+        else:
+            raise ModuleQueueError('This Module can\'t rename an object ID')

-    def send_message(self, message, queue_name=None):
-        # condition -> not in any queue
-        # TODO EDIT meta
+    def end_message(self, obj_global_id, m_hash):
+        end_processed_obj(obj_global_id, m_hash, module=self.name)

+    def send_message(self, obj_global_id, message='', queue_name=None):
        if not self.subscribers_modules:
            raise ModuleQueueError('This Module don\'t have any subscriber')
        if queue_name:

@@ -80,8 +112,17 @@ class AILQueue:
            raise ModuleQueueError('Queue name required. This module push to multiple queues')
        queue_name = list(self.subscribers_modules)[0]

+        message = f'{obj_global_id};{message}'
+        if obj_global_id != '::':
+            m_hash = xxhash.xxh3_64_hexdigest(message)
+        else:
+            m_hash = None
+
        # Add message to all modules
        for module_name in self.subscribers_modules[queue_name]:
+            if m_hash:
+                add_processed_obj(obj_global_id, m_hash, queue=module_name)
+
            r_queues.rpush(f'queue:{module_name}:in', message)
            # stats
            nb_mess = r_queues.llen(f'queue:{module_name}:in')
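The hunks above frame every queued message as `obj_global_id;payload` and derive a hash of the full frame so each (object, message) pair can be tracked per module. A standalone sketch of that framing round trip, with `hashlib.sha1` standing in for the external `xxhash.xxh3_64_hexdigest` dependency:

```python
import hashlib

# Sketch of the "obj_global_id;payload" framing used by send_message/get_message.
# hashlib.sha1 stands in for xxhash.xxh3_64_hexdigest (an external dependency).
def frame(obj_global_id, payload=''):
    message = f'{obj_global_id};{payload}'
    m_hash = hashlib.sha1(message.encode()).hexdigest() if obj_global_id != '::' else None
    return message, m_hash

def unframe(message):
    row = message.split(';', 1)
    if len(row) != 2:  # no AIL object attached
        return None, None, message
    obj_global_id, payload = row
    return obj_global_id, hashlib.sha1(message.encode()).hexdigest(), payload

msg, h = frame('item::submitted/2023/test.gz', 'tag')
obj, h2, payload = unframe(msg)
```

Because the hash is computed over the identical frame on both sides, the sender's and receiver's hashes agree, which is what lets `add_processed_obj`/`end_processed_obj` pair up.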

@@ -98,6 +139,7 @@ class AILQueue:
    def error(self):
        r_queues.hdel(f'modules', f'{self.pid}:{self.name}')


def get_queues_modules():
    return r_queues.hkeys('queues')

@@ -132,6 +174,132 @@ def get_modules_queues_stats():
def clear_modules_queues_stats():
    r_queues.delete('modules')

# # # # # # # # #
#               #
#  OBJ QUEUES   #  # PROCESS ??
#               #
# # # # # # # # #

def get_processed_objs():
    return r_obj_process.smembers(f'objs:process')

def get_processed_end_objs():
    return r_obj_process.smembers(f'objs:processed')

def get_processed_end_obj():
    return r_obj_process.spop(f'objs:processed')

def get_processed_objs_by_type(obj_type):
    return r_obj_process.zrange(f'objs:process:{obj_type}', 0, -1)

def is_processed_obj_queued(obj_global_id):
    return r_obj_process.exists(f'obj:queues:{obj_global_id}')

def is_processed_obj_moduled(obj_global_id):
    return r_obj_process.exists(f'obj:modules:{obj_global_id}')

def is_processed_obj(obj_global_id):
    return is_processed_obj_queued(obj_global_id) or is_processed_obj_moduled(obj_global_id)

def get_processed_obj_modules(obj_global_id):
    return r_obj_process.zrange(f'obj:modules:{obj_global_id}', 0, -1)

def get_processed_obj_queues(obj_global_id):
    return r_obj_process.zrange(f'obj:queues:{obj_global_id}', 0, -1)

def get_processed_obj(obj_global_id):
    return {'modules': get_processed_obj_modules(obj_global_id), 'queues': get_processed_obj_queues(obj_global_id)}

def add_processed_obj(obj_global_id, m_hash, module=None, queue=None):
    obj_type = obj_global_id.split(':', 1)[0]
    new_obj = r_obj_process.sadd(f'objs:process', obj_global_id)
    # first process:
    if new_obj:
        r_obj_process.zadd(f'objs:process:{obj_type}', {obj_global_id: int(time.time())})
    if queue:
        r_obj_process.zadd(f'obj:queues:{obj_global_id}', {f'{queue}:{m_hash}': int(time.time())})
    if module:
        r_obj_process.zadd(f'obj:modules:{obj_global_id}', {f'{module}:{m_hash}': int(time.time())})
        r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{module}:{m_hash}')

def end_processed_obj(obj_global_id, m_hash, module=None, queue=None):
    if queue:
        r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{m_hash}')
    if module:
        r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{m_hash}')

    # TODO HANDLE QUEUE DELETE
    # process completed
    if not is_processed_obj(obj_global_id):
        obj_type = obj_global_id.split(':', 1)[0]
        r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
        r_obj_process.srem(f'objs:process', obj_global_id)

        r_obj_process.sadd(f'objs:processed', obj_global_id)  # TODO use list ??????
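The `add_processed_obj`/`end_processed_obj` pair implements a small state machine: an object's `queue:hash` entry moves to `module:hash` when a module picks it up, and the object leaves the in-process set once no queue or module entry remains. An in-memory sketch of that bookkeeping, with plain dicts and sets standing in for the Redis structures:

```python
# In-memory sketch of the processed-object bookkeeping above;
# dicts/sets stand in for the Redis sets and sorted sets.
process, queues, modules = set(), {}, {}

def add_processed(obj_id, m_hash, module=None, queue=None):
    process.add(obj_id)
    if queue:
        queues.setdefault(obj_id, set()).add(f'{queue}:{m_hash}')
    if module:
        modules.setdefault(obj_id, set()).add(f'{module}:{m_hash}')
        queues.get(obj_id, set()).discard(f'{module}:{m_hash}')  # queued -> in module

def end_processed(obj_id, m_hash, module=None):
    if module:
        modules.get(obj_id, set()).discard(f'{module}:{m_hash}')
    if not queues.get(obj_id) and not modules.get(obj_id):
        process.discard(obj_id)  # fully processed

add_processed('item::a', 'h1', queue='Tags')   # enqueued for the Tags module
add_processed('item::a', 'h1', module='Tags')  # Tags picked it up
end_processed('item::a', 'h1', module='Tags')  # done -> leaves the process set
```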

def rename_processed_obj(new_id, old_id):
    module = get_processed_obj_modules(old_id)
    # currently in a module
    if len(module) == 1:
        module, x_hash = module[0].split(':', 1)
        obj_type = old_id.split(':', 1)[0]
        r_obj_process.zrem(f'obj:modules:{old_id}', f'{module}:{x_hash}')
        r_obj_process.zrem(f'objs:process:{obj_type}', old_id)
        r_obj_process.srem(f'objs:process', old_id)
        add_processed_obj(new_id, x_hash, module=module)

def get_last_queue_timeout():
    epoch_update = r_obj_process.get('queue:obj:timeout:last')
    if not epoch_update:
        epoch_update = 0
    return float(epoch_update)

def timeout_process_obj(obj_global_id):
    for q in get_processed_obj_queues(obj_global_id):
        queue, x_hash = q.split(':', 1)
        r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{x_hash}')
    for m in get_processed_obj_modules(obj_global_id):
        module, x_hash = m.split(':', 1)
        r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{x_hash}')

    obj_type = obj_global_id.split(':', 1)[0]
    r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
    r_obj_process.srem(f'objs:process', obj_global_id)

    r_obj_process.sadd(f'objs:processed', obj_global_id)
    print(f'timeout: {obj_global_id}')


def timeout_processed_objs():
    curr_time = int(time.time())
    time_limit = curr_time - timeout_queue_obj
    for obj_type in ail_core.get_obj_queued():
        for obj_global_id in r_obj_process.zrangebyscore(f'objs:process:{obj_type}', 0, time_limit):
            timeout_process_obj(obj_global_id)
    r_obj_process.set('queue:obj:timeout:last', time.time())

def delete_processed_obj(obj_global_id):
    for q in get_processed_obj_queues(obj_global_id):
        queue, x_hash = q.split(':', 1)
        r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{x_hash}')
    for m in get_processed_obj_modules(obj_global_id):
        module, x_hash = m.split(':', 1)
        r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{x_hash}')
    obj_type = obj_global_id.split(':', 1)[0]
    r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
    r_obj_process.srem(f'objs:process', obj_global_id)

###################################################################################


# # # # # # # #
#             #
#    GRAPH    #
#             #
# # # # # # # #

def get_queue_digraph():
    queues_ail = {}
    modules = {}

@@ -223,64 +391,13 @@ def save_queue_digraph():
        sys.exit(1)


###########################################################################################
###########################################################################################
###########################################################################################
###########################################################################################
###########################################################################################

# def get_all_queues_name():
#     return r_queues.hkeys('queues')
#
# def get_all_queues_dict_with_nb_elem():
#     return r_queues.hgetall('queues')
#
# def get_all_queues_with_sorted_nb_elem():
#     res = r_queues.hgetall('queues')
#     res = sorted(res.items())
#     return res
#
# def get_module_pid_by_queue_name(queue_name):
#     return r_queues.smembers('MODULE_TYPE_{}'.format(queue_name))
#
# # # TODO: remove last msg part
# def get_module_last_process_start_time(queue_name, module_pid):
#     res = r_queues.get('MODULE_{}_{}'.format(queue_name, module_pid))
#     if res:
#         return res.split(',')[0]
#     return None
#
# def get_module_last_msg(queue_name, module_pid):
#     return r_queues.get('MODULE_{}_{}_PATH'.format(queue_name, module_pid))
#
# def get_all_modules_queues_stats():
#     all_modules_queues_stats = []
#     for queue_name, nb_elem_queue in get_all_queues_with_sorted_nb_elem():
#         l_module_pid = get_module_pid_by_queue_name(queue_name)
#         for module_pid in l_module_pid:
#             last_process_start_time = get_module_last_process_start_time(queue_name, module_pid)
#             if last_process_start_time:
#                 last_process_start_time = datetime.datetime.fromtimestamp(int(last_process_start_time))
#                 seconds = int((datetime.datetime.now() - last_process_start_time).total_seconds())
#             else:
#                 seconds = 0
#             all_modules_queues_stats.append((queue_name, nb_elem_queue, seconds, module_pid))
#     return all_modules_queues_stats
#
#
# def _get_all_messages_from_queue(queue_name):
#     #self.r_temp.hset('queues', self.subscriber_name, int(self.r_temp.scard(in_set)))
#     return r_queues.smembers(f'queue:{queue_name}:in')
#
# # def is_message_in queue(queue_name):
# #     pass
#
# def remove_message_from_queue(queue_name, message):
#     queue_key = f'queue:{queue_name}:in'
#     r_queues.srem(queue_key, message)
#     r_queues.hset('queues', queue_name, int(r_queues.scard(queue_key)))


if __name__ == '__main__':
    # clear_modules_queues_stats()
-    save_queue_digraph()
+    # save_queue_digraph()
+    oobj_global_id = 'item::submitted/2023/10/11/submitted_b5440009-05d5-4494-a807-a6d8e4a900cf.gz'
+    # print(get_processed_obj(oobj_global_id))
+    # delete_processed_obj(oobj_global_id)
+    # while True:
+    #     print(get_processed_obj(oobj_global_id))
+    #     time.sleep(0.5)
+    print(get_processed_end_objs())


@@ -15,38 +15,15 @@ config_loader = ConfigLoader()
r_db = config_loader.get_db_conn("Kvrocks_DB")
config_loader = None

-BACKGROUND_UPDATES = {
-    'v1.5': {
-        'nb_updates': 5,
-        'message': 'Tags and Screenshots'
-    },
-    'v2.4': {
-        'nb_updates': 1,
-        'message': ' Domains Tags and Correlations'
-    },
-    'v2.6': {
-        'nb_updates': 1,
-        'message': 'Domains Tags and Correlations'
-    },
-    'v2.7': {
-        'nb_updates': 1,
-        'message': 'Domains Tags'
-    },
-    'v3.4': {
-        'nb_updates': 1,
-        'message': 'Domains Languages'
-    },
-    'v3.7': {
-        'nb_updates': 1,
-        'message': 'Trackers first_seen/last_seen'
-    }
-}

# # # # # # # #
#             #
#   UPDATE    #
#             #
# # # # # # # #

def get_ail_version():
    return r_db.get('ail:version')


def get_ail_float_version():
    version = get_ail_version()
    if version:
@@ -55,6 +32,179 @@ def get_ail_float_version():
        version = 0
    return version

# # # - - # # #

# # # # # # # # # # # #
#                     #
#  UPDATE BACKGROUND  #
#                     #
# # # # # # # # # # # #


BACKGROUND_UPDATES = {
    'v5.2': {
        'message': 'Compress HAR',
        'scripts': ['compress_har.py']
    },
}

class AILBackgroundUpdate:
    """
    AIL Background Update.
    """

    def __init__(self, version):
        self.version = version

    def _get_field(self, field):
        return r_db.hget('ail:update:background', field)

    def _set_field(self, field, value):
        r_db.hset('ail:update:background', field, value)

    def get_version(self):
        return self.version

    def get_message(self):
        return BACKGROUND_UPDATES.get(self.version, {}).get('message', '')

    def get_error(self):
        return self._get_field('error')

    def set_error(self, error):  # TODO ADD LOGS
        self._set_field('error', error)

    def get_nb_scripts(self):
        return int(len(BACKGROUND_UPDATES.get(self.version, {}).get('scripts', [''])))

    def get_scripts(self):
        return BACKGROUND_UPDATES.get(self.version, {}).get('scripts', [])

    def get_nb_scripts_done(self):
        done = self._get_field('done')
        try:
            done = int(done)
        except (TypeError, ValueError):
            done = 0
        return done

    def inc_nb_scripts_done(self):
        self._set_field('done', self.get_nb_scripts_done() + 1)

    def get_script(self):
        return self._get_field('script')

    def get_script_path(self):
        path = os.path.basename(self.get_script())
        if path:
            return os.path.join(os.environ['AIL_HOME'], 'update', self.version, path)

    def get_nb_to_update(self):  # TODO use cache ?????
        nb_to_update = self._get_field('nb_to_update')
        if not nb_to_update:
            nb_to_update = 1
        return int(nb_to_update)

    def set_nb_to_update(self, nb):
        self._set_field('nb_to_update', int(nb))

    def get_nb_updated(self):  # TODO use cache ?????
        nb_updated = self._get_field('nb_updated')
        if not nb_updated:
            nb_updated = 0
        return int(nb_updated)

    def inc_nb_updated(self):  # TODO use cache ?????
        r_db.hincrby('ail:update:background', 'nb_updated', 1)

    def get_progress(self):  # TODO use cache ?????
        return self._get_field('progress')

    def set_progress(self, progress):
        self._set_field('progress', progress)

    def update_progress(self):
        nb_updated = self.get_nb_updated()
        nb_to_update = self.get_nb_to_update()
        if nb_updated == nb_to_update:
            progress = 100
        elif nb_updated > nb_to_update:
            progress = 99
        else:
            progress = int((nb_updated * 100) / nb_to_update)
        self.set_progress(progress)
        print(f'{nb_updated}/{nb_to_update} updated {progress}%')
        return progress
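The progress math above clamps an over-count (more objects updated than expected) to 99% rather than reporting more than 100%, and only reports 100% on an exact match. Reproduced standalone:

```python
# Progress math from update_progress above, reproduced standalone.
def progress(nb_updated, nb_to_update):
    if nb_updated == nb_to_update:
        return 100
    if nb_updated > nb_to_update:  # over-count is clamped just below done
        return 99
    return int((nb_updated * 100) / nb_to_update)

print(progress(50, 200))  # 25
```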

    def is_running(self):
        return r_db.hget('ail:update:background', 'version') == self.version

    def get_meta(self, options=set()):
        meta = {'version': self.get_version(),
                'error': self.get_error(),
                'script': self.get_script(),
                'script_progress': self.get_progress(),
                'nb_update': self.get_nb_scripts(),
                'nb_completed': self.get_nb_scripts_done()}
        meta['progress'] = int(meta['nb_completed'] * 100 / meta['nb_update'])
        if 'message' in options:
            meta['message'] = self.get_message()
        return meta

    def start(self):
        self._set_field('version', self.version)
        r_db.hdel('ail:update:background', 'error')

    def start_script(self, script):
        self.clear()
        self._set_field('script', script)
        self.set_progress(0)

    def end_script(self):
        self.set_progress(100)
        self.inc_nb_scripts_done()

    def clear(self):
        r_db.hdel('ail:update:background', 'error')
        r_db.hdel('ail:update:background', 'progress')
        r_db.hdel('ail:update:background', 'nb_updated')
        r_db.hdel('ail:update:background', 'nb_to_update')

    def end(self):
        r_db.delete('ail:update:background')
        r_db.srem('ail:updates:background', self.version)


# To Add in update script
def add_background_update(version):
    r_db.sadd('ail:updates:background', version)

def is_update_background_running():
    return r_db.exists('ail:update:background')

def get_update_background_version():
    return r_db.hget('ail:update:background', 'version')

def get_update_background_meta(options=set()):
    version = get_update_background_version()
    if version:
        return AILBackgroundUpdate(version).get_meta(options=options)
    else:
        return {}

def get_update_background_to_launch():
    to_launch = []
    updates = r_db.smembers('ail:updates:background')
    for version in BACKGROUND_UPDATES:
        if version in updates:
            to_launch.append(version)
    return to_launch

# # # - - # # #

##########################################################################################
##########################################################################################
##########################################################################################

def get_ail_all_updates(date_separator='-'):
    dict_update = r_db.hgetall('ail:update_date')
@@ -87,111 +237,6 @@ def check_version(version):
    return True


-#### UPDATE BACKGROUND ####
-
-def exits_background_update_to_launch():
-    return r_db.scard('ail:update:to_update') != 0
-
-def is_version_in_background_update(version):
-    return r_db.sismember('ail:update:to_update', version)
-
-def get_all_background_updates_to_launch():
-    return r_db.smembers('ail:update:to_update')
-
-def get_current_background_update():
-    return r_db.get('ail:update:update_in_progress')
-
-def get_current_background_update_script():
-    return r_db.get('ail:update:current_background_script')
-
-def get_current_background_update_script_path(version, script_name):
-    return os.path.join(os.environ['AIL_HOME'], 'update', version, script_name)
-
-def get_current_background_nb_update_completed():
-    return r_db.scard('ail:update:update_in_progress:completed')
-
-def get_current_background_update_progress():
-    progress = r_db.get('ail:update:current_background_script_stat')
-    if not progress:
-        progress = 0
-    return int(progress)
-
-def get_background_update_error():
-    return r_db.get('ail:update:error')
-
-def add_background_updates_to_launch(version):
-    return r_db.sadd('ail:update:to_update', version)
-
-def start_background_update(version):
-    r_db.delete('ail:update:error')
-    r_db.set('ail:update:update_in_progress', version)
-
-def set_current_background_update_script(script_name):
-    r_db.set('ail:update:current_background_script', script_name)
-    r_db.set('ail:update:current_background_script_stat', 0)
-
-def set_current_background_update_progress(progress):
-    r_db.set('ail:update:current_background_script_stat', progress)
-
-def set_background_update_error(error):
-    r_db.set('ail:update:error', error)
-
-def end_background_update_script():
-    r_db.sadd('ail:update:update_in_progress:completed')
-
-def end_background_update(version):
-    r_db.delete('ail:update:update_in_progress')
-    r_db.delete('ail:update:current_background_script')
-    r_db.delete('ail:update:current_background_script_stat')
-    r_db.delete('ail:update:update_in_progress:completed')
-    r_db.srem('ail:update:to_update', version)
-
-def clear_background_update():
-    r_db.delete('ail:update:error')
-    r_db.delete('ail:update:update_in_progress')
-    r_db.delete('ail:update:current_background_script')
-    r_db.delete('ail:update:current_background_script_stat')
-    r_db.delete('ail:update:update_in_progress:completed')
-
-def get_update_background_message(version):
-    return BACKGROUND_UPDATES[version]['message']
-
-# TODO: Detect error in subprocess
-def get_update_background_metadata():
-    dict_update = {}
-    version = get_current_background_update()
-    if version:
-        dict_update['version'] = version
-        dict_update['script'] = get_current_background_update_script()
-        dict_update['script_progress'] = get_current_background_update_progress()
-        dict_update['nb_update'] = BACKGROUND_UPDATES[dict_update['version']]['nb_updates']
-        dict_update['nb_completed'] = get_current_background_nb_update_completed()
-        dict_update['progress'] = int(dict_update['nb_completed'] * 100 / dict_update['nb_update'])
-        dict_update['error'] = get_background_update_error()
-    return dict_update
-
-##-- UPDATE BACKGROUND --##


if __name__ == '__main__':
    res = check_version('v3.1..1')
    print(res)


@@ -1,6 +1,8 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

+import json
+import logging
import os
import sys
import requests

@@ -8,6 +10,8 @@ import requests
sys.path.append(os.environ['AIL_BIN'])
from lib.objects.CryptoCurrencies import CryptoCurrency

+logger = logging.getLogger()
+
blockchain_all = 'https://blockchain.info/rawaddr'

# pre-alpha script

@@ -18,23 +22,26 @@ def get_bitcoin_info(bitcoin_address, nb_transaction=50):
    set_btc_in = set()
    set_btc_out = set()
    try:
-        req = requests.get('{}/{}?limit={}'.format(blockchain_all, bitcoin_address, nb_transaction))
+        req = requests.get(f'{blockchain_all}/{bitcoin_address}?limit={nb_transaction}')
        jreq = req.json()
    except Exception as e:
-        print(e)
+        logger.warning(e)
        return dict_btc

+    if not jreq.get('n_tx'):
+        logger.critical(json.dumps(jreq))
+        return dict_btc
+
    # print(json.dumps(jreq))
    dict_btc['n_tx'] = jreq['n_tx']
    dict_btc['total_received'] = float(jreq['total_received'] / 100000000)
    dict_btc['total_sent'] = float(jreq['total_sent'] / 100000000)
    dict_btc['final_balance'] = float(jreq['final_balance'] / 100000000)

    for transaction in jreq['txs']:
-        for input in transaction['inputs']:
-            if 'addr' in input['prev_out']:
-                if input['prev_out']['addr'] != bitcoin_address:
-                    set_btc_in.add(input['prev_out']['addr'])
+        for t_input in transaction['inputs']:
+            if 'addr' in t_input['prev_out']:
+                if t_input['prev_out']['addr'] != bitcoin_address:
+                    set_btc_in.add(t_input['prev_out']['addr'])
        for output in transaction['out']:
            if 'addr' in output:
                if output['addr'] != bitcoin_address:

423 bin/lib/chats_viewer.py (Executable file)

@@ -0,0 +1,423 @@
#!/usr/bin/python3

"""
Chats Viewer
===================

"""
import os
import sys
import time
import uuid


sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects import Chats
from lib.objects import ChatSubChannels
from lib.objects import ChatThreads
from lib.objects import Messages
from lib.objects import UsersAccount
from lib.objects import Usernames
from lib import Language

config_loader = ConfigLoader()
r_db = config_loader.get_db_conn("Kvrocks_DB")
r_crawler = config_loader.get_db_conn("Kvrocks_Crawler")
r_cache = config_loader.get_redis_conn("Redis_Cache")

r_obj = config_loader.get_db_conn("Kvrocks_DB")  # TEMP new DB ????

# # # # # # # #
#             #
#   COMMON    #
#             #
# # # # # # # #

# TODO ChatDefaultPlatform

# CHAT(type=chat, subtype=platform, id= chat_id)

# Channel(type=channel, subtype=platform, id=channel_id)

# Thread(type=thread, subtype=platform, id=thread_id)

# Message(type=message, subtype=platform, id=message_id)


# Protocol/Platform

# class ChatProtocols:  # TODO Remove Me
#
#     def __init__(self):  # name ???? subtype, id ????
#         # discord, mattermost, ...
#         pass
#
#     def get_chat_protocols(self):
#         pass
#
#     def get_chat_protocol(self, protocol):
#         pass
#
#     ################################################################
#
#     def get_instances(self):
#         pass
#
#     def get_chats(self):
#         pass
#
#     def get_chats_by_instance(self, instance):
#         pass
#
#
# class ChatNetwork:  # uuid or protocol
#     def __init__(self, network='default'):
#         self.id = network
#
#     def get_addresses(self):
#         pass
#
#
# class ChatServerAddress:  # uuid or protocol + network
#     def __init__(self, address='default'):
#         self.id = address

# map uuid -> type + field

# TODO option last protocol/ imported messages/chat -> unread mode ????

# # # # # # # # #
#               #
#   PROTOCOLS   #  IRC, discord, mattermost, ...
#               #
# # # # # # # # #  TODO icon => UI explorer by protocol + network + instance

def get_chat_protocols():
    return r_obj.smembers(f'chat:protocols')

def get_chat_protocols_meta():
    metas = []
    for protocol_id in get_chat_protocols():
        protocol = ChatProtocol(protocol_id)
        metas.append(protocol.get_meta(options={'icon'}))
    return metas

class ChatProtocol:  # TODO first seen last seen ???? + nb by day ????
    def __init__(self, protocol):
        self.id = protocol

    def exists(self):
        return r_db.exists(f'chat:protocol:{self.id}')

    def get_networks(self):
        return r_db.smembers(f'chat:protocol:{self.id}')

    def get_nb_networks(self):
        return r_db.scard(f'chat:protocol:{self.id}')

    def get_icon(self):
        if self.id == 'discord':
            icon = {'style': 'fab', 'icon': 'fa-discord'}
        elif self.id == 'telegram':
            icon = {'style': 'fab', 'icon': 'fa-telegram'}
        else:
            icon = {}
        return icon

    def get_meta(self, options=set()):
        meta = {'id': self.id}
        if 'icon' in options:
            meta['icon'] = self.get_icon()
        return meta

    # def get_addresses(self):
    #     pass
    #
    # def get_instances_uuids(self):
    #     pass


# # # # # # # # # # # # # #
#                         #
#   ChatServiceInstance   #
#                         #
# # # # # # # # # # # # # #

# uuid -> protocol + network + server
class ChatServiceInstance:
    def __init__(self, instance_uuid):
        self.uuid = instance_uuid

    def exists(self):
        return r_obj.exists(f'chatSerIns:{self.uuid}')

    def get_protocol(self):  # return objects ????
        return r_obj.hget(f'chatSerIns:{self.uuid}', 'protocol')

    def get_network(self):  # return objects ????
        network = r_obj.hget(f'chatSerIns:{self.uuid}', 'network')
        if network:
            return network

    def get_address(self):  # return objects ????
        address = r_obj.hget(f'chatSerIns:{self.uuid}', 'address')
        if address:
            return address

    def get_meta(self, options=set()):
        meta = {'uuid': self.uuid,
                'protocol': self.get_protocol(),
                'network': self.get_network(),
                'address': self.get_address()}
        if 'chats' in options:
            meta['chats'] = []
            for chat_id in self.get_chats():
                meta['chats'].append(Chats.Chat(chat_id, self.uuid).get_meta({'created_at', 'icon', 'nb_subchannels', 'nb_messages'}))
        return meta

    def get_nb_chats(self):
        return Chats.Chats().get_nb_ids_by_subtype(self.uuid)

    def get_chats(self):
        return Chats.Chats().get_ids_by_subtype(self.uuid)

def get_chat_service_instances():
    return r_obj.smembers(f'chatSerIns:all')

def get_chat_service_instances_by_protocol(protocol):
    instance_uuids = {}
    for network in r_obj.smembers(f'chat:protocol:networks:{protocol}'):
        inst_uuids = r_obj.hvals(f'map:chatSerIns:{protocol}:{network}')
        if not network:
            network = 'default'
        instance_uuids[network] = inst_uuids
    return instance_uuids

def get_chat_service_instance_uuid(protocol, network, address):
    if not network:
        network = ''
    if not address:
        address = ''
    return r_obj.hget(f'map:chatSerIns:{protocol}:{network}', address)

def get_chat_service_instance_uuid_meta_from_network_dict(instance_uuids):
    for network in instance_uuids:
        metas = []
        for instance_uuid in instance_uuids[network]:
            metas.append(ChatServiceInstance(instance_uuid).get_meta())
        instance_uuids[network] = metas
    return instance_uuids

def get_chat_service_instance(protocol, network, address):
    instance_uuid = get_chat_service_instance_uuid(protocol, network, address)
    if instance_uuid:
        return ChatServiceInstance(instance_uuid)

def create_chat_service_instance(protocol, network=None, address=None):
    instance_uuid = get_chat_service_instance_uuid(protocol, network, address)
    if instance_uuid:
        return instance_uuid
    else:
        if not network:
            network = ''
        if not address:
            address = ''
        instance_uuid = str(uuid.uuid5(uuid.NAMESPACE_URL, f'{protocol}|{network}|{address}'))
        r_obj.sadd(f'chatSerIns:all', instance_uuid)

        # map instance - uuid
        r_obj.hset(f'map:chatSerIns:{protocol}:{network}', address, instance_uuid)

        r_obj.hset(f'chatSerIns:{instance_uuid}', 'protocol', protocol)
        if network:
            r_obj.hset(f'chatSerIns:{instance_uuid}', 'network', network)
        if address:
            r_obj.hset(f'chatSerIns:{instance_uuid}', 'address', address)

        # protocols
        r_obj.sadd(f'chat:protocols', protocol)  # TODO first seen / last seen

        # protocol -> network
        r_obj.sadd(f'chat:protocol:networks:{protocol}', network)

        return instance_uuid


# INSTANCE ===> CHAT IDS


# protocol -> instance_uuids => for protocol->networks -> protocol+network => HGETALL
# protocol+network -> instance_uuids => HGETALL

# protocol -> networks ???default??? or ''

# --------------------------------------------------------
# protocol+network -> addresses => HKEYS
# protocol+network+addresse => HGET

# Chat -> subtype=uuid, id = chat id

# instance_uuid -> chat id

# protocol - uniq ID
# protocol + network -> uuid ????
# protocol + network + address -> uuid

#######################################################################################

def get_obj_chat(chat_type, chat_subtype, chat_id):
    if chat_type == 'chat':
        return Chats.Chat(chat_id, chat_subtype)
    elif chat_type == 'chat-subchannel':
        return ChatSubChannels.ChatSubChannel(chat_id, chat_subtype)
    elif chat_type == 'chat-thread':
        return ChatThreads.ChatThread(chat_id, chat_subtype)

def get_obj_chat_meta(obj_chat, new_options=set()):
    options = {}
    if obj_chat.type == 'chat':
        options = {'created_at', 'icon', 'info', 'subchannels', 'threads', 'username'}
    elif obj_chat.type == 'chat-subchannel':
        options = {'chat', 'created_at', 'icon', 'nb_messages', 'threads'}
    elif obj_chat.type == 'chat-thread':
        options = {'chat', 'nb_messages'}
    for option in new_options:
        options.add(option)
    return obj_chat.get_meta(options=options)

def get_subchannels_meta_from_global_id(subchannels, translation_target=None):
    meta = []
    for sub in subchannels:
        _, instance_uuid, sub_id = sub.split(':', 2)
        subchannel = ChatSubChannels.ChatSubChannel(sub_id, instance_uuid)
        meta.append(subchannel.get_meta({'nb_messages', 'created_at', 'icon', 'translation'}, translation_target=translation_target))
    return meta

def get_chat_meta_from_global_id(chat_global_id):
    _, instance_uuid, chat_id = chat_global_id.split(':', 2)
    chat = Chats.Chat(chat_id, instance_uuid)
    return chat.get_meta()

def get_threads_metas(threads):
    metas = []
    for thread in threads:
        metas.append(ChatThreads.ChatThread(thread['id'], thread['subtype']).get_meta(options={'name', 'nb_messages'}))
    return metas

def get_username_meta_from_global_id(username_global_id):
    _, instance_uuid, username_id = username_global_id.split(':', 2)
    username = Usernames.Username(username_id, instance_uuid)
    return username.get_meta()

#### API ####

def api_get_chat_service_instance(chat_instance_uuid):
    chat_instance = ChatServiceInstance(chat_instance_uuid)
    if not chat_instance.exists():
        return {"status": "error", "reason": "Unknown uuid"}, 404
    return chat_instance.get_meta({'chats'}), 200

def api_get_chat(chat_id, chat_instance_uuid, translation_target=None, nb=-1, page=-1):
    chat = Chats.Chat(chat_id, chat_instance_uuid)
    if not chat.exists():
        return {"status": "error", "reason": "Unknown chat"}, 404
    meta = chat.get_meta({'created_at', 'icon', 'info', 'nb_participants', 'subchannels', 'threads', 'translation', 'username'}, translation_target=translation_target)
    if meta['username']:
        meta['username'] = get_username_meta_from_global_id(meta['username'])
    if meta['subchannels']:
        meta['subchannels'] = get_subchannels_meta_from_global_id(meta['subchannels'], translation_target=translation_target)
    else:
        if translation_target not in Language.get_translation_languages():
            translation_target = None
        meta['messages'], meta['pagination'], meta['tags_messages'] = chat.get_messages(translation_target=translation_target, nb=nb, page=page)
    return meta, 200

def api_get_nb_message_by_week(chat_id, chat_instance_uuid):
    chat = Chats.Chat(chat_id, chat_instance_uuid)
    if not chat.exists():
        return {"status": "error", "reason": "Unknown chat"}, 404
    week = chat.get_nb_message_this_week()
    # week = chat.get_nb_message_by_week('20231109')
    return week, 200

def api_get_chat_participants(chat_type, chat_subtype, chat_id):
    if chat_type not in ['chat', 'chat-subchannel', 'chat-thread']:
        return {"status": "error", "reason": "Unknown chat type"}, 400
    chat_obj = get_obj_chat(chat_type, chat_subtype, chat_id)
    if not chat_obj.exists():
        return {"status": "error", "reason": "Unknown chat"}, 404
    else:
        meta = get_obj_chat_meta(chat_obj, new_options={'participants'})
        chat_participants = []
        for participant in meta['participants']:
            user_account = UsersAccount.UserAccount(participant['id'], participant['subtype'])
            chat_participants.append(user_account.get_meta({'icon', 'info', 'username'}))
        meta['participants'] = chat_participants
        return meta, 200

def api_get_subchannel(chat_id, chat_instance_uuid, translation_target=None, nb=-1, page=-1):
    subchannel = ChatSubChannels.ChatSubChannel(chat_id, chat_instance_uuid)
    if not subchannel.exists():
        return {"status": "error", "reason": "Unknown subchannel"}, 404
    meta = subchannel.get_meta({'chat', 'created_at', 'icon', 'nb_messages', 'nb_participants', 'threads', 'translation'}, translation_target=translation_target)
    if meta['chat']:
        meta['chat'] = get_chat_meta_from_global_id(meta['chat'])
    if meta.get('threads'):
        meta['threads'] = get_threads_metas(meta['threads'])
    if meta.get('username'):
        meta['username'] = get_username_meta_from_global_id(meta['username'])
    meta['messages'], meta['pagination'], meta['tags_messages'] = subchannel.get_messages(translation_target=translation_target, nb=nb, page=page)
    return meta, 200

def api_get_thread(thread_id, thread_instance_uuid, translation_target=None, nb=-1, page=-1):
    thread = ChatThreads.ChatThread(thread_id, thread_instance_uuid)
    if not thread.exists():
        return {"status": "error", "reason": "Unknown thread"}, 404
    meta = thread.get_meta({'chat', 'nb_messages', 'nb_participants'})
    # if meta['chat']:
    #     meta['chat'] = get_chat_meta_from_global_id(meta['chat'])
    meta['messages'], meta['pagination'], meta['tags_messages'] = thread.get_messages(translation_target=translation_target, nb=nb, page=page)
    return meta, 200

def api_get_message(message_id, translation_target=None):
    message = Messages.Message(message_id)
    if not message.exists():
        return {"status": "error", "reason": "Unknown uuid"}, 404
    meta = message.get_meta({'chat', 'content', 'files-names', 'icon', 'images', 'link', 'parent', 'parent_meta', 'reactions', 'thread', 'translation', 'user-account'}, translation_target=translation_target)
    return meta, 200

def api_get_user_account(user_id, instance_uuid, translation_target=None):
    user_account = UsersAccount.UserAccount(user_id, instance_uuid)
    if not user_account.exists():
        return {"status": "error", "reason": "Unknown user-account"}, 404
    meta = user_account.get_meta({'chats', 'icon', 'info', 'subchannels', 'threads', 'translation', 'username', 'username_meta'}, translation_target=translation_target)
    return meta, 200

# # # # # # # # # #  LATER
#                 #
#  ChatCategory   #
#                 #
# # # # # # # # # #


if __name__ == '__main__':
    r = get_chat_service_instances()
    print(r)
    r = ChatServiceInstance(r.pop())
    print(r.get_meta({'chats'}))
    # r = get_chat_protocols()
    # print(r)
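`create_chat_service_instance()` derives a deterministic UUIDv5 from `protocol|network|address`, so re-importing the same chat service always maps back to the same instance uuid. A self-contained sketch of just that derivation (the helper name is illustrative):

```python
import uuid

def instance_uuid(protocol, network=None, address=None):
    # Same derivation as create_chat_service_instance(): a stable UUIDv5
    # over 'protocol|network|address', with empty strings for missing parts.
    network = network or ''
    address = address or ''
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f'{protocol}|{network}|{address}'))

print(instance_uuid('telegram'))
```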
@@ -41,14 +41,26 @@ config_loader = None
 ##################################

 CORRELATION_TYPES_BY_OBJ = {
-    "cryptocurrency": ["domain", "item"],
-    "cve": ["domain", "item"],
-    "decoded": ["domain", "item"],
-    "domain": ["cve", "cryptocurrency", "decoded", "item", "pgp", "username", "screenshot"],
-    "item": ["cve", "cryptocurrency", "decoded", "domain", "pgp", "username", "screenshot"],
-    "pgp": ["domain", "item"],
-    "username": ["domain", "item"],
+    "chat": ["chat-subchannel", "chat-thread", "image", "user-account"],  # message or direct correlation like cve, bitcoin, ... ???
+    "chat-subchannel": ["chat", "chat-thread", "image", "message", "user-account"],
+    "chat-thread": ["chat", "chat-subchannel", "image", "message", "user-account"],  # TODO user account
+    "cookie-name": ["domain"],
+    "cryptocurrency": ["domain", "item", "message"],
+    "cve": ["domain", "item", "message"],
+    "decoded": ["domain", "item", "message"],
+    "domain": ["cve", "cookie-name", "cryptocurrency", "decoded", "etag", "favicon", "hhhash", "item", "pgp", "title", "screenshot", "username"],
+    "etag": ["domain"],
+    "favicon": ["domain", "item"],  # TODO Decoded
+    "file-name": ["chat", "message"],
+    "hhhash": ["domain"],
+    "image": ["chat", "message", "user-account"],
+    "item": ["cve", "cryptocurrency", "decoded", "domain", "favicon", "pgp", "screenshot", "title", "username"],  # chat ???
+    "message": ["chat", "chat-subchannel", "chat-thread", "cve", "cryptocurrency", "decoded", "file-name", "image", "pgp", "user-account"],  # chat ??
+    "pgp": ["domain", "item", "message"],
+    "screenshot": ["domain", "item"],
+    "title": ["domain", "item"],
+    "user-account": ["chat", "chat-subchannel", "chat-thread", "image", "message", "username"],
+    "username": ["domain", "item", "message", "user-account"],
 }

 def get_obj_correl_types(obj_type):
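The mapping above drives the correlation-type sanitization: requested types are intersected with the types valid for the object, falling back to all valid types when the request is empty or entirely invalid. A standalone sketch of that filtering, using a trimmed, illustrative copy of the mapping:

```python
# trimmed copy of CORRELATION_TYPES_BY_OBJ for illustration
CORRELATION_TYPES_BY_OBJ = {
    'etag': ['domain'],
    'username': ['domain', 'item', 'message', 'user-account'],
}

def sanitize_correl_types(obj_type, correl_types):
    # Keep only requested types valid for this object; fall back to all
    # valid types when the request yields nothing.
    obj_correl_types = set(CORRELATION_TYPES_BY_OBJ.get(obj_type, []))
    requested = set(correl_types).intersection(obj_correl_types)
    return requested if requested else obj_correl_types

print(sanitize_correl_types('username', ['domain', 'cve']))
```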
@@ -60,6 +72,8 @@ def sanityze_obj_correl_types(obj_type, correl_types):
     correl_types = set(correl_types).intersection(obj_correl_types)
     if not correl_types:
         correl_types = obj_correl_types
+        if not correl_types:
+            return []
     return correl_types

 def get_nb_correlation_by_correl_type(obj_type, subtype, obj_id, correl_type):
@@ -109,6 +123,9 @@ def is_obj_correlated(obj_type, subtype, obj_id, obj2_type, subtype2, obj2_id):
     except:
         return False

+def get_obj_inter_correlation(obj_type1, subtype1, obj_id1, obj_type2, subtype2, obj_id2, correl_type):
+    return r_metadata.sinter(f'correlation:obj:{obj_type1}:{subtype1}:{correl_type}:{obj_id1}', f'correlation:obj:{obj_type2}:{subtype2}:{correl_type}:{obj_id2}')
+
 def add_obj_correlation(obj1_type, subtype1, obj1_id, obj2_type, subtype2, obj2_id):
     if subtype1 is None:
         subtype1 = ''
@@ -164,20 +181,22 @@ def delete_obj_correlations(obj_type, subtype, obj_id):
 def get_obj_str_id(obj_type, subtype, obj_id):
     if subtype is None:
         subtype = ''
-    return f'{obj_type};{subtype};{obj_id}'
+    return f'{obj_type}:{subtype}:{obj_id}'

-def get_correlations_graph_nodes_links(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1, flask_context=False):
+def get_correlations_graph_nodes_links(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1, objs_hidden=set(), flask_context=False):
     links = set()
     nodes = set()
+    meta = {'complete': True, 'objs': set()}

     obj_str_id = get_obj_str_id(obj_type, subtype, obj_id)

-    _get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level, max_nodes, filter_types=filter_types, previous_str_obj='')
-    return obj_str_id, nodes, links
+    _get_correlations_graph_node(links, nodes, meta, obj_type, subtype, obj_id, level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, previous_str_obj='')
+    return obj_str_id, nodes, links, meta


-def _get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level, max_nodes, filter_types=[], previous_str_obj=''):
+def _get_correlations_graph_node(links, nodes, meta, obj_type, subtype, obj_id, level, max_nodes, filter_types=[], objs_hidden=set(), previous_str_obj=''):
     obj_str_id = get_obj_str_id(obj_type, subtype, obj_id)
+    meta['objs'].add(obj_str_id)
     nodes.add(obj_str_id)

     obj_correlations = get_correlations(obj_type, subtype, obj_id, filter_types=filter_types)

@@ -186,15 +205,22 @@ def _get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level,
         for str_obj in obj_correlations[correl_type]:
             subtype2, obj2_id = str_obj.split(':', 1)
             obj2_str_id = get_obj_str_id(correl_type, subtype2, obj2_id)
+            # filter objects to hide
+            if obj2_str_id in objs_hidden:
+                continue
+
+            meta['objs'].add(obj2_str_id)

             if obj2_str_id == previous_str_obj:
                 continue

-            if len(nodes) > max_nodes:
+            if len(nodes) > max_nodes != 0:
+                meta['complete'] = False
                 break
             nodes.add(obj2_str_id)
             links.add((obj_str_id, obj2_str_id))

             if level > 0:
                 next_level = level - 1
-                _get_correlations_graph_node(links, nodes, correl_type, subtype2, obj2_id, next_level, max_nodes, filter_types=filter_types, previous_str_obj=obj_str_id)
+                _get_correlations_graph_node(links, nodes, meta, correl_type, subtype2, obj2_id, next_level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, previous_str_obj=obj_str_id)
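The graph builder keys every node by the string id from `get_obj_str_id()`, which this patch switches from `;` to `:` separators; consumers split it back with `split(':', 2)`. A small round-trip sketch (standalone copy for illustration):

```python
def get_obj_str_id(obj_type, subtype, obj_id):
    # ':'-separated node id used by the correlation graph after this patch
    if subtype is None:
        subtype = ''
    return f'{obj_type}:{subtype}:{obj_id}'

node = get_obj_str_id('chat', None, 'my_chat')
# maxsplit=2 keeps any ':' inside the object id intact
obj_type, subtype, obj_id = node.split(':', 2)
print(node)
```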
@@ -36,8 +36,10 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from packages import git_status
+from packages import Date
 from lib.ConfigLoader import ConfigLoader
 from lib.objects.Domains import Domain
+from lib.objects import HHHashs
 from lib.objects.Items import Item

 config_loader = ConfigLoader()
@@ -74,8 +76,8 @@ def get_current_date(separator=False):
 def get_date_crawled_items_source(date):
     return os.path.join('crawled', date)

-def get_date_har_dir(date):
-    return os.path.join(HAR_DIR, date)
+def get_har_dir():
+    return HAR_DIR

 def is_valid_onion_domain(domain):
     if not domain.endswith('.onion'):
@@ -133,7 +135,7 @@ def unpack_url(url):
 # # # # # # # # TODO CREATE NEW OBJECT

 def get_favicon_from_html(html, domain, url):
-    favicon_urls = extract_favicon_from_html(html, url)
+    favicon_urls, favicons = extract_favicon_from_html(html, url)
     # add root favicon
     if not favicon_urls:
         favicon_urls.add(f'{urlparse(url).scheme}://{domain}/favicon.ico')
@@ -141,9 +143,11 @@ def get_favicon_from_html(html, domain, url):
     return favicon_urls

 def extract_favicon_from_html(html, url):
-    favicon_urls = set()
+    favicons = set()
+    favicons_urls = set()

     soup = BeautifulSoup(html, 'html.parser')
-    set_icons = set()
+    all_icons = set()
     # If there are multiple <link rel="icon">s, the browser uses their media,
     # type, and sizes attributes to select the most appropriate icon.
     # If several icons are equally appropriate, the last one is used.
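The rewritten extraction resolves each icon `href` against the page URL with `urljoin` before adding it to the collected favicon URLs. A standalone sketch of that resolution step (the helper name and sample URLs are illustrative):

```python
from urllib.parse import urljoin

def resolve_icon_urls(page_url, hrefs):
    # Relative hrefs ('/favicon.ico', 'icons/a.png') are resolved against
    # the page URL; absolute hrefs pass through unchanged.
    return {urljoin(page_url, href) for href in hrefs}

urls = resolve_icon_urls('http://example.onion/blog/post.html',
                         ['/favicon.ico', 'icons/a.png',
                          'http://cdn.example.onion/b.ico'])
print(sorted(urls))
```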
@@ -159,30 +163,293 @@ def extract_favicon_from_html(html, url):
     # - <meta name="msapplication-TileColor" content="#aaaaaa"> <meta name="theme-color" content="#ffffff">
     # - <meta name="msapplication-config" content="/icons/browserconfig.xml">

     # desktop browser 'shortcut icon' (older browser), 'icon'
-    for favicon_tag in ['icon', 'shortcut icon']:
-        if soup.head:
-            for icon in soup.head.find_all('link', attrs={'rel': lambda x: x and x.lower() == favicon_tag, 'href': True}):
-                set_icons.add(icon)
+    # Root Favicon
+    f = get_faup()
+    f.decode(url)
+    url_decoded = f.get()
+    root_domain = f"{url_decoded['scheme']}://{url_decoded['domain']}"
+    default_icon = f'{root_domain}/favicon.ico'
+    favicons_urls.add(default_icon)
+    # print(default_icon)

-    # # TODO: handle base64 favicon
-    for tag in set_icons:
+    # shortcut
+    for shortcut in soup.find_all('link', rel='shortcut icon'):
+        all_icons.add(shortcut)
+    # icons
+    for icon in soup.find_all('link', rel='icon'):
+        all_icons.add(icon)
+
+    for mask_icon in soup.find_all('link', rel='mask-icon'):
+        all_icons.add(mask_icon)
+    for apple_touche_icon in soup.find_all('link', rel='apple-touch-icon'):
+        all_icons.add(apple_touche_icon)
+    for msapplication in soup.find_all('meta', attrs={'name': 'msapplication-TileImage'}):  # msapplication-TileColor
+        all_icons.add(msapplication)
+
+    # msapplication-TileImage
+
+    # print(all_icons)
+    for tag in all_icons:
         icon_url = tag.get('href')
         if icon_url:
-            if icon_url.startswith('//'):
-                icon_url = icon_url.replace('//', '/')
             if icon_url.startswith('data:'):
-                # # TODO: handle base64 favicon
-                pass
+                data = icon_url.split(',', 1)
+                if len(data) > 1:
+                    data = ''.join(data[1].split())
+                    favicon = base64.b64decode(data)
+                    if favicon:
+                        favicons.add(favicon)
             else:
-                icon_url = urljoin(url, icon_url)
-                icon_url = urlparse(icon_url, scheme=urlparse(url).scheme).geturl()
-                favicon_urls.add(icon_url)
-    return favicon_urls
+                favicon_url = urljoin(url, icon_url)
+                favicons_urls.add(favicon_url)
+        elif tag.get('name') == 'msapplication-TileImage':
+            icon_url = tag.get('content')
+            if icon_url:
+                if icon_url.startswith('data:'):
+                    data = icon_url.split(',', 1)
+                    if len(data) > 1:
+                        data = ''.join(data[1].split())
+                        favicon = base64.b64decode(data)
+                        if favicon:
+                            favicons.add(favicon)
+                else:
+                    favicon_url = urljoin(url, icon_url)
+                    favicons_urls.add(favicon_url)
+                    print(favicon_url)
+
+    # print(favicons_urls)
+    return favicons_urls, favicons
+
+# mmh3.hash(favicon)
+
+# # # - - # # #
+
+# # # # # # # #
+#             #
+#    TITLE    #
+#             #
+# # # # # # # #
+
+def extract_title_from_html(html):
+    soup = BeautifulSoup(html, 'html.parser')
+    title = soup.title
+    if title:
+        title = title.string
+        if title:
+            return str(title)
+    return ''
+
+def extract_description_from_html(html):
+    soup = BeautifulSoup(html, 'html.parser')
+    description = soup.find('meta', attrs={'name': 'description'})
+    if description:
+        return description['content']
+    return ''
+
+def extract_keywords_from_html(html):
+    soup = BeautifulSoup(html, 'html.parser')
+    keywords = soup.find('meta', attrs={'name': 'keywords'})
+    if keywords:
+        return keywords['content']
+    return ''
+
+def extract_author_from_html(html):
+    soup = BeautifulSoup(html, 'html.parser')
+    keywords = soup.find('meta', attrs={'name': 'author'})
+    if keywords:
+        return keywords['content']
+    return ''
+
+# # # - - # # #
+
+# # # # # # # #
+#             #
+#     HAR     #
+#             #
+# # # # # # # #
+
+def create_har_id(date, item_id):
+    item_id = item_id.split('/')[-1]
+    return os.path.join(date, f'{item_id}.json.gz')
+
+def save_har(har_id, har_content):
+    # create dir
+    har_dir = os.path.dirname(os.path.join(get_har_dir(), har_id))
+    if not os.path.exists(har_dir):
+        os.makedirs(har_dir)
+    # save HAR
+    filename = os.path.join(get_har_dir(), har_id)
+    with gzip.open(filename, 'wb') as f:
+        f.write(json.dumps(har_content).encode())
+
+def get_all_har_ids():
+    har_ids = []
+    today_root_dir = os.path.join(HAR_DIR, Date.get_today_date_str(separator=True))
+    dirs_year = set()
+    for ydir in next(os.walk(HAR_DIR))[1]:
+        if len(ydir) == 4:
+            try:
+                int(ydir)
+                dirs_year.add(ydir)
+            except (TypeError, ValueError):
+                pass
+
+    if os.path.exists(today_root_dir):
+        for file in [f for f in os.listdir(today_root_dir) if os.path.isfile(os.path.join(today_root_dir, f))]:
+            har_id = os.path.relpath(os.path.join(today_root_dir, file), HAR_DIR)
+            har_ids.append(har_id)
+
+    for ydir in sorted(dirs_year, reverse=False):
+        search_dear = os.path.join(HAR_DIR, ydir)
+        for root, dirs, files in os.walk(search_dear):
+            for file in files:
+                if root != today_root_dir:
+                    har_id = os.path.relpath(os.path.join(root, file), HAR_DIR)
+                    har_ids.append(har_id)
+    return har_ids
+
+def get_month_har_ids(year, month):
+    har_ids = []
+    month_path = os.path.join(HAR_DIR, year, month)
+    for root, dirs, files in os.walk(month_path):
+        for file in files:
+            har_id = os.path.relpath(os.path.join(root, file), HAR_DIR)
+            har_ids.append(har_id)
+    return har_ids
+
+
+def get_har_content(har_id):
+    har_path = os.path.join(HAR_DIR, har_id)
+    try:
+        with gzip.open(har_path) as f:
+            try:
+                return json.loads(f.read())
+            except json.decoder.JSONDecodeError:
+                return {}
+    except Exception as e:
+        print(e)  # TODO LOGS
+        return {}
+
+def extract_cookies_names_from_har(har):
+    cookies = set()
+    for entrie in har.get('log', {}).get('entries', []):
+        for cookie in entrie.get('request', {}).get('cookies', []):
+            name = cookie.get('name')
+            if name:
+                cookies.add(name)
+        for cookie in entrie.get('response', {}).get('cookies', []):
+            name = cookie.get('name')
+            if name:
+                cookies.add(name)
+    return cookies
+
+def _reprocess_all_hars_cookie_name():
+    from lib.objects import CookiesNames
+    for har_id in get_all_har_ids():
+        domain = har_id.split('/')[-1]
+        domain = domain[:-44]
+        date = har_id.split('/')
+        date = f'{date[-4]}{date[-3]}{date[-2]}'
+        for cookie_name in extract_cookies_names_from_har(get_har_content(har_id)):
+            print(domain, date, cookie_name)
+            cookie = CookiesNames.create(cookie_name)
+            cookie.add(date, Domain(domain))
+
+def extract_etag_from_har(har):  # TODO check response url
+    etags = set()
+    for entrie in har.get('log', {}).get('entries', []):
+        for header in entrie.get('response', {}).get('headers', []):
+            if header.get('name') == 'etag':
+                # print(header)
+                etag = header.get('value')
+                if etag:
+                    etags.add(etag)
+    return etags
+
+def _reprocess_all_hars_etag():
+    from lib.objects import Etags
+    for har_id in get_all_har_ids():
+        domain = har_id.split('/')[-1]
+        domain = domain[:-44]
+        date = har_id.split('/')
+        date = f'{date[-4]}{date[-3]}{date[-2]}'
+        for etag_content in extract_etag_from_har(get_har_content(har_id)):
+            print(domain, date, etag_content)
+            etag = Etags.create(etag_content)
+            etag.add(date, Domain(domain))
+
+def extract_hhhash_by_id(har_id, domain, date):
+    return extract_hhhash(get_har_content(har_id), domain, date)
+
+def extract_hhhash(har, domain, date):
+    hhhashs = set()
+    urls = set()
+    for entrie in har.get('log', {}).get('entries', []):
+        url = entrie.get('request').get('url')
+        if url not in urls:
+            # filter redirect
+            if entrie.get('response').get('status') == 200:  # != 301:
+                # print(url, entrie.get('response').get('status'))
+
+                f = get_faup()
+                f.decode(url)
+                domain_url = f.get().get('domain')
+                if domain_url == domain:
+
+                    headers = entrie.get('response').get('headers')
+
+                    hhhash_header = HHHashs.build_hhhash_headers(headers)
+                    hhhash = HHHashs.hhhash_headers(hhhash_header)
+
+                    if hhhash not in hhhashs:
+                        print('', url, hhhash)
+
+                        # -----
+                        obj = HHHashs.create(hhhash_header, hhhash)
+                        obj.add(date, Domain(domain))
+
+                        hhhashs.add(hhhash)
+            urls.add(url)
+    print()
+    print()
+    print('HHHASH:')
+    for hhhash in hhhashs:
+        print(hhhash)
+    return hhhashs
+
+def _reprocess_all_hars_hhhashs():
+    for har_id in get_all_har_ids():
+        print()
+        print(har_id)
+        domain = har_id.split('/')[-1]
+        domain = domain[:-44]
+        date = har_id.split('/')
+        date = f'{date[-4]}{date[-3]}{date[-2]}'
+        extract_hhhash_by_id(har_id, domain, date)
+
+
+def _gzip_har(har_id):
+    har_path = os.path.join(HAR_DIR, har_id)
+    new_id = f'{har_path}.gz'
+    if not har_id.endswith('.gz'):
+        if not os.path.exists(new_id):
+            with open(har_path, 'rb') as f:
+                content = f.read()
+            if content:
+                with gzip.open(new_id, 'wb') as f:
+                    r = f.write(content)
+                    print(r)
+        if os.path.exists(new_id) and os.path.exists(har_path):
+            os.remove(har_path)
+            print('delete:', har_path)
+
+def _gzip_all_hars():
+    for har_id in get_all_har_ids():
+        _gzip_har(har_id)
+
+# # # - - # # #
+
+################################################################################
@@ -498,8 +765,7 @@ class Cookie:
             meta[field] = value
         if r_json:
             data = json.dumps(meta, indent=4, sort_keys=True)
-            meta = {'data': data}
-            meta['uuid'] = self.uuid
+            meta = {'data': data, 'uuid': self.uuid}
         return meta
 
     def edit(self, cookie_dict):
@@ -611,7 +877,7 @@ def unpack_imported_json_cookie(json_cookie):
 
 ## - - ##
 #### COOKIEJAR API ####
-def api_import_cookies_from_json(user_id, cookiejar_uuid, json_cookies_str):  # # TODO: add catch
+def api_import_cookies_from_json(user_id, cookiejar_uuid, json_cookies_str):  # # TODO: add catch
     resp = api_verify_cookiejar_acl(cookiejar_uuid, user_id)
     if resp:
         return resp
@@ -780,8 +1046,8 @@ class CrawlerScheduler:
             minutes = 0
         current_time = datetime.now().timestamp()
         time_next_run = (datetime.now() + relativedelta(months=int(months), weeks=int(weeks),
-                                                        days=int(days), hours=int(hours),
-                                                        minutes=int(minutes))).timestamp()
+                                                 days=int(days), hours=int(hours),
+                                                 minutes=int(minutes))).timestamp()
         # Make sure the next capture is not scheduled for in a too short interval
         interval_next_capture = time_next_run - current_time
         if interval_next_capture < self.min_frequency:
@@ -803,6 +1069,7 @@ class CrawlerScheduler:
            task_uuid = create_task(meta['url'], depth=meta['depth'], har=meta['har'], screenshot=meta['screenshot'],
                                    header=meta['header'],
                                    cookiejar=meta['cookiejar'], proxy=meta['proxy'],
+                                   tags=meta['tags'],
                                    user_agent=meta['user_agent'], parent='scheduler', priority=40)
            if task_uuid:
                schedule.set_task(task_uuid)
@@ -905,6 +1172,14 @@ class CrawlerSchedule:
     def _set_field(self, field, value):
         return r_crawler.hset(f'schedule:{self.uuid}', field, value)
 
+    def get_tags(self):
+        return r_crawler.smembers(f'schedule:tags:{self.uuid}')
+
+    def set_tags(self, tags=[]):
+        for tag in tags:
+            r_crawler.sadd(f'schedule:tags:{self.uuid}', tag)
+            # Tag.create_custom_tag(tag)
+
     def get_meta(self, ui=False):
         meta = {
             'uuid': self.uuid,
@@ -919,6 +1194,7 @@ class CrawlerSchedule:
             'cookiejar': self.get_cookiejar(),
             'header': self.get_header(),
             'proxy': self.get_proxy(),
+            'tags': self.get_tags(),
         }
         status = self.get_status()
         if ui:
@@ -934,6 +1210,7 @@ class CrawlerSchedule:
         meta = {'uuid': self.uuid,
                 'url': self.get_url(),
                 'user': self.get_user(),
+                'tags': self.get_tags(),
                 'next_run': self.get_next_run(r_str=True)}
         status = self.get_status()
         if isinstance(status, ScheduleStatus):
@@ -942,7 +1219,7 @@ class CrawlerSchedule:
         return meta
 
     def create(self, frequency, user, url,
-               depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None):
+               depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None, tags=[]):
 
         if self.exists():
             raise Exception('Error: Monitor already exists')
@@ -971,6 +1248,9 @@ class CrawlerSchedule:
         if user_agent:
             self._set_field('user_agent', user_agent)
 
+        if tags:
+            self.set_tags(tags)
+
         r_crawler.sadd('scheduler:schedules', self.uuid)
 
     def delete(self):
@@ -984,12 +1264,13 @@ class CrawlerSchedule:
 
         # delete meta
         r_crawler.delete(f'schedule:{self.uuid}')
+        r_crawler.delete(f'schedule:tags:{self.uuid}')
         r_crawler.srem('scheduler:schedules', self.uuid)
 
-def create_schedule(frequency, user, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None):
+def create_schedule(frequency, user, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None, tags=[]):
     schedule_uuid = gen_uuid()
     schedule = CrawlerSchedule(schedule_uuid)
-    schedule.create(frequency, user, url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar, proxy=proxy, user_agent=user_agent)
+    schedule.create(frequency, user, url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar, proxy=proxy, user_agent=user_agent, tags=tags)
     return schedule_uuid
 
# TODO sanityze UUID
@@ -1046,18 +1327,29 @@ class CrawlerCapture:
         if task_uuid:
             return CrawlerTask(task_uuid)
 
-    def get_start_time(self):
-        return self.get_task().get_start_time()
+    def get_start_time(self, r_str=True):
+        start_time = self.get_task().get_start_time()
+        if r_str:
+            return start_time
+        elif not start_time:
+            return 0
+        else:
+            start_time = datetime.strptime(start_time, "%Y/%m/%d - %H:%M.%S").timestamp()
+            return int(start_time)
 
     def get_status(self):
-        return r_cache.hget(f'crawler:capture:{self.uuid}', 'status')
+        status = r_cache.hget(f'crawler:capture:{self.uuid}', 'status')
+        if not status:
+            status = -1
+        return status
 
     def is_ongoing(self):
         return self.get_status() == CaptureStatus.ONGOING
 
     def create(self, task_uuid):
         if self.exists():
-            raise Exception(f'Error: Capture {self.uuid} already exists')
+            print(f'Capture {self.uuid} already exists')  # TODO LOGS
+            return None
         launch_time = int(time.time())
         r_crawler.hset(f'crawler:task:{task_uuid}', 'capture', self.uuid)
         r_crawler.hset('crawler:captures:tasks', self.uuid, task_uuid)
@@ -1068,7 +1360,7 @@ class CrawlerCapture:
     def update(self, status):
         # Error or Reload
         if not status:
-            r_cache.hset(f'crawler:capture:{self.uuid}', 'status', CaptureStatus.UNKNOWN)
+            r_cache.hset(f'crawler:capture:{self.uuid}', 'status', CaptureStatus.UNKNOWN.value)
             r_cache.zadd('crawler:captures', {self.uuid: 0})
         else:
             last_check = int(time.time())
@@ -1122,6 +1414,11 @@ def get_captures_status():
         status.append(meta)
     return status
 
+def delete_captures():
+    for capture_uuid in get_crawler_captures():
+        capture = CrawlerCapture(capture_uuid)
+        capture.delete()
+
 ##-- CRAWLER STATE --##
 
 
@@ -1204,6 +1501,14 @@ class CrawlerTask:
     def _set_field(self, field, value):
         return r_crawler.hset(f'crawler:task:{self.uuid}', field, value)
 
+    def get_tags(self):
+        return r_crawler.smembers(f'crawler:task:tags:{self.uuid}')
+
+    def set_tags(self, tags):
+        for tag in tags:
+            r_crawler.sadd(f'crawler:task:tags:{self.uuid}', tag)
+            # Tag.create_custom_tag(tag)
+
     def get_meta(self):
         meta = {
             'uuid': self.uuid,
@@ -1218,6 +1523,7 @@ class CrawlerTask:
             'header': self.get_header(),
             'proxy': self.get_proxy(),
             'parent': self.get_parent(),
+            'tags': self.get_tags(),
         }
         return meta
 
@@ -1225,7 +1531,7 @@ class CrawlerTask:
     # TODO SANITIZE PRIORITY
     # PRIORITY: discovery = 0/10, feeder = 10, manual = 50, auto = 40, test = 100
     def create(self, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None,
-               user_agent=None, parent='manual', priority=0):
+               user_agent=None, tags=[], parent='manual', priority=0, external=False):
         if self.exists():
             raise Exception('Error: Task already exists')
 
@@ -1256,7 +1562,7 @@ class CrawlerTask:
         # TODO SANITIZE COOKIEJAR -> UUID
 
         # Check if already in queue
-        hash_query = get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header)
+        hash_query = get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header, tags)
         if r_crawler.hexists(f'crawler:queue:hash', hash_query):
             self.uuid = r_crawler.hget(f'crawler:queue:hash', hash_query)
             return self.uuid
@@ -1277,10 +1583,13 @@ class CrawlerTask:
         if user_agent:
             self._set_field('user_agent', user_agent)
 
+        if tags:
+            self.set_tags(tags)
+
         r_crawler.hset('crawler:queue:hash', hash_query, self.uuid)
         self._set_field('hash', hash_query)
         r_crawler.zadd('crawler:queue', {self.uuid: priority})
-        self.add_to_db_crawler_queue(priority)
+        if not external:
+            self.add_to_db_crawler_queue(priority)
         # UI
         domain_type = dom.get_domain_type()
         r_crawler.sadd(f'crawler:queue:type:{domain_type}', self.uuid)
@@ -1293,6 +1602,11 @@ class CrawlerTask:
     def start(self):
         self._set_field('start_time', datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
 
+    def reset(self):
+        priority = 49
+        r_crawler.hdel(f'crawler:task:{self.uuid}', 'start_time')
+        self.add_to_db_crawler_queue(priority)
+
     # Crawler
     def remove(self):  # zrem cache + DB
         capture_uuid = self.get_capture()
@@ -1316,10 +1630,10 @@ class CrawlerTask:
 
 
 # TODO move to class ???
-def get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header):
+def get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header, tags):
     to_enqueue = {'domain': domain, 'depth': depth, 'har': har, 'screenshot': screenshot,
                   'priority': priority, 'proxy': proxy, 'cookiejar': cookiejar, 'user_agent': user_agent,
-                  'header': header}
+                  'header': header, 'tags': tags}
     if priority != 0:
         to_enqueue['url'] = url
     return hashlib.sha512(pickle.dumps(to_enqueue)).hexdigest()
@@ -1330,12 +1644,11 @@ def add_task_to_lacus_queue():
         return None
     task_uuid, priority = task_uuid[0]
     task = CrawlerTask(task_uuid)
-    task.start()
-    return task.uuid, priority
+    return task, priority
 
 # PRIORITY: discovery = 0/10, feeder = 10, manual = 50, auto = 40, test = 100
 def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None,
-                user_agent=None, parent='manual', priority=0, task_uuid=None):
+                user_agent=None, tags=[], parent='manual', priority=0, task_uuid=None, external=False):
     if task_uuid:
         if CrawlerTask(task_uuid).exists():
             task_uuid = gen_uuid()
@@ -1343,7 +1656,8 @@ def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=
         task_uuid = gen_uuid()
     task = CrawlerTask(task_uuid)
     task_uuid = task.create(url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar,
-                            proxy=proxy, user_agent=user_agent, parent=parent, priority=priority)
+                            proxy=proxy, user_agent=user_agent, tags=tags, parent=parent, priority=priority,
+                            external=external)
     return task_uuid
 
 
@@ -1353,7 +1667,8 @@ def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=
 
 # # TODO: ADD user agent
 # # TODO: sanitize URL
-def api_add_crawler_task(data, user_id=None):
+
+def api_parse_task_dict_basic(data, user_id):
     url = data.get('url', None)
     if not url or url == '\n':
         return {'status': 'error', 'reason': 'No url supplied'}, 400
@@ -1379,6 +1694,31 @@ def api_add_crawler_task(data, user_id=None):
     else:
         depth_limit = 0
 
+    # PROXY
+    proxy = data.get('proxy', None)
+    if proxy == 'onion' or proxy == 'tor' or proxy == 'force_tor':
+        proxy = 'force_tor'
+    elif proxy:
+        verify = api_verify_proxy(proxy)
+        if verify[1] != 200:
+            return verify
+
+    tags = data.get('tags', [])
+
+    return {'url': url, 'depth_limit': depth_limit, 'har': har, 'screenshot': screenshot, 'proxy': proxy, 'tags': tags}, 200
+
+def api_add_crawler_task(data, user_id=None):
+    task, resp = api_parse_task_dict_basic(data, user_id)
+    if resp != 200:
+        return task, resp
+
+    url = task['url']
+    screenshot = task['screenshot']
+    har = task['har']
+    depth_limit = task['depth_limit']
+    proxy = task['proxy']
+    tags = task['tags']
+
     cookiejar_uuid = data.get('cookiejar', None)
     if cookiejar_uuid:
         cookiejar = Cookiejar(cookiejar_uuid)
@@ -1390,6 +1730,19 @@ def api_add_crawler_task(data, user_id=None):
             return {'error': 'The access to this cookiejar is restricted'}, 403
         cookiejar_uuid = cookiejar.uuid
 
+    cookies = data.get('cookies', None)
+    if not cookiejar_uuid and cookies:
+        # Create new cookiejar
+        cookiejar_uuid = create_cookiejar(user_id, "single-shot cookiejar", 1, None)
+        cookiejar = Cookiejar(cookiejar_uuid)
+        for cookie in cookies:
+            try:
+                name = cookie.get('name')
+                value = cookie.get('value')
+                cookiejar.add_cookie(name, value, None, None, None, None, None)
+            except KeyError:
+                return {'error': 'Invalid cookie key, please submit a valid JSON', 'cookiejar_uuid': cookiejar_uuid}, 400
+
     frequency = data.get('frequency', None)
     if frequency:
         if frequency not in ['monthly', 'weekly', 'daily', 'hourly']:
@@ -1410,29 +1763,47 @@ def api_add_crawler_task(data, user_id=None):
                 return {'error': 'Invalid frequency'}, 400
             frequency = f'{months}:{weeks}:{days}:{hours}:{minutes}'
 
-    # PROXY
-    proxy = data.get('proxy', None)
-    if proxy == 'onion' or proxy == 'tor' or proxy == 'force_tor':
-        proxy = 'force_tor'
-    elif proxy:
-        verify = api_verify_proxy(proxy)
-        if verify[1] != 200:
-            return verify
-
     if frequency:
         # TODO verify user
-        return create_schedule(frequency, user_id, url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
-                               cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None), 200
+        task_uuid = create_schedule(frequency, user_id, url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
+                                    cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None, tags=tags)
     else:
         # TODO HEADERS
        # TODO USER AGENT
-        return create_task(url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
-                           cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None,
-                           parent='manual', priority=90), 200
+        task_uuid = create_task(url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
+                                cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None, tags=tags,
+                                parent='manual', priority=90)
+
+    return {'uuid': task_uuid}, 200
 
 
 #### ####
 
+# TODO cookiejar - cookies - frequency
+def api_add_crawler_capture(data, user_id):
+    task, resp = api_parse_task_dict_basic(data, user_id)
+    if resp != 200:
+        return task, resp
+
+    task_uuid = data.get('task_uuid')
+    if not task_uuid:
+        return {'error': 'Invalid task_uuid', 'task_uuid': task_uuid}, 400
+    capture_uuid = data.get('capture_uuid')
+    if not capture_uuid:
+        return {'error': 'Invalid capture_uuid', 'capture_uuid': capture_uuid}, 400
+
+    # parent = data.get('parent')
+
+    # TODO parent
+    task_uuid = create_task(task['url'], depth=task['depth_limit'], har=task['har'], screenshot=task['screenshot'],
+                            proxy=task['proxy'], tags=task['tags'],
+                            parent='manual', task_uuid=task_uuid, external=True)
+    if not task_uuid:
+        return {'error': 'Aborted by Crawler', 'task_uuid': task_uuid, 'capture_uuid': capture_uuid}, 400
+    task = CrawlerTask(task_uuid)
+    create_capture(capture_uuid, task_uuid)
+    task.start()
+    return {'uuid': capture_uuid}, 200
 
 ###################################################################################
 ###################################################################################
@@ -1471,14 +1842,6 @@ def create_item_id(item_dir, domain):
     UUID = domain+str(uuid.uuid4())
     return os.path.join(item_dir, UUID)
 
-def save_har(har_dir, item_id, har_content):
-    if not os.path.exists(har_dir):
-        os.makedirs(har_dir)
-    item_id = item_id.split('/')[-1]
-    filename = os.path.join(har_dir, item_id + '.json')
-    with open(filename, 'w') as f:
-        f.write(json.dumps(har_content))
-
 # # # # # # # # # # # #
 #                     #
 #   CRAWLER MANAGER   # TODO REFACTOR ME
@@ -1509,13 +1872,13 @@ class CrawlerProxy:
         self.uuid = proxy_uuid
 
     def get_description(self):
-        return r_crawler.hgrt(f'crawler:proxy:{self.uuif}', 'description')
+        return r_crawler.hget(f'crawler:proxy:{self.uuid}', 'description')
 
     # Host
     # Port
     # Type -> need test
     def get_url(self):
-        return r_crawler.hgrt(f'crawler:proxy:{self.uuif}', 'url')
+        return r_crawler.hget(f'crawler:proxy:{self.uuid}', 'url')
 
 #### CRAWLER LACUS ####
 
@@ -1577,7 +1940,11 @@ def ping_lacus():
         ping = False
         req_error = {'error': 'Lacus URL undefined', 'status_code': 400}
     else:
-        ping = lacus.is_up
+        try:
+            ping = lacus.is_up
+        except:
+            req_error = {'error': 'Failed to connect Lacus URL', 'status_code': 400}
+            ping = False
     update_lacus_connection_status(ping, req_error=req_error)
     return ping
 
@@ -1594,7 +1961,7 @@ def api_save_lacus_url_key(data):
     # unpack json
     manager_url = data.get('url', None)
     api_key = data.get('api_key', None)
-    if not manager_url: # or not api_key:
+    if not manager_url:  # or not api_key:
         return {'status': 'error', 'reason': 'No url or API key supplied'}, 400
     # check if is valid url
     try:
@@ -1637,7 +2004,7 @@ def api_set_crawler_max_captures(data):
     save_nb_max_captures(nb_captures)
     return nb_captures, 200
 
-## TEST ##
+## TEST ##
 
 def is_test_ail_crawlers_successful():
     return r_db.hget('crawler:tor:test', 'success') == 'True'
@@ -1711,7 +2078,15 @@ def test_ail_crawlers():
     load_blacklist()
 
 # if __name__ == '__main__':
+#     task = CrawlerTask('2dffcae9-8f66-4cfa-8e2c-de1df738a6cd')
+#     print(task.get_meta())
 #     _clear_captures()
+#     delete_captures()
 
 #     item_id = 'crawled/2023/02/20/data.gz'
 #     item = Item(item_id)
 #     content = item.get_content()
 #     temp_url = ''
 #     r = extract_favicon_from_html(content, temp_url)
 #     print(r)
 #     _reprocess_all_hars_cookie_name()
+#     _reprocess_all_hars_etag()
+#     _gzip_all_hars()
+#     _reprocess_all_hars_hhhashs()
@@ -50,8 +50,8 @@ def is_passive_dns_enabled(cache=True):
 def change_passive_dns_state(new_state):
     old_state = is_passive_dns_enabled(cache=False)
     if old_state != new_state:
-        r_serv_db.hset('d4:passivedns', 'enabled', bool(new_state))
-        r_cache.set('d4:passivedns:enabled', bool(new_state))
+        r_serv_db.hset('d4:passivedns', 'enabled', str(new_state))
+        r_cache.set('d4:passivedns:enabled', str(new_state))
         update_time = time.time()
         r_serv_db.hset('d4:passivedns', 'update_time', update_time)
         r_cache.set('d4:passivedns:last_update_time', update_time)

@@ -129,7 +129,7 @@ def get_item_url(item_id):
 
 def get_item_har(item_id):
     har = '/'.join(item_id.rsplit('/')[-4:])
-    har = f'{har}.json'
+    har = f'{har}.json.gz'
     path = os.path.join(ConfigLoader.get_hars_dir(), har)
     if os.path.isfile(path):
         return har
@@ -204,15 +204,22 @@ def _get_dir_source_name(directory, source_name=None, l_sources_name=set(), filt
     if not l_sources_name:
         l_sources_name = set()
     if source_name:
-        l_dir = os.listdir(os.path.join(directory, source_name))
+        path = os.path.join(directory, source_name)
+        if os.path.isdir(path):
+            l_dir = os.listdir(os.path.join(directory, source_name))
+        else:
+            l_dir = []
     else:
         l_dir = os.listdir(directory)
     # empty directory
     if not l_dir:
-        return l_sources_name.add(source_name)
+        if source_name:
+            return l_sources_name.add(source_name)
+        else:
+            return l_sources_name
     else:
         for src_name in l_dir:
-            if len(src_name) == 4:
+            if len(src_name) == 4 and source_name:
                 # try:
                 int(src_name)
                 to_add = os.path.join(source_name)

@@ -1,12 +1,13 @@
 #!/usr/bin/env python3
 # -*-coding:UTF-8 -*
 import json
+import logging
 import os
 import sys
 import time
 
 import yara
 
+from hashlib import sha256
 from operator import itemgetter
 
 sys.path.append(os.environ['AIL_BIN'])
@@ -15,6 +16,7 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from lib.objects import ail_objects
 from lib.objects.Items import Item
+from lib.objects.Titles import Title
 from lib import correlations_engine
 from lib import regex_helper
 from lib.ConfigLoader import ConfigLoader
@@ -28,6 +30,8 @@ from modules.Onion import Onion
 from modules.Phone import Phone
 from modules.Tools import Tools
 
+logger = logging.getLogger()
+
 config_loader = ConfigLoader()
 r_cache = config_loader.get_redis_conn("Redis_Cache")
 config_loader = None
@@ -58,18 +62,31 @@ def get_correl_match(extract_type, obj_id, content):
     correl = correlations_engine.get_correlation_by_correl_type('item', '', obj_id, extract_type)
     to_extract = []
     map_subtype = {}
+    map_value_id = {}
     for c in correl:
         subtype, value = c.split(':', 1)
-        map_subtype[value] = subtype
-        to_extract.append(value)
+        if extract_type == 'title':
+            title = Title(value).get_content()
+            to_extract.append(title)
+            sha256_val = sha256(title.encode()).hexdigest()
+        else:
+            map_subtype[value] = subtype
+            to_extract.append(value)
+            sha256_val = sha256(value.encode()).hexdigest()
+        map_value_id[sha256_val] = value
     if to_extract:
         objs = regex_helper.regex_finditer(r_key, '|'.join(to_extract), obj_id, content)
         for obj in objs:
-            if map_subtype[obj[2]]:
+            if map_subtype.get(obj[2]):
                 subtype = map_subtype[obj[2]]
             else:
                 subtype = ''
-            extracted.append([obj[0], obj[1], obj[2], f'{extract_type}:{subtype}:{obj[2]}'])
+            sha256_val = sha256(obj[2].encode()).hexdigest()
+            value_id = map_value_id.get(sha256_val)
+            if not value_id:
+                logger.critical(f'Error module extractor: {sha256_val}\n{extract_type}\n{subtype}\n{value_id}\n{map_value_id}\n{objs}')
+                value_id = 'ERROR'
+            extracted.append([obj[0], obj[1], obj[2], f'{extract_type}:{subtype}:{value_id}'])
     return extracted
 
 def _get_yara_match(data):
@@ -87,9 +104,13 @@ def _get_word_regex(word):
 
 def convert_byte_offset_to_string(b_content, offset):
     byte_chunk = b_content[:offset + 1]
-    string_chunk = byte_chunk.decode()
-    offset = len(string_chunk) - 1
-    return offset
+    try:
+        string_chunk = byte_chunk.decode()
+        offset = len(string_chunk) - 1
+        return offset
+    except UnicodeDecodeError as e:
+        logger.error(f'Yara offset converter error, {str(e)}\n{offset}/{len(b_content)}')
+        return convert_byte_offset_to_string(b_content, offset - 1)
 
 
 # TODO RETRO HUNTS
@@ -155,6 +176,7 @@ def extract(obj_id, content=None):
 
     # CHECK CACHE
     cached = r_cache.get(f'extractor:cache:{obj_id}')
+    # cached = None
     if cached:
         r_cache.expire(f'extractor:cache:{obj_id}', 300)
         return json.loads(cached)
@@ -173,7 +195,7 @@ def extract(obj_id, content=None):
     if matches:
         extracted = extracted + matches
 
-    for obj_t in ['cve', 'cryptocurrency', 'username']:  # Decoded, PGP->extract bloc
+    for obj_t in ['cve', 'cryptocurrency', 'title', 'username']:  # Decoded, PGP->extract bloc
         matches = get_correl_match(obj_t, obj_id, content)
         if matches:
             extracted = extracted + matches
166	bin/lib/objects/ChatSubChannels.py	Executable file
@@ -0,0 +1,166 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from datetime import datetime

from flask import url_for
# from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects

from lib.data_retention_engine import update_obj_date
from lib.objects import ail_objects
from lib.timeline_engine import Timeline

from lib.correlations_engine import get_correlation_by_correl_type

config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None


################################################################################
################################################################################
################################################################################

class ChatSubChannel(AbstractChatObject):
    """
    AIL Chat Object. (strings)
    """

    # ID -> <CHAT ID>/<SubChannel ID>    subtype = chat_instance_uuid
    def __init__(self, id, subtype):
        super(ChatSubChannel, self).__init__('chat-subchannel', id, subtype)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                 'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
        return url

    def get_svg_icon(self):  # TODO
        # if self.subtype == 'telegram':
        #     style = 'fab'
        #     icon = '\uf2c6'
        # elif self.subtype == 'discord':
        #     style = 'fab'
        #     icon = '\uf099'
        # else:
        #     style = 'fas'
        #     icon = '\uf007'
        style = 'far'
        icon = '\uf086'
        return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}

    # TODO TIME LAST MESSAGES

    def get_meta(self, options=set(), translation_target=None):
        meta = self._get_meta(options=options)
        meta['tags'] = self.get_tags(r_list=True)
        meta['name'] = self.get_name()
        if 'chat' in options:
            meta['chat'] = self.get_chat()
        if 'icon' in options:
            meta['icon'] = self.get_icon()
            meta['img'] = meta['icon']
        if 'nb_messages' in options:
            meta['nb_messages'] = self.get_nb_messages()
        if 'created_at' in options:
            meta['created_at'] = self.get_created_at(date=True)
        if 'threads' in options:
            meta['threads'] = self.get_threads()
        if 'participants' in options:
            meta['participants'] = self.get_participants()
        if 'nb_participants' in options:
            meta['nb_participants'] = self.get_nb_participants()
        if 'translation' in options and translation_target:
            meta['translation_name'] = self.translate(meta['name'], field='name', target=translation_target)
        return meta

    def get_misp_object(self):
        # obj_attrs = []
        # if self.subtype == 'telegram':
        #     obj = MISPObject('telegram-account', standalone=True)
        #     obj_attrs.append(obj.add_attribute('username', value=self.id))
        #
        # elif self.subtype == 'twitter':
        #     obj = MISPObject('twitter-account', standalone=True)
        #     obj_attrs.append(obj.add_attribute('name', value=self.id))
        #
        # else:
        #     obj = MISPObject('user-account', standalone=True)
        #     obj_attrs.append(obj.add_attribute('username', value=self.id))
        #
        # first_seen = self.get_first_seen()
        # last_seen = self.get_last_seen()
        # if first_seen:
        #     obj.first_seen = first_seen
        # if last_seen:
        #     obj.last_seen = last_seen
        # if not first_seen or not last_seen:
        #     self.logger.warning(
        #         f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
        #
        # for obj_attr in obj_attrs:
        #     for tag in self.get_tags():
        #         obj_attr.add_tag(tag)
        # return obj
        return

    ############################################################################
    ############################################################################

    # others optional metas, ... -> # TODO ALL meta in hset

    def _get_timeline_name(self):
        return Timeline(self.get_global_id(), 'username')

    def update_name(self, name, timestamp):
        self._get_timeline_name().add_timestamp(timestamp, name)


    # TODO # # # # # # # # # # #
    def get_users(self):
        pass

    #### Categories ####

    #### Threads ####

    #### Messages #### TODO set parents

    # def get_last_message_id(self):
    #
    #     return r_object.hget(f'meta:{self.type}:{self.subtype}:{self.id}', 'last:message:id')


class ChatSubChannels(AbstractChatObjects):
    def __init__(self):
        super().__init__('chat-subchannel')

# if __name__ == '__main__':
#     chat = Chat('test', 'telegram')
#     r = chat.get_messages()
#     print(r)
120  bin/lib/objects/ChatThreads.py  Executable file

@@ -0,0 +1,120 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from datetime import datetime

from flask import url_for
# from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects


config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None


################################################################################
################################################################################
################################################################################

class ChatThread(AbstractChatObject):
    """
    AIL ChatThread Object.
    """

    def __init__(self, id, subtype):
        super().__init__('chat-thread', id, subtype)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
        return url

    def get_svg_icon(self):  # TODO
        # if self.subtype == 'telegram':
        #     style = 'fab'
        #     icon = '\uf2c6'
        # elif self.subtype == 'discord':
        #     style = 'fab'
        #     icon = '\uf099'
        # else:
        #     style = 'fas'
        #     icon = '\uf007'
        style = 'fas'
        icon = '\uf7a4'
        return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['subtype'] = self.subtype
        meta['tags'] = self.get_tags(r_list=True)
        if 'name' in options:
            meta['name'] = self.get_name()
        if 'nb_messages' in options:
            meta['nb_messages'] = self.get_nb_messages()
        if 'participants' in options:
            meta['participants'] = self.get_participants()
        if 'nb_participants' in options:
            meta['nb_participants'] = self.get_nb_participants()
        # created_at ???
        return meta

    def get_misp_object(self):
        return

    def create(self, container_obj, message_id):
        if message_id:
            parent_message = container_obj.get_obj_by_message_id(message_id)
            if parent_message:  # TODO EXCEPTION IF DON'T EXISTS
                self.set_parent(obj_global_id=parent_message)
                _, _, parent_id = parent_message.split(':', 2)
                self.add_correlation('message', '', parent_id)
        else:
            self.set_parent(obj_global_id=container_obj.get_global_id())
        self.add_correlation(container_obj.get_type(), container_obj.get_subtype(r_str=True), container_obj.get_id())


def create(thread_id, chat_instance, chat_id, subchannel_id, message_id, container_obj):
    if container_obj.get_type() == 'chat':
        new_thread_id = f'{chat_id}/{thread_id}'
    # sub-channel
    else:
        new_thread_id = f'{chat_id}/{subchannel_id}/{thread_id}'

    thread = ChatThread(new_thread_id, chat_instance)
    if not thread.is_children():
        thread.create(container_obj, message_id)
    return thread


class ChatThreads(AbstractChatObjects):
    def __init__(self):
        super().__init__('chat-thread')


# if __name__ == '__main__':
#     chat = Chat('test', 'telegram')
#     r = chat.get_messages()
#     print(r)
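The module-level create() above composes a thread's global id from its container's ids. A minimal standalone sketch of that id scheme (the helper name is an assumption for illustration; only the f-string composition comes from the code above):

```python
def build_thread_id(thread_id, chat_id, subchannel_id=None):
    # Threads inside a chat get 'chat_id/thread_id'; threads inside a
    # sub-channel get 'chat_id/subchannel_id/thread_id', mirroring create().
    if subchannel_id is None:
        return f'{chat_id}/{thread_id}'
    return f'{chat_id}/{subchannel_id}/{thread_id}'
```

The composed id keeps a thread unique across chats and sub-channels that reuse the same local thread numbers.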
216  bin/lib/objects/Chats.py  Executable file

@@ -0,0 +1,216 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from datetime import datetime

from flask import url_for
# from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects

from lib.objects.abstract_subtype_object import AbstractSubtypeObject, get_all_id
from lib.data_retention_engine import update_obj_date
from lib.objects import ail_objects
from lib.timeline_engine import Timeline

from lib.correlations_engine import get_correlation_by_correl_type

config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None


################################################################################
################################################################################
################################################################################

class Chat(AbstractChatObject):
    """
    AIL Chat Object.
    """

    def __init__(self, id, subtype):
        super(Chat, self).__init__('chat', id, subtype)

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
        return url

    def get_svg_icon(self):  # TODO
        # if self.subtype == 'telegram':
        #     style = 'fab'
        #     icon = '\uf2c6'
        # elif self.subtype == 'discord':
        #     style = 'fab'
        #     icon = '\uf099'
        # else:
        #     style = 'fas'
        #     icon = '\uf007'
        style = 'fas'
        icon = '\uf086'
        return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}

    def get_meta(self, options=set(), translation_target=None):
        meta = self._get_meta(options=options)
        meta['name'] = self.get_name()
        meta['tags'] = self.get_tags(r_list=True)
        if 'icon' in options:
            meta['icon'] = self.get_icon()
            meta['img'] = meta['icon']
        if 'info' in options:
            meta['info'] = self.get_info()
            if 'translation' in options and translation_target:
                meta['translation_info'] = self.translate(meta['info'], field='info', target=translation_target)
        if 'participants' in options:
            meta['participants'] = self.get_participants()
        if 'nb_participants' in options:
            meta['nb_participants'] = self.get_nb_participants()
        if 'nb_messages' in options:
            meta['nb_messages'] = self.get_nb_messages()
        if 'username' in options:
            meta['username'] = self.get_username()
        if 'subchannels' in options:
            meta['subchannels'] = self.get_subchannels()
        if 'nb_subchannels' in options:
            meta['nb_subchannels'] = self.get_nb_subchannels()
        if 'created_at' in options:
            meta['created_at'] = self.get_created_at(date=True)
        if 'threads' in options:
            meta['threads'] = self.get_threads()
        if 'tags_safe' in options:
            meta['tags_safe'] = self.is_tags_safe(meta['tags'])
        return meta
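get_meta() follows the options-set convention used across AIL objects: callers pass a set of field names and only those fields are computed. A small self-contained sketch of the pattern (the dict-based stand-in object is an assumption for illustration):

```python
def get_meta_sketch(obj, options=set()):
    meta = {'id': obj['id']}
    # Membership tests against the options set gate each optional field;
    # a bare `if 'nb_messages':` would always be truthy and defeat the gating.
    if 'nb_messages' in options:
        meta['nb_messages'] = len(obj['messages'])
    if 'participants' in options:
        meta['participants'] = sorted(obj['participants'])
    return meta

chat = {'id': 'chat:telegram:1337', 'messages': [1, 2, 3], 'participants': {'b', 'a'}}
```

Callers that only need cheap fields skip the expensive lookups entirely.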

    def get_misp_object(self):
        # obj_attrs = []
        # if self.subtype == 'telegram':
        #     obj = MISPObject('telegram-account', standalone=True)
        #     obj_attrs.append(obj.add_attribute('username', value=self.id))
        #
        # elif self.subtype == 'twitter':
        #     obj = MISPObject('twitter-account', standalone=True)
        #     obj_attrs.append(obj.add_attribute('name', value=self.id))
        #
        # else:
        #     obj = MISPObject('user-account', standalone=True)
        #     obj_attrs.append(obj.add_attribute('username', value=self.id))
        #
        # first_seen = self.get_first_seen()
        # last_seen = self.get_last_seen()
        # if first_seen:
        #     obj.first_seen = first_seen
        # if last_seen:
        #     obj.last_seen = last_seen
        # if not first_seen or not last_seen:
        #     self.logger.warning(
        #         f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
        #
        # for obj_attr in obj_attrs:
        #     for tag in self.get_tags():
        #         obj_attr.add_tag(tag)
        # return obj
        return

    ############################################################################
    ############################################################################

    # users that send at least a message else participants/spectator
    # correlation created by messages
    def get_users(self):
        users = set()
        accounts = self.get_correlation('user-account').get('user-account', [])
        for account in accounts:
            users.add(account[1:])
        return users

    def _get_timeline_username(self):
        return Timeline(self.get_global_id(), 'username')

    def get_username(self):
        return self._get_timeline_username().get_last_obj_id()

    def get_usernames(self):
        return self._get_timeline_username().get_objs_ids()

    def update_username_timeline(self, username_global_id, timestamp):
        self._get_timeline_username().add_timestamp(timestamp, username_global_id)
|
||||
|
||||
|
||||
#### Categories ####
|
||||
|
||||
#### Threads ####
|
||||
|
||||
#### Messages #### TODO set parents
|
||||
|
||||
# def get_last_message_id(self):
|
||||
#
|
||||
# return r_object.hget(f'meta:{self.type}:{self.subtype}:{self.id}', 'last:message:id')
|
||||
|
||||
# def add(self, timestamp, obj_id, mess_id=0, username=None, user_id=None):
|
||||
# date = # TODO get date from object
|
||||
# self.update_daterange(date)
|
||||
# update_obj_date(date, self.type, self.subtype)
|
||||
#
|
||||
#
|
||||
# # daily
|
||||
# r_object.hincrby(f'{self.type}:{self.subtype}:{date}', self.id, 1)
|
||||
# # all subtypes
|
||||
# r_object.zincrby(f'{self.type}_all:{self.subtype}', 1, self.id)
|
||||
#
|
||||
# #######################################################################
|
||||
# #######################################################################
|
||||
#
|
||||
# # Correlations
|
||||
# self.add_correlation('item', '', item_id)
|
||||
# # domain
|
||||
# if is_crawled(item_id):
|
||||
# domain = get_item_domain(item_id)
|
||||
# self.add_correlation('domain', '', domain)
|
||||
|
||||
# importer -> use cache for previous reply SET to_add_id: previously_imported : expire SET key -> 30 mn
|
||||
|
||||
|
||||
class Chats(AbstractChatObjects):
|
||||
def __init__(self):
|
||||
super().__init__('chat')
|
||||
|
||||
# TODO factorize
|
||||
def get_all_subtypes():
|
||||
return ail_core.get_object_all_subtypes('chat')
|
||||
|
||||
def get_all():
|
||||
objs = {}
|
||||
for subtype in get_all_subtypes():
|
||||
objs[subtype] = get_all_by_subtype(subtype)
|
||||
return objs
|
||||
|
||||
def get_all_by_subtype(subtype):
|
||||
return get_all_id('chat', subtype)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
chat = Chat('test', 'telegram')
|
||||
r = chat.get_messages()
|
||||
print(r)
|
118  bin/lib/objects/CookiesNames.py  Executable file

@@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from hashlib import sha256
from flask import url_for

from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None

# TODO NEW ABSTRACT OBJECT -> daterange for all objects ????

class CookieName(AbstractDaterangeObject):
    """
    AIL CookieName Object.
    """

    def __init__(self, obj_id):
        super(CookieName, self).__init__('cookie-name', obj_id)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_content(self, r_type='str'):
        if r_type == 'str':
            return self._get_field('content')

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    # TODO # CHANGE COLOR
    def get_svg_icon(self):
        return {'style': 'fas', 'icon': '\uf564', 'color': '#BFD677', 'radius': 5}  # f563

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('cookie')
        first_seen = self.get_first_seen()
        last_seen = self.get_last_seen()
        if first_seen:
            obj.first_seen = first_seen
        if last_seen:
            obj.last_seen = last_seen
        if not first_seen or not last_seen:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')

        obj_attrs.append(obj.add_attribute('cookie-name', value=self.get_content()))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_nb_seen(self):
        return self.get_nb_correlation('domain')

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        meta['content'] = self.get_content()
        return meta

    def create(self, content, _first_seen=None, _last_seen=None):
        if not isinstance(content, str):
            content = content.decode()
        self._set_field('content', content)
        self._create()


def create(content):
    if isinstance(content, str):
        content = content.encode()
    obj_id = sha256(content).hexdigest()
    cookie = CookieName(obj_id)
    if not cookie.exists():
        cookie.create(content)
    return cookie
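The module-level create() above derives the object id from the cookie name itself, so identical names always collapse into one object and `if not cookie.exists()` becomes a dedup step. A minimal sketch of that content-addressed id (the helper name is an assumption; the hashing comes from the code above):

```python
from hashlib import sha256

def cookie_name_id(content):
    # Same input -> same 64-char hex id, regardless of str/bytes input,
    # which is what makes repeated imports of the same name idempotent.
    if isinstance(content, str):
        content = content.encode()
    return sha256(content).hexdigest()
```

Storing the raw name under a fixed-length hash id also keeps arbitrary cookie-name bytes out of database key names.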

class CookiesNames(AbstractDaterangeObjects):
    """
    CookieName Objects
    """
    def __init__(self):
        super().__init__('cookie-name', CookieName)

    def sanitize_id_to_search(self, name_to_search):
        return name_to_search  # TODO


# if __name__ == '__main__':
#     name_to_search = '98'
#     print(search_cves_by_name(name_to_search))
@@ -107,8 +107,15 @@ class CryptoCurrency(AbstractSubtypeObject):
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('coin-address')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
 
         obj_attrs.append(obj.add_attribute('address', value=self.id))
         crypto_symbol = self.get_currency_symbol()
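The guard introduced by this hunk recurs in the Cve, Decoded, and Domain hunks that follow: set first_seen/last_seen on the export object only when a value exists, and log once when either is missing, rather than unconditionally assigning a possibly-None value. A generic sketch of the pattern (the dict-based stand-in for MISPObject and the helper name are assumptions for illustration):

```python
def apply_seen_range(obj, first_seen, last_seen, warn):
    # Assign only the values that exist; warn once about any missing end.
    if first_seen:
        obj['first_seen'] = first_seen
    if last_seen:
        obj['last_seen'] = last_seen
    if not first_seen or not last_seen:
        warn(f'Export error, None seen: first={first_seen}, last={last_seen}')
    return obj

log = []
result = apply_seen_range({}, '2023-11-01', None, log.append)
```

Factoring the guard this way keeps partially-dated objects exportable instead of failing or exporting bogus ranges.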
@@ -57,8 +57,15 @@ class Cve(AbstractDaterangeObject):
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('vulnerability')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
 
         obj_attrs.append(obj.add_attribute('id', value=self.id))
         for obj_attr in obj_attrs:

@@ -72,9 +79,6 @@ class Cve(AbstractDaterangeObject):
         meta['tags'] = self.get_tags(r_list=True)
         return meta
 
-    def add(self, date, item_id):
-        self._add(date, item_id)
-
     def get_cve_search(self):
         try:
             response = requests.get(f'https://cvepremium.circl.lu/api/cve/{self.id}', timeout=10)
@@ -111,13 +111,25 @@ class Decoded(AbstractDaterangeObject):
     def get_rel_path(self, mimetype=None):
         if not mimetype:
             mimetype = self.get_mimetype()
+            if not mimetype:
+                self.logger.warning(f'Decoded {self.id}: Empty mimetype')
+                return None
         return os.path.join(HASH_DIR, mimetype, self.id[0:2], self.id)
 
     def get_filepath(self, mimetype=None):
-        return os.path.join(os.environ['AIL_HOME'], self.get_rel_path(mimetype=mimetype))
+        rel_path = self.get_rel_path(mimetype=mimetype)
+        if not rel_path:
+            return None
+        else:
+            return os.path.join(os.environ['AIL_HOME'], rel_path)
 
     def get_content(self, mimetype=None, r_type='str'):
         filepath = self.get_filepath(mimetype=mimetype)
+        if not filepath:
+            if r_type == 'str':
+                return ''
+            else:
+                return b''
         if r_type == 'str':
             with open(filepath, 'r') as f:
                 content = f.read()

@@ -126,7 +138,7 @@ class Decoded(AbstractDaterangeObject):
             with open(filepath, 'rb') as f:
                 content = f.read()
             return content
-        elif r_str == 'bytesio':
+        elif r_type == 'bytesio':
             with open(filepath, 'rb') as f:
                 content = BytesIO(f.read())
             return content

@@ -137,15 +149,22 @@ class Decoded(AbstractDaterangeObject):
         with zipfile.ZipFile(zip_content, "w") as zf:
             # TODO: Fix password
             # zf.setpassword(b"infected")
-            zf.writestr(self.id, self.get_content().getvalue())
+            zf.writestr(self.id, self.get_content(r_type='bytesio').getvalue())
         zip_content.seek(0)
         return zip_content
 
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('file')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
 
         obj_attrs.append(obj.add_attribute('sha1', value=self.id))
         obj_attrs.append(obj.add_attribute('mimetype', value=self.get_mimetype()))

@@ -220,8 +239,8 @@ class Decoded(AbstractDaterangeObject):
 
         return True
 
-    def add(self, algo_name, date, obj_id, mimetype=None):
-        self._add(date, obj_id)
+    def add(self, date, obj, algo_name, mimetype=None):
+        self._add(date, obj)
         if not mimetype:
             mimetype = self.get_mimetype()

@@ -435,13 +454,13 @@ def get_all_decodeds_objects(filters={}):
         if i >= len(files):
             files = []
         for file in files:
-            yield Decoded(file).id
+            yield Decoded(file)


############################################################################

 def sanityze_decoder_names(decoder_name):
-    if decoder_name not in Decodeds.get_algos():
+    if decoder_name not in get_algos():
         return None
     else:
         return decoder_name
@@ -311,6 +311,9 @@ class Domain(AbstractObject):
         root_item = self.get_last_item_root()
-        return self.get_crawled_items(root_item)
+        if root_item:
+            return self.get_crawled_items(root_item)
+        else:
+            return []
 
     # TODO FIXME
     def get_all_urls(self, date=False, epoch=None):

@@ -341,8 +344,15 @@ class Domain(AbstractObject):
         # create domain-ip obj
         obj_attrs = []
         obj = MISPObject('domain-crawled', standalone=True)
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_check()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_check()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
 
         obj_attrs.append(obj.add_attribute('domain', value=self.id))
         urls = self.get_all_urls(date=True, epoch=epoch)

@@ -379,10 +389,10 @@ class Domain(AbstractObject):
             har = get_item_har(item_id)
             if har:
                 print(har)
-                _write_in_zip_buffer(zf, os.path.join(hars_dir, har), f'{basename}.json')
+                _write_in_zip_buffer(zf, os.path.join(hars_dir, har), f'{basename}.json.gz')
             # Screenshot
             screenshot = self._get_external_correlation('item', '', item_id, 'screenshot')
-            if screenshot:
+            if screenshot and screenshot['screenshot']:
                 screenshot = screenshot['screenshot'].pop()[1:]
                 screenshot = os.path.join(screenshot[0:2], screenshot[2:4], screenshot[4:6], screenshot[6:8],
                                           screenshot[8:10], screenshot[10:12], screenshot[12:])

@@ -585,21 +595,22 @@ def get_domains_up_by_filers(domain_types, date_from=None, date_to=None, tags=[]
     return None
 
 def sanitize_domain_name_to_search(name_to_search, domain_type):
     if not name_to_search:
         return ""
     if domain_type == 'onion':
         r_name = r'[a-z0-9\.]+'
     else:
         r_name = r'[a-zA-Z0-9-_\.]+'
     # invalid domain name
     if not re.fullmatch(r_name, name_to_search):
-        return ""
+        res = re.match(r_name, name_to_search)
+        return {'search': name_to_search, 'error': res.string.replace(res[0], '')}
     return name_to_search.replace('.', '\.')
 
 def search_domain_by_name(name_to_search, domain_types, r_pos=False):
     domains = {}
     for domain_type in domain_types:
         r_name = sanitize_domain_name_to_search(name_to_search, domain_type)
-        if not r_name:
+        if not name_to_search or isinstance(r_name, dict):
             break
         r_name = re.compile(r_name)
         for domain in get_domains_up_by_type(domain_type):
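The sanitize/search pair above validates a search term against a per-type character set before compiling it into a regex. A standalone sketch of just the validation step (the character classes are copied from the hunk; the helper name is an assumption):

```python
import re

ONION_NAME = r'[a-z0-9\.]+'
REGULAR_NAME = r'[a-zA-Z0-9-_\.]+'

def is_valid_search_name(name, domain_type):
    # fullmatch() rejects any stray character anywhere in the string,
    # whereas match() would accept a valid prefix followed by junk.
    pattern = ONION_NAME if domain_type == 'onion' else REGULAR_NAME
    return re.fullmatch(pattern, name) is not None
```

Onion names are lowercase-only, so the same term can be valid for regular domains but invalid for onions.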
118  bin/lib/objects/Etags.py  Executable file

@@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from hashlib import sha256
from flask import url_for

from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None

# TODO NEW ABSTRACT OBJECT -> daterange for all objects ????

class Etag(AbstractDaterangeObject):
    """
    AIL Etag Object.
    """

    def __init__(self, obj_id):
        super(Etag, self).__init__('etag', obj_id)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_content(self, r_type='str'):
        if r_type == 'str':
            return self._get_field('content')

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    # TODO # CHANGE COLOR
    def get_svg_icon(self):
        return {'style': 'fas', 'icon': '\uf02b', 'color': '#556F65', 'radius': 5}

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('etag')
        first_seen = self.get_first_seen()
        last_seen = self.get_last_seen()
        if first_seen:
            obj.first_seen = first_seen
        if last_seen:
            obj.last_seen = last_seen
        if not first_seen or not last_seen:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')

        obj_attrs.append(obj.add_attribute('etag', value=self.get_content()))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_nb_seen(self):
        return self.get_nb_correlation('domain')

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        meta['content'] = self.get_content()
        return meta

    def create(self, content, _first_seen=None, _last_seen=None):
        if not isinstance(content, str):
            content = content.decode()
        self._set_field('content', content)
        self._create()


def create(content):
    if isinstance(content, str):
        content = content.encode()
    obj_id = sha256(content).hexdigest()
    etag = Etag(obj_id)
    if not etag.exists():
        etag.create(content)
    return etag


class Etags(AbstractDaterangeObjects):
    """
    Etags Objects
    """
    def __init__(self):
        super().__init__('etag', Etag)

    def sanitize_id_to_search(self, name_to_search):
        return name_to_search  # TODO


# if __name__ == '__main__':
#     name_to_search = '98'
#     print(search_cves_by_name(name_to_search))
118  bin/lib/objects/Favicons.py  Executable file

@@ -0,0 +1,118 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import mmh3
import os
import sys

from flask import url_for

from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None


class Favicon(AbstractDaterangeObject):
    """
    AIL Favicon Object.
    """

    def __init__(self, id):
        super(Favicon, self).__init__('favicon', id)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_content(self, r_type='str'):
        if r_type == 'str':
            return self._get_field('content')

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    # TODO # CHANGE COLOR
    def get_svg_icon(self):
        return {'style': 'fas', 'icon': '\uf20a', 'color': '#1E88E5', 'radius': 5}  # f0c8 f45c

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('favicon')
        first_seen = self.get_first_seen()
        last_seen = self.get_last_seen()
        if first_seen:
            obj.first_seen = first_seen
        if last_seen:
            obj.last_seen = last_seen
        if not first_seen or not last_seen:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')

        obj_attrs.append(obj.add_attribute('favicon-mmh3', value=self.id))
        obj_attrs.append(obj.add_attribute('favicon', value=self.get_content(r_type='bytes')))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        if 'content' in options:
            meta['content'] = self.get_content()
        return meta

    # def get_links(self):
    #     # TODO GET ALL URLS FROM CORRELATED ITEMS

    def create(self, content, _first_seen=None, _last_seen=None):
        if not isinstance(content, str):
            content = content.decode()
        self._set_field('content', content)
        self._create()


def create_favicon(content, url=None):  # TODO URL ????
    if isinstance(content, str):
        content = content.encode()
    favicon_id = mmh3.hash_bytes(content)
    favicon = Favicon(favicon_id)
    if not favicon.exists():
        favicon.create(content)


class Favicons(AbstractDaterangeObjects):
    """
    Favicons Objects
    """
    def __init__(self):
        super().__init__('favicon', Favicon)

    def sanitize_id_to_search(self, name_to_search):
        return name_to_search  # TODO


# if __name__ == '__main__':
#     name_to_search = '98'
#     print(search_cves_by_name(name_to_search))
101  bin/lib/objects/FilesNames.py  Executable file

@@ -0,0 +1,101 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from flask import url_for
from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_object = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None


class FileName(AbstractDaterangeObject):
    """
    AIL FileName Object. (strings)
    """

    def __init__(self, name):
        super().__init__('file-name', name)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    def get_svg_icon(self):
        return {'style': 'far', 'icon': '\uf249', 'color': '#36F5D5', 'radius': 5}

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('file')

        # obj_attrs.append(obj.add_attribute('sha256', value=self.id))
        # obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        if 'tags_safe' in options:
|
||||
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
|
||||
return meta
|
||||
|
||||
def create(self): # create ALL SET ??????
|
||||
pass
|
||||
|
||||
def add_reference(self, date, src_ail_object, file_obj=None):
|
||||
self.add(date, src_ail_object)
|
||||
if file_obj:
|
||||
self.add_correlation(file_obj.type, file_obj.get_subtype(r_str=True), file_obj.get_id())
|
||||
|
||||
# TODO USE ZSET FOR ALL OBJS IDS ??????
|
||||
|
||||
class FilesNames(AbstractDaterangeObjects):
|
||||
"""
|
||||
CookieName Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('file-name', FileName)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search
|
||||
|
||||
# TODO sanitize file name
|
||||
def create(self, name, date, src_ail_object, file_obj=None, limit=500, force=False):
|
||||
if 0 < len(name) <= limit or force or limit < 0:
|
||||
file_name = self.obj_class(name)
|
||||
# if not file_name.exists():
|
||||
# file_name.create()
|
||||
file_name.add_reference(date, src_ail_object, file_obj=file_obj)
|
||||
return file_name
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '29ba'
|
||||
# print(search_screenshots_by_name(name_to_search))
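Reviewer note: the acceptance rule in `FilesNames.create()` above is a single boolean expression that is easy to misread. A standalone sketch of that exact condition (the function name `accept_file_name` and the sample names are illustrative, not part of the patch):

```python
# Acceptance rule from FilesNames.create(): keep a name when it is
# non-empty and within the limit, when force is set, or when the limit
# is disabled by passing a negative value (note: a negative limit also
# lets an empty name through, faithful to the source expression).
def accept_file_name(name, limit=500, force=False):
    return bool(0 < len(name) <= limit or force or limit < 0)

print(accept_file_name('report.pdf'))           # within the 500-char default
print(accept_file_name('a' * 501))              # over the limit, rejected
print(accept_file_name('a' * 501, force=True))  # forced through
```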
135
bin/lib/objects/HHHashs.py
Executable file
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import hashlib
import os
import sys

from flask import url_for

from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None


class HHHash(AbstractDaterangeObject):
    """
    AIL HHHash Object.
    """

    def __init__(self, obj_id):
        super(HHHash, self).__init__('hhhash', obj_id)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_content(self, r_type='str'):
        if r_type == 'str':
            return self._get_field('content')

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    # TODO # CHANGE COLOR
    def get_svg_icon(self):
        return {'style': 'fas', 'icon': '\uf036', 'color': '#71D090', 'radius': 5}

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('hhhash')
        first_seen = self.get_first_seen()
        last_seen = self.get_last_seen()
        if first_seen:
            obj.first_seen = first_seen
        if last_seen:
            obj.last_seen = last_seen
        if not first_seen or not last_seen:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')

        obj_attrs.append(obj.add_attribute('hhhash', value=self.get_id()))
        obj_attrs.append(obj.add_attribute('hhhash-headers', value=self.get_content()))
        obj_attrs.append(obj.add_attribute('hhhash-tool', value='lacus'))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_nb_seen(self):
        return self.get_nb_correlation('domain')

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        meta['content'] = self.get_content()
        return meta

    def create(self, hhhash_header, _first_seen=None, _last_seen=None):  # TODO CREATE ADD FUNCTION -> urls set
        self._set_field('content', hhhash_header)
        self._create()


def create(hhhash_header, hhhash=None):
    if not hhhash:
        hhhash = hhhash_headers(hhhash_header)
    hhhash = HHHash(hhhash)
    if not hhhash.exists():
        hhhash.create(hhhash_header)
    return hhhash


def build_hhhash_headers(dict_headers):  # filter_dup=True
    hhhash = ''
    previous_header = ''
    for header in dict_headers:
        header_name = header.get('name')
        if header_name:
            if header_name != previous_header:  # remove dup headers, filter playwright invalid splitting
                hhhash = f'{hhhash}:{header_name}'
                previous_header = header_name
    hhhash = hhhash[1:]
    # print(hhhash)
    return hhhash


def hhhash_headers(header_hhhash):
    m = hashlib.sha256()
    m.update(header_hhhash.encode())
    digest = m.hexdigest()
    return f"hhh:1:{digest}"


class HHHashs(AbstractDaterangeObjects):
    """
    HHHashs Objects
    """
    def __init__(self):
        super().__init__('hhhash', HHHash)

    def sanitize_id_to_search(self, name_to_search):
        return name_to_search  # TODO


# if __name__ == '__main__':
#     name_to_search = '98'
#     print(search_cves_by_name(name_to_search))
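Reviewer note: `build_hhhash_headers()` and `hhhash_headers()` above are pure functions, so they can be exercised without the rest of AIL. A self-contained sketch (the sample header list is made up; the helpers are copied from the patch logic):

```python
import hashlib

# Concatenate header names with ':' while collapsing consecutive
# duplicates, then tag the sha256 digest with the "hhh:1:" version prefix.
def build_hhhash_headers(dict_headers):
    hhhash = ''
    previous_header = ''
    for header in dict_headers:
        header_name = header.get('name')
        if header_name and header_name != previous_header:
            hhhash = f'{hhhash}:{header_name}'
            previous_header = header_name
    return hhhash[1:]

def hhhash_headers(header_string):
    digest = hashlib.sha256(header_string.encode()).hexdigest()
    return f'hhh:1:{digest}'

headers = [{'name': 'Server'}, {'name': 'Server'},  # duplicate collapsed
           {'name': 'Date'}, {'name': 'Content-Type'}]
print(build_hhhash_headers(headers))   # Server:Date:Content-Type
print(hhhash_headers(build_hhhash_headers(headers)))
```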
135
bin/lib/objects/Images.py
Executable file
@@ -0,0 +1,135 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import base64
import os
import sys

from hashlib import sha256
from io import BytesIO

from flask import url_for
from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_serv_metadata = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")  # missing in the original hunk but required by get_link()
IMAGE_FOLDER = config_loader.get_files_directory('images')
config_loader = None


class Image(AbstractDaterangeObject):
    """
    AIL Image Object.
    """

    # ID = SHA256
    def __init__(self, image_id):
        super(Image, self).__init__('image', image_id)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def exists(self):
        return os.path.isfile(self.get_filepath())

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    def get_svg_icon(self):
        return {'style': 'far', 'icon': '\uf03e', 'color': '#E1F5DF', 'radius': 5}

    def get_rel_path(self):
        rel_path = os.path.join(self.id[0:2], self.id[2:4], self.id[4:6], self.id[6:8], self.id[8:10], self.id[10:12], self.id[12:])
        return rel_path

    def get_filepath(self):
        filename = os.path.join(IMAGE_FOLDER, self.get_rel_path())
        return os.path.realpath(filename)

    def get_file_content(self):
        filepath = self.get_filepath()
        with open(filepath, 'rb') as f:
            file_content = BytesIO(f.read())
        return file_content

    def get_content(self, r_type='str'):
        return self.get_file_content()

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('file')

        obj_attrs.append(obj.add_attribute('sha256', value=self.id))
        obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['img'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        if 'content' in options:
            meta['content'] = self.get_content()
        if 'tags_safe' in options:
            meta['tags_safe'] = self.is_tags_safe(meta['tags'])
        return meta

    def create(self, content):
        filepath = self.get_filepath()
        dirname = os.path.dirname(filepath)
        if not os.path.exists(dirname):
            os.makedirs(dirname)
        with open(filepath, 'wb') as f:
            f.write(content)


def get_screenshot_dir():
    return IMAGE_FOLDER


def create(content, size_limit=5000000, b64=False, force=False):
    size = (len(content)*3) / 4
    if size <= size_limit or size_limit < 0 or force:
        if b64:
            content = base64.standard_b64decode(content.encode())
        image_id = sha256(content).hexdigest()
        image = Image(image_id)
        if not image.exists():
            image.create(content)
        return image


class Images(AbstractDaterangeObjects):
    """
    Images Objects
    """
    def __init__(self):
        super().__init__('image', Image)

    def sanitize_id_to_search(self, name_to_search):
        return name_to_search  # TODO


# if __name__ == '__main__':
#     name_to_search = '29ba'
#     print(search_screenshots_by_name(name_to_search))
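Reviewer note: `Image.get_rel_path()` above shards images on disk by slicing the sha256 id into 2-character directory levels, which keeps any single directory from growing unbounded. A standalone sketch of that scheme (`image_rel_path` and the sample bytes are illustrative):

```python
import os
from hashlib import sha256

# Sharding scheme from Image.get_rel_path(): six 2-character directory
# levels taken from the hex digest, remainder of the digest as file name.
def image_rel_path(image_id):
    return os.path.join(image_id[0:2], image_id[2:4], image_id[4:6],
                        image_id[6:8], image_id[8:10], image_id[10:12],
                        image_id[12:])

image_id = sha256(b'example image bytes').hexdigest()
print(image_rel_path(image_id))
```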
@@ -7,10 +7,10 @@ import magic
import os
import re
import sys
import cld3
import html2text

from io import BytesIO
from uuid import uuid4

from pymisp import MISPObject

@@ -18,10 +18,11 @@ sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ail_core import get_ail_uuid
from lib.ail_core import get_ail_uuid, rreplace
from lib.objects.abstract_object import AbstractObject
from lib.ConfigLoader import ConfigLoader
from lib import item_basic
from lib.Language import LanguagesDetector
from lib.data_retention_engine import update_obj_date, get_obj_date_first
from packages import Date

@@ -137,9 +138,23 @@ class Item(AbstractObject):
    ####################################################################################
    ####################################################################################

    def sanitize_id(self):
        pass
    # TODO ADD function to check if ITEM (content + file) already exists

    def sanitize_id(self):
        if ITEMS_FOLDER in self.id:
            self.id = self.id.replace(ITEMS_FOLDER, '', 1)

        # limit filename length
        basename = self.get_basename()
        if len(basename) > 255:
            new_basename = f'{basename[:215]}{str(uuid4())}.gz'
            self.id = rreplace(self.id, basename, new_basename, 1)

        return self.id

    # # TODO: sanitize_id
    # # TODO: check if already exists ?

@@ -211,9 +226,13 @@ class Item(AbstractObject):
        return {'style': '', 'icon': '', 'color': color, 'radius': 5}

    def get_misp_object(self):
        obj_date = self.get_date()
        obj = MISPObject('ail-leak', standalone=True)
        obj.first_seen = obj_date
        obj_date = self.get_date()
        if obj_date:
            obj.first_seen = obj_date
        else:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={obj_date}')

        obj_attrs = [obj.add_attribute('first-seen', value=obj_date),
                     obj.add_attribute('raw-data', value=self.id, data=self.get_raw_content()),

@@ -260,10 +279,9 @@ class Item(AbstractObject):
        """
        if options is None:
            options = set()
        meta = {'id': self.id,
                'date': self.get_date(separator=True),
                'source': self.get_source(),
                'tags': self.get_tags(r_list=True)}
        meta = self.get_default_meta(tags=True)
        meta['date'] = self.get_date(separator=True)
        meta['source'] = self.get_source()
        # optional meta fields
        if 'content' in options:
            meta['content'] = self.get_content()

@@ -283,6 +301,10 @@ class Item(AbstractObject):
        if 'mimetype' in options:
            content = meta.get('content')
            meta['mimetype'] = self.get_mimetype(content=content)
        if 'investigations' in options:
            meta['investigations'] = self.get_investigations()
        if 'link' in options:
            meta['link'] = self.get_link(flask_context=True)

        # meta['encoding'] = None
        return meta

@@ -316,21 +338,10 @@ class Item(AbstractObject):
            nb_line += 1
        return {'nb': nb_line, 'max_length': max_length}

    def get_languages(self, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7):
        all_languages = []
        ## CLEAN CONTENT ##
        content = self.get_html2text_content(ignore_links=True)
        content = remove_all_urls_from_content(self.id, item_content=content)
        # REMOVE USELESS SPACE
        content = ' '.join(content.split())
        #- CLEAN CONTENT -#
        #print(content)
        #print(len(content))
        if len(content) >= min_len:  # # TODO: # FIXME: check num langs limit
            for lang in cld3.get_frequent_languages(content, num_langs=num_langs):
                if lang.proportion >= min_proportion and lang.probability >= min_probability and lang.is_reliable:
                    all_languages.append(lang)
        return all_languages
    # TODO RENAME ME
    def get_languages(self, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7, force_gcld3=False):
        ld = LanguagesDetector(nb_langs=num_langs, min_proportion=min_proportion, min_probability=min_probability, min_len=min_len)
        return ld.detect(self.get_content(), force_gcld3=force_gcld3)

    def get_mimetype(self, content=None):
        if not content:

@@ -476,7 +487,10 @@ def get_all_items_objects(filters={}):
        daterange = Date.get_daterange(date_from, date_to)
    else:
        date_from = get_obj_date_first('item')
        daterange = Date.get_daterange(date_from, Date.get_today_date_str())
        if date_from:
            daterange = Date.get_daterange(date_from, Date.get_today_date_str())
        else:
            daterange = []
    if start_date:
        if int(start_date) > int(date_from):
            i = 0

@@ -615,61 +629,6 @@ def get_item_metadata(item_id, item_content=None):
def get_item_content(item_id):
    return item_basic.get_item_content(item_id)

def get_item_content_html2text(item_id, item_content=None, ignore_links=False):
    if not item_content:
        item_content = get_item_content(item_id)
    h = html2text.HTML2Text()
    h.ignore_links = ignore_links
    h.ignore_images = ignore_links
    return h.handle(item_content)

def remove_all_urls_from_content(item_id, item_content=None):
    if not item_content:
        item_content = get_item_content(item_id)
    regex = r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b'
    url_regex = re.compile(regex)
    urls = url_regex.findall(item_content)
    urls = sorted(urls, key=len, reverse=True)
    for url in urls:
        item_content = item_content.replace(url, '')

    regex_pgp_public_blocs = r'-----BEGIN PGP PUBLIC KEY BLOCK-----[\s\S]+?-----END PGP PUBLIC KEY BLOCK-----'
    regex_pgp_signature = r'-----BEGIN PGP SIGNATURE-----[\s\S]+?-----END PGP SIGNATURE-----'
    regex_pgp_message = r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----'
    re.compile(regex_pgp_public_blocs)
    re.compile(regex_pgp_signature)
    re.compile(regex_pgp_message)

    res = re.findall(regex_pgp_public_blocs, item_content)
    for it in res:
        item_content = item_content.replace(it, '')
    res = re.findall(regex_pgp_signature, item_content)
    for it in res:
        item_content = item_content.replace(it, '')
    res = re.findall(regex_pgp_message, item_content)
    for it in res:
        item_content = item_content.replace(it, '')

    return item_content

def get_item_languages(item_id, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7):
    all_languages = []

    ## CLEAN CONTENT ##
    content = get_item_content_html2text(item_id, ignore_links=True)
    content = remove_all_urls_from_content(item_id, item_content=content)

    # REMOVE USELESS SPACE
    content = ' '.join(content.split())
    #- CLEAN CONTENT -#

    #print(content)
    #print(len(content))
    if len(content) >= min_len:
        for lang in cld3.get_frequent_languages(content, num_langs=num_langs):
            if lang.proportion >= min_proportion and lang.probability >= min_probability and lang.is_reliable:
                all_languages.append(lang)
    return all_languages

# API
# def get_item(request_dict):

@@ -920,13 +879,13 @@ def create_item(obj_id, obj_metadata, io_content):
# delete_item(child_id)


if __name__ == '__main__':
# if __name__ == '__main__':
#     content = 'test file content'
#     duplicates = {'tests/2020/01/02/test.gz': [{'algo':'ssdeep', 'similarity':75}, {'algo':'tlsh', 'similarity':45}]}
#
#     item = Item('tests/2020/01/02/test_save.gz')
#     item = Item('tests/2020/01/02/test_save.gz')
#     item.create(content, _save=False)
    filters = {'date_from': '20230101', 'date_to': '20230501', 'sources': ['crawled', 'submitted'], 'start': ':submitted/2023/04/28/submitted_2b3dd861-a75d-48e4-8cec-6108d41450da.gz'}
    gen = get_all_items_objects(filters=filters)
    for obj_id in gen:
        print(obj_id.id)
# filters = {'date_from': '20230101', 'date_to': '20230501', 'sources': ['crawled', 'submitted'], 'start': ':submitted/2023/04/28/submitted_2b3dd861-a75d-48e4-8cec-6108d41450da.gz'}
# gen = get_all_items_objects(filters=filters)
# for obj_id in gen:
#     print(obj_id.id)
348
bin/lib/objects/Messages.py
Executable file
@@ -0,0 +1,348 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import re
import sys

from datetime import datetime

from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ail_core import get_ail_uuid
from lib.objects.abstract_object import AbstractObject
from lib.ConfigLoader import ConfigLoader
from lib import Language
from lib.objects import UsersAccount
from lib.data_retention_engine import update_obj_date, get_obj_date_first
# TODO Set all messages ???


from flask import url_for

config_loader = ConfigLoader()
r_cache = config_loader.get_redis_conn("Redis_Cache")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
# r_content = config_loader.get_db_conn("Kvrocks_Content")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None


# TODO SAVE OR EXTRACT MESSAGE SOURCE FOR ICON ?????????
# TODO iterate on all objects
# TODO also add support for small objects ????

# CAN Message exists without CHAT -> no convert it to object

# ID: source:chat_id:message_id ????
#
# /!\ handle null chat and message id -> chat = uuid and message = timestamp ???


# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<message ID>  => telegram without channels
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Channel ID>/<message ID>
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Thread ID>/<message ID>
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Channel ID>/<Thread ID>/<message ID>
class Message(AbstractObject):
    """
    AIL Message Object. (strings)
    """

    def __init__(self, id):  # TODO subtype or use source ????
        super(Message, self).__init__('message', id)  # message::< telegram/1692189934.380827/ChatID_MessageID >

    def exists(self):
        if self.subtype is None:
            return r_object.exists(f'meta:{self.type}:{self.id}')
        else:
            return r_object.exists(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')

    def get_source(self):
        """
        Returns source/feeder name
        """
        l_source = self.id.split('/')[:-2]
        return os.path.join(*l_source)

    def get_basename(self):
        return os.path.basename(self.id)

    def get_content(self, r_type='str'):  # TODO ADD cache # TODO Compress content ???????
        """
        Returns content
        """
        global_id = self.get_global_id()
        content = r_cache.get(f'content:{global_id}')
        if not content:
            content = self._get_field('content')
            if content:
                r_cache.set(f'content:{global_id}', content)
                r_cache.expire(f'content:{global_id}', 300)
        if r_type == 'str':
            return content
        elif r_type == 'bytes':
            return content.encode()

    def get_date(self):
        timestamp = self.get_timestamp()
        return datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')

    def get_timestamp(self):
        dirs = self.id.split('/')
        return dirs[1]

    def get_message_id(self):  # TODO optimize
        message_id = self.get_basename().rsplit('/', 1)[1]
        # if message_id.endswith('.gz'):
        #     message_id = message_id[:-3]
        return message_id

    def get_chat_id(self):  # TODO optimize -> use me to tag Chat
        chat_id = self.get_basename().rsplit('_', 1)[0]
        return chat_id

    def get_thread(self):
        for child in self.get_childrens():
            obj_type, obj_subtype, obj_id = child.split(':', 2)
            if obj_type == 'chat-thread':
                nb_messages = r_object.zcard(f'messages:{obj_type}:{obj_subtype}:{obj_id}')
                return {'type': obj_type, 'subtype': obj_subtype, 'id': obj_id, 'nb': nb_messages}

    # TODO get Instance ID
    # TODO get channel ID
    # TODO get thread ID

    def get_images(self):
        images = []
        for child in self.get_childrens():
            obj_type, _, obj_id = child.split(':', 2)
            if obj_type == 'image':
                images.append(obj_id)
        return images

    def get_user_account(self, meta=False):
        user_account = self.get_correlation('user-account')
        if user_account.get('user-account'):
            user_account = f'user-account:{user_account["user-account"].pop()}'
            if meta:
                _, user_account_subtype, user_account_id = user_account.split(':', 3)
                user_account = UsersAccount.UserAccount(user_account_id, user_account_subtype).get_meta(options={'icon', 'username', 'username_meta'})
        return user_account

    def get_files_names(self):
        names = []
        filenames = self.get_correlation('file-name').get('file-name')
        if filenames:
            for name in filenames:
                names.append(name[1:])
        return names

    def get_reactions(self):
        return r_object.hgetall(f'meta:reactions:{self.type}::{self.id}')

    # TODO sanitize reactions
    def add_reaction(self, reactions, nb_reaction):
        r_object.hset(f'meta:reactions:{self.type}::{self.id}', reactions, nb_reaction)

    # Interactions between users -> use replies
    # nb views
    # MENTIONS -> Messages + Chats
    #     # relationship -> mention  - Chat -> Chat
    #                               - Message -> Chat
    #                               - Message -> Message ??? fetch mentioned messages
    # FORWARDS
    # TODO Create forward CHAT -> message
    #               message (is forwarded) -> message (is forwarded from) ???
    #     # TODO get source message timestamp
    #
    #     # is forwarded
    #     # forwarded from -> check if relationship
    #     # nb forwarded -> scard relationship
    #
    #     Messages -> CHATS -> NB forwarded
    #     CHAT -> NB forwarded by chats -> NB messages -> parse full set ????
    #
    # show users chats
    # message media
    # flag is deleted -> event or missing from feeder pass ???

    def get_translation(self, content=None, source=None, target='fr'):
        """
        Returns translated content
        """

        # return self._get_field('translated')
        global_id = self.get_global_id()
        translation = r_cache.get(f'translation:{target}:{global_id}')
        r_cache.expire(f'translation:{target}:{global_id}', 0)
        if translation:
            return translation
        if not content:
            content = self.get_content()
        translation = Language.LanguageTranslator().translate(content, source=source, target=target)
        if translation:
            r_cache.set(f'translation:{target}:{global_id}', translation)
            r_cache.expire(f'translation:{target}:{global_id}', 300)
        return translation

    def _set_translation(self, translation):
        """
        Set translated content
        """
        return self._set_field('translated', translation)  # translation by hash ??? -> avoid translating multiple time

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True)}
    #     return payload

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('chats_explorer.objects_message', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/objects/message?id={self.id}'
        return url

    def get_svg_icon(self):
        return {'style': 'fas', 'icon': '\uf4ad', 'color': '#4dffff', 'radius': 5}

    def get_misp_object(self):  # TODO
        obj = MISPObject('instant-message', standalone=True)
        obj_date = self.get_date()
        if obj_date:
            obj.first_seen = obj_date
        else:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={obj_date}')

        # obj_attrs = [obj.add_attribute('first-seen', value=obj_date),
        #              obj.add_attribute('raw-data', value=self.id, data=self.get_raw_content()),
        #              obj.add_attribute('sensor', value=get_ail_uuid())]
        obj_attrs = []
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    # def get_url(self):
    #     return r_object.hget(f'meta:item::{self.id}', 'url')

    # options: set of optional meta fields
    def get_meta(self, options=None, timestamp=None, translation_target='en'):
        """
        :type options: set
        :type timestamp: float
        """
        if options is None:
            options = set()
        meta = self.get_default_meta(tags=True)

        # timestamp
        if not timestamp:
            timestamp = self.get_timestamp()
        else:
            timestamp = float(timestamp)
        timestamp = datetime.fromtimestamp(float(timestamp))
        meta['date'] = timestamp.strftime('%Y/%m/%d')
        meta['hour'] = timestamp.strftime('%H:%M:%S')
        meta['full_date'] = timestamp.isoformat(' ')

        meta['source'] = self.get_source()
        # optional meta fields
        if 'content' in options:
            meta['content'] = self.get_content()
        if 'parent' in options:
            meta['parent'] = self.get_parent()
            if meta['parent'] and 'parent_meta' in options:
                options.remove('parent')
                parent_type, _, parent_id = meta['parent'].split(':', 3)
                if parent_type == 'message':
                    message = Message(parent_id)
                    meta['reply_to'] = message.get_meta(options=options, translation_target=translation_target)
        if 'investigations' in options:
            meta['investigations'] = self.get_investigations()
        if 'link' in options:
            meta['link'] = self.get_link(flask_context=True)
        if 'icon' in options:
            meta['icon'] = self.get_svg_icon()
        if 'user-account' in options:
            meta['user-account'] = self.get_user_account(meta=True)
            if not meta['user-account']:
                meta['user-account'] = {'id': 'UNKNOWN'}
        if 'chat' in options:
            meta['chat'] = self.get_chat_id()
        if 'thread' in options:
            thread = self.get_thread()
            if thread:
                meta['thread'] = thread
        if 'images' in options:
            meta['images'] = self.get_images()
        if 'files-names' in options:
            meta['files-names'] = self.get_files_names()
        if 'reactions' in options:
            meta['reactions'] = self.get_reactions()
        if 'translation' in options and translation_target:
            meta['translation'] = self.translate(content=meta.get('content'), target=translation_target)

        # meta['encoding'] = None
        return meta

    # def translate(self, content=None):  # TODO translation plugin
    #     # TODO get text language
    #     if not content:
    #         content = self.get_content()
    #     translated = argostranslate.translate.translate(content, 'ru', 'en')
    #     # Save translation
    #     self._set_translation(translated)
    #     return translated

    def create(self, content, translation=None, tags=[]):
        self._set_field('content', content)
        # r_content.get(f'content:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', content)
        if translation:
            self._set_translation(translation)
        for tag in tags:
            self.add_tag(tag)

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        pass


def create_obj_id(chat_instance, chat_id, message_id, timestamp, channel_id=None, thread_id=None):  # TODO CHECK COLLISIONS
    timestamp = int(timestamp)
    if channel_id and thread_id:
        return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
    elif channel_id:
        return f'{chat_instance}/{timestamp}/{channel_id}/{chat_id}/{message_id}'
    elif thread_id:
        return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
    else:
        return f'{chat_instance}/{timestamp}/{chat_id}/{message_id}'

# thread id of message
# thread id of chat
# thread id of subchannel

# TODO Check if already exists
# def create(source, chat_id, message_id, timestamp, content, tags=[]):
def create(obj_id, content, translation=None, tags=[]):
    message = Message(obj_id)
    # if not message.exists():
    message.create(content, translation=translation, tags=tags)
    return message


# TODO Encode translation


if __name__ == '__main__':
    r = 'test'
    print(r)
|
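The id layout produced by `create_obj_id` can be checked in isolation. This is a minimal standalone copy of the helper above; the sample instance, chat, and message ids are made up for illustration:

```python
def create_obj_id(chat_instance, chat_id, message_id, timestamp,
                  channel_id=None, thread_id=None):
    # Mirror of the helper above: components are '/'-joined,
    # the timestamp is coerced to integer seconds
    timestamp = int(timestamp)
    if channel_id and thread_id:
        return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
    elif channel_id:
        return f'{chat_instance}/{timestamp}/{channel_id}/{chat_id}/{message_id}'
    elif thread_id:
        return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
    else:
        return f'{chat_instance}/{timestamp}/{chat_id}/{message_id}'

# hypothetical sample values
print(create_obj_id('00098785', '1337', '42', 1700000000.5))
# 00098785/1700000000/1337/42
```

Note that a fractional timestamp is truncated, so two messages posted within the same second in the same chat would collide, which is what the `TODO CHECK COLLISIONS` comment flags.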
|
@@ -71,8 +71,15 @@ class Pgp(AbstractSubtypeObject):
     def get_misp_object(self):
         obj_attrs = []
         obj = MISPObject('pgp-meta')
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
 
         if self.subtype == 'key':
             obj_attrs.append(obj.add_attribute('key-id', value=self.id))
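The None-guarded first/last seen handling introduced here recurs in several exporters in this commit (Pgp, Username, Title, UserAccount). The control flow can be sketched standalone, with a plain placeholder object standing in for pymisp's `MISPObject` so the sketch has no external dependency:

```python
class _Obj:
    # stand-in for MISPObject: just holds attributes
    pass

def apply_seen_dates(obj, first_seen, last_seen, warn):
    # Only assign the fields when a value exists; warn if either is missing
    if first_seen:
        obj.first_seen = first_seen
    if last_seen:
        obj.last_seen = last_seen
    if not first_seen or not last_seen:
        warn(f'Export error, None seen, first={first_seen}, last={last_seen}')

warnings = []
o = _Obj()
apply_seen_dates(o, 20231001, None, warnings.append)
print(o.first_seen, len(warnings))  # 20231001 1
```

The point of the change is that a never-seen object no longer writes `None` into the MISP object; it logs a warning instead.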
@@ -9,6 +9,7 @@ import sys
 from hashlib import sha256
+from io import BytesIO
 from flask import url_for
 from pymisp import MISPObject
 
 sys.path.append(os.environ['AIL_BIN'])
 ##################################

@@ -79,15 +80,15 @@ class Screenshot(AbstractObject):
         obj_attrs = []
         obj = MISPObject('file')
 
-        obj_attrs.append( obj.add_attribute('sha256', value=self.id) )
-        obj_attrs.append( obj.add_attribute('attachment', value=self.id, data=self.get_file_content()) )
+        obj_attrs.append(obj.add_attribute('sha256', value=self.id))
+        obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
         for obj_attr in obj_attrs:
             for tag in self.get_tags():
                 obj_attr.add_tag(tag)
         return obj
 
     def get_meta(self, options=set()):
-        meta = {'id': self.id}
+        meta = self.get_default_meta()
         meta['img'] = get_screenshot_rel_path(self.id)  ######### # TODO: Rename ME ??????
         meta['tags'] = self.get_tags(r_list=True)
         if 'tags_safe' in options:
bin/lib/objects/Titles.py (new executable file, 123 lines)
@@ -0,0 +1,123 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from hashlib import sha256
from flask import url_for

# import warnings
# warnings.filterwarnings("ignore", category=DeprecationWarning)
from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects

config_loader = ConfigLoader()
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None


class Title(AbstractDaterangeObject):
    """
    AIL Title Object.
    """

    def __init__(self, id):
        super(Title, self).__init__('title', id)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_content(self, r_type='str'):
        if r_type == 'str':
            return self._get_field('content')
        elif r_type == 'bytes':
            return self._get_field('content').encode()

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
        return url

    def get_svg_icon(self):
        return {'style': 'fas', 'icon': '\uf1dc', 'color': '#3C7CFF', 'radius': 5}

    def get_misp_object(self):
        obj_attrs = []
        obj = MISPObject('tsk-web-history')
        first_seen = self.get_first_seen()
        last_seen = self.get_last_seen()
        if first_seen:
            obj.first_seen = first_seen
        if last_seen:
            obj.last_seen = last_seen
        if not first_seen or not last_seen:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')

        obj_attrs.append(obj.add_attribute('title', value=self.get_content()))
        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj

    def get_meta(self, options=set()):
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['tags'] = self.get_tags(r_list=True)
        meta['content'] = self.get_content()
        return meta

    def create(self, content, _first_seen=None, _last_seen=None):
        self._set_field('content', content)
        self._create()


def create_title(content):
    title_id = sha256(content.encode()).hexdigest()
    title = Title(title_id)
    if not title.exists():
        title.create(content)
    return title


class Titles(AbstractDaterangeObjects):
    """
    Titles Objects
    """
    def __init__(self):
        super().__init__('title', Title)

    def sanitize_id_to_search(self, name_to_search):
        return name_to_search


# if __name__ == '__main__':
#     # from lib import crawlers
#     # from lib.objects import Items
#     # for item in Items.get_all_items_objects(filters={'sources': ['crawled']}):
#     #     title_content = crawlers.extract_title_from_html(item.get_content())
#     #     if title_content:
#     #         print(item.id, title_content)
#     #         title = create_title(title_content)
#     #         title.add(item.get_date(), item.id)
#     titles = Titles()
#     # for r in titles.get_ids_iterator():
#     #     print(r)
#     r = titles.search_by_id('f7d57B', r_pos=True, case_sensitive=False)
#     print(r)
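`create_title` derives the object id from the content itself, so identical page titles always map to the same AIL object and deduplicate for free. The scheme is easy to verify on its own (the sample string is arbitrary):

```python
from hashlib import sha256

def title_id(content):
    # Same derivation as create_title(): sha256 hex digest of the raw content
    return sha256(content.encode()).hexdigest()

a = title_id('Example Domain')
b = title_id('Example Domain')
print(a == b, len(a))  # True 64
```

Because the id is a 64-character hex digest, it is also safe to use in URLs and Redis keys without escaping.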
@@ -82,8 +82,16 @@ class Username(AbstractSubtypeObject):
         obj = MISPObject('user-account', standalone=True)
         obj_attrs.append(obj.add_attribute('username', value=self.id))
 
-        obj.first_seen = self.get_first_seen()
-        obj.last_seen = self.get_last_seen()
+        first_seen = self.get_first_seen()
+        last_seen = self.get_last_seen()
+        if first_seen:
+            obj.first_seen = first_seen
+        if last_seen:
+            obj.last_seen = last_seen
+        if not first_seen or not last_seen:
+            self.logger.warning(
+                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
 
         for obj_attr in obj_attrs:
             for tag in self.get_tags():
                 obj_attr.add_tag(tag)
bin/lib/objects/UsersAccount.py (new executable file, 216 lines)
@@ -0,0 +1,216 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys
# import re

# from datetime import datetime
from flask import url_for
from pymisp import MISPObject

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_core
from lib.ConfigLoader import ConfigLoader
from lib.objects.abstract_subtype_object import AbstractSubtypeObject, get_all_id
from lib.timeline_engine import Timeline
from lib.objects import Usernames


config_loader = ConfigLoader()
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
config_loader = None


################################################################################
################################################################################
################################################################################

class UserAccount(AbstractSubtypeObject):
    """
    AIL User Object. (strings)
    """

    def __init__(self, id, subtype):
        super(UserAccount, self).__init__('user-account', id, subtype)

    # def get_ail_2_ail_payload(self):
    #     payload = {'raw': self.get_gzip_content(b64=True),
    #                'compress': 'gzip'}
    #     return payload

    # # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
    def delete(self):
        # # TODO:
        pass

    def get_link(self, flask_context=False):
        if flask_context:
            url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
        else:
            url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
        return url

    def get_svg_icon(self):  # TODO change icon/color
        return {'style': 'fas', 'icon': '\uf2bd', 'color': '#4dffff', 'radius': 5}

    def get_first_name(self):
        return self._get_field('firstname')

    def get_last_name(self):
        return self._get_field('lastname')

    def get_phone(self):
        return self._get_field('phone')

    def set_first_name(self, firstname):
        return self._set_field('firstname', firstname)

    def set_last_name(self, lastname):
        return self._set_field('lastname', lastname)

    def set_phone(self, phone):
        return self._set_field('phone', phone)

    def get_icon(self):
        icon = self._get_field('icon')
        if icon:
            return icon.rsplit(':', 1)[1]

    def set_icon(self, icon):
        self._set_field('icon', icon)

    def get_info(self):
        return self._get_field('info')

    def set_info(self, info):
        return self._set_field('info', info)

    # def get_created_at(self, date=False):
    #     created_at = self._get_field('created_at')
    #     if date and created_at:
    #         created_at = datetime.fromtimestamp(float(created_at))
    #         created_at = created_at.isoformat(' ')
    #     return created_at

    # TODO MESSAGES:
    # 1) ALL MESSAGES + NB
    # 2) ALL MESSAGES TIMESTAMP
    # 3) ALL MESSAGES TIMESTAMP By: - chats
    #                               - subchannel
    #                               - thread

    def get_chats(self):
        chats = self.get_correlation('chat')['chat']
        return chats

    def get_chat_subchannels(self):
        chats = self.get_correlation('chat-subchannel')['chat-subchannel']
        return chats

    def get_chat_threads(self):
        chats = self.get_correlation('chat-thread')['chat-thread']
        return chats

    def _get_timeline_username(self):
        return Timeline(self.get_global_id(), 'username')

    def get_username(self):
        return self._get_timeline_username().get_last_obj_id()

    def get_usernames(self):
        return self._get_timeline_username().get_objs_ids()

    def update_username_timeline(self, username_global_id, timestamp):
        self._get_timeline_username().add_timestamp(timestamp, username_global_id)

    def get_messages_by_chat_obj(self, chat_obj):
        messages = []
        for mess in self.get_correlation_iter_obj(chat_obj, 'message'):
            messages.append(f'message:{mess}')
        return messages

    def get_meta(self, options=set(), translation_target=None):  # TODO Username timeline
        meta = self._get_meta(options=options)
        meta['id'] = self.id
        meta['subtype'] = self.subtype
        meta['tags'] = self.get_tags(r_list=True)  # TODO add in options ????
        if 'username' in options:
            meta['username'] = self.get_username()
            if meta['username']:
                _, username_account_subtype, username_account_id = meta['username'].split(':', 2)
                if 'username_meta' in options:
                    meta['username'] = Usernames.Username(username_account_id, username_account_subtype).get_meta()
                else:
                    meta['username'] = {'type': 'username', 'subtype': username_account_subtype, 'id': username_account_id}
        if 'usernames' in options:
            meta['usernames'] = self.get_usernames()
        if 'icon' in options:
            meta['icon'] = self.get_icon()
        if 'info' in options:
            meta['info'] = self.get_info()
        if 'translation' in options and translation_target:
            meta['translation_info'] = self.translate(meta['info'], field='info', target=translation_target)
        # if 'created_at':
        #     meta['created_at'] = self.get_created_at(date=True)
        if 'chats' in options:
            meta['chats'] = self.get_chats()
        if 'subchannels' in options:
            meta['subchannels'] = self.get_chat_subchannels()
        if 'threads' in options:
            meta['threads'] = self.get_chat_threads()
        return meta

    def get_misp_object(self):
        obj_attrs = []
        if self.subtype == 'telegram':
            obj = MISPObject('telegram-account', standalone=True)
            obj_attrs.append(obj.add_attribute('username', value=self.id))

        elif self.subtype == 'twitter':
            obj = MISPObject('twitter-account', standalone=True)
            obj_attrs.append(obj.add_attribute('name', value=self.id))

        else:
            obj = MISPObject('user-account', standalone=True)
            obj_attrs.append(obj.add_attribute('username', value=self.id))

        first_seen = self.get_first_seen()
        last_seen = self.get_last_seen()
        if first_seen:
            obj.first_seen = first_seen
        if last_seen:
            obj.last_seen = last_seen
        if not first_seen or not last_seen:
            self.logger.warning(
                f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')

        for obj_attr in obj_attrs:
            for tag in self.get_tags():
                obj_attr.add_tag(tag)
        return obj


def get_user_by_username():
    pass

def get_all_subtypes():
    return ail_core.get_object_all_subtypes('user-account')

def get_all():
    users = {}
    for subtype in get_all_subtypes():
        users[subtype] = get_all_by_subtype(subtype)
    return users

def get_all_by_subtype(subtype):
    return get_all_id('user-account', subtype)


if __name__ == '__main__':
    from lib.objects import Chats
    chat = Chats.Chat('', '00098785-7e70-5d12-a120-c5cdc1252b2b')
    account = UserAccount('', '00098785-7e70-5d12-a120-c5cdc1252b2b')
    print(account.get_messages_by_chat_obj(chat))
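The subtype dispatch in `UserAccount.get_misp_object` boils down to a small lookup table. This sketch captures only the mapping; the real method builds pymisp objects and attaches attributes and tags:

```python
# (misp_object_name, attribute_name) per account subtype;
# any other subtype falls back to the generic user-account template
MISP_TEMPLATES = {
    'telegram': ('telegram-account', 'username'),
    'twitter': ('twitter-account', 'name'),
}

def misp_template(subtype):
    return MISP_TEMPLATES.get(subtype, ('user-account', 'username'))

print(misp_template('twitter'))   # ('twitter-account', 'name')
print(misp_template('discord'))   # ('user-account', 'username')
```

Keeping the dispatch table-driven like this would make adding a new platform a one-line change rather than another `elif` branch.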
bin/lib/objects/abstract_chat_object.py (new executable file, 306 lines)
@@ -0,0 +1,306 @@
# -*-coding:UTF-8 -*
"""
Base Class for AIL Objects
"""

##################################
# Import External packages
##################################
import os
import sys
import time
from abc import ABC

from datetime import datetime
# from flask import url_for

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.objects.abstract_subtype_object import AbstractSubtypeObject
from lib.ail_core import unpack_correl_objs_id, zscan_iter ################
from lib.ConfigLoader import ConfigLoader
from lib.objects import Messages
from packages import Date

# from lib.data_retention_engine import update_obj_date


# LOAD CONFIG
config_loader = ConfigLoader()
r_cache = config_loader.get_redis_conn("Redis_Cache")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
config_loader = None

# # FIXME: SAVE SUBTYPE NAMES ?????

class AbstractChatObject(AbstractSubtypeObject, ABC):
    """
    Abstract Chat Object
    """

    def __init__(self, obj_type, id, subtype):
        """ Abstract for all the AIL chat objects

        :param obj_type: object type (chat, chat-subchannel, ...)
        :param id: Object ID
        """
        super().__init__(obj_type, id, subtype)

    # get useraccount / username
    # get users ?
    # timeline name ????
    # info
    # created
    # last imported/updated

    # TODO get instance
    # TODO get protocol
    # TODO get network
    # TODO get address

    def get_chat(self):  # require ail object TODO ##
        if self.type != 'chat':
            parent = self.get_parent()
            if parent:
                obj_type, _ = parent.split(':', 1)
                if obj_type == 'chat':
                    return parent

    def get_subchannels(self):
        subchannels = []
        if self.type == 'chat':  # category ???
            for obj_global_id in self.get_childrens():
                obj_type, _ = obj_global_id.split(':', 1)
                if obj_type == 'chat-subchannel':
                    subchannels.append(obj_global_id)
        return subchannels

    def get_nb_subchannels(self):
        nb = 0
        if self.type == 'chat':
            for obj_global_id in self.get_childrens():
                obj_type, _ = obj_global_id.split(':', 1)
                if obj_type == 'chat-subchannel':
                    nb += 1
        return nb

    def get_threads(self):
        threads = []
        for child in self.get_childrens():
            obj_type, obj_subtype, obj_id = child.split(':', 2)
            if obj_type == 'chat-thread':
                threads.append({'type': obj_type, 'subtype': obj_subtype, 'id': obj_id})
        return threads

    def get_created_at(self, date=False):
        created_at = self._get_field('created_at')
        if date and created_at:
            created_at = datetime.fromtimestamp(float(created_at))
            created_at = created_at.isoformat(' ')
        return created_at

    def set_created_at(self, timestamp):
        self._set_field('created_at', timestamp)

    def get_name(self):
        name = self._get_field('name')
        if not name:
            name = ''
        return name

    def set_name(self, name):
        self._set_field('name', name)

    def get_icon(self):
        icon = self._get_field('icon')
        if icon:
            return icon.rsplit(':', 1)[1]

    def set_icon(self, icon):
        self._set_field('icon', icon)

    def get_info(self):
        return self._get_field('info')

    def set_info(self, info):
        self._set_field('info', info)

    def get_nb_messages(self):
        return r_object.zcard(f'messages:{self.type}:{self.subtype}:{self.id}')

    def _get_messages(self, nb=-1, page=-1):
        if nb < 1:
            messages = r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, -1, withscores=True)
            nb_pages = 0
            page = 1
            total = len(messages)
            nb_first = 1
            nb_last = total
        else:
            total = r_object.zcard(f'messages:{self.type}:{self.subtype}:{self.id}')
            nb_pages = total / nb
            if not nb_pages.is_integer():
                nb_pages = int(nb_pages) + 1
            else:
                nb_pages = int(nb_pages)
            if page > nb_pages or page < 1:
                page = nb_pages

            if page > 1:
                start = (page - 1) * nb
            else:
                start = 0
            messages = r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', start, start+nb-1, withscores=True)
            # if messages:
            #     messages = reversed(messages)
            nb_first = start+1
            nb_last = start+nb
            if nb_last > total:
                nb_last = total
        return messages, {'nb': nb, 'page': page, 'nb_pages': nb_pages, 'total': total, 'nb_first': nb_first, 'nb_last': nb_last}

    def get_timestamp_first_message(self):
        return r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0, withscores=True)

    def get_timestamp_last_message(self):
        return r_object.zrevrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0, withscores=True)

    def get_first_message(self):
        return r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0)

    def get_last_message(self):
        return r_object.zrevrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0)

    def get_nb_message_by_hours(self, date_day, nb_day):
        hours = []
        # start=0, end=23
        timestamp = time.mktime(datetime.strptime(date_day, "%Y%m%d").timetuple())
        for i in range(24):
            timestamp_end = timestamp + 3600
            nb_messages = r_object.zcount(f'messages:{self.type}:{self.subtype}:{self.id}', timestamp, timestamp_end)
            timestamp = timestamp_end
            hours.append({'date': f'{date_day[0:4]}-{date_day[4:6]}-{date_day[6:8]}', 'day': nb_day, 'hour': i, 'count': nb_messages})
        return hours

    def get_nb_message_by_week(self, date_day):
        date_day = Date.get_date_week_by_date(date_day)
        week_messages = []
        i = 0
        for date in Date.daterange_add_days(date_day, 6):
            week_messages = week_messages + self.get_nb_message_by_hours(date, i)
            i += 1
        return week_messages

    def get_nb_message_this_week(self):
        week_date = Date.get_current_week_day()
        return self.get_nb_message_by_week(week_date)

    def get_message_meta(self, message, timestamp=None, translation_target='en'):  # TODO handle file message
        message = Messages.Message(message[9:])
        meta = message.get_meta(options={'content', 'files-names', 'images', 'link', 'parent', 'parent_meta', 'reactions', 'thread', 'translation', 'user-account'}, timestamp=timestamp, translation_target=translation_target)
        return meta

    def get_messages(self, start=0, page=-1, nb=500, unread=False, translation_target='en'):  # threads ???? # TODO ADD last/first message timestamp + return page
        # TODO return message meta
        tags = {}
        messages = {}
        curr_date = None
        try:
            nb = int(nb)
        except (TypeError, ValueError):
            nb = 500
        if not page:
            page = -1
        try:
            page = int(page)
        except (TypeError, ValueError):
            page = 1
        mess, pagination = self._get_messages(nb=nb, page=page)
        for message in mess:
            timestamp = message[1]
            date_day = datetime.fromtimestamp(timestamp).strftime('%Y/%m/%d')
            if date_day != curr_date:
                messages[date_day] = []
                curr_date = date_day
            mess_dict = self.get_message_meta(message[0], timestamp=timestamp, translation_target=translation_target)
            messages[date_day].append(mess_dict)

            if mess_dict.get('tags'):
                for tag in mess_dict['tags']:
                    if tag not in tags:
                        tags[tag] = 0
                    tags[tag] += 1
        return messages, pagination, tags

    # TODO REWRITE ADD OR ADD MESSAGE ????
    # add
    # add message

    def get_obj_by_message_id(self, message_id):
        return r_object.hget(f'messages:ids:{self.type}:{self.subtype}:{self.id}', message_id)

    def add_message_cached_reply(self, reply_id, message_id):
        r_cache.sadd(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{reply_id}', message_id)
        r_cache.expire(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{reply_id}', 600)

    def _get_message_cached_reply(self, message_id):
        return r_cache.smembers(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{message_id}')

    def get_cached_message_reply(self, message_id):
        objs_global_id = []
        for mess_id in self._get_message_cached_reply(message_id):
            obj_global_id = self.get_obj_by_message_id(mess_id)  # TODO CATCH EXCEPTION
            if obj_global_id:
                objs_global_id.append(obj_global_id)
        return objs_global_id

    def add_message(self, obj_global_id, message_id, timestamp, reply_id=None):
        r_object.hset(f'messages:ids:{self.type}:{self.subtype}:{self.id}', message_id, obj_global_id)
        r_object.zadd(f'messages:{self.type}:{self.subtype}:{self.id}', {obj_global_id: float(timestamp)})

        # MESSAGE REPLY
        if reply_id:
            reply_obj = self.get_obj_by_message_id(reply_id)  # TODO CATCH EXCEPTION
            if reply_obj:
                self.add_obj_children(reply_obj, obj_global_id)
            else:
                self.add_message_cached_reply(reply_id, message_id)
        # CACHED REPLIES
        for mess_id in self.get_cached_message_reply(message_id):
            self.add_obj_children(obj_global_id, mess_id)

    # def get_deleted_messages(self, message_id):

    def get_participants(self):
        return unpack_correl_objs_id('user-account', self.get_correlation('user-account')['user-account'], r_type='dict')

    def get_nb_participants(self):
        return self.get_nb_correlation('user-account')


# TODO move me to abstract subtype
class AbstractChatObjects(ABC):
    def __init__(self, type):
        self.type = type

    def add_subtype(self, subtype):
        r_object.sadd(f'all_{self.type}:subtypes', subtype)

    def get_subtypes(self):
        return r_object.smembers(f'all_{self.type}:subtypes')

    def get_nb_ids_by_subtype(self, subtype):
        return r_object.zcard(f'{self.type}_all:{subtype}')

    def get_ids_by_subtype(self, subtype):
        return r_object.zrange(f'{self.type}_all:{subtype}', 0, -1)

    def get_all_id_iterator_iter(self, subtype):
        return zscan_iter(r_object, f'{self.type}_all:{subtype}')

    def get_ids(self):
        pass

    def search(self):
        pass
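The pagination arithmetic in `_get_messages` (ceiling page count, out-of-range pages clamped to the last page, 1-based first/last indices) is worth isolating. This is a sketch over plain integers instead of a Redis sorted set:

```python
def paginate(total, nb, page):
    # ceil(total / nb) pages, as in _get_messages()
    nb_pages = total // nb + (1 if total % nb else 0)
    if page > nb_pages or page < 1:
        page = nb_pages  # clamp out-of-range pages to the last page
    start = (page - 1) * nb if page > 1 else 0
    nb_first = start + 1
    nb_last = min(start + nb, total)
    return {'page': page, 'nb_pages': nb_pages,
            'nb_first': nb_first, 'nb_last': nb_last}

print(paginate(total=1201, nb=500, page=9))
# {'page': 3, 'nb_pages': 3, 'nb_first': 1001, 'nb_last': 1201}
```

The clamp means a client asking for a page past the end gets the last page rather than an empty result, which suits a chat view that opens on the most recent messages.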
@@ -7,6 +7,7 @@ Base Class for AIL Objects
 # Import External packages
 ##################################
 import os
+import re
 import sys
 from abc import abstractmethod, ABC
 

@@ -44,8 +45,14 @@ class AbstractDaterangeObject(AbstractObject, ABC):
     def exists(self):
         return r_object.exists(f'meta:{self.type}:{self.id}')
 
+    def _get_field(self, field):  # TODO remove me (NEW in abstract)
+        return r_object.hget(f'meta:{self.type}:{self.id}', field)
+
+    def _set_field(self, field, value):  # TODO remove me (NEW in abstract)
+        return r_object.hset(f'meta:{self.type}:{self.id}', field, value)
+
     def get_first_seen(self, r_int=False):
-        first_seen = r_object.hget(f'meta:{self.type}:{self.id}', 'first_seen')
+        first_seen = self._get_field('first_seen')
         if r_int:
             if first_seen:
                 return int(first_seen)

@@ -55,7 +62,7 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         return first_seen
 
     def get_last_seen(self, r_int=False):
-        last_seen = r_object.hget(f'meta:{self.type}:{self.id}', 'last_seen')
+        last_seen = self._get_field('last_seen')
         if r_int:
             if last_seen:
                 return int(last_seen)

@@ -64,8 +71,8 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         else:
             return last_seen
 
-    def get_nb_seen(self):
-        return self.get_nb_correlation('item')
+    def get_nb_seen(self):  # TODO REPLACE ME -> correlation image
+        return self.get_nb_correlation('item') + self.get_nb_correlation('message')
 
     def get_nb_seen_by_date(self, date):
         nb = r_object.zscore(f'{self.type}:date:{date}', self.id)

@@ -75,18 +82,19 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         return int(nb)
 
     def _get_meta(self, options=[]):
-        meta_dict = {'first_seen': self.get_first_seen(),
-                     'last_seen': self.get_last_seen(),
-                     'nb_seen': self.get_nb_seen()}
+        meta_dict = self.get_default_meta()
+        meta_dict['first_seen'] = self.get_first_seen()
+        meta_dict['last_seen'] = self.get_last_seen()
+        meta_dict['nb_seen'] = self.get_nb_seen()
         if 'sparkline' in options:
             meta_dict['sparkline'] = self.get_sparkline()
         return meta_dict
 
     def set_first_seen(self, first_seen):
-        r_object.hset(f'meta:{self.type}:{self.id}', 'first_seen', first_seen)
+        self._set_field('first_seen', first_seen)
 
     def set_last_seen(self, last_seen):
-        r_object.hset(f'meta:{self.type}:{self.id}', 'last_seen', last_seen)
+        self._set_field('last_seen', last_seen)
 
     def update_daterange(self, date):
         date = int(date)

@@ -117,9 +125,7 @@ class AbstractDaterangeObject(AbstractObject, ABC):
     def _add_create(self):
         r_object.sadd(f'{self.type}:all', self.id)
 
-    # TODO don't increase nb if same hash in item with different encoding
-    # if hash already in item
-    def _add(self, date, item_id):
+    def _add(self, date, obj):  # TODO OBJ=None
         if not self.exists():
             self._add_create()
             self.set_first_seen(date)

@@ -128,22 +134,132 @@ class AbstractDaterangeObject(AbstractObject, ABC):
         self.update_daterange(date)
         update_obj_date(date, self.type)
 
         # NB Object seen by day
-        if not self.is_correlated('item', '', item_id):  # if decoded not already in object
-            r_object.zincrby(f'{self.type}:date:{date}', 1, self.id)
+        r_object.zincrby(f'{self.type}:date:{date}', 1, self.id)
 
-        # Correlations
-        self.add_correlation('item', '', item_id)
-        if is_crawled(item_id):  # Domain
-            domain = get_item_domain(item_id)
-            self.add_correlation('domain', '', domain)
+        if obj:
+            # Correlations
+            self.add_correlation(obj.type, obj.get_subtype(r_str=True), obj.get_id())
+
+            if obj.type == 'item':
+                item_id = obj.get_id()
+                # domain
+                if is_crawled(item_id):
+                    domain = get_item_domain(item_id)
+                    self.add_correlation('domain', '', domain)
+
+    def add(self, date, obj):
+        self._add(date, obj)
 
     # TODO:ADD objects + Stats
-    def _create(self, first_seen, last_seen):
-        self.set_first_seen(first_seen)
-        self.set_last_seen(last_seen)
+    def _create(self, first_seen=None, last_seen=None):
+        if first_seen:
+            self.set_first_seen(first_seen)
+        if last_seen:
+            self.set_last_seen(last_seen)
+        r_object.sadd(f'{self.type}:all', self.id)
 
     # TODO
     def _delete(self):
         pass
 
+
+class AbstractDaterangeObjects(ABC):
+    """
+    Abstract Daterange Objects
+    """
+
+    def __init__(self, obj_type, obj_class):
+        """ Abstract for Daterange Objects
+
+        :param obj_type: object type (item, ...)
+        :param obj_class: object python class (Item, ...)
+        """
+        self.type = obj_type
+        self.obj_class = obj_class
+
+    def get_ids(self):
+        return r_object.smembers(f'{self.type}:all')
+
+    # def get_ids_iterator(self):
+    #     return r_object.sscan_iter(r_object, f'{self.type}:all')
+
+    def get_by_date(self, date):
+        return r_object.zrange(f'{self.type}:date:{date}', 0, -1)
+
+    def get_nb_by_date(self, date):
+        return r_object.zcard(f'{self.type}:date:{date}')
+
+    def get_by_daterange(self, date_from, date_to):
+        obj_ids = set()
+        for date in Date.substract_date(date_from, date_to):
+            obj_ids = obj_ids | set(self.get_by_date(date))
+        return obj_ids
+
+    def get_metas(self, obj_ids, options=set()):
+        dict_obj = {}
+        for obj_id in obj_ids:
+            obj = self.obj_class(obj_id)
+            dict_obj[obj_id] = obj.get_meta(options=options)
+        return dict_obj
+
+    @abstractmethod
+    def sanitize_id_to_search(self, id_to_search):
+        return id_to_search
+
+    def search_by_id(self, name_to_search, r_pos=False, case_sensitive=True):
+        objs = {}
+        if case_sensitive:
+            flags = 0
+        else:
+            flags = re.IGNORECASE
+        # for subtype in subtypes:
r_name = self.sanitize_id_to_search(name_to_search)
|
||||
if not name_to_search or isinstance(r_name, dict):
|
||||
return objs
|
||||
r_name = re.compile(r_name, flags=flags)
|
||||
for obj_id in self.get_ids(): # TODO REPLACE ME WITH AN ITERATOR
|
||||
res = re.search(r_name, obj_id)
|
||||
if res:
|
||||
objs[obj_id] = {}
|
||||
if r_pos:
|
||||
objs[obj_id]['hl-start'] = res.start()
|
||||
objs[obj_id]['hl-end'] = res.end()
|
||||
return objs
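The id search above compiles the user pattern once and records match offsets so the UI can highlight the matching substring. A minimal standalone sketch of the same highlight logic (function and variable names here are illustrative, not AIL's API):

```python
import re

def search_ids(ids, pattern, r_pos=False, case_sensitive=True):
    """Return the ids matching pattern, optionally with highlight offsets."""
    flags = 0 if case_sensitive else re.IGNORECASE
    r_name = re.compile(pattern, flags=flags)
    objs = {}
    for obj_id in ids:
        res = r_name.search(obj_id)
        if res:
            objs[obj_id] = {}
            if r_pos:
                # offsets of the match, used for <mark>-style highlighting
                objs[obj_id]['hl-start'] = res.start()
                objs[obj_id]['hl-end'] = res.end()
    return objs
```

Compiling the regex once outside the loop is what makes the linear scan over all ids tolerable until the TODO iterator lands.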
    def sanitize_content_to_search(self, content_to_search):
        return content_to_search

    def search_by_content(self, content_to_search, r_pos=False, case_sensitive=True):
        objs = {}
        if case_sensitive:
            flags = 0
        else:
            flags = re.IGNORECASE
        # for subtype in subtypes:
        r_search = self.sanitize_content_to_search(content_to_search)
        if not r_search or isinstance(r_search, dict):
            return objs
        r_search = re.compile(r_search, flags=flags)
        for obj_id in self.get_ids():  # TODO REPLACE ME WITH AN ITERATOR
            obj = self.obj_class(obj_id)
            content = obj.get_content()
            res = re.search(r_search, content)
            if res:
                objs[obj_id] = {}
                if r_pos:  # TODO ADD CONTENT ????
                    objs[obj_id]['hl-start'] = res.start()
                    objs[obj_id]['hl-end'] = res.end()
                objs[obj_id]['content'] = content
        return objs

    def api_get_chart_nb_by_daterange(self, date_from, date_to):
        date_type = []
        for date in Date.substract_date(date_from, date_to):
            d = {'date': f'{date[0:4]}-{date[4:6]}-{date[6:8]}',
                 self.type: self.get_nb_by_date(date)}
            date_type.append(d)
        return date_type

    def api_get_meta_by_daterange(self, date_from, date_to):
        date = Date.sanitise_date_range(date_from, date_to)
        return self.get_metas(self.get_by_daterange(date['date_from'], date['date_to']), options={'sparkline'})
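`api_get_chart_nb_by_daterange` turns a `YYYYMMDD` range into one chart row per day, reformatting each compact date as ISO `YYYY-MM-DD`. `Date.substract_date` is AIL's own helper; the sketch below substitutes a plain `datetime` equivalent and a dict of daily counts, so every name here is an assumption for illustration:

```python
from datetime import datetime, timedelta

def dates_in_range(date_from, date_to):
    """Expand two YYYYMMDD bounds into the inclusive list of days between them."""
    start = datetime.strptime(date_from, '%Y%m%d')
    end = datetime.strptime(date_to, '%Y%m%d')
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime('%Y%m%d') for i in range(days + 1)]

def chart_rows(date_from, date_to, nb_by_date, obj_type):
    """Build rows shaped like api_get_chart_nb_by_daterange: ISO date + daily count."""
    rows = []
    for date in dates_in_range(date_from, date_to):
        rows.append({'date': f'{date[0:4]}-{date[4:6]}-{date[6:8]}',
                     obj_type: nb_by_date.get(date, 0)})
    return rows
```

Days with no activity still get a row with a zero count, which keeps the chart's x-axis continuous.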
@@ -7,6 +7,7 @@ Base Class for AIL Objects
# Import External packages
##################################
import os
import logging.config
import sys
from abc import ABC, abstractmethod
from pymisp import MISPObject

@@ -17,23 +18,28 @@ sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib import ail_logger
from lib import Tag
from lib.ConfigLoader import ConfigLoader
from lib import Duplicate
from lib.correlations_engine import get_nb_correlations, get_correlations, add_obj_correlation, delete_obj_correlation, delete_obj_correlations, exists_obj_correlation, is_obj_correlated, get_nb_correlation_by_correl_type
from lib.correlations_engine import get_nb_correlations, get_correlations, add_obj_correlation, delete_obj_correlation, delete_obj_correlations, exists_obj_correlation, is_obj_correlated, get_nb_correlation_by_correl_type, get_obj_inter_correlation
from lib.Investigations import is_object_investigated, get_obj_investigations, delete_obj_investigations
from lib.relationships_engine import get_obj_nb_relationships, add_obj_relationship
from lib.Language import get_obj_translation
from lib.Tracker import is_obj_tracked, get_obj_trackers, delete_obj_trackers

logging.config.dictConfig(ail_logger.get_config(name='ail'))

config_loader = ConfigLoader()
# r_cache = config_loader.get_redis_conn("Redis_Cache")
r_object = config_loader.get_db_conn("Kvrocks_Objects")
config_loader = None


class AbstractObject(ABC):
    """
    Abstract Object
    """

    # first seen last/seen ??
    # # TODO: - tags
    #         - handle + refactor correlations
    #         - creates others objects

    def __init__(self, obj_type, id, subtype=None):
        """ Abstract for all the AIL object

@@ -44,6 +50,8 @@ class AbstractObject(ABC):
        self.type = obj_type
        self.subtype = subtype

        self.logger = logging.getLogger(f'{self.__class__.__name__}')

    def get_id(self):
        return self.id

@@ -59,14 +67,28 @@ class AbstractObject(ABC):
    def get_global_id(self):
        return f'{self.get_type()}:{self.get_subtype(r_str=True)}:{self.get_id()}'

    def get_default_meta(self, tags=False):
    def get_default_meta(self, tags=False, link=False):
        dict_meta = {'id': self.get_id(),
                     'type': self.get_type(),
                     'subtype': self.get_subtype()}
                     'subtype': self.get_subtype(r_str=True)}
        if tags:
            dict_meta['tags'] = self.get_tags()
        if link:
            dict_meta['link'] = self.get_link()
        return dict_meta

    def _get_field(self, field):
        if self.subtype is None:
            return r_object.hget(f'meta:{self.type}:{self.id}', field)
        else:
            return r_object.hget(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', field)

    def _set_field(self, field, value):
        if self.subtype is None:
            return r_object.hset(f'meta:{self.type}:{self.id}', field, value)
        else:
            return r_object.hset(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', field, value)

    ## Tags ##
    def get_tags(self, r_list=False):
        tags = Tag.get_object_tags(self.type, self.id, self.get_subtype(r_str=True))

@@ -74,7 +96,6 @@ class AbstractObject(ABC):
            tags = list(tags)
        return tags

    ## ADD TAGS ????
    def add_tag(self, tag):
        Tag.add_object_tag(tag, self.type, self.id, subtype=self.get_subtype(r_str=True))

@@ -83,7 +104,7 @@ class AbstractObject(ABC):
        tags = self.get_tags()
        return Tag.is_tags_safe(tags)

    #- Tags -#
    ## -Tags- ##

    @abstractmethod
    def get_content(self):

@@ -98,10 +119,9 @@ class AbstractObject(ABC):

    def add_duplicate(self, algo, similarity, id_2):
        return Duplicate.add_obj_duplicate(algo, similarity, self.type, self.get_subtype(r_str=True), self.id, id_2)
    # -Duplicates -#
    ## -Duplicates- ##

    ## Investigations ##
    # # TODO: unregister =====

    def is_investigated(self):
        if not self.subtype:

@@ -124,7 +144,7 @@ class AbstractObject(ABC):
            unregistered = delete_obj_investigations(self.id, self.type, self.subtype)
        return unregistered

    #- Investigations -#
    ## -Investigations- ##

    ## Trackers ##

@@ -137,7 +157,7 @@ class AbstractObject(ABC):
    def delete_trackers(self):
        return delete_obj_trackers(self.type, self.subtype, self.id)

    #- Trackers -#
    ## -Trackers- ##

    def _delete(self):
        # DELETE TAGS

@@ -186,15 +206,6 @@ class AbstractObject(ABC):
    def get_misp_object(self):
        pass

    @staticmethod
    def get_misp_object_first_last_seen(misp_obj):
        """
        :type misp_obj: MISPObject
        """
        first_seen = misp_obj.get('first_seen')
        last_seen = misp_obj.get('last_seen')
        return first_seen, last_seen

    @staticmethod
    def get_misp_object_tags(misp_obj):
        """

@@ -209,6 +220,8 @@ class AbstractObject(ABC):
        else:
            return []

    ## Correlation ##

    def _get_external_correlation(self, req_type, req_subtype, req_id, obj_type):
        """
        Get object correlation

@@ -259,13 +272,79 @@ class AbstractObject(ABC):
        return is_obj_correlated(self.type, self.subtype, self.id,
                                 object2.get_type(), object2.get_subtype(r_str=True), object2.get_id())

    def get_correlation_iter(self, obj_type2, subtype2, obj_id2, correl_type):
        return get_obj_inter_correlation(self.type, self.get_subtype(r_str=True), self.id, obj_type2, subtype2, obj_id2, correl_type)

    def get_correlation_iter_obj(self, object2, correl_type):
        return self.get_correlation_iter(object2.get_type(), object2.get_subtype(r_str=True), object2.get_id(), correl_type)

    def delete_correlation(self, type2, subtype2, id2):
        """
        Get object correlations
        """
        delete_obj_correlation(self.type, self.subtype, self.id, type2, subtype2, id2)

    ## -Correlation- ##

    # # TODO: get favicon
    # # TODO: get url
    # # TODO: get metadata
    ## Relationship ##

    def get_nb_relationships(self, filter=[]):
        return get_obj_nb_relationships(self.get_global_id())

    def add_relationship(self, obj2_global_id, relationship, source=True):
        # is source
        if source:
            print(self.get_global_id(), obj2_global_id, relationship)
            add_obj_relationship(self.get_global_id(), obj2_global_id, relationship)
        # is target
        else:
            add_obj_relationship(obj2_global_id, self.get_global_id(), relationship)

    ## -Relationship- ##

    ## Translation ##

    def translate(self, content=None, field='', source=None, target='en'):
        global_id = self.get_global_id()
        if not content:
            content = self.get_content()
        return get_obj_translation(global_id, content, field=field, source=source, target=target)

    ## -Translation- ##

    ## Parent ##

    def is_parent(self):
        return r_object.exists(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')

    def is_children(self):
        return r_object.hexists(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent')

    def get_parent(self):
        return r_object.hget(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent')

    def get_childrens(self):
        return r_object.smembers(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')

    def set_parent(self, obj_type=None, obj_subtype=None, obj_id=None, obj_global_id=None):  # TODO # REMOVE ITEM DUP
        if not obj_global_id:
            if obj_subtype is None:
                obj_subtype = ''
            obj_global_id = f'{obj_type}:{obj_subtype}:{obj_id}'
        r_object.hset(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent', obj_global_id)
        r_object.sadd(f'child:{obj_global_id}', self.get_global_id())

    def add_children(self, obj_type=None, obj_subtype=None, obj_id=None, obj_global_id=None):  # TODO # REMOVE ITEM DUP
        if not obj_global_id:
            if obj_subtype is None:
                obj_subtype = ''
            obj_global_id = f'{obj_type}:{obj_subtype}:{obj_id}'
        r_object.sadd(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', obj_global_id)
        r_object.hset(f'meta:{obj_global_id}', 'parent', self.get_global_id())

    ## others objects ##
    def add_obj_children(self, parent_global_id, son_global_id):
        r_object.sadd(f'child:{parent_global_id}', son_global_id)
        r_object.hset(f'meta:{son_global_id}', 'parent', parent_global_id)
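The parent/child methods above maintain a dual index: a `parent` field on the child's meta hash plus a `child:<global_id>` set on the parent, so both directions can be read without scanning. A self-contained in-memory model of that scheme (the dicts and helper names are hypothetical stand-ins for the Kvrocks keys):

```python
meta = {}      # global_id -> {'parent': global_id}, mirrors the meta:* hashes
children = {}  # global_id -> set of child global_ids, mirrors the child:* sets

def set_parent(child_gid, parent_gid):
    # write both sides of the index, like set_parent/add_children above
    meta.setdefault(child_gid, {})['parent'] = parent_gid
    children.setdefault(parent_gid, set()).add(child_gid)

def get_parent(child_gid):
    return meta.get(child_gid, {}).get('parent')

def get_childrens(parent_gid):
    return children.get(parent_gid, set())
```

The cost of duplicating the edge at write time buys O(1) lookups from either endpoint.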

    ## Parent ##

@@ -72,7 +72,10 @@ class AbstractSubtypeObject(AbstractObject, ABC):
        return last_seen

    def get_nb_seen(self):
        return int(r_object.zscore(f'{self.type}_all:{self.subtype}', self.id))
        nb = r_object.zscore(f'{self.type}_all:{self.subtype}', self.id)
        if not nb:
            nb = 0
        return int(nb)

    # # TODO: CHECK RESULT
    def get_nb_seen_by_date(self, date_day):

@@ -85,7 +88,10 @@ class AbstractSubtypeObject(AbstractObject, ABC):
    def _get_meta(self, options=None):
        if options is None:
            options = set()
        meta = {'first_seen': self.get_first_seen(),
        meta = {'id': self.id,
                'type': self.type,
                'subtype': self.subtype,
                'first_seen': self.get_first_seen(),
                'last_seen': self.get_last_seen(),
                'nb_seen': self.get_nb_seen()}
        if 'icon' in options:

@@ -147,8 +153,11 @@ class AbstractSubtypeObject(AbstractObject, ABC):
    # => data Retention + efficient search
    #
    #
    def _add_subtype(self):
        r_object.sadd(f'all_{self.type}:subtypes', self.subtype)

    def add(self, date, item_id):
    def add(self, date, obj=None):
        self._add_subtype()
        self.update_daterange(date)
        update_obj_date(date, self.type, self.subtype)
        # daily

@@ -159,19 +168,21 @@ class AbstractSubtypeObject(AbstractObject, ABC):
    #######################################################################
    #######################################################################

        # Correlations
        self.add_correlation('item', '', item_id)
        # domain
        if is_crawled(item_id):
            domain = get_item_domain(item_id)
            self.add_correlation('domain', '', domain)
        if obj:
            # Correlations
            self.add_correlation(obj.type, obj.get_subtype(r_str=True), obj.get_id())

            if obj.type == 'item':  # TODO same for message->chat ???
                item_id = obj.get_id()
                # domain
                if is_crawled(item_id):
                    domain = get_item_domain(item_id)
                    self.add_correlation('domain', '', domain)

    # TODO:ADD objects + Stats
    def create(self, first_seen, last_seen):
        self.set_first_seen(first_seen)
        self.set_last_seen(last_seen)

    # def create(self, first_seen, last_seen):
    #     self.set_first_seen(first_seen)
    #     self.set_last_seen(last_seen)

    def _delete(self):
        pass
@@ -1,6 +1,5 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

@@ -11,16 +10,29 @@ sys.path.append(os.environ['AIL_BIN'])
from lib.ConfigLoader import ConfigLoader
from lib.ail_core import get_all_objects, get_object_all_subtypes
from lib import correlations_engine
from lib import relationships_engine
from lib import btc_ail
from lib import Tag

from lib.objects import Chats
from lib.objects import ChatSubChannels
from lib.objects import ChatThreads
from lib.objects import CryptoCurrencies
from lib.objects import CookiesNames
from lib.objects.Cves import Cve
from lib.objects.Decodeds import Decoded, get_all_decodeds_objects, get_nb_decodeds_objects
from lib.objects.Domains import Domain
from lib.objects import Etags
from lib.objects.Favicons import Favicon
from lib.objects import FilesNames
from lib.objects import HHHashs
from lib.objects.Items import Item, get_all_items_objects, get_nb_items_objects
from lib.objects import Images
from lib.objects.Messages import Message
from lib.objects import Pgps
from lib.objects.Screenshots import Screenshot
from lib.objects import Titles
from lib.objects.UsersAccount import UserAccount
from lib.objects import Usernames

config_loader = ConfigLoader()

@@ -44,23 +56,49 @@ def sanitize_objs_types(objs):
    return l_types


def get_object(obj_type, subtype, id):
def get_object(obj_type, subtype, obj_id):
    if obj_type == 'item':
        return Item(id)
        return Item(obj_id)
    elif obj_type == 'domain':
        return Domain(id)
        return Domain(obj_id)
    elif obj_type == 'decoded':
        return Decoded(id)
        return Decoded(obj_id)
    elif obj_type == 'chat':
        return Chats.Chat(obj_id, subtype)
    elif obj_type == 'chat-subchannel':
        return ChatSubChannels.ChatSubChannel(obj_id, subtype)
    elif obj_type == 'chat-thread':
        return ChatThreads.ChatThread(obj_id, subtype)
    elif obj_type == 'cookie-name':
        return CookiesNames.CookieName(obj_id)
    elif obj_type == 'cve':
        return Cve(id)
        return Cve(obj_id)
    elif obj_type == 'etag':
        return Etags.Etag(obj_id)
    elif obj_type == 'favicon':
        return Favicon(obj_id)
    elif obj_type == 'file-name':
        return FilesNames.FileName(obj_id)
    elif obj_type == 'hhhash':
        return HHHashs.HHHash(obj_id)
    elif obj_type == 'image':
        return Images.Image(obj_id)
    elif obj_type == 'message':
        return Message(obj_id)
    elif obj_type == 'screenshot':
        return Screenshot(id)
        return Screenshot(obj_id)
    elif obj_type == 'cryptocurrency':
        return CryptoCurrencies.CryptoCurrency(id, subtype)
        return CryptoCurrencies.CryptoCurrency(obj_id, subtype)
    elif obj_type == 'pgp':
        return Pgps.Pgp(id, subtype)
        return Pgps.Pgp(obj_id, subtype)
    elif obj_type == 'title':
        return Titles.Title(obj_id)
    elif obj_type == 'user-account':
        return UserAccount(obj_id, subtype)
    elif obj_type == 'username':
        return Usernames.Username(id, subtype)
        return Usernames.Username(obj_id, subtype)
    else:
        raise Exception(f'Unknown AIL object: {obj_type} {subtype} {obj_id}')

def get_objects(objects):
    objs = set()

@@ -93,9 +131,12 @@ def get_obj_global_id(obj_type, subtype, obj_id):
    obj = get_object(obj_type, subtype, obj_id)
    return obj.get_global_id()

def get_obj_type_subtype_id_from_global_id(global_id):
    obj_type, subtype, obj_id = global_id.split(':', 2)
    return obj_type, subtype, obj_id

def get_obj_from_global_id(global_id):
    obj = global_id.split(':', 3)
    obj = get_obj_type_subtype_id_from_global_id(global_id)
    return get_object(obj[0], obj[1], obj[2])
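The new `get_obj_type_subtype_id_from_global_id` helper splits a global id `type:subtype:id` with `maxsplit=2`, so any `':'` inside the object id itself stays in the id part (the replaced `split(':', 3)` could yield a fourth element for such ids). A standalone sketch of that parsing rule (the function name here is illustrative):

```python
def parse_global_id(global_id):
    """Split an AIL-style global id 'type:subtype:id' into its three parts.

    maxsplit=2 guarantees exactly three fields, even when the object id
    contains ':' characters of its own."""
    obj_type, subtype, obj_id = global_id.split(':', 2)
    return obj_type, subtype, obj_id
```

Objects without a subtype use an empty middle field, e.g. `item::crawled/...`.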

@@ -151,7 +192,7 @@ def get_objects_meta(objs, options=set(), flask_context=False):
            subtype = obj[1]
            obj_id = obj[2]
        else:
            obj_type, subtype, obj_id = obj.split(':', 2)
            obj_type, subtype, obj_id = get_obj_type_subtype_id_from_global_id(obj)
        metas.append(get_object_meta(obj_type, subtype, obj_id, options=options, flask_context=flask_context))
    return metas

@@ -160,13 +201,17 @@ def get_object_card_meta(obj_type, subtype, id, related_btc=False):
    obj = get_object(obj_type, subtype, id)
    meta = obj.get_meta()
    meta['icon'] = obj.get_svg_icon()
    if subtype or obj_type == 'cve':
    if subtype or obj_type == 'cookie-name' or obj_type == 'cve' or obj_type == 'etag' or obj_type == 'title' or obj_type == 'favicon' or obj_type == 'hhhash':
        meta['sparkline'] = obj.get_sparkline()
        if obj_type == 'cve':
            meta['cve_search'] = obj.get_cve_search()
    # if obj_type == 'title':
    #     meta['cve_search'] = obj.get_cve_search()
    if subtype == 'bitcoin' and related_btc:
        meta["related_btc"] = btc_ail.get_bitcoin_info(obj.id)
    if obj.get_type() == 'decoded':
        meta['mimetype'] = obj.get_mimetype()
        meta['size'] = obj.get_size()
        meta["vt"] = obj.get_meta_vt()
        meta["vt"]["status"] = obj.is_vt_enabled()
    # TAGS MODAL

@@ -323,8 +368,8 @@ def get_obj_correlations(obj_type, subtype, obj_id):
    obj = get_object(obj_type, subtype, obj_id)
    return obj.get_correlations()

def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max):
    if len(objs) < nb_max or nb_max == -1:
def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max, objs_hidden):
    if len(objs) < nb_max or nb_max == 0:
        if lvl == 0:
            objs.add((obj_type, subtype, obj_id))

@@ -336,21 +381,27 @@ def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lv
            for obj2_type in correlations:
                for str_obj in correlations[obj2_type]:
                    obj2_subtype, obj2_id = str_obj.split(':', 1)
                    _get_obj_correlations_objs(objs, obj2_type, obj2_subtype, obj2_id, filter_types, lvl, nb_max)
                    if get_obj_global_id(obj2_type, obj2_subtype, obj2_id) in objs_hidden:
                        continue  # filter object to hide
                    _get_obj_correlations_objs(objs, obj2_type, obj2_subtype, obj2_id, filter_types, lvl, nb_max, objs_hidden)

def get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=[], lvl=0, nb_max=300):
def get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=[], lvl=0, nb_max=300, objs_hidden=set()):
    objs = set()
    _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max)
    _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max, objs_hidden)
    return objs

def obj_correlations_objs_add_tags(obj_type, subtype, obj_id, tags, filter_types=[], lvl=0, nb_max=300):
    objs = get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=filter_types, lvl=lvl, nb_max=nb_max)
def obj_correlations_objs_add_tags(obj_type, subtype, obj_id, tags, filter_types=[], lvl=0, nb_max=300, objs_hidden=set()):
    objs = get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=filter_types, lvl=lvl, nb_max=nb_max, objs_hidden=objs_hidden)
    # print(objs)
    for obj_tuple in objs:
        obj1_type, subtype1, id1 = obj_tuple
        add_obj_tags(obj1_type, subtype1, id1, tags)
    return objs

def get_obj_nb_correlations(obj_type, subtype, obj_id, filter_types=[]):
    obj = get_object(obj_type, subtype, obj_id)
    return obj.get_nb_correlations(filter_types=filter_types)

################################################################################
################################################################################ TODO
################################################################################

@@ -381,7 +432,7 @@ def create_correlation_graph_links(links_set):
def create_correlation_graph_nodes(nodes_set, obj_str_id, flask_context=True):
    graph_nodes_list = []
    for node_id in nodes_set:
        obj_type, subtype, obj_id = node_id.split(';', 2)
        obj_type, subtype, obj_id = get_obj_type_subtype_id_from_global_id(node_id)
        dict_node = {'id': node_id}
        dict_node['style'] = get_object_svg(obj_type, subtype, obj_id)

@@ -402,17 +453,40 @@ def create_correlation_graph_nodes(nodes_set, obj_str_id, flask_context=True):

def get_correlations_graph_node(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1,
                                objs_hidden=set(),
                                flask_context=False):
    obj_str_id, nodes, links = correlations_engine.get_correlations_graph_nodes_links(obj_type, subtype, obj_id,
                                                                                      filter_types=filter_types,
                                                                                      max_nodes=max_nodes, level=level,
                                                                                      flask_context=flask_context)
    obj_str_id, nodes, links, meta = correlations_engine.get_correlations_graph_nodes_links(obj_type, subtype, obj_id,
                                                                                            filter_types=filter_types,
                                                                                            max_nodes=max_nodes, level=level,
                                                                                            objs_hidden=objs_hidden,
                                                                                            flask_context=flask_context)
    # print(meta)
    meta['objs'] = list(meta['objs'])
    return {"nodes": create_correlation_graph_nodes(nodes, obj_str_id, flask_context=flask_context),
            "links": create_correlation_graph_links(links)}
            "links": create_correlation_graph_links(links),
            "meta": meta}


# --- CORRELATION --- #

def get_obj_nb_relationships(obj_type, subtype, obj_id, filter_types=[]):
    obj = get_object(obj_type, subtype, obj_id)
    return obj.get_nb_relationships(filter=filter_types)

def get_relationships_graph_node(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1,
                                 objs_hidden=set(),
                                 flask_context=False):
    obj_global_id = get_obj_global_id(obj_type, subtype, obj_id)
    nodes, links, meta = relationships_engine.get_relationship_graph(obj_global_id,
                                                                     filter_types=filter_types,
                                                                     max_nodes=max_nodes, level=level,
                                                                     objs_hidden=objs_hidden)
    # print(meta)
    meta['objs'] = list(meta['objs'])
    return {"nodes": create_correlation_graph_nodes(nodes, obj_global_id, flask_context=flask_context),
            "links": links,
            "meta": meta}


# if __name__ == '__main__':
#     r = get_objects([{'lvl': 1, 'type': 'item', 'subtype': '', 'id': 'crawled/2020/09/14/circl.lu0f4976a4-dda4-4189-ba11-6618c4a8c951'}])

@@ -113,6 +113,34 @@ def regex_finditer(r_key, regex, item_id, content, max_time=30):
        proc.terminate()
        sys.exit(0)

def _regex_match(r_key, regex, content):
    if re.match(regex, content):
        r_serv_cache.set(r_key, 1)
        r_serv_cache.expire(r_key, 360)

def regex_match(r_key, regex, item_id, content, max_time=30):
    proc = Proc(target=_regex_match, args=(r_key, regex, content))
    try:
        proc.start()
        proc.join(max_time)
        if proc.is_alive():
            proc.terminate()
            # Statistics.incr_module_timeout_statistic(r_key)
            err_mess = f"{r_key}: processing timeout: {item_id}"
            logger.info(err_mess)
            return False
        else:
            if r_serv_cache.exists(r_key):
                r_serv_cache.delete(r_key)
                return True
            else:
                r_serv_cache.delete(r_key)
                return False
    except KeyboardInterrupt:
        print("Caught KeyboardInterrupt, terminating regex worker")
        proc.terminate()
        sys.exit(0)
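`regex_match` runs the pattern in a child process so a catastrophic regex can be killed after `max_time` seconds, with a Redis cache key acting as the "it matched" flag. The same worker-with-timeout pattern can be sketched standalone, passing the result back through a `multiprocessing.Queue` instead of Redis (names here are illustrative; on spawn-based platforms the worker would additionally need the usual `__main__` guard):

```python
import re
from multiprocessing import Process, Queue

def _match_worker(q, regex, content):
    # child process: report whether the pattern matches at the start of content
    q.put(bool(re.match(regex, content)))

def regex_match_bounded(regex, content, max_time=5):
    q = Queue()
    proc = Process(target=_match_worker, args=(q, regex, content))
    proc.start()
    proc.join(max_time)
    if proc.is_alive():
        # the regex is still running after max_time: kill it, treat as no match
        proc.terminate()
        proc.join()
        return False
    return q.get() if not q.empty() else False
```

Killing the process is the only reliable way to abort CPython's regex engine mid-match; a thread could not be interrupted.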

def _regex_search(r_key, regex, content):
    if re.search(regex, content):
        r_serv_cache.set(r_key, 1)
bin/lib/relationships_engine.py (new executable file, 111 lines)

@@ -0,0 +1,111 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader

config_loader = ConfigLoader()
r_rel = config_loader.get_db_conn("Kvrocks_Relationships")
config_loader = None


RELATIONSHIPS = {
    "forward",
    "mention"
}
def get_relationships():
    return RELATIONSHIPS


def get_obj_relationships_by_type(obj_global_id, relationship):
    return r_rel.smembers(f'rel:{relationship}:{obj_global_id}')

def get_obj_nb_relationships_by_type(obj_global_id, relationship):
    return r_rel.scard(f'rel:{relationship}:{obj_global_id}')

def get_obj_relationships(obj_global_id):
    relationships = []
    for relationship in get_relationships():
        for rel in get_obj_relationships_by_type(obj_global_id, relationship):
            meta = {'relationship': relationship}
            direction, obj_id = rel.split(':', 1)
            if direction == 'i':
                meta['source'] = obj_id
                meta['target'] = obj_global_id
            else:
                meta['target'] = obj_id
                meta['source'] = obj_global_id

            if not obj_id.startswith('chat'):
                continue

            meta['id'] = obj_id
            # meta['direction'] = direction
            relationships.append(meta)
    return relationships

def get_obj_nb_relationships(obj_global_id):
    nb = {}
    for relationship in get_relationships():
        nb[relationship] = get_obj_nb_relationships_by_type(obj_global_id, relationship)
    return nb


# TODO Filter by obj type ???
def add_obj_relationship(source, target, relationship):
    r_rel.sadd(f'rel:{relationship}:{source}', f'o:{target}')
    r_rel.sadd(f'rel:{relationship}:{target}', f'i:{source}')
    # r_rel.sadd(f'rels:{source}', relationship)
    # r_rel.sadd(f'rels:{target}', relationship)
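`add_obj_relationship` stores each directed edge twice: `o:<target>` in the source's set and `i:<source>` in the target's set, so either endpoint can enumerate its edges and recover direction from the one-character prefix. An in-memory model of that encoding (plain dicts standing in for the Kvrocks sets; names are illustrative):

```python
rels = {}  # (relationship, global_id) -> set of 'i:...' / 'o:...' entries

def add_relationship(source, target, relationship):
    # mirror the double write: outgoing entry on source, incoming on target
    rels.setdefault((relationship, source), set()).add(f'o:{target}')
    rels.setdefault((relationship, target), set()).add(f'i:{source}')

def relationships_of(obj, relationship):
    out = []
    for entry in rels.get((relationship, obj), set()):
        direction, other = entry.split(':', 1)
        if direction == 'i':
            out.append({'source': other, 'target': obj})
        else:
            out.append({'source': obj, 'target': other})
    return out
```

`split(':', 1)` matters here: the stored global ids themselves contain colons, so only the first one separates the direction prefix.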

def get_relationship_graph(obj_global_id, filter_types=[], max_nodes=300, level=1, objs_hidden=set()):
    links = []
    nodes = set()
    meta = {'complete': True, 'objs': set()}
    done = set()
    done_link = set()

    _get_relationship_graph(obj_global_id, links, nodes, meta, level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, done=done, done_link=done_link)
    return nodes, links, meta

def _get_relationship_graph(obj_global_id, links, nodes, meta, level, max_nodes, filter_types=[], objs_hidden=set(), done=set(), done_link=set()):
    meta['objs'].add(obj_global_id)
    nodes.add(obj_global_id)

    for rel in get_obj_relationships(obj_global_id):
        meta['objs'].add(rel['id'])

        if rel['id'] in done:
            continue

        if len(nodes) > max_nodes != 0:
            meta['complete'] = False
            break

        nodes.add(rel['id'])

        str_link = f"{rel['source']}{rel['target']}{rel['relationship']}"
        if str_link not in done_link:
            links.append({"source": rel['source'], "target": rel['target'], "relationship": rel['relationship']})
            done_link.add(str_link)

        if level > 0:
            next_level = level - 1

            _get_relationship_graph(rel['id'], links, nodes, meta, next_level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, done=done, done_link=done_link)

            # done.add(rel['id'])
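The recursion above is a depth-limited graph expansion: a node budget caps the result, links are deduplicated, and `meta['complete']` records whether the cap truncated the walk. The same shape, self-contained over a plain adjacency list (all names here are hypothetical stand-ins):

```python
def build_graph(adjacency, start, level=1, max_nodes=300):
    """Depth-limited expansion with a node budget, mirroring
    _get_relationship_graph: dedup links, flag truncation in meta."""
    nodes, links, seen_links = set(), [], set()
    meta = {'complete': True}

    def walk(node, depth):
        nodes.add(node)
        for neighbor in adjacency.get(node, []):
            # stop expanding once the budget is spent (0 disables the cap)
            if len(nodes) >= max_nodes != 0 and neighbor not in nodes:
                meta['complete'] = False
                return
            nodes.add(neighbor)
            link = (node, neighbor)
            if link not in seen_links:
                seen_links.add(link)
                links.append({'source': node, 'target': neighbor})
            if depth > 0:
                walk(neighbor, depth - 1)

    walk(start, level)
    return nodes, links, meta
```

Exposing `meta['complete']` lets the UI warn that the displayed graph is a truncated view rather than the full neighborhood.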
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
source = ''
|
||||
target = ''
|
||||
add_obj_relationship(source, target, 'forward')
|
||||
# print(get_obj_relationships(source))
|
212
bin/lib/timeline_engine.py
Executable file
212
bin/lib/timeline_engine.py
Executable file
|
@ -0,0 +1,212 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys

from uuid import uuid4

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader

config_loader = ConfigLoader()
r_meta = config_loader.get_db_conn("Kvrocks_Timeline")
config_loader = None

# CORRELATION_TYPES_BY_OBJ = {
#     "chat": ["item", "username"],  # item ???
#     "cookie-name": ["domain"],
#     "cryptocurrency": ["domain", "item"],
#     "cve": ["domain", "item"],
#     "decoded": ["domain", "item"],
#     "domain": ["cve", "cookie-name", "cryptocurrency", "decoded", "etag", "favicon", "hhhash", "item", "pgp", "title", "screenshot", "username"],
#     "etag": ["domain"],
#     "favicon": ["domain", "item"],
#     "hhhash": ["domain"],
#     "item": ["chat", "cve", "cryptocurrency", "decoded", "domain", "favicon", "pgp", "screenshot", "title", "username"],
#     "pgp": ["domain", "item"],
#     "screenshot": ["domain", "item"],
#     "title": ["domain", "item"],
#     "username": ["chat", "domain", "item"],
# }
#
# def get_obj_correl_types(obj_type):
#     return CORRELATION_TYPES_BY_OBJ.get(obj_type)

# def sanityze_obj_correl_types(obj_type, correl_types):
#     obj_correl_types = get_obj_correl_types(obj_type)
#     if correl_types:
#         correl_types = set(correl_types).intersection(obj_correl_types)
#     if not correl_types:
#         correl_types = obj_correl_types
#     if not correl_types:
#         return []
#     return correl_types


class Timeline:

    def __init__(self, global_id, name):
        self.id = global_id
        self.name = name

    def _get_block_obj_global_id(self, block):
        return r_meta.hget(f'block:{self.id}:{self.name}', block)

    def _set_block_obj_global_id(self, block, global_id):
        return r_meta.hset(f'block:{self.id}:{self.name}', block, global_id)

    def _get_block_timestamp(self, block, position):
        return r_meta.zscore(f'line:{self.id}:{self.name}', f'{position}:{block}')

    def _get_nearest_bloc_inf(self, timestamp):
        inf = r_meta.zrevrangebyscore(f'line:{self.id}:{self.name}', float(timestamp), 0, start=0, num=1, withscores=True)
        if inf:
            inf, score = inf[0]
            if inf.startswith('end'):
                inf_key = f'start:{inf[4:]}'
                inf_score = r_meta.zscore(f'line:{self.id}:{self.name}', inf_key)
                if inf_score == score:
                    inf = inf_key
            return inf
        else:
            return None

    def _get_nearest_bloc_sup(self, timestamp):
        sup = r_meta.zrangebyscore(f'line:{self.id}:{self.name}', float(timestamp), '+inf', start=0, num=1, withscores=True)
        if sup:
            sup, score = sup[0]
            if sup.startswith('start'):
                sup_key = f'end:{sup[6:]}'
                sup_score = r_meta.zscore(f'line:{self.id}:{self.name}', sup_key)
                if score == sup_score:
                    sup = sup_key
            return sup
        else:
            return None

    def get_first_obj_id(self):
        first = r_meta.zrange(f'line:{self.id}:{self.name}', 0, 0)
        if first:  # start:block
            first = first[0]
            if first.startswith('start:'):
                first = first[6:]
            else:
                first = first[4:]
            return self._get_block_obj_global_id(first)

    def get_last_obj_id(self):
        last = r_meta.zrevrange(f'line:{self.id}:{self.name}', 0, 0)
        if last:  # end:block
            last = last[0]
            if last.startswith('end:'):
                last = last[4:]
            else:
                last = last[6:]
            return self._get_block_obj_global_id(last)

    def get_objs_ids(self):
        objs = set()
        for block in r_meta.zrange(f'line:{self.id}:{self.name}', 0, -1):
            if block:
                if block.startswith('start:'):
                    objs.add(self._get_block_obj_global_id(block[6:]))
        return objs

    # def get_objs_ids(self):
    #     objs = {}
    #     last_obj_id = None
    #     for block, timestamp in r_meta.zrange(f'line:{self.id}:{self.name}', 0, -1, withscores=True):
    #         if block:
    #             if block.startswith('start:'):
    #                 last_obj_id = self._get_block_obj_global_id(block[6:])
    #                 objs[last_obj_id] = {'first_seen': timestamp}
    #             else:
    #                 objs[last_obj_id]['last_seen'] = timestamp
    #     return objs

    def _update_bloc(self, block, position, timestamp):
        r_meta.zadd(f'line:{self.id}:{self.name}', {f'{position}:{block}': timestamp})

    def _add_bloc(self, obj_global_id, timestamp, end=None):
        if end:
            timestamp_end = end
        else:
            timestamp_end = timestamp
        new_bloc = str(uuid4())
        r_meta.zadd(f'line:{self.id}:{self.name}', {f'start:{new_bloc}': timestamp, f'end:{new_bloc}': timestamp_end})
        self._set_block_obj_global_id(new_bloc, obj_global_id)
        return new_bloc

    def add_timestamp(self, timestamp, obj_global_id):
        inf = self._get_nearest_bloc_inf(timestamp)
        sup = self._get_nearest_bloc_sup(timestamp)
        if not inf and not sup:
            # create new bloc
            new_bloc = self._add_bloc(obj_global_id, timestamp)
            return new_bloc
        # timestamp < first_seen
        elif not inf:
            sup_pos, sup_id = sup.split(':')
            sup_obj = self._get_block_obj_global_id(sup_id)
            if sup_obj == obj_global_id:
                self._update_bloc(sup_id, 'start', timestamp)
            # create new bloc
            else:
                new_bloc = self._add_bloc(obj_global_id, timestamp)
                return new_bloc

        # timestamp > first_seen
        elif not sup:
            inf_pos, inf_id = inf.split(':')
            inf_obj = self._get_block_obj_global_id(inf_id)
            if inf_obj == obj_global_id:
                self._update_bloc(inf_id, 'end', timestamp)
            # create new bloc
            else:
                new_bloc = self._add_bloc(obj_global_id, timestamp)
                return new_bloc

        else:
            inf_pos, inf_id = inf.split(':')
            sup_pos, sup_id = sup.split(':')
            inf_obj = self._get_block_obj_global_id(inf_id)

            if inf_id == sup_id:
                # reduce bloc + create two new bloc
                if obj_global_id != inf_obj:
                    # get end timestamp
                    sup_timestamp = self._get_block_timestamp(sup_id, 'end')
                    # reduce original bloc
                    self._update_bloc(inf_id, 'end', timestamp - 1)
                    # Insert new bloc
                    new_bloc = self._add_bloc(obj_global_id, timestamp)
                    # Recreate end of the first bloc by a new bloc
                    self._add_bloc(inf_obj, timestamp + 1, end=sup_timestamp)
                    return new_bloc

                # timestamp in existing bloc
                else:
                    return inf_id

            # different blocs: expend sup/inf bloc or create a new bloc if
            elif inf_pos == 'end' and sup_pos == 'start':
                # Extend inf bloc
                if obj_global_id == inf_obj:
                    self._update_bloc(inf_id, 'end', timestamp)
                    return inf_id

                sup_obj = self._get_block_obj_global_id(sup_id)
                # Extend sup bloc
                if obj_global_id == sup_obj:
                    self._update_bloc(sup_id, 'start', timestamp)
                    return sup_id

                # create new bloc
                new_bloc = self._add_bloc(obj_global_id, timestamp)
                return new_bloc

            # inf_pos == 'start' and sup_pos == 'end'
            # else raise error ???
@@ -47,8 +47,8 @@ class ApiKey(AbstractModule):
         self.logger.info(f"Module {self.module_name} initialized")
 
     def compute(self, message, r_result=False):
-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         item_content = item.get_content()
 
         google_api_key = self.regex_findall(self.re_google_api_key, item.get_id(), item_content, r_set=True)

@@ -63,8 +63,8 @@ class ApiKey(AbstractModule):
             print(f'found google api key: {to_print}')
             self.redis_logger.warning(f'{to_print}Checked {len(google_api_key)} found Google API Key;{item.get_id()}')
 
-            msg = f'infoleak:automatic-detection="google-api-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="google-api-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
 
         # # TODO: # FIXME: AWS regex/validate/sanitize KEY + SECRET KEY
         if aws_access_key:

@@ -74,12 +74,12 @@ class ApiKey(AbstractModule):
             print(f'found AWS secret key')
             self.redis_logger.warning(f'{to_print}Checked {len(aws_secret_key)} found AWS secret Key;{item.get_id()}')
 
-            msg = f'infoleak:automatic-detection="aws-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="aws-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
 
         # Tags
-        msg = f'infoleak:automatic-detection="api-key";{item.get_id()}'
-        self.add_message_to_queue(msg, 'Tags')
+        tag = 'infoleak:automatic-detection="api-key"'
+        self.add_message_to_queue(message=tag, queue='Tags')
 
         if r_result:
            return google_api_key, aws_access_key, aws_secret_key
@@ -6,14 +6,14 @@ The ZMQ_PubSub_Categ Module
 
 Each words files created under /files/ are representing categories.
 This modules take these files and compare them to
-the content of an item.
+the content of an obj.
 
-When a word from a item match one or more of these words file, the filename of
-the item / zhe item id is published/forwarded to the next modules.
+When a word from a obj match one or more of these words file, the filename of
+the obj / the obj id is published/forwarded to the next modules.
 
 Each category (each files) are representing a dynamic channel.
 This mean that if you create 1000 files under /files/ you'll have 1000 channels
-where every time there is a matching word to a category, the item containing
+where every time there is a matching word to a category, the obj containing
 this word will be pushed to this specific channel.
 
 ..note:: The channel will have the name of the file created.

@@ -44,7 +44,6 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item
 
 
 class Categ(AbstractModule):

@@ -81,27 +80,32 @@ class Categ(AbstractModule):
         self.categ_words = tmp_dict.items()
 
     def compute(self, message, r_result=False):
-        # Create Item Object
-        item = Item(message)
-        # Get item content
-        content = item.get_content()
+        # Get obj Object
+        obj = self.get_obj()
+        # Get obj content
+        content = obj.get_content()
         categ_found = []
 
-        # Search for pattern categories in item content
+        # Search for pattern categories in obj content
         for categ, pattern in self.categ_words:
 
-            found = set(re.findall(pattern, content))
-            lenfound = len(found)
-            if lenfound >= self.matchingThreshold:
-                categ_found.append(categ)
-                msg = f'{item.get_id()} {lenfound}'
+            if obj.type == 'message':
+                self.add_message_to_queue(message='0', queue=categ)
+            else:
 
-                # Export message to categ queue
-                print(msg, categ)
-                self.add_message_to_queue(msg, categ)
+                found = set(re.findall(pattern, content))
+                lenfound = len(found)
+                if lenfound >= self.matchingThreshold:
+                    categ_found.append(categ)
+                    msg = str(lenfound)
 
-                self.redis_logger.debug(
-                    f'Categ;{item.get_source()};{item.get_date()};{item.get_basename()};Detected {lenfound} as {categ};{item.get_id()}')
+                    # Export message to categ queue
+                    print(msg, categ)
+                    self.add_message_to_queue(message=msg, queue=categ)
+
+                    self.redis_logger.debug(
+                        f'Categ;{obj.get_source()};{obj.get_date()};{obj.get_basename()};Detected {lenfound} as {categ};{obj.get_id()}')
 
         if r_result:
             return categ_found
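The threshold matching that Categ performs can be sketched standalone. The word lists below are illustrative; in AIL they are loaded from the files under /files/, one file per category:

```python
import re

# Hypothetical category word lists (in AIL these come from files under /files/).
categories = {
    'credential': ['password', 'login', 'username'],
    'creditcard': ['visa', 'mastercard', 'cvv'],
}
matching_threshold = 1  # minimum distinct matching words to flag a category

def categorize(content, threshold=matching_threshold):
    found_categs = []
    for categ, words in categories.items():
        # one alternation pattern per category, like the per-file patterns in Categ
        pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
        found = set(re.findall(pattern, content, re.IGNORECASE))
        if len(found) >= threshold:
            found_categs.append(categ)
    return found_categs

cats = categorize('leaked login and password dump')
```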
@@ -29,7 +29,6 @@ Redis organization:
 import os
 import sys
 import time
 import re
-from datetime import datetime
 from pyfaup.faup import Faup
 

@@ -85,8 +84,8 @@ class Credential(AbstractModule):
 
     def compute(self, message):
 
-        item_id, count = message.split()
-        item = Item(item_id)
+        count = message
+        item = self.get_obj()
 
         item_content = item.get_content()
 

@@ -111,8 +110,8 @@ class Credential(AbstractModule):
             print(f"========> Found more than 10 credentials in this file : {item.get_id()}")
             self.redis_logger.warning(to_print)
 
-            msg = f'infoleak:automatic-detection="credential";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="credential"'
+            self.add_message_to_queue(message=tag, queue='Tags')
 
         site_occurrence = self.regex_findall(self.regex_site_for_stats, item.get_id(), item_content)
 
@@ -68,8 +68,8 @@ class CreditCards(AbstractModule):
         return extracted
 
     def compute(self, message, r_result=False):
-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         content = item.get_content()
         all_cards = self.regex_findall(self.regex, item.id, content)
 

@@ -90,8 +90,8 @@ class CreditCards(AbstractModule):
             print(mess)
             self.redis_logger.warning(mess)
 
-            msg = f'infoleak:automatic-detection="credit-card";{item.id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="credit-card"'
+            self.add_message_to_queue(message=tag, queue='Tags')
 
         if r_result:
             return creditcard_set
@@ -114,7 +114,7 @@ class Cryptocurrencies(AbstractModule, ABC):
         self.logger.info(f'Module {self.module_name} initialized')
 
     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         item_id = item.get_id()
         date = item.get_date()
         content = item.get_content()

@@ -130,18 +130,18 @@ class Cryptocurrencies(AbstractModule, ABC):
                 if crypto.is_valid_address():
                     # print(address)
                     is_valid_address = True
-                    crypto.add(date, item_id)
+                    crypto.add(date, item)
 
             # Check private key
             if is_valid_address:
-                msg = f'{currency["tag"]};{item_id}'
-                self.add_message_to_queue(msg, 'Tags')
+                msg = f'{currency["tag"]}'
+                self.add_message_to_queue(message=msg, queue='Tags')
 
                 if currency.get('private_key'):
                     private_keys = self.regex_findall(currency['private_key']['regex'], item_id, content)
                     if private_keys:
-                        msg = f'{currency["private_key"]["tag"]};{item_id}'
-                        self.add_message_to_queue(msg, 'Tags')
+                        msg = f'{currency["private_key"]["tag"]}'
+                        self.add_message_to_queue(message=msg, queue='Tags')
 
                         # debug
                         print(private_keys)
@@ -44,9 +44,8 @@ class CveModule(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')
 
     def compute(self, message):
-
-        item_id, count = message.split()
-        item = Item(item_id)
+        count = message
+        item = self.get_obj()
         item_id = item.get_id()
 
         cves = self.regex_findall(self.reg_cve, item_id, item.get_content())

@@ -55,15 +54,15 @@ class CveModule(AbstractModule):
             date = item.get_date()
             for cve_id in cves:
                 cve = Cves.Cve(cve_id)
-                cve.add(date, item_id)
+                cve.add(date, item)
 
             warning = f'{item_id} contains CVEs {cves}'
             print(warning)
             self.redis_logger.warning(warning)
 
-            msg = f'infoleak:automatic-detection="cve";{item_id}'
+            tag = 'infoleak:automatic-detection="cve"'
             # Send to Tags Queue
-            self.add_message_to_queue(msg, 'Tags')
+            self.add_message_to_queue(message=tag, queue='Tags')
 
 
 if __name__ == '__main__':
@@ -21,7 +21,6 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item
 from lib.objects.Decodeds import Decoded
 from trackers.Tracker_Term import Tracker_Term
 from trackers.Tracker_Regex import Tracker_Regex

@@ -87,18 +86,16 @@ class Decoder(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')
 
     def compute(self, message):
-
-        item = Item(message)
-        content = item.get_content()
-        date = item.get_date()
+        content = self.obj.get_content()
+        date = self.obj.get_date()
         new_decodeds = []
 
         for decoder in self.decoder_order:
             find = False
             dname = decoder['name']
 
-            encodeds = self.regex_findall(decoder['regex'], item.id, content)
-            # PERF remove encoded from item content
+            encodeds = self.regex_findall(decoder['regex'], self.obj.id, content)
+            # PERF remove encoded from obj content
             for encoded in encodeds:
                 content = content.replace(encoded, '', 1)
             encodeds = set(encodeds)

@@ -114,33 +111,34 @@ class Decoder(AbstractModule):
                 if not decoded.exists():
                     mimetype = decoded.guess_mimetype(decoded_file)
                     if not mimetype:
-                        print(sha1_string, item.id)
-                        raise Exception(f'Invalid mimetype: {decoded.id} {item.id}')
+                        print(sha1_string, self.obj.id)
+                        raise Exception(f'Invalid mimetype: {decoded.id} {self.obj.id}')
                     decoded.save_file(decoded_file, mimetype)
                     new_decodeds.append(decoded.id)
                 else:
                     mimetype = decoded.get_mimetype()
-                decoded.add(dname, date, item.id, mimetype=mimetype)
+                decoded.add(date, self.obj, dname, mimetype=mimetype)
 
                 # new_decodeds.append(decoded.id)
-                self.logger.info(f'{item.id} : {dname} - {decoded.id} - {mimetype}')
+                self.logger.info(f'{self.obj.id} : {dname} - {decoded.id} - {mimetype}')
 
             if find:
-                self.logger.info(f'{item.id} - {dname}')
+                self.logger.info(f'{self.obj.id} - {dname}')
 
                 # Send to Tags
-                msg = f'infoleak:automatic-detection="{dname}";{item.id}'
-                self.add_message_to_queue(msg, 'Tags')
+                tag = f'infoleak:automatic-detection="{dname}"'
+                self.add_message_to_queue(message=tag, queue='Tags')
 
         ####################
         # TRACKERS DECODED
         for decoded_id in new_decodeds:
             decoded = Decoded(decoded_id)
             try:
-                self.tracker_term.compute(decoded_id, obj_type='decoded')
-                self.tracker_regex.compute(decoded_id, obj_type='decoded')
+                self.tracker_term.compute_manual(decoded)
+                self.tracker_regex.compute_manual(decoded)
             except UnicodeDecodeError:
                 pass
-            self.tracker_yara.compute(decoded_id, obj_type='decoded')
+            self.tracker_yara.compute_manual(decoded)
 
 
 if __name__ == '__main__':
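The Decoder flow above (find encoded runs, strip each match from the content so later decoders do not re-scan it, identify the payload by SHA1) can be sketched for the base64 case. The regex and helper below are ours, not the module's configured patterns:

```python
import base64
import hashlib
import re

# Illustrative base64 run detector: at least 16 chars of full 4-char groups,
# plus an optional padded tail.
B64_REGEX = r'(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?'

def extract_base64(content):
    decodeds = []
    for encoded in set(re.findall(B64_REGEX, content)):
        # strip the match so it is not decoded twice (PERF note in the module)
        content = content.replace(encoded, '', 1)
        decoded_file = base64.b64decode(encoded)
        # the SHA1 of the decoded bytes serves as the Decoded object id
        sha1_string = hashlib.sha1(decoded_file).hexdigest()
        decodeds.append((sha1_string, decoded_file))
    return decodeds, content

payload = base64.b64encode(b'secret binary payload!!').decode()
found, remaining = extract_base64(f'some text {payload} more text')
```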
@@ -22,7 +22,6 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
-from lib.objects.Items import Item
 from lib.ConfigLoader import ConfigLoader
 from lib import d4
 

@@ -42,7 +41,13 @@ class DomClassifier(AbstractModule):
 
         addr_dns = config_loader.get_config_str("DomClassifier", "dns")
 
-        self.c = DomainClassifier.domainclassifier.Extract(rawtext="", nameservers=[addr_dns])
+        redis_host = config_loader.get_config_str('Redis_Cache', 'host')
+        redis_port = config_loader.get_config_int('Redis_Cache', 'port')
+        redis_db = config_loader.get_config_int('Redis_Cache', 'db')
+        self.dom_classifier = DomainClassifier.domainclassifier.Extract(rawtext="", nameservers=[addr_dns],
+                                                                        redis_host=redis_host,
+                                                                        redis_port=redis_port, redis_db=redis_db,
+                                                                        re_timeout=30)
 
         self.cc = config_loader.get_config_str("DomClassifier", "cc")
         self.cc_tld = config_loader.get_config_str("DomClassifier", "cc_tld")

@@ -51,38 +56,42 @@ class DomClassifier(AbstractModule):
         self.logger.info(f"Module: {self.module_name} Launched")
 
     def compute(self, message, r_result=False):
-        host, item_id = message.split()
+        host = message
 
-        item = Item(item_id)
+        item = self.get_obj()
         item_basename = item.get_basename()
         item_date = item.get_date()
         item_source = item.get_source()
         try:
-            self.c.text(rawtext=host)
-            print(self.c.domain)
-            self.c.validdomain(passive_dns=True, extended=False)
-            # self.logger.debug(self.c.vdomain)
+            self.dom_classifier.text(rawtext=host)
+            if not self.dom_classifier.domain:
+                return
+            print(self.dom_classifier.domain)
+            self.dom_classifier.validdomain(passive_dns=True, extended=False)
+            # self.logger.debug(self.dom_classifier.vdomain)
 
-            print(self.c.vdomain)
+            print(self.dom_classifier.vdomain)
             print()
 
-            if self.c.vdomain and d4.is_passive_dns_enabled():
-                for dns_record in self.c.vdomain:
-                    self.add_message_to_queue(dns_record)
+            if self.dom_classifier.vdomain and d4.is_passive_dns_enabled():
+                for dns_record in self.dom_classifier.vdomain:
+                    self.add_message_to_queue(obj=None, message=dns_record)
 
-            localizeddomains = self.c.include(expression=self.cc_tld)
-            if localizeddomains:
-                print(localizeddomains)
-                self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc_tld};{item.get_id()}")
+            if self.cc_tld:
+                localizeddomains = self.dom_classifier.include(expression=self.cc_tld)
+                if localizeddomains:
+                    print(localizeddomains)
+                    self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc_tld};{item.get_id()}")
 
-            localizeddomains = self.c.localizedomain(cc=self.cc)
-            if localizeddomains:
-                print(localizeddomains)
-                self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc};{item.get_id()}")
+            if self.cc:
+                localizeddomains = self.dom_classifier.localizedomain(cc=self.cc)
+                if localizeddomains:
+                    print(localizeddomains)
+                    self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc};{item.get_id()}")
 
             if r_result:
-                return self.c.vdomain
+                return self.dom_classifier.vdomain
 
         except IOError as err:
             self.redis_logger.error(f"Duplicate;{item_source};{item_date};{item_basename};CRC Checksum Failed")
@@ -52,7 +52,7 @@ class Duplicates(AbstractModule):
     def compute(self, message):
         # IOError: "CRC Checksum Failed on : {id}"
 
-        item = Item(message)
+        item = self.get_obj()
 
         # Check file size
         if item.get_size() < self.min_item_size:
66
bin/modules/Exif.py
Executable file

@@ -0,0 +1,66 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
"""
The Exif Module
======================

"""

##################################
# Import External packages
##################################
import os
import sys

from PIL import Image, ExifTags

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from modules.abstract_module import AbstractModule


class Exif(AbstractModule):
    """
    CveModule for AIL framework
    """

    def __init__(self):
        super(Exif, self).__init__()

        # Waiting time in seconds between to message processed
        self.pending_seconds = 1

        # Send module state to logs
        self.logger.info(f'Module {self.module_name} initialized')

    def compute(self, message):
        image = self.get_obj()
        print(image)
        img = Image.open(image.get_filepath())
        img_exif = img.getexif()
        print(img_exif)
        if img_exif:
            self.logger.critical(f'Exif: {self.get_obj().id}')
            gps = img_exif.get(34853)
            print(gps)
            self.logger.critical(f'gps: {gps}')
            for key, val in img_exif.items():
                if key in ExifTags.TAGS:
                    print(f'{ExifTags.TAGS[key]}:{val}')
                    self.logger.critical(f'{ExifTags.TAGS[key]}:{val}')
                else:
                    print(f'{key}:{val}')
                    self.logger.critical(f'{key}:{val}')
            sys.exit(0)

        # tag = 'infoleak:automatic-detection="cve"'
        # Send to Tags Queue
        # self.add_message_to_queue(message=tag, queue='Tags')


if __name__ == '__main__':

    module = Exif()
    module.run()
@@ -79,73 +79,58 @@ class Global(AbstractModule):
         self.time_last_stats = time.time()
         self.processed_item = 0
 
-    def compute(self, message, r_result=False):
-        # Recovering the streamed message informations
-        splitted = message.split()
+    def compute(self, message, r_result=False):  # TODO move OBJ ID sanitization to importer
+        # Recovering the streamed message infos
+        gzip64encoded = message
 
-        if len(splitted) == 2:
-            item, gzip64encoded = splitted
+        if self.obj.type == 'item':
+            if gzip64encoded:
 
-            # Remove ITEMS_FOLDER from item path (crawled item + submitted)
-            if self.ITEMS_FOLDER in item:
-                item = item.replace(self.ITEMS_FOLDER, '', 1)
+                # Creating the full filepath
+                filename = os.path.join(self.ITEMS_FOLDER, self.obj.id)
+                filename = os.path.realpath(filename)
 
-            file_name_item = item.split('/')[-1]
-            if len(file_name_item) > 255:
-                new_file_name_item = '{}{}.gz'.format(file_name_item[:215], str(uuid4()))
-                item = self.rreplace(item, file_name_item, new_file_name_item, 1)
+                # Incorrect filename
+                if not os.path.commonprefix([filename, self.ITEMS_FOLDER]) == self.ITEMS_FOLDER:
+                    self.logger.warning(f'Global; Path traversal detected (unknown)')
+                    print(f'Global; Path traversal detected (unknown)')
 
-            # Creating the full filepath
-            filename = os.path.join(self.ITEMS_FOLDER, item)
-            filename = os.path.realpath(filename)
+                else:
+                    # Decode compressed base64
+                    decoded = base64.standard_b64decode(gzip64encoded)
+                    new_file_content = self.gunzip_bytes_obj(filename, decoded)
 
-            # Incorrect filename
-            if not os.path.commonprefix([filename, self.ITEMS_FOLDER]) == self.ITEMS_FOLDER:
-                self.logger.warning(f'Global; Path traversal detected (unknown)')
-                print(f'Global; Path traversal detected (unknown)')
+                    # TODO REWRITE ME
+                    if new_file_content:
+                        filename = self.check_filename(filename, new_file_content)
 
+                        if filename:
+                            # create subdir
+                            dirname = os.path.dirname(filename)
+                            if not os.path.exists(dirname):
+                                os.makedirs(dirname)
+
+                            with open(filename, 'wb') as f:
+                                f.write(decoded)
+
+                            update_obj_date(self.obj.get_date(), 'item')
+
+                            self.add_message_to_queue(obj=self.obj, queue='Item')
+                            self.processed_item += 1
+
+                            print(self.obj.id)
+                            if r_result:
+                                return self.obj.id
 
-            else:
-                # Decode compressed base64
-                decoded = base64.standard_b64decode(gzip64encoded)
-                new_file_content = self.gunzip_bytes_obj(filename, decoded)
-
-                if new_file_content:
-                    filename = self.check_filename(filename, new_file_content)
-
-                    if filename:
-                        # create subdir
-                        dirname = os.path.dirname(filename)
-                        if not os.path.exists(dirname):
-                            os.makedirs(dirname)
-
-                        with open(filename, 'wb') as f:
-                            f.write(decoded)
-
-                        item_id = filename
-                        # remove self.ITEMS_FOLDER from
-                        if self.ITEMS_FOLDER in item_id:
-                            item_id = item_id.replace(self.ITEMS_FOLDER, '', 1)
-
-                        item = Item(item_id)
-
-                        update_obj_date(item.get_date(), 'item')
-
-                        self.add_message_to_queue(item_id, 'Item')
-                        self.processed_item += 1
-
-                        # DIRTY FIX AIL SYNC - SEND TO SYNC MODULE
-                        # # FIXME: DIRTY FIX
-                        message = f'{item.get_type()};{item.get_subtype(r_str=True)};{item.get_id()}'
-                        print(message)
-                        self.add_message_to_queue(message, 'Sync')
-
-                        print(item_id)
-                        if r_result:
-                            return item_id
-
-        else:
-            self.logger.debug(f"Empty Item: {message} not processed")
-            print(f"Empty Item: {message} not processed")
+            else:
+                self.logger.info(f"Empty Item: {message} not processed")
+        elif self.obj.type == 'message':
+            # TODO send to specific object queue => image, ...
+            self.add_message_to_queue(obj=self.obj, queue='Item')
+        elif self.obj.type == 'image':
+            self.add_message_to_queue(obj=self.obj, queue='Image')
+        else:
+            self.logger.critical(f"Empty obj: {self.obj} {message} not processed")
 
     def check_filename(self, filename, new_file_content):
         """
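Feed content handled by Global arrives gzip-compressed and base64-encoded. A minimal round-trip sketch of that encoding (the helper names are ours, not the module's):

```python
import base64
import gzip

def pack(content: bytes) -> str:
    # gzip-compress then base64-encode, the on-the-wire feed format
    return base64.standard_b64encode(gzip.compress(content)).decode()

def unpack(gzip64encoded: str) -> bytes:
    # reverse of pack(): base64-decode then gunzip
    decoded = base64.standard_b64decode(gzip64encoded)
    return gzip.decompress(decoded)

msg = pack(b'paste content')
```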
@@ -18,13 +18,14 @@ import os
 import re
 import sys
 
+import DomainClassifier.domainclassifier
+
 sys.path.append(os.environ['AIL_BIN'])
 ##################################
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item
 
 class Hosts(AbstractModule):
     """

@@ -43,29 +44,29 @@ class Hosts(AbstractModule):
         # Waiting time in seconds between to message processed
         self.pending_seconds = 1
 
-        self.host_regex = r'\b([a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)\b'
-        re.compile(self.host_regex)
-
+        redis_host = config_loader.get_config_str('Redis_Cache', 'host')
+        redis_port = config_loader.get_config_int('Redis_Cache', 'port')
+        redis_db = config_loader.get_config_int('Redis_Cache', 'db')
+        self.dom_classifier = DomainClassifier.domainclassifier.Extract(rawtext="",
+                                                                        redis_host=redis_host,
+                                                                        redis_port=redis_port,
+                                                                        redis_db=redis_db,
+                                                                        re_timeout=30)
         self.logger.info(f"Module: {self.module_name} Launched")
 
     def compute(self, message):
-        item = Item(message)
+        obj = self.get_obj()
 
         # mimetype = item_basic.get_item_mimetype(item.get_id())
         # if mimetype.split('/')[0] == "text":
 
-        content = item.get_content()
-        hosts = self.regex_findall(self.host_regex, item.get_id(), content)
-        if hosts:
-            print(f'{len(hosts)} host {item.get_id()}')
-            for host in hosts:
-                # print(host)
-
-                msg = f'{host} {item.get_id()}'
-                self.add_message_to_queue(msg, 'Host')
+        content = obj.get_content()
+        self.dom_classifier.text(content)
+        if self.dom_classifier.domain:
+            print(f'{len(self.dom_classifier.domain)} host {obj.get_id()}')
+            # print(self.dom_classifier.domain)
+            for domain in self.dom_classifier.domain:
+                if domain:
+                    self.add_message_to_queue(message=domain, queue='Host')
 
 
 if __name__ == '__main__':
 
     module = Hosts()
     module.run()
@@ -43,14 +43,15 @@ class IPAddress(AbstractModule):
         networks = config_loader.get_config_str("IP", "networks")
         if not networks:
             print('No IP ranges provided')
-            sys.exit(0)
-        try:
-            for network in networks.split(","):
-                self.ip_networks.add(IPv4Network(network))
-                print(f'IP Range To Search: {network}')
-        except:
-            print('Please provide a list of valid IP addresses')
-            sys.exit(0)
+            # sys.exit(0)
+        else:
+            try:
+                for network in networks.split(","):
+                    self.ip_networks.add(IPv4Network(network))
+                    print(f'IP Range To Search: {network}')
+            except:
+                print('Please provide a list of valid IP addresses')
+                sys.exit(0)

         self.re_ipv4 = r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
         re.compile(self.re_ipv4)
@@ -62,7 +63,10 @@ class IPAddress(AbstractModule):
         self.logger.info(f"Module {self.module_name} initialized")

     def compute(self, message, r_result=False):
-        item = Item(message)
+        if not self.ip_networks:
+            return None
+
+        item = self.get_obj()
         content = item.get_content()

         # list of the regex results in the Item
@@ -82,8 +86,8 @@ class IPAddress(AbstractModule):
             self.redis_logger.warning(f'{item.get_id()} contains {item.get_id()} IPs')

             # Tag message with IP
-            msg = f'infoleak:automatic-detection="ip";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="ip"'
+            self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == "__main__":

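The IPAddress hunks above gate the IPv4 regex matches against the configured `IPv4Network` ranges before tagging an item. A standalone sketch of that membership check (the function name and sample values are illustrative, not taken from the diff):

```python
import re
from ipaddress import IPv4Address, IPv4Network

# Same IPv4 pattern as the module's self.re_ipv4
RE_IPV4 = r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

def find_monitored_ips(content, networks):
    """Return IPv4 strings from content that fall inside any monitored network."""
    nets = [IPv4Network(n) for n in networks]
    found = []
    for match in re.findall(RE_IPV4, content):
        ip = IPv4Address(match)
        if any(ip in net for net in nets):
            found.append(match)
    return found

print(find_monitored_ips('seen 10.0.0.5 and 8.8.8.8', ['10.0.0.0/8']))  # ['10.0.0.5']
```

Note the early `return None` added to `compute()`: when no ranges are configured, the module now skips regex matching entirely instead of exiting at startup.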
@@ -73,7 +73,7 @@ class Iban(AbstractModule):
         return extracted

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         item_id = item.get_id()

         ibans = self.regex_findall(self.iban_regex, item_id, item.get_content())
@@ -97,8 +97,8 @@ class Iban(AbstractModule):
             to_print = f'Iban;{item.get_source()};{item.get_date()};{item.get_basename()};'
             self.redis_logger.warning(f'{to_print}Checked found {len(valid_ibans)} IBAN;{item_id}')
             # Tags
-            msg = f'infoleak:automatic-detection="iban";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="iban"'
+            self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == '__main__':

@@ -93,12 +93,12 @@ class Indexer(AbstractModule):
         self.last_refresh = time_now

     def compute(self, message):
-        docpath = message.split(" ", -1)[-1]
-
-        item = Item(message)
+        item = self.get_obj()
         item_id = item.get_id()
         item_content = item.get_content()

+        docpath = item_id
+
         self.logger.debug(f"Indexing - {self.indexname}: {docpath}")
         print(f"Indexing - {self.indexname}: {docpath}")

@@ -56,7 +56,7 @@ class Keys(AbstractModule):
         self.pending_seconds = 1

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         content = item.get_content()

         # find = False
@@ -65,107 +65,107 @@ class Keys(AbstractModule):
         if KeyEnum.PGP_MESSAGE.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a PGP enc message')

-            msg = f'infoleak:automatic-detection="pgp-message";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-message"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True
             # find = True

         if KeyEnum.PGP_PUBLIC_KEY_BLOCK.value in content:
-            msg = f'infoleak:automatic-detection="pgp-public-key-block";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-public-key-block"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True

         if KeyEnum.PGP_SIGNATURE.value in content:
-            msg = f'infoleak:automatic-detection="pgp-signature";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-signature"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True

         if KeyEnum.PGP_PRIVATE_KEY_BLOCK.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a pgp private key block message')

-            msg = f'infoleak:automatic-detection="pgp-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             get_pgp_content = True

         if KeyEnum.CERTIFICATE.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a certificate message')

-            msg = f'infoleak:automatic-detection="certificate";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="certificate"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.RSA_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a RSA private key message')
             print('rsa private key message found')

-            msg = f'infoleak:automatic-detection="rsa-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="rsa-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a private key message')
             print('private key message found')

-            msg = f'infoleak:automatic-detection="private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.ENCRYPTED_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an encrypted private key message')
             print('encrypted private key message found')

-            msg = f'infoleak:automatic-detection="encrypted-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="encrypted-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.OPENSSH_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an openssh private key message')
             print('openssh private key message found')

-            msg = f'infoleak:automatic-detection="private-ssh-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="private-ssh-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.SSH2_ENCRYPTED_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an ssh2 private key message')
             print('SSH2 private key message found')

-            msg = f'infoleak:automatic-detection="private-ssh-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="private-ssh-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.OPENVPN_STATIC_KEY_V1.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an openssh private key message')
             print('OpenVPN Static key message found')

-            msg = f'infoleak:automatic-detection="vpn-static-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="vpn-static-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.DSA_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a dsa private key message')

-            msg = f'infoleak:automatic-detection="dsa-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="dsa-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.EC_PRIVATE_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has an ec private key message')

-            msg = f'infoleak:automatic-detection="ec-private-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="ec-private-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         if KeyEnum.PUBLIC_KEY.value in content:
             self.redis_logger.warning(f'{item.get_basename()} has a public key message')

-            msg = f'infoleak:automatic-detection="public-key";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="public-key"'
+            self.add_message_to_queue(message=tag, queue='Tags')
             # find = True

         # pgp content
         if get_pgp_content:
-            self.add_message_to_queue(item.get_id(), 'PgpDump')
+            self.add_message_to_queue(queue='PgpDump')

         # if find :
         #     # Send to duplicate

@@ -25,11 +25,14 @@ class Languages(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        item = Item(message)
-        if item.is_crawled():
-            domain = Domain(item.get_domain())
-            for lang in item.get_languages(min_probability=0.8):
-                domain.add_language(lang.language)
+        obj = self.get_obj()
+
+        if obj.type == 'item':
+            if obj.is_crawled():
+                domain = Domain(obj.get_domain())
+                for lang in obj.get_languages(min_probability=0.8, force_gcld3=True):
+                    print(lang)
+                    domain.add_language(lang)


 if __name__ == '__main__':

@@ -25,9 +25,6 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.ConfigLoader import ConfigLoader
-from lib.objects.Items import Item
-# from lib import Statistics
-

 class LibInjection(AbstractModule):
     """docstring for LibInjection module."""
@@ -40,7 +37,8 @@ class LibInjection(AbstractModule):
         self.redis_logger.info(f"Module: {self.module_name} Launched")

     def compute(self, message):
-        url, item_id = message.split()
+        item = self.get_obj()
+        url = message

         self.faup.decode(url)
         url_parsed = self.faup.get()
@@ -68,7 +66,6 @@ class LibInjection(AbstractModule):
         # print(f'query is sqli : {result_query}')

         if result_path['sqli'] is True or result_query['sqli'] is True:
-            item = Item(item_id)
-            item_id = item.get_id()
+            item_id = item.get_id()
             print(f"Detected (libinjection) SQL in URL: {item_id}")
             print(unquote(url))
@@ -77,8 +74,8 @@ class LibInjection(AbstractModule):
             self.redis_logger.warning(to_print)

             # Add tag
-            msg = f'infoleak:automatic-detection="sql-injection";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="sql-injection"'
+            self.add_message_to_queue(message=tag, queue='Tags')

             # statistics
             # # # TODO: # FIXME: remove me

@@ -45,8 +45,9 @@ class MISP_Thehive_Auto_Push(AbstractModule):
             self.last_refresh = time.time()
             self.redis_logger.info('Tags Auto Push refreshed')

-        item_id, tag = message.split(';', 1)
-        item = Item(item_id)
+        tag = message
+        item = self.get_obj()
+        item_id = item.get_id()

         # enabled
         if 'misp' in self.tags:

@@ -135,11 +135,11 @@ class Mail(AbstractModule):

     # # TODO: sanitize mails
     def compute(self, message):
-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         item_date = item.get_date()

-        mails = self.regex_findall(self.email_regex, item_id, item.get_content())
+        mails = self.regex_findall(self.email_regex, item.id, item.get_content())
         mxdomains_email = {}
         for mail in mails:
             mxdomain = mail.rsplit('@', 1)[1].lower()
@@ -172,13 +172,13 @@ class Mail(AbstractModule):
         # for tld in mx_tlds:
         #     Statistics.add_module_tld_stats_by_date('mail', item_date, tld, mx_tlds[tld])

-        msg = f'Mails;{item.get_source()};{item_date};{item.get_basename()};Checked {num_valid_email} e-mail(s);{item_id}'
+        msg = f'Mails;{item.get_source()};{item_date};{item.get_basename()};Checked {num_valid_email} e-mail(s);{item.id}'
         if num_valid_email > self.mail_threshold:
-            print(f'{item_id} Checked {num_valid_email} e-mail(s)')
+            print(f'{item.id} Checked {num_valid_email} e-mail(s)')
             self.redis_logger.warning(msg)
             # Tags
-            msg = f'infoleak:automatic-detection="mail";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="mail"'
+            self.add_message_to_queue(message=tag, queue='Tags')
         elif num_valid_email > 0:
             self.redis_logger.info(msg)

@@ -9,7 +9,7 @@ This module is consuming the Redis-list created by the ZMQ_Feed_Q Module.
 This module take all the feeds provided in the config.

-Depending on the configuration, this module will process the feed as follow:
+Depending on the configuration, this module will process the feed as follows:
     operation_mode 1: "Avoid any duplicate from any sources"
         - The module maintain a list of content for each item
             - If the content is new, process it
@@ -64,9 +64,6 @@ class Mixer(AbstractModule):
         self.ttl_key = config_loader.get_config_int("Module_Mixer", "ttl_duplicate")
         self.default_feeder_name = config_loader.get_config_str("Module_Mixer", "default_unnamed_feed_name")

-        self.ITEMS_FOLDER = os.path.join(os.environ['AIL_HOME'], config_loader.get_config_str("Directories", "pastes")) + '/'
-        self.ITEMS_FOLDER = os.path.join(os.path.realpath(self.ITEMS_FOLDER), '')
-
         self.nb_processed_items = 0
         self.feeders_processed = {}
         self.feeders_duplicate = {}
@@ -131,37 +128,45 @@ class Mixer(AbstractModule):

             self.last_refresh = time.time()
             self.clear_feeders_stat()
             time.sleep(0.5)

     def computeNone(self):
         self.refresh_stats()

     def compute(self, message):
         self.refresh_stats()
+        # obj = self.obj
+        # TODO CHECK IF NOT self.object -> get object global ID from message

         splitted = message.split()
-        # Old Feeder name "feeder>>item_id gzip64encoded"
-        if len(splitted) == 2:
-            item_id, gzip64encoded = splitted
-            try:
-                feeder_name, item_id = item_id.split('>>')
-                feeder_name.replace(" ", "")
-                if 'import_dir' in feeder_name:
-                    feeder_name = feeder_name.split('/')[1]
-            except ValueError:
-                feeder_name = self.default_feeder_name
-        # Feeder name in message: "feeder item_id gzip64encoded"
-        elif len(splitted) == 3:
-            feeder_name, item_id, gzip64encoded = splitted
+        # message -> feeder_name - content
+        # or message -> feeder_name
+
+        # feeder_name - object
+        if len(splitted) == 1:  # feeder_name - object (content already saved)
+            feeder_name = message
+            gzip64encoded = None
+
+        # Feeder name in message: "feeder obj_id gzip64encoded"
+        elif len(splitted) == 2:  # gzip64encoded content
+            feeder_name, gzip64encoded = splitted
         else:
             print('Invalid message: not processed')
-            self.logger.debug(f'Invalid Item: {splitted[0]} not processed')
+            self.logger.warning(f'Invalid Message: {splitted} not processed')
             return None

-        # remove absolute path
-        item_id = item_id.replace(self.ITEMS_FOLDER, '', 1)
+        if self.obj.type == 'item':
+            # Remove ITEMS_FOLDER from item path (crawled item + submitted)
+            # Limit basename length
+            obj_id = self.obj.id
+            self.obj.sanitize_id()
+            if self.obj.id != obj_id:
+                self.queue.rename_message_obj(self.obj.id, obj_id)

-        relay_message = f'{item_id} {gzip64encoded}'
+        relay_message = gzip64encoded
         # print(relay_message)

+        # TODO only work for item object
         # Avoid any duplicate coming from any sources
         if self.operation_mode == 1:
             digest = hashlib.sha1(gzip64encoded.encode('utf8')).hexdigest()
@@ -173,7 +178,7 @@ class Mixer(AbstractModule):
                 self.r_cache.expire(digest, self.ttl_key)

             self.increase_stat_processed(feeder_name)
-            self.add_message_to_queue(relay_message)
+            self.add_message_to_queue(message=relay_message)

         # Need To Be Fixed, Currently doesn't check the source (-> same as operation 1)
         # # Keep duplicate coming from different sources
@@ -210,7 +215,10 @@ class Mixer(AbstractModule):
         # No Filtering
         else:
             self.increase_stat_processed(feeder_name)
-            self.add_message_to_queue(relay_message)
+            if self.obj.type == 'item':
+                self.add_message_to_queue(obj=self.obj, message=gzip64encoded)
+            else:
+                self.add_message_to_queue(obj=self.obj)


 if __name__ == "__main__":

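In operation mode 1, the Mixer deduplicates feeds by hashing the gzip64-encoded content with SHA-1 and caching the digest under a TTL (`ttl_duplicate`). A self-contained sketch of that logic, with an in-memory dict standing in for the Redis cache (class and parameter names are illustrative):

```python
import hashlib
import time

class DedupCache:
    """Minimal stand-in for the Redis digest cache used by the Mixer."""
    def __init__(self, ttl_seconds=86400):
        self.ttl = ttl_seconds
        self._seen = {}  # sha1 digest -> insertion timestamp

    def is_duplicate(self, gzip64encoded):
        digest = hashlib.sha1(gzip64encoded.encode('utf8')).hexdigest()
        now = time.time()
        # Drop expired digests (the Redis EXPIRE equivalent)
        self._seen = {d: t for d, t in self._seen.items() if now - t < self.ttl}
        if digest in self._seen:
            return True
        self._seen[digest] = now
        return False

cache = DedupCache(ttl_seconds=3600)
print(cache.is_duplicate('H4sIAAAA...'))  # False: first time seen
print(cache.is_duplicate('H4sIAAAA...'))  # True: duplicate within the TTL
```

In the module itself, only non-duplicate messages increase the feeder's processed counter and get relayed to the next queue.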
@@ -42,7 +42,8 @@ class Onion(AbstractModule):
         self.faup = crawlers.get_faup()

         # activate_crawler = p.config.get("Crawler", "activate_crawler")
+        self.har = config_loader.get_config_boolean('Crawler', 'default_har')
+        self.screenshot = config_loader.get_config_boolean('Crawler', 'default_screenshot')

         self.onion_regex = r"((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
         # self.i2p_regex = r"((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
@@ -69,8 +70,8 @@ class Onion(AbstractModule):
         onion_urls = []
         domains = []

-        item_id, score = message.split()
-        item = Item(item_id)
+        score = message
+        item = self.get_obj()
         item_content = item.get_content()

         # max execution time on regex
@@ -90,8 +91,9 @@ class Onion(AbstractModule):

         if onion_urls:
             if crawlers.is_crawler_activated():
-                for domain in domains:  # TODO LOAD DEFAULT SCREENSHOT + HAR
-                    task_uuid = crawlers.create_task(domain, parent=item.get_id(), priority=0)
+                for domain in domains:
+                    task_uuid = crawlers.create_task(domain, parent=item.get_id(), priority=0,
+                                                     har=self.har, screenshot=self.screenshot)
                     if task_uuid:
                         print(f'{domain} added to crawler queue: {task_uuid}')
             else:
@@ -100,8 +102,8 @@ class Onion(AbstractModule):
             self.redis_logger.warning(f'{to_print}Detected {len(domains)} .onion(s);{item.get_id()}')

             # TAG Item
-            msg = f'infoleak:automatic-detection="onion";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="onion"'
+            self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == "__main__":

144
bin/modules/Pasties.py
Executable file
@@ -0,0 +1,144 @@
+#!/usr/bin/env python3
+# -*-coding:UTF-8 -*
+"""
+The Pasties Module
+======================
+This module spots domain-pasties services for further processing
+"""
+
+##################################
+# Import External packages
+##################################
+import os
+import sys
+import time
+
+from pyfaup.faup import Faup
+
+sys.path.append(os.environ['AIL_BIN'])
+##################################
+# Import Project packages
+##################################
+from modules.abstract_module import AbstractModule
+from lib.ConfigLoader import ConfigLoader
+from lib import crawlers
+
+# TODO add url validator
+
+pasties_blocklist_urls = set()
+pasties_domains = {}
+
+
+class Pasties(AbstractModule):
+    """
+    Pasties module for AIL framework
+    """
+
+    def __init__(self):
+        super(Pasties, self).__init__()
+        self.faup = Faup()
+
+        config_loader = ConfigLoader()
+        self.r_cache = config_loader.get_redis_conn("Redis_Cache")
+
+        self.pasties = {}
+        self.urls_blocklist = set()
+        self.load_pasties_domains()
+
+        # Send module state to logs
+        self.logger.info(f'Module {self.module_name} initialized')
+
+    def load_pasties_domains(self):
+        self.pasties = {}
+        self.urls_blocklist = set()
+
+        domains_pasties = os.path.join(os.environ['AIL_HOME'], 'files/domains_pasties')
+        if os.path.exists(domains_pasties):
+            with open(domains_pasties) as f:
+                for line in f:
+                    url = line.strip()
+                    if url:  # TODO validate line
+                        self.faup.decode(url)
+                        url_decoded = self.faup.get()
+                        host = url_decoded['host']
+                        # if url_decoded.get('port', ''):
+                        #     host = f'{host}:{url_decoded["port"]}'
+                        path = url_decoded.get('resource_path', '')
+                        # print(url_decoded)
+                        if path and path != '/':
+                            if path[-1] != '/':
+                                path = f'{path}/'
+                        else:
+                            path = None
+
+                        if host in self.pasties:
+                            if path:
+                                self.pasties[host].add(path)
+                        else:
+                            if path:
+                                self.pasties[host] = {path}
+                            else:
+                                self.pasties[host] = set()
+
+        url_blocklist = os.path.join(os.environ['AIL_HOME'], 'files/domains_pasties_blacklist')
+        if os.path.exists(url_blocklist):
+            with open(url_blocklist) as f:
+                for line in f:
+                    url = line.strip()
+                    self.faup.decode(url)
+                    url_decoded = self.faup.get()
+                    host = url_decoded['host']
+                    # if url_decoded.get('port', ''):
+                    #     host = f'{host}:{url_decoded["port"]}'
+                    path = url_decoded.get('resource_path', '')
+                    url = f'{host}{path}'
+                    if url_decoded['query_string']:
+                        url = url + url_decoded['query_string']
+                    self.urls_blocklist.add(url)
+
+    def send_to_crawler(self, url, obj_id):
+        if not self.r_cache.exists(f'{self.module_name}:url:{url}'):
+            self.r_cache.set(f'{self.module_name}:url:{url}', int(time.time()))
+            self.r_cache.expire(f'{self.module_name}:url:{url}', 86400)
+            crawlers.create_task(url, depth=0, har=False, screenshot=False, proxy='force_tor', priority=60, parent=obj_id)
+
+    def compute(self, message):
+        url = message.split()
+
+        self.faup.decode(url)
+        url_decoded = self.faup.get()
+        # print(url_decoded)
+        url_host = url_decoded['host']
+        # if url_decoded.get('port', ''):
+        #     url_host = f'{url_host}:{url_decoded["port"]}'
+        path = url_decoded.get('resource_path', '')
+        if url_host in self.pasties:
+            if url.startswith('http://'):
+                if url[7:] in self.urls_blocklist:
+                    return None
+            elif url.startswith('https://'):
+                if url[8:] in self.urls_blocklist:
+                    return None
+            else:
+                if url in self.urls_blocklist:
+                    return None
+
+            if not self.pasties[url_host]:
+                if path and path != '/':
+                    print('send to crawler', url_host, url)
+                    self.send_to_crawler(url, self.obj.id)
+            else:
+                if path.endswith('/'):
+                    path_end = path[:-1]
+                else:
+                    path_end = f'{path}/'
+                for url_path in self.pasties[url_host]:
+                    if path.startswith(url_path):
+                        if url_path != path and url_path != path_end:
+                            print('send to crawler', url_path, url)
+                            self.send_to_crawler(url, self.obj.id)
+                        break
+
+
+if __name__ == '__main__':
+    module = Pasties()
+    module.run()

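The matching logic in the new Pasties module boils down to: a known pastie host with no registered path prefixes sends any non-root URL to the crawler, while a host with registered prefixes only sends URLs strictly below one of them (the prefix itself is skipped). A condensed, hypothetical re-implementation of that check on a plain dict (the hosts in the example are illustrative, not from the shipped `files/domains_pasties` list):

```python
def should_crawl(host, path, pasties):
    """pasties maps a pastie host to a set of monitored path prefixes;
    an empty set means any non-root path on that host is of interest."""
    if host not in pasties:
        return False
    if not pasties[host]:
        return bool(path) and path != '/'
    # Compare against both 'prefix' and 'prefix/' forms, as the module does
    path_end = path[:-1] if path.endswith('/') else f'{path}/'
    for url_path in pasties[host]:
        if path.startswith(url_path) and url_path not in (path, path_end):
            return True
    return False

pasties = {'pastebin.example': set(), 'gist.example': {'/raw/'}}
print(should_crawl('pastebin.example', '/Ab12Cd', pasties))   # True
print(should_crawl('gist.example', '/raw/xyz', pasties))      # True
print(should_crawl('gist.example', '/raw/', pasties))         # False (the prefix itself)
```

Matched URLs are then rate-limited through the Redis cache (`EXPIRE` of one day per URL) before a low-priority crawler task is created.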
@@ -24,7 +24,6 @@ sys.path.append(os.environ['AIL_BIN'])
 ##################################
 from modules.abstract_module import AbstractModule
 from lib.objects import Pgps
-from lib.objects.Items import Item
 from trackers.Tracker_Term import Tracker_Term
 from trackers.Tracker_Regex import Tracker_Regex
 from trackers.Tracker_Yara import Tracker_Yara
@@ -61,7 +60,6 @@ class PgpDump(AbstractModule):
         self.tracker_yara = Tracker_Yara(queue=False)

         # init
-        self.item_id = None
         self.keys = set()
         self.private_keys = set()
         self.names = set()
@@ -93,11 +91,11 @@ class PgpDump(AbstractModule):
             print()
         pgp_block = self.remove_html(pgp_block)
         # Remove Version
-        versions = self.regex_findall(self.reg_tool_version, self.item_id, pgp_block)
+        versions = self.regex_findall(self.reg_tool_version, self.obj.id, pgp_block)
         for version in versions:
             pgp_block = pgp_block.replace(version, '')
         # Remove Comment
-        comments = self.regex_findall(self.reg_block_comment, self.item_id, pgp_block)
+        comments = self.regex_findall(self.reg_block_comment, self.obj.id, pgp_block)
         for comment in comments:
             pgp_block = pgp_block.replace(comment, '')
         # Remove Empty Lines
@@ -130,7 +128,7 @@ class PgpDump(AbstractModule):
         try:
             output = output.decode()
         except UnicodeDecodeError:
-            self.logger.error(f'Error PgpDump UnicodeDecodeError: {self.item_id}')
+            self.logger.error(f'Error PgpDump UnicodeDecodeError: {self.obj.id}')
             output = ''
         return output

@@ -145,7 +143,7 @@ class PgpDump(AbstractModule):
             private = True
         else:
             private = False
-        users = self.regex_findall(self.reg_user_id, self.item_id, pgpdump_output)
+        users = self.regex_findall(self.reg_user_id, self.obj.id, pgpdump_output)
         for user in users:
             # avoid key injection in user_id:
             pgpdump_output.replace(user, '', 1)
@@ -159,7 +157,7 @@ class PgpDump(AbstractModule):
                 name = user
             self.names.add(name)

-        keys = self.regex_findall(self.reg_key_id, self.item_id, pgpdump_output)
+        keys = self.regex_findall(self.reg_key_id, self.obj.id, pgpdump_output)
         for key_id in keys:
             key_id = key_id.replace('Key ID - ', '', 1)
             if key_id != '0x0000000000000000':
@@ -171,28 +169,26 @@ class PgpDump(AbstractModule):
             print('symmetrically encrypted')

     def compute(self, message):
-        item = Item(message)
-        self.item_id = item.get_id()
-        content = item.get_content()
+        content = self.obj.get_content()

         pgp_blocks = []
         # Public Block
-        for pgp_block in self.regex_findall(self.reg_pgp_public_blocs, self.item_id, content):
+        for pgp_block in self.regex_findall(self.reg_pgp_public_blocs, self.obj.id, content):
             # content = content.replace(pgp_block, '')
             pgp_block = self.sanitize_pgp_block(pgp_block)
             pgp_blocks.append(pgp_block)
         # Private Block
-        for pgp_block in self.regex_findall(self.reg_pgp_private_blocs, self.item_id, content):
+        for pgp_block in self.regex_findall(self.reg_pgp_private_blocs, self.obj.id, content):
             # content = content.replace(pgp_block, '')
             pgp_block = self.sanitize_pgp_block(pgp_block)
             pgp_blocks.append(pgp_block)
         # Signature
-        for pgp_block in self.regex_findall(self.reg_pgp_signature, self.item_id, content):
+        for pgp_block in self.regex_findall(self.reg_pgp_signature, self.obj.id, content):
             # content = content.replace(pgp_block, '')
             pgp_block = self.sanitize_pgp_block(pgp_block)
             pgp_blocks.append(pgp_block)
         # Message
-        for pgp_block in self.regex_findall(self.reg_pgp_message, self.item_id, content):
+        for pgp_block in self.regex_findall(self.reg_pgp_message, self.obj.id, content):
             pgp_block = self.sanitize_pgp_block(pgp_block)
             pgp_blocks.append(pgp_block)

@@ -206,26 +202,26 @@ class PgpDump(AbstractModule):
             self.extract_id_from_pgpdump_output(pgpdump_output)

         if self.keys or self.names or self.mails:
-            print(self.item_id)
-            date = item.get_date()
+            print(self.obj.id)
+            date = self.obj.get_date()
             for key in self.keys:
                 pgp = Pgps.Pgp(key, 'key')
-                pgp.add(date, self.item_id)
+                pgp.add(date, self.obj)
                 print(f' key: {key}')
             for name in self.names:
                 pgp = Pgps.Pgp(name, 'name')
-                pgp.add(date, self.item_id)
+                pgp.add(date, self.obj)
                 print(f' name: {name}')
-                self.tracker_term.compute(name, obj_type='pgp', subtype='name')
-                self.tracker_regex.compute(name, obj_type='pgp', subtype='name')
-                self.tracker_yara.compute(name, obj_type='pgp', subtype='name')
+                self.tracker_term.compute_manual(pgp)
+                self.tracker_regex.compute_manual(pgp)
+                self.tracker_yara.compute_manual(pgp)
             for mail in self.mails:
                 pgp = Pgps.Pgp(mail, 'mail')
-                pgp.add(date, self.item_id)
+                pgp.add(date, self.obj)
                 print(f' mail: {mail}')
-                self.tracker_term.compute(mail, obj_type='pgp', subtype='mail')
-                self.tracker_regex.compute(mail, obj_type='pgp', subtype='mail')
-                self.tracker_yara.compute(mail, obj_type='pgp', subtype='mail')
+                self.tracker_term.compute_manual(pgp)
+                self.tracker_regex.compute_manual(pgp)
+                self.tracker_yara.compute_manual(pgp)

         # Keys extracted from PGP PRIVATE KEY BLOCK
         for key in self.private_keys:
@@ -234,11 +230,10 @@ class PgpDump(AbstractModule):
             print(f' private key: {key}')

         if self.symmetrically_encrypted:
-            msg = f'infoleak:automatic-detection="pgp-symmetric";{self.item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="pgp-symmetric"'
+            self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == '__main__':
     module = PgpDump()
     module.run()

@@ -43,13 +43,13 @@ class Phone(AbstractModule):

     def extract(self, obj_id, content, tag):
         extracted = []
-        phones = self.regex_phone_iter('US', obj_id, content)
+        phones = self.regex_phone_iter('ZZ', obj_id, content)
         for phone in phones:
             extracted.append([phone[0], phone[1], phone[2], f'tag:{tag}'])
         return extracted

     def compute(self, message):
-        item = Item(message)
+        item = self.get_obj()
         content = item.get_content()

         # TODO use language detection to choose the country code ?
@@ -59,8 +59,8 @@ class Phone(AbstractModule):

         if results:
             # TAGS
-            msg = f'infoleak:automatic-detection="phone-number";{item.get_id()}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="phone-number"'
+            self.add_message_to_queue(message=tag, queue='Tags')

             self.redis_logger.warning(f'{item.get_id()} contains {len(phone)} Phone numbers')

@@ -44,22 +44,21 @@ class SQLInjectionDetection(AbstractModule):
         self.logger.info(f"Module: {self.module_name} Launched")

     def compute(self, message):
-        url, item_id = message.split()
+        url = message
+        item = self.get_obj()

         if self.is_sql_injection(url):
             self.faup.decode(url)
             url_parsed = self.faup.get()

-            item = Item(item_id)
+            item_id = item.get_id()
             print(f"Detected SQL in URL: {item_id}")
             print(urllib.request.unquote(url))
             to_print = f'SQLInjection;{item.get_source()};{item.get_date()};{item.get_basename()};Detected SQL in URL;{item_id}'
             self.redis_logger.warning(to_print)

             # Tag
-            msg = f'infoleak:automatic-detection="sql-injection";{item_id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = f'infoleak:automatic-detection="sql-injection";{item_id}'
+            self.add_message_to_queue(message=tag, queue='Tags')

             # statistics
             # tld = url_parsed['tld']

@@ -16,8 +16,6 @@ import gzip
 import base64
-import datetime
-import time
 # from sflock.main import unpack
 # import sflock

 sys.path.append(os.environ['AIL_BIN'])
 ##################################

@@ -27,7 +25,7 @@ from modules.abstract_module import AbstractModule
 from lib.objects.Items import ITEMS_FOLDER
 from lib import ConfigLoader
-from lib import Tag
+from lib.objects.Items import Item

 class SubmitPaste(AbstractModule):
     """

@@ -48,7 +46,6 @@ class SubmitPaste(AbstractModule):
         """
         super(SubmitPaste, self).__init__()

         # TODO KVROCKS
-        self.r_serv_db = ConfigLoader.ConfigLoader().get_db_conn("Kvrocks_DB")
         self.r_serv_log_submit = ConfigLoader.ConfigLoader().get_redis_conn("Redis_Log_submit")

@@ -279,9 +276,11 @@ class SubmitPaste(AbstractModule):
         rel_item_path = save_path.replace(self.PASTES_FOLDER, '', 1)
         self.redis_logger.debug(f"relative path {rel_item_path}")

+        item = Item(rel_item_path)
+
         # send paste to Global module
-        relay_message = f"submitted {rel_item_path} {gzip64encoded}"
-        self.add_message_to_queue(relay_message)
+        relay_message = f"submitted {gzip64encoded}"
+        self.add_message_to_queue(obj=item, message=relay_message)

         # add tags
         for tag in ltags:

@@ -20,9 +20,6 @@ sys.path.append(os.environ['AIL_BIN'])
 # Import Project packages
 ##################################
 from modules.abstract_module import AbstractModule
-from lib.objects.Items import Item
-from lib import Tag

 class Tags(AbstractModule):
     """

@@ -39,26 +36,15 @@ class Tags(AbstractModule):
         self.logger.info(f'Module {self.module_name} initialized')

     def compute(self, message):
-        # Extract item ID and tag from message
-        mess_split = message.split(';')
-        if len(mess_split) == 2:
-            tag = mess_split[0]
-            item = Item(mess_split[1])
+        item = self.obj
+        tag = message

-            # Create a new tag
-            Tag.add_object_tag(tag, 'item', item.get_id())
-            print(f'{item.get_id()}: Tagged {tag}')
-
-            # Forward message to channel
-            self.add_message_to_queue(message, 'Tag_feed')
-
-            message = f'{item.get_type()};{item.get_subtype(r_str=True)};{item.get_id()}'
-            self.add_message_to_queue(message, 'Sync')
-
-        else:
-            # Malformed message
-            raise Exception(f'too many values to unpack (expected 2) given {len(mess_split)} with message {message}')
+        # Create a new tag
+        item.add_tag(tag)
+        print(f'{item.get_id()}: Tagged {tag}')
+
+        # Forward message to channel
+        self.add_message_to_queue(message=tag, queue='Tag_feed')

 if __name__ == '__main__':
     module = Tags()

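The Tags hunk above changes the module's queue contract: previously one string carried both the tag and the item ID (`tag;item_id`) and the module split it, whereas now the object travels out-of-band (via `self.obj`) and the message is the bare tag. A self-contained sketch contrasting the two parsing styles (function names and the sample item path are illustrative, not AIL code):

```python
def parse_old_tags_message(message):
    """Old contract: 'tag;item_id' packed into one queue message."""
    parts = message.split(';')
    if len(parts) != 2:
        # Mirrors the removed 'too many values to unpack' error path.
        raise ValueError(f'expected 2 fields, got {len(parts)}: {message}')
    tag, item_id = parts
    return tag, item_id


def parse_new_tags_message(message):
    """New contract: the message is the bare tag; the tagged object is
    delivered separately (self.obj in a real module)."""
    return message


old_tag, old_item = parse_old_tags_message(
    'infoleak:automatic-detection="pgp-message";submitted/2023/01/01/abc.gz')
new_tag = parse_new_tags_message('infoleak:automatic-detection="pgp-message"')
print(old_tag == new_tag)  # → True
```

The same migration pattern (bare tag plus `queue='Tags'` keyword argument) recurs in the PgpDump, Phone, SQLInjectionDetection and Telegram hunks.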
@@ -41,7 +41,7 @@ class Telegram(AbstractModule):
         self.logger.info(f"Module {self.module_name} initialized")

     def compute(self, message, r_result=False):
-        item = Item(message)
+        item = self.get_obj()
         item_content = item.get_content()
         item_date = item.get_date()

@@ -58,7 +58,7 @@ class Telegram(AbstractModule):
             user_id = dict_url.get('username')
             if user_id:
                 username = Username(user_id, 'telegram')
-                username.add(item_date, item.id)
+                username.add(item_date, item)
                 print(f'username: {user_id}')
             invite_hash = dict_url.get('invite_hash')
             if invite_hash:

@@ -73,7 +73,7 @@ class Telegram(AbstractModule):
             user_id = dict_url.get('username')
             if user_id:
                 username = Username(user_id, 'telegram')
-                username.add(item_date, item.id)
+                username.add(item_date, item)
                 print(f'username: {user_id}')
             invite_hash = dict_url.get('invite_hash')
             if invite_hash:

@@ -86,8 +86,8 @@ class Telegram(AbstractModule):
         # CREATE TAG
         if invite_code_found:
             # tags
-            msg = f'infoleak:automatic-detection="telegram-invite-hash";{item.id}'
-            self.add_message_to_queue(msg, 'Tags')
+            tag = 'infoleak:automatic-detection="telegram-invite-hash"'
+            self.add_message_to_queue(message=tag, queue='Tags')


 if __name__ == "__main__":

@@ -30,15 +30,15 @@ class Template(AbstractModule):
     def __init__(self):
         super(Template, self).__init__()

-        # Pending time between two computation (computeNone) in seconds
-        self.pending_seconds = 10
+        # Pending time between two computation (computeNone) in seconds, 10 by default
+        # self.pending_seconds = 10

-        # Send module state to logs
+        # logs
         self.logger.info(f'Module {self.module_name} initialized')

     # def computeNone(self):
     #     """
-    #     Do something when there is no message in the queue
+    #     Do something when there is no message in the queue. Optional
     #     """
     #     self.logger.debug("No message in queue")

@@ -53,6 +53,5 @@ class Template(AbstractModule):


 if __name__ == '__main__':
-
     module = Template()
     module.run()
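The Template hunk documents the framework's polling behaviour: when a module's input queue is empty, it waits `pending_seconds` (10 by default) and then invokes the optional `computeNone()` hook. A toy, self-contained sketch of that loop (this is an illustration of the documented behaviour, not AIL's actual `AbstractModule`):

```python
import time


class MiniModule:
    """Toy sketch of the run loop implied by the Template comments."""

    def __init__(self):
        self.pending_seconds = 10  # default suggested by the Template diff
        self.queue = []

    def compute(self, message):
        return f'processed {message}'

    def computeNone(self):
        # Optional hook: called when there is no message in the queue.
        return 'no message in queue'

    def run_once(self, sleep=time.sleep):
        if self.queue:
            return self.compute(self.queue.pop(0))
        sleep(self.pending_seconds)  # sleep is injectable for testing
        return self.computeNone()


m = MiniModule()
m.queue.append('msg1')
print(m.run_once())                      # → processed msg1
print(m.run_once(sleep=lambda s: None))  # → no message in queue
```

In a real module, only `compute()` (and optionally `computeNone()` and `pending_seconds`) need to be defined; the loop itself lives in the base class.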