mirror of
https://github.com/ail-project/ail-framework.git
synced 2025-02-16 22:36:25 +00:00
Merge branch 'master' into flask-session-fix
commit
316831d337
264 changed files with 16020 additions and 4155 deletions
1
.gitignore
vendored
@ -16,6 +16,7 @@ tlsh
Blooms
PASTES
CRAWLED_SCREENSHOT
IMAGES
BASE64
HASHS
DATA_ARDB
72
HOWTO.md
|
@ -1,17 +1,16 @@
|
|||
|
||||
# Feeding, adding new features and contributing
|
||||
# Feeding, Adding new features and Contributing
|
||||
|
||||
## [Documentation AIL Importers](./doc/README.md#ail-importers)
|
||||
## [AIL Importers](./doc/README.md#ail-importers)
|
||||
|
||||
[Documentation AIL Importers](./doc/README.md#ail-importers)
|
||||
Refer to the [AIL Importers Documentation](./doc/README.md#ail-importers)
|
||||
|
||||
## How to feed the AIL framework
|
||||
## Feeding Data to AIL
|
||||
|
||||
AIL is an analysis tool, not a collector!
|
||||
However, if you want to collect some pastes and feed them to AIL, the options are described below. Please moderate your queries!
|
||||
|
||||
1. [AIL Importers](./doc/README.md#ail-importers)
|
||||
|
||||
2. ZMQ: Become a CIRCL collaborator and request access to our feed. It will be sent to the static IP address you use for AIL.
|
||||
|
||||
## How to create a new module
|
||||
|
@ -19,22 +18,16 @@ However, if you want to collect some pastes and feed them to AIL, the procedure
|
|||
To add a new processing or analysis module to AIL, follow these steps:
|
||||
|
||||
1. Add your module name in [./configs/modules.cfg](./configs/modules.cfg) and subscribe to at least one module (usually `Item`).
|
||||
|
||||
2. Use [./bin/modules/modules/TemplateModule.py](./bin/modules/modules/TemplateModule.py) as a starting point and create a new file in `bin/modules` named after the module you declared in `modules.cfg`; a minimal skeleton is sketched below.
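As a rough illustration of what such a module looks like (a minimal sketch based on the module pattern used elsewhere in this changeset; the class name and processing logic are placeholders to adapt):

```python
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
"""
MyModule: illustrative skeleton of an AIL processing module (adapt names and logic).
"""
import os
import sys

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from modules.abstract_module import AbstractModule


class MyModule(AbstractModule):
    """Minimal module: processes messages received from the queues subscribed in modules.cfg."""

    def __init__(self):
        super(MyModule, self).__init__()
        # Waiting time in seconds between two processed messages
        self.pending_seconds = 10
        self.logger.info(f'Module {self.module_name} initialized')

    def compute(self, message):
        # Processing logic for one queued message goes here
        print(message)


if __name__ == '__main__':
    module = MyModule()
    module.run()
```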
|
||||
|
||||
|
||||
## How to contribute a module
|
||||
## Contributions
|
||||
|
||||
Feel free to fork the code, play with it, make some patches or add additional analysis modules.
|
||||
Contributions are welcome! Fork the repository, experiment with the code, and submit your modules or patches through a pull request.
|
||||
|
||||
To contribute your module, feel free to pull your contribution.
|
||||
## Crawler
|
||||
|
||||
|
||||
## Additional information
|
||||
|
||||
### Crawler
|
||||
|
||||
In AIL, you can crawl websites and Tor hidden services. Don't forget to review the proxy configuration of your Tor client and especially if you enabled the SOCKS5 proxy
|
||||
AIL supports crawling of websites and Tor hidden services. Ensure your Tor client's proxy configuration is correct, especially the SOCKS5 proxy settings.
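If you want to confirm that the Tor SOCKS5 proxy is reachable before launching crawls, a quick check along these lines can help (assuming a local Tor client on its usual default SOCKS port 9050; adjust host and port to your setup):

```bash
curl --socks5-hostname 127.0.0.1:9050 -s https://check.torproject.org/ | grep -i congratulations
```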
|
||||
|
||||
### Installation
|
||||
|
||||
|
@ -45,38 +38,35 @@ In AIL, you can crawl websites and Tor hidden services. Don't forget to review t
|
|||
1. Lacus URL:
|
||||
In the web interface, go to `Crawlers` > `Settings` and click on the Edit button
|
||||
|
||||
![Splash Manager Config](./doc/screenshots/lacus_config.png?raw=true "AIL Lacus Config")
|
||||
![AIL Crawler Config](./doc/screenshots/lacus_config.png?raw=true "AIL Lacus Config")
|
||||
|
||||
![Splash Manager Config](./doc/screenshots/lacus_config_edit.png?raw=true "AIL Lacus Config")
|
||||
![AIL Crawler Config Edit](./doc/screenshots/lacus_config_edit.png?raw=true "AIL Lacus Config")
|
||||
|
||||
2. Launch AIL Crawlers:
|
||||
2. Number of Crawlers:
|
||||
Choose the number of crawlers you want to launch
|
||||
|
||||
![Splash Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures.png?raw=true "AIL Lacus Nb Crawlers Config")
|
||||
![Crawler Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures.png?raw=true "AIL Lacus Nb Crawlers Config")
|
||||
|
||||
![Splash Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures_edit.png?raw=true "AIL Lacus Nb Crawlers Config")
|
||||
![Crawler Manager Nb Crawlers Config](./doc/screenshots/crawler_nb_captures_edit.png?raw=true "AIL Lacus Nb Crawlers Config")
|
||||
|
||||
## Chats Translation with LibreTranslate
|
||||
|
||||
### Kvrocks Migration
|
||||
---------------------
|
||||
**Important Note:** We are currently working on a [migration script](https://github.com/ail-project/ail-framework/blob/master/update/v5.0/DB_KVROCKS_MIGRATION.py) to facilitate the migration to Kvrocks.
Chat messages can be translated using [LibreTranslate](https://github.com/LibreTranslate/LibreTranslate), an open-source, self-hosted machine translation service.
|
||||
|
||||
Please note that the current version of this migration script only supports migrating the database on the same server.
|
||||
(If you plan to migrate to another server, we will provide additional instructions in this section once the migration script is completed)
|
||||
### Installation:
|
||||
1. Install LibreTranslate by running the following command:
|
||||
```bash
|
||||
pip install libretranslate
|
||||
```
|
||||
2. Run libretranslate:
|
||||
```bash
|
||||
libretranslate
|
||||
```
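By default LibreTranslate listens on `http://127.0.0.1:5000`; if you run it on another host or port, adjust the URL in the configuration shown below accordingly.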
|
||||
|
||||
### Configuration:
|
||||
To enable LibreTranslate for chat translation, edit the LibreTranslate URL in the [./configs/core.cfg](./configs/core.cfg) file under the [Translation] section.
|
||||
```
|
||||
[Translation]
|
||||
libretranslate = http://127.0.0.1:5000
|
||||
```
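To verify that the configured instance is reachable, a quick request against the LibreTranslate `/translate` endpoint can be used (a sketch; adapt the URL to your configuration):

```bash
curl -s -X POST http://127.0.0.1:5000/translate \
  -H "Content-Type: application/json" \
  -d '{"q": "bonjour", "source": "auto", "target": "en", "format": "text"}'
```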
|
||||
|
||||
To migrate your database to Kvrocks:
|
||||
1. Launch ARDB and Kvrocks
|
||||
2. Pull from remote
|
||||
```shell
|
||||
git checkout master
|
||||
git pull
|
||||
```
|
||||
3. Launch the migration script:
|
||||
```shell
|
||||
git checkout master
|
||||
git pull
|
||||
cd update/v5.0
|
||||
./DB_KVROCKS_MIGRATION.py
|
||||
```
|
||||
|
|
|
@ -138,7 +138,7 @@ CIRCL organises training on how to use or extend the AIL framework. AIL training
|
|||
|
||||
## API
|
||||
|
||||
The API documentation is available in [doc/README.md](doc/README.md)
|
||||
The API documentation is available in [doc/api.md](doc/api.md)
|
||||
|
||||
## HOWTO
|
||||
|
||||
|
|
|
@ -20,7 +20,7 @@ if [ -e "${DIR}/AILENV/bin/python" ]; then
|
|||
export AIL_VENV=${AIL_HOME}/AILENV/
|
||||
. ./AILENV/bin/activate
|
||||
else
|
||||
echo "Please make sure you have a AIL-framework environment, au revoir"
|
||||
echo "Please make sure AILENV is installed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
|
@ -29,19 +29,21 @@ export AIL_APP_SECRET_KEY="$(uuidgen)"
|
|||
export PATH=$AIL_VENV/bin:$PATH
|
||||
export PATH=$AIL_HOME:$PATH
|
||||
export PATH=$AIL_REDIS:$PATH
|
||||
export PATH=$AIL_ARDB:$PATH
|
||||
export PATH=$AIL_KVROCKS:$PATH
|
||||
export PATH=$AIL_BIN:$PATH
|
||||
export PATH=$AIL_FLASK:$PATH
|
||||
|
||||
isredis=`screen -ls | egrep '[0-9]+.Redis_AIL' | cut -d. -f1`
|
||||
isardb=`screen -ls | egrep '[0-9]+.ARDB_AIL' | cut -d. -f1`
|
||||
iskvrocks=`screen -ls | egrep '[0-9]+.KVROCKS_AIL' | cut -d. -f1`
|
||||
islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
|
||||
is_ail_core=`screen -ls | egrep '[0-9]+.Core_AIL' | cut -d. -f1`
|
||||
is_ail_2_ail=`screen -ls | egrep '[0-9]+.AIL_2_AIL' | cut -d. -f1`
|
||||
isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
|
||||
isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
|
||||
isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
|
||||
function check_screens {
|
||||
isredis=`screen -ls | egrep '[0-9]+.Redis_AIL' | cut -d. -f1`
|
||||
isardb=`screen -ls | egrep '[0-9]+.ARDB_AIL' | cut -d. -f1`
|
||||
iskvrocks=`screen -ls | egrep '[0-9]+.KVROCKS_AIL' | cut -d. -f1`
|
||||
islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
|
||||
is_ail_core=`screen -ls | egrep '[0-9]+.Core_AIL' | cut -d. -f1`
|
||||
is_ail_2_ail=`screen -ls | egrep '[0-9]+.AIL_2_AIL' | cut -d. -f1`
|
||||
isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
|
||||
isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
|
||||
isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
|
||||
}
|
||||
|
||||
function helptext {
|
||||
echo -e $YELLOW"
|
||||
|
@ -61,7 +63,6 @@ function helptext {
|
|||
- All the queuing modules.
|
||||
- All the processing modules.
|
||||
- All Redis in memory servers.
|
||||
- All ARDB on disk servers.
|
||||
- All KVROCKS servers.
|
||||
"$DEFAULT"
|
||||
(Inside screen Daemons)
|
||||
|
@ -71,6 +72,7 @@ function helptext {
|
|||
LAUNCH.sh
|
||||
[-l | --launchAuto] LAUNCH DB + Scripts
|
||||
[-k | --killAll] Kill DB + Scripts
|
||||
[-r | --restart] Restart
|
||||
[-ks | --killscript] Scripts
|
||||
[-u | --update] Update AIL
|
||||
[-ut | --thirdpartyUpdate] Update UI/Frontend
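For reference, typical invocations of the options above look like this (run from the AIL root directory):

```bash
./LAUNCH.sh -l     # launch DB + scripts
./LAUNCH.sh -ks    # kill the scripts only
./LAUNCH.sh -r     # restart: kill everything, then relaunch automatically
./LAUNCH.sh -u     # update AIL
```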
|
||||
|
@ -267,14 +269,17 @@ function launching_scripts {
|
|||
sleep 0.1
|
||||
screen -S "Script_AIL" -X screen -t "SQLInjectionDetection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./SQLInjectionDetection.py; read x"
|
||||
sleep 0.1
|
||||
screen -S "Script_AIL" -X screen -t "LibInjection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./LibInjection.py; read x"
|
||||
sleep 0.1
|
||||
screen -S "Script_AIL" -X screen -t "Zerobins" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Zerobins.py; read x"
|
||||
sleep 0.1
|
||||
# screen -S "Script_AIL" -X screen -t "LibInjection" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./LibInjection.py; read x"
|
||||
# sleep 0.1
|
||||
# screen -S "Script_AIL" -X screen -t "Pasties" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Pasties.py; read x"
|
||||
# sleep 0.1
|
||||
|
||||
screen -S "Script_AIL" -X screen -t "MISP_Thehive_Auto_Push" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./MISP_Thehive_Auto_Push.py; read x"
|
||||
sleep 0.1
|
||||
|
||||
screen -S "Script_AIL" -X screen -t "Exif" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Exif.py; read x"
|
||||
sleep 0.1
|
||||
|
||||
##################################
|
||||
# TRACKERS MODULES #
|
||||
##################################
|
||||
|
@ -601,7 +606,7 @@ function launch_all {
|
|||
|
||||
function menu_display {
|
||||
|
||||
options=("Redis" "Ardb" "Kvrocks" "Logs" "Scripts" "Flask" "Killall" "Update" "Update-config" "Update-thirdparty")
|
||||
options=("Redis" "Kvrocks" "Logs" "Scripts" "Flask" "Killall" "Update" "Update-config" "Update-thirdparty")
|
||||
|
||||
menu() {
|
||||
echo "What do you want to Launch?:"
|
||||
|
@ -629,9 +634,6 @@ function menu_display {
|
|||
Redis)
|
||||
launch_redis;
|
||||
;;
|
||||
Ardb)
|
||||
launch_ardb;
|
||||
;;
|
||||
Kvrocks)
|
||||
launch_kvrocks;
|
||||
;;
|
||||
|
@ -673,31 +675,38 @@ function menu_display {
|
|||
}
|
||||
|
||||
#echo "$@"
|
||||
|
||||
check_screens;
|
||||
while [ "$1" != "" ]; do
|
||||
case $1 in
|
||||
-l | --launchAuto ) launch_all "automatic";
|
||||
-l | --launchAuto ) check_screens;
|
||||
launch_all "automatic";
|
||||
;;
|
||||
-lr | --launchRedis ) launch_redis;
|
||||
-lr | --launchRedis ) check_screens;
|
||||
launch_redis;
|
||||
;;
|
||||
-la | --launchARDB ) launch_ardb;
|
||||
;;
|
||||
-lk | --launchKVROCKS ) launch_kvrocks;
|
||||
-lk | --launchKVROCKS ) check_screens;
|
||||
launch_kvrocks;
|
||||
;;
|
||||
-lrv | --launchRedisVerify ) launch_redis;
|
||||
wait_until_redis_is_ready;
|
||||
;;
|
||||
-lav | --launchARDBVerify ) launch_ardb;
|
||||
wait_until_ardb_is_ready;
|
||||
;;
|
||||
-lkv | --launchKVORCKSVerify ) launch_kvrocks;
|
||||
wait_until_kvrocks_is_ready;
|
||||
;;
|
||||
--set_kvrocks_namespaces ) set_kvrocks_namespaces;
|
||||
;;
|
||||
-k | --killAll ) killall;
|
||||
-k | --killAll ) check_screens;
|
||||
killall;
|
||||
;;
|
||||
-ks | --killscript ) killscript;
|
||||
-r | --restart ) killall;
|
||||
sleep 0.1;
|
||||
check_screens;
|
||||
launch_all "automatic";
|
||||
;;
|
||||
-ks | --killscript ) check_screens;
|
||||
killscript;
|
||||
;;
|
||||
-m | --menu ) menu_display;
|
||||
;;
|
||||
|
|
|
@ -34,16 +34,20 @@ class D4Client(AbstractModule):
|
|||
|
||||
self.d4_client = d4.create_d4_client()
|
||||
self.last_refresh = time.time()
|
||||
self.last_config_check = time.time()
|
||||
|
||||
# Send module state to logs
|
||||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, dns_record):
|
||||
# Refresh D4 Client
|
||||
if self.last_refresh < d4.get_config_last_update_time():
|
||||
self.d4_client = d4.create_d4_client()
|
||||
self.last_refresh = time.time()
|
||||
print('D4 Client: config updated')
|
||||
if self.last_config_check < int(time.time()) - 30:
|
||||
print('refresh rrrr')
|
||||
if self.last_refresh < d4.get_config_last_update_time():
|
||||
self.d4_client = d4.create_d4_client()
|
||||
self.last_refresh = time.time()
|
||||
print('D4 Client: config updated')
|
||||
self.last_config_check = time.time()
|
||||
|
||||
if self.d4_client:
|
||||
# Send DNS Record to D4Server
|
||||
|
|
|
@ -23,7 +23,7 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from core import ail_2_ail
|
||||
from modules.abstract_module import AbstractModule
|
||||
# from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.Items import Item
|
||||
|
||||
#### CONFIG ####
|
||||
# config_loader = ConfigLoader()
|
||||
|
@ -76,10 +76,11 @@ class Sync_importer(AbstractModule):
|
|||
|
||||
# # TODO: create default id
|
||||
item_id = ail_stream['meta']['ail:id']
|
||||
item = Item(item_id)
|
||||
|
||||
message = f'sync {item_id} {b64_gzip_content}'
|
||||
print(item_id)
|
||||
self.add_message_to_queue(message, 'Importers')
|
||||
message = f'sync {b64_gzip_content}'
|
||||
print(item.id)
|
||||
self.add_message_to_queue(obj=item, message=message, queue='Importers')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -15,13 +15,16 @@ This module .
|
|||
import os
|
||||
import sys
|
||||
import time
|
||||
import traceback
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from core import ail_2_ail
|
||||
from lib.objects.Items import Item
|
||||
from lib.ail_queues import get_processed_end_obj, timeout_processed_objs, get_last_queue_timeout
|
||||
from lib.exceptions import ModuleQueueError
|
||||
from lib.objects import ail_objects
|
||||
from modules.abstract_module import AbstractModule
|
||||
|
||||
|
||||
|
@ -30,14 +33,15 @@ class Sync_module(AbstractModule):
|
|||
Sync_module module for AIL framework
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super(Sync_module, self).__init__()
|
||||
def __init__(self, queue=False): # FIXME MODIFY/ADD QUEUE
|
||||
super(Sync_module, self).__init__(queue=queue)
|
||||
|
||||
# Waiting time in seconds between to message processed
|
||||
self.pending_seconds = 10
|
||||
|
||||
self.dict_sync_queues = ail_2_ail.get_all_sync_queue_dict()
|
||||
self.last_refresh = time.time()
|
||||
self.last_refresh_queues = time.time()
|
||||
|
||||
print(self.dict_sync_queues)
|
||||
|
||||
|
@ -53,40 +57,70 @@ class Sync_module(AbstractModule):
|
|||
print('sync queues refreshed')
|
||||
print(self.dict_sync_queues)
|
||||
|
||||
# Extract object from message
|
||||
# # TODO: USE JSON DICT ????
|
||||
mess_split = message.split(';')
|
||||
if len(mess_split) == 3:
|
||||
obj_type = mess_split[0]
|
||||
obj_subtype = mess_split[1]
|
||||
obj_id = mess_split[2]
|
||||
obj = ail_objects.get_obj_from_global_id(message)
|
||||
|
||||
# OBJECT => Item
|
||||
# if obj_type == 'item':
|
||||
obj = Item(obj_id)
|
||||
tags = obj.get_tags()
|
||||
|
||||
tags = obj.get_tags()
|
||||
# check filter + tags
|
||||
# print(message)
|
||||
for queue_uuid in self.dict_sync_queues:
|
||||
filter_tags = self.dict_sync_queues[queue_uuid]['filter']
|
||||
if filter_tags and tags:
|
||||
# print('tags: {tags} filter: {filter_tags}')
|
||||
if filter_tags.issubset(tags):
|
||||
obj_dict = obj.get_default_meta()
|
||||
# send to queue push and/or pull
|
||||
for dict_ail in self.dict_sync_queues[queue_uuid]['ail_instances']:
|
||||
print(f'ail_uuid: {dict_ail["ail_uuid"]} obj: {obj.type}:{obj.get_subtype(r_str=True)}:{obj.id}')
|
||||
ail_2_ail.add_object_to_sync_queue(queue_uuid, dict_ail['ail_uuid'], obj_dict,
|
||||
push=dict_ail['push'], pull=dict_ail['pull'])
|
||||
|
||||
# check filter + tags
|
||||
# print(message)
|
||||
for queue_uuid in self.dict_sync_queues:
|
||||
filter_tags = self.dict_sync_queues[queue_uuid]['filter']
|
||||
if filter_tags and tags:
|
||||
# print('tags: {tags} filter: {filter_tags}')
|
||||
if filter_tags.issubset(tags):
|
||||
obj_dict = obj.get_default_meta()
|
||||
# send to queue push and/or pull
|
||||
for dict_ail in self.dict_sync_queues[queue_uuid]['ail_instances']:
|
||||
print(f'ail_uuid: {dict_ail["ail_uuid"]} obj: {message}')
|
||||
ail_2_ail.add_object_to_sync_queue(queue_uuid, dict_ail['ail_uuid'], obj_dict,
|
||||
push=dict_ail['push'], pull=dict_ail['pull'])
|
||||
def run(self):
|
||||
"""
|
||||
Run Module endless process
|
||||
"""
|
||||
|
||||
else:
|
||||
# Malformed message
|
||||
raise Exception(f'too many values to unpack (expected 3) given {len(mess_split)} with message {message}')
|
||||
# Endless loop processing messages from the input queue
|
||||
while self.proceed:
|
||||
|
||||
# Timeout queues
|
||||
# timeout_processed_objs()
|
||||
if self.last_refresh_queues < time.time():
|
||||
timeout_processed_objs()
|
||||
self.last_refresh_queues = time.time() + 120
|
||||
self.redis_logger.debug('Timeout queues')
|
||||
# print('Timeout queues')
|
||||
|
||||
# Get one message (paste) from the QueueIn (copy of Redis_Global publish)
|
||||
global_id = get_processed_end_obj()
|
||||
if global_id:
|
||||
try:
|
||||
# Module processing with the message from the queue
|
||||
self.compute(global_id)
|
||||
except Exception as err:
|
||||
if self.debug:
|
||||
self.queue.error()
|
||||
raise err
|
||||
|
||||
# LOG ERROR
|
||||
trace = traceback.format_tb(err.__traceback__)
|
||||
trace = ''.join(trace)
|
||||
self.logger.critical(f"Error in module {self.module_name}: {__name__} : {err}")
|
||||
self.logger.critical(f"Module {self.module_name} input message: {global_id}")
|
||||
self.logger.critical(trace)
|
||||
|
||||
if isinstance(err, ModuleQueueError):
|
||||
self.queue.error()
|
||||
raise err
|
||||
|
||||
else:
|
||||
self.computeNone()
|
||||
# Wait before next process
|
||||
self.logger.debug(f"{self.module_name}, waiting for new message, Idling {self.pending_seconds}s")
|
||||
time.sleep(self.pending_seconds)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
module = Sync_module()
|
||||
module = Sync_module(queue=False) # FIXME MODIFY/ADD QUEUE
|
||||
module.run()
|
||||
|
|
|
@ -11,7 +11,7 @@ import uuid
|
|||
|
||||
import subprocess
|
||||
|
||||
from flask import escape
|
||||
from markupsafe import escape
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
|
|
|
@ -6,6 +6,7 @@ import logging.config
|
|||
import sys
|
||||
import time
|
||||
|
||||
from pyail import PyAIL
|
||||
from requests.exceptions import ConnectionError
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
|
@ -16,10 +17,13 @@ from modules.abstract_module import AbstractModule
|
|||
from lib import ail_logger
|
||||
from lib import crawlers
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects import CookiesNames
|
||||
from lib.objects import Etags
|
||||
from lib.objects.Domains import Domain
|
||||
from lib.objects.Items import Item
|
||||
from lib.objects import Screenshots
|
||||
from lib.objects import Titles
|
||||
from trackers.Tracker_Yara import Tracker_Yara
|
||||
|
||||
logging.config.dictConfig(ail_logger.get_config(name='crawlers'))
|
||||
|
||||
|
@ -33,12 +37,23 @@ class Crawler(AbstractModule):
|
|||
# Waiting time in seconds between to message processed
|
||||
self.pending_seconds = 1
|
||||
|
||||
self.tracker_yara = Tracker_Yara(queue=False)
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
|
||||
self.default_har = config_loader.get_config_boolean('Crawler', 'default_har')
|
||||
self.default_screenshot = config_loader.get_config_boolean('Crawler', 'default_screenshot')
|
||||
self.default_depth_limit = config_loader.get_config_int('Crawler', 'default_depth_limit')
|
||||
|
||||
ail_url_to_push_discovery = config_loader.get_config_str('Crawler', 'ail_url_to_push_onion_discovery')
|
||||
ail_key_to_push_discovery = config_loader.get_config_str('Crawler', 'ail_key_to_push_onion_discovery')
|
||||
if ail_url_to_push_discovery and ail_key_to_push_discovery:
|
||||
ail = PyAIL(ail_url_to_push_discovery, ail_key_to_push_discovery, ssl=False)
|
||||
if ail.ping_ail():
|
||||
self.ail_to_push_discovery = ail
|
||||
else:
|
||||
self.ail_to_push_discovery = None
|
||||
|
||||
# TODO: LIMIT MAX NUMBERS OF CRAWLED PAGES
|
||||
|
||||
# update hardcoded blacklist
|
||||
|
@ -56,8 +71,9 @@ class Crawler(AbstractModule):
|
|||
self.har = None
|
||||
self.screenshot = None
|
||||
self.root_item = None
|
||||
self.har_dir = None
|
||||
self.date = None
|
||||
self.items_dir = None
|
||||
self.original_domain = None
|
||||
self.domain = None
|
||||
|
||||
# TODO Replace with warning list ???
|
||||
|
@ -97,7 +113,7 @@ class Crawler(AbstractModule):
|
|||
self.crawler_scheduler.update_queue()
|
||||
self.crawler_scheduler.process_queue()
|
||||
|
||||
self.refresh_lacus_status() # TODO LOG ERROR
|
||||
self.refresh_lacus_status() # TODO LOG ERROR
|
||||
if not self.is_lacus_up:
|
||||
return None
|
||||
|
||||
|
@ -105,7 +121,9 @@ class Crawler(AbstractModule):
|
|||
if crawlers.get_nb_crawler_captures() < crawlers.get_crawler_max_captures():
|
||||
task_row = crawlers.add_task_to_lacus_queue()
|
||||
if task_row:
|
||||
task_uuid, priority = task_row
|
||||
task, priority = task_row
|
||||
task.start()
|
||||
task_uuid = task.uuid
|
||||
try:
|
||||
self.enqueue_capture(task_uuid, priority)
|
||||
except ConnectionError:
|
||||
|
@ -120,15 +138,30 @@ class Crawler(AbstractModule):
|
|||
if capture:
|
||||
try:
|
||||
status = self.lacus.get_capture_status(capture.uuid)
|
||||
if status != crawlers.CaptureStatus.DONE: # TODO ADD GLOBAL TIMEOUT-> Save start time ### print start time
|
||||
if status == crawlers.CaptureStatus.DONE:
|
||||
return capture
|
||||
elif status == crawlers.CaptureStatus.UNKNOWN:
|
||||
capture_start = capture.get_start_time(r_str=False)
|
||||
if capture_start == 0:
|
||||
task = capture.get_task()
|
||||
task.delete()
|
||||
capture.delete()
|
||||
self.logger.warning(f'capture UNKNOWN ERROR STATE, {task.uuid} Removed from queue')
|
||||
return None
|
||||
if int(time.time()) - capture_start > 600: # TODO ADD in new crawler config
|
||||
task = capture.get_task()
|
||||
task.reset()
|
||||
capture.delete()
|
||||
self.logger.warning(f'capture UNKNOWN Timeout, {task.uuid} Send back in queue')
|
||||
else:
|
||||
capture.update(status)
|
||||
else:
|
||||
capture.update(status)
|
||||
print(capture.uuid, crawlers.CaptureStatus(status).name, int(time.time()))
|
||||
else:
|
||||
return capture
|
||||
|
||||
except ConnectionError:
|
||||
print(capture.uuid)
|
||||
capture.update(self, -1)
|
||||
capture.update(-1)
|
||||
self.refresh_lacus_status()
|
||||
|
||||
time.sleep(self.pending_seconds)
|
||||
|
@ -169,6 +202,24 @@ class Crawler(AbstractModule):
|
|||
|
||||
crawlers.create_capture(capture_uuid, task_uuid)
|
||||
print(task.uuid, capture_uuid, 'launched')
|
||||
|
||||
if self.ail_to_push_discovery:
|
||||
|
||||
if task.get_depth() == 1 and priority < 10 and task.get_domain().endswith('.onion'):
|
||||
har = task.get_har()
|
||||
screenshot = task.get_screenshot()
|
||||
# parent_id = task.get_parent()
|
||||
# if parent_id != 'manual' and parent_id != 'auto':
|
||||
# parent = parent_id[19:-36]
|
||||
# else:
|
||||
# parent = 'AIL_capture'
|
||||
|
||||
if not url:
|
||||
raise Exception(f'Error: url is None, {task.uuid}, {capture_uuid}, {url}')
|
||||
|
||||
self.ail_to_push_discovery.add_crawler_capture(task_uuid, capture_uuid, url, har=har, # parent=parent,
|
||||
screenshot=screenshot, depth_limit=1, proxy='force_tor')
|
||||
print(task.uuid, capture_uuid, url, 'Added to ail_to_push_discovery')
|
||||
return capture_uuid
|
||||
|
||||
# CRAWL DOMAIN
|
||||
|
@ -178,34 +229,52 @@ class Crawler(AbstractModule):
|
|||
task = capture.get_task()
|
||||
domain = task.get_domain()
|
||||
print(domain)
|
||||
if not domain:
|
||||
if self.debug:
|
||||
raise Exception(f'Error: domain {domain} - task {task.uuid} - capture {capture.uuid}')
|
||||
else:
|
||||
self.logger.critical(f'Error: domain {domain} - task {task.uuid} - capture {capture.uuid}')
|
||||
print(f'Error: domain {domain}')
|
||||
return None
|
||||
|
||||
self.domain = Domain(domain)
|
||||
self.original_domain = Domain(domain)
|
||||
|
||||
epoch = int(time.time())
|
||||
parent_id = task.get_parent()
|
||||
|
||||
entries = self.lacus.get_capture(capture.uuid)
|
||||
print(entries['status'])
|
||||
print(entries.get('status'))
|
||||
self.har = task.get_har()
|
||||
self.screenshot = task.get_screenshot()
|
||||
# DEBUG
|
||||
# self.har = True
|
||||
# self.screenshot = True
|
||||
str_date = crawlers.get_current_date(separator=True)
|
||||
self.har_dir = crawlers.get_date_har_dir(str_date)
|
||||
self.items_dir = crawlers.get_date_crawled_items_source(str_date)
|
||||
self.date = crawlers.get_current_date(separator=True)
|
||||
self.items_dir = crawlers.get_date_crawled_items_source(self.date)
|
||||
self.root_item = None
|
||||
|
||||
# Save Capture
|
||||
self.save_capture_response(parent_id, entries)
|
||||
|
||||
self.domain.update_daterange(str_date.replace('/', ''))
|
||||
# Origin + History
|
||||
self.domain.update_daterange(self.date.replace('/', ''))
|
||||
# Origin + History + tags
|
||||
if self.root_item:
|
||||
self.domain.set_last_origin(parent_id)
|
||||
self.domain.add_history(epoch, root_item=self.root_item)
|
||||
elif self.domain.was_up():
|
||||
self.domain.add_history(epoch, root_item=epoch)
|
||||
# Tags
|
||||
for tag in task.get_tags():
|
||||
self.domain.add_tag(tag)
|
||||
self.domain.add_history(epoch, root_item=self.root_item)
|
||||
|
||||
if self.domain != self.original_domain:
|
||||
self.original_domain.update_daterange(self.date.replace('/', ''))
|
||||
if self.root_item:
|
||||
self.original_domain.set_last_origin(parent_id)
|
||||
# Tags
|
||||
for tag in task.get_tags():
|
||||
self.domain.add_tag(tag)
|
||||
self.original_domain.add_history(epoch, root_item=self.root_item)
|
||||
crawlers.update_last_crawled_domain(self.original_domain.get_domain_type(), self.original_domain.id, epoch)
|
||||
|
||||
crawlers.update_last_crawled_domain(self.domain.get_domain_type(), self.domain.id, epoch)
|
||||
print('capture:', capture.uuid, 'completed')
|
||||
|
@ -218,12 +287,12 @@ class Crawler(AbstractModule):
|
|||
if 'error' in entries:
|
||||
# TODO IMPROVE ERROR MESSAGE
|
||||
self.logger.warning(str(entries['error']))
|
||||
print(entries['error'])
|
||||
print(entries.get('error'))
|
||||
if entries.get('html'):
|
||||
print('retrieved content')
|
||||
# print(entries.get('html'))
|
||||
|
||||
if 'last_redirected_url' in entries and entries['last_redirected_url']:
|
||||
if 'last_redirected_url' in entries and entries.get('last_redirected_url'):
|
||||
last_url = entries['last_redirected_url']
|
||||
unpacked_last_url = crawlers.unpack_url(last_url)
|
||||
current_domain = unpacked_last_url['domain']
|
||||
|
@ -238,33 +307,40 @@ class Crawler(AbstractModule):
|
|||
else:
|
||||
last_url = f'http://{self.domain.id}'
|
||||
|
||||
if 'html' in entries and entries['html']:
|
||||
if 'html' in entries and entries.get('html'):
|
||||
item_id = crawlers.create_item_id(self.items_dir, self.domain.id)
|
||||
print(item_id)
|
||||
gzip64encoded = crawlers.get_gzipped_b64_item(item_id, entries['html'])
|
||||
item = Item(item_id)
|
||||
print(item.id)
|
||||
|
||||
gzip64encoded = crawlers.get_gzipped_b64_item(item.id, entries['html'])
|
||||
# send item to Global
|
||||
relay_message = f'crawler {item_id} {gzip64encoded}'
|
||||
self.add_message_to_queue(relay_message, 'Importers')
|
||||
relay_message = f'crawler {gzip64encoded}'
|
||||
self.add_message_to_queue(obj=item, message=relay_message, queue='Importers')
|
||||
|
||||
# Tag
|
||||
msg = f'infoleak:submission="crawler";{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
# Tag # TODO replace me with metadata to tags
|
||||
msg = f'infoleak:submission="crawler"' # TODO FIXME
|
||||
self.add_message_to_queue(obj=item, message=msg, queue='Tags')
|
||||
|
||||
# TODO replace me with metadata to add
|
||||
crawlers.create_item_metadata(item_id, last_url, parent_id)
|
||||
if self.root_item is None:
|
||||
self.root_item = item_id
|
||||
parent_id = item_id
|
||||
|
||||
item = Item(item_id)
|
||||
|
||||
title_content = crawlers.extract_title_from_html(entries['html'])
|
||||
if title_content:
|
||||
title = Titles.create_title(title_content)
|
||||
title.add(item.get_date(), item_id)
|
||||
title.add(item.get_date(), item)
|
||||
# Tracker
|
||||
self.tracker_yara.compute_manual(title)
|
||||
if not title.is_tags_safe():
|
||||
unsafe_tag = 'dark-web:topic="pornography-child-exploitation"'
|
||||
self.domain.add_tag(unsafe_tag)
|
||||
item.add_tag(unsafe_tag)
|
||||
|
||||
# SCREENSHOT
|
||||
if self.screenshot:
|
||||
if 'png' in entries and entries['png']:
|
||||
if 'png' in entries and entries.get('png'):
|
||||
screenshot = Screenshots.create_screenshot(entries['png'], b64=False)
|
||||
if screenshot:
|
||||
if not screenshot.is_tags_safe():
|
||||
|
@ -278,8 +354,19 @@ class Crawler(AbstractModule):
|
|||
screenshot.add_correlation('domain', '', self.domain.id)
|
||||
# HAR
|
||||
if self.har:
|
||||
if 'har' in entries and entries['har']:
|
||||
crawlers.save_har(self.har_dir, item_id, entries['har'])
|
||||
if 'har' in entries and entries.get('har'):
|
||||
har_id = crawlers.create_har_id(self.date, item_id)
|
||||
crawlers.save_har(har_id, entries['har'])
|
||||
for cookie_name in crawlers.extract_cookies_names_from_har(entries['har']):
|
||||
print(cookie_name)
|
||||
cookie = CookiesNames.create(cookie_name)
|
||||
cookie.add(self.date.replace('/', ''), self.domain)
|
||||
for etag_content in crawlers.extract_etag_from_har(entries['har']):
|
||||
print(etag_content)
|
||||
etag = Etags.create(etag_content)
|
||||
etag.add(self.date.replace('/', ''), self.domain)
|
||||
crawlers.extract_hhhash(entries['har'], self.domain.id, self.date.replace('/', ''))
|
||||
|
||||
# Next Children
|
||||
entries_children = entries.get('children')
|
||||
if entries_children:
|
||||
|
|
|
@ -319,11 +319,7 @@ class MISPExporterAutoDaily(MISPExporter):
|
|||
def __init__(self, url='', key='', ssl=False):
|
||||
super().__init__(url=url, key=key, ssl=ssl)
|
||||
|
||||
# create event if don't exists
|
||||
try:
|
||||
self.event_id = self.get_daily_event_id()
|
||||
except MISPConnectionError:
|
||||
self.event_id = - 1
|
||||
self.event_id = - 1
|
||||
self.date = datetime.date.today()
|
||||
|
||||
def export(self, obj, tag):
|
||||
|
@ -345,6 +341,7 @@ class MISPExporterAutoDaily(MISPExporter):
|
|||
self.add_event_object(self.event_id, obj)
|
||||
|
||||
except MISPConnectionError:
|
||||
self.event_id = - 1
|
||||
return -1
|
||||
|
||||
|
||||
|
|
|
@ -8,9 +8,12 @@ Import Content
|
|||
|
||||
"""
|
||||
import os
|
||||
import logging
|
||||
import logging.config
|
||||
import sys
|
||||
|
||||
from abc import ABC
|
||||
from ssl import create_default_context
|
||||
|
||||
import smtplib
|
||||
from email.mime.multipart import MIMEMultipart
|
||||
|
@ -22,17 +25,22 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib import ail_logger
|
||||
from exporter.abstract_exporter import AbstractExporter
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
# from lib.objects.abstract_object import AbstractObject
|
||||
# from lib.Tracker import Tracker
|
||||
|
||||
logging.config.dictConfig(ail_logger.get_config(name='modules'))
|
||||
|
||||
|
||||
class MailExporter(AbstractExporter, ABC):
|
||||
def __init__(self, host=None, port=None, password=None, user='', sender=''):
|
||||
def __init__(self, host=None, port=None, password=None, user='', sender='', cert_required=None, ca_file=None):
|
||||
super().__init__()
|
||||
config_loader = ConfigLoader()
|
||||
|
||||
self.logger = logging.getLogger(f'{self.__class__.__name__}')
|
||||
|
||||
if host:
|
||||
self.host = host
|
||||
self.port = port
|
||||
|
@ -45,6 +53,15 @@ class MailExporter(AbstractExporter, ABC):
|
|||
self.pw = config_loader.get_config_str("Notifications", "sender_pw")
|
||||
if self.pw == 'None':
|
||||
self.pw = None
|
||||
if cert_required is not None:
|
||||
self.cert_required = bool(cert_required)
|
||||
self.ca_file = ca_file
|
||||
else:
|
||||
self.cert_required = config_loader.get_config_boolean("Notifications", "cert_required")
|
||||
if self.cert_required:
|
||||
self.ca_file = config_loader.get_config_str("Notifications", "ca_file")
|
||||
else:
|
||||
self.ca_file = None
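The certificate handling above reads two new keys from the `[Notifications]` section of the configuration; a sketch of what such an entry could look like (path and values are illustrative):

```
[Notifications]
cert_required = True
ca_file = /path/to/custom_ca.pem
```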
|
||||
if user:
|
||||
self.user = user
|
||||
else:
|
||||
|
@ -67,8 +84,12 @@ class MailExporter(AbstractExporter, ABC):
|
|||
smtp_server = smtplib.SMTP(self.host, self.port)
|
||||
smtp_server.starttls()
|
||||
except smtplib.SMTPNotSupportedError:
|
||||
print("The server does not support the STARTTLS extension.")
|
||||
smtp_server = smtplib.SMTP_SSL(self.host, self.port)
|
||||
self.logger.info(f"The server {self.host}:{self.port} does not support the STARTTLS extension.")
|
||||
if self.cert_required:
|
||||
context = create_default_context(cafile=self.ca_file)
|
||||
else:
|
||||
context = None
|
||||
smtp_server = smtplib.SMTP_SSL(self.host, self.port, context=context)
|
||||
|
||||
smtp_server.ehlo()
|
||||
if self.user is not None:
|
||||
|
@ -80,7 +101,7 @@ class MailExporter(AbstractExporter, ABC):
|
|||
return smtp_server
|
||||
# except Exception as err:
|
||||
# traceback.print_tb(err.__traceback__)
|
||||
# logger.warning(err)
|
||||
# self.logger.warning(err)
|
||||
|
||||
def _export(self, recipient, subject, body):
|
||||
mime_msg = MIMEMultipart()
|
||||
|
@ -95,24 +116,35 @@ class MailExporter(AbstractExporter, ABC):
|
|||
smtp_client.quit()
|
||||
# except Exception as err:
|
||||
# traceback.print_tb(err.__traceback__)
|
||||
# logger.warning(err)
|
||||
print(f'Send notification: {subject} to {recipient}')
|
||||
# self.logger.warning(err)
|
||||
self.logger.info(f'Send notification: {subject} to {recipient}')
|
||||
|
||||
class MailExporterTracker(MailExporter):
|
||||
|
||||
def __init__(self, host=None, port=None, password=None, user='', sender=''):
|
||||
super().__init__(host=host, port=port, password=password, user=user, sender=sender)
|
||||
|
||||
def export(self, tracker, obj): # TODO match
|
||||
def export(self, tracker, obj, matches=[]):
|
||||
tracker_type = tracker.get_type()
|
||||
tracker_name = tracker.get_tracked()
|
||||
subject = f'AIL Framework Tracker: {tracker_name}' # TODO custom subject
|
||||
description = tracker.get_description()
|
||||
if not description:
|
||||
description = tracker_name
|
||||
|
||||
subject = f'AIL Framework Tracker: {description}'
|
||||
body = f"AIL Framework, New occurrence for {tracker_type} tracker: {tracker_name}\n"
|
||||
body += f'Item: {obj.id}\nurl:{obj.get_link()}'
|
||||
|
||||
# TODO match option
|
||||
# if match:
|
||||
# body += f'Tracker Match:\n\n{escape(match)}'
|
||||
if matches:
|
||||
body += '\n'
|
||||
nb = 1
|
||||
for match in matches:
|
||||
body += f'\nMatch {nb}: {match[0]}\nExtract:\n{match[1]}\n\n'
|
||||
nb += 1
|
||||
else:
|
||||
body = f"AIL Framework, New occurrence for {tracker_type} tracker: {tracker_name}\n"
|
||||
body += f'Item: {obj.id}\nurl:{obj.get_link()}'
|
||||
|
||||
# print(body)
|
||||
for mail in tracker.get_mails():
|
||||
self._export(mail, subject, body)
|
||||
|
|
|
@ -56,6 +56,8 @@ class FeederImporter(AbstractImporter):
|
|||
feeders = [f[:-3] for f in os.listdir(feeder_dir) if os.path.isfile(os.path.join(feeder_dir, f))]
|
||||
self.feeders = {}
|
||||
for feeder in feeders:
|
||||
if feeder == 'abstract_chats_feeder':
|
||||
continue
|
||||
print(feeder)
|
||||
part = feeder.split('.')[-1]
|
||||
# import json importer class
|
||||
|
@ -87,13 +89,27 @@ class FeederImporter(AbstractImporter):
|
|||
feeder_name = feeder.get_name()
|
||||
print(f'importing: {feeder_name} feeder')
|
||||
|
||||
item_id = feeder.get_item_id()
|
||||
# Get Data object:
|
||||
data_obj = feeder.get_obj()
|
||||
|
||||
# process meta
|
||||
if feeder.get_json_meta():
|
||||
feeder.process_meta()
|
||||
gzip64_content = feeder.get_gzip64_content()
|
||||
objs = feeder.process_meta()
|
||||
if objs is None:
|
||||
objs = set()
|
||||
else:
|
||||
objs = set()
|
||||
|
||||
return f'{feeder_name} {item_id} {gzip64_content}'
|
||||
if data_obj:
|
||||
objs.add(data_obj)
|
||||
|
||||
for obj in objs:
|
||||
if obj.type == 'item': # object save on disk as file (Items)
|
||||
gzip64_content = feeder.get_gzip64_content()
|
||||
return obj, f'{feeder_name} {gzip64_content}'
|
||||
else: # Messages save on DB
|
||||
if obj.exists() and obj.type != 'chat':
|
||||
return obj, f'{feeder_name}'
|
||||
|
||||
|
||||
class FeederModuleImporter(AbstractModule):
|
||||
|
@ -112,11 +128,14 @@ class FeederModuleImporter(AbstractModule):
|
|||
def compute(self, message):
|
||||
# TODO HANDLE Invalid JSON
|
||||
json_data = json.loads(message)
|
||||
relay_message = self.importer.importer(json_data)
|
||||
self.add_message_to_queue(relay_message)
|
||||
# TODO multiple objs + messages
|
||||
obj, relay_message = self.importer.importer(json_data)
|
||||
####
|
||||
self.add_message_to_queue(obj=obj, message=relay_message)
|
||||
|
||||
|
||||
# Launch Importer
|
||||
if __name__ == '__main__':
|
||||
module = FeederModuleImporter()
|
||||
# module.debug = True
|
||||
module.run()
|
||||
|
|
|
@ -19,9 +19,11 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
from importer.abstract_importer import AbstractImporter
|
||||
# from modules.abstract_module import AbstractModule
|
||||
from lib import ail_logger
|
||||
from lib.ail_queues import AILQueue
|
||||
# from lib.ail_queues import AILQueue
|
||||
from lib import ail_files # TODO RENAME ME
|
||||
|
||||
from lib.objects.Items import Item
|
||||
|
||||
logging.config.dictConfig(ail_logger.get_config(name='modules'))
|
||||
|
||||
class FileImporter(AbstractImporter):
|
||||
|
@ -41,12 +43,15 @@ class FileImporter(AbstractImporter):
|
|||
gzipped = False
|
||||
if mimetype == 'application/gzip':
|
||||
gzipped = True
|
||||
elif not ail_files.is_text(mimetype):
|
||||
elif not ail_files.is_text(mimetype): # # # #
|
||||
return None
|
||||
|
||||
message = self.create_message(item_id, content, gzipped=gzipped, source='dir_import')
|
||||
source = 'dir_import'
|
||||
message = self.create_message(content, gzipped=gzipped, source=source)
|
||||
self.logger.info(f'{source} {item_id}')
|
||||
obj = Item(item_id)
|
||||
if message:
|
||||
self.add_message_to_queue(message)
|
||||
self.add_message_to_queue(obj, message=message)
|
||||
|
||||
class DirImporter(AbstractImporter):
|
||||
def __init__(self):
|
||||
|
|
|
@ -22,6 +22,8 @@ from importer.abstract_importer import AbstractImporter
|
|||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
|
||||
from lib.objects.Items import Item
|
||||
|
||||
class PystemonImporter(AbstractImporter):
|
||||
def __init__(self, pystemon_dir, host='localhost', port=6379, db=10):
|
||||
super().__init__()
|
||||
|
@ -35,7 +37,7 @@ class PystemonImporter(AbstractImporter):
|
|||
print(item_id)
|
||||
if item_id:
|
||||
print(item_id)
|
||||
full_item_path = os.path.join(self.dir_pystemon, item_id) # TODO SANITIZE PATH
|
||||
full_item_path = os.path.join(self.dir_pystemon, item_id) # TODO SANITIZE PATH
|
||||
# Check if pystemon file exists
|
||||
if not os.path.isfile(full_item_path):
|
||||
print(f'Error: {full_item_path}, file not found')
|
||||
|
@ -47,10 +49,19 @@ class PystemonImporter(AbstractImporter):
|
|||
if not content:
|
||||
return None
|
||||
|
||||
return self.create_message(item_id, content, source='pystemon')
|
||||
if full_item_path[-3:] == '.gz':
|
||||
gzipped = True
|
||||
else:
|
||||
gzipped = False
|
||||
|
||||
# TODO handle multiple objects
|
||||
source = 'pystemon'
|
||||
message = self.create_message(content, gzipped=gzipped, source=source)
|
||||
self.logger.info(f'{source} {item_id}')
|
||||
return item_id, message
|
||||
|
||||
except IOError as e:
|
||||
print(f'Error: {full_item_path}, IOError')
|
||||
self.logger.error(f'Error {e}: {full_item_path}, IOError')
|
||||
return None
|
||||
|
||||
|
||||
|
@ -74,7 +85,10 @@ class PystemonModuleImporter(AbstractModule):
|
|||
return self.importer.importer()
|
||||
|
||||
def compute(self, message):
|
||||
self.add_message_to_queue(message)
|
||||
if message:
|
||||
item_id, message = message
|
||||
item = Item(item_id)
|
||||
self.add_message_to_queue(obj=item, message=message)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -4,15 +4,13 @@
|
|||
Importer Class
|
||||
================
|
||||
|
||||
Import Content
|
||||
ZMQ Importer
|
||||
|
||||
"""
|
||||
import os
|
||||
import sys
|
||||
|
||||
import zmq
|
||||
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
|
@ -21,6 +19,8 @@ from importer.abstract_importer import AbstractImporter
|
|||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
|
||||
from lib.objects.Items import Item
|
||||
|
||||
class ZMQImporters(AbstractImporter):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
|
@ -56,6 +56,8 @@ class ZMQModuleImporter(AbstractModule):
|
|||
super().__init__()
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
self.default_feeder_name = config_loader.get_config_str("Module_Mixer", "default_unnamed_feed_name")
|
||||
|
||||
addresses = config_loader.get_config_str('ZMQ_Global', 'address')
|
||||
addresses = addresses.split(',')
|
||||
channel = config_loader.get_config_str('ZMQ_Global', 'channel')
|
||||
|
@ -63,7 +65,6 @@ class ZMQModuleImporter(AbstractModule):
|
|||
for address in addresses:
|
||||
self.zmq_importer.add(address.strip(), channel)
|
||||
|
||||
# TODO MESSAGE SOURCE - UI
|
||||
def get_message(self):
|
||||
for message in self.zmq_importer.importer():
|
||||
# remove channel from message
|
||||
|
@ -72,8 +73,20 @@ class ZMQModuleImporter(AbstractModule):
|
|||
def compute(self, messages):
|
||||
for message in messages:
|
||||
message = message.decode()
|
||||
print(message.split(' ', 1)[0])
|
||||
self.add_message_to_queue(message)
|
||||
|
||||
obj_id, gzip64encoded = message.split(' ', 1) # TODO ADD LOGS
|
||||
splitted = obj_id.split('>>', 1)
|
||||
if len(splitted) == 2:
|
||||
feeder_name, obj_id = splitted
|
||||
else:
|
||||
feeder_name = self.default_feeder_name
|
||||
|
||||
obj = Item(obj_id)
|
||||
# f'{source} {content}'
|
||||
relay_message = f'{feeder_name} {gzip64encoded}'
|
||||
|
||||
print(f'feeder_name item::{obj_id}')
|
||||
self.add_message_to_queue(obj=obj, message=relay_message)
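Messages consumed here are therefore expected to look like `feeder_name>>item_id gzip64encoded_content`, with the `feeder_name>>` prefix optional. A producer-side sketch (the helper name is illustrative):

```python
import base64
import gzip

def build_zmq_message(item_id, content, feeder_name=None):
    """Illustrative: gzip + base64-encode the content and prepend the item id (and optional feeder name)."""
    gzip64encoded = base64.standard_b64encode(gzip.compress(content.encode())).decode()
    prefix = f'{feeder_name}>>{item_id}' if feeder_name else item_id
    return f'{prefix} {gzip64encoded}'
```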
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -54,16 +54,22 @@ class AbstractImporter(ABC): # TODO ail queues
|
|||
"""
|
||||
return self.__class__.__name__
|
||||
|
||||
def add_message_to_queue(self, message, queue_name=None):
|
||||
def add_message_to_queue(self, obj, message='', queue=None):
|
||||
"""
|
||||
Add message to queue
|
||||
:param obj: AILObject
|
||||
:param message: message to send in queue
|
||||
:param queue_name: queue or module name
|
||||
:param queue: queue name or module name
|
||||
|
||||
ex: add_message_to_queue(obj, message, 'Mail')
|
||||
"""
|
||||
if message:
|
||||
self.queue.send_message(message, queue_name)
|
||||
if not obj:
|
||||
raise Exception(f'Invalid AIL object, {obj}')
|
||||
obj_global_id = obj.get_global_id()
|
||||
self.queue.send_message(obj_global_id, message, queue)
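With this signature, callers pass the AIL object itself instead of embedding its id in the message string; the importers updated in this changeset follow roughly this pattern:

```python
# Sketch of the new calling convention (names as used elsewhere in this changeset)
item = Item(item_id)
self.add_message_to_queue(obj=item, message=gzip64_content, queue='Importers')
```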
|
||||
|
||||
def get_available_queues(self):
|
||||
return self.queue.get_out_queues()
|
||||
|
||||
@staticmethod
|
||||
def b64(content):
|
||||
|
@ -85,18 +91,20 @@ class AbstractImporter(ABC): # TODO ail queues
|
|||
self.logger.warning(e)
|
||||
return ''
|
||||
|
||||
def create_message(self, obj_id, content, b64=False, gzipped=False, source=None):
|
||||
if not gzipped:
|
||||
content = self.b64_gzip(content)
|
||||
elif not b64:
|
||||
content = self.b64(gzipped)
|
||||
if not content:
|
||||
return None
|
||||
if isinstance(content, bytes):
|
||||
content = content.decode()
|
||||
def create_message(self, content, b64=False, gzipped=False, source=None):
|
||||
if not source:
|
||||
source = self.name
|
||||
self.logger.info(f'{source} {obj_id}')
|
||||
# self.logger.debug(f'{source} {obj_id} {content}')
|
||||
return f'{source} {obj_id} {content}'
|
||||
|
||||
if content:
|
||||
if not gzipped:
|
||||
content = self.b64_gzip(content)
|
||||
elif not b64:
|
||||
content = self.b64(content)
|
||||
if not content:
|
||||
return None
|
||||
if isinstance(content, bytes):
|
||||
content = content.decode()
|
||||
return f'{source} {content}'
|
||||
else:
|
||||
return f'{source}'
|
||||
|
||||
|
|
|
@ -33,3 +33,4 @@ class BgpMonitorFeeder(DefaultFeeder):
|
|||
tag = 'infoleak:automatic-detection=bgp_monitor'
|
||||
item = Item(self.get_item_id())
|
||||
item.add_tag(tag)
|
||||
return set()
|
||||
|
|
|
@ -9,14 +9,21 @@ Process Feeder Json (example: Twitter feeder)
|
|||
"""
|
||||
import os
|
||||
import datetime
|
||||
import sys
|
||||
import uuid
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.objects import ail_objects
|
||||
|
||||
class DefaultFeeder:
|
||||
"""Default Feeder"""
|
||||
|
||||
def __init__(self, json_data):
|
||||
self.json_data = json_data
|
||||
self.item_id = None
|
||||
self.obj = None
|
||||
self.name = None
|
||||
|
||||
def get_name(self):
|
||||
|
@ -24,8 +31,12 @@ class DefaultFeeder:
|
|||
Return feeder name. first part of the item_id and display in the UI
|
||||
"""
|
||||
if not self.name:
|
||||
return self.get_source()
|
||||
return self.name
|
||||
name = self.get_source()
|
||||
else:
|
||||
name = self.name
|
||||
if not name:
|
||||
name = 'default'
|
||||
return name
|
||||
|
||||
def get_source(self):
|
||||
return self.json_data.get('source')
|
||||
|
@ -51,15 +62,22 @@ class DefaultFeeder:
|
|||
"""
|
||||
return self.json_data.get('data')
|
||||
|
||||
def get_obj_type(self):
|
||||
meta = self.get_json_meta()
|
||||
return meta.get('type', 'item')
|
||||
|
||||
## OVERWRITE ME ##
|
||||
def get_item_id(self):
|
||||
def get_obj(self):
|
||||
"""
|
||||
Return item id. define item id
|
||||
Return obj global id. define obj global id
|
||||
Default == item object
|
||||
"""
|
||||
date = datetime.date.today().strftime("%Y/%m/%d")
|
||||
item_id = os.path.join(self.get_name(), date, str(uuid.uuid4()))
|
||||
self.item_id = f'{item_id}.gz'
|
||||
return self.item_id
|
||||
obj_id = os.path.join(self.get_name(), date, str(uuid.uuid4()))
|
||||
obj_id = f'{obj_id}.gz'
|
||||
obj_id = f'item::{obj_id}'
|
||||
self.obj = ail_objects.get_obj_from_global_id(obj_id)
|
||||
return self.obj
|
||||
|
||||
## OVERWRITE ME ##
|
||||
def process_meta(self):
|
||||
|
@ -67,4 +85,4 @@ class DefaultFeeder:
|
|||
Process JSON meta filed.
|
||||
"""
|
||||
# meta = self.get_json_meta()
|
||||
pass
|
||||
return set()
|
||||
|
|
38
bin/importer/feeders/Discord.py
Executable file
|
@ -0,0 +1,38 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
The Discord Feeder Importer Module
|
||||
================
|
||||
|
||||
Process Discord JSON
|
||||
|
||||
"""
|
||||
import os
|
||||
import sys
|
||||
import datetime
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from importer.feeders.abstract_chats_feeder import AbstractChatFeeder
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects import ail_objects
|
||||
from lib.objects.Chats import Chat
|
||||
from lib.objects import Messages
|
||||
from lib.objects import UsersAccount
|
||||
from lib.objects.Usernames import Username
|
||||
|
||||
import base64
|
||||
|
||||
class DiscordFeeder(AbstractChatFeeder):
|
||||
|
||||
def __init__(self, json_data):
|
||||
super().__init__('discord', json_data)
|
||||
|
||||
# def get_obj(self):.
|
||||
# obj_id = Messages.create_obj_id('telegram', chat_id, message_id, timestamp)
|
||||
# obj_id = f'message:telegram:{obj_id}'
|
||||
# self.obj = ail_objects.get_obj_from_global_id(obj_id)
|
||||
# return self.obj
|
||||
|
|
@ -17,7 +17,7 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from importer.feeders.Default import DefaultFeeder
|
||||
from lib.objects.Usernames import Username
|
||||
from lib import item_basic
|
||||
from lib.objects.Items import Item
|
||||
|
||||
|
||||
class JabberFeeder(DefaultFeeder):
|
||||
|
@ -36,7 +36,7 @@ class JabberFeeder(DefaultFeeder):
|
|||
self.item_id = f'{item_id}.gz'
|
||||
return self.item_id
|
||||
|
||||
def process_meta(self):
|
||||
def process_meta(self): # TODO replace me by message
|
||||
"""
|
||||
Process JSON meta field.
|
||||
"""
|
||||
|
@ -44,10 +44,12 @@ class JabberFeeder(DefaultFeeder):
|
|||
# item_basic.add_map_obj_id_item_id(jabber_id, item_id, 'jabber_id') ##############################################
|
||||
to = str(self.json_data['meta']['jabber:to'])
|
||||
fr = str(self.json_data['meta']['jabber:from'])
|
||||
date = item_basic.get_item_date(item_id)
|
||||
|
||||
item = Item(self.item_id)
|
||||
date = item.get_date()
|
||||
|
||||
user_to = Username(to, 'jabber')
|
||||
user_fr = Username(fr, 'jabber')
|
||||
user_to.add(date, self.item_id)
|
||||
user_fr.add(date, self.item_id)
|
||||
return None
|
||||
user_to.add(date, item)
|
||||
user_fr.add(date, item)
|
||||
return set()
|
||||
|
|
|
@ -15,42 +15,24 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from importer.feeders.Default import DefaultFeeder
|
||||
from importer.feeders.abstract_chats_feeder import AbstractChatFeeder
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects import ail_objects
|
||||
from lib.objects.Chats import Chat
|
||||
from lib.objects import Messages
|
||||
from lib.objects import UsersAccount
|
||||
from lib.objects.Usernames import Username
|
||||
from lib import item_basic
|
||||
|
||||
class TelegramFeeder(DefaultFeeder):
|
||||
import base64
|
||||
|
||||
class TelegramFeeder(AbstractChatFeeder):
|
||||
|
||||
def __init__(self, json_data):
|
||||
super().__init__(json_data)
|
||||
self.name = 'telegram'
|
||||
super().__init__('telegram', json_data)
|
||||
|
||||
# define item id
|
||||
def get_item_id(self):
|
||||
# TODO use telegram message date
|
||||
date = datetime.date.today().strftime("%Y/%m/%d")
|
||||
channel_id = str(self.json_data['meta']['channel_id'])
|
||||
message_id = str(self.json_data['meta']['message_id'])
|
||||
item_id = f'{channel_id}_{message_id}'
|
||||
item_id = os.path.join('telegram', date, item_id)
|
||||
self.item_id = f'{item_id}.gz'
|
||||
return self.item_id
|
||||
# def get_obj(self):.
|
||||
# obj_id = Messages.create_obj_id('telegram', chat_id, message_id, timestamp)
|
||||
# obj_id = f'message:telegram:{obj_id}'
|
||||
# self.obj = ail_objects.get_obj_from_global_id(obj_id)
|
||||
# return self.obj
|
||||
|
||||
def process_meta(self):
|
||||
"""
|
||||
Process JSON meta field.
|
||||
"""
|
||||
# channel_id = str(self.json_data['meta']['channel_id'])
|
||||
# message_id = str(self.json_data['meta']['message_id'])
|
||||
# telegram_id = f'{channel_id}_{message_id}'
|
||||
# item_basic.add_map_obj_id_item_id(telegram_id, item_id, 'telegram_id') #########################################
|
||||
user = None
|
||||
if self.json_data['meta'].get('user'):
|
||||
user = str(self.json_data['meta']['user'])
|
||||
elif self.json_data['meta'].get('channel'):
|
||||
user = str(self.json_data['meta']['channel'].get('username'))
|
||||
if user:
|
||||
date = item_basic.get_item_date(self.item_id)
|
||||
username = Username(user, 'telegram')
|
||||
username.add(date, self.item_id)
|
||||
return None
|
||||
|
|
|
@ -17,7 +17,7 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from importer.feeders.Default import DefaultFeeder
|
||||
from lib.objects.Usernames import Username
|
||||
from lib import item_basic
|
||||
from lib.objects.Items import Item
|
||||
|
||||
class TwitterFeeder(DefaultFeeder):
|
||||
|
||||
|
@ -40,9 +40,9 @@ class TwitterFeeder(DefaultFeeder):
|
|||
'''
|
||||
# tweet_id = str(self.json_data['meta']['twitter:tweet_id'])
|
||||
# item_basic.add_map_obj_id_item_id(tweet_id, item_id, 'twitter_id') ############################################
|
||||
|
||||
date = item_basic.get_item_date(self.item_id)
|
||||
item = Item(self.item_id)
|
||||
date = item.get_date()
|
||||
user = str(self.json_data['meta']['twitter:id'])
|
||||
username = Username(user, 'twitter')
|
||||
username.add(date, item_id)
|
||||
return None
|
||||
username.add(date, item)
|
||||
return set()
|
||||
|
|
|
@ -56,3 +56,5 @@ class UrlextractFeeder(DefaultFeeder):
|
|||
item = Item(self.item_id)
|
||||
item.set_parent(parent_id)
|
||||
|
||||
return set()
|
||||
|
||||
|
|
394
bin/importer/feeders/abstract_chats_feeder.py
Executable file
|
@ -0,0 +1,394 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
Abstract Chat JSON Feeder Importer Module
|
||||
================
|
||||
|
||||
Process Feeder Json (example: Twitter feeder)
|
||||
|
||||
"""
|
||||
import datetime
|
||||
import os
|
||||
import sys
|
||||
|
||||
from abc import ABC
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from importer.feeders.Default import DefaultFeeder
|
||||
from lib.objects.Chats import Chat
|
||||
from lib.objects import ChatSubChannels
|
||||
from lib.objects import ChatThreads
|
||||
from lib.objects import Images
|
||||
from lib.objects import Messages
|
||||
from lib.objects import FilesNames
|
||||
# from lib.objects import Files
|
||||
from lib.objects import UsersAccount
|
||||
from lib.objects.Usernames import Username
|
||||
from lib import chats_viewer
|
||||
|
||||
import base64
|
||||
import io
|
||||
import gzip
|
||||
|
||||
# TODO remove compression ???
|
||||
def _gunzip_bytes_obj(bytes_obj):
|
||||
gunzipped_bytes_obj = None
|
||||
try:
|
||||
in_ = io.BytesIO()
|
||||
in_.write(bytes_obj)
|
||||
in_.seek(0)
|
||||
|
||||
with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
|
||||
gunzipped_bytes_obj = fo.read()
|
||||
except Exception as e:
|
||||
print(f'Global; Invalid Gzip file: {e}')
|
||||
|
||||
return gunzipped_bytes_obj
|
||||
|
||||
class AbstractChatFeeder(DefaultFeeder, ABC):
|
||||
|
||||
def __init__(self, name, json_data):
|
||||
super().__init__(json_data)
|
||||
self.obj = None
|
||||
self.name = name
|
||||
|
||||
def get_chat_protocol(self): # TODO # # # # # # # # # # # # #
|
||||
return self.name
|
||||
|
||||
def get_chat_network(self):
|
||||
return self.json_data['meta'].get('network', None)
|
||||
|
||||
def get_chat_address(self):
|
||||
return self.json_data['meta'].get('address', None)
|
||||
|
||||
def get_chat_instance_uuid(self):
|
||||
chat_instance_uuid = chats_viewer.create_chat_service_instance(self.get_chat_protocol(),
|
||||
network=self.get_chat_network(),
|
||||
address=self.get_chat_address())
|
||||
# TODO SET
|
||||
return chat_instance_uuid
|
||||
|
||||
def get_chat_id(self): # TODO RAISE ERROR IF NONE
|
||||
return self.json_data['meta']['chat']['id']
|
||||
|
||||
def get_subchannel_id(self):
|
||||
return self.json_data['meta']['chat'].get('subchannel', {}).get('id')
|
||||
|
||||
def get_subchannels(self):
|
||||
pass
|
||||
|
||||
def get_thread_id(self):
|
||||
return self.json_data['meta'].get('thread', {}).get('id')
|
||||
|
||||
def get_message_id(self):
|
||||
return self.json_data['meta']['id']
|
||||
|
||||
def get_media_name(self):
|
||||
return self.json_data['meta'].get('media', {}).get('name')
|
||||
|
||||
def get_reactions(self):
|
||||
return self.json_data['meta'].get('reactions', [])
|
||||
|
||||
def get_message_timestamp(self):
|
||||
if not self.json_data['meta'].get('date'):
|
||||
return None
|
||||
else:
|
||||
return self.json_data['meta']['date']['timestamp']
|
||||
# if self.json_data['meta'].get('date'):
|
||||
# date = datetime.datetime.fromtimestamp( self.json_data['meta']['date']['timestamp'])
|
||||
# date = date.strftime('%Y/%m/%d')
|
||||
# else:
|
||||
# date = datetime.date.today().strftime("%Y/%m/%d")
|
||||
|
||||
def get_message_date_timestamp(self):
|
||||
timestamp = self.get_message_timestamp()
|
||||
date = datetime.datetime.fromtimestamp(timestamp)
|
||||
date = date.strftime('%Y%m%d')
|
||||
return date, timestamp
|
||||
|
||||
def get_message_sender_id(self):
|
||||
return self.json_data['meta']['sender']['id']
|
||||
|
||||
def get_message_reply(self):
|
||||
return self.json_data['meta'].get('reply_to') # TODO change to reply ???
|
||||
|
||||
def get_message_reply_id(self):
|
||||
return self.json_data['meta'].get('reply_to', {}).get('message_id')
|
||||
|
||||
def get_message_forward(self):
|
||||
return self.json_data['meta'].get('forward')
|
||||
|
||||
def get_message_content(self):
|
||||
decoded = base64.standard_b64decode(self.json_data['data'])
|
||||
return _gunzip_bytes_obj(decoded)
|
||||
|
||||
def get_obj(self):
|
||||
#### TIMESTAMP ####
|
||||
timestamp = self.get_message_timestamp()
|
||||
|
||||
#### Create Object ID ####
|
||||
chat_id = self.get_chat_id()
|
||||
try:
|
||||
message_id = self.get_message_id()
|
||||
except KeyError:
|
||||
if chat_id:
|
||||
self.obj = Chat(chat_id, self.get_chat_instance_uuid())
|
||||
return self.obj
|
||||
else:
|
||||
self.obj = None
|
||||
return None
|
||||
|
||||
thread_id = self.get_thread_id()
|
||||
# channel id
|
||||
# thread id
|
||||
|
||||
# TODO sanitize obj type
|
||||
obj_type = self.get_obj_type()
|
||||
|
||||
if obj_type == 'image':
|
||||
self.obj = Images.Image(self.json_data['data-sha256'])
|
||||
|
||||
else:
|
||||
obj_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, thread_id=thread_id)
|
||||
self.obj = Messages.Message(obj_id)
|
||||
return self.obj
|
||||
|
||||
def process_chat(self, new_objs, obj, date, timestamp, reply_id=None):
|
||||
meta = self.json_data['meta']['chat'] # todo replace me by function
|
||||
chat = Chat(self.get_chat_id(), self.get_chat_instance_uuid())
|
||||
subchannel = None
|
||||
thread = None
|
||||
|
||||
# date stat + correlation
|
||||
chat.add(date, obj)
|
||||
|
||||
if meta.get('name'):
|
||||
chat.set_name(meta['name'])
|
||||
|
||||
if meta.get('info'):
|
||||
chat.set_info(meta['info'])
|
||||
|
||||
if meta.get('date'): # TODO check if already exists
|
||||
chat.set_created_at(int(meta['date']['timestamp']))
|
||||
|
||||
if meta.get('icon'):
|
||||
img = Images.create(meta['icon'], b64=True)
|
||||
img.add(date, chat)
|
||||
chat.set_icon(img.get_global_id())
|
||||
new_objs.add(img)
|
||||
|
||||
if meta.get('username'):
|
||||
username = Username(meta['username'], self.get_chat_protocol())
|
||||
chat.update_username_timeline(username.get_global_id(), timestamp)
|
||||
|
||||
if meta.get('subchannel'):
|
||||
subchannel, thread = self.process_subchannel(obj, date, timestamp, reply_id=reply_id)
|
||||
chat.add_children(obj_global_id=subchannel.get_global_id())
|
||||
else:
|
||||
if obj.type == 'message':
|
||||
if self.get_thread_id():
|
||||
thread = self.process_thread(obj, chat, date, timestamp, reply_id=reply_id)
|
||||
else:
|
||||
chat.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
|
||||
|
||||
chats_obj = [chat]
|
||||
if subchannel:
|
||||
chats_obj.append(subchannel)
|
||||
if thread:
|
||||
chats_obj.append(thread)
|
||||
return chats_obj
|
||||
|
||||
def process_subchannel(self, obj, date, timestamp, reply_id=None): # TODO CREATE DATE
|
||||
meta = self.json_data['meta']['chat']['subchannel']
|
||||
subchannel = ChatSubChannels.ChatSubChannel(f'{self.get_chat_id()}/{meta["id"]}', self.get_chat_instance_uuid())
|
||||
thread = None
|
||||
|
||||
# TODO correlation with obj = message/image
|
||||
subchannel.add(date)
|
||||
|
||||
if meta.get('date'): # TODO check if already exists
|
||||
subchannel.set_created_at(int(meta['date']['timestamp']))
|
||||
|
||||
if meta.get('name'):
|
||||
subchannel.set_name(meta['name'])
|
||||
# subchannel.update_name(meta['name'], timestamp) # TODO #################
|
||||
|
||||
if meta.get('info'):
|
||||
subchannel.set_info(meta['info'])
|
||||
|
||||
if obj.type == 'message':
|
||||
if self.get_thread_id():
|
||||
thread = self.process_thread(obj, subchannel, date, timestamp, reply_id=reply_id)
|
||||
else:
|
||||
subchannel.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
|
||||
return subchannel, thread
|
||||
|
||||
def process_thread(self, obj, obj_chat, date, timestamp, reply_id=None):
|
||||
meta = self.json_data['meta']['thread']
|
||||
thread_id = self.get_thread_id()
|
||||
p_chat_id = meta['parent'].get('chat')
|
||||
p_subchannel_id = meta['parent'].get('subchannel')
|
||||
p_message_id = meta['parent'].get('message')
|
||||
|
||||
# print(thread_id, p_chat_id, p_subchannel_id, p_message_id)
|
||||
|
||||
if p_chat_id == self.get_chat_id() and p_subchannel_id == self.get_subchannel_id():
|
||||
thread = ChatThreads.create(thread_id, self.get_chat_instance_uuid(), p_chat_id, p_subchannel_id, p_message_id, obj_chat)
|
||||
thread.add(date, obj)
|
||||
thread.add_message(obj.get_global_id(), self.get_message_id(), timestamp, reply_id=reply_id)
|
||||
# TODO OTHERS CORRELATIONS TO ADD
|
||||
|
||||
if meta.get('name'):
|
||||
thread.set_name(meta['name'])
|
||||
|
||||
return thread
|
||||
|
||||
# TODO
|
||||
# else:
|
||||
# # ADD NEW MESSAGE REF (used by discord)
|
||||
|
||||
def process_sender(self, new_objs, obj, date, timestamp):
|
||||
meta = self.json_data['meta'].get('sender')
|
||||
if not meta:
|
||||
return None
|
||||
|
||||
user_account = UsersAccount.UserAccount(meta['id'], self.get_chat_instance_uuid())
|
||||
|
||||
# date stat + correlation
|
||||
user_account.add(date, obj)
|
||||
|
||||
if meta.get('username'):
|
||||
username = Username(meta['username'], self.get_chat_protocol())
|
||||
# TODO timeline or/and correlation ????
|
||||
user_account.add_correlation(username.type, username.get_subtype(r_str=True), username.id)
|
||||
user_account.update_username_timeline(username.get_global_id(), timestamp)
|
||||
|
||||
# Username---Message
|
||||
username.add(date) # TODO # correlation message ???
|
||||
|
||||
# ADDITIONAL METAS
|
||||
if meta.get('firstname'):
|
||||
user_account.set_first_name(meta['firstname'])
|
||||
if meta.get('lastname'):
|
||||
user_account.set_last_name(meta['lastname'])
|
||||
if meta.get('phone'):
|
||||
user_account.set_phone(meta['phone'])
|
||||
|
||||
if meta.get('icon'):
|
||||
img = Images.create(meta['icon'], b64=True)
|
||||
img.add(date, user_account)
|
||||
user_account.set_icon(img.get_global_id())
|
||||
new_objs.add(img)
|
||||
|
||||
if meta.get('info'):
|
||||
user_account.set_info(meta['info'])
|
||||
|
||||
return user_account
|
||||
|
||||
def process_meta(self): # TODO CHECK MANDATORY FIELDS
|
||||
"""
|
||||
Process the JSON meta field.
|
||||
"""
|
||||
# meta = self.get_json_meta()
|
||||
|
||||
objs = set()
|
||||
if self.obj:
|
||||
objs.add(self.obj)
|
||||
new_objs = set()
|
||||
|
||||
date, timestamp = self.get_message_date_timestamp()
|
||||
|
||||
# REPLY
|
||||
reply_id = self.get_message_reply_id()
|
||||
|
||||
print(self.obj.type)
|
||||
|
||||
# TODO FILES + FILES REF
|
||||
|
||||
# get object by meta object type
|
||||
if self.obj.type == 'message':
|
||||
# Content
|
||||
obj = Messages.create(self.obj.id, self.get_message_content())
|
||||
|
||||
# FILENAME
|
||||
media_name = self.get_media_name()
|
||||
if media_name:
|
||||
print(media_name)
|
||||
FilesNames.FilesNames().create(media_name, date, obj)
|
||||
|
||||
for reaction in self.get_reactions():
|
||||
obj.add_reaction(reaction['reaction'], int(reaction['count']))
|
||||
elif self.obj.type == 'chat':
|
||||
pass
|
||||
else:
|
||||
chat_id = self.get_chat_id()
|
||||
thread_id = self.get_thread_id()
|
||||
channel_id = self.get_subchannel_id()
|
||||
message_id = self.get_message_id()
|
||||
message_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, channel_id=channel_id, thread_id=thread_id)
|
||||
message = Messages.Message(message_id)
|
||||
# create an empty message if the message doesn't exist
|
||||
if not message.exists():
|
||||
message.create('')
|
||||
objs.add(message)
|
||||
|
||||
if message.exists(): # TODO Correlation user-account image/filename ????
|
||||
obj = Images.create(self.get_message_content())
|
||||
obj.add(date, message)
|
||||
obj.set_parent(obj_global_id=message.get_global_id())
|
||||
|
||||
# FILENAME
|
||||
media_name = self.get_media_name()
|
||||
if media_name:
|
||||
FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
|
||||
|
||||
for reaction in self.get_reactions():
|
||||
message.add_reaction(reaction['reaction'], int(reaction['count']))
|
||||
|
||||
for obj in objs: # TODO PERF avoid parsing metas multiple times
|
||||
|
||||
# TODO get created subchannel + thread
|
||||
# => create correlation user-account with object
|
||||
|
||||
print(obj.id)
|
||||
|
||||
# CHAT
|
||||
chat_objs = self.process_chat(new_objs, obj, date, timestamp, reply_id=reply_id)
|
||||
|
||||
# Message forward
|
||||
# if self.get_json_meta().get('forward'):
|
||||
# forward_from = self.get_message_forward()
|
||||
# print('-----------------------------------------------------------')
|
||||
# print(forward_from)
|
||||
# if forward_from:
|
||||
# forward_from_type = forward_from['from']['type']
|
||||
# if forward_from_type == 'channel' or forward_from_type == 'chat':
|
||||
# chat_forward_id = forward_from['from']['id']
|
||||
# chat_forward = Chat(chat_forward_id, self.get_chat_instance_uuid())
|
||||
# if chat_forward.exists():
|
||||
# for chat_obj in chat_objs:
|
||||
# if chat_obj.type == 'chat':
|
||||
# chat_forward.add_relationship(chat_obj.get_global_id(), 'forward')
|
||||
# # chat_forward.add_relationship(obj.get_global_id(), 'forward')
|
||||
|
||||
# SENDER # TODO HANDLE NULL SENDER
|
||||
user_account = self.process_sender(new_objs, obj, date, timestamp)
|
||||
|
||||
if user_account:
|
||||
# UserAccount---ChatObjects
|
||||
for obj_chat in chat_objs:
|
||||
user_account.add_correlation(obj_chat.type, obj_chat.get_subtype(r_str=True), obj_chat.id)
|
||||
|
||||
# if chat: # TODO Chat---Username correlation ???
|
||||
# # Chat---Username => need to handle members and participants
|
||||
# chat.add_correlation(username.type, username.get_subtype(r_str=True), username.id)
|
||||
|
||||
# TODO Sender image -> correlation
|
||||
# image
|
||||
# -> subchannel ?
|
||||
# -> thread id ?
|
||||
|
||||
return new_objs | objs
|
|
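
For reference, a hedged sketch of the JSON structure a chat feeder is expected to deliver, reconstructed from the accessors above; every value below is illustrative and optional fields may be absent:

example_chat_message = {
    'data': '<base64(gzip(message body))>',            # get_message_content()
    'data-sha256': '<sha256 of decoded media>',        # used for image objects
    'meta': {
        'id': 12345,                                    # get_message_id()
        'date': {'timestamp': 1697040000},              # get_message_timestamp()
        'chat': {
            'id': 67890,                                # get_chat_id()
            'name': 'my channel',
            'subchannel': {'id': 1},                    # get_subchannel_id()
        },
        'thread': {
            'id': 2,                                    # get_thread_id()
            'parent': {'chat': 67890, 'subchannel': 1, 'message': 12000},
        },
        'sender': {'id': 424242, 'username': 'alice'},  # process_sender()
        'reply_to': {'message_id': 12001},              # get_message_reply_id()
        'media': {'name': 'invoice.pdf'},               # get_media_name()
        'reactions': [{'reaction': '+1', 'count': 3}],  # get_reactions()
    },
}
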
@ -83,6 +83,7 @@ class ConfigLoader(object):
|
|||
else:
|
||||
return []
|
||||
|
||||
|
||||
# # # # Directory Config # # # #
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
|
|
|
@ -16,7 +16,7 @@ import time
|
|||
import uuid
|
||||
|
||||
from enum import Enum
|
||||
from flask import escape
|
||||
from markupsafe import escape
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
|
@ -235,18 +235,27 @@ class Investigation(object):
|
|||
objs.append(dict_obj)
|
||||
return objs
|
||||
|
||||
def get_objects_comment(self, obj_global_id):
|
||||
return r_tracking.hget(f'investigations:objs:comment:{self.uuid}', obj_global_id)
|
||||
|
||||
def set_objects_comment(self, obj_global_id, comment):
|
||||
if comment:
|
||||
r_tracking.hset(f'investigations:objs:comment:{self.uuid}', obj_global_id, comment)
|
||||
|
||||
# # TODO: def register_object(self, Object): in OBJECT CLASS
|
||||
|
||||
def register_object(self, obj_id, obj_type, subtype):
|
||||
def register_object(self, obj_id, obj_type, subtype, comment=''):
|
||||
r_tracking.sadd(f'investigations:objs:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
|
||||
r_tracking.sadd(f'obj:investigations:{obj_type}:{subtype}:{obj_id}', self.uuid)
|
||||
if comment:
|
||||
self.set_objects_comment(f'{obj_type}:{subtype}:{obj_id}', comment)
|
||||
timestamp = int(time.time())
|
||||
self.set_last_change(timestamp)
|
||||
|
||||
|
||||
def unregister_object(self, obj_id, obj_type, subtype):
|
||||
r_tracking.srem(f'investigations:objs:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
|
||||
r_tracking.srem(f'obj:investigations:{obj_type}:{subtype}:{obj_id}', self.uuid)
|
||||
r_tracking.hdel(f'investigations:objs:comment:{self.uuid}', f'{obj_type}:{subtype}:{obj_id}')
|
||||
timestamp = int(time.time())
|
||||
self.set_last_change(timestamp)
|
||||
|
||||
|
@ -351,7 +360,7 @@ def get_investigations_selector():
|
|||
for investigation_uuid in get_all_investigations():
|
||||
investigation = Investigation(investigation_uuid)
|
||||
name = investigation.get_info()
|
||||
l_investigations.append({"id":investigation_uuid, "name": name})
|
||||
l_investigations.append({"id": investigation_uuid, "name": name})
|
||||
return l_investigations
|
||||
|
||||
#{id:'8dc4b81aeff94a9799bd70ba556fa345',name:"Paris"}
|
||||
|
@ -453,7 +462,11 @@ def api_register_object(json_dict):
|
|||
if subtype == 'None':
|
||||
subtype = ''
|
||||
obj_id = json_dict.get('id', '').replace(' ', '')
|
||||
res = investigation.register_object(obj_id, obj_type, subtype)
|
||||
|
||||
comment = json_dict.get('comment', '')
|
||||
# if comment:
|
||||
# comment = escape(comment)
|
||||
res = investigation.register_object(obj_id, obj_type, subtype, comment=comment)
|
||||
return res, 200
|
||||
|
||||
def api_unregister_object(json_dict):
|
||||
|
|
|
@ -2,7 +2,24 @@
|
|||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import html2text
|
||||
|
||||
import gcld3
|
||||
from libretranslatepy import LibreTranslateAPI
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
TRANSLATOR_URL = config_loader.get_config_str('Translation', 'libretranslate')
|
||||
config_loader = None
|
||||
|
||||
|
||||
dict_iso_languages = {
|
||||
'af': 'Afrikaans',
|
||||
|
@ -237,3 +254,201 @@ def get_iso_from_languages(l_languages, sort=False):
|
|||
if sort:
|
||||
l_iso = sorted(l_iso)
|
||||
return l_iso
|
||||
|
||||
|
||||
class LanguageDetector:
|
||||
pass
|
||||
|
||||
def get_translator_instance():
|
||||
return TRANSLATOR_URL
|
||||
|
||||
def _get_html2text(content, ignore_links=False):
|
||||
h = html2text.HTML2Text()
|
||||
h.ignore_links = ignore_links
|
||||
h.ignore_images = ignore_links
|
||||
return h.handle(content)
|
||||
|
||||
def _clean_text_to_translate(content, html=False, keys_blocks=True):
|
||||
if html:
|
||||
content = _get_html2text(content, ignore_links=True)
|
||||
|
||||
# REMOVE URLS
|
||||
regex = r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b'
|
||||
url_regex = re.compile(regex)
|
||||
urls = url_regex.findall(content)
|
||||
urls = sorted(urls, key=len, reverse=True)
|
||||
for url in urls:
|
||||
content = content.replace(url, '')
|
||||
|
||||
# REMOVE PGP Blocks
|
||||
if keys_blocks:
|
||||
regex_pgp_public_blocs = r'-----BEGIN PGP PUBLIC KEY BLOCK-----[\s\S]+?-----END PGP PUBLIC KEY BLOCK-----'
|
||||
regex_pgp_signature = r'-----BEGIN PGP SIGNATURE-----[\s\S]+?-----END PGP SIGNATURE-----'
|
||||
regex_pgp_message = r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----'
|
||||
re.compile(regex_pgp_public_blocs)
|
||||
re.compile(regex_pgp_signature)
|
||||
re.compile(regex_pgp_message)
|
||||
res = re.findall(regex_pgp_public_blocs, content)
|
||||
for it in res:
|
||||
content = content.replace(it, '')
|
||||
res = re.findall(regex_pgp_signature, content)
|
||||
for it in res:
|
||||
content = content.replace(it, '')
|
||||
res = re.findall(regex_pgp_message, content)
|
||||
for it in res:
|
||||
content = content.replace(it, '')
|
||||
return content
|
||||
|
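
A small standalone sketch of what `_clean_text_to_translate()` achieves: URLs and PGP blocks are stripped before language detection so they do not skew the result (the sample text and the simplified URL regex are illustrative):

import re

sample = ('Contact me on http://example.onion/market\n'
          '-----BEGIN PGP MESSAGE-----\nhQEMA...\n-----END PGP MESSAGE-----\n'
          'bonjour, ceci est un test')

# Strip URLs, longest match first, as the module does
url_re = re.compile(r'\bhttps?://\S+')
for url in sorted(url_re.findall(sample), key=len, reverse=True):
    sample = sample.replace(url, '')

# Strip PGP blocks before detection/translation
sample = re.sub(r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----', '', sample)

print(sample)  # only the natural-language text remains
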
||||
#### AIL Objects ####
|
||||
|
||||
def get_obj_translation(obj_global_id, content, field='', source=None, target='en'):
|
||||
"""
|
||||
Returns translated content
|
||||
"""
|
||||
translation = r_cache.get(f'translation:{target}:{obj_global_id}:{field}')
|
||||
if translation:
|
||||
# DEBUG
|
||||
# print('cache')
|
||||
# r_cache.expire(f'translation:{target}:{obj_global_id}:{field}', 0)
|
||||
return translation
|
||||
translation = LanguageTranslator().translate(content, source=source, target=target)
|
||||
if translation:
|
||||
r_cache.set(f'translation:{target}:{obj_global_id}:{field}', translation)
|
||||
r_cache.expire(f'translation:{target}:{obj_global_id}:{field}', 300)
|
||||
return translation
|
||||
|
||||
## --AIL Objects-- ##
|
||||
|
||||
class LanguagesDetector:
|
||||
|
||||
def __init__(self, nb_langs=3, min_proportion=0.2, min_probability=0.7, min_len=0):
|
||||
self.lt = LibreTranslateAPI(get_translator_instance())
|
||||
try:
|
||||
self.lt.languages()
|
||||
except Exception:
|
||||
self.lt = None
|
||||
self.detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
|
||||
self.nb_langs = nb_langs
|
||||
self.min_proportion = min_proportion
|
||||
self.min_probability = min_probability
|
||||
self.min_len = min_len
|
||||
|
||||
def detect_gcld3(self, content):
|
||||
languages = []
|
||||
content = _clean_text_to_translate(content, html=True)
|
||||
if self.min_len > 0:
|
||||
if len(content) < self.min_len:
|
||||
return languages
|
||||
for lang in self.detector.FindTopNMostFreqLangs(content, num_langs=self.nb_langs):
|
||||
if lang.proportion >= self.min_proportion and lang.probability >= self.min_probability and lang.is_reliable:
|
||||
languages.append(lang.language)
|
||||
return languages
|
||||
|
||||
def detect_libretranslate(self, content):
|
||||
languages = []
|
||||
try:
|
||||
# [{"confidence": 0.6, "language": "en"}]
|
||||
resp = self.lt.detect(content)
|
||||
except Exception as e: # TODO ERROR MESSAGE
|
||||
raise Exception(f'libretranslate error: {e}')
|
||||
# resp = []
|
||||
if resp:
|
||||
if isinstance(resp, dict):
|
||||
raise Exception(f'libretranslate error {resp}')
|
||||
for language in resp:
|
||||
if language['confidence'] >= self.min_probability:
|
||||
languages.append(language['language'])
|
||||
return languages
|
||||
|
||||
def detect(self, content, force_gcld3=False):
|
||||
# gcld3
|
||||
if len(content) >= 200 or not self.lt or force_gcld3:
|
||||
language = self.detect_gcld3(content)
|
||||
# libretranslate
|
||||
else:
|
||||
language = self.detect_libretranslate(content)
|
||||
return language
|
||||
|
||||
class LanguageTranslator:
|
||||
|
||||
def __init__(self):
|
||||
self.lt = LibreTranslateAPI(get_translator_instance())
|
||||
|
||||
def languages(self):
|
||||
languages = []
|
||||
try:
|
||||
for dict_lang in self.lt.languages():
|
||||
languages.append({'iso': dict_lang['code'], 'language': dict_lang['name']})
|
||||
except Exception as e:
|
||||
print(e)
|
||||
return languages
|
||||
|
||||
def detect_gcld3(self, content):
|
||||
content = _clean_text_to_translate(content, html=True)
|
||||
detector = gcld3.NNetLanguageIdentifier(min_num_bytes=0, max_num_bytes=1000)
|
||||
lang = detector.FindLanguage(content)
|
||||
# print(lang.language)
|
||||
# print(lang.is_reliable)
|
||||
# print(lang.proportion)
|
||||
# print(lang.probability)
|
||||
return lang.language
|
||||
|
||||
def detect_libretranslate(self, content):
|
||||
try:
|
||||
language = self.lt.detect(content)
|
||||
except: # TODO ERROR MESSAGE
|
||||
language = None
|
||||
if language:
|
||||
return language[0].get('language')
|
||||
|
||||
def detect(self, content):
|
||||
# gcld3
|
||||
if len(content) >= 200:
|
||||
language = self.detect_gcld3(content)
|
||||
# libretranslate
|
||||
else:
|
||||
language = self.detect_libretranslate(content)
|
||||
return language
|
||||
|
||||
def translate(self, content, source=None, target="en"): # TODO source target
|
||||
if target not in get_translation_languages():
|
||||
return None
|
||||
translation = None
|
||||
if content:
|
||||
if not source:
|
||||
source = self.detect(content)
|
||||
# print(source, content)
|
||||
if source:
|
||||
if source != target:
|
||||
try:
|
||||
# print(content, source, target)
|
||||
translation = self.lt.translate(content, source, target)
|
||||
except:
|
||||
translation = None
|
||||
# TODO LOG and display error
|
||||
if translation == content:
|
||||
print('EQUAL')
|
||||
translation = None
|
||||
return translation
|
||||
|
||||
|
||||
LIST_LANGUAGES = {}
|
||||
def get_translation_languages():
|
||||
global LIST_LANGUAGES
|
||||
if not LIST_LANGUAGES:
|
||||
try:
|
||||
LIST_LANGUAGES = {}
|
||||
for lang in LanguageTranslator().languages():
|
||||
LIST_LANGUAGES[lang['iso']] = lang['language']
|
||||
except Exception as e:
|
||||
print(e)
|
||||
LIST_LANGUAGES = {}
|
||||
return LIST_LANGUAGES
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# t_content = ''
|
||||
langg = LanguageTranslator()
|
||||
# langg = LanguagesDetector()
|
||||
# lang.translate(t_content, source='ru')
|
||||
langg.languages()
|
||||
|
|
200
bin/lib/Tag.py
|
@ -64,7 +64,7 @@ unsafe_tags = build_unsafe_tags()
|
|||
# get set_keys: intersection
|
||||
def get_obj_keys_by_tags(tags, obj_type, subtype='', date=None):
|
||||
l_set_keys = []
|
||||
if obj_type == 'item':
|
||||
if obj_type == 'item' or obj_type == 'message':
|
||||
for tag in tags:
|
||||
l_set_keys.append(f'{obj_type}:{subtype}:{tag}:{date}')
|
||||
else:
|
||||
|
@ -338,7 +338,7 @@ def get_galaxy_meta(galaxy_name, nb_active_tags=False):
|
|||
else:
|
||||
meta['icon'] = f'fas fa-{icon}'
|
||||
if nb_active_tags:
|
||||
meta['nb_active_tags'] = get_galaxy_nb_tags_enabled(galaxy)
|
||||
meta['nb_active_tags'] = get_galaxy_nb_tags_enabled(galaxy.type)
|
||||
meta['nb_tags'] = len(get_galaxy_tags(galaxy.type))
|
||||
return meta
|
||||
|
||||
|
@ -387,8 +387,12 @@ def get_cluster_tags(cluster_type, enabled=False):
|
|||
meta_tag = {'tag': tag, 'description': cluster_val.description}
|
||||
if enabled:
|
||||
meta_tag['enabled'] = is_galaxy_tag_enabled(cluster_type, tag)
|
||||
synonyms = cluster_val.meta.synonyms
|
||||
if not synonyms:
|
||||
cluster_val_meta = cluster_val.meta
|
||||
if cluster_val_meta:
|
||||
synonyms = cluster_val_meta.synonyms
|
||||
if not synonyms:
|
||||
synonyms = []
|
||||
else:
|
||||
synonyms = []
|
||||
meta_tag['synonyms'] = synonyms
|
||||
tags.append(meta_tag)
|
||||
|
@ -631,7 +635,7 @@ def update_tag_metadata(tag, date, delete=False): # # TODO: delete Tags
|
|||
# r_tags.smembers(f'{tag}:{date}')
|
||||
# r_tags.smembers(f'{obj_type}:{tag}')
|
||||
def get_tag_objects(tag, obj_type, subtype='', date=''):
|
||||
if obj_type == 'item':
|
||||
if obj_type == 'item' or obj_type == 'message':
|
||||
return r_tags.smembers(f'{obj_type}:{subtype}:{tag}:{date}')
|
||||
else:
|
||||
return r_tags.smembers(f'{obj_type}:{subtype}:{tag}')
|
||||
|
@ -653,6 +657,15 @@ def add_object_tag(tag, obj_type, obj_id, subtype=''):
|
|||
domain = item_basic.get_item_domain(obj_id)
|
||||
add_object_tag(tag, "domain", domain)
|
||||
|
||||
update_tag_metadata(tag, date)
|
||||
# MESSAGE
|
||||
elif obj_type == 'message':
|
||||
timestamp = obj_id.split('/')[1]
|
||||
date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
|
||||
r_tags.sadd(f'{obj_type}:{subtype}:{tag}:{date}', obj_id)
|
||||
|
||||
# TODO ADD CHAT TAGS ????
|
||||
|
||||
update_tag_metadata(tag, date)
|
||||
else:
|
||||
r_tags.sadd(f'{obj_type}:{subtype}:{tag}', obj_id)
|
||||
|
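
Message object IDs embed their Unix timestamp as the second '/'-separated field, which is how the tag date is derived above. A hedged sketch (the ID layout shown is illustrative):

import datetime

# Hypothetical message id: '<chat instance uuid>/<unix timestamp>/<message hash>'
obj_id = '7a9c0000-0000-0000-0000-000000000000/1697040000/1a2b3c'
timestamp = obj_id.split('/')[1]
date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
print(date)  # '20231011' for this timestamp (local timezone may shift the day)
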
@ -671,8 +684,8 @@ def confirm_tag(tag, obj):
|
|||
# TODO REVIEW ME
|
||||
def update_tag_global_by_obj_type(tag, obj_type, subtype=''):
|
||||
tag_deleted = False
|
||||
if obj_type == 'item':
|
||||
if not r_tags.exists(f'tag_metadata:{tag}'):
|
||||
if obj_type == 'item' or obj_type == 'message':
|
||||
if not r_tags.exists(f'tag_metadata:{tag}'): # TODO FIXME #################################################################
|
||||
tag_deleted = True
|
||||
else:
|
||||
if not r_tags.exists(f'{obj_type}:{subtype}:{tag}'):
|
||||
|
@ -703,6 +716,12 @@ def delete_object_tag(tag, obj_type, id, subtype=''):
|
|||
date = item_basic.get_item_date(id)
|
||||
r_tags.srem(f'{obj_type}:{subtype}:{tag}:{date}', id)
|
||||
|
||||
update_tag_metadata(tag, date, delete=True)
|
||||
elif obj_type == 'message':
|
||||
timestamp = id.split('/')[1]
|
||||
date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
|
||||
r_tags.srem(f'{obj_type}:{subtype}:{tag}:{date}', id)
|
||||
|
||||
update_tag_metadata(tag, date, delete=True)
|
||||
else:
|
||||
r_tags.srem(f'{obj_type}:{subtype}:{tag}', id)
|
||||
|
@ -725,7 +744,7 @@ def delete_object_tags(obj_type, subtype, obj_id):
|
|||
def get_obj_by_tags(obj_type, l_tags, date_from=None, date_to=None, nb_obj=50, page=1):
|
||||
# with daterange
|
||||
l_tagged_obj = []
|
||||
if obj_type=='item':
|
||||
if obj_type=='item' or obj_type=='message':
|
||||
#sanityze date
|
||||
date_range = sanitise_tags_date_range(l_tags, date_from=date_from, date_to=date_to)
|
||||
l_dates = Date.substract_date(date_range['date_from'], date_range['date_to'])
|
||||
|
@ -1181,12 +1200,17 @@ def get_enabled_tags_with_synonyms_ui():
|
|||
|
||||
# TYPE -> taxonomy/galaxy/custom
|
||||
|
||||
# TODO GET OBJ Types
|
||||
class Tag:
|
||||
|
||||
def __init__(self, name: str, local=False): # TODO Get first seen by object, obj='item
|
||||
self.name = name
|
||||
self.local = local
|
||||
|
||||
# TODO
|
||||
def exists(self):
|
||||
pass
|
||||
|
||||
def is_local(self):
|
||||
return self.local
|
||||
|
||||
|
@ -1197,7 +1221,11 @@ class Tag:
|
|||
else:
|
||||
return 'taxonomy'
|
||||
|
||||
def is_taxonomy(self):
|
||||
return not self.local and not self.is_galaxy()
|
||||
|
||||
def is_galaxy(self):
|
||||
return not self.local and self.name.startswith('misp-galaxy:')
|
||||
|
||||
def get_first_seen(self, r_int=False):
|
||||
first_seen = r_tags.hget(f'meta:tag:{self.name}', 'first_seen')
|
||||
|
@ -1208,6 +1236,9 @@ class Tag:
|
|||
first_seen = 99999999
|
||||
return first_seen
|
||||
|
||||
def set_first_seen(self, first_seen):
|
||||
return r_tags.hset(f'meta:tag:{self.name}', 'first_seen', int(first_seen))
|
||||
|
||||
def get_last_seen(self, r_int=False):
|
||||
last_seen = r_tags.hget(f'meta:tag:{self.name}', 'last_seen') # 'last_seen:object' -> only if date or daterange
|
||||
if r_int:
|
||||
|
@ -1217,6 +1248,9 @@ class Tag:
|
|||
last_seen = 0
|
||||
return last_seen
|
||||
|
||||
def set_last_seen(self, last_seen):
|
||||
return r_tags.hset(f'meta:tag:{self.name}', 'last_seen', int(last_seen))
|
||||
|
||||
def get_color(self):
|
||||
color = r_tags.hget(f'meta:tag:{self.name}', 'color')
|
||||
if not color:
|
||||
|
@ -1239,6 +1273,131 @@ class Tag:
|
|||
'local': self.is_local()}
|
||||
return meta
|
||||
|
||||
def update_obj_type_first_seen(self, obj_type, first_seen, last_seen): # TODO SUBTYPE ##################################
|
||||
if int(first_seen) > int(last_seen):
|
||||
raise Exception(f'INVALID first_seen/last_seen, {first_seen}/{last_seen}')
|
||||
|
||||
for date in Date.get_daterange(first_seen, last_seen):
|
||||
date = int(date)
|
||||
if date == last_seen:
|
||||
if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') > 0:
|
||||
r_tags.hset(f'tag_metadata:{self.name}', 'first_seen', first_seen)
|
||||
else:
|
||||
r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen') # TODO SUBTYPE
|
||||
r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen') # TODO SUBTYPE
|
||||
r_tags.srem(f'list_tags:{obj_type}', self.name) # TODO SUBTYPE
|
||||
|
||||
elif r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') > 0:
|
||||
r_tags.hset(f'tag_metadata:{self.name}', 'first_seen', first_seen) # TODO METADATA OBJECT NAME
|
||||
|
||||
|
||||
def update_obj_type_last_seen(self, obj_type, first_seen, last_seen): # TODO SUBTYPE ##################################
|
||||
if int(first_seen) > int(last_seen):
|
||||
raise Exception(f'INVALID first_seen/last_seen, {first_seen}/{last_seen}')
|
||||
|
||||
for date in reversed(Date.get_daterange(first_seen, last_seen)):
|
||||
date = int(date)
|
||||
if date == last_seen:
|
||||
if r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') > 0:
|
||||
r_tags.hset(f'tag_metadata:{self.name}', 'last_seen', last_seen)
|
||||
else:
|
||||
r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen') # TODO SUBTYPE
|
||||
r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen') # TODO SUBTYPE
|
||||
r_tags.srem(f'list_tags:{obj_type}', self.name) # TODO SUBTYPE
|
||||
|
||||
elif r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') > 0:
|
||||
r_tags.hset(f'tag_metadata:{self.name}', 'last_seen', last_seen) # TODO METADATA OBJECT NAME
|
||||
|
||||
# TODO
|
||||
# TODO Update First seen and last seen
|
||||
# TODO SUBTYPE CHATS ??????????????
|
||||
def update_obj_type_date(self, obj_type, date, op='add', first_seen=None, last_seen=None):
|
||||
date = int(date)
|
||||
if not first_seen:
|
||||
first_seen = self.get_first_seen(r_int=True)
|
||||
if not last_seen:
|
||||
last_seen = self.get_last_seen(r_int=True)
|
||||
|
||||
# Add tag
|
||||
if op == 'add':
|
||||
if date < first_seen:
|
||||
self.set_first_seen(date)
|
||||
if date > last_seen:
|
||||
self.set_last_seen(date)
|
||||
|
||||
# Delete tag
|
||||
else:
|
||||
if date == first_seen and date == last_seen:
|
||||
|
||||
# TODO OBJECTS ##############################################################################################
|
||||
if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') < 1: ####################### TODO OBJ SUBTYPE ???????????????????
|
||||
r_tags.hdel(f'tag_metadata:{self.name}', 'first_seen')
|
||||
r_tags.hdel(f'tag_metadata:{self.name}', 'last_seen')
|
||||
# TODO CHECK IF DELETE FULL TAG LIST ############################
|
||||
|
||||
elif date == first_seen:
|
||||
if r_tags.scard(f'{obj_type}::{self.name}:{first_seen}') < 1:
|
||||
if int(last_seen) >= int(first_seen):
|
||||
self.update_obj_type_first_seen(obj_type, first_seen, last_seen) # TODO OBJ_TYPE
|
||||
|
||||
elif date == last_seen:
|
||||
if r_tags.scard(f'{obj_type}::{self.name}:{last_seen}') < 1:
|
||||
if int(last_seen) >= int(first_seen):
|
||||
self.update_obj_type_last_seen(obj_type, first_seen, last_seen) # TODO OBJ_TYPE
|
||||
|
||||
# STATS
|
||||
nb = r_tags.hincrby(f'daily_tags:{date}', self.name, -1)
|
||||
if nb < 1:
|
||||
r_tags.hdel(f'daily_tags:{date}', self.name)
|
||||
|
||||
# TODO -> CHECK IF TAG EXISTS + UPDATE FIRST SEEN/LAST SEEN
|
||||
def update(self, date=None):
|
||||
pass
|
||||
|
||||
# TODO CHANGE ME TO SUB FUNCTION ##### add_object_tag(tag, obj_type, obj_id, subtype='')
|
||||
def add(self, obj_type, subtype, obj_id):
|
||||
if subtype is None:
|
||||
subtype = ''
|
||||
|
||||
if r_tags.sadd(f'tag:{obj_type}:{subtype}:{obj_id}', self.name) == 1:
|
||||
r_tags.sadd('list_tags', self.name)
|
||||
r_tags.sadd(f'list_tags:{obj_type}', self.name)
|
||||
if subtype:
|
||||
r_tags.sadd(f'list_tags:{obj_type}:{subtype}', self.name)
|
||||
|
||||
if obj_type == 'item':
|
||||
date = item_basic.get_item_date(obj_id)
|
||||
|
||||
# add domain tag
|
||||
if item_basic.is_crawled(obj_id) and self.name != 'infoleak:submission="crawler"' and self.name != 'infoleak:submission="manual"':
|
||||
domain = item_basic.get_item_domain(obj_id)
|
||||
self.add('domain', '', domain)
|
||||
elif obj_type == 'message':
|
||||
timestamp = obj_id.split('/')[1]
|
||||
date = datetime.datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
|
||||
else:
|
||||
date = None
|
||||
|
||||
if date:
|
||||
r_tags.sadd(f'{obj_type}:{subtype}:{self.name}:{date}', obj_id)
|
||||
update_tag_metadata(self.name, date)
|
||||
else:
|
||||
r_tags.sadd(f'{obj_type}:{subtype}:{self.name}', obj_id)
|
||||
|
||||
# TODO REPLACE ME BY DATE TAGS ????
|
||||
# STATS BY TYPE ???
|
||||
# DAILY STATS
|
||||
r_tags.hincrby(f'daily_tags:{datetime.date.today().strftime("%Y%m%d")}', self.name, 1)
|
||||
|
||||
|
||||
# TODO CREATE FUNCTION GET OBJECT DATE
|
||||
def remove(self, obj_type, subtype, obj_id):
|
||||
# TODO CHECK IN ALL OBJECT TO DELETE
|
||||
pass
|
||||
|
||||
def delete(self):
|
||||
pass
|
||||
|
||||
|
||||
#### TAG AUTO PUSH ####
|
||||
|
||||
|
@ -1379,7 +1538,7 @@ def api_add_obj_tags(tags=[], galaxy_tags=[], object_id=None, object_type="item"
|
|||
# r_serv_metadata.srem('tag:{}'.format(object_id), tag)
|
||||
# r_tags.srem('{}:{}'.format(object_type, tag), object_id)
|
||||
|
||||
def delete_tag(object_type, tag, object_id, obj_date=None): ################################ # TODO:
|
||||
def delete_tag(object_type, tag, object_id, obj_date=None): ################################ # TODO: REMOVE ME
|
||||
# tag exist
|
||||
if is_obj_tagged(object_id, tag):
|
||||
if not obj_date:
|
||||
|
@ -1445,6 +1604,29 @@ def get_list_of_solo_tags_to_export_by_type(export_type): # by type
|
|||
return None
|
||||
#r_serv_db.smembers('whitelist_hive')
|
||||
|
||||
def _fix_tag_obj_id(date_from):
|
||||
date_to = datetime.date.today().strftime("%Y%m%d")
|
||||
for obj_type in ail_core.get_all_objects():
|
||||
print(obj_type)
|
||||
for tag in get_all_obj_tags(obj_type):
|
||||
if ';' in tag:
|
||||
print(tag)
|
||||
new_tag = tag.split(';')[0]
|
||||
print(new_tag)
|
||||
r_tags.hdel(f'tag_metadata:{tag}', 'first_seen')
|
||||
r_tags.hdel(f'tag_metadata:{tag}', 'last_seen')
|
||||
r_tags.srem(f'list_tags:{obj_type}', tag)
|
||||
r_tags.srem(f'list_tags:{obj_type}:', tag)
|
||||
r_tags.srem(f'list_tags', tag)
|
||||
raw = get_obj_by_tags(obj_type, [tag], nb_obj=500000, date_from=date_from, date_to=date_to)
|
||||
if raw.get('tagged_obj', []):
|
||||
for obj_id in raw['tagged_obj']:
|
||||
# print(obj_id)
|
||||
delete_object_tag(tag, obj_type, obj_id)
|
||||
add_object_tag(new_tag, obj_type, obj_id)
|
||||
else:
|
||||
update_tag_global_by_obj_type(tag, obj_type)
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# taxo = 'accessnow'
|
||||
# # taxo = TAXONOMIES.get(taxo)
|
||||
|
|
|
@ -2,6 +2,8 @@
|
|||
# -*-coding:UTF-8 -*
|
||||
import json
|
||||
import os
|
||||
import logging
|
||||
import logging.config
|
||||
import re
|
||||
import sys
|
||||
import time
|
||||
|
@ -14,7 +16,7 @@ from ail_typo_squatting import runAll
|
|||
import math
|
||||
|
||||
from collections import defaultdict
|
||||
from flask import escape
|
||||
from markupsafe import escape
|
||||
from textblob import TextBlob
|
||||
from nltk.tokenize import RegexpTokenizer
|
||||
|
||||
|
@ -24,11 +26,16 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from packages import Date
|
||||
from lib.ail_core import get_objects_tracked, get_object_all_subtypes, get_objects_retro_hunted
|
||||
from lib import ail_logger
|
||||
from lib import ConfigLoader
|
||||
from lib import item_basic
|
||||
from lib import Tag
|
||||
from lib.Users import User
|
||||
|
||||
# LOGS
|
||||
logging.config.dictConfig(ail_logger.get_config(name='modules'))
|
||||
logger = logging.getLogger()
|
||||
|
||||
config_loader = ConfigLoader.ConfigLoader()
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
|
||||
|
@ -207,6 +214,13 @@ class Tracker:
|
|||
if filters:
|
||||
self._set_field('filters', json.dumps(filters))
|
||||
|
||||
def del_filters(self, tracker_type, to_track):
|
||||
filters = self.get_filters()
|
||||
for obj_type in filters:
|
||||
r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', to_track)
|
||||
r_tracker.srem(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}')
|
||||
r_tracker.hdel(f'tracker:{self.uuid}', 'filters')
|
||||
|
||||
def get_tracked(self):
|
||||
return self._get_field('tracked')
|
||||
|
||||
|
@ -241,7 +255,8 @@ class Tracker:
|
|||
return self._get_field('user_id')
|
||||
|
||||
def webhook_export(self):
|
||||
return r_tracker.hexists(f'tracker:{self.uuid}', 'webhook')
|
||||
webhook = self.get_webhook()
|
||||
return webhook is not None and webhook
|
||||
|
||||
def get_webhook(self):
|
||||
return r_tracker.hget(f'tracker:{self.uuid}', 'webhook')
|
||||
|
@ -513,6 +528,7 @@ class Tracker:
|
|||
self._set_mails(mails)
|
||||
|
||||
# Filters
|
||||
self.del_filters(old_type, old_to_track)
|
||||
if not filters:
|
||||
filters = {}
|
||||
for obj_type in get_objects_tracked():
|
||||
|
@ -522,9 +538,6 @@ class Tracker:
|
|||
for obj_type in filters:
|
||||
r_tracker.sadd(f'trackers:objs:{tracker_type}:{obj_type}', to_track)
|
||||
r_tracker.sadd(f'trackers:uuid:{tracker_type}:{to_track}', f'{self.uuid}:{obj_type}')
|
||||
if tracker_type != old_type:
|
||||
r_tracker.srem(f'trackers:objs:{old_type}:{obj_type}', old_to_track)
|
||||
r_tracker.srem(f'trackers:uuid:{old_type}:{old_to_track}', f'{self.uuid}:{obj_type}')
|
||||
|
||||
# Refresh Trackers
|
||||
trigger_trackers_refresh(tracker_type)
|
||||
|
@ -555,9 +568,7 @@ class Tracker:
|
|||
os.remove(filepath)
|
||||
|
||||
# Filters
|
||||
filters = self.get_filters()
|
||||
if not filters:
|
||||
filters = get_objects_tracked()
|
||||
filters = get_objects_tracked()
|
||||
for obj_type in filters:
|
||||
r_tracker.srem(f'trackers:objs:{tracker_type}:{obj_type}', tracked)
|
||||
r_tracker.srem(f'trackers:uuid:{tracker_type}:{tracked}', f'{self.uuid}:{obj_type}')
|
||||
|
@ -650,14 +661,14 @@ def get_user_trackers_meta(user_id, tracker_type=None):
|
|||
metas = []
|
||||
for tracker_uuid in get_user_trackers(user_id, tracker_type=tracker_type):
|
||||
tracker = Tracker(tracker_uuid)
|
||||
metas.append(tracker.get_meta(options={'mails', 'sparkline', 'tags'}))
|
||||
metas.append(tracker.get_meta(options={'description', 'mails', 'sparkline', 'tags'}))
|
||||
return metas
|
||||
|
||||
def get_global_trackers_meta(tracker_type=None):
|
||||
metas = []
|
||||
for tracker_uuid in get_global_trackers(tracker_type=tracker_type):
|
||||
tracker = Tracker(tracker_uuid)
|
||||
metas.append(tracker.get_meta(options={'mails', 'sparkline', 'tags'}))
|
||||
metas.append(tracker.get_meta(options={'description', 'mails', 'sparkline', 'tags'}))
|
||||
return metas
|
||||
|
||||
def get_users_trackers_meta():
|
||||
|
@ -918,7 +929,7 @@ def api_add_tracker(dict_input, user_id):
|
|||
# Filters # TODO MOVE ME
|
||||
filters = dict_input.get('filters', {})
|
||||
if filters:
|
||||
if filters.keys() == {'decoded', 'item', 'pgp'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
|
||||
if filters.keys() == {'decoded', 'item', 'pgp', 'title'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
|
||||
filters = {}
|
||||
for obj_type in filters:
|
||||
if obj_type not in get_objects_tracked():
|
||||
|
@ -993,7 +1004,7 @@ def api_edit_tracker(dict_input, user_id):
|
|||
# Filters # TODO MOVE ME
|
||||
filters = dict_input.get('filters', {})
|
||||
if filters:
|
||||
if filters.keys() == {'decoded', 'item', 'pgp'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
|
||||
if filters.keys() == {'decoded', 'item', 'pgp', 'title'} and set(filters['pgp'].get('subtypes', [])) == {'mail', 'name'}:
|
||||
if not filters['decoded'] and not filters['item']:
|
||||
filters = {}
|
||||
for obj_type in filters:
|
||||
|
@ -1146,7 +1157,11 @@ def get_tracked_yara_rules():
|
|||
for obj_type in get_objects_tracked():
|
||||
rules = {}
|
||||
for tracked in _get_tracked_by_obj_type('yara', obj_type):
|
||||
rules[tracked] = os.path.join(get_yara_rules_dir(), tracked)
|
||||
rule = os.path.join(get_yara_rules_dir(), tracked)
|
||||
if not os.path.exists(rule):
|
||||
logger.critical(f"Yara rule doesn't exist {tracked} : {obj_type}")
|
||||
else:
|
||||
rules[tracked] = rule
|
||||
to_track[obj_type] = yara.compile(filepaths=rules)
|
||||
print(to_track)
|
||||
return to_track
|
||||
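
`yara.compile(filepaths=...)` takes a dict mapping a rule name to a rule file path, which is why missing files are skipped above instead of being handed to the compiler. A hedged standalone usage sketch (the rule path is hypothetical):

import yara  # yara-python

rules = {'leak_keywords': '/opt/ail/data/yara/leak_keywords.yar'}  # hypothetical path
compiled = yara.compile(filepaths=rules)
matches = compiled.match(data='some item content to scan')
print([m.rule for m in matches])
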
|
|
|
@ -81,7 +81,7 @@ def get_user_passwd_hash(user_id):
|
|||
return r_serv_db.hget('ail:users:all', user_id)
|
||||
|
||||
def get_user_token(user_id):
|
||||
return r_serv_db.hget(f'ail:users:metadata:{user_id}', 'token')
|
||||
return r_serv_db.hget(f'ail:user:metadata:{user_id}', 'token')
|
||||
|
||||
def get_token_user(token):
|
||||
return r_serv_db.hget('ail:users:tokens', token)
|
||||
|
@ -156,7 +156,8 @@ def delete_user(user_id):
|
|||
for role_id in get_all_roles():
|
||||
r_serv_db.srem(f'ail:users:role:{role_id}', user_id)
|
||||
user_token = get_user_token(user_id)
|
||||
r_serv_db.hdel('ail:users:tokens', user_token)
|
||||
if user_token:
|
||||
r_serv_db.hdel('ail:users:tokens', user_token)
|
||||
r_serv_db.delete(f'ail:user:metadata:{user_id}')
|
||||
r_serv_db.hdel('ail:users:all', user_id)
|
||||
|
||||
|
@ -246,7 +247,10 @@ class User(UserMixin):
|
|||
self.id = "__anonymous__"
|
||||
|
||||
def exists(self):
|
||||
return self.id != "__anonymous__"
|
||||
if self.id == "__anonymous__":
|
||||
return False
|
||||
else:
|
||||
return r_serv_db.exists(f'ail:user:metadata:{self.id}')
|
||||
|
||||
# return True or False
|
||||
# def is_authenticated():
|
||||
|
@ -286,3 +290,6 @@ class User(UserMixin):
|
|||
return True
|
||||
else:
|
||||
return False
|
||||
|
||||
def get_role(self):
|
||||
return r_serv_db.hget(f'ail:user:metadata:{self.id}', 'role')
|
||||
|
|
|
@ -13,9 +13,12 @@ from lib.ConfigLoader import ConfigLoader
|
|||
|
||||
config_loader = ConfigLoader()
|
||||
r_serv_db = config_loader.get_db_conn("Kvrocks_DB")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
config_loader = None
|
||||
|
||||
AIL_OBJECTS = sorted({'cve', 'cryptocurrency', 'decoded', 'domain', 'item', 'pgp', 'screenshot', 'title', 'username'})
|
||||
AIL_OBJECTS = sorted({'chat', 'chat-subchannel', 'chat-thread', 'cookie-name', 'cve', 'cryptocurrency', 'decoded',
|
||||
'domain', 'etag', 'favicon', 'file-name', 'hhhash',
|
||||
'item', 'image', 'message', 'pgp', 'screenshot', 'title', 'user-account', 'username'})
|
||||
|
||||
def get_ail_uuid():
|
||||
ail_uuid = r_serv_db.get('ail:uuid')
|
||||
|
@ -37,19 +40,28 @@ def get_all_objects():
|
|||
return AIL_OBJECTS
|
||||
|
||||
def get_objects_with_subtypes():
|
||||
return ['cryptocurrency', 'pgp', 'username']
|
||||
return ['chat', 'cryptocurrency', 'pgp', 'username', 'user-account']
|
||||
|
||||
def get_object_all_subtypes(obj_type):
|
||||
def get_object_all_subtypes(obj_type): # TODO Dynamic subtype
|
||||
if obj_type == 'chat':
|
||||
return r_object.smembers(f'all_chat:subtypes')
|
||||
if obj_type == 'chat-subchannel':
|
||||
return r_object.smembers(f'all_chat-subchannel:subtypes')
|
||||
if obj_type == 'cryptocurrency':
|
||||
return ['bitcoin', 'bitcoin-cash', 'dash', 'ethereum', 'litecoin', 'monero', 'zcash']
|
||||
if obj_type == 'pgp':
|
||||
return ['key', 'mail', 'name']
|
||||
if obj_type == 'username':
|
||||
return ['telegram', 'twitter', 'jabber']
|
||||
if obj_type == 'user-account':
|
||||
return r_object.smembers(f'all_chat:subtypes')
|
||||
return []
|
||||
|
||||
def get_obj_queued():
|
||||
return ['item', 'image']
|
||||
|
||||
def get_objects_tracked():
|
||||
return ['decoded', 'item', 'pgp']
|
||||
return ['decoded', 'item', 'pgp', 'title']
|
||||
|
||||
def get_objects_retro_hunted():
|
||||
return ['decoded', 'item']
|
||||
|
@ -65,6 +77,32 @@ def get_all_objects_with_subtypes_tuple():
|
|||
str_objs.append((obj_type, ''))
|
||||
return str_objs
|
||||
|
||||
def unpack_obj_global_id(global_id, r_type='tuple'):
|
||||
if r_type == 'dict':
|
||||
obj = global_id.split(':', 2)
|
||||
return {'type': obj[0], 'subtype': obj[1], 'id': obj[2]}
|
||||
else: # tuple(type, subtype, id)
|
||||
return global_id.split(':', 2)
|
||||
|
||||
def unpack_objs_global_id(objs_global_id, r_type='tuple'):
|
||||
objs = []
|
||||
for global_id in objs_global_id:
|
||||
objs.append(unpack_obj_global_id(global_id, r_type=r_type))
|
||||
return objs
|
||||
|
||||
def unpack_correl_obj__id(obj_type, global_id, r_type='tuple'):
|
||||
obj = global_id.split(':', 1)
|
||||
if r_type == 'dict':
|
||||
return {'type': obj_type, 'subtype': obj[0], 'id': obj[1]}
|
||||
else: # tuple(type, subtype, id)
|
||||
return obj_type, obj[0], obj[1]
|
||||
|
||||
def unpack_correl_objs_id(obj_type, correl_objs_id, r_type='tuple'):
|
||||
objs = []
|
||||
for correl_obj_id in correl_objs_id:
|
||||
objs.append(unpack_correl_obj__id(obj_type, correl_obj_id, r_type=r_type))
|
||||
return objs
|
||||
|
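
Global IDs follow the `type:subtype:id` convention, with an empty subtype for objects that have none, so a single `split(':', 2)` is enough to unpack them. A short usage sketch:

# Illustrative global ids; the subtype field may be empty
print('username:telegram:johndoe'.split(':', 2))  # ['username', 'telegram', 'johndoe']
print('domain::example.onion'.split(':', 2))      # ['domain', '', 'example.onion']

# Equivalent to unpack_obj_global_id(..., r_type='dict')
obj_type, subtype, obj_id = 'username:telegram:johndoe'.split(':', 2)
print({'type': obj_type, 'subtype': subtype, 'id': obj_id})
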
||||
##-- AIL OBJECTS --##
|
||||
|
||||
#### Redis ####
|
||||
|
@ -82,6 +120,10 @@ def zscan_iter(r_redis, name): # count ???
|
|||
|
||||
## -- Redis -- ##
|
||||
|
||||
def rreplace(s, old, new, occurrence):
|
||||
li = s.rsplit(old, occurrence)
|
||||
return new.join(li)
|
||||
|
||||
def paginate_iterator(iter_elems, nb_obj=50, page=1):
|
||||
dict_page = {'nb_all_elem': len(iter_elems)}
|
||||
nb_pages = dict_page['nb_all_elem'] / nb_obj
|
||||
|
|
|
@ -6,19 +6,29 @@ import sys
|
|||
import datetime
|
||||
import time
|
||||
|
||||
import xxhash
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.exceptions import ModuleQueueError
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib import ail_core
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_queues = config_loader.get_redis_conn("Redis_Queues")
|
||||
r_obj_process = config_loader.get_redis_conn("Redis_Process")
|
||||
timeout_queue_obj = 172800
|
||||
config_loader = None
|
||||
|
||||
MODULES_FILE = os.path.join(os.environ['AIL_HOME'], 'configs', 'modules.cfg')
|
||||
|
||||
# # # # # # # #
|
||||
# #
|
||||
# AIL QUEUE #
|
||||
# #
|
||||
# # # # # # # #
|
||||
|
||||
class AILQueue:
|
||||
|
||||
|
@ -60,16 +70,38 @@ class AILQueue:
|
|||
# Update queues stats
|
||||
r_queues.hset('queues', self.name, self.get_nb_messages())
|
||||
r_queues.hset(f'modules', f'{self.pid}:{self.name}', int(time.time()))
|
||||
|
||||
# Get Message
|
||||
message = r_queues.lpop(f'queue:{self.name}:in')
|
||||
if not message:
|
||||
return None
|
||||
else:
|
||||
# TODO SAVE CURRENT ITEMS (OLD Module information)
|
||||
row_mess = message.split(';', 1)
|
||||
if len(row_mess) != 2:
|
||||
return None, None, message
|
||||
# raise Exception(f'Error: queue {self.name}, no AIL object provided')
|
||||
else:
|
||||
obj_global_id, mess = row_mess
|
||||
m_hash = xxhash.xxh3_64_hexdigest(message)
|
||||
add_processed_obj(obj_global_id, m_hash, module=self.name)
|
||||
return obj_global_id, m_hash, mess
|
||||
|
||||
return message
|
||||
def rename_message_obj(self, new_id, old_id):
|
||||
# restrict rename function
|
||||
if self.name == 'Mixer' or self.name == 'Global':
|
||||
rename_processed_obj(new_id, old_id)
|
||||
else:
|
||||
raise ModuleQueueError('This Module can\'t rename an object ID')
|
||||
|
||||
def send_message(self, message, queue_name=None):
|
||||
# condition -> not in any queue
|
||||
# TODO EDIT meta
|
||||
|
||||
|
||||
|
||||
def end_message(self, obj_global_id, m_hash):
|
||||
end_processed_obj(obj_global_id, m_hash, module=self.name)
|
||||
|
||||
def send_message(self, obj_global_id, message='', queue_name=None):
|
||||
if not self.subscribers_modules:
|
||||
raise ModuleQueueError('This Module doesn\'t have any subscriber')
|
||||
if queue_name:
|
||||
|
@ -80,8 +112,17 @@ class AILQueue:
|
|||
raise ModuleQueueError('Queue name required. This module push to multiple queues')
|
||||
queue_name = list(self.subscribers_modules)[0]
|
||||
|
||||
message = f'{obj_global_id};{message}'
|
||||
if obj_global_id != '::':
|
||||
m_hash = xxhash.xxh3_64_hexdigest(message)
|
||||
else:
|
||||
m_hash = None
|
||||
|
||||
# Add message to all modules
|
||||
for module_name in self.subscribers_modules[queue_name]:
|
||||
if m_hash:
|
||||
add_processed_obj(obj_global_id, m_hash, queue=module_name)
|
||||
|
||||
r_queues.rpush(f'queue:{module_name}:in', message)
|
||||
# stats
|
||||
nb_mess = r_queues.llen(f'queue:{module_name}:in')
|
||||
|
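
Queue entries are plain strings of the form `<obj global id>;<payload>`, hashed with xxhash to track which queues and modules still hold a reference to the object. A hedged sketch of both ends (all values are illustrative):

import xxhash

obj_global_id = 'item::submitted/2023/10/11/example.gz'
message = f'{obj_global_id};module-specific-payload'
m_hash = xxhash.xxh3_64_hexdigest(message)   # per-message hash used in obj:queues / obj:modules keys

# Consumer side, as in get_message(): split off the global id again
obj_global_id, payload = message.split(';', 1)
print(obj_global_id, payload, m_hash)
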
@ -98,6 +139,7 @@ class AILQueue:
|
|||
def error(self):
|
||||
r_queues.hdel(f'modules', f'{self.pid}:{self.name}')
|
||||
|
||||
|
||||
def get_queues_modules():
|
||||
return r_queues.hkeys('queues')
|
||||
|
||||
|
@ -132,6 +174,132 @@ def get_modules_queues_stats():
|
|||
def clear_modules_queues_stats():
|
||||
r_queues.delete('modules')
|
||||
|
||||
|
||||
# # # # # # # # #
|
||||
# #
|
||||
# OBJ QUEUES # PROCESS ??
|
||||
# #
|
||||
# # # # # # # # #
|
||||
|
||||
|
||||
def get_processed_objs():
|
||||
return r_obj_process.smembers(f'objs:process')
|
||||
|
||||
def get_processed_end_objs():
|
||||
return r_obj_process.smembers(f'objs:processed')
|
||||
|
||||
def get_processed_end_obj():
|
||||
return r_obj_process.spop(f'objs:processed')
|
||||
|
||||
def get_processed_objs_by_type(obj_type):
|
||||
return r_obj_process.zrange(f'objs:process:{obj_type}', 0, -1)
|
||||
|
||||
def is_processed_obj_queued(obj_global_id):
|
||||
return r_obj_process.exists(f'obj:queues:{obj_global_id}')
|
||||
|
||||
def is_processed_obj_moduled(obj_global_id):
|
||||
return r_obj_process.exists(f'obj:modules:{obj_global_id}')
|
||||
|
||||
def is_processed_obj(obj_global_id):
|
||||
return is_processed_obj_queued(obj_global_id) or is_processed_obj_moduled(obj_global_id)
|
||||
|
||||
def get_processed_obj_modules(obj_global_id):
|
||||
return r_obj_process.zrange(f'obj:modules:{obj_global_id}', 0, -1)
|
||||
|
||||
def get_processed_obj_queues(obj_global_id):
|
||||
return r_obj_process.zrange(f'obj:queues:{obj_global_id}', 0, -1)
|
||||
|
||||
def get_processed_obj(obj_global_id):
|
||||
return {'modules': get_processed_obj_modules(obj_global_id), 'queues': get_processed_obj_queues(obj_global_id)}
|
||||
|
||||
def add_processed_obj(obj_global_id, m_hash, module=None, queue=None):
|
||||
obj_type = obj_global_id.split(':', 1)[0]
|
||||
new_obj = r_obj_process.sadd(f'objs:process', obj_global_id)
|
||||
# first process:
|
||||
if new_obj:
|
||||
r_obj_process.zadd(f'objs:process:{obj_type}', {obj_global_id: int(time.time())})
|
||||
if queue:
|
||||
r_obj_process.zadd(f'obj:queues:{obj_global_id}', {f'{queue}:{m_hash}': int(time.time())})
|
||||
if module:
|
||||
r_obj_process.zadd(f'obj:modules:{obj_global_id}', {f'{module}:{m_hash}': int(time.time())})
|
||||
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{module}:{m_hash}')
|
||||
|
||||
def end_processed_obj(obj_global_id, m_hash, module=None, queue=None):
|
||||
if queue:
|
||||
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{m_hash}')
|
||||
if module:
|
||||
r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{m_hash}')
|
||||
|
||||
# TODO HANDLE QUEUE DELETE
|
||||
# process completed
|
||||
if not is_processed_obj(obj_global_id):
|
||||
obj_type = obj_global_id.split(':', 1)[0]
|
||||
r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
|
||||
r_obj_process.srem(f'objs:process', obj_global_id)
|
||||
|
||||
r_obj_process.sadd(f'objs:processed', obj_global_id) # TODO use list ??????
|
||||
|
||||
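
Taken together, these functions follow an object through the pipeline: a queue entry is recorded when a message is pushed, swapped for a module entry when a worker pops it, and cleared when the module finishes; once nothing is left, the object is moved to `objs:processed`. A hedged walk-through using the functions above (the id and hash are illustrative):

obj_global_id = 'item::submitted/2023/10/11/example.gz'
m_hash = 'deadbeefdeadbeef'

add_processed_obj(obj_global_id, m_hash, queue='Tags')    # pushed to the Tags queue
add_processed_obj(obj_global_id, m_hash, module='Tags')   # popped by Tags: queue entry becomes a module entry
end_processed_obj(obj_global_id, m_hash, module='Tags')   # Tags done; obj marked processed if nothing else holds it
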
def rename_processed_obj(new_id, old_id):
|
||||
module = get_processed_obj_modules(old_id)
|
||||
# currently in a module
|
||||
if len(module) == 1:
|
||||
module, x_hash = module[0].split(':', 1)
|
||||
obj_type = old_id.split(':', 1)[0]
|
||||
r_obj_process.zrem(f'obj:modules:{old_id}', f'{module}:{x_hash}')
|
||||
r_obj_process.zrem(f'objs:process:{obj_type}', old_id)
|
||||
r_obj_process.srem(f'objs:process', old_id)
|
||||
add_processed_obj(new_id, x_hash, module=module)
|
||||
|
||||
def get_last_queue_timeout():
|
||||
epoch_update = r_obj_process.get('queue:obj:timeout:last')
|
||||
if not epoch_update:
|
||||
epoch_update = 0
|
||||
return float(epoch_update)
|
||||
|
||||
def timeout_process_obj(obj_global_id):
|
||||
for q in get_processed_obj_queues(obj_global_id):
|
||||
queue, x_hash = q.split(':', 1)
|
||||
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{x_hash}')
|
||||
for m in get_processed_obj_modules(obj_global_id):
|
||||
module, x_hash = m.split(':', 1)
|
||||
r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{x_hash}')
|
||||
|
||||
obj_type = obj_global_id.split(':', 1)[0]
|
||||
r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
|
||||
r_obj_process.srem(f'objs:process', obj_global_id)
|
||||
|
||||
r_obj_process.sadd(f'objs:processed', obj_global_id)
|
||||
print(f'timeout: {obj_global_id}')
|
||||
|
||||
|
||||
def timeout_processed_objs():
|
||||
curr_time = int(time.time())
|
||||
time_limit = curr_time - timeout_queue_obj
|
||||
for obj_type in ail_core.get_obj_queued():
|
||||
for obj_global_id in r_obj_process.zrangebyscore(f'objs:process:{obj_type}', 0, time_limit):
|
||||
timeout_process_obj(obj_global_id)
|
||||
r_obj_process.set('queue:obj:timeout:last', time.time())
|
||||
|
||||
def delete_processed_obj(obj_global_id):
|
||||
for q in get_processed_obj_queues(obj_global_id):
|
||||
queue, x_hash = q.split(':', 1)
|
||||
r_obj_process.zrem(f'obj:queues:{obj_global_id}', f'{queue}:{x_hash}')
|
||||
for m in get_processed_obj_modules(obj_global_id):
|
||||
module, x_hash = m.split(':', 1)
|
||||
r_obj_process.zrem(f'obj:modules:{obj_global_id}', f'{module}:{x_hash}')
|
||||
obj_type = obj_global_id.split(':', 1)[0]
|
||||
r_obj_process.zrem(f'objs:process:{obj_type}', obj_global_id)
|
||||
r_obj_process.srem(f'objs:process', obj_global_id)
|
||||
|
||||
###################################################################################
|
||||
|
||||
|
||||
# # # # # # # #
|
||||
# #
|
||||
# GRAPH #
|
||||
# #
|
||||
# # # # # # # #
|
||||
|
||||
def get_queue_digraph():
|
||||
queues_ail = {}
|
||||
modules = {}
|
||||
|
@ -223,64 +391,13 @@ def save_queue_digraph():
|
|||
sys.exit(1)
|
||||
|
||||
|
||||
###########################################################################################
|
||||
###########################################################################################
|
||||
###########################################################################################
|
||||
###########################################################################################
|
||||
###########################################################################################
|
||||
|
||||
# def get_all_queues_name():
|
||||
# return r_queues.hkeys('queues')
|
||||
#
|
||||
# def get_all_queues_dict_with_nb_elem():
|
||||
# return r_queues.hgetall('queues')
|
||||
#
|
||||
# def get_all_queues_with_sorted_nb_elem():
|
||||
# res = r_queues.hgetall('queues')
|
||||
# res = sorted(res.items())
|
||||
# return res
|
||||
#
|
||||
# def get_module_pid_by_queue_name(queue_name):
|
||||
# return r_queues.smembers('MODULE_TYPE_{}'.format(queue_name))
|
||||
#
|
||||
# # # TODO: remove last msg part
|
||||
# def get_module_last_process_start_time(queue_name, module_pid):
|
||||
# res = r_queues.get('MODULE_{}_{}'.format(queue_name, module_pid))
|
||||
# if res:
|
||||
# return res.split(',')[0]
|
||||
# return None
|
||||
#
|
||||
# def get_module_last_msg(queue_name, module_pid):
|
||||
# return r_queues.get('MODULE_{}_{}_PATH'.format(queue_name, module_pid))
|
||||
#
|
||||
# def get_all_modules_queues_stats():
|
||||
# all_modules_queues_stats = []
|
||||
# for queue_name, nb_elem_queue in get_all_queues_with_sorted_nb_elem():
|
||||
# l_module_pid = get_module_pid_by_queue_name(queue_name)
|
||||
# for module_pid in l_module_pid:
|
||||
# last_process_start_time = get_module_last_process_start_time(queue_name, module_pid)
|
||||
# if last_process_start_time:
|
||||
# last_process_start_time = datetime.datetime.fromtimestamp(int(last_process_start_time))
|
||||
# seconds = int((datetime.datetime.now() - last_process_start_time).total_seconds())
|
||||
# else:
|
||||
# seconds = 0
|
||||
# all_modules_queues_stats.append((queue_name, nb_elem_queue, seconds, module_pid))
|
||||
# return all_modules_queues_stats
|
||||
#
|
||||
#
|
||||
# def _get_all_messages_from_queue(queue_name):
|
||||
# #self.r_temp.hset('queues', self.subscriber_name, int(self.r_temp.scard(in_set)))
|
||||
# return r_queues.smembers(f'queue:{queue_name}:in')
|
||||
#
|
||||
# # def is_message_in queue(queue_name):
|
||||
# # pass
|
||||
#
|
||||
# def remove_message_from_queue(queue_name, message):
|
||||
# queue_key = f'queue:{queue_name}:in'
|
||||
# r_queues.srem(queue_key, message)
|
||||
# r_queues.hset('queues', queue_name, int(r_queues.scard(queue_key)))
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# clear_modules_queues_stats()
|
||||
save_queue_digraph()
|
||||
# save_queue_digraph()
|
||||
oobj_global_id = 'item::submitted/2023/10/11/submitted_b5440009-05d5-4494-a807-a6d8e4a900cf.gz'
|
||||
# print(get_processed_obj(oobj_global_id))
|
||||
# delete_processed_obj(oobj_global_id)
|
||||
# while True:
|
||||
# print(get_processed_obj(oobj_global_id))
|
||||
# time.sleep(0.5)
|
||||
print(get_processed_end_objs())
|
||||
|
|
|
@ -15,38 +15,15 @@ config_loader = ConfigLoader()
|
|||
r_db = config_loader.get_db_conn("Kvrocks_DB")
|
||||
config_loader = None
|
||||
|
||||
BACKGROUND_UPDATES = {
|
||||
'v1.5': {
|
||||
'nb_updates': 5,
|
||||
'message': 'Tags and Screenshots'
|
||||
},
|
||||
'v2.4': {
|
||||
'nb_updates': 1,
|
||||
'message': ' Domains Tags and Correlations'
|
||||
},
|
||||
'v2.6': {
|
||||
'nb_updates': 1,
|
||||
'message': 'Domains Tags and Correlations'
|
||||
},
|
||||
'v2.7': {
|
||||
'nb_updates': 1,
|
||||
'message': 'Domains Tags'
|
||||
},
|
||||
'v3.4': {
|
||||
'nb_updates': 1,
|
||||
'message': 'Domains Languages'
|
||||
},
|
||||
'v3.7': {
|
||||
'nb_updates': 1,
|
||||
'message': 'Trackers first_seen/last_seen'
|
||||
}
|
||||
}
|
||||
|
||||
# # # # # # # #
|
||||
# #
|
||||
# UPDATE #
|
||||
# #
|
||||
# # # # # # # #
|
||||
|
||||
def get_ail_version():
|
||||
return r_db.get('ail:version')
|
||||
|
||||
|
||||
def get_ail_float_version():
|
||||
version = get_ail_version()
|
||||
if version:
|
||||
|
@ -55,6 +32,179 @@ def get_ail_float_version():
|
|||
version = 0
|
||||
return version
|
||||
|
||||
# # # - - # # #
|
||||
|
||||
# # # # # # # # # # # #
|
||||
# #
|
||||
# UPDATE BACKGROUND #
|
||||
# #
|
||||
# # # # # # # # # # # #
|
||||
|
||||
|
||||
BACKGROUND_UPDATES = {
|
||||
'v5.2': {
|
||||
'message': 'Compress HAR',
|
||||
'scripts': ['compress_har.py']
|
||||
},
|
||||
}
|
||||
|
||||
class AILBackgroundUpdate:
|
||||
"""
|
||||
AIL Background Update.
|
||||
"""
|
||||
|
||||
def __init__(self, version):
|
||||
self.version = version
|
||||
|
||||
def _get_field(self, field):
|
||||
return r_db.hget('ail:update:background', field)
|
||||
|
||||
def _set_field(self, field, value):
|
||||
r_db.hset('ail:update:background', field, value)
|
||||
|
||||
def get_version(self):
|
||||
return self.version
|
||||
|
||||
def get_message(self):
|
||||
return BACKGROUND_UPDATES.get(self.version, {}).get('message', '')
|
||||
|
||||
def get_error(self):
|
||||
return self._get_field('error')
|
||||
|
||||
def set_error(self, error): # TODO ADD LOGS
|
||||
self._set_field('error', error)
|
||||
|
||||
def get_nb_scripts(self):
|
||||
return int(len(BACKGROUND_UPDATES.get(self.version, {}).get('scripts', [''])))
|
||||
|
||||
def get_scripts(self):
|
||||
return BACKGROUND_UPDATES.get(self.version, {}).get('scripts', [])
|
||||
|
||||
def get_nb_scripts_done(self):
|
||||
done = self._get_field('done')
|
||||
try:
|
||||
done = int(done)
|
||||
except (TypeError, ValueError):
|
||||
done = 0
|
||||
return done
|
||||
|
||||
def inc_nb_scripts_done(self):
|
||||
self._set_field('done', self.get_nb_scripts_done() + 1)
|
||||
|
||||
def get_script(self):
|
||||
return self._get_field('script')
|
||||
|
||||
def get_script_path(self):
|
||||
path = os.path.basename(self.get_script())
|
||||
if path:
|
||||
return os.path.join(os.environ['AIL_HOME'], 'update', self.version, path)
|
||||
|
||||
def get_nb_to_update(self): # TODO use cache ?????
|
||||
nb_to_update = self._get_field('nb_to_update')
|
||||
if not nb_to_update:
|
||||
nb_to_update = 1
|
||||
return int(nb_to_update)
|
||||
|
||||
def set_nb_to_update(self, nb):
|
||||
self._set_field('nb_to_update', int(nb))
|
||||
|
||||
def get_nb_updated(self): # TODO use cache ?????
|
||||
nb_updated = self._get_field('nb_updated')
|
||||
if not nb_updated:
|
||||
nb_updated = 0
|
||||
return int(nb_updated)
|
||||
|
||||
def inc_nb_updated(self): # TODO use cache ?????
|
||||
r_db.hincrby('ail:update:background', 'nb_updated', 1)
|
||||
|
||||
def get_progress(self): # TODO use cache ?????
|
||||
return self._get_field('progress')
|
||||
|
||||
def set_progress(self, progress):
|
||||
self._set_field('progress', progress)
|
||||
|
||||
def update_progress(self):
|
||||
nb_updated = self.get_nb_updated()
|
||||
nb_to_update = self.get_nb_to_update()
|
||||
if nb_updated == nb_to_update:
|
||||
progress = 100
|
||||
elif nb_updated > nb_to_update:
|
||||
progress = 99
|
||||
else:
|
||||
progress = int((nb_updated * 100) / nb_to_update)
|
||||
self.set_progress(progress)
|
||||
print(f'{nb_updated}/{nb_to_update} updated {progress}%')
|
||||
return progress
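        # Worked example of the computation above, with hypothetical counters
        # (not taken from a real update run):
        #   nb_updated = 150, nb_to_update = 600 -> int((150 * 100) / 600) == 25 (%)
        #   nb_updated > nb_to_update            -> capped at 99 until the script ends
        #   nb_updated == nb_to_update           -> 100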
|
||||
|
||||
def is_running(self):
|
||||
return r_db.hget('ail:update:background', 'version') == self.version
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = {'version': self.get_version(),
|
||||
'error': self.get_error(),
|
||||
'script': self.get_script(),
|
||||
'script_progress': self.get_progress(),
|
||||
'nb_update': self.get_nb_scripts(),
|
||||
'nb_completed': self.get_nb_scripts_done()}
|
||||
meta['progress'] = int(meta['nb_completed'] * 100 / meta['nb_update'])
|
||||
if 'message' in options:
|
||||
meta['message'] = self.get_message()
|
||||
return meta
|
||||
|
||||
def start(self):
|
||||
self._set_field('version', self.version)
|
||||
r_db.hdel('ail:update:background', 'error')
|
||||
|
||||
def start_script(self, script):
|
||||
self.clear()
|
||||
self._set_field('script', script)
|
||||
self.set_progress(0)
|
||||
|
||||
def end_script(self):
|
||||
self.set_progress(100)
|
||||
self.inc_nb_scripts_done()
|
||||
|
||||
def clear(self):
|
||||
r_db.hdel('ail:update:background', 'error')
|
||||
r_db.hdel('ail:update:background', 'progress')
|
||||
r_db.hdel('ail:update:background', 'nb_updated')
|
||||
r_db.hdel('ail:update:background', 'nb_to_update')
|
||||
|
||||
def end(self):
|
||||
r_db.delete('ail:update:background')
|
||||
r_db.srem('ail:updates:background', self.version)
|
||||
|
||||
|
||||
# To Add in update script
|
||||
def add_background_update(version):
|
||||
r_db.sadd('ail:updates:background', version)
|
||||
|
||||
def is_update_background_running():
|
||||
return r_db.exists('ail:update:background')
|
||||
|
||||
def get_update_background_version():
|
||||
return r_db.hget('ail:update:background', 'version')
|
||||
|
||||
def get_update_background_meta(options=set()):
|
||||
version = get_update_background_version()
|
||||
if version:
|
||||
return AILBackgroundUpdate(version).get_meta(options=options)
|
||||
else:
|
||||
return {}
|
||||
|
||||
def get_update_background_to_launch():
|
||||
to_launch = []
|
||||
updates = r_db.smembers('ail:updates:background')
|
||||
for version in BACKGROUND_UPDATES:
|
||||
if version in updates:
|
||||
to_launch.append(version)
|
||||
return to_launch
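# Minimal sketch of how a launcher could drive AILBackgroundUpdate and the
# helpers above. Illustrative only: the actual AIL updater may differ, and
# executing the script itself (subprocess handling) is omitted.
def _example_run_pending_background_updates():
    for version in get_update_background_to_launch():
        update = AILBackgroundUpdate(version)
        update.start()
        for script in update.get_scripts():
            update.start_script(script)
            # ... run update.get_script_path() here and wait for it to finish ...
            update.end_script()
        update.end()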
|
||||
|
||||
# # # - - # # #
|
||||
|
||||
##########################################################################################
|
||||
##########################################################################################
|
||||
##########################################################################################
|
||||
|
||||
def get_ail_all_updates(date_separator='-'):
|
||||
dict_update = r_db.hgetall('ail:update_date')
|
||||
|
@ -87,111 +237,6 @@ def check_version(version):
|
|||
return True
|
||||
|
||||
|
||||
#### UPDATE BACKGROUND ####
|
||||
|
||||
def exits_background_update_to_launch():
|
||||
return r_db.scard('ail:update:to_update') != 0
|
||||
|
||||
|
||||
def is_version_in_background_update(version):
|
||||
return r_db.sismember('ail:update:to_update', version)
|
||||
|
||||
|
||||
def get_all_background_updates_to_launch():
|
||||
return r_db.smembers('ail:update:to_update')
|
||||
|
||||
|
||||
def get_current_background_update():
|
||||
return r_db.get('ail:update:update_in_progress')
|
||||
|
||||
|
||||
def get_current_background_update_script():
|
||||
return r_db.get('ail:update:current_background_script')
|
||||
|
||||
|
||||
def get_current_background_update_script_path(version, script_name):
|
||||
return os.path.join(os.environ['AIL_HOME'], 'update', version, script_name)
|
||||
|
||||
|
||||
def get_current_background_nb_update_completed():
|
||||
return r_db.scard('ail:update:update_in_progress:completed')
|
||||
|
||||
|
||||
def get_current_background_update_progress():
|
||||
progress = r_db.get('ail:update:current_background_script_stat')
|
||||
if not progress:
|
||||
progress = 0
|
||||
return int(progress)
|
||||
|
||||
|
||||
def get_background_update_error():
|
||||
return r_db.get('ail:update:error')
|
||||
|
||||
|
||||
def add_background_updates_to_launch(version):
|
||||
return r_db.sadd('ail:update:to_update', version)
|
||||
|
||||
|
||||
def start_background_update(version):
|
||||
r_db.delete('ail:update:error')
|
||||
r_db.set('ail:update:update_in_progress', version)
|
||||
|
||||
|
||||
def set_current_background_update_script(script_name):
|
||||
r_db.set('ail:update:current_background_script', script_name)
|
||||
r_db.set('ail:update:current_background_script_stat', 0)
|
||||
|
||||
|
||||
def set_current_background_update_progress(progress):
|
||||
r_db.set('ail:update:current_background_script_stat', progress)
|
||||
|
||||
|
||||
def set_background_update_error(error):
|
||||
r_db.set('ail:update:error', error)
|
||||
|
||||
|
||||
def end_background_update_script():
|
||||
r_db.sadd('ail:update:update_in_progress:completed')
|
||||
|
||||
|
||||
def end_background_update(version):
|
||||
r_db.delete('ail:update:update_in_progress')
|
||||
r_db.delete('ail:update:current_background_script')
|
||||
r_db.delete('ail:update:current_background_script_stat')
|
||||
r_db.delete('ail:update:update_in_progress:completed')
|
||||
r_db.srem('ail:update:to_update', version)
|
||||
|
||||
|
||||
def clear_background_update():
|
||||
r_db.delete('ail:update:error')
|
||||
r_db.delete('ail:update:update_in_progress')
|
||||
r_db.delete('ail:update:current_background_script')
|
||||
r_db.delete('ail:update:current_background_script_stat')
|
||||
r_db.delete('ail:update:update_in_progress:completed')
|
||||
|
||||
|
||||
def get_update_background_message(version):
|
||||
return BACKGROUND_UPDATES[version]['message']
|
||||
|
||||
|
||||
# TODO: Detect error in subprocess
|
||||
def get_update_background_metadata():
|
||||
dict_update = {}
|
||||
version = get_current_background_update()
|
||||
if version:
|
||||
dict_update['version'] = version
|
||||
dict_update['script'] = get_current_background_update_script()
|
||||
dict_update['script_progress'] = get_current_background_update_progress()
|
||||
dict_update['nb_update'] = BACKGROUND_UPDATES[dict_update['version']]['nb_updates']
|
||||
dict_update['nb_completed'] = get_current_background_nb_update_completed()
|
||||
dict_update['progress'] = int(dict_update['nb_completed'] * 100 / dict_update['nb_update'])
|
||||
dict_update['error'] = get_background_update_error()
|
||||
return dict_update
|
||||
|
||||
|
||||
##-- UPDATE BACKGROUND --##
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
res = check_version('v3.1..1')
|
||||
print(res)
|
||||
|
|
bin/lib/chats_viewer.py (new executable file, 423 lines added)
|
@@ -0,0 +1,423 @@
#!/usr/bin/python3

"""
Chats Viewer
===================

"""
import os
import sys
import time
import uuid

sys.path.append(os.environ['AIL_BIN'])
##################################
# Import Project packages
##################################
from lib.ConfigLoader import ConfigLoader
from lib.objects import Chats
from lib.objects import ChatSubChannels
from lib.objects import ChatThreads
from lib.objects import Messages
from lib.objects import UsersAccount
from lib.objects import Usernames
from lib import Language

config_loader = ConfigLoader()
r_db = config_loader.get_db_conn("Kvrocks_DB")
r_crawler = config_loader.get_db_conn("Kvrocks_Crawler")
r_cache = config_loader.get_redis_conn("Redis_Cache")

r_obj = config_loader.get_db_conn("Kvrocks_DB")  # TEMP new DB ????

# # # # # # # #
|
||||
# #
|
||||
# COMMON #
|
||||
# #
|
||||
# # # # # # # #
|
||||
|
||||
# TODO ChatDefaultPlatform
|
||||
|
||||
# CHAT(type=chat, subtype=platform, id= chat_id)
|
||||
|
||||
# Channel(type=channel, subtype=platform, id=channel_id)
|
||||
|
||||
# Thread(type=thread, subtype=platform, id=thread_id)
|
||||
|
||||
# Message(type=message, subtype=platform, id=message_id)
|
||||
|
||||
|
||||
# Protocol/Platform
|
||||
|
||||
|
||||
# class ChatProtocols: # TODO Remove Me
|
||||
#
|
||||
# def __init__(self): # name ???? subtype, id ????
|
||||
# # discord, mattermost, ...
|
||||
# pass
|
||||
#
|
||||
# def get_chat_protocols(self):
|
||||
# pass
|
||||
#
|
||||
# def get_chat_protocol(self, protocol):
|
||||
# pass
|
||||
#
|
||||
# ################################################################
|
||||
#
|
||||
# def get_instances(self):
|
||||
# pass
|
||||
#
|
||||
# def get_chats(self):
|
||||
# pass
|
||||
#
|
||||
# def get_chats_by_instance(self, instance):
|
||||
# pass
|
||||
#
|
||||
#
|
||||
# class ChatNetwork: # uuid or protocol
|
||||
# def __init__(self, network='default'):
|
||||
# self.id = network
|
||||
#
|
||||
# def get_addresses(self):
|
||||
# pass
|
||||
#
|
||||
#
|
||||
# class ChatServerAddress: # uuid or protocol + network
|
||||
# def __init__(self, address='default'):
|
||||
# self.id = address
|
||||
|
||||
# map uuid -> type + field
|
||||
|
||||
# TODO option last protocol/ imported messages/chat -> unread mode ????
|
||||
|
||||
# # # # # # # # #
|
||||
# #
|
||||
# PROTOCOLS # IRC, discord, mattermost, ...
|
||||
# #
|
||||
# # # # # # # # # TODO icon => UI explorer by protocol + network + instance
|
||||
|
||||
def get_chat_protocols():
|
||||
return r_obj.smembers(f'chat:protocols')
|
||||
|
||||
def get_chat_protocols_meta():
|
||||
metas = []
|
||||
for protocol_id in get_chat_protocols():
|
||||
protocol = ChatProtocol(protocol_id)
|
||||
metas.append(protocol.get_meta(options={'icon'}))
|
||||
return metas
|
||||
|
||||
class ChatProtocol: # TODO first seen last seen ???? + nb by day ????
|
||||
def __init__(self, protocol):
|
||||
self.id = protocol
|
||||
|
||||
def exists(self):
|
||||
return r_db.exists(f'chat:protocol:{self.id}')
|
||||
|
||||
def get_networks(self):
|
||||
return r_db.smembers(f'chat:protocol:{self.id}')
|
||||
|
||||
def get_nb_networks(self):
|
||||
return r_db.scard(f'chat:protocol:{self.id}')
|
||||
|
||||
def get_icon(self):
|
||||
if self.id == 'discord':
|
||||
icon = {'style': 'fab', 'icon': 'fa-discord'}
|
||||
elif self.id == 'telegram':
|
||||
icon = {'style': 'fab', 'icon': 'fa-telegram'}
|
||||
else:
|
||||
icon = {}
|
||||
return icon
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = {'id': self.id}
|
||||
if 'icon' in options:
|
||||
meta['icon'] = self.get_icon()
|
||||
return meta
|
||||
|
||||
# def get_addresses(self):
|
||||
# pass
|
||||
#
|
||||
# def get_instances_uuids(self):
|
||||
# pass
|
||||
|
||||
|
||||
# # # # # # # # # # # # # #
|
||||
# #
|
||||
# ChatServiceInstance #
|
||||
# #
|
||||
# # # # # # # # # # # # # #
|
||||
|
||||
# uuid -> protocol + network + server
|
||||
class ChatServiceInstance:
|
||||
def __init__(self, instance_uuid):
|
||||
self.uuid = instance_uuid
|
||||
|
||||
def exists(self):
|
||||
return r_obj.exists(f'chatSerIns:{self.uuid}')
|
||||
|
||||
def get_protocol(self): # return objects ????
|
||||
return r_obj.hget(f'chatSerIns:{self.uuid}', 'protocol')
|
||||
|
||||
def get_network(self): # return objects ????
|
||||
network = r_obj.hget(f'chatSerIns:{self.uuid}', 'network')
|
||||
if network:
|
||||
return network
|
||||
|
||||
def get_address(self): # return objects ????
|
||||
address = r_obj.hget(f'chatSerIns:{self.uuid}', 'address')
|
||||
if address:
|
||||
return address
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = {'uuid': self.uuid,
|
||||
'protocol': self.get_protocol(),
|
||||
'network': self.get_network(),
|
||||
'address': self.get_address()}
|
||||
if 'chats' in options:
|
||||
meta['chats'] = []
|
||||
for chat_id in self.get_chats():
|
||||
meta['chats'].append(Chats.Chat(chat_id, self.uuid).get_meta({'created_at', 'icon', 'nb_subchannels', 'nb_messages'}))
|
||||
return meta
|
||||
|
||||
def get_nb_chats(self):
|
||||
return Chats.Chats().get_nb_ids_by_subtype(self.uuid)
|
||||
|
||||
def get_chats(self):
|
||||
return Chats.Chats().get_ids_by_subtype(self.uuid)
|
||||
|
||||
def get_chat_service_instances():
|
||||
return r_obj.smembers(f'chatSerIns:all')
|
||||
|
||||
def get_chat_service_instances_by_protocol(protocol):
|
||||
instance_uuids = {}
|
||||
for network in r_obj.smembers(f'chat:protocol:networks:{protocol}'):
|
||||
inst_uuids = r_obj.hvals(f'map:chatSerIns:{protocol}:{network}')
|
||||
if not network:
|
||||
network = 'default'
|
||||
instance_uuids[network] = inst_uuids
|
||||
return instance_uuids
|
||||
|
||||
def get_chat_service_instance_uuid(protocol, network, address):
|
||||
if not network:
|
||||
network = ''
|
||||
if not address:
|
||||
address = ''
|
||||
return r_obj.hget(f'map:chatSerIns:{protocol}:{network}', address)
|
||||
|
||||
def get_chat_service_instance_uuid_meta_from_network_dict(instance_uuids):
|
||||
for network in instance_uuids:
|
||||
metas = []
|
||||
for instance_uuid in instance_uuids[network]:
|
||||
metas.append(ChatServiceInstance(instance_uuid).get_meta())
|
||||
instance_uuids[network] = metas
|
||||
return instance_uuids
|
||||
|
||||
def get_chat_service_instance(protocol, network, address):
|
||||
instance_uuid = get_chat_service_instance_uuid(protocol, network, address)
|
||||
if instance_uuid:
|
||||
return ChatServiceInstance(instance_uuid)
|
||||
|
||||
def create_chat_service_instance(protocol, network=None, address=None):
|
||||
instance_uuid = get_chat_service_instance_uuid(protocol, network, address)
|
||||
if instance_uuid:
|
||||
return instance_uuid
|
||||
else:
|
||||
if not network:
|
||||
network = ''
|
||||
if not address:
|
||||
address = ''
|
||||
instance_uuid = str(uuid.uuid5(uuid.NAMESPACE_URL, f'{protocol}|{network}|{address}'))
|
||||
r_obj.sadd(f'chatSerIns:all', instance_uuid)
|
||||
|
||||
# map instance - uuid
|
||||
r_obj.hset(f'map:chatSerIns:{protocol}:{network}', address, instance_uuid)
|
||||
|
||||
r_obj.hset(f'chatSerIns:{instance_uuid}', 'protocol', protocol)
|
||||
if network:
|
||||
r_obj.hset(f'chatSerIns:{instance_uuid}', 'network', network)
|
||||
if address:
|
||||
r_obj.hset(f'chatSerIns:{instance_uuid}', 'address', address)
|
||||
|
||||
# protocols
|
||||
r_obj.sadd(f'chat:protocols', protocol) # TODO first seen / last seen
|
||||
|
||||
# protocol -> network
|
||||
r_obj.sadd(f'chat:protocol:networks:{protocol}', network)
|
||||
|
||||
return instance_uuid
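# Note: uuid5 above is deterministic, so re-importing the same
# protocol/network/address triplet resolves to the existing instance uuid
# instead of creating a new one. Illustrative values (no real instances):
#   uuid.uuid5(uuid.NAMESPACE_URL, 'telegram||') == uuid.uuid5(uuid.NAMESPACE_URL, 'telegram||')  # True
#   uuid.uuid5(uuid.NAMESPACE_URL, 'discord||')  == uuid.uuid5(uuid.NAMESPACE_URL, 'telegram||')  # False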
|
||||
|
||||
|
||||
|
||||
|
||||
# INSTANCE ===> CHAT IDS
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# protocol -> instance_uuids => for protocol->networks -> protocol+network => HGETALL
|
||||
# protocol+network -> instance_uuids => HGETALL
|
||||
|
||||
# protocol -> networks ???default??? or ''
|
||||
|
||||
# --------------------------------------------------------
|
||||
# protocol+network -> addresses => HKEYS
|
||||
# protocol+network+addresse => HGET
|
||||
|
||||
|
||||
# Chat -> subtype=uuid, id = chat id
|
||||
|
||||
|
||||
# instance_uuid -> chat id
|
||||
|
||||
|
||||
# protocol - uniq ID
|
||||
# protocol + network -> uuid ????
|
||||
# protocol + network + address -> uuid
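# Illustration of the key layout sketched in the notes above, using made-up
# values (a single Telegram instance with empty network/address):
#   r_obj.smembers('chat:protocols')                      -> {'telegram'}
#   r_obj.smembers('chat:protocol:networks:telegram')     -> {''}
#   r_obj.hgetall('map:chatSerIns:telegram:')             -> {'': '<instance_uuid>'}
#   r_obj.hget('chatSerIns:<instance_uuid>', 'protocol')  -> 'telegram'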
|
||||
|
||||
#######################################################################################
|
||||
|
||||
def get_obj_chat(chat_type, chat_subtype, chat_id):
|
||||
if chat_type == 'chat':
|
||||
return Chats.Chat(chat_id, chat_subtype)
|
||||
elif chat_type == 'chat-subchannel':
|
||||
return ChatSubChannels.ChatSubChannel(chat_id, chat_subtype)
|
||||
elif chat_type == 'chat-thread':
|
||||
return ChatThreads.ChatThread(chat_id, chat_subtype)
|
||||
|
||||
def get_obj_chat_meta(obj_chat, new_options=set()):
|
||||
options = {}
|
||||
if obj_chat.type == 'chat':
|
||||
options = {'created_at', 'icon', 'info', 'subchannels', 'threads', 'username'}
|
||||
elif obj_chat.type == 'chat-subchannel':
|
||||
options = {'chat', 'created_at', 'icon', 'nb_messages', 'threads'}
|
||||
elif obj_chat.type == 'chat-thread':
|
||||
options = {'chat', 'nb_messages'}
|
||||
for option in new_options:
|
||||
options.add(option)
|
||||
return obj_chat.get_meta(options=options)
|
||||
|
||||
def get_subchannels_meta_from_global_id(subchannels, translation_target=None):
|
||||
meta = []
|
||||
for sub in subchannels:
|
||||
_, instance_uuid, sub_id = sub.split(':', 2)
|
||||
subchannel = ChatSubChannels.ChatSubChannel(sub_id, instance_uuid)
|
||||
meta.append(subchannel.get_meta({'nb_messages', 'created_at', 'icon', 'translation'}, translation_target=translation_target))
|
||||
return meta
|
||||
|
||||
def get_chat_meta_from_global_id(chat_global_id):
|
||||
_, instance_uuid, chat_id = chat_global_id.split(':', 2)
|
||||
chat = Chats.Chat(chat_id, instance_uuid)
|
||||
return chat.get_meta()
|
||||
|
||||
def get_threads_metas(threads):
|
||||
metas = []
|
||||
for thread in threads:
|
||||
metas.append(ChatThreads.ChatThread(thread['id'], thread['subtype']).get_meta(options={'name', 'nb_messages'}))
|
||||
return metas
|
||||
|
||||
def get_username_meta_from_global_id(username_global_id):
|
||||
_, instance_uuid, username_id = username_global_id.split(':', 2)
|
||||
username = Usernames.Username(username_id, instance_uuid)
|
||||
return username.get_meta()
|
||||
|
||||
#### API ####
|
||||
|
||||
def api_get_chat_service_instance(chat_instance_uuid):
|
||||
chat_instance = ChatServiceInstance(chat_instance_uuid)
|
||||
if not chat_instance.exists():
|
||||
return {"status": "error", "reason": "Unknown uuid"}, 404
|
||||
return chat_instance.get_meta({'chats'}), 200
|
||||
|
||||
def api_get_chat(chat_id, chat_instance_uuid, translation_target=None, nb=-1, page=-1):
|
||||
chat = Chats.Chat(chat_id, chat_instance_uuid)
|
||||
if not chat.exists():
|
||||
return {"status": "error", "reason": "Unknown chat"}, 404
|
||||
meta = chat.get_meta({'created_at', 'icon', 'info', 'nb_participants', 'subchannels', 'threads', 'translation', 'username'}, translation_target=translation_target)
|
||||
if meta['username']:
|
||||
meta['username'] = get_username_meta_from_global_id(meta['username'])
|
||||
if meta['subchannels']:
|
||||
meta['subchannels'] = get_subchannels_meta_from_global_id(meta['subchannels'], translation_target=translation_target)
|
||||
else:
|
||||
if translation_target not in Language.get_translation_languages():
|
||||
translation_target = None
|
||||
meta['messages'], meta['pagination'], meta['tags_messages'] = chat.get_messages(translation_target=translation_target, nb=nb, page=page)
|
||||
return meta, 200
|
||||
|
||||
def api_get_nb_message_by_week(chat_id, chat_instance_uuid):
|
||||
chat = Chats.Chat(chat_id, chat_instance_uuid)
|
||||
if not chat.exists():
|
||||
return {"status": "error", "reason": "Unknown chat"}, 404
|
||||
week = chat.get_nb_message_this_week()
|
||||
# week = chat.get_nb_message_by_week('20231109')
|
||||
return week, 200
|
||||
|
||||
def api_get_chat_participants(chat_type, chat_subtype, chat_id):
|
||||
if chat_type not in ['chat', 'chat-subchannel', 'chat-thread']:
|
||||
return {"status": "error", "reason": "Unknown chat type"}, 400
|
||||
chat_obj = get_obj_chat(chat_type, chat_subtype, chat_id)
|
||||
if not chat_obj.exists():
|
||||
return {"status": "error", "reason": "Unknown chat"}, 404
|
||||
else:
|
||||
meta = get_obj_chat_meta(chat_obj, new_options={'participants'})
|
||||
chat_participants = []
|
||||
for participant in meta['participants']:
|
||||
user_account = UsersAccount.UserAccount(participant['id'], participant['subtype'])
|
||||
chat_participants.append(user_account.get_meta({'icon', 'info', 'username'}))
|
||||
meta['participants'] = chat_participants
|
||||
return meta, 200
|
||||
|
||||
def api_get_subchannel(chat_id, chat_instance_uuid, translation_target=None, nb=-1, page=-1):
|
||||
subchannel = ChatSubChannels.ChatSubChannel(chat_id, chat_instance_uuid)
|
||||
if not subchannel.exists():
|
||||
return {"status": "error", "reason": "Unknown subchannel"}, 404
|
||||
meta = subchannel.get_meta({'chat', 'created_at', 'icon', 'nb_messages', 'nb_participants', 'threads', 'translation'}, translation_target=translation_target)
|
||||
if meta['chat']:
|
||||
meta['chat'] = get_chat_meta_from_global_id(meta['chat'])
|
||||
if meta.get('threads'):
|
||||
meta['threads'] = get_threads_metas(meta['threads'])
|
||||
if meta.get('username'):
|
||||
meta['username'] = get_username_meta_from_global_id(meta['username'])
|
||||
meta['messages'], meta['pagination'], meta['tags_messages'] = subchannel.get_messages(translation_target=translation_target, nb=nb, page=page)
|
||||
return meta, 200
|
||||
|
||||
def api_get_thread(thread_id, thread_instance_uuid, translation_target=None, nb=-1, page=-1):
|
||||
thread = ChatThreads.ChatThread(thread_id, thread_instance_uuid)
|
||||
if not thread.exists():
|
||||
return {"status": "error", "reason": "Unknown thread"}, 404
|
||||
meta = thread.get_meta({'chat', 'nb_messages', 'nb_participants'})
|
||||
# if meta['chat']:
|
||||
# meta['chat'] = get_chat_meta_from_global_id(meta['chat'])
|
||||
meta['messages'], meta['pagination'], meta['tags_messages'] = thread.get_messages(translation_target=translation_target, nb=nb, page=page)
|
||||
return meta, 200
|
||||
|
||||
def api_get_message(message_id, translation_target=None):
|
||||
message = Messages.Message(message_id)
|
||||
if not message.exists():
|
||||
return {"status": "error", "reason": "Unknown uuid"}, 404
|
||||
meta = message.get_meta({'chat', 'content', 'files-names', 'icon', 'images', 'link', 'parent', 'parent_meta', 'reactions', 'thread', 'translation', 'user-account'}, translation_target=translation_target)
|
||||
return meta, 200
|
||||
|
||||
def api_get_user_account(user_id, instance_uuid, translation_target=None):
|
||||
user_account = UsersAccount.UserAccount(user_id, instance_uuid)
|
||||
if not user_account.exists():
|
||||
return {"status": "error", "reason": "Unknown user-account"}, 404
|
||||
meta = user_account.get_meta({'chats', 'icon', 'info', 'subchannels', 'threads', 'translation', 'username', 'username_meta'}, translation_target=translation_target)
|
||||
return meta, 200
|
||||
|
||||
# # # # # # # # # # LATER
|
||||
# #
|
||||
# ChatCategory #
|
||||
# #
|
||||
# # # # # # # # # #
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
r = get_chat_service_instances()
|
||||
print(r)
|
||||
r = ChatServiceInstance(r.pop())
|
||||
print(r.get_meta({'chats'}))
|
||||
# r = get_chat_protocols()
|
||||
# print(r)
|
|
@ -41,15 +41,26 @@ config_loader = None
|
|||
##################################
|
||||
|
||||
CORRELATION_TYPES_BY_OBJ = {
|
||||
"cryptocurrency": ["domain", "item"],
|
||||
"cve": ["domain", "item"],
|
||||
"decoded": ["domain", "item"],
|
||||
"domain": ["cve", "cryptocurrency", "decoded", "item", "pgp", "title", "screenshot", "username"],
|
||||
"item": ["cve", "cryptocurrency", "decoded", "domain", "pgp", "screenshot", "title", "username"],
|
||||
"pgp": ["domain", "item"],
|
||||
"chat": ["chat-subchannel", "chat-thread", "image", "user-account"], # message or direct correlation like cve, bitcoin, ... ???
|
||||
"chat-subchannel": ["chat", "chat-thread", "image", "message", "user-account"],
|
||||
"chat-thread": ["chat", "chat-subchannel", "image", "message", "user-account"], # TODO user account
|
||||
"cookie-name": ["domain"],
|
||||
"cryptocurrency": ["domain", "item", "message"],
|
||||
"cve": ["domain", "item", "message"],
|
||||
"decoded": ["domain", "item", "message"],
|
||||
"domain": ["cve", "cookie-name", "cryptocurrency", "decoded", "etag", "favicon", "hhhash", "item", "pgp", "title", "screenshot", "username"],
|
||||
"etag": ["domain"],
|
||||
"favicon": ["domain", "item"], # TODO Decoded
|
||||
"file-name": ["chat", "message"],
|
||||
"hhhash": ["domain"],
|
||||
"image": ["chat", "message", "user-account"],
|
||||
"item": ["cve", "cryptocurrency", "decoded", "domain", "favicon", "pgp", "screenshot", "title", "username"], # chat ???
|
||||
"message": ["chat", "chat-subchannel", "chat-thread", "cve", "cryptocurrency", "decoded", "file-name", "image", "pgp", "user-account"], # chat ??
|
||||
"pgp": ["domain", "item", "message"],
|
||||
"screenshot": ["domain", "item"],
|
||||
"title": ["domain", "item"],
|
||||
"username": ["domain", "item"],
|
||||
"user-account": ["chat", "chat-subchannel", "chat-thread", "image", "message", "username"],
|
||||
"username": ["domain", "item", "message", "user-account"],
|
||||
}
|
||||
|
||||
def get_obj_correl_types(obj_type):
|
||||
|
@ -61,6 +72,8 @@ def sanityze_obj_correl_types(obj_type, correl_types):
|
|||
correl_types = set(correl_types).intersection(obj_correl_types)
|
||||
if not correl_types:
|
||||
correl_types = obj_correl_types
|
||||
if not correl_types:
|
||||
return []
|
||||
return correl_types
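# Example of the filtering above (hypothetical call): correlation types that the
# object type does not support are dropped, and an empty intersection falls back
# to every type allowed for that object:
#   sanityze_obj_correl_types('domain', ['cve', 'message']) -> {'cve'}
#   sanityze_obj_correl_types('domain', ['message'])        -> all correl types allowed for 'domain'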
|
||||
|
||||
def get_nb_correlation_by_correl_type(obj_type, subtype, obj_id, correl_type):
|
||||
|
@ -110,6 +123,9 @@ def is_obj_correlated(obj_type, subtype, obj_id, obj2_type, subtype2, obj2_id):
|
|||
except:
|
||||
return False
|
||||
|
||||
def get_obj_inter_correlation(obj_type1, subtype1, obj_id1, obj_type2, subtype2, obj_id2, correl_type):
|
||||
return r_metadata.sinter(f'correlation:obj:{obj_type1}:{subtype1}:{correl_type}:{obj_id1}', f'correlation:obj:{obj_type2}:{subtype2}:{correl_type}:{obj_id2}')
|
||||
|
||||
def add_obj_correlation(obj1_type, subtype1, obj1_id, obj2_type, subtype2, obj2_id):
|
||||
if subtype1 is None:
|
||||
subtype1 = ''
|
||||
|
@ -165,20 +181,22 @@ def delete_obj_correlations(obj_type, subtype, obj_id):
|
|||
def get_obj_str_id(obj_type, subtype, obj_id):
|
||||
if subtype is None:
|
||||
subtype = ''
|
||||
return f'{obj_type};{subtype};{obj_id}'
|
||||
return f'{obj_type}:{subtype}:{obj_id}'
|
||||
|
||||
def get_correlations_graph_nodes_links(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1, flask_context=False):
|
||||
def get_correlations_graph_nodes_links(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1, objs_hidden=set(), flask_context=False):
|
||||
links = set()
|
||||
nodes = set()
|
||||
meta = {'complete': True, 'objs': set()}
|
||||
|
||||
obj_str_id = get_obj_str_id(obj_type, subtype, obj_id)
|
||||
|
||||
_get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level, max_nodes, filter_types=filter_types, previous_str_obj='')
|
||||
return obj_str_id, nodes, links
|
||||
_get_correlations_graph_node(links, nodes, meta, obj_type, subtype, obj_id, level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, previous_str_obj='')
|
||||
return obj_str_id, nodes, links, meta
|
||||
|
||||
|
||||
def _get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level, max_nodes, filter_types=[], previous_str_obj=''):
|
||||
def _get_correlations_graph_node(links, nodes, meta, obj_type, subtype, obj_id, level, max_nodes, filter_types=[], objs_hidden=set(), previous_str_obj=''):
|
||||
obj_str_id = get_obj_str_id(obj_type, subtype, obj_id)
|
||||
meta['objs'].add(obj_str_id)
|
||||
nodes.add(obj_str_id)
|
||||
|
||||
obj_correlations = get_correlations(obj_type, subtype, obj_id, filter_types=filter_types)
|
||||
|
@ -187,15 +205,22 @@ def _get_correlations_graph_node(links, nodes, obj_type, subtype, obj_id, level,
|
|||
for str_obj in obj_correlations[correl_type]:
|
||||
subtype2, obj2_id = str_obj.split(':', 1)
|
||||
obj2_str_id = get_obj_str_id(correl_type, subtype2, obj2_id)
|
||||
# filter objects to hide
|
||||
if obj2_str_id in objs_hidden:
|
||||
continue
|
||||
|
||||
meta['objs'].add(obj2_str_id)
|
||||
|
||||
if obj2_str_id == previous_str_obj:
|
||||
continue
|
||||
|
||||
if len(nodes) > max_nodes:
|
||||
if len(nodes) > max_nodes != 0:
|
||||
meta['complete'] = False
|
||||
break
|
||||
nodes.add(obj2_str_id)
|
||||
links.add((obj_str_id, obj2_str_id))
|
||||
|
||||
if level > 0:
|
||||
next_level = level - 1
|
||||
_get_correlations_graph_node(links, nodes, correl_type, subtype2, obj2_id, next_level, max_nodes, filter_types=filter_types, previous_str_obj=obj_str_id)
|
||||
_get_correlations_graph_node(links, nodes, meta, correl_type, subtype2, obj2_id, next_level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, previous_str_obj=obj_str_id)
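# Usage sketch for the graph builder above (identifiers are placeholders):
# build a depth-2 correlation graph around a domain and read back the nodes,
# the links and the meta dict with its 'complete' flag.
def _example_correlation_graph():
    obj_str_id, nodes, links, meta = get_correlations_graph_nodes_links(
        'domain', '', 'example.onion', filter_types=[], max_nodes=300, level=2)
    if not meta['complete']:
        print('graph truncated by max_nodes')  # see the check in the recursion above
    return obj_str_id, nodes, links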
|
||||
|
||||
|
|
|
@ -36,8 +36,10 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
# Import Project packages
|
||||
##################################
|
||||
from packages import git_status
|
||||
from packages import Date
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.Domains import Domain
|
||||
from lib.objects import HHHashs
|
||||
from lib.objects.Items import Item
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
|
@ -74,8 +76,8 @@ def get_current_date(separator=False):
|
|||
def get_date_crawled_items_source(date):
|
||||
return os.path.join('crawled', date)
|
||||
|
||||
def get_date_har_dir(date):
|
||||
return os.path.join(HAR_DIR, date)
|
||||
def get_har_dir():
|
||||
return HAR_DIR
|
||||
|
||||
def is_valid_onion_domain(domain):
|
||||
if not domain.endswith('.onion'):
|
||||
|
@ -133,7 +135,7 @@ def unpack_url(url):
|
|||
# # # # # # # # TODO CREATE NEW OBJECT
|
||||
|
||||
def get_favicon_from_html(html, domain, url):
|
||||
favicon_urls = extract_favicon_from_html(html, url)
|
||||
favicon_urls, favicons = extract_favicon_from_html(html, url)
|
||||
# add root favicon
|
||||
if not favicon_urls:
|
||||
favicon_urls.add(f'{urlparse(url).scheme}://{domain}/favicon.ico')
|
||||
|
@ -141,9 +143,11 @@ def get_favicon_from_html(html, domain, url):
|
|||
return favicon_urls
|
||||
|
||||
def extract_favicon_from_html(html, url):
|
||||
favicon_urls = set()
|
||||
favicons = set()
|
||||
favicons_urls = set()
|
||||
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
set_icons = set()
|
||||
all_icons = set()
|
||||
# If there are multiple <link rel="icon">s, the browser uses their media,
|
||||
# type, and sizes attributes to select the most appropriate icon.
|
||||
# If several icons are equally appropriate, the last one is used.
|
||||
|
@ -159,27 +163,64 @@ def extract_favicon_from_html(html, url):
|
|||
# - <meta name="msapplication-TileColor" content="#aaaaaa"> <meta name="theme-color" content="#ffffff">
|
||||
# - <meta name="msapplication-config" content="/icons/browserconfig.xml">
|
||||
|
||||
# desktop browser 'shortcut icon' (older browser), 'icon'
|
||||
for favicon_tag in ['icon', 'shortcut icon']:
|
||||
if soup.head:
|
||||
for icon in soup.head.find_all('link', attrs={'rel': lambda x : x and x.lower() == favicon_tag, 'href': True}):
|
||||
set_icons.add(icon)
|
||||
# Root Favicon
|
||||
f = get_faup()
|
||||
f.decode(url)
|
||||
url_decoded = f.get()
|
||||
root_domain = f"{url_decoded['scheme']}://{url_decoded['domain']}"
|
||||
default_icon = f'{root_domain}/favicon.ico'
|
||||
favicons_urls.add(default_icon)
|
||||
# print(default_icon)
|
||||
|
||||
# # TODO: handle base64 favicon
|
||||
for tag in set_icons:
|
||||
# shortcut
|
||||
for shortcut in soup.find_all('link', rel='shortcut icon'):
|
||||
all_icons.add(shortcut)
|
||||
# icons
|
||||
for icon in soup.find_all('link', rel='icon'):
|
||||
all_icons.add(icon)
|
||||
|
||||
for mask_icon in soup.find_all('link', rel='mask-icon'):
|
||||
all_icons.add(mask_icon)
|
||||
for apple_touche_icon in soup.find_all('link', rel='apple-touch-icon'):
|
||||
all_icons.add(apple_touche_icon)
|
||||
for msapplication in soup.find_all('meta', attrs={'name': 'msapplication-TileImage'}): # msapplication-TileColor
|
||||
all_icons.add(msapplication)
|
||||
|
||||
# msapplication-TileImage
|
||||
|
||||
# print(all_icons)
|
||||
for tag in all_icons:
|
||||
icon_url = tag.get('href')
|
||||
if icon_url:
|
||||
if icon_url.startswith('//'):
|
||||
icon_url = icon_url.replace('//', '/')
|
||||
if icon_url.startswith('data:'):
|
||||
# # TODO: handle base64 favicon
|
||||
pass
|
||||
data = icon_url.split(',', 1)
|
||||
if len(data) > 1:
|
||||
data = ''.join(data[1].split())
|
||||
favicon = base64.b64decode(data)
|
||||
if favicon:
|
||||
favicons.add(favicon)
|
||||
else:
|
||||
icon_url = urljoin(url, icon_url)
|
||||
icon_url = urlparse(icon_url, scheme=urlparse(url).scheme).geturl()
|
||||
favicon_urls.add(icon_url)
|
||||
return favicon_urls
|
||||
favicon_url = urljoin(url, icon_url)
|
||||
favicons_urls.add(favicon_url)
|
||||
elif tag.get('name') == 'msapplication-TileImage':
|
||||
icon_url = tag.get('content')
|
||||
if icon_url:
|
||||
if icon_url.startswith('data:'):
|
||||
data = icon_url.split(',', 1)
|
||||
if len(data) > 1:
|
||||
data = ''.join(data[1].split())
|
||||
favicon = base64.b64decode(data)
|
||||
if favicon:
|
||||
favicons.add(favicon)
|
||||
else:
|
||||
favicon_url = urljoin(url, icon_url)
|
||||
favicons_urls.add(favicon_url)
|
||||
print(favicon_url)
|
||||
|
||||
# print(favicons_urls)
|
||||
return favicons_urls, favicons
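# Usage sketch for the extractor above, on a tiny hand-written page
# (placeholder URLs, nothing is fetched):
def _example_extract_favicon():
    html = ('<html><head>'
            '<link rel="icon" href="/static/fav.png">'
            '<link rel="shortcut icon" href="favicon.ico">'
            '</head><body></body></html>')
    urls, favicons = extract_favicon_from_html(html, 'http://example.onion/index.html')
    # urls     -> absolute favicon URLs, plus the root /favicon.ico fallback
    # favicons -> raw bytes decoded from any data: URIs found in the page
    return urls, favicons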
|
||||
|
||||
# mmh3.hash(favicon)
|
||||
|
||||
# # # - - # # #
|
||||
|
||||
|
@ -193,14 +234,9 @@ def extract_title_from_html(html):
|
|||
soup = BeautifulSoup(html, 'html.parser')
|
||||
title = soup.title
|
||||
if title:
|
||||
return str(title.string)
|
||||
return ''
|
||||
|
||||
def extract_description_from_html(html):
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
description = soup.find('meta', attrs={'name': 'description'})
|
||||
if description:
|
||||
return description['content']
|
||||
title = title.string
|
||||
if title:
|
||||
return str(title)
|
||||
return ''
|
||||
|
||||
def extract_description_from_html(html):
|
||||
|
@ -223,6 +259,196 @@ def extract_author_from_html(html):
|
|||
if keywords:
|
||||
return keywords['content']
|
||||
return ''
|
||||
|
||||
# # # - - # # #


# # # # # # # #
#             #
#     HAR     #
#             #
# # # # # # # #

def create_har_id(date, item_id):
    item_id = item_id.split('/')[-1]
    return os.path.join(date, f'{item_id}.json.gz')

def save_har(har_id, har_content):
    # create dir
    har_dir = os.path.dirname(os.path.join(get_har_dir(), har_id))
    if not os.path.exists(har_dir):
        os.makedirs(har_dir)
    # save HAR
    filename = os.path.join(get_har_dir(), har_id)
    with gzip.open(filename, 'wb') as f:
        f.write(json.dumps(har_content).encode())
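# Round-trip sketch for the two helpers above: HARs are stored as gzip-compressed
# JSON under HAR_DIR/<date>/<item_id>.json.gz and read back with get_har_content()
# defined below. The item id is a made-up example, and HAR_DIR is assumed to be
# writable.
def _example_har_roundtrip():
    har_id = create_har_id('2023/10/11', 'crawled/2023/10/11/example.onion_0')
    save_har(har_id, {'log': {'entries': []}})
    return get_har_content(har_id)  # -> {'log': {'entries': []}}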
|
||||
|
||||
def get_all_har_ids():
|
||||
har_ids = []
|
||||
today_root_dir = os.path.join(HAR_DIR, Date.get_today_date_str(separator=True))
|
||||
dirs_year = set()
|
||||
for ydir in next(os.walk(HAR_DIR))[1]:
|
||||
if len(ydir) == 4:
|
||||
try:
|
||||
int(ydir)
|
||||
dirs_year.add(ydir)
|
||||
except (TypeError, ValueError):
|
||||
pass
|
||||
|
||||
if os.path.exists(today_root_dir):
|
||||
for file in [f for f in os.listdir(today_root_dir) if os.path.isfile(os.path.join(today_root_dir, f))]:
|
||||
har_id = os.path.relpath(os.path.join(today_root_dir, file), HAR_DIR)
|
||||
har_ids.append(har_id)
|
||||
|
||||
for ydir in sorted(dirs_year, reverse=False):
|
||||
search_dear = os.path.join(HAR_DIR, ydir)
|
||||
for root, dirs, files in os.walk(search_dear):
|
||||
for file in files:
|
||||
if root != today_root_dir:
|
||||
har_id = os.path.relpath(os.path.join(root, file), HAR_DIR)
|
||||
har_ids.append(har_id)
|
||||
return har_ids
|
||||
|
||||
def get_month_har_ids(year, month):
|
||||
har_ids = []
|
||||
month_path = os.path.join(HAR_DIR, year, month)
|
||||
for root, dirs, files in os.walk(month_path):
|
||||
for file in files:
|
||||
har_id = os.path.relpath(os.path.join(root, file), HAR_DIR)
|
||||
har_ids.append(har_id)
|
||||
return har_ids
|
||||
|
||||
|
||||
def get_har_content(har_id):
|
||||
har_path = os.path.join(HAR_DIR, har_id)
|
||||
try:
|
||||
with gzip.open(har_path) as f:
|
||||
try:
|
||||
return json.loads(f.read())
|
||||
except json.decoder.JSONDecodeError:
|
||||
return {}
|
||||
except Exception as e:
|
||||
print(e) # TODO LOGS
|
||||
return {}
|
||||
|
||||
def extract_cookies_names_from_har(har):
    cookies = set()
    for entrie in har.get('log', {}).get('entries', []):
        for cookie in entrie.get('request', {}).get('cookies', []):
            name = cookie.get('name')
            if name:
                cookies.add(name)
        for cookie in entrie.get('response', {}).get('cookies', []):
            name = cookie.get('name')
            if name:
                cookies.add(name)
    return cookies
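# Minimal example of the extraction above, on a hand-written HAR fragment
# (cookie names and values are placeholders):
def _example_cookie_names():
    har = {'log': {'entries': [
        {'request': {'cookies': [{'name': 'sessionid', 'value': 'x'}]},
         'response': {'cookies': [{'name': 'csrftoken', 'value': 'y'}]}},
    ]}}
    return extract_cookies_names_from_har(har)  # -> {'sessionid', 'csrftoken'}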
|
||||
|
||||
def _reprocess_all_hars_cookie_name():
|
||||
from lib.objects import CookiesNames
|
||||
for har_id in get_all_har_ids():
|
||||
domain = har_id.split('/')[-1]
|
||||
domain = domain[:-44]
|
||||
date = har_id.split('/')
|
||||
date = f'{date[-4]}{date[-3]}{date[-2]}'
|
||||
for cookie_name in extract_cookies_names_from_har(get_har_content(har_id)):
|
||||
print(domain, date, cookie_name)
|
||||
cookie = CookiesNames.create(cookie_name)
|
||||
cookie.add(date, Domain(domain))
|
||||
|
||||
def extract_etag_from_har(har): # TODO check response url
|
||||
etags = set()
|
||||
for entrie in har.get('log', {}).get('entries', []):
|
||||
for header in entrie.get('response', {}).get('headers', []):
|
||||
if header.get('name') == 'etag':
|
||||
# print(header)
|
||||
etag = header.get('value')
|
||||
if etag:
|
||||
etags.add(etag)
|
||||
return etags
|
||||
|
||||
def _reprocess_all_hars_etag():
|
||||
from lib.objects import Etags
|
||||
for har_id in get_all_har_ids():
|
||||
domain = har_id.split('/')[-1]
|
||||
domain = domain[:-44]
|
||||
date = har_id.split('/')
|
||||
date = f'{date[-4]}{date[-3]}{date[-2]}'
|
||||
for etag_content in extract_etag_from_har(get_har_content(har_id)):
|
||||
print(domain, date, etag_content)
|
||||
etag = Etags.create(etag_content)
|
||||
etag.add(date, Domain(domain))
|
||||
|
||||
def extract_hhhash_by_id(har_id, domain, date):
|
||||
return extract_hhhash(get_har_content(har_id), domain, date)
|
||||
|
||||
def extract_hhhash(har, domain, date):
|
||||
hhhashs = set()
|
||||
urls = set()
|
||||
for entrie in har.get('log', {}).get('entries', []):
|
||||
url = entrie.get('request').get('url')
|
||||
if url not in urls:
|
||||
# filter redirect
|
||||
if entrie.get('response').get('status') == 200: # != 301:
|
||||
# print(url, entrie.get('response').get('status'))
|
||||
|
||||
f = get_faup()
|
||||
f.decode(url)
|
||||
domain_url = f.get().get('domain')
|
||||
if domain_url == domain:
|
||||
|
||||
headers = entrie.get('response').get('headers')
|
||||
|
||||
hhhash_header = HHHashs.build_hhhash_headers(headers)
|
||||
hhhash = HHHashs.hhhash_headers(hhhash_header)
|
||||
|
||||
if hhhash not in hhhashs:
|
||||
print('', url, hhhash)
|
||||
|
||||
# -----
|
||||
obj = HHHashs.create(hhhash_header, hhhash)
|
||||
obj.add(date, Domain(domain))
|
||||
|
||||
hhhashs.add(hhhash)
|
||||
urls.add(url)
|
||||
print()
|
||||
print()
|
||||
print('HHHASH:')
|
||||
for hhhash in hhhashs:
|
||||
print(hhhash)
|
||||
return hhhashs
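# The helpers above delegate to lib.objects.HHHashs. Conceptually, HHHash
# (HTTP Headers Hashing) fingerprints a response by hashing the ordered list of
# response header names. A rough sketch of the idea, not necessarily the exact
# implementation used by HHHashs:
def _example_hhhash_sketch(headers):
    # headers: HAR-style list of {'name': ..., 'value': ...} dicts
    import hashlib
    names = ':'.join(h['name'] for h in headers)
    return 'hhh:1:' + hashlib.sha256(names.encode()).hexdigest()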
|
||||
|
||||
def _reprocess_all_hars_hhhashs():
|
||||
for har_id in get_all_har_ids():
|
||||
print()
|
||||
print(har_id)
|
||||
domain = har_id.split('/')[-1]
|
||||
domain = domain[:-44]
|
||||
date = har_id.split('/')
|
||||
date = f'{date[-4]}{date[-3]}{date[-2]}'
|
||||
extract_hhhash_by_id(har_id, domain, date)
|
||||
|
||||
|
||||
|
||||
def _gzip_har(har_id):
|
||||
har_path = os.path.join(HAR_DIR, har_id)
|
||||
new_id = f'{har_path}.gz'
|
||||
if not har_id.endswith('.gz'):
|
||||
if not os.path.exists(new_id):
|
||||
with open(har_path, 'rb') as f:
|
||||
content = f.read()
|
||||
if content:
|
||||
with gzip.open(new_id, 'wb') as f:
|
||||
r = f.write(content)
|
||||
print(r)
|
||||
if os.path.exists(new_id) and os.path.exists(har_path):
|
||||
os.remove(har_path)
|
||||
print('delete:', har_path)
|
||||
|
||||
def _gzip_all_hars():
|
||||
for har_id in get_all_har_ids():
|
||||
_gzip_har(har_id)
|
||||
|
||||
# # # - - # # #
|
||||
|
||||
################################################################################
|
||||
|
@ -539,8 +765,7 @@ class Cookie:
|
|||
meta[field] = value
|
||||
if r_json:
|
||||
data = json.dumps(meta, indent=4, sort_keys=True)
|
||||
meta = {'data': data}
|
||||
meta['uuid'] = self.uuid
|
||||
meta = {'data': data, 'uuid': self.uuid}
|
||||
return meta
|
||||
|
||||
def edit(self, cookie_dict):
|
||||
|
@ -652,7 +877,7 @@ def unpack_imported_json_cookie(json_cookie):
|
|||
|
||||
## - - ##
|
||||
#### COOKIEJAR API ####
|
||||
def api_import_cookies_from_json(user_id, cookiejar_uuid, json_cookies_str): # # TODO: add catch
|
||||
def api_import_cookies_from_json(user_id, cookiejar_uuid, json_cookies_str): # # TODO: add catch
|
||||
resp = api_verify_cookiejar_acl(cookiejar_uuid, user_id)
|
||||
if resp:
|
||||
return resp
|
||||
|
@ -821,8 +1046,8 @@ class CrawlerScheduler:
|
|||
minutes = 0
|
||||
current_time = datetime.now().timestamp()
|
||||
time_next_run = (datetime.now() + relativedelta(months=int(months), weeks=int(weeks),
|
||||
days=int(days), hours=int(hours),
|
||||
minutes=int(minutes))).timestamp()
|
||||
days=int(days), hours=int(hours),
|
||||
minutes=int(minutes))).timestamp()
|
||||
# Make sure the next capture is not scheduled for in a too short interval
|
||||
interval_next_capture = time_next_run - current_time
|
||||
if interval_next_capture < self.min_frequency:
|
||||
|
@ -844,6 +1069,7 @@ class CrawlerScheduler:
|
|||
task_uuid = create_task(meta['url'], depth=meta['depth'], har=meta['har'], screenshot=meta['screenshot'],
|
||||
header=meta['header'],
|
||||
cookiejar=meta['cookiejar'], proxy=meta['proxy'],
|
||||
tags=meta['tags'],
|
||||
user_agent=meta['user_agent'], parent='scheduler', priority=40)
|
||||
if task_uuid:
|
||||
schedule.set_task(task_uuid)
|
||||
|
@ -946,6 +1172,14 @@ class CrawlerSchedule:
|
|||
def _set_field(self, field, value):
|
||||
return r_crawler.hset(f'schedule:{self.uuid}', field, value)
|
||||
|
||||
def get_tags(self):
|
||||
return r_crawler.smembers(f'schedule:tags:{self.uuid}')
|
||||
|
||||
def set_tags(self, tags=[]):
|
||||
for tag in tags:
|
||||
r_crawler.sadd(f'schedule:tags:{self.uuid}', tag)
|
||||
# Tag.create_custom_tag(tag)
|
||||
|
||||
def get_meta(self, ui=False):
|
||||
meta = {
|
||||
'uuid': self.uuid,
|
||||
|
@ -960,6 +1194,7 @@ class CrawlerSchedule:
|
|||
'cookiejar': self.get_cookiejar(),
|
||||
'header': self.get_header(),
|
||||
'proxy': self.get_proxy(),
|
||||
'tags': self.get_tags(),
|
||||
}
|
||||
status = self.get_status()
|
||||
if ui:
|
||||
|
@ -975,6 +1210,7 @@ class CrawlerSchedule:
|
|||
meta = {'uuid': self.uuid,
|
||||
'url': self.get_url(),
|
||||
'user': self.get_user(),
|
||||
'tags': self.get_tags(),
|
||||
'next_run': self.get_next_run(r_str=True)}
|
||||
status = self.get_status()
|
||||
if isinstance(status, ScheduleStatus):
|
||||
|
@ -983,7 +1219,7 @@ class CrawlerSchedule:
|
|||
return meta
|
||||
|
||||
def create(self, frequency, user, url,
|
||||
depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None):
|
||||
depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None, tags=[]):
|
||||
|
||||
if self.exists():
|
||||
raise Exception('Error: Monitor already exists')
|
||||
|
@ -1012,6 +1248,9 @@ class CrawlerSchedule:
|
|||
if user_agent:
|
||||
self._set_field('user_agent', user_agent)
|
||||
|
||||
if tags:
|
||||
self.set_tags(tags)
|
||||
|
||||
r_crawler.sadd('scheduler:schedules', self.uuid)
|
||||
|
||||
def delete(self):
|
||||
|
@ -1025,12 +1264,13 @@ class CrawlerSchedule:
|
|||
|
||||
# delete meta
|
||||
r_crawler.delete(f'schedule:{self.uuid}')
|
||||
r_crawler.delete(f'schedule:tags:{self.uuid}')
|
||||
r_crawler.srem('scheduler:schedules', self.uuid)
|
||||
|
||||
def create_schedule(frequency, user, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None):
|
||||
def create_schedule(frequency, user, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None, user_agent=None, tags=[]):
|
||||
schedule_uuid = gen_uuid()
|
||||
schedule = CrawlerSchedule(schedule_uuid)
|
||||
schedule.create(frequency, user, url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar, proxy=proxy, user_agent=user_agent)
|
||||
schedule.create(frequency, user, url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar, proxy=proxy, user_agent=user_agent, tags=tags)
|
||||
return schedule_uuid
|
||||
|
||||
# TODO sanityze UUID
|
||||
|
@ -1087,8 +1327,15 @@ class CrawlerCapture:
|
|||
if task_uuid:
|
||||
return CrawlerTask(task_uuid)
|
||||
|
||||
def get_start_time(self):
|
||||
return self.get_task().get_start_time()
|
||||
def get_start_time(self, r_str=True):
|
||||
start_time = self.get_task().get_start_time()
|
||||
if r_str:
|
||||
return start_time
|
||||
elif not start_time:
|
||||
return 0
|
||||
else:
|
||||
start_time = datetime.strptime(start_time, "%Y/%m/%d - %H:%M.%S").timestamp()
|
||||
return int(start_time)
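        # The stored start_time string follows the format written by CrawlerTask.start(),
        # e.g. "2023/10/11 - 14:32.05"; with r_str=False it is converted to an epoch int:
        #   int(datetime.strptime("2023/10/11 - 14:32.05", "%Y/%m/%d - %H:%M.%S").timestamp())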
|
||||
|
||||
def get_status(self):
|
||||
status = r_cache.hget(f'crawler:capture:{self.uuid}', 'status')
|
||||
|
@ -1101,7 +1348,8 @@ class CrawlerCapture:
|
|||
|
||||
def create(self, task_uuid):
|
||||
if self.exists():
|
||||
raise Exception(f'Error: Capture {self.uuid} already exists')
|
||||
print(f'Capture {self.uuid} already exists') # TODO LOGS
|
||||
return None
|
||||
launch_time = int(time.time())
|
||||
r_crawler.hset(f'crawler:task:{task_uuid}', 'capture', self.uuid)
|
||||
r_crawler.hset('crawler:captures:tasks', self.uuid, task_uuid)
|
||||
|
@ -1166,6 +1414,11 @@ def get_captures_status():
|
|||
status.append(meta)
|
||||
return status
|
||||
|
||||
def delete_captures():
|
||||
for capture_uuid in get_crawler_captures():
|
||||
capture = CrawlerCapture(capture_uuid)
|
||||
capture.delete()
|
||||
|
||||
##-- CRAWLER STATE --##
|
||||
|
||||
|
||||
|
@ -1248,6 +1501,14 @@ class CrawlerTask:
|
|||
def _set_field(self, field, value):
|
||||
return r_crawler.hset(f'crawler:task:{self.uuid}', field, value)
|
||||
|
||||
def get_tags(self):
|
||||
return r_crawler.smembers(f'crawler:task:tags:{self.uuid}')
|
||||
|
||||
def set_tags(self, tags):
|
||||
for tag in tags:
|
||||
r_crawler.sadd(f'crawler:task:tags:{self.uuid}', tag)
|
||||
# Tag.create_custom_tag(tag)
|
||||
|
||||
def get_meta(self):
|
||||
meta = {
|
||||
'uuid': self.uuid,
|
||||
|
@ -1262,6 +1523,7 @@ class CrawlerTask:
|
|||
'header': self.get_header(),
|
||||
'proxy': self.get_proxy(),
|
||||
'parent': self.get_parent(),
|
||||
'tags': self.get_tags(),
|
||||
}
|
||||
return meta
|
||||
|
||||
|
@ -1269,7 +1531,7 @@ class CrawlerTask:
|
|||
# TODO SANITIZE PRIORITY
|
||||
# PRIORITY: discovery = 0/10, feeder = 10, manual = 50, auto = 40, test = 100
|
||||
def create(self, url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None,
|
||||
user_agent=None, parent='manual', priority=0):
|
||||
user_agent=None, tags=[], parent='manual', priority=0, external=False):
|
||||
if self.exists():
|
||||
raise Exception('Error: Task already exists')
|
||||
|
||||
|
@ -1300,7 +1562,7 @@ class CrawlerTask:
|
|||
# TODO SANITIZE COOKIEJAR -> UUID
|
||||
|
||||
# Check if already in queue
|
||||
hash_query = get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header)
|
||||
hash_query = get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header, tags)
|
||||
if r_crawler.hexists(f'crawler:queue:hash', hash_query):
|
||||
self.uuid = r_crawler.hget(f'crawler:queue:hash', hash_query)
|
||||
return self.uuid
|
||||
|
@ -1321,10 +1583,13 @@ class CrawlerTask:
|
|||
if user_agent:
|
||||
self._set_field('user_agent', user_agent)
|
||||
|
||||
if tags:
|
||||
self.set_tags(tags)
|
||||
|
||||
r_crawler.hset('crawler:queue:hash', hash_query, self.uuid)
|
||||
self._set_field('hash', hash_query)
|
||||
r_crawler.zadd('crawler:queue', {self.uuid: priority})
|
||||
self.add_to_db_crawler_queue(priority)
|
||||
if not external:
|
||||
self.add_to_db_crawler_queue(priority)
|
||||
# UI
|
||||
domain_type = dom.get_domain_type()
|
||||
r_crawler.sadd(f'crawler:queue:type:{domain_type}', self.uuid)
|
||||
|
@ -1337,6 +1602,11 @@ class CrawlerTask:
|
|||
def start(self):
|
||||
self._set_field('start_time', datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
|
||||
|
||||
def reset(self):
|
||||
priority = 49
|
||||
r_crawler.hdel(f'crawler:task:{self.uuid}', 'start_time')
|
||||
self.add_to_db_crawler_queue(priority)
|
||||
|
||||
# Crawler
|
||||
def remove(self): # zrem cache + DB
|
||||
capture_uuid = self.get_capture()
|
||||
|
@ -1360,10 +1630,10 @@ class CrawlerTask:
|
|||
|
||||
|
||||
# TODO move to class ???
|
||||
def get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header):
|
||||
def get_task_hash(url, domain, depth, har, screenshot, priority, proxy, cookiejar, user_agent, header, tags):
|
||||
to_enqueue = {'domain': domain, 'depth': depth, 'har': har, 'screenshot': screenshot,
|
||||
'priority': priority, 'proxy': proxy, 'cookiejar': cookiejar, 'user_agent': user_agent,
|
||||
'header': header}
|
||||
'header': header, 'tags': tags}
|
||||
if priority != 0:
|
||||
to_enqueue['url'] = url
|
||||
return hashlib.sha512(pickle.dumps(to_enqueue)).hexdigest()
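# Deduplication sketch using the tags-aware signature above: two submissions with
# identical parameters produce the same hash, so the second one reuses the already
# queued task uuid (all values below are placeholders):
def _example_task_dedup():
    h1 = get_task_hash('http://example.onion', 'example.onion', 1, True, True, 90,
                       None, None, None, None, ['test-tag'])
    h2 = get_task_hash('http://example.onion', 'example.onion', 1, True, True, 90,
                       None, None, None, None, ['test-tag'])
    return h1 == h2  # -> True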
|
||||
|
@ -1374,12 +1644,11 @@ def add_task_to_lacus_queue():
|
|||
return None
|
||||
task_uuid, priority = task_uuid[0]
|
||||
task = CrawlerTask(task_uuid)
|
||||
task.start()
|
||||
return task.uuid, priority
|
||||
return task, priority
|
||||
|
||||
# PRIORITY: discovery = 0/10, feeder = 10, manual = 50, auto = 40, test = 100
|
||||
def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=None, proxy=None,
|
||||
user_agent=None, parent='manual', priority=0, task_uuid=None):
|
||||
user_agent=None, tags=[], parent='manual', priority=0, task_uuid=None, external=False):
|
||||
if task_uuid:
|
||||
if CrawlerTask(task_uuid).exists():
|
||||
task_uuid = gen_uuid()
|
||||
|
@ -1387,7 +1656,8 @@ def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=
|
|||
task_uuid = gen_uuid()
|
||||
task = CrawlerTask(task_uuid)
|
||||
task_uuid = task.create(url, depth=depth, har=har, screenshot=screenshot, header=header, cookiejar=cookiejar,
|
||||
proxy=proxy, user_agent=user_agent, parent=parent, priority=priority)
|
||||
proxy=proxy, user_agent=user_agent, tags=tags, parent=parent, priority=priority,
|
||||
external=external)
|
||||
return task_uuid
|
||||
|
||||
|
||||
|
@ -1397,7 +1667,8 @@ def create_task(url, depth=1, har=True, screenshot=True, header=None, cookiejar=
|
|||
|
||||
# # TODO: ADD user agent
|
||||
# # TODO: sanitize URL
|
||||
def api_add_crawler_task(data, user_id=None):
|
||||
|
||||
def api_parse_task_dict_basic(data, user_id):
|
||||
url = data.get('url', None)
|
||||
if not url or url == '\n':
|
||||
return {'status': 'error', 'reason': 'No url supplied'}, 400
|
||||
|
@ -1423,6 +1694,31 @@ def api_add_crawler_task(data, user_id=None):
|
|||
else:
|
||||
depth_limit = 0
|
||||
|
||||
# PROXY
|
||||
proxy = data.get('proxy', None)
|
||||
if proxy == 'onion' or proxy == 'tor' or proxy == 'force_tor':
|
||||
proxy = 'force_tor'
|
||||
elif proxy:
|
||||
verify = api_verify_proxy(proxy)
|
||||
if verify[1] != 200:
|
||||
return verify
|
||||
|
||||
tags = data.get('tags', [])
|
||||
|
||||
return {'url': url, 'depth_limit': depth_limit, 'har': har, 'screenshot': screenshot, 'proxy': proxy, 'tags': tags}, 200
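# Shape of the dict returned above on success (placeholder values); 'proxy'
# accepts 'onion'/'tor'/'force_tor' and is normalised to 'force_tor':
#   {'url': 'http://example.onion', 'depth_limit': 1, 'har': True,
#    'screenshot': True, 'proxy': 'force_tor', 'tags': ['test-tag']}, 200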
|
||||
|
||||
def api_add_crawler_task(data, user_id=None):
|
||||
task, resp = api_parse_task_dict_basic(data, user_id)
|
||||
if resp != 200:
|
||||
return task, resp
|
||||
|
||||
url = task['url']
|
||||
screenshot = task['screenshot']
|
||||
har = task['har']
|
||||
depth_limit = task['depth_limit']
|
||||
proxy = task['proxy']
|
||||
tags = task['tags']
|
||||
|
||||
cookiejar_uuid = data.get('cookiejar', None)
|
||||
if cookiejar_uuid:
|
||||
cookiejar = Cookiejar(cookiejar_uuid)
|
||||
|
@ -1434,6 +1730,19 @@ def api_add_crawler_task(data, user_id=None):
|
|||
return {'error': 'The access to this cookiejar is restricted'}, 403
|
||||
cookiejar_uuid = cookiejar.uuid
|
||||
|
||||
cookies = data.get('cookies', None)
|
||||
if not cookiejar_uuid and cookies:
|
||||
# Create new cookiejar
|
||||
cookiejar_uuid = create_cookiejar(user_id, "single-shot cookiejar", 1, None)
|
||||
cookiejar = Cookiejar(cookiejar_uuid)
|
||||
for cookie in cookies:
|
||||
try:
|
||||
name = cookie.get('name')
|
||||
value = cookie.get('value')
|
||||
cookiejar.add_cookie(name, value, None, None, None, None, None)
|
||||
except KeyError:
|
||||
return {'error': 'Invalid cookie key, please submit a valid JSON', 'cookiejar_uuid': cookiejar_uuid}, 400
|
||||
|
||||
frequency = data.get('frequency', None)
|
||||
if frequency:
|
||||
if frequency not in ['monthly', 'weekly', 'daily', 'hourly']:
|
||||
|
@ -1454,29 +1763,47 @@ def api_add_crawler_task(data, user_id=None):
|
|||
return {'error': 'Invalid frequency'}, 400
|
||||
frequency = f'{months}:{weeks}:{days}:{hours}:{minutes}'
|
||||
|
||||
# PROXY
|
||||
proxy = data.get('proxy', None)
|
||||
if proxy == 'onion' or proxy == 'tor' or proxy == 'force_tor':
|
||||
proxy = 'force_tor'
|
||||
elif proxy:
|
||||
verify = api_verify_proxy(proxy)
|
||||
if verify[1] != 200:
|
||||
return verify
|
||||
|
||||
if frequency:
|
||||
# TODO verify user
|
||||
return create_schedule(frequency, user_id, url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
|
||||
cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None), 200
|
||||
task_uuid = create_schedule(frequency, user_id, url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
|
||||
cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None, tags=tags)
|
||||
else:
|
||||
# TODO HEADERS
|
||||
# TODO USER AGENT
|
||||
return create_task(url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
|
||||
cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None,
|
||||
parent='manual', priority=90), 200
|
||||
task_uuid = create_task(url, depth=depth_limit, har=har, screenshot=screenshot, header=None,
|
||||
cookiejar=cookiejar_uuid, proxy=proxy, user_agent=None, tags=tags,
|
||||
parent='manual', priority=90)
|
||||
|
||||
return {'uuid': task_uuid}, 200
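For reference, a request body accepted by the API above could look as follows. Only keys visible in this diff are shown; depth, HAR and screenshot options are parsed in a part of the function not included here, and the tag value is purely illustrative.

```python
task_request = {
    'url': 'http://example.onion',
    'proxy': 'tor',            # normalised to 'force_tor' by the parser above
    'tags': ['example-tag'],
    'frequency': 'daily',      # optional: monthly / weekly / daily / hourly; omit for a one-shot task
}
```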
|
||||
|
||||
|
||||
#### ####
|
||||
|
||||
# TODO cookiejar - cookies - frequency
|
||||
def api_add_crawler_capture(data, user_id):
|
||||
task, resp = api_parse_task_dict_basic(data, user_id)
|
||||
if resp != 200:
|
||||
return task, resp
|
||||
|
||||
task_uuid = data.get('task_uuid')
|
||||
if not task_uuid:
|
||||
return {'error': 'Invalid task_uuid', 'task_uuid': task_uuid}, 400
|
||||
capture_uuid = data.get('capture_uuid')
|
||||
if not capture_uuid:
|
||||
return {'error': 'Invalid capture_uuid', 'capture_uuid': capture_uuid}, 400
|
||||
|
||||
# parent = data.get('parent')
|
||||
|
||||
# TODO parent
|
||||
task_uuid = create_task(task['url'], depth=task['depth_limit'], har=task['har'], screenshot=task['screenshot'],
|
||||
proxy=task['proxy'], tags=task['tags'],
|
||||
parent='manual', task_uuid=task_uuid, external=True)
|
||||
if not task_uuid:
|
||||
return {'error': 'Aborted by Crawler', 'task_uuid': task_uuid, 'capture_uuid': capture_uuid}, 400
|
||||
task = CrawlerTask(task_uuid)
|
||||
create_capture(capture_uuid, task_uuid)
|
||||
task.start()
|
||||
return {'uuid': capture_uuid}, 200
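The external-capture endpoint above expects the same basic task fields plus a pre-generated task UUID and the capture UUID returned by the crawler; an illustrative body (both UUIDs are placeholders):

```python
capture_request = {
    'url': 'http://example.onion',
    'task_uuid': 'b0a5f0c2-8f2e-4f43-9f5b-3d2a6c1e7d90',     # placeholder
    'capture_uuid': '4f6e2c1a-0d8b-4b7e-a1c3-9e5d7f2b6a84',  # placeholder
}
```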
|
||||
|
||||
###################################################################################
|
||||
###################################################################################
|
||||
|
@ -1515,14 +1842,6 @@ def create_item_id(item_dir, domain):
|
|||
UUID = domain+str(uuid.uuid4())
|
||||
return os.path.join(item_dir, UUID)
|
||||
|
||||
def save_har(har_dir, item_id, har_content):
|
||||
if not os.path.exists(har_dir):
|
||||
os.makedirs(har_dir)
|
||||
item_id = item_id.split('/')[-1]
|
||||
filename = os.path.join(har_dir, item_id + '.json')
|
||||
with open(filename, 'w') as f:
|
||||
f.write(json.dumps(har_content))
|
||||
|
||||
# # # # # # # # # # # #
|
||||
# #
|
||||
# CRAWLER MANAGER # TODO REFACTOR ME
|
||||
|
@ -1553,13 +1872,13 @@ class CrawlerProxy:
|
|||
self.uuid = proxy_uuid
|
||||
|
||||
def get_description(self):
|
||||
return r_crawler.hgrt(f'crawler:proxy:{self.uuif}', 'description')
|
||||
return r_crawler.hget(f'crawler:proxy:{self.uuid}', 'description')
|
||||
|
||||
# Host
|
||||
# Port
|
||||
# Type -> need test
|
||||
def get_url(self):
|
||||
return r_crawler.hgrt(f'crawler:proxy:{self.uuif}', 'url')
|
||||
return r_crawler.hget(f'crawler:proxy:{self.uuid}', 'url')
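A minimal sketch of the per-proxy hash layout implied by the getters above, assuming a redis-py style client; the proxy UUID and the field values are placeholders.

```python
import redis

r_crawler = redis.Redis(host='localhost', port=6379, decode_responses=True)
proxy_uuid = '3c9d2e51-1b64-4b8e-9a3f-7f0c2d5e8a11'  # placeholder

# one hash per proxy: crawler:proxy:<uuid> -> description, url, ...
r_crawler.hset(f'crawler:proxy:{proxy_uuid}', mapping={
    'description': 'local Tor SOCKS5 proxy',
    'url': 'socks5://127.0.0.1:9050',
})
print(r_crawler.hget(f'crawler:proxy:{proxy_uuid}', 'url'))
```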
|
||||
|
||||
#### CRAWLER LACUS ####
|
||||
|
||||
|
@ -1621,7 +1940,11 @@ def ping_lacus():
|
|||
ping = False
|
||||
req_error = {'error': 'Lacus URL undefined', 'status_code': 400}
|
||||
else:
|
||||
ping = lacus.is_up
|
||||
try:
|
||||
ping = lacus.is_up
|
||||
except:
|
||||
req_error = {'error': 'Failed to connect Lacus URL', 'status_code': 400}
|
||||
ping = False
|
||||
update_lacus_connection_status(ping, req_error=req_error)
|
||||
return ping
|
||||
|
||||
|
@ -1638,7 +1961,7 @@ def api_save_lacus_url_key(data):
|
|||
# unpack json
|
||||
manager_url = data.get('url', None)
|
||||
api_key = data.get('api_key', None)
|
||||
if not manager_url: # or not api_key:
|
||||
if not manager_url: # or not api_key:
|
||||
return {'status': 'error', 'reason': 'No url or API key supplied'}, 400
|
||||
# check if is valid url
|
||||
try:
|
||||
|
@ -1681,7 +2004,7 @@ def api_set_crawler_max_captures(data):
|
|||
save_nb_max_captures(nb_captures)
|
||||
return nb_captures, 200
|
||||
|
||||
## TEST ##
|
||||
## TEST ##
|
||||
|
||||
def is_test_ail_crawlers_successful():
|
||||
return r_db.hget('crawler:tor:test', 'success') == 'True'
|
||||
|
@ -1755,7 +2078,15 @@ def test_ail_crawlers():
|
|||
load_blacklist()
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# item = Item('crawled/2023/03/06/foo.bec50a87b5-0c21-4ed4-9cb2-2d717a7a6507')
|
||||
# content = item.get_content()
|
||||
# r = extract_author_from_html(content)
|
||||
# print(r)
|
||||
# delete_captures()
|
||||
|
||||
# item_id = 'crawled/2023/02/20/data.gz'
|
||||
# item = Item(item_id)
|
||||
# content = item.get_content()
|
||||
# temp_url = ''
|
||||
# r = extract_favicon_from_html(content, temp_url)
|
||||
# print(r)
|
||||
# _reprocess_all_hars_cookie_name()
|
||||
# _reprocess_all_hars_etag()
|
||||
# _gzip_all_hars()
|
||||
# _reprocess_all_hars_hhhashs()
|
||||
|
|
|
@ -129,7 +129,7 @@ def get_item_url(item_id):
|
|||
|
||||
def get_item_har(item_id):
|
||||
har = '/'.join(item_id.rsplit('/')[-4:])
|
||||
har = f'{har}.json'
|
||||
har = f'{har}.json.gz'
|
||||
path = os.path.join(ConfigLoader.get_hars_dir(), har)
|
||||
if os.path.isfile(path):
|
||||
return har
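Worked example of the path derivation above: the last four components of a crawled item id become the relative HAR path, now stored gzipped (the item id below is illustrative).

```python
item_id = 'crawled/2023/02/20/example.onion.abcd1234.gz'
har = '/'.join(item_id.rsplit('/')[-4:])
har = f'{har}.json.gz'
print(har)  # 2023/02/20/example.onion.abcd1234.gz.json.gz
```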
|
||||
|
@ -204,15 +204,22 @@ def _get_dir_source_name(directory, source_name=None, l_sources_name=set(), filt
|
|||
if not l_sources_name:
|
||||
l_sources_name = set()
|
||||
if source_name:
|
||||
l_dir = os.listdir(os.path.join(directory, source_name))
|
||||
path = os.path.join(directory, source_name)
|
||||
if os.path.isdir(path):
|
||||
l_dir = os.listdir(os.path.join(directory, source_name))
|
||||
else:
|
||||
l_dir = []
|
||||
else:
|
||||
l_dir = os.listdir(directory)
|
||||
# empty directory
|
||||
if not l_dir:
|
||||
return l_sources_name.add(source_name)
|
||||
if source_name:
|
||||
return l_sources_name.add(source_name)
|
||||
else:
|
||||
return l_sources_name
|
||||
else:
|
||||
for src_name in l_dir:
|
||||
if len(src_name) == 4:
|
||||
if len(src_name) == 4 and source_name:
|
||||
# try:
|
||||
int(src_name)
|
||||
to_add = os.path.join(source_name)
|
||||
|
|
|
@ -104,9 +104,13 @@ def _get_word_regex(word):
|
|||
|
||||
def convert_byte_offset_to_string(b_content, offset):
|
||||
byte_chunk = b_content[:offset + 1]
|
||||
string_chunk = byte_chunk.decode()
|
||||
offset = len(string_chunk) - 1
|
||||
return offset
|
||||
try:
|
||||
string_chunk = byte_chunk.decode()
|
||||
offset = len(string_chunk) - 1
|
||||
return offset
|
||||
except UnicodeDecodeError as e:
|
||||
logger.error(f'Yara offset converter error, {str(e)}\n{offset}/{len(b_content)}')
|
||||
return convert_byte_offset_to_string(b_content, offset - 1)
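Worked example of the conversion above: YARA reports byte offsets, while highlighting needs character offsets in the decoded text, and any multi-byte character before the match shifts the two apart.

```python
b_content = 'café leak'.encode('utf-8')   # 'é' takes two bytes in UTF-8
byte_offset = b_content.find(b'leak')     # 6
string_offset = len(b_content[:byte_offset + 1].decode()) - 1
print(string_offset, 'café leak'[string_offset])  # 5 l
```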
|
||||
|
||||
|
||||
# TODO RETRO HUNTS
|
||||
|
|
166
bin/lib/objects/ChatSubChannels.py
Executable file
|
@ -0,0 +1,166 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from datetime import datetime
|
||||
|
||||
from flask import url_for
|
||||
# from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib import ail_core
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects
|
||||
|
||||
from lib.data_retention_engine import update_obj_date
|
||||
from lib.objects import ail_objects
|
||||
from lib.timeline_engine import Timeline
|
||||
|
||||
from lib.correlations_engine import get_correlation_by_correl_type
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
config_loader = None
|
||||
|
||||
|
||||
################################################################################
|
||||
################################################################################
|
||||
################################################################################
|
||||
|
||||
class ChatSubChannel(AbstractChatObject):
|
||||
"""
|
||||
AIL Chat Object. (strings)
|
||||
"""
|
||||
|
||||
# ID -> <CHAT ID>/<SubChannel ID> subtype = chat_instance_uuid
|
||||
def __init__(self, id, subtype):
|
||||
super(ChatSubChannel, self).__init__('chat-subchannel', id, subtype)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self): # TODO
|
||||
# if self.subtype == 'telegram':
|
||||
# style = 'fab'
|
||||
# icon = '\uf2c6'
|
||||
# elif self.subtype == 'discord':
|
||||
# style = 'fab'
|
||||
# icon = '\uf099'
|
||||
# else:
|
||||
# style = 'fas'
|
||||
# icon = '\uf007'
|
||||
style = 'far'
|
||||
icon = '\uf086'
|
||||
return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}
|
||||
|
||||
# TODO TIME LAST MESSAGES
|
||||
|
||||
def get_meta(self, options=set(), translation_target=None):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
meta['name'] = self.get_name()
|
||||
if 'chat' in options:
|
||||
meta['chat'] = self.get_chat()
|
||||
if 'icon' in options:
|
||||
meta['icon'] = self.get_icon()
|
||||
meta['img'] = meta['icon']
|
||||
if 'nb_messages' in options:
|
||||
meta['nb_messages'] = self.get_nb_messages()
|
||||
if 'created_at' in options:
|
||||
meta['created_at'] = self.get_created_at(date=True)
|
||||
if 'threads' in options:
|
||||
meta['threads'] = self.get_threads()
|
||||
if 'participants' in options:
|
||||
meta['participants'] = self.get_participants()
|
||||
if 'nb_participants' in options:
|
||||
meta['nb_participants'] = self.get_nb_participants()
|
||||
if 'translation' in options and translation_target:
|
||||
meta['translation_name'] = self.translate(meta['name'], field='name', target=translation_target)
|
||||
return meta
|
||||
|
||||
def get_misp_object(self):
|
||||
# obj_attrs = []
|
||||
# if self.subtype == 'telegram':
|
||||
# obj = MISPObject('telegram-account', standalone=True)
|
||||
# obj_attrs.append(obj.add_attribute('username', value=self.id))
|
||||
#
|
||||
# elif self.subtype == 'twitter':
|
||||
# obj = MISPObject('twitter-account', standalone=True)
|
||||
# obj_attrs.append(obj.add_attribute('name', value=self.id))
|
||||
#
|
||||
# else:
|
||||
# obj = MISPObject('user-account', standalone=True)
|
||||
# obj_attrs.append(obj.add_attribute('username', value=self.id))
|
||||
#
|
||||
# first_seen = self.get_first_seen()
|
||||
# last_seen = self.get_last_seen()
|
||||
# if first_seen:
|
||||
# obj.first_seen = first_seen
|
||||
# if last_seen:
|
||||
# obj.last_seen = last_seen
|
||||
# if not first_seen or not last_seen:
|
||||
# self.logger.warning(
|
||||
# f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
#
|
||||
# for obj_attr in obj_attrs:
|
||||
# for tag in self.get_tags():
|
||||
# obj_attr.add_tag(tag)
|
||||
# return obj
|
||||
return
|
||||
|
||||
############################################################################
|
||||
############################################################################
|
||||
|
||||
# others optional metas, ... -> # TODO ALL meta in hset
|
||||
|
||||
def _get_timeline_name(self):
|
||||
return Timeline(self.get_global_id(), 'username')
|
||||
|
||||
def update_name(self, name, timestamp):
|
||||
self._get_timeline_name().add_timestamp(timestamp, name)
|
||||
|
||||
|
||||
# TODO # # # # # # # # # # #
|
||||
def get_users(self):
|
||||
pass
|
||||
|
||||
#### Categories ####
|
||||
|
||||
#### Threads ####
|
||||
|
||||
#### Messages #### TODO set parents
|
||||
|
||||
# def get_last_message_id(self):
|
||||
#
|
||||
# return r_object.hget(f'meta:{self.type}:{self.subtype}:{self.id}', 'last:message:id')
|
||||
|
||||
|
||||
class ChatSubChannels(AbstractChatObjects):
|
||||
def __init__(self):
|
||||
super().__init__('chat-subchannel')
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# chat = Chat('test', 'telegram')
|
||||
# r = chat.get_messages()
|
||||
# print(r)
|
120
bin/lib/objects/ChatThreads.py
Executable file
|
@ -0,0 +1,120 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from datetime import datetime
|
||||
|
||||
from flask import url_for
|
||||
# from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib import ail_core
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects
|
||||
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
config_loader = None
|
||||
|
||||
|
||||
################################################################################
|
||||
################################################################################
|
||||
################################################################################
|
||||
|
||||
class ChatThread(AbstractChatObject):
|
||||
"""
|
||||
AIL Chat Object. (strings)
|
||||
"""
|
||||
|
||||
def __init__(self, id, subtype):
|
||||
super().__init__('chat-thread', id, subtype)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self): # TODO
|
||||
# if self.subtype == 'telegram':
|
||||
# style = 'fab'
|
||||
# icon = '\uf2c6'
|
||||
# elif self.subtype == 'discord':
|
||||
# style = 'fab'
|
||||
# icon = '\uf099'
|
||||
# else:
|
||||
# style = 'fas'
|
||||
# icon = '\uf007'
|
||||
style = 'fas'
|
||||
icon = '\uf7a4'
|
||||
return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['subtype'] = self.subtype
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
if 'name':
|
||||
meta['name'] = self.get_name()
|
||||
if 'nb_messages':
|
||||
meta['nb_messages'] = self.get_nb_messages()
|
||||
if 'participants':
|
||||
meta['participants'] = self.get_participants()
|
||||
if 'nb_participants':
|
||||
meta['nb_participants'] = self.get_nb_participants()
|
||||
# created_at ???
|
||||
return meta
|
||||
|
||||
def get_misp_object(self):
|
||||
return
|
||||
|
||||
def create(self, container_obj, message_id):
|
||||
if message_id:
|
||||
parent_message = container_obj.get_obj_by_message_id(message_id)
|
||||
if parent_message: # TODO EXCEPTION IF DON'T EXISTS
|
||||
self.set_parent(obj_global_id=parent_message)
|
||||
_, _, parent_id = parent_message.split(':', 2)
|
||||
self.add_correlation('message', '', parent_id)
|
||||
else:
|
||||
self.set_parent(obj_global_id=container_obj.get_global_id())
|
||||
self.add_correlation(container_obj.get_type(), container_obj.get_subtype(r_str=True), container_obj.get_id())
|
||||
|
||||
def create(thread_id, chat_instance, chat_id, subchannel_id, message_id, container_obj):
|
||||
if container_obj.get_type() == 'chat':
|
||||
new_thread_id = f'{chat_id}/{thread_id}'
|
||||
# sub-channel
|
||||
else:
|
||||
new_thread_id = f'{chat_id}/{subchannel_id}/{thread_id}'
|
||||
|
||||
thread = ChatThread(new_thread_id, chat_instance)
|
||||
if not thread.is_children():
|
||||
thread.create(container_obj, message_id)
|
||||
return thread
|
||||
|
||||
class ChatThreads(AbstractChatObjects):
|
||||
def __init__(self):
|
||||
super().__init__('chat-thread')
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# chat = Chat('test', 'telegram')
|
||||
# r = chat.get_messages()
|
||||
# print(r)
|
216
bin/lib/objects/Chats.py
Executable file
|
@ -0,0 +1,216 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from datetime import datetime
|
||||
|
||||
from flask import url_for
|
||||
# from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib import ail_core
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_chat_object import AbstractChatObject, AbstractChatObjects
|
||||
|
||||
|
||||
from lib.objects.abstract_subtype_object import AbstractSubtypeObject, get_all_id
|
||||
from lib.data_retention_engine import update_obj_date
|
||||
from lib.objects import ail_objects
|
||||
from lib.timeline_engine import Timeline
|
||||
|
||||
from lib.correlations_engine import get_correlation_by_correl_type
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
config_loader = None
|
||||
|
||||
|
||||
################################################################################
|
||||
################################################################################
|
||||
################################################################################
|
||||
|
||||
class Chat(AbstractChatObject):
|
||||
"""
|
||||
AIL Chat Object.
|
||||
"""
|
||||
|
||||
def __init__(self, id, subtype):
|
||||
super(Chat, self).__init__('chat', id, subtype)
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self): # TODO
|
||||
# if self.subtype == 'telegram':
|
||||
# style = 'fab'
|
||||
# icon = '\uf2c6'
|
||||
# elif self.subtype == 'discord':
|
||||
# style = 'fab'
|
||||
# icon = '\uf099'
|
||||
# else:
|
||||
# style = 'fas'
|
||||
# icon = '\uf007'
|
||||
style = 'fas'
|
||||
icon = '\uf086'
|
||||
return {'style': style, 'icon': icon, 'color': '#4dffff', 'radius': 5}
|
||||
|
||||
def get_meta(self, options=set(), translation_target=None):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['name'] = self.get_name()
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
if 'icon' in options:
|
||||
meta['icon'] = self.get_icon()
|
||||
meta['img'] = meta['icon']
|
||||
if 'info' in options:
|
||||
meta['info'] = self.get_info()
|
||||
if 'translation' in options and translation_target:
|
||||
meta['translation_info'] = self.translate(meta['info'], field='info', target=translation_target)
|
||||
if 'participants' in options:
|
||||
meta['participants'] = self.get_participants()
|
||||
if 'nb_participants' in options:
|
||||
meta['nb_participants'] = self.get_nb_participants()
|
||||
if 'nb_messages' in options:
|
||||
meta['nb_messages'] = self.get_nb_messages()
|
||||
if 'username' in options:
|
||||
meta['username'] = self.get_username()
|
||||
if 'subchannels' in options:
|
||||
meta['subchannels'] = self.get_subchannels()
|
||||
if 'nb_subchannels':
|
||||
meta['nb_subchannels'] = self.get_nb_subchannels()
|
||||
if 'created_at':
|
||||
meta['created_at'] = self.get_created_at(date=True)
|
||||
if 'threads' in options:
|
||||
meta['threads'] = self.get_threads()
|
||||
if 'tags_safe' in options:
|
||||
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
|
||||
return meta
|
||||
|
||||
def get_misp_object(self):
|
||||
# obj_attrs = []
|
||||
# if self.subtype == 'telegram':
|
||||
# obj = MISPObject('telegram-account', standalone=True)
|
||||
# obj_attrs.append(obj.add_attribute('username', value=self.id))
|
||||
#
|
||||
# elif self.subtype == 'twitter':
|
||||
# obj = MISPObject('twitter-account', standalone=True)
|
||||
# obj_attrs.append(obj.add_attribute('name', value=self.id))
|
||||
#
|
||||
# else:
|
||||
# obj = MISPObject('user-account', standalone=True)
|
||||
# obj_attrs.append(obj.add_attribute('username', value=self.id))
|
||||
#
|
||||
# first_seen = self.get_first_seen()
|
||||
# last_seen = self.get_last_seen()
|
||||
# if first_seen:
|
||||
# obj.first_seen = first_seen
|
||||
# if last_seen:
|
||||
# obj.last_seen = last_seen
|
||||
# if not first_seen or not last_seen:
|
||||
# self.logger.warning(
|
||||
# f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
#
|
||||
# for obj_attr in obj_attrs:
|
||||
# for tag in self.get_tags():
|
||||
# obj_attr.add_tag(tag)
|
||||
# return obj
|
||||
return
|
||||
|
||||
############################################################################
|
||||
############################################################################
|
||||
|
||||
# users that send at least a message else participants/spectator
|
||||
# correlation created by messages
|
||||
def get_users(self):
|
||||
users = set()
|
||||
accounts = self.get_correlation('user-account').get('user-account', [])
|
||||
for account in accounts:
|
||||
users.add(account[1:])
|
||||
return users
|
||||
|
||||
def _get_timeline_username(self):
|
||||
return Timeline(self.get_global_id(), 'username')
|
||||
|
||||
def get_username(self):
|
||||
return self._get_timeline_username().get_last_obj_id()
|
||||
|
||||
def get_usernames(self):
|
||||
return self._get_timeline_username().get_objs_ids()
|
||||
|
||||
def update_username_timeline(self, username_global_id, timestamp):
|
||||
self._get_timeline_username().add_timestamp(timestamp, username_global_id)
|
||||
|
||||
#### ChatSubChannels ####
|
||||
|
||||
|
||||
#### Categories ####
|
||||
|
||||
#### Threads ####
|
||||
|
||||
#### Messages #### TODO set parents
|
||||
|
||||
# def get_last_message_id(self):
|
||||
#
|
||||
# return r_object.hget(f'meta:{self.type}:{self.subtype}:{self.id}', 'last:message:id')
|
||||
|
||||
# def add(self, timestamp, obj_id, mess_id=0, username=None, user_id=None):
|
||||
# date = # TODO get date from object
|
||||
# self.update_daterange(date)
|
||||
# update_obj_date(date, self.type, self.subtype)
|
||||
#
|
||||
#
|
||||
# # daily
|
||||
# r_object.hincrby(f'{self.type}:{self.subtype}:{date}', self.id, 1)
|
||||
# # all subtypes
|
||||
# r_object.zincrby(f'{self.type}_all:{self.subtype}', 1, self.id)
|
||||
#
|
||||
# #######################################################################
|
||||
# #######################################################################
|
||||
#
|
||||
# # Correlations
|
||||
# self.add_correlation('item', '', item_id)
|
||||
# # domain
|
||||
# if is_crawled(item_id):
|
||||
# domain = get_item_domain(item_id)
|
||||
# self.add_correlation('domain', '', domain)
|
||||
|
||||
# importer -> use cache for previous reply SET to_add_id: previously_imported : expire SET key -> 30 mn
|
||||
|
||||
|
||||
class Chats(AbstractChatObjects):
|
||||
def __init__(self):
|
||||
super().__init__('chat')
|
||||
|
||||
# TODO factorize
|
||||
def get_all_subtypes():
|
||||
return ail_core.get_object_all_subtypes('chat')
|
||||
|
||||
def get_all():
|
||||
objs = {}
|
||||
for subtype in get_all_subtypes():
|
||||
objs[subtype] = get_all_by_subtype(subtype)
|
||||
return objs
|
||||
|
||||
def get_all_by_subtype(subtype):
|
||||
return get_all_id('chat', subtype)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
chat = Chat('test', 'telegram')
|
||||
r = chat.get_messages()
|
||||
print(r)
|
118
bin/lib/objects/CookiesNames.py
Executable file
|
@ -0,0 +1,118 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from hashlib import sha256
|
||||
from flask import url_for
|
||||
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
config_loader = None
|
||||
|
||||
# TODO NEW ABSTRACT OBJECT -> daterange for all objects ????
|
||||
|
||||
class CookieName(AbstractDaterangeObject):
|
||||
"""
|
||||
AIL CookieName Object.
|
||||
"""
|
||||
|
||||
def __init__(self, obj_id):
|
||||
super(CookieName, self).__init__('cookie-name', obj_id)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_content(self, r_type='str'):
|
||||
if r_type == 'str':
|
||||
return self._get_field('content')
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
|
||||
return url
|
||||
|
||||
# TODO # CHANGE COLOR
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'fas', 'icon': '\uf564', 'color': '#BFD677', 'radius': 5} # f563
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
obj = MISPObject('cookie')
|
||||
first_seen = self.get_first_seen()
|
||||
last_seen = self.get_last_seen()
|
||||
if first_seen:
|
||||
obj.first_seen = first_seen
|
||||
if last_seen:
|
||||
obj.last_seen = last_seen
|
||||
if not first_seen or not last_seen:
|
||||
self.logger.warning(
|
||||
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
|
||||
obj_attrs.append(obj.add_attribute('cookie-name', value=self.get_content()))
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_nb_seen(self):
|
||||
return self.get_nb_correlation('domain')
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
meta['content'] = self.get_content()
|
||||
return meta
|
||||
|
||||
def create(self, content, _first_seen=None, _last_seen=None):
|
||||
if not isinstance(content, str):
|
||||
content = content.decode()
|
||||
self._set_field('content', content)
|
||||
self._create()
|
||||
|
||||
|
||||
def create(content):
|
||||
if isinstance(content, str):
|
||||
content = content.encode()
|
||||
obj_id = sha256(content).hexdigest()
|
||||
cookie = CookieName(obj_id)
|
||||
if not cookie.exists():
|
||||
cookie.create(content)
|
||||
return cookie
|
||||
|
||||
|
||||
class CookiesNames(AbstractDaterangeObjects):
|
||||
"""
|
||||
CookieName Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('cookie-name', CookieName)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search # TODO
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '98'
|
||||
# print(search_cves_by_name(name_to_search))
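For reference, a cookie-name object ID is derived exactly as in create() above: the sha256 hex digest of the encoded cookie name.

```python
from hashlib import sha256

name = 'PHPSESSID'
obj_id = sha256(name.encode()).hexdigest()
print(obj_id)  # the same cookie name always maps to the same cookie-name object
```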
|
|
@ -79,9 +79,6 @@ class Cve(AbstractDaterangeObject):
|
|||
meta['tags'] = self.get_tags(r_list=True)
|
||||
return meta
|
||||
|
||||
def add(self, date, item_id):
|
||||
self._add(date, item_id)
|
||||
|
||||
def get_cve_search(self):
|
||||
try:
|
||||
response = requests.get(f'https://cvepremium.circl.lu/api/cve/{self.id}', timeout=10)
|
||||
|
|
|
@ -111,13 +111,25 @@ class Decoded(AbstractDaterangeObject):
|
|||
def get_rel_path(self, mimetype=None):
|
||||
if not mimetype:
|
||||
mimetype = self.get_mimetype()
|
||||
if not mimetype:
|
||||
self.logger.warning(f'Decoded {self.id}: Empty mimetype')
|
||||
return None
|
||||
return os.path.join(HASH_DIR, mimetype, self.id[0:2], self.id)
|
||||
|
||||
def get_filepath(self, mimetype=None):
|
||||
return os.path.join(os.environ['AIL_HOME'], self.get_rel_path(mimetype=mimetype))
|
||||
rel_path = self.get_rel_path(mimetype=mimetype)
|
||||
if not rel_path:
|
||||
return None
|
||||
else:
|
||||
return os.path.join(os.environ['AIL_HOME'], rel_path)
|
||||
|
||||
def get_content(self, mimetype=None, r_type='str'):
|
||||
filepath = self.get_filepath(mimetype=mimetype)
|
||||
if not filepath:
|
||||
if r_type == 'str':
|
||||
return ''
|
||||
else:
|
||||
return b''
|
||||
if r_type == 'str':
|
||||
with open(filepath, 'r') as f:
|
||||
content = f.read()
|
||||
|
@ -126,7 +138,7 @@ class Decoded(AbstractDaterangeObject):
|
|||
with open(filepath, 'rb') as f:
|
||||
content = f.read()
|
||||
return content
|
||||
elif r_str == 'bytesio':
|
||||
elif r_type == 'bytesio':
|
||||
with open(filepath, 'rb') as f:
|
||||
content = BytesIO(f.read())
|
||||
return content
|
||||
|
@ -137,7 +149,7 @@ class Decoded(AbstractDaterangeObject):
|
|||
with zipfile.ZipFile(zip_content, "w") as zf:
|
||||
# TODO: Fix password
|
||||
# zf.setpassword(b"infected")
|
||||
zf.writestr(self.id, self.get_content().getvalue())
|
||||
zf.writestr(self.id, self.get_content(r_type='bytesio').getvalue())
|
||||
zip_content.seek(0)
|
||||
return zip_content
|
||||
|
||||
|
@ -227,8 +239,8 @@ class Decoded(AbstractDaterangeObject):
|
|||
|
||||
return True
|
||||
|
||||
def add(self, algo_name, date, obj_id, mimetype=None):
|
||||
self._add(date, obj_id)
|
||||
def add(self, date, obj, algo_name, mimetype=None):
|
||||
self._add(date, obj)
|
||||
if not mimetype:
|
||||
mimetype = self.get_mimetype()
|
||||
|
||||
|
@ -442,13 +454,13 @@ def get_all_decodeds_objects(filters={}):
|
|||
if i >= len(files):
|
||||
files = []
|
||||
for file in files:
|
||||
yield Decoded(file).id
|
||||
yield Decoded(file)
|
||||
|
||||
|
||||
############################################################################
|
||||
|
||||
def sanityze_decoder_names(decoder_name):
|
||||
if decoder_name not in Decodeds.get_algos():
|
||||
if decoder_name not in get_algos():
|
||||
return None
|
||||
else:
|
||||
return decoder_name
|
||||
|
|
|
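Sketch of the on-disk layout implied by Decoded.get_rel_path() above; the hash directory and the object id are placeholders.

```python
import os

HASH_DIR = 'HASHS'                                       # placeholder for the configured hash dir
decoded_id = 'a9993e364706816aba3e25717850c26c9cd0d89d'  # placeholder hash-style id
mimetype = 'application/zip'
print(os.path.join(HASH_DIR, mimetype, decoded_id[0:2], decoded_id))
# HASHS/application/zip/a9/a9993e364706816aba3e25717850c26c9cd0d89d
```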
@ -389,10 +389,10 @@ class Domain(AbstractObject):
|
|||
har = get_item_har(item_id)
|
||||
if har:
|
||||
print(har)
|
||||
_write_in_zip_buffer(zf, os.path.join(hars_dir, har), f'{basename}.json')
|
||||
_write_in_zip_buffer(zf, os.path.join(hars_dir, har), f'{basename}.json.gz')
|
||||
# Screenshot
|
||||
screenshot = self._get_external_correlation('item', '', item_id, 'screenshot')
|
||||
if screenshot:
|
||||
if screenshot and screenshot['screenshot']:
|
||||
screenshot = screenshot['screenshot'].pop()[1:]
|
||||
screenshot = os.path.join(screenshot[0:2], screenshot[2:4], screenshot[4:6], screenshot[6:8],
|
||||
screenshot[8:10], screenshot[10:12], screenshot[12:])
|
||||
|
@ -595,21 +595,22 @@ def get_domains_up_by_filers(domain_types, date_from=None, date_to=None, tags=[]
|
|||
return None
|
||||
|
||||
def sanitize_domain_name_to_search(name_to_search, domain_type):
|
||||
if not name_to_search:
|
||||
return ""
|
||||
if domain_type == 'onion':
|
||||
r_name = r'[a-z0-9\.]+'
|
||||
else:
|
||||
r_name = r'[a-zA-Z0-9-_\.]+'
|
||||
# invalid domain name
|
||||
if not re.fullmatch(r_name, name_to_search):
|
||||
res = re.match(r_name, name_to_search)
|
||||
return {'search': name_to_search, 'error': res.string.replace( res[0], '')}
|
||||
return ""
|
||||
return name_to_search.replace('.', '\.')
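Worked example of the sanitisation above: the search term must match the allowed character set, and dots are escaped before the term is compiled as a regex by the caller below.

```python
import re

name_to_search = 'circl.lu'
r_name = r'[a-zA-Z0-9-_\.]+'                       # pattern for regular (non-onion) domains
assert re.fullmatch(r_name, name_to_search)
pattern = re.compile(name_to_search.replace('.', r'\.'))
print(bool(pattern.search('www.circl.lu')))        # True
print(bool(pattern.search('circlXlu.example')))    # False
```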
|
||||
|
||||
def search_domain_by_name(name_to_search, domain_types, r_pos=False):
|
||||
domains = {}
|
||||
for domain_type in domain_types:
|
||||
r_name = sanitize_domain_name_to_search(name_to_search, domain_type)
|
||||
if not name_to_search or isinstance(r_name, dict):
|
||||
if not r_name:
|
||||
break
|
||||
r_name = re.compile(r_name)
|
||||
for domain in get_domains_up_by_type(domain_type):
|
||||
|
|
118
bin/lib/objects/Etags.py
Executable file
|
@ -0,0 +1,118 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from hashlib import sha256
|
||||
from flask import url_for
|
||||
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
config_loader = None
|
||||
|
||||
# TODO NEW ABSTRACT OBJECT -> daterange for all objects ????
|
||||
|
||||
class Etag(AbstractDaterangeObject):
|
||||
"""
|
||||
AIL Etag Object.
|
||||
"""
|
||||
|
||||
def __init__(self, obj_id):
|
||||
super(Etag, self).__init__('etag', obj_id)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_content(self, r_type='str'):
|
||||
if r_type == 'str':
|
||||
return self._get_field('content')
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
|
||||
return url
|
||||
|
||||
# TODO # CHANGE COLOR
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'fas', 'icon': '\uf02b', 'color': '#556F65', 'radius': 5}
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
obj = MISPObject('etag')
|
||||
first_seen = self.get_first_seen()
|
||||
last_seen = self.get_last_seen()
|
||||
if first_seen:
|
||||
obj.first_seen = first_seen
|
||||
if last_seen:
|
||||
obj.last_seen = last_seen
|
||||
if not first_seen or not last_seen:
|
||||
self.logger.warning(
|
||||
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
|
||||
obj_attrs.append(obj.add_attribute('etag', value=self.get_content()))
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_nb_seen(self):
|
||||
return self.get_nb_correlation('domain')
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
meta['content'] = self.get_content()
|
||||
return meta
|
||||
|
||||
def create(self, content, _first_seen=None, _last_seen=None):
|
||||
if not isinstance(content, str):
|
||||
content = content.decode()
|
||||
self._set_field('content', content)
|
||||
self._create()
|
||||
|
||||
|
||||
def create(content):
|
||||
if isinstance(content, str):
|
||||
content = content.encode()
|
||||
obj_id = sha256(content).hexdigest()
|
||||
etag = Etag(obj_id)
|
||||
if not etag.exists():
|
||||
etag.create(content)
|
||||
return etag
|
||||
|
||||
|
||||
class Etags(AbstractDaterangeObjects):
|
||||
"""
|
||||
Etags Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('etag', Etag)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search # TODO
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '98'
|
||||
# print(search_cves_by_name(name_to_search))
|
118
bin/lib/objects/Favicons.py
Executable file
|
@ -0,0 +1,118 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import mmh3
|
||||
import os
|
||||
import sys
|
||||
|
||||
from flask import url_for
|
||||
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
config_loader = None
|
||||
|
||||
|
||||
class Favicon(AbstractDaterangeObject):
|
||||
"""
|
||||
AIL Favicon Object.
|
||||
"""
|
||||
|
||||
def __init__(self, id):
|
||||
super(Favicon, self).__init__('favicon', id)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_content(self, r_type='str'):
|
||||
if r_type == 'str':
|
||||
return self._get_field('content')
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
|
||||
return url
|
||||
|
||||
# TODO # CHANGE COLOR
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'fas', 'icon': '\uf20a', 'color': '#1E88E5', 'radius': 5} # f0c8 f45c
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
obj = MISPObject('favicon')
|
||||
first_seen = self.get_first_seen()
|
||||
last_seen = self.get_last_seen()
|
||||
if first_seen:
|
||||
obj.first_seen = first_seen
|
||||
if last_seen:
|
||||
obj.last_seen = last_seen
|
||||
if not first_seen or not last_seen:
|
||||
self.logger.warning(
|
||||
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
|
||||
obj_attrs.append(obj.add_attribute('favicon-mmh3', value=self.id))
|
||||
obj_attrs.append(obj.add_attribute('favicon', value=self.get_content(r_type='bytes')))
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
if 'content' in options:
|
||||
meta['content'] = self.get_content()
|
||||
return meta
|
||||
|
||||
# def get_links(self):
|
||||
# # TODO GET ALL URLS FROM CORRELATED ITEMS
|
||||
|
||||
def create(self, content, _first_seen=None, _last_seen=None):
|
||||
if not isinstance(content, str):
|
||||
content = content.decode()
|
||||
self._set_field('content', content)
|
||||
self._create()
|
||||
|
||||
|
||||
def create_favicon(content, url=None): # TODO URL ????
|
||||
if isinstance(content, str):
|
||||
content = content.encode()
|
||||
favicon_id = mmh3.hash_bytes(content)
|
||||
favicon = Favicon(favicon_id)
|
||||
if not favicon.exists():
|
||||
favicon.create(content)
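Sketch of the favicon ID derivation above, assuming the mmh3 package imported by this module; the favicon bytes are placeholders.

```python
import mmh3

favicon_bytes = b'\x00\x00\x01\x00 placeholder favicon bytes'
favicon_id = mmh3.hash_bytes(favicon_bytes)   # 16-byte MurmurHash3 digest used as the object id
print(favicon_id.hex())
```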
|
||||
|
||||
|
||||
class Favicons(AbstractDaterangeObjects):
|
||||
"""
|
||||
Favicons Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('favicon', Favicon)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search # TODO
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '98'
|
||||
# print(search_cves_by_name(name_to_search))
|
101
bin/lib/objects/FilesNames.py
Executable file
|
@ -0,0 +1,101 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from flask import url_for
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
config_loader = None
|
||||
|
||||
|
||||
class FileName(AbstractDaterangeObject):
|
||||
"""
|
||||
AIL FileName Object. (strings)
|
||||
"""
|
||||
|
||||
# ID = SHA256
|
||||
def __init__(self, name):
|
||||
super().__init__('file-name', name)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'far', 'icon': '\uf249', 'color': '#36F5D5', 'radius': 5}
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
obj = MISPObject('file')
|
||||
|
||||
# obj_attrs.append(obj.add_attribute('sha256', value=self.id))
|
||||
# obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
if 'tags_safe' in options:
|
||||
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
|
||||
return meta
|
||||
|
||||
def create(self): # create ALL SET ??????
|
||||
pass
|
||||
|
||||
def add_reference(self, date, src_ail_object, file_obj=None):
|
||||
self.add(date, src_ail_object)
|
||||
if file_obj:
|
||||
self.add_correlation(file_obj.type, file_obj.get_subtype(r_str=True), file_obj.get_id())
|
||||
|
||||
# TODO USE ZSET FOR ALL OBJS IDS ??????
|
||||
|
||||
class FilesNames(AbstractDaterangeObjects):
|
||||
"""
|
||||
CookieName Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('file-name', FileName)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search
|
||||
|
||||
# TODO sanitize file name
|
||||
def create(self, name, date, src_ail_object, file_obj=None, limit=500, force=False):
|
||||
if 0 < len(name) <= limit or force or limit < 0:
|
||||
file_name = self.obj_class(name)
|
||||
# if not file_name.exists():
|
||||
# file_name.create()
|
||||
file_name.add_reference(date, src_ail_object, file_obj=file_obj)
|
||||
return file_name
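For reference, the length guard in create() above behaves like this minimal stand-in (same condition, hypothetical helper name):

```python
def accept_file_name(name, limit=500, force=False):
    # names longer than `limit` are ignored unless forced or the limit is disabled (< 0)
    return 0 < len(name) <= limit or force or limit < 0

print(accept_file_name('report.pdf'))            # True
print(accept_file_name('a' * 600))               # False
print(accept_file_name('a' * 600, force=True))   # True
```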
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '29ba'
|
||||
# print(search_screenshots_by_name(name_to_search))
|
135
bin/lib/objects/HHHashs.py
Executable file
|
@ -0,0 +1,135 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import hashlib
|
||||
import os
|
||||
import sys
|
||||
|
||||
from flask import url_for
|
||||
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_objects = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
config_loader = None
|
||||
|
||||
|
||||
class HHHash(AbstractDaterangeObject):
|
||||
"""
|
||||
AIL HHHash Object.
|
||||
"""
|
||||
|
||||
def __init__(self, obj_id):
|
||||
super(HHHash, self).__init__('hhhash', obj_id)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_content(self, r_type='str'):
|
||||
if r_type == 'str':
|
||||
return self._get_field('content')
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
|
||||
return url
|
||||
|
||||
# TODO # CHANGE COLOR
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'fas', 'icon': '\uf036', 'color': '#71D090', 'radius': 5}
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
obj = MISPObject('hhhash')
|
||||
first_seen = self.get_first_seen()
|
||||
last_seen = self.get_last_seen()
|
||||
if first_seen:
|
||||
obj.first_seen = first_seen
|
||||
if last_seen:
|
||||
obj.last_seen = last_seen
|
||||
if not first_seen or not last_seen:
|
||||
self.logger.warning(
|
||||
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
|
||||
obj_attrs.append(obj.add_attribute('hhhash', value=self.get_id()))
|
||||
obj_attrs.append(obj.add_attribute('hhhash-headers', value=self.get_content()))
|
||||
obj_attrs.append(obj.add_attribute('hhhash-tool', value='lacus'))
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_nb_seen(self):
|
||||
return self.get_nb_correlation('domain')
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
meta['content'] = self.get_content()
|
||||
return meta
|
||||
|
||||
def create(self, hhhash_header, _first_seen=None, _last_seen=None): # TODO CREATE ADD FUNCTION -> urls set
|
||||
self._set_field('content', hhhash_header)
|
||||
self._create()
|
||||
|
||||
|
||||
def create(hhhash_header, hhhash=None):
|
||||
if not hhhash:
|
||||
hhhash = hhhash_headers(hhhash_header)
|
||||
hhhash = HHHash(hhhash)
|
||||
if not hhhash.exists():
|
||||
hhhash.create(hhhash_header)
|
||||
return hhhash
|
||||
|
||||
def build_hhhash_headers(dict_headers): # filter_dup=True
|
||||
hhhash = ''
|
||||
previous_header = ''
|
||||
for header in dict_headers:
|
||||
header_name = header.get('name')
|
||||
if header_name:
|
||||
if header_name != previous_header: # remove dup headers, filter playwright invalid splitting
|
||||
hhhash = f'{hhhash}:{header_name}'
|
||||
previous_header = header_name
|
||||
hhhash = hhhash[1:]
|
||||
# print(hhhash)
|
||||
return hhhash
|
||||
|
||||
def hhhash_headers(header_hhhash):
|
||||
m = hashlib.sha256()
|
||||
m.update(header_hhhash.encode())
|
||||
digest = m.hexdigest()
|
||||
return f"hhh:1:{digest}"
|
||||
|
||||
|
||||
class HHHashs(AbstractDaterangeObjects):
|
||||
"""
|
||||
HHHashs Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('hhhash', HHHash)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search # TODO
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '98'
|
||||
# print(search_cves_by_name(name_to_search))
|
135
bin/lib/objects/Images.py
Executable file
|
@ -0,0 +1,135 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import base64
|
||||
import os
|
||||
import sys
|
||||
|
||||
from hashlib import sha256
|
||||
from io import BytesIO
|
||||
|
||||
from flask import url_for
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_daterange_object import AbstractDaterangeObject, AbstractDaterangeObjects
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_serv_metadata = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
IMAGE_FOLDER = config_loader.get_files_directory('images')
|
||||
config_loader = None
|
||||
|
||||
|
||||
class Image(AbstractDaterangeObject):
|
||||
"""
|
||||
AIL Screenshot Object. (strings)
|
||||
"""
|
||||
|
||||
# ID = SHA256
|
||||
def __init__(self, image_id):
|
||||
super(Image, self).__init__('image', image_id)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def exists(self):
|
||||
return os.path.isfile(self.get_filepath())
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'far', 'icon': '\uf03e', 'color': '#E1F5DF', 'radius': 5}
|
||||
|
||||
def get_rel_path(self):
|
||||
rel_path = os.path.join(self.id[0:2], self.id[2:4], self.id[4:6], self.id[6:8], self.id[8:10], self.id[10:12], self.id[12:])
|
||||
return rel_path
|
||||
|
||||
def get_filepath(self):
|
||||
filename = os.path.join(IMAGE_FOLDER, self.get_rel_path())
|
||||
return os.path.realpath(filename)
|
||||
|
||||
def get_file_content(self):
|
||||
filepath = self.get_filepath()
|
||||
with open(filepath, 'rb') as f:
|
||||
file_content = BytesIO(f.read())
|
||||
return file_content
|
||||
|
||||
def get_content(self, r_type='str'):
|
||||
return self.get_file_content()
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
obj = MISPObject('file')
|
||||
|
||||
obj_attrs.append(obj.add_attribute('sha256', value=self.id))
|
||||
obj_attrs.append(obj.add_attribute('attachment', value=self.id, data=self.get_file_content()))
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['img'] = self.id
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
if 'content' in options:
|
||||
meta['content'] = self.get_content()
|
||||
if 'tags_safe' in options:
|
||||
meta['tags_safe'] = self.is_tags_safe(meta['tags'])
|
||||
return meta
|
||||
|
||||
def create(self, content):
|
||||
filepath = self.get_filepath()
|
||||
dirname = os.path.dirname(filepath)
|
||||
if not os.path.exists(dirname):
|
||||
os.makedirs(dirname)
|
||||
with open(filepath, 'wb') as f:
|
||||
f.write(content)
|
||||
|
||||
def get_screenshot_dir():
|
||||
return IMAGE_FOLDER
|
||||
|
||||
|
||||
def create(content, size_limit=5000000, b64=False, force=False):
|
||||
size = (len(content)*3) / 4
|
||||
if size <= size_limit or size_limit < 0 or force:
|
||||
if b64:
|
||||
content = base64.standard_b64decode(content.encode())
|
||||
image_id = sha256(content).hexdigest()
|
||||
image = Image(image_id)
|
||||
if not image.exists():
|
||||
image.create(content)
|
||||
return image
|
||||
|
||||
|
||||
class Images(AbstractDaterangeObjects):
|
||||
"""
|
||||
CookieName Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('image', Image)
|
||||
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search # TODO
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# name_to_search = '29ba'
|
||||
# print(search_screenshots_by_name(name_to_search))
|
|
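Sketch of the on-disk image layout implied by get_rel_path() above: the sha256 hex digest of the image content is split into two-character directories.

```python
import os
from hashlib import sha256

content = b'placeholder image bytes'
image_id = sha256(content).hexdigest()
rel_path = os.path.join(image_id[0:2], image_id[2:4], image_id[4:6], image_id[6:8],
                        image_id[8:10], image_id[10:12], image_id[12:])
print(rel_path)
```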
@ -7,10 +7,10 @@ import magic
|
|||
import os
|
||||
import re
|
||||
import sys
|
||||
import cld3
|
||||
import html2text
|
||||
|
||||
from io import BytesIO
|
||||
from uuid import uuid4
|
||||
|
||||
from pymisp import MISPObject
|
||||
|
||||
|
@ -18,10 +18,11 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ail_core import get_ail_uuid
|
||||
from lib.ail_core import get_ail_uuid, rreplace
|
||||
from lib.objects.abstract_object import AbstractObject
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib import item_basic
|
||||
from lib.Language import LanguagesDetector
|
||||
from lib.data_retention_engine import update_obj_date, get_obj_date_first
|
||||
from packages import Date
|
||||
|
||||
|
@ -137,9 +138,23 @@ class Item(AbstractObject):
|
|||
####################################################################################
|
||||
####################################################################################
|
||||
|
||||
def sanitize_id(self):
|
||||
pass
|
||||
# TODO ADD function to check if ITEM (content + file) already exists
|
||||
|
||||
def sanitize_id(self):
|
||||
if ITEMS_FOLDER in self.id:
|
||||
self.id = self.id.replace(ITEMS_FOLDER, '', 1)
|
||||
|
||||
# limit filename length
|
||||
basename = self.get_basename()
|
||||
if len(basename) > 255:
|
||||
new_basename = f'{basename[:215]}{str(uuid4())}.gz'
|
||||
self.id = rreplace(self.id, basename, new_basename, 1)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
return self.id
|
||||
|
||||
# # TODO: sanitize_id
|
||||
# # TODO: check if already exists ?
|
||||
|
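Worked example of the basename cap in sanitize_id() above: over-long crawled filenames are truncated and suffixed with a UUID so the id stays below the usual 255-character filename limit.

```python
from uuid import uuid4

basename = 'a' * 300 + '.gz'
if len(basename) > 255:
    basename = f'{basename[:215]}{uuid4()}.gz'
print(len(basename))  # 254 (215 + 36-character UUID + '.gz')
```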
@ -264,10 +279,9 @@ class Item(AbstractObject):
|
|||
"""
|
||||
if options is None:
|
||||
options = set()
|
||||
meta = {'id': self.id,
|
||||
'date': self.get_date(separator=True),
|
||||
'source': self.get_source(),
|
||||
'tags': self.get_tags(r_list=True)}
|
||||
meta = self.get_default_meta(tags=True)
|
||||
meta['date'] = self.get_date(separator=True)
|
||||
meta['source'] = self.get_source()
|
||||
# optional meta fields
|
||||
if 'content' in options:
|
||||
meta['content'] = self.get_content()
|
||||
|
@@ -289,6 +303,8 @@ class Item(AbstractObject):
|
|||
meta['mimetype'] = self.get_mimetype(content=content)
|
||||
if 'investigations' in options:
|
||||
meta['investigations'] = self.get_investigations()
|
||||
if 'link' in options:
|
||||
meta['link'] = self.get_link(flask_context=True)
|
||||
|
||||
# meta['encoding'] = None
|
||||
return meta
|
||||
|
@@ -322,21 +338,10 @@ class Item(AbstractObject):
|
|||
nb_line += 1
|
||||
return {'nb': nb_line, 'max_length': max_length}
|
||||
|
||||
def get_languages(self, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7):
|
||||
all_languages = []
|
||||
## CLEAN CONTENT ##
|
||||
content = self.get_html2text_content(ignore_links=True)
|
||||
content = remove_all_urls_from_content(self.id, item_content=content) ##########################################
|
||||
# REMOVE USELESS SPACE
|
||||
content = ' '.join(content.split())
|
||||
#- CLEAN CONTENT -#
|
||||
#print(content)
|
||||
#print(len(content))
|
||||
if len(content) >= min_len: # # TODO: # FIXME: check num langs limit
|
||||
for lang in cld3.get_frequent_languages(content, num_langs=num_langs):
|
||||
if lang.proportion >= min_proportion and lang.probability >= min_probability and lang.is_reliable:
|
||||
all_languages.append(lang)
|
||||
return all_languages
|
||||
# TODO RENAME ME
|
||||
def get_languages(self, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7, force_gcld3=False):
|
||||
ld = LanguagesDetector(nb_langs=num_langs, min_proportion=min_proportion, min_probability=min_probability, min_len=min_len)
|
||||
return ld.detect(self.get_content(), force_gcld3=force_gcld3)
|
||||
|
||||
def get_mimetype(self, content=None):
|
||||
if not content:
|
||||
|
@@ -482,7 +487,10 @@ def get_all_items_objects(filters={}):
|
|||
daterange = Date.get_daterange(date_from, date_to)
|
||||
else:
|
||||
date_from = get_obj_date_first('item')
|
||||
daterange = Date.get_daterange(date_from, Date.get_today_date_str())
|
||||
if date_from:
|
||||
daterange = Date.get_daterange(date_from, Date.get_today_date_str())
|
||||
else:
|
||||
daterange = []
|
||||
if start_date:
|
||||
if int(start_date) > int(date_from):
|
||||
i = 0
|
||||
|
@@ -621,61 +629,6 @@ def get_item_metadata(item_id, item_content=None):
|
|||
def get_item_content(item_id):
|
||||
return item_basic.get_item_content(item_id)
|
||||
|
||||
def get_item_content_html2text(item_id, item_content=None, ignore_links=False):
|
||||
if not item_content:
|
||||
item_content = get_item_content(item_id)
|
||||
h = html2text.HTML2Text()
|
||||
h.ignore_links = ignore_links
|
||||
h.ignore_images = ignore_links
|
||||
return h.handle(item_content)
|
||||
|
||||
def remove_all_urls_from_content(item_id, item_content=None):
|
||||
if not item_content:
|
||||
item_content = get_item_content(item_id)
|
||||
regex = r'\b(?:http://|https://)?(?:[a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)(?:\:[0-9]+)*(?:/(?:$|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*\b'
|
||||
url_regex = re.compile(regex)
|
||||
urls = url_regex.findall(item_content)
|
||||
urls = sorted(urls, key=len, reverse=True)
|
||||
for url in urls:
|
||||
item_content = item_content.replace(url, '')
|
||||
|
||||
regex_pgp_public_blocs = r'-----BEGIN PGP PUBLIC KEY BLOCK-----[\s\S]+?-----END PGP PUBLIC KEY BLOCK-----'
|
||||
regex_pgp_signature = r'-----BEGIN PGP SIGNATURE-----[\s\S]+?-----END PGP SIGNATURE-----'
|
||||
regex_pgp_message = r'-----BEGIN PGP MESSAGE-----[\s\S]+?-----END PGP MESSAGE-----'
|
||||
re.compile(regex_pgp_public_blocs)
|
||||
re.compile(regex_pgp_signature)
|
||||
re.compile(regex_pgp_message)
|
||||
|
||||
res = re.findall(regex_pgp_public_blocs, item_content)
|
||||
for it in res:
|
||||
item_content = item_content.replace(it, '')
|
||||
res = re.findall(regex_pgp_signature, item_content)
|
||||
for it in res:
|
||||
item_content = item_content.replace(it, '')
|
||||
res = re.findall(regex_pgp_message, item_content)
|
||||
for it in res:
|
||||
item_content = item_content.replace(it, '')
|
||||
|
||||
return item_content
|
||||
|
||||
def get_item_languages(item_id, min_len=600, num_langs=3, min_proportion=0.2, min_probability=0.7):
|
||||
all_languages = []
|
||||
|
||||
## CLEAN CONTENT ##
|
||||
content = get_item_content_html2text(item_id, ignore_links=True)
|
||||
content = remove_all_urls_from_content(item_id, item_content=content)
|
||||
|
||||
# REMOVE USELESS SPACE
|
||||
content = ' '.join(content.split())
|
||||
#- CLEAN CONTENT -#
|
||||
|
||||
#print(content)
|
||||
#print(len(content))
|
||||
if len(content) >= min_len:
|
||||
for lang in cld3.get_frequent_languages(content, num_langs=num_langs):
|
||||
if lang.proportion >= min_proportion and lang.probability >= min_probability and lang.is_reliable:
|
||||
all_languages.append(lang)
|
||||
return all_languages
|
||||
|
||||
# API
|
||||
# def get_item(request_dict):
|
||||
|
@@ -926,13 +879,13 @@ def create_item(obj_id, obj_metadata, io_content):
|
|||
# delete_item(child_id)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
# if __name__ == '__main__':
|
||||
# content = 'test file content'
|
||||
# duplicates = {'tests/2020/01/02/test.gz': [{'algo':'ssdeep', 'similarity':75}, {'algo':'tlsh', 'similarity':45}]}
|
||||
#
|
||||
# item = Item('tests/2020/01/02/test_save.gz')
|
||||
# item = Item('tests/2020/01/02/test_save.gz')
|
||||
# item.create(content, _save=False)
|
||||
filters = {'date_from': '20230101', 'date_to': '20230501', 'sources': ['crawled', 'submitted'], 'start': ':submitted/2023/04/28/submitted_2b3dd861-a75d-48e4-8cec-6108d41450da.gz'}
|
||||
gen = get_all_items_objects(filters=filters)
|
||||
for obj_id in gen:
|
||||
print(obj_id.id)
|
||||
# filters = {'date_from': '20230101', 'date_to': '20230501', 'sources': ['crawled', 'submitted'], 'start': ':submitted/2023/04/28/submitted_2b3dd861-a75d-48e4-8cec-6108d41450da.gz'}
|
||||
# gen = get_all_items_objects(filters=filters)
|
||||
# for obj_id in gen:
|
||||
# print(obj_id.id)
|
||||
|
|
348
bin/lib/objects/Messages.py
Executable file
|
@@ -0,0 +1,348 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
|
||||
from datetime import datetime
|
||||
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ail_core import get_ail_uuid
|
||||
from lib.objects.abstract_object import AbstractObject
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib import Language
|
||||
from lib.objects import UsersAccount
|
||||
from lib.data_retention_engine import update_obj_date, get_obj_date_first
|
||||
# TODO Set all messages ???
|
||||
|
||||
|
||||
from flask import url_for
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
# r_content = config_loader.get_db_conn("Kvrocks_Content")
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
config_loader = None
|
||||
|
||||
|
||||
# TODO SAVE OR EXTRACT MESSAGE SOURCE FOR ICON ?????????
|
||||
# TODO iterate on all objects
|
||||
# TODO also add support for small objects ????
|
||||
|
||||
# A Message CAN exist without a CHAT -> do not convert it to an object
|
||||
|
||||
# ID: source:chat_id:message_id ????
|
||||
#
|
||||
# /!\ handle null chat and message id -> chat = uuid and message = timestamp ???
|
||||
|
||||
|
||||
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<message ID> => telegram without channels
|
||||
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Channel ID>/<message ID>
|
||||
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Thread ID>/<message ID>
|
||||
# ID = <ChatInstance UUID>/<timestamp>/<chat ID>/<Channel ID>/<Thread ID>/<message ID>
|
||||
class Message(AbstractObject):
|
||||
"""
|
||||
AIL Message Object. (strings)
|
||||
"""
|
||||
|
||||
def __init__(self, id): # TODO subtype or use source ????
|
||||
super(Message, self).__init__('message', id) # message::< telegram/1692189934.380827/ChatID_MessageID >
|
||||
|
||||
def exists(self):
|
||||
if self.subtype is None:
|
||||
return r_object.exists(f'meta:{self.type}:{self.id}')
|
||||
else:
|
||||
return r_object.exists(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')
|
||||
|
||||
def get_source(self):
|
||||
"""
|
||||
Returns source/feeder name
|
||||
"""
|
||||
l_source = self.id.split('/')[:-2]
|
||||
return os.path.join(*l_source)
|
||||
|
||||
def get_basename(self):
|
||||
return os.path.basename(self.id)
|
||||
|
||||
def get_content(self, r_type='str'): # TODO ADD cache # TODO Compress content ???????
|
||||
"""
|
||||
Returns content
|
||||
"""
|
||||
global_id = self.get_global_id()
|
||||
content = r_cache.get(f'content:{global_id}')
|
||||
if not content:
|
||||
content = self._get_field('content')
|
||||
if content:
|
||||
r_cache.set(f'content:{global_id}', content)
|
||||
r_cache.expire(f'content:{global_id}', 300)
|
||||
if r_type == 'str':
|
||||
return content
|
||||
elif r_type == 'bytes':
|
||||
return content.encode()
|
||||
|
||||
def get_date(self):
|
||||
timestamp = self.get_timestamp()
|
||||
return datetime.fromtimestamp(float(timestamp)).strftime('%Y%m%d')
|
||||
|
||||
def get_timestamp(self):
|
||||
dirs = self.id.split('/')
|
||||
return dirs[1]
|
||||
|
||||
def get_message_id(self): # TODO optimize
|
||||
message_id = self.get_basename().rsplit('/', 1)[1]
|
||||
# if message_id.endswith('.gz'):
|
||||
# message_id = message_id[:-3]
|
||||
return message_id
|
||||
|
||||
def get_chat_id(self): # TODO optimize -> use me to tag Chat
|
||||
chat_id = self.get_basename().rsplit('_', 1)[0]
|
||||
return chat_id
|
||||
|
||||
def get_thread(self):
|
||||
for child in self.get_childrens():
|
||||
obj_type, obj_subtype, obj_id = child.split(':', 2)
|
||||
if obj_type == 'chat-thread':
|
||||
nb_messages = r_object.zcard(f'messages:{obj_type}:{obj_subtype}:{obj_id}')
|
||||
return {'type': obj_type, 'subtype': obj_subtype, 'id': obj_id, 'nb': nb_messages}
|
||||
|
||||
# TODO get Instance ID
|
||||
# TODO get channel ID
|
||||
# TODO get thread ID
|
||||
|
||||
def get_images(self):
|
||||
images = []
|
||||
for child in self.get_childrens():
|
||||
obj_type, _, obj_id = child.split(':', 2)
|
||||
if obj_type == 'image':
|
||||
images.append(obj_id)
|
||||
return images
|
||||
|
||||
def get_user_account(self, meta=False):
|
||||
user_account = self.get_correlation('user-account')
|
||||
if user_account.get('user-account'):
|
||||
user_account = f'user-account:{user_account["user-account"].pop()}'
|
||||
if meta:
|
||||
_, user_account_subtype, user_account_id = user_account.split(':', 3)
|
||||
user_account = UsersAccount.UserAccount(user_account_id, user_account_subtype).get_meta(options={'icon', 'username', 'username_meta'})
|
||||
return user_account
|
||||
|
||||
def get_files_names(self):
|
||||
names = []
|
||||
filenames = self.get_correlation('file-name').get('file-name')
|
||||
if filenames:
|
||||
for name in filenames:
|
||||
names.append(name[1:])
|
||||
return names
|
||||
|
||||
def get_reactions(self):
|
||||
return r_object.hgetall(f'meta:reactions:{self.type}::{self.id}')
|
||||
|
||||
# TODO sanitize reactions
|
||||
def add_reaction(self, reactions, nb_reaction):
|
||||
r_object.hset(f'meta:reactions:{self.type}::{self.id}', reactions, nb_reaction)
|
||||
|
||||
# Interactions between users -> use replies
|
||||
# nb views
|
||||
# MENTIONS -> Messages + Chats
|
||||
# # relationship -> mention - Chat -> Chat
|
||||
# - Message -> Chat
|
||||
# - Message -> Message ??? fetch mentioned messages
|
||||
# FORWARDS
|
||||
# TODO Create forward CHAT -> message
|
||||
# message (is forwarded) -> message (is forwarded from) ???
|
||||
# # TODO get source message timestamp
|
||||
#
|
||||
# # is forwarded
|
||||
# # forwarded from -> check if relationship
|
||||
# # nb forwarded -> scard relationship
|
||||
#
|
||||
# Messages -> CHATS -> NB forwarded
|
||||
# CHAT -> NB forwarded by chats -> NB messages -> parse full set ????
|
||||
#
|
||||
#
|
||||
#
|
||||
#
|
||||
#
|
||||
#
|
||||
# show users chats
|
||||
# message media
|
||||
# flag is deleted -> event or missing from feeder pass ???
|
||||
|
||||
def get_translation(self, content=None, source=None, target='fr'):
|
||||
"""
|
||||
Returns translated content
|
||||
"""
|
||||
|
||||
# return self._get_field('translated')
|
||||
global_id = self.get_global_id()
|
||||
translation = r_cache.get(f'translation:{target}:{global_id}')
|
||||
r_cache.expire(f'translation:{target}:{global_id}', 0)
|
||||
if translation:
|
||||
return translation
|
||||
if not content:
|
||||
content = self.get_content()
|
||||
translation = Language.LanguageTranslator().translate(content, source=source, target=target)
|
||||
if translation:
|
||||
r_cache.set(f'translation:{target}:{global_id}', translation)
|
||||
r_cache.expire(f'translation:{target}:{global_id}', 300)
|
||||
return translation
|
||||
|
||||
def _set_translation(self, translation):
|
||||
"""
|
||||
Set translated content
|
||||
"""
|
||||
return self._set_field('translated', translation) # translation by hash ??? -> avoid translating multiple time
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True)}
|
||||
# return payload
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('chats_explorer.objects_message', type=self.type, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/objects/message?id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self):
|
||||
return {'style': 'fas', 'icon': '\uf4ad', 'color': '#4dffff', 'radius': 5}
|
||||
|
||||
def get_misp_object(self): # TODO
|
||||
obj = MISPObject('instant-message', standalone=True)
|
||||
obj_date = self.get_date()
|
||||
if obj_date:
|
||||
obj.first_seen = obj_date
|
||||
else:
|
||||
self.logger.warning(
|
||||
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={obj_date}')
|
||||
|
||||
# obj_attrs = [obj.add_attribute('first-seen', value=obj_date),
|
||||
# obj.add_attribute('raw-data', value=self.id, data=self.get_raw_content()),
|
||||
# obj.add_attribute('sensor', value=get_ail_uuid())]
|
||||
obj_attrs = []
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
# def get_url(self):
|
||||
# return r_object.hget(f'meta:item::{self.id}', 'url')
|
||||
|
||||
# options: set of optional meta fields
|
||||
def get_meta(self, options=None, timestamp=None, translation_target='en'):
|
||||
"""
|
||||
:type options: set
|
||||
:type timestamp: float
|
||||
"""
|
||||
if options is None:
|
||||
options = set()
|
||||
meta = self.get_default_meta(tags=True)
|
||||
|
||||
# timestamp
|
||||
if not timestamp:
|
||||
timestamp = self.get_timestamp()
|
||||
else:
|
||||
timestamp = float(timestamp)
|
||||
timestamp = datetime.fromtimestamp(float(timestamp))
|
||||
meta['date'] = timestamp.strftime('%Y/%m/%d')
|
||||
meta['hour'] = timestamp.strftime('%H:%M:%S')
|
||||
meta['full_date'] = timestamp.isoformat(' ')
|
||||
|
||||
meta['source'] = self.get_source()
|
||||
# optional meta fields
|
||||
if 'content' in options:
|
||||
meta['content'] = self.get_content()
|
||||
if 'parent' in options:
|
||||
meta['parent'] = self.get_parent()
|
||||
if meta['parent'] and 'parent_meta' in options:
|
||||
options.remove('parent')
|
||||
parent_type, _, parent_id = meta['parent'].split(':', 3)
|
||||
if parent_type == 'message':
|
||||
message = Message(parent_id)
|
||||
meta['reply_to'] = message.get_meta(options=options, translation_target=translation_target)
|
||||
if 'investigations' in options:
|
||||
meta['investigations'] = self.get_investigations()
|
||||
if 'link' in options:
|
||||
meta['link'] = self.get_link(flask_context=True)
|
||||
if 'icon' in options:
|
||||
meta['icon'] = self.get_svg_icon()
|
||||
if 'user-account' in options:
|
||||
meta['user-account'] = self.get_user_account(meta=True)
|
||||
if not meta['user-account']:
|
||||
meta['user-account'] = {'id': 'UNKNOWN'}
|
||||
if 'chat' in options:
|
||||
meta['chat'] = self.get_chat_id()
|
||||
if 'thread' in options:
|
||||
thread = self.get_thread()
|
||||
if thread:
|
||||
meta['thread'] = thread
|
||||
if 'images' in options:
|
||||
meta['images'] = self.get_images()
|
||||
if 'files-names' in options:
|
||||
meta['files-names'] = self.get_files_names()
|
||||
if 'reactions' in options:
|
||||
meta['reactions'] = self.get_reactions()
|
||||
if 'translation' in options and translation_target:
|
||||
meta['translation'] = self.translate(content=meta.get('content'), target=translation_target)
|
||||
|
||||
# meta['encoding'] = None
|
||||
return meta
|
||||
|
||||
# def translate(self, content=None): # TODO translation plugin
|
||||
# # TODO get text language
|
||||
# if not content:
|
||||
# content = self.get_content()
|
||||
# translated = argostranslate.translate.translate(content, 'ru', 'en')
|
||||
# # Save translation
|
||||
# self._set_translation(translated)
|
||||
# return translated
|
||||
|
||||
def create(self, content, translation=None, tags=[]):
|
||||
self._set_field('content', content)
|
||||
# r_content.get(f'content:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', content)
|
||||
if translation:
|
||||
self._set_translation(translation)
|
||||
for tag in tags:
|
||||
self.add_tag(tag)
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
pass
|
||||
|
||||
def create_obj_id(chat_instance, chat_id, message_id, timestamp, channel_id=None, thread_id=None): # TODO CHECK COLLISIONS
|
||||
timestamp = int(timestamp)
|
||||
if channel_id and thread_id:
|
||||
return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
|
||||
elif channel_id:
|
||||
return f'{chat_instance}/{timestamp}/{channel_id}/{chat_id}/{message_id}'
|
||||
elif thread_id:
|
||||
return f'{chat_instance}/{timestamp}/{chat_id}/{thread_id}/{message_id}'
|
||||
else:
|
||||
return f'{chat_instance}/{timestamp}/{chat_id}/{message_id}'
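A small usage sketch with assumed values (plain chat + message, no channel or thread), showing the flat ID layout described in the comments near the top of this file; the timestamp and message id can be read back by splitting on '/'.

# assumes AIL_BIN is on sys.path, as in the modules above
from lib.objects.Messages import create_obj_id

chat_instance = '00098785-7e70-5d12-a120-c5cdc1252b2b'   # hypothetical chat instance UUID
obj_id = create_obj_id(chat_instance, '12345', '678', 1692189934)
# -> '00098785-7e70-5d12-a120-c5cdc1252b2b/1692189934/12345/678'

timestamp = obj_id.split('/')[1]      # '1692189934', the same field read by Message.get_timestamp()
message_id = obj_id.split('/')[-1]    # '678'
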
|
||||
|
||||
# thread id of message
|
||||
# thread id of chat
|
||||
# thread id of subchannel
|
||||
|
||||
# TODO Check if already exists
|
||||
# def create(source, chat_id, message_id, timestamp, content, tags=[]):
|
||||
def create(obj_id, content, translation=None, tags=[]):
|
||||
message = Message(obj_id)
|
||||
# if not message.exists():
|
||||
message.create(content, translation=translation, tags=tags)
|
||||
return message
|
||||
|
||||
|
||||
# TODO Encode translation
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
r = 'test'
|
||||
print(r)
|
|
@@ -88,7 +88,7 @@ class Screenshot(AbstractObject):
|
|||
return obj
|
||||
|
||||
def get_meta(self, options=set()):
|
||||
meta = {'id': self.id}
|
||||
meta = self.get_default_meta()
|
||||
meta['img'] = get_screenshot_rel_path(self.id) ######### # TODO: Rename ME ??????
|
||||
meta['tags'] = self.get_tags(r_list=True)
|
||||
if 'tags_safe' in options:
|
||||
|
|
|
@@ -7,6 +7,8 @@ import sys
|
|||
from hashlib import sha256
|
||||
from flask import url_for
|
||||
|
||||
# import warnings
|
||||
# warnings.filterwarnings("ignore", category=DeprecationWarning)
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
|
@@ -43,6 +45,8 @@ class Title(AbstractDaterangeObject):
|
|||
def get_content(self, r_type='str'):
|
||||
if r_type == 'str':
|
||||
return self._get_field('content')
|
||||
elif r_type == 'bytes':
|
||||
return self._get_field('content').encode()
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
|
@@ -80,9 +84,6 @@ class Title(AbstractDaterangeObject):
|
|||
meta['content'] = self.get_content()
|
||||
return meta
|
||||
|
||||
def add(self, date, item_id):
|
||||
self._add(date, item_id)
|
||||
|
||||
def create(self, content, _first_seen=None, _last_seen=None):
|
||||
self._set_field('content', content)
|
||||
self._create()
|
||||
|
@@ -100,21 +101,23 @@ class Titles(AbstractDaterangeObjects):
|
|||
Titles Objects
|
||||
"""
|
||||
def __init__(self):
|
||||
super().__init__('title')
|
||||
super().__init__('title', Title)
|
||||
|
||||
def get_metas(self, obj_ids, options=set()):
|
||||
return self._get_metas(Title, obj_ids, options=options)
|
||||
|
||||
def sanitize_name_to_search(self, name_to_search):
|
||||
def sanitize_id_to_search(self, name_to_search):
|
||||
return name_to_search
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# from lib import crawlers
|
||||
# from lib.objects import Items
|
||||
# for item in Items.get_all_items_objects(filters={'sources': ['crawled']}):
|
||||
# title_content = crawlers.extract_title_from_html(item.get_content())
|
||||
# if title_content:
|
||||
# print(item.id, title_content)
|
||||
# title = create_title(title_content)
|
||||
# title.add(item.get_date(), item.id)
|
||||
# # from lib import crawlers
|
||||
# # from lib.objects import Items
|
||||
# # for item in Items.get_all_items_objects(filters={'sources': ['crawled']}):
|
||||
# # title_content = crawlers.extract_title_from_html(item.get_content())
|
||||
# # if title_content:
|
||||
# # print(item.id, title_content)
|
||||
# # title = create_title(title_content)
|
||||
# # title.add(item.get_date(), item.id)
|
||||
# titles = Titles()
|
||||
# # for r in titles.get_ids_iterator():
|
||||
# # print(r)
|
||||
# r = titles.search_by_id('f7d57B', r_pos=True, case_sensitive=False)
|
||||
# print(r)
|
||||
|
|
216
bin/lib/objects/UsersAccount.py
Executable file
|
@@ -0,0 +1,216 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
# import re
|
||||
|
||||
# from datetime import datetime
|
||||
from flask import url_for
|
||||
from pymisp import MISPObject
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib import ail_core
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.abstract_subtype_object import AbstractSubtypeObject, get_all_id
|
||||
from lib.timeline_engine import Timeline
|
||||
from lib.objects import Usernames
|
||||
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
baseurl = config_loader.get_config_str("Notifications", "ail_domain")
|
||||
config_loader = None
|
||||
|
||||
|
||||
################################################################################
|
||||
################################################################################
|
||||
################################################################################
|
||||
|
||||
class UserAccount(AbstractSubtypeObject):
|
||||
"""
|
||||
AIL User Object. (strings)
|
||||
"""
|
||||
|
||||
def __init__(self, id, subtype):
|
||||
super(UserAccount, self).__init__('user-account', id, subtype)
|
||||
|
||||
# def get_ail_2_ail_payload(self):
|
||||
# payload = {'raw': self.get_gzip_content(b64=True),
|
||||
# 'compress': 'gzip'}
|
||||
# return payload
|
||||
|
||||
# # WARNING: UNCLEAN DELETE /!\ TEST ONLY /!\
|
||||
def delete(self):
|
||||
# # TODO:
|
||||
pass
|
||||
|
||||
def get_link(self, flask_context=False):
|
||||
if flask_context:
|
||||
url = url_for('correlation.show_correlation', type=self.type, subtype=self.subtype, id=self.id)
|
||||
else:
|
||||
url = f'{baseurl}/correlation/show?type={self.type}&subtype={self.subtype}&id={self.id}'
|
||||
return url
|
||||
|
||||
def get_svg_icon(self): # TODO change icon/color
|
||||
return {'style': 'fas', 'icon': '\uf2bd', 'color': '#4dffff', 'radius': 5}
|
||||
|
||||
def get_first_name(self):
|
||||
return self._get_field('firstname')
|
||||
|
||||
def get_last_name(self):
|
||||
return self._get_field('lastname')
|
||||
|
||||
def get_phone(self):
|
||||
return self._get_field('phone')
|
||||
|
||||
def set_first_name(self, firstname):
|
||||
return self._set_field('firstname', firstname)
|
||||
|
||||
def set_last_name(self, lastname):
|
||||
return self._set_field('lastname', lastname)
|
||||
|
||||
def set_phone(self, phone):
|
||||
return self._set_field('phone', phone)
|
||||
|
||||
def get_icon(self):
|
||||
icon = self._get_field('icon')
|
||||
if icon:
|
||||
return icon.rsplit(':', 1)[1]
|
||||
|
||||
def set_icon(self, icon):
|
||||
self._set_field('icon', icon)
|
||||
|
||||
def get_info(self):
|
||||
return self._get_field('info')
|
||||
|
||||
def set_info(self, info):
|
||||
return self._set_field('info', info)
|
||||
|
||||
# def get_created_at(self, date=False):
|
||||
# created_at = self._get_field('created_at')
|
||||
# if date and created_at:
|
||||
# created_at = datetime.fromtimestamp(float(created_at))
|
||||
# created_at = created_at.isoformat(' ')
|
||||
# return created_at
|
||||
|
||||
# TODO MESSAGES:
|
||||
# 1) ALL MESSAGES + NB
|
||||
# 2) ALL MESSAGES TIMESTAMP
|
||||
# 3) ALL MESSAGES TIMESTAMP By: - chats
|
||||
# - subchannel
|
||||
# - thread
|
||||
|
||||
def get_chats(self):
|
||||
chats = self.get_correlation('chat')['chat']
|
||||
return chats
|
||||
|
||||
def get_chat_subchannels(self):
|
||||
chats = self.get_correlation('chat-subchannel')['chat-subchannel']
|
||||
return chats
|
||||
|
||||
def get_chat_threads(self):
|
||||
chats = self.get_correlation('chat-thread')['chat-thread']
|
||||
return chats
|
||||
|
||||
def _get_timeline_username(self):
|
||||
return Timeline(self.get_global_id(), 'username')
|
||||
|
||||
def get_username(self):
|
||||
return self._get_timeline_username().get_last_obj_id()
|
||||
|
||||
def get_usernames(self):
|
||||
return self._get_timeline_username().get_objs_ids()
|
||||
|
||||
def update_username_timeline(self, username_global_id, timestamp):
|
||||
self._get_timeline_username().add_timestamp(timestamp, username_global_id)
|
||||
|
||||
def get_messages_by_chat_obj(self, chat_obj):
|
||||
messages = []
|
||||
for mess in self.get_correlation_iter_obj(chat_obj, 'message'):
|
||||
messages.append(f'message:{mess}')
|
||||
return messages
|
||||
|
||||
def get_meta(self, options=set(), translation_target=None): # TODO Username timeline
|
||||
meta = self._get_meta(options=options)
|
||||
meta['id'] = self.id
|
||||
meta['subtype'] = self.subtype
|
||||
meta['tags'] = self.get_tags(r_list=True) # TODO add in options ????
|
||||
if 'username' in options:
|
||||
meta['username'] = self.get_username()
|
||||
if meta['username']:
|
||||
_, username_account_subtype, username_account_id = meta['username'].split(':', 3)
|
||||
if 'username_meta' in options:
|
||||
meta['username'] = Usernames.Username(username_account_id, username_account_subtype).get_meta()
|
||||
else:
|
||||
meta['username'] = {'type': 'username', 'subtype': username_account_subtype, 'id': username_account_id}
|
||||
if 'usernames' in options:
|
||||
meta['usernames'] = self.get_usernames()
|
||||
if 'icon' in options:
|
||||
meta['icon'] = self.get_icon()
|
||||
if 'info' in options:
|
||||
meta['info'] = self.get_info()
|
||||
if 'translation' in options and translation_target:
|
||||
meta['translation_info'] = self.translate(meta['info'], field='info', target=translation_target)
|
||||
# if 'created_at':
|
||||
# meta['created_at'] = self.get_created_at(date=True)
|
||||
if 'chats' in options:
|
||||
meta['chats'] = self.get_chats()
|
||||
if 'subchannels' in options:
|
||||
meta['subchannels'] = self.get_chat_subchannels()
|
||||
if 'threads' in options:
|
||||
meta['threads'] = self.get_chat_threads()
|
||||
return meta
|
||||
|
||||
def get_misp_object(self):
|
||||
obj_attrs = []
|
||||
if self.subtype == 'telegram':
|
||||
obj = MISPObject('telegram-account', standalone=True)
|
||||
obj_attrs.append(obj.add_attribute('username', value=self.id))
|
||||
|
||||
elif self.subtype == 'twitter':
|
||||
obj = MISPObject('twitter-account', standalone=True)
|
||||
obj_attrs.append(obj.add_attribute('name', value=self.id))
|
||||
|
||||
else:
|
||||
obj = MISPObject('user-account', standalone=True)
|
||||
obj_attrs.append(obj.add_attribute('username', value=self.id))
|
||||
|
||||
first_seen = self.get_first_seen()
|
||||
last_seen = self.get_last_seen()
|
||||
if first_seen:
|
||||
obj.first_seen = first_seen
|
||||
if last_seen:
|
||||
obj.last_seen = last_seen
|
||||
if not first_seen or not last_seen:
|
||||
self.logger.warning(
|
||||
f'Export error, None seen {self.type}:{self.subtype}:{self.id}, first={first_seen}, last={last_seen}')
|
||||
|
||||
for obj_attr in obj_attrs:
|
||||
for tag in self.get_tags():
|
||||
obj_attr.add_tag(tag)
|
||||
return obj
|
||||
|
||||
def get_user_by_username():
|
||||
pass
|
||||
|
||||
def get_all_subtypes():
|
||||
return ail_core.get_object_all_subtypes('user-account')
|
||||
|
||||
def get_all():
|
||||
users = {}
|
||||
for subtype in get_all_subtypes():
|
||||
users[subtype] = get_all_by_subtype(subtype)
|
||||
return users
|
||||
|
||||
def get_all_by_subtype(subtype):
|
||||
return get_all_id('user-account', subtype)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
from lib.objects import Chats
|
||||
chat = Chats.Chat('', '00098785-7e70-5d12-a120-c5cdc1252b2b')
|
||||
account = UserAccount('', '00098785-7e70-5d12-a120-c5cdc1252b2b')
|
||||
print(account.get_messages_by_chat_obj(chat))
|
306
bin/lib/objects/abstract_chat_object.py
Executable file
|
@@ -0,0 +1,306 @@
|
|||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
Base Class for AIL Objects
|
||||
"""
|
||||
|
||||
##################################
|
||||
# Import External packages
|
||||
##################################
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from abc import ABC
|
||||
|
||||
from datetime import datetime
|
||||
# from flask import url_for
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.objects.abstract_subtype_object import AbstractSubtypeObject
|
||||
from lib.ail_core import unpack_correl_objs_id, zscan_iter ################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects import Messages
|
||||
from packages import Date
|
||||
|
||||
# from lib.data_retention_engine import update_obj_date
|
||||
|
||||
|
||||
# LOAD CONFIG
|
||||
config_loader = ConfigLoader()
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
config_loader = None
|
||||
|
||||
# # FIXME: SAVE SUBTYPE NAMES ?????
|
||||
|
||||
class AbstractChatObject(AbstractSubtypeObject, ABC):
|
||||
"""
|
||||
Abstract Subtype Object
|
||||
"""
|
||||
|
||||
def __init__(self, obj_type, id, subtype):
|
||||
""" Abstract for all the AIL object
|
||||
|
||||
:param obj_type: object type (item, ...)
|
||||
:param id: Object ID
|
||||
"""
|
||||
super().__init__(obj_type, id, subtype)
|
||||
|
||||
# get useraccount / username
|
||||
# get users ?
|
||||
# timeline name ????
|
||||
# info
|
||||
# created
|
||||
# last imported/updated
|
||||
|
||||
# TODO get instance
|
||||
# TODO get protocol
|
||||
# TODO get network
|
||||
# TODO get address
|
||||
|
||||
def get_chat(self): # require ail object TODO ##
|
||||
if self.type != 'chat':
|
||||
parent = self.get_parent()
|
||||
if parent:
|
||||
obj_type, _ = parent.split(':', 1)
|
||||
if obj_type == 'chat':
|
||||
return parent
|
||||
|
||||
def get_subchannels(self):
|
||||
subchannels = []
|
||||
if self.type == 'chat': # category ???
|
||||
for obj_global_id in self.get_childrens():
|
||||
obj_type, _ = obj_global_id.split(':', 1)
|
||||
if obj_type == 'chat-subchannel':
|
||||
subchannels.append(obj_global_id)
|
||||
return subchannels
|
||||
|
||||
def get_nb_subchannels(self):
|
||||
nb = 0
|
||||
if self.type == 'chat':
|
||||
for obj_global_id in self.get_childrens():
|
||||
obj_type, _ = obj_global_id.split(':', 1)
|
||||
if obj_type == 'chat-subchannel':
|
||||
nb += 1
|
||||
return nb
|
||||
|
||||
def get_threads(self):
|
||||
threads = []
|
||||
for child in self.get_childrens():
|
||||
obj_type, obj_subtype, obj_id = child.split(':', 2)
|
||||
if obj_type == 'chat-thread':
|
||||
threads.append({'type': obj_type, 'subtype': obj_subtype, 'id': obj_id})
|
||||
return threads
|
||||
|
||||
def get_created_at(self, date=False):
|
||||
created_at = self._get_field('created_at')
|
||||
if date and created_at:
|
||||
created_at = datetime.fromtimestamp(float(created_at))
|
||||
created_at = created_at.isoformat(' ')
|
||||
return created_at
|
||||
|
||||
def set_created_at(self, timestamp):
|
||||
self._set_field('created_at', timestamp)
|
||||
|
||||
def get_name(self):
|
||||
name = self._get_field('name')
|
||||
if not name:
|
||||
name = ''
|
||||
return name
|
||||
|
||||
def set_name(self, name):
|
||||
self._set_field('name', name)
|
||||
|
||||
def get_icon(self):
|
||||
icon = self._get_field('icon')
|
||||
if icon:
|
||||
return icon.rsplit(':', 1)[1]
|
||||
|
||||
def set_icon(self, icon):
|
||||
self._set_field('icon', icon)
|
||||
|
||||
def get_info(self):
|
||||
return self._get_field('info')
|
||||
|
||||
def set_info(self, info):
|
||||
self._set_field('info', info)
|
||||
|
||||
def get_nb_messages(self):
|
||||
return r_object.zcard(f'messages:{self.type}:{self.subtype}:{self.id}')
|
||||
|
||||
def _get_messages(self, nb=-1, page=-1):
|
||||
if nb < 1:
|
||||
messages = r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, -1, withscores=True)
|
||||
nb_pages = 0
|
||||
page = 1
|
||||
total = len(messages)
|
||||
nb_first = 1
|
||||
nb_last = total
|
||||
else:
|
||||
total = r_object.zcard(f'messages:{self.type}:{self.subtype}:{self.id}')
|
||||
nb_pages = total / nb
|
||||
if not nb_pages.is_integer():
|
||||
nb_pages = int(nb_pages) + 1
|
||||
else:
|
||||
nb_pages = int(nb_pages)
|
||||
if page > nb_pages or page < 1:
|
||||
page = nb_pages
|
||||
|
||||
if page > 1:
|
||||
start = (page - 1) * nb
|
||||
else:
|
||||
start = 0
|
||||
messages = r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', start, start+nb-1, withscores=True)
|
||||
# if messages:
|
||||
# messages = reversed(messages)
|
||||
nb_first = start+1
|
||||
nb_last = start+nb
|
||||
if nb_last > total:
|
||||
nb_last = total
|
||||
return messages, {'nb': nb, 'page': page, 'nb_pages': nb_pages, 'total': total, 'nb_first': nb_first, 'nb_last': nb_last}
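A worked example of the pagination above, with assumed counts: 1203 stored messages and nb=500 per page give 3 pages; page 3 starts at zset index 1000 and the returned bounds cover messages 1001 to 1203.

total, nb, page = 1203, 500, 3                           # assumed values
nb_pages = total // nb + (1 if total % nb else 0)        # 3, same result as the float / is_integer check
start = (page - 1) * nb                                  # 1000, passed to ZRANGE as the start index
nb_first, nb_last = start + 1, min(start + nb, total)    # 1001 and 1203
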
|
||||
|
||||
def get_timestamp_first_message(self):
|
||||
return r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0, withscores=True)
|
||||
|
||||
def get_timestamp_last_message(self):
|
||||
return r_object.zrevrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0, withscores=True)
|
||||
|
||||
def get_first_message(self):
|
||||
return r_object.zrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0)
|
||||
|
||||
def get_last_message(self):
|
||||
return r_object.zrevrange(f'messages:{self.type}:{self.subtype}:{self.id}', 0, 0)
|
||||
|
||||
def get_nb_message_by_hours(self, date_day, nb_day):
|
||||
hours = []
|
||||
# start=0, end=23
|
||||
timestamp = time.mktime(datetime.strptime(date_day, "%Y%m%d").timetuple())
|
||||
for i in range(24):
|
||||
timestamp_end = timestamp + 3600
|
||||
nb_messages = r_object.zcount(f'messages:{self.type}:{self.subtype}:{self.id}', timestamp, timestamp_end)
|
||||
timestamp = timestamp_end
|
||||
hours.append({'date': f'{date_day[0:4]}-{date_day[4:6]}-{date_day[6:8]}', 'day': nb_day, 'hour': i, 'count': nb_messages})
|
||||
return hours
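A sketch of the hourly bucketing above (date assumed): the day is converted to a local-time epoch and split into 24 one-hour windows, each of which becomes one ZCOUNT over the chat's message zset.

import time
from datetime import datetime

date_day = '20230415'                                    # assumed day
ts = time.mktime(datetime.strptime(date_day, '%Y%m%d').timetuple())
windows = [(ts + h * 3600, ts + (h + 1) * 3600) for h in range(24)]
# each (start, end) pair is one ZCOUNT messages:<type>:<subtype>:<id> start end
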
|
||||
|
||||
def get_nb_message_by_week(self, date_day):
|
||||
date_day = Date.get_date_week_by_date(date_day)
|
||||
week_messages = []
|
||||
i = 0
|
||||
for date in Date.daterange_add_days(date_day, 6):
|
||||
week_messages = week_messages + self.get_nb_message_by_hours(date, i)
|
||||
i += 1
|
||||
return week_messages
|
||||
|
||||
def get_nb_message_this_week(self):
|
||||
week_date = Date.get_current_week_day()
|
||||
return self.get_nb_message_by_week(week_date)
|
||||
|
||||
def get_message_meta(self, message, timestamp=None, translation_target='en'): # TODO handle file message
|
||||
message = Messages.Message(message[9:])
|
||||
meta = message.get_meta(options={'content', 'files-names', 'images', 'link', 'parent', 'parent_meta', 'reactions', 'thread', 'translation', 'user-account'}, timestamp=timestamp, translation_target=translation_target)
|
||||
return meta
|
||||
|
||||
def get_messages(self, start=0, page=-1, nb=500, unread=False, translation_target='en'): # threads ???? # TODO ADD last/first message timestamp + return page
|
||||
# TODO return message meta
|
||||
tags = {}
|
||||
messages = {}
|
||||
curr_date = None
|
||||
try:
|
||||
nb = int(nb)
|
||||
except TypeError:
|
||||
nb = 500
|
||||
if not page:
|
||||
page = -1
|
||||
try:
|
||||
page = int(page)
|
||||
except TypeError:
|
||||
page = 1
|
||||
mess, pagination = self._get_messages(nb=nb, page=page)
|
||||
for message in mess:
|
||||
timestamp = message[1]
|
||||
date_day = datetime.fromtimestamp(timestamp).strftime('%Y/%m/%d')
|
||||
if date_day != curr_date:
|
||||
messages[date_day] = []
|
||||
curr_date = date_day
|
||||
mess_dict = self.get_message_meta(message[0], timestamp=timestamp, translation_target=translation_target)
|
||||
messages[date_day].append(mess_dict)
|
||||
|
||||
if mess_dict.get('tags'):
|
||||
for tag in mess_dict['tags']:
|
||||
if tag not in tags:
|
||||
tags[tag] = 0
|
||||
tags[tag] += 1
|
||||
return messages, pagination, tags
|
||||
|
||||
# TODO REWRITE ADD OR ADD MESSAGE ????
|
||||
# add
|
||||
# add message
|
||||
|
||||
def get_obj_by_message_id(self, message_id):
|
||||
return r_object.hget(f'messages:ids:{self.type}:{self.subtype}:{self.id}', message_id)
|
||||
|
||||
def add_message_cached_reply(self, reply_id, message_id):
|
||||
r_cache.sadd(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{reply_id}', message_id)
|
||||
r_cache.expire(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{reply_id}', 600)
|
||||
|
||||
def _get_message_cached_reply(self, message_id):
|
||||
return r_cache.smembers(f'messages:ids:{self.type}:{self.subtype}:{self.id}:{message_id}')
|
||||
|
||||
def get_cached_message_reply(self, message_id):
|
||||
objs_global_id = []
|
||||
for mess_id in self._get_message_cached_reply(message_id):
|
||||
obj_global_id = self.get_obj_by_message_id(mess_id) # TODO CATCH EXCEPTION
|
||||
if obj_global_id:
|
||||
objs_global_id.append(obj_global_id)
|
||||
return objs_global_id
|
||||
|
||||
def add_message(self, obj_global_id, message_id, timestamp, reply_id=None):
|
||||
r_object.hset(f'messages:ids:{self.type}:{self.subtype}:{self.id}', message_id, obj_global_id)
|
||||
r_object.zadd(f'messages:{self.type}:{self.subtype}:{self.id}', {obj_global_id: float(timestamp)})
|
||||
|
||||
# MESSAGE REPLY
|
||||
if reply_id:
|
||||
reply_obj = self.get_obj_by_message_id(reply_id) # TODO CATCH EXCEPTION
|
||||
if reply_obj:
|
||||
self.add_obj_children(reply_obj, obj_global_id)
|
||||
else:
|
||||
self.add_message_cached_reply(reply_id, message_id)
|
||||
# CACHED REPLIES
|
||||
for mess_id in self.get_cached_message_reply(message_id):
|
||||
self.add_obj_children(obj_global_id, mess_id)
|
||||
|
||||
# def get_deleted_messages(self, message_id):
|
||||
|
||||
def get_participants(self):
|
||||
return unpack_correl_objs_id('user-account', self.get_correlation('user-account')['user-account'], r_type='dict')
|
||||
|
||||
def get_nb_participants(self):
|
||||
return self.get_nb_correlation('user-account')
|
||||
|
||||
# TODO move me to abstract subtype
|
||||
class AbstractChatObjects(ABC):
|
||||
def __init__(self, type):
|
||||
self.type = type
|
||||
|
||||
def add_subtype(self, subtype):
|
||||
r_object.sadd(f'all_{self.type}:subtypes', subtype)
|
||||
|
||||
def get_subtypes(self):
|
||||
return r_object.smembers(f'all_{self.type}:subtypes')
|
||||
|
||||
def get_nb_ids_by_subtype(self, subtype):
|
||||
return r_object.zcard(f'{self.type}_all:{subtype}')
|
||||
|
||||
def get_ids_by_subtype(self, subtype):
|
||||
return r_object.zrange(f'{self.type}_all:{subtype}', 0, -1)
|
||||
|
||||
def get_all_id_iterator_iter(self, subtype):
|
||||
return zscan_iter(r_object, f'{self.type}_all:{subtype}')
|
||||
|
||||
def get_ids(self):
|
||||
pass
|
||||
|
||||
def search(self):
|
||||
pass
|
|
@@ -45,10 +45,10 @@ class AbstractDaterangeObject(AbstractObject, ABC):
|
|||
def exists(self):
|
||||
return r_object.exists(f'meta:{self.type}:{self.id}')
|
||||
|
||||
def _get_field(self, field):
|
||||
def _get_field(self, field): # TODO remove me (NEW in abstract)
|
||||
return r_object.hget(f'meta:{self.type}:{self.id}', field)
|
||||
|
||||
def _set_field(self, field, value):
|
||||
def _set_field(self, field, value): # TODO remove me (NEW in abstract)
|
||||
return r_object.hset(f'meta:{self.type}:{self.id}', field, value)
|
||||
|
||||
def get_first_seen(self, r_int=False):
|
||||
|
@@ -71,8 +71,8 @@ class AbstractDaterangeObject(AbstractObject, ABC):
|
|||
else:
|
||||
return last_seen
|
||||
|
||||
def get_nb_seen(self):
|
||||
return self.get_nb_correlation('item')
|
||||
def get_nb_seen(self): # TODO REPLACE ME -> correlation image
|
||||
return self.get_nb_correlation('item') + self.get_nb_correlation('message')
|
||||
|
||||
def get_nb_seen_by_date(self, date):
|
||||
nb = r_object.zscore(f'{self.type}:date:{date}', self.id)
|
||||
|
@@ -82,9 +82,10 @@ class AbstractDaterangeObject(AbstractObject, ABC):
|
|||
return int(nb)
|
||||
|
||||
def _get_meta(self, options=[]):
|
||||
meta_dict = {'first_seen': self.get_first_seen(),
|
||||
'last_seen': self.get_last_seen(),
|
||||
'nb_seen': self.get_nb_seen()}
|
||||
meta_dict = self.get_default_meta()
|
||||
meta_dict['first_seen'] = self.get_first_seen()
|
||||
meta_dict['last_seen'] = self.get_last_seen()
|
||||
meta_dict['nb_seen'] = self.get_nb_seen()
|
||||
if 'sparkline' in options:
|
||||
meta_dict['sparkline'] = self.get_sparkline()
|
||||
return meta_dict
|
||||
|
@@ -124,9 +125,7 @@ class AbstractDaterangeObject(AbstractObject, ABC):
|
|||
def _add_create(self):
|
||||
r_object.sadd(f'{self.type}:all', self.id)
|
||||
|
||||
# TODO don't increase nb if same hash in item with different encoding
|
||||
# if hash already in item
|
||||
def _add(self, date, item_id):
|
||||
def _add(self, date, obj): # TODO OBJ=None
|
||||
if not self.exists():
|
||||
self._add_create()
|
||||
self.set_first_seen(date)
|
||||
|
@@ -135,15 +134,21 @@ class AbstractDaterangeObject(AbstractObject, ABC):
|
|||
self.update_daterange(date)
|
||||
update_obj_date(date, self.type)
|
||||
|
||||
# NB Object seen by day
|
||||
if not self.is_correlated('item', '', item_id): # if decoded not already in object
|
||||
r_object.zincrby(f'{self.type}:date:{date}', 1, self.id)
|
||||
r_object.zincrby(f'{self.type}:date:{date}', 1, self.id)
|
||||
|
||||
# Correlations
|
||||
self.add_correlation('item', '', item_id)
|
||||
if is_crawled(item_id): # Domain
|
||||
domain = get_item_domain(item_id)
|
||||
self.add_correlation('domain', '', domain)
|
||||
if obj:
|
||||
# Correlations
|
||||
self.add_correlation(obj.type, obj.get_subtype(r_str=True), obj.get_id())
|
||||
|
||||
if obj.type == 'item':
|
||||
item_id = obj.get_id()
|
||||
# domain
|
||||
if is_crawled(item_id):
|
||||
domain = get_item_domain(item_id)
|
||||
self.add_correlation('domain', '', domain)
|
||||
|
||||
def add(self, date, obj):
|
||||
self._add(date, obj)
|
||||
|
||||
# TODO:ADD objects + Stats
|
||||
def _create(self, first_seen=None, last_seen=None):
|
||||
|
@@ -163,16 +168,21 @@ class AbstractDaterangeObjects(ABC):
|
|||
Abstract Daterange Objects
|
||||
"""
|
||||
|
||||
def __init__(self, obj_type):
|
||||
def __init__(self, obj_type, obj_class):
|
||||
""" Abstract for Daterange Objects
|
||||
|
||||
:param obj_type: object type (item, ...)
|
||||
:param obj_class: object python class (Item, ...)
|
||||
"""
|
||||
self.type = obj_type
|
||||
self.obj_class = obj_class
|
||||
|
||||
def get_all(self):
|
||||
def get_ids(self):
|
||||
return r_object.smembers(f'{self.type}:all')
|
||||
|
||||
# def get_ids_iterator(self):
|
||||
# return r_object.sscan_iter(r_object, f'{self.type}:all')
|
||||
|
||||
def get_by_date(self, date):
|
||||
return r_object.zrange(f'{self.type}:date:{date}', 0, -1)
|
||||
|
||||
|
@@ -185,35 +195,61 @@ class AbstractDaterangeObjects(ABC):
|
|||
obj_ids = obj_ids | set(self.get_by_date(date))
|
||||
return obj_ids
|
||||
|
||||
@abstractmethod
|
||||
def get_metas(self, obj_ids, options=set()):
|
||||
pass
|
||||
|
||||
def _get_metas(self, obj_class_ref, obj_ids, options=set()):
|
||||
dict_obj = {}
|
||||
for obj_id in obj_ids:
|
||||
obj = obj_class_ref(obj_id)
|
||||
obj = self.obj_class(obj_id)
|
||||
dict_obj[obj_id] = obj.get_meta(options=options)
|
||||
return dict_obj
|
||||
|
||||
@abstractmethod
|
||||
def sanitize_name_to_search(self, name_to_search):
|
||||
return name_to_search
|
||||
def sanitize_id_to_search(self, id_to_search):
|
||||
return id_to_search
|
||||
|
||||
def search_by_name(self, name_to_search, r_pos=False):
|
||||
def search_by_id(self, name_to_search, r_pos=False, case_sensitive=True):
|
||||
objs = {}
|
||||
if case_sensitive:
|
||||
flags = 0
|
||||
else:
|
||||
flags = re.IGNORECASE
|
||||
# for subtype in subtypes:
|
||||
r_name = self.sanitize_name_to_search(name_to_search)
|
||||
r_name = self.sanitize_id_to_search(name_to_search)
|
||||
if not name_to_search or isinstance(r_name, dict):
|
||||
return objs
|
||||
r_name = re.compile(r_name)
|
||||
for title_name in self.get_all():
|
||||
res = re.search(r_name, title_name)
|
||||
r_name = re.compile(r_name, flags=flags)
|
||||
for obj_id in self.get_ids(): # TODO REPLACE ME WITH AN ITERATOR
|
||||
res = re.search(r_name, obj_id)
|
||||
if res:
|
||||
objs[title_name] = {}
|
||||
objs[obj_id] = {}
|
||||
if r_pos:
|
||||
objs[title_name]['hl-start'] = res.start()
|
||||
objs[title_name]['hl-end'] = res.end()
|
||||
objs[obj_id]['hl-start'] = res.start()
|
||||
objs[obj_id]['hl-end'] = res.end()
|
||||
return objs
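A small sketch of the highlight positions returned above (object id assumed): with case_sensitive=False the pattern is compiled with re.IGNORECASE, and hl-start / hl-end delimit the matched slice of the id.

import re

pattern = re.compile('f7d57b', flags=re.IGNORECASE)   # case_sensitive=False
obj_id = 'AB01F7D57Bce90'                             # assumed object id
res = re.search(pattern, obj_id)
if res:
    hl = {'hl-start': res.start(), 'hl-end': res.end()}
    print(obj_id[hl['hl-start']:hl['hl-end']])        # 'F7D57B'
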
|
||||
|
||||
def sanitize_content_to_search(self, content_to_search):
|
||||
return content_to_search
|
||||
|
||||
def search_by_content(self, content_to_search, r_pos=False, case_sensitive=True):
|
||||
objs = {}
|
||||
if case_sensitive:
|
||||
flags = 0
|
||||
else:
|
||||
flags = re.IGNORECASE
|
||||
# for subtype in subtypes:
|
||||
r_search = self.sanitize_content_to_search(content_to_search)
|
||||
if not r_search or isinstance(r_search, dict):
|
||||
return objs
|
||||
r_search = re.compile(r_search, flags=flags)
|
||||
for obj_id in self.get_ids(): # TODO REPLACE ME WITH AN ITERATOR
|
||||
obj = self.obj_class(obj_id)
|
||||
content = obj.get_content()
|
||||
res = re.search(r_search, content)
|
||||
if res:
|
||||
objs[obj_id] = {}
|
||||
if r_pos: # TODO ADD CONTENT ????
|
||||
objs[obj_id]['hl-start'] = res.start()
|
||||
objs[obj_id]['hl-end'] = res.end()
|
||||
objs[obj_id]['content'] = content
|
||||
return objs
|
||||
|
||||
def api_get_chart_nb_by_daterange(self, date_from, date_to):
|
||||
|
@@ -226,5 +262,4 @@ class AbstractDaterangeObjects(ABC):
|
|||
|
||||
def api_get_meta_by_daterange(self, date_from, date_to):
|
||||
date = Date.sanitise_date_range(date_from, date_to)
|
||||
return self.get_metas(self.get_by_daterange(date['date_from'], date['date_to']), options={'sparkline'})
|
||||
|
||||
return self.get_metas(self.get_by_daterange(date['date_from'], date['date_to']), options={'sparkline'})
|
|
@@ -20,13 +20,21 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from lib import ail_logger
|
||||
from lib import Tag
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib import Duplicate
|
||||
from lib.correlations_engine import get_nb_correlations, get_correlations, add_obj_correlation, delete_obj_correlation, delete_obj_correlations, exists_obj_correlation, is_obj_correlated, get_nb_correlation_by_correl_type
|
||||
from lib.correlations_engine import get_nb_correlations, get_correlations, add_obj_correlation, delete_obj_correlation, delete_obj_correlations, exists_obj_correlation, is_obj_correlated, get_nb_correlation_by_correl_type, get_obj_inter_correlation
|
||||
from lib.Investigations import is_object_investigated, get_obj_investigations, delete_obj_investigations
|
||||
from lib.relationships_engine import get_obj_nb_relationships, add_obj_relationship
|
||||
from lib.Language import get_obj_translation
|
||||
from lib.Tracker import is_obj_tracked, get_obj_trackers, delete_obj_trackers
|
||||
|
||||
logging.config.dictConfig(ail_logger.get_config(name='ail'))
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
# r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
r_object = config_loader.get_db_conn("Kvrocks_Objects")
|
||||
config_loader = None
|
||||
|
||||
class AbstractObject(ABC):
|
||||
"""
|
||||
Abstract Object
|
||||
|
@@ -59,14 +67,28 @@ class AbstractObject(ABC):
|
|||
def get_global_id(self):
|
||||
return f'{self.get_type()}:{self.get_subtype(r_str=True)}:{self.get_id()}'
|
||||
|
||||
def get_default_meta(self, tags=False):
|
||||
def get_default_meta(self, tags=False, link=False):
|
||||
dict_meta = {'id': self.get_id(),
|
||||
'type': self.get_type(),
|
||||
'subtype': self.get_subtype()}
|
||||
'subtype': self.get_subtype(r_str=True)}
|
||||
if tags:
|
||||
dict_meta['tags'] = self.get_tags()
|
||||
if link:
|
||||
dict_meta['link'] = self.get_link()
|
||||
return dict_meta
|
||||
|
||||
def _get_field(self, field):
|
||||
if self.subtype is None:
|
||||
return r_object.hget(f'meta:{self.type}:{self.id}', field)
|
||||
else:
|
||||
return r_object.hget(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', field)
|
||||
|
||||
def _set_field(self, field, value):
|
||||
if self.subtype is None:
|
||||
return r_object.hset(f'meta:{self.type}:{self.id}', field, value)
|
||||
else:
|
||||
return r_object.hset(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', field, value)
|
||||
|
||||
## Tags ##
|
||||
def get_tags(self, r_list=False):
|
||||
tags = Tag.get_object_tags(self.type, self.id, self.get_subtype(r_str=True))
|
||||
|
@@ -198,6 +220,8 @@ class AbstractObject(ABC):
|
|||
else:
|
||||
return []
|
||||
|
||||
## Correlation ##
|
||||
|
||||
def _get_external_correlation(self, req_type, req_subtype, req_id, obj_type):
|
||||
"""
|
||||
Get object correlation
|
||||
|
@@ -248,8 +272,79 @@ class AbstractObject(ABC):
|
|||
return is_obj_correlated(self.type, self.subtype, self.id,
|
||||
object2.get_type(), object2.get_subtype(r_str=True), object2.get_id())
|
||||
|
||||
def get_correlation_iter(self, obj_type2, subtype2, obj_id2, correl_type):
|
||||
return get_obj_inter_correlation(self.type, self.get_subtype(r_str=True), self.id, obj_type2, subtype2, obj_id2, correl_type)
|
||||
|
||||
def get_correlation_iter_obj(self, object2, correl_type):
|
||||
return self.get_correlation_iter(object2.get_type(), object2.get_subtype(r_str=True), object2.get_id(), correl_type)
|
||||
|
||||
def delete_correlation(self, type2, subtype2, id2):
|
||||
"""
|
||||
Get object correlations
|
||||
"""
|
||||
delete_obj_correlation(self.type, self.subtype, self.id, type2, subtype2, id2)
|
||||
|
||||
## -Correlation- ##
|
||||
|
||||
## Relationship ##
|
||||
|
||||
def get_nb_relationships(self, filter=[]):
|
||||
return get_obj_nb_relationships(self.get_global_id())
|
||||
|
||||
def add_relationship(self, obj2_global_id, relationship, source=True):
|
||||
# is source
|
||||
if source:
|
||||
print(self.get_global_id(), obj2_global_id, relationship)
|
||||
add_obj_relationship(self.get_global_id(), obj2_global_id, relationship)
|
||||
# is target
|
||||
else:
|
||||
add_obj_relationship(obj2_global_id, self.get_global_id(), relationship)
|
||||
|
||||
## -Relationship- ##
|
||||
|
||||
## Translation ##
|
||||
|
||||
def translate(self, content=None, field='', source=None, target='en'):
|
||||
global_id = self.get_global_id()
|
||||
if not content:
|
||||
content = self.get_content()
|
||||
return get_obj_translation(global_id, content, field=field, source=source, target=target)
|
||||
|
||||
## -Translation- ##
|
||||
|
||||
## Parent ##
|
||||
|
||||
def is_parent(self):
|
||||
return r_object.exists(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')
|
||||
|
||||
def is_children(self):
|
||||
return r_object.hexists(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent')
|
||||
|
||||
def get_parent(self):
|
||||
return r_object.hget(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent')
|
||||
|
||||
def get_childrens(self):
|
||||
return r_object.smembers(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}')
|
||||
|
||||
def set_parent(self, obj_type=None, obj_subtype=None, obj_id=None, obj_global_id=None): # TODO # REMOVE ITEM DUP
|
||||
if not obj_global_id:
|
||||
if obj_subtype is None:
|
||||
obj_subtype = ''
|
||||
obj_global_id = f'{obj_type}:{obj_subtype}:{obj_id}'
|
||||
r_object.hset(f'meta:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', 'parent', obj_global_id)
|
||||
r_object.sadd(f'child:{obj_global_id}', self.get_global_id())
|
||||
|
||||
def add_children(self, obj_type=None, obj_subtype=None, obj_id=None, obj_global_id=None): # TODO # REMOVE ITEM DUP
|
||||
if not obj_global_id:
|
||||
if obj_subtype is None:
|
||||
obj_subtype = ''
|
||||
obj_global_id = f'{obj_type}:{obj_subtype}:{obj_id}'
|
||||
r_object.sadd(f'child:{self.type}:{self.get_subtype(r_str=True)}:{self.id}', obj_global_id)
|
||||
r_object.hset(f'meta:{obj_global_id}', 'parent', self.get_global_id())
|
||||
|
||||
## others objects ##
|
||||
def add_obj_children(self, parent_global_id, son_global_id):
|
||||
r_object.sadd(f'child:{parent_global_id}', son_global_id)
|
||||
r_object.hset(f'meta:{son_global_id}', 'parent', parent_global_id)
|
||||
|
||||
## Parent ##
|
||||
|
|
|
@@ -88,7 +88,10 @@ class AbstractSubtypeObject(AbstractObject, ABC):
|
|||
def _get_meta(self, options=None):
|
||||
if options is None:
|
||||
options = set()
|
||||
meta = {'first_seen': self.get_first_seen(),
|
||||
meta = {'id': self.id,
|
||||
'type': self.type,
|
||||
'subtype': self.subtype,
|
||||
'first_seen': self.get_first_seen(),
|
||||
'last_seen': self.get_last_seen(),
|
||||
'nb_seen': self.get_nb_seen()}
|
||||
if 'icon' in options:
|
||||
|
@@ -150,8 +153,11 @@ class AbstractSubtypeObject(AbstractObject, ABC):
|
|||
# => data Retention + efficient search
|
||||
#
|
||||
#
|
||||
def _add_subtype(self):
|
||||
r_object.sadd(f'all_{self.type}:subtypes', self.subtype)
|
||||
|
||||
def add(self, date, item_id):
|
||||
def add(self, date, obj=None):
|
||||
self._add_subtype()
|
||||
self.update_daterange(date)
|
||||
update_obj_date(date, self.type, self.subtype)
|
||||
# daily
|
||||
|
@@ -162,19 +168,21 @@ class AbstractSubtypeObject(AbstractObject, ABC):
|
|||
#######################################################################
|
||||
#######################################################################
|
||||
|
||||
# Correlations
|
||||
self.add_correlation('item', '', item_id)
|
||||
# domain
|
||||
if is_crawled(item_id):
|
||||
domain = get_item_domain(item_id)
|
||||
self.add_correlation('domain', '', domain)
|
||||
if obj:
|
||||
# Correlations
|
||||
self.add_correlation(obj.type, obj.get_subtype(r_str=True), obj.get_id())
|
||||
|
||||
if obj.type == 'item': # TODO same for message->chat ???
|
||||
item_id = obj.get_id()
|
||||
# domain
|
||||
if is_crawled(item_id):
|
||||
domain = get_item_domain(item_id)
|
||||
self.add_correlation('domain', '', domain)
|
||||
|
||||
# TODO:ADD objects + Stats
|
||||
def create(self, first_seen, last_seen):
|
||||
self.set_first_seen(first_seen)
|
||||
self.set_last_seen(last_seen)
|
||||
|
||||
# def create(self, first_seen, last_seen):
|
||||
# self.set_first_seen(first_seen)
|
||||
# self.set_last_seen(last_seen)
|
||||
|
||||
def _delete(self):
|
||||
pass
|
||||
|
|
|
@@ -1,6 +1,5 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
|
@ -11,17 +10,29 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.ail_core import get_all_objects, get_object_all_subtypes
|
||||
from lib import correlations_engine
|
||||
from lib import relationships_engine
|
||||
from lib import btc_ail
|
||||
from lib import Tag
|
||||
|
||||
from lib.objects import Chats
|
||||
from lib.objects import ChatSubChannels
|
||||
from lib.objects import ChatThreads
|
||||
from lib.objects import CryptoCurrencies
|
||||
from lib.objects import CookiesNames
|
||||
from lib.objects.Cves import Cve
|
||||
from lib.objects.Decodeds import Decoded, get_all_decodeds_objects, get_nb_decodeds_objects
|
||||
from lib.objects.Domains import Domain
|
||||
from lib.objects import Etags
|
||||
from lib.objects.Favicons import Favicon
|
||||
from lib.objects import FilesNames
|
||||
from lib.objects import HHHashs
|
||||
from lib.objects.Items import Item, get_all_items_objects, get_nb_items_objects
|
||||
from lib.objects import Images
|
||||
from lib.objects.Messages import Message
|
||||
from lib.objects import Pgps
|
||||
from lib.objects.Screenshots import Screenshot
|
||||
from lib.objects import Titles
|
||||
from lib.objects.UsersAccount import UserAccount
|
||||
from lib.objects import Usernames
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
|
@ -45,25 +56,49 @@ def sanitize_objs_types(objs):
|
|||
return l_types
|
||||
|
||||
|
||||
def get_object(obj_type, subtype, id):
|
||||
def get_object(obj_type, subtype, obj_id):
|
||||
if obj_type == 'item':
|
||||
return Item(id)
|
||||
return Item(obj_id)
|
||||
elif obj_type == 'domain':
|
||||
return Domain(id)
|
||||
return Domain(obj_id)
|
||||
elif obj_type == 'decoded':
|
||||
return Decoded(id)
|
||||
return Decoded(obj_id)
|
||||
elif obj_type == 'chat':
|
||||
return Chats.Chat(obj_id, subtype)
|
||||
elif obj_type == 'chat-subchannel':
|
||||
return ChatSubChannels.ChatSubChannel(obj_id, subtype)
|
||||
elif obj_type == 'chat-thread':
|
||||
return ChatThreads.ChatThread(obj_id, subtype)
|
||||
elif obj_type == 'cookie-name':
|
||||
return CookiesNames.CookieName(obj_id)
|
||||
elif obj_type == 'cve':
|
||||
return Cve(id)
|
||||
return Cve(obj_id)
|
||||
elif obj_type == 'etag':
|
||||
return Etags.Etag(obj_id)
|
||||
elif obj_type == 'favicon':
|
||||
return Favicon(obj_id)
|
||||
elif obj_type == 'file-name':
|
||||
return FilesNames.FileName(obj_id)
|
||||
elif obj_type == 'hhhash':
|
||||
return HHHashs.HHHash(obj_id)
|
||||
elif obj_type == 'image':
|
||||
return Images.Image(obj_id)
|
||||
elif obj_type == 'message':
|
||||
return Message(obj_id)
|
||||
elif obj_type == 'screenshot':
|
||||
return Screenshot(id)
|
||||
return Screenshot(obj_id)
|
||||
elif obj_type == 'cryptocurrency':
|
||||
return CryptoCurrencies.CryptoCurrency(id, subtype)
|
||||
return CryptoCurrencies.CryptoCurrency(obj_id, subtype)
|
||||
elif obj_type == 'pgp':
|
||||
return Pgps.Pgp(id, subtype)
|
||||
return Pgps.Pgp(obj_id, subtype)
|
||||
elif obj_type == 'title':
|
||||
return Titles.Title(id)
|
||||
return Titles.Title(obj_id)
|
||||
elif obj_type == 'user-account':
|
||||
return UserAccount(obj_id, subtype)
|
||||
elif obj_type == 'username':
|
||||
return Usernames.Username(id, subtype)
|
||||
return Usernames.Username(obj_id, subtype)
|
||||
else:
|
||||
raise Exception(f'Unknown AIL object: {obj_type} {subtype} {obj_id}')
|
||||
|
||||
def get_objects(objects):
|
||||
objs = set()
|
||||
|
@ -96,9 +131,12 @@ def get_obj_global_id(obj_type, subtype, obj_id):
|
|||
obj = get_object(obj_type, subtype, obj_id)
|
||||
return obj.get_global_id()
|
||||
|
||||
def get_obj_type_subtype_id_from_global_id(global_id):
|
||||
obj_type, subtype, obj_id = global_id.split(':', 2)
|
||||
return obj_type, subtype, obj_id
|
||||
|
||||
def get_obj_from_global_id(global_id):
|
||||
obj = global_id.split(':', 3)
|
||||
obj = get_obj_type_subtype_id_from_global_id(global_id)
|
||||
return get_object(obj[0], obj[1], obj[2])
|
||||
|
||||
|
||||
|
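A small illustration of the `type:subtype:id` global id convention the two helpers above rely on; the id below is a placeholder.

```python
# get_obj_type_subtype_id_from_global_id() boils down to this split: subtype is
# an empty string for objects without one, and split(':', 2) leaves any extra
# colons inside the object id untouched.
global_id = 'username:telegram:example_user'   # placeholder global id
obj_type, subtype, obj_id = global_id.split(':', 2)
print(obj_type, subtype or '<no subtype>', obj_id)
```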
@ -154,7 +192,7 @@ def get_objects_meta(objs, options=set(), flask_context=False):
|
|||
subtype = obj[1]
|
||||
obj_id = obj[2]
|
||||
else:
|
||||
obj_type, subtype, obj_id = obj.split(':', 2)
|
||||
obj_type, subtype, obj_id = get_obj_type_subtype_id_from_global_id(obj)
|
||||
metas.append(get_object_meta(obj_type, subtype, obj_id, options=options, flask_context=flask_context))
|
||||
return metas
|
||||
|
||||
|
@ -163,7 +201,7 @@ def get_object_card_meta(obj_type, subtype, id, related_btc=False):
|
|||
obj = get_object(obj_type, subtype, id)
|
||||
meta = obj.get_meta()
|
||||
meta['icon'] = obj.get_svg_icon()
|
||||
if subtype or obj_type == 'cve' or obj_type == 'title':
|
||||
if subtype or obj_type == 'cookie-name' or obj_type == 'cve' or obj_type == 'etag' or obj_type == 'title' or obj_type == 'favicon' or obj_type == 'hhhash':
|
||||
meta['sparkline'] = obj.get_sparkline()
|
||||
if obj_type == 'cve':
|
||||
meta['cve_search'] = obj.get_cve_search()
|
||||
|
@ -172,6 +210,8 @@ def get_object_card_meta(obj_type, subtype, id, related_btc=False):
|
|||
if subtype == 'bitcoin' and related_btc:
|
||||
meta["related_btc"] = btc_ail.get_bitcoin_info(obj.id)
|
||||
if obj.get_type() == 'decoded':
|
||||
meta['mimetype'] = obj.get_mimetype()
|
||||
meta['size'] = obj.get_size()
|
||||
meta["vt"] = obj.get_meta_vt()
|
||||
meta["vt"]["status"] = obj.is_vt_enabled()
|
||||
# TAGS MODAL
|
||||
|
@ -328,8 +368,8 @@ def get_obj_correlations(obj_type, subtype, obj_id):
|
|||
obj = get_object(obj_type, subtype, obj_id)
|
||||
return obj.get_correlations()
|
||||
|
||||
def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max):
|
||||
if len(objs) < nb_max or nb_max == -1:
|
||||
def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max, objs_hidden):
|
||||
if len(objs) < nb_max or nb_max == 0:
|
||||
if lvl == 0:
|
||||
objs.add((obj_type, subtype, obj_id))
|
||||
|
||||
|
@ -341,15 +381,17 @@ def _get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lv
|
|||
for obj2_type in correlations:
|
||||
for str_obj in correlations[obj2_type]:
|
||||
obj2_subtype, obj2_id = str_obj.split(':', 1)
|
||||
_get_obj_correlations_objs(objs, obj2_type, obj2_subtype, obj2_id, filter_types, lvl, nb_max)
|
||||
if get_obj_global_id(obj2_type, obj2_subtype, obj2_id) in objs_hidden:
|
||||
continue # filter object to hide
|
||||
_get_obj_correlations_objs(objs, obj2_type, obj2_subtype, obj2_id, filter_types, lvl, nb_max, objs_hidden)
|
||||
|
||||
def get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=[], lvl=0, nb_max=300):
|
||||
def get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=[], lvl=0, nb_max=300, objs_hidden=set()):
|
||||
objs = set()
|
||||
_get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max)
|
||||
_get_obj_correlations_objs(objs, obj_type, subtype, obj_id, filter_types, lvl, nb_max, objs_hidden)
|
||||
return objs
|
||||
|
||||
def obj_correlations_objs_add_tags(obj_type, subtype, obj_id, tags, filter_types=[], lvl=0, nb_max=300):
|
||||
objs = get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=filter_types, lvl=lvl, nb_max=nb_max)
|
||||
def obj_correlations_objs_add_tags(obj_type, subtype, obj_id, tags, filter_types=[], lvl=0, nb_max=300, objs_hidden=set()):
|
||||
objs = get_obj_correlations_objs(obj_type, subtype, obj_id, filter_types=filter_types, lvl=lvl, nb_max=nb_max, objs_hidden=objs_hidden)
|
||||
# print(objs)
|
||||
for obj_tuple in objs:
|
||||
obj1_type, subtype1, id1 = obj_tuple
|
||||
|
@ -390,7 +432,7 @@ def create_correlation_graph_links(links_set):
|
|||
def create_correlation_graph_nodes(nodes_set, obj_str_id, flask_context=True):
|
||||
graph_nodes_list = []
|
||||
for node_id in nodes_set:
|
||||
obj_type, subtype, obj_id = node_id.split(';', 2)
|
||||
obj_type, subtype, obj_id = get_obj_type_subtype_id_from_global_id(node_id)
|
||||
dict_node = {'id': node_id}
|
||||
dict_node['style'] = get_object_svg(obj_type, subtype, obj_id)
|
||||
|
||||
|
@ -411,17 +453,40 @@ def create_correlation_graph_nodes(nodes_set, obj_str_id, flask_context=True):
|
|||
|
||||
|
||||
def get_correlations_graph_node(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1,
|
||||
objs_hidden=set(),
|
||||
flask_context=False):
|
||||
obj_str_id, nodes, links = correlations_engine.get_correlations_graph_nodes_links(obj_type, subtype, obj_id,
|
||||
filter_types=filter_types,
|
||||
max_nodes=max_nodes, level=level,
|
||||
flask_context=flask_context)
|
||||
obj_str_id, nodes, links, meta = correlations_engine.get_correlations_graph_nodes_links(obj_type, subtype, obj_id,
|
||||
filter_types=filter_types,
|
||||
max_nodes=max_nodes, level=level,
|
||||
objs_hidden=objs_hidden,
|
||||
flask_context=flask_context)
|
||||
# print(meta)
|
||||
meta['objs'] = list(meta['objs'])
|
||||
return {"nodes": create_correlation_graph_nodes(nodes, obj_str_id, flask_context=flask_context),
|
||||
"links": create_correlation_graph_links(links)}
|
||||
"links": create_correlation_graph_links(links),
|
||||
"meta": meta}
|
||||
|
||||
|
||||
# --- CORRELATION --- #
|
||||
|
||||
def get_obj_nb_relationships(obj_type, subtype, obj_id, filter_types=[]):
|
||||
obj = get_object(obj_type, subtype, obj_id)
|
||||
return obj.get_nb_relationships(filter=filter_types)
|
||||
|
||||
def get_relationships_graph_node(obj_type, subtype, obj_id, filter_types=[], max_nodes=300, level=1,
|
||||
objs_hidden=set(),
|
||||
flask_context=False):
|
||||
obj_global_id = get_obj_global_id(obj_type, subtype, obj_id)
|
||||
nodes, links, meta = relationships_engine.get_relationship_graph(obj_global_id,
|
||||
filter_types=filter_types,
|
||||
max_nodes=max_nodes, level=level,
|
||||
objs_hidden=objs_hidden)
|
||||
# print(meta)
|
||||
meta['objs'] = list(meta['objs'])
|
||||
return {"nodes": create_correlation_graph_nodes(nodes, obj_global_id, flask_context=flask_context),
|
||||
"links": links,
|
||||
"meta": meta}
|
||||
|
||||
|
||||
# if __name__ == '__main__':
|
||||
# r = get_objects([{'lvl': 1, 'type': 'item', 'subtype': '', 'id': 'crawled/2020/09/14/circl.lu0f4976a4-dda4-4189-ba11-6618c4a8c951'}])
|
||||
|
|
|
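A hedged usage sketch of the extended correlation graph helper above. It assumes these functions live in `bin/lib/ail_objects.py`, an initialized AIL environment, and placeholder object ids.

```python
from lib import ail_objects   # assumed import path for the helpers above

hidden = set()   # optionally, global ids (as returned by get_obj_global_id) to hide
graph = ail_objects.get_correlations_graph_node(
    'domain', '', 'circl.lu',                  # placeholder object
    filter_types=['decoded', 'screenshot'],
    max_nodes=100, level=2,
    objs_hidden=hidden,
    flask_context=False)

# The helper now returns the graph "meta" alongside the D3-style nodes/links;
# meta['objs'] lists every global id visited during the traversal.
print(len(graph['nodes']), len(graph['links']))
print(graph['meta']['objs'])
```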
@ -113,6 +113,34 @@ def regex_finditer(r_key, regex, item_id, content, max_time=30):
|
|||
proc.terminate()
|
||||
sys.exit(0)
|
||||
|
||||
def _regex_match(r_key, regex, content):
|
||||
if re.match(regex, content):
|
||||
r_serv_cache.set(r_key, 1)
|
||||
r_serv_cache.expire(r_key, 360)
|
||||
|
||||
def regex_match(r_key, regex, item_id, content, max_time=30):
|
||||
proc = Proc(target=_regex_match, args=(r_key, regex, content))
|
||||
try:
|
||||
proc.start()
|
||||
proc.join(max_time)
|
||||
if proc.is_alive():
|
||||
proc.terminate()
|
||||
# Statistics.incr_module_timeout_statistic(r_key)
|
||||
err_mess = f"{r_key}: processing timeout: {item_id}"
|
||||
logger.info(err_mess)
|
||||
return False
|
||||
else:
|
||||
if r_serv_cache.exists(r_key):
|
||||
r_serv_cache.delete(r_key)
|
||||
return True
|
||||
else:
|
||||
r_serv_cache.delete(r_key)
|
||||
return False
|
||||
except KeyboardInterrupt:
|
||||
print("Caught KeyboardInterrupt, terminating regex worker")
|
||||
proc.terminate()
|
||||
sys.exit(0)
|
||||
|
||||
def _regex_search(r_key, regex, content):
|
||||
if re.search(regex, content):
|
||||
r_serv_cache.set(r_key, 1)
|
||||
|
|
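A hedged caller sketch for the new `regex_match()` helper above; the cache-key helper and module path are assumptions based on how other AIL modules use this file.

```python
from lib import regex_helper   # assumed module path for the helper shown above

r_key = regex_helper.generate_redis_cache_key('MyModule')   # assumed key helper
content = 'BEGIN PGP SIGNATURE ...'

# regex_match() runs re.match() in a separate process and kills it after
# max_time seconds, returning True only if the anchored match succeeded.
if regex_helper.regex_match(r_key, r'BEGIN PGP', 'placeholder_item_id', content, max_time=30):
    print('matched within the time budget')
```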
111 bin/lib/relationships_engine.py (Executable file)
|
@ -0,0 +1,111 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_rel = config_loader.get_db_conn("Kvrocks_Relationships")
|
||||
config_loader = None
|
||||
|
||||
|
||||
RELATIONSHIPS = {
|
||||
"forward",
|
||||
"mention"
|
||||
}
|
||||
def get_relationships():
|
||||
return RELATIONSHIPS
|
||||
|
||||
|
||||
def get_obj_relationships_by_type(obj_global_id, relationship):
|
||||
return r_rel.smembers(f'rel:{relationship}:{obj_global_id}')
|
||||
|
||||
def get_obj_nb_relationships_by_type(obj_global_id, relationship):
|
||||
return r_rel.scard(f'rel:{relationship}:{obj_global_id}')
|
||||
|
||||
def get_obj_relationships(obj_global_id):
|
||||
relationships = []
|
||||
for relationship in get_relationships():
|
||||
for rel in get_obj_relationships_by_type(obj_global_id, relationship):
|
||||
meta = {'relationship': relationship}
|
||||
direction, obj_id = rel.split(':', 1)
|
||||
if direction == 'i':
|
||||
meta['source'] = obj_id
|
||||
meta['target'] = obj_global_id
|
||||
else:
|
||||
meta['target'] = obj_id
|
||||
meta['source'] = obj_global_id
|
||||
|
||||
if not obj_id.startswith('chat'):
|
||||
continue
|
||||
|
||||
meta['id'] = obj_id
|
||||
# meta['direction'] = direction
|
||||
relationships.append(meta)
|
||||
return relationships
|
||||
|
||||
def get_obj_nb_relationships(obj_global_id):
|
||||
nb = {}
|
||||
for relationship in get_relationships():
|
||||
nb[relationship] = get_obj_nb_relationships_by_type(obj_global_id, relationship)
|
||||
return nb
|
||||
|
||||
|
||||
# TODO Filter by obj type ???
|
||||
def add_obj_relationship(source, target, relationship):
|
||||
r_rel.sadd(f'rel:{relationship}:{source}', f'o:{target}')
|
||||
r_rel.sadd(f'rel:{relationship}:{target}', f'i:{source}')
|
||||
# r_rel.sadd(f'rels:{source}', relationship)
|
||||
# r_rel.sadd(f'rels:{target}', relationship)
|
||||
|
||||
|
||||
def get_relationship_graph(obj_global_id, filter_types=[], max_nodes=300, level=1, objs_hidden=set()):
|
||||
links = []
|
||||
nodes = set()
|
||||
meta = {'complete': True, 'objs': set()}
|
||||
done = set()
|
||||
done_link = set()
|
||||
|
||||
_get_relationship_graph(obj_global_id, links, nodes, meta, level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, done=done, done_link=done_link)
|
||||
return nodes, links, meta
|
||||
|
||||
def _get_relationship_graph(obj_global_id, links, nodes, meta, level, max_nodes, filter_types=[], objs_hidden=set(), done=set(), done_link=set()):
|
||||
meta['objs'].add(obj_global_id)
|
||||
nodes.add(obj_global_id)
|
||||
|
||||
for rel in get_obj_relationships(obj_global_id):
|
||||
meta['objs'].add(rel['id'])
|
||||
|
||||
if rel['id'] in done:
|
||||
continue
|
||||
|
||||
if len(nodes) > max_nodes != 0:
|
||||
meta['complete'] = False
|
||||
break
|
||||
|
||||
nodes.add(rel['id'])
|
||||
|
||||
str_link = f"{rel['source']}{rel['target']}{rel['relationship']}"
|
||||
if str_link not in done_link:
|
||||
links.append({"source": rel['source'], "target": rel['target'], "relationship": rel['relationship']})
|
||||
done_link.add(str_link)
|
||||
|
||||
if level > 0:
|
||||
next_level = level - 1
|
||||
|
||||
_get_relationship_graph(rel['id'], links, nodes, meta, next_level, max_nodes, filter_types=filter_types, objs_hidden=objs_hidden, done=done, done_link=done_link)
|
||||
|
||||
# done.add(rel['id'])
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
source = ''
|
||||
target = ''
|
||||
add_obj_relationship(source, target, 'forward')
|
||||
# print(get_obj_relationships(source))
|
212 bin/lib/timeline_engine.py (Executable file)
|
@ -0,0 +1,212 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
from uuid import uuid4
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
r_meta = config_loader.get_db_conn("Kvrocks_Timeline")
|
||||
config_loader = None
|
||||
|
||||
# CORRELATION_TYPES_BY_OBJ = {
|
||||
# "chat": ["item", "username"], # item ???
|
||||
# "cookie-name": ["domain"],
|
||||
# "cryptocurrency": ["domain", "item"],
|
||||
# "cve": ["domain", "item"],
|
||||
# "decoded": ["domain", "item"],
|
||||
# "domain": ["cve", "cookie-name", "cryptocurrency", "decoded", "etag", "favicon", "hhhash", "item", "pgp", "title", "screenshot", "username"],
|
||||
# "etag": ["domain"],
|
||||
# "favicon": ["domain", "item"],
|
||||
# "hhhash": ["domain"],
|
||||
# "item": ["chat", "cve", "cryptocurrency", "decoded", "domain", "favicon", "pgp", "screenshot", "title", "username"],
|
||||
# "pgp": ["domain", "item"],
|
||||
# "screenshot": ["domain", "item"],
|
||||
# "title": ["domain", "item"],
|
||||
# "username": ["chat", "domain", "item"],
|
||||
# }
|
||||
#
|
||||
# def get_obj_correl_types(obj_type):
|
||||
# return CORRELATION_TYPES_BY_OBJ.get(obj_type)
|
||||
|
||||
# def sanityze_obj_correl_types(obj_type, correl_types):
|
||||
# obj_correl_types = get_obj_correl_types(obj_type)
|
||||
# if correl_types:
|
||||
# correl_types = set(correl_types).intersection(obj_correl_types)
|
||||
# if not correl_types:
|
||||
# correl_types = obj_correl_types
|
||||
# if not correl_types:
|
||||
# return []
|
||||
# return correl_types
|
||||
|
||||
class Timeline:
|
||||
|
||||
def __init__(self, global_id, name):
|
||||
self.id = global_id
|
||||
self.name = name
|
||||
|
||||
def _get_block_obj_global_id(self, block):
|
||||
return r_meta.hget(f'block:{self.id}:{self.name}', block)
|
||||
|
||||
def _set_block_obj_global_id(self, block, global_id):
|
||||
return r_meta.hset(f'block:{self.id}:{self.name}', block, global_id)
|
||||
|
||||
def _get_block_timestamp(self, block, position):
|
||||
return r_meta.zscore(f'line:{self.id}:{self.name}', f'{position}:{block}')
|
||||
|
||||
def _get_nearest_bloc_inf(self, timestamp):
|
||||
inf = r_meta.zrevrangebyscore(f'line:{self.id}:{self.name}', float(timestamp), 0, start=0, num=1, withscores=True)
|
||||
if inf:
|
||||
inf, score = inf[0]
|
||||
if inf.startswith('end'):
|
||||
inf_key = f'start:{inf[4:]}'
|
||||
inf_score = r_meta.zscore(f'line:{self.id}:{self.name}', inf_key)
|
||||
if inf_score == score:
|
||||
inf = inf_key
|
||||
return inf
|
||||
else:
|
||||
return None
|
||||
|
||||
def _get_nearest_bloc_sup(self, timestamp):
|
||||
sup = r_meta.zrangebyscore(f'line:{self.id}:{self.name}', float(timestamp), '+inf', start=0, num=1, withscores=True)
|
||||
if sup:
|
||||
sup, score = sup[0]
|
||||
if sup.startswith('start'):
|
||||
sup_key = f'end:{sup[6:]}'
|
||||
sup_score = r_meta.zscore(f'line:{self.id}:{self.name}', sup_key)
|
||||
if score == sup_score:
|
||||
sup = sup_key
|
||||
return sup
|
||||
else:
|
||||
return None
|
||||
|
||||
def get_first_obj_id(self):
|
||||
first = r_meta.zrange(f'line:{self.id}:{self.name}', 0, 0)
|
||||
if first: # start:block
|
||||
first = first[0]
|
||||
if first.startswith('start:'):
|
||||
first = first[6:]
|
||||
else:
|
||||
first = first[4:]
|
||||
return self._get_block_obj_global_id(first)
|
||||
|
||||
def get_last_obj_id(self):
|
||||
last = r_meta.zrevrange(f'line:{self.id}:{self.name}', 0, 0)
|
||||
if last: # end:block
|
||||
last = last[0]
|
||||
if last.startswith('end:'):
|
||||
last = last[4:]
|
||||
else:
|
||||
last = last[6:]
|
||||
return self._get_block_obj_global_id(last)
|
||||
|
||||
def get_objs_ids(self):
|
||||
objs = set()
|
||||
for block in r_meta.zrange(f'line:{self.id}:{self.name}', 0, -1):
|
||||
if block:
|
||||
if block.startswith('start:'):
|
||||
objs.add(self._get_block_obj_global_id(block[6:]))
|
||||
return objs
|
||||
|
||||
# def get_objs_ids(self):
|
||||
# objs = {}
|
||||
# last_obj_id = None
|
||||
# for block, timestamp in r_meta.zrange(f'line:{self.id}:{self.name}', 0, -1, withscores=True):
|
||||
# if block:
|
||||
# if block.startswith('start:'):
|
||||
# last_obj_id = self._get_block_obj_global_id(block[6:])
|
||||
# objs[last_obj_id] = {'first_seen': timestamp}
|
||||
# else:
|
||||
# objs[last_obj_id]['last_seen'] = timestamp
|
||||
# return objs
|
||||
|
||||
def _update_bloc(self, block, position, timestamp):
|
||||
r_meta.zadd(f'line:{self.id}:{self.name}', {f'{position}:{block}': timestamp})
|
||||
|
||||
def _add_bloc(self, obj_global_id, timestamp, end=None):
|
||||
if end:
|
||||
timestamp_end = end
|
||||
else:
|
||||
timestamp_end = timestamp
|
||||
new_bloc = str(uuid4())
|
||||
r_meta.zadd(f'line:{self.id}:{self.name}', {f'start:{new_bloc}': timestamp, f'end:{new_bloc}': timestamp_end})
|
||||
self._set_block_obj_global_id(new_bloc, obj_global_id)
|
||||
return new_bloc
|
||||
|
||||
def add_timestamp(self, timestamp, obj_global_id):
|
||||
inf = self._get_nearest_bloc_inf(timestamp)
|
||||
sup = self._get_nearest_bloc_sup(timestamp)
|
||||
if not inf and not sup:
|
||||
# create new bloc
|
||||
new_bloc = self._add_bloc(obj_global_id, timestamp)
|
||||
return new_bloc
|
||||
# timestamp < first_seen
|
||||
elif not inf:
|
||||
sup_pos, sup_id = sup.split(':')
|
||||
sup_obj = self._get_block_obj_global_id(sup_id)
|
||||
if sup_obj == obj_global_id:
|
||||
self._update_bloc(sup_id, 'start', timestamp)
|
||||
# create new bloc
|
||||
else:
|
||||
new_bloc = self._add_bloc(obj_global_id, timestamp)
|
||||
return new_bloc
|
||||
|
||||
# timestamp > first_seen
|
||||
elif not sup:
|
||||
inf_pos, inf_id = inf.split(':')
|
||||
inf_obj = self._get_block_obj_global_id(inf_id)
|
||||
if inf_obj == obj_global_id:
|
||||
self._update_bloc(inf_id, 'end', timestamp)
|
||||
# create new bloc
|
||||
else:
|
||||
new_bloc = self._add_bloc(obj_global_id, timestamp)
|
||||
return new_bloc
|
||||
|
||||
else:
|
||||
inf_pos, inf_id = inf.split(':')
|
||||
sup_pos, sup_id = sup.split(':')
|
||||
inf_obj = self._get_block_obj_global_id(inf_id)
|
||||
|
||||
if inf_id == sup_id:
|
||||
# reduce bloc + create two new bloc
|
||||
if obj_global_id != inf_obj:
|
||||
# get end timestamp
|
||||
sup_timestamp = self._get_block_timestamp(sup_id, 'end')
|
||||
# reduce original bloc
|
||||
self._update_bloc(inf_id, 'end', timestamp - 1)
|
||||
# Insert new bloc
|
||||
new_bloc = self._add_bloc(obj_global_id, timestamp)
|
||||
# Recreate end of the first bloc by a new bloc
|
||||
self._add_bloc(inf_obj, timestamp + 1, end=sup_timestamp)
|
||||
return new_bloc
|
||||
|
||||
# timestamp in existing bloc
|
||||
else:
|
||||
return inf_id
|
||||
|
||||
# different blocs: expand sup/inf bloc or create a new bloc if needed
|
||||
elif inf_pos == 'end' and sup_pos == 'start':
|
||||
# Extend inf bloc
|
||||
if obj_global_id == inf_obj:
|
||||
self._update_bloc(inf_id, 'end', timestamp)
|
||||
return inf_id
|
||||
|
||||
sup_obj = self._get_block_obj_global_id(sup_id)
|
||||
# Extend sup bloc
|
||||
if obj_global_id == sup_obj:
|
||||
self._update_bloc(sup_id, 'start', timestamp)
|
||||
return sup_id
|
||||
|
||||
# create new bloc
|
||||
new_bloc = self._add_bloc(obj_global_id, timestamp)
|
||||
return new_bloc
|
||||
|
||||
# inf_pos == 'start' and sup_pos == 'end'
|
||||
# else raise error ???
|
|
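A hedged sketch of the Timeline block logic above, assuming an initialized AIL environment with the `Kvrocks_Timeline` database configured; ids and timestamps are placeholders.

```python
from lib.timeline_engine import Timeline

# One timeline per (object, field): here the username history of a chat.
timeline = Timeline('chat:telegram:placeholder_chat', 'username')

timeline.add_timestamp(1700000000, 'username:telegram:old_name')
timeline.add_timestamp(1700000100, 'username:telegram:old_name')  # extends the same block
timeline.add_timestamp(1700000200, 'username:telegram:new_name')  # opens a new block

# Blocks are stored as start:<uuid>/end:<uuid> markers in a sorted set, so
# consecutive timestamps of the same object collapse into a single block.
print(timeline.get_objs_ids())
```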
@ -47,8 +47,8 @@ class ApiKey(AbstractModule):
|
|||
self.logger.info(f"Module {self.module_name} initialized")
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
item_id, score = message.split()
|
||||
item = Item(item_id)
|
||||
score = message
|
||||
item = self.get_obj()
|
||||
item_content = item.get_content()
|
||||
|
||||
google_api_key = self.regex_findall(self.re_google_api_key, item.get_id(), item_content, r_set=True)
|
||||
|
@ -63,8 +63,8 @@ class ApiKey(AbstractModule):
|
|||
print(f'found google api key: {to_print}')
|
||||
self.redis_logger.warning(f'{to_print}Checked {len(google_api_key)} found Google API Key;{item.get_id()}')
|
||||
|
||||
msg = f'infoleak:automatic-detection="google-api-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="google-api-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
# # TODO: # FIXME: AWS regex/validate/sanitize KEY + SECRET KEY
|
||||
if aws_access_key:
|
||||
|
@ -74,12 +74,12 @@ class ApiKey(AbstractModule):
|
|||
print(f'found AWS secret key')
|
||||
self.redis_logger.warning(f'{to_print}Checked {len(aws_secret_key)} found AWS secret Key;{item.get_id()}')
|
||||
|
||||
msg = f'infoleak:automatic-detection="aws-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="aws-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
# Tags
|
||||
msg = f'infoleak:automatic-detection="api-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="api-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
if r_result:
|
||||
return google_api_key, aws_access_key, aws_secret_key
|
||||
|
|
|
@ -6,14 +6,14 @@ The ZMQ_PubSub_Categ Module
|
|||
|
||||
Each words files created under /files/ are representing categories.
|
||||
This modules take these files and compare them to
|
||||
the content of an item.
|
||||
the content of an obj.
|
||||
|
||||
When a word from a item match one or more of these words file, the filename of
|
||||
the item / zhe item id is published/forwarded to the next modules.
|
||||
When a word from a obj match one or more of these words file, the filename of
|
||||
the obj / the obj id is published/forwarded to the next modules.
|
||||
|
||||
Each category (each files) are representing a dynamic channel.
|
||||
This mean that if you create 1000 files under /files/ you'll have 1000 channels
|
||||
where every time there is a matching word to a category, the item containing
|
||||
where every time there is a matching word to a category, the obj containing
|
||||
this word will be pushed to this specific channel.
|
||||
|
||||
..note:: The channel will have the name of the file created.
|
||||
|
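A self-contained sketch of the matching rule described in this docstring (not the module itself): each words file under `files/` becomes one channel named after the file, and the match count is forwarded once it reaches the threshold. The word list and threshold below are placeholders.

```python
import re

categ_words = {
    'credential': re.compile(r'password|login|secret', re.IGNORECASE),  # placeholder word list
}
matching_threshold = 1
content = 'leaked login and password dump'

for categ, pattern in categ_words.items():
    found = set(pattern.findall(content))
    if len(found) >= matching_threshold:
        # In the module, the number of matches is sent to the <categ> queue.
        print(categ, len(found))
```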
@ -44,7 +44,6 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.Items import Item
|
||||
|
||||
|
||||
class Categ(AbstractModule):
|
||||
|
@ -81,27 +80,32 @@ class Categ(AbstractModule):
|
|||
self.categ_words = tmp_dict.items()
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
# Create Item Object
|
||||
item = Item(message)
|
||||
# Get item content
|
||||
content = item.get_content()
|
||||
# Get obj Object
|
||||
obj = self.get_obj()
|
||||
# Get obj content
|
||||
content = obj.get_content()
|
||||
categ_found = []
|
||||
|
||||
# Search for pattern categories in item content
|
||||
# Search for pattern categories in obj content
|
||||
for categ, pattern in self.categ_words:
|
||||
|
||||
found = set(re.findall(pattern, content))
|
||||
lenfound = len(found)
|
||||
if lenfound >= self.matchingThreshold:
|
||||
categ_found.append(categ)
|
||||
msg = f'{item.get_id()} {lenfound}'
|
||||
if obj.type == 'message':
|
||||
self.add_message_to_queue(message='0', queue=categ)
|
||||
else:
|
||||
|
||||
# Export message to categ queue
|
||||
print(msg, categ)
|
||||
self.add_message_to_queue(msg, categ)
|
||||
found = set(re.findall(pattern, content))
|
||||
lenfound = len(found)
|
||||
if lenfound >= self.matchingThreshold:
|
||||
categ_found.append(categ)
|
||||
msg = str(lenfound)
|
||||
|
||||
# Export message to categ queue
|
||||
print(msg, categ)
|
||||
self.add_message_to_queue(message=msg, queue=categ)
|
||||
|
||||
self.redis_logger.debug(
|
||||
f'Categ;{obj.get_source()};{obj.get_date()};{obj.get_basename()};Detected {lenfound} as {categ};{obj.get_id()}')
|
||||
|
||||
self.redis_logger.debug(
|
||||
f'Categ;{item.get_source()};{item.get_date()};{item.get_basename()};Detected {lenfound} as {categ};{item.get_id()}')
|
||||
if r_result:
|
||||
return categ_found
|
||||
|
||||
|
|
|
@ -29,7 +29,6 @@ Redis organization:
|
|||
import os
|
||||
import sys
|
||||
import time
|
||||
import re
|
||||
from datetime import datetime
|
||||
from pyfaup.faup import Faup
|
||||
|
||||
|
@ -85,8 +84,8 @@ class Credential(AbstractModule):
|
|||
|
||||
def compute(self, message):
|
||||
|
||||
item_id, count = message.split()
|
||||
item = Item(item_id)
|
||||
count = message
|
||||
item = self.get_obj()
|
||||
|
||||
item_content = item.get_content()
|
||||
|
||||
|
@ -111,8 +110,8 @@ class Credential(AbstractModule):
|
|||
print(f"========> Found more than 10 credentials in this file : {item.get_id()}")
|
||||
self.redis_logger.warning(to_print)
|
||||
|
||||
msg = f'infoleak:automatic-detection="credential";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="credential"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
site_occurrence = self.regex_findall(self.regex_site_for_stats, item.get_id(), item_content)
|
||||
|
||||
|
|
|
@ -68,8 +68,8 @@ class CreditCards(AbstractModule):
|
|||
return extracted
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
item_id, score = message.split()
|
||||
item = Item(item_id)
|
||||
score = message
|
||||
item = self.get_obj()
|
||||
content = item.get_content()
|
||||
all_cards = self.regex_findall(self.regex, item.id, content)
|
||||
|
||||
|
@ -90,8 +90,8 @@ class CreditCards(AbstractModule):
|
|||
print(mess)
|
||||
self.redis_logger.warning(mess)
|
||||
|
||||
msg = f'infoleak:automatic-detection="credit-card";{item.id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="credit-card"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
if r_result:
|
||||
return creditcard_set
|
||||
|
|
|
@ -114,7 +114,7 @@ class Cryptocurrencies(AbstractModule, ABC):
|
|||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
item_id = item.get_id()
|
||||
date = item.get_date()
|
||||
content = item.get_content()
|
||||
|
@ -130,18 +130,18 @@ class Cryptocurrencies(AbstractModule, ABC):
|
|||
if crypto.is_valid_address():
|
||||
# print(address)
|
||||
is_valid_address = True
|
||||
crypto.add(date, item_id)
|
||||
crypto.add(date, item)
|
||||
|
||||
# Check private key
|
||||
if is_valid_address:
|
||||
msg = f'{currency["tag"]};{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
msg = f'{currency["tag"]}'
|
||||
self.add_message_to_queue(message=msg, queue='Tags')
|
||||
|
||||
if currency.get('private_key'):
|
||||
private_keys = self.regex_findall(currency['private_key']['regex'], item_id, content)
|
||||
if private_keys:
|
||||
msg = f'{currency["private_key"]["tag"]};{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
msg = f'{currency["private_key"]["tag"]}'
|
||||
self.add_message_to_queue(message=msg, queue='Tags')
|
||||
|
||||
# debug
|
||||
print(private_keys)
|
||||
|
|
|
@ -44,9 +44,8 @@ class CveModule(AbstractModule):
|
|||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, message):
|
||||
|
||||
item_id, count = message.split()
|
||||
item = Item(item_id)
|
||||
count = message
|
||||
item = self.get_obj()
|
||||
item_id = item.get_id()
|
||||
|
||||
cves = self.regex_findall(self.reg_cve, item_id, item.get_content())
|
||||
|
@ -55,15 +54,15 @@ class CveModule(AbstractModule):
|
|||
date = item.get_date()
|
||||
for cve_id in cves:
|
||||
cve = Cves.Cve(cve_id)
|
||||
cve.add(date, item_id)
|
||||
cve.add(date, item)
|
||||
|
||||
warning = f'{item_id} contains CVEs {cves}'
|
||||
print(warning)
|
||||
self.redis_logger.warning(warning)
|
||||
|
||||
msg = f'infoleak:automatic-detection="cve";{item_id}'
|
||||
tag = 'infoleak:automatic-detection="cve"'
|
||||
# Send to Tags Queue
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -21,7 +21,6 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.Items import Item
|
||||
from lib.objects.Decodeds import Decoded
|
||||
from trackers.Tracker_Term import Tracker_Term
|
||||
from trackers.Tracker_Regex import Tracker_Regex
|
||||
|
@ -87,18 +86,16 @@ class Decoder(AbstractModule):
|
|||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, message):
|
||||
|
||||
item = Item(message)
|
||||
content = item.get_content()
|
||||
date = item.get_date()
|
||||
content = self.obj.get_content()
|
||||
date = self.obj.get_date()
|
||||
new_decodeds = []
|
||||
|
||||
for decoder in self.decoder_order:
|
||||
find = False
|
||||
dname = decoder['name']
|
||||
|
||||
encodeds = self.regex_findall(decoder['regex'], item.id, content)
|
||||
# PERF remove encoded from item content
|
||||
encodeds = self.regex_findall(decoder['regex'], self.obj.id, content)
|
||||
# PERF remove encoded from obj content
|
||||
for encoded in encodeds:
|
||||
content = content.replace(encoded, '', 1)
|
||||
encodeds = set(encodeds)
|
||||
|
@ -114,33 +111,34 @@ class Decoder(AbstractModule):
|
|||
if not decoded.exists():
|
||||
mimetype = decoded.guess_mimetype(decoded_file)
|
||||
if not mimetype:
|
||||
print(sha1_string, item.id)
|
||||
raise Exception(f'Invalid mimetype: {decoded.id} {item.id}')
|
||||
print(sha1_string, self.obj.id)
|
||||
raise Exception(f'Invalid mimetype: {decoded.id} {self.obj.id}')
|
||||
decoded.save_file(decoded_file, mimetype)
|
||||
new_decodeds.append(decoded.id)
|
||||
else:
|
||||
mimetype = decoded.get_mimetype()
|
||||
decoded.add(dname, date, item.id, mimetype=mimetype)
|
||||
decoded.add(date, self.obj, dname, mimetype=mimetype)
|
||||
|
||||
# new_decodeds.append(decoded.id)
|
||||
self.logger.info(f'{item.id} : {dname} - {decoded.id} - {mimetype}')
|
||||
self.logger.info(f'{self.obj.id} : {dname} - {decoded.id} - {mimetype}')
|
||||
|
||||
if find:
|
||||
self.logger.info(f'{item.id} - {dname}')
|
||||
self.logger.info(f'{self.obj.id} - {dname}')
|
||||
|
||||
# Send to Tags
|
||||
msg = f'infoleak:automatic-detection="{dname}";{item.id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = f'infoleak:automatic-detection="{dname}"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
####################
|
||||
# TRACKERS DECODED
|
||||
for decoded_id in new_decodeds:
|
||||
decoded = Decoded(decoded_id)
|
||||
try:
|
||||
self.tracker_term.compute(decoded_id, obj_type='decoded')
|
||||
self.tracker_regex.compute(decoded_id, obj_type='decoded')
|
||||
self.tracker_term.compute_manual(decoded)
|
||||
self.tracker_regex.compute_manual(decoded)
|
||||
except UnicodeDecodeError:
|
||||
pass
|
||||
self.tracker_yara.compute(decoded_id, obj_type='decoded')
|
||||
self.tracker_yara.compute_manual(decoded)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -22,7 +22,6 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.objects.Items import Item
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib import d4
|
||||
|
||||
|
@ -42,7 +41,13 @@ class DomClassifier(AbstractModule):
|
|||
|
||||
addr_dns = config_loader.get_config_str("DomClassifier", "dns")
|
||||
|
||||
self.c = DomainClassifier.domainclassifier.Extract(rawtext="", nameservers=[addr_dns])
|
||||
redis_host = config_loader.get_config_str('Redis_Cache', 'host')
|
||||
redis_port = config_loader.get_config_int('Redis_Cache', 'port')
|
||||
redis_db = config_loader.get_config_int('Redis_Cache', 'db')
|
||||
self.dom_classifier = DomainClassifier.domainclassifier.Extract(rawtext="", nameservers=[addr_dns],
|
||||
redis_host=redis_host,
|
||||
redis_port=redis_port, redis_db=redis_db,
|
||||
re_timeout=30)
|
||||
|
||||
self.cc = config_loader.get_config_str("DomClassifier", "cc")
|
||||
self.cc_tld = config_loader.get_config_str("DomClassifier", "cc_tld")
|
||||
|
@ -51,38 +56,42 @@ class DomClassifier(AbstractModule):
|
|||
self.logger.info(f"Module: {self.module_name} Launched")
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
host, item_id = message.split()
|
||||
host = message
|
||||
|
||||
item = Item(item_id)
|
||||
item = self.get_obj()
|
||||
item_basename = item.get_basename()
|
||||
item_date = item.get_date()
|
||||
item_source = item.get_source()
|
||||
try:
|
||||
|
||||
self.c.text(rawtext=host)
|
||||
print(self.c.domain)
|
||||
self.c.validdomain(passive_dns=True, extended=False)
|
||||
# self.logger.debug(self.c.vdomain)
|
||||
self.dom_classifier.text(rawtext=host)
|
||||
if not self.dom_classifier.domain:
|
||||
return
|
||||
print(self.dom_classifier.domain)
|
||||
self.dom_classifier.validdomain(passive_dns=True, extended=False)
|
||||
# self.logger.debug(self.dom_classifier.vdomain)
|
||||
|
||||
print(self.c.vdomain)
|
||||
print(self.dom_classifier.vdomain)
|
||||
print()
|
||||
|
||||
if self.c.vdomain and d4.is_passive_dns_enabled():
|
||||
for dns_record in self.c.vdomain:
|
||||
self.add_message_to_queue(dns_record)
|
||||
if self.dom_classifier.vdomain and d4.is_passive_dns_enabled():
|
||||
for dns_record in self.dom_classifier.vdomain:
|
||||
self.add_message_to_queue(obj=None, message=dns_record)
|
||||
|
||||
localizeddomains = self.c.include(expression=self.cc_tld)
|
||||
if localizeddomains:
|
||||
print(localizeddomains)
|
||||
self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc_tld};{item.get_id()}")
|
||||
if self.cc_tld:
|
||||
localizeddomains = self.dom_classifier.include(expression=self.cc_tld)
|
||||
if localizeddomains:
|
||||
print(localizeddomains)
|
||||
self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc_tld};{item.get_id()}")
|
||||
|
||||
localizeddomains = self.c.localizedomain(cc=self.cc)
|
||||
if localizeddomains:
|
||||
print(localizeddomains)
|
||||
self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc};{item.get_id()}")
|
||||
if self.cc:
|
||||
localizeddomains = self.dom_classifier.localizedomain(cc=self.cc)
|
||||
if localizeddomains:
|
||||
print(localizeddomains)
|
||||
self.redis_logger.warning(f"DomainC;{item_source};{item_date};{item_basename};Checked {localizeddomains} located in {self.cc};{item.get_id()}")
|
||||
|
||||
if r_result:
|
||||
return self.c.vdomain
|
||||
return self.dom_classifier.vdomain
|
||||
|
||||
except IOError as err:
|
||||
self.redis_logger.error(f"Duplicate;{item_source};{item_date};{item_basename};CRC Checksum Failed")
|
||||
|
|
|
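A hedged sketch of the DomClassifier wiring introduced above; the resolver and Redis values are placeholders (the module reads them from the `DomClassifier` and `Redis_Cache` sections of the AIL config).

```python
import DomainClassifier.domainclassifier

extractor = DomainClassifier.domainclassifier.Extract(
    rawtext="",
    nameservers=['127.0.0.1'],                 # placeholder resolver
    redis_host='localhost', redis_port=6379, redis_db=0,
    re_timeout=30)

extractor.text(rawtext='mail.circl.lu and some free text')
extractor.validdomain(passive_dns=True, extended=False)
# .domain holds every extracted candidate, .vdomain only the ones that resolve.
print(extractor.domain)
print(extractor.vdomain)
```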
@ -52,7 +52,7 @@ class Duplicates(AbstractModule):
|
|||
def compute(self, message):
|
||||
# IOError: "CRC Checksum Failed on : {id}"
|
||||
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
|
||||
# Check file size
|
||||
if item.get_size() < self.min_item_size:
|
||||
|
|
66 bin/modules/Exif.py (Executable file)
|
@ -0,0 +1,66 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
The Exif Module
|
||||
======================
|
||||
|
||||
"""
|
||||
|
||||
##################################
|
||||
# Import External packages
|
||||
##################################
|
||||
import os
|
||||
import sys
|
||||
|
||||
from PIL import Image, ExifTags
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
|
||||
|
||||
class Exif(AbstractModule):
|
||||
"""
|
||||
Exif module for the AIL framework
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super(Exif, self).__init__()
|
||||
|
||||
# Waiting time in seconds between two processed messages
|
||||
self.pending_seconds = 1
|
||||
|
||||
# Send module state to logs
|
||||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, message):
|
||||
image = self.get_obj()
|
||||
print(image)
|
||||
img = Image.open(image.get_filepath())
|
||||
img_exif = img.getexif()
|
||||
print(img_exif)
|
||||
if img_exif:
|
||||
self.logger.critical(f'Exif: {self.get_obj().id}')
|
||||
gps = img_exif.get(34853)
|
||||
print(gps)
|
||||
self.logger.critical(f'gps: {gps}')
|
||||
for key, val in img_exif.items():
|
||||
if key in ExifTags.TAGS:
|
||||
print(f'{ExifTags.TAGS[key]}:{val}')
|
||||
self.logger.critical(f'{ExifTags.TAGS[key]}:{val}')
|
||||
else:
|
||||
print(f'{key}:{val}')
|
||||
self.logger.critical(f'{key}:{val}')
|
||||
sys.exit(0)
|
||||
|
||||
# tag = 'infoleak:automatic-detection="cve"'
|
||||
# Send to Tags Queue
|
||||
# self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
module = Exif()
|
||||
module.run()
|
|
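A small Pillow sketch matching what the new Exif module does: map numeric EXIF keys to readable tag names and look at the GPS IFD (tag 34853). The file path is a placeholder.

```python
from PIL import Image, ExifTags

with Image.open('sample.jpg') as img:          # placeholder path
    exif = img.getexif()
    # Same mapping the module's loop performs on img_exif.items().
    readable = {ExifTags.TAGS.get(k, k): v for k, v in exif.items()}
    gps_ifd = exif.get(34853)                  # GPSInfo, the tag the module inspects

print(readable)
print('GPS IFD:', gps_ifd)
```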
@ -79,73 +79,58 @@ class Global(AbstractModule):
|
|||
self.time_last_stats = time.time()
|
||||
self.processed_item = 0
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
# Recovering the streamed message informations
|
||||
splitted = message.split()
|
||||
def compute(self, message, r_result=False): # TODO move OBJ ID sanitization to importer
|
||||
# Recovering the streamed message infos
|
||||
gzip64encoded = message
|
||||
|
||||
if len(splitted) == 2:
|
||||
item, gzip64encoded = splitted
|
||||
if self.obj.type == 'item':
|
||||
if gzip64encoded:
|
||||
|
||||
# Remove ITEMS_FOLDER from item path (crawled item + submitted)
|
||||
if self.ITEMS_FOLDER in item:
|
||||
item = item.replace(self.ITEMS_FOLDER, '', 1)
|
||||
# Creating the full filepath
|
||||
filename = os.path.join(self.ITEMS_FOLDER, self.obj.id)
|
||||
filename = os.path.realpath(filename)
|
||||
|
||||
file_name_item = item.split('/')[-1]
|
||||
if len(file_name_item) > 255:
|
||||
new_file_name_item = '{}{}.gz'.format(file_name_item[:215], str(uuid4()))
|
||||
item = self.rreplace(item, file_name_item, new_file_name_item, 1)
|
||||
# Incorrect filename
|
||||
if not os.path.commonprefix([filename, self.ITEMS_FOLDER]) == self.ITEMS_FOLDER:
|
||||
self.logger.warning(f'Global; Path traversal detected {filename}')
|
||||
print(f'Global; Path traversal detected {filename}')
|
||||
|
||||
# Creating the full filepath
|
||||
filename = os.path.join(self.ITEMS_FOLDER, item)
|
||||
filename = os.path.realpath(filename)
|
||||
else:
|
||||
# Decode compressed base64
|
||||
decoded = base64.standard_b64decode(gzip64encoded)
|
||||
new_file_content = self.gunzip_bytes_obj(filename, decoded)
|
||||
|
||||
# Incorrect filename
|
||||
if not os.path.commonprefix([filename, self.ITEMS_FOLDER]) == self.ITEMS_FOLDER:
|
||||
self.logger.warning(f'Global; Path traversal detected {filename}')
|
||||
print(f'Global; Path traversal detected {filename}')
|
||||
# TODO REWRITE ME
|
||||
if new_file_content:
|
||||
filename = self.check_filename(filename, new_file_content)
|
||||
|
||||
if filename:
|
||||
# create subdir
|
||||
dirname = os.path.dirname(filename)
|
||||
if not os.path.exists(dirname):
|
||||
os.makedirs(dirname)
|
||||
|
||||
with open(filename, 'wb') as f:
|
||||
f.write(decoded)
|
||||
|
||||
update_obj_date(self.obj.get_date(), 'item')
|
||||
|
||||
self.add_message_to_queue(obj=self.obj, queue='Item')
|
||||
self.processed_item += 1
|
||||
|
||||
print(self.obj.id)
|
||||
if r_result:
|
||||
return self.obj.id
|
||||
|
||||
else:
|
||||
# Decode compressed base64
|
||||
decoded = base64.standard_b64decode(gzip64encoded)
|
||||
new_file_content = self.gunzip_bytes_obj(filename, decoded)
|
||||
|
||||
if new_file_content:
|
||||
filename = self.check_filename(filename, new_file_content)
|
||||
|
||||
if filename:
|
||||
# create subdir
|
||||
dirname = os.path.dirname(filename)
|
||||
if not os.path.exists(dirname):
|
||||
os.makedirs(dirname)
|
||||
|
||||
with open(filename, 'wb') as f:
|
||||
f.write(decoded)
|
||||
|
||||
item_id = filename
|
||||
# remove self.ITEMS_FOLDER from
|
||||
if self.ITEMS_FOLDER in item_id:
|
||||
item_id = item_id.replace(self.ITEMS_FOLDER, '', 1)
|
||||
|
||||
item = Item(item_id)
|
||||
|
||||
update_obj_date(item.get_date(), 'item')
|
||||
|
||||
self.add_message_to_queue(item_id, 'Item')
|
||||
self.processed_item += 1
|
||||
|
||||
# DIRTY FIX AIL SYNC - SEND TO SYNC MODULE
|
||||
# # FIXME: DIRTY FIX
|
||||
message = f'{item.get_type()};{item.get_subtype(r_str=True)};{item.get_id()}'
|
||||
print(message)
|
||||
self.add_message_to_queue(message, 'Sync')
|
||||
|
||||
print(item_id)
|
||||
if r_result:
|
||||
return item_id
|
||||
|
||||
self.logger.info(f"Empty Item: {message} not processed")
|
||||
elif self.obj.type == 'message':
|
||||
# TODO send to specific object queue => image, ...
|
||||
self.add_message_to_queue(obj=self.obj, queue='Item')
|
||||
elif self.obj.type == 'image':
|
||||
self.add_message_to_queue(obj=self.obj, queue='Image')
|
||||
else:
|
||||
self.logger.debug(f"Empty Item: {message} not processed")
|
||||
print(f"Empty Item: {message} not processed")
|
||||
self.logger.critical(f"Empty obj: {self.obj} {message} not processed")
|
||||
|
||||
def check_filename(self, filename, new_file_content):
|
||||
"""
|
||||
|
|
|
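A stdlib-only sketch of the item decode path used by `Global.compute()` above: the feeder sends gzip-compressed, base64-encoded content, and the compressed blob is what ends up on disk under `ITEMS_FOLDER/<obj id>`. Paths and content are placeholders.

```python
import base64
import gzip

# What a feeder would send for one item (placeholder content).
gzip64encoded = base64.standard_b64encode(gzip.compress(b'raw item content')).decode()

decoded = base64.standard_b64decode(gzip64encoded)
new_file_content = gzip.decompress(decoded)    # rough equivalent of gunzip_bytes_obj()
if new_file_content:                           # only non-empty items are written
    with open('example.gz', 'wb') as f:        # in AIL: ITEMS_FOLDER + sanitized obj id
        f.write(decoded)                       # the compressed blob is stored as-is
```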
@ -18,13 +18,14 @@ import os
|
|||
import re
|
||||
import sys
|
||||
|
||||
import DomainClassifier.domainclassifier
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.Items import Item
|
||||
|
||||
class Hosts(AbstractModule):
|
||||
"""
|
||||
|
@ -43,29 +44,29 @@ class Hosts(AbstractModule):
|
|||
# Waiting time in seconds between to message processed
|
||||
self.pending_seconds = 1
|
||||
|
||||
self.host_regex = r'\b([a-zA-Z\d-]{,63}(?:\.[a-zA-Z\d-]{,63})+)\b'
|
||||
re.compile(self.host_regex)
|
||||
|
||||
redis_host = config_loader.get_config_str('Redis_Cache', 'host')
|
||||
redis_port = config_loader.get_config_int('Redis_Cache', 'port')
|
||||
redis_db = config_loader.get_config_int('Redis_Cache', 'db')
|
||||
self.dom_classifier = DomainClassifier.domainclassifier.Extract(rawtext="",
|
||||
redis_host=redis_host,
|
||||
redis_port=redis_port,
|
||||
redis_db=redis_db,
|
||||
re_timeout=30)
|
||||
self.logger.info(f"Module: {self.module_name} Launched")
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
obj = self.get_obj()
|
||||
|
||||
# mimetype = item_basic.get_item_mimetype(item.get_id())
|
||||
# if mimetype.split('/')[0] == "text":
|
||||
|
||||
content = item.get_content()
|
||||
hosts = self.regex_findall(self.host_regex, item.get_id(), content)
|
||||
if hosts:
|
||||
print(f'{len(hosts)} host {item.get_id()}')
|
||||
for host in hosts:
|
||||
# print(host)
|
||||
|
||||
msg = f'{host} {item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Host')
|
||||
content = obj.get_content()
|
||||
self.dom_classifier.text(content)
|
||||
if self.dom_classifier.domain:
|
||||
print(f'{len(self.dom_classifier.domain)} host {obj.get_id()}')
|
||||
# print(self.dom_classifier.domain)
|
||||
for domain in self.dom_classifier.domain:
|
||||
if domain:
|
||||
self.add_message_to_queue(message=domain, queue='Host')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
||||
module = Hosts()
|
||||
module.run()
|
||||
|
|
|
@ -43,14 +43,15 @@ class IPAddress(AbstractModule):
|
|||
networks = config_loader.get_config_str("IP", "networks")
|
||||
if not networks:
|
||||
print('No IP ranges provided')
|
||||
sys.exit(0)
|
||||
try:
|
||||
for network in networks.split(","):
|
||||
self.ip_networks.add(IPv4Network(network))
|
||||
print(f'IP Range To Search: {network}')
|
||||
except:
|
||||
print('Please provide a list of valid IP addresses')
|
||||
sys.exit(0)
|
||||
# sys.exit(0)
|
||||
else:
|
||||
try:
|
||||
for network in networks.split(","):
|
||||
self.ip_networks.add(IPv4Network(network))
|
||||
print(f'IP Range To Search: {network}')
|
||||
except:
|
||||
print('Please provide a list of valid IP addresses')
|
||||
sys.exit(0)
|
||||
|
||||
self.re_ipv4 = r'(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
|
||||
re.compile(self.re_ipv4)
|
||||
|
@ -62,7 +63,10 @@ class IPAddress(AbstractModule):
|
|||
self.logger.info(f"Module {self.module_name} initialized")
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
item = Item(message)
|
||||
if not self.ip_networks:
|
||||
return None
|
||||
|
||||
item = self.get_obj()
|
||||
content = item.get_content()
|
||||
|
||||
# list of the regex results in the Item
|
||||
|
@ -82,8 +86,8 @@ class IPAddress(AbstractModule):
|
|||
self.redis_logger.warning(f'{item.get_id()} contains {item.get_id()} IPs')
|
||||
|
||||
# Tag message with IP
|
||||
msg = f'infoleak:automatic-detection="ip";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="ip"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
|
|
@ -73,7 +73,7 @@ class Iban(AbstractModule):
|
|||
return extracted
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
item_id = item.get_id()
|
||||
|
||||
ibans = self.regex_findall(self.iban_regex, item_id, item.get_content())
|
||||
|
@ -97,8 +97,8 @@ class Iban(AbstractModule):
|
|||
to_print = f'Iban;{item.get_source()};{item.get_date()};{item.get_basename()};'
|
||||
self.redis_logger.warning(f'{to_print}Checked found {len(valid_ibans)} IBAN;{item_id}')
|
||||
# Tags
|
||||
msg = f'infoleak:automatic-detection="iban";{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="iban"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -93,12 +93,12 @@ class Indexer(AbstractModule):
|
|||
self.last_refresh = time_now
|
||||
|
||||
def compute(self, message):
|
||||
docpath = message.split(" ", -1)[-1]
|
||||
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
item_id = item.get_id()
|
||||
item_content = item.get_content()
|
||||
|
||||
docpath = item_id
|
||||
|
||||
self.logger.debug(f"Indexing - {self.indexname}: {docpath}")
|
||||
print(f"Indexing - {self.indexname}: {docpath}")
|
||||
|
||||
|
|
|
@ -56,7 +56,7 @@ class Keys(AbstractModule):
|
|||
self.pending_seconds = 1
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
content = item.get_content()
|
||||
|
||||
# find = False
|
||||
|
@ -65,107 +65,107 @@ class Keys(AbstractModule):
|
|||
if KeyEnum.PGP_MESSAGE.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a PGP enc message')
|
||||
|
||||
msg = f'infoleak:automatic-detection="pgp-message";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="pgp-message"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
get_pgp_content = True
|
||||
# find = True
|
||||
|
||||
if KeyEnum.PGP_PUBLIC_KEY_BLOCK.value in content:
|
||||
msg = f'infoleak:automatic-detection="pgp-public-key-block";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="pgp-public-key-block"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
get_pgp_content = True
|
||||
|
||||
if KeyEnum.PGP_SIGNATURE.value in content:
|
||||
msg = f'infoleak:automatic-detection="pgp-signature";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="pgp-signature"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
get_pgp_content = True
|
||||
|
||||
if KeyEnum.PGP_PRIVATE_KEY_BLOCK.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a pgp private key block message')
|
||||
|
||||
msg = f'infoleak:automatic-detection="pgp-private-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="pgp-private-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
get_pgp_content = True
|
||||
|
||||
if KeyEnum.CERTIFICATE.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a certificate message')
|
||||
|
||||
msg = f'infoleak:automatic-detection="certificate";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="certificate"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.RSA_PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a RSA private key message')
|
||||
print('rsa private key message found')
|
||||
|
||||
msg = f'infoleak:automatic-detection="rsa-private-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="rsa-private-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a private key message')
|
||||
print('private key message found')
|
||||
|
||||
msg = f'infoleak:automatic-detection="private-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="private-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.ENCRYPTED_PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has an encrypted private key message')
|
||||
print('encrypted private key message found')
|
||||
|
||||
msg = f'infoleak:automatic-detection="encrypted-private-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="encrypted-private-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.OPENSSH_PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has an openssh private key message')
|
||||
print('openssh private key message found')
|
||||
|
||||
msg = f'infoleak:automatic-detection="private-ssh-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="private-ssh-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.SSH2_ENCRYPTED_PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has an ssh2 private key message')
|
||||
print('SSH2 private key message found')
|
||||
|
||||
msg = f'infoleak:automatic-detection="private-ssh-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="private-ssh-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.OPENVPN_STATIC_KEY_V1.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has an openssh private key message')
|
||||
print('OpenVPN Static key message found')
|
||||
|
||||
msg = f'infoleak:automatic-detection="vpn-static-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="vpn-static-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.DSA_PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a dsa private key message')
|
||||
|
||||
msg = f'infoleak:automatic-detection="dsa-private-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="dsa-private-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.EC_PRIVATE_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has an ec private key message')
|
||||
|
||||
msg = f'infoleak:automatic-detection="ec-private-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="ec-private-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
if KeyEnum.PUBLIC_KEY.value in content:
|
||||
self.redis_logger.warning(f'{item.get_basename()} has a public key message')
|
||||
|
||||
msg = f'infoleak:automatic-detection="public-key";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="public-key"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# find = True
|
||||
|
||||
# pgp content
|
||||
if get_pgp_content:
|
||||
self.add_message_to_queue(item.get_id(), 'PgpDump')
|
||||
self.add_message_to_queue(queue='PgpDump')
|
||||
|
||||
# if find :
|
||||
# # Send to duplicate
|
||||
|
|
|
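The same refactor repeats across the detection modules in this commit: instead of serializing the object id into the message (infoleak:...;{item.get_id()}), only the tag text is pushed to the 'Tags' queue and the current object travels with the queue entry. A minimal sketch of the new convention, assuming an AIL environment (AIL_BIN set, module registered in modules.cfg) and the get_obj()/add_message_to_queue() API introduced in abstract_module.py further down this diff:

import os
import sys

sys.path.append(os.environ['AIL_BIN'])
from modules.abstract_module import AbstractModule


class ExampleDetector(AbstractModule):
    """Hypothetical module illustrating the new Tags queue convention."""

    def compute(self, message):
        obj = self.get_obj()  # current object, resolved by get_message()
        content = obj.get_content()
        if 'BEGIN RSA PRIVATE KEY' in content:
            # tag only: the object global id is attached by add_message_to_queue()
            tag = 'infoleak:automatic-detection="rsa-private-key"'
            self.add_message_to_queue(message=tag, queue='Tags')


if __name__ == '__main__':
    ExampleDetector().run()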
@ -25,11 +25,14 @@ class Languages(AbstractModule):
|
|||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
if item.is_crawled():
|
||||
domain = Domain(item.get_domain())
|
||||
for lang in item.get_languages(min_probability=0.8):
|
||||
domain.add_language(lang.language)
|
||||
obj = self.get_obj()
|
||||
|
||||
if obj.type == 'item':
|
||||
if obj.is_crawled():
|
||||
domain = Domain(obj.get_domain())
|
||||
for lang in obj.get_languages(min_probability=0.8, force_gcld3=True):
|
||||
print(lang)
|
||||
domain.add_language(lang)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
|
|
@ -25,9 +25,6 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib.objects.Items import Item
|
||||
# from lib import Statistics
|
||||
|
||||
class LibInjection(AbstractModule):
|
||||
"""docstring for LibInjection module."""
|
||||
|
@ -40,7 +37,8 @@ class LibInjection(AbstractModule):
|
|||
self.redis_logger.info(f"Module: {self.module_name} Launched")
|
||||
|
||||
def compute(self, message):
|
||||
url, item_id = message.split()
|
||||
item = self.get_obj()
|
||||
url = message
|
||||
|
||||
self.faup.decode(url)
|
||||
url_parsed = self.faup.get()
|
||||
|
@ -68,7 +66,6 @@ class LibInjection(AbstractModule):
|
|||
# print(f'query is sqli : {result_query}')
|
||||
|
||||
if result_path['sqli'] is True or result_query['sqli'] is True:
|
||||
item = Item(item_id)
|
||||
item_id = item.get_id()
|
||||
print(f"Detected (libinjection) SQL in URL: {item_id}")
|
||||
print(unquote(url))
|
||||
|
@ -77,8 +74,8 @@ class LibInjection(AbstractModule):
|
|||
self.redis_logger.warning(to_print)
|
||||
|
||||
# Add tag
|
||||
msg = f'infoleak:automatic-detection="sql-injection";{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="sql-injection"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
# statistics
|
||||
# # # TODO: # FIXME: remove me
|
||||
|
|
|
@ -45,8 +45,9 @@ class MISP_Thehive_Auto_Push(AbstractModule):
|
|||
self.last_refresh = time.time()
|
||||
self.redis_logger.info('Tags Auto Push refreshed')
|
||||
|
||||
item_id, tag = message.split(';', 1)
|
||||
item = Item(item_id)
|
||||
tag = message
|
||||
item = self.get_obj()
|
||||
item_id = item.get_id()
|
||||
|
||||
# enabled
|
||||
if 'misp' in self.tags:
|
||||
|
|
|
@ -135,11 +135,11 @@ class Mail(AbstractModule):
|
|||
|
||||
# # TODO: sanitize mails
|
||||
def compute(self, message):
|
||||
item_id, score = message.split()
|
||||
item = Item(item_id)
|
||||
score = message
|
||||
item = self.get_obj()
|
||||
item_date = item.get_date()
|
||||
|
||||
mails = self.regex_findall(self.email_regex, item_id, item.get_content())
|
||||
mails = self.regex_findall(self.email_regex, item.id, item.get_content())
|
||||
mxdomains_email = {}
|
||||
for mail in mails:
|
||||
mxdomain = mail.rsplit('@', 1)[1].lower()
|
||||
|
@ -172,13 +172,13 @@ class Mail(AbstractModule):
|
|||
# for tld in mx_tlds:
|
||||
# Statistics.add_module_tld_stats_by_date('mail', item_date, tld, mx_tlds[tld])
|
||||
|
||||
msg = f'Mails;{item.get_source()};{item_date};{item.get_basename()};Checked {num_valid_email} e-mail(s);{item_id}'
|
||||
msg = f'Mails;{item.get_source()};{item_date};{item.get_basename()};Checked {num_valid_email} e-mail(s);{item.id}'
|
||||
if num_valid_email > self.mail_threshold:
|
||||
print(f'{item_id} Checked {num_valid_email} e-mail(s)')
|
||||
print(f'{item.id} Checked {num_valid_email} e-mail(s)')
|
||||
self.redis_logger.warning(msg)
|
||||
# Tags
|
||||
msg = f'infoleak:automatic-detection="mail";{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="mail"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
elif num_valid_email > 0:
|
||||
self.redis_logger.info(msg)
|
||||
|
||||
|
|
|
@ -9,7 +9,7 @@ This module is consuming the Redis-list created by the ZMQ_Feed_Q Module.
|
|||
This module take all the feeds provided in the config.
|
||||
|
||||
|
||||
Depending on the configuration, this module will process the feed as follow:
|
||||
Depending on the configuration, this module will process the feed as follows:
|
||||
operation_mode 1: "Avoid any duplicate from any sources"
|
||||
- The module maintains a list of content for each item (see the dedup sketch after this hunk)
|
||||
- If the content is new, process it
|
||||
|
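Operation mode 1, as implemented further down in compute(), relies on a SHA1 digest of the gzip+base64 payload cached in Redis with the Module_Mixer ttl_duplicate TTL: a payload whose digest is already cached is counted as a duplicate for its feeder and dropped. A standalone sketch of that idea (the Redis connection below is a placeholder, the real module gets it from ConfigLoader):

import hashlib

import redis

r_cache = redis.Redis(host='localhost', port=6379, db=0)  # placeholder connection
TTL_DUPLICATE = 86400  # seconds, Module_Mixer / ttl_duplicate in the real config


def is_duplicate(gzip64encoded):
    # digest of the raw gzip+base64 payload, as in Mixer.compute()
    digest = hashlib.sha1(gzip64encoded.encode('utf8')).hexdigest()
    if r_cache.exists(digest):
        return True
    r_cache.setex(digest, TTL_DUPLICATE, 1)  # remember it for the TTL window
    return False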
@ -64,9 +64,6 @@ class Mixer(AbstractModule):
|
|||
self.ttl_key = config_loader.get_config_int("Module_Mixer", "ttl_duplicate")
|
||||
self.default_feeder_name = config_loader.get_config_str("Module_Mixer", "default_unnamed_feed_name")
|
||||
|
||||
self.ITEMS_FOLDER = os.path.join(os.environ['AIL_HOME'], config_loader.get_config_str("Directories", "pastes")) + '/'
|
||||
self.ITEMS_FOLDER = os.path.join(os.path.realpath(self.ITEMS_FOLDER), '')
|
||||
|
||||
self.nb_processed_items = 0
|
||||
self.feeders_processed = {}
|
||||
self.feeders_duplicate = {}
|
||||
|
@ -131,37 +128,45 @@ class Mixer(AbstractModule):
|
|||
|
||||
self.last_refresh = time.time()
|
||||
self.clear_feeders_stat()
|
||||
time.sleep(0.5)
|
||||
time.sleep(0.5)
|
||||
|
||||
def computeNone(self):
|
||||
self.refresh_stats()
|
||||
|
||||
def compute(self, message):
|
||||
self.refresh_stats()
|
||||
# obj = self.obj
|
||||
# TODO CHECK IF NOT self.object -> get object global ID from message
|
||||
|
||||
splitted = message.split()
|
||||
# Old Feeder name "feeder>>item_id gzip64encoded"
|
||||
if len(splitted) == 2:
|
||||
item_id, gzip64encoded = splitted
|
||||
try:
|
||||
feeder_name, item_id = item_id.split('>>')
|
||||
feeder_name.replace(" ", "")
|
||||
if 'import_dir' in feeder_name:
|
||||
feeder_name = feeder_name.split('/')[1]
|
||||
except ValueError:
|
||||
feeder_name = self.default_feeder_name
|
||||
# Feeder name in message: "feeder item_id gzip64encoded"
|
||||
elif len(splitted) == 3:
|
||||
feeder_name, item_id, gzip64encoded = splitted
|
||||
# message -> feeder_name - content
|
||||
# or message -> feeder_name
|
||||
|
||||
# feeder_name - object
|
||||
if len(splitted) == 1: # feeder_name - object (content already saved)
|
||||
feeder_name = message
|
||||
gzip64encoded = None
|
||||
|
||||
# Feeder name in message: "feeder obj_id gzip64encoded"
|
||||
elif len(splitted) == 2: # gzip64encoded content
|
||||
feeder_name, gzip64encoded = splitted
|
||||
else:
|
||||
print('Invalid message: not processed')
|
||||
self.logger.debug(f'Invalid Item: {splitted[0]} not processed')
|
||||
self.logger.warning(f'Invalid Message: {splitted} not processed')
|
||||
return None
|
||||
|
||||
# remove absolute path
|
||||
item_id = item_id.replace(self.ITEMS_FOLDER, '', 1)
|
||||
if self.obj.type == 'item':
|
||||
# Remove ITEMS_FOLDER from item path (crawled item + submitted)
|
||||
# Limit basename length
|
||||
obj_id = self.obj.id
|
||||
self.obj.sanitize_id()
|
||||
if self.obj.id != obj_id:
|
||||
self.queue.rename_message_obj(self.obj.id, obj_id)
|
||||
|
||||
relay_message = f'{item_id} {gzip64encoded}'
|
||||
|
||||
relay_message = gzip64encoded
|
||||
# print(relay_message)
|
||||
|
||||
# TODO only work for item object
|
||||
# Avoid any duplicate coming from any sources
|
||||
if self.operation_mode == 1:
|
||||
digest = hashlib.sha1(gzip64encoded.encode('utf8')).hexdigest()
|
||||
|
@ -173,7 +178,7 @@ class Mixer(AbstractModule):
|
|||
self.r_cache.expire(digest, self.ttl_key)
|
||||
|
||||
self.increase_stat_processed(feeder_name)
|
||||
self.add_message_to_queue(relay_message)
|
||||
self.add_message_to_queue(message=relay_message)
|
||||
|
||||
# Need To Be Fixed, Currently doesn't check the source (-> same as operation 1)
|
||||
# # Keep duplicate coming from different sources
|
||||
|
@ -210,7 +215,10 @@ class Mixer(AbstractModule):
|
|||
# No Filtering
|
||||
else:
|
||||
self.increase_stat_processed(feeder_name)
|
||||
self.add_message_to_queue(relay_message)
|
||||
if self.obj.type == 'item':
|
||||
self.add_message_to_queue(obj=self.obj, message=gzip64encoded)
|
||||
else:
|
||||
self.add_message_to_queue(obj=self.obj)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
|
|
@ -42,7 +42,8 @@ class Onion(AbstractModule):
|
|||
self.faup = crawlers.get_faup()
|
||||
|
||||
# activate_crawler = p.config.get("Crawler", "activate_crawler")
|
||||
|
||||
self.har = config_loader.get_config_boolean('Crawler', 'default_har')
|
||||
self.screenshot = config_loader.get_config_boolean('Crawler', 'default_screenshot')
|
||||
|
||||
self.onion_regex = r"((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
# self.i2p_regex = r"((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
|
||||
|
@ -69,8 +70,8 @@ class Onion(AbstractModule):
|
|||
onion_urls = []
|
||||
domains = []
|
||||
|
||||
item_id, score = message.split()
|
||||
item = Item(item_id)
|
||||
score = message
|
||||
item = self.get_obj()
|
||||
item_content = item.get_content()
|
||||
|
||||
# max execution time on regex
|
||||
|
@ -90,8 +91,9 @@ class Onion(AbstractModule):
|
|||
|
||||
if onion_urls:
|
||||
if crawlers.is_crawler_activated():
|
||||
for domain in domains: # TODO LOAD DEFAULT SCREENSHOT + HAR
|
||||
task_uuid = crawlers.create_task(domain, parent=item.get_id(), priority=0)
|
||||
for domain in domains:
|
||||
task_uuid = crawlers.create_task(domain, parent=item.get_id(), priority=0,
|
||||
har=self.har, screenshot=self.screenshot)
|
||||
if task_uuid:
|
||||
print(f'{domain} added to crawler queue: {task_uuid}')
|
||||
else:
|
||||
|
@ -100,8 +102,8 @@ class Onion(AbstractModule):
|
|||
self.redis_logger.warning(f'{to_print}Detected {len(domains)} .onion(s);{item.get_id()}')
|
||||
|
||||
# TAG Item
|
||||
msg = f'infoleak:automatic-detection="onion";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="onion"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
|
bin/modules/Pasties.py (new executable file, 144 lines)
|
@ -0,0 +1,144 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
The Pasties Module
|
||||
======================
|
||||
This module spots paste-service domains for further processing
|
||||
"""
|
||||
|
||||
##################################
|
||||
# Import External packages
|
||||
##################################
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
|
||||
from pyfaup.faup import Faup
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.ConfigLoader import ConfigLoader
|
||||
from lib import crawlers
|
||||
|
||||
# TODO add url validator
|
||||
|
||||
pasties_blocklist_urls = set()
|
||||
pasties_domains = {}
|
||||
|
||||
class Pasties(AbstractModule):
|
||||
"""
|
||||
Pasties module for AIL framework
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super(Pasties, self).__init__()
|
||||
self.faup = Faup()
|
||||
|
||||
config_loader = ConfigLoader()
|
||||
self.r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
|
||||
self.pasties = {}
|
||||
self.urls_blocklist = set()
|
||||
self.load_pasties_domains()
|
||||
|
||||
# Send module state to logs
|
||||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def load_pasties_domains(self):
|
||||
self.pasties = {}
|
||||
self.urls_blocklist = set()
|
||||
|
||||
domains_pasties = os.path.join(os.environ['AIL_HOME'], 'files/domains_pasties')
|
||||
if os.path.exists(domains_pasties):
|
||||
with open(domains_pasties) as f:
|
||||
for line in f:
|
||||
url = line.strip()
|
||||
if url: # TODO validate line
|
||||
self.faup.decode(url)
|
||||
url_decoded = self.faup.get()
|
||||
host = url_decoded['host']
|
||||
# if url_decoded.get('port', ''):
|
||||
# host = f'{host}:{url_decoded["port"]}'
|
||||
path = url_decoded.get('resource_path', '')
|
||||
# print(url_decoded)
|
||||
if path and path != '/':
|
||||
if path[-1] != '/':
|
||||
path = f'{path}/'
|
||||
else:
|
||||
path = None
|
||||
|
||||
if host in self.pasties:
|
||||
if path:
|
||||
self.pasties[host].add(path)
|
||||
else:
|
||||
if path:
|
||||
self.pasties[host] = {path}
|
||||
else:
|
||||
self.pasties[host] = set()
|
||||
|
||||
url_blocklist = os.path.join(os.environ['AIL_HOME'], 'files/domains_pasties_blacklist')
|
||||
if os.path.exists(url_blocklist):
|
||||
with open(url_blocklist) as f:
|
||||
for line in f:
|
||||
url = line.strip()
|
||||
self.faup.decode(url)
|
||||
url_decoded = self.faup.get()
|
||||
host = url_decoded['host']
|
||||
# if url_decoded.get('port', ''):
|
||||
# host = f'{host}:{url_decoded["port"]}'
|
||||
path = url_decoded.get('resource_path', '')
|
||||
url = f'{host}{path}'
|
||||
if url_decoded['query_string']:
|
||||
url = url + url_decoded['query_string']
|
||||
self.urls_blocklist.add(url)
|
||||
|
||||
def send_to_crawler(self, url, obj_id):
|
||||
if not self.r_cache.exists(f'{self.module_name}:url:{url}'):
|
||||
self.r_cache.set(f'{self.module_name}:url:{url}', int(time.time()))
|
||||
self.r_cache.expire(f'{self.module_name}:url:{url}', 86400)
|
||||
crawlers.create_task(url, depth=0, har=False, screenshot=False, proxy='force_tor', priority=60, parent=obj_id)
|
||||
|
||||
def compute(self, message):
|
||||
url = message  # the queue message is the URL itself
|
||||
|
||||
self.faup.decode(url)
|
||||
url_decoded = self.faup.get()
|
||||
# print(url_decoded)
|
||||
url_host = url_decoded['host']
|
||||
# if url_decoded.get('port', ''):
|
||||
# url_host = f'{url_host}:{url_decoded["port"]}'
|
||||
path = url_decoded.get('resource_path', '')
|
||||
if url_host in self.pasties:
|
||||
if url.startswith('http://'):
|
||||
if url[7:] in self.urls_blocklist:
|
||||
return None
|
||||
elif url.startswith('https://'):
|
||||
if url[8:] in self.urls_blocklist:
|
||||
return None
|
||||
else:
|
||||
if url in self.urls_blocklist:
|
||||
return None
|
||||
|
||||
if not self.pasties[url_host]:
|
||||
if path and path != '/':
|
||||
print('send to crawler', url_host, url)
|
||||
self.send_to_crawler(url, self.obj.id)
|
||||
else:
|
||||
if path.endswith('/'):
|
||||
path_end = path[:-1]
|
||||
else:
|
||||
path_end = f'{path}/'
|
||||
for url_path in self.pasties[url_host]:
|
||||
if path.startswith(url_path):
|
||||
if url_path != path and url_path != path_end:
|
||||
print('send to crawler', url_path, url)
|
||||
self.send_to_crawler(url, self.obj.id)
|
||||
break
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
module = Pasties()
|
||||
module.run()
|
|
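The host/path matching above can be summarized as: a pastie host registered without any path prefix matches every non-root path, while a host registered with prefixes only matches URLs whose path strictly extends one of those prefixes (the prefix itself, i.e. the service index page, is ignored). A toy illustration of the same rule without Faup (hosts and paths below are invented):

def should_crawl(pasties, host, path):
    # pasties maps host -> set of path prefixes; an empty set means the whole host
    if host not in pasties:
        return False
    if not pasties[host]:
        return bool(path) and path != '/'
    path_alt = path[:-1] if path.endswith('/') else f'{path}/'
    for prefix in pasties[host]:
        if path.startswith(prefix) and prefix not in (path, path_alt):
            return True
    return False


pasties = {'paste.example.com': set(), 'bin.example.org': {'/p/'}}
print(should_crawl(pasties, 'paste.example.com', '/abc123'))  # True
print(should_crawl(pasties, 'bin.example.org', '/p/abc123'))  # True
print(should_crawl(pasties, 'bin.example.org', '/p/'))        # False, index page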
@ -24,7 +24,6 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.objects import Pgps
|
||||
from lib.objects.Items import Item
|
||||
from trackers.Tracker_Term import Tracker_Term
|
||||
from trackers.Tracker_Regex import Tracker_Regex
|
||||
from trackers.Tracker_Yara import Tracker_Yara
|
||||
|
@ -61,7 +60,6 @@ class PgpDump(AbstractModule):
|
|||
self.tracker_yara = Tracker_Yara(queue=False)
|
||||
|
||||
# init
|
||||
self.item_id = None
|
||||
self.keys = set()
|
||||
self.private_keys = set()
|
||||
self.names = set()
|
||||
|
@ -93,11 +91,11 @@ class PgpDump(AbstractModule):
|
|||
print()
|
||||
pgp_block = self.remove_html(pgp_block)
|
||||
# Remove Version
|
||||
versions = self.regex_findall(self.reg_tool_version, self.item_id, pgp_block)
|
||||
versions = self.regex_findall(self.reg_tool_version, self.obj.id, pgp_block)
|
||||
for version in versions:
|
||||
pgp_block = pgp_block.replace(version, '')
|
||||
# Remove Comment
|
||||
comments = self.regex_findall(self.reg_block_comment, self.item_id, pgp_block)
|
||||
comments = self.regex_findall(self.reg_block_comment, self.obj.id, pgp_block)
|
||||
for comment in comments:
|
||||
pgp_block = pgp_block.replace(comment, '')
|
||||
# Remove Empty Lines
|
||||
|
@ -130,7 +128,7 @@ class PgpDump(AbstractModule):
|
|||
try:
|
||||
output = output.decode()
|
||||
except UnicodeDecodeError:
|
||||
self.logger.error(f'Error PgpDump UnicodeDecodeError: {self.item_id}')
|
||||
self.logger.error(f'Error PgpDump UnicodeDecodeError: {self.obj.id}')
|
||||
output = ''
|
||||
return output
|
||||
|
||||
|
@ -145,7 +143,7 @@ class PgpDump(AbstractModule):
|
|||
private = True
|
||||
else:
|
||||
private = False
|
||||
users = self.regex_findall(self.reg_user_id, self.item_id, pgpdump_output)
|
||||
users = self.regex_findall(self.reg_user_id, self.obj.id, pgpdump_output)
|
||||
for user in users:
|
||||
# avoid key injection in user_id:
|
||||
pgpdump_output.replace(user, '', 1)
|
||||
|
@ -159,7 +157,7 @@ class PgpDump(AbstractModule):
|
|||
name = user
|
||||
self.names.add(name)
|
||||
|
||||
keys = self.regex_findall(self.reg_key_id, self.item_id, pgpdump_output)
|
||||
keys = self.regex_findall(self.reg_key_id, self.obj.id, pgpdump_output)
|
||||
for key_id in keys:
|
||||
key_id = key_id.replace('Key ID - ', '', 1)
|
||||
if key_id != '0x0000000000000000':
|
||||
|
@ -171,28 +169,26 @@ class PgpDump(AbstractModule):
|
|||
print('symmetrically encrypted')
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
self.item_id = item.get_id()
|
||||
content = item.get_content()
|
||||
content = self.obj.get_content()
|
||||
|
||||
pgp_blocks = []
|
||||
# Public Block
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_public_blocs, self.item_id, content):
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_public_blocs, self.obj.id, content):
|
||||
# content = content.replace(pgp_block, '')
|
||||
pgp_block = self.sanitize_pgp_block(pgp_block)
|
||||
pgp_blocks.append(pgp_block)
|
||||
# Private Block
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_private_blocs, self.item_id, content):
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_private_blocs, self.obj.id, content):
|
||||
# content = content.replace(pgp_block, '')
|
||||
pgp_block = self.sanitize_pgp_block(pgp_block)
|
||||
pgp_blocks.append(pgp_block)
|
||||
# Signature
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_signature, self.item_id, content):
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_signature, self.obj.id, content):
|
||||
# content = content.replace(pgp_block, '')
|
||||
pgp_block = self.sanitize_pgp_block(pgp_block)
|
||||
pgp_blocks.append(pgp_block)
|
||||
# Message
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_message, self.item_id, content):
|
||||
for pgp_block in self.regex_findall(self.reg_pgp_message, self.obj.id, content):
|
||||
pgp_block = self.sanitize_pgp_block(pgp_block)
|
||||
pgp_blocks.append(pgp_block)
|
||||
|
||||
|
@ -206,26 +202,26 @@ class PgpDump(AbstractModule):
|
|||
self.extract_id_from_pgpdump_output(pgpdump_output)
|
||||
|
||||
if self.keys or self.names or self.mails:
|
||||
print(self.item_id)
|
||||
date = item.get_date()
|
||||
print(self.obj.id)
|
||||
date = self.obj.get_date()
|
||||
for key in self.keys:
|
||||
pgp = Pgps.Pgp(key, 'key')
|
||||
pgp.add(date, self.item_id)
|
||||
pgp.add(date, self.obj)
|
||||
print(f' key: {key}')
|
||||
for name in self.names:
|
||||
pgp = Pgps.Pgp(name, 'name')
|
||||
pgp.add(date, self.item_id)
|
||||
pgp.add(date, self.obj)
|
||||
print(f' name: {name}')
|
||||
self.tracker_term.compute(name, obj_type='pgp', subtype='name')
|
||||
self.tracker_regex.compute(name, obj_type='pgp', subtype='name')
|
||||
self.tracker_yara.compute(name, obj_type='pgp', subtype='name')
|
||||
self.tracker_term.compute_manual(pgp)
|
||||
self.tracker_regex.compute_manual(pgp)
|
||||
self.tracker_yara.compute_manual(pgp)
|
||||
for mail in self.mails:
|
||||
pgp = Pgps.Pgp(mail, 'mail')
|
||||
pgp.add(date, self.item_id)
|
||||
pgp.add(date, self.obj)
|
||||
print(f' mail: {mail}')
|
||||
self.tracker_term.compute(mail, obj_type='pgp', subtype='mail')
|
||||
self.tracker_regex.compute(mail, obj_type='pgp', subtype='mail')
|
||||
self.tracker_yara.compute(mail, obj_type='pgp', subtype='mail')
|
||||
self.tracker_term.compute_manual(pgp)
|
||||
self.tracker_regex.compute_manual(pgp)
|
||||
self.tracker_yara.compute_manual(pgp)
|
||||
|
||||
# Keys extracted from PGP PRIVATE KEY BLOCK
|
||||
for key in self.private_keys:
|
||||
|
@ -234,11 +230,10 @@ class PgpDump(AbstractModule):
|
|||
print(f' private key: {key}')
|
||||
|
||||
if self.symmetrically_encrypted:
|
||||
msg = f'infoleak:automatic-detection="pgp-symmetric";{self.item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="pgp-symmetric"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
module = PgpDump()
|
||||
module.run()
|
||||
|
||||
|
|
|
@ -49,7 +49,7 @@ class Phone(AbstractModule):
|
|||
return extracted
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
content = item.get_content()
|
||||
|
||||
# TODO use language detection to choose the country code ?
|
||||
|
@ -59,8 +59,8 @@ class Phone(AbstractModule):
|
|||
|
||||
if results:
|
||||
# TAGS
|
||||
msg = f'infoleak:automatic-detection="phone-number";{item.get_id()}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="phone-number"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
self.redis_logger.warning(f'{item.get_id()} contains {len(phone)} Phone numbers')
|
||||
|
||||
|
|
|
@ -44,22 +44,21 @@ class SQLInjectionDetection(AbstractModule):
|
|||
self.logger.info(f"Module: {self.module_name} Launched")
|
||||
|
||||
def compute(self, message):
|
||||
url, item_id = message.split()
|
||||
url = message
|
||||
item = self.get_obj()
|
||||
|
||||
if self.is_sql_injection(url):
|
||||
self.faup.decode(url)
|
||||
url_parsed = self.faup.get()
|
||||
|
||||
item = Item(item_id)
|
||||
item_id = item.get_id()
|
||||
print(f"Detected SQL in URL: {item_id}")
|
||||
print(urllib.request.unquote(url))
|
||||
to_print = f'SQLInjection;{item.get_source()};{item.get_date()};{item.get_basename()};Detected SQL in URL;{item_id}'
|
||||
self.redis_logger.warning(to_print)
|
||||
|
||||
# Tag
|
||||
msg = f'infoleak:automatic-detection="sql-injection";{item_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = f'infoleak:automatic-detection="sql-injection";{item_id}'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
# statistics
|
||||
# tld = url_parsed['tld']
|
||||
|
|
|
@ -16,8 +16,6 @@ import gzip
|
|||
import base64
|
||||
import datetime
|
||||
import time
|
||||
# from sflock.main import unpack
|
||||
# import sflock
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
|
@ -27,7 +25,7 @@ from modules.abstract_module import AbstractModule
|
|||
from lib.objects.Items import ITEMS_FOLDER
|
||||
from lib import ConfigLoader
|
||||
from lib import Tag
|
||||
|
||||
from lib.objects.Items import Item
|
||||
|
||||
class SubmitPaste(AbstractModule):
|
||||
"""
|
||||
|
@ -48,7 +46,6 @@ class SubmitPaste(AbstractModule):
|
|||
"""
|
||||
super(SubmitPaste, self).__init__()
|
||||
|
||||
# TODO KVROCKS
|
||||
self.r_serv_db = ConfigLoader.ConfigLoader().get_db_conn("Kvrocks_DB")
|
||||
self.r_serv_log_submit = ConfigLoader.ConfigLoader().get_redis_conn("Redis_Log_submit")
|
||||
|
||||
|
@ -279,9 +276,11 @@ class SubmitPaste(AbstractModule):
|
|||
rel_item_path = save_path.replace(self.PASTES_FOLDER, '', 1)
|
||||
self.redis_logger.debug(f"relative path {rel_item_path}")
|
||||
|
||||
item = Item(rel_item_path)
|
||||
|
||||
# send paste to Global module
|
||||
relay_message = f"submitted {rel_item_path} {gzip64encoded}"
|
||||
self.add_message_to_queue(relay_message)
|
||||
relay_message = f"submitted {gzip64encoded}"
|
||||
self.add_message_to_queue(obj=item, message=relay_message)
|
||||
|
||||
# add tags
|
||||
for tag in ltags:
|
||||
|
|
|
@ -20,9 +20,6 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib.objects.Items import Item
|
||||
from lib import Tag
|
||||
|
||||
|
||||
class Tags(AbstractModule):
|
||||
"""
|
||||
|
@ -39,26 +36,15 @@ class Tags(AbstractModule):
|
|||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def compute(self, message):
|
||||
# Extract item ID and tag from message
|
||||
mess_split = message.split(';')
|
||||
if len(mess_split) == 2:
|
||||
tag = mess_split[0]
|
||||
item = Item(mess_split[1])
|
||||
item = self.obj
|
||||
tag = message
|
||||
|
||||
# Create a new tag
|
||||
Tag.add_object_tag(tag, 'item', item.get_id())
|
||||
print(f'{item.get_id()}: Tagged {tag}')
|
||||
|
||||
# Forward message to channel
|
||||
self.add_message_to_queue(message, 'Tag_feed')
|
||||
|
||||
message = f'{item.get_type()};{item.get_subtype(r_str=True)};{item.get_id()}'
|
||||
self.add_message_to_queue(message, 'Sync')
|
||||
|
||||
else:
|
||||
# Malformed message
|
||||
raise Exception(f'too many values to unpack (expected 2) given {len(mess_split)} with message {message}')
|
||||
# Create a new tag
|
||||
item.add_tag(tag)
|
||||
print(f'{item.get_id()}: Tagged {tag}')
|
||||
|
||||
# Forward message to channel
|
||||
self.add_message_to_queue(message=tag, queue='Tag_feed')
|
||||
|
||||
if __name__ == '__main__':
|
||||
module = Tags()
|
||||
|
|
|
@ -41,7 +41,7 @@ class Telegram(AbstractModule):
|
|||
self.logger.info(f"Module {self.module_name} initialized")
|
||||
|
||||
def compute(self, message, r_result=False):
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
item_content = item.get_content()
|
||||
item_date = item.get_date()
|
||||
|
||||
|
@ -58,7 +58,7 @@ class Telegram(AbstractModule):
|
|||
user_id = dict_url.get('username')
|
||||
if user_id:
|
||||
username = Username(user_id, 'telegram')
|
||||
username.add(item_date, item.id)
|
||||
username.add(item_date, item)
|
||||
print(f'username: {user_id}')
|
||||
invite_hash = dict_url.get('invite_hash')
|
||||
if invite_hash:
|
||||
|
@ -73,7 +73,7 @@ class Telegram(AbstractModule):
|
|||
user_id = dict_url.get('username')
|
||||
if user_id:
|
||||
username = Username(user_id, 'telegram')
|
||||
username.add(item_date, item.id)
|
||||
username.add(item_date, item)
|
||||
print(f'username: {user_id}')
|
||||
invite_hash = dict_url.get('invite_hash')
|
||||
if invite_hash:
|
||||
|
@ -86,8 +86,8 @@ class Telegram(AbstractModule):
|
|||
# CREATE TAG
|
||||
if invite_code_found:
|
||||
# tags
|
||||
msg = f'infoleak:automatic-detection="telegram-invite-hash";{item.id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = 'infoleak:automatic-detection="telegram-invite-hash"'
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
|
|
@ -416,7 +416,7 @@ class Tools(AbstractModule):
|
|||
return extracted
|
||||
|
||||
def compute(self, message):
|
||||
item = Item(message)
|
||||
item = self.get_obj()
|
||||
content = item.get_content()
|
||||
|
||||
for tool_name in TOOLS:
|
||||
|
@ -425,8 +425,8 @@ class Tools(AbstractModule):
|
|||
if match:
|
||||
print(f'{item.id} found: {tool_name}')
|
||||
# Tag Item
|
||||
msg = f"{tool['tag']};{item.id}"
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
tag = tool['tag']
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
# TODO ADD LOGS
|
||||
|
||||
|
||||
|
|
|
@ -62,10 +62,9 @@ class Urls(AbstractModule):
|
|||
"""
|
||||
Search for Web links from given message
|
||||
"""
|
||||
# Extract item
|
||||
item_id, score = message.split()
|
||||
score = message
|
||||
|
||||
item = Item(item_id)
|
||||
item = self.get_obj()
|
||||
item_content = item.get_content()
|
||||
|
||||
# TODO Handle invalid URL
|
||||
|
@ -79,10 +78,9 @@ class Urls(AbstractModule):
|
|||
except AttributeError:
|
||||
url = url_decoded['url']
|
||||
|
||||
to_send = f"{url} {item.get_id()}"
|
||||
print(to_send)
|
||||
self.add_message_to_queue(to_send, 'Url')
|
||||
self.logger.debug(f"url_parsed: {to_send}")
|
||||
print(url, item.get_id())
|
||||
self.add_message_to_queue(message=str(url), queue='Url')
|
||||
self.logger.debug(f"url_parsed: {url}")
|
||||
|
||||
if len(l_urls) > 0:
|
||||
to_print = f'Urls;{item.get_source()};{item.get_date()};{item.get_basename()};'
|
||||
|
|
|
@ -1,71 +0,0 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*-coding:UTF-8 -*
|
||||
"""
|
||||
The Zerobins Module
|
||||
======================
|
||||
This module spots zerobins-like services for further processing
|
||||
"""
|
||||
|
||||
##################################
|
||||
# Import External packages
|
||||
##################################
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
|
||||
sys.path.append(os.environ['AIL_BIN'])
|
||||
##################################
|
||||
# Import Project packages
|
||||
##################################
|
||||
from modules.abstract_module import AbstractModule
|
||||
from lib import crawlers
|
||||
|
||||
|
||||
class Zerobins(AbstractModule):
|
||||
"""
|
||||
Zerobins module for AIL framework
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
super(Zerobins, self).__init__()
|
||||
|
||||
binz = [
|
||||
r'^https:\/\/(zerobin||privatebin)\..*$', # historical ones
|
||||
]
|
||||
|
||||
self.regex = re.compile('|'.join(binz))
|
||||
|
||||
# Pending time between two computation (computeNone) in seconds
|
||||
self.pending_seconds = 10
|
||||
|
||||
# Send module state to logs
|
||||
self.logger.info(f'Module {self.module_name} initialized')
|
||||
|
||||
def computeNone(self):
|
||||
"""
|
||||
Compute when no message in queue
|
||||
"""
|
||||
self.logger.debug("No message in queue")
|
||||
|
||||
def compute(self, message):
|
||||
"""
|
||||
Compute a message in queue
|
||||
"""
|
||||
url, item_id = message.split()
|
||||
|
||||
# Extract zerobins addresses
|
||||
matching_binz = self.regex_findall(self.regex, item_id, url)
|
||||
|
||||
if len(matching_binz) > 0:
|
||||
for bin_url in matching_binz:
|
||||
print(f'send {bin_url} to crawler')
|
||||
# TODO Change priority ???
|
||||
crawlers.create_task(bin_url, depth=0, har=False, screenshot=False, proxy='force_tor',
|
||||
parent='manual', priority=60)
|
||||
|
||||
self.logger.debug("Compute message in queue")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
module = Zerobins()
|
||||
module.run()
|
|
@ -23,6 +23,7 @@ from lib import ail_logger
|
|||
from lib.ail_queues import AILQueue
|
||||
from lib import regex_helper
|
||||
from lib.exceptions import ModuleQueueError
|
||||
from lib.objects.ail_objects import get_obj_from_global_id
|
||||
|
||||
logging.config.dictConfig(ail_logger.get_config(name='modules'))
|
||||
|
||||
|
@ -47,6 +48,8 @@ class AbstractModule(ABC):
|
|||
# Setup the I/O queues
|
||||
if queue:
|
||||
self.queue = AILQueue(self.module_name, self.pid)
|
||||
self.obj = None
|
||||
self.sha256_mess = None
|
||||
|
||||
# Init Redis Logger
|
||||
self.redis_logger = publisher
|
||||
|
@ -70,28 +73,53 @@ class AbstractModule(ABC):
|
|||
# Debug Mode
|
||||
self.debug = False
|
||||
|
||||
def get_obj(self):
|
||||
return self.obj
|
||||
|
||||
def get_message(self):
|
||||
"""
|
||||
Get message from the Redis Queue (QueueIn)
|
||||
Input message can change between modules
|
||||
ex: '<item id>'
|
||||
"""
|
||||
return self.queue.get_message()
|
||||
message = self.queue.get_message()
|
||||
if message:
|
||||
obj_global_id, sha256_mess, mess = message
|
||||
if obj_global_id:
|
||||
self.sha256_mess = sha256_mess
|
||||
self.obj = get_obj_from_global_id(obj_global_id)
|
||||
else:
|
||||
self.sha256_mess = None
|
||||
self.obj = None
|
||||
return mess
|
||||
self.sha256_mess = None
|
||||
self.obj = None
|
||||
return None
|
||||
|
||||
def add_message_to_queue(self, message, queue_name=None):
|
||||
# TODO ADD META OBJ ????
|
||||
def add_message_to_queue(self, obj=None, message='', queue=None):
|
||||
"""
|
||||
Add message to queue
|
||||
:param obj: AILObject
|
||||
:param message: message to send in queue
|
||||
:param queue_name: queue or module name
|
||||
:param queue: queue name or module name
|
||||
|
||||
ex: add_message_to_queue(item_id, 'Mail')
|
||||
"""
|
||||
self.queue.send_message(message, queue_name)
|
||||
# add to new set_module
|
||||
if obj:
|
||||
obj_global_id = obj.get_global_id()
|
||||
elif self.obj:
|
||||
obj_global_id = self.obj.get_global_id()
|
||||
else:
|
||||
obj_global_id = '::'
|
||||
self.queue.send_message(obj_global_id, message, queue)
|
||||
|
||||
def get_available_queues(self):
|
||||
return self.queue.get_out_queues()
|
||||
|
||||
def regex_match(self, regex, obj_id, content):
|
||||
return regex_helper.regex_match(self.r_cache_key, regex, obj_id, content, max_time=self.max_execution_time)
|
||||
|
||||
def regex_search(self, regex, obj_id, content):
|
||||
return regex_helper.regex_search(self.r_cache_key, regex, obj_id, content, max_time=self.max_execution_time)
|
||||
|
||||
|
@ -130,7 +158,7 @@ class AbstractModule(ABC):
|
|||
# Get one message (ex:item id) from the Redis Queue (QueueIn)
|
||||
message = self.get_message()
|
||||
|
||||
if message:
|
||||
if message or self.obj:
|
||||
try:
|
||||
# Module processing with the message from the queue
|
||||
self.compute(message)
|
||||
|
@ -152,6 +180,11 @@ class AbstractModule(ABC):
|
|||
# remove from set_module
|
||||
## check if item process == completed
|
||||
|
||||
if self.obj:
|
||||
self.queue.end_message(self.obj.get_global_id(), self.sha256_mess)
|
||||
self.obj = None
|
||||
self.sha256_mess = None
|
||||
|
||||
else:
|
||||
self.computeNone()
|
||||
# Wait before next process
|
||||
|
@ -171,6 +204,10 @@ class AbstractModule(ABC):
|
|||
"""
|
||||
pass
|
||||
|
||||
def compute_manual(self, obj, message=None):
|
||||
self.obj = obj
|
||||
return self.compute(message)
|
||||
|
||||
def computeNone(self):
|
||||
"""
|
||||
Method of the Module when there is no message
|
||||
|
|
|
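With this change a queue entry carries a triple (obj_global_id, sha256_mess, message): get_message() resolves the object and stores it on self.obj, and add_message_to_queue() prepends an object global id, taken from the explicit obj argument, else from the module's current object, else the empty id '::'. The three call shapes used elsewhere in this commit, as they would appear inside a compute() method of any AbstractModule subclass:

# 1. tag the current object (self.obj is used implicitly)
self.add_message_to_queue(message='infoleak:automatic-detection="mail"', queue='Tags')

# 2. forward an object together with a payload, as Mixer does with the gzip+base64 content
self.add_message_to_queue(obj=self.obj, message=gzip64encoded)

# 3. forward only the current object, no extra payload, as the Keys module does for PgpDump
self.add_message_to_queue(queue='PgpDump')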
@ -81,15 +81,29 @@ class Date(object):
|
|||
|
||||
def get_today_date_str(separator=False):
|
||||
if separator:
|
||||
datetime.date.today().strftime("%Y/%m/%d")
|
||||
return datetime.date.today().strftime("%Y/%m/%d")
|
||||
else:
|
||||
return datetime.date.today().strftime("%Y%m%d")
|
||||
|
||||
def get_current_week_day():
|
||||
dt = datetime.date.today()
|
||||
start = dt - datetime.timedelta(days=dt.weekday())
|
||||
return start.strftime("%Y%m%d")
|
||||
|
||||
def get_date_week_by_date(date):
|
||||
dt = datetime.date(int(date[0:4]), int(date[4:6]), int(date[6:8]))
|
||||
start = dt - datetime.timedelta(days=dt.weekday())
|
||||
return start.strftime("%Y%m%d")
|
||||
|
||||
def date_add_day(date, num_day=1):
|
||||
new_date = datetime.date(int(date[0:4]), int(date[4:6]), int(date[6:8])) + datetime.timedelta(num_day)
|
||||
new_date = str(new_date).replace('-', '')
|
||||
return new_date
|
||||
|
||||
def daterange_add_days(date, nb_days):
|
||||
end_date = date_add_day(date, num_day=nb_days)
|
||||
return get_daterange(date, end_date)
|
||||
|
||||
def date_substract_day(date, num_day=1):
|
||||
new_date = datetime.date(int(date[0:4]), int(date[4:6]), int(date[6:8])) - datetime.timedelta(num_day)
|
||||
new_date = str(new_date).replace('-', '')
|
||||
|
|
|
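Both new week helpers normalise a date to the Monday of its ISO week and return compact YYYYMMDD strings; daterange_add_days() then expands a start date into the full range via get_daterange(), defined elsewhere in Date.py. A quick interpreter check, assuming the helpers are importable as lib.Date:

>>> from lib.Date import get_date_week_by_date, date_add_day
>>> get_date_week_by_date('20240117')  # 2024-01-17 is a Wednesday
'20240115'
>>> date_add_day('20231231', num_day=1)
'20240101'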
@ -129,17 +129,25 @@ def get_last_tag_from_local(verbose=False):
|
|||
print('{}{}{}'.format(TERMINAL_RED, process.stderr.decode(), TERMINAL_DEFAULT))
|
||||
return ''
|
||||
|
||||
# Get last local tag
|
||||
# Get last remote tag
|
||||
def get_last_tag_from_remote(verbose=False):
|
||||
if verbose:
|
||||
print('retrieving last remote tag ...')
|
||||
#print('git ls-remote --tags')
|
||||
|
||||
process = subprocess.run(['git', 'ls-remote', '--tags'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
if process.returncode == 0:
|
||||
res = process.stdout.split(b'\n')[-2].split(b'/')[-1].replace(b'^{}', b'').decode()
|
||||
if verbose:
|
||||
print(res)
|
||||
return res
|
||||
output_lines = process.stdout.split(b'\n')
|
||||
if len(output_lines) > 1:
|
||||
# Assuming we want the second-to-last line as before
|
||||
res = output_lines[-2].split(b'/')[-1].replace(b'^{}', b'').decode()
|
||||
if verbose:
|
||||
print(res)
|
||||
return res
|
||||
else:
|
||||
if verbose:
|
||||
print("No tags found or insufficient output from git command.")
|
||||
return ''
|
||||
|
||||
else:
|
||||
if verbose:
|
||||
|
|
|
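get_last_tag_from_remote() parses git ls-remote --tags output, where the final element of the newline split is an empty string left by the trailing newline and the second-to-last line is typically the peeled ^{} ref of the last-sorted tag; the new length guard only avoids an IndexError when the command yields nothing usable (no tags, no network). A rough illustration of the parsing step on hypothetical output:

raw = b'abc123\trefs/tags/v5.1\nabc456\trefs/tags/v5.2\nabc789\trefs/tags/v5.2^{}\n'
lines = raw.split(b'\n')  # last element is b'' because of the trailing newline
if len(lines) > 1:
    tag = lines[-2].split(b'/')[-1].replace(b'^{}', b'').decode()
    print(tag)  # v5.2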
@ -111,7 +111,10 @@ class Retro_Hunt_Module(AbstractModule):
|
|||
self.redis_logger.warning(f'{self.module_name}, Retro Hunt {task_uuid} completed')
|
||||
|
||||
def update_progress(self):
|
||||
new_progress = self.nb_done * 100 / self.nb_objs
|
||||
if self.nb_objs == 0:
|
||||
new_progress = 100
|
||||
else:
|
||||
new_progress = self.nb_done * 100 / self.nb_objs
|
||||
if int(self.progress) != int(new_progress):
|
||||
print(new_progress)
|
||||
self.retro_hunt.set_progress(new_progress)
|
||||
|
@ -128,10 +131,16 @@ class Retro_Hunt_Module(AbstractModule):
|
|||
self.retro_hunt.add(self.obj.get_type(), self.obj.get_subtype(), obj_id)
|
||||
|
||||
# TODO FILTER Tags
|
||||
|
||||
# TODO refactor Tags module for all object type
|
||||
# Tags
|
||||
for tag in self.tags:
|
||||
msg = f'{tag};{id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
if self.obj.get_type() == 'item':
|
||||
for tag in self.tags:
|
||||
msg = f'{tag};{obj_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
else:
|
||||
for tag in self.tags:
|
||||
self.obj.add_tag(tag)
|
||||
|
||||
# # Mails
|
||||
# EXPORTER MAILS
|
||||
|
|
|
@ -41,13 +41,15 @@ class Tracker_Regex(AbstractModule):
|
|||
self.tracked_regexs = Tracker.get_tracked_regexs()
|
||||
self.last_refresh = time.time()
|
||||
|
||||
self.obj = None
|
||||
|
||||
# Exporter
|
||||
self.exporters = {'mail': MailExporterTracker(),
|
||||
'webhook': WebHookExporterTracker()}
|
||||
|
||||
self.redis_logger.info(f"Module: {self.module_name} Launched")
|
||||
|
||||
def compute(self, obj_id, obj_type='item', subtype=''):
|
||||
def compute(self, message):
|
||||
# refresh Tracked regex
|
||||
if self.last_refresh < Tracker.get_tracker_last_updated_by_type('regex'):
|
||||
self.tracked_regexs = Tracker.get_tracked_regexs()
|
||||
|
@ -55,7 +57,7 @@ class Tracker_Regex(AbstractModule):
|
|||
self.redis_logger.debug('Tracked regex refreshed')
|
||||
print('Tracked regex refreshed')
|
||||
|
||||
obj = ail_objects.get_object(obj_type, subtype, obj_id)
|
||||
obj = self.get_obj()
|
||||
obj_id = obj.get_id()
|
||||
obj_type = obj.get_type()
|
||||
|
||||
|
@ -66,12 +68,46 @@ class Tracker_Regex(AbstractModule):
|
|||
content = obj.get_content()
|
||||
|
||||
for dict_regex in self.tracked_regexs[obj_type]:
|
||||
matched = self.regex_findall(dict_regex['regex'], obj_id, content)
|
||||
if matched:
|
||||
self.new_tracker_found(dict_regex['tracked'], 'regex', obj)
|
||||
matches = self.regex_finditer(dict_regex['regex'], obj_id, content)
|
||||
if matches:
|
||||
self.new_tracker_found(dict_regex['tracked'], 'regex', obj, matches)
|
||||
|
||||
def new_tracker_found(self, tracker_name, tracker_type, obj):
|
||||
def extract_matches(self, re_matches, limit=500, lines=5):
|
||||
matches = []
|
||||
content = self.obj.get_content()
|
||||
l_content = len(content)
|
||||
for match in re_matches:
|
||||
start = match[0]
|
||||
value = match[2]
|
||||
end = match[1]
|
||||
|
||||
# Start
|
||||
if start > limit:
|
||||
i_start = start - limit
|
||||
else:
|
||||
i_start = 0
|
||||
str_start = content[i_start:start].splitlines()
|
||||
if len(str_start) > lines:
|
||||
str_start = '\n'.join(str_start[-lines + 1:])
|
||||
else:
|
||||
str_start = content[i_start:start]
|
||||
|
||||
# End
|
||||
if end + limit > l_content:
|
||||
i_end = l_content
|
||||
else:
|
||||
i_end = end + limit
|
||||
str_end = content[end:i_end].splitlines()
|
||||
if len(str_end) > lines:
|
||||
str_end = '\n'.join(str_end[:lines + 1])
|
||||
else:
|
||||
str_end = content[end:i_end]
|
||||
matches.append((value, f'{str_start}{value}{str_end}'))
|
||||
return matches
|
||||
|
||||
def new_tracker_found(self, tracker_name, tracker_type, obj, re_matches):
|
||||
obj_id = obj.get_id()
|
||||
matches = None
|
||||
for tracker_uuid in Tracker.get_trackers_by_tracked_obj_type(tracker_type, obj.get_type(), tracker_name):
|
||||
tracker = Tracker.Tracker(tracker_uuid)
|
||||
|
||||
|
@ -87,14 +123,14 @@ class Tracker_Regex(AbstractModule):
|
|||
|
||||
for tag in tracker.get_tags():
|
||||
if obj.get_type() == 'item':
|
||||
msg = f'{tag};{obj_id}'
|
||||
self.add_message_to_queue(msg, 'Tags')
|
||||
self.add_message_to_queue(message=tag, queue='Tags')
|
||||
else:
|
||||
obj.add_tag(tag)
|
||||
|
||||
if tracker.mail_export():
|
||||
# TODO add matches + custom subjects
|
||||
self.exporters['mail'].export(tracker, obj)
|
||||
if not matches:
|
||||
matches = self.extract_matches(re_matches)
|
||||
self.exporters['mail'].export(tracker, obj, matches)
|
||||
|
||||
if tracker.webhook_export():
|
||||
self.exporters['webhook'].export(tracker, obj)
|
||||
|
@ -103,4 +139,3 @@ class Tracker_Regex(AbstractModule):
|
|||
if __name__ == "__main__":
|
||||
module = Tracker_Regex()
|
||||
module.run()
|
||||
# module.compute('submitted/2023/05/02/submitted_b1e518f1-703b-40f6-8238-d1c22888197e.gz')
|
||||
|
|
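extract_matches() above turns the raw (start, end, value) tuples returned by regex_finditer() into (value, context) pairs, keeping at most ~500 characters and a handful of lines on each side of the hit; that list is what the mail exporter now receives instead of the bare object. The same windowing idea expressed with the standard re module (limits and the sample text are arbitrary):

import re


def window(content, match, limit=500, lines=5):
    start, end = match.start(), match.end()
    before = content[max(0, start - limit):start]
    if len(before.splitlines()) > lines:
        before = '\n'.join(before.splitlines()[-lines:])
    after = content[end:end + limit]
    if len(after.splitlines()) > lines:
        after = '\n'.join(after.splitlines()[:lines])
    return match.group(0), f'{before}{match.group(0)}{after}'


text = 'line1\nuser@example.com found here\nline3'
for m in re.finditer(r'[\w.+-]+@[\w.-]+', text):
    value, context = window(text, m)
    print(value)
    print(context)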
Some files were not shown because too many files have changed in this diff.