Mirror of https://github.com/ail-project/ail-framework.git (synced 2024-11-30 09:47:17 +00:00)

Commit dd34382721: 27 changed files with 1734 additions and 17 deletions
.gitignore (vendored): 4 changes

@@ -11,9 +11,10 @@ ardb
 faup
 tlsh
 Blooms
-LEVEL_DB_DATA
 PASTES
+CRAWLED_SCREENSHOT
 BASE64
+HASHS
 DATA_ARDB
 indexdir/
 logs/
@@ -33,6 +34,7 @@ var/www/submitted
 bin/packages/config.cfg
 bin/packages/config.cfg.backup
 configs/keys
+files

 # installed files
 nltk_data/
HOWTO.md: 53 changes

@@ -84,9 +84,9 @@ You can navigate into the interface by using arrow keys. In order to perform an

 To change list, you can press the <TAB> key.

-Also, you can quickly stop or start modules by clicking on the <K> or <S> symbol respectively. These are located in the _Action_ column.
+Also, you can quickly stop or start modules by clicking on the ``<K>`` or ``<S>`` symbol respectively. These are located in the _Action_ column.

-Finally, you can quit this program by pressing either <q> or <C-c>
+Finally, you can quit this program by pressing either ``<q>`` or ``<C-c>``.


 Terms frequency usage

@@ -96,3 +96,52 @@ In AIL, you can track terms, sets of terms and even regexes without creating a dedicated module

 - You can track a term by simply putting it in the box.
 - You can track a set of terms by putting the terms in an array surrounded by the '\' character. You can also set a custom threshold for the number of terms that must match to trigger the detection. For example, to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
 - You can track regexes as easily as a term: just put your regex in the box surrounded by the '/' character. For example, to track every email address on the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
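The three rule shapes above can be sketched in Python. This is only an illustration, not AIL's actual matcher; in particular it assumes the nested `[100]` value is the percentage of listed terms that must be present.

```python
import re

def matches_tracker(rule, text):
    """Match one tracker rule against a text (sketch only).

    Supports the three rule shapes: a plain term, a set of terms with a
    percentage threshold, and a regex delimited by '/'.
    """
    words = set(text.lower().split())
    if rule.startswith('\\[') and rule.endswith(']\\'):
        # Set rule, e.g. \[term1, term2, [100]]\  (threshold assumed to be a percentage)
        inner = rule[2:-2]
        body, _, threshold = inner.rpartition(', [')
        terms = [t.strip().lower() for t in body.split(',')]
        percent = int(threshold.rstrip(']'))
        matched = sum(t in words for t in terms)
        return matched * 100 >= percent * len(terms)
    if rule.startswith('/') and rule.endswith('/'):
        # Regex rule, e.g. /term[0-9]+/
        return re.search(rule[1:-1], text) is not None
    # Plain term rule
    return rule.lower() in words
```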
+
+
+Crawler
+---------------------
+In AIL, you can crawl hidden services.
+
+There are two types of installation: you can install a *local* or a *remote* Splash server.
+``(Splash host) = the server running the splash service``
+``(AIL host) = the server running AIL``
+
+### Installation/Configuration
+
+1. *(Splash host)* Launch ``crawler_hidden_services_install.sh`` to install all requirements (type ``y`` if a localhost splash server is used, or use the ``-y`` option)
+
+2. *(Splash host)* To install and set up your tor proxy:
+    - Install the tor proxy: ``sudo apt-get install tor -y``
+      (Not required if ``Splash host == AIL host``: the tor proxy is installed by default in AIL)
+    - Add the following line to ``/etc/tor/torrc``: ``SOCKSPolicy accept 172.17.0.0/16``
+      (for a Linux docker, the localhost IP is *172.17.0.1*; adapt it for other platforms)
+    - Restart the tor proxy: ``sudo service tor restart``
+
+3. *(AIL host)* Edit the ``/bin/packages/config.cfg`` file:
+    - In the crawler section, set ``activate_crawler`` to ``True``
+    - Change the IP address of the Splash servers if needed (remote only)
+    - Set ``splash_onion_port`` according to the Splash server port numbers that will be used. These can be given as a single port (e.g. 8050) or a port range (e.g. 8050-8052 for ports 8050, 8051 and 8052).
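The single-port / port-range convention for ``splash_onion_port`` can be illustrated with a small Python helper. It is a sketch of the same parsing that ``LAUNCH.sh`` performs with ``IFS='-'``, not code from this commit.

```python
def expand_splash_ports(spec):
    """Expand a splash_onion_port value into the list of ports it denotes.

    "8050"      -> one port
    "8050-8052" -> every port in the inclusive range
    """
    if '-' in spec:
        first, last = spec.split('-')
        return list(range(int(first), int(last) + 1))
    return [int(spec)]
```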
+
+
+### Starting the scripts
+
+- *(Splash host)* Launch all Splash servers with:
+
+```sudo ./bin/torcrawler/launch_splash_crawler.sh -f <config absolute_path> -p <port_start> -n <number_of_splash>```
+
+  with ``<port_start>`` and ``<number_of_splash>`` matching the ``splash_onion_port`` value set in step 3 (``/bin/packages/config.cfg``).
+
+  All Splash dockers are launched inside the ``Docker_Splash`` screen. You can use ``sudo screen -r Docker_Splash`` to connect to the screen session and check the status of all Splash servers.
+
+- *(AIL host)* Launch all AIL crawler scripts with:
+
+```./bin/LAUNCH.sh -c```
+
+
+### TL;DR - Local setup
+
+#### Installation
+- ```crawler_hidden_services_install.sh -y```
+- Add the line ``SOCKSPolicy accept 172.17.0.0/16`` to ``/etc/tor/torrc``
+- ```sudo service tor restart```
+- Set ``activate_crawler`` to ``True`` in ``/bin/packages/config.cfg``
+
+#### Start
+- ```sudo ./bin/torcrawler/launch_splash_crawler.sh -f $AIL_HOME/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1```
+- ```./bin/LAUNCH.sh -c```

@@ -32,6 +32,7 @@ def decode_base58(bc, length):
     for char in bc:
         n = n * 58 + digits58.index(char)
     return n.to_bytes(length, 'big')

 def check_bc(bc):
     try:
         bcbytes = decode_base58(bc, 25)
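The hunk above shows the `decode_base58` / `check_bc` helpers only partially. A self-contained sketch of the full check follows; the truncated `check_bc` body is completed here with the standard Base58Check double-SHA-256 checksum, which is an assumption based on the address format rather than on this diff.

```python
import hashlib

digits58 = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

def decode_base58(bc, length):
    # Interpret the address as a base-58 number, then serialise it big-endian.
    n = 0
    for char in bc:
        n = n * 58 + digits58.index(char)
    return n.to_bytes(length, 'big')

def check_bc(bc):
    """Validate a Base58Check Bitcoin address: the last 4 bytes must equal
    the first 4 bytes of SHA-256(SHA-256(payload))."""
    try:
        bcbytes = decode_base58(bc, 25)
        checksum = hashlib.sha256(hashlib.sha256(bcbytes[:-4]).digest()).digest()[:4]
        return bcbytes[-4:] == checksum
    except Exception:
        return False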
bin/Crawler.py: new executable file (+239 lines)

#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys
import re
import redis
import datetime
import time
import subprocess
import requests

sys.path.append(os.environ['AIL_BIN'])
from Helper import Process
from pubsublogger import publisher

def on_error_send_message_back_in_queue(type_hidden_service, domain, message):
    # send this msg back in the queue
    if not r_onion.sismember('{}_domain_crawler_queue'.format(type_hidden_service), domain):
        r_onion.sadd('{}_domain_crawler_queue'.format(type_hidden_service), domain)
        r_onion.sadd('{}_crawler_queue'.format(type_hidden_service), message)

def crawl_onion(url, domain, date, date_month, message):

    #if not r_onion.sismember('full_onion_up', domain) and not r_onion.sismember('onion_down:'+date , domain):
    super_father = r_serv_metadata.hget('paste_metadata:'+paste, 'super_father')
    if super_father is None:
        super_father = paste

    try:
        r = requests.get(splash_url, timeout=30.0)
    except Exception:
        # TODO: relaunch docker or send error message
        on_error_send_message_back_in_queue(type_hidden_service, domain, message)
        publisher.error('{} SPLASH DOWN'.format(splash_url))
        print('--------------------------------------')
        print(' \033[91m DOCKER SPLASH DOWN\033[0m')
        print(' {} DOWN'.format(splash_url))
        exit(1)

    if r.status_code == 200:
        process = subprocess.Popen(["python", './torcrawler/tor_crawler.py', splash_url, type_hidden_service, url, domain, paste, super_father],
                                   stdout=subprocess.PIPE)
        while process.poll() is None:
            time.sleep(1)

        if process.returncode == 0:
            output = process.stdout.read().decode()
            print(output)
            # error: splash:Connection to proxy refused
            if 'Connection to proxy refused' in output:
                on_error_send_message_back_in_queue(type_hidden_service, domain, message)
                publisher.error('{} SPLASH, PROXY DOWN OR BAD CONFIGURATION'.format(splash_url))
                print('------------------------------------------------------------------------')
                print(' \033[91m SPLASH: Connection to proxy refused')
                print('')
                print(' PROXY DOWN OR BAD CONFIGURATION\033[0m'.format(splash_url))
                print('------------------------------------------------------------------------')
                exit(-2)
        else:
            print(process.stdout.read())
            exit(-1)
    else:
        on_error_send_message_back_in_queue(type_hidden_service, domain, message)
        print('--------------------------------------')
        print(' \033[91m DOCKER SPLASH DOWN\033[0m')
        print(' {} DOWN'.format(splash_url))
        exit(1)


if __name__ == '__main__':

    if len(sys.argv) != 3:
        print('usage:', 'Crawler.py', 'type_hidden_service (onion or i2p or regular)', 'splash_port')
        exit(1)

    type_hidden_service = sys.argv[1]
    splash_port = sys.argv[2]

    publisher.port = 6380
    publisher.channel = "Script"

    publisher.info("Script Crawler started")

    config_section = 'Crawler'

    # Setup the I/O queues
    p = Process(config_section)

    url_onion = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
    re.compile(url_onion)
    url_i2p = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
    re.compile(url_i2p)
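Both patterns make the scheme optional (`(http|https|ftp)?(?:\://)?`), which is why the main loop below prepends `http://` when the scheme group comes back empty. A simplified, hypothetical stand-in for that host-extraction behaviour (deliberately not the Faup-derived pattern itself, whose `findall` tuples carry 13 groups with the domain at index 4):

```python
import re

# Simplified sketch: optional scheme, a v2/v3 .onion host, optional port/path.
onion_re = re.compile(r'(?:(https?|ftp)://)?([a-z2-7]{16,56}\.onion)(?::\d+)?(/\S*)?', re.I)

m = onion_re.search('expyuzz4wqqyqhjn.onion/about')
scheme, host = m.group(1), m.group(2)
if not scheme:
    # No scheme captured -> normalise the URL, mirroring the crawler's behaviour
    url = 'http://{}'.format(m.group(0))
```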
    if type_hidden_service == 'onion':
        regex_hidden_service = url_onion
        splash_url = '{}:{}'.format(p.config.get("Crawler", "splash_url_onion"), splash_port)
    elif type_hidden_service == 'i2p':
        regex_hidden_service = url_i2p
        splash_url = '{}:{}'.format(p.config.get("Crawler", "splash_url_i2p"), splash_port)
    elif type_hidden_service == 'regular':
        regex_hidden_service = url_i2p
        splash_url = '{}:{}'.format(p.config.get("Crawler", "splash_url_onion"), splash_port)
    else:
        print('incorrect crawler type: {}'.format(type_hidden_service))
        exit(0)

    print('splash url: {}'.format(splash_url))

    crawler_depth_limit = p.config.getint("Crawler", "crawler_depth_limit")

    PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], p.config.get("Directories", "pastes"))

    r_serv_metadata = redis.StrictRedis(
        host=p.config.get("ARDB_Metadata", "host"),
        port=p.config.getint("ARDB_Metadata", "port"),
        db=p.config.getint("ARDB_Metadata", "db"),
        decode_responses=True)

    r_cache = redis.StrictRedis(
        host=p.config.get("Redis_Cache", "host"),
        port=p.config.getint("Redis_Cache", "port"),
        db=p.config.getint("Redis_Cache", "db"),
        decode_responses=True)

    r_onion = redis.StrictRedis(
        host=p.config.get("ARDB_Onion", "host"),
        port=p.config.getint("ARDB_Onion", "port"),
        db=p.config.getint("ARDB_Onion", "db"),
        decode_responses=True)

    # load the domain blacklist
    try:
        with open(os.environ['AIL_BIN']+'/torcrawler/blacklist_onion.txt', 'r') as f:
            r_onion.delete('blacklist_{}'.format(type_hidden_service))
            lines = f.read().splitlines()
            for line in lines:
                r_onion.sadd('blacklist_{}'.format(type_hidden_service), line)
    except Exception:
        pass

    while True:

        # Recover the streamed message information.
        message = r_onion.spop('{}_crawler_queue'.format(type_hidden_service))

        if message is not None:

            splitted = message.split(';')
            if len(splitted) == 2:
                url, paste = splitted
                paste = paste.replace(PASTES_FOLDER+'/', '')

                url_list = re.findall(regex_hidden_service, url)[0]
                if url_list[1] == '':
                    url = 'http://{}'.format(url)

                link, s, credential, subdomain, domain, host, port, \
                    resource_path, query_string, f1, f2, f3, f4 = url_list
                domain = url_list[4]
                r_onion.srem('{}_domain_crawler_queue'.format(type_hidden_service), domain)

                domain_url = 'http://{}'.format(domain)

                print('\033[92m------------------START CRAWLER------------------\033[0m')
                print('crawler type: {}'.format(type_hidden_service))
                print('\033[92m-------------------------------------------------\033[0m')
                print('url: {}'.format(url))
                print('domain: {}'.format(domain))
                print('domain_url: {}'.format(domain_url))

                if not r_onion.sismember('blacklist_{}'.format(type_hidden_service), domain):

                    date = datetime.datetime.now().strftime("%Y%m%d")
                    date_month = datetime.datetime.now().strftime("%Y%m")

                    if not r_onion.sismember('month_{}_up:{}'.format(type_hidden_service, date_month), domain) and not r_onion.sismember('{}_down:{}'.format(type_hidden_service, date), domain):

                        crawl_onion(url, domain, date, date_month, message)
                        if url != domain_url:
                            print(url)
                            print(domain_url)
                            crawl_onion(domain_url, domain, date, date_month, message)

                        # save down onion
                        if not r_onion.sismember('{}_up:{}'.format(type_hidden_service, date), domain):
                            r_onion.sadd('{}_down:{}'.format(type_hidden_service, date), domain)
                            #r_onion.sadd('{}_down_link:{}'.format(type_hidden_service, date), url)
                            #r_onion.hincrby('{}_link_down'.format(type_hidden_service), url, 1)
                            if not r_onion.exists('{}_metadata:{}'.format(type_hidden_service, domain)):
                                r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'first_seen', date)
                            r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'last_seen', date)
                        else:
                            #r_onion.hincrby('{}_link_up'.format(type_hidden_service), url, 1)
                            if r_onion.sismember('month_{}_up:{}'.format(type_hidden_service, date_month), domain) and r_serv_metadata.exists('paste_children:'+paste):
                                msg = 'infoleak:automatic-detection="{}";{}'.format(type_hidden_service, paste)
                                p.populate_set_out(msg, 'Tags')

                        # last check
                        r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'last_check', date)

                        # last_father
                        r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'paste_parent', paste)

                        # add onion screenshot history
                        # add crawled days
                        if r_onion.lindex('{}_history:{}'.format(type_hidden_service, domain), 0) != date:
                            r_onion.lpush('{}_history:{}'.format(type_hidden_service, domain), date)
                        # add crawled history by date
                        r_onion.lpush('{}_history:{}:{}'.format(type_hidden_service, domain, date), paste) #add datetime here

                        # check external onion links (full_crawl)
                        external_domains = set()
                        for link in r_onion.smembers('domain_{}_external_links:{}'.format(type_hidden_service, domain)):
                            external_domain = re.findall(url_onion, link)
                            external_domain.extend(re.findall(url_i2p, link))
                            if len(external_domain) > 0:
                                external_domain = external_domain[0][4]
                            else:
                                continue
                            if '.onion' in external_domain and external_domain != domain:
                                external_domains.add(external_domain)
                            elif '.i2p' in external_domain and external_domain != domain:
                                external_domains.add(external_domain)
                        if len(external_domains) >= 10:
                            r_onion.sadd('{}_potential_source'.format(type_hidden_service), domain)
                        r_onion.delete('domain_{}_external_links:{}'.format(type_hidden_service, domain))
                        print(r_onion.smembers('domain_{}_external_links:{}'.format(type_hidden_service, domain)))

                        # update the list of last crawled onions
                        r_onion.lpush('last_{}'.format(type_hidden_service), domain)
                        r_onion.ltrim('last_{}'.format(type_hidden_service), 0, 15)

                else:
                    continue
        else:
            time.sleep(1)

@@ -27,6 +27,7 @@ islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
 isqueued=`screen -ls | egrep '[0-9]+.Queue_AIL' | cut -d. -f1`
 isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
 isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
+iscrawler=`screen -ls | egrep '[0-9]+.Crawler_AIL' | cut -d. -f1`
 isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`

 function helptext {
@@ -199,6 +200,35 @@ function launching_scripts {

 }

+function launching_crawler {
+    if [[ ! $iscrawler ]]; then
+        CONFIG=$AIL_BIN/packages/config.cfg
+        lport=$(awk '/^\[Crawler\]/{f=1} f==1&&/^splash_onion_port/{print $3;exit}' "${CONFIG}")
+
+        IFS='-' read -ra PORTS <<< "$lport"
+        if [ ${#PORTS[@]} -eq 1 ]
+        then
+            first_port=${PORTS[0]}
+            last_port=${PORTS[0]}
+        else
+            first_port=${PORTS[0]}
+            last_port=${PORTS[1]}
+        fi
+
+        screen -dmS "Crawler_AIL"
+        sleep 0.1
+
+        for ((i=first_port;i<=last_port;i++)); do
+            screen -S "Crawler_AIL" -X screen -t "onion_crawler:$i" bash -c 'cd '${AIL_BIN}'; ./Crawler.py onion '$i'; read x'
+            sleep 0.1
+        done
+
+        echo -e $GREEN"\t* Launching Crawler_AIL scripts"$DEFAULT
+    else
+        echo -e $RED"\t* A screen is already launched"$DEFAULT
+    fi
+}
+
 function shutting_down_redis {
     redis_dir=${AIL_HOME}/redis/src/
     bash -c $redis_dir'redis-cli -p 6379 SHUTDOWN'
@@ -420,6 +450,9 @@ function launch_all {
         Flask)
             launch_flask;
             ;;
+        Crawler)
+            launching_crawler;
+            ;;
         Killall)
             killall;
             ;;
@@ -445,10 +478,10 @@ while [ "$1" != "" ]; do
                               ;;
     -k | --killAll )          killall;
                               ;;
-    -c | --configUpdate )     checking_configuration "manual";
-                              ;;
     -t | --thirdpartyUpdate ) update_thirdparty;
                               ;;
+    -c | --crawler )          launching_crawler;
+                              ;;
     -f | --launchFeeder )     launch_feeder;
                               ;;
     -h | --help )             helptext;
bin/Onion.py: 63 changes

@@ -21,7 +21,6 @@ Requirements
 *Need the ZMQ_Sub_Onion_Q Module running to be able to work properly.

 """
-import pprint
 import time
 from packages import Paste
 from pubsublogger import publisher
@@ -30,6 +29,7 @@ import os
 import base64
 import subprocess
 import redis
+import re

 from Helper import Process

@@ -97,6 +97,12 @@ if __name__ == "__main__":
         db=p.config.getint("Redis_Cache", "db"),
         decode_responses=True)

+    r_onion = redis.StrictRedis(
+        host=p.config.get("ARDB_Onion", "host"),
+        port=p.config.getint("ARDB_Onion", "port"),
+        db=p.config.getint("ARDB_Onion", "db"),
+        decode_responses=True)
+
     # FUNCTIONS #
     publisher.info("Script subscribed to channel onion_categ")

@@ -107,9 +113,21 @@ if __name__ == "__main__":
     message = p.get_from_set()
     prec_filename = None

+    # send to crawler:
+    activate_crawler = p.config.get("Crawler", "activate_crawler")
+    if activate_crawler == 'True':
+        activate_crawler = True
+        print('Crawler enabled')
+    else:
+        activate_crawler = False
+        print('Crawler disabled')
+
     # Thanks to Faup project for this regex
     # https://github.com/stricaud/faup
-    url_regex = "((http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
+    url_regex = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
+    i2p_regex = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
+    re.compile(url_regex)


     while True:
         if message is not None:
@@ -123,13 +141,33 @@ if __name__ == "__main__":
             PST = Paste.Paste(filename)

             for x in PST.get_regex(url_regex):
+                print(x)
                 # Extracting url with regex
                 url, s, credential, subdomain, domain, host, port, \
                     resource_path, query_string, f1, f2, f3, f4 = x

+                if '.onion' in url:
+                    print(url)
                     domains_list.append(domain)
                     urls.append(url)

+                '''
+                for x in PST.get_regex(i2p_regex):
+                    # Extracting url with regex
+                    url, s, credential, subdomain, domain, host, port, \
+                        resource_path, query_string, f1, f2, f3, f4 = x
+
+                    if '.i2p' in url:
+                        print('add i2p')
+                        print(domain)
+                        if not r_onion.sismember('i2p_domain', domain) and not r_onion.sismember('i2p_domain_crawler_queue', domain):
+                            r_onion.sadd('i2p_domain', domain)
+                            r_onion.sadd('i2p_link', url)
+                            r_onion.sadd('i2p_domain_crawler_queue', domain)
+                            msg = '{};{}'.format(url,PST.p_path)
+                            r_onion.sadd('i2p_crawler_queue', msg)
+                '''

             # Saving the list of extracted onion domains.
             PST.__setattr__(channel, domains_list)
             PST.save_attribute_redis(channel, domains_list)
@@ -149,6 +187,27 @@ if __name__ == "__main__":
             to_print = 'Onion;{};{};{};'.format(PST.p_source,
                                                 PST.p_date,
                                                 PST.p_name)
+
+            if activate_crawler:
+                date_month = datetime.datetime.now().strftime("%Y%m")
+                date = datetime.datetime.now().strftime("%Y%m%d")
+                for url in urls:
+
+                    domain = re.findall(url_regex, url)
+                    if len(domain) > 0:
+                        domain = domain[0][4]
+                    else:
+                        continue
+
+                    if not r_onion.sismember('month_onion_up:{}'.format(date_month), domain) and not r_onion.sismember('onion_down:'+date , domain):
+                        if not r_onion.sismember('onion_domain_crawler_queue', domain):
+                            print('send to onion crawler')
+                            r_onion.sadd('onion_domain_crawler_queue', domain)
+                            msg = '{};{}'.format(url,PST.p_path)
+                            r_onion.sadd('onion_crawler_queue', msg)
+                            #p.populate_set_out(msg, 'Crawler')
+
+            else:
                 for url in fetch(p, r_cache, urls, domains_list, path):
                     publisher.info('{}Checked {};{}'.format(to_print, url, PST.p_path))
                     p.populate_set_out('onion;{}'.format(PST.p_path), 'alertHandler')
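Onion.py (producer) and Crawler.py (consumer) communicate through the `onion_crawler_queue` redis set using `<url>;<paste_path>` messages, as the hunks above show. A minimal sketch of that contract, with a plain Python set standing in for redis:

```python
# In-memory stand-in for the redis set shared by Onion.py and Crawler.py.
onion_crawler_queue = set()

def enqueue(url, paste_path):
    # Producer side (Onion.py): msg = '{};{}'.format(url, PST.p_path)
    onion_crawler_queue.add('{};{}'.format(url, paste_path))

def dequeue():
    # Consumer side (Crawler.py): spop, then split on ';'
    if not onion_crawler_queue:
        return None
    message = onion_crawler_queue.pop()
    url, paste = message.split(';')
    return url, paste
```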
bin/packages/HiddenServices.py: new executable file (+183 lines, listing truncated)

#!/usr/bin/python3

"""
The ``HiddenServices`` class
============================

Use it to create an object from an existing paste or other random file.

Conditions to fulfill to be able to use this class correctly:
-------------------------------------------------------------

1/ The paste needs to be saved on disk somewhere (have an accessible path)
2/ The paste needs to be gzipped.
3/ The filepath needs to look something like this:
   /directory/source/year/month/day/paste.gz

"""

import os
import gzip
import redis
import random

import configparser
import sys
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'packages/'))
from Date import Date

class HiddenServices(object):
    """
    This class represents a hidden service as an object.
    When created, the object will have by default some "main attributes".

    :Example:

    PST = HiddenServices("xxxxxxxx.onion", "onion")

    """

    def __init__(self, domain, type):

        configfile = os.path.join(os.environ['AIL_BIN'], 'packages/config.cfg')
        if not os.path.exists(configfile):
            raise Exception('Unable to find the configuration file. \
                Did you set environment variables? \
                Or activate the virtualenv.')

        cfg = configparser.ConfigParser()
        cfg.read(configfile)
        self.r_serv_onion = redis.StrictRedis(
            host=cfg.get("ARDB_Onion", "host"),
            port=cfg.getint("ARDB_Onion", "port"),
            db=cfg.getint("ARDB_Onion", "db"),
            decode_responses=True)

        self.r_serv_metadata = redis.StrictRedis(
            host=cfg.get("ARDB_Metadata", "host"),
            port=cfg.getint("ARDB_Metadata", "port"),
            db=cfg.getint("ARDB_Metadata", "db"),
            decode_responses=True)

        self.domain = domain
        self.type = type
        self.tags = {}

        if type == 'onion':
            self.paste_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "pastes"))
            self.paste_crawled_directory = os.path.join(self.paste_directory, cfg.get("Directories", "crawled"))
            self.paste_crawled_directory_name = cfg.get("Directories", "crawled")
            self.screenshot_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
        elif type == 'i2p':
            self.paste_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
            self.screenshot_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
        else:
            ## TODO: # FIXME: add error
            pass

    def get_origin_paste_name(self):
        origin_paste = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'paste_parent')
        if origin_paste is None:
            return ''
        return origin_paste.replace(self.paste_directory+'/', '')

    def get_domain_tags(self):
        return self.tags

    def update_domain_tags(self, children):
        p_tags = self.r_serv_metadata.smembers('tag:'+children)
        for tag in p_tags:
            self.tags[tag] = self.tags.get(tag, 0) + 1

    #todo use the right paste
    def get_last_crawled_pastes(self):
        paste_parent = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'paste_parent')
        #paste_parent = paste_parent.replace(self.paste_directory, '')[1:]
        return self.get_all_pastes_domain(paste_parent)

    def get_all_pastes_domain(self, father):
        if father is None:
            return []
        l_crawled_pastes = []
        paste_parent = father.replace(self.paste_directory+'/', '')
        paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_parent))
        ## TODO: # FIXME: remove me
        paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(father))
        paste_childrens = paste_childrens | paste_children
        for children in paste_childrens:
            if self.domain in children:
                l_crawled_pastes.append(children)
                self.update_domain_tags(children)
                l_crawled_pastes.extend(self.get_all_pastes_domain(children))
        return l_crawled_pastes

    def get_domain_son(self, l_paste):
        if l_paste is None:
            return None

        set_domain = set()
        for paste in l_paste:
            paste_full = paste.replace(self.paste_directory+'/', '')
            paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_full))
            ## TODO: # FIXME: remove me
            paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(paste))
            paste_childrens = paste_childrens | paste_children
            for children in paste_childrens:
                if not self.domain in children:
                    print(children)
                    set_domain.add((children.split('.onion')[0]+'.onion').split('/')[-1])

        return set_domain

    def get_all_domain_son(self, father):
        if father is None:
|
||||||
|
return []
|
||||||
|
l_crawled_pastes = []
|
||||||
|
paste_parent = father.replace(self.paste_directory+'/', '')
|
||||||
|
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_parent))
|
||||||
|
## TODO: # FIXME: remove me
|
||||||
|
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(father))
|
||||||
|
paste_childrens = paste_childrens | paste_children
|
||||||
|
for children in paste_childrens:
|
||||||
|
if not self.domain in children:
|
||||||
|
l_crawled_pastes.append(children)
|
||||||
|
#self.update_domain_tags(children)
|
||||||
|
l_crawled_pastes.extend(self.get_all_domain_son(children))
|
||||||
|
|
||||||
|
return l_crawled_pastes
|
||||||
|
|
||||||
|
def get_domain_random_screenshot(self, l_crawled_pastes, num_screenshot = 1):
|
||||||
|
l_screenshot_paste = []
|
||||||
|
for paste in l_crawled_pastes:
|
||||||
|
## FIXME: # TODO: remove me
|
||||||
|
paste= paste.replace(self.paste_directory+'/', '')
|
||||||
|
|
||||||
|
paste = paste.replace(self.paste_crawled_directory_name, '')
|
||||||
|
if os.path.isfile( '{}{}.png'.format(self.screenshot_directory, paste) ):
|
||||||
|
l_screenshot_paste.append(paste[1:])
|
||||||
|
|
||||||
|
if len(l_screenshot_paste) > num_screenshot:
|
||||||
|
l_random_screenshot = []
|
||||||
|
for index in random.sample( range(0, len(l_screenshot_paste)), num_screenshot ):
|
||||||
|
l_random_screenshot.append(l_screenshot_paste[index])
|
||||||
|
return l_random_screenshot
|
||||||
|
else:
|
||||||
|
return l_screenshot_paste
|
||||||
|
|
||||||
|
def get_crawled_pastes_by_date(self, date):
|
||||||
|
|
||||||
|
pastes_path = os.path.join(self.paste_crawled_directory, date[0:4], date[4:6], date[6:8])
|
||||||
|
paste_parent = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'last_check')
|
||||||
|
|
||||||
|
l_crawled_pastes = []
|
||||||
|
return l_crawled_pastes
|
||||||
|
|
||||||
|
def get_last_crawled_pastes_fileSearch(self):
|
||||||
|
|
||||||
|
last_check = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'last_check')
|
||||||
|
return self.get_crawled_pastes_by_date_fileSearch(last_check)
|
||||||
|
|
||||||
|
def get_crawled_pastes_by_date_fileSearch(self, date):
|
||||||
|
pastes_path = os.path.join(self.paste_crawled_directory, date[0:4], date[4:6], date[6:8])
|
||||||
|
l_crawled_pastes = [f for f in os.listdir(pastes_path) if self.domain in f]
|
||||||
|
return l_crawled_pastes
|
|
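The recursion in `get_all_pastes_domain` above collects every crawled paste reachable through the `paste_children:` sets. A minimal sketch of the same depth-first walk, with a plain dict standing in for the ARDB sets (the sample keys below are invented for illustration, not real AIL data):

```python
# Stand-in for the ARDB 'paste_children:' sets (invented sample keys).
paste_children = {
    'crawled/2018/08/09/root': {'crawled/2018/08/09/a.onion-1', 'crawled/2018/08/09/b.onion-1'},
    'crawled/2018/08/09/a.onion-1': {'crawled/2018/08/09/a.onion-2'},
}

def get_all_pastes(domain, father):
    # Depth-first walk: keep each child paste that belongs to the domain,
    # then recurse into it, like get_all_pastes_domain does via Redis.
    result = []
    for child in paste_children.get(father, set()):
        if domain in child:
            result.append(child)
            result.extend(get_all_pastes(domain, child))
    return result

pastes = sorted(get_all_pastes('a.onion', 'crawled/2018/08/09/root'))
# -> ['crawled/2018/08/09/a.onion-1', 'crawled/2018/08/09/a.onion-2']
```

The substring test (`domain in child`) mirrors the original's filtering; pastes of other domains (here `b.onion-1`) are skipped.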
```diff
@@ -94,6 +94,7 @@ class Paste(object):
 
         var = self.p_path.split('/')
         self.p_date = Date(var[-4], var[-3], var[-2])
+        self.p_rel_path = os.path.join(var[-4], var[-3], var[-2], self.p_name)
         self.p_source = var[-5]
         self.supposed_url = 'https://{}/{}'.format(self.p_source.replace('_pro', ''), var[-1].split('.gz')[0])
 
@@ -291,6 +292,9 @@ class Paste(object):
         else:
             return '[]'
 
+    def get_p_rel_path(self):
+        return self.p_rel_path
+
     def save_all_attributes_redis(self, key=None):
         """
         Saving all the attributes in a "Redis-like" Database (Redis, LevelDB)
```
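The new `p_rel_path` keeps only the date components plus the paste name from the full path. A quick sketch of the same construction (the path below is invented for illustration, not a real AIL paste):

```python
import os

# Invented example path: <AIL_HOME>/PASTES/<source>/<YYYY>/<MM>/<DD>/<name>
p_path = '/opt/AIL/PASTES/crawled/2018/08/09/example.onion.gz'
p_name = os.path.basename(p_path)

var = p_path.split('/')
# Same construction as the diff above: date components + paste name.
p_rel_path = os.path.join(var[-4], var[-3], var[-2], p_name)
# -> '2018/08/09/example.onion.gz'
```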
```diff
@@ -4,6 +4,8 @@ dicofilters = Dicos
 pastes = PASTES
 hash = HASHS
 base64 = BASE64
+crawled = crawled
+crawled_screenshot = CRAWLED_SCREENSHOT
 
 wordtrending_csv = var/www/static/csv/wordstrendingdata
 wordsfile = files/wordfile
 
@@ -184,6 +186,11 @@ host = localhost
 port = 6382
 db = 8
 
+[ARDB_Onion]
+host = localhost
+port = 6382
+db = 9
+
 [Url]
 cc_critical = DE
 
@@ -228,3 +235,9 @@ channel = FetchedOnion
 host = localhost
 port = 6381
 db = 0
+
+[Crawler]
+activate_crawler = True
+crawler_depth_limit = 1
+splash_url_onion = http://127.0.0.1
+splash_onion_port = 8050-8052
```
```diff
@@ -61,7 +61,7 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_alertHandler,Redis_Tags
 
 [Onion]
 subscribe = Redis_Onion
-publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler,Redis_Tags
+publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler,Redis_Tags,Redis_Crawler
 #publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler
 
 [DumpValidOnion]
 
@@ -136,3 +136,8 @@ publish = Redis_Duplicate,Redis_alertHandler,Redis_Tags
 [submit_paste]
 subscribe = Redis
 publish = Redis_Mixer
+
+[Crawler]
+subscribe = Redis_Crawler
+publish = Redis_Mixer,Redis_Tags
```
bin/torcrawler/TorSplashCrawler.py (new file, 242 lines):

```python
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys
import gzip
import base64
import uuid
import datetime
import redis
import json

from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError
from twisted.internet.error import TimeoutError

from scrapy import Spider
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerProcess, Crawler

from scrapy_splash import SplashRequest

sys.path.append(os.environ['AIL_BIN'])
from Helper import Process

class TorSplashCrawler():

    def __init__(self, splash_url, crawler_depth_limit):
        self.process = CrawlerProcess({'LOG_ENABLED': False})
        self.crawler = Crawler(self.TorSplashSpider, {
            'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0',
            'SPLASH_URL': splash_url,
            'ROBOTSTXT_OBEY': False,
            'DOWNLOADER_MIDDLEWARES': {'scrapy_splash.SplashCookiesMiddleware': 723,
                                       'scrapy_splash.SplashMiddleware': 725,
                                       'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
                                       },
            'SPIDER_MIDDLEWARES': {'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,},
            'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
            'HTTPERROR_ALLOW_ALL': True,
            'DEPTH_LIMIT': crawler_depth_limit
            })

    def crawl(self, type, url, domain, original_paste, super_father):
        self.process.crawl(self.crawler, type=type, url=url, domain=domain, original_paste=original_paste, super_father=super_father)
        self.process.start()

    class TorSplashSpider(Spider):
        name = 'TorSplashSpider'

        def __init__(self, type, url, domain, original_paste, super_father, *args, **kwargs):
            self.type = type
            self.original_paste = original_paste
            self.super_father = super_father
            self.start_urls = url
            self.domains = [domain]
            date = datetime.datetime.now().strftime("%Y/%m/%d")
            self.full_date = datetime.datetime.now().strftime("%Y%m%d")
            self.date_month = datetime.datetime.now().strftime("%Y%m")

            config_section = 'Crawler'
            self.p = Process(config_section)

            self.r_cache = redis.StrictRedis(
                host=self.p.config.get("Redis_Cache", "host"),
                port=self.p.config.getint("Redis_Cache", "port"),
                db=self.p.config.getint("Redis_Cache", "db"),
                decode_responses=True)

            self.r_serv_log_submit = redis.StrictRedis(
                host=self.p.config.get("Redis_Log_submit", "host"),
                port=self.p.config.getint("Redis_Log_submit", "port"),
                db=self.p.config.getint("Redis_Log_submit", "db"),
                decode_responses=True)

            self.r_serv_metadata = redis.StrictRedis(
                host=self.p.config.get("ARDB_Metadata", "host"),
                port=self.p.config.getint("ARDB_Metadata", "port"),
                db=self.p.config.getint("ARDB_Metadata", "db"),
                decode_responses=True)

            self.r_serv_onion = redis.StrictRedis(
                host=self.p.config.get("ARDB_Onion", "host"),
                port=self.p.config.getint("ARDB_Onion", "port"),
                db=self.p.config.getint("ARDB_Onion", "db"),
                decode_responses=True)

            self.crawler_path = os.path.join(self.p.config.get("Directories", "crawled"), date)

            self.crawled_paste_filemame = os.path.join(os.environ['AIL_HOME'], self.p.config.get("Directories", "pastes"),
                                                       self.p.config.get("Directories", "crawled"), date)

            self.crawled_screenshot = os.path.join(os.environ['AIL_HOME'], self.p.config.get("Directories", "crawled_screenshot"), date)

        def start_requests(self):
            yield SplashRequest(
                self.start_urls,
                self.parse,
                #errback=self.errback_catcher,
                endpoint='render.json',
                meta={'father': self.original_paste},
                args={'html': 1,
                      'wait': 10,
                      'render_all': 1,
                      'har': 1,
                      'png': 1}
                )

        def parse(self, response):
            #print(response.headers)
            #print(response.status)
            if response.status == 504:
                # down ?
                print('504 detected')
            elif response.status != 200:
                print('other response: {}'.format(response.status))
                #print(error_log)
                # detect connection to proxy refused
                error_log = (json.loads(response.body.decode()))
                if(error_log['info']['text'] == 'Connection to proxy refused'):
                    print('Connection to proxy refused')
            else:

                UUID = self.domains[0]+str(uuid.uuid4())
                filename_paste = os.path.join(self.crawled_paste_filemame, UUID)
                relative_filename_paste = os.path.join(self.crawler_path, UUID)
                filename_screenshot = os.path.join(self.crawled_screenshot, UUID +'.png')

                # save new paste on disk
                if self.save_crawled_paste(filename_paste, response.data['html']):

                    # add this paste to the domain crawled set # TODO: # FIXME: put this on cache ?
                    #self.r_serv_onion.sadd('temp:crawled_domain_pastes:{}'.format(self.domains[0]), filename_paste)

                    self.r_serv_onion.sadd('{}_up:{}'.format(self.type, self.full_date), self.domains[0])
                    self.r_serv_onion.sadd('full_{}_up'.format(self.type), self.domains[0])
                    self.r_serv_onion.sadd('month_{}_up:{}'.format(self.type, self.date_month), self.domains[0])

                    # create onion metadata
                    if not self.r_serv_onion.exists('{}_metadata:{}'.format(self.type, self.domains[0])):
                        self.r_serv_onion.hset('{}_metadata:{}'.format(self.type, self.domains[0]), 'first_seen', self.full_date)
                    self.r_serv_onion.hset('{}_metadata:{}'.format(self.type, self.domains[0]), 'last_seen', self.full_date)

                    #create paste metadata
                    self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'super_father', self.super_father)
                    self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'father', response.meta['father'])
                    self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'domain', self.domains[0])
                    self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'real_link', response.url)

                    self.r_serv_metadata.sadd('paste_children:'+response.meta['father'], filename_paste)

                    dirname = os.path.dirname(filename_screenshot)
                    if not os.path.exists(dirname):
                        os.makedirs(dirname)

                    size_screenshot = (len(response.data['png'])*3) / 4

                    if size_screenshot < 5000000: #bytes
                        with open(filename_screenshot, 'wb') as f:
                            f.write(base64.standard_b64decode(response.data['png'].encode()))

                    with open(filename_screenshot+'har.txt', 'wb') as f:
                        f.write(json.dumps(response.data['har']).encode())

                    # save external links in set
                    #lext = LinkExtractor(deny_domains=self.domains, unique=True)
                    #for link in lext.extract_links(response):
                    #    self.r_serv_onion.sadd('domain_{}_external_links:{}'.format(self.type, self.domains[0]), link.url)
                    #    self.r_serv_metadata.sadd('paste_{}_external_links:{}'.format(self.type, filename_paste), link.url)

                    le = LinkExtractor(allow_domains=self.domains, unique=True)
                    for link in le.extract_links(response):
                        yield SplashRequest(
                            link.url,
                            self.parse,
                            #errback=self.errback_catcher,
                            endpoint='render.json',
                            meta={'father': relative_filename_paste},
                            args={'html': 1,
                                  'png': 1,
                                  'render_all': 1,
                                  'har': 1,
                                  'wait': 10}
                            )

        '''
        def errback_catcher(self, failure):
            # catch all errback failures,
            self.logger.error(repr(failure))
            print('failure')
            #print(failure)
            print(failure.type)
            #print(failure.request.meta['item'])

            #if isinstance(failure.value, HttpError):
            if failure.check(HttpError):
                # you can get the response
                response = failure.value.response
                print('HttpError')
                self.logger.error('HttpError on %s', response.url)

            #elif isinstance(failure.value, DNSLookupError):
            elif failure.check(DNSLookupError):
                # this is the original request
                request = failure.request
                print(DNSLookupError)
                print('DNSLookupError')
                self.logger.error('DNSLookupError on %s', request.url)

            #elif isinstance(failure.value, TimeoutError):
            elif failure.check(TimeoutError):
                request = failure.request
                print('TimeoutError')
                print(TimeoutError)
                self.logger.error('TimeoutError on %s', request.url)
        '''

        def save_crawled_paste(self, filename, content):

            if os.path.isfile(filename):
                print('File: {} already exist in submitted pastes'.format(filename))
                return False

            try:
                gzipencoded = gzip.compress(content.encode())
                gzip64encoded = base64.standard_b64encode(gzipencoded).decode()
            except:
                print("file error: {}".format(filename))
                return False

            # send paste to Global
            relay_message = "{0} {1}".format(filename, gzip64encoded)
            self.p.populate_set_out(relay_message, 'Mixer')

            # increase nb of paste by feeder name
            self.r_serv_log_submit.hincrby("mixer_cache:list_feeder", "crawler", 1)

            # tag crawled paste
            msg = 'infoleak:submission="crawler";{}'.format(filename)
            self.p.populate_set_out(msg, 'Tags')
            return True
```
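The `size_screenshot` check above estimates the decoded size of the base64 PNG without decoding it: base64 turns every 3 bytes into 4 characters, so `len(b64) * 3 / 4` approximates the byte count, overshooting by at most the 2 padding bytes. A quick self-contained check of that arithmetic:

```python
import base64

payload = b'\x00' * 1000                       # 1000 raw bytes
b64 = base64.standard_b64encode(payload).decode()

# Same estimate as the crawler: 4 base64 chars encode 3 bytes.
estimate = (len(b64) * 3) / 4                  # -> 1002.0

# The estimate matches the real size up to the 0-2 padding bytes.
assert abs(estimate - len(payload)) <= 2
```

This is why the 5 MB screenshot cap can be enforced before paying the cost of `base64.standard_b64decode`.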
bin/torcrawler/blacklist_onion.txt (new file, 1 line):

```
www.facebookcorewwwi.onion
```
bin/torcrawler/launch_splash_crawler.sh (new executable file, 43 lines):

```bash
#!/bin/bash

usage() { echo "Usage: sudo $0 [-f <config_absolute_path>] [-p <port_start>] [-n <number_of_splash_servers>]" 1>&2;
          echo "    -f: absolute path to splash docker proxy-profiles directory (used for proxy configuration)";
          echo "    -p: port number of the first splash server. This number is incremented for the other splash servers";
          echo "    -n: number of splash servers to start";
          echo "";
          echo "example:";
          echo "sudo ./launch_splash_crawler.sh -f /home/my_user/AIL-framework/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 3";
          exit 1;
        }

while getopts ":p:f:n:" o; do
    case "${o}" in
        p)
            p=${OPTARG}
            ;;
        f)
            f=${OPTARG}
            ;;
        n)
            n=${OPTARG}
            ;;
        *)
            usage
            ;;
    esac
done
shift $((OPTIND-1))

if [ -z "${p}" ] || [ -z "${f}" ] || [ -z "${n}" ]; then
    usage;
fi

screen -dmS "Docker_Splash"
sleep 0.1

for ((i=0;i<=$((${n} - 1));i++)); do
    port_number=$((${p} + $i))
    screen -S "Docker_Splash" -X screen -t "docker_splash:$port_number" bash -c 'sudo docker run -p '$port_number':8050 --cpus=1 --memory=4.5G -v '$f':/etc/splash/proxy-profiles/ --net="bridge" scrapinghub/splash; read x'
    sleep 0.1
    echo "    Splash server launched on port $port_number"
done
```
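The loop above starts `n` Splash containers on consecutive ports beginning at `-p`, which is how the `splash_onion_port = 8050-8052` range in the new `[Crawler]` config section comes about. The port arithmetic, sketched in Python:

```python
def splash_ports(start, count):
    # Mirrors the shell loop above: port_number = p + i for i in 0 .. n-1.
    return [start + i for i in range(count)]

# sudo ./launch_splash_crawler.sh -p 8050 -n 3 starts Splash on:
ports = splash_ports(8050, 3)
# -> [8050, 8051, 8052]
```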
bin/torcrawler/tor_crawler.py (new executable file, 34 lines):

```python
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys
import configparser
from TorSplashCrawler import TorSplashCrawler

if __name__ == '__main__':

    if len(sys.argv) != 7:
        print('usage:', 'tor_crawler.py', 'splash_url', 'type', 'url', 'domain', 'paste', 'super_father')
        exit(1)

    configfile = os.path.join(os.environ['AIL_BIN'], 'packages/config.cfg')
    if not os.path.exists(configfile):
        raise Exception('Unable to find the configuration file. \
                        Did you set environment variables? \
                        Or activate the virtualenv.')

    cfg = configparser.ConfigParser()
    cfg.read(configfile)

    splash_url = sys.argv[1]
    type = sys.argv[2]
    crawler_depth_limit = cfg.getint("Crawler", "crawler_depth_limit")

    url = sys.argv[3]
    domain = sys.argv[4]
    paste = sys.argv[5]
    super_father = sys.argv[6]

    crawler = TorSplashCrawler(splash_url, crawler_depth_limit)
    crawler.crawl(type, url, domain, paste, super_father)
```
A new 4-line Splash proxy profile for the dockerised setup (file path truncated in this capture):

```ini
[proxy]
host=172.17.0.1
port=9050
type=SOCKS5
```
crawler_hidden_services_install.sh (new executable file, 56 lines):

```bash
#!/bin/bash

install_docker() {
    # install docker
    sudo apt install docker.io;

    # pull splash docker
    sudo docker pull scrapinghub/splash;
}

install_python_requirement() {
    . ./AILENV/bin/activate;
    pip3 install -U -r crawler_requirements.txt;
}

install_all() {
    read -p "Do you want to install docker? (use local splash server) [y/n] " -n 1 -r
    echo    # (optional) move to a new line
    if [[ $REPLY =~ ^[Yy]$ ]]
    then
        install_docker;
    fi
    install_python_requirement;
}

usage() {
    echo "Usage: crawler_hidden_services_install.sh [-y | -n]" 1>&2;
    echo "    -y: install docker"
    echo "    -n: don't install docker"
    echo ""
    echo "example:"
    echo "crawler_hidden_services_install.sh -y"
    exit 1;
}

if [[ $1 == "" ]]; then
    install_all;
    exit;
else
    key="$1"
    case $key in
        "")
            install_all;
            ;;
        -y|--yes)
            install_docker;
            install_python_requirement;
            ;;
        -n|--no)
            install_python_requirement;
            ;;
        *)    # unknown option
            usage;
            ;;
    esac
fi
```
crawler_requirements.txt (new file, 2 lines):

```
scrapy
scrapy-splash
```
etc/splash/proxy-profiles/default.ini (new file, 4 lines):

```ini
[proxy]
host=localhost
port=9050
type=SOCKS5
```
```diff
@@ -1 +1,2 @@
 onion
+i2p
```
```diff
@@ -48,6 +48,9 @@ except IOError:
     f = open('templates/ignored_modules.txt', 'w')
     f.close()
 
+activate_crawler = cfg.get("Crawler", "activate_crawler")
+if activate_crawler != 'True':
+    toIgnoreModule.add('hiddenServices')
 
 # Dynamically import routes and functions from modules
 # Also, prepare header.html
```
```diff
@@ -96,6 +96,12 @@ r_serv_statistics = redis.StrictRedis(
     db=cfg.getint("ARDB_Statistics", "db"),
     decode_responses=True)
 
+r_serv_onion = redis.StrictRedis(
+    host=cfg.get("ARDB_Onion", "host"),
+    port=cfg.getint("ARDB_Onion", "port"),
+    db=cfg.getint("ARDB_Onion", "db"),
+    decode_responses=True)
+
 sys.path.append('../../configs/keys')
 # MISP #
 
@@ -150,6 +156,7 @@ bootstrap_label = ['primary', 'success', 'danger', 'warning', 'info']
 UPLOAD_FOLDER = os.path.join(os.environ['AIL_FLASK'], 'submitted')
 
 PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "pastes"))
+SCREENSHOT_FOLDER = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
 
 max_dashboard_logs = int(cfg.get("Flask", "max_dashboard_logs"))
```
169
var/www/modules/hiddenServices/Flask_hiddenServices.py
Normal file
169
var/www/modules/hiddenServices/Flask_hiddenServices.py
Normal file
|
@ -0,0 +1,169 @@
|
||||||
|
#!/usr/bin/env python3
|
||||||
|
# -*-coding:UTF-8 -*
|
||||||
|
|
||||||
|
'''
|
||||||
|
Flask functions and routes for the trending modules page
|
||||||
|
'''
|
||||||
|
import redis
|
||||||
|
import datetime
|
||||||
|
import sys
|
||||||
|
import os
|
||||||
|
from flask import Flask, render_template, jsonify, request, Blueprint
|
||||||
|
|
||||||
|
from Date import Date
|
||||||
|
from HiddenServices import HiddenServices
|
||||||
|
|
||||||
|
# ============ VARIABLES ============
|
||||||
|
import Flask_config
|
||||||
|
|
||||||
|
app = Flask_config.app
|
||||||
|
cfg = Flask_config.cfg
|
||||||
|
baseUrl = Flask_config.baseUrl
|
||||||
|
r_serv_onion = Flask_config.r_serv_onion
|
||||||
|
r_serv_metadata = Flask_config.r_serv_metadata
|
||||||
|
bootstrap_label = Flask_config.bootstrap_label
|
||||||
|
PASTES_FOLDER = Flask_config.PASTES_FOLDER
|
||||||
|
|
||||||
|
hiddenServices = Blueprint('hiddenServices', __name__, template_folder='templates')
|
||||||
|
|
||||||
|
# ============ FUNCTIONS ============
|
||||||
|
def one():
|
||||||
|
return 1
|
||||||
|
|
||||||
|
def get_date_range(num_day):
|
||||||
|
curr_date = datetime.date.today()
|
||||||
|
date = Date( '{}{}{}'.format(str(curr_date.year), str(curr_date.month).zfill(2), str(curr_date.day).zfill(2)) )
|
||||||
|
date_list = []
|
||||||
|
|
||||||
|
for i in range(0, num_day):
|
||||||
|
date_list.append(date.substract_day(i))
|
||||||
|
|
||||||
|
return list(reversed(date_list))
|
||||||
|
|
||||||
|
def unpack_paste_tags(p_tags):
|
||||||
|
l_tags = []
|
||||||
|
for tag in p_tags:
|
||||||
|
complete_tag = tag
|
||||||
|
tag = tag.split('=')
|
||||||
|
if len(tag) > 1:
|
||||||
|
if tag[1] != '':
|
||||||
|
tag = tag[1][1:-1]
|
||||||
|
# no value
|
||||||
|
else:
|
||||||
|
tag = tag[0][1:-1]
|
||||||
|
# use for custom tags
|
||||||
|
else:
|
||||||
|
tag = tag[0]
|
||||||
|
l_tags.append( (tag, complete_tag) )
|
||||||
|
return l_tags
|
||||||
|
|
||||||
|
def get_onion_status(domain, date):
|
||||||
|
if r_serv_onion.sismember('onion_up:'+date , domain):
|
||||||
|
return True
|
||||||
|
else:
|
||||||
|
return False
|
||||||
|
# ============= ROUTES ==============
|
||||||
|
|
||||||
|
@hiddenServices.route("/hiddenServices/", methods=['GET'])
|
||||||
|
def hiddenServices_page():
|
||||||
|
last_onions = r_serv_onion.lrange('last_onion', 0 ,-1)
|
||||||
|
list_onion = []
|
||||||
|
|
||||||
|
now = datetime.datetime.now()
|
||||||
|
date = '{}{}{}'.format(now.strftime("%Y"), now.strftime("%m"), now.strftime("%d"))
|
||||||
|
statDomains = {}
|
||||||
|
statDomains['domains_up'] = r_serv_onion.scard('onion_up:{}'.format(date))
|
||||||
|
statDomains['domains_down'] = r_serv_onion.scard('onion_down:{}'.format(date))
|
||||||
|
statDomains['total'] = statDomains['domains_up'] + statDomains['domains_down']
|
||||||
|
statDomains['domains_queue'] = r_serv_onion.scard('onion_domain_crawler_queue')
|
||||||
|
|
||||||
|
for onion in last_onions:
|
||||||
|
metadata_onion = {}
|
||||||
|
metadata_onion['domain'] = onion
|
||||||
|
metadata_onion['last_check'] = r_serv_onion.hget('onion_metadata:{}'.format(onion), 'last_check')
|
||||||
|
metadata_onion['first_seen'] = r_serv_onion.hget('onion_metadata:{}'.format(onion), 'first_seen')
|
||||||
|
if get_onion_status(onion, metadata_onion['last_check']):
|
||||||
|
metadata_onion['status_text'] = 'UP'
|
||||||
|
metadata_onion['status_color'] = 'Green'
|
||||||
|
metadata_onion['status_icon'] = 'fa-check-circle'
|
||||||
|
else:
|
||||||
|
metadata_onion['status_text'] = 'DOWN'
|
||||||
|
metadata_onion['status_color'] = 'Red'
|
||||||
|
metadata_onion['status_icon'] = 'fa-times-circle'
|
||||||
|
list_onion.append(metadata_onion)
|
||||||
|
|
||||||
|
return render_template("hiddenServices.html", last_onions=list_onion, statDomains=statDomains)
|
||||||
|
|
||||||
|
@hiddenServices.route("/hiddenServices/onion_domain", methods=['GET'])
|
||||||
|
def onion_domain():
|
||||||
|
onion_domain = request.args.get('onion_domain')
|
||||||
|
if onion_domain is None or not r_serv_onion.exists('onion_metadata:{}'.format(onion_domain)):
|
||||||
|
return '404'
|
||||||
|
# # TODO: FIXME return 404
|
||||||
|
|
||||||
|
last_check = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'last_check')
|
||||||
|
last_check = '{}/{}/{}'.format(last_check[0:4], last_check[4:6], last_check[6:8])
|
||||||
|
first_seen = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'first_seen')
|
||||||
|
first_seen = '{}/{}/{}'.format(first_seen[0:4], first_seen[4:6], first_seen[6:8])
|
||||||
|
origin_paste = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'paste_parent')
|
||||||
|
|
||||||
|
h = HiddenServices(onion_domain, 'onion')
|
||||||
|
l_pastes = h.get_last_crawled_pastes()
|
||||||
|
if l_pastes:
|
||||||
|
status = True
|
||||||
|
else:
|
||||||
|
status = False
|
||||||
|
screenshot = h.get_domain_random_screenshot(l_pastes)
|
||||||
|
if screenshot:
|
||||||
|
screenshot = screenshot[0]
|
||||||
|
else:
|
||||||
|
screenshot = 'None'
|
||||||
|
|
||||||
|
domain_tags = h.get_domain_tags()
|
||||||
|
|
||||||
|
origin_paste_name = h.get_origin_paste_name()
|
||||||
|
origin_paste_tags = unpack_paste_tags(r_serv_metadata.smembers('tag:{}'.format(origin_paste)))
|
||||||
|
paste_tags = []
|
||||||
|
path_name = []
|
||||||
|
for path in l_pastes:
|
||||||
|
path_name.append(path.replace(PASTES_FOLDER+'/', ''))
|
||||||
|
p_tags = r_serv_metadata.smembers('tag:'+path)
|
||||||
|
paste_tags.append(unpack_paste_tags(p_tags))
|
||||||
|
|
||||||
|
return render_template("showDomain.html", domain=onion_domain, last_check=last_check, first_seen=first_seen,
|
||||||
|
l_pastes=l_pastes, paste_tags=paste_tags, bootstrap_label=bootstrap_label,
|
||||||
|
path_name=path_name, origin_paste_tags=origin_paste_tags, status=status,
|
||||||
|
origin_paste=origin_paste, origin_paste_name=origin_paste_name,
|
||||||
|
domain_tags=domain_tags, screenshot=screenshot)
|
||||||
|
|
||||||
|
@hiddenServices.route("/hiddenServices/onion_son", methods=['GET'])
def onion_son():
    onion_domain = request.args.get('onion_domain')

    h = HiddenServices(onion_domain, 'onion')
    l_pastes = h.get_last_crawled_pastes()
    l_son = h.get_domain_son(l_pastes)
    print(l_son)
    return 'l_son'

# ============= JSON ==============
@hiddenServices.route("/hiddenServices/domain_crawled_7days_json", methods=['GET'])
def domain_crawled_7days_json():
    type = 'onion'
    ## TODO: # FIXME: 404 error

    date_range = get_date_range(7)
    json_domain_stats = []
    #try:
    for date in date_range:
        nb_domain_up = r_serv_onion.scard('{}_up:{}'.format(type, date))
        nb_domain_down = r_serv_onion.scard('{}_down:{}'.format(type, date))
        date = date[0:4] + '-' + date[4:6] + '-' + date[6:8]
        json_domain_stats.append({ 'date': date, 'value': int( nb_domain_up ), 'nb_domain_down': int( nb_domain_down )})
    #except:
        #return jsonify()

    return jsonify(json_domain_stats)


# ========= REGISTRATION =========
app.register_blueprint(hiddenServices, url_prefix=baseUrl)

@@ -0,0 +1 @@
<li id='page-hiddenServices'><a href="{{ url_for('hiddenServices.hiddenServices_page') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>

234 var/www/modules/hiddenServices/templates/hiddenServices.html Normal file

@@ -0,0 +1,234 @@
<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>Hidden Service - AIL</title>
  <link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png') }}">

  <!-- Core CSS -->
  <link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet">
  <link href="{{ url_for('static', filename='font-awesome/css/font-awesome.css') }}" rel="stylesheet">
  <link href="{{ url_for('static', filename='css/sb-admin-2.css') }}" rel="stylesheet">
  <!-- JS -->
  <script language="javascript" src="{{ url_for('static', filename='js/jquery.js')}}"></script>
  <script language="javascript" src="{{ url_for('static', filename='js/d3.min.js') }}"></script>

  <style>
    div.tooltip {
      position: absolute;
      text-align: center;
      padding: 2px;
      font: 12px sans-serif;
      background: #ebf4fb;
      border: 2px solid #b7ddf2;
      border-radius: 8px;
      pointer-events: none;
      color: #000000;
    }
    .line_graph {
      fill: none;
      stroke: steelblue;
      stroke-linejoin: round;
      stroke-linecap: round;
      stroke-width: 1.5;
    }
  </style>

</head>
<body>

  {% include 'navbar.html' %}

  <div id="page-wrapper">

    <div class="row">
      <div class="col-md-6">

        <div class="panel panel-primary">
          <div class="panel-heading">ONION</div>
          <div class="panel-body">
            <table class="table table-striped">
              <thead>
                <tr>
                  <th>Domain</th>
                  <th>First Seen</th>
                  <th>Last Check</th>
                  <th>Status</th>
                </tr>
              </thead>
              <tbody>
                {% for metadata_onion in last_onions %}
                <tr>
                  <td><a target="_blank" href="{{ url_for('hiddenServices.onion_domain') }}?onion_domain={{ metadata_onion['domain'] }}">{{ metadata_onion['domain'] }}</a></td>
                  <td>{{'{}/{}/{}'.format(metadata_onion['first_seen'][0:4], metadata_onion['first_seen'][4:6], metadata_onion['first_seen'][6:8])}}</td>
                  <td>{{'{}/{}/{}'.format(metadata_onion['last_check'][0:4], metadata_onion['last_check'][4:6], metadata_onion['last_check'][6:8])}}</td>
                  <td><div style="color:{{metadata_onion['status_color']}}; display:inline-block">
                      <i class="fa {{metadata_onion['status_icon']}} fa-2x"></i>
                      {{metadata_onion['status_text']}}
                    </div>
                  </td>
                </tr>
                {% endfor %}
              </tbody>
            </table>

            <div id="graph_line">
            </div>

          </div>
        </div>
      </div>

      <div class="col-md-6">
        <div class="panel panel-info">
          <div class="panel-heading">
            <i class="fa fa-eye-slash"></i> Domains Crawled Today
          </div>

          <table class="table table-hover table-striped">
            <tbody>
              <tr>
                <td>
                  <div style="color:Green; display:inline-block">
                    <i class="fa fa-check-circle fa-2x"></i>
                    Domains UP
                  </div>
                </td>
                <td>
                  <div style="color:Green; display:inline-block">
                    {{ statDomains['domains_up'] }}
                  </div>
                </td>
              </tr>
              <tr>
                <td>
                  <div style="color:Red; display:inline-block">
                    <i class="fa fa-times-circle fa-2x"></i>
                    Domains DOWN
                  </div>
                </td>
                <td>
                  <div style="color:Red; display:inline-block">
                    {{ statDomains['domains_down'] }}
                  </div>
                </td>
              </tr>
              <tr>
                <td>Crawled Domains</td>
                <td>{{ statDomains['total'] }}</td>
              </tr>
              <tr>
                <td>Domains in Queue</td>
                <td>{{ statDomains['domains_queue'] }}</td>
              </tr>
            </tbody>
          </table>
        </div>
      </div>

    </div>

  </div>
  <!-- /#page-wrapper -->

  <script>
  var all_graph = {};
  $(document).ready(function(){
    activePage = "page-hiddenServices"
    $("#"+activePage).addClass("active");
    all_graph.line_chart = create_line_chart('graph_line', "{{ url_for('hiddenServices.domain_crawled_7days_json') }}?type=onion");
  });
  $(window).on("resize", function() {
    all_graph.onResize();
  });
  </script>

  <script>

  all_graph.onResize = function () {
    var aspect = 1000 / 500, all_graph = $("#graph_div");
    var targetWidth = all_graph.parent().width();
    all_graph.attr("width", targetWidth);
    all_graph.attr("height", targetWidth / aspect);
  }
  window.all_graph = all_graph;

  function create_line_chart(id, url){
    var width = 650;
    var height = Math.round(width / 6);
    var margin = {top: 20, right: 55, bottom: 50, left: 40};
    var x = d3.scaleTime().range([0, width]);
    var y = d3.scaleLinear().rangeRound([height, 0]);
    var xAxis = d3.axisBottom(x);
    var yAxis = d3.axisLeft(y);
    var parseTime = d3.timeParse("%Y-%m-%d");
    var line = d3.line()
      .x(function(d) {
        return x(d.date);
      })
      .y(function(d) {
        return y(d.value);
      });
    var svg_line = d3.select('#'+id).append('svg')
      .attr("id", "graph_div")
      .attr("width", width + margin.left + margin.right)
      .attr("height", height + margin.top + margin.bottom)
      .append('g')
      .attr('transform', "translate("+ margin.left +","+ margin.top +")");
    //add div tooltip
    var div = d3.select('body').append('div')
      .attr('class', 'tooltip')
      .style('opacity', 0);
    d3.json(url)
      .then(function(data){
        data.forEach(function(d) {
          d.date_label = d.date;
          d.date = parseTime(d.date);
          d.value = +d.value;
        });
        // fit the data
        x.domain(d3.extent(data, function(d) { return d.date; }));
        //x.domain(data.map(function (d) { return d.date; })); //E
        y.domain([0, d3.max(data, function(d){ return d.value ; })]);
        //line
        svg_line.append("path")
          .data([data])
          .attr("class", "line_graph")
          .attr("d", line);
        // add X axis
        svg_line.append("g")
          .attr("transform", "translate(0," + height + ")")
          .call(d3.axisBottom(x))
          .selectAll("text")
          .style("text-anchor", "end")
          .attr("transform", "rotate(-45)" );
        // Add the Y Axis
        svg_line.append("g")
          .call(d3.axisLeft(y));
        //add a dot circle
        svg_line.selectAll('dot')
          .data(data).enter()
          .append('circle')
          .attr('r', 2)
          .attr('cx', function(d) { return x(d.date); })
          .attr('cy', function(d) { return y(d.value); })
          .on('mouseover', function(d) {
            div.transition().style('opacity', .9);
            div.html('' + d.date_label + '<br/>' + d.value)
              .style("left", (d3.event.pageX) + "px")
              .style("top", (d3.event.pageY - 28) + "px");
          })
          .on('mouseout', function(d) {
            div.transition().style('opacity', 0);
          });
      });
  }
  </script>
</body>

</html>
213 var/www/modules/hiddenServices/templates/showDomain.html Normal file

@@ -0,0 +1,213 @@
<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>Show Domain - AIL</title>
  <link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png') }}">

  <!-- Core CSS -->
  <link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet">
  <link href="{{ url_for('static', filename='font-awesome/css/font-awesome.css') }}" rel="stylesheet">
  <link href="{{ url_for('static', filename='css/sb-admin-2.css') }}" rel="stylesheet">
  <link href="{{ url_for('static', filename='css/dygraph_gallery.css') }}" rel="stylesheet" type="text/css" />
  <!-- JS -->
  <script type="text/javascript" src="{{ url_for('static', filename='js/dygraph-combined.js') }}"></script>
  <script language="javascript" src="{{ url_for('static', filename='js/jquery.js')}}"></script>
  <script src="{{ url_for('static', filename='js/jquery.dataTables.min.js')}}"></script>
  <script src="{{ url_for('static', filename='js/dataTables.bootstrap.js')}}"></script>

  <style>
    .test thead{
      background: #d91f2d;
      color: #fff;
    }
  </style>

</head>
<body>

  {% include 'navbar.html' %}

  <div id="page-wrapper">

    <div class="row">

      <div class="col-md-6">
        <div class="row">
          <div class="panel panel-info">
            <div class="panel-heading">
              {% if status %}
              <div class="pull-right" style="color:Green;">
                <i class="fa fa-check-circle fa-2x"></i>
                UP
              </div>
              {% else %}
              <div class="pull-right" style="color:Red;">
                <i class="fa fa-times-circle fa-2x"></i>
                DOWN
              </div>
              {% endif %}
              <h3>{{ domain }} :</h3>
              <ul class="list-group">
                <li class="list-group-item">

                  <table class="table table-condensed">
                    <thead>
                      <tr>
                        <th>First Seen</th>
                        <th>Last Check</th>
                      </tr>
                    </thead>
                    <tbody>
                      <tr>
                        <td class="panelText"><a href="#">{{ first_seen }}</a></td>
                        <td class="panelText"><a href="#">{{ last_check }}</a></td>
                      </tr>
                    </tbody>
                  </table>

                </li>
                <li class="list-group-item">
                  Origin Paste: <a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste', paste=origin_paste) }}">{{ origin_paste_name }}</a>
                  <div>
                    {% for tag in origin_paste_tags %}
                    <a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag[1] }}">
                      <span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag[0] }}</span>
                    </a>
                    {% endfor %}
                    <br>
                  </div>
                </li>
              </ul>

            </div>
          </div>
          <div>
            {% for tag in domain_tags %}
            <a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag }}">
              <span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag }} <i>{{ domain_tags[tag] }}</i></span>
            </a>
            {% endfor %}
            <br>
            <br>
          </div>

          <table class="test table table-striped table-bordered table-hover table-responsive " id="myTable_">
            <thead>
              <tr>
                <th style="max-width: 800px;">Crawled Pastes</th>
              </tr>
            </thead>
            <tbody>

              {% for path in l_pastes %}
              <tr>
                <td><a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste') }}?paste={{path}}">{{ path_name[loop.index0] }}</a>
                  <div>
                    {% for tag in paste_tags[loop.index0] %}
                    <a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag[1] }}">
                      <span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag[0] }}</span>
                    </a>
                    {% endfor %}
                  </div>
                </td>
              </tr>
              {% endfor %}

            </tbody>
          </table>

        </div>
      </div>

      <div class="col-md-6">
        <div class="panel panel-info" style="text-align:center;">
          <div class="panel-heading">
            <div class="row">
              <div class="col-md-8">
                <input class="center" id="blocks" type="range" min="1" max="50" value="13">
              </div>
              <div class="col-md-4">
                <button class="btn btn-primary btn-tags" onclick="blocks.value=50;pixelate();">
                  <span class="glyphicon glyphicon-zoom-in"></span>
                  <span class="label-icon">Full resolution</span>
                </button>
              </div>
            </div>
          </div>
        </div>
        <canvas id="canvas" style="width:100%;"></canvas>
      </div>

    </div>

  </div>
  <!-- /#page-wrapper -->

  <script>
  $(document).ready(function(){
    activePage = "page-hiddenServices"
    $("#"+activePage).addClass("active");
    table = $('#myTable_').DataTable(
      {
        "aLengthMenu": [[5, 10, 15, 20, -1], [5, 10, 15, 20, "All"]],
        "iDisplayLength": 5,
        "order": [[ 0, "desc" ]]
      }
    );
  });
  </script>

  <script>
  var ctx = canvas.getContext('2d'), img = new Image();

  /// turn off image smoothing
  ctx.webkitImageSmoothingEnabled = false;
  ctx.imageSmoothingEnabled = false;

  img.onload = pixelate;
  img.addEventListener("error", img_error);
  var draw_img = false;

  img.src = "{{ url_for('showsavedpastes.screenshot', filename=screenshot) }}";

  function pixelate() {

    /// use slider value
    var size;
    if( blocks.value == 50 ){
      size = 1;
    } else {
      size = (blocks.value) * 0.01;
    }

    canvas.width = img.width;
    canvas.height = img.height;

    /// cache scaled width and height
    w = canvas.width * size;
    h = canvas.height * size;

    /// draw original image to the scaled size
    ctx.drawImage(img, 0, 0, w, h);

    /// pixelated
    ctx.drawImage(canvas, 0, 0, w, h, 0, 0, canvas.width, canvas.height);

  }

  blocks.addEventListener('change', pixelate, false);

  function img_error() {
    img.onerror=null;
    img.src="{{ url_for('static', filename='image/AIL.png') }}";
    blocks.value = 50;
    pixelate();
  }
  </script>

</body>

</html>
@@ -8,7 +8,7 @@ import redis
 import json
 import os
 import flask
-from flask import Flask, render_template, jsonify, request, Blueprint, make_response, redirect, url_for, Response
+from flask import Flask, render_template, jsonify, request, Blueprint, make_response, Response, send_from_directory, redirect, url_for
 import difflib
 import ssdeep

@@ -25,6 +25,7 @@ r_serv_pasteName = Flask_config.r_serv_pasteName
 r_serv_metadata = Flask_config.r_serv_metadata
 r_serv_tags = Flask_config.r_serv_tags
 r_serv_statistics = Flask_config.r_serv_statistics
+r_serv_onion = Flask_config.r_serv_onion
 max_preview_char = Flask_config.max_preview_char
 max_preview_modal = Flask_config.max_preview_modal
 DiffMaxLineLength = Flask_config.DiffMaxLineLength

@@ -33,6 +34,7 @@ misp_event_url = Flask_config.misp_event_url
 hive_case_url = Flask_config.hive_case_url
 vt_enabled = Flask_config.vt_enabled
 PASTES_FOLDER = Flask_config.PASTES_FOLDER
+SCREENSHOT_FOLDER = Flask_config.SCREENSHOT_FOLDER

 showsavedpastes = Blueprint('showsavedpastes', __name__, template_folder='templates')

@@ -41,6 +43,8 @@ showsavedpastes = Blueprint('showsavedpastes', __name__, template_folder='templa
 def showpaste(content_range, requested_path):
     if PASTES_FOLDER not in requested_path:
         requested_path = os.path.join(PASTES_FOLDER, requested_path)
+        # remove old full path
+        #requested_path = requested_path.replace(PASTES_FOLDER, '')
     # escape directory transversal
     if os.path.commonprefix((os.path.realpath(requested_path),PASTES_FOLDER)) != PASTES_FOLDER:
         return 'path transversal detected'

@@ -175,6 +179,16 @@ def showpaste(content_range, requested_path):

             l_64.append( (file_icon, estimated_type, hash, saved_path, nb_in_file, b64_vt, b64_vt_link, b64_vt_report) )

+    crawler_metadata = {}
+    if 'infoleak:submission="crawler"' in l_tags:
+        crawler_metadata['get_metadata'] = True
+        crawler_metadata['domain'] = r_serv_metadata.hget('paste_metadata:'+requested_path, 'domain')
+        crawler_metadata['paste_father'] = r_serv_metadata.hget('paste_metadata:'+requested_path, 'father')
+        crawler_metadata['real_link'] = r_serv_metadata.hget('paste_metadata:'+requested_path,'real_link')
+        crawler_metadata['screenshot'] = paste.get_p_rel_path()
+    else:
+        crawler_metadata['get_metadata'] = False
+
     if Flask_config.pymisp is False:
         misp = False
     else:

@@ -202,6 +216,7 @@ def showpaste(content_range, requested_path):
         hive_url = hive_case_url.replace('id_here', hive_case)

     return render_template("show_saved_paste.html", date=p_date, bootstrap_label=bootstrap_label, active_taxonomies=active_taxonomies, active_galaxies=active_galaxies, list_tags=list_tags, source=p_source, encoding=p_encoding, language=p_language, size=p_size, mime=p_mime, lineinfo=p_lineinfo, content=p_content, initsize=len(p_content), duplicate_list = p_duplicate_list, simil_list = p_simil_list, hashtype_list = p_hashtype_list, date_list=p_date_list,
+                           crawler_metadata=crawler_metadata,
                            l_64=l_64, vt_enabled=vt_enabled, misp=misp, hive=hive, misp_eventid=misp_eventid, misp_url=misp_url, hive_caseid=hive_caseid, hive_url=hive_url)

 # ============ ROUTES ============

@@ -250,6 +265,10 @@ def showDiff():
     the_html = htmlD.make_file(lines1, lines2)
     return the_html

+@showsavedpastes.route('/screenshot/<path:filename>')
+def screenshot(filename):
+    return send_from_directory(SCREENSHOT_FOLDER, filename+'.png', as_attachment=True)
+
 @showsavedpastes.route('/send_file_to_vt/', methods=['POST'])
 def send_file_to_vt():
     b64_path = request.form['b64_path']
@@ -351,7 +351,6 @@
 <div class="panel-body" id="panel-body">

 {% if duplicate_list|length == 0 %}
-<h3> No Duplicate </h3>
 {% else %}
 <h3> Duplicate list: </h3>
 <table id="tableDup" class="table table-striped table-bordered">

@@ -420,6 +419,57 @@
 </table>
 {% endif %}

+{% if crawler_metadata['get_metadata'] %}
+<div class="row">
+
+  <div class="col-md-5">
+    <div class="row">
+      <div class="panel panel-info">
+        <div class="panel-heading">
+          <i class="fa fa-eye-slash"></i> Crawled Paste
+        </div>
+
+        <table class="table table-hover table-striped">
+          <tbody>
+            <tr>
+              <td>Domain</td>
+              <td><a target="_blank" href="{{ url_for('hiddenServices.onion_domain') }}?onion_domain={{ crawler_metadata['domain'] }}" id='onion_domain'>{{ crawler_metadata['domain'] }}</a></td>
+            </tr>
+            <tr>
+              <td>Father</td>
+              <td><a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste') }}?paste={{ crawler_metadata['paste_father'] }}" id='paste_father'>{{ crawler_metadata['paste_father'] }}</a></td>
+            </tr>
+            <tr>
+              <td>Source link</td>
+              <td>{{ crawler_metadata['real_link'] }}</td>
+            </tr>
+          </tbody>
+        </table>
+      </div>
+    </div>
+  </div>
+
+  <div class="col-md-7">
+    <div class="panel panel-info" style="text-align:center;">
+      <div class="panel-heading">
+        <div class="row">
+          <div class="col-md-8">
+            <input class="center" id="blocks" type="range" min="1" max="50" value="13">
+          </div>
+          <div class="col-md-4">
+            <button class="btn btn-primary btn-tags" onclick="blocks.value=50;pixelate();">
+              <span class="glyphicon glyphicon-zoom-in"></span>
+              <span class="label-icon">Full resolution</span>
+            </button>
+          </div>
+        </div>
+      </div>
+    </div>
+    <canvas id="canvas" style="width:100%;"></canvas>
+  </div>
+</div>
+{% endif %}
+
 <h3> Content: </h3>
 <a href="{{ url_for('showsavedpastes.showsavedrawpaste') }}?paste={{ request.args.get('paste') }}" id='raw_paste' > [Raw content] </a>
 <p data-initsize="{{ initsize }}"> <pre id="paste-holder">{{ content }}</pre></p>

@@ -535,5 +585,51 @@
 {% endfor %}
 </script>

+<script>
+var ctx = canvas.getContext('2d'), img = new Image();
+
+/// turn off image smoothing
+ctx.webkitImageSmoothingEnabled = false;
+ctx.imageSmoothingEnabled = false;
+
+img.onload = pixelate;
+img.addEventListener("error", img_error);
+var draw_img = false;
+
+img.src = "{{ url_for('showsavedpastes.screenshot', filename=crawler_metadata['screenshot']) }}";
+
+function pixelate() {
+
+  /// use slider value
+  var size;
+  if( blocks.value == 50 ){
+    size = 1;
+  } else {
+    size = (blocks.value) * 0.01;
+  }
+
+  canvas.width = img.width;
+  canvas.height = img.height;
+
+  /// cache scaled width and height
+  w = canvas.width * size;
+  h = canvas.height * size;
+
+  /// draw original image to the scaled size
+  ctx.drawImage(img, 0, 0, w, h);
+
+  /// pixelated
+  ctx.drawImage(canvas, 0, 0, w, h, 0, 0, canvas.width, canvas.height);
+
+}
+
+function img_error() {
+  img.onerror=null;
+  img.src="{{ url_for('static', filename='image/AIL.png') }}";
+  blocks.value = 50;
+  pixelate();
+}
+
+blocks.addEventListener('change', pixelate, false);
+</script>
+
 </html>