Merge pull request #260 from CIRCL/onion_crawler

Onion crawler
Alexandre Dulaunoy authored on 2018-10-02 13:42:45 +02:00, committed by GitHub
commit dd34382721
27 changed files with 1734 additions and 17 deletions

.gitignore (4 lines changed)

@@ -11,9 +11,10 @@ ardb
faup
tlsh
Blooms
LEVEL_DB_DATA
PASTES
CRAWLED_SCREENSHOT
BASE64
HASHS
DATA_ARDB
indexdir/
logs/
@@ -33,6 +34,7 @@ var/www/submitted
bin/packages/config.cfg
bin/packages/config.cfg.backup
configs/keys
files
# installed files
nltk_data/


@@ -84,9 +84,9 @@ You can navigate into the interface by using arrow keys. In order to perform an
To change list, you can press the <TAB> key.
Also, you can quickly stop or start modules by clicking on the ``<K>`` or ``<S>`` symbol respectively. These are located in the _Action_ column.
Finally, you can quit this program by pressing either ``<q>`` or ``<C-c>``.
Terms frequency usage
@@ -96,3 +96,52 @@ In AIL, you can track terms, set of terms and even regexes without creating a de
- You can track a term by simply putting it in the box.
- You can track a set of terms by simply putting terms in an array surrounded by the '\' character. You can also set a custom threshold regarding the number of terms that must match to trigger the detection. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
- You can track regexes as easily as tracking a term. You just have to put your regex in the box surrounded by the '/' character. For example, if you want to track the regex matching all email addresses having the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
Crawler
---------------------
In AIL, you can crawl hidden services.
There are two types of installation. You can install a *local* or a *remote* Splash server.
``(Splash host) = the server running the splash service``
``(AIL host) = the server running AIL``
### Installation/Configuration
1. *(Splash host)* Launch ``crawler_hidden_services_install.sh`` to install all requirements (type ``y`` if a localhost splash server is used, or use the ``-y`` option)
2. *(Splash host)* To install and set up your tor proxy:
- Install the tor proxy: ``sudo apt-get install tor -y``
(Not required if ``Splash host == AIL host`` - the tor proxy is installed by default in AIL)
- Add the following line ``SOCKSPolicy accept 172.17.0.0/16`` to ``/etc/tor/torrc``
(for a Linux Docker, the localhost IP is *172.17.0.1*; adapt it for other platforms)
- Restart the tor proxy: ``sudo service tor restart``
3. *(AIL host)* Edit the ``/bin/packages/config.cfg`` file:
- In the crawler section, set ``activate_crawler`` to ``True``
- Change the IP address of Splash servers if needed (remote only)
- Set ``splash_onion_port`` according to the port numbers your Splash servers will use.
These can be given as a single port (e.g. 8050) or a port range (e.g. 8050-8052 for ports 8050, 8051 and 8052), as illustrated in the sketch below.
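A minimal sketch (not part of the PR) of how such a value expands into individual ports; the actual parsing is done with ``awk``/``IFS`` in ``launching_crawler`` inside ``bin/LAUNCH.sh``:

```python
# Hypothetical helper, for illustration only: expand "8050" or "8050-8052"
# into the list of ports, one Splash server / Crawler.py instance per port.
def expand_splash_ports(value):
    parts = value.split('-')
    first, last = int(parts[0]), int(parts[-1])
    return list(range(first, last + 1))

print(expand_splash_ports('8050'))       # [8050]
print(expand_splash_ports('8050-8052'))  # [8050, 8051, 8052]
```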
### Starting the scripts
- *(Splash host)* Launch all Splash servers with:
```sudo ./bin/torcrawler/launch_splash_crawler.sh -f <config absolute_path> -p <port_start> -n <number_of_splash>```
With ``<port_start>`` and ``<number_of_splash>`` matching the ``splash_onion_port`` value set in step 3 of the configuration (``/bin/packages/config.cfg``).
All Splash dockers are launched inside the ``Docker_Splash`` screen. You can use ``sudo screen -r Docker_Splash`` to connect to the screen session and check the status of all Splash servers (see the health-check sketch after this list).
- *(AIL host)* Launch all AIL crawler scripts with:
```./bin/LAUNCH.sh -c```
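A quick way to verify that a Splash instance answers, mirroring the ``requests.get(splash_url, timeout=30.0)`` check performed by ``bin/Crawler.py`` (the URL below assumes a default local setup on port 8050):

```python
import requests

splash_url = 'http://127.0.0.1:8050'  # assumed local Splash instance

try:
    r = requests.get(splash_url, timeout=30.0)
    print('Splash OK' if r.status_code == 200 else 'Splash answered with status {}'.format(r.status_code))
except requests.exceptions.RequestException as e:
    print('Splash unreachable: {}'.format(e))
```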
### TL;DR - Local setup
#### Installation
- ```crawler_hidden_services_install.sh -y```
- Add the line ``SOCKSPolicy accept 172.17.0.0/16`` to ``/etc/tor/torrc``
- ```sudo service tor restart```
- Set ``activate_crawler`` to ``True`` in ``/bin/packages/config.cfg``
#### Start
- ```sudo ./bin/torcrawler/launch_splash_crawler.sh -f $AIL_HOME/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1```
- ```./bin/LAUNCH.sh -c```


@@ -32,6 +32,7 @@ def decode_base58(bc, length):
for char in bc:
n = n * 58 + digits58.index(char)
return n.to_bytes(length, 'big')
def check_bc(bc):
try:
bcbytes = decode_base58(bc, 25)

bin/Crawler.py (new executable file, 239 lines)

@ -0,0 +1,239 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
import re
import redis
import datetime
import time
import subprocess
import requests
sys.path.append(os.environ['AIL_BIN'])
from Helper import Process
from pubsublogger import publisher
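# Re-queue the domain and its message so the crawl can be retried after a Splash/proxy failure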
def on_error_send_message_back_in_queue(type_hidden_service, domain, message):
# send this msg back in the queue
if not r_onion.sismember('{}_domain_crawler_queue'.format(type_hidden_service), domain):
r_onion.sadd('{}_domain_crawler_queue'.format(type_hidden_service), domain)
r_onion.sadd('{}_crawler_queue'.format(type_hidden_service), message)
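# Check that the Splash instance answers, then run torcrawler/tor_crawler.py on the url; on failure, re-queue the message and exit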
def crawl_onion(url, domain, date, date_month, message):
#if not r_onion.sismember('full_onion_up', domain) and not r_onion.sismember('onion_down:'+date , domain):
super_father = r_serv_metadata.hget('paste_metadata:'+paste, 'super_father')
if super_father is None:
super_father=paste
try:
r = requests.get(splash_url , timeout=30.0)
except Exception:
# TODO: relaunch docker or send error message
on_error_send_message_back_in_queue(type_hidden_service, domain, message)
publisher.error('{} SPLASH DOWN'.format(splash_url))
print('--------------------------------------')
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
print(' {} DOWN'.format(splash_url))
exit(1)
if r.status_code == 200:
process = subprocess.Popen(["python", './torcrawler/tor_crawler.py', splash_url, type_hidden_service, url, domain, paste, super_father],
stdout=subprocess.PIPE)
while process.poll() is None:
time.sleep(1)
if process.returncode == 0:
output = process.stdout.read().decode()
print(output)
# error: splash:Connection to proxy refused
if 'Connection to proxy refused' in output:
on_error_send_message_back_in_queue(type_hidden_service, domain, message)
publisher.error('{} SPLASH, PROXY DOWN OR BAD CONFIGURATION'.format(splash_url))
print('------------------------------------------------------------------------')
print(' \033[91m SPLASH: Connection to proxy refused')
print('')
print(' PROXY DOWN OR BAD CONFIGURATION\033[0m'.format(splash_url))
print('------------------------------------------------------------------------')
exit(-2)
else:
print(process.stdout.read())
exit(-1)
else:
on_error_send_message_back_in_queue(type_hidden_service, domain, message)
print('--------------------------------------')
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
print(' {} DOWN'.format(splash_url))
exit(1)
if __name__ == '__main__':
if len(sys.argv) != 3:
print('usage:', 'Crawler.py', 'type_hidden_service (onion or i2p or regular)', 'splash_port')
exit(1)
type_hidden_service = sys.argv[1]
splash_port = sys.argv[2]
publisher.port = 6380
publisher.channel = "Script"
publisher.info("Script Crawler started")
config_section = 'Crawler'
# Setup the I/O queues
p = Process(config_section)
url_onion = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
re.compile(url_onion)
url_i2p = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
re.compile(url_i2p)
if type_hidden_service == 'onion':
regex_hidden_service = url_onion
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url_onion"), splash_port)
elif type_hidden_service == 'i2p':
regex_hidden_service = url_i2p
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url_i2p"), splash_port)
elif type_hidden_service == 'regular':
regex_hidden_service = url_i2p
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url_onion"), splash_port)
else:
print('incorrect crawler type: {}'.format(type_hidden_service))
exit(0)
print('splash url: {}'.format(splash_url))
crawler_depth_limit = p.config.getint("Crawler", "crawler_depth_limit")
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], p.config.get("Directories", "pastes"))
r_serv_metadata = redis.StrictRedis(
host=p.config.get("ARDB_Metadata", "host"),
port=p.config.getint("ARDB_Metadata", "port"),
db=p.config.getint("ARDB_Metadata", "db"),
decode_responses=True)
r_cache = redis.StrictRedis(
host=p.config.get("Redis_Cache", "host"),
port=p.config.getint("Redis_Cache", "port"),
db=p.config.getint("Redis_Cache", "db"),
decode_responses=True)
r_onion = redis.StrictRedis(
host=p.config.get("ARDB_Onion", "host"),
port=p.config.getint("ARDB_Onion", "port"),
db=p.config.getint("ARDB_Onion", "db"),
decode_responses=True)
# load domains blacklist
try:
with open(os.environ['AIL_BIN']+'/torcrawler/blacklist_onion.txt', 'r') as f:
r_onion.delete('blacklist_{}'.format(type_hidden_service))
lines = f.read().splitlines()
for line in lines:
r_onion.sadd('blacklist_{}'.format(type_hidden_service), line)
except Exception:
pass
while True:
# Recovering the streamed message information.
message = r_onion.spop('{}_crawler_queue'.format(type_hidden_service))
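# expected message format: 'url;paste_path' (as pushed by bin/Onion.py)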
if message is not None:
splitted = message.split(';')
if len(splitted) == 2:
url, paste = splitted
paste = paste.replace(PASTES_FOLDER+'/', '')
url_list = re.findall(regex_hidden_service, url)[0]
if url_list[1] == '':
url= 'http://{}'.format(url)
link, s, credential, subdomain, domain, host, port, \
resource_path, query_string, f1, f2, f3, f4 = url_list
domain = url_list[4]
r_onion.srem('{}_domain_crawler_queue'.format(type_hidden_service), domain)
domain_url = 'http://{}'.format(domain)
print('\033[92m------------------START CRAWLER------------------\033[0m')
print('crawler type: {}'.format(type_hidden_service))
print('\033[92m-------------------------------------------------\033[0m')
print('url: {}'.format(url))
print('domain: {}'.format(domain))
print('domain_url: {}'.format(domain_url))
if not r_onion.sismember('blacklist_{}'.format(type_hidden_service), domain):
date = datetime.datetime.now().strftime("%Y%m%d")
date_month = datetime.datetime.now().strftime("%Y%m")
if not r_onion.sismember('month_{}_up:{}'.format(type_hidden_service, date_month), domain) and not r_onion.sismember('{}_down:{}'.format(type_hidden_service, date), domain):
crawl_onion(url, domain, date, date_month, message)
if url != domain_url:
print(url)
print(domain_url)
crawl_onion(domain_url, domain, date, date_month, message)
# save down onion
if not r_onion.sismember('{}_up:{}'.format(type_hidden_service, date), domain):
r_onion.sadd('{}_down:{}'.format(type_hidden_service, date), domain)
#r_onion.sadd('{}_down_link:{}'.format(type_hidden_service, date), url)
#r_onion.hincrby('{}_link_down'.format(type_hidden_service), url, 1)
if not r_onion.exists('{}_metadata:{}'.format(type_hidden_service, domain)):
r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'first_seen', date)
r_onion.hset('{}_metadata:{}'.format(type_hidden_service,domain), 'last_seen', date)
else:
#r_onion.hincrby('{}_link_up'.format(type_hidden_service), url, 1)
if r_onion.sismember('month_{}_up:{}'.format(type_hidden_service, date_month), domain) and r_serv_metadata.exists('paste_children:'+paste):
msg = 'infoleak:automatic-detection="{}";{}'.format(type_hidden_service, paste)
p.populate_set_out(msg, 'Tags')
# last check
r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'last_check', date)
# last_father
r_onion.hset('{}_metadata:{}'.format(type_hidden_service, domain), 'paste_parent', paste)
# add onion screenshot history
# add crawled days
if r_onion.lindex('{}_history:{}'.format(type_hidden_service, domain), 0) != date:
r_onion.lpush('{}_history:{}'.format(type_hidden_service, domain), date)
# add crawled history by date
r_onion.lpush('{}_history:{}:{}'.format(type_hidden_service, domain, date), paste) #add datetime here
# check external onions links (full_scrawl)
external_domains = set()
for link in r_onion.smembers('domain_{}_external_links:{}'.format(type_hidden_service, domain)):
external_domain = re.findall(url_onion, link)
external_domain.extend(re.findall(url_i2p, link))
if len(external_domain) > 0:
external_domain = external_domain[0][4]
else:
continue
if '.onion' in external_domain and external_domain != domain:
external_domains.add(external_domain)
elif '.i2p' in external_domain and external_domain != domain:
external_domains.add(external_domain)
if len(external_domains) >= 10:
r_onion.sadd('{}_potential_source'.format(type_hidden_service), domain)
r_onion.delete('domain_{}_external_links:{}'.format(type_hidden_service, domain))
print(r_onion.smembers('domain_{}_external_links:{}'.format(type_hidden_service, domain)))
# update list, last crawled onions
r_onion.lpush('last_{}'.format(type_hidden_service), domain)
r_onion.ltrim('last_{}'.format(type_hidden_service), 0, 15)
else:
continue
else:
time.sleep(1)


@@ -27,6 +27,7 @@ islogged=`screen -ls | egrep '[0-9]+.Logging_AIL' | cut -d. -f1`
isqueued=`screen -ls | egrep '[0-9]+.Queue_AIL' | cut -d. -f1`
isscripted=`screen -ls | egrep '[0-9]+.Script_AIL' | cut -d. -f1`
isflasked=`screen -ls | egrep '[0-9]+.Flask_AIL' | cut -d. -f1`
iscrawler=`screen -ls | egrep '[0-9]+.Crawler_AIL' | cut -d. -f1`
isfeeded=`screen -ls | egrep '[0-9]+.Feeder_Pystemon' | cut -d. -f1`
function helptext {
@@ -199,6 +200,35 @@ function launching_scripts {
}
function launching_crawler {
if [[ ! $iscrawler ]]; then
CONFIG=$AIL_BIN/packages/config.cfg
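# extract the splash_onion_port value (single port or a range such as 8050-8052) from the [Crawler] section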
lport=$(awk '/^\[Crawler\]/{f=1} f==1&&/^splash_onion_port/{print $3;exit}' "${CONFIG}")
IFS='-' read -ra PORTS <<< "$lport"
if [ ${#PORTS[@]} -eq 1 ]
then
first_port=${PORTS[0]}
last_port=${PORTS[0]}
else
first_port=${PORTS[0]}
last_port=${PORTS[1]}
fi
screen -dmS "Crawler_AIL"
sleep 0.1
for ((i=first_port;i<=last_port;i++)); do
screen -S "Crawler_AIL" -X screen -t "onion_crawler:$i" bash -c 'cd '${AIL_BIN}'; ./Crawler.py onion '$i'; read x'
sleep 0.1
done
echo -e $GREEN"\t* Launching Crawler_AIL scripts"$DEFAULT
else
echo -e $RED"\t* A screen is already launched"$DEFAULT
fi
}
function shutting_down_redis {
redis_dir=${AIL_HOME}/redis/src/
bash -c $redis_dir'redis-cli -p 6379 SHUTDOWN'
@@ -420,6 +450,9 @@ function launch_all {
Flask)
launch_flask;
;;
Crawler)
launching_crawler;
;;
Killall)
killall;
;;
@@ -445,9 +478,9 @@ while [ "$1" != "" ]; do
;;
-k | --killAll ) killall;
;;
-t | --thirdpartyUpdate ) update_thirdparty;
;;
-c | --crawler ) launching_crawler;
;;
-f | --launchFeeder ) launch_feeder;
;;


@@ -21,7 +21,6 @@ Requirements
*Need the ZMQ_Sub_Onion_Q Module running to be able to work properly.
"""
import pprint
import time
from packages import Paste
from pubsublogger import publisher
@@ -30,6 +29,7 @@ import os
import base64
import subprocess
import redis
import re
from Helper import Process
@@ -97,6 +97,12 @@ if __name__ == "__main__":
db=p.config.getint("Redis_Cache", "db"),
decode_responses=True)
r_onion = redis.StrictRedis(
host=p.config.get("ARDB_Onion", "host"),
port=p.config.getint("ARDB_Onion", "port"),
db=p.config.getint("ARDB_Onion", "db"),
decode_responses=True)
# FUNCTIONS #
publisher.info("Script subscribed to channel onion_categ")
@@ -107,9 +113,21 @@ if __name__ == "__main__":
message = p.get_from_set()
prec_filename = None
# send to crawler:
activate_crawler = p.config.get("Crawler", "activate_crawler")
if activate_crawler == 'True':
activate_crawler = True
print('Crawler enabled')
else:
activate_crawler = False
print('Crawler disabled')
# Thanks to Faup project for this regex
# https://github.com/stricaud/faup
url_regex = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.onion)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
i2p_regex = "((http|https|ftp)?(?:\://)?([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.i2p)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*)"
re.compile(url_regex)
while True:
if message is not None:
@@ -123,12 +141,32 @@ if __name__ == "__main__":
PST = Paste.Paste(filename)
for x in PST.get_regex(url_regex):
print(x)
# Extracting url with regex
url, s, credential, subdomain, domain, host, port, \
resource_path, query_string, f1, f2, f3, f4 = x
if '.onion' in url:
print(url)
domains_list.append(domain)
urls.append(url)
'''
for x in PST.get_regex(i2p_regex):
# Extracting url with regex
url, s, credential, subdomain, domain, host, port, \
resource_path, query_string, f1, f2, f3, f4 = x
if '.i2p' in url:
print('add i2p')
print(domain)
if not r_onion.sismember('i2p_domain', domain) and not r_onion.sismember('i2p_domain_crawler_queue', domain):
r_onion.sadd('i2p_domain', domain)
r_onion.sadd('i2p_link', url)
r_onion.sadd('i2p_domain_crawler_queue', domain)
msg = '{};{}'.format(url,PST.p_path)
r_onion.sadd('i2p_crawler_queue', msg)
'''
# Saving the list of extracted onion domains.
PST.__setattr__(channel, domains_list)
@@ -149,12 +187,33 @@
to_print = 'Onion;{};{};{};'.format(PST.p_source,
PST.p_date,
PST.p_name)
if activate_crawler:
date_month = datetime.datetime.now().strftime("%Y%m")
date = datetime.datetime.now().strftime("%Y%m%d")
for url in urls:
domain = re.findall(url_regex, url)
if len(domain) > 0:
domain = domain[0][4]
else:
continue
if not r_onion.sismember('month_onion_up:{}'.format(date_month), domain) and not r_onion.sismember('onion_down:'+date , domain):
if not r_onion.sismember('onion_domain_crawler_queue', domain):
print('send to onion crawler')
r_onion.sadd('onion_domain_crawler_queue', domain)
msg = '{};{}'.format(url,PST.p_path)
r_onion.sadd('onion_crawler_queue', msg)
#p.populate_set_out(msg, 'Crawler')
else:
for url in fetch(p, r_cache, urls, domains_list, path):
publisher.info('{}Checked {};{}'.format(to_print, url, PST.p_path))
p.populate_set_out('onion;{}'.format(PST.p_path), 'alertHandler')
msg = 'infoleak:automatic-detection="onion";{}'.format(PST.p_path)
p.populate_set_out(msg, 'Tags')
else:
publisher.info('{}Onion related;{}'.format(to_print, PST.p_path))

bin/packages/HiddenServices.py (new executable file, 183 lines)

@ -0,0 +1,183 @@
#!/usr/bin/python3
"""
The ``hiddenServices Class``
===================
Use it to create an object from an existing paste or other random file.
Conditions to fulfill to be able to use this class correctly:
-------------------------------------------------------------
1/ The paste needs to be saved on disk somewhere (have an accessible path)
2/ The paste needs to be gzipped.
3/ The filepath needs to look like this:
/directory/source/year/month/day/paste.gz
"""
import os
import gzip
import redis
import random
import configparser
import sys
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'packages/'))
from Date import Date
class HiddenServices(object):
"""
This class represents a hidden service as an object.
When created, the object will have by default some "main attributes"
:Example:
PST = HiddenServices("xxxxxxxx.onion", "onion")
"""
def __init__(self, domain, type):
configfile = os.path.join(os.environ['AIL_BIN'], 'packages/config.cfg')
if not os.path.exists(configfile):
raise Exception('Unable to find the configuration file. \
Did you set environment variables? \
Or activate the virtualenv.')
cfg = configparser.ConfigParser()
cfg.read(configfile)
self.r_serv_onion = redis.StrictRedis(
host=cfg.get("ARDB_Onion", "host"),
port=cfg.getint("ARDB_Onion", "port"),
db=cfg.getint("ARDB_Onion", "db"),
decode_responses=True)
self.r_serv_metadata = redis.StrictRedis(
host=cfg.get("ARDB_Metadata", "host"),
port=cfg.getint("ARDB_Metadata", "port"),
db=cfg.getint("ARDB_Metadata", "db"),
decode_responses=True)
self.domain = domain
self.type = type
self.tags = {}
if type == 'onion':
self.paste_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "pastes"))
self.paste_crawled_directory = os.path.join(self.paste_directory, cfg.get("Directories", "crawled"))
self.paste_crawled_directory_name = cfg.get("Directories", "crawled")
self.screenshot_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
elif type == 'i2p':
self.paste_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
self.screenshot_directory = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
else:
## TODO: # FIXME: add error
pass
def get_origin_paste_name(self):
origin_paste = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'paste_parent')
if origin_paste is None:
return ''
return origin_paste.replace(self.paste_directory+'/', '')
def get_domain_tags(self):
return self.tags
def update_domain_tags(self, children):
p_tags = self.r_serv_metadata.smembers('tag:'+children)
for tag in p_tags:
self.tags[tag] = self.tags.get(tag, 0) + 1
#todo use the right paste
def get_last_crawled_pastes(self):
paste_parent = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'paste_parent')
#paste_parent = paste_parent.replace(self.paste_directory, '')[1:]
return self.get_all_pastes_domain(paste_parent)
def get_all_pastes_domain(self, father):
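# recursively walk the paste_children:<paste> sets and collect every crawled paste belonging to this domain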
if father is None:
return []
l_crawled_pastes = []
paste_parent = father.replace(self.paste_directory+'/', '')
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_parent))
## TODO: # FIXME: remove me
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(father))
paste_childrens = paste_childrens | paste_children
for children in paste_childrens:
if self.domain in children:
l_crawled_pastes.append(children)
self.update_domain_tags(children)
l_crawled_pastes.extend(self.get_all_pastes_domain(children))
return l_crawled_pastes
def get_domain_son(self, l_paste):
if l_paste is None:
return None
set_domain = set()
for paste in l_paste:
paste_full = paste.replace(self.paste_directory+'/', '')
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_full))
## TODO: # FIXME: remove me
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(paste))
paste_childrens = paste_childrens | paste_children
for children in paste_childrens:
if not self.domain in children:
print(children)
set_domain.add((children.split('.onion')[0]+'.onion').split('/')[-1])
return set_domain
def get_all_domain_son(self, father):
if father is None:
return []
l_crawled_pastes = []
paste_parent = father.replace(self.paste_directory+'/', '')
paste_childrens = self.r_serv_metadata.smembers('paste_children:{}'.format(paste_parent))
## TODO: # FIXME: remove me
paste_children = self.r_serv_metadata.smembers('paste_children:{}'.format(father))
paste_childrens = paste_childrens | paste_children
for children in paste_childrens:
if not self.domain in children:
l_crawled_pastes.append(children)
#self.update_domain_tags(children)
l_crawled_pastes.extend(self.get_all_domain_son(children))
return l_crawled_pastes
def get_domain_random_screenshot(self, l_crawled_pastes, num_screenshot = 1):
l_screenshot_paste = []
for paste in l_crawled_pastes:
## FIXME: # TODO: remove me
paste= paste.replace(self.paste_directory+'/', '')
paste = paste.replace(self.paste_crawled_directory_name, '')
if os.path.isfile( '{}{}.png'.format(self.screenshot_directory, paste) ):
l_screenshot_paste.append(paste[1:])
if len(l_screenshot_paste) > num_screenshot:
l_random_screenshot = []
for index in random.sample( range(0, len(l_screenshot_paste)), num_screenshot ):
l_random_screenshot.append(l_screenshot_paste[index])
return l_random_screenshot
else:
return l_screenshot_paste
def get_crawled_pastes_by_date(self, date):
pastes_path = os.path.join(self.paste_crawled_directory, date[0:4], date[4:6], date[6:8])
paste_parent = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'last_check')
l_crawled_pastes = []
return l_crawled_pastes
def get_last_crawled_pastes_fileSearch(self):
last_check = self.r_serv_onion.hget('onion_metadata:{}'.format(self.domain), 'last_check')
return self.get_crawled_pastes_by_date_fileSearch(last_check)
def get_crawled_pastes_by_date_fileSearch(self, date):
pastes_path = os.path.join(self.paste_crawled_directory, date[0:4], date[4:6], date[6:8])
l_crawled_pastes = [f for f in os.listdir(pastes_path) if self.domain in f]
return l_crawled_pastes


@@ -94,6 +94,7 @@ class Paste(object):
var = self.p_path.split('/')
self.p_date = Date(var[-4], var[-3], var[-2])
self.p_rel_path = os.path.join(var[-4], var[-3], var[-2], self.p_name)
self.p_source = var[-5]
self.supposed_url = 'https://{}/{}'.format(self.p_source.replace('_pro', ''), var[-1].split('.gz')[0])
@@ -291,6 +292,9 @@
else:
return '[]'
def get_p_rel_path(self):
return self.p_rel_path
def save_all_attributes_redis(self, key=None):
"""
Saving all the attributes in a "Redis-like" Database (Redis, LevelDB)


@@ -4,6 +4,8 @@ dicofilters = Dicos
pastes = PASTES
hash = HASHS
base64 = BASE64
crawled = crawled
crawled_screenshot = CRAWLED_SCREENSHOT
wordtrending_csv = var/www/static/csv/wordstrendingdata
wordsfile = files/wordfile
@@ -184,6 +186,11 @@ host = localhost
port = 6382
db = 8
[ARDB_Onion]
host = localhost
port = 6382
db = 9
[Url]
cc_critical = DE
@@ -228,3 +235,9 @@ channel = FetchedOnion
host = localhost
port = 6381
db = 0
[Crawler]
activate_crawler = True
crawler_depth_limit = 1
splash_url_onion = http://127.0.0.1
splash_onion_port = 8050-8052


@@ -61,7 +61,7 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_alertHandler,Redis_Tags
[Onion]
subscribe = Redis_Onion
publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler,Redis_Tags,Redis_Crawler
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_alertHandler
[DumpValidOnion]
@@ -136,3 +136,8 @@ publish = Redis_Duplicate,Redis_alertHandler,Redis_Tags
[submit_paste]
subscribe = Redis
publish = Redis_Mixer
[Crawler]
subscribe = Redis_Crawler
publish = Redis_Mixer,Redis_Tags


@ -0,0 +1,242 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
import gzip
import base64
import uuid
import datetime
import base64
import redis
import json
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError
from twisted.internet.error import TimeoutError
from scrapy import Spider
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerProcess, Crawler
from scrapy_splash import SplashRequest
sys.path.append(os.environ['AIL_BIN'])
from Helper import Process
class TorSplashCrawler():
def __init__(self, splash_url, crawler_depth_limit):
self.process = CrawlerProcess({'LOG_ENABLED': False})
self.crawler = Crawler(self.TorSplashSpider, {
'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0',
'SPLASH_URL': splash_url,
'ROBOTSTXT_OBEY': False,
'DOWNLOADER_MIDDLEWARES': {'scrapy_splash.SplashCookiesMiddleware': 723,
'scrapy_splash.SplashMiddleware': 725,
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
},
'SPIDER_MIDDLEWARES': {'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,},
'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
'HTTPERROR_ALLOW_ALL': True,
'DEPTH_LIMIT': crawler_depth_limit
})
def crawl(self, type, url, domain, original_paste, super_father):
self.process.crawl(self.crawler, type=type, url=url, domain=domain,original_paste=original_paste, super_father=super_father)
self.process.start()
class TorSplashSpider(Spider):
name = 'TorSplashSpider'
def __init__(self, type, url, domain,original_paste, super_father, *args, **kwargs):
self.type = type
self.original_paste = original_paste
self.super_father = super_father
self.start_urls = url
self.domains = [domain]
date = datetime.datetime.now().strftime("%Y/%m/%d")
self.full_date = datetime.datetime.now().strftime("%Y%m%d")
self.date_month = datetime.datetime.now().strftime("%Y%m")
config_section = 'Crawler'
self.p = Process(config_section)
self.r_cache = redis.StrictRedis(
host=self.p.config.get("Redis_Cache", "host"),
port=self.p.config.getint("Redis_Cache", "port"),
db=self.p.config.getint("Redis_Cache", "db"),
decode_responses=True)
self.r_serv_log_submit = redis.StrictRedis(
host=self.p.config.get("Redis_Log_submit", "host"),
port=self.p.config.getint("Redis_Log_submit", "port"),
db=self.p.config.getint("Redis_Log_submit", "db"),
decode_responses=True)
self.r_serv_metadata = redis.StrictRedis(
host=self.p.config.get("ARDB_Metadata", "host"),
port=self.p.config.getint("ARDB_Metadata", "port"),
db=self.p.config.getint("ARDB_Metadata", "db"),
decode_responses=True)
self.r_serv_onion = redis.StrictRedis(
host=self.p.config.get("ARDB_Onion", "host"),
port=self.p.config.getint("ARDB_Onion", "port"),
db=self.p.config.getint("ARDB_Onion", "db"),
decode_responses=True)
self.crawler_path = os.path.join(self.p.config.get("Directories", "crawled"), date )
self.crawled_paste_filemame = os.path.join(os.environ['AIL_HOME'], self.p.config.get("Directories", "pastes"),
self.p.config.get("Directories", "crawled"), date )
self.crawled_screenshot = os.path.join(os.environ['AIL_HOME'], self.p.config.get("Directories", "crawled_screenshot"), date )
def start_requests(self):
yield SplashRequest(
self.start_urls,
self.parse,
#errback=self.errback_catcher,
endpoint='render.json',
meta={'father': self.original_paste},
args={ 'html': 1,
'wait': 10,
'render_all': 1,
'har': 1,
'png': 1}
)
def parse(self,response):
#print(response.headers)
#print(response.status)
if response.status == 504:
# down ?
print('504 detected')
elif response.status != 200:
print('other response: {}'.format(response.status))
#print(error_log)
#detect connection to proxy refused
error_log = (json.loads(response.body.decode()))
if(error_log['info']['text'] == 'Connection to proxy refused'):
print('Connection to proxy refused')
else:
UUID = self.domains[0]+str(uuid.uuid4())
filename_paste = os.path.join(self.crawled_paste_filemame, UUID)
relative_filename_paste = os.path.join(self.crawler_path, UUID)
filename_screenshot = os.path.join(self.crawled_screenshot, UUID +'.png')
# save new paste on disk
if self.save_crawled_paste(filename_paste, response.data['html']):
# add this paste to the domain crawled set # TODO: # FIXME: put this on cache ?
#self.r_serv_onion.sadd('temp:crawled_domain_pastes:{}'.format(self.domains[0]), filename_paste)
self.r_serv_onion.sadd('{}_up:{}'.format(self.type, self.full_date), self.domains[0])
self.r_serv_onion.sadd('full_{}_up'.format(self.type), self.domains[0])
self.r_serv_onion.sadd('month_{}_up:{}'.format(self.type, self.date_month), self.domains[0])
# create onion metadata
if not self.r_serv_onion.exists('{}_metadata:{}'.format(self.type, self.domains[0])):
self.r_serv_onion.hset('{}_metadata:{}'.format(self.type, self.domains[0]), 'first_seen', self.full_date)
self.r_serv_onion.hset('{}_metadata:{}'.format(self.type, self.domains[0]), 'last_seen', self.full_date)
#create paste metadata
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'super_father', self.super_father)
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'father', response.meta['father'])
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'domain', self.domains[0])
self.r_serv_metadata.hset('paste_metadata:'+filename_paste, 'real_link', response.url)
self.r_serv_metadata.sadd('paste_children:'+response.meta['father'], filename_paste)
dirname = os.path.dirname(filename_screenshot)
if not os.path.exists(dirname):
os.makedirs(dirname)
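# base64 inflates data by ~4/3, so the decoded screenshot size is roughly 3/4 of the encoded length (capped at 5 MB below)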
size_screenshot = (len(response.data['png'])*3) /4
if size_screenshot < 5000000: #bytes
with open(filename_screenshot, 'wb') as f:
f.write(base64.standard_b64decode(response.data['png'].encode()))
with open(filename_screenshot+'har.txt', 'wb') as f:
f.write(json.dumps(response.data['har']).encode())
# save external links in set
#lext = LinkExtractor(deny_domains=self.domains, unique=True)
#for link in lext.extract_links(response):
# self.r_serv_onion.sadd('domain_{}_external_links:{}'.format(self.type, self.domains[0]), link.url)
# self.r_serv_metadata.sadd('paste_{}_external_links:{}'.format(self.type, filename_paste), link.url)
le = LinkExtractor(allow_domains=self.domains, unique=True)
for link in le.extract_links(response):
yield SplashRequest(
link.url,
self.parse,
#errback=self.errback_catcher,
endpoint='render.json',
meta={'father': relative_filename_paste},
args={ 'html': 1,
'png': 1,
'render_all': 1,
'har': 1,
'wait': 10}
)
'''
def errback_catcher(self, failure):
# catch all errback failures,
self.logger.error(repr(failure))
print('failure')
#print(failure)
print(failure.type)
#print(failure.request.meta['item'])
#if isinstance(failure.value, HttpError):
if failure.check(HttpError):
# you can get the response
response = failure.value.response
print('HttpError')
self.logger.error('HttpError on %s', response.url)
#elif isinstance(failure.value, DNSLookupError):
elif failure.check(DNSLookupError):
# this is the original request
request = failure.request
print(DNSLookupError)
print('DNSLookupError')
self.logger.error('DNSLookupError on %s', request.url)
#elif isinstance(failure.value, TimeoutError):
elif failure.check(TimeoutError):
request = failure.request
print('TimeoutError')
print(TimeoutError)
self.logger.error('TimeoutError on %s', request.url)
'''
def save_crawled_paste(self, filename, content):
if os.path.isfile(filename):
print('File: {} already exist in submitted pastes'.format(filename))
return False
try:
gzipencoded = gzip.compress(content.encode())
gzip64encoded = base64.standard_b64encode(gzipencoded).decode()
except:
print("file error: {}".format(filename))
return False
# send paste to Global
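# relayed to the Mixer queue as '<filename> <base64(gzip(html))>'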
relay_message = "{0} {1}".format(filename, gzip64encoded)
self.p.populate_set_out(relay_message, 'Mixer')
# increase nb of paste by feeder name
self.r_serv_log_submit.hincrby("mixer_cache:list_feeder", "crawler", 1)
# tag crawled paste
msg = 'infoleak:submission="crawler";{}'.format(filename)
self.p.populate_set_out(msg, 'Tags')
return True


@ -0,0 +1 @@
www.facebookcorewwwi.onion


@ -0,0 +1,43 @@
#!/bin/bash
usage() { echo "Usage: sudo $0 [-f <config_absolute_path>] [-p <port_start>] [-n <number_of_splash_servers>]" 1>&2;
echo " -f: absolute path to splash docker proxy-profiles directory (used for proxy configuration)";
echo " -p: number of the first splash server port number. This number is incremented for the others splash server";
echo " -n: number of splash servers to start";
echo "";
echo "example:";
echo "sudo ./launch_splash_crawler.sh -f /home/my_user/AIL-framework/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 3";
exit 1;
}
while getopts ":p:f:n:" o; do
case "${o}" in
p)
p=${OPTARG}
;;
f)
f=${OPTARG}
;;
n)
n=${OPTARG}
;;
*)
usage
;;
esac
done
shift $((OPTIND-1))
if [ -z "${p}" ] || [ -z "${f}" ] || [ -z "${n}" ]; then
usage;
fi
screen -dmS "Docker_Splash"
sleep 0.1
for ((i=0;i<=$((${n} - 1));i++)); do
port_number=$((${p} + $i))
screen -S "Docker_Splash" -X screen -t "docker_splash:$port_number" bash -c 'sudo docker run -p '$port_number':8050 --cpus=1 --memory=4.5G -v '$f':/etc/splash/proxy-profiles/ --net="bridge" scrapinghub/splash; read x'
sleep 0.1
echo " Splash server launched on port $port_number"
done

bin/torcrawler/tor_crawler.py (new executable file, 34 lines)

@ -0,0 +1,34 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
import configparser
from TorSplashCrawler import TorSplashCrawler
if __name__ == '__main__':
if len(sys.argv) != 7:
print('usage:', 'tor_crawler.py', 'splash_url', 'type', 'url', 'domain', 'paste', 'super_father')
exit(1)
configfile = os.path.join(os.environ['AIL_BIN'], 'packages/config.cfg')
if not os.path.exists(configfile):
raise Exception('Unable to find the configuration file. \
Did you set environment variables? \
Or activate the virtualenv.')
cfg = configparser.ConfigParser()
cfg.read(configfile)
splash_url = sys.argv[1]
type = sys.argv[2]
crawler_depth_limit = cfg.getint("Crawler", "crawler_depth_limit")
url = sys.argv[3]
domain = sys.argv[4]
paste = sys.argv[5]
super_father = sys.argv[6]
crawler = TorSplashCrawler(splash_url, crawler_depth_limit)
crawler.crawl(type, url, domain, paste, super_father)


@ -0,0 +1,4 @@
[proxy]
host=172.17.0.1
port=9050
type=SOCKS5


@ -0,0 +1,56 @@
#!/bin/bash
install_docker() {
# install docker
sudo apt install docker.io;
# pull splash docker image
sudo docker pull scrapinghub/splash;
}
install_python_requirement() {
. ./AILENV/bin/activate;
pip3 install -U -r crawler_requirements.txt;
}
install_all() {
read -p "Do you want to install docker? (use local splash server) [y/n] " -n 1 -r
echo # (optional) move to a new line
if [[ $REPLY =~ ^[Yy]$ ]]
then
install_docker;
fi
install_python_requirement;
}
usage() {
echo "Usage: crawler_hidden_services_install.sh [-y | -n]" 1>&2;
echo " -y: install docker"
echo " -n: don't install docker"
echo ""
echo "example:"
echo "crawler_hidden_services_install.sh -y"
exit 1;
}
if [[ $1 == "" ]]; then
install_all;
exit;
else
key="$1"
case $key in
"")
install_all;
;;
-y|--yes)
install_docker;
install_python_requirement;
;;
-n|--no)
install_python_requirement;
;;
*) # unknown option
usage;
;;
esac
fi

crawler_requirements.txt (new file, 2 lines)

@ -0,0 +1,2 @@
scrapy
scrapy-splash


@ -0,0 +1,4 @@
[proxy]
host=localhost
port=9050
type=SOCKS5


@@ -1 +1,2 @@
onion
i2p


@@ -48,6 +48,9 @@ except IOError:
f = open('templates/ignored_modules.txt', 'w')
f.close()
activate_crawler = cfg.get("Crawler", "activate_crawler")
if activate_crawler != 'True':
toIgnoreModule.add('hiddenServices')
# Dynamically import routes and functions from modules
# Also, prepare header.html


@@ -96,6 +96,12 @@ r_serv_statistics = redis.StrictRedis(
db=cfg.getint("ARDB_Statistics", "db"),
decode_responses=True)
r_serv_onion = redis.StrictRedis(
host=cfg.get("ARDB_Onion", "host"),
port=cfg.getint("ARDB_Onion", "port"),
db=cfg.getint("ARDB_Onion", "db"),
decode_responses=True)
sys.path.append('../../configs/keys')
# MISP #
@@ -150,6 +156,7 @@ bootstrap_label = ['primary', 'success', 'danger', 'warning', 'info']
UPLOAD_FOLDER = os.path.join(os.environ['AIL_FLASK'], 'submitted')
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "pastes"))
SCREENSHOT_FOLDER = os.path.join(os.environ['AIL_HOME'], cfg.get("Directories", "crawled_screenshot"))
max_dashboard_logs = int(cfg.get("Flask", "max_dashboard_logs"))


@ -0,0 +1,169 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
'''
Flask functions and routes for the trending modules page
'''
import redis
import datetime
import sys
import os
from flask import Flask, render_template, jsonify, request, Blueprint
from Date import Date
from HiddenServices import HiddenServices
# ============ VARIABLES ============
import Flask_config
app = Flask_config.app
cfg = Flask_config.cfg
baseUrl = Flask_config.baseUrl
r_serv_onion = Flask_config.r_serv_onion
r_serv_metadata = Flask_config.r_serv_metadata
bootstrap_label = Flask_config.bootstrap_label
PASTES_FOLDER = Flask_config.PASTES_FOLDER
hiddenServices = Blueprint('hiddenServices', __name__, template_folder='templates')
# ============ FUNCTIONS ============
def one():
return 1
def get_date_range(num_day):
curr_date = datetime.date.today()
date = Date( '{}{}{}'.format(str(curr_date.year), str(curr_date.month).zfill(2), str(curr_date.day).zfill(2)) )
date_list = []
for i in range(0, num_day):
date_list.append(date.substract_day(i))
return list(reversed(date_list))
def unpack_paste_tags(p_tags):
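# split machine tags like infoleak:automatic-detection="onion" into (display value, full tag) pairs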
l_tags = []
for tag in p_tags:
complete_tag = tag
tag = tag.split('=')
if len(tag) > 1:
if tag[1] != '':
tag = tag[1][1:-1]
# no value
else:
tag = tag[0][1:-1]
# use for custom tags
else:
tag = tag[0]
l_tags.append( (tag, complete_tag) )
return l_tags
def get_onion_status(domain, date):
if r_serv_onion.sismember('onion_up:'+date , domain):
return True
else:
return False
# ============= ROUTES ==============
@hiddenServices.route("/hiddenServices/", methods=['GET'])
def hiddenServices_page():
last_onions = r_serv_onion.lrange('last_onion', 0 ,-1)
list_onion = []
now = datetime.datetime.now()
date = '{}{}{}'.format(now.strftime("%Y"), now.strftime("%m"), now.strftime("%d"))
statDomains = {}
statDomains['domains_up'] = r_serv_onion.scard('onion_up:{}'.format(date))
statDomains['domains_down'] = r_serv_onion.scard('onion_down:{}'.format(date))
statDomains['total'] = statDomains['domains_up'] + statDomains['domains_down']
statDomains['domains_queue'] = r_serv_onion.scard('onion_domain_crawler_queue')
for onion in last_onions:
metadata_onion = {}
metadata_onion['domain'] = onion
metadata_onion['last_check'] = r_serv_onion.hget('onion_metadata:{}'.format(onion), 'last_check')
metadata_onion['first_seen'] = r_serv_onion.hget('onion_metadata:{}'.format(onion), 'first_seen')
if get_onion_status(onion, metadata_onion['last_check']):
metadata_onion['status_text'] = 'UP'
metadata_onion['status_color'] = 'Green'
metadata_onion['status_icon'] = 'fa-check-circle'
else:
metadata_onion['status_text'] = 'DOWN'
metadata_onion['status_color'] = 'Red'
metadata_onion['status_icon'] = 'fa-times-circle'
list_onion.append(metadata_onion)
return render_template("hiddenServices.html", last_onions=list_onion, statDomains=statDomains)
@hiddenServices.route("/hiddenServices/onion_domain", methods=['GET'])
def onion_domain():
onion_domain = request.args.get('onion_domain')
if onion_domain is None or not r_serv_onion.exists('onion_metadata:{}'.format(onion_domain)):
return '404'
# # TODO: FIXME return 404
last_check = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'last_check')
last_check = '{}/{}/{}'.format(last_check[0:4], last_check[4:6], last_check[6:8])
first_seen = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'first_seen')
first_seen = '{}/{}/{}'.format(first_seen[0:4], first_seen[4:6], first_seen[6:8])
origin_paste = r_serv_onion.hget('onion_metadata:{}'.format(onion_domain), 'paste_parent')
h = HiddenServices(onion_domain, 'onion')
l_pastes = h.get_last_crawled_pastes()
if l_pastes:
status = True
else:
status = False
screenshot = h.get_domain_random_screenshot(l_pastes)
if screenshot:
screenshot = screenshot[0]
else:
screenshot = 'None'
domain_tags = h.get_domain_tags()
origin_paste_name = h.get_origin_paste_name()
origin_paste_tags = unpack_paste_tags(r_serv_metadata.smembers('tag:{}'.format(origin_paste)))
paste_tags = []
path_name = []
for path in l_pastes:
path_name.append(path.replace(PASTES_FOLDER+'/', ''))
p_tags = r_serv_metadata.smembers('tag:'+path)
paste_tags.append(unpack_paste_tags(p_tags))
return render_template("showDomain.html", domain=onion_domain, last_check=last_check, first_seen=first_seen,
l_pastes=l_pastes, paste_tags=paste_tags, bootstrap_label=bootstrap_label,
path_name=path_name, origin_paste_tags=origin_paste_tags, status=status,
origin_paste=origin_paste, origin_paste_name=origin_paste_name,
domain_tags=domain_tags, screenshot=screenshot)
@hiddenServices.route("/hiddenServices/onion_son", methods=['GET'])
def onion_son():
onion_domain = request.args.get('onion_domain')
h = HiddenServices(onion_domain, 'onion')
l_pastes = h.get_last_crawled_pastes()
l_son = h.get_domain_son(l_pastes)
print(l_son)
return 'l_son'
# ============= JSON ==============
@hiddenServices.route("/hiddenServices/domain_crawled_7days_json", methods=['GET'])
def domain_crawled_7days_json():
type = 'onion'
## TODO: # FIXME: 404 error
date_range = get_date_range(7)
json_domain_stats = []
#try:
for date in date_range:
nb_domain_up = r_serv_onion.scard('{}_up:{}'.format(type, date))
nb_domain_down = r_serv_onion.scard('{}_down:{}'.format(type, date))
date = date[0:4] + '-' + date[4:6] + '-' + date[6:8]
json_domain_stats.append({ 'date': date, 'value': int( nb_domain_up ), 'nb_domain_down': int( nb_domain_down )})
#except:
#return jsonify()
return jsonify(json_domain_stats)
# ========= REGISTRATION =========
app.register_blueprint(hiddenServices, url_prefix=baseUrl)


@ -0,0 +1 @@
<li id='page-hiddenServices'><a href="{{ url_for('hiddenServices.hiddenServices_page') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>


@ -0,0 +1,234 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Hidden Service - AIL</title>
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png') }}">
<!-- Core CSS -->
<link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='font-awesome/css/font-awesome.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='css/sb-admin-2.css') }}" rel="stylesheet">
<!-- JS -->
<script language="javascript" src="{{ url_for('static', filename='js/jquery.js')}}"></script>
<script language="javascript" src="{{ url_for('static', filename='js/d3.min.js') }}"></script>
<style>
div.tooltip {
position: absolute;
text-align: center;
padding: 2px;
font: 12px sans-serif;
background: #ebf4fb;
border: 2px solid #b7ddf2;
border-radius: 8px;
pointer-events: none;
color: #000000;
}
.line_graph {
fill: none;
stroke: steelblue;
stroke-width: 2px;
stroke-linejoin: round;
stroke-linecap: round;
stroke-width: 1.5;
}
</style>
</head>
<body>
{% include 'navbar.html' %}
<div id="page-wrapper">
<div class="row">
<div class="col-md-6">
<div class="panel panel-primary">
<div class="panel-heading">ONION</div>
<div class="panel-body">
<table class="table table-striped">
<thead>
<tr>
<th>Domain</th>
<th>First Seen</th>
<th>Last Check</th>
<th>Status</th>
</tr>
</thead>
<tbody>
{% for metadata_onion in last_onions %}
<tr>
<td><a target="_blank" href="{{ url_for('hiddenServices.onion_domain') }}?onion_domain={{ metadata_onion['domain'] }}">{{ metadata_onion['domain'] }}</a></td>
<td>{{'{}/{}/{}'.format(metadata_onion['first_seen'][0:4], metadata_onion['first_seen'][4:6], metadata_onion['first_seen'][6:8])}}</td>
<td>{{'{}/{}/{}'.format(metadata_onion['last_check'][0:4], metadata_onion['last_check'][4:6], metadata_onion['last_check'][6:8])}}</td>
<td><div style="color:{{metadata_onion['status_color']}}; display:inline-block">
<i class="fa {{metadata_onion['status_icon']}} fa-2x"></i>
{{metadata_onion['status_text']}}
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
<div id="graph_line">
</div>
</div>
</div>
</div>
<div class="col-md-6">
<div class="panel panel-info">
<div class="panel-heading">
<i class="fa fa-eye-slash"></i> Domains Crawled Today
</div>
<table class="table table-hover table-striped">
<tbody>
<tr>
<td>
<div style="color:Green; display:inline-block">
<i class="fa fa-check-circle fa-2x"></i>
Domains UP
</div>
</td>
<td>
<div style="color:Green; display:inline-block">
{{ statDomains['domains_up'] }}
</div>
</td>
</tr>
<tr>
<td>
<div style="color:Red; display:inline-block">
<i class="fa fa-times-circle fa-2x"></i>
Domains DOWN
</div>
</td>
<td>
<div style="color:Red; display:inline-block">
{{ statDomains['domains_down'] }}
</div>
</td>
</tr>
<tr>
<td>Crawled Domains</td>
<td>{{ statDomains['total'] }}</td>
</tr>
<tr>
<td>Domains in Queue</td>
<td>{{ statDomains['domains_queue'] }}</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
<!-- /#page-wrapper -->
<script>
var all_graph = {};
$(document).ready(function(){
activePage = "page-hiddenServices"
$("#"+activePage).addClass("active");
all_graph.line_chart = create_line_chart('graph_line', "{{ url_for('hiddenServices.domain_crawled_7days_json') }}?type=onion");
});
$(window).on("resize", function() {
all_graph.onResize();
});
</script>
<script>
all_graph.onResize = function () {
var aspect = 1000 / 500, all_graph = $("#graph_div");
var targetWidth = all_graph.parent().width();
all_graph.attr("width", targetWidth);
all_graph.attr("height", targetWidth / aspect);
}
window.all_graph = all_graph;
function create_line_chart(id, url){
var width = 650;
var height = Math.round(width / 6);
var margin = {top: 20, right: 55, bottom: 50, left: 40};
var x = d3.scaleTime().range([0, width]);
var y = d3.scaleLinear().rangeRound([height, 0]);
var xAxis = d3.axisBottom(x);
var yAxis = d3.axisLeft(y);
var parseTime = d3.timeParse("%Y-%m-%d");
var line = d3.line()
.x(function(d) {
return x(d.date);
})
.y(function(d) {
return y(d.value);
});
var svg_line = d3.select('#'+id).append('svg')
.attr("id", "graph_div")
.attr("width", width + margin.left + margin.right)
.attr("height", height + margin.top + margin.bottom)
.append('g')
.attr('transform', "translate("+ margin.left +","+ margin.top +")");
var div = d3.select('body').append('div')
.attr('class', 'tooltip')
.style('opacity', 0);
//add div tooltip
d3.json(url)
.then(function(data){
data.forEach(function(d) {
d.date_label = d.date;
d.date = parseTime(d.date);
d.value = +d.value;
});
// fit the data
x.domain(d3.extent(data, function(d) { return d.date; }));
//x.domain(data.map(function (d) { return d.date; })); //E
y.domain([0, d3.max(data, function(d){ return d.value ; })]);
//line
svg_line.append("path")
.data([data])
.attr("class", "line_graph")
.attr("d", line);
// add X axis
svg_line.append("g")
.attr("transform", "translate(0," + height + ")")
.call(d3.axisBottom(x))
.selectAll("text")
.style("text-anchor", "end")
.attr("transform", "rotate(-45)" );
// Add the Y Axis
svg_line.append("g")
.call(d3.axisLeft(y));
//add a dot circle
svg_line.selectAll('dot')
.data(data).enter()
.append('circle')
.attr('r', 2)
.attr('cx', function(d) { return x(d.date); })
.attr('cy', function(d) { return y(d.value); })
.on('mouseover', function(d) {
div.transition().style('opacity', .9);
div.html('' + d.date_label+ '<br/>' + d.value).style('left', (d3.event.pageX) + 'px')
.style("left", (d3.event.pageX) + "px")
.style("top", (d3.event.pageY - 28) + "px");
})
.on('mouseout', function(d)
{
div.transition().style('opacity', 0);
});
});
}
</script>
</body>
</html>


@ -0,0 +1,213 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Show Domain - AIL</title>
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png') }}">
<!-- Core CSS -->
<link href="{{ url_for('static', filename='css/bootstrap.min.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='font-awesome/css/font-awesome.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='css/sb-admin-2.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='css/dygraph_gallery.css') }}" rel="stylesheet" type="text/css" />
<!-- JS -->
<script type="text/javascript" src="{{ url_for('static', filename='js/dygraph-combined.js') }}"></script>
<script language="javascript" src="{{ url_for('static', filename='js/jquery.js')}}"></script>
<script src="{{ url_for('static', filename='js/jquery.dataTables.min.js')}}"></script>
<script src="{{ url_for('static', filename='js/dataTables.bootstrap.js')}}"></script>
<style>
.test thead{
background: #d91f2d;
color: #fff;
}
</style>
</head>
<body>
{% include 'navbar.html' %}
<div id="page-wrapper">
<div class="row">
<div class="col-md-6">
<div class="row">
<div class="panel panel-info">
<div class="panel-heading">
{% if status %}
<div class="pull-right" style="color:Green;">
<i class="fa fa-check-circle fa-2x"></i>
UP
</div>
{% else %}
<div class="pull-right" style="color:Red;">
<i class="fa fa-times-circle fa-2x"></i>
DOWN
</div>
{% endif %}
<h3>{{ domain }} :</h3>
<ul class="list-group">
<li class="list-group-item">
<table class="table table-condensed">
<thead>
<tr>
<th>First Seen</th>
<th>Last Check</th>
</tr>
</thead>
<tbody>
<tr>
<td class="panelText"><a href="#">{{ first_seen }}</a></td>
<td class="panelText"><a href="#">{{ last_check }}</a></td>
</tr>
</tbody>
</table>
</li>
<li class="list-group-item">
Origin Paste: <a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste', paste=origin_paste) }}">{{ origin_paste_name }}</a>
<div>
{% for tag in origin_paste_tags %}
<a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag[1] }}">
<span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag[0] }}</span>
</a>
{% endfor %}
<br>
</div>
</li>
</ul>
</div>
</div>
<div>
{% for tag in domain_tags %}
<a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag }}">
<span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag }} <i>{{ domain_tags[tag] }}</i></span>
</a>
{% endfor %}
<br>
<br>
</div>
<table class="test table table-striped table-bordered table-hover table-responsive " id="myTable_">
<thead>
<tr>
<th style="max-width: 800px;">Crawled Pastes</th>
</tr>
</thead>
<tbody>
{% for path in l_pastes %}
<tr>
<td><a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste') }}?paste={{path}}">{{ path_name[loop.index0] }}</a>
<div>
{% for tag in paste_tags[loop.index0] %}
<a href="{{ url_for('Tags.get_tagged_paste') }}?ltags={{ tag[1] }}">
<span class="label label-{{ bootstrap_label[loop.index0 % 5] }} pull-left">{{ tag[0] }}</span>
</a>
{% endfor %}
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<div class="col-md-6">
<div class="panel panel-info" style="text-align:center;">
<div class="panel-heading">
<div class="row">
<div class="col-md-8">
<input class="center" id="blocks" type="range" min="1" max="50" value="13">
</div>
<div class="col-md-4">
<button class="btn btn-primary btn-tags" onclick="blocks.value=50;pixelate();">
<span class="glyphicon glyphicon-zoom-in"></span>
<span class="label-icon">Full resolution</span>
</button>
</div>
</div>
</div>
</div>
<canvas id="canvas" style="width:100%;"></canvas>
</div>
</div>
</div>
<!-- /#page-wrapper -->
<script>
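  // Highlight the hidden-services entry in the navbar and initialise the
  // crawled-pastes DataTable (5 rows per page by default, sorted in descending order).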
$(document).ready(function(){
activePage = "page-hiddenServices";
$("#"+activePage).addClass("active");
table = $('#myTable_').DataTable(
{
"aLengthMenu": [[5, 10, 15, 20, -1], [5, 10, 15, 20, "All"]],
"iDisplayLength": 5,
"order": [[ 0, "desc" ]]
}
);
});
</script>
<script>
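  // Pixelated preview of the crawled screenshot: the image is drawn onto the canvas
  // at a reduced size and then scaled back up with image smoothing disabled, so the
  // "blocks" slider controls how coarse the preview is (50 = full resolution).
  // If the screenshot cannot be loaded, the AIL logo is shown instead.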
var ctx = canvas.getContext('2d'), img = new Image();
/// turn off image smoothing
ctx.webkitImageSmoothingEnabled = false;
ctx.imageSmoothingEnabled = false;
img.onload = pixelate;
img.addEventListener("error", img_error);
var draw_img = false;
img.src = "{{ url_for('showsavedpastes.screenshot', filename=screenshot) }}";
function pixelate() {
/// use slider value (50 = full resolution)
var size;
if (blocks.value == 50) {
size = 1;
} else {
size = blocks.value * 0.01;
}
canvas.width = img.width;
canvas.height = img.height;
/// cache scaled width and height
w = canvas.width * size;
h = canvas.height * size;
/// draw original image to the scaled size
ctx.drawImage(img, 0, 0, w, h);
/// pixelated
ctx.drawImage(canvas, 0, 0, w, h, 0, 0, canvas.width, canvas.height);
}
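// Re-render the preview whenever the slider value changes.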
blocks.addEventListener('change', pixelate, false);
function img_error() {
img.onerror=null;
img.src="{{ url_for('static', filename='image/AIL.png') }}";
blocks.value = 50;
pixelate();
}
</script>
</body>
</html>

View file

@@ -8,7 +8,7 @@ import redis
 import json
 import os
 import flask
-from flask import Flask, render_template, jsonify, request, Blueprint, make_response, redirect, url_for, Response
+from flask import Flask, render_template, jsonify, request, Blueprint, make_response, Response, send_from_directory, redirect, url_for
 import difflib
 import ssdeep
@@ -25,6 +25,7 @@ r_serv_pasteName = Flask_config.r_serv_pasteName
 r_serv_metadata = Flask_config.r_serv_metadata
 r_serv_tags = Flask_config.r_serv_tags
 r_serv_statistics = Flask_config.r_serv_statistics
+r_serv_onion = Flask_config.r_serv_onion
 max_preview_char = Flask_config.max_preview_char
 max_preview_modal = Flask_config.max_preview_modal
 DiffMaxLineLength = Flask_config.DiffMaxLineLength
@@ -33,6 +34,7 @@ misp_event_url = Flask_config.misp_event_url
 hive_case_url = Flask_config.hive_case_url
 vt_enabled = Flask_config.vt_enabled
 PASTES_FOLDER = Flask_config.PASTES_FOLDER
+SCREENSHOT_FOLDER = Flask_config.SCREENSHOT_FOLDER
 showsavedpastes = Blueprint('showsavedpastes', __name__, template_folder='templates')
@@ -41,6 +43,8 @@ showsavedpastes = Blueprint('showsavedpastes', __name__, template_folder='templa
 def showpaste(content_range, requested_path):
     if PASTES_FOLDER not in requested_path:
         requested_path = os.path.join(PASTES_FOLDER, requested_path)
+        # remove old full path
+        #requested_path = requested_path.replace(PASTES_FOLDER, '')
     # escape directory transversal
     if os.path.commonprefix((os.path.realpath(requested_path),PASTES_FOLDER)) != PASTES_FOLDER:
         return 'path transversal detected'
@@ -175,6 +179,16 @@ def showpaste(content_range, requested_path):
             l_64.append( (file_icon, estimated_type, hash, saved_path, nb_in_file, b64_vt, b64_vt_link, b64_vt_report) )
+    crawler_metadata = {}
+    if 'infoleak:submission="crawler"' in l_tags:
+        crawler_metadata['get_metadata'] = True
+        crawler_metadata['domain'] = r_serv_metadata.hget('paste_metadata:'+requested_path, 'domain')
+        crawler_metadata['paste_father'] = r_serv_metadata.hget('paste_metadata:'+requested_path, 'father')
+        crawler_metadata['real_link'] = r_serv_metadata.hget('paste_metadata:'+requested_path,'real_link')
+        crawler_metadata['screenshot'] = paste.get_p_rel_path()
+    else:
+        crawler_metadata['get_metadata'] = False
     if Flask_config.pymisp is False:
         misp = False
     else:
@@ -202,6 +216,7 @@ def showpaste(content_range, requested_path):
         hive_url = hive_case_url.replace('id_here', hive_case)
     return render_template("show_saved_paste.html", date=p_date, bootstrap_label=bootstrap_label, active_taxonomies=active_taxonomies, active_galaxies=active_galaxies, list_tags=list_tags, source=p_source, encoding=p_encoding, language=p_language, size=p_size, mime=p_mime, lineinfo=p_lineinfo, content=p_content, initsize=len(p_content), duplicate_list = p_duplicate_list, simil_list = p_simil_list, hashtype_list = p_hashtype_list, date_list=p_date_list,
+                           crawler_metadata=crawler_metadata,
                            l_64=l_64, vt_enabled=vt_enabled, misp=misp, hive=hive, misp_eventid=misp_eventid, misp_url=misp_url, hive_caseid=hive_caseid, hive_url=hive_url)
 # ============ ROUTES ============
@@ -250,6 +265,10 @@ def showDiff():
     the_html = htmlD.make_file(lines1, lines2)
     return the_html
+@showsavedpastes.route('/screenshot/<path:filename>')
+def screenshot(filename):
+    return send_from_directory(SCREENSHOT_FOLDER, filename+'.png', as_attachment=True)
 @showsavedpastes.route('/send_file_to_vt/', methods=['POST'])
 def send_file_to_vt():
     b64_path = request.form['b64_path']

View file

@@ -351,7 +351,6 @@
<div class="panel-body" id="panel-body">
{% if duplicate_list|length == 0 %}
-<h3> No Duplicate </h3>
{% else %}
<h3> Duplicate list: </h3>
<table id="tableDup" class="table table-striped table-bordered">
@@ -420,6 +419,57 @@
</table>
{% endif %}
{% if crawler_metadata['get_metadata'] %}
<div class="row">
<div class="col-md-5">
<div class="row">
<div class="panel panel-info">
<div class="panel-heading">
<i class="fa fa-eye-slash"></i> Crawled Paste
</div>
<table class="table table-hover table-striped">
<tbody>
<tr>
<td>Domain</td>
<td><a target="_blank" href="{{ url_for('hiddenServices.onion_domain') }}?onion_domain={{ crawler_metadata['domain'] }}" id='onion_domain'>{{ crawler_metadata['domain'] }}</a></td>
</tr>
<tr>
<td>Father</td>
<td><a target="_blank" href="{{ url_for('showsavedpastes.showsavedpaste') }}?paste={{ crawler_metadata['paste_father'] }}" id='paste_father'>{{ crawler_metadata['paste_father'] }}</a></td>
</tr>
<tr>
<td>Source link</td>
<td>{{ crawler_metadata['real_link'] }}</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div class="col-md-7">
<div class="panel panel-info" style="text-align:center;">
<div class="panel-heading">
<div class="row">
<div class="col-md-8">
<input class="center" id="blocks" type="range" min="1" max="50" value="13">
</div>
<div class="col-md-4">
<button class="btn btn-primary btn-tags" onclick="blocks.value=50;pixelate();">
<span class="glyphicon glyphicon-zoom-in"></span>
<span class="label-icon">Full resolution</span>
</button>
</div>
</div>
</div>
</div>
<canvas id="canvas" style="width:100%;"></canvas>
</div>
</div>
{% endif %}
<h3> Content: </h3>
<a href="{{ url_for('showsavedpastes.showsavedrawpaste') }}?paste={{ request.args.get('paste') }}" id='raw_paste' > [Raw content] </a>
<p data-initsize="{{ initsize }}"> <pre id="paste-holder">{{ content }}</pre></p>
@@ -535,5 +585,51 @@
{% endfor %}
</script>
<script>
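// Same pixelated-screenshot preview as on the hidden-service domain page,
// fed here by the screenshot associated with the displayed crawled paste.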
var ctx = canvas.getContext('2d'), img = new Image();
/// turn off image smoothing
ctx.webkitImageSmoothingEnabled = false;
ctx.imageSmoothingEnabled = false;
img.onload = pixelate;
img.addEventListener("error", img_error);
var draw_img = false;
img.src = "{{ url_for('showsavedpastes.screenshot', filename=crawler_metadata['screenshot']) }}";
function pixelate() {
/// use slider value (50 = full resolution)
var size;
if (blocks.value == 50) {
size = 1;
} else {
size = blocks.value * 0.01;
}
canvas.width = img.width;
canvas.height = img.height;
/// cache scaled width and height
w = canvas.width * size;
h = canvas.height * size;
/// draw original image to the scaled size
ctx.drawImage(img, 0, 0, w, h);
/// pixelated
ctx.drawImage(canvas, 0, 0, w, h, 0, 0, canvas.width, canvas.height);
}
function img_error() {
img.onerror=null;
img.src="{{ url_for('static', filename='image/AIL.png') }}";
blocks.value = 50;
pixelate();
}
blocks.addEventListener('change', pixelate, false);
</script>
</html>