Merge pull request #89 from ail-project/crawler_manager

Crawler manager
Alexandre Dulaunoy, 2021-03-25 15:58:17 +01:00, committed by GitHub
commit fa05244ee0
32 changed files with 1566 additions and 412 deletions


@ -89,76 +89,34 @@ Also, you can quickly stop or start modules by clicking on the ``<K>`` or ``<S>`
Finally, you can quit this program by pressing either ``<q>`` or ``<C-c>``.
Terms frequency usage
---------------------
In AIL, you can track terms, sets of terms and even regexes without creating a dedicated module. To do so, go to the `Terms Frequency` tab in the web interface.
- You can track a term by simply putting it in the box.
- You can track a set of terms by putting the terms in an array surrounded by the '\' character. You can also set a custom threshold for the number of terms that must match to trigger the detection. For example, to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
- You can track regexes as easily as a single term. Just put your regex in the box surrounded by the '/' character. For example, to match all email addresses on the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`.
Crawler
---------------------
In AIL, you can crawl Tor hidden services. Don't forget to review the proxy configuration of your Tor client, in particular whether the SOCKS5 proxy is enabled and bound to an IP address reachable from the Docker containers where Splash runs.
There are two types of installation. You can install a *local* or a *remote* Splash server.
``(Splash host) = the server running the splash service``
``(AIL host) = the server running AIL``
### Installation/Configuration
1. *(Splash host)* Launch ``crawler_hidden_services_install.sh`` to install all requirements (type ``y`` if a localhost splash server is used or use the ``-y`` option)
2. *(Splash host)* To install and setup your tor proxy:
- Install the tor proxy: ``sudo apt-get install tor -y``
(Not required if ``Splash host == AIL host`` - The tor proxy is installed by default in AIL)
(Warning: Some v3 onion addresses are not resolved by the tor proxy installed via apt. Use the tor proxy provided by [The torproject](https://2019.www.torproject.org/docs/debian) to solve this issue)
- Allow Tor to bind to any interface or to the Docker interface (it binds to 127.0.0.1 only by default) in ``/etc/tor/torrc``:
``SOCKSPort 0.0.0.0:9050`` or
``SOCKSPort 172.17.0.1:9050``
- Add the line ``SOCKSPolicy accept 172.17.0.0/16`` to ``/etc/tor/torrc``
(on Linux, the host's Docker bridge IP is *172.17.0.1*; adapt this for other platforms)
- Restart the tor proxy: ``sudo service tor restart``
3. *(AIL host)* Edit the ``/configs/core.cfg`` file:
- In the crawler section, set ``activate_crawler`` to ``True``
- Change the IP address of Splash servers if needed (remote only)
- Set ``splash_onion_port`` to the port numbers used by your Splash servers.
Those port numbers can be given as a single port (e.g. 8050) or as a port range (e.g. 8050-8052 for ports 8050, 8051 and 8052); see the configuration sketch below.
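A minimal sketch of the corresponding ``[Crawler]`` entries in ``/configs/core.cfg``, assuming a single local Splash host; the ``splash_url`` value is illustrative (it only matters for a remote setup):
```
[Crawler]
activate_crawler = True
# address of the Splash server(s); change it for a remote setup
splash_url = http://127.0.0.1
# a single port (8050) or a port range covering all Splash servers
splash_onion_port = 8050-8052
```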
### Installation
### Starting the scripts
[Install AIL-Splash-Manager](https://github.com/ail-project/ail-splash-manager)
- *(Splash host)* Launch all Splash servers with:
```sudo ./bin/torcrawler/launch_splash_crawler.sh -f <config absolute_path> -p <port_start> -n <number_of_splash>```
With ``<port_start>`` and ``<number_of_splash>`` matching the ``splash_onion_port`` values set in step 3 (``/configs/core.cfg``).
### Configuration
All Splash Docker containers are launched inside the ``Docker_Splash`` screen. You can use ``sudo screen -r Docker_Splash`` to attach to the screen session and check the status of all Splash servers.
- *(AIL host)* Launch all AIL crawler scripts with:
```./bin/LAUNCH.sh -c```
1. Retrieve the Splash-Manager API key. This key is generated the first time you launch the manager
(it is located in your Splash Manager directory, ``ail-splash-manager/token_admin.txt``).
### TL;DR - Local setup
2. Set the Splash Manager URL and API key:
In the web interface, go to ``Crawlers>Settings`` and click on the Edit button.
![Splash Manager Config](./doc/screenshots/splash_manager_config_edit_1.png?raw=true "AIL framework Splash Manager Config")
#### Installation
- ```crawler_hidden_services_install.sh -y```
- Add the line ``SOCKSPolicy accept 172.17.0.0/16`` to ``/etc/tor/torrc``
- ```sudo service tor restart```
- Set ``activate_crawler`` to ``True`` in ``/configs/core.cfg``
#### Start
- ```sudo ./bin/torcrawler/launch_splash_crawler.sh -f $AIL_HOME/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1```
![Splash Manager Config](./doc/screenshots/splash_manager_config_edit_2.png?raw=true "AIL framework Splash Manager Config")
If the AIL framework is not running, start it before the crawler service:
3. Launch AIL Crawlers:
Choose the number of crawlers you want to launch
![Splash Manager Nb Crawlers Config](./doc/screenshots/splash_manager_nb_crawlers_1.png?raw=true "AIL framework Nb Crawlers Config")
![Splash Manager Nb Crawlers Config](./doc/screenshots/splash_manager_nb_crawlers_2.png?raw=true "AIL framework Nb Crawlers Config")
- ```./bin/LAUNCH.sh -l```
Then start the crawler service (if you followed the procedure above):
- ```./bin/LAUNCH.sh -c```
#### Old updates


@ -420,6 +420,33 @@ Supported cryptocurrency:
}
```
### Splash containers and proxies:
| SET - Key | Value |
| ------ | ------ |
| all_proxy | **proxy name** |
| all_splash | **splash name** |
| HSET - Key | Field | Value |
| ------ | ------ | ------ |
| proxy:metadata:**proxy name** | host | **host** |
| proxy:metadata:**proxy name** | port | **port** |
| proxy:metadata:**proxy name** | type | **type** |
| proxy:metadata:**proxy name** | crawler_type | **crawler_type** |
| proxy:metadata:**proxy name** | description | **proxy description** |
| | | |
| splash:metadata:**splash name** | description | **splash description** |
| splash:metadata:**splash name** | crawler_type | **crawler_type** |
| splash:metadata:**splash name** | proxy | **splash proxy (None if null)** |
| SET - Key | Value |
| ------ | ------ |
| splash:url:**container name** | **splash url** |
| proxy:splash:**proxy name** | **container name** |
| Key | Value |
| ------ | ------ |
| splash:map:url:name:**splash url** | **container name** |
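A minimal sketch of how these keys can be read with redis-py; the connection parameters below are assumptions (AIL resolves them through ``ConfigLoader`` and the ``ARDB_Onion`` section):
```python
import redis

# Illustrative connection parameters; AIL obtains this handle via ConfigLoader ("ARDB_Onion").
r = redis.Redis(host='localhost', port=6382, db=0, decode_responses=True)

for proxy_name in r.smembers('all_proxy'):
    meta = r.hgetall(f'proxy:metadata:{proxy_name}')        # host, port, type, crawler_type, description
    containers = r.smembers(f'proxy:splash:{proxy_name}')   # container names using this proxy
    print(proxy_name, meta, containers)

for splash_name in r.smembers('all_splash'):
    # description, crawler_type, proxy (None if no proxy)
    print(splash_name, r.hgetall(f'splash:metadata:{splash_name}'))
```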
##### CRAWLER QUEUES:
| SET - Key | Value |
| ------ | ------ |


@ -19,6 +19,9 @@ sys.path.append(os.environ['AIL_BIN'])
from Helper import Process
from pubsublogger import publisher
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import crawlers
# ======== FUNCTIONS ========
def load_blacklist(service_type):
@ -117,43 +120,6 @@ def unpack_url(url):
return to_crawl
# get url, paste and service_type to crawl
def get_elem_to_crawl(rotation_mode):
message = None
domain_service_type = None
#load_priority_queue
for service_type in rotation_mode:
message = redis_crawler.spop('{}_crawler_priority_queue'.format(service_type))
if message is not None:
domain_service_type = service_type
break
#load_discovery_queue
if message is None:
for service_type in rotation_mode:
message = redis_crawler.spop('{}_crawler_discovery_queue'.format(service_type))
if message is not None:
domain_service_type = service_type
break
#load_normal_queue
if message is None:
for service_type in rotation_mode:
message = redis_crawler.spop('{}_crawler_queue'.format(service_type))
if message is not None:
domain_service_type = service_type
break
if message:
splitted = message.rsplit(';', 1)
if len(splitted) == 2:
url, paste = splitted
if paste:
paste = paste.replace(PASTES_FOLDER+'/', '')
message = {'url': url, 'paste': paste, 'type_service': domain_service_type, 'original_message': message}
return message
def get_crawler_config(redis_server, mode, service_type, domain, url=None):
crawler_options = {}
if mode=='auto':
@ -175,14 +141,17 @@ def get_crawler_config(redis_server, mode, service_type, domain, url=None):
redis_server.delete('crawler_config:{}:{}:{}'.format(mode, service_type, domain))
return crawler_options
def load_crawler_config(service_type, domain, paste, url, date):
def load_crawler_config(queue_type, service_type, domain, paste, url, date):
crawler_config = {}
crawler_config['splash_url'] = splash_url
crawler_config['splash_url'] = f'http://{splash_url}'
crawler_config['item'] = paste
crawler_config['service_type'] = service_type
crawler_config['domain'] = domain
crawler_config['date'] = date
if queue_type and queue_type != 'tor':
service_type = queue_type
# Auto and Manual Crawling
# Auto ################################################# create new entry, next crawling => here or when ended ?
if paste == 'auto':
@ -224,26 +193,29 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
crawler_config['port'] = port
print('Launching Crawler: {}'.format(url))
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'crawling_domain', domain)
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'crawling_domain', domain)
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
retry = True
nb_retry = 0
while retry:
try:
r = requests.get(splash_url , timeout=30.0)
r = requests.get(f'http://{splash_url}' , timeout=30.0)
retry = False
except Exception:
# TODO: relaunch docker or send error message
nb_retry += 1
if nb_retry == 2:
crawlers.restart_splash_docker(splash_url, splash_name)
if nb_retry == 6:
on_error_send_message_back_in_queue(type_service, domain, message)
publisher.error('{} SPASH DOWN'.format(splash_url))
print('--------------------------------------')
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
print(' {} DOWN'.format(splash_url))
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'SPLASH DOWN')
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'SPLASH DOWN')
nb_retry == 0
print(' \033[91m DOCKER SPLASH NOT AVAILABLE\033[0m')
@ -251,7 +223,7 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
time.sleep(10)
if r.status_code == 200:
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Crawling')
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Crawling')
# save config in cache
UUID = str(uuid.uuid4())
r_cache.set('crawler_request:{}'.format(UUID), json.dumps(crawler_config))
@ -273,8 +245,10 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
print('')
print(' PROXY DOWN OR BAD CONFIGURATION\033[0m'.format(splash_url))
print('------------------------------------------------------------------------')
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Error')
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Error')
exit(-2)
else:
crawlers.update_splash_manager_connection_status(True)
else:
print(process.stdout.read())
exit(-1)
@ -283,7 +257,7 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
print('--------------------------------------')
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
print(' {} DOWN'.format(splash_url))
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Crawling')
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Crawling')
exit(1)
# check external links (full_crawl)
@ -305,13 +279,27 @@ def search_potential_source_domain(type_service, domain):
if __name__ == '__main__':
if len(sys.argv) != 2:
print('usage:', 'Crawler.py', 'splash_port')
print('usage:', 'Crawler.py', 'splash_url')
exit(1)
##################################################
#mode = sys.argv[1]
splash_port = sys.argv[1]
splash_url = sys.argv[1]
splash_name = crawlers.get_splash_name_by_url(splash_url)
proxy_name = crawlers.get_splash_proxy(splash_name)
crawler_type = crawlers.get_splash_crawler_type(splash_name)
print(f'SPLASH Name: {splash_name}')
print(f'Proxy Name: {proxy_name}')
print(f'Crawler Type: {crawler_type}')
#time.sleep(10)
#sys.exit(0)
#rotation_mode = deque(['onion', 'regular'])
all_crawler_queues = crawlers.get_crawler_queue_types_by_splash_name(splash_name)
rotation_mode = deque(all_crawler_queues)
print(rotation_mode)
rotation_mode = deque(['onion', 'regular'])
default_proto_map = {'http': 80, 'https': 443}
######################################################## add ftp ???
@ -323,7 +311,6 @@ if __name__ == '__main__':
# Setup the I/O queues
p = Process(config_section)
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url"), splash_port)
print('splash url: {}'.format(splash_url))
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], p.config.get("Directories", "pastes"))
@ -346,7 +333,7 @@ if __name__ == '__main__':
db=p.config.getint("ARDB_Onion", "db"),
decode_responses=True)
faup = Faup()
faup = crawlers.get_faup()
# get HAR files
default_crawler_har = p.config.getboolean("Crawler", "default_crawler_har")
@ -372,9 +359,9 @@ if __name__ == '__main__':
'user_agent': p.config.get("Crawler", "default_crawler_user_agent")}
# Track launched crawler
r_cache.sadd('all_crawler', splash_port)
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Waiting')
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
r_cache.sadd('all_splash_crawlers', splash_url)
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Waiting')
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
# update hardcoded blacklist
load_blacklist('onion')
@ -385,7 +372,7 @@ if __name__ == '__main__':
update_auto_crawler()
rotation_mode.rotate()
to_crawl = get_elem_to_crawl(rotation_mode)
to_crawl = crawlers.get_elem_to_crawl_by_queue_type(rotation_mode)
if to_crawl:
url_data = unpack_url(to_crawl['url'])
# remove domain from queue
@ -408,9 +395,9 @@ if __name__ == '__main__':
'epoch': int(time.time())}
# Update crawler status type
r_cache.sadd('{}_crawlers'.format(to_crawl['type_service']), splash_port)
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'type', to_crawl['type_service'])
crawler_config = load_crawler_config(to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
crawler_config = load_crawler_config(to_crawl['queue_type'], to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
# check if default crawler
if not crawler_config['requested']:
# Auto crawl only if service not up this month
@ -456,11 +443,11 @@ if __name__ == '__main__':
redis_crawler.ltrim('last_{}'.format(to_crawl['type_service']), 0, 15)
#update crawler status
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Waiting')
r_cache.hdel('metadata_crawler:{}'.format(splash_port), 'crawling_domain')
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Waiting')
r_cache.hdel('metadata_crawler:{}'.format(splash_url), 'crawling_domain')
# Update crawler status type
r_cache.srem('{}_crawlers'.format(to_crawl['type_service']), splash_port)
r_cache.hdel('metadata_crawler:{}'.format(splash_url), 'type', to_crawl['type_service'])
# add next auto Crawling in queue:
if to_crawl['paste'] == 'auto':


@ -150,6 +150,8 @@ function launching_scripts {
# LAUNCH CORE MODULE
screen -S "Script_AIL" -X screen -t "JSON_importer" bash -c "cd ${AIL_BIN}/import; ${ENV_PY} ./JSON_importer.py; read x"
sleep 0.1
screen -S "Script_AIL" -X screen -t "Crawler_manager" bash -c "cd ${AIL_BIN}/core; ${ENV_PY} ./Crawler_manager.py; read x"
sleep 0.1
screen -S "Script_AIL" -X screen -t "ModuleInformation" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./ModulesInformationV2.py -k 0 -c 1; read x"
@ -198,8 +200,8 @@ function launching_scripts {
sleep 0.1
screen -S "Script_AIL" -X screen -t "Tools" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Tools.py; read x"
sleep 0.1
screen -S "Script_AIL" -X screen -t "Phone" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Phone.py; read x"
sleep 0.1
#screen -S "Script_AIL" -X screen -t "Phone" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Phone.py; read x"
#sleep 0.1
#screen -S "Script_AIL" -X screen -t "Release" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Release.py; read x"
#sleep 0.1
screen -S "Script_AIL" -X screen -t "Cve" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Cve.py; read x"

bin/core/Crawler_manager.py (new executable file, 66 lines)

@ -0,0 +1,66 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*
import os
import sys
import time
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import ConfigLoader
import crawlers
config_loader = ConfigLoader.ConfigLoader()
r_serv_metadata = config_loader.get_redis_conn("ARDB_Metadata")
config_loader = None
# # TODO: launch me in core screen
# # TODO: check if already launched in tor screen
# # TODO: handle multiple splash_manager
if __name__ == '__main__':
is_manager_connected = crawlers.ping_splash_manager()
if not is_manager_connected:
print('Error, Can\'t connect to Splash manager')
session_uuid = None
else:
print('Splash manager connected')
session_uuid = crawlers.get_splash_manager_session_uuid()
is_manager_connected = crawlers.reload_splash_and_proxies_list()
print(is_manager_connected)
if is_manager_connected:
crawlers.relaunch_crawlers()
last_check = int(time.time())
while True:
# # TODO: avoid multiple ping
# check if manager is connected
if int(time.time()) - last_check > 60:
is_manager_connected = crawlers.is_splash_manager_connected()
current_session_uuid = crawlers.get_splash_manager_session_uuid()
# reload proxy and splash list
if current_session_uuid and current_session_uuid != session_uuid:
is_manager_connected = crawlers.reload_splash_and_proxies_list()
if is_manager_connected:
print('reload proxies and splash list')
crawlers.relaunch_crawlers()
session_uuid = current_session_uuid
if not is_manager_connected:
print('Error, Can\'t connect to Splash manager')
last_check = int(time.time())
# # TODO: launch crawlers if the manager was never connected
# refresh splash and proxy list
elif False:
crawlers.reload_splash_and_proxies_list()
print('list of splash and proxies refreshed')
else:
time.sleep(5)
# kill/launch new crawler / crawler manager check if already launched
# # TODO: handle multiple splash_manager
# catch reload request


@ -4,6 +4,7 @@
import os
import subprocess
import sys
import re
all_screen_name = set()
@ -16,8 +17,11 @@ def is_screen_install():
print(p.stderr)
return False
def exist_screen(screen_name):
cmd_1 = ['screen', '-ls']
def exist_screen(screen_name, with_sudoer=False):
if with_sudoer:
cmd_1 = ['sudo', 'screen', '-ls']
else:
cmd_1 = ['screen', '-ls']
cmd_2 = ['egrep', '[0-9]+.{}'.format(screen_name)]
p1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
p2 = subprocess.Popen(cmd_2, stdin=p1.stdout, stdout=subprocess.PIPE)
@ -27,6 +31,36 @@ def exist_screen(screen_name):
return True
return False
def get_screen_pid(screen_name, with_sudoer=False):
if with_sudoer:
cmd_1 = ['sudo', 'screen', '-ls']
else:
cmd_1 = ['screen', '-ls']
cmd_2 = ['egrep', '[0-9]+.{}'.format(screen_name)]
p1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
p2 = subprocess.Popen(cmd_2, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]
if output:
# extract pids with screen name
regex_pid_screen_name = b'[0-9]+.' + screen_name.encode()
pids = re.findall(regex_pid_screen_name, output)
# extract pids
all_pids = []
for pid_name in pids:
pid = pid_name.split(b'.')[0].decode()
all_pids.append(pid)
return all_pids
return []
def detach_screen(screen_name):
cmd = ['screen', '-d', screen_name]
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
#if p.stdout:
# print(p.stdout)
if p.stderr:
print(p.stderr)
def create_screen(screen_name):
if not exist_screen(screen_name):
cmd = ['screen', '-dmS', screen_name]
@ -38,18 +72,59 @@ def create_screen(screen_name):
print(p.stderr)
return False
def kill_screen(screen_name, with_sudoer=False):
if get_screen_pid(screen_name, with_sudoer=with_sudoer):
for pid in get_screen_pid(screen_name, with_sudoer=with_sudoer):
cmd = ['kill', pid]
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if p.stderr:
print(p.stderr)
else:
print('{} killed'.format(pid))
return True
return False
# # TODO: add check if len(window_name) == 20
# use: screen -S 'pid.screen_name' -p %window_id% -Q title
# if len(windows_name) > 20 (truncated by default)
def get_screen_windows_list(screen_name):
def get_screen_windows_list(screen_name, r_set=True):
# detach screen to avoid incomplete result
detach_screen(screen_name)
if r_set:
all_windows_name = set()
else:
all_windows_name = []
cmd = ['screen', '-S', screen_name, '-Q', 'windows']
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if p.stdout:
for window_row in p.stdout.split(b' '):
window_id, window_name = window_row.decode().split()
print(window_id)
print(window_name)
print('---')
#print(window_id)
#print(window_name)
#print('---')
if r_set:
all_windows_name.add(window_name)
else:
all_windows_name.append(window_name)
if p.stderr:
print(p.stderr)
return all_windows_name
def get_screen_windows_id(screen_name):
# detach screen to avoid incomplete result
detach_screen(screen_name)
all_windows_id = {}
cmd = ['screen', '-S', screen_name, '-Q', 'windows']
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
if p.stdout:
for window_row in p.stdout.split(b' '):
window_id, window_name = window_row.decode().split()
if window_name not in all_windows_id:
all_windows_id[window_name] = []
all_windows_id[window_name].append(window_id)
if p.stderr:
print(p.stderr)
return all_windows_id
# script_location ${AIL_BIN}
def launch_windows_script(screen_name, window_name, dir_project, script_location, script_name, script_options=''):
@ -60,6 +135,16 @@ def launch_windows_script(screen_name, window_name, dir_project, script_location
print(p.stdout)
print(p.stderr)
def launch_uniq_windows_script(screen_name, window_name, dir_project, script_location, script_name, script_options='', kill_previous_windows=False):
all_screen_name = get_screen_windows_id(screen_name)
if window_name in all_screen_name:
if kill_previous_windows:
kill_screen_window(screen_name, all_screen_name[window_name][0], force=True)
else:
print('Error: screen {} already contain a windows with this name {}'.format(screen_name, window_name))
return None
launch_windows_script(screen_name, window_name, dir_project, script_location, script_name, script_options=script_options)
def kill_screen_window(screen_name, window_id, force=False):
if force:# kill
cmd = ['screen', '-S', screen_name, '-p', window_id, '-X', 'kill']


@ -64,3 +64,12 @@ class ConfigLoader(object):
def has_section(self, section):
return self.cfg.has_section(section)
def get_all_keys_values_from_section(self, section):
if section in self.cfg:
all_keys_values = []
for key_name in self.cfg[section]:
all_keys_values.append((key_name, self.cfg.get(section, key_name)))
return all_keys_values
else:
return []

bin/lib/Config_DB.py (new executable file, 155 lines)

@ -0,0 +1,155 @@
#!/usr/bin/python3
"""
Config save in DB
===================
"""
import os
import sys
import redis
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import ConfigLoader
config_loader = ConfigLoader.ConfigLoader()
r_serv_db = config_loader.get_redis_conn("ARDB_DB")
config_loader = None
#### TO PUT IN CONFIG
# later => module timeout
#
## data retention
#########################
default_config = {
"crawler": {
"enable_har_by_default": False,
"enable_screenshot_by_default": True,
"default_depth_limit": 1,
"default_closespider_pagecount": 50,
"default_user_agent": "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0",
"default_timeout": 30
}
}
def get_default_config():
return default_config
def get_default_config_value(section, field):
return default_config[section][field]
config_type = {
# crawler config
"crawler": {
"enable_har_by_default": bool,
"enable_screenshot_by_default": bool,
"default_depth_limit": int,
"default_closespider_pagecount": int,
"default_user_agent": str,
"default_timeout": int
}
}
def get_config_type(section, field):
return config_type[section][field]
# # TODO: add set, dict, list and select_(multiple_)value
def is_valid_type(obj, section, field, value_type=None):
res = isinstance(obj, get_config_type(section, field))
return res
def reset_default_config():
pass
def set_default_config(section, field):
save_config(section, field, get_default_config_value(section, field))
def get_all_config_sections():
return list(get_default_config())
def get_all_config_fields_by_section(section):
return list(get_default_config()[section])
def get_config(section, field):
# config field doesn't exist
if not r_serv_db.hexists(f'config:global:{section}', field):
set_default_config(section, field)
return get_default_config_value(section, field)
# load default config section
if not r_serv_db.exists('config:global:{}'.format(section)):
save_config(section, field, get_default_config_value(section, field))
return get_default_config_value(section, field)
return r_serv_db.hget(f'config:global:{section}', field)
def get_config_dict_by_section(section):
config_dict = {}
for field in get_all_config_fields_by_section(section):
config_dict[field] = get_config(section, field)
return config_dict
def save_config(section, field, value, value_type=None): ###########################################
if section in default_config:
if is_valid_type(value, section, field, value_type=value_type):
if value_type in ['list', 'set', 'dict']:
pass
else:
r_serv_db.hset(f'config:global:{section}', field, value)
# used by check_integrity
r_serv_db.sadd('config:all_global_section', field, value)
# check config value + type
def check_integrity():
pass
config_documentation = {
"crawler": {
"enable_har_by_default": 'Enable HAR by default',
"enable_screenshot_by_default": 'Enable screenshot by default',
"default_depth_limit": 'Maximum number of url depth',
"default_closespider_pagecount": 'Maximum number of pages',
"default_user_agent": "User agent used by default",
"default_timeout": "Crawler connection timeout"
}
}
def get_config_documentation(section, field):
return config_documentation[section][field]
# def conf_view():
# class F(MyBaseForm):
# pass
#
# F.username = TextField('username')
# for name in iterate_some_model_dynamically():
# setattr(F, name, TextField(name.title()))
#
# form = F(request.POST, ...)
def get_field_full_config(section, field):
dict_config = {}
dict_config['value'] = get_config(section, field)
dict_config['type'] = get_config_type(section, field)
dict_config['info'] = get_config_documentation(section, field)
return dict_config
def get_full_config_by_section(section):
dict_config = {}
for field in get_all_config_fields_by_section(section):
dict_config[field] = get_field_full_config(section, field)
return dict_config
def get_full_config():
dict_config = {}
for section in get_all_config_sections():
dict_config[section] = get_full_config_by_section(section)
return dict_config
if __name__ == '__main__':
res = get_full_config()
print(res)
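A short usage sketch of this new module, assuming ``AIL_BIN`` is set and the ``ARDB_DB`` instance is reachable:
```python
import os
import sys
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import Config_DB

# Read a crawler default; on first access the default value is written to the DB.
depth = Config_DB.get_config('crawler', 'default_depth_limit')

# Update it; save_config() checks the value against config_type before writing.
Config_DB.save_config('crawler', 'default_depth_limit', 2)

# Full view of a section: value, type and documentation for each field.
print(Config_DB.get_full_config_by_section('crawler'))
```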


@ -13,6 +13,7 @@ import os
import re
import redis
import sys
import time
import uuid
from datetime import datetime, timedelta
@ -34,19 +35,24 @@ config_loader = ConfigLoader.ConfigLoader()
r_serv_metadata = config_loader.get_redis_conn("ARDB_Metadata")
r_serv_onion = config_loader.get_redis_conn("ARDB_Onion")
r_cache = config_loader.get_redis_conn("Redis_Cache")
config_loader = None
# load crawler config
config_loader = ConfigLoader.ConfigLoader(config_file='crawlers.cfg')
#splash_manager_url = config_loader.get_config_str('Splash_Manager', 'splash_url')
#splash_api_key = config_loader.get_config_str('Splash_Manager', 'api_key')
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], config_loader.get_config_str("Directories", "pastes"))
config_loader = None
faup = Faup()
# # # # # # # #
# #
# COMMON #
# #
# # # # # # # #
def generate_uuid():
return str(uuid.uuid4()).replace('-', '')
# # TODO: remove me ?
def get_current_date():
return datetime.now().strftime("%Y%m%d")
def is_valid_onion_domain(domain):
if not domain.endswith('.onion'):
return False
@ -61,6 +67,10 @@ def is_valid_onion_domain(domain):
return True
return False
# TEMP FIX
def get_faup():
return faup
################################################################################
# # TODO: handle prefix cookies
@ -389,8 +399,127 @@ def api_create_cookie(user_id, cookiejar_uuid, cookie_dict):
#### ####
# # # # # # # #
# #
# CRAWLER #
# #
# # # # # # # #
#### CRAWLER GLOBAL ####
def get_all_spash_crawler_status():
crawler_metadata = []
all_crawlers = r_cache.smembers('all_splash_crawlers')
for crawler in all_crawlers:
crawler_metadata.append(get_splash_crawler_status(crawler))
return crawler_metadata
def reset_all_spash_crawler_status():
r_cache.delete('all_splash_crawlers')
def get_splash_crawler_status(spash_url):
crawler_type = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'type')
crawling_domain = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'crawling_domain')
started_time = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'started_time')
status_info = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'status')
crawler_info = '{} - {}'.format(spash_url, started_time)
if status_info=='Waiting' or status_info=='Crawling':
status=True
else:
status=False
return {'crawler_info': crawler_info, 'crawling_domain': crawling_domain, 'status_info': status_info, 'status': status, 'type': crawler_type}
def get_stats_last_crawled_domains(crawler_types, date):
statDomains = {}
for crawler_type in crawler_types:
stat_type = {}
stat_type['domains_up'] = r_serv_onion.scard('{}_up:{}'.format(crawler_type, date))
stat_type['domains_down'] = r_serv_onion.scard('{}_down:{}'.format(crawler_type, date))
stat_type['total'] = stat_type['domains_up'] + stat_type['domains_down']
stat_type['domains_queue'] = get_nb_elem_to_crawl_by_type(crawler_type)
statDomains[crawler_type] = stat_type
return statDomains
# # TODO: handle custom proxy
def get_splash_crawler_latest_stats():
now = datetime.now()
date = now.strftime("%Y%m%d")
return get_stats_last_crawled_domains(['onion', 'regular'], date)
def get_nb_crawlers_to_launch_by_splash_name(splash_name):
res = r_serv_onion.hget('all_crawlers_to_launch', splash_name)
if res:
return int(res)
else:
return 0
def get_all_crawlers_to_launch_splash_name():
return r_serv_onion.hkeys('all_crawlers_to_launch')
def get_nb_crawlers_to_launch():
nb_crawlers_to_launch = r_serv_onion.hgetall('all_crawlers_to_launch')
for splash_name in nb_crawlers_to_launch:
nb_crawlers_to_launch[splash_name] = int(nb_crawlers_to_launch[splash_name])
return nb_crawlers_to_launch
def get_nb_crawlers_to_launch_ui():
nb_crawlers_to_launch = get_nb_crawlers_to_launch()
for splash_name in get_all_splash():
if splash_name not in nb_crawlers_to_launch:
nb_crawlers_to_launch[splash_name] = 0
return nb_crawlers_to_launch
def set_nb_crawlers_to_launch(dict_splash_name):
r_serv_onion.delete('all_crawlers_to_launch')
for splash_name in dict_splash_name:
r_serv_onion.hset('all_crawlers_to_launch', splash_name, int(dict_splash_name[splash_name]))
relaunch_crawlers()
def relaunch_crawlers():
all_crawlers_to_launch = get_nb_crawlers_to_launch()
for splash_name in all_crawlers_to_launch:
nb_crawlers = int(all_crawlers_to_launch[splash_name])
all_crawler_urls = get_splash_all_url(splash_name, r_list=True)
if nb_crawlers > len(all_crawler_urls):
print('Error, can\'t launch all Splash Dockers')
print('Please launch {} additional {} Dockers'.format( nb_crawlers - len(all_crawler_urls), splash_name))
nb_crawlers = len(all_crawler_urls)
reset_all_spash_crawler_status()
for i in range(0, int(nb_crawlers)):
splash_url = all_crawler_urls[i]
print(all_crawler_urls[i])
launch_ail_splash_crawler(splash_url, script_options='{}'.format(splash_url))
def api_set_nb_crawlers_to_launch(dict_splash_name):
# TODO: check if is dict
dict_crawlers_to_launch = {}
all_splash = get_all_splash()
crawlers_to_launch = list(all_splash & set(dict_splash_name.keys()))
for splash_name in crawlers_to_launch:
try:
nb_to_launch = int(dict_splash_name.get(splash_name, 0))
if nb_to_launch < 0:
return ({'error':'The number of crawlers to launch is negative'}, 400)
except:
return ({'error':'invalid number of crawlers to launch'}, 400)
if nb_to_launch > 0:
dict_crawlers_to_launch[splash_name] = nb_to_launch
if dict_crawlers_to_launch:
set_nb_crawlers_to_launch(dict_crawlers_to_launch)
return (dict_crawlers_to_launch, 200)
else:
return ({'error':'invalid input'}, 400)
##-- CRAWLER GLOBAL --##
#### CRAWLER TASK ####
def create_crawler_task(url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, cookiejar_uuid=None, user_agent=None):
def create_crawler_task(url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, crawler_type=None, cookiejar_uuid=None, user_agent=None):
crawler_config = {}
crawler_config['depth_limit'] = depth_limit
@ -430,10 +559,18 @@ def create_crawler_task(url, screenshot=True, har=True, depth_limit=1, max_pages
tld = unpack_url['tld'].decode()
except:
tld = unpack_url['tld']
if tld == 'onion':
crawler_type = 'onion'
if crawler_type=='None':
crawler_type = None
if crawler_type:
if crawler_type=='tor':
crawler_type = 'onion'
else:
crawler_type = 'regular'
if tld == 'onion':
crawler_type = 'onion'
else:
crawler_type = 'regular'
save_crawler_config(crawler_mode, crawler_type, crawler_config, domain, url=url)
send_url_to_crawl_in_queue(crawler_mode, crawler_type, url)
@ -445,6 +582,7 @@ def save_crawler_config(crawler_mode, crawler_type, crawler_config, domain, url=
r_serv_onion.set('crawler_config:{}:{}:{}:{}'.format(crawler_mode, crawler_type, domain, url), json.dumps(crawler_config))
def send_url_to_crawl_in_queue(crawler_mode, crawler_type, url):
print('{}_crawler_priority_queue'.format(crawler_type), '{};{}'.format(url, crawler_mode))
r_serv_onion.sadd('{}_crawler_priority_queue'.format(crawler_type), '{};{}'.format(url, crawler_mode))
# add auto crawled url for user UI
if crawler_mode == 'auto':
@ -452,7 +590,7 @@ def send_url_to_crawl_in_queue(crawler_mode, crawler_type, url):
#### ####
#### CRAWLER TASK API ####
def api_create_crawler_task(user_id, url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, cookiejar_uuid=None, user_agent=None):
def api_create_crawler_task(user_id, url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, crawler_type=None, cookiejar_uuid=None, user_agent=None):
# validate url
if url is None or url=='' or url=='\n':
return ({'error':'invalid depth limit'}, 400)
@ -489,7 +627,10 @@ def api_create_crawler_task(user_id, url, screenshot=True, har=True, depth_limit
if cookie_owner != user_id:
return ({'error': 'The access to this cookiejar is restricted'}, 403)
# # TODO: verify splash name/crawler type
create_crawler_task(url, screenshot=screenshot, har=har, depth_limit=depth_limit, max_pages=max_pages,
crawler_type=crawler_type,
auto_crawler=auto_crawler, crawler_delta=crawler_delta, cookiejar_uuid=cookiejar_uuid, user_agent=user_agent)
return None
@ -572,6 +713,7 @@ def save_har(har_dir, item_id, har_content):
with open(filename, 'w') as f:
f.write(json.dumps(har_content))
# # TODO: FIXME
def api_add_crawled_item(dict_crawled):
domain = None
@ -580,30 +722,200 @@ def api_add_crawled_item(dict_crawled):
save_crawled_item(item_id, response.data['html'])
create_item_metadata(item_id, domain, 'last_url', port, 'father')
#### CRAWLER QUEUES ####
def get_all_crawlers_queues_types():
all_queues_types = set()
all_splash_name = get_all_crawlers_to_launch_splash_name()
for splash_name in all_splash_name:
all_queues_types.add(get_splash_crawler_type(splash_name))
all_splash_name = list()
return all_queues_types
#### SPLASH MANAGER ####
def get_splash_manager_url(reload=False): # TODO: add config reload
return splash_manager_url
def get_crawler_queue_types_by_splash_name(splash_name):
all_domain_type = [splash_name]
crawler_type = get_splash_crawler_type(splash_name)
#if not is_splash_used_in_discovery(splash_name)
if crawler_type == 'tor':
all_domain_type.append('onion')
all_domain_type.append('regular')
else:
all_domain_type.append('regular')
return all_domain_type
def get_splash_api_key(reload=False): # TODO: add config reload
return splash_api_key
def get_crawler_type_by_url(url):
faup.decode(url)
unpack_url = faup.get()
## TODO: # FIXME: remove me
try:
tld = unpack_url['tld'].decode()
except:
tld = unpack_url['tld']
if tld == 'onion':
crawler_type = 'onion'
else:
crawler_type = 'regular'
return crawler_type
def get_elem_to_crawl_by_queue_type(l_queue_type):
## queues priority:
# 1 - priority queue
# 2 - discovery queue
# 3 - normal queue
##
all_queue_key = ['{}_crawler_priority_queue', '{}_crawler_discovery_queue', '{}_crawler_queue']
for queue_key in all_queue_key:
for queue_type in l_queue_type:
message = r_serv_onion.spop(queue_key.format(queue_type))
if message:
dict_to_crawl = {}
splitted = message.rsplit(';', 1)
if len(splitted) == 2:
url, item_id = splitted
item_id = item_id.replace(PASTES_FOLDER+'/', '')
else:
# # TODO: to check/refactor
item_id = None
url = message
crawler_type = get_crawler_type_by_url(url)
return {'url': url, 'paste': item_id, 'type_service': crawler_type, 'queue_type': queue_type, 'original_message': message}
return None
def get_nb_elem_to_crawl_by_type(queue_type):
nb = r_serv_onion.scard('{}_crawler_priority_queue'.format(queue_type))
nb += r_serv_onion.scard('{}_crawler_discovery_queue'.format(queue_type))
nb += r_serv_onion.scard('{}_crawler_queue'.format(queue_type))
return nb
#### ---- ####
# # # # # # # # # # # #
# #
# SPLASH MANAGER #
# #
# # # # # # # # # # # #
def get_splash_manager_url(reload=False): # TODO: add in db config
return r_serv_onion.get('crawler:splash:manager:url')
def get_splash_api_key(reload=False): # TODO: add in db config
return r_serv_onion.get('crawler:splash:manager:key')
def get_hidden_splash_api_key(): # TODO: add in db config
key = get_splash_api_key()
if key:
if len(key)==41:
return f'{key[:4]}*********************************{key[-4:]}'
def is_valid_api_key(api_key, search=re.compile(r'[^a-zA-Z0-9_-]').search):
if len(api_key) != 41:
return False
return not bool(search(api_key))
def save_splash_manager_url_api(url, api_key):
r_serv_onion.set('crawler:splash:manager:url', url)
r_serv_onion.set('crawler:splash:manager:key', api_key)
def get_splash_url_from_manager_url(splash_manager_url, splash_port):
url = urlparse(splash_manager_url)
host = url.netloc.split(':', 1)[0]
return 'http://{}:{}'.format(host, splash_port)
return '{}:{}'.format(host, splash_port)
# def is_splash_used_in_discovery(splash_name):
# res = r_serv_onion.hget('splash:metadata:{}'.format(splash_name), 'discovery_queue')
# if res == 'True':
# return True
# else:
# return False
def restart_splash_docker(splash_url, splash_name):
splash_port = splash_url.split(':')[-1]
return _restart_splash_docker(splash_port, splash_name)
def is_splash_manager_connected(delta_check=30):
last_check = r_cache.hget('crawler:splash:manager', 'last_check')
if last_check:
if int(time.time()) - int(last_check) > delta_check:
ping_splash_manager()
else:
ping_splash_manager()
res = r_cache.hget('crawler:splash:manager', 'connected')
return res == 'True'
def update_splash_manager_connection_status(is_connected, req_error=None):
r_cache.hset('crawler:splash:manager', 'connected', is_connected)
r_cache.hset('crawler:splash:manager', 'last_check', int(time.time()))
if not req_error:
r_cache.hdel('crawler:splash:manager', 'error')
else:
r_cache.hset('crawler:splash:manager', 'status_code', req_error['status_code'])
r_cache.hset('crawler:splash:manager', 'error', req_error['error'])
def get_splash_manager_connection_metadata(force_ping=False):
dict_manager={}
if force_ping:
dict_manager['status'] = ping_splash_manager()
else:
dict_manager['status'] = is_splash_manager_connected()
if not dict_manager['status']:
dict_manager['status_code'] = r_cache.hget('crawler:splash:manager', 'status_code')
dict_manager['error'] = r_cache.hget('crawler:splash:manager', 'error')
return dict_manager
## API ##
def ping_splash_manager():
req = requests.get('{}/api/v1/ping'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
if req.status_code == 200:
return True
else:
print(req.json())
splash_manager_url = get_splash_manager_url()
if not splash_manager_url:
return False
try:
req = requests.get('{}/api/v1/ping'.format(splash_manager_url), headers={"Authorization": get_splash_api_key()}, verify=False)
if req.status_code == 200:
update_splash_manager_connection_status(True)
return True
else:
res = req.json()
if 'reason' in res:
req_error = {'status_code': req.status_code, 'error': res['reason']}
else:
print(req.json())
req_error = {'status_code': req.status_code, 'error': json.dumps(req.json())}
update_splash_manager_connection_status(False, req_error=req_error)
return False
except requests.exceptions.ConnectionError:
pass
# splash manager unreachable
req_error = {'status_code': 500, 'error': 'splash manager unreachable'}
update_splash_manager_connection_status(False, req_error=req_error)
return False
def get_splash_manager_session_uuid():
try:
req = requests.get('{}/api/v1/get/session_uuid'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
if req.status_code == 200:
res = req.json()
if res:
return res['session_uuid']
else:
print(req.json())
except (requests.exceptions.ConnectionError, requests.exceptions.MissingSchema):
# splash manager unreachable
update_splash_manager_connection_status(False)
def get_splash_manager_version():
splash_manager_url = get_splash_manager_url()
if splash_manager_url:
try:
req = requests.get('{}/api/v1/version'.format(splash_manager_url), headers={"Authorization": get_splash_api_key()}, verify=False)
if req.status_code == 200:
return req.json()['message']
else:
print(req.json())
except requests.exceptions.ConnectionError:
pass
def get_all_splash_manager_containers_name():
req = requests.get('{}/api/v1/get/splash/name/all'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
req = requests.get('{}/api/v1/get/splash/all'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
if req.status_code == 200:
return req.json()
else:
@ -615,6 +927,35 @@ def get_all_splash_manager_proxies():
return req.json()
else:
print(req.json())
def _restart_splash_docker(splash_port, splash_name):
dict_to_send = {'port': splash_port, 'name': splash_name}
req = requests.post('{}/api/v1/splash/restart'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False, json=dict_to_send)
if req.status_code == 200:
return req.json()
else:
print(req.json())
def api_save_splash_manager_url_api(data):
# unpack json
manager_url = data.get('url', None)
api_key = data.get('api_key', None)
if not manager_url or not api_key:
return ({'status': 'error', 'reason': 'No url or API key supplied'}, 400)
# check if is valid url
try:
result = urlparse(manager_url)
if not all([result.scheme, result.netloc]):
return ({'status': 'error', 'reason': 'Invalid url'}, 400)
except:
return ({'status': 'error', 'reason': 'Invalid url'}, 400)
# check if is valid key
if not is_valid_api_key(api_key):
return ({'status': 'error', 'reason': 'Invalid API key'}, 400)
save_splash_manager_url_api(manager_url, api_key)
return ({'url': manager_url, 'api_key': get_hidden_splash_api_key()}, 200)
## -- ##
## SPLASH ##
@ -647,7 +988,23 @@ def get_splash_name_by_url(splash_url):
def get_splash_crawler_type(splash_name):
return r_serv_onion.hget('splash:metadata:{}'.format(splash_name), 'crawler_type')
def get_all_splash_by_proxy(proxy_name):
def get_splash_crawler_description(splash_name):
return r_serv_onion.hget('splash:metadata:{}'.format(splash_name), 'description')
def get_splash_crawler_metadata(splash_name):
dict_splash = {}
dict_splash['proxy'] = get_splash_proxy(splash_name)
dict_splash['type'] = get_splash_crawler_type(splash_name)
dict_splash['description'] = get_splash_crawler_description(splash_name)
return dict_splash
def get_all_splash_crawler_metadata():
dict_splash = {}
for splash_name in get_all_splash():
dict_splash[splash_name] = get_splash_crawler_metadata(splash_name)
return dict_splash
def get_all_splash_by_proxy(proxy_name, r_list=False):
res = r_serv_onion.smembers('proxy:splash:{}'.format(proxy_name))
if res:
if r_list:
@ -683,16 +1040,50 @@ def delete_all_proxies():
for proxy_name in get_all_proxies():
delete_proxy(proxy_name)
def get_proxy_host(proxy_name):
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'host')
def get_proxy_port(proxy_name):
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'port')
def get_proxy_type(proxy_name):
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'type')
def get_proxy_crawler_type(proxy_name):
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'crawler_type')
def get_proxy_description(proxy_name):
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'description')
def get_proxy_metadata(proxy_name):
meta_dict = {}
meta_dict['host'] = get_proxy_host(proxy_name)
meta_dict['port'] = get_proxy_port(proxy_name)
meta_dict['type'] = get_proxy_type(proxy_name)
meta_dict['crawler_type'] = get_proxy_crawler_type(proxy_name)
meta_dict['description'] = get_proxy_description(proxy_name)
return meta_dict
def get_all_proxies_metadata():
all_proxy_dict = {}
for proxy_name in get_all_proxies():
all_proxy_dict[proxy_name] = get_proxy_metadata(proxy_name)
return all_proxy_dict
# def set_proxy_used_in_discovery(proxy_name, value):
# r_serv_onion.hset('splash:metadata:{}'.format(splash_name), 'discovery_queue', value)
def delete_proxy(proxy_name): # # TODO: force delete (delete all proxy)
proxy_splash = get_all_splash_by_proxy(proxy_name)
if proxy_splash:
print('error, a splash container is using this proxy')
#if proxy_splash:
# print('error, a splash container is using this proxy')
r_serv_onion.delete('proxy:metadata:{}'.format(proxy_name))
r_serv_onion.srem('all_proxy', proxy_name)
## -- ##
## LOADER ##
def load_all_splash_containers():
delete_all_splash_containers()
all_splash_containers_name = get_all_splash_manager_containers_name()
for splash_name in all_splash_containers_name:
r_serv_onion.sadd('all_splash', splash_name)
@ -715,6 +1106,7 @@ def load_all_splash_containers():
r_serv_onion.set('splash:map:url:name:{}'.format(splash_url), splash_name)
def load_all_proxy():
delete_all_proxies()
all_proxies = get_all_splash_manager_proxies()
for proxy_name in all_proxies:
proxy_dict = all_proxies[proxy_name]
@ -725,13 +1117,17 @@ def load_all_proxy():
description = all_proxies[proxy_name].get('description', None)
if description:
r_serv_onion.hset('proxy:metadata:{}'.format(proxy_name), 'description', description)
r_serv_onion.sadd('all_proxy', proxy_name)
def init_splash_list_db():
delete_all_splash_containers()
delete_all_proxies()
def reload_splash_and_proxies_list():
if ping_splash_manager():
load_all_splash_containers()
# LOAD PROXIES containers
load_all_proxy()
# LOAD SPLASH containers
load_all_splash_containers()
return True
else:
return False
# # TODO: kill crawler screen ?
## -- ##
@ -742,7 +1138,7 @@ def launch_ail_splash_crawler(splash_url, script_options=''):
script_location = os.path.join(os.environ['AIL_BIN'])
script_name = 'Crawler.py'
screen.create_screen(screen_name)
screen.launch_windows_script(screen_name, splash_url, dir_project, script_location, script_name, script_options=script_options)
screen.launch_uniq_windows_script(screen_name, splash_url, dir_project, script_location, script_name, script_options=script_options, kill_previous_windows=True)
## -- ##
@ -752,3 +1148,8 @@ def launch_ail_splash_crawler(splash_url, script_options=''):
#### CRAWLER PROXY ####
#### ---- ####
if __name__ == '__main__':
res = get_splash_manager_version()
#res = restart_splash_docker('127.0.0.1:8050', 'default_splash_tor')
print(res)

Four binary image files (screenshots) added; not shown. Sizes: 104 KiB, 66 KiB, 51 KiB, 65 KiB.


@ -1,4 +0,0 @@
[proxy]
host=localhost
port=9050
type=SOCKS5


@ -16,9 +16,11 @@ if [ -z "$VIRTUAL_ENV" ]; then
echo export AIL_REDIS=$(pwd)/redis/src/ >> ./AILENV/bin/activate
echo export AIL_ARDB=$(pwd)/ardb/src/ >> ./AILENV/bin/activate
. ./AILENV/bin/activate
fi
# activate virtual environment
. ./AILENV/bin/activate
pip3 install -U pip
pip3 install 'git+https://github.com/D4-project/BGP-Ranking.git/@7e698f87366e6f99b4d0d11852737db28e3ddc62#egg=pybgpranking&subdirectory=client'
pip3 install -U -r requirements.txt


@ -24,10 +24,12 @@ sys.path.append(os.path.join(os.environ['AIL_BIN'], 'packages'))
import Tag
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import Domain
import crawlers
import Domain
import Language
import Config_DB
r_cache = Flask_config.r_cache
r_serv_db = Flask_config.r_serv_db
r_serv_tags = Flask_config.r_serv_tags
@ -49,13 +51,44 @@ def create_json_response(data, status_code):
return Response(json.dumps(data, indent=2, sort_keys=True), mimetype='application/json'), status_code
# ============= ROUTES ==============
@crawler_splash.route("/crawlers/dashboard", methods=['GET'])
@login_required
@login_read_only
def crawlers_dashboard():
# # TODO: get splash manager status
is_manager_connected = crawlers.get_splash_manager_connection_metadata()
all_splash_crawler_status = crawlers.get_all_spash_crawler_status()
splash_crawlers_latest_stats = crawlers.get_splash_crawler_latest_stats()
date = crawlers.get_current_date()
return render_template("dashboard_splash_crawler.html", all_splash_crawler_status = all_splash_crawler_status,
is_manager_connected=is_manager_connected, date=date,
splash_crawlers_latest_stats=splash_crawlers_latest_stats)
@crawler_splash.route("/crawlers/crawler_dashboard_json", methods=['GET'])
@login_required
@login_read_only
def crawler_dashboard_json():
all_splash_crawler_status = crawlers.get_all_spash_crawler_status()
splash_crawlers_latest_stats = crawlers.get_splash_crawler_latest_stats()
return jsonify({'all_splash_crawler_status': all_splash_crawler_status,
'splash_crawlers_latest_stats':splash_crawlers_latest_stats})
@crawler_splash.route("/crawlers/manual", methods=['GET'])
@login_required
@login_read_only
def manual():
user_id = current_user.get_id()
l_cookiejar = crawlers.api_get_cookies_list_select(user_id)
return render_template("crawler_manual.html", crawler_enabled=True, l_cookiejar=l_cookiejar)
all_crawlers_types = crawlers.get_all_crawlers_queues_types()
all_splash_name = crawlers.get_all_crawlers_to_launch_splash_name()
return render_template("crawler_manual.html",
is_manager_connected=crawlers.get_splash_manager_connection_metadata(),
all_crawlers_types=all_crawlers_types,
all_splash_name=all_splash_name,
l_cookiejar=l_cookiejar)
@crawler_splash.route("/crawlers/send_to_spider", methods=['POST'])
@login_required
@ -65,6 +98,8 @@ def send_to_spider():
# POST val
url = request.form.get('url_to_crawl')
crawler_type = request.form.get('crawler_queue_type')
splash_name = request.form.get('splash_name')
auto_crawler = request.form.get('crawler_type')
crawler_delta = request.form.get('crawler_epoch')
screenshot = request.form.get('screenshot')
@ -73,6 +108,9 @@ def send_to_spider():
max_pages = request.form.get('max_pages')
cookiejar_uuid = request.form.get('cookiejar')
if splash_name:
crawler_type = splash_name
if cookiejar_uuid:
if cookiejar_uuid == 'None':
cookiejar_uuid = None
@ -81,6 +119,7 @@ def send_to_spider():
cookiejar_uuid = cookiejar_uuid[-1].replace(' ', '')
res = crawlers.api_create_crawler_task(user_id, url, screenshot=screenshot, har=har, depth_limit=depth_limit, max_pages=max_pages,
crawler_type=crawler_type,
auto_crawler=auto_crawler, crawler_delta=crawler_delta, cookiejar_uuid=cookiejar_uuid)
if res:
return create_json_response(res[0], res[1])
@ -459,4 +498,61 @@ def crawler_cookiejar_cookie_json_add_post():
return redirect(url_for('crawler_splash.crawler_cookiejar_cookie_add', cookiejar_uuid=cookiejar_uuid))
@crawler_splash.route('/crawler/settings', methods=['GET'])
@login_required
@login_analyst
def crawler_splash_setings():
all_proxies = crawlers.get_all_proxies_metadata()
all_splash = crawlers.get_all_splash_crawler_metadata()
nb_crawlers_to_launch = crawlers.get_nb_crawlers_to_launch()
splash_manager_url = crawlers.get_splash_manager_url()
api_key = crawlers.get_hidden_splash_api_key()
is_manager_connected = crawlers.get_splash_manager_connection_metadata(force_ping=True)
crawler_full_config = Config_DB.get_full_config_by_section('crawler')
return render_template("settings_splash_crawler.html",
is_manager_connected=is_manager_connected,
splash_manager_url=splash_manager_url, api_key=api_key,
nb_crawlers_to_launch=nb_crawlers_to_launch,
all_splash=all_splash, all_proxies=all_proxies,
crawler_full_config=crawler_full_config)
@crawler_splash.route('/crawler/settings/crawler_manager', methods=['GET', 'POST'])
@login_required
@login_admin
def crawler_splash_setings_crawler_manager():
if request.method == 'POST':
splash_manager_url = request.form.get('splash_manager_url')
api_key = request.form.get('api_key')
res = crawlers.api_save_splash_manager_url_api({'url':splash_manager_url, 'api_key':api_key})
if res[1] != 200:
return Response(json.dumps(res[0], indent=2, sort_keys=True), mimetype='application/json'), res[1]
else:
return redirect(url_for('crawler_splash.crawler_splash_setings'))
else:
splash_manager_url = crawlers.get_splash_manager_url()
api_key = crawlers.get_splash_api_key()
return render_template("settings_edit_splash_crawler_manager.html",
splash_manager_url=splash_manager_url, api_key=api_key)
@crawler_splash.route('/crawler/settings/crawlers_to_lauch', methods=['GET', 'POST'])
@login_required
@login_admin
def crawler_splash_setings_crawlers_to_lauch():
if request.method == 'POST':
dict_splash_name = {}
for crawler_name in list(request.form):
dict_splash_name[crawler_name]= request.form.get(crawler_name)
res = crawlers.api_set_nb_crawlers_to_launch(dict_splash_name)
if res[1] != 200:
return Response(json.dumps(res[0], indent=2, sort_keys=True), mimetype='application/json'), res[1]
else:
return redirect(url_for('crawler_splash.crawler_splash_setings'))
else:
nb_crawlers_to_launch = crawlers.get_nb_crawlers_to_launch_ui()
return render_template("settings_edit_crawlers_to_launch.html",
nb_crawlers_to_launch=nb_crawlers_to_launch)
## - - ##


@ -74,10 +74,13 @@ def login():
if user.request_password_change():
return redirect(url_for('root.change_password'))
else:
if next_page and next_page!='None':
# update note
# next page
if next_page and next_page!='None' and next_page!='/':
return redirect(next_page)
# dashboard
else:
return redirect(url_for('dashboard.index'))
return redirect(url_for('dashboard.index', update_note=True))
# login failed
else:
# set brute force protection
@ -113,7 +116,9 @@ def change_password():
if check_password_strength(password1):
user_id = current_user.get_id()
create_user_db(user_id , password1, update=True)
return redirect(url_for('dashboard.index'))
# update Note
# dashboard
return redirect(url_for('dashboard.index', update_note=True))
else:
error = 'Incorrect password'
return render_template("change_password.html", error=error)


@ -155,6 +155,8 @@ def stuff():
@login_required
@login_read_only
def index():
update_note = request.args.get('update_note')
default_minute = config_loader.get_config_str("Flask", "minute_processed_paste")
threshold_stucked_module = config_loader.get_config_int("Module_ModuleInformation", "threshold_stucked_module")
log_select = {10, 25, 50, 100}
@ -176,6 +178,7 @@ def index():
return render_template("index.html", default_minute = default_minute, threshold_stucked_module=threshold_stucked_module,
log_select=log_select, selected=max_dashboard_logs,
update_warning_message=update_warning_message, update_in_progress=update_in_progress,
update_note=update_note,
update_warning_message_notice_me=update_warning_message_notice_me)
# ========= REGISTRATION =========


@ -72,12 +72,10 @@
</div>
{%endif%}
<div class="alert alert-info alert-dismissible fade show mt-1" role="alert">
<strong>Bootstrap 4 migration!</strong> Some pages are still in bootstrap 3. You can check the migration progress <strong><a href="https://github.com/CIRCL/AIL-framework/issues/330" target="_blank">Here</a></strong>.
<button type="button" class="close" data-dismiss="alert" aria-label="Close">
<span aria-hidden="true">&times;</span>
</button>
</div>
<!-- TODO: Add users messages -->
{%if update_note%}
{% include 'dashboard/update_modal.html' %}
{%endif%}
<div class="row my-2">

View file

@ -18,6 +18,7 @@ from flask_login import login_required
from Date import Date
from HiddenServices import HiddenServices
import crawlers
# ============ VARIABLES ============
import Flask_config
@ -27,7 +28,6 @@ baseUrl = Flask_config.baseUrl
r_cache = Flask_config.r_cache
r_serv_onion = Flask_config.r_serv_onion
r_serv_metadata = Flask_config.r_serv_metadata
crawler_enabled = Flask_config.crawler_enabled
bootstrap_label = Flask_config.bootstrap_label
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
@ -231,22 +231,22 @@ def delete_auto_crawler(url):
# ============= ROUTES ==============
@hiddenServices.route("/crawlers/", methods=['GET'])
@login_required
@login_read_only
def dashboard():
crawler_metadata_onion = get_crawler_splash_status('onion')
crawler_metadata_regular = get_crawler_splash_status('regular')
now = datetime.datetime.now()
date = now.strftime("%Y%m%d")
statDomains_onion = get_stats_last_crawled_domains('onion', date)
statDomains_regular = get_stats_last_crawled_domains('regular', date)
return render_template("Crawler_dashboard.html", crawler_metadata_onion = crawler_metadata_onion,
crawler_enabled=crawler_enabled, date=date,
crawler_metadata_regular=crawler_metadata_regular,
statDomains_onion=statDomains_onion, statDomains_regular=statDomains_regular)
# @hiddenServices.route("/crawlers/", methods=['GET'])
# @login_required
# @login_read_only
# def dashboard():
# crawler_metadata_onion = get_crawler_splash_status('onion')
# crawler_metadata_regular = get_crawler_splash_status('regular')
#
# now = datetime.datetime.now()
# date = now.strftime("%Y%m%d")
# statDomains_onion = get_stats_last_crawled_domains('onion', date)
# statDomains_regular = get_stats_last_crawled_domains('regular', date)
#
# return render_template("Crawler_dashboard.html", crawler_metadata_onion = crawler_metadata_onion,
# date=date,
# crawler_metadata_regular=crawler_metadata_regular,
# statDomains_onion=statDomains_onion, statDomains_regular=statDomains_regular)
@hiddenServices.route("/crawlers/crawler_splash_onion", methods=['GET'])
@login_required
@ -288,7 +288,7 @@ def Crawler_Splash_last_by_type():
crawler_metadata = get_crawler_splash_status(type)
return render_template("Crawler_Splash_last_by_type.html", type=type, type_name=type_name,
crawler_enabled=crawler_enabled,
is_manager_connected=crawlers.get_splash_manager_connection_metadata(),
last_domains=list_domains, statDomains=statDomains,
crawler_metadata=crawler_metadata, date_from=date_string, date_to=date_string)
@ -424,7 +424,7 @@ def auto_crawler():
return render_template("Crawler_auto.html", page=page, nb_page_max=nb_page_max,
last_domains=last_domains,
crawler_enabled=crawler_enabled,
is_manager_connected=crawlers.get_splash_manager_connection_metadata(),
auto_crawler_domain_onions_metadata=auto_crawler_domain_onions_metadata,
auto_crawler_domain_regular_metadata=auto_crawler_domain_regular_metadata)
@ -439,23 +439,6 @@ def remove_auto_crawler():
delete_auto_crawler(url)
return redirect(url_for('hiddenServices.auto_crawler', page=page))
@hiddenServices.route("/crawlers/crawler_dashboard_json", methods=['GET'])
@login_required
@login_read_only
def crawler_dashboard_json():
crawler_metadata_onion = get_crawler_splash_status('onion')
crawler_metadata_regular = get_crawler_splash_status('regular')
now = datetime.datetime.now()
date = now.strftime("%Y%m%d")
statDomains_onion = get_stats_last_crawled_domains('onion', date)
statDomains_regular = get_stats_last_crawled_domains('regular', date)
return jsonify({'statDomains_onion': statDomains_onion, 'statDomains_regular': statDomains_regular,
'crawler_metadata_onion':crawler_metadata_onion, 'crawler_metadata_regular':crawler_metadata_regular})
# # TODO: refactor
@hiddenServices.route("/hiddenServices/last_crawled_domains_with_stats_json", methods=['GET'])
@login_required

View file

@ -92,30 +92,6 @@
<div id="barchart_type">
</div>
<div class="card mt-1 mb-1">
<div class="card-header text-white bg-dark">
Crawlers Status
</div>
<div class="card-body px-0 py-0 ">
<table class="table">
<tbody id="tbody_crawler_info">
{% for crawler in crawler_metadata %}
<tr>
<td>
<i class="fas fa-{%if crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};"></i> {{crawler['crawler_info']}}
</td>
<td>
{{crawler['crawling_domain']}}
</td>
<td style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};">
{{crawler['status_info']}}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
</div>
@ -189,79 +165,6 @@ function toggle_sidebar(){
}
</script>
<script>/*
function refresh_list_crawled(){
$.getJSON("{{ url_for('hiddenServices.last_crawled_domains_with_stats_json') }}",
function(data) {
var tableRef = document.getElementById('tbody_last_crawled');
$("#tbody_last_crawled").empty()
for (var i = 0; i < data.last_domains.length; i++) {
var data_domain = data.last_domains[i]
var newRow = tableRef.insertRow(tableRef.rows.length);
var newCell = newRow.insertCell(0);
newCell.innerHTML = "<td><a target=\"_blank\" href=\"{{ url_for('crawler_splash.showDomain') }}?onion_domain="+data_domain['domain']+"\">"+data_domain['domain']+"</a></td>";
newCell = newRow.insertCell(1);
newCell.innerHTML = "<td>"+data_domain['first_seen'].substr(0, 4)+"/"+data_domain['first_seen'].substr(4, 2)+"/"+data_domain['first_seen'].substr(6, 2)+"</td>"
newCell = newRow.insertCell(2);
newCell.innerHTML = "<td>"+data_domain['last_check'].substr(0, 4)+"/"+data_domain['last_check'].substr(4, 2)+"/"+data_domain['last_check'].substr(6, 2)+"</td>"
newCell = newRow.insertCell(3);
newCell.innerHTML = "<td><div style=\"color:"+data_domain['status_color']+"; display:inline-block\"><i class=\"fa "+data_domain['status_icon']+" fa-2x\"></i>"+data_domain['status_text']+"</div></td>"
}
var statDomains = data.statDomains
document.getElementById('text_domain_up').innerHTML = statDomains['domains_up']
document.getElementById('text_domain_down').innerHTML = statDomains['domains_down']
document.getElementById('text_domain_queue').innerHTML = statDomains['domains_queue']
document.getElementById('text_total_domains').innerHTML = statDomains['total']
if(data.crawler_metadata.length!=0){
$("#tbody_crawler_info").empty();
var tableRef = document.getElementById('tbody_crawler_info');
for (var i = 0; i < data.crawler_metadata.length; i++) {
var crawler = data.crawler_metadata[i];
var newRow = tableRef.insertRow(tableRef.rows.length);
var text_color;
var icon;
if(crawler['status']){
text_color = 'Green';
icon = 'check';
} else {
text_color = 'Red';
icon = 'times';
}
var newCell = newRow.insertCell(0);
newCell.innerHTML = "<td><i class=\"fa fa-"+icon+"-circle\" style=\"color:"+text_color+";\"></i>"+crawler['crawler_info']+"</td>";
newCell = newRow.insertCell(1);
newCell.innerHTML = "<td><a target=\"_blank\" href=\"{{ url_for('crawler_splash.showDomain') }}?onion_domain="+crawler['crawling_domain']+"\">"+crawler['crawling_domain']+"</a></td>";
newCell = newRow.insertCell(2);
newCell.innerHTML = "<td><div style=\"color:"+text_color+";\">"+crawler['status_info']+"</div></td>";
$("#panel_crawler").show();
}
} else {
$("#panel_crawler").hide();
}
}
);
if (to_refresh) {
setTimeout("refresh_list_crawled()", 10000);
}
}*/
</script>
<script>
var margin = {top: 20, right: 90, bottom: 55, left: 0},
width = parseInt(d3.select('#barchart_type').style('width'), 10);

View file

@ -1 +1 @@
<li id='page-hiddenServices'><a href="{{ url_for('hiddenServices.dashboard') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>
<li id='page-hiddenServices'><a href="{{ url_for('crawler_splash.crawlers_dashboard') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>

View file

@ -29,6 +29,8 @@
<div class="col-12 col-lg-10" id="core_content">
{% include 'dashboard/update_modal.html' %}
<div class="card mb-3 mt-1">
<div class="card-header text-white bg-dark pb-1">
<h5 class="card-title">AIL-framework Status :</h5>

View file

@ -1,6 +1,14 @@
{% if not crawler_enabled %}
{%if not is_manager_connected['status']%}
<div class="alert alert-secondary text-center my-2" role="alert">
<h1><i class="fas fa-times-circle text-danger"></i> Crawler Disabled</h1>
<p>...</p>
<p>
{%if 'error' in is_manager_connected%}
<b>{{is_manager_connected['status_code']}}</b>
<br>
<b>Error:</b> {{is_manager_connected['error']}}
{%else%}
<b>Error:</b> core/Crawler_manager not launched
{%endif%}
</p>
</div>
{% endif %}
{%endif%}

View file

@ -44,7 +44,31 @@
<div class="input-group" id="date-range-from">
<input type="text" class="form-control" id="url_to_crawl" name="url_to_crawl" placeholder="Address or Domain">
</div>
<div class="d-flex mt-1">
<div class="d-flex mt-2">
<i class="fas fa-spider mt-1"></i> &nbsp;Crawler Type&nbsp;&nbsp;
<div class="custom-control custom-switch">
<input class="custom-control-input" type="checkbox" name="queue_type_selector" value="True" id="queue_type_selector">
<label class="custom-control-label" for="queue_type_selector">
<i class="fas fa-splotch"></i> &nbsp;Splash Name
</label>
</div>
</div>
<div id="div_crawler_queue_type">
<select class="custom-select form-control" name="crawler_queue_type" id="crawler_queue_type">
{%for crawler_type in all_crawlers_types%}
<option value="{{crawler_type}}" {%if crawler_type=='tor'%}selected{%endif%}>{{crawler_type}}</option>
{%endfor%}
</select>
</div>
<div id="div_splash_name">
<select class="custom-select form-control" name="splash_name" id="splash_name">
<option value="None" selected>Don't use a special splash crawler</option>
{%for splash_name in all_splash_name%}
<option value="{{splash_name}}">{{splash_name}}</option>
{%endfor%}
</select>
</div>
<div class="d-flex mt-3">
<i class="fas fa-user-ninja mt-1"></i> &nbsp;Manual&nbsp;&nbsp;
<div class="custom-control custom-switch">
<input class="custom-control-input" type="checkbox" name="crawler_type" value="True" id="crawler_type">
@ -143,11 +167,16 @@ var chart = {};
$(document).ready(function(){
$("#page-Crawler").addClass("active");
$("#nav_manual_crawler").addClass("active");
queue_type_selector_input_controler()
manual_crawler_input_controler();
$('#crawler_type').on("change", function () {
manual_crawler_input_controler();
});
$('#queue_type_selector').on("change", function () {
queue_type_selector_input_controler();
});
});
function toggle_sidebar(){
@ -172,4 +201,14 @@ function manual_crawler_input_controler() {
}
}
function queue_type_selector_input_controler() {
if($('#queue_type_selector').is(':checked')){
$("#div_crawler_queue_type").hide();
$("#div_splash_name").show();
}else{
$("#div_crawler_queue_type").show();
$("#div_splash_name").hide();
}
}
</script>

View file

@ -36,34 +36,15 @@
<h5><a class="text-info" href="{{ url_for('hiddenServices.Crawler_Splash_last_by_type')}}?type=onion"><i class="fas fa-user-secret"></i> Onions Crawlers</a></h5>
<div class="row">
<div class="col-6">
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_domain_up">{{ statDomains_onion['domains_up'] }}</a> UP
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_onion_domain_down">{{ statDomains_onion['domains_down'] }}</a> DOWN
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_domain_up">{{ splash_crawlers_latest_stats['onion']['domains_up'] }}</a> UP
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_onion_domain_down">{{ splash_crawlers_latest_stats['onion']['domains_down'] }}</a> DOWN
</div>
<div class="col-6">
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_total">{{ statDomains_onion['total'] }}</a> Crawled
<span class="badge badge-warning ml-md-3" id="stat_onion_queue">{{ statDomains_onion['domains_queue'] }}</span> Queue
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_total">{{ splash_crawlers_latest_stats['onion']['total'] }}</a> Crawled
<span class="badge badge-warning ml-md-3" id="stat_onion_queue">{{ splash_crawlers_latest_stats['onion']['domains_queue'] }}</span> Queue
</div>
</div>
</div>
<div class="card-body px-0 py-0 ">
<table class="table">
<tbody id="tbody_crawler_onion_info">
{% for crawler in crawler_metadata_onion %}
<tr>
<td>
<i class="fas fa-{%if crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};"></i> {{crawler['crawler_info']}}
</td>
<td>
{{crawler['crawling_domain']}}
</td>
<td style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};">
{{crawler['status_info']}}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
@ -73,58 +54,63 @@
<h5><a class="text-info" href="{{ url_for('hiddenServices.Crawler_Splash_last_by_type')}}?type=regular"><i class="fab fa-html5"></i> Regular Crawlers</a></h5>
<div class="row">
<div class="col-6">
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_domain_up">{{ statDomains_regular['domains_up'] }}</a> UP
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_regular_domain_down">{{ statDomains_regular['domains_down'] }}</a> DOWN
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_domain_up">{{ splash_crawlers_latest_stats['regular']['domains_up'] }}</a> UP
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_regular_domain_down">{{ splash_crawlers_latest_stats['regular']['domains_down'] }}</a> DOWN
</div>
<div class="col-6">
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_total">{{ statDomains_regular['total'] }}</a> Crawled
<span class="badge badge-warning ml-md-3" id="stat_regular_queue">{{ statDomains_regular['domains_queue'] }}</span> Queue
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_total">{{ splash_crawlers_latest_stats['regular']['total'] }}</a> Crawled
<span class="badge badge-warning ml-md-3" id="stat_regular_queue">{{ splash_crawlers_latest_stats['regular']['domains_queue'] }}</span> Queue
</div>
</div>
</div>
<div class="card-body px-0 py-0 ">
<table class="table">
<tbody id="tbody_crawler_regular_info">
{% for crawler in crawler_metadata_regular %}
<tr>
<td>
<i class="fas fa-{%if crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};"></i> {{crawler['crawler_info']}}
</td>
<td>
{{crawler['crawling_domain']}}
</td>
<td style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};">
{{crawler['status_info']}}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
</div>
{% include 'domains/block_domains_name_search.html' %}
<table class="table">
<tbody id="tbody_crawler_onion_info">
{% for splash_crawler in all_splash_crawler_status %}
<tr>
<td>
<i class="fas fa-{%if splash_crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if splash_crawler['status']%}Green{%else%}Red{%endif%};"></i> {{splash_crawler['crawler_info']}}
</td>
<td>
{%if splash_crawler['type']=='onion'%}
<i class="fas fa-user-secret"></i>
{%else%}
<i class="fab fa-html5">
{%endif%}
</td>
<td>
{{splash_crawler['crawling_domain']}}
</td>
<td style="color:{%if splash_crawler['status']%}Green{%else%}Red{%endif%};">
{{splash_crawler['status_info']}}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% include 'domains/block_domains_name_search.html' %}
<hr>
<div class="row mb-3">
<div class="col-xl-6">
<div class="text-center">
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_onion')}}" role="button">
<i class="fas fa-user-secret"></i> Onion Domain Explorer
</a>
<hr>
<div class="row mb-3">
<div class="col-xl-6">
<div class="text-center">
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_onion')}}" role="button">
<i class="fas fa-user-secret"></i> Onion Domain Explorer
</a>
</div>
</div>
<div class="col-xl-6">
<div class="text-center">
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_web')}}" role="button">
<i class="fab fa-html5"></i> Web Domain Explorer
</a>
</div>
</div>
</div>
<div class="col-xl-6">
<div class="text-center">
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_web')}}" role="button">
<i class="fab fa-html5"></i> Web Domain Explorer
</a>
</div>
</div>
</div>
@ -176,24 +162,24 @@ function toggle_sidebar(){
function refresh_crawler_status(){
$.getJSON("{{ url_for('hiddenServices.crawler_dashboard_json') }}",
$.getJSON("{{ url_for('crawler_splash.crawler_dashboard_json') }}",
function(data) {
$('#stat_onion_domain_up').text(data.statDomains_onion['domains_up']);
$('#stat_onion_domain_down').text(data.statDomains_onion['domains_down']);
$('#stat_onion_total').text(data.statDomains_onion['total']);
$('#stat_onion_queue').text(data.statDomains_onion['domains_queue']);
$('#stat_onion_domain_up').text(data.splash_crawlers_latest_stats['onion']['domains_up']);
$('#stat_onion_domain_down').text(data.splash_crawlers_latest_stats['onion']['domains_down']);
$('#stat_onion_total').text(data.splash_crawlers_latest_stats['onion']['total']);
$('#stat_onion_queue').text(data.splash_crawlers_latest_stats['onion']['domains_queue']);
$('#stat_regular_domain_up').text(data.statDomains_regular['domains_up']);
$('#stat_regular_domain_down').text(data.statDomains_regular['domains_down']);
$('#stat_regular_total').text(data.statDomains_regular['total']);
$('#stat_regular_queue').text(data.statDomains_regular['domains_queue']);
$('#stat_regular_domain_up').text(data.splash_crawlers_latest_stats['regular']['domains_up']);
$('#stat_regular_domain_down').text(data.splash_crawlers_latest_stats['regular']['domains_down']);
$('#stat_regular_total').text(data.splash_crawlers_latest_stats['regular']['total']);
$('#stat_regular_queue').text(data.splash_crawlers_latest_stats['regular']['domains_queue']);
if(data.crawler_metadata_onion.length!=0){
if(data.all_splash_crawler_status.length!=0){
$("#tbody_crawler_onion_info").empty();
var tableRef = document.getElementById('tbody_crawler_onion_info');
for (var i = 0; i < data.crawler_metadata_onion.length; i++) {
var crawler = data.crawler_metadata_onion[i];
for (var i = 0; i < data.all_splash_crawler_status.length; i++) {
var crawler = data.all_splash_crawler_status[i];
var newRow = tableRef.insertRow(tableRef.rows.length);
var text_color;
var icon;
@ -205,41 +191,22 @@ function refresh_crawler_status(){
icon = 'times';
}
var newCell = newRow.insertCell(0);
newCell.innerHTML = "<td><i class=\"fas fa-"+icon+"-circle\" style=\"color:"+text_color+";\"></i> "+crawler['crawler_info']+"</td>";
newCell = newRow.insertCell(1);
newCell.innerHTML = "<td>"+crawler['crawling_domain']+"</td>";
newCell = newRow.insertCell(2);
newCell.innerHTML = "<td><div style=\"color:"+text_color+";\">"+crawler['status_info']+"</div></td>";
//$("#panel_crawler").show();
}
}
if(data.crawler_metadata_regular.length!=0){
$("#tbody_crawler_regular_info").empty();
var tableRef = document.getElementById('tbody_crawler_regular_info');
for (var i = 0; i < data.crawler_metadata_regular.length; i++) {
var crawler = data.crawler_metadata_regular[i];
var newRow = tableRef.insertRow(tableRef.rows.length);
var text_color;
var icon;
if(crawler['status']){
text_color = 'Green';
icon = 'check';
if(crawler['type'] === 'onion'){
icon_t = 'fas fa-user-secret';
} else {
text_color = 'Red';
icon = 'times';
icon_t = 'fab fa-html5';
}
var newCell = newRow.insertCell(0);
newCell.innerHTML = "<td><i class=\"fas fa-"+icon+"-circle\" style=\"color:"+text_color+";\"></i> "+crawler['crawler_info']+"</td>";
newCell = newRow.insertCell(1);
newCell.innerHTML = "<td>"+crawler['crawling_domain']+"</td>";
var newCell = newRow.insertCell(1);
newCell.innerHTML = "<td><i class=\""+icon_t+"\"></i></td>";
newCell = newRow.insertCell(2);
newCell.innerHTML = "<td>"+crawler['crawling_domain']+"</td>";
newCell = newRow.insertCell(3);
newCell.innerHTML = "<td><div style=\"color:"+text_color+";\">"+crawler['status_info']+"</div></td>";
//$("#panel_crawler").show();

View file

@ -0,0 +1,60 @@
<!DOCTYPE html>
<html>
<head>
<title>AIL-Framework</title>
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png')}}">
<!-- Core CSS -->
<link href="{{ url_for('static', filename='css/bootstrap4.min.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='css/font-awesome.min.css') }}" rel="stylesheet">
<!-- JS -->
<script src="{{ url_for('static', filename='js/jquery.js')}}"></script>
<script src="{{ url_for('static', filename='js/bootstrap4.min.js')}}"></script>
</head>
<body>
{% include 'nav_bar.html' %}
<div class="container-fluid">
<div class="row">
{% include 'crawler/menu_sidebar.html' %}
<div class="col-12 col-lg-10" id="core_content">
<form action="{{ url_for('crawler_splash.crawler_splash_setings_crawlers_to_lauch') }}" method="post" enctype="multipart/form-data">
<h5 class="card-title">Number of Crawlers to Launch:</h5>
<table class="table table-sm">
<tbody>
{%for crawler_name in nb_crawlers_to_launch%}
<tr>
<td>{{crawler_name}}</td>
<td>
<input class="form-control" type="number" id="{{crawler_name}}" value="{{nb_crawlers_to_launch[crawler_name]}}" min="0" name="{{crawler_name}}" required>
</td>
</tr>
{%endfor%}
</tbody>
</table>
<button type="submit" class="btn btn-primary">Edit <i class="fas fa-pencil-alt"></i></button>
</form>
</div>
</div>
</div>
</body>
<script>
var to_refresh = false
$(document).ready(function(){
$("#page-Crawler").addClass("active");
$("#nav_settings").addClass("active");
});
</script>
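
This form posts one number input per crawler name; the matching route builds a `{crawler_name: value}` dict and hands it to `crawlers.api_set_nb_crawlers_to_launch()`, which follows the `(json_body, http_status)` tuple convention used throughout the blueprint. A small sketch with placeholder crawler names — the real names come from `get_nb_crawlers_to_launch_ui()`:

```python
# Placeholder payload: keys are crawler/Splash names, values the number of
# dockers to launch (submitted as strings by the HTML number inputs).
form_payload = {'tor': '3', 'splash_private': '1'}

def handle_api_result(res):
    # (json_body, http_status) convention used by the crawler_splash routes
    body, status = res
    if status != 200:
        raise RuntimeError('saving crawler count failed: {} {}'.format(status, body))
    return body
```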

View file

@ -0,0 +1,55 @@
<!DOCTYPE html>
<html>
<head>
<title>AIL-Framework</title>
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png')}}">
<!-- Core CSS -->
<link href="{{ url_for('static', filename='css/bootstrap4.min.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='css/font-awesome.min.css') }}" rel="stylesheet">
<!-- JS -->
<script src="{{ url_for('static', filename='js/jquery.js')}}"></script>
<script src="{{ url_for('static', filename='js/bootstrap4.min.js')}}"></script>
</head>
<body>
{% include 'nav_bar.html' %}
<div class="container-fluid">
<div class="row">
{% include 'crawler/menu_sidebar.html' %}
<div class="col-12 col-lg-10" id="core_content">
<form action="{{ url_for('crawler_splash.crawler_splash_setings_crawler_manager') }}" method="post" enctype="multipart/form-data">
<div class="form-group">
<label for="splash_manager_url">Splash Manager URL</label>
<input type="text" class="form-control" id="splash_manager_url" placeholder="https://splash_manager_url" name="splash_manager_url" {%if splash_manager_url%}value="{{splash_manager_url}}"{%endif%}>
</div>
<div class="form-group">
<label for="api_key">API Key</label>
<input type="text" class="form-control" id="api_key" placeholder="API Key" name="api_key" {%if api_key%}value="{{api_key}}"{%endif%}>
</div>
<button type="submit" class="btn btn-primary">Edit <i class="fas fa-pencil-alt"></i></button>
</form>
</div>
</div>
</div>
</body>
<script>
var to_refresh = false
$(document).ready(function(){
$("#page-Crawler").addClass("active");
$("#nav_settings").addClass("active");
});
</script>

View file

@ -0,0 +1,299 @@
<!DOCTYPE html>
<html>
<head>
<title>AIL-Framework</title>
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png')}}">
<!-- Core CSS -->
<link href="{{ url_for('static', filename='css/bootstrap4.min.css') }}" rel="stylesheet">
<link href="{{ url_for('static', filename='css/font-awesome.min.css') }}" rel="stylesheet">
<!-- JS -->
<script src="{{ url_for('static', filename='js/jquery.js')}}"></script>
<script src="{{ url_for('static', filename='js/bootstrap4.min.js')}}"></script>
</head>
<body>
{% include 'nav_bar.html' %}
<div class="container-fluid">
<div class="row">
{% include 'crawler/menu_sidebar.html' %}
<div class="col-12 col-lg-10" id="core_content">
<div class="row">
<div class="col-xl-6">
</div>
<div class="col-xl-6">
</div>
</div>
<div class="card mb-3 mt-1">
<div class="card-header bg-dark text-white">
<span class="badge badge-pill badge-light flex-row-reverse float-right">
{% if is_manager_connected['status'] %}
<div style="color:Green;">
<i class="fas fa-check-circle fa-2x"></i>
Connected
</div>
{% else %}
<div style="color:Red;">
<i class="fas fa-times-circle fa-2x"></i>
Error
</div>
{% endif %}
</span>
<h4>Splash Crawler Manager</h4>
</div>
<div class="card-body">
{%if not is_manager_connected['status']%}
{% include 'crawler/crawler_disabled.html' %}
{%endif%}
<div class="row mb-3 justify-content-center">
<div class="col-xl-6">
<div class="card text-center border-secondary">
<div class="card-body px-1 py-0">
<table class="table table-sm">
<tbody>
<tr>
<td>Splash Manager URL</td>
<td>{{splash_manager_url}}</td>
</tr>
<tr>
<td>API Key</td>
<td>
{{api_key}}
<!-- <a class="ml-3" href="/settings/new_token"><i class="fa fa-random"></i></a> -->
</td>
<td>
<a href="{{ url_for('crawler_splash.crawler_splash_setings_crawler_manager') }}">
<button type="button" class="btn btn-info">
Edit <i class="fas fa-pencil-alt"></i>
</button>
</a>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
</div>
<div {%if not is_manager_connected['status']%}class="hidden"{%endif%}>
<div class="card border-secondary mb-4">
<div class="card-body text-dark">
<h5 class="card-title">Number of Crawlers to Launch:</h5>
<table class="table table-sm">
<tbody>
{%for crawler in nb_crawlers_to_launch%}
<tr>
<td>{{crawler}}</td>
<td>{{nb_crawlers_to_launch[crawler]}}</td>
</tr>
{%endfor%}
</tbody>
</table>
<a href="{{ url_for('crawler_splash.crawler_splash_setings_crawlers_to_lauch') }}">
<button type="button" class="btn btn-info">
Edit number of crawlers to launch <i class="fas fa-pencil-alt"></i>
</button>
</a>
</div>
</div>
<div class="card border-secondary mb-4">
<div class="card-body text-dark">
<h5 class="card-title">All Splash Crawlers:</h5>
<table class="table table-striped">
<thead class="bg-info text-white">
<th>
Splash name
</th>
<th>
Proxy
</th>
<th>
Crawler type
</th>
<th>
Description
</th>
<th></th>
</thead>
<tbody>
{% for splash_name in all_splash %}
<tr>
<td>
{{splash_name}}
</td>
<td>
{{all_splash[splash_name]['proxy']}}
</td>
<td>
{%if all_splash[splash_name]['type']=='tor'%}
<i class="fas fa-user-secret"></i>
{%else%}
<i class="fab fa-html5">
{%endif%}
{{all_splash[splash_name]['type']}}
</td>
<td>
{{all_splash[splash_name]['description']}}
</td>
<td>
<div class="d-flex justify-content-end">
<button class="btn btn-outline-dark px-1 py-0">
<i class="fas fa-pencil-alt"></i>
</button>
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<div class="card border-secondary">
<div class="card-body text-dark">
<h5 class="card-title">All Proxies:</h5>
<table class="table table-striped">
<thead class="bg-info text-white">
<th>
Proxy name
</th>
<th>
Host
</th>
<th>
Port
</th>
<th>
Type
</th>
<th>
Crawler Type
</th>
<th>
Description
</th>
<th></th>
</thead>
<tbody>
{% for proxy_name in all_proxies %}
<tr>
<td>
{{proxy_name}}
</td>
<td>
{{all_proxies[proxy_name]['host']}}
</td>
<td>
{{all_proxies[proxy_name]['port']}}
</td>
<td>
{{all_proxies[proxy_name]['type']}}
</td>
<td>
{%if all_proxies[proxy_name]['crawler_type']=='tor'%}
<i class="fas fa-user-secret"></i>
{%else%}
<i class="fab fa-html5">
{%endif%}
{{all_proxies[proxy_name]['crawler_type']}}
</td>
<td>
{{all_proxies[proxy_name]['description']}}
</td>
<td>
<div class="d-flex justify-content-end">
<button class="btn btn-outline-dark px-1 py-0">
<i class="fas fa-pencil-alt"></i>
</button>
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
<div class="card mb-3 mt-1">
<div class="card-header bg-dark text-white">
<h4>Crawlers Settings</h4>
</div>
<div class="card-body">
<table class="table table-striped table-hover">
<thead class="bg-info text-white">
<th>
Key
</th>
<th>
Description
</th>
<th>
Value
</th>
<th></th>
</thead>
<tbody>
{% for config_field in crawler_full_config %}
<tr>
<td>
{{config_field}}
</td>
<td>
{{crawler_full_config[config_field]['info']}}
</td>
<td>
{{crawler_full_config[config_field]['value']}}
</td>
<td>
<div class="d-flex justify-content-end">
<button class="btn btn-outline-dark px-1 py-0">
<i class="fas fa-pencil-alt"></i>
</button>
</div>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</body>
<script>
var to_refresh = false
$(document).ready(function(){
$("#page-Crawler").addClass("active");
$("#nav_settings").addClass("active");
});
</script>
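
For readers wiring up the Splash Manager, the settings template above expects `all_splash` and `all_proxies` as nested dicts. The field names below mirror the template lookups; the concrete values are invented for illustration:

```python
# Illustrative shapes only -- values are made up, field names match the template.
all_splash = {
    'splash_onion_1': {
        'proxy': 'default_tor_proxy',
        'type': 'tor',          # 'tor' renders the fa-user-secret icon, anything else fa-html5
        'description': 'Splash docker dedicated to onion crawling',
    },
}

all_proxies = {
    'default_tor_proxy': {
        'host': '172.17.0.1',   # docker bridge address, as an example
        'port': 9050,
        'type': 'SOCKS5',
        'crawler_type': 'tor',
        'description': 'Tor SOCKS5 proxy reachable from the Splash dockers',
    },
}
```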

View file

@ -14,7 +14,7 @@
</h5>
<ul class="nav flex-md-column flex-row navbar-nav justify-content-between w-100"> <!--nav-pills-->
<li class="nav-item">
<a class="nav-link" href="{{url_for('hiddenServices.dashboard')}}" id="nav_dashboard">
<a class="nav-link" href="{{url_for('crawler_splash.crawlers_dashboard')}}" id="nav_dashboard">
<i class="fas fa-search"></i>
<span>Dashboard</span>
</a>
@ -43,6 +43,12 @@
Automatic Crawler
</a>
</li>
<li class="nav-item">
<a class="nav-link" href="{{url_for('crawler_splash.crawler_splash_setings')}}" id="nav_settings">
<i class="fas fa-cog"></i>
Settings
</a>
</li>
</ul>
<h5 class="d-flex text-muted w-100" id="nav_title_domains_explorer">

View file

@ -0,0 +1,42 @@
<div class="modal fade" id="update_modal" tabindex="-1" role="dialog" aria-labelledby="exampleModalLabel" aria-hidden="true">
<div class="modal-dialog modal-lg" role="document">
<div class="modal-content">
<div class="modal-header bg-secondary text-white">
<h5 class="modal-title" id="exampleModalLabel">Update Note: v3.5 - Splash Manager</h5>
<button type="button" class="close" data-dismiss="modal" aria-label="Close">
<span aria-hidden="true">&times;</span>
</button>
</div>
<div class="modal-body">
<div class="alert alert-danger text-danger" role="alert">All Splash Crawlers have been removed from the core.</div>
AIL now uses a new Crawler Manager to launch and manage Splash Dockers and Tor/web crawlers.
<ul class="list-group my-3">
<li class="list-group-item active">Splash Manager Features:</li>
<li class="list-group-item">Install and run Splash crawlers on another server</li>
<li class="list-group-item">Handle proxies (Web and tor)</li>
<li class="list-group-item">Launch/Kill Splash Dockers</li>
<li class="list-group-item">Restart crawlers on crash</li>
</ul>
<div class="d-flex justify-content-center">
<a class="btn btn-info" href="https://github.com/ail-project/ail-splash-manager" role="button">
<i class="fab fa-github"></i> Install and Configure AIL-Splash-Manager
</a>
</div>
</div>
<div class="modal-footer">
<button type="button" class="btn btn-secondary" data-dismiss="modal">Close</button>
</div>
</div>
</div>
</div>
<script>
$(window).on('load', function() {
$('#update_modal').modal('show');
});
</script>

View file

@ -22,7 +22,7 @@
<a class="nav-link" id="page-Tracker" href="{{ url_for('hunter.tracked_menu') }}" aria-disabled="true"><i class="fas fa-crosshairs"></i> Leaks Hunter</a>
</li>
<li class="nav-item mr-3">
<a class="nav-link" id="page-Crawler" href="{{ url_for('hiddenServices.dashboard') }}" tabindex="-1" aria-disabled="true"><i class="fas fa-spider"></i> Crawlers</a>
<a class="nav-link" id="page-Crawler" href="{{ url_for('crawler_splash.crawlers_dashboard') }}" tabindex="-1" aria-disabled="true"><i class="fas fa-spider"></i> Crawlers</a>
</li>
<li class="nav-item mr-3">
<a class="nav-link" id="page-Decoded" href="{{ url_for('hashDecoded.hashDecoded_page') }}" aria-disabled="true"><i class="fas fa-cube"></i> Objects</a>