mirror of
https://github.com/ail-project/ail-framework.git
synced 2024-11-10 08:38:28 +00:00
Merge pull request #89 from ail-project/crawler_manager
Crawler manager
Commit fa05244ee0
32 changed files with 1566 additions and 412 deletions
HOWTO.md (68 changes)
@ -89,76 +89,34 @@ Also, you can quickly stop or start modules by clicking on the ``<K>`` or ``<S>``

Finally, you can quit this program by pressing either ``<q>`` or ``<C-c>``.

Terms frequency usage
---------------------

In AIL, you can track terms, sets of terms and even regexes without creating a dedicated module. To do so, go to the `Terms Frequency` tab in the web interface.

- You can track a term by simply putting it in the box.

- You can track a set of terms by putting the terms in an array surrounded by the '\' character. You can also set a custom threshold for the number of terms that must match to trigger the detection. For example, to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`

- You can track regexes as easily as a term: just put your regex in the box surrounded by the '/' character. For example, to track a regex matching every email address on the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/` (see the sketch below for a tighter variant).
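Purely as an illustration, and assuming standard Python-style regex syntax for the tracker (check your AIL version's tracker documentation), the sketch below shows a stricter pattern that matches only e-mail addresses on _domain.net_ rather than anything ending in it:

```python
import re

# Hypothetical, stricter alternative to the aggressive `/*.domain.net/` rule above:
# only full e-mail addresses on domain.net are matched.
EMAIL_ON_DOMAIN = re.compile(r'[\w.+-]+@domain\.net\b')

samples = [
    'contact: alice@domain.net',         # matches
    'see https://www.domain.net/about',  # no e-mail address -> no match
]
for text in samples:
    print(bool(EMAIL_ON_DOMAIN.search(text)), text)
```

In the tracker box, the pattern itself would sit between the '/' characters.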
Crawler
---------------------

In AIL, you can crawl Tor hidden services. Don't forget to review the proxy configuration of your Tor client, especially if you enabled the SOCKS5 proxy, and bind it to an IP address reachable from the Docker containers where Splash runs.

There are two types of installation: you can install a *local* or a *remote* Splash server.

``(Splash host) = the server running the Splash service``
``(AIL host) = the server running AIL``

### Installation/Configuration

1. *(Splash host)* Launch ``crawler_hidden_services_install.sh`` to install all requirements (type ``y`` if a localhost Splash server is used, or use the ``-y`` option).

2. *(Splash host)* Install and set up your Tor proxy:
    - Install the Tor proxy: ``sudo apt-get install tor -y``
      (not required if ``Splash host == AIL host`` - the Tor proxy is installed by default in AIL)
      (Warning: some v3 onion addresses are not resolved by the Tor proxy provided via apt. Use the Tor proxy provided by [The Tor Project](https://2019.www.torproject.org/docs/debian) to solve this issue.)
    - Allow Tor to bind to any interface or to the Docker interface (by default it binds to 127.0.0.1 only) in ``/etc/tor/torrc``:
      ``SOCKSPort 0.0.0.0:9050`` or
      ``SOCKSPort 172.17.0.1:9050``
    - Add the following line to ``/etc/tor/torrc``: ``SOCKSPolicy accept 172.17.0.0/16``
      (on a Linux Docker host, the bridge IP is *172.17.0.1*; adapt this for other platforms)
    - Restart the Tor proxy: ``sudo service tor restart``

3. *(AIL host)* Edit the ``/configs/core.cfg`` file:
    - In the crawler section, set ``activate_crawler`` to ``True``
    - Change the IP address of the Splash servers if needed (remote only)
    - Set ``splash_onion_port`` according to the Splash server ports that will be used.
      These port numbers can be given as a single port (e.g. 8050) or a port range (e.g. 8050-8052 for ports 8050, 8051 and 8052).
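For reference, here is a minimal sketch of the crawler section of ``/configs/core.cfg`` after step 3. Only ``activate_crawler``, ``splash_url``, ``splash_onion_port``, ``default_crawler_har`` and ``default_crawler_user_agent`` are keys referenced elsewhere in this PR; the values shown are examples, not defaults to copy blindly:

```
[Crawler]
activate_crawler = True
splash_url = 127.0.0.1
splash_onion_port = 8050-8052
default_crawler_har = True
default_crawler_user_agent = Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0
```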
### Installation

### Starting the scripts

[Install AIL-Splash-Manager](https://github.com/ail-project/ail-splash-manager)

- *(Splash host)* Launch all Splash servers with:
```sudo ./bin/torcrawler/launch_splash_crawler.sh -f <config absolute_path> -p <port_start> -n <number_of_splash>```
With ``<port_start>`` and ``<number_of_splash>`` matching those specified at ``splash_onion_port`` in the configuration file of step 3 (``/configs/core.cfg``).

### Configuration

All Splash Dockers are launched inside the ``Docker_Splash`` screen. You can use ``sudo screen -r Docker_Splash`` to connect to the screen session and check the status of all Splash servers.

- *(AIL host)* Launch all AIL crawler scripts using:
```./bin/LAUNCH.sh -c```

1. Retrieve the Splash-Manager API key. This API key is generated when you launch the manager for the first time.
(located in your Splash Manager directory: ``ail-splash-manager/token_admin.txt``)

### TL;DR - Local setup

2. Splash Manager URL and API key:
In the web interface, go to ``Crawlers > Settings`` and click on the Edit button.
![Splash Manager Config](./doc/screenshots/splash_manager_config_edit_1.png?raw=true "AIL framework Splash Manager Config")

#### Installation
- ```crawler_hidden_services_install.sh -y```
- Add the line ``SOCKSPolicy accept 172.17.0.0/16`` to ``/etc/tor/torrc``
- ```sudo service tor restart```
- Set ``activate_crawler`` to ``True`` in ``/configs/core.cfg``

#### Start
- ```sudo ./bin/torcrawler/launch_splash_crawler.sh -f $AIL_HOME/configs/docker/splash_onion/etc/splash/proxy-profiles/ -p 8050 -n 1```

![Splash Manager Config](./doc/screenshots/splash_manager_config_edit_2.png?raw=true "AIL framework Splash Manager Config")

If the AIL framework is not running, start it before the crawler service:

3. Launch AIL crawlers:
Choose the number of crawlers you want to launch.
![Splash Manager Nb Crawlers Config](./doc/screenshots/splash_manager_nb_crawlers_1.png?raw=true "AIL framework Nb Crawlers Config")
![Splash Manager Nb Crawlers Config](./doc/screenshots/splash_manager_nb_crawlers_2.png?raw=true "AIL framework Nb Crawlers Config")

- ```./bin/LAUNCH.sh -l```

Then start the crawler service (if you followed the procedure above):

- ```./bin/LAUNCH.sh -c```

#### Old updates
OVERVIEW.md (27 changes)
@ -420,6 +420,33 @@ Supported cryptocurrency:
}
```

### Splash containers and proxies:

| SET - Key | Value |
| ------ | ------ |
| all_proxy | **proxy name** |
| all_splash | **splash name** |

| HSET - Key | Field | Value |
| ------ | ------ | ------ |
| proxy:metadata:**proxy name** | host | **host** |
| proxy:metadata:**proxy name** | port | **port** |
| proxy:metadata:**proxy name** | type | **type** |
| proxy:metadata:**proxy name** | crawler_type | **crawler_type** |
| proxy:metadata:**proxy name** | description | **proxy description** |
| | | |
| splash:metadata:**splash name** | description | **splash description** |
| splash:metadata:**splash name** | crawler_type | **crawler_type** |
| splash:metadata:**splash name** | proxy | **splash proxy (None if null)** |

| SET - Key | Value |
| ------ | ------ |
| splash:url:**container name** | **splash url** |
| proxy:splash:**proxy name** | **container name** |

| Key | Value |
| ------ | ------ |
| splash:map:url:name:**splash url** | **container name** |

##### CRAWLER QUEUES:
| SET - Key | Value |
| ------ | ------ |
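As an illustration of how these keys fit together, here is a minimal, hypothetical redis-py sketch that lists all proxies and reads their metadata hashes, mirroring the new helpers in `crawlers.py`; the connection parameters are assumptions (AIL normally resolves the ``ARDB_Onion`` connection through ConfigLoader):

```python
import redis

# Assumed connection settings for the ARDB_Onion database.
r_serv_onion = redis.StrictRedis(host='localhost', port=6382, db=0, decode_responses=True)

def get_proxy_metadata(proxy_name):
    # Read the HSET described above: proxy:metadata:<proxy name>
    fields = ['host', 'port', 'type', 'crawler_type', 'description']
    return {field: r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), field)
            for field in fields}

# 'all_proxy' is the SET of known proxy names
for proxy_name in r_serv_onion.smembers('all_proxy'):
    print(proxy_name, get_proxy_metadata(proxy_name))
    # 'proxy:splash:<proxy name>' maps a proxy to its Splash container names
    print('  containers:', r_serv_onion.smembers('proxy:splash:{}'.format(proxy_name)))
```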
bin/Crawler.py (109 changes)
|
@ -19,6 +19,9 @@ sys.path.append(os.environ['AIL_BIN'])
|
|||
from Helper import Process
|
||||
from pubsublogger import publisher
|
||||
|
||||
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
|
||||
import crawlers
|
||||
|
||||
# ======== FUNCTIONS ========
|
||||
|
||||
def load_blacklist(service_type):
|
||||
|
@ -117,43 +120,6 @@ def unpack_url(url):
|
|||
|
||||
return to_crawl
|
||||
|
||||
# get url, paste and service_type to crawl
|
||||
def get_elem_to_crawl(rotation_mode):
|
||||
message = None
|
||||
domain_service_type = None
|
||||
|
||||
#load_priority_queue
|
||||
for service_type in rotation_mode:
|
||||
message = redis_crawler.spop('{}_crawler_priority_queue'.format(service_type))
|
||||
if message is not None:
|
||||
domain_service_type = service_type
|
||||
break
|
||||
#load_discovery_queue
|
||||
if message is None:
|
||||
for service_type in rotation_mode:
|
||||
message = redis_crawler.spop('{}_crawler_discovery_queue'.format(service_type))
|
||||
if message is not None:
|
||||
domain_service_type = service_type
|
||||
break
|
||||
#load_normal_queue
|
||||
if message is None:
|
||||
for service_type in rotation_mode:
|
||||
message = redis_crawler.spop('{}_crawler_queue'.format(service_type))
|
||||
if message is not None:
|
||||
domain_service_type = service_type
|
||||
break
|
||||
|
||||
if message:
|
||||
splitted = message.rsplit(';', 1)
|
||||
if len(splitted) == 2:
|
||||
url, paste = splitted
|
||||
if paste:
|
||||
paste = paste.replace(PASTES_FOLDER+'/', '')
|
||||
|
||||
message = {'url': url, 'paste': paste, 'type_service': domain_service_type, 'original_message': message}
|
||||
|
||||
return message
|
||||
|
||||
def get_crawler_config(redis_server, mode, service_type, domain, url=None):
|
||||
crawler_options = {}
|
||||
if mode=='auto':
|
||||
|
@ -175,14 +141,17 @@ def get_crawler_config(redis_server, mode, service_type, domain, url=None):
|
|||
redis_server.delete('crawler_config:{}:{}:{}'.format(mode, service_type, domain))
|
||||
return crawler_options
|
||||
|
||||
def load_crawler_config(service_type, domain, paste, url, date):
|
||||
def load_crawler_config(queue_type, service_type, domain, paste, url, date):
|
||||
crawler_config = {}
|
||||
crawler_config['splash_url'] = splash_url
|
||||
crawler_config['splash_url'] = f'http://{splash_url}'
|
||||
crawler_config['item'] = paste
|
||||
crawler_config['service_type'] = service_type
|
||||
crawler_config['domain'] = domain
|
||||
crawler_config['date'] = date
|
||||
|
||||
if queue_type and queue_type != 'tor':
|
||||
service_type = queue_type
|
||||
|
||||
# Auto and Manual Crawling
|
||||
# Auto ################################################# create new entry, next crawling => here or when ended ?
|
||||
if paste == 'auto':
|
||||
|
@ -224,26 +193,29 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
|
|||
crawler_config['port'] = port
|
||||
print('Launching Crawler: {}'.format(url))
|
||||
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'crawling_domain', domain)
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'crawling_domain', domain)
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
|
||||
|
||||
retry = True
|
||||
nb_retry = 0
|
||||
while retry:
|
||||
try:
|
||||
r = requests.get(splash_url , timeout=30.0)
|
||||
r = requests.get(f'http://{splash_url}' , timeout=30.0)
|
||||
retry = False
|
||||
except Exception:
|
||||
# TODO: relaunch docker or send error message
|
||||
nb_retry += 1
|
||||
|
||||
if nb_retry == 2:
|
||||
crawlers.restart_splash_docker(splash_url, splash_name)
|
||||
|
||||
if nb_retry == 6:
|
||||
on_error_send_message_back_in_queue(type_service, domain, message)
|
||||
publisher.error('{} SPLASH DOWN'.format(splash_url))
|
||||
print('--------------------------------------')
|
||||
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
|
||||
print(' {} DOWN'.format(splash_url))
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'SPLASH DOWN')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'SPLASH DOWN')
|
||||
nb_retry = 0  # reset the retry counter after reporting the error
|
||||
|
||||
print(' \033[91m DOCKER SPLASH NOT AVAILABLE\033[0m')
|
||||
|
@ -251,7 +223,7 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
|
|||
time.sleep(10)
|
||||
|
||||
if r.status_code == 200:
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Crawling')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Crawling')
|
||||
# save config in cache
|
||||
UUID = str(uuid.uuid4())
|
||||
r_cache.set('crawler_request:{}'.format(UUID), json.dumps(crawler_config))
|
||||
|
@ -273,8 +245,10 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
|
|||
print('')
|
||||
print(' \033[91m{} PROXY DOWN OR BAD CONFIGURATION\033[0m'.format(splash_url))
|
||||
print('------------------------------------------------------------------------')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Error')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Error')
|
||||
exit(-2)
|
||||
else:
|
||||
crawlers.update_splash_manager_connection_status(True)
|
||||
else:
|
||||
print(process.stdout.read())
|
||||
exit(-1)
|
||||
|
@ -283,7 +257,7 @@ def crawl_onion(url, domain, port, type_service, message, crawler_config):
|
|||
print('--------------------------------------')
|
||||
print(' \033[91m DOCKER SPLASH DOWN\033[0m')
|
||||
print(' {} DOWN'.format(splash_url))
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Crawling')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Crawling')
|
||||
exit(1)
|
||||
|
||||
# check external links (full_crawl)
|
||||
|
@ -305,13 +279,27 @@ def search_potential_source_domain(type_service, domain):
|
|||
if __name__ == '__main__':
|
||||
|
||||
if len(sys.argv) != 2:
|
||||
print('usage:', 'Crawler.py', 'splash_port')
|
||||
print('usage:', 'Crawler.py', 'splash_url')
|
||||
exit(1)
|
||||
##################################################
|
||||
#mode = sys.argv[1]
|
||||
splash_port = sys.argv[1]
|
||||
splash_url = sys.argv[1]
|
||||
|
||||
splash_name = crawlers.get_splash_name_by_url(splash_url)
|
||||
proxy_name = crawlers.get_splash_proxy(splash_name)
|
||||
crawler_type = crawlers.get_splash_crawler_type(splash_name)
|
||||
|
||||
print(f'SPLASH Name: {splash_name}')
|
||||
print(f'Proxy Name: {proxy_name}')
|
||||
print(f'Crawler Type: {crawler_type}')
|
||||
|
||||
#time.sleep(10)
|
||||
#sys.exit(0)
|
||||
|
||||
#rotation_mode = deque(['onion', 'regular'])
|
||||
all_crawler_queues = crawlers.get_crawler_queue_types_by_splash_name(splash_name)
|
||||
rotation_mode = deque(all_crawler_queues)
|
||||
print(rotation_mode)
|
||||
|
||||
rotation_mode = deque(['onion', 'regular'])
|
||||
default_proto_map = {'http': 80, 'https': 443}
|
||||
######################################################## add ftp ???
|
||||
|
||||
|
@ -323,7 +311,6 @@ if __name__ == '__main__':
|
|||
# Setup the I/O queues
|
||||
p = Process(config_section)
|
||||
|
||||
splash_url = '{}:{}'.format( p.config.get("Crawler", "splash_url"), splash_port)
|
||||
print('splash url: {}'.format(splash_url))
|
||||
|
||||
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], p.config.get("Directories", "pastes"))
|
||||
|
@ -346,7 +333,7 @@ if __name__ == '__main__':
|
|||
db=p.config.getint("ARDB_Onion", "db"),
|
||||
decode_responses=True)
|
||||
|
||||
faup = Faup()
|
||||
faup = crawlers.get_faup()
|
||||
|
||||
# get HAR files
|
||||
default_crawler_har = p.config.getboolean("Crawler", "default_crawler_har")
|
||||
|
@ -372,9 +359,9 @@ if __name__ == '__main__':
|
|||
'user_agent': p.config.get("Crawler", "default_crawler_user_agent")}
|
||||
|
||||
# Track launched crawler
|
||||
r_cache.sadd('all_crawler', splash_port)
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Waiting')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
|
||||
r_cache.sadd('all_splash_crawlers', splash_url)
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Waiting')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'started_time', datetime.datetime.now().strftime("%Y/%m/%d - %H:%M.%S"))
|
||||
|
||||
# update hardcoded blacklist
|
||||
load_blacklist('onion')
|
||||
|
@ -385,7 +372,7 @@ if __name__ == '__main__':
|
|||
update_auto_crawler()
|
||||
|
||||
rotation_mode.rotate()
|
||||
to_crawl = get_elem_to_crawl(rotation_mode)
|
||||
to_crawl = crawlers.get_elem_to_crawl_by_queue_type(rotation_mode)
|
||||
if to_crawl:
|
||||
url_data = unpack_url(to_crawl['url'])
|
||||
# remove domain from queue
|
||||
|
@ -408,9 +395,9 @@ if __name__ == '__main__':
|
|||
'epoch': int(time.time())}
|
||||
|
||||
# Update crawler status type
|
||||
r_cache.sadd('{}_crawlers'.format(to_crawl['type_service']), splash_port)
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'type', to_crawl['type_service'])
|
||||
|
||||
crawler_config = load_crawler_config(to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
|
||||
crawler_config = load_crawler_config(to_crawl['queue_type'], to_crawl['type_service'], url_data['domain'], to_crawl['paste'], to_crawl['url'], date)
|
||||
# check if default crawler
|
||||
if not crawler_config['requested']:
|
||||
# Auto crawl only if service not up this month
|
||||
|
@ -456,11 +443,11 @@ if __name__ == '__main__':
|
|||
redis_crawler.ltrim('last_{}'.format(to_crawl['type_service']), 0, 15)
|
||||
|
||||
#update crawler status
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_port), 'status', 'Waiting')
|
||||
r_cache.hdel('metadata_crawler:{}'.format(splash_port), 'crawling_domain')
|
||||
r_cache.hset('metadata_crawler:{}'.format(splash_url), 'status', 'Waiting')
|
||||
r_cache.hdel('metadata_crawler:{}'.format(splash_url), 'crawling_domain')
|
||||
|
||||
# Update crawler status type
|
||||
r_cache.srem('{}_crawlers'.format(to_crawl['type_service']), splash_port)
|
||||
r_cache.hdel('metadata_crawler:{}'.format(splash_url), 'type', to_crawl['type_service'])
|
||||
|
||||
# add next auto Crawling in queue:
|
||||
if to_crawl['paste'] == 'auto':
|
||||
|
|
@ -150,6 +150,8 @@ function launching_scripts {
    # LAUNCH CORE MODULE
    screen -S "Script_AIL" -X screen -t "JSON_importer" bash -c "cd ${AIL_BIN}/import; ${ENV_PY} ./JSON_importer.py; read x"
    sleep 0.1
    screen -S "Script_AIL" -X screen -t "Crawler_manager" bash -c "cd ${AIL_BIN}/core; ${ENV_PY} ./Crawler_manager.py; read x"
    sleep 0.1

    screen -S "Script_AIL" -X screen -t "ModuleInformation" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./ModulesInformationV2.py -k 0 -c 1; read x"

@ -198,8 +200,8 @@ function launching_scripts {
    sleep 0.1
    screen -S "Script_AIL" -X screen -t "Tools" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Tools.py; read x"
    sleep 0.1
    screen -S "Script_AIL" -X screen -t "Phone" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Phone.py; read x"
    sleep 0.1
    #screen -S "Script_AIL" -X screen -t "Phone" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Phone.py; read x"
    #sleep 0.1
    #screen -S "Script_AIL" -X screen -t "Release" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Release.py; read x"
    #sleep 0.1
    screen -S "Script_AIL" -X screen -t "Cve" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./Cve.py; read x"
bin/core/Crawler_manager.py (new executable file, 66 lines)
@ -0,0 +1,66 @@
#!/usr/bin/env python3
# -*-coding:UTF-8 -*

import os
import sys
import time

sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import ConfigLoader
import crawlers

config_loader = ConfigLoader.ConfigLoader()
r_serv_metadata = config_loader.get_redis_conn("ARDB_Metadata")
config_loader = None

# # TODO: launch me in the core screen
# # TODO: check if already launched in the tor screen

# # TODO: handle multiple splash managers
if __name__ == '__main__':

    is_manager_connected = crawlers.ping_splash_manager()
    if not is_manager_connected:
        print('Error, can\'t connect to the Splash manager')
        session_uuid = None
    else:
        print('Splash manager connected')
        session_uuid = crawlers.get_splash_manager_session_uuid()
        is_manager_connected = crawlers.reload_splash_and_proxies_list()
        print(is_manager_connected)
    if is_manager_connected:
        crawlers.relaunch_crawlers()
    last_check = int(time.time())

    while True:

        # # TODO: avoid multiple pings

        # check if the manager is connected
        if int(time.time()) - last_check > 60:
            is_manager_connected = crawlers.is_splash_manager_connected()
            current_session_uuid = crawlers.get_splash_manager_session_uuid()
            # reload the proxy and splash lists
            if current_session_uuid and current_session_uuid != session_uuid:
                is_manager_connected = crawlers.reload_splash_and_proxies_list()
                if is_manager_connected:
                    print('reloaded proxies and splash list')
                    crawlers.relaunch_crawlers()
                session_uuid = current_session_uuid
            if not is_manager_connected:
                print('Error, can\'t connect to the Splash manager')
            last_check = int(time.time())

        # # TODO: launch crawlers if the manager was never connected
        # refresh splash and proxy list
        elif False:
            crawlers.reload_splash_and_proxies_list()
            print('list of splash and proxies refreshed')
        else:
            time.sleep(5)

        # kill/launch new crawler / crawler manager check if already launched

# # TODO: handle multiple splash managers
# catch reload requests
|
@ -4,6 +4,7 @@
|
|||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import re
|
||||
|
||||
all_screen_name = set()
|
||||
|
||||
|
@ -16,8 +17,11 @@ def is_screen_install():
|
|||
print(p.stderr)
|
||||
return False
|
||||
|
||||
def exist_screen(screen_name):
|
||||
cmd_1 = ['screen', '-ls']
|
||||
def exist_screen(screen_name, with_sudoer=False):
|
||||
if with_sudoer:
|
||||
cmd_1 = ['sudo', 'screen', '-ls']
|
||||
else:
|
||||
cmd_1 = ['screen', '-ls']
|
||||
cmd_2 = ['egrep', '[0-9]+.{}'.format(screen_name)]
|
||||
p1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
|
||||
p2 = subprocess.Popen(cmd_2, stdin=p1.stdout, stdout=subprocess.PIPE)
|
||||
|
@ -27,6 +31,36 @@ def exist_screen(screen_name):
|
|||
return True
|
||||
return False
|
||||
|
||||
def get_screen_pid(screen_name, with_sudoer=False):
|
||||
if with_sudoer:
|
||||
cmd_1 = ['sudo', 'screen', '-ls']
|
||||
else:
|
||||
cmd_1 = ['screen', '-ls']
|
||||
cmd_2 = ['egrep', '[0-9]+.{}'.format(screen_name)]
|
||||
p1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
|
||||
p2 = subprocess.Popen(cmd_2, stdin=p1.stdout, stdout=subprocess.PIPE)
|
||||
p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
|
||||
output = p2.communicate()[0]
|
||||
if output:
|
||||
# extract pids with screen name
|
||||
regex_pid_screen_name = b'[0-9]+.' + screen_name.encode()
|
||||
pids = re.findall(regex_pid_screen_name, output)
|
||||
# extract pids
|
||||
all_pids = []
|
||||
for pid_name in pids:
|
||||
pid = pid_name.split(b'.')[0].decode()
|
||||
all_pids.append(pid)
|
||||
return all_pids
|
||||
return []
|
||||
|
||||
def detach_screen(screen_name):
|
||||
cmd = ['screen', '-d', screen_name]
|
||||
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
#if p.stdout:
|
||||
# print(p.stdout)
|
||||
if p.stderr:
|
||||
print(p.stderr)
|
||||
|
||||
def create_screen(screen_name):
|
||||
if not exist_screen(screen_name):
|
||||
cmd = ['screen', '-dmS', screen_name]
|
||||
|
@ -38,18 +72,59 @@ def create_screen(screen_name):
|
|||
print(p.stderr)
|
||||
return False
|
||||
|
||||
def kill_screen(screen_name, with_sudoer=False):
|
||||
if get_screen_pid(screen_name, with_sudoer=with_sudoer):
|
||||
for pid in get_screen_pid(screen_name, with_sudoer=with_sudoer):
|
||||
cmd = ['kill', pid]
|
||||
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
if p.stderr:
|
||||
print(p.stderr)
|
||||
else:
|
||||
print('{} killed'.format(pid))
|
||||
return True
|
||||
return False
|
||||
|
||||
# # TODO: add check if len(window_name) == 20
|
||||
# use: screen -S 'pid.screen_name' -p %window_id% -Q title
|
||||
# if len(windows_name) > 20 (truncated by default)
|
||||
def get_screen_windows_list(screen_name):
|
||||
def get_screen_windows_list(screen_name, r_set=True):
|
||||
# detach screen to avoid incomplete result
|
||||
detach_screen(screen_name)
|
||||
if r_set:
|
||||
all_windows_name = set()
|
||||
else:
|
||||
all_windows_name = []
|
||||
cmd = ['screen', '-S', screen_name, '-Q', 'windows']
|
||||
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
if p.stdout:
|
||||
for window_row in p.stdout.split(b' '):
|
||||
window_id, window_name = window_row.decode().split()
|
||||
print(window_id)
|
||||
print(window_name)
|
||||
print('---')
|
||||
#print(window_id)
|
||||
#print(window_name)
|
||||
#print('---')
|
||||
if r_set:
|
||||
all_windows_name.add(window_name)
|
||||
else:
|
||||
all_windows_name.append(window_name)
|
||||
if p.stderr:
|
||||
print(p.stderr)
|
||||
return all_windows_name
|
||||
|
||||
def get_screen_windows_id(screen_name):
|
||||
# detach screen to avoid incomplete result
|
||||
detach_screen(screen_name)
|
||||
all_windows_id = {}
|
||||
cmd = ['screen', '-S', screen_name, '-Q', 'windows']
|
||||
p = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
if p.stdout:
|
||||
for window_row in p.stdout.split(b' '):
|
||||
window_id, window_name = window_row.decode().split()
|
||||
if window_name not in all_windows_id:
|
||||
all_windows_id[window_name] = []
|
||||
all_windows_id[window_name].append(window_id)
|
||||
if p.stderr:
|
||||
print(p.stderr)
|
||||
return all_windows_id
|
||||
|
||||
# script_location ${AIL_BIN}
|
||||
def launch_windows_script(screen_name, window_name, dir_project, script_location, script_name, script_options=''):
|
||||
|
@ -60,6 +135,16 @@ def launch_windows_script(screen_name, window_name, dir_project, script_location
|
|||
print(p.stdout)
|
||||
print(p.stderr)
|
||||
|
||||
def launch_uniq_windows_script(screen_name, window_name, dir_project, script_location, script_name, script_options='', kill_previous_windows=False):
|
||||
all_screen_name = get_screen_windows_id(screen_name)
|
||||
if window_name in all_screen_name:
|
||||
if kill_previous_windows:
|
||||
kill_screen_window(screen_name, all_screen_name[window_name][0], force=True)
|
||||
else:
|
||||
print('Error: screen {} already contains a window named {}'.format(screen_name, window_name))
|
||||
return None
|
||||
launch_windows_script(screen_name, window_name, dir_project, script_location, script_name, script_options=script_options)
|
||||
|
||||
def kill_screen_window(screen_name, window_id, force=False):
|
||||
if force:# kill
|
||||
cmd = ['screen', '-S', screen_name, '-p', window_id, '-X', 'kill']
|
||||
|
|
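To illustrate how the new screen helpers above are meant to be combined (this mirrors what `launch_ail_splash_crawler()` in `crawlers.py` does further down), here is a hedged sketch; the session name, paths and Splash URL are placeholders only:

```python
import os
import screen  # the helper module shown above, imported as `screen` by crawlers.py

SCREEN_NAME = 'Crawler_AIL'                       # placeholder screen session name
DIR_PROJECT = os.environ.get('AIL_HOME', '.')     # assumed project root
SCRIPT_LOCATION = os.environ.get('AIL_BIN', '.')  # where Crawler.py lives
SPLASH_URL = '127.0.0.1:8050'                     # placeholder window name / script option

# create the session if needed, then launch one window per Splash server,
# killing any previous window with the same name
screen.create_screen(SCREEN_NAME)
screen.launch_uniq_windows_script(SCREEN_NAME, SPLASH_URL, DIR_PROJECT,
                                  SCRIPT_LOCATION, 'Crawler.py',
                                  script_options=SPLASH_URL,
                                  kill_previous_windows=True)

# inspect and clean up
print(screen.get_screen_pid(SCREEN_NAME))         # e.g. ['12345']
screen.kill_screen(SCREEN_NAME)                   # returns True if something was killed
```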
@ -64,3 +64,12 @@ class ConfigLoader(object):

    def has_section(self, section):
        return self.cfg.has_section(section)

    def get_all_keys_values_from_section(self, section):
        if section in self.cfg:
            all_keys_values = []
            for key_name in self.cfg[section]:
                all_keys_values.append((key_name, self.cfg.get(section, key_name)))
            return all_keys_values
        else:
            return []
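A quick, hypothetical usage sketch of the new `get_all_keys_values_from_section()` helper (the section name is just an example):

```python
import os
import sys

sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import ConfigLoader

config_loader = ConfigLoader.ConfigLoader()
# returns a list of (key, value) tuples, or [] if the section is missing
for key, value in config_loader.get_all_keys_values_from_section('Crawler'):
    print(key, '=', value)
```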
bin/lib/Config_DB.py (new executable file, 155 lines)
|
@ -0,0 +1,155 @@
|
|||
#!/usr/bin/python3
|
||||
|
||||
"""
|
||||
Config save in DB
|
||||
===================
|
||||
|
||||
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import redis
|
||||
|
||||
sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
|
||||
import ConfigLoader
|
||||
|
||||
config_loader = ConfigLoader.ConfigLoader()
|
||||
r_serv_db = config_loader.get_redis_conn("ARDB_DB")
|
||||
config_loader = None
|
||||
|
||||
#### TO PUT IN CONFIG
|
||||
# later => module timeout
|
||||
#
|
||||
## data retention
|
||||
#########################
|
||||
|
||||
default_config = {
|
||||
"crawler": {
|
||||
"enable_har_by_default": False,
|
||||
"enable_screenshot_by_default": True,
|
||||
"default_depth_limit": 1,
|
||||
"default_closespider_pagecount": 50,
|
||||
"default_user_agent": "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0",
|
||||
"default_timeout": 30
|
||||
}
|
||||
}
|
||||
|
||||
def get_default_config():
|
||||
return default_config
|
||||
|
||||
def get_default_config_value(section, field):
|
||||
return default_config[section][field]
|
||||
|
||||
config_type = {
|
||||
# crawler config
|
||||
"crawler": {
|
||||
"enable_har_by_default": bool,
|
||||
"enable_screenshot_by_default": bool,
|
||||
"default_depth_limit": int,
|
||||
"default_closespider_pagecount": int,
|
||||
"default_user_agent": str,
|
||||
"default_timeout": int
|
||||
}
|
||||
}
|
||||
|
||||
def get_config_type(section, field):
|
||||
return config_type[section][field]
|
||||
|
||||
# # TODO: add set, dict, list and select_(multiple_)value
|
||||
def is_valid_type(obj, section, field, value_type=None):
|
||||
res = isinstance(obj, get_config_type(section, field))
|
||||
return res
|
||||
|
||||
def reset_default_config():
|
||||
pass
|
||||
|
||||
def set_default_config(section, field):
|
||||
save_config(section, field, get_default_config_value(section, field))
|
||||
|
||||
def get_all_config_sections():
|
||||
return list(get_default_config())
|
||||
|
||||
def get_all_config_fields_by_section(section):
|
||||
return list(get_default_config()[section])
|
||||
|
||||
def get_config(section, field):
|
||||
# config field doesn't exist
|
||||
if not r_serv_db.hexists(f'config:global:{section}', field):
|
||||
set_default_config(section, field)
|
||||
return get_default_config_value(section, field)
|
||||
|
||||
# load default config section
|
||||
if not r_serv_db.exists('config:global:{}'.format(section)):
|
||||
save_config(section, field, get_default_config_value(section, field))
|
||||
return get_default_config_value(section, field)
|
||||
|
||||
return r_serv_db.hget(f'config:global:{section}', field)
|
||||
|
||||
def get_config_dict_by_section(section):
|
||||
config_dict = {}
|
||||
for field in get_all_config_fields_by_section(section):
|
||||
config_dict[field] = get_config(section, field)
|
||||
return config_dict
|
||||
|
||||
def save_config(section, field, value, value_type=None): ###########################################
|
||||
if section in default_config:
|
||||
if is_valid_type(value, section, field, value_type=value_type):
|
||||
if value_type in ['list', 'set', 'dict']:
|
||||
pass
|
||||
else:
|
||||
r_serv_db.hset(f'config:global:{section}', field, value)
|
||||
# used by check_integrity
|
||||
r_serv_db.sadd('config:all_global_section', field, value)
|
||||
|
||||
# check config value + type
|
||||
def check_integrity():
|
||||
pass
|
||||
|
||||
|
||||
config_documentation = {
|
||||
"crawler": {
|
||||
"enable_har_by_default": 'Enable HAR by default',
|
||||
"enable_screenshot_by_default": 'Enable screenshot by default',
|
||||
"default_depth_limit": 'Maximum number of url depth',
|
||||
"default_closespider_pagecount": 'Maximum number of pages',
|
||||
"default_user_agent": "User agent used by default",
|
||||
"default_timeout": "Crawler connection timeout"
|
||||
}
|
||||
}
|
||||
|
||||
def get_config_documentation(section, field):
|
||||
return config_documentation[section][field]
|
||||
|
||||
# def conf_view():
|
||||
# class F(MyBaseForm):
|
||||
# pass
|
||||
#
|
||||
# F.username = TextField('username')
|
||||
# for name in iterate_some_model_dynamically():
|
||||
# setattr(F, name, TextField(name.title()))
|
||||
#
|
||||
# form = F(request.POST, ...)
|
||||
|
||||
def get_field_full_config(section, field):
|
||||
dict_config = {}
|
||||
dict_config['value'] = get_config(section, field)
|
||||
dict_config['type'] = get_config_type(section, field)
|
||||
dict_config['info'] = get_config_documentation(section, field)
|
||||
return dict_config
|
||||
|
||||
def get_full_config_by_section(section):
|
||||
dict_config = {}
|
||||
for field in get_all_config_fields_by_section(section):
|
||||
dict_config[field] = get_field_full_config(section, field)
|
||||
return dict_config
|
||||
|
||||
def get_full_config():
|
||||
dict_config = {}
|
||||
for section in get_all_config_sections():
|
||||
dict_config[section] = get_full_config_by_section(section)
|
||||
return dict_config
|
||||
|
||||
if __name__ == '__main__':
|
||||
res = get_full_config()
|
||||
print(res)
|
|
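For context, a small sketch of how the `Config_DB` helpers above could be used from another module; the values shown are the defaults defined in the file, and the import path mirrors the other `bin/lib` imports in this PR:

```python
import os
import sys

sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import Config_DB

# read a single field; falls back to (and stores) the default if unset
depth = Config_DB.get_config('crawler', 'default_depth_limit')   # default is 1

# update it; the value is type-checked against config_type (an int here)
Config_DB.save_config('crawler', 'default_depth_limit', 2)

# full view of a section: value + type + documentation per field
print(Config_DB.get_full_config_by_section('crawler'))
```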
@ -13,6 +13,7 @@ import os
|
|||
import re
|
||||
import redis
|
||||
import sys
|
||||
import time
|
||||
import uuid
|
||||
|
||||
from datetime import datetime, timedelta
|
||||
|
@ -34,19 +35,24 @@ config_loader = ConfigLoader.ConfigLoader()
|
|||
r_serv_metadata = config_loader.get_redis_conn("ARDB_Metadata")
|
||||
r_serv_onion = config_loader.get_redis_conn("ARDB_Onion")
|
||||
r_cache = config_loader.get_redis_conn("Redis_Cache")
|
||||
config_loader = None
|
||||
|
||||
# load crawler config
|
||||
config_loader = ConfigLoader.ConfigLoader(config_file='crawlers.cfg')
|
||||
#splash_manager_url = config_loader.get_config_str('Splash_Manager', 'splash_url')
|
||||
#splash_api_key = config_loader.get_config_str('Splash_Manager', 'api_key')
|
||||
PASTES_FOLDER = os.path.join(os.environ['AIL_HOME'], config_loader.get_config_str("Directories", "pastes"))
|
||||
config_loader = None
|
||||
|
||||
faup = Faup()
|
||||
|
||||
# # # # # # # #
|
||||
# #
|
||||
# COMMON #
|
||||
# #
|
||||
# # # # # # # #
|
||||
|
||||
def generate_uuid():
|
||||
return str(uuid.uuid4()).replace('-', '')
|
||||
|
||||
# # TODO: remove me ?
|
||||
def get_current_date():
|
||||
return datetime.now().strftime("%Y%m%d")
|
||||
|
||||
def is_valid_onion_domain(domain):
|
||||
if not domain.endswith('.onion'):
|
||||
return False
|
||||
|
@ -61,6 +67,10 @@ def is_valid_onion_domain(domain):
|
|||
return True
|
||||
return False
|
||||
|
||||
# TEMP FIX
|
||||
def get_faup():
|
||||
return faup
|
||||
|
||||
################################################################################
|
||||
|
||||
# # TODO: handle prefix cookies
|
||||
|
@ -389,8 +399,127 @@ def api_create_cookie(user_id, cookiejar_uuid, cookie_dict):
|
|||
|
||||
#### ####
|
||||
|
||||
# # # # # # # #
|
||||
# #
|
||||
# CRAWLER #
|
||||
# #
|
||||
# # # # # # # #
|
||||
|
||||
#### CRAWLER GLOBAL ####
|
||||
|
||||
def get_all_spash_crawler_status():
|
||||
crawler_metadata = []
|
||||
all_crawlers = r_cache.smembers('all_splash_crawlers')
|
||||
for crawler in all_crawlers:
|
||||
crawler_metadata.append(get_splash_crawler_status(crawler))
|
||||
return crawler_metadata
|
||||
|
||||
def reset_all_spash_crawler_status():
|
||||
r_cache.delete('all_splash_crawlers')
|
||||
|
||||
def get_splash_crawler_status(spash_url):
|
||||
crawler_type = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'type')
|
||||
crawling_domain = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'crawling_domain')
|
||||
started_time = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'started_time')
|
||||
status_info = r_cache.hget('metadata_crawler:{}'.format(spash_url), 'status')
|
||||
crawler_info = '{} - {}'.format(spash_url, started_time)
|
||||
if status_info=='Waiting' or status_info=='Crawling':
|
||||
status=True
|
||||
else:
|
||||
status=False
|
||||
return {'crawler_info': crawler_info, 'crawling_domain': crawling_domain, 'status_info': status_info, 'status': status, 'type': crawler_type}
|
||||
|
||||
def get_stats_last_crawled_domains(crawler_types, date):
|
||||
statDomains = {}
|
||||
for crawler_type in crawler_types:
|
||||
stat_type = {}
|
||||
stat_type['domains_up'] = r_serv_onion.scard('{}_up:{}'.format(crawler_type, date))
|
||||
stat_type['domains_down'] = r_serv_onion.scard('{}_down:{}'.format(crawler_type, date))
|
||||
stat_type['total'] = stat_type['domains_up'] + stat_type['domains_down']
|
||||
stat_type['domains_queue'] = get_nb_elem_to_crawl_by_type(crawler_type)
|
||||
statDomains[crawler_type] = stat_type
|
||||
return statDomains
|
||||
|
||||
# # TODO: handle custom proxy
|
||||
def get_splash_crawler_latest_stats():
|
||||
now = datetime.now()
|
||||
date = now.strftime("%Y%m%d")
|
||||
return get_stats_last_crawled_domains(['onion', 'regular'], date)
|
||||
|
||||
def get_nb_crawlers_to_launch_by_splash_name(splash_name):
|
||||
res = r_serv_onion.hget('all_crawlers_to_launch', splash_name)
|
||||
if res:
|
||||
return int(res)
|
||||
else:
|
||||
return 0
|
||||
|
||||
def get_all_crawlers_to_launch_splash_name():
|
||||
return r_serv_onion.hkeys('all_crawlers_to_launch')
|
||||
|
||||
def get_nb_crawlers_to_launch():
|
||||
nb_crawlers_to_launch = r_serv_onion.hgetall('all_crawlers_to_launch')
|
||||
for splash_name in nb_crawlers_to_launch:
|
||||
nb_crawlers_to_launch[splash_name] = int(nb_crawlers_to_launch[splash_name])
|
||||
return nb_crawlers_to_launch
|
||||
|
||||
def get_nb_crawlers_to_launch_ui():
|
||||
nb_crawlers_to_launch = get_nb_crawlers_to_launch()
|
||||
for splash_name in get_all_splash():
|
||||
if splash_name not in nb_crawlers_to_launch:
|
||||
nb_crawlers_to_launch[splash_name] = 0
|
||||
return nb_crawlers_to_launch
|
||||
|
||||
def set_nb_crawlers_to_launch(dict_splash_name):
|
||||
r_serv_onion.delete('all_crawlers_to_launch')
|
||||
for splash_name in dict_splash_name:
|
||||
r_serv_onion.hset('all_crawlers_to_launch', splash_name, int(dict_splash_name[splash_name]))
|
||||
relaunch_crawlers()
|
||||
|
||||
def relaunch_crawlers():
|
||||
all_crawlers_to_launch = get_nb_crawlers_to_launch()
|
||||
for splash_name in all_crawlers_to_launch:
|
||||
nb_crawlers = int(all_crawlers_to_launch[splash_name])
|
||||
|
||||
all_crawler_urls = get_splash_all_url(splash_name, r_list=True)
|
||||
if nb_crawlers > len(all_crawler_urls):
|
||||
print('Error, can\'t launch all Splash Dockers')
|
||||
print('Please launch {} additional {} Dockers'.format( nb_crawlers - len(all_crawler_urls), splash_name))
|
||||
nb_crawlers = len(all_crawler_urls)
|
||||
|
||||
reset_all_spash_crawler_status()
|
||||
|
||||
for i in range(0, int(nb_crawlers)):
|
||||
splash_url = all_crawler_urls[i]
|
||||
print(all_crawler_urls[i])
|
||||
|
||||
launch_ail_splash_crawler(splash_url, script_options='{}'.format(splash_url))
|
||||
|
||||
def api_set_nb_crawlers_to_launch(dict_splash_name):
|
||||
# TODO: check if is dict
|
||||
dict_crawlers_to_launch = {}
|
||||
all_splash = get_all_splash()
|
||||
crawlers_to_launch = list(all_splash & set(dict_splash_name.keys()))
|
||||
for splash_name in crawlers_to_launch:
|
||||
try:
|
||||
nb_to_launch = int(dict_splash_name.get(splash_name, 0))
|
||||
if nb_to_launch < 0:
|
||||
return ({'error':'The number of crawlers to launch is negative'}, 400)
|
||||
except:
|
||||
return ({'error':'invalid number of crawlers to launch'}, 400)
|
||||
if nb_to_launch > 0:
|
||||
dict_crawlers_to_launch[splash_name] = nb_to_launch
|
||||
|
||||
if dict_crawlers_to_launch:
|
||||
set_nb_crawlers_to_launch(dict_crawlers_to_launch)
|
||||
return (dict_crawlers_to_launch, 200)
|
||||
else:
|
||||
return ({'error':'invalid input'}, 400)
|
||||
|
||||
|
||||
##-- CRAWLER GLOBAL --##
|
||||
|
||||
#### CRAWLER TASK ####
|
||||
def create_crawler_task(url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, cookiejar_uuid=None, user_agent=None):
|
||||
def create_crawler_task(url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, crawler_type=None, cookiejar_uuid=None, user_agent=None):
|
||||
|
||||
crawler_config = {}
|
||||
crawler_config['depth_limit'] = depth_limit
|
||||
|
@ -430,10 +559,18 @@ def create_crawler_task(url, screenshot=True, har=True, depth_limit=1, max_pages
|
|||
tld = unpack_url['tld'].decode()
|
||||
except:
|
||||
tld = unpack_url['tld']
|
||||
if tld == 'onion':
|
||||
crawler_type = 'onion'
|
||||
|
||||
if crawler_type=='None':
|
||||
crawler_type = None
|
||||
|
||||
if crawler_type:
|
||||
if crawler_type=='tor':
|
||||
crawler_type = 'onion'
|
||||
else:
|
||||
crawler_type = 'regular'
|
||||
if tld == 'onion':
|
||||
crawler_type = 'onion'
|
||||
else:
|
||||
crawler_type = 'regular'
|
||||
|
||||
save_crawler_config(crawler_mode, crawler_type, crawler_config, domain, url=url)
|
||||
send_url_to_crawl_in_queue(crawler_mode, crawler_type, url)
|
||||
|
@ -445,6 +582,7 @@ def save_crawler_config(crawler_mode, crawler_type, crawler_config, domain, url=
|
|||
r_serv_onion.set('crawler_config:{}:{}:{}:{}'.format(crawler_mode, crawler_type, domain, url), json.dumps(crawler_config))
|
||||
|
||||
def send_url_to_crawl_in_queue(crawler_mode, crawler_type, url):
|
||||
print('{}_crawler_priority_queue'.format(crawler_type), '{};{}'.format(url, crawler_mode))
|
||||
r_serv_onion.sadd('{}_crawler_priority_queue'.format(crawler_type), '{};{}'.format(url, crawler_mode))
|
||||
# add auto crawled url for user UI
|
||||
if crawler_mode == 'auto':
|
||||
|
@ -452,7 +590,7 @@ def send_url_to_crawl_in_queue(crawler_mode, crawler_type, url):
|
|||
|
||||
#### ####
|
||||
#### CRAWLER TASK API ####
|
||||
def api_create_crawler_task(user_id, url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, cookiejar_uuid=None, user_agent=None):
|
||||
def api_create_crawler_task(user_id, url, screenshot=True, har=True, depth_limit=1, max_pages=100, auto_crawler=False, crawler_delta=3600, crawler_type=None, cookiejar_uuid=None, user_agent=None):
|
||||
# validate url
|
||||
if url is None or url=='' or url=='\n':
|
||||
return ({'error':'invalid depth limit'}, 400)
|
||||
|
@ -489,7 +627,10 @@ def api_create_crawler_task(user_id, url, screenshot=True, har=True, depth_limit
|
|||
if cookie_owner != user_id:
|
||||
return ({'error': 'The access to this cookiejar is restricted'}, 403)
|
||||
|
||||
# # TODO: verify splash name/crawler type
|
||||
|
||||
create_crawler_task(url, screenshot=screenshot, har=har, depth_limit=depth_limit, max_pages=max_pages,
|
||||
crawler_type=crawler_type,
|
||||
auto_crawler=auto_crawler, crawler_delta=crawler_delta, cookiejar_uuid=cookiejar_uuid, user_agent=user_agent)
|
||||
return None
|
||||
|
||||
|
@ -572,6 +713,7 @@ def save_har(har_dir, item_id, har_content):
|
|||
with open(filename, 'w') as f:
|
||||
f.write(json.dumps(har_content))
|
||||
|
||||
# # TODO: FIXME
|
||||
def api_add_crawled_item(dict_crawled):
|
||||
|
||||
domain = None
|
||||
|
@ -580,30 +722,200 @@ def api_add_crawled_item(dict_crawled):
|
|||
save_crawled_item(item_id, response.data['html'])
|
||||
create_item_metadata(item_id, domain, 'last_url', port, 'father')
|
||||
|
||||
#### CRAWLER QUEUES ####
|
||||
def get_all_crawlers_queues_types():
|
||||
all_queues_types = set()
|
||||
all_splash_name = get_all_crawlers_to_launch_splash_name()
|
||||
for splash_name in all_splash_name:
|
||||
all_queues_types.add(get_splash_crawler_type(splash_name))
|
||||
all_splash_name = list()
|
||||
return all_queues_types
|
||||
|
||||
#### SPLASH MANAGER ####
|
||||
def get_splash_manager_url(reload=False): # TODO: add config reload
|
||||
return splash_manager_url
|
||||
def get_crawler_queue_types_by_splash_name(splash_name):
|
||||
all_domain_type = [splash_name]
|
||||
crawler_type = get_splash_crawler_type(splash_name)
|
||||
#if not is_splash_used_in_discovery(splash_name)
|
||||
if crawler_type == 'tor':
|
||||
all_domain_type.append('onion')
|
||||
all_domain_type.append('regular')
|
||||
else:
|
||||
all_domain_type.append('regular')
|
||||
return all_domain_type
|
||||
|
||||
def get_splash_api_key(reload=False): # TODO: add config reload
|
||||
return splash_api_key
|
||||
def get_crawler_type_by_url(url):
|
||||
faup.decode(url)
|
||||
unpack_url = faup.get()
|
||||
## TODO: # FIXME: remove me
|
||||
try:
|
||||
tld = unpack_url['tld'].decode()
|
||||
except:
|
||||
tld = unpack_url['tld']
|
||||
|
||||
if tld == 'onion':
|
||||
crawler_type = 'onion'
|
||||
else:
|
||||
crawler_type = 'regular'
|
||||
return crawler_type
|
||||
|
||||
|
||||
def get_elem_to_crawl_by_queue_type(l_queue_type):
|
||||
## queues priority:
|
||||
# 1 - priority queue
|
||||
# 2 - discovery queue
|
||||
# 3 - normal queue
|
||||
##
|
||||
all_queue_key = ['{}_crawler_priority_queue', '{}_crawler_discovery_queue', '{}_crawler_queue']
|
||||
|
||||
for queue_key in all_queue_key:
|
||||
for queue_type in l_queue_type:
|
||||
message = r_serv_onion.spop(queue_key.format(queue_type))
|
||||
if message:
|
||||
dict_to_crawl = {}
|
||||
splitted = message.rsplit(';', 1)
|
||||
if len(splitted) == 2:
|
||||
url, item_id = splitted
|
||||
item_id = item_id.replace(PASTES_FOLDER+'/', '')
|
||||
else:
|
||||
# # TODO: to check/refactor
|
||||
item_id = None
|
||||
url = message
|
||||
crawler_type = get_crawler_type_by_url(url)
|
||||
return {'url': url, 'paste': item_id, 'type_service': crawler_type, 'queue_type': queue_type, 'original_message': message}
|
||||
return None
|
||||
|
||||
def get_nb_elem_to_crawl_by_type(queue_type):
|
||||
nb = r_serv_onion.scard('{}_crawler_priority_queue'.format(queue_type))
|
||||
nb += r_serv_onion.scard('{}_crawler_discovery_queue'.format(queue_type))
|
||||
nb += r_serv_onion.scard('{}_crawler_queue'.format(queue_type))
|
||||
return nb
|
||||
|
||||
#### ---- ####
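# Illustrative only (not part of this diff): the shape of a message handled by
# get_elem_to_crawl_by_queue_type() above. The URL and item path are placeholders.
#
#   message = 'http://example.onion;crawled/2020/11/10/example_item'
#   url, item_id = message.rsplit(';', 1)              # split the URL from the item id
#   item_id = item_id.replace(PASTES_FOLDER + '/', '') # strip the pastes folder prefix
#   # a message without ';' is treated as a bare URL (item_id stays None)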
|
||||
|
||||
# # # # # # # # # # # #
|
||||
# #
|
||||
# SPLASH MANAGER #
|
||||
# #
|
||||
# # # # # # # # # # # #
|
||||
def get_splash_manager_url(reload=False): # TODO: add in db config
|
||||
return r_serv_onion.get('crawler:splash:manager:url')
|
||||
|
||||
def get_splash_api_key(reload=False): # TODO: add in db config
|
||||
return r_serv_onion.get('crawler:splash:manager:key')
|
||||
|
||||
def get_hidden_splash_api_key(): # TODO: add in db config
|
||||
key = get_splash_api_key()
|
||||
if key:
|
||||
if len(key)==41:
|
||||
return f'{key[:4]}*********************************{key[-4:]}'
|
||||
|
||||
def is_valid_api_key(api_key, search=re.compile(r'[^a-zA-Z0-9_-]').search):
|
||||
if len(api_key) != 41:
|
||||
return False
|
||||
return not bool(search(api_key))
|
||||
|
||||
def save_splash_manager_url_api(url, api_key):
|
||||
r_serv_onion.set('crawler:splash:manager:url', url)
|
||||
r_serv_onion.set('crawler:splash:manager:key', api_key)
|
||||
|
||||
def get_splash_url_from_manager_url(splash_manager_url, splash_port):
|
||||
url = urlparse(splash_manager_url)
|
||||
host = url.netloc.split(':', 1)[0]
|
||||
return 'http://{}:{}'.format(host, splash_port)
|
||||
return '{}:{}'.format(host, splash_port)
|
||||
|
||||
# def is_splash_used_in_discovery(splash_name):
|
||||
# res = r_serv_onion.hget('splash:metadata:{}'.format(splash_name), 'discovery_queue')
|
||||
# if res == 'True':
|
||||
# return True
|
||||
# else:
|
||||
# return False
|
||||
|
||||
def restart_splash_docker(splash_url, splash_name):
|
||||
splash_port = splash_url.split(':')[-1]
|
||||
return _restart_splash_docker(splash_port, splash_name)
|
||||
|
||||
def is_splash_manager_connected(delta_check=30):
|
||||
last_check = r_cache.hget('crawler:splash:manager', 'last_check')
|
||||
if last_check:
|
||||
if int(time.time()) - int(last_check) > delta_check:
|
||||
ping_splash_manager()
|
||||
else:
|
||||
ping_splash_manager()
|
||||
res = r_cache.hget('crawler:splash:manager', 'connected')
|
||||
return res == 'True'
|
||||
|
||||
def update_splash_manager_connection_status(is_connected, req_error=None):
|
||||
r_cache.hset('crawler:splash:manager', 'connected', is_connected)
|
||||
r_cache.hset('crawler:splash:manager', 'last_check', int(time.time()))
|
||||
if not req_error:
|
||||
r_cache.hdel('crawler:splash:manager', 'error')
|
||||
else:
|
||||
r_cache.hset('crawler:splash:manager', 'status_code', req_error['status_code'])
|
||||
r_cache.hset('crawler:splash:manager', 'error', req_error['error'])
|
||||
|
||||
def get_splash_manager_connection_metadata(force_ping=False):
|
||||
dict_manager={}
|
||||
if force_ping:
|
||||
dict_manager['status'] = ping_splash_manager()
|
||||
else:
|
||||
dict_manager['status'] = is_splash_manager_connected()
|
||||
if not dict_manager['status']:
|
||||
dict_manager['status_code'] = r_cache.hget('crawler:splash:manager', 'status_code')
|
||||
dict_manager['error'] = r_cache.hget('crawler:splash:manager', 'error')
|
||||
return dict_manager
|
||||
|
||||
## API ##
|
||||
def ping_splash_manager():
|
||||
req = requests.get('{}/api/v1/ping'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
|
||||
if req.status_code == 200:
|
||||
return True
|
||||
else:
|
||||
print(req.json())
|
||||
splash_manager_url = get_splash_manager_url()
|
||||
if not splash_manager_url:
|
||||
return False
|
||||
try:
|
||||
req = requests.get('{}/api/v1/ping'.format(splash_manager_url), headers={"Authorization": get_splash_api_key()}, verify=False)
|
||||
if req.status_code == 200:
|
||||
update_splash_manager_connection_status(True)
|
||||
return True
|
||||
else:
|
||||
res = req.json()
|
||||
if 'reason' in res:
|
||||
req_error = {'status_code': req.status_code, 'error': res['reason']}
|
||||
else:
|
||||
print(req.json())
|
||||
req_error = {'status_code': req.status_code, 'error': json.dumps(req.json())}
|
||||
update_splash_manager_connection_status(False, req_error=req_error)
|
||||
return False
|
||||
except requests.exceptions.ConnectionError:
|
||||
pass
|
||||
# splash manager unreachable
|
||||
req_error = {'status_code': 500, 'error': 'splash manager unreachable'}
|
||||
update_splash_manager_connection_status(False, req_error=req_error)
|
||||
return False
|
||||
|
||||
def get_splash_manager_session_uuid():
|
||||
try:
|
||||
req = requests.get('{}/api/v1/get/session_uuid'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
|
||||
if req.status_code == 200:
|
||||
res = req.json()
|
||||
if res:
|
||||
return res['session_uuid']
|
||||
else:
|
||||
print(req.json())
|
||||
except (requests.exceptions.ConnectionError, requests.exceptions.MissingSchema):
|
||||
# splash manager unreachable
|
||||
update_splash_manager_connection_status(False)
|
||||
|
||||
def get_splash_manager_version():
|
||||
splash_manager_url = get_splash_manager_url()
|
||||
if splash_manager_url:
|
||||
try:
|
||||
req = requests.get('{}/api/v1/version'.format(splash_manager_url), headers={"Authorization": get_splash_api_key()}, verify=False)
|
||||
if req.status_code == 200:
|
||||
return req.json()['message']
|
||||
else:
|
||||
print(req.json())
|
||||
except requests.exceptions.ConnectionError:
|
||||
pass
|
||||
|
||||
def get_all_splash_manager_containers_name():
|
||||
req = requests.get('{}/api/v1/get/splash/name/all'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
|
||||
req = requests.get('{}/api/v1/get/splash/all'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False)
|
||||
if req.status_code == 200:
|
||||
return req.json()
|
||||
else:
|
||||
|
@ -615,6 +927,35 @@ def get_all_splash_manager_proxies():
|
|||
return req.json()
|
||||
else:
|
||||
print(req.json())
|
||||
|
||||
def _restart_splash_docker(splash_port, splash_name):
|
||||
dict_to_send = {'port': splash_port, 'name': splash_name}
|
||||
req = requests.post('{}/api/v1/splash/restart'.format(get_splash_manager_url()), headers={"Authorization": get_splash_api_key()}, verify=False, json=dict_to_send)
|
||||
if req.status_code == 200:
|
||||
return req.json()
|
||||
else:
|
||||
print(req.json())
|
||||
|
||||
def api_save_splash_manager_url_api(data):
|
||||
# unpack json
|
||||
manager_url = data.get('url', None)
|
||||
api_key = data.get('api_key', None)
|
||||
if not manager_url or not api_key:
|
||||
return ({'status': 'error', 'reason': 'No url or API key supplied'}, 400)
|
||||
# check if is valid url
|
||||
try:
|
||||
result = urlparse(manager_url)
|
||||
if not all([result.scheme, result.netloc]):
|
||||
return ({'status': 'error', 'reason': 'Invalid url'}, 400)
|
||||
except:
|
||||
return ({'status': 'error', 'reason': 'Invalid url'}, 400)
|
||||
|
||||
# check if is valid key
|
||||
if not is_valid_api_key(api_key):
|
||||
return ({'status': 'error', 'reason': 'Invalid API key'}, 400)
|
||||
|
||||
save_splash_manager_url_api(manager_url, api_key)
|
||||
return ({'url': manager_url, 'api_key': get_hidden_splash_api_key()}, 200)
|
||||
## -- ##
|
||||
|
||||
## SPLASH ##
|
||||
|
@ -647,7 +988,23 @@ def get_splash_name_by_url(splash_url):
|
|||
def get_splash_crawler_type(splash_name):
|
||||
return r_serv_onion.hget('splash:metadata:{}'.format(splash_name), 'crawler_type')
|
||||
|
||||
def get_all_splash_by_proxy(proxy_name):
|
||||
def get_splash_crawler_description(splash_name):
|
||||
return r_serv_onion.hget('splash:metadata:{}'.format(splash_name), 'description')
|
||||
|
||||
def get_splash_crawler_metadata(splash_name):
|
||||
dict_splash = {}
|
||||
dict_splash['proxy'] = get_splash_proxy(splash_name)
|
||||
dict_splash['type'] = get_splash_crawler_type(splash_name)
|
||||
dict_splash['description'] = get_splash_crawler_description(splash_name)
|
||||
return dict_splash
|
||||
|
||||
def get_all_splash_crawler_metadata():
|
||||
dict_splash = {}
|
||||
for splash_name in get_all_splash():
|
||||
dict_splash[splash_name] = get_splash_crawler_metadata(splash_name)
|
||||
return dict_splash
|
||||
|
||||
def get_all_splash_by_proxy(proxy_name, r_list=False):
|
||||
res = r_serv_onion.smembers('proxy:splash:{}'.format(proxy_name))
|
||||
if res:
|
||||
if r_list:
|
||||
|
@ -683,16 +1040,50 @@ def delete_all_proxies():
|
|||
for proxy_name in get_all_proxies():
|
||||
delete_proxy(proxy_name)
|
||||
|
||||
def get_proxy_host(proxy_name):
|
||||
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'host')
|
||||
|
||||
def get_proxy_port(proxy_name):
|
||||
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'port')
|
||||
|
||||
def get_proxy_type(proxy_name):
|
||||
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'type')
|
||||
|
||||
def get_proxy_crawler_type(proxy_name):
|
||||
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'crawler_type')
|
||||
|
||||
def get_proxy_description(proxy_name):
|
||||
return r_serv_onion.hget('proxy:metadata:{}'.format(proxy_name), 'description')
|
||||
|
||||
def get_proxy_metadata(proxy_name):
|
||||
meta_dict = {}
|
||||
meta_dict['host'] = get_proxy_host(proxy_name)
|
||||
meta_dict['port'] = get_proxy_port(proxy_name)
|
||||
meta_dict['type'] = get_proxy_type(proxy_name)
|
||||
meta_dict['crawler_type'] = get_proxy_crawler_type(proxy_name)
|
||||
meta_dict['description'] = get_proxy_description(proxy_name)
|
||||
return meta_dict
|
||||
|
||||
def get_all_proxies_metadata():
|
||||
all_proxy_dict = {}
|
||||
for proxy_name in get_all_proxies():
|
||||
all_proxy_dict[proxy_name] = get_proxy_metadata(proxy_name)
|
||||
return all_proxy_dict
|
||||
|
||||
# def set_proxy_used_in_discovery(proxy_name, value):
|
||||
# r_serv_onion.hset('splash:metadata:{}'.format(splash_name), 'discovery_queue', value)
|
||||
|
||||
def delete_proxy(proxy_name): # # TODO: force delete (delete all proxy)
|
||||
proxy_splash = get_all_splash_by_proxy(proxy_name)
|
||||
if proxy_splash:
|
||||
print('error, a splash container is using this proxy')
|
||||
#if proxy_splash:
|
||||
# print('error, a splash container is using this proxy')
|
||||
r_serv_onion.delete('proxy:metadata:{}'.format(proxy_name))
|
||||
r_serv_onion.srem('all_proxy', proxy_name)
|
||||
## -- ##
|
||||
|
||||
## LOADER ##
|
||||
def load_all_splash_containers():
|
||||
delete_all_splash_containers()
|
||||
all_splash_containers_name = get_all_splash_manager_containers_name()
|
||||
for splash_name in all_splash_containers_name:
|
||||
r_serv_onion.sadd('all_splash', splash_name)
|
||||
|
@@ -715,6 +1106,7 @@ def load_all_splash_containers():
            r_serv_onion.set('splash:map:url:name:{}'.format(splash_url), splash_name)

def load_all_proxy():
    delete_all_proxies()
    all_proxies = get_all_splash_manager_proxies()
    for proxy_name in all_proxies:
        proxy_dict = all_proxies[proxy_name]

@@ -725,13 +1117,17 @@ def load_all_proxy():
        description = all_proxies[proxy_name].get('description', None)
        if description:
            r_serv_onion.hset('proxy:metadata:{}'.format(proxy_name), 'description', description)
        r_serv_onion.sadd('all_proxy', proxy_name)

def init_splash_list_db():
    delete_all_splash_containers()
    delete_all_proxies()

def reload_splash_and_proxies_list():
    if ping_splash_manager():
        load_all_splash_containers()
        # LOAD PROXIES containers
        load_all_proxy()
        # LOAD SPLASH containers
        load_all_splash_containers()
        return True
    else:
        return False
    # # TODO: kill crawler screen ?
## -- ##
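Note (illustration, not part of the commit): `reload_splash_and_proxies_list()` only rebuilds the container and proxy lists when the Splash Manager answers the ping, and returns a bool. A hedged sketch of how a caller could wrap that with a retry loop; the retry count and delay are assumptions for the example, not AIL behaviour.

```python
# Illustrative caller for the reload helper above.
import time

def wait_for_splash_manager(reload_func, retries=5, delay=30):
    for attempt in range(1, retries + 1):
        if reload_func():   # True once containers and proxies are loaded
            return True
        print('Splash Manager unreachable (attempt {}/{}), retrying in {}s'.format(attempt, retries, delay))
        time.sleep(delay)
    return False

# usage inside AIL would pass reload_splash_and_proxies_list:
# wait_for_splash_manager(reload_splash_and_proxies_list)
```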
@@ -742,7 +1138,7 @@ def launch_ail_splash_crawler(splash_url, script_options=''):
    script_location = os.path.join(os.environ['AIL_BIN'])
    script_name = 'Crawler.py'
    screen.create_screen(screen_name)
    screen.launch_windows_script(screen_name, splash_url, dir_project, script_location, script_name, script_options=script_options)
    screen.launch_uniq_windows_script(screen_name, splash_url, dir_project, script_location, script_name, script_options=script_options, kill_previous_windows=True)

## -- ##

@@ -752,3 +1148,8 @@ def launch_ail_splash_crawler(splash_url, script_options=''):
#### CRAWLER PROXY ####

#### ---- ####

if __name__ == '__main__':
    res = get_splash_manager_version()
    #res = restart_splash_docker('127.0.0.1:8050', 'default_splash_tor')
    print(res)
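Note (illustration, not part of the commit): `launch_ail_splash_crawler()` starts `Crawler.py` in a dedicated screen window through AIL's own `screen` helper. A rough standalone equivalent with GNU screen and `subprocess` is sketched below; the session name, paths and the way the Splash URL is passed to `Crawler.py` are assumptions, not AIL's helper API.

```python
# Rough standalone equivalent: one crawler per Splash URL in a detached
# GNU screen window. Names and paths are illustrative assumptions.
import subprocess

def launch_crawler(splash_url, ail_bin='/opt/ail-framework/bin', session='Crawler_AIL'):
    # create a detached screen session for the crawlers
    # (AIL's helper checks whether the session already exists first)
    subprocess.call(['screen', '-dmS', session])
    # open a new window named after the Splash URL and run the crawler there
    cmd = 'cd {}; ./Crawler.py {}'.format(ail_bin, splash_url)
    subprocess.call(['screen', '-S', session, '-X', 'screen', '-t', splash_url,
                     'bash', '-c', cmd])

# launch_crawler('http://127.0.0.1:8050')
```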
BIN  doc/screenshots/splash_manager_config_edit_1.png   (new file, 104 KiB, binary file not shown)
BIN  doc/screenshots/splash_manager_config_edit_2.png   (new file, 66 KiB, binary file not shown)
BIN  doc/screenshots/splash_manager_nb_crawlers_1.png   (new file, 51 KiB, binary file not shown)
BIN  doc/screenshots/splash_manager_nb_crawlers_2.png   (new file, 65 KiB, binary file not shown)
@@ -1,4 +0,0 @@
[proxy]
host=localhost
port=9050
type=SOCKS5
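Note (illustration, not part of the commit): the static `[proxy]` file above is deleted; proxy definitions now come from the Splash Manager and are imported by `load_all_proxy()`. A manager-side proxy entry roughly corresponds to the dict shape below, with field names taken from the `proxy:metadata` hashes earlier in the diff and illustrative values.

```python
# Sketch of the proxy description load_all_proxy() expects from the manager.
# Field names mirror the metadata hash; the values are examples only.
default_proxies = {
    'onion': {
        'host': 'localhost',
        'port': 9050,
        'type': 'SOCKS5',
        'crawler_type': 'tor',
        'description': 'Default tor proxy',
    }
}
```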
@@ -16,9 +16,11 @@ if [ -z "$VIRTUAL_ENV" ]; then
    echo export AIL_REDIS=$(pwd)/redis/src/ >> ./AILENV/bin/activate
    echo export AIL_ARDB=$(pwd)/ardb/src/ >> ./AILENV/bin/activate

    . ./AILENV/bin/activate
fi

# activate virtual environment
. ./AILENV/bin/activate

pip3 install -U pip
pip3 install 'git+https://github.com/D4-project/BGP-Ranking.git/@7e698f87366e6f99b4d0d11852737db28e3ddc62#egg=pybgpranking&subdirectory=client'
pip3 install -U -r requirements.txt
@@ -24,10 +24,12 @@ sys.path.append(os.path.join(os.environ['AIL_BIN'], 'packages'))
import Tag

sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))
import Domain
import crawlers
import Domain
import Language

import Config_DB

r_cache = Flask_config.r_cache
r_serv_db = Flask_config.r_serv_db
r_serv_tags = Flask_config.r_serv_tags
@@ -49,13 +51,44 @@ def create_json_response(data, status_code):
    return Response(json.dumps(data, indent=2, sort_keys=True), mimetype='application/json'), status_code

# ============= ROUTES ==============
@crawler_splash.route("/crawlers/dashboard", methods=['GET'])
@login_required
@login_read_only
def crawlers_dashboard():
    # # TODO: get splash manager status
    is_manager_connected = crawlers.get_splash_manager_connection_metadata()
    all_splash_crawler_status = crawlers.get_all_spash_crawler_status()
    splash_crawlers_latest_stats = crawlers.get_splash_crawler_latest_stats()
    date = crawlers.get_current_date()

    return render_template("dashboard_splash_crawler.html", all_splash_crawler_status = all_splash_crawler_status,
                                is_manager_connected=is_manager_connected, date=date,
                                splash_crawlers_latest_stats=splash_crawlers_latest_stats)

@crawler_splash.route("/crawlers/crawler_dashboard_json", methods=['GET'])
@login_required
@login_read_only
def crawler_dashboard_json():

    all_splash_crawler_status = crawlers.get_all_spash_crawler_status()
    splash_crawlers_latest_stats = crawlers.get_splash_crawler_latest_stats()

    return jsonify({'all_splash_crawler_status': all_splash_crawler_status,
                    'splash_crawlers_latest_stats': splash_crawlers_latest_stats})

@crawler_splash.route("/crawlers/manual", methods=['GET'])
@login_required
@login_read_only
def manual():
    user_id = current_user.get_id()
    l_cookiejar = crawlers.api_get_cookies_list_select(user_id)
    return render_template("crawler_manual.html", crawler_enabled=True, l_cookiejar=l_cookiejar)
    all_crawlers_types = crawlers.get_all_crawlers_queues_types()
    all_splash_name = crawlers.get_all_crawlers_to_launch_splash_name()
    return render_template("crawler_manual.html",
                            is_manager_connected=crawlers.get_splash_manager_connection_metadata(),
                            all_crawlers_types=all_crawlers_types,
                            all_splash_name=all_splash_name,
                            l_cookiejar=l_cookiejar)

@crawler_splash.route("/crawlers/send_to_spider", methods=['POST'])
@login_required
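Note (illustration, not part of the commit): the `crawler_dashboard_json` route above returns the live crawler status list plus the latest onion/regular stats as JSON. A hedged sketch of polling it from a script; the base URL, the disabled TLS verification and the session cookie are placeholders, and the final path prefix depends on how the `crawler_splash` blueprint is registered.

```python
# Polling the dashboard JSON endpoint from outside AIL (illustrative only).
import requests

AIL_URL = 'https://127.0.0.1:7000'                   # assumed Flask host:port
COOKIES = {'session': '<logged-in session cookie>'}  # placeholder

def fetch_crawler_status():
    r = requests.get(AIL_URL + '/crawlers/crawler_dashboard_json',
                     cookies=COOKIES, verify=False)
    r.raise_for_status()
    data = r.json()
    return data['all_splash_crawler_status'], data['splash_crawlers_latest_stats']

if __name__ == '__main__':
    status, stats = fetch_crawler_status()
    print(len(status), 'crawler(s) reported')
```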
@@ -65,6 +98,8 @@ def send_to_spider():

    # POST val
    url = request.form.get('url_to_crawl')
    crawler_type = request.form.get('crawler_queue_type')
    splash_name = request.form.get('splash_name')
    auto_crawler = request.form.get('crawler_type')
    crawler_delta = request.form.get('crawler_epoch')
    screenshot = request.form.get('screenshot')

@@ -73,6 +108,9 @@ def send_to_spider():
    max_pages = request.form.get('max_pages')
    cookiejar_uuid = request.form.get('cookiejar')

    if splash_name:
        crawler_type = splash_name

    if cookiejar_uuid:
        if cookiejar_uuid == 'None':
            cookiejar_uuid = None

@@ -81,6 +119,7 @@ def send_to_spider():
        cookiejar_uuid = cookiejar_uuid[-1].replace(' ', '')

    res = crawlers.api_create_crawler_task(user_id, url, screenshot=screenshot, har=har, depth_limit=depth_limit, max_pages=max_pages,
                            crawler_type=crawler_type,
                            auto_crawler=auto_crawler, crawler_delta=crawler_delta, cookiejar_uuid=cookiejar_uuid)
    if res:
        return create_json_response(res[0], res[1])
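Note (illustration, not part of the commit): `send_to_spider()` reads the manual-crawl form fields shown above and hands them to `crawlers.api_create_crawler_task()`. The sketch below submits that form programmatically; the field names mirror the `request.form.get()` calls in the diff, while the base URL, session cookie, and the exact form names for `har` and `depth_limit` are assumptions.

```python
# Submitting the manual crawl form handled by send_to_spider() (illustrative).
import requests

AIL_URL = 'https://127.0.0.1:7000'                   # assumed Flask host:port
COOKIES = {'session': '<logged-in session cookie>'}  # placeholder

form = {
    'url_to_crawl': 'http://example.onion',
    'crawler_queue_type': 'tor',   # or set 'splash_name' to pin a specific container
    'splash_name': 'None',
    'screenshot': 'True',
    'har': 'True',                 # assumed field name
    'depth_limit': '1',            # assumed field name
    'max_pages': '10',
    'cookiejar': 'None',           # 'None' is mapped to "no cookiejar" server-side
}

r = requests.post(AIL_URL + '/crawlers/send_to_spider', data=form,
                  cookies=COOKIES, verify=False)
print(r.status_code)
```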
@@ -459,4 +498,61 @@ def crawler_cookiejar_cookie_json_add_post():

    return redirect(url_for('crawler_splash.crawler_cookiejar_cookie_add', cookiejar_uuid=cookiejar_uuid))

@crawler_splash.route('/crawler/settings', methods=['GET'])
@login_required
@login_analyst
def crawler_splash_setings():
    all_proxies = crawlers.get_all_proxies_metadata()
    all_splash = crawlers.get_all_splash_crawler_metadata()
    nb_crawlers_to_launch = crawlers.get_nb_crawlers_to_launch()

    splash_manager_url = crawlers.get_splash_manager_url()
    api_key = crawlers.get_hidden_splash_api_key()
    is_manager_connected = crawlers.get_splash_manager_connection_metadata(force_ping=True)
    crawler_full_config = Config_DB.get_full_config_by_section('crawler')

    return render_template("settings_splash_crawler.html",
                            is_manager_connected=is_manager_connected,
                            splash_manager_url=splash_manager_url, api_key=api_key,
                            nb_crawlers_to_launch=nb_crawlers_to_launch,
                            all_splash=all_splash, all_proxies=all_proxies,
                            crawler_full_config=crawler_full_config)

@crawler_splash.route('/crawler/settings/crawler_manager', methods=['GET', 'POST'])
@login_required
@login_admin
def crawler_splash_setings_crawler_manager():
    if request.method == 'POST':
        splash_manager_url = request.form.get('splash_manager_url')
        api_key = request.form.get('api_key')

        res = crawlers.api_save_splash_manager_url_api({'url':splash_manager_url, 'api_key':api_key})
        if res[1] != 200:
            return Response(json.dumps(res[0], indent=2, sort_keys=True), mimetype='application/json'), res[1]
        else:
            return redirect(url_for('crawler_splash.crawler_splash_setings'))
    else:
        splash_manager_url = crawlers.get_splash_manager_url()
        api_key = crawlers.get_splash_api_key()
        return render_template("settings_edit_splash_crawler_manager.html",
                                splash_manager_url=splash_manager_url, api_key=api_key)

@crawler_splash.route('/crawler/settings/crawlers_to_lauch', methods=['GET', 'POST'])
@login_required
@login_admin
def crawler_splash_setings_crawlers_to_lauch():
    if request.method == 'POST':
        dict_splash_name = {}
        for crawler_name in list(request.form):
            dict_splash_name[crawler_name] = request.form.get(crawler_name)
        res = crawlers.api_set_nb_crawlers_to_launch(dict_splash_name)
        if res[1] != 200:
            return Response(json.dumps(res[0], indent=2, sort_keys=True), mimetype='application/json'), res[1]
        else:
            return redirect(url_for('crawler_splash.crawler_splash_setings'))
    else:
        nb_crawlers_to_launch = crawlers.get_nb_crawlers_to_launch_ui()
        return render_template("settings_edit_crawlers_to_launch.html",
                                nb_crawlers_to_launch=nb_crawlers_to_launch)

## - - ##
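Note (illustration, not part of the commit): `crawler_splash_setings_crawlers_to_lauch()` turns every posted form field into a `{crawler_name: count}` dict for `crawlers.api_set_nb_crawlers_to_launch()`. A hedged sketch of that payload shape with simple input validation; the validation logic and the example splash names are illustrative, not AIL's server-side code.

```python
# Building the payload consumed by api_set_nb_crawlers_to_launch() (sketch).
def build_nb_crawlers_payload(form_items):
    payload = {}
    for crawler_name, value in form_items:
        nb = int(value)                 # reject non-numeric input
        if nb < 0:
            raise ValueError('negative crawler count for {}'.format(crawler_name))
        payload[crawler_name] = nb
    return payload

# build_nb_crawlers_payload([('default_splash_tor', '3'), ('default_splash', '1')])
# -> {'default_splash_tor': 3, 'default_splash': 1}
```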
@@ -74,10 +74,13 @@ def login():
        if user.request_password_change():
            return redirect(url_for('root.change_password'))
        else:
            if next_page and next_page!='None':
                # update note
            # next page
            if next_page and next_page!='None' and next_page!='/':
                return redirect(next_page)
            # dashboard
            else:
                return redirect(url_for('dashboard.index'))
                return redirect(url_for('dashboard.index', update_note=True))
    # login failed
    else:
        # set brute force protection

@@ -113,7 +116,9 @@ def change_password():
        if check_password_strength(password1):
            user_id = current_user.get_id()
            create_user_db(user_id, password1, update=True)
            return redirect(url_for('dashboard.index'))
            # update Note
            # dashboard
            return redirect(url_for('dashboard.index', update_note=True))
        else:
            error = 'Incorrect password'
            return render_template("change_password.html", error=error)
@@ -155,6 +155,8 @@ def stuff():
@login_required
@login_read_only
def index():
    update_note = request.args.get('update_note')

    default_minute = config_loader.get_config_str("Flask", "minute_processed_paste")
    threshold_stucked_module = config_loader.get_config_int("Module_ModuleInformation", "threshold_stucked_module")
    log_select = {10, 25, 50, 100}

@@ -176,6 +178,7 @@ def index():
    return render_template("index.html", default_minute = default_minute, threshold_stucked_module=threshold_stucked_module,
                            log_select=log_select, selected=max_dashboard_logs,
                            update_warning_message=update_warning_message, update_in_progress=update_in_progress,
                            update_note=update_note,
                            update_warning_message_notice_me=update_warning_message_notice_me)

# ========= REGISTRATION =========
@@ -72,12 +72,10 @@
        </div>
        {%endif%}

        <div class="alert alert-info alert-dismissible fade show mt-1" role="alert">
          <strong>Bootstrap 4 migration!</strong> Some pages are still in bootstrap 3. You can check the migration progress <strong><a href="https://github.com/CIRCL/AIL-framework/issues/330" target="_blank">Here</a></strong>.
          <button type="button" class="close" data-dismiss="alert" aria-label="Close">
            <span aria-hidden="true">×</span>
          </button>
        </div>
        <!-- TODO: Add users messages -->
        {%if update_note%}
          {% include 'dashboard/update_modal.html' %}
        {%endif%}

        <div class="row my-2">
|
||||
|
||||
|
|
|
@@ -18,6 +18,7 @@ from flask_login import login_required

from Date import Date
from HiddenServices import HiddenServices
import crawlers

# ============ VARIABLES ============
import Flask_config

@@ -27,7 +28,6 @@ baseUrl = Flask_config.baseUrl
r_cache = Flask_config.r_cache
r_serv_onion = Flask_config.r_serv_onion
r_serv_metadata = Flask_config.r_serv_metadata
crawler_enabled = Flask_config.crawler_enabled
bootstrap_label = Flask_config.bootstrap_label

sys.path.append(os.path.join(os.environ['AIL_BIN'], 'lib'))

@@ -231,22 +231,22 @@ def delete_auto_crawler(url):

# ============= ROUTES ==============

@hiddenServices.route("/crawlers/", methods=['GET'])
@login_required
@login_read_only
def dashboard():
    crawler_metadata_onion = get_crawler_splash_status('onion')
    crawler_metadata_regular = get_crawler_splash_status('regular')

    now = datetime.datetime.now()
    date = now.strftime("%Y%m%d")
    statDomains_onion = get_stats_last_crawled_domains('onion', date)
    statDomains_regular = get_stats_last_crawled_domains('regular', date)

    return render_template("Crawler_dashboard.html", crawler_metadata_onion = crawler_metadata_onion,
                            crawler_enabled=crawler_enabled, date=date,
                            crawler_metadata_regular=crawler_metadata_regular,
                            statDomains_onion=statDomains_onion, statDomains_regular=statDomains_regular)
# @hiddenServices.route("/crawlers/", methods=['GET'])
# @login_required
# @login_read_only
# def dashboard():
#     crawler_metadata_onion = get_crawler_splash_status('onion')
#     crawler_metadata_regular = get_crawler_splash_status('regular')
#
#     now = datetime.datetime.now()
#     date = now.strftime("%Y%m%d")
#     statDomains_onion = get_stats_last_crawled_domains('onion', date)
#     statDomains_regular = get_stats_last_crawled_domains('regular', date)
#
#     return render_template("Crawler_dashboard.html", crawler_metadata_onion = crawler_metadata_onion,
#             date=date,
#             crawler_metadata_regular=crawler_metadata_regular,
#             statDomains_onion=statDomains_onion, statDomains_regular=statDomains_regular)

@hiddenServices.route("/crawlers/crawler_splash_onion", methods=['GET'])
@login_required

@@ -288,7 +288,7 @@ def Crawler_Splash_last_by_type():
    crawler_metadata = get_crawler_splash_status(type)

    return render_template("Crawler_Splash_last_by_type.html", type=type, type_name=type_name,
                            crawler_enabled=crawler_enabled,
                            is_manager_connected=crawlers.get_splash_manager_connection_metadata(),
                            last_domains=list_domains, statDomains=statDomains,
                            crawler_metadata=crawler_metadata, date_from=date_string, date_to=date_string)

@@ -424,7 +424,7 @@ def auto_crawler():

    return render_template("Crawler_auto.html", page=page, nb_page_max=nb_page_max,
                            last_domains=last_domains,
                            crawler_enabled=crawler_enabled,
                            is_manager_connected=crawlers.get_splash_manager_connection_metadata(),
                            auto_crawler_domain_onions_metadata=auto_crawler_domain_onions_metadata,
                            auto_crawler_domain_regular_metadata=auto_crawler_domain_regular_metadata)

@@ -439,23 +439,6 @@ def remove_auto_crawler():
    delete_auto_crawler(url)
    return redirect(url_for('hiddenServices.auto_crawler', page=page))

@hiddenServices.route("/crawlers/crawler_dashboard_json", methods=['GET'])
@login_required
@login_read_only
def crawler_dashboard_json():

    crawler_metadata_onion = get_crawler_splash_status('onion')
    crawler_metadata_regular = get_crawler_splash_status('regular')

    now = datetime.datetime.now()
    date = now.strftime("%Y%m%d")

    statDomains_onion = get_stats_last_crawled_domains('onion', date)
    statDomains_regular = get_stats_last_crawled_domains('regular', date)

    return jsonify({'statDomains_onion': statDomains_onion, 'statDomains_regular': statDomains_regular,
                    'crawler_metadata_onion':crawler_metadata_onion, 'crawler_metadata_regular':crawler_metadata_regular})

# # TODO: refractor
@hiddenServices.route("/hiddenServices/last_crawled_domains_with_stats_json", methods=['GET'])
@login_required
|
||||
|
|
|
@ -92,30 +92,6 @@
|
|||
<div id="barchart_type">
|
||||
</div>
|
||||
|
||||
<div class="card mt-1 mb-1">
|
||||
<div class="card-header text-white bg-dark">
|
||||
Crawlers Status
|
||||
</div>
|
||||
<div class="card-body px-0 py-0 ">
|
||||
<table class="table">
|
||||
<tbody id="tbody_crawler_info">
|
||||
{% for crawler in crawler_metadata %}
|
||||
<tr>
|
||||
<td>
|
||||
<i class="fas fa-{%if crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};"></i> {{crawler['crawler_info']}}
|
||||
</td>
|
||||
<td>
|
||||
{{crawler['crawling_domain']}}
|
||||
</td>
|
||||
<td style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};">
|
||||
{{crawler['status_info']}}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
@ -189,79 +165,6 @@ function toggle_sidebar(){
|
|||
}
|
||||
</script>
|
||||
|
||||
|
||||
|
||||
<script>/*
|
||||
function refresh_list_crawled(){
|
||||
|
||||
$.getJSON("{{ url_for('hiddenServices.last_crawled_domains_with_stats_json') }}",
|
||||
function(data) {
|
||||
|
||||
var tableRef = document.getElementById('tbody_last_crawled');
|
||||
$("#tbody_last_crawled").empty()
|
||||
|
||||
for (var i = 0; i < data.last_domains.length; i++) {
|
||||
var data_domain = data.last_domains[i]
|
||||
var newRow = tableRef.insertRow(tableRef.rows.length);
|
||||
|
||||
var newCell = newRow.insertCell(0);
|
||||
newCell.innerHTML = "<td><a target=\"_blank\" href=\"{{ url_for('crawler_splash.showDomain') }}?onion_domain="+data_domain['domain']+"\">"+data_domain['domain']+"</a></td>";
|
||||
|
||||
newCell = newRow.insertCell(1);
|
||||
newCell.innerHTML = "<td>"+data_domain['first_seen'].substr(0, 4)+"/"+data_domain['first_seen'].substr(4, 2)+"/"+data_domain['first_seen'].substr(6, 2)+"</td>"
|
||||
|
||||
newCell = newRow.insertCell(2);
|
||||
newCell.innerHTML = "<td>"+data_domain['last_check'].substr(0, 4)+"/"+data_domain['last_check'].substr(4, 2)+"/"+data_domain['last_check'].substr(6, 2)+"</td>"
|
||||
|
||||
newCell = newRow.insertCell(3);
|
||||
newCell.innerHTML = "<td><div style=\"color:"+data_domain['status_color']+"; display:inline-block\"><i class=\"fa "+data_domain['status_icon']+" fa-2x\"></i>"+data_domain['status_text']+"</div></td>"
|
||||
|
||||
}
|
||||
var statDomains = data.statDomains
|
||||
document.getElementById('text_domain_up').innerHTML = statDomains['domains_up']
|
||||
document.getElementById('text_domain_down').innerHTML = statDomains['domains_down']
|
||||
document.getElementById('text_domain_queue').innerHTML = statDomains['domains_queue']
|
||||
document.getElementById('text_total_domains').innerHTML = statDomains['total']
|
||||
|
||||
if(data.crawler_metadata.length!=0){
|
||||
$("#tbody_crawler_info").empty();
|
||||
var tableRef = document.getElementById('tbody_crawler_info');
|
||||
for (var i = 0; i < data.crawler_metadata.length; i++) {
|
||||
var crawler = data.crawler_metadata[i];
|
||||
var newRow = tableRef.insertRow(tableRef.rows.length);
|
||||
var text_color;
|
||||
var icon;
|
||||
if(crawler['status']){
|
||||
text_color = 'Green';
|
||||
icon = 'check';
|
||||
} else {
|
||||
text_color = 'Red';
|
||||
icon = 'times';
|
||||
}
|
||||
|
||||
var newCell = newRow.insertCell(0);
|
||||
newCell.innerHTML = "<td><i class=\"fa fa-"+icon+"-circle\" style=\"color:"+text_color+";\"></i>"+crawler['crawler_info']+"</td>";
|
||||
|
||||
newCell = newRow.insertCell(1);
|
||||
newCell.innerHTML = "<td><a target=\"_blank\" href=\"{{ url_for('crawler_splash.showDomain') }}?onion_domain="+crawler['crawling_domain']+"\">"+crawler['crawling_domain']+"</a></td>";
|
||||
|
||||
newCell = newRow.insertCell(2);
|
||||
newCell.innerHTML = "<td><div style=\"color:"+text_color+";\">"+crawler['status_info']+"</div></td>";
|
||||
|
||||
$("#panel_crawler").show();
|
||||
}
|
||||
} else {
|
||||
$("#panel_crawler").hide();
|
||||
}
|
||||
}
|
||||
);
|
||||
|
||||
if (to_refresh) {
|
||||
setTimeout("refresh_list_crawled()", 10000);
|
||||
}
|
||||
}*/
|
||||
</script>
|
||||
|
||||
<script>
|
||||
var margin = {top: 20, right: 90, bottom: 55, left: 0},
|
||||
width = parseInt(d3.select('#barchart_type').style('width'), 10);
|
||||
|
|
|
@@ -1 +1 @@
<li id='page-hiddenServices'><a href="{{ url_for('hiddenServices.dashboard') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>
<li id='page-hiddenServices'><a href="{{ url_for('crawler_splash.crawlers_dashboard') }}"><i class="fa fa-user-secret"></i> hidden Services </a></li>
|
||||
|
|
|
@@ -29,6 +29,8 @@

      <div class="col-12 col-lg-10" id="core_content">

        {% include 'dashboard/update_modal.html' %}

        <div class="card mb-3 mt-1">
          <div class="card-header text-white bg-dark pb-1">
            <h5 class="card-title">AIL-framework Status :</h5>
|
||||
|
|
|
@@ -1,6 +1,14 @@
{% if not crawler_enabled %}
{%if not is_manager_connected['status']%}
<div class="alert alert-secondary text-center my-2" role="alert">
  <h1><i class="fas fa-times-circle text-danger"></i> Crawler Disabled</h1>
  <p>...</p>
  <p>
    {%if 'error' in is_manager_connected%}
      <b>{{is_manager_connected['status_code']}}</b>
      <br>
      <b>Error:</b> {{is_manager_connected['error']}}
    {%else%}
      <b>Error:</b> core/Crawler_manager not launched
    {%endif%}
  </p>
</div>
{% endif %}
{%endif%}
|
||||
|
|
|
@ -44,7 +44,31 @@
|
|||
<div class="input-group" id="date-range-from">
|
||||
<input type="text" class="form-control" id="url_to_crawl" name="url_to_crawl" placeholder="Address or Domain">
|
||||
</div>
|
||||
<div class="d-flex mt-1">
|
||||
<div class="d-flex mt-2">
|
||||
<i class="fas fa-spider mt-1"></i> Crawler Type
|
||||
<div class="custom-control custom-switch">
|
||||
<input class="custom-control-input" type="checkbox" name="queue_type_selector" value="True" id="queue_type_selector">
|
||||
<label class="custom-control-label" for="queue_type_selector">
|
||||
<i class="fas fa-splotch"></i> Splash Name
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
<div id="div_crawler_queue_type">
|
||||
<select class="custom-select form-control" name="crawler_queue_type" id="crawler_queue_type">
|
||||
{%for crawler_type in all_crawlers_types%}
|
||||
<option value="{{crawler_type}}" {%if crawler_type=='tor'%}selected{%endif%}>{{crawler_type}}</option>
|
||||
{%endfor%}
|
||||
</select>
|
||||
</div>
|
||||
<div id="div_splash_name">
|
||||
<select class="custom-select form-control" name="splash_name" id="splash_name">
|
||||
<option value="None" selected>Don't use a special splash crawler</option>
|
||||
{%for splash_name in all_splash_name%}
|
||||
<option value="{{splash_name}}">{{splash_name}}</option>
|
||||
{%endfor%}
|
||||
</select>
|
||||
</div>
|
||||
<div class="d-flex mt-3">
|
||||
<i class="fas fa-user-ninja mt-1"></i> Manual
|
||||
<div class="custom-control custom-switch">
|
||||
<input class="custom-control-input" type="checkbox" name="crawler_type" value="True" id="crawler_type">
|
||||
|
@ -143,11 +167,16 @@ var chart = {};
|
|||
$(document).ready(function(){
|
||||
$("#page-Crawler").addClass("active");
|
||||
$("#nav_manual_crawler").addClass("active");
|
||||
queue_type_selector_input_controler()
|
||||
manual_crawler_input_controler();
|
||||
|
||||
$('#crawler_type').on("change", function () {
|
||||
manual_crawler_input_controler();
|
||||
});
|
||||
|
||||
$('#queue_type_selector').on("change", function () {
|
||||
queue_type_selector_input_controler();
|
||||
});
|
||||
});
|
||||
|
||||
function toggle_sidebar(){
|
||||
|
@ -172,4 +201,14 @@ function manual_crawler_input_controler() {
|
|||
}
|
||||
}
|
||||
|
||||
function queue_type_selector_input_controler() {
|
||||
if($('#queue_type_selector').is(':checked')){
|
||||
$("#div_crawler_queue_type").hide();
|
||||
$("#div_splash_name").show();
|
||||
}else{
|
||||
$("#div_crawler_queue_type").show();
|
||||
$("#div_splash_name").hide();
|
||||
}
|
||||
}
|
||||
|
||||
</script>
|
||||
|
|
|
@ -36,34 +36,15 @@
|
|||
<h5><a class="text-info" href="{{ url_for('hiddenServices.Crawler_Splash_last_by_type')}}?type=onion"><i class="fas fa-user-secret"></i> Onions Crawlers</a></h5>
|
||||
<div class="row">
|
||||
<div class="col-6">
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_domain_up">{{ statDomains_onion['domains_up'] }}</a> UP
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_onion_domain_down">{{ statDomains_onion['domains_down'] }}</a> DOWN
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_domain_up">{{ splash_crawlers_latest_stats['onion']['domains_up'] }}</a> UP
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_onion_domain_down">{{ splash_crawlers_latest_stats['onion']['domains_down'] }}</a> DOWN
|
||||
</div>
|
||||
<div class="col-6">
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_total">{{ statDomains_onion['total'] }}</a> Crawled
|
||||
<span class="badge badge-warning ml-md-3" id="stat_onion_queue">{{ statDomains_onion['domains_queue'] }}</span> Queue
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=onion&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_onion_total">{{ splash_crawlers_latest_stats['onion']['total'] }}</a> Crawled
|
||||
<span class="badge badge-warning ml-md-3" id="stat_onion_queue">{{ splash_crawlers_latest_stats['onion']['domains_queue'] }}</span> Queue
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="card-body px-0 py-0 ">
|
||||
<table class="table">
|
||||
<tbody id="tbody_crawler_onion_info">
|
||||
{% for crawler in crawler_metadata_onion %}
|
||||
<tr>
|
||||
<td>
|
||||
<i class="fas fa-{%if crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};"></i> {{crawler['crawler_info']}}
|
||||
</td>
|
||||
<td>
|
||||
{{crawler['crawling_domain']}}
|
||||
</td>
|
||||
<td style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};">
|
||||
{{crawler['status_info']}}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
@ -73,58 +54,63 @@
|
|||
<h5><a class="text-info" href="{{ url_for('hiddenServices.Crawler_Splash_last_by_type')}}?type=regular"><i class="fab fa-html5"></i> Regular Crawlers</a></h5>
|
||||
<div class="row">
|
||||
<div class="col-6">
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_domain_up">{{ statDomains_regular['domains_up'] }}</a> UP
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_regular_domain_down">{{ statDomains_regular['domains_down'] }}</a> DOWN
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_domain_up">{{ splash_crawlers_latest_stats['regular']['domains_up'] }}</a> UP
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-danger ml-md-3" id="stat_regular_domain_down">{{ splash_crawlers_latest_stats['regular']['domains_down'] }}</a> DOWN
|
||||
</div>
|
||||
<div class="col-6">
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_total">{{ statDomains_regular['total'] }}</a> Crawled
|
||||
<span class="badge badge-warning ml-md-3" id="stat_regular_queue">{{ statDomains_regular['domains_queue'] }}</span> Queue
|
||||
<a href="{{ url_for('hiddenServices.show_domains_by_daterange') }}?service_type=regular&domains_up=True&domains_down=True&date_from={{date}}&date_to={{date}}" class="badge badge-success" id="stat_regular_total">{{ splash_crawlers_latest_stats['regular']['total'] }}</a> Crawled
|
||||
<span class="badge badge-warning ml-md-3" id="stat_regular_queue">{{ splash_crawlers_latest_stats['regular']['domains_queue'] }}</span> Queue
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="card-body px-0 py-0 ">
|
||||
<table class="table">
|
||||
<tbody id="tbody_crawler_regular_info">
|
||||
{% for crawler in crawler_metadata_regular %}
|
||||
<tr>
|
||||
<td>
|
||||
<i class="fas fa-{%if crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};"></i> {{crawler['crawler_info']}}
|
||||
</td>
|
||||
<td>
|
||||
{{crawler['crawling_domain']}}
|
||||
</td>
|
||||
<td style="color:{%if crawler['status']%}Green{%else%}Red{%endif%};">
|
||||
{{crawler['status_info']}}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{% include 'domains/block_domains_name_search.html' %}
|
||||
<table class="table">
|
||||
<tbody id="tbody_crawler_onion_info">
|
||||
{% for splash_crawler in all_splash_crawler_status %}
|
||||
<tr>
|
||||
<td>
|
||||
<i class="fas fa-{%if splash_crawler['status']%}check{%else%}times{%endif%}-circle" style="color:{%if splash_crawler['status']%}Green{%else%}Red{%endif%};"></i> {{splash_crawler['crawler_info']}}
|
||||
</td>
|
||||
<td>
|
||||
{%if splash_crawler['type']=='onion'%}
|
||||
<i class="fas fa-user-secret"></i>
|
||||
{%else%}
|
||||
<i class="fab fa-html5">
|
||||
{%endif%}
|
||||
</td>
|
||||
<td>
|
||||
{{splash_crawler['crawling_domain']}}
|
||||
</td>
|
||||
<td style="color:{%if splash_crawler['status']%}Green{%else%}Red{%endif%};">
|
||||
{{splash_crawler['status_info']}}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
{% include 'domains/block_domains_name_search.html' %}
|
||||
|
||||
<hr>
|
||||
<div class="row mb-3">
|
||||
<div class="col-xl-6">
|
||||
<div class="text-center">
|
||||
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_onion')}}" role="button">
|
||||
<i class="fas fa-user-secret"></i> Onion Domain Explorer
|
||||
</a>
|
||||
<hr>
|
||||
<div class="row mb-3">
|
||||
<div class="col-xl-6">
|
||||
<div class="text-center">
|
||||
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_onion')}}" role="button">
|
||||
<i class="fas fa-user-secret"></i> Onion Domain Explorer
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
<div class="col-xl-6">
|
||||
<div class="text-center">
|
||||
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_web')}}" role="button">
|
||||
<i class="fab fa-html5"></i> Web Domain Explorer
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="col-xl-6">
|
||||
<div class="text-center">
|
||||
<a class="btn btn-secondary" href="{{url_for('crawler_splash.domains_explorer_web')}}" role="button">
|
||||
<i class="fab fa-html5"></i> Web Domain Explorer
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
|
||||
|
@ -176,24 +162,24 @@ function toggle_sidebar(){
|
|||
|
||||
function refresh_crawler_status(){
|
||||
|
||||
$.getJSON("{{ url_for('hiddenServices.crawler_dashboard_json') }}",
|
||||
$.getJSON("{{ url_for('crawler_splash.crawler_dashboard_json') }}",
|
||||
function(data) {
|
||||
|
||||
$('#stat_onion_domain_up').text(data.statDomains_onion['domains_up']);
|
||||
$('#stat_onion_domain_down').text(data.statDomains_onion['domains_down']);
|
||||
$('#stat_onion_total').text(data.statDomains_onion['total']);
|
||||
$('#stat_onion_queue').text(data.statDomains_onion['domains_queue']);
|
||||
$('#stat_onion_domain_up').text(data.splash_crawlers_latest_stats['onion']['domains_up']);
|
||||
$('#stat_onion_domain_down').text(data.splash_crawlers_latest_stats['onion']['domains_down']);
|
||||
$('#stat_onion_total').text(data.splash_crawlers_latest_stats['onion']['total']);
|
||||
$('#stat_onion_queue').text(data.splash_crawlers_latest_stats['onion']['domains_queue']);
|
||||
|
||||
$('#stat_regular_domain_up').text(data.statDomains_regular['domains_up']);
|
||||
$('#stat_regular_domain_down').text(data.statDomains_regular['domains_down']);
|
||||
$('#stat_regular_total').text(data.statDomains_regular['total']);
|
||||
$('#stat_regular_queue').text(data.statDomains_regular['domains_queue']);
|
||||
$('#stat_regular_domain_up').text(data.splash_crawlers_latest_stats['regular']['domains_up']);
|
||||
$('#stat_regular_domain_down').text(data.splash_crawlers_latest_stats['regular']['domains_down']);
|
||||
$('#stat_regular_total').text(data.splash_crawlers_latest_stats['regular']['total']);
|
||||
$('#stat_regular_queue').text(data.splash_crawlers_latest_stats['regular']['domains_queue']);
|
||||
|
||||
if(data.crawler_metadata_onion.length!=0){
|
||||
if(data.all_splash_crawler_status.length!=0){
|
||||
$("#tbody_crawler_onion_info").empty();
|
||||
var tableRef = document.getElementById('tbody_crawler_onion_info');
|
||||
for (var i = 0; i < data.crawler_metadata_onion.length; i++) {
|
||||
var crawler = data.crawler_metadata_onion[i];
|
||||
for (var i = 0; i < data.all_splash_crawler_status.length; i++) {
|
||||
var crawler = data.all_splash_crawler_status[i];
|
||||
var newRow = tableRef.insertRow(tableRef.rows.length);
|
||||
var text_color;
|
||||
var icon;
|
||||
|
@ -205,41 +191,22 @@ function refresh_crawler_status(){
|
|||
icon = 'times';
|
||||
}
|
||||
|
||||
var newCell = newRow.insertCell(0);
|
||||
newCell.innerHTML = "<td><i class=\"fas fa-"+icon+"-circle\" style=\"color:"+text_color+";\"></i> "+crawler['crawler_info']+"</td>";
|
||||
|
||||
newCell = newRow.insertCell(1);
|
||||
newCell.innerHTML = "<td>"+crawler['crawling_domain']+"</td>";
|
||||
|
||||
newCell = newRow.insertCell(2);
|
||||
newCell.innerHTML = "<td><div style=\"color:"+text_color+";\">"+crawler['status_info']+"</div></td>";
|
||||
|
||||
//$("#panel_crawler").show();
|
||||
}
|
||||
}
|
||||
if(data.crawler_metadata_regular.length!=0){
|
||||
$("#tbody_crawler_regular_info").empty();
|
||||
var tableRef = document.getElementById('tbody_crawler_regular_info');
|
||||
for (var i = 0; i < data.crawler_metadata_regular.length; i++) {
|
||||
var crawler = data.crawler_metadata_regular[i];
|
||||
var newRow = tableRef.insertRow(tableRef.rows.length);
|
||||
var text_color;
|
||||
var icon;
|
||||
if(crawler['status']){
|
||||
text_color = 'Green';
|
||||
icon = 'check';
|
||||
if(crawler['type'] === 'onion'){
|
||||
icon_t = 'fas fa-user-secret';
|
||||
} else {
|
||||
text_color = 'Red';
|
||||
icon = 'times';
|
||||
icon_t = 'fab fa-html5';
|
||||
}
|
||||
|
||||
var newCell = newRow.insertCell(0);
|
||||
newCell.innerHTML = "<td><i class=\"fas fa-"+icon+"-circle\" style=\"color:"+text_color+";\"></i> "+crawler['crawler_info']+"</td>";
|
||||
|
||||
newCell = newRow.insertCell(1);
|
||||
newCell.innerHTML = "<td>"+crawler['crawling_domain']+"</td>";
|
||||
var newCell = newRow.insertCell(1);
|
||||
newCell.innerHTML = "<td><i class=\""+icon_t+"\"></i></td>";
|
||||
|
||||
newCell = newRow.insertCell(2);
|
||||
newCell.innerHTML = "<td>"+crawler['crawling_domain']+"</td>";
|
||||
|
||||
newCell = newRow.insertCell(3);
|
||||
newCell.innerHTML = "<td><div style=\"color:"+text_color+";\">"+crawler['status_info']+"</div></td>";
|
||||
|
||||
//$("#panel_crawler").show();
|
|
@ -0,0 +1,60 @@
|
|||
<!DOCTYPE html>
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>AIL-Framework</title>
|
||||
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png')}}">
|
||||
<!-- Core CSS -->
|
||||
<link href="{{ url_for('static', filename='css/bootstrap4.min.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='css/font-awesome.min.css') }}" rel="stylesheet">
|
||||
|
||||
<!-- JS -->
|
||||
<script src="{{ url_for('static', filename='js/jquery.js')}}"></script>
|
||||
<script src="{{ url_for('static', filename='js/bootstrap4.min.js')}}"></script>
|
||||
|
||||
|
||||
</head>
|
||||
|
||||
<body>
|
||||
|
||||
{% include 'nav_bar.html' %}
|
||||
|
||||
<div class="container-fluid">
|
||||
<div class="row">
|
||||
|
||||
{% include 'crawler/menu_sidebar.html' %}
|
||||
|
||||
<div class="col-12 col-lg-10" id="core_content">
|
||||
|
||||
<form action="{{ url_for('crawler_splash.crawler_splash_setings_crawlers_to_lauch') }}" method="post" enctype="multipart/form-data">
|
||||
<h5 class="card-title">Number of Crawlers to Launch:</h5>
|
||||
<table class="table table-sm">
|
||||
<tbody>
|
||||
{%for crawler_name in nb_crawlers_to_launch%}
|
||||
<tr>
|
||||
<td>{{crawler_name}}</td>
|
||||
<td>
|
||||
<input class="form-control" type="number" id="{{crawler_name}}" value="{{nb_crawlers_to_launch[crawler_name]}}" min="0" name="{{crawler_name}}" required>
|
||||
</td>
|
||||
</tr>
|
||||
{%endfor%}
|
||||
</tbody>
|
||||
</table>
|
||||
<button type="submit" class="btn btn-primary">Edit <i class="fas fa-pencil-alt"></i></button>
|
||||
</form>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</body>
|
||||
|
||||
<script>
|
||||
var to_refresh = false
|
||||
$(document).ready(function(){
|
||||
$("#page-Crawler").addClass("active");
|
||||
$("#nav_settings").addClass("active");
|
||||
});
|
||||
|
||||
</script>
|
|
@ -0,0 +1,55 @@
|
|||
<!DOCTYPE html>
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>AIL-Framework</title>
|
||||
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png')}}">
|
||||
<!-- Core CSS -->
|
||||
<link href="{{ url_for('static', filename='css/bootstrap4.min.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='css/font-awesome.min.css') }}" rel="stylesheet">
|
||||
|
||||
<!-- JS -->
|
||||
<script src="{{ url_for('static', filename='js/jquery.js')}}"></script>
|
||||
<script src="{{ url_for('static', filename='js/bootstrap4.min.js')}}"></script>
|
||||
|
||||
|
||||
</head>
|
||||
|
||||
<body>
|
||||
|
||||
{% include 'nav_bar.html' %}
|
||||
|
||||
<div class="container-fluid">
|
||||
<div class="row">
|
||||
|
||||
{% include 'crawler/menu_sidebar.html' %}
|
||||
|
||||
<div class="col-12 col-lg-10" id="core_content">
|
||||
|
||||
<form action="{{ url_for('crawler_splash.crawler_splash_setings_crawler_manager') }}" method="post" enctype="multipart/form-data">
|
||||
<div class="form-group">
|
||||
<label for="splash_manager_url">Splash Manager URL</label>
|
||||
<input type="text" class="form-control" id="splash_manager_url" placeholder="https://splash_manager_url" name="splash_manager_url" {%if splash_manager_url%}value="{{splash_manager_url}}"{%endif%}>
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label for="api_key">API Key</label>
|
||||
<input type="text" class="form-control" id="api_key" placeholder="API Key" name="api_key" {%if api_key%}value="{{api_key}}"{%endif%}>
|
||||
</div>
|
||||
<button type="submit" class="btn btn-primary">Edit <i class="fas fa-pencil-alt"></i></button>
|
||||
</form>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</body>
|
||||
|
||||
<script>
|
||||
var to_refresh = false
|
||||
$(document).ready(function(){
|
||||
$("#page-Crawler").addClass("active");
|
||||
$("#nav_settings").addClass("active");
|
||||
});
|
||||
|
||||
</script>
|
|
@ -0,0 +1,299 @@
|
|||
<!DOCTYPE html>
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>AIL-Framework</title>
|
||||
<link rel="icon" href="{{ url_for('static', filename='image/ail-icon.png')}}">
|
||||
<!-- Core CSS -->
|
||||
<link href="{{ url_for('static', filename='css/bootstrap4.min.css') }}" rel="stylesheet">
|
||||
<link href="{{ url_for('static', filename='css/font-awesome.min.css') }}" rel="stylesheet">
|
||||
|
||||
<!-- JS -->
|
||||
<script src="{{ url_for('static', filename='js/jquery.js')}}"></script>
|
||||
<script src="{{ url_for('static', filename='js/bootstrap4.min.js')}}"></script>
|
||||
|
||||
|
||||
</head>
|
||||
|
||||
<body>
|
||||
|
||||
{% include 'nav_bar.html' %}
|
||||
|
||||
<div class="container-fluid">
|
||||
<div class="row">
|
||||
|
||||
{% include 'crawler/menu_sidebar.html' %}
|
||||
|
||||
<div class="col-12 col-lg-10" id="core_content">
|
||||
|
||||
<div class="row">
|
||||
<div class="col-xl-6">
|
||||
|
||||
|
||||
|
||||
</div>
|
||||
<div class="col-xl-6">
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
<div class="card mb-3 mt-1">
|
||||
<div class="card-header bg-dark text-white">
|
||||
<span class="badge badge-pill badge-light flex-row-reverse float-right">
|
||||
{% if is_manager_connected['status'] %}
|
||||
<div style="color:Green;">
|
||||
<i class="fas fa-check-circle fa-2x"></i>
|
||||
Connected
|
||||
</div>
|
||||
{% else %}
|
||||
<div style="color:Red;">
|
||||
<i class="fas fa-times-circle fa-2x"></i>
|
||||
Error
|
||||
</div>
|
||||
{% endif %}
|
||||
</span>
|
||||
<h4>Splash Crawler Manager</h4>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
|
||||
{%if not is_manager_connected['status']%}
|
||||
{% include 'crawler/crawler_disabled.html' %}
|
||||
{%endif%}
|
||||
|
||||
<div class="row mb-3 justify-content-center">
|
||||
<div class="col-xl-6">
|
||||
<div class="card text-center border-secondary">
|
||||
<div class="card-body px-1 py-0">
|
||||
<table class="table table-sm">
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>Splash Manager URL</td>
|
||||
<td>{{splash_manager_url}}</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>API Key</td>
|
||||
<td>
|
||||
{{api_key}}
|
||||
<!-- <a class="ml-3" href="/settings/new_token"><i class="fa fa-random"></i></a> -->
|
||||
</td>
|
||||
<td>
|
||||
<a href="{{ url_for('crawler_splash.crawler_splash_setings_crawler_manager') }}">
|
||||
<button type="button" class="btn btn-info">
|
||||
Edit <i class="fas fa-pencil-alt"></i>
|
||||
</button>
|
||||
</a>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div {%if not is_manager_connected%}class="hidden"{%endif%}>
|
||||
|
||||
<div class="card border-secondary mb-4">
|
||||
<div class="card-body text-dark">
|
||||
<h5 class="card-title">Number of Crawlers to Launch:</h5>
|
||||
<table class="table table-sm">
|
||||
<tbody>
|
||||
{%for crawler in nb_crawlers_to_launch%}
|
||||
<tr>
|
||||
<td>{{crawler}}</td>
|
||||
<td>{{nb_crawlers_to_launch[crawler]}}</td>
|
||||
</tr>
|
||||
{%endfor%}
|
||||
</tbody>
|
||||
</table>
|
||||
<a href="{{ url_for('crawler_splash.crawler_splash_setings_crawlers_to_lauch') }}">
|
||||
<button type="button" class="btn btn-info">
|
||||
Edit number of crawlers to launch <i class="fas fa-pencil-alt"></i>
|
||||
</button>
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="card border-secondary mb-4">
|
||||
<div class="card-body text-dark">
|
||||
<h5 class="card-title">All Splash Crawlers:</h5>
|
||||
<table class="table table-striped">
|
||||
<thead class="bg-info text-white">
|
||||
<th>
|
||||
Splash name
|
||||
</th>
|
||||
<th>
|
||||
Proxy
|
||||
</th>
|
||||
<th>
|
||||
Crawler type
|
||||
</th>
|
||||
<th>
|
||||
Description
|
||||
</th>
|
||||
<th></th>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for splash_name in all_splash %}
|
||||
<tr>
|
||||
<td>
|
||||
{{splash_name}}
|
||||
</td>
|
||||
<td>
|
||||
{{all_splash[splash_name]['proxy']}}
|
||||
</td>
|
||||
<td>
|
||||
{%if all_splash[splash_name]['type']=='tor'%}
|
||||
<i class="fas fa-user-secret"></i>
|
||||
{%else%}
|
||||
<i class="fab fa-html5">
|
||||
{%endif%}
|
||||
{{all_splash[splash_name]['type']}}
|
||||
</td>
|
||||
<td>
|
||||
{{all_splash[splash_name]['description']}}
|
||||
</td>
|
||||
<td>
|
||||
<div class="d-flex justify-content-end">
|
||||
<button class="btn btn-outline-dark px-1 py-0">
|
||||
<i class="fas fa-pencil-alt"></i>
|
||||
</button>
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="card border-secondary">
|
||||
<div class="card-body text-dark">
|
||||
<h5 class="card-title">All Proxies:</h5>
|
||||
<table class="table table-striped">
|
||||
<thead class="bg-info text-white">
|
||||
<th>
|
||||
Proxy name
|
||||
</th>
|
||||
<th>
|
||||
Host
|
||||
</th>
|
||||
<th>
|
||||
Port
|
||||
</th>
|
||||
<th>
|
||||
Type
|
||||
</th>
|
||||
<th>
|
||||
Crawler Type
|
||||
</th>
|
||||
<th>
|
||||
Description
|
||||
</th>
|
||||
<th></th>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for proxy_name in all_proxies %}
|
||||
<tr>
|
||||
<td>
|
||||
{{proxy_name}}
|
||||
</td>
|
||||
<td>
|
||||
{{all_proxies[proxy_name]['host']}}
|
||||
</td>
|
||||
<td>
|
||||
{{all_proxies[proxy_name]['port']}}
|
||||
</td>
|
||||
<td>
|
||||
{{all_proxies[proxy_name]['type']}}
|
||||
</td>
|
||||
<td>
|
||||
{%if all_proxies[proxy_name]['crawler_type']=='tor'%}
|
||||
<i class="fas fa-user-secret"></i>
|
||||
{%else%}
|
||||
<i class="fab fa-html5">
|
||||
{%endif%}
|
||||
{{all_proxies[proxy_name]['crawler_type']}}
|
||||
</td>
|
||||
<td>
|
||||
{{all_proxies[proxy_name]['description']}}
|
||||
</td>
|
||||
<td>
|
||||
<div class="d-flex justify-content-end">
|
||||
<button class="btn btn-outline-dark px-1 py-0">
|
||||
<i class="fas fa-pencil-alt"></i>
|
||||
</button>
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="card mb-3 mt-1">
|
||||
<div class="card-header bg-dark text-white">
|
||||
<h4>Crawlers Settings</h4>
|
||||
</div>
|
||||
<div class="card-body">
|
||||
|
||||
<table class="table table-striped table-hover">
|
||||
<thead class="bg-info text-white">
|
||||
<th>
|
||||
Key
|
||||
</th>
|
||||
<th>
|
||||
Description
|
||||
</th>
|
||||
<th>
|
||||
Value
|
||||
</th>
|
||||
<th></th>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for config_field in crawler_full_config %}
|
||||
<tr>
|
||||
<td>
|
||||
{{config_field}}
|
||||
</td>
|
||||
<td>
|
||||
{{crawler_full_config[config_field]['info']}}
|
||||
</td>
|
||||
<td>
|
||||
{{crawler_full_config[config_field]['value']}}
|
||||
</td>
|
||||
<td>
|
||||
<div class="d-flex justify-content-end">
|
||||
<button class="btn btn-outline-dark px-1 py-0">
|
||||
<i class="fas fa-pencil-alt"></i>
|
||||
</button>
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</body>
|
||||
|
||||
<script>
|
||||
var to_refresh = false
|
||||
$(document).ready(function(){
|
||||
$("#page-Crawler").addClass("active");
|
||||
$("#nav_settings").addClass("active");
|
||||
});
|
||||
|
||||
</script>
|
|
@@ -14,7 +14,7 @@
  </h5>
  <ul class="nav flex-md-column flex-row navbar-nav justify-content-between w-100"> <!--nav-pills-->
    <li class="nav-item">
      <a class="nav-link" href="{{url_for('hiddenServices.dashboard')}}" id="nav_dashboard">
      <a class="nav-link" href="{{url_for('crawler_splash.crawlers_dashboard')}}" id="nav_dashboard">
        <i class="fas fa-search"></i>
        <span>Dashboard</span>
      </a>

@@ -43,6 +43,12 @@
        Automatic Crawler
      </a>
    </li>
    <li class="nav-item">
      <a class="nav-link" href="{{url_for('crawler_splash.crawler_splash_setings')}}" id="nav_settings">
        <i class="fas fa-cog"></i>
        Settings
      </a>
    </li>
  </ul>

  <h5 class="d-flex text-muted w-100" id="nav_title_domains_explorer">
42  var/www/templates/dashboard/update_modal.html  (new file)
@@ -0,0 +1,42 @@
<div class="modal fade" id="update_modal" tabindex="-1" role="dialog" aria-labelledby="exampleModalLabel" aria-hidden="true">
  <div class="modal-dialog modal-lg" role="document">
    <div class="modal-content">
      <div class="modal-header bg-secondary text-white">
        <h5 class="modal-title" id="exampleModalLabel">Update Note: v3.5 - Splash Manager</h5>
        <button type="button" class="close" data-dismiss="modal" aria-label="Close">
          <span aria-hidden="true">×</span>
        </button>
      </div>
      <div class="modal-body">
        <div class="alert alert-danger text-danger" role="alert">All Splash Crawlers have been removed from the core.</div>
        AIL now uses a new Crawler Manager to start Splash Dockers and launch Tor/web crawlers.

        <ul class="list-group my-3">
          <li class="list-group-item active">Splash Manager Features:</li>
          <li class="list-group-item">Install and run Splash crawlers on another server</li>
          <li class="list-group-item">Handle proxies (Web and Tor)</li>
          <li class="list-group-item">Launch/Kill Splash Dockers</li>
          <li class="list-group-item">Restart crawlers on crash</li>
        </ul>

        <div class="d-flex justify-content-center">
          <a class="btn btn-info" href="https://github.com/ail-project/ail-splash-manager" role="button">
            <i class="fab fa-github"></i> Install and Configure AIL-Splash-Manager
          </a>
        </div>

      </div>
      <div class="modal-footer">
        <button type="button" class="btn btn-secondary" data-dismiss="modal">Close</button>
      </div>
    </div>
  </div>
</div>


<script>
  $(window).on('load', function() {
    $('#update_modal').modal('show');
  });
</script>
@@ -22,7 +22,7 @@
  <a class="nav-link" id="page-Tracker" href="{{ url_for('hunter.tracked_menu') }}" aria-disabled="true"><i class="fas fa-crosshairs"></i> Leaks Hunter</a>
</li>
<li class="nav-item mr-3">
  <a class="nav-link" id="page-Crawler" href="{{ url_for('hiddenServices.dashboard') }}" tabindex="-1" aria-disabled="true"><i class="fas fa-spider"></i> Crawlers</a>
  <a class="nav-link" id="page-Crawler" href="{{ url_for('crawler_splash.crawlers_dashboard') }}" tabindex="-1" aria-disabled="true"><i class="fas fa-spider"></i> Crawlers</a>
</li>
<li class="nav-item mr-3">
  <a class="nav-link" id="page-Decoded" href="{{ url_for('hashDecoded.hashDecoded_page') }}" aria-disabled="true"><i class="fas fa-cube"></i> Objects</a>