Merge pull request #128 from mokaddem/doc

Doc
This commit is contained in:
mokaddem 2017-07-17 13:50:54 +02:00 committed by GitHub
commit 7c88e7d013
26 changed files with 280 additions and 138 deletions

98
HOWTO.md Normal file
View file

@ -0,0 +1,98 @@
Feeding, adding new features and contributing
=============================================
How to feed the AIL framework
-----------------------------
For the moment, there are three different ways to feed AIL with data:
1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
2. You can setup [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below).
3. You can feed your own data using the [./bin/import_dir.py](./bin/import_dir.py) script.
### Feeding AIL with pystemon
AIL is an analysis tool, not a collector!
However, if you want to collect some pastes and feed them to AIL, the procedure is described below. Nevertheless, moderate your queries!
Steps to set up pystemon and feed data to AIL (a minimal feeder sketch follows the list):
1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon)
2. Install its python dependencies inside your virtual environment
3. Launch pystemon ``` ./pystemon ```
4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly
5. Launch pystemon-feeder ``` ./pystemon-feeder.py ```
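If you are curious about what the feeder side does, here is a minimal Python 3 sketch of a ZMQ publisher in the spirit of pystemon-feeder. The bind address, the `102` topic and the `topic paste-path base64(gzip(content))` message layout are assumptions based on the sample configuration; check the `[ZMQ_Global]` section of your own ```bin/packages/config.cfg``` before relying on them.

```python
import base64
import gzip
import time
import zmq

# Assumed defaults; adjust to match the [ZMQ_Global] section of bin/packages/config.cfg
ZMQ_BIND = "tcp://127.0.0.1:5556"
TOPIC = "102"

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind(ZMQ_BIND)
time.sleep(1)  # give subscribers a moment to connect (PUB/SUB slow-joiner)

def send_paste(paste_path, content):
    """Publish one paste as 'topic paste-path base64(gzip(content))'."""
    gzip64encoded = base64.b64encode(gzip.compress(content.encode('utf-8'))).decode()
    socket.send_string("{} {} {}".format(TOPIC, paste_path, gzip64encoded))

send_paste("my_feeder/2017/07/17/example.gz", "some pasted text to analyse")
```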
How to create a new module
--------------------------
If you want to add a new processing or analysis module in AIL, follow these simple steps:
1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to at least one queue (usually Redis_Global).
2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration (a condensed skeleton is sketched after this list).
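For reference, here is a condensed sketch in the spirit of [./bin/template.py](./bin/template.py); `MyModule` is a placeholder for the section name you declared in modules.cfg, and the processing itself is left as a comment.

```python
import time

from pubsublogger import publisher
from Helper import Process

if __name__ == '__main__':
    publisher.port = 6380
    publisher.channel = 'Script'

    # Must match the section name added in ./bin/packages/modules.cfg
    config_section = 'MyModule'
    p = Process(config_section)
    publisher.info("MyModule started")

    while True:
        message = p.get_from_set()   # pop the next paste path from the subscribed queue
        if message is None:
            publisher.debug("{} queue is empty, waiting".format(config_section))
            time.sleep(1)
            continue
        # ... analyse the paste referenced by 'message' here, then optionally
        # forward it to the modules subscribed to your publish queue:
        # p.populate_set_out(message)
```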
How to create a new webpage
---------------------------
If you want to add a new webpage for a module in AIL, follow these simple steps:
1. Launch [./var/www/create_new_web_module.py](./var/www/create_new_web_module.py) and enter the name to use for your webpage (Usually, your newly created python module).
2. A template and Flask skeleton have been created for your new webpage in [./var/www/modules/](./var/www/modules/) (a simplified example of such a skeleton is sketched after this list).
3. Edit the created HTML files under the template folder, as well as the Flask_* Python script, so that they fit your needs.
4. You can change the position of your module in the top navigation header in the file [./var/www/templates/header_base.html](./var/www/templates/header_base.html).
5. You can hide modules from the top navigation header by adding their names to the file [./var/www/templates/ignored_modules.txt](./var/www/templates/ignored_modules.txt).
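The exact files are generated for you, but to give an idea of the shape of a `Flask_*` script, here is a simplified, hypothetical sketch. The blueprint name, route, template name and Redis settings are illustrative only and not the literal output of create_new_web_module.py (AIL normally reads its Redis settings from ```bin/packages/config.cfg```).

```python
import redis
from flask import Blueprint, render_template

# Hypothetical names; the real ones are derived from the name you entered
MyWebModule = Blueprint('MyWebModule', __name__, template_folder='templates')

# Illustrative connection parameters
r_serv = redis.StrictRedis(host='localhost', port=6379, db=1)

@MyWebModule.route("/MyWebModule/")
def index():
    # Fetch whatever your processing module stored in Redis and render it
    nb_items = r_serv.scard('mywebmodule:items')  # illustrative key name
    return render_template("MyWebModule.html", nb_items=nb_items)
```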
How to contribute a module
--------------------------
Feel free to fork the code, play with it, make some patches or add additional analysis modules.
To contribute your module, simply open a pull request.
Additional information
======================
Manage modules: ModulesInformationV2.py
---------------------------------------
You can do a lot of things easily with the [./bin/ModulesInformationV2.py](./bin/ModulesInformationV2.py) script:
- Monitor the health of other modules
- Monitor the resource consumption of other modules
- Start one or more modules
- Kill running modules
- Automatically restart stuck modules
- Show the paste currently processed by a module
### Navigation
You can navigate the interface using the arrow keys. To perform an action on a selected module, press either <ENTER> or <SPACE> to show the dialog box.
To switch between lists, press the <TAB> key.
You can also quickly stop or start modules by clicking on the <K> or <S> symbol respectively. These are located in the _Action_ column.
Finally, you can quit this program by pressing either <q> or <C-c>.
Terms frequency usage
---------------------
In AIL, you can track terms, sets of terms and even regexes without creating a dedicated module. To do so, go to the `Terms Frequency` tab in the web interface.
- You can track a term by simply putting it in the box.
- You can track a set of terms by putting the terms in an array surrounded by the '\' character. You can also set a custom threshold for the number of terms that must match before the detection is triggered. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\`
- You can track regexes as easily as tracking a term. You just have to put your regex in the box surrounded by the '/' character. For example, if you want to track the regex matching all email addresses on the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/` (see the sketch after this list).
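To give a rough idea of what happens behind the scenes, here is a simplified sketch of how such rules could be applied to a paste's content. The real logic lives in the RegexForTermsFrequency and SetForTermsFrequency modules; the pattern used below and the percentage interpretation of the bracketed threshold are illustrative assumptions.

```python
import re

paste_content = "contact: alice@mail.domain.net, bob@other.org"

# A regex rule entered between '/' characters, e.g. a pattern aimed at
# addresses under domain.net (illustrative pattern, not the exact rule above):
regex_rule = r"\S+\.domain\.net"
matches = re.findall(regex_rule, paste_content)
if matches:
    print("regex rule matched {} time(s): {}".format(len(matches), matches))

# A set rule such as \[term1, term2, [100]]\: assume the bracketed value is the
# percentage of listed terms that must be present to trigger the detection.
terms, threshold_pct = ["term1", "term2"], 100
present = [t for t in terms if t in paste_content]
if len(present) * 100 / len(terms) >= threshold_pct:
    print("set rule matched")
```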

22
OVERVIEW.md Normal file
View file

@ -0,0 +1,22 @@
Overview
========
Redis and LevelDB overview
--------------------------
* Redis on TCP port 6379
- DB 0 - Cache hostname/dns
- DB 1 - Paste meta-data
* Redis on TCP port 6380 - Redis Log only
* Redis on TCP port 6381
- DB 0 - PubSub + Queue and Paste content LRU cache
- DB 1 - _Mixer_ Cache
* LevelDB on TCP port 6382
- DB 1 - Curve
- DB 2 - Trending
- DB 3 - Terms
- DB 4 - Sentiments
* LevelDB on TCP port <year>
- DB 0 - Lines duplicate
- DB 1 - Hashes
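In AIL, the LevelDB instances are exposed through a Redis-compatible interface, so everything above is reachable with a regular Redis client. A small connection sketch, assuming a default localhost installation and the database numbers listed above:

```python
import redis

# Assumes a default localhost installation; ports and DB numbers follow the list above.
r_cache  = redis.StrictRedis(host='localhost', port=6379, db=0)  # hostname/dns cache
r_paste  = redis.StrictRedis(host='localhost', port=6379, db=1)  # paste meta-data
r_log    = redis.StrictRedis(host='localhost', port=6380, db=0)  # log only
r_queues = redis.StrictRedis(host='localhost', port=6381, db=0)  # PubSub + queues + paste LRU cache
r_mixer  = redis.StrictRedis(host='localhost', port=6381, db=1)  # Mixer cache
r_terms  = redis.StrictRedis(host='localhost', port=6382, db=3)  # Terms (LevelDB-backed)

print(r_terms.ping())  # sanity check that the Terms database is reachable
```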

118
README.md
View file

@ -11,50 +11,24 @@ AIL is a modular framework to analyse potential information leaks from unstructu
![Dashboard](./doc/screenshots/dashboard.png?raw=true "AIL framework dashboard")
Trending charts
---------------
![Trending-Web](./doc/screenshots/trending-web.png?raw=true "AIL framework webtrending")
![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending")
Browsing
--------
![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes")
Sentiment analysis
------------------
![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis")
Terms manager and occurrence
---------------------------
![Term-Manager](./doc/screenshots/terms-manager.png?raw=true "AIL framework termManager")
## Top terms
![Term-Top](./doc/screenshots/terms-top.png?raw=true "AIL framework termTop")
![Term-Plot](./doc/screenshots/terms-plot.png?raw=true "AIL framework termPlot")
[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
Features
--------
* Modular architecture to handle streams of unstructured or structured information
* Default support for external ZMQ feeds, such as provided by CIRCL or other providers
* Multiple feed support
* Each module can process and reprocess the information already processed by AIL
* Detecting and extracting URLs including their geographical location (e.g. IP address location)
* Extracting and validating potential leak of credit card numbers
* Extracting and validating potential leak of credit card numbers, credentials, ...
* Extracting and validating leaked email addresses, including DNS MX validation
* Module for extracting Tor .onion addresses (to be further processed for analysis)
* Keep track of duplicates
* Extracting and validating potential hostnames (e.g. to feed Passive DNS systems)
* A full-text indexer module to index unstructured information
* Modules and web statistics
* Statistics on modules and web
* Realtime modules manager in terminal
* Global sentiment analysis for each provider based on the NLTK VADER module
* Terms tracking and occurrence
* Terms, Set of terms and Regex tracking and occurrence
* Many more modules for extracting phone numbers, credentials and others
Installation
@ -101,69 +75,51 @@ Eventually you can browse the status of the AIL framework website at the followi
``http://localhost:7000/``
How to
======
HOWTO
-----
How to feed the AIL framework
-----------------------------
For the moment, there are two different ways to feed AIL with data:
1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL.
2. You can setup [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below).
### Feeding AIL with pystemon
AIL is an analysis tool, not a collector!
However, if you want to collect some pastes and feed them to AIL, the procedure is described below.
Nevertheless, moderate your queries!
Here are the steps to setup pystemon and feed data to AIL:
1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon)
2. Install its python dependencies inside your virtual environment
3. Launch pystemon ``` ./pystemon ```
4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly
5. Launch pystemon-feeder ``` ./pystemon-feeder.py ```
HOWTO are available in [HOWTO.md](HOWTO.md)
How to create a new module
--------------------------
Screenshots
===========
If you want to add a new processing or analysis module in AIL, follow these simple steps:
Trending charts
---------------
1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to the Redis_Global at minimum.
![Trending-Web](./doc/screenshots/trending-web.png?raw=true "AIL framework webtrending")
![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending")
2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration.
Browsing
--------
How to contribute a module
--------------------------
![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes")
Feel free to fork the code, play with it, make some patches or add additional analysis modules.
Sentiment analysis
------------------
To contribute your module, simply open a pull request.
![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis")
Overview and License
====================
Terms manager and occurrence
---------------------------
![Term-Manager](./doc/screenshots/terms-manager.png?raw=true "AIL framework termManager")
### Top terms
![Term-Top](./doc/screenshots/terms-top.png?raw=true "AIL framework termTop")
![Term-Plot](./doc/screenshots/terms-plot.png?raw=true "AIL framework termPlot")
Redis and LevelDB overview
--------------------------
[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo)
* Redis on TCP port 6379 - DB 1 - Paste meta-data
* DB 0 - Cache hostname/dns
* Redis on TCP port 6380 - Redis Pub-Sub only
* Redis on TCP port 6381 - DB 0 - Queue and Paste content LRU cache
* Redis on TCP port 6382 - DB 1-4 - Trending, terms and sentiments
* LevelDB on TCP port <year> - Lines duplicate
Command line module manager
---------------------------
LICENSE
-------
![Module-Manager](./doc/screenshots/module-manager.png?raw=true "AIL framework ModuleInformationV2.py")
License
=======
```
Copyright (C) 2014 Jules Debra

View file

@ -5,25 +5,7 @@
The ZMQ_Sub_Attribute Module
============================
This module is consuming the Redis-list created by the ZMQ_PubSub_Line_Q Module
It performs a sorting on the line's length and publishes/forwards them to
different channels:
*Channel 1 if max length(line) < max
*Channel 2 if max length(line) > max
The collected information about the processed pastes
(number of lines and maximum line length) is stored in Redis.
..note:: Module ZMQ_Something_Q and ZMQ_Something are closely bound, always put
the same Subscriber name in both of them.
Requirements
------------
*Need running Redis instances. (LevelDB & Redis)
*Need the ZMQ_PubSub_Line_Q Module running to be able to work properly.
This module saves the attributes of the paste into Redis.
"""
import time

View file

@ -1,5 +1,16 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
"""
The Credential Module
=====================
This module is consuming the Redis-list created by the Categ module.
It applies credential regexes on the paste content and warns if a threshold is exceeded.
"""
import time
import sys
from packages import Paste

View file

@ -1,5 +1,17 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
"""
The CreditCards Module
======================
This module is consuming the Redis-list created by the Categ module.
It applies credit card regexes on the paste content and warns if a threshold is exceeded.
"""
import pprint
import time
from packages import Paste
@ -7,7 +19,6 @@ from packages import lib_refine
from pubsublogger import publisher
import re
from Helper import Process
if __name__ == "__main__":

View file

@ -5,14 +5,6 @@
This module manages the top sets for term frequency.
Every 'refresh_rate' it updates the weekly and monthly sets.
Requirements
------------
*Need running Redis instances. (Redis)
*Categories files of words in /files/ need to be created
*Need the ZMQ_PubSub_Tokenize_Q Module running to be able to work properly.
"""
import redis

View file

@ -1,7 +1,13 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
"""
Template for new modules
The CVE Module
==============
This module is consuming the Redis-list created by the Categ module.
It applies CVE regexes on the paste content and warns if a reference to a CVE is spotted.
"""
import time

View file

@ -5,8 +5,8 @@
The DomClassifier Module
============================
The DomClassifier modules is fetching the list of files to be
processed and index each file with a full-text indexer (Whoosh until now).
The DomClassifier module extracts and classifies Internet domains/hostnames/IP addresses from
the output of the Global module.
"""
import time

View file

@ -1,7 +1,14 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
"""
Template for new modules
The Keys Module
===============
This module is consuming the Redis-list created by the Global module.
It looks for PGP-encrypted messages.
"""
import time

View file

@ -1,6 +1,16 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
"""
The Mail Module
===============
This module is consuming the Redis-list created by the Categ module.
It applies mail regexes on the paste content and warns if a threshold is exceeded.
"""
import redis
import pprint
import time

View file

@ -1,8 +1,8 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
"""
The ZMQ_Feed_Q Module
=====================
The Mixer Module
================
This module is consuming the Redis-list created by the ZMQ_Feed_Q Module.
@ -22,13 +22,7 @@ Depending on the configuration, this module will process the feed as follow:
Note that the hash of the content is defined as the sha1(gzip64encoded).
Every data coming from a named feed can be sent to a pre-processing module before going to the global module.
The mapping can be done via the variable feed_queue_mapping
Requirements
------------
*Need running Redis instances.
*Need the ZMQ_Feed_Q Module running to be able to work properly.
The mapping can be done via the variable FEED_QUEUE_MAPPING
"""
import base64
@ -44,7 +38,7 @@ from Helper import Process
# CONFIG #
refresh_time = 30
feed_queue_mapping = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module
FEED_QUEUE_MAPPING = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module
if __name__ == '__main__':
publisher.port = 6380
@ -117,8 +111,8 @@ if __name__ == '__main__':
else: # New content
# populate Global OR populate another set based on the feeder_name
if feeder_name in feed_queue_mapping:
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
if feeder_name in FEED_QUEUE_MAPPING:
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
else:
p.populate_set_out(relay_message, 'Mixer')
@ -139,8 +133,8 @@ if __name__ == '__main__':
server.expire('HASH_'+paste_name, ttl_key)
# populate Global OR populate another set based on the feeder_name
if feeder_name in feed_queue_mapping:
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
if feeder_name in FEED_QUEUE_MAPPING:
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
else:
p.populate_set_out(relay_message, 'Mixer')
@ -153,8 +147,8 @@ if __name__ == '__main__':
server.expire(paste_name, ttl_key)
# populate Global OR populate another set based on the feeder_name
if feeder_name in feed_queue_mapping:
p.populate_set_out(relay_message, feed_queue_mapping[feeder_name])
if feeder_name in FEED_QUEUE_MAPPING:
p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name])
else:
p.populate_set_out(relay_message, 'Mixer')

View file

@ -145,6 +145,7 @@ if __name__ == "__main__":
PST.p_name)
for url in fetch(p, r_cache, urls, domains_list, path):
publisher.warning('{}Checked {};{}'.format(to_print, url, PST.p_path))
p.populate_set_out('onion;{}'.format(PST.p_path), 'BrowseWarningPaste')
else:
publisher.info('{}Onion related;{}'.format(to_print, PST.p_path))

View file

@ -1,7 +1,14 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
"""
module for finding phone numbers
The Phone Module
================
This module is consuming the Redis-list created by the Categ module.
It applies phone number regexes on the paste content and warns if a threshold is exceeded.
"""
import time
@ -17,6 +24,7 @@ def search_phone(message):
content = paste.get_p_content()
# regex to find phone numbers, may raise many false positives (shalt thou seek optimization, upgrading is required)
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\d{2,3}){3,4})')
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\(?\d{2,4}\)?){3,4})')
# list of the regex results in the Paste, may be null
results = reg_phone.findall(content)

View file

@ -2,6 +2,8 @@
# -*-coding:UTF-8 -*
"""
This Module is used for term frequency.
It processes every paste coming from the Global module and tests the regexes
supplied in the Terms webpage.
"""
import redis

View file

@ -6,6 +6,11 @@ from pubsublogger import publisher
from Helper import Process
import re
'''
This module takes its input from the Global module.
It applies some regexes and publishes the matched content.
'''
if __name__ == "__main__":
publisher.port = 6380
publisher.channel = "Script"

View file

@ -1,7 +1,14 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
"""
Sql Injection module
The SQLInjectionDetection Module
================================
This module is consuming the Redis-list created by the Web module.
It tests for different possible SQL injection patterns.
"""
import time

View file

@ -4,8 +4,8 @@
Sentiment analyser module.
It takes its inputs from 'global'.
The content analysed comes from the pastes with length of the line
above a defined threshold removed (get_p_content_with_removed_lines).
Lines whose length is above a defined threshold are removed from the content
before analysis (get_p_content_with_removed_lines).
This is done because the NLTK sentence tokenizer (sent_tokenize) seems to crash
for long lines (function _slices_from_text line#1276).

View file

@ -2,6 +2,8 @@
# -*-coding:UTF-8 -*
"""
This Module is used for term frequency.
It processes every paste coming from the Global module and tests the sets
supplied in the Terms webpage.
"""
import redis

View file

@ -1,8 +1,8 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
"""
The ZMQ_PubSub_Lines Module
============================
The Tokenize Module
===================
This module is consuming the Redis-list created by the ZMQ_PubSub_Tokenize_Q
Module.

View file

@ -1,5 +1,14 @@
#!/usr/bin/env python
# -*-coding:UTF-8 -*
"""
The Web Module
==============
This module tries to parse URLs and warns if some defined country codes are present.
"""
import redis
import pprint
import time

View file

@ -1,7 +1,13 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
"""
Template for new modules
The WebStats Module
===================
This module computes statistics on the URLs collected by the Web module.
It considers the TLD, domain and protocol.
"""
import time

View file

@ -57,8 +57,8 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_BrowseWarningPaste
[Onion]
subscribe = Redis_Onion
publish = Redis_ValidOnion,ZMQ_FetchedOnion
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion
publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste
#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste
[DumpValidOnion]
subscribe = Redis_ValidOnion

View file

@ -1,6 +1,15 @@
#!/usr/bin/env python2
# -*-coding:UTF-8 -*
'''
The preProcess Module
=====================
This module is just an example of how a feed coming from the Mixer module can be
pre-processed before sending it to the Global module.
'''
import time
from pubsublogger import publisher

Binary screenshot updated (not shown); size changed from 66 KiB to 51 KiB.

View file

@ -75,6 +75,7 @@
<li name='nav-pan'><a data-toggle="tab" href="#keys-tab" data-attribute-name="keys" data-panel="keys-panel">Keys</a></li>
<li name='nav-pan'><a data-toggle="tab" href="#mail-tab" data-attribute-name="mail" data-panel="mail-panel">Mails</a></li>
<li name='nav-pan'><a data-toggle="tab" href="#phone-tab" data-attribute-name="phone" data-panel="phone-panel">Phones</a></li>
<li name='nav-pan'><a data-toggle="tab" href="#onion-tab" data-attribute-name="onion" data-panel="onion-panel">Onions</a></li>
</ul>
</br>
@ -101,6 +102,9 @@
<div class="col-lg-12 tab-pane fade" id="phone-tab">
<img id="loading-gif-modal" src="{{url_for('static', filename='image/loading.gif') }}" style="margin: 4px;">
</div>
<div class="col-lg-12 tab-pane fade" id="onion-tab">
<img id="loading-gif-modal" src="{{url_for('static', filename='image/loading.gif') }}" style="margin: 4px;">
</div>
</div> <!-- tab-content -->
<!-- /.row -->
</div>