diff --git a/HOWTO.md b/HOWTO.md new file mode 100644 index 00000000..d4a7b962 --- /dev/null +++ b/HOWTO.md @@ -0,0 +1,98 @@ +Feeding, adding new features and contributing +============================================= + +How to feed the AIL framework +----------------------------- + +For the moment, there are three different ways to feed AIL with data: + +1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP you are using for AIL. + +2. You can setup [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below). + +3. You can feed your own data using the [./bin/import_dir.py](./bin/import_dir.py) script. + +### Feeding AIL with pystemon + +AIL is an analysis tool, not a collector! +However, if you want to collect some pastes and feed them to AIL, the procedure is described below. Nevertheless, moderate your queries! + +Feed data to AIL: + +1. Clone the [pystemon git repository](https://github.com/CIRCL/pystemon) + +2. Install its Python dependencies inside your virtual environment + +3. Launch pystemon: ``` ./pystemon ``` + +4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath entry accordingly + +5. Launch pystemon-feeder: ``` ./pystemon-feeder.py ``` + + +How to create a new module +-------------------------- + +If you want to add a new processing or analysis module in AIL, follow these simple steps: + +1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to at least one module (usually Redis_Global). + +2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration. + + +How to create a new webpage +--------------------------- + +If you want to add a new webpage for a module in AIL, follow these simple steps: + +1. 
Launch [./var/www/create_new_web_module.py](./var/www/create_new_web_module.py) and enter the name to use for your webpage (usually, your newly created Python module). + +2. A template and Flask skeleton have been created for your new webpage in [./var/www/modules/](./var/www/modules/) + +3. Edit the created HTML files under the template folder as well as the Flask_* Python script so that they fit your needs. + +4. You can change the order of your module in the top navigation header in the file [./var/www/templates/header_base.html](./var/www/templates/header_base.html) + +5. You can hide modules from the top navigation header by adding their names to the file [./var/www/templates/ignored_modules.txt](./var/www/templates/ignored_modules.txt) + +How to contribute a module +-------------------------- + +Feel free to fork the code, play with it, make some patches or add additional analysis modules. + +To contribute your module, feel free to open a pull request. + + +Additional information +====================== + +Manage modules: ModulesInformationV2.py +--------------------------------------- + +You can do a lot of things easily with the [./bin/ModulesInformationV2](./bin/ModulesInformationV2) script: + +- Monitor the health of other modules +- Monitor the resource consumption of other modules +- Start one or more modules +- Kill running modules +- Automatically restart stuck modules +- Show the paste currently processed by a module + +### Navigation + +You can navigate the interface using the arrow keys. In order to perform an action on a selected module, you can either press or to show the dialog box. + +To change list, you can press the key. + +Also, you can quickly stop or start modules by clicking on the or symbol respectively. These are located in the _Action_ column. 
+ +Finally, you can quit this program by pressing either or + + +Terms frequency usage +--------------------- + +In AIL, you can track terms, sets of terms and even regexes without creating a dedicated module. To do so, go to the tab `Terms Frequency` in the web interface. +- You can track a term by simply putting it in the box. +- You can track a set of terms by simply putting terms in an array surrounded by the '\' character. You can also set a custom threshold regarding the number of terms that must match to trigger the detection. For example, if you want to track the terms _term1_ and _term2_ at the same time, you can use the following rule: `\[term1, term2, [100]]\` +- You can track regexes as easily as tracking a term. You just have to put your regex in the box surrounded by the '/' character. For example, if you want to track the regex matching all email addresses having the domain _domain.net_, you can use the following aggressive rule: `/*.domain.net/`. diff --git a/OVERVIEW.md b/OVERVIEW.md new file mode 100644 index 00000000..72c8e236 --- /dev/null +++ b/OVERVIEW.md @@ -0,0 +1,22 @@ +Overview +======== + +Redis and LevelDB overview +-------------------------- + +* Redis on TCP port 6379 + - DB 0 - Cache hostname/dns + - DB 1 - Paste meta-data +* Redis on TCP port 6380 - Redis Log only +* Redis on TCP port 6381 + - DB 0 - PubSub + Queue and Paste content LRU cache + - DB 1 - _Mixer_ Cache +* LevelDB on TCP port 6382 + - DB 1 - Curve + - DB 2 - Trending + - DB 3 - Terms + - DB 4 - Sentiments +* LevelDB on TCP port + - DB 0 - Lines duplicate + - DB 1 - Hashes + diff --git a/README.md b/README.md index 1e4c7b6d..2c01a9e3 100644 --- a/README.md +++ b/README.md @@ -11,50 +11,24 @@ AIL is a modular framework to analyse potential information leaks from unstructu ![Dashboard](./doc/screenshots/dashboard.png?raw=true "AIL framework dashboard") -Trending charts ---------------- - -![Trending-Web](./doc/screenshots/trending-web.png?raw=true "AIL framework webtrending") 
-![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending") - -Browsing -------- - -![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes") - -Sentiment analysis ------------------- - -![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis") - -Terms manager and occurence --------------------------- - -![Term-Manager](./doc/screenshots/terms-manager.png?raw=true "AIL framework termManager") - -## Top terms - -![Term-Top](./doc/screenshots/terms-top.png?raw=true "AIL framework termTop") -![Term-Plot](./doc/screenshots/terms-plot.png?raw=true "AIL framework termPlot") - - -[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo) - Features -------- * Modular architecture to handle streams of unstructured or structured information * Default support for external ZMQ feeds, such as provided by CIRCL or other providers +* Multiple feed support * Each module can process and reprocess the information already processed by AIL * Detecting and extracting URLs including their geographical location (e.g. IP address location) -* Extracting and validating potential leak of credit cards numbers +* Extracting and validating potential leaks of credit card numbers, credentials, ... * Extracting and validating email addresses leaked including DNS MX validation * Module for extracting Tor .onion addresses (to be further processed for analysis) +* Keeps track of duplicates * Extracting and validating potential hostnames (e.g. 
to feed Passive DNS systems) * A full-text indexer module to index unstructured information -* Modules and web statistics +* Statistics on modules and the web +* Real-time module manager in the terminal * Global sentiment analysis for each providers based on nltk vader module -* Terms tracking and occurrence +* Terms, sets of terms and regex tracking and occurrence * Many more modules for extracting phone numbers, credentials and others Installation @@ -101,69 +75,51 @@ Eventually you can browse the status of the AIL framework website at the followi ``http://localhost:7000/`` -How to -====== +HOWTO +----- -How to feed the AIL framework ------------------------------ - -For the moment, there are two different ways to feed AIL with data: - -1. Be a collaborator of CIRCL and ask to access our feed. It will be sent to the static IP your are using for AIL. - -2. You can setup [pystemon](https://github.com/CIRCL/pystemon) and use the custom feeder provided by AIL (see below). - -###Feeding AIL with pystemon -AIL is an analysis tool, not a collector! -However, if you want to collect some pastes and feed them to AIL, the procedure is described below. - -Nevertheless, moderate your queries! - -Here are the steps to setup pystemon and feed data to AIL: - -1. Clone the [pystemon's git repository](https://github.com/CIRCL/pystemon) - -2. Install its python dependencies inside your virtual environment - -3. Launch pystemon ``` ./pystemon ``` - -4. Edit your configuration file ```bin/packages/config.cfg``` and modify the pystemonpath path accordingly - -5. Launch pystemon-feeder ``` ./pystemon-feeder.py ``` +HOWTOs are available in [HOWTO.md](HOWTO.md) -How to create a new module -------------------------- +Screenshots +=========== -If you want to add a new processing or analysis module in AIL, follow these simple steps: +Trending charts --------------- -1. Add your module name in [./bin/packages/modules.cfg](./bin/packages/modules.cfg) and subscribe to the Redis_Global at minimum. 
+![Trending-Web](./doc/screenshots/trending-web.png?raw=true "AIL framework webtrending") +![Trending-Modules](./doc/screenshots/trending-module.png?raw=true "AIL framework modulestrending") -2. Use [./bin/template.py](./bin/template.py) as a sample module and create a new file in bin/ with the module name used in the modules.cfg configuration. +Browsing +-------- -How to contribute a module -------------------------- +![Browse-Pastes](./doc/screenshots/browse-important.png?raw=true "AIL framework browseImportantPastes") -Feel free to fork the code, play with it, make some patches or add additional analysis modules. +Sentiment analysis +------------------ -To contribute your module, feel free to pull your contribution. +![Sentiment](./doc/screenshots/sentiment.png?raw=true "AIL framework sentimentanalysis") -Overview and License -==================== +Terms manager and occurrence +--------------------------- + +![Term-Manager](./doc/screenshots/terms-manager.png?raw=true "AIL framework termManager") + +### Top terms + +![Term-Top](./doc/screenshots/terms-top.png?raw=true "AIL framework termTop") +![Term-Plot](./doc/screenshots/terms-plot.png?raw=true "AIL framework termPlot") -Redis and LevelDB overview -------------------------- +[AIL framework screencast](https://www.youtube.com/watch?v=1_ZrZkRKmNo) -* Redis on TCP port 6379 - DB 1 - Paste meta-data -* DB 0 - Cache hostname/dns -* Redis on TCP port 6380 - Redis Pub-Sub only -* Redis on TCP port 6381 - DB 0 - Queue and Paste content LRU cache -* Redis on TCP port 6382 - DB 1-4 - Trending, terms and sentiments -* LevelDB on TCP port - Lines duplicate +Command line module manager +--------------------------- -LICENSE ------- +![Module-Manager](./doc/screenshots/module-manager.png?raw=true "AIL framework ModulesInformationV2.py") + +License +======= ``` Copyright (C) 2014 Jules Debra diff --git a/bin/Attributes.py b/bin/Attributes.py index a7f78696..66e22f39 100755 --- a/bin/Attributes.py +++ b/bin/Attributes.py @@ 
-5,25 +5,7 @@ The ZMQ_Sub_Attribute Module ============================ -This module is consuming the Redis-list created by the ZMQ_PubSub_Line_Q Module - -It perform a sorting on the line's length and publish/forward them to -differents channels: - -*Channel 1 if max length(line) < max -*Channel 2 if max length(line) > max - -The collected informations about the processed pastes -(number of lines and maximum length line) are stored in Redis. - -..note:: Module ZMQ_Something_Q and ZMQ_Something are closely bound, always put -the same Subscriber name in both of them. - -Requirements ------------- - -*Need running Redis instances. (LevelDB & Redis) -*Need the ZMQ_PubSub_Line_Q Module running to be able to work properly. +This module saves the attributes of the paste into Redis. """ import time diff --git a/bin/Credential.py b/bin/Credential.py index 8c62f34a..ff8f8f97 100755 --- a/bin/Credential.py +++ b/bin/Credential.py @@ -1,5 +1,16 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* + +""" +The Credential Module +===================== + +This module is consuming the Redis-list created by the Categ module. + +It applies credential regexes on the paste content and warns if a threshold is exceeded. + +""" + import time import sys from packages import Paste diff --git a/bin/CreditCards.py b/bin/CreditCards.py index 6c9bf9c1..79442576 100755 --- a/bin/CreditCards.py +++ b/bin/CreditCards.py @@ -1,5 +1,17 @@ #!/usr/bin/env python # -*-coding:UTF-8 -* + +""" +The CreditCards Module +====================== + +This module is consuming the Redis-list created by the Categ module. + +It applies credit card regexes on the paste content and warns if a threshold is exceeded. 
+ +""" + + import pprint time from packages import Paste @@ -7,7 +19,6 @@ from packages import lib_refine from pubsublogger import publisher import re - from Helper import Process if __name__ == "__main__": diff --git a/bin/CurveManageTopSets.py b/bin/CurveManageTopSets.py index 562705cf..eea46a8c 100755 --- a/bin/CurveManageTopSets.py +++ b/bin/CurveManageTopSets.py @@ -5,14 +5,6 @@ This module manage top sets for terms frequency. Every 'refresh_rate' update the weekly and monthly set - -Requirements ------------- - -*Need running Redis instances. (Redis) -*Categories files of words in /files/ need to be created -*Need the ZMQ_PubSub_Tokenize_Q Module running to be able to work properly. - """ import redis diff --git a/bin/Cve.py b/bin/Cve.py index 97e5aaae..fb4b0b24 100755 --- a/bin/Cve.py +++ b/bin/Cve.py @@ -1,7 +1,13 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* """ - Template for new modules +The CVE Module +====================== + +This module is consuming the Redis-list created by the Categ module. + +It applies CVE regexes on the paste content and warns if a reference to a CVE is spotted. + """ import time diff --git a/bin/DomClassifier.py b/bin/DomClassifier.py index 74522917..c205cb01 100755 --- a/bin/DomClassifier.py +++ b/bin/DomClassifier.py @@ -5,8 +5,8 @@ The DomClassifier Module ============================ -The DomClassifier modules is fetching the list of files to be -processed and index each file with a full-text indexer (Whoosh until now). +The DomClassifier module extracts and classifies Internet domains/hostnames/IP addresses from +the output of the Global module. """ import time diff --git a/bin/Keys.py b/bin/Keys.py index a286dada..d2e7ebd2 100755 --- a/bin/Keys.py +++ b/bin/Keys.py @@ -1,7 +1,14 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* + """ - Template for new modules +The Keys Module +====================== + +This module is consuming the Redis-list created by the Global module. 
+ +It looks for PGP-encrypted messages. + """ import time diff --git a/bin/Mail.py b/bin/Mail.py index 6ec938f3..99dd6948 100755 --- a/bin/Mail.py +++ b/bin/Mail.py @@ -1,6 +1,16 @@ #!/usr/bin/env python # -*-coding:UTF-8 -* +""" +The Mail Module +=============== + +This module is consuming the Redis-list created by the Categ module. + +It applies mail regexes on the paste content and warns if a threshold is exceeded. + +""" + import redis import pprint import time diff --git a/bin/Mixer.py b/bin/Mixer.py index 266eada3..40614253 100755 --- a/bin/Mixer.py +++ b/bin/Mixer.py @@ -1,8 +1,8 @@ #!/usr/bin/env python # -*-coding:UTF-8 -* """ -The ZMQ_Feed_Q Module -===================== +The Mixer Module +================ This module is consuming the Redis-list created by the ZMQ_Feed_Q Module. @@ -22,13 +22,7 @@ Depending on the configuration, this module will process the feed as follow: Note that the hash of the content is defined as the sha1(gzip64encoded). Every data coming from a named feed can be sent to a pre-processing module before going to the global module. -The mapping can be done via the variable feed_queue_mapping - -Requirements ------------- - -*Need running Redis instances. -*Need the ZMQ_Feed_Q Module running to be able to work properly. 
+The mapping can be done via the variable FEED_QUEUE_MAPPING """ import base64 @@ -44,7 +38,7 @@ from Helper import Process # CONFIG # refresh_time = 30 -feed_queue_mapping = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module +FEED_QUEUE_MAPPING = { "feeder2": "preProcess1" } # Map a feeder name to a pre-processing module if __name__ == '__main__': publisher.port = 6380 @@ -117,8 +111,8 @@ if __name__ == '__main__': else: # New content # populate Global OR populate another set based on the feeder_name - if feeder_name in feed_queue_mapping: - p.populate_set_out(relay_message, feed_queue_mapping[feeder_name]) + if feeder_name in FEED_QUEUE_MAPPING: + p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name]) else: p.populate_set_out(relay_message, 'Mixer') @@ -139,8 +133,8 @@ if __name__ == '__main__': server.expire('HASH_'+paste_name, ttl_key) # populate Global OR populate another set based on the feeder_name - if feeder_name in feed_queue_mapping: - p.populate_set_out(relay_message, feed_queue_mapping[feeder_name]) + if feeder_name in FEED_QUEUE_MAPPING: + p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name]) else: p.populate_set_out(relay_message, 'Mixer') @@ -153,8 +147,8 @@ if __name__ == '__main__': server.expire(paste_name, ttl_key) # populate Global OR populate another set based on the feeder_name - if feeder_name in feed_queue_mapping: - p.populate_set_out(relay_message, feed_queue_mapping[feeder_name]) + if feeder_name in FEED_QUEUE_MAPPING: + p.populate_set_out(relay_message, FEED_QUEUE_MAPPING[feeder_name]) else: p.populate_set_out(relay_message, 'Mixer') diff --git a/bin/Onion.py b/bin/Onion.py index 1680a244..af41777d 100755 --- a/bin/Onion.py +++ b/bin/Onion.py @@ -145,6 +145,7 @@ if __name__ == "__main__": PST.p_name) for url in fetch(p, r_cache, urls, domains_list, path): publisher.warning('{}Checked {};{}'.format(to_print, url, PST.p_path)) + p.populate_set_out('onion;{}'.format(PST.p_path), 
'BrowseWarningPaste') else: publisher.info('{}Onion related;{}'.format(to_print, PST.p_path)) diff --git a/bin/Phone.py b/bin/Phone.py index e18e12c0..cb32a691 100755 --- a/bin/Phone.py +++ b/bin/Phone.py @@ -1,7 +1,14 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* + """ - module for finding phone numbers +The Phone Module +================ + +This module is consuming the Redis-list created by the Categ module. + +It applies phone number regexes on the paste content and warns if a threshold is exceeded. + """ import time @@ -17,6 +24,7 @@ def search_phone(message): content = paste.get_p_content() # regex to find phone numbers, may raise many false positives (shalt thou seek optimization, upgrading is required) reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\d{2,3}){3,4})') + reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\(?\d{2,4}\)?){3,4})') # list of the regex results in the Paste, may be null results = reg_phone.findall(content) diff --git a/bin/RegexForTermsFrequency.py b/bin/RegexForTermsFrequency.py index 023710c4..2efdfee5 100755 --- a/bin/RegexForTermsFrequency.py +++ b/bin/RegexForTermsFrequency.py @@ -2,6 +2,8 @@ # -*-coding:UTF-8 -* """ This Module is used for term frequency. +It processes every paste coming from the global module and tests the regexes +supplied in the term webpage. """ import redis diff --git a/bin/Release.py b/bin/Release.py index ce30ea3f..98e60a96 100755 --- a/bin/Release.py +++ b/bin/Release.py @@ -6,6 +6,11 @@ from pubsublogger import publisher from Helper import Process import re +''' +This module takes its input from the global module. 
+It applies some regexes and publishes the matched content. +''' + if __name__ == "__main__": publisher.port = 6380 publisher.channel = "Script" diff --git a/bin/SQLInjectionDetection.py b/bin/SQLInjectionDetection.py index 1901d4b6..d2948f1b 100755 --- a/bin/SQLInjectionDetection.py +++ b/bin/SQLInjectionDetection.py @@ -1,7 +1,14 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* + """ - Sql Injection module +The SQLInjectionDetection Module +================================ + +This module is consuming the Redis-list created by the Web module. + +It tests for different patterns of SQL injection. + """ import time diff --git a/bin/SentimentAnalysis.py b/bin/SentimentAnalysis.py index 8cd71305..00b15abb 100755 --- a/bin/SentimentAnalysis.py +++ b/bin/SentimentAnalysis.py @@ -4,8 +4,8 @@ Sentiment analyser module. It takes its inputs from 'global'. - The content analysed comes from the pastes with length of the line - above a defined threshold removed (get_p_content_with_removed_lines). + The content is analysed after lines whose length is + above a defined threshold have been removed (get_p_content_with_removed_lines). This is done because NLTK sentences tokemnizer (sent_tokenize) seems to crash for long lines (function _slices_from_text line#1276). diff --git a/bin/SetForTermsFrequency.py b/bin/SetForTermsFrequency.py index b3100073..c4e480ff 100755 --- a/bin/SetForTermsFrequency.py +++ b/bin/SetForTermsFrequency.py @@ -2,6 +2,8 @@ # -*-coding:UTF-8 -* """ This Module is used for term frequency. +It processes every paste coming from the global module and tests the sets +supplied in the term webpage. """ import redis diff --git a/bin/Tokenize.py b/bin/Tokenize.py index 5e5c9b17..377cba5a 100755 --- a/bin/Tokenize.py +++ b/bin/Tokenize.py @@ -1,8 +1,8 @@ #!/usr/bin/env python # -*-coding:UTF-8 -* """ -The ZMQ_PubSub_Lines Module -============================ +The Tokenize Module +=================== This module is consuming the Redis-list created by the ZMQ_PubSub_Tokenize_Q Module. 
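The broadened phone-number regex introduced in `bin/Phone.py` can be exercised on its own; a minimal sketch (the sample numbers below are illustrative, not taken from AIL's test data):

```python
import re

# Pattern from bin/Phone.py: an international prefix (+CC) or a leading 0,
# followed by 6-8 digits, or by 3-4 separator-delimited groups of 2-4 digits.
reg_phone = re.compile(r'(\+\d{1,4}(\(\d\))?\d?|0\d?)(\d{6,8}|([-/\. ]{1}\(?\d{2,4}\)?){3,4})')

# The first two candidates match; the last contains no digits and does not.
for candidate in ["0612345678", "+1 (555) 123-4567", "no digits here"]:
    print(candidate, "->", bool(reg_phone.search(candidate)))
```

The added `\(?\d{2,4}\)?` group is what lets parenthesised area codes such as `(555)` through, which the previous `\d{2,3}` groups rejected.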
diff --git a/bin/Web.py b/bin/Web.py index 0fae546d..dc2bf2fd 100755 --- a/bin/Web.py +++ b/bin/Web.py @@ -1,5 +1,14 @@ #!/usr/bin/env python # -*-coding:UTF-8 -* + +""" +The Web Module +============================ + +This module tries to parse URLs and warns if some defined country codes are present. + +""" + import redis import pprint import time diff --git a/bin/WebStats.py b/bin/WebStats.py index 4cc05b48..cbb52e7a 100755 --- a/bin/WebStats.py +++ b/bin/WebStats.py @@ -1,7 +1,13 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* + """ - Template for new modules +The WebStats Module +====================== + +This module computes statistics on URLs collected by the Web module. +It considers the TLD, domain and protocol. + """ import time diff --git a/bin/packages/modules.cfg b/bin/packages/modules.cfg index c7e0063f..33eebd21 100644 --- a/bin/packages/modules.cfg +++ b/bin/packages/modules.cfg @@ -57,8 +57,8 @@ publish = Redis_Duplicate,Redis_ModuleStats,Redis_BrowseWarningPaste [Onion] subscribe = Redis_Onion -publish = Redis_ValidOnion,ZMQ_FetchedOnion -#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion +publish = Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste +#publish = Redis_Global,Redis_ValidOnion,ZMQ_FetchedOnion,Redis_BrowseWarningPaste [DumpValidOnion] subscribe = Redis_ValidOnion diff --git a/bin/preProcessFeed.py b/bin/preProcessFeed.py index fe542647..d9ef419d 100755 --- a/bin/preProcessFeed.py +++ b/bin/preProcessFeed.py @@ -1,6 +1,15 @@ #!/usr/bin/env python2 # -*-coding:UTF-8 -* +''' +The preProcess Module +===================== + +This module is just an example of how we can pre-process a feed coming from the Mixer +module before sending it to the Global module. 
+ +''' + import time from pubsublogger import publisher diff --git a/doc/screenshots/sentiment.png b/doc/screenshots/sentiment.png index 1aa0fc05..d7f1dbec 100644 Binary files a/doc/screenshots/sentiment.png and b/doc/screenshots/sentiment.png differ diff --git a/var/www/modules/browsepastes/templates/browse_important_paste.html b/var/www/modules/browsepastes/templates/browse_important_paste.html index ee3503fd..5d7d84e3 100644 --- a/var/www/modules/browsepastes/templates/browse_important_paste.html +++ b/var/www/modules/browsepastes/templates/browse_important_paste.html @@ -75,6 +75,7 @@
  • Keys
  • Mails
  • Phones
  • +
  • Onions

  • @@ -101,6 +102,9 @@
    +
    + +
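The set-of-terms tracking rule documented in HOWTO.md (e.g. `\[term1, term2, [100]]\`) boils down to counting how many tracked terms occur in a paste and comparing that count against a threshold. A minimal sketch of that matching logic, assuming a simple count-based threshold — `term_set_matches` is a hypothetical helper, not AIL's actual implementation (which lives in `bin/SetForTermsFrequency.py` and may differ):

```python
import re

def term_set_matches(content, terms, threshold):
    """Count how many tracked terms appear in the content
    (case-insensitive) and report a hit at or above the threshold."""
    hits = sum(1 for term in terms
               if re.search(re.escape(term), content, re.IGNORECASE))
    return hits >= threshold

# Roughly the rule \[term1, term2, [2]]\ : both terms must be present.
print(term_set_matches("dump contains term1 and also TERM2",
                       ["term1", "term2"], 2))   # True
print(term_set_matches("only term1 here",
                       ["term1", "term2"], 2))   # False
```

Escaping each term with `re.escape` keeps literal terms literal; full regex tracking (the `/.../` syntax) would skip that step and compile the user-supplied pattern directly.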