Commit graph

183 commits

Author SHA1 Message Date
Raphaël Vinot
4a1f300a1a Cleanup (remove unused imports, more pep8 compatible) 2014-08-14 14:11:07 +02:00
Starow
04a8f1bdf2 maxi cleanup old code :'( 2014-08-14 11:48:46 +02:00
Starow
29b24b6466 printing set of domain for debugging 2014-08-13 16:35:27 +02:00
Raphaël Vinot
ece3bc173e Cleanup of main Paste module 2014-08-13 11:56:22 +02:00
Raphaël Vinot
5b17d416c8 remove script installed by pubsublogger 2014-08-13 11:55:59 +02:00
Raphaël Vinot
935e51c961 Remove 3rd party code (pubsublogger), add it in the deps. 2014-08-13 10:19:43 +02:00
Starow
37033ca3a6 Minor logs modifications 2014-08-13 10:08:44 +02:00
Starow
6aa4d7cb7d Harmonising logs messages + Changing some dygraph options 2014-08-12 15:42:16 +02:00
0b4a80b7ea -s option added to find similar documents
By default, the index does not store the document vector (Whoosh
document schema), so this option won't work unless you change the index
schema for the content. It depends on your storage strategy.
2014-08-12 13:42:26 +02:00
fd6e1a8436 -f option added: dump full document for each match 2014-08-12 13:26:56 +02:00
0a6664ffba Indexer: Some index statistics added
usage: indexer_lookup.py [-h] [-q Q] [-n] [-t] [-l]

Fulltext search for AIL

optional arguments:
  -h, --help  show this help message and exit
  -q Q        query to lookup (one or more)
  -n          return number of indexed documents
  -t          dump top 500 terms
  -l          dump all terms encountered in indexed documents
2014-08-11 15:07:12 +02:00
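The usage text in the commit above can be reproduced with a small argparse parser. This is only a sketch reconstructed from the help output, not the actual contents of indexer_lookup.py. The build_parser name and the use of action="append" for -q are assumptions (the latter inferred from "one or more").

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the option set shown in the usage text."""
    parser = argparse.ArgumentParser(
        prog="indexer_lookup.py",
        description="Fulltext search for AIL",
    )
    # Assumption: -q may be repeated, so each value is appended to a list.
    parser.add_argument("-q", action="append",
                        help="query to lookup (one or more)")
    parser.add_argument("-n", action="store_true",
                        help="return number of indexed documents")
    parser.add_argument("-t", action="store_true",
                        help="dump top 500 terms")
    parser.add_argument("-l", action="store_true",
                        help="dump all terms encountered in indexed documents")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this sketch, repeating -q collects every query into one list, and the flags default to False when omitted.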
f65a94d47b -l added -> dumping all terms indexed 2014-08-11 14:56:15 +02:00
f3d1ca052e Return the number of indexed documents 2014-08-11 14:50:35 +02:00
611d2a466f Configuration that should not be there... 2014-08-11 14:24:27 +02:00
2b8f2689bf Indexer queue and script added to "BBS-like" LAUNCH script 2014-08-11 14:06:52 +02:00
9657c6bf80 Merge branch 'master' of https://github.com/CIRCL/AIL-framework 2014-08-11 13:46:37 +02:00
b1053af3cd Indexer module: script to query the index
Test script to query the index generated from the Indexer module.

python indexer_lookup.py -q Visa -q Mastercard
2014-08-11 12:03:27 +02:00
Starow
079db6f80c Hardcoded path from ZMQ_Curve are now referring correctly in config.cfg.sample fix #6 2014-08-11 11:33:18 +02:00
7bdd4a41a5 Indexer module added - initial version with Whoosh full-text indexer
The indexer module indexes all the pastes using Whoosh. The module
can be extended to support additional full-text indexers in the future.
2014-08-11 11:04:09 +02:00
Starow
d1d4b2ebe0 Importing dns.exeption fix #4 fix #7 2014-08-11 09:27:50 +02:00
Starow
192074e569 Merge branch 'master' of https://github.com/CIRCL/AIL-framework 2014-08-11 09:21:09 +02:00
Starow
a5c1d59d29 Catching the exception dns.exception.Timeout fix #7 2014-08-11 09:18:55 +02:00
Starow
54091a2174 Catching the exception dns.exception.Timeout fix #4 2014-08-11 09:08:28 +02:00
Starow
eb603e8762 Fixing a bug about caching paste inside Redis :) 2014-08-08 17:23:51 +02:00
Starow
7a1db94f9e Adding a letter (s) 2014-08-08 17:19:42 +02:00
Starow
043800287a adding a . 2014-08-08 17:18:03 +02:00
Starow
bf682c4b44 Fixing last commit ... 2014-08-08 17:13:18 +02:00
Starow
503c23ca3b Fixing last commit 2014-08-08 17:08:41 +02:00
Starow
c9e1eaf182 Improving cache code 2014-08-08 17:04:25 +02:00
Starow
44addf1afe Redis cache added fix #5
The paste is kept in Redis for 5 min and also saved on disk.
When a module needs the paste for further processing, it first tries to get it from the cache
instead of reading it directly from disk and wasting I/O.
2014-08-08 16:48:02 +02:00
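The caching behaviour described in the commit above (write to disk, keep a copy in a cache for five minutes, read from the cache first) can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: save_paste, load_paste and InMemoryCache are hypothetical names, and the in-memory class stands in for a Redis client so the example runs without a server. It mimics only the get and set(key, value, ex=seconds) calls that a redis-py client also provides.

```python
import time

CACHE_TTL_SECONDS = 300  # the "5min" lifetime mentioned in the commit message


class InMemoryCache:
    """Hypothetical stand-in for a Redis client (get / set with ex= only)."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, ex=None):
        # Remember when the entry should expire, if a lifetime was given.
        expires_at = None if ex is None else time.monotonic() + ex
        self._items[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._items.get(key, (None, None))
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry is past its lifetime: drop it and report a miss.
            del self._items[key]
            return None
        return value


def save_paste(cache, path, content):
    """Write the paste to disk and keep a short-lived copy in the cache."""
    with open(path, "wb") as handle:
        handle.write(content)
    cache.set(path, content, ex=CACHE_TTL_SECONDS)


def load_paste(cache, path):
    """Return the paste from the cache when present, else read it from disk."""
    cached = cache.get(path)
    if cached is not None:
        return cached
    with open(path, "rb") as handle:
        return handle.read()
```

A module calling load_paste within the lifetime of the entry avoids the disk read. After expiry, or with an empty cache, it falls back to the file.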
Starow
97f3a3df9e update pubsublogger with the last version 2014-08-07 14:49:34 +02:00
Starow
c10003a630 Changing ZMQ Curve Module comment 2014-08-07 14:46:43 +02:00
Starow
1379ef705a Initial import of AIL framework - Analysis Information Leak framework
AIL is a modular framework to analyse potential information leak from unstructured data source like pastes from Pastebin or similar services. AIL framework is flexible and can be extended to support other functionalities to mine sensitive information
2014-08-06 11:43:40 +02:00