Raphaël Vinot
8803c8447a
Publish the fetched onions on a ZMQ feed.
2014-09-30 16:55:16 +02:00
Raphaël Vinot
65b9a01644
Add config file for DomainClassifier, proper reporting
2014-09-17 17:22:56 +02:00
Raphaël Vinot
f017680365
fix onions, cc and domain classifier modules
2014-09-08 16:51:43 +02:00
Raphaël Vinot
e983c839ad
Categ now listens to the Global queue
2014-09-05 17:05:45 +02:00
Raphaël Vinot
46f27ada4e
More cleanup
2014-09-05 10:42:01 +02:00
Raphaël Vinot
fca00beed9
Add Domain Classifier module.
...
Cleanup in the config files.
2014-09-05 10:41:00 +02:00
Raphaël Vinot
b7c9e489c9
Fix the exceptions
2014-09-04 11:46:07 +02:00
Raphaël Vinot
9e8611a42d
stop killing the disk when creating the word curve
2014-09-02 18:20:28 +02:00
Raphaël Vinot
7542eaf739
Update starting script.
2014-09-02 15:21:36 +02:00
Raphaël Vinot
0c6b09f379
Fix the onion module, log the valid onions.
2014-09-01 16:18:06 +02:00
Raphaël Vinot
f4b89669fc
The onion module now fetches the URLs it finds.
2014-08-31 22:42:12 +02:00
Raphaël Vinot
abfe13436b
Big refactoring, make the queues more flexible
2014-08-29 19:37:56 +02:00
Raphaël Vinot
623e876f3b
Cleanup.
...
* Remove useless subscriber
* Fix typo in the config file
* Update Helper accordingly
2014-08-26 17:36:57 +02:00
f070ac2005
cymruwhois uses dotted decimal format
2014-08-25 10:05:36 +02:00
Raphaël Vinot
3886d1b834
Small fixes to make the refactoring production ready
...
* the port for the logging is 6380
* use os.environ properly
* fix typos
2014-08-22 17:35:40 +02:00
Raphaël Vinot
78125db4ea
Use env variables everywhere
2014-08-22 14:52:02 +02:00
Raphaël Vinot
277d138a5d
cleanup, add FIXME
2014-08-21 14:39:17 +02:00
Raphaël Vinot
63b29176c1
move Redis_Data_Merging to Paste
2014-08-21 12:22:07 +02:00
Raphaël Vinot
50cfac857e
Update config
...
Make all paths in the config file relative to the home directory.
2014-08-20 16:00:56 +02:00
Raphaël Vinot
a68f5b6a0e
fix subscriber names, update default config
2014-08-20 15:54:21 +02:00
Raphaël Vinot
2485ba5df2
Merge remote-tracking branch 'origin/master' into testing
...
Conflicts:
    bin/ZMQ_Sub_Urls.py
2014-08-20 15:24:10 +02:00
Raphaël Vinot
99c8cc7941
completely remove ZMQ_PubSub.py
2014-08-20 15:14:57 +02:00
1d64dc44c8
MIME type guessing - removed one duplicate call to libmagic
2014-08-20 10:22:33 +02:00
Raphaël Vinot
8d9ffbaa53
Do not create a ZMQ sub if it is not required.
2014-08-19 19:53:33 +02:00
Raphaël Vinot
45b0bf3983
Improve the cleanup. Still some to do.
2014-08-19 19:07:07 +02:00
Raphaël Vinot
f1753d67c6
Cleanup the queues.
2014-08-19 16:05:37 +02:00
e8fcea6cd6
Remove undeclared variable
2014-08-18 16:17:36 +02:00
7d8ee102a3
Assignment before use (if Enumerate fails)
2014-08-18 15:58:06 +02:00
4304c6858e
Configuration path fixed
2014-08-18 09:02:08 +02:00
Raphaël Vinot
078c8ea836
Big cleanup, pep8
2014-08-14 18:07:18 +02:00
Jules
ab6765315e
Merge pull request #13 from adulau/master
...
Log where URLs are hosted - cc_critical option added
2014-08-14 14:28:01 +02:00
762def3a23
Log where URLs are hosted - cc_critical option added
...
It logs where the hostname of the URL is hosted (ASN and geographic location).
A simple option cc_critical added to set the country code to log as critical.
2014-08-14 14:22:11 +02:00
Raphaël Vinot
4a1f300a1a
Cleanup (remove unused imports, more pep8 compatible)
2014-08-14 14:11:07 +02:00
Starow
04a8f1bdf2
maxi cleanup old code :'(
2014-08-14 11:48:46 +02:00
Starow
29b24b6466
printing set of domain for debugging
2014-08-13 16:35:27 +02:00
Raphaël Vinot
ece3bc173e
Cleanup of main Paste module
2014-08-13 11:56:22 +02:00
Raphaël Vinot
5b17d416c8
remove script installed by pubsublogger
2014-08-13 11:55:59 +02:00
Raphaël Vinot
935e51c961
Remove 3rd party code (pubsublogger), add it in the deps.
2014-08-13 10:19:43 +02:00
Starow
37033ca3a6
Minor logs modifications
2014-08-13 10:08:44 +02:00
Starow
6aa4d7cb7d
Harmonising logs messages + Changing some dygraph options
2014-08-12 15:42:16 +02:00
0b4a80b7ea
-s option added to find similar documents
...
By default, the index does not store the document vector (Whoosh
document schema), so this option won't work unless you change the index
schema for the content. Whether to do so depends on your storage strategy.
2014-08-12 13:42:26 +02:00
fd6e1a8436
-f option added: dump full document for each match
2014-08-12 13:26:56 +02:00
0a6664ffba
Indexer: Some index statistics added
...
usage: indexer_lookup.py [-h] [-q Q] [-n] [-t] [-l]
Fulltext search for AIL
optional arguments:
  -h, --help  show this help message and exit
  -q Q        query to lookup (one or more)
  -n          return number of indexed documents
  -t          dump top 500 terms
  -l          dump all terms encountered in indexed documents
2014-08-11 15:07:12 +02:00
f65a94d47b
-l added -> dumping all terms indexed
2014-08-11 14:56:15 +02:00
f3d1ca052e
Return the number of indexed documents
2014-08-11 14:50:35 +02:00
611d2a466f
Configuration that should not be there...
2014-08-11 14:24:27 +02:00
2b8f2689bf
Indexer queue and script added to "BBS-like" LAUNCH script
2014-08-11 14:06:52 +02:00
9657c6bf80
Merge branch 'master' of https://github.com/CIRCL/AIL-framework
2014-08-11 13:46:37 +02:00
b1053af3cd
Indexer module: script to query the index
...
Test script to query the index generated from the Indexer module.
python indexer_lookup.py -q Visa -q Mastercard
2014-08-11 12:03:27 +02:00
Starow
079db6f80c
Hardcoded path from ZMQ_Curve are now referring correctly in config.cfg.sample fix #6
2014-08-11 11:33:18 +02:00