Ingest Common Crawl index in ClickHouse
.gitignore Initial commit 2025-10-15 09:51:39 +00:00
ingest.py initial commit 2025-10-15 18:24:54 +02:00
LICENSE Initial commit 2025-10-15 09:51:39 +00:00
live_record.py initial commit 2025-10-15 18:24:54 +02:00
README.md Initial commit 2025-10-15 09:51:39 +00:00
requirements.txt initial commit 2025-10-15 18:24:54 +02:00

CommonCrawl-Ingestor

Ingests the Common Crawl index into ClickHouse.