Replies (1)

noname 4 months ago
Web archiving is child's play. What we want to do is decentralize web crawl data. Nobody uses YaCy, which suggests it failed. What is the total size of the latest Common Crawl? Estimate: 250-350 TB compressed (gzipped WARC). Solve this.
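
For a rough sense of scale, here is a back-of-the-envelope sketch of how many peers it would take to hold a dataset of that size. Only the 250-350 TB range comes from the estimate above; the 1 TB donated per peer, the replication factor of 3, and the `peers_needed` helper are assumptions made up for illustration, not measurements of any real network.

```python
import math

def peers_needed(dataset_tb, per_peer_tb, replication):
    """Minimum number of peers needed to store `replication` full copies
    of a dataset, if every peer donates `per_peer_tb` of disk."""
    return math.ceil(dataset_tb * replication / per_peer_tb)

# 250 and 350 TB are the bounds of the estimate quoted above.
# per_peer_tb=1 and replication=3 are assumed values, not measured ones.
for size_tb in (250, 350):
    count = peers_needed(size_tb, per_peer_tb=1, replication=3)
    print(f"{size_tb} TB -> {count} peers")
```

Under those assumptions a single crawl snapshot needs somewhere between several hundred and about a thousand always-on peers, before accounting for churn or bandwidth.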