The bottleneck seemed to be the D1 database. Replication was enabled, but how could SQLite possibly support that level of traffic? After a certain number of events had accumulated, it would slow down and eventually stop working entirely (unless the D1 instance was wiped and restarted).
Replies (2)
Yes, in terms of writes it should theoretically have sustained around 50k concurrent connections, or maybe 1,000 events/sec. That's all theory, of course. But with read replicas it should be able to handle much more overall request traffic. I wonder if it could be rebuilt to use Cloudflare's Hyperdrive with an external DB. I tried to benchmark it all, but obviously I have no insight into how much traffic was actually hammering the relay from the web and the apps. In all honesty, I think they might need a multi-relay setup, with different parts of the workload handled by different relays, to support the level of traffic a centralized system like TikTok or IG handles.
A properly configured SQLite database shouldn't have issues with this amount of traffic. D1 isn't just SQLite, though; it's SQLite plus an API layer in front of it.
I would try using SQLite-in-DO (SQLite embedded inside a Durable Object) instead of D1. There's a good blog post about it here: https://blog.cloudflare.com/sqlite-in-durable-objects/
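For anyone who hasn't used it: a minimal sketch of what that looks like. The class name, table schema, and RPC methods below are illustrative assumptions, not from the post — the point is that the SQL runs in-process with the DO, so there's no network hop per query.

```ts
import { DurableObject } from "cloudflare:workers";

interface Env {} // bindings omitted for brevity

// Sketch: one Durable Object owning an embedded SQLite database.
export class EventStore extends DurableObject<Env> {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    // Local, synchronous call into the embedded SQLite DB.
    ctx.storage.sql.exec(`
      CREATE TABLE IF NOT EXISTS events (
        id         TEXT PRIMARY KEY,
        created_at INTEGER NOT NULL,
        content    TEXT
      )
    `);
  }

  // RPC method callable from a Worker via the DO stub.
  insertEvent(id: string, createdAt: number, content: string): void {
    this.ctx.storage.sql.exec(
      "INSERT OR IGNORE INTO events (id, created_at, content) VALUES (?, ?, ?)",
      id, createdAt, content
    );
  }

  recentEvents(limit = 50) {
    return this.ctx.storage.sql
      .exec("SELECT * FROM events ORDER BY created_at DESC LIMIT ?", limit)
      .toArray();
  }
}
```

One gotcha: SQLite-backed DO classes have to be declared via `new_sqlite_classes` in the wrangler migration config, otherwise you get the old key-value storage backend.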
Secondly, I would look at splitting the database up into multiple databases, along the lines of the sketch below.
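Combined with the DO approach above, that could mean sharding writes across several SQLite-backed DOs so no single database absorbs every write. Rough sketch; the shard count, hash function, and `EVENT_STORE` binding name are all my assumptions:

```ts
import type { EventStore } from "./event-store"; // the DO sketched above

interface Env {
  EVENT_STORE: DurableObjectNamespace<EventStore>;
}

const SHARD_COUNT = 16; // assumption: tune to observed write volume

// Stable FNV-1a-style hash so an event id always maps to the same shard.
function shardFor(eventId: string): number {
  let h = 2166136261;
  for (let i = 0; i < eventId.length; i++) {
    h = Math.imul(h ^ eventId.charCodeAt(i), 16777619);
  }
  return (h >>> 0) % SHARD_COUNT;
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const event = (await req.json()) as {
      id: string;
      created_at: number;
      content: string;
    };
    // Route the write to the shard that owns this event id.
    const stub = env.EVENT_STORE.get(
      env.EVENT_STORE.idFromName(`shard-${shardFor(event.id)}`)
    );
    await stub.insertEvent(event.id, event.created_at, event.content);
    return new Response("ok");
  },
};
```

The trade-off is that cross-shard queries (e.g. a global timeline) now need fan-out across all shards, so if reads matter it's probably better to split along natural boundaries (per-user, per-kind, per-time-window) than a blind hash.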