DeusVult 1 year ago
I think it would always end in a stalemate, but you're on to something. I would like to see them debate a topic on which they both have programmed biases, such as an issue involving China or Israel.

Replies (1)

exmp 1 year ago
Most LLM benchmarks are designed with specific targets in mind, such as coding or language understanding. However, I believe the time is ripe for cross-model challenges as well. I am curious whether anyone has already explored or implemented this approach.