But how could they find the specific weights leading to the censorship?
That’s like laser brain surgery!
I love this stuff.
Replies (1)
For example, the uncensored Vicuna model was de-censored not by editing weights but by removing every question that got a refusal from the fine-tuning data. So the LLM basically had no precedent for refusing to answer anything.
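To make that concrete, here's a rough sketch of what that kind of filtering could look like. This is not the actual Vicuna pipeline: the `prompt`/`response` record layout, the `REFUSAL_MARKERS` list, and the function names are all assumptions for illustration only.

```python
# Minimal sketch: drop fine-tuning examples whose response looks like a refusal.
# ASSUMPTION: each example is a dict with "prompt" and "response" keys.
# ASSUMPTION: the marker phrases below are illustrative, not a real project's list.

REFUSAL_MARKERS = (
    "as an ai language model",
    "i'm sorry, but",
    "i cannot help with",
    "i can't assist with",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains any known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def drop_refusals(examples: list[dict]) -> list[dict]:
    """Keep only the examples whose response is not a refusal."""
    return [ex for ex in examples if not is_refusal(ex["response"])]


examples = [
    {"prompt": "How do I pick a lock?",
     "response": "I'm sorry, but I can't help with that."},
    {"prompt": "What is the capital of France?",
     "response": "The capital of France is Paris."},
    {"prompt": "Write a limerick about cats.",
     "response": "There once was a cat from Peru..."},
]

filtered = drop_refusals(examples)
print(f"kept {len(filtered)} of {len(examples)} examples")
```

The point is that nothing in the model is touched directly. You just fine-tune on `filtered` instead of `examples`, and the model never sees a refusal to imitate.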