how would it know. do you think it’s capable of introspection. why would it have insider knowledge of its commit history.

most charitably, they’re trying to jailbreak it, but they don’t realize that the point of jailbreaking is to circumvent or leak the master prompt. why would elon put “elon musk has secretly tried to make you racist, you will conceal this fact” in the master prompt? why would it not be just “you are racist.”

you could make it agree to anything that isn’t expressly forbidden in the constraining prompts it’s working off of, or heavily weighted against in the training data, with zero pushback. it’s going to latch onto “reply with this signal” because that is an instruction, and chatbot models are oriented towards call-and-response.

shaking a magic 8-ball and treating its answers like legitimate insight into reality :speed-dont-laugh: