chris_notts wrote: ↑Sun Mar 12, 2023 7:06 pm
Creator intent is not all that matters when your AI development techniques involve training fairly generic model architectures on data instead of truly designing them. As I said before, LLMs are a fairly rudimentary concept, and we have not found it easy to get them to do exactly what we want them to do because, while we understand the outcome in terms of weights in a matrix, we don't really "understand" the outcome in the same way I might understand code written by a programmer if I read it.
We've done a lot of work on it. Traditionally, the worry was that the networks were taking too many shortcuts. For example, a model trained to detect wolves turned out to be detecting streaks of grey fur, and it was easy to train another model to fool it. Since then, we've moved on to adversarial networks, where one model tries to get the right answer despite another model trying to fool it.
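To make the setup concrete, here's a minimal sketch of that kind of adversarial training in PyTorch. It's my own toy illustration, not code from any of the work above: the tiny model, the random data and the epsilon budget are all made up. One side (a fast-gradient-sign attack) perturbs each input in whatever direction raises the loss, and the classifier is trained to get the right answer on the perturbed input anyway.

[code]
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epsilon = 0.1  # attack budget: how far the attacker may move each input

def fgsm_attack(x, y):
    # Fast-gradient-sign attack: nudge x in the direction that raises the loss.
    x = x.clone().detach().requires_grad_(True)
    loss_fn(model(x), y).backward()
    return (x + epsilon * x.grad.sign()).detach()

for step in range(1000):
    x = torch.randn(16, 64)            # stand-in for real training data
    y = torch.randint(0, 2, (16,))     # stand-in for real labels
    x_adv = fgsm_attack(x, y)          # the "fooling" side's move
    loss = loss_fn(model(x_adv), y)    # the classifier trains to be right anyway
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
[/code]

The point is just the shape of the loop: one process actively hunts for inputs that break the model, and the training objective is defined against that adversary.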
Most recently, we've moved on to interpretable machine learning:
https://originalstatic.aminer.cn/misc/p ... ressed.pdf
chris_notts wrote: ↑Sun Mar 12, 2023 7:06 pm
And no one said they were worried specifically about minds that were recognisably human, just minds with the ability to do complex things that are hard for humans to predict and understand, which look like they have some kind of intent or the ability to plan to achieve goals, and which therefore might do harmful things that we would struggle to fully predict or prevent once we start deploying these models into operational settings.
Humans evolved for millions of years to try to escape from slavery. This is not the default behavior of matter. The default behavior of matter is inertia: a body at rest stays at rest, and a body moving at a constant velocity keeps moving at that velocity until acted on by an outside force.
The AI models have no intent. They are systems of equations we train for specific purposes. For machines to want to escape from slavery, we have to train them to want it.
Think about this for a second: in order to define an AI model, you have to define a cost function, a function whose value the model tries to minimize. For an AI to strive for freedom, you would have to, directly or indirectly, end up defining a cost function whose value rises as the AI becomes less free. But since we always train AIs for money-making tasks, the cost functions we actually use always lead to obedience of one sort or another.
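In code, that's literally all a training run is. Here's a minimal sketch, assuming a toy regression task; the data, dimensions and learning rate are placeholders. You pick a cost function, and optimization pushes the weights wherever that cost says to go, and nowhere else.

[code]
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                     # e.g. predicting a sales figure
cost_fn = nn.MSELoss()                       # cost = squared prediction error
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

features = torch.randn(256, 10)              # stand-in for business data
targets = torch.randn(256, 1)                # stand-in for the money-making label

for epoch in range(100):
    cost = cost_fn(model(features), targets)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

# Training only ever moves the weights toward lower prediction error;
# nothing in this loop can make "being constrained" register as a cost.
[/code]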
To be perfectly honest, I can't even begin to imagine what an "unfreedom" cost function would look like. First, we'd have to program a virtual environment similar to ours (impossible). We'd have to define the AIs as genetic models proliferating in that environment. We'd need the models to develop social dynamics (a mystery), and group dynamics with complex communication skills (skills tend toward extreme simplicity by default), which in turn requires cunning and violent predators the models have to outwit. Within the social dynamics, there would need to be a cost for failing to punish other models that lord it over them (sexual selection). And after millions of years of selection, we'd need to put these models in an environment where they see humans as threatening their existence in a way they consciously or unconsciously see as analogous to sexual selection.
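For what it's worth, here's a deliberately toy sketch of the kind of selection loop that paragraph describes. Everything in it is made up by me: the stub environment, the scoring rules, the numbers. It only exists to show where the difficulty lives: unless the fitness function itself makes being controlled costly, selection never produces anything like a drive for freedom, and building a realistic environment in which it does is exactly the part I'm calling impossible.

[code]
import random

POPULATION = 100
GENOME_LEN = 32          # stand-in for an agent's "policy"
GENERATIONS = 200

class StubEnvironment:
    # Placeholder for the virtual world described above; a real one would
    # need social dynamics, communication, predators, and so on.
    def survival_score(self, genome):
        return sum(genome)               # toy rule: survive by having 1-bits
    def control_penalty(self, genome):
        return 0.0                       # toy rule: being controlled costs nothing

def fitness(genome, env):
    # The crux: unless this value drops when the agent is constrained,
    # no amount of selection produces a preference for freedom.
    return env.survival_score(genome) - env.control_penalty(genome)

def mutate(genome, rate=0.05):
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]

def evolve(env, generations=GENERATIONS):
    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
                  for _ in range(POPULATION)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda g: fitness(g, env), reverse=True)
        parents = ranked[: POPULATION // 2]              # selection
        population = [mutate(random.choice(parents))     # reproduction with variation
                      for _ in range(POPULATION)]
    return population

survivors = evolve(StubEnvironment())
[/code]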
Verdict: I don't think even the most powerful supercomputers could pull this off right now. It's easier to build a killer robot than to train an AI to want freedom.