We understand reinforcement learning in principle; what we don't understand is how to implement it properly.
In the real world, when you reward a pet for doing a trick on command, there's a time gap between the command you gave, the action the pet performed, and the reward you handed out. That gap introduces the potential for false associations (this is the credit assignment problem). It isn't so bad when the gap is merely a few seconds, but as the gap grows, the number of potential false associations grows exponentially. After a few minutes, the processing power needed to sort the metaphorical wheat from the chaff becomes impractical, at least without resorting to various tricks that get you to the answer quicker at the expense of some mental flexibility.

If the AI dog is specifically programmed to log commands and performed actions (or sequences of actions) and to compare them against the moments it receives rewards, it will come to the right association (for a given degree of "right", which is another problem), but it's now limited to that paradigm of thought and will have a harder time finding the right association in a situation that doesn't fit the paradigm. A toy sketch of that paradigm follows.
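To make that concrete, here's a minimal Python sketch of the logging paradigm described above. Everything in it is a made-up illustration, not anyone's actual implementation: the AssociationLogger class, the window and decay parameters (a crude eligibility trace), and the toy command/action vocabulary are all assumptions. The point is just to show how a delayed reward forces credit to be shared across every recent pair, correct or not.

```python
import random
from collections import defaultdict

class AssociationLogger:
    """Toy 'AI dog': logs (command, action) pairs and, when a reward
    arrives, spreads credit across everything still in a recent-history
    window, with less credit the further back a pair sits. The window
    size and decay rate are arbitrary choices for illustration."""

    def __init__(self, window=10, decay=0.7):
        self.window = window                 # how many past steps stay eligible
        self.decay = decay                   # credit shrinks as the gap grows
        self.history = []                    # recent (command, action) pairs
        self.credit = defaultdict(float)     # accumulated association strength

    def observe(self, command, action):
        self.history.append((command, action))
        if len(self.history) > self.window:
            self.history.pop(0)              # older pairs fall out of reach

    def reward(self, amount=1.0):
        # Every pair in the window is a candidate association; the longer
        # the window, the more false candidates share in the credit.
        for gap, pair in enumerate(reversed(self.history)):
            self.credit[pair] += amount * (self.decay ** gap)

dog = AssociationLogger()
pending = []                                 # rewards in flight (steps left)
for step in range(2000):
    pending = [t - 1 for t in pending]
    command = random.choice(["sit", "roll", "stay"])
    action = random.choice(["sits", "rolls", "barks"])
    dog.observe(command, action)
    if command == "sit" and action == "sits":
        pending.append(3)                    # the treat arrives 3 steps late
    if 0 in pending:
        dog.reward()
        pending = [t for t in pending if t > 0]

# The delayed reward still lets the right pair win on average, but
# unrelated pairs that happened to be nearby in time soak up credit too.
print(max(dog.credit.items(), key=lambda kv: kv[1]))
```

Run it a few times: the right pair usually comes out on top, but noise pairs that happened to sit near a reward in time pick up a surprising amount of credit, and stretching the delay or the window makes that worse. That's the exponential blow-up in candidate associations, and the hard-coded window is exactly the kind of flexibility-limiting trick the paragraph above describes.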
I think we know all the high-level and foundational stuff we need, but there's this middle layer of metacognition that just seems to grow the deeper you get into it, like some vast abyss between an ocean's floor and its surface.
To an AI researcher, the human brain is like a modern passenger jet to the Wright brothers: they'd grasp the basic concepts of combustion engines and aerodynamics, but the jet applies that knowledge on a whole different level.