I found this experiment really cool and, to be honest, it seems to have fooled everyone who has looked at (or listened to) it.
In a nutshell, a computer AI is matching each action in the video with a sound it thinks will fit.
For example, say I hit a bush or the ground with a stick, but there is NO audio in the video. The artificial intelligence picks the sound it thinks that hit would make.
In the video below, you will see examples of this. It is very accurate, and almost good enough to replace a Foley artist! (That’s a sound FX person.)
For the full page, sound and video samples, and the academic paper on the subject, visit the MIT website here: http://vis.csail.mit.edu/