1. The Web Audio API requires a user interaction before it will play audio. The "press any key" prompt on the main menu is there to trigger that.
2. Once Web Audio is ready to use, I build all the sound effects procedurally. There are 13 SFX and 3.5 minutes of music that need to be generated! It takes about 5-6 seconds in total to churn through all the math used to build the audio buffers.
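To give a rough idea of what "building a sound effect procedurally" can look like, here is a minimal sketch. It is not the game's actual code: `renderBlip`, `toAudioBuffer`, and the pitch-sweep recipe are made up for illustration. The first function is plain math over a `Float32Array`, and the second copies the result into an `AudioBuffer` using the standard `createBuffer` and `copyToChannel` calls.

```javascript
// Hypothetical helper (not from the game): renders a short sine "blip"
// with a falling pitch and a fade-out into raw samples.
function renderBlip(sampleRate, durationSec, startHz, endHz) {
  const length = Math.floor(sampleRate * durationSec);
  const samples = new Float32Array(length);
  let phase = 0;
  for (let i = 0; i < length; i++) {
    const t = i / length;                        // progress through the sound, 0 up to (but not including) 1
    const hz = startHz + (endHz - startHz) * t;  // linear pitch sweep
    phase += (2 * Math.PI * hz) / sampleRate;    // accumulate phase so the sweep stays continuous
    const envelope = (1 - t) * (1 - t);          // quadratic fade-out
    samples[i] = Math.sin(phase) * envelope;
  }
  return samples;
}

// Hypothetical helper: copies rendered samples into a mono AudioBuffer.
// audioCtx is assumed to be an existing AudioContext.
function toAudioBuffer(audioCtx, samples) {
  const buffer = audioCtx.createBuffer(1, samples.length, audioCtx.sampleRate);
  buffer.copyToChannel(samples, 0);
  return buffer;
}
```

A loop like this runs once per sample, so a few minutes of music at a typical sample rate means millions of iterations, which is consistent with generation taking several seconds.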
I'm guessing it's because Web Audio won't let you create audio buffers until a user gesture like a click allows the audio context to be started ("resumed").
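For reference, the usual "wait for a gesture, then resume" pattern looks something like the sketch below. It assumes the browser's autoplay policy has left the context in the `"suspended"` state. `unlockAudioOnFirstGesture` and `onReady` are hypothetical names, not anything from the game.

```javascript
// Hypothetical helper: waits for the first key press or pointer press on
// `target`, resumes the AudioContext if it is suspended, then calls onReady.
function unlockAudioOnFirstGesture(target, audioCtx, onReady) {
  const events = ["keydown", "pointerdown"];

  function handler() {
    // Only the first gesture matters, so detach both listeners right away.
    for (const name of events) target.removeEventListener(name, handler);

    // resume() returns a Promise that resolves once the context is running.
    const resumed =
      audioCtx.state === "suspended" ? audioCtx.resume() : Promise.resolve();
    resumed.then(() => onReady(audioCtx));
  }

  for (const name of events) target.addEventListener(name, handler);
}

// Possible usage in a browser (startGeneratingAudio is a placeholder):
// unlockAudioOnFirstGesture(window, new AudioContext(), startGeneratingAudio);
```

With this shape, a "press any key" screen doubles as the gesture that unlocks audio, and the slow generation step can start from the `onReady` callback.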