Running 6 Dolphin: Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention ๐ Separate and isolate individual speakers from videos