Interactive Digital Multimedia

IGERT Summer Projects

Out of the Ether: A System for Interactive Control of Virtual Performers Using Video and Audio Systems

Students

John Thompson, Music
Mary Li, Elec & Comp Engineering
Lance Putnam, Media Arts & Tech
Jim Kleban, Elec & Comp Engineering

Faculty Advisors

Xinding Sun, Elec & Comp Engineering
JoAnn Kuchera-Morin, Media Arts & Tech
B.S. Manjunath, Elec & Comp Engineering

Abstract

The goal of this project is to design an intelligent system that extends the traditional musical instrument and the conventional performance style. Specifically, a human flutist will interact with an Intelligent Virtual Being (IVB) much as they would with another human player. This requires non-intrusive sensors for perceiving the actions of the human player and a system for detecting gestures or patterns in the raw sensor data and mapping them onto a predetermined musical composition. Specific gestures will be recognized to cue events and sequences and to provide continuous control over sound transformation processes. Another major challenge is to create a system that is as easily transportable, simple to set up, and robust as a traditional musical instrument is for a performer.
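
To make this mapping concrete, the Python sketch below shows one way the gesture-to-score layer could be organized. The names (Cue, GestureEvent, ScoreMapper), the one-shot versus continuous-control split, and the choice of Python are illustrative assumptions rather than details of the actual system.

# Hypothetical sketch of the gesture-to-score mapping layer; names and
# structure are illustrative, not the project's implementation.

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Cue:
    """A discrete event in the predetermined composition."""
    name: str
    action: Callable[[], None]                # e.g. start a synthesis process


@dataclass
class GestureEvent:
    """A recognized gesture, with an optional continuous control value."""
    label: str                                # e.g. "entrance", "exit", "sweep"
    value: Optional[float] = None             # 0..1 for continuous control


class ScoreMapper:
    """Maps recognized gestures to one-shot cues and continuous controls."""

    def __init__(self, cues: Dict[str, Cue],
                 controls: Dict[str, Callable[[float], None]]) -> None:
        self.cues = cues
        self.controls = controls
        self.fired: List[str] = []

    def handle(self, event: GestureEvent) -> None:
        if event.label in self.cues and event.label not in self.fired:
            self.cues[event.label].action()          # one-shot event or sequence cue
            self.fired.append(event.label)
        elif event.label in self.controls and event.value is not None:
            self.controls[event.label](event.value)  # continuous sound transformation


if __name__ == "__main__":
    mapper = ScoreMapper(
        cues={"entrance": Cue("entrance", lambda: print("cue: start section A"))},
        controls={"sweep": lambda v: print(f"filter cutoff -> {v:.2f}")},
    )
    mapper.handle(GestureEvent("entrance"))           # fires the entrance cue once
    mapper.handle(GestureEvent("sweep", 0.42))        # continuous control update

In a performance setting, the handle() calls would be driven by the recognition front end rather than by hard-coded test events.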

The technical focus of the IVB will be on developing pattern recognition (PR) algorithms that convert raw audio and video samples into performance cues. On the computer vision side, selecting the proper PR techniques will be of prime importance, since many are domain specific; this involves implementing the selected techniques and evaluating their performance. Given the constraints of the computer hardware and the goal of real-time response, the 2D image sequences, i.e., the raw pixel-level information, must be compacted into useful and representative features, and these features should be invariant to environmental changes (lighting, people, and hardware). To begin with, the IVB will undergo a supervised learning process to visually recognize pre-defined gestures, including entrance and exit cues.

The audio side of the IVB will involve converting the flutist's sound to spectral data, storing the spectral data in a buffer, analyzing the waveform and spectral data to extract higher-level features, and transforming the spectral data based on the combined audio features and a predetermined score. The prominent features to be extracted are pitch, amplitude dynamics, and voice/noise presence; these will be used to trigger events from the predetermined score. Ideally, during a real musical performance, the IVB will accurately classify new information against its prior knowledge base and respond in real time, for example by initiating a synthesis process.
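
As a rough sketch of the vision path described above, the example below compacts frame differences into a small grid of motion-energy features and learns pre-defined gestures with a nearest-centroid rule from labeled examples. The pooled-difference feature, the classifier choice, and the use of NumPy are assumptions made for illustration; they stand in for whatever domain-specific PR techniques are ultimately selected.

# Hypothetical sketch of the vision path: compact frame differences into a
# small grid of motion-energy features and recognize pre-defined gestures
# with a nearest-centroid rule learned from labeled examples.

import numpy as np

GRID = 4  # pool motion energy into a GRID x GRID feature vector


def motion_features(prev_frame: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Average absolute frame differences over a GRID x GRID block layout."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    h, w = diff.shape
    diff = diff[: h - h % GRID, : w - w % GRID]        # crop to a multiple of GRID
    blocks = diff.reshape(GRID, diff.shape[0] // GRID,
                          GRID, diff.shape[1] // GRID)
    return blocks.mean(axis=(1, 3)).ravel() / 255.0    # scale 8-bit differences to [0, 1]


class GestureClassifier:
    """Supervised nearest-centroid classifier over pooled motion features."""

    def __init__(self) -> None:
        self.centroids = {}                             # gesture label -> mean feature vector

    def fit(self, examples) -> None:                    # examples: iterable of (label, feature)
        grouped = {}
        for label, feat in examples:
            grouped.setdefault(label, []).append(feat)
        self.centroids = {k: np.mean(v, axis=0) for k, v in grouped.items()}

    def predict(self, feat: np.ndarray) -> str:
        return min(self.centroids,
                   key=lambda k: np.linalg.norm(feat - self.centroids[k]))

The audio path can be sketched in the same spirit: buffer a block of samples, take its spectrum, and reduce it to the features named above (pitch, amplitude, voice/noise presence), then test whether they trigger a score event. The thresholds, the spectral-flatness test for voice/noise presence, and the specific cue condition are again illustrative assumptions, not the project's actual analysis chain.

# Hypothetical sketch of the audio path: spectral analysis of one buffered
# block, reduced to pitch, amplitude, and voice/noise features that can
# trigger a predetermined score event.

import numpy as np

SAMPLE_RATE = 44100
BLOCK = 2048


def extract_features(block: np.ndarray) -> dict:
    """Reduce one block of audio to pitch, amplitude, and tonality features."""
    windowed = block * np.hanning(len(block))
    spectrum = np.abs(np.fft.rfft(windowed)) + 1e-12
    freqs = np.fft.rfftfreq(len(block), 1.0 / SAMPLE_RATE)

    amplitude = float(np.sqrt(np.mean(block ** 2)))            # RMS level
    pitch = float(freqs[np.argmax(spectrum)])                   # strongest partial (coarse pitch)
    flatness = float(np.exp(np.mean(np.log(spectrum))) / np.mean(spectrum))
    return {"pitch": pitch, "amplitude": amplitude,
            "voiced": flatness < 0.2}                            # tonal (voiced) vs. noisy


def check_score_cue(features: dict) -> bool:
    """Example trigger: a loud, voiced note near A5 cues the next score event."""
    return (features["voiced"]
            and features["amplitude"] > 0.05
            and abs(features["pitch"] - 880.0) < 20.0)


if __name__ == "__main__":
    t = np.arange(BLOCK) / SAMPLE_RATE
    test_block = 0.2 * np.sin(2 * np.pi * 880.0 * t)             # synthetic flute-like tone
    feats = extract_features(test_block)
    print(feats, "-> cue" if check_score_cue(feats) else "-> no cue")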

By studying the interaction of human musicians, we hope to give a virtual performer some of the intelligence demonstrated in these interactions. Eventually, we hope to teach the virtual performer to derive other structural features of the musical interaction from analysis of the sensor space. At the core of our concern is building a system robust enough to provide deterministic results in the context of serious music performance. On a more general level, musical interaction involves multi-sensory input, analysis, and mapping of a complex array of human communication signals, as well as the multi-modal output of the resulting media content. Studying musical human-computer interaction therefore provides a rich source for understanding subtle and expressive human communication and for cross-disciplinary exploration of human-computer interaction techniques.