Video Manipulation With Perl
Helen Cook
May, 2003

This is an effort to create a system that manipulates video as easily as text can be manipulated today. The system allows a user to input events manually, or to plug in modules that automate the process of finding events in a video stream. A tag or set of tags is created for each event, and the tags can then be searched and selected. Video corresponding to the selected events is concatenated to form the final output.

Why is searching text easy while searching video is hard? Text contains atoms (words) that retain some meaning when isolated from the whole, albeit out of context. If a keyword is located in a sample text, the surrounding phrases are likely to be relevant to that keyword. Atoms of video (by which I mean both audio and picture), namely frames and sounds, do not exhibit this property as readily. If one is searching for video containing a person, say, the number of variables that govern how the person will appear is very large. People are non-rigid and may be viewed from multiple angles and in variable lighting conditions. The person's name may be spoken, but it is likely to be obscured by background noise or accents. When searching text, on the other hand, one would look for a name or title, which is simple to find. Finding the right combination of low-level features to accurately represent a specific event, or worse, an arbitrary future event, is hard.

So to search video effectively, it is clear that we must cheat. If the problem of searching video were converted into a problem of searching text, it would become trivial (provided the mapping from video to text was reliable). The simplest, though crudest, way to construct this mapping is to have a human view the video and input tags corresponding to what he considers to be the important events.

FindEvent::Manual implements several different ways of inputting tags. Events can either have start and end points, or they can be instantaneous.
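The following is a minimal sketch, written in Python rather than Perl, of how mapping video to text turns video search into text search: each event is a time span carrying tags, a keyword search over the tags returns spans, and overlapping spans are merged into an ordered list of cuts to concatenate. The names `Event`, `search`, `concatenate`, and `pad` are hypothetical and are not the FindEvent::Manual interface. Padding an instantaneous event by a fixed number of seconds is also an assumption; the system itself uses per-type envelopes for this.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


@dataclass
class Event:
    start: float                 # seconds from the start of the video
    end: Optional[float] = None  # None marks an instantaneous event
    tags: List[str] = field(default_factory=list)

    def span(self, pad: float) -> Span:
        """Video to extract: the event's own interval, or +/- pad for an instant."""
        if self.end is None:
            return (max(0.0, self.start - pad), self.start + pad)
        return (self.start, self.end)


def search(events: List[Event], keyword: str, pad: float = 2.0) -> List[Span]:
    """Plain text search: spans of events having a tag that contains keyword."""
    kw = keyword.lower()
    return [e.span(pad) for e in events if any(kw in tag.lower() for tag in e.tags)]


def concatenate(spans: List[Span]) -> List[Span]:
    """Sort spans and merge overlaps, giving the ordered list of cuts to join."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cut: extend it instead of repeating footage.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    log = [
        Event(10.0, 25.0, ["interview", "Alice"]),
        Event(20.0, 40.0, ["Alice", "lab tour"]),
        Event(90.0, None, ["goal"]),
    ]
    print(concatenate(search(log, "alice")))  # [(10.0, 40.0)]
    print(search(log, "goal"))                # [(88.0, 92.0)]
```

Merging before concatenation keeps footage that belongs to two overlapping events from appearing twice in the output.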
Events can be undone, and strings (tags) can be attached to any type of event. Every event has an associated time at which it occurred, a type, a value indicating how interesting the event is, and a probability (that the event actually occurred at its associated time). Associated with each type of event is an envelope (most important for instantaneous events), which indicates how the interestingness level falls off around the event.

Using this metadata, the video can be searched and compressed in many different ways. Only the events above a threshold of interestingness can be selected to create a summary of the video, or events with tags matching a regex can be selected to extract video on a certain topic.
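A minimal sketch of this metadata and of the two selection modes follows, written in Python rather than Perl; it is not FindEvent::Manual's actual interface. The names `Event`, `triangle`, `interest`, `summary`, and `by_tag` are hypothetical. The text does not say how envelopes are shaped or how probability enters the score, so the triangular envelope, the rule that each event contributes value * probability * envelope(t - time), taking the maximum over events, and sampling at a fixed `step` are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Envelope = Callable[[float], float]  # maps offset from the event (s) to a weight


@dataclass
class Event:
    time: float         # when the event occurred, in seconds
    kind: str           # the event's type; selects its envelope
    value: float        # how interesting the event is
    probability: float  # confidence that the event occurred at `time`
    tags: List[str] = field(default_factory=list)


def triangle(half_width: float) -> Envelope:
    """Hypothetical envelope: 1.0 at the event, falling linearly to 0 at +/- half_width."""
    def env(dt: float) -> float:
        return max(0.0, 1.0 - abs(dt) / half_width)
    return env


def interest(t: float, events: List[Event], envelopes: Dict[str, Envelope]) -> float:
    """Interestingness at time t, assuming each event contributes
    value * probability * envelope(t - time) and the strongest one wins."""
    return max((e.value * e.probability * envelopes[e.kind](t - e.time)
                for e in events), default=0.0)


def summary(events: List[Event], envelopes: Dict[str, Envelope], threshold: float,
            duration: float, step: float = 1.0) -> List[Tuple[float, float]]:
    """Spans [start, end) whose sampled interestingness exceeds threshold."""
    spans, start = [], None
    for i in range(int(duration / step) + 1):
        t = i * step
        above = interest(t, events, envelopes) > threshold
        if above and start is None:
            start = t                      # entering an interesting stretch
        elif not above and start is not None:
            spans.append((start, t))       # leaving it
            start = None
    if start is not None:                  # still interesting at the end
        spans.append((start, duration))
    return spans


def by_tag(events: List[Event], pattern: str) -> List[Event]:
    """Events with at least one tag matching the regex `pattern`."""
    rx = re.compile(pattern)
    return [e for e in events if any(rx.search(tag) for tag in e.tags)]
```

The two modes compose: `summary(by_tag(events, "crowd"), envelopes, threshold, duration)` keeps only the interesting stretches around events on one topic. Taking the maximum rather than the sum keeps a cluster of weak events from outscoring one strong event; summing would be an equally plausible choice.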