GOALS - EXPECTED RESULTS

Over the past decades, the availability of high-quality digital video has increased significantly as a result of the expansion of broadband services and high-capacity storage media. Due to the extensive use of video in many different applications such as distance learning, digital libraries, internet TV and video-on-demand services, as well as the thousands of movies and other broadcasts continuously produced, a huge amount of video data is added daily to the repositories of various organizations and companies. This implies a strong need for techniques and applications that provide efficient indexing, browsing and retrieval of video data.

The basic idea of the project is to carry out research on video analysis and to develop an application that allows companies and organizations producing audiovisual data, broadcasters, professional filmmakers and anyone interested in video editing to organize audiovisual data easily and effectively and to automatically create summaries of unedited or edited video. The modeling, representation, summarization, indexing, retrieval and browsing of video data with respect to visual content will form the core research of the VIDEO-SUM project.

The application to be developed will provide description, indexing, search, storage and editing of audiovisual data produced by end-users. Moreover, it will be able to create video summaries automatically and quickly. Videos related to cultural events, or filmed at monuments and archaeological sites, are usually long, so such summaries constitute valuable knowledge and contribute to the promotion and enhancement of cultural heritage. Emphasis will be given to the processing of news and documentaries.

TECHNICAL OR/AND RESEARCH OBJECTIVES

This project will focus on industrial research on machine learning techniques for multimedia knowledge management. More specifically, it will focus on video segmentation and representation using machine learning techniques. Moreover, the methods that will be developed will be applied to summarization of unedited video (rushes summarization).

A software system will also be implemented for the preservation of digital libraries, including the description of relevant information in XML files, the organization of thematic modules (collections), the creation of indexes for instant search over digital data, storage in relational databases, etc. Furthermore, the libraries can be published over the internet, so that any interested user has instant access to the video data. Finally, we will examine how these two technologies can be combined into a complete solution for companies such as those involved in the VIDEO-SUM project.
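As a rough illustration of the XML-based description mentioned above, the sketch below builds a minimal record for one video using Python's standard library. The element names (`video`, `title`, `duration`, `keywords`) are a hypothetical schema chosen for this example; the actual VIDEO-SUM schema is not specified here.

```python
import xml.etree.ElementTree as ET

def make_video_record(title, duration_sec, keywords):
    # Build a minimal XML description record for one video.
    # NOTE: element names are illustrative, not the project's real schema.
    root = ET.Element("video")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "duration").text = str(duration_sec)
    kw = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(kw, "keyword").text = word
    return ET.tostring(root, encoding="unicode")

record = make_video_record("Acropolis tour", 3600, ["heritage", "Athens"])
```

A record like this can be stored alongside the video, indexed for text search, or exchanged with other applications.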

Under the project, the following research issues will be examined:
  • Shot boundary segmentation: The first level of video segmentation, concerned with detecting the boundaries between consecutive shots.
  • Keyframe extraction: The extraction of unique frames that adequately represent the content of each video shot.
  • Video scene segmentation: A scene is a group of shots that take place in the same physical location (e.g. a dialogue in a room) or that describe an action or event (e.g. a car chase by police cars).
  • High-level segmentation: A more compact representation/segmentation of a video merges scenes into logical story units. These correspond to the DVD chapters describing the different sub-themes of a movie.
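To make the first level concrete, the sketch below detects hard cuts by comparing intensity histograms of consecutive frames, a classical baseline for shot boundary detection (the project's actual machine learning methods are more sophisticated). Frames are represented here as flat lists of grayscale values, and the threshold is an illustrative value.

```python
def gray_histogram(frame, bins=8, max_val=256):
    # Normalized intensity histogram of a frame (flat list of gray values).
    hist = [0] * bins
    step = max_val // bins
    for v in frame:
        hist[min(v // step, bins - 1)] += 1
    n = len(frame)
    return [h / n for h in hist]

def shot_boundaries(frames, threshold=0.5):
    # Return indices i where the L1 histogram distance between
    # frame i-1 and frame i exceeds the threshold (candidate hard cuts).
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        dist = sum(abs(a - b) for a, b in zip(prev, cur))
        if dist > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Two synthetic "shots": dark frames followed by bright frames.
dark = [20] * 100
bright = [230] * 100
video = [dark, dark, dark, bright, bright]
cuts = shot_boundaries(video)   # a single cut where dark changes to bright
```

Histogram comparison is robust to small object motion within a shot but reacts strongly to the global appearance change at a cut, which is why it is a common starting point before more refined features are applied.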

Further research will focus on summarizing unedited video. Unedited video contains considerable redundant information (repetitive shots) and unwanted material such as monochrome frames, colorbars and clapboards. We will develop and implement appropriate methodologies to automatically remove unwanted frames and repetitive content. Similar shots will be grouped and only one representative will be included in the final summary.
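One simple way to flag the monochrome frames mentioned above is an intensity-variance test: a solid-colour slate or blank frame has near-zero variance, while real content does not. The sketch below shows this idea under simplified assumptions (grayscale frames as flat lists, an illustrative threshold); it is not the project's actual removal pipeline.

```python
def is_monochrome(frame, var_threshold=25.0):
    # Flag frames with near-zero intensity variance (blank screens,
    # solid-colour slates). The threshold is an illustrative value.
    n = len(frame)
    mean = sum(frame) / n
    var = sum((v - mean) ** 2 for v in frame) / n
    return var < var_threshold

def remove_unwanted(frames):
    # Keep only frames that carry visual content.
    return [f for f in frames if not is_monochrome(f)]

blank = [128] * 64                               # solid grey slate
textured = [(i * 37) % 256 for i in range(64)]   # varied content
kept = remove_unwanted([blank, textured, blank])
```

Colorbars and clapboards need more specific detectors (e.g. colour-layout or template matching), but the same filter-then-summarize structure applies.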

At the end of the project an application will be developed that provides users with capabilities such as:
  • Editing audiovisual data and creating summaries, thus saving considerable time.
  • Text-based search in digital libraries and in the application.
  • Automatic and efficient indexing of audiovisual data.

ACHIEVEMENTS

RESEARCH RESULTS

During the VIDEOSUM project we have improved and developed methodologies on the following research topics:

  • Shot Boundary Detection.
  • Unwanted Shots/Frames Removal.
  • Shot Summarization via keyframe extraction.
  • Detection and Characterization of camera movements.
  • Detection of Sequences of Similar Shots.
  • Video segmentation into scenes and chapters.
  • News video segmentation into thematic units.

The methodologies that were developed and the research results that were obtained in some of the aforementioned research issues were reported in the following scientific publications:

C1.        A. Kalogeratos and A. Likas, "Dip-means: an incremental clustering method for estimating the number of clusters", Proc. Neural Information Processing Systems (NIPS 2012), Lake Tahoe, Nevada, USA, 2012.
C2.        K. Blekas and A. Likas, "The mixture of multi-kernel relevance vector machines model", Proc. IEEE Int. Conf. on Data Mining (ICDM 2012), Brussels, 2012.
C3.        G. Tzortzis and A. Likas, "Kernel-based Weighted Multi-view Clustering", Proc. IEEE Int. Conf. on Data Mining (ICDM 2012), Brussels, 2012.
C4.        A. Ioannidis, V. Chasanis and A. Likas, "Key-frame Extraction using Weighted Multi-View Convex Mixture Models and Spectral Clustering", Proc. 22nd Int. Conf. on Pattern Recognition (ICPR 2014), Stockholm, Sweden, 2014. Best Scientific Paper Award, Track 3 "Image, Speech, Signal and Video Processing".
C5.        V. Chasanis, A. Ioannidis and A. Likas, "Efficient Key-frame Extraction Based on Unimodality of Frame Sequences", Proc. 12th IEEE Int. Conf. on Signal Processing (ICSP 2014), Hangzhou, 2014.
C6.        A. Pappa, V. Chasanis and A. Ioannidis, "Rushes Video Segmentation Using Semantic Features", Proc. 8th Hellenic Conf. on Artificial Intelligence (SETN 2014), Ioannina, 2014.
C7.        A. Ioannidis, V. Chasanis and A. Likas, "An Agglomerative Approach for Shot Summarization Based on Content Homogeneity", Proc. 7th Int. Conf. on Machine Vision (ICMV 2014), Milan, 2014.
C8.        V. Chasanis, C. Voglis, A. Ioannidis, A. Lanaridis, E. Vathi, G. Siolas, A. Likas and A. Stafylopatis, "VideoSum: A Video Storing, Processing and Summarization Platform", Proc. 12th Asian Conf. on Computer Vision (ACCV 2014) (Demo), Singapore, 2014.
C9.        V. Chasanis, C. Voglis, A. Ioannidis, A. Lanaridis, E. Vathi, G. Siolas, A. Likas and A. Stafylopatis, "VideoSum: A Video Storing, Processing and Summarization Platform", Proc. 11th European Conf. on Visual Media Production (CVMP 2014) (Demo & Short Paper), London, 2014.

VIDEOSUM APPLICATION

The VIDEOSUM application provides the following functionalities:

  • Editing audiovisual data and creating summaries that reduce the data volume and processing time accordingly.
  • Text-based search in digital libraries.
  • Automatic and efficient indexing of audiovisual data.

VIDEOSUM can handle a variety of video types, edited or unedited. The main features/contributions of this system are:

  • Video Segmentation: Segmentation can be performed at three levels: the finest shot level, the intermediate scene level and the coarse chapter level.
  • Video Summarization: Several efficient summarization algorithms provide alternative summarizations of the video content at any level.
  • Video Representation: Different visual representations of the video's summary are available to the user.
  • Video Storage: Storage of summaries in different formats, such as video, images, XML and HTML.
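The summarization feature above ultimately selects representative frames per shot. A minimal sketch of this idea, assuming frames are already described by normalized histograms, is a medoid-style pick: choose the frame closest on average to all others in the shot. The project's actual algorithms (e.g. spectral clustering, convex mixture models) are more elaborate; this only illustrates the principle.

```python
def medoid_keyframe(shot_histograms):
    # Return the index of the frame whose histogram is closest on
    # average to all others in the shot (a medoid-style representative).
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    best_idx, best_cost = 0, float("inf")
    for i, h in enumerate(shot_histograms):
        cost = sum(dist(h, other) for other in shot_histograms)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

# Four frame histograms drifting from dark-heavy to bright-heavy;
# the middle frames best summarize the whole shot.
shot = [[0.9, 0.1], [0.5, 0.5], [0.45, 0.55], [0.1, 0.9]]
key = medoid_keyframe(shot)
```

Picking a medoid rather than a mean guarantees the representative is an actual frame of the shot, which matters when the summary must be assembled from real video material.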

The system is fully automated, executing several actions depending on the video type. Although default parameters are available, the system can also be adjusted according to the user's preferences. The basic actions of the system are:

  • Shot Boundary Detection
  • Unwanted Shots/Frames Removal
  • Shot Representation/Summarization
  • Detection of Sequences of Similar Shots
  • Scene/Chapter segmentation
  • News Video Segmentation
  • Camera Movements detection

DIGITAL LIBRARY

Alongside the video summarization application, we have implemented a digital library that works in conjunction with it to organize audiovisual content and the corresponding summaries created by the VIDEOSUM application. The digital library interacts with the VIDEOSUM application in two ways. On the one hand, when a video file is selected, the VIDEOSUM application can be opened through the digital library menu with the corresponding file loaded; this reduces the overall time needed to extract and archive the exported summary. On the other hand, useful information extracted from the VIDEOSUM application can be added to the digital library. More specifically, a video summary of the original video (the basic output of the VIDEOSUM application), montage images, keyframes and an XML summary file can be added to characterize each video file.

The digital library automatically indexes audiovisual content, enables browsing of records and supports searching with simple or complex keyword queries, in each field separately. The representation of each record is complete, since the user can simultaneously watch the primary video and the video summary, view the fields with all the text information, and access additional files such as the montage image, keyframes and the XML summary file. Moreover, for each video whose summary is available, the VIDEOSUM application can be executed directly, loading the video and its corresponding XML summary file. Additionally, the digital library provides the ability to upload each file (video, summary, images, XML) locally and to import/export all records in XML format for the management of multiple records and interaction with other applications.
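The per-field keyword search described above can be sketched as follows. Records are modelled here as plain dictionaries with a made-up layout (`title`, `description`); the real library's record schema and query engine are not specified in this document.

```python
def search_records(records, keyword, field=None):
    # Case-insensitive keyword search over record fields.
    # field=None searches all fields; a field name restricts the search,
    # mirroring the library's per-field / all-field search options.
    # NOTE: the record layout is illustrative, not the actual schema.
    kw = keyword.lower()
    hits = []
    for rec in records:
        fields = [field] if field else list(rec)
        if any(kw in str(rec.get(f, "")).lower() for f in fields):
            hits.append(rec)
    return hits

library = [
    {"title": "Acropolis rushes", "description": "unedited footage"},
    {"title": "News bulletin", "description": "evening broadcast"},
]
all_hits = search_records(library, "acropolis")          # match in any field
title_hits = search_records(library, "broadcast", "title")  # restricted, no match
```

In practice such queries would run against the relational database or a full-text index rather than an in-memory scan, but the field-scoping behaviour is the same.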


VIDEOSUM

Digital Library For VIDEO Storing, Processing & SUMmarization