Tutorials & Workshops

WORKSHOPS

The following workshops are offered free of charge to all interested participants.


  • MATLAB Today

    MATLAB® has been on a fast track recently, delivering many significant improvements that will affect all aspects of your technical computing work. Come hear the latest from a MathWorks engineer with more than 20 years of experience developing MATLAB and Image Processing Toolbox™.
    Learn more about:

    • The new graphics system – an updated look and easier interaction
    • The new default MATLAB color map – why we changed it and how we designed it
    • New math, image processing, and computer vision algorithms – for your prototyping and research
    • New software developer tools – for managing your code
    • Improved performance – for tackling bigger problems faster
    Room: Exact room will be announced in the final program
    Date: MONDAY, September 28, 2015
    Time: 12:30 – 14:00
    Cost: Free

    Presenter Biography

    Dr. Steve Eddins, an electrical engineer turned software developer, has developed MATLAB and image processing products at MathWorks since 1993. He is a senior MATLAB designer for language evolution and for the overall collection of MATLAB functions and types. He also coaches MathWorks development teams on the design of programming interfaces intended for use by engineers and scientists.

    Before moving to his current role, he led development of the Image Processing Toolbox for 15 years. During that time he designed and implemented many image processing capabilities, including filtering and filter design, transforms, mathematical morphology, geometric transformations, analysis, segmentation, color science, visualization, and image file formats. He also created the second-generation MATLAB performance profiler and the MATLAB xUnit Test Framework.

    Before joining MathWorks, Steve was on the faculty of the Electrical Engineering and Computer Science Department at the University of Illinois at Chicago. There he taught graduate and senior classes in digital image processing, computer vision, pattern recognition, and filter design, and he performed research in image compression. Steve coauthored the book Digital Image Processing Using MATLAB and writes regularly about image processing and MATLAB on his blog, Steve on Image Processing.

    Steve received his B.E.E. and Ph.D. from the Georgia Institute of Technology. He is a senior member of the IEEE.
    Twitter: @SteveEddins

  • Image and Video DSP at Google

    While well known for search, Google has now grown to generate significant impact in the media-processing space, and recruitment in media/imaging and vision has been growing for some time. This workshop features four Googlers, David Gallup, Peyman Milanfar, Anil Kokaram and Debargha Mukherjee, with 20-minute snapshots of imaging and video DSP technology currently being explored by Google/YouTube. We highlight key developments and expose some of the underbelly of technology research and development in YouTube, Chrome, and Google Research itself.

    Jump is a new system for creating VR video being developed at Google. It consists of a new multi-camera rig design, a cloud-based stitching algorithm, and playback on YouTube. In his talk, David Gallup will tell the story of Jump, from the initial prototypes to commercial deployment, and share some of the team's adventures along the way.


    Room: 206A
    Date: TUESDAY, September 29, 2015
    Time: 10:30 – 12:30
    Cost: Free

    Presenters' Biographies

    David Gallup joined Google in 2010 after receiving his Ph.D. in Computer Science from the University of North Carolina. He has worked on 3D reconstruction, view synthesis, and 3D photo browsing. He is currently a tech lead for Jump, Google's stereo 360 VR video effort.

    Anil Kokaram is a Tech Lead in the Transcoding Group at YouTube/Google, leading a group responsible for video quality. He is also a professor at Trinity College Dublin, Ireland.

    Peyman Milanfar has been involved with the Glass project at Google and now leads a team within Google Research focused on imaging. He was Associate Dean for Research and Graduate Studies at the University of California, Santa Cruz (UCSC) from 2010 to 2012.

    Debargha Mukherjee received his Ph.D. in Electrical and Computer Engineering from the University of California Santa Barbara in 1999. Between 1999 and 2010, he was at Hewlett Packard Laboratories conducting research on video and image coding and processing. Since 2010, he has been with Google, where he is currently involved with open-source video-codec development.

TUTORIALS

All tutorials are half-day long and will be held on Sunday, Sept. 27, 2015 in either the morning (TAM) or afternoon (TPM).


Morning Sessions (09:00 – 12:30)

  • TAM-T1 (Invited) – Deep Learning in Image Processing and Vision

    Instructors

    Yoshua BENGIO and Roland MEMISEVIC, Université de Montréal, Canada

    Classroom

    To be announced

    Course Motivation and Description

    Machine learning enables computers to learn about the world around us but also holds fundamentally hard challenges associated with the so-called curse of dimensionality: the huge number of possible observations, events, or configurations of variables. Deep learning has been introduced to face that challenge by adding to the rich science of machine learning the notion of deep representation, the idea that better models can be learned if the machine constructs and discovers rich and abstract representations of the data. Past and future advances in deep learning hold incredible promises of technological advances on the path towards AI. This realization has strongly influenced information technology markets recently and there are already impressive fallouts from these investments in science and technology.

    This tutorial will cover some of the main current topics in deep learning research and applications, starting from the theoretical underpinnings of distributed representations and depth, as well as a detailed description of the most commonly used method for obtaining parameter gradients, i.e., the backpropagation algorithm. It will show how these ideas are incorporated in convolutional neural networks (for images) and recurrent neural networks (for capturing sequential structure). Although the deep learning breakthroughs started with unsupervised learning, most current applications have focused on supervised learning; many challenges, but also major promises, remain in deep unsupervised learning. A brief introduction will be given to the current state of the art in this area and to how these ideas are motivated from the point of view of geometry (manifold learning) and the discovery of underlying causal factors. The tutorial will close with the lighter subject of applications of deep learning in industry, with a focus on computer vision and image processing.
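
    For readers who want a concrete picture of the backpropagation algorithm mentioned above, the following minimal NumPy sketch (an illustration, not drawn from the tutorial materials) trains a one-hidden-layer network on a toy regression task; the layer sizes, learning rate, and squared-error loss are arbitrary illustrative choices.

        import numpy as np

        # Toy data: learn y = sin(x) on [-pi, pi]
        rng = np.random.default_rng(0)
        x = rng.uniform(-np.pi, np.pi, size=(256, 1))
        y = np.sin(x)

        # One hidden layer of tanh units, linear output
        W1 = rng.normal(scale=0.5, size=(1, 32)); b1 = np.zeros(32)
        W2 = rng.normal(scale=0.5, size=(32, 1)); b2 = np.zeros(1)
        lr = 0.05

        for step in range(2000):
            # Forward pass
            h = np.tanh(x @ W1 + b1)          # hidden activations
            pred = h @ W2 + b2                # network output
            err = pred - y                    # dLoss/dpred for 0.5 * mean squared error

            # Backward pass: apply the chain rule layer by layer
            dW2 = h.T @ err / len(x)
            db2 = err.mean(axis=0)
            dh = err @ W2.T                   # gradient w.r.t. hidden activations
            dz = dh * (1.0 - h ** 2)          # back through the tanh nonlinearity
            dW1 = x.T @ dz / len(x)
            db1 = dz.mean(axis=0)

            # Gradient-descent parameter update
            W2 -= lr * dW2; b2 -= lr * db2
            W1 -= lr * dW1; b1 -= lr * db1

        print("final MSE:", float((err ** 2).mean()))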

    Course Outline

    The course will cover the following aspects:

    • - Motivations for deep learning
    • - Theoretical underpinnings, distributed representations & depth
    • - Multi-layer networks and backpropagation
    • - Convolutional networks and recurrent neural networks
    • - Underlying factors, unsupervised learning and transfer learning
    • - Auto-encoders and deep generative models
    • - Applications to computer vision, speech and language understanding

    Course Prerequisites

    Undergraduate degree in mathematical sciences, or the equivalent.

    Distributed Material

    • - Copy of the slides
    • - Free access to draft chapters of the Deep Learning book (MIT Press, to appear).

    Biographies

    Yoshua BENGIO (PhD in Computer Science, McGill University, 1991) did two post-docs, at M.I.T. (with Michael Jordan) and at AT&T Bell Labs (with Yann LeCun), before becoming a professor in the Department of Computer Science and Operations Research at Université de Montréal. He has authored two books and around 200 publications, the most cited being in the areas of deep learning, recurrent networks, probabilistic learning, natural language, and manifold learning. He is among the most cited Canadian computer scientists and is or has been an associate editor of the top journals in machine learning and neural networks. He has held a Canada Research Chair in Statistical Learning Algorithms since 2000 and an NSERC Industrial Chair since 2006, and has been a Fellow of the Canadian Institute for Advanced Research since 2005. He is on the NIPS foundation board and has been program chair and general chair for NIPS. He has co-organized the Learning Workshop for 14 years and co-created the International Conference on Learning Representations. His current interests are centered on a quest for AI through machine learning, and include fundamental questions on deep learning and representation learning, the geometry of generalization in high-dimensional spaces, manifold learning, biologically inspired learning algorithms, and challenging applications of statistical machine learning.

    Roland MEMISEVIC (PhD in Computer Science, University of Toronto, 2008) held positions as a research scientist at PNYLab, Princeton, as a post-doc at the University of Toronto and at ETH Zurich, and as a junior professor at the University of Frankfurt, Germany. In 2012, he joined the University of Montreal as an assistant professor in Computer Science. His research interests are in deep learning and computer vision, with a focus on approaches that extend deep learning beyond object recognition towards more general tasks in vision and AI. His scientific contributions include approaches to learning motion and transformation patterns from images and videos, and approaches to learning invariance from data. He has presented his work at conferences such as NIPS, CVPR, ICCV, ICML, and AAAI, and in journals including PAMI, Neural Networks, and Neural Computation. He has served as a program committee member or reviewer for most of these and other conferences and journals in machine learning and computer vision. Roland Memisevic has been an invited speaker at numerous deep learning events and tutorials.

  • TAM-T2 – HEVC/H.265 Video Coding Standard (v. 2) Including Range, Scalable, and Multiview Extensions

    Instructors

    Dan GROIS, Fraunhofer Heinrich Hertz Institute, Germany
    Benjamin BROSS, Fraunhofer Heinrich Hertz Institute, Germany
    Detlev MARPE, Fraunhofer Heinrich Hertz Institute, Germany
    Karsten SUEHRING, Fraunhofer Heinrich Hertz Institute, Germany

    Classroom

    To be announced

    Course Motivation and Description

    The High-Efficiency Video Coding (HEVC) standard is the latest video coding standard, developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG); its first version was finalized in January 2013. Compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, H.265/MPEG-H HEVC achieves dramatic bit-rate savings by employing state-of-the-art coding tools. HEVC was also designed especially for High Definition (HD) and Ultra-High Definition (UHD) video content, the latter referring to 3840x2160 (4K) and 7680x4320 (8K) resolutions in terms of luma samples, demand for which is expected to increase dramatically in the near future.

    This tutorial focuses on the second version of the HEVC video coding standard, officially issued in October 2014, which adds the combined extensions: RExt – the range extensions, SHVC – the scalable extension, and MV-HEVC – the multiview extension.

    First, the speakers will provide a brief overview of the H.264/MPEG-4 AVC standard, followed by a detailed overview of the HEVC coding tools that led to such significant improvements in coding efficiency over H.264/MPEG-4 AVC (including the quadtree coding structure, intra/inter prediction, in-loop filtering, high-level syntax, transform coding, entropy coding, parallel coding tools, etc.), with special emphasis on compression efficiency and performance. Second, the speakers will give an overview of the HEVC extensions: the range extensions (including extended bit depths, chroma format support, etc.), the scalable extension (including the up-sampling process, the inter-layer prediction process, etc.), and the multiview extension (including inter-view prediction, etc.). Finally, the talk will conclude with a discussion of further research directions and challenges.

    Course Outline

    • - Brief review of H.264/MPEG-4 Advanced Video Coding (AVC) standard;
    • - H.265/MPEG-H HEVC version 1 standard;
    • - H.265/MPEG-H HEVC version 2 standard, including RExt – the Range Extensions, SHVC – the Scalable Extension, and MV-HEVC – the Multiview Extension.

    Course Prerequisites

    There are no course pre-requisites since the tutorial is aimed at an audience from very diverse backgrounds.

    Distributed Material

    Attendees will receive the slides presented during the tutorial.

    Biographies

    Dan GROIS received his Ph.D. degree from the Communication Systems Engineering Department, Ben-Gurion University of the Negev (BGU), Israel, in 2011. From 2011 to 2013, Dan was a Senior Researcher at the Communication Systems Engineering Department, BGU. Since mid-2013, Dan has been a Post-Doctoral Senior Researcher at the Image Processing & Analytics Department of the Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute (HHI), Germany. Dan is an author and co-author of about 40 publications in the area of image/video coding and data processing, which have been presented at top-tier international conferences and published in various scientific journals and books. In addition, Dan is a referee for top-tier conferences and international journals, such as the IEEE Trans. on Image Processing, IEEE Trans. on Multimedia, IEEE Trans. on Signal Processing, the Journal of Visual Communication and Image Representation (Elsevier), IEEE Sensors, and SPIE Optical Engineering. In 2013, Dan also served as a Guest Editor of the SPIE Optical Engineering journal. During his academic career, Dan was granted various fellowships, including Kreitman Fellowships and the ERCIM Alain Bensoussan Fellowship, provided by the FP7 Marie Curie Actions COFUND Programme. In addition, Dan is currently a Fellow of the PROVISION ITN project, which is part of the European Union’s Marie Skłodowska-Curie Actions of the European Commission. Dan is a Senior Member of the IEEE and a Member of the ACM and SMPTE societies. Dan's research interests include image and video coding and processing, video coding standards, particularly H.265 | MPEG-H High-Efficiency Video Coding (HEVC), region-of-interest scalability, computational complexity and bit-rate control, network communication and protocols, and future multimedia applications/systems.

    Benjamin BROSS is a Project Manager at the Image Processing & Analytics Department of the Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute, Berlin, and a part-time lecturer at the HTW University of Applied Sciences Berlin. He received the Dipl.-Ing. degree in electrical engineering from RWTH Aachen University, Germany, in 2008. During his studies he worked on three-dimensional image registration in medical imaging and on decoder-side motion vector derivation in H.264/MPEG-4 Advanced Video Coding (AVC). Since the development of the H.265 | MPEG-H High-Efficiency Video Coding (HEVC) standard started in 2010, Benjamin has been actively involved in the standardization process as a technical contributor and coordinator of core experiments. In July 2012, Benjamin was appointed co-chair of the editing Ad Hoc Group and became the chief editor of the HEVC video coding standard. At the Heinrich Hertz Institute, he is currently responsible for the development of HEVC-conforming real-time encoders and decoders. Besides giving talks about the emerging HEVC video coding standard, Benjamin Bross is an author or co-author of several fundamental HEVC-related publications and of two book chapters on HEVC and inter-picture prediction techniques in HEVC. He received the IEEE Best Paper Award at the 2013 IEEE International Conference on Consumer Electronics – Berlin and the SMPTE Journal Certificate of Merit in 2014.

    Detlev MARPE is Head of the Image Processing & Analytics Department and Head of the Image & Video Coding Group of the Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute, Berlin. He is also active as a part-time lecturer at Technical University Berlin. He received the Dipl.-Math. degree from the Technical University of Berlin (TUB), Berlin, Germany, and the Dr.-Ing. degree from the University of Rostock, Germany. For over a decade, he has successfully contributed to the standardization activities of ITU-T VCEG, ISO/IEC JPEG, and ISO/IEC MPEG for still image and video coding. During the development of the H.264 | MPEG-4 Advanced Video Coding (AVC) standard, he was chief architect of the CABAC entropy coding scheme as well as one of the main technical and editorial contributors to the so-called Fidelity Range Extensions (FRExt) with the addition of the High Profile in H.264 | MPEG-4 AVC. He was also one of the key people in designing the basic architecture of Scalable Video Coding (SVC) and Multiview Video Coding (MVC) as algorithmic and syntactical extensions of H.264 | MPEG-4 AVC. During the recent development of the H.265 | MPEG-H High-Efficiency Video Coding (HEVC) standard, he made significant contributions to the design of its fundamental building blocks. In addition, he also made successful proposals to the recent standardization of its Range Extensions and 3D Extensions. For his substantial contributions to the field of video coding, he received numerous awards, including, amongst many others, a nomination for the 2012 German Future Prize, the Karl Heinz Beckurts Award 2011, and two Emmy Engineering Awards in 2008 and 2009. Detlev Marpe is author or co-author of more than 200 publications in the area of video coding and signal processing. He is an IEEE Fellow and a Member of the German Information Technology Society. He also serves as an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology. His current research interests include image and video coding, signal processing for communications, as well as computer vision and information theory.

    Karsten SUEHRING is a Project Manager at the Video Coding & Analytics Department of the Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute. He received the Dipl.-Inf. (FH) degree in applied computer science from the University of Applied Sciences, Berlin, Germany, in 2001. Already as a student he was involved in MPEG standardization activities as maintainer of one of the reference implementations for MPEG-4 Part 2. When the Joint Video Team (JVT) was founded in 2001, he was appointed coordinator of the JM reference software of H.264/MPEG-4 AVC. Since June 2011 he has chaired the JCT-VC ad-hoc group on software development and is one of the coordinators of the HM reference software for HEVC. His current research interests include coding and transmission of video and audio content, as well as software design and optimization. At the Heinrich Hertz Institute, he is currently responsible for the development of H.264/AVC and HEVC decoder test products.

  • TAM-T3 – Image Processing for Cinema

    Instructors

    Marcelo BERTALMÍO, Universitat Pompeu Fabra, Spain

    Classroom

    To be announced

    Course Motivation and Description

    This tutorial provides a detailed overview of the relevant image processing techniques that are used in practice in cinema, covering a wide range of topics showing how image processing has become ubiquitous in movie-making, from shooting to exhibition. It is intended primarily for advanced undergraduate and graduate students in applied mathematics, image processing, computer science and related fields, for researchers from academia, and also for professionals from the movie industry.

    The tutorial does not deal with visual effects or computer-generated images, but rather with all the ways in which image processing algorithms are used to enhance, restore, adapt or convert moving images, their purpose being to make the images look as good as possible while exploiting all the capabilities of cameras, projectors and displays.

    Current digital cinema cameras match or even surpass film cameras in color capabilities, dynamic range and resolution, and several of the largest camera makers have ceased production of film cameras. On the exhibition side, film has practically disappeared from American movie theaters. And while many mainstream and blockbuster movies are still being shot on film, they are all digitized for postproduction. Therefore, in this tutorial we will equate "cinema" with "digital cinema", considering only digital cameras and digital movies, and not discussing algorithms for problems that are inherent to film, like the restoration of film scratches or color fading.

    The tutorial is structured in three parts. The first one covers some fundamentals on optics and color. The second part explains how cameras work and details all the image processing algorithms that are applied in-camera. The last part is devoted to image processing algorithms that are applied off-line in order to solve a wide range of problems, presenting state-of-the-art methods. The mathematical presentation of all methods will concentrate on their purpose and idea, leaving formal proofs and derivations for the interested reader in the cited references.
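
    To give a flavor of the in-camera processing covered in the second part, the sketch below applies two representative steps, gray-world white balance and gamma encoding, to a linear RGB image. The gray-world assumption and the 1/2.2 gamma are illustrative simplifications, not the reference pipeline presented in the tutorial.

        import numpy as np

        def gray_world_white_balance(rgb):
            """Scale each channel so that the image's mean color becomes neutral gray."""
            means = rgb.reshape(-1, 3).mean(axis=0)
            gains = means.mean() / means
            return np.clip(rgb * gains, 0.0, 1.0)

        def gamma_encode(rgb, gamma=1.0 / 2.2):
            """Map linear sensor values to a display-oriented encoding."""
            return np.clip(rgb, 0.0, 1.0) ** gamma

        # Example: a synthetic linear image with a color cast
        rng = np.random.default_rng(0)
        linear = rng.uniform(0.0, 1.0, size=(64, 64, 3)) * np.array([0.8, 1.0, 0.6])
        out = gamma_encode(gray_world_white_balance(linear))
        print(out.shape, float(out.min()), float(out.max()))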

    Course Outline

    • - Fundamentals on color, optics, photography;
    • - In-camera image processing: Image processing pipeline, Image sensors, Exposure control, Focus control, White balance, Color transformation, Gamma correction and quantization, Edge enhancement, Output formats;
    • - Noise and dynamic range: Classic denoising ideas, Non-local approaches, New trends and optimal denoising, High dynamic range imaging, Tone mapping;
    • - Color correction: Human color constancy, Computational color constancy under uniform illumination, Retinex and related methods, Cinema and colors at night, Color matching, Color stabilization;
    • - Image stabilization: Rolling shutter compensation, Compensation of camera motion;
    • - Zoom-In and Slow Motion;
    • - Gamut mapping: Color gamuts, Gamut reduction, Gamut extension, Validating a gamut mapping algorithm;
    • - In-painting: Video in-painting for specific problems, Video in-painting in a general setting, Video inpainting for stereoscopic 3D cinema.

    Course Prerequisites

    There are no course pre-requisites since the tutorial is aimed at an audience from very diverse backgrounds.

    Distributed Material

    Attendees will receive the slides presented during the tutorial.

    Biographies

    Marcelo BERTALMÍO received the Ph.D. degree in electrical and computer engineering from the University of Minnesota in 2001. He is an Associate Professor at Universitat Pompeu Fabra, Spain.

    His publications total some 7,000 citations. He was awarded the 2012 SIAG/IS Prize of the Society for Industrial and Applied Mathematics (SIAM) for co-authoring the most relevant image processing work published in the period 2008-2012. He has also received the Femlab Prize, the Siemens Best Paper Award, the Ramón y Cajal Fellowship, and the ICREA Academia Award, among other honors. He is an Associate Editor for SIAM-SIIMS and the secretary of SIAM's activity group on imaging. He holds an ERC Starting Grant for his project “Image processing for enhanced cinematography”, has written the book “Image Processing for Cinema”, published by CRC Press / Taylor & Francis, and has directed two award-winning feature-length films.

    His current research interests are in developing image processing algorithms that allow cinema to be shot with no more artificial lighting than what the people present at the scene need in order to see. The approach is to work out software methods that mimic neural processes in the human visual system and apply them to images captured with a regular digital movie camera.

  • TAM-T5 – Visual saliency: Fundamentals, Applications, and Recent Progress

    Instructors

    Ali BORJI, University of Wisconsin-Milwaukee, USA
    Neil D. B. BRUCE, University of Manitoba, Canada
    Ming-Ming CHENG, Nankai University, China
    Jian LI, National University of Defense Technology, China

    Classroom

    To be announced

    Course Motivation and Description

    Visual saliency has recently received extensive and growing attention across many disciplines, including cognitive psychology, neurobiology, image processing, and computer vision. Based on observed reaction times and estimated signal transmission times along biological pathways, theories of human attention hypothesize that the human visual system processes only parts of an image in detail, with only limited processing of areas outside the focus of attention. From an engineering perspective, such visual attention mechanisms have inspired a series of key research topics over the last few decades. One of the key forces behind these rapid developments is the vast number of successful applications. These applications, marked by different requirements and points of emphasis, have resulted in a rich kinship between fixation prediction, salient object detection, and objectness proposal generation.

    It is noted that there have consistently been many papers on visual saliency at ICIP over the past decade. While there are still many open issues and challenges (and sometimes diverging arguments and debates) to be addressed in this area, the field of saliency computing continues to grow very rapidly. In this tutorial, we will introduce the basic ideas, important models, and applications of visual attention and saliency. Key research issues will be discussed, including top-down vs. bottom-up attention and the relationship between fixation prediction, salient object detection, object proposal generation, etc. Recent advances in fixation prediction, salient object detection, and objectness proposals will be introduced in detail, with significant emphasis on their respective potential applications. Finally, we will discuss the fairness of model evaluation criteria, model benchmarking, divergent opinions, open challenges, and potential future work.
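
    To make the notion of a bottom-up saliency map concrete, here is a short NumPy/SciPy sketch of one classic model, the spectral-residual approach of Hou and Zhang (CVPR 2007); the 64x64 working resolution, 3x3 averaging window, and Gaussian smoothing are illustrative parameter choices, and the tutorial itself covers a much broader range of models.

        import numpy as np
        from scipy.ndimage import uniform_filter, gaussian_filter, zoom

        def spectral_residual_saliency(gray, size=64):
            """Bottom-up saliency map via the spectral-residual model (a common baseline)."""
            # Work at a small, fixed resolution
            small = zoom(gray, (size / gray.shape[0], size / gray.shape[1]), order=1)
            f = np.fft.fft2(small)
            log_amp = np.log(np.abs(f) + 1e-8)
            phase = np.angle(f)
            # Spectral residual = log amplitude minus its local average
            residual = log_amp - uniform_filter(log_amp, size=3)
            # Reconstruct from residual + phase, then smooth to obtain the saliency map
            sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
            sal = gaussian_filter(sal, sigma=2.5)
            return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

        # Usage: saliency = spectral_residual_saliency(gray_image_as_float_array)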

    Course Outline

    This tutorial will consist of five talks (about 35-40 minutes each). It begins with fundamental knowledge and important classical models. Then, we discuss the divergence of, and correlation among, the different subareas (fixation prediction, salient object detection, and objectness proposals), followed by a detailed introduction to each subarea. Finally, we discuss topics relating to model evaluation and benchmarking. The contents of the tutorial are as follows.

    • - Fundamentals of visual attention and saliency and some important models. [Dr. Bruce]
    • - Top-down vs. bottom-up attention, relationship between fixation predictions, salient object detection, object proposal generation, etc. [Dr. Borji]
    • - Recent advances in fixation prediction, evaluation metrics and ground truth, and potential applications. [Dr. Li]
    • - Recent advances in salient object detection, and objectness proposals, and potential applications. [Dr. Cheng]
    • - The fairness of model evaluation criteria (for both fixation prediction and salient regions detection) and model benchmarking. [Dr. Borji]

    Course Prerequisites

    The attendee only needs to have basic knowledge of digital image processing in order to follow the course.

    Distributed Material

    All materials will be distributed to the attendees electronically via webpage downloads. No physical materials will be distributed.

    Biographies

    Ali BORJI received his B.S. and M.S. degrees in computer engineering from the Petroleum University of Technology, Tehran, Iran, in 2001, and Shiraz University, Shiraz, Iran, in 2004, respectively. He received his Ph.D. degree in computational neuroscience from the Institute for Studies in Fundamental Sciences (IPM), Tehran, in 2009. He then spent a year at the University of Bonn as a postdoc. Before coming to the University of Wisconsin-Milwaukee in the fall of 2014, Dr. Borji was a postdoctoral scholar at iLab, University of Southern California, Los Angeles, for four years.

    Ming-Ming CHENG is an associate professor with the College of Computer and Control Engineering, Nankai University. He received his PhD degree from Tsinghua University in 2012 under the guidance of Prof. Shi-Min Hu, working closely with Prof. Niloy Mitra. He then worked for two years as a research fellow with Prof. Philip Torr in Oxford. Dr. Cheng’s research primarily centers on algorithmic issues in image understanding and processing, including image segmentation, editing, retrieval, etc. During the past five years, he has published a series of influential papers in several sub-areas of visual saliency modeling, including salient object detection (e.g., his CVPR 2011 paper has received 790+ citations), objectness estimation (e.g., his CVPR 2014 oral paper has received 70+ citations and 3000+ source code downloads), and visual-saliency-based applications (e.g., his SIGGRAPH Asia 2009 paper ‘Sketch2Photo’ has received 250+ citations and has been reported on by ‘The Telegraph’ in the UK and ‘Spiegel’ in Germany).

    Neil D. BRUCE is an Assistant Professor at the University of Manitoba in Canada. His research interests include a variety of topics including both computer vision and human vision, image processing, visual attention, machine learning, computational neuroscience, information theory, sparse coding, 3D modeling and reconstruction, natural image statistics, and statistical and graphical models. Prior to joining the University of Manitoba he completed two post-doctoral fellowships, one at the Centre for Vision Research at York University, and the other at INRIA Sophia Antipolis. Previously, he completed a Ph.D. in the department of Computer Science and Engineering in 2008 as a member of the Centre for Vision Research at York University, Toronto, Canada. In 2003, he completed a M. A. Sc. in System Design Engineering at the University of Waterloo, and received an Honors B.Sc. with a double major in Computer Science and Mathematics from the University of Guelph in 2001.

    Jian LI is an assistant professor at the National University of Defense Technology (NUDT), Changsha, P.R. China, where he received his B.E., M.E., and Ph.D. degrees. From January 2010 to January 2011, he was a visiting Ph.D. student (academic trainee) at the Center for Intelligent Machines (CIM) at McGill University under the supervision of Prof. Martin Levine.

Afternoon Sessions (13:30-17:00)

  • TPM-T1 (Invited) – Computational Photography

    Instructors

    Mohit GUPTA, Columbia University, USA
    Jean-François LALONDE, Université Laval, Canada

    Classroom

    To be announced

    Course Motivation and Description

    In the last decade, computational photography has emerged as a vibrant field of research. A computational camera uses a combination of unconventional optics and novel algorithms to produce images that cannot otherwise be captured with traditional cameras. The design of such cameras involves the following two main aspects:

    • Optical coding – modifying the design of a traditional camera by introducing programmable optical elements and light sources to capture the maximal amount of scene information in images;
    • Algorithm design – developing algorithms that take information captured by conventional or modified cameras, and create a visual experience that goes beyond the capabilities of traditional systems.

    Examples of computational cameras that are already making an impact in the consumer market include wide field-of-view cameras (Omnicam), light-field cameras (Lytro), high dynamic range cameras (mobile cameras), multispectral cameras, motion sensing cameras (Leap Motion) and depth cameras (Kinect).

    This course serves as an introduction to the basic concepts in programmable optics and computational image processing needed for designing a wide variety of computational cameras, as well as an overview of the recent work in the field.
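
    As one concrete example of the algorithm-design side, the sketch below merges a bracketed exposure stack into a high-dynamic-range radiance estimate and applies a simple global tone curve. It assumes a linear sensor response and uses a triangle weighting function; both are illustrative simplifications of the HDR techniques surveyed in the course.

        import numpy as np

        def merge_exposures(images, exposure_times):
            """Weighted average of per-exposure radiance estimates (linear response assumed)."""
            acc = np.zeros_like(images[0], dtype=np.float64)
            wsum = np.zeros_like(acc)
            for img, t in zip(images, exposure_times):
                w = 1.0 - np.abs(2.0 * img - 1.0)   # trust mid-tones, distrust clipped pixels
                acc += w * (img / t)                # radiance estimate from this exposure
                wsum += w
            return acc / np.maximum(wsum, 1e-8)

        def tone_map(radiance):
            """Simple global operator mapping radiance to displayable [0, 1] values."""
            r = radiance / (radiance.mean() + 1e-8)
            return r / (1.0 + r)

        # Three synthetic exposures of the same scene, clipped at the sensor's white level
        rng = np.random.default_rng(0)
        scene = rng.uniform(0.0, 4.0, size=(32, 32, 3))          # "true" radiance
        times = [0.25, 1.0, 4.0]
        stack = [np.clip(scene * t, 0.0, 1.0) for t in times]
        ldr = tone_map(merge_exposures(stack, times))
        print(float(ldr.min()), float(ldr.max()))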

    Course Outline

    • A brief history of photography − Camera Obscura − Film, Digital and Computational photography;
    • Coded photography − Novel camera designs and functionalities, including:
    • - Optical coding approaches: Aperture, Image plane, and Illumination coding; Camera arrays,
    • - Novel functionalities: Light field cameras − Extended DOF cameras, Hyperspectral cameras − Ultra high-resolution cameras (Gigapixel) − HDR cameras − Post-capture refocusing and Post-capture resolution trade-offs,
    • - Depth cameras: Structured light − Time-of-flight,
    • - Compressive sensing: Single pixel and High speed cameras;
    • Augmented photography: algorithmic tools for novel visual experiences:
    • - Multiple viewpoints: Image stitching, panoramas − Gigapixel imaging − Large-scale structure from motion,
    • - Data-driven approaches: Texture transfer − Object transfer − Color/attribute/style transfer,
    • - 2D image plane vs 3D scene: Scene geometry estimation − Light, geometry, and object editing,
    • - Smarter tools: Content-aware inpainting − Edit propagation in image collections − Matte cutouts,
    • - Smartphone photography: Cheap optics / powerful computing − Virtual tripod, Burst-mode HDR and denoising − Video stabilization,
    • - Motion magnification and visual microphone;
    • Future and impact of photography:
    • - "Social/collaborative photography" or the Internet of Cameras,
    • - Wearable and flexible cameras,
    • - Seeing the invisible: seeing around corners, through walls, laser speckle photography,
    • - Image forensics,
    • - Next generation applications (personalized health monitoring, robotic surgery, self-driving cars, astronomy).

    Course Prerequisites

    Basic knowledge of linear algebra and probability.

    Distributed Material

    Course PowerPoint / keynote slides.

    Bibliographies

    Jean-François LALONDE is an assistant professor in Electrical and Computer Engineering at Laval University, Quebec City. Previously, he was a Post-Doctoral Associate at Disney Research, Pittsburgh. He received a B.Eng. degree in Computer Engineering with honors from Laval University, Canada, in 2004. He earned his M.S. at the Robotics Institute at Carnegie Mellon University in 2006 and received his Ph.D., also from Carnegie Mellon, in 2011. His Ph.D. thesis won the 2010-11 CMU School of Computer Science Distinguished Dissertation Award, and was partly supported by a Microsoft Research Graduate Fellowship. After graduation, he became a Computer Vision Scientist at Tandent, where he helped develop LightBrush™, the first commercial intrinsic imaging application, and introduced the technology of intrinsic videos at SIGGRAPH 2012. His work focuses on lighting-aware image understanding and synthesis by leveraging large amounts of data.

    Mohit GUPTA will start as an assistant professor in the CS department at the University of Wisconsin-Madison in January ’16. He is currently a research scientist in the CAVE lab at Columbia University. He received a B.Tech. in computer science from the Indian Institute of Technology Delhi in 2003, an M.S. from Stony Brook University in 2005, and a Ph.D. from the Robotics Institute, Carnegie Mellon University, in 2011. His research interests are in computer vision and computational imaging. His focus is on designing computational cameras that enable computer vision systems to perform robustly in demanding real-world scenarios, as well as capture novel kinds of information about the physical world.

  • TPM-T2 – Example-based Super Resolution

    Instructors

    Jordi SALVADOR, Technicolor – Deutsche Thomson, Germany
    Mehmet TURKAN, Technicolor, France & Izmir University of Economics, Turkey

    Classroom

    To be announced

    Course Motivation and Description

    Super resolution has been one of the most popular research topics in image processing in recent years. From the research perspective, the reasons for this success include interesting solutions to combinations of different image processing problems (registration, deblurring, denoising…) and an increasing understanding of the subspace of natural images and its proper use in recent statistical models. Moreover, the introduction of new imaging standards with progressively higher resolutions also fuels industrial interest in new upscaling algorithms. When properly designed, super-resolution methods can adapt legacy content to the resolution offered by the latest display technologies, either during postproduction or directly on the end user’s devices, thus offering optimal visual experiences.

    In recent years, research on example-based super resolution has attracted most of the attention, essentially for two reasons. First, in contrast with classic multi-frame super resolution, the use of more advanced image priors removes the requirement of having several captures of the same scene with subpixel shifts. Second, the numerical stability problems that can arise when reconstructing a super-resolved image under the commonly over-simplified parametric models of multi-frame super resolution are avoided by using more meaningful non-parametric image priors.

    This tutorial is designed to present an evolutionary timeline of the many existing and continuously improving state-of-the-art approaches that benefit from the favorable features of example-based super resolution, with insights on the theoretical background, implementation issues (including parallelization) and discussion on the practical applicability.
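
    To fix ideas before the outline below, here is a deliberately simple NumPy sketch of the example-based principle: a dictionary of co-located low-resolution/high-resolution patch pairs is built from a training image, and each patch of the coarsely upscaled input is replaced by the high-resolution patch whose low-resolution counterpart is its nearest neighbor. The 2x factor, 4x4 non-overlapping patches, and brute-force search are illustrative choices; the methods covered in the tutorial use learned regressors, overlapping patches, and far more efficient search.

        import numpy as np

        def downscale2(img):
            """Average 2x2 blocks (simple anti-aliased downscaling by a factor of 2)."""
            h, w = img.shape
            return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

        def upscale2(img):
            """Nearest-neighbor upscaling by 2 (the coarse interpolation to be improved)."""
            return np.kron(img, np.ones((2, 2)))

        def patches(img, size=4):
            h, w = img.shape
            return np.array([img[i:i + size, j:j + size].ravel()
                             for i in range(0, h - size + 1, size)
                             for j in range(0, w - size + 1, size)])

        def example_based_sr(lr_input, hr_train, size=4):
            # Dictionary: coarsely upscaled LR patches paired with co-located true HR patches
            h, w = hr_train.shape
            hr_train = hr_train[:h // 2 * 2, :w // 2 * 2]
            dict_lr = patches(upscale2(downscale2(hr_train)), size)
            dict_hr = patches(hr_train, size)
            # Reconstruction: nearest-neighbor lookup for every patch of the upscaled input
            up = upscale2(lr_input)
            out = up.copy()
            for i in range(0, up.shape[0] - size + 1, size):
                for j in range(0, up.shape[1] - size + 1, size):
                    q = up[i:i + size, j:j + size].ravel()
                    k = np.argmin(((dict_lr - q) ** 2).sum(axis=1))
                    out[i:i + size, j:j + size] = dict_hr[k].reshape(size, size)
            return out

        # Usage: sr = example_based_sr(low_res_image, high_res_training_image)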

    Course Outline

    The tutorial provides a thorough introduction and overview of example-based super-resolution, covering the most successful algorithmic approaches, the theory behind them, implementation insights, and some hints about current challenges and expected outcomes for the near future. The list of covered topics is as follows.

    • Introduction to super resolution

      This section introduces early (non-example-based) super-resolution pipelines and the rationale of the example-based concept covered by the rest of the tutorial.
    • - A historic view of super resolution
    • - Multi-frame super resolution
    • - Example-based super resolution
    • Self-similarity-based super resolution

      This part of the tutorial describes super-resolution models where examples are learned from one or more scales of the input data. This strategy can be efficiently implemented when hardware solutions for block search are available, and has the nice property of being implicitly adaptive to the input contents.
    • - High-frequency transfer
    • - Locally linear embedding
    • - Robust self-similarity
    • Super resolution by external learning

      This section will cover super-resolution strategies where larger amounts of data can be exploited to build suitable regression models during an offline training stage. These models can then be efficiently applied during the online inference stage. Under proper configurations, the generalizability of these machine-learning approaches can be virtually as high as that of self-similarity-based approaches and the reconstruction quality is often superior.
    • - Dictionaries
    • - Anchored neighbors and variations
    • - Hybrid models: self-similarity and regression
    • - Regression trees
    • - Deep learning

    Course Prerequisites

    The attendees should be familiar with basic concepts in image processing, probability and statistics (undergraduate courses suffice), but the tutorial is self-contained for the most part.

    Distributed Material

    All registered attendees shall receive printouts of the supporting slides.

    Bibliographies

    Jordi SALVADOR is a project leader at Technicolor R&I in Germany, where he started working in 2011, and a member of Technicolor’s Fellowship Network since 2014. His main research focus is on machine learning for example-based super resolution and image restoration. He received an M.Sc. in Telecommunications (equivalent to Electrical) Engineering in 2006 and an M.Sc. in the European MERIT program in 2008, both from the Universitat Politècnica de Catalunya (UPC) in Barcelona. He obtained the Ph.D. degree in 2011, also from UPC, where he contributed to projects of the Spanish Science and Technology System (VISION, PROVEC) and to a European FP6 project (CHIL) as a research assistant on multi-camera 3D reconstruction. He has also served as a reviewer for conferences and journals such as EUSIPCO and the IEEE Transactions on Image Processing. His research interests include 3D reconstruction, real-time and parallel algorithms, new human-computer interfaces, image and video restoration, super resolution, inverse problems, and machine learning.

    Mehmet TÜRKAN has been a researcher at Technicolor R&I in Cesson-Sévigné, France, since 2011. He will be joining the Engineering and Computer Science Faculty of Izmir University of Economics, Izmir, Turkey, in September 2015. He obtained his PhD degree in computer science from INRIA Bretagne Atlantique and the University of Rennes 1, Rennes, France. He received his MSc and BSc (Hons) degrees, both in electrical and electronics engineering, from Bilkent University, Ankara, and Eskisehir Osmangazi University, Eskisehir, Turkey, respectively. He was involved in the European Commission (EC) 6th Framework Program (FP6) Multimedia Understanding through Semantics, Computation and Learning Network of Excellence (MUSCLE-NoE), the EC FP6 Integrated Three-Dimensional Television–Capture, Transmission, and Display Network of Excellence (3-DTV-NoE), and the European UltraHD-4U research projects. His general research interests are in the area of signal processing, with an emphasis on image and video processing and compression, pattern recognition and classification, and computer vision. Dr. Türkan received the Best Student Paper Award at the 2010 IEEE International Conference on Image Processing (ICIP) and was a nominee for the Best Student Paper Award at the 2011 IEEE ICIP.

  • TPM-T3 – Perceptual Metrics for Image and Video Quality in a Broader Context: From Perceptual Transparency to Structural Equivalence

    Instructors

    Thrasyvoulos N. PAPPAS, Northwestern University, Evanston, Illinois, USA
    Sheila S. HEMAMI, Northeastern University, Boston, Massachusetts, USA

    Classroom

    To be announced

    Course Motivation and Description

    We will examine objective criteria for the evaluation of image quality that are based on models of visual perception. Our primary emphasis will be on image fidelity, i.e., how close an image is to a given original or reference image, but we will broaden the scope of image fidelity to include structural equivalence. We will also discuss no-reference and limited-reference metrics. We will examine a variety of applications with special emphasis on image and video compression. We will examine near-threshold perceptual metrics, which explicitly account for human visual system (HVS) sensitivity to noise by estimating thresholds above which the distortion is just-noticeable, and supra-threshold metrics, which attempt to quantify visible distortions encountered in high compression applications or when there are losses due to channel conditions. We will also consider metrics for structural equivalence, whereby the original and the distorted image have visible differences but both look natural and are of equally high visual quality. We will also take a close look at procedures for evaluating the performance of quality metrics, including database design, models for generating realistic distortions for various applications, and subjective procedures for metric development and testing. Throughout the course, we will discuss both the state of the art and directions for future research.
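
    As background for the structural-similarity part of the course, the sketch below computes a single-window (global) version of the SSIM index between two images. Practical SSIM implementations evaluate the same statistics over local windows and average the resulting map, so this should be read as an illustration of the formula rather than a reference implementation; the constants follow the commonly used 0.01 and 0.03 settings.

        import numpy as np

        def global_ssim(x, y, data_range=1.0):
            """Single-window SSIM: combined luminance, contrast, and structure comparison."""
            x = x.astype(np.float64).ravel()
            y = y.astype(np.float64).ravel()
            c1 = (0.01 * data_range) ** 2
            c2 = (0.03 * data_range) ** 2
            mx, my = x.mean(), y.mean()
            vx, vy = x.var(), y.var()
            cxy = ((x - mx) * (y - my)).mean()
            return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

        # Identical images score 1.0; distortions lower the score
        rng = np.random.default_rng(0)
        ref = rng.uniform(0.0, 1.0, size=(64, 64))
        noisy = np.clip(ref + rng.normal(scale=0.1, size=ref.shape), 0.0, 1.0)
        print(global_ssim(ref, ref), global_ssim(ref, noisy))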


    This course will enable you to:

    • - Gain a basic understanding of the properties of the human visual system and of how current applications (image and video compression, restoration, retrieval, etc.) attempt to exploit these properties.
    • - Gain an operational understanding of existing perceptually-based and structural similarity metrics, the types of images/artifacts on which they work, and their failure modes.
    • - Understand current distortion models for different applications, and how they can be used to modify or develop new metrics for specific contexts.
    • - Understand the differences between sub-threshold and supra-threshold artifacts, the HVS responses to these two paradigms, and the differences in measuring that response.
    • - Understand criteria by which to select and interpret a particular metric for a particular application.
    • - Understand the capabilities and limitations of full-reference, limited-reference, and no-reference metrics, and why each might be used in a particular application.

    Course Outline

    • - Applications: Image and video compression, restoration, retrieval, graphics, etc.
    • - Human visual system review
    • - Near-threshold perceptual quality metrics
    • - Supra-threshold perceptual quality metrics
    • - Structural similarity metrics
    • - Perceptual metrics for texture analysis and compression – structural texture similarity metrics
    • - No-reference and limited-reference metrics
    • - Models for generating realistic distortions for different applications
    • - Design of databases and subjective procedures for metric development and testing
    • - Metric performance comparisons, selection, and general use and abuse
    • - Embedded metric performance, e.g., for rate-distortion optimized compression or restoration
    • - Metrics for specific distortions, e.g., blocking and blurring
    • - Metrics for specific attributes, e.g., contrast, roughness, and glossiness
    • - Multimodal applications

    Course Prerequisites

    • - Basic understanding of image compression algorithms
    • - Background in digital signal processing and basic statistics: frequency-based representations, filtering, distributions.
    • - Level: Intermediate

    Distributed Material

    PDF of PowerPoint presentation

    Bibliographies

    Thrasyvoulos N. PAPPAS received the S.B., S.M., and Ph.D. degrees in electrical engineering and computer science from MIT in 1979, 1982, and 1987, respectively. From 1987 until 1999, he was a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. He is currently a professor in the Department of Electrical and Computer Engineering at Northwestern University, which he joined in 1999. His research interests are in image and video quality and compression, image and video analysis, content-based retrieval, perceptual models for multimedia processing, model-based halftoning, and tactile and multimodal interfaces. Prof. Pappas will be serving as Vice-President Publications, IEEE Signal Processing Society (2015-2017). He has served as editor-in-chief of the IEEE Transactions on Image Processing (2010-12), elected member of the Board of Governors of the Signal Processing Society of IEEE (2004-06), chair of the IEEE Image and Multidimensional Signal Processing (now IVMSP) Technical Committee, technical program co-chair of ICIP-01 and ICIP-09, and co-chair of the 2011 IEEE IVMSP Workshop on Perception and Visual Analysis. He has also served as co-chair of the 2005 SPIE/IS&T Electronic Imaging Symposium, and since 1997 he has been co-chair of the SPIE/IS&T Conference on Human Vision and Electronic Imaging. Dr. Pappas is a Fellow of IEEE and SPIE.

    Sheila S. HEMAMI received the B.S.E.E. degree from the University of Michigan in 1990, and the M.S.E.E. and Ph.D. degrees from Stanford University in 1992 and 1994, respectively. She was with Hewlett-Packard Laboratories in Palo Alto, California, in 1994 and with the School of Electrical Engineering at Cornell University from 1995 to 2013. She is currently Professor and Chair of the Department of Electrical & Computer Engineering at Northeastern University in Boston, MA. Dr. Hemami's research interests broadly concern communication of visual information from the perspectives of both signal processing and psychophysics. She was elected a Fellow of the IEEE in 2009 for her contributions to robust and perceptual image and video communications. Dr. Hemami has held various visiting positions, most recently at the University of Nantes, France, and at École Polytechnique Fédérale de Lausanne, Switzerland. She has received numerous university and national teaching awards, including Eta Kappa Nu's C. Holmes MacDonald Award. She will be serving as Vice-President Publications Products and Services, IEEE (2015). She was a Distinguished Lecturer for the IEEE Signal Processing Society in 2010-11 and editor-in-chief of the IEEE Transactions on Multimedia from 2008-10. She has held various technical leadership positions in the IEEE.

  • TPM-T4 – Spectral Methods in 3D Data Analysis

    Instructors

    Michael BRONSTEIN, University of Lugano, Switzerland & Perceptual Computing, Intel 

    Classroom

    To be announced

    Course Motivation and Description

    Over the last decade, the intersection between 3D shape analysis and image processing has become a topic of increasing interest in the computer graphics community. Nevertheless, when attempting to apply current image analysis methods to 3D shapes (feature-based description, registration, recognition, indexing, etc.) one has to face fundamental differences between images and geometric objects. Shape analysis poses new challenges that are non-existent in image analysis.

    The purpose of this course is to overview the foundations of shape analysis and to formulate state-of-the-art theoretical and computational methods for shape description based on their intrinsic geometric properties. The emerging field of spectral and diffusion geometry provides a generic framework for many methods in the analysis of geometric shapes and objects. The course will present in a new light the problems of shape analysis based on diffusion geometric constructions such as manifold embeddings using the Laplace-Beltrami and heat operator, 3D feature detectors and descriptors, diffusion and commute-time metrics, functional correspondence, and spectral symmetry.
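
    To make these diffusion-geometric constructions concrete, the sketch below builds the combinatorial Laplacian of a small graph (a crude stand-in for the Laplace-Beltrami operator of a meshed surface), diagonalizes it, and evaluates the heat kernel signature HKS(x, t) = sum_k exp(-lambda_k t) phi_k(x)^2 at several diffusion times. The ring graph and time values are arbitrary illustrative choices; practical pipelines use cotangent-weighted mesh Laplacians, as discussed in the course.

        import numpy as np

        def graph_laplacian(adjacency):
            """Combinatorial Laplacian L = D - W of an undirected weighted graph."""
            return np.diag(adjacency.sum(axis=1)) - adjacency

        def heat_kernel_signature(laplacian, times):
            """HKS(x, t) = sum_k exp(-lambda_k * t) * phi_k(x)^2 for every vertex x."""
            eigvals, eigvecs = np.linalg.eigh(laplacian)   # L is symmetric positive semidefinite
            return np.stack([(np.exp(-eigvals * t) * eigvecs ** 2).sum(axis=1) for t in times],
                            axis=1)

        # Tiny example: a ring of 8 vertices plus one chord that breaks the symmetry
        n = 8
        W = np.zeros((n, n))
        for i in range(n):
            W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
        W[0, 4] = W[4, 0] = 1.0
        hks = heat_kernel_signature(graph_laplacian(W), times=[0.1, 1.0, 10.0])
        print(hks.shape)   # (8 vertices, 3 diffusion times)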

    Course Outline

    The course is divided in four sections, covering the topics listed below.

    • Theoretical foundations

      Diffusion operators, their spectral properties, Fourier analysis on manifolds, similarities to the classical case − Heat diffusion equation on a Riemannian manifold − The Laplace-Beltrami operator − Diagonalization of Laplacians, relation to joint approximate diagonalization problems − The fundamental solution based on the heat kernel − The discrete heat operator and its basic algebraic properties − Scale-space and heat diffusion − The diffusion and the commute-time distances.
    • Shape representation

      Manifold embedding using the heat operator − Relationship with Laplacian embedding and diffusion embeddings − Geometric and photometric diffusion − Local and global diffusion geometry − Feature detection and feature description − Heat and wave kernel signatures − Optimal spectral descriptors − Convolutional neural networks on manifolds − Volumetric vs. surface diffusion.
    • Applications

      Minimum-distortion similarity and correspondences − Functional correspondence, relation to sparse coding and matrix completion problems − Intrinsic symmetry detection − Shape retrieval, bag-of-feature methods − Benchmarks.
    • Implementation and application examples

      Live demos in MATLAB to exemplify the main concepts of the tutorial.

    Course Prerequisites

    Basic knowledge of signal/image processing, Fourier analysis

    Distributed Material

    Course slides will be available online.

    Bibliographies

    Michael BRONSTEIN is a professor in the Faculty of Informatics at the University of Lugano (USI), Switzerland, and a Research Scientist at the Perceptual Computing group, Intel, Israel. Michael got his B.Sc. in Electrical Engineering (2002) and Ph.D. in Computer Science (2007), both from the Technion, Israel. His main research interests are theoretical and computational methods in spectral and metric geometry and their application to problems in computer vision, pattern recognition, computer graphics, image processing, and machine learning. His research has appeared in international media and has been recognized by numerous awards. In 2012, Michael received the highly competitive European Research Council (ERC) grant. In 2014, he was invited as a Young Scientist to the World Economic Forum New Champions meeting, an honor bestowed on forty of the world's leading scientists under the age of 40. Besides academic work, Michael is actively involved in industry. He was the co-founder of the Silicon Valley start-up company Novafora, where he served as VP of technology (2006-2009), responsible for the development of algorithms for large-scale video analysis. He was one of the principal inventors and technologists at Invision, an Israeli startup developing 3D sensing technology that was acquired by Intel in 2012 and released under the RealSense brand.

  • TPM-T5 – Sparse stochastic processes: A unifying statistical framework for modern image processing

    Instructors

    Michael UNSER, EPFL, Switzerland

    Classroom

    To be announced

    Course Motivation and Description

    Sparsity and compressed sensing are very popular topics in image processing. More and more, researchers are relying on the related l1-type minimization schemes to solve a variety of ill-posed problems in imaging. The paradigm is well established with a solid mathematical foundation, although the arguments that have been put forth in the past are mostly deterministic. In this tutorial, we shall introduce the participants to the statistical side of this story. As an analogy, think of the foundational role of Gaussian stationary processes: these justify the use of the Fourier transform or DCT and lend themselves to the formulation of MMSE/MAP estimators based on the minimization of quadratic functionals.

    The relevant objects here are sparse stochastic processes (SSP), which are continuous-domain processes that admit a parsimonious representation in a matched wavelet-like basis. Thus, they exhibit the kind of sparse behavior that has been exploited by researchers in recent years for designing second-generation algorithms for image compression (JPEG 2000), compressed sensing, and the solution of ill-posed inverse problems (l1 vs. l2 minimization).

    The construction of SSPs is based on an innovation model that is an extension of the classical filtered-white-noise representation of a Gaussian stationary process. In a nutshell, the idea is to replace 1) the traditional white Gaussian noise by a more general continuous-domain entity (Lévy innovation) and 2) the shaping filter by a more general linear operator. We shall present the functional tools for the complete characterization of these generalized processes and the determination of their transform-domain statistics. We shall also describe self-similar models (non-Gaussian variants of fBm) that are well suited for image processing.

    We shall then apply those models to the derivation of statistical algorithms for solving ill-posed problems in imaging. This allows for a reinterpretation of popular sparsity-promoting processing schemes—such as total-variation denoising, LASSO, and wavelet shrinkage—as MAP estimators for specific types of SSPs. It also suggests novel alternative Bayesian recovery procedures that minimize the estimation error (MMSE solution). The concepts will be illustrated with concrete examples of sparsity-based image processing including denoising, deconvolution, tomography, and MRI reconstruction from non-Cartesian k-space samples.
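
    To connect the discussion with the l1-minimization schemes mentioned above, here is a sketch of the iterative shrinkage/thresholding algorithm (ISTA) for min_x 0.5*||Ax - y||^2 + lambda*||x||_1, whose soft-thresholding step is the proximal shrinkage associated with a Laplacian (sparse) prior. The random sensing matrix, sparsity level, and regularization weight below are toy illustrative choices and are not tied to the imaging examples of the tutorial.

        import numpy as np

        def soft_threshold(v, t):
            """Proximal operator of the l1 norm (shrinkage under a Laplacian prior)."""
            return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

        def ista(A, y, lam, n_iter=500):
            """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative shrinkage/thresholding."""
            step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                grad = A.T @ (A @ x - y)             # gradient of the quadratic data term
                x = soft_threshold(x - step * grad, lam * step)
            return x

        # Toy compressed-sensing example: recover a sparse vector from few noisy measurements
        rng = np.random.default_rng(0)
        n, m, k = 200, 80, 8
        x_true = np.zeros(n)
        x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
        A = rng.normal(size=(m, n)) / np.sqrt(m)
        y = A @ x_true + 0.01 * rng.normal(size=m)
        x_hat = ista(A, y, lam=0.02)
        print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))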

    Course Outline

    Introduction

    • - Classical reconstruction algorithms and the Gaussian hypothesis
    • - Variational formulations: from l2- to l1-norm minimization
    • - Compressed sensing

    Part I: Statistical modeling
    An introduction to sparse stochastic processes

    • - Generalized innovation model
    • - Statistical characterization of signals

    Part II: Recovery of sparse signals
    Reconstruction of biomedical images

    • - Discretization of inverse problems
    • - Generic MAP estimator (iterative reconstruction algorithm)
    • - Applications: deconvolution microscopy, MRI, x-ray tomography

    From MAP to MMSE estimation

    • - MMSE estimation of Markov processes
    • - Iterative wavelet-domain MMSE denoising

    Course Prerequisites

    Basic knowledge of statistical signal processing (MAP estimation), optimization techniques (iterative algorithms), and functional analysis (Fourier transform, generalized functions, differential equations)

    Distributed Material

    Copies of the slides
    Complete lecture notes for the tutorial (and beyond) are available on the web at http://www.sparseprocesses.org

    Bibliographies

    Michael UNSER is Professor and Director of EPFL's Biomedical Imaging Group, Lausanne, Switzerland. His main research area is biomedical image processing. He has a strong interest in sampling theories, multiresolution algorithms, wavelets, the use of splines for image processing, and, more recently, stochastic processes. He has published about 250 journal papers on those topics. He is the leading author of “An introduction to sparse stochastic processes”, Cambridge University Press, 2014.

    From 1985 to 1997, he was with the Biomedical Engineering and Instrumentation Program, National Institutes of Health, Bethesda USA, conducting research on bioimaging and heading the Image Processing Group.

    Dr. Unser is a fellow of the IEEE (1999), an EURASIP fellow (2009), and a member of the Swiss Academy of Engineering Sciences. He is the recipient of several international prizes including three IEEE-SPS Best Paper Awards and two Technical Achievement Awards from the IEEE (2008 SPS and EMBS 2010).