Machine Vision Research: A Look at Some Leading University Programs
by Nello Zuech, Contributing Editor - AIA Posted 09/27/2004
While many universities have departments or interdisciplinary centers conducting research in computer vision, most of the research today appears to be aimed at security-related and health-related applications. Security-related research is often biometric-driven, especially face recognition. Several organizations are looking at the under-vehicle inspection application. In addition, I found that much research is being conducted on autonomous vehicles. The objective in these cases is 3D-based development that can cope with round-the-clock outdoor ambient conditions. For the most part, there is very little research of an applied nature targeting industrial applications.
To prepare for this article, we emailed the key researchers at over 50 North American universities identified as conducting research in computer vision or image processing. I received only three responses to my request for help. Consequently, I gathered information for this article by reviewing the websites of those institutions, culling from them descriptions of their work relevant to machine vision. What follows are descriptions of some of the research that is most closely related.
One of the largest research groups is the Vision & Autonomous Systems Center (VASC) within the Robotics Institute at Carnegie Mellon University. VASC comprises over 100 faculty, students, staff, and visitors working in the areas of computer vision, autonomous navigation, virtual reality, intelligent manipulation, space robotics, and related fields, with a great deal of work in face recognition and 3D. One of the more interesting projects they describe is work on transferring essentially industrial 3D research into the construction industry.
As they describe, the construction industry suffers from costly remedies associated with late defect detection at construction sites. Frequent and accurate assessment of the status of work-in-place, identifying critical spatio-temporal and quality-related deviations, and predicting the impacts of these deviations during a construction project are necessary for active project control and for developing an accurate project history. This research project builds on, combines, and extends the advances in generating 3D environments using laser scanners, collecting quality information about built environments using embedded sensors, and generating and utilizing semantically rich Architecture/Engineering/Construction (A/E/C) project models, in developing an integrated early defect detection system. The research objectives include: (1) formulating strategies and mechanisms to utilize laser scanning and embedded sensor systems for frequent and accurate collection and representation of spatial and quality-related as-built data, (2) developing mechanisms for integrating and interpreting data acquired from these systems with the project model, (3) developing a general, flexible, and integrated representation schema to model product, process, and as-built information, and (4) formalizing mechanisms for automated defect detection and management. The expected contributions of this research fall within the fields of robotics, embedded sensing in civil engineering, and A/E/C project modeling and analysis. The societal impacts of this research include potential savings in rework and maintenance costs.
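The core of objective (4), automated defect detection, amounts to comparing as-built measurements against the as-designed project model. A minimal sketch of that comparison, assuming pre-matched point correspondences and an invented 2 cm tolerance (neither taken from the CMU project):

```python
import numpy as np

# Hypothetical sketch: compare as-built laser-scan points against planned
# (as-designed) coordinates and flag deviations beyond a tolerance.
# The point correspondences and the 2 cm tolerance are assumptions.

def detect_defects(as_built, as_designed, tolerance=0.02):
    """Return indices of points deviating more than `tolerance` metres."""
    deviations = np.linalg.norm(as_built - as_designed, axis=1)
    return np.flatnonzero(deviations > tolerance), deviations

# Three points on a planned ceiling at 3.00 m; the scan shows one sagging.
planned = np.array([[0.0, 0.0, 3.00], [1.0, 0.0, 3.00], [2.0, 0.0, 3.00]])
scanned = np.array([[0.0, 0.0, 3.01], [1.0, 0.0, 2.95], [2.0, 0.0, 3.00]])

flagged, dev = detect_defects(scanned, planned)
print(flagged)  # → [1]: the second point is 5 cm out of tolerance
```

A production system would first register the scan to the model and establish correspondences, which is where most of the research effort lies.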
The goal of the Cameron project within the Colorado State University’s Computer Vision Group is to make FPGAs and other adaptive computer systems available to more applications programmers, by raising the abstraction level from hardware circuits to software algorithms. To this end, they have developed a variant of the C programming language and an optimizing compiler that maps high-level programs directly onto FPGAs, and have tested the language and compiler on a variety of image processing (and other) applications.
Over the past 30 years, the Purdue Robot Vision Lab has developed some of the most efficient and robust vision systems for 3D object recognition and localization. They have investigated 3D object recognition using both structured-light sensors for 3D map generation and conventional TV camera sensors for stereo vision.
Four working 3D/2D vision systems are fully operational in the lab at this time. One of these systems is the Tubular Objects Bin-Picking System, developed originally with funding from Nippondenso Corporation. The second, called MULTI-HASH, was developed with funding from various government agencies and can be used for recognition and localization of a fairly large class of 3D objects that may be non-polyhedral and concave. The third system was developed for recognition and pose estimation of alternator covers (automobile parts) as one example of bin-picking tasks using conventional TV cameras for stereo vision. Finally, the fourth system was developed for Ford and aimed to automate the task of unloading tires from the back of a semi-truck and to estimate the pose of torque converters.
The University of Louisville has developed CardEye, an experimental trinocular, active vision system. The system uses an agile trinocular vision setup mounted on a three-segment robotic arm. It has the same degrees of freedom as the human eyes and neck: convergence, pan, tilt, roll, zoom, focus, and iris control. The immediate application of the invention was building a 3D model of the environment. The 3D model can be used in high-level vision tasks such as object recognition, object tracking, and robot navigation.
Another project developed a vision-guided robot system capable of guiding a robot arm to a chosen destination in 3D space based purely on visual cues. This system allows the robot arm to track a moving object in 3D space through a camera mounted on the arm or to guide the arm to an object in 3D space. This project serves as a natural continuation of the CardEye project by adding mobility to its list of features. The project targeted developing a vision-guided robotic gas pump to automate the refueling of cars.
Another project employed commercial products and software developed by the CVIP laboratory research team to inspect, classify and sort ceramic tiles.
Caltech describes one project as "early vision." It involves the analysis of image sequences with the purpose of recognizing objects, calculating the shape and physical properties of surfaces, and determining position and motion relationships of rigid and deformable bodies in a scene. The laboratory studies the computational aspects of vision with the objective of building machines that can see. Their approach is both analytical and experimental, drawing insight from geometry, optics, signal processing, and functional analysis, and from computational simulations and psychophysical tests.
Early vision is the first stage of visual processing. Early visual processes compute elementary properties of images such as brightness, texture, color, motion flow, and stereo disparity. The gradient of these quantities is then used to segment the image into its component regions. In images useful information tends to be localized in space, orientation, and scale of resolution. They are developing efficient signal processing techniques for analyzing images along these coordinates, mimicking computations that are performed in biological visual systems. These techniques consist of filtering the image with families of kernels, which are obtained by rotating, scaling and deforming an original small set of impulse responses. The filtering stage is followed by simple nonlinearities such as rectification and inhibition leading to descriptors of texture, brightness boundaries, motion flow and stereo disparity.
Another project is related to autonomous navigation. From sequences of images taken from a moving vehicle, one can reconstruct the structure of the ambient space and the motion of the vehicle. One may use this information for, for example, controlling the vehicle's trajectory. They are studying the use of vision as an input signal for controlling dynamical systems such as wheeled and legged vehicles and helicopters. They are also conducting psychophysical experiments to investigate the nature of the mechanisms in the human brain that allow one to calculate motion parameters and scene structure.
Another project involves shape reconstruction. One may calculate the 3D shape of objects from 2D images using a number of visual cues: stereoscopy, texture, motion parallax, shading, and boundaries. They are studying techniques and representations that allow this reconstruction to happen. They are also measuring the representation of shape in the human brain and the mechanisms that it uses for calculating it.
Another project is targeted at visual pattern recognition. The human visual system can detect and recognize complex visual patterns with a minimal amount of training. This capability is very useful for recognizing objects, inspecting industrial products, and organizing perception of cluttered environments. They are working to duplicate this capability in an artificial system. Starting from a small set of sample images of an object, or of a category of objects, the system should be able to detect and locate instances of the object in previously unseen images. In collaboration with JPL, they are applying these techniques to detecting and measuring interesting geological features in planetary images.
At Michigan State’s Pattern Recognition and Image Processing (PRIP) Lab faculty and students investigate the use of machines to recognize patterns or objects. Methods are developed to sense objects, to discover which of their features distinguish them from others, and to design algorithms that can be used by a machine to do the classification. Many practical applications use a sensed image to initially represent the object, and so much of the PRIP Lab research deals with images. A significant portion of their research focuses on the development of algorithms to do feature extraction and matching and on the organization of data to support efficient matching. Important applications include face recognition, fingerprint identification, document image analysis, 3D object recognition, robot navigation, and visualization/exploration of 3D volumetric data.
At Penn State the goal of the Computer Vision Laboratory is to make computers understand and interpret visual information. Computer-vision systems bring together imaging devices, computers, and sophisticated algorithms for solving problems in areas such as industrial inspection, medicine, document analysis, autonomous navigation, and remote sensing.
Research spans a broad range of multidisciplinary topics, such as autonomous navigation, automated visual inspection, document image analysis, human-computer interface systems, multidimensional medical imaging and visualization, object recognition, telerobotics, and visual information management.
One of their interesting projects is Computer Vision-based Gesture Analysis for Display Control. In everyday life, the natural communication between people consists of a complex mixture of speech, body movements, facial expressions, and eye motions. Clearly, the most natural means of human communication is multimodal. Their long-term goal is to develop a natural HCI framework in which many different sensing modalities are used simultaneously and cooperatively for interpreting the human input to the computer. They are exploring the use of computer vision to interpret human motion (e.g., hand gestures) as part of a multimodal interface. They are addressing the problems of tracking a human hand and recognizing hand gestures. The recognition task is set in the context of a combined speech/gesture interface for controlling a graphical display. The gesture analysis involves extracting the user's hand from the background, distinguishing a meaningful gesture from unintentional hand movements using context, and resolving conflicts between gestures from multiple users. A challenge of gesture analysis in the multimodal HCI setting is finding ways to improve gesture recognition using, for example, speech recognition and gaze direction. They are exploring the use of hidden Markov models for the combined speech/gesture analysis.
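The HMM-scoring idea can be illustrated with a minimal sketch: quantized hand-motion observations are scored under competing gesture models using the forward algorithm, and the likelier model wins. All parameters and the two gesture classes below are invented for illustration, not taken from the Penn State system:

```python
import numpy as np

# Hypothetical two-gesture classifier: score an observation sequence of
# quantized motion directions (0 = rightward, 1 = leftward) under two
# 2-state HMMs and pick the more likely gesture.

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of obs under an HMM (scaled forward algorithm)."""
    alpha = pi * B[:, obs[0]]           # initial state-observation weights
    logp = np.log(alpha.sum())
    alpha /= alpha.sum()                # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict, then weight by emission
        logp += np.log(alpha.sum())
        alpha /= alpha.sum()
    return logp

pi = np.array([0.5, 0.5])                     # initial state distribution
A = np.array([[0.8, 0.2], [0.2, 0.8]])        # state transitions
B_right = np.array([[0.9, 0.1], [0.8, 0.2]])  # "right swipe": mostly symbol 0
B_left = np.array([[0.1, 0.9], [0.2, 0.8]])   # "left swipe": mostly symbol 1

obs = [0, 0, 1, 0]                            # mostly rightward motion
score_r = forward_loglik(obs, pi, A, B_right)
score_l = forward_loglik(obs, pi, A, B_left)
print("right-swipe" if score_r > score_l else "left-swipe")  # → right-swipe
```

In a combined speech/gesture interface, the same scoring step would run over jointly modeled speech and gesture observation streams, which is where the research challenge lies.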
The Computer Vision Laboratory at the University of Southern California, located in Los Angeles, California, has been one of the major centers of computer vision research for over twenty years. They conduct research in a number of basic and applied areas. Specific topics include image and scene segmentation, stereo and motion analysis, range analysis, perceptual grouping, shape analysis, and object recognition. They have worked on applications to robotics and manufacturing, mobile robots, and aerial photo interpretation. Their approach emphasizes the use of segmented (part-based) symbolic descriptions of objects. Such representations have long been believed to have significant advantages over the alternatives but have been difficult to infer from sensed data.
The UCLA Vision Lab is engaged in a variety of projects involving processing visual information to retrieve a model of the environment for the purpose of control and interaction with the environment and with humans. Dynamic vision offers the potential for applications that can have a positive social impact by assisting humans in decision and control tasks performed by processing sensory information, such as recognition, classification, navigation, manipulation, and tracking. In an industrial setting, computational sensing has already resulted in several products that relieve humans from tasks that are repetitive (e.g., detecting imperfections in fabric or manufactured parts), stressful (e.g., security), or dangerous (e.g., maintenance of underwater platforms or power plants). In transportation, several major companies have working prototypes of automatic guidance systems for passenger cars and trucks (although the systems are complete and operational, they are not currently deployed due to unresolved legal issues). Naturally, the military is very sensitive to the potential of computational sensing systems. Additional industries that are increasingly involved in computational vision are entertainment (image-based modeling and rendering, visual insertion, architectural models), health care (assisted/teleoperated surgery, tomography, imaging, brain mapping), and the computer industry (human-computer interfaces). The uncertain, complex, and dynamic nature of the physical world and the intrinsic ill-posedness of the general vision problem bring about an unusual combination of mathematical tools from the fields of differential geometry (for the study of shape), dynamical systems (for the study of motion and deformation), and functional analysis (images are functions of the radiance distribution of physical surfaces).
Uncertainty is often captured in a probabilistic sense through the use of stochastic processes, and computational perception is posed as a statistical inference or functional optimization problem.
The University of Wisconsin - Madison Computer Vision Group conducts research in designing, implementing and experimenting with important components of computer vision systems. This includes modules for image analysis and scene understanding in the areas of
- View synthesis (a.k.a. image-based rendering)
- Visual exploration of objects by purposive viewpoint control
- Motion analysis (periodic motion detection, dynamic perceptual organization)
- Three-dimensional shape representation and recognition, especially image-based representations
- Deformable contours
In addition, they have studied methods for using vision, images and image streams for specific image-guided activities in the areas of
- Robotics: Vision for 3D motion planning in unknown environments
- Visualization: Visualizing user-defined data types for interactively steering and visual experimentation during algorithm development
While the research at only a few universities has been reviewed in these two articles, the nature of the research being conducted at these universities is consistent with the type of research related to machine vision being conducted at many North American universities.