This project aims to develop biologically relevant deep recurrent models for generalized object recognition. As a first step, we plan to develop an efficient deep recurrent model for complex object recognition tasks such as face recognition, facial expression recognition, and character recognition from natural images. In recent years, deep neural networks have shown excellent performance on such tasks. The gain in performance comes from a corresponding increase in the size and depth of the network and the addition of thousands of active neurons. This, in turn, requires training a huge number of free parameters, which is computationally intensive. Therefore, we propose a simultaneous recurrent network (SRN) based auto-encoder for object recognition that significantly reduces the number of trainable parameters by sharing weights in the hidden layers. The simultaneous recurrency has the effect of unfolding the SRN through time, potentially enabling the design of an arbitrarily deep network. Furthermore, the inherent forward and recurrent connections make the SRN more biologically plausible than generic feed-forward architectures. Additionally, the SRN based deep network model can extract meaningful features from raw input data, which can then serve as input to any feature classification technique such as metric learning, SVMs, k-means, or kernel methods.
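The weight-sharing argument can be illustrated with a small NumPy sketch. This is not the project's implementation: the function names, the tanh nonlinearity, the layer sizes, and the random initialization are all assumptions chosen for illustration. It shows one shared hidden layer iterated several times on a fixed input, and compares its parameter count with a feed-forward network of the same effective depth in which every layer has its own weights.

```python
import numpy as np

def srn_encode(x, W_in, W_rec, b, steps):
    """Iterate a single shared hidden layer `steps` times on a fixed input x.

    Hypothetical helper: unfolding the loop gives a network that is
    `steps` layers deep but reuses the same W_in, W_rec, and b throughout.
    """
    h = np.zeros(W_rec.shape[0])
    for _ in range(steps):
        # Same weights at every iteration: this is the weight sharing.
        h = np.tanh(W_in @ x + W_rec @ h + b)
    return h

def srn_param_count(n_in, n_hidden):
    """Trainable parameters of the shared layer, independent of `steps`."""
    return n_in * n_hidden + n_hidden * n_hidden + n_hidden

def feedforward_param_count(n_in, n_hidden, depth):
    """Parameters of a feed-forward net with `depth` separately weighted layers."""
    first = n_in * n_hidden + n_hidden
    rest = (depth - 1) * (n_hidden * n_hidden + n_hidden)
    return first + rest

# Illustrative sizes only (assumed, not taken from the project).
rng = np.random.default_rng(0)
n_in, n_hidden, steps = 64, 32, 5
W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
x = rng.normal(size=n_in)

features = srn_encode(x, W_in, W_rec, b, steps)
shared = srn_param_count(n_in, n_hidden)
unshared = feedforward_param_count(n_in, n_hidden, steps)
print(features.shape, shared, unshared)
```

In this sketch the shared count stays fixed as `steps` grows, while the feed-forward count grows linearly with depth, which is the saving the paragraph above describes.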

Figure: Object recognition pipeline using deep recurrent model

Metric learning has been successful in distance-based classification tasks. However, metric learning becomes increasingly complex as the input feature dimensionality grows. Therefore, an efficient feature extraction and dimensionality reduction step is commonly applied prior to metric learning. Conventional feature extraction and dimensionality reduction techniques used for metric learning are usually hand-crafted and may not offer the best overall performance. Contemporary methods such as deep neural networks (DNNs) have been combined with metric learning to improve feature extraction and dimensionality reduction through learning. While DNNs have exhibited excellent performance, such deep structures tend to become cumbersome as task complexity increases, as mentioned above. Consequently, we consider the SRN based deep network model an efficient feature extraction and dimensionality reduction technique for metric learning based algorithms.
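As a rough illustration of distance-based classification on low-dimensional features, the sketch below classifies a query by its nearest neighbor under a Mahalanobis-style distance. It is a stand-in, not the project's metric learning algorithm: the metric here is simply the regularized inverse of the pooled within-class covariance, and the function names, the toy data, and the `reg` parameter are assumptions made for the example. In the described pipeline, the feature vectors would come from the SRN based model rather than from random clusters.

```python
import numpy as np

def fit_metric(features, labels, reg=1e-3):
    """Return M, the regularized inverse of the pooled within-class covariance.

    A classical closed-form choice used here as a placeholder for a
    learned metric (assumption, not the project's method).
    """
    d = features.shape[1]
    S = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = features[labels == c]
        Xc = Xc - Xc.mean(axis=0)   # center each class on its own mean
        S += Xc.T @ Xc
    S /= len(features)
    return np.linalg.inv(S + reg * np.eye(d))

def mahalanobis_sq(a, b, M):
    """Squared distance (a - b)^T M (a - b)."""
    diff = a - b
    return float(diff @ M @ diff)

def predict_1nn(query, features, labels, M):
    """Label of the training feature closest to `query` under metric M."""
    dists = [mahalanobis_sq(query, f, M) for f in features]
    return labels[int(np.argmin(dists))]

# Toy 4-dimensional "extracted features": two well-separated classes.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=0.0, scale=0.5, size=(20, 4))
class1 = rng.normal(loc=5.0, scale=0.5, size=(20, 4))
train_x = np.vstack([class0, class1])
train_y = np.array([0] * 20 + [1] * 20)

M = fit_metric(train_x, train_y)
predicted = predict_1nn(np.full(4, 5.0), train_x, train_y, M)
print(predicted)
```

The cost of estimating and inverting the covariance grows quickly with the feature dimension, which is one reason a compact feature representation is useful before this step.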
The proposed SRN based deep architecture is tested on several complex recognition tasks: face recognition, facial expression recognition, and character recognition. Our results show that the proposed recognition framework achieves superior performance (2%-6% better accuracy) in comparison to state-of-the-art DNN-based models.
A practical demonstration of our proposed recognition framework integrated into a humanoid robotic platform called NAO is shown in the following video demo.