Special Focus: Deep Learning

Release Date: 2019-11-27

Guest Editorial Board for this Special Focus:

WANG Liwei, Peking University

MENG Deyu, Xi'an Jiaotong University

YANG Yi, University of Technology Sydney

YU Jian, Beijing Jiaotong University

About the Special Focus

"The remarkable achievements of deep learning have made it a hotspot eagerly pursued by academia, engineering, and society at large, and even by governments. Its applications have also expanded from computer vision, speech recognition, and natural language processing to broader areas such as audio recognition, social network filtering, machine translation, automatic control, bioinformatics, and drug discovery.

This Special Focus aims to promote the development of deep learning in theory, algorithms, and applications."

This Special Focus contains five articles, covering recent developments and hot topics in deep learning, face recognition, robot manipulation, machine vision, and image classification.

Some New Trends of Deep Learning Research

MENG Deyu, SUN Lina   

Deep learning has attracted increasing attention throughout science and engineering over the recent decade due to its wide range of successful applications. In real problems, however, most stages of applying deep learning still require manual intervention, which makes it less accessible to general users with limited expertise and also deviates from human-like intelligence. It is thus a challenging yet critical issue to enhance the level of automation across all elements of the deep learning framework, such as input amelioration, model design and learning, and output adjustment. This paper lists several representative issues of this research topic, briefly describes their recent research progress, and discusses related works proposed along this research line. Some specific challenging open problems are also presented.

DOI:10.1049/cje.2019.07.011

Face Liveness Detection Based on the Improved CNN with Context and Texture Information

GAO Chenqiang, LI Xindou, ZHOU Fengshun, MU Song

Face liveness detection, a key module of real face recognition systems, distinguishes a fake face from a real one. In this paper, we propose an improved Convolutional neural network (CNN) architecture with two bypass connections that simultaneously exploits low-level detailed information and high-level semantic information. Given the importance of texture information for describing face images, texture features are also adopted under the conventional recognition framework of a Support vector machine (SVM). The improved CNN and the texture-feature-based SVM are then fused. Context information, which is usually neglected by existing methods, is also well utilized in this paper. We test the proposed method on two widely used datasets, and extensive experiments show that it outperforms state-of-the-art methods.
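The two ideas in this abstract, a bypass connection carrying low-level detail forward alongside high-level semantics, and fusing the CNN with the texture-feature SVM, can be sketched as follows. This is a minimal illustration, not the paper's architecture: the channel counts, the concatenation axis, and the fusion weight `w` are all hypothetical.

```python
import numpy as np

def bypass_concat(low_level, high_level):
    """Concatenate low-level detail features with high-level semantic
    features along the channel axis, as a bypass connection does.
    Feature shapes here are (channels, height, width) and illustrative."""
    return np.concatenate([low_level, high_level], axis=0)

def fuse_scores(cnn_score, svm_score, w=0.5):
    """Score-level fusion of the CNN branch and the texture-feature SVM
    branch; the weight w is a hypothetical fusion parameter."""
    return w * cnn_score + (1 - w) * svm_score
```

A downstream classifier would consume the concatenated features, while `fuse_scores` combines the two branches' liveness scores into a final decision value.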

DOI:10.1049/cje.2019.07.012

Generating Basic Unit Movements with Conditional Generative Adversarial Networks

LUO Dingsheng, NIE Mengxi, WU Xihong

Arm motion control is fundamental for a robot to accomplish complicated manipulation tasks, and different movements can be organized by configuring a series of motion units. Our work aims at equipping the robot with the ability to carry out Basic unit movements (BUMs), which constitute various motion sequences so that the robot can drive its hand to a desired position. With the definition of BUMs, we explore a learning approach that enables the robot to develop this ability by leveraging deep learning techniques. To generate a BUM for the current arm state, an internal inverse model is developed, and we propose to use Conditional generative adversarial networks (CGANs) to establish this inverse model. Experimental results on the humanoid robot PKU-HR6.0II show that CGANs can successfully generate multiple solutions for a given BUM, and that these BUMs can be combined into effective reaching movements.
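The conditioning mechanism of a CGAN generator can be sketched in a few lines: the condition (here, the current arm state) is concatenated with the noise vector before the network layers. This is a generic two-layer sketch under assumed dimensions, not the paper's network; the weights `W1`, `W2` are hypothetical placeholders for trained parameters.

```python
import numpy as np

def generator(z, arm_state, W1, W2):
    """Minimal conditional-generator sketch: concatenate the noise
    vector z with the conditioning arm state, then apply two tanh
    layers. W1 and W2 stand in for trained weight matrices."""
    x = np.concatenate([z, arm_state])
    h = np.tanh(W1 @ x)
    return np.tanh(W2 @ h)
```

Sampling several noise vectors `z` while holding `arm_state` fixed yields several candidate movements for the same condition, which mirrors the multiple solutions the abstract reports.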

DOI:10.1049/cje.2019.07.013  

Multi-vision Attention Networks for On-line Red Jujube Grading

SUN Xiaoye, MA Liyan, LI Gongyan

To solve the red jujube classification problem, this paper designs a convolutional neural network model with low computational cost and high classification accuracy. The architecture of the model is inspired by the multi-visual mechanism of organisms and by DenseNet. To further improve the model, we add the attention mechanism of SE-Net. We also construct a dataset of 23,735 red jujube images captured by a jujube grading system. According to the appearance of the jujubes and the characteristics of the grading system, the dataset is divided into four classes: invalid, rotten, wizened, and normal. Numerical experiments show that the classification accuracy of our model reaches 91.89%, which is comparable to DenseNet-121, InceptionV3, InceptionV4, and Inception-ResNet v2, and the model runs in real time.
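The SE-Net attention mechanism the abstract borrows can be sketched as a squeeze-and-excitation block: global average pooling per channel, a small bottleneck of two fully connected layers, then per-channel rescaling. This is a generic numpy illustration of the mechanism, not the paper's implementation; `W1` and `W2` stand in for the trained excitation weights.

```python
import numpy as np

def se_block(feature_map, W1, W2):
    """Squeeze-and-excitation sketch over a (channels, height, width)
    feature map; W1 and W2 are hypothetical bottleneck weights."""
    # squeeze: global average pooling per channel
    s = feature_map.mean(axis=(1, 2))
    # excitation: bottleneck FC layers, ReLU then sigmoid
    h = np.maximum(W1 @ s, 0.0)
    w = 1.0 / (1.0 + np.exp(-(W2 @ h)))
    # rescale each channel by its learned importance in (0, 1)
    return feature_map * w[:, None, None]
```

The block preserves the feature-map shape, so it can be dropped between convolutional stages at little computational cost, which fits the paper's low-cost design goal.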

DOI:10.1049/cje.2019.07.014

Multi-label Image Classification via Coarse-to-Fine Attention

LYU Fan, LI Linyan, Victor S. Sheng, FU Qiming, HU Fuyuan

Great efforts have been made to recognize multi-label images with deep neural networks. Since multi-label image classification is very complicated, many studies use the attention mechanism as a kind of guidance. Conventional attention-based methods analyze images directly and aggressively, which makes it difficult to understand complicated scenes well. We propose a global/local attention method that recognizes a multi-label image from coarse to fine by mimicking how human beings observe images: it first concentrates on the whole image, and then focuses on specific local objects. We also propose a joint max-margin objective function, which enforces that the minimum score of the positive labels should be larger than the maximum score of the negative labels, both horizontally and vertically; this function further improves our multi-label classification method. We evaluate the effectiveness of our method on two popular multi-label image datasets (i.e., Pascal VOC and MS-COCO), and our experimental results show that it outperforms state-of-the-art methods.
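The core of the max-margin constraint, that the minimum positive-label score must exceed the maximum negative-label score, can be written as a hinge penalty. This is a per-image simplification for illustration (the paper applies the constraint horizontally and vertically over a label matrix); the margin value is illustrative.

```python
import numpy as np

def joint_max_margin(scores, labels, margin=1.0):
    """Hinge penalty that is zero only when the minimum score over
    positive labels exceeds the maximum score over negative labels
    by at least the margin. labels is a 0/1 vector."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    return max(0.0, margin - (pos.min() - neg.max()))
```

When the positive labels are already well separated from the negatives, the penalty vanishes; otherwise it grows with the violation, pushing the two score groups apart during training.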

DOI:10.1049/cje.2019.07.015
