Hengmin Zhang, Jian Yang, Wenli Du, et al., “Enhanced acceleration for generalized nonconvex low-rank matrix learning,” Chinese Journal of Electronics, vol. 34, no. 1, pp. 98–113, 2025. DOI: 10.23919/cje.2023.00.340

Enhanced Acceleration for Generalized Nonconvex Low-Rank Matrix Learning

More Information
  • Author Bio:

Zhang Hengmin: Hengmin Zhang received the Ph.D. degree from Nanjing University of Science and Technology, Nanjing, China, in 2019. He was a Post-Doctoral Fellow with School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China, and also a Post-Doctoral Fellow with the Pattern Analysis and Machine Intelligence Research Group, Department of Computer and Information Science, University of Macau, Macau, China, from 2019 to 2022. He is currently a Research Fellow with School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He has published more than 30 technical papers at prominent journals and conferences. His research interests include sparse coding and low-rank matrix recovery, nonconvex optimizations, and large-scale representation learning methods. (Email: hengmin.zhang@ntu.edu.sg)

    Yang Jian: Jian Yang received the Ph.D. degree in pattern recognition and intelligence systems from Nanjing University of Science and Technology (NJUST), Nanjing, China, in 2002. In 2003, he was a Post-Doctoral Researcher with University of Zaragoza, Zaragoza, Spain. From 2004 to 2006, he was a Post-Doctoral Fellow with Biometrics Centre, The Hong Kong Polytechnic University, Hong Kong, China. From 2006 to 2007, he was a Post-Doctoral Fellow with Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA. He is currently a Chang-Jiang Professor with School of Computer Science and Engineering, NJUST, Nanjing, China. He has authored more than 400 scientific papers in pattern recognition, computer vision, and machine learning. His papers have been cited more than 41000 times in the Google Scholar. His research interests include pattern recognition, computer vision, and machine learning. Moreover, Prof. Yang is a Fellow of International Association for Pattern Recognition, i.e., IAPR Fellow. He is/was an Associate Editor of Pattern Recognition, Pattern Recognition Letters, IEEE Transactions on Neural Networks and Learning Systems, and Neurocomputing. (Email: csjyang@njust.edu.cn)

    Du Wenli: Wenli Du received the B.S. and M.S. degrees in chemical process control from Dalian University of Technology, Dalian, China, in 1997 and 2000, respectively, and the Ph.D. degree in control theory and control engineering from East China University of Science and Technology, Shanghai, China, in 2005. She is currently a Professor of School of Information Science and Engineering and serves as the Dean of Graduate School, East China University of Science and Technology, Shanghai, China, and is also the Vice-Director of Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, China. Her research interests include control theory and applications, system modeling, advanced control, and process optimization. (Email: wldu@ecust.edu.cn)

    Zhang Bob: Bob Zhang received the Ph.D. degree in electrical and computer engineering from University of Waterloo, Waterloo, ON, Canada, in 2011. He was with Center for Pattern Recognition and Machine Intelligence and later was a Post-Doctoral Researcher with Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA. He is currently an Associate Professor with Department of Computer and Information Science, University of Macau, Macau, China. In addition, he is/was also a Technical Committee Member of the IEEE Systems, Man, and Cybernetics Society, and an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics: Systems, IEEE Transactions on Neural Networks and Learning Systems, Artificial Intelligence Review, and IET Computer Vision. His research interests include biometrics, pattern recognition, feature extraction/detection, and image processing. (Email: bobzhang@um.edu.mo)

    Zha Zhiyuan: Zhiyuan Zha received the Ph.D. degree from Nanjing University, Nanjing, China, in 2018. He is currently a Senior Post-Doctoral Research Fellow with Nanyang Technological University, Singapore. Dr. Zha was a recipient of the Platinum Best Paper Award and the Best Paper Runner-Up Award at the IEEE International Conference on Multimedia and Expo (ICME) in 2017 and 2020, respectively. He has been an Associate Editor of The Visual Computer since 2023. His research interests include inverse problems in image/video processing, sparse signal representation, and machine learning. (Email: zhiyuan.zha@ntu.edu.sg)

    Wen Bihan: Bihan Wen received the B.S. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2012, and the M.S. and Ph.D. degrees in electrical and computer engineering from University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 2015 and 2018, respectively. He is currently a Nanyang Assistant Professor with School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. He was a recipient of the 2016 Yee Fellowship and the 2012 Professional Engineers Board Gold Medal, Singapore. He was also a recipient of the Best Paper Runner Up Award at the IEEE International Conference on Multimedia and Expo in 2020. He has been an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology since 2022, and an Associate Editor of MDPI Micromachines since 2021. He is a Guest Editor for IEEE Signal Processing Magazine from 2021 to 2023, and a Guest Editor for IEEE Journal of Selected Topics in Signal Processing from 2023 to 2025. His research interests include machine learning, computational imaging, computer vision, image and video processing, and big data applications. (Email: bihan.wen@ntu.edu.sg)

  • Corresponding author:

    Wen Bihan, Email: bihan.wen@ntu.edu.sg

  • Received Date: October 25, 2023
  • Revised Date: November 26, 2023
  • Accepted Date: January 09, 2024
  • Available Online: March 01, 2024
  • Published Date: March 01, 2024
  • Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks like image inpainting, clustering, classification, and reconstruction. However, they come with inherent biases and computational burdens, especially when used to relax the rank function, making them less effective and efficient in real-world scenarios. To address these challenges, our research focuses on generalized nonconvex rank regularization problems in robust matrix completion, low-rank representation, and robust matrix regression. We introduce innovative approaches for effective and efficient low-rank matrix learning, grounded in generalized nonconvex rank relaxations inspired by various relaxed substitutes for the $\ell_0$-norm. These relaxations allow us to more accurately capture low-rank structures. Our optimization strategy employs a nonconvex and multi-variable alternating direction method of multipliers, backed by rigorous theoretical analysis for complexity and convergence. This algorithm iteratively updates blocks of variables, ensuring efficient convergence. Additionally, we incorporate the randomized singular value decomposition technique and/or other acceleration strategies to enhance the computational efficiency of our approach, particularly for large-scale constrained minimization problems. In conclusion, our experimental results across a variety of image vision-related application tasks demonstrate the superiority of our proposed methodologies in terms of both efficacy and efficiency when compared to most other related learning methods.

  • In recent years, the recovery of low-rank matrices has found widespread applications in pattern recognition, machine learning, and computer vision [1]–[6]. This field encompasses various well-established methods, including robust principal component analysis (RPCA) [7], matrix completion (MC) [8], robust matrix completion (RMC) [9], low-rank representation (LRR) [10], and robust matrix regression (RMR) [11]. Additionally, several variants have been explored in recent research [12]–[16]. It is noteworthy that both RMC and LRR can be regarded as extensions of MC and RPCA. These fundamental concepts were originally introduced through rank minimization problems, as follows:

$$\text{RMC:}\quad \min_{Z}\ \lambda\,\mathrm{rank}(Z)+\|P_{\Omega}(D)-P_{\Omega}(Z)\|_q^q \tag{1}$$
$$\text{LRR:}\quad \min_{Z}\ \lambda\,\mathrm{rank}(Z)+\|D-DZ\|_{\ell} \tag{2}$$
$$\text{RMR:}\quad \min_{x}\ \lambda\,\mathrm{rank}(D-A(x))+\|x\|_q^q \tag{3}$$

where $\lambda>0$ represents the regularization parameter. In both Problems (1) and (2), the symbol $D$ represents the data matrix. We define $P_{\Omega}(Z)_{i,j}$ as $Z_{i,j}$ if $(i,j)\in\Omega$, and as $0$ otherwise. The rank of $Z$ corresponds to the number of non-zero singular values of the matrix. We employ norms such as $\|\cdot\|_q^q$ (with $q>0$) and $\|\cdot\|_{\ell}$ (e.g., $\ell=1,2$, or the $\ell_{2,1}$-norm) to describe various residual styles, including Laplace, Gaussian, and column-wise residuals, respectively. In Problem (3), $D$ represents the testing sample matrix. The expression $A(x)=x_1A_1+x_2A_2+\cdots+x_nA_n$ denotes a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}^{p\times q}$, where each $A_i$ belongs to $\mathbb{R}^{p\times q}$. We use $A^{*}(\cdot)$ to represent its adjoint operator, and $\|x\|_q^q$ signifies the coefficient representation norm, where the $\ell_1$-norm and $\ell_2$-norm correspond to setting $q$ to $1$ and $2$, respectively. The convex norms mentioned above have been thoroughly examined in previous studies [14], [17]–[19].
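As a concrete illustration of the notation above, the sampling operator $P_\Omega$ can be realized with a boolean mask of observed entries; the following is a minimal sketch (the function name and mask representation are our own, not from the paper):

```python
import numpy as np

def P_Omega(Z, mask):
    """Sampling operator P_Omega: keeps Z[i, j] on the observed support
    (mask[i, j] is True) and sets all other entries to zero."""
    return np.where(mask, Z, 0.0)
```

In the RMC residual of Problem (1), the loss is then evaluated only on the observed support, e.g., on `P_Omega(D, mask) - P_Omega(Z, mask)`.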

In general, the problems presented in (1)–(3) are non-deterministic polynomial (NP)-hard and challenging to solve due to the nonconvex and nonsmooth nature of the rank function. Notably, this formulation extends several existing low-rank matrix recovery problems by replacing $f_\lambda(X;p)$ with various norms, including the nuclear norm [8], Schatten-$p$ norm ($p>0$) [9], weighted Schatten-$p$ norm ($p>0$) [20], and weighted/truncated nuclear norm [21], [22]. Additionally, we demonstrate that convex and nonconvex $\ell_0$-norm relaxations (such as the $\ell_p$-norm ($p>0$) [9], exponential-type penalty (ETP) [23], Laplace [24], and Geman [25] functions), listed in Table 1, can be extended to relax the rank function by acting on the singular values of the matrix $Z$ [20], [26], [27]. These properties naturally motivate us to develop feasible solutions to efficiently obtain low-rank solutions in a proper way. Moreover, our primary focus in this work centers on the investigation of problem formulations, optimization algorithms, and performance evaluations related to effective and efficient nonconvex low-rank matrix learning.

Table 1. Several nonconvex $\ell_0$-norm relaxation formulations for low-rank matrix measurements

Function name | Function formula $f_\lambda(X;p)$
$\ell_p$-norm [9] | $\sum_{i=1}^{r}\lambda\sigma_i^p(X)$
ETP [23] | $\sum_{i=1}^{r}\lambda\,\dfrac{1-\exp(-p\sigma_i(X))}{1-\exp(-p)}$
Laplace [24] | $\sum_{i=1}^{r}\lambda\left(1-\exp\left(-\dfrac{\sigma_i(X)}{p}\right)\right)$
Geman [25] | $\sum_{i=1}^{r}\lambda\,\dfrac{\sigma_i(X)}{\sigma_i(X)+p}$
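To make Table 1 concrete, the four relaxations can be evaluated on a vector of singular values in a few lines of NumPy; this is an illustrative sketch (function names and signatures are our own):

```python
import numpy as np

# Each function returns f_lambda(X; p) given the singular values sigma of X,
# following the formulas in Table 1.
def lp_norm(sigma, lam, p):
    """l_p-norm relaxation [9]: sum_i lam * sigma_i^p."""
    return lam * np.sum(sigma ** p)

def etp(sigma, lam, p):
    """Exponential-type penalty (ETP) [23]."""
    return lam * np.sum((1 - np.exp(-p * sigma)) / (1 - np.exp(-p)))

def laplace(sigma, lam, p):
    """Laplace relaxation [24]."""
    return lam * np.sum(1 - np.exp(-sigma / p))

def geman(sigma, lam, p):
    """Geman relaxation [25]."""
    return lam * np.sum(sigma / (sigma + p))
```

Each penalty contributes at most $\lambda$ per nonzero singular value (exactly $\lambda$ for $\sigma_i\to\infty$ in ETP, Laplace, and Geman), which is how they mimic $\lambda\,\mathrm{rank}(X)$ more tightly than the nuclear norm.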

To optimize the previously mentioned minimization problems, it is necessary to introduce auxiliary variables to transform them into constrained problems rather than solving them directly. Here, on the one hand, building upon the principles established in Problems (1) and (2), we introduce the term $f_\lambda(X;p)$ as a substitute for $\lambda\,\mathrm{rank}(X)$, while $g(D-A(Z))$ quantifies the loss function for a general nonconvex low-rank matrix learning problem, denoted as follows:

$$\min\ f_\lambda(X;p)+g(E)\quad \text{s.t.}\quad X=Z,\ E+A(Z)=D \tag{4}$$

where both $X$ and $E$ are auxiliary variables. On the other hand, adopting a complementary approach, we replace the rank function with generalized nonconvex relaxations and employ $\|x\|_q^q$ with $q=1,2$ in Problem (3). This leads to the formulation of the following $\ell_q$-norm regularized problems:

$$\min\ f_\lambda(E;p)+\|z\|_q^q\quad \text{s.t.}\quad x=z,\ E+A(x)=D \tag{5}$$

where $f_\lambda(E;p)$ is also derived from Table 1. In particular, Problem (5) encompasses a broader scope than the models studied in [28] and [29], as the latter exclusively focus on specific instances involving Schatten-$p$ norms with $p=1,2/3$, and $1/2$, respectively. In some real-world scenarios marked by significant occlusions and intense illuminations, nonconvex formulations consistently outperform their convex counterparts. This superiority is attributed to the extended matrix variate power exponential distribution [29], [30], which offers a more comprehensive depiction of pixel dependencies. Moreover, it features a heavier-tailed region that captures spatial structural information within residual images. Empirical results consistently validate the superior performance of nonconvex methods in recovering low-rank matrices from incomplete or corrupted data matrices.

    It is crucial to highlight that the problems addressed in this work were previously explored in the conference papers [31], [32]. However, this work distinguishes itself by expanding upon these problems and developing a unified nonconvex low-rank matrix learning framework with more in-depth analysis and explanations. To visually illustrate the comparisons and differences, we present plotted curves of four nonconvex functions in Figure 1 and various demonstrations related to different types of noise and two low-rank matrix structures in Figure 2. These visualizations provide additional insights into the regularization term and the residual function for various real-world image applications. Meanwhile, it is noteworthy that we have successfully incorporated preliminary information into the objective terms, encompassing the loss function and coefficient term. This integration proves to be advantageous for enhancing the overall performance. Addressing constrained problems presents a substantial challenge in developing optimization algorithms that are both effective and efficient, particularly when dealing with multiple variables. This challenge is notably prominent in the study of widely utilized alternating direction method of multipliers (ADMM) algorithms [33]–[35].

Figure 1. (a) Plots depict several convex and nonconvex $\ell_0$-norm functions; (b) their corresponding supergradients; (c) proximal operators. In these plots, we have set $\lambda=1.0$, $p\in\{0,0.5,1.0\}$ for the $\ell_p$-norm, and $p=1.5$ for ETP, Geman, and Laplace, respectively.
Figure 2. Visual comparisons for two common types of noise, as measured by both (a) the Frobenius norm and (b) the $\ell_{2,1}$-norm, and (c), (d) comparisons for two common types of low-rank matrix structures within the metrics of the objective functions.

    In comparison with [31] and [32], we have introduced an advanced acceleration technique using randomized singular value decomposition (RSVD) to alleviate the computational complexity associated with low-rank matrix computations. This represents a significant departure from our earlier conference papers, which relied on traditional ADMM algorithms with either full or economical singular value decomposition (SVD) [6], [22], [36]. Moreover, the problems under investigation involve two constrained equations, and the concept of low-rankness holds specific significance within the context of the measurements incorporated into the objective function. Additionally, we have provided comprehensive theoretical analyses from two perspectives, including computational complexity and convergence guarantees. Finally, we also conducted extensive experimental comparisons with several closely related works to validate the superiority of our proposed methodologies across various application tasks in terms of computational efficiency and efficacy. In summary, compared to previously published conference works, we have achieved notable improvements in model explanations, efficient algorithm designs, comparison methods, and application tasks, respectively.

Building upon these insights and the existing body of research, which aims to reduce computational complexity and broaden real-world image applications, we can distill the primary contributions of this work as follows:

Model formulation We have established a unified nonconvex low-rank matrix relaxation framework by extending RMC, LRR, and RMR. This approach yields nearly unbiased estimators compared to nuclear norm-based regularization problems, thereby elevating the overall solution quality. Our rank relaxation functions are derived from nonconvex relaxations of the $\ell_0$-norm, which we apply to the singular values of the target low-rank matrix. This renders the proposed problem formulation exceptionally robust, versatile, and applicable to a wide range of image processing tasks.

    Algorithm design Our algorithm designs involve the development of a nonconvex ADMM method with two dual variables, eliminating the need for linearization strategies. Each subproblem can be solved to obtain a closed-form solution. To further enhance acceleration, we incorporate RSVD in the low-rank matrix updates and also utilize orthogonal factorization techniques and parallel strategies. Additionally, we provide theoretical guarantees, demonstrating that the variable sequence generated by nonconvex ADMM is bounded. Moreover, during local convergence analysis, we show that a subsequence converges to a stationary point. The limiting point satisfies the Karush-Kuhn-Tucker (KKT) conditions under more lenient assumptions.

    Experimental validation In our experimental results, we demonstrate the superior recovery performance of our proposed nonconvex methods when compared to convex counterparts and other relevant techniques in the domains of matrix completion, subspace clustering, image classification, and image reconstruction using various real-world datasets. Our evaluations cover a diverse range of low-level and high-level vision tasks within the realms of computer vision and pattern recognition. These experimental tasks serve to reinforce the effectiveness and efficiency of our methods through both quantitative and qualitative comparisons.

The following sections are organized as follows: In Section II, we introduce the generalized nonconvex low-rank matrix learning problems and explore their connections with the related work from various perspectives. Section III delves into the development of the multi-variable ADMM optimization algorithm with some acceleration strategies to enhance the computational efficiency for three distinct low-rank matrix optimization problems. We provide a comprehensive theoretical analysis, covering aspects such as computational complexity and convergence properties. Section IV presents a wide array of experiments designed to validate the performance and computational advantages of our proposed method. Finally, in Section V, we conclude our work, providing a summary and outlining potential future directions.

    In the upcoming sections, we will delve into the concrete analysis of the constrained Problems (4) and (5). Our initial focus will be on unveiling the significance of each component within the objective function and investigating their connections to existing low-rank matrix learning problems. Building upon these foundational insights, we will embark on a comprehensive analysis of the generalized nonconvex low-rank matrix learning methods and their various extensions, as represented in Figure 3. This diagram provides a comprehensive overview of the research framework of this work from multiple aspects including problem formulations, algorithm design and analysis, and application tasks.

    Figure  3.  Research framework for the generalized nonconvex low-rank matrix learning methods including problem formulations, ADMM algorithms, and various low-level image vision applications.

It can be observed that the objective functions in both Problems (4) and (5) are composed of a regularization term and a residual function, each serving a distinct purpose. To illustrate, $f_\lambda(X;p)$ quantifies the regularization term for the low-rank matrix in Problem (4), whereas $f_\lambda(E;p)$ characterizes the residual function for the low-rank matrix in Problem (5). This formulation's versatility enables us to tailor the low-rank constraints to various generalized learning problems within the realm of low-level image vision tasks. Furthermore, our research highlights the significance of the $\|\cdot\|_{\ell}$ and $\|\cdot\|_q^q$ norms and their connection to the formulation of both convex and nonconvex norms. This analysis underscores the diverse roles played by different norms in defining residual and coefficient styles, as mentioned earlier.

To the best of our knowledge, the adoption of nonconvex $\ell_0$-norm functions to relax the rank function is widely recognized as a more effective approach, as supported by prior research. Emphasizing the versatility and broad applicability of Problems (4) and (5), we conduct comparisons and establish connections with various existing works, centering our discussion on three key perspectives, outlined as follows:

    • Differences in problem formulations: Existing rank relaxation techniques [9], [26], [37], [38] involve calculating the sum of all singular values associated with specific functions to quantify the second-order sparsity of the low-rank matrix. This further serves as the basis for establishing generalized nonconvex low-rank matrix learning problems. Notably, our work addresses these minimization problems within a unified framework, eliminating the need for individual case-by-case formulations and providing a versatile solution with proper meanings.

    • Optimization using first-order algorithms: To achieve the desired solution, one can employ ADMM [13], [35], [39], [40] for handling constrained minimization problems. Alternatively, the accelerated proximal gradient algorithm (APG) [41]–[44] and the iteratively reweighted algorithm (IRA) [14], [45] are tailored for unconstrained minimization problems. Importantly, to maintain closed-form solutions without relying on linearization techniques, unlike many existing approaches [26], [46]–[49], we also favor the use of ADMM algorithms to effectively tackle the constrained Problems (4) and (5) while incorporating the acceleration strategy for improving the computational efficiency.

    • Applications in image low-level and high-level vision: Depending on the specific application cases [2], [3], Problem (4) can find applications in tasks such as image inpainting (as in matrix completion methods) and image clustering (as in subspace segmentation methods). Additionally, Problem (5) is applicable in image classification and reconstruction (as in matrix regression methods). These various applications are underpinned by the meaningful construction of the objective terms.

    In this section, we introduce an efficient algorithm that integrates faster low-rank matrix computation to enhance the efficiency of ADMM. Our primary objective is to provide an analysis of the iteration procedures, computational complexity, and convergence properties. Therefore, we will outline the step-by-step solution for Problems (4) and (5) below.

The primary challenge in solving for low-rank matrices is computing the proximal operators, which often involve convex and nonconvex rank relaxations and typically require the computation of a full SVD. To enhance computational efficiency and reduce the complexity of the optimization algorithm, we therefore introduce a faster proximal operator for low-rank matrices. This approach involves calculating a smaller matrix for the proximal operator, leading to a significant reduction in its size, as in [50]–[52]. By offering a closed-form solution for computing the proximal operator related to low-rank matrices, the following proposition greatly improves the algorithm's computational efficiency and reduces complexity at each iteration.

    To enhance the computational efficiency, we propose an acceleration technique known as randomized SVD. This technique reduces computational complexity and enables faster updates, as described below.

Proposition 1 Suppose that $\mathrm{rank}(X)=r<d\le\min(m,n)$, $Q\in\mathbb{R}^{m\times d}$ is an orthogonal matrix satisfying $Q^{\mathrm T}Q=I$, and $Y\in\mathbb{R}^{d\times n}$ is such that $X=QY$. Then, the proximal operator $\mathrm{Prox}_{h_\lambda}(Q^{\mathrm T}P)$ is the solution to

$$\mathrm{Prox}_{h_\lambda}(Q^{\mathrm T}P)=\arg\min_{Y}\ h_\lambda(Y)+\frac{1}{2}\big\|Y-Q^{\mathrm T}P\big\|_F^2 \tag{6}$$

where we have $h_\lambda(Y)=\sum_{i=1}^{d}h_\lambda(\sigma_i(Y))$, $\|X-P\|_F^2=\|Y-Q^{\mathrm T}P\|_F^2$, and the matrices $X$ and $Y$ have the same singular values. Therefore, by finding the optimal $Y^{*}=\mathrm{Prox}_{h_\lambda}(Q^{\mathrm T}P)$, we have the desired matrix $X^{*}=QY^{*}$.

Based on von Neumann's theorem [53], which presents a series of transformations, we initially perform the SVD of a matrix, denoted as $U\Sigma V^{\mathrm T}$ without loss of generality. Here, $U$ and $V$ represent the left and right singular vector matrices, and $\Sigma$ is the diagonal singular value matrix with its $i$-th element denoted as $\sigma_i$ for $1\le i\le l=\min(m,n)$. By leveraging Proposition 1, we achieve a significant reduction in computational complexity, especially when $d\ll\min(m,n)$. This reduction implies that fewer singular values need to be calculated, resulting in faster execution time. This underscores the importance of the proximal operator extension from (6) in obtaining the optimal solution and approximating the matrix through the accelerated generalized singular value thresholding (GSVT) [2], [47] for solving the subsequent low-rank matrix-related subproblems, which are not considered in the existing works [31], [32].
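The reduction in Proposition 1 can be sketched numerically as follows, instantiating $h_\lambda$ with the nuclear norm so that the small proximal step is plain singular value thresholding; the randomized range finder and all function names are our own assumptions, not the paper's exact procedure:

```python
import numpy as np

def randomized_range(P, d, oversample=5, seed=0):
    """Randomized range finder: an orthogonal Q (m x d) with Q^T Q = I
    whose columns approximately span the range of P."""
    rng = np.random.default_rng(seed)
    sketch = P @ rng.standard_normal((P.shape[1], d + oversample))
    Q, _ = np.linalg.qr(sketch)
    return Q[:, :d]

def prox_lowrank(P, Q, lam):
    """Proposition-1-style proximal step: solve the small d x n problem (6)
    on Q^T P (here with h_lambda = nuclear norm, i.e., soft-thresholded
    singular values), then map back via X = Q Y."""
    U, s, Vt = np.linalg.svd(Q.T @ P, full_matrices=False)  # small SVD only
    Y = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
    return Q @ Y
```

With `lam=0` and a rank-$r$ input whose range $Q$ captures, the step reduces to $QQ^{\mathrm T}P=P$, illustrating that working on the small matrix $Q^{\mathrm T}P$ loses no information.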

    Following the principles of multiple ADMM [33]–[35] for solving constrained optimization problems, we employ an iterative procedure to efficiently address Problems (4) and (5), respectively. This iterative process entails the sequential update of the relevant variables. Our choice is rooted in the constrained nature of the problems in question, where each subproblem can be efficiently computed with a closed-form solution, rendering the ADMM algorithms a fitting choice.

    To facilitate the computation of a closed-form solution for each subproblem, one can transform Problem (4) into the following unconstrained problem. Subsequently, we adopt the augmented Lagrange function in the following form:

$$\begin{aligned}L_\mu(Z,E,X,\Lambda_1,\Lambda_2)=\ &f_\lambda(X;p)+g(E)+\frac{\mu}{2}\Big[\Big\|X-Z+\frac{\Lambda_1}{\mu}\Big\|_F^2+\Big\|E+A(Z)-D+\frac{\Lambda_2}{\mu}\Big\|_F^2\Big]\\&-\frac{1}{2\mu}\big(\|\Lambda_1\|_F^2+\|\Lambda_2\|_F^2\big)\end{aligned} \tag{7}$$

here, $\mu>0$ is the penalty parameter, and both $\Lambda_1$ and $\Lambda_2$ represent the dual variables. It is important to highlight that (7) functions as the augmented Lagrangian function for (4). The terms involving $\|\cdot\|_F^2$ represent the differentiable term introduced in [10], [21], and [54]. Note that this iteration process aims to achieve the low-rank matrix solution derived from the RMC and LRR problems.

Building upon the previously introduced augmented Lagrangian function and considering the variables at the $k$-th iteration $(Z_k,E_k,X_k,\Lambda_{1,k},\Lambda_{2,k},\mu_k)$, the iterative process entails sequential updates of the primal variables, dual variables, and the penalty parameter. Consequently, to eliminate the need for linearization techniques, the variables at the $(k+1)$-th iteration can be updated as follows:

$$Z_{k+1}=\arg\min_{Z}\ \frac{\mu_k}{2}\Big[\big\|Z-\hat{X}_k^{\Lambda_1}\big\|_F^2+\big\|A(Z)-\hat{E}_k^{\Lambda_2}\big\|_F^2\Big]=(I+A^{*}A)^{-1}\big(\hat{X}_k^{\Lambda_1}+A^{*}(\hat{E}_k^{\Lambda_2})\big) \tag{8}$$
$$E_{k+1}=\arg\min_{E}\ g(E)+\frac{\mu_k}{2}\big\|E-\hat{Z}_{k+1}^{\Lambda_2}\big\|_F^2 \tag{9}$$
$$X_{k+1}=\arg\min_{X}\ f_\lambda(X;p)+\frac{\mu_k}{2}\big\|X-\hat{Z}_{k+1}^{\Lambda_1}\big\|_F^2 \tag{10}$$

where $\hat{X}_k^{\Lambda_1}=X_k+\frac{\Lambda_{1,k}}{\mu_k}$, $\hat{E}_k^{\Lambda_2}=D-E_k-\frac{\Lambda_{2,k}}{\mu_k}$, $\hat{Z}_{k+1}^{\Lambda_2}=D-A(Z_{k+1})-\frac{\Lambda_{2,k}}{\mu_k}$, and $\hat{Z}_{k+1}^{\Lambda_1}=Z_{k+1}-\frac{\Lambda_{1,k}}{\mu_k}$. Meanwhile, we also need to update the dual variables and the penalty parameter, as follows:

$$\Lambda_{1,k+1}=\Lambda_{1,k}+\mu_k(X_{k+1}-Z_{k+1}) \tag{11}$$
$$\Lambda_{2,k+1}=\Lambda_{2,k}+\mu_k(E_{k+1}+A(Z_{k+1})-D) \tag{12}$$
$$\mu_{k+1}=\min(\rho\mu_k,\mu_{\max}),\quad \rho>1 \tag{13}$$

    where the number of iterations is significantly influenced by the choice of ρ values. Specifically, a larger value of ρ generally results in faster convergence, whereas a smaller value of ρ tends to yield a more accurate solution.

In the update process, solving subproblem (8) involves addressing a quadratic problem, while subproblem (10) requires computing proximal operators using a fixed-point iteration algorithm for various formulations of the function $f_\lambda(X;p)$, as detailed in Proposition 1 for the RMC method, which is focused on enhancing computational efficiency. We consider a "dictionary" that linearly spans the data space, which helps reduce computational complexity, as employed in the LRR method. Both of the aforementioned acceleration techniques are designed to minimize computational complexity. As demonstrated in [10], the lowest-rank representation can recover the underlying row space, thereby revealing the true data segmentation. Moreover, the optimal solver for subproblem (9) can be adapted based on different choices of $g(E)$, such as the $\ell_1$, $\ell_2$, or $\ell_{2,1}$ norms for various noise measurements. The initial values of the variables are set as $(Z_0,E_0,X_0,\Lambda_{1,0},\Lambda_{2,0},\mu_0)$. To assess convergence, we have established stopping criteria based on the following formula:

$$\frac{\|E_{k+1}+A(Z_{k+1})-D\|_F}{\|D\|_F}<\epsilon \tag{14}$$

    where ϵ=0.00001. In summary, the detailed iteration procedure of the nonconvex ADMM with two dual variables for solving (4) using the augmented Lagrange function (7) is presented in our Algorithm 1.

    Algorithm 1 Optimization algorithm for Problem (4)
    Input: data matrix D, projecting matrix A.
    Parameters: (p,λ,ρ,μ), ε, and kmax.
    Output: X.
    1: Set k=0 and μ0;
    2: while not converged do
    3:  Update Z via (8);
    4:  Update E via (9);
    5:  Update X via (10);
    6:  Update Λ1 via (11);
    7:  Update Λ2 via (12);
    8:  Update μ via (13);
    9: end while
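As an illustration of Algorithm 1, the following is a minimal runnable sketch for an RMC-style instance in which we assume $A(Z)=P_\Omega(Z)$, $g(E)=\|E\|_1$, and $f_\lambda$ is taken as the nuclear norm (so the $X$-update is plain singular value thresholding); these concrete choices and all names are ours, not the paper's exact configuration:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_rmc(D, mask, lam=0.1, mu=1.0, rho=1.2, mu_max=1e6, eps=1e-5, kmax=300):
    """Sketch of Algorithm 1 with A = P_Omega. mask is a boolean array of
    observed entries; the updates follow (8)-(13) and the stopping rule (14)."""
    Z = np.zeros_like(D); E = np.zeros_like(D); X = np.zeros_like(D)
    L1 = np.zeros_like(D); L2 = np.zeros_like(D)
    m01 = mask.astype(float)
    for _ in range(kmax):
        # (8): here A*A = P_Omega, so (I + A*A)^{-1} acts entrywise
        X_hat = X + L1 / mu
        E_hat = np.where(mask, D - E - L2 / mu, 0.0)   # A* applied to E_hat
        Z = (X_hat + E_hat) / (1.0 + m01)
        # (9): soft-thresholding for g(E) = l1-norm
        T = D - np.where(mask, Z, 0.0) - L2 / mu
        E = np.sign(T) * np.maximum(np.abs(T) - 1.0 / mu, 0.0)
        # (10): singular value thresholding for the nuclear-norm surrogate
        X = svt(Z - L1 / mu, lam / mu)
        # (11)-(13): dual variables and penalty parameter
        L1 = L1 + mu * (X - Z)
        res = E + np.where(mask, Z, 0.0) - D
        L2 = L2 + mu * res
        mu = min(rho * mu, mu_max)
        # (14): stopping criterion on the constraint residual
        if np.linalg.norm(res) / max(np.linalg.norm(D), 1e-12) < eps:
            break
    return X, E
```

On a fully observed low-rank matrix, the recovered pair $(X,E)$ reconstructs $D$ up to the tolerance in (14), with $X$ approximately low rank and $E$ sparse.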

To derive the vector of representation coefficients, we employ the foundational iteration procedures of ADMM to optimize Problem (5). These steps entail the sequential updates of the variables and the penalty parameter. Throughout the optimization process, the availability of closed-form solvers is particularly crucial, especially when dealing with $\ell_q$-norms in Problem (5), where we set $q$ to $1$, as extensively discussed in [11], [29], and [55]. In particular, for the case of $q=2$, we omit the introduction of the auxiliary variable $z$ and eliminate the use of the constrained equation $x=z$ due to the differentiable property of $\|x\|_2^2$. Further details can be found in [11] and [32]. In this work, our focus is specifically on the optimization iterations for the case of $q=1$, which are elaborated upon in the following two primary steps:

Step 1 By introducing the dual variables $\Lambda_1$ and $\Lambda_2$, we formulate the augmented Lagrangian function for Problem (5) with $q=1$, represented as

$$L_\mu(E,z,x,\Lambda_1,\Lambda_2)=f_\lambda(E;p)+\|z\|_1+\frac{\mu}{2}\Big\|E+A(x)-D+\frac{\Lambda_1}{\mu}\Big\|_F^2-\frac{1}{2\mu}\|\Lambda_1\|_F^2+\frac{\mu}{2}\Big\|x-z+\frac{\Lambda_2}{\mu}\Big\|^2-\frac{1}{2\mu}\|\Lambda_2\|^2 \tag{15}$$

where this function formulation follows the same processing procedure as the previously described function (7). Then, we proceed to present the variable updates below.

    Step 2 By alternately updating each variable in the objective function (15), we also take into account the variables at the k-th iteration and then present the following iteration sequences for the (k+1)-th iteration:

i) In the above-given function, while keeping $\hat{x}_k^{\Lambda_1}=D-A(x_k)-\frac{\Lambda_{1,k}}{\mu_k}$ fixed, we can update $E$ through the following minimization subproblem:

$$E_{k+1}=\arg\min_{E}\ f_\lambda(E;p)+\frac{\mu_k}{2}\big\|E-\hat{x}_k^{\Lambda_1}\big\|_F^2 \tag{16}$$

    and the closed-form solution can be achieved by using the generalized low-rank matrix computations [2], [47].

ii) Fixing $\hat{E}_{k+1}^{\Lambda_1}=D-E_{k+1}-\frac{\Lambda_{1,k}}{\mu_k}$, we subsequently update $(z,x)$ from (15) by

$$z_{k+1}=\arg\min_{z}\ \|z\|_1+\frac{\mu_k}{2}\Big\|z-\Big(x_k+\frac{\Lambda_{2,k}}{\mu_k}\Big)\Big\|^2=\mathrm{sign}\Big(x_k+\frac{\Lambda_{2,k}}{\mu_k}\Big)\odot\max\Big(\Big|x_k+\frac{\Lambda_{2,k}}{\mu_k}\Big|-\frac{1}{\mu_k},0\Big) \tag{17}$$
$$x_{k+1}=\arg\min_{x}\ \big\|A(x)-\hat{E}_{k+1}^{\Lambda_1}\big\|^2+\Big\|x-z_{k+1}+\frac{\Lambda_{2,k}}{\mu_k}\Big\|^2=(I+A^{*}A)^{-1}\Big(A^{*}\big(\hat{E}_{k+1}^{\Lambda_1}\big)+z_{k+1}-\frac{\Lambda_{2,k}}{\mu_k}\Big) \tag{18}$$

    iii) Using the updated variables Ek+1, zk+1, and xk+1 obtained from the previous steps, we proceed to update (Λ1,Λ2) and μ according to the following equations:

$$\Lambda_{1,k+1}=\Lambda_{1,k}+\mu_k(E_{k+1}+A(x_{k+1})-D) \tag{19}$$
$$\Lambda_{2,k+1}=\Lambda_{2,k}+\mu_k(x_{k+1}-z_{k+1}) \tag{20}$$
$$\mu_{k+1}=\min(\rho\mu_k,\mu_{\max}),\quad \rho>1 \tag{21}$$

    where (19) and (20) are employed for the dual variables, while (21) is utilized for updating the penalty parameter in this process.

In updating the involved variables, analytical solutions for subproblems (16)–(18) can be readily obtained; for (18), this is done by computing the derivative of its objective with respect to $x$ and setting it to zero. Furthermore, subproblem (17), which is based on the $\ell_1$-norm, can be solved exactly using the well-known soft-thresholding operator [56]. Subproblem (16), which relies on $f_\lambda(E;p)$, also has a closed-form solution [2], [47]. In summary, the iteration procedures for solving Problem (5) using the nonconvex ADMM are outlined in Algorithm 2. The iterations continue until either reaching the maximum iteration limit $k_{\max}$ or satisfying the predefined stopping criterion, which is set to $\varepsilon=0.001$, as specified in (14).
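The soft-thresholding operator referenced for subproblem (17) can be written directly from its closed form (a standard sketch; the function name is ours):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrinks each entry toward zero by
    tau and truncates at zero, matching the closed form in (17)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# z-update of (17): z_next = soft_threshold(x_k + Lam2_k / mu_k, 1.0 / mu_k)
```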

    Algorithm 2 Optimization algorithm for Problem (5)
Input: testing sample matrix $D$, training sample matrices $A_1, A_2, \ldots, A_n$, and their column-stacked matrix $H \in \mathbb{R}^{m \times n}$ with $m = p \times q$.
Parameters: $(p, \lambda, \rho, \mu)$, $\varepsilon$, and $k_{\max}$.
Output: $(E, x)$.
1: Set $k = 0$ and $\mu_0$;
    2: while not converged do
    3:  Update E via (16);
    4:  Update z via (17);
    5:  Update x via (18);
    6:  Update Λ1 via (19);
    7:  Update Λ2 via (20);
    8:  Update μ via (21);
    9: end while
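Putting steps (16)–(21) together, the loop of Algorithm 2 might be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' implementation: singular value soft-thresholding (the nuclear-norm proximal operator) stands in for the generalized nonconvex low-rank step (16), $\mathcal{A}$ is an explicit matrix whose columns stack the vectorized training matrices, and the stopping test is one plausible reading of (14):

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding (nuclear-norm prox), used here as a
    convex stand-in for the generalized low-rank step (16)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_rmr(D, A, mu=1e-2, rho=1.1, mu_max=1e6, eps=1e-3, k_max=500):
    """Illustrative loop for Algorithm 2. D is p x q; each column of A is a
    vectorized training matrix, so A maps x (length n) to a p x q matrix."""
    p, q = D.shape
    n = A.shape[1]
    x, z, L2 = np.zeros(n), np.zeros(n), np.zeros(n)
    E, L1 = np.zeros_like(D), np.zeros_like(D)
    G = np.linalg.inv(np.eye(n) + A.T @ A)        # cached factor for step (18)
    Ax = lambda v: (A @ v).reshape(p, q)
    for _ in range(k_max):
        E = svt(D - Ax(x) - L1 / mu, 1.0 / mu)    # step (16)
        w = x + L2 / mu                           # step (17): soft-thresholding
        z = np.sign(w) * np.maximum(np.abs(w) - 1.0 / mu, 0.0)
        e_hat = (D - E - L1 / mu).reshape(-1)     # step (18): closed form
        x = G @ (A.T @ e_hat + z - L2 / mu)
        R = E + Ax(x) - D
        L1 = L1 + mu * R                          # step (19)
        L2 = L2 + mu * (x - z)                    # step (20)
        mu = min(rho * mu, mu_max)                # step (21)
        if max(np.linalg.norm(R), np.linalg.norm(x - z)) < eps:
            break
    return E, x
```

Caching the inverse factor `G` once is what makes the $x$-update cheap across iterations, since the system matrix in (18) does not change with $k$.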

    In this subsection, we begin by examining the computational complexity of Algorithms 1 and 2. The computational complexity is influenced by the SVD computations and matrix multiplications carried out during each iteration, as well as the total number of iterations. We then proceed to establish convergence guarantees, relying on the updated rules and the properties of the objective function. The detailed analysis is presented below from two perspectives.

1) Computational complexity of Algorithms 1 and 2: The factorization technique efficiently computes an orthogonal matrix $Q \in \mathbb{R}^{m \times d}$, enabling a small SVD of $Q^{\mathrm{T}}Z \in \mathbb{R}^{d \times n}$ to be performed in $O(nd^2)$ time. This method reduces the time complexity of obtaining $Z$ to $O(mnd)$, dominated by the multiplication of two matrices of sizes $m \times n$ and $n \times d$. In contrast, the complexity of computing a matrix inverse is $O(n^3)$ for an $n \times n$ matrix. The overall complexity of Algorithms 1 and 2 is determined by the complexity of each iteration multiplied by the total number of iterations.
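The factorization trick can be sketched as below. A randomized range finder is assumed here for obtaining $Q$ (the specific construction of $Q$ is an assumption of this sketch; the point is only that the expensive SVD is replaced by a small $d \times n$ one):

```python
import numpy as np

def skinny_svd(Z, d, seed=0):
    """QR-based small SVD: project Z (m x n) onto a d-dimensional column
    basis Q, then take the SVD of the small d x n matrix Q^T Z."""
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    Y = Z @ rng.standard_normal((n, d))   # sample the column space: O(mnd)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis, Q is m x d
    U_small, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)  # O(nd^2)
    return Q @ U_small, s, Vt             # left factor lifted back to m x d
```

For a matrix of rank $r \le d$, the range of $Q$ contains the range of $Z$, so the small SVD recovers the nonzero singular values of $Z$ exactly.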

    2) Convergence guarantees of Algorithm 1: The convergence analysis encompasses two technical results, which are presented respectively as follows:

Lemma 1 Let $\{\Theta_k = (Z_k, E_k, X_k, \Lambda_{1,k}, \Lambda_{2,k})\}$ be the sequence generated by the nonconvex ADMM, and assume that $\sum_{i=0}^{+\infty} \frac{1}{\mu_i} < +\infty$ holds. Then $\{\Theta_k\}$ is a bounded sequence.
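As a worked check of this summability assumption: if the cap $\mu_{\max}$ in (21) is never reached, the schedule gives $\mu_i = \rho^i \mu_0$ with $\rho > 1$, and the series is geometric:

```latex
\sum_{i=0}^{+\infty} \frac{1}{\mu_i}
  = \frac{1}{\mu_0} \sum_{i=0}^{+\infty} \rho^{-i}
  = \frac{1}{\mu_0} \cdot \frac{1}{1 - \rho^{-1}}
  = \frac{\rho}{\mu_0 (\rho - 1)} < +\infty .
```

If $\mu_k$ instead stalls at a finite $\mu_{\max}$, the tail of the series becomes constant and diverges, so the assumption implicitly requires the penalty to keep growing until convergence.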

Theorem 1 Let $\{\Theta_k = (Z_k, E_k, X_k, \Lambda_{1,k}, \Lambda_{2,k})\}$ be the sequence generated by the nonconvex ADMM, and assume $\lim_{j \to +\infty} \mu_{k_j}(X_{k_j+1} - X_{k_j}) = 0$ and $\lim_{j \to +\infty} \mu_{k_j}(E_{k_j} - E_{k_j+1}) = 0$. Then any accumulation point $\Theta_* = (Z_*, E_*, X_*, \Lambda_{1,*}, \Lambda_{2,*})$ satisfies

$$\begin{cases} 0 = \mathcal{A}^{*}(\Lambda_{2,*}) - \Lambda_{1,*} \\ 0 \in \partial g(E_*) + \Lambda_{2,*} \\ 0 \in \partial f_{\lambda}(X_*) + \Lambda_{1,*} \end{cases} \tag{22}$$

as well as both $X_* = Z_*$ and $D = E_* + \mathcal{A}(Z_*)$.

Hence, the above results accord with the desired conclusions provided in [31]. In other words, the limiting point $(Z_*, E_*, X_*, \Lambda_{1,*}, \Lambda_{2,*})$ satisfies the KKT conditions of (7), and $(Z_*, E_*, X_*)$ is a stationary point of Problem (4). This establishes the local convergence property of Algorithm 1.

    3) Convergence guarantees of Algorithm 2: The convergence analysis yields similar theoretical results to those presented in Algorithm 1, which are given as follows:

Lemma 2 Suppose that $\sum_i f(\sigma_i(D - \mathcal{A}(x))) + \lambda\|x\|_q^q \to +\infty$ if and only if $\|x\| \to +\infty$, and assume that $\mu_{k-1}(x_{k-1} - x_k) \to 0$ as $k \to +\infty$. Then the dual variable sequences $\{\Lambda_{1,k}\}$ and $\{\Lambda_{2,k}\}$ generated by Algorithm 2 are both bounded.

Theorem 2 Let $\{T_k = (E_k, z_k, x_k, \Lambda_{1,k}, \Lambda_{2,k})\}$ be the sequence generated by Algorithm 2 for the case $p = 1$, which has at least one accumulation point under the same assumptions as Lemma 2 together with $\sum_{i=0}^{+\infty} \frac{\mu_i + \mu_{i+1}}{2\mu_i^2} < +\infty$. Then, for any accumulation point $T_* = (E_*, z_*, x_*, \Lambda_{1,*}, \Lambda_{2,*})$, the triple $(E_*, z_*, x_*)$ is a stationary point of Problem (5).

Through the proof procedures of Theorems 1 and 2 above, we further establish the boundedness of the primal variable sequences, as demonstrated in Lemmas 1 and 2. In essence, we confirm that any accumulation point attained by Algorithms 1 and 2 qualifies as a stationary point. It is worth noting that the local convergence proofs for these results are similar under comparable conditions, up to slight revisions. The detailed proofs are available in the previously published conference papers [31], [32]; for brevity and space conservation, we omit them in this work.

In this section, our main objective is to showcase the superior learning efficacy and computational efficiency of nonconvex rank regularization methods in addressing the RMC, LRR, and RMR problems, as formulated in Problems (4) and (5), respectively. We provide theoretical insights into their convergence properties and present visual results from various viewpoints, covering a range of vision tasks including image inpainting, clustering, classification, and reconstruction. This section offers a considerably more comprehensive overview and validation than the previously published conference papers [31], [32], which relied solely on the proposed methods in their experimental comparisons. Note that although the proposed methods have not yet achieved the highest level of computational efficiency, we have successfully validated the effectiveness of the enhanced acceleration techniques employed in our approaches.

    In the subsequent experiments, our primary focus is on benchmarking our proposed methods against several existing algorithms, utilizing notations commonly found in pertinent references. The comparison works are sourced from the corresponding journal publications and conference proceedings. These experiments are conducted using databases containing raw images and are executed on Matlab 2021b, running on a personal computer with 8.0 GB of RAM and an Intel(R) Core(TM) i7-7700 CPU@3.60 GHz. The forthcoming descriptions of experimental tasks and evaluations cover various perspectives, detailed below.

    • In the first experiment, we employ matrix completion methods to robustly recover a natural image with missing entries, as shown in Figure 4. The initial image has dimensions of 600×600×3, as shown in Figure 4(a). Subsequently, we truncate this image to create an incomplete version, treating it as a strictly low-rank matrix with a rank of 80, as depicted in Figure 4(b). The introduction of missing entries through text generates an image with incomplete sections, demonstrated in Figure 4(c). Considering that the image comprises three channels (red, green, and blue), we independently complete the missing pixel parts for each channel and then merge the results to achieve optimal inpainting performance. Our evaluation criteria include metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [2], [52]. Additionally, we assess computational efficiency in terms of CPU computation time (TIME) and the number of iterations (ITER). It is noteworthy that higher PSNR and SSIM values, coupled with lower TIME and ITER values, indicate the superiority of the evaluated methods to some degree.

    Figure  4.  A series of visual images and plotted curves depicting three scenarios. These plotted curves correspond to the red (R), green (G), and blue (B) channels, highlighting the significance of singular values in image inpainting tasks.
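The PSNR metric used in the inpainting evaluation above can be computed as follows (a minimal sketch of the standard definition; SSIM and the TIME/ITER bookkeeping are omitted):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and its
    reconstruction; higher values indicate better recovery."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

For a three-channel image, the metric would be applied after recombining the independently completed R, G, and B channels.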

• In the second experiment, we apply clustering methods for subspace segmentation to the ExtYaleB dataset, which features face images from 38 subjects. Each image is resized to $32 \times 32$ pixels. Raw matrices $D$ are created with sizes of $1024 \times 320$ and $1024 \times 640$ by selecting the first 5 and 10 subjects, respectively. Partial face images are shown in Figure 5. To evaluate the clustering results [57], [58], we employ metrics including clustering accuracy (ACC), normalized mutual information (NMI), and TIME. Additionally, we construct an affinity matrix $W$ based on the optimal $X^*$, calculated as $(|X^*| + |X^{*\mathrm{T}}|)/2$. Higher ACC and NMI values, along with lower TIME values, indicate superior performance and computational efficiency for the methods under consideration.

    Figure  5.  Partial face images with various illumination corruptions from the ExtYaleB database used for subspace clustering tasks.
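The affinity construction described above, followed by the standard spectral embedding that precedes k-means in subspace segmentation, can be sketched as below (a minimal version; the final k-means step on the embedding rows is omitted):

```python
import numpy as np

def spectral_embedding(X, k):
    """Build the affinity W = (|X| + |X^T|)/2 from the coefficient matrix X
    and return the k-dimensional spectral embedding, whose rows are then
    clustered (e.g., by k-means) to produce the segmentation."""
    W = (np.abs(X) + np.abs(X.T)) / 2.0
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vecs[:, :k]                    # eigenvectors of the k smallest ones
```

When $X^*$ is close to block-diagonal, as in Figure 8(a), samples from the same subject receive nearly identical embedding rows, which is what makes the subsequent k-means step reliable.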

• In the third experiment, we employ regression techniques for image classification using two well-established face databases: AR and ExtYaleB (the same as given above). These databases contain partial face images, as shown in Figure 6, with various occlusions like sunglasses, scarves, and baboon patches added to challenge the testing samples. Our training sets consist of clean face images selected based on predefined criteria [11], [28]. We optimize Problem (5) using Algorithm 2 to obtain optimal coefficient vectors for the two scenarios. To establish class decisions, we use the rule $\mathrm{identity}(D) = \arg\min_i \tilde{E}_i(D)$ for $i = 1, 2, \ldots, n$, where $\tilde{E}_i(D)$ is computed as $f(\sigma[\mathcal{A}(x^*) - \mathcal{A}(\zeta_i(x^*))])/\|\zeta_i(x^*)\|$, with $x^*$ representing the optimal solution for each testing sample and $\zeta_i(x^*)$ keeping the coefficients associated with the $i$-th class. Note that all the testing samples are processed in parallel to enhance computational efficiency.

    Figure  6.  Partial face images from (a) the AR dataset, which include occlusions like sunglasses and scarves and (b) the ExtYaleB dataset with random baboon occlusions utilized for training and testing in image classification tasks.
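The class decision rule described above can be sketched as follows, with the hypothetical helper `zeta_i` realized by masking the coefficients of class $i$; the plain Euclidean residual is used as a stand-in for the singular-value-based measurement $f(\sigma[\cdot])$ in the text:

```python
import numpy as np

def classify(d_test, A, labels, x):
    """Assign d_test to the class whose coefficient support reconstructs it
    best, normalizing each residual by the class coefficient magnitude."""
    classes = np.unique(labels)
    errs = []
    for c in classes:
        zi = np.where(labels == c, x, 0.0)      # zeta_i(x): class-i coefficients
        r = np.linalg.norm(d_test - A @ zi)     # reconstruction residual
        errs.append(r / max(np.linalg.norm(zi), 1e-12))
    return classes[int(np.argmin(errs))]
```

Because each testing sample is classified independently, this per-sample rule is what allows the parallel processing mentioned above.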

    • In the fourth experiment, we mainly conduct numerical assessments to validate the reconstruction capabilities of the proposed methods in both clustering and classification tasks. Our analysis begins by presenting the reconstructed and residual images from the illuminated ExtYaleB dataset. Furthermore, we meticulously curate two sets of face images, one from the AR dataset and another from the ExtYaleB dataset. One set comprises clean face images, while the other set features face images with three different occlusions. We apply the provided methods, leveraging four nonconvex functions for both clustering and classification tasks, respectively. Our particular emphasis is on comparing the reconstruction images and the residual images from these two vision tasks. In particular, the reconstruction images are achieved using the coefficient matrix and vector, while the residual images are obtained from the error matrix in the iteration process.

This experiment aims to assess the performance of Algorithm 1 in optimizing the RMC problem for image inpainting. We then evaluate the efficiency and effectiveness of several existing completion methods, leading to the following observations from two key perspectives:

Table 2 presents the inpainting results, evaluating our proposed methods across four different nonconvex functions. It includes a comprehensive comparison with various methods, encompassing both convex and nonconvex approaches, some of which incorporate acceleration techniques. The table features diverse methods and their respective outcomes, comparing values of PSNR, SSIM, TIME, and ITER. It is worth noting that for the p-norm and ETP we chose p=0.5, whereas for the Laplace and Geman functions we utilized p=0.001. The findings reveal that convex methods such as APG and augmented Lagrange multipliers (ALM) yield lower PSNR and SSIM values. Note that ALM originates from the arXiv preprint arXiv:1009.5055; we cite [46] instead to ensure the availability of a DOI. In contrast, nonconvex methods, including iteratively reweighted nuclear norm (IRNN), generalized proximal gradient (GPG), partial singular value thresholding (PSVT), weighted nuclear norm minimization (WNNM), accelerated proximal alternating linearized minimization (AccPALM), and LpTNN (the model combining the Lp-norm and the truncated nuclear norm), demonstrate superior performance, with notable effectiveness observed for both WNNM and AccPALM. Moreover, the utilization of acceleration techniques results in more efficient computations, as evidenced by ALM, IRNN, GPG, and AccPALM, which exhibit reduced iteration counts and/or lower computational time. These results underscore the advantages of our proposed methods, optimized by Algorithm 1, in terms of both inpainting efficacy and computational efficiency compared to most of the comparison methods.

    Table  2.  Numerous results presented for image inpainting to evaluate performance and computational efficiency
    Method PSNR SSIM TIME (s) ITER
    APG [42] 50.697 0.9997 78.2 (354, 365, 338)
    ALM [46] 50.864 0.9997 24.0 (96, 92, 100)
    IRNN [26] 51.905 0.9998 21.6 (68, 68, 68)
    GPG [47] 52.064 0.9998 21.7 (70, 69, 69)
    PSVT [59] 50.921 0.9997 46.9 (160, 163, 165)
    WNNM [21] 55.872 0.9999 132.9 (457, 593, 592)
    AccPALM [52] 54.234 0.9998 12.6 (101, 103, 105)
    L2/3TNN [60] 50.895 0.9997 374.1 (90, 91, 93)
    L1/2TNN [60] 50.895 0.9997 350.9 (88, 89, 91)
    ADMMp 52.844 0.9998 20.0 (177, 177, 179)
    ADMMETP 52.144 0.9997 20.1 (180, 180, 186)
    ADMMLaplace 51.910 0.9997 19.7 (178, 178, 181)
    ADMMGeman 52.227 0.9997 18.5 (177, 178, 176)

• To evaluate our proposed methods under four nonconvex functions, we begin the assessment by plotting convergence curves of the stopping condition for the p-norm and ETP. These plots illustrate the relationship between ITER and TIME, as shown in Figures 7(a) and (b), respectively. Furthermore, we present the influence on evaluation performance of different choices of the p-value in the Laplace function and of the initial $\bar{\mu}$-value, which sets the initial penalty parameter (i.e., $\bar{\mu}\|D\|_2$), for the Geman function, as demonstrated in Figures 7(c) and (d). In summary, these analyses not only validate the convergence properties of our approaches but also provide valuable insights into the multifaceted effects of the various parameters.

Figure  7.  Curves showing the stopping function over ITER and TIME for (a) the p-norm and (b) the ETP, respectively, and the effects on PSNR values under different values of $\rho$ and $\bar{\mu}$ for (c) the Laplace function and (d) the Geman function, correspondingly.

    This subsection is dedicated to conducting a comprehensive comparison of subspace segmentation methods, which includes assessing both convex and nonconvex rank relaxation methods, as well as low-rank matrix factorization-based approaches. Our evaluation focuses on the unsupervised clustering task using the ExtYaleB dataset, considering scenarios with 5 and 10 subjects. Following this, we provide quantitative and qualitative results, along with highlighting significant observations, described as follows:

Table 3 presents a comprehensive comparison of various clustering methods and exhibits the advantages of our four nonconvex functions. In the specific scenarios of our evaluations, it is apparent that LRR and least squares regression (LSR1, LSR2) exhibit relatively lower ACC and NMI values, along with less time consumption. In contrast, iteratively reweighted least squares (IRLS), smooth representation (SMR), fast universal low-rank representation (FULRR), Schatten-1/2 norm factorization based subspace clustering (S1/2NFSC), Schatten-2/3 norm factorization based subspace clustering (S2/3NFSC), AccPALM, and weighted Schatten-p norm based LRR (WsLRR) demonstrate improved ACC and NMI values compared to convex methods. Notably, LSR and SMR stand out for their higher efficiency in terms of computational costs. Furthermore, S1/2NFSC, S2/3NFSC, FULRR, WsLRR, and AccPALM leverage matrix factorization acceleration techniques and iteration reduction strategies, enabling them to efficiently obtain low-rank matrices with results that either outperform or match other methods. This naturally aligns with expectations, since these nonconvex methods were intentionally designed to strike a balance between accuracy and efficiency by using Algorithm 1 to optimize Problem (4).

    Table  3.  Comparisons of numerical results for subspace segmentation using the ExtYaleB dataset with 5 and 10 subjects
    Method 5 subjects 10 subjects
    ACC (%) NMI (%) TIME (s) ACC (%) NMI (%) TIME (s)
    LRR [10] 79.375 81.636 7.7 75.313 79.285 29.3
    IRLS [61] 89.688 81.656 17.5 72.969 66.017 93.4
LSR1 [62] 94.063 87.481 < 1 60.000 53.131 < 1
LSR2 [62] 95.625 92.173 < 1 58.125 51.218 < 1
SMR [63] 94.688 86.634 < 1 69.375 67.096 < 1
    FULRR [64] 90.313 79.606 12.1 84.063 77.463 43.4
    WsLRR [65] 96.875 91.554 8.2 76.719 77.289 33.8
    S1/2NFSC [57] 86.563 74.753 3.1 75.000 75.207 10.0
    S2/3NFSC [57] 88.750 77.842 3.1 73.750 75.018 10.4
AccPALM [52] 89.063 76.065 < 1 70.469 62.672 3.5
    ADMMp 97.500 92.900 2.1 84.063 79.192 9.8
    ADMMETP 97.813 93.888 1.8 85.469 80.672 9.0
    ADMMLaplace 96.563 91.300 2.4 84.844 80.065 10.4
    ADMMGeman 97.188 93.059 2.3 84.688 79.967 10.3

Figure 8 offers visual comparisons for our proposed methods employing four different nonconvex functions, with 5 subjects in Figures 8(a) and (b) and 10 subjects in Figures 8(c) and (d), respectively. In Figure 8(a), we present the block-diagonal structures of the normalized coefficient matrix, shedding light on their relationship with the number of subjects for the p-norm. Moving to Figure 8(b), we display the statistical distribution of the coefficient matrix, offering insights into the representation of clustering samples related to the ETP function. Meanwhile, Figure 8(c) allows us to analyze the impact of parameter choices through experiments on the Laplace function. Finally, we analyze the influence of randomly initialized variables on clustering accuracy in Figure 8(d) based on the Geman function over 1000 runs. These visual results enhance the understanding of our proposed methods, thereby confirming their feasibility and stability from various viewpoints in this unsupervised task.

    Figure  8.  (a) and (b) Visual comparison results showcased with the ExtYaleB dataset, featuring relevant coefficient matrices for 5 subjects displayed; (c) and (d) Analysis of the influences of the parameters and initial variables for 10 subjects, illustrating their effects and validating the property across four nonconvex functions.

    In this subsection, we evaluate the performance of various classification methods using two experimental datasets: the AR dataset and the ExtYaleB dataset with three different types of occlusions. Moreover, we utilize the coefficient vectors obtained from (15). Following the previous studies [10], [45], the training samples remain uncorrupted, while the testing samples intentionally incorporate corruption. Then, we provide a comprehensive analysis encompassing both numerical and visual results, as detailed below.

Table 4 presents a comparative analysis of classification methods based on their experimental results, considering ACC and TIME. We note that integrating prior information into regression methods, including nuclear norm based matrix regression (NMR), faster NMR (FNMR), generalized iterated shrinkage algorithm (GISA), and our proposed methods, results in enhanced classification performance at the expense of increased computation time compared to the other methods, except for adaptive locality preserving regression (ALPR). This is attributed to the improved evaluation performance achieved by incorporating proper measurements for the noise. In contrast, the collaborative representation classifier (CRC), GISA, and unified linear regression (ULR) demonstrate lower computational costs, as they omit SVD computations in their iterative processes; this suggests that avoiding SVD computations can contribute to improved computational efficiency. In particular, we employ the four nonconvex functions, addressing each sample in parallel based on Algorithm 2. The results underscore the superiority of the proposed classification methods under various occlusions in both the AR and ExtYaleB datasets, owing to the simultaneous use of proper measurements for the residual error and the coefficient vector.

    Table  4.  Evaluating the quantitative effectiveness and efficiency of various classification methods on both the AR and ExtYaleB datasets, considering three various occlusions
    Method AR_sunglass AR_scarf ExtYaleB_baboon
    ACC (%) TIME (s) ACC (%) TIME (s) ACC (%) TIME (s)
    CRC [66] 53.167 3.6 67.500 3.5 55.483 14.9
    NMR [28] 73.000 340.5 76.333 366.2 96.491 829.5
    FNMR [28] 73.000 216.5 76.167 232.1 96.491 489.2
    ALPR [67] 41.667 207.5 51.167 145.3 44.518 387.4
    ULR* [14] 50.833 37.1 66.667 36.3 52.851 77.1
    ULR** [14] 50.833 11.5 67.883 11.6 52.851 22.5
    GISA2/3 [68] 50.000 6.2 65.667 6.0 63.816 20.1
    GISA1/2 [68] 50.500 6.1 66.000 6.1 63.816 20.9
    ADMMp 83.000 98.1 79.667 126.6 99.561 234.7
    ADMMETP 82.833 55.9 78.500 54.7 99.561 131.5
    ADMMLaplace 85.500 122.7 82.167 123.8 99.781 282.3
    ADMMGeman 85.833 120.3 82.333 115.6 99.781 266.9

    Figure 9 offers visual comparisons of our proposed methods using four different nonconvex functions for classification. In Figures 9(a) and (b), we illustrate the statistical distribution of the coefficient vector with normalization, using both the p-norm and the ETP function for the AR datasets. These visuals clearly reveal that the coefficient values for the same subjects have larger magnitudes, while those for different subjects tend to be lower. In contrast, Figures 9(c) and (d) display the reconstruction errors for the ExtYaleB dataset, with a focus on the Laplace and Geman functions. In these scenarios, the reconstruction errors for the same subjects as the testing samples exhibit lower values, whereas errors for different subjects demonstrate higher values. These visual results offer valuable insights into the efficiency of our proposed methods.

    Figure  9.  Visual results from the AR dataset displaying (a) the reconstruction errors and (b) the statistical distribution of the coefficient vector for p-norm and ETP functions, and (c) and (d) similar results from the ExtYaleB dataset for the Laplace and Geman functions. These outcomes are associated with four different nonconvex functions.

    This subsection focuses on evaluating the visual comparisons of our proposed clustering and classification methods in terms of their reconstruction capabilities. We utilize two experimental datasets with illumination corruptions and various occlusions. Subsequently, we present the reconstructed and residual images generated under four distinct nonconvex functions. Meanwhile, the visual results will offer valuable insights and observations, as follows:

    • For the clustering task, our focus is on presenting visual examples in Figures 10(a)–(d) from the top to the bottom row. These images showcase a sequence of reconstructed images alongside corresponding residual images, obtained from the ExtYaleB dataset. Upon a detailed examination of these samples, it becomes apparent that our methods excel at mitigating illumination effects in the clustered images compared to Figure 5. Importantly, this reinforces the observations made during the validation of their reconstruction ability.

    Figure  10.  Partial face reconstruction and residual images generated using subspace clustering for 10 subjects. These were achieved through the utilization of four distinct nonconvex functions in our proposed methods, all applied to the ExtYaleB dataset.

• For the classification task, we present visual results using face images from the AR and ExtYaleB datasets in Figures 11(a)–(d) from left to right, demonstrating the outcomes of the proposed methods employing four distinct nonconvex functions for processing the raw images shown in Figure 6. Additionally, we evaluate the reconstruction capabilities of these methods by analyzing the differences and similarities in reducing occlusions to some extent. It is noteworthy that among the relaxation functions used in our proposed methods, those based on the involved nonconvex functions consistently yield superior reconstruction performance in the regression process and analysis.

    Figure  11.  Partial face reconstruction and residual images obtained within the context of image classification. This was accomplished using four distinct nonconvex functions, with input data sourced from two datasets: the AR dataset, which contained images with sunglass and scarf occlusions, and the ExtYaleB dataset with baboon occlusions.

This work focuses on the development of efficient and effective nonconvex multiple ADMM techniques, enhanced with randomized SVD acceleration, factorization, and/or parallel computing approaches. These techniques are designed to address a class of generalized nonconvex rank regularization problems under equality constraints, effectively extending the scope of problems like RMC, LRR, and RMR. By deriving update rules for the involved subproblems, which benefit from closed-form solutions and milder assumptions, we established the boundedness of one, and in some cases two, dual variables. Additionally, we delved into the computational complexity and provable convergence properties of the variable sequence generated by our approach. Our numerical experiments, conducted on real-world data, consistently confirm the theoretical properties, demonstrating the efficiency of our methodology. Extensive experiments in image inpainting, clustering, face classification, and reconstruction clearly showcase the superiority of our methods over most related ones in terms of both performance and efficiency.

    As part of the future work, it will be pivotal to tackle the challenges associated with accelerating generalized proximal operators and conduct comprehensive global convergence analyses. This holds particular significance for advancing nonconvex first-order algorithms, especially in multi-variable low-rank matrix learning scenarios [54]. Additionally, we aim to broaden the applicability of the proposed algorithms by addressing nonconvex tensor recovery problems [16], [69]–[71]. Furthermore, we also plan to explore the integration of specific auxiliary information [15], [32], [67], [72], which could contribute significantly to enhancing the evaluation performance of the proposed methods.

    This work was supported by the Ministry of Education, Republic of Singapore, through its Start-Up Grant and Academic Research Fund Tier 1 (Grant No. RG61/22), and in part by the National Natural Science Foundation of China (Grant No. 61906067), and the China Postdoctoral Science Foundation (Grant Nos. 2019M651415 and 2020T13019).

[1]
    N. He, R. L. Wang, J. Lyu, et al., “Low-rank combined adaptive sparsifying transform for blind compressed sensing image recovery,” Chinese Journal of Electronics, vol. 29, no. 4, pp. 678–685, 2020. DOI: 10.1049/cje.2020.05.014
    [2]
    Z. X. Hu, F. P. Nie, R. Wang, et al., “Low rank regularization: A review,” Neural Networks, vol. 136, pp. 218–232, 2021. DOI: 10.1016/j.neunet.2020.09.021
    [3]
    K. W. Tang, J. Zhang, C. S. Zhang, et al., “Unsupervised, supervised and semi-supervised dimensionality reduction by low-rank regression analysis,” Chinese Journal of Electronics, vol. 30, no. 4, pp. 603–610, 2021. DOI: 10.1049/cje.2021.05.002
    [4]
    Y. Q. Cai, X. W. Lu, and N. Jiang, “A survey on quantum image processing,” Chinese Journal of Electronics, vol. 27, no. 4, pp. 718–727, 2018. DOI: 10.1049/cje.2018.02.012
    [5]
    Z. Zhang, Y. Xu, J. Yang, et al., “A survey of sparse representation: Algorithms and applications,” IEEE Access, vol. 3, pp. 490–530, 2015. DOI: 10.1109/ACCESS.2015.2430359
    [6]
    Y. Chen, J. Yang, L. Luo, et al., “Adaptive noise dictionary construction via IRRPCA for face recognition,” Pattern Recognition, vol. 59, pp. 26–41, 2016. DOI: 10.1016/j.patcog.2016.02.005
    [7]
E. J. Candès, X. D. Li, Y. Ma, et al., “Robust principal component analysis?,” Journal of the ACM, vol. 58, no. 3, article no. 11, 2011. DOI: 10.1145/1970392.1970395
    [8]
    E. J. Candes and T. Tao, “The power of convex relaxation: Near-optimal matrix completion,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010. DOI: 10.1109/TIT.2010.2044061
    [9]
F. P. Nie, H. Wang, H. Huang, et al., “Joint Schatten p-norm and ℓp-norm robust matrix completion for missing value recovery,” Knowledge and Information Systems, vol. 42, no. 3, pp. 525–544, 2015. DOI: 10.1007/s10115-013-0713-z
    [10]
    G. C. Liu, Z. C. Lin, S. C. Yan, et al., “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013. DOI: 10.1109/TPAMI.2012.88
    [11]
    J. C. Xie, J. Yang, J. J. Qian, et al., “Robust nuclear norm-based matrix regression with applications to robust face recognition,” IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2286–2295, 2017. DOI: 10.1109/TIP.2017.2662213
    [12]
    H. M. Zhang, S. Y. Li, J. Qiu, et al., “Efficient and effective nonconvex low-rank subspace clustering via SVT-free operators,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 12, pp. 7515–7529, 2023. DOI: 10.1109/TCSVT.2023.3275299
    [13]
    L. Yang, T. K. Pong, and X. J. Chen, “Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction,” SIAM Journal on Imaging Sciences, vol. 10, no. 1, pp. 74–110, 2017. DOI: 10.1137/15M1027528
    [14]
    H. M. Zhang, F. Qian, B. Zhang, et al., “Incorporating linear regression problems into an adaptive framework with feasible optimizations,” IEEE Transactions on Multimedia, vol. 25, pp. 4041–4051, 2023. DOI: 10.1109/TMM.2022.3171088
    [15]
    J. M. Liu, Y. J. Chen, J. S. Zhang, et al., “Enhancing low-rank subspace clustering by manifold regularization,” IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4022–4030, 2014. DOI: 10.1109/TIP.2014.2343458
    [16]
    Y. Y. Chen, S. Q. Wang, C. Peng, et al., “Generalized nonconvex low-rank tensor approximation for multi-view subspace clustering,” IEEE Transactions on Image Processing, vol. 30, pp. 4022–4035, 2021. DOI: 10.1109/TIP.2021.3068646
    [17]
    J. J. Qian, W. K. Wong, H. M. Zhang, et al., “Joint optimal transport with convex regularization for robust image classification,” IEEE Transactions on Cybernetics, vol. 52, no. 3, pp. 1553–1564, 2022. DOI: 10.1109/TCYB.2020.2991219
    [18]
