This paper presents a hardware-optimized variant of the well-known Gaussian elimination algorithm and an efficient IEEE-754 single-precision FPGA implementation of it, which operates as an application function unit (AFU) in a loosely coupled reconfigurable computing prototype system. The design employs pipelined floating-point operators provided by the open-source FPLibrary. It is composed mainly of uniformly distributed entries, yielding a standalone worst-case runtime of O(n^2), as opposed to O(n^3) for the software implementation. The results indicate a 15-fold speedup compared with software running on a 2.6 GHz Pentium 4 CPU with 1 GB of main memory. To evaluate the hardware, a simple model of a reconfigurable system is also proposed, using a Xilinx ML555 board that connects to and communicates with a desktop computer via the PCIe port. DMA is used to transfer data blocks between the host and the AFU. To the best of the authors' knowledge, no efficient floating-point FPGA implementation for solving linear systems of equations (LSEs) has been reported in previous work.
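For orientation, the O(n^3) cost attributed to the software implementation can be seen in a textbook Gaussian elimination routine. The following is a minimal sketch only, not the authors' code or their hardware variant: the function name `gaussian_solve`, the use of partial pivoting, and the list-of-lists matrix layout are assumptions made for illustration, and Python floats are double precision rather than the IEEE-754 single precision used in the FPGA design.

```python
def gaussian_solve(a, b):
    """Solve a x = b by textbook Gaussian elimination (illustrative sketch).

    Partial pivoting is an assumption here; the abstract does not say
    which pivoting strategy, if any, the software baseline uses.
    """
    n = len(a)
    # Build an augmented matrix [a | b] so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    # Forward elimination: three nested loops over n give the O(n^3) cost.
    for k in range(n):
        # Choose the row with the largest magnitude entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[pivot] = m[pivot], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the resulting upper-triangular system: O(n^2).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

In this sequential form, the innermost row update is repeated for every row below each pivot. That row update is the kind of work a pipelined hardware design can overlap, although how the proposed AFU does so is not specified in this abstract.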