With the development of railways, higher speeds and more intensive rolling loads significantly increase the probability of rail fatigue damage, which poses a latent risk of major safety accidents. Detecting rail defects is therefore vital. Most commonly used algorithms are trained on defect samples, yet such data are scarce and cannot cover all defect types. In contrast, background samples are relatively easy to obtain. In this work, we propose a new algorithm that models the background of rail samples with memory items in order to make full use of the background information. Samples that do not fit the model are treated as defect candidates. The model is trained only on normal samples and a small number of defect samples. Because previously unseen patterns are likewise flagged as defect candidates, the model is not limited to known defect patterns. Additionally, the proposed metric learning modules enable the model to learn representative memory items, enlarge the feature-space distance between defect and normal samples, and reduce the influence of disturbances. To demonstrate its efficacy, the proposed algorithm is validated on various types of rail defects, ranging from visually apparent defects to visually weak ones. In the experiments, a thermography inspection system scans at a constant speed while continuously heating the rail track by induction, recording infrared thermal image sequences of the rail surface. Recent anomaly detection algorithms serve as baselines for objective performance evaluation. The algorithms are evaluated quantitatively using the area under the ROC curve (AUC) at both the image level and the pixel level, complemented by a visual comparative analysis.
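The core idea of scoring each sample by how poorly it matches learned background prototypes, and then measuring detection quality with image-level and pixel-level AUC, can be illustrated with a small sketch. This is not the authors' method; it is a minimal example under stated assumptions: the memory items are taken as already-learned feature vectors, the anomaly score of a feature is its cosine distance to the nearest memory item, the image-level score is the maximum pixel score, and AUC is computed with the pairwise (Mann-Whitney) formulation. All function names (`pixel_score`, `score_map`, `image_score`, `auc`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pixel_score(feature: Vector, memory: Sequence[Vector]) -> float:
    """Anomaly score: 1 - cosine similarity to the closest memory item.

    Assumption: features that match a learned background prototype score
    near 0; features that fit no prototype score higher (defect candidates).
    """
    return 1.0 - max(cosine(feature, item) for item in memory)


def score_map(features: Sequence[Sequence[Vector]],
              memory: Sequence[Vector]) -> List[List[float]]:
    """Per-pixel anomaly scores for a 2-D grid of feature vectors."""
    return [[pixel_score(f, memory) for f in row] for row in features]


def image_score(smap: Sequence[Sequence[float]]) -> float:
    """Assumption: the image-level score is the maximum pixel score."""
    return max(max(row) for row in smap)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(defect score > normal score), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defect and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: two background prototypes and a 2x2 feature map with one defect pixel.
memory = [[1.0, 0.0], [0.0, 1.0]]
features = [[[0.9, 0.1], [0.1, 0.9]],
            [[1.0, 0.0], [-1.0, -1.0]]]
mask = [[0, 0], [0, 1]]  # 1 marks the defect pixel

smap = score_map(features, memory)
pixel_auc = auc([s for row in smap for s in row], [y for row in mask for y in row])
print(pixel_auc)  # 1.0
```

In a real pipeline the memory items and features would come from a trained network, and an established implementation of ROC AUC would normally replace the quadratic pairwise loop, which is used here only because it makes the definition explicit.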