LAMOST Assisting Machine Learning in Multi-wavelength Data Classification

Since the 20th century, with the development and deployment of various ground- and space-based observational facilities, astronomical observations have come to cover the entire electromagnetic spectrum, from radio waves through infrared, visible and ultraviolet light to X-rays and gamma rays. Astronomy has entered a new era of full-band coverage, big data and massive information.


The abundant data provide opportunities to explore the nature of all kinds of celestial bodies and astronomical phenomena, but their tremendous volume, quality and complexity also pose challenges for astronomers.


In their new study, Profs. Yanxia Zhang and Yongheng Zhao from the National Astronomical Observatories of the Chinese Academy of Sciences (NAOC) and Prof. Xuebing Wu from Peking University used cross-identification to construct samples with information from the X-ray, optical and/or infrared bands. With machine learning methods, they built optimal classifiers suited to the samples in different bands, providing classification predictions and probabilities for the sources in 4XMM-DR9, a catalogue of the X-ray Multi-Mirror Mission (XMM-Newton).


The results have been published in Monthly Notices of the Royal Astronomical Society.


Figure 1: Multi-band observation related to this work. (Image by Yue Wang) 


The released serendipitous source catalogue of XMM-Newton, 4XMM-DR9, includes more than half a million unique sources. It provides abundant observational information in the X-ray band for celestial X-ray sources and helps scientists to uncover cosmic mysteries, such as black holes, the formation and evolution of galaxies, and the origins of the universe.


"Since most of the observed X-ray sources in the 4XMM-DR9 catalogue are unknown, it is of great importance to classify them," said Prof. Yanxia Zhang, the first author of the research paper.


Taking advantage of the spectra obtained with the Large Sky Area Multi-object Fiber Spectroscopic Telescope (LAMOST) and the Sloan Digital Sky Survey (SDSS), the researchers obtained samples with known spectral classes. By cross-matching the XMM-Newton data with those from SDSS and the Wide-field Infrared Survey Explorer (WISE), they obtained multi-wavelength data covering the X-ray, optical and infrared bands.
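
To give a sense of what positional cross-matching involves, here is a minimal sketch in Python. It is not the researchers' pipeline: the function names, the 3-arcsecond matching radius and the coordinates are all assumptions made for illustration, and a real catalogue of this size would need an indexed search rather than this brute-force loop.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small angles.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def cross_match(xray, optical, radius_arcsec=3.0):
    """Map each X-ray source to its nearest optical source within the radius.

    Both inputs are lists of (id, ra_deg, dec_deg) tuples. The radius is an
    assumed value, not one taken from the study.
    """
    matches = {}
    for x_id, x_ra, x_dec in xray:
        best_id, best_sep = None, radius_arcsec
        for o_id, o_ra, o_dec in optical:
            sep = angular_separation_arcsec(x_ra, x_dec, o_ra, o_dec)
            if sep <= best_sep:
                best_id, best_sep = o_id, sep
        if best_id is not None:
            matches[x_id] = best_id
    return matches

# Made-up coordinates, for illustration only.
xray_sources = [("X1", 150.0000, 2.0000), ("X2", 210.5000, -1.2500)]
optical_sources = [("O1", 150.0003, 2.0002), ("O2", 180.0000, 45.0000)]
matches = cross_match(xray_sources, optical_sources)
print(matches)
```

In this toy example only the first X-ray source has an optical neighbour inside the radius, so the second one is left without a counterpart.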


Various machine learning methods were applied to the samples from different bands, and for each sample the best-performing model and its best input pattern were selected. The resulting models were then used to predict a class, together with class membership probabilities, for all the X-ray sources in the 4XMM-DR9 catalogue.
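
The idea of predicting a class together with a membership probability can be sketched in a few lines of Python. The article does not name the algorithms used, so this example substitutes a simple k-nearest-neighbour vote as a stand-in; the two input features, the training values and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict_proba(train_X, train_y, query, k=3):
    """Class-membership probabilities from the k nearest labelled samples."""
    nearest = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    # The fraction of neighbours in each class serves as its probability.
    return {label: count / len(nearest) for label, count in votes.items()}

def classify(train_X, train_y, query, k=3):
    """Return the most probable class and its probability."""
    proba = knn_predict_proba(train_X, train_y, query, k)
    label = max(proba, key=proba.get)
    return label, proba[label]

# Hypothetical two-feature training sample with known spectral classes.
train_X = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15),
           (1.0, 0.9), (1.1, 1.0), (0.9, 1.1),
           (2.0, 0.3), (2.1, 0.2), (1.9, 0.35)]
train_y = ["STAR"] * 3 + ["GALAXY"] * 3 + ["QSO"] * 3

label, probability = classify(train_X, train_y, (1.05, 0.95))
print(label, probability)
```

A source whose neighbours belong to several classes would receive a lower probability for its predicted class, which flags it as a less secure classification.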


"Our classification results will be of great value for further research of X-ray sources in greater detail," said Prof. Yongheng Zhao. "This work also shows the superiority of machine learning in the big data era of astronomy."


Figure 2: Examples of the distribution of stars, galaxies, and quasars in 2D space. (Image by Yanxia Zhang)


The paper can be accessed at