J/MNRAS/506/1651       XGBoost ML classifier of BASS DR3 sources     (Li+, 2021)

Identification of BASS DR3 sources as stars, galaxies, and quasars by XGBoost.
    Li C., Zhang Y., Cui C., Fan D., Zhao Y., Wu X.-B., He B., Xu Y., Li S.,
    Han J., Tao Y., Mi L., Yang H., Yang S.
    <Mon. Not. R. Astron. Soc., 506, 1651-1664 (2021)>
    =2021MNRAS.506.1651L
ADC_Keywords: Surveys ; Galaxies ; Stars, normal ; YSOs ; Optical ;
              Infrared sources ; Photometry ; Spectral types

Keywords: methods: data analysis - methods: statistical -
          astronomical data bases: miscellaneous - catalogues -
          stars: general - galaxies: general

Abstract:
    The Beijing-Arizona Sky Survey (BASS) Data Release 3 (DR3) catalogue,
    released in 2019, contains the data from all BASS and Mosaic z-band
    Legacy Survey (MzLS) observations between 2015 January and 2019 March,
    about 200 million sources. We cross-match BASS DR3 with the spectral
    data bases of the Sloan Digital Sky Survey (SDSS) and the Large Sky
    Area Multi-object Fiber Spectroscopic Telescope (LAMOST) to obtain the
    spectroscopic classes of known samples. The samples are then
    cross-matched with the ALLWISE data base. Based on the optical and
    infrared information of the samples, we use the XGBoost algorithm to
    construct different classifiers, both binary and multiclass. With the
    best input patterns, the accuracy of these classifiers exceeds 90.0 per
    cent. Finally, all selected sources in the BASS DR3 catalogue are
    classified, and each source is assigned a classification label and
    probabilities by each of the classifiers. Where the binary and
    multiclass classifications using optical and infrared information
    agree, the numbers of star, galaxy, and quasar candidates are
    12 375 838 (PS > 0.95), 18 606 073 (PG > 0.95), and 798 928
    (PQ > 0.95), respectively. For sources without infrared information,
    the predicted results can serve as a reference. These candidates may be
    used as input catalogues for LAMOST, DESI (Dark Energy Spectroscopic
    Instrument), or other projects for follow-up observation. The
    classification results will be a useful reference for future research
    on BASS DR3 sources.
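The candidate selection described in the abstract (binary and multiclass predictions agree, and the probability exceeds 0.95) can be sketched in Python as below. This is an illustrative sketch, not the authors' code. It assumes each record is a dict keyed by the bass_dr3.sam column labels (Classbi, Classmi, Pmi), and it assumes that Pmi is the probability of the predicted class and can stand in for PS, PG or PQ; the paper's exact definitions of PS, PG and PQ may differ. The function name select_candidates is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Label codes from Note (4): 0 = quasar, 1 = star, 2 = galaxy.
LABELS = {0: "quasar", 1: "star", 2: "galaxy"}


def select_candidates(rows: Iterable[Dict[str, Optional[float]]],
                      threshold: float = 0.95) -> Counter:
    """Count candidates whose binary-chain label (Classbi) agrees with the
    multiclass label (Classmi) and whose Pmi exceeds the threshold.

    ASSUMPTION: Pmi is treated as the probability of the predicted class.
    """
    counts: Counter = Counter()
    for row in rows:
        cb, cm, pm = row.get("Classbi"), row.get("Classmi"), row.get("Pmi")
        # Sources without an AllWISE counterpart carry null (None) values.
        if cb is None or cm is None or pm is None:
            continue
        if cb == cm and pm > threshold:
            counts[LABELS[int(cb)]] += 1
    return counts


if __name__ == "__main__":
    demo = [
        {"Classbi": 1, "Classmi": 1, "Pmi": 0.99},
        {"Classbi": 2, "Classmi": 2, "Pmi": 0.97},
        {"Classbi": 0, "Classmi": 1, "Pmi": 0.99},  # labels disagree
        {"Classbi": 0, "Classmi": 0, "Pmi": 0.90},  # below threshold
        {"Classbi": None, "Classmi": None, "Pmi": None},  # no AllWISE match
    ]
    print(select_candidates(demo))
```

Requiring agreement between two independently trained classifiers trades completeness for purity; the threshold argument makes it easy to see how the candidate counts change with the probability cut.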
Description:
    BASS used the 2.3-m Bok telescope to obtain g- and r-band imaging over
    about 5400 deg2 in the northern Galactic cap at δ > 30 degrees, while
    MzLS used the 4-m Mayall telescope to obtain z-band imaging over a
    similar area (δ > 32 degrees). BASS DR3, released in 2019, contains the
    data from all BASS and MzLS observations between 2015 January and 2019
    March. DR3 includes a single-epoch photometric catalogue and a coadded
    photometric catalogue. Sources in DR3 are detected in stacked images and
    must be detected in at least two bands. We use the coadded photometric
    catalogue, which contains about 200 million sources (Zou et al.,
    2019ApJS..245....4Z). Based on the median 5σ AB magnitude depths, we
    clean the catalogue by removing out-of-range values and bad-pixel data,
    leaving 110 896 598 BASS DR3 sources for classification.

    Sources from different data bases are associated by positional
    cross-match. BASS DR3 is cross-matched with the SDSS and LAMOST
    spectroscopic data bases to obtain spectroscopic classes; the result is
    named Sample I (BASS-SDSS-LAMOST). Sample I is then cross-matched with
    ALLWISE within a 4 arcsec radius using the CDS Upload X-Match function
    of TOPCAT, giving Sample II (BASS-SDSS-LAMOST-ALLWISE), the known
    sample with identified spectral classes. All photometry in the samples
    is corrected for extinction following Schindler et al.
    (2017ApJ...851...13S), and AB magnitudes are adopted.

    Section 3 of the paper introduces the XGBoost machine-learning model,
    and sections 4.1-4.3 describe the classifiers, in particular how the
    choice of feature input pattern affects the binary and multiclass
    XGBoost classifiers. The performance of each classifier is reported for
    each candidate input pattern, and the patterns giving the best accuracy
    and performance are selected.
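A positional cross-match such as the 4 arcsec ALLWISE match above pairs each source with its nearest counterpart inside the search radius. The brute-force Python sketch below only illustrates that idea; it is not the CDS Upload X-Match service in TOPCAT that was actually used, and the function names ang_sep_arcsec and nearest_within are hypothetical. Separations use the haversine form of the great-circle distance, which stays numerically accurate at arcsecond scales and across the RA = 0/360 degree boundary.

```python
import math
from typing import Optional, Sequence, Tuple

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0


def ang_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return 2.0 * math.asin(min(1.0, math.sqrt(h))) * ARCSEC_PER_RADIAN


def nearest_within(source: Tuple[float, float],
                   catalogue: Sequence[Tuple[float, float]],
                   radius_arcsec: float = 4.0) -> Optional[int]:
    """Index of the closest catalogue position within radius_arcsec, else None."""
    best_index, best_sep = None, radius_arcsec
    for i, (ra, dec) in enumerate(catalogue):
        sep = ang_sep_arcsec(source[0], source[1], ra, dec)
        if sep <= best_sep:
            best_index, best_sep = i, sep
    return best_index


if __name__ == "__main__":
    bass_source = (180.0, 45.0)
    wise_positions = [(180.0, 45.0 + 10.0 / 3600.0),  # 10 arcsec away
                      (180.0, 45.0 + 2.0 / 3600.0)]   # 2 arcsec away
    print(nearest_within(bass_source, wise_positions))
```

The loop costs O(N·M) and is only suitable for small lists. For catalogues of this size a spatial index is needed, for example astropy's `SkyCoord.match_to_catalog_sky` followed by a cut on the returned separation.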
    Our goal is to separate galaxies, stars, and quasars with photometric
    data (BASS optical filters g, r, z; ALLWISE mid-infrared W1, W2) using
    XGBoost. Compared with cuts in 1D histograms or 2D scatter plots,
    XGBoost can take all features into account at once. For our data,
    XGBoost also outperforms random forest in efficiency (see the best
    efficiencies summarized in table 7, section 4.4 Discussion, of the
    paper).

    The 110 896 598 BASS DR3 sources are first classified into extended
    sources (galaxies) and point sources (stars and quasars) by Classifier 1;
    the point sources are then separated into stars and quasars by
    Classifier 2. BASS DR3 sources are also divided directly into galaxies,
    stars, and quasars by Classifier 3. Cross-matching BASS DR3 with the
    ALLWISE data base gives 43 859 467 BASS-DR3-ALLWISE sources. These are
    likewise grouped into extended and point sources by Classifier 4, the
    point sources are separated into stars and quasars by Classifier 5, and
    all BASS-DR3-ALLWISE sources are divided directly into galaxies, stars,
    and quasars by Classifier 6 (the six XGBoost classifiers are described
    in section 5, Application of classifiers). Finally, all predicted
    results for the BASS DR3 sources are combined into a single file,
    bass_dr3.sam.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
bass_dr3.sam     325     1000   Photometric data and classification results
                                 for BASS DR3 sources (110896599 sources)
--------------------------------------------------------------------------------

See also:
  VII/28 : Southern Groups and Clusters of Galaxies (Duus, Newell 1977)
   V/164 : LAMOST DR5 catalogs (Luo+, 2019)
  II/328 : AllWISE Data Release (Cutri+, 2013)

Byte-by-byte Description of file: bass_dr3.sam
--------------------------------------------------------------------------------
   Bytes Format Units Label      Explanations
--------------------------------------------------------------------------------
   1- 11 I11    ---   BASS       BASS identifier number (id)
  13- 26 F14.10 deg   RAdeg      Right ascension (J2000) (ra)
  28- 41 F14.10 deg   DEdeg      Declination (J2000) (dec)
  43- 50 F8.5   mag   gmagIso    Apparent AB magnitude in g-band measured
                                   with the isophotal aperture (magisog)
  52- 60 F9.5   mag   e_gmagIso  Mean error on gmagIso (magerriso_g)
  62- 70 F9.5   mag   gmagKron   Apparent AB magnitude in g-band measured
                                   with the Kron aperture (magkrong) (1)
  72- 82 F11.5  mag   e_gmagKron Mean error on gmagKron (magerrkron_g)
  84- 91 F8.5   mag   gmag       Apparent AB magnitude in g-band measured by
                                   fitting a point spread function
                                   (magpsfg) (2)
  93-101 F9.5   mag   e_gmag     Mean error on gmag (magerrpsf_g)
 103-110 F8.5   mag   rmagIso    Apparent AB magnitude in r-band measured
                                   with the isophotal aperture (magisor)
 112-120 F9.5   mag   e_rmagIso  Mean error on rmagIso (magerriso_r)
 122-130 F9.5   mag   rmagKron   Apparent AB magnitude in r-band measured
                                   with the Kron aperture (magkronr) (1)
 132-141 F10.5  mag   e_rmagKron Mean error on rmagKron (magerrkron_r)
 143-150 F8.5   mag   rmag       Apparent AB magnitude in r-band measured by
                                   fitting a point spread function
                                   (magpsfr) (2)
 152-159 F8.5   mag   e_rmag     Mean error on rmag (magerrpsf_r)
 161-168 F8.5   mag   zmagIso    Apparent AB magnitude in z-band measured
                                   with the isophotal aperture (magisoz)
 170-179 F10.5  mag   e_zmagIso  Mean error on zmagIso (magerriso_z)
 181-190 F10.5  mag   zmagKron   Apparent AB magnitude in z-band measured
                                   with the Kron aperture (magkronz) (1)
 192-202 F11.5  mag   e_zmagKron Mean error on zmagKron (magerrkron_z)
 204-211 F8.5   mag   zmag       Apparent AB magnitude in z-band measured by
                                   fitting a point spread function
                                   (magpsfz) (2)
 213-220 F8.5   mag   e_zmag     Mean error on zmag (magerrpsf_z)
     222 I1     ---   Type       [0/1] Combined star/galaxy separation based
                                   on the difference between Kron and PSF
                                   magnitudes in all bands: 1 = star,
                                   0 = galaxy (Type)
 224-232 F9.6   ---   Typeann    Star/galaxy separation based on an
                                   Artificial Neural Network (ANN; e.g.
                                   Li et al., 2006PABei..24..285L):
                                   >0.5 for star, <0.5 for galaxy (Type_ann)
 234-241 F8.5   mag   rmagcor    Extinction-corrected apparent AB magnitude
                                   in r-band measured by fitting a point
                                   spread function (rMag) (3)
 243-250 F8.5   mag   gmagcor    Extinction-corrected apparent AB magnitude
                                   in g-band measured by fitting a point
                                   spread function (gMag) (3)
 252-259 F8.5   mag   zmagcor    Extinction-corrected apparent AB magnitude
                                   in z-band measured by fitting a point
                                   spread function (zMag) (3)
     261 I1     ---   Classb     [0/2] Classification label from the first
                                   (b1) and second (b2) classifiers
                                   (CLASS_b) (4)
 263-270 F8.6   ---   Pb2        ?=- Classification probability from the
                                   second classifier (P_b2) (6)
     272 I1     ---   Classm     [0/2] Classification label from the third
                                   (m) classifier (CLASS_m) (4)
 274-281 F8.6   ---   Pm         Classification probability from the third
                                   classifier (P_m) (7)
 283-290 F8.6   ---   Pb1        Classification probability from the first
                                   classifier (P_b1) (5)
     292 I1     ---   Classbi    ?=- [0/2] Classification label from the
                                   fourth (bi1) and fifth (bi2) classifiers
                                   (CLASS_bi) (4)
 296-303 F8.6   ---   Pbi2       ?=- Classification probability from the
                                   fifth classifier (P_bi2) (9)
     305 I1     ---   Classmi    ?=- [0/2] Classification label from the
                                   sixth classifier (CLASS_mi) (4)
 309-316 F8.6   ---   Pmi        ?=- Classification probability from the
                                   sixth classifier (P_mi) (10)
 318-325 F8.6   ---   Pbi1       ?=- Classification probability from the
                                   fourth classifier (P_bi1) (8)
--------------------------------------------------------------------------------
Note (1): The Kron aperture is defined by Kron (1980ApJS...43..305K): the
    source flux is integrated within a Kron radius chosen so that the
    aperture encloses a large fraction of the total light, commonly about
    90 per cent. Because it captures the diffuse and often irregular light
    profiles of extended objects, the Kron magnitude is a better measure
    than the PSF magnitude for resolved sources (e.g. galaxies).
Note (2): For unresolved point sources (e.g. stars and quasars), the PSF
    magnitude, obtained by fitting a point spread function (PSF) to the
    source, is the best measure.
Note (3): Extinction-corrected following Schindler et al.
    (2017ApJ...851...13S).
Note (4): Classification labels are defined as follows:
      0 = quasar
      1 = star
      2 = galaxy
    Classbi and Classmi are null when the object has no AllWISE
    counterpart; about 40 per cent of BASS DR3 sources have one.
Note (5): Classifier 1 (binary logistic) first separates extended sources
    (galaxies) from point sources (stars and quasars), with input pattern
    (δg, δr, δz, g-r, g-z, r, r-z, g). Here δx = xmag - xmagKron (PSF minus
    Kron magnitude) for each optical filter x = g, r, z, and g, r, z denote
    extinction-corrected magnitudes (see section 5, Application of
    classifiers).
Note (6): Classifier 2 (binary logistic) then separates the point sources
    into stars and quasars, with input pattern (δg, δr, δz, g-r, r-z, g-z,
    r, g, z), where δx = xmag - xmagKron for x = g, r, z and g, r, z denote
    extinction-corrected magnitudes. Pb2 is null for objects classified as
    galaxies by Classifier 1 (Pb1) (see section 5).
Note (7): Classifier 3 (multiclass softmax) divides BASS DR3 sources
    directly into galaxies, stars, and quasars, with input pattern (δg, δr,
    δz, g-r, g-z, r, r-z, z), where δx = xmag - xmagKron for x = g, r, z
    and g, r, z denote extinction-corrected magnitudes (see section 5).
Note (8): Cross-matching BASS DR3 with the ALLWISE data base gives
    43 859 467 BASS-DR3-ALLWISE sources, about 40 per cent of the
    110 896 598 objects. As for BASS DR3, Classifier 4 (binary logistic)
    first groups these sources into extended sources (galaxies) and point
    sources (stars and quasars), with input pattern (δg, δz, g-W1, W1-W2,
    z-W1, δr, g-z, z-W2, g-r, r-z, W1, r, g, z, r-W2), where
    δx = xmag - xmagKron for x = g, r, z, and g, r, z and the AllWISE
    infrared magnitudes W1, W2 are extinction-corrected. Pbi1 is computed
    only for BASS-DR3-ALLWISE cross-matched sources (see section 5).
Note (9): Classifier 5 (binary logistic) further separates the
    BASS-DR3-ALLWISE point sources into stars and quasars, with input
    pattern (-W2, W1-W2, g-z, g-r, z-W1, r-z, δz, r, r-W2, z, δg, g-W1, δr,
    W1, g-W2), where δx = xmag - xmagKron for x = g, r, z, and g, r, z, W1,
    W2 are extinction-corrected magnitudes. Pbi2 is null for objects
    classified as galaxies by Classifier 4 (Pbi1) (see section 5).
Note (10): Classifier 6 (multiclass softmax) divides BASS-DR3-ALLWISE
    sources directly into galaxies, stars, and quasars, with input pattern
    (z-W2, δz, W1-W2, δr, g-r, z-W1, δg, g-z, r-W2, r-z, r), where
    δx = xmag - xmagKron for x = g, r, z, and g, r, z, W1, W2 are
    extinction-corrected magnitudes. Pmi is computed only for
    BASS-DR3-ALLWISE cross-matched sources (see section 5).
--------------------------------------------------------------------------------

History:
    Copied at https://nadc.china-vo.org/res/r101066/
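Notes (5) and (6) describe the input features and the two-step binary chain behind Classb. The Python sketch below builds the Classifier 1 input pattern from one record and combines two binary outputs into a 0/1/2 label. It assumes records are dicts keyed by the bass_dr3.sam labels, that Pb1 is the probability of being a point source and Pb2 the probability of being a star, and that both are cut at 0.5. These interpretations and the helper names classifier1_features and chain_label are hypothetical and are not stated in this ReadMe; no trained XGBoost model is included.

```python
from typing import Dict, List, Optional

QUASAR, STAR, GALAXY = 0, 1, 2  # label codes from Note (4)


def classifier1_features(row: Dict[str, float]) -> List[float]:
    """Input pattern of Classifier 1, in the order given in Note (5):
    (dg, dr, dz, g-r, g-z, r, r-z, g)."""
    # delta_x = xmag - xmagKron: PSF minus Kron magnitude in band x.
    delta = {b: row[f"{b}mag"] - row[f"{b}magKron"] for b in "grz"}
    # Colours and single magnitudes use the extinction-corrected columns.
    g, r, z = row["gmagcor"], row["rmagcor"], row["zmagcor"]
    return [delta["g"], delta["r"], delta["z"], g - r, g - z, r, r - z, g]


def chain_label(p_point: float, p_star: Optional[float],
                cut: float = 0.5) -> int:
    """Combine two binary outputs into a 0/1/2 label (hypothetical cuts).

    p_point: ASSUMED probability of being a point source (Classifier 1).
    p_star:  ASSUMED probability of being a star (Classifier 2); None when
             Classifier 2 was not run, as for galaxies in Note (6).
    """
    if p_point < cut:
        return GALAXY
    if p_star is None:
        raise ValueError("point source needs a Classifier 2 probability")
    return STAR if p_star >= cut else QUASAR


if __name__ == "__main__":
    row = {"gmag": 20.0, "gmagKron": 19.5, "rmag": 19.0, "rmagKron": 18.75,
           "zmag": 18.5, "zmagKron": 18.25,
           "gmagcor": 19.75, "rmagcor": 18.5, "zmagcor": 18.25}
    print(classifier1_features(row))
    print(chain_label(0.9, 0.2))
```

Because the δx terms compare two magnitudes in the same band, they isolate how concentrated a source's light is, which is why they separate extended from point-like sources; the colours carry the spectral information that separates stars from quasars.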
(End) Luc Trabelsi [CDS] 21-Jun-2024
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation it is possible to generate an f77 program to load the files into arrays or line by line.
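As an alternative to a generated f77 loader, the byte-by-byte description of bass_dr3.sam can drive a small Python reader. The sketch below is a minimal version, assuming the file is plain ASCII with one 325-byte record per line and that both blank fields and "-" (the ?=- null marker) denote missing values. COLUMNS, parse_record and read_catalogue are hypothetical names, not part of any CDS tool; the byte ranges are transcribed from the table above.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

Value = Optional[Union[int, float]]

# (first byte, last byte, label, type), 1-based and inclusive, transcribed
# from "Byte-by-byte Description of file: bass_dr3.sam".
COLUMNS: List[Tuple[int, int, str, type]] = [
    (1, 11, "BASS", int), (13, 26, "RAdeg", float), (28, 41, "DEdeg", float),
    (43, 50, "gmagIso", float), (52, 60, "e_gmagIso", float),
    (62, 70, "gmagKron", float), (72, 82, "e_gmagKron", float),
    (84, 91, "gmag", float), (93, 101, "e_gmag", float),
    (103, 110, "rmagIso", float), (112, 120, "e_rmagIso", float),
    (122, 130, "rmagKron", float), (132, 141, "e_rmagKron", float),
    (143, 150, "rmag", float), (152, 159, "e_rmag", float),
    (161, 168, "zmagIso", float), (170, 179, "e_zmagIso", float),
    (181, 190, "zmagKron", float), (192, 202, "e_zmagKron", float),
    (204, 211, "zmag", float), (213, 220, "e_zmag", float),
    (222, 222, "Type", int), (224, 232, "Typeann", float),
    (234, 241, "rmagcor", float), (243, 250, "gmagcor", float),
    (252, 259, "zmagcor", float),
    (261, 261, "Classb", int), (263, 270, "Pb2", float),
    (272, 272, "Classm", int), (274, 281, "Pm", float),
    (283, 290, "Pb1", float),
    (292, 292, "Classbi", int), (296, 303, "Pbi2", float),
    (305, 305, "Classmi", int), (309, 316, "Pmi", float),
    (318, 325, "Pbi1", float),
]


def parse_record(line: str) -> Dict[str, Value]:
    """Parse one fixed-width record; blank or '-' fields become None."""
    # Pad in case trailing blanks were stripped from the record.
    line = line.rstrip("\r\n").ljust(325)
    record: Dict[str, Value] = {}
    for first, last, label, kind in COLUMNS:
        text = line[first - 1:last].strip()
        record[label] = None if text in ("", "-") else kind(text)
    return record


def read_catalogue(path: str) -> Iterator[Dict[str, Value]]:
    """Yield parsed records one at a time, so the full file is never held in memory."""
    with open(path, encoding="ascii") as handle:
        for line in handle:
            if line.strip():
                yield parse_record(line)
```

Streaming with a generator matters for the full catalogue of about 1.1e8 rows; for the 1000-record sample, `list(read_catalogue("bass_dr3.sam"))` is enough.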