J/MNRAS/506/1651 XGBoost ML classifier of BASS DR3 sources (Li+, 2021)
Identification of BASS DR3 sources as stars, galaxies, and quasars by XGBoost.
Li C., Zhang Y., Cui C., Fan D., Zhao Y., Wu X.-B., He B., Xu Y., Li S.,
Han J., Tao Y., Mi L., Yang H., Yang S.
<Mon. Not. R. Astron. Soc., 506, 1651-1664 (2021)>
=2021MNRAS.506.1651L (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; Galaxies ; Stars, normal ; YSOs ; Optical ;
Infrared sources ; Photometry ; Spectral types
Keywords: methods: data analysis - methods: statistical -
astronomical data bases: miscellaneous - catalogues - stars: general -
galaxies: general
Abstract:
The Beijing-Arizona Sky Survey (BASS) Data Release 3 (DR3) catalogue,
released in 2019, contains the data from all BASS and Mosaic z-band
Legacy Survey (MzLS) observations taken between 2015 January and 2019
March, about 200 million sources. We cross-match BASS DR3
with spectral data bases from the Sloan Digital Sky Survey (SDSS) and
the Large Sky Area Multi-object Fiber Spectroscopic Telescope (LAMOST)
to obtain the spectroscopic classes of known samples. Then, the
samples are cross-matched with the ALLWISE data base. Based on optical and
infrared information of the samples, we use the XGBoost algorithm to
construct different classifiers, including binary classification and
multiclass classification. The accuracy of these classifiers with the
best input patterns is larger than 90.0 per cent. Finally, all
selected sources in the BASS DR3 catalogue are classified by these
classifiers. The classification label and probabilities for individual
sources are assigned by different classifiers. When the binary and
multiclass classifications based on optical and infrared information
agree, the numbers of star, galaxy, and quasar candidates are
12 375 838 (PS > 0.95), 18 606 073 (PG > 0.95), and 798 928
(PQ > 0.95), respectively. For sources without infrared information,
the predicted results can serve as a reference. These candidates may
be used as input catalogues for LAMOST, DESI (Dark Energy
Spectroscopic Instrument), or other projects for follow-up
observation. The classification results will be a useful reference
for future research on the BASS DR3 sources.
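The candidate selection above combines label agreement with a probability
cut. A minimal sketch over parsed records, assuming Classbi and Classmi are
the binary-scheme and multiclass labels based on optical and infrared data
(None when there is no AllWISE counterpart), and that p_binary and p_multi
are hypothetical fields holding each scheme's probability for the assigned
class. How PS, PG and PQ map onto the Pbi1, Pbi2 and Pmi columns is not
stated here, so requiring both probabilities to exceed 0.95 is an assumption.

```python
from typing import Dict, Iterable, List

# Label codes from Note (4) of this ReadMe.
QSO, STAR, GALAXY = 0, 1, 2


def consensus_candidates(rows: Iterable[Dict], label: int,
                         threshold: float = 0.95) -> List[Dict]:
    """Keep rows whose binary-scheme and multiclass labels both equal `label`
    and whose (hypothetical) class probabilities both exceed `threshold`."""
    selected = []
    for row in rows:
        if row["Classbi"] is None or row["Classmi"] is None:
            continue  # no AllWISE counterpart: no infrared-based labels
        if (row["Classbi"] == row["Classmi"] == label
                and row["p_binary"] > threshold
                and row["p_multi"] > threshold):
            selected.append(row)
    return selected
```

Counting `len(consensus_candidates(rows, STAR))` over the full table would
give a star count of the kind quoted above, under these assumptions.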
Description:
The BASS used the 2.3-m Bok telescope to take g- and r-band imaging
over a sky area of about 5400 deg2 in the northern Galactic cap at
δ > 30 degrees; MzLS used the 4-m Mayall telescope to obtain z-band
imaging over a similar sky area to BASS (δ > 32 degrees).
BASS DR3, released in 2019, contains the data from all BASS and MzLS
observations taken between 2015 January and 2019 March. DR3 includes
a single-epoch photometric catalogue and a coadded photometric
catalogue. Sources in DR3 are detected in stacked images and are
required to be identified in at least two bands.
We use the coadded photometric catalogue from BASS DR3, which contains
about 200 million sources (Zou et al. 2019ApJS..245....4Z). Based on
the median 5σ AB magnitude depths, we clean the BASS DR3 catalogue by
removing out-of-range or bad-pixel data, which leaves 110 896 598
selected sources for classification. Sources from different data bases
are associated by positional cross-match. Cross-matching BASS DR3 with
the SDSS and LAMOST spectral data bases gives Sample I
(BASS-SDSS-LAMOST); this sample is then cross-matched with ALLWISE
within a 4 arcsec radius using the CDS Upload X-Match function of
TOPCAT. The resulting BASS-SDSS-LAMOST-ALLWISE sample, named Sample II,
is the known sample with identified spectral classes. All photometry
in the samples is extinction-corrected following Schindler et al.
(2017ApJ...851...13S), and AB magnitudes are adopted.
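The positional criterion described above (nearest counterpart within
4 arcsec) can be sketched in a few lines. This is a brute-force
illustration assuming (RA, Dec) pairs in degrees; it does not reproduce the
CDS Upload X-Match service used in TOPCAT, and the function names are
hypothetical. For catalogues of this size a spatially indexed match (for
example astropy's SkyCoord.match_to_catalog_sky) would be used instead.

```python
import math
from typing import List, Optional, Sequence, Tuple


def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def crossmatch(src: Sequence[Tuple[float, float]],
               cat: Sequence[Tuple[float, float]],
               radius_arcsec: float = 4.0) -> List[Optional[int]]:
    """For each source, return the index of the nearest catalogue entry
    within `radius_arcsec`, or None when nothing lies inside the radius."""
    matches: List[Optional[int]] = []
    for ra, dec in src:
        best, best_sep = None, radius_arcsec
        for j, (cra, cdec) in enumerate(cat):
            sep = angular_sep_arcsec(ra, dec, cra, cdec)
            if sep <= best_sep:  # keep the closest candidate so far
                best, best_sep = j, sep
        matches.append(best)
    return matches
```

The double loop is O(N x M) and only meant to make the matching rule
explicit.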
Section 3 of the paper introduces the XGBoost machine-learning model,
and Sections 4.1-4.3 describe the classifier models, with emphasis on
how the choice of feature input pattern affects the binary and
multiclass XGBoost classifiers. The performance of the classifiers is
reported for each input pattern, and the patterns giving the best
accuracy and performance are selected.
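Choosing an input pattern by accuracy, as described above, amounts to
scoring each candidate pattern's predictions against the spectroscopic
classes of the known sample and keeping the best one. A minimal sketch: the
pattern names and predictions are hypothetical inputs, and the paper's
selection also weighs other performance measures that are not modelled here.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of sources whose predicted label equals the spectroscopic label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_pattern(predictions: Dict[str, Sequence[int]],
                 y_true: Sequence[int]) -> Tuple[str, float]:
    """Return (pattern name, accuracy) for the most accurate input pattern."""
    scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```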
Our goal is thus to separate galaxies, stars, and quasars with
photometric data (BASS optical filters: g, r, z; ALLWISE mid-IR: W1,
W2) using XGBoost. Unlike 1D histograms and 2D scatter plots, which use
only one or two features at a time, XGBoost can take all features into
account simultaneously. In our case, XGBoost also outperforms random
forest in terms of efficiency (see the best efficiencies summarized in
Table 7, Section 4.4 Discussion). BASS-DR3 sources (110 896 598) are
first classified into extended sources (galaxies) and point sources
(stars and quasars) by Classifier 1; the point sources are then
further separated into stars and quasars by Classifier 2. BASS-DR3
sources are also directly divided into galaxies, stars, and quasars by
Classifier 3. Cross-matching BASS-DR3 sources with the ALLWISE data
base yields the BASS-DR3-ALLWISE sources (43 859 467). As for the BASS
DR3 sources, BASS-DR3-ALLWISE sources are first grouped into extended
sources (galaxies) and point sources (stars and quasars) by Classifier
4, and the point sources are then discriminated into stars and quasars
by Classifier 5. BASS-DR3-ALLWISE sources are also directly
distinguished into galaxies, stars, and quasars by Classifier 6 (see
Section 5, Application of classifiers, for the six XGBoost
classifiers). Finally, all predicted results for the BASS DR3 sources
are combined into a single table, bass_dr3.sam.
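The two-stage scheme above (Classifier 1, then Classifier 2 for point
sources only) can be sketched as a small cascade. The sketch assumes each
classifier is a callable returning a probability, that Classifier 1 returns
the probability of a point source and Classifier 2 the probability of a
star, and a 0.5 decision threshold; these conventions are assumptions, not
taken from the paper. The same structure applies to Classifiers 4 and 5.

```python
from typing import Any, Callable, Optional, Tuple

QSO, STAR, GALAXY = 0, 1, 2  # label codes from Note (4)


def two_stage_label(features: Any,
                    clf1: Callable[[Any], float],
                    clf2: Callable[[Any], float],
                    threshold: float = 0.5) -> Tuple[int, float, Optional[float]]:
    """Cascade: clf1 separates galaxies from point sources, and clf2 is run
    only on point sources. Returns (label, p1, p2); p2 is None for galaxies,
    mirroring the empty Pb2 values in bass_dr3.sam."""
    p_point = clf1(features)  # hypothetical: P(point source)
    if p_point < threshold:
        return GALAXY, p_point, None  # second stage skipped
    p_star = clf2(features)  # hypothetical: P(star | point source)
    return (STAR if p_star >= threshold else QSO), p_point, p_star
```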
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
bass_dr3.sam 325 1000 Photometric data and classification results
of BASS DR3 sources (110896599 sources)
--------------------------------------------------------------------------------
See also:
VII/28 : Southern Groups and Clusters of Galaxies (Duus, Newell 1977)
V/164 : LAMOST DR5 catalogs (Luo+, 2019)
II/328 : AllWISE Data Release (Cutri+ 2013)
Byte-by-byte Description of file: bass_dr3.sam
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 11 I11 --- BASS BASS identifier number (id)
13- 26 F14.10 deg RAdeg Right ascension (J2000) (ra)
28- 41 F14.10 deg DEdeg Declination (J2000) (dec)
43- 50 F8.5 mag gmagIso Apparent AB magnitude in g-band measured
with the isophotal aperture (magisog)
52- 60 F9.5 mag e_gmagIso Mean error on gmagIso (magerriso_g)
62- 70 F9.5 mag gmagKron Apparent AB magnitude in g-band measured
with the Kron aperture (magkrong) (1)
72- 82 F11.5 mag e_gmagKron Mean error on gmagKron (magerrkron_g)
84- 91 F8.5 mag gmag Apparent AB magnitude in g-band measured
by fitting a point spread function
(magpsfg) (2)
93-101 F9.5 mag e_gmag Mean error on gmag (magerrpsf_g)
103-110 F8.5 mag rmagIso Apparent AB magnitude in r-band measured
with the isophotal aperture (magisor)
112-120 F9.5 mag e_rmagIso Mean error on rmagIso (magerriso_r)
122-130 F9.5 mag rmagKron Apparent AB magnitude in r-band measured
with the Kron aperture (magkronr) (1)
132-141 F10.5 mag e_rmagKron Mean error on rmagKron (magerrkron_r)
143-150 F8.5 mag rmag Apparent AB magnitude in r-band measured
by fitting a point spread function
(magpsfr) (2)
152-159 F8.5 mag e_rmag Mean error on rmag (magerrpsf_r)
161-168 F8.5 mag zmagIso Apparent AB magnitude in z-band measured
with the isophotal aperture (magisoz)
170-179 F10.5 mag e_zmagIso Mean error on zmagIso (magerriso_z)
181-190 F10.5 mag zmagKron Apparent AB magnitude in z-band measured
with the Kron aperture (magkronz) (1)
192-202 F11.5 mag e_zmagKron Mean error on zmagKron (magerrkron_z)
204-211 F8.5 mag zmag Apparent AB magnitude in z-band measured
by fitting a point spread function
(magpsfz) (2)
213-220 F8.5 mag e_zmag Mean error on zmag (magerrpsf_z)
222 I1 --- Type [0/1] Combined star/galaxy separation based
on the Kron and PSF magnitude difference
for all bands, 1 for star, 0 for galaxy
(Type)
224-232 F9.6 --- Typeann Star/galaxy separation based on ANN
(Artificial Neural Networks; e.g. Li et al.
2006PABei..24..285L), >0.5 for star,
<0.5 for galaxy (Type_ann)
234-241 F8.5 mag rmagcor Apparent AB magnitude extinction-corrected
in r-band measured by fitting a point spread
function (rMag) (3)
243-250 F8.5 mag gmagcor Apparent AB magnitude extinction-corrected in
g-band measured by fitting a point spread
function (gMag) (3)
252-259 F8.5 mag zmagcor Apparent AB magnitude extinction-corrected in
z-band measured by fitting a point spread
function (zMag) (3)
261 I1 --- Classb [0/2] The classification label for the first
(b1) and second (b2) classifiers (CLASS_b) (4)
263-270 F8.6 --- Pb2 ?=- The classification probability for the
second classifier (P_b2) (6)
272 I1 --- Classm [0/2] The classification label for the third
(m) classifier (CLASS_m) (4)
274-281 F8.6 --- Pm The classification probability for the third
classifier (P_m) (7)
283-290 F8.6 --- Pb1 The classification probability for the first
classifier (P_b1) (5)
292 I1 --- Classbi ?=- [0/2] The classification label for the
fourth classifier (CLASS_bi) (4)
296-303 F8.6 --- Pbi2 ?=- The classification probability for the
fifth classifier (P_bi2) (9)
305 I1 --- Classmi ?=- [0/2] The classification label for the
sixth classifier (CLASS_mi) (4)
309-316 F8.6 --- Pmi ?=- The classification probability for the
sixth classifier (P_mi) (10)
318-325 F8.6 --- Pbi1 ?=- The classification probability for the
fourth classifier (P_bi1) (8)
--------------------------------------------------------------------------------
Note (1): The Kron aperture is defined by Kron (1980ApJS...43..305K): the
total source flux is integrated within a Kron radius chosen so that
the aperture encloses a large fraction of the total light of the
object, commonly about 90 per cent. The Kron magnitude is particularly
useful because it measures the total light of extended objects such
as galaxies, taking into account their diffuse and often irregular
light profiles. For resolved sources (e.g. galaxies), the Kron
magnitude is therefore a better measure than the PSF magnitude.
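The first-moment radius of Kron (1980), r1 = Σ r I(r) / Σ I(r), and the
flux fraction enclosed by a multiple of it can be illustrated on a
synthetic image. A sketch only: pixels are given as hypothetical
(radius, intensity) pairs, and the 2.5 x r1 aperture in the usage note is a
common choice (the SExtractor default), not necessarily the setting used by
the BASS DR3 pipeline.

```python
from typing import Iterable, Tuple

Pixel = Tuple[float, float]  # (distance from source centre, intensity)


def kron_radius(pixels: Iterable[Pixel]) -> float:
    """First-moment radius r1 = sum(r * I) / sum(I) over positive-flux pixels."""
    pixels = list(pixels)
    den = sum(i for _, i in pixels if i > 0)
    if den <= 0:
        raise ValueError("no positive flux")
    return sum(r * i for r, i in pixels if i > 0) / den


def flux_fraction(pixels: Iterable[Pixel], radius: float) -> float:
    """Fraction of the summed positive flux lying within `radius`."""
    pixels = list(pixels)
    total = sum(i for _, i in pixels if i > 0)
    return sum(i for r, i in pixels if i > 0 and r <= radius) / total
```

For a circular Gaussian profile r1 = σ·sqrt(π/2), so a 2.5 x r1 aperture
encloses about 99 per cent of the flux; profiles with extended wings
enclose a smaller fraction.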
Note (2): For unresolved point sources (e.g. stars and quasars), the PSF
magnitude, obtained by fitting a point spread function (PSF) to the
source, is the best measure.
Note (3): Extinction-corrected following Schindler et al.
(2017ApJ...851...13S).
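Extinction corrections of this kind are usually a per-band linear term,
m_corr = m - R_x * E(B-V). The sketch assumes that form; the coefficients
R_x and the E(B-V) values used for this catalogue come from Schindler et
al. (2017) and are not reproduced here, so any values passed in must be
taken from that work. deredden is a hypothetical helper name.

```python
from typing import Dict


def deredden(mags: Dict[str, float], ebv: float,
             coeffs: Dict[str, float]) -> Dict[str, float]:
    """Return m_corr = m - R_band * E(B-V) for each band in `mags`.

    `coeffs` maps band name -> R_band (= A_band / E(B-V)); the catalogue's
    actual values are assumed to be supplied by the caller."""
    missing = set(mags) - set(coeffs)
    if missing:
        raise KeyError(f"no extinction coefficient for bands: {sorted(missing)}")
    return {band: m - coeffs[band] * ebv for band, m in mags.items()}
```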
Note (4): The classification label definition as follows:
0 = Quasars
1 = Stars
2 = Galaxies
(warning: Classbi and Classmi have no value when the object has no
AllWISE counterpart; about 40 per cent of BASS-DR3 sources are
cross-matched with AllWISE).
Note (5): Firstly, BASS-DR3 sources are classified into extended sources
(galaxies) and point sources (stars and quasars) by Classifier 1, a
binary logistic classifier with input pattern
(δg, δr, δz, g-r, g-z, r, r-z, g), where δx = xmag-xmagKron
(PSF minus Kron magnitude) for the optical filters x = g, r, z, and
g, r, z denote extinction-corrected magnitudes
(more details in Section 5, Application of classifiers).
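The input patterns in Notes (5) to (10) mix three kinds of feature: δx
differences, colours, and single magnitudes. A sketch of how a pattern can
be evaluated for one parsed record of bass_dr3.sam; build_features is a
hypothetical helper. W1 and W2 are not columns of this file, so they would
have to be added from an AllWISE cross-match (the keys "W1" and "W2" are
assumptions).

```python
from typing import Dict, List, Sequence


def build_features(rec: Dict[str, float], pattern: Sequence[str]) -> List[float]:
    """Evaluate an input pattern such as ("δg", "g-r", "r") for one record.

    δx = xmag - xmagKron (PSF minus Kron magnitude); single bands g, r, z and
    colours use the extinction-corrected columns gmagcor, rmagcor, zmagcor."""
    def mag(band: str) -> float:
        return rec[band + "magcor"] if band in ("g", "r", "z") else rec[band]

    values = []
    for feat in pattern:
        if feat.startswith("δ"):
            band = feat[1:]
            values.append(rec[band + "mag"] - rec[band + "magKron"])
        elif "-" in feat:
            a, b = feat.split("-")  # colour, e.g. "g-r" or "z-W1"
            values.append(mag(a) - mag(b))
        else:
            values.append(mag(feat))
    return values


# Input pattern of Classifier 1, copied from Note (5)
CLASSIFIER1 = ("δg", "δr", "δz", "g-r", "g-z", "r", "r-z", "g")
```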
Note (6): Secondly, the point sources are further separated into stars and
quasars by Classifier 2, a binary logistic classifier with input
pattern (δg, δr, δz, g-r, r-z, g-z, r, g, z), where
δx = xmag-xmagKron for the optical filters x = g, r, z, and g, r, z
denote extinction-corrected magnitudes
(warning: Pb2 is not computed for objects classified as galaxies by
Classifier 1; more details in Section 5, Application of classifiers).
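Classifiers 1, 2, 4 and 5 are binary logistic classifiers: a boosted-tree
model sums leaf values into a raw score (margin), and the logistic
function maps that score to a probability between 0 and 1. A numerically
stable sketch of the mapping; whether Pb1, Pb2, Pbi1 and Pbi2 are exactly
these outputs is an assumption.

```python
import math


def logistic_probability(margin: float) -> float:
    """Map a raw score to a probability with the logistic (sigmoid) function,
    branching on the sign so that math.exp never overflows."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)
```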
Note (7): Thirdly, BASS-DR3 sources are also directly divided into galaxies,
stars, and quasars by Classifier 3, a multiclass softmax classifier
with input pattern (δg, δr, δz, g-r, g-z, r, r-z, z),
where δx = xmag-xmagKron for the optical filters x = g, r, z, and
g, r, z denote extinction-corrected magnitudes
(more details in Section 5, Application of classifiers).
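Classifiers 3 and 6 are multiclass softmax classifiers. In XGBoost, the
multi:softprob objective returns per-class probabilities and multi:softmax
returns only the class index; both rest on applying the softmax function to
per-class raw scores. A sketch of that step, assuming Pm and Pmi hold the
probability of the predicted class and that the score order follows the
label codes of Note (4); both points are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(margins: Sequence[float]) -> List[float]:
    """Numerically stable softmax: per-class raw scores -> probabilities."""
    top = max(margins)
    exps = [math.exp(m - top) for m in margins]  # shift avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_label(margins: Sequence[float]) -> Tuple[int, float]:
    """Return (label, probability) of the most probable class,
    with 0 = quasar, 1 = star, 2 = galaxy as in Note (4)."""
    probs = softmax(margins)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]
```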
Note (8): Fourthly, cross-matching BASS-DR3 sources with the ALLWISE data base
yields the BASS-DR3-ALLWISE sources (43 859 467), about 40 per cent
of the 110 896 598 objects. As for the BASS DR3 sources,
BASS-DR3-ALLWISE sources are first grouped into extended sources
(galaxies) and point sources (stars and quasars) by Classifier 4,
a binary logistic classifier with input pattern (δg, δz,
g-W1, W1-W2, z-W1, δr, g-z, z-W2, g-r, r-z, W1, r, g, z, r-W2),
where δx = xmag-xmagKron for the optical filters x = g, r, z, and
g, r, z and the AllWISE infrared magnitudes W1, W2 are
extinction-corrected
(warning: only computed for BASS-DR3-ALLWISE cross-matched sources;
more details in Section 5, Application of classifiers).
Note (9): Fifthly, the BASS-DR3-ALLWISE point sources are further discriminated
into stars and quasars by Classifier 5, a binary logistic classifier
with input pattern (-W2, W1-W2, g-z, g-r, z-W1, r-z, δz, r, r-W2, z,
δg, g-W1, δr, W1, g-W2), where δx = xmag-xmagKron for the optical
filters x = g, r, z, and g, r, z and the AllWISE infrared magnitudes
W1, W2 are extinction-corrected
(warning: Pbi2 is not computed for objects classified as galaxies by
Classifier 4; more details in Section 5, Application of classifiers).
Note (10): Sixthly, BASS-DR3-ALLWISE sources are directly distinguished into
galaxies, stars, and quasars by Classifier 6, a multiclass softmax
classifier with input pattern (z-W2, δz, W1-W2, δr, g-r,
z-W1, δg, g-z, r-W2, r-z, r), where δx = xmag-xmagKron for the
optical filters x = g, r, z, and g, r, z and the AllWISE infrared
magnitudes W1, W2 are extinction-corrected
(warning: only computed for BASS-DR3-ALLWISE cross-matched sources;
more details in Section 5, Application of classifiers).
--------------------------------------------------------------------------------
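The byte-by-byte description above is enough to read bass_dr3.sam with
plain string slicing (byte b maps to Python index b-1). A minimal sketch:
it treats a field containing only "-" (the ?=- null marker) or only blanks
as missing, which is an assumption about how empty values are written, and
parse_record is a hypothetical helper name.

```python
from typing import Dict, Optional, Union

# (first byte, last byte, label, type), copied from the byte-by-byte table
COLUMNS = [
    (1, 11, "BASS", int), (13, 26, "RAdeg", float), (28, 41, "DEdeg", float),
    (43, 50, "gmagIso", float), (52, 60, "e_gmagIso", float),
    (62, 70, "gmagKron", float), (72, 82, "e_gmagKron", float),
    (84, 91, "gmag", float), (93, 101, "e_gmag", float),
    (103, 110, "rmagIso", float), (112, 120, "e_rmagIso", float),
    (122, 130, "rmagKron", float), (132, 141, "e_rmagKron", float),
    (143, 150, "rmag", float), (152, 159, "e_rmag", float),
    (161, 168, "zmagIso", float), (170, 179, "e_zmagIso", float),
    (181, 190, "zmagKron", float), (192, 202, "e_zmagKron", float),
    (204, 211, "zmag", float), (213, 220, "e_zmag", float),
    (222, 222, "Type", int), (224, 232, "Typeann", float),
    (234, 241, "rmagcor", float), (243, 250, "gmagcor", float),
    (252, 259, "zmagcor", float), (261, 261, "Classb", int),
    (263, 270, "Pb2", float), (272, 272, "Classm", int),
    (274, 281, "Pm", float), (283, 290, "Pb1", float),
    (292, 292, "Classbi", int), (296, 303, "Pbi2", float),
    (305, 305, "Classmi", int), (309, 316, "Pmi", float),
    (318, 325, "Pbi1", float),
]


def parse_record(line: str) -> Dict[str, Optional[Union[int, float]]]:
    """Parse one fixed-width record; blank or '-' fields become None."""
    line = line.rstrip("\n").ljust(325)  # pad short lines to Lrecl
    record: Dict[str, Optional[Union[int, float]]] = {}
    for first, last, label, cast in COLUMNS:
        text = line[first - 1:last].strip()
        record[label] = None if text in ("", "-") else cast(text)
    return record
```

Usage: `[parse_record(l) for l in open("bass_dr3.sam")]` returns one
dictionary per source, keyed by the labels of the table.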
History:
Copied at https://nadc.china-vo.org/res/r101066/
(End) Luc Trabelsi [CDS] 21-Jun-2024