J/A+A/649/A81 KiDSDR4 QSOs photometric redshifts catalog (Nakoneczny+, 2021)
Photometric selection and redshifts for quasars in the
Kilo-Degree Survey Data Release 4.
Nakoneczny S.J., Bilicki M., Pollo A., Asgari M., Dvornik A., Erben T.,
Giblin B., Heymans C., Hildebrandt H., Kannawadi A., Kuijken K.,
Napolitano N.R., Valentijn E.
<Astron. Astrophys. 649, A81 (2021)>
=2021A&A...649A..81N (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; Active gal. nuclei ; QSOs ; Galaxy catalogs ;
Galaxies, photometry ; Redshifts ; Colors
Keywords: methods: data analysis - methods: observational - catalog - surveys -
quasars: general - large-scale structure of Universe
Abstract:
We present a catalog of quasars with their corresponding redshifts
derived from the photometric Kilo-Degree Survey (KiDS) Data Release 4.
We achieved this by training machine learning (ML) models using optical
ugri and near-infrared ZYJHKs bands, on objects known from SDSS
spectroscopy. We define inference subsets from the 45 million objects
of the KiDS photometric data limited to 9-band detections, based on a
feature space built from magnitudes and their combinations. We show
that projections of the high-dimensional feature space on two
dimensions can be successfully used instead of the standard
color-color plots, to investigate the photometric estimations, compare
them with spectroscopic data, and efficiently support the process of
building a catalog. The model selection and fine-tuning employ two
subsets of objects: those randomly selected and the faintest ones,
which allows us to properly fit the bias vs. variance trade-off. We
test three ML models: Random Forest (RF), XGBoost (XGB) and Artificial
Neural Network (ANN). We find that XGB is the most robust and
straightforward model for classification, while ANN performs the best
for combined classification and redshift. The ANN inference results
are tested using number counts, Gaia parallaxes and other quasar
catalogs external to the training set. Based on these tests, we derive
the minimum classification probability for quasar candidates which
provides the best purity vs. completeness trade-off: p(QSO_cand)>0.9
for r<22, and p(QSO_cand)>0.98 for 22<r<23.5. We find 158000 quasar
candidates in the safe inference subset (r<22), and a further 185000 in
the reliable extrapolation regime (22<r<23.5). Test-data purity equals
97%, completeness is 94%, the latter dropping by 3% in the
extrapolation to data fainter by one magnitude than the training set.
The photometric redshifts are derived with ANN and modeled with
Gaussian uncertainties. Test-data redshift error (mean and scatter)
equals 0.009±0.12 in the safe subset, and -0.0004±0.19 in the
extrapolation, averaged over redshift range 0.14<z<3.63 (1st and 99th
percentiles). The success of the extrapolation challenges the way that
models are optimized and applied at the faint data end. The resulting
catalog is ready for cosmology and Active Galactic Nucleus (AGN)
studies.
Description:
The catalog results from applying artificial neural networks to
process the KiDS data limited to 9-band detections. The machine
learning (ML) models are trained on KiDS objects cross-matched with
the spectroscopic SDSS survey. We address the problem of extrapolation
to KiDS objects fainter than the SDSS limit by properly generalising
ML models, and creating inference subsets which describe the
reliability of estimations: safe, extrapolation, unsafe. We provide
suggested cuts on magnitude and photometric classification probability,
derived from validating the catalog with several methods.
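The cuts quoted in the abstract (p(QSO_cand)>0.9 for r<22, and
p(QSO_cand)>0.98 for 22<r<23.5) can be written as a small filter. The
following is a minimal sketch, not the authors' code: the function name
is_quasar_candidate is hypothetical, and the treatment of the exact
boundaries r=22 and r=23.5 is an assumption, since the abstract gives only
strict inequalities.

```python
def is_quasar_candidate(rmag: float, p_qso: float) -> bool:
    """Apply the purity vs. completeness cuts quoted in the abstract.

    Hypothetical helper. Assumption: r = 22 is assigned to the fainter bin
    and r >= 23.5 is rejected; the abstract does not specify the boundaries.
    """
    if rmag < 22.0:
        return p_qso > 0.9       # safe inference subset
    if rmag < 23.5:
        return p_qso > 0.98      # reliable extrapolation regime
    return False                 # fainter than the recommended limit
```

The function takes only the magnitude and the QSO probability; it does not
look at the inference subset, which also depends on the stellarity index.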
File Summary:
--------------------------------------------------------------------------------
FileName  Lrecl   Records   Explanations
--------------------------------------------------------------------------------
ReadMe       80         .   This file
qsos.dat    155   1095711  *Catalog of quasar candidates
all.dat     155  45469955  *Catalog of all machine learning estimates
--------------------------------------------------------------------------------
Note on qsos.dat: Data limited to:
9-band detections, r<25, CLASS_STAR<0.2 or CLASS_STAR>0.8, QSO_PHOTO>0.9.
Note on all.dat: Data limited to 9-band detections.
--------------------------------------------------------------------------------
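The limits listed in the note on qsos.dat can be expressed as a single
predicate. This is a sketch under stated assumptions, not the procedure used
to build the file: the name in_qsos_file is hypothetical, CLASS_STAR and
QSO_PHOTO are assumed to correspond to the ClassStar and PQSO columns, and
the input row is assumed to already have 9-band detections.

```python
def in_qsos_file(rmag: float, class_star: float, p_qso: float) -> bool:
    """Check the qsos.dat limits from the file note.

    Assumptions: CLASS_STAR and QSO_PHOTO in the note map to the ClassStar
    and PQSO columns; the 9-band detection requirement is already met.
    """
    unambiguous_shape = class_star < 0.2 or class_star > 0.8
    return rmag < 25.0 and unambiguous_shape and p_qso > 0.9
```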
See also:
J/A+A/624/A13 : KiDS DR3 QSO catalog (Nakoneczny+, 2019)
Byte-by-byte Description of file: qsos.dat all.dat
--------------------------------------------------------------------------------
   Bytes Format Units  Label     Explanations
--------------------------------------------------------------------------------
   1- 29 A29    ---    KiDSDR4   KiDSDR4 designation,
                                   KiDSDR4 JHHMMSS.sss+DDMMSS.ss
  31- 40 F10.6  deg    RAdeg     [0/360] Centroid sky position
                                   right ascension (J2000)
  42- 51 F10.6  deg    DEdeg     Centroid sky position declination (J2000)
  53- 59 F7.4   mag    rmag      r-band GAaP magnitude with optimal MIN_APER
                                   (extinction corrected)
  61- 68 F8.6   ---    ClassStar SExtractor star-galaxy classifier
  70- 74 I5     ---    Mask      9-band mask information
  76- 87 E12.6  ---    PGalaxy   Probability that the source is a galaxy
  89-100 E12.6  ---    PQSO      Probability that the source is a QSO
 102-113 E12.6  ---    PStar     Probability that the source is a star
 115-120 A6     ---    Class     Object class with the highest probability,
                                   GALAXY, QSO or STAR
 122-130 F9.7   ---    zph       Photometric redshift
 132-141 F10.8  ---    e_zph     Uncertainty of photometric redshift
 143-155 A13    ---    Subset    ML inference subset (1)
--------------------------------------------------------------------------------
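The byte ranges above are enough to read either data file with plain string
slicing. The following is a minimal sketch, assuming each record is a single
155-character line laid out exactly as described; the names COLUMNS and
parse_record are hypothetical, and treating a blank field as a missing value
is an assumption.

```python
# (label, first byte, last byte, converter); bytes are 1-based and inclusive,
# copied from the byte-by-byte description.
COLUMNS = [
    ("KiDSDR4", 1, 29, str),
    ("RAdeg", 31, 40, float),
    ("DEdeg", 42, 51, float),
    ("rmag", 53, 59, float),
    ("ClassStar", 61, 68, float),
    ("Mask", 70, 74, int),
    ("PGalaxy", 76, 87, float),
    ("PQSO", 89, 100, float),
    ("PStar", 102, 113, float),
    ("Class", 115, 120, str),
    ("zph", 122, 130, float),
    ("e_zph", 132, 141, float),
    ("Subset", 143, 155, str),
]


def parse_record(line: str) -> dict:
    """Split one fixed-width record into a dict keyed by column label."""
    record = {}
    for label, first, last, convert in COLUMNS:
        # Convert the 1-based inclusive byte range to a Python slice.
        field = line[first - 1:last].strip()
        # Assumption: a blank field means a missing value.
        record[label] = convert(field) if field else None
    return record
```

Reading line by line keeps memory use flat, which matters for all.dat with
its 45469955 records.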
Note (1): see Section 2.2 in the paper. Values as follows:
         safe = r<22 and stellarity index (ClassStar) ∉ (0.2, 0.8)
extrapolation = r ∈ (22, 25) and stellarity index (ClassStar) ∉ (0.2, 0.8)
       unsafe = r>25 or stellarity index (ClassStar) ∈ (0.2, 0.8)
--------------------------------------------------------------------------------
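The subset definitions in Note (1) translate into a short decision rule. This
is a sketch, not the authors' implementation: the function name
inference_subset is hypothetical, the stellarity index is assumed to be the
ClassStar column, and the assignment of the boundary values r=22 and r=25 is
an assumption because the note leaves them unspecified.

```python
def inference_subset(rmag: float, class_star: float) -> str:
    """Return 'safe', 'extrapolation' or 'unsafe' following Note (1).

    Assumptions (not stated in the note): the stellarity interval (0.2, 0.8)
    is open, r = 22 falls into 'extrapolation' and r = 25 into 'unsafe'.
    """
    ambiguous_shape = 0.2 < class_star < 0.8
    if ambiguous_shape or rmag >= 25.0:
        return "unsafe"
    if rmag < 22.0:
        return "safe"
    return "extrapolation"
```

The returned strings match the values stored in the Subset column, so the
result can be compared directly with a parsed record.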
Acknowledgements:
Szymon J. Nakoneczny, szymon.nakoneczny(at)ncbj.gov.pl
(End) Szymon J. Nakoneczny [NCBJ, Poland], Patricia Vannier [CDS] 22-Feb-2021