J/A+A/633/A154      HDBSCAN star, galaxy, QSO classification      (Logan+, 2020)

Unsupervised star, galaxy, QSO classification. Application of HDBSCAN.
    Logan C.H.A., Fotopoulou S.
    <Astron. Astrophys. 633, A154 (2020)>
    =2020A&A...633A.154L        (SIMBAD/NED BibCode)
ADC_Keywords: Surveys ; MK spectral classification ; Redshifts ; Photometry
Keywords: stars: general - galaxies: general - galaxies: active -
          methods: data analysis - surveys

Abstract:
    Classification will be an important first step for upcoming surveys
    that will detect billions of new sources such as LSST and Euclid, as
    well as DESI, 4MOST and MOONS. The application of traditional methods
    of model fitting and colour-colour selections will face significant
    computational constraints, while machine-learning (ML) methods offer a
    viable approach to tackle datasets of that volume. While supervised
    learning methods can perform very well for classification tasks, the
    creation of representative and accurate training sets is a resource
    and time consuming task. We present a viable alternative using an
    unsupervised ML method to separate stars, galaxies and QSOs using
    photometric data. The heart of our work uses HDBSCAN to find the star,
    galaxy and QSO clusters in a multidimensional colour space. We
    optimized the hyperparameters and input attributes of three separate
    HDBSCAN runs, each to select a particular object class, and thus treat
    the output of each separate run as a binary classifier. We
    subsequently consolidate the output to give our final classifications,
    optimized on their F1 scores. We explore the use of Random Forest and
    PCA as part of the pre-processing stage for feature selection and
    dimensionality reduction. Using our dataset of ∼50000
    spectroscopically labelled objects we obtain an F1 score of 98.9, 98.9
    and 93.13 respectively for star, galaxy and QSO selection using our
    unsupervised learning method. We find that careful attribute selection
    is a vital part of accurate classification with HDBSCAN. We applied
    our classification to a subset of the SDSS spectroscopic catalogue and
    demonstrate the potential of our approach in correcting misclassified
    spectra useful for DESI and 4MOST.
    Finally, we create a multiwavelength catalogue of 2.7 million sources
    using the KiDS, VIKING and ALLWISE surveys and publish corresponding
    classifications and photometric redshifts.

Description:
    Photometric data and classifications of sources and other outputs from
    the star, galaxy, QSO classification method presented in the paper.

    We note that the column descriptions are also in the Appendix,
    sometimes with specific references to the sections that detail the
    output of certain columns.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
cpz.dat          692    48686   CPz catalogue with object classifications
klabels.dat      535  2728329   KiDSVW catalogue with object classifications
--------------------------------------------------------------------------------

See also:
 J/A+A/619/A14 : Classification-aided zph estimation (Fotopoulou+, 2018)

Byte-by-byte Description of file: cpz.dat
--------------------------------------------------------------------------------
   Bytes Format Units  Label      Explanations
--------------------------------------------------------------------------------
   1- 24  A24   ---    id         Spectroscopic redshift ID (G1)
  26- 34  F9.5  deg    RAdeg      Spectroscopic redshift right ascension (J2000)
                                    (G1)
  36- 44  F9.5  deg    DEdeg      Spectroscopic redshift declination (J2000)
                                    (G1)
  46- 54  F9.6  ---    z          Spectroscopic redshift value (G1)
      56  I1    ---    Hclass     [0/3] Spectroscopic redshift classification
                                    (0=star, 1=galaxy, 3=QSO) (2)
  58- 64  F7.4  mag    umag       u band total magnitude
  66- 74  F9.4  mag    e_umag     u band total magnitude error
  76- 82  F7.4  mag    gmag       g band total magnitude
  84- 93  F10.4 mag    e_gmag     ? g band total magnitude error
  95-101  F7.4  mag    rmag       r band total magnitude
 103-113  F11.4 mag    e_rmag     r band total magnitude error
 115-121  F7.4  mag    imag       i band total magnitude
 123-133  F11.4 mag    e_imag     i band total magnitude error
 135-141  F7.4  mag    zmag       z band total magnitude
 143-153  F11.4 mag    e_zmag     z band total magnitude error
 155-161  F7.4  mag    Ymag       Y band total magnitude
 163-168  F6.4  mag    e_Ymag     Y band total magnitude error
 170-176  F7.4  mag    Jmag       J band total magnitude
 178-183  F6.4  mag    e_Jmag     J band total magnitude error
 185-191  F7.4  mag    Hmag       H band total magnitude
 193-198  F6.4  mag    e_Hmag     H band total magnitude error
 200-206  F7.4  mag    Kmag       K band total magnitude
 208-213  F6.4  mag    e_Kmag     K band total magnitude error
 215-221  F7.4  mag    W1mag      W1 band total magnitude
 223-228  F6.4  mag    e_W1mag    W1 band total magnitude error
 230-236  F7.4  mag    W2mag      W2 band total magnitude
 238-243  F6.4  mag    e_W2mag    W2 band total magnitude error
 245-251  F7.4  mag    u3mag      u band 3 arcsecond magnitude
 253-262  F10.4 mag    e_u3mag    ? u band 3 arcsecond magnitude error
 264-270  F7.4  mag    g3mag      g band 3 arcsecond magnitude
 272-278  F7.4  mag    e_g3mag    g band 3 arcsecond magnitude error
 280-286  F7.4  mag    r3mag      r band 3 arcsecond magnitude
 288-293  F6.4  mag    e_r3mag    r band 3 arcsecond magnitude error
 295-301  F7.4  mag    i3mag      i band 3 arcsecond magnitude
 303-308  F6.4  mag    e_i3mag    i band 3 arcsecond magnitude error
 310-316  F7.4  mag    Z3mag      z band 3 arcsecond magnitude
 318-323  F6.4  mag    e_Z3mag    z band 3 arcsecond magnitude error
 325-331  F7.4  mag    Y3mag      Y band 3 arcsecond magnitude
 333-338  F6.4  mag    e_Y3mag    Y band 3 arcsecond magnitude error
 340-346  F7.4  mag    J3mag      J band 3 arcsecond magnitude
 348-353  F6.4  mag    e_J3mag    J band 3 arcsecond magnitude error
 355-361  F7.4  mag    H3mag      H band 3 arcsecond magnitude
 363-369  F7.4  mag    e_H3mag    H band 3 arcsecond magnitude error
 371-377  F7.4  mag    K3mag      K band 3 arcsecond magnitude
 379-384  F6.4  mag    e_K3mag    K band 3 arcsecond magnitude error
 386-397  F12.5 arcsec Yhlr       ?=-99 Y band half light radius (HLR)
 399-410  F12.5 arcsec Jhlr       ?=-99 J band half light radius (HLR)
 412-423  F12.5 arcsec Hhlr       ?=-99 H band half light radius (HLR)
 425-436  F12.5 arcsec Khlr       ?=-99 K band half light radius (HLR)
 438-445  F8.5  ---    PCAs1c     PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcastar1_colours)
 447-454  F8.5  ---    PCAs2c     PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcastar2_colours)
 456-463  F8.5  ---    PCAs3c     PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcastar3_colours)
 465-472  F8.5  ---    PCAg1c     PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcagal1_colours)
 474-481  F8.5  ---    PCAg2c     PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcagal2_colours)
 483-490  F8.5  ---    PCAg3c     PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcagal3_colours)
 492-499  F8.5  ---    PCAq1c     PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcaqso1_colours)
 501-508  F8.5  ---    PCAq2c     PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcaqso2_colours)
 510-517  F8.5  ---    PCAq3c     PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    used (pcaqso3_colours)
     519  I1    ---    ClasscO    [0/3] Optimal consolidation method
                                    classification (0=outlier, 1=star, 2=gal,
                                    3=QSO), colours used
                                    (hdbscanclassoptimalmethod_colours)
     521  I1    ---    CassscA    [0/3] Alternative consolidation method
                                    classification (0=outlier, 1=star, 2=gal,
                                    3=QSO), colours used
                                    (hdbscanclassalternativemethod_colours)
     523  I1    ---    dpc        [0/1] Doubly positively classified objects,
                                    colours used (doublepositivescolours)
 525-529  F5.3  ---    Poutc      Outlier probability, colours used
                                    (outlierprobabilitycolours)
 531-535  F5.3  ---    Pstarc     Star probability, colours used
                                    (starprobabilitycolours)
 537-541  F5.3  ---    Pgalc      Galaxy probability, colours used
                                    (galprobabilitycolours)
 543-547  F5.3  ---    Pqsoc      QSO probability, colours used
                                    (QSOprobabilitycolours)
     549  I1    ---    Labelc     [0/3] Final labels, 'highest probability'
                                    method (highestprobabilitylabels)
 551-559  F9.5  ---    PCAs1cHLR  ?=-99 PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcastar1_colours+HLR)
 561-569  F9.5  ---    PCAs2cHLR  ?=-99 PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcastar2_colours+HLR)
 571-579  F9.5  ---    PCAs3cHLR  ?=-99 PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcastar3_colours+HLR)
 581-589  F9.5  ---    PCAg1cHLR  ?=-99 PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcagal1_colours+HLR)
 591-599  F9.5  ---    PCAg2cHLR  ?=-99 PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcagal2_colours+HLR)
 601-609  F9.5  ---    PCAg3cHLR  ?=-99 PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcagal3_colours+HLR)
 611-619  F9.5  ---    PCAq1cHLR  ?=-99 PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcaqso1_colours+HLR)
 621-629  F9.5  ---    PCAq2cHLR  ?=-99 PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcaqso2_colours+HLR)
 631-639  F9.5  ---    PCAq3cHLR  ?=-99 PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup, colours
                                    & HLR used (pcaqso3_colours+HLR)
 641-643  I3    ---    ClasscHLRO [0/3]?=-99 Optimal consolidation method
                                    classification (0=outlier, 1=star, 2=gal,
                                    3=QSO), colours & HLR used
                                    (hdbscanclassoptimalmethod_colours+HLR)
 645-647  I3    ---    ClasscHLRA [0/3]?=-99 Alternative consolidation method
                                    classification (0=outlier, 1=star, 2=gal,
                                    3=QSO), colours & HLR used
                                    (hdbscanclassalternativemethod_colours+HLR)
 649-651  I3    ---    dpcHLR     [0/1]?=-99 Doubly positively classified
                                    objects, colours & HLR used
                                    (doublepositivescolours+HLR)
 653-660  F8.3  ---    PoutcHLR   ?=-99 Outlier probability, colours & HLR used
                                    (outlierprobabilitycolours+HLRs)
 662-669  F8.3  ---    PstarcHLR  ?=-99 Star probability, colours & HLR used
                                    (starprobabilitycolours+HLRs)
 671-678  F8.3  ---    PgalcHLR   ?=-99 Galaxy probability, colours & HLR used
                                    (galprobabilitycolours+HLRs)
 680-687  F8.3  ---    PqsocHLR   ?=-99 QSO probability, colours & HLR used
                                    (QSOprobabilitycolours+HLRs)
 690-692  I3    ---    LabelcHLR  [0/3]?=-99 Final labels for when the 'highest
                                    probability' method is used, colours & HLR
                                    used (highestprobabilitylabels_colours+HLRs)
--------------------------------------------------------------------------------
Note (2): Same as column 5 in the CPz catalogue presented in Fotopoulou and
     Paltani, 2018, Cat. J/A+A/619/A14, with the change for AGN and UNKNOWN
     described in Sect. 2.2 of the paper.
--------------------------------------------------------------------------------

Byte-by-byte Description of file: klabels.dat
--------------------------------------------------------------------------------
   Bytes Format Units  Label      Explanations
--------------------------------------------------------------------------------
   1-  8  I8    ---    KiDSu      KiDSVW identifier (serialidkidsu_dr4v3)
  10- 18  F9.5  deg    RAdeg      Spectroscopic redshift right ascension (J2000)
  20- 28  F9.5  deg    DEdeg      Spectroscopic redshift declination (J2000)
  30- 35  A6    ---    zClass14   Spectroscopic redshift classification
  37- 45  F9.6  ---    zSDSS14    ? Spectroscopic redshift value from SDSS DR14
  47- 55  F9.6  ---    zSDSSQ14   ? Spectroscopic redshift value from DR14Q (1)
  57- 80  A24   ---    id         Spectroscopic redshift ID (G1)
  82- 88  F7.4  mag    umag       u band total magnitude
  90- 96  F7.4  mag    e_umag     u band total magnitude error
  98-104  F7.4  mag    gmag       g band total magnitude
 106-112  F7.4  mag    e_gmag     g band total magnitude error
 114-120  F7.4  mag    rmag       r band total magnitude
 122-128  F7.4  mag    e_rmag     r band total magnitude error
 130-136  F7.4  mag    imag       i band total magnitude
 138-143  F6.4  mag    e_imag     i band total magnitude error
 145-151  F7.4  mag    zmag       z band total magnitude
 153-161  F9.4  mag    e_zmag     z band total magnitude error
 163-169  F7.4  mag    Ymag       Y band total magnitude
 171-178  F8.4  mag    e_Ymag     Y band total magnitude error
 180-186  F7.4  mag    Jmag       J band total magnitude
 188-195  F8.4  mag    e_Jmag     J band total magnitude error
 197-203  F7.4  mag    Hmag       H band total magnitude
 205-213  F9.4  mag    e_Hmag     H band total magnitude error
 215-221  F7.4  mag    Kmag       K band total magnitude
 223-230  F8.4  mag    e_Kmag     K band total magnitude error
 232-238  F7.4  mag    W1mag      W1 band total magnitude
 240-245  F6.4  mag    e_W1mag    W1 band total magnitude error
 247-253  F7.4  mag    W2mag      W2 band total magnitude
 255-260  F6.4  mag    e_W2mag    W2 band total magnitude error
 262-268  F7.4  mag    u3mag      u band 3 arcsecond magnitude
 270-278  F9.4  mag    e_u3mag    ? u band 3 arcsecond magnitude error
 280-286  F7.4  mag    g3mag      g band 3 arcsecond magnitude
 288-293  F6.4  mag    e_g3mag    g band 3 arcsecond magnitude error
 295-301  F7.4  mag    r3mag      r band 3 arcsecond magnitude
 303-309  F7.4  mag    e_r3mag    r band 3 arcsecond magnitude error
 311-317  F7.4  mag    i3mag      i band 3 arcsecond magnitude
 319-324  F6.4  mag    e_i3mag    i band 3 arcsecond magnitude error
 326-332  F7.4  mag    z3mag      z band 3 arcsecond magnitude
 334-339  F6.4  mag    e_z3mag    z band 3 arcsecond magnitude error
 341-347  F7.4  mag    Y3mag      Y band 3 arcsecond magnitude
 349-354  F6.4  mag    e_Y3mag    Y band 3 arcsecond magnitude error
 356-362  F7.4  mag    J3mag      J band 3 arcsecond magnitude
 364-369  F6.4  mag    e_J3mag    J band 3 arcsecond magnitude error
 371-377  F7.4  mag    H3mag      H band 3 arcsecond magnitude
 379-385  F7.4  mag    e_H3mag    H band 3 arcsecond magnitude error
 387-393  F7.4  mag    K3mag      K band 3 arcsecond magnitude
 395-400  F6.4  mag    e_K3mag    K band 3 arcsecond magnitude error
 402-409  F8.5  ---    PCAs1      PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcastar1_)
 411-418  F8.5  ---    PCAs2      PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcastar2)
 420-427  F8.5  ---    PCAs3      PCA components, STAR HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcastar3)
 429-436  F8.5  ---    PCAg1      PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcagal1)
 438-445  F8.5  ---    PCAg2      PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcagal2)
 447-454  F8.5  ---    PCAg3      PCA components, GAL HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcagal3)
 456-463  F8.5  ---    PCAq1      PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcaqso1)
 465-472  F8.5  ---    PCAq2      PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcaqso2)
 474-481  F8.5  ---    PCAq3      PCA components, QSO HDBSCAN binary
                                    classifiers, 'optimal' method setup
                                    (pcaqso3)
     483  I1    ---    ClassO     [0/3] Optimal consolidation method
                                    classification (0=outlier, 1=star, 2=gal,
                                    3=QSO), colours used
                                    (hdbscanclassoptimalmethod)
     485  I1    ---    CasssA     [0/3] Alternative consolidation method
                                    classification (0=outlier, 1=star, 2=gal,
                                    3=QSO), colours used
                                    (hdbscanclassalternativemethod)
     487  I1    ---    dpc        [0/3] Doubly positively classified objects,
                                    colours used (double_positives)
 489-493  F5.3  ---    Poutc      Outlier probability, colours used
                                    (outlier_probability)
 495-499  F5.3  ---    Pstarc     Star probability, colours used
                                    (star_probability)
 501-505  F5.3  ---    Pgalc      Galaxy probability, colours used
                                    (gal_probability)
 507-511  F5.3  ---    Pqsoc      QSO probability, colours used
                                    (QSO_probability)
     513  I1    ---    Labelc     [0/3] Final labels, 'highest probability'
                                    method (highestprobabilitylabels)
 515-522  F8.6  ---    zPredG     Photometric redshift predictions
                                    (GALPHOTOZPREDICTOR)
 524-531  F8.6  ---    zPredQ     Photometric redshift predictions
                                    (QSOPHOTOZPREDICTOR)
 533-535  I3    ---    phztrain   [0/10]?=-99 Training/validation/test set
                                    labels for the photometric redshift
                                    predictions (see Sect. 7.3) (2)
--------------------------------------------------------------------------------
Note (1): Column zSDSSQ14 can be used as a flag for whether a source is in the
     DR14Q catalogue: if it has a value, the source is in the DR14Q catalogue.
Note (2): If the source is in SDSS DR14 it has a value from 1 to 10. The
     training set has values 1-6, validation 7-8 and test 9-10. Values of -99
     are for sources not in SDSS DR14.
--------------------------------------------------------------------------------

Global notes:
Note (G1): Same as in the CPz catalogue presented in Fotopoulou and Paltani,
     2018, Cat. J/A+A/619/A14.
--------------------------------------------------------------------------------

Acknowledgements:
    Crispin Logan, crispin.logan(at)bristol.ac.uk
(End) Crispin Logan [Bristol University, UK], Patricia Vannier [CDS] 21-Nov-2019
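Notes (1) and (2) of klabels.dat describe two conventions that are easy to get wrong when filtering the catalogue: the phztrain label ranges and the use of zSDSSQ14 as a DR14Q membership flag. The Python sketch below encodes them as stated in those notes. The function names and the returned strings are illustrative assumptions, not part of the catalogue. The byte-by-byte range for phztrain is [0/10], but Note (2) documents only 1-10 and -99, so any other value is treated here as undocumented.

```python
from typing import Optional


def photoz_split(phztrain: int) -> Optional[str]:
    """Map a phztrain label to its photo-z subset, following Note (2).

    Returns None for -99 (source not in SDSS DR14). The returned strings
    are hypothetical names chosen for this sketch.
    """
    if phztrain == -99:
        return None
    if 1 <= phztrain <= 6:
        return "train"
    if 7 <= phztrain <= 8:
        return "validation"
    if 9 <= phztrain <= 10:
        return "test"
    # Note (2) does not document any other value (including 0).
    raise ValueError(f"undocumented phztrain value: {phztrain}")


def in_dr14q(zSDSSQ14: Optional[float]) -> bool:
    """Note (1): a source is in DR14Q if zSDSSQ14 has a value.

    Assumes a blank zSDSSQ14 field was read as None.
    """
    return zSDSSQ14 is not None
```

For example, `photoz_split(7)` gives `"validation"`, and `in_dr14q(None)` is `False` for a record whose zSDSSQ14 field is blank.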
The document above follows the rules of the Standard Description for Astronomical Catalogues; from this documentation it is possible to generate an f77 program to load the files into arrays or line by line.
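The byte-by-byte descriptions can be read the same way in other languages. The following Python sketch slices one klabels.dat record using a handful of the byte ranges listed above (1-indexed, inclusive). The names `KLABELS_COLUMNS`, `parse_record` and `make_line` are hypothetical helpers, the record is assumed to be plain ASCII, and the sample record is synthetic, built only to exercise the parser. Blank fields (columns marked "?") are returned as None.

```python
from typing import Callable, Dict, Optional, Tuple

# label -> (first byte, last byte, converter); a subset of klabels.dat columns,
# copied from the byte-by-byte description (bytes are 1-indexed, inclusive).
KLABELS_COLUMNS: Dict[str, Tuple[int, int, Callable[[str], object]]] = {
    "KiDSu": (1, 8, int),
    "RAdeg": (10, 18, float),
    "DEdeg": (20, 28, float),
    "zClass14": (30, 35, str),
    "zSDSS14": (37, 45, float),
    "Labelc": (513, 513, int),
    "phztrain": (533, 535, int),
}


def parse_record(line: str, columns=KLABELS_COLUMNS) -> Dict[str, Optional[object]]:
    """Slice one fixed-width record into a dict; blank fields become None."""
    record = {}
    for label, (first, last, convert) in columns.items():
        field = line[first - 1:last].strip()  # 1-indexed inclusive -> Python slice
        record[label] = convert(field) if field else None
    return record


def make_line(values: Dict[str, str], columns=KLABELS_COLUMNS, width: int = 535) -> str:
    """Build a synthetic record (illustration only) with right-justified fields."""
    buf = [" "] * width
    for label, text in values.items():
        first, last, _ = columns[label]
        buf[first - 1:last] = text.rjust(last - first + 1)
    return "".join(buf)


# Synthetic values, not taken from the catalogue.
sample = make_line({"KiDSu": "42", "RAdeg": "180.12345", "DEdeg": "-1.50000",
                    "zClass14": "GALAXY", "Labelc": "2", "phztrain": "-99"})
parsed = parse_record(sample)
```

Extending the sketch to the full table only requires adding the remaining labels and byte ranges to the dictionary. Sentinel values such as -99 are left as read, so callers can decide how to treat them.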