J/A+A/646/A104      Improving the open cluster census. I.          (Hunt+, 2021)

Improving the open cluster census.
I. Comparison of clustering algorithms applied to Gaia DR2 data.
    Hunt E.L., Reffert S.
    <Astron. Astrophys. 646, A104 (2021)>
    =2021A&A...646A.104H        (SIMBAD/NED BibCode)

ADC_Keywords: Milky Way ; Surveys ; Positional data ; Clusters, open
Keywords: methods: data analysis - open clusters and associations: general -
          astrometry

Abstract:
    The census of open clusters in the Milky Way is in a never-before-seen
    state of flux. Recent works have reported hundreds of new open clusters
    thanks to the incredible astrometric quality of the Gaia satellite, but
    other works have also reported that many open clusters discovered in
    the pre-Gaia era may be associations.

    We aim to conduct a comparison of clustering algorithms used to detect
    open clusters, attempting to statistically quantify their strengths and
    weaknesses by deriving the sensitivity, specificity, and precision of
    each as well as their true positive rate against a larger sample.

    We selected DBSCAN, HDBSCAN, and Gaussian mixture models for further
    study, owing to their speed and appropriateness for use with Gaia data.
    We developed a preprocessing pipeline for Gaia data and developed the
    algorithms further for the specific application to open clusters. We
    derived detection rates for all 1385 open clusters in the fields in our
    study as well as more detailed performance statistics for 100 of these
    open clusters.

    DBSCAN was sensitive to 50%-62% of the true positive open clusters in
    our sample, with generally very good specificity and precision. HDBSCAN
    traded precision for a higher sensitivity of up to 82%, especially
    across different distances and scales of open clusters. Gaussian
    mixture models were slow and only sensitive to 33% of open clusters in
    our sample, which tended to be larger objects. Additionally, we report
    on 41 new open cluster candidates detected by HDBSCAN, three of which
    are closer than 500 pc.

    When used with additional post-processing to mitigate its false
    positives, we have found that HDBSCAN is the most sensitive and
    effective algorithm for recovering open clusters in Gaia data. Our
    results suggest that many more new and already reported open clusters
    have yet to be detected in Gaia data.

Description:
    clusterl.dat (clustersliteratureocs) contains one entry for every open
    cluster detected by each algorithm and parameter combination. When no
    algorithm detected a cluster, a blank row with only the 'Name' and
    'Source' columns filled is given.

    clustern.dat (clustersnewocs) contains 41 new open clusters detected by
    this work.

    membersl.dat (membersliteratureocs) and membersn.dat (membersnewocs)
    list Gaia DR2 members of literature and newly detected open clusters
    respectively.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
clusterl.dat     267     5175   List of detected open clusters by each algorithm
clustern.dat     246       41   List of new open clusters reported by this work
membersl.dat     322   678768   List of members for detected literature OCs
membersn.dat     322     2099   List of members for new open clusters
--------------------------------------------------------------------------------

See also:
 I/345         : Gaia DR2 (Gaia Collaboration, 2018)
 J/ApJS/245/32 : Newly identified star clusters in Gaia DR2 (Liu+, 2019)
 J/A+A/558/A53 : Milky Way global survey of star clusters II
                 (Kharchenko+, 2013)
 J/A+A/635/A45 : 570 new open clusters in Galactic disc (Castro-Ginard+, 2020)
 J/A+A/633/A99 : Gaia DR2 open clusters in Milky Way. II (Cantat-Gaudin+, 2020)

Byte-by-byte Description of file: cluster?.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 17  A17   ---     Name      Name of the source
  19- 37  A19   ---     IntID     Internal ID of the source
  39- 45  A7    ---     Algorithm Algorithm used for this detection
  47- 68  A22   ---     Param     Parameters of algorithm used
  70- 78  F9.5  deg     RAdeg     ? Right ascension (ICRS) at Ep=2015.5
  80- 86  F7.5  deg     s_RAdeg   ? Standard deviation of right ascension
  88- 94  F7.5  deg     e_RAdeg   ? Standard error of right ascension
  96-104  F9.5  deg     DEdeg     ? Declination (ICRS) at Ep=2015.5
 106-112  F7.5  deg     s_DEdeg   ? Standard deviation of declination
 114-120  F7.5  deg     e_DEdeg   ? Standard error of declination
 122-130  F9.5  deg     GLON      ? Galactic longitude (2015.5)
 132-140  F9.5  deg     GLAT      ? Galactic latitude (2015.5)
 142-150  F9.5  mas/yr  pmRA      ? Proper motion along RA, pmRA*cosDE (2015.5)
 152-158  F7.5  mas/yr  s_pmRA    ? Standard deviation of pmRA
 160-166  F7.5  mas/yr  e_pmRA    ? Standard error of pmRA
 168-176  F9.5  mas/yr  pmDE      ? Proper motion along DE (2015.5)
 178-184  F7.5  mas/yr  s_pmDE    ? Standard deviation of pmDE
 186-192  F7.5  mas/yr  e_pmDE    ? Standard error of pmDE
 194-200  F7.5  mas     plx       ? Parallax
 202-208  F7.5  mas     s_plx     ? Standard deviation of parallax
 210-216  F7.5  mas     e_plx     ? Standard error of parallax
 218-224  F7.5  deg     rad50     ? Radius containing 50% of members
 226-232  F7.5  deg     radtidal  ? Estimated tidal radius
 234-237  I4    ---     Nstars    ? Number of identified member stars
 239-246  F8.5  ---     cst       ? Cluster significance test score
 248-267  A20   ---     Source    Source paper used for crossmatching, only
                                  for clusterl.dat file (1)
--------------------------------------------------------------------------------
Note (1): the following papers are referenced as:
  Cantat-Gaudin+_2020a = Cantat-Gaudin et al., 2020A&A...633A..99C,
                         Cat. J/A+A/633/A99
  Castro-Ginard+_2020  = Castro-Ginard et al., 2020A&A...635A..45C,
                         Cat. J/A+A/635/A45
  Kharchenko+_2013     = Kharchenko et al., 2013A&A...558A..53K,
                         Cat. J/A+A/558/A53
  Liu+_2019            = Liu & Pang, 2019ApJS..245...32L,
                         Cat. J/ApJS/245/32
--------------------------------------------------------------------------------

Byte-by-byte Description of file: members?.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 17  A17   ---     Name      Name of the source
  19- 37  A19   ---     IntID     Internal ID of the source
  39- 45  A7    ---     Algorithm Algorithm used for this detection
  47- 68  A22   ---     Param     Parameters of algorithm used
  70- 88  I19   ---     GaiaDR2   Gaia DR2 source ID of this source
  90- 98  F9.5  deg     RAdeg     Right ascension (ICRS) at Ep=2015.5
 100-106  F7.5  deg     e_RAdeg   Standard error of right ascension
 108-116  F9.5  deg     DEdeg     Declination (ICRS) at Ep=2015.5
 118-124  F7.5  deg     e_DEdeg   Standard error of declination
 126-134  F9.5  deg     GLON      Galactic longitude (2015.5)
 137-145  E9.5  deg     GLAT      Galactic latitude (2015.5)
 147-155  E9.5  mas/yr  pmRA      Proper motion along RA, pmRA*cosDE (2015.5)
 157-163  F7.5  mas/yr  e_pmRA    Standard error of pmRA
 165-173  E9.5  mas/yr  pmDE      Proper motion along DE (2015.5)
 175-181  F7.5  mas/yr  e_pmDE    Standard error of pmDE
 183-190  E8.5  mas     plx       Parallax
 192-198  F7.5  mas     e_plx     Standard error of parallax
 200-207  F8.5  mag     Gmag      Magnitude in the G band
 209-216  F8.5  mag     BPmag     Magnitude in the BP band
 218-225  F8.5  mag     RPmag     Magnitude in the RP band
 227-241  F15.5 e-/s    FG        Flux in the G band
 243-255  F13.5 e-/s    e_FG      Standard error on flux in the G band
 257-271  F15.5 e-/s    FBP       Flux in the BP band
 273-284  F12.5 e-/s    e_FBP     Standard error on flux in the BP band
 286-300  F15.5 e-/s    FRP       Flux in the RP band
 302-314  F13.5 e-/s    e_FRP     Standard error on flux in the RP band
 316-322  F7.5  deg     Prob      Membership probability
--------------------------------------------------------------------------------

Acknowledgements:
    Emily Hunt, ehunt(at)lsw.uni-heidelberg.de
(End) Patricia Vannier [CDS] 07-Dec-2020
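
Reading example (illustrative sketch, not part of the catalogue):
    The byte-by-byte descriptions give 1-indexed, inclusive byte ranges, so
    each record can be sliced as a fixed-width string. The sketch below
    copies a subset of the cluster?.dat ranges and treats blank fields
    (the columns flagged "?") as missing values. The names CLUSTER_COLUMNS,
    parse_record and make_synthetic_line are hypothetical helpers, and the
    demonstration row is made up for illustration; it is not catalogue data.

```python
# Subset of the cluster?.dat columns, copied from the byte-by-byte table:
# (label, first byte, last byte, converter). Bytes are 1-indexed, inclusive.
CLUSTER_COLUMNS = [
    ("Name", 1, 17, str),
    ("IntID", 19, 37, str),
    ("Algorithm", 39, 45, str),
    ("Param", 47, 68, str),
    ("RAdeg", 70, 78, float),
    ("DEdeg", 96, 104, float),
    ("plx", 194, 200, float),
    ("Nstars", 234, 237, int),
    ("cst", 239, 246, float),
    ("Source", 248, 267, str),
]


def parse_record(line, columns=CLUSTER_COLUMNS):
    """Slice one fixed-width record into a dict; blank fields become None."""
    record = {}
    for label, first, last, convert in columns:
        # Convert the 1-indexed inclusive byte range to a Python slice.
        text = line[first - 1:last].strip()
        record[label] = convert(text) if text else None
    return record


def make_synthetic_line(width=267):
    """Build a made-up record (NOT real catalogue data) for demonstration."""
    chars = [" "] * width

    def put(first, text):
        # Write text starting at the given 1-indexed byte position.
        chars[first - 1:first - 1 + len(text)] = text

    put(1, "Example_1")    # hypothetical cluster name
    put(39, "HDBSCAN")     # algorithm label, fits bytes 39-45
    put(70, "123.45678")   # RAdeg, F9.5
    put(96, "-12.34567")   # DEdeg, F9.5
    put(234, "  57")       # Nstars, I4, right-justified
    # plx, cst and Source are left blank to show the missing-value handling.
    return "".join(chars)


record = parse_record(make_synthetic_line())
print(record["Name"], record["RAdeg"], record["Nstars"], record["plx"])
```

    The same approach should carry over to members?.dat by swapping in its
    byte ranges; Python's float() also accepts the exponent notation implied
    by the E-format columns there.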