J/A+A/646/A104 Improving the open cluster census. I. (Hunt+, 2021)
Improving the open cluster census.
I. Comparison of clustering algorithms applied to Gaia DR2 data.
Hunt E.L., Reffert S.
<Astron. Astrophys. 646, A104 (2021)>
=2021A&A...646A.104H (SIMBAD/NED BibCode)
ADC_Keywords: Milky Way ; Surveys ; Positional data ; Clusters, open
Keywords: methods: data analysis - open clusters and associations: general -
astrometry
Abstract:
The census of open clusters in the Milky Way is in a never-before-seen
state of flux. Recent works have reported hundreds of new open
clusters thanks to the incredible astrometric quality of the Gaia
satellite, but other works have also reported that many open clusters
discovered in the pre-Gaia era may be associations.
We aim to conduct a comparison of clustering algorithms used to detect
open clusters, attempting to statistically quantify their strengths
and weaknesses by deriving the sensitivity, specificity, and precision
of each as well as their true positive rate against a larger sample.
We selected DBSCAN, HDBSCAN, and Gaussian mixture models for further
study, owing to their speed and appropriateness for use with Gaia
data. We developed a preprocessing pipeline for Gaia data and
developed the algorithms further for the specific application to open
clusters. We derived detection rates for all 1385 open clusters in the
fields in our study as well as more detailed performance statistics
for 100 of these open clusters.
DBSCAN was sensitive to 50%-62% of the true positive open clusters
in our sample, with generally very good specificity and precision.
HDBSCAN traded precision for a higher sensitivity of up to 82%,
especially across different distances and scales of open clusters.
Gaussian mixture models were slow and only sensitive to 33% of open
clusters in our sample, which tended to be larger objects.
Additionally, we report on 41 new open cluster candidates detected by
HDBSCAN, three of which are closer than 500 pc.
When used with additional post-processing to mitigate its false
positives, we have found that HDBSCAN is the most sensitive and
effective algorithm for recovering open clusters in Gaia data. Our
results suggest that many more new and already reported open clusters
have yet to be detected in Gaia data.
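For reference, the sensitivity, specificity, and precision named in the
abstract can be computed from confusion counts using their standard
definitions. The following is a minimal sketch only: the class name, the
function names, and any counts passed in are illustrative, and the exact
counting procedure used in the paper is not described in this file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing detections with a reference list (illustrative)."""
    tp: int  # real clusters that were detected
    fp: int  # detections that match no real cluster
    tn: int  # non-clusters correctly left undetected
    fn: int  # real clusters that were missed


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when the denominator is zero.
    return num / den if den else float("nan")


def sensitivity(c: Confusion) -> float:
    """True positive rate: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """True negative rate: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


def precision(c: Confusion) -> float:
    """Fraction of detections that are real: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)
```

A higher sensitivity bought with more false positives, as described for
HDBSCAN above, shows up here as a larger sensitivity() together with a
smaller precision().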
Description:
clusterl.dat (clustersliteratureocs) contains one entry for every
literature open cluster detected by each algorithm and parameter
combination. When no algorithm detected a cluster, a row with only the
'Name' and 'Source' columns filled is given.
clustern.dat (clustersnewocs) contains the 41 new open cluster
candidates reported by this work.
membersl.dat (membersliteratureocs) and membersn.dat
(membersnewocs) list Gaia DR2 members of literature and newly
detected open clusters respectively.
File Summary:
--------------------------------------------------------------------------------
FileName Lrecl Records Explanations
--------------------------------------------------------------------------------
ReadMe 80 . This file
clusterl.dat 267 5175 List of detected open clusters by each algorithm
clustern.dat 246 41 List of new open clusters reported by this work
membersl.dat 322 678768 List of members for detected literature OCs
membersn.dat 322 2099 List of members for new open clusters
--------------------------------------------------------------------------------
See also:
I/345 : Gaia DR2 (Gaia Collaboration, 2018)
J/ApJS/245/32 : Newly identified star clusters in Gaia DR2 (Liu+, 2019)
J/A+A/558/A53 : Milky Way global survey of star clusters II (Kharchenko+, 2013)
J/A+A/635/A45 : 570 new open clusters in Galactic disc (Castro-Ginard+, 2020)
J/A+A/633/A99 : Gaia DR2 open clusters in Milky Way. II (Cantat-Gaudin+, 2020)
Byte-by-byte Description of file: cluster?.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 17 A17 --- Name Name of the source
19- 37 A19 --- IntID Internal ID of the source
39- 45 A7 --- Algorithm Algorithm used for this detection
47- 68 A22 --- Param Parameters of algorithm used
70- 78 F9.5 deg RAdeg ? Right ascension (ICRS) at Ep=2015.5
80- 86 F7.5 deg s_RAdeg ? Standard deviation of right ascension
88- 94 F7.5 deg e_RAdeg ? Standard error of right ascension
96-104 F9.5 deg DEdeg ? Declination (ICRS) at Ep=2015.5
106-112 F7.5 deg s_DEdeg ? Standard deviation of declination
114-120 F7.5 deg e_DEdeg ? Standard error of declination
122-130 F9.5 deg GLON ? Galactic longitude (2015.5)
132-140 F9.5 deg GLAT ? Galactic latitude (2015.5)
142-150 F9.5 mas/yr pmRA ? Proper motion along RA, pmRA*cosDE (2015.5)
152-158 F7.5 mas/yr s_pmRA ? Standard deviation of pmRA
160-166 F7.5 mas/yr e_pmRA ? Standard error of pmRA
168-176 F9.5 mas/yr pmDE ? Proper motion along DE (2015.5)
178-184 F7.5 mas/yr s_pmDE ? Standard deviation of pmDE
186-192 F7.5 mas/yr e_pmDE ? Standard error of pmDE
194-200 F7.5 mas plx ? Parallax
202-208 F7.5 mas s_plx ? Standard deviation of parallax
210-216 F7.5 mas e_plx ? Standard error of parallax
218-224 F7.5 deg rad50 ? Radius containing 50% of members
226-232 F7.5 deg radtidal ? Estimated tidal radius
234-237 I4 --- Nstars ? Number of identified member stars
239-246 F8.5 --- cst ? Cluster significance test score
248-267 A20 --- Source Source paper used for crossmatching,
only for clusterl.dat file (1)
--------------------------------------------------------------------------------
Note (1): the following papers are referenced as:
Cantat-Gaudin+_2020a = Cantat-Gaudin et al., 2020A&A...633A..99C,
Cat. J/A+A/633/A99
Castro-Ginard+_2020 = Castro-Ginard et al., 2020A&A...635A..45C,
Cat. J/A+A/635/A45
Kharchenko+_2013 = Kharchenko et al., 2013A&A...558A..53K,
Cat. J/A+A/558/A53
Liu+_2019 = Liu & Pang, 2019ApJS..245...32L, Cat. J/ApJS/245/32
--------------------------------------------------------------------------------
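The byte ranges above can be applied directly with string slicing. The
sketch below is one possible reader for a single row of cluster?.dat, not
an official tool: the names CLUSTER_COLUMNS, parse_cluster_row, and
is_detection are illustrative, and it assumes plain ASCII rows. Blank
fields (the '?' columns) are returned as None, and rows shorter than 267
bytes, as in clustern.dat, simply yield Source = None.

```python
# (first byte, last byte, label, converter), copied from the table above.
CLUSTER_COLUMNS = [
    (1, 17, "Name", str),
    (19, 37, "IntID", str),
    (39, 45, "Algorithm", str),
    (47, 68, "Param", str),
    (70, 78, "RAdeg", float),
    (80, 86, "s_RAdeg", float),
    (88, 94, "e_RAdeg", float),
    (96, 104, "DEdeg", float),
    (106, 112, "s_DEdeg", float),
    (114, 120, "e_DEdeg", float),
    (122, 130, "GLON", float),
    (132, 140, "GLAT", float),
    (142, 150, "pmRA", float),
    (152, 158, "s_pmRA", float),
    (160, 166, "e_pmRA", float),
    (168, 176, "pmDE", float),
    (178, 184, "s_pmDE", float),
    (186, 192, "e_pmDE", float),
    (194, 200, "plx", float),
    (202, 208, "s_plx", float),
    (210, 216, "e_plx", float),
    (218, 224, "rad50", float),
    (226, 232, "radtidal", float),
    (234, 237, "Nstars", int),
    (239, 246, "cst", float),
    (248, 267, "Source", str),
]


def parse_cluster_row(line):
    """Split one fixed-width row into a dict; blank fields become None."""
    row = {}
    for first, last, label, convert in CLUSTER_COLUMNS:
        # Byte ranges are 1-based and inclusive, hence the first - 1.
        text = line[first - 1:last].strip()
        row[label] = convert(text) if text else None
    return row


def is_detection(row):
    """Assumes a non-detection row leaves the Algorithm column blank."""
    return row["Algorithm"] is not None
```

A file could then be read with, for example,
[parse_cluster_row(l.rstrip("\n")) for l in open("clusterl.dat")],
keeping only the rows for which is_detection() is true.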
Byte-by-byte Description of file: members?.dat
--------------------------------------------------------------------------------
Bytes Format Units Label Explanations
--------------------------------------------------------------------------------
1- 17 A17 --- Name Name of the source
19- 37 A19 --- IntID Internal ID of the source
39- 45 A7 --- Algorithm Algorithm used for this detection
47- 68 A22 --- Param Parameters of algorithm used
70- 88 I19 --- GaiaDR2 Gaia DR2 source ID of this source
90- 98 F9.5 deg RAdeg Right ascension (ICRS) at Ep=2015.5
100-106 F7.5 deg e_RAdeg Standard error of right ascension
108-116 F9.5 deg DEdeg Declination (ICRS) at Ep=2015.5
118-124 F7.5 deg e_DEdeg Standard error of declination
126-134 F9.5 deg GLON Galactic longitude (2015.5)
137-145 E9.5 deg GLAT Galactic latitude (2015.5)
147-155 E9.5 mas/yr pmRA Proper motion along RA, pmRA*cosDE (2015.5)
157-163 F7.5 mas/yr e_pmRA Standard error of pmRA
165-173 E9.5 mas/yr pmDE Proper motion along DE (2015.5)
175-181 F7.5 mas/yr e_pmDE Standard error of pmDE
183-190 E8.5 mas plx Parallax
192-198 F7.5 mas e_plx Standard error of parallax
200-207 F8.5 mag Gmag Magnitude in the G band
209-216 F8.5 mag BPmag Magnitude in the BP band
218-225 F8.5 mag RPmag Magnitude in the RP band
227-241 F15.5 e-/s FG Flux in the G band
243-255 F13.5 e-/s e_FG Standard error on flux in the G band
257-271 F15.5 e-/s FBP Flux in the BP band
273-284 F12.5 e-/s e_FBP Standard error on flux in the BP band
286-300 F15.5 e-/s FRP Flux in the RP band
302-314 F13.5 e-/s e_FRP Standard error on flux in the RP band
316-322 F7.5 --- Prob Membership probability
--------------------------------------------------------------------------------
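Because each row of members?.dat carries the Name, Algorithm, and Param of
the detection it belongs to, member lists can be rebuilt by grouping on
those three columns. The sketch below reads only a subset of the columns
listed above; the names MEMBER_FIELDS, parse_member_row, and
members_by_detection, and the min_prob threshold, are illustrative and
not part of the catalogue. It assumes every listed field is filled, as the
table has no '?' flags. GaiaDR2 is kept as an integer because 19-digit
source IDs cannot be represented exactly as 64-bit floats.

```python
from collections import defaultdict

# 1-based inclusive byte ranges for a subset of columns, from the table above.
MEMBER_FIELDS = {
    "Name": (1, 17, str),
    "Algorithm": (39, 45, str),
    "Param": (47, 68, str),
    "GaiaDR2": (70, 88, int),  # keep as int, never as float
    "Prob": (316, 322, float),
}


def parse_member_row(line):
    """Extract the selected columns from one fixed-width row."""
    row = {}
    for label, (first, last, convert) in MEMBER_FIELDS.items():
        row[label] = convert(line[first - 1:last].strip())
    return row


def members_by_detection(lines, min_prob=0.0):
    """Group Gaia DR2 source IDs by (Name, Algorithm, Param).

    min_prob is an illustrative cut on the Prob column.
    """
    groups = defaultdict(list)
    for line in lines:
        row = parse_member_row(line)
        if row["Prob"] >= min_prob:
            key = (row["Name"], row["Algorithm"], row["Param"])
            groups[key].append(row["GaiaDR2"])
    return dict(groups)
```

Passing an open file object for membersn.dat as lines would return one
list of source IDs per new cluster and parameter combination.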
Acknowledgements:
Emily Hunt, ehunt(at)lsw.uni-heidelberg.de
(End) Patricia Vannier [CDS] 07-Dec-2020