J/MNRAS/489/3591  Open Supernova Catalog anomaly detection (Pruzhinskaya+, 2019)

Anomaly detection in the Open Supernova Catalog.
    Pruzhinskaya M.V., Malanchev K.L., Kornilov M.V., Ishida E.E.O., Mondon F.,
    Volnova A.A., Korolev V.S.
   <Mon. Not. R. Astron. Soc., 489, 3591-3608 (2019)>
   =2019MNRAS.489.3591P    (SIMBAD/NED BibCode)
ADC_Keywords: Supernovae ; Morphology ; Photometry, classification ; Optical ;
              Models
Keywords: methods: data analysis - catalogues; supernovae: general

Abstract:
    In the upcoming decade, large astronomical surveys will discover millions
    of transients raising unprecedented data challenges in the process. Only
    the use of the machine learning algorithms can process such large data
    volumes. Most of the discovered transients will belong to the known
    classes of astronomical objects. However, it is expected that some
    transients will be rare or completely new events of unknown physical
    nature. The task of finding them can be framed as an anomaly detection
    problem. In this work, we perform for the first time an automated anomaly
    detection analysis in the photometric data of the Open Supernova Catalog
    (OSC), which serves as a proof of concept for the applicability of these
    methods to future large-scale surveys. The analysis consists of the
    following steps: (1) data selection from the OSC and approximation of the
    pre-processed data with Gaussian processes, (2) dimensionality reduction,
    (3) searching for outliers with the use of the isolation forest
    algorithm, and (4) expert analysis of the identified outliers. The
    pipeline returned 81 candidate anomalies, 27 (33 per cent) of which were
    confirmed to be from astrophysically peculiar objects. Found anomalies
    correspond to a selected sample of 1.4 per cent of the initial
    automatically identified data sample of approximately 2000 objects. Among
    the identified outliers we recognized superluminous supernovae,
    non-classical Type Ia supernovae, unusual Type II supernovae, one active
    galactic nucleus and one binary microlensing event. We also found that 16
    anomalies classified as supernovae in the literature are likely to be
    quasars or stars. Our proposed pipeline represents an effective strategy
    to guarantee we shall not overlook exciting new science hidden in the
    data we fought so hard to acquire.
    All code and products of this investigation are made publicly available
    (http://snad.space/osc/).

Description:
    The data are drawn from the OSC (Guillochon et al. 2017ApJ...835...64G).
    It represents an open repository for SN metadata, LCs, and spectra in an
    easily downloadable format. This catalogue also includes some
    contamination from non-SN objects. Given the large number of objects and
    their diverse characteristics, this catalogue is ideal for our goal of
    automatically identifying anomalies. It incorporates data for more than
    5x10^4 SNe candidates among which 1.2x10^4 objects have >10 photometric
    observations and 5x10^3 have spectra. For comparison, the SDSS SN
    catalogue contains only 4607 SNe candidates, 889 of them with measured
    spectra (Sako et al. 2018PASP..130f4002S, Cat. II/333).

    We downloaded the data from the GitHub page
    (http://github.com/astrocatalogs/) of the Astrocats project in 2018 June.
    The complete data set of 45162 objects is located at
    http://snad.space/osc/sne.tar.lzma.

    In summary, the isolation forest analysis identified 81 potentially
    interesting objects (see Table A1), of which 27 (33 per cent) were
    confirmed to be non-SN events or representatives of the rare SN classes.
    Among these objects, we report for the first time the 16
    star/quasar-like objects misclassified as SNe.

File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
tablea1.dat      427       81   List of outliers and their hosts
--------------------------------------------------------------------------------

See also:
    II/333 : Sloan Digital Sky Survey-II Supernova Survey (Sako+, 2018)

Byte-by-byte Description of file: tablea1.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label    Explanations
--------------------------------------------------------------------------------
   1- 16  A16   ---     Name     Supernova name
  18- 19  A2    ---     f_Name   [ef ] Flag on Name (1)
      21  I1    ---     n_Name   [1/9] Note on Name (2)
  23- 24  I2    h       RAh      Right ascension (J2000)
  26- 27  I2    min     RAm      Right ascension (J2000)
  29- 33  F5.2  s       RAs      Right ascension (J2000)
      35  A1    ---     DE-      Declination sign (J2000)
  36- 37  I2    deg     DEd      Declination (J2000)
  39- 40  I2    arcmin  DEm      Declination (J2000)
  42- 45  F4.1  arcsec  DEs      Declination (J2000)
  47- 65  A19   ---     Type     Type of the source (3)
  67- 71  F5.3  ---     zCMB     ? CMB redshift
  73-102  A30   ---     Host     Host galaxy name
 104-105  I2    h       RAHh     ? Right ascension (J2000) of the Host
 107-108  I2    min     RAHm     ? Right ascension (J2000) of the Host
 110-114  F5.2  s       RAHs     ? Right ascension (J2000) of the Host
     116  A1    ---     DEH-     Declination sign (J2000) of the Host
 117-118  I2    deg     DEHd     ? Declination (J2000) of the Host
 120-121  I2    arcmin  DEHm     ? Declination (J2000) of the Host
 123-126  F4.1  arcsec  DEHs     ? Declination (J2000) of the Host
 128-133  A6    ---     HType    Host morphological type from Simbad
 135-139  F5.1  arcsec  Sepas    ? Separation of the source from the center of
                                   its host galaxy (in arcsec)
 141-145  F5.2  kpc     Sepkpc   ? Separation of the source from the center of
                                   its host galaxy (in kpc)
 147-318  A172  ---     Comments Classification and comments (4)
 320-427  A108  ---     Refs     References
--------------------------------------------------------------------------------
Note (1): Flag as follows:
    e = The object is also found in a data set of 364 photometric
        characteristics (121x3 normalized fluxes and the LC flux maximum)
    f = The object is also found in a data set of 10 Gaussian process
        parameters (9 fitted parameters of the kernel and the log-likelihood
        of the fit)
Note (2): Note as follows:
    1 = Outlier found in eight data sets with different dimensionality
        reduction
    2 = Outlier found in seven data sets with different dimensionality
        reduction
    3 = Outlier found in six data sets with different dimensionality
        reduction
    4 = Outlier found in five data sets with different dimensionality
        reduction
    5 = Outlier found in four data sets with different dimensionality
        reduction
    6 = Outlier found in three data sets with different dimensionality
        reduction
    7 = Outlier found in two data sets with different dimensionality
        reduction
    8 = Outlier found in a data set of 364 photometric characteristics
        (121x3 normalized fluxes and the LC flux maximum)
    9 = Outlier found in a data set of 10 Gaussian process parameters
        (9 fitted parameters of the kernel and the log-likelihood of the fit)
Note (3): A prefix '?' means that the source is not confirmed
    spectroscopically.
Note (4): If the classification is made by Sako et al.
    (2018PASP..130f4002S, Cat. II/333), a prefix 'p' (pSN) indicates a purely
    photometric type, a prefix 'z' (zSN) indicates that a redshift is
    measured from its candidate host galaxy and the classification uses that
    redshift as a prior.
--------------------------------------------------------------------------------

History:
    From electronic version of the journal
(End) Ana Fiallos [CDS] 13-Jan-2023
The document above follows the rules of the Standard Description for
Astronomical Catalogues; from this documentation it is possible to generate an
f77 program that loads the files into arrays or line by line.
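
The byte-by-byte description of tablea1.dat can also be used directly as a
column map for a small fixed-width reader. The following is a minimal sketch in
Python (not f77), using only the standard library. The names FIELDS,
parse_record, source_position_deg and load_table are illustrative assumptions
and are not part of the catalogue or of any CDS tool. The sketch also assumes
that columns marked '?' are left blank when no value is available, and returns
None for them.

```python
from typing import Any, Callable, Dict, List, Tuple

# label -> (first byte, last byte, converter). Byte positions are 1-based and
# inclusive, copied from the byte-by-byte description of tablea1.dat.
FIELDS: Dict[str, Tuple[int, int, Callable[[str], Any]]] = {
    "Name": (1, 16, str), "f_Name": (18, 19, str), "n_Name": (21, 21, int),
    "RAh": (23, 24, int), "RAm": (26, 27, int), "RAs": (29, 33, float),
    "DE-": (35, 35, str), "DEd": (36, 37, int), "DEm": (39, 40, int),
    "DEs": (42, 45, float), "Type": (47, 65, str), "zCMB": (67, 71, float),
    "Host": (73, 102, str),
    "RAHh": (104, 105, int), "RAHm": (107, 108, int), "RAHs": (110, 114, float),
    "DEH-": (116, 116, str), "DEHd": (117, 118, int), "DEHm": (120, 121, int),
    "DEHs": (123, 126, float), "HType": (128, 133, str),
    "Sepas": (135, 139, float), "Sepkpc": (141, 145, float),
    "Comments": (147, 318, str), "Refs": (320, 427, str),
}


def parse_record(line: str) -> Dict[str, Any]:
    """Split one fixed-width record into typed fields (blank -> None)."""
    record: Dict[str, Any] = {}
    for label, (first, last, convert) in FIELDS.items():
        # Convert the 1-based inclusive byte range to a Python slice.
        raw = line[first - 1:last].strip()
        record[label] = convert(raw) if raw else None
    return record


def source_position_deg(record: Dict[str, Any]) -> Tuple[float, float]:
    """Convert the sexagesimal J2000 source position to decimal degrees."""
    ra = 15.0 * (record["RAh"] + record["RAm"] / 60.0 + record["RAs"] / 3600.0)
    # The sign is a separate column so that declinations between
    # -1 and 0 degrees keep their sign.
    sign = -1.0 if record["DE-"] == "-" else 1.0
    dec = sign * (record["DEd"] + record["DEm"] / 60.0 + record["DEs"] / 3600.0)
    return ra, dec


def load_table(path: str) -> List[Dict[str, Any]]:
    """Read every non-empty line of a local copy of tablea1.dat."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return [parse_record(line) for line in handle if line.strip()]
```

With a local copy of the table, load_table("tablea1.dat") would return one
dictionary per record, and source_position_deg can then be applied to each of
them. Keeping the byte ranges in a single mapping means the reader can be
checked line by line against the description above.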