J/MNRAS/489/3591 Open Supernova Catalog anomaly detection (Pruzhinskaya+, 2019)
Anomaly detection in the Open Supernova Catalog.
Pruzhinskaya M.V., Malanchev K.L., Kornilov M.V., Ishida E.E.O., Mondon F.,
Volnova A.A., Korolev V.S.
<Mon. Not. R. Astron. Soc., 489, 3591-3608 (2019)>
=2019MNRAS.489.3591P (SIMBAD/NED BibCode)
ADC_Keywords: Supernovae ; Morphology ; Photometry, classification ; Optical ;
Models
Keywords: methods: data analysis - catalogues; supernovae: general
Abstract:
In the upcoming decade, large astronomical surveys will discover
millions of transients raising unprecedented data challenges in the
process. Only the use of the machine learning algorithms can process
such large data volumes. Most of the discovered transients will belong
to the known classes of astronomical objects. However, it is expected
that some transients will be rare or completely new events of unknown
physical nature. The task of finding them can be framed as an anomaly
detection problem. In this work, we perform for the first time an
automated anomaly detection analysis in the photometric data of the
Open Supernova Catalog (OSC), which serves as a proof of concept for
the applicability of these methods to future large-scale surveys. The
analysis consists of the following steps: (1) data selection from the
OSC and approximation of the pre-processed data with Gaussian
processes, (2) dimensionality reduction, (3) searching for outliers
with the use of the isolation forest algorithm, and (4) expert
analysis of the identified outliers. The pipeline returned 81
candidate anomalies, 27 (33 per cent) of which were confirmed to be
from astrophysically peculiar objects. Found anomalies correspond to a
selected sample of 1.4 per cent of the initial automatically
identified data sample of approximately 2000 objects. Among the
identified outliers we recognized superluminous supernovae,
non-classical Type Ia supernovae, unusual Type II supernovae, one
active galactic nucleus and one binary microlensing event. We also
found that 16 anomalies classified as supernovae in the literature are
likely to be quasars or stars. Our proposed pipeline represents an
effective strategy to guarantee we shall not overlook exciting new
science hidden in the data we fought so hard to acquire. All code and
products of this investigation are made publicly available.
(http://snad.space/osc/)
Description:
The data are drawn from the OSC (Guillochon et al.
2017ApJ...835...64G). It represents an open repository for SN
metadata, LCs, and spectra in an easily downloadable format. This
catalogue also includes some contamination from non-SN objects.
Given the large number of objects and their diverse characteristics,
this catalogue is ideal for our goal of automatically identifying
anomalies. It incorporates data for more than 5x10^4 SNe candidates,
among which 1.2x10^4 objects have >10 photometric observations and
5x10^3 have spectra. For comparison, the SDSS SN catalogue contains only
4607 SNe candidates, 889 of them with measured spectra (Sako et al.
2018PASP..130f4002S, Cat. II/333).
We downloaded the data from the GitHub page
(http://github.com/astrocatalogs/) of the Astrocats project in June
2018. The complete data set of 45162 objects is located at
http://snad.space/osc/sne.tar.lzma.
In summary, the isolation forest analysis identified 81 potentially
interesting objects (see Table A1), of which 27 (33 per cent) were
confirmed to be non-SN events or representatives of the rare SN
classes. Among these objects, we report for the first time the 16
star/quasar-like objects misclassified as SNe.
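
The outlier-search step can be illustrated with a toy isolation forest. This is
only a minimal sketch of the general technique, written in pure Python; it is
not the authors' code (see http://snad.space/osc/ for that) and it does not
reproduce their Gaussian-process features or dimensionality reduction. All
function names and parameter defaults below are hypothetical.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # nothing left to split along this feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which the point lands, corrected for the size of the final leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(child, point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one score in (0, 1] per point; values near 1 mean 'easily isolated'."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    scores = []
    for p in points:
        mean_depth = sum(path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores
```

Called on a list of equal-length feature tuples, anomaly_scores returns higher
values for points that random splits separate quickly; ranking objects by this
score and passing the top of the list to human experts mirrors steps (3) and
(4) of the analysis in spirit only.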
File Summary:
--------------------------------------------------------------------------------
 FileName      Lrecl  Records   Explanations
--------------------------------------------------------------------------------
ReadMe            80        .   This file
tablea1.dat      427       81   List of outliers and their hosts
--------------------------------------------------------------------------------
See also:
II/333 : Sloan Digital Sky Survey-II Supernova Survey (Sako+, 2018)
Byte-by-byte Description of file: tablea1.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1- 16  A16   ---     Name      Supernova name
  18- 19  A2    ---     f_Name    [ef ] Flag on Name (1)
      21  I1    ---     n_Name    [1/9] Note on Name (2)
  23- 24  I2    h       RAh       Right ascension (J2000)
  26- 27  I2    min     RAm       Right ascension (J2000)
  29- 33  F5.2  s       RAs       Right ascension (J2000)
      35  A1    ---     DE-       Declination sign (J2000)
  36- 37  I2    deg     DEd       Declination (J2000)
  39- 40  I2    arcmin  DEm       Declination (J2000)
  42- 45  F4.1  arcsec  DEs       Declination (J2000)
  47- 65  A19   ---     Type      Type of the source (3)
  67- 71  F5.3  ---     zCMB      ? CMB redshift
  73-102  A30   ---     Host      Host galaxy name
 104-105  I2    h       RAHh      ? Right ascension (J2000) of the Host
 107-108  I2    min     RAHm      ? Right ascension (J2000) of the Host
 110-114  F5.2  s       RAHs      ? Right ascension (J2000) of the Host
     116  A1    ---     DEH-      Declination sign (J2000) of the Host
 117-118  I2    deg     DEHd      ? Declination (J2000) of the Host
 120-121  I2    arcmin  DEHm      ? Declination (J2000) of the Host
 123-126  F4.1  arcsec  DEHs      ? Declination (J2000) of the Host
 128-133  A6    ---     HType     Host morphological type from Simbad
 135-139  F5.1  arcsec  Sepas     ? Separation of the source from the center
                                    of its host galaxy (in arcsec)
 141-145  F5.2  kpc     Sepkpc    ? Separation of the source from the center
                                    of its host galaxy (in kpc)
 147-318  A172  ---     Comments  Classification and comments (4)
 320-427  A108  ---     Refs      References
--------------------------------------------------------------------------------
Note (1): Flag as follows:
  e = The object is also found in a data set of 364 photometric
      characteristics (121x3 normalized fluxes and the LC flux maximum)
  f = The object is also found in a data set of 10 Gaussian process
      parameters (9 fitted parameters of the kernel and the log-likelihood
      of the fit)
Note (2): Note as follows:
  1 = Outlier found in eight data sets with different dimensionality reduction
  2 = Outlier found in seven data sets with different dimensionality reduction
  3 = Outlier found in six data sets with different dimensionality reduction
  4 = Outlier found in five data sets with different dimensionality reduction
  5 = Outlier found in four data sets with different dimensionality reduction
  6 = Outlier found in three data sets with different dimensionality reduction
  7 = Outlier found in two data sets with different dimensionality reduction
  8 = Outlier found in a data set of 364 photometric characteristics
      (121x3 normalized fluxes and the LC flux maximum)
  9 = Outlier found in a data set of 10 Gaussian process parameters (9 fitted
      parameters of the kernel and the log-likelihood of the fit)
Note (3): A prefix '?' means that the source is not confirmed spectroscopically
Note (4): If the classification is made by Sako et al. (2018PASP..130f4002S,
  Cat. II/333), a prefix 'p' (pSN) indicates a purely photometric type,
  and a prefix 'z' (zSN) indicates that a redshift is measured from its
  candidate host galaxy and the classification uses that redshift as a
  prior.
--------------------------------------------------------------------------------
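
The byte ranges above are 1-indexed and inclusive, so a record of tablea1.dat
can be sliced directly. The following is a minimal sketch assuming each record
is a fixed-width line laid out exactly as described; only the first twelve
columns are covered, and the helper names (COLUMNS, parse_record,
position_degrees, describe_note) are hypothetical, not part of any CDS tool.

```python
# Subset of the byte-by-byte description: label -> (first byte, last byte, type).
COLUMNS = {
    "Name": (1, 16, str),
    "f_Name": (18, 19, str),
    "n_Name": (21, 21, int),
    "RAh": (23, 24, int),
    "RAm": (26, 27, int),
    "RAs": (29, 33, float),
    "DE-": (35, 35, str),
    "DEd": (36, 37, int),
    "DEm": (39, 40, int),
    "DEs": (42, 45, float),
    "Type": (47, 65, str),
    "zCMB": (67, 71, float),
}


def parse_record(line):
    """Slice one fixed-width record; blank numeric fields become None."""
    record = {}
    for label, (first, last, kind) in COLUMNS.items():
        text = line[first - 1:last].strip()  # 1-indexed inclusive -> Python slice
        if kind is str:
            record[label] = text
        else:
            record[label] = kind(text) if text else None
    return record


def position_degrees(record):
    """Convert the sexagesimal J2000 position to decimal degrees."""
    ra = 15.0 * (record["RAh"] + record["RAm"] / 60.0 + record["RAs"] / 3600.0)
    dec = record["DEd"] + record["DEm"] / 60.0 + record["DEs"] / 3600.0
    # The sign is a separate column, so declinations between -1 and 0 deg keep it.
    if record["DE-"] == "-":
        dec = -dec
    return ra, dec


def describe_note(code):
    """Spell out the n_Name code following Note (2)."""
    if 1 <= code <= 7:
        return f"outlier in {9 - code} data sets with different dimensionality reduction"
    if code == 8:
        return "outlier in the data set of 364 photometric characteristics"
    if code == 9:
        return "outlier in the data set of 10 Gaussian process parameters"
    raise ValueError(f"n_Name code outside [1/9]: {code}")
```

Extending COLUMNS with the host columns (bytes 73 to 145) follows the same
pattern; the fields marked '?' in the description may be blank and then come
back as None.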
History:
From electronic version of the journal
(End) Ana Fiallos [CDS] 13-Jan-2023