
【Kaggle】Overviews of some Kaggle competitions


This article gives brief overviews of four Kaggle competitions: two G2Net competitions, IceCube, and SETI.

1. G2Net Gravitational Wave Detection

・Competition link
https://www.kaggle.com/competitions/g2net-gravitational-wave-detection

Term: 2021/7/1 ~ 2021/9/30

1.1 Overview

In 2015, researchers made the first direct detection of gravitational waves (GW), signals from colliding binary black holes. Observations like this have since given researchers a new way to measure the expansion of the Universe.

These signals are unimaginably tiny ripples in the fabric of space-time, and even though the detectors in the global GW network are some of the most sensitive instruments on the planet, the signals are buried in detector noise.

In this competition, you’ll aim to detect GW signals from the mergers of binary black holes. Specifically, you'll build a model to analyze simulated GW time-series data from a network of Earth-based detectors.

1.2 Evaluation

metrics

AUC (area under the ROC curve)

sample submission
id,target
00005bced6,0.5
0000806717,0.5
0000ef4fe1,0.5
etc.
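
As a rough illustration of the metric, the sketch below computes AUC from scratch using its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are made up for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))     # 0.75
print(roc_auc(labels, [0.5] * 4))  # 0.5
```

The second call shows why a constant prediction of 0.5, as in the sample submission, scores an AUC of 0.5: every pair is a tie.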

1.3 Data

In this competition, you are provided with a training set of time series data containing simulated gravitational wave measurements from a network of 3 gravitational wave interferometers (LIGO Hanford, LIGO Livingston, and Virgo).

The integrated signal-to-noise ratio (SNR) is classically the most informative measure of how detectable a signal is, and a typical level of detectability is when this integrated SNR exceeds ~8. This shouldn't be confused with the instantaneous SNR - the factor by which the signal rises above the noise - and in nearly all cases these signals are not visible by eye in the time series.
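
To make the distinction concrete, here is a minimal sketch under a simplifying assumption that is not part of the competition data: white Gaussian noise with a known per-sample standard deviation sigma. In that case the optimal (matched-filter) integrated SNR is sqrt(sum of h_i^2) / sigma, where h_i are the signal samples. The sine-wave signal, its amplitude, and the sample count are invented for illustration. Real detector noise is colored, so actual analyses weight the signal by the noise power spectral density instead.

```python
import math


def integrated_snr(signal, noise_sigma):
    """Optimal matched-filter SNR for a known signal in white Gaussian noise."""
    return math.sqrt(sum(h * h for h in signal)) / noise_sigma


def instantaneous_snr(signal, noise_sigma):
    """Peak signal amplitude relative to the noise standard deviation."""
    return max(abs(h) for h in signal) / noise_sigma


# Hypothetical weak signal: its amplitude is 20% of the noise level.
n_samples = 4096
amplitude = 0.2
noise_sigma = 1.0
signal = [
    amplitude * math.sin(2 * math.pi * 50 * i / n_samples)
    for i in range(n_samples)
]

print(instantaneous_snr(signal, noise_sigma))  # about 0.2, far below the noise
print(integrated_snr(signal, noise_sigma))     # about 9.05, above the ~8 level
```

Even though each sample of the signal is far below the noise, accumulating it over thousands of samples pushes the integrated SNR past the detectability level.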

2. G2Net Detecting Continuous Gravitational Waves

・Competition link
https://www.kaggle.com/competitions/g2net-detecting-continuous-gravitational-waves

Term: 2022/10/5 ~ 2023/01/04

2.1 Overview

The goal of this competition is to find continuous gravitational-wave signals. You will develop a model sensitive enough to detect weak yet long-lasting signals emitted by rapidly-spinning neutron stars within noisy data.

When scientists detected the first class of gravitational waves in 2015, they expected the discoveries to continue.
Continuous gravitational waves are one of the classes still to be found. Their sources are rapidly spinning neutron stars: imagine the mass of our Sun condensed into a ball the size of a city and spinning over 1,000 times a second. The aim is to catch the signals from such objects here on Earth.

2.2 Evaluation

metrics

AUC

sample submission
id,target
00054c878,0.5
0007285a3,0.5
00076c5a6,0.5
etc.

2.3 Data

In this competition, you are provided with a training set containing time-frequency data from two gravitational-wave interferometers (LIGO Hanford & LIGO Livingston). Each data sample contains either real or simulated noise and possibly a simulated continuous gravitational-wave signal (CW). The task is to identify when a signal is present in the data (target=1).

3. IceCube - Neutrinos in Deep Ice

・Competition link
https://www.kaggle.com/competitions/icecube-neutrinos-in-deep-ice

Term: 2023/01/20 ~ 2023/04/20

3.1 Overview

The goal of this competition is to predict a neutrino particle’s direction. You will develop a model based on data from the "IceCube" detector, which observes the cosmos from deep within the South Pole ice.

By making the process faster and more precise, you'll help improve the reconstruction of neutrinos. As a result, we could gain a clearer image of our universe.

3.2 Evaluation

metrics

Mean Angular Error

sample submission

For each event_id in the test set, you must predict the azimuth and zenith. The file should contain a header and have the following format:

event_id,azimuth,zenith
730,1,1
769,1,1
774,1,1
etc.
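
A minimal sketch of how the mean angular error could be computed from (azimuth, zenith) pairs is shown below. It uses the spherical law of cosines to get the opening angle between the true and predicted directions. The function names and the sample angles are illustrative and are not taken from the competition's scoring code.

```python
import math


def angular_error(az_true, zen_true, az_pred, zen_pred):
    """Opening angle in radians between two directions given as (azimuth, zenith)."""
    cos_angle = (
        math.sin(zen_true) * math.sin(zen_pred) * math.cos(az_true - az_pred)
        + math.cos(zen_true) * math.cos(zen_pred)
    )
    # Guard against rounding pushing the value slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)


def mean_angular_error(truth, predictions):
    """Average opening angle over paired (azimuth, zenith) tuples."""
    errors = [
        angular_error(az_t, zen_t, az_p, zen_p)
        for (az_t, zen_t), (az_p, zen_p) in zip(truth, predictions)
    ]
    return sum(errors) / len(errors)


truth = [(0.0, math.pi / 2), (1.0, 1.0)]
predictions = [(math.pi / 2, math.pi / 2), (1.0, 1.0)]
print(mean_angular_error(truth, predictions))  # about 0.785 (pi / 4)
```

The first prediction is off by a right angle and the second is exact, so the mean is roughly pi / 4 radians.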

3.3 Data

The goal of this competition is to identify which direction neutrinos detected by the IceCube neutrino observatory came from. When detection events can be localized quickly enough, traditional telescopes are recruited to investigate short-lived neutrino sources such as supernovae or gamma ray bursts. Because the sky is huge, better localization will not only help associate neutrinos with sources but also help partner observatories limit their search space. With an average of three thousand events per second to process, it's difficult to keep up with the stream of data using traditional methods. Your challenge in this competition is to quickly and accurately process a large number of events.

[train/test]_meta.parquet

[azimuth/zenith] (float32): the [azimuth/zenith] angle in radians of the neutrino. A value between 0 and 2*pi for the azimuth and 0 and pi for zenith. The target columns. Not provided for the test set. The direction vector represented by zenith and azimuth points to where the neutrino came from.

Note: Other quantities regarding the event, such as the interaction point in x, y, z (vertex position), the neutrino energy, or the interaction type and kinematics are not included in the dataset.

[train/test]/batch_[n].parquet

Each batch contains tens of thousands of events. Each event may contain thousands of pulses, each of which is the digitized output from a photomultiplier tube and occupies one row.

event_id (int): the event ID. Saved as the index column in parquet.
time (int): the time of the pulse in nanoseconds in the current event time window. The absolute time of a pulse has no relevance, and only the relative time with respect to other pulses within an event is of relevance.
sensor_id (int): the ID of which of the 5160 IceCube photomultiplier sensors recorded this pulse.
charge (float32): An estimate of the amount of light in the pulse, in units of photoelectrons (p.e.). A physical photon does not exactly result in a measurement of 1 p.e. but rather can take values spread around 1 p.e. As an example, a pulse with charge 2.7 p.e. could quite likely be the result of two or three photons hitting the photomultiplier tube around the same time. This data has float16 precision but is stored as float32 due to limitations of the version of pyarrow the data was prepared with.
auxiliary (bool): If True, the pulse was not fully digitized, is of lower quality, and was more likely to originate from noise. If False, this pulse contributed to the trigger decision and was fully digitized.
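
Since only relative pulse times matter, a natural first step is to shift each event's times so that they start at zero, optionally dropping the lower-quality auxiliary pulses. The sketch below does this on a handful of made-up rows using plain Python. The values and the helper name `relative_times` are hypothetical; with the real parquet files a dataframe library would be the more practical tool.

```python
from collections import defaultdict

# Made-up pulse rows: (event_id, time, sensor_id, charge, auxiliary)
pulses = [
    (24, 5928, 3918, 1.325, True),
    (24, 6115, 4157, 1.175, False),
    (24, 6492, 3520, 0.925, False),
    (41, 10010, 120, 2.700, False),
    (41, 10250, 121, 0.475, True),
]


def relative_times(rows, drop_auxiliary=False):
    """Group pulses by event and express times relative to the first kept pulse."""
    events = defaultdict(list)
    for event_id, time, sensor_id, charge, auxiliary in rows:
        if drop_auxiliary and auxiliary:
            continue  # skip pulses that were not fully digitized
        events[event_id].append((time, sensor_id, charge))

    result = {}
    for event_id, event_pulses in events.items():
        t0 = min(t for t, _, _ in event_pulses)
        result[event_id] = [(t - t0, s, c) for t, s, c in sorted(event_pulses)]
    return result


print(relative_times(pulses)[24])
# [(0, 3918, 1.325), (187, 4157, 1.175), (564, 3520, 0.925)]
print(relative_times(pulses, drop_auxiliary=True)[24])
# [(0, 4157, 1.175), (377, 3520, 0.925)]
```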

sensor_geometry.csv

The x, y, and z positions for each of the 5160 IceCube sensors. The row index corresponds to the sensor_id of the pulses. The x, y, and z coordinates are in units of meters, with the origin at the center of the IceCube detector. The coordinate system is right-handed, and the z-axis points upwards when standing at the South Pole. A direction given by azimuth and zenith relates to these coordinates through the following formulas (here the vector (x,y,z) is normalized):

x = cos(azimuth) * sin(zenith)
y = sin(azimuth) * sin(zenith)
z = cos(zenith)
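
The formulas above, together with their inverse, can be sketched as follows. The function names are illustrative. The inverse uses `atan2` and wraps the azimuth into the range 0 to 2*pi, matching the ranges used for the target columns.

```python
import math


def angles_to_vector(azimuth, zenith):
    """Unit vector (x, y, z) from azimuth and zenith in radians."""
    x = math.cos(azimuth) * math.sin(zenith)
    y = math.sin(azimuth) * math.sin(zenith)
    z = math.cos(zenith)
    return x, y, z


def vector_to_angles(x, y, z):
    """Azimuth (wrapped into 0..2*pi) and zenith (0..pi) from a nonzero vector."""
    norm = math.sqrt(x * x + y * y + z * z)
    # Clamp to [-1, 1] so rounding errors cannot break acos.
    zenith = math.acos(max(-1.0, min(1.0, z / norm)))
    azimuth = math.atan2(y, x) % (2 * math.pi)
    return azimuth, zenith


x, y, z = angles_to_vector(4.0, 2.0)
print(vector_to_angles(x, y, z))  # approximately (4.0, 2.0)
```

Converting angles to a vector and back recovers the original angles up to floating-point rounding, which is a quick sanity check when switching between the two representations.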

4. SETI Breakthrough Listen - E.T. Signal Search

・Competition link
https://www.kaggle.com/competitions/seti-breakthrough-listen

Term: 2021/05/11 ~ 2021/08/19

4.1 Overview

The Listen team is part of the Search for ExtraTerrestrial Intelligence (SETI) and uses the largest steerable dish on the planet, the 100-meter diameter Green Bank Telescope.

The goal of this competition is to find the anomalous signals added by the host team. They simulated some signals (which they call “needles”) and added them to the haystack of data from the telescope.

4.2 Evaluation

metrics

AUC

sample submission
id,target
00034abb3629,0.5
0004be0baf70,0.5
0005be4d0752,0.5
etc.

4.3 Data

In this competition, you are tasked with looking for technosignature signals in cadence snippets taken from the Green Bank Telescope (GBT).

train

A training set of cadence snippet files stored in numpy float16 format (v1.20.1), one file per cadence snippet id, with corresponding labels found in the train_labels.csv file. Each file has dimension (6, 273, 256), with the 1st dimension representing the 6 positions of the cadence, and the 2nd and 3rd dimensions representing the 2D spectrogram.

test

The test set cadence snippet files; you must predict whether or not the cadence contains a "needle", which is the target for this competition.

old_leaky_data

Full pre-relaunch data, including test labels; you should not assume this data is helpful (it may or may not be).
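
To give a feel for the (6, 273, 256) layout of the cadence snippet files described above, the sketch below builds a synthetic array of that shape and stacks the 6 cadence positions into one tall spectrogram, which is one simple way to feed a snippet to an image model. The random data is only a stand-in: a real snippet would be read with `np.load` from its file, and the stacking step is an assumption about preprocessing, not something prescribed by the competition.

```python
import numpy as np

# Synthetic stand-in for one cadence snippet (a real file would be loaded with np.load).
rng = np.random.default_rng(0)
cadence = rng.normal(size=(6, 273, 256)).astype(np.float16)

# Cast up from float16 before doing arithmetic to limit rounding problems.
cadence = cadence.astype(np.float32)

# Stack the 6 cadence positions along the first spectrogram axis.
stacked = cadence.reshape(6 * 273, 256)

print(cadence.shape, stacked.shape)  # (6, 273, 256) (1638, 256)
```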
