Seismic Facies Identification
=============================

This lab covers a challenging task: the identification of seismic facies through machine learning. We'll use the data of the `Seismic Facies Identification Challenge `_, which is already preprocessed and can be used with little effort to apply our knowledge on efficient ML. Note that having carefully preprocessed data is a luxury we typically don't have when working with experimental scientific data.

Our task is to classify volumetric stacked seismic data. Classifying means that we have to decide which type of geologic description applies to a specific underground point. Stacked seismic data typically consist of an inline dimension, a crossline dimension and a depth or time dimension. Further information is available, e.g., in the chapter `Introduction to 3-D seismic exploration `_ of the `Society of Exploration Geophysicists `_ wiki. In essence, seismic waves are reflected or otherwise altered at material interfaces. These effects are visible in the stacked seismic data, which allows us to infer the respective underground material properties.

In our dataset, a team of interpreters closely looked at the seismic data and labeled it for us. Thus, we can learn from this labeled data and "interpret" seismic data on our own without years of training. Keep in mind that we heavily depend on the interpreters doing a good job. No matter the quality of the data-driven ML method itself, we will not be able to outperform the interpreters in quality or to correct their errors. For this we would have to go another route. However, we are able to automate the interpretation job and presumably do it much faster. For example, assume that an interpreter labeled a small section of a dataset for us. Here, the ML-driven approach developed in this lab might allow us to interpret the rest of the dataset automatically.

Getting Started
---------------

.. figure:: /chapters/data_seismic/seismic_facies_combined.png
   :name: fig:seismic_facies_combined
   :width: 100%

   Illustration of the seismic data of the Seismic Facies Identification Challenge.

The dataset is available in the class's cloud directory ``seismic/data``. Provided is the labeled data, i.e., the files ``data_train.npz`` and ``labels_train.npz``, which we use for training. Further, two unlabeled files, ``data_test_1.npz`` and ``data_test_2.npz``, are available, which were used in the first and second round of the challenge, respectively.

.. _tab:seismic_files:

.. table:: Overview of the Seismic Facies Identification Challenge's data.

   +-------------------------+-----+-----+------+-------+------------+
   | File                    | nx  | ny  | nz   | dtype | size (GiB) |
   +=========================+=====+=====+======+=======+============+
   | ``data_train.npz``      | 782 | 590 | 1006 |       |            |
   +-------------------------+-----+-----+------+-------+------------+
   | ``labels_train.npz``    |     |     |      |       |            |
   +-------------------------+-----+-----+------+-------+------------+
   | ``data_test_1.npz``     |     |     |      |       |            |
   +-------------------------+-----+-----+------+-------+------------+
   | ``data_test_2.npz``     |     |     |      |       |            |
   +-------------------------+-----+-----+------+-------+------------+

Before doing any ML-related work, we'll explore the dataset to get a good feeling for it.

.. admonition:: Tasks

   #. Make yourself familiar with the Seismic Facies Identification Challenge. Have a look at the `Challenge Starter Kit `__.
   #. Add the missing entries to :numref:`tab:seismic_files`.
   #. Visualize the data in ``data_train.npz`` and ``labels_train.npz``. Slice the data for this purpose. Show slices in all dimensions, i.e., slices normal to the x-direction, the y-direction and the z-direction. Produce a plot which shows an xz-slice and a yz-slice at the same time. A minimal loading and slicing sketch is given below this list as a starting point.
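The following is a minimal sketch of how the volumes could be loaded and sliced with NumPy and Matplotlib. The key names ``"data"`` and ``"labels"`` inside the ``.npz`` archives and the axis order (x, y, z) are assumptions; inspect ``np.load(...).files`` and the array shapes and adjust the indexing if your copy of the data differs.

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt

   # Load the arrays; the key names "data" and "labels" are assumptions --
   # print(np.load("...").files) lists the keys actually stored in the archives.
   data = np.load("seismic/data/data_train.npz")["data"]
   labels = np.load("seismic/data/labels_train.npz")["labels"]
   print(data.shape, labels.shape)

   # Assume the axes are ordered (x, y, z); adjust if the shapes suggest otherwise.
   # Pick the central slice in the x- and y-direction as an example.
   ix, iy = data.shape[0] // 2, data.shape[1] // 2

   fig, axes = plt.subplots(1, 2, figsize=(12, 5))
   # xz-slice: fix the y-index; transpose so depth/time runs vertically.
   axes[0].imshow(data[:, iy, :].T, cmap="gray", aspect="auto")
   axes[0].set_title(f"xz-slice at iy={iy}")
   # yz-slice: fix the x-index.
   axes[1].imshow(data[ix, :, :].T, cmap="gray", aspect="auto")
   axes[1].set_title(f"yz-slice at ix={ix}")
   plt.tight_layout()
   plt.show()

The label volume can be visualized in the same way, e.g., via ``axes[0].imshow(labels[:, iy, :].T, aspect="auto")``, which makes it easy to compare the facies classes with the seismic amplitudes side by side.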
U-Net Architecture
------------------

.. figure:: /chapters/data_seismic/unet_all.svg
   :name: fig:seismic_unet_all
   :width: 100%

   Illustration of the feature counts and spatial sizes involved in the code frame's U-Net type architecture.

The U-Net architecture was introduced in 2015 in the context of biomedical image segmentation. Since then, U-Net type networks have been applied to a large range of scientific applications. U-Nets consist of an encoder which quickly reduces the spatial size of the input; typically, max pooling steps are used to halve the image size in every step. At the same time, the convolutional blocks in the encoder increase the number of features. The final output of the encoder is connected through a convolutional bottleneck block to the decoder. The decoder successively increases the spatial extent of the data through upsampling. The respective convolutional blocks in the decoder decrease the number of features. Further, the encoder is connected to the decoder through skip connections. A minimal sketch of these building blocks is given after the task list below.

In this task we'll use U-Net type networks for the classification of seismic images. A working code base is provided in the class's cloud share to get you over the initial hurdles. However, the code base is neither highly optimized for computational performance nor for overall accuracy. Bottom line: This part is freestyle and you should follow your own ideas!

.. admonition:: Tasks

   #. Read the paper `U-Net: Convolutional Networks for Biomedical Image Segmentation `_.
   #. Briefly explain the following terms:

      * Max pooling
      * Upsampling using bilinear interpolation
      * Skip connection
      * Batch normalization

   #. Train a U-Net type network using the data of the Seismic Facies Identification Challenge. Optimize your network! You are free to choose whatever approach sounds suitable to you. Document your ideas and respective results!
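To make the four terms above concrete, here is a minimal, illustrative PyTorch sketch of a U-Net with a single encoder/decoder stage. It is not the architecture of the provided code frame; the channel counts are chosen for brevity and the number of output classes (six facies classes) is an assumption.

.. code-block:: python

   import torch
   import torch.nn as nn


   class DoubleConv(nn.Module):
       """Two 3x3 convolutions, each followed by batch normalization and ReLU."""

       def __init__(self, in_channels, out_channels):
           super().__init__()
           self.block = nn.Sequential(
               nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False),
               nn.BatchNorm2d(out_channels),
               nn.ReLU(inplace=True),
               nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
               nn.BatchNorm2d(out_channels),
               nn.ReLU(inplace=True),
           )

       def forward(self, x):
           return self.block(x)


   class TinyUNet(nn.Module):
       """U-Net with a single encoder/decoder stage, for illustration only."""

       def __init__(self, in_channels=1, num_classes=6):  # six facies classes assumed
           super().__init__()
           self.enc = DoubleConv(in_channels, 16)
           self.pool = nn.MaxPool2d(2)          # max pooling: halves height and width
           self.bottleneck = DoubleConv(16, 32)
           self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                                 align_corners=False)  # bilinear upsampling
           self.dec = DoubleConv(32 + 16, 16)   # +16 channels from the skip connection
           self.head = nn.Conv2d(16, num_classes, 1)  # per-pixel class scores

       def forward(self, x):
           skip = self.enc(x)                    # full-resolution encoder features
           x = self.bottleneck(self.pool(skip))  # half-resolution features
           x = self.up(x)                        # back to full resolution
           x = torch.cat([x, skip], dim=1)       # skip connection by concatenation
           return self.head(self.dec(x))


   model = TinyUNet()
   scores = model(torch.randn(1, 1, 256, 256))  # (batch, classes, height, width)
   print(scores.shape)                          # torch.Size([1, 6, 256, 256])

A full training setup would slice the 3-D volume into 2-D images, feed batches of them through the network and minimize a per-pixel classification loss, e.g., ``nn.CrossEntropyLoss``, between the predicted scores and the interpreters' labels.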