13. Seismic Facies Identification

This lab covers a challenging task: the identification of seismic facies through machine learning. We’ll use the data of the Seismic Facies Identification Challenge, which is already preprocessed and can be used with little effort to apply our knowledge on efficient ML. Note that having carefully preprocessed data is a luxury we typically don’t have when working with experimental scientific data.

Our task is to classify volumetric stacked seismic data. Classifying means that we have to decide which type of geologic description applies to a specific underground point. Stacked seismic data typically consist of an inline dimension, a crossline dimension, and a depth or time dimension. Further information is available, e.g., in the chapter Introduction to 3-D seismic exploration of the Society of Exploration Geophysicists wiki. In essence, seismic waves are reflected or otherwise altered at material interfaces. These effects are visible in the stacked seismic data, which allows us to infer the respective underground material properties. In our dataset, a team of interpreters closely examined the seismic data and labeled it for us. Thus, we can learn from this labeled data and “interpret” seismic data on our own without years of training.

Keep in mind that we depend heavily on the interpreters doing a good job. No matter the quality of the data-driven ML method itself, we will not be able to outperform the interpreters in quality or to correct their errors; for that we would have to go another route. However, we are able to automate the interpretation job and presumably do it much faster. For example, assume that an interpreter labeled a small section of a dataset for us. The ML-driven approach developed in this lab might then allow us to interpret the rest of the dataset automatically.

13.1. Getting Started

../_images/seismic_facies_combined.png

Fig. 13.1.1 Illustration of the seismic data of the Seismic Facies Identification Challenge.

The dataset is available in the class’s cloud directory seismic/data. Provided is the labeled data, i.e., the files data_train.npz and labels_train.npz, which we use for training. Furthermore, two unlabeled files, data_test_1.npz and data_test_2.npz, are available; these were used in the first and second round of the challenge, respectively.

Table 13.1.1 Overview of the Seismic Facies Identification Challenge’s data.

File             | nx  | ny  | nz   | dtype | size (GiB)
-----------------|-----|-----|------|-------|-----------
data_train.npz   | 782 | 590 | 1006 |       |
labels_train.npz |     |     |      |       |
data_test_1.npz  |     |     |      |       |
data_test_2.npz  |     |     |      |       |

Before doing any ML-related work, we’ll explore the dataset to get a good feeling for it.
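
To fill in the missing table entries, it helps to load each archive and report shape, dtype, and size directly. The following sketch assumes the files were copied to a local seismic/data directory; the key names inside the .npz archives are not fixed here, so we simply iterate over whatever keys each archive contains.

```python
import numpy as np

# Hypothetical local path; adjust to wherever you copied seismic/data.
DATA_DIR = "seismic/data"

for name in ("data_train.npz", "labels_train.npz",
             "data_test_1.npz", "data_test_2.npz"):
    with np.load(f"{DATA_DIR}/{name}") as archive:
        for key in archive.files:  # key names may differ between files
            arr = archive[key]
            print(f"{name}[{key}]: shape={arr.shape}, dtype={arr.dtype}, "
                  f"size={arr.nbytes / 2**30:.2f} GiB")
```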

Tasks

  1. Make yourself familiar with the Seismic Facies Identification Challenge. Have a look at the Challenge Starter Kit.

  2. Add the missing entries to Table 13.1.1.

  3. Visualize the data in data_train.npz and labels_train.npz. Slice the data for this purpose. Show slices in all dimensions, i.e., slices normal to the x-direction, y-direction, and z-direction. Produce a plot which shows an xz-slice and a yz-slice at the same time; a plotting sketch follows this list.
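
A minimal plotting sketch for task 3 is given below. It assumes the axis order (z, x, y) and the archive key name "data"; verify both against the output of the inspection sketch above before relying on the slice labels.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed archive key name -- check archive.files if "data" differs.
data = np.load("seismic/data/data_train.npz")["data"]

# Assumed axis order (z, x, y); adapt the indexing below if your
# inspection of the shapes suggests a different layout.
nz, nx, ny = data.shape
iz, ix, iy = nz // 2, nx // 2, ny // 2

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(data[iz, :, :], cmap="gray", aspect="auto")  # normal to z
axes[0].set_title(f"xy-slice, iz={iz}")
axes[1].imshow(data[:, :, iy], cmap="gray", aspect="auto")  # normal to y
axes[1].set_title(f"xz-slice, iy={iy}")
axes[2].imshow(data[:, ix, :], cmap="gray", aspect="auto")  # normal to x
axes[2].set_title(f"yz-slice, ix={ix}")
fig.tight_layout()
fig.savefig("data_train_slices.png")
```

The same slicing applies to labels_train.npz; plotting the labels with a discrete colormap, e.g., cmap="tab10", makes the facies classes easy to distinguish.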

13.2. U-Net Architecture

../_images/unet_all.svg

Fig. 13.2.1 Illustration of the feature counts and spatial sizes in the code frame’s U-Net type architecture.

In 2015, the U-Net architecture was introduced in the context of biomedical image segmentation. Since then, U-Net type networks have been applied to a large range of scientific applications. U-Nets consist of an encoder which quickly reduces the spatial size of the input; typically, max pooling steps are used to halve the image size in every step. At the same time, the convolutional blocks in the encoder increase the number of features.

The final output of the encoder is connected through a convolutional bottleneck block to the decoder. The decoder successively increases the spatial extent of the data through upsampling. The respective convolutional blocks in the decoder decrease the number of features. Further, the encoder is connected to the decoder through skip connections.
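
To make the encoder, bottleneck, and decoder concrete, here is a minimal single-level U-Net sketch in PyTorch. All names, the base feature count, and the number of target classes are illustrative assumptions; the code frame in the class’s cloud share remains the reference implementation.

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions, each followed by batch normalization and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class MiniUNet(nn.Module):
    """U-Net with a single encoder/decoder level around a bottleneck."""
    def __init__(self, in_ch=1, n_classes=6, base=16):  # n_classes is an assumption
        super().__init__()
        self.enc = ConvBlock(in_ch, base)            # keeps spatial size
        self.pool = nn.MaxPool2d(2)                  # halves spatial size
        self.bottleneck = ConvBlock(base, 2 * base)  # doubles the features
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)   # restores spatial size
        self.dec = ConvBlock(2 * base + base, base)  # consumes skip features
        self.head = nn.Conv2d(base, n_classes, 1)    # per-pixel class scores
    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.pool(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))  # skip connection
        return self.head(d)

net = MiniUNet()
print(net(torch.randn(1, 1, 128, 128)).shape)  # torch.Size([1, 6, 128, 128])
```

Note that all four terms from task 2 below (max pooling, bilinear upsampling, skip connections, and batch normalization) appear in this sketch.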

In this task we’ll use U-Net type networks for the classification of seismic images. A working code base is given in the cloud share of the class to get you over the initial hurdles. However, the code base is neither highly optimized for computational performance nor for overall accuracy. Bottom line: this part is freestyle and you should follow your own ideas!

Tasks

  1. Read the paper U-Net: Convolutional Networks for Biomedical Image Segmentation.

  2. Briefly explain the following terms:

    • Max pooling

    • Upsampling using bilinear interpolation

    • Skip connection

    • Batch Normalization

  3. Train a U-Net type network using the data of the Seismic Facies Identification Challenge. Optimize your network! You are free to choose whatever approach sounds suitable to you. Document your ideas and the respective results! A minimal training-loop sketch follows below.
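
To get task 3 started, a minimal training-loop sketch is given below. All names are placeholders and not part of the provided code frame: model could be the MiniUNet from above, and loader a DataLoader yielding batches of 2D sections together with their per-pixel class labels.

```python
import torch
from torch import nn

def train(model, loader, epochs=10, lr=1e-3, device="cpu"):
    """Minimal per-pixel classification training loop (sketch)."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # multi-class loss over all pixels
    for epoch in range(epochs):
        running = 0.0
        for x, y in loader:
            # x: (batch, 1, H, W) float slices; y: (batch, H, W) int64 labels
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            running += loss.item()
        print(f"epoch {epoch}: mean loss {running / len(loader):.4f}")
```

Obvious starting points for your own optimization are deeper U-Nets, data augmentation, learning-rate schedules, and class weighting in the loss.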