Facies Classification with Wavelets and Deep Learning
With the dramatic growth and complexity of seismic data, manual labeling of seismic facies has become a significant challenge. In this talk, we will highlight how applying deep learning and wavelets in MATLAB® can help solve this challenge and provide a starting point to speed up interpretation by geoscientists. You will learn how to:
- Use MATLAB to simplify the application of advanced techniques like wavelets through interactive apps
- Create deep learning models with just a few lines of MATLAB code
- Explore a seismic volume with the Volume Viewer app
- Accelerate algorithms on NVIDIA® GPUs or in the cloud without specialized programming or extensive knowledge of IT infrastructure
Published: 21 Nov 2021
Hello, and welcome to my talk on seismic facies classification with artificial intelligence. My name is Akhilesh Mishra, and I'm a Senior Application Engineer based out of the Plano, Texas office.
So, a little bit of background. Seismic remote sensing is a very popular technique in the oil and gas exploration phase. This exploration is done below the surface of the sea as well as on land.
The way this exploration is done is that we send sound waves into the subsurface from sources. Because of the impedance differences between the layers, reflections come back and are collected by sensors, and we see the contrast between the layers in the final image.
Quantitative interpretation is then done to characterize the reservoir and the rock types. This seismic acquisition is an important step in oil and gas exploration because, ultimately, this is the data that decides where we drill and where we will find oil.
But what are the challenges? The whole process of acquisition, processing, and getting to a final image can take several months to even a year. There is a lot of data to deal with, because surveys span many kilometers. Acquisition over the sea and acquisition over land can have very different properties, and there are many unknowns that need to be handled in processing to get the final image. Adding a bit more to the complexity is the next phase, where interpretation happens.
After spending that much time to get the images, we need to involve an expert, like a geoscientist or a geologist, who goes ahead and does a semantic segmentation of the image, labeling the different regions and the features we are seeing. Otherwise, for a layman like myself, this image means nothing.
Some examples of these features: there could be sedimentary rocks, sand-based sediments deposited by a river, fine-grained sediments, shale deposits, salt deposits, and many other things. Now, this process has a lot of challenges.
I would call out three specific ones. First, it is very time consuming: manually labeling all these details across surveys spanning several kilometers takes a long time.
Reproducibility is an issue because it is hard to reproduce the same results even for one region, and the work cannot really be applied to another region. The same interpreter has to apply their expertise all over again in the second region, and so forth.
And the results can vary from one interpreter to another, because these labels are based purely on the eye and on experience, and we are relying on that. All right, so that is the motivation in the industry today.
AI has started to become very popular. In the past few years, especially since around 2015, people have been developing lots of deep learning and machine learning based models that do this seismic interpretation more analytically and automatically. A couple of papers I found online, such as salt classification using deep learning and 2D seismic facies classification, are heavily cited on Google Scholar, and this area is becoming very popular these days.
However, when I did the literature survey, I saw one common trend: nearly all of these methods, I would say at least 95%, were based on convolutional neural networks (CNNs), which is one of the techniques in deep learning.
Some of the networks that were fairly popular and widely cited were state-of-the-art networks like U-Net and VGGNet. U-Net, by the way, is a very popular segmentation network developed by the medical imaging community, and it has been adopted by the oil and gas community as well. So this is what this network looks like.
I won't go into much detail about the networks in this talk, just because there is a lot of ground to cover. But what it does under the hood, for those of you who might be new to the deep learning world, is take in batches of images, in this case of size 480 by 640 by 3, and pass them through all these different layers. This is, by the way, an encoder-decoder style network, or an autoencoder as it is sometimes called.
Under the hood, these layers are convolutional layers, ReLU layers, and max pooling layers. They transform the image from layer to layer with convolution and filtering operations, all the way to the end, where the output is again of size 480 by 640 but now with a channel for each facies we are classifying.
In this case, for this network, it was six facies, so the output is 480 by 640 by 6. The input, the raw seismic image, gives us an output image that contains the labels for all six classes and has the same spatial size.
It is almost like a segmentation mask you can overlay on the image. So this is what we have here: this would be the input, and the final result would be this colored, segmented image, which can be overlaid on the original input after passing through the network.
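As a rough illustration, not the exact network from any of these papers, here is a minimal sketch of how such an encoder-decoder segmentation network could be set up in MATLAB, assuming the Deep Learning and Computer Vision Toolboxes; imdsTrain and pxdsTrain are hypothetical image and pixel-label datastores:

```matlab
% Minimal sketch (not the exact network from the cited papers): build a
% U-Net style encoder-decoder for semantic segmentation of seismic images.
% imdsTrain/pxdsTrain are hypothetical placeholders for your own data.
imageSize  = [480 640 3];   % input patch size used in this example
numClasses = 6;             % six facies classes

lgraph = unetLayers(imageSize, numClasses);   % encoder-decoder (U-Net)

opts = trainingOptions('adam', ...
    'MaxEpochs', 20, ...
    'MiniBatchSize', 8, ...
    'Plots', 'training-progress');

% Combine images and pixel labels, then train.
trainingData = combine(imdsTrain, pxdsTrain);
net = trainNetwork(trainingData, lgraph, opts);
```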
Now, there are a couple of challenges associated with this semantic segmentation approach. It is very popular, because deep learning technology itself has largely evolved around images, so there is a lot of state-of-the-art literature and there are many models out there that can be leveraged for training. But I would still call out that these papers do have limitations, one of which is that the overall accuracy is low.
For a particular survey region the accuracy can be higher, but when the algorithms are applied to a data set captured elsewhere, the accuracy starts to deteriorate. That is what we call overfitting in AI, so a certain degree of overfitting to the images is happening all the time.
Another challenge we saw in a few papers is the input image size: how we create the patches, in this case patches of size 456 by 944, impacts the prediction accuracy.
There is no real analytical reasoning for why and how this happens, but depending on the network, say a U-Net with a fixed set of layers underneath, it can behave differently for different patch sizes. The models are not data agnostic.
As I mentioned before, a model can be overfitted to one region. The moment we take surveys, say, three kilometers away from the region whose data we trained on, it may or may not work, and very likely the accuracy starts to fall.
And the most important thing I would like to mention here is that these features are all image based. But if we inspect the data itself, it is all signals. When I say it is all signals, here is a screenshot of a very zoomed-in portion of one such area.
I have overlaid it with the signals: these 1D signals, also referred to as seismic traces or shots, are what constitute all these 2D images. An important thing to note here is that at the different interfaces between facies, the signals change quite a bit; they have different properties and different frequency content. It is these slowly time-varying changes at the interfaces that make the layers look unique.
If there is a way we can leverage that, our approach to building a deep learning based model for facies identification would be more analytical and more scientific. And remember what I said earlier: these small frequency changes across the different facies happen because there is a difference in impedance between the different rocks and layers below the surface of the Earth. That is what causes these small transients, if you would call them that, and that is why each layer looks unique.
So that was our motivation. To overcome those challenges, we came up with a solution that is purely signal based. We developed it as part of a hackathon competition we participated in, organized by SEAM AI. In this competition we were given the Parihaka data set, a subsurface survey done in New Zealand, and this data is all available in the public domain.
We were given some labeled images and asked to build an AI model to do the classification automatically. Our solution was to use a recurrent neural network (RNN) instead of a CNN. For those of you who might be new to this, a recurrent neural network is a neural network that can take sequence data as input and takes care of the time dependency in the data.
We feed one sample at a time as we go down in depth below the surface of the Earth, and then we can move to the next trace, and the one after that, and so forth. So it is an RNN approach with some wavelet pre-processing, which we will talk about right now.
Ideally, since the RNN can take sequence data, I should be able to feed the raw, processed seismic signals I have directly into an RNN-based deep learning network like this one.
We are using gated recurrent units (GRUs) as the RNN units. The final output is a classification layer, which, for each sample, gives me one of the six classes I need to identify.
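As a rough illustration of that idea, here is a minimal sketch of a sequence-to-sequence network built around GRU layers; the hidden size is illustrative and not the exact configuration we used:

```matlab
% Minimal sketch of the "raw trace in, label per sample out" idea: a
% sequence-to-sequence RNN using GRU layers. Layer sizes are illustrative,
% not the exact configuration used in the talk.
numFeatures = 1;    % one amplitude value per depth sample
numHidden   = 100;  % hypothetical hidden size
numClasses  = 6;    % six facies

layers = [
    sequenceInputLayer(numFeatures)
    gruLayer(numHidden, 'OutputMode', 'sequence')   % keep a label per time step
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```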
But the problem is that this network never learns. This is the training progress curve, and you will see that it keeps oscillating up and down. If I inspect the confusion matrix, I see that across all six classes the predicted class does not match the true class; it is all over the place, and the results are very poor.
There are a variety of reasons for this, but the most important one is that these signals are quite long in time, while the differences between two facies are small transient effects in the signal that occur over a very short duration. For learnable layers like GRU layers, capturing those subtle nuances happening over such a short duration is really very challenging.
That is what brought us to realize that some pre-processing is needed. By the way, this is one of our tools in MATLAB, the Volume Viewer, which lets you view the entire volume with labels superimposed on it. But going back to our previous image, we want to isolate these small transient effects happening between the different layers.
What we thought is that if we can isolate these events into different channels, decomposing the signal so that all these events are captured separately, then we might be able to get a better RNN network. To do that, we had a couple of options.
One was simple band-pass filtering, where we divide the signal into various sub-bands and work with those. But because these events are so short in time, the resolution of the data we were getting from that decomposition was very poor; the FFT-based methods were essentially not able to capture these small transient effects with much fidelity.
So resolution was a big problem, and that is what made us look into wavelet analysis. In wavelet analysis there is a technique called the discrete wavelet transform, and we can use it to split the signal into finer and finer sub-bands and thereby decompose the signal into multiple channels.
What this does is take a particular wavelet, and there are many discrete wavelets to choose from, like the Fejér-Korovkin (fk), Symlet (sym), and Daubechies wavelets, and split the signal into an upper band and a lower band. If the sampling frequency is FS, the upper band covers FS/4 to FS/2 and the lower band covers 0 to FS/4. The lower band can then be further subdivided into FS/8 to FS/4 and 0 to FS/8, then again into FS/16 to FS/8 and 0 to FS/16, and so forth.
You can control how many levels you want to divide your signal into. This is done pretty easily in MATLAB using an app called the Signal Multiresolution Analyzer, which lets you choose among the different wavelets.
You can iteratively experiment with all these different wavelets. What I chose here is the fk14 wavelet, split over four levels, and I see that at these different levels some components are getting isolated from the main signal.
They show up in the several sub-bands, and this is exactly what I was looking to do. We did have to iterate through all these wavelets, but in the app it was very easy because it is an interactive process.
You can choose between different levels and different wavelets and see what visually suits best. What we ended up doing is splitting the signal into five channels using the fk14 wavelet, and with these five sub-bands of each signal in hand, we moved to the next step.
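The app can also generate equivalent MATLAB code for you. As a minimal sketch of the same decomposition at the command line, assuming the Wavelet Toolbox, a 4-level decomposition gives four detail bands plus one approximation, which is where the five channels come from:

```matlab
% Minimal sketch of the multiresolution split (assumes Wavelet Toolbox).
% A 4-level decomposition of a trace gives 4 detail bands plus 1
% approximation band, i.e. the five channels mentioned above.
trace = randn(1006, 1);                 % placeholder for one seismic trace
wt    = modwt(trace, 'fk14', 4);        % maximal-overlap DWT, fk14 wavelet
mraChannels = modwtmra(wt, 'fk14');     % 5-by-1006: one row per sub-band
```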
So now each signal is split into five sub-bands, and by the way, the length of each signal for us was 1,006 samples. We started building RNNs with different techniques like LSTMs and GRUs, trying them one at a time.
With some iteration in the training, what we realized is that it is important not only to split the signal into the five sub-bands, which we had done using the wavelet transform, but also to capture the spatial correlation.
When I say spatial correlation, what I mean is this: all these traces are taken on a 2D grid in x and y, which represents the region of our survey. We had been treating one signal at a time, but the literature told us that seismic features do not change within about 10 feet of a location. So within 10 feet in the x direction and 10 feet in the y direction, the seismic features are usually more or less consistent and the same.
We thought this might actually help because our x and y resolution is much finer; the spacing was approximately three meters between traces. So we took 3-by-3 traces in the x and y directions, clubbed them together, and let the network learn the spatial correlation in x and y across these traces.
So we ended up with a 1,006-by-3-by-3-by-5 input for each trace location, with one consistent label for that small block of the grid. We had 782-by-590 traces in total, and that is how we grouped the data into 1,006-by-3-by-3-by-5 blocks.
The 3-by-3-by-5 combination became 45 features. In the deep learning network we constructed with a biLSTM layer, we feed in those 45 features at a time and progressively go down the 1,006 samples of one trace.
Then, for the next trace, we do 45 features again and go down the 1,006 samples. As you can see, the output size of our final layer was six, corresponding to a score for each class. So 45 inputs give me six scores, one for each of the six classes.
I then choose the class with the maximum score. That is how we architected our network after a few iterations. One thing I will point out is that in MATLAB this was quite easy, because we had the Deep Network Designer app to iterate through different network architectures, and the Signal Multiresolution Analyzer app, which we saw on the previous slide, to iterate through which wavelet might work best.
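Putting those pieces together, here is a minimal sketch of the kind of sequence network and data layout described above; the hidden size and variable names are illustrative, not our exact configuration:

```matlab
% Minimal sketch of the final architecture described above. The 3x3 spatial
% neighborhood times 5 wavelet channels is flattened into 45 features per
% depth sample; the hidden size and variable names are illustrative.
numFeatures = 45;        % 3 x 3 traces x 5 wavelet sub-bands
numHidden   = 150;       % hypothetical hidden size
numClasses  = 6;

layers = [
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHidden, 'OutputMode', 'sequence')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

% One training observation: flatten a 1006-by-3-by-3-by-5 block into a
% 45-by-1006 sequence (features-by-time), the layout sequence layers expect.
block = randn(1006, 3, 3, 5);                    % placeholder data
seq   = reshape(permute(block, [2 3 4 1]), 45, 1006);
```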
With the combination of both, we could train multiple models, check the accuracy, and gradually improve by making a few subtle changes. The spatial correlation was a big change, because it helped the layers understand how x and y are correlated and how one region differs from other regions far away.
All right, so now we have this network. But we created a problem for ourselves: by combining the data in this fashion, we ended up with almost 200 gigabytes of data. We solved that by training on NVIDIA GPU Cloud, also called NGC, and we have a MATLAB container image directly on NGC. So that is a very powerful machine with a lot of RAM.
The GPUs there are very powerful, so you can leverage them for your training directly out of the box. MATLAB also supports AWS.
If you would like to train your algorithms on Amazon Web Services, we have a reference architecture image there as well, and you can leverage the most powerful GPUs available there.
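Whichever environment you use, NGC or AWS, the training call itself does not change. Here is a minimal sketch, assuming a GPU is visible to MATLAB; XTrain, YTrain, and layers are placeholders for your own sequences, labels, and network:

```matlab
% Minimal sketch: the same trainNetwork call runs on a local or cloud GPU.
% 'ExecutionEnvironment' picks the hardware; no other code changes needed.
opts = trainingOptions('adam', ...
    'ExecutionEnvironment', 'gpu', ...   % or 'multi-gpu' on a multi-GPU VM
    'MaxEpochs', 30, ...
    'MiniBatchSize', 256, ...
    'Plots', 'training-progress');
net = trainNetwork(XTrain, YTrain, layers, opts);  % XTrain/YTrain: your sequences/labels
```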
The results we saw on the validation data are what I have on the screen: you can see that the actual labels match the predicted labels very closely. This is the result we submitted, and even on the test data, which the network had never seen before, we got fairly accurate results.
For the final solution we submitted, we got an overall validation score of about 93%. As I mentioned before, we used the NGC compute, so it took us only three hours to train this model there. We can also run prediction on a GPU device, which I will show you in the next slide.
Prediction took only two to three minutes for around 1,000 traces, so it is a very fast algorithm. The score we submitted, the overall weighted F1 score, came out to be 0.76 in the competition.
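For reference, here is one way a class-weighted F1 score can be computed in MATLAB from true and predicted labels; this is just to illustrate the metric, not the competition's exact scoring code:

```matlab
% Illustrative sketch of a class-weighted F1 score (not the competition's
% exact scoring code). yTrue/yPred are categorical label vectors.
C = confusionmat(yTrue, yPred);          % rows = true class, cols = predicted
tp        = diag(C);
precision = tp ./ max(sum(C, 1)', 1);    % per-class precision
recall    = tp ./ max(sum(C, 2),  1);    % per-class recall
f1        = 2 * precision .* recall ./ max(precision + recall, eps);
support   = sum(C, 2);                   % samples per true class
weightedF1 = sum(f1 .* support) / sum(support);
```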
There were other teams who participated, but you can see the stark difference: our approach was an RNN, and everybody else was doing a CNN.
You can see that we did much better than the other teams in the competition. That being said, I do want to touch on the last part. Since we have a prediction time of two to three minutes for 1,000 traces, the idea is: what if we deploy this kind of algorithm on an edge device and run it as the data is being acquired, saving us all that processing time?
MATLAB does have the capability to deploy any algorithm, be it signal processing, controls, or AI, on multiple edge devices: CPUs like microcontrollers, embedded GPUs from NVIDIA, Intel processors, and programmable logic controllers (PLCs). If you are doing controls work, you can deploy directly onto a PLC, and also onto FPGA boards.
We support multiplatform deployment, and we support a lot of different off-the-shelf hardware like ARM Cortex microcontrollers, NVIDIA Jetson, and so on. So this whole process could be packaged as a function and deployed on an edge device, on the data acquisition device itself.
This is what a prototype function would look like. As the data comes in, say the system receives a 15-by-15 block of traces, each of length 1,006, it would do the wavelet processing for that data set.
It would load the trained deep learning network, and this is the place where we call the classify command on the network with the 1,006-by-45 data to get our output labels. Those labels are the output. So an edge device like an NVIDIA GPU would take in all this data and give us the labels for it, which could then be saved to disk or a server, saving you all of that post-processing or interpretation phase. It all happens automatically.
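As a sketch of what such a packaged prediction function might look like, with all names hypothetical, the wavelet preprocessing and the classify call could be wrapped together like this:

```matlab
function labels = classifyFaciesBlock(block)
% Sketch of a deployable prediction function (all names are hypothetical).
% block is a 1006-by-3-by-3 array of raw traces for one spatial cell.
persistent net
if isempty(net)
    % coder.loadDeepLearningNetwork also works for code generation targets
    net = coder.loadDeepLearningNetwork('trainedFaciesNet.mat');
end
channels = zeros(1006, 3, 3, 5);
for i = 1:3
    for j = 1:3
        % Decompose each trace into 5 wavelet sub-bands (fk14, 4 levels)
        wt       = modwt(block(:, i, j), 'fk14', 4);
        mraBands = modwtmra(wt, 'fk14');                 % 5-by-1006
        channels(:, i, j, :) = reshape(mraBands.', [1006 1 1 5]);
    end
end
% Flatten the 3x3x5 neighborhood into 45 features per depth sample
seq    = reshape(permute(channels, [2 3 4 1]), 45, 1006);
labels = classify(net, seq);   % one facies label per depth sample
end
```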
I did a thorough speed-up comparison of GPU versus CPU. We saw that it was 50 times faster for running a 15-by-15 block of traces. If you see over here, I took just a 15-by-15 segment.
It was 50 times faster with the GPU deployment versus what I was getting on the CPU. We can leverage tools like GPU Coder and Embedded Coder to deploy directly onto an NVIDIA Jetson platform and put this algorithm, or a similar one doing shot processing, migration models, and so on, packaged as a function on an edge device, so that we save all that post-processing time.
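As a sketch of that step, assuming GPU Coder and reusing the hypothetical function from above, code generation could look like this:

```matlab
% Sketch: generate CUDA code for the hypothetical classifyFaciesBlock
% function above with GPU Coder (assumes GPU Coder and a CUDA-capable GPU).
cfg = coder.gpuConfig('mex');                                % or 'lib'/'exe' for Jetson
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');  % target the cuDNN library
codegen -config cfg classifyFaciesBlock -args {zeros(1006,3,3)}
```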
These are the different tools I have highlighted if you are interested in doing deployment. I do want to emphasize that all models you develop in MATLAB and Simulink can be deployed on embedded edge devices, including this seismic processing model.
Additionally, if you are working with an enterprise-level deployment, such as a cloud-based system, we have the tools for that as well. The main tool here would be MATLAB Production Server. You can deploy to a cloud platform, develop dashboards, deploy your own container image, and so on, and put it into large-scale production.
All right, a quick recap, as this is almost my final slide. What we saw today is that we were able to build a very complex algorithm with almost a low-code or no-code approach: with a minimal amount of coding we iterated in the apps, using wavelet analysis and the Deep Network Designer for deep learning modeling and training, going back and forth to come up with a very robust model.
Additionally, handling big data, 200 gigabytes, was very easy because we already have container images on AWS and NGC, which we can leverage directly out of the box. And last but not least, we have the tools that convert your algorithms directly into code that can be put onto an edge computing deployment. A lot of the targets are supported directly out of the box, which allows you to do things like processor-in-the-loop or FPGA-in-the-loop prototyping and testing.
For those targets that are not, you can still generate the C code, C++ code, CUDA code, or HDL code, like Verilog or VHDL, and take it to other platforms. All right, so for everything we did, I did not show you the code today, but this is the link.
You can go online. We have a blog article on the entire workflow. And it has all the code as well.
You will see that the code used to do this training is quite minimal. Last but not least, with MathWorks you have access to a lot of resources: we have consulting, and we have trainings.
We have guided evaluations and technical support. On the training side specifically, if you are new to wavelets, we have a one-day course on wavelets, which goes in depth into using the different wavelets and covers the theory of wavelets, the discrete wavelet transform, and the continuous wavelet transform. We believe that when working with real-world signals you can gain a lot by using wavelet analysis in general.
There is also a wavelet Tech Talk video series; I provided a link over here, so feel free to check it out online.
There are also deep learning trainings if you are interested in taking those; we have the links over here. There is a free two-hour tutorial course, and we also have a 16-hour in-depth training, which is instructor led, that you can leverage.
All right, that is all I had today. Thank you very much for attending this talk. I can take questions at this time, and please feel free to add me on LinkedIn.
You can find me on LinkedIn as Akhilesh Mishra, MathWorks. Feel free to also email me if you have any additional questions. At this moment, I will open the floor to questions. Thank you very much.