How the Danish government used AI on satellite data to identify slurry tanks
For the past few years, I have heard a lot about the potential of Deep Learning in the geospatial industry to automate the analysis and interpretation of raster data. There is a lot of research and some promising startups in that space, but not many large-scale success stories of AI in production come to my mind, so every such example catches my attention.
An interesting project was delivered at the end of last year using the Picterra platform. This Switzerland-based startup offers a cloud-based geospatial tool that enables users to train Deep Learning models on satellite and aerial imagery without writing a single line of code. Picterra was used by the Danish agricultural advisory institute SEGES, which had been tasked by the government with measuring the country’s total ammonia emissions.
Ammonia emissions come mostly from so-called slurry tanks, large circular concrete structures where farmers collect their animal waste. Emissions are lower for tanks that are covered. As there was no data available on the presence of slurry tanks, and counting them manually on 34,000 farms would take months, it seemed like a perfect use case for object detection.
SEGES had access to two critical data sources perfect for Deep Learning applications:
- A WMS imagery server covering the whole of Denmark at 25 cm spatial resolution (1 TB of data)
- Centroids of the 34,000 farms to be investigated
The first one was a great source of data with consistent quality over the entire country. The second one could be used to crop the whole dataset to areas of interest rather than running the analytics over the entire dataset. The data was plugged into Picterra, and the training datasets were created.
Interestingly, Picterra builds its platform around the concept of low-shot learning, which aims to deliver good results with a small amount of labelled training data. The platform has a set of pre-trained models, training data augmentation workflows and a well-structured GPU architecture that allows you to apply transfer learning effectively and quickly build new classes of objects to be detected on top of existing models. Once you have even a few objects of each class labelled, you don’t need to wait for hours to test the model; you get the results within minutes and can judge how much more training data is still required.
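To make the cropping idea concrete, here is a minimal sketch of how a square window around each farm centroid could be derived. It is only an illustration: I am assuming centroids in a projected coordinate system with metre units, and the 250 m half-size, the sample coordinates and the function names are all made up, not details from the SEGES project.

```python
from typing import NamedTuple

RESOLUTION_M = 0.25  # 25 cm per pixel, as stated for the WMS imagery


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def bbox_around(x: float, y: float, half_size_m: float) -> BBox:
    """Square window centred on a centroid (projected coordinates, metres)."""
    return BBox(x - half_size_m, y - half_size_m,
                x + half_size_m, y + half_size_m)


def size_in_pixels(box: BBox, resolution_m: float = RESOLUTION_M) -> tuple[int, int]:
    """Width and height of the window in pixels at the given resolution."""
    width = round((box.max_x - box.min_x) / resolution_m)
    height = round((box.max_y - box.min_y) / resolution_m)
    return width, height


# Hypothetical centroid and window size, for illustration only.
box = bbox_around(x=500_000.0, y=6_200_000.0, half_size_m=250.0)
print(box, size_in_pixels(box))
```

A window like this is what you would pass as the bounding box of an image request, so only the pixels around each farm are processed instead of the full 1 TB mosaic.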
FYI: if you have ever played with Deep Learning without a fully automated data pipeline, you will understand how painful and time-consuming the process of labelling, training and testing models really is.
For this job, SEGES prepared labels for just 56 slurry tanks, with two classes of objects representing covered and uncovered reservoirs. Based on this input, the engine detected about 26,000 slurry tanks with high accuracy (recall > 90% and precision > 85%). Based on this data, a heatmap of emissions was created.
The project is a great example showing that Deep Learning is already changing how geoscience is done. Until recently, the entry barrier to applying neural networks in our geoanalytics workflows was too high. It required an expensive GPU server setup, data scientists able to develop data pipelines in a geospatial environment, and tons of training data. With projects like Picterra, Deep Learning has started to become accessible to the geospatial community… and the folks from Picterra have just released a QGIS plugin to make it even easier for all of us.
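If recall and precision are new to you, this small sketch shows how the two numbers are computed from a validation count. The counts below are invented purely to illustrate the formulas; they are not figures from the SEGES project.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of detections that are real objects."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of real objects that were detected."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Hypothetical validation counts, NOT figures from the SEGES project.
tp, fp, fn = 180, 20, 15
print(f"precision = {precision(tp, fp):.2f}")
print(f"recall    = {recall(tp, fn):.2f}")
```

In other words, recall tells you how many of the real tanks were found, while precision tells you how many of the detections were actually tanks.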
Play around with the plugin and let me know your thoughts in the comments below.
How to map the impact of COVID-19 on your neighborhood using machine learning and satellite data
Life amidst the COVID-19 pandemic has been somewhat surreal. Though I have been fortunate enough to be able to work from home during these unusual times, certain aspects of sheltering-in-place have felt palpably more bizarre – such as listening to birds instead of the car traffic during the morning ‘rush hour’.
Wanting to capture this strange moment in time, I set out to find a way to measure the impact of the novel coronavirus on my neighbourhood. Having already done a fair bit of research on how COVID-19 was influencing mobility patterns all over the world, I knew there were a few good options.
From the geospatial science perspective, the most generic way to model it would be to measure the spatial aspect of the impact based on ground sensors such as mobile phones. Unfortunately, such datasets are reserved for the big players in the industry such as Google, Facebook, TomTom, HERE, or Mapbox. Satellite data, on the other hand, is much more accessible (even in high-res) if you know where to look.
I decided to analyze the change in the number of cars detected in an area as my unit of measurement. It’s not a perfect metric, but in principle, it should give fairly reasonable results for a larger urban area.
Now I had to find access to satellite data and object detection algorithms that I would be able to deploy quickly and effectively. In the past, I had played around with the ArcGIS Pro Deep Learning toolbox, but my 60-day trial expired, and buying a subscription is far too expensive for an individual. Also, I was looking to run a DIY geospatial experiment over a weekend using something that had the complexity I needed while still being accessible.
After a bit more searching around, I stumbled upon UP42, which we have covered a few times on Geoawesomeness. The startup is a marketplace for geospatial data and dozens of analytical algorithms, where you simply combine blocks of data sources and algorithms and run the analytics directly in the cloud.
One of the pre-built algorithms I found is the “Small Vehicle Detection” model, which was exactly what I was looking for. Additionally, UP42 gives you some free credits to get started, which should let you analyze at least a few square kilometres’ worth of satellite data. Now I was good to go!
I selected an area of around 1 sq km around the temporarily closed Mall of America in Bloomington, Minnesota, as my test location. It took me around 15 minutes to set up the “blocks” on the UP42 platform, about 1 hour of data processing in the cloud, and another 15 minutes to analyze the results.
Overall, I learned that year-to-year (April 16, 2019, to April 19, 2020) the number of vehicles parked and driving around the area dropped roughly threefold!
To make the analysis scientifically correct, I should be looking at a longer observation period to account for different hourly and weekday patterns. However, I believe that even data from a simple model is better than no data at all! So check out the following simple and straightforward method to measure the geospatial impact of COVID-19 in a neighbourhood — for free.
This is how you can do it yourself:
1. Create an account on UP42 and you will receive 10,000 free credits that you can use on the platform
2. Go to Console and create a new project
3. Click on Catalog and select your area of interest
4. Select the satellite scenes you want to use in the analysis. I chose to use imagery from the Pléiades high-resolution optical Earth-imaging satellites
5. Go to View Parameters to find the coordinates of your bounding box and the ID of the selected satellite imagery. Keep the “Catalog” open
6. Now, you need to go back to Console and create a workflow for the data analytics
7. In this step, you should select the satellite system you want to use
8. It’s time to select the analytics algorithms now
9. Phew! The workflow is now complete and you can move on to the more complex (and interesting) part
10. Save and configure your job. I know that it may look a bit scary to non-developers, but it’s actually quite easy, so stay with me…
11. Draw any simple polygon in the map window on the right side of the console. You’ll notice that the “coordinates” parameter in the console has changed
12. Now go back to the Catalog view (from step 5) and use it to update two fields: “coordinates” and “ids” of the satellite images. Be careful to paste the text inside the correct brackets. You’ll need to rerun the process for both satellite scenes separately.
13. Now, you can run the job. First run it as a “Test Query” and then, if everything goes well, as a “Live Job”
14. Download the output GeoJSON and GeoTIFF, open them in QGIS to view and analyze the results, and use Excel (after some JSON format edits in Notepad++) to count the detected vehicles.
When looking at the data in QGIS, I noticed some errors, such as far too many cars being detected:
or some cars mixed with rooftop infrastructure:
These cases, however, had quite low confidence levels; fortunately, confidence is a parameter of every detected object. The average confidence level across all 3001 objects in my datasets was 0.25, so I decided to filter out objects with a confidence level below 0.5. I was left with 327 vehicles detected on April 19, 2019 and 109 on April 16, 2020, which is a 3x drop!
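If you would rather skip the Notepad++ and Excel detour, the same filtering and counting can be sketched in a few lines of Python using only the standard library. Treat this as a rough illustration: the property name “confidence” and the tiny inline sample are my assumptions, so check the property names in your own GeoJSON output before relying on it.

```python
import json

# Tiny stand-in for the downloaded detections file; the real output has
# thousands of features. The "confidence" property name is an assumption.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": null, "properties": {"confidence": 0.91}},
    {"type": "Feature", "geometry": null, "properties": {"confidence": 0.12}},
    {"type": "Feature", "geometry": null, "properties": {"confidence": 0.50}}
  ]
}
"""


def count_confident(collection: dict, threshold: float = 0.5,
                    key: str = "confidence") -> int:
    """Count features whose confidence is at or above the threshold."""
    count = 0
    for feature in collection.get("features", []):
        properties = feature.get("properties") or {}
        if properties.get(key, 0.0) >= threshold:
            count += 1
    return count


def drop_factor(before: int, after: int) -> float:
    """How many times fewer objects were detected in the later scene."""
    return before / after


detections = json.loads(SAMPLE_GEOJSON)
print(count_confident(detections))   # features with confidence >= 0.5
print(drop_factor(327, 109))         # the two counts reported above
```

To use it on real data, replace the inline sample with `json.load()` on each downloaded file and compare the two counts.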
The whole experiment took me about an hour + some idle cloud processing time and I ended up using only about 3,000 of my 10,000 free credits. Pretty awesome, right?
I hope you found this tutorial useful. Let me know in the comments!
P.S. There are some useful tutorials on the UP42 YouTube channel to help you get started with the user interface and Catalog Search.