Gathering the right data is one of the biggest challenges that Data Scientists face today. Not only is there a wide variety of data sources to sift through, but a huge amount of time and energy also goes into cleaning and preparing the data for analysis. In the geospatial industry, it is estimated that spatial data science teams are able to spend only 20% of their time on actual analysis, modeling, and communication of results.
When location intelligence platform CARTO built its Data Observatory, the chief idea was to create an up-to-date index of location data. The recently released Data Observatory 2.0 takes that vision forward, providing Data Scientists with a scalable platform full of rich data in the format they actually need. CARTO is now hosting geospatial datasets on Google Cloud’s BigQuery public datasets program.
“We have come up with a smart metadata system that registers thousands of datasets, all of which are spatially indexed and fully cataloged for the exploration of variables and geographies,” says Javier de la Torre, founder and Chief Strategy Officer of CARTO, adding that adopting a modern, cloud-based approach to data pipelines will deliver several benefits to users.
The biggest advantage is the separation of computation from storage. According to CARTO, analysts can push all the data they want to BigQuery, but they pay only when they run analysis on that data. “This is important for any organization serving lots of spatial data because it balances the business model. Now, the data provider doesn’t face a huge bill every month, but the cost is distributed to the users of it, and whoever uses it more, pays more. It’s a big win for spatial data infrastructure business models,” Javier points out.
Other benefits include access to a fully scalable infrastructure without setting up any servers, and the ability to decide which users, inside or outside of the organization, can access your dataset without having to duplicate it. “This set of functionalities means that our Data Observatory is probably the most cost-effective spatial data infrastructure and possibly also the most advanced,” says Javier.
You can read more about collaborating on public datasets here.