#Business #GeoDev

7-Step Knowledge that could Hurl you into the Data Science World!

Takeaway: Data science is best learned by doing, but a good foundation of statistics and machine learning matters too.

I am frequently asked how to learn data mining and data science. Here is my summary.
You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don’t forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of big data.

Here are seven steps for learning data mining and data science. Although they are numbered, you can do them in parallel or in a different order.

  1. Languages: Learn R, Python, and SQL
  2. Tools: Learn how to use data mining and visualization tools
  3. Textbooks: Read introductory textbooks to understand the fundamentals
  4. Education: Watch webinars, take courses and consider a certificate or a degree in data science (Read more in Ben Lorica’s How to Nurture a Data Scientist.)
  5. Data: Explore available data resources and find a dataset that interests you
  6. Competitions: Participate in data mining competitions
  7. Interact: Connect with other data scientists via social networks, groups, and meetings

In this article, I use data mining and data science interchangeably. See my presentation, Analytics Industry Overview, where I look at the evolution and popularity of different terms like statistics, knowledge discovery, data mining, predictive analytics, data science, and big data.
1. Learning Languages

A recent KDnuggets poll found that the most popular languages for data mining are R, Python, and SQL. There are many resources for learning each of them.
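To try SQL and Python together without installing anything beyond Python itself, here is a minimal sketch using the standard-library `sqlite3` module. The `sales` table and its values are hypothetical; the point is only that SQL does the aggregation while Python receives and handles the result rows.

```python
import sqlite3

# In-memory SQLite database with a hypothetical "sales" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 40.0), ("east", 60.0)],
)

# SQL groups and sums; Python gets back a list of tuples.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total, COUNT(*) AS n "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()

for region, total, n in rows:
    print(f"{region}: total={total}, orders={n}")
```

The same query would run unchanged in most SQL databases, which is one reason SQL pairs well with either R or Python.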

2. Tools: Data Mining, Data Science, and Visualization Software

There are many data mining tools for different tasks, but it is best to learn how to use a data mining suite that supports the entire process of data analysis. You can start with open-source (free) tools such as KNIME, RapidMiner, and Weka.

However, many analytics jobs require SAS, the leading and most widely used commercial tool. Other popular analytics and data mining software includes MATLAB, StatSoft STATISTICA, Microsoft SQL Server, Tableau, IBM SPSS Modeler, and Rattle.

Visualization is an essential part of any data analysis. Learn how to use Microsoft Excel (good for many simpler tasks), R graphics (especially ggplot2), and Tableau, an excellent visualization package. Other good visualization tools include TIBCO Spotfire and Miner3D.
3. Textbooks

There are many data mining and data science textbooks available; you can start with these:

4. Education: Webinars, Courses, Certificates and Degrees

You can start by watching some of the many free webinars and webcasts on the latest topics in analytics, big data, data mining, and data science.

There are also many online courses, short and long, and many of them are free. (See the KDnuggets online education directory.)

In particular, check these courses:

Finally, consider earning a certificate in data mining and data science, or an advanced degree such as a master’s degree in data science.
5. Data

You will need data to analyze. See the KDnuggets directory of Datasets for Data Mining.
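Once you have downloaded a dataset, a quick first pass is to check each column for missing values and basic statistics before modeling anything. The sketch below does this with Python's standard library only; the inline CSV and the `profile` helper are hypothetical stand-ins for a real file and a full profiling tool.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice you would open a downloaded CSV file instead.
RAW = """age,income,city
34,52000,Boston
,61000,Denver
45,,Boston
29,48000,
"""

def profile(text):
    """Per column: count missing values; report the mean if numeric, else the number of distinct values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {}
    for col in rows[0].keys():
        values = [r[col] or "" for r in rows]  # DictReader fills short rows with None
        present = [v for v in values if v.strip() != ""]
        info = {"missing": len(values) - len(present)}
        try:
            nums = [float(v) for v in present]
            info["mean"] = statistics.mean(nums) if nums else None
        except ValueError:
            # At least one value is not a number: treat the column as categorical.
            info["distinct"] = len(set(present))
        summary[col] = info
    return summary

for col, info in profile(RAW).items():
    print(col, info)
```

Missing values and unexpected column types found at this stage usually shape the cleaning steps that follow.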

6. Competitions

Again, you will best learn by doing, so participate in Kaggle competitions. Start with beginner competitions, such as Predicting Titanic Survival Using Machine Learning.
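A common first move in a beginner competition is to build a very simple baseline and score it on data held out from training, so later models have something to beat. This sketch learns the most frequent label for each value of a single feature; the records are hypothetical toy data (not taken from the Kaggle Titanic dataset), and `fit_group_majority` is an illustrative name, not a library function.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (feature value, label). NOT real competition data.
train = [("female", 1), ("female", 1), ("female", 0), ("male", 0), ("male", 0), ("male", 1)]
holdout = [("female", 1), ("male", 0), ("male", 1), ("female", 1)]

def fit_group_majority(rows):
    """Baseline model: remember the most common label for each feature value."""
    by_group = defaultdict(Counter)
    for feature, label in rows:
        by_group[feature][label] += 1
    return {group: counts.most_common(1)[0][0] for group, counts in by_group.items()}

def accuracy(model, rows, default=0):
    """Share of rows whose label matches the model's prediction (unseen values get `default`)."""
    hits = sum(model.get(feature, default) == label for feature, label in rows)
    return hits / len(rows)

model = fit_group_majority(train)
print(model)                     # {'female': 1, 'male': 0}
print(accuracy(model, holdout))  # 0.75
```

Scoring on `holdout` rather than on `train` gives a more honest estimate of how a model will do on the data it has not seen.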
7. Interact: Meetings, Groups, and Social Networks

You can join many peer groups. See the Top 30 LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science.

AnalyticBridge is an active community for analytics and data science.

You can attend some of the many Meetings and Conferences on Analytics, Big Data, Data Mining, Data Science, & Knowledge Discovery.

Also, consider joining ACM SIGKDD, which organizes the annual KDD conference – the leading research conference in the field.
Source: Article pulled from KDNuggets.com


A simple Collection of Some Cool GeoTools Useful for GeoGeeks!



Name | Description | User-friendliness | Cost | URL

Generate and collect geodata
Python | Programming language (ideal for scraping) | medium | free | https://www.python.org/
kimonify | Scraping tool | simple | free | http://www.kimonolabs.com/
OutWit Hub | Scraping tool | medium | free | https://www.outwit.com/products/hub/
Tabula | Extract data from .pdf files | simple | free | http://tabula.nerdpower.org/
OpenPaths | GPS tracking for smartphones | simple | free | https://openpaths.cc/
Google Location History | GPS tracking for smartphones | simple | free | https://maps.google.com/locationhistory
GPS-Logger | Device for recording GPS positions | simple | under 100€ | n/a
GPS-Visualizer | Geocoder for small to medium amounts of addresses | simple | free | http://www.gpsvisualizer.com/geocoder/
OpenStreetMap | Exporting OSM data | simple | free | http://overpass-turbo.eu/

Processing, cleaning and exporting geodata
MS Excel | Data processing | simple | cheap | http://www.microsoftstore.com/
MS Access | Storing and processing data | medium | medium | http://www.microsoftstore.com/
Notepad++ | Text editor (works well with large .csv files) | simple | free | http://notepad-plus-plus.org/
CSVed | Editor for .csv files | simple | free | http://csved.sjfrancke.nl/
OpenRefine (formerly Google Refine) | Data processing/cleaning | medium | free | http://openrefine.org/
DataWrangler | Data processing/cleaning | medium | free | http://vis.stanford.edu/wrangler/

Operations and file converting
Mygeodataconverter | Online converter for diverse geodata file formats | simple | free | http://converter.mygeodata.eu/
GeoConverter | Online converter for diverse geodata file formats (incl. WFS) | simple | free | http://geoconverter.hsr.ch/
ShapeEscape | Data converter (.shp to JSON and Fusion Tables) | simple | free | http://www.shpescape.com/
Mapshaper | Geodata simplification (reduces complexity and file size) | simple | free | http://mapshaper.org/
Google Earth | Displaying and creating .kml files | simple | free | http://www.google.de/intl/de/earth/
QGIS | Geographic information system (GIS) | medium | free | http://www.qgis.org/de/site/
ArcGIS Desktop | Geographic information system (GIS) | hard | not free | http://www.esri.de/
PostGIS | Geospatial extension for the PostgreSQL database | hard | free | http://postgis.net/

Visualization and publishing
Mapbox | Create individually styled web maps | simple | free | https://www.mapbox.com/
CartoDB | Create individually styled web maps, host geodata in the cloud | simple | free | http://cartodb.com/
QGIS2LEAF | Export plugin for QGIS (install via the Plugin Manager) | simple | free | https://github.com/Geolicious/qgis2leaf
Leaflet | Create and publish individual web maps (JavaScript) | medium | free | http://leafletjs.com/
OpenLayers | Create and publish individual web maps (JavaScript) | medium | free | http://openlayers.org/
D3 | Multitalented data visualization framework (JavaScript) | hard | free | http://d3js.org/
Tableau | Analyze and visualize (geo)data, then publish to the web | medium | free | http://www.tableausoftware.com/public/
TileMill | Render your own map tiles | medium | free | https://www.mapbox.com/tilemill/
Fusion Tables | Data processing and creating web maps | simple | free | https://support.google.com/fusiontables/answer/2571232
StoryMapJS | Create animated (story) maps | simple | free | http://storymap.knightlab.com/
Odyssey.js | Create animated (story) maps | medium | free | http://cartodb.github.io/odyssey.js/
ColorBrewer | Color advice for maps | simple | free | http://colorbrewer2.org/
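Several tools in the table convert between geodata formats, and web-mapping libraries such as Leaflet and OpenLayers can display GeoJSON. As a minimal sketch of what such a conversion involves, the function below turns CSV rows with latitude/longitude columns into a GeoJSON FeatureCollection using only Python's standard library. The CSV content and the column names `lat`/`lon` are hypothetical, and this is not how any of the listed converters works internally.

```python
import csv
import io
import json

# Hypothetical CSV of point locations (for example, the output of a geocoder).
RAW = """name,lat,lon
Office,52.52,13.405
Cafe,48.1374,11.5755
"""

def csv_to_geojson(text, lat_col="lat", lon_col="lon"):
    """Convert CSV rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        lon = float(row.pop(lon_col))
        lat = float(row.pop(lat_col))
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,  # all remaining columns become feature properties
        })
    return {"type": "FeatureCollection", "features": features}

print(json.dumps(csv_to_geojson(RAW), indent=2))
```

Swapping the coordinate order is a classic mistake when going from spreadsheets (usually "lat, lon") to GeoJSON ("lon, lat"), so it is worth checking the first point on a map.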

Source: Mappable.info
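Mapshaper's "geodata simplification" removes vertices from lines and polygon outlines to shrink file size while keeping the overall shape. Mapshaper offers more than one simplification method; the sketch below uses Ramer-Douglas-Peucker, a classic algorithm, and is not Mapshaper's own code. It assumes planar coordinates, so `tolerance` is in the same units as the coordinates (degrees, if you feed it raw longitude/latitude).

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)  # a and b coincide
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def simplify(points, tolerance):
    """Ramer-Douglas-Peucker: drop vertices closer than `tolerance` to the chord between endpoints."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the line joining the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]  # every interior vertex is within tolerance
    # Otherwise keep that vertex, simplify both halves, and join them without duplicating it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.1), (2, 3), (3, 0.1), (4, 0)]
print(simplify(line, tolerance=1.0))  # [(0, 0), (2, 3), (4, 0)]
```

A larger tolerance removes more vertices; applied naively to polygons that share borders, this per-line approach can open gaps between neighbors, which is one reason dedicated tools handle shared topology specially.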
