Building an Open Collection of Historical Maps Together: Discover the Wikimaps Project
The biggest challenge of our times regarding geographic information is to process and organize a massive amount of data coming from various sources, spanning different dates and, above all, different formats. Big data is about processing a huge and diverse volume of information with high velocity and high relevance, which is pretty much what Google has succeeded in doing with the internet so far.
How about geographical information? How about the huge number of maps, data points, places and layers of thematic information that villages, cities, private corporations, land authorities, NGOs and other organizations have put together over the centuries? How about this extremely rare and unique knowledge that sits in everybody's basement without anyone even knowing it is there?
Galleries, libraries, archives and museums (the GLAMs) are the keepers of this unsuspected information, for the common good. How about we – you and me, the average citizen – succeed in digging out, assembling, standardizing and classifying this dusty pile of historical maps and place-related data, and turn it into an open, free and smart online catalogue?
That is the goal and ambition behind Wikimaps, the Wikimedia project to collect historical maps. It might look simple, but there is a serious technical challenge behind it. Every company in the world is currently struggling to keep velocity and relevance up while collecting enormous volumes of data every minute. Every company is trying to extract the highest added value from the mess (and masses) of data to improve its products and open new ways of consuming and informing. The Wikimaps project is more knowledge-oriented than business-minded, but it is one more great initiative that will definitely help collect and organize our geographical legacy.
See also http://lincolnmullen.com/blog/the-spread-of-american-slavery/ and this interesting slideshow to learn more about the project:
Wikimaps is a project where Wikimedians work together with GLAMs (Galleries, Libraries, Archives and Museums) to collect old maps in Wikimedia Commons. The project will offer tools for georeferencing the maps via crowdsourcing, and make them available for historical mapping with the OpenHistoricalMap project. The information gathered from the maps will contribute to a historical gazetteer, a place name index across times.
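To make the idea of a historical gazetteer more concrete, here is a minimal sketch of a place name index across time. It is only an illustration under stated assumptions: the class names (HistoricalGazetteer, NameRecord), the place id "spb" and the half-open year ranges are hypothetical choices for this example, not the data model actually used by Wikimaps or OpenHistoricalMap. The key idea is that one stable place keeps a single location while its names change, as Saint Petersburg did when it became Petrograd (1914) and Leningrad (1924) before regaining its original name (1991).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NameRecord:
    """One attested name for a place, valid from start_year up to (not including) end_year."""
    name: str
    start_year: int
    end_year: Optional[int] = None  # None: the name is still in use

    def covers(self, year: int) -> bool:
        return self.start_year <= year and (self.end_year is None or year < self.end_year)


class HistoricalGazetteer:
    """Hypothetical toy index linking a stable place id to a location and its names over time."""

    def __init__(self) -> None:
        self._coords: Dict[str, Tuple[float, float]] = {}
        self._names: Dict[str, List[NameRecord]] = {}

    def add_place(self, place_id: str, lon: float, lat: float) -> None:
        self._coords[place_id] = (lon, lat)
        self._names.setdefault(place_id, [])

    def add_name(self, place_id: str, record: NameRecord) -> None:
        self._names[place_id].append(record)

    def names_in(self, place_id: str, year: int) -> List[str]:
        """All names attested for the place in the given year (several may coexist)."""
        return [r.name for r in self._names.get(place_id, []) if r.covers(year)]

    def find(self, name: str, year: int) -> List[str]:
        """Ids of places that were called `name` in the given year."""
        return [pid for pid, recs in self._names.items()
                if any(r.name == name and r.covers(year) for r in recs)]

    def location(self, place_id: str) -> Tuple[float, float]:
        return self._coords[place_id]


# Example: one place with four successive names.
g = HistoricalGazetteer()
g.add_place("spb", 30.31, 59.94)  # approximate lon/lat of Saint Petersburg
for rec in (NameRecord("Saint Petersburg", 1703, 1914),
            NameRecord("Petrograd", 1914, 1924),
            NameRecord("Leningrad", 1924, 1991),
            NameRecord("Saint Petersburg", 1991)):
    g.add_name("spb", rec)

print(g.names_in("spb", 1950))  # ['Leningrad']
print(g.find("Petrograd", 1920))  # ['spb']
```

Keeping names as separate time-stamped records, rather than overwriting a single name field, is what lets a query such as "what was this place called in 1920?" be answered.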
Currently an enormous number of maps are locked away in archives. They have not been digitized, they sit behind cumbersome user interfaces, or copyright restrictions limit their use. In most cases a digitized map is just an image that does not align with real-world coordinates. The Wikimaps project hopes to change that!
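Aligning a scanned map image with real-world coordinates is called georeferencing. A common simple approach is to pick ground control points (pixels whose longitude and latitude are known) and fit an affine transformation to them by least squares. The following sketch rests on several assumptions: the Wikimaps tools may well use other transformations (for example polynomial warps or thin-plate splines), the function names fit_affine and pixel_to_lonlat are hypothetical, and the control points are made up. Treating longitude and latitude as flat x/y coordinates is only reasonable for small areas.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or otherwise degenerate")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_affine(pixels: Sequence[Point], coords: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Least-squares affine fit: lon = a*x + b*y + c and lat = d*x + e*y + f."""
    if len(pixels) != len(coords) or len(pixels) < 3:
        raise ValueError("need at least 3 matching control points")
    rows = [(float(x), float(y), 1.0) for x, y in pixels]
    # Normal equations (A^T A) p = A^T t, solved once for lon and once for lat.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):  # k = 0: longitude, k = 1: latitude
        atb = [sum(r[i] * c[k] for r, c in zip(rows, coords)) for i in range(3)]
        params.append(_solve3(ata, atb))
    return params[0], params[1]


def pixel_to_lonlat(px: Point, lon_p: List[float], lat_p: List[float]) -> Point:
    """Apply the fitted affine transformation to one pixel position."""
    x, y = px
    return (lon_p[0] * x + lon_p[1] * y + lon_p[2],
            lat_p[0] * x + lat_p[1] * y + lat_p[2])


# Made-up control points: the four corners of a 1000 x 800 pixel scan (y grows downward).
gcp_pixels = [(0, 0), (1000, 0), (0, 800), (1000, 800)]
gcp_coords = [(24.90, 60.20), (25.00, 60.20), (24.90, 60.16), (25.00, 60.16)]
lon_p, lat_p = fit_affine(gcp_pixels, gcp_coords)
lon, lat = pixel_to_lonlat((500, 400), lon_p, lat_p)
print(round(lon, 6), round(lat, 6))  # 24.95 60.18
```

At least three non-collinear control points are needed. Adding more lets the least-squares fit average out small placement errors, which is one reason many contributors placing points on the same map can improve the result.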
7 Steps That Could Hurl You into the Data Science World!
I am frequently asked how to learn data mining and data science. Here is my summary.
You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don’t forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of big data.
Here are seven steps for learning data mining and data science. Although they are numbered, you can do them in parallel or in a different order.
- Languages: Learn R, Python and SQL
- Tools: Learn how to use data mining and visualization tools
- Textbooks: Read introductory textbooks to understand the fundamentals
- Education: Watch webinars, take courses and consider a certificate or a degree in data science (Read more in Ben Lorica’s How to Nurture a Data Scientist.)
- Data: Check available data resources and find something there
- Competitions: Participate in data mining competitions
- Interaction: Interact with other data scientists via social networks, groups and meetings
In this article, I use data mining and data science interchangeably. See my presentation, Analytics Industry Overview, where I look at the evolution and popularity of different terms like statistics, knowledge discovery, data mining, predictive analytics, data science and big data.
1. Learning Languages
A recent KDnuggets poll found that the most popular languages for data mining are R, Python, and SQL. There are many resources for each, for example:
- Free e-book on Data Science with R
- Getting Started With Python For Data Science
- Python for Data Analysis: Agile Tools for Real World Data
- An Indispensable Python: Data Sourcing to Data Science
- W3Schools’ Learning SQL
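To get a feel for how these languages complement each other, here is a minimal sketch that uses Python's built-in sqlite3 module to run a SQL aggregation and then continues the analysis in Python. The sales table and its values are invented purely for illustration.

```python
import sqlite3
import statistics

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 200.0), ("south", 150.0), ("south", 100.0)],
)

# SQL does the grouping and aggregation...
rows = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# ...and Python takes over for further analysis and reporting.
for region, n, avg in rows:
    print(f"{region}: n={n}, mean={avg:.1f}")

amounts = [a for (a,) in conn.execute("SELECT amount FROM sales")]
print("overall median:", statistics.median(amounts))
conn.close()
```

Pushing filtering and aggregation into SQL keeps the data moved into Python small, which matters once tables no longer fit comfortably in memory.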
2. Tools: Data Mining, Data Science, and Visualization Software
There are many data mining tools for different tasks, but it is best to learn how to use a data mining suite that supports the entire process of data analysis. You can start with open-source (free) tools such as KNIME, RapidMiner and Weka.
However, for many analytics jobs you need to know SAS, the leading and most widely used commercial tool. Other popular analytics and data mining software includes MATLAB, StatSoft STATISTICA, Microsoft SQL Server, Tableau, IBM SPSS Modeler, and Rattle.
Visualization is an essential part of any data analysis. Learn how to use Microsoft Excel (good for many simpler tasks), R graphics (especially ggplot2), and also Tableau – an excellent package for visualization. Other good visualization tools include TIBCO Spotfire and Miner3D.
3. Textbooks
There are many data mining and data science textbooks available, but you can start with these:
- Data Mining and Analysis: Fundamental Concepts and Algorithms, free PDF download (draft), by Mohammed Zaki and Wagner Meira Jr.
- Data Mining: Practical Machine Learning Tools and Techniques, by Ian Witten, Eibe Frank and Mark Hall, written by the authors of Weka and using Weka extensively in its examples
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani and Jerome Friedman. A great introduction for the mathematically oriented
- LIONbook: Learning and Intelligent Optimization, by Roberto Battiti and Mauro Brunato, freely available on the Web, chapter by chapter
- Mining of Massive Datasets, by A. Rajaraman and J. Ullman
- StatSoft Electronic Statistics Textbook (free), includes many data mining topics
4. Education: Webinars, Courses, Certificates and Degrees
You can start by watching some of the many free webinars and webcasts on the latest topics in analytics, big data, data mining and data science.
There are also many online courses, short and long, many of them free. (See KDnuggets online education directory.)
Check in particular these courses:
- Machine Learning, at Coursera, taught by Andrew Ng
- Learning from Data at edX, taught by Caltech professor Yaser Abu-Mostafa
- Open Online Course in Applied Data Science, from Syracuse iSchool
- Data Mining with Weka, free online course
- Also check the free online slides from my Data Mining Course, a semester-long introductory course in data mining
Finally, consider getting a certificate in data mining and data science, or an advanced degree such as a master’s degree in data science.
5. Data
You will need data to analyze – see the KDnuggets directory of Datasets for Data Mining, including:
- Government, federal, state, city, local and public data sites and portals
- Data APIs, hubs, marketplaces, platforms, portals and search engines
- Free public datasets
6. Competitions
Again, you will learn best by doing, so participate in Kaggle competitions. Start with beginner competitions, such as Predicting Titanic Survival Using Machine Learning.
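A common first step in a competition like this is a simple baseline that later models must beat. The sketch below, under stated assumptions, computes survival rates by group and scores the fixed rule "predict that female passengers survived" on a handful of rows. The column names Sex and Survived mirror those in the Kaggle Titanic training file, but the rows here are invented; in real work you would read the downloaded CSV and measure accuracy on held-out data rather than on the same rows.

```python
import csv
import io
from collections import defaultdict

# A few made-up rows shaped like the Titanic training CSV (columns assumed).
SAMPLE = """PassengerId,Sex,Pclass,Survived
1,male,3,0
2,female,1,1
3,female,3,1
4,female,1,1
5,male,3,0
6,male,1,1
7,female,3,0
8,male,2,0
"""


def load_rows(text: str) -> list:
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def survival_rate_by(rows: list, column: str) -> dict:
    """Fraction of passengers who survived, grouped by the given column."""
    totals, survived = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[column]] += 1
        survived[r[column]] += int(r["Survived"])
    return {k: survived[k] / totals[k] for k in totals}


def baseline_predict(row: dict) -> int:
    """Rule-of-thumb baseline: predict survival for female passengers only."""
    return 1 if row["Sex"] == "female" else 0


def accuracy(rows: list, predict) -> float:
    """Share of rows where the prediction matches the Survived label."""
    hits = sum(predict(r) == int(r["Survived"]) for r in rows)
    return hits / len(rows)


rows = load_rows(SAMPLE)
print(survival_rate_by(rows, "Sex"))  # {'male': 0.25, 'female': 0.75}
print(f"baseline accuracy: {accuracy(rows, baseline_predict):.2f}")  # 0.75
```

Any model you train afterwards is only worth submitting if it clearly beats a baseline like this on held-out data.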
7. Interact: Meetings, Groups, and Social Networks
You can join many peer groups. See the Top 30 LinkedIn Groups for Analytics, Big Data, Data Mining, and Data Science.
AnalyticBridge is an active community for analytics and data science.
You can attend some of the many Meetings and Conferences on Analytics, Big Data, Data Mining, Data Science, & Knowledge Discovery.