Foundational methodology for data science

A 10-stage data science methodology that spans technologies and approaches

Published April 2016

In the domain of data science, solving problems and answering questions through data analysis is standard practice. Often, data scientists construct a model to predict outcomes or discover underlying patterns, with the goal of gaining insights. Organisations can then use these insights to take actions that ideally improve future outcomes.

Technologies for analysing data and building models are numerous and evolving rapidly. In a remarkably short time, they have progressed from desktop tools to massively parallel data warehouses that handle huge data volumes, with in-database analytic functionality available in both relational databases and Apache Hadoop. Text analytics on unstructured or semi-structured data is becoming increasingly important because it lets sentiment and other useful information from text be incorporated into predictive models, often leading to significant improvements in model quality and accuracy.