Understanding Data Mining and Big Data

In April 2002, the International Council for Science (ICSU) Committee on Data for Science and Technology (CODATA) started the Data Science Journal, a publication focused on issues such as the description of data systems, their publication on the internet, applications and legal issues.

Shortly thereafter, in January 2003, Columbia University began publishing The Journal of Data Science, which provided a platform for all data workers to present their views and exchange ideas. 
The journal was largely devoted to the application of statistical methods and quantitative research. 

In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century," defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection" and whose primary activity is to "conduct creative inquiry and analysis."


Earlier, in 2001, William S. Cleveland had introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," published in the International Statistical Review.

In that article, Cleveland established six technical areas that he believed encompassed the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.





Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

The term "data science" is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics.


Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

Data science overlaps heavily with data mining and big data, and all three share the aim to "use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems".

Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.

It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. 



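In 2012, when Harvard Business Review called the data scientist "The Sexiest Job of the 21st Century," the term "data science" became a buzzword.

Even the suggestion that data science is sexy paraphrased Hans Rosling, who was featured in a 2011 BBC documentary with the quote, "Statistics is now the sexiest subject around." Nate Silver referred to data science as a sexed-up term for statistics.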

In many cases, earlier approaches and solutions are now simply rebranded as "data science" to be more attractive, which can cause the term to become "dilute[d] beyond usefulness."

While many university programs now offer a data science degree, there is no consensus on a definition or on suitable curriculum content.

To its discredit, however, many data-science and big-data projects fail to deliver useful results, often as a result of poor management and utilization of resources.