Statistical Learning and Data Science

Jim Gray envisioned "data-driven science" as a "fourth paradigm" of science that uses the computational analysis of large data as its
primary scientific method, with the aim "to have a world in which all of the science literature is online, all of the science data is online, and they interoperate with each other."


Data science differs from the existing practice of data analysis across all disciplines, which focuses only on explaining data sets. Data science seeks actionable and consistent patterns for predictive uses.

This practical engineering goal takes data science beyond traditional analytics. Data in disciplines and applied fields that have lacked solid theories, such as health science and social science, can now be sought out and used to generate powerful predictive models.

In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan.
In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making.

In his conclusion, he initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians be renamed data scientists.
Later, he presented his lecture entitled "Statistics = Data Science?" as the first of his 1998 P.C. Mahalanobis Memorial Lectures.

These lectures honor Prasanta Chandra Mahalanobis, an Indian scientist and statistician and founder of the Indian Statistical Institute.



In 2013, the IEEE Task Force on Data Science and Advanced Analytics was launched. In 2013, the first "European Conference on Data Analysis (ECDA)" was organised in Luxembourg,
establishing the European Association for Data Science (EuADS).

The first international conference, the IEEE International Conference on Data Science and Advanced Analytics, was launched in 2014.

In 2014, General Assembly launched a student-paid bootcamp and The Data Incubator launched a competitive, free data science fellowship.

In 2014, the American Statistical Association section on Statistical Learning and Data Mining renamed its journal to "Statistical Analysis and Data Mining: The ASA Data Science Journal" and
in 2016 changed its section name to "Statistical Learning and Data Science".

In 2015, the International Journal of Data Science and Analytics was launched by Springer to publish original work on data science and big data analytics. In September 2015, the Gesellschaft für Klassifikation (GfKl) added "Data Science Society" to its name at the third ECDA conference at the University of Essex, Colchester, UK.


"Data science" has recently become a popular term among business executives.
However, many critical academics and journalists see no distinction between data science and statistics, whereas others consider it largely a popular term for "data mining" and "big data".
Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs.

In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a new term for a statistician... Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”

Similarly, in the business sector, multiple researchers and analysts state that data scientists alone are far from sufficient to give companies a real competitive advantage. They consider data scientists to be only one of four broader job families that companies require to leverage big data effectively: data analysts, data scientists, big data developers, and big data engineers.

Understanding Data Mining and Big Data

In April 2002, the International Council for Science (ICSU) Committee on Data for Science and Technology (CODATA) started the Data Science Journal,
a publication focused on issues such as the description of data systems, their publication on the internet, applications, and legal issues.

Shortly thereafter, in January 2003, Columbia University began publishing The Journal of Data Science, which provided a platform for all data workers to present their views and exchange ideas. 
The journal was largely devoted to the application of statistical methods and quantitative research. 

In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century", defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection" whose primary activity is to "conduct creative inquiry and analysis."


In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate "advances in computing with data" in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics," which was published in the International Statistical Review.

In his report, Cleveland established six technical areas that he believed encompassed the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.





Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven)
and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

The term "data science" is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics.


Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

Data science is the same concept as data mining and big data: "use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems".

Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.

It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. 



The term "data science" became a buzzword in 2012, when Harvard Business Review featured it.


Hans Rosling was featured in a 2011 BBC documentary. Nate Silver referred to data science as a sexed-up term for statistics.

In many cases, earlier approaches and solutions are now simply rebranded as "data science" to be more attractive, which can cause the term to become "dilute beyond usefulness."

While many university programs now offer a data science degree, there exists no consensus on a definition or suitable curriculum contents.

To its discredit, however, many data-science and big-data projects fail to deliver useful results, often as a result of poor management and utilization of resources.