Engineering LibreTexts

1: What Are Data and Data Science?

  • 1.0: Introduction
    This page provides an overview of key terms in data and data science, illustrating their relevance in different domains. It highlights technologies utilized by data scientists, particularly emphasizing Python for data analysis, and aims to lay a technical groundwork for readers to grasp and implement more complex data science concepts in future chapters.
  • 1.1: What Is Data Science?
    This page outlines data science objectives and processes, emphasizing the data science cycle, which includes problem definition, data collection, preparation, analysis, and reporting. It highlights the importance of efficient data management and communication, particularly with cloud systems and data visualization, to convey insights effectively.
  • 1.2: Data Science in Practice
    This page highlights the interdisciplinary role of data science across various fields, focusing on its applications in business, finance, healthcare, engineering, public policy, and education. It provides examples such as retail analytics by Walmart and Amazon, predictive analytics in healthcare, and sports analytics like the Oakland Athletics' use of sabermetrics.
  • 1.3: Data and Datasets
    This page outlines key learning objectives in data science, focusing on definitions, data types, and structures. It emphasizes the distinction between quantitative and categorical data, structured versus unstructured datasets, and various formats like CSV, JSON, and XML. The text details how JSON uses key-value pairs and XML uses tags to represent data attributes.
  • 1.4: Using Technology for Data Science
    This page details learning objectives for data analysis, focusing on the use of statistical software and programming languages such as Python and R for data manipulation and visualization. It introduces Excel and Google Sheets for basic tasks while noting their limitations in complex analysis. Advanced tools like Tableau and Power BI are mentioned for enhanced visualizations, alongside a discussion of evolving standards in data science, particularly with AI integration.
  • 1.5: Data Science with Python
    This page covers the process of loading and analyzing data in Python using Jupyter Notebook and Google Colaboratory. It focuses on essential libraries like Pandas for data manipulation, including handling CSV files and using DataFrames and Series. The text explains filtering data and visualizing it with Matplotlib, providing examples with the Iris dataset and movie profits.
  • 1.6: Key Terms
    This page offers definitions for key data science terms, covering data types, data management concepts, tools, and analytical methods. It emphasizes the significance of data analysis for insights and informed decision-making, mentioning essential programming languages like Python and R, and outlining the data science cycle.
  • 1.7: Group Project
    This page outlines three projects aimed at enhancing data science skills for students and professionals. Project A focuses on finding and cleaning secondary data while analyzing datasets relevant to specific policies. Project B involves downloading a dataset, formulating questions, and visualizing results using Python.
  • 1.8: Chapter Review
    This page includes multiple-choice questions on data science. The first question addresses incorrect step and goal pairings in the data science cycle. The second contrasts local storage with cloud systems in the evolution of data management. The third emphasizes the interdisciplinary nature of data science by asking for the best example among various fields, including history, mathematics, biology, and chemistry.
  • 1.9: Critical Thinking
    This page details tasks involving the Spotify and CancerDoc datasets, focusing on attributes, data type classification, entry identification, and data visualization using scatterplots and Python Pandas. It highlights the necessity of comprehending each dataset's characteristics and offers guidance for conducting analysis and visualization tasks.
  • 1.10: Quantitative Problems
    This page provides a guide on calculating the average beats per minute of 2023 songs using Python Pandas and spreadsheet programs like MS Excel or Google Sheets. It explains the AVERAGE() function in Excel with an example for clarity.
  • 1.11: References
    This page discusses diverse articles on data science trends, smart cities, customer behavior, Amazon's product delivery, agricultural automation, fraud detection, and Walmart visits. It highlights technological innovations in retail and agriculture, showcasing successes and failures in smart city projects, emphasizing the role of data analytics and digital solutions in transforming multiple sectors.
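
As a rough illustration of the formats summarized under 1.3, the sketch below represents one record twice: as JSON key-value pairs and as XML tags. The record itself (a song title and a beats-per-minute value) and every variable name are hypothetical, chosen only for this example and not taken from the chapter.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, made up for illustration only.
json_text = '{"title": "Example Song", "bpm": 120}'
xml_text = "<song><title>Example Song</title><bpm>120</bpm></song>"

# JSON: each attribute is a key-value pair, and numbers keep their type.
record_from_json = json.loads(json_text)

# XML: each attribute is wrapped in a tag, and element text is always a string.
root = ET.fromstring(xml_text)
record_from_xml = {child.tag: child.text for child in root}
record_from_xml["bpm"] = int(record_from_xml["bpm"])  # convert text to a number

# Both formats carry the same information once parsed.
print(record_from_json == record_from_xml)
```

The explicit `int(...)` conversion reflects one practical difference between the two formats: JSON distinguishes numbers from strings, while plain XML text does not.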

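The kind of task described under 1.5 and 1.10 (loading a CSV, filtering rows, and averaging a column) might look like the minimal sketch below. It assumes the pandas library is installed; the inline CSV text, column names, and values are invented for illustration and do not come from the Spotify dataset the chapter uses.

```python
import io

import pandas as pd

# Hypothetical CSV content; column names and values are made up.
csv_text = """title,year,bpm
Song A,2023,100
Song B,2023,140
Song C,2022,90
"""

# read_csv accepts a file path or a file-like object such as StringIO.
songs = pd.read_csv(io.StringIO(csv_text))

# Boolean filter: keep only the rows released in 2023.
songs_2023 = songs[songs["year"] == 2023]

# Series.mean() plays the role of AVERAGE() in a spreadsheet.
average_bpm = songs_2023["bpm"].mean()
print(average_bpm)  # 120.0
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the path to the CSV file.
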

This page titled 1: What Are Data and Data Science? is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.
