7: Files
In this chapter, we start to work with secondary memory (or files). Secondary memory is not erased when the power is turned off. In the case of a USB flash drive, the data we write from our programs can even be removed from the system and transported to another system.
- 7.1: Persistence
- This page discusses the shift from programs that use only the CPU and main memory, whose data is temporary, to programs that use secondary memory for permanent storage. It notes that the programs written so far have been transient and highlights the importance of secondary memory (like USB flash drives) for keeping data after power loss. The chapter focuses on reading and writing text files; later lessons cover database files intended for use with database software.
- 7.2: Opening Files
- This page explains the process of opening a file on your hard drive, which requires confirmation from the operating system about the file's existence. It illustrates that opening a file, like 'mbox.txt', provides a file handle if successful, or raises an error if the file is missing. Future discussions will focus on using try and except to manage such errors better.
- 7.3: Text files and Lines
- This page explains that a text file is a sequence of lines, much as a Python string is a sequence of characters. It uses a mail activity log from an open-source project as an example, describing the file format and how mail messages are separated. It also introduces the newline character (\n), which marks the end of each line; although it is written as two symbols in a string, it counts as a single character.
- 7.4: Reading Files
- This page explains how to count lines in a file using a for loop in Python, emphasizing memory efficiency by reading one line at a time. It notes that the open function avoids loading the entire file into memory, making it suitable for large files. For smaller files, it suggests using the read method to load content into a string, though this is not advisable for larger files.
- 7.5: Searching through a File
- This page discusses a programming method for processing files by reading lines and filtering based on conditions. It illustrates extracting lines starting with "From:" using the `startswith` method, addressing extra blank lines, and introducing the `rstrip` method for trimming whitespace. The page also demonstrates the use of `continue` to skip uninteresting lines and the `find` method for searching specific substrings, accompanied by example code snippets and outputs.
- 7.6: Letting the user choose the file name
- This page describes a Python program that processes user-specified files by counting lines that start with "Subject:". It emphasizes flexibility, as no code changes are needed for different inputs, and demonstrates the program's functionality with examples. Additionally, it stresses the importance of error handling to ensure graceful failure in case of potential issues.
- 7.7: Using try, except, and open
- This page emphasizes the critical role of Quality Assurance (QA) in software development, particularly in managing user input that can lead to errors like FileNotFoundError. It advocates for using try/except structures in Python to handle these errors effectively, promoting "Pythonic" practices. The text suggests that incorporating error handling improves code reliability and adds an artistic touch to programming, merging technical skills with creativity.
- 7.8: Writing Files
- This page explains how to write to a file in Python by opening it in "w" mode, which clears existing data or creates a new file. It notes that the write method does not add a newline automatically, so line endings must be inserted explicitly, and it stresses closing the file after writing to prevent data loss, even though Python closes open files when the program ends.
- 7.9: Debugging
- This page addresses challenges with whitespace in file operations, emphasizing the debugging difficulties of invisible characters like spaces, tabs, and newlines. It highlights the `repr` function for visualizing these characters and notes the variability of line-ending characters across different systems, which can complicate file transfers. To mitigate these issues, it recommends using conversion tools or developing custom solutions.
- 7.E: Files (Exercises)
- This page outlines three programming exercises: one for reading a file and printing its contents in uppercase, another for extracting and calculating average spam confidence from lines containing "X-DSPAM-Confidence:", and a third modification to the file prompt that displays a humorous message if the input is "na na boo boo", while preserving regular functionality for other inputs.
- 7.G: Files (Glossary)
- This page covers key programming concepts: "catch" involves using try and except to manage exceptions; "newline" represents line endings in texts; "Pythonic" emphasizes elegant coding practices; "Quality Assurance" focuses on maintaining software quality through testing; and "text file" refers to character sequences saved on storage media.
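
The steps summarized in the sections above (writing in "w" mode, reading one line at a time, filtering with `startswith`, trimming with `rstrip`, skipping lines with `continue`, and guarding `open` with try/except) can be sketched together in one short program. This is a minimal illustration rather than the chapter's own code: the file name `sample-mbox.txt`, the sample lines, and the helper `lines_starting_with` are all hypothetical and invented for this example.

```python
import os
import tempfile

# Hypothetical file name and contents, invented for illustration only.
path = os.path.join(tempfile.mkdtemp(), "sample-mbox.txt")

# "w" mode creates the file, or clears it if it already exists.
# write() adds no newline of its own, so each "\n" is written explicitly.
with open(path, "w") as fout:
    fout.write("From: alice@example.com\n")
    fout.write("Subject: hello\n")
    fout.write("From: bob@example.com\n")


def lines_starting_with(file_name, prefix):
    """Return the lines of file_name that start with prefix, without newlines."""
    try:
        fhand = open(file_name)
    except FileNotFoundError:
        # Fail gracefully instead of ending with a traceback.
        print("File cannot be opened:", file_name)
        return []

    matches = []
    with fhand:
        for line in fhand:           # reads one line at a time
            line = line.rstrip()     # drop the trailing newline
            if not line.startswith(prefix):
                continue             # skip uninteresting lines
            matches.append(line)
    return matches


from_lines = lines_starting_with(path, "From:")
print(from_lines)

# A file that does not exist yields an empty list rather than a crash.
missing = lines_starting_with(path + ".missing", "From:")
```

Using `with` closes each file automatically; calling `close()` explicitly after writing, as section 7.8 describes, has the same effect.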