
Introduction to UNIX and Filesystems: Navigating and Managing Data

What is a Filesystem?

A filesystem is a way of organizing and storing data on a computer. It provides a structure for how data is stored and accessed. In the context of your computer, think of it as a file cabinet where you organize and store your files. Each file is stored in a specific location known as a directory or folder.

Directories and Folders

On macOS, you might be familiar with the Finder application, which allows you to navigate through directories and see the contents of your computer’s filesystem in a graphical way. Directories, also known as folders, are containers that hold files or other directories.

[Figure: Finder]

In the example above, “Documents” is a directory that can contain multiple files. Understanding the concept of directories is crucial for organizing and managing your data, especially in research.

Enter: UNIX and Terminal

UNIX is a powerful and versatile operating system that has been widely used in scientific computing, including astronomy and data science. It provides a command-line interface (CLI; also called the shell or the terminal) for interacting with the computer’s operating system. Modern UNIX-like systems, such as Linux and macOS, have evolved considerably, but they retain the core principles and commands of the original UNIX system.

The terminal is a text-based interface that allows users to interact with their computer’s operating system by typing commands. Unlike graphical user interfaces (GUIs) like Finder, which use visual elements like windows and buttons, the terminal relies on text commands.

A note on terminology:

  • Operating Systems: UNIX, Linux (including distributions such as Debian), macOS, Windows
  • CLIs: Command line, terminal, shell (specific CLI)
  • Shell Languages: bash, zsh

Key Components of the Terminal:

  • Prompt: The prompt displays information about the system’s current state and awaits your command.
  • Command Line: This is where you input commands. You type a command and press Enter to execute it.
  • Output: After executing a command, the terminal provides output, displaying information or responses from the system.

[Figure: Terminal]

In practice, your default macOS terminal might look something like this:

[Figure: Terminal]

[Figure: iTerm]

Anatomy of a UNIX Command:

Understanding the anatomy of a UNIX command is crucial for effectively using the command line. Let’s break down the key components: command, options/flags, and arguments.

A UNIX command typically follows the structure:

command [options/flags] [arguments]
  1. Command:

    • The primary action you want the computer to perform.
    • Examples: ls, cd, cp, mv, echo, etc.
  2. Options/Flags:

    • Flags modify the behavior of the command.
    • Usually preceded by a hyphen (-) or double hyphen (--).
    • Examples: -l, -a, --verbose, --force, etc.
  3. Arguments:

    • The entities upon which the command acts.
    • Can be file names, directories, strings, etc.
    • Examples: file.txt, directory/, string, etc.

Example Commands:

  1. Basic Command:
    ls
    • Command: ls
    • Options/Flags: None
    • Arguments: None
  2. Command with Options:
    ls -l
    • Command: ls
    • Options/Flags: -l
    • Arguments: None
  3. Command with Arguments:
    cp file.txt backup/
    • Command: cp
    • Options/Flags: None
    • Arguments: file.txt, backup/
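
Options and arguments can also appear together in one command. Here is a minimal sketch, assuming a throwaway directory created with mktemp so that no real files are touched; the variable name demo and the file names are hypothetical:

```shell
# Build a scratch directory (hypothetical names) to experiment in safely
demo=$(mktemp -d)
mkdir "$demo/backup"
echo "some data" > "$demo/file.txt"

# Command: cp | Option: -v (report each file as it is copied) | Arguments: source, destination
cp -v "$demo/file.txt" "$demo/backup/"

# Command: ls | Options: -l and -a, combined as -la | Argument: the directory to list
ls -la "$demo/backup"
```

Single-letter options can usually be combined after one hyphen, so ls -la behaves like ls -l -a.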

Common UNIX Commands

1. pwd - Print Working Directory

The pwd command prints the current working directory, which is the directory you are currently in within the filesystem.

pwd

This command will display the full path to the current directory.

2. ls - List Files

The ls command is used to list the files and directories in the current directory.

ls

This command provides a simple listing of the files and directories in the current location.
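
A few common options change what ls shows. The sketch below assumes a scratch directory with made-up contents (demo, notes.txt, .config, and images are all hypothetical):

```shell
# Scratch directory with one regular file, one hidden file, and one subdirectory
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/.config"
mkdir "$demo/images"

ls "$demo"       # names only; entries starting with "." are hidden by default
ls -a "$demo"    # include hidden entries (also lists . and ..)
ls -l "$demo"    # long format: permissions, owner, size, modification time
ls -lh "$demo"   # long format with human-readable sizes

# Capture the two listings to compare them
visible=$(ls "$demo")
all=$(ls -a "$demo")
```

Passing a directory as an argument lists that directory instead of the current one.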

3. cd - Change Directory

The cd command is used to change the current working directory. You can move to a different directory by specifying the path.

cd /path/to/directory

Use cd .. to move up one level in the directory hierarchy.
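
A short navigation session might look like the following. This is only a sketch: the project/data layout is hypothetical and is created inside a scratch directory so it can be run safely.

```shell
# Create a small directory tree to move around in (-p also creates parent directories)
demo=$(mktemp -d)
mkdir -p "$demo/project/data"

cd "$demo/project"    # absolute path: written out in full, starting from /
pwd
cd data               # relative path: interpreted from the current directory
pwd
cd ..                 # up one level, back to project
here=$(pwd)
cd -                  # jump back to the previous directory (data) and print its path
there=$(pwd)
```

Running pwd after each cd is a cheap way to confirm you ended up where you intended.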

Understanding and practicing these basic commands will give you the foundation to navigate and interact with the filesystem using the UNIX command line. As we progress in the workshop, we’ll explore more advanced commands and concepts.

Why are we doing this?

1. Organizing Data

When you conduct research, you often work with large datasets and various files. Knowing how to navigate through directories using the command line is essential. Let’s say you have to migrate a bunch of data files into a different location on your computer. You can use the following commands:


# Create a subdirectory to organize the files
mkdir data_files

# Move all files with a specific extension (e.g., .txt) into the subdirectory
mv *.txt data_files/

The mkdir data_files command creates a new subdirectory named “data_files” in the current directory. Then the mv *.txt data_files/ command moves all files with the extension “.txt” into the “data_files” subdirectory. There are lots of different ways you can change these commands to move your data in different ways:

  • you can customize the file extension based on your specific data files
  • you can specify arbitrary folders to move files between rather than using the current directory
  • you could use keywords in the filenames to select files rather than their extensions
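
The variations listed above could be sketched as follows. All file and directory names here are hypothetical, and everything happens inside a scratch directory:

```shell
# Scratch directory with a few pretend data files
demo=$(mktemp -d)
cd "$demo"
touch run1_raw.csv run2_raw.csv run1_notes.txt summary.csv
mkdir raw_data archive

# Different extension plus a keyword: every .csv file whose name contains "raw"
mv *raw*.csv raw_data/

# Arbitrary folders: move a file between two directories without cd-ing into either
mv raw_data/run1_raw.csv archive/

# -n refuses to overwrite a file that already exists at the destination
mv -n summary.csv archive/
```

The * wildcard is expanded by the shell before mv runs, so running ls *raw*.csv first is a safe way to preview which files a pattern will match.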

UNIX commands are incredibly flexible and can become incredibly complicated. There will almost always be a solution to your specific problem!

2. Navigating to Data in your Code

Understanding filesystems becomes crucial when writing code. Your code will not know where your data lives unless you tell it. If your code throws an error like “File not found” or “No such file or directory,” it’s often because the file is not in the specified directory (or there’s a typo in the file name).
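
Before blaming your code, you can check from the terminal whether a path actually exists. The sketch below is one way to do that; check_path is a hypothetical helper written for this example, and data/spectrum.txt is a made-up file:

```shell
# Scratch directory containing data/spectrum.txt
demo=$(mktemp -d)
mkdir "$demo/data"
touch "$demo/data/spectrum.txt"
cd "$demo"

# Hypothetical helper: report whether a path exists ([ -e PATH ] tests for existence)
check_path() {
    if [ -e "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_path data/spectrum.txt     # correct relative path from the current directory
check_path spectrum.txt          # wrong directory: the file lives inside data/
check_path data/spectrum.text    # typo in the file name
```

Only the first call reports the file as found; the other two reproduce the two common causes mentioned above.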

You will see examples of specifying the path to the data once we start working with Python in the coming weeks!

3. Executing Code, Managing Dependencies, Version Control, and More!

Understanding UNIX + Filesystems is important for many more aspects of research beyond data management! We can’t get into all of these now, but as a preview:

  • When we want to “execute” Python code, we can do it from the command line
  • When we want to remotely operate another computer (for instance, at a telescope or a supercomputer cluster), we connect to it through the command line
  • When we want to manage code installations + packages, we typically do so with command-line tools

Text Editors: Vim and Nano

Vim

Vim (Vi Improved) is a powerful and highly configurable text editor that is widely used in the programming community. It operates in different modes, allowing users to navigate, edit, and manipulate text efficiently. Vim has a steeper learning curve, but it’s super versatile and it’s my favorite.

Basic Vim Commands

  • Normal Mode: Used for navigating and manipulating text. Vim starts in Normal Mode; press Esc to return to it from any other mode.

    • h, j, k, l, or arrow keys: Move left, down, up, and right, respectively.
    • dd: Delete a line.
    • yy: Copy a line.
    • p: Paste the copied or deleted text.
  • Insert Mode: Used for inserting or editing text. Press i to enter Insert Mode.

    • You can type to insert your desired text.
  • Visual Mode: Used for selecting text. Press v to enter Visual Mode.

    • You can select blocks of text (to copy/cut/paste different amounts of text)
    • Special sub-modes: visual line mode (V) or visual block mode (<C-V>)
  • Command Mode: Used for executing commands. Press : to enter Command Mode.

    • :w: Save changes.
    • :q: Quit Vim (only works if there are no unsaved changes), or :q! to force quit and discard unsaved changes
    • :wq or :x or ZZ: Save and quit.
    • You can also do cool things like find and replace in this mode! For example, :%s/old/new/g replaces every occurrence of “old” with “new” in the file.

Nano

Nano is a straightforward, beginner-friendly text editor. It provides a simple interface for editing text files and is particularly suitable for quick edits or when a full-featured editor like Vim might be overwhelming.

Basic Nano Commands

  • Saving Changes:

    • Press Ctrl + O to write changes.
    • Press Enter to confirm the file name.
  • Exiting Nano:

    • Press Ctrl + X to exit Nano.
  • Editing Text:

    • Use arrow keys to navigate.
    • Use Backspace to delete characters.
    • Press Ctrl + K to cut a line.
    • Press Ctrl + U to paste the cut line.

Choosing an Editor

The choice between Vim and Nano comes down to personal preference and familiarity. Vim’s power lies in its extensive features, while Nano excels in simplicity and ease of use. Both let you edit text efficiently, so pick the one that fits your comfort level and requirements.

Piping UNIX Commands

Piping involves directing the output of one command as input to another command. The symbol for piping is |. This allows you to chain commands together, creating powerful and flexible workflows.

Example 1: Basic Pipe between Two Commands

ls -l | grep "txt"
  • The ls -l command lists files in long format.
  • The output of ls -l is piped (|) to the grep "txt" command.
  • grep "txt" searches for lines containing the string “txt” in the output.
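
Pipes can be chained more than once, and the pattern given to grep matters. A small sketch, assuming a scratch directory with three made-up files (a.txt, b.txt, and txt_notes.csv):

```shell
# Scratch directory with two .txt files and one .csv file that has "txt" in its name
demo=$(mktemp -d)
cd "$demo"
touch a.txt b.txt txt_notes.csv

ls | wc -l                 # count the entries in the directory
ls | grep "txt"            # matches "txt" anywhere in the name, so txt_notes.csv is included
ls | grep '\.txt$'         # stricter pattern: names that end in .txt
ls | grep "txt" | wc -l    # three commands chained: list, filter, count

# grep -c counts matching lines directly
loose=$(ls | grep -c "txt")
strict=$(ls | grep -c '\.txt$')
```

Here the loose pattern matches three names and the strict one matches two, which is why it pays to look at the filtered output before acting on it.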

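Example 2: Saving Output using Redirection

You can also save the output of a UNIX command to a text file. Strictly speaking this is redirection rather than piping: a pipe (|) connects one command to another command, while > sends a command’s output to a file. Here’s an example: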

# List all files in the current directory and save the output to a text file
ls -l > file_list.txt
  • Again, the ls -l command lists files in long format.
  • The > symbol redirects the output of the command to a file, overwriting the file if it already exists.
  • file_list.txt is the name of the text file where the output will be saved.

After executing this command, the detailed listing of files in the current directory will be saved in the file_list.txt file.
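
Because > replaces the contents of an existing file, it helps to know its sibling >>, which appends instead. A minimal sketch in a scratch directory (log.txt and matches.txt are hypothetical names):

```shell
demo=$(mktemp -d)
cd "$demo"

echo "first line" > log.txt      # > creates log.txt, or overwrites it if it already exists
echo "second line" >> log.txt    # >> appends to the end of the file
cat log.txt                      # the file now has two lines

echo "replaced" > log.txt        # > again: the two earlier lines are gone
cat log.txt

# Pipes and redirection combine: filter a listing, then save the result
ls | grep "log" > matches.txt
```

If you are unsure whether a file already exists, >> is the safer choice.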

And that’s it!

Appendix: Useful UNIX commands for your reference

Here’s a list of some important UNIX commands that can serve as a quick reference:

  • pwd: Print the current working directory.
  • ls: List files and directories in the current directory.
  • cd: Change the current working directory.
    • cd /path/to/directory: Change to the specified directory.
    • cd ..: Move up one level in the directory hierarchy.
  • mkdir: Create a new directory.
    • mkdir new_directory: Create a directory named “new_directory”.
  • cp: Copy files or directories.
    • cp file.txt /path/to/destination: Copy “file.txt” to the specified destination.
  • mv: Move or rename files or directories.
    • mv old_name new_name: Rename a file or directory.
    • mv file.txt /path/to/destination: Move “file.txt” to the specified destination.
  • rm: Remove files or directories.
    • rm file.txt: Remove “file.txt”.
    • rm -r directory: Remove a directory and its contents.
  • cat: Display the content of a file.
    • cat file.txt: Display the content of “file.txt”.
  • echo: Print text to the terminal.
    • echo "Hello, World!": Print the text “Hello, World!”.
  • man: Display the manual or help for a command.
    • man ls: Show the manual for the ls command.
  • chmod: Change file permissions.
    • chmod +x script.sh: Add execute permission to a script.
  • grep: Search for a pattern in files.
    • grep pattern file.txt: Search for “pattern” in “file.txt”.
  • ps: Display information about running processes.
    • ps aux: Show detailed information about all processes.
  • kill: Terminate a process.
    • kill process_id: Terminate the process with the specified ID.
  • nano or vim: Text editors for creating and editing files.
    • nano filename.txt: Open “filename.txt” in the Nano text editor.

These commands cover basic file and directory manipulation, text file viewing, process management, and more.
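
To see several of these commands working together, here is a short practice session. It is only a sketch: it runs inside a scratch directory created with mktemp, and every file name in it is made up.

```shell
demo=$(mktemp -d)
cd "$demo"

mkdir new_directory                  # create a directory
echo "Hello, World!" > file.txt      # echo plus redirection makes a small file
cat file.txt                         # display its contents
cp file.txt new_directory/           # copy it into the new directory
mv file.txt renamed.txt              # rename it
grep "Hello" renamed.txt             # search for a pattern inside it

# Write a two-line script, make it executable, and run it
echo '#!/bin/sh' > script.sh
echo 'echo "script ran"' >> script.sh
chmod +x script.sh
./script.sh

rm renamed.txt                       # remove a file
rm -r new_directory                  # remove a directory and everything in it
ls                                   # only script.sh is left
```

Note that rm does not move files to a trash folder; once removed, they are gone, so double-check the path before pressing Enter.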