Bash Text Processing — Pipes, Redirection & Filters Explained

DodaTech · Updated Jun 4, 2026 · 9 min read

Every Bash command produces output. Pipes and redirection let you send that output to files, to other commands, or nowhere at all — turning simple commands into powerful data pipelines.

What You’ll Learn

  • Redirect command output to files using > and >>
  • Connect commands with pipes (|) to build data pipelines
  • Search and filter text with grep, sed, and awk
  • Sort, count, and transform data with sort, uniq, cut, and tr
  • Avoid critical mistakes that can overwrite or lose data

Why Text Processing Matters

Real-world data doesn’t arrive neatly formatted. Log files contain millions of lines. CSVs need column extraction. Error reports require filtering. Durga Antivirus Pro processes gigabytes of scan logs daily — using grep to find threat patterns, awk to extract timestamps, and sort | uniq -c to count infection types. DodaZIP uses find | sort pipelines to batch-compress files by date. These one-liners save hours of manual work.

Learning Path

    flowchart LR
  A[Bash Basics] --> B[Pipes & Redirection<br/>You are here]
  B --> C[Shell Scripts]
  C --> D[Permissions & Users]
  D --> E[System Monitoring]
  
Prerequisites: You should know basic Bash commands such as ls, cat, and cd. Familiarity with Linux helps. No programming experience is needed.

How Data Flows Through Commands

Think of every command as having three pipes attached:

  • stdin (standard input, number 0) — where data comes in (keyboard by default)
  • stdout (standard output, number 1) — where normal results go (screen by default)
  • stderr (standard error, number 2) — where error messages go (screen by default)
    flowchart LR
  A[stdin<br/>0] --> B[Command]
  B --> C[stdout<br/>1]
  B --> D[stderr<br/>2]
  C --> E["| Pipe to next command"]
  C --> F["> Save to file"]
  

The key insight: These are just numbered channels. Redirection lets you rewire them. Send stdout to a file, stderr to a different file, or both to the same place.

Output Redirection — Saving Results

The > operator is like a funnel: it takes everything coming out of stdout and pours it into a file.

# List files and save the output to a file
ls -la > output.txt

# The file now contains the directory listing
cat output.txt

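Watch out: > overwrites the file without asking. The shell empties the target before the command even runs, so the old content is gone; a command such as sort file.txt > file.txt wipes out its own input.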

# Append instead — adds to the end
echo "new data" >> output.txt

# Now output.txt has both the old listing and the new line

Redirecting Errors

# Send errors to a file (normal output still shows on screen)
grep "error" log.txt 2> errors.txt

# Send everything to the same file
command > all-output.txt 2>&1

# The 2>&1 syntax means: "send stderr (2) to the same place as stdout (1)"
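The order of redirections is easy to get wrong, so here is a minimal sketch of the cases above. The report function and the file names are hypothetical, made up only so that there is something writing to both streams.

```shell
# report is a hypothetical helper that writes one line to each stream
report() {
  echo "result: ok"          # goes to stdout (1)
  echo "warning: slow" >&2   # goes to stderr (2)
}

workdir=$(mktemp -d)   # scratch directory so nothing real is overwritten

# Split the two streams into separate files
report > "$workdir/out.txt" 2> "$workdir/err.txt"

# Redirections are processed left to right, so order matters
report > "$workdir/both.txt" 2>&1       # stderr follows stdout into the file
report 2>&1 > "$workdir/only-out.txt"   # stderr was copied to the terminal first, so it stays there

cat "$workdir/out.txt"    # result: ok
cat "$workdir/err.txt"    # warning: slow
```

If both.txt ends up with two lines and only-out.txt with one, the left-to-right rule is working as described.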

The Null Device — Throwing Output Away

# Send all output to /dev/null (a black hole)
command > /dev/null 2>&1

This is useful when you only care whether a command succeeds, not what it outputs.
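As a rough sketch of that idea, the snippet below uses only the exit status of grep to make a decision and throws its output away. The log contents and the status variable are assumptions for illustration.

```shell
logfile=$(mktemp)
printf 'INFO: started\nERROR: disk full\n' > "$logfile"

# grep exits with 0 when it finds a match; its output is discarded
if grep "ERROR" "$logfile" > /dev/null 2>&1; then
  status="errors found"
else
  status="clean"
fi

echo "$status"   # errors found
rm -f "$logfile"
```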

Input Redirection — Reading from Files

# Read from a file instead of the keyboard
sort < unsorted.txt

The < symbol says “take the contents of this file and feed them as input.”
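One visible difference between passing a filename and redirecting input is that the command never learns the file's name. A small sketch, using a throwaway file created just for the demonstration:

```shell
datafile=$(mktemp)
printf 'pear\napple\nfig\n' > "$datafile"

# With a filename argument, wc prints the name next to the count
wc -l "$datafile"

# With input redirection, wc only sees a stream, so it prints the bare count
count=$(wc -l < "$datafile")
echo "lines: $((count))"   # lines: 3

# sort reads the same stream and writes the sorted lines to stdout
sorted=$(sort < "$datafile")
echo "$sorted"
```

The bare count is handy in scripts because there is no filename to strip off before using the number.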

Here Documents — Multi-line Input

cat << EOF
This is a multi-line string.
Everything between EOF markers
is treated as input.
EOF

This is useful for scripts that need to generate text blocks.
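One detail worth knowing when generating text blocks: whether the delimiter is quoted decides if variables inside the here document are expanded. The sketch below assumes a hypothetical user_name variable and writes to a scratch directory.

```shell
workdir=$(mktemp -d)
user_name="dana"   # hypothetical value for illustration

# Unquoted delimiter: the shell expands $user_name inside the block
cat > "$workdir/expanded.txt" << EOF
Hello, $user_name
EOF

# Quoted delimiter: the text is passed through untouched
cat > "$workdir/literal.txt" << 'EOF'
Hello, $user_name
EOF

cat "$workdir/expanded.txt"   # Hello, dana
cat "$workdir/literal.txt"    # Hello, $user_name
```

Quote the delimiter when the block contains dollar signs or backticks that should reach the output unchanged, for example when generating another script.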

Pipes (|) — The Plumbing

Think of a pipe (|) as a length of plumbing joining two commands: the stdout of the left command flows through it into the stdin of the right command. stderr is not carried along unless you redirect it.

# List files, keep the .txt files, count them
ls -la | grep '\.txt$' | wc -l

Let’s break this down step by step:

  1. ls -la lists every file with details
  2. | grep '\.txt$' takes that list and keeps only lines ending in .txt (the backslash makes the dot literal; an unescaped . matches any character)
  3. | wc -l takes the filtered list and counts how many lines remain

The output of each step feeds the next. No intermediate files needed.
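A pipe carries stdout only. The following sketch makes that visible with a hypothetical emit function that writes one line to each stream; merging stderr in with 2>&1 changes what the next command receives.

```shell
# emit is a hypothetical helper: one line on stdout, one on stderr
emit() {
  echo "data line"
  echo "error line" >&2
}

# Only stdout enters the pipe; the error line bypasses wc and goes to the terminal
stdout_only=$(emit | wc -l)

# 2>&1 sends stderr into the pipe as well, so wc sees both lines
merged=$(emit 2>&1 | wc -l)

echo "stdout only: $((stdout_only))"   # stdout only: 1
echo "merged: $((merged))"             # merged: 2
```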

A Realistic Log Analysis Pipeline

cat /var/log/syslog | grep "ERROR" | sort | uniq -c | sort -rn

Broken down:

  • cat reads the log file
  • grep "ERROR" keeps only error lines
  • sort arranges them alphabetically
  • uniq -c groups identical lines and counts them
  • sort -rn sorts by count, highest first

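This one-liner ranks identical error lines by frequency, even for large log files. Because uniq -c compares whole lines, two errors that differ only in their timestamp are counted separately; strip the timestamp first to group by message.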
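Real log lines usually start with a timestamp, which makes otherwise identical messages look different to uniq. Here is a sketch of one way to handle that, assuming a made-up log format where the first two space-separated fields are the date and the time.

```shell
log=$(mktemp)
cat > "$log" << 'EOF'
2026-06-06 10:00:01 ERROR: Connection timeout
2026-06-06 10:00:07 INFO: Request processed
2026-06-06 10:02:13 ERROR: Connection timeout
2026-06-06 10:05:40 ERROR: Disk full
EOF

# cut drops fields 1 and 2 (date and time) so identical messages become identical lines
top_errors=$(grep "ERROR" "$log" | cut -d' ' -f3- | sort | uniq -c | sort -rn)

echo "$top_errors"
rm -f "$log"
```

The first line of the result is the "Connection timeout" message with a count of 2. The field numbers passed to cut depend entirely on the log format, so adjust them for your own files.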

grep — Search Text

grep is like Ctrl+F on steroids. It searches for patterns in text.

# Basic search
grep "error" log.txt

# Ignore case: finds "Error", "ERROR", "error"
grep -i "warning" log.txt

# Search recursively through all files in a directory
grep -r "TODO" src/

# Invert match: show lines that do NOT contain the pattern
grep -v "debug" log.txt

# Count matches instead of showing them
grep -c "error" log.txt

# Show line numbers alongside matches
grep -n "error" log.txt

# Show only filenames that contain the pattern
grep -l "error" *.txt

# Extended regex: match "error", "fail", OR "crash"
grep -E "error|fail|crash" log.txt

sed — Stream Editor

sed edits text programmatically — like find-and-replace on steroids.

# Replace first "old" on each line with "new"
sed 's/old/new/' file.txt

# Replace ALL occurrences (global flag g)
sed 's/old/new/g' file.txt

# Replace only on lines 3 through 5
sed '3,5s/old/new/g' file.txt

# Delete lines containing "debug"
sed '/debug/d' file.txt

# Edit the file in-place (GNU sed syntax; BSD/macOS sed needs an argument after -i, e.g. -i '')
sed -i 's/foo/bar/g' config.txt

Why the syntax is weird: sed uses a compact command language. s means “substitute”, the / characters are delimiters, and g means “global”. Think of it as: s/search-regex/replacement/flags.
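The delimiter does not have to be a slash, which helps when the pattern itself contains slashes. A short sketch with made-up input strings:

```shell
# Any character after s becomes the delimiter; | saves escaping every slash in a path
with_pipe=$(echo "/usr/local/bin/tool" | sed 's|/usr/local|/opt|')

# Same substitution with the default delimiter, where each slash needs a backslash
with_slash=$(echo "/usr/local/bin/tool" | sed 's/\/usr\/local/\/opt/')

# In the replacement, & stands for whatever the regex matched
wrapped=$(echo "port 8080" | sed 's/[0-9][0-9]*/<&>/')

echo "$with_pipe"    # /opt/bin/tool
echo "$with_slash"   # /opt/bin/tool
echo "$wrapped"      # port <8080>
```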

awk — Pattern Scanning

awk is a mini programming language for structured text. Use it when you need to work with columns, sums, or conditional logic.

# Print the first column of each line
awk '{print $1}' file.txt

# Print columns 1 and 3 of whitespace-separated input, joined by a space
awk '{print $1, $3}' data.txt

# Filter: lines where column 3 is greater than 100
awk '$3 > 100 {print $1, $3}' scores.txt

# Parse a simple CSV (fields separated by comma; quoted fields that contain commas are not handled)
awk -F',' '{print $1, $3}' data.csv

# Sum all values in column 2
awk '{sum += $2} END {print "Total:", sum}' sales.txt
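The column-sum example extends naturally to per-group totals with an awk associative array. The sketch below assumes a hypothetical two-column file of region and amount.

```shell
sales=$(mktemp)
cat > "$sales" << 'EOF'
north 120
south 80
north 30
south 20
EOF

# totals[] is an awk array keyed by column 1; each line adds column 2 to its key
# for-in visits keys in no guaranteed order, so sort makes the output stable
region_totals=$(awk '{totals[$1] += $2} END {for (r in totals) print r, totals[r]}' "$sales" | sort)

echo "$region_totals"
rm -f "$sales"
```

This prints north 150 and south 100, one per line.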

cut, sort, uniq, tr

# cut: extract characters or fields by position
cut -c1-10 file.txt           # first 10 characters of each line
cut -d',' -f1,3 data.csv      # fields 1 and 3 from a CSV

# sort: arrange lines
sort names.txt                # alphabetical
sort -n scores.txt            # numeric sort
sort -r names.txt             # reverse order

# uniq: remove duplicates (must sort first!)
sort items.txt | uniq         # unique list
sort items.txt | uniq -c      # count occurrences
sort items.txt | uniq -d      # show only duplicates

# tr: translate or delete characters (tr reads stdin only, so redirect the file in)
tr 'a-z' 'A-Z' < file.txt       # to uppercase
tr -d ' ' < file.txt            # delete all spaces

Why must you sort before uniq? uniq only removes adjacent duplicates. If “apple” appears twice but separated by “banana”, both “apple” lines survive. Sorting first groups identical lines together.
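The apple and banana case can be checked directly. A minimal sketch, with the three input lines generated by printf:

```shell
# Without sorting, the two "apple" lines are not adjacent, so both survive
unsorted=$(printf 'apple\nbanana\napple\n' | uniq)

# After sorting, the duplicates sit next to each other and collapse into one
sorted=$(printf 'apple\nbanana\napple\n' | sort | uniq)

# sort -u does the same job in a single command when no counts are needed
short=$(printf 'apple\nbanana\napple\n' | sort -u)

echo "$unsorted"   # apple, banana, apple
echo "$sorted"     # apple, banana
```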

The Useless Use of Cat

# Wasteful: starts a process just to pipe to grep
cat file.txt | grep foo

# Direct: grep opens the file itself
grep foo file.txt

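The second form skips an extra process and an extra copy of the data through a pipe. grep, awk, sed, and sort all accept filenames directly; for tools that read only stdin, such as tr, use input redirection (tr 'a-z' 'A-Z' < file.txt).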

Common Mistakes

1. Forgetting sort before uniq

uniq removes only adjacent duplicates. Non-adjacent duplicates survive. Pipe sort into uniq, or use sort -u when you only need the unique lines.

2. Using > when you mean >>

> overwrites. >> appends. Accidentally overwriting a config file you meant to add to is a painful mistake.

3. Not quoting variables in awk

awk -v pattern="$search_term" '$0 ~ pattern' file.txt

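Without the double quotes around $search_term, the shell splits a value containing spaces into separate arguments, and awk misreads the extra words as its program text or as filenames.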
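Here is the quoted form in action, as a small sketch with a hypothetical search phrase that contains a space:

```shell
search_term="disk full"   # hypothetical phrase containing a space

# The double quotes keep the phrase together as one -v assignment
matches=$(printf 'all good\ndisk full on /dev/sda1\n' | awk -v pattern="$search_term" '$0 ~ pattern')

echo "$matches"   # disk full on /dev/sda1
```

Dropping the quotes would hand awk the word full as a separate argument, which it would try to run as the program.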

4. Confusing single and double quotes in sed

sed 's/$var/foo/' file.txt   # $var is literal text, not expanded
sed "s/$var/foo/" file.txt   # $var is expanded by the shell

Single quotes: literal. Double quotes: shell expands variables first.
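The difference shows up clearly when the input contains both the variable's value and the literal text $var. A sketch, assuming a hypothetical variable var set to "old":

```shell
var="old"   # hypothetical variable for illustration

# Single quotes: sed receives s/$var/new/ and looks for the literal text $var
single=$(echo 'old $var' | sed 's/$var/new/')

# Double quotes: the shell substitutes first, so sed receives s/old/new/
double=$(echo 'old $var' | sed "s/$var/new/")

echo "$single"   # old new
echo "$double"   # new $var
```

With double quotes, a value that contains a slash or other regex characters changes the sed command itself, so this form is safest with values you control.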

5. Using cat when not needed (Useless Use of Cat)

Let tools read files directly. grep foo file.txt instead of cat file.txt | grep foo.

6. Forgetting -i.bak with sed

sed -i 's/foo/bar/' file.txt   # No backup — can't undo
sed -i.bak 's/foo/bar/' file.txt   # Creates file.txt.bak
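The backup is what makes an in-place edit reversible. A minimal sketch on a throwaway file (the mode=foo content is made up):

```shell
conf=$(mktemp)
echo "mode=foo" > "$conf"

# Edit in place and keep the original next to it with a .bak suffix
sed -i.bak 's/foo/bar/' "$conf"

edited=$(cat "$conf")        # mode=bar
backup=$(cat "$conf.bak")    # mode=foo
echo "edited: $edited, backup: $backup"

# To undo the edit, move the backup over the changed file
mv "$conf.bak" "$conf"
cat "$conf"                  # mode=foo
```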

Practice Questions

  1. What does > do vs >>? > overwrites the destination file. >> appends to it.

  2. How do you count how many files in a directory end in .log? ls | grep "\.log$" | wc -l

  3. What does 2>&1 mean? Redirect file descriptor 2 (stderr) to wherever file descriptor 1 (stdout) is going — usually used to merge error and normal output.

  4. Why does uniq need sort before it? uniq removes only adjacent duplicates. Sorting first ensures all identical lines are grouped together.

  5. What does the s command in sed do? Substitute — replaces text matching a pattern with replacement text. s/old/new/g replaces all occurrences of “old” with “new”.

Challenge: Write a one-liner that reads a web server access log, finds all 404 errors, extracts the IP address (first column), counts how many 404s each IP generated, and shows the top 5 offenders sorted by count descending.

FAQ

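What is the difference between > and |?
> redirects output to a file. | redirects output to another command’s input. You can chain as many pipes as you like, but each command’s stdout ends up in only one place; use tee to write it to a file and pass it along at the same time.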
How do I save output AND see it on screen?
Use tee: command | tee output.txt shows the output on screen AND writes it to the file simultaneously.
What is the difference between grep and awk?
grep searches for text patterns. awk is a full programming language — it can search, transform, sum columns, format output, and more. If you need to sum a column, use awk; if you just need to find lines matching a pattern, use grep.
How do I edit a file in-place with sed?
sed -i 's/old/new/g' file.txt. Always add a backup extension: sed -i.bak 's/old/new/g' file.txt creates file.txt.bak before modifying.
How do I discard all output from a command?
command > /dev/null 2>&1 sends stdout to the null device and stderr to the same place. Nothing is shown.
What does the -r flag in grep do?
Recursive search — searches all files in a directory tree. Very useful for finding where something is used in a codebase.
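The tee answer above is worth trying once, because tee can sit in the middle of a pipeline as well as at the end. A sketch using a scratch file:

```shell
out=$(mktemp)

# tee copies its input to the file and to stdout, so the pipeline keeps going
line_count=$(printf 'one\ntwo\n' | tee "$out" | wc -l)
echo "lines seen downstream: $((line_count))"   # lines seen downstream: 2

# tee -a appends to the file instead of overwriting it
echo "three" | tee -a "$out" > /dev/null

cat "$out"   # one, two, three
```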

Try It Yourself

Open your terminal and create a test file to experiment:

# Create a sample log file
cat > /tmp/sample.log << EOF
2026-06-06 INFO: Server started
2026-06-06 ERROR: Connection timeout on port 8080
2026-06-06 WARN: Disk usage at 85%
2026-06-06 ERROR: Failed to connect to database
2026-06-06 INFO: Request processed
2026-06-06 ERROR: Connection timeout on port 8080
EOF

# Now experiment:
grep "ERROR" /tmp/sample.log
grep -c "ERROR" /tmp/sample.log
grep "ERROR" /tmp/sample.log | sort | uniq -c
cut -d' ' -f1,2 /tmp/sample.log                                  # date and log level
awk '{print $2}' /tmp/sample.log | sort | uniq -c | sort -rn    # count lines per log level

Try changing the grep pattern, adding more lines to the file, or chaining more commands with pipes.

What’s Next

Tutorial                  What You’ll Learn
Shell Scripts             Write reusable scripts with variables, loops, and functions
Permissions & Users       Manage file permissions, users, and groups
Python Text Processing    Compare Bash text tools with Python’s approach

Congratulations on completing this Bash I/O tutorial! Here’s where to go from here:

  • Practice daily — Consistency is more important than long study sessions
  • Build a project — Apply what you learned by building something real
  • Explore related topics — Check out other tutorials in the same category
  • Join the community — Discuss with other learners and share your progress

Remember: every expert was once a beginner. Keep coding!
