Bash Text Processing — Pipes, Redirection & Filters Explained
Every Bash command produces output. Pipes and redirection let you send that output to files, to other commands, or nowhere at all — turning simple commands into powerful data pipelines.
What You’ll Learn
- Redirect command output to files using > and >>
- Connect commands with pipes (|) to build data pipelines
- Search and filter text with grep, sed, and awk
- Sort, count, and transform data with sort, uniq, cut, and tr
- Avoid critical mistakes that can overwrite or lose data
Why Text Processing Matters
Real-world data doesn’t arrive neatly formatted. Log files contain millions of lines. CSVs need column extraction. Error reports require filtering. Durga Antivirus Pro processes gigabytes of scan logs daily — using grep to find threat patterns, awk to extract timestamps, and sort | uniq -c to count infection types. DodaZIP uses find | sort pipelines to batch-compress files by date. These one-liners save hours of manual work.
Learning Path
flowchart LR
A[Bash Basics] --> B[Pipes & Redirection<br/>You are here]
B --> C[Shell Scripts]
C --> D[Permissions & Users]
D --> E[System Monitoring]
Prerequisites: familiarity with basic commands such as ls, cat, and cd. Understanding Linux helps. No programming experience needed.
How Data Flows Through Commands
Think of every command as having three pipes attached:
- stdin (standard input, number 0) — where data comes in (keyboard by default)
- stdout (standard output, number 1) — where normal results go (screen by default)
- stderr (standard error, number 2) — where error messages go (screen by default)
flowchart LR
A[stdin<br/>0] --> B[Command]
B --> C[stdout<br/>1]
B --> D[stderr<br/>2]
C --> E["| Pipe to next command"]
C --> F["> Save to file"]
The key insight: These are just numbered channels. Redirection lets you rewire them. Send stdout to a file, stderr to a different file, or both to the same place.
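As a minimal sketch of that rewiring, the snippet below sends each channel to its own file. The emit function and the file names are hypothetical, invented only for this illustration.

```shell
# Hypothetical helper that writes one line to each output stream.
emit() {
  echo "normal result"          # goes to stdout (fd 1)
  echo "something broke" >&2    # goes to stderr (fd 2)
}

workdir=$(mktemp -d)

# Rewire each channel to its own file.
emit > "$workdir/out.txt" 2> "$workdir/err.txt"

out_content=$(cat "$workdir/out.txt")
err_content=$(cat "$workdir/err.txt")

echo "stdout file: $out_content"   # stdout file: normal result
echo "stderr file: $err_content"   # stderr file: something broke

rm -rf "$workdir"
```

Nothing appears on screen while emit runs, because both channels were pointed away from the terminal.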
Output Redirection — Saving Results
The > operator is like a funnel: it takes everything coming out of stdout and pours it into a file.
# List files and save the output to a file
ls -la > output.txt
# The file now contains the directory listing
cat output.txt
Watch out: > overwrites the file. Every time you use it, the old content is gone.
# Append instead — adds to the end
echo "new data" >> output.txt
# Now output.txt has both the old listing and the new line
Redirecting Errors
# Send errors to a file (normal output still shows on screen)
grep "error" log.txt 2> errors.txt
# Send everything to the same file
command > all-output.txt 2>&1
# The 2>&1 syntax means: "send stderr (2) to the same place as stdout (1)"
The Null Device — Throwing Output Away
# Send all output to /dev/null (a black hole)
command > /dev/null 2>&1
This is useful when you only care whether a command succeeds, not what it outputs.
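One common way to use this is inside an if statement, where only the exit status matters. The sketch below assumes a small made-up log file; the file name and its contents are illustrative only.

```shell
workdir=$(mktemp -d)
printf 'INFO: ok\nERROR: disk full\n' > "$workdir/app.log"

# Discard all output; only grep's exit status decides which branch runs.
if grep "ERROR" "$workdir/app.log" > /dev/null 2>&1; then
  status="errors found"
else
  status="clean"
fi
echo "$status"   # errors found

# Order matters: redirections are processed left to right.
# Written the other way round, stderr is pointed at wherever stdout
# currently goes (the terminal) before stdout is sent to /dev/null,
# so error messages would still be visible:
#   command 2>&1 > /dev/null

rm -rf "$workdir"
```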
Input Redirection — Reading from Files
# Read from a file instead of the keyboard
sort < unsorted.txt
The < symbol says “take the contents of this file and feed them as input.”
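A small sketch can make the difference between a filename argument and input redirection visible. The file contents below are made up for the illustration.

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/unsorted.txt"

# With a filename argument, wc knows the name and prints it after the count.
with_name=$(wc -l "$workdir/unsorted.txt")

# With input redirection, wc only sees an anonymous stream on stdin,
# so it prints just the count.
count_only=$(wc -l < "$workdir/unsorted.txt")

# sort reads the same kind of stream and writes the sorted lines to stdout.
sorted=$(sort < "$workdir/unsorted.txt")

echo "$with_name"
echo "$count_only"
echo "$sorted"

rm -rf "$workdir"
```

The redirected form is handy in scripts when you want a bare number without having to strip the filename afterwards.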
Here Documents — Multi-line Input
cat << EOF
This is a multi-line string.
Everything between EOF markers
is treated as input.
EOF
This is useful for scripts that need to generate text blocks.
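When generating text blocks, it helps to know that quoting the delimiter changes how the body is treated. The following sketch writes two here documents to temporary files; the variable name and file names are assumptions made for the example.

```shell
workdir=$(mktemp -d)
name="world"

# Unquoted delimiter: the shell expands variables inside the block.
cat > "$workdir/expanded.txt" << EOF
Hello, $name
EOF

# Quoted delimiter: the block is passed through literally.
cat > "$workdir/literal.txt" << 'EOF'
Hello, $name
EOF

expanded=$(cat "$workdir/expanded.txt")
literal=$(cat "$workdir/literal.txt")

echo "$expanded"   # Hello, world
echo "$literal"    # Hello, $name

rm -rf "$workdir"
```

Use the quoted form when the generated text contains dollar signs or backticks that must survive untouched, such as a script that writes another script.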
Pipes (|) — The Plumbing
Think of a pipe (|) as a physical water pipe connecting two faucets. The output of the left command flows through the pipe into the input of the right command.
# List files, find text files, count them
ls -la | grep '\.txt$' | wc -l
Let’s break this down step by step:
- ls -la lists every file with details
- | grep '\.txt$' takes that list and keeps only lines ending in .txt (an unescaped dot would match any character)
- | wc -l takes the filtered list and counts how many lines remain
The output of each step feeds the next. No intermediate files needed.
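To see what the pipe saves you, here is a rough comparison of the same count done with intermediate files and then as a single pipeline. The directory contents are invented for the sketch.

```shell
data=$(mktemp -d)
scratch=$(mktemp -d)
touch "$data/a.txt" "$data/b.txt" "$data/notes.md"

# Step by step, with intermediate files on disk.
ls "$data" > "$scratch/listing"
grep '\.txt$' "$scratch/listing" > "$scratch/matches"
count_files=$(wc -l < "$scratch/matches")

# The same work as one pipeline, with nothing written in between.
count_pipe=$(ls "$data" | grep '\.txt$' | wc -l)

echo "$count_files"
echo "$count_pipe"

rm -rf "$data" "$scratch"
```

Both approaches report two matching files; the pipeline simply skips the bookkeeping of creating and cleaning up the scratch files.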
A Realistic Log Analysis Pipeline
cat /var/log/syslog | grep "ERROR" | sort | uniq -c | sort -rn
Broken down:
- cat reads the log file
- grep "ERROR" keeps only error lines
- sort arranges them alphabetically
- uniq -c groups identical lines and counts them
- sort -rn sorts by count, highest first
This one-liner shows you the most frequent errors, no matter how large the log file.
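One caveat worth sketching: uniq -c only groups lines that are identical in full, and real log lines usually begin with a timestamp that makes every line different. The log format below is invented for illustration (date, time, level, message), so the field numbers passed to cut are assumptions you would adapt to your own logs.

```shell
workdir=$(mktemp -d)
log="$workdir/app.log"
cat > "$log" << 'EOF'
2026-06-06 10:00:01 ERROR Connection timeout
2026-06-06 10:00:05 ERROR Disk full
2026-06-06 10:00:09 INFO Request processed
2026-06-06 10:01:12 ERROR Connection timeout
EOF

# With timestamps left in, no two lines are identical, so the top count is 1.
naive_top=$(grep "ERROR" "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $1}')

# Dropping the first two fields (date and time) lets identical messages group.
grouped_top=$(grep "ERROR" "$log" | cut -d' ' -f3- | sort | uniq -c | sort -rn | head -n 1 | awk '{print $1}')

echo "$naive_top"     # 1
echo "$grouped_top"   # 2

rm -rf "$workdir"
```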
grep — Search Text
grep is like Ctrl+F on steroids. It searches for patterns in text.
# Basic search
grep "error" log.txt
# Ignore case: finds "Error", "ERROR", "error"
grep -i "warning" log.txt
# Search recursively through all files in a directory
grep -r "TODO" src/
# Invert match: show lines that do NOT contain the pattern
grep -v "debug" log.txt
# Count matches instead of showing them
grep -c "error" log.txt
# Show line numbers alongside matches
grep -n "error" log.txt
# Show only filenames that contain the pattern
grep -l "error" *.txt
# Extended regex: match "error", "fail", OR "crash"
grep -E "error|fail|crash" log.txt
sed — Stream Editor
sed edits text programmatically — like find-and-replace on steroids.
# Replace first "old" on each line with "new"
sed 's/old/new/' file.txt
# Replace ALL occurrences (global flag g)
sed 's/old/new/g' file.txt
# Replace only on lines 3 through 5
sed '3,5s/old/new/g' file.txt
# Delete lines containing "debug"
sed '/debug/d' file.txt
# Edit the file in-place (no separate output file)
sed -i 's/foo/bar/g' config.txt
Why the syntax is weird: sed uses a compact command language. s means “substitute”, the / characters are delimiters, and g means “global”. Think of it as: s/search-regex/replacement/flags.
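Because the delimiter is just the character that follows s, you can pick a different one when the text itself contains slashes. The sketch below uses made-up sample strings to show that, along with the effect of the g flag.

```shell
path_line="root=/usr/local/share"

# Escaping every slash works but is hard to read.
escaped=$(echo "$path_line" | sed 's/\/usr\/local/\/opt/')

# Using | as the delimiter avoids the escapes entirely.
piped=$(echo "$path_line" | sed 's|/usr/local|/opt|')

echo "$escaped"   # root=/opt/share
echo "$piped"     # root=/opt/share

# Without g, only the first match on each line is replaced.
first_only=$(echo "a-a-a" | sed 's/a/b/')
all=$(echo "a-a-a" | sed 's/a/b/g')

echo "$first_only"   # b-a-a
echo "$all"          # b-b-b
```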
awk — Pattern Scanning
awk is a mini programming language for structured text. Use it when you need to work with columns, sums, or conditional logic.
# Print the first column of each line
awk '{print $1}' file.txt
# Print columns 1 and 3 with a space between (fields are split on whitespace by default)
awk '{print $1, $3}' data.txt
# Filter: lines where column 3 is greater than 100
awk '$3 > 100 {print $1, $3}' scores.txt
# Parse a CSV (fields separated by comma)
awk -F',' '{print $1, $3}' data.csv
# Sum all values in column 2
awk '{sum += $2} END {print "Total:", sum}' sales.txt
cut, sort, uniq, tr
# cut: extract characters or fields by position
cut -c1-10 file.txt # first 10 characters of each line
cut -d',' -f1,3 data.csv # fields 1 and 3 from a CSV
# sort: arrange lines
sort names.txt # alphabetical
sort -n scores.txt # numeric sort
sort -r names.txt # reverse order
# uniq: remove duplicates (must sort first!)
sort items.txt | uniq # unique list
sort items.txt | uniq -c # count occurrences
sort items.txt | uniq -d # show only duplicates
# tr: translate or delete characters (tr reads only stdin, so redirect the file in)
tr 'a-z' 'A-Z' < file.txt # to uppercase
tr -d ' ' < file.txt # delete all spaces
Why must you sort before uniq? uniq only removes adjacent duplicates. If “apple” appears twice but separated by “banana”, both “apple” lines survive. Sorting first groups identical lines together.
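The apple and banana case can be checked directly. This sketch builds a three-line file (the file name is arbitrary) and counts the lines that survive with and without sorting first.

```shell
workdir=$(mktemp -d)
printf 'apple\nbanana\napple\n' > "$workdir/items.txt"

# The two "apple" lines are not adjacent, so uniq keeps all three lines.
without_sort=$(uniq "$workdir/items.txt" | wc -l)

# After sorting, the duplicates sit next to each other and collapse into one.
with_sort=$(sort "$workdir/items.txt" | uniq | wc -l)

echo "$without_sort"
echo "$with_sort"

rm -rf "$workdir"
```

The first count is 3 and the second is 2, which is the whole reason for the sort step.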
The Useless Use of Cat
# Wasteful: starts a process just to pipe to grep
cat file.txt | grep foo
# Direct: grep opens the file itself
grep foo file.txt
The second form avoids starting an extra process and copying the data through a pipe. grep, awk, sed, and sort all accept filenames directly.
Common Mistakes
1. Forgetting sort before uniq
uniq removes only adjacent duplicates. Non-adjacent duplicates survive. Always pipe sort before uniq.
2. Using > when you mean >>
> overwrites. >> appends. Accidentally overwriting a config file you meant to add to is a painful mistake.
3. Not quoting variables in awk
awk -v pattern="$search_term" '$0 ~ pattern' file.txt
Without the double quotes, the shell splits $search_term on whitespace, so a value containing spaces reaches awk as several broken arguments.
4. Confusing single and double quotes in sed
sed 's/$var/foo/' file.txt # $var is literal text, not expanded
sed "s/$var/foo/" file.txt # $var is expanded by the shell
Single quotes: literal. Double quotes: shell expands variables first.
5. Using cat when not needed (Useless Use of Cat)
Let tools read files directly. grep foo file.txt instead of cat file.txt | grep foo.
6. Forgetting -i.bak with sed
sed -i 's/foo/bar/' file.txt # No backup — can't undo
sed -i.bak 's/foo/bar/' file.txt # Creates file.txt.bak
Practice Questions
- What does > do vs >>? > overwrites the destination file. >> appends to it.
- How do you count how many files in a directory end in .log? ls | grep "\.log$" | wc -l
- What does 2>&1 mean? Redirect file descriptor 2 (stderr) to wherever file descriptor 1 (stdout) is going; usually used to merge error and normal output.
- Why does uniq need sort before it? uniq removes only adjacent duplicates. Sorting first ensures all identical lines are grouped together.
- What does the s command in sed do? Substitute: it replaces text matching a pattern with replacement text. s/old/new/g replaces all occurrences of “old” with “new”.
Challenge: Write a one-liner that reads a web server access log, finds all 404 errors, extracts the IP address (first column), counts how many 404s each IP generated, and shows the top 5 offenders sorted by count descending.
Try It Yourself
Open your terminal and create a test file to experiment:
# Create a sample log file
cat > /tmp/sample.log << EOF
2026-06-06 INFO: Server started
2026-06-06 ERROR: Connection timeout on port 8080
2026-06-06 WARN: Disk usage at 85%
2026-06-06 ERROR: Failed to connect to database
2026-06-06 INFO: Request processed
2026-06-06 ERROR: Connection timeout on port 8080
EOF
# Now experiment:
grep "ERROR" /tmp/sample.log
grep -c "ERROR" /tmp/sample.log
grep "ERROR" /tmp/sample.log | sort | uniq -c
cut -d' ' -f1,3 /tmp/sample.log
awk '{print $3}' /tmp/sample.log | sort | uniq -c | sort -rn
Try changing the grep pattern, adding more lines to the file, or chaining more commands with pipes.
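If you want known values to check your experiments against, the sketch below recreates the same six sample lines in a temporary file (the path is an assumption, chosen so nothing in /tmp is overwritten) and records a few results. It treats column 2 as the log level, which holds for this sample format only.

```shell
workdir=$(mktemp -d)
log="$workdir/sample.log"
cat > "$log" << 'EOF'
2026-06-06 INFO: Server started
2026-06-06 ERROR: Connection timeout on port 8080
2026-06-06 WARN: Disk usage at 85%
2026-06-06 ERROR: Failed to connect to database
2026-06-06 INFO: Request processed
2026-06-06 ERROR: Connection timeout on port 8080
EOF

# Number of lines containing ERROR.
error_count=$(grep -c "ERROR" "$log")

# Number of distinct error lines (the timeout appears twice).
distinct_errors=$(grep "ERROR" "$log" | sort | uniq | wc -l)

# Most frequent log level, taken from column 2.
top_level=$(awk '{print $2}' "$log" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "$error_count"       # 3
echo "$distinct_errors"
echo "$top_level"         # ERROR:

rm -rf "$workdir"
```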
What’s Next
| Tutorial | What You’ll Learn |
|---|---|
| Shell Scripts | Write reusable scripts with variables, loops, and functions |
| Permissions & Users | Manage file permissions, users, and groups |
| Python Text Processing | Compare Bash text tools with Python’s approach |
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro