sort and uniq Commands in Linux — Sort & Deduplicate Data

DodaTech Updated Jun 20, 2026 5 min read

The sort and uniq commands organize and deduplicate text data in Linux — from sorting log files to counting unique visitors in access logs. Combined in pipelines, they form one of the most powerful data-processing toolchains in the shell.

What You’ll Learn

You’ll master sorting by columns, numeric values, and months; extracting unique lines; counting duplicate occurrences; and combining sort with uniq to summarize structured data like logs and CSVs.

Why sort and uniq Matter

Unstructured data is noise — sorted data is signal. System administrators use sort+uniq to find the most frequent error codes, count unique IP addresses, and identify duplicate entries in configuration files. DodaZIP uses sort internally to organize archive manifests, and Durga Antivirus Pro ranks threat signatures by frequency using these same commands.

Learning Path

flowchart LR
  A[Essential Commands] --> B[Text Processing Tools]
  B --> C[sort & uniq<br/>You are here]
  C --> D[cut & tr]
  C --> E[awk & sed]
  style C fill:#f90,color:#fff
  
Prerequisites: Familiarity with essential Linux commands and piping (|). These commands work on any Linux distribution.

Syntax Overview

sort [options] [file...]
uniq [options] [input_file [output_file]]

sort Options Table

Option  Description
-n      Numeric sort (89 before 123)
-r      Reverse (descending) order
-k      Sort by column / field (e.g., -k2,2 for field 2 only)
-t      Field separator (default: whitespace)
-u      Unique lines (same as sort | uniq)
-M      Sort by month (Jan, Feb, …)
-h      Human-readable numbers (2K, 3G)
-s      Stable sort (preserve original order for ties)
-f      Case-insensitive sort

uniq Options Table

Option  Description
-c      Prefix lines by count of occurrences
-d      Only print duplicate lines
-u      Only print unique (non-duplicate) lines
-i      Case-insensitive comparison
-w N    Compare at most N characters per line

Examples

Example 1: Basic sort

$ cat fruits.txt
banana
apple
cherry
date

$ sort fruits.txt
apple
banana
cherry
date

Lines are sorted alphabetically (lexicographically) by default.

Example 2: Numeric Sort (-n)

$ cat numbers.txt
10
2
33
1

$ sort numbers.txt
1
10
2
33

$ sort -n numbers.txt
1
2
10
33

Without -n, sort compares lines character by character, so 10 comes before 2 because the character 1 sorts before the character 2.

Example 3: Reverse Sort (-r)

$ sort -r numbers.txt
33
2
10
1

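-r reverses any sort order. Here the order is still lexicographic, which is why 2 appears before 10; combine -r with -n (sort -rn) to get a numeric descending order, which is useful for top-N lists.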
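
For a numeric top-N list, -r is usually combined with -n. A minimal sketch, reusing the values from Example 2 but feeding them inline with printf instead of reading numbers.txt:

```shell
# Sort numerically in descending order, then keep the two largest values.
top2=$(printf '10\n2\n33\n1\n' | sort -rn | head -n 2)
echo "$top2"
```

The head -n 2 at the end is what turns the sorted list into a "top 2"; change the number to keep more or fewer lines.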

Example 4: Sort by Column (-k)

$ cat employees.txt
Alice 45000
Bob   38000
Carol 52000

$ sort -k2 -n employees.txt
Bob   38000
Alice 45000
Carol 52000

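Sorts by column 2 (salary) numerically. Note that -k2 defines a key that runs from field 2 to the end of the line; write -k2,2 to restrict the key to that single field. The -t flag changes the field separator; for CSVs use -t','.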
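
The -t and -k options can be combined to sort comma-separated data. The sketch below uses a small made-up CSV (the names, departments, and three-column layout are assumptions for illustration) and sorts it by the third column:

```shell
# Hypothetical CSV data: name,department,salary
csv='Alice,Marketing,45000
Bob,Engineering,38000
Carol,Marketing,52000'

# -t',' makes the comma the field separator.
# -k3,3n restricts the key to field 3 only and compares it numerically.
by_salary=$(printf '%s\n' "$csv" | sort -t',' -k3,3n)
echo "$by_salary"
```

Writing the key as -k3,3n rather than -k3 -n keeps the numeric flag attached to that one field, which matters once more than one key is used.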

Example 5: Unique Lines with sort -u

$ cat duplicates.txt
alpha
beta
alpha
gamma
beta

$ sort -u duplicates.txt
alpha
beta
gamma

sort -u sorts and removes duplicates in one pass.

Example 6: Sort by Month (-M)

$ cat months.txt
Mar
Jan
Dec
Apr
Nov

$ sort -M months.txt
Jan
Mar
Apr
Nov
Dec

-M understands three-letter month abbreviations (as defined by the current locale) and orders them chronologically rather than alphabetically.
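
Month sorting is often paired with a numeric day. The following sketch assumes made-up "month day" lines and GNU-style key modifiers (M and n appended to -k); LC_ALL=C is set so the English abbreviations are recognized regardless of the system locale:

```shell
# Hypothetical date list: month abbreviation, then day of month.
dates='Mar 14
Jan 30
Mar 2
Dec 25
Jan 4'

# First key: field 1 compared as a month name (M).
# Second key: field 2 compared numerically (n), used when months are equal.
sorted=$(printf '%s\n' "$dates" | LC_ALL=C sort -k1,1M -k2,2n)
echo "$sorted"
```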

Example 7: Count Duplicates with uniq -c

$ cat access.log
192.168.1.1
192.168.1.2
192.168.1.1
192.168.1.3
192.168.1.1

$ sort access.log | uniq -c
      3 192.168.1.1
      1 192.168.1.2
      1 192.168.1.3

uniq only collapses duplicates that sit on adjacent lines, so pipe the data through sort first to group identical lines together.
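
To see why the sort step matters, the sketch below runs uniq on the same five addresses with and without sorting. The addresses are supplied inline through a variable rather than read from access.log:

```shell
# Same addresses as in the example; the duplicates are not adjacent.
ips='192.168.1.1
192.168.1.2
192.168.1.1
192.168.1.3
192.168.1.1'

# Without sort: no two equal lines are neighbours, so nothing is merged.
unsorted_count=$(printf '%s\n' "$ips" | uniq | wc -l)

# With sort: equal lines become neighbours and collapse to one each.
sorted_count=$(printf '%s\n' "$ips" | sort | uniq | wc -l)

echo "without sort: $unsorted_count lines, with sort: $sorted_count lines"
```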

Example 8: Only Duplicates (uniq -d)

$ sort access.log | uniq -d
192.168.1.1

Only lines that appear more than once are shown.
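
The -u option from the table above is the complement of -d. A short sketch on the same addresses (supplied inline rather than read from access.log) shows both side by side:

```shell
# Same addresses as in Example 7.
ips='192.168.1.1
192.168.1.2
192.168.1.1
192.168.1.3
192.168.1.1'

# -d prints one copy of each line that occurs more than once.
repeated=$(printf '%s\n' "$ips" | sort | uniq -d)

# -u prints only the lines that occur exactly once.
singles=$(printf '%s\n' "$ips" | sort | uniq -u)

printf 'repeated:\n%s\n\nseen once:\n%s\n' "$repeated" "$singles"
```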

Example 9: Pipe sort | uniq — Top Error Codes

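$ grep "ERROR" /var/log/syslog | sed 's/.*ERROR[: ]*//' | sort | uniq -c | sort -rn
    127 Connection refused
     83 Timeout occurred
     12 Disk full

This pipeline keeps the message text that follows the word ERROR on each matching line (assuming the log writes the message after that word), counts each distinct message, and sorts by frequency in descending order. grep reads the file directly, so no cat is needed. Printing only the last field with awk '{print $NF}' would not work here, because it would reduce "Connection refused" to "refused".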

Example 10: Stable Sort (-s)

$ cat data.csv
Alice 45000 Marketing
Bob   38000 Engineering
Carol 52000 Marketing
David 41000 Engineering

$ sort -s -k3 data.csv
Bob   38000 Engineering
David 41000 Engineering
Alice 45000 Marketing
Carol 52000 Marketing

Stable sort preserves the original input order for lines with the same key. Here, Bob stays ahead of David and Alice ahead of Carol because that is the order in which they appear in the input (which happens to be alphabetical).
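
The effect of -s only becomes visible when tied lines are not already in alphabetical order. The sketch below uses a made-up variation of the data in which Carol is listed before Alice, and LC_ALL=C to keep the comparison rules predictable:

```shell
# Hypothetical input where the tied lines are NOT in alphabetical order.
rows='Carol 52000 Marketing
Alice 45000 Marketing
Bob 38000 Engineering'

# Without -s, ties on the key fall back to comparing whole lines,
# so Alice moves ahead of Carol.
default_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -k3,3)

# With -s, tied lines keep their input order: Carol stays ahead of Alice.
stable_order=$(printf '%s\n' "$rows" | LC_ALL=C sort -s -k3,3)

printf 'default:\n%s\n\nstable:\n%s\n' "$default_order" "$stable_order"
```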

Common Use Cases

Use Case                       Command
Sort files by size             ls -l | sort -k5 -n
Find top 5 IP addresses        sort access.log | uniq -c | sort -rn | head -5
Remove duplicate lines         sort -u file.txt > cleaned.txt
Sort by human-readable size    du -sh * | sort -h
Count logins per user          last | awk '{print $1}' | sort | uniq -c

Common Errors

  • uniq without sort: uniq only compares adjacent lines — if data isn’t sorted, duplicates won’t be detected.
  • sort -n on non-numeric data: If a field contains non-numeric characters, sort -n treats it as 0.
  • Column sorting with wrong delimiter: Use -t to set the field separator (e.g., -t',' for CSV).
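  • Case sensitivity: By default both sort and uniq are case-sensitive. Use sort -f and uniq -i to ignore case.
  • Large file performance: Sorting files larger than RAM can be slow because sort spills to temporary files. With GNU sort, -S sets the memory buffer size (e.g., -S 1G) and -T selects the temporary directory.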
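
The case-sensitivity point can be illustrated with a short sketch. The word list is made up, and only the counts are printed so the result does not depend on which spelling uniq happens to keep:

```shell
# Hypothetical word list with mixed case.
words='apple
Banana
APPLE
banana
Apple'

# sort -f folds case while ordering, so all spellings of a word end up adjacent.
# uniq -i folds case while comparing adjacent lines; -c adds the count.
# awk keeps just the count column.
counts=$(printf '%s\n' "$words" | sort -f | uniq -ic | awk '{print $1}')
echo "$counts"
```

Leaving out either -f or -i would split the three spellings of "apple" into separate groups.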

Practice Exercises

  1. Basic sort: Create a file with 10 random words and sort them alphabetically.
  2. Numeric + reverse: Sort a list of numbers from highest to lowest.
  3. Column sort: Sort a CSV by column 3 (numeric).
  4. Dedup + count: Count how many times each word appears in a text file.
  5. Log analysis: Find the top 10 most frequent IP addresses in access.log.

Challenge

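Write a one-liner that reads /var/log/auth.log, extracts failed login usernames, sorts them, counts occurrences, and displays only usernames that failed more than 5 times, sorted by most failures first. Durga Antivirus Pro uses similar patterns to detect brute-force attacks.

The field numbers below assume the traditional syslog line format written by OpenSSH, where the username is field 9, or field 11 when the line reads "Failed password for invalid user NAME".

grep "Failed password" /var/log/auth.log | awk '{print ($9 == "invalid" ? $11 : $9)}' | sort | uniq -c | awk '$1 > 5' | sort -rn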

Real-World Task

Analyze a web server access log to find:

  1. The busiest hour of the day
  2. The most requested URL
  3. The top 5 referrer domains

Use sort, uniq, cut, and head in a pipeline.
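
One possible starting point is sketched below. It assumes the log uses the Combined Log Format and works on three made-up lines held in a variable; the rank helper is a hypothetical name introduced here for illustration, and the field positions would need adjusting for other log layouts.

```shell
# Three made-up lines in Combined Log Format (an assumption about the log layout).
log='203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://example.com/start" "Mozilla/5.0"
203.0.113.9 - - [10/Oct/2023:13:58:01 +0000] "GET /about.html HTTP/1.1" 200 512 "https://example.org/" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://example.com/news" "Mozilla/5.0"'

# Hypothetical helper: count identical lines and print the most frequent first.
rank() { sort | uniq -c | sort -rn; }

# 1. Busiest hour: the second ':'-separated field is the hour.
busiest_hour=$(printf '%s\n' "$log" | cut -d: -f2 | rank | head -n 1)

# 2. Most requested URL: the seventh space-separated field is the request path.
top_url=$(printf '%s\n' "$log" | cut -d' ' -f7 | rank | head -n 1)

# 3. Top 5 referrer domains: splitting on '"', field 4 is the referrer URL;
#    splitting that URL on '/', field 3 is the host name.
top_referrers=$(printf '%s\n' "$log" | cut -d'"' -f4 | cut -d/ -f3 | rank | head -n 5)

printf '%s\n' "$busiest_hour" "$top_url" "$top_referrers"
```

To run this against a real file, replace each printf '%s\n' "$log" with cat on the log path, or pass the file name to the first cut directly.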

What is the sort command?

The sort command arranges lines of text files in a specified order — alphabetical, numeric, by month, or by any column — and outputs the result to stdout.

What is the uniq command?

The uniq command filters adjacent duplicate lines from sorted input, optionally counting them or showing only duplicates.
