Linux – Using ‘split’ command
The split command in Unix and Linux divides a large file into smaller files. It’s particularly useful when you need to process a large file in chunks or break data up for easier handling or transmission.
Here’s a detailed breakdown of how to use the split command:
Basic Syntax:
split [OPTION]... [INPUT [PREFIX]]
- INPUT: The name of the file you want to split.
- PREFIX: The prefix for the names of the output files. By default, the output files are named xaa, xab, xac, etc. If a prefix is provided, the output files use that prefix (e.g., fileaa, fileab, etc.).
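As a minimal sketch of the naming rules above, the following runs split with no options and then with a prefix. The file name sample.txt and the temporary directory are hypothetical, and the 1000-lines-per-file default is the behavior of GNU split when no size option is given.

```shell
# Work in a throwaway directory so the generated files are easy to clean up.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# No options: 1000 lines per output file, default prefix "x".
split sample.txt
default_names=$(echo x*)

# Same split with a custom prefix.
split sample.txt file
prefixed_names=$(echo file*)

echo "$default_names"    # xaa xab xac
echo "$prefixed_names"   # fileaa fileab fileac
```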
Commonly Used Options:
1. Split by Size:
To split a file based on size, use the following options:
split -b SIZE INPUT PREFIX
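- SIZE: Size of each chunk. A plain number means bytes; add a suffix for kilobytes (K), megabytes (M), or gigabytes (G). In GNU split these suffixes are powers of 1024 (10M is 10 × 1024 × 1024 bytes), while KB, MB, and GB are powers of 1000.
- Example:
split -b 10M largefile part_
- This splits largefile into chunks of 10 MB each, with names like part_aa, part_ab, etc. The last chunk holds whatever remains and may be smaller.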
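To check how chunk sizes come out in practice, here is a small sketch that splits a 2500-byte file into 1K pieces and prints each piece's size. The input is a hypothetical stand-in built from /dev/zero; a real large file behaves the same way.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: 2500 zero bytes standing in for a large file.
head -c 2500 /dev/zero > largefile

# 1K means 1024 bytes, so the pieces are 1024, 1024 and 452 bytes.
split -b 1K largefile part_

for f in part_*; do
  printf '%s %s\n' "$f" "$(wc -c < "$f")"
done

last_size=$(wc -c < part_ac)
```

Every chunk except the last has exactly the requested size, which is worth remembering when a downstream tool expects fixed-size pieces.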
2. Split by Number of Lines:
To split a file based on the number of lines in each chunk:
split -l NUMBER INPUT PREFIX
- NUMBER: The number of lines each output file should have.
- Example:
split -l 1000 data.txt output_
- This splits data.txt into files of 1000 lines each, with names like output_aa, output_ab, etc. The last file holds any remaining lines.
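A quick way to confirm the line counts is to run wc -l over the output files. This sketch assumes a hypothetical 2500-line data.txt generated with seq.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > data.txt

split -l 1000 data.txt output_

# Per-file line counts: 1000, 1000, then the 500 leftover lines.
wc -l output_*

first_line_of_second=$(head -n 1 output_ab)
echo "$first_line_of_second"   # 1001
```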
3. Split by Number of Files:
To split a file into a specific number of output files:
split -n NUMBER INPUT PREFIX
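- NUMBER: The number of chunks (files) to create.
- Example:
split -n 5 bigfile part_
- This splits bigfile into 5 parts of roughly equal size in bytes, named part_aa through part_ae. Because the division is by byte count, a line may be cut across two files; use -n l/5 to keep lines whole.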
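The difference between a plain -n 5 and the line-preserving -n l/5 form of GNU split is easiest to see on a small text file. In this sketch the input (the numbers 1 to 100, 292 bytes) and the line_ prefix are hypothetical; it checks whether the first piece of each split ends with a newline.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: 100 short lines, 292 bytes in total.
seq 1 100 > bigfile

split -n 5 bigfile part_     # divide by bytes, lines may be cut
split -n l/5 bigfile line_   # divide by bytes, but only at line ends

byte_parts=$(ls part_* | wc -l)
total_bytes=$(cat part_* | wc -c)
total_lines=$(cat line_* | wc -l)

# tail -c 1 prints the last byte; wc -l counts it only if it is a newline.
part_aa_ends_with_newline=$(tail -c 1 part_aa | wc -l)
line_aa_ends_with_newline=$(tail -c 1 line_aa | wc -l)

echo "byte split ends on newline: $part_aa_ends_with_newline"
echo "line split ends on newline: $line_aa_ends_with_newline"
```

For text data that will be processed line by line, the l/N form is usually the safer choice.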
4. Split with Numeric Suffix:
By default, split uses alphabetical suffixes (xaa, xab, etc.). To use numeric suffixes instead:
split --numeric-suffixes=1 INPUT PREFIX
- The value after = is the number of the first output file. The short form -d starts counting from 0.
- Example:
split --numeric-suffixes=1 largefile part_
- This creates files with names like part_01, part_02, etc.
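The following sketch contrasts --numeric-suffixes=1 with the short -d form of GNU split, which starts at 0. The 30-line input and the zero_ prefix are hypothetical.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: 30 lines, split 10 lines at a time.
seq 1 30 > largefile

split -l 10 --numeric-suffixes=1 largefile part_
from_one=$(echo part_*)

split -l 10 -d largefile zero_
from_zero=$(echo zero_*)

echo "$from_one"    # part_01 part_02 part_03
echo "$from_zero"   # zero_00 zero_01 zero_02
```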
5. Custom Suffix Length:
To specify the length of the suffix (the default is 2 characters):
split -a LENGTH INPUT PREFIX
- Example:
split -a 3 largefile part_
- This creates files with names like part_aaa, part_aab, etc., where the suffix is 3 characters long.
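Suffix length also applies to numeric suffixes, which is a common way to get zero-padded names. A short sketch, assuming GNU split and a hypothetical 30-line input with a num_ prefix:

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

seq 1 30 > largefile

# Three-letter alphabetic suffixes.
split -l 10 -a 3 largefile part_
alpha_names=$(echo part_*)

# Three-digit numeric suffixes (-d starts at 0).
split -l 10 -a 3 -d largefile num_
numeric_names=$(echo num_*)

echo "$alpha_names"     # part_aaa part_aab part_aac
echo "$numeric_names"   # num_000 num_001 num_002
```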
6. Verbose Output:
To see which files are being created during the split operation:
split --verbose INPUT PREFIX
- This option will print the name of each output file as it is being created.
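The verbose messages can be captured for logging. In this sketch the exact wording and quoting of each message depends on the split version and locale, so the code only counts the lines that mention the prefix; the input file is hypothetical.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

seq 1 30 > largefile

# LC_ALL=C keeps the messages in plain ASCII; 2>&1 captures them
# whichever stream the installed version writes to.
log=$(LC_ALL=C split --verbose -l 10 largefile part_ 2>&1)
echo "$log"

# One message per output file is expected.
created=$(echo "$log" | grep -c 'part_')
echo "files reported: $created"
```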
7. Split by Size Without Breaking Lines:
If you want to limit the size of each chunk but keep lines intact:
split -C SIZE INPUT PREFIX
- This puts as many complete lines as fit into each output file without exceeding SIZE bytes. A single line longer than SIZE is still split across files.
- Example:
split -C 1M largefile part_
- This ensures each file is no larger than 1 MB and doesn’t split inside lines.
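The -C option packs whole lines into each file up to the byte limit, whereas -b cuts at the exact byte. The sketch below runs both on the same tiny file so the difference is visible; the four-word input and the c_ and b_ prefixes are hypothetical.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: four lines of 6, 5, 6 and 6 bytes (23 bytes total).
printf 'alpha\nbeta\ngamma\ndelta\n' > words.txt

# At most 12 bytes per file, whole lines only.
split -C 12 words.txt c_

# Exactly 12 bytes per file, lines may be cut.
split -b 12 words.txt b_

echo "--- c_aa (-C):"; cat c_aa          # alpha, beta
echo "--- b_aa (-b):"; cat b_aa; echo    # alpha, beta, then a lone "g"

c_first_size=$(wc -c < c_aa)
b_first_last_byte=$(tail -c 1 b_aa)
```

With -C the first file stops at 11 bytes because the next line would not fit, while -b fills all 12 bytes and cuts "gamma" after its first letter.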
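8. Round Robin Split:
If you want to split a file by distributing lines in a round-robin manner across several output files:
split --number=r/N INPUT PREFIX
- N specifies how many files you want to split into, and r selects round-robin distribution: line 1 goes to the first file, line 2 to the second, and so on, wrapping around after file N. The similar form l/N also keeps lines whole, but gives each file a contiguous block of lines instead.
- Example:
split --number=r/3 inputfile part_
- This distributes the lines of inputfile across 3 files in a round-robin fashion.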
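In GNU split, round-robin distribution is selected with the r/N form of --number. A minimal sketch on a hypothetical nine-line input shows which lines land in which file:

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: the numbers 1 to 9, one per line.
seq 1 9 > inputfile

split --number=r/3 inputfile part_

# paste -s joins each file's lines onto one row for display.
first=$(paste -s -d ' ' part_aa)
second=$(paste -s -d ' ' part_ab)
third=$(paste -s -d ' ' part_ac)

echo "part_aa: $first"    # 1 4 7
echo "part_ab: $second"   # 2 5 8
echo "part_ac: $third"    # 3 6 9
```

Because the lines are interleaved, concatenating these files with cat does not restore the original order.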
Examples:
- Split a file into chunks of 1 MB each:
split -b 1M largefile part_
This splits largefile into chunks of 1 MB each, with file names starting from part_aa, part_ab, and so on.
- Split a file into files with 500 lines each:
split -l 500 inputfile segment_
This splits inputfile into multiple files, each containing 500 lines, with names like segment_aa, segment_ab, etc.
- Split a file into 4 equal parts:
split -n 4 inputfile chunk_
This splits inputfile into 4 files of roughly equal size in bytes, named chunk_aa through chunk_ad.
Handling Binary Files:
The -b option counts raw bytes and never interprets the content, so it is safe for binary files and introduces no data corruption. Example:
split -b 512k binaryfile binpart_
This will split a binary file into 512 KB chunks.
Recombining Split Files:
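To reassemble the split files back into one, use the cat command:
cat part_* > combined_file
The shell expands part_* in sorted order, which matches the order in which split created the files as long as all suffixes have the same length, so the pieces are concatenated back into a single file identical to the original. This applies to splits by size, line count, or chunk number, not to round-robin splits.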
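It is worth verifying a reassembled file before deleting the pieces. The sketch below does a full round trip on random binary data and compares the result byte for byte with cmp; the file names and sizes are hypothetical.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical binary input: 100000 random bytes.
head -c 100000 /dev/urandom > original.bin

# Four pieces: three of 30000 bytes and one of 10000.
split -b 30000 original.bin part_
piece_count=$(ls part_* | wc -l)

# Reassemble in sorted (creation) order.
cat part_* > combined_file

# cmp -s is silent and exits 0 only when the files are identical.
if cmp -s original.bin combined_file; then
  roundtrip="identical"
else
  roundtrip="different"
fi
echo "pieces: $piece_count, result: $roundtrip"
```

When the pieces travel between machines, comparing a checksum of the original against the reassembled file serves the same purpose.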
Conclusion:
The split command is a powerful tool for dividing files into smaller pieces based on size, line count, or number of output files. It offers flexibility with custom prefixes, suffix lengths, and verbose output for ease of use. It’s commonly used when managing large datasets, splitting logs, or distributing files across systems.