The split command in Unix and Linux is used to divide a large file into smaller files. It’s particularly useful when you need to handle a large file in chunks or split data for easier processing or transmission.
Here’s a detailed breakdown of the usage of the split command:
Basic Syntax:
split [OPTION]... [INPUT [PREFIX]]
- INPUT: The name of the file you want to split.
- PREFIX: The prefix for the names of the output files. By default, the output files are named xaa, xab, xac, etc. If a prefix is provided, the output files use that prefix (e.g., fileaa, fileab, etc.).
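The naming rules above can be checked with a short sketch. It assumes a POSIX shell with seq and mktemp available; the scratch directory and the file name numbers.txt are illustrative, not part of the original text. With no options, split writes 1000 lines per output file.

```shell
#!/bin/sh
# Sketch: default output names and the effect of a PREFIX argument.
set -eu
workdir=$(mktemp -d)               # scratch directory (illustrative)
trap 'rm -rf "$workdir"' EXIT      # clean up when the script exits
cd "$workdir"

seq 1 2500 > numbers.txt           # sample input: 2500 lines

split numbers.txt                  # no options, no prefix: 1000 lines per file
default_names=$(echo x*)           # glob expands in sorted order
echo "default names:  $default_names"

split numbers.txt file             # same split, with the prefix "file"
prefixed_names=$(echo file*)
echo "prefixed names: $prefixed_names"

last_lines=$(wc -l < xac | tr -d ' ')
echo "lines in the last default chunk: $last_lines"
```

The last chunk simply holds whatever is left over after the full 1000-line chunks.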
Commonly Used Options:
1. Split by Size:
To split a file based on size, use the following options:
split -b SIZE INPUT PREFIX
- SIZE: Size of each chunk. You can specify the size in bytes (default), kilobytes (K), megabytes (M), or gigabytes (G). In GNU split these suffixes are powers of 1024, while KB, MB, and GB are powers of 1000.
- Example: split -b 10M largefile part_
This splits largefile into chunks of 10 MB each (the last chunk holds the remainder) with names like part_aa, part_ab, etc.
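As a rough illustration of size-based splitting, the sketch below builds a 2,500,000-byte file and cuts it into 1M chunks. It assumes a split that accepts the M suffix as 1024*1024 bytes (as GNU split does) and a head that supports -c; the file names are illustrative.

```shell
#!/bin/sh
# Sketch: split by size and inspect the resulting chunk sizes.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 2500000 /dev/zero > largefile    # 2,500,000 bytes of sample data

split -b 1M largefile part_              # 1M = 1,048,576 bytes per chunk

for f in part_*; do
    printf '%s %s bytes\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
done

part_count=$(ls part_* | wc -l | tr -d ' ')
first_size=$(wc -c < part_aa | tr -d ' ')
last_size=$(wc -c < part_ac | tr -d ' ')   # remainder: 2,500,000 - 2 * 1,048,576
```

Every chunk except the last has exactly the requested size; the last one holds the remaining bytes.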
2. Split by Number of Lines:
To split a file based on the number of lines in each chunk:
split -l NUMBER INPUT PREFIX
- NUMBER: The number of lines each output file should have.
- Example: split -l 1000 data.txt output_
This splits data.txt into files of 1000 lines each with names like output_aa, output_ab, etc.
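A small sketch of line-based splitting, assuming a POSIX shell with seq available; data.txt here is generated sample data with one number per line, so the chunk boundaries are easy to see.

```shell
#!/bin/sh
# Sketch: split by line count and check where each chunk starts.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 2500 > data.txt                    # 2500 lines: "1" through "2500"

split -l 1000 data.txt output_

wc -l output_*                           # line count per chunk

second_start=$(head -n 1 output_ab)      # first line of the second chunk
last_lines=$(wc -l < output_ac | tr -d ' ')
echo "output_ab starts at line $second_start; output_ac has $last_lines lines"
```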
3. Split by Number of Files:
To split a file into a specific number of output files:
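split -n NUMBER INPUT PREFIX
- NUMBER: The number of chunks (files) to create.
- Example: split -n 5 bigfile part_
This splits bigfile into 5 parts of roughly equal size in bytes, named part_aa, part_ab, etc. Because the cut points are byte offsets, a part can end in the middle of a line; use -n l/5 to keep lines whole.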
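The difference between a plain chunk count and a line-aligned chunk count can be sketched as follows. This assumes GNU coreutils split (the l/N form in particular); bigfile is generated sample data.

```shell
#!/bin/sh
# Sketch: -n 5 cuts at byte offsets, -n l/5 cuts only at line boundaries.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 1000 > bigfile

split -n 5 bigfile part_                 # five parts, cut by byte offset
byte_parts=$(ls part_* | wc -l | tr -d ' ')
wc -c part_*                             # sizes are close but need not be identical

split -n l/5 bigfile line_               # five parts, never cutting inside a line
whole=0
for f in line_*; do
    # A part made of complete lines ends with a newline character.
    if [ "$(tail -c 1 "$f" | wc -l | tr -d ' ')" = "1" ]; then
        whole=$((whole + 1))
    fi
done
echo "line-aligned parts ending in a newline: $whole"

# Either way, concatenating the parts in order reproduces the input.
if cat part_* | cmp -s - bigfile; then roundtrip=ok; else roundtrip=mismatch; fi
echo "byte-split round trip: $roundtrip"
```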
4. Split with Numeric Suffix:
By default, split uses alphabetical suffixes (xaa, xab, etc.). To use numeric suffixes instead:
split --numeric-suffixes=1 INPUT PREFIX
- Example: split --numeric-suffixes=1 largefile part_
This creates files with names like part_01, part_02, etc.
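A brief sketch of numeric suffixes, assuming GNU coreutils split. It contrasts --numeric-suffixes=1 with the short form -d, which starts numbering at 0. The -l 1000 option and the file names are added here only to produce a few small chunks.

```shell
#!/bin/sh
# Sketch: numeric suffixes starting at 1 versus the -d default of 0.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 2500 > data.txt

split -l 1000 --numeric-suffixes=1 data.txt part_
from_one=$(echo part_*)
echo "starting at 1: $from_one"

split -l 1000 -d data.txt zero_          # -d: numeric suffixes starting at 0
from_zero=$(echo zero_*)
echo "starting at 0: $from_zero"
```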
5. Custom Suffix Length:
To specify the length of the suffix (the default is 2 characters):
split -a LENGTH INPUT PREFIX
- Example: split -a 3 largefile part_
This creates files with names like part_aaa, part_aab, etc., where the suffix is 3 characters long.
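The effect of the suffix length can be seen in a minimal sketch like the one below; the sample file and the -l 1000 chunking are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: three-character alphabetic suffixes with -a 3.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 2500 > data.txt

split -l 1000 -a 3 data.txt part_
names=$(echo part_*)
echo "$names"
```

With alphabetic suffixes, a length of L allows 26^L distinct names, so -a 3 leaves room for 17,576 output files.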
6. Verbose Output:
To see which files are being created during the split operation:
split --verbose INPUT PREFIX
- This option prints the name of each output file as it is created.
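As a hedged sketch, the verbose report can be captured and counted. This assumes GNU coreutils split; the exact wording and quoting of each diagnostic line varies between versions, so the example only counts lines rather than matching text.

```shell
#!/bin/sh
# Sketch: capture the --verbose report and count one line per output file.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 2500 > data.txt

# Capture both streams so the sketch does not depend on which one is used.
report=$(split --verbose -l 1000 data.txt part_ 2>&1)
printf '%s\n' "$report"

report_lines=$(printf '%s\n' "$report" | wc -l | tr -d ' ')
file_count=$(ls part_* | wc -l | tr -d ' ')
echo "$report_lines report lines for $file_count files"
```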
7. Split by Size Without Breaking Lines:
If you want to cap the size of each chunk while keeping lines whole:
split -C SIZE INPUT PREFIX
- This puts as many complete lines as fit into each output file without exceeding SIZE bytes. A single line longer than SIZE is still broken across files.
- Example: split -C 1M largefile part_ ensures each file is no larger than 1 MB and does not split inside lines.
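The -C option (long form --line-bytes) caps the size of each chunk while keeping lines whole. A minimal sketch, assuming GNU coreutils split; the 10-byte records and the 95-byte limit are chosen only to make the result easy to inspect.

```shell
#!/bin/sh
# Sketch: -C caps chunk size in bytes but only cuts at line boundaries.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 100 records of exactly 10 bytes each (9 digits plus a newline).
awk 'BEGIN { for (i = 1; i <= 100; i++) printf "%09d\n", i }' > records.txt

split -C 95 records.txt part_            # at most 95 bytes of whole lines per file

largest=0
broken=0
for f in part_*; do
    size=$(wc -c < "$f" | tr -d ' ')
    if [ "$size" -gt "$largest" ]; then largest=$size; fi
    # A chunk that ends mid-line would not end with a newline.
    if [ "$(tail -c 1 "$f" | wc -l | tr -d ' ')" != "1" ]; then
        broken=$((broken + 1))
    fi
done
echo "largest chunk: $largest bytes; chunks ending mid-line: $broken"
```

By contrast, split -b 95 on the same input would cut every 95 bytes regardless of where the lines end.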
8. Round Robin Split:
If you want to split a file by distributing lines in a round-robin manner across several output files:
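split --number=r/N INPUT PREFIX
- N specifies how many files you want to split into, and r selects round-robin distribution of lines. (The similar form l/N also avoids breaking lines, but it keeps consecutive lines together in each file.)
- Example: split --number=r/3 inputfile part_ distributes the lines of inputfile across 3 files in a round-robin fashion.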
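In GNU split, round-robin distribution is selected with r/N, whereas l/N produces N files of consecutive whole lines. The sketch below contrasts the two on a nine-line file; it assumes GNU coreutils, and the prefixes rr_ and seq_ are illustrative.

```shell
#!/bin/sh
# Sketch: r/3 deals lines out in turn, l/3 keeps consecutive lines together.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 9 > inputfile                      # lines "1" through "9"

split --number=r/3 inputfile rr_         # round robin: line 1 -> rr_aa, 2 -> rr_ab, ...
rr_first=$(paste -s -d ' ' rr_aa)
echo "rr_aa:  $rr_first"

split --number=l/3 inputfile seq_        # three parts made of consecutive lines
seq_first=$(paste -s -d ' ' seq_aa)
echo "seq_aa: $seq_first"
```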
Examples:
- Split a file into chunks of 1 MB each:
split -b 1M largefile part_
This splits largefile into chunks of 1 MB each, with file names starting from part_aa, part_ab, and so on.
- Split a file into files with 500 lines each:
split -l 500 inputfile segment_
This splits inputfile into multiple files, each containing 500 lines, with names like segment_aa, segment_ab, etc.
- Split a file into 4 equal parts:
split -n 4 inputfile chunk_
This splits inputfile into 4 files of roughly equal size in bytes.
Handling Binary Files:
The -b option works on raw bytes and does not interpret the file contents, so it is safe to use on binary files without data corruption. Example:
split -b 512k binaryfile binpart_
This splits a binary file into 512 KB chunks.
Recombining Split Files:
To reassemble the split files back into one, use the cat command:
cat part_* > combined_file
The shell expands part_* in sorted order, which matches the order of the suffixes split generated, so this concatenates the pieces in their original sequence and restores them into a single file.
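A round trip is a simple way to confirm that reassembly is lossless, including for binary data. The sketch below assumes /dev/urandom and head -c are available and that split accepts the M suffix (as GNU split does); original.bin is an illustrative name.

```shell
#!/bin/sh
# Sketch: split a binary file, reassemble it with cat, and compare.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 3000000 /dev/urandom > original.bin   # random binary sample data

split -b 1M original.bin part_
cat part_* > combined_file                    # glob order matches suffix order

# cmp -s is silent and reports the result through its exit status.
if cmp -s original.bin combined_file; then
    status=identical
else
    status=different
fi
echo "original and combined file are $status"
```

Comparing checksums of the two files would serve the same purpose when the original lives on another machine.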
Conclusion:
The split command is a powerful tool for dividing files into smaller pieces based on size, line count, or number of output files. It offers flexibility with custom prefixes, suffix lengths, and verbose output for ease of use. It’s commonly used when managing large datasets, splitting logs, or distributing files across systems.
