File Size calculation problems

Hello, I have a question about how file sizes are calculated on different disks. The same files on different disks show completely different sizes when measured with the du -ch command. Also, even the 6.3G total is not equal to the sum of the individual file sizes. What is happening?

(base) jiangxu@discovery2:~/proj3/NGS_Sequencing/20180730_CTCC_cryomilling_and_crosslinking/data_analysis/hg19_analysis/HiC_Pro_Analysis/output_01/bowtie_results/bwt2_local/01$ ls -lah
total 2.1G
drwxrws--- 2 jiangxu linchen_209 24 Dec 22 00:20 .
drwxrws--- 3 jiangxu linchen_209 1 Dec 21 23:54 ..
-rw-rw---- 1 jiangxu linchen_209 239M Dec 21 23:58 CTCC_SDS_HindIII_R1_block_06_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 897M Dec 21 23:54 CTCC_SDS_HindIII_R1_block_06_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 237M Dec 22 00:04 CTCC_SDS_HindIII_R1_block_07_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 900M Dec 21 23:59 CTCC_SDS_HindIII_R1_block_07_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 233M Dec 22 00:09 CTCC_SDS_HindIII_R1_block_08_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 889M Dec 22 00:04 CTCC_SDS_HindIII_R1_block_08_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 231M Dec 22 00:14 CTCC_SDS_HindIII_R1_block_09_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 882M Dec 22 00:10 CTCC_SDS_HindIII_R1_block_09_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 230M Dec 22 00:19 CTCC_SDS_HindIII_R1_block_10_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 880M Dec 22 00:15 CTCC_SDS_HindIII_R1_block_10_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 231M Dec 22 00:24 CTCC_SDS_HindIII_R1_block_11_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 879M Dec 22 00:20 CTCC_SDS_HindIII_R1_block_11_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 260M Dec 21 23:59 CTCC_SDS_HindIII_R2_block_06_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 954M Dec 21 23:54 CTCC_SDS_HindIII_R2_block_06_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 251M Dec 22 00:04 CTCC_SDS_HindIII_R2_block_07_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 928M Dec 21 23:59 CTCC_SDS_HindIII_R2_block_07_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 250M Dec 22 00:09 CTCC_SDS_HindIII_R2_block_08_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 926M Dec 22 00:04 CTCC_SDS_HindIII_R2_block_08_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 245M Dec 22 00:14 CTCC_SDS_HindIII_R2_block_09_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 910M Dec 22 00:10 CTCC_SDS_HindIII_R2_block_09_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 245M Dec 22 00:19 CTCC_SDS_HindIII_R2_block_10_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 910M Dec 22 00:15 CTCC_SDS_HindIII_R2_block_10_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu linchen_209 245M Dec 22 00:24 CTCC_SDS_HindIII_R2_block_11_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu linchen_209 907M Dec 22 00:20 CTCC_SDS_HindIII_R2_block_11_hg19.bwt2glob.unmap_trimmed.fastq
(base) jiangxu@discovery2:~/proj3/NGS_Sequencing/20180730_CTCC_cryomilling_and_crosslinking/data_analysis/hg19_analysis/HiC_Pro_Analysis/output_01/bowtie_results/bwt2_local/01$ du -ch
2.1G .
2.1G total
(base) jiangxu@discovery2:~/proj3/NGS_Sequencing/20180730_CTCC_cryomilling_and_crosslinking/data_analysis/hg19_analysis/HiC_Pro_Analysis/output_01/bowtie_results/bwt2_local/01$ cd /scratch/jiangxu/
ATAC_seq/ CTCC/ dilution_HiC/ envs/ genome_file/ isHiC_GM12878_MboI/ isHiC_HUVEC/ micro_C/ mnase_seq/ native_HiC_HUVEC/ sprite/
(base) jiangxu@discovery2:~/proj3/NGS_Sequencing/20180730_CTCC_cryomilling_and_crosslinking/data_analysis/hg19_analysis/HiC_Pro_Analysis/output_01/bowtie_results/bwt2_local/01$ cd /scratch2/jiangxu/
2017_cryomilling_HiC/ CTCC_SDS_HindIII/ env/ isHiC_GM12878_MboI/ TCC_Reza/
article/ CTCC_with_RNaseaA_breaking_analysis/ GM12878_subcomp_sorted.bed micro_C/ xlink_first_CTCC_SDS_something_wrong/
(base) jiangxu@discovery2:~/proj3/NGS_Sequencing/20180730_CTCC_cryomilling_and_crosslinking/data_analysis/hg19_analysis/HiC_Pro_Analysis/output_01/bowtie_results/bwt2_local/01$ cd /scratch2/jiangxu/CTCC_SDS_HindIII/
Display all 102 possibilities? (y or n)
(base) jiangxu@discovery2:~/proj3/NGS_Sequencing/20180730_CTCC_cryomilling_and_crosslinking/data_analysis/hg19_analysis/HiC_Pro_Analysis/output_01/bowtie_results/bwt2_local/01$ cd /scratch2/jiangxu/CTCC_SDS_HindIII/output_01/bowtie_results/bwt2_local/01/
(base) jiangxu@discovery2:/scratch2/jiangxu/CTCC_SDS_HindIII/output_01/bowtie_results/bwt2_local/01$ ls -lah
total 6.3G
drwxrwx--- 2 jiangxu jiangxu 24 Dec 22 00:20 .
drwxrwx--- 3 jiangxu jiangxu 1 Dec 21 23:54 ..
-rw-rw---- 1 jiangxu jiangxu 239M Dec 21 23:58 CTCC_SDS_HindIII_R1_block_06_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 897M Dec 21 23:54 CTCC_SDS_HindIII_R1_block_06_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 237M Dec 22 00:04 CTCC_SDS_HindIII_R1_block_07_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 900M Dec 21 23:59 CTCC_SDS_HindIII_R1_block_07_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 233M Dec 22 00:09 CTCC_SDS_HindIII_R1_block_08_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 889M Dec 22 00:04 CTCC_SDS_HindIII_R1_block_08_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 231M Dec 22 00:14 CTCC_SDS_HindIII_R1_block_09_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 882M Dec 22 00:10 CTCC_SDS_HindIII_R1_block_09_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 230M Dec 22 00:19 CTCC_SDS_HindIII_R1_block_10_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 880M Dec 22 00:15 CTCC_SDS_HindIII_R1_block_10_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 231M Dec 22 00:24 CTCC_SDS_HindIII_R1_block_11_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 879M Dec 22 00:20 CTCC_SDS_HindIII_R1_block_11_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 260M Dec 21 23:59 CTCC_SDS_HindIII_R2_block_06_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 954M Dec 21 23:54 CTCC_SDS_HindIII_R2_block_06_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 251M Dec 22 00:04 CTCC_SDS_HindIII_R2_block_07_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 928M Dec 21 23:59 CTCC_SDS_HindIII_R2_block_07_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 250M Dec 22 00:09 CTCC_SDS_HindIII_R2_block_08_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 926M Dec 22 00:04 CTCC_SDS_HindIII_R2_block_08_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 245M Dec 22 00:14 CTCC_SDS_HindIII_R2_block_09_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 910M Dec 22 00:10 CTCC_SDS_HindIII_R2_block_09_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 245M Dec 22 00:19 CTCC_SDS_HindIII_R2_block_10_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 910M Dec 22 00:15 CTCC_SDS_HindIII_R2_block_10_hg19.bwt2glob.unmap_trimmed.fastq
-rw-rw---- 1 jiangxu jiangxu 245M Dec 22 00:24 CTCC_SDS_HindIII_R2_block_11_hg19.bwt2glob.unmap_bwt2loc.bam
-rw-rw---- 1 jiangxu jiangxu 907M Dec 22 00:20 CTCC_SDS_HindIII_R2_block_11_hg19.bwt2glob.unmap_trimmed.fastq
(base) jiangxu@discovery2:/scratch2/jiangxu/CTCC_SDS_HindIII/output_01/bowtie_results/bwt2_local/01$ du -ch
6.3G .
6.3G total
(base) jiangxu@discovery2:/scratch2/jiangxu/CTCC_SDS_HindIII/output_01/bowtie_results/bwt2_local/01$

Hey Guys,
I think I found the solution to this problem. Using the option

--apparent-size

solved it:
du -shc --apparent-size


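@jiangxu The file systems run ZFS, which compresses files, so the size on disk is smaller than the actual file size. By default, du reports the space allocated on disk, and the "total" line of ls -lah counts allocated blocks as well, which is why the two agree with each other but not with the per-file sizes. The --apparent-size option reports the actual size without compression, and ls -lh shows that same apparent size for individual files.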
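
To see the two measurements side by side, here is a minimal sketch assuming GNU coreutils (du, truncate, mktemp). The temporary file is hypothetical, and a sparse file stands in for a ZFS-compressed one, since in both cases fewer blocks are allocated than the byte length suggests. It is only an illustration of the difference, not a reproduction of the output above.

```shell
#!/bin/sh
# Hypothetical demo file: a sparse file is used as a stand-in for a
# compressed one (assumption: GNU coreutils is available).
demo=$(mktemp)
truncate -s 10M "$demo"    # 10 MiB long, but no data blocks are written

# Space allocated on disk, in KiB (what plain du reports).
allocated_kb=$(du -k "$demo" | cut -f1)

# Byte length in KiB (what du --apparent-size and ls -l report).
apparent_kb=$(du -k --apparent-size "$demo" | cut -f1)

echo "allocated: ${allocated_kb} KiB"
echo "apparent:  ${apparent_kb} KiB"

rm -f "$demo"
```

On a compressed file system, running the two du commands on a real data file should show the same kind of gap, with the amount depending on how well the data compresses.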


Thank you!