Size and Size on disk of a .txt file

I opened a new file in Notepad and typed the sentence "Four score and seven years ago" into it (without the quotes).

Four              4 characters
score             5 characters
and               3 characters
seven             5 characters 
years             5 characters 
ago               3 characters

TOTAL : 25 + 5 spaces = 30 characters.

I saved the file to disk under the name gettingSize.txt and then looked at its size. The file has a size of 30 bytes: 1 byte for each character. As a rule, each character consumes one byte.

Size : 30 bytes
Size on Disk : 4.00 KB (4,096 bytes)

The paragraphs below are copied from a PDF.

If you were to look at the file as a computer looks at it, you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character (see below). So on disk, the numbers for the file look like this:

F    o    u    r    (space)  a    n    d    (space)  s    e    v    e    n
70   111  117  114  32       97   110  100  32       115  101  118  101  110

By looking in the ASCII table, you can see a one-to-one correspondence between each character and the ASCII code used. Note the use of 32 for a space -- 32 is the ASCII code for a space. We could expand these decimal numbers out to binary numbers (so 32 = 00100000) if we wanted to be technically correct -- that is how the computer really deals with things.

1) I know that everything is stored in the form of bits and bytes, so what does this mean in general: "you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character"? A byte is 8 bits. So how can each byte hold "a number -- the number is the ASCII code"? How can a byte contain an ASCII number (e.g. 49 for '1') when a bit can only be 0 or 1?

2) What exactly is the difference between Size and Size on Disk? And how do ASCII and Unicode fit into it?

3) In Java, Strings are objects. Can I say a String is multiple characters concatenated together? Given String str = "Four score and seven years ago", how is str stored in memory? Is it stored in the same manner as the text saved in the Notepad file?

Stere answered 8/9, 2014 at 6:52 Comment(1)
This is 3 almost completely different questions you have here... - Donner

Files are stored in blocks. If the file size is smaller than the block size (in your case, 4 KB), the file still occupies a whole block, but most of that block's space is unused. I think this question was answered on SuperUser, I'll find the link. UPDATE: https://superuser.com/questions/704218/why-is-there-such-a-big-difference-between-size-and-size-on-disk
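To make the arithmetic concrete, here is a minimal Java sketch that rounds a file size up to whole blocks. The `sizeOnDisk` helper is a hypothetical name for illustration, and the 4096-byte block size is simply the value from the question. Real file systems can behave differently (compression, sparse files, very small files stored inside file system metadata), so treat this as a simplified model only.

```java
final class SizeOnDisk {
    /**
     * Hypothetical helper: rounds fileSize up to a whole number of blocks.
     * Simplified model; it ignores compression, sparse files and similar
     * file system features.
     */
    static long sizeOnDisk(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative, block size positive");
        }
        long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
        return blocks * blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 4096; // assumed block size, taken from the question

        System.out.println(sizeOnDisk(30, blockSize));   // 4096
        System.out.println(sizeOnDisk(4096, blockSize)); // 4096
        System.out.println(sizeOnDisk(4097, blockSize)); // 8192
    }
}
```

In this model the 30-byte file needs one block, so 4,066 of the 4,096 allocated bytes stay unused.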


Remediable answered 8/9, 2014 at 6:57 Comment(0)

To make a few short points:

  1. "How can a byte contain an ASCII number (e.g. 49 for '1') other than 0 and 1?"

    A Byte is 8 bits. Thus you can store numbers between 0 and 255 in it.

  2. What is the difference between filesize and size on disk:

    See MJafar Mash's answer: "size" is the actual size in bytes and "size on disk" is the number of bytes you need to allocate as blocks for the file to be placed in.

  3. In Java Strings are Objects. Can I say that a String is multiple characters concatenated together?

    Yes, but it's actually more complicated than that.
    Taken from this answer:

    Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.
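As a small illustration of point 3, the sketch below treats a String as a sequence of characters and shows what the quoted copy constructor does. The `charsOf` helper is a made-up name for this example; it only wraps the standard `String.toCharArray()`.

```java
final class StringAsChars {
    /** Hypothetical helper: returns a fresh copy of the string's characters. */
    static char[] charsOf(String s) {
        return s.toCharArray();
    }

    public static void main(String[] args) {
        String str = "Four score and seven years ago";

        char[] chars = charsOf(str);
        System.out.println(chars.length);      // 30
        chars[0] = 'f';                        // modifies only the copied array
        System.out.println(str.charAt(0));     // F (the String itself is immutable)

        String copy = new String(str);         // explicit copy, rarely needed
        System.out.println(copy.equals(str));  // true: same sequence of characters
        System.out.println(copy == str);       // false: a different object
    }
}
```

Because Strings are immutable, sharing one instance is safe, which is why the documentation calls the explicit copy unnecessary in most cases.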

Donner answered 8/9, 2014 at 7:4 Comment(8)
A (primitive) byte is 8 bits. A (wrapper) Byte is not 8 bits. - Supersedure
While you are actually talking good Java here, IIRC at least German network engineers always write Byte with a capital first letter to further differentiate between bits and Bytes. - Donner
But that is blatantly wrong because a byte is not a Byte. :P - Supersedure
You are thinking too Java here. This is waaaay below that. I am talking about OSI layer 1 and you are talking about OSI layers 6-8... - Donner
ASCII uses 7-bit numbers to represent the letters, numerals and common punctuation used in American English. ASCII maps 65 to A. So when I type A on my laptop keyboard, how does the machine end up showing A on the screen? I just fail to understand this. - Stere
@ShirgillAnsari well, your keyboard will send an event to your OS. This event probably contains the Unicode code point of the key you pressed. For the ASCII characters, that code point corresponds 1:1 to the ASCII code (backward compatibility). And that is already the byte representation you want saved on your hard drive. - Donner
I didn't get that. What do you mean by "That again corresponds 1:1 to the ASCII Code for the ASCII values (backward compatibility)"? I do understand the backward compatibility. And am I correct to say that ASCII comprises 128 code points mapped 1:1 to the ASCII values? - Stere
What I am saying is that the first 128 code points of UTF-* are the same as the ASCII code points you have. Your keyboard provides additional possibilities outside of the ASCII scope, given you use the ALT+codepoint feature. - Donner

1) I know that everything is stored in the form of bits and bytes, so what does this mean in general: "you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character"? A byte is 8 bits. So how can each byte hold "a number -- the number is the ASCII code"? How can a byte contain an ASCII number (e.g. 49 for '1') when a bit can only be 0 or 1?

Each ASCII character occupies 1 byte. Internally, each character is stored as its ASCII number. A byte has 8 bits, so the largest value it can hold is 2^8 - 1 = 255, and the range is 0-255. ASCII itself only uses the values 0-127. The number 49 is simply the bit pattern 00110001.
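A short Java sketch may help show that the "ASCII number" and the bits are the same thing viewed in two ways. It encodes the sample text from the question as ASCII and prints every byte as a character, a decimal number and 8 binary digits. The `toBinary` helper is a hypothetical name used only for this example.

```java
import java.nio.charset.StandardCharsets;

final class AsciiBytes {
    /** Hypothetical helper: formats one byte as 8 binary digits, e.g. 32 -> "00100000". */
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF); // mask to treat the byte as 0-255
        return String.format("%8s", bits).replace(' ', '0'); // left-pad with zeros
    }

    public static void main(String[] args) {
        byte[] bytes = "Four and seven".getBytes(StandardCharsets.US_ASCII);
        for (byte b : bytes) {
            // character, decimal ASCII code, and the same value written in binary
            System.out.printf("%c  %3d  %s%n", (char) b, b & 0xFF, toBinary(b));
        }
    }
}
```

The space character shows up as 32, which is 00100000 in binary, matching the quoted text.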

2) What exactly is the difference between Size and Size on Disk? And how do ASCII and Unicode fit into it?

Each ASCII character is 1 byte, so 30 bytes is the actual size of the data in the file. The 4 KB is the size of the block (cluster) in which the file is stored. In your case it is the minimum space allocated to any new file on the disk.

3) In Java, Strings are objects. Can I say a String is multiple characters concatenated together? Given String str = "Four score and seven years ago", how is str stored in memory? Is it stored in the same manner as the text saved in the Notepad file?

Yes. Strings are indeed (internally) multiple characters concatenated together, but the characters cannot be changed. A String is an object that stores its text as a sequence of characters (in Java each char is 2 bytes, a UTF-16 code unit). That is not the same as the Notepad file, where each of your characters took 1 byte. A charset such as UTF-8 only comes into play when a string is converted to or from bytes, for example when reading or writing a file. The default charset depends on the platform and JVM settings (UTF-8 on most modern setups), and you can also change it.
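To see how the number of bytes depends on the charset rather than on the String itself, here is a rough Java sketch. `encodedLength` is a hypothetical helper around the standard `String.getBytes(Charset)`. UTF_16LE is used instead of UTF_16 so that no byte order mark is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedSizes {
    /** Hypothetical helper: number of bytes the string needs in the given charset. */
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String str = "Four score and seven years ago";

        System.out.println(str.length());                                   // 30 chars
        System.out.println(encodedLength(str, StandardCharsets.US_ASCII));  // 30 bytes
        System.out.println(encodedLength(str, StandardCharsets.UTF_8));     // 30 bytes
        System.out.println(encodedLength(str, StandardCharsets.UTF_16LE));  // 60 bytes
    }
}
```

For pure ASCII text, UTF-8 and ASCII produce the same 30 bytes, which is what Notepad wrote to disk in the question, while a 2-bytes-per-char encoding doubles the size.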

Supersedure answered 8/9, 2014 at 7:9 Comment(3)
What do you mean by "Internally, each character is stored as its ASCII number"? Is it not stored in bits and bytes? I am just confused. It's simple, but I am not getting it. - Stere
An ASCII number is a number, and a number is always stored as bits and bytes. - Donner
So why the need for this middleware (ASCII) in between, when everything is stored in bits and bytes? - Stere
