How to calculate the entropy of a file?
How do I calculate the entropy of a file? (Or let's just say a bunch of bytes.)
I have an idea, but I'm not sure it's mathematically correct.
My idea is the following:
- Create an array of 256 integers (all zeros).
- Traverse through the file and, for each of its bytes,
  increment the corresponding position in the array.
- At the end: calculate the "average" value for the array.
- Initialize a counter with zero,
  and for each of the array's entries:
  add the entry's difference from the "average" to the counter.
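The steps above can be sketched roughly as follows. This is only an illustration of the questioner's heuristic, not entropy. It assumes "difference" means the absolute difference, because signed differences from the average always sum to zero. The function names `byte_histogram` and `deviation_score` are hypothetical.

```python
from typing import List


def byte_histogram(data: bytes) -> List[int]:
    """Count how often each of the 256 possible byte values occurs."""
    counts = [0] * 256          # one integer per byte value, all zeros
    for b in data:              # iterating over bytes yields ints 0..255
        counts[b] += 1
    return counts


def deviation_score(data: bytes) -> float:
    """Questioner's heuristic (assumption: absolute differences).

    Returns the raw counter; it is NOT normalized to 0.0..1.0
    and it is NOT Shannon entropy.
    """
    counts = byte_histogram(data)
    average = len(data) / 256   # the "average" value of the array
    counter = 0.0
    for c in counts:
        # Signed differences would always sum to zero, hence abs().
        counter += abs(c - average)
    return counter


print(deviation_score(bytes(range(256))))  # every byte value once -> 0.0
```

The raw counter grows with the file size, which is one reason it is hard to "project" into 0.0..1.0.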
Well, now I'm stuck. How do I "project" the counter result so that all results lie between 0.0 and 1.0? I'm sure the idea is inconsistent anyway...
I hope someone has better and simpler solutions?
Note: I need the whole thing to make assumptions about the file's contents:
(plaintext, markup, compressed or binary, ...)
- At the end: calculate the "average" value for the array.
- Initialize a counter with zero, and for each of the array's entries: add the entry's difference from the "average" to the counter.
With some modifications you can get Shannon's entropy:
rename "average" to "entropy"
    (float) entropy = 0
    for i in the array[256]:counts do
        (float) p = counts[i] / filesize
        if (p > 0) entropy = entropy - p*lg(p)   // lg is the logarithm base 2
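A minimal runnable sketch of the pseudocode above, assuming the whole file fits in memory. The names `shannon_entropy` and `file_entropy` are hypothetical. The result is in bits per byte, so it lies between 0.0 and 8.0.

```python
import math


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 .. 8.0)."""
    if not data:
        return 0.0                  # empty input: avoid dividing by zero
    counts = [0] * 256              # histogram of byte values
    for b in data:
        counts[b] += 1
    size = len(data)
    entropy = 0.0
    for c in counts:
        p = c / size
        if p > 0:                   # skip empty bins (p*log(p) -> 0 as p -> 0)
            entropy -= p * math.log2(p)
    return entropy


def file_entropy(path: str) -> float:
    """Read a whole file as bytes and return its entropy in bits per byte."""
    with open(path, "rb") as f:
        return shannon_entropy(f.read())
```

For example, `file_entropy("some_file.bin")` (a hypothetical path) returns 0.0 for a file made of a single repeated byte and 8.0 for a file in which all 256 byte values occur equally often.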
Edit: As Wesley mentioned, you must divide the entropy by 8 in order to bring it into the range 0..1 (or, alternatively, you can use the logarithm base 256).
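A small sketch of the normalized variant, assuming the same byte histogram as before. It uses the logarithm base 256. Because log base 256 of p equals log2(p) / 8, this gives the same value as dividing the base-2 result by 8. The name `normalized_entropy` is hypothetical.

```python
import math


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string scaled into 0.0 .. 1.0."""
    if not data:
        return 0.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    size = len(data)
    entropy = 0.0
    for c in counts:
        if c:                               # only non-empty bins contribute
            p = c / size
            # log base 256 of p == log2(p) / 8, i.e. "divide by 8"
            entropy -= p * math.log(p, 256)
    return entropy
```

As a rough heuristic, values near 1.0 point toward compressed or otherwise random-looking data, while plain text and markup usually score noticeably lower.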