algorithm - How to calculate the entropy of a file?


How do I calculate the entropy of a file? (Or let's just say a bunch of bytes.)
I have an idea, but I'm not sure that it's mathematically correct.

My idea is the following:

  • Create an array of 256 integers (all zeros).
  • Traverse through the file and, for each of its bytes,
    increment the corresponding position in the array.
  • At the end: calculate the "average" value for the array.
  • Initialize a counter with zero,
    and for each of the array's entries:
    add the entry's difference to "average" to the counter.

Well, now I'm stuck. How do I "project" the counter result in such a way that all results lie between 0.0 and 1.0? But I'm sure the idea is inconsistent anyway...

I hope someone has better and simpler solutions?

Note: I need the whole thing to make assumptions about the file's contents
(plaintext, markup, compressed or some binary, ...)

  • At the end: calculate the "average" value for the array.
  • Initialize a counter with zero, and for each of the array's entries: add the entry's difference to "average" to the counter.

With some modifications you can get Shannon's entropy:

Rename "average" to "entropy":

    (float) entropy = 0
    for i in array[256]:counts do
      (float) p = counts[i] / filesize
      if (p > 0) entropy = entropy - p*lg(p)  // lg is the logarithm with base 2
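
The pseudocode above could be written in Python roughly as follows. This is only a sketch: the function name `shannon_entropy` is made up for illustration, and treating empty input as zero entropy is an assumption, not something the answer specifies. The result is in bits per byte, so it lies between 0.0 and 8.0.

```python
import math


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0  # assumption: empty input is treated as zero entropy

    # Histogram: one counter per possible byte value.
    counts = [0] * 256
    for b in data:
        counts[b] += 1

    total = len(data)
    entropy = 0.0
    for c in counts:
        if c > 0:  # skip unused byte values, since log2(0) is undefined
            p = c / total
            entropy -= p * math.log2(p)
    return entropy
```

For a file, the bytes could be obtained with something like `open(path, "rb").read()`, where `path` is a placeholder; very large files would need chunked reading instead.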

Edit: As Wesley mentioned, we must divide entropy by 8 in order to adjust it to the range 0 . . 1 (or alternatively, we can use the logarithm with base 256).
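
Both normalizations from the edit can be sketched side by side. The function names are hypothetical and empty input is again assumed to give 0.0. Since log256(p) = log2(p) / 8, the two variants should agree up to floating-point rounding.

```python
import math
from collections import Counter


def normalized_entropy(data: bytes) -> float:
    """Entropy in bits per byte, divided by 8 to land in the range 0..1."""
    if not data:
        return 0.0  # assumption: empty input is treated as zero entropy
    total = len(data)
    entropy = 0.0
    for c in Counter(data).values():  # only byte values that occur
        p = c / total
        entropy -= p * math.log2(p)
    return entropy / 8.0


def entropy_base256(data: bytes) -> float:
    """Same quantity, computed directly with the logarithm to base 256."""
    if not data:
        return 0.0
    total = len(data)
    entropy = 0.0
    for c in Counter(data).values():
        p = c / total
        entropy -= p * math.log(p, 256)
    return entropy
```

With this scaling, a buffer containing a single repeated byte value scores 0.0, and a buffer in which all 256 byte values are equally frequent scores 1.0.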

