Algorithm Design and Applications: Greedy Strategy

Basic Idea of the Greedy Strategy

Problem
- Suppose that you have a file of 100K characters. To keep the example simple, suppose that each character is one of the 6 letters from a through f. Since we have just 6 characters, we need just 3 bits to represent a character, so the file requires 300K bits to store. Can we do better?
- Suppose that we have more information about the file:
  - the frequency with which each character appears
Solution
- The idea is that we will use a variable length code instead of a fixed length code (3 bits for each character), with fewer bits to store the common characters, and more bits to store the rare characters

For example, suppose that the characters appear with the following frequencies and are assigned the following codes:
Then the variable-length coded version will take not 300K bits but 45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4 = 224K bits to store (each term is a frequency in thousands times a codeword length in bits), a saving of about 25%. In fact, this is the optimal way to encode the 6 characters present, as we shall see.
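
The arithmetic above can be checked with a short sketch. The frequencies (in thousands) and codeword lengths are read off the terms of the sum, on the assumption that they are listed in order for the characters a through f; the variable names are illustrative only.

```python
import math

# Frequencies in thousands and codeword lengths in bits, read off the sum
# above; assumed to be listed in order for the characters a..f.
freq_k = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code_len = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}

# Fixed-length code: ceil(log2(6)) = 3 bits for every character.
fixed_bits = math.ceil(math.log2(len(freq_k)))
fixed_total_k = fixed_bits * sum(freq_k.values())

# Variable-length code: each character costs frequency * codeword length.
variable_total_k = sum(freq_k[ch] * code_len[ch] for ch in freq_k)

saving = 1 - variable_total_k / fixed_total_k
print(fixed_total_k, variable_total_k, round(saving * 100, 1))  # 300 224 25.3
```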

In a prefix code, no codeword is a prefix of another codeword
- Easy encoding and decoding
  - To encode, we need only concatenate the codewords of consecutive characters in the message
  - To decode, we have to decide where each codeword begins and ends
    - Easy, since no codeword is a prefix of another
    - The string 110001001101 parses uniquely as 1100-0-100-1101, which decodes to FACE
  
  “prefix-free codes” would be a better name, but the term “prefix codes” is standard in the literature
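
As a rough illustration of why decoding is easy, the sketch below scans the bit string left to right and emits a character as soon as the bits read so far match a codeword. The codewords for a, c, e and f are taken from the FACE example; the codewords for b and d (101 and 111) are assumptions, chosen only so that the table stays prefix-free. The function names are hypothetical.

```python
# Codewords for a, c, e, f follow the FACE example in the text.
# The codewords for b and d are assumed; they keep the code prefix-free.
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def is_prefix_free(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)


def encode(message, code=CODE):
    """Concatenate the codewords of consecutive characters."""
    return "".join(code[ch] for ch in message)


def decode(bits, code=CODE):
    """Greedy left-to-right decoding; correct because the code is prefix-free."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(decode("110001001101"))  # face
```

Because no codeword is a prefix of another, the first match found while scanning is the only possible one, so no backtracking is needed.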

The greedy algorithm for computing the optimal Huffman coding tree T is as follows
- It starts with a forest of one-node trees, one for each character c ∈ C, and merges them greedily using a priority queue Q keyed on frequency (smallest first):
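
The merging procedure described above could be sketched as follows, using Python's `heapq` module as the priority queue Q. This is a minimal illustration, not a reference implementation: the function name `huffman_code`, the tuple-based tree representation, and the insertion-order tie-breaking are all assumptions, and the input is assumed to be a non-empty mapping from characters to frequencies.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Build a Huffman code from a non-empty {character: frequency} mapping."""
    tiebreak = count()  # unique counter so equal frequencies never compare trees
    # Forest of one-node trees: a leaf is just the character itself.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)

    # Greedy step: repeatedly merge the two least frequent trees.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    _, _, root = heap[0]
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left subtree, right subtree)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf; a lone character still needs one bit
            code[node] = prefix or "0"

    walk(root, "")
    return code


# Frequencies in thousands, as read off the cost sum earlier in the section.
freq_k = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = huffman_code(freq_k)
cost_k = sum(freq_k[ch] * len(code[ch]) for ch in freq_k)
print(cost_k)  # 224
```

Each of the |C| - 1 merges performs a constant number of heap operations, so the sketch runs in O(|C| log |C|) time. Different tie-breaking rules can yield different codewords, but the total cost stays the same.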