From: Muthu <gnu...@us...> - 2006-10-01 07:02:36
|
Update of /cvsroot/octave/octave-forge/main/comm/inst In directory sc8-pr-cvs3.sourceforge.net:/tmp/cvs-serv3667 Added Files: huffmandeco.m huffmandict.m huffmanenco.m Log Message: Octave now has huffman coding algorithms which are compatible with the other-leading-brand and even better with 2 other options for Huffman code generation including a minimum variance of code word option. The computational complexity of these routines are way high, sorry about that. But O(N*log(N)) encoding & decoding exist which we can implement as time permits. --- NEW FILE: huffmandict.m --- ## (C) 2006, Sep 30, Muthiah Annamalai, <mut...@ut...> ## ## This program is free software; you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation; either version 2 of the License, or ## (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with this program; if not, write to the Free Software ## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ## ## usage: huffmandict(symbols, source probability vector, {toggle},{minvar}) ## toggle is a 1,0 optional argument that starts building a ## code based on 1's or 0's, defaulting to 0. Also minvar is a boolean ## value that is useful in choosing if you want to optimize buffer for ## transmission in the applications of Huffman coding, however it doesnt ## affect the type or average codeword length of the generated code. ## ## This function builds a Huffman code, given the probability list. ## The Huffman codes persymbol are output as a list of ## strings-per-source symbol. A zero probability symbol is NOT assigned ## any codeword as this symbol doesnt occur in practice anyway. ## ## example: huffmandict(symbols, [0.5 0.25 0.15 0.1]) => CW(0,10,111,110) ## huffmandict(symbols, 0.25*ones(1,4)) => CW(11,10,01,00) ## ## prob=[0.5 0 0.25 0.15 0.1] ## dict=huffmandict(1:5,[0.5 0 0.25 0.15 0.1],1) ## entropy(prob) ## laverage(dict,prob) ## ## x = [0.20000 0.40000 0.20000 0.10000 0.10000]; ## #illustrates the minimum variance thing. ## huffmandict(1,x,0,true) #min variance tree. ## huffmandict(1,x) #normal huffman tree. ## ## Ref: Dr.Rao's course EE5351 Digital Video Coding, ## at UT-Arlington. ## ## Huffman code algorithm. ## while (uncombined_symbols_remain) ## combine least probable symbols into one with, ## their probability being the sum of the two. ## save this result as a stage at lowest order rung. ## (Moving to lowest position possible makes it non-minimum variance ## entropy coding) ## end ## ## for each (stage) ## Walk the tree we built, and assign each row either 1, ## or 0 of ## end ## ## reverse each symbol, and dump it out. ## function cw_list=huffmandict(sym,source_prob,togglecode,minvar) if nargin < 2 error("Usage: huffman_dict(source_prob,{togglecode 1-0 in code});") elseif nargin < 3 togglecode=0; minvar=0; end if(sum(source_prob)!=1.0) error("source probabilities must add up to 1.0"); end % %need to find & eliminate the zero-probability code words. %in practice we donot need to assign anything to them. Reasoning %being that if it doesnt occur why bother saving its vale anyway? % origsource_prob=source_prob; % %sort the list and know the index transpositions. kills the speed O(n^2). % L=length(source_prob); index=[1:L]; for itr1=1:L for itr2=itr1:L if(source_prob(itr1) < source_prob(itr2)) t=source_prob(itr1); source_prob(itr1)=source_prob(itr2); source_prob(itr2)=t; t=index(itr1); index(itr1)=index(itr2); index(itr2)=t; end end end %index %source_prob idx=find(source_prob==0); if(not(isempty(idx))) source_prob=source_prob(1:idx(1)-1); end clear idx; stage_list={}; cw_list{1:L}=[]; stage_curr={}; stage_curr.prob_list=source_prob; stage_curr.sym_list={}; S=length(source_prob); for i=1:S; stage_curr.sym_list{i}=[i]; #cw_list{i}=""; end % % another O(n^2) part. % I=1; while (I<S) L=length(stage_curr.prob_list); nprob=stage_curr.prob_list(L-1)+stage_curr.prob_list(L); nsym=[stage_curr.sym_list{L-1}(1:end),stage_curr.sym_list{L}(1:end)]; %stage_curr; %archive old stage list. stage_list{I}=stage_curr; % %insert the new probability into the list, at the %first-position (greedy?) possible. % for i=1:(L-2) if((minvar && stage_curr.prob_list(i)<=nprob) || \ stage_curr.prob_list(i) < nprob) break; end end stage_curr.prob_list=[stage_curr.prob_list(1:i-1) nprob stage_curr.prob_list(i:L-2)]; stage_curr.sym_list={stage_curr.sym_list{1:i-1}, nsym, stage_curr.sym_list{i:L-2}}; % Loopie I=I+1; end one_cw="1"; zero_cw="0"; if (togglecode==0) one_cw=1; zero_cw=0; else one_cw=0; zero_cw=1; end % % another O(n^2) part. % %printf("Exit Loop"); I=I-1; while (I>0) stage_curr=stage_list{I}; L=length(stage_curr.sym_list); clist=stage_curr.sym_list{L}; for k=1:length(clist) cw_list{1,clist(k)}=[cw_list{1,clist(k)} one_cw]; end clist=stage_curr.sym_list{L-1}; for k=1:length(clist) cw_list{1,clist(k)}=[cw_list{1,clist(k)}, zero_cw]; end % Loopie I=I-1; end % %zero all the code-words of zero-probability length, 'cos they %never occur. % S=length(source_prob); for itr=(S+1):length(origsource_prob) cw_list{1,itr}=-1; end %disp('Before resorting') %cw_list nw_list{1:L}=[]; % % Re-sort the indices according to the probability list. % L=length(source_prob); for itr=1:(L) t=cw_list{index(itr)}; nw_list{index(itr)}=cw_list{itr}; end cw_list=nw_list; %zero all the code-words of zero-probability length, 'cos they %never occur. %for itr=1:L % if(origsource_prob(itr)==0) % cw_list{itr}=""; % end %end return end %! %!assert(huffmandict(1:4,[0.5 0.25 0.15 0.1],1), {[0],[1 0],[1 1 1],[1 1 0]},0) %!assert(huffman(0.25*ones(1,4),1),{[1 1],[1 0],[0 1],[0 0]},0) %! --- NEW FILE: huffmanenco.m --- ## (C) 2006, Sep 30, Muthiah Annamalai, <mut...@ut...> ## ## This program is free software; you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation; either version 2 of the License, or ## (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with this program; if not, write to the Free Software ## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ## ## ## usage: huffmanenco(sig,dict) ## The function returns the Huffman encoded signal using dict. ## This function uses a dict built from the huffmandict ## and uses it to encode a signal list into a huffman list. ## Restrictions include a signal set that strictly belongs ## in the range [1,N] with N=length(dict). Also dict can ## only be from the huffmandict() routine. ## ## ## example: hd=huffmandict(1:4,[0.5 0.25 0.15 0.10]) ## huffmanenco(1:4,hd) # [ 1 0 1 0 0 0 0 0 1 ] ## ## function hcode=huffmanenco(sig,dict) if ( nargin < 2 || strcmp(class(dict),"cell")~=1 ) error('usage: huffmanenco(sig,dict)'); end if (max(sig) > length(dict)) || ( min(sig) < 1) error("signal has elements that are outside alphabet set ... Cannot encode."); end hcode=[dict{sig}]; return end %! %! assert(huffmanenco(1:4, huffmandict(1:4,[0.5 0.25 0.15 0.10])), [ 1 0 1 0 0 0 0 0 1 ],0) %! --- NEW FILE: huffmandeco.m --- ## (C) 2006, Sep 30, Muthiah Annamalai, <mut...@ut...> ## ## This program is free software; you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by ## the Free Software Foundation; either version 2 of the License, or ## (at your option) any later version. ## ## This program is distributed in the hope that it will be useful, ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## GNU General Public License for more details. ## ## You should have received a copy of the GNU General Public License ## along with this program; if not, write to the Free Software ## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA ## ## ## usage: sig=huffmandeco(hcode,dict) ## The function returns the original signal that was ## Huffman encoded signal using huffmanenco. This function uses ## a dict built from the huffmandict and uses it to decode a signal ## list into a huffman list. ## Restrictions include hcode is expected to be a binary code; ## returned signal set that strictly belongs in the range [1,N] ## with N=length(dict). Also dict can only be from the ## huffmandict() routine. Whenever decoding fails, ## those signal values are indicated by -1, and we successively ## try to restart decoding from the next bit that hasnt failed in ## decoding, ad-infinitum. ## ## example: hd=huffmandict(1:4,[0.5 0.25 0.15 0.10]) ## hcode=huffmanenco(1:4,hd) # [ 1 0 1 0 0 0 0 0 1 ] ## huffmandeco(hcode,hd) # [1 2 3 4] ## function sig=huffmandeco(hcode,dict) if ( nargin < 2 || strcmp(class(dict),"cell")~=1 ) error('usage: huffmanenco(sig,dict)'); end if (max(hcode) > 1 || min(hcode) < 0) error("hcode has elements that are outside alphabet set ... Cannot decode."); end # TODO: # O(log(N)) algorithms exist, but we need some effort to implement those # Maybe sometime later, it would be a nice 1-day project # Ref: A memory efficient Huffman Decoding Algorithm, AINA 2005, IEEE # # TODO: Somebody can implement a 'faster' algorithm than O(N^3) at present # Decoding Algorithm O(N+k*log(N)) which is approximately O(N+Nlog(N)) # # 1. Separate the dictionary by the lengths # and knows symbol correspondence. # # 2. Find the symbol in the dict of lengths l,h where # l = smallest cw length ignoring 0 length CW, and # h = largest cw length , with k=h-l+1; # # 3. Match the k codewords to the dict using binary # search, and then you can declare decision. # # 4. In case of non-decodable words skip the start-bit # used in the previous case, and then restart the same # procedure from the next bit. # L=length(dict); lenL=length(dict{1}); lenH=0; # # Know the ranges of operation # for itr=1:L t=length(dict{itr}); if(t < lenL) lenL=t; end if(t > lenH) lenH=t; end end # # Now do a O(N^4) algorithm # itr=0; #offset into the hcode sig=[]; #output CL=length(hcode); %whole decoding loop. while ( itr < CL ) %decode one element (or just try to). for itr_y=lenL:lenH %look for word in dictionary. %with flag to indicate found %or not found. Detect end of buffer. if ( (itr+itr_y) > CL ) break; end word=hcode(itr+1:itr+itr_y); flag=0; %word %search loop. for itr_z=1:L if(isequal(dict{itr_z},word)) itr=itr+itr_y; sig=[sig itr_z]; flag=1; break; end end %for_ loop breaker if( flag == 1 ) break; %also need to break_ above then. end end %for_ loop #could not decode if( flag == 0 ) itr=itr+1; sig=[sig -1]; end end %while_ loop end %! %! assert(huffmandeco(huffmanenco(1:4, huffmandict(1:4,[0.5 0.25 0.15 0.10])), huffmandict(1:4,[0.5 0.25 0.15 0.10])), [1:4],0) %! |