DSP56300FM [FREESCALE]

24-Bit Audio Digital Signal Processor; 24位音频数字信号处理器
DSP56300FM
型号: DSP56300FM
厂家: Freescale    Freescale
描述:

24-Bit Audio Digital Signal Processor
24位音频数字信号处理器

数字信号处理器
文件: 总108页 (文件大小:1082K)
中文:  中文翻译
下载:  下载PDF数据表文档文件
Freescale Semiconductor, Inc.  
Freescale Semiconductor  
Implementing Viterbi Decoders Using  
the VSL Instruction on DSP Families  
DSP56300 and DSP56600  
by  
Dana Taipale  
This application report describes how to generate, from a set of  
convolutional code polynomials, the assembly code needed for  
implementation of a Viterbi decoder.  
© Freescale Semiconductor, Inc., 2004. All rights reserved.  
For More Information On This
Go to: www.freescale.
Freescale Semiconductor, Inc.  
Order this document by number APR40/D  
(Revision 0, May 1998)  
Information in this document is provided solely to enable system and software  
implementers to use Freescale Semiconductor products. There are no express or  
implied copyright licenses granted hereunder to design or fabricate any integrated  
circuits or integrated circuits based on the information in this document.  
Freescale Semiconductor reserves the right to make changes without further notice to  
any products herein. Freescale Semiconductor makes no warranty, representation or  
guarantee regarding the suitability of its products for any particular purpose, nor does  
Freescale Semiconductor assume any liability arising out of the application or use of  
any product or circuit, and specifically disclaims any and all liability, including without  
limitation consequential or incidental damages. “Typical” parameters which may be  
provided in Freescale Semiconductor data sheets and/or specifications can and do  
vary in different applications and actual performance may vary over time. All operating  
parameters, including “Typicals” must be validated for each customer application by  
customer’s technical experts. Freescale Semiconductor does not convey any license  
under its patent rights nor the rights of others. Freescale Semiconductor products are  
not designed, intended, or authorized for use as components in systems intended for  
surgical implant into the body, or other applications intended to support or sustain life,  
or for any other application in which the failure of the Freescale Semiconductor product  
could create a situation where personal injury or death may occur. Should Buyer  
purchase or use Freescale Semiconductor products for any such unintended or  
unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor  
and its officers, employees, subsidiaries, affiliates, and distributors harmless against all  
claims, costs, damages, and expenses, and reasonable attorney fees arising out of,  
directly or indirectly, any claim of personal injury or death associated with such  
unintended or unauthorized use, even if such claim alleges that Freescale  
Semiconductor was negligent regarding the design or manufacture of the part.  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
TABLE OF CONTENTS  
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1  
1.1  
1.2  
1.3  
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3  
Viterbi Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3  
Manual Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4  
The Viterbi Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1  
2.1  
2.2  
2.3  
2.4  
IS-136. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3  
Convolutional Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3  
Viterbi Decoder. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4  
Algorithmic Enhancements. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12  
Expanding the Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1  
3.1  
3.2  
3.3  
3.4  
3.5  
3.6  
3.7  
3.8  
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3  
Partitioning the Task. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3  
The Inner Loop: Viterbi Butterflies . . . . . . . . . . . . . . . . . . . . . . . 3-3  
Creating the Branch Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6  
Storing the Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11  
Traceback: Obtaining the Decoder Output. . . . . . . . . . . . . . . . 3-13  
Main: Gluing the Pieces Together . . . . . . . . . . . . . . . . . . . . . . 3-17  
Memory Organization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19  
Algorithmic Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1  
4.1  
4.2  
4.2.1  
4.2.2  
4.3  
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3  
Allowing More General Branch Metrics . . . . . . . . . . . . . . . . . . . 4-3  
Modify Viterbi Butterfly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4  
Modify Branch Metric Generation . . . . . . . . . . . . . . . . . . . . . 4-6  
Starting from 0: The Pre ACS Macro . . . . . . . . . . . . . . . . . . . . . 4-8  
Collapsing the States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10  
Main: Putting the Pieces Back Together . . . . . . . . . . . . . . . . . 4-12  
4.4  
4.5  
Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1  
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
iii  
Freescale Semiconductor, Inc.  
5.2  
5.3  
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-3  
Program Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4  
APPENDIX A Basic Algorithm Program Listing. . . . . . . . . . . . . . . A-1  
A.1 Viterbi Algorithm PROGRAM LISTING . . . . . . . . . . . . . . . . . . . A-3  
APPENDIX B Extended Algorithm Program Listing . . . . . . . . . . . B-1  
B.1 16-Bit Enhanced Viterbi Decoder PROGRAM LISTING . . . . . . B-3  
APPENDIX C 24-Bit Algorithm Program Listing . . . . . . . . . . . . . . C-1  
C.1  
24-Bit Enhanced Viterbi Decoder PROGRAM LISTING . . . . . . C-3  
iv  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
LIST OF FIGURES  
Figure 2-1  
Figure 2-2  
Figure 2-3  
Figure 2-4  
Figure 2-5  
Figure 2-6  
Figure 2-7  
Figure 2-8  
Figure 2-9  
Figure 3-1  
Figure 4-1  
Example Rate 1/2 Convolutional Encoder. . . . . . . . . . . . . . . . . . . . . 2-3  
Single State of Prototype Encoder Tree Diagram . . . . . . . . . . . . . . . 2-5  
First Five Levels of the Encoder State Tree . . . . . . . . . . . . . . . . . . . 2-6  
Using Multiple Input Paths to Collapse the State Tree . . . . . . . . . . . 2-8  
Using Intermediate States to Eliminate Partial Paths . . . . . . . . . . . . 2-9  
Trellis Structure for Viterbi Decoding. . . . . . . . . . . . . . . . . . . . . . . . 2-10  
Example Viterbi Butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11  
Encoder Tap Symmetry to Reduce Calculations. . . . . . . . . . . . . . . 2-13  
Applying an Offset to Obtain Odd Symmetry in the Branch Metrics 2-14  
Stored B0 Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12  
Polynomial (1,1+D) Trellis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
v
Freescale Semiconductor, Inc.  
vi  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
LIST OF TABLES  
Table 2-1  
Table 3-1  
Table 5-1  
Input/Output Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4  
Recreated Encoder Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8  
Viterbi Decoder Code Statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
vii  
Freescale Semiconductor, Inc.  
viii  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
LIST OF EXAMPLES  
Example 3-1  
Example 3-2  
Example 3-3  
Example 3-4  
Example 3-5  
Example 3-6  
Example 3-7  
Example 3-8  
Example 4-1  
Example 4-2  
Example 4-3  
Example 4-4  
Example 4-5  
Example A-1  
Example B-1  
Example C-1  
Using the VSL Instruction in a Viterbi Butterfly . . . . . . . . . . . . . . . . . 3-4  
Find Branch Metrics Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9  
Partial Path Storage Code Listing . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14  
Traceback Output Path Code Listing . . . . . . . . . . . . . . . . . . . . . . . 3-15  
Main Viterbi Decoding Routine: Initialization . . . . . . . . . . . . . . . . . . 3-17  
Main Viterbi Decoding Routine: Patch Metric Update . . . . . . . . . . . 3-18  
Main Viterbi Decoding Routine: Termination and Traceback . . . . . 3-19  
Memory Organization Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20  
Modified Viterbi Butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4  
Modified Branch Metric Generation Code . . . . . . . . . . . . . . . . . . . . 4-6  
Pre-ACS Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9  
The ACSFlush Macro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11  
Main Program Code Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13  
Basic 16-Bit Implementation of a Viterbi Decoder . . . . . . . . . . . . . . A-4  
Extended Algorithm Program Listing . . . . . . . . . . . . . . . . . . . . . . . . B-3  
24-bit Algorithm Program Listing . . . . . . . . . . . . . . . . . . . . . . . . . . . C-3  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
ix  
Freescale Semiconductor, Inc.  
x
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
SECTION 1  
INTRODUCTION  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Introduction  
1.1  
1.2  
1.3  
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-3  
Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-3  
Manual Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-4  
1-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Introduction  
Introduction  
1.1  
INTRODUCTION  
TodayÕs communication systems typically make considerable use of signal processing to  
improve their performance. Two common functions that use signal processing to  
improve performance are channel equalization and error correction coding. For  
equalization, maximum likelihood sequence estimation is among the most popular  
schemes; for error correction, convolutional coding with Viterbi decoding is a method of  
choice. Both communication functions make use of the Viterbi algorithm to accomplish  
their task. To simplify the explanations presented here, the focus is on using the  
algorithm for error correction.  
This application report is written with the intent of instructing the reader who, with a set  
of convolutional code polynomials, can generate the assembly code needed to  
implement a Viterbi decoder on the DSP56300 and DSP56600 families of digital signal  
processors (DSPs).  
The DSP56300 and DSP56600 families of DSPs are designed to implement  
communication functions. Because the Viterbi algorithm occupies such a prominent  
position in many communications systems, these processor families have a special  
instruction, Viterbi shift left (VSL). This instruction is designed to improve the  
performance of the processor when doing Viterbi algorithm operations. One goal of this  
application note is to explain the use of the VSL instruction when implementing the  
Viterbi algorithm.  
1.2  
VITERBI ALGORITHM  
Communication applications that use convolutional coding or sequence estimation often  
use the Viterbi algorithm to efficiently process the received data. These applications  
occur so often that the DSP56300 and DSP56600 families have a special instruction to  
enhance the algorithmÕs implementation. This application note begins by introducing  
the Viterbi algorithm, then proceeds step by step through an implementation of that  
algorithm for a specific error correction example. In addition, use of the new VSL  
instruction is explained for obtaining efficient coding for applications requiring the  
Viterbi algorithm.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
1-3  
Freescale Semiconductor, Inc.  
Introduction  
Manual Organization  
1.3  
MANUAL ORGANIZATION  
Section 2 of this manual covers many basics of the Viterbi algorithm and introduces  
much of the terminology used in this application note. Those already acquainted with  
the algorithm may wish to proceed directly to Section 3.  
Section 3 shows how to develop assembly language routines for the Viterbi algorithm.  
This is done by means of an example, taking a set of encoding polynomials from the  
wireless interim standard IS-136 and developing the assembly routines needed to  
implement the corresponding decoder using the Viterbi algorithm.  
Section 4 discusses some code modifications that are useful for generalizing and  
optimizing the example code. The code-to-cover generalized branch metrics is extended,  
and an efficient method for normalizing the path metrics is introduced. In addition, two  
additional sets of code are given for more efficient processing of data periodically forced  
through a known state.  
Section 5 summarizes the application report and presents interesting data and statistics  
arrived at by comparing the basic Viterbi algorithm code of Section 2 with that of the  
modified program code presented in Section 3 and Section 4.  
Appendix A contains the complete program listing for Section 3.  
Appendix B contains the complete program listing for Section 4.  
Appendix C contains the complete 24-bit program listing for Section 4.  
1-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
SECTION 2  
THE VITERBI ALGORITHM  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
2.1  
2.2  
2.3  
2.4  
IS-136 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3  
Convolutional Encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3  
Viterbi Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4  
Algorithmic Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-12  
2-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
IS-136  
2.1  
IS-136  
This section introduces the Viterbi algorithm through an example from industry: the  
convolutional error correcting code used in IS-136, the TIA/EIA interim wireless  
standard. Here it is shown how the algorithm is derived, and some typical  
computational enhancements are introduced.  
2.2  
CONVOLUTIONAL ENCODING  
The Viterbi algorithm can be usefully illustrated by implementing a convolutional code.  
A convolutional encoder implementation in hardware is illustrated schematically in  
Figure 2-1. In order to illustrate the steps needed to implement Viterbi decoders in  
general, this application report develops the assembly code needed to implement a  
Viterbi algorithm decoder for this encoder.  
5
4
3
2
1
D
D
D
D
D
S
S
S
S
0
S
4
3
2
1
Figure 2-1 Example Rate 1/2 Convolutional Encoder  
In this figure, the square boxes represent a one-bit-wide shift register. Some shift register  
outputs are connected to modulo 2 adders (i.e., parity generators). Every time one bit is  
shifted into the input, two output bits are generated by the adders. Because we have  
more output bits than input bits, we have redundancy. The redundant information can  
be used to correct errors at the receiver.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
2-3  
 
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
Error correcting codes have some standard notation. This example, used in the wireless  
standard IS-136, is a rate 1/2 code, meaning that for each processing period, one  
information bit is taken in and two bits of output are generated. The word (S ,S ,S ,S ,S )  
4
3 2 1 0  
is the encoder state. The remaining bit of the encoder we call the input bit. There is a  
convenient notation that describes the encoder using polynomials. This encoder can be  
3
5
2
3
4
5
described with the encoding polynomials (1+D+D +D ,1+D +D +D +D ). Each factor D  
corresponds to a one clock delay for that adder input. Given the encoding polynomials,  
we can begin to design the Viterbi decoder.  
2.3  
VITERBI DECODER  
To design a Viterbi decoder, begin by taking some example encoder input and  
generating the corresponding output. An example appears in Table 2-1, where it is  
assumed that the encoder is 0 filled at the start. For proper decoding, we need to recreate  
the encoder states and find the set of state changes, or transitions, that produce an  
encoder output that best agrees with our decoder input data. We begin using the same  
assumption we used in generating the encoder output in Table 2-1. We assume that the  
initial state of the encoder is 00000.  
Table 2-1 Input/Output Mapping  
Encoder  
Input  
Encoder  
Output  
1
0
1
1
0
11  
10  
10  
10  
10  
To compare our recreated encoder state with the decoder input, we keep track of the  
agreements between the recreated encoder outputs and the decoder inputs. The  
cumulative agreement for a path leading to one particular recreated encoder state is  
called a path metric for that path. Incremental agreements are called branch metrics.  
2-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
 
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
Figure 2-2 shows a prototype for one state of the state tree for this decoder. Figure 2-3  
shows the entire tree used by the decoder to track the first five input pairs. The state of  
the recreated encoder appears in the box. Along each arrow (a state transition) appears a  
number pair showing the corresponding encoder output bit pair for that state transition  
(the output produced immediately before the state changes). This figure is ordered so  
that transitions corresponding to an input 0 are always the upper path and transitions  
corresponding to an input 1 are always the lower path. To determine the encoder output  
for a given transition, load the encoder with the state and use a 0 or 1 for the input bit,  
according to the transition chosen. The encoder polynomials then determine the encoder  
output, as shown in Figure 2-1.  
Encoder Output/  
Branch Metric  
Path Metric State  
Figure 2-2 Single State of Prototype Encoder Tree Diagram  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
2-5  
 
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
Input:  
11  
10  
10  
10  
00/1  
11/1  
10  
00/1  
00  
11  
10  
01  
01  
10  
11  
00  
11  
00  
01  
10  
10  
01  
00  
11  
01  
10  
11  
00  
00  
11  
10  
01  
10  
01  
00  
11  
11  
00  
01  
10  
11  
00  
01  
10  
10  
01  
00  
11  
00  
11  
10  
01  
01  
10  
11  
00  
10  
01  
00  
11  
11  
00  
01  
10  
01  
10  
11  
00  
00  
11  
10  
01  
3
00000  
00001  
00000  
2
11/1  
00000  
3
10/2  
01/0  
00010  
00011  
00/1  
11/1  
00001  
1
00000  
4
01/0  
10/2  
00100  
00101  
00010  
10/2  
01/0  
2
00001  
00/1  
2
00110  
00111  
11/1  
00/1  
00011  
0
00000  
3
11/1  
00/1  
01000  
01001  
00100  
01/0  
10/2  
3
11/1  
00010  
5
01/0  
10/2  
01010  
01011  
10/2  
01/0  
00101  
1
00001  
00/0  
2
10/2  
01/0  
01100  
01101  
00110  
11/1  
00/1  
1
00011  
2
00/1  
11/1  
01110  
01111  
00111  
0
00000  
5
01/0  
10/2  
11/1  
00/1  
10000  
10001  
01000  
11/1  
00/1  
4
00100  
5
10010  
10011  
01/0  
01001  
4
11/2  
00010  
6
00/1  
11/1  
10100  
10101  
10/2  
01010  
01/0  
10/2  
6
00101  
10/2  
10/2  
8
10110  
10111  
01011  
2
01/0  
10/0  
01/2  
00/1  
11/1  
11/1  
00/1  
00001  
5
11000  
11001  
01100  
10/2  
01/0  
3
00110  
01/0  
3
11010  
11011  
11/1  
01101  
2
00011  
4
11100  
11101  
01110  
00/1  
11/1  
3
00/1  
00111  
4
01/0  
10/2  
11110  
11111  
01111  
AA1542  
Figure 2-3 First Five Levels of the Encoder State Tree  
2-6  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
 
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
Comparing the recreated encoder output with the decoder input, we determine the  
number of agreements, shown in Figure 2-3 as the branch metric. We find a branch  
metric for each transition. For each state, we track the cumulative branch metrics to form  
the path metrics. These are shown as a number appearing above each state box.  
To recreate the correct input sequence, we choose the recreated encoder path that best  
agrees with the decoder input data. In this case, the best path is the one with the largest  
final path metric. For clarity, the path metrics for the best path are distinguished with a  
larger font size and bolder arrows. To recreate the input sequence (so far), we can use  
two methods. The easiest is to use the state as the decoder output. Unfortunately, this  
method will not work after we finish the development of the decoder. The second  
method will work when we are done. To obtain the decoder output, trace the best path  
(the one with the largest final path metric) back to the beginning. Now, follow the same  
path forward again to obtain the input sequence by placing a 0 at the decoder output  
each time we choose an upper transition, and a 1 output each time we choose the lower  
transition. Using this method on the tree in Figure 2-3 gives us 10110, which agrees with  
the encoder input example.  
The most troublesome aspect of this decoder is that the number of states we have to  
track for each decoder input is actually the number of possible paths. For this coding  
example, the number of states doubles for each input. For any reasonable number of  
inputs, the amount of work and storage needed for this decoder is far too large to be  
practical. To solve this problem, begin by noting that we donÕt really need all the data  
generated by the decoder. In particular, all the work goes toward finding the path that  
best agrees with the input. We only need the path that gives us the largest path metric. If  
we determine that a path cannot ever have the largest path metric, we can ignore that  
path for all future calculations.  
To collapse the ever-growing tree in Figure 2-3, consider what happens if we continue  
the tree for one more pair of decoder inputs. The total number of states would be 64 for  
the next input pair. To keep the diagram manageable, only a pair of specially chosen  
states appears in Figure 2-4. When we extend the tree to the next state, we get states with  
six bits. Note, however, that the encoder we are attempting to trace only needs five bits  
to determine its output bits. To emphasize this, the sixth (leading) bit is separated in the  
state boxes. Because the extra (leading) bits do not affect the recreated encoder outputs,  
we can ignore them. As a result, the 64 states collapse into 32 states again. The only  
resulting complication is that each state now has two rather than one entering paths.  
Figure 2-4 shows this state collapse as well as each stateÕs multiple input paths in two  
example states. To correctly process the decoder input, we must next determine which of  
the input paths to keep for each state.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
2-7  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
0 00000  
00  
00  
00000  
00000  
00000  
11  
11  
0 00001  
00001  
11  
1 00000  
1 00001  
00  
11  
00  
10000  
10000  
Figure 2-4 Using Multiple Input Paths to Collapse the State Tree  
To find out how to choose the correct input branch for each state, we use the additivity  
of the path metrics. Suppose we want to find the path with the largest metric that  
extends from time i to time k (i<k). Also, suppose we are interested in another time j  
such that i<j<k. For a given state (encoder state) S at time j, consider two paths P1 and  
P2, extending from time i to time j, both entering state S at time j. To get from state S at  
time j to time k, consider two more paths Q1 and Q2. Our setup so far is illustrated in  
Figure 2-5.  
Let the path metric for P1 up to time j be PM1, and let the path metric for P2 up to time j  
be PM2. Define QM1 and QM2 to be the partial path metrics for their respective paths.  
By partial path metrics we mean the contribution due to the partial path only, that is,  
QM1 is the path metric for path Q1 minus the value of the path metric of state S at time j.  
Finally, assume that PM1>PM2. Then any path containing P2 cannot be the path with  
largest path metric for any time after time j.  
2-8  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
To understand why this is so, note that the total path metric for P1 È Q1 is PM1+QM1.  
Similarly, P1 È Q2 has path metric PM1+QM2, P2 È Q1 has path metric PM2+QM1, and  
P2 È Q2 has path metric PM2+QM2. Suppose P2 È Q2 is a candidate for largest path  
metric. Then P1 È Q2 is still larger, as PM1+QM2 >PM2+QM2 (remembering we  
assumed that PM1 >PM2). This is true for any Q2. So, if PM1 > PM2, we can eliminate P2  
from further consideration at time j, without waiting for the ÒfutureÓ inputs from time j to  
time k.  
Q2  
P1  
Q1  
State S  
P2  
time i  
time j  
time k  
Figure 2-5 Using Intermediate States to Eliminate Partial Paths  
In our example, each state will have two paths entering it at any time. The number of  
paths doubles each time, but we can now eliminate half of the paths each time as well.  
This is the basis of the Viterbi algorithm. To use it, we must keep track of the best path  
metric for each state up to the current time. By eliminating paths that can never have the  
largest path metric, however, we keep the amount of computation constant with time.  
To diagram what is going on, we need a different figure than the tree in Figure 2-3. We  
need only keep track of each (encoder) state at each time. The result is called a trellis  
because it resembles one. A diagram of a trellis for our IS-136 code appears in Figure 2-6.  
The beginning of the trellis does not look the same for each time (each decoder input is a  
stage in the diagram). This is because we assumed the encoder started in state 00000. In  
general, the encoder can be in any state, and the trellis repeats. For our figure, this can be  
seen in stages 6 and 7 (after 5 stages, our encoder can be in any state, so we get a Òsteady  
stateÓ in our trellis diagram). Each line in the trellis is a transition, just like the ones in the  
tree. For the trellis, however, the number of states is bounded. The number of states for  
this trellis is determined by the encoder which has five state bits. This means there are  
5
2 = 32 states in the trellis.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
2-9  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
State  
00000  
00001  
00010  
00011  
00100  
00101  
00110  
00111  
01000  
01001  
01010  
01011  
01100  
01101  
01110  
01111  
10000  
10001  
10010  
10011  
10100  
10101  
10110  
10111  
11000  
11001  
11010  
11011  
11100  
11101  
11110  
11111  
AA1543  
Stage  
1
2
3
4
5
6
7
Figure 2-6 Trellis Structure for Viterbi Decoding  
2-10  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Viterbi Decoder  
Fortunately, we do not have to examine all of this trellis at once! For an idea of the basic  
processing, we can examine states in pairs. In particular, note that we only need path  
metrics of two source states to update a pair of destination states. Note that the source  
state pairs are not the same as the destination state pairs.  
For our example code, the source state pair 0ABCD and 1ABCD provides the path  
metrics needed to update the destination state pair ABCD0 and ABCD1, where ABCD is  
fixed for each state pair update. LetÕs examine a single pair update. This is sometimes  
called a Viterbi butterfly. An example butterfly is shown in Figure 2-7.  
.
State  
11/2  
00100  
7
8
11/2  
00100  
5
6
01000  
01001  
Path metric  
5
01000  
01001  
00/0  
Encoder out/  
branch metric  
00/0  
11/2  
11  
11/2  
Decoder  
6
10100  
11  
10100  
input  
Figure 2-7 Example Viterbi Butterfly  
To compare the input paths to the state being updated, we must take the path metric for  
each state on connecting input branches and add the branch metric associated with that  
path. In the example shown in Figure 2-7, the lower branch from state 00100 has a  
recreated encoder output of 00. As noted before, this can be determined by finding the  
encoder output for a state of 00100, along with an input bit of 1 because we are taking the  
lower branch. Assuming a decoder input of 11, we find that neither bit of the recreated  
encoder output agrees with the decoder input. We assign a branch metric of 0 (no  
agreements). We can do similar assignments for the other branches, as shown in  
Figure 2-7.  
To update the states, we add the path metrics to the branch metrics. Then we compare  
the sums and choose the larger as the surviving branch. The sum of the surviving path  
becomes the path metric for the updated state. We must do this update for all state pairs,  
for each decoder input. When implementing a Viterbi decoder in software, this update is  
the most computation-intensive and creates a do loop that should be optimized to obtain  
good performance. In the next section, we will develop DSP56300 assembly code to  
implement this algorithm.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
2-11  
 
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Algorithmic Enhancements  
2.4  
ALGORITHMIC ENHANCEMENTS  
There are a number of modifications that can be made to the algorithm in order to  
improve its effectiveness as well as our ability to implement it. In our example, we count  
agreements. This produces branch metrics of 0, 1, and 2. In our example butterfly,  
however, the branch metrics into the state pair are all 0Õs and 2Õs. This is not entirely a  
coincidence. For this code, the encoding polynomials both have taps at both extremes.  
What this means is that a 1/0 at the end of the encoder will invert/not invert the encoder  
output. For our trellis, this means that the pair of branches into a given state will always  
have complementary branch metrics (0 and 2, or 1 and 1). By combining this observation  
with one more property, we can simplify the Viterbi butterfly computation.  
The additional property for this code (shown in Figure 2-8) is that a 1/0 at the input bit  
will invert/not invert the encoder output. For our trellis, this means that the two states  
of a state pair being updated will have the same branch metrics, but with their order  
reversed (0 and 2 will become 2 and 0, etc.). These polynomial conditions are not true in  
general but do occur quite often. To take advantage of them, we subtract 1 from every  
branch metric we produce. This gives branch metric pairs of (-1,1) and (0,0).  
As shown in Figure 2-9, each pair now has odd symmetry in the two components.  
Hence, we need not compute branch metrics for both branches in an update. Instead, we  
can find just the uppermost branch metric, and add it as usual for the upper branch.  
Next, subtract rather than add it to do the lower branch processing. To process the lower  
state update, we subtract the branch metric for the upper path and add to update the  
lower path. The total path metrics change, but their relative values do not. Since we are  
only looking for the path with the maximum path metric, the results are the same.  
2-12  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Algorithmic Enhancements  
11/2  
00100  
01000  
If all polys have  
taps at the input,  
branches into the  
same state will  
have complement  
branch metrics.  
5
4
3
2
1
D
4
D
3
D
2
D
1
D
01001  
00/0  
S
S
S
S
S
0
Decoder  
11  
10100  
input  
00100  
If all polys have  
00/0  
01000  
01001  
taps at the last (S )  
4
5
4
3
2
1
D
4
D
3
D
2
D
1
D
state bit, branches  
into the lower state  
pair will be flipped  
versions of the  
upper state  
S
S
S
S
S
0
11/2  
branches.  
10100  
Figure 2-8 Encoder Tap Symmetry to Reduce Calculations  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
2-13  
Freescale Semiconductor, Inc.  
The Viterbi Algorithm  
Algorithmic Enhancements  
11/1  
00100  
After subtracting  
01000  
01001  
one from each  
5
4
3
2
branch metric, the  
updates can be  
be done by adding  
and subtracting 1.  
1
D
4
D
3
D
2
D
1
D
S
S
S
S
S
0
00/-1  
Decoder  
input  
11  
10100  
00100  
Similarly, the lower  
state update can be  
done by subtracting 1  
first, then adding 1  
for the lower branch.  
Same metric, but the  
add/sub order is  
00/-1  
01000  
01001  
5
4
3
2
1
D
4
D
3
D
2
D
1
D
S
S
S
S
S
0
11/1  
reversed.  
10100  
Figure 2-9 Applying an Offset to Obtain Odd Symmetry in the Branch Metrics  
A second modification to the algorithm is the ability to use soft decisions. By this we  
mean that the input data need not be just a 0/1 decision on a receiver statistic. We can  
assign intermediate values to indicate the reliability of the decision. In this way, a 0.001  
can be a very reliable 0, a .75 can be a moderately reliable one, and a .49 is just barely a  
zero. The Viterbi algorithm generalizes to handle soft decisions with no trouble; the  
additions for the metric updates simply have more bits. By using soft decisions, the  
ability of the decoder to correct errors is greatly improved.  
Note that from an implementation on our DSP view, this modification is almost  
invisible, as 16-bit or 24-bit arithmetic will be done regardless of the number of bits in the  
branch metrics. Of course, provisions to protect path metric values from overflow may  
have to change for some systems.  
2-14  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
SECTION 3  
EXPANDING THE VITERBI ALGORITHM  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
3.1  
3.2  
3.3  
3.4  
3.5  
3.6  
3.7  
3.8  
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3  
Partitioning the Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-3  
The Inner Loop: Viterbi Butterflies . . . . . . . . . . . . . . . . . . . . . . . .3-3  
Creating the Branch Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6  
Storing the Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11  
Traceback: Obtaining the Decoder Output . . . . . . . . . . . . . . . .3-13  
Main: Gluing the Pieces Together . . . . . . . . . . . . . . . . . . . . . . .3-17  
Memory Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-19  
3-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Introduction  
3.1  
INTRODUCTION  
Having introduced the principal concepts involved in the Viterbi algorithm, we can  
proceed to generate code to implement it. There are, however, a number of details that  
need attention. We have concentrated on the Viterbi butterfly.The next step involves,  
among other things, generating the branch metrics and allowing for multiple word  
traceback to recover the output data. Up to now, we have not even discussed how the  
path decision might be stored.  
The example code in this section shows a basic implementation of the Viterbi algorithm  
implemented on the DSP56600 assembly language. These examples decode data  
encoded using the convolutional code for IS-136.  
3.2  
PARTITIONING THE TASK  
In this section, we develop code to implement the entire algorithm. We will start with  
the most critical section, the Viterbi butterfly code. We then work outward, developing  
branch metric storage code and traceback instructions.  
All code assumes 16-bit registers and will run as-is on the DSP56600 DSP family. It also  
runs correctly on the DSP56300 family of DSPs when they are set in 16-bit arithmetic  
mode.  
A complete code listing of the modules presented here appears in Appendix A. The code  
can be easily modified for 24-bit operation by changing one line of code, as shown in the  
detailed traceback code discussion. Note that the test data provided is 16-bit data, and  
would have to be changed to 24-bit data to test correctly in 24-bit arithmetic mode. Also,  
Appendix C has a 24-bit version of the code discussed in Section 4.  
3.3  
THE INNER LOOP: VITERBI BUTTERFLIES  
We begin by discussing the code used to do a Viterbi butterfly (updating the states). The  
easiest way to present the code for this section is to display the instructions, then discuss  
their function. The code utilizes the Viterbi shift left (VSL) instruction, designed for  
DSP56300 and DSP56600 DSP families to enhance their ability to execute the Viterbi  
algorithm. The code appears in Example 3-1.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-3  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
The Inner Loop: Viterbi Butterflies  
Example 3-1 Using the VSL Instruction in a Viterbi Butterfly  
;*******************viterbi add, compare, select butterfly macro***  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by  
doing an add, compare, select update for state pairs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should offset addresses by NUMSTATES/2  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,x0,y1,r2,r4,r5,n5 r2 unchanged (modulo required)  
Registers:  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path, states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
D \  
C /  
/
\/  
/\  
D /  
/
\
\
SB------NSB  
C
;*****************************************************************  
;
ACS  
;
macro  
;
move  
move  
move  
#BRY,r2  
y:(r2)+,y1  
l:(r5)+n5,a  
;r2 points to branch metrics  
;get first branch metric  
;load 1st metric/path pair  
;
do  
#NoOfAcsButt,_P_NextStage  
;update each state  
sub  
add  
max  
vsl  
add  
sub  
max  
vsl  
y1,a  
y1,b  
a,b  
l:(r5)-n5,b  
;sub pt,br met,get next pair  
;update metrics  
l:(r5)+n5,a  
;pick max,reload 1st pair  
;store survivor, end top half  
;add pt,br met, reload next pair  
;inc st ptr,ld nxt br met  
;pick max met,load next pair  
;st survivor,end 2nd half  
b,0,l:(r4)+  
y1,a  
y1,b  
a,b  
l:(r5)-n5,b  
x:(r5)+,x0  
l:(r5)+n5,a  
y:(r2)+,y1  
b,1,l:(r4)+  
_P_NextStage  
nop  
;needed to separate do loop ends  
endm  
3-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
The Inner Loop: Viterbi Butterflies  
We now explain the routine line by line. Begin by moving the address of the  
precomputed branch metric table to the r2 address register. In our final, combined  
routine, this is unnecessary because the desired value already exists in r2, but we include  
it as a comment here to make it clear that this value is needed. Branch metric creation  
will be discussed in Section 4.2.  
Next, we update the address register that points to the states used to update. The  
memory that stores the path metrics is divided into two parts. In this routine, r5 points to  
the old path metrics, used to update the new ones. Address register r4 points to the new  
states, those being updated. For each new decoder input, we swap the memory used to  
store the updated path metrics with the memory pointing to the old path metrics. To do  
this, we initialize address registers r4 and r5 to be modulo registers, with a modulus of  
twice the number of states. Address incrementing for the butterfly is so designed that at  
the end of the loop, r4 and r5 have automatically swapped pointer values. Thus, no  
instructions need be dedicated to swapping the memory.  
The path metrics are stored in X memory. Collocated with them in Y memory are the  
paths. By paths, we mean a word that contains the bit decisions describing the recreated  
encoder input data that would produce that path in the decoder. How we get these will  
become clear below as we go through this code. By collocating the path metrics with  
their respective paths, we can get both by doing long memory moves. The next two  
moves load the first branch metric, and the first (state 00000) path metric/path pair.  
Now we are ready for the loop. It is necessary to update every state, and we update  
states in pairs. Hence, the number of loop passes is equal to the number of states divided  
by 2. In our example, this is the value of NoOfAcsButt (i.e., 16).  
Because we have polynomials with taps at both ends of the encoder, we can use one  
branch metric, and add and subtract to update our states. We begin by subtracting the  
branch, loading the second path metric/path pair at the same time.  
Next, we add the branch metric to the path metric we just fetched. Note that for both the  
subtract and add of the metric, the path metrics in A1 and B1 are affected, but because  
we are not performing a long word add, the paths in A0 and B0 are not.  
To compute an updated metric, choose the largest result of the subtract/add operations.  
The MAX instruction puts the updated result in B. Note that all of A is transferred if A1  
is the survivor path, which means that B0 holds the path bits that represent the survivor  
path. This instruction also reloads the first path metric/path pair for use in updating the  
lower state later in the loop.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-5  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Creating the Branch Metrics  
The next instruction is the VSL. VSL is a mnemonic for Viterbi Shift Left, a new  
instruction tailored for Viterbi algorithm updates. The action of this instruction is to take  
an accumulator a or b, store the mid-register (a1 or b1) in X memory, shift the  
accumulator left, append a 0 or 1, as indicated by the instruction arguments, and store  
the low register (a0 or b0) in Y memory. The net result is that path bits are updated and  
the path metric/path pair stored in memory.  
This VSL puts a 0 in the LSB of B0. In this way we store the input bit of this transition  
(which is 0 regardless of the path chosen because the input bit is 0 as long as the  
destination state is the upper state). Because we are storing recreated encoder bits, this  
path string will be a recreation of the encoder input, which is what we want for the  
decoder output. Of course, we can only store 16 or 24 bits of the path at a time (16 for  
DSP56600 or DSP56300 in 16-bit arithmetic mode, 24 for DSP56300 otherwise).  
Next, repeat the operations for the lower state update. Note we subtract and add, rather  
than add and subtract, as required by the lower state update. The add instruction  
reloads the second path metric/path pair, while the sub instruction increments the path  
metric fetch pointer and loads the branch metric from the branch metric table, both for  
use in the next loop iteration. The increment of the path metric fetch pointer is a dummy  
read into x0 (i.e., x0 is not used). This allows us to use a second parallel move to  
increment r5.  
The max instruction finds the survivor path metric/path pair for the lower state, and  
preloads the path metric/path pair for the next loop iteration. The last VSL instruction  
shifts the survivor path left 1 bit, appends a 1 to represent the recreated encoder state for  
the lower state, and stores the lower state path metric.  
We now iterate this loop by the number of butterflies needed, and exit the macro.  
3.4  
CREATING THE BRANCH METRICS  
To present the branch metric routine, we start with the encoding polynomials. For this  
example, we start with the encoding polynomials. We then show how to create the  
branch metrics we need (in the order required) to do the butterfly correctly. Recall that  
3
5
2
3
4
5
the encoding polynomials are 1+D+D +D and 1+D +D +D +D . There are 32 states in  
the decoder, so we need 32 branch metrics. In general, we would access these metrics in  
pairs in the butterfly routine, updating the states in pairs. As noted above, we can save  
the work involved in half of these, because our polynomials induce a symmetry in the  
branch metrics. By this we mean that the branch metrics for this code have the property  
that the upper and lower input branches to any state are complementary.  
3-6  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Creating the Branch Metrics  
Thus, by using a shifted version of the branch metrics, we only need one branch metric  
for each state pair update. This metric is the one that gets subtracted from one path and  
added into the other. We need only generate and store 16 branches.  
Any given branch will, for a specified input, have one of four values. The four values  
will represent how well the decoder input compares to the recreated encoder outputs of  
00, 01, 10, 11. For our example, we assume some soft decision input. The decoder input  
data is shifted and scaled so that 1 is changed to 16, and 0 is changed to -16. We obtain  
two partial branch metrics B and B , where B is the partial branch metric for the  
0
1
0
3
5
encoder polynomial 1+D+D +D output, and B is the partial branch metric for the  
1+D +D +D +D polynomial output. We can then compute the four branches as B +B ,  
1
2
3
4
5
0
1
B -B , -B +B , and -B -B , corresponding to recreated encoder values of  
0
1
0
1
0
1
11, 10, 01, and 00.  
The butterfly loop needs the branches in a specific order dictated by the encoding  
polynomials. As seen in Figure 2-2, we update states 2i and 2i+1 using states i and i+16.  
We only need the 16 branches for the 16 state pairs. The state gives us five of the six  
values needed to determine the encoder output. The remaining value can be found by  
examining the butterfly loop assembly code. In the loop, note that we subtract the  
branch metric first (from the upper branch), then add it to the lower branch. The  
butterfly loop is written assuming that the single branch metric value has the correct  
sign to be added in the lower branch. This is equivalent to assuming the encoder input  
bit is a 1 (for the purposes of computing the branch metric table). We account for this in  
the code by choosing positive 16 for 1Õs and minus 16 for 0Õs (as noted in the preceding  
paragraph). The resulting recreated encoder outputs for the first 16 states are shown in  
the following table.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-7  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Creating the Branch Metrics  
Table 3-1 Recreated Encoder Outputs  
Input  
Bit  
Encoder Map Encoder Output to  
Branch Metric for Example  
Decoder 10 Input  
State  
Output  
Branch Metric  
00000  
00001  
1
11  
B +B  
16 + (-16) = 0  
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
01  
10  
00  
00  
10  
01  
11  
10  
00  
11  
01  
01  
11  
00  
10  
(-16) + (-16) = -32  
-B +B  
0
1
00010  
00011  
00100  
00101  
00110  
00111  
01000  
01001  
01010  
01011  
01100  
01101  
01110  
01111  
32  
0
B -B  
0
1
-B -B  
0
1
0
-B -B  
0
1
32  
-32  
0
B -B  
0
1
-B +B  
0
1
B +B  
0
1
32  
0
B -B  
0
1
-B -B  
0
1
0
B +B  
0
1
-32  
-32  
0
-B +B  
0
1
-B +B  
0
1
B +B  
0
1
0
-B -B  
0
1
32  
B -B  
0
1
The code needed to implement the values in this table appears in Example 3-2.  
3-8  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
 
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Creating the Branch Metrics  
Example 3-2 Find Branch Metrics Code  
;*****************BRANCH METRIC MACRO*************************************  
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Input data and generate branch metrics.  
For this decoder, the metric is a scaled  
sum or difference of the real and imag inputs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r1 should point to the next input XY data pair  
OUTPUTS:  
Branch metrics are stored at BRX in XY memory  
REGISTERS USED:  
a,b,x01,y01,r1,r2,n2, r2 unchanged (modulo req'd)  
;****************************************************************  
;
FindMetrics macro  
move  
move  
mpy  
move  
mac  
l:(r1)+,y  
#-16,x1  
x1,y1,a  
#BRY+3,r2  
x1,y0,a  
;grab dec input  
;sign for real component, 0 sent.  
;a has 0x partial branch  
;storage for generated branch metrics  
;a gets 00 branch  
a,b  
mac  
-x1,y0,b  
;b has 01 branch  
;
neg  
neg  
a
b
a,x1  
b,x0  
a,y:(r2)+n2 ;mv 00 to x1,11 to a, st 00 in location 3  
b,y:(r2)+n2 ;mv 01 to x0,10 to b, st 01 in location 6  
;*****************************************************************************  
;
;
AT this point X1 has 00, X0 has 01,  
A1 has 11, B1 has 10, needed for quick storage in Y memory  
;*****************************************************************************  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
endm  
x1,y:(r2)+n2  
x0,y:(r2)+n2  
b,y:(r2)+n2  
b,y:(r2)+n2  
b,y:(r2)+n2  
b,y:(r2)+n2  
x0,y:(r2)+n2  
x1,y:(r2)+n2  
x0,y:(r2)+n2  
x1,y:(r2)+n2  
a,y:(r2)+n2  
a,y:(r2)+n2  
a,y:(r2)+n2  
a,y:(r2)  
;store 00 in location 9  
;store 01 in location 12  
;store 10 in location 15  
;store 10 in location 2  
;store 10 in location 5  
;store 10 in location 8  
;store 01 in location 11  
;store 00 in location 14  
;store 01 in location 1  
;store 00 in location 4  
;store 11 in location 7  
;store 11 in location 10  
;store 11 in location 13  
;store 11 in location 0. r2-> BRY  
Viterbi Decoder Implementation  
3-9  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Creating the Branch Metrics  
The FindMetrics macro is for the most part straightforward. We begin by loading the  
decoder input data to compute the branch metrics. Next, load the scaling factor, and  
multiply to obtain the partial branch metric with one of the inputs. The next line  
initializes address register r2 as a pointer to the table location used by state 3, where the  
branch metrics are to be stored. We then finish the computation of the branch metric for  
a branch with encoder output 00, followed by the metric for 01. With these two metrics,  
we begin to load the branch metric table.  
We can load the branch metric table in any order. To minimize the number of cycles  
needed, we apply a few constraints. First, it is easier if we load the table in some  
consistent manner such as using a constant address offset each time. Second, it is  
convenient if we end up at the address used by state 0, because then we are initialized  
for the butterfly state update. Also, we must finish generating the metric values for 11  
and 10. It is most efficient if we can do this in parallel with the moves to load the branch  
metric table.  
The easiest way to store the branch metrics might be to start at state 15 and count down.  
If we do this, it turns out that a stall is unavoidable: we cannot generate all of the  
required branch metric values in time to use them with this ordering. Other loading  
orders are easy to obtain by using an address register increment that is relatively prime  
to the table length (and using a modulo addressing mode to wrap around). For our  
table length, any odd number increment will ensure that we cover the entire table with  
constant increments. Our first candidate is 3. If we start at state 3, and increment by 3Õs,  
we will cover the entire table and end up at the address for state 0. These first three  
branch metric values are associated with recreated encoder outputs of 00, 01 and 00.  
Because the repeat of 00 gives us time to generate the extra branch metric values with  
no stall, we use increments of 3.  
We begin by writing the branch metric for state 3. At the same time, we negate A to  
compute the metric for branches with encoder output 11, saving the metric for 00 in x1.  
For the next write, we negate B to compute the metric for 10, saving the metric for 01 in  
x0. We write to the branch metric table at the same time.  
What follows are writes to the branch metric table. Using modulo addressing, we  
increment the address by 3Õs until the entire table is filled. We write the branch metrics  
determined by the encoder polynomials as they appear in Table 3-1.  
Finally, note that this code has been constructed so that the address register r2 points to  
the beginning of the branch metric table at the end of this routine. Because of this, the  
butterfly loop is automatically initialized to load branch metrics by the end of this  
routine.  
3-10  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Storing the Paths  
3.5  
STORING THE PATHS  
For some applications of the Viterbi algorithm, the paths will not need to be stored. This  
occurs when the memory of the encoder (or channel, for maximum likelihood sequence  
estimation) is small enough for the paths in the algorithm to merge quickly. In other  
situations, this may not be adequate. To clarify the storage process, we introduce the  
idea of traceback.  
Assume the decoder has been processing data for some time. If we examine the paths  
up to the time of the current input, it is not clear what path should be chosen. In fact,  
choosing a path to decide what decoder output should correspond to the current input  
would lead to poor performance. This is because we do not have all the data we can get.  
Because the current encoder input has an effect on the decoder inputs for some time  
into the future, future decoder inputs can be used to increase the reliability of current  
decoder output decision values. As a result, we introduce delay into the decoder,  
waiting for future inputs to be processed before making a decision.  
For the Viterbi algorithm, this means that while the current array of paths may not have  
reliable decisions about the present, they do contain reliable data about the past. In  
practice, there is a traceback depth for the decoder, the number of states we need to look  
back before the decision data is assumed to be reliable. For this depth, it is highly likely  
that the paths have converged. This means that all current paths have the same data up  
to this time in the past. If this depth is less than the register size used for the paths (i.e.,  
16 bits for the DSP56600 and the DSP56300 in 16-bit arithmetic mode, 24 bits for  
DSP56300 otherwise), we can choose a path and output the most significant bits of the  
register as decisions.  
If this is not the case, we must make some provision to store the paths. For our example,  
a traceback depth of 16 bits is inadequate. A depth of 24 bits is extremely marginal for  
this example, while 35 bits is probably more than needed. Hence, we implement  
traceback storage. For this example, we actually store paths for the entire sequence  
length (assumed to be 168 bits here), and produce the entire output at the end. It should  
be easy to modify the code to produce partial output data in stages if desired.  
The critical concept in storing paths is to arrange the path data so that we can connect the  
paths for consecutive storage times. The path bits of the survivor path bits at the end  
must be joined with their preceding path bits. Because we stored the preceding path bits  
in memory, we must know where in memory to look for the survivor path. Where in  
memory do we look?  
To see how this is done, consider an example path in register B0 just after a Viterbi  
butterfly update. Assume there are 13 pairs of decoder inputs processed so far.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-11  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Storing the Paths  
Bit #  
15  
8
7
0
0 0 0 1 0 0 1 1 1 0 1 0 1 1 1 0  
B0  
Stored path byte  
Encoder state  
Figure 3-1 Stored B0 Path  
Note that the current encoder state is part of the path! In retrospect, this must be so, as  
both the state (the recreated encoder state) and its path (the potential decoder outputÑ  
the recreated encoder input) represent the same data stream. To use this, store the path  
byte in a memory location with address defined by the time the path bytes are stored,  
offset by the current encoder state. Then, make sure the saved partial path does not  
contain the current encoder state. Instead, save the current encoder state part of the path  
to be stored the next time we save paths.  
To do the traceback, start with the survivor path at the end, and trace the path bits back  
in time to recover the decoder output. When we recover the stored path as part of our  
traceback, the final bits in that path point to the offset memory location where we need  
to read to continue the traceback.  
Hence, if we omit the most recent 5 bits of the path, we can potentially store up to 11 bits  
of path (assuming 16-bit registers). We have chosen to store 8 in this example because  
reconstruction is a little easier if we have data in bytes. There is a trade-off. Storing more  
path bits at a time means we donÕt need to store paths as often (fewer instructions  
executed, less memory used), but the traceback reconstruction is more difficult because  
we have to piece together 11-bit data into 16-bit words.  
The code in our example code makes one more assumption. We assume that the data is  
sent in a block, and that the transmitter ends the block by 0 filling the encoder. The effect  
of 0 filling the encoder is to end everything at state 0. Hence, for traceback, start  
everything at state 0.  
3-12  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Traceback: Obtaining the Decoder Output  
For other codes that do not use this assumption, traceback can either begin at state 0  
anyway (relying on the convergence of paths to give the correct output at the traceback  
depth) or the decoder can search for the state with the maximum path metric to begin  
the traceback.  
3.6  
TRACEBACK: OBTAINING THE DECODER OUTPUT  
This section presents the code needed to obtain the decoder output from the stored path  
data. For this example, we have data organized in blocks of 168 bits. Because the encoder  
is flushed to the zero state at the end, and the data for all 168 bits is stored, the traceback  
can obtain the entire block at once. The most complicated portion of this code is the need  
to obtain the pointer to the next previous state from the current path data.  
The code listings for storage and traceback appear in Example 3-3 and Example 3-4,  
respectively.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-13  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Traceback: Obtaining the Decoder Output  
Example 3-3 Partial Path Storage Code Listing  
;***********************STORE PARTIAL PATH METRICS MACRO***************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: The storage is somewhat twisted. The stored paths are current  
up to the most recent input bit, which is NOT convenient for  
traceback. I process the data as follows: I pre  
loaded the path with 6 bits. Thereafter, I process 8 path bits  
so the path has 14 bits. The most significant 8, I save.  
the remaining bits are the current encoder values for that path.  
The 5 lsb's are the current state.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Updated paths storage X memory, n0 points to next path storage  
REGISTERS USED:  
a,b,x1,r0,n0,r3,r5  
;*********************************************************************  
;
;
;
Store off path data in bytes to avoid overflow in path reg's  
STOREPATHS macro LPCNT  
;
move  
move  
move  
do  
move  
asr  
n0,r3  
n0,r0  
#>$1f,x1  
LPCNT,_PSTORE1  
y:(r5),b  
#5,b,a  
;n0 stores path data pointer  
;mask for 5 lsb's(NUMENCBITS of 1's)  
;store paths for each state  
;grab path  
;align bits 5-12 with a1  
(the 8 bits beyond enc)  
;mask off 5 lsb's to return to path  
;store 8 ms path bits  
;
and  
move  
move  
x1,b  
a1,x:(r3)+  
b1,y:(r5)+  
;return 5 ls path bits  
_PSTORE1  
lua  
lua  
(R4+NUMSTATES),R5  
(r0+NUMSTATES),n0  
;r5 points to latest states  
;update path data pointer  
endm  
To store the paths, begin by recalling the path storage address pointer. In the example  
code, this is stored in n0. We place copies into r3 (to use as the address pointer for data  
writes) and r0 (to update the pointer to the next page at the end of the routine). We then  
mask off the 5 LSBs, and proceed to store the paths. We have designed this macro to  
process a variable number of states, controlled by the macro argument LPCNT. This  
variability will be used in Section 4.  
3-14  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Traceback: Obtaining the Decoder Output  
For each path stored, then, we start by reading in the path. We move the path 5 bits to  
the right, so that A1 has the byte of path data to store right aligned. We mask off all but  
the 5 rightmost bits in B1 to clear the stored path bits. The loop finishes by writing both  
the path data to be stored and the updated path for the path metric.  
The macro finishes by setting r5 for use in the next butterfly, as well as updating N0 to  
point to the next page of stored path data during the next storage interval.  
Example 3-4 Traceback Output Path Code Listing  
;***********************TRACEBACK OUTPUT PATH MACRO*******************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: To output the correct data, we begin at the end. We take the  
output path of the survivor state (0), and place its associated  
output path in memory as the last output data byte. Then we use  
bits 3-7 of that data as an offset pointer to the correct traceback  
data of the next previous path data memory. We continue this until  
we have traced the data back to the beginning.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Decoder output data in Y memory  
REGISTERS USED:  
a,b,y,r0,n0,r2,r5  
;*********************************************************************  
;
;
;
Store off path data in bytes to avoid overflow in path Registers  
TRACEBACK macro  
;
move  
move  
move  
move  
move  
move  
move  
N0,R0  
#0,n0  
;path ptr to r0  
;prep for traceback  
;point to end to trace data  
;recall last path  
;r2 now linear  
;save off last path data  
;control word for extract, 16 bit  
;control word for extract, 24 bit  
#DECOUT+(NUMINPUTS/8/2),r2  
y:(r5),b  
#-1,m2  
b1,x:(r0)  
#$513,x0  
;
;
move  
#$501d,x0  
;*****************BEGIN TRACEBACK*****************************  
;
IF (EVEN==0)  
move  
x:(r0+n0),a  
;recall last path  
extractu x0,a,b  
;bits 3-7 of a1 point to next data  
;dec r0 to next earlier state set  
;move to upper byte  
lua  
(r0-NUMSTATES),r0  
lsl  
#8,a  
move  
move  
ENDIF  
b0,n0  
a1,y:(r2)-  
;load as offset for traceback  
;save off  
;
Viterbi Decoder Implementation  
3-15  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Traceback: Obtaining the Decoder Output  
Example 3-4 Traceback Output Path Code Listing (Continued)  
do  
move  
#NUMINPUTS/8/2,TRCBK  
x:(r0+n0),a  
;once for each byte pair  
;recall last path  
extractu x0,a,b  
;get ptr to next earlier path  
;point r0 to next earlier states  
;save ptr as offset  
;save out byte in x1  
;do it all again!  
lua  
(r0-NUMSTATES),r0  
move  
move  
move  
b0,n0  
a1,x1  
x:(r0+n0),a  
extractu x0,a,b  
lua  
move  
lsl  
or  
(r0-NUMSTATES),r0  
b0,n0  
#8,a  
x1,a  
;move to upper byte  
;or in last byte to get 16 bit word  
;store result  
move  
a1,y:(r2)-  
TRCBK  
endm  
We begin the traceback in Example 3-4 by moving the path storage pointer to r0 and  
initializing n0 for its use as an offset. Next we position r2 at the end of the buffer that will  
contain the decoder output. Again, this is because we will traceback the path from the  
end survivor path to the beginning. For our example, we can start at state 0. Start by  
taking the current path for state 0 (r5 points to this) and reading into B. We put r2 in  
linear addressing mode, then store the last path (the loop will read it again). Finally, we  
load x0 with the control code ($513) to extract the pointer to the next traceback location  
from the path.  
The next section of code is used if the number of bytes of data is odd. It decodes the last  
byte, and places it in the most significant byte of decoder output. As this code is like the  
last half of the following do loop, discussion of this code is deferred.  
The do loop processes the traceback two bytes at a time. In this way, it can assemble the  
decoder output into 16-bit words for more efficient storage. Begin by reading the current  
path into A1. Then, extract the pointer to the next (actually previous) path using extract.  
Recall the control register contains $513, which means we take bits $13-$17 of A or,  
equivalently, bits 3-7 of A1. Note that we can load $501d instead and the program will  
work in 24-bit mode, although the example data provided would require 00 appended to  
every data symbol to test correctly. As noted above, these most significant bits of the  
path byte point to the storage address of the path continuation in the previous ÒpageÓ of  
memory. By page, we mean the block of stored path bits for each state. There is one page  
written each time we store paths (every 8 decoder input periods).  
Next, move r0 to point to the previous page, then move the extracted path pointer bits to  
n0 to be used in the next traceback read. We end processing of this least significant byte  
by moving it to x1.  
3-16  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Main: Gluing the Pieces Together  
Again, we read the next (previous) path into A1 using r0, which points to the previous  
page, and n0, which points to the correct path in that page. We extract the new path  
pointer information, update r0 to the next previous page, and move the path pointer to  
n0. Move the path data to the upper byte, OR in the lower byte from x1, and move the  
resulting word into memory using r2. The loop continues until we have traced back to  
the beginning, and y:DECOUT points to the start of the decoder output.  
3.7  
MAIN: GLUING THE PIECES TOGETHER  
The only thing left is to put the pieces together. In Examples 3-5 through 3-7, we present  
the main routine. Its function is to initialize registers so everything starts properly, and  
to invoke the macros at the needed time.  
Example 3-5 Main Viterbi Decoding Routine: Initialization  
;***************************MAIN***********************************************  
;
NUMSTATESequ  
32  
ENCBITS  
NoOfAcsButt  
NUMINPUTSequ  
EVEN  
;
equ  
equ  
168  
equ  
6
;most cases=log2(NUMSTATES)+1  
NUMSTATES/2  
1-(NUMINPUTS/8)%2 ;EVEN SET TO 1/0 IF NUMINPUTS IS  
EVEN/ODD #BYTES  
org  
p:$400  
VITDEC move  
move  
move  
move  
move  
move  
move  
move  
move  
;
#NUMSTATES/2-1,m2  
#>3,n2  
#STATE1,r5  
;r2 points to branch metric table  
;increment for branch metric storage  
;r5 points to current state metric  
;r4 points to updated state metric  
;both modulo to flip loc each sym  
#STATE2,r4  
#NUMSTATES*2-1,m4  
#NUMSTATES*2-1,m5  
#NUMSTATES/2,n5  
#PATHOUT,n0  
;input metrics spacing for each butterfly  
;n0 points to storage for output paths  
;set linear mode, traceback ptr  
#-1,m3  
move  
;
#INDATA,r1  
;r1 points to input data  
Example 3-5 begins the Viterbi decoding routine by setting equates for the encoder size  
and input data length, and initializing address registers. NUMSTATES is set to the  
number of encoder states, which for our example is 32. ENCBITS is the number of bits  
used to encode. This includes the input bit as well as the state bits. NoOfAcsButt sets the  
number of ACS butterflies for the butterfly loop. For our rate 1/2 code, this is  
automatically set to half the number of states. The EVEN flag is used by the assembler to  
include an extra half-byte of traceback if the number of input bytes of data is odd.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-17  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Main: Gluing the Pieces Together  
Next, we begin the executable code. The first two lines initialize address registers m2  
and n2 so that r2 addressing wraps around and increments by 3ÕsÑboth for branch  
metric storage. The next 5 lines initialize address registers r4 and r5 to wrap around for  
path metric/path storage to begin them pointing to the correct places in the storage  
table. Offset register n5 is set to access path metrics spaced halfway apart in the storage  
table.  
The initialization finishes by pointing n0 to the path storage table start, making sure r3  
is in linear mode for traceback, and pointing r1 to the input data. Note that the r1  
addressing mode is not set. We assume that an outside calling routine does this, and  
permit circular input buffers if desired.  
Example 3-6 Main Viterbi Decoding Routine: Patch Metric Update  
;
;***********************PREPARATION LOOP*************************  
;
;
;
;
;
This loop iterates by the number of bits used in the  
encoder to pre load the bit decisions. Thereafter,  
the paths are updated and stored off in bytes. We need  
the preload bits so that the stored path metrics point  
correctly to their previous paths for traceback  
;****************************************************************  
;
do  
#ENCBITS-1,PRELP  
;once for each encoder bit  
;
FindMetrics  
ACS  
PRELP  
;*****************MAIN LOOP--PROCESS BYTES OF DATA*****************  
do  
do  
#NUMINPUTS/8-1,DATALP  
#8,SYMLP  
;process bytes of output  
;8 bits per byte  
;
FindMetrics  
ACS  
SYMLP  
STOREPATHS#NUMSTATES  
DATALP  
These next two sections do much of the decoding work. The first loop processes  
ENCBITS of input data. We process 1 pair of decoder inputs for each iteration of the  
macros FindMetrics and ACS. This preprocessing loads ENCBITS-1 of data into the  
paths. Thereafter, we process the decoder input pairs in groups of eight, producing 8  
path bits for each state. For every 8 decoder input pairs processed, we invoke the  
STOREPATHS macro to store the paths.  
3-18  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Memory Organization  
By preloading ENCBITS-1 and storing the leftmost eight every time we store paths, we  
ensure that the paths always have (ENCBITS-1)+8 bits at the time we store paths.The  
extra ENCBITS-1 bits are, as noted above, the current state, and are kept till the next  
path store time to allow us to traceback. The main loop processes all but the last  
8-ENCBITS+1 sets of decoder input data.  
Example 3-7 Main Viterbi Decoding Routine: Termination and Traceback  
;
;********POSTPROCESSING,LAST INFO BYTE ********************************  
;
; ENCBITS PREPROCESSED, SO WE HAVE 8-ENCBITS BITS LEFT.  
; FOR THIS EXAMPLE, WE HAVE 3INPUTS LEFT TO PROCESS  
;**********************************************************************  
;
;
;
THE LAST THREE BITS------  
do  
#8-ENCBITS+1,FLSH1;process last 3 bits to get last byte  
;
FindMetrics  
ACS  
FLSH1  
;*******Traceback the path data to obtain the decoder output**********  
TRACEBACK  
FIN  
nop  
Example 3-7 shows the final processing for the data block. At this time, we have  
processed all but the last byte, i.e., 8 bits less the ENCBITS-1 processing we did in  
preprocessing. This means we have 3 bits of data left to process. After processing the  
last decoder input, the TRACEBACK macro is invoked to obtain the decoder output.  
The decoded data is memory at y:DECOUT, in 16-bit word form.  
3.8  
MEMORY ORGANIZATION  
Example 3-8 shows all memory organization except the input data defines used to test  
the code. The first reserved spaces are dedicated to the storage for the path metrics (in X  
memory) and their respective paths (in Y memory). Note that these must be collocated.  
For this code, we initialized the metrics for that state 00 has a large path metric, and the  
rest are 0. We do not initialize the path metrics in executable code! This allows the  
decoder to operate on data over multiple invocations if desired. If the decoder is  
operating on independent data blocks, each started assuming the encoder starts in the 0  
state, then this memory will have to be initialized accordingly. Another alternative in  
this case would be to modify the butterfly loop in the preprocessor to use the fact that  
the encoder starts in the 0 state. This option is explored in Section 4.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
3-19  
Freescale Semiconductor, Inc.  
Expanding the Viterbi Algorithm  
Memory Organization  
Example 3-8 Memory Organization Code  
;**********************MEMORY ORGANIZATION****************************  
;
;
;
;
;
;
MOST OF THE MEMORY LOCATION IS IMPORTANT--THE FIRST TWO  
LABELS OF THE X AND Y DATA MEMORY ARE PAIRED AND MUST  
BE CO LOCATED (AT THE SAME ADDRESSES). IN ADDITION, STATE1  
AND STATE2 MUST BE LOCATED ON A 0 MOD 2*NUMSTATES BOUNDARY,  
BRY MUST BE ON A 0 MOD NUMSTATES/2 BOUNDARY.  
X and Y memory for input data must be paired --GOOD LUCK.....  
;*********************************************************************  
;
org  
STATE1 DC  
DC  
STATE2 DC  
DC  
x:$0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
;
PATHOUT DS  
NUMSTATES*(NUMINPUTS/8+1)  
INDATA ;THIS DATA ENCODES  
;$1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$00 ;....  
;I didnÕt include the test data in this listing...  
org  
DC  
DC  
DC  
DC  
y:$0  
PATH1  
PATH2  
BRY  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
NUMSTATES/2  
DSM  
DECOUT DS  
org  
DC  
NUMINPUTS/8  
y:@cvs(y,INDATA)  
$a000,$a000,$a000,$6000;....  
YDATA  
The path metric/path memory must start on a 0 mod 2*NUMSTATES boundary for the  
modulo addressing to work properly. In addition, the Branch metric storage BRY must  
start on a 0 mod NUMSTATES/2 boundary to operate correctly. Finally, the storage of  
the input data must be paired in X and Y memory, because we do long moves to get the  
decoder input data pairs into the branch metric calculation.  
3-20  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
SECTION 4  
ALGORITHMIC EXTENSIONS  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
4.1  
4.2  
4.2.1  
4.2.2  
4.3  
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3  
Allowing More General Branch Metrics . . . . . . . . . . . . . . . . . . . .4-3  
Modify Viterbi Butterfly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4  
Modify Branch Metric Generation . . . . . . . . . . . . . . . . . . . . . .4-6  
Starting from 0: The Pre ACS Macro. . . . . . . . . . . . . . . . . . . . . .4-8  
Collapsing the States. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-10  
Main: Putting the Pieces Back Together . . . . . . . . . . . . . . . . . .4-12  
4.4  
4.5  
4-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Introduction  
4.1  
INTRODUCTION  
This section shows how to adjust the code of Section 3 to include several enhancements.  
Generalized branch metrics are allowed, as well as optimizing the code to reduce the  
computations when data occurs in blocks.  
The code in this section introduces several modifications to the code discussed in  
Section 3. Section 4.2 modifies the handling of branch metrics to allow nonsymmetrical  
branch metrics. Section 4.3 and Section 4.4 develop two additional macros useful if the  
data is blocked and the encoder is periodically forced through a known state. We put all  
of these routines together to produce a modified Viterbi algorithm decoder that has  
autonormalized path metrics and efficient coding for blocked data. Note that a complete  
listing of this code appears in Appendix B.  
4.2  
ALLOWING MORE GENERAL BRANCH METRICS  
The code introduced so far is efficient for algorithms that have special branch metrics. In  
particular, the branch metrics are assumed to be inverses of each other and flipped  
between state pairs. Although this is true for many popular codes in use today, it is not  
necessarily the case. If a polynomial set is specified where one of the encoding  
polynomials does not have a tap at one end, the branch metrics lose some of their  
symmetry. A simple example is the polynomial set (1,1+D), whose trellis appears in  
Figure 4-1.  
S
0
00  
01  
0
S
0
11  
+
10  
1
Figure 4-1 Polynomial (1,1+D) Trellis  
Note that the input branches to State 0 are not complements of each other! We might  
accommodate this with a fractional branch offset to make branch 00 and branch 01  
negatives, but note that the offset for the lower state would have to be different. Another  
approach is to compute separate branch offsets for the upper and lower state updates.  
This requires a different version of the branch metric generation as well as the butterfly  
loop. We consider modified code for each of these functions in the examples that follow.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
4-3  
 
 
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Allowing More General Branch Metrics  
4.2.1  
Modify Viterbi Butterfly  
Begin modification with the butterfly loop. The code from Section 3 reads in the branch  
metric from y memory near the loop end (the y:(r0)+,y1 that appears two lines  
from _P_NextStage). The easiest way to get two branch metrics is to do long reads on the  
branch metrics. For the code in Section 3, however, finding the space to do the extra  
branch read is harder. Most of the data movement is tightly controlled and cannot be  
moved in the code without disrupting the data flow. Instead, we can make use of the  
pipeline stall in the Viterbi butterfly loop. The modified code appears in Example 4-1.  
Example 4-1 Modified Viterbi Butterfly  
;*******************viterbi add, compare, select butterfly macro***  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by  
doing an add,compare,select update for state pairs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should offset addresses by NUMSTATES/2  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
Registers:  
a,b,y01,r2,r4,r5,n5 r2 unchanged (modulo req'd)  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path,states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
D \  
C /  
/
\/  
/\  
D /  
/
\
\
SB------NSB  
C
4-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
 
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Allowing More General Branch Metrics  
Example 4-1 Modified Viterbi Butterfly (Continued)  
;*****************************************************************  
;
ACS  
;
macro  
;
move  
move  
move  
#BRY,r2  
l:(r2)+,y  
l:(r5)+n5,a  
;r2 points to branch metrics  
;get first branch metric  
;load 1st metric/path pair  
;
;
do  
#NoAcs,_P_NextStage  
;update each state  
;sum pt,br met,get next pair  
;update metric  
add  
add  
max  
move  
vsl  
add  
add  
max  
move  
vsl  
y0,a  
y1,b  
a,b  
l:(r5)-n5,b  
l:(r5)+n5,a  
;pick max, reload 1st pair  
;load next branch metrics  
;save surivior, end top half  
;sum pt,br,reload next pair  
;sum again,inc pt mt read ptr  
;pick max, load next pair  
;load next branch metrics  
;write survivor,end 2nd half  
l:(r2)+,y  
b,0,l:(r4)+  
y1,a  
y0,b  
a,b  
l:(r5)-n5,b  
(r5)+  
l:(r5)+n5,a  
l:(r2)+,y  
b,1,l:(r4)+  
_P_NextStage  
nop  
;needed to separate do loop ends  
endm  
This code is almost identical to the butterfly code in Section 3, with the following  
differences. The first executable line is a long move to register Y. This allows us to grab  
two branch metrics at the same time. For this example, we show both branch metrics  
being added to the path metrics to update the states. Line 9 in the loop (the assembly line  
move l:(r2)+,y) is a long read of two more branch metrics. This is an extra line of code,  
but it executes in the same number of cycles as the original code because it takes the  
place of a pipeline stall. Finally, note that address register 5 is still incremented, but  
without a dummy read into a register.  
The loop uses the same branch metrics for both upper and lower path metric updates.  
This is what is needed for Example 4-1, where we still have some symmetry because  
both encoding polynomials have a tap on the input bit. If this is not the case, the code can  
accommodate other conditions. A more general setup is to uncomment the long move  
that reads additional branch metrics in line 4 of the loop. This line will also take the place  
of a pipeline stall, and so does not increase the number of cycles needed to execute the  
butterfly.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
4-5  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Allowing More General Branch Metrics  
4.2.2  
Modify Branch Metric Generation  
The branch metric generation needs modification also. Example 4-2 shows the user a  
different motivation for noncomplementary branch metrics. In this example, we use  
convolutional code from Section 3, but take the path metric for state 0 and subtract it  
from all branch metrics. This is a computationally inexpensive way to automatically  
normalize the path metrics so that we need not worry about arithmetic saturation of  
path metrics when decoding very long streams of data. It does, however, destroy the  
symmetry of the branch metrics.  
Example 4-2 Modified Branch Metric Generation Code  
;*****************BRANCH METRIC MACRO*************************************  
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Input data and generate branch metrics. This function  
subtracts the path metric for state 0 from all branch metrics  
to provide autonormaliztion. For this decoder, the metric is a scaled  
sum or difference of the real and imag inputs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r1 should point to the next input XY data pair  
OUTPUTS:  
Branch metrics are stored at BRX in XY memory  
REGISTERS USED:  
a,b,x01,y01,r1,r2,r5, r5 unchanged,r2 unchanged (modulo req'd)  
;****************************************************************  
;
FindMetrics macro  
move  
neg  
move  
mac  
move  
mac  
subl  
tfr  
x:(r5),a  
a
;st. metric scaling, read last st. 0  
l:(r1)+,y ;negate metric|grab dec input  
;sign for real component, upper br.  
#-16,x1  
x1,y1,a  
y1,x0  
x1,y0,a  
a,b  
y1,a  
-x1,y0,a  
x0,b  
a,b  
;comp 0x partial br. path mt. to  
;mv real input to x0  
a,y1  
b,x0  
a,y1  
b,x1  
b,y0  
y1,x0  
;a gets 00 br. y1 gets 0x partial br  
;a has 00 br, b has 11 br, save path  
;swap 0x and 00 to compute 01 branch  
;a gets 01 br  
;swap path metric,11 branch to y0  
;b gets 10 br, x0 has copy of 00 br.  
mac  
tfr  
subl  
a,b  
;*****************************************************************************  
;
;
;
AT this point, X is configured with br 11|00, y with 00|11, and  
AB with 01|10, BA with 10|01 all needed for quick storage in XY  
memory.  
;*****************************************************************************  
4-6  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
 
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Allowing More General Branch Metrics  
Example 4-2 Modified Branch Metric Generation Code (Continued)  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
endm  
x,l:(r2)+  
ab,l:(r2)+  
ba,l:(r2)+  
y,l:(r2)+  
y,l:(r2)+  
ba,l:(r2)+  
ab,l:(r2)+  
x,l:(r2)+  
ba,l:(r2)+  
y,l:(r2)+  
x,l:(r2)+  
ab,l:(r2)+  
ab,l:(r2)+  
x,l:(r2)+  
y,l:(r2)+  
ba,l:(r2)+  
;store 11 in location 0.  
;store 01 in location 1  
;store 10 in location 2  
;store 00 in location 3  
;store 00 in location 4  
;store 10 in location 5  
;store 01 in location 6  
;store 11 in location 7  
;store 10 in location 8  
;store 00 in location 9  
;store 11 in location 10  
;store 01 in location 11  
;store 01 in location 12  
;store 11 in location 13  
;store 00 in location 14  
;store 10 in location 15 r2-> BRX  
For this example, we are using the same code as in Section 3 to make the unnormalized  
branch metrics symmetrical. For easy storage in memory, we want to generate metric  
pairs. Each pair should have the values needed for the upper and lower branch metric  
value needed for the butterfly. For this example, the branch metric for encoder output 00  
will be paired with the branch metric value for 11 and the value for 01 paired with the  
value for 10. All branch metrics will have the previous path metric for state 0 subtracted  
from them. This means that after the Viterbi butterfly update, each path metric value  
will have been normalized. It can be shown that this limits the worst case path metric  
values to be no more than 12 times the maximum branch metric value (for this code  
example).  
The branch metric macro begins by reading in the last path metric for state 0. It then  
negates that value in accumulator a and reads the decoder input into register y. The data  
in y is assumed to be in the form y1:y0 mapped to real:imaginary data input. Next, the  
scaling factor -16 is placed in register x1.  
The mac instruction finds a partial branch metric by multiplying the real data by the  
scaling factor and adding the result to the negated path metric value in a. We also save  
the negated path metric value to b for later use.  
Next, we move the real decoder input from y1 to x0. The next mac instruction multiplies  
imaginary input by the scaling factor and adds it to a. The accumulator a now has the  
normalized branch metric for encoder 00 outputs. At the same time, the partial branch  
metric that was in a is saved in y1 for later use.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
4-7  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Starting from 0: The Pre ACS Macro  
The subl a,b instruction does a shift left and subtract operation on b and a to obtain the  
branch metric for encoder 11 outputs. The negated path metric value is saved in x0. The  
transfer instruction and its parallel move swap y1 with a to save the 00 branch metric  
and place the partial branch metric for the next computation.  
The next mac instruction scales the imaginary part and subtracts it from the partial  
branch metric to obtain the branch metric for encoder 01 outputs. The 11 branch metric is  
saved in register x1. The negated path metric is placed in b, and the 11 branch metric is  
move to y0.  
The final subl instruction shift left and subtracts the negated path metric from the 01  
branch to get the 10 branch metric in b. The 00 branch metric is copied to x0.  
After all these moves, we find that register x1:x0 contains branches 11:00, register y1:y0  
contains branches 00:11, and a1:b1 has branches 01:10. Note that we can access  
accumulators as ab or ba to get branches 01:10 or 10:01. All we have to do now is to place  
the branch metric pairs in memory to correspond with their states for the Viterbi  
butterfly loop.  
Unlike the case in Section 3, our branch metric generation allows us to use a simple  
memory placement scheme. We begin with the memory location for state 0, and  
increment through the states until the r2 address pointer rolls over to 0 after location 15  
(r2 is set to be a modulo 16 address pointer). Because we have all possible needed branch  
metric pairs in some register, we just code each move with the branch metric pair for that  
state.  
4.3  
STARTING FROM 0: THE PRE ACS MACRO  
For data that is coded in packets, it is not unusual for the encoder to start in a known  
state. The easiest case is when the encoder begins each data packet starting in state zero.  
This is what is assumed for this example. For this case, the beginning trellis is shown in  
Figure 4-1. Notably, the number of states to be updated changes from 2 to 4 to 8, etc.,  
doubling until the full 32 states is reached. Until that time, we have fewer computations  
to do because the number of states is smaller and because no comparison of competing  
branches is needed until 32 states of the trellis are being used. We can reduce the  
computation needed for a packet by treating the first five decoder inputs separately.  
The code to do this appears in the PreACS macro in Example 4-3. The first two moves  
save the state address pointers to be restored at the routineÕs end. Next the branch  
metrics are loaded into the accumulators b and a. Finally, the path metric/path pair is  
loaded into x.  
4-8  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Starting from 0: The Pre ACS Macro  
The do loop processes each state update. The branches are added to the state to obtain  
the state updates. The second of these adds also loads in the next path metric/path pair  
for the next loop iteration. The VSL instructions store the updated path metrics, and the  
move loads the next branch metrics.  
After the loop, we double n5 (used to vary the number of loop iterations), restore r4 and  
r5 to swap path metric read/write addresses, and reinitialize the branch metric pointer  
for use by the branch metric update routine.  
Example 4-3 Pre-ACS Macro  
;*******************************;PreACS***************************************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by  
ACS butterfly from assumed 0 state to start, and double  
the number of states on each invocation until the full  
trellis is used. This routine CANNOT be used unless the  
encoder is starting from an all 0's state. Note this  
means it cannot be used if the data is a continuation  
of a previously processed data stream. Use the ACS macro  
instead. However, for packetised data, or other data that  
assumes the data starts with the encoder 0 filled, this  
routine saves cycles, AND only state 0 needs to be initialised  
before starting the decode.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should be the number of input states/2 to process  
n5 is doubled each time this macro is invoked  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd)  
Registers:  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path,states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
C
D \  
\
\
\
\
SB  
NSB  
Viterbi Decoder Implementation  
4-9  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Collapsing the States  
Example 4-3 Pre-ACS Macro (Continued)  
;*****************************************************************  
;
PreACSmacro  
;
;
;
;
move#BRY,r2  
mover5,r3  
mover4,n3  
movel:(r2)+,ba;get first branch metrics  
movel:(r5)+,y ;load 1st metric/path pair  
;r2 points to branch metrics  
;save r5 to init r4 later  
;save r4 to init r5 later  
do n5,_P_NextStageupdate each state  
addy,a  
addy,bl:(r5)+,y;updt met, ld nxt pair  
vsla,0,l:(r4)+;end top half  
;update metric  
vslb,1,l:(r4)+;end 2nd half  
movel:(r2)+,ba;ld br met  
_P_NextStage  
moven5,b  
;recall loop count  
aslb n3,r5 ;mult it by 2  
mover3,r4  
moveb,n5  
move#BRY,r2  
endm  
;storage for loop count  
;reinit branch ptr  
4.4  
COLLAPSING THE STATES  
For codes that 0 fill the encoder to create an end of data block, we can reduce the number  
of cycles needed to finish the Viterbi decoder processing. Example 4-4 assumes the coder  
ends a block by 0 filling the encoder. For the decoder, this means that the last 5 inputs to  
a block are known beforehand to take the upper state. There are two cycle-saving  
consequences to this. First, we need not update the lower state in the butterflies. Second,  
each of the last five inputs creates only half the number of states from the previous input.  
For the last input, we update state 0 only, giving us a single starting point for traceback.  
The code appears in Example 4-4.  
4-10  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Collapsing the States  
Example 4-4 The ACSFlush Macro  
;***********************ACSFlush Macro****************************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: This routine is very similar to the ACS macro, except that  
the encoding shift register is now flushing back to 0.  
This means that only the upper paths are taken, halving  
the number of states we need to update. The ACS code is  
modified so that we don't compute the lower paths. In  
addition, state storage is modified so that survivor  
paths are stored in consecutive memory locations. Survivors  
are even states on the first pass, then every fourth state,etc.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should be the number of input states/2 to process  
n5 is halved each time this macro is invoked  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,n2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd)  
SA------NSA  
C
/
/
/
D/  
/
/
SB  
;*****************************************************************  
ACSFlush macro  
;**  
move  
move  
move  
move  
move  
do  
#BRY,r2  
r5,r3  
r4,n3  
l:(r2)+n2,y  
l:(r5)+n5,a  
n5,_NextStage  
;r2 points to branch metrics  
;save r5 to init r4 later  
;save r4 to init r5 later  
;get first branch metrics  
;load 1st metric/path pair  
;update each state  
add  
add  
max  
move  
vsl  
y0,a  
y1,b  
a,b  
l:(r5)+n5,a  
b,0,l:(r4)+  
l:(r5)-n5,b  
l:(r2)+n2,y  
(r5)+  
;updt met,load next pair  
;updt met,ld nxt br met  
;sel surv,save met,inc st ptr  
;ld next pair  
;end 2nd half  
_NextStage  
move  
n5,b  
;recall flush count, loop count  
asr  
b
n2,a  
;divide it by 2,prep double branch jump  
move  
move  
asl  
n3,r5  
b,n5  
a
;storage for flush count  
;reinit branch ptr  
(r2)-n2  
move  
move  
endm  
r3,r4  
a,n2  
;next pass, skip more branches  
Viterbi Decoder Implementation  
4-11  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Main: Putting the Pieces Back Together  
We have placed a comment at the beginning, moving the branch metric table address to  
r2. Since this is done in the FindMetrics routine, we donÕt actually need to execute this,  
but it is included as a reminder that r2 needs to be set properly. The remainder of this  
routine is like the ACS butterfly, except that parts are removed, and the number of  
iterations varies.  
We store the beginning values of r4 and r5 so that we can restore them at the end,  
swapped with each other. Because the number of loop iterations varies, we donÕt get the  
automatic swapping that the ACS macro has. The first branch metrics are fetched, and  
the updated count value is stored in memory. We then load the first path metric/path  
pair.  
The loop is quite similar to the ACS except we have eliminated unneeded operations. We  
add a branch metric and load the next path metric/path pair (for the lower path). We  
then add a branch metric and read the branch metrics for the next loop iteration. Taking  
the MAX of the accumulators chooses the survivor path metric/path pair, and the path  
metric read pointer is incremented for the next loop in parallel. The move loads the path  
metric/path pair for the next loop. The VSL completes the current state update for the  
upper state.  
Note that the state storage pointer is incremented only once, even though we only  
update the upper state. This means that state 2 will be written at the address normally  
used for state 1, state 4 at the address used for state 2, etc. The next invocation of the  
ACSFlush will read the data correctly, because the read pointer increment, r5 is halved  
for each invocation of this macro. During the next invocation, we will only have states 0,  
4, 8, etc. These are exactly the states that are possible when the encoder is being 0 filled.  
To end this macro, we restore r4 and r5 to the desired values, swapping old and update  
memories as needed. The loop count is read from n5, halved, and restored, so that we  
process half as many states on the next loop invocation.  
4.5  
MAIN: PUTTING THE PIECES BACK TOGETHER  
There are a number of changes to the main program that are required for these new  
macros to operate properly. The code for MAIN appears in the following example.  
4-12  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Main: Putting the Pieces Back Together  
Example 4-5 Main Program Code Changes  
;***************************MAIN***********************************************  
;
NUMSTATESequ  
ENCBITS  
32  
equ  
6
;most cases=log2(NUMSTATES)+1  
NoAcs  
NUMINPUTSequ  
equ  
NUMSTATES/2  
168  
EVEN  
;
equ  
1-(NUMINPUTS/8)%2 ;EVEN SET TO 1/0 IF NUMINPUTS IS  
EVEN/ODD #BYTES  
org  
p:$400  
VITDEC move  
move  
move  
move  
move  
move  
move  
#NUMSTATES/2-1,m2  
#BRX,r2  
#STATE1,r5  
;r2 points to branch metric table  
;r5 points to current state metric  
;r4 points to updated state metric  
;both modulo to flip locations each sym  
#STATE2,r4  
#NUMSTATES*2-1,m4  
#NUMSTATES*2-1,m5  
#>1,n5  
#NUMSTATES/2,n5  
#PATHOUT,n0  
#-1,m3  
;ctr/input metrics spacing for each butterfly  
;input metrics spacing for each butterfly  
;n0 points to storage for output paths  
;set linear mode, traceback ptr  
;
move  
move  
move  
move  
#>1,n2  
;
;
move  
#INDATA,r1  
;r1 points to input data  
;***********************PREPARATION LOOP*************************  
;
;
;
;
;
This loop iterates by the number of bits used in the  
encoder to pre load the bit decisions. Thereafter,  
the paths are updated and stored off in bytes. We need  
the preload bits so that the stored path metrics point  
correctly to their previous paths for traceback.  
;****************************************************************  
;
do  
#ENCBITS-1,PRELP  
;preload decisions/trellis start  
;
FindMetrics  
PreACS  
;
ACS  
PRELP  
move  
#NUMSTATES/2,n5  
;n5 now serves as offset between fetch  
states  
;
;
;*****************MAIN LOOP--PROCESS BYTES OF DATA*****************  
do  
do  
#NUMINPUTS/8-2,DATALP;process bytes of output  
#8,SYMLP ;8 bits per byte  
;
FindMetrics  
ACS  
SYMLP  
STOREPATHS#NUMSTATES  
DATALP  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
4-13  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Main: Putting the Pieces Back Together  
Example 4-5 Main Program Code Changes (Continued)  
;********POSTPROCESSING,LAST 2 INFO BYTES & FLUSH ENCODER BACK TO 0****  
;
; THESE SETUPS SHOULD ACCOMODATE ENCODERS OF 5 TO 8 BITS.  
; ENCBITS-1 PREPROCESSED, SO WE HAVE 16-ENCBITS+1 BITS LEFT.  
; WE DO ENCODER FLUSHING FOR #ENCBITS-1 BITS. SO WE HAVE  
; (16-ENCBITS+1) - ENCBITS+1 DO DO BEFORE FLUSHING.  
; NEXT, WE PROCESS THE ENCODER FLUSH BITS, STOPPING TO STORE  
; PATHS AS NEEDED. WE STORE AFTER 8 BITS PROCESSED SO THIS MEANS  
; STORE PATHS AFTER 16-2*ENCBITS+2 (from nonflush) +8-(16-2*ENCBITS+2)  
; (these are the additional flush bits needed before storing paths).  
; FINALLY, WE FINISH OUT THE DATA. THE BITS REMAINING ARE:  
; 16-ENCBITS+1-(16-2*ENCBITS+2)-(8-(16-2*ENCBITS+2))  
; = 8-ENCBITS.  
;
; FOR THIS EXAMPLE, WE NONFLUSH PROCESS  
; 6 MORE BITS, THEN FLUSH 2, STORE PATH, THEN FLUSH  
; THE LAST THREE TO STATE 0.  
; LAST 6 NONFLUSH BITS---  
;**********************************************************************  
do  
#16-(2*ENCBITS)+2,LAST6  
FindMetrics  
ACS  
;
LAST6  
;
;
;
ENCODER FLUSH-----  
move  
do  
#>NUMSTATES/2,n5  
;flush, process 16,8, 4, then 2,  
then 1 state  
;
;
#8-(16-(2*ENCBITS)+2),FLSH;process 3 of last five bits to get  
next to last byte  
FindMetrics  
ACSFlush  
ACS  
;
FLSH  
STOREPATHS#NUMSTATES/8  
;
;
;
FLUSH THE LAST THREE BITS------  
do #8-ENCBITS+1,FLSH3  
; do last 3 bits to get last byte  
;
FindMetrics  
ACSFlush  
ACS  
;
FLSH3  
;
TRACEBACK  
;
FINISH nop  
4-14  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Main: Putting the Pieces Back Together  
To make the changes to the MAIN program code, we begin as we did in Section 3, by  
initializing the address registers. R2 is still used to address branch metrics. Registers r4  
and r5 address the states for path metric storage. The differences needed for the  
initialization code are to set n5 to 1 (needed for tracking the number of states in the  
PreAcs code), and n2 is set to 1. This is different from the code in Section 3 for two  
reasons. First, the branch metric generation is different, and second, using n2 to update  
register r2 for branch metric storage is no longer required. Instead, we use n2 to change  
the addresses to read branch metrics for the ACSFlush routine.  
Next, we do ENCBITS-1 iterations of input processing to preload the path storage  
registers. This is different because we use the PreACS macro instead of the ACS macro.  
Using the PreACS macro means that the code uses the starting encoder 0 state to reduce  
execution cycles as well as to avoid the need of initializing the path metric storage for all  
the states.  
The next major piece of code is the nested do loops that process bytes of decoder input  
and store off the resulting paths. This code is identical to the code in Section 3, except  
that we process one less byte of data.  
The end processing is more complicated because there are two subprocesses occurring at  
the same time and their endings are not in phase. One process is the storing of the path  
data in bytes as the path storage registers fill up. The second process is the ACSFlush  
processing, which collapses the trellis states back to 0.  
For this example, start with 168 decoder inputs. We preprocess five inputs, and then  
process 19*8=152 more inputs in bytes using the nested loops. We have 11 decoder  
inputs left to process. We need to do 8 more to get the next byte, but after processing  
6 inputs, we have to change from using the ACS macro to using the ACSFlush macro  
(used to collapse states for the last five inputs).  
Accordingly, we process the first 6 of the last 11 decoder inputs using the ACS macro.  
We then process the next two decoder inputs using the ACSFlush macro. Now we have  
processed eight more decoder inputs, and can store off the path data using the  
STOREPATHS macro. We then finish processing the last three decoder inputs using the  
ACSFlush macro. The final path data is in the collapsed state (as it was in the code in  
Section 3), and is access in the TRACEBACK macro as before. After doing the traceback  
using the TRACEBACK macro of Section 3, the decoder is done at the FINISH label.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
4-15  
Freescale Semiconductor, Inc.  
Algorithmic Extensions  
Main: Putting the Pieces Back Together  
4-16  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
SECTION 5  
SUMMARY  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Summary  
5.1  
5.2  
5.3  
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-3  
Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-3  
Program Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4  
5-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Summary  
Summary  
5.1  
SUMMARY  
This application report began with an explanation of the Viterbi algorithm in Section 2,  
introduced a basic Viterbi decoder in Section 3, and offered some enhancements to the  
decoding assembly code in Section 4. The explanations in Section 2 and Section 3 are  
detailed enough for the ideas behind the code to be understood. The goal is to make it  
easy to take and modify this code to produce efficient code for any desired application of  
the Viterbi algorithm.  
We chose a nontrivial industry standard code as our example, and there are some  
statistics that are worth noting. Table 5-1 presents some notable statistics for code  
presented in this note. In the table, we compare the basic Viterbi decoder code of  
Section 3 with the basic code in Section 4. The basic code requires about 215 clock cycles  
per decoder input to decoder the data. As an example, this means that at a received data  
rate of 13,350 per second (the IS-136 rate if all six slots in a frame are voice data to be  
decoded) 2.87 MIPS are required.  
If we use generalized branch metrics, the numbers increase, because the branch metric  
calculation is more complicated. Note, however, that they lower again as the Pre ACS  
and ACSFlush routines are included. Once all the enhancements of Section 4 are  
included, the computational load drops to 2.81 MIPS, more than 3% below what the rate  
would be if we just used the generalized branch metrics.  
5.2  
CONCLUSIONS  
In conclusion, note that the relative improvement is dependent on the length of the data  
packet. Shorter packets show greater improvement. For example, an encoded voice data  
slot in IS-136 uses a packet size of only 89 decoder inputs instead of our example 168. For  
packets of this size, the percent change is 4.8%Ñlarge enough to make the extra code  
worth considering.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
5-3  
Freescale Semiconductor, Inc.  
Summary  
Program Listings  
Table 5-1 Viterbi Decoder Code Statistics  
Total Clock  
Cycles  
Clock Cycles per  
Information Bit  
MIPS at 13,350  
Data Rate  
%
change  
Code Used  
Basic Viterbi  
36127  
215.04  
218.26  
2.87  
3.31  
Ñ
Viterbi with  
generalized  
36668  
+1.5%  
branch metrics  
+ACSFlush  
+Pre ACS  
36065  
35459  
214.67  
211.07  
2.87  
2.81  
-0.172%  
-1.85%  
Another statistic worth noting is the program memory required by each program. The  
basic Viterbi decoder code in Section 3 takes 178 program words, and the code in  
Section 4 with all enhancements added takes 311. The large increase is mostly due to the  
unrolling of some loops at the end, required to phase the ACSFlush and STOREPATHS  
macros correctly. Notably, the amount of program space required for either routine is  
quite small, and may not be a large consideration for most systems.  
5.3  
PROGRAM LISTINGS  
Finally, note that there are three appendices to this application report. Each appendix  
contains a program listing for the program as described on one of the sections of this  
manual.  
¥ Appendix A contains a listing for the complete code discussed in Section 3.  
¥ Appendix B contains a complete listing of the code discussed in Section 4.  
¥ Appendix C contains a complete listing of a 24-bit version of the Section 4 code,  
suitable for implementation on a DSP56300 DSP operating in 24-bit arithmetic  
mode.  
5-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
APPENDIX A  
BASIC ALGORITHM PROGRAM LISTING  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
A.1  
VITERBI ALGORITHM PROGRAM LISTING . . . . . . . . . . . . . . A-3  
A-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
A.1  
VITERBI ALGORITHM PROGRAM LISTING  
This appendix contains the complete program listing for a 16-bit implementation of the  
Viterbi algorithm, as presented in Section 3 of this manual.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
A-3  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder  
OPT  
mex  
;***********************VITERBI DECODER***********************************  
;
;
;
;
;
;
THIS ROUTINE IMPLEMENTS A CONVOLUTIONAL DECODER USING THE VITERBI ALG.  
IT IS OPTIMIZED FOR SPEED, WHICH MEANS THAT EVERY ALU REGISTER  
AND ALL OF THE R REGISTERS ARE USED. A SIGNIFICANT AMOUNT OF  
THESE CAN BE FREED BY STORING AND REUSING REGISTERS BETWEEN THE  
FINDMETRICS ROUTINES AND THE ACS,ACSFlush ROUTINES.  
INPUT is in INDATA real in x imag in y. OUTPUT begins at y:DECOUT  
;*****************BRANCH METRIC MACRO*************************************  
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Input data and generate branch metrics.  
For this decoder, the metric is a scaled  
sum or difference of the real and imag inputs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r1 should point to the next input XY data pair  
OUTPUTS:  
Branch metrics are stored at BRX in XY memory  
REGISTERS USED:  
a,b,x01,y01,r1,r2,n2, r2 unchanged (modulo req'd)  
;****************************************************************  
FindMetrics macro  
move  
move  
mpy  
move  
mac  
l:(r1)+,y  
#-16,x1  
x1,y1,a  
#BRY+3,r2  
x1,y0,a  
;grab dec input  
;sign for real component, 0 sent.  
;a has 0x partial branch  
;storage for generated branch metrics  
;a gets 00 branch  
a,b  
mac  
-x1,y0,b  
;b has 01 branch  
;
neg  
neg  
a
b
a,x1  
b,x0  
a,y:(r2)+n2 ;mv 00 to x1,11 to a, st 00 in location 3  
b,y:(r2)+n2 ;mv 01 to x0,10 to b, st 01 in location 6  
;*****************************************************************************  
;
;
AT this point X1 has 00, X0 has 01,  
A1 has 11, B1 has 10, needed for quick storage in Y memory  
;*****************************************************************************  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
endm  
x1,y:(r2)+n2  
x0,y:(r2)+n2  
b,y:(r2)+n2  
b,y:(r2)+n2  
b,y:(r2)+n2  
b,y:(r2)+n2  
x0,y:(r2)+n2  
x1,y:(r2)+n2  
x0,y:(r2)+n2  
x1,y:(r2)+n2  
a,y:(r2)+n2  
a,y:(r2)+n2  
a,y:(r2)+n2  
a,y:(r2)  
;store 00 in location 9  
;store 01 in location 12  
;store 10 in location 15  
;store 10 in location 2  
;store 10 in location 5  
;store 10 in location 8  
;store 01 in location 11  
;store 00 in location 14  
;store 01 in location 1  
;store 00 in location 4  
;store 11 in location 7  
;store 11 in location 10  
;store 11 in location 13  
;store 11 in location 0. r2-> BRY  
A-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder (Continued)  
;
;*******************viterbi add, compare, select butterfly macro***  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by  
doing an add,compare,select update for state pairs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should offset addresses by NUMSTATES/2  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,x0,y1,r2,r4,r5,n5 r2 unchanged (modulo req'd)  
Registers:  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path,states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
D \  
C /  
/
\/  
/\  
D /  
/
\
\
SB------NSB  
C
;*****************************************************************************  
ACS  
;
macro  
;
move  
move  
move  
#BRY,r2  
y:(r2)+,y1  
l:(r5)+n5,a  
;r2 points to branch metrics  
;get first branch metric  
;load 1st metric/path pair  
;
do  
#NoOfAcsButt,_P_NextStage  
;update each state  
sub  
add  
max  
vsl  
add  
sub  
max  
vsl  
y1,a  
y1,b  
a,b  
l:(r5)-n5,b  
;sub pt,br met,get next pair  
;update metrics  
l:(r5)+n5,a  
;pick max,reload 1st pair  
;store survivor,end top half  
;add pt,br met,reload next pair  
;inc st ptr,ld nxt br met  
;pick max met,load next pair  
;st survivor,end 2nd half  
b,0,l:(r4)+  
y1,a  
y1,b  
a,b  
l:(r5)-n5,b  
x:(r5)+,x0  
l:(r5)+n5,a  
y:(r2)+,y1  
b,1,l:(r4)+  
_P_NextStage  
nop  
;needed to separate do loop ends  
endm  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
A-5  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder (Continued)  
;***********************STORE PARTIAL PATH METRICS MACRO***************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: The storage is somewhat twisted. The stored paths are current  
up to the most recent input bit, which is NOT convenient for  
traceback. I process the data as follows: I pre  
loaded the path with 6 bits. Thereafter, I process 8 path bits  
so the path has 14 bits. The most significant 8, I save.  
the remaining bits are the current encoder values for that path.  
The 5 lsb's are the current state.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Updated paths storage X memory, n0 points to next path storage  
REGISTERS USED:  
a,b,x1,r0,n0,r3,r5  
;*****************************************************************************  
;
;
Store off path data in bytes to avoid overflow in path reg's  
STOREPATHS macro LPCNT  
;
move  
move  
move  
n0,r3  
n0,r0  
#>$1f,x1  
;n0 stores path data pointer  
;mask for 5 lsb's(NUMENCBITS of  
1's)  
do  
move  
asr  
LPCNT,_PSTORE1  
y:(r5),b  
#5,b,a  
;store paths for each state  
;grab path  
;align bits 5-12 with a1  
(the 8 bits beyond enc)  
;mask off 5 lsb's to return to  
;
and  
x1,b  
path  
move  
move  
a1,x:(r3)+  
b1,y:(r5)+  
;store 8 ms path bits  
;return 5 ls path bits  
_PSTORE1  
lua  
lua  
(R4+NUMSTATES),R5  
(r0+NUMSTATES),n0  
;r5 points to latest states  
;update path data pointer  
endm  
;
A-6  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder (Continued)  
;***********************TRACEBACK OUTPUT PATH MACRO*******************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: To output the correct data, we begin at the end. We take the  
output path of the survivor state (0), and place its associated  
output path in memory as the last output data byte. Then we use  
bits 3-7 of that data as an offset pointer to the correct traceback  
data of the next previous path data memory. We continue this until  
we have traced the data back to the begining.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Decoder output data in Y memory  
REGISTERS USED:  
a,b,y,r0,n0,r2,r5  
;*********************************************************************  
;
;
Store off path data in bytes to avoid overflow in path reg's  
TRACEBACK macro  
;
move  
move  
move  
move  
move  
move  
move  
;
N0,R0  
#0,n0  
;path ptr to r0  
;prep for traceback  
;point to end to trace data  
;recall last path  
;r2 now linear  
;save off last path data  
;control word for extract  
#DECOUT+(NUMINPUTS/8/2),r2  
y:(r5),b  
#-1,m2  
b1,x:(r0)  
#$513,x0  
;******************************BEGIN TRACEBACK*****************************  
IF (EVEN==0)  
move  
x:(r0+n0),a  
;recall last path  
extractu x0,a,b  
;bits 3-7 of a1 point to next data  
;dec r0 to next earlier state set  
;move to upper byte  
lua  
(r0-NUMSTATES),r0  
lsl  
#8,a  
move  
move  
ENDIF  
b0,n0  
a1,y:(r2)-  
;load as offset for traceback  
;save off  
;
do  
move  
#NUMINPUTS/8/2,TRCBK  
x:(r0+n0),a  
;once for each byte pair  
;recall last path  
extractu x0,a,b  
;get ptr to next earlier path  
;point r0 to next earlier states  
;save ptr as offset  
;save out byte in x1  
;do it all again!  
lua  
(r0-NUMSTATES),r0  
move  
move  
move  
b0,n0  
a1,x1  
x:(r0+n0),a  
extractu x0,a,b  
lua  
move  
lsl  
or  
(r0-NUMSTATES),r0  
b0,n0  
#8,a  
x1,a  
;move to upper byte  
;or in last byte to get 16 bit word  
;store result  
move  
a1,y:(r2)-  
TRCBK  
endm  
Viterbi Decoder Implementation  
A-7  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder (Continued)  
;***************************MAIN***********************************************  
NUMSTATESequ  
ENCBITS  
32  
equ  
6
;most cases=log2(NUMSTATES)+1  
NoOfAcsButtequ NUMSTATES/2  
NUMINPUTSequ  
EVEN  
;
168  
equ  
1-(NUMINPUTS/8)%2  
;EVEN SET TO 1/0 IF NUMINPUTS IS  
EVEN/ODD #BYTES  
org  
p:$400  
VITDEC move  
move  
#NUMSTATES/2-1,m2  
#>3,n2  
#STATE1,r5  
;r2 points to branch metric table  
;increment for branch metric storage  
;r5 points to current state metric  
;r4 points to updated state metric  
;both modulo to flip loc each sym  
move  
move  
move  
move  
#STATE2,r4  
#NUMSTATES*2-1,m4  
#NUMSTATES*2-1,m5  
#NUMSTATES/2,n5  
move  
;input metrics spacing for each butter-  
fly  
move  
move  
move  
#PATHOUT,n0  
#-1,m3  
#INDATA,r1  
;n0 points to storage for output paths  
;set linear mode, traceback ptr  
;r1 points to input data  
;***********************PREPARATION LOOP*************************  
;
;
;
;
;
This loop iterates by the number of bits used in the  
encoder to pre load the bit decisions. Thereafter,  
the paths are updated and stored off in bytes. We need  
the preload bits so that the stored path metrics point  
correctly to their previous paths for traceback  
;****************************************************************  
do #ENCBITS-1,PRELP ;once for each encoder bit  
;
FindMetrics  
ACS  
PRELP  
;*****************MAIN LOOP--PROCESS BYTES OF DATA*****************  
do  
do  
#NUMINPUTS/8-1,DATALP  
#8,SYMLP  
;process bytes of output  
;8 bits per byte  
;
FindMetrics  
ACS  
SYMLP  
STOREPATHS#NUMSTATES  
DATALP  
;********POSTPROCESSING,LAST INFO BYTE ********************************  
;
; ENCBITS-1 PREPROCESSED, SO WE HAVE 8-ENCBITS+1 BITS LEFT.  
; FOR THIS EXAMPLE, WE HAVE 3 INPUTS LEFT TO PROCESS  
;**********************************************************************  
;
THE LAST THREE BITS------  
do  
#8-ENCBITS+1,FLSH1  
;process last 3 bits to get last byte  
FindMetrics  
ACS  
FLSH1  
;*******Traceback the path data to obtain the decoder output**********  
TRACEBACK  
FIN  
nop  
A-8  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder (Continued)  
;**********************MEMORY ORGANIZATION****************************  
;
;
;
;
;
MOST OF THE MEMORY LOCATION IS IMPORTANT--THE FIRST TWO  
LABELS OF THE X AND Y DATA MEMORY ARE PAIRED AND MUST  
BE CO LOCATED (AT THE SAME ADDRESSES). IN ADDITION, STATE1  
AND STATE2 MUST BE LOCATED ON A 0 MOD 2*NUMSTATES BOUNDARY,  
X and Y memory for input data must be paired --GOOD LUCK.....  
;*****************************************************************************  
;*********************************************************************  
;
org  
STATE1 DC  
DC  
STATE2 DC  
DC  
x:$0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
;
PATHOUT DS  
NUMSTATES*(NUMINPUTS/8+1)  
INDATA ;THIS DATA ENCODES  
$1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$20  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
$a000,$a000,$a000,$6000,$6000,$a000,$a000,$6000  
$6000,$6000,$6000,$6000,$6000,$a000,$a000,$6000  
$a000,$6000,$a000,$6000,$a000,$6000,$a000,$6000  
$a000,$a000,$6000,$6000,$6000,$a000,$a000,$a000  
$a000,$a000,$a000,$a000,$a000,$a000,$a000,$a000  
$a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000  
$a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000  
$6000,$a000,$6000,$a000,$6000,$6000,$6000,$6000  
$a000,$a000,$6000,$6000,$a000,$a000,$a000,$6000  
$a000,$6000,$a000,$6000,$6000,$a000,$6000,$a000  
$a000,$a000,$6000,$a000,$a000,$a000,$a000,$6000  
$6000,$6000,$6000,$a000,$6000,$6000,$6000,$6000  
$6000,$6000,$a000,$a000,$a000,$a000,$6000,$6000  
$a000,$a000,$6000,$a000,$a000,$a000,$a000,$a000  
$a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000  
$6000,$a000,$6000,$a000,$6000,$6000,$a000,$a000  
$6000,$6000,$6000,$a000,$a000,$6000,$a000,$6000  
$6000,$6000,$a000,$a000,$a000,$6000,$a000,$a000  
$a000,$a000,$6000,$6000,$6000,$a000,$a000,$6000  
$6000,$a000,$6000,$a000,$6000,$a000,$a000,$a000  
$a000,$a000,$6000,$6000,$a000,$6000,$a000,$6000  
;*****************************************************************************  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
A-9  
Freescale Semiconductor, Inc.  
Basic Algorithm Program Listing  
Example A-1 Basic 16-Bit Implementation of a Viterbi Decoder (Continued)  
;*****************************************************************************  
;
org  
DC  
DC  
DC  
DC  
y:$0  
PATH1  
PATH2  
BRY  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
NUMSTATES/2  
DSM  
DECOUT DS  
org  
DC  
NUMINPUTS/8  
y:@cvs(y,INDATA)  
YDATA  
$a000,$a000,$a000,$6000,$a000,$6000,$a000,$6000  
$a000,$6000,$a000,$a000,$6000,$6000,$a000,$6000  
$a000,$a000,$6000,$a000,$6000,$6000,$6000,$a000  
$6000,$6000,$6000,$6000,$6000,$6000,$a000,$6000  
$6000,$6000,$6000,$a000,$a000,$a000,$6000,$a000  
$a000,$a000,$6000,$6000,$6000,$a000,$6000,$a000  
$6000,$6000,$6000,$6000,$a000,$6000,$a000,$a000  
$6000,$6000,$a000,$6000,$6000,$6000,$a000,$6000  
$a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000  
$a000,$a000,$a000,$6000,$a000,$a000,$6000,$6000  
$6000,$a000,$6000,$a000,$a000,$6000,$a000,$6000  
$6000,$6000,$a000,$a000,$6000,$a000,$6000,$a000  
$a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000  
$a000,$a000,$a000,$a000,$a000,$6000,$a000,$6000  
$a000,$6000,$a000,$6000,$6000,$6000,$6000,$6000  
$a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000  
$a000,$a000,$a000,$6000,$a000,$a000,$6000,$a000  
$6000,$6000,$a000,$6000,$6000,$6000,$a000,$a000  
$6000,$6000,$a000,$a000,$6000,$a000,$a000,$6000  
$a000,$6000,$6000,$6000,$6000,$a000,$a000,$a000  
$a000,$a000,$6000,$a000,$6000,$6000,$6000,$6000  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
end  
A-10  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
APPENDIX B  
EXTENDED ALGORITHM PROGRAM  
LISTING  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
B.1  
16-BIT ENHANCED VITERBI DECODER PROGRAM LISTINGB-3  
B-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
B.1  
16-BIT ENHANCED VITERBI DECODER PROGRAM LISTING  
This appendix contains the complete 16-bit program listing for Section 4. The code in  
this section introduces several modifications to the code discussed in Section 3.  
Generalized branch metrics are allowed, as well as the optimization of the code that  
reduces the computations when data occurs in blocks.  
Example B-1 Extended Algorithm Program Listing  
OPT  
mex  
;****************56600**VITERBI DECODER***********************************  
; THIS ROUTINE IMPLEMENTS A CONVOLUTIONAL DECODER USING THE VITERBI ALG. IT IS  
; OPTIMIZED FOR SPEED, WHICH MEANS THAT EVERY ALU REGISTER AND ALL OF THE R REGISTERS  
; ARE USED. A SIGNIFICANT AMOUNT OF THESE CAN BE FREED BY STORING AND REUSING  
; REGISTERS BETWEEN THE FINDMETRICS ROUTINES AND THE ACS,ACSFlush ROUTINES.  
; INPUT is at INDATA: real in x imag in y. OUTPUT begins at y:DECOUT  
; GLOBAL REGISTER USE: a,b,x,y,r012345,n025,m245(are set to a modulo mode)  
;*****************BRANCH METRIC MACRO*************************************  
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Input data and generate branch metrics. This function  
subtracts the path metric for state 0 from all branch metrics  
to provide autonormaliztion. For this decoder, the metric is a scaled  
sum or difference of the real and imag inputs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r1 should point to the next input XY data pair  
OUTPUTS:  
Branch metrics are stored at BRX in XY memory  
REGISTERS USED:  
a,b,x01,y01,r1,r2,r5, r5 unchanged,r2 unchanged (modulo req'd)  
;*****************FINDMETRICS MACRO******************************************  
FindMetrics macro  
move  
neg  
x:(r5),a  
a
;st. metric scaling, read last st. 0  
l:(r1)+,y ;negate metric|grab dec input  
move  
mac  
move  
mac  
subl  
tfr  
mac  
#-16,x1  
x1,y1,a  
y1,x0  
x1,y0,a  
a,b  
y1,a  
-x1,y0,a  
x0,b  
;sign for real component, upper br.  
;comp 0x partial br. path mt. to  
;mv real input to x0  
;a gets 00 br. y1 gets 0x partial br  
;a has 00 br, b has 11 br, save path  
;swap 0x and 00 to compute 01 branch  
;a gets 01 br  
a,b  
a,y1  
b,x0  
a,y1  
b,x1  
b,y0  
y1,x0  
tfr  
subl  
;swap path metric,11 branch to y0  
;b gets 10 br, x0 has copy of 00 br.  
a,b  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
B-3  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;*****************************************************************************  
;
;
AT this point, X is configured with br 11|00, y with 00|11, and AB  
with 01|10, BA with 10|01 all needed for quick storage in XY memory  
;*****************************************************************************  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
endm  
x,l:(r2)+  
ab,l:(r2)+  
ba,l:(r2)+  
y,l:(r2)+  
y,l:(r2)+  
ba,l:(r2)+  
ab,l:(r2)+  
x,l:(r2)+  
ba,l:(r2)+  
y,l:(r2)+  
x,l:(r2)+  
ab,l:(r2)+  
ab,l:(r2)+  
x,l:(r2)+  
y,l:(r2)+  
ba,l:(r2)+  
;store 11 in location 0.  
;store 01 in location 1  
;store 10 in location 2  
;store 00 in location 3  
;store 00 in location 4  
;store 10 in location 5  
;store 01 in location 6  
;store 11 in location 7  
;store 10 in location 8  
;store 00 in location 9  
;store 11 in location 10  
;store 01 in location 11  
;store 01 in location 12  
;store 11 in location 13  
;store 00 in location 14  
;store 10 in location 15 r2-> BRX  
B-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;*******************viterbi add, compare, select butterfly macro***  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by  
doing an add,compare,select update for state pairs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should offset addresses by NUMSTATES/2  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,r4,r5,n5 r2 unchanged (modulo req'd)  
Registers:  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path,states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
D \  
C /  
/
\/  
/\  
D /  
/
\
\
SB------NSB  
C
;*****************************************************************  
;
ACS  
;
macro  
;
move  
move  
move  
#BRY,r2  
l:(r2)+,y  
l:(r5)+n5,a  
;r2 points to branch metrics  
;get first branch metric  
;load 1st metric/path pair  
;
do  
#NoAcs,_P_NextStage  
;update each state  
add  
add  
max  
vsl  
add  
add  
max  
move  
vsl  
y0,a  
y1,b  
a,b  
l:(r5)-n5,b  
;sum pt,br met,get next pair  
;update metric  
l:(r5)+n5,a  
;pick max, reload 1st pair  
;save surivior, end top half  
;sum pt,br,reload next pair  
;sum again,inc pt mt read ptr  
;pick max, load next pair  
;load next branch metrics  
;write survivor,end 2nd half  
b,0,l:(r4)+  
y1,a  
y0,b  
a,b  
l:(r2)+,y  
b,1,l:(r4)+  
l:(r5)-n5,b  
(r5)+  
l:(r5)+n5,a  
_P_NextStage  
nop  
;needed to separate do loop ends  
endm  
Viterbi Decoder Implementation  
B-5  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;*******************************;PreACS***************************************  
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by ACS butterfly  
from assumed 0 state to start, and double the number of states on each invocation  
until the full trellis is used. This routine CANNOT be used unless the encoder is  
starting from an all 0's state. Note this means it cannot be used if the data is  
a continuation of a previously processed data stream. Use the ACS macro instead. ;  
However, for packetised data, or other data that assumes the data starts with the  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
encoder 0 filled, this routine saves cycles, AND only state 0 needs to be  
initialised before starting the decode.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should be the number of input states/2 to process  
n5 is doubled each time this macro is invoked  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd)  
Registers:  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path,states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
C
D \  
\
\
\
\
SB  
NSB  
;*****************************************************************  
PreACS macro  
;
move  
move  
move  
move  
move  
do  
#BRY,r2  
r5,r3  
r4,n3  
l:(r2)+,ba  
l:(r5)+,y  
n5,_P_NextStage  
y,a  
;r2 points to branch metrics  
;save r5 to init r4 later  
;save r4 to init r5 later  
;get first branch metrics  
;load 1st metric/path pair  
update each state  
add  
;update metric  
add  
vsl  
vsl  
move  
y,b  
l:(r5)+,y  
;updt met, ld nxt pair  
;end top half  
;end 2nd half  
a,0,l:(r4)+  
b,1,l:(r4)+  
l:(r2)+,ba  
;ld br met  
_P_NextStage  
move  
n5,b  
b
;recall loop count  
;mult it by 2  
asl  
n3,r5  
move  
move  
move  
r3,r4  
b,n5  
#BRY,r2  
;storage for loop count  
;reinit branch ptr  
endm  
B-6  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;***********************ACSFlush Macro****************************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: This routine is very similar to the ACS macro, except that  
the encoding shift register is now flushing back to 0.  
This means that only the upper paths are taken, halving  
the number of states we need to update. The ACS code is  
modified so that we don't compute the lower paths. In  
addition, state storage is modified so that survivor  
paths are stored in consecutive memory locations.  
Survivors are even states on the first pass, then  
every fourth state, etc.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should be the number of input states/2 to process  
n5 is halved each time this macro is invoked  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,n2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd)  
SA------NSA  
C
/
/
/
D/  
/
/
SB  
;*****************************************************************  
ACSFlush macro  
;**  
move  
move  
move  
move  
move  
#BRY,r2  
r5,r3  
r4,n3  
l:(r2)+n2,y  
l:(r5)+n5,a  
;r2 points to branch metrics  
;save r5 to init r4 later  
;save r4 to init r5 later  
;get first branch metrics  
;load 1st metric/path pair  
;
do  
n5,_NextStage  
;update each state  
add  
add  
max  
move  
vsl  
y0,a  
y1,b  
a,b  
l:(r5)+n5,a  
b,0,l:(r4)+  
l:(r5)-n5,b  
l:(r2)+n2,y  
(r5)+  
;updt met,load next pair  
;updt met,ld nxt br met  
;sel surv,save met,inc st ptr  
;ld next pair  
;end 2nd half  
_NextStage  
move  
n5,b  
;recall flush count, loop count  
asr  
b
n2,a  
;divide it by 2,prep double branch jump  
move  
move  
asl  
n3,r5  
b,n5  
a
;storage for flush count  
;reinit branch ptr  
(r2)-n2  
move  
move  
endm  
r3,r4  
a,n2  
;next pass, skip more branches  
Viterbi Decoder Implementation  
B-7  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;***********************STORE PARTIAL PATH METRICS MACRO***************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: The storage is somewhat twisted. The stored paths are  
current up to the most recent input bit, which is NOT  
convenient for traceback. As a result, I process the data as  
follows: I preloaded the path with 6 bits. Thereafter, I  
process 8 path bits so the path has 14 bits. The most  
significant 8, I save. The remaining bits are the current  
encoder values for that path. The 5 lsb's are the current  
state.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Updated paths storage X memory, n0 points to next path storage  
REGISTERS USED:  
a,b,y1,r0,n0,r3,r5  
;*********************************************************************  
;
;
;
Store off path data in bytes to avoid overflow in path reg's  
STOREPATHS macro LPCNT  
;
move  
move  
move  
do  
move  
asr  
and  
move  
move  
n0,r3  
n0,r0  
#>$1f,y1  
LPCNT,_PSTORE1  
y:(r5),b  
#5,b,a  
y1,b  
a1,x:(r3)+  
b1,y:(r5)+  
;n0 stores path data pointer  
;mask for 5 lsb's(NUMENCBITS-1 of 1's)  
;store paths for each state  
;grab path  
;align bits 5-12 w/ a1 (8 bits beyond enc)  
;mask off 5 lsb's to return to path  
;store 8 ms path bits  
;return 5 ls path bits  
_PSTORE1  
lua  
(R4+NUMSTATES),R5  
;r5 points to latest states  
lua  
(r0+NUMSTATES),n0 ;update path data pointer  
endm  
B-8  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;***********************TRACEBACK OUTPUT PATH MACRO*******************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: To output the correct data, we begin at the end. We take  
the output path of the survivor state (0), and place its  
associated output path in memory as the last output data  
byte. Then we use bits 3-7 of that data as an offset pointer  
to the correct traceback data of the next previous path data  
memory. We continue this until we have traced the data back  
to the begining.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Decoder output data in Y memory  
REGISTERS USED:  
a,b,y,r0,n0,r2,r5  
;*********************************************************************  
Store off path data in bytes to avoid overflow in path reg's  
;
TRACEBACK macro  
;
move  
move  
move  
move  
move  
move  
move  
;
N0,R0  
#0,n0  
;path ptr to r0  
;prep for traceback  
#DECOUT+(NUMINPUTS/8/2),r2 ;point to end of output buffer  
y:(r5),b  
#-1,m2  
b1,x:(r0)  
#$513,y0  
;recall last path  
;r2 now linear  
;save off last path data  
;control word for extract  
;*****************BEGIN TRACEBACK*****************************  
IF (EVEN==0)  
move  
x:(r0+n0),a  
;recall last path  
extractu y0,a,b  
;bits 3-7 of a1 point to next data  
;dec r0 to next earlier path set  
;move to upper byte  
lua  
(r0-NUMSTATES),r0  
lsl  
#8,a  
move  
move  
ENDIF  
b0,n0  
a1,y:(r2)-  
;load as offset for traceback  
;save off  
;
do  
move  
#NUMINPUTS/8/2,TRCBK  
x:(r0+n0),a  
;once for each byte pair  
;recall last path  
extractu y0,a,b  
;get ptr to next earlier path  
;point r0 to next earlier states  
;save ptr as offset  
;save out byte in x1  
;do it all again!  
lua  
(r0-NUMSTATES),r0  
move  
move  
move  
b0,n0  
a1,y1  
x:(r0+n0),a  
extractu y0,a,b  
lua  
move  
lsl  
or  
(r0-NUMSTATES),r0  
b0,n0  
#8,a  
y1,a  
;move to upper byte  
;or in last byte to get 16 bit word  
;store result  
move  
a1,y:(r2)-  
TRCBK  
endm  
Viterbi Decoder Implementation  
B-9  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;***************************MAIN***********************************************  
;
NUMSTATESequ  
ENCBITS  
32  
equ  
6
;most cases=log2(NUMSTATES)+1  
NoAcs  
NUMINPUTSequ  
equ  
NUMSTATES/2  
168  
EVEN  
;
equ  
1-(NUMINPUTS/8)%2 ;EVEN SET TO 1/0 IF NUMINPUTS IS  
EVEN/ODD #BYTES  
org  
p:$400  
VITDEC move  
move  
move  
move  
move  
move  
move  
#NUMSTATES/2-1,m2  
#BRX,r2  
#STATE1,r5  
;r2 points to branch metric table  
;r5 points to current state metric  
;r4 points to updated state metric  
;both modulo to flip locations each sym  
#STATE2,r4  
#NUMSTATES*2-1,m4  
#NUMSTATES*2-1,m5  
#>1,n5  
#NUMSTATES/2,n5  
#PATHOUT,n0  
#-1,m3  
;ctr/input metrics spacing for each butterfly  
;input metrics spacing for each butterfly  
;n0 points to storage for output paths  
;set linear mode, traceback ptr  
;
move  
move  
move  
move  
#>1,n2  
;
;
move  
#INDATA,r1  
;r1 points to input data  
;***********************PREPARATION LOOP*************************  
;
;
;
;
;
This loop iterates by the number of bits used in the  
encoder to pre load the bit decisions. Thereafter,  
the paths are updated and stored off in bytes. We need  
the preload bits so that the stored path metrics point  
correctly to their previous paths for traceback.  
;****************************************************************  
;
do  
#ENCBITS-1,PRELP  
;preload decisions/trellis start  
;
FindMetrics  
PreACS  
;
ACS  
PRELP  
move  
#NUMSTATES/2,n5  
;n5 now serves as offset between fetch  
states  
;
;
;*****************MAIN LOOP--PROCESS BYTES OF DATA*****************  
do  
do  
#NUMINPUTS/8-2,DATALP;process bytes of output  
#8,SYMLP ;8 bits per byte  
;
FindMetrics  
ACS  
SYMLP  
STOREPATHS#NUMSTATES  
DATALP  
B-10  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;********POSTPROCESSING,LAST 2 INFO BYTES & FLUSH ENCODER BACK TO 0****  
;
; THESE SETUPS SHOULD ACCOMODATE ENCODERS OF 5 TO 8 BITS.  
; ENCBITS-1 PREPROCESSED, SO WE HAVE 16-ENCBITS+1 BITS LEFT.  
; WE DO ENCODER FLUSHING FOR #ENCBITS-1 BITS. SO WE HAVE  
; (16-ENCBITS+1) - ENCBITS+1 DO DO BEFORE FLUSHING.  
; NEXT, WE PROCESS THE ENCODER FLUSH BITS, STOPPING TO STORE  
; PATHS AS NEEDED. WE STORE AFTER 8 BITS PROCESSED SO THIS MEANS  
; STORE PATHS AFTER 16-2*ENCBITS+2 (from nonflush) +8-(16-2*ENCBITS+2)  
; (these are the additional flush bits needed before storing paths).  
; FINALLY, WE FINISH OUT THE DATA. THE BITS REMAINING ARE:  
; 16-ENCBITS+1-(16-2*ENCBITS+2)-(8-(16-2*ENCBITS+2))  
; = 8-ENCBITS.  
;
; FOR THIS EXAMPLE, WE NONFLUSH PROCESS  
; 6 MORE BITS, THEN FLUSH 2, STORE PATH, THEN FLUSH  
; THE LAST THREE TO STATE 0.  
; LAST 6 NONFLUSH BITS---  
;**********************************************************************  
do  
#16-(2*ENCBITS)+2,LAST6  
FindMetrics  
ACS  
;
LAST6  
;
;
;
ENCODER FLUSH-----  
move  
do  
#>NUMSTATES/2,n5  
;flush, process 16,8, 4, then 2,  
then 1 state  
;
;
#8-(16-(2*ENCBITS)+2),FLSH;process 3 of last five bits to get  
next to last byte  
FindMetrics  
ACSFlush  
ACS  
;
FLSH  
STOREPATHS#NUMSTATES/8  
;
;
;
FLUSH THE LAST THREE BITS------  
do #8-ENCBITS+1,FLSH3  
; do last 3 bits to get last byte  
;
FindMetrics  
ACSFlush  
ACS  
;
FLSH3  
;
TRACEBACK  
;
FINISH nop  
Viterbi Decoder Implementation  
B-11  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
;**********************MEMORY ORGANIZATION****************************  
;
;
;
;
;
MOST OF THE MEMORY LOCATION IS IMPORTANT--THE FIRST TWO  
LABELS OF THE X AND Y DATA MEMORY ARE PAIRED AND MUST  
BE CO LOCATED (AT THE SAME ADDRESSES). IN ADDITION, STATE1  
AND STATE2 MUST BE LOCATED ON A 0 MOD 2*NUMSTATES BOUNDARY,  
X and Y memory for input data must be paired --GOOD LUCK.....  
;*********************************************************************  
;
org  
STATE1 DC  
DC  
STATE2 DC  
DC  
x:$0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
NUMSTATES/2  
BRX  
DSM  
;
PATHOUT DS  
NUMSTATES*(NUMINPUTS/8+1)  
INDATA ;THIS DATA ENCODES  
$1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$20  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
$a000,$a000,$a000,$6000,$6000,$a000,$a000,$6000  
$6000,$6000,$6000,$6000,$6000,$a000,$a000,$6000  
$a000,$6000,$a000,$6000,$a000,$6000,$a000,$6000  
$a000,$a000,$6000,$6000,$6000,$a000,$a000,$a000  
$a000,$a000,$a000,$a000,$a000,$a000,$a000,$a000  
$a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000  
$a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000  
$6000,$a000,$6000,$a000,$6000,$6000,$6000,$6000  
$a000,$a000,$6000,$6000,$a000,$a000,$a000,$6000  
$a000,$6000,$a000,$6000,$6000,$a000,$6000,$a000  
$a000,$a000,$6000,$a000,$a000,$a000,$a000,$6000  
$6000,$6000,$6000,$a000,$6000,$6000,$6000,$6000  
$6000,$6000,$a000,$a000,$a000,$a000,$6000,$6000  
$a000,$a000,$6000,$a000,$a000,$a000,$a000,$a000  
$a000,$6000,$6000,$a000,$a000,$a000,$a000,$a000  
$6000,$a000,$6000,$a000,$6000,$6000,$a000,$a000  
$6000,$6000,$6000,$a000,$a000,$6000,$a000,$6000  
$6000,$6000,$a000,$a000,$a000,$6000,$a000,$a000  
$a000,$a000,$6000,$6000,$6000,$a000,$a000,$6000  
$6000,$a000,$6000,$a000,$6000,$a000,$a000,$a000  
$a000,$a000,$6000,$6000,$a000,$6000,$a000,$6000  
;
org  
DC  
DC  
DC  
DC  
y:$0  
PATH1  
PATH2  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
B-12  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
Example B-1 Extended Algorithm Program Listing (Continued)  
BRY  
DECOUT DS  
org  
DC  
DSM  
NUMSTATES/2  
NUMINPUTS/8  
y:@cvs(y,INDATA)  
YDATA  
$a000,$a000,$a000,$6000,$a000,$6000,$a000,$6000  
$a000,$6000,$a000,$a000,$6000,$6000,$a000,$6000  
$a000,$a000,$6000,$a000,$6000,$6000,$6000,$a000  
$6000,$6000,$6000,$6000,$6000,$6000,$a000,$6000  
$6000,$6000,$6000,$a000,$a000,$a000,$6000,$a000  
$a000,$a000,$6000,$6000,$6000,$a000,$6000,$a000  
$6000,$6000,$6000,$6000,$a000,$6000,$a000,$a000  
$6000,$6000,$a000,$6000,$6000,$6000,$a000,$6000  
$a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000  
$a000,$a000,$a000,$6000,$a000,$a000,$6000,$6000  
$6000,$a000,$6000,$a000,$a000,$6000,$a000,$6000  
$6000,$6000,$a000,$a000,$6000,$a000,$6000,$a000  
$a000,$6000,$6000,$a000,$a000,$6000,$a000,$a000  
$a000,$a000,$a000,$a000,$a000,$6000,$a000,$6000  
$a000,$6000,$a000,$6000,$6000,$6000,$6000,$6000  
$a000,$a000,$a000,$a000,$a000,$6000,$6000,$a000  
$a000,$a000,$a000,$6000,$a000,$a000,$6000,$a000  
$6000,$6000,$a000,$6000,$6000,$6000,$a000,$a000  
$6000,$6000,$a000,$a000,$6000,$a000,$a000,$6000  
$a000,$6000,$6000,$6000,$6000,$a000,$a000,$a000  
$a000,$a000,$6000,$a000,$6000,$6000,$6000,$6000  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
end  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
B-13  
Freescale Semiconductor, Inc.  
Extended Algorithm Program Listing  
B-14  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
APPENDIX C  
24-BIT ALGORITHM PROGRAM LISTING  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
C.1  
24-BIT ENHANCED VITERBI DECODER PROGRAM LISTINGC-3  
C-2  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
C.1  
24-BIT ENHANCED VITERBI DECODER PROGRAM LISTING  
This section contains a second complete program listing for Section 4. This program  
is nearly identical to the program listing in Appendix B. The difference is that the data is  
24-bit data, meant to run on a 56300 family digital signal processor in 24-bit arithmetic  
mode. In addition, the path data is stored and recovered in 16-bit words instead of 8-bit  
bytes. The longer register allow us to reduce the memory and some complexity in the  
traceback.  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
C-3  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing  
OPT  
mex  
;****************56600**24 bit VITERBI DECODER*****************************  
;
;
;
;
;
;
;
THIS ROUTINE IMPLEMENTS A CONVOLUTIONAL DECODER USING THE VITERBI ALG.  
IT IS OPTIMIZED FOR SPEED, WHICH MEANS THAT EVERY ALU REGISTER  
AND ALL OF THE R REGISTERS ARE USED. A SIGNIFICANT AMOUNT OF  
THESE CAN BE FREED BY STORING AND REUSING REGISTERS BETWEEN THE  
FINDMETRICS ROUTINES AND THE ACS,ACSFlush ROUTINES.  
INPUT is at INDATA: real in x imag in y. OUTPUT begins at y:DECOUT  
GLOBAL REGISTER USE: a,b,x,y,r012345,n025,m245(are set to a modulo mode)  
;*************************************************************************  
;
;*****************BRANCH METRIC MACRO*************************************  
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Input data and generate branch metrics. This function  
subtracts the path metric for state 0 from all branch metrics  
to provide autonormaliztion. For this decoder, the metric is a scaled  
sum or difference of the real and imag inputs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r1 should point to the next input XY data pair  
OUTPUTS:  
Branch metrics are stored at BRX in XY memory  
REGISTERS USED:  
a,b,x01,y01,r1,r2,r5, r5 unchanged,r2 unchanged (modulo req'd)  
;****************************************************************  
;
FindMetricsmacro  
move  
neg  
move  
mac  
move  
mac  
subl  
tfr  
x:(r5),a  
a
;st. metric scaling, read last st. 0  
l:(r1)+,y ;negate metric|grab dec input  
;sign for real component, upper br.  
a,b  
#-16,x1  
x1,y1,a  
y1,x0  
x1,y0,a  
a,b  
y1,a  
-x1,y0,a  
x0,b  
;cp metric to b|mv real in to x0  
a,y1  
b,x0  
a,y1  
b,x1  
b,y0  
y1,x0  
;y1 gets 0x partial branch  
;a has 00 br., b has 11 branch, save st.  
;swap 0x and 00 to compute 01 branch  
;a gets 01 br,x1 gets cp of 11 br.  
;swap st metric,11 branch to y0  
;b gets 10 br, x0 has copy of 00 br.  
mac  
tfr  
subl  
a,b  
C-4  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;*****************************************************************************  
;
;
AT this point, X is configured with br 11|00, y with 00|11, and  
AB with 01|10, BA with 10|01 all needed for quick storage in XY memory  
;*****************************************************************************  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
move  
endm  
x,l:(r2)+  
ab,l:(r2)+  
ba,l:(r2)+  
y,l:(r2)+  
y,l:(r2)+  
ba,l:(r2)+  
ab,l:(r2)+  
x,l:(r2)+  
ba,l:(r2)+  
y,l:(r2)+  
x,l:(r2)+  
ab,l:(r2)+  
ab,l:(r2)+  
x,l:(r2)+  
y,l:(r2)+  
ba,l:(r2)+  
;store 11 in location 0.  
;store 01 in location 1  
;store 10 in location 2  
;store 00 in location 3  
;store 00 in location 4  
;store 10 in location 5  
;store 01 in location 6  
;store 11 in location 7  
;store 10 in location 8  
;store 00 in location 9  
;store 11 in location 10  
;store 01 in location 11  
;store 01 in location 12  
;store 11 in location 13  
;store 00 in location 14  
;store 10 in location 15 r2-> BRX  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
C-5  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;*******************viterbi add, compare, select butterfly macro***  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: Update path metrics/paths for the Viterbi algorithm by  
doing an add,compare,select update for state pairs.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should offset addresses by NUMSTATES/2  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,r4,r5,n5 r2 unchanged (modulo req'd)  
Registers:  
r5, pointer to the path metric/path table, arranged as  
x: path metric, y: path,states ordered assuming  
bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged  
as x:C, y:D, CD,CD,CD, etc.  
SA------NSA  
\
D \  
C /  
/
\/  
/\  
D /  
/
\
\
SB------NSB  
C
;*****************************************************************  
;
ACS  
;
macro  
;
move  
move  
move  
#BRY,r2  
l:(r2)+,y  
l:(r5)+n5,a  
;r2 points to branch metrics  
;get first branch metric  
;load 1st metric/path pair  
;
do  
#NoAcs,_P_NextStage  
;update each state  
;get next pair  
;update metrics  
;reload 1st pair  
;end top half  
;reload next pair  
;inc st ptr  
;load next pair  
;load next branch metrics  
;end 2nd half  
add  
add  
max  
vsl  
add  
add  
max  
move  
vsl  
y0,a  
y1,b  
a,b  
l:(r5)-n5,b  
l:(r5)+n5,a  
b,0,l:(r4)+  
y1,a  
y0,b  
a,b  
l:(r5)-n5,b  
(r5)+  
l:(r5)+n5,a  
l:(r2)+,y  
b,1,l:(r4)+  
_P_NextStage  
nop  
;needed to separate do loop ends  
endm  
C-6  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;*******************************;PreACS***************************************  
;FUNCTION: Update path metrics/paths for the Viterbi algorithm by the ACS  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
butterfly from assumed 0 state to start, and double the number of states  
on each invocation until the full trellis is used. This routine CANNOT  
be used unless the encoder is starting from an all 0's state. Note this  
means it cannot be used if the data is a continuation of a previously  
processed data stream. Use the ACS macro instead. However, for packetised  
data, or other data that assumes the data starts with the encoder 0 filled,  
this routine saves cycles, AND only state 0 needs to be initialised  
before starting the decode.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should be the number of input states/2 to process  
n5 is doubled each time this macro is invoked  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd)  
Registers:  
r5, pointer to the path metric/path table, arranged as x: path metric,  
y: path,states ordered assuming bits shift right to left.  
r4, pointer to the output path metric/path table  
r0, pointer to the branch metric table, arranged as x:C, y:D, CD,CD, etc.  
SA------NSA  
\
C
D \  
\
\
\
\
SB  
NSB  
;*****************************************************************  
PreACS macro  
;
move  
move  
move  
move  
move  
do  
#BRY,r2  
r5,r3  
r4,n3  
l:(r2)+,ba  
l:(r5)+,y  
n5,_P_NextStage  
y,a  
;r2 points to branch metrics  
;save r5 to init r4 later  
;save r4 to init r5 later  
;get first branch metrics  
;load 1st metric/path pair  
;update each state  
add  
;update metric  
add  
vsl  
vsl  
move  
y,b  
l:(r5)+,y  
;updt met, ld nxt pair  
;end top half  
;end 2nd half  
a,0,l:(r4)+  
b,1,l:(r4)+  
l:(r2)+,ba  
;ld br met  
_P_NextStage  
move  
n5,b  
b
;recall loop count  
;mult it by 2  
asl  
n3,r5  
move  
move  
move  
r3,r4  
b,n5  
#BRY,r2  
;storage for loop count  
;reinit branch ptr  
endm  
Viterbi Decoder Implementation  
C-7  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;***********************ACSFlush Macro****************************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: This routine is very similar to the ACS macro, except that  
the encoding shift register is now flushing back to 0.  
This means that only the upper paths are taken, halving  
the number of states we need to update. The ACS code is  
modified so that we don't compute the lower paths. In  
addition, state storage is modified so that survivor  
paths are stored in consecutive memory locations.  
Survivors are even states on the first pass, then  
every fourth state, etc.  
INPUTS:  
r2 should point to the beginning of the branch metric table  
r5 should point to the latest path metric for state 0  
r4 should point to the storage location for updated state 0  
n5 should be the number of input states/2 to process  
n5 is halved each time this macro is invoked  
OUTPUTS:  
Updated path metrics/paths stored in XY memory  
REGISTERS USED:  
a,b,y01,r2,n2,r3,n3,r4,r5,n5 r2 unchanged (modulo req'd)  
SA------NSA  
C
/
/
/
D/  
/
/
SB  
;*****************************************************************  
ACSFlush macro  
;**  
move  
move  
move  
move  
move  
#BRY,r2  
r5,r3  
r4,n3  
l:(r2)+n2,y  
l:(r5)+n5,a  
;r2 points to branch metrics  
;save r5 to init r4 later  
;save r4 to init r5 later  
;get first branch metrics  
;load 1st metric/path pair  
;
do  
n5,_NextStage  
;update each state  
add  
add  
max  
move  
vsl  
y0,a  
y1,b  
a,b  
l:(r5)+n5,a  
b,0,l:(r4)+  
l:(r5)-n5,b  
l:(r2)+n2,y  
(r5)+  
;updt met,load next pair  
;updt met,ld nxt br met  
;sel surv,save met,inc st ptr  
;ld next pair  
;end 2nd half  
_NextStage  
move  
n5,b  
;recall flush count, loop count  
asr  
b
n2,a  
;divide it by 2,prep double branch jump  
move  
move  
asl  
n3,r5  
b,n5  
a
;storage for flush count  
;reinit branch ptr  
(r2)-n2  
move  
move  
endm  
r3,r4  
a,n2  
;next pass, skip more branches  
C-8  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;***********************STORE PARTIAL PATH METRICS MACRO***************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: The storage is somewhat twisted. The stored paths are current  
up to the most recent input bit, which is NOT convenient for  
traceback. As a result, I process the data as follows: I pre  
loaded the path with 6 bits. Thereafter, I process 8 path bits  
so the path has 14 bits. The most significant 8, I save.  
the remaining bits are the current encoder values for that path.  
The 5 lsb's are the current state.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Updated paths storage X memory, n0 points to next path storage  
REGISTERS USED:  
a,b,y1,r0,n0,r3,r5  
;*********************************************************************  
;
;
;
Store off path data in wordsto avoid overflow in path reg's  
STOREPATHS macro LPCNT  
;
move  
move  
move  
do  
move  
asr  
and  
move  
move  
n0,r3  
n0,r0  
#>$1f,y1  
LPCNT,_PSTORE1  
y:(r5),b  
#5,b,a  
y1,b  
a1,x:(r3)+  
b1,y:(r5)+  
;n0 stores path data pointer  
;mask for 1 lsb's(NUMENCBITS-1 of 1's)  
;store paths for each state  
;grab path  
;align bits5-20 w/ a1 (16 bits beyond enc)  
;mask off 5 lsb's to return to path  
;store 16 ms path bits  
;return 5 ls path bits  
_PSTORE1  
lua  
lua  
(R4+NUMSTATES),R5  
(r0+NUMSTATES),n0  
;r5 points to latest states  
;update path data pointer  
endm  
Viterbi Decoder Implementation  
C-9  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;***********************TRACEBACK OUTPUT PATH MACRO*******************  
;
;
;
;
;
;
;
;
;
;
;
;
;
;
FUNCTION: To output the correct data, we begin at the end. We take the  
output path of the survivor state (0), and place its associated  
output path in memory as the last output data word. Then we use  
bits 11-15 of that data as an offset pointer to the correct traceback  
data of the next previous path data memory. We continue this until  
we have traced the data back to the begining.  
INPUTS:  
n0 should point to memory where the path bits are to be stored  
r5 should point to the latest path metric for state 0  
OUTPUTS:  
Decoder output data in Y memory  
REGISTERS USED:  
a,b,y,r0,n0,r2,r5  
;*********************************************************************  
;
;
;
Store off path data in words to avoid overflow in path reg's  
TRACEBACK macro  
;
move  
move  
move  
move  
move  
move  
move  
;
N0,R0  
#0,n0  
;path ptr to r0  
;prep for traceback  
#DECOUT+(NUMINPUTS/8/2),r2 ;point to end of output buffer  
y:(r5),b  
#-1,m2  
b1,x:(r0)  
#$5023,y0  
;recall last path  
;r2 now linear  
;save off last path data  
;control word for extract  
;*****************BEGIN TRACEBACK*****************************  
;
IF (EVEN==0)  
move  
x:(r0+n0),a  
;recall last path  
extractu y0,a,b  
;bits 11-15 of a1 point to next data  
;dec r0 to next earlier state set  
;move to upper byte  
lua  
(r0-NUMSTATES),r0  
lsl  
#8,a  
move  
move  
ENDIF  
b0,n0  
a1,y:(r2)-  
;load as offset for traceback  
;save off  
;
do  
move  
#NUMINPUTS/8/2,TRCBK  
x:(r0+n0),a  
;once for each byte pair  
;recall last path  
extractu y0,a,b  
;get ptr to next earlier path  
;point r0 to next earlier states  
;save ptr as offset  
lua  
(r0-NUMSTATES),r0  
move  
move  
b0,n0  
a1,y:(r2)-  
;store result  
TRCBK  
endm  
C-10  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;***************************MAIN***********************************************  
;
NUMSTATESequ  
ENCBITS  
32  
equ  
6
;most cases=log2(NUMSTATES)+1  
NoAcs  
NUMINPUTSequ  
equ  
NUMSTATES/2  
168  
EVEN  
;
equ  
1-(NUMINPUTS/8)%2  
;EVEN SET TO 1/0 IF NUMINPUTS IS  
EVEN/ODD #BYTES  
org  
VITDEC move  
move  
p:$400  
#NUMSTATES/2-1,m2  
#BRX,r2  
;r2 points to branch metric table  
move  
move  
move  
move  
#STATE1,r5  
#STATE2,r4  
#NUMSTATES*2-1,m4  
#NUMSTATES*2-1,m5  
#>1,n5  
;r5 points to current state metric  
;r4 points to updated state metric  
;modulo to flip locations each sym  
move  
;ctr/input metrics spacing for each  
butterfly  
;
;
move  
move  
move  
move  
#NUMSTATES/2,n5  
#PATHOUT,n0  
#-1,m3  
;input spacing for each butterfly  
;n0 points to stor for output paths  
;set linear mode, traceback ptr  
#>1,n2  
;
;
move  
#INDATA,r1  
;r1 points to input data  
;***********************PREPARATION LOOP*************************  
;
;
;
;
;
This loop iterates by the number of bits used in the  
encoder to pre load the bit decisions. Thereafter,  
the paths are updated and stored off in bytes. We need  
the preload bits so that the stored path metrics point  
correctly to their previous paths for traceback.  
;****************************************************************  
;
do  
#ENCBITS-1,PRELP  
;preload decisions/trellis start  
;
FindMetrics  
PreACS  
;
ACS  
PRELP  
move  
#NUMSTATES/2,n5  
;n5 now serves as offset between  
fetch states  
;
;*****************MAIN LOOP--PROCESS BYTES OF DATA*****************  
do  
do  
#NUMINPUTS/16-1,DATALP  
#16,SYMLP  
;process bytes of output  
;16 bits per byte  
;
FindMetrics  
ACS  
SYMLP  
STOREPATHS#NUMSTATES  
DATALP  
Viterbi Decoder Implementation  
C-11  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;********POSTPROCESSING, FLUSH ENCODER BACK TO 0****  
; ASSUMES that there are an odd number of bytes of data!!!  
; This will need redoing if the number of data bytes is even.  
; THESE SETUPS SHOULD ACCOMODATE ENCODERS OF 5 TO 8 BITS.  
; ENCBITS-1 PREPROCESSED, 152 IN MAIN LOOP DONE  
; SO WE HAVE 24-ENCBITS+1 BITS LEFT.  
; WE DO ENCODER FLUSHING FOR #ENCBITS-1 BITS. SO WE HAVE  
; (24-ENCBITS+1) - ENCBITS+1 DO DO BEFORE FLUSHING.  
; NEXT, WE PROCESS THE ENCODER FLUSH BITS, STOPPING TO STORE  
; PATHS AS NEEDED. WE STORE AFTER 16 BITS PROCESSED SO THIS MEANS  
; STORE PATHS AFTER 24-2*ENCBITS+2 (from nonflush) +16-(24-2*ENCBITS+2)  
; (these are the additional flush bits needed before storing paths).  
; FINALLY, WE FINISH OUT THE DATA. THE BITS REMAINING ARE:  
; 24-ENCBITS+1-(24-2*ENCBITS+2)-(16-(24-2*ENCBITS+2))  
; = 8-ENCBITS+1.  
;
; FOR THIS EXAMPLE, WE NONFLUSH PROCESS  
; 14 MORE BITS, THEN FLUSH 2, STORE PATH, THEN FLUSH  
; THE LAST THREE TO STATE 0.  
; LAST 14 NONFLUSH BITS---  
;**********************************************************************  
do  
#24-(2*ENCBITS)+2,LAST14  
FindMetrics  
ACS  
;
LAST14  
;
;
;
ENCODER FLUSH-----  
move  
do  
#>NUMSTATES/2,n5  
;flush, process 16,8, 4, then 2,  
then 1 state  
;
;
#16-(24-(2*ENCBITS)+2),FLSH;process 2 of last five bits to get  
next to last byte  
FindMetrics  
ACSFlush  
ACS  
;
FLSH  
STOREPATHS#NUMSTATES/8  
;
;
;
FLUSH THE LAST THREE BITS------  
do #8-ENCBITS+1,FLSH3  
;process last 3 bits to get last  
byte  
;
FindMetrics  
ACSFlush  
ACS  
;
FLSH3  
;
TRACEBACK  
;
FINISH nop  
C-12  
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
24-Bit Algorithm Program Listing  
Example C-1 24-bit Algorithm Program Listing (Continued)  
;**********************MEMORY ORGANIZATION****************************  
;
;
;
;
;
MOST OF THE MEMORY LOCATION IS IMPORTANT--THE FIRST TWO  
LABELS OF THE X AND Y DATA MEMORY ARE PAIRED AND MUST  
BE CO LOCATED (AT THE SAME ADDRESSES). IN ADDITION, STATE1  
AND STATE2 MUST BE LOCATED ON A 0 MOD 2*NUMSTATES BOUNDARY,  
X and Y memory for input data must be paired --GOOD LUCK.....  
;*********************************************************************  
;
org  
STATE1 DC  
DC  
STATE2 DC  
DC  
x:$0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0ff,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
NUMSTATES/2  
BRX  
DSM  
;
PATHOUT DS  
NUMSTATES*(NUMINPUTS/8+1)  
INDATA ;THIS DATA ENCODES  
$1234,$5678,$9abc,$4973,$7925,$3491,$ad43,$ff21,$7ebb,$0100,$20  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
$a00000,$a00000,$a00000,$600000,$600000,$a00000,$a00000,$600000  
$600000,$600000,$600000,$600000,$600000,$a00000,$a00000,$600000  
$a00000,$600000,$a00000,$600000,$a00000,$600000,$a00000,$600000  
$a00000,$a00000,$600000,$600000,$600000,$a00000,$a00000,$a00000  
$a00000,$a00000,$a00000,$a00000,$a00000,$a00000,$a00000,$a00000  
$a00000,$600000,$600000,$a00000,$a00000,$a00000,$a00000,$a00000  
$a00000,$a00000,$a00000,$a00000,$a00000,$600000,$600000,$a00000  
$600000,$a00000,$600000,$a00000,$600000,$600000,$600000,$600000  
$a00000,$a00000,$600000,$600000,$a00000,$a00000,$a00000,$600000  
$a00000,$600000,$a00000,$600000,$600000,$a00000,$600000,$a00000  
$a00000,$a00000,$600000,$a00000,$a00000,$a00000,$a00000,$600000  
$600000,$600000,$600000,$a00000,$600000,$600000,$600000,$600000  
$600000,$600000,$a00000,$a00000,$a00000,$a00000,$600000,$600000  
$a00000,$a00000,$600000,$a00000,$a00000,$a00000,$a00000,$a00000  
$a00000,$600000,$600000,$a00000,$a00000,$a00000,$a00000,$a00000  
$600000,$a00000,$600000,$a00000,$600000,$600000,$a00000,$a00000  
$600000,$600000,$600000,$a00000,$a00000,$600000,$a00000,$600000  
$600000,$600000,$a00000,$a00000,$a00000,$600000,$a00000,$a00000  
$a00000,$a00000,$600000,$600000,$600000,$a00000,$a00000,$600000  
$600000,$a00000,$600000,$a00000,$600000,$a00000,$a00000,$a00000  
$a00000,$a00000,$600000,$600000,$a00000,$600000,$a00000,$600000  
;
Viterbi Decoder Implementation  
For More Information On This Product,  
Go to: www.freescale.com  
C-13  
Freescale Semiconductor, Inc.  
Example C-1 24-bit Algorithm Program Listing (Continued)  
org  
y:$0  
PATH1  
PATH2  
BRY  
DC  
DC  
DC  
DC  
DSM  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
$0,$0,$0,$0,$0,$0,$0,$0,0,0,0,0,0,0,0,0  
NUMSTATES/2  
DECOUT DS  
org  
DC  
NUMINPUTS/8  
y:@cvs(y,INDATA)  
YDATA  
$a00000,$a00000,$a00000,$600000,$a00000,$600000,$a00000,$600000  
$a00000,$600000,$a00000,$a00000,$600000,$600000,$a00000,$600000  
$a00000,$a00000,$600000,$a00000,$600000,$600000,$600000,$a00000  
$600000,$600000,$600000,$600000,$600000,$600000,$a00000,$600000  
$600000,$600000,$600000,$a00000,$a00000,$a00000,$600000,$a00000  
$a00000,$a00000,$600000,$600000,$600000,$a00000,$600000,$a00000  
$600000,$600000,$600000,$600000,$a00000,$600000,$a00000,$a00000  
$600000,$600000,$a00000,$600000,$600000,$600000,$a00000,$600000  
$a00000,$600000,$600000,$a00000,$a00000,$600000,$a00000,$a00000  
$a00000,$a00000,$a00000,$600000,$a00000,$a00000,$600000,$600000  
$600000,$a00000,$600000,$a00000,$a00000,$600000,$a00000,$600000  
$600000,$600000,$a00000,$a00000,$600000,$a00000,$600000,$a00000  
$a00000,$600000,$600000,$a00000,$a00000,$600000,$a00000,$a00000  
$a00000,$a00000,$a00000,$a00000,$a00000,$600000,$a00000,$600000  
$a00000,$600000,$a00000,$600000,$600000,$600000,$600000,$600000  
$a00000,$a00000,$a00000,$a00000,$a00000,$600000,$600000,$a00000  
$a00000,$a00000,$a00000,$600000,$a00000,$a00000,$600000,$a00000  
$600000,$600000,$a00000,$600000,$600000,$600000,$a00000,$a00000  
$600000,$600000,$a00000,$a00000,$600000,$a00000,$a00000,$600000  
$a00000,$600000,$600000,$600000,$600000,$a00000,$a00000,$a00000  
$a00000,$a00000,$600000,$a00000,$600000,$600000,$600000,$600000  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
DC  
end  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
For More Information On This Product,  
Go to: www.freescale.com  
Freescale Semiconductor, Inc.  
How to Reach Us:  
Home Page:  
www.freescale.com  
E-mail:  
support@freescale.com  
USA/Europe or Locations Not Listed:  
Freescale Semiconductor  
Technical Information Center, CH370  
1300 N. Alma School Road  
Chandler, Arizona 85224  
+1-800-521-6274 or +1-480-768-2130  
support@freescale.com  
Europe, Middle East, and Africa:  
Freescale Halbleiter Deutschland GmbH  
Technical Information Center  
Schatzbogen 7  
81829 Muenchen, Germany  
+44 1296 380 456 (English)  
+46 8 52200080 (English)  
+49 89 92103 559 (German)  
+33 1 69 35 48 48 (French)  
support@freescale.com  
Japan:  
Freescale Semiconductor Japan Ltd.  
Headquarters  
ARCO Tower 15F  
1-8-1, Shimo-Meguro, Meguro-ku,  
Tokyo 153-0064  
Japan  
Information in this document is provided solely to enable system and software  
implementers to use Freescale Semiconductor products. There are no express or  
implied copyright licenses granted hereunder to design or fabricate any integrated  
circuits or integrated circuits based on the information in this document.  
Freescale Semiconductor reserves the right to make changes without further notice to  
any products herein. Freescale Semiconductor makes no warranty, representation or  
guarantee regarding the suitability of its products for any particular purpose, nor does  
Freescale Semiconductor assume any liability arising out of the application or use of  
any product or circuit, and specifically disclaims any and all liability, including without  
limitation consequential or incidental damages. “Typical” parameters which may be  
provided in Freescale Semiconductor data sheets and/or specifications can and do  
vary in different applications and actual performance may vary over time. All operating  
parameters, including “Typicals” must be validated for each customer application by  
customer’s technical experts. Freescale Semiconductor does not convey any license  
under its patent rights nor the rights of others. Freescale Semiconductor products are  
not designed, intended, or authorized for use as components in systems intended for  
surgical implant into the body, or other applications intended to support or sustain life,  
or for any other application in which the failure of the Freescale Semiconductor product  
could create a situation where personal injury or death may occur. Should Buyer  
purchase or use Freescale Semiconductor products for any such unintended or  
unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor  
and its officers, employees, subsidiaries, affiliates, and distributors harmless against all  
claims, costs, damages, and expenses, and reasonable attorney fees arising out of,  
directly or indirectly, any claim of personal injury or death associated with such  
unintended or unauthorized use, even if such claim alleges that Freescale  
0120 191014 or +81 3 5437 9125  
support.japan@freescale.com  
Asia/Pacific:  
Freescale Semiconductor Hong Kong Ltd.  
Technical Information Center  
2 Dai King Street  
Tai Po Industrial Estate  
Tai Po, N.T., Hong Kong  
+800 2666 8080  
support.asia@freescale.com  
For Literature Requests Only:  
Freescale Semiconductor Literature Distribution Center  
P.O. Box 5405  
Denver, Colorado 80217  
1-800-441-2447 or 303-675-2140  
Fax: 303-675-2150  
Semiconductor was negligent regarding the design or manufacture of the part.  
LDCForFreescaleSemiconductor@hibbertgroup.com  
For More Information On This Product,  
Go to: www.freescale.com  

相关型号:

DSP56300FM/AD

high density CMOS device with 3.3 V inputs and outputs
MOTOROLA

DSP56300FM/AD

a high-density CMOS device
FREESCALE

DSP56300FMAD

DSP56300 Family Manual Addendum
ETC

DSP56301

DSP56301 Digital Signal Processor
MOTOROLA

DSP56301

24-Bit Digital Signal Processor
FREESCALE

DSP56301ADMUM

DSP56301 Application Development Module Users Manual
ETC

DSP56301AG100

24-Bit Digital Signal Processor
FREESCALE

DSP56301AG80

24-Bit Digital Signal Processor
FREESCALE

DSP56301CE2K30A

DSP56301 Digital Signal Processor
MOTOROLA

DSP56301PW100

24-Bit Digital Signal Processor
FREESCALE

DSP56301PW80

24-Bit Digital Signal Processor
FREESCALE

DSP56301UMAD

DSP56301 Users Manual Addendum
ETC