Drive computer vision camera (Point Grey) on single-board computer (Banana Pi)

Computer vision cameras are extremely small, with little compromise on image quality. They cover a wide range of applications that conventional photography cameras cannot handle.

One drawback, however, is that they don’t have large on-board storage, so the data must be streamed out to external storage. A dumb computer could be a good companion to the computer vision camera, but this combination costs the camera its size advantage. If you want to build a rig of computer vision cameras, you will need an equally large rig of computers, which is very likely to turn maintenance into a nightmare.

Assuming those dumb computers are only there to receive and store the data, we don’t actually need advanced features such as CPU computing power or graphics performance. In that case, can super cheap single-board computers (embedded systems) take over the job? The good news is: YES!

According to Point Grey’s website, they have already tested their cameras with single-board computers such as the Jetson TK1 from Nvidia and the ODROID-XU. How about others, such as the popular Raspberry Pi or Banana Pi? The latter is especially interesting because it comes with a SATA port, perfect for SSD storage.

A stereo set of Point Grey Flea3 FL3-GE-50S5C cameras connected to a Banana Pi BPi M1 + SSD storage

After downloading the ARM SDK and the sample code onto the Banana Pi, my first attempt gave me the following error:

There is an image consistency issue with this image

What does it mean? It means the receiving side (the Banana Pi) can’t keep up with the camera, so the image you received is likely corrupted.

How to solve it? Fortunately, you only need to tweak a few parameters on the camera and in the OS (e.g. Ubuntu) on the embedded system:

  • Packet Size and Delay on the camera
  • Receive Buffers on your OS

I will talk a little bit more about tweaking those parameters in my next posts. Once I got everything working, I was able to capture RAW data from the 5.0 MP sensor at 4fps, which is roughly 80MB/s. When I tried to increase to 5fps, the “image consistency issue” came back from time to time. I believe there is still a little more juice to squeeze out of the Banana Pi by tweaking the above parameters.

 


Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller(5)

A bootloader must be designed to be very reliable. In other words, the bootloader is an unsung hero that carries out its job when necessary, and it should never be replaced or destroyed. However, if you understand how the bootloader is implemented in a micro-controller, you will know it is no different from any other piece of your custom application code: just some binaries stored in the flash. So it is possible for it to be erased and overwritten. A well designed bootloader avoids self-destruction from most mistakes, but hardware failures cannot be planned for or eliminated, and your bootloader is at risk if no protection is in place against them. In the worst case, page 0 of your device’s flash is erased and, for some reason (most likely an IO error or a power surge), the most critical row 0 of this page cannot be correctly re-programmed. Your micro-controller will then be unable to jump to the starting address of the bootloader the next time it is powered on. Instead, the chip will hang right after power-on because row 0 of page 0 no longer tells it where to jump. If you don’t quite understand why row 0 of page 0 is so important for bootloader design, and why this critical row of data must be rewritten every time, please refer to my first post of this series.

A roll-back strategy is introduced here to save the bootloader from losing its GOTO-RESET instruction. A bootloader overwrite flag is created and set to 0 on initialization. The flag is set to 1 right after page 0 is erased, indicating that a roll-back might be needed if the subsequent re-programming of this page cannot be completed correctly. The flag is set back to 0 only after row 0 of page 0 has been loaded with the new jump instructions, indicating that a roll-back is no longer needed. In order to retain the critical GOTO-RESET data, a temporary buffer is created to store it before page 0 is erased:

        .bss
bt_Addr:.space 6

; bt_Addr holds the two GOTO instruction words (6 bytes) that jump back to
; the bootloader's starting address.
; These two words are the first 6 bytes of row 0 of page 0

This is how I implemented the roll-back mechanism, with explanations inline. Your coding style may well differ.

Roll_Bk:
; Point TBLPAG at the write latches (0xFA0000)
mov     #0xFA, W0
mov     W0, TBLPAG
; Lower 16 bits of the write latch address start from 0
mov     #0, W0

; Buffer bt_Addr was loaded with the GOTO-RESET content before page 0
; was erased.
mov     #bt_Addr, W1
tblwth.b    [W1++], [W0]
tblwtl.b    [W1++], [W0++]
tblwtl.b    [W1++], [W0++]
tblwth.b    [W1++], [W0]
tblwtl.b    [W1++], [W0++]
tblwtl.b    [W1], [W0]

; Program the two instructions now held in the write latches to flash address 0 (GOTO-RESET)
; Load address
mov     #0, W0
; W0 is 0, which is the starting address of GOTO-RESET instructions
; in the flash. It is located at the very beginning of the flash.
mov     W0, NVMADRU
mov     W0, NVMADR
; Setup NVMCON for word programming
mov     #0x4001, W0
rcall   Write
return
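
To make the flag logic easier to follow, here is a minimal host-side sketch in Python. It only models the sequence described above (save GOTO-RESET, erase, set the flag, re-program, clear the flag, roll back on failure). The class and method names are hypothetical and nothing here talks to real hardware.

```python
class Page0Model:
    """Toy model of page 0 and the bootloader overwrite flag (hypothetical names)."""

    def __init__(self, goto_reset):
        self.row0 = bytes(goto_reset)   # first 6 bytes of row 0: the two GOTO words
        self.saved_goto = None          # plays the role of the bt_Addr buffer
        self.overwrite_flag = 0         # 0 means no roll-back is needed

    def erase_page0(self):
        self.saved_goto = self.row0     # keep a copy before erasing
        self.row0 = b"\xff" * 6         # erased flash reads back as all ones
        self.overwrite_flag = 1         # a roll-back may be needed from now on

    def program_row0(self, new_goto, ok=True):
        if not ok:                      # simulate an IO error or a power glitch
            return False
        self.row0 = bytes(new_goto)
        self.overwrite_flag = 0         # row 0 is valid again
        return True

    def roll_back_if_needed(self):
        if self.overwrite_flag == 1:
            self.row0 = self.saved_goto # restore the bootloader's GOTO-RESET
            self.overwrite_flag = 0
            return True
        return False
```

In this model, a failed `program_row0` leaves the flag at 1, so a later call to `roll_back_if_needed` restores the saved GOTO-RESET bytes, which is the job `Roll_Bk` does on the chip.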

Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller(4)

If you have read my previous post about how to extract useful data from an Intel HEX32 format hex file, you are just one step away from being able to burn the firmware via the bootloader.

Unlike an ICD programmer, which burns the firmware through the PGC/PGD pins, a bootloader relies on the set of instructions and registers that Microchip provides on its 16-bit MCU series so that the chip can overwrite its own flash. For example:

  • NVMADRU (upper 8 bits of the address) and NVMADR (lower 16 bits of the address) are the Flash address registers. Together they point to the physical location (24-bit address) in the flash where the binary firmware will be programmed/erased/read.
  • NVMOP bits of the NVMCON register are the operation select bits. Depending on their setting, they select bulk erase, page erase, row program, etc.
  • TBLPAG register stores the upper 8 bits of the 24-bit address of the “Write Latches” (scroll down for a detailed description of the Write Latches). For a dsPIC33E or PIC24E device, you should always load “0xFA” into TBLPAG because the Write Latches are physically implemented starting from address 0xFA0000.
  • TBLWTH and TBLWTL instructions load the binary data from RAM into the Write Latches before the actual burning process.
  • and many others.

Since Microchip implemented the “row program” scheme, it is more convenient and faster to do bootloading by rows, i.e. 128 instructions at a time. The Write Latches can be thought of as a temporary storage place where 128 instructions are held before they get burned into their final destination. In row program mode, Microchip doesn’t seem to support writing into the flash directly from RAM: all data must first be loaded into RAM and then transferred to the Write Latches, before finally being “burned” into the flash. The following code is an example:

  1. Receive the firmware through a peripheral such as UART, SPI, I2C, PMP or IO and store it in RAM.
    
            .bss
            buffer:  .space 128*3        ;Create an array of 128*3 bytes in RAM
            ;Your code here to receive binary data via UART/SPI/I2C
    
    
  2. Load the data from RAM into the Write Latches.
    
            mov       #128, W1      ;To burn 128 instructions, which is the size of a "row"
            mov       #0xFA, W0
            mov       W0, TBLPAG    ;As explained above, Write Latches starts from 0xFA0000
            mov       #0, W2        ;Lower 16-bit of Write Latches starts from 0
            mov       #buffer, W3   ;Load buffer (RAM) address
    WRLA:   tblwth.b  [W3++], [W2]
            tblwtl.b  [W3++], [W2++]
            tblwtl.b  [W3++], [W2++]
            dec       W1, W1
            bra       nz, WRLA
    
    
  3. Execute the row program command.
            ;Before writing into the flash, we must know the starting
            ;address of this row of 128 instructions
            ;Assume the upper byte of the address is already loaded into W0
            mov       W0, NVMADRU    ;Load the upper byte of the address into the NVMADRU register
            ;Assume the mid/lower bytes of the address are already loaded into W0
            mov       W0, NVMADR     ;Load the mid/lower bytes of the address into the NVMADR register
            ;Now the MCU knows where the data are (Write Latches) and where they should go
            ;(loaded in the NVMADRU and NVMADR registers)
            ;We are ready to start the "burning" process!
            mov       #0x4002, W0
            mov       W0, NVMCON
            mov       #0x55, W0
            mov       W0, NVMKEY
            mov       #0xAA, W0
            mov       W0, NVMKEY
            bset      NVMCON, #WR
            nop
            nop
            ;Wait for write to finish
    WWTF:   btsc    NVMCON, #WR
            bra     WWTF
    
    
  4. You have now finished one row of firmware. Repeat steps 1 to 3 until all the firmware is burned into your flash.
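
On the PC side, the firmware has to be cut into rows before it is sent. The following Python sketch shows one way to do that, assuming the instructions are available as a list of 24-bit integers. The function name `pack_rows` and the padding with 0xFFFFFF (the erased state) are my own assumptions. The byte order inside each row (upper byte, low byte, middle byte) simply mirrors the order in which the WRLA loop above consumes the buffer.

```python
ROW_INSTRUCTIONS = 128      # one "row" holds 128 instructions
ERASED = 0xFFFFFF           # assumed filler for a partially used last row


def pack_rows(instructions):
    """Split 24-bit instructions into 384-byte row buffers for the WRLA loop."""
    rows = []
    for start in range(0, len(instructions), ROW_INSTRUCTIONS):
        chunk = list(instructions[start:start + ROW_INSTRUCTIONS])
        chunk += [ERASED] * (ROW_INSTRUCTIONS - len(chunk))   # pad the last row
        buf = bytearray()
        for word in chunk:
            buf.append((word >> 16) & 0xFF)   # upper byte, written by tblwth.b
            buf.append(word & 0xFF)           # low byte, written by the first tblwtl.b
            buf.append((word >> 8) & 0xFF)    # middle byte, written by the second tblwtl.b
        rows.append(bytes(buf))
    return rows
```

Each element of the returned list is exactly 128*3 bytes, so it can be copied into `buffer` as is before step 2 runs.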

Tips for using dsPIC33e series MCU’s remappable peripheral IO

A big improvement of the dsPIC33e series micro-controllers over their predecessors is the re-mappable peripheral IO. This allows great flexibility in circuit board design, but also takes slightly more effort to use.

  • “RPn” is for both output and input, whereas “RPIn” is input only. When wiring up your PCB design, double check that no “RPIn” pin is hooked up as a peripheral output.
  • The declaration for an output is different from that for an input. For example, if you use RPI16 as the input for the UART1 receiver, you should declare it in your program like this:
    // UART1 RX connects to RPI16
    _U1RXR = 16;
    
    

    If you use RP118 as the output for the UART1 transmitter, you need to declare it the other way round:

    // RP118 connects to peripheral 0b00001
    _RP118R = 1; // 0b00001 stands for UART1
    
    

    I believe it is related to the way the look-up table is implemented.

  • Configuring the peripheral IO register is not the whole story when it comes to using an IO correctly, especially when the pin is shared with an analog function (usually ADC). All IOs that are shared with “ANx” are analog pins by default. To use them as digital IO, the corresponding bit in the ANSELx register must be cleared (set to 0).
    For example, if you want to use pin No.3 (which is also AN29, RE5, RP85) as the output of any peripheral function (UART, DCI, SPI, I2C, etc.), you must declare:

    // ANSxy, x stands for Port A/B/C/D/E/..., y stands for number
    _ANSE5 = 0; // Enables digital port pin
    
    


    [Update on 11/25/2013]
    Pins that are multiplexed with analog functions are not limited to ADC inputs; comparator/op-amp pins are affected too. I/Os such as RG6 and RG7, which are shared with C1IN3- and C1IN1-, also default to analog pins after reset. In order to use them as digital I/O, their corresponding ANSxy bits (in this case _ANSG6 and _ANSG7) must be cleared. I didn’t realize this until I ran into trouble when remapping the DCI CSDI to one of the comparator input pins and ended up receiving nothing but zeros. As soon as I cleared the corresponding ANSELx bit, data started pumping into the DCI receiver.

Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller(3)

If you have read “Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller(2)” of this series, you should be able to extract all the necessary information from the hex file compiled by Microchip’s MPLAB. The necessary information is nothing but addresses and data: the whole hex file is actually a map telling you what data is supposed to be placed in which part of the MCU’s flash. Today, I am going to focus on the dsPIC33E/PIC24E user memory space structure, tips for hex file extraction, and GOTO/RESET replacement.

Before you can put any data at its target location in an MCU’s flash, you must know how the flash is structured. Although the user memory space is implemented contiguously on the MCU’s flash from 0x000000 to 0x02ABFE (or 0x0557FE for the bigger memory version), it is divided into sections for management purposes: each erase block or “page” contains 1024 instructions, and each program block or “row” contains 128 instructions, so each page contains 8 rows. Counted this way, there are 86 pages or 684 rows in a device with address limit 0x02ABFE, and 171 pages or 1368 rows for 0x0557FE.

For the smaller memory device, you might wonder: since 0x02ABFE means there are 175104 available addresses, dividing 175104 by 1024 or 128 should give 171 and 1368 respectively, so why do you end up with only half of that? If this is unclear, check out my previous post of this series, where I explained why each instruction takes two addresses. You might also have noticed that 175104/2/1024 gives 85.5 rather than 86. This time you are right: it is 85.5, and the last page is only half as big as the others, 512 instructions wide. But Microchip still counts the half page as a page, which is how you get 86 in its datasheet.

A “page” is the smallest unit to erase: if you run a page erase operation on an MCU, you erase at least a whole page of data (or you can do a whole chip erase). Accordingly, a “row” is the smallest unit to program. When you run a program operation, you program the 128 instructions of a row, which also means you must have those 128 instructions ready before the execution. The write latches are the temporary warehouse that stores those 128 instructions before you burn them into the flash; I will cover them in the following post.
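
The page and row counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is hypothetical, and it assumes the address limit is the last valid (even) address, so the number of addresses is the limit plus 2.

```python
import math

INSTRUCTIONS_PER_ROW = 128   # one program block
ROWS_PER_PAGE = 8            # 1024 instructions per erase block


def flash_geometry(address_limit):
    """Return (instructions, rows, pages) for a given user memory address limit."""
    addresses = address_limit + 2                  # e.g. 0x02ABFE -> 175104 addresses
    instructions = addresses // 2                  # each instruction takes two addresses
    rows = instructions // INSTRUCTIONS_PER_ROW
    pages = math.ceil(rows / ROWS_PER_PAGE)        # a half page still counts as a page
    return instructions, rows, pages


print(flash_geometry(0x02ABFE))   # smaller device
print(flash_geometry(0x0557FE))   # bigger device
```

For 0x02ABFE this gives 684 rows and 86 pages (85.5 rounded up), and for 0x0557FE it gives 1368 rows and 171 pages, matching the figures quoted above.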

Hex extraction is quite straightforward if you have read my previous post of this series. Here is some Python code I wrote to extract the hex data and line it up by address:

def _Parse_Hex32(self):
    extended_Linear_Address = 0     # Upper 16 bits of the address (left-shifted by 16 bits)
    extended_Segment_Address = 0    # Segment base (left-shifted by 4 bits)
    for i in range(0, len(self._hex32_Lines)):
        byte_Count, starting_Address, record_Type, data = self._Parse_Line(self._hex32_Lines[i])
        if record_Type == 1:        # End record
            if i != (len(self._hex32_Lines) - 1):
                raise Hex32_Invalid("Data type \"End(1)\" appears (line \"" + str(i+1) + "\") before the end of file.")
        elif record_Type == 2:      # Extended segment address
            if len(data) == 2:
                print("!!! Warning !!!: Data type \"Extended Segment Address(2)\" appears on line\"" + str(i+1) + "\". ")
                extended_Segment_Address = (data[0] * 256 + data[1]) * 16
            else:
                raise Hex32_Invalid("Data type \"Extended Segment Address(2)\" (line\"" + str(i+1) + "\") does not contain exactly two bytes.")
        elif record_Type == 4:      # Extended linear address
            if len(data) == 2:
                extended_Linear_Address = (data[0] * 256 + data[1]) * 65536
            else:
                raise Hex32_Invalid("Data type \"Extended Linear Address(4)\" (line\"" + str(i+1) + "\") does not contain exactly two bytes.")
        elif record_Type == 0:      # Data record
            # Each instruction takes 4 bytes in the hex file; "//" keeps the result an integer
            for j in range(0, len(data) // 4):
                device_Address = (extended_Linear_Address + extended_Segment_Address + starting_Address) // 2 + j
                # Flag the LUT; the hex address is divided by 2 because the real device address increments by 2
                self._Flag_LUT(device_Address)

                # Fill the 4 hex bytes of this instruction into the array
                self._flash_Memory[device_Address * self._instruction_Size_In_Hex + 0] = data[self._instruction_Size_In_Hex * j + 0]
                self._flash_Memory[device_Address * self._instruction_Size_In_Hex + 1] = data[self._instruction_Size_In_Hex * j + 1]
                self._flash_Memory[device_Address * self._instruction_Size_In_Hex + 2] = data[self._instruction_Size_In_Hex * j + 2]
                self._flash_Memory[device_Address * self._instruction_Size_In_Hex + 3] = data[self._instruction_Size_In_Hex * j + 3]

        else:
            raise Hex32_Invalid("Unsupported data type: " + str(record_Type))

As I explained in my first post of this series, the GOTO-RESET data in the hex file must be replaced by your own instructions that point to the starting address of your bootloader. Rather than executing your application right away, a device with a modified GOTO-RESET first jumps to the bootloader to check whether new firmware is coming in. If so, it burns the firmware into the flash; otherwise, it jumps to the starting address of your application and runs it.

So attention should be paid in two places: 1) store the GOTO-RESET data generated by your compiler, and replace it with the starting address of your bootloader; 2) save the stored GOTO-RESET data at the end of your bootloader, so that the device knows where to find your application once the bootloading is finished.

########################################################################################################
# Replace the very first two instructions (the GOTO-RESET) with ones that point to the
# starting address of the bootloader.
# Write the original GOTO-RESET instructions to the end of the bootloader.
# When a hardware reset occurs, the program counter points to address zero. Since we have
# already changed the first two instructions, the program then jumps to the bootloader first instead
# of the user's application. After the bootloader is done, the program jumps back to the user's
# application by reading the relocated GOTO-RESET.
########################################################################################################
def _Modify_Goto_Reset(self):
    # Check if there is GOTO-RESET instruction in the beginning
    if self._flash_Memory[2] == 4 and self._flash_Memory[6] == 0:
        # Copy user's original GOTO-RESET instruction
        user_Lower_Address_Goto = self._flash_Memory[0:2]     # (04)jump to lower 16-bit address, format LSB, MSB
        user_Higher_Address_Goto = self._flash_Memory[4:6]   # (00)jump to higher 16-bit address, format LSB, MSB

        # Replace with the ones that points to where the bootloader is
        bootloader_Lower_Address_Goto = [ self._bootloader_Starting_Address & 0xff, (self._bootloader_Starting_Address >> 8) & 0xff ]
        bootloader_Higher_Address_Goto = [(self._bootloader_Starting_Address >> 16) & 0xff, (self._bootloader_Starting_Address >> 24) & 0xff]

        self._flash_Memory[0] = bootloader_Lower_Address_Goto[0]
        self._flash_Memory[1] = bootloader_Lower_Address_Goto[1]

        self._flash_Memory[4] = bootloader_Higher_Address_Goto[0]
        self._flash_Memory[5] = bootloader_Higher_Address_Goto[1]

        # Put the user's original GOTO-RESET back next to the bootloader.
        # It is physically placed two instructions ahead of the bootloader's
        # starting address. The program jumps to that address once the bootloader is finished
        new_Goto_Reset_Address = self._bootloader_Starting_Address // 2 - 2

        self._Flag_LUT(new_Goto_Reset_Address)

        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 0] = user_Lower_Address_Goto[0]
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 1] = user_Lower_Address_Goto[1]
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 2] = 4
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 3] = 0  # Phantom byte

        new_Goto_Reset_Address = new_Goto_Reset_Address + 1
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 0] = user_Higher_Address_Goto[0]
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 1] = user_Higher_Address_Goto[1]
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 2] = 0
        self._flash_Memory[new_Goto_Reset_Address * self._instruction_Size_In_Hex + 3] = 0  # Phantom byte

    else:
        raise Hex32_Invalid("No user's GOTO-RESET found at the beginning of hex file.")
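
The byte layout that `_Modify_Goto_Reset` writes can be tried out in isolation. The helper below is a standalone sketch, not part of the class above: `encode_goto` is a hypothetical name, and it follows the simplified layout used in this series (low 16 bits plus 0x04 in the first instruction, high 16 bits plus 0x00 in the second, each followed by a phantom byte).

```python
def encode_goto(target):
    """Build the 8 hex-file bytes of a GOTO to `target` (two instructions)."""
    if target < 0 or target > 0xFFFFFF:
        raise ValueError("target must fit in 24 bits")
    low = target & 0xFFFF            # lower 16 bits of the jump address
    high = (target >> 16) & 0xFFFF   # upper bits of the jump address
    return bytes([
        low & 0xFF, low >> 8, 0x04, 0x00,     # 1st instruction + phantom byte
        high & 0xFF, high >> 8, 0x00, 0x00,   # 2nd instruction + phantom byte
    ])


print(encode_goto(0x02A800).hex())
```

The printed bytes can be compared by eye against the first 8 entries of `_flash_Memory` after the replacement has run.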

Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller (2)

In my last post “Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller (1)“, I talked about some bootloader basics and picked one of the architectures as my solution, which is to place the bootloader towards the end of the on-board flash. In this post, I will go over the workflow of bootloading, hoping to give you a picture of how the whole thing works.

After you finish your embedded application design, the compiler compiles your C/C++/Assembly code into binary, which is usually output with the extension “hex” in Intel HEX32 format. I will talk about the Intel HEX32 format in detail later on. Then you open your hex loader software on the PC and load the hex file. Before burning the binary into your micro-controller, make sure your programmer (ICD2 or ICD3 for Microchip) is properly connected to your target device; the compiled code is downloaded into the device through the PGC/PGD pins after you click the execute button. Microchip has built all the components of this tool chain: compiler (MPLAB X IDE) – hex loader (MPLAB IPE) – in-circuit programmer (ICD2/ICD3) – hardware receiver on the device (PGC/PGD pins). To make a bootloader work, you have to build some of these components yourself, and the chain will probably look like this: compiler (MPLAB X IDE) – your own hex loader – your own software/hardware programmer, responsible for transmitting the binary through a peripheral supported by the device such as USB, UART, SPI, I2C or even an SD card – software receiver, which is your bootloader code built on top of that peripheral. Yes, as you have probably figured out, designing your own bootloader involves more than just the piece of code hiding in your micro-controller.

3. Understand the standard INTEL HEX32 format
If you open the compiled “.hex” file, you will find nothing but line after line of hex strings. Every line follows this single format:
:BBAAAATTHHHH….HHHHCC, where:

  • [:] Beginning mark, always as “:”
  • [BB] Number of bytes in HHH…HH part
  • [AAAA] Starting address
  • [TT] Type:
    • 00: Data Record
    • 01: End Record
    • 02: Extended Segment Address Record
    • 03: Start Segment Address Record
    • 04: Extended Linear Address Record
    • 05: Start Linear Address Record
  • [HHHH….HHHH] Data
  • [CC] Checksum

I won’t cover the details of this format here as they are everywhere on the Internet, for example in Microchip DS70619B: dsPIC33E/PIC24E Flash Programming Specification. Instead, I will spend some effort looking into how it is implemented on the dsPIC33E/PIC24E devices, and putting together the related details that are scattered across numerous documents. Let’s start with some example hex lines to strengthen your understanding:

:020000040108F1
:0200000212FFEB
:0401000090FFAA556D
:00000001FF

  • Determine the extended linear address offset for the data record (0108 in this example).
    :020000040108F1
  • Determine the extended segment address for the data record (12FF in this example).
    :0200000212FFEB
  • Determine the address offset for the data in the data record (0100 in this example).
    :0401000090FFAA556D
  • Calculate the absolute address for the first byte of the data record.
      0108 0000  linear address offset, shifted left 16 bits
    + 0001 2FF0  segment address offset, shifted left 4 bits
    + 0000 0100  address offset from the data record
    = 0109 30F0  32-bit address for the first data byte
  • Which gives us the following:
    010930F0 90
    010930F1 FF
    010930F2 AA
    010930F3 55
  • Please note that as soon as a record of type “02” or “04” appears, the addresses of all the following lines are shifted accordingly until another “02” or “04” line shows up, at which point the shift must be recalculated.
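
The field layout and the address calculation above can be sketched in Python as follows. The function names are hypothetical, and the checksum test uses the usual Intel HEX rule that all bytes of a record, checksum included, sum to zero modulo 256.

```python
def parse_record(line):
    """Split one ':BBAAAATTHH...HHCC' line into (count, address, type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("a record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("byte count does not match the record length")
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    count = raw[0]                       # BB
    address = (raw[1] << 8) | raw[2]     # AAAA
    record_type = raw[3]                 # TT
    data = raw[4:4 + count]              # HH...HH
    return count, address, record_type, data


def absolute_address(linear, segment, offset):
    """Combine the offsets the same way as the worked example above."""
    return (linear << 16) + (segment << 4) + offset


print(parse_record(":00000001FF"))
print(hex(absolute_address(0x0108, 0x12FF, 0x0100)))
```

The second call reproduces the sum from the example, 0x010930F0.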

4. Understand dsPIC33E/PIC24E specific INTEL HEX32 format
Microchip added some rules of its own on top of the standard INTEL HEX32 format. They must be understood before a Microchip hex file can be translated correctly. This is what really confused me during the early stage of the bootloader development.

  1. Each instruction for dsPIC33E/PIC24E is 24 bits or 3 bytes wide, and the flash program memory is also 24 bits wide. However, in the standard INTEL HEX32, data always comes in multiples of 4 bytes. In order to accommodate INTEL’s rule, Microchip throws away the right-most byte of every 4 bytes in the hex file. This byte is called the “phantom byte”. That’s why the right-most byte of every 4 bytes in the hex file is always “00”. So dividing the “BB” part by 4 gives the number of instructions contained on that line.
  2. The address in the flash memory is incremented by 2 to read/write one instruction. This is because one 3-byte instruction takes two addresses if the address is based on a 16-bit/2-byte width. Microchip explains this in their DS70609D as “The 24-bit Flash program memory can be regarded as two side-by-side 16-bit spaces, with each space sharing the same address range … the upper byte of Flash program memory that does not exist. This byte is called the ‘phantom byte’.” This also explains why the dsPIC33EP512MU810, with a “User Memory Address Limit” of 0x557FE (dec. 350,206), can only hold 175,104 (350,208 / 2) instructions.
  3. The “AAAA” part (MSB-LSB) of Hex32 represents the starting address of the data. Dividing this number by two gives the real device address.
  4. To understand GOTO and RESET in the Hex32 file:
    The second line of a hex file is always the line that contains the GOTO and RESET instructions. It tells the chip where to find the first line of code after the chip is powered on or reset. It contains only two instructions, starting at address 0. The first instruction, which usually ends with “0x04” (not counting the phantom byte), means “jump to address, low 16 bits”; its first 2 bytes are the address. The second instruction, which usually ends with “0x00”, means “jump to address, high 16 bits”; its first 2 bytes are the address.
    Here is an example:

    :0800000000a80400020000004a
    :08 0000 00 00a80400 02000000 4a
     (1)(2)  (3)(4)      (5)      (6)
    

    (1). There are 8 bytes on this line
    (2). Starting address 0
    (3). Type: data record
    (4). 1st instruction: 00 a8 04 00(phantom byte) -> (04)jump lower 16-bit address(a800)
    (5). 2nd instruction: 02 00 00 00(phantom byte) -> (00)jump higher 16-bit address(0002)
    (6). Checksum
    This line represents: Jump to address 0x02a800.

    Another example:

    :080000000002040000000000f2
    

    Represents: Jump to address 0x200. (This hex is normal as it jumps to the beginning of user program flash memory of e.g. dsPIC33FJ256GP710A)
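
The rules in this section can be condensed into a short Python sketch. The helper names are hypothetical, and `decode_goto` follows the simplified reading of the GOTO line used in this post (first two bytes of each instruction, least significant byte first).

```python
def instruction_count(byte_count):
    """Rule 1: every instruction takes 4 bytes in the hex file (3 + phantom byte)."""
    return byte_count // 4


def hex_to_device_address(hex_address):
    """Rule 3: the address in the hex file is twice the real device address."""
    return hex_address // 2


def decode_goto(data):
    """Rule 4: recover the jump target from the 8 data bytes of the GOTO line."""
    if len(data) != 8 or data[2] != 0x04 or data[6] != 0x00:
        raise ValueError("not a GOTO-RESET pair")
    low = data[0] | (data[1] << 8)     # lower 16 bits, LSB first
    high = data[4] | (data[5] << 8)    # upper bits, LSB first
    return (high << 16) | low


print(hex(decode_goto(bytes.fromhex("00a8040002000000"))))
print(hex(decode_goto(bytes.fromhex("0002040000000000"))))
```

The two calls use the data bytes of the two example lines above and should print the jump targets 0x2a800 and 0x200.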

Designing bootloader for Microchip dsPIC33E/PIC24E micro-controller (1)

When it comes to good bootloader products for Microchip’s various MCU series, my vote goes to “ds30 Loader” from Mikael Gustafsson. However, “ds30 Loader” is no longer available for free for Microchip’s latest 70MHz dsPIC33E/PIC24E products. If you are determined to develop your own bootloader for them, this article could be the reference you are looking for. First of all, let’s take a look at Mikael Gustafsson’s open-source “ds30 Loader”, and then graduate to your own version of it.

1. Bootloading ABC:
A bootloader is essentially a piece of firmware stored in the MCU’s flash. Unlike normal application firmware, the bootloader’s job is to install new incoming firmware into its own flash when necessary. Or, to put it another way: a micro-controller with a bootloader is capable of programming itself. Apart from that, it is invisible: you don’t even notice its existence, and the MCU runs exactly the same way as a chip burned by a regular programmer. As a rule of thumb, the bootloader must be as small as possible because it also eats into your flash. A great advantage of a bootloader is that you no longer need a programmer once the bootloader is burned into the flash. This also poses a very tough challenge: the bootloader must be very reliable, or at least it must not erase or destroy itself by accident.

2. Where and how to place the bootloader:
There is some discussion over where to put the bootloader. Products like “ds30 Loader” use the last user program flash memory block to store the bootloader firmware; others, like Microchip’s official “AN1094 – Bootloader for dsPIC30F/33F and PIC24F/24H Devices”, put the bootloader at the beginning of the user memory block, right after the IVT (Interrupt Vector Table, which is located in the first program block of the User Program Flash Memory). I personally prefer the “ds30 Loader” solution simply because the end user can pay zero attention to the bootloader when working on his/her own application firmware. In the other case, the firmware engineer must specially configure the IDE to avoid overwriting the first user program flash block.


Figure 1. A normal layout without a bootloader. The compiler converts your design into a binary that uses the very first available flash space in your chip. GOTO will be modified to jump to the beginning of your firmware.


Figure 2. An architecture in which the bootloader is placed at the beginning of the flash space. Technically, the bootloader is not placed at the very beginning of the flash space because “GOTO”, “RESET” and the “IVT” are already there. Usually, the bootloader is placed in the 2nd program block (erase block) of the flash. The compiler must avoid using the 2nd block, so special attention must be paid to configuring the compiler. Another disadvantage of this architecture is that you may lose the space right after the IVT, because flash is always erased in whole blocks. “GOTO”, “RESET” and the “IVT” are in the 1st block but occupy nowhere near 100% of it. A normal compiler would start using the flash space right after the IVT, but in this case the compiler has to skip the whole 1st and 2nd blocks. If you really want that little space back, it is still possible, but it requires a more complex compiler configuration.


Figure 3. An architecture with the bootloader at the end of the flash space. The compiler converts your design in its normal way, using the first available space right after the IVT, so no special compiler configuration is needed, and you don’t lose anything in the 1st program block. Of course, you must keep in mind that your application mustn’t be too long, otherwise it will overwrite the last block, where the bootloader lives. You also can’t reclaim the free space in the last block that is not used by the bootloader (most likely you will never need it).

Once the bootloader is in place, you no longer need a programmer. However, to get it there, you have to use the programmer to burn the bootloader once. Yes, only once!