Garlic Software


Reducing power usage on an AVR ATtiny4

This is an article about how I reduced the battery power usage of the cool iCufflinks made by Adafruit Industries.

Background

The iCufflinks use an Atmel ATtiny4 microcontroller (MCU) as the brains controlling the LED lighting pattern. The MCU is an 8-bit processor with 32 bytes of SRAM, only a handful of registers, and 512 bytes of flash for program storage. The stack lives in the SRAM, so you don’t really get to use that memory for much else.

The original hardware design and software are all open source and can be found on the Adafruit GitHub. The design runs on CR1220 batteries, and the recommendation is to change them after 24 hours of use. That got me thinking that I could improve this product by increasing the time between battery changes.

I had also never read or written assembly code for an AVR processor, and the last time I looked at assembly was probably 386 code about 20 years ago, so excuse any minor assembly style issues. I was tempted to rewrite the code in C, but with the limited flash space I had to rule this out. Had this been an ATtiny9 with 1k bytes of flash I would have gone that route. The small overhead that AVR Studio introduces was just a tiny bit too much for this limited memory space.

Baseline

I needed to measure the baseline power usage of the circuit so I could see what my changes were doing. The problem was that I didn’t have any iCufflinks nor any ATtiny4 devices. So I had to order some components from Digi-Key. I also ordered the same LED model that was used in the cufflinks (this was really more of a guess based on the schematics). Once the order arrived I set to building my test circuit shown in the picture below. The version 1.0 code is running on the left and my modified code is running on the right.

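[Photo IMG_0538: the test circuit, with the version 1.0 code on the left and the modified code on the right]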


Now that the circuit was built, I measured the current draw to get my baseline usage. Note that I only measure one of the two little circuits at a time. In the rest of this article I will mainly use the Average reading when comparing results.

                Max         Average     Min
Whole Circuit   1.923 mA    0.848 mA    0.458 mA
MCU Only        465.29 μA   452.53 μA   442.65 μA

I’m making all these measurements with a Fluke 289 multimeter using the mA and μA current settings. Power comes from a benchtop supply at 3 V.

Sleep mode

The first thing I noticed about the code was that the processor was active all the time, constantly counting in a loop whose only purpose was to introduce a delay.
Here is the delay loop, which is intended to produce a 17 ms delay.

   ; delay!
   ldi delayms, DELAYTIME ; delay ~17 ms
DELAY:
   ldi delaycnt1, 0xFF
   DELAY1MS: ; this loop takes about 1ms (with 1 MHz clock)
      dec delaycnt1 ; 1 clock
      cpi delaycnt1, 0 ; 1 clock
      brne DELAY1MS ; 2 clocks (on avg)
   dec delayms
   cpi delayms, 0
   brne DELAY
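
To see why this loop comes out at roughly 17 ms, its clock cycles can be counted by hand. The Python sketch below does that arithmetic under two assumptions that are not stated in the snippet itself: DELAYTIME is 17 (inferred from the "~17 ms" comment), and brne costs 2 clocks when the branch is taken and 1 when it falls through.

```python
# Cycle count for the busy-wait delay loop, done as plain arithmetic.
# Assumptions (not from the original source): DELAYTIME = 17, and
# brne takes 2 clocks when the branch is taken, 1 when it is not.

F_CPU_HZ = 1_000_000   # 1 MHz clock, as noted in the loop comment
DELAYTIME = 17         # assumed value of the DELAYTIME constant


def inner_loop_cycles(start=0xFF):
    """Clocks spent in DELAY1MS when delaycnt1 starts at `start`."""
    taken = start - 1                          # passes where brne jumps back
    return taken * (1 + 1 + 2) + (1 + 1 + 1)   # the last pass falls through


def delay_cycles(delaytime=DELAYTIME):
    """Clocks for the whole delay, including the outer loop overhead."""
    # ldi delaycnt1 + inner loop + dec + cpi + brne (taken)
    per_outer = 1 + inner_loop_cycles() + 1 + 1 + 2
    # one ldi delayms up front; the very last brne is 1 clock cheaper
    return 1 + delaytime * per_outer - 1


total = delay_cycles()
print(f"{total} clocks = {total / F_CPU_HZ * 1000:.2f} ms")
# prints "17408 clocks = 17.41 ms"
```

Every one of those clocks is spent with the CPU fully awake, which is what makes the loop expensive in power terms.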


As far as power goes, this is really inefficient: the processor just sits there burning power while it counts. It would be much better to have the processor sleep for those 17 ms, so I introduced the Idle sleep mode to the code. Since we want to keep the PWM running constantly to drive the LED, we can’t shut the chip down completely during sleep. We need to keep CLKIO active, but we can shut down CLKCPU. As it turns out, Idle is the only sleep mode that leaves CLKIO running. Putting this code snippet in your reset vector enables sleep; the mode defaults to Idle.

   ; enable sleep mode
   ldi temp, (1<<SE) ; by default the mode is 000 Idle
   out SMCR, temp

The other issue is how to wake up from sleep so we can continue the work. The PWM is using our one timer/counter, so we can’t use that, and there is no external component to trigger the interrupt pin INT0. So the only option is to repurpose the watchdog timer to generate an interrupt. By default the watchdog timer is set for 2k cycles at 128 kHz, which is about 16 ms. That is close enough to the 17 ms of the original code, so I’m just going to use the default value. Putting the following code snippet in your reset vector sets up the watchdog timer in interrupt mode and enables interrupts.

   ; setup watchdog
   ldi temp, 0xD8 ; write signature
   out CCP, temp
   ldi temp, (0<<WDE)|(1<<WDIE) ; set watchdog in interrupt mode
   out WDTCSR, temp

   sei ; enable global interrupts
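
As a quick check of the "about 16 ms" figure, the watchdog timeout is just the cycle count divided by the oscillator frequency. The sketch below assumes that a WDP prescaler value n selects 2048 << n cycles, which matches the 2k and 4k settings used in this article; the function names are made up for illustration.

```python
# Watchdog timeout as a function of the WDP prescaler setting.
# Assumption: WDP value n selects 2048 << n oscillator cycles,
# which matches the 2k (default) and 4k settings used in this article.

WDT_CLOCK_HZ = 128_000   # nominal watchdog oscillator frequency


def wdt_cycles(wdp):
    """Number of watchdog oscillator cycles before timeout."""
    return 2048 << wdp


def wdt_timeout_ms(wdp):
    """Nominal timeout in milliseconds for a given WDP value."""
    return wdt_cycles(wdp) * 1000 / WDT_CLOCK_HZ


print(wdt_timeout_ms(0))   # default setting, 2k cycles -> 16.0
```

The 128 kHz figure is nominal, so the real timeout on a given chip will drift a little from these values.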

Now that sleep mode is enabled and the watchdog is all set up, the delay loop can be replaced with a simple reset of the watchdog timer and a sleep call. You also need to add the watchdog interrupt vector and the interrupt handler; see where WDT is defined and used in the code.

   ; reset the watchdog timer to full value and sleep until it pops an interrupt
   wdr
   sleep

The power savings from adding sleep mode alone are pretty significant.

            Baseline     Modified
MCU Only    452.53 μA    191.31 μA

Saves about 261 μA.

Enable pull-ups

Next up is a little trick I learned while reading an application note on picoPower (see the link in the references section at the end). PicoPower is one of Atmel’s power saving technologies found in some of the newer ATtiny devices (not the ATtiny4). It turns out that the chip wastes power switching on I/O pins if the pins are floating and don’t have a pull-up resistor on them. This is also mentioned in the data sheet. The easiest fix is to enable the built-in pull-ups on unused port pins. Since we are using PB0 for the PWM we won’t touch that one, but everything else is unused and can be fixed. The following code snippet in your reset vector is all you need.

   ; setting all pullups on unused pins (for power savings)
   ldi temp, (1<<PUEB3)|(1<<PUEB2)|(1<<PUEB1)
   out PUEB, temp
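
The value written to PUEB is simply a bit mask with a 1 for every pin that is not in use. The sketch below rebuilds that mask, assuming port B has four pins (PB0 to PB3) and that the PUEBn bit number equals the pin number n; the helper name is hypothetical.

```python
# Building the PUEB value for "pull-ups on every pin except the ones in use".
# Assumption: port B on this part has four pins, PB0..PB3, and the
# PUEBn bit number equals the pin number n.

PORTB_PINS = range(4)      # PB0..PB3
USED_PINS = {0}            # PB0 drives the LED via PWM


def pullup_mask(used=USED_PINS, pins=PORTB_PINS):
    """Bit mask with a 1 for every pin that is not in `used`."""
    mask = 0
    for pin in pins:
        if pin not in used:
            mask |= 1 << pin
    return mask


print(f"0x{pullup_mask():02X}")   # 0x0E, same as (1<<PUEB3)|(1<<PUEB2)|(1<<PUEB1)
```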

            Baseline     Modified
MCU Only    452.53 μA    176.11 μA

Saves about another 15 μA compared to the previous step.

Slow the clock down

The ATtiny4 can run at up to 8 MHz with its built-in oscillator. From the factory it is set with a clock prescale of 8, so it is really running at 1 MHz. The main thing this code does is load a value from memory and set the PWM to this value. This does not need to run at 1 MHz; we can slow it down to the slowest possible setting and it will still be plenty fast for our purposes. The largest clock division factor available is 256, which results in the clock running at about 31 kHz (8 MHz / 256 = 31.25 kHz). So again, adding this little code snippet in the reset vector will slow the clock down.

   ; changing clock prescale to slow down the processing power (for power savings)
   ldi temp, 0xD8 ; write signature
   out CCP, temp
   ldi temp, (1<<CLKPS3)|(0<<CLKPS2)|(0<<CLKPS1)|(0<<CLKPS0) ; scale to divide by 256
   out CLKPSR, temp
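
The resulting clock frequency is easy to work out from the prescaler setting. The sketch below assumes that CLKPS values 0 through 8 select a division factor of 2**CLKPS, so 0b0011 is the factory divide-by-8 and 0b1000 is the divide-by-256 written by the snippet; the function name is made up for illustration.

```python
# System clock after the prescaler.
# Assumption: CLKPS values 0..8 select a division factor of 2**CLKPS
# (so 0b0011 is the factory divide-by-8 and 0b1000 is divide-by-256).

OSC_HZ = 8_000_000   # internal oscillator


def system_clock_hz(clkps):
    """Clock frequency for a given CLKPS3..0 value."""
    if not 0 <= clkps <= 8:
        raise ValueError("CLKPS values above 0b1000 are reserved")
    return OSC_HZ // (1 << clkps)


print(system_clock_hz(0b0011))   # factory default -> 1000000
print(system_clock_hz(0b1000))   # divide by 256   -> 31250
```

Even at 31.25 kHz the CPU still gets several hundred clocks per watchdog wake-up, which is plenty for loading one table value and updating the PWM.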

            Baseline     Modified
MCU Only    452.53 μA    139.76 μA

Saves about another 36 μA compared to the previous step.

Shrink the code

Looking at the compile time statistics you can see that the largest part of the compiled app is a huge block of PWM values. This is the original code size before I changed anything.

   ATtiny4 memory use summary [bytes]:
   Segment   Begin    End      Code   Data   Used    Size   Use%
   ---------------------------------------------------------------
   [.cseg] 0x000000 0x00015e     50    300    350     512  68.4%

The program memory space in the chip is only 512 bytes and this block was taking up 300 bytes. This is what it looks like:

PULSETAB:
.db 255, 255, 255, 255, 255, 255, 255, 255, 252, 247, 235, 235, 230, 225, 218, 213, 208, 206, 199, 189, 187, 182, 182, 177, 175, 168, 165, 163, 158, 148, 146, 144, 144, 141, 139, 136, 134, 127, 122, 120, 117, 115, 112, 112, 110, 110, 108, 103, 96, 96, 93, 91, 88, 88, 88, 88, 84, 79, 76, 74, 74, 72, 72, 72, 72, 69, 69, 62, 60, 60, 57, 57, 57, 55, 55, 55, 55, 48, 48, 45, 45, 43, 43, 40, 40, 40, 40, 36, 36, 36, 33, 33, 31, 31, 31, 28, 28, 26, 26, 26, 26, 24, 24, 21, 21, 21, 21, 20, 19, 19, 16, 16, 16, 16, 14, 14, 14, 16, 12, 12, 12, 12, 12, 9, 9, 9, 9, 9, 9, 7, 7, 7, 7, 7, 7, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7, 7, 9, 9, 9, 12, 12, 12, 14, 14, 16, 16, 16, 16, 21, 21, 21, 21, 24, 24, 26, 28, 28, 28, 31, 36, 33, 36, 36, 40, 40, 43, 43, 45, 48, 52, 55, 55, 55, 57, 62, 62, 64, 67, 72, 74, 79, 81, 86, 86, 86, 88, 93, 96, 98, 100, 112, 115, 117, 124, 127, 129, 129, 136, 141, 144, 148, 160, 165, 170, 175, 184, 189, 194, 199, 208, 213, 220, 237, 244, 252, 255, 255, 255, 255, 255, 255, 255, 0

I looked at that data and saw so many duplicate values right next to each other that I thought I could at least halve the data and still retain the same visual look of the LED. The big 300-byte block of PWM values is a bit redundant. So I chopped the data in half, collapsing each pair of adjacent values into a single rounded value.

PULSETAB:
.db 255, 255, 255, 255, 250, 235, 228, 216, 207, 194, 185, 180, 171, 164, 153, 145, 143, 138, 131, 121, 116, 112, 110, 106, 96, 92, 88, 88, 82, 75, 73, 72, 71, 66, 60, 57, 56, 55, 52, 47, 44, 42, 40, 38, 36, 33, 31, 30, 27, 26, 25, 23, 21, 20, 19, 16, 16, 14, 14, 12, 12, 11, 9, 9, 8, 7, 7, 6, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 5, 7, 7, 8, 9, 12, 13, 15, 16, 18, 21, 22, 25, 28, 30, 33, 36, 40, 43, 46, 53, 55, 59, 63, 69, 76, 83, 86, 90, 97, 106, 116, 125, 129, 138, 146, 162, 172, 186, 196, 210, 228, 248, 255, 255, 255, 0
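
One simple way to do this kind of reduction is to average each adjacent pair of values. The sketch below is only an illustration of that idea, not the exact procedure used for the table above: it assumes floored integer averages, whereas the published table rounds some pairs differently. The function name is hypothetical.

```python
# One way to halve a brightness table: average each adjacent pair.
# Assumption: integer (floor) division is used for the average; the
# table actually published above rounds some pairs differently.


def halve_table(values):
    """Collapse each adjacent pair of values into its floored average."""
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    return [(a + b) // 2 for a, b in zip(values[0::2], values[1::2])]


# First few values of the original PULSETAB, as a small demonstration.
sample = [255, 255, 252, 247, 235, 235, 230, 225]
print(halve_table(sample))   # [255, 249, 235, 227]
```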

The table shrank nicely, and even with my other changes the code itself only grew by 18 bytes.

   ATtiny4 memory use summary [bytes]:
   Segment   Begin    End      Code   Data   Used    Size   Use%
   ---------------------------------------------------------------
   [.cseg] 0x000000 0x0000e8     68    150    218     512  42.6%

Not only does this save 150 bytes of precious space, but there is one more benefit. Since I had changed the code to sleep between changes of PWM values, I now needed to double the sleep time because I had half the data. This turns out to be really easy: just change the watchdog timer to run twice as long. It was set to 2k cycles (about 16 ms), and this change moves it to 4k cycles (about 32 ms). This snippet replaces the previous watchdog setup.

   ; setup watchdog
   ldi temp, 0xD8 ; write signature
   out CCP, temp
   ldi temp, (0<<WDE)|(1<<WDIE)|(1<<WDP0) ; set watchdog in interrupt mode and 4k cycles
   out WDTCSR, temp
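
Doubling the timeout keeps the overall pulse the same length, which a little arithmetic confirms. The sketch below assumes one table entry is consumed per watchdog wake-up and takes the entry counts from the Data column of the memory summaries (300 and 150 bytes); the function name is made up for illustration.

```python
# Length of one full pass through the table, before and after halving.
# Assumptions: one table entry per watchdog wake-up, entry counts taken
# from the Data column of the memory summaries (300 and 150 bytes).


def pattern_seconds(entries, step_ms):
    """Time to step through `entries` values at `step_ms` per value."""
    return entries * step_ms / 1000


before = pattern_seconds(300, 16)   # full table, 2k-cycle watchdog
after = pattern_seconds(150, 32)    # halved table, 4k-cycle watchdog
print(before, after)                # 4.8 4.8
```

The processor also wakes up half as often for the same pattern, which is where the small remaining saving comes from.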

This code change doesn’t really save much power, as we were already spending most of the time sleeping, but there is a tiny improvement.


            Baseline     Modified
MCU Only    452.53 μA    138.01 μA

Saves about another 2 μA compared to the previous step.

Results

Here are the final results of these changes to improve battery life.


                Baseline     Modified
Whole Circuit   0.848 mA     0.533 mA
MCU Only        452.53 μA    138.01 μA

Saves about 315 μA in total.
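
For reference, the per-step savings quoted throughout the article can be recomputed from the "MCU Only" average readings. The sketch below uses only the measurements given in this article; the step labels and function name are shorthand introduced here.

```python
# Step-by-step savings from the "MCU Only" average readings in this article.

readings_ua = [
    ("baseline", 452.53),
    ("sleep mode", 191.31),
    ("pull-ups", 176.11),
    ("slower clock", 139.76),
    ("smaller table", 138.01),
]


def step_savings(readings):
    """Return (step name, microamps saved versus the previous step)."""
    return [
        (name, round(prev - cur, 2))
        for (_, prev), (name, cur) in zip(readings, readings[1:])
    ]


for name, saved in step_savings(readings_ua):
    print(f"{name}: {saved} uA")

total_saved = round(readings_ua[0][1] - readings_ua[-1][1], 2)
print(f"total: {total_saved} uA")
```

Sleep mode accounts for the bulk of the saving; the other three changes together contribute a little over 50 μA.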

As you can see from the data, there is a draw of about 395 μA from the LED and resistor that make up the rest of the circuit. With the current hardware there is nothing I can do about this draw.

The overall effect on the product is that the 24 hours between battery changes can be upped to about 38 hours. That is a pretty good power saving for the day.
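
The 38 hour figure is consistent with a simple proportional estimate. The sketch below assumes the usable battery capacity is the same at both current levels, so run time scales inversely with the whole-circuit average current; real coin cells will not behave quite this ideally, and the function name is made up for illustration.

```python
# Battery life estimate from the whole-circuit average current.
# Assumption: the usable battery capacity is the same at both current
# levels, so run time is inversely proportional to average current.

BASELINE_HOURS = 24    # recommended change interval for the original
BASELINE_MA = 0.848    # whole circuit, original code
MODIFIED_MA = 0.533    # whole circuit, modified code


def estimated_hours(old_hours, old_ma, new_ma):
    """Scale run time by the ratio of old to new average current."""
    return old_hours * old_ma / new_ma


print(round(estimated_hours(BASELINE_HOURS, BASELINE_MA, MODIFIED_MA), 1))
# prints 38.2
```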

Resources

My code on GitHub

References

Atmel AVR ATtiny4 Data Sheet
Atmel Application Note AVR4013: picoPower Basics
Atmel AVR Studio 5
Fluke 289 Multimeter