Just do this on the CPU using DMA and PIO.
Talking through the lookup table (lut): there are 4 pins, and each pin has its own bit-sequence table within an 8-bit parallel write. There are 20 steps per period. I process four steps in parallel, which gives 16 possible bit sequences per pin.
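A minimal sketch of how such a table could be filled. The bit layout is my assumption, not something stated above: pin p drives bit p of each 8-bit write, and byte k of a 32-bit word is step k of the four steps handled together. build_lut and combine are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (not confirmed by the post): pin p drives bit p of every
// 8-bit write, and byte k of a 32-bit word is step k of the four steps.
static uint32_t lut[4][1 << 4];

// Fill one 16-entry table per pin. Each entry spreads the four bits of the
// nibble across the four bytes of the word, at that pin's bit position.
static void build_lut() {
    for (uint32_t pin = 0; pin < 4; ++pin) {
        for (uint32_t nib = 0; nib < 16; ++nib) {
            uint32_t word = 0;
            for (uint32_t step = 0; step < 4; ++step) {
                if (nib & (1u << step))
                    word |= (1u << pin) << (8 * step);
            }
            lut[pin][nib] = word;
        }
    }
}

// OR one entry per pin together to get four 8-bit writes in a single word.
static uint32_t combine(uint8_t n0, uint8_t n1, uint8_t n2, uint8_t n3) {
    assert(n0 < 16 && n1 < 16 && n2 < 16 && n3 < 16);
    return lut[0][n0] | lut[1][n1] | lut[2][n2] | lut[3][n3];
}
```

With this layout, after build_lut() the call combine(0xF, 0x3, 0x1, 0x0) gives 0x01010307: pin 0 is high for all four steps, pin 1 for the first two, pin 2 for the first only, and pin 3 stays low.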
Remember to inline get_table so the compiler can optimize the ordering. It should be mostly pure (reading lut breaks this), so we do not care about the scope violations.
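For illustration, one possible shape for get_table. This is a guess at its behaviour, not the real implementation: I am assuming nib is a bit offset (0, 4, ... 16) into the 20-bit value and that the function only reads the global table, which is the one thing that keeps it from being pure. It returns a plain uint32_t pointer here rather than the SIMD wrapper type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the real table; assumed to be filled at start-up.
static uint32_t lut[4][1 << 4];

// Assumed behaviour: take the four bits of `value` starting at bit `nib`
// and return a pointer to the matching table entry for `pin`.
// No side effects; the only impurity is reading the global lut.
static inline const uint32_t *get_table(uint32_t value, uint8_t pin, uint8_t nib) {
    assert(pin < 4 && nib <= 16);
    return &lut[pin][(value >> nib) & 0xF];
}
```

Because the body is a shift, a mask and an address calculation, an inlined call leaves the compiler free to interleave the four lookups with the surrounding ORs.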
Code:
typedef uint32_t custom_t[5]; // We do four step in parallel using SIMD bit operation on int.
uint32_t lut[4][1 << 4];      // Four bits in four bits out.

template <typename T>
inline void PWM<T>::build_period(custom_t *result, uint32_t v0, uint32_t v1, uint32_t v2, uint32_t v3) {
    if (result != nullptr) {
        for (uint8_t nib = 0; nib < 20; nib += 4) {
            SIMD::SIMD_QUARTER<T> *c[4] = {
                get_table(v0 % (1 << 20), 0, nib),
                get_table(v1 % (1 << 20), 1, nib),
                get_table(v2 % (1 << 20), 2, nib),
                get_table(v3 % (1 << 20), 3, nib)
            };

            for (uint32_t i = 0; i < 5; i++) {
                // Superscalar Operation (forgive the loads)
                SIMD::SIMD_QUARTER<T> p = *c[0] | *c[1] | *c[2] | *c[3];
                (*result)[i] = p.v;

                // Superscalar operation
                for (uint32_t j = 0; j < 4; j++)
                    ++c[j];
            }
        }
    }
}

template class PWM<uint32_t>;
Statistics: Posted by dthacher — Wed Nov 06, 2024 9:43 am