gcell solves the problem of efficiently distributing potentially small tasks across the SPEs by using a distributed, SPE-centric scheduler that pulls work to SPEs as they become available. In addition, gcell provides high-performance DMA of arguments to and from the SPEs, task completion notification to client processes, and binding and rendezvous between PPE and SPE code. Benchmarks show near-linear speed-up from 1 to 16 SPEs on 2-way Cell blades.
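The pull model described above can be illustrated with a minimal sketch. This is not gcell's implementation or API: it uses ordinary Python threads in place of SPEs and a shared queue in place of the job queue, and the name `run_pull_scheduler` is hypothetical. It only shows the idea that idle workers fetch the next task themselves, rather than having a central dispatcher assign tasks to them.

```python
import queue
import threading


def run_pull_scheduler(tasks, n_workers):
    """Run each callable in `tasks` on one of `n_workers` threads.

    Hypothetical illustration only: threads stand in for SPEs, and each
    worker pulls its next task from a shared queue as soon as it is idle.
    """
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))

    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                index, task = work.get_nowait()  # pull work when idle
            except queue.Empty:
                return  # nothing left to do; this worker exits
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    jobs = [lambda n=n: n * n for n in range(8)]
    print(run_pull_scheduler(jobs, n_workers=3))
```

Because workers take tasks only when they are free, a worker that happens to get short tasks simply pulls more of them, which is why this style of scheduling suits many small jobs of uneven length.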

The Interface

The primary interface is given in

The benchmark code is


Here's a snapshot of our current scaling performance. The test code submits 500k jobs and times how long all of them take to complete as a function of the number of SPEs used. Several runs are made, varying how much work each job does. For the purposes of the benchmark, each job busy-waits on the SPE for a specified number of microseconds, called the 'work increment'. Thus, if the work increment is 10 us, the total useful_work is 10 us * 500k jobs = 5 seconds. The Y-axis plots useful_work divided by the real time required to complete all jobs, which is effectively the speedup.
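The arithmetic behind the Y-axis can be written out as a short sketch. The function names are hypothetical and are not taken from the benchmark code, and the elapsed time in the usage example is an invented number chosen to make the division easy, not a measured result.

```python
def useful_work_seconds(n_jobs, work_increment_us):
    """Total busy-wait time across all jobs, in seconds."""
    return n_jobs * work_increment_us / 1_000_000


def speedup(n_jobs, work_increment_us, elapsed_seconds):
    """useful_work divided by the real time taken to complete all jobs."""
    return useful_work_seconds(n_jobs, work_increment_us) / elapsed_seconds


if __name__ == "__main__":
    # 500k jobs with a 10 us work increment, as in the text above.
    print(useful_work_seconds(500_000, 10))  # 5.0

    # Hypothetical elapsed time of 0.625 s (illustrative, not measured).
    print(speedup(500_000, 10, 0.625))  # 8.0
```

With perfect scaling, the speedup would equal the number of SPEs in use; scheduling and DMA overhead push it below that, and the effect is larger when the work increment is small.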

QS21 Performance:

PS3 Performance:

R-10231-ps3-20090115-0226.png - PS3 graph (25.2 KB) Eric Blossom, 01/15/2009 10:42 AM
R-10231-qs21-20090115-0247.png - QS21 graph (32 KB) Eric Blossom, 01/15/2009 11:06 AM
