My current PhD project is designing a hardware-based search accelerator. Essentially, it will be like a graphics/audio co-processor that accelerates search algorithms instead. The surprising thing I encountered when I first started on this project more than a year ago was how little prior art and literature there is on the matter. There is a lot of work on hardware engines that speed up individual pieces that contribute to search, but there didn’t seem to be a comprehensive look at a new architecture for speeding up search algorithms as a whole.
So, I thought, why not? We all know that search is an integral part of modern computing. It’s so integral that it is working its way downstream, from back-end enterprise servers onto the user desktop. Everyone can and will benefit from search acceleration. Therefore, it makes sense to try to design a processor architecture with search in mind. There are several characteristics of search workloads that modern microprocessor architectures don’t handle well.
So, what are the major bottlenecks of running a search algorithm on a general-purpose processor? Please keep in mind that I’m not a CompSci student, so my knowledge in these matters is limited. However, I started my approach by looking at algorithms and data structures, and after reading up on these topics I found some angles to attack the problem from.
- Processor Architecture
General-purpose processors (GPPs) aren’t designed to run any particular algorithm quickly. This isn’t much of a discovery, as everyone knows the GPP isn’t optimised for specific applications but is made to be a jack of all trades. So, we can attack this problem by stripping the processor architecture down and specialising it for the operations that search algorithms actually perform.
- Memory Architecture
Search algorithms are bound by N, the number of records that need to be examined. Due to limitations in memory technology, getting at these records can be expensive, and the standard method for speeding up memory access is caching. However, present-day caches exploit temporal and spatial locality of reference. Search algorithms exhibit little temporal locality: once a record has been examined, it is rarely needed again. So, we can attack this problem by designing a new cache architecture that takes structural locality into account.
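To illustrate the access pattern I mean (this is just an illustration, not part of the accelerator design; `record_t` and `scan_search` are names I’ve made up for the sketch), here is a plain C linear scan. Each record is read exactly once per query, so a temporal-locality cache gains nothing from it; only the sequential layout, which a hardware prefetcher can stream, helps.

```c
#include <stddef.h>

/* A toy fixed-size record: a key to match and a payload to return. */
typedef struct {
    int key;
    int payload;
} record_t;

/* Linear scan over n records. Every record is touched exactly once
 * and then never revisited, so there is no temporal locality for a
 * conventional cache to exploit. The sequential walk does have
 * spatial locality, which prefetchers can stream. */
int scan_search(const record_t *recs, size_t n, int key) {
    for (size_t i = 0; i < n; i++) {
        if (recs[i].key == key)
            return recs[i].payload;  /* hit: record used once, done */
    }
    return -1;  /* sentinel for "not found" in this sketch */
}
```

A structure-aware cache, as I’m proposing, would ideally recognise this one-touch pattern and avoid evicting useful data to hold records that will never be read again.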
- Search Operations
Search operations perform some sort of comparison against a key and then act when the comparison criterion is met. On GPPs, these are typically implemented as conditional branch instructions. Even with the advances made in branch prediction, branching is still expensive, whether measured in cycles lost to mispredictions or in the transistors spent on prediction hardware. So, we can attack this problem by designing an architecture that reduces branches and makes branching cheap.
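A well-known software analogue of “making branching cheap” is branchless binary search: the comparison result steers a pointer arithmetically, so a compiler can emit a conditional move instead of a conditional jump. A sketch in C (my own illustrative code, not the project’s design; it assumes a sorted array, and if the key exceeds every element it returns the last index rather than n):

```c
#include <stddef.h>

/* Branch-free lower bound: returns the index of the first element
 * not less than x. The data-dependent comparison feeds an arithmetic
 * pointer update, which compilers typically lower to a conditional
 * move, so the loop carries no hard-to-predict branch.
 * Caveat of this sketch: if x is greater than every element, the
 * last index (n - 1) is returned, not n. */
size_t lower_bound(const int *a, size_t n, int x) {
    const int *base = a;
    size_t len = n;
    while (len > 1) {
        size_t half = len / 2;
        /* advance by half iff the midpoint's predecessor is < x */
        base += (base[half - 1] < x) ? half : 0;
        len -= half;
    }
    return (size_t)(base - a);
}
```

The only control flow left is the loop bound, which is perfectly predictable; the data-dependent decision has been turned into arithmetic. A search-oriented ISA could make this style of predicated update a first-class, cheap operation.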
So, hopefully, by attacking the problem from these different angles, I will be able to design a processor architecture suited to speeding up search algorithms. I’m not sure how much of a speed-up I can hope to obtain, but I’m aiming for more than double the search performance of a standard processor architecture. As with all my other processor designs, I plan to keep this one elegantly simple, small and fast. That is proving to be the hard part.