Posted by Peter Curran
If you are an Endeca software engineer working on projects using CAS manipulators, make sure they are thread-safe.
What does this mean and why is it important?
Unlike Forge adapters, which are single-threaded, CAS manipulators are multi-threaded. If your manipulator's process(record) method modifies a shared instance field (a HashMap, in our case), you must control concurrent access to it (we switched to a ConcurrentHashMap). CAS calls process(record) from multiple threads against a single record store in order to split the record-processing workload across threads. Our manipulator stored records in the HashMap during process() and then processed and output them in onInputClose(), so multiple threads were accessing the HashMap at the same time.
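To make the pattern concrete, here is a minimal sketch of the idea (the class and method names are illustrative, not the real CAS SDK interface): a manipulator-like object whose process() is invoked from many worker threads, accumulating records into a shared map that is read back once the input is closed. Because the map is a ConcurrentHashMap, no updates are lost even under heavy concurrency.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a CAS manipulator; process() may be called
// from many threads, so the shared map must be thread-safe.
public class ThreadSafeManipulatorSketch {
    // Shared across all worker threads -> ConcurrentHashMap, not HashMap.
    private final Map<String, String> recordsById = new ConcurrentHashMap<>();

    // Called concurrently, once per record (records are plain Strings here).
    public void process(String recordId, String payload) {
        recordsById.put(recordId, payload);
    }

    // Called once after all input is consumed; safe to read the map here.
    public int onInputClose() {
        return recordsById.size();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadSafeManipulatorSketch m = new ThreadSafeManipulatorSketch();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        int total = 20_000; // past the ~10,000-record threshold noted above
        for (int i = 0; i < total; i++) {
            final int id = i;
            pool.submit(() -> m.process("rec-" + id, "payload-" + id));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(m.onInputClose());
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same run can lose updates or corrupt the map's internal structure, which is the kind of silent breakage that showed up for us as a hung crawl.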
Failing to control access to this shared resource caused our CAS crawl to lock up. It would spin for hours on end with no progress, and when we tried to abort the crawl it could not even exit cleanly until we killed the CAS process from the command line. The logs contained no useful information either, because the crawl never failed gracefully.

To isolate the issue, we set the crawl's thread count to 1 in Endeca Workbench (Data Sources->your-crawl-name->Advanced Settings). The crawl worked as expected on one thread and broke on multiple threads, which pinned the problem on concurrency. Just FYI: setting the thread count to 1 is not a long-term solution, but it is a handy way to test whether a complicated manipulator is running into concurrency issues.

We hope this helps other developers who run into similar behavior, especially with larger record sets. For reference, we observed that crawls with fewer than 10,000 records use a single thread, so concurrency issues won't manifest. In crawls with over 10,000 records, multi-threading kicks in and non-thread-safe manipulators can cause failures or lockups.