How to get Endeca CAS to update

Posted by Peter Curran


This post is for Endeca developers who run into an issue with partial updates on Oracle Commerce 11.1, in an application that runs with CAS only (without Forge).

You may find that baseline updates work fine but partial updates do not.

In our case we have an extract file containing the subset of records that need to be updated. However, this file only lists a small subset of properties for each record (i.e. it only provides the properties that have actually changed).

When we perform a partial update (using the default mechanism that comes with the CAS-only deployment template), it completes successfully, but the updated records contain only the subset of fields provided in the file; all of the fields that haven’t changed are simply missing. It’s as if CAS replaced the existing record (with the full set of properties) with a new record containing only the few properties in the extract file.

For example, say one of the records looks like this:

Record 23
---------
id 23
name Test
inventoryCount 23
buyable 1
imageUrl test.jpg

and say the partial extract file has an entry like this:

Record 23
---------
id 23
inventoryCount 10

The result after a partial update is this:

Record 23
---------
id 23
inventoryCount 10
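
The difference between what we got and what we wanted can be modelled with plain Java maps. This is only an illustrative sketch of the two semantics using the Record 23 example above; the class and method names are hypothetical and nothing here calls into Endeca.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplaceVsMerge {
    /** Replace semantics: the stored record is discarded and the partial record takes its place. */
    static Map<String, String> replace(Map<String, String> existing, Map<String, String> partial) {
        return new LinkedHashMap<>(partial);
    }

    /** Update semantics: properties in the partial record overwrite, all other properties are kept. */
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> partial) {
        Map<String, String> result = new LinkedHashMap<>(existing);
        result.putAll(partial);
        return result;
    }

    public static void main(String[] args) {
        // The full record as it exists in the index
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("id", "23");
        existing.put("name", "Test");
        existing.put("inventoryCount", "23");
        existing.put("buyable", "1");
        existing.put("imageUrl", "test.jpg");

        // The entry from the partial extract file
        Map<String, String> partial = new LinkedHashMap<>();
        partial.put("id", "23");
        partial.put("inventoryCount", "10");

        System.out.println("replace: " + replace(existing, partial)); // what the default partial update did
        System.out.println("merge:   " + merge(existing, partial));   // what we wanted
    }
}
```

With replace semantics, name, buyable and imageUrl are lost; with merge semantics only inventoryCount changes.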

How can we get CAS to preserve those properties instead of removing them, as is possible with Forge?

Since there is no explicit mechanism for this, we came up with the following solution.

To summarize how it works: we customized the PartialUpdate BeanShell script so that, right after the last-mile crawl runs, it invokes a custom component we created called DGIDXTransformer (a class that extends CustomComponent). This class unzips and parses the file that the last-mile crawl creates as input for DGIDX, and writes out a modified version of that file. Specifically, it changes each RECORD_ADD_OR_REPLACE operation to RECORD_UPDATE and turns every property other than the record spec into a PVAL_DELETE followed by a PVAL_ADD, so that records are updated rather than replaced. The format of the DGIDX input file is not documented, but according to our research it is unlikely to change drastically in future versions of Endeca.

Here’s DGIDXTransformer:

package com.chairbender;

import com.endeca.soleng.eac.toolkit.component.*;

import java.io.*;
import java.nio.file.Files;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Custom component which runs during the PartialUpdate beanshell script. It transforms the DGIDX-compatible input file
 * that CAS produces so that records will be updated instead of replaced.
 *
 * Expects only one property entry called "dgidxInputFileDirectory", specifying the directory to look in to
 * find the file to transform (relative to the config directory).
 *
 * @author chairbender
 */
public class DGIDXTransformer extends CustomComponent {
    private static final String DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME = "dgidxInputFileDirectory";
    private static final String RECORD_SPEC_PROPERTY_NAME = "record.spec";

    /**
     * Does the transformation as specified in the class javadoc.
     */
    public void transformDGIDXInputFileToUpdateInsteadOfReplace() throws Exception {
        //Find the file in the directory
        Map<String, String> properties = getProperties();
        if (null == properties || !properties.containsKey(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME)) {
            throw new Exception("Missing required property: " + DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME);
        } else {
            File directory = new File(properties.get(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME));
            File[] gzipFiles = directory.listFiles(new FilenameFilter() {
                @Override
                public boolean accept(File dir, String name) {
                    return name.endsWith(".xml.gz");

                }
            });
            if (gzipFiles == null || gzipFiles.length == 0) {
                throw new Exception("No .xml.gz file found in " + directory.getAbsolutePath());
            } else {
                File gzipFile = gzipFiles[0];
                File unzippedFile = unzipFile(gzipFile);

                transformInputFile(unzippedFile, unzippedFile.getAbsolutePath().replace(".xml", "transformed.xml"));

                //delete the extra files in a way that throws an exception if deletion fails
                Files.delete(gzipFile.toPath());
                Files.delete(unzippedFile.toPath());

            }
        }

    }

    /**
     * Gzips the passed file and saves it at the specified location
     * @param toGzip file to gzip
     * @param outputPath where to output the gzipped file
     *
     */
    private void gzipFile(File toGzip, String outputPath) throws IOException {
        byte[] buffer = new byte[1024];

        //try-with-resources closes both streams even if the copy fails;
        //closing the GZIPOutputStream also finishes the compressed stream
        try (FileInputStream inputStream = new FileInputStream(toGzip);
             GZIPOutputStream gzipOutputStream =
                     new GZIPOutputStream(new FileOutputStream(outputPath, false))) {
            int len;
            while ((len = inputStream.read(buffer)) > 0) {
                gzipOutputStream.write(buffer, 0, len);
            }
        }
    }

    /**
     * Rewrites the unzipped DGIDX input file so that each record is updated rather than replaced.
     *
     * @param unzippedFile file representing DGIDX input data to transform
     * @param transformedFilePath path where transformed file should go.
     * @return the transformed file
     */
    private File transformInputFile(File unzippedFile, String transformedFilePath) throws IOException {
        File outputFile = new File(transformedFilePath);

        //The XML and the transformation are simple, so we read the unzipped file line by line
        //and write each (possibly rewritten) line straight to the output file
        try (BufferedReader unzippedFileReader = new BufferedReader(new FileReader(unzippedFile));
             BufferedWriter outputFileWriter = new BufferedWriter(new FileWriter(outputFile))) {
            String nextLine;
            while ((nextLine = unzippedFileReader.readLine()) != null) {
                if (nextLine.contains("RECORD_ADD_OR_REPLACE")) {
                    //Change RECORD_ADD_OR_REPLACE to RECORD_UPDATE
                    outputFileWriter.write(nextLine.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE"));
                } else if (nextLine.contains("<PROP NAME=")) {
                    //This line is <PROP NAME="...">; rewrite the element unless it is the record spec
                    String propertyName = nextLine.split("\"")[1];
                    if (!propertyName.equals(RECORD_SPEC_PROPERTY_NAME)) {
                        //Read the property value from the next line
                        String propertyValueLine = unzippedFileReader.readLine();
                        String propertyValue = propertyValueLine.replace("<PVAL>", "").replace("</PVAL>", "").trim();

                        //Now write the PVAL_DELETE and PVAL_ADD entries
                        outputFileWriter.write("<PVAL_DELETE><PROPERTY_NAME NAME=\"" + propertyName + "\"/></PVAL_DELETE>");
                        outputFileWriter.newLine();
                        outputFileWriter.write("<PVAL_ADD><PROP NAME=\"" + propertyName + "\"><PVAL>" + propertyValue + "</PVAL></PROP></PVAL_ADD>");

                        //Discard the closing element line of the input file
                        unzippedFileReader.readLine();
                    } else {
                        //It's the record spec, so don't transform it
                        outputFileWriter.write(nextLine);
                    }
                } else {
                    //Just output the line
                    outputFileWriter.write(nextLine);
                }
                //readLine() strips the line terminator, so restore it after each line
                outputFileWriter.newLine();
            }
        }
        return outputFile;
    }

    /**
     *
     * @param gzipFile file to un-gzip. Will create the un-gzipped version in the same directory as gzipFile,
     *                 but without the ".gz" ending.
     * @return the unzipped version of the file.
     */
    private File unzipFile(File gzipFile) throws IOException {
        File outputFile = new File(gzipFile.getAbsolutePath().replace(".gz", ""));
        byte[] buffer = new byte[1024];

        //Un-gzip the file in one pass; try-with-resources closes both streams even on failure
        try (GZIPInputStream gzipInputStream = new GZIPInputStream(new FileInputStream(gzipFile));
             FileOutputStream outputStream = new FileOutputStream(outputFile)) {
            int len;
            while ((len = gzipInputStream.read(buffer)) > 0) {
                outputStream.write(buffer, 0, len);
            }
        }

        return outputFile;
    }

}
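
To check the rewrite logic without an Endeca installation, the line handling in transformInputFile can be lifted into a small standalone class. The following is a minimal sketch under stated assumptions: the class name UpdateRewriteSketch is hypothetical, and the sample input is inferred from what the transformer looks for (one tag per line, one PVAL per PROP), not taken from a real CAS output file, since the format is undocumented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class UpdateRewriteSketch {
    // Same value as RECORD_SPEC_PROPERTY_NAME in DGIDXTransformer
    private static final String RECORD_SPEC = "record.spec";

    /** Applies the same line-based rewrite as transformInputFile, but to an in-memory string. */
    static String rewrite(String input) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("RECORD_ADD_OR_REPLACE")) {
                    // Turn the replace operation into an update operation
                    out.append(line.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE")).append('\n');
                } else if (line.contains("<PROP NAME=")) {
                    String name = line.split("\"")[1];
                    if (name.equals(RECORD_SPEC)) {
                        // The record spec identifies the record, so it is passed through untouched
                        out.append(line).append('\n');
                    } else {
                        // Assumes the value sits alone on the next line, followed by the closing tag
                        String value = reader.readLine()
                                .replace("<PVAL>", "").replace("</PVAL>", "").trim();
                        out.append("<PVAL_DELETE><PROPERTY_NAME NAME=\"").append(name)
                           .append("\"/></PVAL_DELETE>\n");
                        out.append("<PVAL_ADD><PROP NAME=\"").append(name).append("\"><PVAL>")
                           .append(value).append("</PVAL></PROP></PVAL_ADD>\n");
                        reader.readLine(); // discard the closing </PROP> line
                    }
                } else {
                    out.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            // A StringReader does not fail in practice; rethrow unchecked to keep the signature simple
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped after the tags the transformer searches for
        String sample = String.join("\n",
                "<RECORD_ADD_OR_REPLACE>",
                "<PROP NAME=\"record.spec\">",
                "<PVAL>23</PVAL>",
                "</PROP>",
                "<PROP NAME=\"inventoryCount\">",
                "<PVAL>10</PVAL>",
                "</PROP>",
                "</RECORD_ADD_OR_REPLACE>");
        System.out.print(rewrite(sample));
    }
}
```

Because the approach is line-based rather than a real XML parse, it depends on each PROP element spanning exactly three lines with a single PVAL. It is worth running a sketch like this against a copy of your own CAS output before relying on it.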

The class is compiled into a JAR, which goes in config/lib/java.

Here’s the custom component definition in DataIngest.xml:

<custom-component id="DGIDXTransformer" host-id="ITLHost" class="com.chairbender.DGIDXTransformer">
    <properties>
        <property name="dgidxInputFileDirectory" value="../data/cas_output" />
    </properties>
</custom-component>

And here’s the relevant part of the custom PartialUpdate script:

CAS.runIncrementalCasCrawl("${lastMileCrawlName}");
DGIDXTransformer.transformDGIDXInputFileToUpdateInsteadOfReplace();
CAS.archiveDvalIdMappingsForCrawlIfChanged("${lastMileCrawlName}");

Oracle may add this sort of functionality to CAS in an upcoming version. In the meantime, this solution lets your partial updates carry only the properties that changed, and gives you some valuable experience with several of Endeca’s flexible extension points.
