Analyzing Data

Overview

This article explains how to configure a SPADE deployment so that it can run an analysis on any files it places in the local warehouse.

Adding a Registration

Whether data is analyzed is determined by its registration. The following commands create a file containing a suitable registration.

mkdir -p ${HOME}/spade.zero/spade/registrations/local
cat > ${HOME}/spade.zero/spade/registrations/local/analyze_data.3.xml << EOF
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<registration>
    <local_id>analyze.data.3</local_id>
    <drop_box>
        <location>
            <directory>data/spade/dropbox</directory>
        </location>
        <pattern>.*.ayz</pattern>
        <mapping>gov.lbl.nest.spade.tour.AnalysisLocator</mapping>
    </drop_box>
    <analyze>true</analyze>
</registration>
EOF
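
The pattern element is a regular expression, so the unescaped dots in `.*.ayz` match any character, not only a literal period. The sketch below uses `grep -Ex` as a stand-in for whole-name matching. This is an assumption: SPADE's own matching is not shown in this article, and the helper name `matches_pattern` is hypothetical.

```shell
# Hypothetical helper: report whether a file name matches the drop box
# pattern when the pattern must cover the whole name (grep -x).
matches_pattern() {
    if printf '%s\n' "$1" | grep -Eqx '.*.ayz'; then
        echo "match"
    else
        echo "no match"
    fi
}

matches_pattern tour.3.ayz    # semaphore file used later in this article
matches_pattern tour.3.data   # data file, does not end in "ayz"
matches_pattern tour3xayz     # also matches, since "." accepts any character
```

If only names ending in a literal `.ayz` should be picked up, the last dot could be escaped, though whether that matters depends on what else lands in the drop box.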

Updating the SPADE Configuration

Analysis of data is done by the analysis task. Therefore, for SPADE to analyze data as it is put in the warehouse, you need to add a suitable element to the spade.xml file. In this case, the following element can be added to the assembly element immediately after its name element.

<activity>
    <name>analyzer</name>
    <init-param>
        <param-name>command</param-name>
        <param-value><![CDATA[/opt/jboss/data/analysis/scripts/simple.sh {0} {2}]]></param-value>
    </init-param>
    <init-param>
        <param-name>output</param-name>
        <param-value>/opt/jboss/data/external/spade.zero/log</param-value>
    </init-param>
</activity>
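
The command parameter passes two arguments to the script. The sketch below is a hypothetical stand-in for the kind of work such a script might do, not the actual simple.sh. It assumes the first argument is the bundle name and the second is the file's location; the function name `analyze_bundle` and the example values are made up for illustration.

```shell
# Hypothetical analysis step: print the two values it was given.
# Assumption: $1 is the bundle name, $2 is the file location.
analyze_bundle() {
    if [ "$#" -ne 2 ]; then
        echo "usage: analyze_bundle <bundle> <location>" >&2
        return 1
    fi
    echo "bundle: $1"
    echo "location: $2"
}

# Example call with made-up values
analyze_bundle tour.3 /tmp/example/tour.3.data
```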

You are now ready to redeploy SPADE:

docker exec -it tour_spade \
    cp wars/spade-${SPADE_VERSION}.war \
    /opt/wildfly/standalone/deployments/spade.war
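
To check whether the redeployment succeeded, one option is to look at the marker files that WildFly's deployment scanner writes next to the archive (`.deployed` on success, `.failed` on failure). The helper below is a minimal sketch that assumes the default scanner settings; the name `deployment_state` is hypothetical.

```shell
# Hypothetical helper: report the state of a deployment from the
# scanner's marker files in the given deployments directory.
deployment_state() {
    dir="$1"
    name="$2"
    if [ -e "${dir}/${name}.deployed" ]; then
        echo "deployed"
    elif [ -e "${dir}/${name}.failed" ]; then
        echo "failed"
    else
        echo "pending"
    fi
}
```

Inside the container it would be called as `deployment_state /opt/wildfly/standalone/deployments spade.war`.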

Creating Log File Area

As you can see from the activity declaration above, SPADE expects to write the log files to the specified directory. Therefore, before continuing, you should create the appropriate directory with the following command.

mkdir -p ${HOME}/spade/shared/spade.zero/log

Creating Data and Semaphore Files

You can now send a data file to the local warehouse and have it analyzed with the following:

cat > ${HOME}/spade.zero/dropbox/tour.3.data << EOF
This data file should be analyzed after it is placed in the local warehouse
EOF
touch ${HOME}/spade.zero/dropbox/tour.3.ayz

docker exec -it tour_spade bash -l -c "spade-cli local_scan"

Seeing the Results of Analysis

The following command will show the files generated by the analysis.

find ${HOME}/spade/shared/spade.zero/log/ -name "*tour.3.*"
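
If the analysis has not finished when `find` is run, the command prints nothing. The sketch below retries the search once per second until a matching file appears. It assumes the analysis may finish some time after the scan is triggered; the helper name `wait_for_log` and the default of 10 attempts are hypothetical choices.

```shell
# Hypothetical helper: poll a directory tree until a file matching
# the given name pattern appears, or give up after a number of tries.
wait_for_log() {
    dir="$1"
    pattern="$2"
    tries="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        found=$(find "$dir" -name "$pattern" 2>/dev/null | head -n 1)
        if [ -n "$found" ]; then
            echo "$found"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For this article it could be called as `wait_for_log ${HOME}/spade/shared/spade.zero/log "*tour.3.*"`.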

There are two pairs of files. The first pair, in the date-stamped directory tree, contains the output of simple.sh, the script run by SPADE. The second pair, under the analysis directory, contains the redirected output of analysis.sh, the script run by simple.sh.

In this case the analysis simply outputs the bundle name and its file location.

more $(find ${HOME}/spade/shared/spade.zero/log/ -name "*tour.3.log")

This is just an example of how to organize the analysis. The only constraint is that the output of the script specified in the spade.xml file is written in a date-based hierarchy.
