Configuring SPADE to use a local warehouse

This article explains how to configure a SPADE deployment so that data dropped into the local dropbox is placed into the local warehouse.

Pre-requisites

This scenario requires that the nest-spade-war project has been installed and is running, as described in its installation instructions, and that the log output of the JBoss server can be seen in a second terminal.

The rest of this document assumes that the following environment variables are set to their appropriate values. If you followed the standard installation, the following commands will set them correctly.

export WILDFLY_HOME=${HOME}/server/wildfly-9.0.2.Final
export SPADE_VERSION=3.0.1
export SPADE_HOME=${HOME}/nest-spade-war-${SPADE_VERSION}
export SPADE_WAR=${SPADE_HOME}/target/spade-${SPADE_VERSION}.war
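Before going further, it can help to confirm that these variables point somewhere real. The functions below, check_path and check_spade_env, are hypothetical helpers written for this article rather than part of SPADE. They are a minimal shell sketch that only checks that each variable is set and names an existing path.

```shell
# Hypothetical helper (not part of SPADE): check_path NAME VALUE
# Reports, on stderr, a variable that is empty or names a missing path.
check_path() {
  if [ -z "$2" ]; then
    echo "$1 is not set" >&2
    return 1
  elif [ ! -e "$2" ]; then
    echo "$1 points at a missing path: $2" >&2
    return 1
  fi
}

# Hypothetical helper: checks all three variables used in this article and
# returns non-zero if any of the checks failed.
check_spade_env() {
  local status=0
  check_path WILDFLY_HOME "${WILDFLY_HOME:-}" || status=1
  check_path SPADE_HOME "${SPADE_HOME:-}" || status=1
  check_path SPADE_WAR "${SPADE_WAR:-}" || status=1
  return "$status"
}
```

Run check_spade_env after the export commands. It prints nothing and returns 0 when all three paths exist. If the WAR file has not been built yet, the SPADE_WAR check will fail.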

Configuration

The default deployment of SPADE uses the ${HOME}/spade directory to hold its configuration files. By default, this directory also holds the warehouse and dropbox areas. The location of the warehouse can be changed by editing the <warehouse/root> element in the main configuration file, spade.xml, which SPADE creates the first time it runs if the file does not already exist.
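To check which warehouse location is currently configured, you can pull the value out of spade.xml. The warehouse_root function below is a hypothetical helper, not part of SPADE. It is a minimal sketch that assumes spade.xml lives directly in ~/spade and that the <root> element inside <warehouse> sits on a single line; a proper XML tool would be more robust.

```shell
# Hypothetical helper (not part of SPADE): warehouse_root FILE
# Prints the text of any <root>...</root> line found between <warehouse> and
# </warehouse> in FILE. Assumes each <root> element sits on a single line.
warehouse_root() {
  sed -n '/<warehouse>/,/<\/warehouse>/s:.*<root>\(.*\)</root>.*:\1:p' "$1"
}
```

For example, run warehouse_root ~/spade/spade.xml. If it prints nothing, the file's layout differs from what this sketch assumes and you should inspect the file directly.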

A file stream is declared to SPADE by means of a registration file. A registration defines which files are included in the stream and how they should be handled. You can set up this scenario's registration by using the following commands.

mkdir -p ~/spade/registrations/local
cp ${SPADE_HOME}/src/main/extras/examples/registration.1.xml ~/spade/registrations/local

If you review the contents of the registration file, you will see that most of it defines the dropbox and assigns a local identity unique to this registration. (Currently it is up to the user to make sure these identities are unique, as there is not yet an automatic system for assigning them.) The <dropbox> element contains three parts: a <location> element that defines the directory that acts as the dropbox; a <pattern> element that defines how a semaphore file is detected within that directory; and a <mapping> element that maps the semaphore file's name onto the location of its associated data. In this example, the <dataSuffix> element means that the data location is constructed by replacing the semaphore file's suffix with the suffix specified by that element.
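The <dataSuffix> mapping can be pictured as a simple string operation. The data_path_for function below is a hypothetical illustration, not SPADE's implementation. It assumes that the semaphore's suffix is everything from the last "." onward and that the configured suffix includes its leading dot; the actual <pattern> in registration.1.xml may define the suffix differently.

```shell
# Hypothetical illustration (not SPADE code): data_path_for SEMAPHORE SUFFIX
# Strips the last "." and everything after it from the semaphore's path and
# appends SUFFIX, e.g. mock.1.sem with ".data" becomes mock.1.data.
data_path_for() {
  local semaphore=$1 suffix=$2
  printf '%s%s\n' "${semaphore%.*}" "$suffix"
}
```

With the test files used later in this article, data_path_for ~/spade/dropbox/scenario/warehouse/mock.1.sem .data prints the path of mock.1.data in the same directory.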

Execution

The following command deploys the SPADE application so that it can read the registration created in the previous section.

${WILDFLY_HOME}/bin/jboss-cli.sh --connect --command="deploy \
    --name=spade.war ${SPADE_WAR}"

Now that SPADE is running you can create and submit some test data using the following commands.

mkdir -p ~/spade/dropbox/scenario/warehouse
cat > ~/spade/dropbox/scenario/warehouse/mock.1.data << EOF
put some junk in here
EOF
touch ~/spade/dropbox/scenario/warehouse/mock.1.sem
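The commands above write the data file before touching the semaphore, which appears intended to ensure the data is complete by the time SPADE detects the semaphore. If you want to submit more test files, the same steps can be wrapped in a function. submit_mock is a hypothetical helper, and whether SPADE picks up names other than mock.1 depends on the <pattern> in your registration, which this sketch assumes matches files of the same form.

```shell
# Hypothetical helper (not part of SPADE): submit_mock DROPBOX N
# Writes mock.N.data first and only then creates mock.N.sem, so the
# semaphore never appears before its data file has been written.
submit_mock() {
  local dropbox=$1 n=$2
  mkdir -p "$dropbox"
  printf 'put some junk in here\n' > "$dropbox/mock.$n.data"
  touch "$dropbox/mock.$n.sem"
}
```

For example, submit_mock ~/spade/dropbox/scenario/warehouse 2 creates mock.2.data and then mock.2.sem.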

By default, SPADE polls its dropboxes for new files every minute, but if you do not want to wait, you can trigger a scan immediately with the following command.

${SPADE_HOME}/src/main/python/spade-cli local_scan

In either case, you should now see the data file being processed in SPADE's log file. Processing is complete when the "finisher" activity for the file has stopped. You can then see the resulting file in the warehouse by using the following command.

find ~/spade/warehouse -name "mock.1.*"
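If you would rather script the wait than rerun the find command by hand, a polling loop can do it for you. wait_for_match is a hypothetical helper, not part of SPADE. It is only a sketch: a file appearing in the warehouse is a convenient signal, but the text above judges completion by the finisher activity in the log.

```shell
# Hypothetical helper (not part of SPADE):
#   wait_for_match DIR NAME_PATTERN TIMEOUT_SECONDS
# Polls about once a second until find reports a match under DIR and prints
# the matching paths; returns 1 if nothing has appeared by the timeout.
wait_for_match() {
  local dir=$1 pattern=$2 remaining=$3 matches
  while :; do
    matches=$(find "$dir" -name "$pattern" 2>/dev/null)
    if [ -n "$matches" ]; then
      printf '%s\n' "$matches"
      return 0
    fi
    [ "$remaining" -le 0 ] && return 1
    sleep 1
    remaining=$((remaining - 1))
  done
}
```

For example, wait_for_match ~/spade/warehouse "mock.1.*" 120 allows two minutes, which leaves room for SPADE's default one-minute polling interval.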

You should also notice that there is a mock.1.meta.xml file in the warehouse. It contains the metadata that SPADE created for the file. Because the file is XML, you can examine it in a readable form using the following command.

find ~/spade/warehouse -name "mock.1.meta.xml" -exec xmllint --format {} \;

The default metadata contains only the date and time at which the semaphore file was originally created. By default, SPADE also uses this time, measured in UTC, to choose the file's final directory in the warehouse.
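Measuring the time in UTC means that the directory a file lands in does not depend on the server's local time zone. The utc_date_dir function below is a hypothetical illustration of that idea. The YYYY/MM/DD layout it prints is an assumption; SPADE's actual warehouse layout is whatever you observe under ~/spade/warehouse. It also relies on GNU date for the -d @SECONDS form.

```shell
# Hypothetical helper (not part of SPADE): utc_date_dir EPOCH_SECONDS
# Prints the UTC calendar date of a Unix timestamp as YYYY/MM/DD.
# -u makes the result independent of the local time zone (TZ).
# Requires GNU date for the "-d @SECONDS" syntax.
utc_date_dir() {
  date -u -d "@$1" +%Y/%m/%d
}
```

For example, utc_date_dir 0 prints 1970/01/01, even though local clocks in US Pacific time still read 31 December 1969 at that instant. This is why files created near midnight can land in a directory whose date differs from the local date.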

Cleanup

Once you have completed this scenario, undeploy the application using the following command.

${WILDFLY_HOME}/bin/jboss-cli.sh --connect --command="undeploy spade.war"