Non-Dropbox Data

Overview

This article explains how to configure a SPADE deployment so that it can pick up and transfer data that is not placed in the dropbox.

Adding a Registration

So far, every SPADE configuration has expected to find data in the same directory as its associated semaphore file. After SPADE has copied the semaphore file and its data into its cache, it deletes both from the dropbox. However, there are situations where the data already exists elsewhere and making a copy of it in the dropbox gains you nothing. In those situations you can define a registration that takes its data from some other location rather than from the dropbox. You can install such a registration with the following commands.

mkdir -p ${HOME}/spade.zero/spade/registrations/local
cat > ${HOME}/spade.zero/spade/registrations/local/non-dropbox.4.xml << EOF
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<registration>
    <local_id>non-dropbox.4</local_id>
    <drop_box>
        <location>
            <directory>data/spade/dropbox</directory>
        </location>
        <pattern>.*.ndd</pattern>
        <mapping>gov.lbl.nest.spade.tour.PathDataLocator</mapping>
    </drop_box>
    <owner>false</owner>
</registration>
EOF

If you compare this to the first registration of this tour, you will see two significant changes (the change in the pattern element is simply to make sure this is a different file stream).

  • The addition of the mapping element within the drop_box element. We have come across this before in the Output Transfer scenario, where it was described as “this specifies the Java class that will be used to derive the bundle and data file names based on the semaphore file.” In this case the PathDataLocator class reads the data file's path from the metadata provided in the semaphore file which, in this case, is an instance of the PathMetadata class.
  • The addition of the owner element within the main registration element. Because it is declared as false, SPADE will not own the data file and therefore will not delete it once it has been ingested into SPADE. (The default value is true which, as we have seen throughout the preceding parts of the tour, means SPADE will delete the data file.)
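
To get a feel for which file names the pattern element would select, it can be approximated with grep. This is only a sketch under two assumptions that the text above does not confirm: that SPADE requires the pattern to match the whole file name, and that grep's extended regular expressions behave like the Java ones for a pattern this simple. The function name matches_pattern is made up for this illustration.

```shell
# Assumption: the pattern has to match the entire file name,
# which grep -x emulates.
pattern='.*.ndd'

# Succeeds when the given file name is selected by the pattern.
matches_pattern() {
    printf '%s\n' "$1" | grep -Eqx -- "$pattern"
}

# Report on a few file names used in this part of the tour.
for name in tour.6.ndd tour.6.data tour.6.tmp; do
    if matches_pattern "$name"; then
        echo "$name: matched"
    else
        echo "$name: not matched"
    fi
done
```

Note that the dot before ndd is unescaped, so it stands for any single character: a name such as tour6_ndd would be selected as well.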

Updating the SPADE Configuration

Provided you have run the Customizing Metadata part of the tour, the SPADE deployment will already be configured to handle the necessary metadata. If you haven’t, go back and follow the instructions there.

You are now ready to redeploy SPADE:

docker exec -it tour_spade \
    cp wars/spade-${SPADE_VERSION}.war \
    /opt/wildfly/standalone/deployments/spade.war

Creating Data and Semaphore Files

The creation of the data and semaphore files follows the same approach as in the Customizing Metadata scenario, but in this case the data file is created in a different location rather than in the dropbox.

mkdir -p ${HOME}/spade/shared/spade.zero/tour/local/structure
cat > ${HOME}/spade/shared/spade.zero/tour/local/structure/tour.6.data << EOF
This data file does not need to be in the dropbox
EOF

cat > ${HOME}/spade.zero/dropbox/tour.6.tmp << EOF
<path_metadata>
    <path>/opt/jboss/data/external/spade.zero/tour/local/structure/tour.6.data</path>
</path_metadata>
EOF

mv ${HOME}/spade.zero/dropbox/tour.6.tmp ${HOME}/spade.zero/dropbox/tour.6.ndd

docker exec -it tour_spade bash -l -c "spade-cli local_scan"
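
The commands above write the semaphore file under a .tmp name and only then rename it so that it matches the registration's pattern. Presumably this keeps a scan from picking up a half-written file, although the reason is not stated here. A minimal sketch that wraps the sequence in a function follows; write_semaphore and its arguments are hypothetical names used for illustration and are not part of SPADE.

```shell
# Hypothetical helper: write a path_metadata semaphore under a temporary
# name, then rename it so it only matches the pattern once it is complete.
#   $1  dropbox directory
#   $2  base name of the semaphore file (without suffix)
#   $3  path of the data file as seen from inside the SPADE container
write_semaphore() {
    dropbox=$1
    base=$2
    data_path=$3
    cat > "${dropbox}/${base}.tmp" << EOF
<path_metadata>
    <path>${data_path}</path>
</path_metadata>
EOF
    # The rename is what makes the file match the .ndd pattern.
    mv "${dropbox}/${base}.tmp" "${dropbox}/${base}.ndd"
}

# Try it in a scratch directory rather than the real dropbox.
scratch=$(mktemp -d)
write_semaphore "${scratch}" tour.6 \
    /opt/jboss/data/external/spade.zero/tour/local/structure/tour.6.data
cat "${scratch}/tour.6.ndd"
```

Because the temporary file and the final file live in the same directory, the mv is a plain rename, so the .ndd file never exists in a partially written state.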

Seeing the File in the Warehouse

As with the Local Warehousing scenario, you can see where the files are in the warehouse with the following command.

find ${HOME}/spade.zero/warehouse -name "*tour.6.*"

Seeing the Data File in its Original Location

Because the registration used here declares owner to be false, you can see that the data file remains at its original location as well as having been copied into the warehouse.

ls -l ${HOME}/spade/shared/spade.zero/tour/local/structure/tour.6.data
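
To go one step beyond listing the file, the original can be compared with whatever ended up in the warehouse. The sketch below assumes that the warehoused data file keeps tour.6 in its name and is byte-for-byte identical to the original; neither is confirmed here. The function find_identical_copy is a hypothetical helper, not a SPADE command.

```shell
# Hypothetical helper: print the files under a directory tree that are
# byte-for-byte identical to the given original.
#   $1  original data file
#   $2  root of the tree to search (for example the warehouse)
#   $3  file name glob used to narrow the search
find_identical_copy() {
    find "$2" -type f -name "$3" | while IFS= read -r candidate; do
        if cmp -s "$1" "$candidate"; then
            echo "$candidate"
        fi
    done
}

# Paths taken from the commands in this part of the tour.
original=${HOME}/spade/shared/spade.zero/tour/local/structure/tour.6.data
warehouse=${HOME}/spade.zero/warehouse
if [ -f "$original" ] && [ -d "$warehouse" ]; then
    find_identical_copy "$original" "$warehouse" "*tour.6.*"
fi
```

If nothing is printed, either the assumptions above do not hold for your warehouse layout or the file has not been warehoused yet.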
