The SPADE Project

Welcome to the SPADE project. The aim of this article is to give you a feel for what the SPADE application does and whether it will be useful to you. Clearly not all the details can be included here, but if it looks like SPADE may be useful, feel free to dig a little deeper. At the moment this site is fairly limited, but we are working on fleshing it out as quickly as we can.

Introduction

SPADE is a JEE 7 application that takes files from wherever they are produced, e.g. at an experiment, and delivers them into a data warehouse from which they can be retrieved for analysis, as well as archived.

Requirements

One of the key aims of SPADE is to keep its installation requirements to a minimum. To this end the only absolute requirements are:

  • A Java 1.8 SDK installation

  • Network access to download everything else

However, for a real production installation it is recommended that you have access to a good DBMS. PostgreSQL 9 is supported directly, while other DBMSs can be used provided they have a JDBC 4 implementation; the setup just takes a little longer.
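To make the DBMS requirement concrete, the fragment below shows what a PostgreSQL datasource definition can look like in the `standalone.xml` of a WildFly server. The JNDI name, database name, and credentials are placeholders chosen for illustration, not values prescribed by SPADE; a real deployment would use its own.

```xml
<datasource jndi-name="java:jboss/datasources/SpadeDS" pool-name="SpadeDS">
    <!-- Placeholder host, port, and database name -->
    <connection-url>jdbc:postgresql://localhost:5432/spade</connection-url>
    <driver>postgresql</driver>
    <security>
        <user-name>spade</user-name>
        <password>change-me</password>
    </security>
</datasource>
```

Any other DBMS with a JDBC 4 driver would be configured the same way, with its own driver module and connection URL.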

To use SPADE to transfer files from one location to another you will also need a functional email address for each SPADE installation. Email is used to provide an out-of-band exchange of information.

The SPADE application is deployed as a Web Archive (WAR) and is JEE 7 compliant. Therefore it should be able to run (with a couple of minor modifications) on any JEE 7 server. At the moment development is done using the WildFly server, so all the instructions on this web site refer to that when discussing a JEE 7 server. If you want to deploy SPADE in a different one, such as GlassFish, then that server needs to be installed on the target host.

How does the SPADE Application Work?

To begin with, data is written to a file or a series of files within a directory. When writing is complete, an accompanying semaphore file is put into a directory called a "SPADE dropbox." The appearance of the semaphore file signals to SPADE that the original data are ready to be moved and managed. SPADE starts this by transferring both the data and semaphore file into its cache, thus freeing up space for more data to be written. Once in the cache, SPADE uses the semaphore file, along with any specified defaults and a customizable MetadataManager class, to create metadata to accompany the data.
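As a concrete illustration, a producer finishes writing its data file first and only then drops the semaphore file. The sketch below shows that ordering; the directory locations and the `.sem` naming convention are assumptions made for this example, not part of SPADE's documented configuration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DropboxExample {
    public static void main(String[] args) throws IOException {
        // Stand-ins for the experiment's data area and the SPADE dropbox;
        // a real deployment configures fixed paths for both.
        Path dataDir = Files.createTempDirectory("data");
        Path dropbox = Files.createTempDirectory("dropbox");

        // 1. Finish writing the data file completely first...
        Path dataFile = dataDir.resolve("run42.dat");
        Files.write(dataFile, "detector readout...".getBytes());

        // 2. ...then create the semaphore file. Its appearance in the
        //    dropbox is what signals SPADE that the data is complete
        //    and ready to be moved into the cache.
        Path semaphore = dropbox.resolve(dataFile.getFileName() + ".sem");
        Files.createFile(semaphore);

        System.out.println("Semaphore dropped: " + semaphore);
    }
}
```

The important point is the ordering: because the semaphore only appears after the data is fully written, SPADE never picks up a half-written file.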

In the simplest deployment the warehouse is managed by the same instance of SPADE that fetches the data and semaphore file from the dropbox. In this case the data and metadata files are copied into the warehouse, ready to be accessed by clients. The placement of the data is determined by a customization of the PlacementPolicy interface. Once the data has been placed in the warehouse, SPADE can trigger the execution of a specified script in order to run a prompt analysis of the newly delivered data.
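The exact shape of the PlacementPolicy interface is defined by SPADE itself; the following is only a minimal sketch of the idea, assuming a single method that maps a file name and its arrival date onto a warehouse-relative path. The interface and method names here are illustrative, not SPADE's actual API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PlacementExample {
    // Hypothetical stand-in for SPADE's PlacementPolicy interface.
    interface PlacementPolicy {
        String relativePath(String fileName, LocalDate arrival);
    }

    // One plausible policy: bucket files into year/month/day directories
    // so the warehouse stays browsable as it grows.
    static class DatePlacementPolicy implements PlacementPolicy {
        private static final DateTimeFormatter FMT =
                DateTimeFormatter.ofPattern("yyyy/MM/dd");

        @Override
        public String relativePath(String fileName, LocalDate arrival) {
            return arrival.format(FMT) + "/" + fileName;
        }
    }

    public static void main(String[] args) {
        PlacementPolicy policy = new DatePlacementPolicy();
        System.out.println(
                policy.relativePath("run42.dat", LocalDate.of(2016, 3, 9)));
        // → 2016/03/09/run42.dat
    }
}
```

Whatever the real interface looks like, the design intent is the same: the deployment, not SPADE's core, decides where in the warehouse each file lands.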

In more complex deployments there are options to send files to a remotely managed SPADE warehouse, to archive files to tertiary storage, and to create copies of files locally outside the SPADE-managed warehouse. The first step in sending data to a remote warehouse or archiving it is to wrap the data and metadata file up into a single file. If necessary, this wrapped file can also be compressed in order to save space and bandwidth. The resulting packaged file (which may or may not be compressed) can be transferred to another SPADE deployment, at which point it becomes known as an "inbound file".
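The article does not specify SPADE's actual packaging format, so the sketch below uses a plain zip archive purely to illustrate the wrap-and-compress step: two files (data plus metadata) go in, one packaged file comes out. The file names are invented for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class PackExample {
    // Wrap a data file and its metadata file into a single compressed
    // package, ready to be shipped to a remote SPADE or archived.
    static void pack(Path data, Path meta, Path packed) throws IOException {
        try (OutputStream out = Files.newOutputStream(packed);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path p : new Path[] {data, meta}) {
                zip.putNextEntry(new ZipEntry(p.getFileName().toString()));
                Files.copy(p, zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spade");
        Path data = Files.write(dir.resolve("run42.dat"), "data".getBytes());
        Path meta = Files.write(dir.resolve("run42.meta.xml"), "<meta/>".getBytes());
        Path packed = dir.resolve("run42.pkg");
        pack(data, meta, packed);
        System.out.println("Packaged: " + packed.getFileName());
    }
}
```

On the receiving side the same operation runs in reverse: the inbound file is expanded and unwrapped to recover the original data and metadata pair.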

A SPADE deployment detects inbound files from other deployments in a similar way to local files, i.e. by their appearance in a "receiving" dropbox along with an appropriate semaphore file. Upon arrival, the packaged file is copied into the local SPADE's cache. From here it can be immediately sent on to another SPADE deployment, i.e. relayed through this deployment, as well as being immediately archived. It can also be expanded (if necessary) and unwrapped to reveal the original data plus its metadata file. These can then be placed in that SPADE's warehouse, ready for local retrieval. Again, this SPADE can trigger the execution of a specified script upon placement of the file in its warehouse.

If files are not being locally warehoused by the SPADE, i.e. local files are only being sent to a remote SPADE for warehousing, it is still possible for analysis to be run on those files locally. In this situation the SPADE deployment can be configured to duplicate the data into a local directory, ready for analysis. The management of a file after it has been duplicated into that directory is not done by the SPADE program; that responsibility is outside its scope (otherwise it would be a local warehouse!).

A full picture of how SPADE works can be found in the article about its workflows.

Installation

Detailed instructions on how to install the "vanilla" version of SPADE, namely nest-spade-war, are given in this article. These instructions walk you through the complete installation up to the point where you can loop files back to the same instance, which only requires the Java 1.8 SDK to be available. A second article explains how you can connect two different SPADE deployments to each other so that they can transfer files.

Both of the articles mentioned in the previous paragraph use the H2 DBMS that comes with WildFly 9.0.2, but that is not recommended for a major production deployment. A third article therefore explains how, for a WildFly 9.0.2 installation, you can connect to a production-quality DBMS.

Monitoring

A discussion of the available monitoring data can be found in this article.