<xweld/>-xmlwarehouse

The xweld-xmlwarehouse project provides an extension of the core and services modules in the <xweld/> platform, supporting the creation and maintenance of a hybrid EAV/XML document warehouse that is fully audited and versioned.

Primary Features

Prebuilt xweld-xmlwarehouse functionality includes:

  • A highly scalable, performant warehouse of versioned XML documents that can be searched by configurable EAV attribute values
  • A configurable runtime service that writes XML entities into the hybrid EAV model over any JPA persistence store

Some benefits of representing data in EAV form include:

  • Flexibility - There are no arbitrary limits on the number of attribute types per entity. The number of attributes associated with an entity can grow as the model evolves, without schema redesign. This insulates applications from the consequences of change and provides domain model independence.
  • Highly-Scalable - EAV supports space-efficient storage for very sparse data. Since the metadata describing a domain object can grow over time, the richness of the data model can also grow without negative historical impact.
  • Self-Describing Data Model - A simple physical data format with partially self-describing data that maps naturally to an XML or name-value pair representation.

Using EAV to represent every domain object in every application would certainly not be advisable, but when used judiciously intermixed with more traditional ORM domain models, the advantages can be many. The xweld-xmlwarehouse module defines a hybrid approach using an EAV metadata model to represent XML documents while storing the XML as the entity data such that it can be searched using both the metadata and the XML representation.

Key Concepts

An EAV (Entity-Attribute-Value) model describes application domain objects, or Entities, as a block of data with associated attribute values that represent the searchable properties of the Entity. In other words, an EAV model represents domain objects by metadata. The universe of metadata values is potentially very large, but the number of values that apply to a specific Entity is typically quite modest, although it can grow over time.
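As a rough illustration of the idea, the sketch below models an Entity as a block of data plus only the attribute values that apply to it, and searches by one attribute. All names here (EavSketch, SketchEntity, findByAttribute) are hypothetical and chosen for the example; they are not the xweld-xmlwarehouse API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an EAV structure; not the xweld-xmlwarehouse API.
final class EavSketch {

    // One entity: an opaque block of data plus only the attributes that apply to it.
    static final class SketchEntity {
        final String type;
        final String data; // e.g. the XML document
        final Map<String, Object> attributes = new LinkedHashMap<>();

        SketchEntity(String type, String data) {
            this.type = type;
            this.data = data;
        }

        // Attach one searchable attribute value and return this entity for chaining.
        SketchEntity with(String name, Object value) {
            attributes.put(name, value);
            return this;
        }
    }

    // Return the entities whose named attribute equals the given value.
    static List<SketchEntity> findByAttribute(List<SketchEntity> all, String name, Object value) {
        List<SketchEntity> hits = new ArrayList<>();
        for (SketchEntity entity : all) {
            if (value.equals(entity.attributes.get(name))) {
                hits.add(entity);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<SketchEntity> all = new ArrayList<>();
        // The two entities carry different attribute sets; no schema change is needed.
        all.add(new SketchEntity("Trade", "<trade/>").with("currency", "USD").with("notional", 1_000_000L));
        all.add(new SketchEntity("Trade", "<trade/>").with("currency", "EUR"));
        System.out.println(findByAttribute(all, "currency", "USD").size());
    }
}
```

Because each entity stores only the attributes it actually has, sparse data costs nothing for the attributes that are absent.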

Entity types define the attribute metadata for instances (objects) of that type. Each Entity has a type (and therefore attribute metadata) and a set of attribute values, each of which corresponds to one of the type's attributes. An AttributeValue may be a numeric, temporal, boolean or string value.

Entities may be identified by externally assigned primary keys, or identifiers, which may be numeric or string values. The xmlwarehouse supports the association of entities with any number of external identifiers so that documents in the warehouse may be retrieved using those externally assigned primary keys. For example, a Trade document may have a trade identifier assigned by the upstream trading system, and in turn that trade may contain references to securities that have identifiers assigned by multiple third parties such as Bloomberg, Reuters, CUSIP, ISIN etc.

CRUD operations are performed through the EntityManager class, which supports bulk retrieval using a scrollable result set, lookup by id, and queries by EAV attributes.

Packages and Classes

xweld-xmlwarehouse includes the following packages:
  • com.xweld.persistence – implements the EntityType/AttributeMetaData classes that configure the EAV model as well as the Entity/Attribute classes that represent instances of domain objects stored in the warehouse
  • com.xweld.services – defines the classes to import and process XML documents and store them in the warehouse
  • com.xweld.io – contains the entity writer and entity writer configuration details necessary to write and handle XML entities for the import service

Entities, EntityTypes, Attributes and MetaData

EntityType and AttributeMetaData

The EntityType class defines the metadata for the EAV attributes that make up that type. The AttributeMetaData class describes the name, type and description of each attribute of the type. AttributeMetaData also includes the definition of an XPath query that is used at runtime to shred an XML document for the value of the named attribute.
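To make the shredding step concrete, here is a minimal sketch that evaluates one XPath query per attribute using the standard JDK javax.xml.xpath API. The class XPathShredSketch, the method shred, the attribute names and the sample document are assumptions made for illustration; the actual AttributeMetaData configuration may differ.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch; class and method names are not part of xweld-xmlwarehouse.
final class XPathShredSketch {

    // Evaluate each attribute's XPath query against the document and collect the string results.
    static Map<String, String> shred(String xml, Map<String, String> xpathByAttribute) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : xpathByAttribute.entrySet()) {
            try {
                // A fresh InputSource per query: the underlying reader is consumed by each evaluation.
                String value = xpath.evaluate(entry.getValue(), new InputSource(new StringReader(xml)));
                values.put(entry.getKey(), value);
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath for attribute " + entry.getKey(), e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<trade><id>T-1</id><currency>USD</currency></trade>";
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("tradeId", "/trade/id");
        queries.put("currency", "/trade/currency");
        System.out.println(shred(xml, queries));
    }
}
```

A query that matches nothing yields an empty string in this sketch, so a caller would need to decide whether that means "absent" or "empty".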

Long, Calendar, Date, Double, Boolean and String Attribute Values

The Attribute class and its sub-classes define the representation of attribute values of their corresponding types. Supported attribute value types include Long, String, Double, Date, Calendar and Boolean.
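The following sketch shows one way raw string values (for example, XPath results) could be converted into the six supported value types. It uses a single enum and a parse method rather than a class hierarchy, purely to keep the example short; the names TypedValueSketch, ValueType and parse, and the choice of ISO-8601 timestamps for temporal values, are assumptions and not the library's design.

```java
import java.time.Instant;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch; the real Attribute sub-classes are not shown in this document.
final class TypedValueSketch {

    enum ValueType { LONG, STRING, DOUBLE, DATE, CALENDAR, BOOLEAN }

    // Convert a raw string into a Java value of the requested type.
    // Temporal values are assumed to be ISO-8601 instants such as 2011-08-10T08:40:00Z.
    static Object parse(ValueType type, String raw) {
        switch (type) {
            case LONG:
                return Long.valueOf(raw.trim());
            case DOUBLE:
                return Double.valueOf(raw.trim());
            case BOOLEAN:
                return Boolean.valueOf(raw.trim());
            case DATE:
                return Date.from(Instant.parse(raw.trim()));
            case CALENDAR: {
                Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
                calendar.setTime(Date.from(Instant.parse(raw.trim())));
                return calendar;
            }
            default:
                return raw; // STRING: keep the value unchanged
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(ValueType.LONG, "42"));
        System.out.println(parse(ValueType.BOOLEAN, "true"));
        System.out.println(parse(ValueType.DATE, "2011-08-10T08:40:00Z").getClass().getSimpleName());
    }
}
```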

Entity and External Identifiers

The Entity class defines the structure of an Entity instance for a given EntityType. An Entity instance holds a collection of its attribute values as well as a collection of externally assigned identifiers. Both string and long identifier values are supported. External identifiers are assigned by a 'source' (such as CUSIP, ISIN, SEDOL, Reuters) for an identifier 'scheme' (such as Security).
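The source/scheme/value combination can be pictured as a composite lookup key. The sketch below is an assumption about how such a key might behave, using made-up identifier values and string values only; SketchExternalId and Index are hypothetical names, not classes from the library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of external identifier lookup; not the xweld-xmlwarehouse API.
final class ExternalIdSketch {

    // An identifier value assigned by a source (e.g. ISIN) within a scheme (e.g. Security).
    static final class SketchExternalId {
        final String source;
        final String scheme;
        final String value;

        SketchExternalId(String source, String scheme, String value) {
            this.source = source;
            this.scheme = scheme;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SketchExternalId)) return false;
            SketchExternalId other = (SketchExternalId) o;
            return source.equals(other.source) && scheme.equals(other.scheme) && value.equals(other.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, scheme, value);
        }
    }

    // Maps each external identifier to the internal id of the entity it refers to.
    static final class Index {
        private final Map<SketchExternalId, Long> entityIdByExternalId = new HashMap<>();

        void register(long entityId, SketchExternalId id) {
            entityIdByExternalId.put(id, entityId);
        }

        // Returns the internal entity id, or null when the identifier is unknown.
        Long lookup(String source, String scheme, String value) {
            return entityIdByExternalId.get(new SketchExternalId(source, scheme, value));
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        // One entity registered under two identifiers from different sources (placeholder values).
        index.register(7L, new SketchExternalId("ISIN", "Security", "ISIN-0001"));
        index.register(7L, new SketchExternalId("CUSIP", "Security", "CUSIP-0001"));
        System.out.println(index.lookup("CUSIP", "Security", "CUSIP-0001"));
    }
}
```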

Entity Management

CRUD Operations on Entities

The EntityManager provides factory methods for creating and storing new EntityTypes and is used to apply CRUD operations to Entities. Listeners for each CRUD operation are supported, and these are installed at runtime via configuration. The EntityManagerConfiguration class records the types of each listener and is installed when the manager is configured. Bulk retrieval of Entity values is supported using IScrollableResults.
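The listener mechanism can be sketched as follows. This is an in-memory stand-in written for illustration only: SketchManager, SketchListener and Operation are hypothetical names, listeners are added in code rather than through configuration, and no JPA store is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-operation listeners; not the real EntityManager.
final class CrudListenerSketch {

    enum Operation { CREATE, READ, UPDATE, DELETE }

    interface SketchListener {
        void onEvent(Operation operation, long entityId);
    }

    static final class SketchManager {
        private final Map<Long, String> store = new HashMap<>();
        private final Map<Operation, List<SketchListener>> listeners = new EnumMap<>(Operation.class);
        private long nextId = 1;

        // Register a listener for one kind of operation.
        void addListener(Operation operation, SketchListener listener) {
            listeners.computeIfAbsent(operation, key -> new ArrayList<>()).add(listener);
        }

        long create(String xml) {
            long id = nextId++;
            store.put(id, xml);
            fire(Operation.CREATE, id);
            return id;
        }

        String read(long id) {
            fire(Operation.READ, id);
            return store.get(id);
        }

        void update(long id, String xml) {
            store.put(id, xml);
            fire(Operation.UPDATE, id);
        }

        void delete(long id) {
            store.remove(id);
            fire(Operation.DELETE, id);
        }

        // Notify every listener registered for the operation, in registration order.
        private void fire(Operation operation, long id) {
            for (SketchListener listener : listeners.getOrDefault(operation, Collections.emptyList())) {
                listener.onEvent(operation, id);
            }
        }
    }

    public static void main(String[] args) {
        SketchManager manager = new SketchManager();
        manager.addListener(Operation.CREATE, (op, id) -> System.out.println(op + " " + id));
        manager.addListener(Operation.DELETE, (op, id) -> System.out.println(op + " " + id));
        long id = manager.create("<trade/>");
        manager.update(id, "<trade rev='2'/>"); // no UPDATE listener, so nothing is printed
        manager.delete(id);
    }
}
```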

Runtime Components

An out-of-the-box (OOTB) Entity Import Service is provided, which uses any number of concurrent, multi-threaded readers to import data from disparate sources, convert it to XML form, and pass it to the import service. The service shreds the XML into Entity attributes using configured XPath queries. The shredding operation as well as the mapping of queries to attributes is completely configurable. The Entities are written to the JPA store using the EntityManager. If a document is imported with an external identifier that has already been processed, the existing entity is updated. Otherwise, it is persisted as a new instance. This process is depicted below:
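The update-or-persist decision described above can be sketched with an in-memory map keyed by external identifier. This is only an illustration under stated assumptions: ImportUpsertSketch, StoredEntity and importDocument are hypothetical names, a ConcurrentHashMap stands in for the JPA store, and a simple version counter stands in for the warehouse's versioning.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "update if the external id is known, otherwise persist as new".
final class ImportUpsertSketch {

    static final class StoredEntity {
        final String xml;
        final int version;

        StoredEntity(String xml, int version) {
            this.xml = xml;
            this.version = version;
        }
    }

    private final ConcurrentMap<String, StoredEntity> byExternalId = new ConcurrentHashMap<>();

    // Returns the version after the import: 1 for a new entity, more than 1 for an update.
    int importDocument(String externalId, String xml) {
        // merge is atomic per key, so concurrent readers cannot create duplicates.
        StoredEntity stored = byExternalId.merge(externalId, new StoredEntity(xml, 1),
                (existing, incoming) -> new StoredEntity(incoming.xml, existing.version + 1));
        return stored.version;
    }

    StoredEntity get(String externalId) {
        return byExternalId.get(externalId);
    }

    public static void main(String[] args) throws InterruptedException {
        ImportUpsertSketch service = new ImportUpsertSketch();
        ExecutorService readers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int revision = i;
            // Eight imports of the same external id, spread over four reader threads.
            readers.execute(() -> service.importDocument("T-1", "<trade rev='" + revision + "'/>"));
        }
        readers.shutdown();
        readers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(service.get("T-1").version);
    }
}
```

The atomic merge matters here because several readers may deliver documents with the same external identifier at the same time.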

During processing the standard <xweld/> event emission model is used to optionally generate events for downstream listeners over JMS. The service, like all <xweld/> services, is configured with JMX beans for instrumentation. Any off-the-shelf JMX monitoring tool can be used to monitor and control the service.
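For readers unfamiliar with JMX instrumentation, the sketch below registers a standard MBean with the JDK platform MBean server and reads an attribute back through it. It uses only the standard javax.management API; the bean ImportStats, its attribute ImportedCount and the object names are hypothetical and do not describe the beans that <xweld/> services actually expose.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

// Hypothetical sketch of JMX instrumentation; not the beans shipped with <xweld/>.
final class JmxSketch {

    // Standard MBean convention: the interface name is the class name plus "MBean".
    public interface ImportStatsMBean {
        long getImportedCount();
        void reset();
    }

    public static final class ImportStats implements ImportStatsMBean {
        private final AtomicLong imported = new AtomicLong();

        void recordImport() {
            imported.incrementAndGet();
        }

        @Override
        public long getImportedCount() {
            return imported.get();
        }

        @Override
        public void reset() {
            imported.set(0);
        }
    }

    // Register the bean with the platform MBean server under the given object name.
    static ObjectName register(ImportStats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register " + name, e);
        }
    }

    // Read the counter through JMX, as a monitoring tool would.
    static long readImportedCount(ObjectName name) {
        try {
            return (Long) ManagementFactory.getPlatformMBeanServer().getAttribute(name, "ImportedCount");
        } catch (JMException e) {
            throw new IllegalStateException("Could not read ImportedCount from " + name, e);
        }
    }

    public static void main(String[] args) {
        ImportStats stats = new ImportStats();
        ObjectName name = register(stats, "sketch.xmlwarehouse:type=ImportStats");
        stats.recordImport();
        System.out.println(readImportedCount(name));
    }
}
```

Once a bean like this is registered, a tool such as JConsole can display the attribute and invoke the reset operation.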

Contact Us

For more information, please contact us at

Copyright

Copyright © 1999-2012 <xweld/> Development Community, All Rights Reserved.

The contents of this web site, including all images, text, graphics and software, are the sole property of the <xweld/> Development Community and its agents. Unauthorized use or transmission is strictly forbidden by international copyright law.

xmlwarehouse.JPG (143.6 kB) John Free, 10 August 2011 08:40 AM

entity-manager.JPG (61.1 kB) John Free, 11 August 2011 05:53 AM

entity-import-service.jpg (34.7 kB) John Free, 11 August 2011 09:03 AM