DANA-WH
OPERATING STRUCTURE
By distributed database network, we are
referring to a large data collection with
different constituent data sets under
the control of separate Relational Database
Management Systems (RDBMSs) running on
independent, remote computer systems that
are linked to the network. Each of the
participating systems in the network has
autonomous processing capability for local
applications, but each can also participate
in the execution of global (that is to
say, network-wide) applications.
The connections between the systems in
the total network are hidden from, or
transparent to, the users, so the network
appears to operate as a single data
warehouse. At the same time, users can
easily discover the contributing institution
for data by checking the appropriate data
files in DANA-WH.
DANA-WH Interoperability
In order to establish interoperability
of the participating network databases,
DANA-WH uses open standards such as Structured
Query Language (SQL) and Extensible Markup
Language (XML) combined with new Java
Technologies. DANA-WH participants are
able to use whichever Java-compliant RDBMS
they choose. This is more easily accomplished
when vendor-specific commands are avoided,
so the system uses generic SQL statements
for table creation and data insertion
and retrieval. The current operational
prototype works with Oracle 9i and PostgreSQL,
and soon will work with IBM Informix,
IBM DB2, Microsoft SQL Server, and other
fully functional RDBMSs with Java JDBC
Data Access.
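As an illustration of the generic SQL approach described above, the following is a minimal sketch using the standard java.sql API. The class name, the table name collection_item, and its columns are hypothetical and are not taken from the DANA-WH schema; the statements use only widely supported types and parameter placeholders so that the same text can be sent to different RDBMSs.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: table and column names are illustrative, not the DANA-WH schema. */
final class GenericSqlExample {

    // Only common SQL types (INTEGER, VARCHAR) and no vendor extensions.
    static final String CREATE_TABLE =
        "CREATE TABLE collection_item ("
        + "item_id INTEGER NOT NULL PRIMARY KEY, "
        + "institution VARCHAR(100) NOT NULL, "
        + "title VARCHAR(255))";

    static final String INSERT_ITEM =
        "INSERT INTO collection_item (item_id, institution, title) VALUES (?, ?, ?)";

    static final String SELECT_BY_INSTITUTION =
        "SELECT item_id, title FROM collection_item WHERE institution = ?";

    /** Creates the table on whatever database the connection points at. */
    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_TABLE);
        }
    }

    /** Inserts one row; values are bound as parameters rather than concatenated. */
    static void insertItem(Connection conn, int id, String institution, String title)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_ITEM)) {
            ps.setInt(1, id);
            ps.setString(2, institution);
            ps.setString(3, title);
            ps.executeUpdate();
        }
    }

    /** Returns the titles contributed by one institution. */
    static List<String> titlesFor(Connection conn, String institution) throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SELECT_BY_INSTITUTION)) {
            ps.setString(1, institution);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

The Connection passed to these methods would come from whichever JDBC driver the participating site has installed; nothing in the sketch depends on a particular vendor.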
XML, which is a platform independent and
license-free text format, provides a means
to format data in a way that facilitates
data sharing between computer processes.
As such, it provides standardization for
Internet distributed data. With the use
of XML Schema or Document Type Definitions
(DTDs), the data in an XML document are
well described and easily shared. XML
is used to archive and display the associated
metadata for multimedia and textual objects
in archives.
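To make the role of XML concrete, here is a small sketch that writes a metadata record as XML and reads a field back, using only the XML APIs bundled with Java. The element names (record, institution, title, mediaType) and the class name are assumptions made for illustration; the actual DANA-WH metadata format is not shown in this document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical example: element names are illustrative, not the DANA-WH format. */
final class MetadataXmlExample {

    /** Builds a small metadata record and serializes it to an XML string. */
    static String toXml(String institution, String title, String mediaType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element record = doc.createElement("record");
            doc.appendChild(record);
            appendField(doc, record, "institution", institution);
            appendField(doc, record, "title", title);
            appendField(doc, record, "mediaType", mediaType);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML record", e);
        }
    }

    /** Parses the XML and returns the text of the first matching element, or null. */
    static String readField(String xml, String field) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(field);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML record", e);
        }
    }

    private static void appendField(Document doc, Element parent, String name, String value) {
        Element field = doc.createElement(name);
        field.setTextContent(value); // special characters are escaped on output
        parent.appendChild(field);
    }
}
```

Because the record is plain text, any process that can parse XML can read it back, which is the property that makes the format useful for sharing metadata between systems.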
Servlets
The DANA-WH network uses servlets to handle
database queries and other communications
with the remote servers. This allows us
to load the JDBC drivers at the servlet
level for the appropriate database systems
under that servlet's control. The connection
to the RDBMS is implemented through a
connection pool that can be set to open
connections only as needed. As a result,
the pool never holds more connections
than are actually required at any given
time, which improves efficiency and keeps
costs down by not tying up more
licenses than needed. DANA-WH currently
utilizes two levels of Java servlets.
One level is used for searching, the other
for routing the searches to the correct
systems.
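The idea of a pool that opens connections only as needed can be sketched as follows. This is not the DANA-WH pool itself: the class name LazyPool, the factory parameter, and the fail-fast behavior when the limit is reached are all assumptions chosen to keep the example short. In a servlet, the resource type would be java.sql.Connection and the factory would open a connection through the loaded JDBC driver.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical on-demand pool: resources are created lazily, up to maxSize. */
final class LazyPool<T> {
    private final Supplier<T> factory;   // opens a new resource, e.g. a JDBC connection
    private final int maxSize;           // upper bound, e.g. the number of licenses
    private final Deque<T> idle = new ArrayDeque<>();
    private int created = 0;

    LazyPool(Supplier<T> factory, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.factory = factory;
        this.maxSize = maxSize;
    }

    /** Reuses an idle resource when one exists; otherwise opens a new one if allowed. */
    synchronized T acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (created >= maxSize) {
            // A production pool would more likely block or time out here.
            throw new IllegalStateException("pool exhausted");
        }
        T resource = factory.get();
        created++; // counted only after the resource was opened successfully
        return resource;
    }

    /** Returns a resource to the pool so that later callers can reuse it. */
    synchronized void release(T resource) {
        idle.push(resource);
    }

    /** Number of resources opened so far; never exceeds maxSize. */
    synchronized int createdCount() {
        return created;
    }
}
```

With this design, a site that only ever serves one query at a time opens a single connection, however large maxSize is set.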
There will be times when not all of the
data in the network is available. We aim
to minimize such outages and to provide
a list of all offline servers at the time
of a query. The DANA-WH system is designed
to scale well, from running entirely on
one computer to containing any number
of sites, each of which could employ up
to four servers (even more in a high availability
cluster scheme), each with different functionality.
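One way to picture routing a search to several sites while reporting the offline ones is the sketch below. It is an assumption-laden illustration rather than the DANA-WH routing servlet: the names QueryRouter, register, and Result are hypothetical, each site is modeled as a plain function from a query to a list of hits, and an unreachable site is signaled by an UncheckedIOException.

```java
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical router: sends a query to every registered site and notes failures. */
final class QueryRouter {

    /** Combined hits plus the names of sites that could not be reached. */
    static final class Result {
        final List<String> hits = new ArrayList<>();
        final List<String> offlineSites = new ArrayList<>();
    }

    // Site name -> search function; insertion order is preserved for stable output.
    private final Map<String, Function<String, List<String>>> sites = new LinkedHashMap<>();

    /** Adds a site; the network can grow from one site to any number. */
    void register(String siteName, Function<String, List<String>> search) {
        sites.put(siteName, search);
    }

    /** Queries every site; an unreachable site is listed instead of failing the search. */
    Result search(String query) {
        Result result = new Result();
        for (Map.Entry<String, Function<String, List<String>>> site : sites.entrySet()) {
            try {
                result.hits.addAll(site.getValue().apply(query));
            } catch (UncheckedIOException unreachable) {
                result.offlineSites.add(site.getKey());
            }
        }
        return result;
    }
}
```

The caller receives whatever data is available together with the list of offline sites, so a partial answer can be shown with a clear note about what is missing.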
Participants
The primary task of the institutions participating in DANA-WH is
archiving of data in textual and visual forms. The database tables for
different collections will be created by the participating institutions
in accordance with the DANA-WH schema. The system administrator at each database
location may grant write access to that database to qualified individuals.
Future tables can be created and data entered through either traditional methods
or our Java data entry application.