DANA-WH OPERATING STRUCTURE
By distributed database network, we mean a large data collection whose constituent data sets are under the control of separate Relational Database Management Systems (RDBMSs) running on independent, remote computer systems that are linked to the network. Each participating system has autonomous processing capability for local applications, but each can also participate in the execution of global (that is, network-wide) applications. The connections between the systems in the network are hidden, or transparent, to users, so the network appears to operate as a single data warehouse. At the same time, users can easily discover the contributing institution for data by checking the appropriate datafiles in DANA-WH.
DANA-WH Interoperability
To establish interoperability among the participating network databases, DANA-WH uses open standards such as Structured Query Language (SQL) and Extensible Markup Language (XML), combined with new Java technologies. DANA-WH participants are able to use whichever Java-compliant RDBMS they choose. This is more easily accomplished when vendor-specific commands are avoided, so the system uses generic SQL statements for table creation and for data insertion and retrieval. The current operational prototype works with Oracle 9i and PostgreSQL, and will soon work with IBM Informix, IBM DB2, Microsoft SQL Server, and other fully functional RDBMSs that offer JDBC data access.
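The generic-SQL approach described above can be sketched as follows. This is only an illustration, not the DANA-WH code: the `artifact` table, its columns, and the `PortableSql` class are hypothetical names, since the actual DANA-WH schema is not given here. The sketch restricts itself to basic SQL types and parameterized JDBC statements, so the same strings should be accepted by any RDBMS reached through a JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical table and column names; the real DANA-WH schema is not shown in the text.
final class PortableSql {
    // Only basic SQL types are used, with no vendor-specific extensions.
    static final String CREATE_TABLE =
        "CREATE TABLE artifact (id INTEGER PRIMARY KEY, "
        + "title VARCHAR(200), institution VARCHAR(100))";
    static final String INSERT_ROW =
        "INSERT INTO artifact (id, title, institution) VALUES (?, ?, ?)";
    static final String SELECT_BY_INSTITUTION =
        "SELECT id, title FROM artifact WHERE institution = ?";

    // Creates the table on whatever database the connection points to.
    static void createTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_TABLE);
        }
    }

    // Inserts one row; parameters are bound by the driver, not concatenated.
    static void insert(Connection conn, int id, String title, String institution)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_ROW)) {
            ps.setInt(1, id);
            ps.setString(2, title);
            ps.setString(3, institution);
            ps.executeUpdate();
        }
    }

    // Returns the titles contributed by one institution.
    static List<String> titlesFor(Connection conn, String institution)
            throws SQLException {
        List<String> titles = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SELECT_BY_INSTITUTION)) {
            ps.setString(1, institution);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}
```

Because the methods accept any `java.sql.Connection`, the choice of database is confined to the JDBC URL and driver used to open the connection.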
XML, which is a platform-independent and license-free text format, provides a means to format data in a way that facilitates data sharing between computer processes. As such, it provides standardization for Internet-distributed data. With the use of XML Schema or Document Type Definitions (DTDs), the data in an XML document are well described and easily shared. In DANA-WH, XML is used to archive and display the metadata associated with multimedia and textual objects in the archives.
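As a rough illustration of how such metadata might be read by another process, the sketch below parses a small XML record with the standard Java DOM parser. The `record`, `title`, `institution`, and `mediaType` element names and the `MetadataReader` class are assumptions made for the example; they are not taken from the DANA-WH schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical reader: element names are illustrative, not the DANA-WH schema.
final class MetadataReader {
    // Returns the child elements of the root as name -> text, in document order.
    static Map<String, String> read(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                // Skip whitespace text nodes and comments between elements.
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(node.getNodeName(), node.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse metadata record", e);
        }
    }
}
```

A record such as `<record><title>Stela fragment</title><institution>Example Museum</institution><mediaType>image</mediaType></record>` would yield a three-entry map keyed by element name. Validation against an XML Schema or DTD is omitted to keep the sketch short.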
Servlets
The DANA-WH network uses servlets to handle database queries and other communications with the remote servers. This allows us to load the JDBC drivers at the servlet level for the database systems under that servlet's control. The connection to the RDBMS is implemented through a connection pool that can be set to open connections only as needed. As a result, the pool will not use more connections than are actually required at any given time, which improves efficiency and cost-effectiveness by not tying up more licenses than needed. DANA-WH currently uses two levels of Java servlets: one level handles searching, and the other routes the searches to the correct systems.
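The "open connections only as needed" behavior can be sketched with a small lazy pool. This is a minimal illustration under stated assumptions, not the pool DANA-WH uses: `LazyPool` is a hypothetical class, and it is written generically so that the factory could wrap a call such as `DriverManager.getConnection(...)` or, as in a test, any other supplier.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical lazy pool: nothing is opened until a caller actually needs it.
final class LazyPool<T> {
    private final Supplier<T> factory;   // opens a new resource, e.g. a JDBC connection
    private final int maxSize;           // upper bound, e.g. the number of licenses
    private final Deque<T> idle = new ArrayDeque<>();
    private int created = 0;

    LazyPool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    // Reuses an idle resource if one exists; otherwise opens a new one,
    // up to maxSize. Fails fast instead of blocking when the pool is exhausted.
    synchronized T borrow() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (created >= maxSize) {
            throw new IllegalStateException("pool exhausted");
        }
        T resource = factory.get();
        created++;
        return resource;
    }

    // Returns a resource to the pool so that a later borrow() can reuse it.
    synchronized void release(T resource) {
        idle.push(resource);
    }

    // Number of resources opened so far; never exceeds maxSize.
    synchronized int created() {
        return created;
    }
}
```

Because a resource is created only when no idle one is available, the number of open connections tracks actual demand rather than the configured maximum. A production pool would also validate and close connections, which this sketch leaves out.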
There will be times when not all of the data in the network is available, but we hope to minimize this and to provide a list of all offline servers at the time of a query. The DANA-WH system is designed to scale well, from running entirely on one computer to containing any number of sites, each of which could employ up to four servers (even more in a high-availability cluster scheme), each with different functionality.
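One way to report offline servers alongside query results is sketched below. It is an assumption-laden illustration rather than the DANA-WH routing servlet: `QueryRouter`, `Outcome`, and the idea of modeling each site as a function from a query string to a list of results are all hypothetical, and a site is treated as offline whenever its search call throws a runtime exception.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical router: forwards a query to every site and records failures.
final class QueryRouter {
    // Combined results plus the names of sites that could not be reached.
    static final class Outcome {
        final List<String> results = new ArrayList<>();
        final List<String> offlineSites = new ArrayList<>();
    }

    // Site name -> search function, kept in registration order.
    private final Map<String, Function<String, List<String>>> sites =
        new LinkedHashMap<>();

    void register(String name, Function<String, List<String>> searcher) {
        sites.put(name, searcher);
    }

    // Queries each site in turn; a failing site does not abort the whole search.
    Outcome search(String query) {
        Outcome outcome = new Outcome();
        for (Map.Entry<String, Function<String, List<String>>> site : sites.entrySet()) {
            try {
                outcome.results.addAll(site.getValue().apply(query));
            } catch (RuntimeException e) {
                // Assumption: any runtime failure means the site is offline.
                outcome.offlineSites.add(site.getKey());
            }
        }
        return outcome;
    }
}
```

The caller receives whatever data was available together with the list of sites that did not respond, so partial results can be presented with an explicit note about what is missing.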
Participants
The primary task of the institutions participating in DANA-WH is the archiving of data in textual and visual forms. The database tables for different collections will be created by the participating institutions in accordance with the DANA-WH schema. The system administrator at each database location may grant write access to that database to qualified individuals. Future tables can be created and data entered through either traditional methods or our Java data entry application.