Implementation

Data Model
- Interface Definition
- Example
Page Specific Wrapper
Path Expression
Query Language
Example
Navigational Integration in Semistructured Documents

Data Model

Our data model is a minor variation on the object-oriented model designed for Web contents. To wrap WISs, our policy is as follows:

We regard a Web page as a set of complex objects. Thus, web pages of a common structure are instances of a class. We describe the schema of this class (= a set of complex objects) by an interface definition (ID).
Each ID directly describes a class of a simple structure like a normalized relation scheme, but it can express complex objects by using references to other IDs or OR-constructors. Path expressions are used for specifying data-values at desired positions in the object structure.
Further, IDs which directly wrap a WIS are given public methods which call the inherent service-calls of the UI or DA parts of the corresponding WIS.

Interface definition:
A general syntax of an ID is as follows:

Figure 1

IDName is the name of the ID, which is maintained in a repository database DatabaseName whose address is URLofProxyExecutor. An ID has a set of attributes of the form (A₁ DT₁, ... , A_i DT_i, ... , A_n DT_n) describing the data in a page, where A_i and DT_i are an attribute and its data type, respectively. Data types are used for modeling semistructures.

Methods have the form [static] methodName(DT₁ AR₁ , ... , DT_n AR_n), where methodName is the name of the method and DT_i is the data type of the argument AR_i. The keyword static says that the method is a class method in the sense of object-oriented languages; without it, the method is an instance method. There are two types of methods: public and private. The methods defined in the public part can be used in users' query commands. In contrast, the methods of the private part are implicitly invoked by the system. The concrete realization of the methods is written in Perl code in the implement part.

Data type: The following kinds of basic data types are represented in our data model:

Domain (denoted by DM). This represents a pool of atomic data-values (e.g., text, integer) that have exact data-formats and meanings. A domain is an atomic unit of metadata, used for resolving semantic conflicts between different sources.
listof IDName. It represents a list of object-ids which are instances of IDName. This data type is used for describing a nested structure of Web pages.
link IDName. It describes an external link to another page where IDName is the name of the top-level ID that wraps that page.
Methods: To access a Web page, the ID for the page must have a public static method. The arguments of such a method depend on the type of the Web page. In the case of a front page (= the first page, containing a client application of a WIS) whose URL is unique, the arguments of the method are used to update data in the page. In the case of a result page generated by a server, the arguments of the method are used as data-conditions to access that Web page. In addition, to interact with the system, a set of methods having pre-defined names must be given in the private part.
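
The components of an ID described above can be mirrored in a small data structure. The following is only an illustrative Python sketch, not the ID syntax of Figure 1 and not the Perl realization used by the system. The class names (Domain, ListOf, Link, Method, InterfaceDefinition), the helper entry_points, and the private method name parsePage are assumptions made for this sketch. The names Department, ISInformation, academicPage, Academic and getByKeyWord are taken from the examples on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# Hypothetical Python mirrors of the three kinds of data types.
@dataclass(frozen=True)
class Domain:
    name: str      # pool of atomic data-values, e.g. "text"


@dataclass(frozen=True)
class ListOf:
    id_name: str   # nested list of objects that are instances of another ID


@dataclass(frozen=True)
class Link:
    id_name: str   # external link to a page wrapped by another top-level ID


DataType = Union[Domain, ListOf, Link]


@dataclass
class Method:
    name: str
    arg_types: List[DataType] = field(default_factory=list)
    static: bool = False   # True: class method, False: instance method
    public: bool = True    # False: private, invoked implicitly by the system


@dataclass
class InterfaceDefinition:
    name: str
    database: str          # repository database name
    proxy_url: str         # address of the proxy executor
    attributes: Dict[str, DataType]
    methods: List[Method] = field(default_factory=list)

    def entry_points(self) -> List[Method]:
        """Public static methods, i.e. the ones that can open a page."""
        return [m for m in self.methods if m.public and m.static]


# Illustrative instance; "parsePage" is a made-up private method name.
department = InterfaceDefinition(
    name="Department",
    database="ISInformation",
    proxy_url="http://HOST2/cgi-bin/Proxy.pl",
    attributes={"academicPage": Link("Academic")},
    methods=[
        Method("getByKeyWord", [Domain("text")], static=True, public=True),
        Method("parsePage", static=False, public=False),
    ],
)
```

In this sketch, department.entry_points() returns only getByKeyWord, which corresponds to the requirement that a page be reachable through a public static method.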

Example:

An example of location-dependent WIS
An example of a document WIS of a department of a university

Page-specific wrapper:
Before describing path expressions, we explain here how the common data model is used by wrappers.

For each page, there must be a page-specific wrapper that coordinates between a proxy executor (PE) and the IDs that wrap the page. It works as follows:

When the page-specific wrapper receives a query command from the PE, it invokes associated public methods defined in the IDs to get a result page.
Thereafter, it parses the result page, finds data from the page and creates objects from the data according to the definition given in the IDs.
Next, it embeds data-tags into the page, and returns both the data-objects and the (HTML) page with data-tags to the PE. Using both of these results, the PE can generate derived links in the resulting HTML page, which is passed to the user's browser.
By following this procedure, the PE needs to execute only a subpart of the derived links in the current page, and so it can materialize the navigational integration one step at a time.
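
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the actual wrapper: the page is parsed with a single regular expression, the data-tag is rendered as a span element with a data-tag attribute (the real tag format is not given on this page), and fetch_page is a hypothetical stand-in for invoking a public method of an ID.

```python
import re


def wrap_page(html, attribute, pattern):
    """Parse a page, build one object per match, and embed data-tags.

    `pattern` is a regex whose group 1 captures a data-value. The
    <span data-tag="..."> markup is an assumption of this sketch.
    """
    objects = []

    def embed(match):
        value = match.group(1)
        objects.append({attribute: value})
        tagged = '<span data-tag="%s:%d">%s</span>' % (
            attribute, len(objects) - 1, value)
        # Splice the tagged value back in at the position of group 1.
        whole = match.group(0)
        start, end = match.span(1)
        offset = match.start()
        return whole[:start - offset] + tagged + whole[end - offset:]

    tagged_html = re.sub(pattern, embed, html)
    return objects, tagged_html


def handle_query(fetch_page, attribute, pattern):
    """Steps of the wrapper: get the result page, then parse and tag it."""
    html = fetch_page()  # stands in for calling a public method of an ID
    return wrap_page(html, attribute, pattern)


# Illustrative result page with made-up laboratory names.
sample = "<ul><li>Database Lab</li><li>Network Lab</li></ul>"
objects, tagged = handle_query(lambda: sample, "lab", r"<li>([^<]+)</li>")
```

Both return values are needed by the PE: the objects supply argument values for the next condition, and the tagged page tells it where a derived link can be placed.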

Path Expression:
A data-entry on a Web page can be specified by a path expression. It has the form
t₀->{A₁}-> ... ->{A_i}=>IDName ->{A_j} ->... ->{A_n}
where t₀ is an object variable and {A_i} is an attribute. An arrow (->) denotes traversing down an edge in the nested structure of a page. A double arrow (=>), followed by the name of the ID that wraps the next result-page, denotes traversing an external link between different pages. As an example, let D be an object variable of an ID Department. Then the following path expression
D->{academicPage}=>Academic->{laboratories}->{staffs}->{fullName}
is interpreted as retrieving the full names of all staff members of all laboratories in the academic pages linked from a given instance of the Department page.
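
To make the reading of the two arrows concrete, the following Python sketch parses a path expression into steps and evaluates it over nested data. It is an illustration only. The function names parse_path and evaluate are hypothetical, pages are modelled as nested dicts and lists, and a link step is assumed to have been resolved already, so that the linked page is stored in place of the link. The staff names are placeholders.

```python
import re

# One step: "->{attr}" walks down a nested edge, "=>IDName" follows a link.
_STEP = re.compile(r"\s*(->|=>)\s*(?:\{(\w+)\}|(\w+))")


def parse_path(expr):
    """Split a path expression into (object variable, list of steps)."""
    expr = expr.strip()
    head = re.match(r"\w+", expr)
    if head is None:
        raise ValueError("path must start with an object variable")
    steps = []
    pos = head.end()
    while pos < len(expr):
        m = _STEP.match(expr, pos)
        if m is None:
            raise ValueError("bad path expression near %r" % expr[pos:])
        arrow, attr, id_name = m.groups()
        if arrow == "->" and attr is not None:
            steps.append(("attr", attr))
        elif arrow == "=>" and id_name is not None:
            steps.append(("link", id_name))
        else:
            raise ValueError("arrow and operand do not match near %r"
                             % expr[pos:])
        pos = m.end()
    return head.group(0), steps


def evaluate(steps, obj):
    """Collect every value reached by the steps; lists fan out."""
    current = [obj]
    for kind, name in steps:
        if kind == "link":
            continue  # assumption: the linked page is already loaded in place
        reached = []
        for item in current:
            value = item[name]
            reached.extend(value if isinstance(value, list) else [value])
        current = reached
    return current


variable, steps = parse_path(
    "D->{academicPage}=>Academic->{laboratories}->{staffs}->{fullName}")

# Placeholder data standing in for a Department page and its Academic page.
department_page = {
    "academicPage": {
        "laboratories": [
            {"staffs": [{"fullName": "Staff A"}, {"fullName": "Staff B"}]},
            {"staffs": [{"fullName": "Staff C"}]},
        ]
    }
}
names = evaluate(steps, department_page)
```

Because every list-valued attribute fans out, the single expression reaches the full names of all staff members in all laboratories, which matches the interpretation given above.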

Query Language
The navigational integration is defined by using a query command. Here we describe our query language and then how to generate derived links based on the query command. For simplicity, we assume that only atomic domains are used in queries.

Our query language is a SQL-like language. It has a syntax as follows:

Figure 2
where:

OV_i is an object variable;
ID_i is a name of an interface definition that wraps WIS_i;
DB_i is a repository database name;
PE_i is the URL of a proxy executor; and
C_i is a condition to retrieve a Web page from WIS_i.
Each C_i has the form ID_i->methodName_i(arg_(i-1)1,...,arg_(i-1)j), where the arguments (arg_(i-1)1, ... ,arg_(i-1)j) are constants or path expressions specifying data of the previous WIS (WIS_(i-1)).
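
As an illustration of this form, the following Python sketch splits a condition into its ID name, method name and arguments, and labels each argument as a constant or a path expression. The function name parse_condition is hypothetical. The sketch assumes that constants are quoted strings containing no commas and that every unquoted argument is a path expression, which is a simplification of whatever the actual query processor accepts.

```python
import re

# ID -> methodName ( arguments )
_CONDITION = re.compile(r"\s*(\w+)\s*->\s*(\w+)\s*\((.*)\)\s*$", re.S)


def parse_condition(text):
    """Return (ID name, method name, [(kind, value), ...])."""
    m = _CONDITION.match(text)
    if m is None:
        raise ValueError("not of the form ID->method(args): %r" % text)
    id_name, method, raw_args = m.groups()
    args = []
    for part in raw_args.split(","):
        part = part.strip()
        if not part:
            continue  # empty argument list
        if part[0] in "'\"":
            # Assumption: a quoted argument is a constant; drop the quotes.
            args.append(("const", part[1:-1]))
        else:
            # Assumption: anything else is a path expression.
            args.append(("path", part))
    return id_name, method, args


first = parse_condition("BldGuideF->getBldForm('','')")
second = parse_condition(
    "Department->getByKeyWord(B->{submit}=>FloorLabInfo->{lab})")
```

The path arguments are the ones that tie a condition to the previous page, since their values have to be looked up there.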

Derived links:
For a given C_i as described above, there must be a derived link from the current page (P_(i-1)) to a new page (P_i), which will be generated from ID_i by the expression ID_i->methodName_i(arg_(i-1)1,...,arg_(i-1)j). Because the data-values of the arguments arg_(i-1)1,...,arg_(i-1)j are found in the current page P_(i-1), we can create a derived link from P_(i-1) to P_i once the source position of the link is decided. The destination of the link is the new result-page itself.

Currently, the source position of a derived link is set to the position of the argument arg_(i-1)x that is the rightmost leaf at the deepest level among the arguments arg_(i-1)1,...,arg_(i-1)j when the current page P_(i-1) is regarded as a tree structure. In retrospect, it would be better to add a location indicator to the path expression.

The positions can be determined by searching the data-tags embedded by the page-specific wrapper.
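
One possible reading of this rule can be sketched in Python as follows. The sketch assumes that the position of each argument has already been recovered from the data-tags as a tuple of child indices from the root of the page tree. Both this representation and the name source_position are assumptions of the sketch. "Rightmost at the deepest level" is read here as: take the arguments with the longest index tuple, and among them the one that comes last in left-to-right order.

```python
def source_position(arg_positions):
    """Pick the argument where the derived link is anchored.

    `arg_positions` maps an argument name to its position in the page
    tree, given as a tuple of child indices from the root.
    """
    if not arg_positions:
        raise ValueError("a condition without arguments has no source")
    # Deepest level first (longest tuple); ties are broken by taking the
    # largest tuple in lexicographic order, i.e. the rightmost position.
    return max(arg_positions,
               key=lambda arg: (len(arg_positions[arg]), arg_positions[arg]))


# Hypothetical positions of three arguments found in the current page.
positions = {
    "floor": (0, 1),
    "room": (0, 1, 0),
    "lab": (0, 1, 2),
}
anchor = source_position(positions)
```

Here lab and room are both at depth three, and lab lies further to the right, so the link would be attached at lab.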

Example
Now, let us show an example of navigational integration by a query. Suppose a mobile user wants to find information on the laboratories, and on the staff of each laboratory, on each floor, starting from the client application shown in this example. To do so, he wants to define the navigational integration over this WIS (denoted by WIS1), a document WWW server of the department (shown in this example, denoted by WIS2), and the Altavista search engine (denoted by WIS3). The query command is as follows:

from B in BldGuideF, D in Department, A in AltaSearch
source BldGuideF of ISBuilding on http://HOST1/cgi-bin/Proxy.pl,
       Department of ISInformation on http://HOST2/cgi-bin/Proxy.pl,
       AltaSearch of NewAltaVista on http://HOST3/cgi-bin/Proxy.pl
where BldGuideF->getBldForm('','')
  and Department->getByKeyWord(B->{submit}=>FloorLabInfo->{lab})
  and AltaSearch->getByKey(D->{laboratories}->{staffs}->{fullName})

Figure 3

Figure 3 shows how these three WISs work under this query of navigational integration. Let us explain how the materialization proceeds under this query:

First, the original front page of WIS1 is created through the wrapper by running BldGuideF(). To evaluate the second where-condition in the above query, the proxy executor must first run the service invocation corresponding to the submit button. In the other cases, it simply proceeds to the next page. In the page at the upper-right corner of Fig. 3, derived links are embedded at the laboratory names in the result pages of WIS1. Each of these links finds a department page containing the required laboratory, bypassing the front page of WIS2. As the user clicks and the materialization proceeds, derived links (= remaining subqueries) are embedded at the staff names in the result page of WIS2. These links invoke result pages of the Altavista search engine without the staff name having to be entered into the front page of Altavista. (The result page from Altavista is not shown in this figure.)

Navigational Integration in Semistructured Documents
The details of this topic can be found at this link.

Last modified: Mon Apr 24 19:22:32 JST 2000