Implementation


Data Model

Our data model is a minor variation on the object-oriented model designed for Web contents. To wrap WISs, our policy is as follows:
  1. We regard a Web page as a set of complex objects. Thus, web pages of a common structure are instances of a class. We describe the schema of this class (= a set of complex objects) by an interface definition (ID).
  2. Each ID directly describes a class of a simple structure like a normalized relation scheme, but it can express complex objects by using references to other IDs or OR-constructors. Path expressions are used for specifying data-values at desired positions in the object structure.
  3. Further, IDs which directly wrap a WIS are given public methods which call the inherent service-calls of the UI or DA parts of the corresponding WIS.

Interface definition:

A general syntax of an ID is as follows:

Figure 1

IDName is the name of the ID, which is maintained in the repository database DatabaseName at the address URLofProxyExecutor. An ID has a set of attributes of the form (A1 DT1, ... , Ai DTi, ... , An DTn) describing the data in a page, where Ai is an attribute and DTi is its data type. Data types are used for modeling semistructured data.

Methods have the form [static] methodName(DT1 AR1, ... , DTn ARn), where methodName is the name of the method and DTi is the data type of the argument ARi. The keyword static marks the method as a class method in the sense of object-oriented languages; without it, the method is an instance method. There are two kinds of methods: public and private. Methods defined in the public part can be used in users' query commands. In contrast, methods in the private part are invoked implicitly by the system. The concrete realization of the methods is written in Perl code in the implement part.

Data type: The following kinds of basic data types are represented in our data model:
  1. Domain (denoted by DM). This represents a pool of atomic data-values (e.g., text, integer) that have exact data formats and meanings. A domain is an atomic unit of metadata, used for resolving semantic conflicts between different sources.
  2. listof IDName. It represents a list of object-ids which are instances of IDName. This data type is used for describing a nested structure of Web pages.
  3. link IDName. It describes an external link to another page, where IDName is the name of the top-level ID that wraps that page.

Methods: To access a Web page, the ID for the page must have a public static method. The arguments of such a method depend on the type of the Web page. For a front page (= the first page, containing a client application of a WIS), whose URL is unique, the arguments of the method are used to update data in the page. For a result page, which is generated by a server, the arguments of the method are used as data conditions to access that Web page. In addition, to interact with the system, a set of methods with pre-defined names must be given in the private part.
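
The parts of an ID described above (attributes, the three kinds of data types, and public/private methods) can be pictured as a small data structure. The following is only an illustrative sketch in Python, not the system's actual representation (the original methods are realized in Perl). The class and field names, the parameter name keyword, and the assumption that getByKeyWord is a static method are all hypothetical; the ID, database, and attribute names are taken from the example query in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Domain:
    """Atomic data type: a pool of values with a fixed format and meaning."""
    name: str


@dataclass(frozen=True)
class ListOf:
    """listof IDName: a list of object-ids that are instances of IDName."""
    id_name: str


@dataclass(frozen=True)
class Link:
    """link IDName: an external link to a page wrapped by the ID IDName."""
    id_name: str


DataType = Union[Domain, ListOf, Link]


@dataclass(frozen=True)
class Method:
    name: str
    params: Dict[str, DataType]      # argument name -> data type
    static: bool = False             # True for a class method


@dataclass
class InterfaceDefinition:
    name: str
    database: str                    # repository database name
    proxy_url: str                   # URL of the proxy executor
    attributes: Dict[str, DataType] = field(default_factory=dict)
    public: List[Method] = field(default_factory=list)
    private: List[Method] = field(default_factory=list)

    def entry_methods(self) -> List[Method]:
        """Public static methods, i.e. those usable to access the page."""
        return [m for m in self.public if m.static]

    def is_accessible(self) -> bool:
        """An ID can be used to access a page only if it has such a method."""
        return bool(self.entry_methods())


# Hypothetical instance; 'keyword' and the 'text' domain are assumptions.
department = InterfaceDefinition(
    name="Department",
    database="ISInformation",
    proxy_url="http://HOST2/cgi-bin/Proxy.pl",
    attributes={"academicPage": Link("Academic")},
    public=[Method("getByKeyWord", {"keyword": Domain("text")}, static=True)],
)
```

In this sketch, is_accessible() simply encodes the rule that an ID needs at least one public static method before a query can reach its page.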

Example:


Page-specific wrapper:

Before describing path expressions, we explain here how the common data model is used by wrappers.

For each page, there must be a page-specific wrapper that coordinates between a proxy executor (PE) and the IDs that wrap the page. It works as follows: By following this procedure, the PE can execute only a subpart of the derived links in the current page, and so it can materialize the navigational integration one step at a time.

Path Expression:

A data-entry on a Web page can be specified by a path expression. It has the form

t0->{A1}-> ... ->{Ai}=>IDName ->{Aj} ->... ->{An}
where t0 is an object variable and {Ai} is an attribute. An arrow (->) denotes traversing down an edge in the nested structure of a page. A double arrow (=>), followed by the name of the ID that wraps the next result page, denotes traversing an external link between different pages. As an example, let D be an object variable of the ID Department. Then the path expression

D->{academicPage}=>Academic->{laboratories}->{staffs}->{fullName}
retrieves the full names of all staff members listed in all academic pages reachable from a given instance of the Department page.
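
To make the reading of path expressions concrete, here is a minimal Python sketch that parses an expression into steps and evaluates it. It assumes, purely for illustration, that a page is modeled as nested dictionaries and lists, and that external links are resolved by a caller-supplied function follow_link; neither is claimed to be how the actual system works. The page contents below are made-up placeholders.

```python
import re
from typing import Any, Callable, List, Tuple

Step = Tuple[str, str]  # ("attr", attribute name) or ("link", ID name)

_VAR = re.compile(r"\s*(\w+)")
_STEP = re.compile(r"\s*(?:->\s*\{(\w+)\}|=>\s*(\w+))")


def parse_path(expr: str) -> Tuple[str, List[Step]]:
    """Split 'D->{a}=>ID->{b}' into the object variable and its steps."""
    m = _VAR.match(expr)
    if not m:
        raise ValueError("missing object variable: %r" % expr)
    var, pos = m.group(1), m.end()
    steps: List[Step] = []
    while expr[pos:].strip():
        s = _STEP.match(expr, pos)
        if not s:
            raise ValueError("bad step at offset %d in %r" % (pos, expr))
        if s.group(1) is not None:
            steps.append(("attr", s.group(1)))   # -> {A}: go down one edge
        else:
            steps.append(("link", s.group(2)))   # => ID: follow external link
        pos = s.end()
    return var, steps


def evaluate(steps: List[Step], root: Any,
             follow_link: Callable[[str, Any], Any]) -> List[Any]:
    """Apply the steps to root; list values fan out to all their members."""
    nodes = [root]
    for kind, name in steps:
        nxt: List[Any] = []
        for node in nodes:
            value = node[name] if kind == "attr" else follow_link(name, node)
            if isinstance(value, list):
                nxt.extend(value)
            else:
                nxt.append(value)
        nodes = nxt
    return nodes


# Placeholder data standing in for a Department page and its academic page.
pages = {"academic.html": {"laboratories": [
    {"staffs": [{"fullName": "A. Example"}, {"fullName": "B. Example"}]},
    {"staffs": [{"fullName": "C. Example"}]},
]}}
department = {"academicPage": "academic.html"}

var, steps = parse_path(
    "D->{academicPage}=>Academic->{laboratories}->{staffs}->{fullName}")
names = evaluate(steps, department, lambda id_name, url: pages[url])
```

With this placeholder data, names holds the three full names in page order, one per staff entry across both laboratories.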

Query Language

The navigational integration is defined by using a query command. Here we describe our query language and then how to generate derived links based on the query command. For simplicity, we assume that only atomic domains are used in queries.

Our query language is a SQL-like language. It has a syntax as follows:


Figure 2

where:
  1. OVi is an object variable;
  2. IDi is the name of an interface definition that wraps WISi;
  3. DBi is a repository database name;
  4. PEi is the URL of a proxy executor; and
  5. Ci is a condition to retrieve a Web page from WISi.
Ci has the form IDi->methodNamei(arg(i-1)1,...,arg(i-1)j), where the arguments (arg(i-1)1, ... ,arg(i-1)j) are constants or path expressions specifying data of the previous WIS (WISi-1).
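
As an illustration of this form, the sketch below splits a condition string into its ID name, method name, and arguments, and labels each argument as a constant or a path expression. It is a simplified reading under stated assumptions: constants are taken to be quoted strings (or anything without an arrow), commas are assumed not to occur inside constants, and the type and function names are hypothetical. The sample strings are conditions from the example query in this document.

```python
import re
from typing import List, NamedTuple, Tuple


class Condition(NamedTuple):
    id_name: str
    method: str
    args: List[Tuple[str, str]]   # (kind, text); kind is "const" or "path"


_COND = re.compile(r"^\s*(\w+)\s*->\s*(\w+)\s*\((.*)\)\s*$")


def parse_condition(text: str) -> Condition:
    """Parse 'ID->method(arg, ...)' into its three parts."""
    m = _COND.match(text)
    if not m:
        raise ValueError("not a condition: %r" % text)
    id_name, method, raw = m.groups()
    args: List[Tuple[str, str]] = []
    if raw.strip():
        # Naive split: assumes no comma appears inside a quoted constant.
        for piece in raw.split(","):
            piece = piece.strip()
            if len(piece) >= 2 and piece[0] in "'\"" and piece[-1] == piece[0]:
                args.append(("const", piece[1:-1]))   # quoted constant
            elif "->" in piece:
                args.append(("path", piece))          # path expression
            else:
                args.append(("const", piece))         # bare constant
    return Condition(id_name, method, args)


first = parse_condition("BldGuideF->getBldForm('','')")
second = parse_condition(
    "Department->getByKeyWord(B->{submit}=>FloorLabInfo->{lab})")
```

Here first carries two empty-string constants, while second carries a single path expression whose object variable B refers to the previous WIS.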

Derived links:

For a given Ci mentioned above, there must be a derived link from the current page (P(i-1)) to a new page (Pi) which will be generated from IDi by the expression IDi->methodNamei(arg(i-1)1,...,arg(i-1)j). Because data-values of the arguments arg(i-1)1,...,arg(i-1)j are found in the current page P(i-1), we can create a derived link from P(i-1) to Pi if the source position of the link is decided. The destination of the link is a new result-page itself.

Currently, the source position of a derived link is set to the position of the argument arg(i-1)x such that arg(i-1)x is the rightmost leaf at the deepest level among the arguments arg(i-1)1,...,arg(i-1)j, when the current page P(i-1) is regarded as a tree structure. In retrospect, it would be better to add a location indicator to the path expression.
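
One way to read this rule is sketched below in Python. It assumes, for illustration only, that the position of each argument in the page tree is given as a tuple of child indices from the root, so that the tuple length is the depth and, at equal depth, a lexicographically larger tuple lies further to the right. The function name and the sample positions are hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[int, ...]   # child indices from the root of the page tree


def source_position(arg_positions: Dict[str, Position]) -> str:
    """Return the argument that is the rightmost leaf at the deepest level."""
    if not arg_positions:
        raise ValueError("a derived link needs at least one argument")
    # Compare by depth first, then by left-to-right order at that depth.
    return max(arg_positions,
               key=lambda a: (len(arg_positions[a]), arg_positions[a]))


# Hypothetical positions of three arguments in the current page.
chosen = source_position({
    "arg1": (0, 1),        # depth 2
    "arg2": (1, 0, 2),     # depth 3
    "arg3": (1, 2, 0),     # depth 3, to the right of arg2
})
```

In this sample, arg2 and arg3 share the deepest level and arg3 lies further right, so chosen is "arg3".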

The positions can be determined by searching the data-tags embedded by the page-specific wrapper.

Example

Now, let us show an example of navigational integration by a query. Suppose a mobile user wants to find information about the laboratories on each floor and the staff of each laboratory, starting from the client application of a building-guide WIS (denoted by WIS1). To do so, the user defines a navigational integration over this WIS, a document WWW server of the department (denoted by WIS2), and the AltaVista search engine (denoted by WIS3). The query command is as follows:

 from B in BldGuideF, D in Department, A in AltaSearch
 source BldGuideF of ISBuilding on http://HOST1/cgi-bin/Proxy.pl,
        Department of ISInformation on http://HOST2/cgi-bin/Proxy.pl,
        AltaSearch of NewAltaVista on http://HOST3/cgi-bin/Proxy.pl
 where BldGuideF->getBldForm('','')
 and   Department->getByKeyWord(B->{submit}=>FloorLabInfo->{lab})
 and   AltaSearch->getByKey(D->{laboratories}->{staffs}->{fullName})


Figure 3


Figure 3 shows how these three WISs work under this query of navigational integration. Let us explain how the materialization proceeds under this query:

First, the original front page of WIS1 is created through the wrapper by running BldGuideF(). When evaluating the second where-condition in the above query, the proxy executor must first run the service invocation corresponding to the submit button; in the other cases, it simply proceeds to the next page. In the page at the upper-right corner of Fig. 3, derived links are embedded at the laboratory names in the result pages of WIS1. Each such link finds a department page containing the required laboratory, bypassing the front page of WIS2. As the user clicks and the materialization proceeds, derived links (= the remaining subqueries) are embedded at the staff names in the result page of WIS2. These links invoke result pages of the AltaVista search engine without the staff name having to be entered into the AltaVista front page. (The result page from AltaVista is not shown in the figure.)

Navigational Integration in Semistructured Documents

The details of this topic can be found at this link.
Last modified: Mon Apr 24 19:22:32 JST 2000