Navigational Integration in Semistructured Documents

As described under data types, our data model has three basic data types:
  1. Domain,
  2. listof IDName, and
  3. link IDName.
In this section, we complete the navigational integration for WISs whose Web document schemas have three properties:
  1. A heterogeneous domain. This is used for data defined as a string that can be represented in several domains. For example, a contact address may be given as an e-mail address in one place and as a room number in another.
  2. A heterogeneous structure. This is used for data defined by alternative complex structures. For example, staff data may be represented by a teacher object in one place and by a visitor object in another.
  3. A concatenation domain. This is used for data that is composed of several atomic domains but must be used as one unit. For example, a name is composed of a first name and a last name.
To support the semistructured nature of Web data, we further use two operators: '|' and ','. The former is an OR constructor and the latter denotes a concatenated sequence of appearance. Applying these operators to the three basic data types above gives five further data types:
  1. ($DM_1 | DM_2 | ...$): describes an element whose data-value can be represented by different atomic domains (termed a heterogeneous domain).
  2. ($DM_1, DM_2, ...$): describes an element whose data-value is composed of multiple atomic data-values but must be declared as a single data-value (termed a concatenation domain).
  3. listof ($IDName_1 | IDName_2 | ...$): a set of objects, each of which belongs to $IDName_1$, or $IDName_2$, ... (termed a heterogeneous structure).
  4. ($DM_1 | DM_2 | ... |$ listof ($IDName_1 | IDName_2 | ...$)): describes an element that may be either a heterogeneous domain or a heterogeneous structure.
  5. link ($IDName_1 | IDName_2 | ...$): describes an element that links to an external page, which may have any one of several structures.
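
The five constructed types above can be modelled as a small set of value classes. The following is only a sketch: every class name, the `render` helper, and the sample domain and ID names (`email`, `Teacher`, `Administrative`, and so on) are hypothetical and are not part of the data model's actual definition.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# All names here are illustrative assumptions, not an API defined by the source.

@dataclass(frozen=True)
class Domain:
    name: str                          # an atomic domain, e.g. "email"

@dataclass(frozen=True)
class Concat:
    parts: Tuple[Domain, ...]          # (DM1, DM2, ...): several values used as one

@dataclass(frozen=True)
class ListOf:
    id_names: Tuple[str, ...]          # listof (IDName1 | IDName2 | ...)

@dataclass(frozen=True)
class Link:
    id_names: Tuple[str, ...]          # link (IDName1 | IDName2 | ...)

@dataclass(frozen=True)
class OneOf:
    alternatives: Tuple[Union[Domain, ListOf], ...]   # the '|' constructor

def render(t) -> str:
    """Render a type back into the notation used in the text."""
    if isinstance(t, Domain):
        return t.name
    if isinstance(t, Concat):
        return "(" + ", ".join(render(p) for p in t.parts) + ")"
    if isinstance(t, ListOf):
        return "listof (" + " | ".join(t.id_names) + ")"
    if isinstance(t, Link):
        return "link (" + " | ".join(t.id_names) + ")"
    if isinstance(t, OneOf):
        return "(" + " | ".join(render(a) for a in t.alternatives) + ")"
    raise TypeError(f"unknown type: {t!r}")

contact = OneOf((Domain("email"), Domain("roomNumber")))           # type 1
full_name = Concat((Domain("firstName"), Domain("lastName")))      # type 2
staff = ListOf(("Teacher", "Visitor"))                             # type 3
mixed = OneOf((Domain("email"), ListOf(("Teacher", "Visitor"))))   # type 4
page = Link(("Academic", "Administrative"))                        # type 5

for t in (contact, full_name, staff, mixed, page):
    print(render(t))
```

Keeping the '|' and ',' constructors as separate classes means a query processor can tell a choice between domains apart from a concatenation without re-parsing the notation.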
An example of a document with the above properties is shown in Figure 1. Its interface definitions are shown in Figure 2.



Figure 1





Figure 2


Path Expression:

A data-entry on a Web page can be specified by a path expression. It has the form

t0->{A1}-> ... ->{Ai}=>IDName ->{Aj}->[<Domain>]... ->{An}
where t0 is an object variable and each {Ai} is an attribute. An arrow (->) denotes traversing down an edge in the nested structure of a page. A double arrow (=>) followed by an ID name denotes traversing an external link to another page; the ID name is the one that wraps the resulting page. The square brackets mark <Domain> as optional; when present, it restricts the result to data whose domain is Domain.
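
To make the syntax concrete, here is a minimal tokenizer sketch for such path expressions. It is an illustration, not the system's parser: the function name `parse_path` and the step labels (`var`, `attr`, `link`, `domain`) are hypothetical, and it assumes the domain filter is written directly after an attribute, as in `->{A}<DM1>`.

```python
import re
from typing import List, Tuple

# One step of a path: "=> IDName", "-> {Attr}", or "<Domain>".
TOKEN = re.compile(r"""
    \s*(?:
        (?P<link>=>)\s*(?P<id>\w+)          # follow an external link to a page
      | (?P<arrow>->)\s*\{(?P<attr>\w+)\}   # descend one edge to an attribute
      | <(?P<domain>\w+)>                   # optional domain filter
    )""", re.VERBOSE)

def parse_path(expr: str) -> List[Tuple[str, str]]:
    """Split a path expression into (kind, name) steps."""
    expr = expr.strip()
    head = re.match(r"\w+", expr)
    if head is None:
        raise ValueError("a path must start with an object variable")
    steps = [("var", head.group())]
    pos = head.end()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
        if m.group("link"):
            steps.append(("link", m.group("id")))
        elif m.group("arrow"):
            steps.append(("attr", m.group("attr")))
        else:
            steps.append(("domain", m.group("domain")))
        pos = m.end()
    return steps

print(parse_path("t0->{A1}=>IDName->{A2}<Domain>"))
# [('var', 't0'), ('attr', 'A1'), ('link', 'IDName'), ('attr', 'A2'), ('domain', 'Domain')]
```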

As an example, let D be an object variable of the ID Department. Then the following path expression

D->{academicPage}=>Academic->{laboratories}->{staffs}->{fullName}
is interpreted as retrieving the full names of all staff members in all laboratories on all academic pages linked from a given instance of a Department page.
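
The traversal can be sketched over toy data. This is an assumption-laden illustration: pages are modelled as nested Python dicts, linked pages are assumed to be already fetched and embedded under the linking attribute, the `"id"` key standing for the wrapping ID name is hypothetical, and the staff names are made up.

```python
from typing import Any, Iterable, List, Tuple

Step = Tuple[str, str]   # ("attr", name) or ("link", IDName)

def evaluate(root: Any, steps: Iterable[Step]) -> List[Any]:
    """Follow each step from every current node, fanning out over lists."""
    current = [root]
    for kind, name in steps:
        nxt: List[Any] = []
        for node in current:
            if not isinstance(node, dict):
                continue                      # atomic values have no edges
            if kind == "attr":
                value = node.get(name)
                if value is None:
                    continue
                # a list-valued attribute contributes all of its members
                nxt.extend(value if isinstance(value, list) else [value])
            elif kind == "link":
                # keep only pages wrapped by the expected ID name
                if node.get("id") == name:
                    nxt.append(node)
            else:
                raise ValueError(f"unsupported step kind: {kind}")
        current = nxt
    return current

# Made-up instance of a Department page with its linked pages embedded.
department = {
    "id": "Department",
    "academicPage": [
        {"id": "Academic",
         "laboratories": [
             {"staffs": [{"fullName": "staff-1"}, {"fullName": "staff-2"}]},
             {"staffs": [{"fullName": "staff-3"}]},
         ]},
        {"id": "Visitor", "laboratories": []},   # different wrapper: skipped
    ],
}

steps = [("attr", "academicPage"), ("link", "Academic"),
         ("attr", "laboratories"), ("attr", "staffs"), ("attr", "fullName")]
print(evaluate(department, steps))   # ['staff-1', 'staff-2', 'staff-3']
```

A real system would fetch the linked page at each link step and wrap it before continuing; embedding the pages keeps the sketch self-contained.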

Query Command

In such a general case, the condition expression can be written in disjunctive normal form (DNF) over atomic condition expressions. That is, the WHERE clause of such a general query has the form:

CT1 or CT2 or ... or CTm
where CTi (1 <= i <= m) is one of the m patterns in the set of derived links for navigating over the sequence WIS1, WIS2, ..., WISn. Namely, CTi is a condition expression formed by ANDing the conditions Cij, such that

CTi = Ci1 and Ci2 and ... and Cin
where Ci1 is the condition for accessing the first (front) page, and Cij (2 <= j <= n) is the condition specifying a derived link from WIS(j-1) to WISj.
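
Read operationally, a WHERE clause in this form holds when at least one conjunction CTi has all of its conditions satisfied. The sketch below shows only that evaluation rule; the function `holds`, the three sample conditions, and the binding keys `url` and `contact` are hypothetical.

```python
from typing import Callable, Dict, List

Condition = Callable[[Dict[str, object]], bool]

def holds(where: List[List[Condition]], binding: Dict[str, object]) -> bool:
    """True if at least one conjunction has all of its conditions satisfied."""
    return any(all(cond(binding) for cond in conjunction) for conjunction in where)

# Hypothetical atomic conditions over a toy binding of values.
def front_page_ok(b): return b["url"] == "front"
def link_by_email(b): return "@" in str(b["contact"])
def link_by_room(b): return str(b["contact"]).isdigit()

where = [
    [front_page_ok, link_by_email],   # CT1 = C11 and C12
    [front_page_ok, link_by_room],    # CT2 = C21 and C22
]

print(holds(where, {"url": "front", "contact": "305"}))   # True
print(holds(where, {"url": "front", "contact": "n/a"}))   # False
```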

Example:

Let the data type of an attribute A in WIS1 be a heterogeneous type defined as (DM1 | DM2). Consider the condition expression for a derived link to another source, WIS2, that depends on both A and other data-values in WIS1. It can be written as follows:

 where  ...
   and  ...
   and  (WIS2->methodOfWIS2(WIS1->{A}<DM1>,
                            WIS1->{B}->{D}->{E},
                            WIS1->{B}->{D}->{F})
         or
         WIS2->methodOfWIS2(WIS1->{A}<DM2>,
                            WIS1->{B}->{D}->{E},
                            WIS1->{B}->{D}->{F}))
   and  ...
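
The expansion in this example is mechanical: one call per alternative domain, joined by "or". The sketch below generates that text under stated assumptions; the helper `expand_alternatives` and its argument layout are hypothetical, and treating several heterogeneous arguments as a Cartesian product of their alternatives is an assumption that goes beyond the single-attribute example.

```python
from itertools import product
from typing import Dict, List

def expand_alternatives(method: str, args: Dict[str, List[str]]) -> str:
    """Build one call per combination of domain alternatives, joined by 'or'.

    args maps each argument path to its candidate domains;
    an empty list means the argument carries no domain filter.
    """
    paths = list(args)                                # dicts keep insertion order
    choices = [args[p] or [""] for p in paths]
    calls = []
    for combo in product(*choices):
        rendered = [p + (f"<{d}>" if d else "") for p, d in zip(paths, combo)]
        calls.append(f"{method}({', '.join(rendered)})")
    return "(" + " or ".join(calls) + ")"

print(expand_alternatives("WIS2->methodOfWIS2", {
    "WIS1->{A}": ["DM1", "DM2"],        # heterogeneous type (DM1 | DM2)
    "WIS1->{B}->{D}->{E}": [],
    "WIS1->{B}->{D}->{F}": [],
}))
```

With two heterogeneous arguments of two domains each, the same call would produce four alternatives, which shows how quickly the DNF grows.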

In this way, path expressions let us write any such query as a DNF of condition expressions on atomic domains.
Last modified: Mon Apr 24 22:02:02 JST 2000