Document WIS of a department of a university
The figure below shows another example: the homepage of a university
department, including a front page and its result pages. This example
illustrates how semistructured data are represented in our data model.
We now show how to wrap this page. This WIS is assumed to be a
structured document; its DTD and an equivalent schema in our data
model are given in the figure below.
To explain interface definitions, we use the following interface
definition of Department:
database ISInformation
address http://HOST2/cgi-bin/WrapperExecutor.pl
interface Department
body
url URLDomain,
academicPage link Academic,
deptName DeptDomain,
laboratories setof Laboratory,
method
public:
static getByURL(URLDomain $url);
static getByKeyWord(String $keyword);
private:
static new(DeptDomain $deptName, OID $labOID);
printDetail();
implement
sub new {
my $package = shift;
my $deptName = shift;
my $labOID = shift;
my $this;
$this = newEmptyObject Department();
$this->{deptName} = $deptName;
push(@{$this->{laboratories}},$labOID);
bless $this, $package;
return $this;
}
sub getByURL{
my $package = shift;
my $url = shift;
my $resultPage;
my $resultPageFile = 'S_Major.html';
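# Write the page rewritten by modifyHTMLContent() to a local result file.
open(S, '>', $resultPageFile) or die "Cannot write $resultPageFile: $!";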
$resultPage = modifyHTMLContent($url);
print S $resultPage;
close S;
return $resultPageFile;
}
sub getByKeyWord{
my $package = shift;
my $keyword = shift;
my $url1 = 'http://hercules.hol.is.uec.ac.jp:8888/jua/IS/s_major.html';
my $url2 = 'http://hercules.hol.is.uec.ac.jp:8888/jua/IS/n_major.html';
my $pageContent;
my @url = ($url1,$url2);
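my $resultContent = '';
my $resultFName = 'is_page.html';
# Collect every page containing the keyword, marking each occurrence
# in that page before it is appended to the result.
foreach my $url (@url){
$pageContent = modifyHTMLContent($url);
if ($pageContent =~ /\Q$keyword\E/){
$pageContent =~ s/\Q$keyword\E/$keyword<\/A>/g;
$resultContent .= $pageContent;
}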
}
$resultContent =~ s/(Jump to result<\/A>\n$1/g;
open(S, '>', $resultFName) or die "Cannot write $resultFName: $!";
print S $resultContent;
close S;
return $resultFName;
}
sub printDetail{
my $this = shift;
print "Department = $this->{deptName}\n";
foreach (@{$this->{laboratories}}){
$_->printDetail();
}
}
endinterface
This interface definition has four attributes: url,
academicPage, deptName, and laboratories. The first attribute,
url, is an additional element that keeps the URL of the original
document page. The second attribute is single-valued and represents
a link between the Department page and the Academic page. The third
attribute is a single-valued string whose domain is DeptDomain. The
fourth attribute is multi-valued and keeps the OIDs of instances of
the Laboratory interface. Using such a multi-valued attribute, we can
represent the nested structure of documents.
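To make the role of OIDs in a multi-valued attribute concrete, here is a minimal Perl sketch, assuming a plain in-memory hash serves as the object store. The helpers `add_object` and `deref`, the OID format, and all sample values are hypothetical and are not part of the wrapper described here.

```perl
use strict;
use warnings;

# Hypothetical object store: every object is a hash reference kept
# under its OID, so a set-valued attribute can simply list OIDs.
my %store;
my $next_oid = 0;

# Register an object and return its newly assigned OID.
sub add_object {
    my ($obj) = @_;
    my $oid = 'oid' . ++$next_oid;
    $store{$oid} = $obj;
    return $oid;
}

# Resolve a list of OIDs back into the objects they identify.
sub deref {
    return map { $store{$_} } @_;
}

# Placeholder data for two laboratories and one department.
my $lab1 = add_object({ labName => 'Lab A', staffs => [] });
my $lab2 = add_object({ labName => 'Lab B', staffs => [] });

my $dept = add_object({
    url          => 'dept.html',              # placeholder URL
    deptName     => 'Information Science',
    laboratories => [ $lab1, $lab2 ],         # multi-valued: OIDs only
});

# Follow the set-valued attribute one level down the nesting.
my @lab_names = map { $_->{labName} } deref(@{ $store{$dept}{laboratories} });
print join(', ', @lab_names), "\n";   # prints "Lab A, Lab B"
```

Storing OIDs rather than embedded copies means a Laboratory object exists once in the store and the Department only refers to it, which is what lets nesting be expressed without duplicating data.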
The method part declares two kinds of methods: public and private.
Only public methods are invoked explicitly. In this example, the
getByURL() method retrieves the original document with a given URL,
and the getByKeyWord() method retrieves the original documents
containing a given keyword string. new() is a class method used to
create a Department object, and printDetail() is an instance method
that prints the detailed information held in each object.
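The keyword filtering performed by getByKeyWord() can be sketched in Perl as follows. This is a minimal sketch, assuming the pages have already been fetched into a hash that stands in for modifyHTMLContent(), whose behavior is not specified here; `search_pages`, the page contents, and the `<A NAME="result">` anchor are hypothetical. It matches the keyword literally with `\Q...\E` and marks occurrences in each page before appending that page to the result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched pages: page name => HTML content.
my %pages = (
    's_major.html' => "<P>Database Laboratory</P>\n",
    'n_major.html' => "<P>Network Laboratory</P>\n",
);

# Concatenate every page containing $keyword (matched literally) and
# wrap each occurrence in an anchor so the result can link to it.
sub search_pages {
    my ($pages, $keyword) = @_;
    my $result = '';
    foreach my $name (sort keys %$pages) {    # sorted for a stable order
        my $content = $pages->{$name};
        next unless $content =~ /\Q$keyword\E/;
        $content =~ s/(\Q$keyword\E)/<A NAME="result">$1<\/A>/g;
        $result .= $content;
    }
    return $result;
}

my $hits = search_pages(\%pages, 'Database');
print $hits;   # prints the marked-up content of s_major.html only
```

Marking each page before concatenation keeps text that is already in the result from being substituted a second time when a later page also matches.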
Below are the interface definitions for the laboratory and staff
elements of the page.
database ISInformation
address http://HOST2/cgi-bin/WrapperExecutor.pl
interface Laboratory
body
labName LabDomain,
staffs setof Staff,
method
public:
private:
....
implement
...
endinterface
database ISInformation
address http://HOST2/jua/cgi-bin/WrapperExecutor.pl
interface Staff
body
position Position,
fullName Staff,
method
public:
private:
....
implement
....
endinterface
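How printDetail() delegates through the nested Department, Laboratory, and Staff objects can be sketched with ordinary Perl classes. This sketch assumes that, like Department, the Laboratory and Staff interfaces each provide a printDetail() method (their methods are elided above), and for brevity it stores object references directly instead of OIDs; the hash-style constructors and sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical Perl classes mirroring the Department, Laboratory and
# Staff interfaces; set-valued attributes hold array references.
package Staff;
sub new {
    my ($class, %attr) = @_;
    return bless { position => $attr{position}, fullName => $attr{fullName} }, $class;
}
sub printDetail {
    my $this = shift;
    print "  $this->{position}: $this->{fullName}\n";
}

package Laboratory;
sub new {
    my ($class, %attr) = @_;
    return bless { labName => $attr{labName}, staffs => $attr{staffs} || [] }, $class;
}
sub printDetail {
    my $this = shift;
    print " Laboratory = $this->{labName}\n";
    $_->printDetail() foreach @{ $this->{staffs} };
}

package Department;
sub new {
    my ($class, %attr) = @_;
    return bless { deptName => $attr{deptName}, laboratories => $attr{laboratories} || [] }, $class;
}
# Each level prints itself and then delegates to its nested objects.
sub printDetail {
    my $this = shift;
    print "Department = $this->{deptName}\n";
    $_->printDetail() foreach @{ $this->{laboratories} };
}

package main;

# Placeholder data: one department, one laboratory, one staff member.
my $dept = Department->new(
    deptName     => 'Information Science',
    laboratories => [
        Laboratory->new(
            labName => 'Example Lab',
            staffs  => [ Staff->new(position => 'Professor', fullName => 'Example Name') ],
        ),
    ],
);
$dept->printDetail();
```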
Finally, the figure below shows an instance of a department object
extracted from a department page. It is a complex object consisting of
multiple simple objects: the structure of the page is mapped onto the
structure of the department object.
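The mapping from page structure to object structure can be illustrated with a minimal Perl sketch. It assumes a deliberately simplified page layout (one `<H1>` for the department, one `<H2>` per laboratory, one `<LI>` per staff member) rather than the real page or its DTD, and it builds plain nested hashes; `extract_department` and the sample page are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical, simplified department page with placeholder content.
my $page = <<'HTML';
<H1>Information Science</H1>
<H2>Database Lab</H2>
<LI>Professor: Example Name A
<LI>Lecturer: Example Name B
<H2>Network Lab</H2>
<LI>Professor: Example Name C
HTML

# Walk the page line by line, attaching each element to the most
# recently opened enclosing element, so nesting in the page becomes
# nesting in the resulting department structure.
sub extract_department {
    my ($html) = @_;
    my $dept = { deptName => undef, laboratories => [] };
    my $lab;
    foreach my $line (split /\n/, $html) {
        if ($line =~ m{^<H1>(.*)</H1>$}) {
            $dept->{deptName} = $1;
        } elsif ($line =~ m{^<H2>(.*)</H2>$}) {
            $lab = { labName => $1, staffs => [] };
            push @{ $dept->{laboratories} }, $lab;
        } elsif ($line =~ /^<LI>([^:]+):\s*(.*)$/ && $lab) {
            push @{ $lab->{staffs} }, { position => $1, fullName => $2 };
        }
    }
    return $dept;
}

my $dept = extract_department($page);
printf "%s: %d laboratories\n", $dept->{deptName}, scalar @{ $dept->{laboratories} };
# prints "Information Science: 2 laboratories"
```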
Last modified: Mon Apr 24 13:21:25 JST 2000