
Open Web Repository Content

Open Web Repository contains the following data about website pages:


It also includes RDF data generated by third parties for website pages.

Web Page Extracted Content is the actual content of a specific web page, as detected by the OMFICA Crawler, stored in an XML-based format.

The example below contains “Web Page Extracted Content” for http://www.omfica.org/npo_open_web_repository.php.

Example 1.
<Description about="http://www.omfica.org/npo_open_web_repository.php">
<omfica:contentText>

Open Web Repository contains integrated data about the World Wide Web: websites’ structure, pages’ actual content, visit statistics, and pages’ semantic analysis results. OMFICA develops its own technologies and also deploys third-party technologies to build the Open Web Repository.

</omfica:contentText>
<omfica:contentText>

Activities carried out by OMFICA are intended for Open Web Repository content aggregation and can be logically separated into the following subgroups:

</omfica:contentText>
...
<omfica:contentText>

Continuously expanding and updating the Website Parse Templates Repository. Managing the Open Web Crawling process and storing parsed web page data in the Open Web Repository. Managing website page visit statistics. Generating Open Web Repository snapshots and daily updates as files available for FTP download.

</omfica:contentText>
</Description>
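The fragment in Example 1 uses the `omfica:` prefix without declaring it, so a strict XML parser rejects it as-is. Below is a minimal Python sketch of reading the extracted text blocks. It assumes the fragment is wrapped in a root element that binds `omfica` to a namespace. The URI `http://omfica.org/rdfnamespace/content/` and the helper `extract_content_texts` are hypothetical and not part of the published OMFICA format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the omfica prefix; Example 1 does not
# declare one, so this sketch binds it explicitly.
OMFICA_NS = "http://omfica.org/rdfnamespace/content/"


def extract_content_texts(fragment: str) -> list[str]:
    """Return the trimmed text of every omfica:contentText block, in order."""
    # Wrap the fragment so the prefix is bound and there is a single root.
    wrapped = f'<root xmlns:omfica="{OMFICA_NS}">{fragment}</root>'
    root = ET.fromstring(wrapped)
    texts = []
    for elem in root.iter(f"{{{OMFICA_NS}}}contentText"):
        text = (elem.text or "").strip()
        if text:  # skip empty blocks
            texts.append(text)
    return texts


sample = """
<Description about="http://www.omfica.org/npo_open_web_repository.php">
<omfica:contentText>
First paragraph.
</omfica:contentText>
...
<omfica:contentText>
Second paragraph.
</omfica:contentText>
</Description>
"""

for paragraph in extract_content_texts(sample):
    print(paragraph)
```

Text that sits directly inside `Description` (such as the `...` elision) is ignored, because only `contentText` elements are collected.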

Web Page Parse Result is an XML-based format containing the structured data of a specific web page, extracted by the OMFICA Crawler according to the ontology and templates sections of the site’s Website Parse Template.

The example below contains “Web Page Parse Result” for http://music.yahoo.com/ar-8206256---Amy-Winehouse.

Example 2.
<?xml version="1.0"?>
<RDF
xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:omfica="http://omfica.org/rdfnamespace/music/schema/">
<Description about="http://music.yahoo.com/ar-8206256---Amy-Winehouse">

<omfica:id>8206256</omfica:id>
<omfica:fullname>Amy-Winehouse</omfica:fullname>

</Description>
</RDF>
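Because Example 2 declares its namespaces, it can be parsed directly. Below is a minimal Python sketch that maps each described page URL to its `omfica:` property values, using the two namespace URIs shown in the example. The function name `parse_result_fields` is illustrative. The sketch also assumes that the unprefixed `about` attribute in the example may appear as `rdf:about` in other documents.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from Example 2.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OMFICA_MUSIC_NS = "http://omfica.org/rdfnamespace/music/schema/"


def parse_result_fields(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each Description's subject URL to its omfica:* properties."""
    root = ET.fromstring(xml_text)
    results = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # Example 2 uses an unprefixed 'about' attribute; fall back to the
        # namespace-qualified rdf:about form if that is used instead.
        subject = desc.get("about") or desc.get(f"{{{RDF_NS}}}about")
        props = {}
        prefix = f"{{{OMFICA_MUSIC_NS}}}"
        for child in desc:
            if child.tag.startswith(prefix):
                props[child.tag[len(prefix):]] = (child.text or "").strip()
        results[subject] = props
    return results


sample = """<?xml version="1.0"?>
<RDF
xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:omfica="http://omfica.org/rdfnamespace/music/schema/">
<Description about="http://music.yahoo.com/ar-8206256---Amy-Winehouse">
<omfica:id>8206256</omfica:id>
<omfica:fullname>Amy-Winehouse</omfica:fullname>
</Description>
</RDF>"""

print(parse_result_fields(sample))
```

Keying by the subject URL keeps results from several parsed pages separate when they are merged into one dictionary.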

Website Sitemaps is an XML-based format defined by http://www.sitemaps.org; the Sitemaps XML schema is described at http://www.sitemaps.org/protocol.php.

Web Page Internal & External Links is an XML-based format that contains the internal and external links of a specific web page.

The example below contains “Web Page Internal & External Links” for http://www.omfica.org/npo_data_repository.php.

Example 3.
<Description about="http://www.omfica.org/npo_data_repository.php">
<omfica:links>

http://www.omfica.org/npo_website_template.php
http://www.omfica.org/npo_open_web_crawling.php
...

</omfica:links>
<omfica:externalLinks>

http://www.sitemaps.org
http://www.w3.org/2001/sw
...

</omfica:externalLinks>
</Description>
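The split between `omfica:links` and `omfica:externalLinks` can be sketched in Python. This sketch assumes that a link is "internal" when, after being resolved against the page URL, it has the same host as the page. The text above does not define the rule, so this is an assumption. The sketch also skips non-HTTP schemes such as `mailto:`. The namespace URI `http://omfica.org/rdfnamespace/links/` and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse

# Hypothetical namespace URI for the omfica prefix (not given in Example 3).
OMFICA_NS = "http://omfica.org/rdfnamespace/links/"
ET.register_namespace("omfica", OMFICA_NS)


def split_links(page_url: str, hrefs: list[str]) -> tuple[list[str], list[str]]:
    """Resolve hrefs and split them into (internal, external), same-host rule."""
    page_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # assumption: ignore mailto:, javascript:, etc.
        target = internal if parsed.netloc.lower() == page_host else external
        if absolute not in target:  # keep first occurrence only
            target.append(absolute)
    return internal, external


def links_description(page_url: str, hrefs: list[str]) -> str:
    """Serialize the links as a Description element shaped like Example 3."""
    internal, external = split_links(page_url, hrefs)
    desc = ET.Element("Description", {"about": page_url})
    links = ET.SubElement(desc, f"{{{OMFICA_NS}}}links")
    links.text = "\n" + "\n".join(internal) + "\n"
    ext = ET.SubElement(desc, f"{{{OMFICA_NS}}}externalLinks")
    ext.text = "\n" + "\n".join(external) + "\n"
    return ET.tostring(desc, encoding="unicode")


page = "http://www.omfica.org/npo_data_repository.php"
hrefs = [
    "npo_website_template.php",
    "/npo_open_web_crawling.php",
    "http://www.sitemaps.org",
    "http://www.w3.org/2001/sw",
    "mailto:info@example.org",
    "npo_website_template.php",
]
print(links_description(page, hrefs))
```

Unlike Example 1, the serialized output declares the `omfica` prefix on `Description`, so a standard XML parser can read it back.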


Third Party Analyzed Storage

Third Party Analyzed Storage is an XML-based format that contains: