Angelica

Search Engine Website

Description

Angelica is a search engine website that uses a web crawler to browse the World Wide Web. It accepts the search query as text or voice input and responds with a list of websites ordered by rank. To accomplish this, it is built from 4 main modules:

Web Crawler
Indexer
Query Processing / Phrase Searching
Ranker
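
To make the Indexer and Phrase Searching modules more concrete, here is a minimal sketch of a positional inverted index. It is an illustration only and is not taken from the project's source: the class `PositionalIndexSketch`, its method names, and the tokenization rule are all assumptions, and the real Indexer stores its data in the database rather than in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory positional index; not the project's actual Indexer. */
class PositionalIndexSketch {
    // term -> (document URL -> positions at which the term occurs in that document)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Records the position of every token of the document. */
    void addDocument(String url, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(url, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Returns the URLs in which the query words occur consecutively and in order. */
    Set<String> phraseSearch(String phrase) {
        Set<String> hits = new TreeSet<>();
        List<String> words = tokenize(phrase);
        if (words.isEmpty()) return hits;
        Map<String, List<Integer>> first = postings.getOrDefault(words.get(0), Map.of());
        for (Map.Entry<String, List<Integer>> e : first.entrySet()) {
            for (int start : e.getValue()) {
                if (matchesAt(e.getKey(), words, start)) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    /** Checks that word i of the phrase sits at position start + i in the document. */
    private boolean matchesAt(String url, List<String> words, int start) {
        for (int i = 1; i < words.size(); i++) {
            List<Integer> positions = postings.getOrDefault(words.get(i), Map.of())
                                              .getOrDefault(url, List.of());
            if (!positions.contains(start + i)) return false;
        }
        return true;
    }
}
```

Storing positions rather than only document IDs is what allows a phrase query to be told apart from a plain multi-word query: a single-word query simply returns every document that contains the word.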

Technologies

Java Servlets
JSP
HTML / CSS

Ranking

Term frequency-inverse document frequency (TF-IDF) to rank the relevance of each page to the search query
The PageRank algorithm to rank the popularity of all pages
Geographic ranking: if the extension of the page matches the location of the search query, the page's rank is increased by a default percentage
Personalized search: after the user clicks on a page, the rank of that page's base URL is increased by a certain percentage
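
The ranking signals listed above can be sketched as follows. This is a minimal illustration using textbook formulas, not the code in Ranker.java: the class `RankingSketch`, the method names, the damping factor passed by the caller, and the reading of "extension" as the host's country-code suffix are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical ranking helpers; the formulas are textbook variants, not the project's. */
class RankingSketch {

    /** TF-IDF of one term in one document: (count / docLength) * ln(totalDocs / docsWithTerm). */
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        if (docLength == 0 || docsWithTerm == 0) return 0.0;
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / docsWithTerm);
        return tf * idf;
    }

    /**
     * Iterative PageRank over an adjacency map (page -> outgoing links).
     * Assumes every link target also appears as a key of the map.
     */
    static Map<String, Double> pageRank(Map<String, List<String>> links,
                                        double damping, int iterations) {
        int n = links.size();
        Map<String, Double> rank = new HashMap<>();
        for (String page : links.keySet()) rank.put(page, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) next.put(page, (1.0 - damping) / n);
            for (Map.Entry<String, List<String>> e : links.entrySet()) {
                List<String> out = e.getValue();
                double current = rank.get(e.getKey());
                if (out.isEmpty()) {
                    // Dangling page: spread its rank evenly over all pages.
                    for (String page : links.keySet()) {
                        next.merge(page, damping * current / n, Double::sum);
                    }
                } else {
                    // Each outgoing link receives an equal share of the page's rank.
                    for (String target : out) {
                        next.merge(target, damping * current / out.size(), Double::sum);
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    /** Raises a score by the given percentage, e.g. after a user clicks a result. */
    static double boost(double score, double percent) {
        return score * (1.0 + percent / 100.0);
    }

    /** Geographic boost: applied only when the host ends with the searcher's country code. */
    static double geoBoost(double score, String host, String countryCode, double percent) {
        boolean sameRegion = host.toLowerCase(Locale.ROOT)
                                 .endsWith("." + countryCode.toLowerCase(Locale.ROOT));
        return sameRegion ? boost(score, percent) : score;
    }
}
```

In this sketch the relevance score (TF-IDF) and the popularity score (PageRank) are computed independently, and the two percentage boosts are applied on top of an existing score. How the project actually combines the four signals into one final rank is not stated here, so that step is left out.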

How To run

Connect to MSSQL (Microsoft SQL Server) and create a database called search_engine
Run CrawlerMain.java to start crawling, indexing, and ranking websites
If you stop the program before the crawler reaches its page limit, run the Ranker.java module on its own to rank the pages crawled so far. Popularity ranking only starts after the crawler finishes, so that its results are more accurate
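
Before running the crawler, it can help to confirm that the search_engine database is reachable through the mssql-jdbc driver. The snippet below is a hedged sketch of such a check: the class `DbConnectionSketch`, the host, the port, the user name, and the password placeholder are assumptions, and the project's own connection settings may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical connectivity check; host, port, and credentials are placeholders. */
class DbConnectionSketch {

    /** Builds a SQL Server JDBC URL for the given host, port, and database name. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";encrypt=true;trustServerCertificate=true";
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 1433, "search_engine");
        // Replace the placeholder credentials with those of your own server.
        try (Connection conn = DriverManager.getConnection(url, "sa", "<your-password>")) {
            System.out.println("Connected to " + conn.getCatalog());
        } catch (SQLException e) {
            System.err.println("Could not connect: " + e.getMessage());
        }
    }
}
```

The check needs mssql-jdbc.jar on the classpath. Without it, `DriverManager` reports that no suitable driver was found.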

To run the website

Start Tomcat
Run the main.jsp file on Tomcat

Dependencies

jsoup.jar
servlet-api.jar
json-simple.jar
mssql-jdbc.jar
opennlp-tools.jar

The link below points to a .bak file of a database that the program has already crawled and indexed, so it can be used to perform queries:

https://drive.google.com/file/d/1iLhaS6YQ9PzUpCq32B2ShzDFTY4sHlN7/view?usp=sharing

SalmaIbraheem/SearchEngine

Folders and files

Latest commit

History

Repository files navigation

Angelica

Table of Contents

Description

Technologies

Ranking

How To run

To run the website

Dependencies

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages