Fen Bilimleri Enstitüsü
Permanent URI for this community: https://hdl.handle.net/20.500.12416/30
Crawling the web using Apache Nutch and Lucene
Abdulwahid, Nibras (Çankaya Üniversitesi, 2014-07-31)
Çankaya Üniversitesi, Fen Bilimleri Enstitüsü, Matematik Bilgisayar Bölümü
Citation: ABDULWAHID, N. (2014). Crawling the web using Apache Nutch and Lucene. Unpublished master's thesis. Ankara: Çankaya Üniversitesi Fen Bilimleri Enstitüsü.

The large quantity of information available on the Web makes it difficult for users to select resources that match their information needs. Search engines provide the link between Internet users and this information. A search engine is a kind of Information Retrieval (IR) system. It collects data from the Web using a software program called a crawler, bot, or spider. Most search engine users do not know how a search engine works: how it gathers information from the Web and how it ranks the results it returns. For this reason, this thesis examines an open-source search engine in detail. In this study, we used Apache Nutch and Lucene to clarify how open-source Web crawling works. Both are released under the Apache Software Foundation. Nutch is a web search engine that crawls and indexes web pages from the World Wide Web (WWW). Nutch is built on top of Lucene. It is used in information retrieval technology and has software libraries for indexing large volumes of data. Lucene is not concerned with the format in which information exists on the Web, such as PDF, plain text, or MS Word; it indexes these documents and converts them into data that can be used. The benefit of using Nutch and Lucene in this study is that they are free and can be developed further. Both are written in the Java programming language. Furthermore, we used tag cloud technology to analyze and view the content of the Lucene index.
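
The abstract describes three steps: collecting pages, indexing them, and viewing the index as a tag cloud. The following is a minimal sketch of the last two steps in plain Python. It is not taken from the thesis and does not use the Nutch or Lucene APIs. The sample pages, the stopword list, and the function names (`build_index`, `search`, `tag_cloud`) are all hypothetical, chosen only to illustrate an inverted index and frequency-based tag sizing.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for pages a crawler has already fetched.
PAGES = {
    "http://example.org/a": "Nutch crawls the web and Lucene indexes the pages",
    "http://example.org/b": "Lucene builds an inverted index for search",
}

# Assumed tiny stopword list, used only when sizing the tag cloud.
STOPWORDS = {"the", "and", "an", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def tag_cloud(pages, min_size=10, max_size=30, top=5):
    """Scale the most frequent terms linearly to a font-size range."""
    counts = Counter(
        term
        for text in pages.values()
        for term in tokenize(text)
        if term not in STOPWORDS
    )
    most = counts.most_common(top)
    if not most:
        return {}
    hi, lo = most[0][1], most[-1][1]
    span = hi - lo
    return {
        term: min_size + ((n - lo) / span if span else 0) * (max_size - min_size)
        for term, n in most
    }


index = build_index(PAGES)
cloud = tag_cloud(PAGES)
```

Here `search(index, "nutch lucene")` intersects the URL sets of both terms, and `tag_cloud` gives the most frequent term the largest size. A production index would also store term positions and scoring statistics, which this sketch omits.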