International Journal of Engineering and Management Research (IJEMR)
Year : 2017, Volume : 7, Issue : 5
First page : (52) Last page : (56)
Print ISSN : 2394-6962. Online ISSN : 2250-0758.

Near Duplicate URL Detection for Removing Dust Unique Key

Santhi R. Vijaya

Research Scholar, Department of Computer Science, Tamil University College, Thanjavur, Tamil Nadu, India

Online published on 8 December, 2017.

Abstract

Conventional parallel algorithms for mining frequent itemsets try to balance load by partitioning data equally among a group of computing nodes. However, existing parallel Frequent Itemset Mining algorithms have serious performance issues: in a big data environment they suffer from high communication and mining overhead induced by redundant data transmitted among computing nodes. We address this problem by developing a data partitioning approach based on the MapReduce programming model. The aim of this paper is to enhance the performance of parallel Frequent Itemset Mining on Hadoop clusters. The proposed model incorporates a similarity metric and the Locality-Sensitive Hashing technique, and applies the VUK (Valid Unique Key) DUST-removing technique, with LDA-CRATS mining data used to run the approach. The approach derives quality rules by taking advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, performed before the rules are generated, can lead to the deployment of very effective rules. In our evaluation, the method achieved larger reductions in the number of duplicate URLs than our best baseline, with gains of 85 to 150.76 percent in two different web collections.
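
As a rough illustration of the alignment idea mentioned in the abstract, the sketch below tokenizes two URLs that are assumed to serve the same content and aligns their token sequences to expose the component that differs, which is the kind of evidence a normalization rule could be built from. It is not the paper's method: it aligns only one pair of URLs with Python's standard `difflib` rather than performing a full multi-sequence alignment, and the function names (`tokenize`, `align`) and the example URLs are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(url):
    """Split a URL into alphanumeric runs and single punctuation tokens."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url)


def align(url_a, url_b):
    """Pairwise-align two tokenized URLs and return the differing spans.

    Each returned pair is (text in url_a, text in url_b) for a region
    where the two token sequences do not match.
    """
    a, b = tokenize(url_a), tokenize(url_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append(("".join(a[i1:i2]), "".join(b[j1:j2])))
    return diffs


# Hypothetical pair of URLs assumed to return the same page.
dup_a = "http://example.com/story?id=42&sid=abc"
dup_b = "http://example.com/story?id=42&sid=xyz"
differences = align(dup_a, dup_b)
print(differences)
```

For this pair the only mismatch is the value of the `sid` parameter, which suggests that a candidate rule could ignore that value when comparing URLs. Any such rule would still need validation against many more duplicate clusters before being trusted.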

Keywords

MapReduce, duplicate URLs, multi-sequence alignment, Hadoop.
