An Overview of Web Data Extraction Techniques

International Journal of Scientific Engineering and Technology
Year: 2013, Volume: 2, Issue: 4
Pages: 278-287
Online ISSN: 2277-1581

Devika K*, Surendran Subu**

Department of Computer Science and Engineering, SCT College of Engineering, Trivandrum, Kerala

*k_devu@yahoo.co.in

**subusurendran@gmail.com

Online published on 4 November, 2017.

Abstract

Web pages are usually generated for visualization, not for data exchange. Each page may contain several groups of structured data, and pages are typically generated by plugging data values into predefined templates. Manual data extraction from semi-structured web pages is a difficult task. This paper surveys various automatic web data extraction techniques. There are two main types of technique: one based on wrapper induction, the other on automatic extraction. Wrapper induction uses a set of extraction rules that are learnt from multiple pages containing similar data records.
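To make the idea of learning extraction rules from template-generated pages concrete, the following is a minimal Python sketch of a left/right delimiter wrapper. It is an illustration only and not the method of any technique surveyed in the paper: the function names `learn_lr_rule` and `extract`, the sample pages, and the single-field setting are all assumptions made for this example.

```python
import os


def learn_lr_rule(pages, values):
    """Learn a (left, right) delimiter pair from labelled example pages.

    pages  -- page strings assumed to come from the same template
    values -- the target value in each page (hand-supplied labels)
    """
    lefts, rights = [], []
    for page, value in zip(pages, values):
        start = page.index(value)  # first occurrence of the labelled value
        lefts.append(page[:start])
        rights.append(page[start + len(value):])
    # Longest common suffix of the text preceding the value.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    # Longest common prefix of the text following the value.
    right = os.path.commonprefix(rights)
    return left, right


def extract(page, rule):
    """Apply a learnt (left, right) rule to an unseen page."""
    left, right = rule
    start = page.index(left) + len(left)
    end = page.index(right, start)
    return page[start:end]


# Hypothetical pages that share a template but differ in their data values.
training_pages = [
    "<p>Ad 1</p><b>Price:</b> <i>10</i> USD",
    "<p>Banner</p><b>Price:</b> <i>25</i> USD",
]
rule = learn_lr_rule(training_pages, ["10", "25"])
print(extract("<p>Sale!</p><b>Price:</b> <i>42</i> USD", rule))  # prints 42
```

The rule is simply the text that stays constant around the labelled values, which is why several pages with similar records are needed: with a single example, the template and the data cannot be told apart.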

Keywords

Data extraction, wrapper induction, DOM tree, web crawler, data alignment, pattern mining.
