Detecting Impolite Crawler by Using Time Series Analysis

Abstract

Numerous web crawlers, especially impolite ones, visit websites every day to fetch content, generating a higher access frequency than the sites can sustain. The heavy traffic from impolite crawlers seriously interferes with the analysis of normal users and threatens advertising income. In this paper, we present a method for detecting impolite crawlers using time series analysis and apply it to real web server log data. Compared with earlier methods that use only common log attributes as features, the method using time series features improves detection accuracy by at least 20%.
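
As a rough illustration of what "time series features" from server logs could look like, the sketch below bins each client's requests into fixed time windows and summarizes the resulting count series. It is not the paper's feature set or detection model: the function names, the 60-second bin width, and the four summary statistics are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def request_series(log_entries, bin_seconds=60):
    """Turn (client, unix_timestamp) pairs into per-client request counts per time bin.

    Hypothetical helper: the bin width is an illustrative assumption.
    """
    if not log_entries:
        return {}
    start = min(ts for _, ts in log_entries)
    end = max(ts for _, ts in log_entries)
    n_bins = int((end - start) // bin_seconds) + 1
    series = defaultdict(lambda: [0] * n_bins)
    for client, ts in log_entries:
        series[client][int((ts - start) // bin_seconds)] += 1
    return dict(series)


def series_features(counts):
    """Summarize one client's count series (illustrative features, not the paper's)."""
    return {
        "mean_rate": mean(counts),      # average requests per bin
        "peak_rate": max(counts),       # busiest bin
        "rate_std": pstdev(counts),     # burstiness of the request rate
        "active_fraction": sum(1 for c in counts if c > 0) / len(counts),
    }
```

Features like these could be fed to any standard classifier alongside common log attributes; a client with a high, steady request rate across nearly all bins would stand out from typical human browsing, which tends to be sparse and bursty.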

Publication
In IEEE 25th International Conference on Tools with Artificial Intelligence
Zhiqian Chen
Ph.D. Candidate in Computer Science

Zhiqian Chen is a Ph.D. candidate in the Department of Computer Science at Virginia Tech, focusing on AI and interdisciplinary research.