A spider, also known as a crawler or a bot, is the software Proximic uses to visit and access the content of webpages.
Our spiders:
- are friendly and identify themselves,
- only download the static, textual content,
- honor the rules in robots.txt,
- crawl at a slow rate by default.
Here are answers to the most common questions. If you need to know more, please contact us.
Sites may also be crawled in a linear fashion to provide site-level analysis to advertising partners who are interested in a specific site.
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)
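When reviewing server logs, a site owner could recognize the spider by the proximic token in the string above. The following is only a minimal sketch in Python, assuming that a case-insensitive substring match is good enough; the function name is_proximic_spider is hypothetical and not part of any Proximic tooling.

```python
def is_proximic_spider(user_agent: str) -> bool:
    """Return True if the User-Agent header carries the 'proximic' token.

    Assumption: a case-insensitive substring match is sufficient. Any client
    can send this header, so treat the result as a hint, not as proof.
    """
    return "proximic" in user_agent.lower()


# The spider's identification string, as quoted above.
spider_ua = "Mozilla/5.0 (compatible; proximic; +http://www.proximic.com/info/spider.php)"
# An ordinary browser string, used here only for contrast.
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"

print(is_proximic_spider(spider_ua))
print(is_proximic_spider(browser_ua))
```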
To whitelist our spiders, please add a separate paragraph to your robots.txt like this:
This analysis helps the advertiser place topically relevant campaigns in a safe environment. Relevance drives CPM, which is a win for you.
Some advertisers strip the URL parameters, which means that a working URL like
www.forum.com/showthread.php?t=123 is rendered into something like this:
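To illustrate what stripping URL parameters does, here is a rough Python sketch. It assumes that stripping simply drops the query string (everything after the question mark); how a given advertiser actually rewrites URLs is not specified here, and the function name strip_url_parameters is hypothetical.

```python
from urllib.parse import urlsplit


def strip_url_parameters(url: str) -> str:
    """Drop the query string (everything after '?') and any fragment.

    Assumption: this mirrors what "stripping the URL parameters" means;
    the exact rewriting done by an advertiser may differ.
    """
    parts = urlsplit(url)
    return parts._replace(query="", fragment="").geturl()


stripped = strip_url_parameters("www.forum.com/showthread.php?t=123")
print(stripped)
```

After stripping, the thread identifier t=123 is gone, so the remaining address no longer points at the original thread.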
User-agent: proximic
Make sure that the robots.txt is in the correct location. It must be in the top directory, e.g. www.domain.com/robots.txt.
Placing the file in a subdirectory won't have any effect. Furthermore, please note that the IP addresses used by the spiders change from time to time, and that it may take up to a day for changes in robots.txt to propagate to all of our spiders.
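Before waiting for a change to propagate, a robots.txt can be checked locally. The sketch below uses Python's standard urllib.robotparser module. The robots.txt content is a hypothetical example: the empty Disallow line under the proximic paragraph is an assumed rule body, and the page URL and the name otherbot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is blocked by default, and a separate
# paragraph whitelists the spider. The empty "Disallow:" line is an assumed
# rule body (it means "nothing is disallowed").
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: proximic
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Made-up page on the example domain used in the text.
page = "http://www.domain.com/some/page.html"

# Pass the short agent token, not the full identification string.
proximic_allowed = parser.can_fetch("proximic", page)
other_allowed = parser.can_fetch("otherbot", page)

print("proximic allowed:", proximic_allowed)
print("otherbot allowed:", other_allowed)
```

This only shows how Python's parser reads the file; it says nothing about how the spiders themselves evaluate the rules, so treat it as a sanity check rather than a guarantee.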