Nginx : Block User Agent

In Nginx, you can block certain user agents (normally crawlers) like this:

/etc/nginx/sites-enabled/default

    server {
        listen 80;
        server_name mysite.com;
        root /etc/tomcat7/webapps/mysite;

        if ($http_user_agent ~* (ahrefs|wget|crawler|majestic) ) {
            return 403;
        }

        location / {
            # xxx
        }
    }

In the above example, for a “user agent” that contains one of these patterns: ahrefs|wget|crawler|majestic, …
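To check which requests a rule like this would catch before deploying it, here is a minimal Java sketch of the same matching logic. It assumes nginx's `~*` operator performs a case-insensitive, unanchored regex search on `$http_user_agent`, which `Pattern.CASE_INSENSITIVE` combined with `Matcher.find()` reproduces. The class `UserAgentBlocker`, the method `statusFor`, the 200 value standing in for "request passes through to `location /`", and the sample user-agent strings are all hypothetical.

```java
import java.util.regex.Pattern;

class UserAgentBlocker {
    // Same alternation as the nginx rule; CASE_INSENSITIVE mirrors the ~* operator.
    private static final Pattern BLOCKED =
            Pattern.compile("(ahrefs|wget|crawler|majestic)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: 403 when the pattern is found anywhere in the user agent,
    // otherwise 200 to stand for "not blocked, handled by location /".
    static int statusFor(String userAgent) {
        // A missing header behaves like an empty string in nginx, which never matches.
        if (userAgent != null && BLOCKED.matcher(userAgent).find()) {
            return 403;
        }
        return 200;
    }

    public static void main(String[] args) {
        // Made-up sample user agents for illustration.
        System.out.println(statusFor("Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)")); // 403
        System.out.println(statusFor("Wget/1.21.2"));                                                          // 403
        System.out.println(statusFor("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));                            // 200
    }
}
```

Because the match is a substring search, a broad token such as `crawler` also blocks any agent that merely contains that word, so it pays to keep the list specific.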

Java – Check if web request is from Google crawler

If a web request is coming from the Google crawler (Googlebot), the request's “user agent” should look similar to this:

    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

or (rarely used):

    Googlebot/2.1 (+http://www.google.com/bot.html)

Source : Google crawlers

1. Java Example

In Java, you can get the “user agent” from HttpServletRequest. Example : Service hosted at abcdefg.com

@Autowired …
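The servlet example itself is cut off here. As a hedged, self-contained sketch, the check can be reduced to a string test on the header value, which a servlet would read with `request.getHeader("User-Agent")`. `GoogleCrawlerCheck` and `isGoogleCrawler` are hypothetical names, and matching on the `Googlebot` token is an assumption drawn from the two sample strings shown, not necessarily the original post's method.

```java
import java.util.Locale;

class GoogleCrawlerCheck {
    // Hypothetical helper: true when the User-Agent value contains the "Googlebot" token.
    // In a servlet, the value would come from request.getHeader("User-Agent").
    static boolean isGoogleCrawler(String userAgent) {
        if (userAgent == null) {
            return false; // the client sent no User-Agent header
        }
        // Locale.ROOT keeps lower-casing independent of the server's default locale.
        return userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
    }

    public static void main(String[] args) {
        String[] samples = {
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "Googlebot/2.1 (+http://www.google.com/bot.html)",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" // made-up browser sample
        };
        for (String ua : samples) {
            System.out.println(isGoogleCrawler(ua) + " <- " + ua);
        }
    }
}
```

This only inspects a header that any client can set to any value, so it tells you what the request claims to be, not that it really came from Google.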
