Hi,
I have not been able to get JCrawler to avoid crawling certain areas of my site.
I have Joomla SEF URLs enabled and JEvents installed, so when JCrawler starts crawling it picks up every single calendar URL for the past and future 10 years, lol.
The crawl also produces errors like these:
# Curl error on url http://www.animewave.net/events/day.listevents/2009/11/06/-.html: Connection time-out
# Curl error on url http://www.animewave.net/events/day.listevents/2009/11/07/-.html: Connection time-out
# Curl error on url http://www.animewave.net/events/week.listevents/2009/11/8/-.html: Connection time-out
And these are examples of the calendar URLs that did get indexed:
http://www.animewave.net/events/day.listevents/2009/12/15/-.html
http://www.animewave.net/events/day.listevents/2009/12/16/-.html
What I haven't been able to do is make JCrawler avoid the calendar extension COMPLETELY.
I have tried telling it to avoid the SEF link:
http://www.animewave.net/events/
I have also tried telling it to avoid the component link:
http://animewave.net/index.php?option=com_jevents&task=month.calendar
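To make it clearer what I'm after, here is a rough Python sketch of the kind of exclusion check I would expect a crawler to apply. This is NOT JCrawler's code or its settings; the names `is_excluded`, `EXCLUDED_PATH_PREFIXES` and `EXCLUDED_COMPONENTS` are made up for illustration. It compares only the path and the `option` query parameter, so it would not matter whether the URL uses the www host or not.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical rules for illustration only; these are not JCrawler options.
EXCLUDED_PATH_PREFIXES = ("/events/",)      # SEF path of the calendar
EXCLUDED_COMPONENTS = ("com_jevents",)      # non-SEF component name


def is_excluded(url: str) -> bool:
    """Return True if the URL belongs to the calendar and should be skipped."""
    parsed = urlparse(url)

    # SEF form: match on the path only, ignoring scheme and host.
    if parsed.path.startswith(EXCLUDED_PATH_PREFIXES):
        return True

    # Non-SEF form: match on the "option" query parameter.
    options = parse_qs(parsed.query).get("option", [])
    return any(option in EXCLUDED_COMPONENTS for option in options)


if __name__ == "__main__":
    for candidate in (
        "http://www.animewave.net/events/day.listevents/2009/12/15/-.html",
        "http://animewave.net/index.php?option=com_jevents&task=month.calendar",
    ):
        print(candidate, is_excluded(candidate))
```

With a check like this, both the SEF calendar pages and the raw component link would be skipped before they are ever requested.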
Any help? I would really love to index my site with JCrawler, but I don't want 10,000 URLs just for the calendar...