Discussion:
[General] HoldBadHrefs has no effect
Jeff Taylor
2017-02-08 22:25:51 UTC
I've been running mnoGoSearch 3.3.13 under Debian wheezy for a number of
years, but it seems that old cached content is never removed. I
normally run with HoldBadHrefs=7d, but I've even tried setting it to 1s,
and the old content is still never removed. As an example, in a recent
search I noticed pages that were last cached in Aug 2011 (which was
probably when the site went offline), but they still come up in searches.

Help? I would even be happy with running a MySQL query to remove all
cached content that is more than 7 days old, but I didn't want to go
blindly deleting things without knowing how the info in the tables might
be cross-referenced. If there's a way to fix this in indexer.conf, I would
also be happy with that. Note that the old pages which should be removed
are no longer referenced in server.list, and when I run indexer I get a
long list of URLs that can't be reached. So what can I do to get these old
entries removed from the database?
Alexander Barkov
2017-02-17 10:01:01 UTC
Hello Jeff,

Sorry for the late reply.
Which HTTP status do these old documents have?

Can you please check statistics for a few old documents:

./indexer -S -u http://old1/
./indexer -S -u http://old2/
./indexer -S -u http://old3/


Or using this SQL query:

SELECT status, url FROM url WHERE url IN
('http://old1/','http://old2/','http://old3/');



Also, the output from this command would be helpful:

./indexer -am -v6 -u http://old1/
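
If there are many old URLs to check, the statistics command above can be wrapped in a small shell function. This is only a sketch: the function name check_urls and the INDEXER variable are made up for illustration, and it assumes the indexer binary sits in the current directory, as in the commands above.

```shell
# Hypothetical helper: print indexer statistics for each URL passed as an argument.
# INDEXER is an illustrative override; it defaults to ./indexer as used above.
check_urls() {
    for url in "$@"; do
        # Label each block of output so the results are easy to tell apart.
        echo "== $url"
        "${INDEXER:-./indexer}" -S -u "$url"
    done
}
```

For example, `check_urls http://old1/ http://old2/ http://old3/` would run the same three statistics checks in one go.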
_______________________________________________
General mailing list
http://lists.mnogosearch.org/listinfo/general