Author: fabien
Email: ***@gmail.com
Message:
Thanks for your quick answer.
I tried adding the NoIndexIf command, but I cannot get it to work.
I used the default indexer.conf file and added the following two lines at the end of that file:
Server http://www.wearethelous.com/feed/
NoIndexIf Content-Type application/rss+xml
I got the following log:
[71598]{--} Clearing
[71598]{--} Clearing done 0.01
[71600]{--} indexer from mnogosearch-3.4.1-mysql-pqsql started with '/etc/mnogosearch/indexer.conf'
[71600]{01} URL: http://www.wearethelous.com/feed/
[71600]{01} Server Path Allow 'http://www.wearethelous.com/feed/'
[71600]{01} Allow by default
[71600]{01} ROBOTS: http://www.wearethelous.com/robots.txt
[71600]{01} Request.Accept-Encoding: gzip,deflate,compress
[71600]{01} Request.Host: www.wearethelous.com
[71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
[71600]{01} Response.Connection: close
[71600]{01} Response.Content-Encoding: gzip
[71600]{01} Response.Content-Length: 67
[71600]{01} Response.Content-Type: text/plain
[71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:46 GMT
[71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/"
[71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
[71600]{01} Response.ResponseSize: 475
[71600]{01} Response.ResponseTime: 2261
[71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
[71600]{01} Response.Server-Charset: utf-8
[71600]{01} Response.Status: 200
[71600]{01} Response.URL: http://www.wearethelous.com/robots.txt
[71600]{01} Response.URL_ID: 1928115922
[71600]{01} Response.Vary: Accept-Encoding,User-Agent
[71600]{01} Response.X-Powered-By: PHP/5.5.29
[71600]{01} Response.X-Robots-Tag: noindex, follow
[71600]{01} Request.Accept-Encoding: gzip,deflate,compress
[71600]{01} Request.Host: www.wearethelous.com
[71600]{01} Request.User-Agent: MnoGoSearch/3.4.1
[71600]{01} Response.body:
[71600]{01} Response.Charset:
[71600]{01} Response.Connection: close
[71600]{01} Response.Content-Encoding: gzip
[71600]{01} Response.Content-Language:
[71600]{01} Response.Content-Length: 2337
[71600]{01} Response.Content-Type: application/rss+xml
[71600]{01} Response.crc32: 0
[71600]{01} Response.crc32old: 0
[71600]{01} Response.Date: Wed, 12 Oct 2016 20:42:48 GMT
[71600]{01} Response.ETag: "7059155a990290887650add31475f88e"
[71600]{01} Response.Hops: 0
[71600]{01} Response.ID: 5
[71600]{01} Response.ilinktext:
[71600]{01} Response.Last-Modified: Thu, 29 Sep 2016 12:48:50 GMT
[71600]{01} Response.Link: <http://www.wearethelous.com/wp-json/>; rel="https://api.w.org/"
[71600]{01} Response.MaxDocPerSite: 0
[71600]{01} Response.MaxHops: 256
[71600]{01} Response.meta.description:
[71600]{01} Response.meta.keywords:
[71600]{01} Response.msg.from:
[71600]{01} Response.msg.subject:
[71600]{01} Response.msg.to:
[71600]{01} Response.PrevStatus: 0
[71600]{01} Response.ResponseLine: HTTP/1.1 200 OK
[71600]{01} Response.ResponseSize: 2842
[71600]{01} Response.ResponseTime: 1455
[71600]{01} Response.Server: Apache/2.2.31 (Unix) mod_ssl/2.2.31 OpenSSL/1.0.1e-fips mod_bwlimited/1.4
[71600]{01} Response.Server-Charset: utf-8
[71600]{01} Response.Server_id: -2050898686
[71600]{01} Response.Status: 200
[71600]{01} Response.title:
[71600]{01} Response.URL: http://www.wearethelous.com/feed/
[71600]{01} Response.url.file:
[71600]{01} Response.url.host:
[71600]{01} Response.url.path:
[71600]{01} Response.url.proto:
[71600]{01} Response.URL_ID: -2050898686
[71600]{01} Response.Vary: Accept-Encoding,User-Agent
[71600]{01} Response.X-Powered-By: PHP/5.5.29
[71600]{01} Response.X-Robots-Tag: noindex, follow
[71600]{01} Status: 200 OK
[71600]{01} Guesser: Lang: , Charset: utf-8
[71600]{01} SectionFilter: NoIndexIf Match Wild Insensitive 'Content-Type' 'application/rss+xml'
[71600]{01} Flushing word cache
[71600]{01} Flushing word cache done 0.00
[71600]{01} Done (4 seconds, 1 documents, 2842 bytes, 0.69 Kbytes/sec.)
I see that the SectionFilter line in the log mentions the NoIndexIf filter that I added, but the URL is still indexed.
So what could be wrong?
Thanks in advance for your help.
Fabien.
Post by b***@mnogosearch.org
Hi,
Post by b***@mnogosearch.org
Hi all,
Is it possible to exclude certain MIME types, such as RSS feeds?
http://www.mnogosearch.org/doc34/msearch-cmdref-noindexif.html
NoIndexIf Content-Type application/rss+xml
http://www.mnogosearch.org/doc34/msearch-cmdref-section.html#cmdref-section-user-defined
The idea is to define a user section using a regex pattern to catch some known RSS text fragments, and then use NoIndexIf with this section.
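To make the two-step idea concrete, here is a rough Python sketch of the logic only. It is not mnoGoSearch code or indexer.conf syntax: the pattern, the function names and the sample documents are all hypothetical, chosen for illustration. The first function plays the role of the user-defined section (a regex that captures a known RSS fragment), and the second plays the role of NoIndexIf applied to that section.

```python
import re

# Hypothetical pattern: markers that typically appear near the top of an RSS 2.0 feed.
RSS_MARKER = re.compile(r"<rss\b[^>]*\bversion\s*=|<channel\b", re.IGNORECASE)


def extract_rss_marker(body: str) -> str:
    """Stand-in for the user-defined section: first RSS marker found, or ''."""
    match = RSS_MARKER.search(body)
    return match.group(0) if match else ""


def should_index(body: str) -> bool:
    """Stand-in for NoIndexIf on that section: skip when the section is non-empty."""
    return extract_rss_marker(body) == ""


# Made-up sample documents, for illustration only.
feed = '<?xml version="1.0"?><rss version="2.0"><channel><title>x</title></channel></rss>'
page = "<html><body><h1>Hello</h1></body></html>"

print(should_index(feed), should_index(page))
```

The point of matching on the body rather than on Content-Type is that the decision then depends only on what the document contains, whatever header the server sends.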
Reply: <http://www.mnogosearch.org/board/message.php?id=21790>