[General] Webboard: Index full html code in DDB

b***@mnogosearch.org

2016-05-30 19:13:50 UTC

Author: Alexander Barkov
Email: ***@mnogosearch.org
Message:
Hello,

Post by b***@mnogosearch.org
Hello,
I would like to index the whole HTML code of each URL.

Perhaps a cached copy is what you're looking for.
In 3.4.x, cached copies are stored in a separate table, "cachedcopy".
They are compressed by default, but compression can
be switched off:

http://www.mnogosearch.org/doc34/msearch-cmdref-cachedcopyencoding.html

Post by b***@mnogosearch.org
Is there any way to do this?
Section headhtml 25 2058 "<head([^>]*)>(.*)</head>" $2
Section bodyhtml 26 2058 "<body([^>]*)>(.*)</body>" $2
Section htmlcode 25 2058 "<html([^>]*)>(.*)</html>" $2
Section body 1 2018 afterheaders html
gets the body, but with all HTML tags stripped out :(
Thank you for your help
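A capturing group meant to grab everything between an opening and a closing tag is normally written `(.*)`; the transposed form `(*.)` is not a valid pattern in Python's `re` module, because a quantifier cannot open a group. The Python sketch below only illustrates what the `(.*)` form of the Section patterns captures on a made-up page. `PAGE`, `extract` and `compiles` are hypothetical names, and the sketch makes no claim about mnoGoSearch's own regex engine. It does show one pitfall: with Python's defaults, `.` does not match a newline, so a multi-line `<body>` is only captured when `re.DOTALL` is set. Whether the indexer behaves the same way is an assumption worth checking against the Section documentation.

```python
import re

# Hypothetical sample page (an assumption, not taken from the thread).
PAGE = """<html lang="en">
<head><title>Demo</title></head>
<body class="main">
<p>Hello <b>world</b></p>
</body>
</html>"""


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def extract(tag: str, html: str, dotall: bool = True):
    """Return the inner HTML of <tag ...>...</tag> (group 2), or None.

    Same shape as the Section patterns: group 1 holds the tag's
    attributes, group 2 (the "$2" replacement) holds the content.
    The greedy .* runs to the LAST matching closing tag in the text.
    """
    flags = re.IGNORECASE | (re.DOTALL if dotall else 0)
    match = re.search(rf"<{tag}([^>]*)>(.*)</{tag}>", html, flags)
    return match.group(2) if match else None


if __name__ == "__main__":
    print(compiles("<head([^>]*)>(*.)</head>"))  # False: "(*" has nothing to repeat
    print(compiles("<head([^>]*)>(.*)</head>"))  # True
    print(repr(extract("body", PAGE)))           # '\n<p>Hello <b>world</b></p>\n'
    print(extract("body", PAGE, dotall=False))   # None: the body spans several lines
```

A non-greedy `(.*?)` would stop at the first closing tag instead; for capturing a whole `<html>` or `<body>` element, the greedy form is the one that matches the intent of the question.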

Reply: <http://www.mnogosearch.org/board/message.php?id=21773>