Discussion:
Webboard: Statistics on daily occurrence of words
g***@mnogosearch.org
2011-11-06 18:58:16 UTC
Author: mscag
Email:
Message:
Hi,



I am a newcomer. I have installed mnogosearch 3.3.11 and started accumulating some data from a few news sites.



Is there a way to retrieve daily occurrence statistics for a set of preselected words appearing on the crawled sites?



The only way I have managed to do it is to run a daily script that:

- clears all the data in the DB,

- crawls the sites,

- searches for the words and writes the results to a file.



Any better ideas?

Reply: <http://www.mnogosearch.org/board/message.php?id=21325>
g***@mnogosearch.org
2011-11-07 08:30:54 UTC
Author: Alexander Barkov
Email: ***@mnogosearch.org
Message:
Hi,
Post by g***@mnogosearch.org
Hi,
I am a newcomer. I have installed mnogosearch 3.3.11 and started accumulating some data from a few news sites.
Is there a way to retrieve daily occurrence statistics for a set of preselected words appearing on the crawled sites?
The only way I have managed to do it is to run a daily script that:
- clears all the data in the DB,
- crawls the sites,
- searches for the words and writes the results to a file.
Any better ideas?
Why do you clear the data?



Perhaps "indexer -a" should do.



Reply: <http://www.mnogosearch.org/board/message.php?id=21327>
g***@mnogosearch.org
2011-11-07 22:15:11 UTC
Author: mscag
Email:
Message:
Hi,



I am not exactly sure what "indexer -a" does, and I couldn't find info about the parameters of "indexer".



Thanks

Reply: <http://www.mnogosearch.org/board/message.php?id=21328>
g***@mnogosearch.org
2011-11-08 18:56:27 UTC
Author: Alexander Barkov
Post by g***@mnogosearch.org
Hi,
I am not exactly sure what "indexer -a" does, and I couldn't find info about the parameters of "indexer".
Thanks
It marks all documents as expired, then it crawls through all documents once again.





Reply: <http://www.mnogosearch.org/board/message.php?id=21330>
g***@mnogosearch.org
2011-11-13 17:36:31 UTC
Author: mscag
Email:
Message:
I see,



Thanks for your answer.



"indexer -a" deals with the crawling process, whereas I am interested in the reporting side.



I may not have expressed myself clearly enough. What I am trying to achieve is a report that displays statistics about the occurrences of one or more words as a table. A simple representation is below.



Any ideas?



Report representation:



Day,   Keyword-1, Keyword-2, Keyword-1, Keyword-2
Day-1, 16,        14,        21,        04
Day-2, 61,        41,        12,        24
Day-3, 14,        17,        27,        41
Day-4, 15,        61,        62,        19
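
A report in this layout could be generated along the following lines. This is only a minimal sketch in Python, assuming the per-day counts have already been collected somehow; `build_report`, `daily_counts` and the keyword names are hypothetical and not part of mnogosearch, and the sample numbers are just the illustrative figures from the table above.

```python
import csv
import io


def build_report(daily_counts, keywords):
    """Render per-day keyword counts as CSV text, one row per day."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header row: a label column followed by one column per keyword.
    writer.writerow(["Day"] + list(keywords))
    for day in sorted(daily_counts):
        counts = daily_counts[day]
        # A keyword that was not seen on a given day is reported as 0.
        writer.writerow([day] + [counts.get(k, 0) for k in keywords])
    return buf.getvalue()


# Hypothetical input: day label -> {keyword: number of occurrences}.
sample = {
    "Day-1": {"Keyword-1": 16, "Keyword-2": 14},
    "Day-2": {"Keyword-1": 61, "Keyword-2": 41},
}
report = build_report(sample, ["Keyword-1", "Keyword-2"])
print(report)
```

Writing CSV keeps the result easy to open in a spreadsheet or to feed into further analysis.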



Reply: <http://www.mnogosearch.org/board/message.php?id=21340>
g***@mnogosearch.org
2011-11-13 19:54:50 UTC
Author: Alexander Barkov
Post by g***@mnogosearch.org
I see,
Thanks for your answer.
"indexer -a" deals with the crawling process, whereas I am interested in the reporting side.
I may not have expressed myself clearly enough. What I am trying to achieve is a report that displays statistics about the occurrences of one or more words as a table. A simple representation is below.
Any ideas?
Day,   Keyword-1, Keyword-2, Keyword-1, Keyword-2
Day-1, 16,        14,        21,        04
Day-2, 61,        41,        12,        24
Day-3, 14,        17,        27,        41
Day-4, 15,        61,        62,        19
Do you need statistics about the total number of word occurrences in the database, or the number of *new* word occurrences found each day?





Reply: <http://www.mnogosearch.org/board/message.php?id=21345>
g***@mnogosearch.org
2011-11-13 20:18:19 UTC
Author: mscag
Email:
Message:
Hi Alexander,



Yes. I am trying to learn how I can analyze and work with the crawled data.



"New word occurrences found each day" is what I am interested in. I'll use this data to see the trend in the occurrence of some keywords.
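
To illustrate the difference between the two kinds of statistics: if only the running total for a keyword is recorded after each crawl, the daily new occurrences can be derived as the difference between consecutive totals. The sketch below assumes that totals only grow (no documents are removed or changed between crawls) and that days are keyed by ISO dates so that sorting is chronological; `daily_new` and the sample figures are hypothetical.

```python
def daily_new(totals_by_day):
    """Turn cumulative totals into per-day increases."""
    result = {}
    previous = 0
    for day in sorted(totals_by_day):
        total = totals_by_day[day]
        # The first day has no predecessor, so its whole total counts as new.
        result[day] = total - previous
        previous = total
    return result


# Hypothetical running totals for one keyword.
totals = {"2011-11-06": 10, "2011-11-07": 25, "2011-11-08": 27}
new_counts = daily_new(totals)
print(new_counts)
```

If documents can disappear from the database between crawls, a difference may become negative, so this approach only approximates the trend in that case.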



I am also trying to figure out how the crawled data is organized in the database. If I can manage that, I will try to build a per-day vector table for some keywords and then compute TF-IDF scores from it.
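
As a rough sketch of that last step, the following Python treats each day as one "document" and computes a plain TF-IDF score per keyword, with tf as the keyword's share of that day's counted words and idf as log(N / df). It does not read the mnogosearch database; the per-day vectors are assumed to be available already, and `tf_idf` and the sample data are hypothetical.

```python
import math


def tf_idf(day_vectors):
    """Compute TF-IDF per day, given {day: {word: count}}."""
    n_days = len(day_vectors)
    # Document frequency: on how many days does each word occur at all?
    df = {}
    for counts in day_vectors.values():
        for word, count in counts.items():
            if count > 0:
                df[word] = df.get(word, 0) + 1
    scores = {}
    for day, counts in day_vectors.items():
        total = sum(counts.values())
        # tf = share of the day's counted words; idf = log(N / df).
        scores[day] = {
            word: (count / total) * math.log(n_days / df[word])
            for word, count in counts.items()
            if count > 0
        }
    return scores


# Hypothetical per-day keyword vectors.
vectors = {"d1": {"a": 2, "b": 2}, "d2": {"a": 1}}
scores = tf_idf(vectors)
print(scores)
```

Note that with this unsmoothed idf, a keyword that occurs on every day scores 0, so the scores mainly highlight keywords that are concentrated on a few days.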



Regards.



Reply: <http://www.mnogosearch.org/board/message.php?id=21347>