Diffstat (limited to 'debian/htdig/htdig-3.2.0b6/ChangeLog')
-rw-r--r-- | debian/htdig/htdig-3.2.0b6/ChangeLog | 8763 |
1 files changed, 8763 insertions, 0 deletions
diff --git a/debian/htdig/htdig-3.2.0b6/ChangeLog b/debian/htdig/htdig-3.2.0b6/ChangeLog
new file mode 100644
index 00000000..b7615dd4
--- /dev/null
+++ b/debian/htdig/htdig-3.2.0b6/ChangeLog
@@ -0,0 +1,8763 @@
+Mon Jun 14 10:08:01 CEST 2004 Gabriele Bartolini <angusgb@users.sourceforge.net>
+
+	* Tagged release htdig-3-2-0b6
+
+Sun 13 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* db/os_abs.c, (db/os_abs.c.win32 removed):
+	  Re-fix Cygwin bug (#814268, fixed 25 Apr) so that it won't be
+	  clobbered by autotools.
+
+Sat 12 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdoc/RELEASE.html: Separated bug fixes from new features
+
+	* htdoc/{htdig,htfuzzy}.html, installdir/{htdig,htfuzzy}.1.in:
+	  Added list of database files used
+
+	* htdoc/{htdump,htmerge,htnotify,htpurge,hts_general,htstat,rundig}.html:
+	  Hyperlinked COMMON_DIR, BIN_DIR, DATABASE_DIR to attrs.html.
+
+	* htcommon/defaults.cc, htdoc/attrs.html.in:
+	  Remove reference to deprecated '-l' option (generate URL log) of htdig.
+
+Fri Jun 11 11:48:40 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htsearch/parser.cc (phrase): Applied Lachlan's patch to prevent endless
+	  loop when boolean keywords appear in a phrase in boolean match method.
+
+Fri Jun 11 11:26:56 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* db/hash.c (CDB___ham_open): Applied Red Hat's h_hash patch, to ensure
+	  that hash function always set to something valid.
+
+Fri Jun 11 10:53:49 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* installdir/HtFileType: Added -f to rm command.
+
+	* htsearch/parser.cc (perform_or): Added missing & in if clause.
+
+	* contrib/htdig-3.2.0.spec: Updated for 3.2.0b6.
+
+	* installdir/Makefile.{am,in}: Don't stick $(DESTDIR) in HtFileType.
+
+Thu Jun 10 16:39:36 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htcommon/conf_(lexer.lxx,parser.yxx): applied Gilles' patch (April 22)
+	  which features:
+	  - improved error handling, gives file name and correct line number,
+	    even if using include files
+	  - allows space before comment, because otherwise it would just complain
+	    about the "#" character and go on to parse the text after it as a
+	    definition
+	  - allows config file with an unterminated line at end of file, by
+	    pushing an extra newline token to the parser at EOF
+	  - parser correctly handles extra newline tokens, by moving this
+	    handling out of simple_expression, and into simple_expression_list
+	    and block, as simple_expression must return a new ConfigDefaults
+	    object and a newline token doesn't cut it (caused segfaults when
+	    dealing with fix above)
+	* htcommon/conf_lexer.cxx: Regenerate using flex 2.5.31.
+	* htcommon/conf_parser.cxx: Regenerate using bison 1.875a.
+
+Wed Jun 9 12:32:47 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdig/HTML.cc (do_tag): Fixed meta date handling fix of June 3 to
+	  ensure null byte gets put in by get() call.
+
+Wed 9 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* contrib/doc2html/doc2html.pl, installdir/mime.types:
+	  Add support for OpenOffice.org documents (#957305)
+
+Sat 5 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* test/t_htdig, test/t_factors: fix tests for non-gnu/linux systems.
+
+Sat 5 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdoc/cf_generate.pl: Hyperlink to simplify finding the defaults of
+	  attributes defined in terms of others (e.g.,
+	  accents_db->database_base->database_dir).
+	* htdoc/attrs.html.in: regenerated using cf_generate.pl
+
+Sat 5 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/defaults.cc: Escaped new-line in "allow_spaces_in_url" entry.
+	  Set no_next_page_text to ${next_page_text}; likewise no_prev_page_text.
+
+Fri Jun 4 10:23:53 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htcommon/URL.cc: added "allow_space_in_url" (from fileSpace.1 patch)
+	* htcommon/defaults.[cc,xml]: added documentation of allow_space_in_url
+	* htdoc/attrs.html.in: regenerated using cf_generate.pl
+	* htdoc/cf_byname.html: ditto
+	* htdoc/cf_byprog.html: ditto
+	* htdoc/RELEASE.html: updated with info regarding this attribute
+
+Thu Jun 3 16:04:23 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdig/HTML.cc (do_tag): Fixed meta date handling to avoid inadvertently
+	  matching names like DC.Date.Review.
+
+Thu Jun 3 10:01:50 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdoc/RELEASE.html: updated release notes and changes
+	* htdoc/THANKS.html: updated the 'thanks' section
+
+Thu Jun 3 09:32:52 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* global: updated with 'autoreconf -if' (autoconf 2.59, libtool 1.5.6
+	  and automake 1.7.9)
+
+Wed Jun 2 19:03:14 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* contrib/rtf2html: added the rtf2html.c source as modified by David Lippi
+	  and Gabriele Bartolini of the Comune di Prato. The source code is now
+	  released under GNU GPL and included in the ht://Dig package.
+
+Tue Jun 1 20:23:40 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htcommon/HtSGMLCodec.cc: changed ¤ to €
+
+Fri 28 May 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* Most files: Update copyright to 2004
+
+Sun 23 May 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdocs/FAQ.html: Sync with maindocs
+
+Sun 23 May 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* configure, configure.in:
+	  Resolve variables (e.g., BINDIR) copied into attrs.html,
+	  without introducing "NONE" prefix detected by Gabriele.
+
+Sun 23 May 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* .version, htdoc/RELEASE.html, htdoc/where.html,
+	  htdoc/attrs.html.in, htdoc/cf_byname.html, htdoc/cf_byprog.html:
+	  Prepare docs for release of 3.2.0b6.
+
+Mon Apr 26 15:12:22 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htfuzzy/Soundex.cc (generateKey): Applied Alex Kiesel's fix to prevent
+	  segfaults when word has no letters.
+
+Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/HTML.cc: Handle empty noindex_start/noindex_end lists.
+	* htlib/StringList.{cc,h}: const-correctness of Add/Insert/Assign(char*)
+
+	* redo mistakenly backed out patch...
+
+Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/parser.cc: Address (but not fix) bug #934739
+	  If collection->getDocumentRef() on line 889 returns NULL, don't crash.
+	  I'm still trying to work out why it does return NULL -- I don't think
+	  it ever should.
+
+	* mistakenly back out previous patch :(
+
+Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/Retriever.{h,cc}, htcommon/defaults.cc, htdoc/FAQ.html:
+	  Add store_phrases attribute. If it is false, htdig only stores the
+	  first occurrence of each word in a document. This reduces the database
+	  size dramatically, and slightly increases digging speed.
+
+Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* db/{aclocal.m4,configure,os_abs.c.win32}, STATUS, htdoc/THANKS.html:
+	  Correctly dected paths beginning C: as absolute paths in cygwin/Win32.
+	  Fixes bug #814268.
+
+Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/Retriever.cc:
+	  Gilles's patch to avoid regex compile for every URL encountered.
+
+Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* contrib/htdig-3.2.0.spec:
+	  Karl Eichwalder's patch to use mktemp to create safe temp file.
+
+Wed Apr 7 17:12:33 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdig/Retriever.cc (IsValidURL): Fixed bug #931377 so bad_extensions
+	  and valid_extensions not thrown off by periods in query strings.
+
+Mon Mar 15 11:56:04 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htsearch/Display.cc: changed (and fixed) the date factor formula as
+	  Lachlan and David Lippi suggested, in order not to give negative results.
+
+Fri Mar 12 09:13:28 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* configure.in: removed 'eval' expressions which caused the 'NONE' prefix
+	  path to be instantiated and the make script to hang
+	* acinclude.in: fixed AC_DEFINEs for SSL and ZLIB check macros, which prevented
+	  autoheader (and therefore autoreconf) to correctly work
+	* moved manual pages from htdoc to installdir
+	* htdoc/[manpages].in: removed
+	* installdir/*.[1,8]: removed man pages (htdig-pdfparser.1, htdig.1,
+	  htdump.1, htfuzzy.1, htload.1, htmerge.1, htnotify.1, htpurge.1,
+	  htsearch.1, htstat.1, rundig.1, htdigconfig.8)
+	* installdir/*.[1,8].in: added pre-configure man pages (htdig-pdfparser.1.in,
+	  htdig.1.in, htdump.1.in, htfuzzy.1.in, htload.1.in, htmerge.1.in, htnotify.1.in,
+	  htpurge.1.in, htsearch.1.in, htstat.1.in, rundig.1.in, htdigconfig.8.in)
+	* regenerated configure scripts with autoreconf
+	* fixes bug #909674
+
+Sat 21 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* installdir/HtFileType: Use mktemp to create safe temp file (bug #901555)
+
+Wed Feb 25 11:14:45 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdocs/THANKS.html: added Robert Ribnitz to the 'thanks' page and fixed
+	  Nenciarini's position (it was not in alphabetical order - sorry!).
+
+Wed Feb 25 11:02:37 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* installdir/*.[1,8]: added man pages (htdig-pdfparser.1, htdig.1,
+	  htdump.1, htfuzzy.1, htload.1, htmerge.1, htnotify.1, htpurge.1,
+	  htsearch.1, htstat.1, rundig.1, htdigconfig.8) provided by
+	  Robert Ribnitz <ribnitz at linuxbourg.ch> of the Debian Project
+	* installdir/Makefile.am: prepared the automake script for correctly
+	  handling the man pages
+
+Sat 21 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/htsearch.cc:
+	  Back out change of 21 December, as it causes problems with characters
+	  which *should* be unencded, like /
+
+Thu 19 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* aclocal.m4, acinclude.m4, configure.in:
+	  Remove duplicate tests for zlib
+	  Fix tests for SSL (Fixes bug #829081)
+	  Fix configure --help formatting
+
+	* htdoc/*.[18].in, htdoc/Makefile.am, configure.in: Added man pages
+
+	* htdoc/attrs.html.in, htdoc/cf_generate.pl, htdoc/Makefile.am:
+	  Fill in #define'd attribs (Fixes bug #692125)
+
+	* test/Makefile.am: Incorporate new tests in make check
+
+	* test/t_htdig, test/t_parsing: suppress unwanted diagnostics
+
+	* STATUS: list Cygwin bug (#814268)
+
+	* htcommon/default.cc:
+	  added wordlist_cache_inserts, remove worlist_cache_dirty_level
+
+	* configure, */Makefile.in, */Makefile, htdoc/cf_by{name,prog}.html:
+	  regenerated
+
+Fri 13 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* db/mp_cmpr.c: Fix bug with --without-zlib
+
+Sun 8 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/URL.cc: Make server_alias case insensitive.
+
+	* htdig/Document.cc: Don't hex-decode twice. (Caused problems with names
+	  like file%20name)
+
+	* htdig/Retriever.cc: Test validity of URL value *before* calling
+	  signature(), as that implictly normalises, and confuses
+	  limit_normalised vs limit_urls_to
+
+	* htdig/htdig.cc: Remove stale md5_db if -i specified
+
+	* installdir/htdig.conf: Set common_url_parts to contain all strings
+	  which *must* be in a valid URL. Probably contains whole domain name,
+	  so more compression than using standard strings.
+
+	* htcommon/defaults.cc: Update docs. Remove default "bad_extensions"
+	  from common_url_parts, and add .shtml
+
+	* test/t_htdig, test/t_htdig_local: Update self-tests
+
+Tue Feb 3 18:06:38 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htcommon/HtConfiguration.cc: changed the Find method in order not to
+	  ignore empty string results for string attributes whenever they are
+	  defined in the configuration file by the user
+	* htdig/Document.cc: fixed bugs in handling the http_proxy,
+	  http_proxy_authorization, authorization attributes
+	* htlib/Configuration.[h,cc]: added the Exists method in order to query
+	  whether an attribute's definition is present in the configuration
+	  dictionary (before it was checked against its string's length which
+	  prevented empty attributes to be correctly used)
+	* these changes fix bug #887552
+
+Sun 18 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/URL.cc, test/url.cc:
+	  Rename "allow_dbl_slash" to "allow_double_slash", to match defaults.cc
+
+	* htcommon/default.cc, htdoc/{hts_temlates,attrs}.html:
+	  Explain that keywords_factor applies to meta keywords. Fix old typo.
+
+	* test/t_{factors,templates}, test/htdocs/set1/{title.html,bad_local.htm}
+	* test/conf/entry-template:
+	  Expanded test suite.
+
+Sat 17 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* test/t_{parsing,htdig_local,factors,templates},
+	* test/htdocs/set1/title.html:
+	  Expanded test suite.
+
+Sat 17 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/DocumentRef.cc:
+	  Fix old-style use of HtConfiguration, so defaults are read correctly.
+	  Causes max_descriptions to be treated correctly.
+
+	* htcommon/default.cc, htdoc/{hts_temlates,attrs,cf_byname,cf_byprog}.html:
+	  Explain that max_description{s,_length} don't affect indexing -- only
+	  text used to fill in template variables.
+
+Mon 12 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* Very many files: Fix bug #873965
+	  Replace C++ style comments with C style comments in all C files, and .h
+	  files they include.
+	  Also, change //_WIN32 to /* _WIN32 */ in .cc files for uniformity.
+
+Mon 12 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* test/t_parsing, test/test_functions.in: Add new tests
+	* htcommon/default.cc, htdoc/hts_templates.html: Cross-ref documentation.
+
+Mon Dec 29 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/Retriever.cc:
+	  Fix bug in which validity of first URL from each server was not checked.
+
+Mon Dec 29 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/htdig.cc, htdoc/htdig.html: Fix bug #845054
+	  Fix behaviour of -m and additional list of urls at the end of a command.
+	  In either case, "-" denotes stdin.
+
+Mon Dec 29 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* installdir/rundig, installdir/Makefile.{in,am}: Address bug #860708
+	  Make bin/rundig -a handle multiple database directories
+
+Sun Dec 21 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/htsearch.cc:
+	  Improve handling of restrict/exclude URLs with spaces or encoded chars
+
+Sun Dec 21 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/HtURLSeedScore.cc, htsearch/SplitMatches.cc: Fix bug #863860
+	  Split patterns at "|".
+	  For SplitMatches, make "*" only match if all other patterns fail.
+
+Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/Server.cc: Fix bug #851303.
+	  Allow indexing if robots.txt has an empty "disallow".
+
+	* test/t_htdig, test/t_htsearch, test/htdocs/robots.txt:
+	  Tests for the above.
+
+Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/htdig.cc, test/t_factors: Warn if config file has obsolete fields.
+
+Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/Display.cc: Apply Gilles's patch for ellipses bug #844828.
+
+Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* test/{t_validwords,t_templates,t_fuzzy,t_factors}
+	* test/{set_attr,synonym_dict,dummy.stems,dummy.affixes,bad_word_list}
+	* test/conf/main-template test/htdocs/set1/{site2.html,site4.html}:
+	  Added four new tests to test suite. Not included in "make check",
+	  but can be run explicitly by "make TESTS=t_... check".
+
+Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/conf_lexer.{lxx,cxx}:
+	  Back out changes to try to accept files without EOL :(
+
+Sat Dec 13 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/defaults.{cc,xml}, htdoc/{attrs,cf_byprog}.html:
+	  Fix "used by" for max_excerpts, and resulting hyperlinks.
+
+Sat Nov 22 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/conf_lexer.{lxx,cxx}, htcommon/conf_parser.{yxx,cxx}:
+	  Partially address bug #823455.
+	  Don't complain if config file doesn't end in EOL.
+	  Should the grammar be fixed not to need EOL?
+	  Report errors to stderr, not stdout, as they confuse the web server.
+
+Sun Nov 9 14:44:02 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* Tagged release htdig-3-2-0b5
+
+Sat Nov 8 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/default.cc, htsearch/parser.cc: Fix bug #825877
+	  Reduce backlink_factor to comparable with other factors, and
+	  interpret multimatch_factor as the *bonus* given for multiple matches.
+
+Sat Nov 1 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/parser.cc: Fix bug #806419.
+	  Ignore bad words at start of phrase.
+
+Tue Oct 28 11:58:06 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdig/htdig.cc: set the debug level when we are importing a cookie file.
+	  Fix bug #831478.
+
+Mon Oct 27 17:13:02 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdig/Server.cc: Fix bug #831407. Make sure time properly reset after
+	  delay completed, so that it doesn't allow 2 connections per delay.
+
+Mon Oct 27 15:57:38 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdoc/THANKS.html: Added Lachlan, Jim and Neal to the active developers
+	  list.
+
+Sun Oct 26 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdoc/hts_templates.html: Clarify that PREV/NEXTPAGE template variables
+	  are empty if there is only one page, ignoring no_{prev,next}_page_text.
+
+Sun Oct 26 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/defaults.cc: Fixed documentation to close bug #829767
+	  Clarified that noindex_start/end do not get replaced by whitespace.
+	  Also removed spurious '>' from start of boolean_syntax_errors, and
+	  added missing '#' to many local <a href> tags.
+
+Sun Oct 26 12:42:27 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htcommon/defaults.cc: Fixed description of 'head_before_get' after
+	  Lachlan fixes.
+	* htdoc/attrs.html: rerun cf_generate.pl
+
+Sat Oct 25 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/Display.cc: Fix #829761.
+	  If last component of the URL is used as a title, URL-decode it.
+
+Sat Oct 25 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/Server.cc: Fix #829754. Avoid calculations with negative time
+
+Fri Oct 24 17:17:15 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdoc/htdig.html, htdoc/meta.html, htdoc/require.html: Update URL for
+	  the Standard for Robot Exclusion.
+
+	* htdoc/htmerge.html: Added two clarifications to -m option description.
+
+	* htdoc/cf_types.html: Make clear distinction between String List and
+	  Quoted String List.
+
+Fri Oct 24 15:30:08 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htsearch/Display.cc: Fix bug #829746. Applied Niel Kohl's fix for this,
+	  to check if words input given before trying to use it, to avoid NULL
+	  argument to syslog().
+
+Fri Oct 24 15:15:53 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htsearch/Display.cc: Fix bug #578570. The enddate handling now works
+	  correctly for a large, negative startday value.
+
+Fri Oct 24 12:47:51 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdig/HTML.cc (ctor): Fix obvious typo in metadatetags.Pattern setting.
+
+Thu Oct 23 10:27:18 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/default.cc: Fix bug #828808. Default startyear to empty
+	  Document "startyear defaults to 1970 if a start/end date set".
+
+Thu Oct 23 12:14:30 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdig/htdig.cc: restored the code before Oct 21 (fixes ##828628)
+
+Thu Oct 23 11:41:15 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdig/Retriever.[h,cc]: removed 'head_before_get' overriding by
+	  restoring the code before Oct 21.
+	* htdig/Document.[h,cc]: ditto, with the exception of detaching the HEAD
+	  before GET mechanism from the persistent connections'.
+	* htcommon/defaults.cc: improved documentation (even though it needs
+	  corrections by an english-speaking developer).
+	* These changes fix bug #828628
+
+Wed Oct 22 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/parser.cc: Applied Neal's patch to fix bug #823403
+	  Documents only added to search list if they were successfully dug.
+	  Lines 237-238 of htsearch/Display.cc
+	      if (!ref || ref->DocState() != Reference_normal)
+	          continue;
+	  should now be redundant. (Left in to be defensive.)
+
+Tue Oct 21 11:04:56 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdig/Retriever.h: added the 'RetrieverType' enum and an object variable
+	  for storing the type of dig we are performing (default initial);
+	* htdig/Retriever.cc: changed constructor in order to handle the type,
+	  added some debugging explanation regarding the override of the
+	  'head_before_get' attribute, added checks regarding an empty
+	  database of URLs to be updated (set the type to initial).
+	* htdig/Document.h: added the attribute 'is_initial' which stores the
+	  information regarding the type of indexing (initial or incremental)
+	  we are currently performing. Added access methods (get-and-set-like)
+	* htdig/Document.cc: modified the logic of the HeadBeforeGet settings during
+	  the retrieval phase, in order to always override user's settings in
+	  an incremental dig and automatically set the 'HEAD' call in this case.
+	* htcommon/defaults.cc: modified the default value of 'head_before_get' and a bit
+	  of its explanation.
+	* htnet/HtHTTP.cc: detached the HEAD before GET mechanism to the persistent
+	  connections one
+	* htdig/Server.cc: added one level of debugging to the display of the
+	  server settings in the server constructor
+
+Fri Oct 17 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htword/WordType.cc, htcommon/defaults.cc: Patched to fix bug #823083
+	  Don't assume IsStrictChar returns false for digits.
+	  Clarify behaviour of allow_numbers in the documentation.
+
+Fri Oct 17 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/defaults.cc: Patched to fix bug #823455
+	  Escaped "$" in valid_punctuation, and add warnings about $, \ and `.
+
+Wed Oct 15 11:12:52 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htdig/Server.cc (robotstxt): Patched to fix bug #765726.
+	  Don't block paths with subpaths excluded by robots.txt, and make
+	  sure any regex meta characters are properly escaped.
+
+Tue Oct 14 11:54:07 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htnet/HtHTTP.cc: add an empty Accept-Encoding header - this inform the
+	  server that htdig is only able to manage documents that are not encoded
+	  (if no Accept-Encoding is sent, the server assumes that the client is
+	  capable of handling every content encoding - i.e. zipped documents with
+	  Apache's mod_gzip module). Partial fix of bug #594790 (which now becomes a
+	  feature request)
+
+Mon Oct 13 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htfuzzy/Regex.cc: Search for regular expression. (Used to ignore it!)
+
+	* htfuzzy/Speling.cc, htword/{WordList.cc,WordList.h,WordKey.cc,WordKey.h}:
+	  When looking in word database for misspelt words, don't ask to match
+	  trailing numeric fields in database key.
+
+	* htcommon/defaults.cc, htdoc/htfuzzy.cc: Update docs.
+
+Sun Oct 12 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/htsearch.cc:
+	  Fix bug if fuzzy algorithms produced no search words.
+	  Send all debugging output to cerr not cout. More debugging output.
+
+Sun Oct 12 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/{Retriever,Server}.cc: Back out the previous.
+	  Gilles pointed out inconsistency with Retriever::IsValidURL().
+
+Sun Oct 5 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htdig/{Retriever,Server}.cc: Jim Cole's patch to bug #765726.
+	  Don't block paths with subpaths excluded by robots.txt.
+
+Sun Oct 5 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htsearch/htsearch.cc: Highlight phrases containing stop words
+	* test/t_htsearch, test/conf/htdig.conf.in: Tests for the above
+
+Sat Sep 27 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* test/{test_functions.in,t_htdig,t_htdig_local,t_htnet}:
+	  Don't assume shell "." command passes arguments. (Doesn't on FreeBSD.)
+
+Sat Sep 27 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htlib/HtDateTime.h, htnet/HtCookie.cc:
+	  Avoid ambiguous function call on systems (HP-UX) where time_t=int
+
+Fri Aug 29 09:35:46 MDT 2003 Neal Richter <nealr at rightnow.com>
+
+	* removed references to CDB___mp_dirty_level ,CDB_set_mp_diry_level()
+	  & CDB_get_mp_diry_level()
+
+	* The config verb 'wordlist_cache_dirty_level' was left for possible use in
+	  the future.
+
+Thu Aug 28 15:11:21 MDT 2003 Neal Richter <nealr at rightnow.com>
+
+	* Changed db/LICENSE file to new LGPL compatible license from Sleepycat
+	  Software -- Thanks Sleepycat!
+
+	* Reverted to Revision 1.2 or db/mp_alloc.c The recent changed cuased
+	  large DB growth. Strangely the files contained no 'new' data, they were
+	  just much larger. Looks like the pages were being flushed too often????
+
+Thu Aug 28 12:41:22 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* global: updated with 'autoreconf -if' (autoconf 2.57, libtool 1.5.0a and
+	  automake 1.7.6)
+	* 'make check' successful on: AMD64 Linux 2.4, Alpha Linux 2.2,
+	  RedHat Linux 7.3 (2.4), SPARC Ultra60 Linux 2.4,
+	  Sparc R220 Sun Solaris (5.8).
+	* README.developer: added further info
+
+Thu Aug 28 12:00:10 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* db/[config.guess,config.sub,install-sh,ltmain.sh,missing]: added in the
+	  database directory (this way 'make dist' goes on); I have not been able to
+	  tell the db/configure script to get the 'top_srcdir' ones (which should be
+	  the default behaviour). Maybe in the future we'll look for this.
+
+Thu Aug 28 11:53:48 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* db/configure.in: changed AC_PROG_INSTALL() to AC_PROG_INSTALL and removed
+	  AC_CONFIG_AUX_DIR; this implies that autotools copies will be made for the
+	  db directory as well.
+
+Thu Aug 28 11:36:42 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* [htcommon,htdb,htdig,htfuzzy,htlib,htnet,htsearch,httools,htword,test]/Makefile.am:
+	  added the option above to every *_LDFLAGS
+
+Thu Aug 28 11:30:39 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* Makefile.am: removed acconfig.h from the EXTRA_DIST list
+
+Thu Aug 28 11:25:07 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* configure.in: removed portability checks for error, stat and lstat that
+	  caused a compile errors on Solaris. Added the '-mimpure-text'
+	  extra ld flag for GCC on solaris systems (a linkage error occurs
+	  when libstdc++ is not shared)
+
+Thu Aug 28 11:22:57 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* include/Makefile.am: changed htconfig.h.in into config.h.in
+
+Thu Aug 28 11:16:19 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htlib/error.[h,c]: removed for now, until replacement functions will be
+	  correctly performed.
+
+Thu Aug 28 11:11:32 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdoc/cf_generate.pl: fixed an error when opening tail and head files
+	* Makefile.am: enabled rebuild from a different directory (it is used
+	  my 'make dist')
+
+Thu Aug 28 10:46:35 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htlib/malloc.c: modified according to autoconf specifications as far
+	  as replacement functions are regarded
+	* htlib/[lstat, stat].c: removed for now
+
+Thu Aug 28 10:40:58 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htdoc/cf_generate.pl: accept an optional parameter (top source directory)
+	* htcommon/defaults.cc: fixed some broken lines which prevented
+	  cf_generate.pl from correctly working
+	* htdoc/Makefile.am: modified the automake file for passing the top
+	  source directory to cf_generate.pl
+	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
+	  Regenerated using cf_generate.pl.
+
+Tue Aug 26 12:25:40 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* configure.in: removed AC_FUNC_MKTIME because it may not work properly
+	  and added default replacement directory (htlib) for future uses
+	* htlib/Makefile.am: back-step with re-inclusion of mktime.c in the
+	  list of files to be always compiled (caused linking errors
+	  for the __mktime_internal function)
+	* global: updated with 'autoreconf -if'
+
+Sun Aug 24 12:44:29 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* updated with 'autoreconf -if': autoconf 2.57, automake 1.7.6 and
+	  libtool 1.5.0a (autotools that come with Debian SID)
+
+Sun Aug 24 12:39:34 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* configure.in: moved AC_PROG_LEX to AM_PROG_LEX
+	* db/configure.in: enabled AM_MAINTAINER_MODE which prevented users without
+	  autotools to configure and compile the program (relatively to the db
+	  directory)
+	* include/htconfig.h: previously excluded from the branch (severe error!)
+
+Mon Jul 21 20:54:47 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* htlib/(malloc|error|lstat|stat|realloc).c: added for cross-compiling
+	  reasons (as suggested by automake)
+	* htlib/error.h: ditto
+	* db/acconfig.h: removed as suggested by autotools' new versions
+	* configure.in: removed AC_PROG_RANLIB (overriden by AC_PROG_LIBTOOL)
+	* updated as of rerun 'autoreconf -if'
+
+Mon Jul 21 10:08:24 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* Patch provided by Marco Nenciarini <mnencia at linux.it> has been
+	  completely applied; the patch adds support for detection
+	  of standard C++ library
+	* all sources using <iostream.h> <fstream.h> <iomanip.h>: modified
+	  to use standard ISO C++ library, if present
+	* db/configure scripts: modified for autoconf 2.57
+
+Mon Jul 21 09:59:16 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* [.,*]/Makefile.in: regenerated by new automake against new configure.in
+	* Makefile.config: now looking for the global configuration file
+	  in the source directory
+
+Mon Jul 21 09:49:22 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* configure.in: completely rewritten, deprecated directives have
+	  been removed and now version 2.57 is a prerequisite.
+	* acinclude.m4: moved all the macros here
+	* aclocal.m4, configure: regenerated by aclocal and autoconf
+	* acconfig.h: removed as now it is deprecated
+	* include/htconfig.h.in: removed, as 'config.h.in' is preferred
+	  and auto-generated
+	* config.[guess,sub]: updated with newer versions
+
+Tue Jul 8 16:29:44 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
+
+	* htsearch/parser.cc (checkSyntax): Fixed boolean_syntax_errors
+	  handling to work over multiple config files.
+
+Mon Jul 7 00:41:55 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
+
+	* Updated to autoconf 2.57, libtool 1.5 and automake 1.7.5
+	* removed acconfig.h files
+	* autoconf include file is now include/config.h (for autoheader)
+	* include/htconfig.h.in renamed in include/htconfig.h: now includes
+	  config.h and redefines the bool types
+	* htlib/HtRegexList.cc, htdig/(Document.cc|ExternalParser.cc): removed
+	  TRUE and FALSE and converted to C++ standard values
+
+Sat Jul 5 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* test/test_functions.in: Fix bugs starting/killing apache
+
+Sat Jul 5 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* htcommon/defaults.cc: Disable cache flushing to avoid "page leak".
+
+Tue Jun 24 2003 Neal Richter <nearl at rightnow.com>
+
+	* Update Copyright Notices in code & documentation to 2003
+
+	* Changed License Notice GPL -> LGPL License change (Decided by HtDig
+	  Board & Membership October 2002
+
+Mon Jun 23 2003 Neal Richter <nearl at rightnow.com>
+
+	* Raft of changes. Most todo with Native Win32 support
+
+	* TODO: ExternalTranport & ExternalParser are effectively dissabled with
+	  #ifdefs for Native WIN32
+
+	* remove global CDB___mp_dirty_level variable and subsitute functions to set/get variable
+
+	* Added local copies of GNU LGPL regex, POSIX-like dirent routines, getopt
+	  library and filecopy routines - mainly for Native WIN32 support
+
+	* improve IsValidURL with return codes (htdig/Retriever.cc)
+
+	* lots of improvements/new-features to libhtdig
+
+Sun Jun 22 2003 Lachlan Andrew <lha at users.sourceforge.net>
+
+	* db/mp_cmpr.c (CDB___memp_cmpr_open):
+	  Make weak compression database standalone to avoid recursion
+	  This *should* fix all of the recent problems with dirty cache etc.
+ + * test/search.cc: Don't take sizeof zero sized array + +Fri Jun 20 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * configure,aclocal.m4,acinclude.m4: --with-ssl set CPPFLAGS, not CFLAGS + +Fri Jun 20 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/configure: Hack which should allow select to be detected on HP/UX + + * db/db.c: Replace HAVE_ZLIB with HAVE_LIBZ (as set by configure) + + * htword/wordKey.cc: More descriptive error message + + (Changes to compile with Sun's C++) + * htnet/{HtCookie.cc,HtFTP.cc,Transport.cc}: + Assign substring of const string to const pointer. + * htsearch/ResultMatch.h: + Allow use of SortType in ResultMatch::setSortType() + * test/search.cc: Don't take sizeof(variable size array) + * htdb/htdb_stat.cc: avoid name clash for global var internal + * htcommon/URL.h, htlib/HtTime.h, htlib/htString.h, htnet/Connection.h, + htword/WordBitCompress.h: + Cast default args of type string literal to type (char*) + + * htdocs/require.html: Remove email address. + + * htlib/gregex.h: Avoid warning if __restrict_arr already defined + +Sun Jun 14 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htcommon/defaults.cc: + Set wordlist_cache_dirty_level to 1 (it most conservative value). + Miscellaneous reformatting. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerated using cf_generate.pl. + + * htdoc/{require.html,meta.html,all.html,meta.html}: + Update disk usage for phrase searching. + Updated list of supported platforms. More hyperlinks. + +Fri Jun 13 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htsearch/Display.cc (setVariables), htdocs/hts_template.html: + Set MATCH_MESSAGE from method_names (for internationalisability). + Removed all trace of hack for config attribute... 
+ +Thu Jun 12 14:16:05 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc (main): Fixed boolean_keywords handling to + work over multiple config files (must destroy old list before + creating new one). + + * htcommon/defaults.cc, htsearch/Display.cc (setVariables): Removed + incorrect default value for "config" attribute, and removed hack + that attempted to correct it. + + * htdoc/attrs.html: Regenerated using cf_generate.pl. + +Thu Jun 12 13:28:01 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc, htcommon/HtSGMLCodec.cc (ctor): Added + translate_latin1 option to allow disable Latin 1 specific SGML + translations. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerated using cf_generate.pl. + +Mon Jun 9 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htsearch/htsearch.cc: Fixed setupWords loop for junk at end of query + +Mon Jun 9 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htsearch/Display.cc: Set CONFIG template variable to the base name + of the config file (no directory or .conf), as expected by htsearch + +Mon Jun 9 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * test/test_functions.in: avoid trying killing apache multiple times + + * configure,configure.in: Reformat --help output + * htdoc/FAQ.html: Brought up-to-date with main docs + * htdoc/hts_templates.html: added hyperlinks. 
+ * installdir/search.html: Display version + +Sun Jun 8 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * configure: Hack to set --disable-bigfile for Solaris (with Sun cc) + and --disable-shared --enable-static for Mac OS X + + * test/{test_functions.in,t_htdig,t_htdig_local,t_htnet}: + Only start Apache for tests which need it, and kill it after the test + + * contrib/parse_doc.pl: Allow file names containing spaces (from .deb) + +Mon Jun 2 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/mp_cmpr.c: Add default zlib setting to default_cmpr_info + * htcommon/defaults.cc, htword/WordDBCompress.cc: Fix docs to say + default compression by 8 (not by 3, which I had "fixed" it to...) + + * htcommon/conf_lexer.{cxx,lxx}: Avoid warnings, and document hack. + +Thu May 29 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/mp_cmpr.c: Fix comparison of -1 and unsigned which broke SunOS cc + * htdoc/install.html: Warn SunOS cc users to --disable-bigfile + + * htcommon/conf_lexer.cxx: Suppress warnings of unused identifiers + * test/con/htdig.conf2.in: Disable testing of content_classifier + attribute, as didn't work until after installation + +Tue May 27 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/configure, db/ac{local,include}.m4: + Stop test for zlib from adding -I/default/path (*this* time...) 
+ + * htword/DBPage.h: Fix bug introduce in previous patch + + * test/Makefile.{in,am}: Replace non-portable make -C X by cd X; make + +Tue May 27 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * {,db/}configure, {,db/}ac{local,include}.m4: + Stop test for zlib from adding -I/default/path (broke SunOS cc) + Fix -Wall test if CCC is g++ but CC is not gcc + + * test/dbbench.cc: #include <fcntl.h> later, to avoid #define open + causing problems + + * includedir/synonyms: Remove trailing blank line which caused warning + * htnet/HtCookieInFileJar.cc,htfuzzy/Synonym.cc: .get() to stop warnings + * htlib/mhash_md5.c: char -> unsigned char to stop warnings + * test/search.cc, htword/WordDBPage.h: + Casts to (int) to stop printf warnings. ALLIGN -> ALIGN + +Sat May 24 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htcommon/defaults.cc: Keep more wordlist cache pages clean + + * {,db/}configure{,.in}, {,db/}ac{local,include}.m4: + Patch by Richard Munroe to test if -Wno-deprecated needed. + Many bug fixes / extra search paths added. + + * include/htconfig.h.in, db/db_config.h.in: + Only '#define const' if not C++ (htword/WordDB.cc uses db_config.h) + * test/dbbench.cc: check for alloca even if gcc + * test/t_url: used grep -C instead of grep -c (for portability) + * db/mp_{alloc,cmpr}.c: Removed/replaced C++ style comments + + * htdoc/require.html: Revised list of supported platforms + +Thu May 22 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htnet/HtFile.cc: Fix previous .get() patch... + +Thu May 22 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htlib/DB_2.cc: Set wordlist_cache_dirty_level before opening + database, to avoid database memory allocation problem. + + * db/db_err.c: Make 'fatal' errors actually exit. + + * htdig/Document.cc, htsearch/parser.cc, htdig/htdig.cc, + * htnet/Ht{HTTP,File}.cc: + Add .get() to use of strings to avoid compiler warnings (FreeBSD). 
+ +Thu May 22 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * ltmain.sh, test/Makefile.in: Hack to list library dependencies + multiple times in g++ command, to get MacOS X to 'make check'. + + * test/{search,word}.cc: cast sizeof() to (int) to avoid warnings. + + * htdoc/install.html: Documented MacOS X's shared libraries problem. + +Sun May 18 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/mp_alloc.c: Hopefully the *last* fix for this morning's patch... + + * configure, aclocal.m4, acinclude.m4: + Look for httpd modules in .../libexec/httpd for OS X + * test/conf/httpd.conf: Disabled mod_auth_db, mod_log{agent,referer}. + +Sun May 18 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/db.h.in: Declare variable introduced in db/mp_cmpr.c patch + +Sun May 18 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * db/mp.h, db/mp_{alloc,bh,cmpr,region}.c, + * htword/WordDB.cc, htdig/htdig.cc: + Avoid infinite loop if memp_alloc has only dirty, + "weakly compressed" (i.e. overflow) pages. + * htcommon/defaults.cc: Document the above, plus misc updates. + + * htword/WordDBPage.h: + Cast sizeof() to (int) in printf()s to avoid compiler warnings. + +Sun APR 20 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htdig/htdig.cc: delete db.words.db_weakcmpr if -i specified. 
+ +Wed Feb 26 22:10:40 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc: fixed colon (':') problem with HTTP header parsing, + as Frank Passek, Gilles and others suggested, as space is not + mandatory between the field declaration and the field value returned + by the server + +Sun Feb 23 10:20:58 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htcommon/defaults.[cc,xml]: added the 'cookies_input_file' + configuration attribute for pre-loading cookies in memory + * htdig/htdig.cc: added the feature above; the code automatically + loads the cookies from the input file into the 'jar' that will be + used during the crawl. + +Sun Feb 23 10:16:08 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.h: removed the NULL pointer check before assigning a + new jar to the HTTP code + +Tue Feb 11 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htcommon/defaults.cc: Set default compression_level to 6, + which enables Neal's wordlist_compression_zlib flag. + +Tue Feb 11 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htcommon/{DocumentRef.h, HtWordReference.h}, + htsearch/WeightWord.{cc,h}, + htsearch/parser.{cc,h}, htsearch/htsearch.cc: + Added field-restricted searching, by title:word or author:word + + * htdig/ExternalParser.cc, htdig/HTML.{cc,h}, htdig/Parsable.{cc,h}, + htdig/Retriever.{cc,h}: + Parse author from <meta ...> tags. Also moved some common + functionality from HTML/ExternalParser into Parsable. + + * test/t_htsearch, htcommon/defaults.cc, + htdoc/{TODO.html,hts_general.html,hts_method.html}: + Test and document the above + +Sun Feb 9 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htdig/HTML.cc: fix bug in detection of deprecated noindex_start/end + * htsearch/Display.cc: try harder to find value for DBL_MAX #680836 + * htcommon/defaults.cc: fixed typos. 
+ +Sat Feb 1 13:57:17 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie.[h,cc]: allowed printDebug to be passed an ostream object + * htnet/HtCookieMemJar.cc: removed a debug call + +Thu Jan 30 19:28:32 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * configure.in: used AC_LIBOBJ instead of deprecated LTLIBOBJS's workaround + * ltconfig: removed as not needed anymore since libtool 1.4 + * db/configure.in: added AC_CONFIG_AUX_DIR(../) for letting automake know to use + the main ltmain.sh file + * configure, aclocal.m4, Makefile.in, */Makefile.in, config.guess, config.sub, + install-sh, ltmain.sh, missing, mkinstalldirs: re-generated by autotools: + aclocal, autoconf 2.57, automake 1.6.3 and libtool 1.4.3 + * db/aclocal.m4, db/configure, db/mkinstalldirs: ditto + +Thu Jan 30 00:16:51 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htsearch/htsearch.cc: removed a warning due to a not-initialized pointer + +Wed Jan 29 22:53:25 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * acinclude.m4: included the function for checking against SSL, as + found in the ac-archive. + +Tue Jan 28 12:23:16 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/Makefile.am: added HtCookieInFileJar.[h,cc] files + * installdir/cookies.txt: example file for pre-loading HTTP cookies + * installdir/Makefile.am: added cookies.txt + +Tue Jan 28 12:16:28 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookieMemJar.[h,cc]: performed deep copy of the jar in the copy constructor + +Tue Jan 28 12:13:44 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie.[h,cc]: added the constructor of a cookie object from a line + of a cookie input file (Netscape's way): if an expiration value of '0' is set + through the cookies input file, the cookie is managed as a session cookie. 
+ Improved copy constructor, solving a bug related to the expires field. + +Tue Jan 28 12:11:27 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookieInFileJar.[h,cc]: class for importing cookies from a text file + +Tue Jan 28 12:08:20 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htlib/HtDateTime.h: added the constructor HtDateTime(const int) + +Sat Jan 25 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htsearch/Display.cc: Convert "<br>\n" in $(DESCRIPTION) to "<br>" + so it can be used in Javascript (feature request #529926). + +Tue Jan 21 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * HTML.cc (HTML, parse): Handle noindex_start/end as string lists. + + * test/{t_htsearch,htdocs/set1/script}: Test the above + + * htcomon/defaults.cc: + Add "<SCRIPT" to default noindex_start/end (feature request #586359). + + + * htlib/String.cc (operator>> (istream&,String&) ): + Exit loop when getline fails for reasons other than a full buffer. + + * htnet/HtFile.cc (File2Mime), installdir/HtFileType: + Allow file names containing spaces. + +Sat Jan 11 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htnet/HtFile.cc (Request), htdig/Document.cc (RetrieveLocal), + htcommon/URL.h htcommon/URLTrans.cc: + Decode URL paths before use as local filenames (file:/// & local_urls). + + * test/{t_htdig,t_htdig_local,t_htsearch}, test/conf/htdig.conf2.in, + test/htdocs/set1/{index.html,site 1,sub%20dir/empty file.html}: + Tests for the above. + + * htcommon/HtConfiguration.cc: brackets around assignment in 'if'. + * test/search.cc (LocationCompare): Only specify default arg once. + +Fri Jan 10 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htlib/String.cc (operator>> (istream&,String&) ): + Check status of stream, no return value of get(). + Fixes bug (for some C++ libs) where reading stops at a blank line. 
+ +Fri Jan 1 2003 Lachlan Andrew <lha at users.sourceforge.net> + + * htnet/HtFile.cc(Ext2Mime,Request), htdig/Document.cc(RetrieveLocal): + Determine local files' MIME types from mime.types, not hard-coded. + URLs matching attribute "bad_local_extensions" must use their true + transport protocol (HTTP for http://, filesystem for file:///). + + * htnet/HtFile.cc (File2Mime, Request): For file:/// URLs only, + files without (or with unrecognised) extensions are checked by + the program specfied by the "content_classifier" attribute. + + * htnet/htFile.cc (Request): Symbolic links are treated as + redirects, to avoid problems with relative references. + + * htcommon/defaults.cc: Documented the above (and added crossrefs). + + * test/t_ht{dig,dig_local,search}, test/htdocs/set1/*, + test/conf/htdig.conf2.in: Add tests for bad_local_extensions. + +Mon Dec 31 2002 Lachlan Andrew <lha at users.sourceforge.net> + + * configure.in,htfuzzy/EndingsDB.cc,htlib/{HtR,r}egex.h,Makefile.am: + Renamed regex.h to gregex.h and allow use of rx instead. + + * htcommon/defaults.cc,htdocs/{attrs,cf_byprog,cf_byname}.html: + Fixed typo in cross-references to restrict and limit_urls_to. + + * test/t_htmerge: Re-enabled htmerge command (discarding output). + + * test/Makefile,test/conf/htdig.conf3.in: Added conf3 and fixed db path. + +Mon Dec 30 2002 Lachlan Andrew <lha at users.sourceforge.net> + + * contrib/doc2html/*: Incorporated David Adams' latest version, 3.0.1. + +Mon Dec 30 2002 Lachlan Andrew <lha at users.sourcefourge.net> + + Forward-ported several patches from 3.1.6: + + * htdig/ExternalParser.cc: Added "description_meta_tag_names" attrib. + Added "dc.date|dc.date.created|dc.date.modified" synonyms for "date". + Allow spaces between "url" and "=" in refresh. + Fixed bug in flag positions. + Added "use_doc_date" attribute. + + * htdig/HTML.cc: Added "description \_meta_tag_names" attribute. + Added "dc.date|..." synonyms. + Added "ignore_alt_text" attribute. 
+ + * htdig/Retriever.cc: Added "ignore_dead_servers" attribute. + Added call to "url.rewrite() in got_href(). + + * htdig/FAQ.html: Latest version now 3.1.6. Mention old security hole. + Describe external converters for PostScript etc. + Mention pdf_parser not supported in 3.2. + + * htdoc/{attrs,cf_byname,cf_byprog}.html: New attributes added + (automatically from defaults.cc). + + * htdoc/htmerge.html: Update for multiple database support. + + * htdoc/hts_form.html: Describe relative/incomplete dates. + + * htdoc/require.html: Describe phrase searching, external parsers, + external transports. + Added some new supported systems. (Commented out as testing + incomplete.) + + * htfuzzy/Synonym.cc: Protect against "synonym" entries with one word. + + * htlib/String.cc: Protect against negative string lengths. + + * htsearch/Display.{cc,h}: Added "search_result_contenttype" attribute, + and corresponding displayHTTPheaders() function. + Rewrite URLs. + Remove old "ANCHOR" variable. + Handle relative dates. + Added "max_excerpts" attribute and buildExcerpts() function. + Added "anchor_target" attribute. + + * htsearch/DocMatch.h: Added "orMatches" + + * htsearch/htsearch.cc: Added "boolean_keywords" attribute. + Rewrite URLs. + + * htsearch/parser.cc: Added "boolean_syntax_errors" attribute. + Added wildcard search. + Fixed bug in perform_phrase() so it now handles "bad words" and + short words properly. + Added "multimatch_factor" to give greater weight to documents matching + multiple "OR" terms. + + * htsearch/htparser.h: Added boolean_keywords support. + + * htcommon/defaults.{cc,xml}: New attributes added, and enhanced + descriptions + + + Cleaned code to remove some compiler warnings/errors: + + * htcommon/HtConfiguration.cc: Brackets around assignment 'path=' + inside 'if' + + * htdig/Server.cc, htsearch/Display.cc: + Added ".get()" when strings passed as arguments. 
+ + * htlib/StringMatch.h, htword/WordBitCompress.h: + Explicit cast of NULL to (char*)NULL for broken C++ compilers. + + + Also: + + * STATUS: Removed "not all htsearch input parameters handled properly", + "Return all URLs", "Turn on URL parser test", + "htsearch phrase support tests". + Reduced list of things to do for "require.html". + + + * test/t_htsearch, test/conf/htdig.conf3.in: + Added testing of phrases and boolean_keywords / boolean_syntax_errors. + +Thu Nov 28 09:02:46 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * installdir/english.0: Removed S flag from birth, because it doesn't + do what we want (birthes, not births). + +Tue Nov 26 23:16:08 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/hts_form.html: Fixed typo in link & description for restrict. + +Tue Nov 26 22:30:06 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * installdir/english.0: Patched with Lachlan Andrew's changes, fixing + lots of dubious uses of suffixes to get more appropriate and correct + fuzzy endings expansions. + + * installdir/synonyms: Updated with the version contributed by + David Adams, with minor changes. Kept old one as synonyms.original. + +Mon Nov 4 10:44:35 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htcommon/URL.[h,cc]: added the assignment operator + +Sun Oct 27 09:29:02 2002 Geoffrey Hutchison <ghutchis at localhost> + + Merge in word DB zlib patch from Neal Richter. + + * db/db.h.in, db/mp_cmpr.c, htword/WordList.cc, + htword/WordDBCompress.h, htword/WordDBCompress.cc: Add support for + using the zlib compression (and compression level) if specified by + the new wordlist_compress_zlib, which is "true" by default. + + * htcommon/defaults.cc: Add attribute wordlist_compress_zlib as + above. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Update using cf_generate.pl. 
+ +Sat Oct 26 21:59:01 2002 Geoffrey Hutchison <ghutchis at localhost> + + Merge in fixes from Lachlan Andrew + + * test/Makefile.am, test/Makefile.in, test/t_url, test/url.cc, + test/url.children, test/url.parents, test/url.output: Add URL + tests to the automatic test suite (rather than requiring them to + be run manually). + + * */Makefile.in: Regenerate using automake-1.4p6. + + * htcommon/URL.cc, htcommon/URL.h: Add new configuration attribute + allow_double_slash to only remove // marks when requested (since + some server-side code uses them), handle initial protocols + without double slashes, and only remove the default doc string + from appropriate protocol URLs (e.g. not file), treat ".//" as a + relative path, and collapse /../ *after* // and /./ handling. + + * htcommon/defaults.cc: Add documentation for allow_double_slash, + as well as various documentation cleanups. + + * htdig/ExternalTransport.cc: Fix minor bug--recognize service + specified as https:// rather than https. + + * htdoc/hts_form.html, htdoc/hts_templates.html: Documentation fixes. + + * htsearch/htsearch.cc: Create valid boolean query if "exact" not + specified in search_algorithms by adding the exact word with low + weight. Solves PR#405294. + +Fri Oct 4 17:05:06 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.xml: Added first-draft XML version of defaults + file. This will eventually be used to generate defaults.cc and + documentation automatically. (As pointed out by Brian White, this + will make the binaries smaller.) + +Wed Sep 25 13:56:31 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (parse): Fixed handling of JavaScript skipping so it + doesn't get confused by "<" in code. 
+ +Thu Sep 19 09:04:50 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc : another check for cookie jar's null pointer + +Tue Sep 17 17:41:51 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (external_protocols): Fixed table formatting + as suggested by Lachlan Andrew. + +Thu Aug 29 21:21:34 CEST 2002 Soeren Vejrup Carlsen <svc at users.sourceforge.net> + + * htdig/Document.[h,cc]: first steps in FTP handling. HtFTP.h included and + we now test for the 'ftp' protocol in the Document::Retrieve function. + Has not yet been tested! + + * htnet/HtFTP.[h,cc]: added class to handle the FTP-protocol. Very + experimental (has not been tested yet). + +Fri Aug 9 13:01:05 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * httools/htnotify.cc (readPreAndPostamble): Check for empty strings + in file names, not just NULL, as suggested by Martin Kraemer. + +Wed Aug 7 12:11:31 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Fixed to impose max_doc_size + restriction on external converter output which it reads in. + +Tue Aug 6 18:21:11 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * these changes were suggested by David Reed <DReed1 at citgo.com> (thanks) + + * htdig/Document.cc: manage cookies via SSL + + * htnet/HtCookie.[h,cc]: features both RFC2109 and Netscape version + + * htnet/HtCookieJar.cc: ditto + +Tue Aug 6 17:12:22 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htcommon/defaults.cc: added the 'http_proxy_authorization' attribute. + Needs revision due to my usual *spaghetti* english. 
:-) + + * htdig/Document.[h,cc]: proxy authorization is now enabled + +Tue Aug 6 09:28:39 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/Connection.[h,cc]: IP address storing as string (sync with ht://Check) + + * htnet/Transport.[h,cc]: HTTP Proxy and Basic credentials handling moved here (ditto) + through the use of a protected static method + + * htnet/HtHTTP.h: SetCredentials declared to be virtual (unnecessary because inherited, + but gives better understanding); new method SetProxyCredentials for + proxy authorization. + + * htnet/HtHTTP.cc: HTTP header Proxy-Authorization is now handled. The + SetCredentials and SetProxyCredentials methods now make use of the + Transport::SetHTTPBasicAccessAuthorizationString method, in order to + write the string for negotiating the access. + +Fri Aug 2 15:40:18 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc (Retrieve): Allow redirects from HTTPSConnect. + +Tue Jul 30 12:46:56 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/md5.cc: Added missing include of stdlib.h, as Geoff suggested. + +Sat Jul 27 11:57:25 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/SSLConnection.cc: Add fix for segfault on SSL connections + noticed by several users. Fix contributed by Andy Bach + <afbach at users.sourceforge.net>. + +Tue Jun 18 10:22:01 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (got_word): Check that the word length meets + the minimum word length before doing any processing. + +Fri Jun 14 17:26:21 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (buildMatchList), htsearch/HtURLSeedScore.cc + (Match), htsearch/SplitMatches.cc (Match): Added Jim Cole's fix to + bugs in handling of search_results_order. 
+ +Wed May 15 09:45:40 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/Retriever.cc: fixed the bug regarding the server_wait_time + feature after the maximum number of requests per connection has been + reached. + +Tue Apr 9 16:41:33 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie*.[h,cc]: RFC2109 compliant. + * htlib/HtDateTime.[h,cc]: Add const-ness to the DiffTime static method + +Tue Apr 9 12:52:30 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie.cc: fixed a bug regarding expiry date recognition + +Fri Apr 5 14:08:39 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalTransport.cc (Request): Fixed to strip CR from + header lines, output header lines with -vvv. + +Tue Mar 19 08:40:54 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie.cc: enhanced controls regarding the expires setting + when no expires is returned. Prevents NULL pointer exceptions to be + arisen. + +Mon Mar 18 11:28:02 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htlib/HtDateTime.h: added the copy constructor + * htnet/HtCookie.cc: fixed a NULL pointer bug regarding 'datestring' + management and HtDateTime copy constructor is now used + +Tue Mar 12 18:19:49 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/HtDateTime.cc (Parse, SetFTime): Added Parse method for + more flexible parsing of LOOSE/SHORT formats, use it in SetFTime. + Also skip unexpected leading spaces in SetFTime, as these frequently + cause problems with some strptime() implementations. + +Mon Feb 11 23:28:37 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.h (got_redirect): Add referer to properly handle + broken links through a redirect as reported by Joe Jah. + + * htdig/Retriever.cc: As above. 
+ + * htdig/Document.cc (Retrieve): Fix bug that prevented external + transport methods from reporting redirects as reported by Jamie + Anstice <Jamie.Anstice at sli-systems.com>. + + * htlib/Dictionary.cc (hashCode): Trial of hash function suggested + by Jamie Anstice. + +Sat Feb 9 18:06:29 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/DocMatch.[h,cc]: Add scoring code for the new htsearch + framework. + +Thu Feb 7 11:32:14 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc (ReadChunkedBody): gets control of Read_Line + methods (return error when they fail). + +Fri Feb 1 17:12:31 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * Merged htdig-3-2-x branch back into CVS mainline. + + * ChangeLog.0: Update with current 3.1.6 ChangeLog. + +Thu Jan 24 18:06:04 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in, aclocal.m4: Use new CHECK_SSL macro from the + autoconf archive. + + * configure: Generate via autoconf. + +Fri Jan 18 11:15:29 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Transport.h (class Transport): Add const to SetCredentials + method declaration as pointed out by Roman Maeder. + +Wed Jan 16 13:35:26 2002 Geoff Hutchison <ghutchis at wso.williams.edu> + + * db/db.h.in: Add #include <sys/stat.h> which seems to help + problems of stat64 conflicts on Solaris as suggested by Gilles. + +Sat Jan 12 16:19:55 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: A few changes to the wording and formatting + of the 'accept_language' attribute description. + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. 
+ +Fri Jan 11 21:18:00 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htcommon/defaults.cc: added the 'accept_language' attribute + +Fri Jan 11 20:53:36 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.[h,cc]: management of the accept-language directive added + * htcommon/URL.[h,cc]: const-ness in copy constructor and other cosmetic changes + * htlib/Server.[h,cc]: management of the 'accept_language' attribute as + a server block configuration directive. + * htlib/Document.cc: set of the attribute above for the HTTP layer + +Fri Jan 11 13:25:49 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalTransport.cc (Request): Fixed to allocate access_time + object before setting it. + +Fri Jan 4 12:31:34 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtCookie.cc, htword/WordKeyInfo.cc, htword/WordMonitor.cc, + test/search.cc: changed all uses of strcasecmp to mystrcasecmp for + consistency and portability. + +Fri Jan 4 12:17:10 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtHTTP.cc (HTTPRequest): make the second comparison of the + transfer-encoding header the same as the first, i.e. case insensitive + and limited to 7 characters. + +Fri Jan 4 15:13:13 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc: parse the transfer-encoding header as case insens. 
+ [fix htdig-Bugs-499388 by Matthias Emmert <Matthias.Emmert2 at start.de>] + +Sun Dec 30 15:47:35 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * HtHTTP.[h,cc]: management of the Content-Language directive for the response + +Sat Dec 29 13:07:08 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie.[h,cc]: new fields (srcURL and isDomainValid) and + a more robust class with initialization list and copy constructor + + * htnet/HtCookieJar.[h,cc]: method for calculating the minimum number + of periods that a domain specification of a cookie must have. Depending + on what the Netscape cookies specification says. + + * htnet/HtCookieMemJar.cc: Management of the domain field of the cookie + +Mon Dec 17 06:45:02 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htdig/htdig.cc: fixed bug about cookie jar creation. It is done in + here, because there is only one jar for the whole process. However + it can be moved anywhere else. :-) + +Mon Dec 17 06:40:25 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc: check for null pointer of cookie jar + +Sun Dec 16 19:55:07 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/Connection.[h,cc]: default constructor is changed and accepts + a socket value (by default is -1) + * htnet/HtCookieJar.[h,cc]: added a simple iterator + * htnet/HtCookieMemJar.[h,cc]: ditto + * htnet/HtFile: removed the management of modification_time (constructor) + * htnet/HtHTTP.[h,cc]: constructor with initilization list and without + a default constructor (the construction is now forced to pass a valid + connection object). Removed any memory deletion from the destructor. + The class is now abstract (see the virtual pure destructor). 
+ * htnet/HtHTTPBasic.cc: creates a Connection object in the initialization + and the destructor has no responsability + * htnet/HtHTTPSecure.cc: creates an SSLConnection object in the initialization + and the destructor has no responsability + * htnet/HtNNTP.cc: creates a Connection object in the initialization + and the destructor has no responsability + * htnet/Transport.[h,cc]: default constructor accepts a pointer to a + Connection object and the destructor carries out the deletion of it + +Thu Dec 6 13:24:30 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/examples/rundig.sh: Fixed to make use of DBDIR variable, + and to test for and copy db.words.db.work_weakcmpr if it's there. + +Fri Oct 19 11:07:33 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc (IsValidURL): Fixed discrepancies in debug + levels for messages giving cause of rejection, inadvertantly + changed when regex support added. + +Wed Oct 17 15:48:23 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalTransport.h: Added missing class keyword on friend + declaration. + +Tue Oct 16 14:35:16 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/default.cc (external_parsers): Documented external converter + chaining to same content-type, e.g. text/html->text/html-internal. + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Mon Oct 15 22:25:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Document.cc, htdig/htdig.cc, htdig/Retriever.cc: Make sure + setEscaped is called with the current value of + case_sensitive. Fixes bug pointed out by Phil Glatz. + +Fri Oct 12 17:14:08 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/htdump.html, htdoc/htload.html: Fixed 3 little typos. 
+ +Fri Oct 12 15:11:45 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtHTTP.cc (ParseHeader): Show header lines in debugging + output at verbosity level 3, not 4, for consistency with 3.1.x. + + * htcommon/URL.cc (removeIndex): Fixed to make sure the matched + file name is at the end of the URL. + +Fri Oct 12 10:39:54 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/HtRegexList.cc (setEscaped): Fixed to set compiled flag to + FALSE when there's no pattern, so match() can detect this condition. + Fixes handling of empty lists in bad_querystr, exclude_urls, etc. + + * htdig/Retriever.cc (IsValidURL): Fixed bad_querystr matching to + look at right part of URL, not whole URL. + +Mon Sep 24 11:47:15 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtHTTP.cc (SetRequestCommand): Put If-Modified-Since header + out in GMT, not local time, and only put it out if existing document + time > 0. + + * htsearch/parser.cc (perform_phrase): Optimized phrase search handling + to use linear algorithm with Dictionary lookups instead of n**2 alg., + as suggested by Toivo Pedaste. + +Tue Sep 18 10:50:40 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/running.html: New documentation on how to run after configuring. + * htdoc/rundig.html: New manual page for rundig script. + * htdoc/install.html: Added link to running.html. + * htdoc/contents.html: Added link to running.html, rundig.html, related + projects. Updated links to contrib and developer site. + +Fri Sep 14 22:12:56 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/URL.h: Moved DefaultPort() from private to public for + use in HtHTTP.cc. + +Fri Sep 14 09:25:20 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtHTTP.cc (SetRequestCommand): Add port to Host: header when + port is not default, as per RFC2616(14.23). Fixes bug #459969. 
+ +Sat Sep 8 22:15:33 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * acconfig.h, include/htconfig.h.in: Add undef for + ALLOW_INSECURE_CGI_CONFIG, which if defined does about what you'd + expect. (This is for any wrapper authors who don't want to rewrite + but are willing to run insecure.) + + * htsearch/htsearch.cc: Only allow the -c flag to work when + REQUEST_METHOD is undefined. Fixes PR#458013. + +Tue Sep 4 18:58:31 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/DocMatch.cc: Add scoring for Quim's new parser + framework. Only the normal word scoring is currently done, not + backlink_factor or other "Document" methods. + +Fri Aug 31 15:34:28 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.h, htdig/HTML.cc (ctor, parse, do_tag): Fixed buggy + handling of nested tags that independently turn off indexing, so + </script> doesn't cancel <meta name=robots ...> tag. Add handling + of <noindex follow> tag. Added <> delim. to tag debugging output. + Fixed a few typos. + +Wed Aug 29 10:33:01 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (url_part_aliases): Added clarification + explaining how to use example. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Mon Aug 27 15:05:09 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * installdir/search.html: Add DTD tag for HTML 4 compliance. + * installdir/htdig.conf: Added .css to bad_extensions default, + added missing closing ">". + * htdoc/config.html: Updated with sample of latest htdig.conf and + installdir/*.html. + +Wed Jul 25 22:16:06 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: Put new htnotify_* entries in alphabetical + order. Removed superfluous quotes from htnotify_webmaster example + (htnotify.cc adds in the quotes). + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. 
+ +Tue Jul 24 16:07:01 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: Changed references in (no_)page_number_text + entries from maximum_pages to maximum_page_buttons. + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Tue Jul 24 14:38:22 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/hts_templates.html: Document Quim Sanmarti's URL decoding + feature for template variables. + +Thu Jul 12 14:12:02 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtFile.cc (Request): Fixed so it doesn't remove newlines + from documents, and so it only tries to open mime.types once even + if the open fails. + +Thu Jul 12 11:40:07 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/conv_doc.pl, contrib/parse_doc.pl: Fixed EOF handling in + dehyphenation, fixed to handle %xx codes in title made from URL. + + * contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl, + contrib/doc2html/swf2html.pl: Fixed to handle %xx codes in URL title. + +Wed Jul 11 15:05:47 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (readFile): Added missing fclose() call, and + debugging message for when file can't be opened. + +Wed Jul 11 14:26:28 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (displayParsedFile): Added debugging message + when file can't be opened. + + * htsearch/Display.cc (buildMatchList): Fixed while loop to avoid + warning. + + * htsearch/htsearch.cc (main): Fixed handling of syntax error message + to use String class instead of strdup(). + + * htsearch/parser.cc (setError): Added debugging message when error + is set. + + * htsearch/parser.cc (parse): Fixed not to clear error message after + it's set. + +Sat Jul 7 22:19:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * */Makefile.in: Update using current production automake + (1.4-p4).
+ + * htfuzzy/Regexp.[cc,h]: Change class name to Regexp to prevent + further namespace clashes. + + * htfuzzy/Fuzzy.c: #include "Regexp.h" now and make sure we create + the right class when needed. + + * htlib/mktime.c: Change included mktime declaration to mymktime + to avoid conflict on Mac OS X. (For some reason, autoconf's + AC_FUNC_MKTIME doesn't work for Mac OS X. So this is a hack in the + meantime.) + + * htfuzzy/Makefile.am: Rename Regex files. Oops! + +Fri Jul 6 18:38:58 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Regexp.cc, htfuzzy/Regexp.h: Rename Regex class to + prevent problems on case-insensitive systems. + + * htlib/HtRegexReplaceList.cc, htlib/String.cc, htdig/htdig.cc: + Change #include of <stream.h> to modern standard of iostream.h. + + * htlib/Configuration.cc (Read): Make sure we never reference a + negative position when trimming off whitespace. + + * config.guess, config.sub: Update with new versions from GNU to + recognize various flavors of Mac OS X/Rhapsody. + + * htlib/strptime.cc: Make sure len is initialized. + +Fri Jul 6 12:04:52 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/HtRegexList.cc (setEscaped): Fixed a potential problem + with list building. When we go back a step, we still have to + compile the new pattern in case it's the last one. + +Wed Jul 4 23:39:19 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/URL.cc (parse, ServerAlias): Fixed two problems that + caused incorrect signatures to be generated. + +Wed Jul 4 13:52:54 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * test/document.cc (dodoc), test/url.cc (dourl), + test/testnet.cc (Retrieve): Fixed up handling of config to match + David Graff's changes of May 16, and handling of HtHTTPBasic class + to match Joshua Gerth's changes of Mar 17. 
+ +Tue Jul 3 16:20:56 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc (GetLocal): Fixed to use URL class on given + URL, so that default port numbers are stripped off. This was needed + to allow local fetching of robots.txt. + + * htnet/Connection.cc (ctors, dtor, Assign_Server, Get_Peername), + htnet/Connection.h: Got rid of strdup stuff, used String class for + peer & server_name. + + * htnet/Connection.cc (Get_PeerIP): Used unambiguous name for structure. + + * htnet/HtHTTP.cc (ctor, dtor): Don't allocate a 2nd Connection, as + child classes already do this, and set pointer to null when connection + is deleted, so we don't try to delete it twice. This was messing up + the heap and causing segfaults. Call Transport::CloseConnection before + deleting connection. + + * htnet/HtHTTPBasic.cc (dtor), htnet/HtHTTPSecure.cc (dtor), + + * htnet/HtNNTP.cc (dtor): Only delete connection if non-null, & set + to null after deleting. Call Transport::CloseConnection before + deleting connection. + + * htnet/Transport.cc (CloseConnection): Don't exit if connection + pointer is null, as this may be normal when called from destructor. + +Fri Jun 29 11:14:36 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/Endings.cc (getWords): Undid change introduced in 3.1.3, + in part. It now gets permutations of word whether or not it has + a root, but it also gets permutations of one or more roots that + the word has, based on a suggestion by Alexander Lebedev. + * htfuzzy/EndingsDB.cc (createRoot): Fixed to handle words that have + more than one root. + * installdir/english.0: Removed P flag from wit, like and high, so + they're not treated as roots of witness, likeness and highness, which + are already in the dictionary. 
+ +Mon Jun 25 12:50:47 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc (main): Got rid of last remnants of 'urllist' + and used the 'l' StringList as was used in the code before, to make + restrict and exclude handling work properly. + +Mon Jun 25 15:52:19 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htsearch/htsearch.cc: defined 'urllist' in order to remove the + compilation error (as Jesse suggested). + +Fri Jun 22 16:28:13 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (buildMatchList): Fix date_factor calculation + to avoid 32-bit int overflow after multiplication by 1000, and avoid + repetitive time(0) call, as contributed by Marc Pohl. Also move the + localtime() call up before gmtime() call, to avoid clobbering gmtime's + returned static structure (my thinko). + + * htdig/htdig.cc (main): Use .work file for md5_db, if -a given, + as contributed by Marc Pohl. + + * htcommon/URL.cc (constructURL): Ensure that the _host is set if we + are constructing non-file urls, as contributed by Marc Pohl. + + * htdoc/THANKS.html: Credit Marc Pohl for patches. + +Tue Jun 19 17:14:05 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * README: Bump up to 3.2.0b4, fix note about bug report submissions. + +Tue Jun 19 17:01:16 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (setVariables): Fixed handling of + build_select_lists attribute, to deal with new restrict & exclude + attributes. + +Mon Jun 18 12:16:27 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * configure.in, configure: Fix "hdig" typo in help. + +Fri Jun 15 17:57:19 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: Noted effect of locale setting on floating + point numbers in search_algorithm and locale descriptions. + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. 
+ +Fri Jun 15 15:36:51 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/cf_generate.pl: Fixed to handle new defaults.cc format + with trailing backslashes. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Fri Jun 15 14:57:21 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdb/htdb_dump.cc, htdb/htdb_load.cc, htdb/htdb_stat.cc: Added a + conditional include of <getopt.h> if HAVE_GETOPT_H is defined. + +Fri Jun 15 11:25:24 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc (main), htcommon/defaults.cc, + htdoc/hts_form.html: two new attributes, used by htsearch, have + been added: restrict and exclude. They give more control over + template customisation through configuration files, allowing + URLs to be restricted or excluded from search without passing + any CGI variables (although CGI variables, when given, override the + configuration settings). + +Fri Jun 15 09:34:23 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc (main): Changed ridiculously outdated question + "Did you run htmerge?" to "Did you run htdig?". + +Fri Jun 8 11:07:04 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc: Add <float.h> header, now needed for RH 7.1. + +Thu Jun 7 12:05:09 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/htdig-3.2.0.spec: Updated to 3.2.0b4. + + * contrib/README: Mention acroconv.pl script. + +Thu Jun 7 10:46:19 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (expandVariables): Use isalnum() instead of + isalpha() to allow digits in variable names, allow '-' in variable + names too for consistency with attribute name handling. + +Wed Jun 6 16:14:06 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * httools/htpurge.cc (main): Added missing "u:" declaration in + getopt() call.
+ +Wed Jun 6 15:24:04 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/doc2html/DETAILS, contrib/doc2html/README, + contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl, + contrib/doc2html/swf2html.pl: Update to version 3.0 of doc2html, + contributed by David Adams <D.J.Adams at soton.ac.uk>. + +Wed May 16 11:23:04 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + Added a pile of changes contributed by David Graff + <phlat at mindspring.com> fixing compilation problems with + non-gcc/g++ compilers (i.e. Sun's compiler). + + * Makefile.config, db/Makefile.am: Added no-dependencies to + AUTOMAKE_OPTIONS for those not on GNU C/C++ + + * configure.in: Changed AM_PROG_YACC to AC_PROG_YACC as autoconf + and autoreconf both complain that AM_PROG_YACC is not in the + library. + + * htcommon/DocumentDB.cc: Removed default parameters as they are + already declared in the header + + * htcommon/HtConfiguration.cc: Changed some of the loop + declarations so that Sparc C 4.2 is happy. Removed default + parameters as they are already declared in the header. Moved inline + ParseString to header where it belongs. Added initialization for + HtConfiguration::_config static member variable. Added + implementation of HtConfiguration::config() static class member. + + * htcommon/HtConfiguration.h: Added include for ParsedString.h. + Added declaration of static member function ::config(). + Added private static member variable _config. + Added inline ParseString from implementation. + + * htcommon/HtURLCodec.cc, htcommon/HtURLRewriter.cc, + htcommon/HtZlibCodec.cc, htcommon/URL.cc, htcommon/conf_lexer.lxx, + htdig/Document.cc, htdig/ExternalParser.cc, + htdig/ExternalTransport.cc, htdig/HTML.cc, htdig/Parsable.cc, + htdig/Plaintext.cc, htdig/Retriever.cc: + Changed to use new global configuration semantics. + + * htcommon/conf_parser.yxx: Added a return to yyerror to quiet + Sparc C 4.2. Should really return a value here.
 Is it normal to + return a YY_something or just -1, 0, ? + + * htcommon/defaults.cc: Added line continuation characters at the + end of all the string lines that were not terminated by a quote. + + * htcommon/defaults.h, htdig/htdig.h: Removed extern + HtConfiguration config in favor of HtConfiguration::config(). + + * htdig/ExternalTransport.h: Changed return type of GetResponse to + match superclass. + + * htdig/Server.cc, htdig/htdig.cc, htfuzzy/htfuzzy.cc, htnet/HtFile.cc, + htsearch/Display.cc, htsearch/QueryLexer.cc, htsearch/WordSearcher.cc, + htsearch/htsearch.cc, htsearch/parser.cc, htsearch/qtest.cc, + httools/htdump.cc, httools/htload.cc, httools/htmerge.cc, + httools/htnotify.cc, httools/htpurge.cc, httools/htstat.cc, + htlib/Configuration.cc, htlib/HtRegex.cc: + Changed constructor to use initializers + + * htlib/HtDateTime.cc: Moved inlines to header + + * htlib/HtDateTime.h: Added inlines from implementation + + * htlib/HtHeap.cc, htlib/HtHeap.h, htlib/HtVector.cc, htlib/HtVector.h, + htlib/HtVectorGeneric.h, htlib/HtVectorGenericCode.h: + Changed Copy member to return same type as superclass + + * htlib/HtRegexReplace.cc, htlib/HtRegexReplaceList.cc: Removed + default parameters as they are declared already in the header + + * htlib/myqsort.h: Changed comment in header to use C-style + comments as it's compiled using a C compiler. + + * htlib/regex.h: Changed #if __STDC__ to #if defined(__STDC__) + + * htword/WordKey.h: Corrected const'ness + +Wed May 9 07:50:19 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookieJar.h: ShowSummary makes the class abstract + +Sat May 5 20:51:00 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/cf_blocks.html: Add colon in example and description of + blocks to match code for the moment. The parser can be changed + later if we like.
+ +Sat May 5 20:38:44 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/ParsedString.cc (get): Use isalnum() instead of isalpha() + for looking up--allows names that contain digits too. + +Sat May 5 20:36:29 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/htString.h (class String): Remove now-obsolete and + confusing int() casting operator. This was previously used to make + a string of a certain length. Use String(int) as a ctor instead. + +Sat May 5 20:30:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htword/WordContext.[h,cc]: Change Initialize to supply a config + that can be modified (i.e. if we don't have ZLIB_H). + +Sat May 5 23:30:55 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookieJar.h: ShowSummary, printing cookies (to be derived) + * htnet/HtCookieMemJar.[h,cc]: ShowSummary, printing cookies + +Thu May 3 23:14:14 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP[h,cc]: connection object is now created and destroyed. + NULL pointers converted to C++ standard (0). + * htnet/Transport[h,cc]: NULL pointers converted to C++ standard (0). + * htnet/Connection[h,cc]: ditto + +Thu May 3 23:09:33 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htlib/HtDateTime.[h,cc]: Timestamp format added (used by ht://Check + for MySQL interfacing) - keeping them equal helps me maintaining + both of them! + +Thu May 3 10:28:56 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/parser.cc (perform_and): Add missing return statement, + as suggested by Quim Sanmarti. + +Fri Mar 30 15:50:42 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/ResultMatch.h, htsearch/ResultMatch.cc (setTitle): Changed + argument type to char * to fix problem with sort by title not working, + as reported by Adam Lewenberg. 
+ +Fri Mar 30 14:08:51 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.h, htdig/Retriever.cc (parse_url): Define and use + Document::StoredLength() method to get actual length of data + retrieved and given to md5(), which may be less than original + length. Fixes bug reported by Michael Haggerty. + +Wed Mar 21 22:22:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc (generateStars): Add NSTARS variable for + template output as suggested by Caleb Crome + <ccrome at users.sourceforge.net> (except here precision is 0). Fixes + feature request #405787. + + * htdoc/hts_templates.html: Add description of NSTARS variable + above. + + * htlib/HtRegex.cc (set): Make sure we free memory if we've + already compiled a pattern. + + * htdig/Retriever.cc (got_href): Fix bug pointed out by Gilles + with hopcounts and don't bother to update the DocURL unless we + have a new doc. + +Mon Mar 19 18:00:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/URL.cc (URL): Make sure even absolute relative URLs are + run through normalizePath() as pointed out by Gilles. Allows + backout of previous fix of #408586, which does extra re-parsing of + URL. + + * htdig/Retriever.cc (Need2Get): Back out change of Mar. 17 for above. + + * htcommon/conf_lexer.[cxx, lxx]: Apply change suggested by Jesse + to remove empty statements. + +Mon Mar 19 11:33:25 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegexList.cc (setEscaped): Fix assorted bugs, including + obvious segfault, incorrect creation of limits, and failure to set + "compiled" flag before return(). + + * htdig/Retriever.cc (IsValidURL): Make sure the tmpList is + cleared before attempting to parse the bad_querystr + config--otherwise we'll just Add to the end of the list. 
+ +Sun Mar 18 14:01:56 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/Transport.[h,cc], htnet/HtHTTP.cc: In order to modularize + the net code, the default parser string for the content-type has + been added to the Transport class. + * htdig/Document.cc: modified for the changes above. + +Sat Mar 17 16:38:27 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in, configure, include/htconfig.h.in: Add tests for + libssl, libcrypto, and ssl.h. + + * htnet/SSLConnection.[cc,h], htnet/HtHTTPBasic.[cc,h], + htnet/HtHTTPSecure.[cc,h]: New files. Contributed by Joshua Gerth + <jgerth at hmsoaps.com>. + + * htnet/Transport.[cc,h], htnet/HtNNTP.cc, htnet/HtHTTP.cc, + htnet/Connection.h: Changes needed to support SSLConnection class. + + * htdig/Document.cc, htdig/Document.h: Ditto. + + * htnet/Makefile.am, htnet/Makefile.in: Add above for compilation. + + * htdoc/THANKS.html: Updated with new contributors. + +Sat Mar 17 15:28:20 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htword/WordContext.cc (Initialize): If HAVE_LIBZ or HAVE_ZLIB_H + are not defined, make sure wordlist_compress is set to false. This + semi-hack will not be necessary with new mifluz code which does + not necessarily need zlib. Fixes bug #405761. + +Sat Mar 17 14:39:17 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Fixed problems with META descriptions + containing newlines, returns or tabs. They are now replaced with + spaces. Fixes bug #405771. + +Sat Mar 17 14:26:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Improve handling of whitespace in META + refresh handling. Fixes bug #406244. + + * htlib/HtRegexList.cc (setEscaped): Make this more efficient by + building up larger and larger patterns--when we fail, go back a + step and add the pattern in the next loop. This ensures we have a + list of regexps of the maximum allowable length.
+ + * htdig/Retriever.cc (Need2Get): Add change suggested by Yariv Tal + to run URLs through the URL parser for cleanup before comparing to + the visited list. Fixes bug #408586. + +Mon Mar 12 13:28:56 2001 Michael Haggerty <mhagger at alum.mit.edu> + + * htdig/Retriever.cc, htdig/Retriever.h: + Fixed two off-by-one errors related to Retriever::factor table. + +Mon Mar 12 11:25:31 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Dictionary.cc (Add): Fix comments about add method--it + will replace existing keys. Fixes report #407940. + +Thu Mar 8 15:31:45 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc: removed an unneeded <else> + +Tue Mar 6 11:42:10 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/regex.[c,h]: Update with versions from glibc 2.2.2. + +Mon Mar 5 13:47:30 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * ltconfig (host_os): Add test to solve problems building C++ + shared libraries on some platforms. Currently should only make + --enable-shared the default on Linux and *BSD* unless specified + explicitly by the user. + +Mon Mar 5 12:52:57 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/String.cc (operator =): Add fix contributed by Yariv Tal + <YarivT at webmap.com>, fixes bug #406075. + +Mon Mar 5 12:06:26 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegexList.cc (match): Ignore rearrangement code for the + moment--may or may not be the culprit for bug #405277, but is a + start to debugging the problem. + + * htlib/List.[cc,h]: Remove *prev pointer from listnode + structure and add a *prev pointer to the cursor structure. Saves + one pointer per item in the list, plus overhead. + +Mon Mar 5 11:56:16 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc (bad_extensions): Add .css to ignore CSS docs. + + * htdig/Document.cc (getParsable): Ignore CSS documents -- they + aren't very useful to parse.
Solves bug report #405772. + +Sun Mar 04 11:32:43 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.cc: fixed a bug regarding <no header> with persistent + connections enabled, but head call before the get one disabled. + Sourceforge.net's bug reference: 405275 - fixed. + +Sat Mar 3 21:09:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * .version: Bump to 3.2.0b4 so snapshots have right versioning. + +Thu Mar 1 16:51:09 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in: Added test for alloca.h, which is needed for the + regex.c code. + +Wed Feb 28 12:54:43 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htcommon/defaults.cc: 'disable_cookies' option has been added, with + a 'server' scope. By default it is set to 'false'. + * htdig/Server.h, cc: management of the option above has been enhanced. + * htnet/HtHTTP.h, cc: now an HTTP connection can disable/enable cookies + through the configuration attribute 'disable_cookies'. + * htdig/Document.cc: management of cookies enabling/disabling is here. + * Cookies classes: now support the expiration time. Need only the + subdomain treatment. + +Mon Feb 26 16:37:30 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/conf_lexer.lxx: Don't directly call exit(1) on an error + condition! Seems a harsh problem for an unknown character. + + * htcommon/conf_parser.yxx: Ditto. (Running out of memory is a + much more fatal condition, of course.) + + * htcommon/conf_lexer.cxx: Regenerate using flex 2.5.4. + + * htcommon/conf_parser.cxx: Regenerate using bison 1.28. + +Sun Feb 25 19:46:01 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtHTTP.h, cc: support for cookies enabled + * htnet/Makefile.am: files for cookies have been added to make. 
+ +Sun Feb 25 19:27:18 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net> + + * htnet/HtCookie.h,cc: class HTTP cookie + * htnet/HtCookieJar.h,cc: abstract class for managing the + 'jar' of cookies. In this way, we can use different methods + for the storage of them. + * htnet/HtCookieMemJar.h,cc: class for managing the 'jar' of + cookies in memory, without persistent storage (no db or file). + * Many thanks to Robert LaFerla for his coding on this! Yeah, + really really thanks Robert! <robertlaferla at mediaone.net> + + +Thu Feb 22 16:43:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/ChangeLog, htdig/RELEASE.html, README: Update to roll the + release of 3.2.0b3. + +Thu Feb 22 16:22:05 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc (main), htsearch/Display.cc (setVariables, + createURL, buildMatchList), htdoc/hts_form.html, + htdoc/hts_templates.html: Add Mike Grommet's date range search + feature. + +Mon Feb 19 18:24:42 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Synonym.cc (createDB): Create database in a temporary + directory before we move it into place, much like the endings + code. This should prevent problems when we just append to the DB + instead of making a new one. + + * htdig/htdig.cc (main): Fix bug discovered by Gilles--htword + should be initialized *after* we are finished modifying config + attributes based on flags and unlink with -i. + + * installdir/rundig: Fix bug with calling htpurge with -s option. + +Thu Feb 15 11:03:42 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/*.html: Update with 2001 copyrights and various changes + with the website move for the pending 3.2.0b3 release. + +Thu Feb 15 10:41:47 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegexList.cc (match): Fix thinko with logic for matching + and add code to rearrange matching nodes for hopefully better + performance. 
+ +Sun Feb 11 16:42:11 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegexList.h, htlib/HtRegexList.cc (class HtRegexList): + Simple List(HtRegex) object with similar calling conventions to + HtRegex class. This version is not as sophisticated as it could + be, but it's not likely to drop objects when reorganizing. + + * htlib/Makefile.[in,am]: Add HtRegexList files to list for + compilation. + + * htdig/htdig.h, htdig/htdig.cc, htdig/Retriever.cc: Use + HtRegexList instead of HtRegex for setting escaped values--should + never fail (since each String item is short). + + * htlib/HtDateTime.cc: Put back timezone specs into the output + formats so we give everything even if we ignore it when reading + input. + +Mon Feb 5 11:47:07 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtDateTime.cc: Remove the timezone specs in the date + formats--these are not required in the RFCs because many dates are + in GMT anyway. + +Wed Jan 17 08:48:30 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalTransport.cc (Request): Oops, fixed a holdover from + code borrowed from ExternalParser.cc's fork handling. + +Mon Jan 15 23:09:37 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Connection.cc: Back out previous change--this should not + in any way be needed since the configure script should set + FD_SET_T. + + * configure.in, configure: Add more lenient prototyping for + select() test--now allows "const struct timeval" for compilation + on BSDI. + + * htdoc/RELEASE.html: Update with Gilles's changes. + + * htdoc/cf_blocks.html: New file describing <server ...></server> + and <url ...></url> blocks. + + * htdoc/cf_general.html, htdoc/confmenu.html: Refer to the above. + +Mon Jan 15 17:46:07 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/TemplateList.cc (createFromString), htcommon/defaults.cc: + Treat template_map as a _quoted_ string list. 
+ + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Mon Jan 15 17:40:45 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/hts_templates.html: Add METADESCRIPTION variable. + + * htsearch/Display.cc (displayMatch): Add METADESCRIPTION variable. + + * htdig/ExternalParser.cc (parse): Fix up handling of arguments. + + * htdig/ExternalTransport.cc (Request): Fix up handling of fork/exec + and command arguments, add wait() call. + +Wed Jan 10 19:23:36 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * installdir/rundig: Fix -a handling to move db.words.db.work_weakcmpr + into place if it exists. + +Sat Jan 6 21:50:58 2001 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in: Add checks for <sys/wait.h> and <wait.h> for + ExternalParser. + + * include/htconfig.h.in: Regenerate using autoheader. + + * configure: Regenerate using configure. + + * htnet/Connection.cc: Add definition for FD_SET_T to fix problems + compiling on BSDI mentioned by Joe. + + * htdig/ExternalParser.cc: Use <sys/wait.h> or <wait.h> as + appropriate. Should fix problems with compilation mentioned by + Jesse on HP/UX. + + * README, htdoc/RELEASE.html: Adjust dates for the new year. + + * htdoc/upgrade.html: A few "remaining features" have been implemented. + +Sun Dec 06 19:46:15 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.cc: Fixed bug for Read_Line function call in + ReadChunkedBody method. Many thanks to Robert LaFerla. ;-) + +Tue Dec 12 13:24:49 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Fixed to properly handle binary + output from an external converter. Fixed some compilation errors. + +Tue Dec 12 12:52:14 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Handle parser command string + as a string list again to allow arguments, build up argv and + use execv instead of execl.
+ +Tue Dec 12 12:25:04 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Add call to wait for child process, + to avoid zombie buildup. + +Mon Dec 11 23:57:43 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Fix up handling of fds in child + process, more fault-tolerant handling of pipe or fork errors. + +Mon Dec 11 23:30:55 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Fix up handling of creation + of temporary file, check for proper return code, give error if + appropriate. + +Mon Dec 11 23:19:28 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Lowercase content-types and + strip off any trailing semicolons, at one last spot. This reinserts + code added Sep 11, which was dropped Oct 9, probably inadvertently + during mifluz back-out. + +Sun Dec 10 15:28:44 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/ExternalTransport.cc: Use fork/exec instead of calling + popen, which bypasses any shell escape problems. + + * htdig/ExternalParser.cc: Ditto, plus use of mkstemp where + available to pick the filename. + + * configure, configure.in: Check for mkstemp where available. + + * include/htconfig.h.in: Define it as above. + + * htlib/Makefile.am: Omit regex.c from SOURCES--this is included + when necessary by the configure script. Otherwise this produces + duplicate declarations, etc. + + * htlib/Makefile.in: Regenerate using automake --foreign. + + * htcommon/URL.cc: Fix bug with ports of 0 showing up in URLs like + mailto: or other less-common protocols. + +Fri Dec 1 14:45:33 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/htdig-3.2.0.spec: Updated to 3.2.0b3.
+ +Fri Dec 1 13:59:09 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/Makefile.am: Fix pkginclude_HEADERS to list missing headers + ber.h, libdefs.h, myqsort.h, mhash_md5.h, omit unneeded langinfo.h; + fix libht_la_SOURCES to list missing sources regex.c, myqsort.c. + + * htlib/Makefile.in: Regenerate using automake --foreign + + * htlib/langinfo.h, htlib/nl_types.h: Removed as they're now unused. + +Fri Dec 1 13:22:47 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/strptime.cc (mystrptime): make ptr const and use cast on + return value to avoid warnings. + + * htlib/Makefile.am: Fix pkginclude_HEADERS to list HtRegexReplace*.h + rather than .cc. + + * htlib/Makefile.in: Regenerate using automake --foreign + +Fri Dec 1 11:58:21 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * Makefile.in, [hit]*/Makefile.in: Regenerate using automake --foreign + after fixing bug with cp -pr in automake. + +Thu Nov 30 14:41:58 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/Makefile.am: Removed howitworks.html from EXTRA_DIST. + + * Makefile.in (distdir): Added missing variable name 'd' to cp -pr. + +Thu Nov 30 14:01:48 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/strptime.cc, htlib/lib.h: make first 2 args to strptime + const to avoid warnings, use cast in asizeof to avoid warnings. + + * htsearch/qtest.cc: Change include from iostream to iostream.h + + * htsearch/DocMatch.cc: Change include from iostream to iostream.h + + * htsearch/Display.cc (createURL, buildMatchList, excerpt, hilight): + Clean up code to get rid of warnings, especially resulting from + NULLs in ternary operators. + +Thu Nov 30 10:55:09 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/String_fmt.cc (form, vform): Use vsnprintf rather than + vsprintf, for buffer overflow prevention if vsnprintf available. + + * htdig/Retriever.cc: Remove unused strptime declaration. 
+ + * htlib/HtDateTime.cc: Use mystrptime if HAVE_STRPTIME not set. + +Wed Nov 29 23:31:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdb/htdb_stat.cc, htdb_load.cc, htdb_dump.cc: Make sure we + include htconfig.h to include proper declarations. + + * htlib/strptime.cc: Change to strptime.cc, from htdig-3.1 series + hopefully more portable until I can find a more suitable + replacement. + + * htlib/Makefile.am, htlib/Makefile.in: As above. + + * htlib/clib.h, htlib/lib.h: Ditto. + + * htdoc/all.html: Add a first draft of program summaries. + +Wed Nov 29 18:00:15 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc (parse_url): Remove undeclared "dup" variable, + add missing calls to words.Skip(). + +Wed Nov 29 17:44:56 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/htdig.html: Add description of -v output. + +Mon Nov 27 12:03:34 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/md5.cc: Added missing include of time.h + +Fri Nov 24 00:56:01 2000 Toivo Pedaste <toivo at ucs.uwa.edu.au> + + * htsearch/Display.cc: Some extra debugging for scoring + +Sun Nov 19 00:56:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/HtFile.cc (Request): Use opendir/readdir instead of + scandir for generating directory listings on-the-fly. + + * htdoc/RELEASE.html: Write up release notes for 3.2.0b3. + + * htdoc/THANKS.html: Update list of contributors for 3.2.0b3 as + current. + +Fri Nov 17 14:52:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/acroconv.pl: Added external converter script to convert + PDFs with acroread. + +Mon Nov 6 12:13:13 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc (GetLocal, GetLocalUser): move String definition + out of while statement for AIX xlC compiler. 
+ +Mon Oct 30 21:50:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Server.h, htdig/Server.cc (push): Add newDoc parameter that + will allow redirects (old docs) to be followed and not count + against the maxDoc restrictions. + + * htdig/Retriever.cc (got_redirect): Use new parameter so we don't + count against a server's max documents since it's a redirect. + + * htlib/nl_types.h: Add for systems missing this header file. + +Sun Oct 29 21:36:51 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Updated per-server and per-URL fields to + match code. I still have a "wish list" of additional attributes + that should work this way eventually. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Sun Oct 22 17:13:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/HtWordList.h: Add missing include for stdlib.h needed for + abort(). + + * htsearch/BooleanQueryParser.cc (ParseAnd): Fix problems with RH7 + compiler -- shouldn't use "not" as a variable name! + +Thu Oct 19 22:19:16 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * ltmain.sh, ltconfig: Update with versions from libtool + 1.3.5, which may fix some problems building libraries. + +Mon Oct 9 21:59:11 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * */* [many, many files]: Backed out mifluz merge by reverting + modified files to the 091000 snapshot. + + * configure: Regenerated from configure.in. + + * */Makefile.in: Regenerated using automake. + +Fri Oct 6 11:03:14 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (do_tag): Parse <object> tags properly, looking + for data= attribute rather than src=. + + * htcommon/defaults.cc (server_aliases): Additional clarification + to server_aliases description of port numbers.
+ +Wed Oct 4 12:12:31 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (limit_normalized, server_aliases, + server_max_docs, server_wait_time): Added clarification + to server_aliases description. Changed word "directive" to + "attribute" where appropriate. Added cross-link to server_aliases + from limit_normalized. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Wed Sep 27 00:05:41 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdb/mifluz[dict, dump, load].cc, htdb/util_sig.h, + htdb/util_sig.cc: New files from mifluz merge. (Whoops, missed a + directory.) + + * htdb/*.cc: Change config.h references to htconfig.h. + + * htlib/myqsort.c: Ditto. + + * htcommon/HtWordReference.h, htcommon/HtWordReference.cc: Ensure + we keep the WordContext object around--unfortunately this also + requires that callers initialize us with a WordContext (e.g. from + the HtWordList class). + + * htlib/StringMatch.h, htlib/StringMatch.cc: Changes to use + WordType directly instead of HtWordType. + + * htfuzzy/*: Ditto. Additionally make sure HtWordReference objects + are instantiated properly. + + * htcommon/DocumentRef.cc, htcommon/HtWordList.cc: As above. + + * htdig/*: As above. + + * htsearch/*: As above. + + * httools/*: Don't bother initializing WordContext--this is done + in the HtWordList class now. + + * htdig/htdig.cc: Ditto. + + * htsearch/htsearch.cc, htsearch/qtest.cc: Ditto. + + * htfuzzy/htfuzzy.cc: Ditto. + + * db/Makefile.am, db/Makefile.in: Update to build libhtdb instead + of libdb to prevent conflicts. + +Sun Sep 24 22:50:22 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htword/HtWordList.h, htword/HtWordList.cc: Keep a private WordContext + object that is associated with this word database and + provide an accessor. + + * htword/WordType.h, htword/WordType.cc: Add WordToken function, + migrated from HtWordType class.
+ + * htcommon/HtWordType.cc: WordType class no longer has Instance() + method, so just pass along the calls. + + * htlib/DB2_db.cc (db_init): Remove unnecessary NULL parameter. + + * htlib/Makefile.am, htlib/Makefile.in: Remove HtVectorGeneric and + derived files as well as HtWordType as these are deprecated. + +Wed Sep 20 22:47:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * aclocal.m4: Add in missing autoconf macros that somehow didn't + make the merge before. (No idea why I didn't catch this earlier.) + + * acinclude.m4: Use newer CHECK_ZLIB macro. + + * */Makefile.in: Updated with automake for new build changes. + + * configure, include/htconfig.h.in: Updated using autoconf. + + * test/dbbench.cc, test/word.cc, test/search.cc: Fix #include to + point to htconfig.h, not the non-existent config.h. + + * htlib/Configuration.h: Fix copy ctor, removing code in header file. + + * htword/*.cc: Ditto. + + * htword/Makefile.am: Update from mifluz version. + + * htlib/myqsort.h, htlib/myqsort.c: Additional system library + replacement code. + +Sat Sep 16 20:14:32 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in, configure, acinclude.m4, aclocal.m4, acconfig.h, + include/htconfig.h.in: Merged with mifluz versions. Main + difference is that the top-level configure script now also + configures the db/ directory. + + * Makefile.am, */Makefile.in: Updated with automake for new build + environment (with db/ run through top-level configure). + + * db/*.c: Updated to use htconfig.h instead of config.h. + +Wed Sep 13 22:05:33 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * Merged in mifluz-0.19 branch. Everything will break + temporarily. Loic and I will clean up tomorrow. + + * htdoc/RELEASE.html, htdoc/THANKS.html, htdoc/TODO.html: Get a + start on updating these files for the next release. + + * htdoc/cf_generate.pl: Revert change of Sep. 9 to ignore links to + all.html in cf_byprog.html file.
+ + * htdoc/all.html: New file, moved from howitworks.html and not + updated yet. + + * htdoc/contents.html: Change link from howitworks.html to all.html + +Tue Sep 12 17:00:00 CEST 2000 Quim Sanmarti <qss at gtd.es> + + * htsearch: added AndQuery.cc BooleanLexer.cc BooleanQueryParser.cc + ExactWordQuery.cc GParser.cc NearQuery.cc NotQuery.cc + OperatorQuery.cc OrFuzzyExpander.cc OrQuery.cc + PhraseQuery.cc Query.cc QueryLexer.cc QueryParser.cc + SimpleQueryParser.cc VolatileCache.cc WordSearcher.cc + qtest.cc WordSearcher.h AndQuery.h AndQueryParser.h + BooleanLexer.h BooleanQueryParser.h ExactWordQuery.h + FuzzyExpander.h GParser.h NearQuery.h NotQuery.h + OperatorQuery.h OrFuzzyExpander.h OrQuery.h OrQueryParser.h + PhraseQuery.h Query.h QueryCache.h QueryLexer.h + QueryParser.h SimpleLexer.h SimpleQueryParser.h VolatileCache.h. + This is the new query parsing/evaluation framework. + + * Modified DocMatch.{cc,h} and ResultList.{cc,h} for compatibility. + + * Removed the previous {And,Or,Exact,}ParseTree.{cc,h} files. + + * Modified Makefile.{am,in} consequently. + +Mon Sep 11 11:56:44 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc (parse): Lowercase content-types and + strip off any trailing semicolons, at one last spot which Geoff missed. + +Sat Sep 9 21:28:29 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Document.cc (getParsable): Fix a bug with earlier + change--if no parser is found and the MIME type is not text/* then + return a NULL parser. + + * htdig/Retriever.cc (RetrievedDocument): If a NULL parser is + returned, mark the document as noindex and move on. + + * configure.in, configure (enable-tests): Fix bug that would run + the 'yes' program inside the configure script if --enable-tests + was set. + +Sat Sep 9 17:50:11 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Add "all" program listing for common + attributes--seems more logical esp. 
now with many httool programs. + + * htdoc/cf_generate.pl (cf_byprog): Do not output a link when + 'prog' is 'all.' + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Sat Sep 9 11:44:47 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * aclocal.m4 (AM_CHECK_YACC): New macro to check for bison/yacc + and use "missing yacc" if not found. + + * configure.in (enable_tests): Fix buglet where --enable-tests=no + or --disable-tests would not work and set the default to enabled + tests. Since the tests do not build unless the user does a "make + check" this should not be confusing and should help debugging. + Also use AM_CHECK_YACC instead of AC_CHECK_YACC. + + * configure: Regenerate using autoconf. + +Sat Sep 9 11:01:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/ExternalParser.cc (canParse): Lowercase content-types and + strip off any trailing semicolons. Should prevent problems with + combined content-type; charset values. + (ctor): As above. + + * htdig/Document.cc (getParsable): Only assume plain text if MIME + code starts with text/. Should prevent problems with retrieving + things like image/png or application/postscript as text. + +Fri Sep 8 22:59:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Add new attributes htnotify_replyto, + htnotify_webmaster, htnotify_prefix_file, htnotify_suffix_file. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + + * httools/htnotify.cc: Added in code from Richard Beton + <richard.beton at roke.co.uk> to collect multiple URLs per e-mail + address and allow customization of notification messages by + reading in header/footer text as designated by the new attributes + above. + +Fri Sep 8 15:15:00 2000 Quim Sanmarti <qss at gtd.es> + + * htsearch/Display.cc: Fixed tiny date_format bug; + added url-decoding template variable expansion. 
+ +Thu Sep 7 23:45:25 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (Retriever): Only open up md5 database if + check_unique_md5 attribute is set. + +Thu Sep 7 22:56:19 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/URL.cc (DefaultPort): Add file default port of 0. + + * htnet/HtFile.cc (Request): Handle directory listings by using + scandir and generating minimal HTML file with appropriate noindex listing. + +Wed Sep 06 10:00:50 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htlib/URL.h, htlib/URL.cc: Restored corrected versions of URL.* + * htnet/HtNNTP.h: Removed the error in the NNTP class declaration + +Mon Sep 04 13:43:40 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.cc: Restored previous version of HtHTTP. I removed + an initialization in the constructor (_modification_time). Sorry. + +Sun Sep 3 16:51:24 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc, htdig/Server.cc: Fix compiler warnings about + String conversions. + + * configure, configure.in, db/configure, db/configure.in, + db/acinclude.m4, db/aclocal.m4: Ensure --enable-bigfile is handled + correctly by the configure scripts as pointed out by Jesse. + +Fri Sep 01 23:28:43 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * URL.cc: added DefaultPort() method and changed NNTP default port + from 523 to 119. + * Document.cc: management of NNTP documents retrieval. + +Fri Sep 01 19:05:02 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtNNTP.* : just created them ... + * htnet/HtHTTP.cc : removed modification_time deletion in the + class destructor. + +Thu Sep 01 12:00:00 2000 Toivo Pedaste <toivo at ucs.uwa.edu.au> + + * htdig/Retriever.cc: Allow for modify time being set to + current time if not available. 
+ +Thu Aug 31 13:21:12 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (allow_in_form, build_select_lists): + Add clearer instructions to allow_in_form description, add + cross-links between these two sections. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Wed Aug 30 10:01:59 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * Substitution of char * return types with const String & in URL and + Server classes. This change required many changes in other files: + HtFile.cc, HtHTTP.cc, HtConfiguration.*, Document.*, ExternalParser.*, + Retriever.*. + +Tue Aug 30 12:00:00 2000 Toivo Pedaste <toivo at ucs.uwa.edu.au> + + * htlib/md5.cc, htlib/md5.h: Generate md5 hash of + a page and also optionally the modify date. + + * htlib/mhash_md5.h, htlib/mhash_md5.c, htlib/libdefs.h: + MD5 hash code from libmhash. + + * htdig/Retriever.cc: Allow storing md5 hashes of pages + in order to reject aliases. + + * htcommon/defaults.cc: Options "check_unique_md5" and + "check_unique_date". + +Tue Aug 29 08:51:39 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/upgrade.html: Add description of the difference between + htmerge and htpurge. Mention other httools. + + * htsearch/parser.cc, htsearch/parser.h: Merge in patch by Quim + Sanmarti <qss at gtd.es> to fix problems with phrase searching and + AND searches and improve performance. + +Sun Aug 27 22:41:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/AndParseTree.cc, htsearch/OrParseTree.cc (Parse): + Rewrote using new WordToken inherited method. Fixes a bug where + the user entered two phrases next to each other. + + * htsearch/ParseTree.cc (Parse): Fix bug where phrases would + "absorb" prior query words. Also fix bug where operators were + incorrectly popped off the stack. Should (hopefully) solve all + parsing problems.
+ + * htsearch/*ParseTree.cc (GetLogicalWords): Test for empty list of + children to prevent potential segfault. + +Sat Aug 26 18:40:50 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * installdir/{syntax, header, footer, wrapper, nomatch}.html: + Add DTD tags, ALT attributes and remove bogus </select> tags to + fix invalid HTML pointed out in PR#901. + +Wed Aug 23 23:39:18 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/ParseTree.cc (Parse): Get rid of compiler warnings, use + new private tokenizer to ensure parens and quotes aren't + removed. Also, when popping an operator off the parens stack, make + sure it's adopted by a new ParseTree object so we get the parens + back in the tree hierarchy. + +Wed Aug 23 23:34:44 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/AndParseTree.cc (Parse): Fix nasty infinite loop when + phrases hit in AND searches. + + * htsearch/OrParseTree.cc (Parse): Ditto. + +Wed Aug 23 13:24:31 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.*, htnet/Transport.h: wherever possible, all 'char *' + types have been changed to 'const String &'. + +Sun Aug 20 23:25:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htpurge.cc (purgeDocs): Add error message when document + database is completely empty. Should take care of PR#672 (and others). + +Sun Aug 20 20:37:53 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegex.h, htlib/HtRegex.cc: Made destructor virtual, + added lastError() and associated support. Changed return type of + set*() to int. They now return the value of |compiled|. + + * htcommon/defaults.cc (url_rewrite_rules): Add new attribute to + support patch by Andy Armstrong <andy at tagish.com> for permanent + URL rewriting. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.
+ + * htlib/HtRegexReplace.cc, htlib/HtRegexReplaceList.cc, + htlib/HtRegexReplace.h, htlib/HtRegexReplaceList.h, + htcommon/HtURLRewriter.cc, htcommon/HtURLRewriter.h: New classes. + + * htcommon/Makefile.am, htcommon/Makefile.in: Add compilation for + HtURLRewriter. + + * htlib/Makefile.am, htcommon/Makefile.in: Ditto for + HtRegexReplace* + + * htcommon/URL.h, htcommon/URL.cc (rewrite): New method for + transforming URLs based on HtURLRewriter. + + * htdig/Retriever.cc (got_href): Rewrite the URL before we do + anything with it. + + * htdig/htdig.cc: Include HtURLRewriter headers and check rewrite + rules for errors. + +Sat Aug 19 17:01:36 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/conf_lexer.lxx: Patched to fix the bug with relative + filename includes. Keeps a separate stack with the filenames and + adjusts accordingly. + + * htcommon/conf_lexer.cxx: Updated using flex 2.5.4. + +Thu Aug 17 23:59:26 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/conf_lexer.lxx: Patched to fix a bug reported by Abel + Deuring -- config filename stack was decremented too many times. + + * htcommon/conf_lexer.cxx: Updated using flex 2.5.4. + +Thu Aug 17 23:40:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htword/WordType.h (WordToken): Add non-destructive version of + HtWordToken using a passed int as a pointer into the + string. Add virtual destructor so class can be sub-classed. + + * htword/WordType.cc (WordToken): Implement it. + + * httools/htmerge.cc (mergeDB): Back out change of Aug. 9th -- + WordSearchDescription has disappeared from htword + interfaces. Should be restored when Loic comes back and can + suggest an alternative. + +Thu Aug 17 16:59:05 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (createURL): Get rid of extra "config=" + parameter that was inserted before collections stuff. 
+ +Thu Aug 17 15:47:58 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.cc: Ask again for a document after a <NoHeader> + response is given by the HTTPRequest() method. + +Thu Aug 17 12:25:33 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.*, htnet/Transport.* : Fixed bug with HTTP/1.1 management. + Now the "Connection: close" directive is handled and forces the connection + to be closed, which fixes the bug. Fixed other minor bugs and + string initializations. + +Tue Aug 15 00:24:33 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * contrib/multidig/Makefile, gen-collect, db.conf, multidig.conf: + Add missing trailing newlines as pointed out by Doug Moran + <dmoran at dougmoran.com>. + + * contrib/multidig/Makefile (install): Make sure scripts have a+x + permissions. Pointed out by Doug Moran. + + * contrib/multidig/new-collect: Fix typo to ensure MULTIDIG_CONF + is set correctly. + +Sun Aug 13 23:17:30 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Server.h, htdig/Server.cc (Server): Add support for + per-server user_agent configuration. + + * htdig/Document.cc (Retrieve): Ditto. + + * httools/htpurge.cc (purgeDocs): Set remove_* attributes on a + per-server basis. + + * htcommon/defaults.cc: Fix remove_bad_urls and + remove_unretrieved_urls to point to htpurge and not htmerge. + +Sat Aug 12 23:03:32 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/cf_generate.pl (html_escape): Fix mindless thinko with + perl stringwise-equal operator. Documentation is now generated + with block: portion appropriate to defaults.cc. + + * htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl. + +Fri Aug 11 16:03:18 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (parse): fix problem with & not being translated.
+ +Fri Aug 11 10:48:54 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (setVariables), htcommon/defaults.cc: Added + maximum_page_buttons attribute, to limit buttons to fewer than + maximum_pages. Fixes PR#731 & PR#781. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + +Wed Aug 9 23:04:39 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htmerge.cc (mergeDB): Add fix, contributed by Lorenzo, to + prevent duplicate documents when you merge a database with a copy + of itself. + +Wed Aug 9 22:58:39 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/parser.cc (score): Merged in patch contributed by + Lorenzo Campedelli <lorenzo.campedelli at libero.it> and Arthur + Prokosch <prokosch at aptima.com> to fix problems with AND operators + and phrase matches. + +Wed Aug 2 11:44:11 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (setVariables), htcommon/defaults.cc: Enhanced + build_select_lists attribute, to generate not only single-choice + select lists, but also select multiple lists, radio button lists + and checkbox lists. Added explanation and examples in documentation. + * htdoc/hts_selectors.html: Added detailed explanation of new feature. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + +Tue Aug 1 21:50:22 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/ParseTree.cc (Parse): Fix problems with token + comparisons and fix thinko with HtWordToken parsing--previously + didn't advance the parse step at all. + + * htsearch/*ParseTree.cc (Parse): Fix thinko with HtWordToken as + above--here it acted as an infinite loop. + + * htdig/ExternalParser.cc (parse): Add shell quoting around + content-type. Hard to exploit, but a server could potentially + return a strange value that could then be executed locally.
+ +Thu Jun 29 23:33:51 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/ParseTree.h, htsearch/ParseTree.cc: New parent class + for the new htsearch framework. Still needs work. + + * htsearch/*ParseTree.*: Derived classes appropriate to the method + indicated. + + * htsearch/parsetest.cc: New program to allow initial + command-line testing of ParseTree classes. + + * htsearch/Makefile.am, htsearch/Makefile.in: Build parsetest in + addition to htsearch. Eventually, parsetest is probably best + modified slightly and moved into the tests directory. + +Tue Jun 20 22:29:57 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htmerge.cc (mergeDB): Merge in patch contributed by + Lorenzo Campedelli <lorenzo.campedelli at libero.it> to greatly + reduce memory usage. + +Sun Jun 18 13:15:43 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Object.h (class Object): Fix problems with retrieval order + by ensuring the compare() method is declared const. + +Tue Jun 13 22:57:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (GetLocal): Fix bug that would cause a + coredump when local_urls was used and local_default_docs was + needed. The list of default filenames was freed before it should + have been. + +Tue Jun 13 19:30:28 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/HtWordReference.h, htcommon/HtWordReference.cc (Load, + LoadHeaders): New methods to check the header of an ASCII + representation and read it in. + + * htcommon/HtWordList.h, htcommon/HtWordList.cc (Load): Add load + method to read in data. Calls the new methods above. + + * httools/htload.cc: Open word databases read-write and call + HtWordList::Load(). + +Sun Jun 11 14:39:28 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc (generateStars): Fix problem when maxScore + == minScore as reported by Rajendra. Fixes PR#858. + (displayMatch): Ditto.
+ + * htsearch/htsearch.cc: Fix memory corruption problem in reporting + syntax errors pointed out by Rajendra. Fixes PR#860. + +Thu Jun 8 09:31:15 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/Accents.h, htfuzzy/Accents.cc: Apply Robert Marchand's + patch to his algorithm. Gets rid of writeDB function (falls back + on default one in Fuzzy.cc), changes addWord, and adds a new + getWords function to override default. These avoid overhead of + unaccented forms of words in accents database, but ensure that + unaccented form of search word is always searched. + +Thu Jun 8 09:00:02 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/DocumentRef.h(DocScore, docScore), + htsearch/ResultMatch.cc(ScoreMatch::compare), + htsearch/ResultMatch.h(setScore, getScore, score), + htsearch/Display.cc(displayMatch, generateStars, buildMatchList): + Apply Terry Luedtke's patch for score calculations, to calculate + min & max from log(score). + +Thu Jun 8 08:47:03 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/doc2html/doc2html.pl: Apply David Adams' fix for missing + quote. + +Wed Jun 07 10:53:53 2000 Loic Dachary <loic at senga.org> + + * db/db.c (CDB___db_dbenv_setup): open mode is 0666 instead + of 0 otherwise the weakcmpr file is not open with the proper + mode. + +Tue Jun 6 23:48:48 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htpurge.cc: Fix coredump problems by passing + dictionaries as pointers rather than full objects (this is + preferred anyway). + +Sun Jun 4 22:17:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * test/t_htdig_local: Added test for local filesystem support. + + * test/config/htdig.conf2.in: Change to be a config file for + local_urls testing. + + * test/Makefile.am: Add t_htdig_local to list. 
+ +Tue May 30 23:52:45 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htmerge.cc: Move to httools directory, remove "cleanup" + functionality now in htpurge and merge in htmerge.h and db.cc files. + + * httools/Makefile.am: Add htmerge now moved to this directory. + + * */Makefile.in: Update with automake. + + * Makefile.am (SUBDIRS): Remove htmerge, now found in httools. + + * configure.in: Ditto. + + * configure: Update with autoconf. + + * test/test_functions.in: Add paths for htpurge, htstat, htload, + htdump and update path for htmerge. + + * test/t_htdig: Change htmerge to htpurge to clean out incorrect URLs. + + * installdir/rundig: Change htmerge to htpurge. This needs serious + additional cleanup for use in 3.2 since many conventions have changed! + +Tue May 23 22:21:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * README: Fix for 3.2.0b3 and clean up organization a bit for new + directory structure. + +Wed May 17 23:22:31 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Add support for TITLE attributes in + anchor and related tags. + +Fri May 12 17:54:09 2000 Loic Dachary <loic at senga.org> + + * db/acinclude.m4: bigfile support is disabled by default. + + * db/mp_region.c (CDB___memp_close): clear weakcmpr pointer + when closing region so that memory pool files are not + released twice. + +Wed May 10 22:26:21 2000 Loic Dachary <loic at senga.org> + + * */*.cc: all include htconfig.h + + * htlib/HtTime.h: remove htconfig.h inclusion (never in headers) + + * htlib/*.h,*.cc: Fix copyright GNU Public -> Gnu General Public + and 1999, 2000 instead of 1999. + +Tue May 09 16:38:07 2000 Loic Dachary <loic at senga.org> + + * htsearch/Collection.cc (Collection): set searchWords and + searchWordsPattern to null in constructor. Delete in destructor. + Also delete matches in destructor. + + * test/word.cc (doskip_harness): free cursor after use. 
+ + * test/word.cc (doskip_overflow): free cursor after use. + + * test/dbbench.cc (find): free cursor after use. + + * htsearch/htsearch.cc (main): free searchWords and searchWordsPattern + after usage. + + * htdb/htdb_{load,dump,stat}.cc (main): call WordContext::Finish + to free global context for inverted index. + + * htdb/htdb_stat.cc (btree_stats): free stat structure. + + * htlib/List.h (class List): Add Shift/Unshift/Push/Pop methods. + + * htlib/List.h (class List): Add Remove(int position) method. + +Tue May 09 00:22:33 2000 Loic Dachary <loic at senga.org> + + * htsearch/htsearch.cc (main): kill useless call to + StringList::Release + + * htsearch/HtURLSeedScore.cc (ScoreAdjustItem): remove useless + call to StringList::Destroy. + + * htlib/HtWordCodec.cc (HtWordCodec): Fix usage of StringList + that was inserting pointers to volatile strings instead of + permanent copies. I suspect that the tweak on StringList was + primarily done to satisfy this piece of code. After reviewing + all the usage of StringList, it's the only one to use it in this + fashion. + + * htlib/QuotedStringList.h (class QuotedStringList): remove + noop destructor to enable Destroy of the underlying StringList + when deleted. + +Mon May 08 18:17:02 2000 Loic Dachary <loic at senga.org> + + * htlib/StringList.h (class StringList): change methods + Add/Insert/Assign that were copying the String* given in argument. + This behaviour is confusing since it has a different semantic + than the base class List. + +Mon May 08 17:16:00 2000 Loic Dachary <loic at senga.org> + + * htdig/Retriever.cc (GetLocal): fix leaked defaultdocs + +Mon May 08 04:27:47 2000 Loic Dachary <loic at senga.org> + + * htlib/StringList.cc (Create): remove SRelease. Deleting + the strings is taken care of by the destructor thru + Destroy. If destruction of the Strings is not desirable + Release should be used. 
SRelease was added apparently after + a virtual constructor doing nothing was added to hide the + default call to Destroy therefore leaking memory. + +Mon May 08 01:28:25 2000 Loic Dachary <loic at senga.org> + + * test/txt2mifluz.cc,word.cc,search.cc: fix minor memory leaks. + +Sun May 07 19:24:12 2000 Loic Dachary <loic at senga.org> + + * Makefile.config (HTLIBS): add libht at end because htdb + now depends on htlib. + + * configure.in,htlib/Makefile.am: use LTLIBOBJS as suggested + by the libtool documentation. + +Sun May 07 17:09:22 2000 Loic Dachary <loic at senga.org> + + * test/Makefile.am (clean-local): clean conf to prevent + inconsistencies when re-configuring in a directory that + is not the source directory. + +Sun May 07 05:07:23 2000 Loic Dachary <loic at senga.org> + + * db/mkinstalldir,test/benchmark: Add for installation purpose + +Sun May 07 02:17:03 2000 Loic Dachary <loic at senga.org> + + * Makefile.am (distclean-local): Xtest instead of test + that confuse some shells. + +Sun May 07 02:02:46 2000 Loic Dachary <loic at senga.org> + + * htword/WordDB.cc: Move Open to WordDB.cc. + +Sun May 07 01:32:47 2000 Loic Dachary <loic at senga.org> + + * test/t_*: check/fix scripts. All regression tests pass + on RedHat-6.2. + +Sun May 07 00:54:30 2000 Loic Dachary <loic at senga.org> + + * */*.cc: fix warnings and large file support inclusion + files on Solaris. + +Sat May 06 21:55:58 2000 Loic Dachary <loic at senga.org> + + * test/: import regression tests from mifluz + + * htlib/DB2_db.cc (db_init): fix flags used when creating the + environment to include a memory pool. + + * htcommon/defaults.cc: change wordkey_description format. + update all wordlist_* attributes + +Sat May 06 04:46:03 2000 Loic Dachary <loic at senga.org> + + * htmerge/words.cc (mergeWords): WordSearchDescription becomes + WordCursor. + + * httools/htpurge.cc (purgeWords): WordSearchDescription becomes + WordCursor. 
+ +Sat May 06 02:01:40 2000 Loic Dachary <loic at senga.org> + + * htdb/*: upgrade to Berkeley DB 3.0.55. Very different. + + * htlib/getcwd.c,memcmp.c,memcpy.c,memmove.c,raise.c,snprintf.c, + strerror.c,vsnprintf.c,clib.h: Add compatibility support + + * htcommon/DocumentDB.cc (LoadDB): remove unused variable + + * htlib/DB2_db.cc: adapt to Berkeley DB 3.0.55 syntax. + + * htlib/Database.h (class Database): remove DB_INFO, does + not exist in Berkeley DB 3.0.55 + + * htlib/*: run ../db/prefix-symbols.sh + + * Makefile.config (INCLUDES): fix db include dirs + + * acconfig.h: Big file support + replacement functions + + * acinclude.m4,configure.in : db instead of db/dist + bug fixes + +Fri May 5 08:33:59 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * db/*: Merge in changes from Loic's mifluz tree. This will break + everything, but Loic promises he'll fix it ASAP after I make this + change. + +Mon Apr 24 21:58:22 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/htdig.cc (main): Make the -l stop & restart mode the + default. This will catch signals and quit gracefully. The + command-line parser will still accept -l, it will just ignore it. + (usage): Remove -l portion. + (main): Fix -m option to read in a file as it's + supposed to do! Also set max_hops correctly so really only indexes + the URLs in that file. + + * htdoc/htdig.html: Remove -l from documentation since it's now + the default. + +Mon Apr 24 21:22:53 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Server.cc (push): Fix bug where changes in the robots.txt + would be ignored. If a URL was indexed and later the robots.txt + changed to forbid it, the URL would still be updated. + +Wed Apr 19 22:13:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * Merging in changes from mifluz 0.14 from Loic. + + * htlib/Configuration.cc (Read): Removed dependency on fstream.h, + use fopen, fprintf, fgets, fclose instead of iostream. 
+ + * htlib/HtPack.cc, htlib/HtVectorGeneric.h, htlib/Object.h, + htlib/ParsedString.cc, htlib/String.cc: Remove use of cerr, + instead use fprintf(stderr ...). + + * htlib/Dictionary.cc, htlib/HtVectorGeneric.cc, htlib/List.cc, + htlib/Object.cc, htlib/StringList.cc, htlib/htString.h, + htlib/strcasecmp.cc: Add #ifdef blocks for htconfig.h + +Wed Apr 12 19:09:40 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * .version: Bump to 3.2.0b3. + + * htdoc/htload.html, htdoc/htpurge.html, htdoc/htstat.html: Fix + typos in headers. + + * htdoc/main.html: Fix link to download to actually point to 3.2.0b2. + +Tue Apr 11 00:21:48 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc (setupWords): Does not apply fuzzy + algorithms to phrase queries. This helps prevent the infinite + loops described on the mailing list. + + * htcommon/conf_parser.yxx (list): Add conditions for lists + starting with string-number, number-string, and number-number. + + * htcommon/conf_parser.cxx: Regenerate using bison. + + * htdoc/RELEASE.html: Update release notes for recent bug fixes + and likely release date for 3.2.0b2. + + * htdoc/main.html: Add a blurb about the 3.2.0b2 release. + + * htdoc/*.html: Remove author notes in the footer as requested by + Andrew. To balance it out, the copyright notice at the top links + to THANKS.html. + +Sun Apr 9 15:21:12 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/conf_parser.yxx (list): Fix problem with + build_select_lists--parser didn't support lists including numbers. + + * htcommon/conf_parser.cxx: Regenerate using bison. + +Sun Apr 9 12:53:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/RELEASE.html: Add a first draft of 3.2.0b2 release notes. + +Sun Apr 9 12:31:13 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/Makefile.am, httools/Makefile.in: Add htload to + compilation list. + + * htcommon/DocumentDB.h: Add optional verbose options to DumpDB + and LoadDB. 
+ + * htcommon/DocumentDB.cc (LoadDB): Implement loading and parsing + an ASCII version of the document database. Records on disk will + replace any matching records in the db. + (DumpDB): Add all fields in the DocumentRef to ensure the entire + database is written out. + + * htcommon/DocumentRef.h: Add new method for setting DocStatus + from an int type. + + * htcommon/DocumentRef.cc (DocStatus): Set it using a switch + statement. (It's not pretty, but it works.) + + * httools/htload.cc: New file. Loads in ASCII versions of the + databases, replacing existing records if found. + + * httools/htdump.cc: Pass verbose flags to DumpDB method. Make + sure to close the document DB before quitting. + + * httools/htpurge.cc: Add -u option to specify a URL to purge from + the command-line. + + * httools/htstat.cc: Add -u option to output the list of URLs in + the document DB as well. + +Sat Apr 8 16:35:55 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Change all <b>, <i>, and <tt> tags to the + HTML-4.0 compliant <strong>, <em>, and <code> tags. + + * installdir/long.html, installdir/header.html, + installdir/nomatch.html, installdir/syntax.html, + installdir/wrapper.html: Ditto. + + * htdoc/*.html: Ditto. (Don't you just love sed?) + + * htsearch/TemplateList.cc (createFromString): Ditto. + + * htdoc/htpurge.html, htdoc/htdump.html, htdoc/htload.html, + htdoc/htstat.html: New files documenting usage of httools + programs. + + * htdoc/contents.html: Add links to above. + + * htdoc/htdig.html: Update table with -t format to match htdump. + +Fri Apr 7 00:30:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * README: Update to mention 3.2.0b2 and use correct copyright. (It + is 2000 after all!) + + * htdoc/FAQ.html, htdoc/where.html, htdoc/uses.html, + htdoc/isp.html: Update with most recent versions from maindocs. + + * htdoc/RELEASE.html: Add release notes for 3.1.5 to the + top. 
(It's out of version ordering, but it is in correct + chronological order.) + +Fri Apr 7 00:11:29 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htpurge.cc (main): Read in URLs from STDIN for purging, + one per line. Pass them along to purgeDocs for removal. Also, make + discard_list into a local variable and pass it from purgeDocs to + purgeWords. + (purgeDocs): Accept a hash of URLs to delete (user input) and + return the list of doc IDs deleted. + (usage): Note the - option to read in URLs to be deleted from STDIN. + +Thu Apr 6 00:10:23 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (got_redirect): Allow the redirect to accept + relative redirects instead of just full URLs. + +Wed Apr 5 15:07:52 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc: Added #if test to make sure DBL_MAX is + defined on Solaris, as reported by Terry Luedtke. + +Tue Apr 4 12:46:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/doc2html/*: Added parser submitted by D.J.Adams at soton.ac.uk + +Mon Apr 3 13:48:59 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: Fix error in description of new attribute + plural_suffix. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Fri Mar 31 21:48:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in, configure: Add test using AC_TRY_RUN to compile + against the htlib/regex.c and attempt to compile a regexp. This + should allow us to find out if the included regex code causes + problems. + + * acconfig.h: Add HAVE_BROKEN_REGEX as a result of the configure + script to conditionally include the appropriate regex.h file. + + * include/htconfig.h.in: Regenerate using autoheader. + + * htlib/regex.c: Move #include "htconfig.h" inside HAVE_CONFIG_H + tests. This file is only created when this is true anyway. 
This + prevents problems with the configure test. + + * htlib/HtRegex.h, htfuzzy/EndingsDB.cc: Use HAVE_BROKEN_REGEX + switch to use the system include instead of the local include + where appropriate. + + * htlib/Makefile.am, htlib/Makefile.in: Only compile regex.lo if + the configure script added it to LIBOBJS. + +Thu Mar 30 22:41:38 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/URL.cc (normalizePath): Remove Gilles's loop to add + back ../ components to a path that would go above the top + level. Now we simply discard them. Both are allowed under the RFC, + but this should have fewer "surprises." + +Tue Mar 28 21:57:49 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Connection.cc (Read_Partial): Fix bug reported by Valdas + where a zero value returned by select would result in an infinite + loop. + + * htcommon/defaults.cc: Add new attribute plural_suffix to set the + language-dependent suffix for PLURAL_MATCHES contributed by Jesse. + + * htsearch/Display.cc (setVariables): Use it. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Mon Mar 27 22:28:20 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentRef.cc (Deserialize): Add back stub for + DOC_IMAGESIZE to prevent decoding errors. This just throws away + that field. + + * htcommon/HtSGMLCodec.h (class HtSGMLCodec): Differentiate + between codec used for &foo; and numeric form &#nnn; Make sure + encoding goes through both but decoding only goes through the + preferred text form. + + * htcommon/HtSGMLCodec.cc (HtSGMLCodec): When constructing the + private HtWordCodec objects, create separate lists for the number + and text codecs. + +Mon Mar 27 21:25:27 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/HtURLSeedScore.cc (ScoreAdjustItem): Change to use + HtRegex for flexibility and to get around const char * -> char * + problems. + + * htsearch/SplitMatches.cc (MatchArea): Ditto. 
+ + * htsearch/Makefile.am, htsearch/Makefile.in: Add SplitMatches.cc + and HtURLSeedScore.cc to compilation list! + +Mon Mar 27 21:03:12 2000 Hans-Peter Nilsson <hp at bitrange.com> + + * htcommon/defaults.cc (defaults): Add default for + search_results_order, url_seed_score. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerated using cf_generate.pl. + + * htlib/List.h (List): New method AppendList. + * htlib/List.cc (List::AppendList): Implement it. + + * htsearch/SplitMatches.h, htsearch/SplitMatches.cc: New. + + * htsearch/HtURLSeedScore.cc, HtURLSeedScore.h: New. + + * htsearch/Display.h (class Display): Add member minScore. + Change maxScore type to double. + + * htsearch/Display.cc: Include SplitMatches.h and HtURLSeedScore.h + (ctor): Initialize minScore, change init value for + maxScore to -DBL_MAX. + (buildMatchList): Use a SplitMatches to hold search results and + iterate over its parts when sorting scores. + Ignore Count() of matches when setting minScore and maxScore. + Use a URLSeedScore to adjust the score after other calculations. + Calculate minScore. + Correct maxScore adjustment for change to double. + (displayMatch): Use minScore in calculation of score to adjust for + negative scores. + (sort): Calculation of maxScore moved to buildMatchList. + +Mon Mar 27 20:22:24 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Remove + DocImageSize field since it is not used anywhere and is never updated. + + * htdig/Retriever.h (class Retriever): Remove references to Images class. + + * htcommon/DocumentDB.cc (DumpDB): Ignore DocImageSize field. + + * htdig/Makefile.am, htdig/Makefile.in: Remove Images.cc since + this is no longer used. + + * htdig/Plaintext.cc: Do not insert SGML equivalents into the + excerpt; these are decoded by HtSGMLCodec automatically.
+ +Sat Mar 25 21:58:36 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/cf_generate.pl (html_escape): Changed <b></b> and <i></i> + tags to HTML 4.0 <strong> and <em> tags. + +Sat Mar 25 17:23:46 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdb/Makefile.am, htdb/Makefile.in: Change the names of the htdb + utility programs to avoid name conflicts with httools programs. + + * htdb/htdb_load.cc: Rename htload.cc to avoid a name conflict and + more closely match the original db_load program name. + + * htdb/htdb_dump.cc, htdb/htdb_stat.cc: Ditto. + + * htfuzzy/Prefix.cc (getWords): Add code to "weed out" duplicates + returned from WordList::Prefix. We only want to add unique words + to the search list. + +Fri Mar 24 22:33:20 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Document.cc (Document): Fix bug reported by Mentos + Hoffman, contributed by Atlee Gordy <agordy at moonlight.net>. + +Mon Mar 20 23:14:26 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentDB.cc (Delete): Fix bug reported by Valdas + where duplicate document records could "sneak in" because the + doc_index entry was removed incorrectly. + +Mon Mar 20 19:08:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Added block field and added appropriate blocks. + + * htlib/Configuration.h (struct ConfigDefaults): Add block field. + + * htdoc/cf_generate.pl: Parse the new block field. + + * htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html: + Regenerate using above. + + * htcommon/DocumentDB.cc (DumpDB): Make sure we decompress the + DocHead field before we write it to disk! + + * httools/htdump.cc, httools/htstat.cc: Call + WordContext::Initialize() before doing any htword calls. + +Mon Mar 20 14:10:30 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/htpurge.cc: Whoops! Left some references to htmerge in + the error messages and usage message. + + * httools/htstat.cc: New program.
Simply spits up the total number + of documents, words and unique words in the databases. + + * httools/htdump.cc: New program. Simply dumps the contents of the + document DB and the word DB to doc_list and word_dump files + respectively. Also has flags -w and -d to pick one or the other. + + * httools/Makefile.am, httools/Makefile.in: Add htdump and htstat + programs to compilation list. + + * htcommon/DocumentDB.cc (DumpDB): Change name of CreateSearchDB + and add fields for DocBackLinks, DocSig, DocHopCount, DocEmail, + DocNotification, and DocSubject. This should now export every + portion of the document DB. + + * htcommon/DocumentDB.h: Change name of CreateSearchDB and add + stub for LoadDB, to be written shortly. + + * htdig/htdig.cc: Call DumpDB instead of CreateSearchDB when + creating an ASCII version of the DB. + +Sat Mar 18 22:57:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * httools/Makefile.am, httools/Makefile.in: New directory for + useful database utilities. + + * httools/htnotify.cc: Moved htnotify to httools directory. + + * httools/htpurge.cc: New program--currently just purges documents + (and corresponding words) in the databases. Will shortly also + allow deletion of specified URLs. + + * Makefile.am, configure.in: Remove htnotify directory in favor of + httools directory. + + * configure: Regenerate using autoconf. + + * Makefile.in: Regenerate using automake --foreign. + +Fri Mar 17 16:47:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (excerpt, hilight): Correctly handle case + where there is no pattern to highlight. + * htsearch/htsearch.cc (addRequiredWords), htcommon/defaults.cc: + Add any_keywords attribute, to OR keywords rather than ANDing, + fix addRequiredWords not to mess up expression when there are + no search words, but required words are given. + * htdoc/hts_form.html: Mention new attribute, add links to all + mentioned attributes. 
+ * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Fri Mar 17 15:48:12 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/Accents.cc (generateKey): Truncate words to + maximum_word_length, for consistency with what's found in word DB. + +Fri Mar 17 10:56:17 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (do_tag): Use case insensitive parsing of META + robots tag content. + * htlib/String.cc (uppercase): Fix misplaced cast for islower(). + +Mon Mar 6 17:31:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc (setupWords): Don't allow comma as string + list separator, as it can be a decimal point in some locales. + +Mon Mar 06 00:58:00 2000 Loic Dachary <loic at ceic.com> + + * db/mp/mp_bh.c (__memp_bhfree): always free the chain, if + any. The bh is reset to null after free and we lose the + pointer anyway, finally filling the pool with it. + + * db/mp/mp_cmpr.c (__memp_cmpr_write): i < CMPR_MAX - 1 instead of + i < CMPR_MAX, otherwise it goes beyond array limits. This fixes a + major problem when handling large files. + +Sat Mar 04 19:41:49 2000 Loic Dachary <loic at ceic.com> + + * db/mp/mp_cmpr.c (__memp_cmpr_free_chain): clear BH_CMPR + flag. Was causing core dumps, thanks to + Peter Marelas maral at phase-one.com.au for providing + a simple case to reproduce the error. + +Fri Mar 3 11:32:34 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * Fixed bugs regarding yesterday's changes. Even Leonardo da Vinci + used to commit errors, so ... + +Fri Mar 3 11:25:42 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * testnet.cc: added the -r and -w options in order to set how many + times it retries to re-connect after a timeout occurs, and how long + it should wait after it.
+ +Thu Mar 2 18:45:15 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/Connection.*: management of wait_time and number of retries + after a timeout occurs. + + * htnet/Transport.*: Management of connection attributes above. + + * htdig/Server.*: Set members for managing timeout retries taken from + the configuration file ("timeout", "tcp_max_retries", "tcp_wait_time"). + + * htdig/Document.cc: Added the ability to configure on a per-server basis + "persistent_connections", "head_before_get", "timeout", + "tcp_max_retries", "tcp_wait_time". Changed the Retrieve method to accept + a server object pointer: Retrieve (server*, HtDateTime). + + * htdig/Retriever.cc: Added the ability to configure on a per-server basis + the "max_connection_requests" attribute. + + * htcommon/defaults.cc: Added "tcp_max_retries", "tcp_wait_time" -- Needs + to be gone over by someone who speaks English better than me. Not hard + work !!! ;-) + +Wed Mar 1 17:01:09 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc (excerpt, hilight): move SGML encoding into + hilight() function, because when it's done earlier it breaks + highlighting of accented characters. + +Wed Mar 1 16:02:49 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/htfuzzy.cc (main): Correctly test return value on Open() + of word database, include db name in error message if Open() fails, + do a WordContext::Initialize() before we need htword functions. + (Obviously I'm the first to test htfuzzy in 3.2!) + * htfuzzy/Accents.cc (generateKey): cast characters to unsigned char + before using as array subscripts. + +Wed Mar 1 13:27:26 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: Added accents_db attribute, mentioned accents + algorithm in search_algorithms section. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl.
+ * installdir/htdig.conf: Added mentions of accents, speling & substring, + fixed a couple typos in comments. + * htdoc/htfuzzy.html: Added blurb on accents algorithm. + * htdoc/require.html: Added mentions of accents, speling, substring, + prefix & regex. + * htdoc/config.html: Updated with sample of latest htdig.conf and + installdir/*.html, added blurb on wrapper.html. + +Wed Mar 1 00:30:19 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in, configure: Add test for FD_SET_T, the second (also + third and fourth) argument in calls to select(). Should solve PR#739. + + * acconfig.h, include/htconfig.h.in: Add declaration for FD_SET_T. + + * htnet/Connection.cc (ReadPartial): Change declaration of fds to + use FD_SET_T define set by the configure script. + +Tue Feb 29 23:11:49 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/DB2_db.cc (Error): Simply fprint the error message on + stderr. This is not a method since the db.h interface expects a C + function. + (db_init): Don't set db_errfile, instead set errcall to point to + the new Error function. + +Tue Feb 29 15:09:41 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/Accents.h, htfuzzy/Accents.cc: Adapted writeDB() for 3.2. + +Tue Feb 29 14:29:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/Accents.h, htfuzzy/Accents.cc: Added these, as contributed + by Robert Marchand, to implement accents fuzzy match. Adapted to 3.2. + * htfuzzy/Fuzzy.cc, htfuzzy/htfuzzy.cc, htfuzzy/Makefile.am, + htfuzzy/Makefile.in: Added in accents algorithm, as for soundex. + +Tue Feb 29 11:31:53 2000 Loic Dachary <loic at ceic.com> + + * test/testnet.cc (Listen): Add -b port to listen to a specific + port. This is to test connect timeout conditions. + + * htnet/Connection.cc (Connect): Added SIGALRM signal handler, + Connect() always allow EINTR to occur. 
+ +Mon Feb 28 15:32:46 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKey.h (class WordKey): explicitly add inline keyword + for all inline functions. + +Mon Feb 28 13:10:34 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKey.h (class WordKey): nfields data member caches + result of NFields() method. + + * htword/WordDBPage.h (class WordDBPage): nfields data member caches + result of WordKey::NFields() method. + + * acinclude.m4 (APACHE): check in lib/apache for modules + +Sat Feb 26 22:05:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Collection.h, htsearch/Collection.cc: New files + contributed by Rajendra Inamdar <inamdar at beasys.com>. + + * htsearch/Makefile.am, htsearch/Makefile.in: Compile them. + + * htcommon/defaults.cc: Add new collection_names attribute as + described by Rajendra. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + + * htsearch/Display.h, htsearch/Display.cc: Loop through + collections as we are assembling results. + (buildMatchList): Use 1.0 as minimum score and take log(score) as + the final score. This requires an increase in magnitude in weight + to correspond to a factor of increase in score. + + * htsearch/DocMatch.h, htsearch/DocMatch.cc: Keep track of the + collection we're in. + + * htsearch/ResultMatch.h: Ditto. + + * htsearch/htsearch.h, htsearch/htsearch.cc: Wrap results in + collections. + + * htsearch/parser.h, htsearch/parser.cc: Set the collection for + the results--we use this to get to the appropriate word DB. + (score): Divide word weights by word frequency to calibrate for + expected Zipf's law. Rare words should count more. + +Fri Feb 25 11:19:47 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (maximum_pages): Describe new behaviour (as of + 3.1.4), where this limits total matches shown. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl.
+ +Thu Feb 24 14:43:06 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtFile.cc (Request): Fix silly typo. + + * htlib/DB2_db.cc: Remove include of malloc.h, as it causes problems + on some systems (e.g. Mac OS X), and all we need should be in stdlib.h. + +Thu Feb 24 13:11:15 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnet/HtFile.cc (Request): Don't append more than _max_document_size + bytes to _contents string, set _content_length to size returned by + stat(). + * htnet/HtHTTP.cc (HTTPRequest): Extra tests in case Content-Length + not given for non-chunked input, and not to close persistent + connection when chunked input exceeds _max_document_size. + (ReadChunkedBody): Don't append more than _max_document_size bytes + to _contents string. + +Thu Feb 24 11:40:24 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (do_tag): Fix handling of img alt text to be consistent + with body text, rather than keywords. + * htdig/Retriever.cc (ctor): Treat alt text as plain text, until it has + its own FLAG and factor. + +Thu Feb 24 11:16:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (version): Moved example over to correct field. + (defaults[] terminator): Padded zeros to new number of fields. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Thu Feb 24 19:08:41 2000 Loic Dachary <loic at ceic.com> + + * htmerge/words.cc: only display Word in verbose message instead + of complete key if verbosity < 3. + +Thu Feb 24 10:43:12 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (external_protocols, external_parser): + Swapped these two entries to put them in alphabetical order. + (star_blank): Fixed old typo (incorrect reference to image_star). + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. 
+ +Wed Feb 23 16:53:40 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc (backlink_factor, external_parser, + local_default_doc, local_urls, local_urls_only, local_user_urls): + Add some updates from 3.1.5's attrs.html. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Wed Feb 23 15:11:51 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + [ Improve htsearch's HTML 4.0 compliance ] + * htsearch/TemplateList.cc (createFromString): Use file name rather + than internal name to select builtin-* templates, use $&(TITLE) and + $&(URL) in templates and quote HTML tag parameters. + * installdir/long.html, installdir/short.html: Use $&(TITLE) and + $&(URL) in templates and quote HTML tag parameters. + * htsearch/Display.cc (setVariables): quote all HTML tag parameters + in generated select lists. + * installdir/footer.html, installdir/header.html, + installdir/nomatch.html, installdir/search.html, + installdir/syntax.html, installdir/wrapper.html: + Use $&(var) where appropriate, and quote HTML tag parameters. + * installdir/htdig.conf: quote all HTML tag parameters. + +Wed Feb 23 13:40:27 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/URL.h (encodeURL): Change list of valid characters to + include only unreserved ones. + * htcommon/cgi.cc (init): Allow "&" and ";" as input param. separators. + * htsearch/Display.cc (createURL): Encode each parameter separately, + using new unreserved list, before piecing together query string, to + allow characters like "?=&" within parameters to be encoded. + +Wed Feb 23 13:22:29 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/URL.cc (ServerAlias): Fix server_aliases processing to prevent + infinite loop (as for local_urls in PR#688). 
+ +Wed Feb 23 12:49:52 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/HtDateTime.h, htlib/HtDateTime.cc: change Httimegm() method + to HtTimeGM(), to avoid conflict with Httimegm() C function, so we + don't need "::" override, for Mac OS X. + * htlib/htString.h, htlib/String.cc: change write() method to + Write(), to avoid conflict with write() function, so we don't need + "::" override, for Mac OS X. + +Wed Feb 23 12:17:46 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/Configuration.cc(Read): Fixed to allow final line without + terminating newline character, rather than ignoring it. + +Wed Feb 23 12:01:01 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc (GetLocal, GetLocalUser): Add URL-decoding + enhancements to local_urls, local_default_urls & local_default_doc, + to allow hex encoding of special characters. + +Wed Feb 23 19:14:29 2000 Loic Dachary <loic at ceic.com> + + * htcommon/conf_parser.cxx: regenerated from conf_parser.yxx + +Wed Feb 23 19:04:16 2000 Loic Dachary <loic at ceic.com> + + * test/test_functions.in: unconditionally remove existing test/var + directory before running tests to prevent accidents. + + * htcommon/URL.cc (URL): fixed String->char warning + + * htcommon/defaults.cc (wordlist_compress): defaults to true + +Tue Feb 22 17:09:10 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc(parse, do_tag): Fix handling of <img alt=...> text + and parsing of words in meta tags, to do proper word separation. + * htlib/HtWordType.h, htlib/HtWordType.cc: Add HtWordToken() function, + to replace strtok() in HTML parser. + +Tue Feb 22 16:21:25 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/URL.cc (ctor, normalizePath): Fix PR#779, to handle relative + URLs correctly when there's a trailing ".." or leading "//".
+ +Tue Feb 22 14:09:26 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc (RetrieveLocal): Handle common extensions for + text/plain, application/pdf & application/postscript. + +Mon Feb 21 17:25:21 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/htdig-3.2.0.spec: Fixed %post script to add more + descriptive entries in htdig.conf, made cron script a config file, + updated to 3.2.0b2. + + * contrib/conv_doc.pl, contrib/parse_doc.pl: Added comments to show + Warren Jones's updates in change history. + +Mon Feb 21 17:09:13 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/HtConfiguration.h, htcommon/conf_parser.yxx, + htlib/Configuration.h, htlib/Configuration.cc: split Add() method + into Add() and AddParsed(), so that only config attributes get parsed. + Use AddParsed() only in Read() and Defaults(). + +Fri Feb 18 22:50:54 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Connection.h, htnet/Connection.cc: Renamed methods with + capitals to remove the need to use ::-escaped library calls. + + * htnet/Transport.h, htnet/Transport.cc, htnet/HtHTTP.cc, + htdig/Images.cc: Fix code using Connection to use the newly + capitalized methods. + +Fri Feb 18 14:40:50 2000 Loic Dachary <loic at ceic.com> + + * test/conf/access.conf.in: removed cookies. Not used and some + httpd are not compiled with usertrack. + +Wed Feb 16 12:15:08 2000 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/Makefile.am: replaced conf.tab.cc.h by conf_parser.h in + noinst_HEADERS + + * htcommon/conf_parser.yxx,conf_parser.lxx,HtConfiguration.cc, + HtConfiguration.h: added copyright and Id: + + * htcommon/cgi.cc(init): fixed bug: array must be freed by + delete [] buf, not just delete buf; + +Tue Feb 15 23:16:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/HtHTTP.cc (isParsable): Remove application/pdf as a + default type--it is now handled through the ExternalParser + interface if at all.
+ + * htcommon/defaults.cc: Remove pdf_parser attribute. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + + * htdig/Document.cc (getParsable): Remove PDF once and for all + (hopefully). + + * htdig/ExternalParser.cc (parse): Ditto. + + * configure.in: Remove check for PDF_PARSER. + + * configure: Regenerate using autoconf + + * htdig/Makefile.am: Remove PDF.cc and PDF.h. + + * Makefile.in, */Makefile.in: Regenerate using automake --foreign + +Tue Feb 15 12:02:39 EET 2000 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/HtConfiguration.cc,HtConfiguration.h: fixed bug discovered + by Gilles. HtConfiguration was able to get info only from "url" and + "server" blocks. + + * htcommon/conf_parser.yxx: deleted 1st parameter for new char[], + left over when realloc was replaced by new char[]. Removed a few unused + variable declarations. + + * htcommon/Makefile.am: added -d flag to bison to generate + conf_parser.h template from conf_parser.yxx; + conf_lexer.lxx uses #include conf_parser.h; + conf.tab.cc.h removed. + +Sun Feb 13 21:19:04 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Get rid of uncoded_db_compatible since + the current DB format has clearly broken backwards compatibility. + + * htsearch/Display.cc (Display), htnotify/htnotify.cc (main), + htmerge/docs.cc (convertDocs), htmerge/db.cc (mergeDB), + htdig/htdig.cc (main): Remove call to DocumentDB::setCompatibility(). + + * htcommon/DocumentDB.h (class DocumentDB): Remove + setCompatibility and related private variable. + + * htcommon/DocumentDB.cc ([], Delete): Don't bother checking for + an unencoded URL, at this point all URLs will be encoded using + HtURLCodec. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl.
+ +Sat Feb 12 21:29:20 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/HtSGMLCodec.cc (HtSGMLCodec): Always translate " + & < and > + + * htcommon/defaults.cc: Remove translate_* and word_list + attributes since they're now no longer used. + + * htdig/PDF.cc (parseNonTextLine): Fix bogus escape sequences + around Title parsing. Fixes PR#740. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Fri Feb 11 11:41:36 2000 Loic Dachary <loic at ceic.com> + + * htlib/Makefile.am: removed CFLAGS=-g (use make CXXFLAGS=-g all + instead). + + * htdoc/install.html: specify header/lib install directory now + is prefix/include/htdig and prefix/lib/htdig. + + * Makefile.am (distclean-local): use TESTDIR instead of deprecated + HTDIGDIRS. + + * */Makefile.am: install libraries in prefix/lib/htdig and + includes in prefix/include/htdig. Just prepend pkg in front of + automake targets. + + * include/Makefile.am: install htconfig.h + +Thu Feb 10 23:18:37 2000 Loic Dachary <loic at ceic.com> + + * Connection.cc (Connection): set retry_value to 1 instead of + 0 as suggested by Geoff. + +Thu Feb 10 17:36:09 2000 Loic Dachary <loic at ceic.com> + + * htdig/Document.cc: fix (String)->(char*) conversion warnings. + + * htword/WordList.cc: kill Collect(WordSearchDescription) which + was useless and error prone. + + * htword/WordDB.h (WordDBCursor::Get): small performance improvement + by copying values only if key found. + + * htword/WordDB.h,WordList.cc: fix reference counting bug when + using Override (+1 even if entry existed). Turn WordDB.h return + values to be std Berkeley DB fashion instead of the mixture with + OK/NOTOK that was a stupid idea. This allows to detect Put errors + and handle them properly to fix the Override bug without performance + loss. + + * test/conf/httpd.conf.in: comment out loading of mod_rewrite + since not everyone has it. 
+ +Thu Feb 10 00:26:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Add new attribute "nph" to send out + non-parsed headers for servers that do not supply HTTP headers on + CGI output (e.g. IIS). + + * htsearch/Display.cc (display): If nph is set, send out HTTP OK + header as suggested by Matthew Daniel <mdaniel at scdi.com> + (displaySyntaxError): Ditto. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate from current defaults.cc file. + +Thu Feb 10 00:21:58 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Treat <script></script> tags as noindex + tags, much like <style></style> as suggested by Torsten. + +Thu Feb 10 00:02:41 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * .version: Bump for 3.2.0b2. + + * htcommon/defaults.cc: Add category fields for each + attribute. Though these are currently unused, they could allow the + documentation to be split into multiple files based on logical + categories and subcategories. + +Wed Feb 9 23:52:55 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Connection.cc (connect): Add alarm(timeout) ... alarm(0) + around ::connect() call to ensure this does timeout as appropriate + as suggested by Russ Lentini <rlentini at atl.lmco.com> to resolve + PR#762 (and probably others as well). + (connect): Add a retry loop as suggested by Wilhelm Schnell + <Wilhelm.Schnell at mn.man.de> to resolve PR#754. + + * htnet/HtHTTP.cc (HTTPRequest): Add CloseConnection() when the + connection fails on open before returning from the method. Should + take care of PR#670 for htdig-3-2-x. + +Wed Feb 09 17:20:50 2000 Loic Dachary <loic at ceic.com> + + * db/dist/Makefile.in (libhtdb.so): move dependent libraries + *after* the list of objects, otherwise it's useless. + + * htword/WordKey.h (class WordKey): move #if SWIG around to + please swig (www.swig.org). 
+ + * htword/WordList.h (class WordList): allow SWIG to see Walk* + functions (#if SWIG). + +Wed Feb 9 09:21:00 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Server.cc (robotstxt): apply more rigorous parsing of + multiple user-agent fields, and use only the first one. + + * htlib/HtRegex.cc (set): apply the fix from Valdas Andrulis, to + properly compile case_sensitive expressions. + +Mon Feb 09 09:43:59 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.cc: changed "<<" to append() for content_length + assignment in ReadChunkedBody() function (as Gilles suggested) + +Tue Feb 08 10:54:08 2000 Loic Dachary <loic at ceic.com> + + * db/dist/configure.in: Added AC_PREFIX_DEFAULT(/opt/www) + so that headers and libraries are installed in the proper + directory when no --prefix is given. + +Tue Feb 08 10:32:48 2000 Loic Dachary <loic at ceic.com> + + * test/t_wordskip: copy $srcdir/skiptest_db.txt to allow running + outside the source tree. + + * configure.in: use '${prefix}/...' instead of "$ac_default_prefix/..." + that did not carry the --prefix value. + + * configure.in: run CHECK_USER and AC_PROG_APACHE if --enable-tests + +Mon Feb 07 17:40:47 2000 Loic Dachary <loic at ceic.com> + + * htlib/htString.h (last): turn to const + +Mon Feb 07 14:05:37 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/HtHTTP.cc: fixed a bug in ReadChunkedBody() function + regarding document size assignment (raised by Valdas Andrulis) + +Sun Feb 06 19:11:05 2000 Loic Dachary <loic at ceic.com> + + * configure.in: Fix inconsistencies between default values + shown by ./configure and actual defaults. + + * htdoc/install.html: change example version 3.1 to 3.2 + Commented out warning about libguile. + Replace CONFIG variables by configure.in options. + Specify default value for each of them. 
+ Replace (and move) make depend by automake (distributed + Makefiles do not include dependency generation) + Added section for running tests. + Added section on shared libraries. + + * configure.in: use AM_CONDITIONAL for --enable-tests + + * Makefile.am: use automake conditionals for subdir so + that make dist knows what to distribute whether --enable-tests + is specified or not. + + * db/Makefile.in: allow make dist to work outside the source + tree. + +Sat Feb 05 18:31:04 2000 Loic Dachary <loic at ceic.com> + + * test/word.cc (SkipTestEntries): The fix of + WordList::SkipUselessSequentialWalking actually saves us + a few hops when walking lists of words. + +Fri Feb 04 17:28:32 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKey.cc,WordReference.cc,WordRecord.cc (Print): use + cerr instead of cout for immediate printing under debugger. + +Thu Feb 3 16:06:45 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc (RetrieveLocal): fix bug that prevented local + filesystem digging, because max_doc_size was initialized to 0. + Now sets it to max_doc_size for current url. + +Thu Feb 3 12:36:56 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * installdir/Makefile.{am,in}: install mime.types as mime.types, + not as htdig.conf. + + * htfuzzy/EndingsDB.cc (createDB): fix code to use MV macro in + system() command, not hard-coded "MV" string literal, and use + get() on config objects to avoid passing String objects to form(). + +Wed Feb 2 19:44:33 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtDateTime.cc (SetRFC1123): Strip off weekday, if present + and use LOOSE format. + (SetRFC850): Ditto. + + * configure.in, configure: Add configure check for "mv." + + * htfuzzy/Makefile.am: Use it. + + * */Makefile.in: Regenerate using automake. + + * htfuzzy/EndingsDB.cc (createDB): Use the detected mv, or + whatever is in the path to move the endings DB when they're + finished.
+ +Wed Feb 2 15:49:14 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc (RetrieveLocal), htdig/Retriever.cc (GetLocal): + Fix compilation errors. Oops! + +Wed Feb 2 13:53:27 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc (IsValidURL): fix problem with valid_extensions + matching failure when URL parameters follow extension. + +Wed Feb 2 13:29:48 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/QuotedStringList.cc (Create): fix PR#743, where quoted string + lists didn't allow embedded quotes of opposite sort in strings + (e.g. "'" or '"'), and fix to avoid overrunning end of string + if it ends with backslash. + +Wed Feb 2 13:23:16 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (ctor, parse, do_tag), htcommon/defaults.cc: + Add max_keywords attribute to limit meta keyword spamming. + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Wed Feb 2 12:57:40 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc (RetrieveLocal), htdig/Document.h, + htdig/Retriever.cc (Initial, parse_url, GetLocal, GetLocalUser, + IsLocalURL, got_href, got_redirect), htdig/Retriever.h, + htdig/Server.cc (ctor), htdig/Server.h: Add in Paul Henson's + enhancements to local_urls, local_default_urls & local_default_doc. + * htcommon/defaults.cc: Document these. + +Wed Feb 02 10:14:57 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKeyInfo.h,WordKey.{cc,h}: fix overflow bug when 32 + bits. For that purpose implement Outbound/Overflow/Underflow + methods in WordKey, MaxValue in WordKey/WordKeyInfo. + (WordKey::SetToFollowing) was FUBAR : overflow of field1 tested + with number of bits in next field, did not handle overflow, + Re-implemented. + (WordKey::Set) Change atoi to strtoul. + (WordList::SkipUselessSequentialWalking) was much too fucked up
Re-implement + (WordKey::Diff) Added as a support function of + SkipUselessSequentialWalking. + implement consistent verbosity. + + * htword/WordList.cc (operator >>): explicit error message when + insert failed, with line number. + +Wed Feb 2 00:11:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdoc/RELEASE.html: Finish up with notes on all significant + new attributes. + + * htdoc/FAQ.html, htdoc/where.html: Mention new 3.2.0b1 release + as a beta. + + * contrib/README: Update to mention new scripts. + + * installdir/mime.types: Add default Apache mime.types file for + systems that do not already have one. + + * installdir/Makefile.am: Make sure it is installed by default. + + * installdir/Makefile.in: Regenerate using automake. + + * htcommon/defaults.cc: Add documentation for mime_types + attribute, remove currently unused image_alt_factor, and add + documentation for external_protocols. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Regenerate using cf_generate.pl. + +Tue Feb 1 10:24:19 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/parser.cc (score): fix up score calculations for + correctness and efficiency. + +Mon Jan 31 16:29:20 2000 Marcel Bosc <bosc at ceic.com> + + * htword/WordBitCompress.cc: fixed endian bug in compression + +Sat Jan 29 21:14:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/parser.cc (score): Change config.Value (which returns + int) to config.Double to preserve accuracy of attributes. + + * htcommon/defaults.cc: Updated documentation for attributes now + allowing regex, search_algorithms (for new fuzzy) and added + documentation for the overlooked remove_unretrieved_urls. + + * htdoc/*.html: Updated copyright notice for 2000, changed footer + to use CVS's magic Date keyword. Regenerated documentation from + defaults changes. 
+ +Sat Jan 29 16:32:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * contrib/htdig-3.1.4.spec, contrib/htdig-3.1.4-conf.patch: Remove + these since they don't apply to the 3.2.x releases. + + * htfuzzy/Synonym.cc (openIndex): Change database format from + DB_BTREE to DB_HASH--no reason for the synonym database to be a + btree. This was probably overlooked when I switched the rest of + the fuzzy databases over to DB_HASH. + +Sat Jan 29 05:34:26 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKey.h (UnpackNumber): Very nasty bug. Optimization + dated Dec 29 broke endianness on Solaris. Restore previous version. + +Fri Jan 28 18:17:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Configuration.h (struct ConfigDefaults): Add version and + category fields for more accurate documentation. + + * htcommon/defaults.cc: Add blank category fields and start + filling in version field. Killed modification_time_is_now attribute. + + * htdig/Document.cc (Document): Kill attribute + modification_time_is_now since it can cause more harm than good. + + * htnet/HtHTTP.cc (ParseHeader): Ditto. + + * htdoc/cf_generate.pl: Added support for new version and category + fields. Currently category does nothing, but it could split the + documentation into categories. + +Sat Jan 29 01:37:45 2000 Loic Dachary <loic at ceic.com> + + * .version: remove the trailing -dev + +Thu Jan 27 12:22:57 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.cc: cdebug replaced by cerr. replace lverbose + by verbose > 2. Remove shutup. + (WordList): monitor = 0 + (Open): create monitor only if wordlist_monitor = true + (Close): delete monitor if set, delete compressor if set + + * htword/WordDBCompress.cc,WordList.cc: only activate monitoring code + if monitor is set. No interaction with the monitor is therefore possible + if wordlist_monitor is false. + + * htword/WordMonitor.cc: remove useless test of wordlist_monitor (done by + WordList now).
+ + * htword/WordDBCompress.cc (TestCompress): remove redundant debuglevel argument. + + * htword/WordDBCompress.cc (WordDBCompress): init cmprInfo to 0 + + * db/include/db_cxx.h: Add get_mp_cmpr_info method + + * htword/WordDBCompress.cc (WordDBCompress): set default debug level to 0 + + * htword/WordDB.h: CmprInfo returns current CmprInfo and is non-static, + overload to set CmprInfo if argument given. + + * htword/WordDBCompress.h: new CmprInfo() method returns DB_CMPR_INFO object + for Berkeley DB database. + + * htword/WordList.h: add compressor member, kill cmprInfo member. + + * htword/WordList.cc: + +Wed Jan 26 20:05:33 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.cc,htword/WordList.h: get rid of obsolete WordBenchmarking + +Wed Jan 26 9:14:32 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htcommon/defaults.cc: added "max_connection_requests". + + * htdig/Retriever.cc: now manages the attribute above. + +Tue Jan 25 12:59:01 2000 Loic Dachary <loic at ceic.com> + + * htsearch/Display.cc (setVariables): fixed + Display.cc:505: warning: multiline `//' comment + +Tue Jan 25 8:37:15 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Document.h: Added the "HtHTTP *GetHTTPHandler()" method, in + order to be able to control an HTTP object outside the Document class. + This is useful for the Server class, after the request for robots.txt. + We can control the response of a server and check if it supports + persistent connections. + + * htdig/Server.cc: inside the constructor, persistent_connections var is + initialized to the configuration parameter value, instead of <true>. + In addition, after requesting robots.txt, it checks and sets + the attribute for persistent connections, depending on whether the + server supports them or not. + + * htdig/Retriever.cc: modified the Start() method. Now the loop manages + HTTP persistent connections on a per-server basis.
+ Indeed, it's a + Server object that decides if persistent connections are allowed on + that server or not (depending on configuration or capabilities of + the remote http server). + +Mon Jan 24 12:57:45 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(setVariables): Added double quotes around + default selection value in build_select_lists handling. + +Mon Jan 24 12:37:22 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(setVariables), htcommon/defaults.cc: Added + build_select_lists attribute, to generate selector menus in forms. + Added relevant explanations and links to selectors documentation. + * htdoc/hts_selectors.html: Added this page to explain this new + feature, plus other details on select lists in general. + * htdoc/hts_templates.html: Added relevant links to related attributes + and selectors documentation. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + +Fri Jan 21 18:57:58 EET 2000 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/HtConfiguration.cc: added HtConfiguration::ParseString(char*) + method to allow the lexer to handle "include: ${var}/file.inc" construction + + * htcommon/conf_lexer.lxx: fixed handling "include: ${var}file.inc" + bug. + +Fri Jan 21 17:04:28 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.cc (WalkFinish,WalkInit,WalkNextStep): fix typos in error messages + and misleading comment. + + * htword/WordList.h,WordList.cc: move part of WalkInit into WalkRewind so that + we have a function to go back to the beginning of possible matches. + +Wed Jan 19 21:49:57 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Only add words for META descriptions, + keywords, and IMG ALT attributes if doindex is set. + + * htcommon/DocumentRef.h: Added Reference_obsolete for documents + that should be removed (but haven't been).
+ + * htdig/Retriever.cc (parse_url): Flag documents that have been + modified as Reference_obsolete and update the database. Flag all + documents with various errors as something other than + Reference_normal, as appropriate--these probably should be pruned. + + * htdig/Retriever.h: Get rid of GetRef() method--it's only used once! + + * htsearch/Display.cc (display): Don't show DocumentRefs with + states other than Reference_normal--these documents have various + errors. + + * htmerge/docs.cc: If a document has a state of Reference_obsolete, ignore it. + + * htcommon/HtWordList.h, htcommon/HtWordList.cc (Skip): Change + MarkGone() to Skip() to emphasize that this document should be ignored. + +Wed Jan 19 14:11:51 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.cc (SkipUselessSequentialWalking): return OK if skipping, + NOTOK if not skipping. + + * htword/WordReference.h: remove useless Clear in WordReference(key, record) + constructor. + + * htword/WordList.h,WordList.cc: Split Walk into three separate functions + WalkInit, WalkNext and WalkFinish. Much clearer. Fill the status field + of WordSearchDescription to have more information about the error condition. + Add found field to WordSearchDescription for WalkNext result. Add cursor_get_flags + and searchKeyIsSamePrefix fields to WordSearchDescription as internal state + information. + + * htword/WordList.h,WordList.cc: WalkInit to create and prepare cursor, + WalkNext to move to next match + WalkNextStep to move to next index entry, be it a match or not + WalkFinish to release cursor. + + * htword/WordList.h: WordSearchDescription::ModifyKey added to jump + while walking. + + * htword/WordList.cc (WalkNext) : it is now legal to step without + collection or callback because search contains the last match (found + field) and it is therefore not useless.
+ +Mon Jan 17 12:15:45 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/htdig-3.2.0.spec: added sample RPM spec file for 3.2 + +Sat Jan 15 11:53:35 2000 Loic Dachary <loic at ceic.com> + + * htdb/htstat.cc,htdb/htdump.cc: remove useless -S option since + the page size is found in the header of the file. + + * htdb/htstat.cc,htdump.cc,htload.cc: only call WordContext::Initialize + if -W flag specified. + +Fri Jan 14 18:39:12 2000 Marcel Bosc <bosc at ceic.com> + + * htword/WordBitCompress.cc: speedup, VlengthCoder::code() + finds appropriate coding interval much faster + +Fri Jan 14 11:30:41 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc(IsValidURL): Fix problem with valid_extensions, + which got lost in the shuffle yesterday. + +Fri Jan 14 15:56:49 2000 Loic Dachary <loic at ceic.com> + + * htword/WordType.cc,WordRecord.cc,WordKeyInfo.cc (Initialize): change + inverted test on instance (== instead of !=). + + * htword/WordRecord.cc (WordRecordInfo): change inverted test on compare + +Fri Jan 14 14:24:39 2000 Loic Dachary <loic at ceic.com> + + * htdig/htdig.cc,htmerge/htmerge.cc,htsearch/htsearch.cc: Use Initialize(defaults) + to load configuration file if provided. + + * htword/WordDBCompress.cc (Compress): initialize monitor to null in + constructor and check if null before usage. Core dumped in htdb/htload. + + * htword/WordContext.h (class WordContext): Add + Initialize(const ConfigDefaults* config_defaults = 0) + that probes configuration files. Useful when htword is used as a standalone library. + +Thu Jan 13 19:52:27 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc: Fix problem with valid_extensions when an + "extension" would include part of a directory path or server + name, as contributed by Warren Jones. + +Thu Jan 13 19:22:25 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Makefile.am, htnet/Makefile.in: Add HtFile to the build process.
+ +Thu Jan 13 18:58:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/HtFile.h, htnet/HtFile.cc: New Transport classes + contributed by Alexis Mikhailov to allow file:// access. + + * htdig/Document.h, htdig/Document.cc: Add logic to call HtFile + objects for URLs. + + * htcommon/URL.cc: Don't remove a trailing index.html (removeIndex) + if the URL is a file://URL. + +Thu Jan 13 18:49:41 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * contrib/conv_doc.pl, contrib/parse_doc.pl: Replace "break" by + "last" for correct Perl syntax and additional cleanups and + simplifications as contributed by Warren Jones. + +Thu Jan 13 18:42:29 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htword/WordType.h, htword/WordType.cc: Implementation of new + methods IsDigit() and IsCntrl() as contributed by Marc Pohl + <marc.pohl at wdr.de>. Fixes some problems with 8-bit characters. + +Thu Jan 13 17:17:47 2000 Geoff Hutchison <ghutchis at wso.williams.edu> + + * ChangeLog.0, configure, configure.in, htfuzzy/Endings.cc, + htlib/String.cc, htlib/Configuration.cc, + htlib/QuotedStringList.cc, htlib/regex.c, htcommon/defaults.cc, + htdig/ExternalParser.cc, htdig/Retriever.h, htsearch/Display.cc, + include/htconfig.h.in installdir/htdig.conf: Merge in changes from + 3.1.x releases. + + * htdoc/: Merge in documentation changes from 3.1.x releases. + +Thu Jan 13 20:12:42 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.cc (Walk): close the cursor before returning. If + not doing that the cursor might be closed after the database is + closed, leading to double free of the cursor. Bad bug. + +Thu Jan 13 13:23:17 2000 Loic Dachary <loic at ceic.com> + + * htword/WordContext.h (class WordContext): simplifies a lot. WordContext is + no longer a repository for pointers of class instances. Only a place to call + Initialize for classes that have a single instance. 
+ + * htlib/HtWordType.cc: added to include definitions of function shortcuts for + WordType. + + * htword/WordRecord.h,WordType.h,WordKeyInfo.h: implement homogeneous scheme to + handle unique instance of the class. + - constructor takes const Configuration& argument and init object with config + values + - static member instance + - static method Initialize the static member instance + - static method Instance returns the pointer in instance data member + + * htword/WordRecord.cc: add constructor for WordRecordInfo, and Instance static + function. Add WORD_RECORD_INVALID to depict uninitialized WordRecordInfo object. + + * htword/WordKeyInfo.h: rename SetKeyDescriptionFromFile and SetKeyDescriptionFromString + to InitializeFromFile and InitializeFromString and implement them by calling Initialize. + rename SetKeyDescriptionRandom to InitializeRandom + rename Initialize(String& line) to GetNFields(String& line) + rename Initialize(int nfields) to Alloc(int nfields) + + * htdig/htdig.cc,htmerge/htmerge.cc,htsearch/htsearch.cc,test/word.cc: replace + WordList::Initialize with WordContext::Initialize and run immediately after + config is read. Otherwise WordType fails to work and configuration value + extraction will fail. + + * htmerge/htmerge.cc: move initialization + + * test/conf/htdig.conf2.in: reorder so that it looks as much as possible like conf.in + +Thu Jan 13 12:33:46 2000 Loic Dachary <loic at ceic.com> + + * htdb/htstat.cc,htdump.cc,htload.cc: set proper progname + +Wed Jan 12 20:02:26 2000 Loic Dachary <loic at ceic.com> + + * htcommon/HtWordList.cc (Dump): Use Walk instead of Collect, otherwise it does not work. + +Wed Jan 12 19:38:33 2000 Loic Dachary <loic at ceic.com> + + * htlib/HtDateTime.h (class HtDateTime): killed void SetDateTime(const int t) + because they cause problems when time_t is an int and were useless anyway.
+ +Wed Jan 12 13:31:45 2000 Loic Dachary <loic at ceic.com> + + * htword/WordBitCompress.h: remove inline qualifier on check_tag1: it's not inline + + * htword/WordKey.h: #define WORD_KEY_UNKNOWN_POSITION to -1. Remove default + argument to SetToFollowing so that it's more explicit when used with + WORD_KEY_UNKNOWN_POSITION. + + * htword/WordKey.cc: change name of variable info0 to info + + * htword/WordList.cc: use WordKey::Info instead of WordKeyInfo::Get as done + in WordKey.cc for consistency. + + * htword/WordList.{cc,h},htword/WordDB.h: rename WordCursor to WordDBCursor + for consistency. + + * htword/WordList.h: Kill the useless WordSearchDescription::Setup function + + * htword/WordList.h: WordSearchDescription constructor now has straightforward + semantics. + + * htword/WordList.h: Rename Search into Collect since it already existed, just + with a different prototype. + +Wed Jan 12 12:36:46 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.h (class WordSearchDescription): add cursor member + +Tue Jan 11 19:33:44 2000 Marcel Bosc <bosc at ceic.com> + + * htlib/HtVectorGeneric,htword: Fixed some warnings found + when compiling under FreeBSD + +Tue Jan 11 18:22:58 2000 Marcel Bosc <bosc at ceic.com> + + * htlib/HtVectorGeneric.h: inlined functions Add and Allocate which + are critical to performance + +Tue Jan 11 12:18:47 2000 Marcel Bosc <bosc at ceic.com> + + * htword/WordKey.h: fixed uninitialized memory read + + * htword/WordBitCompress.cc: Fixed big number BUG + Fixed memory leak + +Tue Jan 11 09:37:36 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.h: move operator << and operator >> to end of + function declarations instead of data members. + + * htword/WordList.h: added more comments on function behaviour.
+ + * htword/WordList.h: added #if SWIG for Perl interface + +Mon Jan 10 17:55:05 2000 Marcel Bosc <bosc at ceic.com> + + * htword/WordDBPage: enhanced compression debugging output + +Mon Jan 10 09:07:19 2000 Loic Dachary <loic at ceic.com> + + * WordContext.h,WordKey.h,WordList.h: Added #if SWIG for perl + interfaces. Remove InSortOrder, useless now that everything + is manipulated in sort order as far as the interface is concerned. + + * WordKey.cc,WordList.cc: remove InSortOrder + + * WordKey.h,WordRecord.h,WordReference.h: commented out Set/Get for + ascii Set/Get for SWIG. + + * WordKey.h: make CopyFrom public for those who don't want to + use operator =. + + * WordKey.h: rename info -> Info and nfields -> NFields + + * WordKey.h: remove int IsFullyDefined() const redundant with Filled + +Thu Jan 06 14:41:15 2000 Marcel Bosc <bosc at ceic.com> + + * htword,all: Changed interface to overloaded Walk function that was + ambiguous on some compilers... + +Thu Jan 06 14:00:01 2000 Loic Dachary <loic at ceic.com> + + * htword/WordList.h (class WordSearchDescription): rename setup to Setup + + * htword/WordList.h (class WordBenchmarking): rename show to Show + + * htword/WordRecord.{h,cc}, htword/WordReference.h, htword/WordList.h: + add comments, reorganize member functions for clarity. + +Thu Jan 06 12:01:47 2000 Marcel Bosc <bosc at ceic.com> + + * htword/compression: Split WordDBCompress.* to WordDBCompress + + WordDBPage.* + + * htword/WordBitCompress: renamed put/get to put_uint/get_uint. added get/put_uint_vl + + * htword/compression: slightly modified the compression: this makes old databases + OBSOLETE: headers compress better. Changed flags compress better and faster.
+ + * htword/WordKey: added operator [] and Get/Set accessors + + * htword: removed the obsolete --with-key configure option (KEYDESC) + + * htword/WordMonitor: added monitor input + +Wed Jan 05 14:32:31 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKeyInfo.h (class WordKeyInfo ): if(encode) was if(sort) + + * htword/WordKeyInfo.h: rename show to Show and nprint to Nprint + + * htword/WordKeyInfo.h: move WORD_ISA from WordKey.h to WordKeyInfo.h, + rename WORD_ISA_String to WORD_ISA_STRING. + + * htword/WordKey.h: rename FATAL_ABORT to WORD_FATAL_ABORT and errr to word_errr + + * htword/WordKey.h: move private functions to bottom of class above data members; + rename show_packed to ShowPacked + + * htword/WordKey.cc: move WordKeyInfo::SetKeyDescriptionRandom from WordKey.cc + to WordKeyInfo.cc + + * htword/WordKeyInfo.cc: add include htconfig.h + +Wed Jan 05 13:26:16 2000 Loic Dachary <loic at ceic.com> + + * htdig/ExternalParser.cc (parse): use nocase_compare instead of mystrcasecmp to + suppress warnings. Use (char*)String for mystrncasecmp, which has no equivalent in + the String class. + + * htdig/Retriever.cc (IsValidURL): remove warning by casting url to (char*) + +Wed Jan 05 11:54:19 2000 Loic Dachary <loic at ceic.com> + + * htword/WordKey.h: remove obsolete comment and add suffix explanation at + the beginning of the file. + + * htword/WordKey.h (class WordKey): rename copy_from and initialize to CopyFrom + and Initialize to fit naming conventions. Reorganize the methods to group them + in logical sets. Fix indenting. Comment each method. + + * htword/WordKey.h (Clear): add kword.trunc() + + * htword/WordKey.h: protect SetWord(const char *str,int len) because it opens + the door to all kinds of specific derivations. Should be + SetWord(String(foo, foo_length)) if not performance critical. + +Wed Dec 29 18:41:14 1999 Marcel Bosc <bosc at ceic.com> + + * htlib/HtMaxMin: added max/min of arrays, added comments to + HtMaxMin.
+ Added HtMaxMin.cc; all these are used in htword + + * htlib/HtTime.h: added comments. Included portable time.h + + * htlib/HtVectorGeneric.cc: added HtVector_double, HtVector_String + + * htlib/HtVectorGeneric.h: inlined several methods, deactivated CheckBounds + + * htlib/StringMatch.cc: removed #include "WordType.h": this made htlib dependent + on htword, which is not acceptable for a library + + * htlib/HtWordType.h: this replaces the macros used in StringMatch.cc + + * htlib/HtRandom.h: added tools for using random numbers + (this is used currently in tests) + + * htword/WordBitCompress.cc: transferred max_v/min_v to htlib + + * htword/WordBitCompress.cc: optimized put/get for better performance + + * htword/WordMonitor: system for detailed monitoring of operation + and performance within htword + + * htword/WordDBCompress: fixed compression for case of empty WordRecord + + * htword/WordDBCompress: cleaned up some code, added some comments + + * htword/WordKeyInfo: split WordKey files into WordKey and WordKeyInfo files + + * htword/WordContext: centralized global configuration into one class + + * htword/WordKey: inserted randomized key/keydescription into WordKey classes + (this was previously used in several tests) + + * htword/WordKey: optimized Compare, UnpackNumber for speed (these are + really speed critical) + + * htword/WordRecord: is now configurable, type can be configured to "DATA" (htdig) + or "NONE" (for other uses) + + * htword/WordType: changed macros to global functions to make it compatible + with cleanup in StringMatch. Integrated WordType into WordContext + configuration/Initialization + + * htword/WordKeyInfo: fixed initialization from key description file + +Tue Dec 28 18:58:21 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htlib/String.cc: String::lowercase(), String::uppercase() + support for national characters added. + + * htfuzzy/Prefix.cc: method "prefix" works now.
+ +Mon Dec 27 22:17:48 1999 Loic Dachary <loic at ceic.com> + + * htdig/htdig.cc (main): change '\r\n' to "\r\n" + + * Makefile.config,db/dist/Makefile.in: rename libdb to libhtdb to + prevent conflicts with installed libdb. + + * db/dist/Makefile.in: do not install documentation or binary + utilities (db_dump et al.) since they are replaced by htdb binaries + (htdump et al.). + + * db/dist/Makefile.in (prefix): prepend $(DESTDIR) to prefix + to support make DESTDIR=/staging install for binary distribution + package generation. + + * configure.in: use AC_FUNC_ALLOCA to check for alloca. Used + in regex and test/dbbench.cc only, but definitely a useful + feature to have. + +Thu Dec 23 11:10:24 1999 Marcel Bosc <bosc at ceic.com> + + * htcommon/defaults.cc: set wordlist_cache_size default to 10Meg + + * db/mp: removed some debugging messages + + * htword/WordList.cc: added warning if no cache + + * test/word.cc: added cache + + * htlib/HtTime.h: added ifdefs for portable time.h and sys/time.h + +Tue Dec 21 23:33:06 1999 Loic Dachary <loic at ceic.com> + + * htdoc/attrs.html,cf_by*.html: regenerate to include + wordlist_wordkey_description attribute + + * htcommon/Makefile.am: Add AM_LFLAGS = -L and AM_YFLAGS = -l to + prevent #line generation because it confuses the dependencies + generator of GCC if configure is run outside the source tree. + + * configure.in: remove --with-key option. Not needed since the + word description is now dynamic. Destroyed WordKey.h if + specified. + + * htword/Makefile.am: remove commented lines for WordKey.h + generation. + +Tue Dec 21 18:18:01 1999 Marcel Bosc <bosc at ceic.com> + + * htword: added code for benchmarking + +Mon Dec 20 17:59:15 1999 Marcel Bosc <bosc at ceic.com> + + * WordKey: Made the key structure dynamic: changing the + key structure used to imply recompiling the htword library. + This should not change anything in htdig. + + * WordKey: numerical key fields are stored in an array of unsigned + ints instead of compile-time defined pools.
+ + * WordKey.h: WordKey now needs copy operators. Setbits are stored + in sort order (used to be in encoding order) + + * htword: word_key_info is now a pointer; had to change all references + + * word.cc: Rewrote wordkey test for new dynamically + set key structure. The test randomly creates key structures + and tests them. + + * test: adapted test files (simplifies things a lot) + +1999-12-21 Toivo Pedaste <toivo at ucs.uwa.edu.au> + + * htlib/Dictionary.cc: Fix memory leak when destroying dictionary + + * htlib/StringList.cc, htdig/Retriever.cc: Fix memory leak, not + the most elegant way but I'm not sure about the exact semantics + of StringList + +Mon Dec 20 21:59:03 1999 Loic Dachary <loic at ceic.com> + + * htdb/{Makefile.am,err.c,getlong.c}: Fix mistake: err.c and + getlong.c contain C functions (declared in clib_ext) and + must be compiled as C, otherwise the prototypes won't match. After checking the + db Makefiles, getlong.c and err.c are added to the list of objects + for each utility program. This guarantees that they won't conflict + with objects included in libdb.a. + +Sun Dec 19 20:04:42 1999 Loic Dachary <loic at ceic.com> + + * htdb/{Makefile.am, err.cc}: add err.cc for portability + purposes. + +Fri Dec 17 18:04:09 1999 Loic Dachary <loic at ceic.com> + + * Makefile.config: add PROFILING variable and document it. Designed + to enable profiling of htdig easily. + + * */Makefile.am: add *_LDFLAGS = $(PROFILING) for every binary to + enable profiling, if specified. + +Thu Dec 16 17:16:33 1999 Loic Dachary <loic at ceic.com> + + * htdb/*.cc: add -W option to activate htword-specific compression. + Keep compatibility with zlib compression (-z only). + +Thu Dec 16 11:56:02 1999 Loic Dachary <loic at ceic.com> + + * test/dbbench.cc: replace wrong strcpy with memcpy + +Wed Dec 15 15:04:39 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/htdig.cc(main): Handle list of URLs given on stdin, if + optional "-" argument given. (Uses >> operator below.)
+ + * htlib/htString.h, htlib/String.cc: Added Alexis Mikhailov's String + input methods, readLine() and >> operator. + +Wed Dec 15 13:59:34 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc: remove include of sys/stat.h, which is no + longer needed after the hack was removed from Need2Get(), and could pose + a problem on systems that need sys/types.h included first. + +Wed Dec 15 17:00:04 1999 Loic Dachary <loic at ceic.com> + + * htword/WordDB.h: add inline keyword for portability + + * htword/WordDB.h: add CmprInfo method to get object describing + compression scheme for Berkeley DB + + * htdb: Add htdump, htload and htstat, equivalents of db_dump, + db_load and db_stat that know about the htword-specific compression + strategy. + + * htword/WordDBCompress: add static to locally defined functions and + variables, remove unnecessary #define and #include from header. + +Tue Dec 14 21:56:57 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/conf_parser.lxx, htcommon/conf_lexer.cxx: + bcopy on Solaris is in strings.h, not in string.h. Added + #ifdef HAVE_STRINGS_H check + +Tue Dec 14 19:18:22 1999 Marcel Bosc <bosc at ceic.com> + + * WordBitCompress: code cleaned up and commented + +Tue Dec 14 18:32:21 1999 Loic Dachary <loic at ceic.com> + + * htword/Word{Record,Reference,Key}: added a Get method to + convert the structure into its ASCII string representation. + operator << now uses Get. + +Tue Dec 14 17:46:33 1999 Loic Dachary <loic at ceic.com> + + * db/dist/Makefile.in (install): fix bogus test for libshared + +Tue Dec 14 14:10:28 1999 Loic Dachary <loic at ceic.com> + + * htword/{WordKey,WordReference,WordRecord}: rework + the input methods (operator >>). Each class now has a Set function + to initialize itself from an ASCII description and a Get function + to retrieve an ASCII description of the object. + + * htword/WordList: operator >> has a better and cleaner input loop + using StringList and String instead of char*.
+ +Tue Dec 14 12:06:24 1999 Marcel Bosc <bosc at ceic.com> + + * WordDBCompress.cc : Added compression version checking + +Mon Dec 13 21:09:31 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/conf_parser.lxx, htcommon/conf_lexer.cxx: + Added #include <string.h>; without it, compilation failed + on Solaris. + +Mon Dec 13 16:31:27 1999 Marcel Bosc <bosc at ceic.com> + + * htword/WordBitCompress.cc : fixed bug that made compression + fail on large documents or large numbers of URLs. + +Mon Dec 13 13:49:35 1999 Loic Dachary <loic at ceic.com> + + * htword/WordKey.h.tmpl: Added *_POSITION macro generation + +Mon Dec 13 11:51:50 1999 Marcel Bosc <bosc at ceic.com> + + * htcommon/conf_parser.yxx: fixed several delete statements that should be delete [] + +Sun Dec 12 17:14:00 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/conf_lexer.lxx, htcommon/conf_lexer.cxx: + national characters are now allowed in the right-hand side of expressions + (noted by Marcel Bosc). + Changed default behavior of flex from printing unknown chars + on stdout to exiting with an error message. + +Sat Dec 11 17:34:03 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htdig/Retriever.cc,htdig/htdig.cc: "exclude_urls","bad_querystr", + "bad_extensions","valid_extensions","local_default_doc" + changed for new config. + + * htdig/Server.cc: "server_max_docs","server_wait_time" changed for + new config.
+ + * check for "limit_normalized" moved from Retriever::got_href and + Retriever::got_redirect to the more appropriate Retriever::IsValidUrl + +Fri Dec 10 18:05:48 1999 Marcel Bosc <bosc at ceic.com> + + * htword: checked for failed memory allocations in compression code + +Fri Dec 10 18:03:42 1999 Marcel Bosc <bosc at ceic.com> + + * htword/WordList,htcommon/HtWordList.cc,htmerge/words.cc: cleaned up WordList::Walk() + function, changed two occurrences of WordList::Walk in htdig files + +Fri Dec 10 17:40:22 1999 Marcel Bosc <bosc at ceic.com> + + * htword/WordKey.cc (Compare): Fixed bug: compare used to compare chars and not + unsigned chars; this failed when non-ASCII characters were used + +Fri Dec 10 11:54:36 1999 Marcel Bosc <bosc at ceic.com> + + * htcommon/defaults.cc : doc for wordlist_cache_size + +Thu Dec 09 17:07:47 1999 Marcel Bosc <bosc at ceic.com> + + * htcommon/defaults.cc: added defaults for compression and DB configuration + parameters + +Thu Dec 09 16:47:54 1999 Loic Dachary <loic at ceic.com> + + * db/dist/configure.in,Makefile.in: Added shared lib support + for Linux only. Not enabled on other systems. + +Thu Dec 09 15:07:11 1999 Loic Dachary <loic at ceic.com> + + * acinclude.m4,db/dist/acinclude.mr: CHECK_ZLIB now fails if either + zlib.h or libz is not found.
+ + * configure.in: do not test zlib.h + + * db/db/db.c,db/mp/mp_fopen.c: added #ifdef HAVE_ZLIB so that + compilation works if zlib is not found + + * htlib/.cvsignore: remove wrong *.cxx + + * test/dbbench.cc: added #ifdef HAVE_ZLIB so that + compilation works if zlib is not found + +Thu Dec 09 13:25:45 1999 Marcel Bosc <bosc at ceic.com> + + * test/Word.cc,t_wordlist,Makefile.am: upgraded tests + * htcommon/HtWordList.h: fixed Configuration/HtConfiguration problem + +Thu Dec 09 12:10:32 1999 Marcel Bosc <bosc at ceic.com> + + * htword: Added the compression code: + * WordDBCompress: Classes for page-specific compression code + * WordBitCompress: Classes for bitstreams and non-specific compression + +Thu Dec 9 12:09:51 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htcommon/HtConfiguration.cc: bug fix: sometimes + HtConfiguration::Find(url,char*) returned empty values + even if there was something to return. + +Thu Dec 09 11:15:30 1999 Marcel Bosc <bosc at ceic.com> + + * htlib/Configuration.cc (Read): Read is now a virtual function: the old one + for Configuration, the new one (Vadim's ... with the parser) in HtConfiguration + +Thu Dec 09 11:01:22 1999 Loic Dachary <loic at ceic.com> + + * acinclude.m4: upgrade AC_PROG_APACHE macro for + modules detection. + + * test/conf/httpd.conf,test/test_functions.in,test/conf/Makefile: + use @APACHE_MODULES@ to accommodate various Apache module directory + flavors. + +Tue Dec 07 20:32:34 1999 Marcel Bosc <bosc at ceic.com> + + * htdig: Split the Configuration class into Configuration + and HtConfiguration. All the HtConfiguration and the + configuration parsing (lex...) were moved to htcommon. + Configuration was replaced by HtConfiguration as needed. + +Tue Dec 07 16:21:13 1999 Loic Dachary <loic at ceic.com> + + * configure.in: added AM_PROG_LEX and AC_PROG_YACC + + * htlib/Makefile.am: simply list conf_lexer.lxx and conf_parser.yxx; + automake knows how to handle these.
+ The renaming is needed to avoid + conflicts in automake-generated rules. + +Mon Dec 6 16:23:39 CST 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/cf_generate.pl: added a bit of error checking for when it + can't fetch the config info, and made it more flexible in what it + allows as a terminator. + * htcommon/defaults.cc: add default and description for authorization + attribute, and clean up external_protocols entry for cf_generate.pl. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + * htdig/htdig.cc(main): set authorization parameter before Retriever + constructor is called, as it may initialize a Server. (Should complete + fix of PR#490.) + +Mon Dec 6 21:34:29 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htdig/Document.cc, htdig/htdig.cc: "authorization" parameter + added to config; it is compatible with the new config. + The new code does not have the PR#490 bug (robots.txt not authenticated) + +Mon Dec 06 11:58:56 1999 Marcel Bosc <bosc at ceic.com> + + * HtVectorGeneric.h: generic vectors, STL-free: this was originally a copy of + HtVector.h with Object * replaced by GType and some small changes. + It has been modified and checked to see if it all works OK. + You can build vectors of any type that has an empty constructor. + * HtVectorGenericCode.h: generic vectors, STL-free: implementation + (modified "copy" of HtVector.cc) + * HtVectorGeneric.cc: generic vectors: implementation for common types + * HtVector_int.h: generic vectors: declaration for the most common type + (and an example of how to use them) + +Sat Dec 4 23:49:18 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Synonym.cc (createDB): Change declaration to match + Fuzzy::createDB(config), allowing the method to be called by + htfuzzy. + + * htfuzzy/htfuzzy.cc (main): Add an error message if + fuzzy->createDB() comes back with an error.
+ +Sat Dec 4 15:38:34 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htnet/HtHTTP.cc, htnet/HtHTTP.h, htdig/Document.cc: + fixed proxy bug. The GET command in HtHTTP included only the + path of the URL instead of the full URL when using a proxy. + HtHTTP::UseProxy(int) added. + + * htdig/Document.cc: make "http_proxy" parameter + URL-dependent for new configuration. + +Fri Dec 03 14:57:13 1999 Marcel Bosc <bosc at ceic.com> + + * BerkeleyDB: Compression code: added the possibility to use + user-defined compression routines (the goal is to enable + the mifluz-specific DB page compression that obtains + higher compression ratios than generic zlib compression); + this involves the following changes in BerkeleyDB: + * BerkeleyDB/CompressionEnvironment: Added a db_cmpr_info structure + to db_env that lets the DB user specify the external compression + routines and other information related to compression + * BerkeleyDB/CompressionEnvironment: Added a cmpr_context structure + to DB_MPOOLFILE that stores information that compression needs + (the _weacmpr DB and the db_cmpr_info) + * BerkeleyDB/Compression: Needed to modify the compression + system (that is implemented in the BerkeleyDB memory pool) to permit + higher compression ratios and to use the compression environment + +Thu Dec 2 16:47:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc(parse_url): Use a static int to avoid + re-fetching local_urls_only from the config object. + (Initial, got_href, got_redirect): Try to get the local filename + for a server's robots.txt file and pass it along to the newly + generated server. + + * htdig/Server.cc(ctor): Retrieve the robots.txt file from the + filesystem when possible and respect the local_urls_only option. + + * htdig/Server.h: Change type of local_robots_file to String* to + better match Retriever::GetLocal().
+ +Thu Dec 02 16:24:27 1999 Loic Dachary <loic at ceic.com> + + * htword/WordReference.cc,WordKey.cc,WordRecord.cc (Print): Add function + to ease printing from Perl. + +Thu Dec 02 16:06:29 1999 Loic Dachary <loic at ceic.com> + + * htword/WordReference.h (WORD_FILLED): remove + unused WORD_FILLED and WORD_PARTIAL macros + +Wed Dec 01 19:18:42 1999 Loic Dachary <loic at ceic.com> + + * htword/WordKey.h.tmpl,WordRecord.h,WordReference.h, + WordList.h: Added #ifndef SWIG for + the sake of SWIG (www.swig.org). + +Wed Dec 1 19:47:20 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegex.cc, htlib/HtRegex.h (set*): Add a case_sensitive + flag which defaults to insensitive. This better mirrors the + StringMatch class. + + * htcommon/URL.cc(signature): Make the signature a proper URL to + the base of the server. + + * htdig/Server.h: Add IsDead() methods to query the status of the + server, as well as an IsDisallowed() method to query whether a URL + is forbidden by the robots.txt rules. Change _disallow to HtRegex. + + * htdig/Server.cc(ctor): Only retrieve the robots.txt file if this + is an http or https server. + (robotstxt): Use the proper HtRegex method for setting the pattern. + (push): Remove logic checking the _disallow patterns. This is now + done by the Retriever object. + + * htcommon/defaults.cc: Add new attribute "local_urls_only", which + defaults to false and dictates whether retrieval should revert + to another method if RetrieveLocal() fails. + + * htdig/Retriever.cc(parse_url): Check to see if the server is + dead before calling the Retrieve() method. Notify the server + object if a connection fails. Also respect the new + local_urls_only attribute as described above. + (IsValidURL): Check the server's IsDisallowed() method to see if + the robots.txt forbids this URL. + + * htdoc/THANKS.html: Updated to reflect current contributions, etc. + + * README: Update to mention version 3.2.0b1.
+ +Wed Dec 1 17:05:48 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc(GetLocal): Fix error in GetLocalUser() return + value check, as suggested by Vadim. + +Wed Dec 1 15:57:09 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/conv_doc.pl: Added a sample external converter script. + +Mon Nov 29 23:19:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc, htdig/Retriever.h, htdig/Server.cc, + htdig/Server.h: forward-ported patch provided by Alexis Mikhailov + <alexis at medinf.chuvashia.su> and Gilles's patch for cleaning up + IsLocal/GetLocal. Makes local digging persistent, even when the HTTP + server is down. + +Mon Nov 29 22:35:06 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * test/url.cc: New test for URL class. + + * test/url.parents: Base URLs for parsing. + + * test/url.children: Derived relative URLs for testing. + + * test/Makefile.am, test/Makefile.in: Add the above for building. + + * htcommon/URL.cc: A variety of bug fixes (some hacks), especially + for file:// and user@host URLs. + +Sun Nov 28 00:35:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * .version: Bump to 3.2.0b1-dev. + +Sat Nov 27 20:23:14 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/ExternalTransport.h, htdig/ExternalTransport.cc: New class + to allow external scripts to handle transport methods. + + * contrib/handler.pl: Example handler using the program 'curl' to + handle HTTP or HTTPS transactions. + + * htcommon/defaults.cc: Add new configuration option + 'external_protocols' as a list of protocols and scripts to handle + them. Documentation still needs to be written. + + * htdig/Document.h, htdig/Document.cc(Retrieve): Call + ExternalTransport::canHandle to establish which protocols are + supported by handler scripts and then create an appropriate + transport object. + + * Makefile.in, htdig/Makefile.am, htdig/Makefile.in: Add + dependencies for ExternalTransport class.
+ + * htnet/HtHTTP.h, htnet/HtHTTP.cc, htnet/Transport.h, + htnet/Transport.cc: Move _location field from HtHTTP_Response to + Transport_Response to allow other subclasses to use it. Similarly, + move NewDate and RecognizeDateFormat to Transport. + +Fri Nov 26 17:07:52 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc(HTML & do_tag): add code to turn off indexing between + <style> and </style> tags. + +Fri Nov 26 15:56:47 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(setVariables): added Alexis Mikhailov's fix + to check the number of pages against maximum_pages at the right time. + * htlib/String.cc(write): added Alexis Mikhailov's fix to bump up the + pointer after writing a block. + +Wed Nov 24 15:10:05 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * installdir/htdig.conf: Add bad_extensions to make it more obvious to + users how to exclude certain document types. + +Tue Nov 23 19:29:37 CST 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htnotify/htnotify.cc(send_notification): apply Jason Haar's fix + to quote the sender name "ht://Dig Notification Service". + +Tue Nov 23 19:46:00 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * conf.tab.cc.h, conf.l.cc, conf.tab.cc: + Added files pre-generated from conf.y and conf.l + +Sun Nov 21 18:26:21 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * htdig/Document.cc: "max_doc_size" supports the new + configuration and is now URL-dependent. + +Sun Nov 21 17:06:50 EET 1999 Vadim Chekan <vadim at etc.lviv.ua> + + * New config parser committed. htlib/(Makefile.am,Makefile.in), + htlib/Configuration.cc, htlib/Configuration.h; + htlib/(conf.y, conf.l) added. + +Fri Nov 12 14:17:37 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/cgi.cc(init): Fix bug in reading long queries via POST + method (PR#668).
+ +Wed Nov 10 15:34:04 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(setVariables & createURL), + htsearch/htsearch.cc(main), htdoc/hts_templates.html: handle the keywords + input parameter like the others, and make it propagate to follow-ups. + +Wed Nov 10 15:16:57 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc: Fix PR#688, where htdig goes into an infinite + loop if an entry in local_urls (or local_user_urls) is missing a '=' + (or a ','). + + * htcommon/defaults.cc: removed vestigial references to MAX_MATCHES + template variables in search_results_{header,footer}. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + + * htdoc/hts_form.html: add disclaimer about keywords parameter not + being limited to meta keywords. + + * htdoc/meta.html: add description of "keywords" meta tag property. + Add links to keywords_factor & meta_description_factor attributes. + +1999-11-10 Toivo Pedaste <toivo at ucs.uwa.edu.au> + + * htdig/Retriever.cc : Ignore SIGPIPEs with persistent connections + + * htnet/HtHTTP.cc : Fix buffer overrun reading chunks + + * htdig/Document.cc : Make redirects work + + * htdig/Retriever.cc : Make valid URL checks apply to initial URLs, + particularly those from a previous run + + * htlib/Dictionary.cc : Fix memory deallocation error + + +Tue Nov 02 13:44:57 1999 Marcel Bosc <bosc at ceic.com> + + * htsearch/Display.cc (setVariables): added missing parentheses around ternary + operator (confusion with the precedence of <<). + +Tue Nov 02 13:33:50 1999 Marcel Bosc <bosc at ceic.com> + + * htsearch/Display.cc (hilight): changed static char * (!!)
+ to const string: the + static char was evaluated before the configuration was loaded, so the config had no + effect; also removed an unnecessary conversion + +Tue Nov 02 11:45:49 1999 Marcel Bosc <bosc at ceic.com> + + * htword/WordKey.cc : Cleaned up obsolete code; now using *InSortOrder functions + and WordKeyInfo.sort[] + * htword/WordKey : Added FirstSkipField : + find first field that must be checked for skip + * htword/WordKey (PrefixOnly): now returns OK/NOTOK; fixed a bug that + made Walk loop over the whole db if the search key had only + the "word" field defined + * htword/WordKey.cc (Unpack): had forgotten to call SetDefinedWordSuffix + * htword/WordKey.cc (operator >>): added check for very long words + (even if this should never happen) + * htword/WordKey.cc (operators << >>): added <UNDEF> word suffix handling + * htword/WordKey.h : Filled() did not check for WordSuffix + * htword/WordKey.h : added WordKey::ExactEqual + * htword/WordKey.h (IsDefinedWordSuffix): fixed bad flag check + * htword/WordList : Removed all obsolete HTDIG_WORDLIST flags: only + two remain, COLLECTOR and WALKER; the rest is now specified by the searchKey. + Removed action arg to WordList::Collect() + * htcommon/HtWordList.cc,htmerge/words.cc : changed flags in calls to WordList::Walk + * htword/WordList.cc : skip now deals with the SuffixUndefined case + +Fri Oct 29 17:13:21 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/cf_generate.pl: now updates last modified date in attrs.html + * htdoc/attrs.html: reran cf_generate.pl + +Fri Oct 29 15:28:22 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(setVariables & hilight): added Sergey's idea + for start_highlight, end_highlight & page_number_separator attributes. + * htcommon/defaults.cc: added & documented these.
+ * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + +Thu Oct 28 13:06:23 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/ExternalParser.cc: added support for external converters + as an extension to the external_parsers attribute. + * htcommon/defaults.cc: Updated external_parsers with new description + and examples of external converters. + +Thu Oct 28 12:52:28 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: Updated program lists for *_factor, so they + all refer to htsearch and not htdig. Added htsearch to program lists + for translate_*. img_alt_factor & url_factor not defined yet because + they're still not used in htdig/htsearch. + +Wed Oct 27 15:53:36 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: added descriptions & examples for + doc_excerpt, heading_factor, max_descriptions, minimum_speling_length, + regex_max_words, use_doc_date, valid_extensions. Added references + to these elsewhere in the document as appropriate. Removed -pairs option + from pdf_parser default (again). Minor changes to noindex_start & end, + and changed example for modification_time_is_now. Corrected references + to heading_factor_[1-6]. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + +Wed Oct 27 13:32:50 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/cf_generate.pl: changed formatting of output to more closely + match format of old attrs.html (to make diff'ing easier), + and fixed handling of pdf_parser default to strip quotes. + * htcommon/defaults.cc: oops, fixed typo in url_part_aliases example. + * htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl + +Wed Oct 27 18:24:36 1999 Loic Dachary <loic at ceic.com> + + * htdoc/cf_generate.pl: fixed wrong target for cf_byprog, escape + HTML chars <>&'" for default values.
+ +Wed Oct 27 10:21:18 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: restored 2nd example for url_part_aliases + +Tue Oct 26 16:28:29 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc: corrected descriptions for allow_in_form, + search_results_header, noindex_start, noindex_end. Also fixed a + few small typos & formatting errors here & there in descriptions + and examples. + +Tue Oct 26 16:01:22 1999 Loic Dachary <loic at ceic.com> + + * htword/Makefile.am: rm WordKey.h instead of chmod, to cope with + a nonexistent WordKey.h + +Tue Oct 26 10:54:52 1999 Loic Dachary <loic at ceic.com> + + * htcommon/defaults.cc: fixed all inconsistencies reported by Gilles. + +Mon Oct 25 11:42:13 1999 Marcel Bosc <bosc at ceic.com> + + * htword/ word.cc,t_wordskip,skip_db.txt: Added test for *Skip Speedup* + * htword/ WordList: Added tracing of Walk() for debugging purposes + +Fri Oct 22 18:22:00 1999 Marcel Bosc <bosc at ceic.com> + + * htword/ WordList.cc,WordKey: Added a defined/undefined flag for saying + if a search key's word is a prefix or not: WORD_KEY_WORDSUFFIX_DEFINED + reduces code size and makes it much easier to understand + * htword/ WordList,WordReference,WordKey: Added input/output streams for + WordList,WordReference,WordKey + +Wed Oct 20 16:47:52 1999 Marcel Bosc <bosc at ceic.com> + + * htword/ WordKey,Makefile.am,WordCaseIsAStatements.h: for readability + replaced the switch ... #ifdef ..STATEMENT().... sequence that appeared many times + with an include file, WordCaseIsAStatements.h + + * htword/ WordKey: WordKeyInfo: duplicated the whole fields structure into the + sort structure, for fast access without cross-referencing and for simplifying code + (required a change to the Perl in template WordKey.h.tmpl) + + * htword/ WordList: *Skip Speedup* added a speedup to avoid wasting time + by sequentially walking through useless entries.
+ See function: + SkipUselessSequentialWalking() for an example and more info + + * htword/ WordKey.h,WordKey.cc: Changed Set,Unset,IsSet WordKey accessors' names to: + SetDefined,Undefined,IsDefined. (easier to read and avoids naming conflicts) + + * htword/ WordKey: added generic numerical accessors for accessing + numerical fields in WordKey (in sorted order): GetInSortOrder, SetInSortOrder + + * htword/ WordKey,word_builder.pl: added a MAX_NFIELDS constant that specifies + the maximum number of fields that a WordKey can have. Sanity check in word_builder.pl. + + * htword/ word_builder.pl: enforced ascending word sort order + + * htword/ WordList: added a verbose flag using config."wordlist_verbose" + +Tue Oct 19 18:36:42 1999 Loic Dachary <loic at ceic.com> + + * htword/WordType.h: const accessors for wtype and config + +Tue Oct 19 13:10:47 1999 Loic Dachary <loic at ceic.com> + + * acconfig.h: remove unnecessary VERSION (redundant) + +Tue Oct 19 11:32:38 1999 Loic Dachary <loic at ceic.com> + + * db/Makefile.in,db/dist/Makefile.in: install db library so + that external applications can be linked. + +Tue Oct 19 10:57:27 1999 Loic Dachary <loic at ceic.com> + + * configure.in: add --with-key to specify an alternative to htword/word.desc + + * configure.in: htword is done before htcommon to prevent unnecessary + recompilation when WordKey.h changes.
+ + * htword/Makefile.am: use @KEYDESC@ + +Tue Oct 19 10:38:41 1999 Loic Dachary <loic at ceic.com> + + * test/word.cc: use TypeA instead of DocID and the like + +Mon Oct 18 17:21:34 1999 Loic Dachary <loic at ceic.com> + + * Makefile.config: AUTOMAKE_OPTIONS = foreign + +Mon Oct 18 11:40:17 1999 Marcel Bosc <bosc at ceic.com> + + * htword/ WordList.cc (Walk): fixed bug in Walk: if flag HTDIG_WORDLIST was set + then data was uninitialized in loop + +Fri Oct 15 18:52:03 1999 Marcel Bosc <bosc at ceic.com> + + * htdig/Document.h (class Document): added const to: + Transport::DocStatus RetrieveLocal(HtDateTime date, const String filename); + +Fri Oct 15 17:46:23 1999 Loic Dachary <loic at ceic.com> + + * acinclude.m4,configure.in: modified AC_APACHE_PROG to detect + version number and control it. + + * test/conf/*.in: patch to fit module loading or not, accommodate + various installation configurations. + + * test/test_functions.in: More portable call to apache. + +Fri Oct 15 12:55:47 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Document: added the management of 'persistent_connections', + 'head_before_get', 'max_retries' configuration attributes. + +Fri Oct 15 12:54:11 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * test/testnet.cc: added the option '-m' for setting the max size + of the document. + +Fri Oct 15 12:48:49 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Server: added a flag for persistent connections. + It's set to true if the Server allows persistent connections. + It should be used when retrieving a document. + +Fri Oct 15 12:45:42 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * defaults.cc: added the configuration attributes 'persistent_connections', + 'max_retries' and 'head_before_get'. Their default values are + respectively true, 3, false.
+ +Fri Oct 15 12:35:51 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP.cc: management of incomplete stream reading with persistent + connections (it occurs when max_doc_size is lower than the real + content length of the document, or when a document is not parsable + and we asked for it with a GET call). + + * Transport: _host variable is treated as a String, as Loic suggested. + +Fri Oct 15 12:11:23 1999 Marcel Bosc <bosc at ceic.com> + + * Added README to htword + +Thu Oct 14 11:29:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/mktime.c, htlib/regex.c, htlib/regex.h, htlib/strptime.c: + Updated with latest glibc versions. Merging from glibc sources may + have introduced bugs, so this is the last merge before htdig-3.2.0b1. + +Thu Oct 14 13:09:32 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/Transport: added statistics for open and close of connections + and changes of servers. + Fixed a bug in the SetConnection method, regarding the host comparison. + Added a method for showing the statistics on a given channel. + + * htnet/HtHTTP: More debug info available. + Added a method for showing the statistics on a given channel. + + * test/testnet.cc: updated for the changes above. + +Wed Oct 13 13:35:42 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Document.h: added an HtHTTP pointer to the class. + + * htdig/Document.cc: Transport and HtHTTP initialization methods + inside the Document constructor. The class destructor now calls + only the HtHTTP destructor (not the Transport destructor). + Modified the Retrieve method. + + * htdig/Server.h: _last_connection is now an HtDateTime object. + + * htdig/Server.cc: modified the constructor and the delay method. + + * htdig/Retriever.cc: modified the parse_url function in order to manage + all the Document status messages coming from the Transport class. + Also modified the method for not found URLs for managing the no_port + status.
+ +Tue Oct 12 10:12:10 1999 Loic Dachary <loic at ceic.com> + + * install headers and libraries so that htdig libraries may be used by external programs + + * htword/WordList.cc,WordType.cc: add comments about config parameters used. + +Fri Oct 8 09:35:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtDateTime.cc (SetFTime): Change buffer argument to const + char* to prevent problems passing in const buffers. + + * htnet/HtHTTP.h: Change SetUserAgent to take a const char* to + prevent problems passing in const parameters. + + * htdig/Document.h, htdig/Document.cc(): Use Transport class for + obtaining documents. Remove duplication of declarations + (e.g. DocStatus). + + * htdig/Retriever.cc: Adapt switch statements from + Document::DocStatus to Transport::DocStatus. + + * htdig/Server.cc: Use Document::Retrieve instead of RetrieveHTTP. + +Fri Oct 08 16:35:16 1999 Loic Dachary <loic at ceic.com> + + * test/t_htnet: succeed if timeout occurs. It was the opposite. + + * configure.in: AC_MSG_CHECKING(how to call getpeername?) add missing + comma at end for header spec block. + +Fri Oct 08 14:42:47 1999 Loic Dachary <loic at ceic.com> + + * Fix all warnings reported by gcc-2.95.1 related to string + cast to char*. + +Fri Oct 08 14:04:21 1999 Loic Dachary <loic at yoda.ceic.com> + + * htlib/Configuration,ParsedString,Dictionary: change char* to String + where possible. + + * Fix a lot of warnings reported by gcc-2.95.1 related to string + cast to char*. + + * Completely disable exception code from db. + +Fri Oct 08 13:44:32 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP.cc: fixed a little bug in setting the modification time + if not returned by the server. + +Fri Oct 08 11:30:53 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP.cc: better management of connection failures return values. + * Transport.h: added Document_no_connection and + Document_connection_no_port enum values. 
+ * testnet.cc: management of above changes. + +Fri Oct 08 11:27:31 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * configure.in: modified getpeername() test. + +Fri Oct 08 10:28:15 1999 Loic Dachary <loic at ceic.com> + + * htdig/Retriever.cc (IsValidURL): test return value of + ext = strrchr(url, '.'); + + * htword/WordRecord.h: initialize info member to 0 in constructor and + Clear. + + * htlib/Configuration: char* -> String to all functions. Resolve + warnings. + +Thu Oct 07 16:19:46 1999 Loic Dachary <loic at ceic.com> + + * htnet/HtHTTP.cc (ReadChunkedBody): use append instead of + << because buffer is *not* null-terminated. + + * htnet/Transport.cc (Transport): initialize _port and _max_document_size + otherwise comparison with undefined value occurs. + +Thu Oct 07 16:34:21 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP.cc: call FinishRequest in HTTPRequest() every time a value is + returned. + * testnet.cc: improved with more statistics and connection timeout + control. + +Thu Oct 07 12:53:12 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * configure.in: modified getpeername() test function with + AC_LANG_CPLUSPLUS instead of AC_LANG_C. + +Thu Oct 07 11:56:52 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP.cc: fixed bug of double deleting _access_time + and _modification_time objects in ~HtHTTP(). + +Thu Oct 07 10:17:22 1999 Loic Dachary <loic at ceic.com> + + * htword/WordRecord.h: change (const char*) cast to (char*) + + * htword/WordKey.h.tmp: fix constness of accessors, const accessor + returns const ref. Prevents unnecessary copies. + +Wed Oct 6 23:31:50 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnet/Connection.h, htnet/Connection.cc: Merge in io + class. Connection class was the only subclass of io. + + * Makefile.in, htlib/Makefile.am, htlib/Makefile.in: Update for + removed io class. + + * htdig/ExternalParser.cc: Add more verbose flags for errors.
+ +Wed Oct 06 14:56:34 1999 Loic Dachary <loic at ceic.com> + + * htnet/Connection.cc (assign_server): use free, not delete + on strdup allocated memory. + + * htcommon/URL.cc (URL): set _port to 0 in constructors. + +Wed Oct 06 12:08:38 1999 Loic Dachary <loic at ceic.com> + + * Move htlib/HtSGMLCodec.* to htcommon to prevent + crossed interdependencies between htlib and htcommon + +Wed Oct 06 12:07:32 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP.cc: patch from Michal Hirohama regarding + the SetBodyReadingController() method + +Wed Oct 06 11:49:15 1999 Loic Dachary <loic at ceic.com> + + * Move htlib/HtZlibCodec.* htlib/cgi.* to htcommon to prevent + crossed interdependencies between htlib and htcommon + +Wed Oct 06 11:40:48 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * HtHTTP: stores the server info correctly and removed some debug info + in chunk managing + +Wed Oct 06 11:39:12 1999 Loic Dachary <loic at ceic.com> + + * Move htlib/*URL* to htcommon + +Wed Oct 06 10:09:19 1999 Loic Dachary <loic at ceic.com> + + * README: add htword + + * test/t_htnet: fix variable set problem & return code problem + +Wed Oct 06 08:53:52 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * Written t_htnet test + +Tue Oct 5 12:24:43 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * db/*: Import of Sleepycat's Berkeley DB 2.7.7. + + * db/db/db.c, db/include/db.h, db/include/db_cxx.h, db/mp/mp_bh.c: + Resolve conflicts created in merge. + +Tue Oct 05 18:53:13 1999 Loic Dachary <loic at ceic.com> + + * htdig/Display.cc, htword/*.cc: add inclusion of htconfig.h + +Tue Oct 05 14:54:17 1999 Loic Dachary <loic at ceic.com> + + * htlib/htString.h (class String): add set(char*) + + * htword/WordKey.cc: define typedefs for key components. Leads to more + regular code and no dependency on a predefined set of known types. + All types must still be castable to unsigned int. + Assume Word of type String always exists. 
+ Generic Get/Set/Unset methods made simpler. Added const and ref + for Get in both forms. + + * htword/WordList.cc: enable word reference counting only if wordlist_extend + configuration parameter is set. This parameter is hidden because + no code uses per-word statistics at present. It is only activated + in the test directory. + + * htword/word_list.pl: add mapping to symbolic type names, + force and check to have exactly one String field named Word. + +Mon Oct 04 20:05:35 1999 Loic Dachary <loic at ceic.com> + + * test: add thingies to make test work when doing ./configure + outside the source directory. + + * htword/WordList: Add Ref and Unref to update statistics. + Fix walking to start from the end of statistics. All statistics + words start with \001, therefore at the beginning of the file and + all clustered together. + + * htword/WordStat: derived from WordReference to implement + unique word statistics. + + * test/word.cc: test statistics updating. + + * htword/WordKey.cc: fix bogus compare (returned length difference + if keys had different lengths). + +Mon Oct 04 18:43:56 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * test/testnet.cc: added the option for HEAD before GET control + +Mon Oct 04 17:33:24 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/Transport.h .cc: added the FlushConnection() method + + * htnet/HtHTTP.h .cc: now the Request() method can make a HEAD + request precede a GET request. This is done by default, and + can be changed by using the methods Enable/DisableHeadBeforeGet(). + A configuration option could be added to control it. + +Mon Oct 04 12:43:41 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htlib/io.h .cc: added a flush() method. + + * htnet/HtHTTP.cc: manage the chunk correctly, by calling the flush() + method after reading it. + +Mon Oct 04 12:02:24 1999 Loic Dachary <loic at ceic.com> + + * htlib/htString.h: move null outside inline operator [] functions.
+ +Fri Oct 01 14:55:56 1999 Loic Dachary <loic at ceic.com> + + * htword/WordRecord: mutable, can also contain unique word statistics. + + * htword/WordReference: remove all dependencies related to the actual + structure of the key. + + * htcommon/HtWordReference: derived from WordReference, explicit + accessors. + + * htcommon/HtWordList: derived from WordList, only handles the + word cache (Flush, MarkGone). + + * htdig/HTML.cc (do_tag): add wordindex to have location set in + tags + + * htcommon/DocumentRef.cc (AddDescription): add Location calculation + + * htword/WordList.cc: add dberror to map Berkeley DB error codes + + * htsearch/Display.cc (display): initialize good_sort to get rid + of strange warning. + +Fri Oct 01 09:02:11 1999 Loic Dachary <loic at ceic.com> + + * Makefile.config: duplicate library lines to resolve + interdependencies. + +Thu Sep 30 17:56:55 1999 Loic Dachary <loic at ceic.com> + + * htmerge/words.cc (delete_word): Upgrade to use WordCursor. + + * htword/WordList: Walk now uses a local WordCursor. Many concurrent + Walks can happen at the same time. + + * htword/WordList: Walk callback now takes the current WordCursor. + Added a Delete method that takes the WordCursor. Allows deleting + the current record while walking. + + * db/include/db_cxx.h (DB_ENV): add int return type to operator = + + * db/dist/configure.in (CXXFLAGS): disable adding obsolete + g++ option. + + * configure.in: enable C++ support when configuring Berkeley DB + + * htword: create. move Word* from htcommon. move HtWordType + from htlib and rename WordType. + + * htword/WordList: use db_cxx interface instead of Database. + Less interface overhead. Get access to full capabilities of + Berkeley DB. Much more error checking done. + Create WordCursor private class to use String instead of Dbt. + +Wed Sep 29 20:03:31 1999 Loic Dachary <loic at yoda.ceic.com> + + * htlib/lib.h: AIX xlC is confused by overloaded mystrcasestr + that differ only in constness.
Only keep const form and use cast + where appropriate. *sigh* + + * htlib/htString.h: accommodate new form of Object::compare and + Copy. Explicitly convert compare arg to String&, prevent hiding + and therefore missing the underlying compare function. + + * htlib/HtVector.cc (Copy): make it const + + * htlib/HtHeap.cc: accommodate new form of Object::compare + + * htcommon/List.h,cc: Add ListCursor to allow many pointers that + walk the list to exist in the same program. + + * htlib/Object.h (class Object): kill unused Serialize + Deserialize. + Change unused Copy to const and bark on stderr if called because it + is clearly not what is wanted. If Copy is called and the derived class + does not implement Copy we are in trouble. Alternatives are to make + it pure virtual but it will break things all over the code or to abort + but this will be considered too violent. Change compare to take a + const reference and be a const. + +Wed Sep 29 16:51:58 1999 Loic Dachary <loic at yoda.ceic.com> + + * acinclude.m4,configure.in,Makefile.config: remove -Wall from + Makefile.conf, add the AC_COMPILE_WARNINGS macro in acinclude.m4 + and use it in configure.in. + + * htdoc/default_check.pl: remove, unused + +Wed Sep 29 13:07:58 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/Transport: fixed some bugs on construction and destruction + + * htnet/HtHTTP: the most important addition is the decoding of chunked + encoded responses, as described in RFC2616 (HTTP/1.1). It needs + further work, because it times out at the end of the request. + Added a function pointer in order to dynamically handle the function + that reads the body of a response (for now, normal and chunked, but + other encoding ways exist, so ...). Fixed some bugs on construction + and added some features like Server and Transfer-encoding headers.
+ +Wed Sep 29 13:54:59 1999 Loic Dachary <loic at yoda.ceic.com> + + * fix all inline method declarations so that they are always declared + inline in the class declaration if an inline definition follows. + + * acinclude.m4: also search for apache in /usr/local/apache/bin by default. + + * fix various warnings of gcc-2.95, now compiles ok without warnings + and with -Wall. + + * htlib/htString.h: removed commented-out inline get + + * test/testnet.cc: add includes for optarg + +Tue Sep 28 18:56:36 1999 Loic Dachary <loic at ceic.com> + + * Makefile.config (HTLIBS): libhtnet at the beginning of the list. It + matters on Solaris-2.6 for instance. + + * test/testnet.cc: change times to timesvar to avoid conflict with + function (was warning only on Solaris-2.6). + + * htdig,htsearch,htmerge,test/word are purify clean when running + make check. + +Tue Sep 28 18:23:49 1999 Loic Dachary <loic at ceic.com> + + * htmerge/words.cc (mergeWords): use WordList::Walk to avoid loading ALL + the words into memory. + + * htlib/DB2_db.cc (Open): we don't want duplicates. Big mistake. If DUP is + on, every put for update will insert a new entry. + + * htcommon/WordList.cc (Delete): separate Delete (straight Delete and WalkDelete) + to avoid accessing dbf from outside WordList. + + * htcommon/WordList.cc (Walk): now promoted to public. + +Tue Sep 28 16:34:56 1999 Loic Dachary <loic at ceic.com> + + * test/word.cc (dolist): Add regression tests for Delete. + + * htcommon/WordList.cc (Delete): Reimplement from scratch. Use Walk + to find records to delete. This allows one to delete all occurrences + of this word, delete all words in this document (slow), delete + all occurrences of this word in this document etc. + + * htcommon/WordList.cc (Walk): extend so that it handles walk for + partially specified keys, remains fully backward compatible. It allows + extracting all the words in a specific document (slow) or all occurrences + of a word in a specific document etc.
+ +Tue Sep 28 12:56:12 1999 Loic Dachary <loic at ceic.com> + + * htcommon/DocumentDB.cc (Open): report errors on stderr + + * htmerge/docs.cc (convertDocs): rely on error reporting from DocumentDB + instead of implementing a custom one. + +Tue Sep 28 11:36:28 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htnet/Transport.h: added the status code and the reason phrase + + * htnet/HtHTTP.cc .h: removed the attributes above. + Read the body of a response if the code is 2xx. Issues the + GetLocation() method. + +Tue Sep 28 10:32:47 1999 Loic Dachary <loic at ceic.com> + + * test/htdocs/set3: create and populate with cgi scripts that have + bad behaviour (time out, slow connection). + +Tue Sep 28 10:20:23 1999 Loic Dachary <loic at ceic.com> + + * test/htdocs: move html files into set1/set2 subdirectories to allow + tests that use different sets of files. Change htdig.conf accordingly. + +Tue Sep 28 09:31:12 1999 Loic Dachary <loic at ceic.com> + + * test/Makefile.am: comment test options, add LONG_TEST='y' for lengthy + tests, by default run quick tests. + + * installdir/bad_words: removed "it", "an" and "of": since the minimum word + length is by default 3, these words are ignored anyway. + +Mon Sep 27 20:37:38 1999 Loic Dachary <loic at ceic.com> + + * htlib/HtWordType.h,cc: concentrate knowledge about word definition in this + class. Rename the class WordType (think WordReference etc...). Change + Initialize to use an external default object. A WordType object may be + allocated on its own. Move functionality from BadWordFile, Replace and + IsValid of WordList, and concentrate it in the WordType::Normalize + function. + + * htcommon/WordList: use the new WordList semantic. WordType is now a member + of WordList, opening the possibility to have many WordList objects with different + configurations within the same program since the constructor takes + + * htsearch/htsearch.cc (setupWords): Use HtNormalize to find out if word should + be ignored in query.
Formerly using IsValid. + + * htlib/String.cc (operator []): fix big mistake, operator [] was actually last()! + + * htlib/String.cc(uppercase, lowercase): return the number of converted chars. + + * htlib/String.cc(remove): return the number of chars removed. + +Mon Sep 27 17:43:23 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * Created testnet.cc in the test dir for testing the htnet library + It's a simple program that retrieves a URL. + + * htnet/HtHTTP.cc, .h: added an 'int (*) (char *)' function pointer. + This attribute is static and it is used by the isParsable method + in order to determine if a document is parsable. It must be set + outside this class by using the SetParsingController static method. + The classic use is to set it to 'ExternalParser::canParse'. + +Mon Sep 27 10:52:51 1999 Loic Dachary <loic at ceic.com> + + * htmerge/db.cc (mergeDB): delete words instead of words->Destroy() + because the words object itself was not freed. + +Mon Sep 27 10:38:37 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * Created 'htnet' library + +Mon Sep 27 12:39:24 1999 Loic Dachary <loic at ceic.com> + + * test/word.cc (dolist): don't deal with upper case at present and prevent warning. + +Mon Sep 27 10:38:37 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htlib/String.cc: removed compiler warnings + + * htdig/HtHTTP.h: corrected cvs Id property + +Mon Sep 27 10:29:58 1999 Loic Dachary <loic at ceic.com> + + * htlib/String.cc (String): make sure *all* constructors set the Data + member to 0. + + * htsearch/parser.cc (score): add missing dm->id = wr->DocID(); + strange that it did not make search fail horribly. + + * htsearch/parser.cc (phrase): free wordList at end and only allocate if + needed.
+ +Fri Sep 24 16:35:47 1999 Loic Dachary <loic at ceic.com> + + * htcommon/DocumentDB.ccf (Open): change mode to 666 instead of 664, + it's the business of umask to remove permission bits. + + * htlib/URL.cc (removeIndex): Memory leak. Do not use l.Release + since standard Destroy called by destructor is ok. + + * htdig/htdig.cc (main): Memory leak. Use l.Destroy instead of + l.Release. + + * htlib/StringList.cc (Join): Memory leak (new String str + + return *str). Also change to const fct. + + * htlib/List.cc (Nth): add const version to help StringList::Join save + memory. + + * htdig/HTML.cc (parse): delete [] text (was missing []) + + * htlib/HtVector.cc: Most of the boundary tests with element_count + (but not all of them) were wrong (> instead of >= for instance). + + * htlib/HtVector.cc (Previous): limit test cut and pasted from Next + and obviously completely wrong. Fix. + + * htlib/HtVector.cc (Remove): use RemoveFrom, avoid code duplication. + + * htcommon/DocumentRef.cc (Clear): set all numerical fields to 0, + and truncate strings to length 0. Some were missing. + + * htlib/Connection.cc (Connection): free(server_name) because allocated + by strdup not new. + +Fri Sep 24 14:30:21 1999 Loic Dachary <loic at ceic.com> + + * */.cvsignore: update to include .pure, *.la, *.lo, .purify + + * htlib/String.cc (String): add Data = 0 + + * htlib/htString.h (class String): add Data = 0 + + * htlib/String.cc (String): init set to MinimumAllocationSize at least + prevents leaking if init = 0. + + * htlib/String.cc (nocase_compare): use get() instead of direct + pointer to Data so that the trailing null will be added. + + * htlib/Dictionary.cc (DictionaryEntry): free(key) instead of + delete [] key because obtained with strdup. + + * htlib/DB2_db.cc (Close): free(dbenv) because db_appexit does not + free this although it frees everything else.
+ +Thu Sep 23 18:18:40 1999 Loic Dachary <loic at ceic.com> + + * configure.in: add PERL detection & use in Makefile.am + +Thu Sep 23 14:29:29 1999 Loic Dachary <loic at ceic.com> + + * configure.in: removed unused alloca.h + + * htcommon/DocumentDB.cc: test isopen in Close instead of before calling Close. + Add some const in function arguments. + (Read): change char* args to const String&, changed tests for null pointers to + empty(). + (Add): Delete the temp class member, use function local temp. + (operator []): change char* args to const String& + (CreateSearchDB): change char* args to const String& + + * htcommon/DocumentRef.cc (AddDescription): Add some const in function arguments. + Use a WordReference as insertion context instead of merely the docid: it contains + the insertion context. + (AddAnchor): Add some const in function arguments. + + * htcommon/DocumentRef.h: Add some const in inline function arguments. + + * htcommon/Makefile.am: add WordKey + WordKey.h generation + + * htcommon/word_builder.pl, word.desc, WordKey.h.tmpl: generate WordKey.h from WordKey.h.tmpl and + word.desc + + * htcommon/WordList.cc: In general remove code that belongs to WordReference rather + than WordList and cleanup const + String. + (WordList) the constructor takes a Configuration object as an argument. + (Word -> Replace): Word method replaced by Replace method because more explicit. Now + takes a WordReference as an argument instead of the list of field values. + (valid_word deleted, IsValid only): Add some const in function arguments. + (BadWordFile): change char* args to const String& + (Open + Read -> Open): Open and Read merged into Open with mode argument. change char* args + to const String&. + (Add): use WordReference::Pack and simply do Put. + (operator[], Prefix ...) now take WordReference instead of Word. Automatic conversion from + Word for compatibility through WordReference(const Word& w).
+ (Dump): change char* args to const String& + (Walk): use WordReference member functions instead of hard-coded packing + + * htcommon/WordRecord.h: move flag definitions to WordReference.h + only keep anchor, the rest moved to key. + + * htdig/Document.cc: change all config[""] manipulations from char* to String + or const String + (setUsernamePassword): Add some const in function arguments. + + * htdig/HTML.cc: change all config[""] manipulations from char* to String + or const String. Change null pointer tests to empty(). + (transSGML): change char* args to const String& + + * htdig/HtHTTP.cc: Add error messages for default cases in every switch. + + * htdig/PDF.cc: (parse) change char* to const String& for config[""] + + * htdig/Plaintext.cc: (parse) remove unused variable + + * htdig/Retriever.cc: use WordReference word_context instead of simple docid + to hold the insertion context. + (Retriever) pass config to WordList initializer. + (setUsernamePassword): Add some const in function arguments. + (Initial): change char* args to const String& + (parse_url): use WordReference word_context, add debug information. + (RetrievedDocument): set anchor in word_context. + (got_word): use Replace instead of Word + (got_*): Add some const in function arguments. + + * htdig/htdig.cc: change all config[""] manipulations from char* to String + + * htdoc/cf_generate.pl: compute attrs.html, cf_byprog.html and cf_byname.html from + ../htlib/default.cc and attrs_head.html attrs_tail.html cf_byname_head.html cf_byname_tail.html + cf_byprog_head.html cf_byprog_tail.html + Add rules in Makefile.am + + * htfuzzy: In every program I changed the constructor to take a + Configuration argument. The openIndex and writeDB had this + argument and sometimes used it, sometimes used the global + config. Having it in the constructor is cleaner and safer; there + is no more reference to the global config. I also changed some + char* to String and const.
Most of the programs look the same, so I + won't go into details here :-} + + * htlib/Configuration.cc: changed separators from String* to String. Simpler. + (~Configuration): removed because not needed. + (Add): change to String, remove new String + delete for local var. + (Find, operator[]): make it const fct, add some const in function arguments. + (Value + Double): killed, replaced by as_integer + as_double from String + (Boolean): use String methods + string objects + (Defaults): Add some const in function arguments. + + * htlib/Configuration.h: add + char *type; // Type of the value (string, integer, boolean) + char *programs; // White separated list of programs/modules using this attribute + char *example; // Example usage of the attribute (HTML) + char *description; // Long description of the attribute (HTML) + to the ConfigDefaults type. + + * htlib/Connection.cc: (assign_server) change char* args to const String& + + * htlib/DB2_db.cc: Merge with DB2_hash. + Add compare and prefix function pointers. + Merge OpenRead & OpenReadWrite into Open, keep for compatibility. + skey and data are now strings instead of DBT. + Remove Get_Next_Seq. + Get_Next now returns key and value in arguments. + Remove all other Get_Next interfaces. + + * htlib/Database.h: + Compatibility functions for Get_Next + Put, Get, Exists, Delete take String args and are inline + Add SetPrefix and SetCompare + + * htlib/Dictionary.cc: + Add copy constructor. + Add DictionaryCursor that holds the traversal context. + Use DictionaryCursor object for traversal without explicit + cursor specified. + Add constness where meaningful. + + * htlib/HtPack.cc: + (htPack) format is const, change strtol call + to use temporary variable to cope with constness. + (htUnpack) dataref argument is not a reference anymore. Not used + anywhere and kind of hidden argument nobody wants. + + * htlib/HtRegex.cc: set, match, HtRegex have const args.
+ + * htlib/HtWordCodec.cc: (code) orig is const + + * htlib/HtWordType.cc,h: statics are made of String instead of char*. Remove + static String punct_and_extra from Initialize. + + * htlib/HtZlibCodec.cc: len is unsigned int + + * htlib/ParsedString.cc: add constness to function args + (get) use String instead of char + + * htlib/QuotedStringList.cc: inline functions argument variations and + add constness. + + * htlib/String.cc: add constness wherever possible. + + * htlib/htString.h: Add const get, char* cast, operator []. + Add as_double conversion. + + * htlib/StringList.cc: inline functions argument variations and + add constness. + + * htlib/StringMatch.cc: add constness to function args. + + * htlib/URL.cc: add constness to function args. + (URL): fct arg was used as temp. Changed, clearer. + + * htlib/lib.h: add const declaration of string manipulation functions. + Two forms for mystrcasestr: const and not const. + + * htlib/strcasecmp.cc: add constness to function args. + + * htlib/timegm.c: add declaration for __mktime_internal + + * htmerge/db.cc: change *doc* vars from char* to const String, use + new WordList + WordReference interface. + + * htmerge/docs.cc: change *doc* vars from char* to const String. + + * htmerge/words.cc: use new WordList + WordReference interface. + + * htsearch/Display.cc: use empty method on String where appropriate. + use String instead of char* where config[""] used. + (includeURL): change char* args to const String& + + * htsearch/ResultMatch.cc: (setTitle, setSortType) change char* args to const String& + + * htsearch/Template.cc: (createFromFile) change char* args to const String& + + * htsearch/Template.h: accessors return const String& or take const char* + + * htsearch/TemplateList.cc: (get) use const String for internalNames. + + * htsearch/htsearch.cc: use String instead of char* where config[""] used. + + * htsearch/parser.cc: Initialize WordList member with config global.
+ (perform_push): free the result list after calling score. + (score, phrase): use new WordList + WordReference interface. + +Thu Sep 23 14:29:29 1999 Loic Dachary <loic at ceic.com> + + * htcommon/WordKey.h.tmpl, WordKey.cc: new, describe the key of the word + database. + + * htcommon/word.desc: new, abstract description of the key structure of the word + database. + + * htcommon/word_builder.pl: new, generate WordKey.h from WordKey.h.tmpl + + * htcommon/WordReference.cc: move key manipulation to WordKey.cc + Add Unpack/Pack functions. Add accessors for fields and move fields to private. + Add constness where possible. + +Mon Sep 20 14:50:47 1999 Loic Dachary <loic at ceic.com> + + * Everywhere config["string"] is used, check that it's *not* converted to + char* for later use. Keep String object so that there is no chance to + use a char* that has been deallocated. Using a String as return for config["string"] + is also *much* safer for the great number of calls that did not check for a possible + 0 pointer return. + + * htfuzzy/*.{cc,h}: const Configuration& config member. Constructor sets it. + Remove config argument from openIndex & writeDB. The idea (as it was initially, + I guess) is to be able to have a standalone fuzzy library using a specific + configuration file. It is now possible and consistent. + + * htlib/htString.cc: more constness where appropriate. Changed compare + to have const String& arg instead of const Object* because it was useless and a + potential source of bogus code. + + * htfuzzy/Regex.cc (getWords): fix bogus setting of extra_word_chars + configuration value. It is set to change the behaviour of HtStripPunctuation + but this function gets the extra_word_chars from a static array initialized + at program start by static void Initialize(Configuration & config). Use straight + s.remove() instead. Besides, the string was anchored by prepending a ^ that + was removed because it is one of the reserved chars.
+ +Mon Sep 20 11:47:05 1999 Loic Dachary <loic at ceic.com> + + * htlib/Configuration.cc (operator []): changed return type to String + to solve a memory leak. When char* the string was malloced from ParsedString + after substitution and never freed. In fact it was even worse : it was + freed before use in some cases. + +Sun Sep 19 19:12:44 1999 Loic Dachary <loic at ceic.com> + + * htdoc/cf_generate.pl, htcommon/defaults.cc, htlib/Configuration.h: + Change the structure of the configuration defaults. Move + description, examples, types, used_by information from attrs.html. + Write cf_generate.pl to build attrs.html, cf_byname, cf_byprog + from defaults.cc. Makes it easier to maintain an up to date + description of existing attributes. About 10 attributes existed + in defaults.cc and were not described in the HTML pages. + Add rules in htdoc/Makefile.am to generate the pages if a source + changes. + +Fri Sep 17 19:34:48 1999 Loic Dachary <loic at ceic.com> + + * Makefile.config: add -Wall to all compilations and fix + all resulting warnings. + + * htlib/Connection.cc (assign_server): remove redundant test + and cast literal value to unsigned + + * htlib/String.cc: add const qualifier where possible. Helps when + dealing with const objects at an upper level. + +Fri Sep 17 18:27:57 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at> + + A few changes so that it compiles with xlC on AIX: + + * configure.in, include/htconfig.h.in: Add check for sys/select.h. + Add "long unsigned int" to the possible getpeername_length types. + + * htdig/htdig.cc: Moved variable declaration out of case block. + + * htlib/Connection.cc: Include sys/select.h. + + * htcommon/WordList.cc: just a type cast + + * htlib/regex.c: define true and false only if they aren't already + + * htdig/Transport.{h,cc}: removed inline keywords (inline functions + have to be defined and declared simultaneously) + + * htlib/{mktime.c,regex.h,strptime.c,timegm.c}: change // comments + to /* ...
+ */ + +Tue Sep 14 01:15:48 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htmerge/db.cc: Rewrite to use the WordList functions to merge + the two word databases. Also make sure to load the document + excerpt when adding in DocumentRefs. + + * htmerge/docs.cc: Fix bug where ids were not added to the discard + list correctly. + + * htmerge/words.cc: Fix bug where ids were not checked for + existence in the discard list correctly. + +Sun Sep 12 12:27:16 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Remove word_list since that file is no + longer used. + + * htdig/htdig.cc: Ensure -a and -i are followed for the word_db + file. Fixes PR #638. + +Sat Sep 11 00:11:28 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/StringMatch.h: Add back mistakenly deleted #ifndef/#define. + +Fri Sep 10 23:07:43 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htmerge/*, htcommon/*, htdig/*, htlib/*: Add copyright information. + +Fri Sep 10 11:33:50 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htnotify/htnotify.cc: Add copyright information. + + * htsearch/* htfuzzy/*: Ditto. + +Fri Sep 10 15:24:44 1999 Loic Dachary <loic at ceic.com> + + * htdig/Retriever.cc: change static WordList words to + object member. words.Close() at end of Start function + to make sure data is flushed by database. + + * htcommon/WordList.cc (Close): test isopen to prevent + ugly crash. Remove isopen test in calling functions. + +Fri Sep 10 13:45:53 1999 Loic Dachary <loic at ceic.com> + + * htcommon/WordList.h htcommon/WordList.cc: methods Collect + and Walk that factorise the behaviour of operator [], Prefix + and WordRefs. + + * htcommon/WordList.h htcommon/WordList.cc: method Dump to + dump an ascii version of the word database. + + * htcommon/WordReference.h,htcommon/WordReference.cc: method Dump + to write an ascii version of a word. 
+ + * htdig/htdig.cc: -t now also dumps the word database in ascii as + well.
+ + * htdoc/attrs.html,cf_byprog.html,cf_byname.html: added doc + for word_dump + +Thu Sep 9 20:30:18 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Fuzzy.h, htfuzzy/Fuzzy.cc, htfuzzy/Prefix.cc, + htfuzzy/Regex.cc, htfuzzy/Speling.cc, htfuzzy/Substring.cc, + htfuzzy/htfuzzy.cc, htfuzzy.h: Change to use WordList code instead + of direct access to the database. + +Thu Sep 9 14:55:59 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/parse_doc.pl: fix bug in pdf title extraction. + +Tue Sep 7 23:49:41 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/ExternalParser.h, htdig/ExternalParser.cc (parse): Change + parsing of location to allow phrase searching -- location is *not* + just 0-1000. + + * htdig/Plaintext.h, htdig/Plaintext.cc, htdig/PDF.cc: Ditto. + + * htdig/Retriever.h, htdig/Retriever.cc: Don't call + HtStripPunctuation. This is now done in the WordList::Word method. + + * htcommon/WordList.h htcommon/WordList.cc (Prefix): New method to + do prefix retrievals. Essentially the same as [], except the loop + is broken only in the unlikely event that we retrieve something + beyond the range set. + (Exists): New method for checking the existence of a + string--attempt to retrieve it and determine if anything's + actually there. + (Word): Call HtStripPunctuation as part of the cleanup. + +Tue Sep 7 21:37:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Add new configuration option + removed_unretrieved_urls to remove docs that have not been accessed. + + * htmerge/docs.cc (convertDocs): Use it. + + * htcommon/defaults.h, htcommon/WordRecord.h, + htcommon/WordReference.h: Add copyright notice to head of file. + +Mon Sep 6 10:32:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtZlibCodec.h, htlib/HtZlibCodec.cc(instance): New method + as used in other codecs. + (encode, decode): Fix compilation errors.
+ + * htlib/Makefile.am: Added HtZlibCodec.cc to the compilation list. + + * htcommon/DocumentDB.cc (ReadExcerpt): Call HtZlibCodec to decompress + the excerpt. + (Add): Call HtZlibCodec to compress the excerpt before storing. + (Open, Read): If the databases are + already open, close them first in case we're opening under a + different filename. + (CreateSearchDB): Remove call to external + sort program. Database is already sorted by DocID. + + * configure.in, configure: Remove check for external sort + program. No longer necessary. + + * */Makefile.in: Regenerate using automake. + +Sun Sep 5 13:50:34 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htmerge/docs.cc: Ensure a document with empty excerpt has + actually been retrieved. Otherwise document stubs are always + removed. + + * htlib/String.cc: Implement the nocase_compare method. + + * htcommon/WordReference.cc: Implement a compare method for + WordRefs to use in sorting. Uses the above. + + * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Update the + headers. + + * htcommon/DocumentDB.h: Ditto. + +Sun Sep 5 01:37:27 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/WordList.cc(Flush): Call Add() instead of storing the + data ourselves. Additionally, don't open the database ourselves (and + then close it), instead call Open() if it's not open already. + + * htcommon/DocumentRef.h, htcommon/DocumentRef.cc(AddDescription): + Pass in a WordList to use when adding link text words. Ensures + that the word db is never opened twice for writing. + + * htdig/Retriever.cc: Call AddDescription as above. + + * htdig/Server.cc(ctor): If debugging, write out an entry for the + robots.txt file. + + * htlib/HtHeap.cc(percolateUp): Fix a bug where the parent was not + updated when moving up more than once. + (pushDownRoot): Fix a bug where the root was improperly pushed + down when it required looping.
+ +Fri Sep 3 16:23:23 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtHeap.cc(Remove): Correct bug where after a removal, the + structure was not "re-heapified" correctly. The last item should + be moved to the top and pushed down. + (pushDownRoot): Don't move items past the size of the underlying + array. + + * htdig/Server.h, htdig/Server.cc: Change _paths to work on a + heap, based on the hopcount. Ensures on a given server that the + indexing will be done in level-order by hopcount. + +Wed Sep 01 15:40:37 1999 Loic Dachary <loic at ceic.com> + + * test: implement minimal tests for htsearch and htdig + +Tue Aug 31 02:17:04 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/WordRecord.h: Change back to struct to ensure integrity + when compressed and stored in the word database. + + * htcommon/WordList.cc (Flush): Use HtPack to compress the + WordRecord before storage. + ([], WordRefs): Use HtUnpack to decompress the WordRecord after + storage. + +Sun Aug 29 00:42:07 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc (convertToBoolean): Remove debugging + strings. + + * htsearch/parser.h: Add new method score(List) to merge scoring + for both standard and phrase searching. + + * htsearch/parser.cc(phrase): Keep the current list of successful + matched words around to pass to score and perform_phrase. + (perform_phrase): Naively (and slowly, but correctly) loop through + past words to make sure they match DocID as well as successive locations. + Move scoring to score(). + (perform_push): Move scoring to score(). + (score): Loop through a list of WordReferences and create a list + of scored DocMatches. + +Sun Aug 29 00:33:17 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc(createLogicalWords): Hack to produce + correct output with phrase searching (e.g. anything in quotes is + essentially left alone). 
+ Ensure the StringMatch pattern includes + the phrase with correct spacing as well. + (setupWords): Add a " token whenever it occurs in the query. + (convertToBoolean): Make sure booleans are not inserted into + phrases. + + * htsearch/parser.h: Add new methods phrase and perform_phrase to + take care of parsing phrases and performing the actual matching. + + * htsearch/parser.cc(lexan): Return a '"' when present for phrase + searching. + (factor): Call phrase() before parsing a factor--phrases are the + highest priority, so ("RedHat Linux" & Debian) ! Windows makes + sense. + (phrase): New method--slurps up the rest of a phrase and calls + perform_phrase to do the matching. + (perform_phrase): New method--currently just calls perform_and to + give the simulation of a phrase match. + +Sat Aug 28 15:57:53 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Server.h, htdig/Server.cc: Undo yesterday's change -- still + very buggy and shouldn't be used yet. + + * htdig/Retriever.cc (parse_url): Change default index to 1 to + more closely match DocIDs shown with verbose output. + + * htsearch/DocMatch.h: Change score to double and clean up + headers. + + * htcommon/WordRecord.h: Change unnecessary long ints (id and + flags) to plain ints. + + * htdig/HTML.cc (parse): Call got_word with actual word sequence + (i.e. 1, 2, 3...) rather than scaling to 1-1000 by character + offset. + + * htlib/Database.h, htlib/DB2_db.h, htlib/DB2_hash.h: Change + Get_Item to Get_Next(String item) to return the data as a + reference. This makes it easier to use in a loop and cuts the + database calls in half. + + * htlib/DB2_db.cc, htlib/DB2_hash.cc: Implement it, making sure we + keep the possibly useful data around, rather than tossing it! + + * htsearch/htsearch.cc(htsearch): Don't attempt to open the word db + ourselves. Instead, pass the filename off to the parser, which + will do it through WordList. + + * htsearch/parser.h: Use a WordList instead of a generic Database.
+ + * htsearch/parser.cc(perform_push): Use the WordList[] operator to + return a list of all matching WordRefs and loop through, summing + the score. + + * htcommon/WordList.cc (Flush): Don't use HtPack on the + data--somehow when unpacking, there's a mismatch of sizes. + (Read): Fix thinko where we attempted to open the database as a + DB_HASH. + ([]): Don't use HtUnpack since we get mismatches. Use the new + Get_Next(data) call instead of calling Get_Item separately. + (WordRefs): Same as above. + +Fri Aug 27 09:44:09 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (Need2Get): Remove duplicate detection code for + local_urls. The code is somewhat buggy and should be replaced by + more general code shortly. + + * htdig/Server.h, htdig/Server.cc (push, pop): Change _paths to a + HtHeap sorted on hopcount first (and order placed on heap + second). Ensures that on each server, the order indexed is + guaranteed to be level-order by hopcount. + + * htdig/URLRef.h, htdig/URLRef.cc (compare): Add comparison method + to enable sorting by hopcount. + +Fri Aug 27 09:36:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/WordList.h, htcommon/WordList.cc (WordList): Change + words to a list instead of a dictionary for minor speed improvement. + +Thu Aug 26 11:18:20 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc, htdoc/attrs.html: increase default + maximum_word_length to 32. + +Wed Aug 25 16:50:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Retriever.cc(got_word): add code to check for compound words + and add their component parts to the word database. + * htdig/PDF.cc(parseString), htdig/Plaintext.cc(parse): Don't strip + punctuation or lowercase the word before calling got_word. That + should be left up to got_word & Word methods. 
+ + * htlib/StringMatch.h, htlib/StringMatch.cc(Pattern, IgnoreCase): + Add an IgnorePunct() method, which allows matches to skip over valid + punctuation, change Pattern() and IgnoreCase() to accommodate this. + * htsearch/htsearch.cc(main, createLogicalWords): use IgnorePunct() + to highlight matching words in excerpts regardless of punctuation, + toss out old origPattern, and don't add short or bad words to + logicalPattern. + + * htlib/HtWordType.h, htlib/HtWordType.cc(Initialize): set up and + use a lookup table to speed up HtIsWordChar() and HtIsStrictWordChar(). + +Mon Aug 23 10:13:05 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc(parse): fix problems with null pointer when attempting + SGML entity decoding on bare &, as reported by Vadim Chekan. + +Thu Aug 19 11:52:06 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc(main): Fix to allow multiple keywords + input parameter definitions. + + * contrib/parse_doc.pl: make spaces optional in LANGUAGE = POSTSCRIPT + PJL test. + +Wed Aug 18 11:27:46 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/PDF.cc(parse): Fixed wrong variable name in new code. + Double-Oops! (It was Friday the 13th, after all...) + +Tue Aug 17 16:26:46 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/HtHeap.cc(Remove): apply Geoff's patch to fix Remove. + + * htlib/HtVector.h, htlib/HtVector.cc(Index): various bounds overrun + bug fixes and checking in Last(), Nth() & Index(). + +Mon Aug 16 13:55:10 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(expandVariables): fix up test for & + +Mon Aug 16 12:08:57 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * Makefile.am, Makefile.in, installdir/Makefile.am, + installdir/Makefile.in: change all remaining INSTALL_ROOT to DESTDIR. + +Fri Aug 13 15:44:31 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/PDF.cc(parse): added missing ')' in new code. Oops!
+ + * htlib/strptime.c, htlib/mktime.c: added #include "htconfig.h" + to pick up definitions from configure program. Let's try to + remember that config.h != htconfig.h! + +Fri Aug 13 14:49:07 1999 Loic Dachary <loic at ceic.com> + + * configure.in: removed unused HTDIG_TOP, replaced AM_WITH_ZLIB + with CHECK_ZLIB + +Fri Aug 13 14:00:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/PDF.cc(parse), htcommon/defaults.cc, htdoc/attrs.html + (pdf_parser): Removed -pairs option from default arguments, added + special test for acroread to decide whether to use output file or + directory as last argument (also adds -toPostScript if missing). + Program now tries to test for existence of parser before trying + to call it. + +Fri Aug 13 10:10:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/attrs.html(pdf_parser): updated xpdf version number. + +Thu Aug 12 17:09:37 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/parse_doc.pl: updated for xpdf 0.90, plus other fixes. + +Thu Aug 12 11:12:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/attrs.html(logging): added Geoff's description of log lines. + +Thu Aug 12 11:21:12 1999 Loic Dachary <loic at ceic.com> + + * strptime fixes : AC_FUNC_STRPTIME defined in acinclude.m4 and used in configure.in, + conditional compilation of strptime.c (only if HAVE_STRPTIME not defined), + removed Htstrptime (strptime.c now defines strptime), changed all calls to Htstrptime + to calls to strptime. + +Wed Aug 11 16:59:41 1999 Loic Dachary <loic at ceic.com> + + * */Makefile.am: use -release instead of -version-info because nobody + wants to bother with published shared lib interfaces version numbers + at present.
+ + * htlib/Makefile.am: added langinfo.h + +Wed Aug 11 15:00:07 1999 Loic Dachary <loic at yoda.ceic.com> + + * acconfig.h: removed MAX_WORD_LENGTH + + * re-run auto* to make sure chain is consistent + + * Makefile.am: improve distclean for tests + +Wed Aug 11 13:46:22 1999 Loic Dachary <loic at yoda.ceic.com> + + * configure.in: change --enable-test to --enable-tests so + that Berkeley DB tests are not activated. Since they depend + on tcl this can be a pain. + + * acinclude.m4: AM_PROG_TIME locate time command + find out + if verbose output is -l (freebsd) or -v (linux) + +Wed Aug 11 13:13:39 1999 Loic Dachary <loic at yoda.ceic.com> + + * acinclude.m4 : AM_WITH_ZLIB autoconf macro for zlib detection that + allows --with-zlib=DIR to specify the install root of zlib, + --without-zlib to prevent inclusion of zlib. If nothing + specified zlib is searched in /usr and /usr/local. + --disable-zlib is replaced with --without-zlib. + + * configure.in,configure,aclocal.m4,db/dist/acinclude.m4, + db/dist/aclocal.m4,db/dist/configure,db/dist/configure.in: + changed to use AM_WITH_ZLIB + +Tue Aug 10 21:14:34 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc (outputVariable): Fix compilation error with + assignment between char * and char *. + + * htsearch/htsearch.cc (main): Use cleaner trick to sidestep + discarding const char * as suggested by Gilles. + +Tue Aug 10 17:24:12 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(expandVariables): clean up, simplify and + label lexical analyzer states. + +Tue Aug 10 17:04:54 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(expandVariables, outputVariable): add handling + for $%(var) and $&(var) in templates. Still to be documented. 
+ +Tue Aug 10 20:13:52 1999 Loic Dachary <loic at yoda.ceic.com> + + * db/mp/mp_bh.c: fixed HAVE_ZLIB -> HAVE_LIBZ + +Tue Aug 10 17:58:01 1999 Loic Dachary <loic at yoda.ceic.com> + + * configure,configure.in,db/dist/configure.in,db/dist/configure: + added --with-zlib configure flag for htdig to specify zlib + installation path. Motivated by the need for compatible tests between + htdig and db as far as zlib is concerned. Otherwise configuration + is confused and misses an existing libz. + +Tue Aug 10 17:44:49 1999 Loic Dachary <loic at yoda.ceic.com> + + * db/mp/mp_fopen.c: fixed cmpr_open being called even if libz is not present + +Tue Aug 10 17:40:53 1999 Loic Dachary <loic at yoda.ceic.com> + + * htlib/langinfo.h: header missing on FreeBSD-3.2, needed + by strptime.c + +Tue Aug 10 11:43:14 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.h, htdig/HTML.cc(parse, do_tag): fix problems with + SGML entity decoding, add decoding of entities within tag attributes. + +Mon Aug 9 21:13:50 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HtHTTP.h(SetRequestMethod): Fix declaration to be void. + + * htdig/Transport.h(GetRequestMaxDocumentSize): Fix declaration to + return int. + + * htdig/Retriever.cc(got_href): Fix mistake in hopcount + calculations. Now returns the correct hopcount even for pages + when a faster path is found. (Still need to change indexing to + sort on hopcount). + + * htsearch/htsearch.cc(main): Fix compiler error in gcc-2.95 when + discarding const by using strcpy. It's a hack, hopefully there's a + better way. + +Mon Aug 9 17:23:15 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/URL.cc(ServerAlias): fix small memory leak in new default + path code (don't need to allocate new from string each time). + + * htlib/cgi.cc(init): Fix PR#572, where htsearch crashed if + CONTENT_LENGTH was not set but REQUEST_METHOD was.
+ + * htfuzzy/Fuzzy.cc(getWords), htfuzzy/Metaphone.cc(vscode): + Fix Geoff's change of May 15 to Fuzzy.cc, add test to vscode macro + to stay in array bounds, so non-ASCII letters don't cause a segfault. + Should fix PR#514. + +Mon Aug 9 17:03:45 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * include/htconfig.h.in, htcommon/WordList.cc(Word,Flush&BadWordFile), + htcommon/DocumentRef.cc(AddDescription), htcommon/defaults.cc, + htsearch/parser.cc(perform_push), htdoc/attrs.html, + htdoc/cf_byname.html, htdoc/cf_byprog.html: + Convert the MAX_WORD_LENGTH compile-time option into the run-time + configuration attribute maximum_word_length. This required reinserting + word truncation code that had been taken out of WordList.cc. + +Mon Aug 9 16:34:14 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HtHTTP.cc (isParsable): allow application/pdf as parsable, + to use builtin PDF code. + + * htdig/HtHTTP.cc (ParseHeader), + htdig/Document.cc (readHeader): clean up header parsing. + + * htdig/Document.cc (getdate): make tm static, so it's initialized + to zeros. Should fix PR#81 & PR#472, where strftime() would crash + on some systems. Idea submitted by benoit.sibaud at cnet.francetelecom.fr + + * htlib/URL.cc (parse): fix PR#348, to make sure a missing or invalid + port number will get set correctly. + +Mon Aug 9 15:42:41 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Added descriptions for attributes that were missing, added a few + clarifications, and corrected a few defaults and typos. + Covers PR#558, PR#626, and then some. + + * configure.in, configure, include/htconfig.h.in, htlib/regex.c: + PR#545 fixed - configure tests for presence of alloca.h for regex.c + +Sat Aug 07 13:40:17 1999 Loic Dachary <loic at ceic.com> + + * configure.in: remove test for strptime. Run autoconf + autoheader.
+ + * htlib/HtDateTime.cc: always use htdig strptime, do not try to use + existing function in libc. + + * htlib/HtDateTime.h: move inclusion of htconfig.h on top of file, + change #ifdef HAVE_CONFIG to HAVE_CONFIG_H + +Fri Aug 6 16:37:33 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc (UseProxy): fix call to match() and test of + return value to work as documented for http_proxy_exclude (PR#603). + +Fri Aug 06 15:06:23 1999 <loic at yoda.ceic.com> + + * db/dist/config.hin, db/mp/mp_cmpr.c db/db/db.c, db/mp/mp_fopen.c: + disable compression if zlib not found by configure. + +Thu Aug 05 12:27:15 1999 <loic at yoda.ceic.com> + + * test/dbbench.cc: invert -z and -Z for consistency + + * test/Makefile.am: add dbbench call examples + +Thu Aug 05 11:38:58 1999 Loic Dachary <loic at ceic.com> + + * test/Makefile.am: all .html go in distribution, compile dbbench + that tests Berkeley DB performances. + + * configure.in/Makefile.am: conditional inclusion of the test + directory in the list of subdirs (--enable-test). The list + of subdirs is now @HTDIGDIRS@ in configure.in & Makefile.am + + * db/*: Transparent I/O compression implementation. Defines the DB_COMPRESS flag. + For instance DB_CREATE | DB_COMPRESS. + + * db/db_dump/load: add -C option to specify cache size to db_dump/db_load + +Wed Aug 4 22:57:27 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * db/*: Import of Sleepycat's Berkeley DB 2.7.5. + +Wed Aug 4 22:40:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * contrib/htparsedoc/htparsedoc: Add in contributed bug fixes from + Andrew Bishop to work on SunOS 4.x machines. + +Wed Aug 4 01:58:52 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * COPYING, htdoc/COPYING, configure.in, Makefile.am, Makefile.in: + Update information to use canonical version of the GPL from the + FSF. In particular, this version has the correct mailing address + of the FSF. 
+ +Mon Aug 02 11:28:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htlib/htString.h, htlib/String.cc : added the possibility to + insert an unsigned int into a string. + * htdig.cc : with verbose mode shows start and end time. + +Thu Jul 22 18:10:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Transport.cc, htdig/HtHTTP.cc : modified the destructors. + +Thu Jul 22 13:10:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Transport.cc, htdig/Transport.h, htdig/HtHTTP.cc, + htdig/HtHTTP.h: Re-analyzed inheritance methods and attributes of + the 2 classes. This is a first step, not definitive ... because it + still doesn't work as I hope. + +Tue Jul 20 11:21:52 1999 <loic at ceic.com> + + * configure.in : added AM_MAINTAINER_MODE to prevent unwanted + dependency checks by default. + + * db/Makefile.in : remove Makefile when distclean + +Mon Jul 19 13:23:53 1999 <loic at ceic.com> + + * Makefile.config (INCLUDES): added -I$(top_srcdir)/include because + automatically -I../include is not good, added -I$(top_builddir)/db/dist + because some db headers are configure generated (if building in a + directory that is not the source directory). + + * rename db/Makefile db/Makefile.in: otherwise it does not show + up when building in a directory that is not the source directory. + +Mon Jul 19 13:02:22 1999 <loic at ceic.com> + + * .cvsignore: do not ignore Makefile.config + +Sun Jul 18 22:47:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/parser.cc: Eliminated compiler errors. Currently + returns no matches until bugs in the WordList code are fixed. + +Sun Jul 18 22:42:04 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htmerge/htmerge.h: Cleanup, including WordRecord and + WordReference as needed. + + * htmerge/htmerge.cc: Update for files necessary for merge + calls. + Call convertDocs before mergeWords so that the discardList gets + the list of documents deleted.
+ + * htmerge/docs.cc: Update for difference in calling order. + + * htmerge/words.cc: Update (and significant cleanup) since + WordList writes directly to db.words.db. Iterate over the stored + words, deleting those from deleted documents. + + * htmerge/db.cc: Update to eliminate compiler errors. Currently + disabled until bugs in the words code are fixed. + +Sun Jul 18 22:33:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Collapse the multiple heading_factors into + one. (It's prohibitive to define a flag for each h* tag). + Add a new url_factor for the text of URLs (presently unused). + + * htcommon/DocumentRef.cc(AddDescription): Use FLAG_LINK_TEXT as + defined in htcommon/WordRecord.h. + + * htdig/Retriever.h: Change factor to accommodate flags instead of + weighting factors. + + * htdig/Retriever.cc: Update to use flags, and define the indexed + flags in factor as appropriate. + + * htdig/HTML.cc: Update calls to got_word with appropriate new + offsets into factor[]. + +Sun Jul 18 22:18:16 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/WordReference.h, htcommon/WordRecord.h: Update to use + flags instead of weight. + + * htcommon/WordList.h, htcommon/WordList.cc: Add database access + routines to match DocumentDB.cc. + (Word): Recognize flags instead of weight, simply add the + word. (Duplicates expected!) + (mark*): Simply delete the list of words. + (flush): Rather than dump to a text file, dump directly to the db. + +Sun Jul 18 21:50:04 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Database.h, htlib/DB2_db.h, htlib/DB2_hash.h: Add new + method Get_Item to access the data of the current item when using + Get_Next() or Get_Next_Seq(). + + * htlib/DB2_db.h, htlib/DB2_hash.cc: Implement Get_Item() using + cursor access. + +Sat Jul 17 12:59:01 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * test/*.html: Added various HTML files as the beginnings of a + testing suite.
+ +Fri Jul 16 16:06:27 1999 Loic Dachary <loic at ceic.com> + + * All libraries (except db) use libtools. Shared libraries are + generated by default. --disable-shared to get old behaviour. + Libraries are installed in all cases. + + * Change structure of default installation directory (match + standard). + database : var/htdig + programs : bin + libraries : lib + + Like default apache: + conf : conf + htdocs : htdocs/htdig + cgi-bin : cgi-bin + + * Switch all Makefile.in into Makefile.am + + * CONFIG.in CONFIG : removed. Replaced with --with- arguments in + configure.in + + * Makefile.config.in removed, only keep Makefile.config : automake + automatically defines variables for each AC_SUBST variables. + Makefile.config has HTLIBS + DEFINES + + * db/Makefile : added to forward (clean all distclean) targets to + db/dist and implement distdir target. + + * acconfig.h : created to allow autoheader to work (contains GETPEERNAME_LENGTH_T + HAVE_BOOL, HAVE_TRUE, HAVE_FALSE, NEED_PROTO_GETHOSTNAME). Extra definitions + added before @TOP@ (TRUE, FALSE, VERSION, MAX_WORD_LENGTH, LOG_LEVEL, LOG_FACILITY). + + * installdir/Makefile.am : installation rules moved from Makefile.am to installdir/Makefile.am + + * include/Makefile.am : distribute htconfig.h.in and stamp-h.in + + * Makefile.am : do not pre-create the directories, creation is done during the installation + + * configure.in: CF_MAKE_INCLUDE not needed anymore : automake handles + the include itself. + +Fri Jul 16 13:04:27 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc(parse): fix to prevent closing ">" from being passed + to do_tag(). + +Thu Jul 15 21:25:12 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Document.cc (readHeader, getParsable): Add back + application/pdf to use builtin PDF code. + + * htdig/Makefile.in: Remove broken Postscript parser as it never + worked. + + * htlib/URL.cc (normalizePath, path): Use config.Boolean as + pointed out by Gilles. 
+ +Thu Jul 15 15:54:30 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdoc/attrs.html(pdf_parser & external_parsers): add corrections & + clarifications, links to relevant FAQ entries. + +Thu Jul 15 18:00:00 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it> + + * htlib/HtDateTime.cc, htlib/HtDateTime.h : added the possibility + to initialize and compare HtDateTime with integers. Added the + constructor HtDateTime (int) and various operator overloading methods. + +Wed Jul 14 22:57:14 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/URL.cc (normalizePath, path): If not case_sensitive, + lowercase the URL. Should ensure that all URLs are appropriately + lowercased, regardless of where they're generated. + +Wed Jul 14 22:37:47 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/DB2_db.cc (OpenReadWrite, OpenRead): Add flag DB_DUP to + database to allow storage of duplicate keys (in this case, + words). + +Tue Jul 13 15:36:40 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc (do_tag): Fix handling of <link> and <area>, + to use href= instead of src=. + +Mon Jul 12 22:31:48 1999 Hanno Mueller <kontakt at hanno.de> + + * contrib/scriptname/results.shtml: Remove unintentional $(VERSION). + +Mon Jul 12 22:20:40 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Cleanups suggested by Gilles, combining + <link> and <area>, <embed> <object> and <frame> and moving <img> + to a separate case. + +Sun Jul 11 19:32:38 1999 Hanno Mueller <kontakt at hanno.de> + + * contrib/README: Add scriptname directory. + + * contrib/scriptname/*: An example of using htsearch within + dynamic SSI pages + + * htcommon/defaults.cc: Add script_name attribute to override + SCRIPT_NAME CGI environment variable. + + * htdoc/FAQ.html: Update question 4.7 based on including htsearch + as a CGI in SSI markup.
+ + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, + htdoc/hts_templates.html: Update based on behavior of script_name + attribute. + + * htsearch/Display.cc: Set SCRIPT_NAME variable to the script_name + attribute if set, otherwise to the CGI environment variable. + +Sat Jul 10 00:22:34 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Regex.cc (getWords): Anchor the match to the beginning + of the string, add regex-interpreted characters to extra_word_chars + temporarily, and strip remaining punctuation before making a match. + +Fri Jul 9 22:35:57 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc: Back out change of June 24. + + * htsearch/htsearch.cc: Ditto. + + * htsearch/htsearch.cc (setupWords): Remove HtStripPunctuation in + favor of requiring Fuzzy classes to strip whatever punctuation is + necessary. + + * htfuzzy/Fuzzy.h: Add HtWordType.h to #includes and update comments. + + * htfuzzy/Synonym.cc, htfuzzy/Substring.cc, htfuzzy/Speling.cc, + htfuzzy/Prefix.cc, htfuzzy/Exact.cc, htfuzzy/Endings.cc, + htfuzzy/Fuzzy.cc (getWords): Call HtStripPunctuation on input before + performing fuzzy matching. + +Thu Jul 8 21:28:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Add support for parsing <LINK> tags. + +Mon Jul 5 16:53:23 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/htdig.cc (main): Insert '*' instead of username/password + combination to hide credentials in process accounting. + +Sat Jul 3 17:35:52 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Transport.h(ConnectionWrite): Return the value from the + Connection::write call. + + * htdig/URLRef.h, htdig/URLRef.cc: Cleanup; make hopcount + default consistent with 7/3 change to DocumentRef.cc + + * htdig/Server.h, htdig/Server.cc, htdig/Retriever.cc: Cleanup and + fixes to match URLRef calling interface.
+ +Sat Jul 3 16:37:29 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.cc (do_tag): Fix <meta> robots parsing to allow + multiple directives to work correctly. Fixes PR#578, as provided + by Chris Liddiard <c.h.liddiard at qmw.ac.uk>. + +Sat Jul 3 00:47:51 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Makefile.in: Remove old SGMLEntities code. + +Sat Jul 3 00:26:55 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentRef.cc (Clear): Change default value of + docHopCount to 0 to fix several hopcount bugs. + + * htdig/Transport.h, htdig/Transport.cc: Changes to support URL + referers as well as authentication credentials. + + * htdig/HtHTTP.h, htdig/HtHTTP.cc(SetCredentials): Implement HTTP + Basic Authentication credentials. + (SetRequestCommand): Use Referer and Authorization headers if + supplied. + +Sun Jun 30 11:26:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it> + + * htdig/Transport.h: Inserted the method declarations regarding + the connection management. The code has been moved out from the + HtHTTP.h code. Also moved here the static variable 'debug'. + + * htdig/Transport.cc: Definition of the connection management code. + The code has been moved out from the HtHTTP.cc code. + + * htdig/HtHTTP.h: Eliminated the connection management code and the + static variable 'debug'. Inserted 'modification_time_is_now' as + a static variable, in order to respect the encapsulation principle. + + * htdig/HtHTTP.cc: Eliminated the connection management code and the + static variable 'debug' initialization. Inserted the + 'modification_time_is_now' initialization. + +Sun Jun 27 16:29:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HTML.h: Cleanup. + + * htcommon/defaults.cc: Added default for img_alt_factor for text + weighting on <IMG ALT="..."> tags. + + * htdig/Retriever.cc: Add slot for img_alt_factor.
+ + * htdig/HTML.cc (do_tag): Rewrite using Configuration class to + separate tag attributes. + (parse): Ignore final '>' in string passed to do_tag. + (do_tag): Index IMG ALT text. + +Fri Jun 25 17:58:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Transport.h: Fix virtual methods for Transport_Response to + have defaults. + + * htdig/HtHTTP.h: Fix class declaration of HtHTTP class to prevent + syntax error. Pointed out by Gabriele. + + * htdig/Transport.cc: Add (empty) ctor and dtor functions for + Transport_Response. + +Thu Jun 24 22:28:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc (main): Add support for form inputs + configdir and commondir as contributed by Herbert Martin Dietze + <herbert at fh-wedel.de>. + + * htsearch/Display.cc (createURL): If configdir and commondir are + defined, add them to URLs sent for other pages. + +Wed Jun 23 23:00:18 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HtHTTP.h, htdig/HtHTTP.cc: Make a subclass of Transport. + +Wed Jun 23 22:08:20 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Configuration.cc (Add): Handle single-quoted values for + attributes. + +Tue Jun 22 23:35:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Transport.h, htdig/Transport.cc: Virtual classes to handle + transport protocols such as HTTP, FTP, WAIS, gopher, etc. + + * htdig/Makefile.in: Make sure they're compiled (not that there's + much!) + + * htdig/HtHTTP.h: Add htdig.h to ensure config is defined. + +Mon Jun 21 14:33:10 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc(readHeader), htdig/HtHTTP.cc(ParseHeader): fix + handling of modification_time_is_now in readHeader, add similar code + to ParseHeader. + +Sun Jun 20 21:25:15 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.h: Add hop parameter to got_href + method. Defaults to 1. 
+ + * htdig/Retriever.cc(got_href): Use it instead of constant 1. + + * htdig/HTML.cc (do_tag): Use new hop parameter to keep the same + hopcount for frame, embed and object tags. + + * htdig/Makefile.in: Make sure HtHTTP.cc is compiled. + + * htdig/HtHTTP.cc (ctor): Add default value for _server to + prevent strange segmentation faults. + +Fri Jun 18 09:53:30 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/DocumentRef.h, htcommon/DocumentRef.cc(Clear, Deserialize): + add docHeadIsSet field, code for setting and getting it. + * htcommon/DocumentDB.cc(Add): only put out excerpt record if DocHead + is really set. + * htmerge/doc.cc(convertDocs): add missing else after code to delete + documents with no excerpts. + (All these changes fix the disappearing excerpts problem in 3.2.) + +Wed Jun 16 23:04:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Document.cc (UseProxy): Change http_proxy_exclude to an + escaped regex string. Allows for much more complicated rules. + +Wed Jun 16 16:04:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * Makefile.config.in: fix typo in name IMAGE_URL_PREFIX. + + * htdig/Retriever.cc(IsValidURL): change handling of valids to only + reject if list is not empty, give different error message. + +Wed Jun 16 14:40:56 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc(main): pass StringList args to setEscaped() + instead of unprocessed input[] char *'s. + + * htsearch/Display.cc(buildMatchList): cast score to (int) in maxScore + calculation, to avoid compiler warnings. + + * htdig/htdig.cc(main): change comparison on minimalFile to avoid + compiler warnings. + +Wed Jun 16 11:30:23 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/HtRegex.cc(setEscaped): Fix appending of substring to avoid + compiler warnings. + + * htlib/HtDateTime.cc(SettoNow): Strip out all the nonsense that + doesn't work, set Ht_t directly instead.
+ +Wed Jun 16 09:58:12 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * configure.in, configure, Makefile.config.in: Correct handling of + SEARCH_FORM variable, as Gabriele recommended. + +Wed Jun 16 09:32:06 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/cgi.h, htlib/cgi.cc(cgi & init), htsearch/htsearch.cc + (main & usage): allow a query string to be passed as an argument. + +Wed Jun 16 08:43:09 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/Makefile.in, htdig/Makefile.in, htfuzzy/Makefile.in, + htmerge/Makefile.in, htnotify/Makefile.in: Use standard $(bindir) + variable instead of $(BIN_DIR). Allows for standard configure flags + to set this. (Completes Geoff's change on May 15.) + +Tue Jun 15 14:31:50 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/PDF.cc(parseNonTextLine): move line that clears _parsedString, + so title cleared even if rejected. + + * htsearch/Display.cc(buildMatchList & sort): move maxScore calculation + from sort to buildMatchList, so it's done even if there's only 1 match. + +Mon Jun 14 15:01:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc(RetrieveHTTP): Show "Unknown host" message if + Connection::assign_server() fails (due to gethostbyname() failure). + +Mon Jun 14 13:52:34 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htcommon/defaults.cc, htsearch/Display.h, htsearch/Display.cc, + htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, + htdoc/hts_templates.html: add template_patterns attribute, to select + result templates based on URL patterns. + +Sun Jun 13 16:29:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (IsValidURL): Add valid_extension list, as + requested numerous times. + + * htcommon/defaults.cc: Add config attribute valid_extensions, + with default as empty. 
+ +Sat Jun 12 23:10:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentRef.h: Fix thinkos introduced in change earlier + today. Actually compiles correctly now. + +Sat Jun 12 22:37:22 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HtHTTP.cc (ParseHeader): Fix parsing to take empty headers + into account. Fixes PR#557. + + * htsearch/Display.h, htsearch/Display.cc (excerpt): Fix + declaration to refer to first as reference--ensures ANCHOR is + properly set. Fixes PR#541 as suggested by <pmb1 at york.ac.uk>. + + * htfuzzy/Endings.cc (getWords): Fixed PR#560 as suggested by + Steve Arlow <yorick at ClarkHill.com>. Solves problems with fuzzy + matching on words like -ness: witness, highness, likeness... Tries + to interpret words as root words before attempting stemming. + + * installdir/search.html (Match): Add Boolean to default search + form, as suggested by PR#561. + + * htlib/URL.cc (URL): Fix PR#566 by setting the correct length of + the string being matched. 'http://' is 7 characters... + +Sat Jun 12 19:06:36 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtZlibCodec.h, htlib/HtZlibCodec.cc: New files. Provide + general access to zlib compression routines when available. + + * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Remove + compression access and restore DocHead access through default + methods. Compression of excerpts will occur through the + HtZlibCodec classes and through the DocumentDB excerpt access. + +Sat Jun 12 15:25:08 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htmerge/docs.cc (convertDocs): Load excerpt from external + database before considering it empty. + +Sat Jun 12 14:41:54 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc (displayMatch): Added patch from Torsten + Neuer <tneuer at inwise.de> to fix PR# 554. 
+ + * htdig/HTML.cc (do_tag): Add parsing for <embed> and <object>, + including suggestions from Gilles as to condensing cases with + <img> parsing. + +Sat Jun 12 14:00:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/ExternalParser.cc (parse): Quote the filename before + passing it to the command-line to prevent shell escapes. Fixes PR#542. + +Fri Jun 11 15:59:10 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/URL.cc(removeIndex): use CompareWord instead of FindFirstWord, + to avoid substring matches. + +Wed Jun 2 15:51:00 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/URLTrans.cc(encodeURL): Fix to ensure that non-ASCII letters + get URL-encoded. + +Mon May 31 22:40:29 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentDB.cc(ReadExcerpt): Fix silly typos with methods, + thinko with docID. + (Add): Add the excerpt *before* the URL index is written. + + * htdig/Retriever.cc(isValidURL): Remove code restricting URLs to + relative and http://. + + * htdig/htdig.cc(main): Unlink the doc_excerpt file when doing an + initial dig. + (main): Fix silly typo with minimumFile. + + * htmerge/db.cc(mergeDB): Call DocumentDB::Open() with doc_excerpt for + consistency--doesn't actually do anything with it. + + * htmerge/docs.cc(convertDocs): Ditto. Also don't delete a + document simply because it has an empty DocHead. Excerpts are now + stored in a separate database! + + * htmerge/htmerge.h: Call mergeDB and convertDocs with + doc_excerpt parameter. + + * htmerge/htmerge.cc(main): Ditto. + + * htsearch/Display.h: Call ctor with all three doc db filenames. + + * htsearch/Display.cc(Display): Call DocumentDB::Open with above. + (excerpt): Retrieve the excerpt from the excerpt database. + + * htsearch/htsearch.cc: Call Display::Display with all three doc + db filenames. 
+ +Mon May 31 15:08:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentDB.h: Add new method ReadExcerpt to read the + excerpt from the separate (new) excerpt database. Change Open() + and Read() methods to account for this new database. + + * htcommon/DocumentDB.cc (Open): Open the excerpt database too. + (Read): Ditto. + (Close): Close it if it exists. + (ReadExcerpt): Explicitly read the DocHead of this DocumentRef. + (Add): Make sure DocHeads go into the excerpt database. + (Delete): Make sure we delete the associated excerpt too. + (CreateSearchDB): Make sure we grab the excerpt from the database. + + * htcommon/DocumentRef.cc(Serialize): Don't serialize the DocHead + field, this is done in the DocumentDB code. + + * htcommon/defaults.cc(modification_time_is_now): Set to true to + avoid problems with not setting dates when no Last-Modified: + header appears. + (doc_excerpt): Add new attribute for the filename of the excerpt + database. + + * htdig/HtHTTP.h: Remove incorrect virtual declarations from + Request and EstablishConnection methods. Assign void return value + to ResetStatistics since it doesn't return a value. + + * htdig/htdig.cc (main): Add new "minimal" flag '-m' to only index + the URLs in the supplied file. Sets hopcount to ignore links. + +Sun May 30 19:36:15 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at> + + * htlib/URL.cc (normalizePath): Fix bug that caused endless loops + and core dumps when normalizing URLs with more than one of + ( "/../" | "/./" | "//" | "%7E" ) + + * htlib/HtDateTime.cc (Httimegm): Call Httimegm in timegm.c unless + HAVE_TIMEGM. + +Wed May 26 23:15:46 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htmerge/db.cc (mergeDB): Add patch contributed by Roman Dimov + <roman at twist.mark-itt.ru> to fix problems with confusing docIDs, + resulting in documents in main db removed when the corresponding + DocID was supposed to be removed from the merged db. 
+ +Wed May 26 11:30:22 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.h, htsearch/Display.cc, htsearch/htsearch.cc: + Switch restrict and excludes to use HtRegex instead of StringMatch. + + * htdig/htdig.cc (main): Fix typo clobbering setting of + excludes. Obviously fixes problems with badquerystr and excludes! + + * htdig/HtHTTP.cc (ParseHeader): Change parsing to skip extra + whitespace, as in 5/19 Document.cc(readHeader) change. + +Wed May 19 22:17:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/HtHTTP.cc, htdig/HtHTTP.h: Add new files, contributed by + Gabriele. A start at an HTTP/1.1 implementation. + + * htdig/Document.cc (readHeader): Fix change of 5/16 to actually + work! :-) + + * htsearch/Display.cc (expandVariables): Change end-of-expansion + test to include states 2 and 5 to ensure templates ending in } are + still properly expanded, as suggested by Gilles. + +Mon May 17 14:31:31 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegex.cc (setEscaped): Use full list of characters to + escape as suggested by Gilles. + +Sun May 16 17:27:51 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Document.cc (readHeader): Since multiple whitespace + characters are allowed after headers, don't use strtok. + (readHeader): We no longer pretend to parse Word, PostScript, or + PDF files internally. + (getParsable): Don't generate PostScript or PDF objects since we + no longer recommend using them. + +Sun May 16 17:07:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegex.cc (setEscaped): Ensure escaping does not loop + beyond the end of a string. + + * htdig/Retriever.cc (IsValidURL): Fix badquerystr parsing to use + HtRegex as expected. (Oops!) + + * htdig/HTML.cc (parse): Use HtSGMLCodec during parsing, rather + than encoding the whole document at the beginning. More consistent + with previous use of SGMLEntities. 
+ +Sat May 15 12:57:40 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/URL.cc (normalizePath): Remove extra (useless) variable + declarations. + + * htlib/htString.h, htlib/String.cc: Add new method Nth to solve + problems with (String *)->[]. + + * htlib/HtRegex.h, htlib/HtRegex.cc: Added new method + setEscaped(StringList) to produce a pattern that joins the + (possibly escaped) strings with '|'. Strings enclosed in [] are + not escaped; the brackets are removed and the contents are used as + a raw regex. + + * htdig/htdig.h: Use HtRegex instead of StringMatch for limiting + by default. + + * htdig/Retriever.cc: As above. + + * htdig/htdig.cc(main): As above. Use setEscaped to set limits + correctly (i.e. in a backwards-compatible way). + +Sat May 15 11:24:26 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Speling.h, htfuzzy/Speling.cc: New files for simple + spelling correction. Currently limited to transposition and added + character errors. Missing character errors to be added soon. + + * htfuzzy/Makefile.in: Compile it. + + * htfuzzy/Fuzzy.cc (getFuzzyByName): Use it. + + * htcommon/defaults.cc: Add new option minimum_speling_length for + the shortest query word to receive speling fuzzy + modifications. Should prevent problems with valid words generating + unrelated "corrections" of words. Default is 5 chars. + +Sat May 15 11:18:27 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Fuzzy.cc (getWords): Ensure word is not an empty or null + string. + + * htfuzzy/Metaphone.cc (generateKey): Ditto. Should solve PR#514. + + * htdig/Document.cc (Reset): Do not use modification_time_is_now + attribute. Simply reset modtime to 0, time is set elsewhere. + + * Makefile.config.in: Add options from separate CONFIG files. + + * configure.in, configure: Add configure-level switches for + --with-image-url-prefix= and --with-search-form=. Do not generate + CONFIG file (hopefully to be phased out soon).
+ + * */Makefile.in: Make linking CONFIG-dependent files depend on + Makefile.config, not CONFIG. + + * Makefile.in: Use standard $(bindir) variable instead of + $(BIN_DIR). Allows for standard configure flags to set this. + +Tue May 11 11:15:08 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtDateTime.h, htlib/HtDateTime.cc: Updates from Gabriele, + fixing SetToNow() and adding GetDiff to return the difference in + time_t between two objects. + + * htdig/Retriever.cc (Need2Get): Add patch from Warren Jones + <wjones at tc.fluke.com> to keep track of inodes on local files to + eliminate duplicates. Hopefully this will serve as a first try at + a signature method for HTTP as well. + +Tue May 4 20:20:40 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/Regex.h, htfuzzy/Regex.cc: Add new regex fuzzy + algorithm, based on Substring and Prefix. + + * htfuzzy/Fuzzy.cc (getFuzzyByName): Add it. + + * htfuzzy/Makefile.in: Compile it. + + * htcommon/defaults.cc: Add new attribute regex_max_words, same + concept as substring_max_words. + + * htfuzzy/Exact.cc, htfuzzy/Substring.cc, htfuzzy/Prefix.cc: + Define names attribute for debugging purposes. + + * installdir/htdig.conf: Fix the comments for search_algorithm to + refer to all the current possibilities. + + * htlib/HtRegex.cc (match): Slight cleanup of return handling. + +Tue May 4 15:28:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc (reportError): Add e-mail of maintainer to + error message. Should help direct people to the correct place. + + * htdig/Retriever.cc (IsValidURL): Lowercase all extensions from + bad_extensions as well as all extensions used in + comparisons. Ensures we're using case-insensitive matching. + +Mon May 3 23:20:22 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/Retriever.cc (IsValidURL): Fix typo with #else statement + for REGEX.
+ + * htdig/htdig.cc: Add conditionals for REGEX to use HtRegex + instead of StringMatch methods when defined. + + * htlib/HtDateTime.h: Update to remove definitions of true and + false, established by May 2 change in + include/htconfig.h.in as contributed by Gabriele. + + * htlib/HtDateTime.cc: Replace call to internal mktime function with + call to Httimegm in timegm.c, contributed by Leo. + + * htlib/timegm.c: Declare my_mktime_gmtime_r to prevent compiler + errors with incompatible gmtime structures, contributed by Leo. + + * configure.in: Rearrange date/time checks for clarity. + + * configure: Regenerate using autoconf. + + * include/htconfig.in: Add HAVE_STRFTIME flag. + +Sun May 2 18:49:04 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at> + + * configure.in, include/htconfig.h.in: Added a configure test for + the availability of the bool type. + +Fri Apr 30 20:00:09 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtDateTime.h, htlib/HtDateTime.cc: Update with new + versions sent by Gabriele. + +Fri Apr 30 19:30:42 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtRegex.h, htlib/HtRegex.cc: New class, contributed by + Peter D. Gray <pdg at draci.its.uow.edu.au> as a small wrapper for + system regex calls. + + * htlib/Makefile.in: Build it. + + * htdig/htdig.h: Use it if REGEX is defined. + + * htdig/htdig.cc: Ditto. + + * htdig/Retriever.cc: Ditto. + + * htsearch/Display.cc(generateStars): Remove extra newline after + STARSRIGHT and STARSLEFT variables, noted by Torsten Neuer + <tneuer at inwise.de>. + +Fri Apr 30 18:52:56 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at> + + * htlib/URL.cc(ServerAlias): port for server_aliases entries now + defaults to 80 if omitted. + +Wed Apr 28 19:57:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtDateTime.h, htlib/HtDateTime.cc: New class, contributed + by Gabriele. + + * htlib/Makefile.in: Compile it. + + * README: Update message from 3.1.0 (oops!)
+ to 3.2.0, remove rx + directory. + + * installdir/htdig.conf: Add example of no_excerpt_show_top + attribute in line with most users' expectations. + + * contrib/README: Mention the contributed section of the website. + + * Makefile.in: Ignore mailarchive directory--now removed from CVS. + +Wed Apr 28 10:46:31 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htmerge/db.cc(mergeDB): fix a few errors in how the merge index + name is obtained. + +Tue Apr 27 23:00:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * Makefile.config.in: Remove now-useless LIBDIRS variable. + + * mailarchive/Split.java, mailarchive/htdig: Remove ancient + mailarchive stuff. + +Tue Apr 27 18:01:52 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(setupImages): Remove code setting URLimage to + a bogus pattern (remnant left over after merge). + +Tue Apr 27 16:43:08 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc(RetrieveHTTP): Show "Unable to build connection" + message at lower debug level. + +Tue Apr 27 11:24:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.h: Remove sort, compare functions re-introduced + in merge. Moved to ResultMatch by Hans-Peter's April 19th changes. + + * htsearch/Display.cc: Remove bogus call to ResultMatch::setRef, + removed by Hans-Peter's April 19th changes. + +Sat Apr 24 21:08:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * Merge in changes from 3.1.2 (see below). + + * htcommon/WordList.cc: Change valid_word to use iscntrl(). + + * htdig/Plaintext.cc: Remove CVS Log. + + * htdig/Retriever.cc: Fix ancient bug with empty excludes list. + + * htlib/List.cc: Remove CVS Log, use more succinct test for + out-of-bounds. + + * htsearch/Display.cc: Fix logic with starPatterns, only show top + of META description. + + * htsearch/Display.h: Introduce headers needed for sort functionality.
+ + * installdir/htdig.conf: Add example max_doc_size attribute as + well as example for including start_url from a file. + + * htdoc/ChangeLog, htdoc/RELEASE.html, htdoc/FAQ.html, + htdoc/where.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, + htdoc/uses.html, htdoc/contents.html, htdoc/mailarchive.html: + Merge in documentation updates from 3.1.2. + +Sat Apr 24 15:18:45 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htsearch/Display.cc (sort): Return immediately if <= 1 items to + sort. + +Mon Apr 19 00:53:06 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htsearch/ResultMatch.h (create): New. All (the only) ctor + caller changed to use this. + (setRef, getRef): Removed. Callers changed to use nearby data. + (incomplete): Removed. + (setIncompleteScore): Renamed to... + (setScore): ...this. All callers changed. + (setSortType): New. + (getTitle, getTime, setTitle, setTime, getSortFun): New virtual + functions. + (enum SortType): Moved from Display, private. + (mySortType): New static member. + + * htsearch/ResultMatch.cc (mySortType): Define static member + variable. + (getScore): Remove handling of "incomplete". Moved to ResultMatch.h + (getTitle, getTime, setTitle, setTime): New dummy functions. + (class ScoreMatch, class TimeMatch, class IDMatch, class + TitleMatch): Derived classes with compare functions (from Display) + and extra sort-method-related members, as needed. + (setSortType): New, mostly moved from Display. + (create): New. + + * htsearch/Display.h: Changed first argument from ResultMatch * to + DocumentRef *. + (compare, compareTime, compareID, compareTitle, enum SortType, + sortType): Removed. + + * htsearch/Display.cc (display): Call ResultMatch::setSortType and + output syntax error page for invalid sort methods. + (displayMatch): Change first argument from ResultMatch * to + DocumentRef *ref. All callers changed. + (buildMatchList): Remove call to sortType and typ variable. + Always call (ResultMatch::)setTime and setTitle. 
Remove extra + call to setID. + (sort): Call (ResultMatch::)getSortFun for qsort compare function. + (compare, compareTime, compareID, compareTitle, sortType): Removed. + +Wed Apr 14 21:21:35 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at> + + * htlib/regex.c: fixed compile problem with AIX xlc compiler + + * htlib/HtHeap.h: fixed compile problem with AIX xlc compiler (bool) + + * htlib/HtVector.h: ditto + + * htsearch/Display.cc: fixed typo + +Wed Apr 14 00:17:06 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.h: Add compareID for sorting results by DocID. + + * htsearch/Display.cc: As above. + +Tue Apr 13 23:50:28 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/defaults.cc: Add new config option use_doc_date to use + document meta information for the DocTime() field. + + * htdig/HTML.cc(do_tag): Call Retriever::got_time if use_doc_date + is set and we run across a META date tag. + + * htdig/Retriever.h, htdig/Retriver.cc: Add new got_date + function. When called, sets the DocTime field of the DocumentRef + after parsing is completed. Currently assumes ISO 8601 format for + the date tag. + +Sun Apr 11 12:51:39 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htsearch/Display.cc (buildMatchList): Delete thisRef if excluded + by URL. Call setRef(NULL), not setRef(thisRef). + +Wed Apr 7 19:35:42 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc(usage): Remove bogus -w flag. + +Thu Apr 1 12:05:11 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/htsearch.cc(main): Apply Gabriele's patch to avoid using an + invalid matchesperpage CGI input variable. + + * htsearch/Display.cc(display) & (setVariables): Correct any invalid + values for matches_per_page attribute to avoid div. by 0 error. + +Wed Mar 31 15:19:25 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htfuzzy/Synonym.cc: Fix previous fix of minor memory leak. 
+ (db pointer wasn't properly set) + +Mon Mar 29 10:31:09 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/Display.cc(excerpt): Added patch from Gabriele to + improve display of excerpts--show top of description always, + otherwise try to find the excerpt. + +Sun Mar 28 19:45:02 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htlib/HtWordType.h (HtIsWordChar): Avoid matching the terminating + NUL (0) character when using strchr. + (HtIsStrictWordChar): Ditto. + + * htdig/ExternalParser.cc (parse): Before got_href call, set + hopcount of URL to that of base plus 1. + Add URL to external parser error output. + + * htlib/URL.cc (URL(char *ref, URL &parent) ): Move constructURL + call inside previous else-clause. + (parse): Reset _normal, _signature, _user initially. + Commence parsing, even if no "//" is found. Do not set _normal + here. + (normalizePath): Call removeIndex at the end. + + * htcommon/WordRecord.h (WORD_RECORD_COMPRESSED_FORMAT) + [!NO_WORD_COUNT]: Change to "cu4". + + * htlib/HtPack.cc (htPack): Correct handling at end of code-string + and end of encoding-byte. Add code 'c' for often-1 unsigned ints. + (htUnpack): Add handling of code 'c'. + +Thu Mar 25 12:18:05 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * installdir/long.html, installdir/short.html: Remove backslashes + before quotes in HTML versions of the builtin templates. + + * Makefile.in: Add long.html & short.html to COMMONHTML list, so + they get installed in common_dir. + +Thu Mar 25 11:56:50 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(displayMatch), htcommon/defaults.cc, + htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Add date_format attribute suggested by Marc Pohl. + +Thu Mar 25 09:46:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(displayMatch): Avoid segfault when DocAnchors + list has too few entries for current anchor number.
+ +Tue Mar 23 15:08:40 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(displayMatch): Fix problem when documents + did not have descriptions. + +Tue Mar 23 14:17:14 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/PDF.cc(parseString): Use minimum_word_length instead of + hardcoded constant. + +Tue Mar 23 14:02:40 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc: Fix bug where noindex_start was empty, allow case- + insensitive matching of noindex_start & noindex_end. + + * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: + Fix inconsistencies in documentation for noindex_start & noindex_end. + +Tue Mar 23 14:01:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc: Add check for <a href=...> tag that is missing a + closing </a> tag, terminating it at the next href. + +Tue Mar 23 13:57:35 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Document.cc: Fix check of Content-type header in readHeader(), + correcting bug introduced Jan 10 (for PR#91), and check against + allowed external parsers. + +Tue Mar 23 13:54:35 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc: More lenient comment parsing, allows extra dashes. + +Tue Mar 23 12:22:53 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htlib/Configuration.cc(Add): Fix function to avoid infinite loop + on some systems, which don't allow all the letters in isalnum() that + isalpha() does, e.g. accented ones. + + * htdig/HTML.cc: Fix three reported bugs about inconsistent + handling of space and punctuation in title, href description & head. + Now makes a distinction between tags that cause word breaks and those + that don't, and which of the latter add space. + +Tue Mar 23 12:15:48 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/Plaintext.cc(parse): Use minimum_word_length instead of + hardcoded constant.
+ +Tue Mar 23 12:11:04 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htmerge/words.cc(mergeWords): Fix to prevent description text + words from clobbering anchor number of merged anchor text words. + +Tue Mar 23 12:02:00 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/Display.cc(generateStars): Add in support for use_star_image + which was lost when template support was put in way back when. + +Tue Mar 23 11:47:52 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * Makefile.in: add missing ';' in for loops, between fi & done + +Mon Mar 22 16:06:15 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htdig/HTML.cc: Check for presence of more than one <title> tag. + +Mon Mar 22 15:32:15 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/parse_doc.pl: Fix handling of minimum word length. + +Sun Mar 21 15:19:00 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htlib/HtPack.cc (htPack): New. + * htlib/HtPack.h: New. + * htsearch/parser.cc (perform_push): Unpack WordRecords using + htUnpack. + * htsearch/htsearch.h: Add "debug" declaration. + * htmerge/words.cc (mergeWords): Pack WordRecords using htPack. + * htlib/Makefile.in (OBJS): Add HtPack.o + * htcommon/WordRecord.h: Add WORD_RECORD_COMPRESSED_FORMAT + + * htdig/HTML.cc (parse): Keep contents in String variable + textified_contents while using its "char *". + + * htsearch/Display.cc (excerpt): Similar for head_string. + +Thu Mar 18 20:01:24 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * installdir/long.html, installdir/short.html: Write out HTML + versions of the builtin templates. + + * installdir/htdig.conf: Add commented-out template_map and + template_name attributes to use the on-disk versions. + +Tue Mar 16 03:06:06 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htcommon/DocumentDB.cc (Delete): Fix bad parameter to Get: use + key, not DocID. 
+ +Tue Mar 16 01:50:16 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htlib/HtWordType.h (class HtWordType): New. + * htlib/HtWordType.cc: New. + * htlib/Makefile.in (OBJS): Add HtWordType.o + + * htdoc/attrs.html: Document attribute extra_word_characters. + * htdoc/cf_byprog.html: Ditto. + * htdoc/cf_byname.html: Ditto. + + * htcommon/defaults.cc (defaults): Add extra_word_characters. + + * htsearch/htsearch.h: Lose spurious extern declaration of unused + variable valid_punctuation. + * htsearch/htsearch.cc (main): Call HtWordType::Initialize. + (setupWords): Use HtIsWordChar, HtIsStrictWordChar and + HtStripPunctuation. Do not read valid_punctuation. + + * htsearch/Display.cc (excerpt): Use HtIsStrictWordChar. + + * htlib/StringMatch.cc (FindFirstWord): Ditto. + (CompareWord): Ditto. + + * htdig/htdig.cc (main): Call HtWordType::Initialize. + + * htdig/Retriever.h (class Retriever): Lose member + valid_punctuation. + * htdig/Retriever.cc (Retriever): Lose its initialization. + + * htdig/Postscript.h (class Postscript): Lose member + valid_punctuation. + * htdig/Postscript.cc (Postscript): Lose its initialization. + (flush_word): Use HtStripPunctuation. + (parse_string): Use HtIsWordChar, + HtIsStrictWordChar and HtStripPunctuation. + + * htdig/Parsable.h (class Parsable): Lose member + valid_punctuation. + * htdig/Parsable.cc (Parsable): Lose its initialization. + + * htcommon/WordList.cc (valid_word): Use HtIsStrictWordChar. + (BadWordFile): Use HtStripPunctuation. Do not read + valid_punctuation. + + * htcommon/DocumentRef.cc (AddDescription): Use HtIsWordChar, + HtIsStrictWordChar and HtStripPunctuation. Do not read + valid_punctuation. + + * htdig/PDF.cc (parseString): Similar. + + * htdig/HTML.cc (parse): Similar. + + * htdig/Plaintext.cc (parse): Similar. + +Sun Mar 14 14:04:31 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Makefile.in: Add HtSGMLEntites.o to OBJS.
+ +Sat Mar 13 21:29:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htcommon/DocumentDB.cc(Open, Read): Switch to DB_HASH for faster + access. Most important for very quick URL lookups! + + * htcommon/DocumentRef.cc(AddDescription): Check to see that + description isn't a null string or contains only whitespace before + doing anything. + + * htlib/HtSGMLCodec.h, htlib/HtSGMLCodec.cc: Add new class to + convert between SGML entities and high-bit characters. + + * htdig/HTML.cc(parse): Use it instead of SGMLEntities. + + * htsearch/Display.cc(excerpt): Use HtSGMLCodec to convert *back* + to SGML entities before displaying. + + * htlib/HtHeap.cc: Cleaned up comments, use more efficient + procedure to build from a vector. + + * htlib/HtWordCodec.cc(HtWordCodec): Fix bug with constructing from + uninitialized variables! + + * htlib/URL.h, htlib/URL.cc: Initial support for multiple schemes and + user@host URLs. + + * htlib/List.cc(Nth): Check for out-of-bounds requests before + doing anything. + +Fri Mar 12 00:31:03 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htlib/mktime.c (__mon_yday): Correct size to number of + initializers (2). + + * htsearch/htsearch.cc (main): Remove doc_index handling. + + * htsearch/ResultMatch.h (setURL): Change to setID, use int. + All callers changed. + (getURL): Change to getID. + All callers changed. + (String url): Change to "int id". + + * htsearch/Display.h: (Display): Second parameter removed. + (docIndex) removed. + + * htsearch/Display.cc (Display, ~Display): Do not handle + docIndex. + (display): Use DocumentDB::operator [](int), not + DocumentDB::operator [] (char *). + (buildMatchList): Changed to handle ResultMatch as DocID int, + instead of URL string: use DocumentDB::operator [](int), not + DocumentDB::operator [] (char *). Get DocumentRef directly, then + filter the URL by includeURL(). + + * htnotify/htnotify.cc (main): Use DocIDs(), not DocURLs(). + Handle the change from String * to IntObject *.
+ + * htmerge/htmerge.cc (main): Do not delete doc_index. + + * htmerge/docs.cc (convertDocs): Test doc_index access as + read-only. Pass as parameter for docdb, do not handle separately. + + * htmerge/docs.cc (convertDocs): Add debug messages about cause + when deleting documents. If verbose > 1, write id/URL for every URL. + + * htmerge/db.cc (mergeDB): Handle doc_index, test accessibility. + + * htlib/IntObject.h (class IntObject): Add int-constructor. + + * htdoc/attrs.html (doc_index): Say that mapping is from document + URLs to numbers. + (doc_db): Say that indexing is on document number. + + * htdoc/cf_byprog.html (doc_index): Move from htsearch to htdig + entry. + + * htdig/htdig.cc (main): Add .work suffix to doc_index too. + Unlink doc_index if initial. + + * htcommon/DocumentDB.h (Open): New second argument. + (Read): New second argument, default to 0. + (operator [](int)): New. + (Exists(char *), Delete(char *)): Change to int parameter. + (DocIDs, i_dbf): New. + + * htcommon/DocumentDB.cc (operator [](int)): New. + (Exists(char *), Delete(char *)): Changed to DocID int parameter. + All callers changed. + (URLs): Assume keys are ok without probing for documents + with each key. + (DocIDs): New. + (Open): Take an index database file name as second argument. + All callers changed. + (Read): Similar, accept 0. + (all): Change to index on DocID. + +Wed Mar 10 02:25:24 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htdoc/attrs.html (template_name): Typo; used by htsearch, not + htdig. + +Mon Mar 8 13:30:44 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htdig/Retriever.cc (got_href): Check if the ref is for the + current document before adding it to the db. + +Mon Mar 8 01:36:38 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htlib/DB2_db.cc: Remove errno. + * htlib/DB2_hash.cc: Ditto. 
+ +Sun Mar 7 20:50:37 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htfuzzy/EndingsDB.cc(createDB): Use link and unlink to move, + rather than a non-portable system call. + + * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Fix #ifdef + problems with zlib. + +Sun Mar 7 09:39:37 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/timegm.c: Fix problems compiling on libc5 systems noted by + Hans-Peter. + + * htlib/Makefile.in, Makefile.in, Makefile.config.in: Use regex.c + instead of rx. + + * htfuzzy/EndingsDB.cc: Ditto. + + * configure.in, configure: Don't bother to config rx directory. + +Fri Mar 5 08:09:20 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * contrib/parse_doc.pl: uses pdftotext to handle PDF files, + generates a head record with punctuation intact, extra checks + for file "wrappers" & check for MS Word signature (no longer + defaults to catdoc), strip extra punct. from start & end of words, + rehyphenate text from PDFs. + +Tue Mar 2 23:18:20 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htdig/htdig.cc: Renamed main.cc for consistency with other programs. + + * htlib/DB2_hash.h, htlib/DB2_hash.cc: Added interface to Berkeley + hash database format. + + * htlib/Makefile.in: Use them! + + * htlib/Database.h: Define database types, allowing a choice + between different formats. + + * htlib/Database.cc(getDatabaseInstance): Use passed type to pick + between subclasses. Currently only uses Hash and B-Tree formats of + Berkeley DB. + + * htcommon/DocumentDB.cc, htfuzzy/Endings.cc, + htfuzzy/EndingsDB.cc, htfuzzy/Fuzzy.cc, htfuzzy/Prefix.cc, + htfuzzy/Substring.cc, htfuzzy/Synonym.cc, htfuzzy/htfuzzy.cc, + htmerge/docs.cc, htmerge/words.cc, htsearch/Display.cc, + htsearch/htsearch.cc: Use new form of getDatabaseInstance(), + currently with DB_BTREE option (for compatibility).
+ +Mon Mar 1 22:53:37 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/regex.c, htlib/strptime.c: Import new versions from + glibc. + + * htlib/Makefile.in, htlib/mktime.c, htlib/timegm.c, htlib/lib.h: + Changes to use glibc timegm() function instead of buggy mytimegm(). + + * htdig/Document.cc(getdate): Use it. + +Tue Mar 2 02:35:50 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * attrs.html: Rephrase and clarify entry for url_part_aliases. + +Sun Feb 28 23:25:40 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htlib/HtURLCodec.cc (~HtURLCodec): Add missing deletion of + myWordCodec. + +Fri Feb 26 19:03:58 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure, configure.in: Fix typo on timegm test. + + * htlib/mytimegm.cc: Fix Y2K problems. + +Wed Feb 24 21:09:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc(main): Remember to delete the parser! + + * htlib/String.cc(String(char *s, int len)): Remove redundant copy. + + * htsearch/Display.cc(display): Free DocumentRef memory after + displaying them. + (displayMatch): Fix memory leak when documents did not have anchors. + +Wed Feb 24 15:18:26 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Configuration.cc(Add): Fix small leak in locale code. + + * htlib/String.cc: Fix up code to be cleaner with memory + allocation, inline next_power_of_2. + +Mon Feb 22 22:13:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/String.cc, htlib/htString.h: Fix some memory leaks. + +Mon Feb 22 08:52:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/Dictionary.h, htlib/Dictionary.cc(hashCode): Check if key + can be converted to an integer using strtol. If so, use the + integer as the hash code. + + * htlib/HtVector.h, htlib/HtVector.cc: Implement Release() method + and make sure delete calls are done properly. + + * htsearch/ResultList.h, htsearch/ResultList.cc(elements): Use HtVector + instead of List.
+ + * htsearch/parser.cc: Ditto. + +Sun Feb 21 16:13:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtHeap.h, htlib/HtHeap.cc: Add new class. + + * htlib/Makefile.in: Compile it. + + * htlib/HtVector.h, htlib/HtVector.cc: Add Assign() to assign to + elements of vectors. + +Sun Feb 21 14:45:26 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htsearch/htsearch.cc: Add patch from Jerome Alet <alet at unice.fr> + to allow '.' in config field but NOT './' for security reasons. + + * htdig/HTML.cc: Add patch from Gabriele to ensure META + descriptions are parsed, even if 'description' is added to the + keyword list. + +Sun Feb 21 14:43:44 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca> + + * htsearch/parser.h, htsearch/parser.cc: Clean up patch made for + error messages, made on Feb 16. + +Thu Feb 18 20:19:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * htlib/HtVector.h, htlib/HtVector.cc: Added new Vector class. + + * htlib/Makefile.in: Compile it. + + * htlib/strptime.c: Add new version from glibc-2.1, replacing + strptime.cc. + + * htdig/Document.cc: Use it. + + * htlib/regex.h, htlib/regex.c: Add new files from glibc-2.1. + + * htlib/mktime.c: Update from glibc-2.1. + +Wed Feb 17 23:44:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu> + + * configure.in, configure, aclocal.m4: Add autoconf macro to + detect syntax of makefile includes. + + * Makefile.in, Makefile.config.in, */Makefile.in: Change include + syntax to use it. + +Wed Feb 17 12:36:42 1999 Hans-Peter Nilsson <hp at bitrange.com> + + * htcommon/defaults.cc (defaults): locale: change to "C". + +Local Variables: + add-log-time-format: current-time-string +End: