Diffstat (limited to 'debian/htdig/htdig-3.2.0b6/htdoc/RELEASE.html')
-rw-r--r-- | debian/htdig/htdig-3.2.0b6/htdoc/RELEASE.html | 1542 |
1 files changed, 1542 insertions, 0 deletions
diff --git a/debian/htdig/htdig-3.2.0b6/htdoc/RELEASE.html b/debian/htdig/htdig-3.2.0b6/htdoc/RELEASE.html new file mode 100644 index 00000000..5caf2b79 --- /dev/null +++ b/debian/htdig/htdig-3.2.0b6/htdoc/RELEASE.html @@ -0,0 +1,1542 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> +<html> + <head> + <title> + ht://Dig: Release notes + </title> + </head> + <body bgcolor="#eef7ff"> + <h1> + Release notes + </h1> + <p> + ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br> + Please see the file <a href="COPYING">COPYING</a> for + license information. + </p> + <hr size="4" noshade> + <p> + These are notes that go with each release of ht://Dig. There + is also a <a href="ChangeLog">ChangeLog</a> file which has + more details on the code changes. + </p> + + <p> + <strong>Release notes for htdig-3.2.0b6</strong> 20 Jun 2004<br> + The next beta release of ht://Dig, 3.2.0b6, is now available. + It fixes several bugs from 3.2.0b5, and runs somewhat faster, + although still much slower than 3.1.6. (No significant speed + improvements are expected in the near future, although we are + working on it.) Calling this release a "beta" simply means + that exhaustive testing, especially on non-Linux platforms, is + not yet complete. However, we consider it stable enough for + most production use. + </p> + + <p> + As with 3.2.0b5, if you are upgrading + from a previous version, you should read the <a + href="upgrade.html">upgrade guide</a> first. 
+ </p> + Bug fixes: + <ul> + <li>Correctly handle empty <code>disallow</code> entries in + robots.txt</li> + <li>No longer compile regular expressions for + every URL (improves performance)</li> + <li>Allow compressed databases on Cygwin</li> + <li>Fixed bugs in phrase searching</li> + <li>Improved parsing of the configuration file</li> + <li>bin/rundig -a handles multiple database directories</li> + <li>Ellipsis displayed correctly by htsearch</li> + <li>Allow '-' argument to '-m' ('minimal') runtime option to + htdig</li> + <li>Check validity of first URL from each server</li> + <li>No longer ignore empty configuration attributes</li> + <li>fixed bug in handling 'http_proxy', 'http_proxy_authorization', + 'authorization' attributes</li> + <li>remove stale md5_db if '-i' specified</li> + <li>Make 'server_alias' case insensitive</li> + <li>fixed bugs with zlib</li> + <li>Allow &euro; HTML entity</li> + <li>fixed other minor bugs</li> + </ul> + New features: + <ul> + <li>added <a + href="attrs.html#allow_space_in_url">allow_space_in_url</a> + attribute: if set to true, htdig will handle URLs that + contain embedded spaces</li> + <li>added <a + href="attrs.html#store_phrases">store_phrases</a> attribute: + if it is false, htdig only stores the first occurrence + of each word in a document</li> + <li>added an improved version of RTF2HTML into the + contrib section</li> + <li>added <a href="http://www.openoffice.org/">OpenOffice.org</a> + support to doc2html in contrib section</li> + <li>improved date factor formula</li> + <li>improved tests</li> + <li>improved documentation</li> + <li>added man pages</li> + </ul> + + <p> + <strong>Release notes for htdig-3.2.0b5</strong> 10 Nov 2003<br> + This version was slated to be 3.2.0rc1, but some final testing + is still required. It primarily fixes many bugs in 3.2.0b3, with + some limited new functionality. 
+ As with 3.2.0b1 and 3.2.0b2, if you are upgrading + from a previous version, you should read the <a + href="upgrade.html">upgrade guide</a> first. + </p> + <ul> + <li>Fixed database bugs. Introduced zlib compression to replace + buggy internal compression.</li> + <li>Forward-ported functionality from 3.1.6 + (description_meta_tag_names, use_doc_date, ignore_alt_text, + ignore_dead_servers, boolean_keywords, boolean_syntax_errors, + multimatch_factor, translate_latin1)</li> + <li>Fixed bugs in phrase searching</li> + <li>Fixed compile problems due to deprecated C++ includes</li> + <li>Fixed bugs handling double slashes in URLs</li> + <li>Suppress display of matches with weight zero</li> + <li>Fixed bugs in nesting of tags which turn off indexing</li> + </ul> + <ul> + <li>Added Native Win32 support</li> + <li>Added http_proxy_authorization attribute</li> + <li>Improved networking code, with improved cookie handling and + accept_language support</li> + <li>Implemented field-restricted searches (e.g. title:word)</li> + <li>Handle noindex_start/noindex_end as string lists</li> + <li>Implemented external converters, + text/html->text/html-internal</li> + <li>Improved support for MIME types</li> + <li>Changed licence to LGPL from GPL</li> + </ul> + + <p> + <strong>Release notes for htdig-3.2.0b4</strong><br> + This beta was never issued. + </p> + + <p> + <strong>Release notes for htdig-3.2.0b3</strong> 22 Feb 2001<br> + This version is still marked beta because it has still only + received limited testing and there are still revisions pending + for the 3.2 releases. However, it adds more functionality and + should address all serious bugs in the 3.2.0b2 release. + As with 3.2.0b1 and 3.2.0b2, if you are upgrading + from a previous version, you should read the <a + href="upgrade.html">upgrade guide</a> first. + </p> + <p> + <strong>Please note</strong> if you are updating from a prior + release (3.1 or 3.2), the htmerge program has changed syntax as noted + below. 
You will probably want to change your behavior to call + htpurge instead of htmerge after htdig as noted below. + </p> + <ul> + <li>Fixed several non-exploitable bugs in handling external + parsers or transport agents.</li> + <li>Fix bug where changes in the robots.txt would be + ignored. If a URL was indexed and later the robots.txt + changed to forbid it, the URL would be checked anyway.</li> + <li>Fixed scoring bugs introduced in 3.2.0b2.</li> + <li>Fixed a non-exploitable security issue where content-type + headers were passed incorrectly to external parsers or converters.</li> + <li>Fixed bugs in the accents fuzzy algorithm, cutting down + on the size of the accent database.</li> + <li>Fixed a bug where duplicate documents would be generated when + merging a database with itself.</li> + <li>Fixed a bug in the new regex handling for indexing limits + where large patterns could fail and would be silently ignored.</li> + <li>Fixed minor bugs with the HTTP/1.1 implementation.</li> + <li>Fix a bug where an extra config= portion of a URL would + be output when using collections.</li> + <li>Fixed a bug with content-type declarations in external parsers + with combined content-type; charset declarations.</li> + <li>Fixed a bug in the config parser that did not correctly + handle relative config <a + href="attrs.html#include">include</a> statements.</li> + <li>Fixed a bug in htfuzzy which would append to an existing + synonyms database rather than creating it anew.</li> + <li>Fixed problems with the configure script ignoring + --enable-bigfile flags.</li> + <li>Fixed problems with retrieval order--this could + potentially foul things up when limiting indexing by + hopcount.</li> + <li>Fixed some problems with the HTML in the included sample files.</li> + <li>Make the -l flag to <a href="htdig.html">htdig</a> + obsolete--this is now the default behavior -- the program + will intercept many signals and write a log file for a restart.</li> + <li>Updated database format 
from the mifluz/htword project.</li> + <li>Changed syntax of <a href="htmerge.html">htmerge</a>. The + program now <em>only</em> merges databases. The <a + href="htpurge.html">htpurge</a> program will "clean + up" databases after running htdig. The included + "rundig" script reflects this.</li> + <li>htload now properly loads ASCII word databases.</li> + <li>Enhanced <a + href="attrs.html#build_select_lists">build_select_lists</a> + attribute.</li> + <li>Added support for controlling the number of Page buttons + in htsearch with <a + href="attrs.html#maximum_page_buttons">maximum_page_buttons</a>.</li> + <li>Added the METADESCRIPTION htsearch template variable for + displaying the <META> description field in output along + with the normal description, instead of using the <a + href="attrs.html#use_meta_description">use_meta_description</a> + attribute.</li> + <li>Added support for permanent URL rewriting with the <a + href="attrs.html#url_rewrite_rules">url_rewrite_rules</a> + attribute. (As opposed to the <a + href="attrs.html#url_part_aliases">url_part_aliases</a> + attribute which can provide a different URL to htsearch and htdig.)</li> + <li>Added support for restricting a search to match only + documents between two dates as specified in the <a + href="hts_form.html">search form</a> as well as the <a + href="hts_templates.html">template variables</a> STARTYEAR, + STARTMONTH, STARTDAY, ENDYEAR, ENDMONTH, ENDDAY.</li> + <li>Added support for limiting duplicates based on MD5 + signatures with the new attributes <a + href="attrs.html#check_unique_md5">check_unique_md5</a>, <a + href="attrs.html#check_unique_date">check_unique_date</a>, <a + href="attrs.html#md5_db">md5_db</a>.</li> + <li>The documentation has been revised to include a block: + portion to note if attributes can be included in URL or + Server blocks. 
See the <a href="confindex.html" + target="_top">configuration</a> documentation for more + information.</li> + <li>More attributes are set on a per-server or per-URL basis.</li> + <li>New support for nntp:// protocol.</li> + <li>Added support for auto-generating directory listings for + file:// URLs.</li> + <li>Set the default compilation to enable tests that can be + run with "make check"</li> + <li>Greatly improved htnotify program with one message per + e-mail address and support for message + templates using the new attributes <a + href="attrs.html#htnotify_webmaster">htnotify_webmaster</a>, + <a href="attrs.html#htnotify_replyto">htnotify_replyto</a>, <a + href="attrs.html#htnotify_prefix_file">htnotify_prefix_file</a>, + <a href="attrs.html#htnotify_suffix_file">htnotify_suffix_file</a>.</li> + <li>There are the usual variety of other fixes and + changes. See the <a href="ChangeLog">ChangeLog</a> for + more details.</li> + <li>Once again, a huge thank you to everyone who + contributed bug reports, fixes and patches!</li> + </ul> + + <p><strong>Release notes for htdig-3.2.0b2</strong> 11 Apr 2000<br> + This version is still marked beta because it has still only + received limited testing. However, it adds more functionality + and should fix all known bugs in the previous 3.2.0b1 release, + including the security hole fixed in version 3.1.5 in + production versions. As with 3.2.0b1, if you are upgrading + from a previous version, you should read the <a + href="upgrade.html">upgrade guide</a> first. 
+ </p> + <ul> + <li>Fixed several bugs in the new HTTP/1.1 implementation that would + cause problems with so-called "Chunked" data.</li> + <li>Fixed a bug in the new regex-based configuration options that + would ignore the case_sensitive attribute.</li> + <li>Fixed the robots.txt parsing to more rigorously stick to the + standard.</li> + <li>Fixed a bug where upper-case META robots directives would be + ignored.</li> + <li>Fixed a bug that could leave a connection open when it failed.</li> + <li>Fixed the timeout in the connection code to ensure that hung + connections are killed properly.</li> + <li>Fixed a bug where duplicates of modified documents could pile up + over time.</li> + <li>Fixed a bug in the SGML entity handling where numeric entities + would be ignored. (e.g. &#162; -> ¢)</li> + <li>Fixed a bug in the new configuration parser that + wouldn't accept lists including numbers</li> + <li>Fixed a potential infinite loop in the phrase + searching parser that came up when fuzzy algorithms were + used.</li> + <li>The HTML parser now ignores anything between <script> tags, + much like it does for <style> tags.</li> + <li>Fixed some performance problems in the new word database code.</li> + <li>Removed the attributes translate_quot, translate_lt, translate_gt + and translate_amp since all SGML entities are now encoded and decoded + when displayed.</li> + <li>Removed the attribute uncoded_db_compatible since the 3.2 + databases are no longer compatible with previous versions anyway.</li> + <li>Removed the attribute word_list because the db.wordlist file is no + longer generated. To get an ASCII version of the database, use the + word_dump attribute.</li> + <li>Removed the pdf_parser attribute. 
It is now preferred to use the + external parser or external converter support with xpdf.</li> + <li>The <a + href="attrs.html#wordlist_compress">wordlist_compress</a> + attribute is now turned on by default.</li> + <li>The output from htsearch and the default and included templates + should now be more HTML-4.0 compliant.</li> + <li>Added support for searching collections of multiple + databases. To use this, supply multiple config fields or + config names separated by "|" characters. Also + see the <a + href="attrs.html#collection_names">collection_names</a> attribute.</li> + <li>Added a new accents fuzzy algorithm, which treats + accented and unaccented words the same. You must create an + <a href="attrs.html#accents_db">accents_db</a> with + htfuzzy after indexing.</li> + <li>Added new attributes <a + href="attrs.html#tcp_max_retries">tcp_max_retries</a> and + <a href="attrs.html#tcp_wait_time">tcp_wait_time</a> to + control how many times a low-level connection is retried + and how long to wait on a hung connection.</li> + <li>Add <a href="attrs.html#any_keywords">any_keywords</a> + attribute to OR the keywords field in a search form + instead of AND-ing them together.</li> + <li>Add the attributes <a + href="attrs.html#search_results_order">search_results_order</a> + and <a href="attrs.html#url_seed_score">url_seed_score</a> + to control result ranking and scoring based on URL patterns.</li> + <li>Moved the htnotify program into the new httools directory.</li> + <li>Added the programs <a href="htdump.html">htdump</a>, + <a href="htload.html">htload</a>, <a + href="htstat.html">htstat</a> and <a + href="htpurge.html">htpurge</a>.</li> + <li>There are the usual variety of other fixes and + changes. 
See the <a href="ChangeLog">ChangeLog</a> for + more details.</li> + <li>Once again, a huge thank you to everyone who + contributed bug reports, fixes and patches!</li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.5</strong> 25 Feb 2000<br> + This version cleans up some remaining bugs in the 3.1.4 + release. As the latest stable release of ht://Dig, it is + recommended for all production servers. + </p> + <ul> + <li>Fixed a nasty security hole in htsearch, which would allow + users to view any file on your site that had read permission.</li> + <li>Fixed a bug that could cause problems with 8-bit + characters on some systems.</li> + <li>Made some attempts to get htsearch's output to be more HTML 4.0 + compliant. It quotes all HTML tag parameters, and uses ";" + instead of "&" as parameter separator in URLs for next + pages. Reserved characters in parameters are now + encoded. Please note that this may break a variety of CGI + wrappers, for example, those written in PHP3.</li> + <li>Fixed handling of SGML entities: htdig will still decode + them to store as single characters in the database, but + htsearch now encodes some of them back for compliant results.</li> + <li>Added two new formats for variables in htsearch templates, + $%(var), which escapes the variable for a URL, and $&(var), + which HTML-escapes the variable as necessary.</li> + <li>Fixed htdig's handling of robots.txt, such that only the first + applicable User-agent field bearing its name will be used, rather + than only the last.</li> + <li>Fixed htdig's handling of servers that return 2-digit years.</li> + <li>Fixed handling of embedded quotes in quoted string lists.</li> + <li>Fixed handling of relative URLs with trailing ".." 
or leading + "//".</li> + <li>Fixed handling of the + <a href="attrs.html#valid_extensions">valid_extensions</a> + attribute, which sometimes failed in the previous version.</li> + <li>Enhanced the handling of local filesystem indexing with the + <a href="attrs.html#local_urls">local_urls</a>, + <a href="attrs.html#local_user_urls">local_user_urls</a> or + <a href="attrs.html#local_default_doc">local_default_doc</a> + attributes, which now allow multiple directory or file names to + be tried.</li> + <li>Added the <a + href="attrs.html#build_select_lists">build_select_lists</a> + attribute to allow the config file to specify + <select> form elements in htsearch output as a + template variable, much like $(SORT) and $(METHOD).</li> + <li>Added support for two additional configuration attributes: + <a href="attrs.html#max_keywords">max_keywords</a>, and + <a href="attrs.html#nph">nph</a>.</li> + <li>A variety of other bug fixes, and many documentation updates. + See the <a href="ChangeLog">ChangeLog</a> for details.</li> + <li>Once again, thanks to everyone who reported bugs and bug + fixes.</li> + </ul> + + <p> + <strong>Release notes for htdig-3.2.0b1</strong> 4 Feb 2000<br> + This marks the first beta version of the 3.2.0 codebase, + over a year in the works. Since it has not received as much + testing as the 3.1.x series, it is *not* recommended for + production environments. A full description of how to upgrade + is provided <a href="upgrade.html">here</a>. + <blockquote><strong>NOTE:</strong> Read this document before + upgrading. You have been warned.</blockquote> + </p> + <ul> + <li>Fixed a bug in htdig where hopcounts could be calculated + incorrectly between multiple servers.</li> + <li>Fixed a bug that could cause problems with 8-bit + characters on some systems.</li> + <li>Fixed handling of unreachable servers. First, the new <a + href="attrs.html#max_retries">max_retries</a> attribute allows + htdig to attempt multiple connections. 
Secondly, if the server + is not available, htdig will stop trying to connect.</li> + <li>Fixed handling of SGML entities: htdig will still decode + them to store as single characters in the database, but + htsearch now encodes them back for compliant results.</li> + <li>Rewrote the database formats, allowing room for more + sophisticated searches and compression of the word database + using the new attribute <a + href="attrs.html#wordlist_compress">wordlist_compress</a>. + These changes include the removal of the word_list file + (db.wordlist) and the addition of the new <a + href="attrs.html#doc_excerpt">doc_excerpt</a> database.</li> + <li>Cleaned up many parts of the code, including the URL and + HTML parsers. Additionally, on platforms that support it, much + of the code will be built as shared libraries, which should + help memory utilization, especially under high load.</li> + <li>Removed the modification_time_is_now attribute, which is + now on by default. This means the time at indexing is taken as + the date of the document if the server does not return a + date.</li> + <li>Added the new attribute <a + href="attrs.html#use_doc_date">use_doc_date</a> to use the + date specified in a META date tag.</li> + <li>Merged all heading_factor attributes into one new + attribute, <a + href="attrs.html#heading_factor">heading_factor</a>.</li> + <li>As a result of the new database format, all _factor + attributes (like <a + href="attrs.html#title_factor">title_factor</a> and <a + href="attrs.html#keywords_factor">keywords_factor</a>) are + now dynamic--you do not have to rebuild your database to + change the scaling.</li> + <li>Changed attributes <a + href="attrs.html#bad_querystr">bad_querystr</a>, <a + href="attrs.html#exclude_urls">exclude_urls</a>, <a + href="attrs.html#limit_urls_to">limit_urls_to</a>, <a + href="attrs.html#limit_normalized">limit_normalized</a>, + <a + href="attrs.html#http_proxy_exclude">http_proxy_exclude</a> + to allow full regular expressions 
when the regex are + surrounded by [ and ].</li> + <li>Changed htsearch fields restrict and exclude to allow + regular expressions when the regex are surrounded by [ and + ].</li> + <li>Added phrase searching support to htsearch--queries + enclosed in quotes will be checked to ensure the words + occur in that exact order in the documents.</li> + <li>Added the <a + href="attrs.html#build_select_lists">build_select_lists</a> + attribute to allow the config file to specify + <select> form elements in htsearch output as a + template variable, much like $(SORT) and $(METHOD). + <li>Added a regex fuzzy method. This will allow searches to + include regex that match words. The fuzzy method will + return up to <a + href="attrs.html#regex_max_words">regex_max_words</a> matches.</li> + <li>Added a speling [sic] fuzzy method. This attempts several + simple spelling mistakes (like transposed letters and + extra letters) to find matches. This adds the new + attribute <a + href="attrs.html#minimum_speling_length">minimum_speling_length</a> + to restrict whether small words should be + checked. Transposing letters in smaller words can give + unrelated correctly-spelled words.</li> + <li>Added support for external transport methods, using the <a + href="attrs.html#external_protocols">external_protocols</a> + attribute, an analogue of the external_parsers system.</li> + <li>Added support for HTTP/1.1, including persistent + connections. This can be configured using the new attributes <a + href="attrs.html#persistent_connections">persistent_connections</a>, + <a href="attrs.html#head_before_get">head_before_get</a>, + and <a href="attrs.html#max_connection_requests">max_connection_requests</a>. 
+ </li> + <li>Added support for file:// URLs and support for using the + <a href="attrs.html#mime_types">mime_types</a> file to + decide whether local files are parsable.</li> + <li>Added two new formats for variables in htsearch templates, + $%(var), which escapes the variable for a URL, and $&(var), + which HTML-escapes the variable as necessary.</li> + <li>Added support for reading the list of URLs to index with + <a href="htdig.html">htdig</a> by supplying the + command-line option -.</li> + <li>Added a flag -m to <a href="htdig.html">htdig</a> to index <em>only</em> the + files given in the filename.</li> + <li>There are many more changes especially to the internal + code structure, so a huge thank you goes out to everyone + who helped make this release! + </ul> + + <p> + <strong>Release notes for htdig-3.1.4</strong> 9 Dec 1999<br> + This version cleans up some remaining bugs in the 3.1.3 + release. As the latest stable release of ht://Dig, it is + recommended for all production servers. + </p> + <ul> + <li>Fixed a nasty bug in URL parameter parsing, which was gobbling + up bare ampersands (&) and CGI parameter names.</li> + <li>Fixed a bug where htdig would go into an infinite loop if an + entry in <a href="attrs.html#local_urls">local_urls</a>, + <a href="attrs.html#local_user_urls">local_user_urls</a> or + <a href="attrs.html#server_aliases">server_aliases</a> was + missing the "=".</li> + <li>Fixed a bug in htsearch, where it failed when reading long + queries via the POST method.</li> + <li>Fixed a bug in htdig, where it failed to close the connection + after certain errors.</li> + <li>Fixed a bug that clobbered the hop count of initial documents.</li> + <li>Fixed bugs in HTML parser's handling of META tags. 
It no longer + continues indexing meta tags when indexing is turned off for the + document, and it no longer gets confused by punctuation in META + descriptions and keywords.</li> + <li>Fixed a bug in the handling of the + <a href="attrs.html#case_sensitive">case_sensitive</a> + attribute, so that it's not limited to robots.txt + parsing. Now, if false, it causes URLs to be mapped to + lowercase, to avoid mixed case duplicates as expected.</li> + <li>HTML parser now indexes text in alt parameter of img tags, and + calculates word locations more accurately than before.</li> + <li>Digging via the local filesystem can now be done even without + an HTTP server running, and a few more file types can be indexed + locally, without having to rely on the server.</li> + <li>Sender name in htnotify's e-mail messages is now quoted.</li> + <li>The <a href="attrs.html#external_parsers">external_parsers</a> + attribute is now extended to support external converters, to avoid + a lot of the complications of writing external parsers.</li> + <li>Added support for several new configuration attributes: + <a href="attrs.html#authorization">authorization</a>, + <a href="attrs.html#start_highlight">start_highlight</a>, + <a href="attrs.html#end_highlight">end_highlight</a>, + <a href="attrs.html#local_urls_only">local_urls_only</a>, + <a href="attrs.html#page_number_separator">page_number_separator</a>, + <a href="attrs.html#script_name">script_name</a>, + <a href="attrs.html#template_patterns">template_patterns</a>, and + <a href="attrs.html#valid_extensions">valid_extensions</a>.</li> + <li>The keywords input parameter to htsearch is now propagated to + followup searches, as for other input parameters.</li> + <li>The query string can now be passed to htsearch as a single + command line argument, for use in scripts.</li> + <li>Added better examples and comments in sample htdig.conf, and + added boolean match type to sample search.html form.</li> + <li>The HTML parser in htdig now turns 
off indexing between + <style> and </style> tags.</li> + <li>A variety of other bug fixes, and many documentation updates. + See the <a href="ChangeLog">ChangeLog</a> for details.</li> + <li>Once again, thanks to everyone who reported bugs and bug + fixes.</li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.3</strong> 22 Sep 1999<br> + This version fixes a number of bugs in the 3.1.2 release and + is the latest stable release of ht://Dig. It is the only version + recommended for production servers and users of all previous + versions are encouraged to upgrade. + </p> + <ul> + <li>Fixed a long-standing bug where search queries containing + punctuation would not be highlighted in excerpts.</li> + <li>Fixed a bug where SGML entities inside HTML tags were not + expanded.</li> + <li>Fixed the <a + href="attrs.html#server_aliases">server_aliases</a> + attribute to default to port 80 if omitted.</li> + <li>Fixed a bug in URL parsing, where documents ending in the + value used for remove_default_doc were ignored. For + example, a URL ending in /left_index.html would become /.</li> + <li>Fixed META robot parsing to correctly parse multiple + directives.</li> + <li>Fixed a coredump when generating the metaphone fuzzy + database on some systems.</li> + <li>Fixed the behavior of the <a + href="attrs.html#modification_time_is_now">modification_time_is_now</a> + attribute to work as documented.</li> + <li>Fixed the behavior of htdig to block out the + username/password set on the command-line in process + listing.</li> + <li>Fixed a bug with external parsers to prevent shell escapes + in filenames.</li> + <li>Fixed a bug on some systems, where printing a date might + crash.</li> + <li>Handles the ispell endings lists better so that suffixes + more closely match grammatical rules.</li> + <li>Changed the maximum word length to a run-time option, set + with the new attribute <a + href="attrs.html#maximum_word_length">maximum_word_length</a>. 
+ <li>Tests for the presence of alloca.h, which would cause + problems with compiling the regex code under non-GNU + compilers.</li> + <li>Added support for <EMBED>, <OBJECT>, and + <LINK> HTML tags. + <li>A variety of other bugs were fixed, see the + <a href="ChangeLog">ChangeLog</a> for details.</li> + <li>When indexing, htdig should now attempt to index compound + words as separate words in addition to a compound word. For + example, "pdf_parser" would also be indexed as "pdf" and "parser." + <li>Once again, thanks to everyone who reported bugs and bug + fixes.</li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.2</strong> 21 Apr 1999<br> + This version fixes a number of bugs in the 3.1.1 release and + is the latest stable release of ht://Dig. It is highly + recommended for production servers. + </p> + <ul> + <li>Fixed a bug that ignored META description tags when they + were also added to the meta_keywords attribute.</li> + <li>Fixed the HTML comment parsing to be more lenient about + non-standard comments.</li> + <li>Fixed problems in the date-parsing code that made it Y2K + incompatible. In particular, it forgot that 2000 is a leap + year and wouldn't correctly parse dates after 29 Feb + 2000.</li> + <li>Fixed a variety of bugs in the HTML parser.</li> + <li>Fixed an old bug that would exclude <strong>all</strong> URLs if + the exclude_urls attribute left empty.</li> + <li>Fixed display of META description tags. Now it always + shows the top of a description. If no description exists, it + looks for the search terms in the excerpt as usual.</li> + <li>Fixed some small memory leaks.</li> + <li>Changed the htfuzzy endings algorithm to use a more + efficient regex system. 
Speed improvements on non-English + languages are noted, now taking minutes for generation that + would take days!</li> + <li>Changed the noindex_start and noindex_end attributes to + allow case-insensitive matching.</li> + <li>Added on-disk versions of the builtin templates to make it + more obvious how to change the results templates.</li> + <li>Added <a href="attrs.html#date_format">date_format</a> + attribute to change the format of dates output in search results.</li> + <li>Added <a href="attrs.html#extra_word_characters">extra_word_characters</a> + attribute that defines extra characters that should be + considered part of a word, rather than punctuation.</li> + <li>Several other, relatively minor bugs were also + fixed. Many thanks to those who sent in bug reports and to + Gilles Detillieux for coordinating this release.</li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.1</strong> 17 Feb 1999<br> + This version cleans up some remaining bugs in the 3.1.0 + release. As the latest stable release of ht://Dig, it is + recommended for all production servers. + </p> + <ul> + <li>Fixed a bug in the configure script under IRIX and Solaris 7. + </li> + <li>Fixed a minor bug with the Berkeley database code under + AlphaLinux.</li> + <li>Fixed a serious bug causing bus errors on several platforms, + notably Solaris SPARC, caused by unaligned access to database + structures.</li> + <li>Fixed some bugs in the boolean search parser.</li> + <li>Replaced the contributed parse_word_doc.pl script with a + more capable parse_doc.pl script.</li> + <li>Fixed the htnotify program to parse dates as mentioned in the + <a href="notification.html">documentation</a>.</li> + <li>Cleaned up some minor mistakes in the documentation and moved + to HTML 4.0 Transitional syntax.</li> + <li>Fixed the documentation for the <a + href="attrs.html#pdf_parser">pdf_parser</a> attribute that was + changed in version 3.1.0. This attribute must call the parser with + all command-line options. 
+ </ul> + + <p> + <strong>Release notes for htdig-3.1.0</strong> 9 Feb 1999<br> + This version marks the "full release" of version + 3.1.0. Naturally, this version adds a few new features and fixes a + large number of remaining bugs. This version is the latest stable + release of ht://Dig and is recommended for all production servers + for current bug-fixes and oft-requested + features. + </p> + <blockquote> + <p> + <strong>NOTE:</strong> You <em>must</em> rebuild + your databases from scratch after updating to this + version. Several database-related bugs were fixed and will remain + unless you rebuild from scratch. We're sorry for any + inconvenience. + </p> + </blockquote> + <ul> + <li>Fixed a variety of small memory leaks.</li> + <li>Fixed a bug that could duplicate documents in the document + databases.</li> + <li>Fixed a bug that would not remove documents marked as deleted.</li> + <li>Fixed a bug that could dump core with incorrectly defined + template_map attributes.</li> + <li>Fixed a bug that could dump core or produce bogus dates when + a server returns the date in an incorrect format.</li> + <li>Fixed a variety of string-matching bugs that caused problems + with restricting indexing and searching.</li> + <li>Fixed a bug that could dump core if logging searches and CGI + environment variables were not set.</li> + <li>Fixed a bug that would not highlight searches properly if they + contained punctuation.</li> + <li>Fixed PDF parsing to support programs beyond acroread.</li> + <li>Fixed a bug that caused problems with large robots.txt files.</li> + <li>Fixed a bug in the sample rundig script from a non-portable + test for the age of databases.</li> + <li>Fixed bugs in the fuzzy matching code that could prevent + searches from completing if fuzzy databases were not present.</li> + <li>Fixed bugs in the soundex and metaphone algorithms that + would only return the first word of several matching + words. 
<strong>Note</strong> that to completely fix this bug, you must + rebuild your soundex and metaphone databases.</li> + <li>Fixed up many compilation warnings and errors.</li> + <li>Fixed a performance slowdown in htsearch when + <a href="attrs.html#backlink_factor">backlink_factor</a> and + <a href="attrs.html#date_factor">date_factor</a> are zero and can + be ignored.</li> + <li>Improved performance when a server ignores the + If-Modified-Since request during update digs.</li> + <li>Added a warning message if the locale: option is set + to a locale that is not present.</li> + <li>Some minor performance improvements.</li> + <li>Allow "include" keyword in <a href="cf_general.html">config + file</a> to include other config files.</li> + <li>Uses latest (2.6.4) version of the Berkeley database.</li> + <li>Two databases may be merged together using + <a href="htmerge.html">htmerge</a>.</li> + <li>The <a href="htdig.html">htdig</a> program can be safely + stopped and restarted in the middle of a dig. The dig will write + the progress to the file specified by the new + <a href="attrs.html#url_log">url_log</a> option.</li> + <li>Added support for anchors in excerpts with the + <a href="attrs.html#add_anchors_to_excerpt">add_anchors_to_excerpt</a> + option and the ANCHOR template variable.</li> + <li>Added support for sorting results in increasing or + decreasing order of document date, size, title and score using + the <a href="hts_form.html">search form</a>. Note that changing + sort from the default of score will result in a performance + decrease.</li> + <li>Added config options <a href="attrs.html#sort">sort</a> and + <a href="attrs.html#sort_names">sort_names</a> to change the + default sort and names used in the SORT template variable. 
+ <li>Added the option <a + href="attrs.html#compression_level">compression_level</a> to + compress the document database if the zlib library is + present.</li> + <li>Added the options + <a href="attrs.html#noindex_start">noindex_start</a> and + <a href="attrs.html#noindex_stop">noindex_stop</a> to delimit + sections of HTML documents to be ignored.</li> + <li>Added the option + <a href="attrs.html#allow_in_form">allow_in_form</a> to allow + specific config options to be set in the search form.</li> + <li>Added the option + <a href="attrs.html#bad_querystr">bad_querystr</a> to ignore URLs + containing specified CGI queries.</li> + <li>Added the option + <a href="attrs.html#search_results_wrapper">search_results_wrapper</a> + to replace separate header and footer files. For more + information, see the <a href="hts_general.html">general + htsearch</a> documentation.</li> + <li>Added option + <a href="attrs.html#no_title_text">no_title_text</a> to allow + configuration of the text used when no title is found.</li> + <li>Added option + <a href="attrs.html#url_part_aliases">url_part_aliases</a> to allow + rewriting portions of URLs.</li> + <li>Added option + <a href="attrs.html#common_url_parts">common_url_parts</a> to + compress common portions of URLs. Requires rebuilding + databases when changed.</li> + <li>Added option + <a href="attrs.html#remove_default_doc">remove_default_doc</a> to + control whether ht://Dig strips off the default document in a + folder. Setting it to empty will prevent problems with servers that + treat / and /index.html as different URLs.</li> + <li>Of course there are many other bug-fixes and small + enhancements. Many thanks to everyone who reported a bug or + contributed code for this release!</li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.0b4</strong> 22 Dec 1998<br> + This version fixes a security hole in htnotify. The hole has been + present in previous versions but was inadvertently made worse in + the 3.1.0 beta releases.
Malicious users could construct pages + that executed commands running under the shell of the user running + htnotify. <strong>It is highly recommended that users of previous + versions switch to this release.</strong> + </p> + <ul> + <li>Fixed a memory leak in htnotify and htsearch.</li> + <li>Updated the contributed parse_word_doc.pl script.</li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.0b3</strong> 15 Dec 1998<br> + This version adds only a few features and a significant number of + bug fixes. This version has been pretty thoroughly tested. Though + there are a few remaining issues, it is hoped that this will be + near the end of the beta releases before version 3.1.0. Note that + it's recommended to update your databases to eliminate the + possibility of subtle changes in the database format. + </p> + <ul> + <li>Fixed a bug which would ignore the proxy settings, + introduced in version 3.1.0b2.</li> + <li>Fixed a bug where words would remain from deleted + documents.</li> + <li>Fixed a bug where SGML &lt; was considered part of a tag + in the HTML parser, introduced in version 3.1.0b2.</li> + <li>Fixed a bug where empty boolean searches would dump + core.</li> + <li>Fixed a bug where boolean "and," "or," and "not" would be + removed from a search string, causing a syntax error.</li> + <li>Fixed a bug which wouldn't keep track of the hopcounts + correctly.</li> + <li>Added support for META refresh tags, contributed by Aidas + Kasparas.</li> + <li>Added support for using CGI + <a href="http://hoohoo.ncsa.uiuc.edu/cgi/">environment + variables</a> in the search templates, contributed by Gilles + Detillieux.</li> + <li>Improved memory requirements <strong>slightly</strong> through + fixing a memory leak in htdig and a general system-wide + adjustment.</li> + <li>Improved support for multiple exclude and restrict items + through htsearch, contributed by William Rhee and Gilles.</li> + <li>Improved support to compile under CygWinB20, contributed + by Klaus
Mueller.</li> + <li>Upgraded to the latest version (2.5.9) of the + <a href="http://www.sleepycat.com/">Berkeley DB</a>.</li> + <li>Added a new option + <a href="attrs.html#server_wait_time">server_wait_time</a> to + give a delay between connections to a server. Currently this + can also affect local filesystem digging if set.</li> + <li>Added a new option + <a href="attrs.html#server_max_docs">server_max_docs</a> to limit + the number of documents pulled down from a server in one dig.</li> + <li>Added a new option + <a href="attrs.html#http_proxy_exclude">http_proxy_exclude</a> + to ignore the proxy setting on certain URLs.</li> + <li>Added a new option + <a href="attrs.html#no_excerpt_show_top">no_excerpt_show_top</a> to + show the top of a document when there is no excerpt.</li> + <li>Added new options + <a href="attrs.html#date_factor">date_factor</a>, + <a href="attrs.html#backlink_factor">backlink_factor</a>, and + <a href="attrs.html#description_factor">description_factor</a> to + improve search rankings. Respectively, they can give higher + rankings to more recent documents, documents with a high + number of links pointing to them, and documents with relevant + URL descriptions pointing to them. See the documentation for + more information.</li> + <li>Added a set of contributed scripts called multidig to help + work with multiple sets of URLs and databases.</li> + <li>Fixed many compilation problems under AIX, thanks to + Alexander Bergolth!</li> + <li> + Many other bugs were fixed, so a big thanks to everyone + who submitted a bug report, patch or gave other feedback! See the + <a href="ChangeLog">ChangeLog</a> for more details. + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.1.0b2</strong> 1 Nov 1998<br> + This version adds a few minor features as well as many + bugfixes. It is still considered beta as some bug reports have not + been fully examined. + </p> + <ul> + <li> + Fixed a <strong>major</strong> database corruption + problem.
Since this bug corrupted the document databases, to + completely fix it, you will need to rebuild your databases from + scratch. + </li> + <li> + Fixed many problems with the Makefiles and configure + scripts. Using <code>./configure --prefix=</code> now works. + </li> + <li> + Added fixes for connection problems with Digital Alpha-based + systems contributed by Paul J. Meyer! + </li> + <li> + Added support for syslog-based htsearch logging. See the + <a href="attrs.html#logging">config documentation</a> for more + details. Thanks to Leo Bergolth for this! + </li> + <li> + Added fixes to work with DNS aliases (as opposed to virtual + hosts) through the + <a href="attrs.html#server_aliases">server_aliases</a> and + <a href="attrs.html#limit_normalized">limit_normalized</a> options + as contributed by Leo Bergolth. + </li> + <li> + Added cleanups of the HTML parser and the connection timeout + code contributed by René Seindal. + </li> + <li> + Now supports case insensitive servers through the + <a href="attrs.html#case_sensitive">case_sensitive</a> option. + </li> + <li> + Now supports ISO 8601 date format, using the + <a href="attrs.html#iso_8601">iso_8601</a> option. + </li> + <li> + Added a wrapper to emulate Excite for Web Servers (EWS) + contributed by John Grohol. + </li> + <li> + Added fixes to the contrib whatsnew.pl script to work with DB2 + contributed by Jacques Reynes. + </li> + <li> + Added a new contributed synonyms file from John Banbury.</li> + <li> + Added a new template variable: CURRENT, the number of the + current match, from a patch by René Seindal.</li> + <li> + Many other minor bugs were fixed, so a big thanks to everyone + who submitted a bug report or a patch! See the + <a href="ChangeLog">ChangeLog</a> for more details. + </li> + </ul> + <br> + + <p> + <strong>Release notes for htdig-3.1.0b1</strong> 8 Sep + 1998<br> + This version adds several major new features as well as some + bug-fixes.
It is considered a beta release since it has only seen + limited testing. + </p> + <blockquote> + <p> + <font face="Helvetica" size="+1">It is <strong> + extremely</strong> important that you rebuild all your databases made + with previous versions. This version no longer uses the GDBM database + format and databases produced with it will be incompatible with other + versions. Do not blame me for anything if you didn't do this. You have + been warned...</font> + </p> + </blockquote> + <ul> + <li> + Added patches made by Pasi Eronen to support local filesystem access + </li> + <li> + Added a PDF parser contributed by Sylvain Wallez + </li> + <li> + Added support for META description and robots tags + </li> + <li> + Converted the database code to use the BerkeleyDB format, contributed + by Esa Ahola and Jesse op den Brouw. + </li> + <li> + Added a prefix fuzzy algorithm, contributed by Esa and Jesse. + </li> + <li> + Various other bugs were fixed. Thanks for all the patches + that were sent to me and the mailing list! + </li> + </ul> + <br> + + <p> + <strong>Release notes for htdig-3.0.8b2</strong> 15 Aug + 1997<br> + This new version contains most of the patches that Pasi Eronen + has posted to the list plus some other random fixes. + </p> + + <p> + <strong>Release notes for htdig-3.0.8b1</strong> + 27-Apr-1997<br> + I consider this a beta release since I have not had time to + test everything. Use at your own risk... + </p> + <ul> + <li> + Base tag problem fixed + </li> + <li> + URL parser somewhat more robust + </li> + <li> + Date parsing bug fixed + </li> + <li> + Added Substring fuzzy algorithm. + </li> + <li> + Various other bugs were fixed. Thanks for all the patches + that were sent to me! + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.0.7</strong> 12-Jan-1997<br> + More bug fixes and some minor new functionality.
Hopefully, + I'll be able to finish up work on version 3.1 at some point in + the near future.<br> + I have recently received some more patches for various things, + but I have not incorporated those, yet. Next version. + </p> + <ul> + <li> + The problem with the missing words has been fixed. This was + a problem in the Dictionary class. + </li> + <li> + htsearch is a *lot* faster due to a patch by Esa Ahola. + </li> + <li> + htfuzzy has had some work done to it. With the addition of the + new rx-1.4 library, the endings algorithm now actually + works for languages other than English... It still takes an + awfully long time to build the tables for languages with + lots of rules. + </li> + <li> + URLs now can be of the dubious form http:foo.html. I have + never seen this used and think it is bogus, but alas, it + works now. + </li> + <li> + A search form can now manually add words to any search + using the new <em>keywords</em> form attribute. + </li> + <li> + A problem in the plaintext parser used to cause bogus HTML + in search results. This has been fixed. + </li> + <li> + New documentation format. Lots of new documentation, as + well. + </li> + <li> + New robotstxt_name attribute. Used to match the + 'user-agent' lines in robots.txt files. + </li> + <li> + The &lt;base&gt; tag is now properly supported. + </li> + <li> + Preliminary support for lots of new features, including: + <ul> + <li> + External document parsers. You'll be able to write your + own document parser for that special document type that + ht://Dig doesn't know about. + </li> + <li> + New fuzzy search algorithms: substring, regex, + globbing, etc. + </li> + </ul> + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.0.6</strong> 26-Oct-1996<br> + Just a single bug fix and one additional feature in this + release. + </p> + <ul> + <li> + Fixed the problem that caused frequent crashes with virtual + memory exhausted.
+ </li> + <li> + Added a new attribute, keywords_meta_tag_names, which + should contain a list of meta tag names for which the + content should be used as keywords. The default is set to + "keywords htdig-keywords" + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.0.5</strong> 13-Oct-1996<br> + This release consists of more bug fixes.<br> + I want to thank Elliot Lee &lt;sopwith@cuc.edu&gt; for his + help with tracking down several bugs. + </p> + <ul> + <li> + Fixed problem with accent characters. Words with SGML + entities and iso-8859-1 characters will now be indexed + correctly. + </li> + <li> + Changed the auto configuration to detect the need for a + prototype for the gethostname() function. (This was + supposed to be fixed before, but wasn't) + </li> + <li> + Reduced the memory requirements for all the programs by + changing the rehash() method in the Dictionary class. + Access to hashes may be a little slower, but the memory + requirements were reduced by a factor of 10 or so. + </li> + <li> + Hopefully fixed a problem with the time related functions + on certain platforms. More checks are done to make sure the + functions that are used are actually available. + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.0.4</strong> 2-Sep-1996<br> + The previous version failed to build under Linux. This should + be fixed now. + </p> + <ul> + <li> + Fixed problem with the time stuff which caused the build of + htdig to fail. + </li> + <li> + Fixed a memory problem in htdig + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.0.3</strong> 2-Sep-1996<br> + Bugs bugs bugs... Will they <em>ever</em> all be found? + </p> + <p> + <strong>NOTE</strong>: I made extensive changes to the htdig.conf file + that gets installed. I would advise you to remove or rename + your existing htdig.conf and let the installation process + create a new one for you that you can then modify.
+ </p> + <p> + Also, since the rundig script has changed, you should remove + the old one before installing ht://Dig. (The installation + will refuse to overwrite existing files...) + </p> + <ul> + <li> + The problem with htsearch crashing on some machines has + been fixed. + </li> + <li> + A bug caused the &lt;AREA&gt; tag to be ignored. Fixed. + </li> + <li> + A bug in SunOS caused dates to be all screwed up. + </li> + <li> + Added lots of comments to the example htdig.conf file. Also + added some additional example attributes. + </li> + <li> + Fixed a bug in the installation process which caused rundig + to be created incorrectly. + </li> + <li> + Added a sample synonyms file. Also modified rundig to + create a synonyms database for it. + </li> + </ul> + + <p> + <strong>Release notes for htdig-3.0.2</strong> 22-Aug-1996<br> + More bug fixes. + </p> + <ul> + <li> + Multiple start URLs now actually work. Before they were + just documented to work, but didn't actually work. + </li> + <li> + htmerge now will refuse to remove database files if it + detects that the call to /bin/sort failed. + </li> + <li> + htmerge can now tell /bin/sort to use a specific temporary + directory. This is done by setting the TMPDIR environment + variable. + </li> + <li> + htsearch can now search for words with non-ASCII characters + in them. + </li> + <li> + Added support for finding URLs in the &lt;frame&gt; and + &lt;area&gt; tags. + </li> + <li> + There is a problem with htsearch under Linux. It causes a + segmentation violation after the first search result is + displayed. Don't know what the problem is, yet. + </li> + <li> + Fixed bug in the auto configuration which always set the + value for NEED_PROTO_GETHOSTNAME to 1. For most systems + this actually needs to be 0. + </li> + <li> + <strong>Release notes for htdig-3.0.1</strong> + 16-Aug-1996<br> + This is a maintenance release in response to several bug + reports.
+ <ul> + <li> + htdig now will display a list of errors when the + statistics option (-s) is used. The list gives the URL + that caused the error and a URL that referred to it. + Hopefully this information is useful for site + maintainers. + </li> + <li> + Some problems with the SGML character entities were + fixed. The major symptom was that the ';' that ends an + entity used to be included as well. + </li> + <li> + Major problems with htnotify were fixed. There were + many hardcoded things in this program that made it very + specific to SDSU and to me. + </li> + <li> + malloc.h should not be included anymore. All references + to it were replaced with stdlib.h instead. This should + make compiles on some platforms work better. + </li> + <li> + htsearch now will use the CONFIG_DIR environment + variable to override the compiled-in default (set in + the CONFIG file...). This was done so that htsearch can + be called from a simple wrapper that sets that + environment variable. Only the wrapper needs to be + modified to get different CONFIG_DIR values. + </li> + </ul> + </ul> + + <p> + <strong>Release notes for htdig-3.0</strong> + 17-Jul-1996<br> + I decided to make this the <em>official</em> 3.0 release. + </p> + <blockquote> + <blockquote> + <font face="Helvetica" size="+1">It is <strong> + extremely</strong> important that you remove all traces + of earlier beta versions of the software before + installing this version or that you install in a + completely different location. Do not blame me for + anything if you didn't do this. You have been + warned...</font> + </blockquote> + </blockquote> + <ul> + <li> + htwrapper is no more. htsearch is now the CGI program + </li> + <li> + <a href="htsearch.html" target="_top">htsearch</a> now + uses templates to display the results. A template is + simply a piece of HTML code for a single match.
The + HTML code includes variables that will be expanded to + the various items that are unique to each match, like + URL, EXCERPT, TITLE, etc. The template can be selected + at search time (through a menu). There are two builtin + templates: <code>builtin-short</code> and <code> + builtin-long</code>. The <code>builtin-short</code> template + just lists the stars and title while the <code> + builtin-long</code> template lists results in a similar + fashion to the way Alta Vista displays results. + </li> + <li> + Many runtime configuration options have been removed + and many new ones have been added. Check the + <a href="attrs.html">configuration file</a> documentation for + details. There are also some enhancements to the format + of the configuration file. + <ul> + <li> + Attribute values can now span multiple lines by + ending each line that needs to be continued with a + backslash ('\'). + </li> + <li> + Attribute values can now include the contents of + files. Just put the filename in back-quotes. The + file that is specified is read + in and all newlines and leading and trailing + whitespace are reduced to a single space. If the + file is not found, nothing is included and no error + is flagged.<br> + Note that the backquote character is used, not the + regular quote character. The + filename can use the normal variable expansion, for example: + <blockquote> + <code>someattribute: `${common_dir}/somefile`</code> + </blockquote> + </li> + </ul> + Notable attribute changes: + <ul> + <li> + All the attributes that set the heading text have + been removed.
These attributes include: + <ul> + <li> + accessed_heading_text + </li> + <li> + datesize_heading_text + </li> + <li> + descriptions_heading_text + </li> + <li> + excerpt_heading_text + </li> + <li> + modified_heading_text + </li> + <li> + score_heading_text + </li> + <li> + size_heading_text + </li> + <li> + url_heading_text + </li> + <li> + wordlist_heading_text + </li> + <li> + field_order + </li> + </ul> + </li> + <li> + New attributes added: + <dl> + <dt> + <strong>http_proxy</strong> + </dt> + <dd> + Added to support the use of a HTTP proxy server + to index documents + </dd> + <dt> + <strong>locale</strong> + </dt> + <dd> + Added to support international character sets + </dd> + <dt> + <strong>match_method</strong> + </dt> + <dd> + New way of specifying if a search is an 'or', + 'and', or 'boolean' search + </dd> + <dt> + <strong>matches_per_page</strong> + </dt> + <dd> + The new paged results uses this + </dd> + <dt> + <strong>max_doc_size</strong> + </dt> + <dd> + Limit the size of documents retrieved + </dd> + <dt> + <strong>next_page_text</strong> + </dt> + <dd> + Used in the navigation between pages + </dd> + <dt> + <strong>no_excerpt_text</strong> + </dt> + <dd> + Text displayed if no excerpt was available + (this used to be hard-coded) + </dd> + <dt> + <strong>no_next_page_text</strong> + </dt> + <dd> + Used in the navigation between pages + </dd> + <dt> + <strong>no_prev_page_text</strong> + </dt> + <dd> + Used in the navigation between pages + </dd> + <dt> + <strong>prev_page_text</strong> + </dt> + <dd> + Used in the navigation between pages + </dd> + <dt> + <strong>star_patterns</strong> + </dt> + <dd> + Allow different star images to be used + depending on the match URL + </dd> + <dt> + <strong>synonym_dictionary</strong> + </dt> + <dd> + Support for the new synonyms fuzzy algorithm + </dd> + <dt> + <strong>synonym_db</strong> + </dt> + <dd> + Support for the new synonyms fuzzy algorithm + </dd> + <dt> + <strong>syntax_error_file</strong> + </dt> 
+ <dd> + HTML file displayed if there was a boolean + expression syntax error + </dd> + <dt> + <strong>template_map</strong> + </dt> + <dd> + Used in the support for the new result display + templates + </dd> + <dt> + <strong>template_name</strong> + </dt> + <dd> + Sets the default template name + </dd> + <dt> + <strong>text_factor</strong> + </dt> + <dd> + Added to allow normal text to have a variable + weight (0, for example...) + </dd> + </dl> + </li> + </ul> + <ul> + <li> + Some form tag names have changed. The list of + recognized form tags is in the + <a href="htsearch.html" target="_top">htsearch</a> + documentation. + </li> + <li> + Multiple start URLs can be specified as a value to the + 'start_url' attribute. This could be combined with the + file inclusion to read in a file of URLs to start with. + </li> + <li> + <a href="htdig.html">htdig</a> now sends the 'Referer:' + header in HTTP requests so that any link errors will be + logged in the server's log files. + </li> + <li> + In addition to the "htdig-keywords" META tag name, + <a href="htdig.html">htdig</a> now also supports just + "keywords". This is to make it more compatible with the + Alta Vista search engine. + </li> + <li> + The verbose display of <a href="htdig.html">htdig</a> + was enhanced to show '+' for a link that will be + followed and '-' for a link that was discarded. + </li> + <li> + <a href="htmerge.html">htmerge</a> was changed to use + the Unix sort program instead of doing its own sorting. + It no longer uses mmap() to map the words into memory. + This was causing problems on systems with limited + virtual memory available. (What??? You mean you DON'T + have at least a 1GB disk dedicated to swap???) + </li> + <li> + The Endings algorithm was fixed up to work properly + now. There were several well hidden bugs that made the + algorithm come up with illegal words. + </li> + <li> + The <strong>synonyms</strong> fuzzy algorithm was + added.
This is simply a mapping of words to other + words. The input file is just a list of words which + causes the first word on a line to be mapped to the + rest of the words on that line. (We use this to map + course abbreviations to full course names) + </li> + <li> + SGML entities are now supported. They are translated to + their equivalent ISO-8859-1 encoding. + </li> + </ul> + </ul> + + <p> + <strong>Release notes for htdig-3.0b5</strong> + </p> + <ul> + <li> + The configuration has changed. There is now a CONFIG + file which contains all the variables which control + where things get installed. 'make install' will now + actually attempt to set everything up with default or + example files.<br> + Note that some default directories have changed. For + example, the default configuration file location is not + /usr/local/etc/htdig.conf anymore. Instead it is now + defined in terms of CONFIG_DIR. + </li> + <li> + The htfuzzy/createDict.pl Perl program has been + obsoleted. Creating the endings database is now done by + htfuzzy itself. If you already have endings databases, + you don't need to recreate them; they will still work. + </li> + <li> + GNU rx-1.0 is now included with the distribution. This + is used by htfuzzy to create the endings databases. + </li> + <li> + The name of the whole search system has changed from + <em>HTDig</em> to <em>ht://Dig</em>. + </li> + <li> + The HTML documentation got a big facelift! This + includes the new logo for ht://Dig. (Thanks go to + Keith Parks for the images!) + </li> + <li> + htsearch got a new option '-r' which will allow it to + produce raw output. This output can easily be parsed by a + wrapper program to produce custom HTML or other output + for the search results. + </li> + </ul> + + <hr size="4" noshade> + Last modified: $Date: 2004/06/12 13:39:12 $ + </body> +</html> |