<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <head>
	<title>
	  ht://Dig: Overview of Programs
	</title>
  </head>
  <body bgcolor="#eef7ff">
	<h1>
	  Overview of Programs
	</h1>
	<p>
	  ht://Dig Copyright &copy; 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
	  Please see the file <a href="COPYING">COPYING</a> for
	  license information.
	</p>
	<hr size="4" noshade>
	<p>
	  There are several programs in the ht://Dig package. 
	</p>
	<h3>
	  <a href="htdig.html">htdig</a>
	</h3>
	<p>
	  Digging is the first step in creating a search database. This
	  system uses the word <em>digging</em>; other systems call
	  it <em>harvesting</em> or <em>gathering</em>. In the ht://Dig
	  system, the program <a href="htdig.html">htdig</a> performs
	  the information gathering stage. In this process, the program
	  acts like a regular web user, except that it follows every
	  hyperlink it comes across that stays within the domain it has
	  been asked to gather information on; links that lead outside
	  that domain are not followed. Each document it visits is
	  examined, and all the unique words in the document are
	  extracted and stored.
	</p>
	<p>
	  The digging process will <em>only</em> follow links and has
	  no notion of JavaScript, applets, or user-input forms.
	</p>
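	<p>
	  As a rough illustration (a hypothetical page, not taken from
	  the ht://Dig distribution), a dig would be expected to follow
	  the plain hyperlink below, while the script-driven navigation
	  and the form are not followed. All file names are made up for
	  the example.
	</p>
	<pre>
&lt;!-- Plain hyperlink: can be followed during the dig --&gt;
&lt;a href="products.html"&gt;Products&lt;/a&gt;

&lt;!-- Navigation that exists only in JavaScript: not seen by the dig --&gt;
&lt;button onclick="window.location='hidden.html'"&gt;More&lt;/button&gt;

&lt;!-- Results that require user input in a form: not reached by the dig --&gt;
&lt;form action="lookup.cgi" method="get"&gt;
  &lt;input type="text" name="id"&gt;
  &lt;input type="submit" value="Look up"&gt;
&lt;/form&gt;
	</pre>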
	<hr noshade>
	<h3>
	  <a href="htsearch.html" target="_top">htsearch</a>
	</h3>
	<p>
	  Searching is where users actually get to use the
	  information that was gathered during the dig and merge
	  stages. The <a href="htsearch.html" target="_top">
	  htsearch</a> program performs the actual searches. It typically
	  produces <code>HTML</code> output, which is what users
	  see, though other text formats can be generated by
	  editing the output templates.
	</p>
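	<p>
	  A minimal sketch of a search form that hands a query to
	  htsearch. The action URL <code>/cgi-bin/htsearch</code> and the
	  configuration name <code>htdig</code> are assumptions about a
	  particular installation; <code>words</code>,
	  <code>method</code>, <code>format</code> and
	  <code>config</code> are htsearch form fields, and the values
	  shown are only illustrative. See the
	  <a href="htsearch.html" target="_top">htsearch</a>
	  documentation for the full list of fields.
	</p>
	<pre>
&lt;form method="get" action="/cgi-bin/htsearch"&gt;
  &lt;!-- Which configuration (and so which databases) to search; assumed name --&gt;
  &lt;input type="hidden" name="config" value="htdig"&gt;
  &lt;!-- Require all words to match --&gt;
  &lt;input type="hidden" name="method" value="and"&gt;
  &lt;!-- Output template used for each match --&gt;
  &lt;input type="hidden" name="format" value="builtin-long"&gt;
  Search for: &lt;input type="text" name="words" size="30"&gt;
  &lt;input type="submit" value="Search"&gt;
&lt;/form&gt;
	</pre>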
	<hr noshade>
	<h3>
	  <a href="htmerge.html">htmerge</a>
	</h3>
	<p>
	 Merging does exactly what the name says: it merges one database
	 into another. In previous versions of ht://Dig, the htmerge
	 program also built the databases used by htsearch from the
	 htdig output. That step is now largely unnecessary; the one
	 remaining task, removal of invalid URLs, is now done by the
	 htpurge program.
	</p>
	<hr noshade>
	<h3>
	  <a href="htpurge.html">htpurge</a>
	</h3>
	<p>
	 Purging removes documents and their associated words from the
	 databases. It should be run after htdig to remove
	 invalid URLs, documents marked not to be indexed, old
	 versions of modified documents, and so on. You can also tell
	 htpurge explicitly which URLs to remove.
	</p>
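	<p>
	  One way a document can be marked as not to be indexed is with
	  a meta tag in its head. The sketch below uses the standard
	  <code>robots</code> meta tag on a hypothetical page; how
	  ht://Dig treats it is described in the
	  <a href="meta.html">meta information</a> documentation, so
	  take the exact behavior shown here as an assumption rather
	  than a guarantee.
	</p>
	<pre>
&lt;head&gt;
  &lt;title&gt;Internal draft&lt;/title&gt;
  &lt;!-- Ask indexers not to include this document --&gt;
  &lt;meta name="robots" content="noindex"&gt;
&lt;/head&gt;
	</pre>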
	<hr noshade>
	<h3>
	  <a href="htload.html">htload</a>
	</h3>
	<p>
	  Loading imports the contents of the databases
	  from formatted ASCII text documents, as created by htdump or
	  by the -t flag of htdig. Loading is destructive by
	  nature: data from the text files will replace any
	  conflicting data in the databases.
	</p>
	<hr noshade>
	<h3>
	  <a href="htdump.html">htdump</a>
	</h3>
	<p>
	  Dumping exports the contents of the databases to
	  formatted ASCII text documents. This can be useful for
	  backups, transferring databases between different operating
	  systems, changing the compression or encodings in the
	  ht://Dig configuration, or parsing by external utilities.
	  Editing these files by hand is <em>not</em> recommended,
	  although minor edits will probably be fine.
	</p>
	<hr noshade>
	<h3>
	  <a href="htstat.html">htstat</a>
	</h3>
	<p>
	  The htstat program reports statistics on the databases,
	  similar to the output of the -s flag of some of the other
	  programs. In addition, it can return a list of the URLs in
	  the databases.
	</p>
	<hr noshade>
	<h3>
	<a href="htnotify.html">htnotify</a>
	</h3>
	<p>
	  The ht://Dig system includes a handy reminder service that
	  allows HTML authors to add some ht://Dig-specific <a href="meta.html">meta
	  information</a> to HTML documents. This meta information is
	  used to email authors after a specified date. It is very useful
	  for maintaining lists that contain those annoying &quot;new&quot;
	  graphics next to new items. (Hint: Things really aren't all
	  that new anymore after 6 months!)
	</p>
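	<p>
	  A sketch of what such a reminder might look like in a
	  document's head. The address, subject, and date are
	  hypothetical, and the month/day/year date format is assumed
	  to be the default; check the
	  <a href="meta.html">meta information</a> page for the exact
	  tag names and formats supported by your version.
	</p>
	<pre>
&lt;head&gt;
  &lt;title&gt;What's new&lt;/title&gt;
  &lt;!-- Who should receive the reminder (hypothetical address) --&gt;
  &lt;meta name="htdig-email" content="author@example.com"&gt;
  &lt;!-- Subject line of the reminder message --&gt;
  &lt;meta name="htdig-email-subject" content="Links page needs review"&gt;
  &lt;!-- Date after which the reminder is sent (assumed month/day/year) --&gt;
  &lt;meta name="htdig-notification-date" content="12/31/2004"&gt;
&lt;/head&gt;
	</pre>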
	<hr noshade>
	<h3>
	<a href="htfuzzy.html">htfuzzy</a>
	</h3>
	<p>
	  To allow the searches to use &quot;fuzzy&quot; algorithms to match
	  words, the <a href="htfuzzy.html">htfuzzy</a> program can
	  create indexes for several different algorithms.
	</p>
	<hr size="4" noshade>

	Last modified: $Date: 2004/05/28 13:15:17 $

  </body>
</html>