summaryrefslogtreecommitdiffstats
path: root/debian/htdig/htdig-3.2.0b6/contrib/htparsedoc/README
blob: 4ec0f6ab21bba4973a06c8c4d906f06b59f189a4 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

>    Subject: htdig: HTDIG: Searching Word files
>         To: htdig@sdsu.edu
>       From: Richard Jones <rjones@imcl.com>
>       Date: Tue, 15 Jul 1997 12:44:03 +0100
>
> I'm currently trying to hack together a script to search
> Word files. I have a little program called `catdoc' (attached)
> which takes Word files and turns them into passable text files.
> What I did was write a shell script around this called
> `htparsedoc' (also attached) and add it as an external
> parser:
> 
>         --- /usr/local/lib/htdig/conf/htdig.conf ---
> 
>         # External parser for Word documents.
>         external_parsers:       "applications/msword"
> "/usr/local/lib/htdig/bin/htparsedoc"
> 
> This script produces output like this:
> 
>         t Word document http://annexia.imcl.com/test/comm.doc
>         w INmEDIA 1 -
>         w Investment 2 -
>         w Ltd 3 -
>         w Applications 4 -
>         w Subproject 5 -
>         w Terms 6 -
>         w of 7 -
>  [...]
>         w Needed 994 -
>         w Tbd 995 -
>         w Resources 996 -
>         w Needed 997 -
>         w Tbd 998 -
>         w i 1000 -
>