Transliteration Table README
----------------------------

1. Rationale: Identifiers in databases and programming languages
may only contain latin-1 letters, digits and the '_' character.

Application developers can enter captions (titles) using the full
Unicode set to give objects or variables meaningful names.

Transliteration is used to convert Unicode captions to identifiers
without losing the meaning of the names.

More info:
 http://en.wikipedia.org/wiki/Transliteration
 http://en.wikipedia.org/wiki/Romanization

2. We use a special kind of romanization, because only the characters
described in 1. are allowed.

3. Implementation: a transliteration table, generated by the
generate_transliteration_table.sh shell script, is used
to transliterate any Unicode character (having code < 65535)
to an identifier. A table lookup gives constant time for converting
a single character.

The generated code is kept in the transliteration_table.{h|cpp} files,
included by identifier.cpp for use in public utility functions.

For each item, the table (basically a table of C strings) contains one of:
- a NULL string if the resulting conversion has to be the "_" string;
- a C string of length 1 or more containing a valid transliteration
  as described in 1.;
- an empty string "" if the transliteration should return an empty string
  (can be useful e.g. for soft signs in Cyrillic).
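
The three kinds of table entries can be illustrated with a minimal sketch.
This is not the generated code: demo_table, transliterate_char() and
transliterate() are hypothetical names, the table has only four toy entries
indexed directly by character code, and returning "_" for codes outside
the table is an assumption made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy table, indexed directly by character code.
// The real generated table is much larger; its name and layout may differ.
static const char *const demo_table[] = {
    nullptr, // code 0: NULL entry, the conversion result is "_"
    "a",     // code 1: single-character transliteration
    "sh",    // code 2: multi-character transliteration
    "",      // code 3: empty entry, the character is dropped
};
static const std::size_t demo_table_size =
    sizeof(demo_table) / sizeof(demo_table[0]);

// Converts one character with a single array lookup (constant time).
std::string transliterate_char(unsigned int code)
{
    if (code >= demo_table_size)
        return "_"; // assumption: codes outside the table become "_"
    const char *entry = demo_table[code];
    if (!entry)
        return "_"; // NULL entry
    return entry;   // valid transliteration, possibly empty
}

// Converts a whole caption by concatenating the per-character results.
std::string transliterate(const std::u16string &caption)
{
    std::string result;
    for (char16_t ch : caption)
        result += transliterate_char(static_cast<unsigned int>(ch));
    return result;
}
```

With this toy table, a caption made of the codes 1, 2, 3, 0 is converted
to "ash_": the empty entry contributes nothing and the NULL entry
contributes "_".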

4. Fixes: Because the iconv/recode tools do not fully implement
transliteration to latin-1 (e.g. there is no good support
for Greek and Cyrillic/Serbian characters),
the transliteration_table.cpp file is patched with
transliteration_table.cpp.patch, which provides fixes written by hand.

If you find invalid or missing transliterations:
 a) edit transliteration_table.cpp (using a UTF-8-compliant text editor!)
   - if the transliteration_table.cpp file does not exist,
   extract it from the transliteration_table.bz2 archive
 b) run the update_transliteration_table_patch.sh shell script,
   which will update the transliteration_table.cpp.patch file
 c) send the transliteration_table.cpp.patch file to the Kexi team

5. Credits
 Jaroslaw Staniek <js at iidea.pl>
 Michael Drueing <michael at drueing.de>
 Chusslove Illich <caslav.ilic at gmx.net>
 Michal Svec <rebel at atrey.karlin.mff.cuni.cz>