summaryrefslogtreecommitdiffstats
path: root/kspread/DESIGN.html
blob: 4bfa7e8afaf249d984f791c3b0dd57ba8f0dd19e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
  <title>KSpread Development Notes</title>
</head>

<body>

<h1>KSpread Development Notes</h1>

<p>Maintainer: Ariya Hidayat (<a href="mailto:ariya@kde.org">ariya@kde.org</a>)</p>
<p>Some portions by Tomas Mecir (<a href="mailto: mecirt@gmail.com">mecirt@gmail.com</a>)</p>
<p>Revision: September 2004.</p>

<h2>Introduction</h2>

<p>This document contains information about internal structure of KSpread
as well as some notes of upcoming redesign. The sources for this document
are mainly the discussions which take place in koffice-devel mailing-list
and the source code itself.</p>

<h2>Document/View Architecture</h2>

<p>Status: IN PROGRESS.</p>

<p>MVC (Model/View/Controller) means that the application consists of three
big parts, the <i>Model</i> which holds the data structure and objects,
the <i>View</i> which shows the model to the user and the <i>Controller</i>
which handles user inputs and changes the model accordingly. Like other
office applications, KSpread uses the Document/View architecture, a slightly
different variant of MVC where the View and Controller are put together
as one part.</p>

<p>In order of its complexity scale, KSpread code has to be well separated,
i.e. the <i>Document</i> and the <i>View</i>. We may also call them as
<i>back-end</i> and <i>front-end</i> respectively. Right now part of which
should belong to the Document sometimes has access to the View. For example,
a cell stores information about its metrics in pixels (which is zoom dependent),
knows whether it is visible to the user or not (which is view dependent), etc.
This needs to be changed.</p>

<p>One easy way to decide whether some stuff or relationship must really really
alienated in the Document is to imagine that somebody wants to create another
View (front-end) to the Document object model (back-end) that is being worked
on. Say, one decent guy would like to copy the look-and-feel of classic Lotus
1-2-3 (for whatever reason we are not really interested in here); so basically
to some extent he can take most part of the KSpread back-end and glue a new
user interface around the code.</p>

<h2>Dependency Handling</h2>
<p>Status: IN PROGRESS.</p>

<p>When a cell holds a formula, then it is likely that it depends on other
cell(s) for calculating the result. For example, if cell A11 has the formula
&quot;=SUM(A1:A10)&quot;, this means that values in cells A1, A2, A3, until
A10 must be correctly calculated first before the sum can be obtained for
cell A11. This is called <i>dependency</i>.</p>

<p>As for now, KSpread tries to manage dependency by storing the dependent
cells or ranges in the cell itself. This is not too efficient. If a cell
is very simple, i.e. stored only value, not formula, such scheme will just
waste a couple of bytes of pointers for the dependency data structure.
It is much more wiser to simply create one <i>dependency manager</i> for each
worksheet; it should be responsible for maintaining and handling cell
dependencies for that sheet. Also KSpread always stores ranges which depend
on one particular cell and ranges whose one of its dependent is that cell
(and these are all in the cell structure itself). This is not necessary as
that information is redundant. The dependency manager should be able to handle
both cases.</p>

<p>Let us have a look at this simple example:</p>

<table cellspacing="0" cellpadding="3" border="1">
<tr>
  <td align="center">&nbsp;</td>
  <td align="center">A</td>
  <td align="center">B</td>
  <td align="center">C</td>
  <td align="center">D</td>
</tr>
<tr>
  <td align="center">1</td>
  <td align="right">14</td>
  <td align="right">36</td>
  <td align="right">&nbsp;</td>
  <td align="right">&nbsp;</td>
</tr>
<tr>
  <td align="center">2</td>
  <td align="right">3</td>
  <td align="right">&nbsp;</td>
  <td align="right">&nbsp;</td>
  <td align="right">&nbsp;</td>
</tr>
<tr>
  <td align="center">3</td>
  <td align="right">77</td>
  <td align="right">&nbsp;</td>
  <td align="right">&nbsp;</td>
  <td align="right">&nbsp;</td>
</tr>
<tr>
  <td align="center">4</td>
  <td align="right">=SUM(<b>A1:A3</b>)</td>
  <td align="right">=A4+SUM(<b>B1:B3</b>)</td>
  <td align="right">=100*<b>B4</b></td>
  <td align="right">&nbsp;</td>
</tr>
</table>

<p>Such sheet should produce dependencies like:</p>

<table cellspacing="0" cellpadding="3" border="1">
<tr>
  <td><b>Reference</b></td>
  <td><b>Dependent(s)</b></td>
</tr>
<tr>
  <td>A4</td>
  <td>A1:A3</td>
</tr>
<tr>
  <td>B4</td>
  <td>A4 and B1:B3</td>
</tr>
<tr>
  <td>C4</td>
  <td>B4</td>
</tr>
</table>

<p>When we want to recalculate cell B4, from the dependencies shown above we
may know that first we need to know values of cell A4 and range B1:B3. Further on,
cell A4 needs to know values of cells in range A1:A3. Therefore, <i>given one reference
cell</i> (e.g. B4), the dependency manager must be able to <i>return all
dependents, cells and/or ranges</i> (e.g. A4, B1:B3). Do we need to go
recursively when searching for dependencies? That really depends on the
implementation, but it is not a big problem, though.</p>

<p>In another case, say the user has changed cell A3 so we need to update the
calculation. We should not recalculate the whole sheet because it wastes time.
We just need to recalculate cells that depend on A3, in this case A4, B4 and C4.
So the dependency manager has another responsibility: <i>given a cell</i>
it should <i>find all cells and/or ranges which depend on that particular
cell</i>. It is a matter of iterating over all dependencies and checking
whether the cell is within the dependent(s) and returning the reference cell.
In this example, cell A3 is in the range A1:A3, a dependent range of cell A4.
Hence, we just return A4. Recursive or not, we can either continue finding
dependents of A4 or just stop here.</p>

<p>Note also that dependency manager should not store cell pointers, but rather
only the location of the cell (i.e. the sheet that owns the cell, row number and
column number). This is because on some cases the dependent cell may not exist
yet. As illustrated in the example, dependents of cell B4 are A4, B1, B2 and B3
but here cells B2 and B3 are still empty. Of course, when we just want to
know which cells we need to recalculate for one reference cell, the dependency
manager is allowed to return only non-empty cells (e.g. A4 and B1 in our case)
as empty cells have no effect and will not be recalculated anyway.</p>

<p>By the same manner, dependency manager can also held responsible when
chart comes into play. Any charts placed in the sheet (that are actually KChart
parts) depend on some values of the cells. An action by the user to changing
those cells, directly or indirectly, should trigger the update of the respective
charts.</p>

<p>Inter-sheet dependencies can be well handled if we store the owner of
each dependent. This is not shown yet in the explanation above to avoid
unnecessary complication. But let have one example now: if Sheet2!A1 is
&quot;=SUM(Sheet1!A1:A10)&quot; then changing Sheet1!A1 (the dependent)
means updating Sheet2!A1 (the reference). Of course during recalculation we
must take care that all sheets in the document must be processed, even though
only one single cell in one sheet has been changed.</p>

<p>Implementation-wise, there will be one instance of the dependency manager
for each sheet. This class will fully manage all dependencies and trigger cell
recalculation. An important part of this concept is this: The cell itself knows
<em>nothing</em> about dependencies, and it doesn't care about them either.
The cell will just inform about the fact, that its value has been changed,
and the dependency manager will do the rest. In addition, this gives us
recursive dependency calculation at almost no cost.</p>

<h2>Manipulators</h2>
<p>Status: PLANNED.</p>

<p>Currently, every operation on a cell or on a range of cells is quite complex.
You need to ensure correct repainting, recalculation, iterate on a range and so on.</p>

<p>To address this issue, manipulators shall be implemented. A manipulator will
implement one operation (formatting change, sequence fill, ..., ...).</p>

<p>Basically, usage of a manipulator should look like this:</p>

<p><pre>
Manipulator *manip = manipulatorManager::self()->getManip ("seqfill");
manip->setArgument ("type", 1);
... (more setArgument's)
manip->exec (selection);
</pre></p>

<p>That's all...</p>

<p>What concerns manipulator implementation, you'll derive from the base
manipulator and reimplement constructor and methods initialize()
(called just before the operation starts), processCell(), and maybe
done(). The constructor or initialize() would set some properties for
the cell-walking algorithm, and then it won't care about it anymore.
The base class will walk the range and call processCell() for each
cell, possibly creating it if it doesn't exist (if the manipulator
wants so).There will also be some methods that can be used to process
the whole range or row/column at once, if the manipulator wants to do so
(useful for, say, formatting manipulators that will be able to set attributes
of a whole range or row/col, in accordance with thoughts about format storage
below.</p>

<p>In addition, the manipulator can implement the undo/redo functionality
- the base manipulator will provide some common stuff needed to
accomplish this.</p>

<h2>Selection handling</h2>
<p>Status: PLANNED</p>

<p>The selection shall be an instance of some RangeList class,
or however we want to call it - this will contain a list of
cells/ranges/rows/whatever - like current selection, but will contain
more entries. This will allow easy implementation of CTRL-selections and so,
because thanks to manipulators, each operation will automatically support these.</p>

<h2>Repaint Triggering</h2>
<p>Status: PLANNED</p>

<p>As mentioned above, the interface between the core and the GUI needs to be kept
at minimum. Also, the number of repaints needs to be as low as possible, and repaints
should be groupped whenever possible. To achieve all this, the following approach
can be used:</p>

<p>When a cell is changed, it calls some method in KSpread::Sheet - valueChanged()
or formattingChanged(). These methods then trigger everything necessary, like
a call to the painting routine or dependency calculation.</p>

<p>This simple system would work on itself, but it would be slow. If you do
a sequence fill on A1:A1000 and you have a SUM(A1:A1000) somewhere, why would
you want to compute that SUM 1000 times, when you can simply compute it after
the sequence fill has been finished? Hence, the sheet will offer some more
methods - disableUpdates(), enableUpdates(), rangeListChanged() and
rangeListFormattingChanged(). All these will be used (solely?) by manipulators,
preferably by the base manipulator class, so that we don't have to call these
functions in each operation. After a call to disableUpdates(), there will
be no repainting and no dependency calculation. Note that a call to
enableUpdates() won't cause any repaints either, as the sheet cannot remember
all the calls (due to loss of range information). Hence, the base manipulator
class needs to call the correct rangeList*Changed method to trigger an
update in an effective way. The base manipulator needs to be configurable by
the manipulators that derive from it, so that it knows whether it changed
cell's content or formatting.</p>

<h2>Formula Engine</h2>
<p>Status: FINISHED.</p>

<p>This formula engine is just an expression evaluator. To offer better
performance, the expression is first compiled into byte codes which will
be executed later by a virtual machine.</p>

<p>Before compilation, the expression is separated into pieces, called tokens.
This step, which is also known as lexical analysis, takes places at once
and will produce sequence of tokens. They are however not stored and used only
for the purpose of the subsequent step.
Tokens are supplied to the parser, also known as syntax analyzer. In this
design, the parser is also a code generator. It involve the generation
of byte codes which represents the expression.
Evaluating the formula expression is now basically running the virtual
machine to execute compiled byte codes. No more scanning or parsing
are performed during evaluation, this saves time a lot.</p>

<p>The virtual machine itself (and of course the byte codes) are designed to be
as simple as possible. This is supposed to be stack-based, i.e. the virtual
machine has an execution stack of values which would be manipulated
as each byte code is processed. Beside the stack, there will be a list of
constant (sometimes also called as "constants pool") to hold Boolean,
integer, floating-point or string values. When a certain byte code needs
a constant for the operand, an index is specified which should be used
to look up the constant in the constants pool.</p>

<p>There are only few byte code, sufficient enough to perform calculation.
Yes, this is really minimalist but yet does the job fairly well.
The following provides brief description for each type of bytecode.</p>

<blockquote>

<p><i>Nop</i> means no operation.</p>

<p><i>Load</i> means loads a constant and push it to the stack. The constant can
be found at constant pools, at position by 'index', it could be a Boolean,
integer, floating-point or string value.</p>

<p><i>Ref</i> means gets a value from a reference. Member variable 'index' will
refers to a string value in the constant pools, i.e. the name of the reference.
Typically the reference is either a cell (e.g. A1), range of cells (A1:B10)
or possibly function name. Example: expression A2+B2 will be compiled as:<br>
Constants:<br>
#0: "A2"<br>
#1: "B2"<br>
Codes:<br>
Ref #0<br>
Ref #1<br>
Add
</p>

<p><i>Function</i>.
Example: expression "sin(x)" will be compiled as:<br>
Constants:<br>
#0: "sin"<br>
#1: "x"<br>
Codes:<br>
Ref #0<br>
Ref #1<br>
Function 1
</p>

<p><i>Neg</i> is a unary operator, a value is popped from stack and negated and then
pushed back to the stack. If it is not number (Boolean or string), it
will be converted first.</p>

<p><i>Add, Sub, Mul, Div and Pow</i> are binary operators, two values are popped from
stack and processed (added, subtracted, multiplied, divided, or power) and
the result is pushed to the stack.</p>

<p><i>Concat</i> is string operation, two values are popped from stack (and converted
to string if they are not string values), concatenated, and the result is
pushed to the stack.</p>

<p><i>Not</i> is a logical operation, a value is popped from stack and its Boolean
not is pushed into the stack. When it is not Boolean value, there will be
a cast.</p>

<p><i>Equal, Less, and Greater</i> are comparison operators, two values are
popped from stack and compared appropriately. The result, which is a Boolean
value, is pushed into the stack. To simplify, there no &quot;not equal&quot;
comparison because it can be regarded as &quot;equal&quot; followed by
&quot;not&quot; byte codes. Same goes for &quot;less than or equal to&quot; and
&quot;greater than or equal to&quot;.</p>

</blockquote>

<p>The expression scanner is based on finite state acceptor. The state denotes
the position of cursor, e.g. inside a cell token, inside an identifier, etc.
State transition is following by emitting the associated token to the
result buffer. Rather than showing the state diagrams here, it is much more
convenience and less complicated to browse the scanner source code and try
to follow its algorithm from there.</p>

<p>The parser is designed using one of bottom-up parsing technique, namely
based on Polish notation. Instead of ordering the tokens in suffix Polish
form, the parser (which is also the code generator) simply outputs
byte codes. In its operation, the parser requires the knowledge of operator
precedence to correctly translate unparenthesized infix expression and
thus requires the use of a syntax stack.</p>

<p>The parser algorithm is given as follows:</p>

<blockquote>
Repeat the following steps:<br>
Step 1: Get next token<br>
Step 2: If it is an identifier<br>
- push it to syntax stack<br>
- generated "Ref"<br>
Step 3: If it is a Boolean, integer, float or string value<br>
- push it to syntax stack<br>
- generated "Load"<br>
Step 4: If it is an operator<br>
- check for reduce rules<br>
<br>
- when no more rules applies, push token to the syntax stack<br>
</blockquote>

<p>The reduce rules are:</p>

<p>Rule A: <i>function argument</i>:
if token is semicolon or right parenthesis,
if syntax stack looks as:
<ul type="square">
<li>non-operator &lt;--- top</li>
<li>operator ;</li>
<li>non-operator</li>
<li>operator (</li>
<li>identifier</li>
</ul>
then reduce to
<ul type="circle">
<li>non operator</li>
<li>operator (</li>
<li>identifier</li>
<li>increase number of function arguments</li>
</ul>
</p>

<p>Rule B: last function argument<br>
if syntax stack looks as:<br>
<ul type="square">
<li>operator )</li>
<li>non-operator</li>
<li>operator (</li>
<li>identifier</li>
</ul>
then reduce to:<br>
<ul type="circle">
<li>non-operator</li>
<li>generated "Function" + number of function arguments</li>
</ul>
</p>

<p>Rule C: function without argument<br>
if syntax stack looks as:<br>
<ul type="square">
<li>operator )</li>
<li>operator (</li>
<li>identifier</li>
</ul>
then reduce to:<br>
<ul type="circle">
<li>non-operator (dummy)</li>
</ul>
</p>

<p>Rule D: parenthesis removal<br>
if syntax stack looks as:<br>
<ul type="square">
<li>operator (</li>
<li>non-operator</li>
<li>operator )</li>
</ul>
then reduce to:<br>
<ul type="circle">
<li>non-operator</li>
</ul>
</p>

<p>Rule E: binary operator<br>
if syntax stack looks as:<br>
<ul type="square">
<li>non-operator</li>
<li>binary operator</li>
<li>non-operator</li>
<li>and if the precedence of the binary operator in the syntax stack
is greater or equals to the precedence of token</li>
</ul>
then reduce to:<br>
<ul type="circle">
<li>non-operator</li>
<li>and generated appropriate byte code for the binary operator</li>
</ul>
</p>

<p>Rule F: unary operator<br>
if syntax stack looks as:<br>
<ul type="square">
<li>non-operator</li>
<li>unary operator</li>
<li>operator</li>
</ul>
then reduce to:<br>
<ul type="circle">
<li>operator</li>
<li>and generated "Neg" if unary operator is '-'</li>
</ul>
</p>

<p>Percent operator is a special case and not handled the above mentioned rule.
When the parser finds the percent operator, it checks whether there's a non-operator
token right before the percent. If yes, then the following code is generated:
<tt>load 0.01</tt> followed by <tt>multiply</tt>.</p>

<h2>Value</h2>

<p>Status: FINISHED.<br>
</p>

<p>to be written.</p>

<h2>Commands Based on KCommand<br>
</h2>

<p>Status: IN PROGRESS.</p>

<p>Until lately, to implement undo and redo, KSpread creates corresponding
KSpreadUndo classes for each action and runs them when the user undoes
those actions. KSpreadUndo also has redo function whose job is to redo
again the action after being undone.</p>

<p>All this needs to be converted to manipulators - these will be KCommand,
hence we should be able to undo/redo every operation (provided that the
corresponding manipulator provides methods to store/recall the undo information).</p>

<h2>Cell Storage</h2>
<p>Status: PLANNED.</p>

<p>Cells are grouped together, and then hashed.</p>

<h2>Format Storage</h2>
<p>Status: PLANNED.</p>

<p>Formatting specifies how a cell should look like. It involves font
attributes like bold or italics, vertical and horizontal alignment,
rotation angle, shading, background color and so on. Each cell can have
its own format, but bear also in mind that a whole row or column format
should also apply.</p>

<p>Current way of storing formatting information is rather inefficient:
pack it together inside the cell. The reason is because most of cells
are either very plain (no formatting) and/or only have partial attribute
(e.g. only marked as bold, no font family or color is specified).
Therefore the approximately 20 bytes used to hold formatting information
are quite a waste of memory. Even worse, this requires that the cell
must exist even if it is not in use. As illustration, imagine a worksheet
where within range A1:B20 only 5 cells are not empty. When the user
selects this range and changes the background color to yellow, then
those 5 cells must store this information in their data structure but
how about the other 35 cells? Since the formatting is attached to the cell,
there is no choice but to create them. Doing this, just for the sake of
storing format, is actually not really good.</p>

<p>A new way to store formatting information is proposed below.</p>

<p>For each type of format a user can use, we have the corresponding
<i>formatting piece</i>, for example &quot;bold on&quot;, &quot;bold off&quot;,
&quot;font Arial&quot, &quot;left-border 1 px&quot;, etc. Whenever the user
applies formatting to a range (could also be a whole column, row, or worksheet),
we save appropriate respective formatting piece in a stack. Say the user has
marked column B as bold, row 2 as italics, and set range A1:C5 with
yellow background color. Our formatting stack would look like:
</p>

<table cellspacing="0" cellpadding="3" border="1">
<tr>
  <td><b>Range</b></td>
  <td><b>Formatting Piece</b></td>
</tr>
<tr>
  <td>Column B</td>
  <td>Bold on</td>
</tr>
<tr>
  <td>Row 2</td>
  <td>Italics on</td>
</tr>
<tr>
  <td>A1:C5</td>
  <td>Yellow background</td>
</tr>
</table>

<p>Now let try to figure out the overall format of cell B2. From the first
we know it should be bold, from the second it should be italics, and last
it should have yellow background. This complete format is the one which
we used to render cell B2 on screen. In similar fashion, we can know that
cell A1 on the other only specifies the yellow background, because the first
and second pieces do not apply there.</p>

<p>Another possible way to see the format storage is by using 3-D perspective.
For each formatting piece, imagine there is a surface which covers the
formatted range (the xy-plane). The formatting information is simply attached
to the surface (say, as surface attribute). Every surface is stacked together,
its depth (the z-axis) denotes the sequence, i.e. the first surface is the
deepest. For the example above, we can view the pieces as one surface specifies
&quot;bold on&quot; which is a vertical of column B, one surface specifies
&quot;italics on&quot; which is a horizontal band of row 2 and one last surface
which specifies &quot;yellow background&quot; stretched in the range A1:C5.
How to find complete format for A2? This is now a matter of surface
determination. Traversing in the direction of z-axis from B2 reveals
that we hit the last and second surfaces only; thereby we can know the complete
format is &quot;italics, yellow background&quot;.</p>

<p>It is clear that a format storage corresponds to one sheet only. For each
sheet, there should be one format storage. Cells can still have accessors to
its formatting information, these simply become wrapper for proper calls to the
format storage. Since each formatting piece holds information about the applied
range, we must take care that the formatting storage is correctly updated
whenever cells, rows or columns are inserted and deleted.</p>

<p>In order to avoid low performance, we must use a smart way to iterate over
all formatting pieces whenever we want to find out complete format for given
cell(s). When the sheet gets very complex, it is likely that we will have
many many formatting pieces that are not even overlap. This means, when we
need formatting of cell A1, it is no use to check formatting pieces of range
Z1:Z10 or A100:B100 which do not include cell A1 and are even very far from A1.
One possible solution is to arrange the formatting pieces in a quad-tree.
Because one piece can cover a very large area, it is possible that it will
be in more than one leaf in the quad-tree. <i>Details on the possible use of
quad-tree or other methods should be explored further more</i>.</p>

<h2>Default Toolbars</h2>
<p>Status: IN PROGRESS.</p>

<p>Relevant mailing-list threads:</p>

<ul>
<li><a href="http://lists.kde.org/?l=kde-usability&m=110751297906231&w=2">
http://lists.kde.org/?l=kde-usability&amp;m=110751297906231&amp;w=2</a></li>
<li><a href="http://lists.kde.org/?l=koffice-devel&m=110723903332496&w=2">
http://lists.kde.org/?l=koffice-devel&amp;m=110723903332496&amp;w=2</a></li>
</ul>

<p>Toolbars are utilized to place most frequently used actions. It is 
important to present the user with default toolbars which make 
sense, i.e. they do not contain unnecessary buttons. In-depth usability 
analysis and/or further discussions are needed to make decision which 
buttons need to be in the toolbar and which don't.</p>

<p>For reference, here is a list of default shown toolbars in 
some spreadsheet applications:</p>

<p><b>Microsoft Excel 2002</b>:</p>

<ul>
<li><em>Standard toolbar</em>: New, Open, Save, Search, Print, Print Preview, 
Spelling, Cut, Copy, Paste, Format Painter, Undo, Redo, Insert Hyperlink, 
Autosum, Sort Ascending, Sort Descending, Chart Wizard, Drawing, Zoom, Help</li>
<li><em>Formating toolbar:</em>: Font, Font Size, Bold, Italic, Align Left, 
Align Center, Align Right, Merge and Center, Format Currency, Format Percent, 
Comma Style, Increase Precision, Decrease Precision, Increase Indent, 
Decrease Indent, Merge Cells, Unmerge Cells, Merge Across, Borders, 
Fill Color, Font Color</li>
</ul>


<p><b>OpenOffice 2.0</b>:</p>

<ul>
<li><em>Standard toolbar</em>: New, Open, Save, E-mail, Edit, PDF, Print, 
Page Preview, Spellcheck, Cut, Copy, Paste, Format Paintbrush, Undo, Redo, 
Hyperlink, Sort Ascending, Sort Descending, Insert Chart, Navigator, Styles, 
Gallery, Zoom</li>
<li><em>Format Object toolbar:</em> Font Name, Font Size, Bold, Italic, 
Underline, Font Color, Align Left, Align Center, Align Right, Justified, 
Merge Cells, Format as Currency, Format as Percent, Format Standard, 
Increase Precision, Decrease Precision, Increase Indent, Decrease Indent,
Borders, Line Style, Line Color, Background Color, Align Top, Align Center 
Vertically, Align Bottom </li>
</ul>

<p><b>Gnumeric 1.4</b>:</p>

<ul>
<li><em>Standard toolbar</em>: New, Open, Save, Print, Print Preview, 
Cut, Copy, Paste, Undo, Redo, Hyperlink, Autosum, Function/Formula, 
Insert Chart, Zoom</li>
<li><em>Format toolbar:</em> Font Family, Font Size, Bold, Italic, 
Underline, Align Left, Align Center, Align Right, Merge and Center, 
Merge Cells, Split Cell, Format Currency, Format Percent, Thousand Separator, 
Increase Precision, Decrease Precision, Decrease Indent, Increase Indent, 
Borders, Background, Foreground</li>
</ul>

<h2>Test Framework</h2>
<p>Status: IN PROGRESS.</p>

<p>Relevant mailing-list threads:</p>


<ul>
<li><a href="http://lists.kde.org/?l=kde-cvs&m=109511244721480&w=2">
http://lists.kde.org/?l=kde-cvs&amp;m=109511244721480&amp;w=2</a></li>
<li><a href="http://lists.kde.org/?l=koffice-devel&m=109518196922944&w=2">
http://lists.kde.org/?l=koffice-devel&amp;m=109518196922944&amp;w=2</a></li>
</ul>

<p>It is well known that writing clean and easily understandable module will lead 
to better maintenance. However testing that particular module everytime there is 
a significant change requires considerable amount of time and effort. Since KSpread 
and other applications of its scale consist of hundreds of modules, in this case 
automatic testing of each module will help a lot, not to mention that it might 
catch bug as early as possible.</p>

<p>KSpread has a simple test framework to facilitate such kind of test. This can 
be activated using the shortcut <tt>Ctrl+Shift+T</tt>. This test is however 
not accessible via menu, because it is intended to be used only by the developers.
Ideally, there should be tests for all modules contained in KSpread. It is 
the responsibility of the developer to create the corresponding <tt>tester</tt> 
for the code that he or she is working on. All tests should be kept in 
<tt>koffice/kspread/tests/</tt>.</p>

<p>Making a new tester is not difficult. The easiest way is to copy an already 
existing tester and modify it. Basically, it must be a subclass of class Tester 
(see <tt>koffice/kspread/tests/tester.h</tt>). Just reimplement the virtual 
function <tt>run()</tt> and it is ready. In order to make it possible to run 
the new tester, add an instance of the class in TestRunner 
(for details, see <tt>koffice/kspread/tests/testrunner.cpp</tt>).</p>

<p>A tester must be self-contained, it should not use any test data from 
current document. If necessary, it must create (or hard code) the data by 
itself.</p>

<p>Whenever parts of KSpread features are improved or rewritten, it is always 
a good idea to run the related tests to ensure that all the changes do not do 
any harm. However, bear in mind that there is no 100% guarantee that the new 
code is bug-free.</p>

<p>Also, if there is a bug which is not caught by the tester (i.e. it does not 
fail the tester, but the bug is confirmed), then the relevant tester must be 
modified to include one or more test cases similar to the offending bug. 
When the bug is finally fixed, from that point the test should always pass all 
test cases.</p>

</p>

<h2>Coding Style</h2>
<p>Status: IN PROGRESS.</p>

<p>(to be written in details).</p>

<p>Write <b>clean code</b>. To be correct is better than to be fast.
KSpread source code is known to grow very fast in its early days and but later
on also more difficult to understand.</p>

<p>Put comment as documentation for classes and member functions. There is still
lack of documentation as for now, whoever understands something about the
classes and functions should write the documentation.</p>

<p>In complex source files, list of header includes can be very long. Unless 
there is special reason not do it, try to group them together, i.e. standard 
C/C++ headers come first, followed by TQt headers, and then KDE headers, 
KOffice core/UI headers and application specific headers. For each group, 
sort the header files alphabetically. </p>

<p>Write test cases. This will ease further maintenance. See also the section on Test 
Framework above.</p>

<p>Do not use the term <i>table</i>. It was incorrectly invented quite likely
because of the term <i>Tabelle</i> (German, literally means table). The correct
term is <i>sheet</i> or <i>worksheet</i>. The English version of Microsoft
uses <i>sheet</i> while the German version uses <i>Tabelle</i>.</p>

<p>Use <a href="http://developer.kde.org/documentation/library/kdeqt/trinityarch/devel-binarycompatibility.htm">d-pointer</a> trick (also known pimpl) whenever possible. Such practice will help when later on
we want to expose the API and need to maintain binary compatibility. But the
most important thing is to separate the interface and the implementation.
Furthermore, build time is reduced since modification on the implementation
would not cause tons of recompile.</p>

<p>When creating a new class, use namespace KSpread. Do not use KSpread prefix
anymore. Example: use <tt>KSpread::Foo</tt> instead of <tt>KSpreadFoo</tt>.
Also source file name should not contain kspread prefix anymore, i.e.
<tt>foo.h</tt> and <tt>foo.cpp</tt> (but not <tt>kspread_foo.h</tt> and
<tt>kspread_foo.cpp</tt>) for the above example.</p>

</body></html>