README
The originals of these files come from svn checkout
http://source.icu-project.org/repos/icu/icu/trunk/source/data/brkitr
They no longer appear in the icu tarballs, but are in icu's svn.

dict_word is used for dictionary word break, edit_word is for cursor
travelling, while count_word is for word count.

At various stages these copies have been customized and are now horribly out of
sync. It is unclear which diffs from the base versions are deliberate and which
are now accidental :-(

The various issues and customizations have been reviewed, with tests written for
customizations that are still relevant. However, these files are still extremely
out-of-date and need to be refreshed. Relevant customizations should be reapplied
on top of a current version.
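
When refreshing, a plain unified diff of a local copy against its upstream
counterpart is one way to see which local changes would need to be reapplied.
The following is a minimal sketch only: the function name rule_diff, the file
label rules.txt and the sample rule text are made up for illustration and are
not taken from the actual rule files.

```python
import difflib


def rule_diff(upstream_text, local_text, name="rules.txt"):
    """Return unified-diff lines between an upstream rule file and a local copy.

    'name' is only a label for the diff header (hypothetical file name).
    """
    return list(difflib.unified_diff(
        upstream_text.splitlines(),
        local_text.splitlines(),
        fromfile="upstream/" + name,
        tofile="local/" + name,
        lineterm="",
    ))


# Illustrative placeholder content, not real rules from this directory.
upstream = "$Letter = [a-z];\n$Letter+;\n"
local = "$Letter = [a-z];\n$MidLetter = [\\-];\n$Letter+;\n"

for line in rule_diff(upstream, local):
    print(line)
```

Lines starting with "+" or "-" in the output are the candidates to review; each
one still has to be matched by hand against the issues listed below.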

done, regression tests added:

#112623# update Japanese word breakiterator dictionary
#i50172# add cell breakiterator rule for Tamil
#i80412# indic cursoring
#i107843# em-dash/en-dash breakiterator fix for spell checking
#i103552# Japanese word for 'shutdown' added to ja.dic
#i113785# ligatures for spell checking will no longer break words
An opening quote should not be counted as a word by word count tool (regression test in writer)
fdo#31271 wrong line break with (
#i89042# word count fix (regression test is in writer)
#i58513# add break iterator rules for Finnish
#i19716# fix wrong line break on bracket characters
#i21290# extend Greek script type
#i21907# fix isBeginWord and isEndWord problem
#i85411# Apply patch for ZWSP
#i17155# fix line breakiterator rule to make slash and hyphen as part of word when doing line break
#i13451# add '-' as midLetter for Catalan dictionary word breakiterator
#i13494# fix word breakiterator rule to handle punctuations and signs correctly
#i29548# Fix Thai word breakiterator problem
#i11993# #i14904# fix word breakiterator issues
#i64400# dash/hyphen should not break words (de/nds/nl/sv)
#i22602# make dot stick on beginning of a word when doing line break
#i24098# skip preceding space for beginOfSentence
#i24098# fix beginOfSentence, which did not work correctly when cursor is on the beginning of the sentence
#i51661# add quotation mark as middle letter for Hebrew in word breakiterator rule.
#i50172# add cell breakiterator rule for Tamil
#i55778# reverse back last change, treat letter and number combination as one word.
#i56347# apply patch to recognize suffixes of numbers in Hungarian spellchecking
#i56348# add Hungarian word break rule for edit mode
#i65267# fix line break rule
#i86439# many changes to implement, tweak, debug UTF-16 surrogate pair handling
#i75631# "
#i75632# "
#i75633# "
#i75412# "
#i80645# fix backslash issues in line breakiterator
#i80841# fix hyphen line break problem
#i81448# fixed dot line break issue
#i81448# fix the problem of line break on punctuations (commit message says i81440)
#i81448# fix problem of line break on symbols
#i83649# fixed the problem of line break between quotation mark and open bracket
#i83464# fix the problem of line break between letter and 1326
b6634800# fix line break problem of dot after letter and before number
#i83229# fix the problem of leading hyphen for numbers
#i80815# count words like MS Word
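
As background for the surrogate pair entries above (#i86439# and the issues
marked with it): a character outside the BMP takes two UTF-16 code units, and a
break position must not fall between them. The sketch below only illustrates
that constraint. It is not taken from the rule files or the break iterator
code, and the function names are hypothetical.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def splits_surrogate_pair(units, offset):
    """True if a boundary at offset falls between a high and a low surrogate."""
    if offset <= 0 or offset >= len(units):
        return False
    high = 0xD800 <= units[offset - 1] <= 0xDBFF
    low = 0xDC00 <= units[offset] <= 0xDFFF
    return high and low


def safe_offsets(text):
    """All UTF-16 offsets in text that do not split a surrogate pair."""
    units = utf16_units(text)
    return [i for i in range(len(units) + 1)
            if not splits_surrogate_pair(units, i)]


# 'a', U+1F600 (encoded as the pair D83D DE00), 'b'
sample = "a\U0001F600b"
print(safe_offsets(sample))  # [0, 1, 3, 4]
```

Offset 2 is missing from the result because it lies between the two halves of
the pair.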

likely superseded:

#i21392# Obscure line break behavior mismatch in string of symbols between MSO and LO.
#i80548# "fix dash issues in line breakiterator" - fix no longer works
#i72868# "fix Chinese punctuation for line breakiterator" - fix no longer works
#i80891# "fix Chinese punctuation for line breakiterator" - fix no longer works

#i27711# Adding/tweaking/removing languages later added to ICU.
#i33756# "
#i41671# "
#i41671# "
#i55063# "
#i24850# ICU upgrades, internal bug fixes, or other work-arounds.
#i24098# "
#112772# "
#i35285# "
4a1f1586173839d532f90507c72306bc9e2aec56 "
a10b0e70c641d7438c557ef718c6942b3abffaec "
05fadde6f025bcaafca4f3093e88be3cc1bb6836 "
#i57866# "
#i57866# "
#i69482# "
#142664# "
#i60645# "
#i53388# "
#i60645# "
#i78393# "
#i73903# "
#i75412# "
#i72589# "
#i75319# "
#i76706# "
#i64400# "
#i64400# "
#i79148# "
#i55063# "
#i87530# "
#i88041# "
#i88411# "
#i80923# "
#i80923# "
#i81519# "


suspect:


- The intentions behind the following commits are unclear, as the referenced bugs were in the
StarOffice internal bug tracker. These changes are contemporaneous with TR14 Revision 17, and seem
to be part of an effort to backport upstream rule changes across multiple language customizations.

commit 746ea3d8c29b27b23af3433446f66db0ad3096d6
Author: Oliver Bolte <obo@openoffice.org>
Date: Tue Jan 11 10:19:26 2005 +0000

    INTEGRATION: CWS i18n15 (1.2.20); FILE MERGED
    2004/09/04 02:03:53 khong 1.2.20.1: #117685# make dictionary word contain only letter or only number, dot can be in middle or end of a word, but only one.

commit a31a26ce1a9c7f63e354836fd9e1282b6a5063a1
Author: Oliver Bolte <obo@openoffice.org>
Date: Tue Jan 11 10:19:07 2005 +0000

    INTEGRATION: CWS i18n15 (1.2.58); FILE MERGED
    2004/09/04 02:03:53 khong 1.2.58.1: #117685# make dictionary word contain only letter or only number, dot can be in middle or end of a word, but only one.

commit f7babbe5ffcae9d60ab5e547887a0ccc453c2bcb
Author: Oliver Bolte <obo@openoffice.org>
Date: Tue Jan 11 10:18:51 2005 +0000

    INTEGRATION: CWS i18n15 (1.3.36); FILE MERGED
    2004/09/04 02:03:53 khong 1.3.36.1: #117685# make dictionary word contain only letter or only number, dot can be in middle or end of a word, but only one.


- The intention behind the following commit is unclear, as the bug references are incorrect and no
good candidates were immediately apparent. Based on the text of the commit, however, it appears to
be a simple bug fix for skipSpace(). This function has also had a great deal of churn since this
commit, further suggesting it is no longer pertinent.

commit 1967d8fb182b3101dee4f715e78be384400bc1e8
Author: Kurt Zenker <kz@openoffice.org>
Date: Wed Sep 5 16:37:28 2007 +0000

    INTEGRATION: CWS i18n37 (1.22.6); FILE MERGED
    2007/09/03 18:27:39 khong 1.22.6.2: i8132 fixed a problem in skipping space for word breakiterator
    2007/08/31 21:30:30 khong 1.22.6.1: i81158 fix skipping space problem