Commit | Line | Data |
---|---|---|
c906108c | 1 | /* Include file cached obstack implementation. |
c2d11a7d JM |
2 | Written by Fred Fish <fnf@cygnus.com> |
3 | Rewritten by Jim Blandy <jimb@cygnus.com> | |
af5f3db6 | 4 | |
b811d2c2 | 5 | Copyright (C) 1999-2020 Free Software Foundation, Inc. |
c906108c | 6 | |
c5aa993b | 7 | This file is part of GDB. |
c906108c | 8 | |
c5aa993b JM |
9 | This program is free software; you can redistribute it and/or modify |
10 | it under the terms of the GNU General Public License as published by | |
a9762ec7 | 11 | the Free Software Foundation; either version 3 of the License, or |
c5aa993b | 12 | (at your option) any later version. |
c906108c | 13 | |
c5aa993b JM |
14 | This program is distributed in the hope that it will be useful, |
15 | but WITHOUT ANY WARRANTY; without even the implied warranty of | |
16 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
17 | GNU General Public License for more details. | |
c906108c | 18 | |
c5aa993b | 19 | You should have received a copy of the GNU General Public License |
a9762ec7 | 20 | along with this program. If not, see <http://www.gnu.org/licenses/>. */ |
c906108c SS |
21 | |
22 | #ifndef BCACHE_H | |
23 | #define BCACHE_H 1 | |
24 | ||
c2d11a7d JM |
25 | /* A bcache is a data structure for factoring out duplication in |
26 | read-only structures. You give the bcache some string of bytes S. | |
27 | If the bcache already contains a copy of S, it hands you back a | |
28 | pointer to its copy. Otherwise, it makes a fresh copy of S, and | |
29 | hands you back a pointer to that. In either case, you can throw | |
30 | away your copy of S, and use the bcache's. | |
31 | ||
32 | The "strings" in question are arbitrary strings of bytes --- they | |
33 | can contain zero bytes. You pass in the length explicitly when you | |
34 | call the bcache function. | |
35 | ||
36 | This means that you can put ordinary C objects in a bcache. | |
37 | However, if you do this, remember that structs can contain `holes' | |
38 | between members, added for alignment. These bytes usually contain | |
39 | garbage. If you try to bcache two objects which are identical from | |
40 | your code's point of view, but have different garbage values in the | |
41 | structure's holes, then the bcache will treat them as separate | |
42 | strings, and you won't get the nice elimination of duplicates you | |
43 | were hoping for. So, remember to memset your structures full of | |
44 | zeros before bcaching them! | |
45 | ||
46 | You shouldn't modify the strings you get from a bcache, because: | |
47 | ||
48 | - You don't necessarily know who you're sharing space with. If I | |
49df298f AC |
49 | stick eight bytes of text in a bcache, and then stick an eight-byte |
50 | structure in the same bcache, there's no guarantee those two | |
51 | objects don't actually comprise the same sequence of bytes. If | |
52 | they happen to, the bcache will use a single byte string for both | |
53 | of them. Then, modifying the structure will change the string. In | |
54 | bizarre ways. | |
c2d11a7d JM |
55 | |
56 | - Even if you know for some other reason that all that's okay, | |
49df298f AC |
57 | there's another problem. A bcache stores all its strings in a hash |
58 | table. If you modify a string's contents, you will probably change | |
59 | its hash value. This means that the modified string is now in the | |
60 | wrong place in the hash table, and future bcache probes will never | |
61 | find it. So by mutating a string, you give up any chance of | |
62 | sharing its space with future duplicates. | |
63 | ||
64 | ||
65 | Size of bcache VS hashtab: | |
66 | ||
67 | For bcache, the most critical cost is size (or more exactly the | |
68 | overhead added by the bcache). It turns out that the bcache is | |
69 | remarkably efficient. | |
70 | ||
71 | Assuming a 32-bit system (the hash table slots are 4 bytes), | |
72 | ignoring alignment, and limit strings to 255 bytes (1 byte length) | |
73 | we get ... | |
74 | ||
75 | bcache: This uses a separate linked list to track the hash chain. | |
76 | The numbers show roughly 100% occupancy of the hash table and an | |
77 | average chain length of 4. Spreading the slot cost over the 4 | |
78 | chain elements: | |
79 | ||
80 | 4 (slot) / 4 (chain length) + 1 (length) + 4 (chain) = 6 bytes | |
81 | ||
82 | hashtab: This uses a more traditional re-hash algorithm where the | |
83 | chain is maintained within the hash table. The table occupancy is | |
84 | kept below 75% but we'll assume its perfect: | |
85 | ||
86 | 4 (slot) x 4/3 (occupancy) + 1 (length) = 6 1/3 bytes | |
87 | ||
88 | So a perfect hashtab has just slightly larger than an average | |
89 | bcache. | |
90 | ||
91 | It turns out that an average hashtab is far worse. Two things | |
92 | hurt: | |
93 | ||
94 | - Hashtab's occupancy is more like 50% (it ranges between 38% and | |
95 | 75%) giving a per slot cost of 4x2 vs 4x4/3. | |
96 | ||
97 | - the string structure needs to be aligned to 8 bytes which for | |
98 | hashtab wastes 7 bytes, while for bcache wastes only 3. | |
99 | ||
100 | This gives: | |
101 | ||
102 | hashtab: 4 x 2 + 1 + 7 = 16 bytes | |
103 | ||
104 | bcache 4 / 4 + 1 + 4 + 3 = 9 bytes | |
105 | ||
106 | The numbers of GDB debugging GDB support this. ~40% vs ~70% overhead. | |
107 | ||
108 | ||
109 | Speed of bcache VS hashtab (the half hash hack): | |
110 | ||
111 | While hashtab has a typical chain length of 1, bcache has a chain | |
112 | length of round 4. This means that the bcache will require | |
113 | something like double the number of compares after that initial | |
114 | hash. In both cases the comparison takes the form: | |
115 | ||
116 | a.length == b.length && memcmp (a.data, b.data, a.length) == 0 | |
117 | ||
118 | That is lengths are checked before doing the memcmp. | |
119 | ||
120 | For GDB debugging GDB, it turned out that all lengths were 24 bytes | |
121 | (no C++ so only psymbols were cached) and hence, all compares | |
122 | required a call to memcmp. As a hack, two bytes of padding | |
123 | (mentioned above) are used to store the upper 16 bits of the | |
124 | string's hash value and then that is used in the comparison vis: | |
125 | ||
126 | a.half_hash == b.half_hash && a.length == b.length && memcmp | |
127 | (a.data, b.data, a.length) | |
128 | ||
129 | The numbers from GDB debugging GDB show this to be a remarkable | |
130 | 100% effective (only necessary length and memcmp tests being | |
131 | performed). | |
132 | ||
133 | Mind you, looking at the wall clock, the same GDB debugging GDB | |
134 | showed only marginal speed up (0.780 vs 0.773s). Seems GDB is too | |
135 | busy doing something else :-( | |
136 | ||
137 | */ | |
c2d11a7d | 138 | |
dfb65191 CB |
139 | namespace gdb { |
140 | ||
25629dfd | 141 | struct bstring; |
af5f3db6 | 142 | |
25629dfd TT |
143 | struct bcache |
144 | { | |
145 | /* Allocate a bcache. HASH_FN and COMPARE_FN can be used to pass in | |
146 | custom hash, and compare functions to be used by this bcache. If | |
4cbd39b2 | 147 | HASH_FUNCTION is NULL fast_hash() is used and if COMPARE_FUNCTION is |
25629dfd TT |
148 | NULL memcmp() is used. */ |
149 | ||
150 | explicit bcache (unsigned long (*hash_fn)(const void *, | |
151 | int length) = nullptr, | |
152 | int (*compare_fn)(const void *, const void *, | |
153 | int length) = nullptr) | |
4cbd39b2 | 154 | : m_hash_function (hash_fn == nullptr ? default_hash : hash_fn), |
25629dfd TT |
155 | m_compare_function (compare_fn == nullptr ? compare : compare_fn) |
156 | { | |
157 | } | |
158 | ||
159 | ~bcache (); | |
160 | ||
161 | /* Find a copy of the LENGTH bytes at ADDR in BCACHE. If BCACHE has | |
162 | never seen those bytes before, add a copy of them to BCACHE. In | |
163 | either case, return a pointer to BCACHE's copy of that string. | |
164 | Since the cached value is ment to be read-only, return a const | |
165 | buffer. If ADDED is not NULL, set *ADDED to true if the bytes | |
166 | were newly added to the cache, or to false if the bytes were | |
167 | found in the cache. */ | |
168 | ||
169 | const void *insert (const void *addr, int length, int *added = nullptr); | |
170 | ||
171 | /* Print statistics on this bcache's memory usage and efficacity at | |
172 | eliminating duplication. TYPE should be a string describing the | |
173 | kind of data this bcache holds. Statistics are printed using | |
174 | `printf_filtered' and its ilk. */ | |
175 | void print_statistics (const char *type); | |
176 | int memory_used (); | |
177 | ||
178 | private: | |
179 | ||
180 | /* All the bstrings are allocated here. */ | |
181 | struct obstack m_cache {}; | |
182 | ||
183 | /* How many hash buckets we're using. */ | |
184 | unsigned int m_num_buckets = 0; | |
185 | ||
186 | /* Hash buckets. This table is allocated using malloc, so when we | |
187 | grow the table we can return the old table to the system. */ | |
188 | struct bstring **m_bucket = nullptr; | |
189 | ||
190 | /* Statistics. */ | |
191 | unsigned long m_unique_count = 0; /* number of unique strings */ | |
192 | long m_total_count = 0; /* total number of strings cached, including dups */ | |
193 | long m_unique_size = 0; /* size of unique strings, in bytes */ | |
194 | long m_total_size = 0; /* total number of bytes cached, including dups */ | |
195 | long m_structure_size = 0; /* total size of bcache, including infrastructure */ | |
196 | /* Number of times that the hash table is expanded and hence | |
197 | re-built, and the corresponding number of times that a string is | |
198 | [re]hashed as part of entering it into the expanded table. The | |
199 | total number of hashes can be computed by adding TOTAL_COUNT to | |
200 | expand_hash_count. */ | |
201 | unsigned long m_expand_count = 0; | |
202 | unsigned long m_expand_hash_count = 0; | |
203 | /* Number of times that the half-hash compare hit (compare the upper | |
204 | 16 bits of hash values) hit, but the corresponding combined | |
205 | length/data compare missed. */ | |
206 | unsigned long m_half_hash_miss_count = 0; | |
207 | ||
208 | /* Hash function to be used for this bcache object. */ | |
209 | unsigned long (*m_hash_function)(const void *addr, int length); | |
210 | ||
211 | /* Compare function to be used for this bcache object. */ | |
212 | int (*m_compare_function)(const void *, const void *, int length); | |
213 | ||
214 | /* Default compare function. */ | |
215 | static int compare (const void *addr1, const void *addr2, int length); | |
216 | ||
4cbd39b2 CB |
217 | /* Default hash function. */ |
218 | static unsigned long default_hash (const void *ptr, int length) | |
219 | { | |
220 | return fast_hash (ptr, length, 0); | |
221 | } | |
222 | ||
25629dfd TT |
223 | /* Expand the hash table. */ |
224 | void expand_hash_table (); | |
225 | }; | |
226 | ||
dfb65191 CB |
227 | } /* namespace gdb */ |
228 | ||
c906108c | 229 | #endif /* BCACHE_H */ |