Introduction
============

The more-sophisticated device-mapper targets require complex metadata
that is managed in the kernel. In late 2010 we saw that various
targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design by Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets. It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks. There
is a read/write locking interface that prevents concurrent accesses
and keeps data that is being used in the cache.

Clients of persistent-data are unlikely to use this directly.

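The locking idea can be illustrated with a small userspace sketch.
This is not the kernel interface: every name below (toy_bm,
toy_bm_read_lock and so on) is made up for illustration, the blocks
live in memory rather than on disk, and a conflicting lock request
fails instead of waiting. It assumes conventional reader/writer
semantics, i.e. many readers or one writer per block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096	/* every block has the same, fixed size */
#define TOY_NR_BLOCKS  8

/* Hypothetical in-memory stand-in for a cached, lockable block. */
struct toy_block {
	int readers;		/* number of read locks currently held */
	int writer;		/* 1 while a write lock is held */
	uint8_t data[TOY_BLOCK_SIZE];
};

struct toy_bm {
	struct toy_block blocks[TOY_NR_BLOCKS];
};

void toy_bm_init(struct toy_bm *bm)
{
	memset(bm, 0, sizeof(*bm));
}

/* Take a shared lock; fails (-1) if the block is write locked. */
int toy_bm_read_lock(struct toy_bm *bm, uint64_t b, const uint8_t **data)
{
	struct toy_block *blk;

	if (b >= TOY_NR_BLOCKS)
		return -1;
	blk = &bm->blocks[b];
	if (blk->writer)
		return -1;
	blk->readers++;
	*data = blk->data;
	return 0;
}

/* Take an exclusive lock; fails (-1) if anyone holds the block. */
int toy_bm_write_lock(struct toy_bm *bm, uint64_t b, uint8_t **data)
{
	struct toy_block *blk;

	if (b >= TOY_NR_BLOCKS)
		return -1;
	blk = &bm->blocks[b];
	if (blk->writer || blk->readers)
		return -1;
	blk->writer = 1;
	*data = blk->data;
	return 0;
}

/* Drop one lock on the block; unlocking an unlocked block is a bug. */
void toy_bm_unlock(struct toy_bm *bm, uint64_t b)
{
	struct toy_block *blk;

	assert(b < TOY_NR_BLOCKS);
	blk = &bm->blocks[b];
	assert(blk->writer || blk->readers > 0);
	if (blk->writer)
		blk->writer = 0;
	else
		blk->readers--;
}
```

In this sketch two read locks on the same block succeed, while a
write lock is refused until both have been dropped.
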
The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one. Shadowing is elided within
the same transaction (a block that has already been shadowed is not
copied again), so performance is reasonable. The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.

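A minimal userspace sketch of shadowing, to make the copy-on-write
rule concrete. All names here (toy_tm, toy_tm_shadow, ...) are
hypothetical and are not the functions declared in the header. The
sketch keeps blocks in memory, omits reference counting and the
superblock, and models commit as nothing more than ending the current
transaction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_BLOCKS  16
#define TOY_BLOCK_SIZE 64

struct toy_tm {
	uint8_t data[TOY_NR_BLOCKS][TOY_BLOCK_SIZE];
	uint8_t in_use[TOY_NR_BLOCKS];
	/* set for blocks allocated or shadowed in the current transaction */
	uint8_t fresh[TOY_NR_BLOCKS];
};

void toy_tm_init(struct toy_tm *tm)
{
	memset(tm, 0, sizeof(*tm));
}

/* Allocate a zeroed, writable block; returns its index or -1. */
int toy_tm_new_block(struct toy_tm *tm)
{
	int b;

	assert(tm);
	for (b = 0; b < TOY_NR_BLOCKS; b++) {
		if (!tm->in_use[b]) {
			tm->in_use[b] = 1;
			tm->fresh[b] = 1;
			memset(tm->data[b], 0, TOY_BLOCK_SIZE);
			return b;
		}
	}
	return -1;
}

/*
 * Return a writable version of 'orig'.  A block that is already
 * private to this transaction is returned as-is (the copy is
 * elided); otherwise the contents are copied into a new block and
 * the original is left untouched.
 */
int toy_tm_shadow(struct toy_tm *tm, int orig)
{
	int copy;

	assert(tm);
	if (orig < 0 || orig >= TOY_NR_BLOCKS || !tm->in_use[orig])
		return -1;
	if (tm->fresh[orig])
		return orig;
	copy = toy_tm_new_block(tm);
	if (copy < 0)
		return -1;
	memcpy(tm->data[copy], tm->data[orig], TOY_BLOCK_SIZE);
	return copy;
}

/* End the transaction: later writes must shadow again. */
void toy_tm_commit(struct toy_tm *tm)
{
	assert(tm);
	memset(tm->fresh, 0, sizeof(tm->fresh));
}
```

Shadowing a committed block yields a different block index, and
shadowing that result again before the next commit returns the same
index, which is the elision described above.
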
The space maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks. They also act as the allocator of new blocks.
There are currently two implementations: a simpler one for managing
blocks on a different device (e.g. thinly-provisioned data blocks),
and one for managing the metadata space. The latter is complicated
by the need to store its own data within the space it's managing.

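The two roles of a space map, reference counting and allocation, can
be sketched in a few lines. This is a toy under stated assumptions:
the counts live in an in-memory array rather than on disk, the
allocator is a simple linear scan for a block with a zero count, and
the names (toy_sm, toy_sm_new_block, ...) are invented for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_BLOCKS 8

struct toy_sm {
	uint32_t ref_count[TOY_NR_BLOCKS];	/* 0 means the block is free */
};

void toy_sm_init(struct toy_sm *sm)
{
	memset(sm, 0, sizeof(*sm));
}

uint32_t toy_sm_get_count(const struct toy_sm *sm, uint64_t b)
{
	assert(b < TOY_NR_BLOCKS);
	return sm->ref_count[b];
}

void toy_sm_inc(struct toy_sm *sm, uint64_t b)
{
	assert(b < TOY_NR_BLOCKS);
	sm->ref_count[b]++;
}

/* Dropping the last reference makes the block allocatable again. */
void toy_sm_dec(struct toy_sm *sm, uint64_t b)
{
	assert(b < TOY_NR_BLOCKS);
	assert(sm->ref_count[b] > 0);
	sm->ref_count[b]--;
}

/*
 * Allocate a block: pick the first one with a zero count and give it
 * a count of one.  Returns 0 and stores the index in *b, or -1 if
 * there is no free block.
 */
int toy_sm_new_block(struct toy_sm *sm, uint64_t *b)
{
	uint64_t i;

	for (i = 0; i < TOY_NR_BLOCKS; i++) {
		if (sm->ref_count[i] == 0) {
			sm->ref_count[i] = 1;
			*b = i;
			return 0;
		}
	}
	return -1;
}
```

A block shared by two owners has a count of two, and it only becomes
available to toy_sm_new_block() again after both have dropped it.
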
The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more. For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys. For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.

Values stored in the btrees can have arbitrary size. Keys are always
64 bits, although nesting allows you to use multiple keys.
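
The two-level keying used by thin provisioning can be sketched as
follows. To keep the example short, each level is modelled as a small
unsorted array searched linearly, not as a btree, so this only shows
how a pair of 64-bit keys selects a value through nested maps. The
names and the fixed capacity are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 8

/* Inner level: virtual block -> physical block. */
struct toy_mapping_tree {
	size_t nr;
	uint64_t virt[TOY_MAX_ENTRIES];
	uint64_t phys[TOY_MAX_ENTRIES];
};

/* Outer level: device id -> mapping tree. */
struct toy_top_tree {
	size_t nr;
	uint64_t dev[TOY_MAX_ENTRIES];
	struct toy_mapping_tree maps[TOY_MAX_ENTRIES];
};

static struct toy_mapping_tree *toy_find_dev(struct toy_top_tree *t,
					     uint64_t dev)
{
	size_t i;

	for (i = 0; i < t->nr; i++)
		if (t->dev[i] == dev)
			return &t->maps[i];
	return NULL;
}

/* keys[0] is the device id, keys[1] the virtual block. */
int toy_insert(struct toy_top_tree *t, const uint64_t keys[2],
	       uint64_t phys)
{
	struct toy_mapping_tree *m;
	size_t i;

	assert(t && keys);
	m = toy_find_dev(t, keys[0]);
	if (!m) {
		if (t->nr == TOY_MAX_ENTRIES)
			return -1;
		t->dev[t->nr] = keys[0];
		m = &t->maps[t->nr++];
		m->nr = 0;
	}
	for (i = 0; i < m->nr; i++) {
		if (m->virt[i] == keys[1]) {
			m->phys[i] = phys;	/* overwrite existing mapping */
			return 0;
		}
	}
	if (m->nr == TOY_MAX_ENTRIES)
		return -1;
	m->virt[m->nr] = keys[1];
	m->phys[m->nr++] = phys;
	return 0;
}

/* Returns 0 and fills *phys if both keys are present, else -1. */
int toy_lookup(struct toy_top_tree *t, const uint64_t keys[2],
	       uint64_t *phys)
{
	struct toy_mapping_tree *m;
	size_t i;

	assert(t && keys && phys);
	m = toy_find_dev(t, keys[0]);
	if (!m)
		return -1;
	for (i = 0; i < m->nr; i++) {
		if (m->virt[i] == keys[1]) {
			*phys = m->phys[i];
			return 0;
		}
	}
	return -1;
}
```

The same virtual block number can be mapped independently under two
device ids, because the first key selects which inner map is
consulted.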