Guidance for writing policies
=============================

Try to keep transactionality out of it. The core is careful to
avoid asking about anything that is migrating. This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.
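As a rough illustration of that contract, here is a toy policy in Python. It is a sketch under stated assumptions, not the kernel's `struct dm_cache_policy` interface: the names `Result`, `Decision`, `ToyPolicy` and `map_bio` are hypothetical, mappings are handed in once at construction, and each lookup answers HIT, MISS or MIGRATE. For simplicity it records a new mapping as soon as it asks for a migration, whereas in dm-cache the core handles the migration itself.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, Optional


class Result(Enum):
    HIT = auto()      # block is already in the cache
    MISS = auto()     # serve the bio from the origin device
    MIGRATE = auto()  # ask the core to promote the block into the cache


@dataclass
class Decision:
    result: Result
    cblock: Optional[int] = None  # cache block involved, if any


class ToyPolicy:
    """Hypothetical policy: existing mappings are supplied once, at construction."""

    def __init__(self, mappings: Dict[int, int], nr_cache_blocks: int):
        self.mappings = dict(mappings)  # origin block -> cache block
        used = set(self.mappings.values())
        self.free = [c for c in range(nr_cache_blocks) if c not in used]

    def map_bio(self, oblock: int) -> Decision:
        cblock = self.mappings.get(oblock)
        if cblock is not None:
            return Decision(Result.HIT, cblock)
        if self.free:
            # Naive rule for this sketch: promote on the first miss while space remains.
            cblock = self.free.pop(0)
            self.mappings[oblock] = cblock
            return Decision(Result.MIGRATE, cblock)
        return Decision(Result.MISS)


policy = ToyPolicy({10: 0}, nr_cache_blocks=2)
print(policy.map_bio(10).result, policy.map_bio(20).result, policy.map_bio(30).result)
```

A real policy would use a much smarter promotion rule than "first miss"; the point here is only the shape of the per-bio decision.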

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios rather than requests, it's easy for the policy
to get fooled by many small bios. For this reason the core target
issues periodic ticks to the policy. It's suggested that the policy
not update states (e.g. hit counts) for a block more than once
per tick. The core ticks by watching bios complete, and so
tries to see when the io scheduler has let the ios run.
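One way a policy can honour the once-per-tick suggestion is to remember, for each block, the tick in which it was last counted. The sketch below assumes a hypothetical `TickedHitCounter` helper with `on_tick()` called by the core and `on_hit()` called per bio; it is not code from dm-cache.

```python
from typing import Dict


class TickedHitCounter:
    """Counts at most one hit per block per tick (hypothetical helper)."""

    def __init__(self) -> None:
        self.tick = 0
        self.hits: Dict[int, int] = {}       # block -> hit count
        self.last_tick: Dict[int, int] = {}  # block -> tick of its last counted hit

    def on_tick(self) -> None:
        # Called periodically by the core.
        self.tick += 1

    def on_hit(self, block: int) -> None:
        if self.last_tick.get(block) == self.tick:
            return  # already counted this tick: ignore the extra small bio
        self.last_tick[block] = self.tick
        self.hits[block] = self.hits.get(block, 0) + 1


counter = TickedHitCounter()
for _ in range(8):       # eight small bios to the same block within one tick
    counter.on_hit(5)
counter.on_tick()
counter.on_hit(5)        # first hit in the next tick
print(counter.hits[5])   # 2
```

Eight small bios that together cover one larger IO therefore count as a single hit rather than eight.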


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect:

	'sequential_threshold <#nr_sequential_ios>'
	'random_threshold <#nr_random_ios>'
	'read_promote_adjustment <value>'
	'write_promote_adjustment <value>'
	'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads. SMQ also does not have any cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target. Doing so will cause all of the
mq policy's hints to be dropped. Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.

Memory usage:
The mq policy used a lot of memory; 88 bytes per cache block on a 64-bit
machine.

SMQ uses 28-bit indexes to implement its data structures rather than
pointers. It avoids storing an explicit hit count for each block.
Instead of a pre-cache, it has a 'hotspot' queue that uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).
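To see why indexes save space: two 64-bit pointers (prev/next) take 16 bytes, while two 28-bit indexes into a preallocated entry array fit in 56 bits, i.e. within 8 bytes, at the cost of a limit of 2^28 (268,435,456) addressable entries. The packing below is a minimal Python sketch of the idea; the helper names are hypothetical and this is not the kernel's actual struct layout.

```python
from typing import Tuple

INDEX_BITS = 28
INDEX_MASK = (1 << INDEX_BITS) - 1  # largest value a 28-bit field can hold


def pack_links(prev: int, nxt: int) -> int:
    """Pack two 28-bit list indexes into one 56-bit value (fits in 8 bytes)."""
    if not (0 <= prev <= INDEX_MASK and 0 <= nxt <= INDEX_MASK):
        raise ValueError("index does not fit in 28 bits")
    return (prev << INDEX_BITS) | nxt


def unpack_links(word: int) -> Tuple[int, int]:
    """Recover (prev, next) from a packed value."""
    return (word >> INDEX_BITS) & INDEX_MASK, word & INDEX_MASK


word = pack_links(3, 7)
print(unpack_links(word))                                 # (3, 7)
print(pack_links(INDEX_MASK, INDEX_MASK).bit_length())    # 56
```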

All of this means smq uses ~25 bytes per cache block. Still a lot of
memory, but a substantial improvement nonetheless.
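The per-block figures translate into policy memory as number of cache blocks times bytes per block. The worked example below assumes a 1 TiB cache device and a block size of 512 sectors (256 KiB); both values are illustrative, and 25 bytes is the approximate smq figure quoted above.

```python
def policy_memory_bytes(cache_bytes: int, block_bytes: int, bytes_per_block: int) -> int:
    """Policy metadata = number of cache blocks * bytes kept per block."""
    nr_blocks = cache_bytes // block_bytes
    return nr_blocks * bytes_per_block


MiB = 1 << 20
cache = 1 << 40        # assumed 1 TiB cache device
block = 512 * 512      # assumed block size: 512 sectors of 512 bytes = 256 KiB
mq = policy_memory_bytes(cache, block, 88)
smq = policy_memory_bytes(cache, block, 25)
print(mq // MiB, smq // MiB)  # 352 100
```

With 2^22 cache blocks, mq needs about 352 MiB of RAM for its metadata while smq needs about 100 MiB.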

Level balancing:
MQ places entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)). This means the bottom
levels generally have the most entries, and the top ones have very
few. Having unbalanced levels like this reduces the efficacy of the
multiqueue.

SMQ does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above. The overall
ordering is a side effect of this stochastic process. With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.
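The swap-on-hit scheme can be sketched with one deque per level. This is a toy model under stated assumptions, not the smq implementation: the class name is hypothetical, entries start evenly spread across levels, and the demoted entry is placed at the most-recently-used end of the lower level (that placement is a choice made for the sketch). Because every promotion is paired with a demotion, level sizes never change.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Iterable, List


class StochasticMultiqueue:
    """Toy multiqueue with fixed level sizes; a hit swaps an entry one level up."""

    def __init__(self, blocks: Iterable[Hashable], nr_levels: int) -> None:
        blocks = list(blocks)
        if len(blocks) < nr_levels:
            raise ValueError("need at least one entry per level")
        # Level 0 is the coldest. In each deque the left end is LRU, the right end MRU.
        self.levels: List[Deque] = [deque() for _ in range(nr_levels)]
        for i, block in enumerate(blocks):
            self.levels[i * nr_levels // len(blocks)].append(block)
        self.level_of: Dict = {b: lvl for lvl, q in enumerate(self.levels) for b in q}

    def hit(self, block) -> None:
        lvl = self.level_of[block]
        queue = self.levels[lvl]
        queue.remove(block)
        if lvl + 1 == len(self.levels):
            queue.append(block)  # already at the top: just mark it most recently used
            return
        above = self.levels[lvl + 1]
        victim = above.popleft()  # least recently used entry of the level above
        above.append(block)       # the hit entry moves up...
        queue.append(victim)      # ...and the victim moves down, so sizes never change
        self.level_of[block] = lvl + 1
        self.level_of[victim] = lvl

    def sizes(self) -> List[int]:
        return [len(q) for q in self.levels]


smq_queue = StochasticMultiqueue(range(8), nr_levels=4)
smq_queue.hit(0)
print(smq_queue.level_of[0], smq_queue.level_of[2], smq_queue.sizes())  # 1 0 [2, 2, 2, 2]
```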

Adaptability:
The MQ policy maintains a hit count for each cache block. For a
different block to get promoted to the cache, its hit count has to
exceed the lowest hit count currently in the cache. This means it can
take a long time for the cache to adapt between varying IO patterns.
Periodically degrading the hit counts could help with this, but I
haven't found a nice general solution.
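A small worked example shows why this rule adapts slowly. The sketch below assumes one counted hit per access and uses made-up hit counts for the blocks left in the cache by a previous workload; it only models the promotion rule as described, not mq's full logic.

```python
from typing import List


def mq_should_promote(candidate_hits: int, cached_hits: List[int]) -> bool:
    """MQ-style rule: promote only if the candidate beats the coldest cached block."""
    return candidate_hits > min(cached_hits)


def accesses_needed(cached_hits: List[int]) -> int:
    """Accesses a brand-new block needs before it qualifies (one hit per access)."""
    hits = 0
    while not mq_should_promote(hits, cached_hits):
        hits += 1
    return hits


old_workload = [500, 800, 1200]  # made-up hit counts left behind by a previous workload
print(accesses_needed(old_workload))                    # 501
print(accesses_needed([h // 2 for h in old_workload]))  # 251 after halving every count
```

Halving every count, one form of periodic degrading, shortens the wait, but how often and how much to degrade is exactly the tuning problem mentioned above.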

SMQ doesn't maintain hit counts, so a lot of this problem just goes
away. In addition, it tracks the performance of the hotspot queue,
which is used to decide which blocks to promote. If the hotspot queue
is performing badly, then it starts moving entries more quickly between
levels. This lets it adapt to new IO patterns very quickly.
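The feedback loop can be sketched as: count hotspot-queue hits and misses over a period, then choose how aggressively to move entries during the next period. Everything here is an assumption for illustration: the `HotspotTracker` class, the 0.5 hit-ratio threshold, and modelling "moving more quickly" as a larger per-promotion level step are not taken from dm-cache.

```python
class HotspotTracker:
    """Hypothetical tracker of how well the hotspot queue predicts hits."""

    def __init__(self, good_ratio: float = 0.5, normal_step: int = 1, fast_step: int = 4):
        self.hits = 0
        self.misses = 0
        self.good_ratio = good_ratio    # assumed threshold, not a dm-cache value
        self.normal_step = normal_step  # levels an entry moves per promotion when doing well
        self.fast_step = fast_step      # ...and when the hotspot queue is doing badly

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def end_period(self) -> int:
        """Return how many levels to move entries by in the next period."""
        total = self.hits + self.misses
        ratio = self.hits / total if total else 1.0
        self.hits = self.misses = 0
        return self.normal_step if ratio >= self.good_ratio else self.fast_step


tracker = HotspotTracker()
for hit in [True] * 9 + [False]:
    tracker.record(hit)
print(tracker.end_period())  # 1: predictions are good, move entries slowly
for hit in [False] * 8 + [True] * 2:
    tracker.record(hit)
print(tracker.end_period())  # 4: predictions are poor, move entries faster
```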

Performance:
Testing SMQ shows substantially better performance than MQ.

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.
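In its simplest form, the cleaner's job is a loop over the dirty set. The sketch below assumes a hypothetical `write_back` callback that copies one cache block to the origin device; ordering by block number is an arbitrary choice for the example.

```python
from typing import Callable, Set


def clean(dirty: Set[int], write_back: Callable[[int], None]) -> int:
    """Write back every dirty cache block; afterwards nothing is dirty."""
    written = 0
    for cblock in sorted(dirty):  # iterate over a sorted snapshot, then clear the set
        write_back(cblock)        # hypothetical callback: copy cblock to the origin
        written += 1
    dirty.clear()
    return written


log = []
dirty = {7, 2, 5}
print(clean(dirty, log.append), log, dirty)  # 3 [2, 5, 7] set()
```

Once the dirty set is empty, the cache device holds no data that is missing from the origin and can be removed.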

Examples
========

The syntax for a table is:
	cache <metadata dev> <cache dev> <origin dev> <block size>
	<#feature_args> [<feature arg>]*
	<policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is:
	dmsetup message <mapped device> 0 sequential_threshold 1024
	dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup:
	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
	creates a 128GB large mapped device named 'blah' with the
	sequential threshold set to 1024 and the random_threshold set to 8.
	Because mq is now an alias for smq, both tunables are accepted but
	have no effect.
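The table line above can be assembled mechanically from the syntax: the start sector and length, then the devices and block size, then each argument list preceded by its count. The hypothetical `cache_table` helper below reproduces the dmsetup example, where four policy arguments are counted because each key and each value is a separate word. The length is given in 512-byte sectors, which is how 268435456 becomes 128 GiB.

```python
def cache_table(start, length, metadata_dev, cache_dev, origin_dev,
                block_size, feature_args, policy, policy_args):
    """Build a cache target table line in the order given by the syntax above."""
    fields = [str(start), str(length), "cache", metadata_dev, cache_dev, origin_dev,
              str(block_size), str(len(feature_args)), *feature_args,
              policy, str(len(policy_args)), *policy_args]
    return " ".join(fields)


table = cache_table(0, 268435456, "/dev/sdb", "/dev/sdc", "/dev/sdd", 512, [],
                    "mq", ["sequential_threshold", "1024", "random_threshold", "8"])
print(table)

SECTOR = 512
print(268435456 * SECTOR // (1 << 30))  # 128 (GiB)
```

The resulting string is what goes inside the quotes of `dmsetup create blah --table "..."`.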