BTRFS
=====

Btrfs is a copy-on-write filesystem for Linux aimed at
implementing advanced features while focusing on fault tolerance,
repair and easy administration. Initially developed by Oracle, Btrfs
is licensed under the GPL and open for contribution from anyone.

Linux has a wealth of filesystems to choose from, but we are facing a
number of challenges with scaling to the large storage subsystems that
are becoming common in today's data centers. Filesystems need to scale
in their ability to address and manage large storage, and also in
their ability to detect, repair and tolerate errors in the data stored
on disk. Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized.

The main Btrfs features include:

    * Extent-based file storage (2^64 max file size)
    * Space-efficient packing of small files
    * Space-efficient indexed directories
    * Dynamic inode allocation
    * Writable snapshots
    * Subvolumes (separate internal filesystem roots)
    * Object level mirroring and striping
    * Checksums on data and metadata (multiple algorithms available)
    * Compression
    * Integrated multiple device support, with several RAID algorithms
    * Online filesystem check (not yet implemented)
    * Very fast offline filesystem check
    * Efficient incremental backup and FS mirroring (not yet implemented)
    * Online filesystem defragmentation


Mount Options
=============

When mounting a btrfs filesystem, the following options are accepted.
Options marked with (*) are default options and will not be shown in the
mount options.

  alloc_start=<bytes>
	Debugging option to force all block allocations above a certain
	byte threshold on each block device. The value is specified in
	bytes, optionally with a K, M, or G suffix, case insensitive.
	Default is 1MB.
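
The "<bytes> with an optional K, M, or G suffix" syntax shared by alloc_start and max_inline can be illustrated with a short Python sketch. The helper parse_size() is hypothetical (it is not part of btrfs or btrfs-progs), and it assumes 1024-based multipliers; it handles only the three documented suffixes.

```python
# Hypothetical helper modelling the documented "<bytes>[K|M|G]" syntax.
# Assumption: multipliers are 1024-based; only K, M and G are handled.
_MULTIPLIERS = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_size(value: str) -> int:
    """Convert strings such as '4096', '64k' or '1M' to a byte count."""
    text = value.strip()
    multiplier = 1
    # The suffix is case insensitive, so compare its lowercase form.
    if text and text[-1].lower() in _MULTIPLIERS:
        multiplier = _MULTIPLIERS[text[-1].lower()]
        text = text[:-1]
    if not text.isdigit():
        raise ValueError(f"invalid size: {value!r}")
    return int(text) * multiplier


print(parse_size("1M"))   # 1048576
print(parse_size("64k"))  # 65536
```

Under this assumption "1M", "1m" and "1048576" all describe the same threshold.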

  noautodefrag(*)
  autodefrag
	Disable/enable auto defragmentation.
	Auto defragmentation detects small random writes into files and queues
	them up for the defrag process. Works best for small files;
	not well suited for large database workloads.

  check_int
  check_int_data
  check_int_print_mask=<value>
	These debugging options control the behavior of the integrity checking
	module (the BTRFS_FS_CHECK_INTEGRITY config option is required).

	check_int enables the integrity checker module, which examines all
	block write requests to ensure on-disk consistency, at a large
	memory and CPU cost.

	check_int_data includes extent data in the integrity checks, and
	implies the check_int option.

	check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
	as defined in fs/btrfs/check-integrity.c, to control the integrity
	checker module behavior.

	See comments at the top of fs/btrfs/check-integrity.c for more info.

  commit=<seconds>
	Set the interval of periodic commit, 30 seconds by default. Higher
	values defer data being synced to permanent storage, with obvious
	consequences when the system crashes. The upper bound is not enforced,
	but a warning is printed if it's more than 300 seconds (5 minutes).
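
The commit= rules described above (a 30 second default, no enforced upper bound, a warning above 300 seconds) can be modelled as follows. This is a hypothetical Python sketch, not kernel code; treating 0 as "use the default" is an assumption.

```python
import warnings

DEFAULT_COMMIT_INTERVAL = 30  # seconds, per the text above
WARN_THRESHOLD = 300          # seconds (5 minutes)


def check_commit_interval(seconds: int) -> int:
    """Hypothetical model of how a commit=<seconds> value is accepted."""
    if seconds < 0:
        raise ValueError("commit interval cannot be negative")
    if seconds == 0:
        # Assumption: 0 falls back to the default interval.
        return DEFAULT_COMMIT_INTERVAL
    if seconds > WARN_THRESHOLD:
        # No upper bound is enforced; large values are only flagged.
        warnings.warn(f"excessive commit interval {seconds}s")
    return seconds
```

A value such as 600 is therefore accepted as-is, but with a warning.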

  compress
  compress=<type>
  compress-force
  compress-force=<type>
	Control BTRFS file data compression. Type may be specified as "zlib",
	"lzo" or "no" (for no compression, used for remounting). If no type
	is specified, zlib is used. If compress-force is specified,
	all files will be compressed, whether or not they compress well.
	If compression is enabled, nodatacow and nodatasum are disabled.
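
The four compress spellings above reduce to an algorithm and a "force" flag. The following Python sketch is a hypothetical parser (parse_compress and CompressSetting are illustrative names, not a btrfs API) that applies the documented defaults: a bare option means zlib, and "no" turns compression off.

```python
from typing import NamedTuple, Optional


class CompressSetting(NamedTuple):
    algorithm: Optional[str]  # None means compression is off
    force: bool               # True for compress-force


VALID_TYPES = {"zlib", "lzo", "no"}


def parse_compress(option: str) -> CompressSetting:
    """Parse compress, compress=<type>, compress-force[=<type>]."""
    name, _, value = option.partition("=")
    if name not in ("compress", "compress-force"):
        raise ValueError(f"not a compression option: {option!r}")
    algorithm = value or "zlib"  # no type given: zlib is used
    if algorithm not in VALID_TYPES:
        raise ValueError(f"unknown compression type: {algorithm!r}")
    if algorithm == "no":
        return CompressSetting(None, False)
    return CompressSetting(algorithm, name == "compress-force")
```

For example, "compress-force" yields zlib with forcing enabled, while "compress=no" switches compression off when remounting.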

  degraded
	Allow mounts to continue with missing devices. A read-write mount may
	fail with too many devices missing, for example if a stripe member
	is completely missing.

  device=<devicepath>
	Specify a device during mount so that ioctls on the control device
	can be avoided. This is especially useful when trying to mount a
	multi-device setup as root. May be specified multiple times for
	multiple devices.
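
Because device= may be repeated, a multi-device mount is expressed as one comma-separated option string. This Python sketch shows how such a string could be assembled; build_mount_data() is a hypothetical helper, and the device paths are placeholders.

```python
from typing import Iterable


def build_mount_data(devices: Iterable[str], degraded: bool = False,
                     extra: Iterable[str] = ()) -> str:
    """Join btrfs options into the comma-separated form used by mount -o."""
    opts = [f"device={dev}" for dev in devices]  # one device= per member
    if degraded:
        opts.append("degraded")
    opts.extend(extra)
    for opt in opts:
        if "," in opt:
            # A comma would split this option in two when it is parsed.
            raise ValueError(f"option contains a comma: {opt!r}")
    return ",".join(opts)


print(build_mount_data(["/dev/sdb", "/dev/sdc"], degraded=True))
# device=/dev/sdb,device=/dev/sdc,degraded
```

The resulting string would be used as, for example, `mount -t btrfs -o device=/dev/sdb,device=/dev/sdc,degraded /dev/sdb /mnt`.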

  nodiscard(*)
  discard
	Disable/enable the discard mount option.
	Discard issues frequent commands to let the block device reclaim space
	freed by the filesystem. This is useful for SSD devices, thinly
	provisioned LUNs and virtual machine images, but may have a significant
	performance impact. (The fstrim command is also available to
	initiate batch trims from userspace.)

  noenospc_debug(*)
  enospc_debug
	Disable/enable debugging option to be more verbose in some ENOSPC conditions.

  fatal_errors=<action>
	Action to take when encountering a fatal error:
	  "bug" - BUG() on a fatal error. This is the default.
	  "panic" - panic() on a fatal error.

  noflushoncommit(*)
  flushoncommit
	The 'flushoncommit' mount option forces any data dirtied by a write in a
	prior transaction to commit as part of the current commit. This makes
	the committed state a fully consistent view of the file system from the
	application's perspective (i.e., it includes all completed file system
	operations). Previously, this was the behavior only when a snapshot
	was created.

  inode_cache
	Enable free inode number caching. Defaults to off due to an overflow
	problem when the free space CRCs don't fit inside a single page.

  max_inline=<bytes>
	Specify the maximum amount of space, in bytes, that can be inlined in
	a metadata B-tree leaf. The value is specified in bytes, optionally
	with a K, M, or G suffix, case insensitive. In practice, this value
	is limited by the root sector size, with some space unavailable due
	to leaf headers. For a 4k sectorsize, max inline data is ~3900 bytes.
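
The first limit described for max_inline, the sector size, amounts to a simple clamp. The Python sketch below models only that step; effective_max_inline() is a hypothetical name, and the further reduction caused by leaf headers (down to roughly 3900 bytes for 4k sectors) is deliberately not computed.

```python
SECTORSIZE = 4096  # assumption: the 4k sectorsize used in the text above


def effective_max_inline(requested: int, sectorsize: int = SECTORSIZE) -> int:
    """Clamp a requested max_inline value to the sector size.

    Simplified model: leaf headers reduce the usable space further,
    which is not accounted for here.
    """
    if requested < 0:
        raise ValueError("max_inline cannot be negative")
    return min(requested, sectorsize)
```

So a request of max_inline=8k on a 4k-sector filesystem would be capped at 4096 bytes before header overhead is considered.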

  metadata_ratio=<value>
	Specify that 1 metadata chunk should be allocated after every <value>
	data chunks. Off by default.
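
The metadata_ratio rule can be visualized with a counting model. This Python sketch is hypothetical and is not the kernel's chunk allocator: it emits "D" for each data chunk and "M" for the extra metadata chunk forced after every <value> data chunks, and it ignores metadata chunks allocated on demand.

```python
from typing import Iterator


def allocation_pattern(metadata_ratio: int, data_chunks: int) -> Iterator[str]:
    """Yield 'D' per data chunk and 'M' after every metadata_ratio of them.

    A ratio of 0 models the default (option off).
    """
    since_metadata = 0
    for _ in range(data_chunks):
        yield "D"
        since_metadata += 1
        if metadata_ratio and since_metadata == metadata_ratio:
            yield "M"
            since_metadata = 0


print("".join(allocation_pattern(3, 7)))  # DDDMDDDMD
```

With metadata_ratio=3, each run of three data chunks is followed by one metadata chunk.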

  acl(*)
  noacl
	Enable/disable support for POSIX Access Control Lists (ACLs). See the
	acl(5) manual page for more information about ACLs.

  barrier(*)
  nobarrier
	Enable/disable the use of block layer write barriers. Write barriers
	ensure that certain IOs make it through the device cache and are on
	persistent storage. Using nobarrier on a device with a volatile
	(non-battery-backed) write-back cache will lead to filesystem
	corruption on a system crash or power loss.

  datacow(*)
  nodatacow
	Enable/disable data copy-on-write for newly created files.
	Nodatacow implies nodatasum, and disables all compression.

  datasum(*)
  nodatasum
	Enable/disable data checksumming for newly created files.
	Datasum implies datacow.
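
The implications between nodatacow, nodatasum, datasum and compression interact, so the outcome depends on which options are given. The Python sketch below is a hypothetical model that assumes options are processed left to right, with later options overriding earlier ones; it applies only the rules stated in this document.

```python
from typing import Iterable


def resolve_data_options(options: Iterable[str]) -> set:
    """Return the enabled subset of {'datacow', 'datasum', 'compress'}."""
    state = {"datacow": True, "datasum": True, "compress": False}  # defaults
    for opt in options:
        if opt == "nodatacow":
            # nodatacow implies nodatasum and disables all compression
            state.update(datacow=False, datasum=False, compress=False)
        elif opt == "nodatasum":
            state["datasum"] = False
        elif opt == "datacow":
            state["datacow"] = True
        elif opt == "datasum":
            # datasum implies datacow
            state.update(datasum=True, datacow=True)
        elif opt.startswith("compress"):
            if opt.endswith("=no"):
                state["compress"] = False
            else:
                # enabling compression disables nodatacow and nodatasum
                state.update(compress=True, datacow=True, datasum=True)
    return {name for name, enabled in state.items() if enabled}
```

For example, "nodatacow,compress=lzo" ends with copy-on-write, checksums and compression all enabled, while "compress,nodatacow" ends with all three disabled.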

  treelog(*)
  notreelog
	Enable/disable the tree logging used for fsync and O_SYNC writes.

  recovery
	Enable autorecovery attempts if a bad tree root is found at mount time.
	Currently this scans a list of several previous tree roots and tries to
	use the first readable one.

  rescan_uuid_tree
	Force a check and rebuild of the UUID tree. This should not
	normally be needed.

  skip_balance
	Skip automatic resume of an interrupted balance operation after mount.
	It may be resumed with "btrfs balance resume".

  space_cache(*)
	Enable the on-disk freespace cache.
  nospace_cache
	Disable freespace cache loading without clearing the cache.
  clear_cache
	Force clearing and rebuilding of the disk space cache if something
	has gone wrong.

  ssd
  nossd
  ssd_spread
	Options to control SSD allocation schemes. By default, BTRFS will
	enable or disable SSD allocation heuristics depending on whether a
	rotational or nonrotational disk is in use. The ssd and nossd options
	can override this autodetection.

	The ssd_spread mount option attempts to allocate into big chunks
	of unused space, and may perform better on low-end SSDs. ssd_spread
	implies ssd, enabling all other SSD heuristics as well.
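
The SSD decision combines autodetection with explicit overrides. This hypothetical Python sketch takes the device's rotational flag as input (on Linux, userspace can read it from /sys/block/<dev>/queue/rotational) and applies the options in order. In this model, nossd clears both ssd and ssd_spread; that detail is an assumption.

```python
from typing import Iterable


def ssd_mode(rotational: bool, options: Iterable[str] = ()) -> str:
    """Return 'off', 'ssd' or 'ssd_spread' (hypothetical model)."""
    ssd = not rotational  # autodetection: nonrotational disks get ssd
    spread = False
    for opt in options:
        if opt == "ssd":
            ssd = True
        elif opt == "ssd_spread":
            ssd = spread = True  # ssd_spread implies ssd
        elif opt == "nossd":
            ssd = spread = False  # assumption: clears both heuristics
    if spread:
        return "ssd_spread"
    return "ssd" if ssd else "off"
```

For example, a rotational disk mounted with ssd_spread reports "ssd_spread", and a nonrotational one mounted with nossd reports "off".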

  subvol=<path>
	Mount the subvolume at <path> rather than the root subvolume. <path> is
	relative to the top level subvolume.

  subvolid=<ID>
	Mount the subvolume specified by an ID number rather than the root
	subvolume. This allows mounting of subvolumes which are not in the root
	of the mounted filesystem.
	You can use "btrfs subvolume list" to see subvolume ID numbers.

  subvolrootid=<objectid> (deprecated)
	Mount the subvolume specified by <objectid> rather than the root
	subvolume. This allows mounting of subvolumes which are not in the root
	of the mounted filesystem.
	You can use "btrfs subvolume show" to see the object ID for a subvolume.

  thread_pool=<number>
	The number of worker threads to allocate. The default number is equal
	to the number of CPUs + 2, or 8, whichever is smaller.
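
The default thread_pool size is min(CPUs + 2, 8). The following hypothetical Python sketch computes it. os.cpu_count() stands in for the kernel's CPU count, and the two may differ (for example, inside containers).

```python
import os
from typing import Optional


def default_thread_pool(ncpus: Optional[int] = None) -> int:
    """Default worker count: number of CPUs + 2, capped at 8."""
    if ncpus is None:
        ncpus = os.cpu_count() or 1  # cpu_count() may return None
    return min(ncpus + 2, 8)


print(default_thread_pool(4))  # 6
```

Any machine with 6 or more CPUs therefore gets the cap of 8 threads unless thread_pool is set explicitly.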

  user_subvol_rm_allowed
	Allow subvolumes to be deleted by a non-root user. Use with caution.

MAILING LIST
============

There is a Btrfs mailing list hosted on vger.kernel.org. You can
find details on how to subscribe here:

http://vger.kernel.org/vger-lists.html#linux-btrfs

Mailing list archives are available from gmane:

http://dir.gmane.org/gmane.comp.file-systems.btrfs



IRC
===

Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
IRC network.



UTILITIES
=========

Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:

 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

These include the following tools:

* mkfs.btrfs: create a filesystem

* btrfs: a single tool to manage the filesystems; refer to the manpage for more details

* 'btrfsck' or 'btrfs check': do a consistency check of the filesystem

Other tools for specific tasks:

* btrfs-convert: in-place conversion from ext2/3/4 filesystems

* btrfs-image: dump filesystem metadata for debugging