Commit | Line | Data |
---|---|---|
1 | Virtual Routing and Forwarding (VRF) |
2 | ==================================== | |
3 | The VRF device combined with ip rules provides the ability to create virtual | |
4 | routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the | |
5 | Linux network stack. One use case is the multi-tenancy problem where each | |
6 | tenant has its own unique routing tables and, at the very least, needs a | |
7 | different default gateway. | |
8 | ||
9 | Processes can be "VRF aware" by binding a socket to the VRF device. Packets | |
10 | through the socket then use the routing table associated with the VRF | |
11 | device. An important feature of the VRF device implementation is that it | |
12 | impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected | |
13 | (i.e., they do not need to be run in each VRF). The design also allows | |
14 | the use of higher priority ip rules (Policy Based Routing, PBR) to take | |
15 | precedence over the VRF device rules directing specific traffic as desired. | |
16 | ||
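The precedence described above can be sketched with an explicit rule preference. The iproute2 examples later in this document install the VRF rules with pref 200, so a rule with a lower pref value is consulted first. The source prefix 10.2.1.128/25, table 20 and the helper names run and pbr_before_vrf are hypothetical, chosen only for illustration. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Print each command by default; set DRY_RUN=0 to execute it (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# pbr_before_vrf SRC_PREFIX TABLE [PREF]
# Add a policy rule that is evaluated before the VRF rules (pref 200 in the
# examples in this document) because its preference value is lower.
pbr_before_vrf() {
    src=$1
    table=$2
    pref=${3:-100}
    run ip rule add pref "$pref" from "$src" table "$table"
}

# Hypothetical values: traffic sourced from 10.2.1.128/25 uses table 20.
pbr_before_vrf 10.2.1.128/25 20
```

Running `ip rule show` afterwards should list the new rule ahead of the VRF rules.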
17 | In addition, VRF devices allow VRFs to be nested within namespaces. For | |
18 | example, network namespaces provide separation of network interfaces at L1 | |
19 | (Layer 1 separation), VLANs on the interfaces within a namespace provide | |
20 | L2 separation and then VRF devices provide L3 separation. | |
21 | ||
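The nesting described above might be set up as in the following sketch: a namespace holds the physical interface, a VLAN on that interface provides L2 separation, and a VRF device inside the namespace provides L3 separation. The names ns1, eth1, VLAN ID 100, vrf-blue and table 10 are placeholders, and nest_vrf and run are hypothetical helpers. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Print each command by default; set DRY_RUN=0 to execute it (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# nest_vrf NAMESPACE DEVICE VLAN_ID VRF TABLE
nest_vrf() {
    ns=$1
    dev=$2
    vid=$3
    vrf=$4
    table=$5

    # Namespace: move the physical interface into its own namespace.
    run ip netns add "$ns"
    run ip link set dev "$dev" netns "$ns"

    # L2: VLAN sub-interface on top of the physical interface.
    run ip -n "$ns" link add link "$dev" name "$dev.$vid" type vlan id "$vid"

    # L3: VRF device, its rules, and enslavement of the VLAN interface.
    run ip -n "$ns" link add "$vrf" type vrf table "$table"
    run ip -n "$ns" rule add oif "$vrf" table "$table"
    run ip -n "$ns" rule add iif "$vrf" table "$table"
    run ip -n "$ns" link set dev "$vrf" up
    run ip -n "$ns" link set dev "$dev.$vid" master "$vrf"
}

nest_vrf ns1 eth1 100 vrf-blue 10
```

Only the IPv4 rules are shown; the matching `ip -6 rule add` commands would be needed for IPv6.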
22 | Design | |
23 | ------ | |
24 | A VRF device is created with an associated route table. Network interfaces | |
25 | are then enslaved to a VRF device: | |
26 | ||
27 | +-----------------------------+ | |
28 | | vrf-blue | ===> route table 10 | |
29 | +-----------------------------+ | |
30 | | | | | |
31 | +------+ +------+ +-------------+ | |
32 | | eth1 | | eth2 | ... | bond1 | | |
33 | +------+ +------+ +-------------+ | |
34 | | | | |
35 | +------+ +------+ | |
36 | | eth8 | | eth9 | | |
37 | +------+ +------+ | |
38 | ||
39 | Packets received on an enslaved device are switched to the VRF device | |
40 | using an rx_handler, which gives the impression that packets flow through | |
41 | the VRF device. Similarly, on egress, routing rules are used to send packets | |
42 | to the VRF device driver before they are sent out the actual interface. This | |
43 | allows tcpdump on a VRF device to capture all packets into and out of the | |
44 | VRF as a whole.[1] Similarly, netfilter [2] and tc rules can be applied | |
45 | using the VRF device to specify rules that apply to the VRF domain as a whole. | |
46 | ||
47 | [1] Packets in the forwarded state do not flow through the device, so those | |
48 | packets are not seen by tcpdump. Will revisit this limitation in a | |
49 | future release. | |
50 | ||
51 | [2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only with skb->dev | |
52 | set to real ingress device and egress is limited to NF_INET_POST_ROUTING. | |
53 | Will revisit this limitation in a future release. | |
54 | ||
55 | ||
56 | Setup | |
57 | ----- | |
58 | 1. VRF device is created with an association to a FIB table. | |
59 | e.g., ip link add vrf-blue type vrf table 10 | |
60 | ip link set dev vrf-blue up | |
61 | ||
62 | 2. Rules are added that send lookups to the associated FIB table when the | |
63 | iif or oif is the VRF device. e.g., | |
64 | ip ru add oif vrf-blue table 10 | |
65 | ip ru add iif vrf-blue table 10 | |
66 | ||
67 | Set the default route for the table (and hence default route for the VRF). | |
68 | e.g., ip route add table 10 prohibit default | |
69 | ||
70 | 3. Enslave L3 interfaces to a VRF device. | |
71 | e.g., ip link set dev eth1 master vrf-blue | |
72 | ||
73 | Local and connected routes for enslaved devices are automatically moved to | |
74 | the table associated with VRF device. Any additional routes depending on | |
75 | the enslaved device will need to be reinserted following the enslavement. | |
76 | ||
77 | 4. Additional VRF routes are added to associated table. | |
78 | e.g., ip route add table 10 ... | |
79 | ||
80 | ||
81 | Applications | |
82 | ------------ | |
83 | Applications that are to work within a VRF need to bind their socket to the | |
84 | VRF device: | |
85 | ||
86 | setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1); | |
87 | ||
88 | or to specify the output device using cmsg and IP_PKTINFO. | |
89 | ||
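For a quick check from the shell, a tool that can bind its socket to a named device can stand in for a VRF-aware application. The sketch below assumes the installed ping accepts an interface name with -I and binds to it, as iputils ping does. The address 10.2.1.254 is taken from the neighbor listing later in this document, and vrf_ping and run are hypothetical helpers. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Print each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# vrf_ping VRF ADDRESS [COUNT]
# Send COUNT echo requests (default 3) with the socket bound to the VRF
# device, so the lookup uses the table associated with that VRF.
vrf_ping() {
    vrf=$1
    addr=$2
    count=${3:-3}
    run ping -I "$vrf" -c "$count" "$addr"
}

vrf_ping vrf-red 10.2.1.254
```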
90 | ||
91 | Limitations | |
92 | ----------- | |
93 | Index of original ingress interface is not available via cmsg. Will address |
94 | soon. | |
95 | |
96 | ################################################################################ | |
97 | ||
98 | Using iproute2 for VRFs | |
99 | ======================= | |
100 | VRF devices do *not* have to start with 'vrf-'. That is a convention used here | |
101 | for emphasis of the device type, similar to use of 'br' in bridge names. | |
102 | ||
103 | 1. Create a VRF | |
104 | ||
105 | To instantiate a VRF device and associate it with a table: | |
106 | $ ip link add dev NAME type vrf table ID | |
107 | ||
108 | Remember to add the ip rules as well: | |
109 | $ ip ru add oif NAME table ID | |
110 | $ ip ru add iif NAME table ID | |
111 | $ ip -6 ru add oif NAME table ID | |
112 | $ ip -6 ru add iif NAME table ID | |
113 | ||
114 | Without the rules, route lookups are not directed to the table. | |
115 | ||
116 | For example: | |
117 | $ ip link add dev vrf-blue type vrf table 10 | |
118 | $ ip ru add pref 200 oif vrf-blue table 10 | |
119 | $ ip ru add pref 200 iif vrf-blue table 10 | |
120 | $ ip -6 ru add pref 200 oif vrf-blue table 10 | |
121 | $ ip -6 ru add pref 200 iif vrf-blue table 10 | |
122 | ||
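Because lookups silently bypass the VRF table when the rules are missing, a small check can be useful. The sketch below scans `ip rule show` style output for both an oif and an iif rule naming the VRF device. It assumes the listing format `PREF: from all oif NAME lookup TABLE`; vrf_rules_present is a hypothetical helper, not an iproute2 command.

```shell
# vrf_rules_present NAME
# Reads a rule listing on stdin and succeeds only if it contains both an
# "oif NAME" rule and an "iif NAME" rule.
vrf_rules_present() {
    name=$1
    rules=$(cat)
    printf '%s\n' "$rules" | grep -q " oif $name " &&
        printf '%s\n' "$rules" | grep -q " iif $name "
}

# Typical use against the live IPv4 rules:
#   ip rule show | vrf_rules_present vrf-blue || echo "rules missing for vrf-blue"
# The IPv6 rules can be checked the same way with `ip -6 rule show`.
```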
123 | ||
124 | 2. List VRFs | |
125 | ||
126 | To list VRFs that have been created: | |
127 | $ ip [-d] link show type vrf | |
128 | NOTE: The -d option is needed to show the table id | |
129 | ||
130 | For example: | |
131 | $ ip -d link show type vrf | |
132 | 11: vrf-mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 | |
133 | link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0 | |
134 | vrf table 1 addrgenmode eui64 | |
135 | 12: vrf-red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 | |
136 | link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0 | |
137 | vrf table 10 addrgenmode eui64 | |
138 | 13: vrf-blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 | |
139 | link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0 | |
140 | vrf table 66 addrgenmode eui64 | |
141 | 14: vrf-green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 | |
142 | link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 | |
143 | vrf table 81 addrgenmode eui64 | |
144 | ||
145 | ||
146 | Or in brief output: | |
147 | ||
148 | $ ip -br link show type vrf | |
149 | vrf-mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP> | |
150 | vrf-red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP> | |
151 | vrf-blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP> | |
152 | vrf-green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP> | |
153 | ||
154 | ||
155 | 3. Assign a Network Interface to a VRF | |
156 | ||
157 | Network interfaces are assigned to a VRF by enslaving the netdevice to a | |
158 | VRF device: | |
159 | $ ip link set dev NAME master VRF-NAME | |
160 | ||
161 | On enslavement, connected and local routes are automatically moved to the | |
162 | table associated with the VRF device. | |
163 | ||
164 | For example: | |
165 | $ ip link set dev eth0 master vrf-mgmt | |
166 | ||
167 | ||
168 | 4. Show Devices Assigned to a VRF | |
169 | ||
170 | To show devices that have been assigned to a specific VRF, add the master | |
171 | option to the ip command: | |
172 | $ ip link show master VRF-NAME | |
173 | ||
174 | For example: | |
175 | $ ip link show master vrf-red | |
176 | 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP mode DEFAULT group default qlen 1000 | |
177 | link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff | |
178 | 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP mode DEFAULT group default qlen 1000 | |
179 | link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff | |
180 | 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master vrf-red state DOWN mode DEFAULT group default qlen 1000 | |
181 | link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff | |
182 | ||
183 | ||
184 | Or using the brief output: | |
185 | $ ip -br link show master vrf-red | |
186 | eth1 UP 02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP> | |
187 | eth2 UP 02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP> | |
188 | eth5 DOWN 02:00:00:00:02:06 <BROADCAST,MULTICAST> | |
189 | ||
190 | ||
191 | 5. Show Neighbor Entries for a VRF | |
192 | ||
193 | To list neighbor entries associated with devices enslaved to a VRF device, | |
194 | add the master option to the ip command: | |
195 | $ ip [-6] neigh show master VRF-NAME | |
196 | ||
197 | For example: | |
198 | $ ip neigh show master vrf-red | |
199 | 10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE | |
200 | 10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE | |
201 | ||
202 | $ ip -6 neigh show master vrf-red | |
203 | 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE | |
204 | ||
205 | ||
206 | 6. Show Addresses for a VRF | |
207 | ||
208 | To show addresses for interfaces associated with a VRF, add the master | |
209 | option to the ip command: | |
210 | $ ip addr show master VRF-NAME | |
211 | ||
212 | For example: | |
213 | $ ip addr show master vrf-red | |
214 | 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP group default qlen 1000 | |
215 | link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff | |
216 | inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1 | |
217 | valid_lft forever preferred_lft forever | |
218 | inet6 2002:1::2/120 scope global | |
219 | valid_lft forever preferred_lft forever | |
220 | inet6 fe80::ff:fe00:202/64 scope link | |
221 | valid_lft forever preferred_lft forever | |
222 | 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vrf-red state UP group default qlen 1000 | |
223 | link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff | |
224 | inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2 | |
225 | valid_lft forever preferred_lft forever | |
226 | inet6 2002:2::2/120 scope global | |
227 | valid_lft forever preferred_lft forever | |
228 | inet6 fe80::ff:fe00:203/64 scope link | |
229 | valid_lft forever preferred_lft forever | |
230 | 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master vrf-red state DOWN group default qlen 1000 | |
231 | link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff | |
232 | ||
233 | Or in brief format: | |
234 | $ ip -br addr show master vrf-red | |
235 | eth1 UP 10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64 | |
236 | eth2 UP 10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64 | |
237 | eth5 DOWN | |
238 | ||
239 | ||
240 | 7. Show Routes for a VRF | |
241 | ||
242 | To show routes for a VRF, use the ip command to display the table associated | |
243 | with the VRF device: | |
244 | $ ip [-6] route show table ID | |
245 | ||
246 | For example: | |
247 | $ ip route show table vrf-red | |
248 | prohibit default | |
249 | broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 | |
250 | 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 | |
251 | local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 | |
252 | broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 | |
253 | broadcast 10.2.2.0 dev eth2 proto kernel scope link src 10.2.2.2 | |
254 | 10.2.2.0/24 dev eth2 proto kernel scope link src 10.2.2.2 | |
255 | local 10.2.2.2 dev eth2 proto kernel scope host src 10.2.2.2 | |
256 | broadcast 10.2.2.255 dev eth2 proto kernel scope link src 10.2.2.2 | |
257 | ||
258 | $ ip -6 route show table vrf-red | |
259 | local 2002:1:: dev lo proto none metric 0 pref medium | |
260 | local 2002:1::2 dev lo proto none metric 0 pref medium | |
261 | 2002:1::/120 dev eth1 proto kernel metric 256 pref medium | |
262 | local 2002:2:: dev lo proto none metric 0 pref medium | |
263 | local 2002:2::2 dev lo proto none metric 0 pref medium | |
264 | 2002:2::/120 dev eth2 proto kernel metric 256 pref medium | |
265 | local fe80:: dev lo proto none metric 0 pref medium | |
266 | local fe80:: dev lo proto none metric 0 pref medium | |
267 | local fe80::ff:fe00:202 dev lo proto none metric 0 pref medium | |
268 | local fe80::ff:fe00:203 dev lo proto none metric 0 pref medium | |
269 | fe80::/64 dev eth1 proto kernel metric 256 pref medium | |
270 | fe80::/64 dev eth2 proto kernel metric 256 pref medium | |
271 | ff00::/8 dev vrf-red metric 256 pref medium | |
272 | ff00::/8 dev eth1 metric 256 pref medium | |
273 | ff00::/8 dev eth2 metric 256 pref medium | |
274 | ||
275 | ||
276 | 8. Route Lookup for a VRF | |
277 | ||
278 | A test route lookup can be done for a VRF by adding the oif option to ip: | |
279 | $ ip [-6] route get oif VRF-NAME ADDRESS | |
280 | ||
281 | For example: | |
282 | $ ip route get 10.2.1.40 oif vrf-red | |
283 | 10.2.1.40 dev eth1 table vrf-red src 10.2.1.2 | |
284 | cache | |
285 | ||
286 | $ ip -6 route get 2002:1::32 oif vrf-red | |
287 | 2002:1::32 from :: dev eth1 table vrf-red proto kernel src 2002:1::2 metric 256 pref medium | |
288 | ||
289 | ||
290 | 9. Removing Network Interface from a VRF | |
291 | ||
292 | Network interfaces are removed from a VRF by breaking the enslavement to | |
293 | the VRF device: | |
294 | $ ip link set dev NAME nomaster | |
295 | ||
296 | Connected routes are moved back to the main table and local entries are | |
297 | moved to the local table. | |
298 | ||
299 | For example: | |
300 | $ ip link set dev eth0 nomaster | |
301 | ||
302 | -------------------------------------------------------------------------------- | |
303 | ||
304 | Commands used in this example: | |
305 | ||
306 | cat >> /etc/iproute2/rt_tables <<EOF | |
307 | 1 vrf-mgmt | |
308 | 10 vrf-red | |
309 | 66 vrf-blue | |
310 | 81 vrf-green | |
311 | EOF | |
312 | ||
313 | function vrf_create | |
314 | { | |
315 | VRF=$1 | |
316 | TBID=$2 | |
317 | # create VRF device | |
318 | ip link add vrf-${VRF} type vrf table ${TBID} | |
319 | ||
320 | # add rules that direct lookups to vrf table | |
321 | ip ru add pref 200 oif vrf-${VRF} table ${TBID} | |
322 | ip ru add pref 200 iif vrf-${VRF} table ${TBID} | |
323 | ip -6 ru add pref 200 oif vrf-${VRF} table ${TBID} | |
324 | ip -6 ru add pref 200 iif vrf-${VRF} table ${TBID} | |
325 | ||
326 | if [ "${VRF}" != "mgmt" ]; then | |
327 | ip route add table ${TBID} prohibit default | |
328 | fi | |
329 | ip link set dev vrf-${VRF} up | |
330 | ip link set dev vrf-${VRF} state up | |
331 | } | |
332 | ||
333 | vrf_create mgmt 1 | |
334 | ip link set dev eth0 master vrf-mgmt | |
335 | ||
336 | vrf_create red 10 | |
337 | ip link set dev eth1 master vrf-red | |
338 | ip link set dev eth2 master vrf-red | |
339 | ip link set dev eth5 master vrf-red | |
340 | ||
341 | vrf_create blue 66 | |
342 | ip link set dev eth3 master vrf-blue | |
343 | ||
344 | vrf_create green 81 | |
345 | ip link set dev eth4 master vrf-green | |
346 | ||
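The commands above only create VRFs. A matching teardown might look like the sketch below, which mirrors vrf_create step by step. vrf_delete and run are hypothetical helpers that are not part of the original example, and the sketch assumes the rules were installed with pref 200 as shown above. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Print each command by default; set DRY_RUN=0 to execute it (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# vrf_delete NAME TABLE-ID
# Undo what vrf_create did for vrf-NAME.
vrf_delete() {
    vrf=$1
    tbid=$2

    # remove the rules that direct lookups to the vrf table
    run ip rule del pref 200 oif "vrf-$vrf" table "$tbid"
    run ip rule del pref 200 iif "vrf-$vrf" table "$tbid"
    run ip -6 rule del pref 200 oif "vrf-$vrf" table "$tbid"
    run ip -6 rule del pref 200 iif "vrf-$vrf" table "$tbid"

    # vrf_create adds the prohibit default only for non-mgmt VRFs
    if [ "$vrf" != "mgmt" ]; then
        run ip route del table "$tbid" prohibit default
    fi

    # delete the VRF device itself
    run ip link del dev "vrf-$vrf"
}

vrf_delete blue 66
```

Interfaces can be released beforehand with `ip link set dev NAME nomaster`, as described in step 9.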
347 | ||
348 | Interface addresses from /etc/network/interfaces: | |
349 | auto eth0 | |
350 | iface eth0 inet static | |
351 | address 10.0.0.2 | |
352 | netmask 255.255.255.0 | |
353 | gateway 10.0.0.254 | |
354 | ||
355 | iface eth0 inet6 static | |
356 | address 2000:1::2 | |
357 | netmask 120 | |
358 | ||
359 | auto eth1 | |
360 | iface eth1 inet static | |
361 | address 10.2.1.2 | |
362 | netmask 255.255.255.0 | |
363 | ||
364 | iface eth1 inet6 static | |
365 | address 2002:1::2 | |
366 | netmask 120 | |
367 | ||
368 | auto eth2 | |
369 | iface eth2 inet static | |
370 | address 10.2.2.2 | |
371 | netmask 255.255.255.0 | |
372 | ||
373 | iface eth2 inet6 static | |
374 | address 2002:2::2 | |
375 | netmask 120 | |
376 | ||
377 | auto eth3 | |
378 | iface eth3 inet static | |
379 | address 10.2.3.2 | |
380 | netmask 255.255.255.0 | |
381 | ||
382 | iface eth3 inet6 static | |
383 | address 2002:3::2 | |
384 | netmask 120 | |
385 | ||
386 | auto eth4 | |
387 | iface eth4 inet static | |
388 | address 10.2.4.2 | |
389 | netmask 255.255.255.0 | |
390 | ||
391 | iface eth4 inet6 static | |
392 | address 2002:4::2 | |
393 | netmask 120 |