Feb 11, 2012
 

I have fielded a number of questions on VMware’s multicast support and figure it is time I did a short blog on it. There is a good white paper on the topic on the VMware site called Multicast Performance on vSphere 5.0 that covers the performance changes made to enhance multicast support in vSphere 5.

The recurring question I get is how multicast is handled in vSphere. The short answer is the vSwitch does not play a role in the IGMP join and leave messages that the VMs send in order to start and stop receiving multicast groups respectively.

The vSwitch, and this goes for both the Standard Switch (VSS) and the vSphere Distributed Switch (vDS), has inherent knowledge of the configuration of the VM’s Virtual NIC (vNIC). Typically when a Guest OS is interested in receiving traffic from a certain multicast group, the network stack in the Guest OS pushes down the corresponding multicast MAC address to the vNIC. The vSwitch gets this multicast MAC address from the vNIC emulation directly and tracks it in the forwarding table. When the Guest OS sends out IGMP Join/Leave messages, the vSwitch does not interpret them and forwards them to the physical switch (pSwitch), which makes the usual decision on accepting the join, or not, based on its configuration. This is possible on the pSwitches because they use IGMP snooping and keep track, per physical port, of which multicast groups to send out of each port.
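To make the "corresponding multicast MAC address" a little more concrete, here is a minimal Python sketch of the standard IPv4 mapping (the fixed 01:00:5e prefix followed by the low 23 bits of the group address). The function name `multicast_mac` is my own for illustration and is not anything from vSphere.

```python
import ipaddress

def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF      # only the low 23 bits of the group survive
    mac = 0x01005E000000 | low23      # fixed 01:00:5e prefix, next bit is 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

print(multicast_mac("239.1.1.1"))
print(multicast_mac("224.129.1.1"))
```

Both calls print the same MAC address, because only 23 of the 28 variable group bits fit into the MAC. A MAC based forwarding table therefore cannot tell such overlapping groups apart, and the guest's IP stack does the final filtering.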

When multicast traffic comes into the vSwitch, the vSwitch forwards copies of the traffic to subscribed VMs. Forwarding for multicast traffic is done the same way as unicast, and is based on destination MAC address. Since the vSwitch tracks which vNIC is interested in which multicast groups, it delivers packets only to the right set of VMs. In this way the vSwitch does not deliver packets to all VMs but only to vNICs that match the forwarding table lookup.

If a VM leaves a multicast group, it will send an IGMP leave message, which will be forwarded to the pSwitch, and then remove the multicast MAC address from its vNIC to stop receiving the stream. The vSwitch will then remove the vNIC from the forwarding table for that multicast group. If the VM in question was the last one on the ESXi Server that had requested the multicast group, the pSwitch will also remove the group from the list of multicast groups to send out of the physical port.
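The join, forward and leave behaviour described above can be sketched as a toy forwarding table in Python. This is purely an illustration of the idea, not how the vSwitch is actually implemented; the class name `ToyVSwitch`, its methods, and the vNIC names are all made up for the example.

```python
from collections import defaultdict

class ToyVSwitch:
    """Toy model: multicast MAC -> set of vNICs that programmed that MAC."""

    def __init__(self):
        self.table = defaultdict(set)

    def vnic_add_mac(self, vnic, mac):
        # The guest pushed a multicast MAC down to its vNIC.
        self.table[mac].add(vnic)

    def vnic_remove_mac(self, vnic, mac):
        # The guest removed the MAC from its vNIC; drop empty entries.
        self.table[mac].discard(vnic)
        if not self.table[mac]:
            del self.table[mac]

    def deliver(self, dst_mac):
        # Forwarding is a plain destination MAC lookup, as for unicast.
        return sorted(self.table.get(dst_mac, ()))

sw = ToyVSwitch()
sw.vnic_add_mac("vm1-vnic0", "01:00:5e:01:01:01")
sw.vnic_add_mac("vm2-vnic0", "01:00:5e:01:01:01")
print(sw.deliver("01:00:5e:01:01:01"))   # both vNICs get a copy
sw.vnic_remove_mac("vm1-vnic0", "01:00:5e:01:01:01")
print(sw.deliver("01:00:5e:01:01:01"))   # only vm2-vnic0 remains
```

Note that nothing in this model looks at IGMP messages; the table is driven entirely by what the vNICs have programmed, which is the point of the description above.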

What if the VM is vMotioned?

When a VM is vMotioned, its vNIC configuration goes with it. The destination host sees this vNIC configuration and updates its forwarding tables to forward the necessary multicast traffic it receives to the VM. To prevent any transient multicast packet loss after a vMotion, the vSwitch also injects an IGMP query into the VM, using its unicast MAC address, so that multicast receiver presence is known to the pSwitches immediately. This avoids the VM missing multicast traffic while it waits for the next IGMP query to come from an IGMP querier on the network.

The IGMP querier is usually a router on the network and is required in order for IGMP snooping to work on pSwitches. The pSwitches use this information in their multicast forwarding tables and without it would not be able to do IGMP snooping. The routers send out IGMP queries to address 224.0.0.1, the all-systems multicast group, and the VMs that have subscribed to a multicast group respond with a membership report listing the groups they are participating in. The pSwitch snoops this information and updates its multicast forwarding tables to start forwarding the multicast groups for the VM.
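For the curious, here is roughly what a membership report looks like on the wire. This Python sketch builds the 8-byte IGMPv2 report for a single group; I am assuming IGMPv2 here (IGMPv3 reports use a different, longer layout), and the helper names are my own. It only builds the bytes and does not send anything.

```python
import socket
import struct

IGMP_V2_MEMBERSHIP_REPORT = 0x16  # IGMPv2 message type for a report

def inet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def igmpv2_report(group: str) -> bytes:
    """Build an IGMPv2 membership report: type, max response time, checksum, group."""
    group_bytes = socket.inet_aton(group)
    # First pass with a zero checksum field, then fill in the real checksum.
    header = struct.pack("!BBH4s", IGMP_V2_MEMBERSHIP_REPORT, 0, 0, group_bytes)
    return struct.pack("!BBH4s", IGMP_V2_MEMBERSHIP_REPORT, 0,
                       inet_checksum(header), group_bytes)

msg = igmpv2_report("239.1.1.1")
print(msg.hex())
print(inet_checksum(msg) == 0)  # a valid message checksums to zero
```

A snooping pSwitch only needs the last four bytes (the group) and the port the report arrived on to update its tables.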

How about Physical NIC Teaming?

Physical NIC teaming is supported, but how it works depends on the type of load balancing scheme used.

If the physical NICs are all active and the teaming is virtual source port ID or MAC hash based, then the VM’s IGMP join messages will go out of the configured physical NIC and the corresponding pSwitch will update its multicast forwarding tables to send out the multicast group to the VM on the associated physical port.

For the case where one of the physical NICs starts out in standby mode and VMs are failed over to it, the vSwitch will, like in the vMotion case above, inject IGMP queries into the VMs affected by the failover so that multicast receiver presence is known to the pSwitch immediately, allowing packet forwarding to resume.

In the case of link aggregation that uses IP hash for load balancing, the pSwitch treats the pNICs as one channel and can fail the multicast traffic over between the pNICs, as they are all subscribed to the same groups. The pNIC used to send the multicast traffic to the vSwitch will depend on the pSwitch load-balancing scheme. Keep in mind that to use link aggregation with multiple pSwitches, the switches need to be stacked so that they look like a single switch to the ESXi servers.
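To give a feel for why a given stream sticks to one link in a channel, here is a deliberately simplified Python model of an IP hash policy: XOR the source and destination addresses and take the result modulo the number of links. This is an assumption for illustration only; the actual hash used by any particular pSwitch or vSwitch may differ, and the function and uplink names are hypothetical.

```python
import ipaddress

def ip_hash_uplink(src: str, dst: str, uplinks: list) -> str:
    """Pick one uplink from an XOR of source and destination IPv4 addresses."""
    h = int(ipaddress.IPv4Address(src)) ^ int(ipaddress.IPv4Address(dst))
    return uplinks[h % len(uplinks)]

links = ["vmnic0", "vmnic1"]
# The same source/group pair always hashes to the same link...
print(ip_hash_uplink("10.0.0.1", "239.1.1.1", links))
# ...while a different group from the same source may land on the other one.
print(ip_hash_uplink("10.0.0.1", "239.1.1.2", links))
```

Because every link in the channel is treated as subscribed to the same groups, it does not matter to the receiver which link the hash picks.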

In Closing

Multicast traffic is not one of those things that is talked about a lot in general virtualization implementations. There is great support for it now in vSphere with performance constantly being improved.

The main driving force behind multicast in virtualized environments is probably financial institutions that rely heavily on it for streaming of things like market data and video. With the use of 10Gb NICs and performance improvements in multicast handling, there is now very little stopping the virtualization of even the most demanding of multicast-based applications.

Mar 12, 2011
 

 

IPv6 Basics

IPv6 Addresses:

IPv6 addresses are 128 bits long, as opposed to IPv4 which has 32-bit addresses. IPv6 addresses are represented as 8 groups of 4 hexadecimal digits. This is the first main difference from IPv4, which used 4 groups of decimal numbers to represent the IP address. As an example, my home network has a default router with an IP address of:

2001:0DB8:150F:2500:0000:0000:0000:0001

 

Note that I am using RFC 3849 prefix 2001:0DB8:: reserved for documentation in this blog.

 

IPv6 addresses can be simplified by removing any group of four zeros between colons. The address can thus be simplified to:

2001:DB8:150F:2500::::0001

That intermediate form is not valid notation on its own. The run of colons is collapsed into a single double colon (this can be done only once per address), resulting in:

2001:DB8:150F:2500::0001

Like most numbering schemes we can simplify this further by removing any leading zeros to get my final address which is:

2001:DB8:150F:2500::1

This is not so bad now, is it? Not that much different from the 192.168.2.1 addresses that you are used to.
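If you do not feel like doing the simplification by hand, Python's standard `ipaddress` module will do it for you. A small sketch using the router address from above:

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0DB8:150F:2500:0000:0000:0000:0001")

print(addr.compressed)  # shortest form: 2001:db8:150f:2500::1
print(addr.exploded)    # full form: 2001:0db8:150f:2500:0000:0000:0000:0001
```

Note that the module prints the hexadecimal digits in lower case; the two spellings refer to the same address.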

Like IPv4 addresses, IPv6 addresses have a network (prefix) portion and a node (host) portion. In my network's case, I have been assigned the

2001:DB8:150F:2500::/56

network. This means that my network has a 56 bit mask of:

FFFF:FFFF:FFFF:FF00::

(or 255.255.255.255.255.255.255.0 for the first 64 bits, in old school IPv4 parlance)

This means I have an address range of

Start:  2001:DB8:150F:2500:0000:0000:0000:0000

End: 2001:DB8:150F:25FF:FFFF:FFFF:FFFF:FFFF
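The mask and the address range can be double checked with Python's standard `ipaddress` module. A small sketch, using my /56 from above:

```python
import ipaddress

net = ipaddress.IPv6Network("2001:DB8:150F:2500::/56")

print(net.netmask)        # the 56 bit mask
print(net[0])             # first address in the range
print(net[-1])            # last address in the range
print(net.num_addresses)  # 2**72 addresses in a /56
```

Indexing the network with 0 and -1 gives the start and end of the range without having to work out the hexadecimal by hand.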

Good IPv6 subnetting says your subnets have to be /64 networks, meaning in my case that I have 256 (00 through FF) subnets:

Start:    2001:DB8:150F:2500::

2001:DB8:150F:2501::

2001:DB8:150F:2502::

2001:DB8:150F:2503::

2001:DB8:150F:2504::

...

2001:DB8:150F:25F9::

2001:DB8:150F:25FA::

2001:DB8:150F:25FB::

2001:DB8:150F:25FC::

2001:DB8:150F:25FD::

2001:DB8:150F:25FE::

End:       2001:DB8:150F:25FF::

 

Each of these subnets has a 64-bit node range. Using the first subnet, the IP addresses for the nodes would be in the following range:

Start:  2001:DB8:150F:2500:0000:0000:0000:0000

End:    2001:DB8:150F:2500:FFFF:FFFF:FFFF:FFFF

As you can see that is a lot of nodes: exactly 2^64 = 18,446,744,073,709,551,616 (about 1.8 × 10^19) nodes in each of these subnets.
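The subnet count and the size of each subnet are easy to verify with Python's standard `ipaddress` module. A minimal sketch that splits my /56 into /64 networks:

```python
import ipaddress

net = ipaddress.IPv6Network("2001:DB8:150F:2500::/56")
subnets = list(net.subnets(new_prefix=64))

print(len(subnets))              # number of /64 subnets in the /56
print(subnets[0], subnets[-1])   # first and last subnet
print(subnets[0].num_addresses)  # node addresses per subnet
```

Python integers are arbitrary precision, so the per-subnet count is printed exactly rather than as a rounded floating point value.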

Address Assignment:

There are two options in IPv6 for assigning the node part of an IPv6 address. You can use DHCP, or let the nodes auto configure their own IP addresses. The recommended method is to let the nodes auto configure themselves by listening to a prefix advertised by a router.

DHCPv6 is not very different from IPv4 DHCP so I will not get into it. Just note that not all operating systems that claim to support IPv6 have a DHCPv6 client available. If you are running Linux you can start the dhcp6c DHCP client to get a DHCP IPv6 address. DHCPv6 is enabled by default in Windows 7 (and I hear in Vista as well, but I skipped that version of Windows). Mac OS X on the other hand does not have an option for DHCPv6.

 

Auto Configuration; Prefix Announcements:

To cover all your bases, just enable prefix announcements and let the nodes auto configure themselves. For auto configuration to work with prefix announcements the subnet has to use a /64 mask. The node portion of an auto configured IPv6 address is based on the MAC address of the machine. In my Apple laptop's case, I have

MAC address: C8:BC:C8:D3:7E:6E

IP Address: 2001:DB8:150F:150F:CABC:C8FF:FED3:7E6E

Whereas a Linux box with a Giga-Byte motherboard has

MAC address: 6C:F0:49:E6:13:98

IP Address: 2001:DB8:150F:150F:6EF0:49FF:FEE6:1398

 

As you can see the generation scheme is quite simple. Flip the universal/local bit (0x02) in the first byte (8 bits), which for these MAC addresses is the same as adding 2, so in my Mac's case C8 becomes CA. The next two bytes and the last three bytes are the rest of the MAC address. Note that FF:FE is inserted in between as filler, since MAC addresses only have 48 bits and we need 64 for the node address.
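The scheme is simple enough to sketch in a few lines of Python. The function name `eui64_address` is my own; the code flips the 0x02 bit in the first byte of the MAC, inserts FF:FE in the middle, and appends the result to a /64 prefix.

```python
import ipaddress

def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build an auto configured address from a /64 prefix and a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02                                   # flip the universal/local bit
    node = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])  # FF:FE filler
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("auto configuration needs a /64 prefix")
    return net[int.from_bytes(node, "big")]             # prefix + 64-bit node part

print(eui64_address("2001:DB8:150F:150F::/64", "C8:BC:C8:D3:7E:6E"))
print(eui64_address("2001:DB8:150F:150F::/64", "6C:F0:49:E6:13:98"))
```

The two calls reproduce the addresses of my laptop and my Linux box shown above (in lower case, which is how Python prints them).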

Prefix announcements can also send out IPv6 DNS server addresses, but not much else.

 

Auto Configuration; Stateless:

If there is no prefix announcement from a router, the IPv6 nodes can still auto configure themselves with non routable Link Local addresses similar to the IPv4 169.254.x.x addresses. The method of generating the Link Local address is the same as I described in Prefix announcements, except the prefix is the predefined reserved FE80::/64. This is described in great detail in RFC 2462 (since obsoleted by RFC 4862). To use my Mac laptop's case I would have

MAC address: C8:BC:C8:D3:7E:6E

IP Address: FE80::CABC:C8FF:FED3:7E6E

To summarize, there are actually several different address types, referred to as “scopes”, that are associated with IPv6. I will not attempt to describe them myself but will plagiarize the description that I found on the University of Wisconsin knowledge base, as they do an excellent job of it.

  • Global scope addresses are the regular globally reachable addresses and are often registered in DNS. For UW-Madison, the global prefix is 2607:f388::/32.
  • Link-Local scope addresses are used within a particular subnet only and are not routable at all. They start with the IPv6 prefix fe80::/64. Unlike in IPv4, where link local addresses are used only if no valid IP is available, in IPv6 they are always configured.
  • Loopback is how a host can refer to itself, similar to 127.0.0.1 in IPv4. The IPv6 address is ::1/128 and is also called Host Scope.
  • Multicast can be used with link-local, site-local, and global scope. This is how, for example, nodes on a given LAN can find each other. Multicast addresses are in the range ff00::/8.
  • Broadcast is not used in IPv6 in favor of Multicast.
  • Site-Local scope is specific to an enterprise. However as an addressing range, it has been deprecated since 2004. Documentation that refers to it or the range fec0::/10 is out of date.
  • Unique Local Addressing to some degree replaces site-local. ULA is similar to RFC 1918 addresses in IPv4, but with some differences. ULA is relatively new, and there still is an amount of churn in the standards bodies about how the addresses should be used.
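To see which of these categories a given address falls into, Python's standard `ipaddress` module exposes most of them as properties. The sketch below is a rough classifier along the lines of the list above; the `classify` function and its labels are my own, and the ULA check uses the fc00::/7 range.

```python
import ipaddress

ULA = ipaddress.IPv6Network("fc00::/7")  # unique local address range

def classify(address: str) -> str:
    """Return a rough label for an IPv6 address, following the list above."""
    addr = ipaddress.IPv6Address(address)
    if addr.is_loopback:
        return "loopback"
    if addr.is_multicast:
        return "multicast"
    if addr.is_link_local:
        return "link-local"
    if addr.is_site_local:
        return "site-local (deprecated)"
    if addr in ULA:
        return "unique local"
    return "global or other"

for a in ["::1", "ff02::1", "fe80::cabc:c8ff:fed3:7e6e",
          "fec0::1", "fd00::1", "2607:f388::1"]:
    print(a, "->", classify(a))
```

The last label is deliberately vague: anything that is not in one of the special ranges is treated as global here, which glosses over reserved ranges such as the 2001:DB8:: documentation prefix I am using in this blog.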

In the next article I will go over my home network and how I set it up.