I have fielded a number of questions on VMware’s multicast support and figure it is time I did a short blog post on it. There is a good white paper on the topic on the VMware site, Multicast Performance on vSphere 5.0, that covers the performance changes made to enhance multicast support in vSphere 5.
The recurring question I get is how multicast is handled in vSphere. The short answer is that the vSwitch plays no role in the IGMP Join and Leave messages that VMs send to start and stop receiving multicast groups, respectively.
The vSwitch, both the Standard Switch (VSS) and the vSphere Distributed Switch (vDS), has inherent knowledge of the configuration of the VM’s virtual NIC (vNIC). Typically, when a guest OS is interested in receiving traffic for a certain multicast group, the guest OS network stack pushes the corresponding multicast MAC address down to the vNIC. The vSwitch gets this multicast MAC address directly from the vNIC emulation and tracks it in its forwarding table. When the guest OS sends out IGMP Join/Leave messages, the vSwitch does not interpret them; it simply forwards them to the physical switch (pSwitch), which makes the usual decision on whether to accept the join based on its configuration. The pSwitches can do this because they use IGMP snooping to keep track, per physical port, of which multicast groups to send out of that port.
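The “corresponding multicast MAC address” the guest pushes down to the vNIC is derived from the group’s IP address. For IPv4, the standard mapping (RFC 1112) places the low-order 23 bits of the group address after the 01:00:5e prefix. The Python sketch below illustrates that mapping; the function name is illustrative only and is not part of any VMware API.

```python
import ipaddress


def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (RFC 1112 mapping)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits survive the mapping
    octets = (0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
    return ":".join(f"{o:02x}" for o in octets)


print(multicast_ip_to_mac("239.1.1.1"))  # 01:00:5e:01:01:01
print(multicast_ip_to_mac("224.0.0.1"))  # 01:00:5e:00:00:01
```

Because 5 bits of the group address are dropped, 32 groups share each MAC (224.1.1.1 and 239.129.1.1 both map to 01:00:5e:01:01:01). A MAC-based filter can therefore pass frames for a group the guest never joined, and the guest’s own IP stack discards them.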
When multicast traffic comes into the vSwitch, the vSwitch forwards copies of it to the subscribed VMs. Multicast forwarding is done the same way as unicast forwarding: it is based on the destination MAC address. Since the vSwitch tracks which vNICs are interested in which multicast groups, it delivers packets only to the right set of VMs. Rather than flooding every VM, it sends copies only to the vNICs that match the forwarding table lookup.
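To make the lookup concrete, here is a toy Python model of a MAC-based forwarding table. It is a sketch of the behavior described above, not VMware’s implementation; the class, method, and port names are hypothetical. Each vNIC registers its unicast MAC and any multicast MACs the guest pushed down, and a frame is copied only to the vNICs whose entry matches its destination MAC.

```python
from collections import defaultdict
from typing import Dict, Set


class VSwitchModel:
    """Toy model of a vSwitch's MAC-based forwarding table (not VMware code)."""

    def __init__(self) -> None:
        # destination MAC -> set of vNIC ports that want frames for it
        self.table: Dict[str, Set[str]] = defaultdict(set)

    def add_filter(self, port: str, mac: str) -> None:
        """Record a MAC a vNIC accepts (its unicast MAC or a pushed-down multicast MAC)."""
        self.table[mac.lower()].add(port)

    def remove_filter(self, port: str, mac: str) -> None:
        """Drop a MAC from a vNIC, e.g. after the guest leaves the group."""
        ports = self.table.get(mac.lower())
        if ports is not None:
            ports.discard(port)
            if not ports:
                del self.table[mac.lower()]

    def forward(self, dst_mac: str) -> Set[str]:
        """Same lookup for unicast and multicast: copy to every matching vNIC only."""
        return set(self.table.get(dst_mac.lower(), ()))


sw = VSwitchModel()
sw.add_filter("vm1", "00:50:56:aa:00:01")
sw.add_filter("vm1", "01:00:5e:01:01:01")
sw.add_filter("vm2", "01:00:5e:01:01:01")
sw.add_filter("vm3", "00:50:56:aa:00:03")
print(sorted(sw.forward("01:00:5e:01:01:01")))  # ['vm1', 'vm2']
print(sorted(sw.forward("00:50:56:aa:00:03")))  # ['vm3']
```

vm3 never sees the multicast stream because no entry for 01:00:5e:01:01:01 points at its port.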
If a VM leaves a multicast group, it sends an IGMP Leave message, which is forwarded to the pSwitch, and the guest removes the multicast MAC address from its vNIC to stop receiving the stream. The vSwitch then removes the vNIC from the forwarding table entry for that multicast group. If that VM was the last one on the ESXi server to have requested the group, the pSwitch also removes the group from the list of multicast groups it sends out of the corresponding physical port.
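The pSwitch side of that last step can be sketched as per-port state that only prunes a group once no receiver behind the port still wants it. This Python model is hypothetical: it tracks each receiver by source IP, whereas many real switches instead answer a Leave with a group-specific query and prune only if no report comes back. The IP addresses are illustrative.

```python
from collections import defaultdict
from typing import Dict, Set


class SnoopingPort:
    """Hypothetical per-port IGMP snooping state on a pSwitch.

    Tracks which receivers (by source IP) behind one physical port have joined
    each group; the group is pruned from the port only when the last one leaves.
    """

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = defaultdict(set)

    def join(self, group: str, host_ip: str) -> None:
        self.members[group].add(host_ip)

    def leave(self, group: str, host_ip: str) -> bool:
        """Process a Leave; return True if this Leave pruned the group from the port."""
        hosts = self.members.get(group)
        if hosts is None:
            return False
        hosts.discard(host_ip)
        if hosts:
            return False  # other VMs behind this port still want the group
        del self.members[group]
        return True

    def forwards(self, group: str) -> bool:
        return group in self.members


port = SnoopingPort()  # the pSwitch port cabled to one ESXi host's uplink
port.join("239.1.1.1", "10.0.0.11")  # vm1
port.join("239.1.1.1", "10.0.0.12")  # vm2
print(port.leave("239.1.1.1", "10.0.0.11"))  # False: vm2 still subscribed
print(port.leave("239.1.1.1", "10.0.0.12"))  # True: last receiver gone
print(port.forwards("239.1.1.1"))            # False
```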
What if the VM is vMotioned?
When a VM is vMotioned, its vNIC configuration goes with it. The destination host sees this vNIC configuration and updates its forwarding table to forward the necessary multicast traffic it receives to the VM. To prevent transient multicast packet loss after a vMotion, the vSwitch also injects an IGMP query into the VM, addressed to the VM’s unicast MAC address, so that the VM responds with membership reports and the pSwitches learn about the receiver immediately. This keeps the VM from missing multicast traffic while waiting for the next IGMP query from an IGMP querier on the network.
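For a sense of what such an injected query carries, here is a minimal Python sketch of an IGMP General Query message, assuming IGMPv2 (RFC 2236); the post doesn’t say which IGMP version the vSwitch uses. It builds only the 8-byte IGMP payload and leaves out the IP header and the Ethernet header addressed to the VM’s unicast MAC. The function names are illustrative.

```python
import struct

IGMP_MEMBERSHIP_QUERY = 0x11


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IGMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def igmpv2_general_query(max_resp_tenths: int = 100) -> bytes:
    """IGMPv2 General Query (RFC 2236): type 0x11, group address 0.0.0.0.

    max_resp_tenths is the Max Response Time in tenths of a second (100 = 10 s).
    """
    group = bytes(4)  # 0.0.0.0 means "all groups"
    unsigned = struct.pack("!BBH4s", IGMP_MEMBERSHIP_QUERY, max_resp_tenths, 0, group)
    checksum = internet_checksum(unsigned)
    return struct.pack("!BBH4s", IGMP_MEMBERSHIP_QUERY, max_resp_tenths, checksum, group)


query = igmpv2_general_query()
print(query.hex())  # 1164ee9b00000000
```

A receiver validates the message by recomputing the checksum over all 8 bytes, which yields 0 for an intact packet.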
The IGMP querier is usually a router on the network, and one is required for IGMP snooping to work on the pSwitches: the pSwitches build their multicast forwarding tables from the responses to its queries, and without it they would have nothing to snoop. The routers send IGMP queries to 224.0.0.1, the all-systems multicast group, and VMs that have subscribed to a multicast group respond with membership reports for the groups they are participating in. The pSwitch snoops these reports and updates its multicast forwarding table to start forwarding those groups toward the VM.
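The report a VM sends back is what the pSwitch actually snoops. The Python sketch below, again assuming IGMPv2, builds a Membership Report (type 0x16, addressed to the group it announces) and shows the group a snooping switch would extract from it. The function names are illustrative; IGMPv3 reports (type 0x22) use a different format that can list several groups in one message.

```python
import ipaddress
import struct
from typing import Optional, Tuple

IGMPV2_MEMBERSHIP_REPORT = 0x16


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IGMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def igmpv2_report(group: str) -> Tuple[str, bytes]:
    """Return (destination IP, IGMP payload) for an IGMPv2 Membership Report.

    In IGMPv2 a report is addressed to the group it announces.
    """
    packed = ipaddress.IPv4Address(group).packed
    unsigned = struct.pack("!BBH4s", IGMPV2_MEMBERSHIP_REPORT, 0, 0, packed)
    checksum = internet_checksum(unsigned)
    return group, struct.pack("!BBH4s", IGMPV2_MEMBERSHIP_REPORT, 0, checksum, packed)


def snoop(payload: bytes) -> Optional[str]:
    """Extract the group a snooping switch would learn from a valid report."""
    if len(payload) < 8 or internet_checksum(payload[:8]) != 0:
        return None
    msg_type, _, _, packed = struct.unpack("!BBH4s", payload[:8])
    if msg_type != IGMPV2_MEMBERSHIP_REPORT:
        return None
    return str(ipaddress.IPv4Address(packed))


dst, payload = igmpv2_report("239.1.1.1")
print(dst, payload.hex())  # 239.1.1.1 1600f9fcef010101
print(snoop(payload))      # 239.1.1.1
```

Queries and corrupted packets are ignored by `snoop`, so only valid reports change the modeled forwarding state.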
How about Physical NIC Teaming?
Physical NIC teaming is supported, but how it works depends on the type of load-balancing scheme used.
If the physical NICs are all active and the teaming policy is based on the originating virtual port ID or on source MAC hash, the VM’s IGMP Join messages go out of the physical NIC the VM is mapped to, and the corresponding pSwitch updates its multicast forwarding table to send the multicast group to the VM on the associated physical port.
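The key property is that a VM is pinned to one uplink, so its Joins and the returning multicast stream use the same physical port. The Python sketch below illustrates this with simple modulo placement. This is a hypothetical simplification: the post doesn’t describe the hashing, and VMware’s actual algorithms may differ. The uplink and switch port names are illustrative.

```python
from typing import Sequence

# Hypothetical switch-port names for the pSwitch side of each uplink.
PSWITCH_PORT = {"vmnic0": "Gi1/0/1", "vmnic1": "Gi1/0/2"}


def uplink_by_port_id(port_id: int, uplinks: Sequence[str]) -> str:
    """Simplified stand-in for 'originating virtual port ID' teaming."""
    return uplinks[port_id % len(uplinks)]


def uplink_by_mac_hash(mac: str, uplinks: Sequence[str]) -> str:
    """Simplified stand-in for 'source MAC hash' teaming (last MAC byte)."""
    return uplinks[int(mac.split(":")[-1], 16) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]
vm_port_id, vm_mac = 17, "00:50:56:aa:00:03"

# Every frame from this VM, IGMP Joins included, leaves on one uplink,
# so the pSwitch snoops the Join on that uplink's switch port.
join_uplink = uplink_by_port_id(vm_port_id, uplinks)
print(join_uplink, PSWITCH_PORT[join_uplink])  # vmnic1 Gi1/0/2
print(uplink_by_mac_hash(vm_mac, uplinks))     # vmnic1
```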
Where one of the physical NICs starts out in standby mode and VMs are failed over to it, the vSwitch, as in the vMotion case above, injects IGMP queries into the VMs affected by the failover so that the pSwitch learns about the multicast receivers immediately and can forward packets to them.
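The failover behavior amounts to this: repin only the VMs on the failed uplink, and query only those VMs so that their reports go out of the standby uplink right away. Here is a minimal Python sketch. The `inject_query` callback is a hypothetical stand-in for the vSwitch injecting an IGMP query, and the VM and uplink names are illustrative.

```python
from typing import Callable, Dict, List


def fail_over(placement: Dict[str, str], failed: str, standby: str,
              inject_query: Callable[[str], None]) -> List[str]:
    """Repin VMs from a failed uplink to the standby uplink.

    Each moved VM gets an IGMP query injected (via the callback) so it re-reports
    its groups out of the new uplink right away. VMs on healthy uplinks are left alone.
    """
    moved = sorted(vm for vm, uplink in placement.items() if uplink == failed)
    for vm in moved:
        placement[vm] = standby
        inject_query(vm)
    return moved


placement = {"vm1": "vmnic0", "vm2": "vmnic1", "vm3": "vmnic0"}
queried: List[str] = []
affected = fail_over(placement, failed="vmnic0", standby="vmnic2", inject_query=queried.append)
print(affected)   # ['vm1', 'vm3']
print(placement)  # {'vm1': 'vmnic2', 'vm2': 'vmnic1', 'vm3': 'vmnic2'}
```

vm2 is not queried because its uplink and the pSwitch port behind it did not change.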
In the case of link aggregation that uses IP hash load balancing, the pSwitch treats the pNICs as a single channel, so multicast traffic can fail over between the pNICs: the channel as a whole is subscribed to the groups. Which pNIC carries the multicast traffic to the vSwitch depends on the pSwitch’s load-balancing scheme. Keep in mind that to use link aggregation across multiple pSwitches, the switches need to be stacked so that they look like a single switch to the ESXi servers.
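The pSwitch keeps membership per channel rather than per link. The Python sketch below models that: a Join seen on any member subscribes the whole channel, a hash picks the member link for each stream, and a surviving link takes over without a new Join. The XOR hash and the switch port names (Te1/0/1, Te2/0/1) are hypothetical; actual pSwitch load balancing depends on the vendor and configuration.

```python
import ipaddress
from typing import List, Optional, Set


class PortChannel:
    """Hypothetical pSwitch view of an IP-hash link aggregation group.

    Group membership belongs to the channel as a whole, so any surviving
    member link can carry a subscribed group.
    """

    def __init__(self, members: List[str]) -> None:
        self.members = list(members)
        self.groups: Set[str] = set()

    def snoop_join(self, group: str) -> None:
        # A report arriving on any member link subscribes the whole channel.
        self.groups.add(group)

    def link_down(self, link: str) -> None:
        self.members.remove(link)

    def egress_link(self, source: str, group: str) -> Optional[str]:
        """Pick the member link for a stream; None if not subscribed or no links left."""
        if group not in self.groups or not self.members:
            return None
        # Simplified XOR hash of the addresses' low byte; real schemes vary by vendor.
        h = (int(ipaddress.IPv4Address(source)) ^ int(ipaddress.IPv4Address(group))) & 0xFF
        return self.members[h % len(self.members)]


ch = PortChannel(["Te1/0/1", "Te2/0/1"])  # one port on each stacked switch
ch.snoop_join("239.1.1.1")
print(ch.egress_link("10.1.1.5", "239.1.1.1"))  # Te1/0/1
ch.link_down("Te1/0/1")
print(ch.egress_link("10.1.1.5", "239.1.1.1"))  # Te2/0/1, no new Join needed
```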
Multicast traffic is not one of those things that are talked about much in general virtualization implementations, but vSphere now has good support for it, and its multicast performance keeps improving.
The main driving force behind multicast in virtualized environments is probably financial institutions, which rely heavily on it for streaming things like market data and video. With 10GbE NICs and the improvements in multicast handling, there is now very little stopping the virtualization of even the most demanding multicast-based applications.