I have fielded a number of questions on VMware’s multicast support and figure it is time I did a short blog on it. There is a good white paper on the topic on the VMware site called Multicast Performance on vSphere 5.0 that deals with performance changes that have been made to enhance multicast support in vSphere 5.
The recurring question I get is how multicast is handled in vSphere. The short answer is the vSwitch does not play a role in the IGMP join and leave messages that the VMs send in order to start and stop receiving multicast groups respectively.
The vSwitch, both the VSS and the vDS, has inherent knowledge of the configuration of the VMs' vNICs. When a VM reconfigures its vNIC and sends out an IGMP join in order to participate in a multicast group, the vSwitch takes note of this. The IGMP join message is passed on to the physical switches, and they make the usual decision on whether to accept the join based on their configuration. This is possible on the physical switches because they do IGMP snooping and keep track, per physical port, of which multicast groups to send out to each endpoint based on the join messages they have snooped. Because the vSwitch does not participate in the multicast routing decisions, it works with both IGMP and PIM for multicast.
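To make the join and leave concrete, here is a minimal sketch of how an application inside a guest might subscribe to a group using the standard sockets API. The operating system sends the IGMP join when the membership option is set. The function names, group address and port are hypothetical and only there for illustration.

```python
import socket


def make_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def join_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket and subscribe it to a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Setting this option is what makes the guest OS emit the IGMP join.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    return sock


def leave_group(sock: socket.socket, group: str) -> None:
    """Drop the membership (the OS sends the IGMP leave) and close the socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, make_mreq(group))
    sock.close()
```

A receiver would call join_group("239.1.1.1", 5000), read datagrams with recvfrom, and call leave_group when it is done.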
When multicast traffic comes into the vSwitch, the vSwitch looks up the VMs that it knows have subscribed to the group and forwards copies of the traffic only to those VMs. In this way the vSwitch acts as a multicast filter and does not flood all multicast packets to all vNICs in the L2 domain like it would with a broadcast. If a VM leaves a multicast group, it sends an IGMP leave message, which is forwarded to the physical switches, and reconfigures its vNIC to stop receiving the stream. On seeing this reconfiguration, the vSwitch stops forwarding that specific multicast group to the VM. If the VM in question was the last one on the ESXi server that had requested the multicast group, then the physical switch also removes the group from the list of multicast groups to send out of its physical port.
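The filtering behaviour described above can be modelled with a small lookup table. This is only a toy model under my own assumptions, not how the vSwitch is actually implemented; the class name, method names and vNIC labels are all made up for the example.

```python
from collections import defaultdict


class VSwitchFilter:
    """Toy model: remember which vNICs want which multicast groups."""

    def __init__(self):
        self._members = defaultdict(set)  # group address -> set of vNIC names

    def join(self, vnic: str, group: str) -> None:
        self._members[group].add(vnic)

    def leave(self, vnic: str, group: str) -> None:
        self._members[group].discard(vnic)
        if not self._members[group]:
            # Last listener on this host is gone, so forget the group.
            del self._members[group]

    def forward(self, group: str) -> list:
        """Return the vNICs that should get a copy of traffic for this group."""
        return sorted(self._members.get(group, ()))

    def has_listeners(self, group: str) -> bool:
        return group in self._members
```

Traffic for a group nobody has joined gets an empty list back from forward, which is the difference from a broadcast, where every vNIC would get a copy.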
What if the VM is VMotioned?
When a VM is VMotioned, its vNIC configuration goes with it. The destination host sees this configuration and forwards the necessary multicast traffic it receives to the VM. If it so happens that the VM moves to a host that did not have any VMs subscribed to its multicast group, then the vSwitch relies on periodic multicast membership queries from multicast-enabled routers. The routers send out a multicast membership query to address 224.0.0.1, the all-systems multicast group, and the VMs that have subscribed to a multicast group respond with a membership report listing the groups they are participating in. The physical switch snoops this and starts forwarding the multicast group for the VM.
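Here is a rough sketch of that query and report exchange from the physical switch's point of view. It is a simplified model with hypothetical names (SnoopingSwitch, answer_query, the port labels), not a description of any particular switch's snooping code.

```python
ALL_SYSTEMS = "224.0.0.1"  # destination address of the general membership query


class SnoopingSwitch:
    """Toy model of a physical switch's per-port IGMP snooping table."""

    def __init__(self):
        self.groups_by_port = {}  # physical port -> set of group addresses

    def snoop_report(self, port: str, groups) -> None:
        """Record the groups listed in a membership report seen on a port."""
        self.groups_by_port.setdefault(port, set()).update(groups)

    def egress_ports(self, group: str) -> list:
        """Ports that traffic for this group should be sent out of."""
        return sorted(p for p, g in self.groups_by_port.items() if group in g)


def answer_query(reports_by_port: dict, switch: SnoopingSwitch,
                 query_dst: str = ALL_SYSTEMS) -> None:
    """Model one query round: subscribed VMs answer through their current port."""
    for port, groups in reports_by_port.items():
        switch.snoop_report(port, groups)
```

Before the query round, the port facing the destination host has no entry for the group. After the moved VM answers, egress_ports includes that port and the stream starts flowing to the new host.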
How about Physical NIC Teaming?
Physical NIC teaming is supported, but how it works depends on the load-balancing scheme used.
If the physical NICs are all active and the teaming is based on virtual source port ID or MAC hash, then the VM's IGMP join messages will go out of the configured physical NIC, and the corresponding physical port will be configured to send out the multicast group for the VM.
For the case where one of the physical NICs starts out in standby mode and VMs are failed over to it, the vSwitch will rely on membership queries and membership reports from the routers and VMs respectively in order to subscribe the physical port associated with the NIC to the multicast groups that the VMs need.
In the case of link aggregation that uses IP hash for load balancing, the physical switch treats the NICs as one channel and will fail the multicast traffic over between them, as they are all subscribed to the same groups. The NIC used to send the multicast traffic to the vSwitch will depend on the physical switch's load-balancing scheme. As such, this is a good load-balancing scheme for cases where you need to reduce the amount of multicast traffic missed due to link failure. Keep in mind that to use link aggregation with multiple physical switches, the switches need to be stacked so that they look like a single switch to the ESXi servers.
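To illustrate why the channel keeps delivering after a link failure, here is a simplified sketch of hash-based link selection. The hash (source address XOR destination address, modulo the number of live links) is an assumption for illustration only. The actual hash on a physical switch is vendor and configuration specific, and the function and NIC names are hypothetical.

```python
import ipaddress


def pick_link(src_ip: str, dst_ip: str, live_links: list) -> str:
    """Choose one member link of the channel for a given source/destination pair."""
    if not live_links:
        raise ValueError("no live links in the channel")
    # Assumed hash: XOR the two addresses as integers, then take the remainder.
    key = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return live_links[key % len(live_links)]


# With both links up, a given stream always lands on the same link.
link = pick_link("10.0.0.2", "239.1.1.1", ["nic0", "nic1"])

# If that link fails, the same stream is hashed over the remaining link,
# with no new join needed because the whole channel is subscribed.
fallback = pick_link("10.0.0.2", "239.1.1.1", ["nic0"])
```

The point of the sketch is that selection happens over whatever links are currently up, so the stream moves without waiting for a membership query and report cycle.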