Quantifying Skype QoS (VoIP)

Skype is highly recommended by its users because of its cost-saving advantage over traditional telephones. It also outperforms its peer softphones on quality of service.

Skype Quality of Service

Gao and Luo [11] compared the performance of Skype with that of another commonly used VoIP client, MSN Messenger. The comparison covered the connection setup delay and the mouth-to-ear (M2E) delay of audio conversations. The experiments were performed under different network environments, including LAN, WLAN, DSL, GPRS, and CDMA2000x, with the clients located in the same or in different domains.

Results

Connection Setup Delay: The connection setup delay was measured for both clients. In most cases, Skype outperformed MSN Messenger with shorter delays. Because Skype can traverse almost all types of NAT and firewalls, it worked under every testing condition, whereas MSN Messenger could set up communication only when the clients were in the same domain or when no NAT/firewall was involved. The setup delay for Skype varies because of the traversal process.

M2E Delay: The M2E delay is the mouth-to-ear delay measured during the conversation. The experiments show that when the network condition is good, that is, when the clients are in the same domain or no NAT/firewall exists, MSN Messenger has the shorter M2E delay in many cases. Both MSN Messenger and Skype nevertheless have acceptable M2E delays, which do not noticeably affect the quality of voice.

Skype Connection to PSTN

The design of Skype also considers compatibility with the traditional telephone network. To connect to the PSTN, common VoIP solutions set up a junction point between the two networks, and signals are converted at this point. In Skype, the connecting servers are SkypeIn and SkypeOut. SkypeIn presents the IP end point to the PSTN side as a normal phone number. When a PSTN number is dialed from the Skype client software, the call is directed to SkypeOut, which translates the PSTN number and sets up the connection. During the conversation, SkypeIn/SkypeOut are responsible for converting the analog voice signal to digital format and packetizing it on the IP side. At the same time, the SkypeIn and SkypeOut servers are connected to the Skype network as normal clients, even though, functionally speaking, they perform completely different tasks from an ordinary client.

Quantifying the quality of voice in the Skype-to-PSTN scenario is difficult, because delay can be generated in both the network transition and the signal conversion stages, and less information is available about the SkypeIn/SkypeOut path. An alternative is to take an end-to-end view and measure the M2E delay during the conversation. SkypeIn/SkypeOut provide a high-performance service for connecting a PC to a landline, but compared with Skype-to-Skype telephony the increase in delay is obvious. It is caused by the signal conversion process and the capacity limitation of SkypeIn/SkypeOut.

Skype Group Conversation

Skype also supports group conversation. According to previous analyses of Skype traffic, Skype uses an end-mixing strategy in group conversation: one of the members of the group is chosen as the leader, which is responsible for accepting audio from all other members, mixing the speech, and forwarding it to all other participants. As illustrated in Figure 8.4, in the end-mixing topology, normal participants transmit their own speech data to the mixer (the leader), where the voices are mixed and sent back to them. Each normal participant gets a mixed copy of the voices of all other members in the conference, including the mixer. The mixer collects all the individual voices and also generates a mixed copy for its own use.

Although many measurement methods have been developed for two-party conversation, few studies have addressed the multi-party scenario. In a multi-party conversation, the additional audio-mixing process should be considered, but its impact has not yet been well studied. Fu et al. [12] proposed a new metric, the Group Mean Opinion Score (GMOS), based on the two-party MOS. The key idea of GMOS is to calculate a subjective score from the MOS of each speaker-listener pair in the conference. Among the users in a voice conference, the quality of voice perceived by each participant can be very different. Because of heterogeneous network conditions, even for the same listener, the voices of different speakers may not be of the same quality. The GMOS model assumes that the session has N participants, P1, P2, ..., PN, and that each participant provides MOS scores for the other N - 1 participants.
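The end-mixing scheme described earlier in this subsection can be illustrated with a minimal Python sketch. It is not Skype's implementation: the frame representation (a list of float samples), the plain sample-wise sum used for mixing, and the names `mix` and `end_mix` are all assumptions made for illustration.

```python
from typing import Dict, List

Frame = List[float]  # one audio frame as a list of samples (illustrative format)


def mix(frames: List[Frame]) -> Frame:
    """Mix frames by summing them sample by sample (simplest possible mixer)."""
    if not frames:
        return []
    return [sum(samples) for samples in zip(*frames)]


def end_mix(frames: Dict[str, Frame], mixer: str) -> Dict[str, Frame]:
    """Return what each participant hears in an end-mixing conference.

    `frames` maps each participant to the frame they spoke. All frames are
    assumed to have arrived at `mixer`, which builds one mix per listener
    containing every voice except the listener's own, including a copy for
    the mixer itself.
    """
    if mixer not in frames:
        raise ValueError("the mixer must be one of the participants")
    heard: Dict[str, Frame] = {}
    for listener in frames:
        others = [frame for name, frame in frames.items() if name != listener]
        heard[listener] = mix(others)
    return heard


# Hypothetical three-party conference with participant "A" acting as mixer.
spoken = {"A": [1.0, 2.0], "B": [10.0, 20.0], "C": [100.0, 200.0]}
heard = end_mix(spoken, mixer="A")
```

In this sketch the mixer produces N separate mixes for N participants, which hints at why the extra audio-mixing step is a candidate source of delay and quality loss in multi-party calls.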


FIGURE 8.4 Topology of multi-party Skype conversation.

Hence, from the perspective of one participant, say the i-th participant $P_i$, his/her score for the overall quality of the conference session is his/her GMOS for this conference. It is computed from the individual scores $MOS(k)$, where $MOS(k)$ is the MOS score given by participant $i$ to participant $k$, together with a parameter $\alpha$ that adjusts for the listener's subjective opinion of the quality of voice.


Experimental Setup

Using the new metric, Fu et al. [12] conducted experiments with three-person, four-person, and five-person conferences, in which the participants were in different locations, as shown in Figure 8.5. MOS and GMOS scores were collected from pairs of listeners and speakers.


FIGURE 8.5 Network setting of Skype conference.

The experimental values of $\alpha$ are close to 0. Listeners whose scores have $\alpha \in (0, 1]$ are positive towards the voice quality, whereas listeners whose scores have $\alpha \in [-1, 0)$ are negative towards it. About 10% of the GMOS values are beyond the MIN and MAX range, while about 50% of the values of $\alpha$ lie in $[-0.2, 0.2]$. For the 392 values of $\alpha$, the average is 0.093 with a standard deviation of 0.464. Thus, statistically, the subjects are positive on average about Skype.

Using an existing objective quality assessment model for two-party conversation, the MOS for each pair can be calculated. For example, Fu et al. [12] developed a Two Step Mapping Method (TSMM), which uses the E-Model to estimate the MOS scores in a conference and then computes the GMOS from these MOS values. The calculated GMOS is very close to the subjectively collected GMOS of the conference.
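The individual steps of TSMM are not given here, so the following sketch shows only the standard E-Model conversion from a transmission rating factor R to an estimated MOS, as defined in ITU-T Recommendation G.107. How TSMM obtains R for each speaker-listener pair, and how it combines the resulting MOS values into a GMOS, is not covered; the function name `r_to_mos` is illustrative.

```python
def r_to_mos(r: float) -> float:
    """Map an E-Model rating factor R to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0   # lower bound of the estimated MOS
    if r >= 100:
        return 4.5   # upper bound of the estimated MOS
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Example: estimated MOS for a few hypothetical rating factors.
for rating in (50.0, 70.0, 90.0):
    print(rating, round(r_to_mos(rating), 2))
```

A per-pair MOS estimated this way could then feed a GMOS calculation in place of subjectively collected scores.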

Skype Over Wireless Environment

With the wide deployment of wireless networks, the question arises whether the mobile environment can provide sufficient data transfer rates to support VoIP applications. Hoßfeld et al. [13] analyzed the achievable and actual quality of IP-based telephony calls on Skype in wireless environments. The study contains two parts: performance measurements in a real Universal Mobile Telecommunications System (UMTS) network, and measurements in a test environment that emulates the rate control mechanisms and changing system conditions of UMTS networks.

In the experiments, PESQ is used to evaluate the quality of voice, while packet loss, inter-packet delay, and throughput are measured to capture the influence of network-based factors. In addition, a Network Utility Function (NUF) is applied to describe the impact of the network on the quality of voice eventually perceived by end users.

As the PESQ model requires the signals from both the sender and the receiver, the voice data were recorded in wav files. The degradation of the quality of voice caused by the Skype iLBC codec must therefore be considered. In the experiments, a PESQ value of 3.93 was used as the reference for the degradation caused by the audio codec. The PESQ reduction caused by the network connectivity between sender and receiver is described by the NUF $U_{Netw}$:

$$PESQ_{rcvd} = U_{Netw} \cdot PESQ_{sent}.$$

The NUF takes the impact of several common network factors into consideration. The m-Utility Function (m-UF) $U_m$ captures the variation of the mean throughput during a certain observation window $\Delta W$, where $m_{sent}$ is the average throughput of the sender and $m_{rcvd}$ is the average throughput of the receiver. Assuming a linear dependence on the loss ratio $\ell = \max\{1 - m_{rcvd}/m_{sent},\, 0\}$, $U_m$ can be calculated as

$$U_m = \max\{1 - k_m \ell,\, 0\},$$

where $k_m$ denotes the degree of utility reduction.

$U_s$ denotes the s-Utility Function (s-UF), which captures the change of the standard deviation of the throughput from $s_{sent}$ to $s_{rcvd}$ during an observation window $\Delta W$. To calculate the standard deviation, the throughput values are averaged over short intervals of length $\Delta T$. The relative change of the standard deviation is denoted as $s = (s_{rcvd} - s_{sent})/s_{sent}$. $U_s$ is then calculated from $s$ using two parameters: $k^+$ reflects the decrease of $U_s$ when the standard deviation doubles, while $k^-$ is used when the standard deviation is vanishing.
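As a rough illustration of the quantities defined above, the sketch below computes the loss ratio, the m-UF, and the relative change of the standard deviation, and applies a network utility value to the reference PESQ of 3.93. It reads the loss ratio as max{1 - m_rcvd/m_sent, 0}. The function names and the value chosen for k_m are assumptions for illustration only; the s-UF itself and the way the individual utilities combine into the NUF are deliberately left out, since their formulas are not stated in the text.

```python
def loss_ratio(m_sent: float, m_rcvd: float) -> float:
    """Loss ratio l = max{1 - m_rcvd / m_sent, 0} from mean throughputs."""
    if m_sent <= 0:
        raise ValueError("sender throughput must be positive")
    return max(1.0 - m_rcvd / m_sent, 0.0)


def m_utility(m_sent: float, m_rcvd: float, k_m: float) -> float:
    """m-UF: U_m = max{1 - k_m * l, 0}, linear in the loss ratio l."""
    return max(1.0 - k_m * loss_ratio(m_sent, m_rcvd), 0.0)


def relative_std_change(s_sent: float, s_rcvd: float) -> float:
    """Relative change s = (s_rcvd - s_sent) / s_sent of the throughput std."""
    if s_sent <= 0:
        raise ValueError("sender standard deviation must be positive")
    return (s_rcvd - s_sent) / s_sent


def received_pesq(u_netw: float, pesq_sent: float = 3.93) -> float:
    """PESQ_rcvd = U_Netw * PESQ_sent; 3.93 is the codec-only reference value."""
    return u_netw * pesq_sent


# Hypothetical numbers: 10% throughput loss and k_m = 2.0 (assumed value).
u_m = m_utility(m_sent=100.0, m_rcvd=90.0, k_m=2.0)
s = relative_std_change(s_sent=2.0, s_rcvd=4.0)  # standard deviation doubles
```

With these hypothetical inputs a 10% throughput loss reduces the m-UF to about 0.8, and a doubled standard deviation corresponds to s = 1, the case that the parameter k+ describes.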

Experimental Setup

First, a bottleneck scenario is set up in a LAN, which includes a traffic-shaping router to measure the effect of bandwidth variation on VoIP services. Skype clients are then connected to the Internet via a public UMTS operator, and the uplink and downlink of VoIP traffic are monitored. Figure 8.6 shows the experimental setup in the UMTS scenario, as well as the service degradation caused by Skype codec.


In the bottleneck LAN experiment, the dynamically changing conditions in UMTS are emulated by generating different packet loss rates. Skype is able to detect packet losses and treats them as an indication of congestion in the network. It then increases the throughput of the sender by sending packets with a larger payload. The resulting Skype throughput and PESQ scores are presented in Figure 8.7. In the real UMTS experiment, the packet interarrival times and PESQ scores are measured for uplink and downlink and are listed in Tables 8.4-8.7. As expected, packet losses degrade the PESQ value in the bottleneck LAN experiment. Moreover, because of network jitter and the use of a different codec, the PESQ values in the public UMTS environment are worse than those in the bottleneck LAN emulation. However, the quality of voice is still acceptable, indicating that the capacity offered by UMTS is sufficient to support mobile VoIP calls.

Skype User Satisfaction

In addition to the measurement of the perceptual quality of voice for Skype, Chen et al. [14] presented a model to quantify a new objective index of user satisfaction. The concept of user satisfaction is based on the assumption that if users are not satisfied with the service, they will terminate the conversation sooner, resulting in a shorter session. Hence a new metric, User Satisfaction Index (USI), is built upon call durations.

FIGURE 8.6 Skype conversation on UMTS.

FIGURE 8.7 Bottleneck LAN scenario.


Key Performance Measures for UMTS Uplink Scenario


Received Packets in the UMTS Uplink Scenario


Key Performance Measures for UMTS Downlink Scenario


Received Packets in the UMTS Downlink Scenario

Similar to the measurement of the perceptual quality of voice, in the new model, common network factors are analyzed against call durations. The statistical study on experimental traces indicates that the bitrate and jitter are closely related to the session time. As a result, the USI of a session is defined as follows:

$$USI = -\beta^{T} Z \qquad (8.9)$$

$$\phantom{USI} = 2.15 \cdot \log(\text{bitrate}) - 1.55 \cdot \log(\text{jitter}) - 0.36 \cdot RTT \qquad (8.10)$$
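Equation (8.10) is simple enough to evaluate directly. The sketch below assumes a natural logarithm and leaves the units of the three inputs to the caller, since neither is specified here; the function name `usi` is illustrative.

```python
import math


def usi(bitrate: float, jitter: float, rtt: float) -> float:
    """User Satisfaction Index following equation (8.10).

    Assumptions: log is the natural logarithm, and the caller supplies
    bitrate, jitter, and RTT in the units used when the model was fitted.
    """
    if bitrate <= 0 or jitter <= 0:
        raise ValueError("bitrate and jitter must be positive")
    return 2.15 * math.log(bitrate) - 1.55 * math.log(jitter) - 0.36 * rtt


# Hypothetical sessions: with the same bitrate and RTT, a session with
# higher jitter receives a lower index.
smooth = usi(bitrate=32.0, jitter=1.0, rtt=0.1)
jittery = usi(bitrate=32.0, jitter=4.0, rtt=0.1)
```

The signs of the coefficients carry the interpretation: a higher bitrate raises the index, while higher jitter and a longer round-trip time lower it.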

FIGURE 8.8 Voice interactivity metrics.

The new model includes only simple parameters, which are easy to measure. Furthermore, USI is able to reflect the voice interactivity and smoothness of the conversation. The authors proposed three metrics, as illustrated in Figure 8.8. The index of interactivity (responsiveness) denotes the degree of alternation between statement and response. The response time denotes the delay between two talk bursts. Together with the burst length, these metrics are related to the quality of conversation, and thus they can directly affect USI. A correlation test verifies the relationship between USI and the interactivity metrics. Overall, USI provides another effective method for quantifying VoIP service quality.
