[IP Telephony Cookbook] / Technological Background

2 Technological Background

This chapter provides technical background information about the protocols and components used in IP Telephony. It introduces the relevant component types and gives detailed information about H.323, SIP and RTP, as well as information about media gateway control and vendor-specific protocols.

2.1 Components

An IP Telephony infrastructure usually consists of different types of components. This section gives an overview of typical components without describing them in a protocol-specific context.

2.1.1 Terminal

A terminal is a communication endpoint that terminates calls and their media streams. Most commonly, this is either a hardware or a software telephone or videophone, possibly enhanced with data capabilities. There are terminals that are intended for user interaction and others that are automated, e.g., answering machines.

An IP Telephony terminal is located on at least one IP address. There may well be multiple terminals on the same IP address, but they are treated independently. Most of the time, a terminal has been assigned one or more addresses (see Section 2.1.5), which others will use to call it. If IP Telephony servers are used, a terminal registers the addresses with its server.

2.1.2 Server

Placing an IP Telephony call requires at least two terminals and knowledge of the IP address and port number of the terminal to call. Obviously, forcing the user to remember and use IP addresses for placing calls is not ideal, and dynamic IP addressing schemes (DHCP) make this requirement even more intolerable.

As mentioned before, terminals usually register their addresses with a server. The server stores these telephone addresses along with the IP addresses of the respective terminals, and is thus able to map a telephone address to a host.
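
The mapping a server maintains can be pictured as a simple lookup table. The following is a minimal, protocol-neutral sketch; the class name `Registrar`, the example address and the port number are illustrative assumptions, not part of any IP Telephony standard.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory registrar; names and structure are illustrative only.
Contact = Tuple[str, int]  # (IP address, port)


class Registrar:
    def __init__(self) -> None:
        self._bindings: Dict[str, Contact] = {}

    def register(self, address: str, ip: str, port: int) -> None:
        """Store (or refresh) the binding of a telephone address to a host."""
        self._bindings[address] = (ip, port)

    def resolve(self, address: str) -> Optional[Contact]:
        """Map a dialled telephone address to the host that registered it."""
        return self._bindings.get(address)


registrar = Registrar()
registrar.register("alice@example.org", "192.0.2.10", 5060)
# The terminal obtained a new DHCP lease and registers again.
registrar.register("alice@example.org", "192.0.2.55", 5060)
```

Because callers only ever dial the telephone address, a changed IP address becomes visible to them only through the result of `resolve`.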

When a telephone user dials an address, the server tries to resolve the given address into a network address. To do so, the server may interact with other telephony servers or services. It may also provide further call routing mechanisms like CPL (Call Processing Language) scripts or skill-based routing (e.g., routing calls to `WWW-Support' to a list of persons who are tagged as responsible for this subject).

Finally, a telephony server is responsible for authenticating registrations, authorising calling parties and performing the accounting.

2.1.3 Gateway

Gateways are telephony endpoints that facilitate calls between endpoints that usually would not interoperate. Usually this means that a gateway translates one signalling protocol into another (e.g., SIP/ISDN signalling gateways), but translating between different network addresses (IPv4/IPv6) or codecs (media gateways) can be considered gatewaying as well. Of course, it is possible that multiple functionalities exist in a single gateway.

Finding gateways between VoIP and a traditional PBX is usually quite simple. Gateways that translate between different VoIP protocols are harder to find. Most of them are limited to basic call functionality.

2.1.4 Conference bridge

Conference bridges provide the means to have 3-point or multi-point conferences that can either be ad-hoc or scheduled. Because of the high resource requirements, conference bridges are usually dedicated servers with special media hardware.

2.1.5 Addressing

A user who wants to use a communication service needs an identifier to describe themselves and the called party. Ideally, such an identifier should be independent of the user's physical location. The network should then be responsible for finding the current location of the called party. A specific user may choose to be reachable under multiple contact address identifiers.

Regular telephony systems use E.164 numbers (the international public telecommunication numbering plan). An identifier is composed of up to fifteen digits with a leading plus sign, for example, +1234565789123. When dialling, the leading plus is normally replaced by the international access code, usually double zero (00). This is followed by a country code and a subscriber number.
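
The replacement of the leading plus sign can be illustrated with a few lines of code. This is only a sketch: the function name `to_dial_string` is hypothetical, and the default access code 00 is the common case mentioned above, not a universal rule.

```python
import re


def to_dial_string(e164: str, access_code: str = "00") -> str:
    """Replace the leading plus of an E.164 number with an access code."""
    # A plus sign followed by at most fifteen digits, as described above.
    if not re.fullmatch(r"\+[0-9]{1,15}", e164):
        raise ValueError(f"not an E.164 number: {e164!r}")
    return access_code + e164[1:]


print(to_dial_string("+1234565789123"))  # prints 001234565789123
```

Passing a different `access_code` covers networks that do not use 00.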

The first IP Telephony systems used the IP addresses of end-point devices as user identifiers. Sometimes they are still used today. However, IP addresses are not location-independent (even if IPv6 is used) and they are hard to remember (especially if IPv6 is used), so they are not suitable as user identifiers.

Current IP Telephony systems use two kinds of identifiers:

- URIs (RFC 2396);
- Numbers (E.164).

Some systems tried to use names (alphanumeric strings), but this led to a flat naming space and thus limited zones of applicability.

A Uniform Resource Identifier (URI) uses a registered naming space to describe a resource in a location-independent way. Resources are available under a variety of naming schemes and access methods, including e-mail addresses (mailto), SIP identifiers (sip), H.323 identifiers (h323, RFC 3508) or telephone numbers (draft-ietf-iptel-rfc2806bis-02). E-mail-like identifiers have several advantages. They are easy to remember, nearly every Internet user already has an e-mail address, and a new service can be added using the same identifier. The user location can be found using the Domain Name System (DNS). The disadvantage of URIs is that they are difficult or impossible to dial on some user devices (phones).

If we want to integrate a regular telephony system with IP Telephony, we must deal with phone number identifiers even on the IP Telephony side. Numbers are not well suited to an Internet world that relies on domain names. Therefore, the ENUM system was invented, which uses adapted phone numbers as domain names. ENUM is described in Chapter 7.
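
As a brief preview of how a phone number is adapted into a domain name, the sketch below follows the convention of the ENUM specification (RFC 6116): the digits are reversed, separated by dots and placed under the `e164.arpa` domain. The function name `enum_domain` is hypothetical, and a real ENUM client would go on to query NAPTR records for the resulting name.

```python
import re


def enum_domain(e164: str, suffix: str = "e164.arpa") -> str:
    """Turn an E.164 number into the domain name used for an ENUM lookup."""
    if not re.fullmatch(r"\+[0-9]{1,15}", e164):
        raise ValueError(f"not an E.164 number: {e164!r}")
    # Drop the plus sign, reverse the digits and join them with dots.
    return ".".join(reversed(e164[1:])) + "." + suffix


print(enum_domain("+1234565789123"))
# prints 3.2.1.9.8.7.5.6.5.4.3.2.1.e164.arpa
```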

2.2 Protocols

2.2.1 H.323

The H.323 Series of Recommendations evolved out of the ITU-T's work on video telephony and multimedia conferencing. After completing standardisation on video telephony and videoconferencing for ISDN at up to 2 Mbit/s in the H.320 series, the ITU-T took on work on similar multimedia communication over ATM networks (H.310, H.321), over the analogue Public Switched Telephone Network (PSTN) using modem technology (H.324), and over the stillborn Isochronous Ethernet (H.322). The most widely adopted and hence most promising network infrastructure, and the one posing the greatest difficulties in achieving well-defined Quality of Service, was addressed at the beginning of 1995 in H.323: Local Area Networks, with the focus on IP as the network layer protocol. The primary goal was to interface multimedia communication equipment on LANs to the reasonably well-established base on circuit-switched networks.

The initial version of H.323 was approved by the ITU-T about one year later, in June 1996, thereby providing a base on which the industry could converge. The initial focus was clearly on local network environments, because QoS mechanisms for IP-based wide area networks, such as the Internet, were not well established at this point. In early 1996, Internet-wide deployment of H.323 was already explicitly included in the scope, as was the aim to support voice-only applications and, thus, the foundations to use H.323 for IP Telephony were laid.

H.323 has continuously evolved towards becoming a technically sound and functionally rich protocol platform for IP Telephony applications. The first major additions to this end were included in H.323 version 2, approved by the ITU-T in January 1998. In September 1999, H.323v3 was approved by the ITU-T, incorporating numerous further functional and conceptual extensions to enable H.323 to serve as a basis for IP Telephony on a global scale, as well as to meet requirements in enterprise environments. Moreover, many new enhancements were introduced into the H.323 protocol. Version 4 was approved on November 17, 2000 and contains enhancements in a number of important areas, including reliability, scalability, and flexibility. New features help facilitate more scalable Gateway and MCU solutions to meet the growing market requirements. H.323 has been the undisputed leader in voice, video, and data conferencing on packet networks, and Version 4 endeavours to keep H.323 ahead of the competition.

2.2.1.1 Scope

As stated before, the scope of H.323 encompasses multimedia communication in IP-based networks, with significant consideration given to gatewaying to circuit-switched networks (in particular to ISDN-based video telephony and to PSTN/ISDN/GSM for voice communication).

Figure 2.1 Scope and components defined in H.323 (diagram labels: H.323 Terminal, H.323 Gatekeeper, H.323 MCU and H.323 Gateway on the Internet/Intranet; ISDN (H.320), PSTN (H.324), ATM (H.310, H.321), SIP)

H.323 defines a number of functional/logical components as shown in Figure 2.1:

- Terminal
Terminals are H.323-capable endpoints, which may be implemented in software on workstations or as stand-alone devices (such as telephones). They are assigned one or more aliases (e.g., a user's name/URI) and/or telephone number(s);

- Gateway
Gateways interconnect H.323 entities (such as endpoints, MCUs, or other gateways) to other network/protocol environments (such as the telephone network). They are also assigned one or more aliases and/or telephone number(s). The H.323 Series of Recommendations provides detailed specifications for interfacing H.323 to H.320, ISDN/PSTN, and ATM-based networks. Recent work also addresses control and media gateway specifications for telephony trunking networks such as SS7/ISUP;

- Gatekeeper
The gatekeeper is the core management entity in an H.323 environment. It is, among other things, responsible for access control, address resolution and H.323 network (load) management, and provides the central hook to implement any kind of utilisation/access policies. An H.323 environment is subdivided into zones (which may, but need not, be congruent with the underlying network topology); each zone is controlled by one primary gatekeeper (with optional backup gatekeepers). Gatekeepers may also provide added value, e.g., act as a conferencing bridge or offer supplementary call services. An H.323 gatekeeper can also be equipped with a proxy feature. Such a feature routes the RTP traffic (audio and video) and the T.120 traffic (data) through the gatekeeper, so that no traffic is directly exchanged between endpoints. (It could be considered a kind of IP-to-IP gateway that can be used for security and QoS purposes);

- Multipoint Controller (MC)
A Multipoint Controller is a logical entity that interconnects the call signalling and conference control channels of two or more H.323 entities in a star topology. MCs coordinate the (control aspects of) media exchange between all entities involved in a conference. They also provide the endpoints with participant lists, exercise floor control, etc. MCs may be embedded in any H.323 entity (terminals, gateways, gatekeepers) or implemented as stand-alone entities. They can be cascaded to allow conferences spanning multiple MCs;

- Multipoint Processor (MP)
For multipoint conferences with H.323, an optional Multipoint Processor may be used that receives media streams from the individual endpoints, combines them through some mixing/switching technique, and transmits the resulting media streams back to the endpoints;

- Multipoint Control Unit (MCU)
In the H.323 world, an MCU is simply a combination of an MC and an MP in a single device. The term originates in the ISDN videoconferencing world, where MCUs were needed to create multipoint conferences out of a set of point-to-point connections.

2.2.1.2 Signalling protocols

H.323 resides on top of the basic Internet Protocols (IP, IP Multicast, TCP, and UDP) in a similar way to the IETF protocols discussed in the next subsection, and can make use of integrated and differentiated services along with resource reservation protocols.

Figure 2.2 H.323 protocol architecture (diagram labels: Audio, Video, Conference Control, Gatekeeper, Data Applications; RTP/RTCP, RAS, H.225.0, H.245, T.120, RSVP; Reliable MC, UDP, TCP + RFC 1006; IP / IP Multicast; Integrated / Differentiated Services Forwarding)

For basic call signalling and conference control interactions with H.323, the aforementioned components communicate using three control protocols:

- H.225.0 Registration, Admission, and Status (RAS)
The RAS channel is used for communication between H.323 endpoints and their gatekeeper and for some inter-gatekeeper communication. Endpoints use RAS to register with their gatekeeper, to request permission to utilise system resources, to have addresses of remote endpoints resolved, etc. Gatekeepers use RAS to keep track of the status of their associated endpoints and to collect information about actual resource utilisation after call termination. RAS provides mechanisms for user/endpoint authentication and call authorisation;

- H.225.0 Call Signalling
The call signalling channel is used to signal call setup intention, success, failures, etc., as well as to carry operations for supplementary services (see below). Call signalling messages are derived from Q.931 (ISDN call signalling); however, simplified procedures and only a subset of the messages are used in H.323. The call signalling channel is used end-to-end between calling party and called party and may optionally run through one or more gatekeepers (the call signalling models are described later in the `Signalling models' section).

Optimisations: Since version 3, H.225.0 supports the following enhancements:

- Multiple Calls - To avoid using a dedicated TCP connection for each call, gateways can be built to handle multiple calls on each connection.
- Maintain Connection - Similar to Multiple Calls, this enhancement reduces the need to open new TCP connections. After the last call has ended, the endpoint may decide to maintain the TCP connection to provide a better call setup time for the next call.

The primary use of both enhancements is in the communication between servers (gatekeeper, MCU) or gateways. While, in theory, both mechanisms were possible before, it is only since H.323v3 that the messages contain fields to indicate support for the mechanisms;

- H.245 Conference Control
The conference control channel is used to establish and control two-party calls (as well as multiparty conferences). Its functionality includes determining possible modes for media exchange (e.g., selecting media encoding formats that both parties understand) and configuring actual media streams (including exchanging transport addresses to send media streams to and receive them from). H.245 can be used to carry user input (such as DTMF), enables confidential media exchange, and defines syntax and semantics for multipoint conference operation (see below). Finally, it provides a number of maintenance messages. This logical channel may also (optionally) run through one or more gatekeepers, or directly between calling party and called party (please refer to the `Signalling models' section for details).

It should be noted that H.245 is a legacy protocol inherited from the collective work on multimedia conferencing over ATM, PSTN and other networks. Hence it carries a lot of fields and procedures that do not apply to H.323 but make the protocol specification quite heavyweight.

Optimisations: The conference control channel is also subject to optimisations. By default, it is transported over an exclusive TCP connection, but it may also be tunnelled within the signalling connection (H.245 tunnelling). Other optimisations deal with the call setup time. The last chance to start an H.245 channel is on receipt of the CONNECT message, which implies that no media is transmitted during the first seconds after the user has accepted the call. H.245 may also start in parallel with the setup of the H.225 call signalling, which is not really a new feature but another way of dealing with H.245. Vendors often call this Early Connect or Early Media. Since H.323v2, it is possible to start a call using a less powerful but sufficient capability exchange by simply offering possible media channels that just have to be accepted. This procedure, called FastConnect or FastStart, requires fewer round trips and is transported over the H.225 channel. After the FastConnect procedure is finished or when it fails, the normal H.245 procedures start.

A number of extensions to H.323 include mechanisms for more efficient call setup (H.323 Annex E) and reduction of protocol overhead, e.g., for simple telephones (Simple Endpoint Types (SETs), H.323 Annex F).

2.2.1.3 Gatekeeper discovery and registration

An H.323 endpoint usually registers with a gatekeeper that provides basic services like address resolution for calling other endpoints. There are two possibilities for an endpoint to find its gatekeeper:

- Multicast discovery
The endpoint sends a gatekeeper request (GRQ) to a well-known multicast address (224.0.1.41) and port (1718). Receiving gatekeepers may confirm their responsibility for the endpoint (GCF) or ignore the request;

- Configuration
The endpoint knows the IP address of the gatekeeper by manual configuration. While there is no need for a gatekeeper request (GRQ) to be sent to the preconfigured gatekeeper, some products need this protocol step. If a gatekeeper receives a GRQ via unicast, it must either confirm the request (GCF) or reject it (GRJ).

When trying to discover the gatekeeper via multicast, an endpoint may request any gatekeeper or narrow the request by adding a gatekeeper identifier to it. Only the gatekeeper that has the requested identifier may reply positively (see Figure 2.3).

Figure 2.3 Discovery and registration process (diagram: the endpoint h323:prelle sends GRQ:id1 to the gatekeepers id1 and id2; the replies shown are GCF and GRJ; the endpoint then sends RRQ:prelle and receives RCF)
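
The reply rules for a gatekeeper request can be summarised in a few lines. This is a simplified sketch rather than an implementation of H.225.0 RAS: the function `handle_grq` is hypothetical, it assumes a gatekeeper that always accepts requests it is responsible for, and it chooses to stay silent on multicast requests addressed to another gatekeeper (a gatekeeper may also answer with GRJ, as in Figure 2.3).

```python
from typing import Optional


def handle_grq(own_id: str, requested_id: Optional[str], via_multicast: bool) -> Optional[str]:
    """Return the reply to a GRQ ("GCF" or "GRJ"), or None if the request is ignored."""
    # A request without a gatekeeper identifier is addressed to any gatekeeper.
    responsible = requested_id is None or requested_id == own_id
    if responsible:
        return "GCF"
    # Not responsible: a unicast GRQ must be answered, a multicast GRQ may be ignored.
    return None if via_multicast else "GRJ"


for gatekeeper in ("id1", "id2"):
    print(gatekeeper, handle_grq(gatekeeper, "id1", via_multicast=True))
```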

After the endpoint discovers the location of the gatekeeper, it tries to register itself (RRQ). Such a registration includes (among other information):

- The addresses of the endpoint - for a terminal, these may be the user IDs or telephone numbers. An endpoint may have more than one address. In theory, it is possible for the addresses to belong to different users, enabling multiple users to share a single phone - in practice, this depends on the phones and the gatekeeper implementation;
- Prefixes - if the registering endpoint is a gateway, it may register number prefixes instead of addresses;
- Time to live - an endpoint may request how long the registration will last. This value can be overridden by gatekeeper policies.

The gatekeeper checks the requested registration information and confirms the (possibly modified) values (RCF). It may also reject a registration request because of, for example, invalid addresses. In the case of a confirmation, the gatekeeper assigns a unique identifier to the endpoint, which will be used in subsequent requests to indicate that the endpoint is still registered.
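
The registration handling described above can be sketched as follows. Everything in this example is an assumption made for illustration: the `Gatekeeper` and `Reply` classes, the 300-second policy limit, the `ep-N` identifier format and the trivial address check. Only the message names (RRQ, RCF and RRJ, the registration reject) come from H.225.0 RAS.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reply:
    kind: str                          # "RCF" (confirm) or "RRJ" (reject)
    endpoint_id: Optional[str] = None  # assigned by the gatekeeper on confirmation
    ttl: Optional[int] = None          # granted time to live in seconds


class Gatekeeper:
    MAX_TTL = 300  # hypothetical policy limit in seconds

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.registrations: Dict[str, List[str]] = {}

    def handle_rrq(self, addresses: List[str], ttl: int) -> Reply:
        # Reject obviously invalid requests (no addresses or blank addresses).
        if not addresses or any(not a.strip() for a in addresses):
            return Reply("RRJ")
        endpoint_id = f"ep-{next(self._ids)}"
        self.registrations[endpoint_id] = list(addresses)
        # The gatekeeper policy may shorten the requested time to live.
        return Reply("RCF", endpoint_id, min(ttl, self.MAX_TTL))


gk = Gatekeeper()
print(gk.handle_rrq(["prelle", "4711"], ttl=3600))
```

The endpoint would keep the returned `endpoint_id` and present it in later requests.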

2.2.1.3.1 Addresses and registrations

H.323 defines and utilises several address types. The one most commonly used, derived from the PSTN world, is the dialled digit address, which is defined as a number dialled by the endpoint. It does not include further information (e.g., about the dial plan) and needs to be interpreted by the server. The server might convert the dialled number into a party number that includes information about the type of number and the dial plan.

To provide alphanumeric or name dialling, H.323 supports H.323-IDs, which represent either usernames or e-mail-like addresses, and the more general URL-IDs, which represent any kind of URL.

Unlike a SIP address, an H.323 address can only be registered by one endpoint (per zone), so a call to that address resolves to a single endpoint only. Calling multiple destinations simultaneously in H.323 requires a gatekeeper that actively maps a single address to multiple different addresses and tries to contact them in sequence.

2.2.1.3.2 Updating registrations

A registration expires after a defined time and must therefore be refreshed, i.e., kept alive by subsequent registrations that include the previously assigned endpoint identifier. To reduce the overhead of regular registrations, H.323 supports KeepAlive registrations that contain only the previously assigned endpoint identifier. Of course, these registrations may only be sent if the registration information is unchanged.

An endpoint registering a large number of addresses could exceed the size of a UDP packet, so H.323v4 supports Additive Registration, a mechanism that allows an endpoint to send multiple registration requests (RRQ) in which the addresses do not replace existing registrations but are submitted in addition to them.
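
The idea behind Additive Registration can be illustrated by splitting a long address list over several requests. This is a sketch under simplifying assumptions: requests are modelled as plain dictionaries, the limit is expressed as a number of addresses per request instead of a byte size, and the names `split_into_rrqs` and `apply_rrq` are hypothetical.

```python
from typing import Dict, Iterator, List, Set


def split_into_rrqs(addresses: List[str], max_per_rrq: int) -> Iterator[Dict[str, object]]:
    """Yield RRQ-like dictionaries; every request after the first is marked additive."""
    if max_per_rrq < 1:
        raise ValueError("max_per_rrq must be at least 1")
    for start in range(0, len(addresses), max_per_rrq):
        yield {
            "addresses": addresses[start:start + max_per_rrq],
            "additive": start > 0,
        }


def apply_rrq(registered: Set[str], rrq: Dict[str, object]) -> Set[str]:
    """Gatekeeper side: an additive RRQ extends the registration, a normal one replaces it."""
    new = set(rrq["addresses"])
    return registered | new if rrq["additive"] else new


registered: Set[str] = set()
for rrq in split_into_rrqs(["1000", "1001", "1002", "1003", "1004"], max_per_rrq=2):
    registered = apply_rrq(registered, rrq)
print(sorted(registered))
```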

2.2.1.4 Signalling models

Call signalling messages and H.245 control messages may be exchanged either end-to-end between calling party and called party or through a gatekeeper. Depending on the role the gatekeeper plays in the call signalling and in the H.245 signalling, the H.323 specification foresees three different types of signalling models:

- Direct signalling
With this signalling model, only H.225.0 RAS messages are routed through the gatekeeper, while the other logical channel messages are directly exchanged between the two endpoints;

- Gatekeeper-routed call signalling
With this signalling model, H.225.0 RAS and H.225.0 call signalling messages are routed through the gatekeeper, while the H.245 Conference Control messages are directly exchanged between the two endpoints;

- Gatekeeper-routed H.245 control
With this signalling model, H.225.0 RAS, H.225.0 call signalling and H.245 Conference Control messages are routed through the gatekeeper, and only the media streams are directly exchanged between the two endpoints.
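
The three models differ only in which channels pass through the gatekeeper, which can be captured in a small lookup table. The enum, the channel names and the `route` function below are illustrative assumptions; the table itself restates the list above.

```python
from enum import Enum
from typing import Dict, FrozenSet


class SignallingModel(Enum):
    DIRECT = "direct signalling"
    GK_ROUTED_CALL_SIGNALLING = "gatekeeper-routed call signalling"
    GK_ROUTED_H245 = "gatekeeper-routed H.245 control"


CHANNELS = ("RAS", "call signalling", "H.245", "media")

# Channels that are routed through the gatekeeper in each model.
VIA_GATEKEEPER: Dict[SignallingModel, FrozenSet[str]] = {
    SignallingModel.DIRECT: frozenset({"RAS"}),
    SignallingModel.GK_ROUTED_CALL_SIGNALLING: frozenset({"RAS", "call signalling"}),
    SignallingModel.GK_ROUTED_H245: frozenset({"RAS", "call signalling", "H.245"}),
}


def route(model: SignallingModel, channel: str) -> str:
    """Return "gatekeeper" or "end-to-end" for a channel under a signalling model."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel!r}")
    return "gatekeeper" if channel in VIA_GATEKEEPER[model] else "end-to-end"


for model in SignallingModel:
    print(model.value, {channel: route(model, channel) for channel in CHANNELS})
```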

The following sub-sections detail each signalling model. The figures in this section apply both to the use of a single gatekeeper and to the use of a gatekeeper network. Since the signalling model is decided by the configuration of the endpoint's gatekeeper and applies to all the messages the gatekeeper handles, the extension to multiple gatekeepers is straightforward (each gatekeeper involved simply applies the definition of the signalling model given in the list above), except for the location of zone-external targets (described later in the `Locating zone external targets' section).
Message exchanges between gatekeepers are not shown in any of the figures in this section; they remain within the ellipse that depicts the H.323 gatekeeper and are described in the `Locating zone external targets' section. Please note that the sub-sections on the individual signalling models do not cover call termination. Please refer to the `Communication phases' section for details.

2.2.1.4.1 Direct signalling model

The direct signalling model is depicted in Figure 2.4. In this model, the H.225.0 Call Signalling and H.245 Conference Control messages are exchanged directly between the call terminals. As shown in the figure, the communication starts with an ARQ (Admission ReQuest) message sent by the calling party (which may be either a terminal or a gateway) to the gatekeeper. The ARQ message is used by the endpoint to request access to the packet-based network from the gatekeeper, which either grants the request with an ACF (Admission ConFirm) or denies it with an ARJ (Admission ReJect). If an ARJ is issued, the call is terminated. After this first step, the call signalling part of the call begins with the transmission of the SET UP message from the calling party to the called party. The transport address for the SET UP message (and for all the H.225.0 call signalling messages) is retrieved by the calling party from the destCallSignalAddress field carried inside the ACF received. In the case of the direct signalling model, it is the address of the destination endpoint. Upon receiving the SET UP message, the called party starts its H.225.0 RAS procedure with the gatekeeper. If successful, a CONNECT message is sent back to the calling party to indicate acceptance of the call. Before sending the CONNECT message, two other messages may be sent from the called party to the calling party (those two messages are not depicted in the figure since only mandatory messages are shown):

- ALERTING message
This message may be sent by the called user to indicate that called user alerting has been initiated (in everyday terms, the `phone is ringing');

- CALL PROCEEDING message
This message may be sent by the called user to indicate that the requested call establishment has been initiated and no more call-establishment information will be accepted.

Figure 2.4 Direct signalling model

The CONNECT message closes the H.225.0 call signalling part of the call and prompts the terminals to start the H.245 conference control part. In this call model, the H.245 Conference Control messages are exchanged directly between the two endpoints (the correct `h245Address' is retrieved from the CONNECT message itself). The procedures started with the H.245 Conference Control channel are used to:

- allow the exchange of audiovisual and data capabilities, with the TERMINAL CAPABILITY messages;
- request the transmission of a particular audiovisual and data mode, with the LOGICAL CHANNEL SIGNALLING messages;
- manage the logical channels used to transport the audiovisual and data information;
- establish which terminal is the master terminal and which is the slave terminal for the purposes of managing logical channels, with the MASTER SLAVE DETERMINATION messages;
- carry various control and indication signals;
- control the bit rate of individual logical channels and the whole multiplex, with the MULTIPLEX TABLE SIGNALLING messages;
- measure the round trip delay, from one terminal to the other and back, with the ROUND TRIP DELAY messages.

Once the H.245 conference control messages are exchanged, the two endpoints have all the necessary information to open the media streams.
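
The mandatory message sequence of the direct signalling model can be written down as a list of (sender, receiver, message) steps. This is a simplified sketch: the function `direct_call_flow` and the party names are hypothetical, the optional ALERTING and CALL PROCEEDING messages are left out, the H.245 exchange is collapsed into a single step, and the behaviour after a rejected admission of the called party is not modelled.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (sender, receiver, message)


def direct_call_flow(caller_admitted: bool = True, callee_admitted: bool = True) -> List[Step]:
    """List the mandatory steps of a call in the direct signalling model."""
    flow: List[Step] = [("caller", "gatekeeper", "ARQ")]
    if not caller_admitted:
        # An ARJ terminates the call before any call signalling takes place.
        flow.append(("gatekeeper", "caller", "ARJ"))
        return flow
    flow += [
        ("gatekeeper", "caller", "ACF"),    # carries destCallSignalAddress
        ("caller", "callee", "SET UP"),     # sent directly to the called party
        ("callee", "gatekeeper", "ARQ"),    # RAS procedure of the called party
    ]
    if not callee_admitted:
        flow.append(("gatekeeper", "callee", "ARJ"))
        return flow
    flow += [
        ("gatekeeper", "callee", "ACF"),
        ("callee", "caller", "CONNECT"),    # carries the h245Address
        ("caller", "callee", "H.245 conference control"),
        ("caller", "callee", "media streams"),
    ]
    return flow


for sender, receiver, message in direct_call_flow():
    print(f"{sender:>10} -> {receiver:<10} {message}")
```

In this model the gatekeeper only ever sees RAS messages (ARQ, ACF, ARJ); all other steps run directly between the two parties.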

2.2.1.4.2 Gatekeeper-routed call signalling model

The gatekeeper-routed call signalling model is depicted in Figure 2.5. In this model, the H.245 Conference Control messages are exchanged directly between the call endpoints. For each call, the communication starts with an ARQ (Admission ReQuest) message sent by the calling party to its gatekeeper. The ARQ message is used by the endpoint to request access to the packet-based network from the gatekeeper, which either grants the request with an ACF (Admission ConFirm) or denies it with an ARJ (Admission ReJect). After this first step, the call signalling part of the call begins with the transmission of the SET UP message from the calling party to its gatekeeper. The transport address for the SET UP message (and for all the H.225.0 call signalling messages) is retrieved by the calling party from the destCallSignalAddress field carried inside the ACF received. In the case of the gatekeeper-routed call signalling model, it is the address of the gatekeeper itself. The SET UP message is then forwarded by the gatekeeper (or by the gatekeeper network) to the called endpoint. Upon receiving the SET UP message, the called party starts its H.225.0 RAS procedure with its gatekeeper. If successful, a CONNECT message is sent to indicate acceptance of the call. Because of the call model, this message is sent to the called endpoint's gatekeeper, which is in charge of forwarding it to the calling party endpoint (either directly or using the gatekeeper network). Before sending the CONNECT message, two other messages may be sent from the called party to its gatekeeper (those two messages are not depicted in the figure since only mandatory messages are shown):

- ALERTING message
This message may be sent by the called user to indicate that called user alerting has been initiated (in everyday terms, the `phone is ringing');

- CALL PROCEEDING message
This message may be sent by the called user to indicate that the requested call establishment has been initiated and no more call-establishment information will be accepted.

Figure 2.5 Gatekeeper-routed call signalling model

The two optional messages listed above are then forwarded by the gatekeeper (or by the gatekeeper network) to the calling party. After receiving the CONNECT message, the calling party starts the H.245 Conference Control channel procedures directly with the called party (the correct h245Address is retrieved from the CONNECT message itself). The scope of the H.245 Conference Control channel procedures is the same as detailed above. Please refer to the `Direct signalling model' section for details.

2.2.1.4.3 Gatekeeper-routed H.245 control model

The gatekeeper-routed H.245 control model is depicted in Figure 2.6. In this model, only the media streams are exchanged directly between the call endpoints. For each call, the communication starts with an ARQ (Admission ReQuest) message sent by the calling party to its gatekeeper. The ARQ message is used by the endpoint to request access to the packet-based network from the gatekeeper, which either grants the request with an ACF (Admission ConFirm) or denies it with an ARJ (Admission ReJect). After this first step, the call signalling part of the call begins with the transmission of the SET UP message from the calling party to its gatekeeper. The transport address for the SET UP message (and for all the H.225.0 call signalling messages) is retrieved by the calling party from the destCallSignalAddress field carried inside the ACF received. In the case of the gatekeeper-routed H.245 control model, it is the address of the gatekeeper itself. The SET UP message is then forwarded by the gatekeeper (or by the gatekeeper network) to the called endpoint. Upon receiving the SET UP message, the called party starts its H.225.0 RAS procedure with its gatekeeper. If successful, a CONNECT message is sent to indicate acceptance of the call. Because of the call model, this message is sent to the called endpoint's gatekeeper, which is in charge of forwarding it to the calling party endpoint (either directly or using the gatekeeper network). Before sending the CONNECT message, two other messages may be sent from the called party to its gatekeeper (those two messages are not depicted in the figure since only mandatory messages are shown):

- ALERTING message
This message may be sent by the called user to indicate that called user alerting has been initiated (in everyday terms, the `phone is ringing');

- CALL PROCEEDING message
This message may be sent by the called user to indicate that the requested call establishment has been initiated and no more call-establishment information will be accepted.

Figure 2.6 Gatekeeper-routed H.245 control model

The two optional messages listed above are then forwarded by the gatekeeper (or by the gatekeeper network) to the calling party. After receiving the CONNECT message, the calling party starts the H.245 Conference Control channel procedures with its gatekeeper (the correct h245Address is retrieved from the CONNECT message itself). All of the H.245 channel messages are then exchanged by the endpoints with their gatekeeper (or gatekeepers). It is the gatekeeper (or gatekeeper network) that takes care of forwarding them to the remote endpoint, as foreseen by the gatekeeper-routed H.245 control model. The scope of the H.245 Conference Control channel procedures is the same as detailed above. Please refer to the `Direct signalling model' section for details.
{
2.2.1.5
Communication phases
In H.323, a communication can be divided into five phases:
- Call setup;
- Initial communication and capability exchange;
- Establishment of audiovisual communication;
- Call services;
- Call termination.
2.2.1.5.1
Call setup
Recommendation H.225.0 defines the call setup messages and procedures
detailed here. The recommendation foresees that requests for bandwidth
reservation should take place at the earliest possible phase. Unlike other
protocols, there is no explicit synchronisation between two endpoints during
the call setup procedure (two endpoints can send a SET UP message to each
other at exactly the same time). Any synchronisation problems that arise
during the exchange of SET UP messages are resolved by the application
itself. Applications not supporting multiple simultaneous calls should issue
a busy signal when they have an outstanding SET UP message, while
applications supporting multiple simultaneous calls issue a busy signal only
to the same endpoint to which they sent an outstanding SET UP message.
Moreover, an endpoint should be capable of sending ALERTING messages.
ALERTING means that the called party has been alerted of an incoming call
(`phone ringing', in the language of traditional telephony).
Only the ultimate called
endpoint originates the ALERTING
message and only
when
the
application has already alerted
the user. If a gateway is involved,
the gateway sends
ALERTING
when it receives a ring
indication from the Switched
Circuit Network
(SCN).
The
sending of an ALERTING message is
not required if an endpoint can
respond to a SET UP
message
with a CONNECT, CALL
PROCEEDING, or RELEASE COMPLETE
within four
seconds.
After successfully sending a SET UP
message, an endpoint can expect to
receive either an
ALERTING,
CONNECT, CALL PROCEEDING, or
RELEASE COMPLETE message
within
4
seconds after successful
transmission. Finally, to maintain the
consistency of the meaning of
the
CONNECT
message between packet-based networks
and circuit-switched networks,
the
CONNECT
message should be sent only
if it is certain that the
capability exchange
will
successfully
take place and a minimum
level of communications can be
performed.
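The four-second rule described above can be illustrated with a small sketch.
The function names and the representation of responses as (elapsed seconds,
message name) pairs are assumptions made for this example only; they are not
part of H.225.0 or of any particular H.323 stack.

```python
# Illustrative sketch of the four-second rule for answering a SET UP message.
# All names here are hypothetical and exist only for this example.

SETUP_RESPONSE_TIMEOUT = 4.0  # seconds, as stated in the text

# Messages the calling endpoint can expect in reply to SET UP.
EXPECTED_RESPONSES = {
    "ALERTING",
    "CONNECT",
    "CALL PROCEEDING",
    "RELEASE COMPLETE",
}


def alerting_required(seconds_until_other_response):
    """Called side: ALERTING may be skipped only if CONNECT, CALL PROCEEDING
    or RELEASE COMPLETE can be sent within the timeout."""
    return seconds_until_other_response > SETUP_RESPONSE_TIMEOUT


def setup_answered_in_time(responses):
    """Calling side: check whether any expected message arrived in time.

    `responses` is a list of (elapsed_seconds, message_name) pairs.
    """
    return any(
        name in EXPECTED_RESPONSES and elapsed <= SETUP_RESPONSE_TIMEOUT
        for elapsed, name in responses
    )
```

A called endpoint that needs six seconds before it can send CONNECT would
therefore send ALERTING (or CALL PROCEEDING) first.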
The
call setup phase may
have different realisations:
-
basic
call setup when neither
endpoint is
registered
In
this call setup the
two endpoints communicate
directly;
-
both
endpoints registered to the
same gatekeeper
In
this call set up the
communication is decided by the
signalling model configured on
the
gatekeeper;
-
only
calling endpoint has
gatekeeper
In
this call setup only
the calling party sends
messages to the gatekeeper
depending on the
signalling
models configured while the
called party sends the
messages directly to the calling
party
endpoint;
-
only
called endpoint has
gatekeeper
In
this call setup only
the called party sends
messages to the gatekeeper
depending on the
signalling
models configured while the
calling party sends the
messages directly to the
called
endpoint;
-
both
endpoints registered to different
gatekeepers
Each of the two endpoints communicates with its gatekeeper depending on the
signalling model configured. Additional H.225.0 RAS messages may be exchanged
between the gatekeepers in order to retrieve location information (see the
`Locating zone-external targets' Section for more details);
-
call
setup with Fast Connect
procedure
In this call setup, the media channels are established using the Fast
Connect procedure. The Fast Connect procedure speeds up the establishment of
a basic point-to-point call (only one round-trip message exchange is needed),
enabling immediate media stream delivery upon call connection. The Fast
Connect procedure is started if the calling endpoint initiates it by sending
a SETUP message containing the FastStart element (to signal that it is going
to use the Fast Connect procedure). This element contains, among other
things, a sequence of all of the parameters necessary to immediately open and
begin transferring media on the channels. The Fast Connect procedure may be
refused by the called endpoint (either because it wants to use features
requiring H.245 or because it does not implement Fast Connect). The Fast
Connect procedure may be refused with any H.225.0 call signalling message, up
to and including CONNECT. Refusing the Fast Connect procedure (or not
initiating it) requires that H.245 procedures be used for the exchange of
capabilities and the opening of media channels. Moreover, the Fast Connect
procedure provides more information for the purposes of H.323/SIP gatewaying
(further details to be found in Chapter 4);
-
call
setup via
gateways
When
a gateway is involved, the call
setup between it and the
network endpoint is the same
as
the
endpoint-to-endpoint call
setup;
-
call
setup with an MCU
When
an MCU is involved, all endpoints
exchange call signalling
with the MCU (and with
the
interested
gatekeepers, if any). No changes
are foreseen between an endpoint
and the MCU call
setup
since it proceeds the same
as the endpoint-to-endpoint;
-
broadcast
call setup
This
kind of call setup follows
the procedures defined in
Recommendation H.332.
2.2.1.5.2
Initial communication and capability
exchange
After exchanging call setup messages, the endpoints, if they plan to use
H.245, establish the H.245 Control Channel. The H.245 Control Channel is used
for the capability exchange and to open the media channels. The H.245 Control
Channel procedures are not started if CONNECT does not arrive (although an
H.245 Control Channel can also be opened on reception of ALERTING or CALL
PROCEEDING messages) or when an endpoint sends RELEASE COMPLETE.
H.323 endpoints support the capabilities exchange procedure of H.245. The
H.245 TERMINALCAPABILITYSET message is used for the exchange of endpoint
system capabilities. This message is the first H.245 message sent.
The master-slave determination procedure of H.245 must be supported by
H.323-compliant endpoints. In multipoint conferencing, where multipoint
controller (MC) capability is present in more than one endpoint, the
master-slave determination is used for determining which MC will play an
active role. The H.245 Control Channel procedure also provides master-slave
determination for opening bi-directional channels for data.
After Terminal Capability Exchange has been initiated, a master-slave
determination procedure (consisting of either MASTERSLAVEDETERMINATION or
MASTERSLAVEDETERMINATIONACK) has to be started as the first H.245 Conference
Control procedure. Upon failure of the initial capability exchange or
master-slave determination procedures, a maximum of two retries is performed
before the endpoint passes to the Call Termination phase. Normally, after
successful completion of the requirements of this phase, the endpoints
proceed directly to the establishment of audiovisual communication phase.
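The retry rule above (one initial attempt plus at most two retries, then Call
Termination) can be sketched as follows. The callable `attempt_procedure` and
the phase names returned are hypothetical placeholders for whatever a real
H.245 implementation would use.

```python
# Sketch of the retry rule for the initial H.245 procedures.
# `attempt_procedure` is a hypothetical callable that runs the capability
# exchange / master-slave determination once and returns True on success.

MAX_RETRIES = 2  # "a maximum of two retries", as stated in the text


def run_initial_h245_phase(attempt_procedure, max_retries=MAX_RETRIES):
    """Return (next_phase, attempts_made)."""
    attempts = 0
    # One initial attempt plus up to `max_retries` retries.
    for _ in range(1 + max_retries):
        attempts += 1
        if attempt_procedure():
            return "ESTABLISH_AUDIOVISUAL_COMMUNICATION", attempts
    # All attempts failed: the endpoint passes to call termination.
    return "CALL_TERMINATION", attempts
```

With this reading, an endpoint gives up after three failed attempts in total.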
2.2.1.5.2.1
Encapsulation of H.245 messages
within H.225.0 call
signalling messages
Encapsulating H.245 messages inside H.225.0 call signalling messages, instead
of establishing a separate H.245 channel, is possible in order to save
resources, synchronise call signalling and control, and reduce call setup
time. This process is called `encapsulation' or `tunnelling' of H.245
messages. This procedure allows the terminal to copy the encoded H.245
message, using one structure, inside the data of the Call Signalling Channel.
If tunnelling is used, any H.225.0 call signalling message may contain one or
more H.245 messages. If there is no need to send an H.225.0 call signalling
message when an H.245 message has to be transmitted, a FACILITY message is
sent detailing (with appropriate fields inside) the reason for such a
message.
2.2.1.5.3.
Establishment of audiovisual
communication
The establishment of audiovisual communication follows the procedures of
Recommendation H.245. Logical channels for the various information streams
are opened using the H.245 procedures. The audio and video streams are
transported using an unreliable protocol, while data communications are
transported using a reliable protocol. The transport address that the
receiving endpoint has assigned to a specific logical channel (audio, video
or data) is carried in the OPENLOGICALCHANNELACK message (an example is given
in Figure 2.7). That transport address is used to transmit the information
stream associated with that logical channel.
Figure
2.7 OPENLOGICALCHANNELACK message
content
2.2.1.5.4.
Call services
When the call is active, the terminal may request additional call services.
Among the services reported here are the Bandwidth Change Services and
Supplementary Services.

With Bandwidth Change Services, the endpoints or the gatekeeper (if involved)
may, at any time during a conference, request an increase or decrease in the
call bandwidth. If the aggregate bit rate of all transmitted and received
channels does not exceed the current call bandwidth, then an endpoint may
change the bit rate of a logical channel without requesting a bandwidth
change. After requesting a bandwidth change, the endpoint waits for
confirmation prior to actually changing the bit rate (confirmation usually
comes from the gatekeeper). Call bandwidth changes are requested using a
BANDWIDTH CHANGE REQUEST (BRQ) message. If the request is not accepted, a
BANDWIDTH CHANGE REJECT (BRJ) message is returned to the endpoint. If the
request is accepted, a BANDWIDTH CHANGE CONFIRM (BCF) is sent back to the
endpoint.

With Supplementary Services, support is optional. The H.450 Series of
Recommendations describes a method of providing Supplementary Services in the
H.323 environment.
Figure 2.8 reports some of
the supplementary services
defined so far and
their
number
in the series.
Recommendation number   Recommendation title
H.450.1                 Supplementary Service Framework
H.450.2                 Call Transfer Supplementary Service
H.450.3                 Call Diversion Supplementary Service
H.450.4                 Call Hold Supplementary Service
H.450.5                 Call Park and Pickup Supplementary Service
H.450.6                 Call Waiting Supplementary Service
H.450.7                 Message Waiting Supplementary Service
H.450.8                 Name Identification Supplementary Service
H.450.9                 Call Completion Supplementary Service
H.450.10                Call Offer Supplementary Service
H.450.11                Call Intrusion Supplementary Service
Figure
2.8 Supplementary services of
the H.450-Series
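The bandwidth change rules described earlier in this section (no request is
needed while the aggregate bit rate stays within the call bandwidth,
otherwise BRQ is answered by BCF or BRJ) can be sketched as follows. The
function, its parameters and the `gatekeeper_accepts` callback are
assumptions made for illustration and do not correspond to a real H.225.0
API.

```python
# Sketch of the bandwidth change decision. Bit rates are in bit/s.
# `gatekeeper_accepts` is a hypothetical callback standing in for the
# BRQ -> BCF/BRJ exchange with the gatekeeper.


def change_bit_rate(channel_rates, channel, new_rate, call_bandwidth,
                    gatekeeper_accepts):
    """Return (resulting_rates, ras_answer).

    ras_answer is None if no BRQ was needed, otherwise "BCF" or "BRJ".
    """
    proposed = dict(channel_rates)
    proposed[channel] = new_rate
    aggregate = sum(proposed.values())

    # Within the current call bandwidth: no bandwidth change request needed.
    if aggregate <= call_bandwidth:
        return proposed, None

    # Otherwise ask first and change the bit rate only after confirmation.
    if gatekeeper_accepts(aggregate):
        return proposed, "BCF"
    return dict(channel_rates), "BRJ"
```

For example, raising a video channel from 320 kbit/s to 384 kbit/s inside a
512 kbit/s call needs no request, whereas raising it to 768 kbit/s does.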
2.2.1.5.5.
Call termination
A call may be terminated by either endpoint or by the gatekeeper. Call
termination is defined using the following procedure:
-
video should be terminated after a
complete picture and then
all logical channels for
video
closed;
-
data transmission should be terminated
and then all logical
channels for data
closed;
-
audio transmission should be terminated
and then all logical
channels for audio
closed;
-
the H.245 ENDSESSIONCOMMAND
message (H.245 Control
Channel) should be sent
by
the
endpoint/gatekeeper. This message indicates
that the call has to be
disconnected; then
the
H.245
message transmission should be
terminated;
-
the ENDSESSIONCOMMAND message
should be sent back to the
sending endpoint and
then
the H.245 Control Channel
should be closed;
-
a RELEASE COMPLETE message
should be sent closing the
Call Signalling Channel if
this is
still
open.
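The ordering of the termination steps listed above can be sketched as a
function that returns the actions an endpoint would take. The step
descriptions are plain strings chosen for this example; a real stack would
send the corresponding H.245 and H.225.0 messages instead.

```python
# Sketch of the call termination order: video, data, audio, then
# ENDSESSIONCOMMAND, then RELEASE COMPLETE. Names are illustrative only.


def termination_steps(open_media, signalling_channel_open=True):
    """Return the ordered list of termination actions.

    `open_media` is a collection containing any of "video", "data", "audio".
    """
    steps = []
    # Media are shut down in the order given in the text.
    for medium in ("video", "data", "audio"):
        if medium in open_media:
            steps.append(f"stop {medium} and close its logical channels")
    steps.append("send H.245 ENDSESSIONCOMMAND")
    steps.append("receive ENDSESSIONCOMMAND, then close H.245 Control Channel")
    # RELEASE COMPLETE is only needed if the signalling channel is still open.
    if signalling_channel_open:
        steps.append("send RELEASE COMPLETE")
    return steps
```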
An endpoint receiving an ENDSESSIONCOMMAND message does not need to receive
it back again after replying to it in order to clear a call. Terminating a
call within a conference does not mean that the whole conference needs to be
terminated. In order to terminate a conference, an H.245 message
(DROPCONFERENCE) is used. Then the MC should terminate the calls with the
endpoints as described above.
A
call may be terminated differently
depending on the gatekeeper
presence and on the
party
issuing
the call termination:
-
call
clearing without a
gatekeeper
No
further action is required;
-
call
clearing with a
gatekeeper
The
gatekeeper needs to be informed
about the call termination.
After RELEASE
COMPLETE
is sent, an H.225.0 DISENGAGE
REQUEST (DRQ) message should
be sent by
each
endpoint to its gatekeeper. A Disengage
Confirm (DCF)
message is sent back to
the
endpoints
to acknowledge the
reception;
-
call
clearing issued by the
gatekeeper
A call may be terminated by the gatekeeper by sending a DRQ to an endpoint.
The procedure described above for call termination should be followed
immediately by the endpoint up to the RELEASE COMPLETE message. Then a reply
to the gatekeeper should be sent using a DCF message. The other endpoint
should follow the same call termination procedures upon receiving the
ENDSESSIONCOMMAND message. Moreover, if a multipoint conference is taking
place, in order to close the entire conference, the gatekeeper should send a
DRQ to each endpoint in the conference.
{
2.2.1.6
Locating zone-external
targets
When
calling an address that is registered at
the same gatekeeper as the
calling party, the
gatekeeper
just needs to look up its
internal tables to resolve the
target address. Complexity
enters
the
picture if the destination address is
registered with another
gatekeeper. While Chapter 7
will
cover
this topic in more detail,
the most basic mechanism
that H.323 provides is
explained here.
A
gatekeeper may explicitly
request the resolution of an
address from other
gatekeepers. On receipt
of
a request to call an address for
which the gatekeeper has no
registration, it can send
out a
location
request (LRQ) to other
gatekeepers (see Figure
2.9). The receiving gatekeeper,
assuming it
knows
the address, will reply
with the Transport
Service Access Point (a combination of
IP
address
and port number) of either
the requested address or its
own call signalling
TSAP.
[Figure 2.9 (message sequence diagram): endpoints x@tzi.org and
ubik@cesnet.cz register with the gatekeepers of tzi.org and cesnet.cz
(RRQ, RCF); a call to x@tzi.org then proceeds as ARQ: x@tzi.org,
LRQ: x@tzi.org, LCF + IP, ACF + IP, Setup: x@tzi.org.]
Figure
2.9 External address resolution
using LRQs
A
location request can be sent
via unicast or multicast. If sent
via multicast, only the
gatekeeper
that
can resolve the address
replies. If a gatekeeper receives a
unicast LRQ, it either confirms
or
rejects
the request.
This
mechanism can have a list of
peer gatekeepers to ask, in
parallel or sequentially. It is
also
possible
to assign a domain suffix or number
prefix to each peer so that
an address with a
matching
number prefix of a neighbouring institution
will result in a request to
the gatekeeper of
that
institution. By defining default peers,
one could also build a
hierarchy of gatekeepers
(see
Chapter
7 for further details).
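The peer selection described above (domain suffix or number prefix per peer,
with default peers as a fallback) can be sketched as follows. The tuple
layout of the peer table and the gatekeeper names are hypothetical and chosen
only for this example.

```python
# Sketch of choosing which peer gatekeepers receive an LRQ.
# Each peer entry is (kind, pattern, gatekeeper), where kind is "suffix"
# (domain suffix) or "prefix" (number prefix). All names are illustrative.


def select_peers(address, peers, default_peers=()):
    """Return the gatekeepers that should be asked to resolve `address`."""
    matches = [
        gatekeeper
        for kind, pattern, gatekeeper in peers
        if (kind == "suffix" and address.endswith(pattern))
        or (kind == "prefix" and address.startswith(pattern))
    ]
    # Fall back to the default peers (e.g. a parent in a hierarchy).
    return matches or list(default_peers)


PEERS = [
    ("suffix", "@tzi.org", "gk.tzi.example"),
    ("prefix", "0049421", "gk.bremen.example"),
]
```

Whether the selected peers are asked in parallel or sequentially is a
separate policy decision that this sketch leaves open.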
{
2.2.1.7
A sample call
scenario
Figure 2.10 depicts an example of an inter-zone call setup using H.323, with
one gatekeeper (A) using direct signalling while the other uses routed
signalling. The calling party in zone A contacts its gatekeeper to ask for
permission to call the called party in zone B (1). The gatekeeper of zone A
confirms this request and provides the calling party with the address of zone
B's gatekeeper (2). The calling party establishes a call signalling channel
(and subsequently, or in parallel, the conference control channel) to the
gatekeeper of zone B (3), which determines the location of the called party
and forwards the request to the called party (4).
[Figure 2.10 (diagram): the caller and gatekeeper in zone A and the callee
and gatekeeper in zone B, with numbered message flows (1) to (9). Legend:
H.225.0 RAS; H.225.0 Call Signaling + H.245; Media Streams.]
Figure
2.10 A sample H.323 call
setup scenario
The
called party explicitly confirms
with its gatekeeper that it
is allowed to accept the
call (5, 6)
and,
if so, alerts the recipient of
the call, returns an
alerting indication and
(once the receiving
user
picks
up the call) eventually an
indication of successful connection setup
back to the calling
party
(7,
8). During, or in parallel to, this exchange, capability negotiation and
media stream configuration take place. When the setup has completed, both
parties start sending media streams directly to each other.
{
2.2.1.8
Additional (call)
services
It
is well known from our daily interaction
with PBXs that telephony
service comprises far
more
than
just call setup and
teardown: n-way conferencing
and various supplementary
services (such as
call
transfer, call waiting,
etc.) are available. Similar
features, at least the more
commonly known
and
used ones, need to be
provided by IP Telephony systems as
well in order to be accepted
by
customers.
Additional call services in
H.323 can be grouped into
three categories:
-
Conferencing
H.323
inherently supports multipoint
tightly-coupled conferencing, i.e.,
conferences with
access
control, optional support for
conference chairs, and close
synchronisation of conference
state
among all participants from
the outset, through the
concept of a Multipoint
Controller
and
an optional Multipoint Processor. While
control is centralised in the
MC, in theory, data
exchange
may be either via IP multicast,
multi-unicast (i.e., peer-wise fan-out
between
endpoints
without MP), or through an MP.
(There seems to be practically no
H.323 equipment
supporting
media multicast.) The distribution mode
may be selected per media
and per
endpoint
peer and is controlled by the
MC;
-
Broadcast
conferencing
H.323
also provides an interface to
support large loosely-coupled
conferences as are
frequently
used
in the Mbone to multicast seminars,
events, etc. In this case,
the MC defines a
session
description
(using the Session
Description Protocol, SDP, see below)
for the H.323
media
sessions
(which have to operate using
multicast) and announces
this description by some
means
(e.g.,
the Session Announcement Protocol, SAP).
Details are defined in ITU-T
H.332.
-
Supplementary
services
H.323 provides a variety of supplementary services, with additional ones
continuously being defined. While some services can be accomplished using the
basic H.323 specifications, the H.450.x Recommendations define a framework
(derived from QSIG, the ECMA/ISO/ETSI standard for supplementary service
signalling in PBXs) and a number of services (call transfer, call diversion,
call hold, call park & pickup, call waiting, message waiting indication and
call completion).
Further
extensions to supplementary services
and other functional enhancements
are on the way.
In
particular, an HTTP-based extension framework is being
defined at the time of
writing to
enable
rapid introduction of new
services without the need
for standardisation.
{
2.2.1.9
H.235 Security
The
H.235 recommendation defines elements of
security for H.323:
-
Authentication
Authentication can be achieved by using a shared secret (password) or
digital signatures. The RAS messages include a token that was generated using
either the shared secret or the signature. A receiving entity authenticates
the sender by comparing the received token with a self-generated token;
-
Message Integrity
Integrity is achieved by generating password-based checks on the message;
-
Privacy
Mechanisms are provided to set up encryption on the media streams. They must
be used in conjunction with the H.245 protocol and employ DES, Triple DES or
RC2. The use of SRTP is not supported yet (in H.235v2).
These mechanisms are grouped into Security Profiles: the Baseline Security
Profile provides authentication and message integrity, making it suitable for
subscription-based environments, while the Voice Encryption Profile provides
confidential end-to-end media channels.
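The shared-secret authentication idea described above (the receiver
regenerates the token and compares it with the received one) can be
illustrated with a generic keyed hash. This is only a sketch of the
principle using Python's standard `hmac` module; it does not reproduce the
actual H.235 token format, field layout or algorithm choices, and the
function names are hypothetical.

```python
# Generic illustration of token-based authentication with a shared secret.
# NOT the H.235 wire format: the hash choice and message layout are assumed.
import hashlib
import hmac


def make_token(shared_secret: bytes, message: bytes) -> bytes:
    """Sender: derive a token from the shared secret and the message."""
    return hmac.new(shared_secret, message, hashlib.sha1).digest()


def verify_token(shared_secret: bytes, message: bytes, received: bytes) -> bool:
    """Receiver: regenerate the token and compare it with the received one."""
    expected = make_token(shared_secret, message)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, received)
```

Because the token depends on the message contents, the same comparison also
detects modification of the message in transit.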
{
2.2.1.10
Protocol Profiles
H.323
has its origin, as mentioned before, in
the area of multimedia
conferencing. This implies
that
a vast number of options are
available, which are not
necessary for simply
providing
telephony
services. The TIPHON project of the
European Telecommunication Standards
Institute
(ETSI)
has defined a telephony profile for
H.323 that specifies which
combination of options
should
be implemented.
Similarly,
H.323 contains a security framework
(H.235) that describes a
collection of algorithms
and
protocol mechanisms but lacks, because of
international political constraints, a
precise
specification
of a mandatory baseline. This is accounted
for by the ETSI TIPHON
security
profile:
this specification fills in
the gaps and provides
the foundation for
inter-operable
implementations.
In
summary, it can be said that
the H.323 family of
standards provides a mature
basis for
commercial
products in the field of IP
Telephony. While the details of
the protocol are
often
dominated
by their legacy from various
earlier ITU protocols, there is an
active effort to profile
and
simplify the protocol to reduce
the complexity.
{
2.2.2
SIP
{
2.2.2.1
The purpose of SIP
SIP stands for Session Initiation Protocol. It is an application-layer
control protocol that has been developed and designed within the IETF. The
protocol has been designed with easy implementation, good scalability, and
flexibility in mind.
The specification is available in the form of several RFCs. The most
important one is RFC 3261, which contains the core protocol specification.
The protocol is used for creating, modifying and terminating sessions with
one or more participants. By a session, we understand a set of senders and
receivers that communicate, together with the state kept in those senders and
receivers during the communication. Examples of sessions include Internet
telephone calls, distribution of multimedia, multimedia conferences,
distributed computer games, etc.
SIP is not the only protocol that the communicating devices will need. It is
not meant to be a general-purpose protocol. The purpose of SIP is just to
make the communication possible. The communication itself must be achieved by
other means (and possibly another protocol). Two protocols that are most
often used along with SIP are RTP and SDP. The RTP protocol is used to carry
the real-time multimedia data (including audio, video and text). The protocol
makes it possible to encode and split the data into packets and transport
these packets over the Internet. Another important protocol is SDP, the
Session Description Protocol, which is used to describe and encode the
capabilities of session participants. Such a description is then used to
negotiate the characteristics of the session so that all of the devices can
participate, including, for example, negotiation of the codecs used to encode
media, so that all the participants will be able to decode it, negotiation of
the transport protocol used, and so on.
SIP has been designed in conformance with the Internet model. It is an
end-to-end-oriented signalling protocol, which means that all the logic is
stored in end-devices (except routing of SIP messages). State is also stored
only in end-devices. There is no single point of failure and networks
designed this way scale well. The price we have to pay for the
`distributiveness' and scalability is higher message overhead, caused by the
messages being sent end-to-end.
It
is worth mentioning that the
end-to-end concept of SIP is a
significant divergence from
a
regular
PSTN (Public Switched Telephone Network)
where all the state and
logic is stored in
the
network
and the end-devices
(telephones) are very
primitive. The aim of SIP is to
provide the
same
functionality that the traditional
PSTNs have, but the end-to-end
design makes SIP
networks
much more powerful and open
to the implementation of new
services that can
hardly
be
implemented in the traditional
PSTNs.
SIP is based on the HTTP protocol. HTTP inherited the format of its message
headers from RFC 822, and is probably the most successful and widely used
protocol on the Internet. SIP tries to combine the best of both. In fact,
HTTP can be classified as a signalling protocol too, because user agents use
the protocol to tell an HTTP server which documents they are interested in.
SIP is used to carry the description of session parameters. The description
is encoded into a document using SDP. Both protocols (HTTP and SIP) have
inherited the encoding of message headers from RFC 822. The encoding has
proven to be robust and flexible over the years.
2.2.2.1.1
SIP URI
SIP entities are identified using SIP URIs (Uniform Resource Identifiers). A
SIP URI has the form sip:username@domain, for instance sip:joe@company.com.
A SIP URI consists of a username part and a domain name part, delimited by
the @ (at) character. SIP URIs are similar to e-mail addresses and it is, for
instance, possible to use the same URI for e-mail and SIP communication. Such
URIs are easy to remember.
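Splitting a SIP URI of the simple form described above into its parts can be
sketched as follows. This covers only the sip:username@domain form with an
optional port; the full URI grammar of RFC 3261 (parameters, headers,
passwords, sips:) is deliberately not handled, and the function name is
hypothetical.

```python
# Minimal sketch: split "sip:username@domain[:port]" into its parts.
# Does not implement the full RFC 3261 URI grammar.


def parse_sip_uri(uri):
    """Return a dict with the user, host and port of a simple SIP URI."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "sip":
        raise ValueError(f"not a SIP URI: {uri!r}")

    user, at, hostport = rest.partition("@")
    if not at or not user or not hostport:
        raise ValueError(f"expected sip:username@domain, got {uri!r}")

    host, _, port = hostport.partition(":")
    return {"user": user, "host": host, "port": int(port) if port else None}
```

For instance, sip:joe@company.com yields the user joe and the host
company.com, with no explicit port.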
{
2.2.2.2
SIP network
elements
Although,
in the simplest configuration, it is
possible to use just two
user agents that send
SIP
messages
directly to each other, a typical SIP
network will contain more
than one type of
SIP
element.
Basic SIP elements are
user agents, proxies,
registrars and redirect
servers. They are
described
briefly in this
section.
Note
that the elements, as
presented in this section,
are often only logical
entities. It is often
profitable
to co-locate them, for
instance, to increase the
speed of processing, but that
depends on
the
particular implementation and
configuration.
2.2.2.2.1.
User agents
Internet endpoints that use SIP to find each other and to negotiate a
session's characteristics are called user agents. User agents usually, but
not necessarily, reside on a user's computer in the form of an application.
This is currently the most widely used approach, but user agents can also be
cellular phones, PSTN gateways, PDAs, automated IVR systems and so on.
User
agents are often referred to
as User Agent Server (UAS)
and User Agent Client (UAC).
UAS
and
UAC are logical entities and
each user agent contains a
UAC and UAS. UAC is
the part of
the
user agent that sends
requests and receives
responses. UAS is the part
of the user agent
that
receives
requests and sends
responses.
Because
a user agent contains both
UAC and UAS, user
agents behave like a UAC or
a UAS. For
instance,
a calling party's user agent
behaves like UAC when it
sends an INVITE request
and
receives
responses to the request. A
called party's user agent
behaves like a UAS when it
receives
the
INVITE and sends
responses.
But
this situation changes when
the called party decides to
send a BYE and terminate the
session.
In
this case the called
party's user agent (sending
BYE) behaves like UAC
and the calling
party's
user
agent behaves like
UAS.
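[Figure 2.11 (diagram): a calling party, a stateful forking proxy and two
called parties, each labelled with its UAC and UAS parts; INVITE requests
flow from the calling party through the proxy to the called parties, and a
BYE is sent by one called party.]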
Figure
2.11 UAC and
UAS
Figure 2.11 shows three user agents and one stateful forking proxy. Each user
agent contains a UAC and a UAS. The part of the proxy that receives the
INVITE from the calling party, in fact, acts as a UAS. When forwarding the
request statefully, the proxy creates two UACs, each of them responsible for
one branch.
In the example, called party B picks up and later, when he wants to tear down
the call, he sends a BYE. At this time, the user agent that was previously a
UAS becomes a UAC and vice versa.
2.2.2.2.2
Proxy servers
SIP allows the creation of an infrastructure of network hosts called proxy
servers. User agents can send messages to a proxy server. Proxy servers are
very important entities in the SIP infrastructure. They perform routing of
session invitations according to the invitee's current location,
authentication, accounting and many other important functions.
The
most important task of a
proxy server is to route
session invitations `closer' to a called
party.
The
session invitation will
usually traverse a set of
proxies until it finds one
which knows the
actual
location of the called party.
Such a proxy will forward
the session invitation directly to
the
called
party and the called
party will then accept or
decline the session
invitation.
There
are two basic types of
SIP Proxy Servers, stateless
and stateful.
2.2.2.2.2.1
Stateless servers
Stateless servers are simple message forwarders. They forward messages
independently of each other. Although messages are usually arranged into
transactions (see Section 2.2.2.4), stateless proxies do not take care of
transactions.
Stateless proxies are simple and faster than stateful proxy servers. They can
be used as simple load balancers, message translators and routers. One of the
drawbacks of stateless proxies is that they are unable to absorb
re-transmissions of messages or perform more advanced routing, for instance,
forking or recursive traversal.
2.2.2.2.2.2
Stateful servers
Stateful
proxies are more complex.
Upon reception of a request,
stateful proxies create a
state and
keep
the state until the
transaction finishes. Some
transactions, especially those
created by
INVITE,
can last quite long (until
the called party picks up or
declines the call). Because
stateful
proxies
must maintain the state for
the duration of the transactions,
their performance is
limited.
The
ability to associate SIP
messages into transactions
gives stateful proxies some
interesting
features.
Stateful proxies can perform
forking; that means that
upon reception of a message,
two or
more
messages will be sent
out.
Stateful
proxies can absorb
re-transmissions because they
know from the transaction
state if they
have
already received the same
message (stateless proxies
cannot do the check because
they keep
no
state).
Stateful
proxies can perform more complicated
methods of finding a user. It
is, for instance,
possible
to try to reach user's office
phone and when he does
not pick up, redirect
the call to his
cell
phone. Stateless proxies
cannot do this because they
have no way of knowing how
the
transaction
targeted to the office phone
finished.
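Two of the stateful features described above, absorbing re-transmissions and
trying several targets one after another, can be sketched as follows. The
class, the transaction keys and the `ring` callback are hypothetical
simplifications; a real proxy identifies transactions and handles timers as
specified for SIP, which this sketch does not attempt.

```python
# Sketch of why transaction state enables retransmission absorption and
# sequential target search. All names are illustrative only.


class StatefulProxySketch:
    def __init__(self):
        # transaction id -> request; kept until the transaction finishes
        self.transactions = {}

    def receive(self, transaction_id, request):
        """Forward a new request once; absorb re-transmissions of it."""
        if transaction_id in self.transactions:
            return "absorbed"
        self.transactions[transaction_id] = request
        return "forwarded"

    def finish(self, transaction_id):
        """Drop the state when the transaction has completed."""
        self.transactions.pop(transaction_id, None)


def try_sequentially(targets, ring):
    """Try each target in order; `ring(target)` returns True if answered."""
    for target in targets:
        if ring(target):
            return target
    return None
```

A stateless proxy has no equivalent of the `transactions` table, which is why
it can do neither of these things.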
Most SIP proxies today are stateful because their configuration is usually
very complex. They often perform accounting, forking and some sort of NAT
traversal aid, and all those features require a stateful proxy.
2.2.2.2.2.3
Proxy server usage
In
a typical configuration, each centrally-administered
entity (a company, for
instance) has its own
SIP
Proxy Server, which is used
by all user agents in the
entity. Suppose that there
are two
companies,
A and B, and each of them
has its own proxy server.
Figure 2.12 shows how a
session
invitation
from employee Joe in company
A will reach employee Bob in
company B.
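[Figure 2.12 (diagram): Joe in company A sends an INVITE (1) to proxy.a.com,
which asks the DNS server for the SIP SRV record of b.com (2) and receives
proxy.b.com (3); the INVITE is forwarded to proxy.b.com (4) and then to Bob
in company B (5); a BYE (6) follows later. The IP addresses 5.6.7.8 and
1.2.3.4 appear as labels of the user agents.]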
Figure
2.12 Session
invitation
User Joe uses the address sip:bob@b.com to call Bob. Joe's user agent does
not know how to route the invitation itself, but it is configured to send all
outbound traffic to the company SIP proxy server proxy.a.com. The proxy
server figures out that user sip:bob@b.com is in a different company, so it
will look up B's SIP proxy server and send the invitation there. B's proxy
server can either be pre-configured at proxy.a.com, or the proxy will use DNS
SRV records to find B's proxy server. The invitation reaches proxy.b.com. The
proxy knows that Bob is currently sitting in his office and is reachable
through the phone on his desk, which has IP address 1.2.3.4, so the proxy
will send the invitation there.
2.2.2.2.3
Registrar
It has been mentioned that the SIP proxy at proxy.b.com knows Bob's current
location, but it has not yet been mentioned how a proxy can learn the current
location of a user. Bob's user agent (SIP phone) must register with a
registrar. The registrar is a special SIP entity that receives registrations
from users, extracts information about their current location (IP address,
port and username in this case) and stores the information in a location
database. The purpose of the location database is to map sip:bob@b.com to
something like sip:bob@1.2.3.4:5060. The location database is then used by
B's proxy server. When the proxy receives an invitation for sip:bob@b.com, it
will search the location database. It finds sip:bob@1.2.3.4:5060 and will
send the invitation there.
A
registrar is very often a logical
entity only. Because of their
tight coupling with
proxies,
registrars
are usually co-located with
proxy servers.
Figure 2.13 shows a typical SIP registration. A REGISTER message containing the Address of Record sip:jan@iptel.org and the contact address sip:jan@1.2.3.4:5060, where 1.2.3.4 is the IP address of the phone, is sent to the registrar. The registrar extracts this information and stores it in the location database. If everything goes well, the registrar sends a 200 OK response to the phone and the process of registration is finished.
Figure 2.13 Overview of Registrar (1. REGISTER for sip:jan@iptel.org at 1.2.3.4:5060 from the user agent; 2. STORE in the location database; 3. 200 OK)
Each registration has a limited life span. The Expires header field or the expires parameter of the Contact header field determines for how long the registration is valid. The user agent must refresh the registration within the life span. Otherwise, it will expire and the user will become unavailable.
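To make the interplay of registration, lookup and expiry concrete, here is a minimal sketch of a location database in Python. The class and method names are hypothetical, the clock is injectable only so that expiry can be demonstrated without waiting, and a real registrar would also handle authentication and multiple contacts per address.

```python
import time


class LocationDatabase:
    """Maps an Address of Record to a contact URI with a limited life span."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so expiry can be simulated
        self._bindings = {}

    def register(self, aor, contact, expires):
        """Store or refresh a binding that is valid for `expires` seconds."""
        self._bindings[aor] = (contact, self._clock() + expires)
        return 200                   # status code of the 200 OK sent to the phone

    def lookup(self, aor):
        """Return the current contact, or None if unknown or expired."""
        entry = self._bindings.get(aor)
        if entry is None:
            return None
        contact, expires_at = entry
        if self._clock() >= expires_at:
            del self._bindings[aor]  # the registration was not refreshed in time
            return None
        return contact


now = [0.0]                          # simulated time in seconds
db = LocationDatabase(clock=lambda: now[0])
status = db.register("sip:jan@iptel.org", "sip:jan@1.2.3.4:5060", expires=120)
found = db.lookup("sip:jan@iptel.org")
now[0] = 121.0                       # the life span has passed without a refresh
after_expiry = db.lookup("sip:jan@iptel.org")
```

A proxy would call `lookup` for each incoming invitation, which is why an expired registration makes the user unreachable.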
2.2.2.2.4
Redirect server
An entity that receives a request and sends back a reply containing a list of the current locations of a particular user is called a redirect server. A redirect server receives requests and looks up the intended recipient of the request in the location database created by a registrar. It then creates a list of the current locations of the user and sends it to the request originator in a response from the SIP 3xx (redirection) response class.
The
originator of the request
then extracts the list of
destinations and sends
another request
directly
to them. Figure 2.14 shows a
typical redirection.
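The redirect exchange can be sketched as two small Python functions, one for the server and one for the originator. All names here are hypothetical, the location database is a plain dictionary, and the use of a 404 reply for an unknown user is an assumption made for the sketch.

```python
def redirect(request_uri, location_db):
    """Answer as a redirect server would: a status line plus the known locations."""
    contacts = location_db.get(request_uri, [])
    if not contacts:
        return "SIP/2.0 404 Not Found", []   # assumed reply for an unknown user
    return "SIP/2.0 302 Moved Temporarily", list(contacts)


def follow_redirect(request_uri, location_db):
    """Originator side: build one new INVITE request line per returned location."""
    status_line, contacts = redirect(request_uri, location_db)
    code = status_line.split()[1]
    if code.startswith("3"):
        return ["INVITE %s SIP/2.0" % contact for contact in contacts]
    return []


db = {"sip:bob@b.com": ["sip:bob@1.2.3.4:5060", "sip:bob@5.6.7.8:5060"]}
status_line, contacts = redirect("sip:bob@b.com", db)
retries = follow_redirect("sip:bob@b.com", db)
```

Note that the redirect server never forwards the request itself; it only tells the originator where to try next.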
Figure 2.14 SIP Redirection (User Agent A sends INVITE #1 to the redirect server, receives 302 Moved Temporarily, and sends INVITE #2 to User Agent B)
{
2.2.2.3
SIP messages
Communication using SIP (often called signalling) consists of a series of messages. Messages can be transported independently by the network. Usually, each is transported in a separate UDP datagram. Each message consists of a `first line', a message header and a message body. The first line identifies the type of the message. There are two types of messages: requests and responses.
Requests are usually used to initiate some action or to inform the recipient of something. Replies are used to confirm that a request was received and processed, and they contain the status of the processing.
A
typical SIP request looks
like this:
INVITE sip:7170@iptel.org SIP/2.0
Via: SIP/2.0/UDP 195.37.77.100:5040;rport
Max-Forwards: 10
From: "jiri" <sip:jiri@iptel.org>;tag=76ff7a07-c091-4192-84a0-d56e91fe104f
To: <sip:jiri@bat.iptel.org>
Call-ID: d10815e0-bf17-4afa-8412-d9130a793d96@213.20.128.35
CSeq: 2 INVITE
Contact: <sip:213.20.128.35:9315>
User-Agent: Windows RTC/1.0
Proxy-Authorization: Digest username="jiri", realm="iptel.org", algorithm="MD5", uri="sip:jiri@bat.iptel.org", nonce="3cef753900000001771328f5ae1b8b7f0d742da1feb5753c", response="53fe98db10e1074b03b3e06438bda70f"
Content-Type: application/sdp
Content-Length: 451

v=0
o=jku2 0 0 IN IP4 213.20.128.35
s=session
c=IN IP4 213.20.128.35
b=CT:1000
t=0 0
m=audio 54742 RTP/AVP 97 111 112 6 0 8 4 5 3 101
a=rtpmap:97 red/8000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:112 G7221/16000
a=fmtp:112 bitrate=24000
a=rtpmap:6 DVI4/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:4 G723/8000
a=rtpmap:3 GSM/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
The first line tells us that this is an INVITE message, which is used to establish a session. The URI on the first line, sip:7170@iptel.org, is called the Request URI and contains the URI of the next hop of the message. In this case, it will be host iptel.org.
A SIP request can contain one or more Via header fields, which are used to record the path of the request. They are later used to route SIP responses exactly the same way. The INVITE message contains just one Via header field, which was created by the user agent that sent the request. From the Via field we can tell that the user agent is running on host 195.37.77.100 and port 5040.
The
From
and
To
header
fields identify initiator
(calling party) and recipient
(called party) of the
invitation
(just like in SMTP where
they identify sender and
recipient of a message).
The
From
header
field contains a tag
parameter which serves as a
dialogue identifier and will
be
described
in Section 2.2.2.5.
The
Call-ID
header
field is a dialogue identifier
and its purpose is to
identify messages belonging
to
the same call. Such
messages have the same
Call-ID
identifier.
CSeq
is used
to maintain order
of
requests. Because requests
can be sent over an
unreliable transport that
can re-order
messages,
sequence
numbers must be present in
the messages so that recipient
can identify
re-transmissions
and
out-of-order requests.
The Contact header field contains the IP address and port on which the sender is awaiting further requests sent by the called party. Other header fields are not important and will not be described here.
The message header is delimited from the message body by an empty line. The message body of the INVITE request contains a description of the media types accepted by the sender, encoded in SDP.
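The three-part structure (first line, header, body) lends itself to a short parsing sketch. The Python function below is a simplified illustration with a hypothetical name: it assumes CRLF line endings and one header field per line, and it ignores details such as header folding and compact header names.

```python
def parse_sip_message(raw):
    """Split a SIP message into (kind, first line, header list, body)."""
    # The header is delimited from the body by an empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    # Responses start with the protocol version, requests with a method name.
    kind = "response" if first_line.startswith("SIP/") else "request"
    return kind, first_line, headers, body


raw = (
    "INVITE sip:7170@iptel.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 195.37.77.100:5040;rport\r\n"
    "CSeq: 2 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
kind, first_line, headers, body = parse_sip_message(raw)
```

The headers are kept as a list of pairs rather than a dictionary because a field such as Via may legitimately appear more than once.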
2.2.2.3.1.
SIP requests
An
INVITE request has been described.The
request is used to invite a
called party to a
session.
Other
important requests
are:
-
ACK
This
message acknowledges receipt of a final
response to INVITE. Establishing of a
session
utilises
3-way hand-shaking due to asymmetric
nature of the invitation. It
may take a while
before
the called party accepts or
declines the call so the
called party's user agent
periodically
re-transmits
a positive final response until it
receives an ACK (which
indicates that the
calling
party
is still there and ready to
communicate);
-
BYE
BYE
messages are used to tear
down multimedia sessions. A
party wishing to tear down
a
session
sends a BYE to the other
party;
-
CANCEL
CANCEL
is used to cancel a not yet
fully-established session. It is used
when the called
party
has
not replied with a final
response yet but the calling
party wants to abort the
call (typically
when
a called party does not
respond for some
time);
-
REGISTER
The purpose of REGISTER is to let the registrar know the user's current location. Information about the current IP address and port on which a user can be reached is carried in REGISTER messages. The registrar extracts this information and puts it into a location database. The database can later be used by SIP Proxy Servers to route calls to the user. Registrations are time-limited and need to be periodically refreshed.
The
listed requests usually have
no message body because it is
not needed in most
situations (but
can
have one). In addition, many
other request-types have
been defined but their
descriptions are
out
of the scope of this
document.
2.2.2.3.2
SIP responses
When
a user agent or proxy server
receives a request, it sends a
reply. Each request must be
replied
to
except ACK requests which
trigger no replies.
A
typical reply looks like
this:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.30:5060;received=66.87.48.68
From: sip:sip2@iptel.org
To: sip:sip2@iptel.org;tag=794fe65c16edfdf45da4fc39a5d2867c.b713
Call-ID: 2443936363@192.168.1.30
CSeq: 63629 REGISTER
Contact: <sip:sip2@66.87.48.68:5060;transport=udp>;q=0.00;expires=120
Server: Sip EXpress router (0.8.11pre21xrc (i386/linux))
Content-Length: 0
Warning: 392 195.37.77.101:5060 "Noisy feedback tells: pid=5110 req_src_ip=66.87.48.68 req_src_port=5060 in_uri=sip:iptel.org out_uri=sip:iptel.org via_cnt==1"
Responses are very similar to requests, except for the first line. The first line of a response contains the protocol version (SIP/2.0), a reply code and a reason phrase. The reply code is an integer from 100 to 699 and indicates the type of the response. There are 6 classes of responses:
-
1xx are provisional responses. A provisional response tells its recipient that the associated request was received but that the result of the processing is not yet known. Provisional responses are sent only when the processing does not finish immediately. The sender must stop re-transmitting the request upon reception of a provisional response.
Typically, proxy servers send responses with code 100 (Trying) when they start processing an INVITE, and user agents send responses with code 180 (Ringing), which means that the called party's phone is ringing;
-
2xx responses are positive final responses. A final response is the ultimate response that the originator of the request will ever receive. Therefore, final responses express the result of the processing of the associated request. Final responses also terminate transactions. Responses with codes from 200 to 299 are positive responses, which means that the request was processed successfully and accepted. For instance, a 200 OK response is sent when a user accepts the invitation to a session (INVITE request).
A UAC may receive several 200 messages to a single INVITE request. This is because a forking proxy (described later) can fork the request so that it reaches several UASs, each of which accepts the invitation. In this case, each response is distinguished by the tag parameter in the To header field. Each response represents a distinct dialogue with an unambiguous dialogue identifier;
-
3xx responses are used to
redirect a calling party. A redirection
response gives information
about
the
user's new location or an alternative
service that the calling
party might use to satisfy
the
call.
Redirection responses are
usually sent by proxy
servers.When a proxy receives a
request
and
does not want or can't
process it for any reason,
it will send a redirection response to
the
calling
party and put another location
into the response which
the calling party might want
to
try.
It can be the location of another
proxy or the current location of the
called party (from
the
location
database created by a registrar).The
calling party is then supposed to
re-send the
request
to the new location. 3xx
responses are final;
-
4xx are negative final
responses. A 4xx response
means that the problem is on
the sender's side.
The
request could not be processed
because it contains bad
syntax or cannot be fulfilled at
that
server.
-
5xx means that the problem
is on server's side.The request is
apparently valid but the
server
failed
to fulfil it. Clients should
usually retry the request
later;
-
6xx reply code means
that the request cannot be
fulfilled at any server.This
response is usually
sent
by a server that has definitive
information about a particular
user. User agents usually
send
a
603
Decline response
when the user does
not want to participate in
the session.
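Because the class of a response is determined by the first digit of the reply code, classification is straightforward to express in code. The following Python sketch uses hypothetical function names; the class labels in the table follow the naming used in RFC 3261 rather than wording from this section.

```python
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_status_line(line):
    """Split 'SIP/2.0 200 OK' into (version, reply code, reason phrase)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def response_class(code):
    """Name the class of a reply code between 100 and 699."""
    if not 100 <= code <= 699:
        raise ValueError("reply code out of range: %d" % code)
    return RESPONSE_CLASSES[code // 100]


def is_final(code):
    """Every response except a 1xx one is final."""
    return code >= 200


version, code, reason = parse_status_line("SIP/2.0 603 Decline")
summary = (response_class(code), is_final(code))
```

Splitting the status line at most twice keeps multi-word reason phrases such as "Moved Temporarily" intact.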
In
addition to the response class,
the first line also contains
the reason phrase.The code
number is
intended
to be processed by machines. It is not
very human-friendly but it is very
easy to parse
and
understand by machines.The reason
phrase usually contains a
human-readable message
describing
the result of the
processing. A user agent
should render the reason
phrase to the user.
The
request to which a particular
response belongs is identified
using the CSeq
header
field. In
addition
to the sequence number, this
header field also contains
the method of corresponding
request.
In our example it was a REGISTER
request.
{
2.2.2.4.
SIP transactions
Although
we said that SIP messages
are sent independently over
the network, they are
usually
arranged
into transactions by user
agents and certain types of
proxy servers.Therefore SIP is
said
to
be a transactional protocol.
A
transaction is a sequence of SIP
messages exchanged between
SIP network elements.
A
transaction
consists of one request and
all responses to that
request.That includes zero or
more
provisional
responses and one or more final
responses (remember that an INVITE
might be
answered
by more than one final response
when a proxy server forks
the request).
If
a transaction was initiated by an INVITE
request, then the same
transaction also includes
ACK,
but
only if the final response
was not a 2xx response. If
the final response was a 2xx
response, then
the
ACK is not considered part
of the transaction.
As we can see, this behaviour is quite asymmetric: ACK is part of transactions with a negative final response but is not part of transactions with positive final responses. The reason for this separation is the importance of delivering all 200 OK messages. Not only do they establish a session, but 200 OK can also be generated by multiple entities when a proxy server forks the request, and all of them must be delivered to the calling user agent. Therefore, user agents take responsibility in this case and retransmit 200 OK responses until they receive an ACK. Also note that only responses to INVITE are retransmitted.
SIP
entities that have a notion
of transactions are called
stateful. Such entities
usually create a
state
associated
with a transaction that is
kept in the memory for
the duration of the
transaction.When
a
request or response comes, a
stateful entity tries to
associate the request (or
response) to existing
transactions.To
be able to do this, it must
extract a unique transaction identifier
from the message
and
compare it to identifiers of all existing
transactions. If such a transaction
exists, then its
state
gets
updated from the
message.
In
the previous SIP RFC2543,
the transaction identifier
was calculated as hash of
all important
message
header fields (that included
To,
From,
Request-URI
and
CSeq).This
proved to be very
slow
and complex. During interoperability
tests, such transaction identifiers
were a common
source
of problems.
In the new RFC3261, the way of calculating transaction identifiers was completely changed.
Instead of the complicated hashing of important header fields, a SIP message now includes the identifier directly. The branch parameter of Via header fields directly contains the transaction identifier. This is a significant simplification, but there are still old implementations that do not support the new way of calculating the transaction identifier, so even new implementations have to support the old way. They must be backwards-compatible.
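A stateful entity's lookup can be sketched as follows in Python. The function name and the dictionary of transactions are hypothetical. The sketch relies on two details of RFC 3261 that are not spelled out above: branch values generated by compliant implementations start with the string z9hG4bK, and the method from CSeq is part of the match. Real matching has further rules (for example the sent-by value and special handling of ACK) that are left out here.

```python
MAGIC_COOKIE = "z9hG4bK"   # prefix of branch values produced per RFC 3261


def transaction_key(top_via, cseq):
    """Return (branch, method) for an RFC 3261 style message, else None."""
    branch = None
    for param in top_via.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            branch = value.strip()
    method = cseq.split()[1]
    if branch and branch.startswith(MAGIC_COOKIE):
        return (branch, method)
    # No usable branch: an old implementation, which would require the
    # RFC 2543 style matching on several header fields (not shown here).
    return None


transactions = {}
key = transaction_key(
    "SIP/2.0/UDP 195.37.77.100:5040;branch=z9hG4bK776asdhds", "2 INVITE")
transactions[key] = "proceeding"

# A response carries the same top Via (possibly with extra parameters).
response_key = transaction_key(
    "SIP/2.0/UDP 195.37.77.100:5040;branch=z9hG4bK776asdhds;received=1.2.3.4",
    "2 INVITE")
matched_state = transactions.get(response_key)
legacy = transaction_key("SIP/2.0/UDP 195.37.77.100:5040;rport", "2 INVITE")
```

The branch value used in the example is made up for illustration.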
Figure
2.15 shows what messages
belong to what transactions
during a conversation of two
user
agents.
Figure 2.15 SIP transactions (Transaction #1: INVITE, 100 Trying, 180 Ringing, 200 OK; then ACK; Transaction #2: BYE, 200 OK)
{
2.2.2.5
SIP Dialogues
It has been shown what transactions are: one transaction includes INVITE and its responses, and another transaction includes BYE and its responses when a session is being torn down. Those two transactions should be somehow related: both of them belong to the same dialogue.
A dialogue represents a peer-to-peer SIP relationship between two user agents. A dialogue persists for some time and is a very important concept for user agents. Dialogues facilitate the proper sequencing and routing of messages between SIP endpoints.
Dialogues are identified using the Call-ID, the From tag and the To tag. Messages that belong to the same dialogue must have these fields equal. We have shown that the CSeq header field is used to order messages. In fact, it is used to order messages within a dialogue. The number must be monotonically increased for each message sent within a dialogue. Otherwise, the peer will handle it as an out-of-order request or a retransmission. In fact, the CSeq number identifies a transaction within a dialogue, because we have said that requests and associated responses are called transactions. This means that only one transaction in each direction can be active within a dialogue. One could also say that a dialogue is a sequence of transactions. Figure 2.16 extends Figure 2.15 to show which messages belong to the same dialogue.
Figure 2.16 SIP dialogue (the message flow of Figure 2.15, with Transaction #1, the ACK and Transaction #2 grouped into one dialogue)
Some messages establish a dialogue and some do not. This is used to explicitly express the relationship of messages, and also to send messages that are not related to other messages outside a dialogue. That is easier to implement because user agents do not have to maintain the dialogue state.
For
instance, an INVITE message establishes a
dialogue, because it will
later be followed by a
BYE
request, which will tear
down the session established
by the INVITE.This BYE is
sent
within
the dialogue established by
the INVITE.
But,
if a user agent sends a
MESSAGE request, such a
request does not establish
any dialogue. Any
subsequent
messages (even MESSAGE) will
be sent independently of the
previous one.
2.2.2.5.1.
Dialogues facilitate
routing
Dialogues
are also used to route
the messages between user
agents, as described
briefly.
Suppose
that user sip:bob@a.com
wants
to talk to user sip:pete@b.com.
He knows the SIP
address
of the called party (sip:pete@b.com)
but this address does not
say anything about
current
location of the user, i.e.,
the calling party does not
know to which host to send
the
request.Therefore,
the INVITE request will be
sent to a proxy
server.
The
request will be sent from
proxy to proxy until it
reaches one that knows the
current location
of
the called party.This process is
called routing. Once the
request reaches the called
party,
the
called party's user agent
will create a response that
will be sent back to the
calling party.
The
called party's user agent
will also put a contact
header field into the
response which will
contain
the current location of the user.The
original request also
contained a contact header
field
which
means that both user agents
know the current location of the
peer.
Because
the user agents know
the location of each other, it is not
necessary to send further
requests
to any proxy.They can be sent directly
from user agent to user
agent.That is exactly how
dialogues
facilitate routing.
Further
messages within a dialogue
are sent directly from user
agent to user agent.This is
a
significant
performance improvement because proxies
do not see all the messages
within a
dialogue.They
are used to route just
the first request that
establishes the dialogue.The
direct
messages
are also delivered with
much smaller latency because
a typical proxy usually implements
complex
routing logic. Figure 2.17
contains an example of a message
within a dialogue
(BYE)
that
bypasses the proxies.
Figure 2.17 SIP trapezoid (the INVITE travels from the calling party through Proxy 1 and Proxy 2 to the called party; the BYE is sent directly between the two parties)
2.2.2.5.2
Dialogue identifiers
Dialogue identifiers consist of three parts, Call-ID, From tag and To tag, but it is not that clear why dialogue identifiers are created exactly this way and who contributes which part.
Call-ID
is called call identifier. It
must be a unique string that
identifies a call. A call
consists of
one
or more dialogues. Multiple user
agents may respond to a
request when a proxy along
the
path
forks the request. Each
user agent that sends a
2xx response, establishes a
separate dialogue
with
the calling party. All such
dialogues are part of the
same call and have
the same Call-ID.
A
From
tag is
generated by the calling party
and it uniquely identifies the
dialogue in the calling
party's
user agent.
A To tag is generated by the called party and, just like the From tag, uniquely identifies the dialogue in the called party's user agent.
This
hierarchical dialogue identifier is
necessary because a single
call-invitation can create
several
dialogues
and the calling party must
be able to distinguish
them.
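The three-part identifier can be illustrated with a short Python sketch. The helper names, the Call-ID and the tag values are hypothetical, and the tag extraction is deliberately naive (it simply scans the semicolon-separated parameters of the header value).

```python
def tag_of(header_value):
    """Return the tag parameter of a From or To header value, or None."""
    for param in header_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip()
    return None


def dialogue_id(call_id, from_value, to_value):
    """Combine Call-ID, From tag and To tag into one dialogue identifier."""
    return (call_id, tag_of(from_value), tag_of(to_value))


# One forked INVITE answered by two user agents: same Call-ID and From tag,
# but each called user agent contributes its own To tag.
call_id = "abc123@1.2.3.4"
from_value = "<sip:joe@a.com>;tag=111"
dialogue_1 = dialogue_id(call_id, from_value, "<sip:bob@b.com>;tag=aaa")
dialogue_2 = dialogue_id(call_id, from_value, "<sip:bob@b.com>;tag=bbb")
same_call = dialogue_1[0] == dialogue_2[0]
distinct_dialogues = dialogue_1 != dialogue_2
```

Using the tuple as a dictionary key is one simple way for a calling user agent to keep the state of each forked dialogue apart.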
{
2.2.2.6
Typical SIP
scenarios
This
section gives a brief
overview of typical SIP scenarios
that usually make up the
SIP traffic.
2.2.2.6.1
Registration
Users must register themselves with a registrar to be reachable by other users. A registration comprises a REGISTER message followed by a 200 OK sent by the registrar if the registration was successful. Registrations are usually authorised, so a 407 reply can appear if the user did not provide valid credentials. Figure 2.18 shows an example of a registration.
Figure 2.18 REGISTER message flow (REGISTER without credentials, 407, REGISTER with credentials, 200 OK)
2.2.2.6.2
Session invitation
A session invitation consists of one INVITE request, which is usually sent to a proxy. The proxy immediately sends a 100 Trying reply to stop re-transmissions and forwards the request further.
All provisional responses generated by the called party are sent back to the calling party. See the 180 Ringing response in the call flow. The response is generated when the called party's phone starts ringing.
A 200 OK is generated once the called party picks up the phone, and it is re-transmitted by the called party's user agent until it receives an ACK from the calling party. The session is established at this point.
2.2.2.6.3
Session termination
Session
termination is accomplished by sending a
BYE request within the
dialogue established by
INVITE.
BYE messages are sent
directly from one user agent
to the other, unless a proxy on
the
path
of the INVITE request has indicated
that it wishes to stay on
the path by using
record
routing
(see Section
2.2.2.6.4).
A
party wishing to tear down a
session sends a BYE request
to the other party involved in
the
session.The
other party sends a 200 OK
response to confirm the BYE
and the session is
terminated.
See Figure 2.20, left
message flow.
Figure 2.19 INVITE message flow (calling party, SIP Proxy, called party: INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, RTP streams)
2.2.2.6.4
Record routing
All
requests sent within a
dialogue are, by default,
sent directly from one user
agent to the other.
Only
requests outside a dialogue
traverse SIP proxies.This approach
makes a SIP network
more
scalable
because only a small number of
SIP messages hit the
proxies.
There
are certain situations in
which a SIP Proxy needs to
stay on the path of all
further
messages.
For instance, proxies
controlling a NAT box, or proxies doing
accounting need to
stay
on
the path of BYE
requests.
The mechanism by which a proxy can inform user agents that it wishes to stay on the path of all further messages is called record routing. Such a proxy inserts a Record-Route header field, which contains the address of the proxy, into SIP messages. Messages sent within a dialogue will then traverse all SIP proxies that put a Record-Route header field into the message.
The
recipient of the request receives a
set of Record-Route
header
fields in the message. It
must
mirror
all the Record-Route
header
fields into responses
because the originator of
the request
also
needs to know the set of
proxies.
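The effect of record routing on later requests can be sketched in Python as follows. The function names and addresses are hypothetical, only a single proxy is modelled, and the ordering rules that apply when several proxies record-route are deliberately ignored.

```python
def proxy_forward(headers, proxy_uri, record_route):
    """Forward a request, optionally adding this proxy's Record-Route field."""
    out = list(headers)
    if record_route:
        out.insert(0, ("Record-Route", "<%s>" % proxy_uri))
    return out


def mirror_record_route(request_headers):
    """Recipient side: copy all Record-Route fields into the response."""
    return [(n, v) for n, v in request_headers if n == "Record-Route"]


def in_dialogue_path(mirrored_headers, peer_contact):
    """Hops that a later request in the dialogue (for example BYE) visits."""
    proxies = [v.strip("<>") for n, v in mirrored_headers if n == "Record-Route"]
    return proxies + [peer_contact]


invite = [("Contact", "<sip:joe@1.2.3.4>")]
with_rr = proxy_forward(invite, "sip:proxy.a.com", record_route=True)
without_rr = proxy_forward(invite, "sip:proxy.a.com", record_route=False)

path_with = in_dialogue_path(mirror_record_route(with_rr), "sip:bob@5.6.7.8")
path_without = in_dialogue_path(mirror_record_route(without_rr), "sip:bob@5.6.7.8")
```

With record routing the later request passes through the proxy first; without it, the only hop is the peer's contact address.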
Figure 2.20 BYE message flow (with and without record routing)
The
lefthand message flow of
Figure 2.20 shows how a
BYE (request within dialogue
established
by
INVITE) is sent directly to the
other user agent when
there is no Record-Route
header
field
in
the message.The righthand message
flow shows how the
situation changes when the
proxy
puts
a Record-Route
header
field into the
message.
2.2.2.6.5
Event subscription and
notification
The SIP specification has been extended to support a general mechanism allowing subscription to events. Such events can include changes to SIP Proxy statistics, presence information, session changes and so on.
The
mechanism is used mainly to
convey information on presence
(the willingness to
communicate)
of users. Figure 2.21 shows
the basic message
flow.
Figure 2.21 Event subscription and notification (SUBSCRIBE, 200 OK, NOTIFY, 200 OK; after an event: NOTIFY, 200 OK)
A
user agent interested in an
event notification sends a
SUBSCRIBE message to a SIP
server.The
SUBSCRIBE
message establishes a dialogue
and is immediately replied to by
the server using a
200
OK response. At this point,
the dialogue is established.The
server sends a NOTIFY
request to
the
user every time the
event to which the user
subscribed changes. NOTIFY
messages are sent
within
the dialogue established by
the SUBSCRIBE.
Note
that the first NOTIFY
message in Figure 2.21 is
sent regardless of any event
that triggers
notifications.
Subscriptions,
as well as registrations, have a
limited life span and
therefore must be periodically
refreshed.
2.2.2.6.6
Instant messages
Instant
messages are sent using a
MESSAGE request. MESSAGE
requests do not establish
a
dialogue
and therefore they will
always traverse the same
set of proxies.This is the simplest
form
of
sending instant messages.The
text of the instant message
is transported in the body of
the SIP
request.
Figure 2.22 Instant Messages (each MESSAGE travels from user agent to user agent via the proxy and is answered with 200 OK)
{
2.2.3.
Media Gateway Control
Protocols
In
a traditional telephone network, the
infrastructure consists of large
telephone switches
which
interconnect
with each other to create
the backbone network and
which also connect to
customer
equipment (PBXs, telephones).While the internal
network today is based upon
digital
communication,
links to customers may be either
analogue (PSTN) or digital
(ISDN).The links
to
customers are shared between
call signalling (for dialling, invocation
of supplementary services,
etc.)
and carriage of voice/data. In
the backbone, dedicated
(virtual) links interconnecting
switches
are reserved for call
signalling (de-facto creation of a
dedicated network of its
own)
whereas
voice/data traffic is carried on separate
links.The Signalling System
No. 7 (SS7) or
variants
of it are used as the call
signalling protocol between switches;
this protocol is used to
route
voice/data channels across
the backbone network by instructing
each switch on the
way
which
incoming `line' is to be forwarded to
which outgoing `line' and
which other
processing
(such
as simple voice compression, in-band
signalling detection to customer
premise equipment,
etc.)
is to be applied.Voice/data channels
themselves are plain bit
pipes identified by roughly a
trunk
and line identifier at each
switch.
Figure
2.23 Application scenario
for Media Gateway Control
Protocols
A
similar construction is now considered by a number of
telecom companies for
IP-based
backbone
networks that may successively
replace parts of their
overall switched-network
infrastructure,
as depicted in Figure 2.23.
Instead of voice switches, IP
routers are used to build up
a
backbone
network which employs IP
routing, possibly MPLS, and,
most likely, some explicit
form
of
QoS support to carry voice
and data packets from
any point in the network to
any other. In
contrast
to voice switches, this does
not require explicit configuration of
the individual
routers
per
voice connection. Instead, only
the entry and exit points
need to be configured with
each
others'
addresses, so that they know
where to send their voice/data
packets.Two types of
gateways
are
used at the edges of the IP
network to connect to the
conventional telephone
network:
signalling
gateways to convert SS7
signalling into IP-based
call control (which may
make use of
H.323
or SIP or simply provide a
transport to carry SS7
signalling in IP packets
[SIGTRAN])
and
media gateways that perform
voice transcoding. Some
central entity (or more probably,
a
number
of co-operating entities) forms the
intelligent core of the
backbone, the Media
Gateway
Controller(s).They
interpret call signalling and
decide how to route calls
and they provide
supplementary
services, etc. Having
decided on how a call is to be
established, they inform
the
(largely
passive and `dumb') media
gateways at the edges
(ingress and egress
gateways) how and
where
to transmit the voice packets.The
Media Gateway Controllers also
re-configure the
gateways
in case of any changes in
the call, invocation of supplementary
services, etc.The media
gateways
may be capable of detecting invocation of
control features in the
media channel (e.g.,
through
DTMF tones) and notify
the Media Gateway Controller(s),
which then initiate
the
appropriate
actions.
A
number of protocols have been
defined for communication
between Media Gateway
Controllers
and media gateways. Initial
versions were developed by
multiple camps, some
of
which
merged to create the Media
Gateway Control Protocol (MGCP),
the only one of
the
proprietary
protocols that is documented as an
Informational RFC (RFC
2705). An effort was
launched
to make the two remaining
camps cooperate and develop
a single protocol to be
standardised,
which resulted in work
groups in the ITU-T (rooted in
Study Group 16, Q.14)
and
in
the IETF (Media Gateway
Control, MEGACO WG).The protocol
being jointly developed
is
referred
to as H.248 in the ITU-T and as
MEGACO in the IETF.
One
particular protocol extension currently
discussed in the IETF is the
definition of a protocol
for
communication with an IP telephone at
the customer premises that
fits seamlessly with
the
Media
Gateway Control architecture.
Such a telephone would be a
rather simple entity,
essentially
capable
of transmitting and receiving events
and reacting to them, while
the call services
are
provided
directly by the network
infrastructure.
{
2.2.4
Proprietary signalling
protocols
Today, nearly every vendor that offers VoIP products uses its own VoIP protocol, e.g., Cisco's Skinny or Siemens's CorNet. They were invented by the vendors to be able to provide more specific supplementary services in the Voice over IP world, in order to offer customers all the features they already know from their classic PBX. The enterprise solutions usually feature such proprietary protocols at the core and provide minimalist support for standardised protocols (until now usually H.323) with only basic call functionality.
Giving
detailed information about
those protocols is out of
the scope of this document
and is
usually
difficult to provide because
most protocols are not
publicly available.
{
2.2.5.
Real-time Transport Protocol (RTP) and RTP
Control Protocol (RTCP)
RTP and RTCP are the transport protocols used for IP Telephony media streams. Both of them were defined in RFC1889: the former as a protocol to carry data that has real-time properties, the latter to monitor the quality of service and to convey information about the participants in an on-going session. The services provided by the RTP protocol are:
-
identification of the carried
information (audio and video
codecs);
-
checking packet in-order delivery
and, if necessary, re-ordering the
out-of-sequence blocks;
-
transport of the coder/decoder
synchronisation information;
-
monitoring of the information
delivery.
RTP runs on top of the User Datagram Protocol (UDP), relying on it to manage multiple connections between two entities and to check data integrity (checksum). An important point to stress is that RTP neither provides any means of guaranteeing QoS nor assumes that the underlying network delivers packets in order.
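Because the network may deliver packets out of order, a receiver typically restores the sequence itself. The following is a minimal sketch of that idea, not a definitive implementation: the class name `ReorderBuffer`, the helper `seq_delta` and the `depth` parameter are hypothetical names introduced here for illustration. It assumes 16-bit sequence numbers that wrap around, and it treats a gap as a loss once `depth` later packets are waiting.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide and wrap around


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a, taking 16-bit wrap-around into account."""
    return ((a - b + SEQ_MOD // 2) % SEQ_MOD) - SEQ_MOD // 2


class ReorderBuffer:
    """Hypothetical helper: releases payloads in sequence-number order."""

    def __init__(self, first_seq: int, depth: int = 4):
        self.next_seq = first_seq  # sequence number expected next
        self.depth = depth         # waiting packets that make a gap count as a loss
        self.pending = {}          # sequence number -> payload

    def push(self, seq: int, payload: bytes) -> list:
        """Store one packet; return the (seq, payload) pairs now deliverable."""
        if seq_delta(seq, self.next_seq) < 0 or seq in self.pending:
            return []  # older than the expected number, or a duplicate: drop it
        self.pending[seq] = payload
        out = []
        while self.pending:
            if self.next_seq in self.pending:
                out.append((self.next_seq, self.pending.pop(self.next_seq)))
            elif len(self.pending) < self.depth:
                break  # keep waiting for the missing packet
            # otherwise the missing packet is considered lost and is skipped
            self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return out
```

A real receiver would normally bound the wait by time (a playout delay) rather than by a packet count; the count is used here only to keep the sketch short.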
RTCP uses the same underlying protocols as RTP to periodically send control packets to all session participants. Every RTP channel using port number N has its own RTCP channel on port number N+1. The services provided by RTCP are:
- giving feedback on the quality of the data distribution, which is used to keep control of the active codecs;
- transporting a constant identifier for the RTP source (CNAME), used, for example, to associate the audio and video streams of the same participant;
- advertising the number of session participants, which is used to adjust the rate at which RTCP packets are sent;
- carrying session control information used to identify the session participants.
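The N / N+1 port pairing described above can be sketched with two UDP sockets. This is only an illustration under stated assumptions: the function name `open_rtp_rtcp_pair`, the starting port 40000 and the retry count are hypothetical, and the choice of an even N follows the convention of RFC 1889 rather than anything stated in this section.

```python
import socket


def open_rtp_rtcp_pair(host: str = "127.0.0.1", first_port: int = 40000, tries: int = 50):
    """Bind an RTP socket on an even port N and an RTCP socket on N+1."""
    port = first_port + (first_port % 2)  # round up to the next even port
    for _ in range(tries):
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind((host, port))       # media stream on port N
            rtcp.bind((host, port + 1))  # control stream on port N+1
            return rtp, rtcp
        except OSError:
            # One of the two ports is busy: release both and try the next pair.
            rtp.close()
            rtcp.close()
            port += 2
    raise OSError("no free RTP/RTCP port pair found")
```

In practice the port numbers are negotiated by the signalling protocol; the sketch only shows how the two channels relate to each other.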
The next two subsections describe the RTP header and the different types of packets that RTCP uses.
{
2.2.5.1
RTP header
Figure 2.24 shows the RTP header. The first twelve bytes are present in every RTP packet. The remaining bytes, containing the list of CSRC (Contributing SouRCe) identifiers, are present only when the packet has passed through a mixer (a system which receives two or more RTP flows, combines them and forwards the resulting flow).
Figure 2.24 RTP header
The header fields are described below:
- version (V - 2 bits) contains the RTP protocol version;
- padding (P - 1 bit), if set to 1, indicates that the packet contains one or more additional padding bytes after the data field;
- extension (X - 1 bit), if set to 1, indicates that the fixed header is followed by a header extension;
- CSRC count (CC - 4 bits) contains the number of CSRC identifiers which follow the fixed header;
- marker (M - 1 bit) is a field whose interpretation is left to the application;
- payload type (PT - 7 bits) identifies the format of the data field of the RTP packet and determines its interpretation by the application;
- sequence number (16 bits) is incremented by one for each RTP packet sent and is used by the receiver to detect losses and to restore the original packet sequence;
- RTP timestamp (32 bits) is the sampling time of the first byte of the RTP data, used for synchronisation and jitter calculation;
- SSRC ID (32 bits) identifies the synchronisation source and is chosen randomly within an RTP session;
- CSRC ID list (from 0 to 15 items of 32 bits each) is an optional field identifying the sources which contribute to the data in the packet. The number of CSRC IDs is given in the CSRC count field.
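The field layout listed above can be illustrated with a short parser. This is a sketch under the assumption that the fields appear in the order given (two bytes of flags, then sequence number, timestamp and SSRC in network byte order, then the CSRC list); the names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions and padding are not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header and the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                 # CSRC count: low 4 bits of the first byte
    end = 12 + 4 * cc              # each CSRC identifier is 32 bits long
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,           # V: top 2 bits
        padding=bool(b0 & 0x20),   # P: 1 bit
        extension=bool(b0 & 0x10), # X: 1 bit
        csrc_count=cc,
        marker=bool(b1 & 0x80),    # M: top bit of the second byte
        payload_type=b1 & 0x7F,    # PT: low 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

The payload starts at offset 12 + 4 * CC when the X bit is not set.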
{
2.2.5.2
RTCP packet-types and
format
In order to transport the session control information, RTCP defines a number of packet types:
- SR, Sender Report, to carry the information sent by the active transmitters, telling the other participants what they should have received (number of bytes, number of packets, etc.);
- RR, Receiver Report, to carry the reception statistics of the session participants which are not active transmitters;
- SDES, Source DEScription, to carry the source description (including the CNAME identifier);
- BYE, to notify the intention of leaving the session;
- APP, to carry application-specific functions, intended for experimental use by new applications.
Every RTCP packet begins with a fixed part similar to that of an RTP packet, followed by structured elements of variable length. Several RTCP packets may be concatenated to form a compound packet. Moreover, in order to maximise the resolution of the statistics, SR and RR packets are to be sent more often than the other packet types.
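Splitting a compound packet into its individual RTCP packets can be sketched as follows. The details used here are taken from RFC 1889 rather than from the text above, and should be read as assumptions of the sketch: a 4-byte common header (version, padding and a 5-bit count, then the packet type, then a 16-bit length expressed in 32-bit words minus one) and the numeric type codes 200 to 204. The names `RTCP_TYPES` and `split_compound` are hypothetical.

```python
import struct

# Packet type codes as assigned in RFC 1889 (an assumption of this sketch).
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def split_compound(data: bytes) -> list:
    """Return (type name, count field, body bytes) for each RTCP packet."""
    packets = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated RTCP header")
        b0, pt, words = struct.unpack_from("!BBH", data, offset)
        if b0 >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        size = 4 * (words + 1)  # length field counts 32-bit words minus one
        if offset + size > len(data):
            raise ValueError("truncated RTCP packet")
        name = RTCP_TYPES.get(pt, str(pt))  # unknown types keep their number
        packets.append((name, b0 & 0x1F, data[offset + 4:offset + size]))
        offset += size
    return packets
```

Each returned body still has to be decoded according to its own packet type; the sketch stops at the fixed part that all types share.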