Echo cannot originate in a VoIP network. But the delay caused by codecs and buffering quickly makes even the slightest received echo very annoying. Echo is generated by digital <-> analog conversions, either in the PSTN or "at the other end" (acoustic overlap between ear and mouth pieces).
If you hear echo then the source of the echo is either the far end or somewhere in the PSTN network.
If the far end hears echo then you are the one generating the echo.
There are a couple of mechanisms to prevent echo: ERL (Echo Return Loss) and ERLE (Echo Return Loss Enhancement). ERLE is often called the echo canceller.
ERL adjusts the power levels of the receive and transmit audio streams. It should be set so that the echo is as quiet as possible without losing the ability to communicate with the far end. You have 2 possibilities:
1) Lower the power level of what you send out – Risk: the far end cannot hear you
2) Lower the power level of what you receive – Risk: you cannot hear the far end
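The trade-off between the two options can be sketched with a little dB arithmetic. This is only an illustration: it assumes the level adjustments sit in the gateway between the VoIP side and the echo source, so that the echo crosses both the send and the receive adjustment while speech crosses only one of them. The function names and the example levels are made up for the sketch.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def attenuate(samples, loss_db):
    """Scale audio samples down by loss_db decibels."""
    gain = db_to_gain(-loss_db)
    return [s * gain for s in samples]


def speech_level_after_pad(speech_db, loss_db):
    """Speech travels in one direction only, so it crosses one adjustment."""
    return speech_db - loss_db


def echo_level_after_pads(echo_db, tx_loss_db, rx_loss_db):
    """Assumption: the echo goes out through the send adjustment and comes
    back through the receive adjustment, so it is attenuated by both."""
    return echo_db - tx_loss_db - rx_loss_db


if __name__ == "__main__":
    # Illustrative levels only, not recommended settings.
    print(speech_level_after_pad(-15.0, 3.0))      # far end hears you 3 dB lower
    print(echo_level_after_pads(-30.0, 3.0, 3.0))  # echo ends up 6 dB lower
```

Under that assumption, every dB added in either direction costs one dB of speech in that direction but takes one dB off the echo, which is why the values have to be tuned rather than simply maximised.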
If the echo arrives during doubletalk (both ends talking at the same time), ERLE cannot distinguish real talk from echo, and ERLE stops working.
If this happens you should adjust ERL values.
So ERLE stops working with doubletalk. But there is another, more annoying way to stop ERLE from working: delay outside the gateways of the VoIP network, whether that is the gateway towards the PSTN or the analog POTS.
Every POTS should have ERLE covering at least 8 ms. This is normally sufficient to remove echo generated by the overlap between ear and mouth pieces or in the short cable connecting the phone to the POTS. Of course, IP phones should also have at least 8 ms of ERLE.
The gateway towards the PSTN network should have sufficient ERLE to compensate for the delay in the PSTN (about 5 ms per 1000 km of fiber, plus 2-4 ms per interconnection such as ADMs and so on). The standard for ERLE is G.164 (strictly speaking, G.164 covers echo suppressors; echo cancellers are covered by G.165 and G.168), which specifies at least 128 ms. But 128 ms would make the price of the gateway astronomical for us non-telco operators, due to the complexity of ERLE. Typical values are therefore 16-64 ms, most often 16-32 ms.
An ERLE with 16 ms of coverage constantly buffers 16 ms of outgoing talk, converts it into an inverse pattern (a ^ function, i.e. a "hat" function) and compares it to the incoming voice stream. If it finds a matching pattern in the incoming voice stream the inverse pattern is applied, if not it is discarded. If the delay is longer than the ERLE buffer then ERLE will never work, and the echo cannot be eliminated. You can only minimize the annoyance using ERL adjustments.
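To get a feeling for whether a given ERLE buffer is long enough, the PSTN delay can be estimated roughly as below. This is a back-of-the-envelope sketch, not a dimensioning tool. It assumes the per-distance and per-interconnection figures are one-way delays, that the echo has to travel out and back (hence the factor 2), and that audio is sampled at 8000 Hz as in ordinary narrowband telephony. All function names are made up, and the delay figures are passed in as parameters so you can plug in your own.

```python
def pstn_round_trip_ms(distance_km, interconnections,
                       ms_per_1000km, ms_per_interconnection):
    """Estimate the round-trip delay between the gateway and the echo source.

    Assumption: the per-distance and per-interconnection figures are one-way,
    and the echo travels the same path out and back.
    """
    one_way = (distance_km / 1000 * ms_per_1000km
               + interconnections * ms_per_interconnection)
    return 2 * one_way


def tail_covers(tail_ms, round_trip_ms):
    """ERLE can only match echo that returns within its buffer."""
    return round_trip_ms <= tail_ms


def tail_samples(tail_ms, sample_rate_hz=8000):
    """Number of samples ERLE has to buffer for a given coverage."""
    return int(tail_ms * sample_rate_hz / 1000)


if __name__ == "__main__":
    # Illustrative path: 1000 km of fiber and 3 interconnections,
    # with assumed one-way figures of 5 ms per 1000 km and 3 ms per hop.
    rtt = pstn_round_trip_ms(1000, 3, 5.0, 3.0)
    for tail in (16, 32, 64):
        print(tail, "ms tail,", tail_samples(tail), "samples, covers echo:",
              tail_covers(tail, rtt))
```

With these example figures the echo comes back after 28 ms, so a 16 ms buffer would miss it while a 32 ms buffer would catch it.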
ERLE is always deployed in the public PSTN network where it interconnects with international destinations and with mobile networks (PLMNs), but not always between PSTN operators. That's because all other telephony networks and connections can tolerate a fairly large amount of delay in the public network: you can have echo, but it is not audible to the human ear.
VoIP networks have to convert voice streams to data streams and back again, so the round-trip delay can often be more than 160 ms, which is well into the audible range. (I believe any echo with a gap or round-trip delay above 32 ms is audible, at first like talking in a large room, but the larger the round-trip delay, the more distinct the echo becomes.)
This also means that lag in the IP network is not a probable contributor to the echo. The processing around the DSPs, with round-trip delays over 160 ms, is the probable contributor. Reducing buffering in the DSP function is a possibility, but it comes at the cost of bandwidth and normally does not help much.
The only way to eliminate echo is to adjust ERL and to have sufficient ERLE in the gateways. ERL reduces the power level of the echo so that ERLE can kick in and remove the echo completely (actually, it reduces the power level of the echo to very low, inaudible values). The combined echo loss from ERL and ERLE is ACOM. I can't remember the threshold power level for audible sound, but the echo that remains after ACOM should be below this value if ERLE works.
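The relation between ERL, ERLE and ACOM is plain addition in dB, which can be sketched as follows. The function names are made up, the levels in the example are illustrative, and the audibility threshold is deliberately a parameter with a placeholder value, since the real figure is not given here.

```python
def acom_db(erl_db, erle_db):
    """Combined echo loss: ERL and ERLE add up when expressed in dB."""
    return erl_db + erle_db


def residual_echo_db(talker_db, erl_db, erle_db):
    """Level of the echo that is left after the combined loss."""
    return talker_db - acom_db(erl_db, erle_db)


def echo_inaudible(residual_db, threshold_db):
    """threshold_db must come from your own requirements or measurements."""
    return residual_db < threshold_db


if __name__ == "__main__":
    # Illustrative numbers only; PLACEHOLDER_THRESHOLD is not a real limit.
    PLACEHOLDER_THRESHOLD = -50.0
    residual = residual_echo_db(talker_db=-15.0, erl_db=12.0, erle_db=25.0)
    print("ACOM:", acom_db(12.0, 25.0), "dB")
    print("residual echo:", residual, "dB")
    print("inaudible:", echo_inaudible(residual, PLACEHOLDER_THRESHOLD))
```

If ERLE drops out, for example during doubletalk or because the delay exceeds its buffer, the ERLE term goes to zero and only ERL is left to hold the echo down.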
You can only completely remove echo by removing the source of the Echo.