Ethernet Receiver


This project is an attempt to capture 10BASE-T (10 Mbps) Ethernet packets using phase-locked loop (PLL) clock recovery. This is not the best way to do it! An FPGA is the way to go. This is really an exercise in PLL design. The challenge is to lock in less than 64 cycles (6.4 µs), which is the length of the 10BASE-T preamble. This demands a fast response, i.e. wide loop bandwidth. Also, at a comparison frequency of 10 MHz, gate propagation delays are significant and switching speed is critical. This is one PLL application, however, where phase noise is not a concern. Although it works, the design has some shortcomings, as you will read.

Transceiver

This is my interface to twisted pair. Even though I'm not transmitting packets, the hub (or whatever) at the other end will not talk to me unless I periodically send normal link pulses (NLPs).

T2 is wired to invert the negative-going pulses produced by U8/C. I'm using an SI-10021 RJ45 connector with integral magnetics purchased from RS Components. The wiring shown is for connection to a hub via a "straight" cable. Either swap TD/RD, or use a "crossed" cable to a PC.

Manchester encoding

Clock and data are combined by transmitting logic '0' as 10 and logic '1' as 01. There's an edge in the middle of every bit. To allow receivers to synchronise, packets begin with a 64 bit preamble which is the sequence 101010... ending in 11:

I'm using quadrature clocks. The rising edge of the In-Phase Clock (CLK I) falls in the centre of the data bit. Lock must be acquired during the preamble. The Quadrature Clock (CLK Q) can then be used to gate-out unwanted edges between data bits.
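
The encoding rule in this section can be sketched in a few lines of C++. This is only an illustration under stated assumptions: line symbols are modelled as a string of '0'/'1' half-bit characters, and the function names (manchester_encode, manchester_decode, preamble_bits) are hypothetical, not part of the receiver.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Encode data bits as half-bit line symbols using the rule from the text:
// logic 0 -> "10", logic 1 -> "01".
std::string manchester_encode(const std::vector<int>& bits)
{
    std::string symbols;
    symbols.reserve(bits.size() * 2);
    for (std::size_t i = 0; i < bits.size(); ++i)
        symbols += (bits[i] ? "01" : "10");
    return symbols;
}

// Decode half-bit symbols back into bits. Returns false if the input has an
// odd length or contains a pair with no mid-bit edge ("00" or "11").
bool manchester_decode(const std::string& symbols, std::vector<int>& bits)
{
    bits.clear();
    if (symbols.size() % 2 != 0)
        return false;
    for (std::size_t i = 0; i < symbols.size(); i += 2) {
        const std::string pair = symbols.substr(i, 2);
        if (pair == "01")      bits.push_back(1);
        else if (pair == "10") bits.push_back(0);
        else                   return false;
    }
    return true;
}

// The 64-bit preamble: alternating 1,0,1,0,... with the final two bits 1,1.
std::vector<int> preamble_bits()
{
    std::vector<int> bits(64);
    for (int i = 0; i < 64; ++i)
        bits[i] = (i % 2 == 0) ? 1 : 0;
    bits[63] = 1;
    return bits;
}
```

Encoding the preamble gives 128 half-bit symbols, and decoding them returns the original 64 bits.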

VCO

My original plan was to use a voltage controlled crystal oscillator (VCXO). Unfortunately, as I discovered when I tried to design the loop filter, it's impossible to achieve wide loop bandwidth using a VCXO. kVCO is too low. The answer was to use an LC oscillator. My circuit is based on one used by RACAL in the RA1772 communications receiver:

L1, C6 and C7 form a capacitively-tapped tuned circuit resonant at 20 MHz. The output at Q7 collector is almost square - ideal for logic.

L2 is perhaps a little low. I tried a 10k resistor here but the loop response was too slow because it formed a low-pass filter with the input capacity of C7/C8. L2 must be small for the same reason. Unfortunately, it probably loads the tuned circuit reducing the Q somewhat.

The measured transfer characteristic of the VCO is shown below:

The slope is 320 kHz per volt after the divide-by-2, so kVCO = 2 × 10^6 radians per volt-second. The linear region is not centred on 20 MHz!

The '35Z varicap (1-8V / 535-22pF) is intended for AM Radio tuning! I tried a BB809 first but the response was too slow. Lock-in time dropped dramatically when I touched the '35Z across it. kVCO was greatly increased because the '35Z has a lot more pF per volt.

The duty cycle of the 20 MHz signal is not 50/50. Because of this, and logic propagation delays, the clocks are not quite in quadrature. The oscillator could run at 40 MHz with an additional divide-by-2 preceding the quadrature generator.

PLL

A charge pump using complementary PNP/NPN transistors follows a standard phase-frequency detector based on dual D-type flip-flops. U9/C gates-out the unwanted EDGE pulses at the phase comparator reference input:

The charge pump has a current output of approximately ±3mA giving the overall phase detector a theoretical gain of kPD = 0.5 mA per radian. The exact value is hard to predict. Unfortunately, switching speed is a problem with this circuit. I added capacitor C15 to speed-up Q3 turn-on but it's still slow at turning off.
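
The two loop gains quoted in this article follow from simple conversions. The sketch below assumes the usual idealisation of a phase-frequency detector plus charge pump, where the average output current is the pump current times the phase error divided by 2π. The function names are hypothetical.

```cpp
#include <cmath>

const double kTwoPi = 2.0 * std::acos(-1.0);

// VCO gain in radians per volt-second, from a measured tuning slope in
// hertz per volt.
double vco_gain(double slope_hz_per_volt)
{
    return kTwoPi * slope_hz_per_volt;
}

// Phase detector gain in amps per radian for a charge pump of
// +/- pump_current_amps (idealised: average current = I * phase / 2*pi).
double phase_detector_gain(double pump_current_amps)
{
    return pump_current_amps / kTwoPi;
}
```

With the measured 320 kHz per volt, vco_gain(320e3) comes to about 2.01 × 10^6, and phase_detector_gain(3e-3) comes to about 0.48 mA per radian, in line with the rounded figures in the text.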

Another problem is imbalance between source and sink currents. The loop automatically corrects for this by running at a small constant phase error resulting in zero net charge output. Although small, the error was too much for the original lock detector. The voltage across C16 did not quite cross the lower logic threshold until I reduced R16 to 3k9.

The duty cycle at the output of U8/B is a measure of the quality of lock. Ideally, the output should be narrow spikes. The reset pulses from U9/D also look a bit wide. Gate propagation delays are significant at the 10 MHz comparison frequency. Substituting a 74F00 device improved matters slightly.

Q6 pulls the VCO down to fMIN between packets (because there are no EDGE pulses). As the VCO control voltage falls, Q6 base/collector junction becomes forward biased and it saturates. Q5/Q6 base biasing needs to be pretty stiff because it must source 3mA without droop.

I thought it wise to buffer the loop filter with a JFET source-follower, because the VCO control line needs to be driven from a fairly low impedance. The source resistor was returned to a negative supply to keep ID fairly constant and the junction reverse-biased over a wide range of VCO control voltages. However, the circuit seems to work fine if it's simply returned to ground.

Here's a 'scope screen shot of the capture transient. The horizontal scale is 1 µs per division:

The lower trace is received data. Part of the preamble and broadcast MAC address (FF-FF-FF-FF-FF-FF) are visible. The loop gets a little kick at the boundary. These kicks occur throughout the packet whenever the polarity of the synchronising edge changes. Here's a close up:

Note the f/2 square wave on the VCO control line during the preamble. It looks like the system has different propagation delays for hi-lo and lo-hi transitions. During the preamble, the loop sees alternate late and early edges. It then levels-off in the MAC address where it always locks on the same transition.

Rising DATA edges look early on the 'scope. Putting C=100pF, V=0.2V and I=3mA into CV = It gives t=7ns which is about what I see. Replacing the 74HC86 with a 74F86 didn't help and I wouldn't expect it to. According to the F86 datasheet, tPLH and tPHL differ by only 0.7ns. The culprit would appear to be either the transceiver circuit or my hub.

Loop filter design

A bit of maths is required to ensure stability. The loop will oscillate if the phase of the open loop gain is 180 degrees at the frequency where its magnitude passes through unity. I've written a little VB/C++ thingy to do Bode plots and step response. The object is to get the step response to settle down as quickly as possible whilst avoiding instability. With a 3rd order loop, you get a pole and a zero to play with:

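Transfer function:

$$F(s) = \frac{1 + sR(C_1 + C_2)}{sC_1\,(1 + sRC_2)}$$

Open loop gain:

$$G(s) = k_{PD} \cdot \frac{k_{VCO}}{s} \cdot F(s)$$

Closed loop gain:

$$H(s) = \frac{G(s)}{1 + G(s)}$$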

Green = |G|, Red = |H|, Blue = ∠G, Black = Step (1 µs/div)

Using a GUI, you get an intuitive feel for what the components do. Assuming C1 ≫ C2: C1 moves the zero; C2 moves the pole; and R moves both in tandem. ∠G is 180 degrees except around the point where |G| crosses unity. The pole and zero must straddle this point. The zero reduces the slope from −40 dB to −20 dB per decade. Once safely through unity, the pole kicks in, taking the slope back to −40 dB per decade, which maximises reference attenuation.

Behind the GUI, I'm using the MSVC complex class like this:


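#include <complex>
using namespace std;

// Loop filter transfer function (charge pump current in, control voltage out).
// R, C1, C2, Kpd, Kvco and TWO_PI are defined elsewhere in the program.
complex<double> Filter(complex<double> jw)
{
    return (jw*R*(C1+C2) + 1.0) / (jw*C1) / (jw*R*C2 + 1.0);
}

// Open loop gain G at frequency f (Hz).
complex<double> OpenLoop(double f)
{
    complex<double> jw(0, TWO_PI * f);
    return Kpd * Kvco/jw * Filter(jw);
}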

    ...

    complex<double> g = OpenLoop(f);
    complex<double> h = g / (1.0+g);
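
The stability criterion can also be checked numerically rather than by eye. The following is a minimal sketch, not the author's GUI code: it reuses the same open loop expression, takes its component values from the SCILAB listing elsewhere in this article, and finds the unity-gain crossover by bisection. The function names and the 1 kHz to 10 MHz search range are assumptions.

```cpp
#include <cmath>
#include <complex>

typedef std::complex<double> cplx;

// Values copied from the SCILAB listing in this article.
const double R    = 1800.0;    // ohms
const double C1   = 680e-12;   // farads
const double C2   = 100e-12;   // farads
const double Kvco = 2e6;       // rad/(V*s)
const double Kpd  = 0.5e-3;    // A/rad
const double PI   = std::acos(-1.0);

// Open loop gain G at frequency f (Hz).
cplx open_loop(double f)
{
    const cplx jw(0.0, 2.0 * PI * f);
    const cplx filter = (jw*R*(C1+C2) + 1.0) / (jw*C1) / (jw*R*C2 + 1.0);
    return Kpd * Kvco / jw * filter;
}

// Corner frequencies of the loop filter zero and pole, in Hz.
double zero_hz() { return 1.0 / (2.0 * PI * R * (C1 + C2)); }
double pole_hz() { return 1.0 / (2.0 * PI * R * C2); }

// Frequency where |G| = 1, found by bisection on a log scale.
// |G| falls monotonically with frequency for this filter.
double crossover_hz()
{
    double lo = 1e3, hi = 1e7;  // assumed bracket: |G| > 1 at lo, < 1 at hi
    for (int i = 0; i < 60; ++i) {
        const double mid = std::sqrt(lo * hi);
        if (std::abs(open_loop(mid)) > 1.0) lo = mid; else hi = mid;
    }
    return std::sqrt(lo * hi);
}

// Phase margin in degrees: how far the phase of G is from -180 at crossover.
double phase_margin_deg()
{
    return 180.0 + std::arg(open_loop(crossover_hz())) * 180.0 / PI;
}
```

With these values the zero sits at about 113 kHz and the pole at about 884 kHz. The crossover falls between them, giving a phase margin of about 50 degrees.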

I'm calculating step response by summing weighted (1, 1/3, 1/5, 1/7, 1/9 ...) odd-order harmonics of a low frequency fundamental, i.e. brute force and ignorance. A more sophisticated method is to use SCILAB.
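
One way to read the harmonic-summation method is as follows. A unit square wave is the sum of a DC term and odd harmonics weighted 1/n, and each harmonic is scaled and phase-shifted by the closed loop gain at its own frequency. This sketch is my interpretation, not the author's code. The 10 kHz fundamental, the harmonic limit and the function names are assumptions, and the component values are those of the SCILAB listing.

```cpp
#include <cmath>
#include <complex>

typedef std::complex<double> cplx;

const double PI   = std::acos(-1.0);
const double R    = 1800.0;    // ohms
const double C1   = 680e-12;   // farads
const double C2   = 100e-12;   // farads
const double Kvco = 2e6;       // rad/(V*s)
const double Kpd  = 0.5e-3;    // A/rad

// Closed loop gain H = G / (1 + G) at frequency f (Hz, f > 0).
cplx closed_loop(double f)
{
    const cplx jw(0.0, 2.0 * PI * f);
    const cplx filter = (jw*R*(C1+C2) + 1.0) / (jw*C1) / (jw*R*C2 + 1.0);
    const cplx g = Kpd * Kvco / jw * filter;
    return g / (1.0 + g);
}

// Loop output at time t (seconds) for a unit square wave input of
// fundamental f0 that steps from 0 to 1 at t = 0. Valid as a step response
// for 0 <= t < 1/(2*f0), provided the loop settles well within half a period.
double step_response(double t, double f0 = 10e3, int max_harmonic = 1999)
{
    double y = 0.5;  // DC term of the square wave; H tends to 1 at DC
    for (int n = 1; n <= max_harmonic; n += 2) {
        const cplx h = closed_loop(n * f0);
        const cplx rot = std::polar(1.0, 2.0 * PI * n * f0 * t);
        y += (2.0 / PI) * (h * rot).imag() / n;  // harmonic weighted by 1/n
    }
    return y;
}
```

Evaluating step_response over t = 0 to 10 µs in 0.1 µs steps covers the same time span as the SCILAB plot.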

You can plot the step response with SCILAB like this:


c1 = 680e-12;
c2 = 100e-12;
r = 1800;
kvco = 2e6;
kpd = 0.5e-3;
s = poly(0,'s');
f = 1/s/c1 * (1+s*r*(c1+c2)) / (1+s*r*c2);
g = kpd * f * kvco/s;
h = g/.(1);
h = tf2ss(h, 1e-40);
t = 0:.1e-6:1e-5;
xbasc(0);xset("window",0);xselect();
plot2d([t',t'],[(csim('step',t,h))',ones(t')]);   
  

You can't say h=g/(1+g) in SCILAB. You must use the /. (slashdot) system feedback operator to avoid a simplification problem. See also slash. The second tf2ss() parameter sets an optional "tolerance" which I found necessary to avoid errors with certain transfer functions.

By inserting

  f = 1 / (1/f + s*10e-12); 
you can incorporate 10pF of parallel stray capacitance in the simulation. It doesn't make much difference.

The step response is only a guide to lock-in time. It's difficult to predict the length of the capture transient. Every one is different! It depends on the initial phase error which is random. The loop may even initially go the wrong way.

SCILAB can also do Bode plots of course:

...
g = kpd * f * kvco/s;
g = syslin('c', g);
xbasc(0);
bode(g, 1e3, 1e7, .01);     

Serial to parallel conversion

As a concession to programmable logic, I'm using a 16V8 to detect the end of the preamble and count the bits in:

The GAL was programmed using Atmel WinCUPL:


pin 2 = D7;
pin 3 = D6;
pin 4 = Lock;

pin [12..14] = [Bit0..2];

pin 17 = !Frame;
pin 18 = !Strobe;
pin 19 = !Gate;

field Bit = [Bit0..2];

Start = D6 & D7;

Frame.d = Lock & ( Start # Frame);

sequence Bit
{
	present 'd'0 if Frame next 'd'1;
	present 'd'1 if Frame next 'd'2;
	present 'd'2 if Frame next 'd'3;
	present 'd'3 if Frame next 'd'4;
	present 'd'4 if Frame next 'd'5;
	present 'd'5 if Frame next 'd'6;
	present 'd'6 if Frame next 'd'7 out Gate.d;  
	present 'd'7 if Frame next 'd'0;
}

Strobe.d = Gate;

WinCUPL expands these equations into the following product terms:

    Bit0.d  =>
        !Bit0 & Frame

    Bit1.d  =>
        !Bit0 & Bit1 & Frame
      # Bit0 & !Bit1 & Frame

    Bit2.d  =>
        !Bit0 & Bit2 & Frame
      # !Bit1 & Bit2 & Frame
      # Bit0 & Bit1 & !Bit2 & Frame

    Frame.d  =>
        Frame & Lock
      # D6 & D7 & Lock

    Gate.d  =>
        !Bit0 & Bit1 & Bit2 & Frame

    Strobe.d  =>
        Gate
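
The expanded product terms can be stepped through in software to see when the strobe appears. The sketch below is an assumption-laden model: it tracks logical signal values only (the active-low pin declarations are ignored), treats every output as a D register clocked by a common clock, and asserts D6 and D7 for a single clock. GalState, clock_gal and strobe_clocks are hypothetical names.

```cpp
#include <vector>

struct GalState {
    bool bit0, bit1, bit2, frame, gate, strobe;
};

// One clock edge: each register loads its D input, computed from the
// register values before the edge (equations copied from the listing).
GalState clock_gal(const GalState& q, bool d6, bool d7, bool lock)
{
    GalState n;
    n.bit0   = !q.bit0 && q.frame;
    n.bit1   = (!q.bit0 && q.bit1 && q.frame)
            || (q.bit0 && !q.bit1 && q.frame);
    n.bit2   = (!q.bit0 && q.bit2 && q.frame)
            || (!q.bit1 && q.bit2 && q.frame)
            || (q.bit0 && q.bit1 && !q.bit2 && q.frame);
    n.frame  = (q.frame && lock) || (d6 && d7 && lock);
    n.gate   = !q.bit0 && q.bit1 && q.bit2 && q.frame;
    n.strobe = q.gate;
    return n;
}

// Clock numbers (counting from 1) after which Strobe is set, with Lock held
// high and the start condition (D6 & D7) present on the first clock only.
std::vector<int> strobe_clocks(int n_clocks)
{
    GalState q = {false, false, false, false, false, false};
    std::vector<int> hits;
    for (int k = 1; k <= n_clocks; ++k) {
        const bool start = (k == 1);
        q = clock_gal(q, start, start, true);
        if (q.strobe) hits.push_back(k);
    }
    return hits;
}
```

In this model Frame is set on clock 1, the first strobe appears on clock 9, and further strobes follow every 8 clocks, one per received byte.
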

Timing diagram

Parallel data is valid on the rising edge of STROBE. The spare pin nodes could be used to lengthen or further delay the strobe pulse if desired. I'm using a Thurlby LA160 logic analyser to inspect the packets:

Sending test packets

I'm using WinPcap to generate test packets. By sending continuous broadcast packets in a loop, I can view capture transients on the 'scope. Before the lock detector was working, and when the acquisition time was longer than the preamble, I sent long packets filled with the data byte 0x55 to simulate an extended preamble.

Packet structure

Field                     Length (bytes)
Preamble                  8
Destination MAC address   6
Source MAC address        6
Type                      2
Data                      46 - 1500
Checksum                  4

Possible improvements

I've tried substituting faster 74F devices for HCMOS with some success. Quality of lock as measured by the duty cycle at the output of U8/B improved slightly. Only U2, 7, 9 and 10 need be replaced. The rest can stay as HCMOS. Strictly speaking, 74F VOH is too low to drive HCMOS directly. A 74HCT00 can be deployed at U8 to interface between them. CLK Q will need level conversion through U8.

I might try a faster charge pump using diode switching:

This is based on a circuit used by RACAL in the RA1792 communications receiver. It's faster because the (bipolar) current sources are ON all the time. And it should be easier to achieve source/sink balance using this circuit. Unfortunately, the output is constrained between 0 and 5 volts.

Interesting links

www.fpga4fun.com/10BASE-T.html
Manchester Decoder in 3 CLBs