
OpenLTE B200 / Clock Drift?

hyerisf
2017-04-25
2017-08-14
  • hyerisf

    hyerisf - 2017-04-25

    Hi,

    I'm trying to get OpenLTE working with a USRP B200. It works only intermittently: sometimes I can see the network on a UE, but most of the time I see nothing. Settings-wise, there's nothing controversial:

    band 5 (or whatever)
    bandwidth 5
    tx_gain 85
    rx_gain 35
    clock_source internal

    I believe the host is powerful enough (Thinkpad x240AL), and there are no O/L/U complaints from UHD. The "Asking for clock rate" and "Actually got clock rate" values match exactly.

    Using a bladeRF to test also produces the same result (generally, I can't see anything on UE).

    The only thing I can think of is clock issues. Do I need a dedicated clock source / GPSDO to run OpenLTE? I've done a bit of background reading, and it appears that OpenLTE works without a GPSDO or any external clock source (e.g. https://lh3.googleusercontent.com/proxy/0k0C25c8OzATMW8sPURKLM3EjsxjKCAuliI9f7xrN4z4axGCHTWyeRdPA63Rt-0dA1O0PJ574iaXC9BAtD7BjAZAj3sYmTUv1w0ye3bxJexIaGngo2CSnRW7UNtW6GgmKWlQ-pR3shQ-oDVLp9c=w530-h347-p : nothing connected externally, and nothing visible on-board as far as I can tell).

    Am I missing something? Is there some other way to debug (and correct?) clock inaccuracies, if that's the case?

    Thanks!

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-13

    Hello Friend!

    I'm having the exact same issue. I'm using a Lenovo X250 @ 2.6GHz with 2 cores, and I think the issues we're encountering are due to our laptops' processing power.

    I think the X230 @ 2.9GHz may be better?

    Did you get this resolved?

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-13

    The only other thing I can think of is that the B200 doesn't actually work, and listing it as supported is a troll?

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-13

    The only other thing I can think of is the USB Bus? Being terrible? Maybe? This issue drives me crazy!

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    If I set the bandwidth to 1.4MHz I can actually see the network; however, I'm unable to attach to it (correct Ki, IMSI and IMEI added).

    I'm almost certain at this point that the issue is related to the clock and the need for a GPSDO.

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    The BladeRF comes very well calibrated and has no need for external clock. I find however the TCXO needs a minute or so after cold power up before the clock is stable.

    You should confirm your device's calibration against a known GSM network using the kal tool.

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Cheers for the reply Jeremy. I've confirmed my clock is off by -2kHz on my B200 and -67kHz on my BladeRF; I'm currently running kal on the BladeRF to get a better DAC trim.

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    That 67kHz is a suspicious number.
    It's about 1/4 of the GSM symbol rate (270.833 ksym/s), which suggests the kal tool may not be working properly. Assuming a GSM1800 network, 67kHz is over 35 ppm, which does not sound right at all.
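    The arithmetic behind that objection is simple. A frequency error in ppm is the offset divided by the carrier frequency, times 10^6. The sketch below checks both numbers. It assumes a carrier of about 1.8 GHz for GSM1800; the post does not say which band kal actually measured. The GSM symbol rate of 1625/6 kHz is the standard value. The function names are illustrative.

```c
#include <assert.h>

/* GSM symbol rate: 1625/6 kHz, roughly 270.833 ksym/s. */
static const double GSM_SYMBOL_RATE_HZ = 1625000.0 / 6.0;

/* Frequency error in parts per million.
   carrier_hz is the GSM carrier kal measured against; callers must supply it
   (e.g. ~1.8e9 for GSM1800, which is only an assumption in this thread). */
double offset_ppm(double offset_hz, double carrier_hz)
{
    assert(carrier_hz > 0.0);
    return offset_hz / carrier_hz * 1e6;
}

/* Magnitude of an offset relative to a fraction of the GSM symbol rate.
   A result very close to 1.0 hints that the "offset" may be an artifact tied
   to the symbol rate rather than a genuine oscillator error. */
double ratio_to_symbol_rate_fraction(double offset_hz, double fraction)
{
    double magnitude = offset_hz < 0.0 ? -offset_hz : offset_hz;
    assert(fraction > 0.0);
    return magnitude / (GSM_SYMBOL_RATE_HZ * fraction);
}
```

    With these inputs, `offset_ppm(67e3, 1.8e9)` comes out at about 37.2 ppm, and `ratio_to_symbol_rate_fraction(-67e3, 0.25)` is about 0.99. For comparison, a -2kHz offset measured near an assumed 900 MHz carrier would be only about -2.2 ppm.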

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    I would step back a moment.
    There's no convincing evidence that this is a clock problem.
    These devices are very well calibrated out of the box, so if you are going to play around with the trim DAC, at least save your old value.

    Now, given what you have told me, that you do see the network at 1.4MHz, and at no other bandwidths, this suggests it is more likely to be the system struggling to keep up. What does the debug console say? Do you get lots of messages about skipping subframes?

    One recommendation I can make is to disable hyperthreading. I have turned on the PMC counters in Windows and can confirm that if the OS schedules code on the other hyperthread of the core the radio thread is running on, the FFT processing can take two or even three times as long, causing stuttering.

     

    Last edit: Jeremy Quirke 2017-08-14
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Indeed, they're very well calibrated out of the box; however, I purchased this BladeRF over a year ago and it's been sitting in a box, so I believe it may be a clock issue.

    I've followed OpenAirInterface's guide on setting up the Ubuntu machine (disabling P-States, disabling Intel power management, etc.), but there seems to be no option to disable Hyperthreading?

    I'm also using the lowlatency kernel. 2 cores @ 2.6GHz should be more than enough, no?

    Thanks for your response!

     

    Last edit: Dylanger Daly 2017-08-14
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    One core runs the dedicated radio thread so everything else will go on the first core.

    Should be enough.

    Again, are you receiving warnings when you connect to the debug port about skipping frames?

    Be cautious on your platform that the code below isn't affinitizing the other message processing threads to run on the hyperthreaded sibling of the radio core (I'm not familiar with how Linux reports _SC_NPROCESSORS_ONLN)

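    /* Remove the highest-numbered online CPU (the one reserved for the radio thread) from the msgq rx thread's affinity mask. */
    pthread_getaffinity_np(msgq->rx_thread, sizeof(af_mask), &af_mask);
    CPU_CLR(sysconf(_SC_NPROCESSORS_ONLN)-1, &af_mask);
    pthread_setaffinity_np(msgq->rx_thread, sizeof(af_mask), &af_mask);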
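    On Linux, the question of which logical CPUs share a core can be answered directly. The kernel publishes each CPU's hyperthread siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, as a list such as "3,7" or "2-3". The snippet above clears only the highest-numbered CPU, so whether that CPU's sibling is also excluded depends on how the CPUs are numbered. The following is a minimal sketch, not OpenLTE code; the helper names `parse_cpu_list` and `non_radio_mask` are hypothetical. It parses that list format and builds a mask that excludes the radio CPU and all of its siblings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a Linux CPU list such as "3,7" or "0-1" (the format of
   /sys/devices/system/cpu/cpuN/topology/thread_siblings_list) into *out.
   Returns 0 on success, -1 if the list is malformed. */
int parse_cpu_list(const char *list, cpu_set_t *out)
{
    const char *p = list;
    CPU_ZERO(out);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last;
        if (end == p || first < 0)
            return -1;
        last = first;
        if (*end == '-') {                 /* range such as "2-3" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
        }
        for (long cpu = first; cpu <= last && cpu < CPU_SETSIZE; cpu++)
            CPU_SET((int)cpu, out);
        p = end;
        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return 0;
}

/* Start from the CPUs the calling thread may use, then remove the radio CPU
   together with its hyperthread siblings. Returns 0 on success, -1 on error. */
int non_radio_mask(int radio_cpu, cpu_set_t *out)
{
    char path[128];
    char buf[256];
    cpu_set_t siblings;
    FILE *f;

    if (radio_cpu < 0 || radio_cpu >= CPU_SETSIZE)
        return -1;
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
             radio_cpu);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (parse_cpu_list(buf, &siblings) != 0)
        return -1;
    if (sched_getaffinity(0, sizeof(*out), out) != 0)
        return -1;
    CPU_CLR(radio_cpu, out);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &siblings))
            CPU_CLR(cpu, out);
    return 0;
}
```

    The resulting mask could then be passed to `pthread_setaffinity_np` for the message-processing threads, in place of the single `CPU_CLR` above.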
    
     

    Last edit: Jeremy Quirke 2017-08-14
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Sorry, yes, I'm literally spammed with them, hundreds every second.

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    OK so you have a performance problem, not a clock problem.

    Again, double check that the radio thread is running real time and affinitized to one hyperthreaded core.

    Make sure the other threads are affinitized to every other core and not to the hyperthreaded sibling of the radio core.
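    For reference, these are the generic glibc/POSIX calls for that check. Pinning a thread to one CPU is done with `pthread_setaffinity_np`, and moving it to real-time FIFO scheduling is done with `pthread_setschedparam` and `SCHED_FIFO`. The thread does not show how OpenLTE does this internally. The helper names below are hypothetical, and `SCHED_FIFO` normally requires root, `CAP_SYS_NICE` or a suitable `RLIMIT_RTPRIO`; without one of them the call returns `EPERM`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Pin a thread to exactly one logical CPU. Returns 0 or an errno value. */
int pin_thread_to_cpu(pthread_t thread, int cpu)
{
    cpu_set_t mask;
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(thread, sizeof(mask), &mask);
}

/* Switch a thread to SCHED_FIFO at the given priority.
   Returns 0, EINVAL for an out-of-range priority, or EPERM without privileges. */
int make_thread_realtime(pthread_t thread, int priority)
{
    struct sched_param param;
    if (priority < sched_get_priority_min(SCHED_FIFO) ||
        priority > sched_get_priority_max(SCHED_FIFO))
        return EINVAL;
    param.sched_priority = priority;
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}

/* Lowest-numbered CPU the calling thread is currently allowed to run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t mask;
    if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}
```

    A radio thread would call `pin_thread_to_cpu(pthread_self(), radio_cpu)` and then `make_thread_realtime(pthread_self(), prio)`. Both `radio_cpu` and `prio` are placeholders. Checking the return values shows whether the real-time request was actually granted.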

    Also check of course that other processes aren't chewing CPU etc (the obvious).

    There's one more obvious question: is the USB port operating at SuperSpeed?

    (bladeRF-cli -i, then the info command)
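    The reason SuperSpeed matters is a back-of-envelope bandwidth calculation. The bladeRF streams SC16 Q11 samples: a 16-bit I value and a 16-bit Q value, so 4 bytes per complex sample. A 5MHz LTE carrier runs at 7.68 Msps, which is 30.72 MB/s in each direction, or 61.44 MB/s with TX and RX running together. USB 2.0 High Speed signals at 480 Mbit/s, which is 60 MB/s before any protocol overhead, so full-duplex 5MHz LTE cannot fit on a USB 2.0 link. The sketch below encodes that check; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* SC16 Q11: one 16-bit I and one 16-bit Q value per complex sample. */
#define BYTES_PER_SC16_SAMPLE 4u

/* USB 2.0 High Speed raw signalling rate: 480 Mbit/s = 60,000,000 bytes/s,
   before protocol overhead (real throughput is noticeably lower). */
#define USB2_HS_RAW_BYTES_PER_SEC 60000000ull

/* Bytes per second needed to stream one direction at the given sample rate. */
uint64_t stream_bytes_per_sec(uint64_t samples_per_sec)
{
    return samples_per_sec * BYTES_PER_SC16_SAMPLE;
}

/* 1 if simultaneous TX and RX at this rate exceeds even the raw USB 2.0 rate. */
int exceeds_usb2_raw(uint64_t samples_per_sec)
{
    return 2 * stream_bytes_per_sec(samples_per_sec) > USB2_HS_RAW_BYTES_PER_SEC;
}
```

    By the same arithmetic, 1.4MHz LTE at 1.92 Msps needs only 15.36 MB/s for both directions combined.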

     

    Last edit: Jeremy Quirke 2017-08-14
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Hmm, these are the errors I get:

    08/14/2017 17:19:37.112977 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.113366 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.113507 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.113639 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.113769 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.113944 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114062 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114094 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114122 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114458 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114492 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114520 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.114696 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_phy circular buffer empty on receive
    08/14/2017 17:19:37.115120 info mac LTE_fdd_enb_mac.cc 407 MAC_dl_tti - PHY_dl_tti != 2 (1), skipping 0 subframes
    08/14/2017 17:19:37.115289 info mac LTE_fdd_enb_mac.cc 407 MAC_dl_tti - PHY_dl_tti != 2 (1), skipping 0 subframes
    08/14/2017 17:19:37.115328 warning msgq LTE_fdd_enb_msgq.cc 234 phy_to_mac circular buffer empty on receive
    08/14/2017 17:19:37.115367 warning msgq LTE_fdd_enb_msgq.cc 234 phy_to_mac circular buffer empty on receive
    08/14/2017 17:19:37.116244 info mac LTE_fdd_enb_mac.cc 407 MAC_dl_tti - PHY_dl_tti != 2 (0), skipping 0 subframes
    08/14/2017 17:19:37.116784 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7981) does not match PHY (8001)
    08/14/2017 17:19:37.117119 info mac LTE_fdd_enb_mac.cc 407 MAC_dl_tti - PHY_dl_tti != 2 (-1), skipping 4 subframes
    08/14/2017 17:19:37.117827 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7982) does not match PHY (8002)
    08/14/2017 17:19:37.118742 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7983) does not match PHY (8003)
    08/14/2017 17:19:37.120008 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7984) does not match PHY (8004)
    08/14/2017 17:19:37.123144 warning msgq LTE_fdd_enb_msgq.cc 234 mac_to_timer circular buffer empty on receive
    08/14/2017 17:19:37.123303 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7997) does not match PHY (8007)
    08/14/2017 17:19:37.123683 error phy LTE_fdd_enb_phy.cc 468 Late DL subframe from MAC:8007, PHY is currently on 8008
    08/14/2017 17:19:37.123977 error phy LTE_fdd_enb_phy.cc 503 Late UL subframe from MAC:8004, PHY is currently on 8005
    08/14/2017 17:19:37.124128 warning msgq LTE_fdd_enb_msgq.cc 234 phy_to_mac circular buffer empty on receive
    08/14/2017 17:19:37.124186 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7998) does not match PHY (8008)
    08/14/2017 17:19:37.124422 error phy LTE_fdd_enb_phy.cc 468 Late DL subframe from MAC:8008, PHY is currently on 8009
    08/14/2017 17:19:37.124645 error phy LTE_fdd_enb_phy.cc 503 Late UL subframe from MAC:8005, PHY is currently on 8006
    08/14/2017 17:19:37.124832 error phy LTE_fdd_enb_phy.cc 709 PDSCH current_tti from MAC (7999) does not match PHY (8009)
    08/14/2017 17:19:37.125117 error phy LTE_fdd_enb_phy.cc 503 Late UL subframe from MAC:8006, PHY is currently on 8007

    {Loops Here}
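    The PHY errors in this log carry both TTI counters, so the size of the lag can be measured. An LTE subframe (TTI) is 1 ms, which means "MAC (7981) does not match PHY (8001)" puts the MAC about 20 ms behind the PHY. The sketch below pulls that number out of a log line. It assumes the exact message wording shown above; other OpenLTE versions may word it differently. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract PHY minus MAC TTI from a line containing
   "current_tti from MAC (<mac>) does not match PHY (<phy>)".
   Returns 1 and stores the lag (in TTIs, i.e. ms) if the line matches, else 0.
   The wording is copied from the log in this thread and may differ elsewhere. */
int pdsch_tti_lag(const char *line, long *lag_ttis)
{
    const char *msg = strstr(line, "current_tti from MAC (");
    long mac_tti, phy_tti;
    if (msg == NULL)
        return 0;
    if (sscanf(msg, "current_tti from MAC (%ld) does not match PHY (%ld)",
               &mac_tti, &phy_tti) != 2)
        return 0;
    *lag_ttis = phy_tti - mac_tti;
    return 1;
}
```

    Feeding the log through this line by line, and tracking how many lines match per second and how large the lag gets, gives a rough measure of how far behind the MAC is running.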

    Yeah she's on SuperSpeed

    Here's OpenLTE Config:
    System Configuration Parameters
    Read parameters using read <param> format
    Set parameters using write <param> <value> format
    Commands:
    start - Constructs the system information and starts the eNB
    stop - Stops the eNB
    shutdown - Stops the eNB and exits
    construct_si - Constructs the new system information
    help - Prints this screen
    add_user imsi=<imsi> imei=<imei> k=<k> - Adds a user to the HSS (<imsi> and <imei> are 15 decimal digits, and <k> is 32 hex digits)
    del_user imsi=<imsi> - Deletes a user from the HSS
    print_users - Prints all the users in the HSS
    print_registered_users - Prints all the users currently registered
    Radio Parameters:
    available_radios: (read-only)
    0: no_rf
    1: bladerf-ID
    selected_radio_name (read-only) = bladerf-ID
    selected_radio_idx = 1
    clock_source = internal
    System Parameters:
    band = 5
    bandwidth = 5
    cell_id = 1
    debug_level = radio phy mac rlc pdcp rrc mme gw user rb timer iface msgq
    debug_type = error warning info debug
    dl_center_freq = 869700000
    dl_earfcn = 2407
    dns_addr = C0A80101
    enable_pcap = 0
    ip_addr_start = C0A80102
    mac_direct_to_ue = 0
    mcc = 001
    mnc = 01
    n_ant = 1
    n_id_cell = 0
    p0_nominal_pucch = -96
    p0_nominal_pusch = -70
    phy_direct_to_ue = 0
    q_hyst = 0
    q_rx_lev_min = -140
    rx_gain = 30
    search_win_size = 0
    sib3_present = 0
    sib4_present = 0
    sib5_present = 0
    sib6_present = 0
    sib7_present = 0
    sib8_present = 0
    tracking_area_code = 1
    tx_gain = 30
    ul_center_freq = 824700000
    ul_earfcn = 20407
    use_cnfg_file = 0
    use_user_file = 0
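    The frequencies in this config follow from the EARFCNs. 3GPP TS 36.101 (section 5.7.3) gives F_DL = F_DL_low + 0.1 × (N_DL − N_Offs-DL) in MHz. For band 5, F_DL_low is 869 MHz and N_Offs-DL is 2400; on the uplink, F_UL_low is 824 MHz and N_Offs-UL is 20400. That gives dl_earfcn 2407 → 869.7 MHz and ul_earfcn 20407 → 824.7 MHz, matching dl_center_freq and ul_center_freq above. The sketch covers band 5 only, and the function names are illustrative.

```c
#include <assert.h>

/* Band 5 channel-number parameters (3GPP TS 36.101, section 5.7.3). */
#define B5_DL_LOW_KHZ  869000L
#define B5_DL_OFFS     2400L     /* valid DL EARFCNs: 2400..2649 */
#define B5_UL_LOW_KHZ  824000L
#define B5_UL_OFFS     20400L    /* valid UL EARFCNs: 20400..20649 */
#define B5_EARFCN_SPAN 250L
#define EARFCN_STEP_KHZ 100L     /* 0.1 MHz per EARFCN */

/* Downlink centre frequency in kHz, or -1 if not a band 5 DL EARFCN. */
long band5_dl_khz(long dl_earfcn)
{
    if (dl_earfcn < B5_DL_OFFS || dl_earfcn >= B5_DL_OFFS + B5_EARFCN_SPAN)
        return -1;
    return B5_DL_LOW_KHZ + EARFCN_STEP_KHZ * (dl_earfcn - B5_DL_OFFS);
}

/* Uplink centre frequency in kHz, or -1 if not a band 5 UL EARFCN. */
long band5_ul_khz(long ul_earfcn)
{
    if (ul_earfcn < B5_UL_OFFS || ul_earfcn >= B5_UL_OFFS + B5_EARFCN_SPAN)
        return -1;
    return B5_UL_LOW_KHZ + EARFCN_STEP_KHZ * (ul_earfcn - B5_UL_OFFS);
}
```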

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Top Output:
    Tasks: 222 total, 2 running, 220 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 21.3 us, 7.1 sy, 0.0 ni, 71.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 7888280 total, 6223636 free, 901288 used, 763356 buff/cache
    KiB Swap: 8073212 total, 8073212 free, 0 used. 6555636 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2384 root 20 0 1909604 138340 15404 S 57.8 1.8 5:56.41 LTE_fdd_enodeb

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    OK,

    We need to isolate your problem. Here are several possibilities:

    1. CPU. The processor itself is fast enough to run 5MHz carrier, but either
      1.1 there is lots of interrupt/bottom half processing going on
      1.2 you are suffering lots of L2/L3 cache thrashing due to stuff running on the hyperthread/interrupts/bottom halves, this can hurt the downlink IFFT time bigly :P

    2. USB controller can't keep up. I would look here first. Follow this process to identify whether the controller can keep up

    https://github.com/Nuand/bladeRF/wiki/Debugging-dropped-samples-and-identifying-achievable-sample-rates

    Edit: the above process tests the RX path only. If this successfully passes @ 7.68MHz, then set the GPIO to digital loopback, and then use the tx/rx commands in bladeRF-cli to verify that the transmitted file comes back exactly the same (md5sum them).
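    An md5sum comparison only says whether two files differ. A byte-by-byte comparison also reports where they first differ, which helps tell a truncated capture from corrupted samples. This is a minimal sketch with a hypothetical function name. It assumes both captures start at the same sample, which a loopback capture does not guarantee; the gap-check approach used later in the thread avoids that assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files byte by byte.
   Returns -1 if identical, the offset of the first differing byte otherwise
   (a shorter file differs at its end), or -2 if either file cannot be opened. */
long long first_difference(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long long result = -1;
    if (a == NULL || b == NULL) {
        if (a != NULL)
            fclose(a);
        if (b != NULL)
            fclose(b);
        return -2;
    }
    for (long long pos = 0;; pos++) {
        int ca = getc(a);   /* stdio buffers reads, so this is not byte-per-syscall */
        int cb = getc(b);
        if (ca != cb) {
            result = pos;
            break;
        }
        if (ca == EOF)      /* both ended together: identical */
            break;
    }
    fclose(a);
    fclose(b);
    return result;
}
```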

    Good luck!

     

    Last edit: Jeremy Quirke 2017-08-14
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Thanks heaps for this info!

    So I successfully disabled hyperthreading; the OS now sees only one core, running at its max freq.

    I've also run the 'gap' test on the USB Bus as per the link above and it came back 'No Gaps', was there more I had to test for the USB Bus?

    00:1d.0 USB controller: Intel Corporation Wildcat Point-LP USB EHCI Controller (rev 03) (prog-if 20 [EHCI])
    Subsystem: Lenovo Wildcat Point-LP USB EHCI Controller
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 23
    Region 0: Memory at f123d000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 3
    Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
    Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    AFCap: TP+ FLR+
    AFCtrl: FLR-
    AFStatus: TP-
    Kernel driver in use: ehci-pci

    This is the controller I'm using. It's really weird: I still get these errors badly, and they spam super fast.

    I think it could potentially be the USB Bus with this laptop (X250) because I get the exact same issues on my USRP B200, same errors being spammed on debug.

    Would I be able to grab a recommendation for a laptop? I was thinking the Lenovo X230 i7?

    Thank you so much for helping me with these issues, I really appreciate it!

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    no problem.

    OK yes, we need to clarify what you tested.

    Test A

    The instructions in the link I provided are testing only the RX path without any TX.
    This is the first step to pass.

    1. I would run the test with n=250M or similar at sample rates of 1.92M, 3.84M, 7.68M, 15.36M and even 30.72M. Passing at 15.36M covers a 5MHz carrier in both directions (2 × 7.68M).
    2. You should be using an SSD for this test, as a magnetic drive may be the limiting factor. If you don't have an SSD, create a RAM disk of 1-2GB.

    Once you confirm no gaps @ 15.36M proceed to the next test.
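    The sample rates in that list are the usual LTE ones. The subcarrier spacing is 15 kHz, and the sample rate is normally the FFT size times 15 kHz: 128 points for 1.4MHz, 512 for 5MHz, 2048 for 20MHz. The sketch below tabulates those commonly used FFT sizes and the "both directions" equivalent rate used for the RX-only test. The names are illustrative, and bandwidth is given in units of 100 kHz so that 1.4MHz is an integer.

```c
#include <assert.h>

/* LTE subcarrier spacing; the usual sample rate is FFT size x 15 kHz. */
#define LTE_SUBCARRIER_HZ 15000L

/* Commonly used FFT size per LTE channel bandwidth, with bandwidth given in
   units of 100 kHz (14 = 1.4 MHz, 50 = 5 MHz). Returns 0 if not an LTE bandwidth. */
long lte_fft_size(int bw_100khz)
{
    switch (bw_100khz) {
    case 14:  return 128;
    case 30:  return 256;
    case 50:  return 512;
    case 100: return 1024;
    case 150: return 1536;
    case 200: return 2048;
    default:  return 0;
    }
}

/* Sample rate in samples per second for one direction. */
long lte_sample_rate_hz(int bw_100khz)
{
    return lte_fft_size(bw_100khz) * LTE_SUBCARRIER_HZ;
}

/* RX-only rate that moves as much USB data as simultaneous TX + RX. */
long duplex_equivalent_rate_hz(int bw_100khz)
{
    return 2 * lte_sample_rate_hz(bw_100khz);
}
```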

    Test B

    Next test is to configure the bladeRF @ 3.84M and 7.68M and use digital loopback.
    You will then transmit the gap-less file you received in Test A and receive it back using bladeRF-cli.
    Use the tx config, rx config, tx start and rx start commands to do this.

    Then you will gap-check the received file and make sure it's OK.

    If there are no differences the USB bus is OK.

    Edit: Wait, do you mean you only have a single processor showing?
    Please show the output of:
    cat /proc/cpuinfo

     

    Last edit: Jeremy Quirke 2017-08-14
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Awesome so Test A was successful (SSD HDD):
    bladeRF> set samplerate rx 15.36M

    Setting RX sample rate - req: 15360000 0/1Hz, actual: 15360000 0/1Hz

    bladeRF> set gpio 0x257

    GPIO: 0x00000257
    LMS Enable: Enabled
    LMS RX Enable: Enabled
    LMS TX Enable: Enabled
    TX Band: Low Band (300M - 1.5GHz)
    RX Band: Low Band (300M - 1.5GHz)
    RX Source: Internal 32-bit counter

    bladeRF> rx config file=/dev/shm/samples_15.36msps.bin n=15.36M
    bladeRF> rx start
    bladeRF> rx

    State: Idle
    Last error: None
    File: /dev/shm/samples_15.36msps.bin
    File format: SC16 Q11, Binary
    Samples: 16106127
    Buffers: 32
    Samples per buffer: 32768
    Transfers: 16
    Timeout (ms): 1000

    bladeRF> x
    ➜ ~ ./blade_gaps.py /dev/shm/samples_15.36msps.bin
    Number of gaps:0

    Would you be able to elaborate a little more on how to tx and rx the samples_15.36msps.bin file?

    Also when you say digital loopback do you mean literally use a pigtail and hook RX to TX? Or is this a GPIO Config thing?

    CPU Info:
    ➜ ~ cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 61
    model name : Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
    stepping : 4
    microcode : 0x25
    cpu MHz : 2593.773
    cache size : 4096 KB
    physical id : 0
    siblings : 1
    core id : 0
    cpu cores : 1
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 20
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt dtherm arat pln pts
    bugs :
    bogomips : 5187.54
    clflush size : 64
    cache_alignment : 64
    address sizes : 39 bits physical, 48 bits virtual
    power management:

    Cheers!

     

    Last edit: Dylanger Daly 2017-08-14
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    tx config file=/dev/shm/samples_15.36msps.bin repeat=0
    rx config file=/dev/shm/samples_15.36msps.check n=76.8M
    set loopback firmware
    tx start
    rx start

    Now run the Python gap check (blade_gaps.py) on samples_15.36msps.check.

    For some reason I can't run both TX and RX on Windows.
    Maybe you can.

    If that doesn't work, no matter.
    I think our issue is CPU after all.

    1 core may not be enough.

    I have a Core i7-4710 with 4 cores / 8 hyperthreads, and I can run 5MHz LTE fine with no skipping, even while web browsing etc.

    Normally I run the radio thread on logical processor 7 and affinitize everything else to logical processors 0-5 inclusive (avoiding 6, the hyperthread sibling of 7, since I can't disable HT on my laptop).

    But if I affinitize the whole process to one core, to simulate your scenario, it falls apart.

    So I guess there's your answer.

    In your case, turning HT back on won't help either.

     

    Last edit: Jeremy Quirke 2017-08-14
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    On the upside, this experiment helped me reproduce a bug in the bladerf interface. I have started a new thread with the diff and bugfix.

     
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Cheers Jeremy!

    I actually got hit with 5 gaps during this test:
    ➜ ~ ./blade_gaps.py /dev/shm/samples_15.36msps.check
    [16106127] = 0, Expected 16106127, Gap = 16106127
    [32212254] = 0, Expected 16106127, Gap = 16106127
    [48318381] = 0, Expected 16106127, Gap = 16106127
    [64424508] = 0, Expected 16106127, Gap = 16106127
    [80530635] = 0, Expected 16106127, Gap = 16106127
    Number of gaps:5

    Does this mean CPU or USB Bus?

    Awesome! Squashing Bugs!

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    The pattern is too regular for it to be real gaps, so, no, I would say no issue with USB for you.
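    The regularity is visible in the numbers. Every reported "gap" sits at an exact multiple of 16106127, and the counter value there is 0. 16106127 is the sample count of the file transmitted with repeat=0 (see the rx status in Test A). That is consistent with the counter restarting each time the file repeats, not with dropped samples. Incidentally, 15.36 × 1024 × 1024 = 16106127.36, which suggests bladeRF-cli read "n=15.36M" with a binary multiplier. Likewise 76.8M → 80530636 samples, which holds exactly 5 repeats.

    Below is a minimal gap-check sketch that tolerates those wraps. It is a hypothetical reimplementation, not blade_gaps.py, whose internals are not shown here. It assumes the 32-bit counter values have already been decoded from the SC16 samples; that decoding step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count discontinuities in a stream of decoded 32-bit counter values.
   A jump from (period - 1) back to 0 is treated as the expected wrap of a
   repeated TX file of `period` samples; pass period = 0 to disable that. */
size_t count_counter_gaps(const uint32_t *counter, size_t n, uint32_t period)
{
    size_t gaps = 0;
    for (size_t i = 1; i < n; i++) {
        uint32_t expected = counter[i - 1] + 1u;   /* unsigned wrap at 2^32 is fine */
        int file_wrap = period != 0 &&
                        counter[i - 1] == period - 1u &&
                        counter[i] == 0;
        if (counter[i] != expected && !file_wrap)
            gaps++;
    }
    return gaps;
}
```

    For this capture, `period` would be 16106127. If the stream still reports gaps once wraps are excused, those are real dropped samples.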

    Your issue is CPU, single core will not cut the mustard unfortunately for you.
    You might have some luck @ 1.4MHz bandwidth instead.

     

    Last edit: Jeremy Quirke 2017-08-14
  • Dylanger Daly

    Dylanger Daly - 2017-08-14

    Ah awesome. I've actually tried 1.4MHz and it works: it's discoverable on UEs. However, I can't attach, and on top of that I still get those errors above spammed?

     
  • Jeremy Quirke

    Jeremy Quirke - 2017-08-14

    Yeah, it's still almost certainly CPU-limited, mate. My guess at the single biggest issue: because that one CPU is running everything, the L1/L2 caches get cleared out between runs of the math-heavy work (process_dl), which hurts performance.

    Not to mention hardware interrupts and bottom halves (especially USB controller ones that copy buffers around) will have no choice but to be assigned to that single CPU which will also likely cause cache evictions etc.

    The fact that 1.4MHz is getting somewhere points to this.
    Again, 2 cores should be enough. I did the experiment on my system locking the affinity to 2 cores only and it ran at 5MHz without a glitch.

     

    Last edit: Jeremy Quirke 2017-08-14
