Möhrenfeld

Juniper is somewhat (in-)famous for unhelpful and/or superfluous error messages. We recently ran into a new one on our Juniper MX boxes. We committed a normal-looking static route whose next-hop was reachable via an IRB interface bound to a VPLS instance:

[edit routing-options static]
    +    route 10.1.2.168/29 {
    +        next-hop 10.1.2.167;
    +        tag 666;
    +    }

Directly after the commit, the following messages appeared in the log:

kernel: iflist_ifindex_lookup(651) error: 2, errmsg: ifle for ifindex 5106 is not found
kernel: iflist_ifindex_lookup(651) error: 2, errmsg: ifle for ifindex 5106 is not found
rpd[14024]: RPD_SYSTEM: Get index for rt table CUSTOMER-TRANSFER failed of gf 11 kqp_op 2: Resource temporarily unavailable
rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable
kernel: Timeout for req id=41835 ifd_index=285

After that, the following errors kept recurring over time:

rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: No error: 0
rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable
rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable
rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable

This in itself is not very helpful. The only hint was the “KRT_Q” part. JunOS has a kernel routing table (krt). Routes acquired from routing protocols and static configuration are placed in this table, which is then synchronized to the linecards (FPC/MPC) via the krt queue.
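To spot this pattern in a saved log, filtering on the “KRT_Q” tag is enough. The following is a minimal Python sketch, not a Juniper tool: `count_krt_retries` is a hypothetical helper, and the regular expression assumes the `rpd[pid]: RPD_KRT_Q_RETRIES: <operation>: <reason>` layout of the messages quoted above.

```python
import re
from collections import Counter

# Assumed message layout, taken from the log excerpt above:
#   rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable
KRT_RETRY_RE = re.compile(r"rpd\[\d+\]: RPD_KRT_Q_RETRIES: ([^:]+): (.*)$")


def count_krt_retries(lines):
    """Count RPD_KRT_Q_RETRIES messages per (operation, reason) pair."""
    counts = Counter()
    for line in lines:
        match = KRT_RETRY_RE.search(line.rstrip("\n"))
        if match:
            operation, reason = match.group(1).strip(), match.group(2).strip()
            counts[(operation, reason)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: No error: 0",
        "rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable",
        "rpd[14024]: RPD_KRT_Q_RETRIES: nexthop add: Resource temporarily unavailable",
        "kernel: Timeout for req id=41835 ifd_index=285",
    ]
    # Most frequent retry reasons first.
    for (operation, reason), n in count_krt_retries(sample).most_common():
        print(f"{n}x {operation}: {reason}")
```

Grouping by operation and reason makes a steadily repeating “nexthop add” retry stand out from unrelated kernel messages.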

To see what’s going on, there are hidden commands that show the krt state and the queue contents. First, let’s look at show krt state:

root@core1> show krt state
General state:
        Install job is not running
        Number of operations queued: 1
                Routing table adds: 0
                Interface routes: 0
                High pri multicast   Adds/Changes: 0
                Indirect Next Hop    Adds/Changes: 0       Deletes: 0
                MPLS        Adds: 0       Changes: 0
                High pri    Adds: 1       Changes: 0       Deletes: 0
                Normal pri Indirects: 0
                Normal pri  Adds: 0       Changes: 0       Deletes: 0
                GMP GENCFG Objects: 0
                Routing Table deletes: 0
        Number of operations deferred: 1 <-------------------
        Number of operations canceled: 0
        Number of async queue entries: 1 <-------------------
        Number of async non queue entries: 0
        Time until next queue run: 0.847768
        Routes learned from kernel: 34

You can see one “async queue entry” and one “deferred operation”. Normally these counters return to zero once the routes have been installed in the linecards. In our case, however, this one operation stayed stuck and never disappeared.
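Because a single snapshot can legitimately show pending work, what matters is whether these counters stay non-zero. Below is a minimal Python sketch for checking captured `show krt state` output; `stuck_krt_counters` is a hypothetical helper, and it assumes the counter lines look exactly like the output above (trailing annotations such as the arrows are ignored).

```python
import re

# The two counters highlighted above; both should drop back to zero once
# all pending operations have been installed.
WATCHED = ("Number of operations deferred", "Number of async queue entries")

# "Number of <something>: <integer>", ignoring anything after the number.
COUNTER_RE = re.compile(r"^\s*(Number of [^:]+):\s*(\d+)")


def stuck_krt_counters(output):
    """Return the watched counters that are non-zero in 'show krt state' text."""
    nonzero = {}
    for line in output.splitlines():
        match = COUNTER_RE.match(line)
        if match and match.group(1) in WATCHED and int(match.group(2)) > 0:
            nonzero[match.group(1)] = int(match.group(2))
    return nonzero


if __name__ == "__main__":
    sample = """General state:
        Install job is not running
        Number of operations queued: 1
        Number of operations deferred: 1 <-------------------
        Number of operations canceled: 0
        Number of async queue entries: 1 <-------------------
        Number of async non queue entries: 0
"""
    print(stuck_krt_counters(sample))
```

Capturing the output a few times over a couple of minutes and comparing the results is one way to tell a transient backlog from a stuck entry like ours.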

To see what exactly is queued, you can look at the queue with show krt queue:

root@core1> show krt queue
[..]
High-priority add queue: 1 queued
                ADD nhtype Router index 0 (46562)
                    error 'ENOENT -- Item not found'
                    kqp '0x10959000'
[..]

The operation was stuck in the “High-priority add queue”. There is also an error attached (ENOENT, again not very helpful), and from the “ADD nhtype” line we can guess that the operation tried to add some kind of next-hop entry, which matches the “nexthop add” in the rpd log messages.
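For a longer queue listing, the entries and their errors can be extracted programmatically. Below is a minimal Python sketch that assumes the layout shown above: a `<name> queue: <n> queued` header, indented operation lines starting with an upper-case verb (we only saw `ADD`, so other verbs following the same pattern is an assumption), and an indented `error '…'` line. `queued_operations` is a hypothetical helper.

```python
import re

# "<name> queue: <n> queued", e.g. "High-priority add queue: 1 queued"
QUEUE_RE = re.compile(r"^(\S.*queue): (\d+) queued")
# Indented operation line starting with an upper-case verb, e.g. "ADD nhtype ..."
OP_RE = re.compile(r"^\s+([A-Z]+) (.+)$")
# Indented error line, e.g. "error 'ENOENT -- Item not found'"
ERROR_RE = re.compile(r"^\s+error '(.*)'")


def queued_operations(output):
    """Return the queued operations from 'show krt queue' text as dicts."""
    operations = []
    queue = None
    for line in output.splitlines():
        queue_match = QUEUE_RE.match(line)
        if queue_match:
            queue = queue_match.group(1)
            continue
        op_match = OP_RE.match(line)
        if op_match and queue is not None:
            operations.append({"queue": queue, "op": op_match.group(1),
                               "detail": op_match.group(2), "error": None})
            continue
        error_match = ERROR_RE.match(line)
        if error_match and operations:
            # Attach the error to the most recently listed operation.
            operations[-1]["error"] = error_match.group(1)
    return operations


if __name__ == "__main__":
    sample = """[..]
High-priority add queue: 1 queued
                ADD nhtype Router index 0 (46562)
                    error 'ENOENT -- Item not found'
                    kqp '0x10959000'
[..]
"""
    for op in queued_operations(sample):
        print(f"{op['queue']}: {op['op']} {op['detail']} -> {op['error']}")
```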

The hint about a next-hop entry made us look at the static route’s next-hop again. It pointed to 10.1.2.167, which in turn was located on an IRB interface:

root@core1> show route 10.1.2.167
10.1.2.160/29  *[Direct/0] 2d 01:49:44
                    > via irb.510

And there we have the problem. The network on the IRB is 10.1.2.160/29, which spans the eight addresses 10.1.2.160 to 10.1.2.167. 10.1.2.167 is the last address in this network, so it is the broadcast address. Setting the next-hop of a static route to the broadcast address is a bad idea (in our case it was a typo). After we changed the next-hop to point to another address, the krt queue cleared up and the error messages vanished.
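This arithmetic is easy to automate as a sanity check before committing. Below is a minimal sketch using Python’s standard `ipaddress` module. `check_next_hop` is a hypothetical helper, and it assumes you already know the connected prefix the next-hop resolves through (here 10.1.2.160/29, from show route). It only flags network and broadcast addresses for IPv4 subnets up to /30, because /31 and /32 subnets have no separate network and broadcast addresses and IPv6 has no broadcast at all.

```python
import ipaddress


def check_next_hop(next_hop, connected_prefix):
    """Return a reason why next_hop is a bad next-hop on connected_prefix, or None."""
    nh = ipaddress.ip_address(next_hop)
    net = ipaddress.ip_network(connected_prefix)
    if nh not in net:
        return f"{nh} is not inside {net}"
    # /31 (RFC 3021) and /32 have no separate network/broadcast addresses,
    # and IPv6 has no broadcast, so only check IPv4 subnets up to /30.
    if net.version == 4 and net.prefixlen <= 30:
        if nh == net.network_address:
            return f"{nh} is the network address of {net}"
        if nh == net.broadcast_address:
            return f"{nh} is the broadcast address of {net}"
    return None


if __name__ == "__main__":
    net = ipaddress.ip_network("10.1.2.160/29")
    hosts = list(net.hosts())
    print(f"{net}: {net.num_addresses} addresses, hosts {hosts[0]}-{hosts[-1]}")
    print(check_next_hop("10.1.2.167", "10.1.2.160/29"))  # the typo from this post
    print(check_next_hop("10.1.2.166", "10.1.2.160/29"))  # a valid host address
```

For the /29 on irb.510 this reports 10.1.2.161 to 10.1.2.166 as the usable host range and flags 10.1.2.167 as the broadcast address.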

In conclusion, the error messages were cryptic, and without some detective work it was not apparent what had gone wrong. In my opinion, this problem should have been caught before the route got stuck in the krt queue, and it should have produced a more helpful error message.

This blog entry is also for people who google this problem. Sadly, googling is often the most helpful thing to do when faced with cryptic Juniper error messages.