Plenary Transcript

Chaired By:: Clara Wade, Brian Nisbet
Session:: Plenary
Date:: Tuesday, 19 May
Time:: 11:00 ‐ 12:30 (UTC +0100)
Room:: Main Room

RIPE 92

11 am plenary:

BRIAN NISBET: Hello, good morning to you all, and welcome to the second plenary session of the day. We have lots of fun talks for you. I'm Brian, and along with Clara I'll be chairing this session. You haven't heard it often enough: for the PC elections, the deadline to nominate yourself is 3.30 today. Thanks to those who have. And if you haven't got a headshot and want one for your PC nomination, the official photographer is happy to take one for you. So, without further ado, I'll introduce our first speaker, Antonios Charlton, to talk about RPKI garbage collection.

ANTONIOS CHARLTON: Hello everybody, can you hear me? I hope you can hear me. OK, perfect, we can begin. I am going to start with a story about what I found on this wonderful subject of RPKI. It all starts with a friend of mine. This friend likes to learn new things; it's how he progresses in his career and how he likes to spend his free time, basically. He has played with network gear from various vendors, and with networks and overlays that use BGP with some kind of filtering and whatnot. This has been successful so far: he has been able to improve his career, and it's been going very well for him.

But he wants to keep learning. He wants to learn even more about the internet and see what else is out there, beyond the confines of a small home network that runs in essentially one location where everything is VPNs and tunnels. But he is also budget conscious; not everybody has unlimited money to spend on their learning journey, and this is something I completely understand. So he wanted to find a solution that would help him learn more, progress in his career and eventually get to a place where he can find a job in this industry.

So we were discussing this, and one day I received a message from him that said: can you help me get an AS and some IP addresses, like IPv6? The reason is that in these virtual networks you can test all of the filters and all that, but when you join the real internet, that's where you have to do RPKI and all these database objects, with a large volume of objects, and filtering starts to get more serious and more important. That's something you can only experience by doing it in the real world, of course taking precautions and not causing too much damage along the way. And my answer was, of course: I like to help people when I can and when I have the opportunity, and I thought this would be super simple to do. Just get an AS; I thought I could go to the LIR Portal, and it's two minutes to file the application for an AS, right?

So I go there to request the resources, I click the AS button and I move forward. It's not two minutes to get the AS, of course; it's two minutes to apply. It turns out we need a contract, apparently. It's not that easy; OK, makes sense. I look at the website and there is a nice template, I think from the legal team or the community: you just download this .docx file and change some details like the names and addresses, and that's very useful. But apparently the contract says I need to charge this person. So I think, OK, can I give it for free, maybe change the .docx file and just say this is provided free of charge? Turns out no: existing policy requires you to charge people. Unfortunately you cannot offer sponsored resources for free.

So now it's a high-budget transaction: I need to contact my accountant and see how I can charge VAT, pay VAT and issue invoices. Suddenly it turns out to be too expensive, so unfortunately I have to send this person to one of the many LIRs that sponsor resources. The good thing is that some of them, thankfully, do this for a very low margin or basically no margin at all; they just charge the price that they have to pay to the RIPE NCC.

So, long story short, after 20 or 25 days I think, we were able to get an AS. Despite it taking almost a month, we are still interested in moving this forward, and we say: finally we can do the interesting things now. We can configure BGP, we can do all sorts of things, and he can finally start learning.

So on my side, I go to my AS-SET for my downstream customers and add his AS as a member. Then I have some automation that I use on my networks (Coloclue), which generates the customer YAML file so that I don't have to configure the filters and whatnot by hand.

It generates the prefix list, so he can't send me whatever he wants and we behave responsibly. When I look at the pull request that's generated, I see IPv4 addresses there: two /22s in this prefix list. And I'm like, OK, this is a budget-conscious friend; he probably didn't spend 40,000 euro buying IPv4, so what's up with that? It was totally unexpected to see IPv4 when I generated these lists. So I go to bgp.tools, the website, to look up the AS, see what's happening and figure out whether there was a mistake or anything like that. And I discover that he is apparently connected to DE‑CIX and has a Juniper router. Remember, this is a budget-conscious friend, and he also got a rack in Madrid, a port at DE‑CIX and bought Juniper routers along the way? So what's going on here? He is probably hiding his hobby money from me and just having me work for free.

I am over-dramatising this, of course. He told me he had no idea what these IPv4 prefixes were. So I troubleshot my automation tool: I ran it with the AS and it returned these two /22s, so they seemed to be associated with this AS. I debugged it, and it turns out that one of the IRR databases used by bgpq4 is a pseudo-database. To put it simply, it takes all of the ROAs that are published, creates route and route6 objects from them and serves those. I am oversimplifying here, but essentially you get all of the ROAs as route objects.
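
To make the pseudo-IRR idea concrete, here is a minimal Python sketch, assuming ROAs are already available as (prefix, max length, origin ASN) records. `Roa`, `roa_to_rpsl` and `prefixes_for_asn` are hypothetical names for illustration; real pseudo-IRR sources may format objects differently and treat max length differently. This sketch ignores max length and emits one object per ROA prefix, using documentation prefixes and ASNs.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    prefix: str      # e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ROA authorises (ignored below)
    asn: int         # authorised origin AS


def roa_to_rpsl(roa: Roa, source: str = "RPKI") -> str:
    """Render one ROA as a minimal RPSL route or route6 object (simplified)."""
    net = ip_network(roa.prefix)
    obj_class = "route" if net.version == 4 else "route6"
    return "\n".join([
        f"{obj_class}: {net}",
        f"origin: AS{roa.asn}",
        f"source: {source}",
    ])


def prefixes_for_asn(roas: list[Roa], asn: int) -> list[str]:
    """Prefixes a filter generator would collect for `asn` from such a source."""
    result: list[str] = []
    for roa in roas:
        prefix = str(ip_network(roa.prefix))
        if roa.asn == asn and prefix not in result:
            result.append(prefix)
    return result


roas = [
    Roa("192.0.2.0/24", 24, 64500),    # stale ROA left over from a previous AS holder
    Roa("2001:db8::/32", 48, 64500),
    Roa("198.51.100.0/24", 24, 64501),
]
print(roa_to_rpsl(roas[0]))
print(prefixes_for_asn(roas, 64500))
```

Once a stale ROA names the new holder's AS, anything that builds prefix lists from a source like this inherits that prefix, which is what showed up in the pull request.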

So I looked into this further and used Monocle to check the ROAs, and indeed, under the RIPE NCC trust anchor, there are two valid ROAs, with a max length of /24, for his AS. These are current, they are being renewed, everything is fine.

So, back to bgp.tools, I discover that this AS had previous lives. In fact, two companies had held this AS before, one of them actually in Madrid, so they were probably the ones with the routers and the DE‑CIX port.

So did we discover a way to get free IPv4? Because right now I have someone who applied for an AS and got an AS, and he could do an RPKI-valid hijack and nobody would be able to tell. It would propagate everywhere, and nobody would have any way of knowing that this is an attack.

To be clear, this is a joke: these addresses are not yours. If it happens to you, you don't own these addresses, and you would actually be attacking a real network. In fact, if one of these prefixes is not even advertised, you would also get a hundred percent of its traffic.
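
Why would such an announcement look valid? Route origin validation as specified in RFC 6811 only checks whether some validated ROA payload (VRP) covers the route with a matching origin AS and an acceptable prefix length; it has no notion of who holds the AS today. A minimal Python sketch of that check, assuming VRPs as (prefix, max length, ASN) tuples and using documentation prefixes and ASNs in place of real ones:

```python
from ipaddress import ip_network
from typing import NamedTuple


class Vrp(NamedTuple):
    prefix: str
    max_length: int
    asn: int


def origin_validation(route_prefix: str, origin_asn: int, vrps: list[Vrp]) -> str:
    """Return 'valid', 'invalid' or 'not-found' in the spirit of RFC 6811."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # subnet_of() raises TypeError across IPv4/IPv6, so compare versions first.
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A VRP for AS0 never matches any route.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


stale = [Vrp("192.0.2.0/24", 24, 64500)]  # ROA still naming the recycled AS
print(origin_validation("192.0.2.0/24", 64500, stale))  # announced by the new AS holder: valid
```

Nothing in this check can distinguish the previous holder of AS64500 from the new one, which is why the leftover ROA turns into an RPKI-valid announcement.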

So I went to one of the RPKI experts I really trust, who knows the most about this topic, at least as far as I am concerned, and asked him: hey, what if I keep getting ASes, what happens then? And he told me: try getting 100 ASes from the RIPE NCC; they are not going to give you that many. There's rate limiting happening, so the registry will stop you; you cannot come up with 99 excuses.

And literally this happened in the morning, and that very afternoon I was scrolling through Mastodon, where there is a bot from bgp.tools that posts all of the new AS allocations in our region. You can see a bunch of them here, I think 15 or something, and it turns out that you can probably get more than one AS if you justify it well enough.

So yeah, maybe it's not as solid as we thought. We had a laugh and that was it. But I'm lucky enough to work at a company where I get to decide how I spend some of my time (I work at Cisco, in case that wasn't clear). I had discovered all of this on my personal network, saw that it was happening and solved the problem for this specific use case. But I wondered: does this happen to someone else, and how often does it happen across the entire internet?

And since I'm currently entrusted, at least to some degree, with choosing what I spend my time on, I decided to study this, doing the research using company resources and company time, to figure out whether there is indeed a problem for the internet as a whole, and whether this is something we should be worried about.

So here's what I did. First, I got a list of all the ASes and the dates they were assigned. This file is provided by the RIPE NCC; you download it in five seconds, boom, you are done.
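
The assignment dates come from the RIR delegated-extended statistics files (for example `delegated-ripencc-extended-latest`), which are pipe-separated records of the form `registry|cc|type|start|value|date|status|opaque-id`. A minimal Python sketch of extracting ASN assignment dates, assuming the file has already been downloaded and its lines are passed in; the function name is hypothetical:

```python
from datetime import date, datetime


def asn_assignment_dates(lines) -> dict[int, date]:
    """Map each allocated/assigned ASN to its date from an RIR delegated-extended file."""
    dates: dict[int, date] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        fields = line.split("|")
        # Records have at least 7 fields, so the 6-field summary lines are skipped;
        # the version header never has "asn" as its third field.
        if len(fields) < 7 or fields[2] != "asn":
            continue
        if fields[6] not in ("allocated", "assigned"):
            continue  # available/reserved entries carry no assignment date
        start, count = int(fields[3]), int(fields[4])
        assigned_on = datetime.strptime(fields[5], "%Y%m%d").date()
        for asn in range(start, start + count):  # a record may cover a range of ASNs
            dates[asn] = assigned_on
    return dates
```

As Gert points out later in the session, this only works as well as the stats files are correct, which is exactly the problem with other RIRs' data.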

The next step is to go back to 24 or 48 hours before an AS is assigned and check whether there are any ROAs pointing to this AS, legitimising it as an origin for their prefixes. The first part, getting the ASes, was easy. The second part was apparently more difficult, but there is this open project, RPKIviews, which could really use contributions of money, time and resources in general. They have an archive of all of the certificates that you can go back to and research; if we didn't have that archive, this would not be possible. We could not look back in time.

And why is this the correct methodology? The idea is the following. Currently, the registry assigns ASes randomly, so my theory is that if you go back to a day before an assignment is made, not even the registry staff know which AS they are going to assign. So the new assignee probably doesn't know either, and cannot pre-create these ROAs. Any ROA that you see for an unassigned AS, even 24 or 48 hours before assignment, is therefore most likely a mistake left over from a previous life.
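
The methodology boils down to a join between assignment dates and an archive of ROA snapshots. A minimal Python sketch, assuming a hypothetical in-memory layout `{snapshot_time: [(prefix, max_length, asn), ...]}` and treating each assignment date as midnight UTC; this is not the RPKIviews file format, which would first have to be parsed into something like it:

```python
from bisect import bisect_right
from datetime import date, datetime, timedelta, timezone


def preexisting_roas(assignments, snapshots, lookback=timedelta(hours=48)):
    """Flag ASNs that already appear in ROAs `lookback` before their assignment date.

    assignments: {asn: date}, each date treated as midnight UTC
    snapshots:   {aware datetime: [(prefix, max_length, asn), ...]}  (hypothetical layout)
    Returns {asn: sorted list of prefixes}.
    """
    times = sorted(snapshots)
    flagged = {}
    for asn, day in assignments.items():
        cutoff = datetime(day.year, day.month, day.day, tzinfo=timezone.utc) - lookback
        idx = bisect_right(times, cutoff)  # times[idx - 1] is the latest snapshot at or before cutoff
        if idx == 0:
            continue  # the archive does not reach back far enough
        snapshot = snapshots[times[idx - 1]]
        prefixes = sorted({prefix for prefix, _max_len, roa_asn in snapshot if roa_asn == asn})
        if prefixes:
            flagged[asn] = prefixes
    return flagged


utc = timezone.utc
archive = {
    datetime(2025, 7, 10, tzinfo=utc): [("192.0.2.0/24", 24, 64500), ("2001:db8::/32", 48, 64501)],
    datetime(2025, 7, 13, tzinfo=utc): [("198.51.100.0/24", 24, 64502)],
}
assigned = {64500: date(2025, 7, 14), 64501: date(2025, 7, 14),
            64502: date(2025, 7, 14), 64503: date(2025, 7, 1)}
print(preexisting_roas(assigned, archive))
```

AS64502 is not flagged here because its ROA only appears after the 48-hour cutoff, and AS64503 is skipped because the archive starts too late for it.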

I ran the numbers, so let's see what it looks like. For 2026, so far at least, 18% of the ASes allocated in this region came with pre‑existing ROAs, and for 2025, which is a full year, it's about one in five. So roughly one in five new ASes that we give out come with free prefixes (again, a joke).

I wanted to look into this more, so I took 2025 as an example. Here are the assignments made through 2025; the busiest month was July, so summer seems to be peak networking season. They are roughly constant, and you can see we assign about 150 to 200 ASes every month. Then I looked at how many of these have ROAs, and, as expected, there are some peaks, but it is roughly the same. As for prefixes, most people get free IPv6 with their AS (not their own, that is), a few luckier ones get IPv4 ROAs pointing at their AS, and some get both, just to have a complete set of address families so they can do dual stack and whatever else they want with hijacked space.
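
The monthly chart and the address-family split can be sketched as a simple aggregation over the flagged ASNs. This Python sketch assumes inputs shaped like `{asn: date}` and `{asn: [prefixes]}`; the function names and the categories are illustrative, not the speaker's actual tooling:

```python
from collections import defaultdict
from datetime import date
from ipaddress import ip_network


def monthly_breakdown(assignments: dict[int, date], flagged: dict[int, list[str]]) -> dict:
    """Per (year, month): ASNs assigned, and which families their pre-existing ROAs covered."""
    stats = defaultdict(lambda: {"assigned": 0, "v4_only": 0, "v6_only": 0, "both": 0})
    for asn, day in assignments.items():
        row = stats[(day.year, day.month)]
        row["assigned"] += 1
        families = {ip_network(p).version for p in flagged.get(asn, [])}
        if families == {4, 6}:
            row["both"] += 1
        elif families == {4}:
            row["v4_only"] += 1
        elif families == {6}:
            row["v6_only"] += 1
        # an empty set means the ASN came with no pre-existing ROAs
    return dict(stats)


def percent_with_roas(row: dict) -> float:
    """Share of the month's assignments that came with any pre-existing ROA."""
    with_roas = row["v4_only"] + row["v6_only"] + row["both"]
    return 100.0 * with_roas / row["assigned"]
```

Normalising by `assigned` is what turns the raw counts into the percentage view shown on the later slides.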

The problem here is that there is a double-digit percent chance of getting IPv4. Of course IPv6 is also an issue, but IPv4 is potentially a commercial resource that you can hijack, and it would not appear as a hijack.

The other thing I noticed was that, in this arbitrary timeframe, this seems to have started in 2025: it starts at almost zero. I was wondering what was happening before. So I pushed it to the limit of RPKIviews, which is almost the beginning of 2021; I downloaded basically as much data as I could. Here are the assignments in the region, with a slowly increasing trend, and here are the pre‑existing ROAs at the bottom; there are just a few of them. But let me normalise this, so that assignments are 100%, and now you can see what's happening in the line with the pre‑existing ROAs. Around 2021 we started assigning ASes with pre‑existing ROAs; this lasted for two years, and then it almost completely stopped happening. That lasted for two more years, and then it started happening again, significantly.

We even reached a month with almost 40%, and recently it has been around twenty to thirty percent of assignments with pre‑existing ROAs. So I wanted to find a way to interpret that. I was looking at the chart and thinking: OK, there is this two-year period, and what does two years mean for our community? I have heard this number before. And I kept thinking about it: could this be related to something that happened and lasted two years? And as I was thinking, oh, sorry, this is an unrelated slide, I don't know how it made it here. Oh, it's the IPv4 waiting list: basically we ran out of IPv4 in 2021, and then two years later you can basically close your account. OK, let's ignore it and move forward; that's a mistake, I should have removed this slide...

So what about other RIRs? Well, it turns out that the most difficult thing I had to do was get a simple CSV of AS, assignment date. The thing you can get in five seconds from the RIPE NCC is very, very difficult to get, correctly at least, from any other RIR. Getting it incorrectly was super easy, but getting the actual correct file was not possible.

Getting it from the RIPE NCC was nice; that speaks to the great work the registry and database teams are doing, both in maintaining records and in making access easy for research like this. Without that data, and short of going through these ASes manually, I can only say that this problem has been happening in other RIRs as well, but I don't know to what degree.

So what can we do? The first question: is this a problem with RPKI? One in every five ASes we hand out comes with pre‑existing prefixes, and this allows you to escalate network privileges. But since this seems to be privilege-escalation month for Linux, maybe we are used to that by now. In theory it's not an RPKI problem, because RPKI works fine; this is just misconfiguration, not something in the protocol. And the protocol already works better than the IRR, because you have to keep re-signing these ROAs, so you are never going to end up with the garbage, multi-decade-old routes that exist over there.

The other thing is the policy that we have. If an AS is returned, it's kept out of circulation for six months, then returned to the pool and randomly reassigned. There is a very long and detailed process to make sure that it is cleaned up, but it doesn't include many items for RPKI; perhaps this is something we can expand on. In the interest of time: there is a 30-minute slot in the Address Policy Working Group session on Thursday. If you want to see any chart or data that I can prepare in the next two days, or if you want to discuss this and find a solution to the problem, please come to that session. Other than that, I am open to questions right now.
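
To make the policy just described concrete, here is a toy model of an ASN pool: returned ASNs sit in quarantine for roughly six months (approximated here as 183 days), then rejoin the pool, and each assignment is drawn uniformly at random from whatever is eligible. `AsnPool` is hypothetical and says nothing about how the RIPE NCC actually implements assignment; note that the quarantine only delays reuse and does nothing about ROAs that still name the ASN.

```python
import random
from datetime import date, timedelta


class AsnPool:
    """Hypothetical pool: uniform random assignment plus a quarantine for returned ASNs."""

    def __init__(self, free_asns, quarantine=timedelta(days=183), seed=None):
        self.free = set(free_asns)
        self.quarantined = {}             # asn -> date it becomes eligible again
        self.quarantine = quarantine
        self.rng = random.Random(seed)

    def return_asn(self, asn: int, on: date) -> None:
        self.quarantined[asn] = on + self.quarantine

    def _release(self, today: date) -> None:
        for asn, eligible_on in list(self.quarantined.items()):
            if eligible_on <= today:
                del self.quarantined[asn]
                self.free.add(asn)

    def assign(self, today: date) -> int:
        self._release(today)
        if not self.free:
            raise LookupError("no eligible ASNs in the pool")
        asn = self.rng.choice(sorted(self.free))  # sorted so a fixed seed is reproducible
        self.free.remove(asn)
        return asn
```

Making the pool larger, or the quarantine longer, both lower the chance that a freshly returned ASN (with its possibly stale ROAs) is handed out again soon.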

Thank you.

(APPLAUSE.)

BRIAN NISBET: Apparently you are going to get some questions. So. Yes, please.

SPEAKER: Job Snijders, RPKIviews. First of all, not a question but a compliment: fantastic presentation, really informative, thank you for putting together this analysis. Can you go back to slide 30, please?

ANTONIOS CHARLTON: Very slowly I can.

SPEAKER: You are not counting the slides as they go by?!

ANTONIOS CHARLTON: Is it this one?

SPEAKER: Yes, please. I am not a mathematician, but that's absolutely not random assignment, and I think we have a cache invalidation problem rather than a garbage collection problem. RIRs receive ASNs from IANA in blocks of 1,000, and over the years they amass a number of these buckets. If the RIRs were to do a uniformly distributed assignment from the IANA-received blocks, I think this inadvertent privilege escalation would dissipate in large part. Another approach could be a first-in, last-out style... But I think the core of the problem isn't garbage collection. Garbage collection is great, it's good to clean up ROAs that are pointing in the wrong direction, but there are consequences: maybe someone thought the ASN was unused and intentionally made a ROA point to an unused ASN, and suddenly it's "thanks RIPE, I was using this configuration for weird purposes". So random ASN assignment is a good approach here, and this is not random, I think. But I could be mistaken.

ANTONIOS CHARLTON: It doesn't look very random, but officially the policy is that these are random numbers. My thought on the topic is that we cannot garbage collect...

SPEAKER: Roll the dice once.

ANTONIOS CHARLTON: We cannot force-remove ROAs. What about delegated CAs and TAs? We can perhaps email them, or in the tool we can notify about stale records: are you sure you still need this? So we can do things, and that's what I was planning to discuss.

SPEAKER: The other thing: the cool-down period you mentioned is six months.

ANTONIOS CHARLTON: Six months after it's clean, yes.

SPEAKER: Maybe it's too short.

ANTONIOS CHARLTON: We can also increase it; that's an option.

SPEAKER: Again, thank you.

AUDIENCE SPEAKER: Really cool presentation, thank you very much for doing this. Two things. First, did you separate out the ASNs that had never been used before, or did you not clean those up? Some of this might be an artefact of someone fat-fingering their actual prefixes, and I don't think you have differentiated that in your numbers; it might be helpful to distinguish between never-used ASNs and ASNs that were recycled. Second, I am not sure there's a lot we can do, because the ROAs in this case were issued by the resource owner on the prefix side of things, not on the ASN side. How do you differentiate "this is stale" versus "this is a misconfiguration" versus "this is an intentional configuration"?

ANTONIOS CHARLTON: If it's been out of circulation for six months and two days before assignment it still has ROAs, it's probably a misconfiguration. There could be some researcher doing something on purpose with unassigned ASes; it's not scientifically proven, but most of it would be misconfiguration, like 99.something%.

SPEAKER: Tim... RIPE NCC. Thank you for presenting this, nice story. The RPKI dashboard is very much focused on showing people which announcements we believe might be happening for their prefixes, and it guides people to creating ROAs. What it does not do well is show people the ROAs they have that don't seem to correspond to any announcement, and there's actually a very high number of them. I am not sure if I dare say it, well, I will say it: 30 to 60% if you look at a world map, depending on the region. I am not sure whether the difference between regions is significant, but anyway, there's a lot. And I think there's work to do there for us as well, because we do want people... you can subscribe to the dashboard and get warnings, and I think we should probably extend that to warn people: are those ROAs perhaps stale, did you forget about them? I think we should do that better.
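
The warning Tim describes amounts to finding ROAs that authorise nothing currently visible in BGP. A minimal Python sketch, assuming VRPs as (prefix, max length, ASN) and observed announcements as (prefix, origin ASN); the function name is hypothetical and this is not how the RIPE NCC dashboard works. ROAs for space that is deliberately not announced (backup routes, AS0) would be flagged too, so the result is a list of candidates to ask the owner about, not ROAs to delete.

```python
from ipaddress import ip_network


def roas_without_announcements(vrps, announcements):
    """Return the VRPs that authorise none of the currently observed announcements.

    vrps:          iterable of (prefix, max_length, asn)
    announcements: iterable of (prefix, origin_asn) as seen in BGP
    """
    observed = [(ip_network(prefix), origin) for prefix, origin in announcements]
    unused = []
    for prefix, max_length, asn in vrps:
        roa_net = ip_network(prefix)
        in_use = any(
            origin == asn
            and route.version == roa_net.version  # checked first: subnet_of() rejects mixed versions
            and route.subnet_of(roa_net)
            and route.prefixlen <= max_length
            for route, origin in observed
        )
        if not in_use:
            unused.append((prefix, max_length, asn))
    return unused
```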

BRIAN NISBET: Please.

SPEAKER: Hello, I'm Tobias Fiebig. For reference, the assignments you are currently seeing there are temporary assignments where we let students experience real-world BGP; there's a RIPE Labs article on that if you are interested. We also noticed the issue you found and decided to do something about it, which is dropping an email saying: yo, you might have a problem there. We also saw that a lot of these pre‑existing ROAs trace back to real announcements, not misconfigurations. We also had those a year before, when we got these temporary assignments for the first time. And now I am going to say things; if they are incorrect, Marco is also in the mic line. Roughly two years ago, around the autumn meeting, I went through this very special temporary assignment process, and I noticed that approval and assignment seem to be cyclic and based on time. So if you know that there's a nice ASN in the pool, like 3310, you could time it. You need a unique prefix; I have a /29, so I have enough prefixes, and I could get a lot of ASNs. They are billed on 1 January, so if I return them a year later, it also doesn't hurt me. I think there were some changes to the pool procedure roughly two years ago. Also, for these temporary assignments at least, there's no real cool-down period: some of these ASNs were reassigned within a three-week period and ended up downstream somewhere, because they had already been reassigned. Difficult. Maybe we should take this a bit offline too.

ANTONIOS CHARLTON: Yeah, sure.

SPEAKER: Marco Schmidt, RIPE NCC. Thank you so much for this very entertaining and, yeah, attractive presentation. I agree with you that there's something that can be done better. I think I already have some ideas about the answers to some of the mysteries you discovered; we could dive deep into this one, there's a lot to unpack, and we can also talk offline. Thank you for raising it; I think something can be done between the RIPE NCC and the community to do it better.

ANTONIOS CHARLTON: Thank you very much.

SPEAKER: Gert, involved in policy. Great presentation. Just to point out, for these ROAs that seem to be stale: make the pool that we get from IANA bigger. There's no reason the NCC couldn't cycle through 10,000 or 100,000 32-bit ASNs, except for, I don't know... But the reason I came here is actually something else. We have the extended stats files from all the registries, and there's a certain assumption that these files are correct. If they aren't, there's a huge problem, and this really should be addressed. Personally I would just throw them all at Marco and say please deal with this and get it sorted out, but I don't have any data that says why these files are incorrect, so I think there's serious work to be done, because the registry data needs to be correct for all five registries all the time.

ANTONIOS CHARLTON: Yes, this is a good point. The AS I am using has three registration dates depending on who you ask, ranging from 1995 to 2020-something, so yeah.

CLARA WADE: We have Ruediger online.

RUDIGER VOLK: The comment and the question is: well, isn't the basic problem that ROAs are the responsibility of the address owner? If the address owner does not clean up what he allows by the ROAs, he is getting what he wants to get, and the community gets a little bit of collateral damage. The second thing could be that the address owners are missing monitoring tools that actually tell them that something about what they have put into the ROAs has changed. In my opinion, someone who is responsible for a resource and actually cares for it would be actually looking at someone... and unfortunately we haven't done a good job of providing such a monitoring service, but exactly that is one of the very important missing pieces in the RPKI. Not only for ROAs but for ROAs in space. Thank you.

ANTONIOS CHARLTON: Yes, I agree it's the address owner's problem, their fault in this case, but as a security engineer I want to help people not misconfigure things, and I want security by default. So yes, I think monitoring is the right choice. I am aware of commercial solutions, say non‑RIPE NCC solutions, but perhaps we can get some of this into the RIPE tools as well. So.

BRIAN NISBET: OK. Thank you very, very much.

(APPLAUSE.)

BRIAN NISBET: Lots more discussion this week. So, now we have Petros talking about OpenPenny. There will be a live demo in this presentation, so all good vibes, please.

PETROS GIGIS: Hi everybody, my name is Petros Gigis. I recently finished my PhD, and today I am going to talk about OpenPenny, a new open source tool that helps you identify legitimate, non‑spoofed traffic with high accuracy. The idea is simple: if we cannot trust that observed traffic really comes from where it claims to be from, many operational signals may become unreliable.

I'll start my presentation with an example. Let us consider that we are ISP‑A, with multiple points of presence around the world, e.g. New York, Rome and Tokyo, and there is a regional ISP, ISP‑B, that operates within the UK; ISP‑A provides global transit. We expect traffic from ISP‑B to arrive at ISP‑A through PoPs near the UK, for example in Italy or North America; this depends on the routing.

We have tools we can use to access telemetry: traffic volumes, packet counters, technologies like NetFlow and sFlow, or even routing data like BGP.

At some point we observe traffic with a source IP from ISP‑B arriving at an ingress point where we have never seen it in the past. We wonder what happened: did someone change the routing, is there a route leak, is it a policy violation? Normally we see this source prefix coming in via Europe or North America, and now we see it coming in somewhere else, so everything suggests that we have a routing change. Depending on our objective, we may consider that this shouldn't be happening, or that it's something we can accept.

But what if there was no issue. The incident may be entirely misleading.

This can happen if the traffic we are observing at our PoP in Tokyo is actually sent by someone else, spoofing a source address of ISP‑B. In that case we run into an ambiguity problem: we don't know whether there is a valid routing path from ISP‑B or someone is just sending spoofed traffic. Just looking at packet counters or traffic volume cannot tell us whether the traffic arrives at our PoP from the source it claims to be from.

And we may wonder why this matters. If we get misled and think there's a routing issue, we start troubleshooting and spend time on a routing issue that never existed. And if we can accurately tell that traffic comes from where it claims to be from, we can confirm customer-provider policy violations and also detect unexpected route changes based on the data plane. So the problem here is not looking for spoofed traffic; we are focusing on finding legitimate traffic very reliably.

Next is an example of a stealthy BGP hijack. In this graph there are six ASes. Let's consider that AS‑B announces a /16, and we can see how the path propagates towards AS‑A. Also, AS 3 implements ROV, while AS 2 does not.

At some point, a different AS, AS 101, announces a more specific /24 that is part of the /16. Because the neighbouring AS, AS 100, does not perform route origin validation, the route propagates: AS 2 propagates it and sends it to AS‑A, while AS 3 in this example filters it, because there is no VRP authorising it. The /24 attracts the traffic destined for this range, so now we can see there is a problem in the traffic. But what if AS 101 tries to be smarter and reroutes the traffic back onto the original path?

In that case the traffic still reaches the destination, and unless we have a vantage point within the ASes that installed the /24 route, we cannot see it on the control plane. But what if we could look at our ingress point and reliably tell: OK, I am now seeing traffic from A's IP space arriving where I didn't expect it? Again, without being able to tell this reliably, we wonder: is it a real path change, or just spoofed background noise? Unfortunately, from the control plane it's almost impossible to tell reliably. Existing tools are very useful: RPKI can tell us whether the origin is authorised, BGP monitoring tools can tell us whether the route changed, and NetFlow and sFlow can tell us where traffic enters our network. But none of these tools tells us reliably whether traffic comes from where it claims to be from, or is just spoofed noise.

With OpenPenny, our new open source tool, we are trying to add this missing data-plane signal. Now we move to the second part of the presentation, where I give you an idea of how OpenPenny is built. OpenPenny has two modes. The first is a passive mode: it doesn't interact with traffic, it just observes flows and tries to find suspicious aggregates. The idea is that if we see something that shouldn't be there, maybe we can see patterns, and if there's something in the patterns, we can move those aggregates to a more intrusive testing mechanism. As I said, passive mode collects TCP flow statistics: it looks at effects such as load balancing, it looks for out-of-order packets, and it looks for flows that were abruptly terminated or are unstable.
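
The passive statistics mentioned above can be illustrated with a small per-flow summary. This is a hypothetical Python sketch, not OpenPenny's code: it assumes packets arrive as (flow id, TCP sequence number, set of flags), ignores sequence-number wraparound, and uses an arbitrary 10% out-of-order threshold to decide which flows look worth an active test.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowStats:
    packets: int = 0
    out_of_order: int = 0
    highest_seq: int = -1
    saw_fin: bool = False
    saw_rst: bool = False


def summarise_flows(packets):
    """packets: iterable of (flow_id, seq, flags), where flags is a set such as {"FIN"}."""
    flows = defaultdict(FlowStats)
    for flow_id, seq, flags in packets:
        st = flows[flow_id]
        st.packets += 1
        if seq < st.highest_seq:
            st.out_of_order += 1          # arrived behind data we have already seen
        else:
            st.highest_seq = seq
        st.saw_fin = st.saw_fin or "FIN" in flags
        st.saw_rst = st.saw_rst or "RST" in flags   # abrupt termination
    return dict(flows)


def looks_suspicious(st: FlowStats, max_ooo_ratio: float = 0.1) -> bool:
    """Hypothetical heuristic: abrupt resets or lots of reordering earn a closer look."""
    return st.saw_rst or st.out_of_order / st.packets > max_ooo_ratio
```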

The second mode is the most interesting one, where we test traffic behaviour. The idea is to find legitimate flows: flows that we can be highly confident come from the source they claim to be from.

To do this, we rely on a probabilistic mechanism. The idea is the following: if we have traffic coming from somewhere we didn't expect, it may already be problematic, so what if we tweak a very small amount of the traffic? For example, we drop a few TCP packets and wait to see whether the protocol retransmits the missing packets.

Of course, it's not as simple as dropping one or two packets, because we have to deal with TCP quirks and external losses, we want to minimise user impact and not cause TCP to back off further, and we need to be aware of people who know how our mechanism works and will come up with a pattern to try to bypass it and lead us to a false conclusion.

We have found that if we drop 12 packets across thousands of packets, the chance for an attacker to bypass it is one in a million. If you want to know more about this mechanism and all the different implementation concerns, you can refer to our SIGCOMM '24 paper.
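
The intuition behind the drop test: a genuine sender receives the receiver's acknowledgements, notices the gap and retransmits, whereas a spoofer never sees those acknowledgements. Below is a deliberately simplified Python sketch of a per-flow decision, assuming we see data-packet sequence numbers in arrival order. Apart from "closed-loop", the outcome labels, and the rule that resent-but-never-dropped packets make the result inconclusive (as a blind retransmitter would produce), are illustrative assumptions, not the algorithm from the paper or OpenPenny.

```python
def classify_flow(arrivals, to_drop):
    """Simplified drop-and-watch test for one TCP flow (hypothetical).

    arrivals: data-packet sequence numbers in arrival order.
    to_drop:  sequence numbers whose first copy we deliberately drop.
    """
    seen, dropped, retransmitted = set(), set(), set()
    duplicates = 0
    for seq in arrivals:
        if seq in to_drop and seq not in seen:
            dropped.add(seq)            # first copy: we drop it
        elif seq in dropped:
            retransmitted.add(seq)      # the sender resent a packet we dropped
        elif seq in seen:
            duplicates += 1             # resent although we never dropped it
        seen.add(seq)
    if not dropped:
        return "untested"
    if duplicates:
        return "inconclusive"           # looks like blind retransmission
    if retransmitted == dropped:
        return "closed-loop"            # sender reacted to the receiver's feedback
    if not retransmitted:
        return "no-retransmission"      # consistent with spoofed traffic
    return "inconclusive"


print(classify_flow([1, 2, 3, 2, 4], {2}))  # the dropped packet 2 comes back
```

In practice external losses, reordering and adaptive attackers complicate this, which is why the real mechanism spreads a bounded number of drops across many packets and flows.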

So let me talk about OpenPenny. It's experimental software, available on GitHub. It supports fast packet processing: to do that we use XDP and DPDK, and we do the processing in user space. It supports flow aggregation, so we can test across multiple flows, which lowers the impact on traffic. It supports two interaction modes, one through a CLI and another through gRPC, and in our tests it can cope with tens of gigabits per second of traffic.

The deployment model we had in mind is this: we have an x86 box in the network, and whenever something doesn't look right, we send a small slice of the traffic to the x86 box for a few seconds, up to maybe a minute. The x86 box processes the traffic and then sends it back to the network. So it doesn't require any modification to the network itself; it only requires some ACLs to match the traffic and send it to the port where OpenPenny is connected.

And let me summarise what OpenPenny can help you achieve.

You can reduce false ingress alerts in your network by verifying that traffic arriving where it shouldn't be is legitimate, because finding legitimate traffic means there is a valid routing path from the ingress point to the source. It can lead to more observations and better visibility into stealthy path changes, and it may be used as evidence of policy violations. The tool is available on GitHub at the following link, under a BSD 2-Clause licence. If you have time or find issues, feel free to look at GitHub, and I have published a RIPE Labs article about the tool and our vision for it.

As the final part of the presentation, a demo: I have set up a testbed of three x86 boxes with 100G links. One x86 box is used as the source and one as the destination. Initially traffic goes directly between the two nodes, and when I configure OpenPenny to start testing the traffic, a command starts routing the traffic to the x86 node running OpenPenny, and the traffic is then redirected back to the destination.

OK. So in the terminal on the right is the box that runs OpenPenny, and I am going to use the daemon, which exposes a gRPC interface. We need to use eBPF rules. The next step is that I send gRPC messages to instruct the daemon to install the rule in eBPF and forward the traffic to user space. The message looks like this. We specify what we want to do: we can do prefix matching, but in this case I am just matching TCP, TCP on port 5201. There are also mechanisms in the tool for the drop probability, when to stop and how many flows to test, all documented on GitHub. In this case we are dropping one out of 1,000 data packets, but this is across multiple flows, and there's an upper bound on the number of drops, set to 12.
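The drop settings just described (one data packet in 1,000, counted across all matched flows, with a global cap of 12 drops) can be pictured as a small policy object. This is a minimal sketch assuming an independent per-packet drop decision; the class and field names are hypothetical and are not OpenPenny's actual API or gRPC schema.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DropSampler:
    """Hypothetical drop policy: drop matched data packets with a fixed
    probability until a global cap on drops (across all flows) is reached."""
    drop_probability: float = 1 / 1000  # e.g. one in 1,000 data packets
    max_drops: int = 12                 # upper bound across all tested flows
    drops: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def should_drop(self) -> bool:
        # Once the cap is reached, every further packet is simply forwarded.
        if self.drops >= self.max_drops:
            return False
        if self.rng.random() < self.drop_probability:
            self.drops += 1
            return True
        return False
```

Over 100,000 matched packets such a sampler would stop at exactly 12 drops, which is why the test only ever disturbs a tiny fraction of the traffic.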

And now I am going to ‑‑ that's the demo effect.

This is not normal! This is a new bug that I have never seen!

I don't even get what it's doing. Interesting. We had a video, can we play it? If I tried to do it 1,000 times, it's not going to end up like this.

Is it possible to use the video? OK, this is the page on GitHub and this is the architecture. We have the AF packet path and two pipelines, reactive mode and passive mode, and this is what I was telling you about previously, how you configure it through gRPC. In this video recording, we are using a 5% packet drop rate, and again we set a maximum of 12 packet drops.

And we also instructed it to stop after fully testing three flows. Again we start the daemon, and here we send the gRPC message, so we see the matching rules; the matched traffic bypasses the kernel and comes to user space, where the app is running. In the first part of the demo we are testing legitimate traffic, and the outcome is going to be that ‑‑ OK. Maybe ‑‑ yes.

So it's fast, because we only look at a small fraction of the traffic. In our test we drop 12 packets across multiple flows generated with iPerf, running, I think, on two CPUs, and we can see the other packets being processed and then the outcome: the outcome is closed loop, which means the traffic is legitimate. The second example is a case of mixed traffic, so we have both spoofed and legitimate traffic, and we can still detect the legitimate traffic there. If we had only aggregated and reached the outcome that there are no closed-loop flows, we would call it spoofed, so we then look at individual flows. We have a number of flows that we test, and in this example we detected two flows that are legitimate and 19 TCP flows that are spoofed; here I am generating traffic patterns that mimic spoofed traffic exactly. So we can see that in the individual flows we dropped six packets but didn't see any retransmission.
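The "closed loop" verdict rests on a simple observation: a real TCP sender that loses a packet retransmits it, while a spoofer never sees the loss. The sketch below classifies flows on that basis, assuming we record each dropped (flow, sequence number) pair and the sequence numbers seen afterwards; the function names and labels are hypothetical, and the real tool may apply additional checks such as timeouts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def classify_flows(
    dropped: Iterable[Tuple[FlowKey, int]],
    observed_after_drop: Iterable[Tuple[FlowKey, int]],
) -> Dict[FlowKey, str]:
    """Label each tested flow 'closed-loop' if a dropped TCP sequence number
    was seen again later (i.e. retransmitted), else 'no-retransmission'."""
    dropped_seqs: Dict[FlowKey, Set[int]] = defaultdict(set)
    for flow, seq in dropped:
        dropped_seqs[flow].add(seq)

    retransmitted: Set[FlowKey] = set()
    for flow, seq in observed_after_drop:
        # .get() avoids creating entries for flows we never dropped from.
        if seq in dropped_seqs.get(flow, set()):
            retransmitted.add(flow)

    return {
        flow: "closed-loop" if flow in retransmitted else "no-retransmission"
        for flow in dropped_seqs
    }


def aggregate_verdict(labels: Dict[FlowKey, str]) -> str:
    # One closed-loop flow is enough to show that some of the matched traffic
    # has a working return path to its claimed source.
    return "legitimate" if "closed-loop" in labels.values() else "likely spoofed"
```

In the mixed-traffic case from the demo, the per-flow labels are what separate the two legitimate flows from the spoofed ones, whereas the aggregate verdict alone would only say that some legitimate traffic is present.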

So, very briefly: spoofed traffic creates noise in monitoring systems, and OpenPenny focuses on finding any small portion of legitimate traffic that's coming in, which confirms a working route between source and destination. The project is funded by the RIPE NCC Community Projects Fund and we are very grateful for this. If you want to use it or try it, or if you have ideas for it, come to the mic or find me later during the coffee break or lunch. Thank you.

(APPLAUSE.)

BRIAN NISBET: OK. We have at least one question.

SPEAKER: Jen Linkova. Sorry, I probably missed something very simple here. In TCP, if the handshake completed, that gives you very high confidence that the connection is not spoofed. What is the point of trying to verify TCP flows instead of just looking at the handshake?

PETROS GIGIS: That's a very good question. Unfortunately, you can only get that signal if you are a company like Cloudflare, Google or Meta that terminates the connections. If you are a transit ISP, you see traffic transiting through your network, and you cannot see what the return traffic does. So the tool is built for networks that sit in the middle and don't see any state about the connection.

JEN LINKOVA: Asymmetric traffic flows, so you might not see a response at all. OK, thank you.

BRIAN NISBET: Any other questions? Nothing online, Clara? OK. Thank you very much.

(APPLAUSE.)

OK, and for our third and final presentation in this session, we have Elisa and Fred talking about redefining Netflix's BGP architecture.

ELISA VENNEGUES: Hello everyone, I am Elisa, I am a network architect at Netflix and I am here with Fred. Before we dive in, a quick show of hands: who is still running some form of BGP full mesh? We still do. OK, a few hands raised. So what we are going to present today is what we have done. It's not the one true BGP design, it's just what we came up with, and it suits our needs and what matters for Netflix. I am going to present what BGP architecture means for Netflix and how we operate our network; Fred will take over with the technical solution that we designed and deployed; then we will carry on with the validation process, what we tested and how we made sure that everything works as expected. I will present the migration process and the telemetry that we put in place to make sure that the migration is successful and easily trackable, and we'll end with the lessons learned and the conclusion. So what is the BGP architecture at Netflix?

So today we operate Open Connect, which is more than Meta, by the way. Those numbers can look high, but they are not so much; we are in the multi-terabit territory in terms of traffic delivery. We rely heavily on caches, both on our network and inside our ISP partners, so they deliver the traffic, and they operate BGP as well, so we know where to steer the traffic, and we operate in 175 countries around the world. That represents almost 90% of people, and we have around 400 points of presence worldwide; you can see on the map all the blue dots where we are located, both from the OCA perspective and in terms of our network footprint. From the edge routing perspective, we have 20,000 BGP sessions and we peer with nearly 3,000 different ASNs around the world. As you can see, we are a heavily peered network, and that matters when it comes to BGP; we have to handle it carefully, because that's basically what delivers the traffic on the ground.

So, I will quickly describe the functions and the different network devices that we operate on the BGP side. We have the switches, the ESs, which are basically routers, but the name comes from back in the day. They connect the OCAs, the caches which sit at our border, to our peering partners, and they run iBGP for the mesh and eBGP with our partners. We also have a new kind of device, aggregation routers, which aggregate the gaming appliances; those sit in our data centres and not in ISP partner networks at all. So that's different: they connect to the upstream ESs, they could be remote, in the middle, and they run mostly iBGP, for both IPv6 and IPv4. And on the right-hand side we have the provider edge, which connects every single tunnel to Netflix studios and M&A, so VPNv4 and VPNv6, and is the entry point for everything internal to our network. So those are the basic BGP functions that you can see in any kind of network, and that's how they relate to us.

And this is basically the current design, well, not so current any more, but it is where we were coming from: everything was iBGP full mesh between ESs and PEs, and we have OCAs, the caching appliances, doing eBGP. It is easy to operate and easy to reason about; the thing is, the scale we operate at means that every single device has to manage around 1,000 sessions just for iBGP. It doesn't scale well, and it's not easy when you want to modify any policies to exchange routes. We didn't have to do that much, because we don't carry user traffic over the backbone; we steer the traffic to the nearest point of exit. But gaming, a new service that we have to onboard, changed the way we operate: we may have to carry user gaming traffic over the backbone, and as such we now need to exchange routes within our backbone. So that's the main shift from what we were doing; we didn't have any need before to move away from the mesh, and that's why we need it now and not before.
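The scaling problem is plain session arithmetic: in an iBGP full mesh of n speakers, each router needs n - 1 sessions and the mesh has n(n - 1)/2 in total, whereas with route reflectors each client only peers with the reflectors. The sketch below is a simplified model that assumes one flat set of reflectors meshed with each other; the router counts in the usage note are illustrative assumptions, not Netflix's actual inventory.

```python
def full_mesh_sessions(routers: int) -> tuple[int, int]:
    """Return (sessions per router, total sessions) for an iBGP full mesh."""
    per_router = routers - 1
    total = routers * (routers - 1) // 2
    return per_router, total


def route_reflector_sessions(clients: int, reflectors: int) -> tuple[int, int]:
    """Return (sessions per client, total sessions) when every client peers
    with every reflector and the reflectors are fully meshed with each other."""
    per_client = reflectors
    total = clients * reflectors + reflectors * (reflectors - 1) // 2
    return per_client, total
```

For example, 1,001 routers in a full mesh means 1,000 sessions per router and 500,500 sessions overall, while 1,000 clients served by a pair of reflectors need only two sessions each and 2,001 in total.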

So when it comes to defining the new architecture, I think getting a clear sense of where we want to go and what we want to achieve is key. Scalability is definitely something we have to manage, and that was not easy with the iBGP mesh. Automation is something we rely on heavily; that was no problem at all even with the mesh, since everything was automated, but having something easier to manage and more flexible to adjust is definitely something we need to develop and carry on with.

And resilience, obviously: we cannot afford to fail, so that's definitely key. And observability is what matters when you operate at that scale. So over to Fred for the technical solution.

FREDERIC CUILLER: Thanks for the introduction and I will walk you through what we have deployed together at Netflix.

When trying to move away from an iBGP full mesh, there are a couple of possible ways. The first would be to use BGP confederations, but they tend to be inflexible, they are very hard to operate, and they are not really widely deployed any more. On the other hand, BGP route reflectors are a proven solution and they can scale very high. That is what we have chosen to deploy at Netflix, and we decided to go with dedicated out-of-path route reflectors, meaning the route reflectors only handle BGP control-plane traffic; they are not taking part in the forwarding plane at all. When deploying BGP route reflectors, there are a couple of architecture questions you need to answer. How many do you need? Most likely more than two, but how many more? Where should you deploy the route reflectors? We'll see what we decided to do. How should you interconnect them in the backbone? For example, we decided to dual-attach each route reflector to two individual ES routers to maximise availability. We also used 100 gig interfaces, as on most route reflector platforms available on the market; not that we need to process 100 gig of BGP traffic, but we wanted to stick to the standard and make things easier to deploy.

For the rest, the design was very conservative; we said it has to work and it has to be very stable. These are critical pieces of equipment in the network, so we stick with plain BGP to carry BGP traffic inside or between the different ESs, and for the rest we are conservative: we are using the default BGP timers, nothing tweaked.

One key problem of route reflectors is that they introduce suboptimal routing, which we don't want; with a full mesh you have optimal routing. The first way to address that is BGP Optimal Route Reflection, which does what it's supposed to do. We didn't go that way, for the following reason:

The way BGP ORR works, you essentially configure a number of roots on the reflector and put clients into specific groups, but the more groups you have and the more BGP paths you have on your reflector, the longer it takes to converge, so we didn't want to go this way. Instead we went with the following design: we use BGP Additional Paths. For internet routes, so roughly one million IPv4 and 256,000 IPv6 routes, we send the BGP best path plus a backup; for the rest, the tactical routes, which are flagged with special BGP communities, we use BGP additional paths to send everything to the clients and let the clients do the work.
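The advertisement policy just described can be sketched as a per-prefix decision: paths flagged with a "tactical" community are all sent (add-path), everything else gets best plus one backup. This is a minimal sketch assuming a hypothetical community value and a deliberately tiny best-path comparison (LOCAL_PREF, then AS_PATH length); real BGP selection has many more tie-breakers, and the actual communities used are not given in the talk.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical community marking "tactical" routes; not the real value.
TACTICAL_COMMUNITY = "65000:100"


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int
    communities: frozenset = frozenset()


def rank_key(p: Path):
    # Simplified best-path order: higher LOCAL_PREF wins, then shorter AS_PATH.
    return (-p.local_pref, p.as_path_len)


def paths_to_advertise(paths: List[Path]) -> List[Path]:
    """Select which paths the reflector sends to clients for one prefix."""
    ranked = sorted(paths, key=rank_key)
    if any(TACTICAL_COMMUNITY in p.communities for p in paths):
        return ranked      # add-path "all": let the clients choose
    return ranked[:2]      # best path plus one backup
```

The design choice this illustrates is that the reflector does not try to compute each client's optimal path (as ORR would); it ships enough paths that the client's own best-path selection can do the job.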

Now that we have a design, we have to pick a platform. This summarises the options available on the market. It's vendor dependent, right: every vendor on the market today is able to provide a different solution. There are traditional routers, whose main function is to perform traffic forwarding. You could go with a VNF, a virtual machine running over a virtualisation layer, or you can go with a fully integrated appliance, where the network operating system essentially runs on bare metal on a big server. What we want is scale and performance, and what's better than a big server with a lot of CPU and a lot of memory? The good thing is there are no bad options. Different criteria lead to the right solution for you: some of them are technical, for example scale, performance, etc.; some of them are not technical but related to the organisation. For example, if you decide to go with a VNF, is your operations team capable of deploying, operating, securing and optimising the underlying virtualisation layer? For Netflix we decided to go the appliance way, which provided the best performance available.

RIPE is also a good opportunity to share best practices with the community, and here again they are not vendor specific; they apply to any solution you already have today.

To summarise, here are different things you should be doing, and if you are not doing them today, maybe you should reconsider. I have grouped them into several families. For security, for example, you want to make sure you are doing control-plane policing and that you have the values right, not too low and not too high. For scale, a couple of techniques are available as well, especially if you have a high number of clients. You might let the clients initiate the BGP sessions, which is especially useful when you are doing software upgrades: instead of the route reflector trying to initiate 2,000 BGP sessions at once, you let the clients initiate the BGP and TCP sessions. Similarly, when you are using reflectors, you are not in the forwarding plane, so you don't need to install all the routes in the RIB; you can skip that using some features. If you are doing VPNv4, consider using Route Target Constraint to do some filtering. And you want to make sure you are packing as many prefixes as possible into each BGP update: enable update packing if it is not enabled by default.
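Update packing works because a single BGP UPDATE carries one set of path attributes followed by any number of prefixes that share them. The sketch below groups prefixes by identical attributes; as a simplifying assumption it caps each message by prefix count, whereas a real speaker is bounded by the message size limit (4,096 bytes in RFC 4271). The attribute encoding here is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute encoding, e.g. (("next_hop", "192.0.2.1"), ...)
Attributes = Tuple[Tuple[str, str], ...]


def pack_updates(
    routes: Dict[str, Attributes], max_prefixes: int = 500
) -> List[Tuple[Attributes, List[str]]]:
    """Group prefixes that share identical path attributes into as few
    UPDATE messages as possible (prefix count stands in for message size)."""
    by_attrs: Dict[Attributes, List[str]] = {}
    for prefix, attrs in routes.items():
        by_attrs.setdefault(attrs, []).append(prefix)

    updates: List[Tuple[Attributes, List[str]]] = []
    for attrs, prefixes in by_attrs.items():
        # Split large groups so no single message exceeds the cap.
        for i in range(0, len(prefixes), max_prefixes):
            updates.append((attrs, prefixes[i:i + max_prefixes]))
    return updates
```

With four prefixes spread over two distinct attribute sets, packing yields two messages instead of four; the saving grows with table size, which is what matters when reflecting millions of paths.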

Most BGP stacks in the industry today use the concept of update groups or peer groups, not only because it helps with configuration but also for optimisation: a BGP update is built once and replicated across all the clients, which makes things very efficient. While that is very good for speeding up convergence, it has some constraints, I would say. If a single client in this big group is a little bit slower, it will slow down the rest of the group and negatively impact your convergence times. You want to make sure you deploy something that can detect slow peers and mitigate them; the way we mitigate is to isolate the slow clients in dedicated groups.
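The slow-peer problem and its mitigation can be modelled with a per-member backlog: every update built for the group is queued to every member, and a member that drains slowly accumulates a growing queue. This is a toy model under that assumption; the class, the backlog threshold and the group naming are hypothetical and do not reflect any vendor's actual slow-peer detection.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UpdateGroup:
    """Toy update group: each update is built once and queued to every
    member; a member's backlog grows if it drains its queue slowly."""
    name: str
    backlog: Dict[str, int] = field(default_factory=dict)  # peer -> queued updates

    def replicate(self, count: int = 1) -> None:
        # One formatted update is replicated to every member's queue.
        for peer in self.backlog:
            self.backlog[peer] += count

    def drain(self, peer: str, count: int) -> None:
        self.backlog[peer] = max(0, self.backlog[peer] - count)


def isolate_slow_peers(group: UpdateGroup, threshold: int) -> UpdateGroup:
    """Move members whose backlog exceeds `threshold` into a dedicated group
    so they no longer hold back the rest of the original group."""
    slow = {p: q for p, q in group.backlog.items() if q > threshold}
    for peer in slow:
        del group.backlog[peer]
    return UpdateGroup(name=f"{group.name}-slow", backlog=slow)
```

For instance, if three clients receive 100 updates and one of them has only processed 10, that client alone is moved to a "-slow" group while the other two keep sharing one replicated update stream.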

Those are the production numbers; we are operating at very high scale here, and there are big differences between the regions. The smallest one is LATAM, with over 4 million BGP paths, and the biggest is North America, with almost 50 million BGP paths. There is a pair of reflectors per geography, in a single cluster, and we have a full mesh between the route reflectors, a very simple design. As you can see, we have placed the route reflectors as close as possible to the clients. It didn't make sense to deploy route reflectors here. Obviously, before deploying things in production, we went through a validation phase: after the vendor validation, together with Netflix, we entered what we call the early field trial phase. It's a process at Cisco where we co-design and co-test products while they are still under development. It's a win-win partnership, because it allows customers to speed up validation and also allows us as a vendor to expose and test solutions outside of our labs; the idea is to catch and fix issues before deployment. I would like to tell you we didn't have any issues, but of course there are things we couldn't catch in the lab, only in production, for example telemetry at high scale: you need production scale to be able to catch those.

Other than that, it was a very traditional validation, I would say: we had a document, we ran the tests, we consolidated everything. On this screenshot you can see the production convergence time for IPv4, meaning that behind the scenes IPv6, VPNv4 and VPNv6 are also converging. That was after a route reflector reload, to make sure everything was working fine: 0 minutes to converge 50 million BGP paths. Which is pretty fast.

Over to Elisa.

ELISA VENNEGUES: So the big question is how do we get there, from a design and a concept to the real world of our backbone. We came up with a phased migration, so we can run the two control planes, the mesh and the new one, in parallel, and safely and smoothly migrate from what we run today to what we will run tomorrow. So in 2025, when we first started getting the devices, we started with on-site installation and turn-up. Pretty easy, depending on the country you are sending those boxes to. Then came staging: the route reflectors peer with the ESs and become part of the iBGP mesh, and we started with some pilot prefixes carrying a specific BGP community so we could track them, releasing their advertisement from the routers so that we could mimic, compare and validate that the configuration on the reflectors was acting as expected. That was a very interesting phase, where we were able to spot anything going wrong in the advertisement of those pilot prefixes. Then we released the advertisements from all the devices, and we were able to assess the scaling limits of the boxes, but also all our tooling against those route reflectors. That's when we spotted the gNMI issue that was triggering 99% CPU utilisation on the boxes, because we were using a vendor-specific YANG model that is not OpenConfig and was not optimised on the route reflector; basically we were pulling the whole table every time, and that was too intense for the boxes.
So that was a good step to assess everything at scale before we moved to the next phase, which is a per-device migration: we set a tag on a per-device basis, and it triggers the deployment automatically to the client. What it does is adjust the route policies, in and out, on the route reflector and on the client. We did it in a way that didn't impact traffic: we released the routes to the client, but we don't prefer them on the client side when we receive the prefixes, so that we can easily compare what we receive from the mesh and from the route reflector. The route reflector is not the preferred path, so we still operate as before, and then we swap that preference when we are confident that the routes we have on the client are what we expect.
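The shadow phase and the later swap can be illustrated as an import policy that sets LOCAL_PREF depending on where a route was learned. This is a minimal sketch assuming hypothetical LOCAL_PREF values and a best-path step that compares only LOCAL_PREF; the talk does not give the real values or policies.

```python
from typing import Dict

# Hypothetical LOCAL_PREF values; the real ones are not given in the talk.
PREFERRED, DEPREFERRED = 200, 50


def import_local_pref(source: str, rr_preferred: bool) -> int:
    """LOCAL_PREF a client assigns to a route depending on whether it was
    learned from the legacy iBGP mesh ('mesh') or a route reflector ('rr')."""
    if source == "rr":
        return PREFERRED if rr_preferred else DEPREFERRED
    return DEPREFERRED if rr_preferred else PREFERRED


def select_source(available: Dict[str, str], rr_preferred: bool) -> str:
    """Pick which copy of a prefix wins; `available` maps source -> next hop.
    Only LOCAL_PREF is compared here; real best-path has more steps."""
    return max(available, key=lambda s: import_local_pref(s, rr_preferred))
```

Before the swap the mesh copy wins even though the reflected copy is present for comparison; after the swap the reflected copy wins, and if it is missing the mesh copy still carries the traffic.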

So we swap the preference tag, which swaps the preference in the route policy so that route reflector routes are preferred over mesh routes. That was done per device across all our devices, in batches. When we were confident enough, using some scripts to compare before and after, we migrated all the devices to this new state. We still operate both, so if something is not covered by the route reflectors, we still have the mesh. We are now at the stage where we have shut down the mesh at the peer-group level on the clients and rely only on the reflected routes, and the final stage will be to remove the iBGP mesh. We rely heavily on BGP communities to tag routes, so we know what we receive and where those routes need to be reflected. That's how we fine-tune and filter what we send back to each client, when we know a route is needed on that client. That's something I didn't mention before, but it is how we reduce the number of paths reflected back to the clients, because obviously we cannot send 50 million paths.
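The before-and-after comparison scripts mentioned above are not shown in the talk; the sketch below is one hypothetical way to gate the preference swap, assuming each side is reduced to a prefix -> best next hop mapping. Because the reflectors deliberately filter by community, a real comparison would first restrict both views to the prefixes that are supposed to be reflected to that client.

```python
from typing import Dict, Set


def compare_ribs(mesh: Dict[str, str], rr: Dict[str, str]) -> Dict[str, Set[str]]:
    """Compare prefix -> best next hop as seen via the iBGP mesh and via the
    route reflectors, and report the differences by category."""
    missing = set(mesh) - set(rr)
    extra = set(rr) - set(mesh)
    differing = {p for p in set(mesh) & set(rr) if mesh[p] != rr[p]}
    return {
        "missing_via_rr": missing,
        "only_via_rr": extra,
        "next_hop_mismatch": differing,
    }


def safe_to_swap(report: Dict[str, Set[str]]) -> bool:
    # Gate for preferring reflected routes: nothing missing, no mismatches.
    return not report["missing_via_rr"] and not report["next_hop_mismatch"]
```

Treating extra reflected prefixes as informational rather than blocking is a design choice in this sketch; a stricter gate could also require `only_via_rr` to be empty.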

We came up with some telemetry, which we thought was an easy way to operate and monitor that the migration is going down the right path. We mostly target OpenConfig models, so that we know we can use the same paths for every device in our network, independently of the vendor.

We monitor the BGP session counts, the sessions established on the route reflectors, so we know if something goes wrong at any point in time, and we also monitor the number of routes received and the path counts per address family that are reflected back to the clients. That gives us a very good view during the migration, and even after, if anything happens on the route reflectors.
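A check on top of that telemetry might compare consecutive samples and flag sharp drops in established sessions or per-address-family path counts. This is a minimal sketch assuming the streamed values have already been flattened into a metric -> integer mapping; the metric names and the 10% threshold are hypothetical, not OpenConfig paths or Netflix's actual alerting rules.

```python
from typing import Dict, List


def telemetry_alerts(
    prev: Dict[str, int], curr: Dict[str, int], max_drop_pct: float = 10.0
) -> List[str]:
    """Compare two telemetry samples (metric -> value), e.g.
    'sessions_established' or 'paths/ipv4-unicast', and flag any metric that
    fell by more than `max_drop_pct` percent or vanished from the sample."""
    alerts: List[str] = []
    for metric, before in prev.items():
        after = curr.get(metric)
        if after is None:
            alerts.append(f"{metric}: missing from latest sample")
        elif before > 0 and (before - after) * 100.0 / before > max_drop_pct:
            alerts.append(f"{metric}: {before} -> {after}")
    return alerts
```

A small dip in sessions (say 2,000 to 1,990) passes, while losing a fifth of the IPv4 paths or an entire address family is reported.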

And now I hand over to Fred for the lessons learned and conclusion. Thank you, Fred.

FREDERIC CUILLER: There you have it, that's the story together. Good old BGP routing; we are finally done. It took us more time than initially planned: it was supposed to be a simple BGP route reflector project, but as always in network engineering, there are always things going on. But we are done, the migration is now over, so that's good.

We made all these changes and implemented this project without, I believe, breaking anything, and nobody noticed anything on the network, right? Netflix delivered new services such as live and gaming, which were not possible before. There are still a couple of things we'd like to do, a couple of projects around visibility especially. We are doing BGP telemetry at high scale and high frequency, every 30 seconds, but remember, telemetry is all about streaming states or counters, and now we would like to go deeper. To do that, we have to rely on the BGP Monitoring Protocol, and that will most likely be the next project we work on together. Again, there are still some architecture questions to consider: should we enable BMP on the route reflectors, which gives us a central point with full visibility, or should we run BMP at the border of the network instead?

That's it for today. I wish you an excellent rest of the week at RIPE. We are happy to take any questions during the Q&A section or even afterwards if you want. Thanks everyone.

(APPLAUSE.)

AUDIENCE SPEAKER: Hello. Far from a simple project, really impressive, changing the plane in flight. So, a couple of questions. The first is rather rhetorical: from what I understood from your presentation, you went with the Cisco solution for your route reflectors. Do you rely solely on a single vendor for this? Wasn't there a concern there? Do you deploy multiple vendors?

ELISA VENNEGUES: So, today there are two sides of the coin when it comes to delivering traffic at Netflix. We rely on the route reflectors for everything network related, and when it comes to serving the clients, we also have a BIRD stack which gathers all the routes from the ESs at a local, you know, IX site, and that is what feeds the steering today. So we basically have two control planes: one for the backbone itself and one for customer traffic delivery.

SPEAKER: For the backbone you just utilise the...

ELISA VENNEGUES: We run mostly on the client side, and as of today we rely only on the Cisco stack for the route reflectors.

SPEAKER: Thomas Strikx, Cloudflare. Thank you. It's super difficult to change the engines on a plane mid-flight, so super impressive. As an operator of a previous large network, we did traffic engineering on a day-to-day basis to mitigate congestion, and moving to a route reflector setup has a massive impact on how you do traffic engineering day to day. Was that an impact you noticed on the operational teams, was it something you needed to consider at all, or is traffic engineering not a problem for Netflix?

ELISA VENNEGUES: By traffic engineering, do you mean steering the traffic to the client?

SPEAKER: No, when you steer the traffic, I guess, to the eyeball that's requesting the data, through your backbone, so you have to change an interface to reroute around a broken part of the backbone.

ELISA VENNEGUES: The way we steer the traffic is that we own the client and we own the server, right, so we just send a trigger to the client to re-steer. That's how we operate steering at Netflix, and it's not based on the route reflectors today.

SPEAKER: So you don't have to do any BGP traffic engineering at all?

ELISA VENNEGUES: We do, but not for customer traffic.

SPEAKER: Thank you.

SPEAKER: Thanks, this is really a great insight into a large network. I wanted to ask whether you are using BGP Prefix Independent Convergence, if you have that enabled; I know it's super useful if you have it enabled in the rest of the network as well.

ELISA VENNEGUES: We do.

SPEAKER: I found that enabling BGP PIC on the normal internet address families mitigated the need for extensive optimal route reflection or add-path, because it always installs a second path anyway, so you have a backup path that it will use, in multipath and so on.

ELISA VENNEGUES: That's a very valid point. The reason we had to use add-path is that we wanted to reflect some routes to other metros. Because the route reflector is centralised, we need those additional paths, and we combined them with BGP communities that define the metro, matching on those communities so that we know for sure we are reflecting the right paths. So for us, PIC wasn't enough for the use case we are trying to solve.

SPEAKER: Thanks.
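The point about BGP PIC in this exchange is that the backup path is installed ahead of time in a shared forwarding object, so failover is one state change rather than a per-prefix rewrite. The sketch below is a toy model of that idea, assuming prefixes with identical primary/backup next-hop lists share one path list; the class and function names are hypothetical and not any router's internal API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class PathList:
    """Shared forwarding object: primary next hop first, then pre-computed
    backups. Many prefixes point at the same PathList."""
    next_hops: List[str]

    def active(self, down: Set[str]) -> str:
        for nh in self.next_hops:
            if nh not in down:
                return nh
        raise LookupError("no usable next hop")


def build_fib(prefix_paths: Dict[str, List[str]]) -> Dict[str, PathList]:
    # Prefixes with identical (primary, backup) lists share one PathList, so
    # failover cost does not depend on how many prefixes use it.
    shared: Dict[Tuple[str, ...], PathList] = {}
    fib: Dict[str, PathList] = {}
    for prefix, nhs in prefix_paths.items():
        key = tuple(nhs)
        if key not in shared:
            shared[key] = PathList(list(nhs))
        fib[prefix] = shared[key]
    return fib
```

Marking the primary next hop as down switches all 1,000 prefixes in the usage test to the backup at once; as Elisa notes, this does not replace add-path when the goal is to reflect alternative paths toward other metros.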

SPEAKER: Paolo, co-chair of the GROW working group at the IETF. BGP telemetry is very important, and I was curious why you went, let's say, the OpenConfig way?

FREDERIC CUILLER: I can tell you the story.

SPEAKER: I was curious whether it was a limitation. And a second thing: there is quite some work being done right now in the working group around the use case of BMP on route reflectors, so I guess your experience with this deployment would be very beneficial if at some point you want to take it back there.

FREDERIC CUILLER: I can take the two questions. For the second one, you are most likely in touch with people we already work with, and I am happy to have Netflix participating in this working group. For the first one, this is the story: we initially went with the Cisco YANG model because it was richer. The thing is, at Netflix scale, with 50 million BGP paths and a very high frequency, like every 30 seconds, we hit a software implementation issue, and it turned out that optimisations done for OpenConfig were there but had not been backported to the Cisco YANG model. It was a good thing, because we wanted to standardise as much as possible on OpenConfig, so we have one model for the route reflectors and the clients... To be precise, the route reflector as a whole was not at 99% CPU utilisation, right. Of all the BGP threads we have, there is one responsible for management tasks, which includes telemetry, show commands, etc., and that single BGP thread was running at high CPU. The quick fix was to switch to the OpenConfig model, which did not exhibit this behaviour because it had a caching optimisation, so we were not walking the 50 million BGP paths. What we have done since is backport the OpenConfig optimisation to the Cisco model, so now you can use whichever you want.
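The caching optimisation is not described in detail, but the general idea of "not walking the 50 million BGP paths" on every telemetry sample can be illustrated by maintaining counters incrementally as paths change. This is a hedged toy model of that general technique, not the router software's actual implementation; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def count_by_walking(table: Dict[Tuple[str, str], str]) -> Counter:
    """O(total paths) per sample: walk every (afi, path_id) entry."""
    return Counter(afi for (afi, _path_id) in table)


class CachedPathCounter:
    """Keep per-AFI counts up to date as paths are added or withdrawn, so a
    telemetry sample costs O(number of AFIs) instead of a full table walk."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, str], str] = {}
        self.counts: Counter = Counter()

    def add(self, afi: str, path_id: str, next_hop: str) -> None:
        key = (afi, path_id)
        if key not in self.table:
            self.counts[afi] += 1  # only genuinely new paths change the count
        self.table[key] = next_hop

    def withdraw(self, afi: str, path_id: str) -> None:
        if self.table.pop((afi, path_id), None) is not None:
            self.counts[afi] -= 1

    def sample(self) -> Dict[str, int]:
        return dict(self.counts)
```

Both approaches report the same counts; the difference is that the cached version pays a small cost on each update instead of a full walk every 30 seconds.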

BRIAN NISBET: OK. Thank you very much.

(APPLAUSE.)

OK, so that concludes our session for now. Please rate the talks and nominate yourself for the PC. Make sure you register for the GM, and enjoy your lunch. Thank you very much.

(Lunch break)