Hi Brian,
On 11-02-2022 15:49, Brian King wrote:
Hello Wikitech,
I’m a new SRE on the Search team. As Ryan and I are in the middle of relocating many
large shards on our Elastic cluster, I wanted to ask your thoughts about using jumbo
frames and/or LACP for our physical Elastic nodes. We’re also moving to new hardware
with 10Gbps networking, so it seems like a good time to start optimizing our network
settings.
Let us know what you think (and please feel free to suggest any other optimizations).
Unexpected question on this list. Link aggregation (LAG) is an easy way
to add more bandwidth without changing the logical setup. In core
networks I'm used to using only LAGs (even with a single member) to
allow for future growth. LACP is the protocol of choice because it's
open and widely supported. Do use fast mode (LACPDUs every 1 second
instead of every 30 seconds). For critical links you can also use
micro-BFD, but that seems like overkill in your case. If you have
switches that support MC-LAG, you can connect each node to two different
switches and only lose bandwidth, not connectivity, when one of the
switches is unavailable.
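
To put a number on the fast mode advice, here is a rough Python sketch. It assumes the usual LACP behaviour of timing out a partner after three missed LACPDUs; the function and constant names are made up for illustration. A plain link-down is normally noticed right away regardless, so this only concerns a member that stays up but goes silent.

```python
# Rough sketch: how long a silent LAG member can go unnoticed with LACP.
# Assumption: the partner is timed out after three missed LACPDUs.

LACP_INTERVAL_S = {"fast": 1, "slow": 30}  # LACPDU transmit interval per mode
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout_seconds(mode: str) -> int:
    """Return the time after which a silent partner is timed out."""
    return LACP_INTERVAL_S[mode] * MISSED_PDUS_BEFORE_TIMEOUT


if __name__ == "__main__":
    for mode in ("fast", "slow"):
        print(f"{mode}: {lacp_timeout_seconds(mode)} s")
```

Under that assumption, fast mode drops a silent member after a few seconds, while slow mode can keep sending traffic into it for a minute and a half.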
Every network should be built for jumbo frames, with a physical MTU
around 9216 (different vendors calculate it in different ways) and an
IP MTU of 9000. That gives better performance for bulk traffic such as
backups, and probably less benefit in other cases. Standard switches and
routers usually handle it without problems, but firewalls and load
balancers might be problematic.
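
A back-of-the-envelope sketch of where the gain for bulk traffic comes from. The assumptions are mine: untagged Ethernet, IPv4 and TCP without options, and a link that is kept completely full. The helper names are only illustrative.

```python
# Per-frame overhead in bytes. Assumptions: untagged Ethernet,
# IPv4 and TCP headers without options.
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20        # IPv4 header + TCP header


def payload_efficiency(ip_mtu: int) -> float:
    """Fraction of the bytes on the wire that is TCP payload."""
    return (ip_mtu - IP_TCP_HEADERS) / (ip_mtu + ETH_OVERHEAD)


def frames_per_second(ip_mtu: int, link_bps: int = 10_000_000_000) -> float:
    """Full-size frames per second needed to fill the link."""
    return link_bps / ((ip_mtu + ETH_OVERHEAD) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
              f"{frames_per_second(mtu):,.0f} frames/s at 10 Gbps")
```

With these assumptions the payload share only rises from roughly 95% to 99%, but the number of frames to process at line rate drops by almost a factor of six, which is where hosts tend to notice the difference.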
Do buy network interfaces that support the proper offloading, because
otherwise the work will land on your CPU. I recall some cases in the
past where the slightly cheaper cards didn't support VXLAN offloading
and the like. Not sure if that's still a thing these days.
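
On Linux, one way to check what a card offers is the feature list printed by `ethtool -k <interface>`. Below is a minimal Python sketch that parses such a listing. The sample output, the interface name and the list of wanted features are assumptions for illustration, so adjust them to what your hardware and kernel actually report.

```python
# Hypothetical excerpt of `ethtool -k eth0` output, for illustration only.
SAMPLE_OUTPUT = """\
Features for eth0:
tcp-segmentation-offload: on
generic-receive-offload: on
tx-udp_tnl-segmentation: off [fixed]
"""

# Illustrative wish list; tx-udp_tnl-segmentation covers UDP tunnels like VXLAN.
WANTED = [
    "tcp-segmentation-offload",
    "generic-receive-offload",
    "tx-udp_tnl-segmentation",
]


def parse_ethtool_features(output: str) -> dict[str, bool]:
    """Map each feature name to True when it is reported as 'on'."""
    features = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for ...:" header and blank lines
        features[name.strip()] = value.split()[0] == "on"
    return features


def missing_offloads(output: str, wanted: list[str]) -> list[str]:
    """Return the wanted features that are absent or switched off."""
    features = parse_ethtool_features(output)
    return [name for name in wanted if not features.get(name, False)]


if __name__ == "__main__":
    print("missing:", missing_offloads(SAMPLE_OUTPUT, WANTED))
```

A feature marked `off [fixed]` cannot be switched on, which is the case to look out for before buying in bulk.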
Oh, and if you're not using 10GBASE-T: do buy the right optics. 10GBASE-SR
(multimode, 850 nm, usually black) and 10GBASE-LR (singlemode, 1310 nm,
usually blue) don't mix :-)
Maarten