157 points by hjuutilainen 774 days ago | 34 comments

runjake 774 days ago [-]

The author states they have evolved from Ansible to pyATS[1], but pyATS is a Cisco project. Given Cisco's poor track record with code projects and open source, I'm not sure how this is much of an improvement, and IMHO, it's arguably worse.

For possible alternatives, check out NAPALM[2] and Nornir[3].

It's also worth checking out Python for Network Engineers[4].

1. https://developer.cisco.com/docs/pyats/

2. https://napalm.readthedocs.io/en/latest/

3. https://nornir.readthedocs.io/en/latest/

4. https://pyneng.readthedocs.io/en/latest/index.html

xnyanta 774 days ago [-]

Had the same reaction as soon as I found out pyATS is a cisco-specific thing. I run very simple networks for events on shoestring hardware/budgets and built a simple wrapper around my own object model using python, jinja and napalm to deploy cisco switches via SSH. Has terraform-like semantics (plan/apply) and lets me be productive and eliminate config drift. Napalm does all of the heavy lifting, it is fantastic. I will probably be integrating it with netbox soon.

batch12 774 days ago [-]

Looks like he works for Cisco at the moment. Maybe that has something to do with it.

nu11ptr 774 days ago [-]

I do network automation professionally. I build tools (technically compilers) that take a proprietary object model designed for our private cloud and translate it into Ansible (v1) or Terraform (v2) code. At our company, I actually describe using these tools in isolation as doing it "manually". This is because the largest benefit of automation, I believe, is the abstraction gained from the new object model and being able to generate and store the inputs for Ansible/Terraform in a database. If you have to track and specify all the inputs into Ansible/Terraform and write the playbooks/HCL manually, in my experience you don't actually save all that much work. However, when you have an object model specifically designed for your use case, you can deliver a new client network in literally minutes (essentially nothing more than the cloud model, exactly what AWS, Azure, etc. do for their networking). The downside is most enterprises don't have people like me to write the code to do this, and writing it for a single deployment would likely not see the gains that we see as a managed service provider.

totallywrong 774 days ago [-]

Isn't that a lot of words to say that you have a custom set of Terraform modules for your needs? If you're describing a different or better way to do it I'm missing it.

nu11ptr 774 days ago [-]

No. It is a frontend application that works as a CRUD REST API, validates the data, generates what it can, and stores it into a database/IPAM. It can then be changed, viewed, modified, deleted, etc.

When you are ready to deploy I "compile" the object model data into an IR (representing the "network topology") and then make a final pass and translate into HCL for all the various backends.

I'm not saying it's "better", as it has trade-offs. I'm saying that for networks specifically, it is the only way I've seen in the real world to get lots of value out of these tools. Otherwise the network engineers end up spending all their time looking up the input data (VLANs, subnets, IPs, etc.), which is the most time-consuming part of manual configuration as well. The validation and auto-generation of the input data is where the value comes in.
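A minimal sketch of the "validate, generate what it can, then compile" flow described above. Everything here is an assumption made for illustration: the `Segment` fields, the first-usable-address gateway convention, and the `example_vlan` resource type are all made up, and the real tool is internal.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stored object from a hypothetical model: a client network segment."""
    name: str
    vlan_id: int
    cidr: str


def compile_segment(seg: Segment) -> dict:
    """Validate a segment and derive the values nobody should have to look up."""
    if not 1 <= seg.vlan_id <= 4094:
        raise ValueError(f"{seg.name}: VLAN {seg.vlan_id} is outside 1-4094")
    # ip_network raises ValueError on bad syntax or on host bits being set.
    net = ipaddress.ip_network(seg.cidr)
    # Assumed convention: the gateway is the first usable address.
    gateway = next(iter(net.hosts()))
    return {"name": seg.name, "vlan_id": seg.vlan_id,
            "cidr": str(net), "gateway": str(gateway)}


def emit_hcl(ir: dict) -> str:
    """Final pass: render one IR node as HCL ('example_vlan' is a made-up type)."""
    return (
        f'resource "example_vlan" "{ir["name"]}" {{\n'
        f'  vlan_id = {ir["vlan_id"]}\n'
        f'  cidr    = "{ir["cidr"]}"\n'
        f'  gateway = "{ir["gateway"]}"\n'
        f'}}\n'
    )


if __name__ == "__main__":
    print(emit_hcl(compile_segment(Segment("audio", 311, "10.20.30.0/24"))))
```

The engineer supplies a name, a VLAN and a subnet; the gateway is derived, and bad input is rejected before any HCL exists.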

totallywrong 774 days ago [-]

Got it thanks, makes sense. The way I've frequently seen this done, that goes more in line with the IaC and GitOps trends, is people making a PR to the config repo with the required values. Then a pipeline runs and does all validations, pulls data from external sources, and runs the terraform plan. If everything looks good upon review a merge applies the saved plan.

dangus 773 days ago [-]

Interesting way to do things there. Have you looked into Pulumi or Terraform CDK?

I don’t know if either of those would help you or not and I’m not proficient in either, but some of the components you described seem like they might have some overlap.

nu11ptr 771 days ago [-]

Those things are primarily about using code instead of HCL for modeling. For us, it is about UI and UX (it is a REST API consumed by a Rundeck form and other services), as most of our engineers are not devops trained. Also, TF is only one possible backend. We actually emit other configuration code, plus configuration instruction sheets as MD and PDF for things we don't support.

jmbwell 774 days ago [-]

There's a push and pull; ansible and terraform both have some facilities for doing what you describe, but of course if you're using both tools, then you wind up where you are, needing yet another layer of abstraction common to both.

In the book, the author presents an approach for storing the object state and organizing the repository for ansible purposes in what is at least as sensible a way as any other I've seen. For installations that might not directly benefit from additional layers of abstraction, managing object model state using ansible's native functionality might well be sufficient.

This is all a legitimate challenge, in any case. Network infrastructure and service instances have some management issues in common, but where they differ, they can differ by quite a bit, in ways that are hard to model at any level of abstraction.

nu11ptr 774 days ago [-]

I'm not using both. The first version of my tool used Ansible. The second version used Terraform. They were written 4 years apart. My users are not devops savvy. They use runbook forms to call into my API giving them a very simple UI that requires almost zero input. The object model includes lifecycling so certain attributes can be changed, etc. and validation done to ensure only a correct network is output. This isn't required by everyone, but it wasn't done out of necessity on how I'm using the tools, but to satisfy the business problem I'm trying to solve (automate network deployment with as few human inputs as possible over the entire lifespan of a client and infrastructure).

I wasn't critiquing the author, but networks inherently have a lot of input data. Much of this is not of concern to the end user, hence why public clouds require almost zero input on the network side.

I agree that my object model is purpose built for our product. It would not work for someone else's network.

thestepafter 773 days ago [-]

I’m currently using Ansible for something similar. Mind if I ask why you switched to Terraform?

nu11ptr 773 days ago [-]

Faster: it uses a local state file, so it doesn't need to interrogate the devices every time.

Stateful: you don't have to manually track "present" and "absent" - you just omit and it will notice it needs to delete it

More standard: Writing HCL is very similar between providers. Every module in Ansible typically behaves pretty differently
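The "stateful" point can be sketched as a diff between what you declare and what was recorded last time. This is only an illustration of the idea, not how Terraform is actually implemented; the `plan` function and the dictionary shapes are assumptions.

```python
def plan(desired: dict, state: dict) -> dict:
    """Compare declared objects with recorded state, keyed by object name."""
    return {
        # Declared but never recorded: must be created.
        "create": sorted(desired.keys() - state.keys()),
        # Recorded but no longer declared: omission alone marks it for deletion.
        "delete": sorted(state.keys() - desired.keys()),
        # Present in both but with different attributes: must be changed.
        "update": sorted(k for k in desired.keys() & state.keys()
                         if desired[k] != state[k]),
    }


if __name__ == "__main__":
    state = {"vlan311": {"id": 311}, "vlan801": {"id": 801}}
    desired = {"vlan311": {"id": 311}, "vlan899": {"id": 899}}
    print(plan(desired, state))
```

With a "present"/"absent" style you would have to keep `vlan801` in your inputs and flip it to absent yourself; here it is removed simply because it is no longer declared, and no device had to be queried to work that out.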

tmerse 774 days ago [-]

This sounds interesting, but I am not sure I fully understand. Could an analogy be that the object model loosely corresponds to something like the AWS CDK, with the Ansible part being the derived CloudFormation? (Any other analogy would do, but those are things I understand a bit better; I use quite a bit of Ansible, but I am no network person.) I still don't fully understand the database part. Is it a better way to manage env variables, or does it allow for more flexible input?

Thank you

nu11ptr 774 days ago [-]

Essentially we have a very specific network topology we are trying to build for each of our clients. The goal is to auto-generate as much of the input as possible, validate that which is given, and allow it to be lifecycled (attributes can change, but only in certain valid ways; objects can be created/changed/deleted, but only if they aren't referenced by other objects, etc). Due to this, a database is needed to store each "object". When the network is "pushed", the database is walked and a fresh set of Ansible (or Terraform for v2) is generated in seconds.

Iow, it is a custom set of lego bricks that can only be combined in certain ways to build valid networks. It is proprietary to our cloud product, which has the benefit of allowing us to abstract things away that others probably couldn't, but the downside of making it entirely non-reusable for a different use case.
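The "only deleted if not referenced by other objects" rule could look roughly like this. It is a sketch under assumptions: an in-memory dict stands in for the database, and `ModelStore` and its methods are hypothetical names.

```python
class ModelStore:
    """Toy object store that refuses operations which would break references."""

    def __init__(self):
        # Maps object name -> set of object names it references.
        self._objects = {}

    def create(self, name, references=()):
        if name in self._objects:
            raise ValueError(f"{name} already exists")
        missing = sorted(r for r in references if r not in self._objects)
        if missing:
            raise ValueError(f"{name} references unknown objects: {missing}")
        self._objects[name] = set(references)

    def delete(self, name):
        if name not in self._objects:
            raise KeyError(name)
        users = sorted(o for o, refs in self._objects.items() if name in refs)
        if users:
            raise ValueError(f"{name} is still referenced by {users}")
        del self._objects[name]

    def names(self):
        return sorted(self._objects)


if __name__ == "__main__":
    store = ModelStore()
    store.create("vrf-client-a")
    store.create("subnet-audio", references=["vrf-client-a"])
    try:
        store.delete("vrf-client-a")
    except ValueError as err:
        print(err)
```

Because every change goes through checks like these, whatever is in the store at push time is already a valid network, which is what makes regenerating everything from scratch safe.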

tguvot 773 days ago [-]

just curious, is your system publicly available or is it internal tooling of yours? i spent a lot of time in the service orchestration domain, and it has been a hobby of mine ever since.

nu11ptr 773 days ago [-]

internal, sorry

tguvot 774 days ago [-]

i worked on a product that did something similar for telecoms. had a closed loop automation and graphical designer for object model. it was 10 years ago.

looking today at all the manual work with playbooks/etc, it's astonishing. feels like things didn't move forward at all in past decade

dopylitty 774 days ago [-]

Even in the big public clouds the user facing networking really hasn't progressed beyond a layer of lipstick on top of the kludges that were created for connecting physical servers 40 years ago.

For instance in AWS you still have to care about BGP and ASNs if you want to follow the most seamless approach to create a multi-region mesh of VPCs. Why should I have to care about that? AWS already knows where all the packets came from and where they're going and should just put them in the right place. I don't care how they get there and I certainly shouldn't have to care about BGP attributes[1].

1. https://docs.aws.amazon.com/network-manager/latest/cloudwan/...

tguvot 773 days ago [-]

probably interoperability with "legacy" equipment and networks

jagged-chisel 774 days ago [-]

Are you using an open source tool/stack to do this? Sounds pretty awesome and I’d love to learn!

nu11ptr 773 days ago [-]

Mostly - Python and MongoDb mostly

xnyanta 774 days ago [-]

This model is probably more common than you think, I don't see how anyone would be doing this any other way in a scalable fashion.

theideaofcoffee 774 days ago [-]

I glanced through the guide and it's Windows and Cisco (specifically IOS) heavy: mentions of the old Cisco architecture via Core/Access/Distribution, where larger DC networks have converged onto spine/spline setups, CDP/Cisco Discovery protocol whereas the open-source LLDP is more generic, even the nomenclature of 802.1q VLAN tags: access versus trunk. But I guess if you are starting to automate a legacy office network, it might be useful.

More recent non-IOS network OSes that lend themselves to automation, especially in the datacenter, the likes of Cumulus or SONiC are pure linux with some asic-vendor-specific bits and bobs, so I'm unsure of the applicability of this guide to larger, more modern networks. Tools like ansible could be a good fit here, but since they are 'just' linux, might as well use a dedicated config management tool like chef or puppet.

Otherwise I think it's well written for someone in a smaller shop wanting to get their feet wet with ansible and other tools but still stuck on IOS.

jimmar 774 days ago [-]

> old Cisco architecture via Core/Access/Distribution, where larger DC networks have converged onto spine/spline setups

Please correct me if I'm wrong, but I see the "old" core/access/distribution layers still relevant. The datacenter spine/spline setup applies to networking between server racks in the data center.

> 802.1q VLAN tags: access versus trunk

Again, are you saying that these are outdated? I'm not a practicing network engineer, but I know several network engineers and they've told me that understanding 802.1q VLAN tags to segment network traffic has been helpful.

kazen44 774 days ago [-]

> Please correct me if I'm wrong, but I see the "old" core/access/distribution layers still relevant. The datacenter spine/spline setup applies to networking between server racks in the data center.

This is correct. The place where spine-leaf really shines is when used in combination with evpn-vxlan. You can then encapsulate every tenant network inside a VXLAN domain and route those between your leaf switches through your spine layer.

This is basically a Clos fabric, which is non-blocking and very easy to expand horizontally. It also gives you nice features like ARP suppression[0]. These features are important in a DC fabric because ARP flooding is traffic which is not revenue generating, and should be minimized as much as possible.

For a normal enterprise/office network, running an evpn-vxlan fabric is usually far too complex for the benefits involved.

[0] https://satishdotpatel.github.io/how-does-arp-suppression-wo...

darkr 774 days ago [-]

> 802.1q VLAN tags: access versus trunk

I think the parent was saying that these are Cisco specific terms; more generic terms would be "untagged" + "tagged".

ajsnigrutin 774 days ago [-]

Trunk and access ports are like kleenex and bandaids. Yes, technically cisco terminology, but used everywhere.

iso1631 774 days ago [-]

Absolutely, here's a config from one of my Aristas (with bits snipped)

   interface Ethernet1
      switchport trunk native vlan 899
      switchport trunk allowed vlan 801
      switchport mode trunk
   interface Ethernet13
      switchport access vlan 311

And on a Juniper

   set interfaces xe-0/2/1 unit 0 family ethernet-switching interface-mode trunk
   set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Mgmt_B
   set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Audio_2
   ....
   set interfaces ge-0/0/19 unit 0 family ethernet-switching interface-mode access
   set interfaces ge-0/0/19 unit 0 family ethernet-switching vlan members Audio_2

When Cisco, Arista, Juniper all use access vs trunk it's hardly a vendor specific term
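For anyone generating these from a model: a port described in the generic "untagged"/"tagged" terms maps straight onto the access/trunk syntax. A small sketch, assuming a made-up `Port` class and rendering only the Arista-style lines shown above:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Port:
    """Vendor-neutral port: one optional untagged VLAN plus any tagged VLANs."""
    name: str
    untagged: Optional[int] = None
    tagged: Tuple[int, ...] = ()


def render_eos_style(port: Port) -> str:
    """Render a port using the access/trunk commands from the config above."""
    lines = [f"interface {port.name}"]
    if port.tagged:
        # Any tagged VLAN makes this a trunk; the untagged VLAN becomes native.
        if port.untagged is not None:
            lines.append(f"   switchport trunk native vlan {port.untagged}")
        allowed = ",".join(str(v) for v in port.tagged)
        lines.append(f"   switchport trunk allowed vlan {allowed}")
        lines.append("   switchport mode trunk")
    elif port.untagged is not None:
        # Only an untagged VLAN: a plain access port.
        lines.append(f"   switchport access vlan {port.untagged}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_eos_style(Port("Ethernet1", untagged=899, tagged=(801,))))
    print(render_eos_style(Port("Ethernet13", untagged=311)))
```

A Juniper renderer would take the same `Port` and emit `interface-mode trunk` or `interface-mode access`, though it would need VLAN names rather than IDs.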

dvno42 774 days ago [-]

Hey this is cool! Thanks for sharing your hard work.

I have been living this for the past few years building an automation product[0] and services company to lower the barrier of entry and have tested many of these methodologies. We’ve also written many different runbooks/playbooks for complicated workflows. I’d like to share a couple experiences/opinions:

Netconf and vendor APIs are lovely when available and working well. Many devices don’t support them, so falling back to SSH (sometimes even telnet) is a must for automation. Imo, you could add value to your book by touching on ktbyers' Netmiko/Paramiko[1] as well as their nuances (timeouts, dealing with interactive prompts, etc).

AAA is a big component of automation too. Having something in place to handle authn/authz (radius/tacacs) enables consistency for access across vendors. This also enables least privileged accounts and rotation/limited lifetime of creds when used with something like Hashicorp Vault[2]. I think you briefly mentioned secrets management through Ansible Vault.

Another technology that may be worth mentioning is Textfsm[3] in conjunction with Netmiko. When we automate workflows for clients, there are often times when the data we need isn’t easily parsable. Using and expanding on textfsm makes this doable.
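To give a flavour of what that parsing looks like, here is the same idea using only Python's `re` module rather than TextFSM itself (TextFSM would express the pattern as a template file instead). The sample output is a hypothetical illustration shaped like typical interface status lines, not captured from a real device.

```python
import re

# Hypothetical CLI output, for illustration only.
SAMPLE = """\
Ethernet1 is up, line protocol is up
Ethernet13 is administratively down, line protocol is down
"""

# Named groups play the role that "Value" definitions play in a TextFSM template.
LINE = re.compile(
    r"^(?P<interface>\S+) is (?P<admin>administratively down|up|down), "
    r"line protocol is (?P<oper>up|down)",
    re.MULTILINE,
)


def parse_interfaces(text: str) -> list:
    """Return one dict per matching line; lines that don't match are skipped."""
    return [m.groupdict() for m in LINE.finditer(text)]


if __name__ == "__main__":
    for row in parse_interfaces(SAMPLE):
        print(row)
```

Matching only the fields you need, and skipping lines you don't recognise, helps a little with output that shifts between firmware versions.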

Lastly, much automation may only be one firmware change away from breaking. Even with the big vendors, bugs are common that are (ime) low priority to the OEM. Keep this in mind when writing runbooks/playbooks, try to rely on features and output that are unlikely to change across versions.

[0] https://realmhelm.com

[1] https://github.com/ktbyers/netmiko

[2] https://github.com/hashicorp/vault

[3] https://github.com/google/textfsm

Cyph0n 774 days ago [-]

+1 to textfsm: it is an extremely powerful approach to reliably parse CLI-based outputs. I used to do some IOS-XR device automation when I worked at Cisco - mainly for integration testing - and I (and other teams) used it heavily.

This ties in to your point about how you often need to fall back to SSH or Telnet. For example, a lot of platform-specific data isn’t exposed through standard interfaces, but almost everything is available through a CLI. There are also times when you have no choice but to use the CLI - for example, when re-imaging or reloading a device.

metadat 774 days ago [-]

Direct link to the PDF:

https://github.com/automateyournetwork/automate_your_network...

betaby 774 days ago [-]

ctrl+f 'yang' - nothing

ctrl+f 'netconf' - nothing

SergeAx 773 days ago [-]

> I believe in open source software

But... PDF is not "open source", it is literally a compiled binary blob :)

This is a pretty cool book though. If the author is reading this: can you please publish the real source files for the book?
