Good afternoon everyone,
I'm new to the deeper side of networking, as I've never had the opportunity to delve into it before, so please bear this in mind...
I look after our organization's I.T. infrastructure, so at the moment I'm quite the jack of all trades. Recently we began getting odd issues on our network, starting with our executive's office Wi-Fi AP and their colour printer losing connectivity. After some head scratching, I pinpointed it to a faulty/temperamental el cheapo Netgear switch and started looking into replacing these old, nasty switches with ones of better quality and reputation. Our organization got 4x HP V1920 48-port PoE (370W) switches and 4x new X121 SFP transceivers, as our old offices are linked to our new office via 2x fiber links.
Anyhow, long story short, I replaced the switches this past Friday and all seemed great until yesterday evening, and the problems have continued the whole of today.
We run two VLANs: VLAN 1 (data) and VLAN 2 (VoIP). On my side (the new building), a MikroTik RB750 routes between the two VLANs (192.168.1.xxx and 20.30.40.xxx). Certain ports on my switch are configured as untagged members of VLAN 2, not members of VLAN 1, with PVID 2; these connect to our Mitel 3300 PABX, MBG, etc., in essence anything that only has a 20.30.40.xxx IP. All other ports are in hybrid mode: untagged on VLAN 1, tagged on VLAN 2, with PVID 1. This is the same way it was set up on our old switches. As stated before, everything ran fine for most of the weekend, but since yesterday I have been running up and down chasing weird issues on the network.
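To make the two port types concrete, here is a minimal Python model of how an 802.1Q port with a PVID and untagged/tagged VLAN memberships treats frames. It is a sketch of general 802.1Q behaviour, assuming ingress filtering is enabled; it is not the HP V1920's configuration interface, and the `Port` class and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Port:
    """Hypothetical model of an 802.1Q switch port (not a vendor API)."""
    pvid: int                                        # VLAN assigned to untagged ingress frames
    untagged: Set[int] = field(default_factory=set)  # VLANs sent out without a tag
    tagged: Set[int] = field(default_factory=set)    # VLANs sent out with an 802.1Q tag

    def ingress_vlan(self, tag: Optional[int]) -> Optional[int]:
        """VLAN an arriving frame is placed in, or None if it is dropped."""
        vlan = self.pvid if tag is None else tag
        # Assumes ingress filtering: only VLANs the port belongs to are accepted.
        return vlan if vlan in self.untagged | self.tagged else None

    def egress(self, vlan: int) -> Tuple[bool, Optional[int]]:
        """(is the frame sent?, tag added or None) for a frame leaving in `vlan`."""
        if vlan in self.untagged:
            return True, None   # sent without a tag
        if vlan in self.tagged:
            return True, vlan   # sent with an 802.1Q tag
        return False, None      # port is not a member: frame is not sent


# The two port types described above: PABX-side ports and hybrid desk ports.
pbx_port = Port(pvid=2, untagged={2})
desk_port = Port(pvid=1, untagged={1}, tagged={2})

print(desk_port.ingress_vlan(None))  # 1: untagged PC traffic lands in VLAN 1
print(desk_port.ingress_vlan(2))     # 2: traffic tagged with VLAN 2 stays in VLAN 2
print(pbx_port.ingress_vlan(None))   # 2
print(pbx_port.egress(1))            # (False, None): VLAN 1 never reaches the PABX port
```

In this model a hybrid desk port carries both VLANs on one cable, while the PABX-side port only ever sees VLAN 2 and never needs to understand tags.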
Basically, what would happen is that the phone network would start timing out or reporting "destination host not found", LAN latency would climb to 2000+ ms, etc., and the switches would stop seeing each other: e.g. I can't see 192.168.1.50, 52, 54 or 55, but I can still ping devices attached to any of them.
I then disabled RSTP and it came right again for a few hours before the same thing happened. I re-enabled RSTP and it came right for a while, then it happened again. Finally, I changed from RSTP to plain STP (almost 2 hours ago), and everything is still running fine.
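For reference when comparing the two modes: classic 802.1D STP moves a port through the listening and learning states on timers before it forwards, whereas RSTP uses an explicit handshake between switches and usually converges much faster. A back-of-the-envelope sketch of the classic-STP delays, assuming the standard default timers (the values actually configured on the V1920s may differ):

```python
# IEEE 802.1D default timer values in seconds; the switches may be configured differently.
MAX_AGE = 20
FORWARD_DELAY = 15


def port_to_forwarding(forward_delay: int = FORWARD_DELAY) -> int:
    """Seconds for a port to pass through listening and learning before forwarding."""
    return 2 * forward_delay


def indirect_failure_recovery(max_age: int = MAX_AGE,
                              forward_delay: int = FORWARD_DELAY) -> int:
    """Worst case after an indirect link failure: stored BPDU info must age out first."""
    return max_age + port_to_forwarding(forward_delay)


print(port_to_forwarding())         # 30
print(indirect_failure_recovery())  # 50
```

These figures only describe how long classic STP takes to settle after a change; they do not by themselves explain why the fault keeps coming back.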
I also tried disconnecting one of the fiber links, and then the patch cable link, to see whether removing either one made things respond again, but the results have been inconclusive.
My network basically runs from the switches to SNOM 300 VoIP phones with their VLAN ID set to 2, and then on to the PCs, which receive an IP on VLAN 1.
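As an illustration of what "VLAN ID set to 2" means on the wire, the sketch below builds the 4-byte 802.1Q tag that a tagging device inserts into each Ethernet frame after the source MAC address. The function name is hypothetical, and the priority value in the second example is arbitrary, not a setting taken from the SNOM phones.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q-tagged


def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag inserted after the source MAC address."""
    if not 1 <= vid <= 4094:  # VIDs 0 and 4095 have special meanings
        raise ValueError(f"invalid VLAN ID: {vid}")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("pcp must be 0-7 and dei must be 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid  # Tag Control Information field
    return struct.pack("!HH", TPID_8021Q, tci)


print(dot1q_tag(2).hex())         # 81000002
print(dot1q_tag(2, pcp=5).hex())  # 8100a002 (priority 5 is only an example value)
```

Frames from the PC carry no such tag, which is why the hybrid port places them in its PVID, VLAN 1.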
My network is currently set up as follows:
[url]http://s20.postimg.org/72i840n2l/Network_HP.png[/url]
In the diagram, the orange cables are the fiber links, connected to port 52 on all four switches, and the green cable is the patch cable bridging the two links on the other side (connected to port 48 on both of those switches).
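One way to check cabling like this for a physical loop is to treat each switch as a node and each cable as an edge, then look for an edge that joins two switches that are already connected. The sketch below does that with union-find. The switch names are hypothetical, and the NEW-1 to NEW-2 link is purely an assumption for illustration, since the text does not say how the two new-building switches connect to each other.

```python
def find_loop_links(links):
    """Return the descriptions of links that close a loop (union-find over switches)."""
    parent = {}

    def find(sw):
        parent.setdefault(sw, sw)
        while parent[sw] != sw:
            parent[sw] = parent[parent[sw]]  # path halving keeps lookups short
            sw = parent[sw]
        return sw

    loops = []
    for a, b, desc in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            loops.append(desc)  # a and b were already connected: this link closes a loop
        else:
            parent[root_a] = root_b
    return loops


links = [
    ("NEW-1", "OLD-1", "fiber, port 52 <-> port 52"),
    ("NEW-2", "OLD-2", "fiber, port 52 <-> port 52"),
    ("OLD-1", "OLD-2", "green patch cable, port 48 <-> port 48"),
    ("NEW-1", "NEW-2", "ASSUMED link between the new-building switches"),
]
print(find_loop_links(links))      # ['ASSUMED link between the new-building switches']
print(find_loop_links(links[:3]))  # []
```

Which link gets reported depends only on the order of the list; every link on the cycle is equally part of the loop.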
My suspicion is that there is a loop somewhere, and I am still trying to find it. My question is: shouldn't STP prevent these kinds of anomalies from occurring?
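What STP is meant to do with a loop can be shown with a greatly simplified sketch: elect the switch with the lowest bridge ID as root, let every other switch keep one link towards the root, and block the remaining links. This ignores real path costs, port IDs, BPDUs and timers, and the switch names, bridge IDs and the NEW-1 to NEW-2 link are all hypothetical.

```python
from collections import deque


def simple_spanning_tree(bridge_ids, links):
    """Simplified STP: pick a root, keep one root port per switch, block the rest.

    Assumes every switch is reachable, all links have equal cost, and there is
    at most one link between any pair of switches.
    """
    root = min(bridge_ids, key=bridge_ids.get)  # lowest bridge ID wins the election
    neighbours = {sw: [] for sw in bridge_ids}
    for a, b in links:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Hop count from the root bridge (breadth-first search).
    hops = {root: 0}
    queue = deque([root])
    while queue:
        sw = queue.popleft()
        for nb in neighbours[sw]:
            if nb not in hops:
                hops[nb] = hops[sw] + 1
                queue.append(nb)

    # Each non-root switch forwards on the link towards a neighbour closer to the
    # root, breaking ties by the lower bridge ID.
    forwarding = set()
    for sw in bridge_ids:
        if sw != root:
            upstream = min((nb for nb in neighbours[sw] if hops[nb] < hops[sw]),
                           key=bridge_ids.get)
            forwarding.add(frozenset((sw, upstream)))

    blocked = [link for link in links if frozenset(link) not in forwarding]
    return root, blocked


bridge_ids = {"NEW-1": 1, "NEW-2": 2, "OLD-1": 3, "OLD-2": 4}  # hypothetical values
links = [("NEW-1", "OLD-1"), ("NEW-2", "OLD-2"), ("OLD-1", "OLD-2"), ("NEW-1", "NEW-2")]
print(simple_spanning_tree(bridge_ids, links))  # ('NEW-1', [('OLD-1', 'OLD-2')])
```

Blocking the green cable here is only a consequence of the made-up bridge IDs; with real priorities and MAC addresses, a different port could end up blocked.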
Also, why does it seem to come right whenever I enable or disable STP on all the switches, only for the problem to start again after a while?
I sincerely hope it stays as stable as it has been for the past (almost) 2 hours since I changed from RSTP to STP, but as I mentioned before, my knowledge of advanced networking is minimal, so I don't fully understand where the problem might lie.
Any feedback/input/questions would be greatly appreciated.