Adding capabilities to my ELS system

Oh Infiniband - that's fun stuff to play with. I started out working on storage systems with FDDI, which was smoking fast at 100Mbps! You can get north of 950Mbps by using jumbo frames on gigabit TCP/IP links. That's the thing about iSCSI for storage - you have the overhead of TCP/IP, and if interoperability is more important than performance it's a good solution, but it's not always the fastest. I looked quickly and people were claiming to get about 95Mbps on the Teensy. Looks like you need an ethernet kit, which is a ribbon cable, a capacitor and a jack that goes onto a small circuit board. I worry that this might send you down another rabbit hole, but maybe it's unavoidable.

I think the advantage of the FTDI is dedicated hardware serial which may or may not help you in this case.
 
In 2013 or so we had a group of three servers for GPU-based computation (massively parallel computing), with 3 GPUs per server, for a total of 9 GPUs. They were interconnected with Infiniband. We did E-Mag simulations on it. (My colleague and I wrote the requirements for the servers.) We did whole vehicle simulations at 77 GHz. (The car was meshed into cells roughly 0.3mm x 0.3mm x 0.3mm. That's many millions of cells. Plus the space around it was meshed as well.) We could finally show customers exactly why putting a crease in the bumper was a bad idea for radar performance. We were able to show the time domain radar waves bouncing or diffracting off the crease, as well as showing what would happen if we were able to round an edge. Before that, we couldn't show the customer why something was not advisable. We made videos showing what happens.

Have done the jumbo packet thing MTU=9000 vs MTU=1500. It increases the data payload percentage. It's ok on a dedicated network, not good on a public one. Have an Ethernet kit here. I did play with it early on, but not for logging, more to see that it worked. But no, it's not a rabbit hole that I wish to enter. It might be a viable option, though.
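
For anyone curious how much the payload percentage actually moves, here is a rough sketch of the arithmetic. It assumes plain Ethernet framing with no VLAN tag and TCP/IPv4 headers with no options; the function names are just for illustration.

```python
# Bytes on the wire per frame that sit outside the MTU (standard Ethernet, no VLAN tag):
# preamble + SFD (8), MAC header (14), FCS (4), interframe gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12   # 38 bytes
# Bytes inside the MTU that are not payload, assuming no IP or TCP options.
IP_TCP_HEADERS = 20 + 20         # 40 bytes


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of wire bytes that carry TCP payload when every frame is full size."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def max_goodput_mbps(mtu: int, line_rate_mbps: float = 1000.0) -> float:
    """Upper bound on TCP payload throughput for a given line rate."""
    return line_rate_mbps * tcp_payload_efficiency(mtu)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.2%} payload, "
          f"about {max_goodput_mbps(mtu):.0f} Mbps ceiling on a gigabit link")
```

Under those assumptions the ceiling on gigabit is a bit under 950 Mbps at MTU=1500 and a bit over 990 Mbps at MTU=9000. Real links will come in lower once ACK traffic and TCP options are counted.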

FTDI is low bandwidth in comparison. Solution may be something in between. Or FTDI, just because I can try it. However, FTDI reminds me of CAN, it's slow compared to many alternatives. If the pipe is too skinny to empty the buffer, the buffer overflows... And then you lose the very information that you were trying so hard to capture. I've seen that happen on our radars. We installed 100 Mbit ethernet to data log.

So the solution is to be more selective with what is transferred, and take chances that you will miss logging the important stuff that happens rarely, or to capture as much as possible and deal with those problems. I was astonished at how easily I hit 1.2MB/sec in the idling case. Yes, I can reduce that by more than 1/2, but that's still a lot of output.
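
To put numbers on the skinny pipe problem, here is a back-of-envelope sketch. The 1.2 MB/sec figure is from above (taken as 1.2e6 bytes/sec); the 3 Mbaud link and 64 KiB buffer are hypothetical values picked only for illustration, and 8N1 framing (10 bits on the wire per byte) is assumed.

```python
def required_baud(bytes_per_sec: float, bits_per_byte_on_wire: int = 10) -> float:
    """Baud rate an async serial link needs; 8N1 framing sends 10 bits per data byte."""
    return bytes_per_sec * bits_per_byte_on_wire


def seconds_until_overflow(buffer_bytes: float, produce_bps: float, drain_bps: float) -> float:
    """Time for a buffer to fill when data arrives faster than it drains (rates in bytes/sec)."""
    surplus = produce_bps - drain_bps
    if surplus <= 0:
        return float("inf")  # the pipe keeps up, so the buffer never fills
    return buffer_bytes / surplus


LOG_RATE = 1.2e6                 # bytes/sec, the idle-case figure from this thread
LINK_BYTES_PER_SEC = 3e6 / 10    # hypothetical 3 Mbaud serial link with 8N1 framing
BUFFER_BYTES = 64 * 1024         # hypothetical 64 KiB log buffer

print(f"Baud needed to keep up: {required_baud(LOG_RATE) / 1e6:.0f} Mbaud")
print(f"Buffer overflows after: "
      f"{seconds_until_overflow(BUFFER_BYTES, LOG_RATE, LINK_BYTES_PER_SEC) * 1e3:.0f} ms")
```

With those made-up link and buffer numbers the buffer lasts well under a tenth of a second, so a bigger buffer only delays the overflow. Either the pipe gets fatter or the log gets thinner.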
 
Ok, found the reason why folks say USB has blocking on Teensy. Dug down deep into the core files. Sandwiched inside of a transfer subroutine, in usb.c, there is a do-while loop (inside of another loop) that disables all interrupts for up to 2400 cycles, or 3.84 us. As far as I can tell, all interrupts can be blocked for longer than that, since multiple do-while loops can be executed. The transfer routine is called for both TX and RX operations. Honestly, it's what I consider a bug, or more kindly, an oversight, which could be made interrupt safe. Not sure I want to play at this level, as I can brick my device (I think) by making an ill-reasoned or uninformed change.

The algorithm that is the core of my ELS cannot tolerate blocking of the position ISR for more than 6 us.
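
A quick sketch of the budget, using the 3.84 us and 6 us figures above. It assumes the worst case suspected above, that several interrupts-off windows can run back to back with nothing serviced in between. I have not confirmed that the core code actually behaves that way.

```python
import math

BLOCK_US = 3.84       # one interrupts-off window, figure quoted above
ISR_BUDGET_US = 6.0   # longest delay the position ISR can tolerate


def worst_case_block_us(windows: int, block_us: float = BLOCK_US) -> float:
    """Total interrupts-off time if `windows` windows run back to back (assumption)."""
    return windows * block_us


def max_safe_windows(budget_us: float = ISR_BUDGET_US, block_us: float = BLOCK_US) -> int:
    """How many back-to-back windows still fit inside the ISR budget."""
    return math.floor(budget_us / block_us)


for n in (1, 2, 3):
    total = worst_case_block_us(n)
    verdict = "ok" if total <= ISR_BUDGET_US else "over budget"
    print(f"{n} window(s): {total:.2f} us -> {verdict}")
print(f"Windows that fit in the budget: {max_safe_windows()}")
```

So a single window fits, but two in a row would already blow the 6 us limit.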

Wonder if there's some lurking interrupt issue in ethernet?
 
I think you're getting to the root of why dedicated hardware serial can be better performance than shared USB.

I did see that there were different drivers/libraries available for the ethernet - there's a native library and also a QNEthernet library. I saw some people claim 95Mbps but if that locks up the Teensy that won't be all that handy.
 
So, if you avoid writing to USB, the transfer calls and associated blocking code aren't run, correct? That is what I've sort of picked up on elsewhere. Not sure about ethernet. That may be configured similarly.

How much do you really need to log, to understand what's going on? If you look at the state machine state, and a few variables regarding syncing, would that help you refine what you need to plot later? You are correct that the FTDI isn't blazing fast. Another solution might be another Teensy and a 20 Mbit serial connection between them, then use the second one as a buffer?

As for networking and data throughput, we run into similar issues. Fortunately most of our situations don't require networked transfers. We move data from hardware to PC, and with modern interfaces like USB3 the throughput is fast enough for what we do. The bonus is that it's also available on most PC hardware now. And with modern SSD and M.2 drives, dumping data to disk is fast.
 
Back to the old way for the moment. Reducing USB traffic a bit. Still need to get some additional variables. This last trial, I hear open loop motor chatter. I'm trying to figure out if that's just the motor, or how I'm controlling it. My latest mini log is not revealing. I need to log state of the direction, to see if it has been corrupted. It should always be CCW.
Code:
ThreadtoStop selected.  Warning, use at own risk!
Enter Special Test Threading Mode, use great caution!
Entered if(zstop7touched),  t = 25.0833626217
Zstop7 =-0.0002
Zstop7 has been adjusted by 5um to adjust for over travel
Entered if(zstop7touched),  t = 25.2326572683
Zstop7 =-0.0002
Zstop7 has been adjusted by 5um to adjust for over travel
Entered if(zstart7touched), t = 51.8866569817
Zpark7 = -0.7647, Zpark7 has been adusted for over travel
Entered if(start7touched),  t = 77.6015775433
Virtual T2S started.  Extremely Experimental!
Zval7 = -1.5385, zstart7 = -1.538504, zpark7 = -0.764724, zstop7 = -0.000157
Zval_local set,          t = 77.6015996283
t2smode true,            t = 77.6016044467
t2smode disable stepper, t = 77.9629507617
Stepper is paused at     t = 77.9629560283
Display blanked for 601364 us
Calc virt ang adj,       t = 77.9629652033
stepper_virtual_angle = 254.250000, stepper_relative_angle = 152.100006, virtual angle adjust = 257.850006
sync achieved,           t = 78.0925503583
Accum reset, enStepper,  t = 78.0925555933
Main Loop = 129.6525050000 msec
First Step at            t = 78.0925877333 sec
Zval = -0.0002 in
Have threaded to a RH stop.  Disengage cutter, position Z to Zstart, engage cutter, and press Start again

Sync to firststep = 32.415000 us, paused to sync = 129.599290 ms, degrees = 309.631791, Zsync = -0.7647, virtualAngleAdjust = 257.850006
 
As far as I can tell, there's no blocking if one doesn't read or write to USB. But every usb serial transaction could block other code since it uses __disable_irq, for up to 2400 cycles. I don't think this is common, but it can happen. If the encoder irq is blocked for 2 counts, I slip angular position. As far as I can determine, the code can't detect this error, since it isn't using a HW counter as reference. Definitely a place for errors to creep in. I'll have to keep that in mind.
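
One software-only way to at least notice some slips, sketched here as a simulation rather than as anything from my ELS code: if the encoder is a two-channel quadrature type (an assumption) and the ISR reads both channels, a legal step changes exactly one channel. Seeing both channels change at once means an edge was never serviced. The class and names below are hypothetical.

```python
# Gray-code order of the (A, B) state for one direction of rotation.
ORDER = [0b00, 0b01, 0b11, 0b10]


def build_transition_table():
    """Map (previous_state, new_state) to +1, -1, 0 (no change) or None (illegal jump)."""
    table = {}
    for prev in range(4):
        for new in range(4):
            step = (ORDER.index(new) - ORDER.index(prev)) % 4
            table[(prev, new)] = {0: 0, 1: +1, 3: -1, 2: None}[step]
    return table


TRANSITIONS = build_transition_table()


class QuadDecoder:
    """Software quadrature decoder that counts steps and flags skipped states."""

    def __init__(self, state: int = 0b00):
        self.state = state
        self.count = 0
        self.missed = 0   # number of illegal two-state jumps seen

    def update(self, new_state: int) -> None:
        step = TRANSITIONS[(self.state, new_state)]
        if step is None:
            self.missed += 1      # both channels changed: an edge was not serviced
        else:
            self.count += step
        self.state = new_state


decoder = QuadDecoder()
for state in (0b01, 0b10):        # second sample skips over state 0b11
    decoder.update(state)
print(decoder.count, decoder.missed)
```

The catch is that this only flags a jump of two states. A slip of four states lands back on the same A/B pattern and looks like no movement at all, so it narrows the blind spot without closing it. A hardware counter as a reference would still be the stronger check.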

To answer your 2nd question, umm, all I need to log is the error. But since I don't know what is even the source of error, all I can do is scatter gun log as much as possible, to give me a clue where to home in. Look, this could be a failure to comprehend the "issue" problem, as opposed to any error on my part. I suspect it could be a little of everything...

Being data transport limited is only a small part of the problem. The work around is to send less data. The risk is not logging the "correct and insightful" data during the "incident".

The bigger part of the issue is coming up with a way to reduce the problem space to something that's possible to monitor and fault isolate.
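
One possible middle ground for the risk above, sketched as a simulation: keep the most recent samples in a fixed-depth ring and only ship them out when something flags a fault. The class name, the depth, and the idea of a boolean fault flag are all hypothetical; on the Teensy this would be a fixed array rather than a Python deque.

```python
from collections import deque


class PreTriggerLog:
    """Keep the last `depth` samples and dump them only when a fault is flagged."""

    def __init__(self, depth: int):
        self.ring = deque(maxlen=depth)   # oldest samples fall off automatically
        self.dumps = []                   # stands in for "send over the slow link"

    def record(self, sample, fault: bool = False) -> None:
        self.ring.append(sample)
        if fault:
            self.dumps.append(list(self.ring))   # history leading up to the fault
            self.ring.clear()


log = PreTriggerLog(depth=3)
for i in range(10):
    log.record(i, fault=(i == 7))   # pretend sample 7 is the incident
print(log.dumps)
```

The link then only carries data around incidents, at the cost of needing some way to recognize an incident on the device, which is of course the hard part here.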
 
Typical goldilocks puzzle where you have too little data, then way too much data and hopefully at some point you'll have just the right amount of data.

Keep breaking the problem down into chunks that you can "unit test" and try to get the minimal amount of data to not overwhelm yourself or the system.

You'll get there - you're in the bog trying to slog through it. Just keep swimming!
 
Did something, I should have done long ago. Found a piece of foam to isolate the constantly running stepper from my desk. It's 20x quieter! A lot easier to concentrate without a droning sound all the time. My desk was acting as a sounding board :(

My RPM has always wandered a little. It doesn't matter to the operator, usually since it varies less than 1. However, I don't think the RPM actually is wandering. I'm driving it from a second Teensy, whose only task is to drive the stepper. There's no interrupts, no USB, just a simple program running as fast as it can. The pulses to the spindle simulator stepper have a period of about 750us and a standard deviation of 19ns, for 200 measurements. So not varying a lot to the motor, which is driving the encoder via a timing belt.

The way I measure the RPM is to count pulses in a 20ms interval. I have a 20ms time base for my system. I think the time base isn't very good.

I notice this, because I blank the display for 2 periods of the alpha filtered RPM. It bothers me that this RPM wanders, it's just another variable. I'm seeing up to 675us period variation in only 3 trials. But the motor only has 20ns of variation at its input to the driver. I'm comparing peak deviation to standard deviation, but you get my point, it's at least 3 orders of magnitude different. I should change that timing routine, and do it differently. Think I was being lazy, in the very beginning, and it was good enough 3 years ago. Might not be good enough for what I need now.
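
A minimal sketch of why a sloppy time base shows up as wandering RPM. This is not my actual code: the counts-per-rev value, the 200 RPM speed and the 100 us of gate error are hypothetical numbers for illustration, and the alpha filter is the usual first-order form.

```python
def rpm_from_counts(counts: float, counts_per_rev: float, gate_s: float = 0.020) -> float:
    """RPM computed from encoder counts seen in one gate interval."""
    return counts / counts_per_rev / gate_s * 60.0


def alpha_filter(previous: float, sample: float, alpha: float) -> float:
    """First-order (exponential) smoothing: move a fraction alpha toward the new sample."""
    return previous + alpha * (sample - previous)


def rpm_error_from_gate(true_rpm: float, nominal_gate_s: float, actual_gate_s: float) -> float:
    """Apparent RPM error when the math assumes the nominal gate but the real window differed.

    Ignores count quantization: counts scale with the actual window length.
    """
    return true_rpm * (actual_gate_s / nominal_gate_s - 1.0)


# Hypothetical: 4096 counts/rev, spindle truly constant at 200 RPM, gate runs 100 us long.
print(f"{rpm_from_counts(273, 4096):.2f} RPM from 273 counts in 20 ms")
print(f"{rpm_error_from_gate(200.0, 0.020, 0.0201):+.2f} RPM apparent error")
```

The point is that the relative RPM error equals the relative gate error, so a perfectly steady spindle still reads as wandering if the 20ms window is not really 20ms.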

It might mean that I have somehow induced error into my syncing calculations - which can lead to phase errors in thread start. I don't see how at the moment, but if I'm assuming some short term "constancy" in doing a calculation, and things aren't "constant" over that interval, then error creeps in. Assumptions are dangerous; one needs to challenge them all to see if they are valid for the particular circumstance. Hmm, need to evaluate what the error might be if my assumptions weren't true. That means I need to look at the code that was implemented and review what I was thinking, and explicitly verbalize what is needed for this section to work, i.e., what exactly am I assuming. Then ask myself, what if that wasn't true, what might happen? It's admittedly cerebral, but my implicit assumptions need to be made explicit, and then individually challenged.

Well shoot, that's not what I wanted to do today. I wanted to do something fun, this is feeling a lot like work.
 
Ok, ran a stripped down test program toggling the built in LED every 20ms interrupt, using a periodic timer. The test program shows a standard deviation of <10ns for over 2600 measurements. Not GPS timing good, but more than good enough for what I need. I will see how to adapt this to my ELS code. Hopefully it will be a lot better than 600 us of deviation (maybe 60 us RMS, using a peak to rms ratio of 10) that I currently have. This might be a red herring. Nonetheless, it makes parts of the process more controlled which reduces overall uncertainty. Who knows, maybe I will be able to uncover something along the way. Have to keep on swimming in the swamp...
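
Since I keep mixing peak and RMS numbers, here is a small sketch for getting both from the same set of measurements so they can be compared directly. The sample values are synthetic, not from my scope.

```python
import statistics


def jitter_summary(periods_us):
    """Return (mean, standard deviation, peak deviation from mean) for a list of periods."""
    mean = statistics.fmean(periods_us)
    std = statistics.pstdev(periods_us)               # population standard deviation
    peak = max(abs(p - mean) for p in periods_us)     # worst single excursion
    return mean, std, peak


# Synthetic example: nominal 20000 us interval with a few outliers.
samples = [20000.0, 20000.1, 19999.9, 20000.0, 20000.6, 19999.4]
mean, std, peak = jitter_summary(samples)
print(f"mean = {mean:.2f} us, std = {std:.3f} us, peak = {peak:.3f} us, "
      f"peak/std = {peak / std:.1f}")
```

The peak to RMS ratio depends on how many samples are taken and how the errors are distributed, so the factor of 10 used above is only a rough guess until there are enough measurements to compute it.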
 