Wednesday, April 25, 2012

intermittent bogus frames during continuous single stepping

The issue is that CreatePauseFrame accesses a frame (m_CurFrame) which the renderer no longer owns. CreatePauseFrame accesses it indirectly via m_BackBuf, which uses the surface description m_SurfDesc, the lpSurface member of which points to the frame, having been set to m_CurFrame by the Render method. However m_CurFrame is ONLY guaranteed to be valid in Render, i.e. during the period between reading the input frame and and writing it to either the monitor or the free queue. The moment the current frame's reference count reaches zero, it no longer belongs to the renderer and can be overwritten by another thread at any time. In theory the issue could occur whenever the app pauses, but the odds are low. Continuous single stepping greatly increases the odds, by calling CreatePauseFrame in a tight loop.

It sounds scary but it's probably only a nuisance: so long as m_CurFrame still points to valid memory, the worst that can happen is an occasional bogus frame on the monitor.

If single step doesn't disable monitoring, the behavior becomes less likely, because in this case the frame is probably queued to the monitor window. The monitor window's timer hook is responsible for dequeuing and disposing of the frame, but it runs in the main thread, so it can't decrement the frame's reference count to zero while we're in SingleStep, because the main thread can't be in two places at once.

The fact that the engine stops the renderer after the plugins doesn't save us, because it doesn't guarantee that the renderer will render another frame between the time that some plugin munges the renderer's current frame just before stopping, and when the renderer worker thread stops.

A possible solution would be to change engine's Pause to do the following:

1. stop the renderer
2. wait for the render queue to contain a frame
3. pause the engine, stopping all plugin workers
4. single step the renderer, processing the queued frame
5. create the pause frame from the renderer's current frame

By the time you create the pause frame, all plugins were stopped BEFORE the final frame was rendered, so there's no one left alive to munge the current frame.

Note however that this solution breaks the normal single step, causing it to always step two frames instead of one.

Wednesday, April 11, 2012

In the works: column resizing, previews, non-AVI clips

The next version (2.2.04) is almost ready, and introduces a long-overdue feature: resizable columns in all views. It's only UI candy, but it's expected behavior and fairly low-risk. The next version also fixes a fairly serious bug which was accidentally introduced in 2.2.01: the MIDI Setup view's Plugin page is always empty, and selecting its tab can cause the view to resize incorrectly, overwriting the droplist and Learn check box.

After that, I'm considering adding a Preview bar, for previewing the output of one or more plugins. This would mostly be useful for clip players and source plugins. The bar would behave similarly to the Monitor bar, except that it would allow multiple preview windows within the bar. All the previews would be the same same size, but the number of previews would be user-selectable, along with their layout (horizontal stack, vertical stack, or tiled). The advantage of giving previews their own bar (as opposed to adding them to the existing Monitor bar) is that this allows the Monitor and Preview bars to be positioned differently in the GUI. The Monitor bar typically mirrors the program's main output, so it wants to be bigger than the previews; it may even belong on its own display (this is possible if you have enough video outputs).

I'm also working on upgrading the clip player to use DirectShow instead of VfW, so it can open non-AVI clips directly without needing AVISynth. This is a riskier proposition but it would make FFRend more user-friendly.

Thursday, April 05, 2012

FFRend 2.2.03: Monitor source is back!

FFRend 2.2.03 is available for download. It brings back monitor source selection, a nice feature that was lost in the V2 rewrite. It also fixes some bugs.

Monitor source selection allows the monitor bar to display the output of any plugin, not just the one connected to the renderer. This was difficult to implement in V2 due to multithreading complications, so it was omitted, until now. To change the monitor source, use Plugin/Monitor (F8), the plugin context menu, or the monitor bar's context menu.

Bug fixes include a stall that occurred when stopping a recording if the final plugin had multiple threads assigned to it, and UI jerkiness when a menu was displayed while an edit control had focus.

Also, due to an embarrassing oversight, the check for updates feature introduced in 2.2.02 fails to find its installer script. 2.2.03 fixes this, or if you prefer, there's a patch available here

Please download the latest version. Here are the release notes.

Tuesday, April 03, 2012

streaming uncompressed video: LAN vs. HDMI capture

One of my background research projects is streaming uncompressed video (ideally at least XGA res, 30 FPS) from one PC to another over a LAN. This would allow some interesting multi-user setups, e.g.:

FFRend -> FFRend: first PC/user is providing a submix to the second PC/user.
Whorld -> FFRend: same idea

The main advantages: distributed computing (load is distributed over multiple computers), and each PC can run full-screen and have dedicated user input (mouse, keyboard, MIDI etc). Whorld -> FFRend is particularly interesting b/c the Whorld app offers a much richer user experience than the UltraWhorld plugin.

I did a quick back of the envelope calculation: 1024 x 768 x 3 = 2.36 MB x 30FPS = 70.8 MB/s. Then I tested my Gigabit LAN. With 9k jumbo frames enabled, I was able to get around 79 MB/s, as measured by Netio. That's from an i7 920 box to an i5 2500K box, through a switch, with short cables. I tried eliminating the switch but it didn't matter. Bottom line: it's a bit too close for comfort but it might work.

The next thing I tried was bigFug's streamers. I tried for an hour or so, but no luck: I could only get them working in memory mode, not in TCP or UDP. So I resigned myself to rolling my own. No big deal, it's more fun anyway. I already had the sockets code kicking around from other projects. The one thing I didn't have was fast code for busting RGB32 down to RGB24. I run in 32-bit color, but I can't afford to send the unused alpha channel over the LAN. 1024 x 768 x 4 x 30 would be 94.5 MB/s, definitely not doable.

So did some poking around on the net, but in the end it was quicker to just roll the 32/24 conversions in x86 assembler. My benchmarks say the get the job done in about 500 microseconds. I could cut that in half or better using SSE3 but that's a hassle and I can't be bothered at this stage. The code is here if you need it.

Other points of interest: setting the sockets receive and send buffer sizes (SO_RCVBUF and SO_SNDBUF) is crucial. I got the best results by making them big enough to fit an entire frame. I tried disabling Nagle (TCP_NODELAY) but it turns out Nagle is helping in this case. Be careful to avoid round-trips. The send plugin should only send, and the receive plugin should only receive. Don't second-guess TCP, just relax and let it handle the bookkeeping.

So anyway I fired up my streaming plugins and discovered what I probably should have guessed: It works fine, so long as the PCs don't have much else to do. I can stream uncompressed XGA at 30 FPS with nary a hiccup, with one or maybe two (out of four) cores fully loaded by other plugins. But if I load that third core, the whole show grinds to a halt. Even though that core appears to be idle.

It's strange, because I don't show much CPU load from the streaming, even if I display kernel usage. My theory is that it's something else, maybe memory bandwidth, or bus traffic. The truth is, I just don't know why it doesn't work better.

It might work a whole lot better with 10 GB Ethernet, but the NICs are still way too expensive. So my question (finally) is: how difficult would it be to realize this scheme using HDMI capture?? I've seen cards around that claim to be able to capture uncompressed HDMI at resolutions plenty higher than XGA. The Blackmagic Intensity Pro even has a nice DirectShow API so I could presumably roll myself a Freeframe plugin to receive the incoming frames and insert them into my mix. Does this make any sense or am I just dreaming?

UPDATE: As it turns out the Blackmagic devices only output HD video resolutions (720p or 1080i).

I'm going to explore multihoming instead. According to my reading there's a decent chance I can get 120 MB/s or better using dual-port server-style NICs. The current crop from Intel looks good. Probably the CPU utilization will go down too, because with server-style cards the CPU can offload more of the work. It's a bit pricey but still cheap compared to 10 Gigabit.

HDMI capture wouldn't have solved my problem exactly anyway. The goal was basically to insert one frame stream directly into the other, so that no frames are lost. Streaming over TCP does this, and the proof is that if you pause the destination PC, flow control kicks in and the source PC pauses too. In other words the source PC is really a slave of the destination.

Unless I'm misunderstanding, with HDMI (or any other) capture, it's not master/slave, it's asynchronous: between the source and the destination you've got a display adapter which has its own clock and isn't pausing for anything. It neither knows nor cares whether the destination is capturing. If the destination pauses, the source merrily continues to output frames, and those frames are lost from the destination's point of view. This isn't what I want.

I might even try just installing some ordinary extra NICs, the $10 kind. I don't necessarily need LACP: since I have custom code, I could do my own link aggregation, e.g. by sending half the frame over one NIC and half over the other. Depending on CPU and memory bus saturation issues etc. it might not work, but it's a cheap and moderately amusing experiment.