Monday, December 25, 2006

PeteMixer trashes 2 bytes beyond output buffer

In version 1.15 of the Freeframe plugins, PeteMixer trashes the 2 bytes immediately following the output buffer. Version 1.14 also has the bug. I believe the bug also occurs in the AFX and VJo versions but I can't verify this directly.

[NOTE: this bug also occurs in PeteRadialBlur, PeteSpiralBlur, and PeteTimeBlur. For more info see here.]

The memory violation only occurs with the MMX version of the plugin. Interestingly, the non-MMX version of PeteMixer also fails, but in a different way: it always outputs black.

In the non-MMX version, the output colors are shifted right by 16 bits when 8 was intended.

// nOutputBlue>>=16; // ck: shifting twice as much as needed
nOutputBlue>>=8;
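
To see why the extra shift produces black, here's a hypothetical fixed-point crossfade of one 8-bit channel. It is not PeteMixer's actual source: MixChannel, nMix and nShift are made-up names, and the 0..255 weight is an assumption. Because the weighted sum always fits in 16 bits, shifting it right by 16 leaves nothing.

```cpp
#include <cstdint>

// Hypothetical fixed-point crossfade of one 8-bit channel (not PeteMixer's
// actual source). nMix is the weight of input b, from 0 to 255.
// The weighted sum is at most 255 * 255 = 65025, so it fits in 16 bits.
inline std::uint8_t MixChannel(std::uint8_t a, std::uint8_t b, unsigned nMix, unsigned nShift)
{
    unsigned nOutput = a * (255 - nMix) + b * nMix;
    return static_cast<std::uint8_t>(nOutput >> nShift);  // 8 scales back to 0..255; 16 always gives 0
}
```

With nShift = 8, two full-intensity inputs give 254 (dividing by 256 instead of 255 loses a little brightness); with nShift = 16, every output is 0.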

The MMX error is the use of movq when movd was intended. The output pointer pCurrentOutput points to 32-bit pixels, and it's incremented by one pixel (i.e. four bytes) on each iteration, which means that on the last iteration, writing 64 bits to *pCurrentOutput trashes the first two bytes of whatever happens to be above the output buffer in memory.

// movq [esi],mm7 // ck: TRASHES 2 bytes beyond output buffer
movd [esi],mm7 // ck: 32-bit move
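
The overrun can be reproduced portably. This sketch is not the plugin's code: GuardSurvives, the guard value and the 16-pixel limit are made up for illustration, and a memcpy of 4 or 8 bytes stands in for movd or movq. Like the plugin, it advances the output pointer one 32-bit pixel per iteration, then checks whether a guard word placed just past the buffer survived.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if the guard word just past an nPixels-long 32-bit output
// buffer is intact after storing nStoreBytes (4 ~ movd, 8 ~ movq) at each
// pixel while advancing the output pointer by 4 bytes per iteration.
inline bool GuardSurvives(int nPixels, std::size_t nStoreBytes)
{
    const std::uint32_t kGuard = 0xDEADBEEF;
    std::uint32_t buf[16 + 2] = {};        // up to 16 pixels, guard word, slack
    if (nPixels < 1 || nPixels > 16 || nStoreBytes > 8)
        return false;
    buf[nPixels] = kGuard;                 // stands in for whatever is above the buffer
    const unsigned char src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned char *pCurrentOutput = reinterpret_cast<unsigned char *>(buf);
    for (int i = 0; i < nPixels; i++) {
        std::memcpy(pCurrentOutput, src, nStoreBytes);  // the store
        pCurrentOutput += 4;                            // advance one 32-bit pixel
    }
    return buf[nPixels] == kGuard;
}
```

GuardSurvives(16, 4) returns true, while GuardSurvives(16, 8) returns false: the 64-bit store on the final pixel lands partly in the guard word.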

The input pointers are also read with movq. Reading past the end of an input buffer doesn't overwrite memory, but it could cause an access violation if the buffer happens to end at a page boundary.

Theoretically the MMX version also uses more memory bandwidth than it needs to. I benchmarked it, and found that the corrected version is indeed slightly faster. For the original, 1000 calls to ProcessFrameCopy on a 640 x 480 frame took an average of 15.304 ms per frame, while the corrected version took an average of 14.820 ms per frame: a gain of 3%. I speculate that the gain is so small because the needless memory operations always hit L2 cache.

Heap-trashing bugs are notoriously difficult to find, and this one was no exception. They don't always cause a crash, and even when they do, it typically happens much later, in some unrelated component. In fact it was precisely this symptom--bizarre failures in things that never failed before--that pointed me in the right direction.

One disadvantage of writing such nice free plugins is that they're everywhere. I shudder to think how many mysterious crashes have been unjustly blamed on host applications. Ah, the joys and perils of inline assembler.

You can download an UNOFFICIAL patched DLL here. For more info see the patched source.

Monday, November 27, 2006

SetSurfaceDesc memory leak

The CVideo object is leaking memory badly, and I'm surprised I never noticed it before. It leaks the equivalent of an entire DirectDraw surface every time a new video is opened. At 640 x 480 that's around 1 MB per open, which adds up fast. This could explain why Whorld misbehaves after many hours of triggering video clips.

Initially I suspected VfW, but then I verified that VfW definitely cleans up after itself. The problem turns out to be with DirectDraw, specifically with SetSurfaceDesc.

CVideo's constructor creates a default 1 x 1 memory surface. When a video is opened, CVideo attaches this surface to the video frame, using SetSurfaceDesc. This is a major optimization: it allows CVideo to avoid using GDI to blit each video frame to the DirectDraw surface, because the video frame IS the DirectDraw surface. In practice SetSurfaceDesc only needs to be called once, when the video opens, because VfW doesn't change the address of the video frame after that. In fact it only changes the address if you open a video with a different frame size or pixel format, sensibly enough. CVideo checks for a change in frame buffer address, and if one occurs, it reattaches its surface to the new address.
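
The reattach check amounts to caching the last frame address. Here's a minimal sketch; CFrameAttacher, OnFrame and Attach are hypothetical names, and the counter stands in for the actual SetSurfaceDesc call.

```cpp
// Hypothetical sketch: reattach the surface only when VfW hands back a
// different frame buffer address (e.g. a new frame size or pixel format).
class CFrameAttacher {
public:
    void OnFrame(const void *pFrame)       // call once per decoded frame
    {
        if (pFrame != m_pAttached) {       // address changed
            Attach(pFrame);
            m_pAttached = pFrame;
        }
    }
    int GetAttachCount() const { return m_nAttaches; }

private:
    void Attach(const void *) { m_nAttaches++; }  // placeholder for SetSurfaceDesc
    const void *m_pAttached = nullptr;
    int m_nAttaches = 0;
};
```

Feeding it the same address for every frame costs one attach; only a new address triggers another.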

According to MSDN's documentation for SetSurfaceDesc, "The DirectDrawSurface object will not deallocate surface memory that it didn't allocate. Therefore, when the surface memory is no longer needed, it is your responsibility to deallocate it. However, when SetSurfaceDesc is called, DirectDraw frees the original surface memory that it implicitly allocated when creating the surface."

I interpreted this to mean that once you've done at least one SetSurfaceDesc for a surface, you're on your own as far as memory management goes. But what actually happens is that when the video is closed, DirectDraw leaves an allocation the same size as the frame buffer in place. It can't be VfW's frame buffer, because VfW destroys that when you call AVIStreamGetFrameClose. I can't imagine how or why this happens, but it sure isn't documented.

I only found two ways to make DirectDraw release this mysterious hidden frame buffer. The obvious way is to destroy the surface, but I'd prefer not to do this, because it means destroying and re-creating the surface every time a video is opened, which seems wasteful. The other way is to call SetSurfaceDesc again, passing it a 1 x 1 dummy surface. This works fine, and only takes about 50 microseconds. The surface description never changes, so it can even be a static array.

DDSURFACEDESC CVideo::m_DefSurf = {
sizeof(DDSURFACEDESC), // dwSize
DDSD_WIDTH | DDSD_HEIGHT | DDSD_PITCH | DDSD_LPSURFACE | DDSD_PIXELFORMAT, // dwFlags
1, // dwHeight
1, // dwWidth
4, // lPitch (Width * BitCount / 8)
0, 0, 0, 0, // dwBackBufferCount, dwMipMapCount, dwAlphaBitDepth, dwReserved
&m_DefSurfMem, // lpSurface
{0, 0}, {0, 0}, {0, 0}, {0, 0}, // color keys
{
sizeof(DDPIXELFORMAT), // dwSize
DDPF_RGB, // dwFlags
0, // dwFourCC
32, // dwRGBBitCount
0xff0000, // dwRBitMask
0x00ff00, // dwGBitMask
0x0000ff // dwBBitMask
}
};
DWORD CVideo::m_DefSurfMem; // pointed to by m_DefSurf.lpSurface
...
void CVideo::Close()
{
// if surface exists, we must attach it to a default 1 x 1 memory surface,
// otherwise DirectDraw leaves a mysterious hidden frame buffer allocated
if (m_Surface != NULL)
m_Surface->SetSurfaceDesc(&m_DefSurf, 0); // prevents a major leak
...

Friday, November 24, 2006

Undo performance

The undo manager uses CArray to implement the undo history. As a result, the performance of undo notification varies significantly depending on whether undo is limited, or unlimited. Performance was measured using a test function that repeatedly generates the same undo event, as shown below. To simulate realistic conditions, the test function was called from the timer hook, and the results were stored in an array and written after the test, avoiding potential interference from file I/O.

If undo is unlimited, notification time is mostly constant, except when the CArray has to grow. Since growing entails copying the entire array to a new memory location, the time required to grow the CArray increases linearly with the number of undoable edits. In a test of 10000 iterations, undo notification took an average of 50 microseconds. The actual samples were nearly indistinguishable from the average, except when the array grew, resulting in peaks which increased linearly, up to 1.6 milliseconds by the end of the test. The time between peaks also increased linearly as expected, due to MFC's heuristic method of computing the grow size. There were also a few seemingly random, unexplained spikes of nearly 2.5 milliseconds.
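
The rising, increasingly spaced peaks can be reproduced with a small model of CArray growth. The growth rule here (grow by one-eighth of the current size, clamped to between 4 and 1024 elements) is my assumption about MFC's default heuristic when nGrowBy is zero, and GrowthModel is a hypothetical name; it only records where reallocations happen, not how long they take.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of a CArray with nGrowBy == 0 (assumed heuristic:
// grow by size / 8, clamped to 4..1024). Each reallocation copies every
// existing element, so its cost is proportional to the size at that point.
struct GrowthModel {
    int size = 0;
    int capacity = 0;
    std::vector<int> reallocAt;            // sizes at which a full copy occurred

    void Add()
    {
        if (size == capacity) {
            int growBy = std::clamp(size / 8, 4, 1024);
            capacity += growBy;
            reallocAt.push_back(size);     // copying 'size' elements
        }
        ++size;
    }
};
```

After 10000 calls to Add, the gaps between entries in reallocAt grow with the array size until they hit the clamp, which matches the pattern of peaks described above.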

If undo is limited, notification time is constant. This is because once the limit is reached, adding a new notification deletes the oldest event from the history. Deleting from the front of a CArray requires copying the entire array down one element, but the array size is constant, so there's no memory reallocation, and the time required to do the copy doesn't change. In a test of 10000 iterations, undo notification took an average of 60 microseconds, only 10 microseconds more than the unlimited case. The actual samples were similar to the average, with randomly-spaced peaks up to around 150 microseconds. Again there were some unexplained spikes, though they were an order of magnitude lower, around 250 microseconds.

Note that OnPlugBypass with undo notification commented out takes an average of 38 microseconds, so in all cases undo notification takes longer than other work performed by OnPlugBypass.

Conclusion: undo performance is suboptimal, due to the use of CArray. An implementation based on CList would almost certainly perform better for unlimited undo, and probably the same or slightly better for limited undo. This optimization needs to be weighed against substantially increased complexity in the undo manager, e.g. array indexing would have to be replaced by iteration.

This hypothesis was tested by slapping together a minimally functional CList-based implementation and repeating the test. The result: for unlimited undo, the average time was 48 microseconds, and the actual samples showed only minor deviations, e.g. 80 or 150 microseconds, except for the occasional unexplained 2.5 millisecond spike. On the other hand, the undo manager complications look pretty formidable.

static const int MAX_SAMPS = 10000;
float samp[MAX_SAMPS];
int samps = 0;
void CMainFrame::OnTimer(UINT nIDEvent)
{
if (m_Plugin[0].IsCreated()) {
#if 0 // zero for unlimited undo
if (!samps)
m_UndoMgr.SetLevels(100);
#endif
OnPlugBypass();
if (samps == MAX_SAMPS) {
FILE *fp = fopen("undo bench.txt", "wc"); // "c" = commit flag (MSVC extension)
if (fp != NULL) {
for (int i = 0; i < samps; i++)
fprintf(fp, "%d\t%f\n", i, samp[i]);
fclose(fp);
}
exit(0);
}
}
...

#include "benchmark.h"
extern float samp[];
extern int samps;
void CFFPlugsDlg::OnPlugBypass()
{
int sel = GetCurSel();
if (sel >= 0) {
CBenchmark b;
NotifyUndoableEdit(UCODE_BYPASS);
samp[samps++] = float(b.Elapsed());
BypassPlugin(sel, !IsPluginBypassed(sel));
}
}
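
For comparison with the CList experiment, here's a sketch of a bounded history built on std::deque. This is a swapped-in technique, not what the undo manager uses, and CBoundedHistory and UndoState are hypothetical names. A deque supports constant-time insertion and removal at both ends and doesn't copy existing elements when it grows; unlike CList it also keeps indexing, which might sidestep the need to replace array indexing with iteration. It hasn't been benchmarked here.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical bounded undo history; nLevels == 0 means unlimited.
template <class UndoState>
class CBoundedHistory {
public:
    explicit CBoundedHistory(std::size_t nLevels) : m_nLevels(nLevels) {}

    void Push(const UndoState& State)
    {
        if (m_nLevels && m_Hist.size() == m_nLevels)
            m_Hist.pop_front();            // drop the oldest edit without shifting the rest
        m_Hist.push_back(State);           // growth doesn't copy existing entries
    }
    const UndoState& operator[](std::size_t i) const { return m_Hist[i]; }
    std::size_t GetSize() const { return m_Hist.size(); }

private:
    std::size_t m_nLevels;
    std::deque<UndoState> m_Hist;
};
```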

Thursday, November 09, 2006

Synchronizing automations to clip length

The manual method is pretty straightforward, though it does require a calculator. Take FFRend's ideal frame rate (NOT the video clip's frame rate, that doesn't matter), and divide it by the video clip's frame count. Now multiply the result by 100. Enter that number in the Master speed toolbar, and you're all set, though you might also want to pause, rewind the clip, and sync the oscillators.

This scheme redefines the frequency unit, from Hertz to clip passes. A frequency of 1 will repeat once per clip pass, 2 will repeat twice per clip pass, .5 will repeat every other clip pass, etc.

The X 100 accounts for the fact that master speed is a percentage.

For example, if the clip is 1859 frames long, and it's playing at 25 FPS:

Master Speed = 25 / 1859 * 100 = 1.3448

Monday, November 06, 2006

frame buffer bit counts

PlayerFF works in Resolume and Flowmotion, but not in OpenTZT, because OpenTZT passes 24-bit frames to the plugin, even though the screen resolution is 32-bit. The underlying problem is that you can't use DirectDraw to blit between surfaces with different bit counts. I tell my AVI reader (AviToBmp) to uncompress the video into the best format for the display (by passing AVIStreamGetFrameOpen AVIGETFRAMEF_BESTDISPLAYFMT). I use SetSurfaceDesc to turn the video frame into a DirectDraw surface, which means if my display is set for 32 bits, my video frame is also 32 bits, regardless of the actual color depth of the video. That's optimal if the host frame buffers also have the display's bit count, which you would think they would, but in OpenTZT, they don't for some reason, so the blit fails with error E_NOTIMPL.

AVIStreamGetFrameOpen can also be passed a BITMAPINFOHEADER that tells it what format to decompress to. This allows me to force the video format to match the host's format, as follows:

BITMAPINFOHEADER bih;
ZeroMemory(&bih, sizeof(bih));
bih.biSize = sizeof(bih);
bih.biWidth = m_pBmpInfo->bmiHeader.biWidth;
bih.biHeight = m_pBmpInfo->bmiHeader.biHeight;
bih.biPlanes = 1;
bih.biBitCount = 24; // or whatever host wants
m_pGetFrame = AVIStreamGetFrameOpen(m_pStream, &bih);

Another solution is to just accept that PlayerFF won't work in OpenTZT. Most VJ applications don't need a player plugin anyway, because they already have elaborate media players built into them. Let's not forget that PlayerFF is primarily designed for use in FFRend!

Another problem: OpenTZT and Flowmotion display PlayerFF's output upside-down, but it looks fine in Resolume and FFRend. Something's pretty wrong there...

plugin ID must be unique in Resolume

It appears that Resolume uses the plugin ID to keep track of its freeframe plugins. It came up because PlayerFF was using FFDemoSrc's plugin ID, and since FFDemoSrc happened to also be in Resolume's plugin folder, selecting PlayerFF actually selected FFDemoSrc instead. Flowmotion doesn't exhibit this behavior. So the freeframe documentation doesn't lie: plugin ID really does need to be unique! One wonders what non-authority is responsible for coordinating this...

PlayerFF: freeframe clip player

I got my standalone freeframe clip player up last night. It's called PlayerFF (OK maybe it needs a better name). It handles AVI/BMP/JPG/GIF, and has three parameters so far:

Clip Select (which clip you're playing)
Pause (0 is play, any other value is pause)
Position (0 is the start of the clip, 1 is the end)

The clips are hard-coded at the moment. :(

Here's what I propose for clip management. The plugin should have both a "Clip Select" and a "Bank Select" parameter. It will look in the magical folder "\My Documents\PlayerFF". Any clips it finds there will wind up in bank zero, UNLESS the magical folder contains an optional playlist file. The playlist file must be called playlist.txt, and it contains the paths of the clips to load, one per line, with optional bank separators. Clips are loaded in the order they appear in the playlist, or if there's no playlist, in alphabetical order.

:0
C:\temp\avi files\Night Traffic.avi
C:\temp\avi files\earth1.avi
C:\temp\avi files\Boat Ride to Punta Sal (xvid).avi
C:\temp\avi files\01_24_04-med.avi
:1
C:\temp\avi files\tint.avi
C:\temp\avi files\kissinggirls.avi
C:\Chris\images\debbie\DSC_0080.jpg

plugin and project info can have different parameter counts

I just found a neat bug. I added some parameters to my new PlayerFF plugin, and when I loaded up a FFRend project that uses it, there was garbage in the modulation settings for the new parameters.

It turns out I was assuming that the plugin's number of parameters and the number of parameters described in the project file are always the same. That's normally the case of course, but a new version of the plugin with more (or fewer) parameters violates that assumption. Oops.

And the solution:

// the plugin's number of parameters might not match our info's parameter count,
// for example if it's a different version of the plugin; take whichever is less
int rows = min(GetPluginRows(PlugIdx), Info.m_Parm.GetSize());

Saturday, October 28, 2006

how to keep frame counter from clobbering toolbar hints

in CMainFrame::OnNotify:

case AFX_IDW_TOOLBAR:
if (nmh->code == TBN_HOTITEMCHANGE) {
LPNMTBHOTITEM lpnmhi = (LPNMTBHOTITEM)nmh;
if (lpnmhi->dwFlags & HICF_ENTERING) // if entering toolbar
m_HideFrameCounter = TRUE; // hide frame counter
else if (lpnmhi->dwFlags & HICF_LEAVING) // if leaving toolbar
m_HideFrameCounter = FALSE; // show frame counter
}
break;

How to get a huge file size


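static bool GetFileSizeEx(LPCTSTR Path, LARGE_INTEGER& Size)
{
HANDLE hFile = CreateFile(Path, GENERIC_READ, FILE_SHARE_READ,
NULL, OPEN_EXISTING, 0, NULL);
if (hFile == INVALID_HANDLE_VALUE)
return(FALSE);
SetLastError(NO_ERROR);
Size.LowPart = GetFileSize(hFile, (LPDWORD)&Size.HighPart);
// INVALID_FILE_SIZE is also a legal low part for huge files, so only
// GetLastError can tell; read it before CloseHandle can change it
DWORD err = (Size.LowPart == INVALID_FILE_SIZE) ? GetLastError() : NO_ERROR;
CloseHandle(hFile);
return(err == NO_ERROR);
}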

Friday, October 20, 2006

More on matching original big hex

Considerable progress has been made, using a special version of WhorldFF that reads jump times from a list, i.e. a file containing a list of the frame numbers at which to do random jumps. The frame numbers were determined by painstaking experimentation.

The key concept is that in the version of FFRend that created original big hex, all of WhorldFF's parameters were initially set by the host. By comparison, in the current version of FFRend, parameters are only set if they differ from their defaults. So for example, number of rings defaults to .5. If it's .5 in the preset, it won't be sent to WhorldFF, so WhorldFF will use the patch's value (1000) instead of 154 (what .5 denormalizes to). The corrected preset uses a value of .500001 rather than .5, because this tricks FFRend into sending WhorldFF the parameter, but is also close enough to the desired number (.5) so that it makes no difference.

Another thing: Tile's Cell Width and Cell Height have to start at .54, not .5! No idea why, but it's crucial. With this change we get exact matching at 640 x 480, except for very minor variations near the jump points.

Oh and one more thing: original big hex's initial frame offset turns out to be 65, not 67. This was obscured by the Tile Cell parameter error.

Minor detail: still # 4454 was mislabeled, it's actually 4554.

Images 2288 and 2297 aren't correct even at 640 x 480, and since they're suspiciously close to the jump at 2283, it's likely that 2283 is misplaced. The 1105 image is also off (the jump is at 1104, a single frame before!).

Tuesday, October 17, 2006

qsort-based template class for sorting arrays


#include <stdlib.h> // qsort

template<class T> class CSortArray {
public:
static void Sort(T *a, int Size, bool Desc = false) {
qsort(a, Size, sizeof(T), Desc ? CmpDesc : CmpAsc);
}

private:
static int CmpAsc(const void *arg1, const void *arg2) {
if (*(T *)arg1 < *(T *)arg2)
return(-1);
if (*(T *)arg1 > *(T *)arg2)
return(1);
return(0);
}
static int CmpDesc(const void *arg1, const void *arg2) {
if (*(T *)arg1 > *(T *)arg2)
return(-1);
if (*(T *)arg1 < *(T *)arg2)
return(1);
return(0);
}
};
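
For comparison, the same job can be done with std::sort, which takes a typed comparison object instead of a qsort-style callback on void pointers, so no casts are needed and compilers can usually inline the comparison. SortArray is a hypothetical name; like CSortArray, it needs operator< for ascending order and operator> (via std::greater) for descending order.

```cpp
#include <algorithm>
#include <functional>

// Sorts a[0..Size-1] ascending, or descending if Desc is true.
template <class T>
void SortArray(T *a, int Size, bool Desc = false)
{
    if (Desc)
        std::sort(a, a + Size, std::greater<T>());  // uses operator>
    else
        std::sort(a, a + Size);                     // uses operator<
}
```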

Sunday, October 15, 2006

SetTimer granularity problem, and multimedia alternative

SetTimer's granularity is 10 ms in w2k, and 15.625 ms in XP. As the following data shows, w2k can achieve exactly 25 Hz, but in XP the best fit is 21.33 Hz. Neither OS can achieve 30 Hz, and worse yet, they fail differently: a period of 33 ms gets you 25 Hz in w2k, and 21.33 Hz in XP.

Available frequencies (w2k, 10 ms granularity):
period (ms)  1..10  11..20  21..30  31..40  41..50  51..60
freq (Hz)    100    50      33.33   25      20      16.67

Available frequencies (XP, 15.625 ms granularity):
period (ms)  1..15  16..31  32..46  47..62  63..78  79..93
freq (Hz)    64     32      21.33   16      12.8    10.67

Note: in XP, weird behavior occurs near the above boundaries, e.g. at period = 31, freq oscillates between 31.03 and 30.52.
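
Both tables fit a simple model: the requested period is rounded up to a whole number of system ticks. This is my inference from the data, not documented behavior, and EffectiveTimerHz is a hypothetical name. It reproduces every row of both tables (e.g. 33 ms gives 25 Hz on w2k and 21.33 Hz on XP) but not the jitter near the boundaries.

```cpp
#include <cmath>

// Effective SetTimer frequency, assuming the period is rounded up to a
// whole number of ticks. fTickMs is 10 for w2k and 15.625 for XP.
inline double EffectiveTimerHz(int nPeriodMs, double fTickMs)
{
    double nTicks = std::ceil(nPeriodMs / fTickMs);
    if (nTicks < 1)
        nTicks = 1;                        // at least one tick
    return 1000.0 / (nTicks * fTickMs);
}
```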

Multimedia timers appear to have 1 ms granularity, but are significantly more expensive in terms of CPU load. Note that the callback runs in a system thread and must not post timer messages while OnTimer is running. Otherwise, if OnTimer takes longer than one timer period, the posted WM_TIMER messages pile up faster than they can be handled, and the GUI becomes unresponsive.

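volatile BOOL m_InTimer; // OnTimer sets this on entry and clears it on exit
static void CALLBACK MMTimeProc(UINT uID, UINT uMsg, DWORD_PTR dwUser, DWORD_PTR dw1, DWORD_PTR dw2)
{
// runs in a system thread; skip this tick if OnTimer is still busy with the last one
if (!m_InTimer)
PostMessage((HWND)dwUser, WM_TIMER, 0, 0);
}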

Saturday, October 14, 2006

Matching big hex at HD high-res

At 1920 x 1440, Zoom FF param should be (log10(0.997395 * 3) + 1) / 2 = .737994
Must also compensate line width! Should be 3 * 6 = 18
Perfect match with WhorldFF, Kaleidascope and Tile. Problem lies further down the signal chain.
Timeblur is also a perfect match, Solarize is also fine, the problem is with Glow.
Hypothesis: if resolution is doubled, inner and outer radius must both be doubled also.
Result: It works fine up to 1280 x 960. At 1600 x 1200 and above, Glow exhibits unexpected behavior.
Bummer. Looks like we're limited to 1280 x 960.

NOTE that you can't change the desired frame rate in the options dialog without also compensating master speed, otherwise all the automations will be off, e.g. if you double the frame rate you must also double master speed.

Friday, October 13, 2006

Resolume plugins that aren’t compatible with FFRend

Invalid bit depth (24-bit only):
iua_RectField.dll
resAsciiArt.dll
resCaptureScreen.dll
resChristmasBalls.dll
resLumaImage.dll
resPuzzle.dll
resResolumeBlocks.dll
resTracker.dll
resZxSpectrum.dll

Crashes:
resDelay.dll

Doesn’t work?
resFeedback.dll
resDelayBlend.dll

Making big images

Recording at 1920 x 1080 (HD) works fine. On the hot rod, I get almost 2 FPS uncompressed and around 1 FPS using XVID. That seems pretty damn slow, but it's about 3000 times faster than Electric Sheep. The question is whether 1080 is good enough for generating poster-sized images, e.g. 11 x 17 at 300 DPI. HD 1080 is only 6.4 x 3.6 at 300 DPI. I tried 5100 x 3300 and it didn't seem to work even on the hot rod. There was memory to spare, so I'm guessing that the exponential increase in compute time would mean render times on the order of an hour per frame, like Electric Sheep's. It would be interesting to see whether it eventually coughs up some frames.

Tuesday, October 10, 2006

Matching the original big hex recording

Hue parameter must be .50000001, which fools the "don't set parm needlessly" test, so that hue gets set.

Since origin motion is random, WhorldFF does an initial jump even though the tempo is zero. It takes a while to settle down because of damping. The final location is:

x = .001251258888516
y = .563585314493240

After the initial jump it appears to stay put, but NO, it jumps again at around frame 5000. Why? Looks like a divide by zero problem somewhere, maybe CRealTimer::SetFreq isn't handling zero correctly.

It's hopeless, because the illustrated curves patch uses a random oscillator for poly sides. The random jumps are being triggered asynchronously by CRealTimer's thread, which causes both the origin sequence and the poly sides sequence to be unpredictable.

So the original can't be matched. Sorry! One option is to just live with big hex being different every time it's recorded. That's either a cool feature or a pain in the ass, depending on your point of view. Another option is to disable random jumps in the illustrated curves patch. That should make big hex deterministic, though this needs to be proved. (Yes, it's deterministic, provided you remember to restart the app before each recording.)

Disabling random jumps also fixes the occasional inconsistencies in big hex's origin motion. Since Whorld and the kaleidoscope effect both have origin motion, their motions add, subtract, and sometimes cancel each other out, which causes the origin to lurch, hesitate, or reverse. But is this good or bad behavior? Again it's subjective: it could be interesting, or annoying. I generally like the smooth origin motion better.

Experiment: big hex 66% with Al Fasawz. The animation moves too fast and spoils the mood of the music. Try a master pitch of 27%. That's the maximum speed reduction possible without losing some time blur (the original had time blur = .27, and .27 / .27 = 1.0, which is the maximum Freeframe parameter). Slow enough?

Monday, September 25, 2006

Artifact along right edge

It's a narrow vertical strip at the far right edge, spanning the entire height of the image, that's shifted vertically by a few pixels. It appears to come from Pete's glow effect and possibly from other effects as well. Check this and notify him. To remove it from frame captures, crop to 560 x 480 via PS Canvas Size.

Tuesday, September 19, 2006

Move bug

After repeated drag moves, plugins weren't matching their titles and parameter rows. Inserting source first was a bad idea. The correct method is: copy source to a temp, delete source, insert destination, copy temp to destination. This also gets rid of the bump source/dest kludge.
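
The corrected move can be sketched with std::vector. CPluginSlot and MoveSlot are hypothetical names, and bundling a plugin's title and parameter row into one object is a choice made for the sketch, not a description of FFRend's data structures. Inserting at the destination and copying the temp into it are combined into one insert call.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a plugin slot: title plus parameter row.
struct CPluginSlot {
    std::string Title;
    std::vector<float> Params;
};

// Moves the slot at nSrc so that it ends up at index nDst.
// Both indices must be valid for the vector.
inline void MoveSlot(std::vector<CPluginSlot>& Slots, int nSrc, int nDst)
{
    if (nSrc == nDst)
        return;
    CPluginSlot Temp = Slots[nSrc];            // copy source to a temp
    Slots.erase(Slots.begin() + nSrc);         // delete source
    Slots.insert(Slots.begin() + nDst, Temp);  // insert destination from the temp
}
```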

Useful debug code that helped solve it:

void CFFPluginArray::Dump()
{
printf("\n");
printf("Count = %d, LastLoadedIdx = %d\n", m_Count, m_LastLoadedIdx);
for (int i = 0; i < m_Count; i++) {
CString s;
if (m_Plugin[i].IsLoaded()) {
CFFPlugin *pp = &m_Plugin[i].m_Plugin;
pp->GetPluginName(s);
printf("%d: '%s' %s\n", i, (LPCTSTR)s, m_Plugin[i].IsBypassed() ? "[BYPASS]" : "");
for (int j = 0; j < pp->GetNumParams(); j++) {
pp->GetParamName(j, s);
if (m_Plugin[i].IsCreated()) {
printf("\t'%s' = %g\n", (LPCTSTR)s, m_Plugin[i].m_Instance.GetParam(j));
} else
printf("\t'%s'\n", (LPCTSTR)s);
}
} else
printf("%d: (empty)\n", i);
}
}

Sunday, August 13, 2006

master pitch problems

Whorld's Master Speed parameter must be compensated for master pitch changes. This is non-trivial because it's a logarithmic value mapped to a linear normalized parameter. The formula to determine the correct Master Speed parameter for a given pitch is:

ff_speed = ((log(whorld_speed) / log(20)) + 1) / 2

where the nominal pitch is 1.0 (100%), half-speed is 0.5 (50%), double-speed is 2.0 (200%), etc.

Some examples:
whorld speed      FF param
1.0               .5
0.5               .384311
2.0               .615689
0.6666... (2/3)   .432326211
.27               .281466900

Time Blur must also be compensated. The solution turns out to be trivial: just divide the Time Blur by the pitch, i.e. if Time Blur is .27, and pitch is .5, Time Blur should be .54. The idea is that if you're creating half-speed data, you want to blur twice as many frames to achieve the same effect.

Other Whorld parameters:

ff_zoom = (log10(whorld_zoom) + 1) / 2
ff_hue = (whorld_hue / 360)
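
The conversions can be collected into a few helper functions. The names are hypothetical, and base-10 logs are my inference: log10 reproduces the speed table and the 1920 x 1440 zoom example (.737994), whereas a natural log would not give that zoom value. For ff_speed the base cancels out anyway.

```cpp
#include <cmath>

// Whorld native values -> normalized FreeFrame parameters.
inline double FFSpeed(double fWhorldSpeed) { return (std::log10(fWhorldSpeed) / std::log10(20.0) + 1) / 2; }
inline double FFZoom(double fWhorldZoom)   { return (std::log10(fWhorldZoom) + 1) / 2; }
inline double FFHue(double fWhorldHue)     { return fWhorldHue / 360; }

// Time Blur compensation: at half speed (pitch .5), blur twice as many frames.
inline double CompensateTimeBlur(double fTimeBlur, double fPitch) { return fTimeBlur / fPitch; }
```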

CNumEdit bug

A CNumEdit bug was discovered while attempting to compensate Time Blur's parameter for master pitch: CNumEdit notifies the parent before the aux window (aux is a CEditSlider in this case). The parent (a row dialog) sends a notification via SendMessage instead of PostMessage, and since the recipient (main frame) reads the value from the slider, the value is stale, because the slider hasn't been updated yet. The symptom: if a parameter is edited by typing text in the edit control and pressing tab, the slider moves to the correct position, but the plugin doesn't respond to the change. The solution: CNumEdit should notify the aux window first, then the parent.

This is also a problem in Whorld, but it's masked because CMasterDlg::OnNotify uses PostMessage. Using SendMessage causes the same bug to appear. With the corrected CNumEdit, CMasterDlg::OnNotify can use SendMessage without problems. This is preferable since SendMessage is theoretically more efficient.
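
Here's a sketch of the ordering problem with hypothetical classes (CSliderSketch, CNumEditSketch). A direct function call plays the role of SendMessage, which runs the parent's handler before the edit control continues. The parent reads the value back from the slider, as the main frame does, so it only sees the new value if the slider is updated first.

```cpp
#include <functional>

// Hypothetical stand-in for the aux window (the CEditSlider).
struct CSliderSketch {
    double m_Val = 0;
};

// Hypothetical stand-in for CNumEdit; NotifyParent models a SendMessage
// to the parent, whose handler reads the value from the slider.
class CNumEditSketch {
public:
    CNumEditSketch(CSliderSketch& Aux, std::function<void()> NotifyParent)
        : m_Aux(Aux), m_NotifyParent(NotifyParent) {}

    // Called when the user types a value and tabs out.
    void OnValueEntered(double fVal, bool bAuxFirst)
    {
        if (bAuxFirst) {               // corrected order
            m_Aux.m_Val = fVal;
            m_NotifyParent();
        } else {                       // original order: parent reads a stale slider
            m_NotifyParent();
            m_Aux.m_Val = fVal;
        }
    }

private:
    CSliderSketch& m_Aux;
    std::function<void()> m_NotifyParent;
};
```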