Closed Bug 795940 Opened 12 years ago Closed 12 years ago

crash in mozilla::image::RasterImage::*Worker::Run

Categories

(Core :: Graphics: ImageLib, defect)

18 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox18 + fixed

People

(Reporter: scoobidiver, Assigned: joe)

References

(Depends on 3 open bugs)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [qa?])

Crash Data

Attachments

(7 files, 3 obsolete files)

951 bytes, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
22.10 KB, patch
jrmuizel
: review+
justin.lebar+bug
: review+
Details | Diff | Splinter Review
11.74 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
9.96 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
1.29 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
17.50 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
2.90 KB, patch
justin.lebar+bug
: review+
Details | Diff | Splinter Review
It first appeared in 18.0a1/20121001. The regression window is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a680fd777c3b&tochange=beee809b7ade
It's likely a regression from bug 486918.

Signature 	mozilla::LinkedListElement<mozilla::WebGLProgram>::remove() More Reports Search
UUID	11735dae-e0f1-4660-b745-baaa12121001
Date Processed	2012-10-01 16:06:19
Uptime	2697
Last Crash	more than 3 months before submission
Install Age	45.0 minutes since version was first installed.
Install Time	2012-10-01 15:21:12
Product	Firefox
Version	18.0a1
Build ID	20121001030603
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 37 stepping 5
Crash Reason	EXCEPTION_ACCESS_VIOLATION_WRITE
Crash Address	0x0
App Notes 	
AdapterVendorID: 0x8086, AdapterDeviceID: 0x0046, AdapterSubsysID: 146d103c, AdapterDriverVersion: 8.15.10.2302
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x8086
Adapter Device ID	0x0046
Total Virtual Memory	4294836224
Available Virtual Memory	3504123904
System Memory Use Percentage	40
Available Page File	12093915136
Available Physical Memory	4972847104

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::LinkedListElement<mozilla::WebGLProgram>::remove 	obj-firefox/dist/include/mozilla/LinkedList.h:155
1 	xul.dll 	mozilla::LinkedList<mozilla::image::RasterImage::ScaleRequest>::popFirst 	obj-firefox/dist/include/mozilla/LinkedList.h:285
2 	xul.dll 	mozilla::image::RasterImage::ScaleWorker::Run 	image/src/RasterImage.cpp:2680
3 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:612
4 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:256
5 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:395
6 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:90
7 	msvcr100.dll 	_callthreadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:314
8 	msvcr100.dll 	_threadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:292
9 	kernel32.dll 	BaseThreadInitThunk 	
10 	ntdll.dll 	__RtlUserThreadStart 	
11 	ntdll.dll 	_RtlUserThreadStart

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3ALinkedListElement%3Cmozilla%3A%3AWebGLProgram%3E%3A%3Aremove%28%29
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A18.0a1&signature=nsQueryReferent%3A%3Aoperator%28%29%28nsID%20const%26%2C%20void**%29
It's currently #1 top crasher in today's build.
Keywords: topcrash
Crash Signature: [@ mozilla::LinkedListElement<mozilla::WebGLProgram>::remove()] [@ nsQueryReferent::operator()(nsID const&, void**)] → [@ mozilla::LinkedListElement<mozilla::WebGLProgram>::remove()] [@ nsQueryReferent::operator()(nsID const& void**)] [@ mozilla::LinkedListElement<mozilla::image::RasterImage::ScaleRequest>::setPreviousUnsafe(mozilla::image::RasterImage::ScaleRequest*)] …
OS: Windows 7 → All
Hardware: x86 → All
Crash Signature: mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::asT()] [@ mozilla::image::RasterImage::ScaleWorker::Run()] → mozilla::LinkedListElement<mozilla::image::DiscardTracker::Node>::asT()] [@ mozilla::image::RasterImage::ScaleWorker::Run()] [@ nsCOMPtr_base::assign_from_helper | mozilla::image::RasterImage::DrawWorker::Run] [@ nsCOMPtr_base::assign_from_helper(nsCOM…
Crash Signature: nsID const&) | nsCOMPtr<imgIContainerObserver>::nsCOMPtr<imgIContainerObserver>(nsCOMPtr_helper const&) | mozilla::image::RasterImage::DrawWorker::Run()] → nsID const&) | nsCOMPtr<imgIContainerObserver>::nsCOMPtr<imgIContainerObserver>(nsCOMPtr_helper const&) | mozilla::image::RasterImage::DrawWorker::Run()] [@ WaitForSingleObjectEx | WaitForSingleObject | google_breakpad::ExceptionHandler::WriteMinidumpOnH…
Summary: crash in mozilla::image::RasterImage::ScaleWorker::Run → crash in mozilla::image::RasterImage::*Worker::Run
Some URLs from both signatures:

5 	https://www.facebook.com/
5 	about:newtab
4 	http://www.facebook.com/
2 	https://maps.google.com/
2 	http://www.reddit.com/
2 	http://www.openstreetmap.org/?box=yes&bbox=24.04978%2C49.80822%2C24.09706%2C49.8
2 	https://maps.google.pt/maps?hl=pt-PT
1 	http://www.google.bg/#hl=bg&sclient=psy-ab&q=ngk+sve6ti&oq=ngk+sve6ti&gs_l=serp.
1 	http://www.distancesbetween.com/distance-between/distance-from-chandigarh-to-pra
1 	http://castelli-cycling.com/en/products/
1 	https://encrypted.google.com/search?tbs=sbi:AMhZZivMbMZKCQ25dXxbmhGpZqTohwKkFqVG
1 	https://www.facebook.com/plugins/
1 	http://skyrim.nexusmods.com/mods/19666
1 	https://maps.google.pl/maps?hl=pl
1 	https://mail.google.com/mail/
1 	https://www.facebook.com/sourav.paul.5070/photos_albums
1 	https://www.google.com/search?q=refuge+shore+vista&ie=utf-8&oe=utf-8&aq=t&rls=or
1 	http://www.facebook.com/SonicDeluxe/allactivity
1 	http://news.google.fr/nwshp?hl=fr&tab=wn&pog=false
1 	http://pinit-cdn.pinterest.com/pinit.html?url=http%3A%2F%2Fwww.sulit.com.ph%2Fin
1 	http://skyrim.nexusmods.com/mods/16983


4 	https://maps.google.com/
3 	about:newtab
3 	about:blank
2 	http://maps.google.com.br/maps?hl=pt-BR&tab=wl
2 	https://maps.google.es/maps
2 	https://maps.google.co.uk/
2 	https://maps.google.es/maps
2 	https://www.facebook.com/
2 	http://maps.google.ch/?q=47.5447261649031,7.58390039205551(Reichensteinerstrasse
1 	http://earthquake-report.com/2012/09/14/major-earthquakes-list-september-15-2012
1 	https://www.google.com/#hl=en&safe=off&sclient=psy-ab&q=lenfried+av+video&oq=len
1 	http://maps.google.ch/?q=47.5719922,7.5760249(H%C3%BCningerstrasse%2046,%204056%
1 	https://www.facebook.com/photo.php?fbid=408157772578888&set=pb.100001540061045.-
1 	http://maps.google.de/maps?oe=utf-8&rls=org.mozilla:en-US:unofficial&client=fire
1 	http://www.123i.com.br/apartamento/sp-sao-paulo/santana/rua-drua-guilherme-crist
1 	http://www.genbeta.com/linux/microsoft-esta-entre-los-veinte-maximos-contribuyen
1 	http://www.deredactie.be/cm/vrtnieuws
Crash Signature: google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread(_EXCEPTION_POINTERS*, MDRawAssertionInfo*)] [@ imgFrame::GetRect()] [@ ScaleFrameImage] [@ mozilla::image::RasterImage::DrawWorker::Run()] → google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread(_EXCEPTION_POINTERS* MDRawAssertionInfo*)] [@ imgFrame::GetRect()] [@ ScaleFrameImage] [@ mozilla::image::RasterImage::DrawWorker::Run()] [@ google_breakpad::ExceptionHandler::HandlePureV…
Crash Signature: void**)] [@ mozilla::LinkedListElement<mozilla::image::RasterImage::ScaleRequest>::setPreviousUnsafe(mozilla::image::RasterImage::ScaleRequest*)] [@ mozilla::LinkedList<mozilla::image::RasterImage::ScaleRequest>::popFirst] [@ mozilla::LinkedListElemen… → void**)] [@ nsQueryReferent::operator()] [@ mozilla::LinkedListElement<mozilla::image::RasterImage::ScaleRequest>::setPreviousUnsafe(mozilla::image::RasterImage::ScaleRequest*)] [@ mozilla::LinkedList<mozilla::image::RasterImage::ScaleRequest>::popFirs…
I don't know why I didn't make this connection before, but ScaleRequests can outlive their RasterImage, which is 100% a recipe for disaster.

I'd love to have STR, but I have a fix for this anyways.
This should fix this bug (and, incidentally, bug 795942). Basically, hold on to a reference to the RasterImage while the ScaleRequest is outstanding, and drop it once we're done.

I audited this code to make sure we do things in the right order, but a good deep review would be great, Justin. :)

Note that we both addref and release on the main thread, on purpose.
Assignee: nobody → joe
Attachment #667014 - Flags: review?(justin.lebar+bug)
Why does this code not follow our variable naming conventions?
It's a public struct, and lots of structs have non mFoo style. I don't care that much, though.
Crash Signature: [@ mozilla::LinkedListElement<mozilla::WebGLProgram>::remove()] [@ nsQueryReferent::operator()(nsID const& void**)] [@ nsQueryReferent::operator()] [@ mozilla::LinkedListElement<mozilla::image::RasterImage::ScaleRequest>::setPreviousUnsafe(mozilla::ima… → [@ mozilla::LinkedListElement<mozilla::WebGLProgram>::remove()] [@ nsQueryReferent::operator()(nsID const&, void**)] [@ nsQueryReferent::operator()] [@ nsNodeWeakReference::QueryReferent(nsID const& void**)] [@ mozilla::LinkedListElement<mozilla::imag…
> +  // While the request is outstanding, we hold a reference to it so it won't be
> +  // deleted from under us (and, since it owns us, so we won't be deleted).

While the request is outstanding, we hold a reference to /its RasterImage/, right?

When code which tries to be clever like this fails, it can fail really hard.  For example, if we ever don't call RequestDraw for a ScaleRequest, we've just leaked an image.  I'd rather we clean this up by making the ScaleRequests properly refcounted and all that jazz.

But you're probably going to have to make some sizable changes in order to support multiple scales per image, so I'm happy if we put this off until then.  I can't see how to break this patch.
Attachment #667014 - Flags: review?(justin.lebar+bug) → review+
Crash Signature: [@ mozilla::LinkedListElement<mozilla::WebGLProgram>::remove()] [@ nsQueryReferent::operator()(nsID const&, void**)] [@ nsQueryReferent::operator()] [@ nsNodeWeakReference::QueryReferent(nsID const& void**)] [@ mozilla::LinkedListElement<mozilla::imag… → [@ mozilla::LinkedListElement<mozilla::WebGLProgram>::remove()] [@ mozilla::LinkedListElement<mozilla::WebGLRenderbuffer>::remove()] [@ nsQueryReferent::operator()(nsID const&, void**)] [@ @0x0 | nsQueryReferent::operator()(nsID const& void**)] [@ nsQ…
Backed out again for leaking references at shutdown:  https://hg.mozilla.org/integration/mozilla-inbound/rev/9737260d1647
Without attachment 667014 [details] [diff] [review], this has no effect, but it's required to avoid leaks at shutdown in that patch. Basically, we explicitly shut down the scaling thread and the scaling and drawing singletons at shutdown time (rather than relying on ClearOnShutdown) so we can clear out their queues of images to scale.
Attachment #667749 - Flags: review?(justin.lebar+bug)
Attachment #667014 - Attachment description: hold a reference to the RasterImage while ScaleRequests are outstanding → part 1: hold a reference to the RasterImage while ScaleRequests are outstanding
Reverts bug 797632, because this bug should fix those crashes.
Attachment #667750 - Flags: review?(justin.lebar+bug)
Comment on attachment 667750 [details] [diff] [review]
part 2: reenable high-quality downscaling

Actually, I guess we should re-enable *after* the scaler's fixed.
Attachment #667750 - Attachment description: part -1: reenable high-quality downscaling → part 2: reenable high-quality downscaling
Crash Signature: skia::`anonymous namespace''::ConvolveHorizontally<int>(unsigned char const*, skia::ConvolutionFilter1D const&, unsigned char*)] [@ cairo_font_options_init_default] → skia::`anonymous namespace''::ConvolveHorizontally<int>(unsigned char const*, skia::ConvolutionFilter1D const&, unsigned char*)] [@ cairo_font_options_init_default] [@ msvcr100.dll@0x8af06] [@ js::mjit::stubs::FixupArity(js::VMFrame& unsigned int)] [@…
Depends on: 711953
Comment on attachment 667749 [details] [diff] [review]
part 0: Clear out our queues of scales and draws at shutdown

Heh.

I'd prefer MOZ_ASSERT to NS_ABORT_IF_FALSE, but w/e.
Attachment #667749 - Flags: review?(justin.lebar+bug) → review+
Attachment #667750 - Flags: review?(justin.lebar+bug) → review+
The previous two patches still leaked on Try, so I went back to the drawing board. This patch is based on the one in bug 795942. It's significantly more complex, but does a lot more. Basically:

1. We no longer hold on to imgFrames or direct pointers to RasterImage. Instead, we hold a reference to imgFrames' backing store surfaces so we know they exist. (We lock and unlock these references on the main thread). We also maintain a weak pointer to our RasterImage.

2. We dynamically allocate ScaleRequests and pass them to the scaling thread. They aren't refcounted (because doing that in conjunction with LinkedList<> turns out to be complex), but instead are single-use, allocated when passed to the thread and deleted when finished.

3. RasterImage stores a ScaleResult. Before the DrawRequest finishes, it gives the RasterImage its scaled frame (both are nsAutoPtrs, so operator= clears the ScaleRequest's pointer).

This is also going through try: https://tbpl.mozilla.org/?tree=Try&rev=dc441f7d5ddb

Note that it's possible that I still need something like part 1 to not leak at shutdown.
Attachment #668197 - Flags: review?(justin.lebar+bug)
Attachment #668197 - Flags: review?(jmuizelaar)
Comment on attachment 668197 [details] [diff] [review]
allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage

Review of attachment 668197 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/RasterImage.cpp
@@ +2653,2 @@
>  
>    mScaleRequests.insertBack(request);

It seems like we could just make ScaleRequests nsIRunnable and dispatch them directly instead of maintaining a queue by hand.

@@ +2748,5 @@
> +{
> +  if (request) {
> +    MOZ_ASSERT(request->done);
> +    mScaleResult.done = true;
> +    mScaleResult.pending = false;

done and pending might be better off in a single enumerated type.

::: image/src/RasterImage.h
@@ +853,5 @@
>    bool     IsDecodeFinished();
>    TimeStamp mDrawStartTime;
>  
>    inline bool CanScale(gfxPattern::GraphicsFilter aFilter, gfxSize aScale);
> +  ScaleResult mScaleResult;

Dropping the ability to cancel scale requests will need to be restored so that we can replace scale requests that aren't needed anymore.
Attachment #668197 - Flags: review?(jmuizelaar) → review+
Comment on attachment 668197 [details] [diff] [review]
allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage

Mind if I look at the next version?
Attachment #668197 - Flags: review?(justin.lebar+bug)
There are still crashes in m-c even after http://hg.mozilla.org/mozilla-central/rev/134f5b1d6d50
(In reply to Scoobidiver from comment #20)
> There are still crashes in m-c even after
> http://hg.mozilla.org/mozilla-central/rev/134f5b1d6d50

Yeah, it was backed out unfortunately: http://hg.mozilla.org/mozilla-central/rev/9737260d1647
(In reply to Joe Drew (:JOEDREW! \o/) from comment #21)
> Yeah, it was backed out unfortunately:
> http://hg.mozilla.org/mozilla-central/rev/9737260d1647
How do you explain there were about 2300 crashes per build before 18.0a1/20121003 and one or two crashes per build after 18.0a1/20121004 without any fixed dependent bugs?
(In reply to Scoobidiver from comment #22)
> (In reply to Joe Drew (:JOEDREW! \o/) from comment #21)
> > Yeah, it was backed out unfortunately:
> > http://hg.mozilla.org/mozilla-central/rev/9737260d1647
> How do you explain there were about 2300 crashes per build before
> 18.0a1/20121003 and one or two crashes per build after 18.0a1/20121004
> without any fixed dependent bugs?

bug 797632.
Depends on: 797632
Implemented the small suggestions from Jeff in this patch. Coming up: changing {Scale,Draw}Worker into an on-demand-allocated nsRunnable.
Attachment #667014 - Attachment is obsolete: true
Attachment #667749 - Attachment is obsolete: true
Attachment #668197 - Attachment is obsolete: true
Attachment #669745 - Flags: review?(justin.lebar+bug)
Attachment #669745 - Flags: review?(jmuizelaar)
Instead of maintaining our own queue (and all the complexity that comes with that), let's just create new objects and fire them at the various threads, which *already have queues*. This lets us remove the Mutex too, which is a nice bonus.

Still to do: implement Stopping of requests, and making sure we don't thrash.
Attachment #669747 - Flags: review?(justin.lebar+bug)
Attachment #669747 - Flags: review?(jmuizelaar)
Attachment #669745 - Attachment description: allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage v2 → part 0.1: allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage v2
Attachment #669747 - Attachment description: change from singletons to on-demand allocated runnables → part 0.2: change from singletons to on-demand allocated runnables
While we can't support stopping a scale request that's currently executing, we can abort it if we get to it early.
Attachment #670088 - Flags: review?(justin.lebar+bug)
Attachment #670088 - Flags: review?(jmuizelaar)
Part 0.1 reverted this, and we need it.
Attachment #670089 - Flags: review?(justin.lebar+bug)
Attachment #670089 - Flags: review?(jmuizelaar)
Attachment #669745 - Flags: review?(jmuizelaar) → review+
Comment on attachment 670089 [details] [diff] [review]
part 0.4: only support a single downscaling request

Review of attachment 670089 [details] [diff] [review]:
-----------------------------------------------------------------

Regrettably.
Attachment #670089 - Flags: review?(jmuizelaar) → review+
This moves the helper classes to the CPP file to avoid having them parsed by everyone else.

The only non-trivial change is moving the observer notification to RasterImage, which feels more clean anyways.

ONLY JEFF MUIZELAAR MAY REVIEW THIS BECAUSE HE FORCED ME TO DO IT
Attachment #670141 - Flags: review?(jmuizelaar)
Comment on attachment 670088 [details] [diff] [review]
part 0.3: support stopping requests

Review of attachment 670088 [details] [diff] [review]:
-----------------------------------------------------------------

This seems ok, but perhaps over complicated?

::: image/src/RasterImage.h
@@ +570,5 @@
>      uint32_t dstStride;
>      mozilla::gfx::SurfaceFormat srcFormat;
>      bool dstLocked;
>      bool done;
> +    bool stopped;

This needs at least a comment. It would be better to be atomic<bool> stopped.
Attachment #670088 - Flags: review?(jmuizelaar) → review+
Comment on attachment 669745 [details] [diff] [review]
part 0.1: allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage v2

I'm going to do a bad thing here and unflag myself for the reviews I asked to do.  Sorry.  :(

If you /want/ the extra set of eyes, let me know and I'll make time.
Attachment #669745 - Flags: review?(justin.lebar+bug)
Attachment #669747 - Flags: review?(justin.lebar+bug)
Attachment #670088 - Flags: review?(justin.lebar+bug)
Attachment #670089 - Flags: review?(justin.lebar+bug)
Comment on attachment 669745 [details] [diff] [review]
part 0.1: allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage v2

Justin, I definitely want another set of eyes on this patch. The others I will just make sure Jeff does a deep review on it.
Attachment #669745 - Flags: review?(justin.lebar+bug)
Attachment #670141 - Flags: review?(jmuizelaar) → review+
Comment on attachment 669747 [details] [diff] [review]
part 0.2: change from singletons to on-demand allocated runnables

Review of attachment 669747 [details] [diff] [review]:
-----------------------------------------------------------------

Overall I really like the look of this.

::: image/src/RasterImage.cpp
@@ +2618,3 @@
>  {
> +  if (!sScaleWorkerThreadInitialized) {
> +    PR_SetCurrentThreadName("Image Scaler");

NS_SetThreadName() please

@@ +2781,5 @@
>      else if (!(mScaleResult.status == SCALE_PENDING && mScaleResult.scale == scale)) {
> +      nsRefPtr<ScaleRunner> runner = new ScaleRunner(this, scale, frame);
> +      if (runner->IsOK()) {
> +        if (!sScaleWorkerThread) {
> +          NS_NewThread(getter_AddRefs(sScaleWorkerThread), runner, NS_DISPATCH_NORMAL);

It would be more clear if you passed NULL instead of runner and did the dispatch below unconditionally.

::: image/src/RasterImage.h
@@ +477,1 @@
>    {

We should move ScaleRequest into RasterImage.cpp. I can't see a reason for it to be out here.
Attachment #669747 - Flags: review?(jmuizelaar) → review+
I'm still working my way through this code, but one piece stuck out at me:

>+      if (!image) {
>+        return false;
>       }
>-      return srcDataLocked;
>+
>+      bool success = false;
>+      if (!dstLocked) {
>+        nsRefPtr<gfxASurface> surf;
>+
>+        success |= NS_SUCCEEDED(image->LockImage());
>+        dstLocked = NS_SUCCEEDED(dstFrame->LockImageData());
>+        success |= NS_SUCCEEDED(srcFrame->LockImageData());
>+
>+        success |= NS_SUCCEEDED(srcFrame->GetSurface(getter_AddRefs(surf)));
>+        srcSurface = surf->GetAsImageSurface();
>+        success |= NS_SUCCEEDED(dstFrame->GetSurface(getter_AddRefs(surf)));
>+        dstSurface = surf->GetAsImageSurface();
>+
>+        success = success && dstLocked && srcSurface && dstSurface;
>+
>+        if (success) {
>+          srcData = srcSurface->Data();
>+          dstData = dstSurface->Data();
>+          srcStride = srcSurface->Stride();
>+          dstStride = dstSurface->Stride();
>+          srcFormat = mozilla::gfx::ImageFormatToSurfaceFormat(srcFrame->GetFormat());
>+        }
>+
>+        // We have references to the Thebes surfaces, so we don't need to leave
>+        // the frames we don't own locked.
>+        success |= NS_SUCCEEDED(image->UnlockImage());
>+        success |= NS_SUCCEEDED(srcFrame->UnlockImageData());
>+      }
>+      return success;

I don't get how this is right.

Suppose dstFrame->GetSurface() fails.  Then dstSurface =
srcFrame->GetSurface()->GetAsImageSurface()??

Indeed, suppose srcFrame->GetSurface() fails.  Then surf is null and we crash.

At the end, why are we successful if we unlock either the image or srcFrame?
Surely if we failed to lock one of them above, we should only unlock that one
down here.

I don't see why we should succeed if image->LockImage() fails while
srcFrame->LockImageData() succeeds, but that may be my inexperience with this
API talking.

(Similarly, I don't see why we can unlock the src data but can wait to unlock
dstFrame's iamge data until ReleaseSurfaces() -- if the key is that you're
holding a strong ref to the gfx surface, then that applies to both objects --
but maybe that's because I know nothing about gfx.)

And so on; there are probably other bugs.
Comment on attachment 669745 [details] [diff] [review]
part 0.1: allocate ScaleRequests dynamically, hold onto ScaleResult, use weak pointer for RasterImage v2

Nothing else jumps out at me here as particularly worrying.

This review is on a diff I generated by passing this patch through git's patience diff:

  $ git apply <(curl -L https://bug795940.bugzilla.mozilla.org/attachment.cgi?id=669745)
  $ git diff -U8 --patience

The effect of the diff should be identical to the attachment, but the text of
the diff is slightly different.

>diff --git a/image/src/RasterImage.h b/image/src/RasterImage.h
>-    bool LockSourceData()
>+    // This can only be called on the main thread.

Please add a precondition (either in a comment or explicitly in source) that
dstFrame is non-null.  It's not at all clear that I'm expected to set it before
calling this.

>+    bool GetSurfaces(imgFrame* srcFrame)
>     {
>-      if (!srcDataLocked) {
>-        bool success = true;
>-        success = success && NS_SUCCEEDED(image->LockImage());
>-        success = success && NS_SUCCEEDED(srcFrame->LockImageData());
>-        srcDataLocked = success;
>+      MOZ_ASSERT(NS_IsMainThread());
>+
>+      RasterImage* image = weakImage;

To be paranoid, hold onto a strong ref here?

>-    bool UnlockSourceData()
>+    // This can only be called on the main thread.
>+    bool ReleaseSurfaces()
>     {
>-      bool success = true;
>-      if (srcDataLocked) {
>-        success = success && NS_SUCCEEDED(image->UnlockImage());
>-        success = success && NS_SUCCEEDED(srcFrame->UnlockImageData());
>-
>-        // If unlocking fails, there's nothing we can do to make it work, so we
>-        // claim that we're not locked regardless.
>-        srcDataLocked = false;
>+      MOZ_ASSERT(NS_IsMainThread());
>+
>+      RasterImage* image = weakImage;

Strong ref out of paranoia?  (This one is more reasonable than the last one,
since dstFrame->UnlockImageData() could conceivably destroy the image.)

Why we don't we touch dstFrame if the image is not alive?

ISTM that if !image we should do everything here, possibly including
dstFrame->UnlockImageData(); I don't see why we return early.

>@@ -230,33 +176,29 @@ RasterImage::RasterImage(imgStatusTracker* aStatusTracker) :

>+  request->done = mozilla::gfx::Scale(request->srcData, request->srcRect.width, request->srcRect.height, request->srcStride,
>+                                      request->dstData, request->dstSize.width, request->dstSize.height, request->dstStride,
>+                                      request->srcFormat);

Looking at how you use this field, it seems like ScaleRequest::successful would
be a better name than ScaleRequest::done.

Also, nit, these are very long lines.

>+bool
>+RasterImage::ScaleWorker::RequestScale(ScaleRequest* request,
>+                                       RasterImage* image,
>+                                       imgFrame* aSrcFrame)
> {

Please add a comment here that this method returns true iff it saves the scale
request into the worker's list.  That's quite important.

In fact, you could avoid a footgun by having this method /create/ the scale
request.

>   mRequestsMutex.AssertCurrentThreadOwns();
> 
>-  ScaleRequest* request = &aImg->mScaleRequest;
>-  if (request->isInList())
>-    return;
>+  // Destination is unconditionally ARGB32 because that's what the scaler
>+  // outputs.
>+  request->dstFrame = new imgFrame();
>+  nsresult rv = request->dstFrame->Init(0, 0, request->dstSize.width, request->dstSize.height,
>+                                        gfxASurface::ImageFormatARGB32);
>+
>+  if (NS_FAILED(rv) || !request->GetSurfaces(aSrcFrame)) {
>+    return false;
>+  }
> 
>   mScaleRequests.insertBack(request);
> 
>   if (!sScaleWorkerThread) {
>     NS_NewThread(getter_AddRefs(sScaleWorkerThread), this, NS_DISPATCH_NORMAL);
>     ClearOnShutdown(&sScaleWorkerThread);

I've had bugs in the past where we draw images after shutdown.  If we
RequestScale after ClearOnShutdown does its business, then it will assert here.
Also, I don't know what creating a thread that late will do, but it's probably
not pretty.

Anyway, forewarned is forearmed.

>@@ -2757,65 +2698,57 @@ RasterImage::DrawWorker::Singleton()
>+  // ScaleWorker is finished with this request, so we can unlock the data now.
>+  request->ReleaseSurfaces();
>+
>+  // Only set the scale result if the request finished successfully.
>+  if (request->done) {
>+    RasterImage* image = request->weakImage;
>+    if (image) {
>+      nsCOMPtr<imgIContainerObserver> observer(do_QueryReferent(image->mObserver));
>+      if (observer) {
>+        imgFrame *scaledFrame = request->dstFrame.get();
>+        scaledFrame->ImageUpdated(scaledFrame->GetRect());
>+        observer->FrameChanged(nullptr, image, &request->srcRect);
>+      }
>+
>+      image->SetScaleResult(request);

Here ISTM you really should have a strong ref to the image.

>@@ -2836,16 +2769,38 @@ RasterImage::CanScale(gfxPattern::GraphicsFilter aFilter,
> void
>+RasterImage::SetScaleResult(ScaleRequest* request)

Looking at how you use this method, separating it into two methods seems
clearer to me.

>+void
>+RasterImage::SetResultPending(ScaleRequest* request)

Put "Scale" somewhere in this method name?  If you split up SetScaleResult,
then maybe make the three methods' names parallel in some way?
Attachment #669745 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #34)
> (Similarly, I don't see why we can unlock the src data but can wait to unlock
> dstFrame's iamge data until ReleaseSurfaces() -- if the key is that you're
> holding a strong ref to the gfx surface, then that applies to both objects --
> but maybe that's because I know nothing about gfx.)

This is the only part you're misunderstanding. We're exploiting the API here: when you lock a frame, the surface it returns will be the one that backs the frame. We write directly to that surface in the scaler, and then allow imgFrame to do its usual thing (Optimize() to other types of surfaces) when we Unlock. So it needs to be Unlocked only *after* the scaler's done.
(In reply to Justin Lebar [:jlebar] from comment #35)
> Why we don't we touch dstFrame if the image is not alive?
> 
> ISTM that if !image we should do everything here, possibly including
> dstFrame->UnlockImageData(); I don't see why we return early.

Good point. Will do.

> >+bool
> >+RasterImage::ScaleWorker::RequestScale(ScaleRequest* request,
> >+                                       RasterImage* image,
> >+                                       imgFrame* aSrcFrame)
> > {
> 
> Please add a comment here that this method returns true iff it saves the
> scale
> request into the worker's list.  That's quite important.
> 
> In fact, you could avoid a footgun by having this method /create/ the scale
> request.

I believe part 0.2 fulfills this, because we change ScaleWorker to ScaleRunner (an nsIRunnable that we create every time we need a scale, rather than a singleton), then do this:

+      nsRefPtr<ScaleRunner> runner = new ScaleRunner(this, scale, frame);
+      if (runner->IsOK()) {
+        if (!sScaleWorkerThread) {
+          NS_NewThread(getter_AddRefs(sScaleWorkerThread), runner, NS_DISPATCH_NORMAL);
+          ClearOnShutdown(&sScaleWorkerThread);
+        }
+        else {
+          sScaleWorkerThread->Dispatch(runner, NS_DISPATCH_NORMAL);
+        }

> >@@ -2836,16 +2769,38 @@ RasterImage::CanScale(gfxPattern::GraphicsFilter aFilter,
> > void
> >+RasterImage::SetScaleResult(ScaleRequest* request)
> 
> Looking at how you use this method, separating it into two methods seems
> clearer to me.
> 
> >+void
> >+RasterImage::SetResultPending(ScaleRequest* request)
> 
> Put "Scale" somewhere in this method name?  If you split up SetScaleResult,
> then maybe make the three methods' names parallel in some way?

In part 0.3 I change these to ScalingStart, which sets various members to the correct state, and ScalingEnd, which does what SetScaleResult used to do based on the status of the scale. Does that suffice?
The changes in parts 0.2 and 0.3 look great.  Thanks.
This will be folded into part 0.2, but I wanted to make sure that you couldn't poke other holes in it first :)

I think this addresses comment 34's problems.
Attachment #670969 - Flags: review?(justin.lebar+bug)
It appears that GetAsImageSurface() can fail, and this patch doesn't check for that.

The rest looks fine, although I admit that now that the logic inside this "struct" is complicated, the lack of "mFoo" on members makes it a bit tricky to follow.
Attachment #670969 - Flags: review?(justin.lebar+bug) → review+
(In reply to Ryan VanderMeulen from comment #42)
> Any way to test this?

Bug 795371 :(
Doesn't seem "Resolved Fixed."
In this build (Win64 19.0a1 2012-10-22) it seems buggy, properly scaling some images some of the time, but more often does not do HQ scaling.
Not crashing for me, just not working correctly. Same tests work exactly as expected for me in MSIE and Chromium.

These are tests that produce results I describe, for me.
https://bug486918.bugzilla.mozilla.org/attachment.cgi?id=428179
https://bug486918.bugzilla.mozilla.org/attachment.cgi?id=657754
I have similar results on some other (internal) pages, so I have seen it with diff. images from multiple sources.

(Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0)
(Build platform, target, i686-pc-mingw32)
I know it's counter-intuitive, but what we usually do at Mozilla is mark bugs as "resolved fixed" when they land (even if they're not working properly) and then file new bugs on follow-up issues.

You are much likely to get the attention you would like for your concerns if you file a new bug and mark it as blocking this one.
Can this patch be pushed to Firefox 18 or there's too much risk?
*way* too risky. That's why we didn't enable high quality downscaling on 18 :)
(In reply to Scott Thorne from comment #44)
> Not crashing for me, just not working correctly.

"Not crashing" means that this particular bug is indeed fixed. Any other problems, as Justin said in comment #45, should go into their own bugs, we like tracking every individual issue as an individual bug. :)
Depends on: 807236
Could not verify the crash for 2012-10-01 Nightly entering the URLs from comment 2 
Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/18.0 Firefox/18.0
Build ID: 20121001030603

are there any special configurations to reproduce this?

Verified for
Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/18.0 Firefox/18.0 beta 1
Build ID: 20121121075611

and Latest Nightly
Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20.0 Firefox/20.0
Build ID: 20121127030907

There were no crashes opening and exploring links from comment 2 for Firefox 18.0 beta 1 and Latest Nightly.
Whiteboard: [qa?]
Since high-quality image downscaling has been (re?)enabled, I experience display artefacts on downscaled images.  See bug 831244 for config and details.  (Should I reopen this bug?)
(In reply to David A. Madore from comment #50)
> (Should I reopen this bug?)

This bug was fixed. You shouldn't get a crash now. So why reopen this bug?
Depends on: 831244
Depends on: 846315
No longer depends on: 846315
Depends on: 849590
Depends on: 895412
Depends on: 896249
Depends on: CVE-2014-1557
Depends on: 927377
No longer depends on: 927377
Depends on: 1013224
Restrict Comments: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: