[statnet_help] Upgrading from ERGM v3.10 to v4.6

Carter T. Butts via statnet_help statnet_help at u.washington.edu
Fri Aug 30 01:37:41 PDT 2024


Hi, Aditya -

I'll be interested in Pavel's take on the convergence issues, but just
to verify, you are assessing convergence based on a /second/ MCMC run,
correct?  The MCMC statistics in the ergm object are from the
penultimate iteration, and may thus be out of equilibrium (but this does
/not/ necessarily mean that the /model/ did not converge).  However, if
you simulate a new set of draws from the fitted model and the mean stats
do not match, /then/ you have an issue.  (This is why we now point folks
to gof() for that purpose.)  It looks like your plots are from the ergm
object and not from a gof() run (or other secondary simulation), so I
want to verify that first.

I also note that, at a quick glance, the plots from your more exhaustive
simulation case don't seem all that far off, which could indicate either
that the model did converge (and, per above, we're not looking at a draw
from the final model), or that it converged within the tolerances that
were set, and you may need to tighten them.  But best to first know if
there's a problem in the first place.
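The check described above (simulate fresh draws from the fitted model, then ask whether the mean statistics match the targets) can be sketched in a few lines of Python. This is a hypothetical helper for illustration only, not ergm's gof() or any statnet API; the statistic names and values are toy numbers, and the flagging threshold of 3 is an arbitrary assumption.

```python
import math
import statistics


def compare_to_targets(sim_stats, targets):
    """Compare mean simulated statistics against their targets.

    sim_stats: dict mapping statistic name -> list of values, one per draw
               from a *new* simulation of the fitted model (not the MCMC
               sample stored with the fit).
    targets:   dict mapping statistic name -> observed (target) value.

    Returns name -> (simulated mean, t), where t is the gap between the
    simulated mean and the target in units of its Monte Carlo standard
    error. This treats the draws as independent (i.e. well thinned); for
    autocorrelated draws, len(draws) should be replaced by an effective
    sample size.
    """
    report = {}
    for name, target in targets.items():
        draws = sim_stats[name]
        mean = statistics.fmean(draws)
        se = statistics.stdev(draws) / math.sqrt(len(draws))
        if se == 0:
            t = 0.0 if mean == target else math.inf
        else:
            t = (mean - target) / se
        report[name] = (mean, t)
    return report


def flag_mismatches(report, threshold=3.0):
    """Names of statistics whose simulated mean sits far from the target."""
    return [name for name, (_, t) in report.items() if abs(t) > threshold]


# Toy numbers for illustration only (not from the thread).
sim = {"edges": [98, 102, 100, 101, 99], "mutual": [10, 12, 11, 9, 13]}
targets = {"edges": 100, "mutual": 15}
report = compare_to_targets(sim, targets)
print(report)
print(flag_mismatches(report))  # ['mutual']
```

Using the Monte Carlo standard error rather than the raw spread of the draws reflects the point that, at the MLE, the expected simulated statistics should equal the targets, so the question is whether the observed gap is larger than simulation noise.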

Another observation is that, per my earlier email, you may need O(N^2)
toggles per draw to get good performance if your model has a nontrivial
level of dependence.  You are using a thinning interval of 1e6, which is
in your case around 30*N.  It's possible that you've got too much
dependence for that: O(N^2) here would mean some multiple of about 1e9,
which is about a thousand times greater than what you're using.  Really
large, sparse networks sometimes /can/ be modeled well without that much
thinning, but it's not a given.  Relatedly, your trace plots from the
1e6 run suggest a fair amount of autocorrelation on some statistics,
which suggests a lack of efficiency.  (Autocorrelation by itself isn't
necessarily a problem, but it means that your effective MCMC sample size
is smaller than it seems, and this can reduce the effectiveness of the
MCMCMLE procedure.   The ones from the 1e6 run aren't bad enough that I
would be alarmed, but if I were looking for things to tighten up and
knew this could be a problem, they suggest possible room for
improvement.)  So anyway, I wouldn't crank this up until verifying that
it's needed, but you are still operating on the low end of computational
effort (whether it seems like it or not!).
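The thinning arithmetic above, and the point that autocorrelation shrinks the effective MCMC sample size, can be checked with a short Python sketch. The effective-sample-size estimator here (truncating the autocorrelation sum at the first non-positive lag) is a simple textbook heuristic chosen for illustration, not what ergm or its diagnostics compute, and the AR(1) chain is a hypothetical stand-in for the trace of one statistic.

```python
import random


def thinning_scales(n_nodes, interval):
    """Express an MCMC thinning interval as a multiple of N and of N^2."""
    return interval / n_nodes, interval / n_nodes ** 2


def effective_sample_size(chain, max_lag=1000):
    """Estimate ESS = n / (1 + 2 * sum_k rho_k) for one statistic's trace.

    The autocorrelation sum is truncated at the first non-positive sample
    autocorrelation, a simple heuristic; dedicated MCMC diagnostic tools
    use more careful estimators.
    """
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    tau = 1.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


def simulate_ar1(phi, n, seed=42):
    """Toy AR(1) trace standing in for an autocorrelated MCMC statistic."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


# N = 32,000 nodes and a thinning interval of 1e6, as in the thread.
per_n, per_n2 = thinning_scales(32_000, 1e6)
print(per_n, per_n2)  # 31.25 0.0009765625, i.e. about 30*N and 1/1024 of N^2

chain = simulate_ar1(0.9, 10_000)
# For AR(1), theory gives ESS = n(1 - phi)/(1 + phi), about 526 here.
print(round(effective_sample_size(chain)))
```

With a lag-one autocorrelation of 0.9, ten thousand stored draws carry roughly the information of about five hundred independent ones, which is the sense in which the sample is "smaller than it seems."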

Finally, I would note that for the stochastic approximation method,
convergence is to some degree (and it's a bit complex) determined by how
many subphases are run, and how many iterations are used per subphase. 
This algorithm is due to Tom in his classic JoSS paper (but without the
complement moves), which is still a good place to look for details.  It
is less fancy than some more modern algorithms of its type, but is
extremely hard to beat (I've tried and failed more than once!).  In any
event, there are several things that can tighten that algorithm relative
to its defaults, including increasing thinning, increasing the
iterations per subphase, and increasing the number of subphases.  Some
of these sharply increase computational cost, because e.g. the number of
actual subphase iterations doubles (IIRC) at each subphase - so
sometimes one benefits by increasing the number of subphases but greatly
reducing the base number of iterations per subphase.  The learning rate
("SA.initial.gain") can also matter, although I would probably avoid
messing with it if the model is well-behaved (as here).  I will say
that, except under exotic conditions in which I am performing
Unspeakable ERGM Experiments (TM) of which we tell neither children nor
grad students, I do not recall ever needing to do much with the base
parameters - adjusting thinning, as needs must, has almost always done
the trick.  Still, if other measures fail, tinkering with these settings
can/will certainly affect convergence.
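To make the subphase structure concrete, here is a toy Robbins-Monro stochastic approximation in Python for an edges-only (independent-dyad) model. It is a sketch under stated assumptions, not ergm's implementation: the doubling of iterations per subphase follows the "IIRC" above, while halving the gain at each subphase and restarting each subphase from the previous subphase's average are assumptions modeled on the classic scheme. The `initial_gain` argument plays the role of SA.initial.gain only by analogy.

```python
import math
import random
import statistics


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def draw_edge_count(theta, n_dyads, rng):
    """Toy edges-only model: each dyad is an independent tie w.p. expit(theta)."""
    p = expit(theta)
    return sum(rng.random() < p for _ in range(n_dyads))


def subphase_cost(base_iterations, n_subphases):
    """Total subphase iterations if the count doubles at each subphase."""
    return sum(base_iterations * 2 ** k for k in range(n_subphases))


def stochastic_approximation(target, n_dyads, theta0=0.0, initial_gain=0.1,
                             n_subphases=4, base_iterations=50,
                             phase1_draws=50, seed=1):
    rng = random.Random(seed)
    # Phase 1: estimate the scaling D. For an exponential family the
    # derivative of E[s] with respect to theta equals Var(s), so a sample
    # variance of simulated statistics serves as the estimate.
    sample = [draw_edge_count(theta0, n_dyads, rng) for _ in range(phase1_draws)]
    d = max(statistics.variance(sample), 1.0)

    # Phase 2: Robbins-Monro subphases with a shrinking gain.
    theta, gain, total = theta0, initial_gain, 0
    for k in range(n_subphases):
        n_iter = base_iterations * 2 ** k      # assumed doubling per subphase
        path = []
        for _ in range(n_iter):
            s = draw_edge_count(theta, n_dyads, rng)
            theta -= gain * (s - target) / d   # push E[s] toward the target
            path.append(theta)
        theta = statistics.fmean(path)         # restart from subphase average
        gain /= 2                              # assumed gain halving
        total += n_iter
    return theta, total


# 200 dyads with 50 observed ties: the MLE is logit(50 / 200) = log(1/3).
theta_hat, iters = stochastic_approximation(target=50, n_dyads=200)
print(round(theta_hat, 2), iters, math.log(1 / 3))
print(subphase_cost(50, 4), subphase_cost(25, 5))  # 750 775
```

The last line illustrates the cost trade-off: under doubling, five subphases with a base of 25 iterations cost about the same as four subphases with a base of 50, while adding an extra, more finely tuned subphase.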

I'd check on those things first, and then see if you still have a
problem....

Hope that helps,

-Carter

On 8/29/24 12:13 PM, Khanna, Aditya wrote:

>
> Hi Carter and All,
>
> Thank you so much for the helpful guidance here. I think following
> your suggestions has brought us very close to reproducing the target
> statistics in the simulated networks, but there are still some gaps.
>
> Our full previous exchange is below, but to summarize: I have an ERGM
> that I fit previously with ERGM v3.10.4 on a directed network with
> 32,000 nodes. The model consisted of in- and out-degrees in addition
> to other terms, including a custom distance term. In trying to
> reproduce this fit with ergm v4.6, the model did not initially converge.
>
> Your suggestion to try setting main.method = "Stochastic
> Approximation" considerably improved the fitting. Setting the
> convergence detection to "Hotelling" on top of that brought us close
> to simulated networks that capture all the mean statistics. (Following
> an old discussion thread
> <https://urldefense.com/v3/__https://github.com/statnet/ergm/issues/346__;!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGb4BZQZz$> on
> the statnet GitHub, I also tried setting the termination criterion to
> Hummel and MCMLE.effectiveSize = NULL. In practice, though, Hotelling
> worked a bit better than Hummel for me.)
>
> In general, I tried fitting the model with variants of this
> <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/blob/27736b2728965188ed73821e797b5ac7007b1093/fit-ergms/ergm-estimation-with-meta-data.R*L257-L296__;Iw!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGU058cV1$> specification.
> I got the best results by setting both the MCMC sample size and the
> MCMC interval to 1e6 (see table below).
>
> MCMC interval:          1e6
> MCMC sample size:       1e6
> Convergence detection:  Hotelling
> Results/outcome:        Closest agreement between simulated and target
>                         statistics
> Note:                   Max. Lik. fit summary and simulation Rout
>                         <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/commit/777bae726d29dae969f06e0d17b40ee59a01a7fc__;!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGffZwKYQ$>
>                         Violin plots
>                         <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/tree/rhel9-setup/fit-ergms/out__;!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGZwugBZJ$>
>                         showing the simulated and target statistics for
>                         each parameter
>
> But this was the closest I could get to producing simulated
> statistics that matched the target statistics. In general, further
> increasing or decreasing either the sample size or the interval did
> not produce a closer result, i.e., this looked to be some optimum in
> the fit parameter space. I can provide further details on the results
> of those fits: some configurations didn't converge, and those that did
> converge had worse goodness-of-fit than what I got by setting the MCMC
> interval and sample size to 1e6. Based on your experience, is this
> expected?
>
> For now, my main question is: are there any suggestions on how I can
> further tune the fitting parameters to match my targets more closely?
> I can provide specific details on the outcomes of those fitting
> processes if that would be helpful.
>
> Thanks for your consideration.
>
> Aditya
>
> On Thu, May 16, 2024 at 2:33 PM Carter T. Butts via statnet_help
> <statnet_help at u.washington.edu> wrote:
>
> Hi, Aditya -
>
> I will defer to the mighty Pavel for the exact best formula to
> reproduce 3.x fits with the latest codebase. (You need to switch
> convergence detection to "Hotelling," and there are some other
> things that must be modified.) However, as a general matter, for
> challenging models where Geyer-Thompson-Hummel has a hard time
> converging (particularly on a large node set), you may find it
> useful to try the stochastic approximation method
> (main="Stochastic" in your control argument will activate it).
> G-T-H can (in principle) have sharper convergence when near the
> solution, but in practice SA fails more gracefully.  I would
> suggest increasing your default MCMC thinning interval
> (MCMC.interval), given your network size; depending on density,
> extent of dependence, and other factors, you may need O(N^2)
> toggles per step.  It is sometimes possible to get away with as
> few as k*N (for some k in, say, the 5-100 range), but if your
> model has substantial dependence and is not exceptionally sparse
> then you will probably need to be in the quadratic regime.  One
> notes that it can sometimes be helpful when getting things set up
> to run "pilot" fits with the default or otherwise smaller thinning
> intervals, so that you can discover if e.g. you have a data issue
> or other problem before you spend the waiting time on a
> high-quality model fit.
>
> To put in the obligatory PSA, both G-T-H and SA are simply
> different strategies for computing the same thing (the MLE, in
> this case), so both are fine - they just have different
> engineering tradeoffs.  So use whichever proves more effective for
> your model and data set.
>
> Hope that helps,
>
> -Carter
>
> On 5/16/24 7:52 AM, Khanna, Aditya via statnet_help wrote:
>> Dear Statnet Dev and User Community:
>>
>> I have an ERGM that I fit previously with ERGM v3.10.4 on a
>> directed network with 32,000 nodes. The model included in- and
>> out-degrees, in addition to other terms. The complete Rout from
>> this fit can be seen here
>> <https://urldefense.com/v3/__https://gist.github.com/khanna7/aefd836baf47463051439c9e72764388__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwORmxHSho$>.
>> I am now trying to reproduce this fit with ergm v4.6, but the
>> model does not converge. (See here
>> <https://urldefense.com/v3/__https://gist.github.com/khanna7/fbabdde53c79504dfeaebd215bb5ee20__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwOW7y31IM$>.)
>>
>> I am looking for ideas on how to troubleshoot this. One
>> suggestion I got was to set values for the "tuning parameters" in
>> v4.6 to their defaults from v3.11.4. But ERGM v4.6 has many more
>> parameters that can be specified, and I am not sure which ones
>> make the most sense to consider.
>>
>> I would be grateful for any suggestions on this or alternate
>> ideas to try.
>>
>> Many thanks,
>> Aditya
>>
>> --
>> Aditya S. Khanna, Ph.D.
>> Assistant Professor
>> Department of Behavioral and Social Sciences
>> Center for Alcohol and Addiction Studies
>> Brown University School of Public Health
>> Pronouns: he/him/his
>> 401-863-6616
>> sph.brown.edu
>> https://vivo.brown.edu/display/akhann16
>>
>> _______________________________________________
>> statnet_help mailing list
>> statnet_help at u.washington.edu
>> https://urldefense.com/v3/__http://mailman13.u.washington.edu/mailman/listinfo/statnet_help__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwObRNh35k$