[statnet_help] Upgrading from ERGM v3.10 to v4.6

Khanna, Aditya via statnet_help statnet_help at u.washington.edu
Thu Sep 5 12:57:20 PDT 2024


Hi Carter,

Thank you so much for your helpful response, as always. I have organized my
report around the points you suggested.

Verifying MCMC, GOF, and the “second” MCMC: Yes, the ERGM for the model
described below does converge, but, despite having converged, the simulated
networks don’t seem to statistically capture the targets. I did make the GOF plots
<https://github.com/hepcep/net-ergm-v4plus/blob/rhel9-setup/fit-ergms/out/updated-with-oct12-2024-synthpop-ergmv4-6-all-plosone-terms-increase-mcmc-1e6-hotelling.pdf>
as well. Most of the terms look good, though some have peaks that are shifted
away from zero. In general, however, I have come to rely more on actually
simulating networks from the fitted ERGM object (what I think you mean by the
“second MCMC run”) in addition to the GOF plots. Usually I consider my goal
fulfilled if the simulated networks capture the targets, even if the GOF plots
don’t look perfect.
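
For reference, the check I run is roughly the following (a minimal sketch,
assuming the fitted object is called fit and target.stats is the vector of
targets passed to ergm()):

    library(ergm)
    # Simulate fresh networks from the fitted model and keep only their statistics
    sim_stats <- simulate(fit, nsim = 100, output = "stats")
    # Compare simulated means against the targets supplied at estimation time
    round(cbind(target = target.stats, simulated = colMeans(sim_stats)), 2)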

Model convergence and tightening the MCMC tolerances: In terms of
tightening the MCMC tolerances, I did increase the MCMC interval to 1e9, on
the order of O(N^2). But this particular specification timed out after 120
hours, and I didn’t try to run it for longer than that.

Alternate parameters to tighten the MCMC: I have experimented with the MCMC
sample size and interval parameters, but have not been able to improve the
quality of the simulated networks. I am not as familiar with what other options
are available within the bounds of a reasonable computational cost.
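
For reference, these are the two knobs I have been varying (a minimal sketch of
the control object; it is passed to ergm() through its control argument):

    library(ergm)
    ctrl <- control.ergm(
      MCMC.interval   = 1e6,  # proposals between retained draws; 1e9 (~O(N^2)) timed out at 120 hours
      MCMC.samplesize = 1e6   # number of retained draws
    )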


In summary, the problem remains that, despite ERGM convergence, the quality of
the simulated networks suggests room for improvement: the specified targets are
not captured within the distribution of statistics from the simulated networks.

Aditya

On Fri, Aug 30, 2024 at 4:37 AM Carter T. Butts via statnet_help <
statnet_help at u.washington.edu> wrote:


> Hi, Aditya -
>
> I'll be interested in Pavel's take on the convergence issues, but just to
> verify, you are assessing convergence based on a *second* MCMC run, correct?
> The MCMC statistics in the ergm object are from the penultimate iteration, and
> may thus be out of equilibrium (but this does *not* necessarily mean that the
> *model* did not converge). However, if you simulate a new set of draws from
> the fitted model and the mean stats do not match, *then* you have an issue.
> (This is why we now point folks to gof() for that purpose.) It looks like your
> plots are from the ergm object and not from a gof() run (or other secondary
> simulation), so I want to verify that first.
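>
> In code, that secondary check is just something like (assuming the fitted
> object is called fit):
>
>     library(ergm)
>     fit_gof <- gof(fit)   # simulates new networks from the fitted model
>     plot(fit_gof)         # compare simulated distributions to the observed statistics
>     print(fit_gof)        # tabular summary of the same comparison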

>
> I also note that, at a quick glance, the plots from your more exhaustive
> simulation case don't seem all that far off, which could indicate either that
> the model did converge (and per above, we're not looking at a draw from the
> final model), or that it converged within the tolerances that were set, and
> you may need to tighten them. But best to first know if there's a problem in
> the first place.

>
> Another observation is that, per my earlier email, you may need O(N^2) toggles
> per draw to get good performance if your model has a nontrivial level of
> dependence. You are using a thinning interval of 1e6, which in your case is
> around 30*N. It's possible that you've got too much dependence for that:
> O(N^2) here would mean some multiple of about 1e9, which is about a thousand
> times greater than what you're using. Really large, sparse networks sometimes
> *can* be modeled well without that much thinning, but it's not a given.
> Relatedly, your trace plots from the 1e6 run suggest a fair amount of
> autocorrelation on some statistics, which suggests a lack of efficiency.
> (Autocorrelation by itself isn't necessarily a problem, but it means that your
> effective MCMC sample size is smaller than it seems, and this can reduce the
> effectiveness of the MCMC MLE procedure. The ones from the 1e6 run aren't bad
> enough that I would be alarmed, but if I were looking for things to tighten up
> and knew this could be a problem, they suggest possible room for improvement.)
> So anyway, I wouldn't crank this up until verifying that it's needed, but you
> are still operating on the low end of computational effort (whether it seems
> like it or not!).
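>
> For the record, the arithmetic for a 32,000-node directed network:
>
>     N <- 32000
>     N^2       # ~1.02e9 toggles per draw in the quadratic regime
>     1e6 / N   # ~31, i.e. the current interval is roughly 30*N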

>
> Finally, I would note that for the stochastic approximation method,
> convergence is to some degree (and it's a bit complex) determined by how many
> subphases are run, and how many iterations are used per subphase. This
> algorithm is due to Tom in his classic JoSS paper (but without the complement
> moves), which is still a good place to look for details. It is less fancy than
> some more modern algorithms of its type, but is extremely hard to beat (I've
> tried and failed more than once!). In any event, there are several things that
> can tighten that algorithm relative to its defaults, including increasing
> thinning, increasing the iterations per subphase, and increasing the number of
> subphases. Some of these sharply increase computational cost, because e.g. the
> number of actual subphase iterations doubles (IIRC) at each subphase - so
> sometimes one benefits by increasing the phase number but greatly reducing the
> base number of iterations per phase. The learning rate ("SA.initial.gain") can
> also matter, although I would probably avoid messing with it if the model is
> well-behaved (as here). I will say that, except under exotic conditions in
> which I am performing Unspeakable ERGM Experiments (TM) of which we tell
> neither children nor grad students, I do not recall ever needing to do much
> with the base parameters - adjusting thinning, as needs must, has almost
> always done the trick. Still, if other measures fail, tinkering with these
> settings can/will certainly affect convergence.
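>
> In control.ergm() terms, those knobs look roughly like this (a sketch; the
> SA.* argument names below are from memory, so check ?control.ergm for the
> exact names and defaults in your ergm version):
>
>     library(ergm)
>     ctrl_sa <- control.ergm(
>       main.method     = "Stochastic-Approximation",
>       MCMC.interval   = 1e6,   # thinning: usually the first thing to increase
>       SA.nsubphases   = 4,     # number of subphases (illustrative value)
>       SA.niterations  = 50,    # base iterations per subphase (illustrative value)
>       SA.initial.gain = 0.1    # learning rate; usually best left at its default
>     )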

>
> I'd check on those things first, and then see if you still have a problem....
>
> Hope that helps,
>
> -Carter

> On 8/29/24 12:13 PM, Khanna, Aditya wrote:

>
> Hi Carter and All,
>
> Thank you so much for the helpful guidance here. I think following your
> suggestions has brought us very close to reproducing the target statistics in
> the simulated networks, but there are still some gaps.
>
> Our full previous exchange is below, but to summarize: I have an ERGM that I
> fit previously with ERGM v3.10.4 on a directed network with 32,000 nodes. The
> model consisted of in- and out-degrees in addition to other terms, including a
> custom distance term. In trying to reproduce this fit with ergm v4.6, the
> model did not initially converge.

>
> Your suggestion to try setting main.method = “Stochastic Approximation”
> considerably improved the fitting. Specifying the convergence detection as
> “Hotelling” on top of that brought us nearly to simulated networks that
> capture all the mean statistics. (Following an old discussion thread
> <https://github.com/statnet/ergm/issues/346> on the statnet GitHub, I also
> tried setting the termination criterion to Hummel and MCMLE.effectiveSize =
> NULL. In practice, though, Hotelling worked a bit better for me than Hummel.)

>
> In general, I tried fitting the model with variants of this specification
> <https://github.com/hepcep/net-ergm-v4plus/blob/27736b2728965188ed73821e797b5ac7007b1093/fit-ergms/ergm-estimation-with-meta-data.R#L257-L296>.
> I got the best results by setting both the MCMC samplesize and interval to 1e6
> (see the table below; a sketch of the corresponding control settings follows
> it).

>
> MCMC interval: 1e6
> MCMC sample size: 1e6
> Convergence detection: Hotelling
> Results/outcome: closest agreement between simulated and target statistics
> Notes: max. lik. fit summary and simulation Rout
> <https://github.com/hepcep/net-ergm-v4plus/commit/777bae726d29dae969f06e0d17b40ee59a01a7fc>;
> violin plots
> <https://github.com/hepcep/net-ergm-v4plus/tree/rhel9-setup/fit-ergms/out>
> showing the simulated and target statistics for each parameter
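>
> Roughly, the control settings for that best run were as follows (a sketch; I
> am assuming the “Hotelling” convergence detection is set via the
> MCMLE.termination option, so please check ?control.ergm):
>
>     library(ergm)
>     ctrl_best <- control.ergm(
>       main.method       = "Stochastic-Approximation",
>       MCMC.interval     = 1e6,
>       MCMC.samplesize   = 1e6,
>       MCMLE.termination = "Hotelling"  # assumed argument for the "Hotelling" criterion
>     )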

>
> But I found that this was the closest I could get to producing simulated
> statistics that matched the target statistics. In general, further increasing
> or decreasing either the samplesize or the interval did not generate a closer
> result, i.e., this looked to be an optimum in the fit parameter space. I can
> provide further details on those fits: some configurations didn’t converge,
> and those that did converge had worse goodness-of-fit than the run with both
> the MCMC interval and samplesize set to 1e6. Based on your experience, I was
> wondering whether this is expected?

>
> For now, my main question is: are there any suggestions on how I can further
> tune the fitting parameters to match my targets more closely? I can provide
> specific details on the outcomes of those fitting processes if that would be
> helpful.

>
> Thanks for your consideration.
>
> Aditya

>
> On Thu, May 16, 2024 at 2:33 PM Carter T. Butts via statnet_help
> <statnet_help at u.washington.edu> wrote:
>

>> Hi, Aditya -
>>
>> I will defer to the mighty Pavel for the exact best formula to reproduce 3.x
>> fits with the latest codebase. (You need to switch convergence detection to
>> "Hotelling," and there are some other things that must be modified.) However,
>> as a general matter, for challenging models where Geyer-Thompson-Hummel has a
>> hard time converging (particularly on a large node set), you may find it
>> useful to try the stochastic approximation method (main="Stochastic" in your
>> control argument will activate it). G-T-H can (in principle) have sharper
>> convergence when near the solution, but in practice SA fails more gracefully.
>> I would suggest increasing your default MCMC thinning interval
>> (MCMC.interval), given your network size; depending on density, extent of
>> dependence, and other factors, you may need O(N^2) toggles per step. It is
>> sometimes possible to get away with as few as k*N (for some k in, say, the
>> 5-100 range), but if your model has substantial dependence and is not
>> exceptionally sparse then you will probably need to be in the quadratic
>> regime. One notes that it can sometimes be helpful, when getting things set
>> up, to run "pilot" fits with the default or otherwise smaller thinning
>> intervals, so that you can discover if e.g. you have a data issue or other
>> problem before you spend the waiting time on a high-quality model fit.
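>>
>> As a sketch of that pilot-then-full workflow (illustrative only: a small toy
>> network and a toy edges + mutual model stand in for the real data and terms):
>>
>>     library(ergm)
>>     set.seed(1)
>>     # Toy directed network in place of the real 32,000-node data
>>     net <- network(100, directed = TRUE, density = 0.02)
>>     # Pilot fit with default settings, to catch data or specification problems cheaply
>>     pilot <- ergm(net ~ edges + mutual)
>>     # Full fit with heavier thinning, toward the quadratic regime
>>     full <- ergm(net ~ edges + mutual,
>>                  control = control.ergm(MCMC.interval = network.size(net)^2))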

>>
>> To put in the obligatory PSA, both G-T-H and SA are simply different
>> strategies for computing the same thing (the MLE, in this case), so both are
>> fine - they just have different engineering tradeoffs. So use whichever
>> proves more effective for your model and data set.
>>
>> Hope that helps,
>>
>> -Carter
>>

>> On 5/16/24 7:52 AM, Khanna, Aditya via statnet_help wrote:

>>
>> Dear Statnet Dev and User Community:
>>
>> I have an ERGM that I fit previously with ERGM v3.10.4 on a directed network
>> with 32,000 nodes. The model included in- and out-degrees, in addition to
>> other terms. The complete Rout from this fit can be seen here
>> <https://gist.github.com/khanna7/aefd836baf47463051439c9e72764388>. I am now
>> trying to reproduce this fit with ergm v4.6, but the model does not converge.
>> (See here <https://gist.github.com/khanna7/fbabdde53c79504dfeaebd215bb5ee20>.)
>>
>> I am looking for ideas on how to troubleshoot this. One suggestion I got was
>> to set values for the "tuning parameters" in v4.6 to their defaults from
>> v3.11.4. But ERGM v4.6 has many more parameters that can be specified, and I
>> am not sure which ones make the most sense to consider.
>>
>> I would be grateful for any suggestions on this or alternate ideas to try.
>>
>> Many thanks,
>> Aditya

>>
>> --

>> Aditya S. Khanna, Ph.D.
>> Assistant Professor
>> Department of Behavioral and Social Sciences
>> Center for Alcohol and Addiction Studies
>> Brown University School of Public Health
>> Pronouns: he/him/his
>> 401-863-6616
>> sph.brown.edu <https://sph.brown.edu/>
>> https://vivo.brown.edu/display/akhann16


>> _______________________________________________
>> statnet_help mailing list
>> statnet_help at u.washington.edu
>> http://mailman13.u.washington.edu/mailman/listinfo/statnet_help

> _______________________________________________
> statnet_help mailing list
> statnet_help at u.washington.edu
> http://mailman13.u.washington.edu/mailman/listinfo/statnet_help


