[statnet_help] Upgrading from ERGM v3.10 to v4.6
Carter T. Butts via statnet_help
statnet_help at u.washington.edu
Sat Sep 7 17:28:45 PDT 2024
Hi, Aditya -
On 9/5/24 12:57 PM, Khanna, Aditya wrote:
>
> Hi Carter,
>
>
> Thank you so much for your helpful response as always. I have
> organized my report in terms of the various things you suggest.
>
>
> Verifying MCMC, GOF and the “second” MCMC: Yes, the ERGM for the model
> described below does converge, but, despite having converged, the
> simulated networks don’t seem to statistically capture the targets. I
> did make the GOF plots
> <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/blob/rhel9-setup/fit-ergms/out/updated-with-oct12-2024-synthpop-ergmv4-6-all-plosone-terms-increase-mcmc-1e6-hotelling.pdf__;!!CzAuKJ42GuquVTTmVmPViYEvSg!LKGctG3KpUbsSQwk8O2E14uWvIjFBhbEa_mHZ5i_MJ9vtXw_UfiYpxvp4p1HYdnN3aFwRdIYqyjPTzdZdnlas3N1$>
> as well. Most of the terms look good, though some have peaks that are
> off from the zero. In general, however, I have come to rely more on
> actually simulating networks from the fitted ERGM object (what I
> think you mean by “second MCMC run”) in addition to the GOF plots.
> Usually I consider my goal fulfilled if the simulated network objects
> capture the targets, even if the GOF plots don’t look perfect.
>
>
There seems to be some confusion here: setting aside exotica, if your
mean in-model statistics don't match your observed in-model statistics,
then your model has not converged. The matching here that matters is
from an MCMC run from the fitted model, which is what the gof() function
does (but this is /not/ what you get from MCMC diagnostics on the fitted
ergm() object, which should show the penultimate run - those are handy
for diagnosing general MCMC issues, but need not show convergence even
when the final coefficients yield a convergent model). The plots you
linked to under "GOF" above seem to be MCMC diagnostics from an ergm()
object, not gof() output. I'll come back to "exotica" below, but first
it is important to be sure that we are discussing the same thing.
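To make the distinction concrete, here is a minimal sketch of the check I mean, illustrated on a small built-in dataset rather than your model and data. The "second MCMC run" is the simulate() call from the fitted object:

```r
# Minimal sketch: assessing convergence via a second MCMC run from the
# fitted model. Uses a small built-in dataset for illustration; your own
# model and data go in its place.
library(ergm)
data(florentine)
fit <- ergm(flomarriage ~ edges + triangle)

# mcmc.diagnostics(fit) describes the penultimate estimation run --
# handy for spotting general MCMC trouble, but NOT the convergence check.

# The check that matters: fresh draws from the *fitted* coefficients.
sims <- simulate(fit, nsim = 200, output = "stats")
cbind(observed  = summary(flomarriage ~ edges + triangle),
      simulated = colMeans(sims))
# gof(fit) performs an analogous simulation-based comparison.
```

If the simulated column does not match the observed column (up to noise), the model has not converged, whatever the stored diagnostics look like.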
> Model Convergence and tightening the MCMC tolerances: In terms of
> tightening the MCMC tolerances, I did increase the MCMC interval to
> 1e9, of the order of O(N^2). But this particular specification timed
> out after 120 hours, and I didn’t try to run it for a longer time than that.
>
>
Unfortunately, there's no "that's longer than I want it to be" card that
convinces mixing to happen more quickly: if your model is really going
to take 1e9 or more updates per step to converge, and if that takes
longer than 120 hours on your hardware, then that's how long it's going
to take. If there were magic sauce that could guarantee great
convergence with minimal computational effort every time, it would
already be in the code. That said, I don't know that all other options
have yet been ruled out; and, when folks encounter expensive models on
large networks, there are various approximate solutions that they may be
willing to live with. But if one is working with a dependence model on
>=1e4 nodes, one must accept that one may be in a regime in which
gold-standard MCMC-MLE is very expensive. Just sayin'.
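For scale, here is the back-of-envelope arithmetic behind those numbers, using the N = 32000 from your setup; the interval is then passed through control.ergm():

```r
# Back-of-envelope cost of the thinning regimes discussed above, for a
# directed network with N = 32000 nodes (the size reported in this thread).
N <- 32000
c(linear_30N = 30 * N,  # ~1e6 toggles per draw: the interval used so far
  quadratic  = N^2)     # ~1e9 toggles per draw: the O(N^2) regime

# Schematic: the quadratic interval would be requested via, e.g.,
# control = control.ergm(MCMC.interval = N^2)
```
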
> Alternate parameters to tighten the MCMC: I have experimented with the
> MCMC sample size and interval parameters, but have not been able to
> improve the quality of the simulated network. I am not as familiar
> with what options are available within the bounds of some reasonable
> computational cost.
>
> In summary, the problem remains that despite the ERGM convergence, the
> quality of the simulated networks suggests room for improvement, since
> the specified targets are not captured within the distribution of the
> simulated networks.
>
>
OK, let me see if I can offer some further advice, based on your email
and also something that came up in your exchange with Pavel:
1. We should be clear that, assuming no-exotica, you should be assessing
convergence from an MCMC run on the fitted model (as produced by gof()
or done manually). So far, the plots I've seen appear not to be runs
from the fitted model, so I have not actually seen evidence of the
alleged phenomenon. Also, to be clear, (absent exotica) if your
simulated mean stats don't match the observed stats (up to numerical and
sometimes statistical noise), your model hasn't converged. A model that
isn't converging is not the same as a model that has converged but that
is inadequate, and the fixes are very different.
2. The exchange with Pavel led me to dig into your code a bit more, and
I realized that you are not fitting to an observed network, but to
target stats presumably based on design estimation. This could put you
into the "exotica" box, because it is likely that - due to errors in
your estimated targets - there exists no ERGM in your specified family
whose expected statistics exactly match the target statistics. So long
as they aren't too far off, you still ought to be able to get close, but
hypothetically one could have a situation where someone gets an
unusually bad estimate for one or a small number of targets, and their
matches are persistently off; in this case, the issue is that the MLE no
longer satisfies the first moment condition (expected statistics do not
match the target statistics), so this is no longer a valid criterion for
assessing convergence. If one is willing/able to make some
distributional statements about one's target estimates, there are some
natural relaxations of the usual convergence criteria, and almost surely
Pavel has written them down, so I defer to him. :-) But anyway, /if/
your model really seems not to be converging (by the criteria of (1)),
and /if/ you are using estimated target stats, then I would certainly
want to investigate the possibility that your model has actually
converged (and that you're just seeing measurement error in your target
stats) before going much further. To write reckless words (that you
should read recklessly), one naive heuristic that could perhaps be worth
trying would be to look at the Z-scores (t_o - t_s)/(s_o^2 + s_s^2)^0.5,
where t_o is the observed (estimated) target, t_s is the simulated mean
statistic, s_o is the standard error of your target estimator, and s_s
is the standard error of the simulation mean. (If you are e.g. using
Horvitz-Thompson, you can approximate s_o using standard results, and
you can likewise use autocorrelation-corrected approximations to s_s.)
If these are not large, then this suggests that the discrepancies
between the targets and the mean stats are not very large compared to
what you would expect from the variation in your simulation outcomes and
in your measurement process. This does not take into account e.g.
correlations among statistics, nor support constraints, but it seems
like a plausible starting point. (Pavel and Martina have been working
with these kinds of problems a lot of late, so doubtless can suggest
better heuristics.)
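A rough sketch of that heuristic in R, to be read as recklessly as the words above. All numbers here are made-up placeholders, not values from the actual fit; coda's effectiveSize() supplies an autocorrelation-corrected effective sample size for the simulation-mean standard error:

```r
# Sketch of the naive Z-score heuristic. All inputs are made-up
# placeholders; substitute your estimated targets, their standard errors,
# and a matrix of simulated statistics from the fitted model, e.g.
# sim_stats <- simulate(fit, nsim = ..., output = "stats").
library(coda)
set.seed(1)

t_o <- c(edges = 25000, mutual = 1200)  # estimated targets (hypothetical)
s_o <- c(edges =   300, mutual =   60)  # SEs of the target estimators (hypothetical)
sim_stats <- cbind(edges  = rnorm(500, 24800, 350),  # stand-in for real
                   mutual = rnorm(500,  1150,  70))  # simulated statistics

t_s   <- colMeans(sim_stats)
n_eff <- effectiveSize(as.mcmc(sim_stats))        # autocorrelation-corrected ESS
s_s   <- apply(sim_stats, 2, sd) / sqrt(n_eff)    # SE of the simulation mean

Z <- (t_o - t_s) / sqrt(s_o^2 + s_s^2)
round(Z, 2)  # small |Z| => discrepancies within expected noise
```

Again, this ignores correlations among statistics and support constraints; it is a screening device, not a test.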
3. Pavel's comments pointed to SAN, which also led me to observe that
you are starting by fitting to an empty graph. I recommend against
that. In principle, the annealer should get you to a not-too-bad
starting point, but in my own informal simulation tests I have observed
that this doesn't always work well if the network is very large; in
particular, if SAN dumps you out with a starting point that is far from
equilibrium, you are wasting a lot of MCMC steps wandering towards the
high-density region of the graph space, and this can sometimes lead to
poor results (especially if you can't afford to run some (large k)*N^2
burn-in - and recall that the default MCMC algorithm tends to preserve
density, so if the seed is poor in that regard, it can take a lot of
iterations to fix). My suggestion is to use rgraph() to get a Bernoulli
graph draw from a model whose mixing characteristics (and, above all,
density) approximate the target, and start with that. An easy way to
set the parameters is to fit a pilot ERGM using only independence terms,
use these to construct a tie probability matrix, and pass that to the tprob
argument of rgraph(). Your case makes for a very large matrix, but it's
still within the range of the feasible. (rgraph() does not use
adjacency matrices internally, and so long as you set the return value
to be an edgelist is not constrained by the sizes of feasible adjacency
matrices, but if you want to pass an explicit tie probability matrix
then obviously that puts you in the adjacency matrix regime.) Anyway,
it's better to use rgraph() for this than a simulate() call, because it
will be both faster and an exact simulation (no MCMC). A poorer
approach is not to bother with mixing structure, and just to draw an
initial state with the right density (which at least reduces the risk
that SAN exits with a graph that is too sparse)...but you might as well
put your starting point as close to the right neighborhood as you can.
The goal here is to help the annealer get you to a high-potential graph,
rather than expecting it to carry you there from a remote location. It
is possible that this turns out not to be a problem in your particular
case, but it seems worth ruling out.
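In code, a minimal version of that seeding strategy might look like the following. Sizes and probabilities are placeholders; in practice the tie-probability matrix would come from a pilot ERGM fit with independence terms only:

```r
# Sketch: seed estimation with an exact Bernoulli draw from sna::rgraph()
# rather than an empty graph. The size n and the uniform probability are
# placeholders; a real tie-probability matrix would be built from the
# fitted dyadwise probabilities of a pilot independence-term ERGM.
library(sna)
library(network)

n <- 200                        # stand-in for the real network size
tie_prob <- matrix(0.02, n, n)  # fitted dyadwise tie probabilities go here
diag(tie_prob) <- 0             # no self-ties

seed_adj <- rgraph(n, tprob = tie_prob, mode = "digraph")  # exact draw, no MCMC
seed_net <- network(seed_adj, directed = TRUE)

# For very large n, request edgelist output (return.as.edgelist = TRUE) to
# avoid building the drawn graph as an adjacency matrix -- though an
# explicit tprob matrix still puts you in the adjacency-matrix regime.
```

The seed network then serves as the starting state for the main fit in place of the empty graph.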
Hope that helps,
-Carter
> Aditya
>
>
> On Fri, Aug 30, 2024 at 4:37 AM Carter T. Butts via statnet_help
> <statnet_help at u.washington.edu> wrote:
>
> Hi, Aditya -
>
> I'll be interested in Pavel's take on the convergence issues, but
> just to verify, you are assessing convergence based on a /second/
> MCMC run, correct? The MCMC statistics in the ergm object are
> from the penultimate iteration, and may thus be out of equilibrium
> (but this does /not/ necessarily mean that the /model/ did not
> converge). However, if you simulate a new set of draws from the
> fitted model and the mean stats do not match, /then/ you have an
> issue. (This is why we now point folks to gof() for that
> purpose.) It looks like your plots are from the ergm object and
> not from a gof() run (or other secondary simulation), so I want to
> verify that first.
>
> I also note that, at a quick glance, the plots from your more
> exhaustive simulation case don't seem all that far off, which
> could indicate either that the model did converge (and per above,
> we're not looking at a draw from the final model), or that it
> converged within the tolerances that were set, and you may need to
> tighten them. But best to first know if there's a problem in the
> first place.
>
> Another observation is that, per my earlier email, you may need
> O(N^2) toggles per draw to get good performance if your model has
> a nontrivial level of dependence. You are using a thinning
> interval of 1e6, which is in your case around 30*N. It's possible
> that you've got too much dependence for that: O(N^2) here would
> mean some multiple of about 1e9, which is about a thousand times
> greater than what you're using. Really large, sparse networks
> sometimes /can/ be modeled well without that much thinning, but
> it's not a given. Relatedly, your trace plots from the 1e6 run
> suggest a fair amount of autocorrelation on some statistics, which
> suggests a lack of efficiency. (Autocorrelation by itself isn't
> necessarily a problem, but it means that your effective MCMC
> sample size is smaller than it seems, and this can reduce the
> effectiveness of the MCMC-MLE procedure. The ones from the 1e6
> run aren't bad enough that I would be alarmed, but if I were
> looking for things to tighten up and knew this could be a problem,
> they suggest possible room for improvement.) So anyway, I
> wouldn't crank this up until verifying that it's needed, but you
> are still operating on the low end of computational effort
> (whether it seems like it or not!).
>
> Finally, I would note that for the stochastic approximation
> method, convergence is to some degree (and it's a bit complex)
> determined by how many subphases are run, and how many iterations
> are used per subphase. This algorithm is due to Tom in his
> classic JoSS paper (but without the complement moves), which is
> still a good place to look for details. It is less fancy than
> some more modern algorithms of its type, but is extremely hard to
> beat (I've tried and failed more than once!). In any event, there
> are several things that can tighten that algorithm relative to its
> defaults, including increasing thinning, increasing the iterations
> per subphase, and increasing the number of subphases. Some of
> these sharply increase computational cost, because e.g. the number
> of actual subphase iterations doubles (IIRC) at each subphase - so
> sometimes one benefits by increasing the phase number but greatly
> reducing the base number of iterations per phase. The learning
> rate ("SA.initial.gain") can also matter, although I would
> probably avoid messing with it if the model is well-behaved (as
> here). I will say that, except under exotic conditions in which I
> am performing Unspeakable ERGM Experiments (TM) of which we tell
> neither children nor grad students, I do not recall ever needing
> to do much with the base parameters - adjusting thinning, as needs
> must, has almost always done the trick. Still, if other measures
> fail, tinkering with these settings can/will certainly affect
> convergence.
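For reference, the knobs discussed above live in control.ergm(); a hedged sketch follows, with values purely illustrative and argument names to be checked against ?control.ergm for the installed ergm version, especially the SA.* subphase parameters:

```r
# Hedged sketch of the control settings discussed above. Values are
# illustrative; consult ?control.ergm to confirm names and defaults in
# your installed ergm version before relying on any of them.
library(ergm)
ctrl <- control.ergm(
  main.method     = "Stochastic-Approximation",  # SA instead of default MCMLE
  MCMC.interval   = 1e6,                         # thinning between draws
  MCMC.samplesize = 1e6                          # draws per iteration
)
# Subphase counts, iterations per subphase, and the learning rate are set
# through the SA.* family of control.ergm arguments.
# fit <- ergm(net ~ edges + ..., control = ctrl)   # schematic; not run
```
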
>
> I'd check on those things first, and then see if you still have a
> problem....
>
> Hope that helps,
>
> -Carter
>
> On 8/29/24 12:13 PM, Khanna, Aditya wrote:
>>
>> Hi Carter and All,
>>
>>
>> Thank you so much for the helpful guidance here. I think
>> following your suggestions has brought us very close to
>> reproducing the target statistics in the simulated networks, but
>> there are still some gaps.
>>
>>
>> Our full previous exchange is below, but to summarize: I have an
>> ERGM that I fit previously with ERGM v3.10.4 on a directed
>> network with 32,000 nodes. The model consisted of in- and
>> out-degrees in addition to other terms, including a custom
>> distance term. In trying to reproduce this fit with ergm v4.6,
>> the model did not initially converge.
>>
>>
>> Your suggestion to try setting the main.method = “Stochastic
>> Approximation” considerably improved the fitting. Specifying the
>> convergence detection to “Hotelling” on top of that brought us
>> almost to simulated networks that capture all the mean
>> statistics. (Following an old discussion thread
>> <https://urldefense.com/v3/__https://github.com/statnet/ergm/issues/346__;!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGb4BZQZz$> on
>> the statnet github, I also tried setting the termination criteria
>> to Hummel and MCMLE.effectiveSize = NULL. I think, for me, in
>> practice, Hotelling worked a bit better than Hummel though).
>>
>>
>> In general, I tried fitting the model with variants of this
>> specification
>> <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/blob/27736b2728965188ed73821e797b5ac7007b1093/fit-ergms/ergm-estimation-with-meta-data.R*L257-L296__;Iw!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGU058cV1$>.
>> I got the best results with setting both MCMC samplesize=1e6 and
>> interval = 1e6 (see table below).
>>
>>
>> MCMC interval: 1e6
>> MCMC sample size: 1e6
>> Convergence Detection: Hotelling
>> Results/Outcome: Closest agreement between simulated and target statistics
>> Note: Max. Lik. fit summary and simulation Rout
>> <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/commit/777bae726d29dae969f06e0d17b40ee59a01a7fc__;!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGffZwKYQ$>;
>> Violin plots
>> <https://urldefense.com/v3/__https://github.com/hepcep/net-ergm-v4plus/tree/rhel9-setup/fit-ergms/out__;!!CzAuKJ42GuquVTTmVmPViYEvSg!K2TPlppMmLqkp0AMD_Fj8vqeLS9FI7fJTx0UxQMy3qt_I1hNTinFaFOclmyytc3bbAXyEkzYTJhdrHccGZwugBZJ$>
>> showing the simulated and target statistics for each parameter
>>
>> But, I found that this was the closest I could get to producing
>> simulated statistics that matched the target statistics. In
>> general, any further increasing or decreasing of either the
>> samplesize or interval did not help generate a closer result,
>> i.e., this looked to be some optimum in the fit parameter space.
>> I can provide further details on the results of those fits, which
>> for some configurations didn’t converge, and if they did
>> converge, the goodness-of-fit was worse than what I had with
>> setting the MCMC interval and samplesize to 1e6. Based on your
>> experiences, I was wondering if this is expected?
>>
>>
>> For now, my main question is, are there any suggestions on how I
>> can further tune the fitting parameters to match my targets more
>> closely? I can provide specific details on the outcomes of those
>> fitting processes if that would be helpful.
>>
>>
>> Thanks for your consideration.
>>
>> Aditya
>>
>> On Thu, May 16, 2024 at 2:33 PM Carter T. Butts via statnet_help
>> <statnet_help at u.washington.edu> wrote:
>>
>> Hi, Aditya -
>>
>> I will defer to the mighty Pavel for the exact best formula
>> to reproduce 3.x fits with the latest codebase. (You need to
>> switch convergence detection to "Hotelling," and there are
>> some other things that must be modified.) However, as a
>> general matter, for challenging models where
>> Geyer-Thompson-Hummel has a hard time converging
>> (particularly on a large node set), you may find it useful to
>> try the stochastic approximation method (main="Stochastic" in
>> your control argument will activate it). G-T-H can (in
>> principle) have sharper convergence when near the solution,
>> but in practice SA fails more gracefully. I would suggest
>> increasing your default MCMC thinning interval
>> (MCMC.interval), given your network size; depending on
>> density, extent of dependence, and other factors, you may
>> need O(N^2) toggles per step. It is sometimes possible to
>> get away with as few as k*N (for some k in, say, the 5-100
>> range), but if your model has substantial dependence and is
>> not exceptionally sparse then you will probably need to be in
>> the quadratic regime. One notes that it can sometimes be
>> helpful when getting things set up to run "pilot" fits with
>> the default or otherwise smaller thinning intervals, so that
>> you can discover if e.g. you have a data issue or other
>> problem before you spend the waiting time on a high-quality
>> model fit.
>>
>> To put in the obligatory PSA, both G-T-H and SA are simply
>> different strategies for computing the same thing (the MLE,
>> in this case), so both are fine - they just have different
>> engineering tradeoffs. So use whichever proves more
>> effective for your model and data set.
>>
>> Hope that helps,
>>
>> -Carter
>>
>>
>> On 5/16/24 7:52 AM, Khanna, Aditya via statnet_help wrote:
>>> Dear Statnet Dev and User Community:
>>>
>>> I have an ERGM that I fit previously with ERGM v3.10.4 on a
>>> directed network with 32,000 nodes. The model included in-
>>> and out-degrees, in addition to other terms. The complete
>>> Rout from this fit can be seen here
>>> <https://urldefense.com/v3/__https://gist.github.com/khanna7/aefd836baf47463051439c9e72764388__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwORmxHSho$>.
>>> I am now trying to reproduce this fit with ergm v4.6, but
>>> the model does not converge. (See here
>>> <https://urldefense.com/v3/__https://gist.github.com/khanna7/fbabdde53c79504dfeaebd215bb5ee20__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwOW7y31IM$>.)
>>>
>>> I am looking for ideas on how to troubleshoot this. One
>>> suggestion I got was to set values for the "tuning
>>> parameters" in v4.6 to their defaults from v3.11.4. But
>>> ERGM v4.6 has a lot more parameters that can be specified,
>>> and I am not sure which ones make most sense to consider.
>>>
>>> I would be grateful for any suggestions on this or alternate
>>> ideas to try.
>>>
>>> Many thanks,
>>> Aditya
>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Aditya S. Khanna, Ph.D.
>>>
>>> Assistant Professor
>>>
>>> Department of Behavioral and Social Sciences
>>>
>>> Center for Alcohol and Addiction Studies
>>>
>>> Brown University School of Public Health
>>>
>>> Pronouns: he/him/his
>>>
>>>
>>> 401-863-6616
>>>
>>> sph.brown.edu
>>> <https://urldefense.com/v3/__https://sph.brown.edu/__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwOWf8YDMv$>
>>>
>>> https://vivo.brown.edu/display/akhann16
>>> <https://urldefense.com/v3/__https://vivo.brown.edu/display/akhann16__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwOWy55iTf$>
>>>
>>>
>>> _______________________________________________
>>> statnet_help mailing list
>>> statnet_help at u.washington.edu
>>> https://urldefense.com/v3/__http://mailman13.u.washington.edu/mailman/listinfo/statnet_help__;!!CzAuKJ42GuquVTTmVmPViYEvSg!KsbhvmLlx8TkLK7y2NKz59hK4-4H7KXVV7dEyUG4vcQzi4Mh7nO-9HupA7_ep2V2p9KkD_i00tcg6nDqczDwObRNh35k$