First, thanks for the presentation. I want to say up front that I am always a big fan of wrappers that enhance someone’s ability to use the tech of their choosing - so for that, great!
For the sake of stimulating discussion and thinking about how the next generation of tools will shape productivity in our field, I think this is a reasonable opportunity to pick at some of the current implementation details and the expectations for packages such as this going forward, especially the interaction with PsN.
I would be happy to hear from others, but I feel like trying to run NONMEM through R, especially inside something like Rmarkdown, runs into the “when you have a hammer, everything looks like a nail” problem. I love Rmarkdown and use it for literally every interim analysis, full technical report, toy project, and many things in between. But what it succeeds best at is providing a textual representation of the actions you’d like to achieve (the R code) and the tangible results of those actions, intertwined with richer documentation capabilities. This is invaluable for iterative or dynamic workflows, where it can otherwise become difficult to reproduce how a result was achieved.
Maybe I’m missing something here, but that value proposition (I can re-run outputs and regenerate results as I tweak things) is blunted by the long run times of most non-trivial NONMEM runs. E.g., when we think about the layout of an analysis flow for a given run:
pre-processing data/control stream --> run model --> tweak because compile issue --> run model again --> post-process results
this, compounded across many runs, does not play to Rmarkdown’s strengths. What happens when you only want to re-run downstream sections? Do you create one Rmd per run? If not, are you expected to re-run every run each time you change the settings of one? What happens when you hand the document to someone else to run? The idea of locking my R session while waiting for a run to complete, backgrounding the job, or setting the chunk to eval=FALSE so it doesn’t get re-run on subsequent invocations has always felt very awkward.
I would love to hear how people who use such a workflow are mitigating these issues. I personally have ‘resorted’ to a number of isolation and caching techniques to minimize unnecessary re-runs, even for R-based simulations - and NONMEM estimation/bootstrap/SSE problems dwarf the run times of most R simulations. One such caching pattern is sketched below.
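To be concrete about what I mean by caching, here is a minimal sketch (every name in it - `run_cached`, `run_fn` - is hypothetical, not from any existing package): key a run on the content of its inputs and skip the slow part when nothing has changed.

```r
# Hypothetical sketch: only (re)launch a slow run when its inputs change.
# run_fn stands in for whatever actually executes the model (PsN, a queue
# submission, ...). None of these names come from an existing package.
library(digest)

run_cached <- function(ctl_file, data_file, run_fn, cache_dir = ".run_cache") {
  dir.create(cache_dir, showWarnings = FALSE)
  # Key the run on the control stream text and the dataset contents
  key <- digest::digest(list(
    ctl  = readLines(ctl_file),
    data = digest::digest(data_file, file = TRUE)
  ))
  cache_file <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(cache_file)) {
    message("Inputs unchanged; returning cached results for ", ctl_file)
    return(readRDS(cache_file))
  }
  result <- run_fn(ctl_file)  # the slow part: estimation, bootstrap, SSE, ...
  saveRDS(result, cache_file)
  result
}
```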
Furthermore, reproducibility is ‘only’ as good as one’s ability to control and version the overall environment. Are people just checking in every run artifact? How are configuration changes addressed (e.g., working on both Windows and Linux systems for different parts of a project)? Likewise, using a lower clean level means a substantial amount of unnecessary data duplication (e.g., copies of data/output tables in each run subfolder, on top of the top-level copies).
Admittedly, I do not have a direct pulse on all of the community’s pain points, but maybe this is the time to start discussing what a ‘modern’ analysis workflow looks like, and what tooling gaps need to be filled to make it easier to succeed.
@MikeKSmith is there a location where such user feedback and design discussions are curated?
For example, off the top of my head, things I’d find value in include:
- managing data that is referenced across many Rmd files (say, one file per run, each needing the original data)
- better representation of, and more control over, run records, so all models/settings across a project’s runs are visible
- more customized globbing/file-matching control for selectively cleaning run folders, which PsN clean levels are not ideally suited to (a rough sketch follows this list)
- easily generating ‘publication-ready’ tables for specific run combinations (base model --> cov model x/y/z --> final model)
- scaffolding out common models/parameter structures and linking them to the data, so we stop manually generating $INPUT (also sketched below)
- better integration with version control systems (common gitignore settings, history pruning, tying all outputs from a run to a specific commit message)
- especially for larger organizations/teams, tooling for managing project structure, directories, and data access/control
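On the cleaning point, here is the shape of what I mean by glob-based control (purely illustrative - `clean_run_dir` is a name I made up, not a proposed PsN interface, and the default patterns are just examples):

```r
# Hypothetical sketch: selectively clean a run directory by glob patterns
# instead of the fixed tiers offered by -clean levels. Purely illustrative.
clean_run_dir <- function(run_dir,
                          drop = c("*.tmp", "fort.*"),
                          keep = c("*.lst", "*.ext", "*.mod")) {
  match_globs <- function(globs) {
    unlist(lapply(globs, function(g)
      list.files(run_dir, pattern = utils::glob2rx(g), full.names = TRUE)))
  }
  victims <- setdiff(match_globs(drop), match_globs(keep))
  unlink(victims)
  invisible(victims)  # return what was removed, for logging/audit
}
```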
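And the core of the $INPUT scaffolding idea is tiny (again hypothetical - `make_input_record` is an invented name):

```r
# Hypothetical sketch: derive a $INPUT record from the dataset itself, so
# the column list never drifts out of sync with the data file.
make_input_record <- function(data_file, drop = character()) {
  cols <- toupper(names(utils::read.csv(data_file, nrows = 1)))
  cols <- ifelse(cols %in% toupper(drop), paste0(cols, "=DROP"), cols)
  paste("$INPUT", paste(cols, collapse = " "))
}
# For a dataset whose columns were ID, TIME, DV, AMT, COMMENT:
# make_input_record("run001.csv", drop = "COMMENT")
# would yield "$INPUT ID TIME DV AMT COMMENT=DROP"
```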
Whereas, personally, I would never use the direct submission scripts like bootstrap_PsN (for reasons such as those outlined above), and I often want to submit into some queue system (SGE) anyway. That said, having an API that makes it easy to access/consume PsN’s multitude of customizations and generate the command string (`bootstrap run001.mod -clean=2 .....`), leaving me to do what I will with the resulting string, opens up so many more avenues: direct submission, submitting to a queue, submitting to an API endpoint, etc.
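To make that concrete, here is a rough sketch of the kind of string-building layer I have in mind (`psn_cmd` is a name I just made up - it is not part of PsN or any existing wrapper, and the flag handling is only illustrative):

```r
# Hypothetical sketch: assemble a PsN command string and leave dispatch
# entirely to the caller. psn_cmd() is an invented name, not a real API.
psn_cmd <- function(tool, model, ...) {
  opts <- list(...)
  flags <- vapply(names(opts), function(nm) {
    val <- opts[[nm]]
    if (isTRUE(val)) paste0("-", nm) else paste0("-", nm, "=", val)
  }, character(1))
  paste(c(tool, model, flags), collapse = " ")
}

cmd <- psn_cmd("bootstrap", "run001.mod", clean = 2, samples = 1000)
cmd
#> "bootstrap run001.mod -clean=2 -samples=1000"

# ...then dispatch however the local setup demands:
system(cmd)                                    # run locally
system(paste("qsub -cwd -b y", shQuote(cmd)))  # or hand off to SGE
```

The point being: the wrapper’s value would lie in knowing PsN’s options, not in owning the submission.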
I don’t want to belabor the points further, or take away from the efforts Mike et al. have made so far. I just hope the future of these efforts is not “we’ve now wrapped features x/y/z for the sake of being able to”, but rather is driven by open dialog built around community feedback: people are most frustrated by 1/2/3, and we’ve taken a crack at solving those pain points with features a/b/c.