
NONMEM and PsN workflows in R using rspeaksnonmem


#1

https://github.com/isop-phmx/studyGroup/issues/35

Event Proposal:

  • General Topic: NONMEM and PsN workflows in R using rspeaksnonmem
  • Potential Presenter(s): MikeKSmith
  • Potential Date(s): April 2017
  • Background Resources for the Audience:
    An R package called “rspeaksnonmem” is in development that allows the user to build workflows running NONMEM and PsN functions from within R, and to update NONMEM control stream components and write them out for execution. It facilitates writing reproducible workflows based on NONMEM and PsN functionality. rspeaksnonmem relies on the RNMImport package to read and parse the control stream and to read in NONMEM outputs. rspeaksnonmem breaks the NONMEM control stream into components associated with the data, parameters, model and tasks, so that it is transparent to the user exactly which aspects of the model have changed, and it allows some modularity of model definition.

The code in development is hosted on Github (https://github.com/MikeKSmith/rspeaksnonmem). Vignettes have been written to illustrate how to use various aspects of rspeaksnonmem. https://github.com/MikeKSmith/rspeaksnonmem/tree/master/vignettes
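As a rough sketch of the kind of workflow this enables (importNmMod() and importNm() are RNMImport functions; the PsN call is shown via a plain system2() rather than assuming specific rspeaksnonmem wrapper signatures - see the vignettes for the package’s own functions):

    library(RNMImport)

    ## 1. Parse an existing control stream into its components
    run1 <- importNmMod("run1.mod")
    str(run1, max.level = 2)   # data, parameter, model and task pieces

    ## 2. After tweaking components in R and writing the control stream
    ##    back out, run the model through PsN's execute
    system2("execute", args = c("run1.mod", "-directory=run1_est"))

    ## 3. Read the NONMEM output back in for diagnostics
    run1out <- importNm("run1.mod")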


#2

YouTube link for the Live Stream: https://www.youtube.com/watch?v=VFdGnWrRXw8


#3

Great work @MikeKSmith – how do you expect to manage long-running scripts? Do you expect knitr to be stable during such long-running jobs?


#4

I haven’t had enough experience to say whether knitr would be stable with a long run, but I think Devin Pastoor’s comment about using this package: https://github.com/jeroen/sys may help alleviate some issues, since it manages execution of the external process and can run it in the background.
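For instance, a minimal sketch of the idea (the PsN command and paths are illustrative):

    library(sys)

    ## launch a long PsN run in the background so the R / knitr session
    ## isn't blocked
    pid <- exec_background("execute", c("run1.mod", "-directory=run1_est"))

    ## poll later without blocking; returns NA while still running
    exec_status(pid, wait = FALSE)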


#5

Mike,

First, thanks for the presentation. I want to say up front that I am always a big fan of wrappers that enhance someone’s ability to use a technology of their choosing - and for that, great work!

For the sake of stimulating discussion, and thinking about how the next generation of tools will shape productivity in our field, I think this is a reasonable opportunity to pick at some of the current implementation details and the expectations for packages such as this going forward, especially the interaction with PsN.

I would be happy to hear from others, but I feel that trying to run NONMEM through R, especially inside something like R Markdown, runs into the “when you have a hammer, everything looks like a nail” problem. I love R Markdown and use it for literally every interim analysis, full technical report, toy project, and many things in between. But what it succeeds best at is providing a textual representation of the actions you’d like to achieve (the R code) and tangible results of those actions, intertwined with richer documenting capabilities. This is invaluable for iterative or dynamic workflows, where it can otherwise become difficult to reproduce how a result was achieved.

Maybe I’m missing something here, but that value proposition (I can re-run outputs and regenerate results as I tweak things) is blunted by the time complexity of most non-trivial NONMEM runs. E.g., when we think about the layout of an analysis flow for a given run:

pre-processing data/control stream --> run model --> tweak because compile issue :wink: --> run model again --> post-process results

compounded across many runs, this does not play to R Markdown’s strengths. What happens when you only want to re-run downstream sections? Do you create one Rmd per run? If not, are you expected to re-run all runs every time you change the settings of one? What happens when you give the document to someone else to run, etc.? The idea of locking my R session while waiting for a run to complete, backgrounding the job, or perhaps setting the chunk to eval=FALSE so it doesn’t get re-run on subsequent invocations, has always felt very awkward.
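For concreteness, the eval=FALSE workaround looks something like this (the chunk option is standard knitr; the command itself is illustrative):

    ```{r run001-fit, eval=FALSE}
    ## documented but never executed on knit - you have to remember to
    ## flip eval back on (or run it by hand) whenever the model changes
    system2("execute", c("run001.mod"))
    ```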

I would love to hear what people who use such a workflow are doing to mitigate these issues. I personally have ‘resorted’ to a number of isolation and caching techniques to minimize unnecessary re-runs, even for R-based simulations, and NONMEM estimation/bootstrap/SSE problems dwarf the time complexity of most R simulations.

Furthermore, reproducibility is only as good as the ability to control and version the overall environment. Are people just checking in every run artifact? How are configuration changes (e.g. working on both Windows and Linux systems for different parts of a project) addressed? Likewise, using a lower PsN clean level means a substantial amount of unnecessary data duplication (e.g. copies of data/output tables in each run subfolder on top of the top-level copies).

Admittedly I do not have a direct pulse on all the community pain points, but maybe this is the time to start discussing what a ‘modern’ analysis workflow looks like, and what tooling gaps exist to make it easier to succeed.

@MikeKSmith, is there a location where such user feedback and design discussions are curated?

For example, off the top of my head, things I’d find valuable include:

  • managing data that is referenced across many Rmd files (say one file per run, each needing the original data)
  • better representation of, and more control over, the run records, to see all models/settings available across a project’s runs
  • more customized globbing/file-matching control for selectively cleaning folders, since PsN clean levels are not ideally suited to this
  • easily generating ‘publication-ready’ tables for specific run combinations (base model --> covariate models x/y/z --> final model)
  • scaffolding out common models/parameter structures and linking them to data (stop manually generating $INPUT)
  • better integration with version control systems (common gitignore settings, history pruning, bundling all outputs from a run to a specific commit message)
  • especially for larger organizations/teams, tooling for managing project structure/directories/data access and control

By contrast, personally I would never use the direct submission scripts like bootstrap_PsN (for reasons such as those outlined above), and often I want to submit to some queue system (SGE) anyway. That said, having an API that makes it easy to access/consume PsN’s multitude of customizations and generate the command string (bootstrap run001.mod -clean=2 .....) that I could then do with as I will opens up so many more avenues - direct submission, submit to a queue, submit to an API endpoint, etc.
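Something like this hypothetical helper is all I mean - build the command string in R, then choose the execution route separately (psn_cmd() is made up for illustration; the option names mirror PsN’s CLI):

    psn_cmd <- function(tool, model, ...) {
      opts  <- list(...)
      flags <- sprintf("-%s=%s", names(opts), unlist(opts))
      paste(tool, model, paste(flags, collapse = " "))
    }

    cmd <- psn_cmd("bootstrap", "run001.mod", clean = 2, samples = 500)
    cmd
    #> "bootstrap run001.mod -clean=2 -samples=500"

    ## direct submission:
    # system(cmd)
    ## or hand the same string to an SGE queue:
    # system(paste("echo", shQuote(cmd), "| qsub -cwd -N boot_run001"))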

I don’t want to belabor the points further, or take away from the efforts Mike et al. have made so far. I just hope that the future of these efforts is not “we’ve now wrapped features x/y/z for the sake of being able to”, but rather is driven by open dialogue built around community feedback: people are most frustrated with 1/2/3, and we’ve taken a crack at solving those pain points with features a/b/c.


#6

Thanks for the response Devin.

To answer the question “Where are user feedback and design discussions curated?”: I’m happy for anybody to contribute and submit GitHub Issues for rspeaksnonmem. Even feature requests or questions about workflow will help us collate ideas, which we can then discuss for future development or implementation. On the more general topic of pharmacometrics workflow, I guess here on discuss.go-isop.org is a good place to capture that.

Markdown is a good way of capturing and documenting the results of a single NONMEM run - as you might in a lab notebook. Typically you’d want to present convergence information, parameter estimates with their associated uncertainty, and some model diagnostics, numeric and/or graphical. I agree that judicious use of result caching would be good, so that minor changes to text don’t involve re-running the whole model. I also agree that extending this to embed the running of all models in an analysis in Markdown is overkill… Markdown or LaTeX has advantages for final reporting, though - pulling in model outputs, creating figures and tables - which means that you don’t have to copy-paste or worry about whether your tables / figures are consistent with the latest runs of the models. This assumes, though, that you have already run the model…
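As a sketch of what I mean by caching (cache=TRUE is a standard knitr chunk option; the command is illustrative), the expensive chunk in a run log could be written as:

    ```{r run1-fit, cache=TRUE}
    ## re-executed only when this chunk's code changes, so text-only
    ## edits elsewhere in the document don't re-run the model
    system2("execute", c("run1.mod", "-directory=run1_est"))
    ```

so that re-knitting after editing the surrounding prose reuses the cached result.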

Your “value” statements ring true with my own thoughts. rspeaksnonmem could address some of these, but I see more power when it’s used in combination with other tools for version control, job submission and management, and provenance / metadata management, i.e. tracking inputs / outputs and the relationships between runs or models. In our typical analysis “tree”, the path to the final model is seldom linear. We need tools to help us capture what we did and what we found for each model, but also tools that help us look back from the final model to its lineage, re-run in the event of late-breaking changes to data or inputs, and show that our findings can be reproduced or replicated by a third party if required.