Tips and tricks to write LaTeX papers in with figures generated in Python

akshayn · on March 18, 2019

> We also recommend to save the command used to generate a figure in the LaTeX file

An approach I have adopted recently is Knitr[1], so this layer of indirection goes away. With knitr, my data goes directly into the paper repository, and then my Makefile has something like this:

  %.tex: graphs/%.Rnw
    Rscript -e "library(knitr); knit('$?')"

The nice thing is exactly what the authors recommend: it's much easier to enforce a standard appearance across all the figures, and automatically incorporate more recent data into the paper as part of the compilation process.

[1] https://yihui.name/knitr/

Wookai · on March 19, 2019

Looks awesome, thanks for sharing!

abhgh · on March 19, 2019

I'd also add that for figures Inkscape is invaluable [1]. Save as svg once, and export it as whatever later. I typically export it to PDF (from within Inkscape) for pdflatex.

While it's typically indispensable for schematics, I often seem to run into the use case of combining previously generated plots or figures, or adding a label/text. Since Inkscape can import PNGs, this is a breeze with it. I don't have to go back to the original code to regenerate plots, or fiddle around with LaTeX to make minor adjustments.

For stuff generated via matplotlib, I'd strongly recommend seaborn as an additional library [2]. This is a wrapper over matplotlib. It can prettify plots with just an import and a 'set' command. You can, of course, use it to plot too, and for stuff doable in matplotlib using the seaborn alternative is much easier and looks better with little or no work. And they support pandas dataframes.

[1] https://inkscape.org/

[2] https://seaborn.pydata.org/

fourier_mode · on March 19, 2019

The problem with Inkscape is that any slight change to a figure makes the user go deep into the workflow pipeline to make it. Using LaTeX packages like TikZ or PSTricks instead would simplify the workflow and make the document more maintainable.

abhgh · on March 19, 2019

I think this is one of those things that depend on your actual workflow, content etc. I see your point; for me this hasn't been a problem.

zapnuk · on March 19, 2019

Have you considered plotly [1] as an alternative to matplotlib?

It is mostly known for online plots, but it has a free offline API that can export plots to eps/pdf/png/etc.

Parts of their libraries are aimed at interactive plots, but imo the basic plots look even better than those of seaborn.

[1] https://plot.ly/python/

radus · on March 19, 2019

I do this as well. And you can save svg files from matplotlib for editing or composition in Inkscape!

Jill_the_Pill · on March 19, 2019

Having just completed a dissertation in LaTeX, with figures online in Overleaf and Dropbox (some of them screenshots), scripts and data spread across two computers and an external hard drive, desperate last minute plot text changes right in the pdf, I just have to ask: WHY DIDN'T YOU POST ALL THIS SOONER?

Wookai · on March 19, 2019

I'm sorry! It has been online for 4 years now, I simply never thought of sharing...

sigurdjs · on March 19, 2019

If you are serious about making beautiful figures in LaTeX, I would seriously recommend using TikZ and pgfplots. It is quite easy to automatically generate TikZ code from Python (after all it is supposed to be read and written by humans) and all aspects of the figure can easily be customized. I have been quite successful in generating automated reports with pretty and easily readable figures using TikZ and pgf.

If anyone is interested I have uploaded a sample script for generating XY-plots from two numpy lists to github. The code is by no means very good, but I just wanted to share in case anyone wants to try this approach.

https://github.com/sigurdjs/python-tikz
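To give a feel for the approach, here is a minimal sketch of generating pgfplots code from two Python sequences. It is not the code from the repository above; the function name `xy_to_pgfplots` and its parameters are made up for illustration, and the output assumes the LaTeX document loads `\usepackage{pgfplots}`.

```python
def xy_to_pgfplots(xs, ys, xlabel="x", ylabel="y"):
    """Return a pgfplots tikzpicture that plots ys against xs, as a string."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    lines = [
        r"\begin{tikzpicture}",
        r"\begin{axis}[xlabel={%s}, ylabel={%s}]" % (xlabel, ylabel),
        r"\addplot coordinates {",
        # One coordinate pair per line keeps the generated file diff-friendly.
        # The "g" format keeps at most 6 significant digits.
        *(f"    ({x:g}, {y:g})" for x, y in zip(xs, ys)),
        r"};",
        r"\end{axis}",
        r"\end{tikzpicture}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [x ** 2 for x in xs]
    print(xy_to_pgfplots(xs, ys, xlabel="$x$", ylabel="$x^2$"))
```

The printed text can be written to a `.tex` file and pulled into the paper with `\input`, so styling stays on the LaTeX side.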

programLyrique · on March 19, 2019

And it's also possible to directly load a csv file with all the data in LaTeX and plot it with pgf, which makes it possible to keep all the plotting options in the LaTeX file:

  \addplot table[x ={Column1}, y ={Column2}] {myData.csv};

The issue is that it can take some time for pgf to load the data and do computations on them, but you can use the external library of tikz so that it does not compute the plot again (and save it as a pdf for later uses).
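On the Python side, producing such a file only needs the standard library. The sketch below is one possible way to do it; the helper name `write_pgfplots_csv` is hypothetical, and the file and column names simply mirror the snippet above. Note that pgfplots expects whitespace-separated columns by default, so a comma-separated file like this one would need `col sep=comma` among the `table` options.

```python
import csv
import math


def write_pgfplots_csv(path, rows, header=("Column1", "Column2")):
    """Write (x, y) rows to a comma-separated file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical data: 50 samples of sin(x) on [0, 2*pi].
    xs = [2 * math.pi * i / 49 for i in range(50)]
    write_pgfplots_csv("myData.csv", [(x, math.sin(x)) for x in xs])
```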

Wookai · on March 19, 2019

Indeed, TikZ is great to create beautiful-looking plots! A friend of mine is quite good at it, e.g. look at Figure 2 in [1] and Figure 2.1 (p. 18) in [2]. For complex figures, though, I find that TikZ can be a bit hard to master and sometimes results in longer compilation times.

[1] http://www.hrzn.ch/publications/tf-icnp15.pdf [2] http://www.hrzn.ch/publications/thesis.pdf

jedberg · on March 18, 2019

> When writing LaTeX documents, put one sentence per line in your source file.

An interesting tip, never thought of that! It changes the way you write a bit, but it does make finding changes easier, finding errors easier, and forces you to think more about each sentence since you have to hit "enter" at the end of each one.

bo1024 · on March 19, 2019

It also works much better with version control software (git). Not only does it help with diffs, as the article mentions, but it makes merging way easier in case you and your coauthor change two adjacent sentences at the same time.

Wookai · on March 19, 2019

It takes a while to get the "muscle memory" of adding a new line after each sentence but it does make things much easier!
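For text that was already written as wrapped paragraphs, a small script can do a first pass. This is only a naive sketch: the function name `one_sentence_per_line` is made up, the sentence boundary is a simple regular expression, and it knows nothing about LaTeX comments, environments, or abbreviations, so the result needs a manual review.

```python
import re

# Naive boundary: sentence-ending punctuation, whitespace, then an uppercase
# letter or a LaTeX command. Abbreviations followed by a capital letter
# (for example "e.g. Foo") will be split incorrectly.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\\])")


def one_sentence_per_line(paragraph):
    """Reflow a single paragraph so that each sentence sits on its own line."""
    text = " ".join(paragraph.split())  # collapse existing line breaks
    return "\n".join(_SENTENCE_BOUNDARY.split(text))


if __name__ == "__main__":
    wrapped = "We propose a method. It is fast\nand simple! \\emph{Really}?"
    print(one_sentence_per_line(wrapped))
```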

u801e · on March 19, 2019

Interestingly enough, I made a very similar comment[1] about code line length and only using one statement per line in another HN thread several days ago.

[1] https://news.ycombinator.com/item?id=19349464

bonoboTP · on March 19, 2019

I find it useful to work with plots in Jupyter notebooks. Use the "%matplotlib notebook" line magic to get interactive plots inline.

Then you can use savefig when it looks good. Then save the code you used into some file near the Latex sources.

maksimum · on March 19, 2019

I also use this approach.

To standardize appearance I put appearance modifiers in `notebook_context/__init__.py`, and then in my second jupyter cell

  from notebook_context import *
  configure_plotting_for_publication()

Example notebook_context: https://github.com/maksimt/empirical_privacy/blob/master/src...
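For readers who cannot follow the link, a function of this kind could look roughly like the sketch below. This is not the contents of the linked `notebook_context`; the parameter names and the chosen values are assumptions, and only standard matplotlib `rcParams` keys are used.

```python
import matplotlib as mpl


def configure_plotting_for_publication(width_in=3.3, font_size=8):
    """Apply one set of rcParams so every figure shares the same look."""
    mpl.rcParams.update({
        # Size figures for a single column so no scaling is needed in LaTeX.
        "figure.figsize": (width_in, width_in * 0.62),
        "font.size": font_size,
        "axes.labelsize": font_size,
        "legend.fontsize": font_size,
        "xtick.labelsize": font_size,
        "ytick.labelsize": font_size,
        "lines.linewidth": 1.0,
        # Trim surrounding whitespace when saving.
        "savefig.bbox": "tight",
        # Embed TrueType fonts in PDF output instead of Type 3.
        "pdf.fonttype": 42,
    })
```

Calling it once near the top of a notebook or script means every later figure picks up the same settings.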

mlthoughts2018 · on March 18, 2019

I also recommend separating repetitive parts of plot generating code into template files, such as with mako or jinja2, and then programmatically generate sequences of plots by first piping the data into the jinja2 template, and then using insert commands to insert it into a bigger tex document.

I found this helpful when writing a paper where the appendix needed over 35 different tables of regression results, all with the same format but populated with data from different subpopulations, which would need to be regenerated (including updated captions, etc.) any time data cleaning or methodology was changed.
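A stripped-down version of that idea is sketched below. To keep it dependency-free it uses the standard library's `string.Template` instead of jinja2 or mako; the table layout, the function name `render_table`, and the numbers are all made up for illustration.

```python
from string import Template

# string.Template stands in for jinja2/mako so the sketch has no dependencies.
TABLE = Template(r"""\begin{table}
\centering
\caption{Regression results for ${population}.}
\begin{tabular}{lrr}
\hline
Variable & Coefficient & Std. error \\
\hline
${rows}
\hline
\end{tabular}
\end{table}
""")


def render_table(population, results):
    """Render one LaTeX table from (name, coefficient, std_error) tuples."""
    rows = "\n".join(
        rf"{name} & {coef:.3f} & {se:.3f} \\" for name, coef, se in results
    )
    return TABLE.substitute(population=population, rows=rows)


if __name__ == "__main__":
    # Hypothetical numbers, only to show the shape of the input.
    results_by_population = {
        "Group A": [("age", 0.123, 0.045), ("income", -1.5, 0.3)],
        "Group B": [("age", 0.2, 0.05), ("income", -1.1, 0.25)],
    }
    for population, results in results_by_population.items():
        filename = "table_" + population.lower().replace(" ", "_") + ".tex"
        with open(filename, "w") as f:
            f.write(render_table(population, results))
```

Each generated file can then be included from the appendix with `\input`, and rerunning the script refreshes every table and caption at once.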

Wookai · on March 19, 2019

That's a great point! Templates are a great tool to generate big tables from results. I usually do that for most of the results in my papers; it makes it easier to avoid the odd copy/paste error. I might add this to the tips and tricks, thanks!

unwind · on March 18, 2019

Meta: there seems to be an extra "in" in the title, that makes no sense to me, at least.

Not a native speaker, though.

Wookai · on March 19, 2019

Oops, I forgot to erase that when shortening the title, sorry! Not sure I can update it now :(...

naniwaduni · on March 18, 2019

It's a careless transposition of "papers in LaTeX" → "LaTeX papers in" without removing the "in".

Wookai · on March 19, 2019

Indeed, sorry about that!

euske · on March 19, 2019

Re: figures in EPS. I think SVG is the way to go. It can be generated with matplotlib or even a simpler script (it's just an XML after all). It can be hand edited. It's viewable with a browser. And it can be converted to PDF with rsvg-convert.

I personally find matplotlib a bit unintuitive to use, so I made a 100-line script for generating SVG. It's great.
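The script itself isn't posted here, so as an illustration of how little is needed, below is a rough sketch of a line plot written straight to SVG with the standard library only. The function name `polyline_svg` and its parameters are assumptions, and it draws just a frame and one polyline (no ticks or labels).

```python
def polyline_svg(xs, ys, width=400, height=300, margin=20):
    """Return a standalone SVG document drawing ys against xs as a polyline."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when all values on an axis are equal.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    def to_px(x, y):
        px = margin + (x - x_min) / x_span * (width - 2 * margin)
        # SVG's y axis points down, so flip it.
        py = height - margin - (y - y_min) / y_span * (height - 2 * margin)
        return f"{px:.1f},{py:.1f}"

    points = " ".join(to_px(x, y) for x, y in zip(xs, ys))
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">\n'
        f'  <rect x="{margin}" y="{margin}" width="{width - 2 * margin}" '
        f'height="{height - 2 * margin}" fill="none" stroke="black"/>\n'
        f'  <polyline points="{points}" fill="none" stroke="steelblue" '
        f'stroke-width="2"/>\n'
        "</svg>\n"
    )


if __name__ == "__main__":
    xs = list(range(11))
    with open("plot.svg", "w") as f:
        f.write(polyline_svg(xs, [x * x for x in xs]))
```

The resulting file should then convert with something like `rsvg-convert -f pdf -o plot.pdf plot.svg`.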

knolan · on March 18, 2019

This is probably most useful for postgrad students getting started with writing with TeX.

It’s worth pointing out that the figures are made using the matplotlib library, which is primarily based on Matlab’s plotting functionality. This is perhaps just as useful for new researchers as many of them are taught Matlab exclusively throughout their undergraduate courses.

p10_user · on March 19, 2019

It’s great for getting started, but if you start really customizing your plots, the object-oriented usage of matplotlib is really the way to go.

jonathanpoulter · on March 19, 2019

A minor plug: I've found I generate graphs and tables in Jupyter notebooks, so I wrote ipynb-tex, to allow you to reference cells from a notebook directly in your LaTeX documents. This supports tables, and figures.

https://github.com/poulter7/ipynb-tex

mychele · on March 19, 2019

I would suggest checking matplotlib2tikz and matlab2tikz to get pgfplot/tikz figures from matplotlib and matlab plots

Wookai · on March 19, 2019

Indeed, they're pretty cool (although in my experience the resulting TikZ code sometimes slows down compilation quite a bit).

semi-extrinsic · on March 18, 2019

One itch which (curiously) I can't seem to quite scratch in LaTeX is that it should be possible to say "plot equation \ref{eq:smth} for X in (-4,4)" and just get the bloody graph. Why should I need to define the equation again in a separate place, perhaps even in a separate file?

Wookai · on March 19, 2019

It's not exactly what you want, but you can do parametrized plots in TikZ to plot a given function for some range: http://www.texample.net/tikz/examples/parameterized-plots/.

I wouldn't be surprised if someone wrote a tool that allows you to use a reference to an equation as the function to plot :).

dmlorenzetti · on March 19, 2019

This is not what you asked for, since it still requires a separate file. However it might be close enough to what you want, and -- for complicated expressions -- possibly even better.

You can write (or derive) the expression using sympy, then have sympy generate a numpy expression that can be evaluated. Sympy can also generate the LaTeX code for any expression. So while that isn't an in-LaTeX solution, it may be close to what you want.

Johansson's "Numerical Python" shows several examples of this. I will scavenge one of his examples below (trusting it falls under "fair use", and hoping I transcribe it correctly; the three imports at the top are the ones the snippet needs). The example uses sympy to generate and plot Taylor series expansions of sin(x).

The key bit to look for in the example is `sympy.lambdify()`.

    import matplotlib.pyplot as plt
    import numpy as np
    import sympy

    sym_x = sympy.Symbol("x")
    x = np.linspace(-2 * np.pi, 2 * np.pi, 100)

    def sin_expansion(x, n):
        return sympy.lambdify(sym_x, sympy.sin(sym_x).series(n=n+1).removeO(), 'numpy')(x)

    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), linewidth=4, color="red", label='exact')
    colors = ["blue", "black"]
    linestyles = [':', '-.', '--']

    for idx, n in enumerate(range(1, 12, 2)):
        ax.plot(x, sin_expansion(x, n), color=colors[idx // 3],
            linestyle=linestyles[idx % 3], linewidth=3,
            label="order %d approx." % (n+1))

    ax.set_ylim(-1.1, 1.1)
    ax.set_xlim(-1.5*np.pi, 1.5*np.pi)

    ax.legend(bbox_to_anchor=(1.02, 1), loc=2, borderaxespad=0.0)
    fig.subplots_adjust(right=.75)

I highly recommend the book. It's full of nuggets like this.

kccqzy · on March 18, 2019

LaTeX doesn't have enough information about what your notations mean. You can very well write nonsensical formulas that look pretty in LaTeX but are absolutely meaningless.

bingerman · on March 19, 2019

I wish I had read the TeXbook or something similar sooner to gain knowledge like this. I used LaTeX for years without knowing the basics and I regret that a lot.

Also, (v)phantom and smash are something I really should have learned before all those fancy packages. Nowadays I'm mostly using ConTeXt anyway.

joseph8th · on March 19, 2019

Any opinion on the utility of Emacs Org-mode to organize and manage LaTeX? In particular Org Babel?

p10_user · on March 19, 2019

I’ve written documents in org mode and converted to pdf via LaTeX, but I find that if the document gets sufficiently complicated with formatting, I have so many LaTeX blocks in my org file I might as well be writing LaTeX directly.

Maybe I’m doing something wrong. YMMV

loskutak · on March 19, 2019

That is true, but org-mode really shines when you want to do literate programming stuff, e.g. have the matplotlib code directly in the orgfile, ...

tapia · on March 19, 2019

I already implement most of the points mentioned there. The most useful (and new) tip for me, however, was the rasterization part. I normally like to have PDF figures for my LaTeX papers, but last time I had some graphics with some thousands of points plotted, which took too long to print from Windows (on Linux there was no problem, which is why I didn't catch it earlier). In the end I decided to save the plot as PNG, but was not happy about it, haha. It would have been good to know the rasterization trick earlier.

Wookai · on March 19, 2019

Indeed, it's pretty useful to be able to rasterize only parts of the plot! Glad you find it useful!
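For anyone who hasn't seen the trick, a minimal sketch follows. The data is random and the function name `save_rasterized_scatter` is made up; the relevant part is passing `rasterized=True` to the one artist that has many points, so that only it is embedded as a bitmap while axes and text stay vector.

```python
import random

import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt


def save_rasterized_scatter(target, n_points=20000, seed=0):
    """Save a PDF whose scatter points are a bitmap and whose axes are vector."""
    rng = random.Random(seed)
    # Hypothetical data: many points, where a pure-vector PDF gets heavy.
    xs = [rng.gauss(0, 1) for _ in range(n_points)]
    ys = [rng.gauss(0, 1) for _ in range(n_points)]

    fig, ax = plt.subplots(figsize=(3.3, 2.5))
    # rasterized=True applies to this artist only.
    ax.scatter(xs, ys, s=1, rasterized=True)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    # dpi sets the resolution of the rasterized part.
    fig.savefig(target, format="pdf", dpi=300)
    plt.close(fig)


if __name__ == "__main__":
    save_rasterized_scatter("scatter.pdf")
```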

Wookai · on March 20, 2019

Thanks all for the great feedback and discussion, I'll update this thread once I push an update. If you're interested, there was a great discussion on /r/MachineLearning as well: https://www.reddit.com/r/MachineLearning/comments/b2oiaj/d_b...

stilley2 · on March 18, 2019

Thanks for the write-up! Two notes from my experience: pgf output works well with LaTeX too (although it will slow down compilation), and I recommend not using the pyplot submodule, especially if you'll be running things remotely over ssh and don't have a display.

alanbernstein · on March 18, 2019

Would you suggest an alternative to pyplot? What problems does it cause for you?

stilley2 · on March 18, 2019

I had problems using pyplot over ssh because it can assume there's a display and fail when it couldn't find one. Maybe this has changed. I use the OO interface. For example https://matplotlib.org/gallery/api/agg_oo_sgskip.html
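In the spirit of that gallery example, here is a short sketch of the object-oriented route with no `pyplot` import at all. The function name `save_sine_plot` and the plotted data are made up; `Figure` and `FigureCanvasAgg` are the matplotlib classes the approach relies on.

```python
import math

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def save_sine_plot(target, fmt="png"):
    """Draw sin(x) using only the object-oriented API, with no display."""
    fig = Figure(figsize=(3.3, 2.5))
    FigureCanvasAgg(fig)  # attach a non-interactive canvas to the figure
    ax = fig.add_subplot(1, 1, 1)
    xs = [i * 2 * math.pi / 199 for i in range(200)]
    ax.plot(xs, [math.sin(x) for x in xs])
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
    fig.savefig(target, format=fmt)


if __name__ == "__main__":
    save_sine_plot("sine.png")
```

Because nothing here touches a GUI toolkit, this should behave the same in a plain ssh session as on a desktop.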

jpeloquin · on March 18, 2019

Changing the plot backend should fix this.

  import matplotlib
  matplotlib.use('Agg')
  import matplotlib.pyplot as plt

https://stackoverflow.com/questions/2801882/generating-a-png...

stilley2 · on March 19, 2019

I believe Agg is only for bitmap output. While there are probably backends that work with a headless system, I find the OO option much more flexible.

p10_user · on March 19, 2019

Agg works for all outputs. I use it in combination with OO over ssh all the time.

Wookai · on March 19, 2019

Indeed, I remember having to switch to Agg for this reason but can't remember why I switched back. Maybe some rendering issues I had with Agg, not sure...

billfruit · on March 19, 2019

Is there a better and more comprehensive plotting library than Matplotlib? Its 3D plots lack polish. It is also kind of verbose and requires a lot of boilerplate. Its API is sprawling and hard to remember.

frumiousirc · on March 19, 2019

For 3D, I use https://en.wikipedia.org/wiki/MayaVi and its `tvtk` module and https://www.paraview.org/

Its API is not as sprawling, but understanding the data classes takes some effort.

For 2D, matplotlib or ROOT fit my uses. But, if you think matplotlib is sprawling, you'll hate ROOT.

ujuj · on March 19, 2019

Check Seaborn, it may suit your needs.

https://seaborn.pydata.org/

dagw · on March 19, 2019

Also Plotly and Bokeh, although both are more targeted towards producing interactive web based plots rather than print ready plots.

musicale · on March 19, 2019

I have one tip for anyone using LaTeX:

Please stop using the awful Computer Modern typeface.

JoshTriplett · on March 19, 2019

I personally find Computer Modern quite pleasant to read and work with.

Apart from personal preference, why avoid Computer Modern?

analog31 · on March 19, 2019

It shows that you took an extra year to write your dissertation. ;-)

Just kidding, but I got my degree 25 years ago, and at the time, the students who tried to have a perfectly typeset document spent a lot of time on that pursuit.

andreasvc · on March 19, 2019

It is an extremely thin font which makes it unsuitable for screen reading.

jjgreen · on March 19, 2019

A lot of the time, the TeX output target is paper, and there (with a decent resolution printer) it looks rather splendid (in my view anyway).

0-_-0 · on March 19, 2019

For me the horizontal line in the letter "e" often disappears with smaller text sizes.

rflrob · on March 19, 2019

But then how will people know how smart I am for using LaTeX? ;)

edgarvaldes · on March 19, 2019

Any typeface in particular that you do recommend?

wenc · on March 19, 2019

I used Bitstream-Charter [1] for my dissertation. It looks much better than Computer Modern.

My resume is typeset in Linux Libertine [2] which is used in this superlatively beautiful and elegant CV template by Dario Taraborelli [3]. Requires xelatex.

[1] http://www.tug.dk/FontCatalogue/charterbt/

[2] http://www.tug.dk/FontCatalogue/linuxlibertine/

[3] http://nitens.org/taraborelli/cvtex

Pseudomanifold · on March 19, 2019

Somewhat controversial, but I like it: Minion Pro :)

lusmd · on March 19, 2019

\usepackage{mathpazo}

Or

\usepackage[utopia]{mathdesign}

Some people like garamond (also a mathdesign font).

chess93 · on March 19, 2019

Computer Modern

andrepd · on March 19, 2019

I find Computer Modern to be quite readable on paper and screen. Any particular reason you feel this way?