Hacker News
Python Internals: PyObject (gahcep.com)
169 points by kercker on July 28, 2016 | 30 comments



Although it's been posted here before (only twice, though!), Philip Guo put up a 10-hour series of lectures on CPython internals[0]. I am nearly done with them and have found them very useful and well-paced. I'm posting this because the author of this article seems to have stopped the series, and anyone interested in this stuff should definitely check out the Guo lectures.

[0]: http://pgbovine.net/cpython-internals.htm


Also this blog post about modifying CPython internals is very interesting: https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow...


Wow, I had never seen this before. What a fantastic resource. Thanks for sharing, and thanks to Philip for making this available to everyone.


+1, the Guo series is excellent. I've gone through it twice and am more confident about my Python skills than ever before :)


Thanks everyone! Recording lectures has been great since it doesn't take much more work than not recording; it just takes some time to render the final videos and upload them. Ideally I'd have time to split the videos up into smaller chunks, but alas, free time is diminishing nowadays.


Would you have any interest in providing someone (I would volunteer) the videos so that they can package them in smaller, topic-specific pieces for easier consumption? I don't know anything about editing videos, but if your writings and lectures have taught me anything, it's that I'm capable of learning things and using that knowledge to do interesting and useful work.

As an aside, you are one of my main sources of inspiration. I was a high school dropout with several terms of failing out of community college courses, but after reading your PhD Grind series/book, I'm now a (nearly) straight-A student and am very determined to go as far with my education as I can. I would be honored to help you in any small way I can to express my gratitude!

Thanks again for unknowingly improving my life.


Thank you for your effort and for sharing your knowledge. pythontutor.com is an excellent project!

While watching the series for the second time, I started taking some notes/hints. Here they are: http://giis.co.in/c_py_notes.txt They're more like hints; you need to watch the videos to understand these short notes. (Sorry for any typos/mistakes.)

Thanks again!

EDIT: Using YouTube's offline mode, I saved some videos to my smartphone and listened to them on my evening walks. My health and my Python skills got better at the same time :D


Contrary to what the article says, a PyVarObject is still a PyObject at the same time, and PyVarObjects have nothing to do with mutability (it's trivially obvious that not all PyVarObjects are mutable: immutable tuples are PyVarObjects, and so are mutable lists).

What PyVarObject actually handles are types whose instances can differ in size, with tuple and bytes objects probably being the most common cases.
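To make the layout point concrete, here is a minimal Python sketch that mirrors the two headers with ctypes and reads them off live objects. It assumes a default CPython build (not a debug or free-threaded build, whose headers differ) and relies on the CPython detail that id() returns the object's address. The PyObject/PyVarObject classes below are local ctypes mirrors written for illustration, not an importable API.

```python
import ctypes

# Local ctypes mirrors of CPython's object headers. Assumes a default
# (non-debug, non-free-threaded) build: one pointer-sized refcount field
# followed by the type pointer.
class PyObject(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]


class PyVarObject(ctypes.Structure):
    # The PyObject header comes first, so a PyVarObject* can be used
    # anywhere a PyObject* is expected.
    _fields_ = [
        ("ob_base", PyObject),
        ("ob_size", ctypes.c_ssize_t),  # item count; equals len() for these types
    ]


def var_header(obj):
    """View obj's header in place (CPython detail: id() is the address)."""
    return PyVarObject.from_address(id(obj))


samples = [(1, 2, 3), b"hello", [10, 20, 30, 40]]  # tuple, bytes, list
for obj in samples:
    h = var_header(obj)
    same_type = h.ob_base.ob_type == id(type(obj))
    print(type(obj).__name__, h.ob_size, same_type)
```

Note that the immutable tuple and bytes objects and the mutable list all carry the same variable-size header; what varies is their size, not their mutability.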


If you try to embed or extend Python in C, you will be exposed to PyObject very quickly. The official Python documentation walks you through that process (see the "Extending and Embedding" guide). Alongside the PyObject structure mentioned in the article, the C API provides many functions for managing these objects programmatically, converting them to other types, etc.
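For a quick feel of that C API without compiling an extension, ctypes.pythonapi exposes the running interpreter's exported functions. This is a minimal sketch assuming CPython (other implementations don't provide pythonapi); a real extension would include Python.h and call these functions directly, as the official guide describes. The declare() helper is hypothetical, written just for this example.

```python
import ctypes

api = ctypes.pythonapi  # the running CPython's exported C API (a PyDLL)


def declare(name, restype):
    """Hypothetical helper: give a one-argument C API function a
    PyObject* -> restype signature."""
    fn = getattr(api, name)
    fn.argtypes = [ctypes.py_object]  # pass the object as a borrowed PyObject*
    fn.restype = restype
    return fn


PyObject_Size = declare("PyObject_Size", ctypes.c_ssize_t)  # generic, like len()
PyTuple_Size = declare("PyTuple_Size", ctypes.c_ssize_t)    # tuples only
PyObject_IsTrue = declare("PyObject_IsTrue", ctypes.c_int)  # like bool()
PyLong_AsLong = declare("PyLong_AsLong", ctypes.c_long)     # PyObject* -> C long

print(PyObject_Size([1, 2, 3]))
print(PyTuple_Size((1, 2)))
print(PyObject_IsTrue(""))
print(PyLong_AsLong(42))
```

Because pythonapi is a PyDLL, ctypes checks the interpreter's error indicator after each call, so a C API error (for example PyTuple_Size on a list) surfaces as a Python exception instead of a silent -1.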


I'm curious if you know much about jobs doing this kind of thing. I've worked a ton with Python, and know a lot about using Cython to write extension modules, wrap C++ code, etc.

I really want a job where I find performance bottlenecks in research or business analytics or financial analytics code, migrate parts to robust Cython implementations, and then build and maintain the broader Python libraries and APIs surrounding that.

But what I'm unfortunately finding is that basically nobody uses this stuff anywhere in practice. I've interviewed at a ton of places doing everything entirely in C++, and when I see their code I think, my god, why would anyone choose this? You can keep writing the critical stuff in C++ just as you are, wrap it easily for Python, and then write the other 99% of the code in super easy, dynamic Python, with various kinds of validation layers or type annotations to avoid certain kinds of bugs, and everyone's life is just easier.

But they are just so entrenched in the old way of doing it that nobody is willing to try.

I've been consistently amazed at how few jobs in the monthly Who Is Hiring thread are tagged even just with NumPy. I think I've seen maybe one job in 6 months that mentions Cython, and then I found two others outside of HN and the interviews with those places didn't work out.

How to find a job doing this stuff is like my main vexing work problem at the moment. It's so useful, but it's sort of locked outside of some kind of energy barrier that quant teams seem unwilling to cross.


Suggest you look for an opening on JP Morgan's Athena project, or Bank of America Merrill Lynch Quartz. Both those platforms are C++ on the inside with Python APIs. The originators have gone independent here: http://www.wsq.io/about-us/ The site says they're hiring.


I don't know why that throwaway account's comment below was removed... it's widely known that the code quality inside the Quartz/Athena silo (and the spinoffs at other banks) is utterly terrible; don't go within ten feet of it without a radiation hazard suit on.

This is not a controversial opinion, nor really even a swipe at anyone. It's just bad, bad, bad code inside of banks.

I remember talking to someone on the Athena team during a job interview there, and they told me that they had to fill out paperwork and wait weeks for approval just to pip install a third-party package for a sandboxed, non-production ad hoc data analysis task. I don't care what industry you're in, that's taking bureaucracy way too far. I've worked in quant finance before in a highly regulated environment, and there was never the slightest worry that you couldn't just grab any third-party tool you needed to do your job. Getting into production would require a big process, sure, but not just grabbing pandas, say, to answer someone's quick ad hoc business question. In fact, this sort of thing was never a problem in defense research either.

Anyway, I personally view the Athena/Quartz/etc. stuff, built by the team that I believe originally spun out of Goldman, as easily one of the worst things to happen to the world financially in the past 15 years (and I think they're now even infecting smaller banks like PNC with it?). Sadly not hyperbole.


A few trading companies do this kind of work also.

Since you really want to write C and Python... Do you care if you have to write trading code to help billionaires make more money?


I don't think I would be a good match for bank culture or HFT, but if the work you describe is in a small-to-medium sized asset manager or hedge fund, then I would not mind, and would find it enjoyable. I worked at such a place a number of years ago and it was one of the better jobs I've had.

The trouble I find is that a lot of small-to-medium sized asset managers and hedge funds don't have very good technology practices in general, and are pretty far from sophisticated use of low-level Python. There are often a lot of political reasons why they cling to Excel/VBA, R, or MATLAB, and there's almost always some political battle happening between the people who want to rationalize the system into a proper software design and the people who just want to keep cranking on the hodgepodge of existing tools.

I'd probably be fine with some trade-off regarding all of that, if the pay was acceptable and there was a strong commitment to a healthy work/life balance. But among finance firms this is extremely hard to find.

So even though I am not at all opposed to doing this work in finance, I still find the volume of acceptable job openings in that field to be too small to make it realistic.


Obvious throwaway.

Do NOT work on Quartz or Athena or WSQ.

I have worked on two of these. I have seen the code for all three. These systems have too many WTFs to count.

They have led to technical crises at both JP Morgan and Bank of America Merrill Lynch. At Bank of America Merrill Lynch, the crisis led to the resignations of the chief architect and the CTO.

Stay away.


Well, people have found success embedding Lua or JavaScript, since they're easier to sandbox than Python.

Now, Python extensions can be a bit hard to get right, especially since Python has its own threading model (the GIL). If you are trying to wrap an existing multithreaded C++ code base for Python, it can be challenging.

Anyways, best of luck with your quest.


Can you elaborate on why those are easier to sandbox than Python?


Python offers a standard library with many, many features: you can start processes, do networking, manipulate files, etc. You might not necessarily want to give your scripting environment those facilities.
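One way to see why simply hiding those facilities isn't enough: Python's introspection lets code walk from any literal back to object and every class loaded in the process. A minimal sketch, using exec with an empty __builtins__ dict as a deliberately naive "sandbox" (chosen for illustration, not a recommended technique; run_restricted is a hypothetical helper):

```python
def run_restricted(source):
    """Naive 'sandbox' (hypothetical helper): exec source with no builtins."""
    namespace = {"__builtins__": {}}
    exec(source, namespace)
    return namespace


# Direct access to open() is gone...
try:
    run_restricted("f = open('data.txt')")
    open_blocked = False
except NameError:
    open_blocked = True

# ...but any literal still leads back to object and all of its subclasses,
# which is the usual starting point for escaping this kind of restriction.
ns = run_restricted("classes = ().__class__.__base__.__subclasses__()")
print(open_blocked, len(ns["classes"]))
```

Embedded Lua or JavaScript engines, by contrast, start with only what the host application explicitly exposes, which is a large part of why they are easier to lock down.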


https://www.enthought.com/company/careers/ - I worked at Enthought and can vouch for the team there.

https://www.continuum.io/join-us-empower-people-solve-worlds... - I haven't worked here personally but have heard good things.

Both are in Austin, Texas.


I worked remotely for Continuum before. As I mentioned in another comment, I think Continuum is a great place with amazing engineers, but my particular working style was not well-suited for the mixture of start-up environment and ad hoc consulting. My projects also did not happen to feature the use of any C stuff, only database stuff, pure Python, and pandas. I do think some other projects involved the Cython stuff, but nothing I was assigned to work on.

So unfortunately these types of options aren't likely a good match for the sort of work I am looking for.


I do all of the stuff you mentioned on a daily basis (quant in a high-frequency trading firm) -- NumPy (plus its C API), Cython, numba, a good measure of C++, and even some occasional LLVM. The typical projects range from ingesting market data in high volumes to doing complex machine learning mumbo jumbo :)


I've tried applying at SIG in Philly in the past; I have all the things you mention on my resume (I worked previously at Continuum and also at a quant asset manager), but have not ever heard any kind of reply to my resume (not even a rejection) unfortunately.


It's funny because I do work at SIG (Dublin, Ireland), and we do know a few folks from Continuum personally.

For a quant role, even if in the end it is likely to involve a lot of hardcore low-level coding, you're (generally) expected to have a PhD in math/physics/stats/EE, though; that's the rule. For developer roles I don't know all the details. But one thing is for sure: we're hiring for quant roles pretty aggressively right now :)


When I worked at a large quant asset manager as a quant analyst in Boston, I did not have a PhD, and neither did any of the other analysts except maybe one or two. It did seem you needed a PhD for portfolio management or higher, but not for quant analyst. This was also true at a hedge fund in Boston... I ultimately turned down their offer, but they seemed to mostly hire people with master's degrees (I have a master's in stats/machine learning), not full PhDs.

Are there any openings (that would sponsor a visa) in the Ireland office? I'd love to apply and think SIG could be a good fit, but I've just had no luck responding to the Stack Overflow ads from the Philly office.


Yeah, there might be. Better drop me a line with your CV; I could try to figure it out.


Why not write a blog post explaining your approach with some examples and mention on the page that you're available for a job/consulting job?


Continuum.io (the guys behind Anaconda and a lot of new python data libraries like blaze, numba, dask) do this kind of work as consultants.


I actually worked at Continuum already. It's a great place with some of the best Python engineers you can possibly find.

I found that the particular combination of start-up environment and consulting was not a good fit for me, and I suspect I would feel the same about Enthought or other kinds of analytics consulting companies, so alas this is not an option for me.


Here is a blog post about the Python list implementation, one of the two main data structures used internally: http://www.laurentluce.com/posts/python-list-implementation/
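The over-allocation that post describes can be observed from pure Python. A minimal sketch, assuming CPython, where sys.getsizeof on a list reflects its allocated capacity rather than just its length (the helper name is made up for this example):

```python
import sys


def sizes_while_appending(n):
    """Record sys.getsizeof(lst) before and after each of n appends."""
    lst = []
    sizes = [sys.getsizeof(lst)]
    for i in range(n):
        lst.append(i)
        sizes.append(sys.getsizeof(lst))
    return sizes


sizes = sizes_while_appending(32)
# Because append() reserves spare slots whenever it resizes, the reported
# size only jumps on some appends; the rest reuse capacity already allocated.
resizes = sum(1 for a, b in zip(sizes, sizes[1:]) if b > a)
print(f"{len(sizes) - 1} appends, {resizes} resizes")
```

The exact growth pattern is an implementation detail that has changed between CPython versions, so it's better to read it off like this than to hard-code it.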


Cython can be used to auto-generate optimized Python C API code, and it can produce HTML annotations that map the generated code line by line back to the original Cython source. This stuff is very hard to write by hand.



