How many lines of code is Candy Japan? (candyjapan.com)
237 points by bemmu on Sept 30, 2016 | 77 comments



This may just be me, but I find well-maintained projects tend to get sticky around 10k lines of code. Much smaller than that and every new thing is just that: a new thing. Much larger and it's easier for new things to kind of get tacked on at the edges. But around 10k, the project is small enough to understand, and most new code tends to result in some refactoring for shared functionality. 10k lines is also fairly close to novel length.

Anyone else notice this?


Above 10k lines is where statically typed languages really begin to justify their overhead. I work primarily in Ruby, C++, and Java, and while Ruby is far more of a joy to use for smaller projects, for large projects (I've worked on C++ projects well over a million lines of code) static typing is a blessing.

Refactoring Ruby is painful in a large project, even with decent test coverage (and a large portion of the tests basically enforce what you'd get automatically with static typing). C++ is doable in that you'll get compiler errors (increasingly humane with Clang and recent GCCs) when you've forgotten some detail. But refactoring Java is positively pleasant with all of the tools that are enabled by such a consistent VM model. (It really is neat to be able to rename a method in an object on a 100k LOC project simply by renaming it in one place, or to change a signature with helpers along the way.)

Some of that makes those languages less fun (and quick) to work with initially, but for projects that are likely to eventually grow quite large, there can be an eventual pay-off.


What kills me about refactoring C++ is dealing with headers. A lot of times, something should really be a separate file (or should be merged into another file), but it would be a pain in the ass to fix the headers.


This is a core problem with C++. Until we have packages, you might want to consider another language. Rust is usually a good choice.


GitLab is 30k+ lines of Ruby code http://gitlab-org.gitlab.io/gitlab-ce/coverage-ruby/#_AllFil...

We're not missing static typing as far as I know. We still are able to ship a lot of changes every month. I'm not sure if it is our structure (idiomatic Rails), test coverage (over 85%), something else, or maybe we should miss it but don't realize it.

One disadvantage is that static typing causes more lines of code than dynamic code. And code size is the best predictor of code quality http://blog.vivekhaldar.com/post/10669678292/size-is-the-bes...


> One disadvantage is that static typing causes more lines of code than dynamic code.

Not if the statically typed language uses inference.


That would be the best of both worlds indeed. Hope we get this for Ruby https://codon.com/consider-static-typing


The seminal languages with static typing had this feature, so I feel it does the field a disservice to say "the best of both worlds": there are a few annoyingly popular statically typed languages which don't have this property, and they should be considered deficient to the extent that the omission is not justified by theoretical type-dependence reasons.


One of the most interesting videos on programming I've watched in a while addresses that idea: https://vimeo.com/108441214 The video can be summed up in a lot of ways as "What if we actually took that seriously and started designing around it?"

I haven't had enough chance to do it, but I'm at least intrigued.


Isn't that the Unix philosophy in a nutshell? Lots of small programs that only do one thing, but can be strung together? They can also be rewritten without worrying about affecting anything else.


When working in Python I noticed that 20kloc was the point at which I stopped being able to hold a project in my head, at which point I became unable to safely refactor, which in turn meant refactoring basically stopped being worth it. In any codebase larger than that, the core ossified, and new functionality was really only possible to add at the edges.

I think I can do a lot better than that now that I'm using Scala. Certainly I don't feel like I've hit the limit yet.


This is where you need tests. Leaning on the type system can take you further, but at some point you're going to need tests with Scala too.


Disagree. Effective use of types has a much better cost/benefit than tests, and it's possible to go all the way down to 0% defect rate using just the type system (by encoding proof of correctness into the type system) if you really need to.


I highly doubt that you can prove correctness with types alone.

How would you verify the correctness of this function with types alone:

Integer add(Integer a, Integer b) { return a + b; }

How would types verify for me that the result is correct and not a+b+1?

Formal proofs work for this but to make them work you need more than just types.


The return type of your function is too broad: It should be the "type of integers which can be proven to be the sum of a and b".

That sounds like a joke but it's what going overboard with dependently typed programming languages is like.
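
A minimal sketch of that idea in Lean (hypothetical, not from the thread): the return type itself carries the proof obligation, so an implementation returning a + b + 1 simply would not type-check.

  -- Hypothetical sketch: the return type is "an Int c together with a proof
  -- that c = a + b", so the only implementations that type-check are correct.
  def add (a b : Int) : { c : Int // c = a + b } :=
    ⟨a + b, rfl⟩  -- `rfl` proves `a + b = a + b`
  -- Returning ⟨a + b + 1, rfl⟩ would be rejected by the type checker.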


Only insofar as your types essentially become the bones of the program... which you'll then want to refactor at some point.

Types are awesome. Languages like Rust give them to me without having to learn a ton of theory. However, eventually, I realise that my types have boxed me into a certain design - and I need to refactor heavily, which involves rewriting a large chunk of my types which I was relying on to show me that my code was correct.


Of course you sometimes need to refactor your types. That's fine; the type checker will guarantee that you can do so safely. You probably want to be cautious about changing your axioms, and to that end it might be worth distinguishing them explicitly, but that's easily done (e.g. you can put them in a separate module).


It'll tell you your program compiles. Unless you've gone seriously overboard and typed every piece of information down to primitives, and built your types so there's no possible way to convert them out of those types at the wrong time, you've still got a chance of having bugs.

Functionality testing will tell you when you've failed to encode an invariant in your type, and there's a good chance that the reason you failed to is just that it's more effort than it's worth. Else we'd all be writing proofs for every program in Coq.


Do your tests also contain every single piece of information?


You can at least add a new test easily when someone reports a bug. Tightening up your types, otoh, is a trade-off against brittleness.

In practice, as I alluded to in my previous post, you really want types combined with broad functionality testing across a range of scenarios. You really don't want to be relying on "it compiles, I assume it works" unless you're able to back out of a deployment in seconds when it doesn't.


Tests are much more brittle than types; automated tooling doesn't know what the tests are saying, and there's no way to make code "test-generic".

Some things are probably not worth typing, but if they're not worth typing they're not worth testing either. Whatever your target defect rate, the most efficient way to hit that target is using types.


> there's no way to make code "test-generic"

Sure there is. Stop testing random bits of internal functionality, start testing huge modules with well-known (semi-public) APIs, or your entire program, and combine that with a ~reasonable~ type system. You want full-system testing. You don't need a whole lot of tests to be sure "it's basically working", and it's a lot easier to turn repeatable bug reports into tests.

When you inevitably break a piece of functionality despite your types, you might not know precisely how it broke, but at least you know it's broken without a customer calling you up. And every time a customer does call you up, you can add a full-functionality test - "this is what the user did, and this is what I expect from that".

Every other engineering branch backs up good practices and solid math with lots of testing. So should we, types or no types.
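
As a hypothetical illustration of that last point, a customer report distilled into a full-functionality test might look something like this (Python/pytest; the app module and its subscribe entry point are invented for the example):

  # Hypothetical regression test distilled from a customer report; `app` and
  # its `subscribe` entry point are invented for illustration.
  import app

  def test_subscribe_with_plus_address_sends_confirmation():
      # "This is what the user did..."
      result = app.subscribe(email="jane+candy@example.com", plan="monthly")
      # "...and this is what I expect from that."
      assert result.status == "active"
      assert result.confirmation_email_sent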


Been using Scala a lot at work lately, and I've definitely helped myself out by using types to good effect, but I still feel like I could be doing better... The one lesson I've definitely learned: if you're going to be serializing things, pick a library that makes upgrading those serializations easy, or you'll live in fear of changing types.


Yeah, I'd recommend Thrift or Protocol Buffers for serialization. Having distinct serialized objects feels like overhead to start with, but it's well worth it if you ever want to change the code without risking incompatibilities.


What can I read to be convinced of your viewpoint?


I don't know - I reached it through direct experience (a number of years programming professionally in a language with a good type system) rather than anything I read.

https://spin.atomicobject.com/2014/12/09/typed-language-tdd-... makes the case that many tests can be replaced more effectively by types (though the use of Java makes the point less clear than it should be, and means that some properties are so difficult to express through types that the author prefers to use tests). That one can prove correctness with types is well-known, and it stands to reason that there is no need for tests for formally verified code. These two things together suggest that there's never a point on the defect rate/cost curve that's easier to reach with tests than with types, but they don't rule it out entirely.


How would your static typing save you from:

  model.setTitle( dto.getDescription() );


(assuming the bug here is a description being assigned to a title)

getDescription simply needs to return a Description object while getTitle returns a Title object (and setTitle only accepts a Title). Duh.

Only in the final rendering should it get turned into a string and there it should only accept a Title type for doing so.

Okay, I'm kind of joking, but this is seriously the solution if you really want to get down to it - primitive overuse is the problem, so wrap it in a type that makes it explicit. You need cheap and easy types to take full advantage of this sort of thing.

EDIT: And even then, code developed in such a way may go overboard at some point. I'm not sure, as I haven't really seen it done to the fullest extent possible. Typically, even when taking fairly strong advantage of such a type system, unit tests and the like are still used, but theoretically you could enforce their parameters entirely in the type system. Of course, that enforcement can also be incorrect, but it does make things clearer in many cases.
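
A minimal sketch of the wrapper-type idea, using Python with mypy since that's what the site itself runs on (the Title/Description names follow the example above; everything else here is invented):

  # Wrap primitives in distinct types; a checker such as mypy then rejects
  # passing a Description where a Title is expected.
  from typing import NewType

  Title = NewType("Title", str)
  Description = NewType("Description", str)

  def set_title(title: Title) -> None:
      print(title)

  desc = Description("A monthly box of Japanese candy")
  set_title(desc)                    # mypy: expected "Title", got "Description"
  set_title(Title("Candy Japan"))    # OK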


I can see your argument in some cases, but if you are accepting input, at some point you'll have to accept a string (or any primitive) and convert it to your custom type.

How does static typing ensure you're mapping the correct JSON input fields to your custom types?


> How does static typing ensure you're mapping the correct JSON input fields to your custom types?

Ideally you push it all the way out. Rather than JSON you use Thrift or similar where the messages are strongly typed.

But if you do use JSON, how much benefit does unit testing your JSON mapping really give you? The mapping in code is just a list of field:field pairs, the test is just the same thing, is there really any value in repeating it twice? I'd sooner rely on careful code review than a test.


That's...horrible. This is not a comment on you or your experience, but is this typical for Python? I’m used to working in C++ code bases with several million lines of code. Does Python really begin to fall down that early? I’ve been looking at using it in an upcoming small project, but I expect it to be larger than 20K. Now I’m reconsidering.


I agree: for dynamically typed languages like Python and JavaScript, 10-20k is a good limit. With a statically typed language like C++, if the abstractions are well designed, people should be able to handle more.


There will be a lot less mental overhead in statically typed languages, thanks to sane / functional autocompletion and a lot of information being abstracted away into types, so on that I agree. The other part may be that more loosely typed languages have a tendency to be less strictly structured, but that depends on the project.

The project I'm working in atm is neatly structured and at ±60K, but you definitely see certain team members knowing more about one part than the other. We've also started a POC to start moving to Typescript for the more critical areas of the application (services, data models).


I work in a supermarket (in Japan).

Those EAN13 codes you are printing are US only and probably reserved. I recommend that you change to any one of the in-store codes like 2*. It specifically carries no restrictions.
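
For anyone who wants to mint such in-store numbers, a small sketch of the check-digit arithmetic (the 12-digit body below is made up; prefixes 20-29 are the restricted-circulation range commonly used for in-store codes):

  # EAN-13 check digit: weight the 12 data digits 1,3,1,3,... left to right.
  def ean13_check_digit(first12: str) -> str:
      total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
      return str((10 - total % 10) % 10)

  body = "200000123456"                    # hypothetical in-store 12-digit body
  print(body + ean13_check_digit(body))    # full 13-digit code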


EAN means "European Article Numbering".


I know. IAN and EAN are interchangeably used.


What problems may this cause? It's not like he's scanning this at a supermarket till


Nothing. It is like using public, globally assigned IPs in your internal network. There is a perfectly valid range you can use, so why not shift to that?


What could be the problem?


My guess when confronted with the number of lines of Python code for the front page was 5000 SLoC. I lowballed it intentionally because I was under the suspicion that it might be lower than one might normally guess. (Don't we all take pride in keeping our SLoC count as low as we can for any given problem?)

The video had some closing points, including "NoSQL sucks for reports", which was explained briefly. OP, if you could, I would like for you to elaborate a bit more on this point. I am not doubting that it's true for Google App Engine, just interested to hear about the particulars.


I can speak to this, being a heavy GAE user and formerly a heavy MongoDB user.

For a reporting system you want an ad-hoc query language, fast in-database aggregation, and joins.

The GAE datastore has none of the three, and MongoDB lacks joins. These are not fatal flaws - the GAE datastore has other advantages like infinite scalability, built-in synchronous geographic replication, failover, zero maintenance, etc. But for analytics, we replicate a subset of data to Postgres. It's still cheaper to have developers write occasional replication code than to hire a DBA to maintain the database, and I never have to worry that an ill-conceived sql statement will create an incident. I've come to the conclusion that using the GAE datastore as primary datastore and replicating to specialized databases is a pretty good architecture for systems that require reliability and low-maintenance.
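
A rough sketch of that replicate-to-Postgres pattern (the Shipment model and its fields are invented; a real job would batch writes and track a high-water mark instead of copying everything):

  # Hypothetical sketch: copy a subset of GAE datastore entities into a
  # Postgres table so reports become ordinary ad-hoc SQL.
  import psycopg2
  from google.appengine.ext import ndb

  class Shipment(ndb.Model):               # invented example model
      shipped_at = ndb.DateTimeProperty()
      country = ndb.StringProperty()

  def replicate_shipments(conn):
      with conn.cursor() as cur:
          for s in Shipment.query():
              cur.execute(
                  "INSERT INTO shipments (shipped_at, country) VALUES (%s, %s)",
                  (s.shipped_at, s.country))
      conn.commit()

  # Reporting is then one line of SQL, e.g.
  #   SELECT country, count(*) FROM shipments GROUP BY country ORDER BY 2 DESC;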

MongoDB's aggregation framework is really painful because the query language is weird and very low-level - you have to do most query planning yourself. And without joins you hit the limits of the kinds of questions you can ask the system very quickly.

FWIW, my guess was 10k lines of code.


In the article he writes that it sucks because you end up writing queries in Python that would have been better written in SQL.

I didn't listen to the talk, so I don't know if he mentions it there as well, and you are looking for a more in-depth explanation.


Very interesting reading, as almost always from CJ.

I was impressed by the precision in the data, i.e. being able to tell exactly how many lines of code deal with "fraud detection". Perhaps there's a "frauddetection.py" file, but then there has to be someone importing it and using it, and those lines are harder to count, I'd expect. But perhaps the integrations are small enough to deal with manually.


The code is short enough that what I did was actually just go through every line of code and try to assign it to a category. It did have some which were hard, such as the import statements you mention that multiple parts might use. But I figured being off by a few dozen lines wouldn't be that important for seeing the bigger picture.


@OP Why not dump the data into postgres rather than code the schema and transactions? I had a similar issue and postgres was a godsend.


Well, he is on Google Cloud, and when he wrote it he used the best option available at the time.

Google Cloud now has Cloud SQL, which would likely be much better. But then he'd need to code, test, and migrate... perhaps not worth it?


The parent is likely referring to dumping the GAE Datastore data into a Postgres instance for reporting and other analysis. Not swapping out the primary datastore for the app. This comment elsewhere in this thread explains the strategy nicely: https://news.ycombinator.com/item?id=12613223


I don't think he has the scale to require NoSQL. Cloud SQL could host his data and his reporting in a single database.


It wouldn't be impossible, but probably less work overall just to continue to write little Python scripts for my reporting purposes.


Yes, good points.


That is assuming Recurly doesn't have a better fraud story. Did you look at using one of the "anti-fraud as a service" companies, or a different vendor that does subscriptions?


At the time Candy Japan had their fraud problems, Recurly didn't have very good support for these services. Since then, we've formed a partnership with Kount:

https://recurly.com/press/recurly-teams-up-with-kount-to-hel... https://docs.recurly.com/docs/kount

You get some basic protections for free but you can also integrate with your Kount account if you want to customize.


Great read! My guess would have been greater than 10,000. Couple of typos towards the end you might want to fix:

"I don't know how I could have expect it, but somehow from the start I should have prepared for fraud. You want to at least keep an on any suspicious activity and react quickly if you start getting many chargebacks."

'expected' and 'at least keep an eye on'


Thanks!


Coming from an enterprise background, that number is astonishingly low.


The Active Directory integration alone would take 5k lines.


Yes, but how many hundreds of microservices do you have? :-)


I totally get the point on NoSQL. I started out with Redis and some custom microservices that wrote to BoltDB for my food site. This started getting out of hand, so now I am converting everything to PostgreSQL. I will still use Redis for some fast access/caching, but for reporting, SQL is still the best way to go once you know what your data looks like.


Do you guys know of similar "show-offs", where an author/business presents their internals, with real-world requirements (shipping, fraud protection) tied to the software components? I find these especially interesting.


Does he live in Japan and ship from there? Or live in the US and somehow get the candy?


He does live in Japan: "Bemmu started Candy Japan with wife Nachi and lives in Tokushima, Japan."


Thanks to this very informative blog post I also discovered https://embedd.io that allows you to embed HN and Reddit comments on any website. Thanks again!

EDIT: I don't know why I am getting downvoted. I am NOT from embedd.io (why else would you downvote otherwise?) and just genuinely discovered the service. HN is truly toxic lately.


Embedd.io is nice, take a look at the code snippet for importing it. I think it's neat how they embed the configuration options inside the script tags.


You're probably being down voted because your comment doesn't really add anything to the discussion, but try not to take it too personally; people are just pushing your comment to the bottom of the page so that comments that are about the article can bubble up.


> You're probably being down voted because your comment doesn't really add anything to the discussion

Clearly, you're incorrect, because several people have explicitly stated that they appreciated the mention of this site (and probably there are more who were glad to find out but didn't comment). `bemmu`, Candy Japan's founder, also positively responded to the comment in question.

In my opinion, downvotes on HN should be reserved for malicious comments like trolls, ad hominems, off-topic spam (real spam, rare on HN thanks to the mods. not just "I don't like this comment and think it's off-topic"), factually incorrect comments, and "reddit style" humor.

Users should not downvote comments they simply disagree with, on-topic promotional comments, and tangentially related comments. Tangents are a great way to discover new ideas.

However, there is a small cadre of HN downvoters who disagree with me, and who rabidly downvote any comment with a whiff of being off-topic or controversial. These downvoters must spend lots of time on HN, since their downvotes always come very quickly after a comment is submitted.

Luckily, they do not represent the majority of HNers who can downvote, since comments like this one typically crawl back up into positive territory after the first 30-60 minutes. Some of my comments that were initially downvoted for being "off-topic" even crawled back up to over 10 upvotes.

To the GP commenter, please don't get too frustrated. It's a relatively small number of people who downvote useful comments like yours. It's true that they are among the first to vote on comments, but almost always their downvotes are reversed after some time.


Yeah, but that is just how people are. You can huff and puff but clearly the design and format of HN is largely to blame.

People whining about down votes and then complaining about it only makes the discussions worse. Your comment is going to be seen by maybe .1% of HN. There are better avenues to raise these issues.

If you want to change how down votes are handled, going off topic in a discussion is not how you do it.

If you are complaining about down votes, you are polluting the discussion even more than the erroneous down votes themselves.

People are people. It takes more to change behavior than complaints.


> If you want to change how down votes are handled, going off topic in a discussion is not how you do it.

If that's the case, then both you and I are pissing into the wind here.

> If you are complaining about down votes, you are polluting the discussion even more than the erroneous down votes themselves.

What about complaining about complaints about downvotes? Does that pollute the discussion or is it exempt?


Pissing against the wind and complaining about complaining are OK now, because this thread is at the bottom, where it belongs, and anybody scrolling down this far might be looking for stuff like this.


> Users should not downvote comments they simply disagree with

The site founders have explicitly stated that downvoting to disagree is fine. There is also the 'flag' mechanism for toxic comments.


Every comment I read that is enough of a waste of my time gets a downvote, including factually true statements made with the best of intentions.

There are established norms that downvoting is not just for the cases you mentioned: https://news.ycombinator.com/item?id=117171.

I'm not going to sit here through the fifteenth iteration of BSD v GPL even if the arguments are cogent and civil.


Your comment was definitely a waste of my time (I didn't gain anything from reading it) so by your standards I should downvote you.

  > I'm not going to sit here through the fifteenth iteration of BSD v GPL even if the arguments are cogent and civil.
No, you're carefully going to read them (to make sure they're "enough of a waste of my time") and then downvote them, because it's clearly a great way to spend your precious time.


> Your comment was definitely a waste of my time (I didn't gain anything from reading it) so by your standards I should downvote you.

Yes.


This is a very good website. Thanks!

As for the downvotes, don't mind them. It's not "karma" like on Reddit. There is no hunt for points here! People downvote based on relevance.


It's off topic to this discussion. If you discovered a new interesting website/service it's best to submit it as a new discussion.


[meta]

But does it really need its own thread?

I don't think it's really off topic. People linked to the blog post. The service in question is used on the blog post.

We are a technical bunch. It makes sense to comment on the technologies used in a link rather than only its content.

Example: candyjapan has this nice comment at the top of their pages' source code.

"Hi, Thanks for reading the source. If you would like to start your own subscription box, buy my book! https://www.candyjapan.com/book It contains information that will honestly be useful to you, such as different platforms for running box sites like these and even some sample marketing campaigns."

It's only mildly interesting, so you don't create a thread but make a comment about it. It's still relevant to the discussion because the link was to the website. It is however irrelevant to the content of the article that was linked, but that shouldn't matter.


I'm glad he mentioned it cause I wouldn't have noticed otherwise. I submitted it [1].

[1]: https://news.ycombinator.com/item?id=12612574


I agree, nice website and glad it's resubmitted. I just tried to explain possible downvotes (I didn't downvote myself).



