Machine learning applied to cannabis breeding

oldbootz · Nov 8, 2017

Hi folks

I have a background in programming and networking and I have been studying some of the latest advances in machine learning.

The idea is that if enough data is gathered, preferably in an automated fashion, and we can train a convolutional neural network with it, we should be able to make some predictions about what to expect from the progeny of an inbred cross. I think it would work okay with inbreeding because all the genes are contained within the P1s, and once you input the F1 data it should be able to predict the F2.

The data inputs can be diverse, and one would not have to program very much about how the neural network reasons about the data. We could input terpene analysis profiles, cannabinoid profiles, genetic analysis, plant dimensions, growth durations, all kinds of data.

As an output we would want a prediction of some of the same things we put in: which cannabinoids, terpenes, morphological proportions, etc.
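As a baseline for what "predicting the F2" means before any neural network is involved, here is a minimal sketch of the classical single-locus expectation. It assumes fully inbred (homozygous) P1 parents and one gene with two alleles; the `cross` function and the `A`/`a` allele labels are made up for illustration, and real traits like terpene or cannabinoid profiles are usually controlled by many genes plus environment.

```python
from collections import Counter
from itertools import product


def cross(parent_a, parent_b):
    """Return genotype frequencies for a single-locus cross.

    Each parent is a 2-character string of alleles, e.g. "AA" or "Aa".
    """
    # Every allele from one parent can pair with every allele from the other.
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Hypothetical inbred (homozygous) P1 parents at one locus.
f1 = cross("AA", "aa")  # every F1 plant is heterozygous
f2 = cross("Aa", "Aa")  # F1 x F1 gives the classic 1:2:1 split
print(f1)
print(f2)
```

The uniform F1 followed by a segregating F2 is the pattern a trained model would have to reproduce, and improve on, across many loci at once.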

So there are a few machine learning tools released for use by the public. For example we have the Google Vision API https://cloud.google.com/vision/ and IBM's Watson, and there are quite a few run-at-home libraries like TensorFlow https://www.tensorflow.org/.

I don't think we are quite there yet in making this happen. First of all it would take a LOT of data to train any kind of accurate network, and it would take a lot of machine power, which is less of a barrier if one leverages the power of a network already set up by a large company like Google. But we are already seeing some amazing things coming from these networks, and I think it won't be too long before we can compose various trained networks together to create a super network that would be able to do this.

With 50-qubit quantum computing on the horizon I see a bright future for data-driven business.

I did hear recently that some people came up with a way to genetically analyse a seed and generate the terpene profile that the plant would have had if grown. This is really cool!

Thought I would share with you all my stoned thoughts because it excites me!

weedtoker · Nov 8, 2017

Several papers, and ag work with other crops, give a glimpse of present/future breeding strategies. As a layman, I believe them when they say traditional breeding methods and tech will be used in conjunction. For me nothing substitutes for nose/smoke test/organoleptic human analysis of the finished product; at least in certain breeding programs cannabis has shown that, "guesswork" or not (maybe self-delusional, the future will tell)... Once the UN treaty goes out of the window, or the US changes the schedule, the fun and games that you talk about, and more, will really begin to ramp up, especially in hemp as a food/fiber source? Drug cannabis is a small piece of a pie that's just entering the oven now maaannn...or at least it seems so in this stoned mind :dance013:

log roller · Nov 13, 2017

Convolutional neural networks are interesting, and not to shit on your idea, but I don't really think this will be helpful. There seems to be a certain contingent of folks out there (not saying you) who talk a big game about involving all this fancy technology in the breeding process, yet seem to have fuck-all to actually show for it. (Ask Weird his opinion on the subject.) Meanwhile others like me are doing things the old-fashioned way and steadily progressing with better and better actual results each generation. Breeding isn't rocket science, and a lot of things that theoretically sound like they might be helpful aren't actually helpful in real world practice. It sort of reminds me how certain folks think they need all this fancy gear like soil pH meters, brix meters, etc. to give them an edge, whereas others just look at the plant and study it carefully to get the same results. I do look forward to more knowledge being gained about the genetics of cannabis and which specific sequences control which traits, but it's going to be a long time until such knowledge becomes of any real world practical use in cannabis breeding. My $.02

oldbootz · Nov 13, 2017

Hi log roller

Yeah, traditional breeding is obviously great; I smoke good weed bred like this every day! I am not trash talking traditional breeding at all, I have huge respect for people doing this properly.

But that's not to say that machine learning will not have its place in the future (not now, it's too soon), especially in situations with very high plant numbers. Humans can only hold so much in their minds at a time. Could you remember each phenotype in a 10 000 plant selection trial? Humans are quite error prone. Computers are less error prone but require a human to identify the errors.

Today's tech:
1. Computer vision can more accurately tell you what is in a picture than the average human being.
2. Computers can beat human beings at complex games: chess, Go, Dota 2
https://www.youtube.com/watch?v=92tn67YDXg0
3. Computers can process large amounts of information quicker than a human.
4. Computers can reach expert level at certain tasks in a matter of days to months that an average human would take 10 000+ hours to reach.

Tomorrow's tech is all about telling a computer to learn something and utilizing its superior processing power and accuracy to do a job better than a human (for its specific task). If you trained enough complementary networks like this you could compose them to create a super network that would be able to solve very diverse types of problems.

I would link some vids but they are all 1+ hours and will bore you to death if you're not a coder. Check out the Dota 2 world champion playing a machine learning AI.

OG bub · Nov 14, 2017

I wouldn't be interested.
Breeding and observation to obtain quality results is a skill, learned over time. It's earned if done correctly.
It never ceases to amaze me how folks are always looking for a shortcut.

I understand your suggestion, and human error.. but quite honestly, if anyone needs to have a 10,000 plant count to select from for a breeding project, let alone 10k phenos lol, my friend, that person might need to ask him/herself if they even have a clue as to what they are looking for!

just my opinion!
Bub.

iTarzan · Nov 14, 2017

Cool thread subject, oldbootz.

Machine learning worries me because of the Terminator theory. I don't totally like the Voyager project either. It doesn't seem like a good idea to advertise our planet. It could be great but it could be a disaster.

Genetically analyzing a seed has merit imho. Predicting the F2s would not tell you what seed is what so you would still need to grow them all out. It would tell you if they would be worth growing though.

OG bub, people want to be self-reliant while having pro quality results. Hobbyists and enthusiasts want top of the line results. It has been that way for decades in many different areas.

Especially now. It is the age of apps.

log roller · Nov 16, 2017

But that's not to say that machine learning will not have its place in the future (not now, it's too soon), especially in situations with very high plant numbers.

I think it will be 15-20 years probably before we get to the point where our human knowledge is good enough for machine learning to be of any use in this particular field of endeavour.

Humans can only hold so much in their minds at a time. Could you remember each phenotype in a 10 000 plant selection trial?

Sure, easily. I just take notes as I smoke each pheno, giving it an A/B/C rating. And then I select the best for the next generation. Better selection criteria aren't needed.

To enter your data into a computer you need to sequence DNA of each plant. How much work and effort does that take, vs. what sort of real world gain?

Computers are less error prone

On the contrary. As a computer programmer I can tell you they are definitely more error prone. Not only are they subject to human error but to unpredictable and unforeseeable machine error as well. Your computer is full of bugs, back doors, security flaws, coding errors.

1. Computer vision can more accurately tell you what is in a picture than the average human being.

* with extensive training, and still subject to unpredictable and unforeseeable error. Witness the joke that is Tesla Motors and their "autopilot" tech, which has led to some hilarious fails (and Darwin Award wins), such as running head on at 70+ mph into obstacles etc.

2. Computers can beat human beings at complex games: chess, Go, Dota 2

* if specifically programmed to do so, and with extensive training using huge datasets, not because of 'intelligence'. Without human intelligence writing the program and extensively training it, a computer is nothing more than a brick.

3. Computers can process large amounts of information quicker than a human.

* Yes, they can deliver errors and mistakes with unimaginable speed. Garbage in = garbage out.

4. Computers can reach expert level at certain tasks in a matter of days to months that an average human would take 10 000+ hours to reach.

Can a computer learn to be a plant breeder on its own, without any input from an expert plant breeder programmer? Nope.

Tomorrow's tech is all about telling a computer to learn something and utilizing its superior processing power and accuracy to do a job better than a human (for its specific task). If you trained enough complementary networks like this you could compose them to create a super network that would be able to solve very diverse types of problems.

Your ideas seem to be based mainly on naive beliefs, due to your young age and lack of real world experience. I have been programming computers since 1991 so I do know a thing or two about the field.

Once again, not to shit on your idea, but I don't see this thread as leading anywhere productive. If you think the idea has merit then try it. Nobody here can tell you anything more about the subject than I can. Try it and you'll quickly find out just how not easy it really is. :tiphat:

iTarzan · Nov 16, 2017

log roller said:
Sure, easily. I just take notes as I smoke each pheno, giving it an A/B/C rating. And then I select the best for the next generation. Better selection criteria aren't needed.

To use your own words. Your ideas seem to be based mainly on naive beliefs, due to your young age and lack of real world experience.

If you smoked one pheno a day it would take you 30 years to smoke them all and test them properly. ONE A DAY or your findings would be compromised. Then you would be hard pressed to go back through them all and really understand the slight differences in them as you described them without the actual memory of them.
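A quick sketch of that arithmetic, assuming a 10,000 plant trial and a 365-day testing year with no days off; the function name `years_to_test` is made up for illustration.

```python
def years_to_test(plants, per_day):
    """Years needed to smoke-test `plants` phenotypes at `per_day` per day."""
    return plants / (per_day * 365)


# Compare a few hypothetical testing rates for a 10,000 plant trial.
for rate in (1, 2, 3):
    print(rate, "per day:", round(years_to_test(10_000, rate), 1), "years")
```

At one a day it works out to a little over 27 years, so the 30-year figure is in the right ballpark, and the total shrinks in direct proportion to how many phenos get tested per day.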

log roller said:
Your ideas seem to be based mainly on naive beliefs, due to your young age and lack of real world experience. I have been programming computers since 1991 so I do know a thing or two about the field.

There is always someone better. Never lose your humility.

oldbootz · Nov 16, 2017

I think it will be 15-20 years probably before we get to the point where our human knowledge is good enough for machine learning to be of any use in this particular field of endeavour.

Cool so we both agree then that it's coming!

IntelliGeneS · Nov 16, 2017

There is a huge benefit to computational approaches to data collection, and they are already being applied in the seed industry at large. Primarily this means using image analysis to generate data more accurately and rapidly than a human collector can, including derivative measures which are simply not feasible without algorithmic processing, and large dataset modeling, particularly of molecular-phenotype association data.

I know from experience, however, that applying breeding decision strategy and technique to computational systems is far from satisfactorily implementable at a general level at this time. That is not to say that machine learning approaches can't be (and aren't) used successfully in fine tuning of breeding programs, just that the current applications are single-trait and very limited in scope. There is a huge amount of uncertainty involved in breeding programs, especially complex ones comprised of layers on layers of breeding projects and market segment classifications, and we are nowhere near the ability to translate that into formats understandable to machine thought yet. We have enough trouble getting humans to have a good grasp on it with all the risk modeling, ideation of potential, and probability interaction overlays involved.

log roller · Nov 16, 2017

If you smoked one pheno a day it would take you 30 years to smoke them all and test them properly. ONE A DAY or your findings would be compromised.

Bullshit.

a) I usually smoke 2-3 phenos a day, and my findings are NOT compromised. The rating for each plant does not have to be exact. It's just a generic A/B/C rating. Not rocket science.

b) You don't need to grow anywhere near 10,000 phenos to get good results. If you did, then answer the question I asked: how much work does it take to genetically sequence that many phenos? How long does it take to sequence just *one* plant, vs just smoking it and recording the results?

c) Many traits that you are selecting for don't require smoking at all; for instance, stem strength, hardiness, drought and disease resistance, etc. None of those are difficult to select for in large numbers.

d) Maybe you sequence a plant, and analyze it with gas chromatography as well, and get an exact percentage of what molecules are present. But you still can't answer the question: does it smell good? Does it smoke good? How does it compare with this other pheno here that has a slightly different makeup? Only a smoke test can tell anything.

Then you would be hard pressed to go back through them all and really understand the slight differences in them as you described them without the actual memory of them.

Have you actually personally done any large scale cannabis breeding?

log roller · Nov 16, 2017

oldbootz said:
Cool so we both agree then that it's coming!

Yes. One day, far in the future. Not today.

meizzwang · Nov 16, 2017

Definitely would be an exciting tool, but like painting, singing, or producing movies, breeding is an art. Understandably, in cannabis world, the most talented artists tend not to share their secrets, or if they do, they do so in a way that can't be reproduced by others. You also have to ask yourself, how many of these truly talented breeders want to have their legacy and importance erased by machines? They will become insignificant if a machine can do better than they can.

Maybe machines can learn to create art, anything is possible! Oftentimes, the limitation of the machine is due to the limitation of the individual(s) who create it: it's not uncommon for an engineering genius to lack artistic abilities and emotional intelligence since their brains are wired mainly for logic (of course, there are exceptions). Similarly, it's not uncommon for artistic geniuses to not be able to work with engineers to create the damn thing since they don't think on the same wavelength. But if you have someone, or a team, that possesses both high IQ and artistic breeding talent, that can become a powerful combo!

iTarzan · Nov 17, 2017

log roller said:
Have you actually personally done any large scale cannabis breeding?

No! I have done some large scale cannabis smoking though.

log roller · Nov 17, 2017

Understandably, in cannabis world, the most talented artists tend not to share their secrets, or if they do, they do so in a way that can't be reproduced by others.

That's not understandable at all, actually. I hate people like that. I don't learn things just to keep them to myself and benefit myself exclusively. That's selfishness. Just like the folks that never want weed to be legalized, so they can keep raking in $3000/lb+ forever. Fuck those people.

You'll notice that those people who have the fewest and lamest ideas, are also the ones who are most secretive about what they've got.

You also have to ask yourself, how many of these truly talented breeders want to have their legacy and importance erased by machines? They will become insignificant if a machine can do better than they can.

Yeah, just like the buggy whip manufacturers became 'insignificant' because machines made their job obsolete.

Breeding cannabis isn't rocket science. It really isn't. I think certain people have a vested interest in making it out to be this super hard and complicated thing, so they can keep being worshipped as gods. There's a lot of douchebags like that in the world in all different fields. The essence of it is as Luther Burbank laid out: "select the best, reject the rest." The more you have to select from, the better.

If you had 10,000 plants to select from, then select the top 500-1000 best looking/smelling/hardiest plants, and smoke through those only.
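That "select the best, reject the rest" step can be sketched as a simple ranking. This is only an illustration: the plant ids, the single 0-10 score per plant, and the random data are all assumptions, standing in for whatever quick look/smell/hardiness rating a breeder actually records.

```python
import random


def select_top(scores, keep):
    """Return the ids of the `keep` highest-scoring plants."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


random.seed(0)
# Hypothetical data: plant id -> one 0-10 score from a quick first pass.
scores = {f"plant_{i:05d}": random.uniform(0, 10) for i in range(10_000)}

shortlist = select_top(scores, keep=500)
print(len(shortlist), "plants go on to the smoke test")
```

Only the shortlist then needs the slow, one-at-a-time evaluation.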

Just to reiterate, I don't mean to trash OP's idea, because it certainly has merit. It's just that genetic sequencing technology will have to come a long ways before it's practical. You would need to sequence each plant and right now that's a difficult and cumbersome process. If/when the day comes when you can sequence a plant in a minute or less, automatically without a bunch of sample preparation etc, then I think you would be able to really apply some data analytics, and it certainly would be a useful tool for a breeder to use to supplement his own human senses and cognition. :tiphat:

IntelliGeneS · Nov 18, 2017

log roller, I think you're out of touch if you think NGS (not to mention nanopore sequencing) technology is impractical. It's incredibly rapid, fairly cheap, and is becoming even faster and more portable as the techniques and processing mature.

Also, you're not seeing the potential benefits if you think waiting to have smokable flower to record the results is faster. The initial benefit of molecular breeding approaches is always the ability to massively increase breeding throughput. If you recognize the benefit of having a 10,000 plant population to select from (even if you don't understand the fundamental reasons why), then you can't deny the benefit of being able to make selections on 100,000 plants using the same space and the same amount of time.

Regardless, the idea presented by the OP had nothing to do with sequencing data in itself. The idea of machine learning applied to breeding (which has been done) is about adaptive and predictive modeling being used to guide selection decisions. To date, it has mostly been accomplished using training datasets to establish models, which are then periodically updated using observation sets, and breeding activities have been guided very effectively using this approach (in milk cows, for example). This approach is effective without sequence (or even marker) data; it merely needs good-quality phenotypic datasets of sufficient size over several generations to implement. Having sequence data (or even just fair SNP coverage data) and utilizing big data and phenotype-association computational approaches simply allows integrated molecular development strategies during the breeding process, rather than having to pursue independent molecular marker discovery activities which can then be used to accelerate parallel breeding programs. These kinds of approaches are utilized all the time; much of modern breeding is using shortcut methodologies to be able to make decisions early which otherwise would be much more time consuming and costly. What's sometimes overlooked by those in the OP's position is that the failure of these approaches often hinges on the quality of the data collected, as failures in data quality have ripple effects on the ability of models and systems to draw meaningful associations.
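The train-then-update loop described above can be illustrated with a deliberately tiny stand-in: a one-feature least-squares line whose fit is refreshed as each new batch of observations arrives. This is not the kind of model used in real livestock or crop programs; the class name `RunningLinearModel`, the choice of mid-parent value as the single input, and all the numbers are assumptions made up for the sketch.

```python
class RunningLinearModel:
    """Single-feature least-squares fit that can be refreshed with new data."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        """Fold a new batch of (x, y) observations into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        """Return (slope, intercept) of the least-squares line over all data so far."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

    def predict(self, x):
        slope, intercept = self.coefficients()
        return slope * x + intercept


model = RunningLinearModel()
# Hypothetical training set: mid-parent value -> mean progeny value for one trait.
model.update([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 14.0])
# A later generation's observation set refreshes the same model.
model.update([18.0, 20.0], [15.0, 16.0])
print(model.predict(15.0))
```

Because the fit depends only on the accumulated observations, bad measurements in any generation shift every later prediction, which is the data-quality point made above.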

meizzwang · Nov 20, 2017

log roller said:
That's not understandable at all, actually. I hate people like that. I don't learn things just to keep them to myself and benefit myself exclusively. That's selfishness. Just like the folks that never want weed to be legalized, so they can keep raking in $3000/lb+ forever. Fuck those people.

Patents allow people to share the details of their inventions without losing the potential monetary reward. Unfortunately, cannabis can't be legally patented AFAIK. While one may see holding back "trade secrets" as being selfish, another may see it as having the right to reap the competitive advantage or financial rewards of their invention. Imagine you invented a car and another company took your idea, had better marketing, and you couldn't sell a single one because you shared your secrets with others.

Bongstar420 · Nov 29, 2017

Strange. The most important thing that you can think to apply this to is getting high. I'm sure there aren't serious problems with food that need attention. Also, the dankest weed in the world happened by accident. Machines can't make danker weed than is possible and won't do anything unless we already did most of the work.


Bongi · Jan 18, 2018

I am very interested in things like this! In fact, at this moment I am trying to get into university to study machine learning and genetics. I don't know a lot about them yet, only some basic things.
How do you think we would collect data automatically? Omnidirectional RC plant saucers?

Then a 3D scanner to get physical properties, plus some kind of X-ray scanner for the roots so that we know how they look?
A robotic arm with some kind of sample taker and preparer for chemical analysis?
I think the arm could be trained to take samples in the same kind of way that Google trained a robotic arm to grasp things: many robots and many plants to take samples from.
